Hello everyone,
I’m currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, which is supposed to facilitate Bluetooth audio streaming. I’m trying to convert the audio data from PDM to PCM so that I can send it to Deepgram for speech transcription. However, I’m running into issues where the resulting audio is just different types of white noise, and Deepgram isn’t able to transcribe it.
Here’s the current context of my project:
- I’m using Swift for the iOS app, leveraging CoreBluetooth to connect to the board and Starscream to interact with Deepgram’s WebSocket API.
- The PDM audio data is sent from the XIAO BLE Sense board to the iOS device. I accumulate it and then attempt to convert it from PDM to PCM using a FIR filter in Swift.
- I then send the PCM data to Deepgram’s API for transcription.
Despite trying several things, I’m still only getting white noise as output. I’ve tried:
- Using a FIR filter to convert the PDM data to PCM with a Hamming window for coefficients.
- Converting the PCM data to little-endian format before saving it or sending it to Deepgram.
- Adjusting the PDM gain and FIR filter size to improve the audio quality.
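On the endian step: current iOS devices are little-endian, so an `[Int16]` buffer copied straight into `Data` is already in little-endian order, and a helper that swaps bytes unconditionally would turn valid samples into noise. A minimal sketch that writes the byte order explicitly, so no separate conversion pass is needed (`littleEndianData` is a hypothetical name, not part of the app):

```swift
import Foundation

/// Serializes 16-bit samples as little-endian bytes, regardless of host byte order.
/// `littleEndianData` is an illustrative helper, not an existing API.
func littleEndianData(from samples: [Int16]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        let bits = UInt16(bitPattern: sample)
        data.append(UInt8(bits & 0xFF)) // low byte first
        data.append(UInt8(bits >> 8))   // then high byte
    }
    return data
}
```

If the output of this function and the output of the existing conversion differ for the same samples, the endian step is a likely suspect.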
Here is a summary of the issues I’m experiencing:
- When I inspect the audio_dump.pcm file in Audacity, I only see noise. I’ve imported the data with the following settings: Signed 16-bit PCM, Little Endian, 16000 Hz, Mono.
- The Deepgram API does not produce any transcript, and instead, I’m getting blank responses.
- The audio data seems to lack any recognizable speech characteristics, which makes me think the PDM to PCM conversion isn’t working properly.
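A quick numeric check can complement the Audacity inspection: output that is pinned at full scale most of the time points at a gain or clamping problem rather than at the microphone. The sketch below is only an illustration; `PCMStats` and `pcmStats` are hypothetical names, and the thresholds for what counts as "too much clipping" are left to judgment.

```swift
import Foundation

/// Summary of a block of 16-bit samples (illustrative type, not an existing API).
struct PCMStats {
    let peak: Int               // largest absolute sample value
    let rms: Double             // root-mean-square level
    let clippedFraction: Double // share of samples at or beyond +/-32767
}

func pcmStats(_ samples: [Int16]) -> PCMStats {
    guard !samples.isEmpty else {
        return PCMStats(peak: 0, rms: 0, clippedFraction: 0)
    }
    var peak = 0
    var sumSquares = 0.0
    var clipped = 0
    for sample in samples {
        let value = Int(sample)
        peak = max(peak, abs(value))
        sumSquares += Double(value) * Double(value)
        if value >= 32767 || value <= -32767 { clipped += 1 }
    }
    let count = Double(samples.count)
    return PCMStats(peak: peak,
                    rms: (sumSquares / count).squareRoot(),
                    clippedFraction: Double(clipped) / count)
}
```

A `clippedFraction` close to 1 would suggest the filter output is saturating before it is scaled to 16 bits.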
Relevant code snippet for the PDM to PCM conversion in Swift:
    import Accelerate
    import Foundation

    private func convertPDMToPCM(pdmData: Data) -> Data? {
        let firLength = 64
        guard pdmData.count >= firLength else {
            print("Error: not enough PDM data for the conversion.")
            return nil
        }

        // Hamming window used directly as the FIR coefficients.
        var firCoefficients = [Float](repeating: 0, count: firLength)
        vDSP_hamm_window(&firCoefficients, vDSP_Length(firLength), 0)

        // Unpack each byte MSB-first into eight samples of +1.0 / -1.0.
        let pdmValues = pdmData.flatMap { byte -> [Float] in
            return (0..<8).map { bitIndex in
                ((byte >> (7 - bitIndex)) & 0x01) == 1 ? 1.0 : -1.0
            }
        }
        guard pdmValues.count >= firLength else {
            print("Error: not enough PDM values for the FIR conversion.")
            return nil
        }

        // Filter the bit stream: one output value per input position (no decimation).
        let outputLength = pdmValues.count - firLength + 1
        var pcmValues = [Float](repeating: 0.0, count: outputLength)
        vDSP_conv(pdmValues, 1, firCoefficients, 1, &pcmValues, 1, vDSP_Length(outputLength), vDSP_Length(firLength))

        // Clamp to [-1, 1] and scale to 16-bit integers.
        let pcmSamples = pcmValues.map { value -> Int16 in
            return Int16(clamping: Int(min(max(value, -1.0), 1.0) * 32767.0))
        }

        // convertToLittleEndian(data:) is a separate helper that is not shown here.
        return convertToLittleEndian(data: pcmSamples.withUnsafeBufferPointer { Data(buffer: $0) })
    }
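Two things stand out in the snippet above: the Hamming taps are not normalized (their sum is roughly 0.54 × 64 ≈ 35, so with ±1 inputs most outputs hit the ±1 clamp), and there is no decimation, so the output still runs at the PDM bit rate rather than 16 kHz. Below is a rough sketch of a single-stage filter that does both. It is plain Swift rather than vDSP, `normalizedHammingTaps` and `pdmToPCM` are hypothetical names, and the decimation factor of 64 assumes a 1.024 MHz PDM clock, which is a guess that needs to be checked against the firmware. Real PDM converters usually use a multi-stage CIC plus FIR design, so this is a diagnostic aid, not a production filter.

```swift
import Foundation

/// Hamming-window low-pass taps scaled so they sum to 1 (unity DC gain).
func normalizedHammingTaps(count: Int) -> [Float] {
    precondition(count > 1)
    let raw = (0..<count).map { n -> Float in
        Float(0.54 - 0.46 * cos(2.0 * Double.pi * Double(n) / Double(count - 1)))
    }
    let sum = raw.reduce(0, +)
    return raw.map { $0 / sum }
}

/// Unpacks PDM bytes MSB-first to +/-1, low-pass filters them, and keeps
/// one output sample per `decimation` input bits.
func pdmToPCM(_ pdm: Data, taps: [Float], decimation: Int) -> [Int16] {
    precondition(decimation > 0 && !taps.isEmpty)
    var bits = [Float]()
    bits.reserveCapacity(pdm.count * 8)
    for byte in pdm {
        for bit in 0..<8 {
            bits.append((byte >> (7 - bit)) & 1 == 1 ? 1.0 : -1.0)
        }
    }
    var output = [Int16]()
    var start = 0
    while start + taps.count <= bits.count {
        var acc: Float = 0
        for k in 0..<taps.count { acc += bits[start + k] * taps[k] }
        let clamped = min(max(acc, -1.0), 1.0) // guard against rounding overshoot
        output.append(Int16(clamped * 32767.0))
        start += decimation                    // skip ahead instead of filtering every bit
    }
    return output
}
```

A possible call would be `pdmToPCM(chunk, taps: normalizedHammingTaps(count: 64), decimation: 64)`, with filter state carried across BLE packets in a fuller version so that chunk boundaries do not introduce clicks.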
I’m not sure if:
- My PDM to PCM conversion logic is flawed or needs a more sophisticated filter.
- There is something wrong with the data processing steps (e.g., endian conversion).
- The PDM data I’m receiving from the XIAO board is configured incorrectly on the firmware side.
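One way to test the third possibility: the nRF52840's PDM peripheral decimates in hardware and hands the firmware 16-bit PCM samples, so the bytes arriving over BLE may already be PCM (or a compressed codec stream) rather than raw PDM bits. The sketch below simply reinterprets a payload as little-endian 16-bit samples so it can be dumped and inspected without any filtering. `interpretAsPCM16LE` is a hypothetical helper, and `headerBytes` is an assumption for the case where each packet starts with a sequence or index header; whether such a header exists, and how long it is, has to be confirmed from the firmware source.

```swift
import Foundation

/// Reads bytes as little-endian signed 16-bit samples after skipping
/// `headerBytes` leading bytes. A trailing odd byte is dropped.
func interpretAsPCM16LE(_ payload: Data, headerBytes: Int = 0) -> [Int16] {
    let bytes = [UInt8](payload.dropFirst(headerBytes))
    var samples = [Int16]()
    samples.reserveCapacity(bytes.count / 2)
    var index = 0
    while index + 1 < bytes.count {
        let value = UInt16(bytes[index]) | (UInt16(bytes[index + 1]) << 8)
        samples.append(Int16(bitPattern: value))
        index += 2
    }
    return samples
}
```

If a dump produced this way sounds like speech in Audacity, the PDM to PCM step in the app is unnecessary.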
Has anyone faced similar issues when working with PDM microphones and trying to convert the data to PCM for speech recognition purposes? Any guidance on improving the quality of the PCM output or getting meaningful transcriptions would be greatly appreciated.
Thanks in advance for any help or pointers!