Practical use case for onboard VAD

I’m trying to understand how we’d use the onboard VAD functionality. As far as I currently understand, we’d have to poll the device to read the VAD voice-activity status, but how would we associate that value with the audio data we are receiving? For example, let’s say we chose to use Python’s pyaudio and have set up our callback to be invoked every 100 ms of audio. How long does this VAD voice-activity status persist? How do we associate it with what we’re eventually getting through pyaudio, for example?



I guess, my question put another way is this: “How was the onboard VAD functionality meant to be used?”



Thank you,



Marc

Hi dear customer, you can use the code below to read the VAD status.

-------------------------------------------------------------------------------------------

from tuning import Tuning
import usb.core
import usb.util
import time

dev = usb.core.find(idVendor=0x2886, idProduct=0x0018)
# print(dev)
if dev:
    Mic_tuning = Tuning(dev)
    print(Mic_tuning.is_voice())
    while True:
        try:
            print(Mic_tuning.is_voice())
            time.sleep(0.001)
        except KeyboardInterrupt:
            break

-------------------------------------------------------------------------------------------



For the VAD threshold, we can also use the GAMMAVAD_SR parameter to set it.

-------------------------------------------------------------------------------------------



GAMMAVAD_SR | float | 1000 | 0 | rw | Set the threshold for voice activity detection.
Range: [−inf … 60] dB (default: 3.5 dB = 20·log10(1.5))

-------------------------------------------------------------------------------------------
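
The default in the table is consistent with its own parenthetical: a linear gamma of 1.5 expressed in decibels gives 20·log10(1.5) ≈ 3.52 dB, which the table rounds to 3.5 dB. A quick check:

```python
import math

# Default VAD gamma from the parameter table, expressed in dB.
gamma = 1.5
threshold_db = 20 * math.log10(gamma)
print(round(threshold_db, 2))
```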



thanks.



Seeed techsupport Team

Bill

Is there no way to use the integrated VAD with a normal recording routine? For now, the array blinks for every keystroke on the keyboard; in other words, it recognizes every noise. It would make sense to have it filter input through the VAD so it only becomes active if voice has been recognized … ?



m.

Hi, sorry, we can’t set the filter. Thanks for understanding.



Seeed techsupport team

Bill

Hi,
I have a few follow up questions about this example

I see it is using is_voice. There is also an is_speech. Can anyone say what the difference is between the two?

For detecting the start of speech and the end of speech, which do you recommend? Based on the example, I assume is_voice.

Based on the example above, is it safe to say the value returned by is_voice is a comparison of the instantaneous amplitude of the signal measured at that time (by the mic) against the threshold — 1 if it is above, 0 if below?

Or are the measurements filtered in some way (perhaps by the filter mentioned here which “can’t be set”)?

If it can’t be set, does anyone know what it is set to by default?

Thanks,
spencer

My primary concern is understanding how to effectively associate the VAD status with the audio data received through the module. Specifically, I’m using Python’s pyaudio library and have set up a callback to process audio chunks every 100ms. How can I ensure that the VAD status aligns correctly with the corresponding audio chunks?
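
One common pattern (a sketch of my own, not an official Seeed answer): since the VAD flag is read by polling over USB, poll it once inside the pyaudio callback and tag each 100 ms chunk with the flag read when that chunk was delivered. The alignment is only approximate — the flag reflects "now," while the chunk covers the preceding 100 ms — but it is usually close enough for gating a recording. The helper below is structured so the tagging logic is separate from the hardware; `make_callback` and the queue wiring are illustrative assumptions, and `is_voice_fn` would be `Mic_tuning.is_voice` in practice:

```python
# Sketch: pair each incoming pyaudio chunk with the VAD flag polled at
# callback time. The flag-reading function is injected so this logic
# can be exercised without the microphone attached.
import queue

tagged_chunks = queue.Queue()

def make_callback(is_voice_fn):
    """Build a pyaudio-style stream callback that pushes
    (chunk_bytes, vad_flag) pairs onto tagged_chunks."""
    def callback(in_data, frame_count, time_info, status):
        tagged_chunks.put((in_data, is_voice_fn()))
        # 0 == pyaudio.paContinue; returned directly so this sketch
        # does not need pyaudio imported to be exercised.
        return (None, 0)
    return callback
```

In a real program you would pass `make_callback(Mic_tuning.is_voice)` as the `stream_callback` argument to `pyaudio.PyAudio().open(...)` with `frames_per_buffer` sized to 100 ms of samples, then consume `(chunk, flag)` pairs from the queue and keep or drop each chunk based on its flag.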