Practical use case for onboard VAD

Hi,
I have a few follow-up questions about this example.

I see it uses is_voice. There is also an is_speech. Can anyone explain the difference between the two?

For detecting the start and the end of speech, which do you recommend? Based on the example, I assume is_voice.
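To make clear what I mean by "start" and "end", here is a minimal sketch of the logic I have in mind. It is not the library's API: `speech_edges` is a name I made up, and `readings` is a hard-coded list standing in for the 0/1 values I would get by polling the flag in a loop.

```python
from typing import Iterable, List, Tuple


def speech_edges(flags: Iterable[int]) -> List[Tuple[str, int]]:
    """Return ("start", i) / ("end", i) events where the 0/1 flag changes.

    Hypothetical helper: `flags` stands in for values polled from the VAD flag.
    """
    events: List[Tuple[str, int]] = []
    previous = 0  # assumption: silence before the first reading
    for i, flag in enumerate(flags):
        current = 1 if flag else 0
        if current and not previous:
            events.append(("start", i))  # rising edge: speech begins
        elif previous and not current:
            events.append(("end", i))  # falling edge: speech ends
        previous = current
    return events


# Made-up readings, only to illustrate the edge detection.
readings = [0, 0, 1, 1, 1, 0, 0, 1, 0]
print(speech_edges(readings))
# [('start', 2), ('end', 5), ('start', 7), ('end', 8)]
```

Whether this simple edge detection is good enough depends on how noisy the flag is, which is what the questions below are about.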

Based on the example above, is it safe to say that the value returned by is_voice is a comparison of the instantaneous amplitude of the signal measured by the mic at that moment against the threshold, i.e. 1 if it is above the threshold and 0 if it is below?

Or are the measurements filtered in some way (perhaps this is the filter mentioned here, which “can’t be set”)?

If it can’t be set, does anyone know what it is set to by default?
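To illustrate the two possibilities I am asking about, here is a rough sketch. Everything in it is an assumption on my part: the function names, the `alpha` smoothing factor, the threshold, and the sample levels are all made up, and the exponential moving average is just one example of a filter, not a claim about what the device actually does.

```python
from typing import List, Sequence


def instantaneous_flags(levels: Sequence[float], threshold: float) -> List[int]:
    """Possibility 1: compare each raw level directly against the threshold."""
    return [1 if level > threshold else 0 for level in levels]


def smoothed_flags(levels: Sequence[float], threshold: float, alpha: float = 0.5) -> List[int]:
    """Possibility 2: smooth the levels first (exponential moving average),
    then compare the smoothed value against the threshold."""
    flags: List[int] = []
    average = 0.0  # assumption: filter starts from silence
    for level in levels:
        average = alpha * level + (1 - alpha) * average
        flags.append(1 if average > threshold else 0)
    return flags


# Made-up signal levels with one short spike followed by a longer burst.
levels = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(instantaneous_flags(levels, threshold=0.6))  # [0, 1, 0, 1, 1, 1, 0, 0]
print(smoothed_flags(levels, threshold=0.6))       # [0, 0, 0, 1, 1, 1, 0, 0]
```

In the unfiltered case the short spike already flips the flag to 1, while the filtered version ignores it and only reacts to the longer burst. Knowing which of these is closer to the real behaviour would tell me whether I need to add my own debouncing.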

Thanks,
spencer