Any suggestions on how to increase voice recognition accuracy in a noisy environment?

I would like to put the mic in a noisy environment, e.g. streets, malls.

What would you suggest to capture the user’s voice more accurately?

Hi there,



Here are all the parameters. Please check the description of each one in the right-hand (info) column. Thanks.



pi@raspberrypi:~/usb_4_mic_array $ python tuning.py -p

name                  type   max   min    r/w  info
-------------------------------
AECFREEZEONOFF        int    1     0      rw   Adaptive Echo Canceler updates inhibit.
                                               0 = Adaptation enabled
                                               1 = Freeze adaptation, filter only
AECNORM               float  16    0.25   rw   Limit on norm of AEC filter coefficients
AECPATHCHANGE         int    1     0      ro   AEC Path Change Detection.
                                               0 = false (no path change detected)
                                               1 = true (path change detected)
AECSILENCELEVEL       float  1     1e-09  rw   Threshold for signal detection in AEC [-inf … 0] dBov (Default: -80dBov = 10log10(1x10^-8))
AECSILENCEMODE        int    1     0      ro   AEC far-end silence detection status.
                                               0 = false (signal detected)
                                               1 = true (silence detected)
AGCDESIREDLEVEL       float  0.99  1e-08  rw   Target power level of the output signal.
                                               [−inf … 0] dBov (default: −23dBov = 10log10(0.005))
AGCGAIN               float  1000  1      rw   Current AGC gain factor.
                                               [0 … 60] dB (default: 0.0dB = 20log10(1.0))
AGCMAXGAIN            float  1000  1      rw   Maximum AGC gain factor.
                                               [0 … 60] dB (default: 30dB = 20log10(31.6))
AGCONOFF              int    1     0      rw   Automatic Gain Control.
                                               0 = OFF
                                               1 = ON
AGCTIME               float  1     0.1    rw   Ramps-up / down time-constant in seconds.
CNIONOFF              int    1     0      rw   Comfort Noise Insertion.
                                               0 = OFF
                                               1 = ON
DOAANGLE              int    359   0      ro   DOA angle. Current value. Orientation depends on build configuration.
ECHOONOFF             int    1     0      rw   Echo suppression.
                                               0 = OFF
                                               1 = ON
FREEZEONOFF           int    1     0      rw   Adaptive beamformer updates.
                                               0 = Adaptation enabled
                                               1 = Freeze adaptation, filter only
FSBPATHCHANGE         int    1     0      ro   FSB Path Change Detection.
                                               0 = false (no path change detected)
                                               1 = true (path change detected)
FSBUPDATED            int    1     0      ro   FSB Update Decision.
                                               0 = false (FSB was not updated)
                                               1 = true (FSB was updated)
GAMMAVAD_SR           float  1000  0      rw   Set the threshold for voice activity detection.
                                               [−inf … 60] dB (default: 3.5dB = 20log10(1.5))
GAMMA_E               float  3     0      rw   Over-subtraction factor of echo (direct and early components). min … max attenuation
GAMMA_ENL             float  5     0      rw   Over-subtraction factor of non-linear echo. min … max attenuation
GAMMA_ETAIL           float  3     0      rw   Over-subtraction factor of echo (tail components). min … max attenuation
GAMMA_NN              float  3     0      rw   Over-subtraction factor of non-stationary noise. min … max attenuation
GAMMA_NN_SR           float  3     0      rw   Over-subtraction factor of non-stationary noise for ASR.
                                               [0.0 … 3.0] (default: 1.1)
GAMMA_NS              float  3     0      rw   Over-subtraction factor of stationary noise. min … max attenuation
GAMMA_NS_SR           float  3     0      rw   Over-subtraction factor of stationary noise for ASR.
                                               [0.0 … 3.0] (default: 1.0)
HPFONOFF              int    3     0      rw   High-pass Filter on microphone signals.
                                               0 = OFF
                                               1 = ON - 70 Hz cut-off
                                               2 = ON - 125 Hz cut-off
                                               3 = ON - 180 Hz cut-off
MIN_NN                float  1     0      rw   Gain-floor for non-stationary noise suppression.
                                               [−inf … 0] dB (default: −10dB = 20log10(0.3))
MIN_NN_SR             float  1     0      rw   Gain-floor for non-stationary noise suppression for ASR.
                                               [−inf … 0] dB (default: −10dB = 20log10(0.3))
MIN_NS                float  1     0      rw   Gain-floor for stationary noise suppression.
                                               [−inf … 0] dB (default: −16dB = 20log10(0.15))
MIN_NS_SR             float  1     0      rw   Gain-floor for stationary noise suppression for ASR.
                                               [−inf … 0] dB (default: −16dB = 20log10(0.15))
NLAEC_MODE            int    2     0      rw   Non-Linear AEC training mode.
                                               0 = OFF
                                               1 = ON - phase 1
                                               2 = ON - phase 2
NLATTENONOFF          int    1     0      rw   Non-Linear echo attenuation.
                                               0 = OFF
                                               1 = ON
NONSTATNOISEONOFF     int    1     0      rw   Non-stationary noise suppression.
                                               0 = OFF
                                               1 = ON
NONSTATNOISEONOFF_SR  int    1     0      rw   Non-stationary noise suppression for ASR.
                                               0 = OFF
                                               1 = ON
RT60                  float  0.9   0.25   ro   Current RT60 estimate in seconds
RT60ONOFF             int    1     0      rw   RT60 Estimation for AES.
                                               0 = OFF
                                               1 = ON
SPEECHDETECTED        int    1     0      ro   Speech detection status.
                                               0 = false (no speech detected)
                                               1 = true (speech detected)
STATNOISEONOFF        int    1     0      rw   Stationary noise suppression.
                                               0 = OFF
                                               1 = ON
STATNOISEONOFF_SR     int    1     0      rw   Stationary noise suppression for ASR.
                                               0 = OFF
                                               1 = ON
TRANSIENTONOFF        int    1     0      rw   Transient echo suppression.
                                               0 = OFF
                                               1 = ON
VOICEACTIVITY         int    1     0      ro   VAD voice activity status.
                                               0 = false (no voice activity)
                                               1 = true (voice activity)
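
The dB figures in the info column follow from the linear values that tuning.py takes. Here is a minimal sketch of the conversion, assuming (as the info column indicates) that gain-type parameters such as MIN_NS or AGCMAXGAIN use 20log10 and power-type parameters such as AGCDESIREDLEVEL or AECSILENCELEVEL use 10log10. The helper names are my own and are not part of tuning.py.

```python
import math


def gain_to_db(gain):
    """Linear amplitude gain factor -> dB (20*log10), e.g. MIN_NS, AGCMAXGAIN."""
    return 20.0 * math.log10(gain)


def db_to_gain(db):
    """dB -> linear amplitude gain factor (inverse of gain_to_db)."""
    return 10.0 ** (db / 20.0)


def power_to_db(power):
    """Linear power level -> dB (10*log10), e.g. AGCDESIREDLEVEL."""
    return 10.0 * math.log10(power)


def db_to_power(db):
    """dB -> linear power level (inverse of power_to_db)."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    # Defaults quoted in the listing, converted back to dB
    print("MIN_NS 0.15          ->", round(gain_to_db(0.15), 1), "dB")
    print("AGCMAXGAIN 31.6      ->", round(gain_to_db(31.6), 1), "dB")
    print("AGCDESIREDLEVEL 0.005 ->", round(power_to_db(0.005), 1), "dBov")
```

For example, gain_to_db(0.15) comes out at roughly -16.5 dB, which matches the -16 dB quoted for MIN_NS, and db_to_gain(-20.0) gives 0.1 if you want a deeper gain floor expressed as a value for tuning.py.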



Seeed techsupport team

Bill

Hi,

Have you found a solution?

If so, could you share some references?

Thank you~

Hi there,



Please try the settings below and see whether there is any improvement. Thanks.



debian@beaglebone:~/usb_4_mic_array$ sudo python tuning.py STATNOISEONOFF

STATNOISEONOFF: 1

debian@beaglebone:~/usb_4_mic_array$ sudo python tuning.py MIN_NN 1.0

MIN_NN: 1.0

debian@beaglebone:~/usb_4_mic_array$ sudo python tuning.py GAMMA_NS 3.0

GAMMA_NS: 3.0
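
If you need to apply these values repeatedly (for example from a startup script), the same commands can be scripted. This is only a sketch, assuming tuning.py sits in the current directory and accepts NAME VALUE exactly as in the commands above. The names build_command, apply_settings and dry_run are hypothetical, and you may need to run the script with sudo, as in the commands above.

```python
import subprocess

# Values taken from the commands above. STATNOISEONOFF is only read there
# (it already reports 1); writing it explicitly is an assumption of this sketch.
SETTINGS = [
    ("STATNOISEONOFF", 1),
    ("MIN_NN", 1.0),
    ("GAMMA_NS", 3.0),
]


def build_command(name, value, script="tuning.py"):
    """Return the argv list for one 'python tuning.py NAME VALUE' call."""
    return ["python", script, name, str(value)]


def apply_settings(settings, dry_run=True):
    """Build one tuning.py call per setting; run them unless dry_run is set."""
    commands = [build_command(name, value) for name, value in settings]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if tuning.py exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed
    for cmd in apply_settings(SETTINGS, dry_run=True):
        print(" ".join(cmd))
```

With dry_run=True nothing touches the device, so you can check the command lines first and then switch to dry_run=False.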



I used a 2 kHz sine tone as background noise (first graph).

https://github.com/SeeedDocument/forum_doc/raw/master/reg/2k_sine_tone.wav



I recorded it with the 1-channel firmware and STATNOISEONOFF = 0 (second graph).

https://github.com/SeeedDocument/forum_doc/raw/master/reg/off_2ksine.wav



I recorded it with the 1-channel firmware and the settings above (third graph). It is much better.

https://github.com/SeeedDocument/forum_doc/raw/master/reg/on_2ksine.wav
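
To put a number on "much better", one option is to measure how strong the 2 kHz tone is in each recording. Below is a minimal sketch using only the Python standard library. It assumes the recordings are 16-bit PCM WAV files, which I have not verified for the linked files. The helper names read_mono_int16 and tone_level_db are hypothetical, and the Goertzel algorithm is simply my choice for a single-frequency measurement; it is not something tuning.py or the firmware provides.

```python
import math
import struct
import sys
import wave


def read_mono_int16(path):
    """Read a 16-bit PCM WAV; return (first-channel samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved; keep only the first channel
    return [v / 32768.0 for v in values[::channels]], rate


def tone_level_db(samples, rate, freq=2000.0):
    """Estimate the amplitude of a sine at `freq` in dB re full scale (Goertzel)."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # Squared magnitude of the spectrum at `freq`
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    amplitude = 2.0 * math.sqrt(max(power, 0.0)) / n
    # Clamp to avoid log10(0) on silent input
    return 20.0 * math.log10(max(amplitude, 1e-12))


if __name__ == "__main__":
    # Usage: python tone_level.py off_2ksine.wav on_2ksine.wav
    for path in sys.argv[1:]:
        samples, rate = read_mono_int16(path)
        print("%s: %.1f dB" % (path, tone_level_db(samples, rate)))
```

The difference between the two printed values would be the attenuation of the tone between the STATNOISEONOFF = 0 recording and the one with the settings above. I have not run this on the linked files, so treat it as a starting point only.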