AEC when the audio source is very close

Hi, we are building a smart speaker using the ReSpeaker v2 4-mic array for audio input and Snowboy for hotword recognition. We also have a speaker with a sound deflector, all inside a single enclosure.

Normal operation works OK (i.e., recognising the hotword and getting the audio response). However, if we say the hotword while the response to the previous question is still playing (to stop the playback and ask a new question), it doesn’t pick up the word.

We flashed the 6-channel firmware via DFU and found that channel 5 (the audio output from the ReSpeaker) is very low compared with the audio being picked up by the four mics. Channel 0 (processed audio for ASR) is so distorted during playback that it is impossible to discern the hotword.
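For anyone who wants to reproduce the level comparison, here is a minimal sketch that measures the RMS level of each channel in a recording. It is only an illustration under some assumptions: the capture is saved as a 16-bit PCM WAV file, the file name `capture_6ch.wav` is hypothetical, and the helper names are our own, not part of any ReSpeaker tooling.

```python
import array
import math
import sys
import wave


def channel_rms_dbfs(frames, channels):
    """Return the RMS level (dBFS) of each channel in interleaved 16-bit PCM."""
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    levels = []
    for ch in range(channels):
        chan = samples[ch::channels]  # de-interleave: every Nth sample
        if len(chan) == 0:
            levels.append(float("-inf"))
            continue
        rms = math.sqrt(sum(s * s for s in chan) / len(chan))
        # 32768 is full scale for signed 16-bit audio
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels


def levels_from_wav(path):
    """Read a 16-bit WAV file and return one dBFS value per channel."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
        return channel_rms_dbfs(frames, wav.getnchannels())


# Example (hypothetical file name):
# for ch, level in enumerate(levels_from_wav("capture_6ch.wav")):
#     print(f"channel {ch}: {level:.1f} dBFS")
```

Comparing the value for channel 5 against the mic channels while playback is running gives a rough number for how far apart the reference and the picked-up signal are.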

It’s a tricky situation: if we disable AGC (AGCONOFF) and set a lower gain to avoid distortion, the capture no longer picks up the hotword. We also looked at some of the other tuning parameters, but honestly we don’t really know what they do; the documentation here seems sparse.

Is there any advice or configuration you would suggest? In general, would you say that AEC should work when the audio source is loud and really close to the mics?
