Need help programming ReSpeaker Lite with ESP-IDF to build a voice assistant - has anybody done it?

wukong · September 28, 2025, 7:01pm

Hello,

I’m new to embedded hardware and embedded programming. I did a bunch of C back in the day, but I do AI stuff for a living and I’m really fed up with Google dot or Alexa. I want a real personal voice assistant in the house, just like I have on my phone, but I’m not interested in home automation at this time.

After a lot of research I landed on the ReSpeaker Lite + XIAO ESP32-S3 as the optimal platform. Further research pointed out that programming with ESP-IDF, as opposed to the Arduino-compatible module, was the way to go, since you can make the most of the available hardware, so that’s what I went with.

However, I’ve not been able to get even a basic example fully working. I managed to build a simple firmware that records 10s and then plays it back, although I have lots of issues with volume. I’ve also managed to use esp-skainet to get wake word detection going, but I cannot figure out how to combine the two.

The ReSpeaker should be doing all the audio magic in hardware, which is why I picked it. So I don’t think the code I’ve seen from Espressif for the Korvo fits, since it does AEC, beamforming, etc. in software, which steals CPU from the other AI tasks I wanna run.

Any help would be greatly appreciated, hoping to open source all of it once there’s something working, but right now it’s just sweat and tears with no results.

Is this actually doable in my situation?
thanks

Toastee0 · September 30, 2025, 1:23pm

You can cheat: open both samples in VS Code and have Claude show you how to merge them via Copilot. There are also a few people working on the ReSpeaker on the Discord. Join us over there.
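
For a rough picture of what merging the two samples could look like: one audio loop reads a frame, and a small state machine decides whether that frame goes to the wake word engine, into the record buffer, or whether it is time to play back. The sketch below is plain C with no ESP-IDF or esp-skainet calls in it; `va_t`, `va_init` and `va_step` are hypothetical names, and the "record N frames, then play N frames" behaviour is just an assumption based on the 10s record/playback sample described above.

```c
#include <stdint.h>

/* Hypothetical states for a combined wake word + record/playback loop. */
typedef enum { VA_LISTENING, VA_RECORDING, VA_PLAYING } va_state_t;

typedef struct {
    va_state_t state;
    uint32_t frames_left;   /* frames remaining in the current record/playback phase */
    uint32_t record_frames; /* how many frames to capture after a wake word */
} va_t;

void va_init(va_t *va, uint32_t record_frames)
{
    va->state = VA_LISTENING;
    va->frames_left = 0;
    va->record_frames = record_frames;
}

/* Advance by one audio frame. wake_detected is whatever the wake word engine
 * reported for this frame; it is only consulted while listening.
 * Returns the state that should handle the next frame. */
va_state_t va_step(va_t *va, int wake_detected)
{
    switch (va->state) {
    case VA_LISTENING:
        if (wake_detected) {
            va->state = VA_RECORDING;
            va->frames_left = va->record_frames;
        }
        break;
    case VA_RECORDING:
        if (va->frames_left > 0) va->frames_left--;
        if (va->frames_left == 0) {
            /* Recording finished: play back the same number of frames. */
            va->state = VA_PLAYING;
            va->frames_left = va->record_frames;
        }
        break;
    case VA_PLAYING:
        if (va->frames_left > 0) va->frames_left--;
        if (va->frames_left == 0) {
            va->state = VA_LISTENING;
        }
        break;
    }
    return va->state;
}
```

In a real firmware the loop around `va_step` would do the I2S read, hand the frame to the wake word engine only in `VA_LISTENING`, and copy it into a buffer in `VA_RECORDING`. Those parts are left out here because they depend on the actual samples being merged.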

wukong · October 1, 2025, 3:39am

Thanks @Toastee0, that’s what I did to get the recording sample going, but I don’t know where the volume issue is coming from, and I didn’t see a sample for the wake word. And yes, I’m on Discord; I came over to the forum because it’s been weeks with no answers.
thanks again for taking the time to respond.

Toastee0 · October 1, 2025, 3:47am

You’re working on difficult stuff. I was working on similar things, but I’ve gone down the nRF54L15 rabbit hole and haven’t been focusing on the audio stuff for a few weeks.
Though I was trying to use an RP2350 to do it.

The volume issue can come from a bunch of different things, from the amp circuit to how you encode the audio. I don’t know enough to help you effectively, but I do have the Lite and an ESP or 10, so I can help you test stuff.
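
One way to narrow a volume problem down is to look at the numbers before blaming the amp: measure the peak of the recorded samples, and if the capture is simply quiet, scale it in software with saturation. A minimal sketch in plain C, assuming 16-bit signed PCM (the actual sample format on your setup may differ, so check that first). `pcm16_peak` and `pcm16_apply_gain_q8` are hypothetical helper names, not ESP-IDF functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest absolute sample value in a 16-bit PCM buffer.
 * A peak far below 32767 suggests the capture itself is quiet. */
int32_t pcm16_peak(const int16_t *samples, size_t count)
{
    int32_t peak = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];   /* widen first so -32768 can be negated safely */
        if (v < 0) v = -v;
        if (v > peak) peak = v;
    }
    return peak;
}

/* Apply a gain in Q8 fixed point (256 = 1.0x, 512 = 2.0x) in place.
 * Results are clamped to the int16 range instead of wrapping around.
 * gain_q8 is assumed to be in 0..65535 so the product fits in 32 bits. */
void pcm16_apply_gain_q8(int16_t *samples, size_t count, int32_t gain_q8)
{
    for (size_t i = 0; i < count; i++) {
        int32_t v = ((int32_t)samples[i] * gain_q8) / 256;
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        samples[i] = (int16_t)v;
    }
}
```

If the peak is already close to full scale and playback is still quiet, the problem is more likely on the output side (amp or codec settings) than in the recorded data.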

Open source works if we all try to lend a hand… even if we’re not the experts…

PJ_Glasso · October 1, 2025, 2:47pm

Hi there

So, I have been lurking on this one, and I have to agree with @Toastee0: it’s a brain-taxing task to put all that together. I also agree the hardware should do most of the magic, and I believe it does. However, I know it doesn’t come set up that way.
Both of your objectives are reasonable and doable, but they require some fine threading to join the two together without breaking the hardware, or at minimum without working against it…
There are many threads on the hardware, as I’m sure you’re aware. In those are a few nuggets.
Start by understanding that there are two firmware pieces at play: one for the hardware and one on the XIAO.
For the hardware, one firmware makes it a speaker D/A or A/D (I2S), I forget which. The second makes it an Alexa-compatible node/speaker/IoT device.
Both of those can be modified, with limits, AFAIK.
The XIAO is the second piece. It requires firmware (I2S mode) or IDK (DEV) mode, which I think is where your project fits in.
There are also a number of videos on the hardware subject. Look for those; some links in them may prove useful.
Obviously start with the wiki. I see they have also added another board to that family. Some hints in its info may tell you what they found the older hardware couldn’t do, maybe?

It’s a great area of technology, keep pushing and looking. ESP-IDF is where you want to develop this kind of thing though; IMO none of the other dev platforms can support this niche area.
Espressif has more for it currently, i.e. wake word, speech libraries, etc.
I got as far as the Alexa speaker, as that’s what I was looking for. I could build it (add it) into a device (smart parts cabinet/locker):
“Mevlin, where are the 2N2222 transistors located?”
RE: “SMD or through-hole leads?”
“SMD”
… RE: “Shelf 3, Drawer 4, in the back, and you’re almost out of solder paste”
“and btw you are lazy” … (made that up)
Why not? Local LLM… good voice recognition and tight audio. IoT is a beautiful thing.

A lot of users have gotten tripped up by the wrong firmware, and even by flashing the wrong piece (some bricks), so READ everything and ask questions. More folks will chime in.

HTH
GL PJ

I have seen the wake word ML model demo on the XIAO, so it can be done.

wukong · October 6, 2025, 7:04pm

Hey guys, really appreciate you all chiming in. I guess my biggest issue is that if I want this project to succeed, and I do, I need to take a step back and figure out embedded development.

I’m used to being able to dump system state, log stuff, and tcpdump network packets, so that if I really want to I can see the inner workings of the system at any time. I’ve no idea how to do this with embedded. I tried to add logging, but the result was that logging was consuming too much resources and creating problems with the audio processing, so I was adding issues instead of solving them.
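
A common way around this kind of problem is to keep string formatting and I/O out of the audio path entirely: the time-critical code only stores a few numbers in a ring buffer, and a low-priority task prints them later. Below is a minimal sketch of that idea in plain C. All the names (`evlog_t`, `evlog_push`, `evlog_pop`) are hypothetical, and it assumes a single producer and a single consumer with no concurrency protection; on a dual-core chip the `head` and `tail` counters would need atomics or an RTOS primitive, which is left out here.

```c
#include <stdint.h>

#define EVLOG_CAPACITY 64u  /* must be a power of two for the index mask below */

/* Small fixed-size record: no strings are formatted in the hot path. */
typedef struct {
    uint32_t timestamp;  /* caller-supplied tick or microsecond count */
    uint16_t id;         /* application-defined event code */
    int32_t  value;      /* one numeric payload, e.g. a peak level */
} evlog_entry_t;

typedef struct {
    evlog_entry_t entries[EVLOG_CAPACITY];
    uint32_t head;     /* total entries written */
    uint32_t tail;     /* total entries read */
    uint32_t dropped;  /* entries discarded because the buffer was full */
} evlog_t;

/* Called from the time-critical path: O(1), no I/O.
 * Returns 1 if the entry was stored, 0 if it was dropped. */
int evlog_push(evlog_t *q, uint32_t timestamp, uint16_t id, int32_t value)
{
    if (q->head - q->tail >= EVLOG_CAPACITY) {
        q->dropped++;
        return 0;
    }
    evlog_entry_t *e = &q->entries[q->head & (EVLOG_CAPACITY - 1u)];
    e->timestamp = timestamp;
    e->id = id;
    e->value = value;
    q->head++;
    return 1;
}

/* Called from a low-priority context: removes the oldest entry.
 * Returns 1 if an entry was copied into *out, 0 if the buffer was empty. */
int evlog_pop(evlog_t *q, evlog_entry_t *out)
{
    if (q->head == q->tail) return 0;
    *out = q->entries[q->tail & (EVLOG_CAPACITY - 1u)];
    q->tail++;
    return 1;
}
```

The drain side would call `evlog_pop` in a loop and print each entry with whatever logging you already use. The `dropped` counter tells you when the buffer is too small or is being drained too slowly, without the audio code ever blocking.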

Any tips or guides on setting up a good development pipeline?

thanks