Capture Video on Vision AI V2 and Run Object Detection Simultaneously

marcusob · January 7, 2026, 3:04am

I have a camera connected to the AI Vision V2, with a gesture model loaded. I also have a XIAO ESP32-S3 connected to the AI Vision V2, and a microSD card plugged into the AI Vision V2.

Is it possible to capture and record the video stream from the camera connected to the AI Vision V2? I want to record the stream to the SD card on the AI Vision V2 itself.

I want to use gestures to turn recording on and off. I can already detect the rock, paper, scissors gestures using the gesture model on the AI Vision V2 and read the output on the ESP32-S3. So can I:

1. Record audio and video straight to the SD card on the AI Vision V2?

2. Also run the gesture model on the AI Vision V2 to detect gestures and send them to the ESP32-S3 (like the demo)?

3. Toggle recording to the SD card on and off whenever the gesture is a paper hand sign?

olivia_49 · January 7, 2026, 7:31pm

At the moment, AI Vision V2 is really meant for running models and sending results, not for recording full audio/video streams to the SD card. You can run the gesture model and send gestures to the ESP32-S3, but using gestures to control recording would be better handled on the ESP32 side rather than directly on the AI Vision V2.
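As a rough illustration of handling the toggle on the ESP32 side, here is a minimal sketch in plain C++. It assumes the gesture arrives as a label string such as "paper" once per inference result. The `GestureToggle` class, the label text, and the three-result confirmation count are hypothetical choices for this example, not part of any Seeed library.

```cpp
#include <string>

// Hypothetical helper: flips a recording flag once each time a "paper"
// gesture is seen for several consecutive inference results.
// The label string and confirmation count are assumptions for illustration.
class GestureToggle {
public:
    explicit GestureToggle(int confirmResults = 3)
        : confirmResults_(confirmResults) {}

    // Feed one inference label per call; returns the current recording state.
    bool update(const std::string& label) {
        if (label == "paper") {
            if (streak_ < confirmResults_) {
                ++streak_;
                // Toggle exactly once, at the moment the streak is confirmed.
                if (streak_ == confirmResults_) {
                    recording_ = !recording_;
                }
            }
            // A gesture that stays held keeps the streak capped: no re-toggle.
        } else {
            streak_ = 0;  // any other label re-arms the toggle
        }
        return recording_;
    }

    bool recording() const { return recording_; }

private:
    int confirmResults_;
    int streak_ = 0;
    bool recording_ = false;
};
```

Requiring a few consecutive "paper" results avoids toggling on a single misclassified frame, and capping the streak means holding the hand up does not flip recording on and off repeatedly.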

marcusob · January 7, 2026, 9:28pm

Thanks for the response. I will control the camera from the ESP32 based on which gesture comes from the AI Vision V2. But the AI Vision V2 has an SD card slot and a camera connected directly to it, so is it possible to record all the frames from the attached camera to the SD card, entirely on the AI Vision V2?

If not, can I use a XIAO ESP32-S3 Sense with a camera attached, record that stream to the ESP32-S3 Sense's SD card, and pass one frame out of every 30 to the AI Vision V2 for inference (gesture identification)? In other words, is it possible to feed frames into the AI Vision V2 over I2C, UART, USB, etc., rather than only from a camera attached over CSI?

olivia_49 · January 10, 2026, 7:34pm

Yeah, the AI Vision V2 really isn’t meant to be a video recorder, even though it has a camera and an SD slot. It’s built to run models and send back results, not to save a full camera stream. The clean way to do this is to let the ESP32-S3 Sense handle the camera and SD card recording, then send frames to the AI Vision V2 for gesture detection and use those results to decide when to start or stop recording.
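The split described above (record every frame, forward only one frame in 30 for inference) could be sketched like this in plain C++. `FrameRouter` and `FrameAction` are hypothetical names for this example. How a frame would actually be transferred to the AI Vision V2 is deliberately left out, since the thread does not confirm a transport for externally supplied frames.

```cpp
#include <cstdint>

// Hypothetical per-frame decision: what to do with a newly captured frame.
struct FrameAction {
    bool writeToSd;         // append this frame to the recording
    bool sendForInference;  // forward this frame for gesture detection
};

// Hypothetical router running on the ESP32-S3 Sense side.
class FrameRouter {
public:
    // inferenceInterval = 30 means one frame in every 30 is forwarded.
    explicit FrameRouter(std::uint32_t inferenceInterval = 30)
        : interval_(inferenceInterval == 0 ? 1 : inferenceInterval) {}

    // Called with the result of the gesture logic (start or stop recording).
    void setRecording(bool on) { recording_ = on; }

    // Call once per captured frame.
    FrameAction onFrame() {
        FrameAction action{recording_, count_ == 0};
        count_ = (count_ + 1) % interval_;  // wrap so the counter never overflows
        return action;
    }

private:
    std::uint32_t interval_;
    std::uint32_t count_ = 0;
    bool recording_ = false;
};
```

In a capture loop, each frame would be passed through `onFrame()`, written to the SD card when `writeToSd` is set, and handed to whatever transfer routine is available when `sendForInference` is set. The gesture result that comes back would then drive `setRecording()`.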