OCR for Grove AI V2

mk123233 · August 9, 2024, 11:40am

INTRO
Hi everyone,
I was wondering if you guys could provide me with some guidance regarding a hobby project that I’m working on.
I am still quite new to the embedded world, but I know the basics and have worked on several projects. I have a fair amount of ML knowledge but no real expertise. I’m also new to the community, so here’s my first post.
QUESTION
Can Optical Character Recognition (OCR) be done on the Grove AI V2? I successfully trained a custom model using the Colab templates provided by SSCMA, which classifies alphanumeric characters, but when I tried it in practice it gave nonsensical results. My hypothesis is that the 192x192 image the NN receives is too small to resolve words, let alone a whole document. From my understanding, the Grove cannot process the full image in a single pass since it has only 2 MB of SRAM.
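To put rough numbers on that hypothesis, here is a back-of-the-envelope sketch. Only the 192x192 input size comes from the post; the characters per line, lines in view, and the 640x480 capture size are assumed figures picked for illustration, so swap in your own.

```python
# Rough pixel-budget estimate. Everything except INPUT_SIDE is an assumption.
INPUT_SIDE = 192  # model input side in pixels (from the post)


def pixels_per_char(input_side, chars_per_line, lines_in_view):
    """Return (width, height) in pixels available to each character cell."""
    return input_side / chars_per_line, input_side / lines_in_view


def frame_bytes(width, height, channels=3):
    """Bytes needed for one uncompressed frame with 8 bits per channel."""
    return width * height * channels


# Assumed: the whole frame shows about 60 characters per line and 40 lines.
w, h = pixels_per_char(INPUT_SIDE, 60, 40)
print(f"full page: {w:.1f} x {h:.1f} px per character")

# Assumed: a cropped single word of 8 characters fills the frame width.
w_word, _ = pixels_per_char(INPUT_SIDE, 8, 1)
print(f"cropped word: {w_word:.1f} px wide per character")

# Buffer sizes: the model input vs. an assumed 640x480 RGB capture.
print(f"192x192 RGB: {frame_bytes(192, 192)} bytes")
print(f"640x480 RGB: {frame_bytes(640, 480)} bytes")
```

With these assumed figures a character on a full page gets only a few pixels in each direction, while a character in a cropped word gets a couple of dozen pixels of width, which is why cropping before inference looks like the more promising direction.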
TASK SPECIFICS
For specifics, I am trying to read the word located just above the user’s finger, wherever they point on the document. I have already written an algorithm that crops out the word using OpenCV with colour thresholding and some histogram analysis. I thought of building a pipeline that takes the cropped image and feeds it to the NN, but again I’m not sure it’s possible; hell, maybe the hardware just doesn’t fit this type of task. If anyone has suggestions or ideas for how this could be done, I would be very appreciative. Thanks a lot!
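One way the crop-then-classify pipeline could look is sketched below, assuming the cropped word has already been binarised (1 = ink, 0 = background). It splits the word into characters with a column projection histogram and resizes each piece to a fixed size for a per-character classifier. This is plain Python for illustration rather than the OpenCV code mentioned above; the function names and the 8x8 target size are hypothetical, and touching or cursive letters would not separate this way.

```python
def column_profile(binary):
    """Count ink pixels (value 1) in each column of a 2D list."""
    return [sum(col) for col in zip(*binary)]


def split_runs(profile, min_width=1):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    runs, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # a character begins
        elif count == 0 and start is not None:
            if x - start >= min_width:
                runs.append((start, x))  # a character ends at a blank column
            start = None
    if start is not None and len(profile) - start >= min_width:
        runs.append((start, len(profile)))  # character touching the right edge
    return runs


def crop_columns(binary, start, end):
    """Keep only columns start..end-1 of every row."""
    return [row[start:end] for row in binary]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Tiny synthetic "word" with three glyphs separated by blank columns.
word = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
]

runs = split_runs(column_profile(word))
# 8x8 is a placeholder; use whatever input size the classifier expects.
glyphs = [resize_nearest(crop_columns(word, s, e), 8, 8) for s, e in runs]
print(runs)
print(len(glyphs), "glyphs ready for the classifier")
```

Each entry in `glyphs` would then go to the alphanumeric classifier one at a time, so the model only ever sees a single character at its full input resolution.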

PJ_Glasso · August 9, 2024, 3:09pm

Hi there,
Sounds like a fun project. I would recommend you scale it back and tackle it in two stages.
First, use just one letter and train your model with several samples or fonts. Then add the "gesture" part, looking for the finger-pointing gesture. Scale it back, get the first thing working, then proceed.
Add the XIAO ESP32S3; there is more RAM and horsepower in that. After some success, copy the models to the flash and try running inference on that!

That would be my first recommendation.
HTH
GL PJ