Wio Terminal App (rpcWiFi lib) sending telemetry data to Azure Storage Tables stops working after many uploads

I’m working on an App which sends telemetry data to Azure Storage Tables every minute via secure https post requests (rpcWiFi HTTPClient) and updates the system time every 5 minutes via NTP.
https://github.com/RoSchmi/AzureDataSender_Wio_Terminal
Currently, with RTL8720 firmware v2.1.0 and the current libraries as of 31.12.2020, everything works fine at first, but after many successful uploads (more than 100, once more than 500) the transmission stops: all subsequent post requests return error code -1.
The program loop keeps running, and new upload attempts are started at the expected intervals.
Can anybody give some advice on what the probable cause might be and how to debug and solve the issue?
(Memory leak? Race condition?)
Has anybody experienced similar problems?

I investigated further:

  1. I synchronized the system time with the DateTime value returned in the Date header of the responses to my post requests.
    -> Since the issue persisted, the NTP requests used before cannot be the (only) cause.

  2. I added a NVIC_SystemReset(); call after a failed request:
    -> The system rebooted as expected after a failed request and then operated normally for many uploads (up to about 300).

  3. I added a watchdog to my application (so far I have seen no watchdog resets).

  4. I activated the logging functions in rpcUnified (I observed no irregularities when using the https protocol).

  5. I tried http instead of https.
    Surprisingly I saw many failed requests (often after only a few successful operations).
    -> In the logging output I could see that DNS requests often failed (error code -5).
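
The reboot-after-failure approach from steps 2 and 3 could be refined by counting consecutive failed requests before resetting. The following is only a minimal sketch, not code from the AzureDataSender project: UploadFailureTracker, RecoveryAction and the threshold are hypothetical names, and it assumes that return codes greater than zero are HTTP status codes while values such as the -1 seen here are client-side errors.

```cpp
#include <stdint.h>

// Hypothetical helper, not part of rpcWiFi: tracks upload results and
// suggests what to do after each attempt.
enum class RecoveryAction { None, Retry, Reset };

class UploadFailureTracker {
public:
    explicit UploadFailureTracker(uint32_t maxConsecutiveFailures)
        : maxFailures_(maxConsecutiveFailures) {}

    // httpCode: value returned by the post call. Assumption: values > 0 are
    // HTTP status codes (the transport worked), values <= 0 such as -1 are
    // client-side errors.
    RecoveryAction onResult(int httpCode) {
        if (httpCode > 0) {
            consecutiveFailures_ = 0;   // any server response clears the streak
            return RecoveryAction::None;
        }
        ++consecutiveFailures_;
        return (consecutiveFailures_ >= maxFailures_) ? RecoveryAction::Reset
                                                      : RecoveryAction::Retry;
    }

    uint32_t consecutiveFailures() const { return consecutiveFailures_; }

private:
    uint32_t maxFailures_;
    uint32_t consecutiveFailures_ = 0;
};
```

In the loop, RecoveryAction::Reset would be the place to call NVIC_SystemReset() (or to stop feeding the watchdog), while RecoveryAction::Retry simply waits for the next upload interval.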

Any ideas how to find the cause of the sporadic failed https post requests, and why the App (without rebooting) doesn’t return to normal operation after the first failed request?

Best regards
RoSchmi

This time, when I ran it overnight, I had 450 successful uploads; then there was a watchdog reset and the App continued to work after the reboot.
The logging output seems to show that the start_ssl_client function didn’t return.

Hi @RoSchmi

This is a common issue at the moment, where execution stops in the start_ssl_client function.

If you try this demo and output the debug log, the problem is the same as yours.

We are testing whether this problem is with the RTL SDK or with our communication protocol itself. If it is a problem with the RTL SDK, we will get in touch with Realtek to solve this issue.

Best Regards,
Lakshantha

Hi @lakshan,
I’m happy to see that you are searching for the cause and I’m sure you’ll find it soon.
My investigations lead me to think that the reason might be similar to the one posted here:

At least if I add a ‘_connected = false’ in the WiFiClientSecure::stop() routine, my application continues to work after a failed upload.
(However it may be that the true cause is somewhere else)
Best regards
RoSchmi

Good news!! With the new RTL8720 firmware (seeed-ambd-firmware) v2.1.1, a new test run has reached 650 successful uploads (https) so far and is still running. There was one failed upload (which can be expected to happen), but the program continued with the subsequent uploads as expected. The watchdog was not triggered.

that’s great! This new firmware is meant to fix the SSL issues.

Hope you can keep running your demo

Unfortunately I was happy too early. Uploads now seem to stop much less often, but very rarely my App still stops uploading (all subsequent post requests return error code -1) while the loop is still running.
Does it work endlessly on your side with firmware v2.2.1?

Hi @RoSchmi,
Could you try changing the delay value here Seeed_Arduino_rpcUnified > src > erpc > erpc_arduino_uart_transport.cpp (line 299) on your end and test it? The delay seems to be the issue. When the delay is too high, it seems to be slow, and when the delay is too low, it seems to be unstable.

Best Regards,
Lakshantha

Thanks, I’ll try; currently the delay is 5 ms.
BTW: Is there a way to reset the MCU -> RTL8720 connection without rebooting the complete board? Then one could retry the ssl transmission without losing data in RAM.

Hi @RoSchmi,
Please add the following code where you need to restart the MCU -> RTL8720 connection.

 pinMode(RTL8720D_CHIP_PU, OUTPUT); 
 digitalWrite(RTL8720D_CHIP_PU, LOW); 
 delay(500); 
 digitalWrite(RTL8720D_CHIP_PU, HIGH);  
 delay(500); 

Best Regards,
Lakshantha


Thanks @lakshan. Unfortunately this code alone didn’t recover the connection. Probably the serial port of the SAMD51 MCU would also need to be reinitialized.
I think it would be too difficult to find out how this could be done.
Best Regards
RoSchmi

Hi @RoSchmi,
You need to initialize the Wi-Fi libraries again after the code that I mentioned above.

Best Regards,
Lakshantha

Thanks @lakshan, I’m not sure which code you mean with:

However, before trying this I would like to rule out that there is a memory leak causing the issue.
Do you have some routines which you use on the SAMD51 to test for memory leaks?

@RoSchmi,
I mean this code.

 pinMode(RTL8720D_CHIP_PU, OUTPUT); 
 digitalWrite(RTL8720D_CHIP_PU, LOW); 
 delay(500); 
 digitalWrite(RTL8720D_CHIP_PU, HIGH);  
 delay(500); 

You need to reinitialize the Wi-Fi libraries after the above code.

To be clear:

  1. Include the above code inside your demo, where you need to restart the MCU -> RTL8720 connection.
  2. Reinitialize the Wi-Fi libraries
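
The two steps can be wrapped into one routine so that the order (power-cycle the chip first, reinitialize Wi-Fi second) is fixed in a single place. This is only a sketch under assumptions: RadioHooks and restartRadioLink are hypothetical names, and the hooks stand in for the board calls, i.e. digitalWrite(RTL8720D_CHIP_PU, ...) after pinMode(RTL8720D_CHIP_PU, OUTPUT), delay(), and whatever Wi-Fi setup the demo already performs at startup. Whether this is sufficient to recover the link is not confirmed in this thread.

```cpp
#include <functional>

// Hypothetical hooks: on the Wio Terminal they would wrap the real board
// calls; here they are kept abstract so the sequence can be tested on a PC.
struct RadioHooks {
    std::function<void(bool)> setChipEnable;     // true = CHIP_PU high
    std::function<void(unsigned long)> delayMs;  // blocking delay
    std::function<bool()> reinitWifi;            // the demo's own Wi-Fi setup
};

// Pulls the chip-enable line low, releases it, then reinitializes Wi-Fi.
// Returns whatever the reinitialization hook reports.
bool restartRadioLink(const RadioHooks& hooks, unsigned long holdMs = 500) {
    hooks.setChipEnable(false);
    hooks.delayMs(holdMs);
    hooks.setChipEnable(true);
    hooks.delayMs(holdMs);
    return hooks.reinitWifi();
}
```

Keeping the hardware access behind hooks is a design choice for testability; in a sketch the same five steps can of course be written inline.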

Will get back to you if there is any routine to use on the SAMD51 to test for memory leaks.


I think I have already found some code.
Edit: The code I posted here before didn’t work for me as expected.
Now I’m using this code in my loop to check if there is a memory leak:
(depends on rpcWiFi library)
Edit2: Using pvPortMalloc() is not correct; malloc() has to be used. The code is corrected.

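  // Keep track of insert attempts and check for a memory leak
  insertCounterAnalogTable++;
  // Probe the heap: allocate and immediately free a small block
  uint32_t * ptr_one = (uint32_t *)malloc(100);
  free(ptr_one);
  // Remember the address returned on the first pass as the reference
  if (insertCounterAnalogTable == 1)
  {
      referenceHeapAddr = (uint32_t)ptr_one;
  }
  // If the probe address drifts over time, heap memory is probably being lost
  lostLeakageBytes = (uint32_t)ptr_one - referenceHeapAddr;
  char buf[32] {0};  // "Allocating at: " plus 10 hex digits needs 26 bytes
  snprintf(buf, sizeof(buf), "Allocating at: %10X", (unsigned int)ptr_one);
  Serial.println(buf);
  snprintf(buf, sizeof(buf), " Lost %i", (int)lostLeakageBytes);
  Serial.println(buf);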

Result: There seems to be no memory leak :grinning:
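
The same probe idea can be kept in a small class so that the reference address and the drift calculation live in one place. This is a rough sketch, not part of rpcWiFi: HeapDriftProbe is a hypothetical name, and the check is only a heuristic, because the address malloc() returns depends on the allocator and on fragmentation, so a stable address does not strictly prove the absence of a leak.

```cpp
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: compares the address of a short-lived allocation
// with the address seen on the first probe.
class HeapDriftProbe {
public:
    // Records one probe address and returns its signed distance (in bytes)
    // from the first address ever recorded.
    intptr_t record(uintptr_t addr) {
        if (!hasReference_) {
            reference_ = addr;
            hasReference_ = true;
        }
        return static_cast<intptr_t>(addr - reference_);
    }

    // Allocates and frees a small block, then records where it was placed.
    // A failed allocation (null pointer) shows up as a large negative drift.
    intptr_t sample(size_t probeSize = 100) {
        void* p = malloc(probeSize);
        uintptr_t addr = reinterpret_cast<uintptr_t>(p);
        free(p);
        return record(addr);
    }

private:
    uintptr_t reference_ = 0;
    bool hasReference_ = false;
};
```

Calling sample() once per upload and printing the result gives the same information as the "Lost" line above; a value that keeps growing over many uploads would hint at a leak.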

that’s great. Thanks for sharing!

Hi @lakshan,
I tested with 5 ms and 6 ms delay. That didn’t solve the issue.
My application does not need to keep data in RAM permanently, so I can use the watchdog or reboot after a failed upload. With these measures I can keep uploading indefinitely. For other applications I will have to wait for your team to solve the problem.
Best Regards
RoSchmi

Hi @RoSchmi,
Could you test with a higher delay such as 10ms or 20ms?

Great to hear that you have found a workaround for your application!

Can I know what other applications you are referring to?

Best Regards,
Lakshantha

Hi @lakshan, thanks for your interest,

yes, I can test with 10 or 20 ms if needed.

Currently my App has been running flawlessly (delay 4 ms) for 9 hours without using the watchdog, that’s a new record!! :smiley: It’s a pity that I have to stop it now to continue programming.
Did you make any changes in yesterday’s rpcUnified library update which could have been a game changer?

I still have no other serious applications. If I had an idea for an Internet-connected App that needs to keep data in RAM, I would not start on it before the platform has proven to run stably over longer periods of time.