Seeeduino LoRaWAN mysteriously halts after hours of running

I have a use case where a Seeeduino needs to run for weeks unattended, mostly sitting in a delay loop. I’ve added the SAMCrashMonitor library and it is working well.

Frustratingly, I’ve discovered that the device can still lock up after about 3 hours of operation while doing nothing more than the following:

SerialUSB.println("Delaying...");

for(int i=0; i<10*60; i++) //wait 10 minutes
{
  SAMCrashMonitor::iAmAlive();
  delay(1000);
}

The symptoms are that the delay loop never ends, the CHG charge status LED stops blinking, and the L LED begins blinking rapidly. Power cycling or uploading a new image to flash restores operation.

How do I avoid getting into this state, or otherwise provide reliable long term operation?

It just happened again, this time 2 hours and 18 minutes after booting. Some more data:

  • The CHG charge status LED did not stop blinking. I think I was mistaken before.

  • The red RX LED is blinking rapidly this time. It seems to be at the same rate as the L LED, exactly out of phase.

  • This time the USB cable was plugged in and connected to a computer. I had suspected the issue was to do with the USB cable not being plugged in, but this rules that out.

  • Since the USB cable was plugged in, I can narrow down the halt point in the code:

    for(int i=0; i<10*60; i++) //wait 10 minutes
    {
      SAMCrashMonitor::iAmAlive();
      delay(1000);
        
      if(i%2) //Alternate between two colours to show activity.
        setRgbLed(rgbDarkGreen);
      else
        setRgbLed(rgbGreen);
    
      SerialUSB.print(i);
      if(i%60)
        SerialUSB.print(",");
      else
        SerialUSB.println();
    }
    

The LED is stuck on rgbGreen and the last thing printed was “331,”. So that means we’re a bit more than halfway through the loop, and are stuck somewhere between SerialUSB.print(","); and setRgbLed(rgbDarkGreen);. The setRgbLed function has no loops, never calls iAmAlive() and just has three calls to the Adafruit_NeoPixel library. The LED is connected to pin 4.

I’m at a loss as to what state the device could be in. Any ideas?
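One way to narrow the halt point further is to replay the loop's statement order on a PC and see what would run after “331,” is printed. The following is only a sketch: traceLoop, stepsAfterSeparator and the Step struct are hypothetical names made up for illustration, and it assumes the serial monitor shows output as soon as it is sent and that the LED holds the last colour it was given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One statement of the monitoring loop, tagged with the loop index it ran in.
struct Step {
    int i;
    std::string what;
};

// Replay the loop from the post on a host machine, recording each statement
// in execution order instead of touching any hardware.
std::vector<Step> traceLoop(int iterations) {
    std::vector<Step> steps;
    for (int i = 0; i < iterations; i++) {
        steps.push_back({i, "iAmAlive"});
        steps.push_back({i, "delay"});
        steps.push_back({i, (i % 2) ? "setRgbLed(rgbDarkGreen)" : "setRgbLed(rgbGreen)"});
        steps.push_back({i, "print(i)"});
        steps.push_back({i, (i % 60) ? "print(\",\")" : "println()"});
    }
    return steps;
}

// Return up to `count` steps that follow the separator printed for index `lastSeen`.
std::vector<Step> stepsAfterSeparator(const std::vector<Step>& steps,
                                      int lastSeen, std::size_t count) {
    std::vector<Step> out;
    bool found = false;
    for (const Step& s : steps) {
        if (found) {
            if (out.size() >= count) break;
            out.push_back(s);
        } else if (s.i == lastSeen &&
                   (s.what == "print(\",\")" || s.what == "println()")) {
            found = true;
        }
    }
    return out;
}
```

Calling stepsAfterSeparator(traceLoop(10 * 60), 331, 4) gives iAmAlive, delay, setRgbLed(rgbGreen) and print(i), all for i = 332. Under the assumptions above, an LED showing rgbGreen suggests the i = 332 colour change did run, which would put the stall at or after SerialUSB.print(332), or in getting that output to the host.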

The L and RX LED blinking is a perfect 25Hz square wave (20ms on, 20ms off).

The USB D+ and D- lines look like this (CH1 in both shots - couldn’t get both probes on at once!). Nothing appears in any serial reading application.

For others’ benefit, the resolution I’ve implemented is to add a definition for _wrap_body to my sketch that has the serial detection stuff removed:

void _wrap_body()
{
  setup();
  for (;;)
  {
    loop();
    // Serial detection stuff removed:
    //   yield(); // yield run usb background task
    //   if (serialEventRun)
    //     serialEventRun();
  }
}

and then flash the resulting hex file over the top of the bootloader using the SWD pins via a JLink programmer.

I suspect the behaviour I was seeing was the bootloader jumping in and waiting for a new program instead of running the existing one. Now that I’ve made these changes, the device no longer halts. It does still reboot occasionally and the cause of that is not yet clear (it seems power supply related), but at least I can now make progress.

> “It does still reboot occasionally…”

Have you tried connecting a LiPo battery? This might help boost the onboard 3V3 conversion and smooth out any glitches from USB power.

Adding a battery is impractical in our use case and we don’t use USB power. Nonetheless I can now confirm that it wasn’t related to fluctuations in the 5V supply, but to the Vin rail generated by the battery charger IC. I’ve documented these findings here. I’ve since bridged out the battery charger IC and the problem has never returned.

Aye, different solutions to the same problem… I think this could indicate that your system load is sometimes exceeding the DC-DC conversion current limit, causing the Vin/SYS voltage to drop. (I’ve actually found this is more acute in the Seeeduino Lotus M0+, which uses a different power chip, the MP2617B, rather than the ETA6003 in the Seeeduino LoRaWAN.)

In this situation, either of these power chips can draw extra current from the battery to support the rail voltage, which is the work-around I’ve been using to date.

Could you please share details of how to implement your ‘bridge’ solution? I’m interested because it might save me a (battery) component for scenarios where I have a reliable 5V supply. Thank you, Dave.

I investigated excess current but couldn’t find any evidence of it - the Vin/SYS rail only supplies internal circuits, and I couldn’t identify any change to the load that corresponded with the voltage drop.

To implement the bridge I simply removed U10 (the ETA6003 battery charger IC) and shorted R46 to connect the 5V rail directly to Vin.

Thank you, that’s useful info.

Our case is possibly different to your own… we have some fairly steady external load connected to Grove3V3, then some intermittent load, e.g. on Lora3V3 when we transmit a message, and on 3V3 when we toggle LEDs.

Hi all,

I am experiencing a similar issue with approximately 24 boards that measure CO2 and CH4. These boards are connected to both electricity and a LiPo battery. While the LiPo battery has helped reduce the frequency of the problem, it hasn’t completely resolved it. Most of the boards run fine, but occasionally they stop sending messages and fail to write data to the SD card. The occurrence of these stoppages seems to be random, as some boards are unaffected. I suspect that the problem is related to power, but I would appreciate any ideas or suggestions on how to address this issue.

Thank you!

I suppose it’s possible that the presence of the LiPo has eliminated the power issue and you’re now chasing some other issue? Some ideas for investigation:

  1. Get a board that is affected, put basic monitoring code on it like in my earlier post, and leave it running. If it still hangs then you can probably rule out a bug in your software and will get some more information on the nature of the hangs.
  2. Get a board that is affected, remove the LiPo, do the bridge modification I described, and leave it running your application code. If it still hangs then you can probably rule out a power issue, and might be able to dig deeper into your own code.
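
For the basic monitoring in idea 1, it can help to stamp each heartbeat line with the uptime, so that a hang can be timed from the last line logged (like the “2 hours and 18 minutes” figure earlier in the thread). A minimal sketch, assuming a hypothetical helper called formatUptime (it is not from any library) that is fed the value returned by millis(). It is written as plain C++ so it can be checked on a PC before going onto a board:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Convert a millis()-style millisecond count into "HH:MM:SS" for a heartbeat
// log line. Hours are not capped at 24, so multi-day uptimes stay readable.
std::string formatUptime(uint32_t ms) {
    uint32_t totalSeconds = ms / 1000u;
    uint32_t hours = totalSeconds / 3600u;
    uint32_t minutes = (totalSeconds / 60u) % 60u;
    uint32_t seconds = totalSeconds % 60u;

    char buf[16];  // large enough for the biggest 32-bit value ("1193:02:47")
    std::snprintf(buf, sizeof buf, "%02u:%02u:%02u",
                  static_cast<unsigned>(hours),
                  static_cast<unsigned>(minutes),
                  static_cast<unsigned>(seconds));
    return std::string(buf);
}
```

On the board the result would be printed once per loop iteration alongside the loop counter. Note that a 32-bit millisecond counter wraps after roughly 49.7 days, so for runs of several weeks the timestamp will eventually start again from zero.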