5000 microsecond delay inaccurate with either delayMicroseconds() or nrf_delay_us()

I’m using a Seeed Xiao BLE Sense board with an nRF52840 on it. A 5 ms delay comes out far too long: it is often more than 8 ms, and the value changes from run to run. I tried both delayMicroseconds() and nrf_delay_us(), and they behave the same. I also tried disabling interrupts just in case, but that had no effect. BTW, I measure the delay by pulsing a debug pin right before and right after it.

#define DBG_PIN 0

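// Emit a short HIGH pulse on the debug pin as a timing marker.
void DBG_pulse(void) {
  digitalWrite(DBG_PIN, HIGH);
  digitalWrite(DBG_PIN, LOW);
}

void onBLEWrite(uint8_t* buffer, size_t size) {
    ...
    noInterrupts();
    DBG_pulse();              // marker: start of delay
    delayMicroseconds(5000);
    //nrf_delay_us(5000);     // alternative, same behavior
    DBG_pulse();              // marker: end of delay
    interrupts();
    ...
}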

I noticed that if I set the delay to under 3 ms, it seems to work fine. But if I use a 3 ms delay followed by a 2 ms delay, the actual delay goes back to > 8 ms. Does anyone have any ideas?

Update: it looks like ArduinoBLE's BLE.begin() causes the issue. Once BLE.begin() is called in setup(), delayMicroseconds() starts to be off whenever the delay is longer than ~4700 us. Any idea how to resolve this?
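
One thing that might help narrow this down (a sketch, not a confirmed fix): wait against a free-running microsecond counter instead of calling a delay routine. The assumption here is that the delay routines lose track of time whenever the core is busy elsewhere after BLE.begin(), while a timestamp comparison would only overshoot by whatever happens at the very end of the wait. waitElapsedUs and its now_us parameter are hypothetical names made up for this example; they are not part of Arduino, ArduinoBLE, or the nRF SDK.

```cpp
#include <cstdint>

// Hypothetical helper: busy-wait until at least duration_us microseconds have
// elapsed according to now_us(), a caller-supplied function that returns a
// free-running 32-bit microsecond counter.
// Returns the elapsed time actually observed, so overshoot can be logged.
template <typename NowFn>
uint32_t waitElapsedUs(NowFn now_us, uint32_t duration_us) {
  const uint32_t start = now_us();
  uint32_t elapsed = 0;
  // Unsigned subtraction stays correct when the counter wraps around.
  while ((elapsed = now_us() - start) < duration_us) {
    // spin
  }
  return elapsed;
}
```

In the sketch this would be called as waitElapsedUs(micros, 5000) between the two DBG_pulse() calls. I have not verified that micros() keeps counting on this core while interrupts are disabled, so it may be necessary to try it without noInterrupts() first. The returned value can also be compared with the pulse spacing seen on the debug pin.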