Kernel level issue Jetson Orin NX J4012

valenlopez993 · September 20, 2024, 8:19pm

Hello everyone,

I am facing a severe hardware-level issue with my Jetson Orin NX. The Jetson is mounted on the reComputer J4012 carrier board, running JetPack 5.1.2 [L4T 35.4.1] and Ubuntu 20.04.

An exception started appearing about three weeks ago, and I haven’t been able to resolve it:

[vie sep 20 15:36:19 2024] ------------[ cut here ]------------
[vie sep 20 15:36:19 2024] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024] Modules linked in: nvidia_modeset(O) fuse lzo_rle lzo_compress zram ramoops reed_solomon snd_soc_tegra186_asrc snd_soc_tegra210_iqc snd_soc_tegra210_ope snd_soc_tegra186_dspk snd_soc_tegra210_mvc snd_soc_tegra186_arad snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_i2s iwlmvm snd_soc_tegra210_admaif mac80211 snd_soc_tegra_pcm snd_soc_tegra210_sfc snd_soc_tegra210_adsp aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce sha256_arm64 sha1_ce snd_soc_spdif_tx snd_soc_tegra_machine_driver snd_soc_tegra_utils snd_soc_simple_card_utils snd_soc_tegra210_ahub nvadsp tegra210_adma binfmt_misc snd_hda_codec_hdmi snd_hda_tegra tegra_bpmp_thermal snd_hda_codec userspace_alert snd_hda_core iwlwifi spi_tegra114 nv_imx477 cfg80211 r8168 nvidia(O) loop ina3221 pwm_fan nvgpu nvmap ip_tables x_tables [last unloaded: mtd]
[vie sep 20 15:36:19 2024] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O      5.10.120-tegra #1
[vie sep 20 15:36:19 2024] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin NX Developer Kit, BIOS 4.1-33958178 08/01/2023
[vie sep 20 15:36:19 2024] pstate: 20400089 (nzCv daIf +PAN -UAO -TCO BTYPE=--)
[vie sep 20 15:36:19 2024] pc : tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024] lr : tegra186_gpio_irq+0x11c/0x1f0
[vie sep 20 15:36:19 2024] sp : ffff800010003ef0
[vie sep 20 15:36:19 2024] x29: ffff800010003ef0 x28: ffff2635c0e94460 
[vie sep 20 15:36:19 2024] x27: ffffadaddfd1cfe8 x26: 0000000000000018 
[vie sep 20 15:36:19 2024] x25: ffff2635c5ea1880 x24: ffff2635c5f0a000 
[vie sep 20 15:36:19 2024] x23: 000000000000000c x22: 000000000000004c 
[vie sep 20 15:36:19 2024] x21: 00000000000000b9 x20: 0000000000000000 
[vie sep 20 15:36:19 2024] x19: ffffadaddf1cf290 x18: 0000000000000000 
[vie sep 20 15:36:19 2024] x17: 0000000000000000 x16: ffffadadde198810 
[vie sep 20 15:36:19 2024] x15: 0000000000000000 x14: 0000000000000000 
[vie sep 20 15:36:19 2024] x13: 0000000000000003 x12: 0000000000000500 
[vie sep 20 15:36:19 2024] x11: 0000000000000040 x10: ffffadaddfc87b60 
[vie sep 20 15:36:19 2024] x9 : ffffadaddfc87b58 x8 : ffff2635c04b9268 
[vie sep 20 15:36:19 2024] x7 : 0000000000000000 x6 : 0000000000000001 
[vie sep 20 15:36:19 2024] x5 : 0000000000000000 x4 : 0000000000000000 
[vie sep 20 15:36:19 2024] x3 : 0000000000000000 x2 : ffffadadde090d70 
[vie sep 20 15:36:19 2024] x1 : 000000000000004c x0 : 0000000000000000 
[vie sep 20 15:36:19 2024] Call trace:
[vie sep 20 15:36:19 2024]  tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024]  generic_handle_irq+0x40/0x60
[vie sep 20 15:36:19 2024]  __handle_domain_irq+0x70/0xd0
[vie sep 20 15:36:19 2024]  gic_handle_irq+0x68/0x134
[vie sep 20 15:36:19 2024]  el1_irq+0xd0/0x180
[vie sep 20 15:36:19 2024]  cpuidle_enter_state+0xb8/0x410
[vie sep 20 15:36:19 2024]  cpuidle_enter+0x40/0x60
[vie sep 20 15:36:19 2024]  call_cpuidle+0x44/0x80
[vie sep 20 15:36:19 2024]  do_idle+0x208/0x270
[vie sep 20 15:36:19 2024]  cpu_startup_entry+0x2c/0x70
[vie sep 20 15:36:19 2024]  rest_init+0xdc/0xe8
[vie sep 20 15:36:19 2024]  arch_call_rest_init+0x18/0x20
[vie sep 20 15:36:19 2024]  start_kernel+0x500/0x538
[vie sep 20 15:36:19 2024] ---[ end trace 8c31d42c728e02cf ]---

Based on the log, I understand that the issue is related to the GPIO ports, but despite weeks of troubleshooting, I haven’t been able to find the root cause. This is creating a major problem in my system because every time I run the following command:

systemctl restart nvargus-daemon.service

to restart the camera drivers and launch my application (which acquires real-time images from a process), the entire Jetson reboots unexpectedly.

To clarify, the exception mentioned above repeats indefinitely in the kernel logs, but it doesn’t seem to cause any issues in the system until I restart the nvargus-daemon service to launch my application. I’m confident that this is related to the error.
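To put a number on how often the warning repeats, a small script can count the WARNING header lines in saved dmesg output. This is only a minimal sketch: the function name `count_warnings` and the regular expression are my own assumptions, written against the format of the trace above, not part of any NVIDIA tool.

```python
import re

# Matches the WARNING header line of the trace shown above, e.g.
# "WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)


def count_warnings(log_text, func="tegra186_gpio_irq"):
    """Count WARNING header lines raised from `func` in a block of dmesg text.

    `count_warnings` is a hypothetical helper name used for illustration.
    """
    count = 0
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        # Only the header line matches; the "pc :" and call-trace lines do not.
        if match and match.group("func") == func:
            count += 1
    return count
```

Passing it the text of `dmesg` saved before and after a hardware change gives a quick way to see whether the warning has stopped or is still accumulating.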

I would greatly appreciate any assistance or guidance to help me resolve this issue. If you need any additional information to clarify my situation, please feel free to ask.

Thanks in advance!

PJ_Glasso · September 20, 2024, 11:00pm

Hi there,
Any time I have seen this kind of thing, it has been power supply related. Unplug any unused devices. Is the camera a USB camera? If you have a meter or scope, I would look at the supply first.
If it started after an apt-get update, then look at the USB drivers. Do you have WiFi?
Post more info and we can help diagnose it.
Do you have a picture?
HTH
GL PJ

valenlopez993 · September 23, 2024, 2:28pm

Hi, thanks for responding.

Let me explain further. I have two IMX477 cameras connected via CSI, and I’m using this WIFI module: Network controller [0280]: Intel Corporation Device [8086:2725] (rev 1a). For camera configuration, I followed the process outlined here: Supported Cameras, and for the WIFI module, I’m using the iwlwifi-ty-a0-gf-a0-59 driver.

Regarding the power supply issue you mentioned, we had a similar suspicion because our Jetson is mounted on a hydraulic pallet truck and powered by the truck’s battery. We conducted the following tests, always checking the dmesg log to see if the error disappeared:

1. Jetson (with the WIFI module) connected to the truck. The error appeared.
2. We unmounted the Jetson (with the WIFI module) and powered it using an external power source. The error appeared.
3. Then, we removed the WIFI module and powered the Jetson (without the WIFI module) using an external power source. The error disappeared.
4. We reconnected the WIFI module to confirm, expecting the error to reappear. The error did not appear.
5. Finally, we reconnected the Jetson (with the WIFI module) to the truck, and now the error reappeared.

After all these tests, we thought the issue might be related to the electric circuit of the pallet truck. However, to make the error disappear, we always have to completely unplug the WIFI module after connecting the system to an external power supply. Otherwise, the error persists.

We don’t understand why the error doesn’t disappear simply by changing the power supply. If this is a power supply issue, shouldn’t the error go away after switching to an external power source without having to unplug the WIFI module entirely?
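Writing the five tests down as data makes the puzzle easier to see. The sketch below is only an illustration: the tuple layout and the helper name `conflicting_configs` are assumptions of mine. It lists every configuration (power source, WIFI module fitted) that produced both outcomes.

```python
from collections import defaultdict

# The five tests described above:
# (power source, WIFI module fitted, error seen in dmesg)
TESTS = [
    ("truck", True, True),
    ("external", True, True),
    ("external", False, False),
    ("external", True, False),
    ("truck", True, True),
]


def conflicting_configs(tests):
    """Return configurations that were seen both with and without the error."""
    outcomes = defaultdict(set)
    for power, wifi, error in tests:
        outcomes[(power, wifi)].add(error)
    # A configuration with two different outcomes cannot be explained
    # by the recorded factors alone.
    return sorted(cfg for cfg, seen in outcomes.items() if len(seen) > 1)


print(conflicting_configs(TESTS))  # prints [('external', True)]
```

The same configuration (external power, WIFI module fitted) gave both results, so neither the power source nor the WIFI module, alone or together, fully explains the error. Something that was not recorded in these tests must also have changed between them.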

Regarding the photo you asked for, did you mean an image of our setup or something else?

Best

PJ_Glasso · September 23, 2024, 2:39pm

Hi there,
I would say an RF issue is being caused by a ground loop or a power issue. What is the operating voltage of the pallet truck?
If you stay on truck power, put heavy filtering on the supply from the pallet truck.
Can you go all DC? For example, if the pallet truck is 24 VDC, step down to 12 VDC.
Can you scope the power VCC line and see what the ripple or noise is?
HTH
GL PJ

valenlopez993 · September 23, 2024, 4:36pm

In the end, we discovered that the issue was caused by a screen connected to the HDMI port.

I found the solution here: Kernel level issue Jetson Orin NX J4012 - Jetson Orin NX - NVIDIA Developer Forums.

Thanks for the help!