Orin NX 16GB (J4012): inference running on CPU, CUDA unavailable

Hi all!
I just purchased the unit to use with Ultralytics' YOLOv8. Here's what I've done so far:

  1. Created and activated a venv, installed Ultralytics, and ran inference on a sample video. Inference takes >290 ms, which is very slow; it turns out it's running on the CPU.
  2. In Python, imported torch and checked torch.cuda.is_available(). It returns False.
  3. Checked forums online (including NVIDIA's); it seems some extra steps are needed to install the right PyTorch build for the Orin NX 16 (link: PyTorch for Jetson - Announcements - NVIDIA Developer Forums). I followed that guide (on JetPack 5.1.1), but I hit an error at this step:
pip3 install numpy torch-1.8.0-cp36-cp36m-linux_aarch64.whl

It fails with:

ERROR: torch-1.8.0-cp36-cp36m-linux_aarch64.whl is not a supported wheel on this platform.
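In case it helps with diagnosis, here is a small stdlib-only check I can run (just a sketch, nothing Jetson-specific) that prints what the running interpreter actually is, which is what pip compares against the wheel filename; the `cp36` in `torch-1.8.0-cp36-cp36m-linux_aarch64.whl` means the wheel was built for CPython 3.6:

```python
import sys
import sysconfig

# The wheel filename encodes its target interpreter and platform:
# torch-1.8.0-cp36-cp36m-linux_aarch64.whl -> CPython 3.6 ("cp36") on aarch64.
# pip refuses the wheel when the running interpreter doesn't match these tags.
print("Python version:", "%d.%d" % sys.version_info[:2])
print("Platform:      ", sysconfig.get_platform())
```

If this prints a Python version other than 3.6, I assume that would explain the "not a supported wheel on this platform" error, since JetPack 5.x ships a newer Python than the cp36 wheel targets.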

I'm stuck now. Can someone help, please?