Random network connection drops. Not just a few lost packets, but complete network outages lasting tens of seconds.
After months of searching for the right HW platform for my home NAS (fanless, powered by an external adapter, at least 3 SATA ports), I settled on the SeeedStudio Odyssey X86J4125 as the best fit for my idea. I ordered the board, an M.2 PCI 5xSATA board, an M.2 SATA SSD, three 2.5" SATA HDDs and the re_computer case, and put everything together.
I considered TrueNAS as my first choice, but then I read that syncthing is not well supported on TrueNAS and changed my mind in favour of (my beloved) Debian. I installed Debian 11 and configured two ZFS pools (a mirrored one for important data, and one on a single disk for unimportant data).
Even during the initial configuration work over SSH I noticed my SSH session freezing, sometimes for a few seconds, sometimes for as long as a minute. I focused on hunting down these network issues. I also tried Debian 10 (OK, fine, 11 is too new, maybe there are some kernel bugs), but it made no difference. Every time I thought “YES!!! I made it!”, another network dropout occurred after a while. And even after spending a bambillion hours on troubleshooting, I still don’t know what’s wrong. No evidence in “dmesg”, no evidence in syslog, nothing strange in the “atop” logs,…
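Besides the logs, the kernel's per-interface error and drop counters in /proc/net/dev are another place where evidence might show up. Below is a minimal sketch in Python for comparing two snapshots of that file taken before and after a freeze. The function names and the interface name "enp2s0" are only illustrative assumptions, and the counters may well stay at zero if the packets are lost before the kernel ever sees them.

```python
from typing import Dict

# Column positions in /proc/net/dev after the "iface:" prefix:
# 8 receive fields (bytes packets errs drop ...) then 8 transmit fields.
FIELDS = {
    "rx_packets": 1, "rx_errs": 2, "rx_drop": 3,
    "tx_packets": 9, "tx_errs": 10, "tx_drop": 11,
}


def read_iface_counters(proc_net_dev_text: str) -> Dict[str, Dict[str, int]]:
    """Parse the text of /proc/net/dev into {iface: {counter: value}}."""
    counters: Dict[str, Dict[str, int]] = {}
    # The first two lines are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = data.split()
        if len(values) < 16:
            continue  # unexpected layout, skip rather than guess
        counters[name.strip()] = {key: int(values[i]) for key, i in FIELDS.items()}
    return counters


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return how much each counter grew between two snapshots of one interface."""
    return {key: after[key] - before[key] for key in before}


# Usage on the NAS itself (interface name is an assumption):
#   snap1 = read_iface_counters(open("/proc/net/dev").read())["enp2s0"]
#   ... wait for a freeze ...
#   snap2 = read_iface_counters(open("/proc/net/dev").read())["enp2s0"]
#   print(counter_delta(snap1, snap2))
```

If rx_errs or rx_drop grows during a freeze, that would point towards the NIC or its driver. If the counters stay flat while ping replies are missing, the loss is probably happening somewhere the kernel does not count it.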
What I have tried already
Listed in the order you would probably ask about them, not in the order I tried them:
- used a much more powerful power source than the original one (16V/4.5A)
- passed memtest (free latest version) with zero errors
- upgraded the BIOS to the latest version (SD-BS-CJ41G-300-101-H)
- upgraded the Embedded Controller firmware to the latest version (SD-EC-CJ41G-M-101-Q)
- disabled wifi and Bluetooth adapters
- used USB based Ethernet adapter on USB3 port
- used the same USB Ethernet adapter on the USB2 port
- tried several different CAT5E UTP cables
- configured the Mikrotik switch to disable all possible STP checks, multicasts,…
- connected the Odyssey, together with my PC, to an ordinary dumb 100Mbps switch (to rule out any strange influence from the Mikrotik)
- disabled all power management in the BIOS
- disabled all CPU frequency management in the BIOS
- disabled virtualization support in the BIOS
- disabled “Energy Efficient Ethernet” in Debian
- changed the Ethernet speed to 100Mbps
- tuned every queue and buffer I could find for the network interface
- installed “tuned” and used “throughput-performance” profile
- disabled power management in Debian
- tried “powertop” to look for any strange power consumption
- stopped all my services on the box that use the network (samba, minidlna, syncthing)
- stopped ZFS
- replaced the SATA power+data cable
- disconnected one of the HDDs
- disconnected all HDDs
- disconnected the M.2 PCI 5xSATA card
- booted a live Linux distro from USB and ran it from a ramdisk
Many times, after one of the steps above, I let a ping run against the server for a few hours and thought “Heureeeka!!!” Ping shows 0% packet loss. I won!
Then I just pressed a few keys in the SSH session and the 30-second freeze came again. I could live with the trouble in the SSH session, and syncthing is also somehow able to retransmit data. But the primary use (samba for videos and minidlna for music) is unusable. Video streaming from samba fails every few minutes.
Very strange detail
When I ping the Odyssey from any other device on my network (notebook, another home server, router,…), the packet loss is 10-40% over several hours.
BUT!!! When I ping from the Odyssey out to some of my other devices, the packet loss is 0%. And now comes the magic: when I run an inbound ping at the same time as the outbound one, the losses are only about 2%.
Anyway, this can’t be used as a “dirty workaround”, because that 2% loss doesn’t mean an occasional dropped packet but still outages lasting tens of seconds. Just less often.
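To show the difference between scattered packet loss and real outages, the ping results can be grouped into runs of consecutive lost replies. This is a minimal sketch in Python, assuming the icmp_seq numbers of the received replies have already been pulled out of the ping output and that ping sends one request per second. The function name find_outages and the 5-second threshold are my own illustrative choices.

```python
from typing import Iterable, List, Tuple


def find_outages(
    received_seqs: Iterable[int],
    total_sent: int,
    interval_s: float = 1.0,
    min_outage_s: float = 5.0,
) -> Tuple[float, List[Tuple[int, float]]]:
    """Return (loss_percent, outages) for one ping run.

    received_seqs: icmp_seq values that got a reply (1-based, as ping prints them).
    total_sent:    number of echo requests sent.
    Each outage is (first_lost_seq, duration_in_seconds); runs of lost
    replies shorter than min_outage_s count as loss but not as an outage.
    """
    got = set(received_seqs)
    lost_total = 0
    outages: List[Tuple[int, float]] = []
    run_start = None  # first icmp_seq of the current run of consecutive losses

    # Walk one step past the end so a run that is still open gets closed.
    for seq in range(1, total_sent + 2):
        lost = seq <= total_sent and seq not in got
        if lost:
            lost_total += 1
            if run_start is None:
                run_start = seq
        elif run_start is not None:
            duration = (seq - run_start) * interval_s
            if duration >= min_outage_s:
                outages.append((run_start, duration))
            run_start = None

    loss_percent = 100.0 * lost_total / total_sent if total_sent else 0.0
    return loss_percent, outages
```

With a made-up run of 1000 pings where only replies 400 to 419 are missing, this reports 2% loss as a single 20-second outage. The same 2% spread across 20 isolated packets would report no outage at all, which is the distinction that matters for streaming.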
Regarding kernel tuning: to be honest, I don’t remember everything I tried. Pretty much every recommendation offered on discussion forums for network trouble.
But I would expect an untuned kernel to just give worse performance, not to malfunction.
Does anyone have an idea what else I could try?