Hello All,
For the past 4 days my Raspberry Pi 4 has been experiencing a NIC connectivity problem. The NIC stops working and the kernel logs the following backtrace:
[code]
Jul 21 06:02:17 xxx.xxxxx kernel: ------------[ cut here ]------------
Jul 21 06:02:17 xxx.xxxxx kernel: NETDEV WATCHDOG: end0 (bcmgenet): transmit queue 0 timed out
Jul 21 06:02:17 xxx.xxxxx kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x278/0x280
Jul 21 06:02:17 xxx.xxxxx kernel: Modules linked in: xt_LOG nf_log_syslog xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack iptable_filter xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 brcmfmac brcmutil hci_uart btbcm bluetooth rpivid_hevc(C) bcm2835_codec(C) bcm2835_v4l2(C) >
Jul 21 06:02:17 xxx.xxxxx kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G C 6.1.38-2-rpi-ARCH #1
Jul 21 06:02:17 xxx.xxxxx kernel: Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Jul 21 06:02:17 xxx.xxxxx kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jul 21 06:02:17 xxx.xxxxx kernel: pc : dev_watchdog+0x278/0x280
Jul 21 06:02:17 xxx.xxxxx kernel: lr : dev_watchdog+0x278/0x280
Jul 21 06:02:17 xxx.xxxxx kernel: sp : ffffffc00801bd90
Jul 21 06:02:17 xxx.xxxxx kernel: x29: ffffffc00801bd90 x28: ffffffd83475b274 x27: ffffffc00801beb0
Jul 21 06:02:17 xxx.xxxxx kernel: x26: ffffffd834e18008 x25: 0000000000000000 x24: ffffffd83514d860
Jul 21 06:02:17 xxx.xxxxx kernel: x23: ffffffd835146000 x22: 0000000000000000 x21: ffffff8102fe83dc
Jul 21 06:02:17 xxx.xxxxx kernel: x20: ffffff8102fe8000 x19: ffffff8102fe8488 x18: 0000000000000006
Jul 21 06:02:17 xxx.xxxxx kernel: x17: ffffffa9ca1aa000 x16: 0000000000000010 x15: 0000000000000001
Jul 21 06:02:17 xxx.xxxxx kernel: x14: 0000000020000000 x13: 0000000000000002 x12: 0000000000000000
Jul 21 06:02:17 xxx.xxxxx kernel: x11: 0000000000000000 x10: ffffffd8351c3740 x9 : ffffffd833cfa564
Jul 21 06:02:17 xxx.xxxxx kernel: x8 : 00000000ffffefff x7 : ffffffd8351c3740 x6 : 80000000fffff000
Jul 21 06:02:17 xxx.xxxxx kernel: x5 : ffffff81fefc4990 x4 : 0000000000000003 x3 : 0000000000000004
Jul 21 06:02:17 xxx.xxxxx kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff81002c3e00
Jul 21 06:02:17 xxx.xxxxx kernel: Call trace:
Jul 21 06:02:17 xxx.xxxxx kernel: dev_watchdog+0x278/0x280
Jul 21 06:02:17 xxx.xxxxx kernel: call_timer_fn+0x3c/0x1cc
Jul 21 06:02:17 xxx.xxxxx kernel: __run_timers+0x258/0x314
Jul 21 06:02:17 xxx.xxxxx kernel: run_timer_softirq+0x38/0x60
Jul 21 06:02:17 xxx.xxxxx kernel: __do_softirq+0x198/0x4d8
Jul 21 06:02:17 xxx.xxxxx kernel: ____do_softirq+0x18/0x24
Jul 21 06:02:17 xxx.xxxxx kernel: call_on_irq_stack+0x24/0x54
Jul 21 06:02:17 xxx.xxxxx kernel: do_softirq_own_stack+0x24/0x3c
Jul 21 06:02:17 xxx.xxxxx kernel: __irq_exit_rcu+0xd4/0x120
Jul 21 06:02:17 xxx.xxxxx kernel: irq_exit_rcu+0x18/0x50
Jul 21 06:02:17 xxx.xxxxx kernel: el1_interrupt+0x38/0x70
Jul 21 06:02:17 xxx.xxxxx kernel: el1h_64_irq_handler+0x18/0x2c
Jul 21 06:02:17 xxx.xxxxx kernel: el1h_64_irq+0x64/0x68
Jul 21 06:02:17 xxx.xxxxx kernel: arch_cpu_idle+0x18/0x2c
Jul 21 06:02:17 xxx.xxxxx kernel: default_idle_call+0x54/0x19c
Jul 21 06:02:17 xxx.xxxxx kernel: do_idle+0x26c/0x2b0
Jul 21 06:02:17 xxx.xxxxx kernel: cpu_startup_entry+0x30/0x3c
Jul 21 06:02:17 xxx.xxxxx kernel: secondary_start_kernel+0x128/0x150
Jul 21 06:02:17 xxx.xxxxx kernel: __secondary_switched+0xb0/0xb4
Jul 21 06:02:17 xxx.xxxxx kernel: ---[ end trace 0000000000000000 ]---
[/code]
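To check how often the watchdog fires, the relevant line can be picked out of the kernel log with a small script. This is only a rough Python sketch: the function name `parse_watchdog` and the regular expression are my own, written against the exact message format shown in the trace above, so other kernel versions may need an adjusted pattern.

```python
import re

# Pattern modelled on the "NETDEV WATCHDOG" line from the trace above.
# Assumption: the message keeps this wording on the running kernel.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def parse_watchdog(line):
    """Return (interface, driver, queue) for a watchdog line, else None."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    return match["iface"], match["driver"], int(match["queue"])
```

Feeding it the lines printed by `journalctl -k` and counting the non-None results gives the number of transmit timeouts per interface and queue.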
No traffic goes through, although the LEDs on the NIC are still blinking.
The ip command shows the NIC as up, and ethtool reports no problem either. The only other symptom I can see is the TX errors counter increasing.
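For anyone who wants to watch the same symptom, the counter and link state can also be read straight from sysfs. A minimal Python sketch, assuming the interface is named end0 as in the trace; the helper names and the `sysfs_root` parameter are hypothetical and only there so the functions can be pointed at a different directory:

```python
from pathlib import Path


def read_tx_errors(iface="end0", sysfs_root="/sys/class/net"):
    """Return the tx_errors counter that the kernel exposes for iface."""
    counter = Path(sysfs_root) / iface / "statistics" / "tx_errors"
    return int(counter.read_text().strip())


def read_operstate(iface="end0", sysfs_root="/sys/class/net"):
    """Return the operational state string for iface, e.g. "up" or "down"."""
    state = Path(sysfs_root) / iface / "operstate"
    return state.read_text().strip()
```

Calling `read_tx_errors()` twice a few seconds apart while `read_operstate()` still returns "up" should show the same picture: the link looks fine but the error count keeps growing.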
Interestingly enough, rebooting does [b]not[/b] fix the issue: after a reboot the problem reappears as soon as the NIC comes up.
Bringing the NIC down and up again does not fix it either.
The only fix I have found so far is physically unplugging the RJ45 cable and plugging it back in; only then does traffic resume.
I tried upgrading the kernel from 6.1.35-4 to 6.1.38-2, to no avail. I have just upgraded to 6.1.39-2 and will see whether the issue happens again tonight.
Any ideas?
Thanks!