[Solved] Pi 3 B eth0: hw csum failure

Post by jp430bb » Thu Oct 04, 2018 1:13 am

After several days, eth0 hardware checksum failures ("hw csum failure") start appearing in the journal on my Raspberry Pi 3 B v1.2. Sometimes the checksum failures are accompanied by errors from the on-board Bluetooth adapter, which I use to scan for BLE advertisements.

Sep 30 23:26:42 kernel: eth0: hw csum failure
Sep 30 23:26:42 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.14.71-1-ARCH #1
Sep 30 23:26:42 kernel: Hardware name: BCM2835
Sep 30 23:26:42 kernel: [<8010ee54>] (unwind_backtrace) from [<8010b94c>] (show_stack+0x10/0x14)
Sep 30 23:26:42 kernel: [<8010b94c>] (show_stack) from [<80a8095c>] (dump_stack+0x9c/0xc8)
Sep 30 23:26:42 kernel: [<80a8095c>] (dump_stack) from [<8097a4cc>] (__skb_checksum_complete+0xb4/0xb8)
Sep 30 23:26:42 kernel: [<8097a4cc>] (__skb_checksum_complete) from [<80a04e0c>] (__udp4_lib_rcv+0x100/0x93c)
Sep 30 23:26:42 kernel: [<80a04e0c>] (__udp4_lib_rcv) from [<809cee38>] (ip_local_deliver_finish+0xd0/0x348)
Sep 30 23:26:42 kernel: [<809cee38>] (ip_local_deliver_finish) from [<809cf74c>] (ip_local_deliver+0x50/0xec)
Sep 30 23:26:42 kernel: [<809cf74c>] (ip_local_deliver) from [<809cfa28>] (ip_rcv+0x240/0x594)
Sep 30 23:26:42 kernel: [<809cfa28>] (ip_rcv) from [<80985248>] (__netif_receive_skb_core+0x998/0xd04)
Sep 30 23:26:42 kernel: [<80985248>] (__netif_receive_skb_core) from [<809876c8>] (process_backlog+0x94/0x144)
Sep 30 23:26:42 kernel: [<809876c8>] (process_backlog) from [<8098b8bc>] (net_rx_action+0x168/0x448)
Sep 30 23:26:42 kernel: [<8098b8bc>] (net_rx_action) from [<8010157c>] (__do_softirq+0xd4/0x32c)
Sep 30 23:26:42 kernel: [<8010157c>] (__do_softirq) from [<80134ca0>] (irq_exit+0x8c/0x148)
Sep 30 23:26:42 kernel: [<80134ca0>] (irq_exit) from [<80185010>] (__handle_domain_irq+0x58/0xb8)
Sep 30 23:26:42 kernel: [<80185010>] (__handle_domain_irq) from [<80a9aa78>] (__irq_svc+0x58/0x74)
Sep 30 23:26:42 kernel: [<80a9aa78>] (__irq_svc) from [<801085d0>] (arch_cpu_idle+0x30/0x3c)
Sep 30 23:26:42 kernel: [<801085d0>] (arch_cpu_idle) from [<80172f78>] (do_idle+0xf0/0x144)
Sep 30 23:26:42 kernel: [<80172f78>] (do_idle) from [<80173244>] (cpu_startup_entry+0x18/0x1c)
Sep 30 23:26:42 kernel: [<80173244>] (cpu_startup_entry) from [<80e00c84>] (start_kernel+0x38c/0x418)
Sep 30 23:26:43 kernel: Bluetooth: hci0 advertising data length corrected
Sep 30 23:26:43 kernel: Bluetooth: hci0: Frame reassembly failed (-84)
Sep 30 23:26:43 kernel: Bluetooth: hci0: Frame reassembly failed (-84)
Sep 30 23:26:43 kernel: Bluetooth: hci0: Frame reassembly failed (-84)
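
To keep an eye on how often this happens, the failures can be counted per interface from the kernel log. This is only a rough sketch: the function name count_csum_failures is made up for illustration, and it assumes the messages keep the "kernel: <iface>: hw csum failure" shape shown in the trace above.

```shell
#!/bin/sh
# Count "hw csum failure" kernel messages per interface.
# Reads journal text on stdin and prints "<iface> <count>" lines.
# (count_csum_failures is a hypothetical helper name.)
count_csum_failures() {
    grep -E 'kernel: [^ ]+: hw csum failure' |
        sed -E 's/.*kernel: ([^ ]+): hw csum failure.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Demo on the first line of the trace above.
printf '%s\n' 'Sep 30 23:26:42 kernel: eth0: hw csum failure' | count_csum_failures
# prints: eth0 1
```

On a live system the input would come from something like `journalctl -k | count_csum_failures` instead of the sample line.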

/boot/cmdline.txt has this:
root=/dev/mmcblk0p2 rw rootwait console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop

Because of the console=ttyAMA0,115200 entry, the eth0 checksum error messages are written to /dev/ttyAMA0, which is also connected to the Bluetooth adapter. Depending on the state of the adapter when the messages are emitted, it may be put into an undefined state. After the Bluetooth hci0 error messages, the application doing the BLE scanning exits and cannot restart.
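
One way to take the serial port out of the kernel command line is to filter out the parameters that point at it. The sketch below is an illustration, not a tested procedure: strip_serial_params is a hypothetical helper, and it assumes that only the console= and kgdboc= entries naming the given device should go, leaving console=tty1 alone.

```shell
#!/bin/sh
# Remove console= and kgdboc= parameters that reference a given serial device.
# Reads a kernel command line on stdin, prints the filtered line on stdout.
# (strip_serial_params is a hypothetical helper name.)
strip_serial_params() {
    dev=$1
    tr ' ' '\n' |
        grep -v -E "^(console|kgdboc)=${dev}(,|\$)" |
        paste -s -d ' ' -
}

# Demo with the command line quoted above.
printf '%s\n' 'root=/dev/mmcblk0p2 rw rootwait console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop' |
    strip_serial_params ttyAMA0
```

The function only prints the result. Writing it back to /boot/cmdline.txt is left as a manual step, since a mistake there can stop the Pi from booting.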

Comments on issue 2659 in the Raspberry Pi Foundation's Linux repository on GitHub suggest that a workaround for the hardware checksum errors with recent kernels is to disable receive checksum offloading on eth0 by running

ethtool -K eth0 rx off
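
Whether the setting took effect can be checked with the lowercase form, ethtool -k eth0, which lists the offload features and includes an "rx-checksumming: on" or "rx-checksumming: off" line. A small sketch for pulling out just that value follows; rx_csum_state is a hypothetical helper name, and the demo input is sample text in the shape ethtool prints, not output captured from this Pi.

```shell
#!/bin/sh
# Print the rx-checksumming state ("on" or "off") found in `ethtool -k` output
# on stdin. Exits non-zero if no such line is present.
# (rx_csum_state is a hypothetical helper name.)
rx_csum_state() {
    awk '$1 == "rx-checksumming:" { print $2; found = 1 }
         END { exit !found }'
}

# Demo on sample text; on the Pi this would be: ethtool -k eth0 | rx_csum_state
printf 'Features for eth0:\nrx-checksumming: off\n' | rx_csum_state
# prints: off
```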

I've added a service to run this on boot-up and removed both the console and kgdboc items from /boot/cmdline.txt. So far, the BLE scanning is stable and no eth0 checksum failures have shown up, but it's been running for less than 24 hours.
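
For anyone wanting to do the same, a boot-time service along these lines might look like the sketch below. It is not the exact unit from this setup. It assumes a systemd-based system (the kernel string above suggests Arch Linux ARM), assumes ethtool lives at /usr/bin/ethtool, and uses a made-up function name, print_rx_off_unit, that just prints a unit file for a given interface.

```shell
#!/bin/sh
# Print a systemd oneshot unit that disables RX checksum offload on an interface.
# Assumptions: systemd is the init system and ethtool is at /usr/bin/ethtool.
# (print_rx_off_unit is a hypothetical helper name.)
print_rx_off_unit() {
    iface=$1
    cat <<EOF
[Unit]
Description=Disable RX checksum offload on ${iface}
BindsTo=sys-subsystem-net-devices-${iface}.device
After=sys-subsystem-net-devices-${iface}.device

[Service]
Type=oneshot
ExecStart=/usr/bin/ethtool -K ${iface} rx off

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit for eth0; nothing is installed by this script.
print_rx_off_unit eth0
```

The output could be saved under /etc/systemd/system/ with a name of your choosing and enabled with systemctl enable. Binding to the network device unit is one possible ordering choice so that the command runs only once eth0 exists.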

Does this sound reasonable?
