I have an RPi 3B+ running Arch ARM armv7h (interestingly, the same happens on aarch64) and am experiencing lockups when copying files from an NFS share to the same NFS share. For example:
rsync -a --delete-after -W -x -q /scratch/armc8/root/ /scratch/armc8/facade
If I tail dmesg:
[ +0.000005] nfs: server ease.lan not responding, still trying
[ +1.061951] nfs: server ease.lan OK
[ +0.000007] nfs: server ease.lan OK
[ +0.000019] nfs: server ease.lan OK
[ +0.000667] nfs: server ease.lan OK
[ +0.000037] nfs: server ease.lan OK
[ +0.000036] nfs: server ease.lan OK
[ +0.000015] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[Dec22 10:44] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [10.1.8.228-mana:1151]
[ +0.008170] Modules linked in: rpcsec_gss_krb5 btsdio brcmfmac rc_cec brcmutil vc4 cec rc_core nf_log_ipv4 nf_log_common ipt_REJECT drm_kms_helper xt_LOG drm microchip hci_uart cfg80211 xt_limit btqca drm_panel_orientation_quirks syscopyarea btbcm xt_addrtype btintel sysfillrect lan78xx sysimgblt fb_sys_fops bluetooth xt_conntrack raspberrypi_hwmon ecdh_generic bcm2835_thermal pwm_bcm2835 rfkill i2c_bcm2835 bcm2835_wdt ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter
[ +0.055229] CPU: 2 PID: 1151 Comm: 10.1.8.228-mana Not tainted 4.19.10-1-ARCH #1
[ +0.007777] Hardware name: Raspberry Pi 3 Model B+ (DT)
[ +0.005500] pstate: 20000005 (nzCv daif -PAN -UAO)
[ +0.005054] pc : nfs_sb_active+0x98/0xf0
[ +0.004133] lr : nfs_delegation_reap_unclaimed+0x3c/0x108
[ +0.005676] sp : ffff00000dea3c50
[ +0.003484] x29: ffff00000dea3c50 x28: 0000000000000000
[ +0.005592] x27: ffff800030d064c8 x26: ffff80002f5d0a40
[ +0.005592] x25: ffff0000097c96c8 x24: 0000000000000000
[ +0.005590] x23: 0000000000000000 x22: ffff800030d06400
[ +0.005591] x21: ffff800030d064c8 x20: ffff8000313aa400
[ +0.005591] x19: ffff80002f485800 x18: 0000000000000000
[ +0.005590] x17: ffff7e000082e880 x16: 00000000fffffff8
[ +0.005592] x15: 0000000000000003 x14: 1800000000000000
[ +0.005590] x13: 0000000000000000 x12: 0100000000000000
[ +0.015436] x11: 00000006756df2ec x10: 00000000000009a0
[ +0.015328] x9 : ffff00000dea3960 x8 : ffff800035e39780
[ +0.015345] x7 : 0000000000000000 x6 : 0000000000000000
[ +0.015122] x5 : ffff80002f48589c x4 : 000000000000000c
[ +0.014783] x3 : ffff8000313aa400 x2 : ffff80002f48589c
[ +0.014487] x1 : 0000000000000000 x0 : 0000000000000001
[ +0.014115] Call trace:
[ +0.010777] nfs_sb_active+0x98/0xf0
[ +0.011860] nfs_delegation_reap_unclaimed+0x3c/0x108
[ +0.013324] nfs4_state_end_reclaim_reboot+0x174/0x240
[ +0.013310] nfs4_recovery_handle_error+0x5c/0x1b0
[ +0.012956] nfs4_do_reclaim+0x214/0x250
[ +0.012076] nfs4_state_manager+0x438/0x960
[ +0.012373] nfs4_run_state_manager+0x2c/0x40
[ +0.012579] kthread+0x130/0x138
[ +0.011429] ret_from_fork+0x10/0x1c
I posted this to the GitHub issue tracker for raspberrypi/linux (#2788). In the course of investigating, pelwell, one of the project developers, suggested that I try the same hardware under Raspbian, which I did. Interestingly, I could not get the bug to manifest on that distro.
Further, he suggested that I copy the kernel and modules from raspberrypi/firmware to my Arch box, which I did. To my surprise, it booted and I could not reproduce the error while running it. pelwell now thinks the issue is not in the upstream code but is somehow specific to Arch ARM.
I took the Raspbian kernel config, substituted it for our config in the PKGBUILD, built 4.14.90 with it, and still got the bug to manifest, so it would seem the config is not the cause. I would appreciate any thoughts on how to debug this further. Thanks all!
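One way to double-check the config swap is to diff the config the rebuilt kernel actually ended up with against the Raspbian one, since the build may re-resolve options along the way. Below is a minimal sketch, not a definitive tool. The function names are hypothetical, and it makes a simplifying assumption: a symbol that is absent from a file is treated the same as one marked "is not set".

```python
def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value; unset symbols map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        suffix = " is not set"
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(a, b):
    """Return {symbol: (value_in_a, value_in_b)} for symbols that differ.

    Assumption: a symbol missing from one side counts as 'n'.
    """
    result = {}
    for name in sorted(set(a) | set(b)):
        left, right = a.get(name, "n"), b.get(name, "n")
        if left != right:
            result[name] = (left, right)
    return result


def diff_config_files(path_a, path_b):
    """Convenience wrapper: read two kernel config files and diff them."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_kconfig(parse_kconfig(fa.read()), parse_kconfig(fb.read()))
```

Calling `diff_config_files` on the Raspbian config and on the `.config` left in the build directory would list any options that changed during the build. An empty result would support the conclusion that the config is not the cause.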