Lockup when copying files on NFS share on RPi3B+

Postby graysky » Sun Dec 23, 2018 10:51 pm

I have an RPi3B+ running Arch ARM armv7h (and, interestingly, aarch64 as well) and am experiencing lockups when copying files from an NFS share to the same NFS share. For example:

```
rsync -a --delete-after -W -x -q /scratch/armc8/root/ /scratch/armc8/facade
```

If I tail dmesg:
```
[ +0.000005] nfs: server ease.lan not responding, still trying
[ +1.061951] nfs: server ease.lan OK
[ +0.000007] nfs: server ease.lan OK
[ +0.000019] nfs: server ease.lan OK
[ +0.000667] nfs: server ease.lan OK
[ +0.000037] nfs: server ease.lan OK
[ +0.000036] nfs: server ease.lan OK
[ +0.000015] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[ +0.000012] nfs: server ease.lan OK
[Dec22 10:44] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [10.1.8.228-mana:1151]
[ +0.008170] Modules linked in: rpcsec_gss_krb5 btsdio brcmfmac rc_cec brcmutil vc4 cec rc_core nf_log_ipv4 nf_log_common ipt_REJECT drm_kms_helper xt_LOG drm microchip hci_uart cfg80211 xt_limit btqca drm_panel_orientation_quirks syscopyarea btbcm xt_addrtype btintel sysfillrect lan78xx sysimgblt fb_sys_fops bluetooth xt_conntrack raspberrypi_hwmon ecdh_generic bcm2835_thermal pwm_bcm2835 rfkill i2c_bcm2835 bcm2835_wdt ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter
[ +0.055229] CPU: 2 PID: 1151 Comm: 10.1.8.228-mana Not tainted 4.19.10-1-ARCH #1
[ +0.007777] Hardware name: Raspberry Pi 3 Model B+ (DT)
[ +0.005500] pstate: 20000005 (nzCv daif -PAN -UAO)
[ +0.005054] pc : nfs_sb_active+0x98/0xf0
[ +0.004133] lr : nfs_delegation_reap_unclaimed+0x3c/0x108
[ +0.005676] sp : ffff00000dea3c50
[ +0.003484] x29: ffff00000dea3c50 x28: 0000000000000000
[ +0.005592] x27: ffff800030d064c8 x26: ffff80002f5d0a40
[ +0.005592] x25: ffff0000097c96c8 x24: 0000000000000000
[ +0.005590] x23: 0000000000000000 x22: ffff800030d06400
[ +0.005591] x21: ffff800030d064c8 x20: ffff8000313aa400
[ +0.005591] x19: ffff80002f485800 x18: 0000000000000000
[ +0.005590] x17: ffff7e000082e880 x16: 00000000fffffff8
[ +0.005592] x15: 0000000000000003 x14: 1800000000000000
[ +0.005590] x13: 0000000000000000 x12: 0100000000000000
[ +0.015436] x11: 00000006756df2ec x10: 00000000000009a0
[ +0.015328] x9 : ffff00000dea3960 x8 : ffff800035e39780
[ +0.015345] x7 : 0000000000000000 x6 : 0000000000000000
[ +0.015122] x5 : ffff80002f48589c x4 : 000000000000000c
[ +0.014783] x3 : ffff8000313aa400 x2 : ffff80002f48589c
[ +0.014487] x1 : 0000000000000000 x0 : 0000000000000001
[ +0.014115] Call trace:
[ +0.010777] nfs_sb_active+0x98/0xf0
[ +0.011860] nfs_delegation_reap_unclaimed+0x3c/0x108
[ +0.013324] nfs4_state_end_reclaim_reboot+0x174/0x240
[ +0.013310] nfs4_recovery_handle_error+0x5c/0x1b0
[ +0.012956] nfs4_do_reclaim+0x214/0x250
[ +0.012076] nfs4_state_manager+0x438/0x960
[ +0.012373] nfs4_run_state_manager+0x2c/0x40
[ +0.012579] kthread+0x130/0x138
[ +0.011429] ret_from_fork+0x10/0x1c
```
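When comparing several reproduction runs, it can help to reduce each watchdog report to its key fields. A minimal sketch, assuming `sed` with `-E` support and the message wording seen in this 4.19 log (other kernel versions may phrase it differently); `summarize_soft_lockups` is a hypothetical helper name. Note that the task name `10.1.8.228-mana` is truncated (kernel task names are capped at 15 characters); the call trace shows this thread running `nfs4_run_state_manager`, i.e. the NFSv4 client's state-manager kernel thread.

```bash
# Hypothetical helper: summarize watchdog soft-lockup reports from kernel log text.
# Reads log lines on stdin and prints, for each lockup line only:
#   cpu=<N> stuck=<S>s comm=<task name> pid=<PID>
# The pattern assumes the wording seen in the log above (an assumption for other kernels).
summarize_soft_lockups() {
  # -n plus the trailing p flag: print only lines where the substitution matched.
  # Groups: CPU number, seconds stuck, task name, PID.
  sed -nE 's/.*soft lockup - CPU#([0-9]+) stuck for ([0-9]+)s! \[(.*):([0-9]+)].*/cpu=\1 stuck=\2s comm=\3 pid=\4/p'
}
```

Usage: `dmesg | summarize_soft_lockups` (reading the kernel log may require root if `kernel.dmesg_restrict` is enabled).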

I posted this to the GitHub issue tracker for raspberrypi/linux (#2788). During the investigation, pelwell, one of the project developers, suggested that I try the same hardware with Raspbian, which I did. Interestingly, I could not get the bug to manifest on that distro.

He then suggested that I copy the kernel and modules from raspberrypi/firmware to my Arch box, which I did. To my surprise, it booted, and I could not reproduce the error while running it. pelwell's current thinking is that the issue is not in the upstream code but somehow with Arch ARM.

I took the Raspbian kernel config, substituted it for our config in the PKGBUILD, built 4.14.90 with it, and the bug still manifested, so it would seem the cause is not in the config. I would appreciate any thoughts on how to debug this further. Thanks all!
graysky
Developer
 
Posts: 1728
Joined: Sun Jun 26, 2011 6:56 am
Location: /run/user/1000

Re: Lockup when copying files on NFS share on RPi3B+

Postby TheSaint » Mon Dec 24, 2018 4:36 am

It might be an issue with the gcc version. Debian may not use the latest one.
TheSaint
 
Posts: 346
Joined: Mon Jul 23, 2018 7:57 am

Re: Lockup when copying files on NFS share on RPi3B+

Postby graysky » Mon Dec 24, 2018 12:07 pm

TheSaint wrote: "It might be an issue on the gcc version. Debian may not use the latest one."


You are correct. When I boot into the updated Raspbian image:
```
$ gcc --version
gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
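One caveat on the compiler question: `gcc --version` reports the userspace compiler of whatever system is booted, while a prebuilt kernel (such as the one copied from raspberrypi/firmware) may have been built with a different toolchain. The running kernel records its own build compiler in `/proc/version`. A minimal sketch that extracts it, assuming the `gcc version X.Y.Z` wording used by kernels of this era (newer kernels format that field differently, so the pattern may need adjusting); `kernel_gcc_version` is a hypothetical helper name.

```bash
# Hypothetical helper: print the gcc version that built the running kernel.
# Reads /proc/version by default, or a file given as $1.
# Assumes the "(gcc version X.Y.Z ...)" wording of 4.x-era kernels.
kernel_gcc_version() {
  sed -nE 's/.*gcc version ([0-9]+(\.[0-9]+)*).*/\1/p' "${1:-/proc/version}"
}
```

Running `kernel_gcc_version` once under the Raspbian-supplied kernel and once under the Arch-built kernel would show whether the two kernels were in fact built with different gcc versions.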

Perhaps I should try compiling in a chroot configured to use `http://tardis.tiny-vps.com/aarm/repos/2017/03/12/$arch/$repo` and test again.
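For reference, pacman picks up a dated snapshot like this through a `Server =` line in the chroot's mirrorlist, with `$arch` and `$repo` left literal so pacman expands them itself. A minimal sketch, assuming the chroot's `pacman.conf` includes `/etc/pacman.d/mirrorlist` (the stock layout); `archive_server_line` and `pin_chroot_to_snapshot` are hypothetical helper names.

```bash
# Hypothetical helpers: pin a build chroot to a dated Arch Linux ARM archive snapshot.
archive_server_line() {
  # $1 = snapshot date as YYYY/MM/DD. The format string is single-quoted, so
  # $arch and $repo are written literally for pacman to expand.
  printf 'Server = http://tardis.tiny-vps.com/aarm/repos/%s/$arch/$repo\n' "$1"
}

pin_chroot_to_snapshot() {
  local chroot=$1 date=$2
  local mirrorlist="$chroot/etc/pacman.d/mirrorlist"
  # Keep a copy of the existing mirrorlist so the change can be reverted.
  if [ -f "$mirrorlist" ]; then
    cp "$mirrorlist" "$mirrorlist.orig"
  fi
  archive_server_line "$date" > "$mirrorlist"
}
```

From a root shell, e.g. `pin_chroot_to_snapshot /scratch/armc7/root 2017/03/12`, then `arch-nspawn /scratch/armc7/root pacman -Syyuu` to sync (the doubled `-u` permits downgrades to the older package versions). This only changes where packages come from; it does not address any host/chroot compatibility problem.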

EDIT: Damn, can't do it... I don't think the chroot of that vintage is compatible with a current parent system:
```
% sudo MAKEFLAGS=-j5 makechrootpkg -r /scratch/armc7
==> Synchronizing chroot copy [/scratch/armc7/root] -> [facade]...done
==> Making package: linux-raspberrypi 4.14.90-2 (Mon Dec 24 07:31:43 2018)
==> Retrieving sources...
-> Found 6d68e517b3ec73b08f3af96f5859c5d083b66535.tar.gz
-> Found config.txt
-> Found cmdline.txt
-> Found config
-> Found linux.preset
-> Found 60-linux.hook
-> Found 90-linux.hook
==> Validating source files with md5sums...
6d68e517b3ec73b08f3af96f5859c5d083b66535.tar.gz ... Passed
config.txt ... Passed
cmdline.txt ... Passed
config ... Passed
linux.preset ... Passed
60-linux.hook ... Passed
90-linux.hook ... Passed
Failed to attach 3377 to compat systemd cgroup /user.slice/user-1000.slice/session-1.scope/payload: No such file or directory
Failed to attach 3351 to compat systemd cgroup /user.slice/user-1000.slice/session-1.scope/supervisor: No such file or directory
Failed to chown() cgroup /sys/fs/cgroup/systemd/user.slice/user-1000.slice/session-1.scope/payload: No such file or directory
Parent died too early
==> ERROR: Build failed, check /scratch/armc7/facade/build
```

Re: Lockup when copying files on NFS share on RPi3B+

Postby graysky » Mon Dec 24, 2018 10:02 pm

I was able to build as described above, but this kernel produced the same system lockup.