[SOLVED] RPi4 8Gb + USB SSD + root on ZFS

This forum is for topics dealing with problems with software specifically in the AArch64 repo.

Postby pretendpersonbot » Wed Apr 06, 2022 10:26 pm

After five months or so I have finally accepted that this issue is not going to magically fix itself. It has stubbornly resisted all my attempts to resolve it so far, so it's time to ask for help.

I have an RPi4 8GB instance called "cakey" that lives behind the TV in my workspace and runs Kodi 24/7. It's probably my favourite computer out of the 50 or so I own and keeps me entertained all day while I work. About a year (?) ago I experimented with setting it up as root-on-ZFS, booting aarch64 from a USB SSD. It worked perfectly and has remained in constant daily use ever since with absolutely no problems except one: cakey will not upgrade to any kernel past linux-raspberrypi4-5.10.77-1-aarch64, and I really do mean any kernel.

cakey is upgraded daily (I like software updates) and is always completely up to date except for the kernel and its matching headers package, which I have had to blacklist since linux-raspberrypi4-5.10.77-1 and on which I remain "stuck" to this day. The problem first appeared on the day linux-raspberrypi4-5.10.78-1-aarch64 was released (2021-11-09, I think), when cakey failed to boot after the upgrade. Rolling back to the previous 5.10.77-1-aarch64 kernel and headers restored normal operation.

Since then, once a week or so, I remove the IgnorePkg directive blacklisting both packages and let cakey upgrade them. Not a single kernel past 5.10.77-1-aarch64 has worked. Every other package on the system (including the bootloader) is still updated daily, however. My most recent test was today (2022-04-01) with kernel and headers linux-rpi-5.15.32-3: same result.

The issue occurs early in boot, after the system has successfully loaded the kernel and initramfs from the USB SSD. I've transcribed the relevant chunk of output from a phone photo, omitting timestamps for legibility:

```
Freeing unused kernel memory: 2624K
Run /init as init process
:: running early hook [udev]
mmc1: new high speed SDIO card at address 0001
usb 1-1: new high speed USB device number 2 using xhci_hcd
Starting version 250.4-2-arch
:: running hook [udev]
:: Triggering uevents...
usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.21
usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-1: Product: USB2.0 Hub
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
usb 2-1: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
usb 2-1: Product: ASM105x
usb 2-1: Manufacturer: ASMT
usb 2-1: SerialNumber: 12345678E9FC
scsi host0: uas
scsi 0:0:0:0: Direct-Access ASMT ASM105x 0 PQ: 0 ANSI: 6
sd 0:0:0:0: [sda] 117231408 512-byte logical blocks: (60.0 GB/55 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
:: running hook [zfs]
sda: sda1 sda2
usb 1-1.3: new low speed USB device number 3 using xhci_hcd
sd 0:0:0:0: [sda] Attached SCSI disk
usb 1-1.3: New USB device found, idVendor=413c, idProduct=2105, bcdDevice= 3.52
usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.3: Product Dell USB Keyboard
usb 1-1.3: Manufacturer: Dell
ZFS: Importing pool rpool.
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
/init: line 51: die: not found
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
:: running late hook [zfs]
input: Dell Dell USB Keyboard as /devices/platform/scb/fd500000,pcie,blahblah
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
:: running cleanup hook [udev]
spl: loading out-of-tree module taints kernel.
icp: module license 'CDDL' taints kernel.
Disabling lock debugging due to kernel taint
ERROR: failed to mount the real root device.
Bailing out, you are on your own, Good luck.
sh: can't access tty: job control turned off
```

I'm extremely familiar with Linux and ZFS and with troubleshooting them when they go wrong. Normally at this point, dropping to the emergency prompt and issuing the following will load the modules, set up the system and resume a normal boot:

```
modprobe zfs
zpool import rpool
zfs mount -a
exit
```

However, on cakey with any affected kernel this works right up until I exit (the modules load correctly, the pool is imported and the zfs datasets and filesystem are mounted), but on exit cakey immediately faults with "cannot find root filesystem".

I've spent a fair bit of time on and off probing this issue and trying to resolve it, with absolutely zero progress so far. I have tried:

Modifying config.txt and cmdline.txt with every variation known to man
Switching bootloaders
Testing the latest available kernel every week or two for five months
Blacklisting or upgrading only certain packages (bootloader, etc.) while trying upgrades in stages
Manually rebuilding the zfs modules via DKMS and rebuilding the initramfs
Combing through this forum, Google, the GitHub issues pages for relevant projects and everywhere else for hints
Changing the hooks order in /etc/mkinitcpio.conf
Trying different compatible adapters, SSDs and another RPi4 8GB to rule out individual units

After installing a "bad" kernel I can move the SSD to a handy RPi3B running exactly the same software stack, but booting from a regular ext4 partition, to repair it. The RPi3B is also aarch64 with the same zfs variant installed and has no issues with the same kernel upgrades: there I can chroot in, roll back the kernel to linux-raspberrypi4-5.10.77-1-aarch64 and restore normal working order. In practice I don't bother any more: I'm so used to this by now that I first image the entire SSD to a file or spare SSD, and after a failed install I just dd the clone back. It's faster and easier.

If I examine the initramfs from a failed kernel install I can see all the relevant zfs bits are indeed present:

```
comrade@failbot:~/RPIKODI$ lsinitramfs FAILEDBOOTDUMP/initramfs-linux.img | egrep -i 'zfs|zpool'
etc/zfs/
etc/zfs/zed.d/
hooks/zfs
usr/bin/fsck.zfs
usr/bin/mount.zfs
usr/bin/zfs
usr/bin/zpool
usr/lib/libzfs_core.so.3
usr/lib/libzfs.so.4
usr/lib/libzpool.so.5
usr/lib/modules/5.15.32-3-rpi-ARCH/kernel/zfs.ko
usr/lib/udev/rules.d/90-zfs.rules
```

This correctly matches the output of the same command run against a working initramfs from the same system.
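To repeat that comparison mechanically, here is a minimal POSIX shell sketch that reads an initramfs file listing on stdin (for example from `lsinitramfs` as above, or from `lsinitcpio` on Arch) and reports any expected ZFS entry that is missing. The function name `check_zfs_bits` and the list of required paths are illustrative assumptions, not part of either tool; `spl.ko` is included because the boot log shows the SPL module loading separately from `zfs.ko`.

```shell
# check_zfs_bits (hypothetical helper): read an initramfs file listing on
# stdin and print every expected ZFS entry that does not appear in it.
# Returns 0 if everything was found, 1 otherwise.
# Assumption: the patterns below mirror the listing in this thread; module
# files may optionally be compressed (.gz, .xz or .zst).
check_zfs_bits() {
    listing=$(cat)
    rc=0
    for want in \
        'hooks/zfs$' \
        'usr/bin/zfs$' \
        'usr/bin/zpool$' \
        '/zfs\.ko(\.(gz|xz|zst))?$' \
        '/spl\.ko(\.(gz|xz|zst))?$'
    do
        # Extended regex match against the whole listing, one path per line.
        if ! printf '%s\n' "$listing" | grep -Eq -- "$want"; then
            echo "missing: $want"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `lsinitcpio /boot/initramfs-linux.img | check_zfs_bits`. As the rest of this thread shows, a clean result only proves the files are present in the image, not that the module gets loaded early enough.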

So it seems that although all the right pieces are in place for a normal boot, something is wrong. I would very much appreciate any pointers and have made a bet with myself that it's going to be something really, really simple that I am overlooking. Anyone?

As this post has somehow got rather long I'll follow up in a minute with the details of the relevant software, bootloader, firmware etc that cakey is currently using plus the boot stanzas from cmdline.txt and config.txt.
Last edited by pretendpersonbot on Tue Apr 19, 2022 6:55 pm, edited 1 time in total.
pretendpersonbot
 
Posts: 6
Joined: Sat Apr 02, 2022 12:36 am

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby pretendpersonbot » Wed Apr 06, 2022 10:58 pm

OK, here's the rest of the relevant details; please ask if I've missed anything.

```
[comrade@cakey ~]$ pacman -Q | grep bootloader
raspberrypi-bootloader 20220401-1
```

```
[comrade@cakey ~]$ pacman -Q | grep zfs
zfs-dkms 2.1.4-1
zfs-utils 2.1.4-1
```

```
[comrade@cakey ~]$ pacman -Q | grep firmware
firmware-raspberrypi 20220328-2
linux-firmware 20220309.cd01f85-1
linux-firmware-whence 20220309.cd01f85-1
raspberrypi-firmware 20220324-1
```

```
[comrade@cakey ~]$ cat /boot/config.txt
# dtoverlay=vc4-kms-v3d
initramfs initramfs-linux.img followkernel
# for bluetooth apparently:
dtparam=krnbt=on
include kodi.config.txt
```

```
[comrade@cakey ~]$ cat /boot/cmdline.txt
zfs=bootfs zfs=rpool/ROOT/arch zfs_force=1 rw console=serial0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=serial0,115200
```

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby tamjan » Mon Apr 11, 2022 5:53 am

I used zfs on an rpi1, though not for the rootfs, so I'm just guessing wildly below.
Think I noted that the module loading was changed a while back. Have you checked that the zfs stuff is included (it seems to be, as it tries to import the pool...) and that `/lib/modules-load.d/zfs.conf` looks OK in your initramfs?
I'm not sure what difference it makes, but have you tried the alternate syntax for the rootfs specification? Like:
```
root=zfs:rpool/ROOT/arch
```
And, come to think of it: when in the emergency shell, have you checked that your physical devices are available and that the pool is actually imported?

It seems to me that the modules are loaded way too late. Is the zfs module in the `MODULES` array in `/etc/mkinitcpio.conf`?
tamjan
 
Posts: 19
Joined: Tue Jan 14, 2014 7:23 am
Location: Lund, Sweden

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby pretendpersonbot » Mon Apr 11, 2022 7:05 pm

Thanks for your reply. I understand ZFS on the RPi is a bit niche, so I wasn't particularly optimistic, but at least someone else out there has experimented with it apart from me.

> Think I noted that the module loading was changed a while back.


Any source for that would be very helpful if you can find it. My gut feeling is that something has indeed changed in the RPi kernel/initramfs/bootloader setup relatively recently, as it's the only architecture/distro combination where I'm having any issues booting root-on-ZFS, and I have literally tens of them on the go without any problems, including Arch on x64, which gives me a good source of comparisons for problem solving.

> /lib/modules-load.d/zfs.conf


This file is not needed and is not present within the initramfs of any of my other root-on-ZFS instances, including the Arch x64 ones.

> root=zfs:rpool/ROOT/arch


I have indeed systematically cycled through just about every possible combination of cmdline.txt options including this one.

> have you checked that your physical devices are available and that the pool is actually imported?


In the original dmesg transcript I posted, the SSD is recognised correctly and the RPi4 loads the initramfs from it as usual. The fault occurs when the boot fails to load the zfs modules from it for an unknown reason. The modules are all there: from the emergency prompt they can not only be loaded but then used to import the pool and mount the zfs datasets. So the error message is misleading: all of the necessary bits are present in the initramfs, they're just not loaded on demand at the correct time.

To be more precise, the boot messages actually indicate the zfs modules are loaded but a moment later when the system tries to actually use them to import the pool it errors with "zfs modules not loaded".

> It seems to me that the modules are loaded way too late. Is the zfs module in the modules array in /etc/mkinitcpio.conf?


This seems to be the heart of the issue - it's not that the modules aren't there, they're just not being loaded correctly. But yes, the relevant hooks stanza is:

```
HOOKS=(base udev autodetect modconf block keyboard zfs filesystems)
```

I believe the order is important and have tested moving the zfs entry around but with no change. Additionally the Arch x64 comparison systems with working root-on-ZFS have the same stanza as above with no problems and the RPi4 booted every kernel up to linux-raspberrypi4-5.10.77-1 with this config.
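It may help to keep the two arrays apart: in mkinitcpio.conf, `HOOKS` selects hook scripts (such as the `zfs` hook that imports the pool), while `MODULES` lists kernel modules that are added to the image and loaded before any boot hooks run. Below is a minimal sketch for checking which array an entry is actually in; the function name `array_has` is hypothetical, and it assumes each array sits on a single uncommented line, as in the stanza above.

```shell
# array_has (hypothetical helper): succeed if NAME is one of the
# whitespace-separated entries in the ARRAY=( ... ) line of CONF.
# Assumption: the array is on a single line starting at column 0, so
# commented-out examples such as "#MODULES=(...)" are ignored.
array_has() {
    conf=$1 array=$2 name=$3
    # Print the text between "ARRAY=(" and ")", split it into one entry
    # per line, then look for an exact whole-line match.
    sed -n "s/^${array}=(\(.*\)).*/\1/p" "$conf" \
        | tr -s '[:space:]' '\n' \
        | grep -qx -- "$name"
}
```

With the config quoted above, `array_has /etc/mkinitcpio.conf HOOKS zfs` succeeds, while `array_has /etc/mkinitcpio.conf MODULES zfs` fails unless zfs is also listed in `MODULES`.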

¯\_(ツ)_/¯

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby tamjan » Mon Apr 11, 2022 7:51 pm

Just a short reply for now; when I have more time I might elaborate.
From your dmesg transcript:
```
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
:: running cleanup hook [udev]
spl: loading out-of-tree module taints kernel.
icp: module license 'CDDL' taints kernel.
```

Those last two lines indicate zfs being loaded, which is why I said it's way too late. I think the initramfs is messed up somehow.
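The ordering argument above can be checked mechanically against a saved console log. A minimal sketch, assuming the log uses the same message text as the transcript (the `:: running hook [zfs]` line comes from the initramfs, and `spl:` is the first message from the ZFS SPL module in this log); the function name `zfs_loaded_too_late` is hypothetical.

```shell
# zfs_loaded_too_late (hypothetical helper): read a saved boot/console log on
# stdin. Succeeds, printing both line numbers, only if the first "spl:" kernel
# message appears after the ":: running hook [zfs]" line, i.e. the ZFS
# modules were loaded after the hook had already tried to import the pool.
# Assumption: the log uses the same message text as the transcript above.
zfs_loaded_too_late() {
    awk '
        # Remember the first line where the initramfs zfs hook starts.
        /:: running hook \[zfs\]/ && !hook { hook = NR }
        # Remember the first message from the SPL module (part of ZFS).
        /spl: / && !spl { spl = NR }
        END {
            if (hook && spl && spl > hook) {
                printf "zfs hook at line %d, spl loaded at line %d\n", hook, spl
                exit 0
            }
            exit 1
        }'
}
```

Run against the transcript in the first post, it succeeds, because the zfs hook line comes well before the `spl:` line.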
I might actually run zfs-rootfs on my rpi4 as well, but I'm not sure when I'll get to that.

I do, however, wonder why you have two zfs= declarations on the command line, one pointing at a root dataset and one at a nested one. Surely you have only one rootfs.

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby pretendpersonbot » Mon Apr 11, 2022 8:03 pm

I've annotated the boot messages here for some (hopeful) clarity. Unfortunately I can't include whitespace, so the result is still not very clear; my comments are preceded with <<<:

```
Freeing unused kernel memory: 2624K
Run /init as init process <<< ok off we go
:: running early hook [udev]
mmc1: new high speed SDIO card at address 0001 <<< no SD card present of course
usb 1-1: new high speed USB device number 2 using xhci_hcd <<< here's the SSD device already
Starting version 250.4-2-arch
:: running hook [udev] <<< udev hook right on time
:: Triggering uevents...
usb 1-1: New USB device found, idVendor=2109...
<usb stuff snipped here>
sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
:: running hook [zfs] <<< zfs hook fires here
sda: sda1 sda2
usb 1-1.3: new low speed USB device number 3 using xhci_hcd
<more usb stuff snipped>
usb 1-1.3: Manufacturer: Dell
ZFS: Importing pool rpool. <<< zfs hook tries to import rpool...
The ZFS modules are not loaded. <<< ...and here's the error
Try running '/sbin/modprobe zfs' as root to load them. <<< but it was your job to load them, initramfs
/init: line 51: die: not found
The ZFS modules are not loaded. <<< but why? They are there
Try running '/sbin/modprobe zfs' as root to load them.
:: running late hook [zfs]
input: Dell Dell USB Keyboard as <snip>
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
:: running cleanup hook [udev]
spl: loading out-of-tree module taints kernel. <<< this is part of zfs, what is going on here?
icp: module license 'CDDL' taints kernel. <<< so is this - are you loading zfs modules or not?
```

Digging deeper, I've unpacked the initramfs from a failed attempt so I can read the actual init and zfs hook scripts themselves. I think I've narrowed the cause down to an ordering or timing issue in how init and the zfs hook interact, but it would probably take an actual dev to spot it. I may attempt a git bisect between the kernels before and after this bug appeared.

The next step is probably to try the ZFS user mailing list, where I'm pretty active anyway, but there I'll have the opposite problem: here everyone uses an RPi but nobody uses root-on-ZFS, while on the ZFS lists everyone uses root-on-ZFS but nobody uses it on RPis. Perhaps I'm the only person on the internet in the intersection of that Venn diagram.

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby tamjan » Tue Apr 12, 2022 4:53 am

Yes, those last two dmesg lines indicate that loading of zfs takes place (spl is part of zfs).

And, previously I wondered if you had zfs in the MODULES array, not the HOOKS array.

If you unpack the initramfs, is zfs in /lib/modules?

If I were you I'd try to understand why the modules are not loaded and see where that gets me. I think something is needed to get the zfs module to load during early init; the boot process will not load it automatically. This is why you have the MODULES array, or things like `/lib/modules-load.d/zfs.conf` (cf. https://www.freedesktop.org/software/systemd/man/modules-load.d.html), for instance.

Off the top of my head I can't come up with an explanation of why it's loaded with a previous kernel and not with a later one...

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby pretendpersonbot » Wed Apr 13, 2022 6:34 pm

In a really bad mood I returned to my cursed RPi4 again for another round of fighting and...

SWEET SWEET VICTORY AT LAST

> And, previously I wondered if you had zfs in the MODULES array, not the HOOKS array.


It was this, presumably all along. tamjan, I owe you one: having a second pair of eyes go over my work and suggest things was key. Perhaps this was the "Think I noted that the module loading was changed a while back" issue you were wondering about previously?

Strangely, this entry isn't required on any of the many other distro/architecture combinations I have running for what passes as my idea of fun, not even the Arch x64 system I was using as my main point of reference. It's tempting to guess that this is entirely RPi-specific, but after all this time I'll take the win gratefully and not worry about it too much. Whatever made it necessary was introduced specifically on the RPi side (it is categorically not from the ZFS codebase), at exactly this point:

linux-raspberrypi4-5.10.77-1 > linux-raspberrypi4-5.10.78-1

I also tidied up my cmdline.txt stanza a little bit and for posterity I'll include the exact voodoo snippets needed for anyone else stupid enough to run root-on-ZFS on RPi4 themselves:

```
[comrade@cakey ~]$ cat /boot/cmdline.txt
zfs=rpool/ROOT/arch rw console=serial0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=serial0,115200
```

By the way, the previous duplicated zfs=bootfs and zfs=rpool/ROOT/arch arguments were redundant: either one alone was sufficient to boot the system (presumably because rpool's bootfs property points at rpool/ROOT/arch). I've settled arbitrarily on the "cleaner" version and also dropped the unnecessary zfs_force=1 argument.

```
[comrade@cakey ~]$ egrep 'HOOKS|MODULES' /etc/mkinitcpio.conf | grep -v \#
MODULES=(zfs)
HOOKS=(base udev autodetect modconf block keyboard zfs filesystems)
```

This was the mission-critical change: only these two specific "zfs" entries in MODULES and HOOKS differ from any normal non-zfs RPi's working mkinitcpio.conf. Since zfs was already in HOOKS, adding "zfs" to MODULES was the only thing required to finally hoist my system up onto the new kernel.
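For anyone scripting the same fix, a minimal sketch: the hypothetical `add_to_modules` function below inserts a module name into the `MODULES=( ... )` line of an mkinitcpio.conf-style file unless it is already listed, so running it twice is harmless. It assumes GNU sed (for in-place `-i` editing) and a single-line, uncommented `MODULES` entry like the one in the mkinitcpio.conf excerpt above.

```shell
# add_to_modules (hypothetical helper): add NAME to the MODULES=( ... )
# array in CONF, leaving the file unchanged if NAME is already listed.
# Assumptions: GNU sed (-i), MODULES on one line starting at column 0.
add_to_modules() {
    conf=$1 name=$2
    # Already present as a whole entry? Then there is nothing to do.
    if sed -n 's/^MODULES=(\(.*\)).*/\1/p' "$conf" \
        | tr -s '[:space:]' '\n' \
        | grep -qx -- "$name"; then
        return 0
    fi
    # Empty array: MODULES=() becomes MODULES=(NAME), and "t" then skips
    # the second substitution for that line. Otherwise prepend "NAME ".
    sed -i \
        -e "s/^MODULES=()/MODULES=(${name})/" \
        -e t \
        -e "s/^MODULES=(/MODULES=(${name} /" \
        "$conf"
}
```

After running `add_to_modules /etc/mkinitcpio.conf zfs` as root, the initramfs still has to be regenerated for the change to take effect, for example with `mkinitcpio -P` to rebuild all presets.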

```
[comrade@cakey ~]$ uname -r && zfs --version
5.15.33-1-rpi-ARCH
zfs-2.1.4-1
zfs-kmod-2.1.4-1
```

```
[comrade@cakey ~]$ zfs list | head -n 4
NAME USED AVAIL REFER MOUNTPOINT
rpool 19.5G 34.3G 96K /
rpool/ROOT 17.3G 34.3G 96K none
rpool/ROOT/arch 17.3G 34.3G 8.97G /
```

With immense satisfaction I shall mark this thread [SOLVED] in a few days but only after a couple of regular successful RPi kernel upgrades as they come down the pipe.

I'm going to promptly open another thread, though, relating to a firmware (?) bug I've also run into on the same system after noticing that the EEPROM was out of date whilst chasing this one down. Such is the life of a habitual fiddler, I guess: I just can't leave things alone ¯\_(ツ)_/¯

Thanks again tamjan!

Re: RPi4 8Gb + USB SSD + root on ZFS

Postby tamjan » Thu Apr 14, 2022 12:55 pm

I'm glad I could help. :)

Re: [SOLVED] RPi4 8Gb + USB SSD + root on ZFS

Postby pretendpersonbot » Tue Apr 19, 2022 6:59 pm

After a week and a couple of successful and completely uneventful kernel upgrades I'm marking this [SOLVED].

tl;dr: add "zfs" to the MODULES array in /etc/mkinitcpio.conf, like so:

```
[comrade@cakey ~]$ grep 'MODULES=(zfs)' /etc/mkinitcpio.conf
MODULES=(zfs)
```