Help Wanted Solving Kernel Panic

This is for ARMv8 based devices

Postby ecod00m » Sat Dec 28, 2024 12:56 am

Original Post Here:
https://archlinuxarm.org/forum/viewtopic.php?p=73157

Updated Output Here:
https://imgdrop.io/image/caught-it50pct.Uym6d

Extra (sda) Storage Info:
- Samsung 980 M.2 PCI-E NVMe SSD 500GB
- Cooler Master Oracle Air M.2 SSD Enclosure

EDIT: I am aware of potential NVMe SSD firmware issues with this model, and I am currently moving the drive to a direct NVMe interface to check the firmware and update it if needed.

EDIT: Drive is running the latest firmware. No issue detected. Running extended SMART scan.

EDIT: The extended SMART scan reports perfect health, with no errors ever logged.

EDIT: fdisk -l (run from another system)

## PROBLEM DRIVE:

Disk /dev/nvme3n1: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 980 500GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 131072 bytes
Disklabel type: dos
Disk identifier: 0x6c7e15b3

Device         Boot   Start       End   Sectors   Size Id Type
/dev/nvme3n1p1         2048   1050623   1048576   512M  c W95 FAT32 (LBA)
/dev/nvme3n1p2      1050624 976773167 975722544 465.3G 83 Linux

## OTHER NVME DRIVES ON ANOTHER SYSTEM:

Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 970 EVO Plus 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 46CBB563-49CD-47A5-817F-03A9583DC8BA

Device             Start        End    Sectors   Size Type
/dev/nvme0n1p1      2048    2099199    2097152     1G EFI System
/dev/nvme0n1p2   2099200    2131967      32768    16M Microsoft reserved
/dev/nvme0n1p3   2131968  421562367  419430400   200G Microsoft basic data
/dev/nvme0n1p4 421562368  423100415    1538048   751M Windows recovery environment
/dev/nvme0n1p6 423100416 1953523711 1530423296 729.8G Linux LVM

Disk /dev/nvme2n1: 3.73 TiB, 4096805658624 bytes, 8001573552 sectors
Disk model: TEAM TM8FP4004T
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 327AA0D2-0D8F-4655-8E8F-E87A48443FCE

Device              Start        End    Sectors Size Type
/dev/nvme2n1p1       2048    2099200    2097153   1G EFI System
/dev/nvme2n1p2 1050677248 8001572863 6950895616 3.2T Linux LVM
/dev/nvme2n1p3    2101248 1050677247 1048576000 500G Microsoft basic data

## Problem drive observations

- It's running a dos partition table. This was recommended when I was setting up Arch Linux ARM, and it worked, so I left it, even though I don't use that kind of partition table ANYWHERE else.
- The I/O sizes, both minimum and optimal (16384 and 131072 bytes), are far larger than the 512-byte logical/physical sector size, whereas on all my other drives the I/O sizes match the sector size.
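
Both observations can be checked with a little arithmetic. The sketch below is only an illustration: the numbers are copied from the fdisk output for /dev/nvme3n1 above, and the helper names (in_bounds, is_aligned, report) are made up for this example. It tests whether a sector number is addressable on the drive and whether each partition starts on a multiple of the reported minimum and optimal I/O sizes. As far as I understand, fdisk takes these I/O sizes from the kernel's block-device topology hints, so they describe preferred request granularity rather than sector size; treat that as an assumption.

```python
# Values copied from the fdisk -l output for /dev/nvme3n1.
SECTOR_SIZE = 512          # logical sector size in bytes
TOTAL_SECTORS = 976773168  # sectors on the whole disk
MIN_IO = 16384             # reported minimum I/O size in bytes
OPT_IO = 131072            # reported optimal I/O size in bytes

# Partition name -> (start sector, end sector), both inclusive.
PARTITIONS = {
    "nvme3n1p1": (2048, 1050623),
    "nvme3n1p2": (1050624, 976773167),
}


def in_bounds(sector: int) -> bool:
    """Return True if the sector number is addressable on this disk."""
    return 0 <= sector < TOTAL_SECTORS


def is_aligned(start_sector: int, io_bytes: int) -> bool:
    """Return True if the byte offset of start_sector is a multiple of io_bytes."""
    return (start_sector * SECTOR_SIZE) % io_bytes == 0


def report() -> list:
    """Build one summary line per partition (hypothetical helper)."""
    lines = []
    for name, (start, end) in PARTITIONS.items():
        lines.append(
            f"{name}: end in bounds={in_bounds(end)}, "
            f"min-io aligned={is_aligned(start, MIN_IO)}, "
            f"opt-io aligned={is_aligned(start, OPT_IO)}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

With these values, both partitions start on a 128 KiB boundary and the last sector of nvme3n1p2 (976773167) is the final addressable sector of the disk, so the layout itself looks sane. A failing sector number read from the kernel log can be passed to in_bounds() to see whether it is out of bounds.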

------------------------------------------------------------------
OMG is it just me or is this forum really broken???
ecod00m
 
Posts: 12
Joined: Thu Dec 26, 2024 2:43 am

NVMe I/O Sizes, partition table kind, does it matter?

Postby ecod00m » Sat Dec 28, 2024 6:18 am

While trying to find answers for my kernel panic in another topic, I came across this sub-question. The fdisk -l output in my first post was taken on my main workstation, where I have temporarily installed the NVMe drive I am having problems with (it normally runs on a Raspberry Pi 4 via USB). On the Pi, writes and then reads start to fail at a high sector/block number (I'm still determining whether those sectors are out of bounds or valid), and then the kernel panics because, surprisingly, it can't access swap or disk.

I am wondering whether the two points listed under "Problem drive observations" in that post have anything to do with this: the dos partition table, and the minimum/optimal I/O sizes being far larger than the logical/physical sector size.

Re: Help Wanted Solving Kernel Panic

Postby graysky » Sat Dec 28, 2024 9:19 am

MOD NOTE: merged
graysky
Developer
 
Posts: 1870
Joined: Sun Jun 26, 2011 6:56 am
Location: /run/user/1000

Re: Help Wanted Solving Kernel Panic

Postby graysky » Sat Dec 28, 2024 9:20 am

You appear to be running the linux-aarch64 kernel. Install linux-rpi and report back.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Sat Dec 28, 2024 11:26 pm

Hi! Will do. Thanks for the reply. I duplicated the second part of the information, though, for a reason. It is a separate question. I am interested to know if anyone has an answer.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Sat Dec 28, 2024 11:40 pm

Reporting back: linux-rpi kernel has the same problem.

On linux-rpi, the system was running smoother, booting faster, and had processor scaling enabled, which is great, but it still encountered regular, if not **more** frequent kernel panics for the same reasons.

EDIT: Ran badblocks over the entire drive to try to prompt a kernel panic. Nothing.

EDIT: Analysed the fstrim setup and found it was not working correctly. I have now enabled fstrim for the external drive following this guide: https://www.jeffgeerling.com/blog/2020/enabling-trim-on-external-ssd-on-raspberry-pi

Upon running fstrim correctly for the first time, it reported: 417.7 GiB (448454115328 bytes) trimmed. But this has made no difference either.

EDIT: It just happened again. This time I noticed a line at the very top of the screen that actually appears in every photo of the panic I have taken so far, though at varying positions:

usb 2-2: USB disconnect, device number 2

The only external device hooked up to the Raspberry Pi is the Cooler Master Oracle Air enclosure with the Samsung 500GB NVMe drive in it.
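
To check how often this happens without photographing the screen, saved kernel log text can be scanned for that message. The following is a rough sketch, not a tested diagnostic tool: it assumes the log has been captured to a string (for example from dmesg output saved to a file), and the function name find_disconnects and the optional timestamp prefix are assumptions for illustration. Only the "USB disconnect" line format is taken from the message above.

```python
import re

# Matches lines such as: "usb 2-2: USB disconnect, device number 2".
# An optional "[  123.456] " prefix (dmesg-style timestamp) is captured if present.
DISCONNECT_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"usb (?P<port>[\d.-]+): USB disconnect, device number (?P<dev>\d+)"
)


def find_disconnects(log_text: str) -> list:
    """Return one dict per USB disconnect message found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = DISCONNECT_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match["ts"]) if match["ts"] else None,
                "port": match["port"],
                "device": int(match["dev"]),
            })
    return events


if __name__ == "__main__":
    # The first line is the message quoted in this post; the other lines and
    # the timestamp are invented purely to exercise the parser.
    sample = (
        "usb 2-2: USB disconnect, device number 2\n"
        "some unrelated kernel message\n"
        "[  12.500000] usb 2-2: USB disconnect, device number 2\n"
    )
    for event in find_disconnects(sample):
        print(event)
```

If every panic is preceded by a disconnect on the same port, that would point at the USB link or enclosure dropping out rather than at the partition layout.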

EDIT: I went back through my camera roll and found the date that these issues began popping up. They coincide with this pacman update:

[2024-12-09T11:23:58+1100] [ALPM] upgraded linux-firmware-whence (20240909.552ed9b8-1 -> 20241111.b5885ec5-1)
[2024-12-09T11:24:22+1100] [ALPM] upgraded linux-firmware (20240909.552ed9b8-1 -> 20241111.b5885ec5-1)
[2024-12-09T11:24:35+1100] [ALPM] upgraded linux-aarch64 (6.11.3-1 -> 6.12.1-1)

I have reverted to linux-aarch64 (6.11.3-1) along with the matching linux-firmware, linux-firmware-whence, and uboot-raspberrypi packages. If this holds stable for more than a day or two, then I'm guessing it's a kernel bug.
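
For anyone wanting to line up a date with package changes in the same way, here is a minimal sketch that pulls the upgrade entries out of pacman log text. It assumes the "[timestamp] [ALPM] upgraded name (old -> new)" line format shown above; the function name parse_upgrades is hypothetical, and the sample text is just the three lines from this post.

```python
import re
from datetime import datetime

# Matches: [2024-12-09T11:24:35+1100] [ALPM] upgraded pkg (old -> new)
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)

# The three log lines quoted in this post.
LOG = """\
[2024-12-09T11:23:58+1100] [ALPM] upgraded linux-firmware-whence (20240909.552ed9b8-1 -> 20241111.b5885ec5-1)
[2024-12-09T11:24:22+1100] [ALPM] upgraded linux-firmware (20240909.552ed9b8-1 -> 20241111.b5885ec5-1)
[2024-12-09T11:24:35+1100] [ALPM] upgraded linux-aarch64 (6.11.3-1 -> 6.12.1-1)
"""


def parse_upgrades(log_text: str) -> list:
    """Return (timestamp, package, old_version, new_version) tuples."""
    upgrades = []
    for line in log_text.splitlines():
        match = UPGRADE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match["when"], "%Y-%m-%dT%H:%M:%S%z")
            upgrades.append((when, match["pkg"], match["old"], match["new"]))
    return upgrades


if __name__ == "__main__":
    for when, pkg, old, new in parse_upgrades(LOG):
        print(f"{when.date()} {pkg}: {old} -> {new}")
```

The old versions collected this way are the ones to downgrade to when testing whether a particular update introduced the problem.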
Last edited by ecod00m on Tue Dec 31, 2024 2:22 am, edited 2 times in total.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Sun Dec 29, 2024 1:33 am

[deleted]
Last edited by ecod00m on Tue Dec 31, 2024 2:24 am, edited 1 time in total.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Sun Dec 29, 2024 7:10 am

[deleted]
Last edited by ecod00m on Tue Dec 31, 2024 2:24 am, edited 1 time in total.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Sun Dec 29, 2024 10:02 pm

[deleted]
Last edited by ecod00m on Tue Dec 31, 2024 2:24 am, edited 1 time in total.

Re: Help Wanted Solving Kernel Panic

Postby ecod00m » Mon Dec 30, 2024 10:12 am

[deleted]
Last edited by ecod00m on Tue Dec 31, 2024 2:25 am, edited 3 times in total.