NVMe Failure After Kernel Update

Running an OG Reform on trixie/sid, with a Kioxia 1TB NVMe drive, model KXG60ZNV1T02, root on NVMe.

I’d left the machine alone for a few months, so there were a lot of updates. During the upgrade, kernel 6.9 was installed. Afterwards, the system was unbootable, dropping to an initramfs shell due to a missing root device. Downgrading to kernel 6.1 restored functionality.

In both cases, the NVMe device is listed by lspci, but with kernel 6.9 no /dev/nvme0n1 block device appears.

The relevant kernel log messages from 6.9:

[   66.549873] nvme nvme0: I/O tag 10 (000a) QID 0 timeout, disable controller
[   71.555492] nvme nvme0: Device not ready; aborting shutdown, CSTS=0x1
[   71.577920] nvme nvme0: failed to set APST feature (-4)
[   71.594361] nvme 0001:01:00.0: probe with driver nvme failed with error -4

Are there any known tweaks that can fix this, or do I just need to pin the older kernel version?

Thanks!

If you say OG Reform, I assume this is with the i.MX8MQ processor? I would try adding the following to the kernel command line. You can do this in U-Boot by typing:

setenv bootargs pcie_aspm=off nvme_core.default_ps_max_latency_us=0

Does that help?
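
To confirm that the parameters actually reached the kernel after booting, it can help to check /proc/cmdline. The following is only a rough sketch in Python: the helper names parse_cmdline and missing_params are made up for illustration, and it assumes simple whitespace-separated parameters without quoting.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Parameters suggested above and the values we expect to see.
EXPECTED = {
    "pcie_aspm": "off",
    "nvme_core.default_ps_max_latency_us": "0",
}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into key/value pairs.

    Flags without '=' (such as 'ro') map to None. If a key is repeated,
    the last occurrence wins. Quoted values are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(params: Dict[str, Optional[str]]) -> List[str]:
    """Return the expected keys that are absent or have a different value."""
    return [key for key, want in EXPECTED.items() if params.get(key) != want]


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        missing = missing_params(parse_cmdline(proc.read_text()))
        if missing:
            print("not set as expected:", ", ".join(missing))
        else:
            print("all expected parameters are present")
```

If the script reports a parameter as not set, the boot script probably did not pick up the bootargs variable, and the values need to be added wherever the command line is actually assembled.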

Correct, i.MX8MQ.

Setting these bootargs does not resurrect the NVMe device, but the error message changes:

[    0.000000] Kernel command line: ro no_console_suspend cryptomgr.notests loglevel=3  pcie_aspm=off nvme_core.default_ps_max_latency_us=0 console=ttymxc0,115200  console=ttymxc0,115200 cma=512M pci=nomsi console=tty1
[    4.829541] nvme nvme0: pci function 0001:01:00.0
[    4.829575] nvme 0001:01:00.0: enabling device (0000 -> 0002)
[   66.556163] nvme nvme0: I/O tag 20 (0014) QID 0 timeout, disable controller
[   66.560223] nvme nvme0: Device not ready; aborting shutdown, CSTS=0x1
[   66.584215] nvme nvme0: Identify Controller failed (-4)
[   66.600583] nvme 0001:01:00.0: probe with driver nvme failed with error -5
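
For anyone reading these logs later, here is a small sketch that decodes the numeric values in them. It relies on the CSTS bit layout from the NVMe specification (bit 0 RDY, bit 1 CFS, bits 3:2 SHST) and on the standard errno table; the function names are only illustrative and the regular expressions are tailored to the messages quoted in this thread.

```python
import errno
import re
from typing import Dict, Optional

CSTS_RE = re.compile(r"CSTS=0x([0-9a-fA-F]+)")
# Matches "(-4)" as well as "error -5"; "(000a)" is skipped because it has no sign.
ERRNO_RE = re.compile(r"\((-\d+)\)|error (-\d+)")


def decode_csts(value: int) -> Dict[str, int]:
    """Decode the low bits of the NVMe Controller Status (CSTS) register."""
    return {
        "RDY": value & 0x1,          # controller ready
        "CFS": (value >> 1) & 0x1,   # controller fatal status
        "SHST": (value >> 2) & 0x3,  # shutdown status
    }


def extract_errno(line: str) -> Optional[int]:
    """Return the negative error code found in a log line, if any."""
    match = ERRNO_RE.search(line)
    if not match:
        return None
    return int(match.group(1) or match.group(2))


def errno_name(code: int) -> str:
    """Map a kernel-style negative error code to its symbolic name."""
    return errno.errorcode.get(abs(code), "unknown")


if __name__ == "__main__":
    # Lines copied from the logs in this thread.
    lines = [
        "nvme nvme0: Device not ready; aborting shutdown, CSTS=0x1",
        "nvme nvme0: Identify Controller failed (-4)",
        "nvme 0001:01:00.0: probe with driver nvme failed with error -5",
    ]
    for line in lines:
        csts = CSTS_RE.search(line)
        if csts:
            print(line, "->", decode_csts(int(csts.group(1), 16)))
        code = extract_errno(line)
        if code is not None:
            print(line, "->", errno_name(code))
```

With the values above, CSTS=0x1 decodes to RDY set with no fatal status flagged, and -4 and -5 map to EINTR and EIO respectively. That fits an admin command timing out and being cancelled rather than the controller reporting a fatal error, though it does not by itself identify the cause.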