WiFi fails to start since recent kernel upgrade

I upgraded my machine a few days ago and now WiFi fails to start on 2 out of 3 cold boots. The kernel seems to initialize something, then the boot process chokes on everything that depends on networking. I have to turn the machine off and back on until it finally starts.

It would get to a prompt eventually if I waited long enough, but then there would be no WiFi and no internet, so I don’t even bother waiting anymore and just reboot when it starts choking.

Does this happen to anyone else?

I’m not sure when the regression occurred exactly: I don’t exactly upgrade the machine religiously (precisely to avoid problems like that when it works well) and I haven’t paid attention to the kernel version number.

At the moment it’s 6.18.10 and the problem is still there.

It might be this known issue: “On RK3588 since 6.18: some wifi and SSD are unreliable without pcie_aspm=off” (issue #55 in Bugs / Bugs on GitLab)

It sounds like it might be that. But the issue says this at the end:

This is a regression since 6.17 and thus probably a bug somewhere. pcie_aspm=off is added by reform-tools until the culprit is found.

I assume this kernel parameter would be added silently during the update process, in which case it might be some other problem, because I upgraded 12 hours ago and it didn’t fix it.

I haven’t uploaded reform-tools with that change yet. It is sitting here:

Which platform/SoM/wifi are you on? I’d like to add another data point and potentially have another tester for when I find the time to bisect the kernel to find the culprit for this.

Platform is Reform classic

SoM is RK3588

WiFi is whatever came with the machine when I ordered it. I haven’t checked and I’m at work right now. “Intel Wi-Fi 6-e Card” is what the order page says - and it says it’s required with the RK3588 if you want WiFi, and I wanted WiFi, so I suppose that’s what I have.

If this is too vague, I’ll check when I get home tonight :slight_smile:

That indeed seems to be the problem fixed by pcie_aspm=off.

That would be appreciated but no rush!

If you like, you could also try to confirm that pcie_aspm=off indeed fixes the problem by running the following as root:

echo "U_BOOT_PARAMETERS=\"\$U_BOOT_PARAMETERS pcie_aspm=off\"" > "/etc/u-boot-menu/conf.d/reform_pcie_fix.conf"
u-boot-update

Then you reboot and your wifi should work. If not, confirm that you have pcie_aspm=off in your /proc/cmdline.
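The check in /proc/cmdline can also be scripted. This is only a minimal sketch, assuming a POSIX shell; the function name `has_kernel_param` is made up for this example and is not part of reform-tools:

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in CMDLINE.
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param pcie_aspm=off; then
    echo "pcie_aspm=off is active"
else
    echo "pcie_aspm=off is missing from the kernel command line"
fi
```

Matching on whole words avoids a false positive from a longer parameter that merely starts with the same text.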

Did that and then did 5 cold boots: no problems. That’s not definitive proof that it works now, but it seems to have done the trick.

$ cat /proc/cmdline 
ro no_console_suspend cryptomgr.notests plymouth.ignore-serial-consoles clk_ignore_unused cma=256M swiotlb=65535 console=ttyS2,1500000 console=tty1 pcie_aspm=off

Out of curiosity, any pro/cons of pcie_aspm=off? I’m almost never on battery and almost always on the wall wart. So if it makes things a bit faster, I might just leave it there permanently even after the bug is fixed.

Thanks for your help!

Thanks a lot for trying this out! I’ve added you as a data point to “On RK3588 since 6.18: some wifi and SSD are unreliable without pcie_aspm=off” (issue #55 in Bugs / Bugs on GitLab)

As far as I’ve understood Lucie on IRC, if ASPM ever worked then your system will now consume a tiny bit more power.
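For anyone curious what the kernel is actually doing with ASPM on their machine, here is a rough sketch. It assumes `lspci` from pciutils is installed and that the kernel exposes the `pcie_aspm` policy in sysfs, which may not be the case on every configuration; `aspm_summary` is a made-up helper name:

```shell
# aspm_summary: read `lspci -vv` output on stdin and keep only the device
# header lines and the lines that mention ASPM.
aspm_summary() {
    grep -E '^[0-9a-f]+:[0-9a-f:.]+ |ASPM'
}

# Active ASPM policy, if the kernel exposes it in sysfs.
cat /sys/module/pcie_aspm/parameters/policy 2>/dev/null \
    || echo "ASPM policy not exposed on this kernel"

# Per-device link state; run as root for complete capability output.
if command -v lspci >/dev/null 2>&1; then
    lspci -vv 2>/dev/null | aspm_summary || echo "no ASPM information found"
fi
```

Comparing the output with and without pcie_aspm=off should show whether ASPM was ever enabled on the affected links.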

Thanks for reporting this! I have had problems with an old Toshiba SSD. The kernel argument seems to work for me as well.

Also RK3588?

Discourse tells me: “You must post at least 20 characters” so here they go, I guess…

Yes, RK3588. (Running Guix. A system roll-back saved me until the solution was clear :slight_smile: )

How did you perform a roll-back if your system didn’t recognize your SSD?

My NVMe SSD is still working, but my WiFi on the Headset/Switch Board 2.0 stopped working after rebooting with the last kernel update to the 6.18.x series. At first there were plenty of these messages in dmesg:

[ 10.802646] ath: phy0: Chip reset failed
[ 10.802649] ath: phy0: Unable to reset channel, reset status -22
[ 10.814515] ath: phy0: DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff

After another reboot, that changed to:

[ 35.066466] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.066490] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.170417] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.170441] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.274401] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.274422] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.378389] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.378410] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.482489] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.482510] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.586406] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.586425] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31
[ 35.690435] pcieport 0003:30:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 35.690453] pci_bus 0003:31: busn_res: [bus 31] end is updated to 31

I had to disconnect that part of the headset/switch board to make it stop, and will probably roll back to 6.17.x in the evening. Setting pcie_aspm=off didn’t help much, though.
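To see at a glance how often each of these messages repeats, the timestamps can be stripped and identical lines counted. A small sketch, assuming the default `[seconds.microseconds]` dmesg timestamp format; `count_kernel_messages` is a made-up name:

```shell
# count_kernel_messages: read kernel log lines on stdin, drop the leading
# "[ 35.066466] " style timestamp, and print each distinct message with the
# number of times it occurred, most frequent first.
count_kernel_messages() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] //' | sort | uniq -c | sort -rn
}

# Example: summarize PCIe and ath messages from the current boot.
# Reading dmesg may require root on some systems.
dmesg 2>/dev/null | grep -E 'pcieport|pci_bus|ath:' | count_kernel_messages || true
```

A message that keeps growing in count between runs points at a device that is being re-probed in a loop, as in the log above.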

It was recognised, but would fail after a couple of minutes. I got lucky and it worked during the roll-back.

A smarter way would have been to boot off an SD card with 6.17, chroot into the SSD and roll back from there!

Rolling back to 6.17 in Guix System, as well as booting the Reform stock image, didn’t change much. The “bridge configuration invalid” errors are now gone on both OSes (I had them on the stock OS too for two boots today), but so is the WiFi card: it isn’t recognized anymore (no related output in dmesg at all). I suspect hardware failure at this point and have to narrow down whether it’s the cable between the headset/switch board and the SoM, the board itself, the WiFi card, etc.
