I had a really crappy night tonight and couldn’t sleep, so I messed around with 6.1.0 to see if I could use minute’s workaround above. Binding a key combo to trigger the output disable and enable commands didn’t work, but adding the function to the standby script did. For me this is fine.
I guess the issue is in the graphics stack, which sounds like fun to debug (note: this is said sarcastically).
Nice to see some progress here. I went the other direction and compiled a 6.3 kernel. I had to rebase some of the patches (I can share my diff if someone is interested). Compilation succeeded (it took 6.5 hours on the Reform) but it hasn’t solved the problems with suspend. There is a small change though: while the NVMe cannot fully wake from sleep, at least the device files are still there after waking up.
Interesting, because on 6.1.0-8 I am 9 and 0 for resume. Just this morning I resumed and the wifi card was not there. I had to go to work so I just suspended it again, but I am curious if the card will be there when I resume again. Something about the PCIe lane not initializing on resume.
Are we aware of a command that we could use to force the thing to try to initialize again? I know on the Librem 5, you can use a command to temporarily shunt power to the USB hub the cellular radio and wifi are connected to as a means to get the radio back. Curious if something is possible here that is similar.
PCIe is supposedly hotpluggable, so there should be some means of triggering a rescan/reset of the bus.
I think these two commands could get the job done. I found that suspending and resuming solved the issue as well.
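For anyone who wants to experiment with the rescan idea: the Linux kernel exposes `remove` and `rescan` files for PCI devices in sysfs. The sketch below wraps them in two small functions. It is a generic illustration and not necessarily the two commands meant above; the function names and the `SYSFS_ROOT` override are made up for this example, and whether a rescan actually brings the Reform’s wifi card back after a bad resume is untested here.

```shell
#!/bin/sh
# Sketch only: generic PCI remove/rescan through sysfs (needs root on a real system).
# SYSFS_ROOT is a hypothetical override so the functions can be tried against
# a scratch directory instead of the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Detach one PCI device, given its full address as printed by `lspci -D`.
pci_remove_device() {
    dev_path="$SYSFS_ROOT/bus/pci/devices/$1"
    if [ ! -e "$dev_path/remove" ]; then
        echo "no such PCI device: $1" >&2
        return 1
    fi
    echo 1 > "$dev_path/remove"
}

# Ask the kernel to re-enumerate all PCI buses.
pci_rescan() {
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

On a real machine this would be run as root, for example `pci_remove_device <address>` followed by `pci_rescan`, with the address taken from `lspci -D`.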
I would be very much interested in seeing this get some attention. I would be willing to put some money into the pot for it as well. I know that it shouldn’t be too hard, because the Librem 5 which uses the same SoC is able to suspend and resume 100%. They had one issue that was keeping it from working but that was more in the phosh lock screen and not an actual hardware glitch.
I’m thinking about approaching the Librem 5 devs and saying, hey we got a pot of money if you can fix suspend on the Reform. hahahahaha!
A small investigation into the suspend problems on the MNT Reform; to be more precise, these seem to be resume problems.
I’m not sure if Purism/Librem 5 found all the solutions: resume has trouble waking up the NVMe (not used in the Librem 5) and the screen - I don’t know how similar the MNT Reform and Librem 5 are in that regard.
The problems didn’t occur on 5.12 and started around 6.0. They still occur on 6.1.20, 6.3 (my experiment) and 6.3.1 (available in Debian experimental right now).
It would be nice if someone could repeat my experiments, to make sure I didn’t make too many mistakes.
I found a few suspicious commits when searching the web.
5e85eba6f50dc288c22083a7e213152bcc4b8208 from 2022-09-13
“PCI/ASPM: Refactor L1 PM Substates Control Register programming”
It was reverted on 2023-02-03 by commit ff209ecc376a2ea8dd106a1f594427a5d94b7dd3. The commit message for the revert says:
Thomas Witt reported that 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.
The main symptom is:
iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, dev>
nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device>
and the machine is only partially usable after resume. It can't run dmesg
and can't do a clean reboot. This happens on every suspend/resume cycle.
Revert 5e85eba6f50d until we can figure out the root cause.
Fixes: 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates Control Register >
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
So while the symptoms are very similar, this revert (present in 6.3, released on 2023-04-23) hasn’t solved them for the MNT Reform.
Another suspicious commit is 4ff116d0d5fd8a025604b0802d93a2d5f4e465d1 from 2022-09-13, related to 5e85eba6f50dc288c22083a7e213152bcc4b8208
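The symptom quoted in that revert message is easy to look for after a resume. Here is a minimal sketch that filters kernel log text for it. It assumes the log on the Reform uses the same wording as the Tuxedo report, which this thread does not confirm, and the function name is made up for this example.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print "<driver> <pci-address>"
# once for every device that failed to go from D3hot back to D0.

d3hot_failures() {
    awk '/Unable to change power state from D3hot to D0/ {
        # The driver name and PCI address are the two fields right before "Unable".
        for (i = 3; i <= NF; i++) {
            if ($i == "Unable") {
                addr = $(i - 1)
                sub(/:$/, "", addr)   # drop the trailing colon after the address
                print $(i - 2), addr
                break
            }
        }
    }' | sort -u
}
```

Typical use would be `dmesg | d3hot_failures` or `journalctl -k -b | d3hot_failures` right after waking the machine. Empty output only means this particular message is absent, not that resume fully worked.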
Again, reverted on the same date as previous one (2023-02-03):
commit a7152be79b627428c628da2a887ca4b2512a78fd
Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
This reverts commit 4ff116d0d5fd8a025604b0802d93a2d5f4e465d1.
Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1. Tasev bisected to a47126ec29f5 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.
Mark saw the same symptoms and bisected to 4ff116d0d5fd ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:
pci_restore_state
  pci_restore_aspm_l1ss_state
    aspm_program_l1ss
      pci_write_config_dword(PCI_L1SS_CTL1, ctl1)         # L1SS restore
  pci_restore_pcie_state
    pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++])  # L1 restore
which is a problem because PCIe r6.0, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM
Substates, both ports must be configured as described in this
section while ASPM L1 is disabled.
Separately, Thomas Witt reported that 5e85eba6f50d ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5fd.
Revert 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f50d
to fix the issue Thomas reported.
Note that reverting 4ff116d0d5fd means L1 Substates config may be lost on
suspend/resume. As far as we know the system will use more power but will
still *work* correctly.
Fixes: 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspen>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
The next potentially relevant commit is 3e347969a5776947a115649dae740a9ed47473f5 from 2022-09-21:
PCI/PM: Reduce D3hot delay with usleep_range()
PCIe r6.0, sec 5.9, requires a 10ms delay between programming a device to
change to or from D3hot and the time the device is next accessed (unless
Readiness Notifications are used).
The 10ms value (PCI_PM_D3HOT_WAIT) doesn't appear directly here because
some chipsets require 120ms for devices *below* them (pci_pm_d3hot_delay)
and some devices require more or less than 10ms (dev->d3hot_delay).
But msleep(10) typically waits about *20*ms, which is more than we need.
Switch to usleep_range() to improve the delay accuracy.
Based on a commit from Sajid in the Pixel 6 kernel tree [1]. On a Pixel 6,
the 10ms delay for the Exynos PCIe device delayed for an average of 19ms.
Switching to usleep_range() decreased the resume time by about 9ms.
I reverted it and built 6.3.1 with that change. Unfortunately it didn’t fix the problems.
I’ll try investigating further. I haven’t yet fully read the issues mentioned in the commit messages. If someone has helpful suggestions, please put them here.
In discussions on the IRC channel, it is suspected that the etnaviv driver is to blame. Perhaps focusing on what changed there would shed some light on this?
Just as an FYI for anyone who happens upon this thread later: kernels 6.4 and 6.5 have restored functional suspend. It works far more reliably now than ever before. I’m at over 200 successful resumes from suspend right now on 6.5.
One other thing I would like to share is that if you have LibreOffice open make sure that no files are open with changes that have not been saved. This prevents a lock being held on the open file that seems to cause the system to crash when resuming. If you just save the file before suspending then it doesn’t cause issues.
If anyone followed @josch’s advice from Standby - Suspend to RAM (MNT Reform) - #105 by josch and installed u-boot-menu, a few words of warning.
First: it does not generate a menu on installation, but only when a new kernel is installed.
Second: you will have 2 files, /boot/boot.scr (as in the original configuration, generated by flash-kernel) and /boot/extlinux/extlinux.conf. u-boot considers the latter more important than boot.scr, so the boot process will be managed by extlinux.conf.
Third, related to the second: if you introduced a non-standard boot configuration (in my case, using a btrfs subvolume) you’ll need to put those settings into /etc/default/u-boot, as /etc/default/flash-kernel is ignored by u-boot-menu.
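To check which of the two files is in effect on a given machine, something like the following can help. It is only a sketch of the precedence described in the points above (extlinux.conf wins over boot.scr); the function name and the optional directory argument are made up for this example, and the function inspects files only, it does not ask u-boot itself.

```shell
#!/bin/sh
# Sketch: report which boot configuration would be used, following the
# precedence described in the post (extlinux.conf before boot.scr).

active_boot_config() {
    boot_dir="${1:-/boot}"
    if [ -f "$boot_dir/extlinux/extlinux.conf" ]; then
        echo "$boot_dir/extlinux/extlinux.conf"
    elif [ -f "$boot_dir/boot.scr" ]; then
        echo "$boot_dir/boot.scr"
    else
        echo "no boot configuration found in $boot_dir" >&2
        return 1
    fi
}
```

Calling `active_boot_config` without arguments looks in /boot. If it prints the extlinux.conf path, then settings placed only in /etc/default/flash-kernel will not reach the boot process.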
This is reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1053814
I submitted this patch to u-boot-menu to fix it: “debian/u-boot-menu.postinst: create extlinux.conf on initial installation” (commit fba470ec in Debian / u-boot-menu on GitLab)
@vagrantc uploaded a new version of u-boot-menu with this fix applied two days ago. Maybe you are still on the old version? I hope my fix did the right thing and it’s not still broken somehow?
Thanks for the info. I updated my machine on Monday, so I still have the old u-boot-menu. I’ll try to update, or remove and (re-)install, to check how it behaves now.
Just want to report that I have successfully suspended and resumed 93 times now and it’s still going strong. It has been 3 months and 12 days since my Reform was last restarted.
I just upgraded to 6.6.13, and for me there is no change: the MNT Reform wakes up from suspend, but the NVMe cannot be started.
On the same occasion I experimented with u-boot-menu. The new version creates the menu after installation, without the need to (re-)install a kernel. However, after its removal (apt remove, without purge) one cannot install (or remove) kernels. The scripts in /etc/kernel/post{inst,rm}.d are left behind (they are configuration files, so only purge gets rid of them) but their execution fails because the rest of the package is gone.
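A quick way to see whether a package is in that “removed but configuration files left” state is dpkg’s status abbreviation, where `rc` means removed with conffiles remaining. The sketch below only filters such lines from stdin; the function name is made up for this example.

```shell
#!/bin/sh
# Sketch: list packages that are removed but still have conffiles on disk.
# Input on stdin: one "<status-abbrev> <package>" pair per line.

removed_with_conffiles() {
    # "rc" = desired state "remove", current state "config-files".
    awk '$1 == "rc" { print $2 }'
}
```

It can be fed with `dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | removed_with_conffiles`. If u-boot-menu shows up in the output, its hook scripts under /etc/kernel are still present even though the rest of the package is gone.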
Note that when you observe a bug with a package in Debian, please file it with the Debian bugtracker instead of using the forum. If it concerns the packages around systimage-v4, feel free to put my mail into the X-Debbugs-Cc pseudo header when filing the bug.
You say that you cannot install or remove kernels anymore. Can you show the actual error messages? Scripts in /etc are left after package removal by design, as /etc contains your user configuration. Execution should not fail as the scripts look like this:
if [ -x /usr/sbin/u-boot-update ]
then
    # Update extlinux configuration
    u-boot-update
fi
So if you don’t have u-boot-menu installed, then there is no /usr/sbin/u-boot-update and the scripts just do nothing. What’s the error you see?
6.5.0-1 is what I am using, and suspend has been flawless. Not sure about the new kernels. I haven’t been using my L5 but I believe they are still on 6.4. Perhaps the kernel you are using is too new and has reintroduced the issue.
I reported the bug in the BTS: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1061687
TL;DR: zz-u-boot-menu tries to read the default configuration from /usr/share/u-boot-menu, which got removed, while the scripts in /etc/kernel were left behind as conffiles.
My problems with resume started long before 6.6.13: it did not work at all with any 6.X kernel, including 6.5.0-1. I suspect my NVMe is more sensitive to some settings or to the timing of events during resume.
What NVMe drive do you have? I am using the Transcend 1TB from MNT and it has been working with suspend since the beginning. Nearly flawless under 6.5.
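To illustrate the failure mode, here is a sketch of a guard that a kernel hook could use to check that both the updater and the packaged defaults are still present before doing anything. This is not the fix applied in Debian (see the bug report for that); the function name and the overridable arguments are made up for this example, and the two default paths are simply the ones named in this thread.

```shell
#!/bin/sh
# Sketch: decide whether a kernel hook should call u-boot-update at all.
# Both paths can be overridden for experiments.

u_boot_menu_usable() {
    updater="${1:-/usr/sbin/u-boot-update}"
    defaults_path="${2:-/usr/share/u-boot-menu}"
    # After "apt remove" only the conffiles under /etc remain, so both the
    # updater and the packaged defaults have to be checked.
    [ -x "$updater" ] && [ -e "$defaults_path" ]
}
```

A hook would then do `if u_boot_menu_usable; then u-boot-update; fi` and silently do nothing once the package has been removed.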