Let's get Hibernation working

For me, the last rough edge on the Reform is the lack of hibernation. With some people able to suspend and others not, hibernation seems like natural low-hanging fruit that would help everyone get the most out of their Reform, especially when away from mains.

Should we do a bounty? I would be happy to contribute to something like that. Of course if someone is more familiar with this and can point me in the right direction, I’m not above trying to figure this out myself.

3 Likes

If you set something up I’d be happy to contribute.

1 Like

Honestly, I find hibernate way more useful than sleep/suspend so I put some effort into getting mine to hibernate last night.

I added an extra (unencrypted) swap on /dev/mmcblk1p1 as a test, just to get encryption and LVM out of the picture, and set it up as the resume target in the kernel cmdline.

One thing we have to fix is systemd flat out refuses to even try. It says Failed to hibernate system via logind: Sleep verb "hibernate" not supported if you do systemctl hibernate.

So I tried to do it via sysfs like it says in the Linux kernel docs: Debugging hibernation and suspend — The Linux Kernel documentation

Doing echo disk | sudo tee /sys/power/state leaves the machine unresponsive, powered on with a black screen. Maybe there’s something useful on the serial console, but I haven’t opened it to hook one up to check.

Trying the various pm_test modes, it gets past freezer and devices, but platform, processors, or core will leave it with a blank screen needing a reset.
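For anyone who wants to repeat the pm_test experiment, here is a minimal sketch of stepping through the test levels. The sysfs files (/sys/power/pm_test and /sys/power/state) are the ones from the kernel docs linked above; the function names and the PM_DIR override are made up for illustration, and PM_DIR exists only so the parsing can be tried against a copy of the file. On real hardware this needs root, and as described above the deeper levels may leave the machine needing a reset.

```shell
#!/bin/sh
# Sketch only: step through the kernel's hibernation test levels.
# PM_DIR is a hypothetical override so the logic can be tried on a scratch copy.
PM_DIR="${PM_DIR:-/sys/power}"

# List the test levels the running kernel offers, one per line, without "none".
# The file looks like: [none] core processors platform devices freezer
pm_test_levels() {
    tr -d '[]' < "$PM_DIR/pm_test" | tr ' ' '\n' | grep -v -e '^none$' -e '^$'
}

# Run one test hibernation at the given level, then switch test mode off again.
# Needs root on a real system; the machine may hang at the deeper levels.
run_pm_test() {
    echo "$1" > "$PM_DIR/pm_test" &&
        echo disk > "$PM_DIR/state"
    status=$?
    echo none > "$PM_DIR/pm_test"
    return "$status"
}
```

Starting with run_pm_test freezer and working towards core one level at a time, with a serial console attached, should show which stage is the first to fail.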

2 Likes

I wonder how much we’re in uncharted waters with this. The vast majority of info about hibernation on Linux I’ve found is concerning PC things like ACPI/BIOS/UEFI issues, “Secure” Boot, grub, etc. which don’t apply to the Reform.

1 Like

My understanding was that hibernation was kind of hardware agnostic, in that the kernel is the one that has to react to the hibernation status and not the BIOS. As long as the kernel can read a “flag” that it should be booting using the hibernation swap space, then the underlying hardware shouldn’t matter. At least as far as I understand things.

1 Like

It is and it isn’t agnostic. The hardware doesn’t need to explicitly support it, but there are things that have to work with the hardware for it to be successful.

I finally got a serial console on my Reform during a hibernate test and this is where it gets stuck. Wondering if that stuff about the eDP bridge is the culprit of it coming back to a black screen and being stuck?

echo processors > /sys/power/pm_test
echo disk > /sys/power/state

[  129.899475] PM: hibernation: hibernation entry
[  130.013260] (NULL device *): firmware: direct-loading firmware regulatory.db
[  130.013260] (NULL device *): firmware: direct-loading firmware regulatory.db.p7s
[  130.039324] Filesystems sync: 0.009 seconds
[  130.043720] Freezing user space processes
[  130.050399] Freezing user space processes completed (elapsed 0.002 seconds)
[  130.057505] OOM killer disabled.
[  130.060905] PM: hibernation: Preallocating image memory
[  132.593501] PM: hibernation: Allocated 243710 pages for snapshot
[  132.599635] PM: hibernation: Allocated 974840 kbytes in 2.52 seconds (386.84 MB/s)
[  132.607300] Freezing remaining freezable tasks
[  132.613261] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  132.621626] wlp1s0: deauthenticating from XX:XX:XX:XX:XX:XX by local choice (Reason: 3=DEAUTH_LEAVING)
[  132.658961] DEBUG: ti_sn65dsi86_suspend skipped.
[  132.715519] DEBUG: ti_sn_bridge_atomic_disable skipped.
[  132.720790] DEBUG: ti_sn_bridge_atomic_post_disable skipped.
[  132.729329] Disabling non-boot CPUs ...
[  132.734969] psci: CPU1 killed (polled 0 ms)
[  132.742201] psci: CPU2 killed (polled 0 ms)
[  132.747573] psci: CPU3 killed (polled 0 ms)
[  132.752401] PM: hibernation: debug: Waiting for 5 seconds.
[  137.776024] Enabling non-boot CPUs ...
[  137.780450] Detected VIPT I-cache on CPU1
[  137.780488] GICv3: CPU1: found redistributor 1 region 0:0x00000000388a0000
[  137.780543] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[  137.781087] CPU1 is up
[  137.801480] Detected VIPT I-cache on CPU2
[  137.801506] GICv3: CPU2: found redistributor 2 region 0:0x00000000388c0000
[  137.801541] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[  137.801953] CPU2 is up
[  137.822350] Detected VIPT I-cache on CPU3
[  137.822375] GICv3: CPU3: found redistributor 3 region 0:0x00000000388e0000
[  137.822410] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[  137.822865] CPU3 is up

Or… it might be hantro_vpu and the WiFi’s fault.
The reform-standby script takes care of this for suspend. It seems to fix that for hibernate as well. I’m getting past that and it blows up with swapper crashing now. Interesting…

[  591.475655] nvme 0001:01:00.0: Unable to change power state from unknown to D0, device inaccessible
[  591.870335] irq 191: nobody cared (try booting with the "irqpoll" option)
[  591.877206] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        WC O       6.5.0-2-reform2-arm64 #1  Debian 6.5.6-1+reform20231010T093700Z1
[  591.889492] Hardware name: MNT Reform 2 HDMI (DT)
[  591.894242] Call trace:
[  591.896714]  dump_backtrace+0x9c/0x128
[  591.900509]  show_stack+0x20/0x38
[  591.903860]  dump_stack_lvl+0x48/0x60
[  591.907563]  dump_stack+0x18/0x28
[  591.910913]  __report_bad_irq+0x40/0x130
[  591.914881]  note_interrupt+0x318/0x370
[  591.918759]  handle_irq_event+0xe0/0x100
[  591.922725]  handle_fasteoi_irq+0xb8/0x220
[  591.926863]  generic_handle_domain_irq+0x34/0x58
[  591.931529]  gic_handle_irq+0x58/0x134
[  591.935317]  call_on_irq_stack+0x24/0x58
[  591.939281]  do_interrupt_handler+0x88/0x98
[  591.943509]  el1_interrupt+0x34/0x58
[  591.947124]  el1h_64_irq_handler+0x18/0x28
[  591.951264]  el1h_64_irq+0x64/0x68
[  591.954701]  default_idle_call+0x54/0x100
[  591.958754]  do_idle+0x214/0x278
[  591.962019]  cpu_startup_entry+0x3c/0x50
[  591.965985]  rest_init+0xd0/0xd8
[  591.969249]  arch_call_rest_init+0x18/0x20
[  591.973390]  start_kernel+0x558/0x6d0
[  591.977091]  __primary_switched+0xbc/0xd0
[  591.981143] handlers:
[  591.983439] [<00000000d61e6bc4>] pcie_pme_irq
[  591.987846] [<00000000bac4dcd5>] nvme_irq [nvme]
[  591.992522] Disabling IRQ #191
1 Like

I really appreciate your efforts here, and I feel like you are on the right path for sure!

Yeah, too bad systemd is so inscrutable. It doesn’t tell you why when it tells you no.

root@reform:~# systemctl hibernate
Call to Hibernate failed: Not enough swap space for hibernation
root@reform:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            3930         620        3056           0         402        3310
Swap:          29662           0       29662

Apparently 29662 < 3930 in systemd land. :man_shrugging:

1 Like

Thank you systemd… it will do this if your swap device major:minor does not match what is in /sys/power/resume.

Worth noting, that gets set automatically if you have resume= in the kernel cmdline.
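A quick way to check for that mismatch is to compare the major:minor of each active swap partition with the contents of /sys/power/resume. The following is a rough sketch rather than anything systemd itself does: the function names and the PROC_SWAPS and RESUME_FILE overrides are hypothetical, and it only looks at swap partitions, not swap files.

```shell
#!/bin/sh
# Sketch only: compare active swap partitions against the kernel's resume device.
# PROC_SWAPS and RESUME_FILE are hypothetical overrides for trying the logic
# without touching the real system.
PROC_SWAPS="${PROC_SWAPS:-/proc/swaps}"
RESUME_FILE="${RESUME_FILE:-/sys/power/resume}"

# Print the device path of every active swap partition (swap files are skipped).
swap_devices() {
    awk 'NR > 1 && $2 == "partition" { print $1 }' "$PROC_SWAPS"
}

# Print major:minor for a block device, e.g. 179:1.
dev_majmin() {
    lsblk -dno MAJ:MIN "$1" | tr -d ' '
}

# Succeed if the given major:minor is what the kernel will resume from.
matches_resume() {
    [ "$(tr -d ' \n' < "$RESUME_FILE")" = "$1" ]
}

# Example use on a live system:
#   for dev in $(swap_devices); do
#       matches_resume "$(dev_majmin "$dev")" && echo "$dev is the resume device"
#   done
```

If nothing is reported as the resume device, that would be consistent with the "Not enough swap space" message above, whatever free -m shows.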

Where I am so far:

  1. Add a swap partition to the SD card and configure it in /etc/fstab
  2. Teach systemd that it is OK to try and hibernate by adding a file in /usr/lib/systemd/sleep.conf.d/
  3. Tell the kernel where to find the hibernate image by adding a file to /usr/share/flash-kernel/ubootenv.d/ to append resume= to bootargs and running flash-kernel
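The three steps above could be sketched roughly as follows. The directories are the ones named in the list; everything else is an assumption, since the exact file names and contents were not posted: the names hibernate.conf and 50-resume, the AllowHibernation=yes setting, and the setenv line are guesses at what such files would contain. The sketch writes into a scratch directory so the result can be reviewed before anything is copied to the live system and flash-kernel is run.

```shell
#!/bin/sh
# Sketch only: stage the three pieces of configuration under a scratch directory.
# File names (fstab.hibernate-swap, hibernate.conf, 50-resume) and file contents
# are assumptions, not taken from the thread.
stage_hibernate_config() {
    dest="$1"      # scratch root, e.g. ./staging
    swapdev="$2"   # swap partition, e.g. /dev/mmcblk1p1

    mkdir -p "$dest/etc" \
             "$dest/usr/lib/systemd/sleep.conf.d" \
             "$dest/usr/share/flash-kernel/ubootenv.d"

    # 1. Line to append to /etc/fstab for the swap partition.
    printf '%s none swap sw 0 0\n' "$swapdev" > "$dest/etc/fstab.hibernate-swap"

    # 2. systemd drop-in that allows the hibernate verb.
    printf '[Sleep]\nAllowHibernation=yes\n' \
        > "$dest/usr/lib/systemd/sleep.conf.d/hibernate.conf"

    # 3. U-Boot snippet appending resume= to bootargs (syntax is an assumption).
    printf 'setenv bootargs "${bootargs} resume=%s"\n' "$swapdev" \
        > "$dest/usr/share/flash-kernel/ubootenv.d/50-resume"
}
```

For example, stage_hibernate_config ./staging /dev/mmcblk1p1 leaves three small files under ./staging to inspect.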

At this point, systemctl hibernate will work and it will write memory out to the swap partition and the machine will sort of halfway turn off. If you power it down and reboot, it will restore from the hibernate, load something like what you were doing before it hibernated, then hang while trying to initialize something.
The nvme and qoriq_thermal will both be very upset. rmmod qoriq_thermal before hibernation makes the thermal error spam go away, but it still hangs.
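Unloading qoriq_thermal by hand could in principle be automated with a systemd system-sleep hook. This is only a sketch under assumptions: the hook location (/usr/lib/systemd/system-sleep/), the function name, and the MODULES and RUN variables are illustrative, and reloading the module after resume is untested here since the resume itself still hangs.

```shell
#!/bin/sh
# Sketch only: a systemd system-sleep style hook that unloads modules before
# hibernating and reloads them afterwards. systemd calls such hooks with
# $1 = pre|post and $2 = the sleep verb (suspend, hibernate, ...).
MODULES="${MODULES:-qoriq_thermal}"   # space-separated; extend as needed
RUN="${RUN:-}"                        # set RUN=echo for a dry run

hibernate_hook() {
    # Only act on hibernate; leave suspend to reform-standby.
    [ "$2" = "hibernate" ] || return 0
    for m in $MODULES; do
        case "$1" in
            pre)  $RUN modprobe -r "$m" ;;
            post) $RUN modprobe "$m" ;;
        esac
    done
}

# When installed as a hook, the script would end with:
#   hibernate_hook "$1" "$2"
```

Running it with RUN=echo first shows which modprobe commands would be issued without touching any modules.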

4 Likes

Hmmm, I wonder what the problem is. Any ideas, or thoughts on a possible path forward? We seem close.

Close to what, I’m not sure.
Have you (or anyone else) tried it yet?

1 Like

I have not tried it yet. I would like to, but need time to digest it all and get it set up.

I think hibernation would be awesome for the Reform even if it took a long time to resume from it. I just like being able to shut down and save power, but when ready, use things exactly how I had them set up.

1 Like

Sorry for resurrecting this topic - but I decided to give suspend and hibernation a try. Both experiments on iMX8 (original board with 4GB of RAM, no Pro), no NVMe, OS run from uSD card. Debian, but my variant (on btrfs, no encryption) - not MNT OS.

Suspend and then wake up worked. It was just one try - but I’ll be checking this during the next days.
Hibernate worked - but resume from that state failed. I have messages like

ti_sn_bridge_atomic_disable skipped
ti_sn_bridge_atomic_post_disable skipped
ti_sn65dsi86_suspend skipped

Those are displayed (they appeared just before hibernation) and then nothing happens: the computer does not react to the keyboard, not even to Ctrl-Alt-Del. After 15 minutes I used Circle-0 to turn off my machine.
I’ll try to debug it a bit more over the next few days (e.g. disable quiet mode from u-boot to kernel) to get more details.

2 Likes

Your efforts are greatly appreciated! Hibernation on the Reform and Pocket Reform would be legendary!

1 Like

I am looking forward to this as a big quality-of-life thing.

Many thanks

1 Like

A few more tests, this time on A311D. Suspend did not work: the screen went dark, but after about 10-15s the backlight was activated again. Unfortunately nothing was displayed, so I could not check what the main problem was.
OTOH hibernation worked. I was able to bring the system back to the same state it was in when I hibernated it. The biggest problem I noticed was that the NVMe could not be started: dmesg was showing timeout errors, and trying to read the partition table (i.e. the first sector) led to the process hanging in “D” state. I could not shut down the OS and had to use Circle-0. After a fresh start the NVMe was working again.
I think I’ll move the info about suspend to a separate, more appropriate thread and keep the info about hibernation here. At the same time, I need to think about a testing strategy that avoids the complexity of changing too much at once.
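For spotting processes stuck in “D” state after a resume like the one described above, a small filter over ps output is enough. This is a sketch; the function name is made up, and it assumes the procps form ps -eo stat=,pid=,comm= for its input.

```shell
#!/bin/sh
# Sketch only: list processes in uninterruptible sleep ("D" state).
# Expects lines of "STAT PID COMM" on stdin and prints "PID COMM".
d_state_filter() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage on a live system:
#   ps -eo stat=,pid=,comm= | d_state_filter
```

Anything listed there right after touching the NVMe is a hint that the drive did not come back properly.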

1 Like

that sounds really cool

Work on different SBCs might give us clues for the others, but in general I would expect it all to be a little different, and what works on one might not necessarily work on another.

The issue you noticed with the NVMe not coming back is something that plagued the IMX8 for suspend purposes as well. My guess is that systemd is not properly shutting the NVMe drive down, which leaves the drive in a hung state.

Thank you very much for your efforts and sharing them here!