[RESPONDED] Arch hibernation woes on AMD 13

I’m in the same place as you. Both CachyOS (with either the Arch or the CachyOS kernel) and NixOS fail to hibernate/resume properly on a 6.11 kernel, but resume fine on a 6.10 kernel.

To elaborate: I’m using a swapfile on a LUKS-encrypted btrfs partition and the systemd initrd instead of busybox, and I let systemd detect and write the HibernateLocation EFI variable (so no resume= or resume_offset= kernel parameters).
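Relying on systemd to write HibernateLocation only works if systemd is new enough. As far as I know that support arrived in systemd 255, so treat the threshold below as an assumption; `parse_systemd_major` and `supports_hibernate_location` are made-up helper names for illustration.

```bash
#!/usr/bin/env bash
# Sketch: check whether the installed systemd is recent enough to manage the
# HibernateLocation EFI variable itself. The 255 threshold is an assumption.

# Read `systemctl --version` output on stdin and print the major version,
# e.g. "systemd 256 (256.7-1-arch)" -> "256".
parse_systemd_major() {
  awk 'NR == 1 { print $2; exit }'
}

# Succeed if the major version given as $1 is at least 255.
supports_hibernate_location() {
  [ "$1" -ge 255 ]
}

# On a real system:
#   major=$(systemctl --version | parse_systemd_major)
#   supports_hibernate_location "$major" && echo "systemd $major should handle it"
```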

Hmm, I too am using systemd-boot and LUKS-encrypted btrfs.

With test_resume (which apparently bypasses the firmware), I actually see my pre-hibernate screen, but I can’t interact with the system.

I’m using ext4 and LVM on LUKS with a swapfile and GRUB. Hibernation works very reliably here on 6.11.1. Maybe systemd-boot causes this problem?

I wonder whether test_resume skips systemd-boot. I thought it just reuses the running kernel and tests the cycle:

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state
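If you want to make sure the kernel actually offers test_resume before running the two commands above, a small guard helps. This is just a sketch: `run_test_resume` is a made-up helper, and the directory argument exists only so the logic can be tried on a scratch directory instead of the real /sys/power.

```bash
#!/usr/bin/env bash
# Sketch: trigger a test_resume hibernation only if the kernel offers that mode.
# On a real system the directory is /sys/power and the writes need root.

run_test_resume() {
  local power=${1:-/sys/power}
  # /sys/power/disk lists the available modes with the current one in brackets,
  # e.g. "[platform] shutdown reboot suspend test_resume".
  if ! grep -qw test_resume "$power/disk"; then
    echo "test_resume not offered by this kernel" >&2
    return 1
  fi
  echo test_resume > "$power/disk"
  echo disk > "$power/state"
}
```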

This fails for me on 6.11.1. Here is my kernel cmdline (with line breaks so it’s easier to read):

rd.luks.name=<redacted>=archroot
rd.luks.options=<redacted>=tpm2-device=auto
root=/dev/mapper/archroot
rootflags=subvol=@archroot
rw
quiet
resume=UUID=<redacted>
rd.luks.uuid=<redacted>
rd.luks.options=<redacted>=tpm2-device=auto
splash
vt.global_cursor_default=0
video=efifb:nobgrt
udev.children-max=1000
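To get the same one-parameter-per-line view straight from the running kernel, and to spot keys that appear more than once (rd.luks.options shows up twice in the list above), something like this should do. It’s a sketch with made-up function names, and the whitespace split is naive, so quoted values containing spaces will be broken up.

```bash
#!/usr/bin/env bash
# Sketch: print the kernel command line one parameter per line and list
# parameter names that appear more than once. The optional argument is a
# file to read instead of /proc/cmdline (handy for testing).

split_cmdline() {
  tr -s ' ' '\n' < "${1:-/proc/cmdline}"
}

# Repeated keys are not necessarily wrong (e.g. one rd.luks.options= per UUID),
# but they are worth a second look.
repeated_cmdline_keys() {
  split_cmdline "$1" | cut -d= -f1 | sort | uniq -d
}
```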

I have none of the rd.luks/resume bits in my cmdline, since recent versions of systemd can handle that (see /etc/crypttab.initramfs for preparing the LUKS device, and the systemd-hibernate-resume(8) Arch manual page for how systemd sets the resume location automatically).
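When systemd sets the resume location for you, it’s still handy to see what the kernel ends up with. /sys/power/resume holds the MAJ:MIN of the swap device and /sys/power/resume_offset the page offset (only meaningful for a swapfile); as far as I understand they read 0:0 and 0 until something sets them. `show_resume_target` is a hypothetical helper, and the directory argument is only there for testing.

```bash
#!/usr/bin/env bash
# Sketch: show the hibernation target the kernel currently has configured.
# Reads /sys/power/resume (MAJ:MIN) and /sys/power/resume_offset (pages).

show_resume_target() {
  local power=${1:-/sys/power}
  printf 'resume=%s offset=%s\n' \
    "$(cat "$power/resume")" "$(cat "$power/resume_offset")"
}

# Map MAJ:MIN back to a device name with e.g. `lsblk -o NAME,MAJ:MIN`.
```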

I notice you don’t have resume_offset in there; does that mean you’re using a swap partition? I’m using a swapfile.
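For reference, with a swapfile resume_offset is counted in pages: the file’s physical start divided by the page size. On btrfs, recent btrfs-progs (6.1 or later, as far as I know) can print the value directly with `btrfs inspect-internal map-swapfile -r`. The helper name and the swapfile path below are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: work out resume_offset= for a swapfile.
# resume_offset = physical byte offset of the file / page size.

# $1 = physical byte offset, $2 = page size (defaults to this machine's).
resume_offset_from_physical() {
  local physical=$1
  local page_size=${2:-$(getconf PAGESIZE)}
  echo $(( physical / page_size ))
}

# Preferred on btrfs (needs root; path is a placeholder):
#   btrfs inspect-internal map-swapfile -r /swap/swapfile
```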

Some of this sounds like the thread “[SOLVED] Using Linux QEMU/KVM causes s2idle hard freeze on Arch - Linux 6.10.8”, which is a systemd/kernel bug.

It doesn’t look like the same thing: I tried the systemd drop-ins mentioned in that thread and still get a black screen when resuming from sleep.

Now, the fact that that thread reckons they fixed the issue in 6.11-rc7 makes me wonder whether that fix is the source of the regression we’re seeing.

Yes, I am using a swap partition (encrypted with the keys in a TPM)

Some more attempts to narrow things down:

Tried my NixOS config on my old 11th-gen Intel Zenbook (minus the Framework/AMD-specific bits). That resumed with no issues.

Tried udev-based HOOKS=; resume didn’t work.
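For anyone trying the same switch on an Arch-based system: /etc/mkinitcpio.conf is sourced by bash, so HOOKS is a bash array, and with the busybox/udev initramfs the `resume` hook has to come after `udev` and after whatever unlocks the swap (`encrypt` here). As I understand it, the systemd-based initramfs needs no separate resume hook. The hook list below is an assumption (it also assumes the swap lives inside the container `encrypt` unlocks), and `hook_pos`/`hook_before` are made-up helpers for checking the order.

```bash
#!/usr/bin/env bash
# Sketch of a busybox/udev-based HOOKS= line for LUKS plus hibernation.
# The exact list is an assumption; what matters is the ordering.
HOOKS=(base udev autodetect modconf keyboard block encrypt filesystems resume fsck)

# Print the index of hook $1 in HOOKS, or -1 if it is absent.
hook_pos() {
  local i
  for i in "${!HOOKS[@]}"; do
    if [[ ${HOOKS[i]} == "$1" ]]; then
      echo "$i"
      return
    fi
  done
  echo -1
}

# Succeed if hook $1 appears before hook $2.
hook_before() {
  local a b
  a=$(hook_pos "$1")
  b=$(hook_pos "$2")
  (( a >= 0 && b >= 0 && a < b ))
}
```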

Just tested 6.11.2 and I see the same behaviour. Using 6.10.12 for now.


This is also what I was experiencing, and downgrading to 6.10 worked for me as well.

Anyone have any idea how to debug this? Or which kernel mailing list to post this to? I’ve built many kernels in my lifetime so I’m more than happy to help debug.
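Since 6.10.x resumes and 6.11.x doesn’t, bisecting mainline between the two tags (`git bisect start v6.11 v6.10`) is the classic way to pin down the commit, and that result is probably what the linux-pm list or the kernel bugzilla would want. Before that, the kernel’s pm_test facility (CONFIG_PM_DEBUG, described in the kernel’s basic-pm-debugging document) can show which hibernation phase breaks: with a level set, writing disk to /sys/power/state runs only up to that phase, waits a few seconds, and backs out. `set_pm_test` is a hypothetical helper; the directory argument is only for testing.

```bash
#!/usr/bin/env bash
# Sketch: pick a hibernation test level via /sys/power/pm_test before
# writing "disk" to /sys/power/state. Needs CONFIG_PM_DEBUG and root.
# Levels, shallowest first: freezer devices platform processors core.

set_pm_test() {
  local level=$1 power=${2:-/sys/power}
  # pm_test lists the supported levels with the current one in brackets,
  # e.g. "[none] core processors platform devices freezer".
  if ! grep -qwF -- "$level" "$power/pm_test"; then
    echo "unknown pm_test level: $level" >&2
    return 1
  fi
  echo "$level" > "$power/pm_test"
}

# Typical session (as root), stepping deeper until it breaks:
#   set_pm_test freezer && echo disk > /sys/power/state
#   set_pm_test devices && echo disk > /sys/power/state
#   set_pm_test none    # back to normal behaviour
```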

6.11.3 had some interesting patch notes, so I gave it a try. With both the Arch build in core-testing and the cachyos-znver4 build from CachyOS, I get exactly one successful resume. After that, it’s a black screen every time, including after a fresh hibernate.
I’ve also seen that same behaviour once after stopping fprintd.service before hibernating, but again, I’m completely unable to reproduce it.
FWIW, Ubuntu 24.10 (kernel 6.11.0) also freezes on resume from hibernate for me. I’ve never had that working, though, so I can’t rule out that I just set it up wrong.

I did two test hibernates

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

in a row, and it seems to work on Arch 6.11.3. Going to try a real one now.

Edit: Did an actual systemctl hibernate on 6.11.3 and it did not come back up. I’ll just revert rather than test it further.

OK, so it seems like 6.11.3 in combination with the mt7921e Wi-Fi script above made hibernate work. It was probably two separate issues.

Hibernate was working fine for me, but it broke recently. I booted with no_console_suspend and captured the following crash. It only showed up on the kernel virtual console, not in the journal, so here is the OCR of an enhanced photo of the screen, for convenience:


[ 146.054889] Workqueue: async async_run_entry_fn
[ 146.054908] RIP: 0010:hci_unregister_dev+0x45/0x1f0 [bluetooth]
[ 146.054962] Code: 89 ef e8 ae 03 8b d2 f0 80 8b e9 De 00 00 08 48 89 ef e8 0e f1 8a d2 48 c7 c7 68 20 c4 c1 e8 22 52 8b d2 48 8b 43 08 48 8b 13 <48> 3b 18 0f 85 b5 c7 06 00 48 3b 5a 08 0f 85 ab c7 06 00 48 89 42
[ 146.054984] RSP: 0018:ffffbba241defcc8 EFLAGS: 00010246
[ 146.054996] RAX: dead000000000122 RBX: ffffa06a81276000 RCX: 0000000000000000
[ 146.055007] RDX: dead000000000100 RSI: ffffa06a81a47910 RDI: ffffffffc1c42068
[ 146.055018] RBP: ffffa06a812764d0 R08: 0000000000000000 R09: ffffa06a801d9610
[ 146.055028] R10: ffffbba241defcd0 R11: ffffbba241defcd8 R12: ffffa06a81276000
[ 146.055038] R13: ffffffffc2323278 R14: ffffffffc2323278 R15: ffffa06a8f9f0850
[ 146.055050] FS: 0000000000000000(0000) GS:ffffa07960000000(0000) knlGS:0000000000000000
[ 146.055062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 146.055072] CR2: 0000000000000000 CR3: 000000070ba22000 CR4: 0000000000f50ef0
[ 146.055083] PKRU: 55555554
[ 146.055090] Call Trace:
[ 146.055099] <TASK>
[ 146.055107] ? __die_body.cold+0x19/0x27
[ 146.055122] ? die_addr+0x3c/0x60
[ 146.055134] ? exc_general_protection+0x17d/0x400
[ 146.055147] ? ep_poll_callback+0x24d/0x2a0
[ 146.055164] ? asm_exc_general_protection+0x26/0x30
[ 146.055184] ? hci_unregister_dev+0x45/0x1f0 [bluetooth 1400000003000000474e5500314a936b2959fa34]
[ 146.055238] ? hci_unregister_dev+0x3e/0x1f0 [bluetooth 1400000003000000474e5500314a936b2959fa34]
[ 146.055288] btusb_disconnect+0x67/0x170 [btusb 1400000003000000474e55007dd0b46154bf4bec]
[ 146.055307] usb_unbind_interface+0x90/0x290
[ 146.055325] device_release_driver_internal+0x19c/0x200
[ 146.055341] usb_forced_unbind_intf+0x75/0xb0
[ 146.055354] unbind_marked_interfaces.isra.0+0x59/0x80
[ 146.055368] ? __pfx_usb_dev_restore+0x10/0x10
[ 146.055381] usb_resume+0x5a/0x60
[ 146.055392] dpm_run_callback+0x47/0x150
[ 146.055407] device_resume+0xb0/0x280
[ 146.055419] async_resume+0x1d/0x30
[ 146.055431] async_run_entry_fn+0x31/0x140
[ 146.055444] process_one_work+0x17b/0x330
[ 147.094397] [drm] ring gfx_32772.1.1 was added
[ 147.096995] [drm] ring compute_32772.2.2 was added
[ 147.098944] [drm] ring sdma_32772.3.3 was added
[ 147.102569] [drm] ring gfx_32772.1.1 ib test pass
[ 147.106131] [drm] ring compute_32772.2.2 ib test pass
[ 147.108302] [drm] ring sdma_32772.3.3 ib test pass
[ 147.136203] usb 1-4.3: reset full-speed USB device number 8 using xhci_hcd
[ 147.238067] usb 1-4.3: unable to get BOS descriptor set
[ 148.449222] mt7921e 0000:04:00.0: Message 00020007 (seq 4) timeout
[ 148.451497] mt7921e 0000:04:00.0: PM: dpm_run_callback(): pci_pm_restore returns -110
[ 148.454598] mt7921e 0000:04:00.0: PM: failed to restore async: error -110
[ 148.534371] mt7921e 0000:04:00.0: HW/SW Version: 0x8a108a10, Build Time: 20240716163242a
[ 148.534371]
[ 148.912088] mt7921e 0000:04:00.0: WM Firmware Version: ____000000, Build Time: 20240716163327

Does anyone know what’s going on here? It looks like it might be related to the mt7921e driver, Bluetooth, and USB.

A few additional notes:

  • My laptop is a Framework 16
  • Without no_console_suspend, the screen shows a frozen picture of whatever I was doing before it hibernated
  • If I put the laptop in airplane mode, it doesn’t hang on resume at all. It always seems to hang on resume when not in airplane mode.
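Since airplane mode avoids the hang, one way to automate that is a systemd sleep hook: systemd-sleep runs executables in /usr/lib/systemd/system-sleep/ with pre or post as the first argument and the sleep type as the second. A minimal sketch, assuming `rfkill block all` is close enough to airplane mode; the file name is hypothetical, and the RFKILL override exists only so the logic can be exercised without root.

```bash
#!/usr/bin/env bash
# Sketch of a systemd sleep hook, e.g. saved (hypothetical name) as
# /usr/lib/systemd/system-sleep/50-radios and made executable.
# $1 = pre|post, $2 = sleep type. Only hibernation is touched;
# plain suspend is left alone.
RFKILL=${RFKILL:-rfkill}

radio_hook() {
  case "$1/$2" in
    pre/hibernate | pre/suspend-then-hibernate)
      "$RFKILL" block all ;;
    post/hibernate | post/suspend-then-hibernate)
      "$RFKILL" unblock all ;;
  esac
}

radio_hook "$@"
```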

To anyone reading this thread, I want to mention something important.

A lot of Linux professionals, by the way, recommend against using hibernation on Linux. It rarely ends up being a good idea; plain low-power suspend is good enough.

If you are using encryption and are worried about someone accessing the laptop, you can install the cryptsetup-suspend tool. There’s no need to set up hibernation, a swap partition or swapfile, or anything like that. It’s simple and easy enough.

The only downside is that if someone maliciously uninstalls or disables cryptsetup-suspend, the encrypted volumes won’t be locked on suspend. But an attacker would need root to do that, and would also have to get past any default SELinux policy (if SELinux is installed).

So save yourself the headache and try that out. Most people on Linux loathe hibernation because it’s basically just a fancy way of shutting down the laptop, and it’s not worth setting up given how buggy it usually is.

If Linux professionals and developers aren’t spending the time to make hibernation work, why should regular Linux users spend ten times the effort? It doesn’t make much sense to me. It is sad that very experienced developers aren’t making the effort to get this working as well as Windows and macOS do, but oh well… what can we do? :(

There’s a very valid use case for hibernate, though: suspend-then-hibernate is a lot more user-friendly. It’s the default on my work MacBook.

If I put my laptop away at the end of the day on Friday to catch the train, I expect it to have battery left for my first meeting on Monday morning (and not to overheat in my backpack). Suspend, then hibernate after an hour, solves this use case for me: it loses about 1% battery during the hour before it hibernates, vs 25% per day in “low power suspend”.
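To put rough numbers on that: on systemd the hour-long delay corresponds to `HibernateDelaySec=` in sleep.conf, used by `systemctl suspend-then-hibernate`. The arithmetic below is a back-of-the-envelope sketch using the figures above; the 63-hour weekend (Friday about 18:00 to Monday about 09:00) and negligible drain while hibernated are assumptions, and `suspend_drain` is a made-up helper.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope weekend battery drain.
# Figures from the post: ~25%/day in low-power suspend, ~1% for the
# hour before hibernating. Assumed: 63 h weekend, ~0% once hibernated.

# $1 = hours asleep, $2 = percent lost per day; prints percent lost.
suspend_drain() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.1f\n", h / 24 * r }'
}

hours=63
echo "low-power suspend:      $(suspend_drain "$hours" 25)% lost"
echo "suspend-then-hibernate: ~1% lost (first hour only)"
```

With those assumptions, low-power suspend comes out at roughly two thirds of the battery over the weekend.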

Some of us are clearly power users, and we can help make this feature better by using it and reporting bugs like we’re doing in this thread.

Sorry for the tangent; every time this comes up, someone comments “why hibernate? Suspend Works For Me, and it should work for you too”.


Hah, glad you’ve fixed it. I missed your post, but fixed mine today after somebody pointed me to https://bugzilla.kernel.org/show_bug.cgi?id=219290. I tested hibernate with Bluetooth disabled on 6.11.4 and it all worked fine.

I’m now running this systemd service:

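[Unit]
Description=Disable Bluetooth before going to sleep
# Run before the system enters any sleep state (suspend, hibernate, ...)
Before=sleep.target
# Stop again once sleep.target is no longer active, i.e. after resume
StopWhenUnneeded=yes

[Service]
Type=oneshot
# Stay "active" after ExecStart= finishes so ExecStop= runs when stopped
RemainAfterExit=yes

ExecStart=/usr/bin/rfkill block bluetooth
ExecStop=/usr/bin/rfkill unblock bluetooth

[Install]
WantedBy=sleep.target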

This has fixed my issues.
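To confirm the ExecStop= line really unblocked Bluetooth after resume, you can read the rfkill state from sysfs: each /sys/class/rfkill/rfkillN directory has a `type` file (e.g. bluetooth) and a `soft` file (1 when soft-blocked). A small sketch; `bt_soft_state` is a made-up helper, and the directory argument exists only for testing.

```bash
#!/usr/bin/env bash
# Sketch: report the soft-block state of every Bluetooth rfkill switch.
# Reads /sys/class/rfkill/rfkill*/{type,soft}.

bt_soft_state() {
  local base=${1:-/sys/class/rfkill} dev
  for dev in "$base"/rfkill*; do
    [ -r "$dev/type" ] || continue
    [ "$(cat "$dev/type")" = bluetooth ] || continue
    if [ "$(cat "$dev/soft")" = 1 ]; then
      echo "$(basename "$dev"): blocked"
    else
      echo "$(basename "$dev"): unblocked"
    fi
  done
}
```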
