[RESPONDED] FW 13 AMD - lockup on hibernate, only after suspend (battery drains on suspend-then-hibernate)

Hi all, I’ve been debugging this very odd issue for the last few months. I would be very grateful for some advice or to see if anyone else is running into this.

My previous post was here in the general hibernation issues thread. My issue is still as described there:

I have suspend-then-hibernate configured, and whenever I close the lid I come back to an empty battery after suspending the system and then walking away. The system suspends properly, but seems to lock up after waking and then attempting to hibernate. If I open the lid before the battery runs dry, the system is unresponsive and the screen is black, but the keyboard backlight is on, and the power LED is on.

Configs and specs are at the bottom of this post.

What troubleshooting I’ve done:

  • Ran amd_s2idle.py and confirmed it’s not a suspend issue. All green, and I can suspend with no issues.
  • Ran the hibernate stress test from the previous thread. No issues.
  • Narrowed the hibernation issue down to the “devices” stage of the “shutdown” hibernation method by following the kernel’s “Debugging hibernation and suspend” guide, testing only after manually suspending and resuming the system myself.
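
For anyone who wants to repeat the staged test from the last bullet, here is a minimal Python sketch of the sysfs writes involved. It assumes the pm_test levels listed in the kernel's "Debugging hibernation and suspend" guide (freezer, devices, platform, processors, core). The names hibernate_test_writes, apply_writes and the sysfs_root parameter are made up for illustration and are not from the original post.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test, shallowest first
# (assumed from the kernel's basic PM debugging guide).
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def hibernate_test_writes(level, mode="shutdown", sysfs_root="/sys/power"):
    """Return the (path, value) writes for one hibernation test run.

    Hypothetical helper: it only describes the writes, it does not perform them.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level}")
    root = Path(sysfs_root)
    return [
        (root / "disk", mode),      # hibernation method, e.g. "shutdown"
        (root / "pm_test", level),  # run the cycle only up to this stage
        (root / "state", "disk"),   # start the (test) hibernation
    ]


def apply_writes(writes):
    """Perform the writes in order; needs root on a real system."""
    for path, value in writes:
        path.write_text(value)
```

Running apply_writes(hibernate_test_writes("devices")) as root, right after a manual suspend and resume, should correspond to the test described above.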

Yesterday, I attempted to use pm_trace to determine what’s locking the system up upon hibernation, following the DebuggingKernelSuspend page on the Ubuntu Wiki:

# echo 1 > /sys/power/pm_trace

I then shut the lid and waited HibernateDelaySec for the system to wake from suspend and attempt to hibernate.

That resulted in the below:

[    0.548190] PM:   Magic number: 0:602:372
[    0.548301] PM:   hash matches drivers/base/power/main.c:1591
[    0.548459] pci 0000:02:00.0: hash matches

Where 0000:02:00.0 appears to be the SSD:

$ lspci -vvv | grep 02:00.0

02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN850X NVMe SSD (rev 01) (prog-if 02 [NVM Express])
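
To pull the matching device out of a longer dmesg dump after a pm_trace run, something like the following Python sketch may help. It assumes the device line has the same shape as the one quoted above ("<prefix> <device>: hash matches"); the function name pm_trace_devices is hypothetical.

```python
import re

# Matches kernel lines such as "pci 0000:02:00.0: hash matches".
# The "PM:   hash matches drivers/..." line does not end in "hash matches",
# so it is skipped.
DEVICE_MATCH = re.compile(r"\]\s+(?P<prefix>\S+) (?P<dev>\S+): hash matches\s*$")


def pm_trace_devices(dmesg_lines):
    """Return (prefix, device) pairs whose pm_trace hash matched after reboot."""
    found = []
    for line in dmesg_lines:
        m = DEVICE_MATCH.search(line)
        if m:
            found.append((m.group("prefix"), m.group("dev")))
    return found


sample = [
    "[    0.548190] PM:   Magic number: 0:602:372",
    "[    0.548301] PM:   hash matches drivers/base/power/main.c:1591",
    "[    0.548459] pci 0000:02:00.0: hash matches",
]
print(pm_trace_devices(sample))  # [('pci', '0000:02:00.0')]
```

The returned address can then be looked up with lspci as shown above.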

So I am at a loss. I’ve updated the SSD firmware using the method from not-a-feature/wd_fw_update on GitHub (a tool that updates the firmware of Western Digital SSDs on Ubuntu / Linux Mint), but the issue persists. There are no useful errors in dmesg anywhere.

And so, I ask - is anyone else suffering from this? I am tempted to try a different SSD, but as this is the one that came with the system, and I don’t have any spares lying around, I wanted to post here. I could also be convinced that something else is wrong.

Thanks for taking the time to look at this post.


Framework Laptop: AMD 13in w/ 7840U & 780M, 1TB SN850X SSD, 32GB RAM
OS: Arch Linux
Kernel: 6.9.3-273

/etc/systemd/sleep.conf:

[Sleep]
AllowSuspend=yes
AllowHibernation=yes
AllowSuspendThenHibernate=yes
HibernateMode=shutdown
# 30 for testing, normally 3600
HibernateDelaySec=30

/etc/systemd/logind.conf:

[Login]
HandlePowerKey=ignore
HandlePowerKeyLongPress=poweroff
HandleLidSwitch=suspend-then-hibernate

The drive is encrypted, and kernel parameters are set correctly – the device will hibernate and restore successfully if I do not suspend beforehand, so I am sure this works. I can post kernel params and the output of amd_s2idle as well, but when I try to post them I get a 403 (perhaps a false positive on some anti-spam or exploit filter?).

Hi, are you by any chance also using Arch, like the post you linked?
Also, can you collect your journal logs for both a successful hibernation and a failed one?

Hello! Thank you for taking a look.

I am using Arch. I’ve upgraded to 6.9.7 and collected two journal logs.

I was able to dig up an old spare SSD. I cloned my SN850X to a Samsung 970 Pro and swapped it in.

Hibernation after suspending is now much more consistent. I have not yet been able to trigger the issue with this SSD. I’m not sure of what the problem is with the SN850X that makes it hang up on the specific sequence of suspend-then-hibernate, but I’ll likely purchase another, non-WD SSD for use in the laptop.

If anyone has any further debugging suggestions, I’m happy to try them, but for now, this has resolved the issue for me.

After using the 970 Pro, I was able to observe hibernate freezing after a few days. It seems less consistent than with the SN850X.

I switched to a 990 Evo today, and the first hibernate was successful, but the second froze. It seems worse than the 970 Pro.

I wonder if this is [partially?] why there’s an SSD survey specifically for the AMD platform.

The unsuccessful hibernate is caused by the Intel wifi card failing to resume correctly.
As a test, try disabling the wifi in the BIOS, removing the wifi card, or blacklisting the wifi driver.
If that cures the problem, you can then concentrate on fixing the wifi driver or firmware.

Thank you for your suggestion. I had tried switching wifi cards, and implementing a pre-suspend hook to run rfkill, but not removing the card completely yet.

As a test, I’ve removed the Intel AX210 completely and tested suspend-then-hibernate without any wifi card.

When I attempt to trigger the issue by running systemctl suspend-then-hibernate and cycle through suspend → hibernate → resume a few times, hibernation still freezes at

kernel: PM: hibernation: hibernation entry

usually at the 2nd hibernation, with no useful debug information surrounding it.
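
A rough way to spot this failure in saved logs is sketched below in Python. It assumes the kernel also logs a "PM: hibernation: hibernation exit" line once a hibernation cycle returns, so a log that ends with an entry and no matching exit points to a freeze. The function name hibernation_completed is hypothetical, and the input is assumed to be kernel log lines, for example from journalctl -k -b -1.

```python
ENTRY = "PM: hibernation: hibernation entry"
EXIT = "PM: hibernation: hibernation exit"  # assumed counterpart message


def hibernation_completed(kernel_lines):
    """Return False if the last hibernation entry has no exit after it."""
    in_progress = False
    for line in kernel_lines:
        if ENTRY in line:
            in_progress = True
        elif EXIT in line:
            in_progress = False
    return not in_progress
```

A log from a frozen cycle, whose last relevant line is the entry message, makes the function return False.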

My next test will be to remove both the wifi card and the NVMe drive, boot from the Framework storage card, and test suspend+hibernate.

Did you end up with a solution?

Unfortunately not.

Hibernation freezes eventually with all of the below drive models for me:

  • WD SN850X 1TB
  • Samsung 970 Pro 1TB
  • Samsung 990 Evo 1TB

I haven’t had time to configure and test booting from and hibernating to the storage expansion card.

A friend with a similar configuration (Arch, drive encryption, hibernation to swapfile) has said hibernation is more consistent for them (though not perfect). Their drive is a SK hynix Platinum P41 2TB.

I have disabled hibernation for the time being and am back to simply suspending.

Did you try CONFIG_PM_DEBUG? I think the log should have more information than just the single entry if this was on. At least one would hope. I am not sure how much it overlaps with pm_trace.

Was also wondering if you tried doing this from a tty instead. You might be able to call hibernation and maybe get some logging on screen? I wonder if there are more logs that just aren’t able to be flushed to disk.

Did you try CONFIG_PM_DEBUG? I think the log should have more information than just the single entry if this was on. At least one would hope. I am not sure how much it overlaps with pm_trace.

In my experience, CONFIG_PM_DEBUG didn’t provide any extra logging. It is already enabled in my kernel (it might be enabled by default in Arch?):

$ zcat /proc/config.gz | grep CONFIG_PM_DEBUG
CONFIG_PM_DEBUG=y

I believe it only enables hibernation test modes (?), which I initially used to narrow down the failing stage of hibernation to the devices stage.

The extra logging I did have enabled was from no_console_suspend initcall_debug kernel parameters, but nothing appears after that line, even so.
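
To double-check that those debug parameters actually reached the running kernel, a small Python sketch like the one below could be used. The function name missing_kernel_params is hypothetical; on a live system the input would be the contents of /proc/cmdline.

```python
def missing_kernel_params(cmdline, wanted=("no_console_suspend", "initcall_debug")):
    """Return the wanted parameters that are absent from a kernel command line."""
    # Compare on the part before "=", so "key=value" tokens match by key.
    present = {token.split("=", 1)[0] for token in cmdline.split()}
    return [p for p in wanted if p not in present]
```

For example, missing_kernel_params(open("/proc/cmdline").read()) returns an empty list when both parameters are set.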

Was also wondering if you tried doing this from a tty instead. You might be able to call hibernation and maybe get some logging on screen? I wonder if there are more logs that just aren’t able to be flushed to disk.

I briefly tried this from a tty as well, but unfortunately I wasn’t able to see any extra logs on the screen before it froze. Thank you for your suggestions!