Sleep issues after 3.19 BIOS update - 11th Gen Intel Framework 13

lbkNhubert · September 10, 2024, 1:09pm

Just tested this on an 11th gen that has had the rtc module put in place rather than the battery (not that that should matter), and it does resume from suspend ok. This machine is running the 6.10.8-arch1-1 kernel, and has mem_sleep_default=deep in grub. I will check my other 11th gen in a bit. So the good news hopefully is that this seems not to impact all systems. This machine has 64gb ram, mediatek mt7922 wifi card (I swapped out the intel card), and an sk hynix platinum nvme, with luks encryption.

Maybe try resetting the bios to defaults? Pulling the rtc battery presumably is resetting the main board.

Hopefully you are able to get things sorted out. Best of luck.

Andy321 · September 27, 2024, 8:47pm

Resetting BIOS settings did not help. Seems like more people in other threads are having the same issue too:

In any case, after filing a support ticket and a huge amount of back and forth with them (for some reason they were at first dead set on verifying my order, and I am quoting support directly here, for “security reasons”, when the issue we’re talking about is public and unrelated to my order / account; then they tried to offload all of their QA onto me, insisting that “it’s my installation”), I finally got the following response:

Hi Andy,

Spoke with engineering for clarification on deep sleep not working. Yes, you will be using s2idle as we have found that the later BIOS are not supporting deep.

If there is anything else we can help with, let me know.

Kind regards,
Matt

This also seems to concur with:

and:

I note that as of this post the 3.20 BIOS release notes still do NOT state this.

Guys, I really think we need to bring this despicable and/or careless behaviour to the attention of Linus TT (who is an investor and is rather vocal about poor corpo behaviour). How is any of this (denying BIOS downgrades; not fixing sleep power consumption issues for 3 years; soft-bricking customer devices; not specifying significant changes in BIOS release notes; etc.) acceptable?

Darryl_A · October 9, 2024, 5:07pm

Suspected that the ML1220 RTC/CMOS battery was dying/dead and that replacing it might help. No such luck. Same behavior: it will not wake from suspend. Still searching for answers.

lbkNhubert · October 25, 2024, 3:48pm

I have two 11th gen setups that are running the 3.20 bios and are not exhibiting this behavior, thankfully. I’m happy to help anyone troubleshoot.

Typing it out to avoid the posting error, slash_proc_slash_cmdline on one includes nvme.noacpi=1 mem_sleep_default=deep i915.enable_psr=1 sysrq_always_enabled=1

I’ll check the other system and update this in a bit.

lvdd · October 25, 2024, 4:09pm

Thank you for being willing to help.

If you have the possibility, I can reliably reproduce this with a fresh default debian installation running the GNOME desktop. The default suspend mode is going to be s2idle. The only thing I need to do to soft-brick my system is to add mem_sleep_default=deep to grub and reboot. That will start the system with deep mode enabled. Then just press the powerbutton to activate suspend.
My system is going into this endless cycle and can only be revived by disconnecting all batteries (main and RTC).
And yes, I have reset the BIOS to default values before and after. The only way to use this system again is to remove the deep mode from grub again and use s2idle with its known drawbacks.

lbkNhubert · October 25, 2024, 4:17pm

Ok, I’ve been putting the system to sleep using the gnome extension. On the second system now and will try that and update here.

lvdd · October 25, 2024, 4:27pm

Nice,
can you please check which suspend mode your system is using? They say the gnome-extension suspends and then hibernates. That means nobody will care if suspend is s2idle in that case because it will hibernate after a couple of minutes. S2idle is kind of working but has terrible battery drain, which doesn’t matter much once you hibernate the system.
That’s why I am saying the gnome extension is great because it helps with configuring hibernation, but it does not solve the original issue with the deep suspend mode, which was working fine in the older BIOS. I wish I could go back to my 3.17 BIOS and would never touch it again.
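
For anyone following along, a minimal sketch of how the active suspend mode can be read: the kernel lists the available modes in /sys/power/mem_sleep and puts the currently selected one in square brackets. The function name active_sleep_mode is made up for this example, and the optional path argument exists only so the function can be tried on a copy of the file.

```shell
# Print the active suspend mode, i.e. the bracketed entry in mem_sleep.
# The path is an optional argument (hypothetical convenience for testing).
active_sleep_mode() (
    file="${1:-/sys/power/mem_sleep}"
    # Example content: "s2idle [deep]" -> prints "deep"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
)

# Usage on a live system:
#   active_sleep_mode
```

On a machine booted with mem_sleep_default=deep this should print deep; on a default install it will most likely print s2idle.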

lbkNhubert · October 25, 2024, 4:55pm

Tested on the second system:

Suspend from gnome-menu: wakes ok
Suspend with power button: wakes ok
Suspend via command sudo systemctl suspend: wakes ok

My logind.conf file has lid switch and idle action set to suspend-then-hibernate, but I don’t believe that that should apply for these cases.
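
A quick way to see which of these logind settings are actually set, sketched under the assumption that the configuration lives in /etc/systemd/logind.conf (drop-in files under /etc/systemd/logind.conf.d/ are not read by this example). The function name logind_sleep_settings is invented here. The power button is governed by HandlePowerKey, which is a separate option from the lid and idle settings.

```shell
# List the uncommented power-key, lid-switch and idle settings from logind.conf.
logind_sleep_settings() (
    file="${1:-/etc/systemd/logind.conf}"
    # Commented-out defaults start with '#', so anchoring at line start skips them.
    grep -E '^(HandlePowerKey|HandleLidSwitch|IdleAction)[A-Za-z]*=' "$file"
)

# Usage on a live system:
#   logind_sleep_settings
```

If nothing is printed, none of these options are overridden in that file and the systemd defaults apply.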

Do you see anything in the logs when the system goes into the loop, or do they get lost because they have not been written to disk before the loop starts?

For completeness, the two systems have all usb-c or a mix of usb-c cards, sk-hynix drives (gold in one, platinum in the other), one has a mediatek wifi card, the other the intel ax210, one has 64gb ram, the other has 16gb ram, one is on arch, the other manjaro, both are using luks encryption with swap partitions rather than swap files.

What is your slash_proc_slash_cmdline, and what are your system specs? It would be great if we could figure out why my systems have avoided the issue, so that you and others could go back to making use of deep sleep.
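
To compare boot parameters between machines without eyeballing the whole line, something like the following sketch may help. It splits the kernel command line on whitespace and prints the value of one named parameter. The helper name cmdline_param is hypothetical, and the second argument is only there so the function can be run against a saved copy.

```shell
# Print the value of one kernel parameter from the command line (or nothing).
cmdline_param() (
    file="${2:-/proc/cmdline}"
    awk -v name="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")               # position of the first "="
            if (n && substr($i, 1, n - 1) == name)
                print substr($i, n + 1)      # everything after "name="
        }
    }' "$file"
)

# Usage on a live system:
#   cmdline_param mem_sleep_default
#   cmdline_param nvme.noacpi
```

Parameters whose values contain spaces are not handled by this simple split.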

lvdd · October 25, 2024, 5:37pm

Sorry, took a while as I had to de-brick my laptop again.

This is a Fedora 40 with all updates installed. LUKS encryption is used as well.
My slash_proc_slash_cmdline is:

BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.11.4-201.fc40.x86_64 root=UUID=cb5de268-5529-445a-864a-d8ea07209cc2 ro rootflags=subvol=root rd.luks.uuid=luks-cdb51624-dd59-4f6c-bfab-9811089b2e0f rhgb nvme.noacpi=1 mem_sleep_default=deep acpi_osi=!Windows 2020

I don’t know where this acpi_osi parameter comes from but it is not present on my debian installation on a separate SSD.
lspci:

00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 01)
00:02.0 VGA compatible controller: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] (rev 01)
00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 01)
00:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01)
00:07.0 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #0 (rev 01)
00:07.1 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #1 (rev 01)
00:07.2 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2 (rev 01)
00:07.3 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #3 (rev 01)
00:08.0 System peripheral: Intel Corporation GNA Scoring Accelerator module (rev 01)
00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
00:0d.0 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller (rev 01)
00:0d.2 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 (rev 01)
00:0d.3 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1 (rev 01)
00:12.0 Serial controller: Intel Corporation Tiger Lake-LP Integrated Sensor Hub (rev 20)
00:14.0 USB controller: Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller (rev 20)
00:14.2 RAM memory: Intel Corporation Tiger Lake-LP Shared SRAM (rev 20)
00:15.0 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0 (rev 20)
00:15.1 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1 (rev 20)
00:15.3 Serial bus controller: Intel Corporation Tiger Lake-LP Serial IO I2C Controller #3 (rev 20)
00:16.0 Communication controller: Intel Corporation Tiger Lake-LP Management Engine Interface (rev 20)
00:1d.0 PCI bridge: Intel Corporation Tiger Lake-LP PCI Express Root Port #10 (rev 20)
00:1f.0 ISA bridge: Intel Corporation Tiger Lake-LP LPC Controller (rev 20)
00:1f.3 Audio device: Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller (rev 20)
00:1f.4 SMBus: Intel Corporation Tiger Lake-LP SMBus Controller (rev 20)
00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-LP SPI Controller (rev 20)
01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. NV1 NVMe SSD SM2263XT (DRAM-less) (rev 03)
aa:00.0 Network controller: Intel Corporation Wi-Fi 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] (rev 1a)

I am not sure what to look for in the logs. I have pasted the journal over here: https://anonpaste.org/?4d471e0b4d46602a#21vSqTMEtGGKRdQgxw8SmsabbfKeMqv9oWn4Vd274wXo
It says it enters deep at the end, right before the time is set back.
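
For readers unsure what to look for: recent kernels appear to log a "PM: suspend entry (<mode>)" line when a suspend starts and a "PM: suspend exit" line when the system resumes, so filtering for those gives a rough timeline of the cycles. A small sketch; pm_events is a made-up helper name, and the exact message text is an assumption that may differ between kernel versions.

```shell
# Keep only the kernel's suspend entry/exit markers from a log read on stdin.
pm_events() {
    grep -E 'PM: suspend (entry|exit)'
}

# Typical use (needs a persistent journal to see the previous boot):
#   journalctl -k -b -1 | pm_events
```

An entry line without a matching exit line before the log ends would suggest the kernel never got far enough into resume to write anything.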

lvdd · October 25, 2024, 5:50pm

new paste as there is more weird stuff. It looks like it is trying to wake up but can’t and is falling back to sleep

https://anonpaste.org/?f21e67a03810709a#3tGESkvZVv2aDunWogHFpvi7fNgaunei8nS7BecDVpCD

lbkNhubert · October 25, 2024, 5:55pm

I will take a look at the log. I apologize for the stupid question, but when this hits, the only recourse is a mainboard reset? Holding the power button for 30+ seconds or doing a cold boot (power off, unplug for two minutes, plug back in, power back on) both don’t work? What about adding the kernel parameter sysrq_always_enabled=1 and trying to issue REISUB when the machine does not come back? Still no dice?

Off to peruse the log now…

lvdd · October 25, 2024, 6:15pm

Thank you!
The full cycle goes like this:

I boot with deep enabled via grub cmdline
In Gnome I just press the powerbutton and the system goes to sleep (powerbutton is flashing)
I then use the powerbutton again to wake it up
It tries to come out of sleep (screen flashing once and the light on the powerbutton turns on)
after maybe 5 seconds - the system turns off again as it seems to go back into sleep for 2 seconds (powerbutton flashing again)
and then it is trying to wake up again and stays in this cycle forever. At my first encounter I left it in this state for maybe 10 minutes.
When it is in that state, the fan starts spinning and is slowly ramping up until it is at full speed constantly
I can press the powerbutton for 30 seconds and the laptop turns fully off
I have tried letting it stay off for a few minutes
When I try to start it again it goes immediately into this sleep cycle
The only way to get it out of that is to remove all batteries (main and RTC) and start the laptop again
BIOS is going through the full initial cycle with RAM training and then starts booting normally
This is all happening on battery - there is nothing connected although I have tried it with power connected as well

I also tried a second SSD and installed a fresh Debian stable which showed exactly the same behaviour.
I am not familiar with this sysrq parameter but will try it out now and I am unfamiliar with this “REISUB” thing you mention. Can you maybe point me to a document explaining what that is?

Thanks

lbkNhubert · October 25, 2024, 6:20pm

That’s awful. The sysrq is a long shot, all I have managed to use it for is to somewhat blindly force a restart if the machine gets into a semi-hung state. Given what you note above I’m not sure that it will help, but it won’t hurt to try. Basically you hold ctrl-alt, hit the printscreen key, release that key while continuing to hold down ctrl-alt, and type reisub - each of those keys actually does a different thing, but it’s getting beyond my skill level. More here: Keyboard shortcuts - ArchWiki
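
As a rough reference for what each key in the sequence does, here is a small lookup sketch based on the kernel's magic SysRq documentation. The function name sysrq_meaning is invented for this example; nothing here sends any SysRq command, it only prints descriptions.

```shell
# Describe the action behind each letter of the REISUB sequence.
sysrq_meaning() {
    case "$1" in
        r) echo "unraw: take the keyboard out of raw mode" ;;
        e) echo "terminate: send SIGTERM to all processes except init" ;;
        i) echo "kill: send SIGKILL to all processes except init" ;;
        s) echo "sync: flush all mounted filesystems to disk" ;;
        u) echo "unmount: remount all filesystems read-only" ;;
        b) echo "reboot: restart immediately without syncing or unmounting" ;;
        *) echo "not part of REISUB" ;;
    esac
}

# Print the whole sequence in order.
for key in r e i s u b; do
    printf '%s  %s\n' "$key" "$(sysrq_meaning "$key")"
done
```

The order matters: processes are asked to exit and data is synced before the final reboot, which is why the keys are usually typed with a short pause between them.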

lvdd · October 25, 2024, 6:51pm

Yeah, as you suspected, the parameter didn’t help. I have additionally removed the acpi_osi parameter, but the effect is the same. No dice!

But thank you again! I have learned something new today!

lbkNhubert · October 25, 2024, 6:54pm

Nothing stands out to me in the log. I can see the machine going into suspend, the only thing after that is when you have to reset the mainboard and reboot. Still digging.

lvdd · October 25, 2024, 7:02pm

new logs here: https://anonpaste.org/?49ae11a16203c370#D1zhnsBjvR4Pw8eAQvMhghKnqQ8GuwiRMVMXuhXzPsdu

I had a couple more cycles while it was in sleep this time. What I think is weird is that the clock falls back to Oct. 11 2:00
Why that date? and why does this happen before I remove the battery? Usually systems fall back to something like Jan 1st 1970 when the batteries are removed.

lbkNhubert · October 25, 2024, 7:40pm

Haven’t gotten to the logs yet. testing a few things.

What is the output of the following:

cat /sys/power/pm_test
cat /sys/power/disk
cat /sys/power/state

lvdd · October 25, 2024, 7:49pm

$ cat /sys/power/pm_test 
[none] core processors platform devices freezer
$ cat /sys/power/disk
[platform] shutdown reboot suspend test_resume 
$ cat /sys/power/state
freeze mem disk

lbkNhubert · October 25, 2024, 8:01pm

Ok. I’m currently hacking through this on one of my 11th gen setups, so we can try to step through it together. I do have to step out in a bit so we may have to pick this back up another day, unfortunately. Basically I am following the steps here: Debugging hibernation and suspend — The Linux Kernel documentation but instead of echoing disk to /sys/power/state I am echoing mem.

Stop me if I am being too step-by-step and you need me to speed up. I am muddling my way through, mind you.

In a terminal su to root
then

mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/suspend_stats

Then step through echoing freezer|devices|platform|processors|core to pm_test
e.g.

echo freezer > /sys/power/pm_test
echo platform > /sys/power/disk
echo mem > /sys/power/state

When the machine comes back (fingers crossed) I am running

cat /sys/kernel/debug/suspend_stats

On one of the tests (platform or processors), the machine took 5 or more minutes to “come back”, so be patient. If the machine does not come back, note what you had echoed into pm_test and we can see what we can figure out. I have gone through this before when testing hibernation, but not suspend. We may have to go through trying the different options for /sys/power/disk. Again, I am a bit out of my depth here but trying to work through it with you. Hopefully we find something.
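
The steps above can be sketched as one loop, shallowest test level first. This is only an outline under a few assumptions: it must run as root, the kernel needs pm_test support, and debugfs should already be mounted for the statistics to show. SYSFS_POWER and DEBUGFS are hypothetical override variables added so the loop can be dry-run against a scratch directory, and run_pm_tests is an invented name. The write to /sys/power/disk is left out because that file should only affect hibernation. According to the kernel documentation, in test mode the kernel stops at the chosen level, waits a few seconds and resumes without handing over to the firmware sleep state, so passing every level does not rule out a firmware-side problem.

```shell
# Run one suspend test cycle per pm_test level, shallowest first.
run_pm_tests() (
    power="${SYSFS_POWER:-/sys/power}"
    stats="${DEBUGFS:-/sys/kernel/debug}/suspend_stats"
    for level in freezer devices platform processors core; do
        echo "== pm_test level: $level"
        # Select how far the suspend sequence may go before it is rolled back.
        echo "$level" > "$power/pm_test" || exit 1
        # Start the (test-mode) suspend; this returns once the system is back.
        echo mem > "$power/state" || echo "write to state failed at level $level"
        # Show the running success/failure counters if debugfs is available.
        [ -r "$stats" ] && cat "$stats"
    done
    # Leave test mode, otherwise later suspends would still be test runs.
    echo none > "$power/pm_test"
)

# Usage on a live system, as root:
#   run_pm_tests 2>&1 | tee pm_test.log
```

If the machine hangs, the last "== pm_test level" line in the saved output shows which level was being tested.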

lvdd · October 25, 2024, 9:56pm

Well, that was kind of uneventful. All tests passed!
The result is here: https://anonpaste.org/?60ffb0b2ecdad0a0#CG94MjcnToHtrfYsSr6QNoxyURh6cmYBh5kedqp3FawJ
There are a few open values for /sys/power/state left. I can test them as well, but that has to wait until tomorrow.