Another thing to try, if you are using a kernel older than 6.10.0, is to add these kernel parameters:
rtc_cmos.use_acpi_alarm=1 amdgpu.sg_display=0
This only happens when I run the script above to turn off wake events, so I’m not sure whether it is the same driver bug.
The rtc_cmos.use_acpi_alarm=1 parameter isn’t needed on most kernels; the change has been backported to all the stable trees for a while now. The amdgpu.sg_display=0 parameter is not needed with the latest BIOS.
Thank you for the latest s2idle report.
I don’t actually understand some of it. The answer to the following question might help.
When you put it to sleep, approximately how many seconds did you wait before pressing the power button? E.g. 5, 10, 30, 60 or 120; that sort of approximation is fine.
The first one is already set by default in Fedora 40:
$ cat /sys/module/rtc_cmos/parameters/use_acpi_alarm
Y
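To check both sides at once, whether a parameter was actually passed on the kernel command line and what value the loaded module currently reports, a small Python sketch along these lines may help. It only reads `/proc/cmdline` and `/sys/module/<module>/parameters/<name>`; the function names and the `root` argument (there only so the paths can be redirected for testing) are made up for this example.

```python
import os


def read_module_param(module, name, root="/"):
    """Return the current value of a module parameter, or None if not exposed."""
    path = os.path.join(root, "sys", "module", module, "parameters", name)
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Module not loaded, parameter not exported, or not readable.
        return None


def cmdline_params(root="/"):
    """Return the kernel command line as {key: value}; bare flags map to None.

    Simple whitespace split: quoted values containing spaces are not handled.
    """
    with open(os.path.join(root, "proc", "cmdline")) as f:
        tokens = f.read().split()
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params
```

On a running system, `read_module_param("rtc_cmos", "use_acpi_alarm")` should return the same `Y` as the `cat` above, and `"rtc_cmos.use_acpi_alarm" in cmdline_params()` tells you whether it was also set explicitly at boot.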
As for how long I waited: I timed it at about 60 seconds.
The computer was shut down all of last night, and (as mentioned somewhere in the thread above) sleep works for a day or so and then suddenly goes haywire. To me, at least, this sounds like some hardware/EC shenanigans.
I really appreciate your time and effort and I would really, really love it if we found a cause.
I ran a new test after the long power-off/EC reset, and the main difference I see between the reports is the SMU idlemask.
For the reports with successful sleep I get an SMU idlemask of 0x3ffbbefd, and for the latest broken one it was 0x3ffb3ebd. Looking at the list of s2idle reports I have stored, there are three different idlemasks (not sure if this is at all relevant):
amd_pmc: SMU idlemask s0i3: 0x3ffb3ebd
amd_pmc: SMU idlemask s0i3: 0x3ffbbebd
amd_pmc: SMU idlemask s0i3: 0x3ffbbefd
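XOR-ing two masks is a quick way to see exactly which bits changed between reports. A minimal Python sketch using the three values above; the dictionary labels are just names for this example, and which hardware block each bit tracks is not known here.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where two masks differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


masks = {
    "good": 0x3FFBBEFD,    # reports with successful sleep
    "broken": 0x3FFB3EBD,  # latest broken report
    "third": 0x3FFBBEBD,   # the other stored value
}

print(differing_bits(masks["good"], masks["broken"]))   # [6, 15]
print(differing_bits(masks["third"], masks["broken"]))  # [15]
print(differing_bits(masks["good"], masks["third"]))    # [6]
```

For these three values only bits 6 and 15 ever change; every other bit is identical.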
Thank you for the updated info.
From the s2idle report:
|Log timestamp|Result|
|---|---|
|2024-08-01 00:17:49,617 ERROR:|❌ In a hardware sleep state for 0:00:17.475311 (26.92%)|
|2024-08-01 00:18:55,608 ERROR:|❌ In a hardware sleep state for 0:00:08.592926 (13.43%)|
|2024-08-01 00:20:02,609 ERROR:|❌ In a hardware sleep state for 0:00:13.002349 (20.00%)|
|2024-08-01 00:21:07,612 ERROR:|❌ In a hardware sleep state for 0:00:06.107635 (9.69%)|
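Those percentages can also be turned back into the length of each suspend cycle, assuming the report computes them as hardware-sleep time divided by the total time spent suspended (an assumption, not something the report states). A Python sketch that parses report lines like the ones in the table:

```python
import re

# Matches e.g. "In a hardware sleep state for 0:00:17.475311 (26.92%)"
HW_SLEEP_RE = re.compile(
    r"hardware sleep state for (\d+):(\d{2}):(\d{2}(?:\.\d+)?) \(([\d.]+)%\)"
)


def implied_cycle_seconds(line):
    """Estimate the whole suspend cycle length from one report line.

    Assumes percent = hardware sleep time / total suspend time.
    Returns None if the line does not match or reports 0%.
    """
    m = HW_SLEEP_RE.search(line)
    if m is None:
        return None
    hours, minutes, seconds, percent = m.groups()
    hw_sleep = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    fraction = float(percent) / 100
    if fraction == 0:
        return None
    return hw_sleep / fraction


report = [
    "In a hardware sleep state for 0:00:17.475311 (26.92%)",
    "In a hardware sleep state for 0:00:08.592926 (13.43%)",
    "In a hardware sleep state for 0:00:13.002349 (20.00%)",
    "In a hardware sleep state for 0:00:06.107635 (9.69%)",
]
for line in report:
    print(f"{implied_cycle_seconds(line):.1f} s")
```

For the four lines above this gives roughly 63 to 65 seconds per cycle, which lines up with the ~60 s wait before pressing the power button.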
From the dmesg:
[ 280.745864] Timekeeping suspended for 17.783 seconds
[ 339.236831] Timekeeping suspended for 8.848 seconds
[ 409.510744] Timekeeping suspended for 13.864 seconds
[ 469.590032] Timekeeping suspended for 6.840 seconds
Summary:
We have disabled all wakeups except the power button.
The system is still being woken up by something 5-20 seconds after it goes to sleep. But the system then just waits until the power button is pressed, at 60 seconds, before returning to the woken state.
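The 5-20 second figure comes from the “Timekeeping suspended” lines. A Python sketch that pulls those values out of dmesg text and flags the ones that ended well before the expected ~60 s button press; the `expected_s` and `slack_s` defaults are arbitrary choices for this example.

```python
import re

TIMEKEEPING_RE = re.compile(r"Timekeeping suspended for ([\d.]+) seconds")


def suspended_durations(dmesg_text):
    """Return every 'Timekeeping suspended for N seconds' value, in order."""
    return [float(m.group(1)) for m in TIMEKEEPING_RE.finditer(dmesg_text)]


def early_wakeups(durations, expected_s=60.0, slack_s=10.0):
    """Durations that ended well before the expected wake time (arbitrary thresholds)."""
    return [d for d in durations if d < expected_s - slack_s]


sample = """\
[  280.745864] Timekeeping suspended for 17.783 seconds
[  339.236831] Timekeeping suspended for 8.848 seconds
[  409.510744] Timekeeping suspended for 13.864 seconds
[  469.590032] Timekeeping suspended for 6.840 seconds
"""

durations = suspended_durations(sample)
print(durations)
print(early_wakeups(durations))
```

On a real system you would pass it the output of `dmesg` or `journalctl -k` as a string.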
What to try next:
There might be a way to find out what is doing the wakeup.
After a fresh power cycle, so that more of the wakeups are enabled again, run:
find /sys -iwholename "*power/wakeup" -exec echo {} \; -exec cat {} \;
That will output a list of wakeup filenames, each followed by whether it is enabled or disabled.
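If you would rather script this than read the `find` output by eye, here is a rough Python equivalent that collects each `.../power/wakeup` file with its state and picks out the “enabled” ones. Like `find` without `-L`, it does not descend into symlinked directories; unlike `-iwholename "*power/wakeup"`, it only matches a directory named exactly `power`. The function names are hypothetical.

```python
import os


def wakeup_states(root="/sys"):
    """Map every .../power/wakeup file under root to its (stripped) contents.

    os.walk does not follow symlinked directories by default, so the
    /sys/class and /sys/bus symlink trees are not visited twice.
    """
    states = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath).lower() != "power":
            continue
        for name in filenames:
            if name.lower() != "wakeup":
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path) as f:
                    states[path] = f.read().strip()
            except OSError:
                # Skip attributes that refuse to be read.
                continue
    return states


def enabled_wakeups(root="/sys"):
    """Sorted paths whose wakeup state is 'enabled' (the only ones to toggle)."""
    return sorted(p for p, s in wakeup_states(root).items() if s == "enabled")
```

Calling `enabled_wakeups()` with the default `/sys` root gives the list to work through; reading these files does not require root.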
Only modify the ones that have “enabled” from the find command.
You probably want to write your own script for this, but use trial and error until you find the one that, once changed, no longer gives a red cross next to “In a hardware sleep state”.
For example:
Case A: if the mystery one is set to disabled, you see:
a hardware wakeup in the 5-20 second range,
and then a userspace wakeup in the 60 second range (when you press the power button).
Case B: if the mystery one is set to enabled, you see:
a hardware wakeup in the 5-20 second range,
and then a userspace wakeup in the 5-20 second range.
A binary search is probably the quickest to do.
Say there are 16 files that have “enabled” next to them.
Set 8 of the enabled ones to disabled, leave the other 8 enabled, and do a sleep test.
If the result is case A, the mystery one is one of the 8 disabled ones.
If the result is case B, the mystery one is one of the 8 enabled ones.
Then split those 8 into 4 enabled and 4 disabled, and do another sleep test.
Repeat until you find the single mystery item that is causing the problem.
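The search above can be written down as a small loop. A Python sketch, assuming exactly one culprit and repeatable results from each sleep test; `run_sleep_test` is a hypothetical callback that applies the split (for example with `set_wakeup`), performs one suspend, and reports case A or B. With 16 suspects it takes 4 rounds.

```python
def set_wakeup(path, enabled):
    """Write "enabled" or "disabled" to a .../power/wakeup file (needs root)."""
    with open(path, "w") as f:
        f.write("enabled" if enabled else "disabled")


def bisect_culprit(suspects, run_sleep_test):
    """Binary search for the one wakeup source behind the early wakeups.

    run_sleep_test(disabled) should disable the paths in `disabled`, leave the
    other suspects enabled, do one sleep test and return:
      "A" if userspace only woke at the power-button press (~60 s), meaning
          the culprit is among the disabled ones;
      "B" if userspace woke in the 5-20 s range, meaning it is still enabled.
    Returns (culprit or None, number of sleep tests run).
    """
    suspects = list(suspects)
    rounds = 0
    while len(suspects) > 1:
        half = len(suspects) // 2
        disabled, still_enabled = suspects[:half], suspects[half:]
        result = run_sleep_test(disabled)
        rounds += 1
        if result == "A":
            suspects = disabled
        elif result == "B":
            suspects = still_enabled
        else:
            raise ValueError(f"expected 'A' or 'B', got {result!r}")
    return (suspects[0] if suspects else None), rounds
```

A manual `run_sleep_test` could call `set_wakeup(path, False)` for each path in `disabled`, `set_wakeup(path, True)` for the remaining suspects, then suspend and ask you which case you saw.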
Interesting point about the SMU idlemask values. That is a register on the AMD power management controller (SMU), but I don’t know what it does.
On my FW16 the idlemask has only varied between:
amd_pmc: SMU idlemask s0i3: 0x3ffb3eb5
amd_pmc: SMU idlemask s0i3: 0x3ffbbeb5
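Comparing the two machines bit by bit is straightforward with AND/OR over the masks. A Python sketch using the five values quoted in this thread; with so few samples it is only a hint, and the meaning of the bits is still unknown.

```python
from functools import reduce
from operator import and_, or_


def bits_set(mask):
    """Bit positions set in a non-negative mask, least significant first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


your_masks = [0x3FFB3EBD, 0x3FFBBEBD, 0x3FFBBEFD]  # from your s2idle reports
fw16_masks = [0x3FFB3EB5, 0x3FFBBEB5]              # from my FW16

always_yours = reduce(and_, your_masks)  # bits set in every one of your masks
ever_yours = reduce(or_, your_masks)     # bits set in at least one of your masks
always_fw16 = reduce(and_, fw16_masks)
ever_fw16 = reduce(or_, fw16_masks)

varies_yours = ever_yours ^ always_yours  # bits that flip between your reports
varies_fw16 = ever_fw16 ^ always_fw16     # bits that flip between my reports

print("flip on yours:", bits_set(varies_yours))
print("flip on FW16: ", bits_set(varies_fw16))
print("always set on yours, never on FW16:", bits_set(always_yours & ~ever_fw16))
```

With these values, bit 15 flips on both machines, bit 6 only flips in your reports, and bit 3 is set in all of your masks but in neither of mine.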
The system wakes up even with all *power/wakeup files set to disabled, so I cannot see how setting some of them back to disabled would have any effect.
The only thing I can do that consistently gives me 5x 60s sleep is shutting down the computer for a couple of hours and cold booting it.
To repeat what I wrote in the PM: when the computer goes into the “cannot sleep” mode, it stays there regardless of reboots. In this state, the enabled/disabled status of the wakeups has no effect on whether the computer enters HW sleep or not.
So nothing we’ve tried brings it into the HW sleep state. It sleeps fine the first couple of times after being shut off for an extended period, but soon returns to not wanting to sleep.
Say the cause of the wakeup is device A.
When A is set to “disabled”, we see a hardware wakeup at 8 seconds and a userland wakeup at 60 seconds.
When A is set to “enabled”, we see a hardware wakeup at 8 seconds and a userland wakeup at 8 seconds.
So by varying the enabled/disabled state of the various devices, we eventually narrow down which one is device A, by how it affects the difference between the hardware and userland wakeup times.
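The rule above can be expressed as a tiny classifier for each test run. A Python sketch; the 5 second tolerance is an arbitrary assumption, not a value taken from the s2idle tooling.

```python
def classify_sleep_test(hw_wake_s, userspace_wake_s, tolerance_s=5.0):
    """Classify one sleep test from its two wake times, in seconds.

    "B": userspace woke together with the hardware wake (device A enabled).
    "A": userspace only woke much later, at the power-button press
         (device A disabled).
    tolerance_s is an arbitrary, hypothetical threshold.
    """
    if userspace_wake_s < hw_wake_s:
        raise ValueError("userspace cannot wake before the hardware does")
    return "B" if userspace_wake_s - hw_wake_s <= tolerance_s else "A"


print(classify_sleep_test(8, 60))  # A
print(classify_sleep_test(8, 8))   # B
```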
For these tests you need to remove the sleep script that disables everything.
- I removed the sleep script
- I did not disable alarmtimer
- Attempted binary search, but could not find a pattern
- It quickly ended with all wakeups disabled, and the wakeup times are all over the place
Abbreviated log with all but timer disabled:
❌ In a hardware sleep state for 0:00:24.374054 (38.73%)
❌ In a hardware sleep state for 0:00:00.156097 (0.25%)
❌ In a hardware sleep state for 0:00:23.372680 (37.77%)
❌ In a hardware sleep state for 0:00:17.982391 (28.96%)
❌ In a hardware sleep state for 0:00:16.134491 (26.11%)
❌ In a hardware sleep state for 0:00:19.779449 (31.89%)
❌ In a hardware sleep state for 0:00:16.355316 (26.29%)
❌ In a hardware sleep state for 0:00:13.650390 (22.06%)
❌ In a hardware sleep state for 0:00:15.358642 (24.71%)
❌ In a hardware sleep state for 0:00:17.986846 (29.06%)
❌ In a hardware sleep state for 0:00:54.246916 (86.23%)
Ok, so no luck. It was worth a try.
I guess only AMD can help now with a better way to debug this problem.
Just clutching at straws here, but I saw this:
Kernel parameter:
acpi.prefer_microsoft_dsm_guid=1
Please read that thread and maybe give it a try.
Another aspect of this is that it might be a bug in the EC code. I see APIs in there that mix 64- and 32-bit masks for things like SCI, so there are probably bugs where some wakeup bits don’t get masked when they should. Unfortunately, I don’t have the schematics of the FW laptop, so it is very difficult to know which bit is for what. The EC source code does not document what each bit is.
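To illustrate the kind of bug meant here (this is not code from the Framework EC; the helper and the bit numbers are hypothetical), a Python sketch that emulates C’s conversion to a 32-bit unsigned integer: any wakeup bit above bit 31 silently survives a “clear these events” call that only accepts a 32-bit mask.

```python
U32_MASK = 0xFFFF_FFFF
U64_MASK = 0xFFFF_FFFF_FFFF_FFFF


def as_u32(value):
    """Emulate C's conversion of an integer to uint32_t (keep the low 32 bits)."""
    return value & U32_MASK


def clear_events_through_u32(events, mask):
    """Hypothetical API: clears event bits, but its mask parameter is 32-bit."""
    return events & ~as_u32(mask) & U64_MASK


WAKE_BIT_LOW = 1 << 3    # arbitrary example bit below 32
WAKE_BIT_HIGH = 1 << 35  # arbitrary example bit above 31

pending = WAKE_BIT_LOW | WAKE_BIT_HIGH
left = clear_events_through_u32(pending, WAKE_BIT_LOW | WAKE_BIT_HIGH)
print(hex(left))  # 0x800000000: the high bit was never cleared
```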
I have been browsing the EC source code a bit.
It sends wakeup events when the battery charge changes.
To see if that might be the cause, please wait for the battery to fully charge and leave the power plugged in, and then try some suspend/sleeps.
Let me know if it works any better, as in properly sleeps and wake up when it should.
By fully charged, I don’t mean it has to be at 100%. You can set the BIOS max to 50% or 70%, and that is enough charge for this test.
The kernel parameter does not have any effect.
I will run a test with the battery fully charged and without a battery.
I did four tests:
- Low battery being charged
- Battery at 90% (set in bios) and not charging more
- Battery physically disconnected
- Battery physically disconnected with input cover removed
I observe the same behavior with intermittent wake in all tests.
s2idle report for test 2: https://gist.githubusercontent.com/ripdajacker/b6964a990e64576438ba43d9858d9d7b/raw/45e9fdb3e6730b8a6edabf35a1a2dcb215a85107/gistfile1.txt
Ok. So no luck there.
We have tried all sorts of things now, but with no luck.
I don’t think there is anything else I can help with now.
We have no way to tell what is causing your ACPI SCI wakeups, and we have tried to rule out the keyboard, mouse and lid switches.
What would have helped:
- Better debug on the EC console
- The full FW schematics, so we could tell what devices might trigger the ACPI SCI.
- Some help from AMD to explain what the idlemask bits mean.
Are you having any luck with FW support?
I’ve reached out to them and we are working on a solution
I knew nothing about s2idle before, and now I feel like I know too much. Thank you all for your time and help.
After a lot of back and forth, the conclusion is a flaky input cover. I will be receiving a new input cover under warranty and will report back.
Hi!
Did you get a replacement? And did it solve your issues?