CPU sometimes gets stuck in some sort of low power mode

It sometimes happens that the CPU gets stuck in a sort of low power mode.

Running a large parallel compilation in that state nets me an average of about 1GHz on all cores and a battery power draw of just 25W. For reference, the power draw would typically go up to 70W or so under such load, and I believe clocks would be above 3GHz, but I'm not sure.

Single core clocks are harder to judge for me but they’re certainly weird and it feels slow.

For something slightly more reproducible: 100% load on one core nets 1.something GHz while in power-saver, and power draw is also barely above idle, whereas it would typically go up to 40-50W. (That itself isn't ideal, but that's for another topic.)
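To make readings like these easier to compare than eyeballing a monitor, a small script can sample them from sysfs. This is a minimal sketch, assuming the standard cpufreq `scaling_cur_freq` files (in kHz) and a battery under `/sys/class/power_supply/BAT*` that exposes either `power_now` (µW) or `current_now` (µA) plus `voltage_now` (µV). The function names are hypothetical.

```python
import glob
import os


def average_mhz(khz_values):
    """Average of per-core scaling_cur_freq readings (kHz), returned in MHz."""
    if not khz_values:
        raise ValueError("no frequency readings")
    return sum(khz_values) / len(khz_values) / 1000.0


def battery_watts(power_now_uw=None, current_now_ua=None, voltage_now_uv=None):
    """Battery power in watts from power_supply sysfs values.

    power_now is in microwatts. If a battery only exposes current_now (uA)
    and voltage_now (uV), power is their product scaled by 1e-12.
    """
    if power_now_uw is not None:
        return power_now_uw / 1e6
    if current_now_ua is not None and voltage_now_uv is not None:
        return current_now_ua * voltage_now_uv / 1e12
    return None


def _read_int(path):
    """Read an integer sysfs attribute, or None if missing/unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def snapshot(sysfs="/sys"):
    """Read current per-core frequencies (kHz) and battery power (W) once."""
    pattern = os.path.join(
        sysfs, "devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")
    freqs = [_read_int(p) for p in sorted(glob.glob(pattern))]
    freqs = [f for f in freqs if f is not None]
    watts = None
    for bat in sorted(glob.glob(os.path.join(sysfs, "class/power_supply/BAT*"))):
        watts = battery_watts(
            _read_int(os.path.join(bat, "power_now")),
            _read_int(os.path.join(bat, "current_now")),
            _read_int(os.path.join(bat, "voltage_now")),
        )
        if watts is not None:
            break
    return freqs, watts


if __name__ == "__main__":
    freqs, watts = snapshot()
    if freqs:
        print(f"avg {average_mhz(freqs):.0f} MHz over {len(freqs)} cores")
    if watts is not None:
        print(f"battery power {watts:.1f} W")
```

Running it once in the stuck state and once after the machine recovers, under the same load, gives two numbers that can be posted side by side.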

Needless to say, this slows things down significantly. Using any other program while such a compilation is running feels extremely sluggish; just typing this out, I can feel the lag.
The compilation itself is, of course, extremely slow too.

I don't know exactly how this happens, but this machine has been up for 5 days and I only just noticed it. When I put it to sleep yesterday it was behaving normally.

This has happened a few times before, and I think I once got it back to normal by some sequence of plugging in power and going into s2idle, but I can't reproduce that.

Have any of you experienced this too?

I suspect this is related to the EC. I see entries like this when poking around in ectool console:

[2554107.949400 Battery 39% (Display 37.1 %) / 1h:40 to empty]
[2554108.010200 DC BEST PERFORMANCE]
[2554108.011400 event set 0x0100000000000000]
[2554108.060700 PMF: SPL 40000mW, sPPT 48000mW, fPPT 58000mW, p3T 118000mW, ao_sppt 0mW]

Perhaps that tells something to someone.
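To compare the PMF limits in a line like the one above against the observed draw, the values can be pulled out with a small parser. This is a sketch that only extracts the `<name> <value>mW` pairs and converts them to watts; it makes no claim about what each limit means to the EC. `parse_pmf_line` and `limits_above` are hypothetical helpers.

```python
import re

# Matches "<name> <value>mW" pairs such as "SPL 40000mW" or "ao_sppt 0mW".
_LIMIT_RE = re.compile(r"(\w+) (\d+)mW")


def parse_pmf_line(line):
    """Return {limit name: watts} for a 'PMF:' line from the EC console."""
    if "PMF:" not in line:
        return {}
    payload = line.split("PMF:", 1)[1]  # drop the timestamp prefix
    return {name: int(mw) / 1000.0 for name, mw in _LIMIT_RE.findall(payload)}


def limits_above(limits, observed_watts):
    """Names of nonzero limits that lie above an observed power draw."""
    return sorted(n for n, w in limits.items() if w > 0 and w > observed_watts)


line = ("[2554108.060700 PMF: SPL 40000mW, sPPT 48000mW, fPPT 58000mW, "
        "p3T 118000mW, ao_sppt 0mW]")
limits = parse_pmf_line(line)
print(limits)
print(limits_above(limits, 25.0))
```

If these are the limits actually in effect, every nonzero one sits above the ~25W observed during the compilation, so the logged values alone would not explain the cap.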

One thing I should note is that I switch power profiles very frequently.
I made it switch to power-saver whenever I don't interact with the machine for 2s, because the other power modes draw a few watts more, which is insane considering the thing is basically idle with just Firefox doing idle browser things in the background.
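For anyone trying to reproduce this, an idle-triggered switch like the one described amounts to roughly the following. This is a sketch, assuming power-profiles-daemon's `powerprofilesctl set <profile>` is what changes the profile (the post doesn't say which tool is used) and that some idle monitor reports seconds since the last input. `on_idle_update` and the other names are hypothetical.

```python
import subprocess

IDLE_THRESHOLD_S = 2.0  # the 2s idle threshold described above


def choose_profile(idle_seconds, active_profile="balanced",
                   threshold=IDLE_THRESHOLD_S):
    """Pick power-saver once the user has been idle for `threshold` seconds."""
    return "power-saver" if idle_seconds >= threshold else active_profile


def set_profile(profile, dry_run=True):
    """Build (and optionally run) the powerprofilesctl command."""
    cmd = ["powerprofilesctl", "set", profile]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


def on_idle_update(idle_seconds, current, active_profile="balanced",
                   dry_run=True):
    """Hypothetical hook, called with the idle time from whatever idle
    monitor the desktop provides. Only switches when the target changes."""
    target = choose_profile(idle_seconds, active_profile)
    if target != current:
        set_profile(target, dry_run=dry_run)
    return target
```

Skipping the call when the target is unchanged limits switching to one transition per idle/active change, which also makes the number of profile switches easier to count when trying to trigger the bug.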

cc resident AMD expert @Mario_Limonciello


Which Linux distro are you using?

NixOS

Which release version?

5633bcff0c6162b9e4b5f1264264611e950c8ec7

Which kernel are you using?

6.6.53

Which BIOS version are you using?

Latest according to fwupdmgr

Which Framework Laptop 16 model are you using? (AMD Ryzen™ 7040 Series)

AMD Ryzen 7 7840HS w/ Radeon 780M Graphics

Just after I typed this out, and again while I was typing this, it briefly clocked up to what felt like normal clock speeds and power draw, and the fans immediately spun up, presumably because the temperature quickly rose.

It was short-lived though and immediately reverted back to snail speed.

It just did it again and I think it was related to a short moment of slightly less load (~80%). It’s super weird.

Speaking of temperature, I forgot to mention that Tctl is at a meager +50.4°C, so this isn’t a thermal issue either.

It’s ~4.1GHz on all cores actually.

I’ve had this issue for quite a while, too. It seems to be caused by sleeping, but it doesn’t always get triggered. It’s roughly a 5% chance that the laptop gets stuck at 1GHz after suspending.

IME it happens much less frequently than that. I typically suspend and resume multiple times a day and trigger this bug less than once per month. I haven’t triggered it at all in the past few months actually.

How frequently do you suspend/resume and how much time is between suspend and resume for you?

For me the time between suspend and resume is typically much less than a day which might have something to do with it.

I came here to see if anyone else is hitting the same issue. 12th gen Intel, also NixOS.

Everything kind of works, but when I try to compile anything it just takes forever, and the CPUs seem to be in some super slow powersave mode / low frequency state.

Hello!

I'm curious about some more information on your system, to see if there's something that might help. Since it's been a while, did you happen to install any kernel or BIOS updates that supersede the ones you initially listed?

I’ve found that the newest kernel (as of right now, 6.13) fixes a few of the smaller issues I’ve been having with the laptop regarding sleep. More specifically the touchpad and fingerprint reader not working after a sleep/suspend.

Did you at some point check and/or replace the CPU's liquid metal? I was having lots of issues with performance being capped due to a single core (my post history has some details about it). Replacing the liquid metal with PTM has also helped mitigate the impact of temperature/power usage noise.

That’s an entirely different platform; this is the FW16 category.

What’s interesting is that you’re also running NixOS and someone in my hackspace (also with a FW16 and NixOS) has experienced the same issue as I have IIRC.

We’re not doing anything special w.r.t. hardware support in NixOS, so distro shouldn’t matter.

I'd still be curious to know which distros are used by people who can reproduce this, as well as by those who cannot.

FWIW, I haven’t reproduced this issue in many months now.

I have not; just patch releases of kernel 6.6.

Good thing you reminded me, though, because the 0.0.3.5 firmware is now available according to fwupdmgr, as well as fingerprint reader firmware, but that's hardly relevant here.
I don’t expect it to change anything though as its patch notes don’t claim to have changed anything about power management. There might have been an AGESA update though which could.

I’ve never had those issues.

Especially the fingerprint reader has been extremely reliable for me; never had an issue with it.

Not yet. I plan to do so but not because I expect an improvement in thermals but to mitigate the risk of breakage. I don’t expect the liquid metal to leak any time soon but I’d rather not risk having conductive fluid leak onto the board.

I’d actually expect the PTM to be significantly worse w.r.t. thermals than the liquid metal; LM is the best TIM there is AFAIK.

As I said though, this likely isn’t related to thermals because none of the sensors are anywhere near critical when in that state.
I’d like to verify this however. Is there any sort of log for thermal events?

It was reported that since BIOS 3.05 the battery drains during things such as gaming in balanced mode, where it originally wasn't supposed to drain.

That’s excellent to hear! I was having issues with it after sleep/suspend, same with the touchpad.

Currently a lot of users with the Laptop 16 are experiencing an issue where the liquid metal runs off the die, and as a result the whole processor thermally throttles under load and is a lot louder overall. I replaced mine after detecting the issue (after reading about it).

Perhaps, but it’s not as good at the moment due to potential runoff.

I personally used s-tui and stress/Cinebench to provide an all-core load. If you're affected by the issue, you'll see in the s-tui graphs that one of the CPU cores is significantly hotter than the rest.

People report such stuff all the time even if absolutely nothing actually changed.

Unless there has been a specific change to power management or someone actually reproducibly measured it, stuff like that is very likely just sampling bias or placebo.

When the LM runs off the die, you'd be lucky to have the machine power off quickly enough to not destroy itself.

That stuff is conductive. If it gets on the board in the wrong spot, you’ll have a dead board and some magic smoke. That’s the primary reason you should replace it as I see it.

That’s nice but is in no way related to what I wrote.

I can put on a load and read out the usual sensors too, and have done so. As mentioned previously, none of the thermals are anywhere near critical in that state; they're arguably very cool, which makes sense given that it barely draws any power in that state.

Hence my asking whether there's a lower-level system that at least logs thermal events, or perhaps also reads sensors that aren't usually visible to the kernel. (Idk, VRMs or something; a loose pad on one of those could conceivably make the power management go into a safe low-power state.)

When they say it ‘runs off the die’ they mean uneven spread between the heatsink shim and cpu contact point. It’s not running out of the containment area.
LM is in theory better than any paste, but in practice a laptop often can't provide the high mounting pressure it gets in a desktop. Normal paste, or especially the PTM pads, do better because they make up for the weaker mounting system of a laptop heatsink.

Got the same issue. CPU is stuck at <1GHz and power draw ~11W. Linux 6.13.5, system firmware 0.0.3.5.

Weirdly, cpupower frequency-set -d 3Ghz doesn’t do anything.
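`cpupower frequency-set -d` sets the minimum frequency, so one way to tell whether the request is rejected outright or accepted but not honored is to read the cpufreq sysfs values back afterwards. This is a sketch, assuming the usual per-CPU `scaling_driver`, `scaling_min_freq` and `scaling_cur_freq` files (kHz). The 100 MHz slack is an arbitrary tolerance chosen for illustration, and the helper names are hypothetical.

```python
import glob
import os


def read_policy(sysfs="/sys"):
    """Map 'cpuN' -> (driver, min_khz, cur_khz) from cpufreq sysfs."""
    out = {}
    pattern = os.path.join(sysfs, "devices/system/cpu/cpu[0-9]*/cpufreq")
    for d in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(d))
        try:
            with open(os.path.join(d, "scaling_driver")) as f:
                driver = f.read().strip()
            with open(os.path.join(d, "scaling_min_freq")) as f:
                min_khz = int(f.read())
            with open(os.path.join(d, "scaling_cur_freq")) as f:
                cur_khz = int(f.read())
        except (OSError, ValueError):
            continue  # skip CPUs whose attributes can't be read
        out[cpu] = (driver, min_khz, cur_khz)
    return out


def cores_below_min(policy, slack_khz=100_000):
    """CPUs whose current frequency is below their requested minimum."""
    return sorted(cpu for cpu, (_, min_khz, cur_khz) in policy.items()
                  if cur_khz + slack_khz < min_khz)
```

If `scaling_min_freq` reads back as 3000000 while `cores_below_min` still lists most cores, that would suggest the kernel accepted the request but something below it (firmware or the EC) is not honoring it.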

Well, this morning I opened the laptop from suspend and it just happened again. I didn't even recognise the symptoms for a second, wondering why everything was so slow and why simple tasks caused extremely high per-CPU util%, because it had been so long since I'd last seen them.

Some things I did yesterday night that are slightly different to what I usually do which could conceivably trigger this:

  • I moved the power cable from the top-left port to the top-right port while the laptop was suspended
  • I plugged the laptop in while it was suspended and then unplugged it a little while later.
  • I plugged into the centre port on the right (I usually plug into one of the top ports)
  • After that I resumed it at least once for a few minutes before suspending to ram again. I do not know whether the symptoms were present at that time already but I suspect not.
  • I also remember rebooting yesterday around noon. I don’t usually reboot often.

I just did a suspend cycle and there was no change.

Then, during regular operation, I plugged into the right centre port and frequencies immediately went back to normal.

Having similar issues on my FW16 with AMD 7840HS, on NixOS with kernel 6.14 (I think it also happened on 6.13, but can’t be sure). BIOS 3.05.

It seems to happen randomly after resuming from suspend and the only way I’ve been able to fix it is by rebooting.

I wonder if it reproduces on any system that is not NixOS.

I got this randomly without ever suspending, though, so I don't think suspend is the only culprit here.

I don’t own a framework, found this while searching around.

Running NixOS: sometimes the laptop throttles down extremely when coming out of suspend (< 5% chance of happening). Reboots don't help for me; the only fix that works for me is plugging into a power source.
From what I can tell, it most often seems to happen when the device goes into suspend while plugged in, is unplugged, and is then woken up. I'm not sure whether it has to be plugged in before suspend or whether plugging in after suspend also triggers it; I haven't been able to reproduce it consistently.

I'm starting to lean towards NixOS being the culprit here. I'm running roughly this week's unstable, but this has been happening for months.

What hardware are you using?
Did it start to happen only after switching to NixOS?

Dell Latitude 5520, i5-1135G7
Yes, but this was a Windows → NixOS conversion so not a ton of info there.