CPU sometimes gets stuck in some sort of low power mode

Since the 13 is older than the 16, doesn’t it make sense to fix it on the 13 first?

You can use udev rules instead: Framework Laptop 16 - ArchWiki

WDYM? Wasn’t BIOS 3.05 released just 2 or 3 months ago? It did not fix all the issues, but that does not mean that they are not releasing updates.

I am also frustrated by those issues, but things take time to fix :person_shrugging:

Not really, both AMD motherboards were released around the same time. It would make more sense for them to update both at the same time.

I can, but then it’s more of a hassle to disable. Read that page; it has the same issue (backlight remaining on/off).

7 months ago now.

That is not really fair, the FL16 has had at least two firmware updates.

Doesn’t mean they have fixed all the problems, or not introduced new ones.

Yes really :slight_smile: Pre-orders for the 13 started in May, while those for the 16 only started in July.

But even if they both came out on the same day, it makes no sense for a company to launch firmware updates for 2 devices on the same day. It requires planning, monitoring, and taking action in case something goes wrong. It is safer to do it for 1 device and then, once things are smooth for that one, do the other.

Can’t you add a pre-suspend and after-resume script to turn the backlight off/on automatically?
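For reference, systemd runs every executable in /usr/lib/systemd/system-sleep/ right before suspend (with "pre" as the first argument) and right after resume (with "post"). A minimal sketch of such a hook is below. It assumes the backlight is exposed as an LED class device with a brightness file; the LED_DIR path and the /run state file are hypothetical placeholders, so check /sys/class/leds/ on your own machine before using anything like this.

```shell
#!/bin/sh
# Sketch of a systemd-sleep hook, e.g. /usr/lib/systemd/system-sleep/kbd-backlight
# systemd calls it with $1 = "pre" (before suspend) or "post" (after resume).
# ASSUMPTION: LED_DIR is a hypothetical LED class device; find yours under /sys/class/leds/.
LED_DIR="${LED_DIR:-/sys/class/leds/example::kbd_backlight}"
STATE_FILE="${STATE_FILE:-/run/kbd-backlight.saved}"

save_and_turn_off() {
    # Remember the current brightness, then switch the backlight off
    cat "$LED_DIR/brightness" > "$STATE_FILE"
    echo 0 > "$LED_DIR/brightness"
}

restore_brightness() {
    # Put back the brightness saved before suspend, if there is one
    if [ -f "$STATE_FILE" ]; then
        cat "$STATE_FILE" > "$LED_DIR/brightness"
    fi
}

case "$1" in
    pre) save_and_turn_off ;;
    post) restore_brightness ;;
esac
```

The hook has to be executable (chmod +x). Saving the old value instead of forcing a fixed level on resume means whatever brightness you had before suspend comes back.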

Oh, right. I thought it was more recent. Even so, saying that they failed to keep their promise is a bit of a stretch.
But I do agree that it would be nice to have a fix. Maybe more frequent firmware updates with smaller increments/fixes.

Official recognition/confirmation of the issues would also go a long way.

If anyone can still reproduce the originally described issue of getting stuck in a low power state triggered by the charger and suspend, please let me know whether you can still reproduce it with this patch. It has been reported to help with this issue on other vendors’ systems.

This is part of kernel 6.15, right? If so, that is already on NixOS 25.05. I haven’t been hitting this issue, so maybe it was indeed fixed, but I can’t say for sure yet.

Yes, it is there. I don’t expect you to be able to reproduce it with that change present.
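If you are unsure whether the running kernel is already 6.15 or newer, one quick check is to compare the version from uname -r against 6.15 with GNU sort -V. This is a minimal sketch assuming GNU coreutils; version_ge is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2 (relies on GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Drop the distro suffix, e.g. "6.15.9-arch1-1" -> "6.15.9"
running="$(uname -r | cut -d- -f1)"
if version_ge "$running" "6.15"; then
    echo "kernel $running: 6.15 or newer"
else
    echo "kernel $running: older than 6.15"
fi
```

Version-aware sorting matters here: a plain string comparison would wrongly put 6.6 after 6.15.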

Btw, if you want to test whether you can repro this, it’s not sufficient to check whether clocks exceed 875 MHz under light load; you need to run a task that uses more than 5 CPUs. It can look like it’s clocking correctly when it actually isn’t.

I’m still on 6.6 for unrelated reasons, and I just repro’d it without noticing at first, thinking it was fixed when it wasn’t.

When clocks are significantly above 1GHz with a task loading all cores, that’s an actual non-repro.
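The check described above (load every core, then see whether clocks stay well above 1 GHz) can be scripted roughly as follows. This is a sketch under assumptions: it uses yes as a throwaway busy loop, reads the standard cpufreq scaling_cur_freq files (reported in kHz), and the 1000 MHz threshold and 5-second settle time are arbitrary choices, not official numbers.

```shell
#!/bin/sh
# Sketch: load every CPU, then report the lowest and highest current clock.
# Reads cpufreq's scaling_cur_freq files (kHz). THRESHOLD_MHZ is an arbitrary cutoff.
THRESHOLD_MHZ="${THRESHOLD_MHZ:-1000}"

# Print "min max" in MHz across the given scaling_cur_freq files
freq_range_mhz() {
    awk '{ v = $1 + 0 }
         NR == 1 { min = v; max = v }
         { if (v < min) min = v; if (v > max) max = v }
         END { printf "%d %d\n", min / 1000, max / 1000 }' "$@"
}

run_check() {
    # Start one busy loop per CPU so that every core is loaded
    pids=""
    for i in $(seq "$(nproc)"); do
        yes > /dev/null &
        pids="$pids $!"
    done
    sleep 5  # give the governor time to react to the load
    set -- $(freq_range_mhz /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq)
    kill $pids  # unquoted on purpose: one argument per PID
    echo "under load: min $1 MHz, max $2 MHz"
    if [ "$1" -gt "$THRESHOLD_MHZ" ]; then
        echo "every core above $THRESHOLD_MHZ MHz: looks like a non-repro"
    else
        echo "at least one core at or below $THRESHOLD_MHZ MHz: possibly stuck"
    fi
}

if [ "$1" = "--run" ]; then
    run_check
fi
```

Run it as sh check-clocks.sh --run. It looks at the slowest core rather than the fastest one because, as noted above, a single fast core can make the system look like it is clocking correctly when it is not.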

It does seem to be fixed on 6.15 since it hasn’t happened since I updated, while previously it would happen frequently.

I am currently fiddling around with pretty similar behaviour on my Framework 16.

Ryzen™ 9 7940HS with the AMD Radeon™ RX 7700S expansion board
32 GB RAM
EndeavourOS Arch Linux (Kernel 6.15.9-arch1-1 / 6.12.43-1 LTS)
Drivers, firmware, everything up to date (except the non-LTS kernel, since I have another problem with the 6.16 kernel regarding external monitors on my dock).

With the AMD Radeon™ RX 7700S expansion board installed, power management deactivates the dGPU after a few minutes of regular usage, which is expected.

The moment this happens, the CPU gets limited to 544 MHz no matter what I do. Stress-testing or starting graphically demanding games re-enables the dGPU just fine, but the CPU limit persists, making the notebook barely usable.

cpupower frequency-info

shows (among other things)

current CPU frequency: 545 MHz (asserted by call to kernel)

And that is that. A reboot fixes the problem, but nothing short of that does. On the next dGPU power cycle, the problem is back.
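To line the throttling up with the dGPU power cycles, one option is to log the dGPU's runtime PM state (the power/runtime_status sysfs attribute that sits next to power/control) together with the current CPU clock. A rough sketch, assuming the dGPU is at PCI address 0000:03:00.0 as in the workaround script posted further down; verify the address with lspci on your own machine.

```shell
#!/bin/sh
# Sketch: log dGPU runtime PM status alongside the cpu0 clock once per second.
# ASSUMPTION: the dGPU sits at 0000:03:00.0; check yours with lspci.
DGPU="${DGPU:-/sys/bus/pci/devices/0000:03:00.0}"
CPUFREQ="${CPUFREQ:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"

# Print one log line: timestamp, dGPU runtime PM state, cpu0 clock in MHz
log_line() {
    state="$(cat "$DGPU/power/runtime_status")"
    mhz=$(( $(cat "$CPUFREQ") / 1000 ))
    echo "$(date +%T) dgpu=$state cpu0=${mhz}MHz"
}

if [ "$1" = "--watch" ]; then
    while true; do
        log_line
        sleep 1
    done
fi
```

Running it with --watch while the laptop idles should show whether the clock drop lines up with the dGPU going from active to suspended (or back).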

If I add the kernel parameter amdgpu.runpm=0 to disable the dGPU power management, everything runs just fine and the CPU keeps running at the speeds it should.
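To confirm that the parameter actually reached the driver, the kernel command line can be read from /proc/cmdline, and module parameters are normally readable under /sys/module/amdgpu/parameters/. A minimal sketch follows; runpm_from_cmdline is a made-up helper, and the sysfs path is assumed to exist only while amdgpu is loaded (the script prints a note otherwise).

```shell
#!/bin/sh
# Sketch: report the amdgpu.runpm setting from the kernel command line and from sysfs.

# Hypothetical helper: print the amdgpu.runpm value in a cmdline string, or "unset".
# If the parameter appears more than once, this keeps the last occurrence.
runpm_from_cmdline() {
    value="unset"
    for arg in $1; do
        case "$arg" in
            amdgpu.runpm=*) value="${arg#amdgpu.runpm=}" ;;
        esac
    done
    echo "$value"
}

if [ "$1" = "--check" ]; then
    echo "cmdline: amdgpu.runpm=$(runpm_from_cmdline "$(cat /proc/cmdline)")"
    if [ -r /sys/module/amdgpu/parameters/runpm ]; then
        echo "driver:  runpm=$(cat /sys/module/amdgpu/parameters/runpm)"
    else
        echo "driver:  runpm parameter not readable (is amdgpu loaded?)"
    fi
fi
```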

The problem also does not occur with the fan-only expansion bay installed (so without dGPU).

I have already tried setting different power profiles with cpupower several times; they are completely ignored in that state.

Does anyone have ideas other than disabling the dGPU power management? :grinning_cat_with_smiling_eyes:

Greetings,

^.^ Dingo

I wrote this dumb little script that “fixes” the CPU frequency most of the time when it happens, so I don’t need to reboot. It’s a nasty workaround for now, until they fix the BIOS and release another update.

#!/bin/bash
# Workaround (run as root): wake the dGPU, cycle the power profiles, then re-enable dGPU runtime PM
DGPU_PM=/sys/bus/pci/devices/0000:03:00.0/power/control  # dGPU PCI address on this machine
echo on > "$DGPU_PM"
powerprofilesctl set power-saver; powerprofilesctl set balanced; powerprofilesctl set performance
echo auto > "$DGPU_PM"

Today I experienced the issue again: the CPU capped at 999 MHz as soon as there was 100% demand. I am on 6.16 now, so I am not sure whether it was a regression or a coincidence.

I’ve never heard of getting locked at 999, this sounds different.

Are you on that beta BIOS 3.06? It has a ton of power management issues, mainly the 544 MHz locks when the dGPU gets cycled on or off.

However, I have had mine go into the 999 MHz lock you mention a few times, but it’s rarer. I think the 999 MHz lock was a problem on the 3.05 BIOS as well. It might be a good idea to try to RCA and reproduce it. To be honest, I have not done much gaming on my Framework 16 in a long time because of all these unresolved throttling issues that Framework has not bothered to fix.

I don’t think this is a kernel issue; it is a firmware issue Framework needs to fix (going on over 9 months now).

Um, you should read the OP then ^^’

Thanks. Re-read it. You should report this to Framework support so they can reproduce, diagnose and debug it. I agree it’s probably related to the EC performance settings, but they would be in the best position to confirm it.

Why do you keep suggesting this? If you read through the thread, you will see that several people have already reported it to Framework, so they would have to be blind not to know about this issue. It’s been an ongoing problem since BIOS 3.05 at least. My uptimes on this laptop have never been more than a week or so, because it eventually gets into this state and needs a power-off to resolve. This was the most glaring bug that a BIOS update was needed to fix. We are still waiting on that, as the BIOS update made it worse (or added a new bug that is more common).

No, I am still on 3.05.

Why don’t you think it is a kernel issue? Apparently there was indeed a similar kernel issue that was already fixed: CPU sometimes gets stuck in some sort of low power mode - #45 by Mario_Limonciello

Isn’t it similar to the issue you fixed in AMD’s kernel code? What makes you point to the EC?

The problem that I fixed was a race condition with the APU and EC on some designs. The EC would trigger PROCHOT while the APU was trying to enter s0i3 and get stuck. The time delay helped this for some designs.

FWIW, this was an accidental fix. The change that went in was meant to fix another issue, where some very specific (unspecified) OEM designs could trigger an EDC event on the VRM and cause an instant power-off during s0i3. After that was root-caused, we realized there is nothing to stop another OEM from doing the same thing in their design, so the safer thing to do was add the delay for everyone. It just so happened to also help the race condition with PROCHOT.

But this other issue is not a race. It sounds like a thermal limit is being hit until things cool down. The CPU coefficients are programmed by the EC, specific to the design and the power slider position. If you’re hitting it in performance mode, it sounds like they might be set too aggressively, or there is another thermal issue.
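One way to gather data on the thermal-limit theory from the Linux side is to log the CPU temperature next to the clock while the problem is happening. The sketch below assumes the k10temp hwmon driver is loaded (it reports AMD's Tctl as temp1_input, in millidegrees Celsius). It only collects observations; it cannot read the limits the EC programs or show PROCHOT directly.

```shell
#!/bin/sh
# Sketch: print CPU temperature (k10temp Tctl) next to the cpu0 clock once per second.
# ASSUMPTION: the k10temp driver is loaded; values are in millidegrees C and kHz.
HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"
CPUFREQ="${CPUFREQ:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"

# Print the hwmon directory whose "name" file reads k10temp; fail if there is none
find_k10temp() {
    for dir in "$HWMON_ROOT"/hwmon*; do
        if [ "$(cat "$dir/name" 2>/dev/null)" = "k10temp" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

if [ "$1" = "--watch" ]; then
    dir="$(find_k10temp)" || { echo "k10temp not found" >&2; exit 1; }
    while true; do
        temp_c=$(( $(cat "$dir/temp1_input") / 1000 ))
        mhz=$(( $(cat "$CPUFREQ") / 1000 ))
        echo "$(date +%T) Tctl=${temp_c}C cpu0=${mhz}MHz"
        sleep 1
    done
fi
```

If the clock stays pinned long after Tctl has dropped back to idle temperatures, that would argue against a plain "wait until things cool" explanation and is worth including in a report to Framework.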

My point is that this is not likely something we can influence much from Linux. It is best debugged by the people who chose all the settings the EC programs in the first place. They can capture a lot of state when the issue happens to determine how and why it’s happening. If there is something that can be done from Linux, we need to know what it is.

They do already know about it:
~~~
We found the issue was caused by a change that adjusts the GPU power limit based on its power state. The original power limit code didn’t control this well. After recently refactoring the power limit code, we can no longer reproduce the issue, though it is still under validation. Since the new code has a different structure, we don’t think it’s worth creating another solution for the current EC code. We plan to revert the change and have 3.07 BIOS for now and will soon release a new BIOS that includes the refactored power limit code.
~~~