[TRACKING] [FW 13 AMD 7840U] Cores stuck at low frequency and system lag

I’m using ryzenadj only to monitor; I’m not actually tweaking anything. Needless to say, performance is badly capped in this state.

Can you specifically try comparing /sys/kernel/debug/amd_pmf/current_power_limits in a failing vs. a non-failing state? That would help confirm whether there is an EC bug involving a power limiter.
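A minimal sketch for that comparison, assuming debugfs is mounted at /sys/kernel/debug and the amd_pmf driver exposes the file (reading it requires root). It treats the file as opaque text rather than assuming any particular field layout, saves a labelled copy, and diffs a "normal" copy against a "stuck" one. The script name pmf_limits.py and the pmf-LABEL-TIMESTAMP.txt file naming are hypothetical.

```python
#!/usr/bin/env python3
"""Snapshot and diff amd_pmf power limits (sketch; reading the file needs root)."""
import difflib
import sys
import time
from pathlib import Path

# Path from the thread; assumes debugfs is mounted and amd_pmf is loaded.
LIMITS = Path("/sys/kernel/debug/amd_pmf/current_power_limits")


def take_snapshot(label: str, src: Path = LIMITS) -> Path:
    """Copy the raw file contents, unparsed, to pmf-<label>-<timestamp>.txt."""
    out = Path(f"pmf-{label}-{time.strftime('%Y%m%d-%H%M%S')}.txt")
    out.write_text(src.read_text())
    return out


def diff_snapshots(normal: str, stuck: str) -> str:
    """Unified diff of two snapshots; an empty string means they are identical."""
    return "".join(
        difflib.unified_diff(
            normal.splitlines(keepends=True),
            stuck.splitlines(keepends=True),
            fromfile="normal",
            tofile="stuck",
        )
    )


if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args) == 2 and args[0] == "snapshot":
        print(take_snapshot(args[1]))
    elif len(args) == 3 and args[0] == "diff":
        normal, stuck = (Path(p).read_text() for p in args[1:])
        print(diff_snapshots(normal, stuck) or "no difference")
    else:
        print("usage: pmf_limits.py snapshot LABEL | diff NORMAL_FILE STUCK_FILE")
```

Usage: run sudo python3 pmf_limits.py snapshot normal while the machine behaves, sudo python3 pmf_limits.py snapshot stuck while it lags, then python3 pmf_limits.py diff NORMAL_FILE STUCK_FILE on the two files it printed. Keeping the raw text means nothing is lost if the file's format differs between kernel versions.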

It’s difficult for me to pinpoint which change brought this behavior back.

I think a likely cause is the EC triggering thermal throttling, but I don’t have a good way to prove that by reading registers or the like. Maybe if you monitor the EC debug log in one tab, you can check the last messages it emits?

To get the system unstuck, I tried changing the power profile, stopping or restarting the PPD service, manually setting scaling_governor (this used to fix it for me) or energy_performance_preference, plugging in or unplugging the power supply, and suspending and resuming (both on AC and on battery), all to no avail.
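Since scaling_governor and energy_performance_preference come up here, a minimal sketch that writes one value to every CPU's cpufreq directory and reads it back, so you can confirm each core actually reports the new setting. It assumes the standard sysfs layout under /sys/devices/system/cpu; energy_performance_preference only exists with drivers such as amd-pstate in active (EPP) mode. The script name cpufreq_set_all.py is hypothetical, and writing requires root.

```python
#!/usr/bin/env python3
"""Write one cpufreq setting to every CPU and read it back (sketch; needs root)."""
import sys
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")
# The two knobs mentioned in the thread.
ALLOWED = {"scaling_governor", "energy_performance_preference"}


def set_all(attr: str, value: str, root: Path = CPU_ROOT) -> dict:
    """Write value to cpuN/cpufreq/<attr> for each CPU; return {cpuN: value read back}."""
    if attr not in ALLOWED:
        raise ValueError(f"unsupported attribute: {attr}")
    result = {}
    for knob in sorted(root.glob(f"cpu[0-9]*/cpufreq/{attr}")):
        # The kernel refuses invalid values with an error, which surfaces as OSError here.
        knob.write_text(value + "\n")
        # Read back so each core's effective setting is visible.
        result[knob.parent.parent.name] = knob.read_text().strip()
    return result


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for cpu, now in set_all(sys.argv[1], sys.argv[2]).items():
            print(f"{cpu}: {now}")
    else:
        print("usage: sudo python3 cpufreq_set_all.py ATTRIBUTE VALUE")
```

For example, sudo python3 cpufreq_set_all.py energy_performance_preference performance. The accepted EPP values are listed in each CPU's energy_performance_available_preferences file.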

None of those things working really does make me suspect thermal throttling by the EC as well…

By any chance, did you unplug the power adapter while the system was suspended when this issue happened? I’m aware of a report in the kernel Bugzilla about another manufacturer’s machine with this problem. It LOOKS like a thermal event sequencing problem on that manufacturer’s side, but if you can confirm the same thing is happening on your Framework 13, that would be a really interesting data point.

Any hints on what to try to get the system unstuck?

If it’s the same thing as that other manufacturer and caused by power adapter changes while in suspend, plug in and then unplug the power adapter after you’ve resumed. See if that brings it back to normal.

Thanks for your quick reply! My system is unstuck again for now (hibernate, resume, unload the Wi-Fi kernel module, reload it), but I’ll check these as soon as I get a chance.

I don’t remember for sure, but this might be the case. I’ll try that this weekend to see if I can reproduce it.

Note to future me: I did a completely vanilla Fedora 40 install on Sat 4th May, and there are currently no driver, kernel or config shenanigans…

So far, I haven’t been able to reproduce the behavior.

I’m pretty sure the power limits shown by ryzenadj were not lowered when the system was stuck. The power consumption stayed well below the limits I could see (I was running watch -n 1 ryzenadj in a terminal while the system was stuck, but I didn’t take a screenshot).
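As a lighter-weight alternative to screenshots, a minimal sketch that prints every core’s current frequency once per second, so a log captured while the system is stuck can be compared with a normal one. It reads scaling_cur_freq, which the kernel reports in kHz; how closely that tracks the real clock depends on the cpufreq driver, so treat it as an indicator rather than ground truth. The script name freqlog.py is hypothetical; no root is needed.

```python
#!/usr/bin/env python3
"""Print every core's current frequency once per second (sketch; no root needed)."""
import sys
import time
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_freqs_mhz(root: Path = CPU_ROOT) -> dict:
    """Return {cpuN: MHz}; scaling_cur_freq is reported by the kernel in kHz."""
    freqs = {
        f.parent.parent.name: int(f.read_text()) // 1000
        for f in root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")
    }
    # Sort numerically so cpu10 comes after cpu9 rather than after cpu1.
    return dict(sorted(freqs.items(), key=lambda kv: int(kv[0][3:])))


def format_line(freqs: dict, stamp: str) -> str:
    """One log line: timestamp, min/max across cores, then each core in order."""
    if not freqs:
        return f"{stamp} no cpufreq data"
    vals = list(freqs.values())
    return f"{stamp} min={min(vals)} max={max(vals)} MHz | " + " ".join(map(str, vals))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for _ in range(int(sys.argv[1])):
            print(format_line(read_freqs_mhz(), time.strftime("%H:%M:%S")), flush=True)
            time.sleep(1)
    else:
        print("usage: freqlog.py SECONDS")
```

For example, python3 freqlog.py 3600 | tee freq.log keeps an hour of samples; if the max column stays pinned low under load, that is the “stuck” state in text form.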

OK, so after 4 or 5 days of going in and out of sleep on a completely vanilla, fresh Fedora 40 install, I got the “lagging” again.

So I really need to get this diagnosed, as I feel it’s either “just” me or it’s being reported with different symptoms elsewhere.

Next time I’ll be sure to get an s-tui screenshot.

Is there anything else I can do? Any fresh ideas?

FWIW I don’t think it’s just you. I think it’s probably your combination of devices/chargers triggering a bug somewhere. My educated guess from these kinds of bugs is that it’s most likely in the EC or PD controller.

Note down the EXACT order of events and which devices were involved. Did you have it plugged in before suspend? Did you unplug it during suspend? How did you wake it? Did you have a dock connected, and is it tied to that?
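To help with recording the order of events, a minimal sketch that timestamps changes in every power supply’s online file under /sys/class/power_supply and flags a resume when CLOCK_BOOTTIME has pulled ahead of CLOCK_MONOTONIC (on Linux the former keeps counting during suspend, the latter does not). It cannot observe anything while the machine is asleep, only the state before suspend and after resume, and it knows nothing about docks, the EC or the PD controller, so the wake method still has to be noted by hand. The script name power_events.py and the 2-second threshold are hypothetical choices.

```python
#!/usr/bin/env python3
"""Timestamp power-supply changes and detect resumes (sketch; Linux only, no root)."""
import sys
import time
from pathlib import Path

PSU_ROOT = Path("/sys/class/power_supply")


def read_online(root: Path = PSU_ROOT) -> dict:
    """Return {supply: raw 'online' value} for every supply that exposes that file."""
    return {f.parent.name: f.read_text().strip() for f in sorted(root.glob("*/online"))}


def changes(prev: dict, cur: dict) -> list:
    """Describe supplies whose online value changed, appeared or disappeared."""
    out = []
    for name in sorted(prev.keys() | cur.keys()):
        before, after = prev.get(name, "absent"), cur.get(name, "absent")
        if before != after:
            out.append(f"{name}: {before} -> {after}")
    return out


def suspend_offset() -> float:
    """BOOTTIME counts time spent suspended and MONOTONIC does not, so this grows on each resume."""
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    return time.clock_gettime(time.CLOCK_BOOTTIME) - mono


def main(seconds: int) -> None:
    prev, offset = read_online(), suspend_offset()
    print(time.strftime("%Y-%m-%d %H:%M:%S"), "start", prev, flush=True)
    for _ in range(seconds):
        time.sleep(1)
        cur, new_offset = read_online(), suspend_offset()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if new_offset - offset > 2:  # hypothetical threshold in seconds
            print(stamp, f"resumed after ~{new_offset - offset:.0f}s suspended", flush=True)
        for line in changes(prev, cur):
            print(stamp, line, flush=True)
        prev, offset = cur, new_offset


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))
    else:
        print("usage: power_events.py SECONDS")
```

Started before suspending (for example python3 power_events.py 86400 | tee events.log), a line such as “resumed after ~600s suspended” followed immediately by a supply going from 1 to 0 would match the “unplugged while suspended” sequence asked about earlier.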

If you can reproduce it at will with a specific sequence of events and devices, then it’s more likely Framework support can too, and then they can capture debug information to fix it!