[SOLVED] Radeon 780M: Thermal throttling to 800 MHz until the entire laptop chassis cools off

Which Linux distro are you using?

Fedora

Which release version?

40

Which kernel are you using?

6.10.11-200.fc40.x86_64

Which BIOS version are you using?

3.05

Which Framework Laptop 13 model are you using?

Ryzen 7840U (64 GB)

Problem

Under certain workloads (e.g. gaming), my clocks normally average around 3.3 GHz (CPU) and 2.1 GHz (shader clock). Once the GPU temperature hits 93ºC, they are capped at 1.1 GHz (CPU) and 0.8 GHz (shader clock). This behavior persists until I:

  1. Let the entire chassis cool off completely. Just letting the CPU/GPU cool down to 40ºC is not enough.
  2. Turn the laptop off completely for a few seconds and boot again

This issue normally occurs after 1 or 2 hours, but I can reliably reproduce it in ~20 minutes by putting my laptop on something that impairs airflow (e.g. bedsheets). Gaming mode makes no difference, and neither does Fedora’s performance power mode. There is nothing relevant in dmesg.

Thought: maybe it’s not the GPU, but a nearby component that also heats up (e.g. RAM or a chip on the motherboard)?

I did more testing, and the clocks can reset without a reboot. But that only happens after the entire laptop has cooled off completely, long after the CPU/GPU have dropped to 40ºC.

I found that AMD CPUs have a safety feature called STAPM (Skin Temperature-Aware Power Management), which throttles based on the laptop’s chassis temperature. The limit defaults to 42ºC, so I installed ryzenadj to raise it via --apu-skin-temp 60. This has helped people with similar problems [1][2][3], but made no difference on my machine. So I went back to the default 42ºC and saw it being exceeded long before the throttling started. Perhaps a different component in the laptop causes this? RyzenAdj has some other flags that I tried, but I was not able to force a clock speed reset.

People have had some success with UXTU [4], the Windows counterpart to RyzenAdj. I probably have to investigate which values it sets. Maybe it’s possible to replicate UXTU’s “extreme” mode with RyzenAdj.

Another thread describes issues similar to mine, but its solutions did not work for me either: Gaming Throttling - #28 by BigT

Interesting, I didn’t know that did anything below 80C.

Throttling depends on my room’s temperature, with a few degrees making all the difference. At 20.5ºC or below, it practically never happens. Up to 21.5ºC, I occasionally notice a slight drop in gameplay smoothness/framerate. Above 22.5ºC, it always throttles to 800 MHz after 20 to 35 minutes of heavy use.

I investigated ryzenadj further and found these presets [1], ported from the Windows tool UXTU [2]:

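# Units follow RyzenAdj's help text: power limits (stapm/fast/slow) in mW, currents (vrm*) in mA, temperatures in ºC.
PRESETS = {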
    "Eco": "--tctl-temp=95 --apu-skin-temp=45 --stapm-limit=8000 --fast-limit=10000 --slow-limit=8000 --vrm-current=180000 --vrmmax-current=180000 --vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000",
    "BalPreset": "--tctl-temp=95 --apu-skin-temp=50 --stapm-limit=15000 --fast-limit=18000 --slow-limit=15000 --vrm-current=180000 --vrmmax-current=180000 --vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000",
    "PerformancePreset": "--tctl-temp=100 --apu-skin-temp=50 --stapm-limit=28000 --fast-limit=35000 --slow-limit=28000 --vrm-current=180000 --vrmmax-current=180000 --vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000",
    "ExtremePreset": "--tctl-temp=100 --apu-skin-temp=50 --stapm-limit=35000 --fast-limit=60000 --slow-limit=35000 --vrm-current=180000 --vrmmax-current=180000 --vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000",
    "AC": "--max-performance",
    "DC": "--power-saving"
}
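The raw numbers in these presets are easier to compare once converted to watts and amps. A minimal Python sketch, assuming RyzenAdj's documented units (power limits in mW, currents in mA, temperatures in degrees C); the helpers `parse_preset` and `describe` are hypothetical and not part of RyzenAdj or UXTU:

```python
import shlex
from typing import Dict, List, Optional

# RyzenAdj units: power limits in mW, currents in mA, temperatures in degrees C.
# Maps flag name -> (display unit, divisor applied to the raw value).
UNITS = {
    "stapm-limit": ("W", 1000),
    "fast-limit": ("W", 1000),
    "slow-limit": ("W", 1000),
    "vrm-current": ("A", 1000),
    "vrmmax-current": ("A", 1000),
    "vrmsoc-current": ("A", 1000),
    "vrmsocmax-current": ("A", 1000),
    "vrmgfx-current": ("A", 1000),
    "tctl-temp": ("C", 1),
    "apu-skin-temp": ("C", 1),
}

# "PerformancePreset" from the PRESETS table above.
PERFORMANCE = (
    "--tctl-temp=100 --apu-skin-temp=50 --stapm-limit=28000 --fast-limit=35000 "
    "--slow-limit=28000 --vrm-current=180000 --vrmmax-current=180000 "
    "--vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000"
)


def parse_preset(flags: str) -> Dict[str, Optional[int]]:
    """Split '--name=value' flags into {name: value}; bare flags map to None."""
    parsed: Dict[str, Optional[int]] = {}
    for token in shlex.split(flags):
        name, sep, value = token.lstrip("-").partition("=")
        parsed[name] = int(value) if sep else None
    return parsed


def describe(flags: str) -> List[str]:
    """Render each flag in human units, e.g. 'stapm-limit: 28 W'."""
    lines = []
    for name, value in parse_preset(flags).items():
        if value is not None and name in UNITS:
            unit, divisor = UNITS[name]
            lines.append(f"{name}: {value / divisor:g} {unit}")
        else:
            lines.append(name)  # bare flags such as --max-performance
    return lines


if __name__ == "__main__":
    print("\n".join(describe(PERFORMANCE)))
```

For the “performance” preset this shows a 28 W sustained (STAPM) limit, a 35 W fast limit and a 50 C skin-temperature limit.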

The “performance” and “extreme” presets fix my problem. Both perform the same in terms of gameplay and framerate, but the extreme preset runs much hotter (100ºC CPU). This causes some occasional framerate fluctuation, so I just stick with the more consistent “performance” preset.

Solution (Linux)

  1. Disable Secure Boot (this can be avoided with considerable extra work, but you’re on your own for that)
  2. Add iomem=relaxed to your kernel command line (add it to GRUB_CMDLINE_LINUX in /etc/default/grub, then run sudo grub2-mkconfig -o /boot/grub2/grub.cfg)
  3. Reboot
  4. Download and build RyzenAdj as described here
  5. When your system starts throttling, run sudo ./ryzenadj --tctl-temp=100 --apu-skin-temp=50 --stapm-limit=28000 --fast-limit=35000 --slow-limit=28000 --vrm-current=180000 --vrmmax-current=180000 --vrmsoc-current=180000 --vrmsocmax-current=180000 --vrmgfx-current=180000
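Steps 2 to 5 can be wrapped in a small Python script that first checks /proc/cmdline for iomem=relaxed and then runs RyzenAdj with the “performance” preset. This is a sketch assuming the binary was built as ./ryzenadj in the current directory and that the script runs as root; the helper names are hypothetical:

```python
import os
import subprocess
from pathlib import Path
from typing import List

# Assumption: RyzenAdj was built as ./ryzenadj (step 4).
RYZENADJ = "./ryzenadj"
PERFORMANCE_FLAGS = [
    "--tctl-temp=100", "--apu-skin-temp=50",
    "--stapm-limit=28000", "--fast-limit=35000", "--slow-limit=28000",
    "--vrm-current=180000", "--vrmmax-current=180000",
    "--vrmsoc-current=180000", "--vrmsocmax-current=180000",
    "--vrmgfx-current=180000",
]


def iomem_relaxed(cmdline: str) -> bool:
    """True if the kernel command line contains iomem=relaxed (steps 2-3)."""
    return "iomem=relaxed" in cmdline.split()


def build_command() -> List[str]:
    """The step 5 command line, without sudo."""
    return [RYZENADJ, *PERFORMANCE_FLAGS]


def main() -> None:
    if not iomem_relaxed(Path("/proc/cmdline").read_text()):
        raise SystemExit("iomem=relaxed is not active; redo steps 2 and 3")
    if os.geteuid() != 0:
        raise SystemExit("run this as root, e.g. with sudo")
    subprocess.run(build_command(), check=True)


if __name__ == "__main__":
    main()
```

Saved under a name of your choice (e.g. a hypothetical apply_performance.py), it would be run with sudo python3 apply_performance.py once throttling starts.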

Notes

  • Stopping your workload and letting your laptop cool off will reset the power profile. Throttling can then recur, in which case you have to reapply the command above. As long as your workload keeps running, though, the performance profile you’ve set via ryzenadj stays in effect.
  • While this fixes the throttling to 800 MHz, you will still lose some performance/framerate if the laptop gets too warm, so the usual advice still applies: proper airflow, a dust-free heatsink, room temperature, lid open vs. closed, etc.
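Since the profile resets once the laptop cools off, one workaround is to re-apply it on a fixed interval. A minimal Python sketch, assuming ./ryzenadj exists, the script runs as root, and that re-applying the same limits every 60 seconds is harmless (nobody in this thread has tested that); `reapply` is a hypothetical helper:

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Assumption: ./ryzenadj exists and this runs as root.
COMMAND = [
    "./ryzenadj", "--tctl-temp=100", "--apu-skin-temp=50",
    "--stapm-limit=28000", "--fast-limit=35000", "--slow-limit=28000",
    "--vrm-current=180000", "--vrmmax-current=180000",
    "--vrmsoc-current=180000", "--vrmsocmax-current=180000",
    "--vrmgfx-current=180000",
]


def run_once(command: Sequence[str]) -> int:
    """Run RyzenAdj once and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def reapply(
    command: Sequence[str],
    interval_s: float,
    run: Callable[[Sequence[str]], int] = run_once,
    sleep: Callable[[float], None] = time.sleep,
    max_rounds: Optional[int] = None,
) -> int:
    """Re-apply the preset every interval_s seconds; return successful rounds.

    max_rounds=None loops forever, which is what a background job would do.
    """
    successes = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if run(command) == 0:
            successes += 1
        else:
            print("ryzenadj failed; retrying next round", flush=True)
        rounds += 1
        sleep(interval_s)
    return successes


if __name__ == "__main__":
    reapply(COMMAND, interval_s=60)
```

The `run` and `sleep` parameters are only there so the loop can be exercised without root or real delays.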

Hey, I just ran into the same problem the other day with my new-to-me Ryzen Framework. I was playing Marvel Rivals for a couple of hours when it suddenly became unplayable.

I find it odd that so few people have had the same problem, and that apparently no one else has noticed what is presumably a serious usability bug. If we run our computers hard for long periods, they become unusable until they’re left to cool for a long time. Is it just that we had particularly poor ventilation in our working areas?

Given that I intend to run very long compile jobs on this machine I’ll have to keep a close eye on it.

I doubt cooling is the problem. I’d rather blame AMD’s skin temperature protection (STAPM), which throttles based on case temperature, not CPU or GPU temperature.

Can you check your temps after throttling kicked in? For me the CPU caps at around 70ºC to 80ºC, which is definitely below what the hardware can handle.

In my view, the solution would be to merge RyzenAdj’s performance profile into Fedora’s “performance mode”. If compliance is a problem, maybe show a warning about case temperature/skin burns the first time people turn it on.
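A rough idea of what that merge could look like: query power-profiles-daemon and apply the RyzenAdj preset only while “performance” is selected. This Python sketch assumes powerprofilesctl is installed, ./ryzenadj exists and the script runs as root; it deliberately does nothing for other profiles, since the thread gives no stock values to restore:

```python
import subprocess
from typing import List, Optional

# Assumption: same "performance" preset as in the solution above.
PERFORMANCE_FLAGS = [
    "--tctl-temp=100", "--apu-skin-temp=50",
    "--stapm-limit=28000", "--fast-limit=35000", "--slow-limit=28000",
    "--vrm-current=180000", "--vrmmax-current=180000",
    "--vrmsoc-current=180000", "--vrmsocmax-current=180000",
    "--vrmgfx-current=180000",
]


def current_profile() -> str:
    """Ask power-profiles-daemon for the active profile name."""
    result = subprocess.run(
        ["powerprofilesctl", "get"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def command_for_profile(profile: str) -> Optional[List[str]]:
    """RyzenAdj command for a profile, or None to leave the limits alone."""
    if profile.strip() == "performance":
        return ["./ryzenadj", *PERFORMANCE_FLAGS]
    return None


if __name__ == "__main__":
    command = command_for_profile(current_profile())
    if command is not None:
        subprocess.run(command, check=True)
```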

Yeah, the CPU temps were also in the 70s-80s when it throttled. And the case was definitely very hot to the touch, though mostly on the bottom, which I wasn’t touching because it was on a desk.

I’m referring to the ventilation of the environment. At the time, I was in a room with a crappy wood desk, and the computer was sitting half on top of a couple sheets of paper.

That definitely affects how quickly the throttling kicks in. But the problem is still this weird artificial throttling. Disabling it via RyzenAdj raises the temperature ceiling to around 100ºC, at which point you get real “natural” throttling, which is more sensitive to the environment. Then the laptop’s performance actually depends on room temperature, how far the lid is open, the space below the laptop, etc.

So it was pointed out to me that Smokeless UMAF has a ton of options available for STAPM, and I spent a few hours poking at it to see if I could make it not throttle.

I came up with a good reproduction method: putting the laptop on top of a bubble mailer and running FurMark + Prime95 at the same time reproduces the problem in minutes instead of hours.

Sadly, that’s where my luck ended: no combination of settings or values in there made any difference whatsoever, and it still locks to 15W after a certain period of time :frowning:
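To see exactly when the lock kicks in during such a stress run, the GPU shader clock can be logged from amdgpu’s sysfs interface. A Python sketch, assuming the 780M is exposed as card1 (the index varies between systems, so check /sys/class/drm/) and that pp_dpm_sclk marks the active level with “*”. Because the GPU may also sit at 800 MHz when idle, the lock check is only meaningful while the stress test is running:

```python
import re
import time
from pathlib import Path
from typing import Iterable, Optional

# Assumption: the iGPU is card1; adjust to the index on your system.
SCLK_PATH = Path("/sys/class/drm/card1/device/pp_dpm_sclk")

# Matches the active DPM level, e.g. "1: 1200Mhz *".
ACTIVE_LEVEL = re.compile(r"^\s*\d+:\s*(\d+)\s*mhz\s*\*\s*$", re.IGNORECASE)


def current_sclk_mhz(text: str) -> Optional[int]:
    """Return the shader clock of the level marked active, or None."""
    for line in text.splitlines():
        match = ACTIVE_LEVEL.match(line)
        if match:
            return int(match.group(1))
    return None


def looks_locked(
    samples: Iterable[Optional[int]], limit_mhz: int = 800, min_consecutive: int = 5
) -> bool:
    """True once min_consecutive samples in a row sit at or below limit_mhz."""
    streak = 0
    for mhz in samples:
        streak = streak + 1 if mhz is not None and mhz <= limit_mhz else 0
        if streak >= min_consecutive:
            return True
    return False


def log_forever(interval_s: float = 1.0) -> None:
    """Print a timestamped shader clock reading every interval_s seconds."""
    while True:
        mhz = current_sclk_mhz(SCLK_PATH.read_text())
        print(time.strftime("%H:%M:%S"), mhz, "MHz", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    log_forever()
```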

I’d really like to avoid the solution of “add some sketchy kernel parameter, and then maybe run a program to make it go away, but it’ll come back whenever it feels like it.” Something more permanent would be nice, especially if the problem appears right as I’m aiming at something.

Can someone at Framework confirm whether STAPM-related throttling exists because of legal requirements and liability issues? Maybe regulations require that the case not get hot enough to harm people’s skin. If not, this issue can be solved in software.

Two possible solutions:

1. Update the firmware/BIOS with better power profiles

  • Could be locked behind a BIOS setting with a temperature warning when disabling

2. Or package ryzenadj for Linux distros

  • Requires custom SELinux policies so it works without disabling Secure Boot or adding unsafe kernel args
  • May need a systemd service to ensure the power profile stays applied. Otherwise it will reset every time the laptop cools off
  • User-friendly desktop integration. Either merge it into Gnome’s “performance power mode”, or write a Gnome Shell extension to switch ryzenadj profiles. A warning dialog about case temperatures could be added
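For the systemd part of solution 2, one possible shape is a oneshot service plus a timer that re-runs it periodically. The Python sketch below only renders the unit files as text; the unit names, the /usr/local/bin/ryzenadj path and the one-minute interval are hypothetical, and it does nothing about the Secure Boot / kernel argument requirement:

```python
import shlex
from typing import Dict, Sequence


def render_units(
    ryzenadj: str, flags: Sequence[str], interval: str = "1min"
) -> Dict[str, str]:
    """Return {filename: contents} for a oneshot service and its timer."""
    exec_start = shlex.join([ryzenadj, *flags])
    service = (
        "[Unit]\n"
        "Description=Apply RyzenAdj performance preset\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={exec_start}\n"
    )
    # A timer with the same base name activates the matching .service;
    # OnUnitActiveSec re-triggers it after each activation.
    timer = (
        "[Unit]\n"
        "Description=Re-apply RyzenAdj performance preset periodically\n"
        "\n"
        "[Timer]\n"
        "OnBootSec=30s\n"
        f"OnUnitActiveSec={interval}\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return {
        "ryzenadj-performance.service": service,
        "ryzenadj-performance.timer": timer,
    }


if __name__ == "__main__":
    units = render_units("/usr/local/bin/ryzenadj", ["--stapm-limit=28000"])
    for name, text in units.items():
        print(f"# {name}\n{text}")
```

Placed in /etc/systemd/system, the timer would then be enabled with systemctl enable --now ryzenadj-performance.timer.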

I don’t know much about upstreaming Fedora packages, but solution 2 is doable in less than a month, assuming we find someone else to maintain it once the implementation is done :wink: In the long term, solution 1 would be better and cleaner.

I’m pretty sure this is not a regulatory issue. Other AMD platforms have similar issues, but they allow disabling STAPM in the BIOS.

Smokeless… XD

It might be a random bug: STAPM only reduces power from 33~35W to 27~30W; it doesn’t reduce it further. Fortunately, there’s a way of disabling it.

STAPM throttles from 33-35W to 27-30W within a few minutes, but if you let it cook for a couple hours (or suffocate it in a blanket) it will lock to 15W and under until the chassis cools off.
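To check which of these regimes the laptop is in, the output of ryzenadj --info could be parsed. A Python sketch, assuming the pipe-separated “Name | Value | Parameter” table RyzenAdj prints and using the STAPM VALUE row as the power reading. The wattage cut-offs are hypothetical boundaries between the ranges quoted above (33-35 W, 27-30 W, 15 W) and only mean anything under sustained full load:

```python
import subprocess
from typing import Dict


def parse_info_table(text: str) -> Dict[str, float]:
    """Collect numeric rows of a '| Name | Value | Parameter |' table."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue  # non-table lines such as "CPU Family: ..."
        try:
            values[cells[0]] = float(cells[1])
        except ValueError:
            continue  # column headers and separator rows
    return values


def classify(watts: float) -> str:
    """Map a power reading to the regimes described in this thread."""
    if watts >= 31:
        return "unthrottled"  # ~33-35 W
    if watts >= 20:
        return "stapm"  # ~27-30 W
    return "locked"  # ~15 W and below


def read_info() -> str:
    """Run ryzenadj --info (assumes ./ryzenadj and root)."""
    return subprocess.run(
        ["./ryzenadj", "--info"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    reading = parse_info_table(read_info()).get("STAPM VALUE")
    print(reading, classify(reading) if reading is not None else "n/a")
```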

@Adrian_Joachim as I mentioned above I tried to disable or cripple STAPM using Smokeless UMAF with no luck.

Use ryzenadj