GPE10 interrupt on Framework 16 causing 100% load on a single core

I’ve noticed several times that, after a resume from sleep, core number 1 on the Ryzen 9 is pinned at 100% by a kernel thread (it shows up red in htop).

I’ve managed to trace it to gpe10; disabling it stops the problem:

su -
echo "disable" > /sys/firmware/acpi/interrupts/gpe10
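For anyone trying to find out which GPE is storming on their own machine, here is a minimal sketch that lists the GPE counters with non-zero counts, busiest first. The function name top_gpes is made up for illustration, and it assumes each counter file starts with the count as its first field, as in the usual sysfs layout. The disable above can be reverted by writing "enable" to the same file.

```shell
#!/bin/sh
# Sketch: list ACPI GPE counters with non-zero counts, busiest first.
# top_gpes is a hypothetical helper name, not an existing tool.
# Usage: top_gpes [dir]   (dir defaults to the standard sysfs location)
top_gpes() {
    dir="${1:-/sys/firmware/acpi/interrupts}"
    for f in "$dir"/gpe[0-9A-F][0-9A-F]; do
        # Skip unreadable files and the literal pattern if nothing matched.
        [ -r "$f" ] || continue
        # Assumption: the first whitespace-separated field is the count.
        count=$(awk '{print $1; exit}' "$f")
        if [ "${count:-0}" -gt 0 ]; then
            printf '%s %s\n' "$count" "${f##*/}"
        fi
    done | sort -rn
}
```

Running top_gpes twice a few seconds apart should show which counter is actually climbing, rather than one that is merely large.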

Ubuntu 23.10
kernel 6.5.0-27-generic
No dgpu
64GB RAM
2xnvme pci4
Dell WD22TB4 dock
2 external displays

It seems hard to reproduce; it doesn’t always happen after resuming from sleep.

Happy to help with diagnosing the issue next time it happens, I just don’t know what information to collect and how.

Should be the same as [TRACKING] [FW 13 AMD 7840U] Cores stuck at low frequency and system lag - #31 by Mario_Limonciello

Suggest you contact support.

My guess is it’s getting triggered by your dock and exposing a bug in the PD controller.

Thank you. I have also observed the issue with the CPU getting stuck at 0.54 GHz, but would the two be related?

Getting stuck like that sounds like the EC triggering thermal throttling most likely.

I have replied on the other thread, but I’ve switched to auto-cpufreq from ppd and I haven’t seen the CPU stuck at a low frequency re-occur.

My issue however is different, certainly the symptom is different.

IMO this is very likely a coincidence with such a switch.

If you reproduce what I believe is thermal throttling again, please capture the /sys/kernel/debug/gpio file and then capture it again when it’s not reproducing.

I believe there should be a GPIO to indicate thermal throttling is in use, but it would need cross referencing against the framework hardware design to confirm.

If the GPIO values are the same there are some MSRs to capture to isolate if it’s a kernel bug. Basically the CPPC ones used by amd-pstate.
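The capture-and-compare step could be sketched like this. The helper name snapshot_gpio and the output file naming are made up for illustration; reading /sys/kernel/debug/gpio normally needs root and a mounted debugfs.

```shell
#!/bin/sh
# Sketch: save the GPIO debug dump to a labelled, timestamped file.
# snapshot_gpio is a hypothetical helper name, not an existing tool.
# Usage: snapshot_gpio <label> [source] [outdir]
snapshot_gpio() {
    label="$1"
    src="${2:-/sys/kernel/debug/gpio}"   # needs root and debugfs mounted
    outdir="${3:-.}"
    out="$outdir/gpio-$label-$(date +%Y%m%d-%H%M%S).txt"
    # Print the path of the snapshot only if the copy succeeded.
    cat "$src" > "$out" && echo "$out"
}
```

Run it as root once while the CPU is stuck (for example with the label throttled) and once when it is behaving (label normal), then compare the two files with diff to see which lines changed.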


Copy that, I’ll report back once it happens again. Thank you.

@Steiner
I guess the real question is what is “gpe10”. I.e. what is it connected to?
I have seen this sort of behavior on other non-framework laptops. For example on another laptop, gpe13 was being triggered, and it was traced back to faulty firmware on SSD disks. The workaround/fix was just like you have done, and just disable gpe13.
But, it would be nice to understand which device is linked to gpe10 on the FW16.
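One possible way to look for what gpe10 is wired to, as a rough sketch: as far as I know the sysfs names use the hex GPE number, so gpe10 would be GPE 0x10, and the ACPI spec names GPE handler methods _Lxx (level) or _Exx (edge). Assuming a disassembled DSDT is available (for example by copying /sys/firmware/acpi/tables/DSDT to dsdt.dat as root and running iasl -d dsdt.dat from acpica-tools), the hypothetical helper below greps for those methods. The handler may also live in an SSDT, or not exist as a method at all, so an empty result proves nothing.

```shell
#!/bin/sh
# Sketch: find the handler method(s) for one GPE in disassembled ACPI source.
# find_gpe_methods is a hypothetical helper name, not an existing tool.
# Usage: find_gpe_methods <dsl-file> <two-digit hex GPE number>
find_gpe_methods() {
    # Assumption: iasl writes declarations as "Method (_L10, 0, ...)".
    grep -nE "Method \(_[LE]$2," "$1"
}
```

For example, find_gpe_methods dsdt.dsl 10 prints the matching declarations with line numbers; the body of the method then shows which device objects get notified.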

It looks like I was also having a similar issue. With the laptop unplugged from a ThinkPad TB4 dock, the GPE10 count would stay flat; with the dock plugged in, the count would rise rapidly, and the output would switch between the line below and the same line with EN missing:

❯ cat /sys/firmware/acpi/interrupts/gpe10
    2200  EN     enabled      unmasked

One of the cores would stay pegged at between ~50% and ~100% usage. Updating the BIOS to 3.03 and the TB dock’s firmware seemingly solved it for the most part.
Initially when I had everything reboot after the firmware upgrades, the gpe10 values were still climbing, albeit much slower, and the EN values were always there when I checked, but unplugging the power cable and replugging it in stopped the values from climbing ever again. (The dock can be kinda funky with whether it will charge the FW16 or not at times, hence me resorting to using the FW16 charger on another port)
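To put a number on "climbing" versus "flat", a small sketch that samples the counter twice and prints interrupts per second. The function names gpe_count and gpe_rate are made up for illustration, and it assumes the count is the first field of the file, as in the output above.

```shell
#!/bin/sh
# Sketch: measure how fast a GPE counter is rising.
# gpe_count and gpe_rate are hypothetical helper names.

# Print the interrupt count (first field) from a GPE counter file.
gpe_count() {
    awk '{print $1; exit}' "$1"
}

# Sample the counter twice and print whole interrupts per second.
# Usage: gpe_rate [file] [seconds]   (seconds must be a positive integer)
gpe_rate() {
    file="${1:-/sys/firmware/acpi/interrupts/gpe10}"
    secs="${2:-5}"
    before=$(gpe_count "$file")
    sleep "$secs"
    after=$(gpe_count "$file")
    echo $(( (after - before) / secs ))
}
```

Comparing the rate with the dock plugged in, unplugged, and after a suspend/resume cycle makes it easier to report the behaviour than eyeballing repeated cat output.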

So far things are smoother than before when using a dock, currently using Tumbleweed + PPD + kernel 6.8.7
Edit: After putting the system to sleep and waking it up enough times, it looks like the GPE10 interrupts are rising again, though current CPU usage is only sitting around 15-20%.

I guess the real question is what is “gpe10”

Very true. I’d rather not disable hardware stuff willy-nilly personally.

It’s definitely related to the TB4 dock and whatever state things are in after resuming from sleep. When it’s happening, unplugging the dock brings CPU usage back down and stops the interrupt avalanche. Reconnecting the dock brings it back.

It’s happened again, on bios 3.03.