GPE10 interrupt on Framework 16 causing 100% load on a single core

Steiner · April 19, 2024, 10:01am

I’ve noticed several times that, after resuming from sleep, core number 1 on the Ryzen 9 is pinned at 100% by some kernel thread (it appears red in htop).

I’ve managed to trace it to gpe10; disabling it stops the problem:

su -
echo "disable" > /sys/firmware/acpi/interrupts/gpe10
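One way to pin down which GPE is firing, sketched under the assumption that each per-GPE file under /sys/firmware/acpi/interrupts/ starts with its event count (as in the gpe10 output quoted later in this thread). The hypothetical helpers gpe_snapshot and gpe_delta take two snapshots of every counter and list the ones that increased, busiest first.

```shell
# POSIX sh sketch; gpe_snapshot and gpe_delta are hypothetical helper names.

# Print "name count" for every per-GPE counter in DIR (default: the real sysfs path).
# The glob gpe[0-9A-F][0-9A-F] skips aggregate files such as gpe_all.
gpe_snapshot() {
    dir=${1:-/sys/firmware/acpi/interrupts}
    for f in "$dir"/gpe[0-9A-F][0-9A-F]; do
        [ -r "$f" ] || continue            # also skips an unmatched glob
        # The first whitespace-separated field is the event count.
        printf '%s %s\n' "$(basename "$f")" "$(awk '{print $1; exit}' "$f")"
    done
}

# Join two snapshots by GPE name and print "name delta" for counters that
# increased, sorted so the busiest GPE comes first.
gpe_delta() {
    awk 'NR == FNR { before[$1] = $2 + 0; next }
         ($1 in before) && ($2 + 0) > before[$1] { print $1, $2 - before[$1] }' "$1" "$2" |
        sort -k2,2nr
}
```

For example: gpe_snapshot > before.txt, wait a few seconds, gpe_snapshot > after.txt, then gpe_delta before.txt after.txt. During a storm like the one described here, gpe10 would be expected at the top of the list.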

Ubuntu 23.10
kernel 6.5.0-27-generic
No dGPU
64GB RAM
2x NVMe (PCIe 4.0)
Dell WD22TB4 dock
2 external displays

It seems hard to reproduce; it doesn’t always happen after resuming from sleep.

Happy to help with diagnosing the issue next time it happens, I just don’t know what information to collect and how.

Mario_Limonciello · April 19, 2024, 10:33am

This should be the same issue as [TRACKING] [FW 13 AMD 7840U] Cores stuck at low frequency and system lag - #31 by Mario_Limonciello

Suggest you contact support.

My guess is it’s getting triggered by your dock and exposing a bug in the PD controller.

Steiner · April 19, 2024, 1:25pm

Thank you. I have also observed the issue with the CPU getting stuck at 0.54 GHz, but would the two be related?

Mario_Limonciello · April 19, 2024, 1:27pm

Getting stuck like that sounds like the EC triggering thermal throttling most likely.

Steiner · April 19, 2024, 1:31pm

I have replied on the other thread: I’ve switched from PPD (power-profiles-daemon) to auto-cpufreq, and I haven’t seen the CPU get stuck at a low frequency since.

My issue here is different, though; at the very least, the symptom is different.

Mario_Limonciello · April 19, 2024, 2:11pm

IMO that is very likely just a coincidence with the switch.

Mario_Limonciello · April 19, 2024, 2:18pm

If you reproduce what I believe is thermal throttling again, please capture the /sys/kernel/debug/gpio file and then capture it again when it’s not reproducing.
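A minimal sketch of that capture, assuming debugfs is mounted at /sys/kernel/debug (reading the gpio file needs root). gpio_capture and gpio_changes are hypothetical helpers: one saves a labelled, timestamped copy, the other prints only the lines that differ between two copies.

```shell
# POSIX sh sketch; gpio_capture and gpio_changes are hypothetical helper names.

# Save a labelled copy of the GPIO dump in the current directory and print its name.
gpio_capture() {
    label=$1
    src=${2:-/sys/kernel/debug/gpio}   # needs root and a mounted debugfs
    out="gpio-$label-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$out" || return 1
    echo "$out"
}

# Print only the changed lines ("<" = first capture, ">" = second capture).
gpio_changes() {
    diff "$1" "$2" | grep '^[<>]'
}
```

For example, run gpio_capture throttled while the problem is present and gpio_capture normal after it clears, then pass both file names to gpio_changes. An empty result would mean the GPIO values are the same.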

I believe there should be a GPIO that indicates when thermal throttling is in use, but confirming that would require cross-referencing the Framework hardware design.

If the GPIO values are the same, there are some MSRs to capture to determine whether it’s a kernel bug: basically, the CPPC ones used by amd-pstate.
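For the MSR side, a hedged sketch. The addresses below are the ones defined in the Linux kernel's msr-index.h (MSR_AMD_CPPC_CAP1 at 0xc00102b0, MSR_AMD_CPPC_REQ at 0xc00102b3); whether these are exactly the registers Mario means is an assumption. Raw values can be read with rdmsr from msr-tools after loading the msr module. decode_cppc_cap1 and decode_cppc_req are hypothetical helpers that split each register into the one-byte fields amd-pstate uses.

```shell
# POSIX sh sketch; decode_cppc_cap1 and decode_cppc_req are hypothetical names.
# Raw values could come from msr-tools (as root, after "modprobe msr"):
#   rdmsr -p 1 0xc00102b0   # MSR_AMD_CPPC_CAP1 on CPU 1
#   rdmsr -p 1 0xc00102b3   # MSR_AMD_CPPC_REQ on CPU 1
# rdmsr prints hex without a 0x prefix; a leading 0x is accepted too.

# CPPC_CAP1: highest, nominal, lowest-nonlinear and lowest performance.
decode_cppc_cap1() {
    v=$(( 0x${1#0x} ))
    printf 'highest=%d nominal=%d lowest_nonlinear=%d lowest=%d\n' \
        $(( (v >> 24) & 0xff )) $(( (v >> 16) & 0xff )) \
        $(( (v >> 8) & 0xff )) $(( v & 0xff ))
}

# CPPC_REQ: energy-performance preference, desired, minimum and maximum perf.
decode_cppc_req() {
    v=$(( 0x${1#0x} ))
    printf 'epp=%d desired=%d min=%d max=%d\n' \
        $(( (v >> 24) & 0xff )) $(( (v >> 16) & 0xff )) \
        $(( (v >> 8) & 0xff )) $(( v & 0xff ))
}
```

Comparing the decoded request on the stuck core with the same register on an unaffected core (or after recovery) would show whether the OS is asking for a low frequency or the hardware is imposing it.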

Steiner · April 19, 2024, 3:43pm

Copy that, I’ll report back once it happens again. Thank you.

James3 · April 19, 2024, 11:11pm

@Steiner
I guess the real question is what “gpe10” is, i.e. what is it connected to?
I have seen this sort of behavior on other, non-Framework laptops. For example, on another laptop gpe13 was being triggered, and it was traced back to faulty firmware on the SSDs. The workaround was the same as yours: simply disable gpe13.
But it would be nice to understand which device is linked to gpe10 on the FW16.
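One way to start answering that, sketched under assumptions: dump and disassemble the ACPI tables with acpidump and iasl (acpica-tools), then search the disassembly for GPE 0x10. By ACPI convention a GPE is handled by a method named _Lxx (level-triggered) or _Exx (edge-triggered), and an embedded controller declares its GPE number with a _GPE object. gpe_handlers is a hypothetical helper, and the exact text iasl emits may vary by version; a hit only points at the AML that still has to be read to see what it notifies.

```shell
# POSIX sh sketch; gpe_handlers is a hypothetical helper name.
# Producing the input (as root for acpidump):
#   acpidump -b          # writes dsdt.dat, ssdt*.dat, ... to the current directory
#   iasl -d dsdt.dat     # disassembles to dsdt.dsl

# usage: gpe_handlers GPE FILE.dsl...   (GPE in hex: "gpe10", "0x10" or "10")
gpe_handlers() {
    g=${1#gpe}; g=${g#0x}
    g=$(printf '%s' "$g" | tr 'abcdef' 'ABCDEF')   # ACPI names use uppercase hex
    if [ ${#g} -eq 1 ]; then g="0$g"; fi
    shift
    # Match "Method (_L10," / "Method (_E10," and "Name (_GPE, 0x10)".
    grep -n -E "Method \(_[LE]${g}[, )]|Name \(_GPE, 0x${g}\)" "$@"
}
```

For example, gpe_handlers gpe10 dsdt.dsl ssdt*.dsl lists every matching line with its file and line number.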

Baconfield · April 21, 2024, 3:10am

It looks like I was also having a similar issue. When unplugged from a ThinkPad TB4 dock, the GPE10 count would stay flat; once plugged in, the count would rise rapidly, with the output alternating between the line below and the same line without EN:

❯ cat /sys/firmware/acpi/interrupts/gpe10
    2200  EN     enabled      unmasked

One of the cores would stay pegged somewhere between ~50% and ~100% usage. Updating the BIOS to 3.03 and the TB dock’s firmware seemingly solved it for the most part.
Initially, when I rebooted everything after the firmware upgrades, the gpe10 count was still climbing, albeit much more slowly, and EN was always present when I checked. Unplugging the power cable and plugging it back in stopped the count from climbing altogether. (The dock can be kind of funky about whether it will charge the FW16 at times, which is why I resorted to using the FW16 charger on another port.)
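Each reading of that file starts with the event count, followed by status words. As far as I can tell from the kernel's ACPI sysfs code (worth checking against your kernel version), EN means the GPE's enable bit is set in hardware, STS means its status bit is set, then comes enabled/disabled (or wake_enabled/invalid) and masked/unmasked. gpe_status is a hypothetical helper that turns one reading into key=value pairs, which makes a flickering EN easier to spot when sampling in a loop.

```shell
# POSIX sh sketch; gpe_status is a hypothetical helper name.
# Field meanings are an assumption based on the kernel's sysfs output format.
gpe_status() {
    awk '{
        en = "no"; sts = "no"; state = "?"; mask = "?"
        for (i = 2; i <= NF; i++) {
            if ($i == "EN") en = "yes"              # hardware enable bit set
            else if ($i == "STS") sts = "yes"       # status bit set (event pending)
            else if ($i == "masked" || $i == "unmasked") mask = $i
            else state = $i                         # enabled, disabled, ...
        }
        printf "count=%s hw_enable=%s status=%s state=%s mask=%s\n", $1, en, sts, state, mask
        exit
    }' "${1:-/sys/firmware/acpi/interrupts/gpe10}"
}
```

Calling it once a second (for instance in a loop with sleep 1) would show both how fast the count climbs and whether EN comes and goes as described above.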

So far things are smoother than before when using a dock. I’m currently on Tumbleweed + PPD + kernel 6.8.7.
Edit: after enough sleep/wake cycles, the GPE10 interrupt count is rising again, though CPU usage currently sits at only around 15-20%.

Steiner · April 22, 2024, 9:11am

I guess the real question is what is “gpe10”

Very true. Personally, I’d rather not disable hardware stuff willy-nilly.

It’s definitely related to something about the TB4 dock and whatever state things are in after resuming from sleep. When it’s happening, unplugging the dock brings CPU usage back down and stops the interrupt avalanche. Reconnecting the dock brings it back.

Steiner · April 22, 2024, 4:01pm

It’s happened again, on BIOS 3.03.

Piranha_Phish · July 26, 2024, 4:49am

I just want to chime in with a report of the same.

When docked to a Lenovo TB4 dock, gpe10 goes wild until I disable it. I can then immediately re-enable it, and everything is fine until some sort of ACPI/PM event occurs, such as the dock-connected displays going to sleep and then waking back up.
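That disable-then-re-enable workaround could be automated roughly as follows. This is only a sketch: the helper names are hypothetical, the default threshold of 1000 events per 5-second interval is an arbitrary assumption, and writing disable/enable to the sysfs file needs root.

```shell
# POSIX sh sketch; gpe_storming, gpe_kick and gpe_watch_once are hypothetical.

# Succeed if the count rose by at least THRESHOLD. usage: gpe_storming BEFORE AFTER THRESHOLD
gpe_storming() {
    [ $(( $2 - $1 )) -ge "$3" ]
}

# Disable the GPE, then immediately re-enable it.
gpe_kick() {
    f=${1:-/sys/firmware/acpi/interrupts/gpe10}
    echo disable > "$f" && echo enable > "$f"
}

# One check: sample the count INTERVAL seconds apart and kick the GPE if it storms.
gpe_watch_once() {
    f=${1:-/sys/firmware/acpi/interrupts/gpe10}
    interval=${2:-5}
    threshold=${3:-1000}
    before=$(awk '{print $1; exit}' "$f")
    sleep "$interval"
    after=$(awk '{print $1; exit}' "$f")
    if gpe_storming "$before" "$after" "$threshold"; then
        gpe_kick "$f" && echo "reset"
    else
        echo "ok"
    fi
}
```

Running gpe_watch_once periodically as root would approximate the manual workaround; like the other workarounds in this thread, it treats the symptom rather than the cause.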

Strangely, I can easily trigger this 100% consistently, while docked, by restarting the libvirtd systemd service. This seems to be related to libvirt querying the kernel for PM capabilities.
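To check whether the same trigger works on another machine, a minimal sketch: record the gpe10 count, run a command, wait a few seconds, and print how many events arrived. gpe_delta_around is a hypothetical helper, and the settle time is whatever you pass in.

```shell
# POSIX sh sketch; gpe_delta_around is a hypothetical helper name.
# usage: gpe_delta_around FILE SETTLE_SECONDS COMMAND [ARGS...]
gpe_delta_around() {
    f=$1; settle=$2; shift 2
    before=$(awk '{print $1; exit}' "$f")
    "$@" >&2                      # keep the command's output off our stdout
    sleep "$settle"
    after=$(awk '{print $1; exit}' "$f")
    echo $(( after - before ))
}
```

For example, as root: gpe_delta_around /sys/firmware/acpi/interrupts/gpe10 5 systemctl restart libvirtd. A large delta while docked, compared with a small one when undocked, would match the behavior described here.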

Running BIOS 3.03

Steiner · July 26, 2024, 8:34am

That’s interesting, I also use libvirt and normally have at least one VM running.

ryanpetris · August 1, 2024, 3:59pm

I’ve masked this interrupt entirely by adding the following to GRUB_CMDLINE_LINUX_DEFAULT in my GRUB configuration:

acpi_mask_gpe=0x10

Haven’t had a problem since and I haven’t noticed any adverse side effects.
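For reference, a sketch of what that looks like in /etc/default/grub. The existing options shown ("quiet splash", Ubuntu's default) are only an example; keep whatever your file already contains and append the parameter. The kernel documents acpi_mask_gpe= as a way to mask GPEs that flood the system, and 0x10 is the GPE number in hex, matching the gpe10 sysfs name. The regeneration command depends on the distribution.

```shell
# /etc/default/grub (excerpt): append acpi_mask_gpe to the existing options.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_mask_gpe=0x10"

# Then regenerate the GRUB config and reboot, for example:
#   Ubuntu:            sudo update-grub
#   openSUSE/Fedora:   sudo grub2-mkconfig -o /boot/grub2/grub.cfg
# A runtime mask (lost on reboot) can be set through the sysfs file as root:
#   echo mask > /sys/firmware/acpi/interrupts/gpe10
```

After rebooting, /proc/cmdline should contain the parameter, and reading the gpe10 file should report masked.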

Piranha_Phish · December 5, 2024, 2:27pm

So this may have been resolved for me with a recent update to the firmware for the ThinkPad Universal Thunderbolt 4 Dock (40B0).

I had been experiencing the issue on resume from sleep, and I could also reproduce it consistently on demand. But since the new firmware (v1.0.18, or v10.18 as reported by fwupdmgr) was offered and installed 2 days ago, I haven’t had the issue recur.

But I’m not fully convinced yet. I had experienced this with other USB-C docks, so it doesn’t make sense that the issue would be in the dock. And I can’t reproduce the issue with another dock right now either, so it could be a coincidence. I’ll report back if I experience the issue again.

Steiner · December 5, 2024, 3:05pm

For me, the issue is intermittent; it doesn’t always happen. Anyway, disabling that interrupt seems to work around the problem well enough, and I haven’t noticed any shenanigans as a result.

Piranha_Phish · December 5, 2024, 6:00pm

I hate the idea of a workaround, so I’m hoping the firmware solves it. But, now that I reflect, I bet you’re right and I just caught it in a dormant period.

FWIW, I did try masking the interrupt as well and had good results with no noticeable side effects, so add me as another person supporting that option as well.