[TRACKING] AMD: small group of kworkers keeping CPU 0 busy after suspend/resume cycle(s)

dk6 · February 28, 2024, 11:22pm

Still happens with 6.7.6-gentoo-x86_64 (vanilla 6.7.6 + genpatches from /~mpagano/genpatches/trunk/6.7)

Matt_Hartley · February 28, 2024, 11:31pm

See if you can rebuild as described here and see if this helps:

dk6 · March 3, 2024, 7:38pm

I applied V2 of the patch to 6.7.8;

You say: see if this helps. Do you mean the patch is supposed to help with the issue? I was under the impression it only exposed the EC and allowed console output access.

Aron_Griffis · March 3, 2024, 10:25pm

Have you received any useful reply to your support case? I’ve been staying on 6.6.13 for now since it doesn’t exhibit the problem. I’m not anxious to file a support case personally, but I hope Framework is giving this attention.

dimitris · March 3, 2024, 10:45pm

Sorry I haven’t yet, things got a little busy. I’ll try to build a 6.7.8 plus the patches today to get up to date console output now that the behavior is more reproducible.

BTW if power cycling the hub/dock involved is an option then I wouldn’t hold onto 6.6, there are lots of important fixes, both security and FW-AMD specific, in the 6.7 series.

@Matt_Hartley I’ll go ahead and send an update on my support thread, unless I hear otherwise that it makes more sense to track here? (edit: I’ve responded to the ticket with logs/console output)

Mario_Limonciello · March 4, 2024, 3:44pm

Just a thought: does reverting commit b5539eb in torvalds/linux (“ACPI: EC: Fix acpi_ec_dispatch_gpe()”) help?

Matt_Hartley · March 4, 2024, 6:28pm

Ticket is easier. Thanks. We are at a workshop this week, so my replies here will be very limited.

Matt_Hartley · March 4, 2024, 6:30pm

This is a good idea. I have yet to repro this here, which makes it challenging to put a ton of focus into it. If it persists across multiple updates and we can repro it, we can file a bug.

dimitris · March 4, 2024, 7:11pm

Tried the patch (edit: the reverse of the patch, that is) on top of 6.7.8; unfortunately it didn’t seem to change the pattern.

Mario_Limonciello · March 4, 2024, 7:22pm

Thanks for trying.

Gabriel_Tremblay · March 5, 2024, 12:43pm

Same problem here, on 6.7.8-arch1-1.

The behaviour I’m experiencing seems to be triggered by a specific action. If I reboot with AC on and refrain from unplugging the laptop or closing the lid, the issue is delayed for a while. Masking gpe10 before the problem starts seems to prevent it, but so far I haven’t been able to validate that completely, as the problem might appear only after 24-48 hours on my laptop. Masking gpe10 also seems to cause some issues: my laptop does not wake from sleep when I open the lid if gpe10 is masked, so I need to use the power button instead.
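For readers who want to try the masking described above, here is a minimal sketch. It assumes the kernel accepts the strings mask and unmask written to a GPE counter file under /sys/firmware/acpi/interrupts/ (as described in the kernel's sysfs-firmware-acpi ABI documentation), that it is run as root, and that the setting does not survive a reboot. The function name gpe_set_mask is made up for illustration. Keep in mind the side effect reported above: with gpe10 masked, lid-open wake may stop working.

```shell
#!/bin/sh
# Sketch only: write "mask" or "unmask" to an ACPI GPE counter file.
# gpe_set_mask is a hypothetical helper name, not an existing tool.
# Assumption: the kernel accepts these two strings on this sysfs file.

gpe_set_mask() {
    # $1: path to the GPE file, e.g. /sys/firmware/acpi/interrupts/gpe10
    # $2: either "mask" or "unmask"
    case "$2" in
        mask|unmask)
            printf '%s\n' "$2" > "$1"
            ;;
        *)
            echo "usage: gpe_set_mask FILE mask|unmask" >&2
            return 2
            ;;
    esac
}
```

As root, gpe_set_mask /sys/firmware/acpi/interrupts/gpe10 mask would mask the GPE, and the same call with unmask would undo it. The acpi_mask_gpe=0x10 kernel command-line parameter is another way to mask the same GPE from boot; sysfs names GPEs in hex, so gpe10 corresponds to 0x10.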

dimitris · March 5, 2024, 6:27pm

Maybe important update: Turns out I can also reproduce this with 6.6.14. I installed it on Fedora using the fedora-repos-archive repository:

sudo dnf install fedora-repos-archive
sudo dnf --refresh --enablerepo updates-archive install kernel-6.6.14

I’ll update the title and initial description too accordingly.

Matt_Hartley · March 5, 2024, 6:40pm

Added this to the ticket for repro when I return to my home office Friday.

Aron_Griffis · March 6, 2024, 10:48am

I followed the instructions to build with the v2 patches, but when I try to reboot into it, I get a message about “bad shim signature.” Am I missing a step?

UPDATE: I was able to disable secure boot with sudo mokutil --disable-validation (from Secureboot - Fedora Project Wiki)

Gabriel_Tremblay · March 13, 2024, 2:15pm

Good news, I was able to precisely replicate the problem on demand. I don’t have what it takes to track why it’s happening, but I can definitely show how it’s happening.

To test my hypothesis, I used this command:

watch 'grep . -r /sys/firmware/acpi/interrupts/ | grep gpe10'

Also, this is IMPORTANT: reboot before you test, once you get into the problem, it creates an infinite loop somewhere and you can’t test the behaviour anymore.
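If a number is easier to compare than watch output, the counter can also be sampled twice and subtracted. This is a rough sketch, assuming the first whitespace-separated field of the gpe10 file is the event count (the remaining fields being status flags). The function names gpe_count and gpe_delta are made up for illustration.

```shell
#!/bin/sh
# Sketch only: measure how much an ACPI GPE counter grows over an interval.
# Assumption: the first field on the first line of the file is the count.

gpe_count() {
    # $1: path to a counter file such as /sys/firmware/acpi/interrupts/gpe10
    awk 'NR == 1 { print $1 }' "$1"
}

gpe_delta() {
    # $1: path to the counter file
    # $2: seconds to wait between the two samples
    first=$(gpe_count "$1")
    sleep "$2"
    second=$(gpe_count "$1")
    echo $(( second - first ))
}
```

For example, gpe_delta /sys/firmware/acpi/interrupts/gpe10 10 prints how many times gpe10 fired during ten seconds. With nothing being plugged or unplugged, a value that stays above zero would match the runaway behaviour described in this thread.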

This interrupt fires every time I plug a USB-C device into a USB-C port, whichever port it is. On my machine I use the back left one.

If I use a powerbank (Anker Nano II in my case), the interrupts get called when I unplug and when I plug it back. That’s it. 4 cycles : 8 interrupts.

Now, in my office setup, I use a docking station, specifically a Kensington SDS700T. As soon as I plug my laptop into it, I get a dozen interrupts. From then on, even if I permanently disconnect the docking station from the laptop, some sort of internal kernel loop keeps firing gpe10 forever, and the rate accelerates over time.

After ~24 hours of this, counted from the first unplug, the handler starts to hog the CPU until it takes 100% of the handling core, rendering the machine unresponsive.

I did test another docking station, a CableMatters 201308, and it does NOT create the problem. I guess it might be related to a protocol error that gets into a weird loop.

Both docking stations had Ethernet plugged in, but nothing else.

To add to the weirdness of the problem: sometimes it pauses itself for a couple of minutes, but it always restarts eventually if I have plugged in the Kensington even once.

I will be changing my docking station in the meantime, but I’m pretty sure this will impact other devices.

Let me know if I can help. I’ll keep the problematic docking station on the side for now.

Mario_Limonciello · March 13, 2024, 2:30pm

Try upgrading to 3.03b if you haven’t already. This brings an updated EC.

Gabriel_Tremblay · March 13, 2024, 2:30pm

I’m running on 3.03b

I can confirm that this problem existed on 3.03 and still exists on 3.03b.

Gabriel_Tremblay · March 13, 2024, 3:11pm

Update:

We had a stack of USB cables, from Anker to OEM Apple, and other power banks lying around (the legit Framework one, and a PD 90W).

The behaviour becomes even more interesting. Under no circumstances did any cable produce the problem while using any of the power bricks we had, including the Framework one.

However, while using the Kensington dock, we found only two cables that were able to reproduce the problem: an unbranded one and the one that comes with the station (how unfortunate…). Those two reproduced it every single time. The Anker-branded and Apple OEM cables do not cause the problem on the station.

It seems that the problem might be related to power negotiation edge cases with certain cables under certain circumstances, regardless of the power that can be provided by the power brick on the other side (which separates the problem from what 3.03b is supposed to fix).

So I guess an easy fix for now is just getting rid of your cable that causes the problem.

dimitris · March 13, 2024, 7:07pm

Interesting. I only have the TB4 cable that came with the dock, and an older and very short TB3 Anker cable. I tried the Anker this morning and still saw this problem. Unless I hear back that you’ve reproduced this with your Anker or Apple cables soonish, I might head over to an Apple store to pick up one of their TB4 cables.

I assume both of those that are symptom-free (Anker and Apple) are active cables? IIRC Apple has both active and passive ones, or something to that effect - the longer one (2m?) being active.

Gabriel_Tremblay · March 13, 2024, 7:39pm

Just to make sure, this is how I get to a “clean state”:

The best way to test is to reboot while using the Framework OEM power adapter and set the watch on gpe10. You should see 0 (or nothing more than 2-3, let’s say).
Then check that every time you unplug/plug, the count increases by 2. If you get there, you are in a stable state.
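The arithmetic behind that check can be written down as a small sketch. It assumes, based only on the numbers reported in this thread (4 cycles giving 8 interrupts), that a healthy system adds exactly 2 to the gpe10 count per unplug/plug cycle. The function names expected_gpe_count and is_stable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare an observed gpe10 count with the count expected
# from a known number of unplug/plug cycles.
# Assumption (from this thread): 2 interrupts per unplug/plug cycle.

expected_gpe_count() {
    # $1: count right after reboot, $2: number of unplug/plug cycles
    echo $(( $1 + 2 * $2 ))
}

is_stable() {
    # $1: starting count, $2: cycles performed, $3: count observed now
    # Succeeds only if the observed count matches the expected one.
    [ "$3" -eq "$(expected_gpe_count "$1" "$2")" ]
}
```

Starting from 0, four cycles should leave the counter at 8, so is_stable 0 4 8 succeeds. A count that keeps climbing with no further plugging makes the check fail, which is the symptom described earlier in the thread.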

What you are saying is quite interesting…

I found at least two cables, including the one that came with the docking station, that have the problem, so you might be out of luck with both of your cables for now. An interesting fact: as you are saying, both of these faulty cables are supposed to be Thunderbolt compatible, so maybe the problem comes from this feature. As of now, I’m using a nondescript Anker USB-C cable that has a fair chance of not supporting Thunderbolt, on the faulty dock.

The other dock we have, the CableMatters, has a built-in cable. It seems we don’t see the problem with it since it does not support Thunderbolt; it’s just DisplayPort over USB-C. My Kensington dock, however, does support Thunderbolt 4.

I would be tempted to say that the problem comes from Thunderbolt 4 negotiation.
