[TRACKING] AMD: small group of kworkers keeping CPU 0 busy after suspend/resume cycle(s)

I have an Anker 778 Thunderbolt 4 dock plugged into my FW AMD 13 running Fedora (kernel 6.8.4-200). I was having the same kworker issue, so I’ve been watching this thread.

Yesterday I updated the dock firmware from v1.23 to the latest version (1.78, I think). After 24 hours of frequent suspend/resume cycles (without rebooting) I have had no issues.

Unfortunately Anker does not supply a change log, so I’m not sure what issues the firmware addressed.

More info:
I have an HDMI monitor, USB scanner and USB microstreamer plugged in to the dock.

Thanks for the update! Were you able to update the firmware from Linux or did you boot into Windows or an Anker-provided ISO?

I had to boot into Windows and run the Anker Dockmanager app (dockmanager download - Anker US). They have Windows and Mac versions, but no Linux version that I could find.

Update: I commented too soon. Busy kworkers are back. Rebooting…

If feasible, does power-cycling the dock (once or a few times) “resolve” the kworker issue without a reboot? It does for me.

I did try that recently and it did not seem to help, but I will try that again next time, before rebooting.

Dimitris,

You are correct. After 2 or 3 power cycles of the dock the issue is resolved.

Jim

@Matt_Hartley, as discussed on the support thread some time ago, I finally opened a Fedora bug and tagged you there.

As mentioned in the bug thread I have perf data but would appreciate any insights on whether it is safe to upload to a public forum. As far as I can tell it’s just counts and stack traces, no actual data, but I haven’t used perf before so expert input is very welcome on this.

Point I neglected to make with my last update:

This reproduces on first boot too, suspend/resume not required.

Discourse won’t let me update the thread title :frowning:

It’s listed here and that is what matters. Thanks for the update.

FWIW, I noticed the same symptoms yesterday with a Caldigit TS4 dock connected.
Unplugging and reconnecting the dock solved it for now.

I’ve been following this thread for a while and realised I hadn’t added my own story.

I experience the exact same symptoms periodically - the display and interactivity slowing to an absolute crawl (1 or 2 fps, and mouse/keyboard events taking 0.5s or more to respond) and the same kworker process using lots of CPU.

It happens once or twice a week, comes on very suddenly, and always while I’m either playing a video or in a video call. The only resolution I’ve found is to reboot.

But I don’t have a dock. My only regularly connected peripherals are a QHD monitor connected by a normal HDMI-to-USB-C cable, a USB wireless mouse and a USB wireless mouse/keyboard combination. I haven’t spotted any pattern in my use of these peripherals that lines up with the sudden sluggishness.

If my experience is any use I’m happy to provide whatever data you need.

My laptop is the 7840U and it’s running Fedora (Silverblue) 40, although this also happened with many of the Fedora 39 kernels.

Simply disabling the gpe10 interrupt, as mentioned earlier in this thread, caused my machine to hang completely when trying to suspend.

What does seem to work well, without any stability issues, is to have two systemd oneshot services:

  1. the first one disables the interrupt; it is pulled in by multi-user.target and sleep.target and is ordered after both targets
  2. the second one enables the interrupt again; it is pulled in by sleep.target and ordered before it.

So the net effect is that the interrupt is disabled at boot time and gets re-enabled just before suspend. It then gets disabled again when the machine comes back out of suspend.
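
In case it helps anyone check whether the workaround is actually in effect, here is a rough sketch that samples the gpe10 counter twice and reports how often it fired in between. It assumes the same sysfs path used elsewhere in this thread and that the first field of that file is the interrupt count; `gpe_count` and `gpe_delta` are made-up helper names, and gpe10 may not be the relevant GPE on other hardware.

```shell
#!/bin/sh
# Path as used in this thread; override GPE_FILE if another GPE is the noisy one.
GPE_FILE="${GPE_FILE:-/sys/firmware/acpi/interrupts/gpe10}"

# Print the interrupt count (assumed to be the first whitespace-separated field).
gpe_count() {
    awk '{ print $1; exit }' "$1"
}

# Print how many times the GPE fired during an interval (seconds, default 5).
gpe_delta() {
    file="$1"
    interval="${2:-5}"
    before=$(gpe_count "$file")
    sleep "$interval"
    after=$(gpe_count "$file")
    echo $((after - before))
}

# Only touch the real file if it exists and is readable on this machine.
if [ -r "$GPE_FILE" ]; then
    cat "$GPE_FILE"    # raw line: count followed by the enabled/disabled state
    echo "fired $(gpe_delta "$GPE_FILE" 5) times in 5 s"
fi
```

A count that keeps climbing quickly while the kworkers are busy, and stays flat once the interrupt is disabled, would be consistent with what is described above.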

Sounds like a reasonable workaround for now until this can be addressed in firmware.
Make sure that Framework has a support ticket for you so they can track fixing it in firmware.

Thanks for that! I’ve done this too, it seems to work so far. I’ll exercise/keep an eye on it for a couple days before adding this information to my existing ticket/email thread with support. I’d also repeat Mario’s suggestion to open your own ticket too, since this seems to affect a variety of docks/hubs.

Thank you for the workaround. Here are the systemd one-shot services I’m using, in case anybody wants a drop-in config:

# /etc/systemd/system/gpe10-boot.service
[Unit]
Description=Disable gpe10 interrupt on boot

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo disable > /sys/firmware/acpi/interrupts/gpe10"

[Install]
WantedBy=multi-user.target

and

# /etc/systemd/system/gpe10-sleep.service
[Unit]
Description=Enable gpe10 interrupt for sleep
Before=sleep.target
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo enable > /sys/firmware/acpi/interrupts/gpe10"
ExecStop=/bin/sh -c "echo disable > /sys/firmware/acpi/interrupts/gpe10"

[Install]
WantedBy=sleep.target

After putting these in place, you will need to run the following as root:

systemctl daemon-reload
systemctl enable gpe10-{boot,sleep}
systemctl start gpe10-boot

Ha, talk about serendipity. I just added my version of those in the Fedora bug.

(edit to add: yes this has been a consistent workaround since implemented 3 days ago, many thanks to @rvdp!)

@Matt_Hartley workflow question for you: I’ll follow up on my FW support ticket. Given that this looks like something with a firmware root cause, as Mario hinted above, should the Fedora ticket still be in the mix? Counting that, there are two “public” threads tracking this issue plus the various 1:1 support threads. Feels like it would be better to consolidate any new info/workaround sharing into one?

The downside of that workaround is that my laptop no longer connects to my dock when I plug it in. It all stays dark until I re-enable the interrupt temporarily, then I can dock, then I can re-disable the interrupt afterward.

Funny that I didn’t test that until this morning!
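
For anyone hitting the same docking problem, the enable, dock, disable sequence could be wrapped up roughly like this. It is only a sketch under the same assumptions as the unit files above (the gpe10 path, run as root); `gpe_set` and `dock_window` are hypothetical helper names and the 30-second window is an arbitrary guess.

```shell
#!/bin/sh
# Same control file as in the unit files above; override GPE_FILE to experiment.
GPE_FILE="${GPE_FILE:-/sys/firmware/acpi/interrupts/gpe10}"

# Write "enable" or "disable" to the GPE control file (needs root on a real system).
gpe_set() {
    case "$1" in
        enable|disable) printf '%s\n' "$1" > "$GPE_FILE" ;;
        *) echo "usage: gpe_set enable|disable" >&2; return 1 ;;
    esac
}

# Enable the interrupt, leave a window (seconds, default 30) to plug in the dock,
# then disable it again.
dock_window() {
    gpe_set enable || return 1
    sleep "${1:-30}"
    gpe_set disable
}
```

After sourcing this as root, `dock_window 30` would leave half a minute to plug in the dock before the interrupt is turned off again.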

Hmm, weird, I can still connect my dock without issues with this workaround applied.

I’ve actually only seen that issue once, so far. Not sure what prior sequence led to it (cold boot disconnected from dock, etc). I’ll follow up if it happens again and I can nail it down.