[TRACKING] AMD: small group of kworkers keeping CPU 0 busy after suspend/resume cycle(s)

If feasible, does power-cycling the dock (once or a few times) “resolve” the kworker issue without a reboot? It does for me.

I did try that recently and it did not seem to help, but I will try that again next time, before rebooting.

Dimitris,

You are correct. After 2 or 3 power cycles of the dock the issue is resolved.

Jim

@Matt_Hartley as discussed on the support thread some time ago, I finally opened a Fedora bug and tagged you there.

As mentioned in the bug thread I have perf data but would appreciate any insights on whether it is safe to upload to a public forum. As far as I can tell it’s just counts and stack traces, no actual data, but I haven’t used perf before so expert input is very welcome on this.


Point I neglected to make with my last update:

This reproduces on first boot too, suspend/resume not required.

Discourse won’t let me update the thread title :(

It’s listed here and that is what matters. Thanks for the update.

FWIW, I noticed the same symptoms yesterday with a Caldigit TS4 dock connected.
Unplugging and reconnecting the dock solved it for now.

I’ve been following this thread for a while and realised I hadn’t added my own story.

I experience the exact same symptoms periodically - the display and interactivity slowing to an absolute crawl (1 or 2 fps, and mouse/keyboard events taking 0.5s or more to respond) and the same kworker process using lots of CPU.

It happens once or twice a week, comes on very suddenly, and always while I’m either playing a video or in a video call. The only resolution I’ve found is to reboot.

But I don’t have a dock - my only regularly connected peripherals are a QHD monitor connected by a normal HDMI-to-USB-C cable, a USB wireless mouse and a USB wireless mouse/keyboard combination. I haven’t spotted any pattern in my use of these peripherals that lines up with the sudden sluggishness.

If my experience is any use I’m happy to provide whatever data you need.

My laptop is the 7840U and it’s running Fedora (Silverblue) 40, although this also happened with many Fedora 39 kernels.


Simply disabling the gpe10 interrupt, as mentioned earlier in this thread, caused my machine to completely hang when trying to suspend.

What does seem to work well without any stability issues is to have two systemd oneshot services:

  1. the first one disables the interrupt and that gets pulled in by multi-user.target and sleep.target and is ordered after both targets
  2. the second one enables the interrupt again and is pulled in by sleep.target and ordered before it.

So the net effect is that the interrupt is disabled at boot time, and gets re-enabled just before suspend. It then gets disabled again when the machine comes back out of suspend.
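Before applying this, it may be worth checking that gpe10 really is the interrupt that is firing. Here is a rough sketch, assuming the first whitespace-separated field of the sysfs file is the interrupt count; the function names gpe_count and gpe_rate are made up for illustration:

```shell
#!/bin/sh
# Sketch: estimate how fast an ACPI GPE counter is climbing.
# Assumption: the first field of the sysfs file is the interrupt count.

# gpe_count FILE: print the interrupt count stored in FILE.
gpe_count() {
    awk '{ print $1; exit }' "$1"
}

# gpe_rate FILE [SECONDS]: print interrupts per second, averaged over
# SECONDS (default 5, must be at least 1).
gpe_rate() {
    file=$1
    secs=${2:-5}
    before=$(gpe_count "$file")
    sleep "$secs"
    after=$(gpe_count "$file")
    echo $(( (after - before) / secs ))
}
```

Calling `gpe_rate /sys/firmware/acpi/interrupts/gpe10 5` while the kworkers are busy should show a rate well above zero if this interrupt is the culprit; a rate near zero would suggest the cause is something else.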


Sounds like a reasonable workaround for now until this can be addressed in firmware.
Make sure that Framework has a support ticket for you so they can track fixing it in firmware.

Thanks for that! I’ve done this too, it seems to work so far. I’ll exercise/keep an eye on it for a couple days before adding this information to my existing ticket/email thread with support. I’d also repeat Mario’s suggestion to open your own ticket too, since this seems to affect a variety of docks/hubs.

Thank you for the workaround. Here are the systemd one-shot services I’m using, in case anybody wants a drop-in config:

# /etc/systemd/system/gpe10-boot.service
[Unit]
Description=Disable gpe10 interrupt on boot

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo disable > /sys/firmware/acpi/interrupts/gpe10"

[Install]
WantedBy=multi-user.target

and

# /etc/systemd/system/gpe10-sleep.service
[Unit]
Description=Enable gpe10 interrupt for sleep
Before=sleep.target
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo enable > /sys/firmware/acpi/interrupts/gpe10"
ExecStop=/bin/sh -c "echo disable > /sys/firmware/acpi/interrupts/gpe10"

[Install]
WantedBy=sleep.target

After putting these in place, you will need to:

systemctl daemon-reload
systemctl enable gpe10-{boot,sleep}
systemctl start gpe10-boot
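
To confirm the boot service did its job, something like the sketch below may help. It assumes the sysfs file contains the word "enabled" or "disabled" after the count, which is my understanding of the format on recent kernels; gpe_state is a made-up helper name, not an existing tool:

```shell
#!/bin/sh
# Sketch: report whether a GPE is currently enabled or disabled.
# Assumption: the sysfs file contains the word "enabled" or "disabled".

# gpe_state FILE: print "enabled", "disabled", or "unknown".
gpe_state() {
    case "$(cat "$1")" in
        *disabled*) echo disabled ;;
        *enabled*)  echo enabled ;;
        *)          echo unknown ;;
    esac
}
```

After `systemctl start gpe10-boot`, running `gpe_state /sys/firmware/acpi/interrupts/gpe10` should print `disabled` if the write took effect.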

Ha, talk about serendipity. I just added my version of those in the Fedora bug.

(edit to add: yes this has been a consistent workaround since implemented 3 days ago, many thanks to @rvdp!)

@Matt_Hartley workflow question for you: I’ll follow up on my FW support ticket. Given that this looks like something with a firmware root cause, as Mario hinted above, should the Fedora ticket still be in the mix? Counting that, there are two “public” threads tracking this issue plus the various 1:1 support threads. Feels like it would be better to consolidate any new info/workaround sharing into one?

The downside of that workaround is that my laptop no longer connects to my dock when I plug it in. It all stays dark until I re-enable the interrupt temporarily, then I can dock, then I can re-disable the interrupt afterward.
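
For anyone hitting the same thing, the enable / dock / disable dance can be wrapped in a small helper. This is only a sketch: set_gpe and dock_window are names invented here, it has to run as root to write to sysfs, and it assumes the same gpe10 path used in the unit files above.

```shell
#!/bin/sh
# Sketch: temporarily re-enable gpe10 so a dock can be plugged in.
# Must run as root when pointed at the real sysfs file.

# set_gpe STATE [FILE]: write "enable" or "disable" to the GPE control file.
set_gpe() {
    state=$1
    file=${2:-/sys/firmware/acpi/interrupts/gpe10}
    case "$state" in
        enable|disable) printf '%s\n' "$state" > "$file" ;;
        *) echo "usage: set_gpe enable|disable [file]" >&2; return 2 ;;
    esac
}

# dock_window [FILE]: enable the interrupt, wait for Enter, disable it again.
dock_window() {
    set_gpe enable "$@" || return
    printf 'Interrupt enabled. Plug in the dock, then press Enter... '
    read -r reply
    set_gpe disable "$@"
}
```

Running `dock_window` from a root shell leaves the interrupt enabled only for as long as it takes to plug in the dock and press Enter.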

Funny that I didn’t test that until this morning!

Hmm, weird, I can still connect my dock without issues with this workaround applied.

I’ve actually only seen that issue once, so far. Not sure what prior sequence led to it (cold boot disconnected from dock, etc). I’ll follow up if it happens again and I can nail it down.

I had the opportunity to swap out my Anker dock for a Dell Thunderbolt Dock WD22TB4 and all of my problems have been resolved. No more busy kworkers and no charging issues.

Fedora 40, kernel 6.10.6-200

So this may have been resolved for me with a recent update to the firmware for the ThinkPad Universal Thunderbolt 4 Dock (40B0).

I had been experiencing the issue with resume from sleep as well as being able to consistently reproduce it on demand. But since the new firmware (v1.0.18, or v10.18 as reported by fwupdmgr) was offered and installed 2 days ago, I haven’t had the issue reoccur.

But I’m not fully convinced yet. I had experienced this with other USB-C docks, so it doesn’t make sense that the issue would be in the dock. And I can’t reproduce the issue when testing with another dock either. So it could be coincidence. I’ll report back if I experience the issue again.

I have been on v1.0.18 for a while now and still have the issue.

I am happy to report that the issue has not come back for me as of yet.

It has been almost 2 weeks and I have docked/undocked at least a dozen times, tried other docks, plugged in various devices, rebooted a handful of times, and I haven’t seen the issue return.

I don’t know exactly what to chalk it up to. But I’m happy so far. And knocking on wood.