[TRACKING] AMD: small group of kworkers keeping CPU 0 busy after suspend/resume cycle(s)

Just for clarity - TBT4 isn’t an alt mode that is negotiated. USB4 is the alt mode that is negotiated. TBT4 is a branding for USB4 that supports a specific set of speeds and features.

But nonetheless I agree this certainly sounds like a failure of the PD negotiation. You guys don’t happen to have access to a PD analyzer, do you? Something like Total Phase?

Even though it might be a problem only showing up with some cables, the real problem might be in the dock itself.
This is the kind of situation where it makes sense to analyze which component in the chain is behaving out of spec.

Sadly not, but we have a few Framework laptops. Waiting for someone to chime in on this thread so we can test any suggested solution / debug idea.

I’ve been testing the 6.9 kernel (with cros_ec patches and the out of tree framework-laptop kmod) and I’m seeing something potentially related.

This behaviour is consistent after sleep:

On resume:

  • Takes a long time to present screen
  • Power Indicator LED still breathes/flashes even when lid open/power on
  • Wayland Plasma greeter appears after >15 seconds
  • Keypress input is erratic; keystrokes not registering and/or doubled/tripled in the password input box
  • System becomes unusable until reboot

I get a heap of these in the journal from resume where the issue exhibits:

Mar 15 08:31:58 emiemi-3d-ae-net-nz kernel: atkbd serio0: Spurious NAK on isa0060/serio0. Some program might be trying to access hardware directly.

I’ll try to set up a watch on the EC console that dumps its output on resume to somewhere I can look at it.
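To get a feel for how dense these messages are after a resume, something like the following could help. This is only a rough sketch: the function name `count_spurious_nak` is made up for illustration, and it assumes the journal is in the default short format shown in the line quoted above (`Mon DD HH:MM:SS host kernel: ...`).

```python
import re
from collections import Counter

# Matches the syslog-style timestamp and the atkbd message quoted above.
# Assumption: default "short" journal output format.
NAK_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"atkbd \S+: Spurious NAK"
)


def count_spurious_nak(lines):
    """Return a Counter mapping 'Mon DD HH:MM' to the number of NAK lines."""
    counts = Counter()
    for line in lines:
        match = NAK_RE.match(line)
        if match:
            counts[match.group("minute")] += 1
    return counts


# Example with the single line quoted in the post above.
sample_lines = [
    "Mar 15 08:31:58 emiemi-3d-ae-net-nz kernel: atkbd serio0: "
    "Spurious NAK on isa0060/serio0. Some program might be trying "
    "to access hardware directly.",
]
for minute, n in count_spurious_nak(sample_lines).items():
    print(minute, n)
```

To run it against a real log, the kernel messages for the current boot could be saved with `journalctl -k -b > kernel.log` and the open file passed to `count_spurious_nak`.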

@dimitris @Gabriel_Tremblay These are some interesting updates! I continue to see the problem regularly with my Plugable TBT4-UDX1 (Plugable Thunderbolt 4 & USB4 HDMI Docking Station with 96W Charging).

It’s fairly infuriating, and often takes numerous power cycles of the dock before it settles down. The only thing keeping me sane is knowing there are other people working on it. :pray:

I’m using the 1m cable that came with the dock, and I don’t have any other cables to test. I might try purchasing a different (shorter) cable to see if it makes a difference.

EDIT: Now that I’ve re-read @Gabriel_Tremblay’s comments, it seems that the purpose of a different cable would be to avoid TB4, but that doesn’t fit my use-case, so I’ll hold off on buying something.

No surprise, that Plugable looks identical to the Kensington SD5780T dock I use. I bet the cables are identical too.

I can’t find it right now but I recall in a list/review/teardown site that it is (both are) a relabel of the same ODM.

Oh, interesting. I wonder if the fault is in the dock, in the laptop, or in Linux. It seems relevant that the problem didn’t start showing up until late in the 6.6.x kernel series? But if the problem is in the dock, I’d like to look into replacing it with something different.

It also happens with this Mokin dock (Mokin 15-IN-1 Thunderbolt 4 Adapter with Triple Monitor Dock), which seems to have different ports than the Kensington/Plugable (though it might have the same chipset inside).

For my use case (machine docked almost all the time, external monitor) this kworker/interrupt issue isn’t too big a deal, although I’d like to see it resolved.

Getting dock-connected monitors (in my case using the dock’s HDMI output) to work would be a little more interesting, so I can retire the extra cable that’s protruding from the other side of the laptop. This is far more miss than hit on the Linux side, and so far I haven’t seen any traction upstream.

(Note that the exact same dock/monitor/cable combo works with an 11th gen FW laptop, and also with an M1 Mac Pro.)

So, I’m holding off on swapping out the dock until there’s some clarity on that front. I’d prefer not to throw parts at the problem anyway.

It could be that these issues share a root cause.

Anyway, let’s keep this thread focused on tracking down these stray interrupts/kworker CPU cycles. There are other threads here that cover the dock-connected monitor issue on Linux.
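For anyone who wants to see which interrupt line is actually firing while the kworkers are busy, here is a minimal sketch that diffs two snapshots of `/proc/interrupts`. The helper names (`parse_interrupts`, `busiest_irqs`, `sample`) are made up for illustration, and this assumes the usual Linux layout of a CPU header row followed by one `label: counts... description` row per interrupt.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; rows like ERR/MIS have fewer of them.
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest_irqs(before, after, top=5):
    """Return (delta, label, description) for the fastest-growing IRQs."""
    deltas = []
    for label, (count, description) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        if delta > 0:
            deltas.append((delta, label, description))
    return sorted(deltas, reverse=True)[:top]


def sample(path="/proc/interrupts", interval=5.0):
    """Take two snapshots `interval` seconds apart and rank the IRQs."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return busiest_irqs(before, after)
```

Calling `sample()` once while the problem is present and once after a dock power cycle should show whether one particular interrupt source accounts for the difference.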

I disconnect and reconnect multiple times per day.

This is news to me. My 4K monitor works great connected to my dock (I use the one TB4 cable for everything) but I’ll have to look for the other threads you mentioned, just to be aware.

Just for statistics: my dock is a Lenovo Thunderbolt 4 Dock 40B0. It was OK on 6.6.x; I’m getting the kworkers issue with 6.7 and 6.8.x.

Interesting, coincidentally Linux 6.7 added USB/Thunderbolt DP Alt Mode 2.1… https://www.phoronix.com/news/Linux-6.7-USB-Thunderbolt

Here is the patchset [PATCH v2 0/5] Displayport Alternate Mode 2.1 Support

There seems to be some logic to identify the cable’s capabilities WRT VDO/DPSID.

I have an Anker 778 Thunderbolt 4 dock plugged into my FW AMD 13 running Fedora (kernel 6.8.4-200). I was having the same kworkers issue so I’ve been watching this thread.

Yesterday I updated the dock firmware from v1.23 to the latest version (1.78, I think). After 24 hours of frequent sleep/suspend cycles (without rebooting) I have had no issues.

Unfortunately Anker does not supply a change log, so I’m not sure what issues the firmware addressed.

More info:
I have an HDMI monitor, USB scanner and USB microstreamer plugged into the dock.

Thanks for the update! Were you able to update the firmware from Linux or did you boot into Windows or an Anker-provided ISO?

I had to boot into Windows and run the Anker Dockmanager app (dockmanager download - Anker US). They have a Windows and Mac version, but no Linux version that I could find.

Update: I commented too soon. Busy kworkers are back. Rebooting…

If feasible, does power-cycling the dock (once or a few times) “resolve” the kworker issue without a reboot? It does for me.

I did try that recently and it did not seem to help, but I will try that again next time, before rebooting.

Dimitris,

You are correct. After 2 or 3 power cycles of the dock the issue is resolved.

Jim

@Matt_Hartley as discussed on the support thread some time ago, I finally opened a Fedora bug and tagged you there.

As mentioned in the bug thread, I have perf data but would appreciate any insights on whether it is safe to upload to a public forum. As far as I can tell it’s just counts and stack traces, no actual data, but I haven’t used perf before so expert input is very welcome on this.

Point I neglected to make with my last update:

This reproduces on first boot too; suspend/resume is not required.

Discourse won’t let me update the thread title :frowning: