thilog
September 8, 2024, 6:59pm
1
Hi there,
I am experiencing high load (> 4) on my FW16 (7840HS, iGPU, BIOS 3.03, docked to a Thunderbolt dock) despite the system being mostly idle. The culprit seems to be kworker/0:x+pm tasks, which are often in “uninterruptible sleep”. While I am using Arch Linux as my daily driver, I can easily reproduce this with the Fedora 40 Live ISO (although you need to leave it running for some time before the issue becomes apparent).
Can anyone else reproduce this? Any ideas how to troubleshoot this?
Getting the stack shows:
[<0>] rpm_resume+0x25f/0x700
[<0>] rpm_resume+0x2d3/0x700
[<0>] rpm_resume+0x2d3/0x700
[<0>] pm_runtime_work+0x70/0xb0
[<0>] process_one_work+0x17b/0x330
[<0>] worker_thread+0x2e2/0x410
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30
or
[<0>] pci_power_up+0x144/0x190
[<0>] pci_pm_runtime_resume+0x33/0xf0
[<0>] __rpm_callback+0x41/0x170
[<0>] rpm_callback+0x55/0x60
[<0>] rpm_resume+0x4d3/0x700
[<0>] rpm_suspend+0x5db/0x5f0
[<0>] pm_runtime_work+0x84/0xb0
[<0>] process_one_work+0x17b/0x330
[<0>] worker_thread+0x2e2/0x410
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30
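For anyone trying to reproduce this: Linux counts tasks in uninterruptible sleep (state D) toward the load average, which is why the load can be high while the CPUs are idle. A minimal sketch of how such tasks and their kernel stacks could be collected follows; the function names are made up for illustration, and reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/sh
# Keep only lines of "PID STAT COMMAND" input (as printed by
# `ps -eo pid=,stat=,comm=`) whose state starts with D, i.e. tasks in
# uninterruptible sleep. Whitespace is normalised to single spaces.
filter_dstate() {
    awk '$2 ~ /^D/ { $1 = $1; print }'
}

# Print the kernel stack of every task that is in D state right now.
# Hypothetical helper; a task may leave D state before its stack is read.
dump_dstate_stacks() {
    ps -eo pid=,stat=,comm= | filter_dstate | while read -r pid stat comm; do
        echo "== $pid $comm ($stat)"
        cat "/proc/$pid/stack" 2>/dev/null
    done
}
```

Running `dump_dstate_stacks` as root a few times in a row should show whether the same kworker tasks keep turning up.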
Thanks,
Thilo
thilog
September 8, 2024, 7:26pm
2
After further debugging, this seems to be related to the power management of the following devices:
PCI Device Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
PCI Device Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
PCI Device Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #2
PCI Device Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1
If I disable runtime PM for these in powertop, the issue disappears and the load falls below 1.
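The same switch can also be flipped without powertop, since each PCI device exposes a `power/control` attribute in sysfs that accepts `on` (runtime PM disabled) or `auto` (runtime PM allowed). The helper below is only a sketch: the function name and the optional sysfs-root argument are assumptions made for illustration, and the PCI address has to be replaced with the ones `lspci -D` reports for the devices above.

```shell
#!/bin/sh
# set_runtime_pm <pci-address> <on|auto> [sysfs-root]
# Writes the requested mode to the device's power/control attribute.
# "on" keeps the device powered, "auto" lets the kernel runtime-suspend it.
set_runtime_pm() {
    addr=$1
    mode=$2
    root=${3:-/sys}
    case $mode in
        on|auto) ;;
        *) echo "mode must be 'on' or 'auto'" >&2; return 2 ;;
    esac
    ctl="$root/bus/pci/devices/$addr/power/control"
    [ -w "$ctl" ] || { echo "cannot write $ctl" >&2; return 1; }
    printf '%s\n' "$mode" > "$ctl"
}
```

For example, `set_runtime_pm 0000:00:04.1 on` run as root would disable runtime PM for the device at that address until the next reboot or until `auto` is written back.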
One thing I do to avoid the fans sounding like a jet is to use cpupower to limit the CPU speed to 4.5 GHz instead of 5.2 GHz.
My temps never go beyond 85 °C (even with video games), and the fan noise is acceptable during all-core compilation.
You can raise the limit to 4.8 GHz, but I don’t see much difference under my workload.
thilog
September 8, 2024, 7:57pm
4
CPUs are mostly idle, this is some phantom load, the laptop is basically silent.
Oh, that’s strange. I’m on Gentoo using kernel 6.10 and I’m not experiencing any issue like that.
Are you on 6.10 as well?
thilog
September 8, 2024, 8:01pm
6
6.10.8-arch1-1. Do you have the laptop connected to a Thunderbolt dock?
That’s a good point. I don’t.
My external monitor uses USB-C directly from the sides (iGPU)
On Linux, some USB devices can be problematic, for example with the UAS protocol when the hardware is not fully compliant with it.
Can you check whether other people have issues with your dock on Linux? Sometimes only “some” computers are affected (for example, by voltage issues).
thilog
September 8, 2024, 8:16pm
8
It’s a Lenovo Thunderbolt 4 Dock, which I used along with my previous laptop under Linux w/o any issues and which has been described as compatible on the dock thread.
Interesting. Sadly, I don’t have a Thunderbolt dock with me, but I don’t get this issue when connecting DisplayPort over USB-C.
Q1) Do you have the issue with both Arch and Fedora?
Q2) Do you get this problem randomly, or does it only appear when you connect your monitor to the dock?
Sounds like a USB4 CM (connection manager) bug.
Can you debug what is kicking off the resume event for these devices?
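One possible starting point, assuming the running kernel exposes the runtime PM tracepoints (the `rpm` event group in tracefs), is to enable those events and watch which devices get resumed. This is a sketch only: the helper name and the optional tracefs-root argument are made up for illustration, and on some systems tracefs is mounted at /sys/kernel/debug/tracing instead.

```shell
#!/bin/sh
# rpm_trace <start|stop> [tracefs-root]
# Enables or disables all runtime PM trace events (events/rpm/*).
rpm_trace() {
    action=$1
    root=${2:-/sys/kernel/tracing}
    case $action in
        start) val=1 ;;
        stop)  val=0 ;;
        *) echo "usage: rpm_trace <start|stop> [tracefs-root]" >&2; return 2 ;;
    esac
    printf '%s\n' "$val" > "$root/events/rpm/enable"
}
```

After `rpm_trace start` as root, reading `/sys/kernel/tracing/trace_pipe` and grepping for the PCI address of interest should show the suspend/resume events as they happen. Setting `options/stacktrace` to 1 in the same directory should additionally record a kernel stack per event, which may hint at what triggers the resume.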
thilog
September 9, 2024, 6:58am
11
Do you have any pointers on how to do this?
Edit:
$ sudo cat /sys/kernel/debug/wakeup_sources
name active_count event_count wakeup_count expire_count active_since total_time max_time last_change prevent_suspend_time
hidpp_battery_1 2 2 0 0 0 0 0 6930648 0
8-1.4.4.3 0 0 0 0 0 0 0 0 0
7-1.4.1.4 1 1 0 0 0 0 0 6923766 0
0000:62:00.0 11059 11059 0 11059 0 1122496 107 3760579 0
device:15 0 0 0 0 0 0 0 0 0
1-2 1 1 0 0 0 0 0 6923298 0
ucsi-source-psy-USBC000:004 5 5 0 0 0 4 1 6923937 0
ucsi-source-psy-USBC000:003 6 6 0 0 0 4 1 6923734 0
ucsi-source-psy-USBC000:002 1 1 0 0 0 0 0 36463 0
ucsi-source-psy-USBC000:001 1 1 0 0 0 0 0 36153 0
0000:c1:00.5 0 0 0 0 0 0 0 0 0
0000:c3:00.6 0 0 0 0 0 0 0 0 0
domain1 0 0 0 0 0 0 0 0 0
1-0 0 0 0 0 0 0 0 0 0
USBC000:00 0 0 0 0 0 0 0 0 0
0000:c3:00.5 0 0 0 0 0 0 0 0 0
domain0 0 0 0 0 0 0 0 0 0
0-0 0 0 0 0 0 0 0 0 0
AMDI0102:00 0 0 0 0 0 0 0 0 0
PIXA3854:00 0 0 0 0 0 0 0 0 0
i2c-PIXA3854:00 0 0 0 0 0 0 0 0 0
FRMW0003:00 0 0 0 0 0 0 0 0 0
AMDI0009:00 0 0 0 0 0 0 0 0 0
1-4.2 0 0 0 0 0 0 0 0 0
1-3.2 0 0 0 0 0 0 0 0 0
0000:c3:00.4 1 1 0 1 0 101 101 6923653 0
0000:c3:00.3 0 0 0 0 0 0 0 0 0
0000:c1:00.4 0 0 0 0 0 0 0 0 0
0000:c1:00.3 0 0 0 0 0 0 0 0 0
PNP0C14:00 0 0 0 0 0 0 0 0 0
alarmtimer.0.auto 0 0 0 0 0 0 0 0 0
00:01 0 0 0 0 0 0 0 0 0
PNP0C0A:00 0 0 0 0 0 0 0 0 0
PNP0C0C:00 0 0 0 0 0 0 0 0 0
PNP0C0D:00 0 0 0 0 0 0 0 0 0
ACAD 1 1 0 0 0 0 0 540 0
ACPI0003:00 0 0 0 0 0 0 0 0 0
AMDI0030:00 0 0 0 0 0 0 0 0 0
AMDI0010:03 0 0 0 0 0 0 0 0 0
AMDI0010:00 0 0 0 0 0 0 0 0 0
PNP0A08:00 0 0 0 0 0 0 0 0 0
device:45 0 0 0 0 0 0 0 0 0
device:40 0 0 0 0 0 0 0 0 0
device:46 0 0 0 0 0 0 0 0 0
device:41 0 0 0 0 0 0 0 0 0
device:3c 0 0 0 0 0 0 0 0 0
device:3a 0 0 0 0 0 0 0 0 0
device:21 0 0 0 0 0 0 0 0 0
device:19 0 0 0 0 0 0 0 0 0
device:33 0 0 0 0 0 0 0 0 0
device:23 0 0 0 0 0 0 0 0 0
device:18 0 0 0 0 0 0 0 0 0
device:22 0 0 0 0 0 0 0 0 0
LNXVIDEO:00 0 0 0 0 0 0 0 0 0
device:0f 0 0 0 0 0 0 0 0 0
device:0b 0 0 0 0 0 0 0 0 0
device:4b 0 0 0 0 0 0 0 0 0
device:4a 0 0 0 0 0 0 0 0 0
device:3b 0 0 0 0 0 0 0 0 0
device:39 0 0 0 0 0 0 0 0 0
device:16 0 0 0 0 0 0 0 0 0
0000:00:04.1 11061 22121 0 11061 0 1414118 132 6923656 0
device:14 22121 22121 0 0 0 6 0 6923555 0
0000:00:03.1 0 0 0 0 0 0 0 0 0
device:12 0 0 0 0 0 0 0 0 0
0000:00:02.4 0 0 0 0 0 0 0 0 0
device:0e 0 0 0 0 0 0 0 0 0
0000:00:02.2 0 0 0 0 0 0 0 0 0
device:0a 0 0 0 0 0 0 0 0 0
deleted 2 2 0 0 0 1 0 0 0
00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel (prog-if 00 [Normal decode])
Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
Flags: bus master, fast devsel, latency 0, IRQ 41, IOMMU group 5
Bus: primary=00, secondary=62, subordinate=c0, sec-latency=0
I/O behind bridge: a000-efff [size=20K] [16-bit]
Memory behind bridge: 60000000-77ffffff [size=384M] [32-bit]
Prefetchable memory behind bridge: 6800000000-77ffffffff [size=64G] [32-bit]
Capabilities: <access denied>
Kernel driver in use: pcieport
62:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] (rev 03) (prog-if 00 [Normal decode])
Subsystem: Intel Corporation Device 0000
Physical Slot: 0-1
Flags: bus master, fast devsel, latency 0, IRQ 51, IOMMU group 5
Bus: primary=62, secondary=63, subordinate=c0, sec-latency=0
I/O behind bridge: a000-efff [size=20K] [16-bit]
Memory behind bridge: 60000000-77ffffff [size=384M] [32-bit]
Prefetchable memory behind bridge: 6800000000-77ffffffff [size=64G] [32-bit]
Capabilities: <access denied>
Kernel driver in use: pcieport
I have no clue what device:14 is, though.
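To make the outliers in a dump like the one above easier to spot, the output can be sorted by the event_count column. This is a minimal sketch: the function name is made up, and it assumes that source names contain no whitespace, which holds for the dump above but may not hold everywhere.

```shell
#!/bin/sh
# top_wakeup_sources [count]
# Reads the text of /sys/kernel/debug/wakeup_sources on stdin and prints
# "event_count name" for the busiest sources, highest count first.
# Column 1 is the name and column 3 is event_count; the header is skipped.
top_wakeup_sources() {
    awk 'NR > 1 && $3 > 0 { print $3, $1 }' |
        sort -k1,1nr -k2,2 |
        head -n "${1:-5}"
}
```

Used as `sudo cat /sys/kernel/debug/wakeup_sources | top_wakeup_sources 3`. For the dump above, the three largest event counts belong to 0000:00:04.1, device:14 and 0000:62:00.0.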
Take a look at cat /sys/bus/acpi/devices/device:14/path and ls -alh /sys/bus/acpi/devices/device:14/ | grep physical
I’m guessing it’s that bridge at PCI 00:04.1 (GP22).
That’s the PCI bridge that’s used for tunneling. Keep the dock connected to that exact same physical port, turn off runtime PM for that PCI device, and see whether everything works as intended. If so, this will need some tracing in the USB4 driver. I’d suggest raising a bug at kernel bugzilla.
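The lookup suggested above can be wrapped in a small helper that follows the `physical_node` symlink of an ACPI device to the device it is bound to. This is a sketch under assumptions: the function name and the optional sysfs-root argument are made up for illustration, and not every ACPI device has a physical node.

```shell
#!/bin/sh
# acpi_physical_node <acpi-device> [sysfs-root]
# Prints the resolved sysfs path of the device the ACPI node is bound to,
# e.g. a PCI device directory whose last component is the PCI address.
acpi_physical_node() {
    node="${2:-/sys}/bus/acpi/devices/$1/physical_node"
    [ -L "$node" ] || { echo "no physical_node for $1" >&2; return 1; }
    readlink -f "$node"
}
```

For example, `acpi_physical_node device:14` prints a path whose final component is the PCI address that device:14 corresponds to.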
thilog
September 9, 2024, 9:52pm
13
Your guess is right. I have written a mail summarizing the findings so far to linux-pm and linux-usb.
thilog
September 11, 2024, 2:09pm
14
thilog
September 24, 2024, 6:57pm
15
ATM I am working around this using a udev rule that turns off the device’s PM while docked.
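For readers who want to try something similar, here is a rough sketch of what such a rule could look like. It is not the rule used above: it is a simplified, unconditional variant that keeps runtime PM off for one PCI device regardless of whether a dock is attached, and the function name and rules file name are hypothetical.

```shell
#!/bin/sh
# make_runtime_pm_rule <pci-address>
# Prints a udev rule that writes "on" to the device's power/control
# attribute (i.e. disables runtime PM) whenever the device is added/changed.
make_runtime_pm_rule() {
    cat <<EOF
# Keep runtime PM disabled for PCI device $1 (hypothetical example rule)
ACTION=="add|change", SUBSYSTEM=="pci", KERNEL=="$1", ATTR{power/control}="on"
EOF
}
```

The output could be installed with `make_runtime_pm_rule 0000:00:04.1 | sudo tee /etc/udev/rules.d/99-usb4-runtime-pm.rules`, followed by `sudo udevadm control --reload` and `sudo udevadm trigger --subsystem-match=pci`.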
@Mario_Limonciello Do you have any hints on how to obtain the USB trace you mentioned? I haven’t gotten any response to my mail and bug report; maybe proactively creating the trace would help.
You can start by using intel/tbtools (Thunderbolt/USB4 debugging tools, on GitHub), which manipulates debug knobs in the driver.
Does the problem occur only once the Desktop has the lock screen running?
Check out this KDE bug: 484323 – High CPU load of kwin_x11 when locking or turning off the screen
thilog
September 25, 2024, 8:31pm
18
No, it happens constantly while docked, and the kwin_x11 process is not affected. No process is consuming excess CPU, but the load is still increased because the kworker/0:x+pm tasks are waiting in uninterruptible sleep.