If it’s new to 6.7 can you please bisect it so we can understand it?
Under normal use, it has taken a day’s or more of suspend/resume cycles (multiple) to surrface, Would the amd_s2idle/py
script be a good tool to speed up the repro?
Depends on the root cause of why they’re happening. It certainly might help speed it up.
I’m not sure what GPE10 actually points at.
Well, I managed to reproduce this without suspending/resuming. Or, at least, it didn’t manifest immediately upon resume, but only later, after the machine had been idle for long enough (15 minutes) that the screen saver activated (screens turned off) and the desktop locked.
Upon unlocking I saw the same type of worker busy that hadn’t been before me walking away from the machine.
Repeating the `trace-cmd` above showed this
$ grep -o -e "function=[_a-zA-Z_][_a-zA-Z0-9]*" trace.log |sort |uniq -c |sort -rn
484 function=pm_runtime_work
401 function=acpi_os_map_remove
240 function=btrfs_work_helper
238 function=kcryptd_crypt
194 function=psi_avgs_work
180 function=flush_to_ldisc
150 function=acpi_os_execute_deferred
81 function=toggle_allocation_gate
57 function=vmstat_update
56 function=blk_mq_run_work_fn
26 function=commit_work
16 function=wb_workfn
14 function=mt792x_pm_power_save_work
13 function=kfree_rcu_monitor
8 function=vmstat_shepherd
8 function=pci_pme_list_scan
7 function=mt792x_pm_wake_work
7 function=iova_depot_work_func
6 function=wb_update_bandwidth_workfn
4 function=flush_memcg_stats_dwork
2 function=neigh_managed_work
2 function=console_callback
2 function=blk_mq_timeout_work
1 function=work_fn
1 function=kernfs_notify_workfn
1 function=do_cache_clean
1 function=blkg_async_bio_workfn
Important edit here: While watching htop
I unplugged the TB4 dock that provides power and connections to various USB devices (keyboard, scanner), waited a few seconds, then plugged it in. This apparently resolved the issue.
Then some time later, after a resume this time, saw this again with a similar "frequency distribution".
$ grep -o -e "function=[_a-zA-Z_][_a-zA-Z0-9]*" trace.log.2 |sort |uniq -c |sort -rn
831 function=acpi_os_map_remove
333 function=acpi_os_execute_deferred
243 function=flush_to_ldisc
234 function=psi_avgs_work
148 function=commit_work
115 function=pm_runtime_work
107 function=toggle_allocation_gate
71 function=vmstat_update
46 function=mt792x_pm_power_save_work
45 function=kcryptd_crypt
29 function=wb_workfn
22 function=mt792x_pm_wake_work
19 function=iova_depot_work_func
18 function=kfree_rcu_monitor
12 function=vmstat_shepherd
8 function=blk_mq_requeue_work
7 function=btrfs_work_helper
5 function=wb_update_bandwidth_workfn
5 function=flush_memcg_stats_dwork
3 function=kernfs_notify_workfn
3 function=delayed_vfree_work
2 function=neigh_periodic_work
2 function=neigh_managed_work
2 function=console_callback
2 function=cfg80211_wiphy_work
1 function=reg_check_chans_work
1 function=inode_switch_wbs_work_fn
1 function=idle_cull_fn
1 function=hub_event
1 function=hci_rx_work
1 function=do_cache_clean
1 function=btusb_work
1 function=btusb_rx_work
1 function=blk_mq_timeout_work
1 function=blkg_async_bio_workfn
More edit: This time a simple unplug/plug in of the TB4 cable wouldn’t do it. I eventually got the issue to clear without a reboot by unplugging from TB4/power, suspending, resuming (no extra load any more) then plugging in (still no extra load).
During both of these events, grep '' /sys/firmware/acpi/interrupts/*
showed gpe10
showed it increasing but at a seemingly not high enough rate, something like 10/sec to 20/sec. powertop
on the other hand showed an ACPI kworker
taking significant time, for whatever that may be worth.
All this still on 6.7.4-200.fc39.x86_64
.
Any chance you can compile a kernel with the patches to cros_ec that let you access the EC debug log?
I think that’s the next clue here on where the EC bug is.
I can build a patched kernel, searched here for cros_ec patch
but couldn’t find them.
Edit: wait is this still the correct one?
I’ve booted into a patched 6.7.4 kernel, how/what to debug when this happens again?
Get a copy of the EC tool that @DHowett has mentioned in his blog and various posts.
Get familiar with using it when there isnt a problem (like how to launch it and view the EC debug log).
Then hopefully next time this happens you can run it and capture exactly what it says is going on.
It will be a clue for the Framework firmware guys to look at for understanding it.
I think I’m set up for debugging: cros_ec
kernel patches applied (V2 from the one I highlighted here, and ectool
built from the hx20-hx30
branch of FrameworkComputer/EmbeddedController
.
I think this means it works?
ectool console
output
$ sudo ./build/bds/util/ectool console
Place your right index finger on the fingerprint reader
quest(16632mV, 0mA)]
[2996894.945000 charge_request(16624mV, 0mA)]
[2996897.698800 charge_request(16632mV, 0mA)]
[2996897.947700 charge_request(16624mV, 0mA)]
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
PORT80: 0DB6
[2996909.714000 charge_request(16632mV, 0mA)]
[2996909.964600 charge_request(16624mV, 0mA)]
[2996911.716100 charge_request(16632mV, 0mA)]
[2996911.966300 charge_request(16624mV, 0mA)]
[2996912.718400 charge_request(16632mV, 0mA)]
[2996912.968900 charge_request(16624mV, 0mA)]
PORT80: 3C08
[2996918.726200 charge_request(16632mV, 0mA)]
[2996918.976800 charge_request(16624mV, 0mA)]
[2996922.731400 charge_request(16632mV, 0mA)]
[2996922.980600 charge_request(16624mV, 0mA)]
[2996926.728700 charge_request(16632mV, 0mA)]
[2996926.987000 charge_request(16624mV, 0mA)]
PORT80: 3C01
[2996930.740500 charge_request(16632mV, 0mA)]
[2996930.991600 charge_request(16624mV, 0mA)]
PORT80: 3C08
[2996933.650400 HC 0x0115 err 1]
PORT80: F90E
PORT80: 3C01
[2996941.142900 HC 0x0002]
[2996941.145100 HC 0x000b]
[2996941.146400 HC 0x0002]
[2996941.148500 HC 0x0004]
[2996941.754200 charge_request(16632mV, 0mA)]
[2996942.000700 charge_request(16624mV, 0mA)]
PORT80: 3C08
[2996943.754900 charge_request(16632mV, 0mA)]
[2996944.007200 charge_request(16624mV, 0mA)]
[2996954.769700 charge_request(16632mV, 0mA)]
[2996955.021000 charge_request(16624mV, 0mA)]
[2996956.773200 charge_request(16632mV, 0mA)]
[2996957.023300 charge_request(16624mV, 0mA)]
PORT80: 3C01
PORT80: 3C08
[2996962.780500 charge_request(16632mV, 0mA)]
[2996963.029800 charge_request(16624mV, 0mA)]
[2996963.779600 charge_request(16632mV, 0mA)]
[2996964.032400 charge_request(16624mV, 0mA)]
[2996965.784600 charge_request(16632mV, 0mA)]
[2996966.034800 charge_request(16624mV, 0mA)]
[2996967.786400 charge_request(16632mV, 0mA)]
[2996968.033600 charge_request(16624mV, 0mA)]
[2996968.787600 charge_request(16632mV, 0mA)]
[2996969.038700 charge_request(16624mV, 0mA)]
[2996973.792800 charge_request(16632mV, 0mA)]
[2996974.043300 charge_request(16624mV, 0mA)]
[2996975.797000 charge_request(16632mV, 0mA)]
[2996976.047200 charge_request(16624mV, 0mA)]
[2996977.799200 charge_request(16632mV, 0mA)]
[2996978.049500 charge_request(16624mV, 0mA)]
[2996978.800700 charge_request(16632mV, 0mA)]
[2996979.049600 charge_request(16624mV, 0mA)]
[2996981.802600 charge_request(16632mV, 0mA)]
[2996982.055100 charge_request(16624mV, 0mA)]
PORT80: 3C01
[2996982.804900 charge_request(16632mV, 0mA)]
[2996983.056300 charge_request(16624mV, 0mA)]
[2996985.809500 charge_request(16632mV, 0mA)]
[2996986.060900 charge_request(16624mV, 0mA)]
[2996991.816400 charge_request(16632mV, 0mA)]
[2996992.067300 charge_request(16624mV, 0mA)]
[2996994.645900 HC 0x0115 err 1]
PORT80: F90E
[2996994.819400 charge_request(16632mV, 0mA)]
[2996995.070900 charge_request(16624mV, 0mA)]
PORT80: 3C08
PORT80: 3D01
PORT80: 3D08
[2997021.852300 charge_request(16632mV, 0mA)]
[2997022.102400 charge_request(16624mV, 0mA)]
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
[2997036.869400 charge_request(16632mV, 0mA)]
[2997037.118700 charge_request(16624mV, 0mA)]
PORT80: 3C01
[2997040.865500 charge_request(16632mV, 0mA)]
[2997041.123200 charge_request(16624mV, 0mA)]
PORT80: 3C08
[2997055.695400 HC Suppressed: 0x97=0 0x98=0 0x113=0 0x103=0 0x115=60 0x2b=0 0x67=0 0x121=0]
[2997055.715200 HC 0x0115 err 1]
PORT80: F90E
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
[2997077.156500 charge_request(16632mV, 0mA)]
[2997077.407400 charge_request(16624mV, 0mA)]
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
[2997116.765300 HC 0x0115 err 1]
PORT80: F90E
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
PORT80: 3C01
PORT80: 3C08
[2997175.245500 charge_request(16632mV, 0mA)]
[2997175.496400 charge_request(16624mV, 0mA)]
[2997177.828700 HC 0x0115 err 1]
PORT80: F90E
PORT80: 3C01
[2997194.263000 charge_request(16632mV, 0mA)]
[2997194.513600 charge_request(16624mV, 0mA)]
[2997196.264700 charge_request(16632mV, 0mA)]
[2997196.513600 charge_request(16624mV, 0mA)]
PORT80: 3C08
PORT80: 3C01
[2997219.062200 HC 0x0002]
[2997219.064600 HC 0x000b]
Yup! That’s working. It gets the information needed to debug this next time it happens.
Starting to see this, with several of those kworker
s showing near/at the top of htop
again.
EC console
$ sudo ./build/bds/util/ectool console
Place your right index finger on the fingerprint reader
FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F81
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
[3021680.430600 HC 0x0115 err 1]
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3C08
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3C01
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: F90E
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F80
PORT80: 3FA0
[3021693.112600 HC 0x0002]
[3021693.114900 HC 0x000b]
So what’s PORT80
?
Those are BIOS post codes. I think you should file a bug with framework support with all of these details so they can dig in and understand them.
Thanks Mario, will do. For completeness since this thread will probably be referred to, here’s the console output with the last part showing the logspam mostly going away when the machine is unplugged from AC.
... (lots more PORT80 spam preceding)
PORT80: 3F84
PORT80: 3F88
PORT80: 3F80
PORT80: 3FA0
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F88
PORT80: 3F80
PORT80: 3FA0
[3030047.208300 HC 0x0115 err 1]
PORT80: 3FA4
PORT80: 3F84
PORT80: 3F88
PORT80: 3F80
PORT80: 3FA0
PORT80: 3F84
PORT80: 3F88
[3030047.690900 update charger!!]
[3030047.700000 AC off]
[3030047.701900 event set 0x0000000000000010]
P:3 SET TYPEC RP=1[3030047.713500 cypd_write_reg8_wait_ack pre 0x84 ]
[3030047.717200 cypd_write_reg8_wait_ack pre 0x84 ]
[3030047.720200 PORT_DISCONNECT]
PORT80: AA83
[3030047.726100 board_set_active_charge_port port -1, prev:3]
[3030047.737200 cypd_write_reg8_wait_ack pre 0x4 ]
[3030047.749200 event set 0x0400000000000000]
[3030047.767200 cypd_write_reg8_wait_ack pre 0x80 ]
[3030047.770000 event set 0x0400000000000000]
[3030047.784000 Battery 81% (Display 81.9 %) / ??h:?? to empty]
[3030047.788600 CL: p-1 s-1 i500 v0]
[3030047.790100 TODO Implement pd_set_new_power_request port 3]
PORT80: AA8F
[3030047.888900 cypd_update_power_status:0=0x8]
[3030047.892600 cypd_update_power_status:1=0x8]
PORT80: 3F44
PORT80: 3F48
PORT80: AA8F
[3030052.341300 Battery 81% (Display 81.8 %) / 11h:21 to empty]
[3030058.377500 HC 0x0002]
[3030058.380100 HC 0x000b]
Edit: opened the support ticket, issue summarized there and pointing back to this thread. I expect it’ll be a few days, especially with the Lunar New Year holiday.
I know it’s generally not useful to get “me too” posts, but since nobody else has corroborated this, I’ll say it. I haven’t dug in with ectool, but I have a bunch of busy kworker threads, and gpe10 is firing.
Thank you @dimitris for investigating and opening the bug report.
I’ve opened a support case with FW, where I owe them some logs/info. In the meantime, I’ve characterized this - at least as it happens with my setup - as related to the dock (Kensington 5780T) I’m using for power and USB devices.
The pattern is that, on the majority of resumes from s2idle
I’ll notice these parasitic kworker
s being busy. It always involves a pm
worker - I assume that’s power management - and often the acpi
worker is involved too. This never happens on fresh boot.
I can work around this by cycling power on the dock once or a few times. After the first (or second or third) quick-ish power cycle (just a few seconds off), these workers stop eating CPU.
The amount of state machinery involved here (PD/USB4/dock controllers + firmware as well as Linux drivers) seems so be making this race condition a little daunting to chase down.
However, one good first step would be decoding these EC console codes above. Any help on that would be appreciated.
I have this happen as well. Kernel 6.7.5-gentoo-x86_64, on AC power, USB4 dock (mokin) connected.
See if you can rebuild as described here and see if this helps:
I applied V2 of the patch to 6.7.8;
You say: see if this helps. Do you mean the patch is supposed to help with the issue? I was under the impression it only exposed the EC and allowed console output access.