Happened again. The trigger conditions were exactly the same as in my previous post above: playing music while navigating settings in gnome-control-center, same kernel/OS/DM/WM/Wayland.
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-control-c [30624]
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] gnome-control-c[30624] context reset due to GPU hang
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] HuC authenticated
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
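For anyone comparing their own journal against the lines above, here is a minimal Python sketch that pulls the error code, process name and PID out of a GPU HANG line. The regex is only fitted to the message format shown in this log, and the function name parse_gpu_hang is made up for illustration; other kernel versions may word the message differently. Note that the process name is the kernel's truncated task name, which is why it reads gnome-control-c.

```python
import re

# Pattern fitted to the "GPU HANG" line format seen in the log above.
# Assumption: other kernel versions may format this message differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<comm>.+?) \[(?P<pid>\d+)\]"
)


def parse_gpu_hang(line):
    """Return ecode, task name and PID from a GPU HANG line, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    return {
        "ecode": match["ecode"],
        "comm": match["comm"],  # truncated task name, as logged by the kernel
        "pid": int(match["pid"]),
    }
```

Feeding it the first log line above should give the ecode 12:1:84dffffb, the task gnome-control-c and PID 30624; any other line returns None.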
This time there are some slight differences from the previous time I triggered this freeze:
- the kernel command line contained the additional parameter i915.request_timeout_ms=60000
- the modprobe.d config file blacklisting hid_sensor_hub was disabled (meaning the module was loaded normally this time around)
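To double-check that both differences were really in effect on a given boot, something like the following Python sketch could be used. It only parses text in the format of /proc/cmdline and /proc/modules; the helper names cmdline_params and loaded_modules are hypothetical, and the simple whitespace split ignores quoted parameter values.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
```

On the affected machine, passing in open("/proc/cmdline").read() and open("/proc/modules").read() should show i915.request_timeout_ms mapped to 60000 and hid_sensor_hub present in the module set.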
So the i915.request_timeout_ms param does nothing @ngxson, at least not on its own, and hid_sensor_hub has nothing to do with the issue either. I haven’t tested the other params yet, but I’ve enabled the sysrq unraw command (set kernel.sysrq=4 in a config file in /etc/sysctl.conf).
Next time this happens I’m going to attempt to grab control of the keyboard (alt+prtscr+r) and then press ctrl+c to attempt to kill the display manager and everything spawned by it, to give control back to PID 1. At the very least, it will enable a semi-graceful ctrl+alt+del reboot. (If that doesn’t work, then I’ll set kernel.sysrq=132 to enable an ungraceful soft reboot, but hopefully we won’t need to go that far.)
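For reference, kernel.sysrq is a bitmask, which is where the values 4 and 132 come from. A small Python sketch that decodes a value, using the bit meanings from the kernel's SysRq documentation (the function name decode_sysrq is just for illustration):

```python
# Bit meanings as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """List the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return ["all SysRq functions disabled"]
    if value == 1:
        return ["all SysRq functions enabled"]
    # Any other value is treated as a bitmask of allowed function groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

So decode_sysrq(4) gives only the keyboard control group (which covers unraw), and decode_sysrq(132) gives keyboard control plus reboot/poweroff, since 132 = 4 + 128.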
==EDIT== This happened, and the unraw plus sigint trick worked, but sigint (ctrl+c) can cause corruption. It’s better to try switching to a different TTY and then switching back after unraw.