[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

  1. Fedora 36 Silverblue
  2. kernel 5.19.4
  3. sudo grubby --update-kernel=ALL --args="module_blacklist=hid_sensor_hub"
  4. SK Hynix Gold P31
  5. Open GNOME Settings (was downloading stuff in the background)

Out of curiosity, did you try the Gnome changes I made above?
I haven’t had any more issues since then and I had found a bug for how those extensions interacted with each other.

I have no extensions

  1. Fedora 36
  2. Linux fedora 5.19.4-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Aug 25 17:42:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  3. Freezing started when I rebooted with hid_sensor_hub blacklisted, so I reverted the change and it no longer occurs. No other kernel parameters have been added
  4. SK Hynix P41
  5. Gnome Settings, when modifying the settings for the touchpad

Wanted to update here, in case it is useful to anyone: About a week ago, I switched my UI scaling in GNOME display settings back to 100%, and have not noticed any lockups since, be they soft or hard. I had previously been scaling to 125% after enabling the option to do so by issuing gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']". Currently on 7 days uptime.
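
For anyone who wants to check whether that experimental option is active on their own machine, something like this should do it. It is only a sketch and assumes gsettings is installed; the function name is made up for illustration. The commented-out reset line returns the key to its default, which turns the experimental scaling option off again.

```shell
#!/bin/sh
# Print the current mutter experimental features, or a note if they can't be read.
show_experimental_features() {
    if command -v gsettings >/dev/null 2>&1; then
        gsettings get org.gnome.mutter experimental-features 2>/dev/null \
            || echo "could not read org.gnome.mutter experimental-features"
    else
        echo "gsettings not found"
    fi
}

show_experimental_features

# To return the key to its default value, uncomment:
# gsettings reset org.gnome.mutter experimental-features
```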

Surprisingly, the forum’s search function doesn’t show hits in this topic for “scale” or “scaling”… It does seem like a large portion of people posting here are using GNOME, on various distros, so it might be something worth trying for folks… It is an experimental feature, after all.

@nrp

  1. Arch Linux running GNOME 42.3 (on Wayland 1.26)
  2. Linux 5.19.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 31 Aug 2022 22:09:40 +0000 x86_64 GNU/Linux
  3. kernel line contains nvme.noacpi=1. Additionally, /etc/modprobe.d has a config file to blacklist hid_sensor_hub
  4. Sabrent Rocket 4.0 2TB with fw version RKT401.3
  5. Was trying to change settings in gnome-control-center while listening to music. A few other windows were up in the background (browser, terminal, nautilus). The music player is foobar2000 v2.0-beta-3 (32-bit), a Windows application running through Wine 7.0 via flatpak.
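
A quick way to confirm whether the hid_sensor_hub blacklist actually took effect is to look at the loaded module list. This is a rough sketch; the function name and the example file name in the comment are hypothetical.

```shell
#!/bin/sh
# A modprobe.d blacklist file typically holds a single line such as:
#   blacklist hid_sensor_hub
# e.g. in /etc/modprobe.d/hid_sensor_hub.conf (file name is just an example).

# Return 0 if module $1 appears in a /proc/modules-style listing.
# The optional second argument exists only so the check can be tried on a sample file.
is_module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

if is_module_loaded hid_sensor_hub; then
    echo "hid_sensor_hub is loaded"
else
    echo "hid_sensor_hub is not loaded"
fi
```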

When I pulled up GNOME’s settings app, the system completely froze. The music I was listening to kept playing even though the GPU had hung and the graphics were frozen. Within seconds, the CPU fan got loud and was running at full blast. I couldn’t regain control, so I had to hold down the power button until the system shut down ungracefully. Here are the journalctl entries from right when the crash happened:

Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-control-c [17179]
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] gnome-control-c[17179] context reset due to GPU hang
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] HuC authenticated
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Sep 01 16:34:28 OverARCHing-Framework kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled

I doubt this has anything at all to do with the SSD; this seems to be an issue specific to GNOME on Wayland with the i915 GPU driver. Where in the graphics stack this is happening isn’t clear, but the "stopped heartbeat on rcs0" followed by "rcs0 reset request timed out" pattern has shown up in Google searches going back at least 5 years in one form or another.
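
If you want to check whether your own freeze left the same trail, the kernel log from the previous boot can be filtered for those messages. A minimal sketch, assuming a persistent systemd journal (otherwise the previous boot’s log is gone); the function name is made up.

```shell
#!/bin/sh
# Keep only the i915 hang/reset lines from a kernel log read on stdin.
filter_gpu_hang() {
    grep -E 'GPU HANG|stopped heartbeat|reset request timed out'
}

# -k: kernel messages only, -b -1: the boot before the current one.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 --no-pager 2>/dev/null | filter_gpu_hang \
        || echo "no GPU hang lines found in the previous boot"
fi
```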


Definitely seems to be something with GNOME. Had the issue on Fedora 36 but not when I switched to using Fedora 36’s KDE spin.

Happened again. The triggering conditions were identical to those in my previous post above: playing music while navigating settings in gnome-control-center, same kernel/OS/DM/WM/Wayland.

Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-control-c [30624]
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] gnome-control-c[30624] context reset due to GPU hang
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] HuC authenticated
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Sep 02 20:19:37 kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled

This time there are some slight differences from the previous time I triggered this freeze:

  1. kernel line contains the additional parameter i915.request_timeout_ms=60000
  2. modprobe.d config file blacklisting hid_sensor_hub was disabled (meaning the module was loaded normally this time around)

So the i915.request_timeout_ms param does nothing @ngxson, at least on its own, and hid_sensor_hub has nothing to do with the issue either. I haven’t tested the other params yet, but I’ve enabled the sysrq unraw command (set kernel.sysrq=4 in /etc/sysctl.conf).

Next time this happens I’m going to attempt to grab control of the keyboard (alt+prtscr+r) and then ctrl+c to attempt to kill the display manager and everything spawned by it to give control back to PID1. At the very least, it will enable a semi-graceful ctrl+alt+del reboot. (If that doesn’t work, then I’ll set kernel.sysrq=132 to enable an ungraceful soft reboot, but hopefully we won’t need to go that far.)

==EDIT== This happened, and the unraw plus sigint trick worked, but sigint (ctrl+c) can cause corruption. It’s better to try switching to a different TTY and then switching back after unraw.
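
For reference, kernel.sysrq is a bitmask: going by the kernel’s sysrq documentation, 4 enables keyboard control (SAK, unraw) and 128 enables reboot/poweroff, so 132 is both together. A small sketch for decoding a value follows; the function names and the config file name in the comment are hypothetical.

```shell
#!/bin/sh
# Return 0 if bit $2 is set in sysrq bitmask $1.
sysrq_allows() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Print which of the two functions discussed here a given mask enables.
describe_sysrq() {
    if [ "$1" -eq 1 ]; then
        echo "all functions enabled"
        return 0
    fi
    if sysrq_allows "$1" 4; then echo "keyboard control (SAK, unraw) enabled"; fi
    if sysrq_allows "$1" 128; then echo "reboot/poweroff enabled"; fi
    return 0
}

describe_sysrq 4
describe_sysrq 132

# To persist a value, a line like "kernel.sysrq = 4" can go in a sysctl config
# file, e.g. /etc/sysctl.d/90-sysrq.conf (example name), then: sudo sysctl --system
```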


Something seems to be in the works (I just googled, don’t fully understand any of it really):
https://lore.kernel.org/all/DM4PR11MB5971A43B5E78F34B30EA5E1587729@DM4PR11MB5971.namprd11.prod.outlook.com/t/


yikes…


I get lock ups and I’ve always used 100% scaling with large text


Can confirm; I’ve had lockups happen both with and without experimental scaling feature enabled. This is very much looking like a kernel driver issue


Ubuntu MATE 22.04, kernel 5.18: finally got the freezing after 3 days of usage.

  • Had a first micro freeze where I couldn’t move the pointer with either the trackpad or the Bluetooth mouse. After 5 seconds it came back to life.
  • Had a hard freeze for more than 20 seconds while doing heavy tasks like GIMP + lots of tabs in Chrome. So I thought it might be slowing down from overheating.
    After 20 seconds of it being totally unresponsive, I decided to close the lid and reopen it 2 seconds later. To my surprise, I was prompted with the login screen and could go back to my work!
  • Tried kernel 5.19 to see if it’s any better. As soon as I run “xrandr --output eDP-1 --auto --scale 1.4x1.4” to scale the screen, the mouse immediately starts to lag and micro freeze. This command works on 5.18, and I have been running with it since day 1.

I’ve been trying different parameters for the past week and I think I fixed it (at least on my side)

➜  ~ uptime
 08:54:56 up 2 days

Here’s what I have:

➜  ~ cat /proc/cmdline 
[...] i915.mitigations=off quiet nvme.noacpi=1 intel_iommu=off i915.request_timeout_ms=60000 i915.enable_psr=1
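
To verify that parameters like these actually made it onto the running kernel’s command line after a reboot, a check along these lines may help. It is a sketch only; has_kernel_param is a made-up helper, and the optional second argument is there just so it can be tried on a sample string.

```shell
#!/bin/sh
# Return 0 if parameter $1 appears as a whole word in the cmdline string $2
# (defaults to the contents of /proc/cmdline).
has_kernel_param() {
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in nvme.noacpi=1 intel_iommu=off i915.request_timeout_ms=60000 i915.enable_psr=1; do
    if has_kernel_param "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done
```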

These commands are run at boot:

# remember to run as root
sysctl dev.i915.perf_stream_paranoid=0;
echo 10000 > /sys/class/drm/card0/engine/rcs0/preempt_timeout_ms;
echo 10000 > /sys/class/drm/card0/engine/rcs0/heartbeat_interval_ms;
echo 1000 > /sys/class/drm/card0/engine/rcs0/stop_timeout_ms;
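
A slightly more defensive version of the same idea, for anyone turning this into a boot script: it skips files that don’t exist or aren’t writable instead of erroring out. This is a sketch, not a tested fix; set_tunable is a hypothetical helper, and the paths and values are simply the ones from the commands above.

```shell
#!/bin/sh
# Write value $2 to file $1, but only if the file exists and is writable.
set_tunable() {
    if [ -w "$1" ]; then
        echo "$2" > "$1" && echo "set $1 = $2"
    else
        echo "skipped $1 (missing or not writable)"
    fi
}

# Needs root for the writes to succeed; otherwise each entry is skipped.
engine=/sys/class/drm/card0/engine/rcs0
set_tunable "$engine/preempt_timeout_ms" 10000
set_tunable "$engine/heartbeat_interval_ms" 10000
set_tunable "$engine/stop_timeout_ms" 1000
```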

Hello guys,

I’m now also a member of the Fedora 36 freezing club on my 1240p DIY with 16GB of RAM and a 1TB SN850 SSD

  • Kernel 5.19.6-200.fc36.x86_64
  • Kernel is untouched, apart from the override to get the brightness buttons to work
  • Installed GNOME extensions are Dash to panel, Blur my Shell, Caffeine and the Compiz Windows Effect
  • The freeze usually happens when some sort of settings app is open, most of the time GNOME settings, but it also occurred when I was changing the properties of Dash to Panel.
  • I’m on Wayland at 150% scaling. Haven’t tried X yet.

Update: this seems to be happening when running Wayland and XWayland apps at the same time. Without an XWayland instance running, I have not been able to trigger the freeze so far.
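
To see whether an XWayland server is up at the moment, checking for the Xwayland process is probably the simplest approach. A rough sketch, assuming pgrep is available; the function names are made up.

```shell
#!/bin/sh
# Return 0 if a process named exactly "Xwayland" is running.
xwayland_running() {
    pgrep -x Xwayland >/dev/null 2>&1
}

report_xwayland() {
    if xwayland_running; then
        echo "Xwayland is running"
    else
        echo "no Xwayland process found"
    fi
}

echo "session type: ${XDG_SESSION_TYPE:-unknown}"
report_xwayland
# If xlsclients is installed, it lists the X11 clients connected to the server.
```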

@nrp I have a long email thread of testing with Framework support on the topic, where I was left hanging with a swapped out main board and a “good luck” on further testing (swapping out the mainboard did not stop the freezing). It was disappointing the mainboard swap didn’t fix the issue. (I am on Gen 11)

There appear to be at least 3 types of freezing described in this thread. Mine is almost certainly hardware based since kdump catches nothing.

  1. Which distro you are using (most reports here are for Fedora 36, but there are some mentions of other distros).

Fedora 36, but the freezing happened with Fedora 34 and Fedora 35 as well.

  2. Which kernel you are on (you can run “uname -v” to check)

fedora 5.19.4-200.fc36.x86_64

  3. Whether you’ve adjusted any kernel parameters, like setting the workaround to disable the ALS (module_blacklist=hid_sensor_hub)

“rhgb quiet nvme.noacpi=1 mem_sleep_default=deep ro rootflags=subvol=root”

  4. What model of SSD you are using

Freezing was independent of SSD model. Tried two WD SN850. Current model is Samsung Electronics NVMe SSD Controller SM981/PM981/PM983

  5. What circumstances you are seeing the freeze during (e.g. when using a specific application like Settings or uncorrelated to a specific application)

The events appear random with some possible patterns:

a. If I physically use the keyboard, especially when the laptop is on my lap, it typically happens every 20-45 minutes.

b. If I use Microsoft Teams or (less often) Slack, either in the browser or in the application. Sometimes Chrome/Firefox seems to be enough to set it off on its own.

c. Most reliably: if I let it sit overnight without input. But this freeze is different from (a) and (b). Those seem stuck in a buffer loop: if a song is playing, it will repeat the same half second or so indefinitely, and video can sometimes stick around or get stuck mid-paint. (c) type freezes are almost certainly due to Linux’s bad idle/deep sleep handling (I’ve tried several configurations to see if this behavior can be avoided).

None of (a) through (c) is related to RAM or CPU load in any way; I’ve tested this with several tools at this point.

None of the freezes are caught by the kernel’s typical processes – kdump catches nothing, and systemd doesn’t catch anything. As far as I can tell that points to a hardware issue.

I have swapped out the RAM for all-new RAM, tried single sticks, etc. So I have conclusively determined it isn’t a faulty mainboard, RAM, or SSD. Other things it might be:

  1. Bad driver: tried multiple Linux versions at this point, but all freeze; no evidence of a driver issue in logs
  2. Keyboard or display/speaker/audio/wireless and BT causing a short – no evidence to rule this out yet

Just came across this Reddit post/comment about Fedora + a new Dell XPS (also 12th gen Intel) experiencing hard freezes while in settings. They said they managed to fix the freezing issue by uninstalling xorg-x11-drv-intel. No clue if this will help but certainly seems related.

I was also experiencing the system hard-locking/freezing and not responding to SysRq, and I solved this by uninstalling xorg-x11-drv-intel, since that driver is not required or optimized for the new Xe graphics.
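
Before removing anything, it may be worth checking whether that package is installed at all. A minimal sketch for rpm-based systems such as Fedora; the function name is hypothetical, and the removal command is left commented out.

```shell
#!/bin/sh
# Report whether xorg-x11-drv-intel is installed (rpm-based distros only).
check_intel_ddx() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available on this system"
    elif rpm -q xorg-x11-drv-intel >/dev/null 2>&1; then
        echo "xorg-x11-drv-intel is installed"
    else
        echo "xorg-x11-drv-intel is not installed"
    fi
}

check_intel_ddx

# To try the workaround from the Reddit comment on a regular Fedora install:
# sudo dnf remove xorg-x11-drv-intel
```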


I saw this too, will this mess up XWayland if we do this?
