[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

I’ve been trying different parameters for the past week, and I think I’ve fixed it (at least on my side).

➜  ~ uptime
 08:54:56 up 2 days

Here’s what I’m running:

➜  ~ cat /proc/cmdline 
[...] i915.mitigations=off quiet nvme.noacpi=1 intel_iommu=off i915.request_timeout_ms=60000 i915.enable_psr=1
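Parameters like these can silently fail to apply (for example, if they were added to the wrong boot entry), so it is worth confirming them after each reboot. The sketch below uses a hypothetical helper, `has_kernel_param` (not an existing tool), that checks for an exact `key=value` token in `/proc/cmdline`; the optional second argument lets you point it at another file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an exact token (e.g. "nvme.noacpi=1")
# appears on the kernel command line. Reads /proc/cmdline unless a file is given.
has_kernel_param() {
  local param=$1 cmdline_file=${2:-/proc/cmdline}
  # The command line is one space-separated line: split it into one token
  # per line, then require a whole-line, fixed-string match.
  tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Check the parameters from the post above against the running kernel.
for p in i915.mitigations=off nvme.noacpi=1 intel_iommu=off \
         i915.request_timeout_ms=60000 i915.enable_psr=1; do
  if has_kernel_param "$p"; then echo "set:     $p"; else echo "missing: $p"; fi
done
```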

These commands are run at boot:

# remember to run as root
sysctl dev.i915.perf_stream_paranoid=0
echo 10000 > /sys/class/drm/card0/engine/rcs0/preempt_timeout_ms
echo 10000 > /sys/class/drm/card0/engine/rcs0/heartbeat_interval_ms
echo 1000 > /sys/class/drm/card0/engine/rcs0/stop_timeout_ms
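The commands above hard-code `card0`, but the Intel GPU is not guaranteed to be `card0` on every system. A minimal sketch, assuming the same `engine/rcs0` sysfs knobs, that applies the timeouts to whichever card exposes them; `set_engine_knob` and the `DRM_ROOT` override are hypothetical names for illustration, and the `sysctl` line is left as in the post. It still has to run as root, for example from a oneshot systemd service at boot.

```shell
#!/usr/bin/env bash
# Sketch: write the rcs0 engine timeouts from the post to every DRM card that
# exposes them, instead of assuming the Intel GPU is card0. Must run as root.
# DRM_ROOT is a hypothetical override so the logic can be tried on a scratch dir.
DRM_ROOT=${DRM_ROOT:-/sys/class/drm}

set_engine_knob() {
  local knob=$1 value=$2 path
  for path in "$DRM_ROOT"/card*/engine/rcs0/"$knob"; do
    # An unmatched glob stays literal, so this also skips "no such card".
    [ -w "$path" ] || continue
    echo "$value" > "$path" && echo "$path = $(cat "$path")"
  done
}

set_engine_knob preempt_timeout_ms 10000
set_engine_knob heartbeat_interval_ms 10000
set_engine_knob stop_timeout_ms 1000
```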

Hello guys,

I’m now also a member of the Fedora 36 freezing club, on my 1240P DIY with 16GB of RAM and a 1TB SN850 SSD.

  • Kernel 5.19.6-200.fc36.x86_64
  • Kernel is untouched, apart from the override to get the brightness buttons to work
  • Installed GNOME extensions are Dash to panel, Blur my Shell, Caffeine and the Compiz Windows Effect
  • The freeze usually happens while some sort of settings app is open, most often GNOME Settings, but it has also occurred while I was changing Dash to Panel’s settings.
  • I’m on Wayland at 150% scaling. Haven’t tried X yet.

Update: this seems to happen when Wayland and XWayland apps are running at the same time. Without an XWayland instance running, I haven’t been able to trigger the freeze yet.
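When trying to reproduce this, it helps to note whether an XWayland server was actually running at the time. A small sketch, assuming `pgrep` from procps is installed; `session_report` is a hypothetical name, and `XDG_SESSION_TYPE` is the variable the login session normally sets to `wayland` or `x11`.

```shell
#!/usr/bin/env bash
# Record the session type and whether an Xwayland process exists right now.
session_report() {
  echo "session: ${XDG_SESSION_TYPE:-unknown}"
  if pgrep -x Xwayland >/dev/null 2>&1; then
    echo "xwayland: running"
  else
    echo "xwayland: not running"
  fi
}

session_report
```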

@nrp I have a long email thread of testing with Framework support on this topic, which ended with a swapped-out mainboard and a “good luck” on further testing. Disappointingly, the mainboard swap did not stop the freezing. (I am on 11th gen.)

There appear to be at least three types of freezing described in this thread. Mine is almost certainly hardware-based, since kdump catches nothing.

  1. Which distro you are using (most reports here are for Fedora 36, but there are some mentions of other distros).

Fedora 36, but the freezing happened with Fedora 34 and Fedora 35 as well.

  2. Which kernel you are on (you can run “uname -v” to check)

fedora 5.19.4-200.fc36.x86_64

  3. Whether you’ve adjusted any kernel parameters, like the workaround to disable the ALS (module_blacklist=hid_sensor_hub)

“rhgb quiet nvme.noacpi=1 mem_sleep_default=deep ro rootflags=subvol=root”

  4. What model of SSD you are using

Freezing was independent of the SSD model: I tried two WD SN850s. The current drive is a Samsung Electronics NVMe SSD Controller SM981/PM981/PM983.

  5. What circumstances you are seeing the freeze during (e.g. when using a specific application like Settings, or uncorrelated to a specific application)

The events appear random with some possible patterns:

a. If I physically use the keyboard, especially with the laptop on my lap, it typically happens every 20-45 minutes.

b. If I use Microsoft Teams or (less often) Slack, either in the browser or in the desktop application. Sometimes Chrome/Firefox alone seems to be enough to set it off.

c. Most reliably: if I let it sit overnight without input. This freeze is different from (a) and (b), which seem stuck in a buffer loop: if a song is playing, it repeats the same half second or so indefinitely, and video can sometimes freeze on screen or get stuck mid-paint. Type (c) freezes are almost certainly due to Linux’s poor idle/deep-sleep handling (I’ve tried several configurations to see whether this behavior can be avoided).

None of (a) to (c) are related to RAM or CPU load in any way; I’ve tested this with several tools at this point.

None of the freezes are caught by the usual kernel mechanisms: kdump catches nothing, and systemd doesn’t catch anything either. As far as I can tell, that points to a hardware issue.
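Even when kdump captures nothing, the journal from the boot that ended in the freeze may still be worth checking, since the kernel can log a GPU hang or lockup shortly before the screen stops updating. A rough sketch: `journalctl -k -b -1` shows kernel messages from the previous boot (this needs a persistent journal), and `gpu_hang_lines` is a hypothetical filter whose pattern list is a guess, not an official catalogue of i915 messages.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep kernel log lines that are usually relevant to GPU
# hangs or lockups. Reads stdin, so it works on live journal output or saved logs.
gpu_hang_lines() {
  grep -Ei 'i915|\[drm\]|gpu hang|hard lockup|soft lockup|watchdog'
}

# Previous boot (the one that froze), kernel messages only:
#   journalctl -k -b -1 --no-pager | gpu_hang_lines
```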

I have swapped in all-new RAM, tried single sticks, etc., so I have conclusively determined it isn’t a faulty mainboard, RAM, or SSD. Other things it might be:

  1. Bad driver: I’ve tried multiple Linux distros at this point and all of them freeze; no evidence of a driver issue in the logs
  2. Keyboard, display, speaker/audio, Wi-Fi, or Bluetooth hardware causing a short: no evidence to rule this out yet

I just came across a Reddit post/comment about Fedora on a new Dell XPS (also 12th gen Intel) experiencing hard freezes while in Settings. The poster said they fixed the freezing by uninstalling xorg-x11-drv-intel. No clue whether this will help, but it certainly seems related.

I was also experiencing the system hard-locking/freezing and not responding to SysRq, and I solved this by uninstalling xorg-x11-drv-intel, since that driver is not required or optimized for the new Xe graphics.

I saw this too. Will this break XWayland if we do this?

Unfortunately I have no idea. I don’t have my Framework yet to give it a shot.

I have mine, so I could give it a shot. Generally I don’t like messing with my system too much, but I’m running Silverblue, so I could reverse it if it doesn’t work or breaks something.

OK, I’ve uninstalled it. I’ll let you know if I get more crashes or if XWayland breaks. Take no news as good news, although I’ll be sure to give an update at some point.

I hadn’t experienced any freezes for a while, but it happened again today. The only things I can think of that might be related: I had a USB mouse connected (Logitech M185) and the Settings app open on the Mouse & Touchpad page, with the touchpad still enabled. I’ve now disabled the touchpad while a mouse is connected; I’ll update if it happens again.

  1. Fedora 36
  2. #1 SMP PREEMPT_DYNAMIC Wed Aug 31 17:58:15 UTC 2022
  3. BOOT_IMAGE=(hd0,gpt5)/vmlinuz-5.19.6-200.fc36.x86_64 root=UUID=6b4bede2-5076-47c1-836f-a8deca92003b ro rootflags=subvol=root rhgb quiet
  4. Samsung 980 1TB
  5. Gnome settings open in the Mouse & Touchpad tab (not the active window), USB mouse connected and touchpad enabled
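The details requested in this thread can be collected in one go. A minimal sketch (`freeze_report` is a hypothetical name); note that `uname -r` prints the release string, such as `5.19.6-200.fc36.x86_64`, while `uname -v` prints the build line (`#1 SMP PREEMPT_DYNAMIC ...`) seen in item 2 above. The circumstances of the freeze (item 5) still have to be written up by hand.

```shell
#!/usr/bin/env bash
# Collect distro, kernel, boot parameters and disk models for a bug report.
# The body runs in a subshell ( ... ) so sourcing os-release leaks nothing.
freeze_report() (
  [ -r /etc/os-release ] && . /etc/os-release
  echo "distro:  ${PRETTY_NAME:-unknown}"
  echo "kernel:  $(uname -r)"
  echo "build:   $(uname -v)"
  echo "cmdline: $(cat /proc/cmdline 2>/dev/null || echo unknown)"
  echo "disks:"
  lsblk -d -n -o NAME,MODEL 2>/dev/null || echo "  (lsblk not available)"
)

freeze_report
```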

I just came across this thread, and luckily I’ve hit this issue only once (per my logs). Let me summarize the information, since it’s scattered across the thread, and add a few more hints:

People seem to experience this mostly when the GNOME settings app is running.

What helped is uninstalling the Xorg Intel driver (xorg-x11-drv-intel on Fedora, xf86-video-intel on Arch).

This is consistent with the observation that issues happen when XWayland applications are running.

Removing the Xorg Intel driver should not affect XWayland because it’s not used there, and anyway most sources seem to recommend removing it for newer generations of Intel graphics.
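For reference, the removal commands for the setups mentioned in this thread. The sketch only prints the command rather than running it; `removal_command` is a hypothetical helper, and using `/run/ostree-booted` to detect an rpm-ostree system such as Silverblue is an assumption.

```shell
#!/usr/bin/env bash
# Print (do not run) the command that removes the legacy Xorg Intel DDX driver.
# The optional argument overrides the ostree marker path, mainly for testing.
removal_command() {
  local ostree_marker=${1:-/run/ostree-booted}
  if [ -e "$ostree_marker" ]; then
    # Silverblue: overrides the base image; undo with
    # "rpm-ostree override reset xorg-x11-drv-intel", then reboot.
    echo "rpm-ostree override remove xorg-x11-drv-intel"
  elif command -v dnf >/dev/null 2>&1; then
    echo "sudo dnf remove xorg-x11-drv-intel"
  elif command -v pacman >/dev/null 2>&1; then
    echo "sudo pacman -R xf86-video-intel"
  else
    echo "no known package manager found" >&2
    return 1
  fi
}
```

Calling `removal_command` with no argument prints the command for the current machine; on Silverblue the override only takes effect after a reboot.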

I’m not convinced that this is a solution, because I never had this driver installed and I’m still experiencing the issue. I’ve just avoided using the Settings app, because that’s where the crash seems to happen.

So far, uninstalling xorg-x11-drv-intel has fixed my issue, and I can still run everything just fine.

I see, that’s sad. Maybe uninstalling it just makes it much less likely to hit a problem deeper in the driver stack…

Edit: Actually, I can confirm this. I uninstalled xf86-video-intel after reading this thread and then saw the issue within hours (versus just once in a few weeks with the package installed). I may try reinstalling it to see whether that actually makes things better here.

I haven’t had a crash since, but I wanted to add something: I actually managed to unfreeze the system using the SysRq REISUB sequence, though I’m a bit confused about what actually happened.
I first tried the sequence without holding Fn, because I have the function keys set to F1-F12 by default. When that didn’t do anything, I tried the sequence while holding Fn, and after pressing PrtSc+Alt+Fn+E (if I remember correctly), the laptop suddenly unfroze and loaded the login screen. When I logged in, all applications had been closed (as expected), but so had services like Wi-Fi that normally start at boot. Since I didn’t know which processes needed to be relaunched and my applications were closed anyway, I rebooted.
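Whether each key in REISUB is honoured depends on the `kernel.sysrq` value. Per the kernel's SysRq documentation, 1 enables everything and larger values are a bitmask: 2 = console log level, 4 = keyboard control (`R`), 8 = debugging dumps, 16 = sync (`S`), 32 = remount read-only (`U`), 64 = signalling processes (`E`, `I`), 128 = reboot/poweroff (`B`), 256 = nicing real-time tasks. REISUB therefore needs 4 + 64 + 16 + 32 + 128 = 244 (or simply 1). The helpers below are a hypothetical sketch for checking a value.

```shell
#!/usr/bin/env bash
# Decode a kernel.sysrq value into the function groups it enables
# (bit values from the kernel's SysRq documentation).
decode_sysrq() {
  local mask=$1 bit
  local out=()
  local -A names=([2]=loglevel [4]=keyboard [8]=dumps [16]=sync
                  [32]=remount-ro [64]=signal [128]=reboot [256]=nice-rt)
  if [ "$mask" -eq 0 ]; then echo "disabled"; return; fi
  if [ "$mask" -eq 1 ]; then echo "all"; return; fi
  for bit in 2 4 8 16 32 64 128 256; do
    if (( mask & bit )); then out+=("${names[$bit]}"); fi
  done
  echo "${out[*]}"
}

# REISUB needs unraw (4), term/kill (64), sync (16), remount (32), reboot (128).
reisub_allowed() {
  local mask=$1
  (( mask == 1 || (mask & 244) == 244 ))
}

current=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo 0)
echo "kernel.sysrq=$current -> $(decode_sysrq "$current")"
```

To allow the full sequence until the next reboot, `sudo sysctl kernel.sysrq=244` (or `=1`) should be enough; a file under `/etc/sysctl.d/` would make it persistent.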

I just noticed that this line keeps showing up in these crash logs:
kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
The name of the binary file tgl_huc_7.9.3.bin implies Tiger Lake, whereas the GuC line explicitly has “adlp” in the file name (adlp_guc_70.1.1.bin), which makes me wonder whether HuC is something that should actually be disabled on Alder Lake-P.
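To compare firmware across kernels or machines, the blob names can be pulled out of the kernel log. A minimal sketch; `uc_firmware` is a hypothetical name, and it assumes the GuC line has the same `[drm] <GuC|HuC> firmware <path> version ...` shape as the HuC line quoted above.

```shell
#!/usr/bin/env bash
# Print "<GuC|HuC> <firmware path>" for each matching i915 line read from stdin.
uc_firmware() {
  sed -nE 's#.*\[drm\] (GuC|HuC) firmware (i915/[^ ]+).*#\1 \2#p'
}

# Usage on the current boot:
#   journalctl -k -b --no-pager | uc_firmware
```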

The Arch wiki article for Intel graphics has this to say:


which makes me think HuC might be a suspect, or even the culprit, in this case. I’m going to disable just HuC and see what happens.

I’m not sure why that would cause the issue, though, since I’m not using hardware HEVC decoding (and HuC is the HEVC “microcontroller”). Perhaps it hasn’t been updated to support ADL-P? Or perhaps it didn’t need an update, since ADL-P inherits its graphics from TGL and nothing has overtly changed about Iris Xe between the two generations?
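For reference, HuC and GuC are controlled through the same i915 module parameter, `i915.enable_guc`, which is a bitmask: 1 = GuC submission, 2 = HuC firmware loading, 0 = neither, and -1 (the default) lets the driver decide per platform. So disabling only HuC while keeping GuC submission means `i915.enable_guc=1`, while `i915.enable_guc=0` disables both. The sketch below decodes a value; `describe_enable_guc` is a hypothetical helper, and the value in effect can usually be read from `/sys/module/i915/parameters/enable_guc` (possibly only as root).

```shell
#!/usr/bin/env bash
# Decode an i915.enable_guc value (bitmask: 1 = GuC submission, 2 = HuC load).
describe_enable_guc() {
  local v=$1
  if [ "$v" -lt 0 ]; then echo "auto (driver default)"; return; fi
  local parts=()
  if (( v & 1 )); then parts+=("GuC submission"); fi
  if (( v & 2 )); then parts+=("HuC load"); fi
  if [ "${#parts[@]}" -eq 0 ]; then parts=("GuC and HuC disabled"); fi
  local IFS=,
  echo "${parts[*]}"
}

# Value currently in effect, if the parameter file is readable.
param=/sys/module/i915/parameters/enable_guc
if [ -r "$param" ]; then describe_enable_guc "$(cat "$param")"; fi
```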

I’m pretty sure this was not an accident:
https://patchwork.kernel.org/project/intel-gfx/patch/20210325180720.401410-38-matthew.d.roper@intel.com/

At this point, it may be best to reach out to i915 kernel folks.

Thanks for the speedy reply. From browsing the git tree for both my current kernel and the upcoming 6.0-rc4, it seems they are still using the same microcode definitions for TGL and ADL-P, so tgl_huc_7.9.3.bin is the correct firmware.

Variables eliminated so far:

  • Removing the wrong userspace driver (xf86-video-intel on Arch, xorg-x11-drv-intel on Fedora) does not solve the freeze (thanks @real_or_random for the additional data point)
  • No combination of the following kernel parameters solves the freeze:
    • i915.enable_psr
    • i915.request_timeout_ms
    • nvme.noacpi
    • module_blacklist=hid_sensor_hub (including the modprobe.d config variant)
  • Fractional scaling
  • Integer scaling
  • Disabling HuC or GuC

The crash almost always seems to be triggered by gnome-control-center, or less commonly by some other GNOME settings dialog, running on Wayland while another application is either playing music or using XWayland.

What makes you think that this patch in particular is relevant? I applied it to my kernel and it didn’t seem to fix the GPU hangs I’d encountered.