[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

Still got a hard freeze, followed by a recovery, when trying

enable_psr=0

@Aggraxis how do I undo your solution if I want? Delete the i915.conf file and reboot? Or do I need to run more commands e.g. something with dracut?

This is what I get when running vainfo:

Trying display: wayland
libva info: VA-API version 1.16.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: va_openDriver() returns -1
libva info: Trying to open /usr/lib64/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_15
libva error: /usr/lib64/dri/i965_drv_video.so init failed
libva info: va_openDriver() returns -1
vaInitialize failed with error code -1 (unknown libva error),exit

Seems like libva isn’t configured correctly as it can’t even init?

Edit: this is after installing libva, libva-utils, libva-intel-driver, and ffmpeg, which weren’t installed by default, so I couldn’t run vainfo before.

Edit2: after removing libva-intel-driver and installing intel-media-driver, vainfo seems to complete successfully:

Trying display: wayland
libva info: VA-API version 1.16.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_16
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.16 (libva 2.16.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 22.5.4 ()
vainfo: Supported profile and entrypoints
      VAProfileNone                   :	VAEntrypointVideoProc
      ...
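The driver swap in Edit2 can be scripted roughly like this. It is a minimal sketch, not a tested procedure: the helper names `vaapi_ok` and `swap_to_ihd` are hypothetical, the packages are the ones named above (they may need RPM Fusion enabled), and the success check simply looks for the `va_openDriver() returns 0` line that vainfo printed in the working run.

```shell
# Hypothetical helpers; package names are the ones used in this thread.

# Succeeds only if vainfo output shows the iHD driver opening successfully.
vaapi_ok() {
  printf '%s\n' "$1" | grep -q 'iHD_drv_video.so' &&
    printf '%s\n' "$1" | grep -q 'va_openDriver() returns 0'
}

swap_to_ihd() {
  sudo dnf remove -y libva-intel-driver
  sudo dnf install -y intel-media-driver libva libva-utils
  # LIBVA_DRIVER_NAME=iHD makes libva try the iHD driver directly
  # instead of auto-detecting (which fell back to i965 in the failing run).
  out=$(LIBVA_DRIVER_NAME=iHD vainfo 2>&1)
  if vaapi_ok "$out"; then
    echo "iHD VA-API driver initialised"
  else
    printf '%s\n' "$out" >&2
    return 1
  fi
}
```

Running `swap_to_ihd` either confirms the iHD driver or prints the full vainfo output so it can be posted here.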

In Firefox, open about:config and check whether media.ffmpeg.vaapi.enabled is set to true. Close Firefox, reopen it, and see if that makes a difference. If that does not work, go back to about:config and set gfx.webrender.all to true.
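If you would rather not click through about:config each time, Firefox also reads a `user.js` file in the profile directory at startup. This is a small sketch, assuming you look up your profile's "Root Directory" in about:profiles; the helper name `firefox_true_prefs` and the example path are hypothetical.

```shell
# Hypothetical helper: print one user.js line per pref name, setting it to true.
firefox_true_prefs() {
  for pref in "$@"; do
    printf 'user_pref("%s", true);\n' "$pref"
  done
}

# Usage, with Firefox closed. Replace PROFILE with the directory shown
# in about:profiles (this path is only an example layout):
#   firefox_true_prefs media.ffmpeg.vaapi.enabled >> ~/.mozilla/firefox/PROFILE/user.js
```

One trade-off: prefs in user.js are re-applied on every start, so to undo a change you have to delete the line from user.js rather than just flipping it in about:config.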

@Kelby_Faessler
You would either comment out the line in the i915.conf or remove the file, then run sudo dracut --force again, followed by a reboot.
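The undo steps above can be sketched as follows. This assumes the file is `/etc/modprobe.d/i915.conf` (the thread only says "i915.conf"), and `comment_out_i915` / `undo_i915_tweaks` are hypothetical names; deleting the file with `sudo rm` and then running `sudo dracut --force` works just as well.

```shell
# Assumption: the tweak lives in /etc/modprobe.d/i915.conf.
I915_CONF=${I915_CONF:-/etc/modprobe.d/i915.conf}

# Filter: prefix every active "options i915" line with "# ".
comment_out_i915() {
  sed 's/^options i915/# options i915/'
}

undo_i915_tweaks() {
  comment_out_i915 < "$I915_CONF" | sudo tee "$I915_CONF.new" > /dev/null &&
    sudo mv "$I915_CONF.new" "$I915_CONF" &&
    sudo dracut --force &&   # rebuild the initramfs so the change applies at boot
    echo "initramfs rebuilt; reboot to load i915 without those options"
}
```

Commenting out rather than deleting keeps a record of what you changed, which is handy if a fix lands later and you want to compare.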

By the way, for anyone looking for more data, I upgraded to F37 without incident some time ago. My i915.conf changes are still in place.

Happened again. Fedora 37, Linux fedora 6.0.11-300.fc37.x86_64 and I was in gnome-settings.

Tail of the journalctl log below. What else should I look for when collecting data?

Dec 09 23:40:57 fedora kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1973]:15542 timed out (hint:intel_atomic_commit_ready [i915])
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-control-c [28845]
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] ERROR rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] ERROR rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] gnome-control-c[28845] context reset due to GPU hang
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] brave[4303] context reset due to GPU hang
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] HuC authenticated
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Dec 09 23:41:01 fedora kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled

@Elmo it looks like there was a GPU hang on the Brave browser process. That doesn’t mean Brave was the culprit, but that’s where it happened.
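On the question of what else to collect: besides the journal, i915 keeps a GPU error state in memory after a hang. A rough collection sketch follows; the helper names are hypothetical, and `/sys/class/drm/card0/error` assumes the iGPU is card0 (the number can differ). Because that error state does not survive a reboot, it is only grabbed when run in the same boot, e.g. from a text console during the freeze.

```shell
# Filter: keep only hang/reset-related kernel lines (case-insensitive).
hang_lines() {
  grep -iE 'GPU HANG|reset|fence .* timed out'
}

# $1 = journal boot offset: 0 = current boot (e.g. from a text console
# during the freeze), -1 = previous boot (after a hard reboot).
collect_hang_info() {
  boot=${1:-0}
  dir="$HOME/i915-hang-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir"
  journalctl -k -b "$boot" > "$dir/kernel.log"
  hang_lines < "$dir/kernel.log" > "$dir/hang-summary.log" || true
  # The GPU error state lives in memory only, so it is gone after a reboot.
  # card0 is an assumption; check /sys/class/drm/ for your card number.
  if [ "$boot" = 0 ] && [ -e /sys/class/drm/card0/error ]; then
    sudo cat /sys/class/drm/card0/error > "$dir/gpu-error-state.txt"
  fi
  echo "Saved to $dir"
}
```

`collect_hang_info -1` after a forced reboot still captures the kernel log, just without the error state.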

I haven’t been gaming on my laptop in a while, and today after an update it looks like I’m having GPU hangs again, even with the enable_psr=0 parameter. What’s odd is that under Wayland I can get the display to hang within about 2 minutes just by running around in a circle in-game. Under X11, although the game ran noticeably slower, the system kept running much longer without hanging. I’m still digging, but it looks like either Intel or the Wayland folks may have something freaky going on.

edit - I haven’t been seeing these hangs under normal use, which for me means some browsing, a Horizon client session, and some terminal work.

Man, this makes my insides hurt. So we can see that the kernel module gets loaded, gets its firmware, does some happy normal stuff. I fire off my game, run about 6 circles around the area I’m in, and then VRRRP frozen. This last time it gave me one little burp of frames at the end, and I assume that’s what happened at that spot where the module seems to have been reloaded.

Anyhow, this is all on F37 w/ kernel 6.0.12-300.fc37. Going to be fun teasing this out.

Small update on this. Despite running kernel 6.0.9, I encountered eleven freezes (!) with ecode 12:0:00000000 and one with ecode 12:1:0036abdf since Dec 10th. I’ll update to 6.1 in the coming days now that it’s released. I might also try Xorg instead of Wayland, as @Aggraxis suggested.

I also thought it might be related to Ubuntu’s Power Mode (esp. Power Saving) but some freezes occurred in the default Balanced mode.

@KevSlashNull In playing with things yesterday I found that I was still able to alt-f3 over to another shell, like with the previous flavor of freezes. I was also able to get the system responsive again by finding the Wayland session process (ps -eaf | grep wayland), killing it, and then, in my case, restarting the sddm process (systemctl restart sddm). GNOME users will need to restart gdm (systemctl restart gdm) instead.
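The recovery described above, run from the text console, might look like the sketch below. `filter_wayland` and `recover_display` are hypothetical names, and matching "wayland" in the command line is only a heuristic: on KDE Plasma the compositor is kwin_wayland, while GNOME's compositor is gnome-shell, whose command line may not contain the word at all. Restarting the display manager also ends your graphical session, so unsaved work is lost.

```shell
# Filter: keep ps lines whose command mentions wayland.
# The [w] bracket stops grep from matching its own command line.
filter_wayland() {
  grep '[w]ayland'
}

# $1 = display manager unit: gdm (GNOME) or sddm (KDE Plasma).
recover_display() {
  dm=${1:-gdm}
  ps -eo pid=,args= | filter_wayland
  printf 'PID to kill (blank to skip): '
  read -r pid
  if [ -n "$pid" ]; then
    sudo kill "$pid"
  fi
  sudo systemctl restart "$dm"
}
```

For the sddm case in this post that would be `recover_display sddm`.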

What’s driving me nuts with this round of freezes is the inconsistency. I can run the aquarium from webglsamples.org with 25,000 fish at 60 fps for more than an hour with no freeze. Firing up FFXIV via xivlauncher and playing the game for not even a minute results in a freeze. AND JUST THE DISPLAY. I’m certain the game is otherwise happily running. The music is definitely playing.

This happens on battery, on AC power, in a house with a mouse, etc.

I took all of the kernel module tweaks out, but nothing changed. I even disabled all of the power saving features in the driver, still no result. I haven’t gotten anything else to barf out more info that anyone’s going to find useful. Still digging. Somehow between all of us out there we’ll figure this out.

Ok. I need to do a lot more testing, but for my specific freeze I think I found at least a partial answer:

https://wiki.archlinux.org/title/intel_graphics#Enable_GuC_/_HuC_firmware_loading

Based on that page, what’s new in Gen12 is using the GuC for “scheduling, context submission, and power management.”

Hmmmmm… Ok, so just for giggles I added options i915 enable_guc=0, which fully disables the firmware loading. Yes, this probably messes with video playback acceleration, but here’s the kicker:
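Setting the option the same way as the earlier i915.conf tweak could look like this. It is a sketch under stated assumptions: `/etc/modprobe.d/i915-guc.conf` is a hypothetical file name chosen so an existing i915.conf is not overwritten (modprobe.d combines multiple `options i915` lines), and accepting -1 as "auto" follows what `modinfo -p i915 | grep enable_guc` lists on recent kernels; check yours.

```shell
# Hypothetical file name, kept separate from any existing i915.conf.
GUC_CONF=/etc/modprobe.d/i915-guc.conf

# Print the modprobe.d line for a given enable_guc value.
i915_guc_line() {
  case $1 in
    -1|0|1|2|3) printf 'options i915 enable_guc=%s\n' "$1" ;;
    *) echo "enable_guc must be -1 (auto) or 0-3" >&2; return 1 ;;
  esac
}

set_enable_guc() {
  line=$(i915_guc_line "$1") || return 1
  printf '%s\n' "$line" | sudo tee "$GUC_CONF" > /dev/null &&
    sudo dracut --force   # the option must be in the initramfs to apply at boot
}
```

After `set_enable_guc 0` and a reboot, `sudo cat /sys/module/i915/parameters/enable_guc` should report the active value, and `sudo dmesg | grep -i guc` should no longer show "GuC submission enabled".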

FFXIV ran for 3 hours straight, and not just me running circles around an Aetheryte. I went all over the place, mined some weird stuff, and put the system through its paces (fan whirring on max the whole time). It ran great.

I commented the line out, re-ran dracut, and on the next boot the game crashed within 47 seconds. So yeah, I need to run more trials, and also check whether options 1 or 2 make a difference. (I suspect option 2 will work fine, and that it will go bonkers with 1 or 3, the values that enable GuC submission, but I need to fiddle with it.)

Anyways, if you happen to try that out, let me know how it goes. More info is better.

Ok so far 1 crashed pretty quickly, and 3 (the default) also crashes. 0 ran well, and 2 is running fine right now. Going to abuse it a bit.
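These results line up with reading enable_guc as a bitmask, as on the Arch wiki page linked above: bit value 1 turns on GuC submission and bit value 2 turns on HuC loading. The hypothetical helper below just decodes the values; treating negative values as "auto" is an assumption based on the driver's parameter description.

```shell
# Decode enable_guc as a bitmask: 1 = GuC submission, 2 = HuC loading.
describe_enable_guc() {
  if [ "$1" -lt 0 ]; then
    echo "auto (driver decides per platform)"
    return
  fi
  sub=off
  huc=off
  if [ $(( $1 & 1 )) -ne 0 ]; then sub=on; fi
  if [ $(( $1 & 2 )) -ne 0 ]; then huc=on; fi
  echo "GuC submission: $sub, HuC loading: $huc"
}

for v in 0 1 2 3; do
  printf '%s -> ' "$v"
  describe_enable_guc "$v"
done
```

The two values that crashed (1 and 3) are exactly the ones with GuC submission on, while 0 and 2 leave it off, which fits the suspicion that submission, not HuC loading, is the trigger.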

Video acceleration still seems to be working fine with enable_guc=2. No crashes yet. Still abusing. Looks promising.

Dang, I tried enable_guc=0 and enable_guc=2 and I still get freezes when I reopen Chrome tabs.

Weird. I think the only other changes I made to my system were steps 1 and 2 out of this article:

And really, I had RPMFusion set up already at that point. Step 2, for those who don’t want to read that other thread, is this set of package installs:

sudo dnf groupinstall multimedia
sudo dnf install intel-media-driver libva libva-utils gstreamer1-vaapi ffmpeg intel-gpu-tools mesa-dri-drivers mpv

I spoke with my contact at Fedora (who works for Red Hat).

He indicated that the freezing is a known issue that has proven difficult to replicate with Fedora and other distros as well. So we’re working on it, but at this point, the best thing we can do is:

  • Keep updating. My hope is they’ll have an idea what is going on soon.
  • For any tweaks you make (kernel parameters, for example), please keep a note, in case a fix comes along and those tweaks end up breaking it later on.

Sorry, not a Framework guy, but a similar situation: F37 (6.0.13-300.fc37.x86_64) and a 12th Gen Intel(R) Core™ i7-1260P.

I am currently experiencing some relief with boot param:

intel_idle.max_cstate=1 and, separately, intel_idle.max_cstate=2

Last night I got about 2 uninterrupted hours of use with 1. The laptop fan ran the whole time.

Right now, with 2, it has been stable for about 20 minutes. The laptop fan cycles intermittently.

Prior to adding this (or nomodeset) I could hardly boot.

max_cstate can also be set to 3, 4, or maybe higher. It limits how deep a power-saving idle state (C-state) the CPU can enter.

This sux…but HTH…
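On Fedora, a persistent kernel parameter like this is usually managed with grubby. A minimal sketch with hypothetical helper names; it assumes grubby replaces an existing value when given `--args` and removes the parameter by name with `--remove-args`, which is how I understand the Fedora grubby to behave.

```shell
# Print the intel_idle.max_cstate value from a kernel command line, if set.
cstate_from_cmdline() {
  tr ' ' '\n' | sed -n 's/^intel_idle\.max_cstate=//p'
}

# Add or change the parameter for all installed kernels.
set_max_cstate() {
  sudo grubby --update-kernel=ALL --args="intel_idle.max_cstate=$1"
}

# Remove it again, e.g. once a kernel update fixes the freezes.
clear_max_cstate() {
  sudo grubby --update-kernel=ALL --remove-args="intel_idle.max_cstate"
}
```

After a reboot, `cstate_from_cmdline < /proc/cmdline` shows what the running kernel was booted with, and `cat /sys/module/intel_idle/parameters/max_cstate` should show the limit intel_idle is actually using.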

Changing the C-state limit is a good idea, but based on what I’ve been told by a representative of the Fedora project, this issue has affected other distros and has been difficult to replicate. I’m on a 12th Gen Framework, on Fedora 37, latest kernel, with zero freezing. But I also have nothing attached.

Fedora dumped a bunch of F37 updates in the last 12-24 hours. I got about 34 updates, lots of them firmware packages. Noteworthy: kernel-6.0.14-300.fc37, intel-gpu-firmware-20221214-145.fc37, and intel-gpu-firmware-20221109-144.fc37.

They made no real difference for me. After a fresh boot with no kernel switches, it usually hangs within 5 minutes of the GDM login, with lots of garbage on my screen. This happened after these updates, too.

After rebooting with max_cstate=1 or max_cstate=2, it is stable as far as I can tell. With max_cstate=3 I had a lock-up (but no graphics artifacts on screen) after about an hour. This is mostly browsing with Chrome (Google’s build) and some YouTube videos, really not much else; the machine is too unreliable to work on at this point.

I did, for one work day, run this rig with an HDMI monitor plus a Thunderbolt hub driving a second HDMI monitor, as well as Ethernet. Surprisingly, this worked out really well for about 8 hours, mostly local browsing and then a VPN to my corporate network with an RDC session.

I don’t know if some poison came in with a dnf update, but it hasn’t been stable since 12/19 and was only moderately stable since I got this laptop on 12/15.

(will review thread to see how you/others got to some stability – maybe I missed something; the cstate thing was picked up from some other thread out on the WWW )

HTH someone

If it’s freezing while attached to something (hub, monitor, etc), I’d start by testing stability without those things. Then if it’s stable, we know that updates may have hosed something in terms of the extras attached.

Check dmesg when possible, or, if it freezes completely, check journalctl after rebooting.

But if a distro is unstable, begin stripping things away so as to identify the trigger point (even if it was caused by an update). It’s a good starting point, in conjunction with checking journalctl after a freeze.

As it sits now, there should be no reason to add cstate parameters for stability. When a specific kernel breaks something, sure, but otherwise if it’s needed, it’s time to revisit a previously working kernel.
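Fedora keeps several kernels installed by default, so going back to a previously working one usually doesn't need a reinstall. A sketch using grubby, with hypothetical helper names; the parsing assumes `grubby --info=ALL` prints one `kernel="..."` line per boot entry, as it does on current Fedora.

```shell
# Filter: list the kernel images from `grubby --info=ALL` output.
list_kernels() {
  sed -n 's/^kernel="\(.*\)"$/\1/p'
}

# Show installed kernels and the current default.
show_kernels() {
  sudo grubby --info=ALL | list_kernels
  echo "default: $(sudo grubby --default-kernel)"
}

# $1 = path to a previously working kernel, as printed by show_kernels.
use_kernel() {
  sudo grubby --set-default "$1"
}
```

For a one-off test, picking the older kernel from the GRUB menu at boot avoids changing the default at all.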

I will run updates tonight to catch my Framework up to the latest. See if I can replicate it. I will not be connecting to a dock or display because I want to emulate Framework issues first, vs attached device compatibility issues. That’s after I establish the laptop is golden, first.

Agreed, actually except for the 1 day of ‘success’, with hub, monitors, etc, I usually run the laptop with nothing plugged in. Sigh, almost seemed more stable with the hub and stuff – but not enough for a daily driver.

FWIW: last dnf update (230 PM est 12/23/2022) contained, among other things:

    Upgrade  xorg-x11-drv-intel-2.99.917-54.20210115.fc37.x86_64 @updates
    Upgraded xorg-x11-drv-intel-2.99.917-53.20200205.fc37.x86_64 @@System

I applied that, and after a naked boot (just the laptop, no peripherals) I got an almost immediate hang and a crudded-up screen. I added intel_idle.max_cstate=2 back in; it still seems to be OK for me at the moment, with no enable_psr=0 and no HuC/GuC tuning.
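To test whether a specific update like that xorg-x11-drv-intel bump is the trigger, dnf can roll back a single transaction. A sketch with hypothetical helpers and a `DRY_RUN` switch that prints the commands instead of running them; note that `dnf history undo` may fail if the older package version is no longer available in the enabled repos.

```shell
# Print a command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the dnf transactions that touched a package.
package_history() {
  run sudo dnf history list "$1"
}

# Roll back one transaction by its ID (from package_history).
undo_transaction() {
  run sudo dnf history undo -y "$1"
}
```

For example, `package_history xorg-x11-drv-intel` lists the transaction IDs, and `DRY_RUN=1 undo_transaction ID` shows what would be run before doing it for real.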

HTH