[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

Matt_Hartley · November 22, 2022, 6:56pm

Please do. Working on my Fedora 37 installation as I type this. Did some suspend testing last night with excellent results.

vhx · November 22, 2022, 7:36pm

still good here after 24h.

although i am seeing a few new issues. No obvious symptoms in usability with these appearing.

$ dmesg -T | grep -i "drm] \*error\*"
[Tue Nov 22 16:31:40 2022] i915 0000:00:02.0: [drm] *ERROR* CPU pipe A FIFO underrun: port,transcoder,
[Wed Nov 23 14:02:24 2022] i915 0000:00:02.0: [drm] *ERROR* [ENCODER:275:DDI TC4/PHY TC4][DPRX] Failed to enable link training

both hinting towards an en/decoder problem which i’ve not seen before.

Matt_Hartley · November 22, 2022, 9:03pm

Definitely keep an eye on it. Thanks

Nicholas_La_Roux · November 23, 2022, 12:35am

6.0.9 is now generally available. Just upgraded to it via GNOME Software and removed the psr=0 config. Testing now. So far so good.

EDIT: After several hours of Chrome, VS Code, Spotify, Steam, a few suspends, and a clamshell session with peripherals over Thunderbolt, I still haven’t hit a single freeze. Feels like 6.0.9 may just be the fix we’ve been hoping for.

EDIT EDIT: Surprisingly, despite several hours of varied usage earlier, I jumped back on and am experiencing the freezes again. This issue is less predictable than I previously thought.

nadb · November 23, 2022, 2:00pm

@Nicholas_La_Roux my recommendation is upgrade to Fedora 37. The kernel does resolve a bunch of items that were cropping up, however based on what I have seen the freezes are also related to XWayland, and possibly GTK4, and how different calls are being made. The kernel is a step in the right direction but without those underlying items also receiving the latest I don’t think you are going to see the full benefit.

Nicholas_La_Roux · November 23, 2022, 11:01pm

I see and thank you but I’m already on Fedora 37 and have been for about 2 months at this point.

JHeffron · November 24, 2022, 6:17am

After testing a bit with kernel 6.0.9, the PSR fixes do improve/eliminate stuttering and major frame paint delays, but the i915 GPU hangs are completely unresolved.

To add some extra context, the hard freezes may not be linked just to chromium-based applications. I have played several Steam games (Barotrauma, Parkitect, Deep Rock Galactic) and all of them have triggered lockups (Parkitect was retested and still locks up on kernel 6.0.9). These games (at least with Steam overlay enabled), alongside background apps (namely Firefox, sometimes Blender, sometimes VSCodium), have been able to send the system into a post-i915 hang state.

Occasionally, depending on what crashes and if DRM can reset the display properly, the system might still be functional. In most cases though, the following will occur:

i915 hangs

kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:86cdffff, in Parkitect.x86_6 [4828]

followed by DRM failing to reset the GPU

kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to reset chip
kernel: i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x23a/0x2a0 [i915]
kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110

At that point, the state the system is left in depends on which apps end up crashing. Namely, I have noticed that if SDDM crashes, input control is regained and the system TTY sessions may be used, but graphical Xorg or Wayland sessions cannot be recovered or restarted during that boot.

If SDDM doesn’t crash, the system will outright keep hold of all input devices and while things like pipewire still function, the particular boot of the system will be left in lockup without any remote management available (ssh) to forcibly kill any stuck processes.

System information:
Framework 12th gen i5 1240p
Fedora 37
KDE Desktop (Wayland) with xwayland support enabled
SDDM (Xorg)
kernel 6.0.9-300.fc37.x86_64
special boot parameters: module_blacklist=hid_sensor_hub

Vik_A · November 27, 2022, 9:08pm

I arrived here from Google. I have a Dell XPS 15 12th gen, upgraded to Fedora 37 and updated to the 6.0.9-300 kernel, and still experience random freezes on Wayland with the Intel driver. Definitely not just a Framework issue. Bummer it’s still not fixed in the newest kernel. I can usually trigger it in GNOME Settings after a while, or with Firefox + GNOME Settings. I do not have anything Chromium- or Chrome-based installed (no VS Code, Spotify, or Electron anything at the moment), so I don’t think it’s that either.

Elmo · November 29, 2022, 10:46am

Update 29/11/2022.
Happened again at gnome settings. Fedora 37 with 6.0.9-200.fc36.x86_64 kernel.

vhx · November 29, 2022, 11:51am

STILL good for me with F37 KDE & no psr set in kernel args. Not a single issue with GPU BUG ecode 12:0:00000000 and using the laptop daily for multiple hours.
Without a way to reproduce on my hardware to confirm, seems like there’s something going on with the chrom{e|ium} libraries or gnome specific, which is also generating those different ecode values.

Gaming; flatpak Steam version, Quake1 was faultlessly reliable
non-gaming; general firefox browsing including youtube, browse-based emby playback, VLC, flatpak freecad & superslicer all running well.

KevSlashNull · November 29, 2022, 8:16pm

This also happened to me about once a week since I bought the Framework.

Ubuntu 22.04.1
5.15.0-53-generic #59-Ubuntu SMP
module_blacklist=hid_sensor_hub
Samsung SSD 980 PRO 1 TB
Randomly in Firefox, but yesterday I installed Kerbal Space Program (KSP), which seems to reliably cause the laptop to freeze a few minutes after game launch.

Random GPU hang in Firefox/VS Code/normal usage:

Okt 16 12:01:37 kevs-framework kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[2584]:4fee timed out (hint:intel_atomic_commit_ready [i915])
Okt 16 12:01:41 kevs-framework kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Okt 16 12:01:41 kevs-framework kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0

GPU hang while playing KSP:

Nov 29 20:52:35 kevs-framework kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[3148]:cd6e timed out (hint:intel_atomic_commit_ready [i915])
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in KSP.x86_64 [5367]
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] Renderer[5929] context reset due to GPU hang
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] KSP.x86_64[5367] context reset due to GPU hang

The random hang is usually recoverable by waiting 30-60 seconds, while the one while playing KSP (12:1:84dffffb) is not and requires a forced shutdown and boot.
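
For anyone comparing logs like the ones above, a minimal Python sketch for pulling the ecode and process name out of `GPU HANG` lines, and for counting reset errors, might look like the following. The regular expression is inferred only from the log lines quoted in this thread, and the names `parse_hangs` and `count_reset_errors` are made up for illustration; other kernel versions may format these messages differently.

```python
import re

# Pattern inferred from the i915 messages quoted in this thread, e.g.
#   [drm] GPU HANG: ecode 12:1:84dffffb, in KSP.x86_64 [5367]
#   [drm] GPU HANG: ecode 12:0:00000000
# The ", in <process> [<pid>]" part is optional because some hangs omit it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:[0-9a-fA-F]{8})"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?"
)

# Sample lines copied from the logs above.
SAMPLE = """\
Okt 16 12:01:41 kevs-framework kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in KSP.x86_64 [5367]
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Nov 29 20:52:39 kevs-framework kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
"""


def parse_hangs(log_text):
    """Return (ecode, process-or-None) for every GPU HANG line in log_text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append((match.group("ecode"), match.group("process")))
    return hangs


def count_reset_errors(log_text):
    """Count lines flagged *ERROR* that also mention a reset."""
    return sum(
        1
        for line in log_text.splitlines()
        if "*ERROR*" in line and "reset" in line.lower()
    )
```

Feeding it the output of `journalctl -k` or `dmesg` gives a quick list of which ecodes showed up and whether any reset errors were logged alongside them.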

vhx · November 30, 2022, 10:54am

Looks like you need to update kernel to >=6.0.9. ecode 12:0:00000000 should be resolved once that’s done. Not sure if it’ll fix KSP, but worth a shot!

just a reminder; we’re running the latest 12th gen intel. We’re not going to find the required support or bug fixes in old kernels. This was a main driver why i moved to Fedora many years ago; much newer kernels for latest hardware.

real_or_random · November 30, 2022, 4:00pm

Was this a hard freeze or did the system come back? What was the ecode (or better the full log)?

Matt_Hartley · November 30, 2022, 5:21pm

Just to reiterate my own experiences:

So much this.

PDXTabs · November 30, 2022, 7:25pm

FWIW I’ve just been running vanilla mainline Linux kernels on my Ubuntu 22.04 equipped framework. You can find them here: Index of /mainline

Instructions here: How to Install the Latest Linux Kernel on Ubuntu & Linux Mint?

Nicholas_La_Roux · December 1, 2022, 1:39am

Quick update here, still experiencing freezes that automatically recover after about 10 seconds on Fedora 37 with 6.0.10 kernel (latest).

KevSlashNull · December 1, 2022, 3:10pm

Thanks for the help @vhx! I’ve installed kernel 6.0.9 on my Framework (yes, 12th gen) and it seems to have fixed the ecode 12:0:00000000, although I’ll know for sure in a few weeks. As for KSP, I’ve played it yesterday evening for like one or two hours with no freezes!

vhx · December 1, 2022, 3:24pm

i assume it’s generating an ecode; what is it? dmesg | grep -i ecode is probably the easiest way to find out.
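
One caveat with a bare case-insensitive match on `ecode`: it also matches unrelated words such as `decodes` in the vgaarb boot messages. A small Python sketch of a stricter filter, assuming the hang message always contains the literal text `GPU HANG: ecode` as in the logs quoted in this thread (the helper names are hypothetical):

```python
import re
import subprocess

# Anchor on the full i915 message prefix so that "decodes=io+mem" and
# similar substrings are not picked up.
HANG_RE = re.compile(r"GPU HANG: ecode\b")


def hang_lines(dmesg_text):
    """Keep only the lines that report an i915 GPU hang ecode."""
    return [line for line in dmesg_text.splitlines() if HANG_RE.search(line)]


def read_dmesg():
    """Return the kernel log as text (may need root where dmesg is restricted)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `hang_lines(read_dmesg())` should then return only real hang reports, or an empty list if none were logged since boot.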

egalanos · December 3, 2022, 5:44am

Whilst I haven’t been having GPU issues under F37 due to my relatively simple usage, seeing the ongoing posts made me think I should mention the debugging resources I had on my list of things to try in case the problem persisted.

Increase the level of logging with additional kernel command line parameters:
- drm.debug=0xe
  - Run modinfo drm to see the options
- log_buf_len=4M
- Source: https://01.org/linuxgraphics/documentation/bugs-and-debugging/tips-may-help-solve-your-issue-less-time
Capturing errors
- Prepare by installing igt-gpu-tools
- Capture error dumps:
  - cat /sys/class/drm/card*/error | gzip > gpu-error.gz
  - Source: https://01.org/linuxgraphics/documentation/how-get-gpu-error-state
- Run intel_error_decode to then decode an error dump
Online resources
- Issues tracker: Issues · drm / intel · GitLab
- https://01.org/linuxgraphics/documentation/bugs-and-debugging
  
  (look at the sections on the left side bar)
- https://01.org/linuxgraphics/documentation/development/how-debug-suspend-resume-issues

Hope that is helpful…
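
As a rough sketch, the two preparation steps above (checking that the extra kernel parameters are active, and saving the error state) could be scripted in Python as below. The function names are hypothetical, the default paths are the ones from the list above plus `/proc/cmdline`, and reading the sysfs error files usually needs root.

```python
import glob
import gzip
from pathlib import Path


def missing_kernel_args(wanted, cmdline_path="/proc/cmdline"):
    """Return the wanted kernel parameters absent from the running command line."""
    active = Path(cmdline_path).read_text().split()
    return [arg for arg in wanted if arg not in active]


def capture_gpu_errors(out_path="gpu-error.gz",
                       pattern="/sys/class/drm/card*/error"):
    """Concatenate every matching error-state file into one gzip archive.

    Mirrors: cat /sys/class/drm/card*/error | gzip > gpu-error.gz
    Returns the list of source files that were read.
    """
    sources = sorted(glob.glob(pattern))
    with gzip.open(out_path, "wb") as archive:
        for source in sources:
            archive.write(Path(source).read_bytes())
    return sources
```

For example, `missing_kernel_args(["drm.debug=0xe", "log_buf_len=4M"])` returns whichever of the two parameters are not set on the current boot, and `capture_gpu_errors()` writes `gpu-error.gz` in the current directory.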

Nicholas_La_Roux · December 3, 2022, 11:48am

Just captured the error from a freeze (removed patch). Occurred instantly after resuming from sleep.

~ 
❯ dmesg | grep -i ecode
[    1.261967] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    3.120876] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem