[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

Nicholas_La_Roux · September 13, 2022, 3:40pm

On the point above, I have an HDMI card in mine and have so far only experienced a single freeze. That freeze occurred while in GNOME settings within the first hour or so after Fedora installation. If I am able to reproduce the freeze I will try without the HDMI card installed to see if that makes any difference.

Firestorm980 · September 14, 2022, 12:42am

FWIW, I have neither DP nor HDMI cards installed and have not had a freeze since I installed Fedora over a week ago. Sounds like it might have some merit.

Banana · September 14, 2022, 11:18am

Seems to only happen on GNOME though. Maybe @T_RRR can use a different Fedora spin?

Second_Coming · September 14, 2022, 4:38pm

So Linux [kernel] isn’t suitable for 12th gen Intel systems with iGPU right now.

So much for:

Framework saying “We’re ready for you.” Linux saying “Not quite yet.”

So this is really an annual event where Linux plays catch up with the hardware…as always for the past 2 decades.

Something needs to change in the working relationship between chip makers / designers and kernel developers.

Mind you, with that notion, it seems to say “If it’s software, it’s not Framework’s issue”…then where does one draw the line when “[hardware] is optimized for [software]”? Because it seems to say, if it’s not working, it’s software issue. From a general consumer’s perspective…I’m not sure if that’s clear.

mjog · September 14, 2022, 11:37pm

You know what? Software has bugs. Complex software has more bugs. Software that interacts with the real world via hardware is even more complex and has even more bugs. Software like the Linux kernel that is supposed to run on every possible combination of every possible piece of hardware is one of the most complex pieces of software on the planet, and it sees a lot of bugs and regressions. But it does admirably well, given how broad its applications are.

So yes, Intel integrated GPU drivers for a newer chipset have problems on Linux, again. I’ve been running Linux-based systems with Intel iGPUs as my sole desktop and server OS for about 25 years, and I’ve seen regressions like this show up a number of times before. But I can assure you it’s not an annual event.

Would you rather Framework have a disclaimer about possible software regressions on their web site under that banner? What about one for kernel subsystems maintainers getting hit by a bus? How about a warning that your Debian stable may be out of date, your Arch could push an update that nukes your filesystem, or your Gentoo CFLAGS may be cooked? Anything else?

Intel already maintains the driver for their iGPUs. Maybe you should go file a bug report with them rather than complaining about it on a forum? It might actually get fixed that way.

Second_Coming · September 15, 2022, 1:08am

Yes, agree there, Sherlock. That something as simple as running a graphical settings configurator could hard crash a system shows the lack of focus / polish.

It is if you want generally usable current gen hardware to run a Linux distro OOTB. Last year, it was fingerprint reader and wifi issues with the 11th gen Framework laptop.

I’m not sure if that makes the situation sound better or worse. Having Intel maintain their own iGPU driver…and this STILL slipped through the cracks.

You mean no one has submitted a bug all this time? i.e. You’re saying this alternative action would actually be more useful…sad if that’s true…because now you’re saying no one, not anyone with a 12th gen iGPU, has filed a bug as of now.

The kernel, sure is complex. Intel’s developers on iGPU…one job.

Finding excuses seems to be the exercise for Linux distro supporters. Get it done, right, should be the goal instead.

p.s. Ah…I see you’re new here. FYI then: I complain, a lot.

It’s not about what I’d ‘rather’. It’s an indicator that Fedora 36 (released on 2022-05-10) was not sufficiently used / tested / reviewed by Framework even though it was dogfooding internally. Months have gone by and it had not been reported (?) nor addressed until customers reported it? And to be fair, neither did any other 12th gen Intel iGPU Linux-focused laptop manufacturer (going based on your mention that this isn’t a Framework-only issue)?

mjog · September 15, 2022, 1:37am

Thanks, muted.

Second_Coming · September 15, 2022, 2:09am

If you wait long enough…things may get ironed out enough such that your business can use the Framework Laptop…assuming your use cases can dodge enough bullets / issues. On a positive note, it seems the 12th gen models have fewer reported issues than last year’s model. I think the product is maturing overall. And this particular issue is expected to get fixed via software updates.

But if you need a trusty laptop to run Linux, today…well, you have your actual user experience to know what your next step(s) need to be.

Going back to the issue:
For the majority of computer users, what matters is whether the system can do what they need it to do, reliably, without curve balls. They don’t care about which software project messed up or who owns the code to fix. It’s a user experience expectation. The Intel iGPU team is dropping the ball in this instance…does it matter to most people who are experiencing the freeze that it’s Intel’s dev team that messed up? I doubt the end-users care about the who.

ayane · September 16, 2022, 12:59am

Another data point; it happened again.

Sep 15 17:47:31 kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1869]:e4d34 timed out (hint:intel_atomic_commit_ready [i915])
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-shell [1869]
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] gnome-shell[1869] context reset due to GPU hang
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] HuC authenticated
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled

This time, I was watching a video through my browser (Librewolf) which was on wayland. I was in the Gnome apps menu sliding back and forth until the hang happened.

Didn’t hard reboot. I enabled the “unraw” sysrq value (alt+prtscr+R) and then was able to ctrl+C to kill the display manager. Waited a few moments to be presented with GDM again, and am logged back in right now.

==EDIT==
DON’T ctrl+c to kill gdm if this happens, especially if you do not have any accounts on the machine or are using systemd-homed to manage your accounts! My GDM got screwed up after I did that and was auto-logging into gnome-initial-setup. I had to manually uninstall and reinstall gdm through a TTY, then enter my username and password to repopulate my account. It was a 20 minute detour that could’ve been avoided simply by rebooting the computer (ctrl+alt+del) rather than trying to kill gdm and keep the init running.

Simon_F · September 16, 2022, 5:45am

I think you’re ignoring all those people who don’t have this issue…
You have no idea how many (percentage wise) are affected by this.
I have been running F36 for about 4 weeks now and have not had a single freeze / crash and am very satisfied with the experience, so ymmv

Even big companies like Apple push out new hardware with an OS that is home grown and still has bugs in it…

Second_Coming · September 16, 2022, 5:53am

No. Not ignored. Keyword on “sufficiently”.

Neither do you.

The following is somewhat related in concept / idea:

Not looking for bug-free software. Issue is with how ‘in your face’ the issue can be for those who hit it. Crashing in Gnome Settings is just one scenario.

Simon_F · September 16, 2022, 6:53am

True, I don’t. Still, only those who have issues get vocal about it, while the majority of people, who have no issues, stay silent. This can make an issue seem much more widespread than it really is.

I have a fairly standard F36 installation, since I just started using Linux as my daily driver and don’t want to mess with it too much. I don’t have this issue.
Let’s say the FW Team decided to give a 12th gen (each CPU config once) with F36 to three employees as a work machine for testing. There is a possibility that none of them had this issue, so how are they supposed to find out?
Who says that the bug was already present at the launch of F36 or when the first 12th gen units were shipped?
There are so many variables to this issue and right now we don’t even know who is to blame.

Second_Coming · September 16, 2022, 7:08am

People always try to make this as a point…for something.

Regardless of the ‘broadness’, it’s a bug to be fixed. Just like the one audio polarity issue that was fixed in the BIOS for one reported user.

Comes down to thoroughness of the test cases.

Two claims were made by mjog (above): “Linux kernel driver issue”, and “will be present on every 12th gen computer that uses the igpu.” …so going with that, developers (Intel) should / would have seen it…even before Framework would…then eventually other Linux-focused laptop brands.

If the issue wasn’t present in software when F36 was first released, Framework could have already mentioned something along the lines of “F36 updates introduced a stability issue”…really early in this thread …close to 6 weeks ago.

Again, regardless of ‘who’ is to blame (consumers don’t care generally, they / we just want a working system, soon). I’m saying, it’s an annual event with Linux distros.

I know I’m uptight as hell. My view is mine, and yours is yours. I’m not here to convince anyone…and neither should you or anyone.

Simon_F · September 16, 2022, 7:39am

Just put this into perspective of the discussion and you will get the point

100%

Don’t know who he is or what reputation he has to make this claim, but again, I don’t have the issue, so I guess this statement is false.

It’s a bad idea to simply blame something if you are not 110% sure about it

Not trying to convince anyone here, just giving another perspective.

JayV · September 16, 2022, 9:53am

FWIW I came here from a backlink from the Lenovo Linux forums where multiple folks with a 12th gen based Thinkpad X1 Carbon (mine’s a 1250-P) are having thermal issues, with some running into GUI freezes or black screens. I have the i915 GPU HANG myself:

Sep 15 15:43:48 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b, in chrome [4202]
Sep 15 15:43:54 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b, in chrome [5049]
Sep 15 15:44:01 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b, in chrome [5049]
Sep 15 15:44:07 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b, in chrome [5049]
Sep 15 15:44:14 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b, in chrome [7318]
Sep 15 15:44:16 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:2:0400000b
Sep 16 00:24:06 carbon kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dfbff7, in Xorg [2806]

I get it randomly when playing youtube videos in chrome (GPU decoding) on a 4k display connected via USB-C/DP. Some folks get it on Fedora, I’m running Ubuntu 22.04 on Linux 5.18.19 (most recent intel patches) with Cinnamon DE, I don’t even have gnome-settings installed so I doubt the app that crashes is even relevant.
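For anyone comparing notes, a rough way to see which processes the hangs get attributed to is to tally the GPU HANG lines per process name. This is only a sketch: the function name `count_hangs_by_process` is made up for illustration, and it assumes the kernel lines look like the ones pasted above (lines without an `in <process> [pid]` suffix are skipped).

```shell
# Tally "GPU HANG" kernel log lines per process name (reads log lines on stdin).
# Hypothetical helper; assumes the line format seen in the logs in this thread.
count_hangs_by_process() {
    grep 'GPU HANG' |
        sed -n 's/.*GPU HANG: ecode [^,]*, in \([^ ]*\) \[[0-9]*].*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Something like `journalctl -k --no-pager | count_hangs_by_process` should feed it the kernel log for the current boot.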

Lenovo seems to struggle hard to get thermals under control with the 12th gen in thin/small laptops. (see linked post from Lenovo)

So I tend to agree with the statement that 12th gen has issues across the board. It’s not just a Framework problem: Lenovo also sells the X1 Carbon as an Ubuntu and Fedora Certified laptop and is still working through issues. The frequent stutters, the fact that it’s effectively slower than my 8th gen Dell XPS due to aggressive (thermal) throttling, and the lost work from GUI freezes… all contributed to me replacing it with a Ryzen 6850U based laptop coming in tomorrow.

Second_Coming · September 16, 2022, 2:41pm

Did the tree fall when no one is around to see it?

My point: It’s undetermined how widespread it is…but if the software (igpu kernel driver in this case) is the same across 12th gen systems with Iris Xe, then that lays the foundational condition for the issue to exist. Hitting it or not is another matter. Doesn’t mean the issue doesn’t exist.

Sure, there are ways where the issue can be sliced so that it may not apply to some systems (e.g. microcode difference, EU count differences…etc… we don’t know). But that’s what this thread is partially about: Ask the wide population, get data point, THEN narrow in. Till then, the wide population (12th gen) is what we’re looking at.

For example, wings breaking off from planes…you ground them all (of that generation / model) even if you’re an airline from another country. Go wide…then narrow in.

Back to your point: It’s premature to say whether the statement is absolutely true or false. Erring on the side of caution is the likely path forward for now.

ayane · September 16, 2022, 4:53pm

More updates -

The crash happens randomly, and cannot reliably be triggered with the Gnome control center or app overview grid. It happens during music or video playback when it does happen. It seems that resuming from hibernation makes the issue harder to trigger on my end, though the opposite is true for other users here. Resume plays a part.

==How to fix==
The simplest nondestructive workaround is to enable the unraw sysrq command (set kernel.sysrq=4 in a config file in /etc/sysctl.d/).
Then, if the freeze does occur, do the following:

Enter left alt+prtscr+R. This will give keyboard access back to the init.
Then try switching to a different TTY (e.g. ctrl+alt+F3). You may need to hold down the fn key when switching TTYs if you do not have fn lock turned on in the firmware.
From the different TTY, you can do one of two things:
- try switching back to the TTY running the display server containing your original session. If that isn’t an option, then
- log in and kill gnome-shell, recover data, reboot, etc

This is the simplest workaround. As @davidk0 mentioned earlier in this thread, you can also send SIGINT or SIGKILL via SSH, which is an equally simple solution, though it will require a second computer and ssh configuration.
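A minimal sketch of the setup step, assuming a distro that reads /etc/sysctl.d/ at boot. The file name `90-sysrq.conf` and the helper name `sysrq_allows_unraw` are placeholders, not anything official. The helper only checks whether a given kernel.sysrq value permits the unraw command: a value of 1 enables every SysRq function, otherwise bit 4 is the one covering keyboard control.

```shell
# One-time setup (needs root). The file name is a placeholder; any *.conf in
# /etc/sysctl.d/ should be picked up at boot.
#   echo 'kernel.sysrq = 4' | sudo tee /etc/sysctl.d/90-sysrq.conf
#   sudo sysctl --system    # reload sysctl settings without rebooting

# Return success if the given kernel.sysrq value permits alt+prtscr+R (unraw).
sysrq_allows_unraw() {
    val=$1
    [ "$val" -eq 1 ] || [ $(( val & 4 )) -ne 0 ]
}

# Report on the running kernel's current setting, if it is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows_unraw "$current"; then
        echo "kernel.sysrq=$current: unraw is allowed"
    else
        echo "kernel.sysrq=$current: unraw is NOT allowed"
    fi
fi
```

The SSH route would look roughly like `ssh user@laptop 'pkill -INT gnome-shell'`, where `user@laptop` is a placeholder for your own login and hostname.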

ayane · September 17, 2022, 5:35am

This issue might get fixed in Linux 6.1, which will land in December: Intel Sends Updated GPU Firmware Handling, More Meteor Lake Graphics Code For Linux 6.1 - Phoronix
Pull request: [PULL] drm-intel-gt-next

Just in case this isn’t resolved in 6.1, there’s this bugtracker: Intel alder Lake GPU hangs on Thinkpad P1 Gen5 (#6757) · Issues · drm / intel · GitLab

Please go and post your journalctl output on that bugtracker so as to provide additional data points, along with the output of inxi -Gzx
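One possible way to gather that output, as a sketch rather than a prescribed procedure. The helper name `filter_i915` and the output file name `i915-hang.txt` are made up here, and reading the previous boot with `-b -1` assumes your journal is persistent.

```shell
# Keep only the kernel lines relevant to the hang: i915 driver messages and
# the fence timeout that tends to precede them.
filter_i915() {
    grep -E 'i915|Asynchronous wait on fence'
}

# On an affected machine (commented out so the snippet is safe to paste):
#   journalctl -k -b -1 --no-pager | filter_i915 > i915-hang.txt   # previous boot
#   inxi -Gzx >> i915-hang.txt
```

If you recovered without rebooting, drop the `-1` so that journalctl reads the current boot instead.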

confus · September 17, 2022, 2:59pm

Just for completeness: I have the same issue on NixOS on a framework with i5-1240P

$ sudo dmesg -w

[  659.921831] Asynchronous wait on fence 0000:00:02.0:X[2089]:57674 timed out (hint:intel_atomic_commit_ready [i915])
[  662.681008] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
[  662.681209] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  662.783962] i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
[  662.783973] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
[  662.800377] i915 0000:00:02.0: [drm] HuC authenticated
[  662.801187] i915 0000:00:02.0: [drm] GuC submission enabled
[  662.801192] i915 0000:00:02.0: [drm] GuC SLPC enabled

NixOS 22.11.20220913.9608ace
#1-NixOS SMP PREEMPT_DYNAMIC Thu Sep 8 09:24:80 UTC 2022
Linux 5.18.8, Firmware Version: 03.04
initrd=\efi\nixos\gzg3iqb90k3msc29np35qinirpw2bi2i-initrd-linux-5.19.8-initrd.efi init=/nix/store/nr0ykravq2zfqwxscda4fl5xw14qpd32-nixos-system-nixos-22.11.20220913.9608ace/init mem_sleep_default=deep nvme.noacpi=1 loglevel=4
Western Digital Black SN850 2TB
DWM on lightdm; about every minute the system freezes for 2 to 10 seconds

NixOS lends itself exceptionally well to reproducing issues of that kind, as a configuration is almost completely bit-accurately reproducible. If someone needs a config, feel free to ask.

Btw. is the issue known to Framework? Should people contact support or something?

Second_Coming · September 17, 2022, 8:24pm

Yes.

and…