[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

I too really hope this isn’t as bad as it sounds. Going to be my only computer for the next month starting today.

That said, now that I have one, I can actually jump into debugging, seeing what I can find. Assuming I hit the same issues.

This occurred again for me, so I had another go at digging into it…

I was able to get output from journalctl and dmesg by piping output to a remote server over netcat; log output is similar to what was posted before (gnome-settings open, playing audio and manipulating touchpad settings eventually caused a crash here):

[ 1043.589794] Asynchronous wait on fence 0000:00:02.0:gnome-shell[2701]:4632 timed out (hint:intel_atomic_commit_ready [i915])
[ 1047.464971] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:0020fdfe, in gnome-control-c [5847]
[ 1047.465011] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[ 1047.567662] i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 1047.568369] i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 1047.568453] i915 0000:00:02.0: [drm] gnome-control-c[5847] context reset due to GPU hang
[ 1047.568507] i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
[ 1047.568509] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
[ 1047.583819] i915 0000:00:02.0: [drm] HuC authenticated
[ 1047.584864] i915 0000:00:02.0: [drm] GuC submission enabled
[ 1047.584873] i915 0000:00:02.0: [drm] GuC SLPC enabled
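
For anyone who wants to capture logs the same way, here is a rough Python sketch of the idea: follow the kernel log with journalctl and push every line to a TCP listener on another machine, so the output survives the freeze. The helper name forward_lines, the host and the port are placeholders made up for illustration. The capture above was done with netcat directly, so treat this as one possible approach, not the exact method used.

```python
import socket
import subprocess
import sys


def forward_lines(lines, sock):
    """Send each log line to an already-connected socket, newline-terminated.

    Accepts str or bytes lines and returns the number of lines sent.
    """
    sent = 0
    for line in lines:
        if isinstance(line, str):
            line = line.encode("utf-8", errors="replace")
        sock.sendall(line.rstrip(b"\n") + b"\n")
        sent += 1
    return sent


def main(host, port):
    # "journalctl -k -f" follows kernel messages, like a live dmesg.
    proc = subprocess.Popen(["journalctl", "-k", "-f"], stdout=subprocess.PIPE)
    with socket.create_connection((host, port)) as sock:
        forward_lines(proc.stdout, sock)


# Hypothetical usage: python forward_klog.py <remote-host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], int(sys.argv[2]))
```

Start a listener on the remote machine first (a netcat listener writing to a file works), then run the script on the laptop before trying to trigger the hang.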

The network stays up, so enabling sshd beforehand allows access from another system. Shelling in and sending SIGKILL to the gnome-shell process kicks the desktop back to the login prompt, without the need to hard power cycle. HTH
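
A minimal sketch of the "shell in and kill gnome-shell" step, for running over SSH. It assumes a Linux /proc filesystem and matches on the kernel's short process name in /proc/<pid>/comm, which is truncated to 15 characters (that is why the log above shows "gnome-control-c"). The function names are made up for illustration, and a plain pkill would do the same job.

```python
import os
import signal


def pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited, or comm is not readable
    return sorted(pids)


def kill_by_name(name, sig=signal.SIGKILL):
    """Send `sig` to every process named `name`; returns the PIDs signalled."""
    pids = pids_by_name(name)
    for pid in pids:
        os.kill(pid, sig)
    return pids


# Hypothetical usage from the SSH session (needs permission to signal the process):
#     kill_by_name("gnome-shell")
```

Note that the GDM greeter may also run its own gnome-shell process, so this can hit more than one PID.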

I’m bummed and throwing in the towel. My small business needs working systems. This really sucks and I’m sad – I had high hopes for the Framework because I love the DIY and repairability promise offsetting some of the initial capex. Even with a new mainboard sent by Framework for one of our laptops that freezes persistently, the freezing continues to worsen over time. My employees are complaining that thermal management is a major issue for them as well.

This isn’t a Framework laptop issue, but a Linux kernel driver issue, and will be present on every 12th gen computer that uses the igpu.

If you don’t want to hit it, don’t use Linux on any 12th gen Intel system that doesn’t have a discrete GPU.


Brand new DIY i7-1260P, Arch install, Gnome 42.4, Wayland (no XWayland), Hynix P31 2TB, Crucial 2x16GB RAM (from the approved list). I’m also having lockups in gnome settings.

5.19.7-arch1-1, #1 SMP PREEMPT_DYNAMIC Mon, 05 Sep 2022 18:09:09 +0000

GRUB_CMDLINE_LINUX="cryptdevice=UUID=e1fb5806-1f0a-4edb-bbd4-855e2a6a4c2e:cryptroot:allow-discards root=/dev/mapper/cryptroot resume=UUID=967555c6-1617-4dd2-acd7-207a79a74dc5 resume_offset=192020480"

I saw the lockup when I had Plexamp AppImage installed and running. I was also in the TouchPad settings portion of the menu when this happened.

This may not apply to everyone’s system, since I have a Windows 11 setup. However, I found another thread about excessive sleep power consumption with the DisplayPort and HDMI expansion cards: “[Beta] DisplayPort Expansion Card firmware update to reduce system power consumption”.

On a hunch I removed my DP card and have now gone almost 24 hours without a freeze/shutdown, whereas before the error would occur every 3-5 hours. I’m still holding my breath on this though. BTW, removing the DP card also seems to have fixed a lag I was having in the UEFI/BIOS, where I would get pauses while scrolling through the menus.

Can anyone else check if removing their DP or HDMI card would make a difference on the stability of their laptop? Thanks. @Paul_Sorensen


This may be it! My wife has a DisplayPort card in hers and I don’t in mine; that’s the only difference between our laptops. I’ll try swapping them and see if mine starts having the freezing behavior.


On the point above, I have an HDMI card in mine and have so far only experienced a single freeze. That freeze occurred while in GNOME settings within the first hour or so after Fedora installation. If I am able to reproduce the freeze I will try without the HDMI card installed to see if that makes any difference.


FWIW, I have neither DP nor HDMI cards installed and have not had a freeze since I installed Fedora over a week ago. Sounds like it might have some merit.

Seems to only happen on GNOME though. Maybe @T_RRR can use a different Fedora spin?

So Linux [kernel] isn’t suitable for 12th gen Intel systems with iGPU right now.

So much for:

Framework saying “We’re ready for you.” Linux saying “Not quite yet.”

So this is really an annual event where Linux plays catch-up with the hardware…as always for the past 2 decades.

Something needs to change in the working relationship between chip makers / designers and kernel developers.

Mind you, with that notion, it seems to say “If it’s software, it’s not Framework’s issue”…then where does one draw the line when “[hardware] is optimized for [software]”? Because it seems to say that if it’s not working, it’s a software issue. From a general consumer’s perspective…I’m not sure that’s clear.

You know what? Software has bugs. Complex software has more bugs. Software that interacts with the real world via hardware is even more complex and has even more bugs. Software like the Linux kernel that is supposed to run on every possible combination of every possible piece of hardware is one of the most complex pieces of software on the planet, and it sees a lot of bugs and regressions. But it does admirably well, given how broad its applications are.

So yes, Intel integrated GPU drivers for a newer chipset have problems on Linux, again. I’ve been running Linux-based systems with Intel iGPUs as my sole desktop and server OS for about 25 years, and I’ve seen regressions like this show up a number of times before. But I can assure you it’s not an annual event.

Would you rather Framework have a disclaimer about possible software regressions on their web site under that banner? What about one for kernel subsystem maintainers getting hit by a bus? How about a warning that your Debian stable may be out of date, your Arch could push an update that nukes your filesystem, or your Gentoo CFLAGS may be cooked? Anything else?

Intel already maintains the driver for their iGPUs. Maybe you should go file a bug report with them rather than complaining about it on a forum? It might actually get fixed that way.


Yes, agree there, Sherlock. That something as simple as running a graphical settings configurator can hard crash a system shows the lack of focus / polish.

It is if you want generally usable current-gen hardware that runs a Linux distro OOTB. Last year, it was fingerprint reader and wifi issues with the 11th gen Framework laptop.

I’m not sure if that makes the situation sound better or worse. Intel maintains their own iGPU driver…and this STILL slipped through the cracks.

You mean no one has submitted a bug all this time? i.e. You’re saying this alternative action would actually be more useful…sad if that’s true…because now you’re saying no one, out of everyone with a 12th gen iGPU, has filed a bug as of now.

The kernel sure is complex. Intel’s developers on the iGPU…one job.

Finding excuses seems to be the exercise for Linux distro supporters. Get it done, right, should be the goal instead.

p.s. Ah…I see you’re new here. FYI then: I complain, a lot.

It’s not about what I’d ‘rather’. It’s an indicator that Fedora 36 (released on 2022-05-10) was not sufficiently used / tested / reviewed by Framework, even though it was being dogfooded internally. Months have gone by, and it was not reported (?) or addressed until customers reported it? And to be fair, neither did any other Linux-focused laptop manufacturer shipping 12th gen Intel iGPUs catch it (going by your mention that this isn’t a Framework-only issue)?

Thanks, muted.


If you wait long enough…things may get ironed out enough that your business can use the Framework Laptop…assuming your use cases can dodge enough bullets / issues. On a positive note, the 12th gen models seem to have fewer reported issues than last year’s model. I think the product is maturing overall. And this particular issue is expected to get fixed via software updates.

But if you need a trusty laptop to run Linux, today…well, you have your actual user experience to tell you what your next step(s) need to be.

Going back to the issue:
The majority of computer users care about whether the system can do what they need it to do, reliably, without curve balls. They don’t care about which software project messed up or who owns the code to fix. It’s a user experience expectation. The Intel iGPU team is dropping the ball in this instance…does it matter to most people who are experiencing the freeze that it’s Intel’s dev team who messed up? I doubt the end-users care about the who.

Another data point; it happened again.

Sep 15 17:47:31 kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1869]:e4d34 timed out (hint:intel_atomic_commit_ready [i915])
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in gnome-shell [1869]
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] gnome-shell[1869] context reset due to GPU hang
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] HuC authenticated
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Sep 15 17:47:35 kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
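
To make it easier to compare data points across reports, here is a small Python sketch that pulls the hang signature out of log lines like the ones above. It only relies on the "GPU HANG: ecode …, in <process> [<pid>]" wording visible in this thread; the function name find_gpu_hangs is made up for illustration, and the pattern is a guess at the general format, not taken from the driver source.

```python
import re

# Matches e.g. "GPU HANG: ecode 12:1:84dffffb, in gnome-shell [1869]"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-F:]+), in (?P<comm>.+?) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return one dict per GPU hang line: error code, process name, PID."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(
                {
                    "ecode": match["ecode"],
                    "comm": match["comm"],
                    "pid": int(match["pid"]),
                }
            )
    return hangs
```

It works on both the dmesg-style and journalctl-style lines posted here, since it ignores the timestamp prefix.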

This time, I was watching a video in my browser (Librewolf), which was running on Wayland. I was in the GNOME apps menu, sliding back and forth, until the hang happened.

Didn’t hard reboot. I enabled the “unraw” sysrq value (alt+prtscr+R) and then was able to ctrl+C to kill the display manager. Waited a few moments to be presented with GDM again, and am logged back in right now.

==EDIT==
DON’T ctrl+c to kill gdm if this happens, especially if you do not have any accounts on the machine or are using systemd-homed to manage your accounts! My GDM got screwed up after I did that and was auto-logging into gnome-initial-setup. I had to manually uninstall and reinstall gdm through a TTY, then enter my username and password to repopulate my account. It was a 20 minute detour that could’ve been avoided simply by rebooting the computer (ctrl+alt+del) rather than trying to kill gdm and keep the init running.


I think you’re ignoring all the people who don’t have this issue…
You have no idea how many (percentage-wise) are affected by this.
I have been running F36 for about 4 weeks now, have not had a single freeze / crash, and am very satisfied with the experience, so ymmv.

Even big companies like Apple push out new hardware with a home-grown OS that still has bugs in it…


No. Not ignored. Keyword on “sufficiently”.

Neither do you.

The following is somewhat related in concept / idea:

Not looking for bug-free software. Issue is with how ‘in your face’ the issue can be for those who hit it. Crashing in Gnome Settings is just one scenario.

True, I don’t. Still, only those who have issues get vocal about it, while the majority of people, who have no issues, stay silent. This can make an issue seem much broader than it really is.

I have a fairly standard F36 installation, since I just started using Linux as my daily driver and don’t want to mess with it too much. I don’t have this issue.
Let’s say the FW team decided to give a 12th gen laptop (one of each CPU config) with F36 to three employees as a work machine for testing. It’s possible that none of them would hit this issue, so how are they supposed to find out?
Who says the bug was already present at the launch of F36, or when the first 12th gen units shipped?
There are so many variables to this issue, and right now we don’t even know who is to blame.

People always try to make this as a point…for something.

Regardless of the ‘broadness’, it’s a bug to be fixed. Just like the audio polarity issue that was fixed in the BIOS for one reported user.

Comes down to thoroughness of the test cases.

Two claims were made by mjog (above): “Linux kernel driver issue”, and “will be present on every 12th gen computer that uses the igpu.” …so going with that, the developers (Intel) should / would have seen it…even before Framework would…then eventually other Linux-focused laptop brands.

If the issue wasn’t present in software when F36 was first released, Framework could have already mentioned something along the lines of “F36 updates introduced stability issue”…really early in this thread…close to 6 weeks ago.

Again, regardless of ‘who’ is to blame (consumers generally don’t care, they / we just want a working system, soon). I’m saying, it’s an annual event with Linux distros.

I know I’m uptight as hell. My view is mine, and yours is yours. I’m not here to convince anyone…and neither should you or anyone.