I’m pretty sure this was not an accident:
https://patchwork.kernel.org/project/intel-gfx/patch/20210325180720.401410-38-matthew.d.roper@intel.com/
At this point, it may be best to reach out to i915 kernel folks.
Thanks for the speedy reply. Browsing the git tree for both my current kernel and the upcoming 6.0-rc4, it seems they’re still using the same microcode definitions for tgl and adlp, so tgl_huc_7.9.3.bin is the correct module.
Variables eliminated so far:
The crash almost always seems to be triggered by the gnome-control-center app, or less commonly some settings dialog within Gnome running on Wayland, while another application is either playing music or using xwayland.
What makes you think that this patch in particular is relevant? I applied it to my kernel and it didn’t seem to fix the GPU hangs I’d encountered.
A whole bunch of firmware updates were just released.
linux-firmware: Firmware for Linux kernel drivers
====================================================================================
 Package                    Architecture  Version                 Repository  Size
====================================================================================
Upgrading:
 iwl100-firmware            noarch        39.31.5.1-138.fc36      updates     140 k
 iwl1000-firmware           noarch        1:39.31.5.1-138.fc36    updates     251 k
 iwl105-firmware            noarch        18.168.6.1-138.fc36     updates     219 k
 iwl135-firmware            noarch        18.168.6.1-138.fc36     updates     228 k
 iwl2000-firmware           noarch        18.168.6.1-138.fc36     updates     221 k
 iwl2030-firmware           noarch        18.168.6.1-138.fc36     updates     230 k
 iwl3160-firmware           noarch        1:25.30.13.0-138.fc36   updates     992 k
 iwl3945-firmware           noarch        15.32.2.9-138.fc36      updates     81 k
 iwl4965-firmware           noarch        228.61.2.24-138.fc36    updates     94 k
 iwl5000-firmware           noarch        8.83.5.1_1-138.fc36     updates     364 k
 iwl5150-firmware           noarch        8.24.2.2-138.fc36       updates     137 k
 iwl6000-firmware           noarch        9.221.4.1-138.fc36      updates     156 k
 iwl6000g2a-firmware        noarch        18.168.6.1-138.fc36     updates     336 k
 iwl6000g2b-firmware        noarch        18.168.6.1-138.fc36     updates     343 k
 iwl6050-firmware           noarch        41.28.5.1-138.fc36      updates     295 k
 iwl7260-firmware           noarch        1:25.30.13.0-138.fc36   updates     9.5 M
 iwlax2xx-firmware          noarch        20220815-138.fc36       updates     45 M
 libertas-usb8388-firmware  noarch        2:20220815-138.fc36     updates     105 k
 linux-firmware             noarch        20220815-138.fc36       updates     177 M
 linux-firmware-whence      noarch        20220815-138.fc36       updates     52 k
Installing weak dependencies:
 amd-gpu-firmware           noarch        20220815-138.fc36       updates     14 M
 intel-gpu-firmware         noarch        20220815-138.fc36       updates     7.1 M
 nvidia-gpu-firmware        noarch        20220815-138.fc36       updates     1.2 M

Transaction Summary
====================================================================================
Install   3 Packages
Upgrade  20 Packages

Total download size: 258 M
Interested to see if they help at all, given my Framework 12th gen was just delivered.
I don’t think so. Arch has already had the relevant firmware files for a while.
I still think it’s best to report this to the intel-gfx kernel maintainers, but it should probably be done by someone who can reproduce the problem often and is willing to run a vanilla kernel; see “Reporting issues” in the Linux kernel documentation.
I have experienced this over the last few days on Fedora 36, and I’m still seeing if I can get any logs. I experience a hard lockup where audio stops as well as video. After rebooting, I don’t see any unusual activity in the previous boot’s logs using journalctl --boot -1
Kernel:
5.19.6-200.fc36.x86_64
However, it seems to always happen to me when I am in a Google Meet video call.
I too am having freezing issues with my newly upgraded 12th Gen i7-1260P with 16GB and a 2TB WD SN750. I noticed a comment from @Paul_Sorensen that mirrors the same problem I am having now. I also am using Windows 11 and my laptop just randomly freezes and then shuts off. I have run multiple tests on the hardware and it never seems to freeze during the testing. I have completely wiped and reinstalled the OS, no luck. I’m hoping for a solution soon as the laptop is not reliable in its current condition. It was doing so well at first!!
I also ran memtest86 and it passed all tests. Finally, I swapped the RAM modules between my Framework and my wife’s, and hers was still freezing. Then I swapped her SSD with my SSD, and the machine running her SSD still froze. However, there have been two times when the machine running my SSD powered off and rebooted while I wasn’t there to observe whether it had frozen or not. I can see in the Windows Event Viewer that there was a Kernel-Power 41 error, which I have also seen logged when the machine boots back up after freezing. So I suspect my computer is doing it too, but I haven’t witnessed it on mine.
Guess I’ll hold off on the upgrade for now then.
I too really hope this isn’t as bad as it sounds. Going to be my only computer for the next month starting today.
That said, now that I have one, I can actually jump into debugging, seeing what I can find. Assuming I hit the same issues.
This occurred again for me, so I had another go at digging into it…
I was able to get output from journalctl and dmesg by piping output to a remote server over netcat; log output is similar to what was posted before (gnome-settings open, playing audio and manipulating touchpad settings eventually caused a crash here):
[ 1043.589794] Asynchronous wait on fence 0000:00:02.0:gnome-shell[2701]:4632 timed out (hint:intel_atomic_commit_ready [i915])
[ 1047.464971] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:0020fdfe, in gnome-control-c [5847]
[ 1047.465011] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[ 1047.567662] i915 0000:00:02.0: [drm] ERROR rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 1047.568369] i915 0000:00:02.0: [drm] ERROR rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 1047.568453] i915 0000:00:02.0: [drm] gnome-control-c[5847] context reset due to GPU hang
[ 1047.568507] i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
[ 1047.568509] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
[ 1047.583819] i915 0000:00:02.0: [drm] HuC authenticated
[ 1047.584864] i915 0000:00:02.0: [drm] GuC submission enabled
[ 1047.584873] i915 0000:00:02.0: [drm] GuC SLPC enabled
The network stays up, so enabling sshd beforehand allows access from another system. Shelling in and sending SIGKILL to the gnome-shell process kicks the desktop back to the login prompt, without the need to hard power cycle. HTH
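For anyone collecting these logs remotely, here is a rough sketch of how the hang reports could be pulled out of captured dmesg output. It assumes the `GPU HANG` line format shown in the log above; the function name `find_gpu_hangs` and the regex are my own illustration, not anything provided by i915 or the kernel, and other kernel versions may format the line differently.

```python
import re

# Assumed format, taken from the log line posted above:
# [ 1047.464971] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:0020fdfe, in gnome-control-c [5847]
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+(?P<pci>\S+):\s+\[drm\]\s+GPU HANG:\s+"
    r"ecode\s+(?P<ecode>\S+),\s+in\s+(?P<comm>.+?)\s*\[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return one dict per i915 'GPU HANG' line found in an iterable of log lines."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match is None:
            continue  # not a hang report, skip it
        hangs.append(
            {
                "timestamp": float(match.group("ts")),  # seconds since boot
                "pci": match.group("pci"),              # PCI address of the GPU
                "ecode": match.group("ecode"),          # error code as printed by i915
                "comm": match.group("comm"),            # (truncated) process name
                "pid": int(match.group("pid")),
            }
        )
    return hangs


# Two lines copied from the log above; only the first one is a hang report.
sample = [
    "[ 1047.464971] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:0020fdfe, in gnome-control-c [5847]",
    "[ 1047.465011] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0",
]
for hang in find_gpu_hangs(sample):
    print(hang)
```

Feeding it the lines of a saved log file instead of `sample` should give a quick list of which processes were blamed for each hang, which may help when comparing reports across machines.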
I’m bummed and throwing in the towel. My small business needs working systems. This really sucks and I’m sad. I had high hopes for the Framework because I love the DIY and repairability promise offsetting some of the initial capex. Even with a new mainboard sent from Framework for one of our laptops that has persistent freezing, the freezing continues to worsen over time. My employees are complaining about thermal management being a major issue for them as well.
This isn’t a Framework laptop issue, but a Linux kernel driver issue, and will be present on every 12th gen computer that uses the igpu.
If you don’t want to hit it, don’t use Linux on any 12th gen Intel system that doesn’t have a discrete GPU.
Brand new DIY i7-1260P, Arch install, Gnome 42.4, Wayland (no XWayland), Hynix P31 2TB, Crucial 2x16GB RAM (from the approved list). I’m also having lockups in gnome settings.
5.19.7-arch1-1, #1 SMP PREEMPT_DYNAMIC Mon, 05 Sep 2022 18:09:09 +0000
GRUB_CMDLINE_LINUX="cryptdevice=UUID=e1fb5806-1f0a-4edb-bbd4-855e2a6a4c2e:cryptroot:allow-discards root=/dev/mapper/cryptroot resume=UUID=967555c6-1617-4dd2-acd7-207a79a74dc5 resume_offset=192020480"
I saw the lockup when I had Plexamp AppImage installed and running. I was also in the TouchPad settings portion of the menu when this happened.
This may not apply to everyone’s system, since I have a Windows 11 setup. However, I found another thread about the DisplayPort and HDMI expansion cards causing excessive sleep power consumption: [Beta] DisplayPort Expansion Card firmware update to reduce system power consumption - Framework Laptop / DIY Edition - Framework Community.
On a hunch I removed my DP card and have now gone almost 24 hours without a freeze/shutdown, whereas before the error would occur every 3-5 hours. I’m still holding my breath on this though. BTW, removing the DP card also seems to have corrected a lag I was having in the UEFI/BIOS, where I would get pauses while scrolling through the menus.
Can anyone else check if removing their DP or HDMI card would make a difference on the stability of their laptop? Thanks. @Paul_Sorensen
This may be it! My wife has a Displayport card in hers and I don’t in mine, that’s the only difference between our laptops. I’ll try swapping them and see if mine starts having the freezing behavior.
On the point above, I have an HDMI card in mine and have so far only experienced a single freeze. That freeze occurred while in GNOME settings within the first hour or so after Fedora installation. If I am able to reproduce the freeze I will try without the HDMI card installed to see if that makes any difference.
FWIW, I have neither DP nor HDMI cards installed and have not had a freeze since I installed Fedora over a week ago. Sounds like it might have some merit.
So Linux [kernel] isn’t suitable for 12th gen Intel systems with iGPU right now.
So much for:
Framework saying “We’re ready for you.”. Linux saying “Not quite yet.”
So this is really an annual event where Linux plays catch up with the hardware…as always for the past 2 decades.
Something needs to change in the working relationship between chip makers / designers and kernel developers.
Mind you, with that notion, it seems to say “If it’s software, it’s not Framework’s issue”…then where does one draw the line when “[hardware] is optimized for [software]”? Because it seems to say, if it’s not working, it’s software issue. From a general consumer’s perspective…I’m not sure if that’s clear.