Fedora tends to focus more on GNOME as it’s their official environment.
Now on stock latest Fedora 37 (gnome)
Getting some hangs that last 2-5 seconds, with these logs:
Feb 24 21:19:03 macaria kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1899]:46dfe timed out (hint:intel_atomic_commit_ready [i915])
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] HuC authenticated
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Feb 24 21:19:07 macaria kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
Some of them don’t include the i915 message, but they all show GPU HANG. I wasn’t even playing any games this time, just running apps and streaming a remote desktop. Something’s definitely still up.
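For anyone comparing notes, here is a minimal sketch for tallying these messages per day instead of eyeballing the journal. The function name count_gpu_hangs is made up for illustration; it only assumes the "Mon DD HH:MM:SS" timestamp layout seen in the logs above.

```shell
#!/usr/bin/env bash
# Count kernel log lines containing "GPU HANG", grouped by day.
# count_gpu_hangs is a hypothetical helper name, not part of any tool.
# Input: journal-style text on stdin ("Mon DD HH:MM:SS host kernel: ...").
count_gpu_hangs() {
    # Fields 1 and 2 are the month and day; output is "Mon DD count".
    # The final sort is lexical, so months are not in calendar order.
    awk '/GPU HANG/ { n[$1 " " $2]++ } END { for (d in n) print d, n[d] }' | sort
}

# Typical use on a live system (kernel messages, current boot):
#   journalctl -k -b --no-pager | count_gpu_hangs
# After a hard freeze and reboot, the previous boot is probably more useful:
#   journalctl -k -b -1 --no-pager | count_gpu_hangs
```

The match is case-sensitive, so follow-up lines such as "context reset due to GPU hang" are not counted twice.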
[gabe@macaria ~]$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb quiet nvme.noacpi=1 psr=0 module_blacklist=hid_sensor_hub"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
I have psr=0 in my grub, is this maybe causing the problem now?
That is more likely causing the issue.
@nadb Sure, I can remove that. Got it from the Framework Fedora 37 install recs (Fedora 37 Installation on the Framework Laptop - Framework Guides).
Considering the guide mentions that flag as a way to save energy with NVMe drives, it’s definitely possible it’s causing hangs. As great as power saving stuff is, it can also cause various kinds of issues, especially hangs such as with WiFi and graphics.
# Improve power saving for NVMe drives:
sudo grubby --update-kernel=ALL --args="nvme.noacpi=1"
Anecdotally, I have the psr=0 flag but no acpi flag, and I never hit hangs these days (kernel 6.1.13).
New GPU HANG just dropped
Feb 25 19:23:55 macaria kernel: Asynchronous wait on fence 0000:00:02.0:Xwayland[2975]:92132 timed out (hint:intel_atomic_commit_ready [i915])
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] HuC authenticated
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Feb 25 19:23:58 macaria kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
[gabe@macaria ~]$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb quiet psr=0 module_blacklist=hid_sensor_hub"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
Programs running:
Telegram, Firefox, NoMachine (as a client, streaming another computer to me). I doubt it has anything to do with the programs, but thought I’d mention just in case. Guess I’ll turn off psr=0 and try again, and maybe try the other kernel param variant that disables psr.
Appending to my previous post (390). This issue does not seem to be KDE-specific. I did a couple of days of testing while developing applications with Android Studio and QEMU VMs within pure Weston (with XWayland enabled), and can definitely confirm that the GPU hangs persist in Weston. Notably, they do seem to be less frequent (presumably due to the simple, barebones environment), and I have not had a hang actually lock up Weston permanently, unlike with kwin_wayland.
With Android Studio and IntelliJ IDEA, I notice that most GPU hangs occur when some kind of sub-window is being spawned (autocompletion, Alt+Enter actions, warnings, etc.). All of this was done with psr=0 set.
I do think that some proper time needs to be spent looking into this for more than only GNOME rather than dismissing this as a KDE-specific issue, given that the reference implementation for a Wayland compositor with only XWayland support enabled exhibits largely the same behavior.
Did you check the settings of the i915 module? I am not sure if this is the right way to set the module parameters.
The Arch wiki states: “If the module is built into the kernel, you can also pass options to the module using the kernel command line.”
Therefore I am not sure if psr=0 sets enable_psr in the i915 module.
For general reference, in case peeps haven’t seen this older post above, I always used @Aggraxis’s method up here and it worked flawlessly.
That is the method I used on Arch. On Arch you also have to regenerate initramfs and check that the config file is included. Things may be different on other distributions. My advice: check the module setting on a running system (sudo systool -v -m i915).
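As an alternative to systool, a rough sketch that reads the live values straight from sysfs. It assumes the usual /sys/module/<name>/parameters layout; show_module_params is a made-up helper name, and the parameter names in the example are just the ones mentioned in this thread.

```shell
#!/usr/bin/env bash
# Print selected module parameters as "name=value" from a parameters directory.
# show_module_params is a hypothetical helper name.
# $1 = directory holding one file per parameter, remaining args = parameter names.
show_module_params() {
    local dir="$1" name
    shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # Missing file, or not readable by the current user.
            printf '%s=<not readable>\n' "$name"
        fi
    done
}

# On a machine with i915 loaded (save as a script and run it as root, since
# these files are often readable by root only, as far as I know):
#   show_module_params /sys/module/i915/parameters enable_psr enable_guc enable_fbc enable_dc
```

If enable_psr does not show the value you set, the option most likely never reached the module, whichever method was used to set it.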
As a general rule, start as basic as possible then slowly add to it.
- So in your case, remove all added parameters. We include them in the guide because we have tested them as working and delivering their intended benefits. But removing them and trying each application by itself allows us to track this down.
- With the extra boot parameters removed, try one of those applications at a time. Freezing? Nope? Add one more. Freezing? Yup? Then we have something to point to for further troubleshooting.
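The steps above could be written down as a small checklist generator. This is only a sketch: print_test_plan is a hypothetical name, it prints commands and test rounds rather than running anything, and the parameters and app names in the example are the ones posted earlier in the thread.

```shell
#!/usr/bin/env bash
# Print (do not run) a baseline command plus cumulative app test rounds.
# print_test_plan is a hypothetical helper; nothing here changes the system.
# $1 = space-separated extra kernel parameters to remove, remaining args = apps.
print_test_plan() {
    local params="$1" round=0 running="" app
    shift
    # Baseline: remove the extra boot parameters, then reboot.
    printf 'baseline: sudo grubby --update-kernel=ALL --remove-args="%s"\n' "$params"
    # Each round adds one more application to the set already running.
    for app in "$@"; do
        round=$((round + 1))
        running="${running:+$running, }$app"
        printf 'round %d: run %s, then check the kernel log for GPU HANG\n' "$round" "$running"
    done
}

# Example:
#   print_test_plan "nvme.noacpi=1 psr=0 module_blacklist=hid_sensor_hub" Telegram Firefox NoMachine
```

The first round that produces a GPU HANG points at the app added in that round, which is the "something to point to" part.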
That works (and I’ve been using this forever with i915.enable_psr=0), see upstream docs:
Parameters for modules which are built into the kernel need to be specified on the kernel command line. modprobe looks through the kernel command line (/proc/cmdline) and collects module parameters when it loads a module, so the kernel command line can be used for loadable modules too.
(The kernel’s command-line parameters — The Linux Kernel documentation)
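To see what the running kernel was actually booted with, here is a small sketch that lists the command-line entries carrying a given module prefix. cmdline_module_params is a made-up name. As far as I can tell, a bare psr=0 has no module prefix and so would not be picked up as an i915 option, which is why it does not show up here.

```shell
#!/usr/bin/env bash
# List "module.param=value" words for one module from a kernel command line.
# cmdline_module_params is a hypothetical helper name.
# $1 = module name, $2 = file to read (defaults to /proc/cmdline).
cmdline_module_params() {
    local module="$1" cmdline_file="${2:-/proc/cmdline}"
    # Print every whitespace-separated word that starts with "<module>.".
    # Quoted values containing spaces are not handled by this simple split.
    awk -v m="$module" '{
        for (i = 1; i <= NF; i++)
            if (index($i, m ".") == 1) print $i
    }' "$cmdline_file"
}

# On a live system:
#   cmdline_module_params i915
```

Empty output with psr=0 on the command line would suggest the setting is not reaching i915 at all.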
I fixed the Arch Wiki page.
Framework gpu hangs have been reported in the i915 gitlab. Please refer to those issues and try out the workarounds there:
- GPU HANG: ecode 12:0:00000000 Iris Xe XPS 9320 (#6916) · Issues · drm / intel · GitLab
- Intel alder Lake GPU hangs on Thinkpad P1 Gen5 (#6757) · Issues · drm / intel · GitLab
i915.enable_guc=2 and i915.enable_psr=0
So after going back to completely stock Fedora 37 Gnome kernel params (and still getting freezing), I tried the method in this post. I’ve now put in about 8 hours of working with the same set of apps, no freezes and no log messages involving a GPU HANG.
I’m tempted to say that did the trick. I will probably give it a few more sessions and then try the same fix on Fedora 37 KDE spin, as using Gnome again made me realize why I like KDE so much
That’s great to hear! Just so I understand, which parts did you use (or all of the parts)?
/etc/modprobe.d/i915.conf
adding:
options i915 enable_psr=0
or adding
options i915 enable_psr=0
options i915 enable_guc=3
options i915 enable_fbc=1
We do suggest
sudo grubby --update-kernel=ALL --args="i915.enable_psr=0"
in the guide, so I’d be interested in getting a better feel for your approach. Thanks!
@Matt_Hartley Maybe the grubby approach would not work if i915 is compiled as a module instead of embedded into the kernel, in which case the modprobe config file would seem to be a good solution.
Or it can also be because i915 is not in the initramfs image? And so it would be loaded later on? I don’t know exactly what the scope of “kernel arguments” is, but that could help us understand the mechanics.
@Matt_Hartley Yes sorry, I should’ve clarified.
I only added options i915 enable_psr=0 to my /etc/modprobe.d/i915.conf
[gabe@macaria ~]$ cat /etc/modprobe.d/i915.conf
options i915 enable_psr=0
The i915.conf file did not exist yet, so I had to create it.
Then I ran
sudo dracut --force
and restarted my machine. I checked journalctl for logs about i915 after the reboot and did see something about a “tainted kernel” but I’m not sure if that was always there or not. Either way, it definitely seems to have done something.
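For double-checking the modprobe.d route described above, a minimal sketch that prints what a config file sets for a module, so it can be compared against the live values. modprobe_options is a hypothetical helper name; the file path in the example is the one from this post.

```shell
#!/usr/bin/env bash
# Print the "param=value" pairs a modprobe.d file sets for one module.
# modprobe_options is a hypothetical helper name.
# $1 = config file, $2 = module name.
modprobe_options() {
    local conf="$1" module="$2"
    # Only lines of the form "options <module> key=value ..." are considered;
    # comments and options for other modules are skipped.
    awk -v m="$module" '
        $1 == "options" && $2 == m {
            for (i = 3; i <= NF; i++) print $i
        }' "$conf"
}

# On a live system:
#   modprobe_options /etc/modprobe.d/i915.conf i915
# To check whether the file ended up in the rebuilt initramfs (lsinitrd ships
# with dracut; this lists the image contents and filters for the file name):
#   sudo lsinitrd | grep 'modprobe.d/i915.conf'
```

If the file is listed in the image and the value under /sys/module/i915/parameters/ matches, the option is being applied at load time.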
Well, I jinxed it. After about another hour of working, and starting up Obsidian for a few minutes, I got another hard freeze (this time completely unrecoverable).
GPU HANG with a bunch of junk from Obsidian
Mar 09 19:03:59 macaria kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[2068]:d970e timed out (hint:intel_atomic_commit_ready [i915])
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in obsidian [30394]
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] obsidian[30394] context reset due to GPU hang
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [56:0309/190402.859029:ERROR:shared_context_state.cc(855)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTEXT_RESET_KHR
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [56:0309/190402.859670:ERROR:gpu_service_impl.cc(967)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [13:0309/190402.873685:ERROR:gpu_process_host.cc(975)] GPU process exited unexpectedly: exit_code=8704
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] HuC authenticated
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Mar 09 19:04:02 macaria kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.887720:ERROR:angle_platform_impl.cc(43)] Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: ERR: Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.887799:ERROR:gl_display.cc(508)] EGL Driver message (Critical) eglInitialize: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.887829:ERROR:gl_display.cc(920)] eglInitialize OpenGL failed with error EGL_NOT_INITIALIZED, trying next display type
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.888009:ERROR:angle_platform_impl.cc(43)] Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: ERR: Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.888031:ERROR:gl_display.cc(508)] EGL Driver message (Critical) eglInitialize: glXQueryExtensionsString returned NULL
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.888053:ERROR:gl_display.cc(920)] eglInitialize OpenGLES failed with error EGL_NOT_INITIALIZED
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.888087:ERROR:gl_ozone_egl.cc(23)] GLDisplayEGL::Initialize failed.
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [450:0309/190402.889036:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: MESA-LOADER: failed to open iris: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/iris_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: failed to load driver: iris
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: MESA-LOADER: failed to open zink: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/zink_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: failed to load driver: zink
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: MESA-LOADER: failed to open kms_swrast: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/kms_swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: failed to load driver: kms_swrast
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: MESA-LOADER: failed to open swrast: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: failed to load swrast driver
Mar 09 19:04:02 macaria md.obsidian.Obsidian.desktop[30342]: [74:0309/190402.922496:ERROR:command_buffer_proxy_impl.cc(128)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: MESA-LOADER: failed to open iris: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/iris_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: failed to load driver: iris
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: MESA-LOADER: failed to open zink: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/zink_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: failed to load driver: zink
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: MESA-LOADER: failed to open kms_swrast: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/kms_swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: failed to load driver: kms_swrast
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: MESA-LOADER: failed to open swrast: /usr/lib/x86_64-linux-gnu/GL/default/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/GL/default/lib/dri, suffix _dri)
Mar 09 19:04:02 macaria com.discordapp.Discord.desktop[4945]: failed to load swrast driver
Maybe I try Ubuntu…
This is definitely one option. I’ve used Obsidian successfully on Ubuntu.
While still on Fedora, with Obsidian open, do you have other Chromium based applications or browsers open by chance?
@Matt_Hartley Nope, all I had open were the apps I mentioned here plus Obsidian.
As an update, I added i915.enable_dc=0 to my grub kernel params with the following command:
sudo grubby --update-kernel=ALL --args="i915.enable_dc=0"
and I haven’t experienced any freezes since, with some heavy NoMachine usage with Obsidian open, as well as streaming a movie using Google Chrome. Still no idea if this has solved the issue but it’s at least looking good.