[TRACKING] Freezes on Newest Linux Kernels

So the last couple of kernel releases have brought back the hard-freezing issues on the 12th Gen 1240P model on Arch Linux. I have tried disabling GuC and PSR without any major success. The Linux-Clear kernel seems the most stable, but it still freezes any time the GPU comes under load.

I have tried the mainline kernel, the default kernel and the Linux-Clear kernel, and at the moment all of them show the freezes. I have even had to come back to this post several times because the laptop froze while I was typing.

Open to any suggestions…

Can you downgrade your kernel and confirm that the issue is gone?

If yes, this should be reported upstream, at least on your distribution's bug tracker.

From there it can be handled and, eventually, reported as a bug on the kernel mailing list.

Sadly, because the issue is intermittent, I cannot recall which kernel was the last one without it.

I am using linux-clear 6.1.1 on Arch Linux, and I did notice some freezes when resuming from sleep.
More recent versions of linux-clear, such as 6.1.7 and 6.1.10, seem to have a regression in the i915 driver that makes it impossible for me to change the backlight intensity, but that may just be down to my config.

Just thought I'd leave this here in case it helps.
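
For the backlight problem, one way to check whether the driver or the desktop is at fault might be to read the sysfs backlight interface directly. The sketch below is a hypothetical helper, not an official tool: it lists each device under /sys/class/backlight (on Intel laptops usually `intel_backlight`) and prints the requested `brightness` next to `actual_brightness`, both as a share of `max_brightness`. If writing a new value to `brightness` as root does not move `actual_brightness`, the problem is probably below the desktop environment.

```python
from pathlib import Path

# Standard sysfs location for backlight devices (e.g. intel_backlight).
BACKLIGHT_BASE = Path("/sys/class/backlight")


def to_percent(value, maximum):
    """Convert a raw brightness value to a rounded percentage of the maximum."""
    return round(100 * value / maximum) if maximum > 0 else 0


def backlight_devices(base=BACKLIGHT_BASE):
    """Return backlight device directories that expose max_brightness, sorted by name."""
    if not base.is_dir():
        return []
    return sorted(p for p in base.iterdir() if (p / "max_brightness").is_file())


def read_int(path):
    """Read a sysfs file holding a single integer."""
    return int(path.read_text().strip())


def report(base=BACKLIGHT_BASE):
    """Print requested vs. actual brightness for each backlight device."""
    devices = backlight_devices(base)
    if not devices:
        print("no backlight devices found")
    for dev in devices:
        maximum = read_int(dev / "max_brightness")
        requested = read_int(dev / "brightness")
        actual = read_int(dev / "actual_brightness")
        print(f"{dev.name}: requested {to_percent(requested, maximum)}%, "
              f"actual {to_percent(actual, maximum)}% (max raw value {maximum})")


if __name__ == "__main__":
    report()
```

For example, run `echo 200 | sudo tee /sys/class/backlight/intel_backlight/brightness` (adjust the device name to whatever the script lists) and then run the script again to compare the two values.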

I did notice a comment here suggesting there are issues in the i915 driver related to the fix for the temporary hangs that were present in earlier kernel versions. What I wonder is whether this is tied to Mesa or to the kernel and, either way, when the regression was introduced.

All I can say is that the last two or three weeks have rendered my laptop almost useless, which has been deeply problematic.

Please try a few kernels, especially from both the 6.1 and 6.0 series, and tell us your results.

Also please show us the kernel logs right before the freezes.
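
A possible way to collect those logs is sketched below. It assumes systemd-journald with a persistent journal (so `journalctl -k -b -1` can reach the boot that froze) and prints the last 50 kernel lines of that boot, then only the graphics-related ones. The keyword list and function names are illustrative assumptions. Note that after a hard freeze the final messages may never reach the disk.

```python
import shutil
import subprocess

# Substrings that usually mark graphics-related kernel lines (an assumption; adjust as needed).
GPU_KEYWORDS = ("i915", "drm", "gpu hang", "guc")


def filter_gpu_lines(lines, keywords=GPU_KEYWORDS):
    """Keep only the lines that mention one of the keywords, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


def tail(lines, n):
    """Return the last n lines, or every line if there are fewer than n."""
    return lines[-n:] if n > 0 else []


def previous_boot_kernel_log():
    """Fetch the kernel messages of the previous boot from the systemd journal.

    Only works if journald keeps a persistent journal (/var/log/journal).
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager", "-o", "short-monotonic"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main():
    if shutil.which("journalctl") is None:
        print("journalctl not found; this sketch assumes systemd-journald")
        return
    try:
        lines = previous_boot_kernel_log()
    except subprocess.CalledProcessError as err:
        print(f"journalctl failed: {err.stderr.strip()}")
        return
    print("--- last 50 kernel lines of the previous boot ---")
    print("\n".join(tail(lines, 50)))
    print("--- GPU-related lines from that boot ---")
    print("\n".join(filter_gpu_lines(lines)))


if __name__ == "__main__":
    main()
```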

We’ve been seeing success on newer kernels using `i915.enable_psr=0`.

It may be worth trying this while removing your other parameters for the test. If it’s still a no-go, you may need to try an earlier kernel.

I am not using any mitigations and have not had a hard freeze since December 26th; before that, it was weekly. Keep updating and stay as current as possible. I am on Fedora 37 with GNOME on the default Wayland session, on an i7-1260P.

I have tried psr=0 and enable_dc=0, and I have fiddled with fbc and fastboot; nothing seems to work.
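
When a parameter seems to have no effect, it may be worth confirming that the driver actually loaded with it. Loaded module parameters are exposed under /sys/module/<module>/parameters/, so this sketch reads the live i915 values. The parameter list is an assumption based on the options mentioned here; the files are usually readable only by root, and newer kernels may rename or drop some of them. If a value does not match what was configured, check that the file in /etc/modprobe.d ends in `.conf` (modprobe only reads files with that suffix) and, if i915 is loaded from the initramfs, that the image or UKI was rebuilt afterwards.

```python
from pathlib import Path

# i915 parameters discussed in this thread; not every kernel version exposes all of them.
PARAMS = ("enable_psr", "enable_dc", "enable_fbc", "enable_guc", "fastboot")
I915_PARAMS_DIR = Path("/sys/module/i915/parameters")


def read_params(names=PARAMS, base=I915_PARAMS_DIR):
    """Map each parameter to its live value, None if absent, or a note if unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = (base / name).read_text().strip()
        except FileNotFoundError:
            values[name] = None
        except PermissionError:
            values[name] = "permission denied (run as root)"
    return values


def describe(values):
    """Format the parameter map as 'name = value' lines."""
    return [f"{name} = {'not present' if value is None else value}"
            for name, value in values.items()]


if __name__ == "__main__":
    print("\n".join(describe(read_params())))
```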

My personal feeling is that there is a regression either in the Intel driver or in Mesa. Like nadb, I had a period without any errors, but since I am on Arch I am always that much closer to the sun, so I am likely seeing issues surface much sooner than others.

Having looked through the kernel commit logs, I am pretty sure that something changed in either linux-firmware or Mesa. The last two or three Mesa updates have been the most problematic, making the hard freezes occur more often. But of course, reverting a Mesa update without breaking Xorg or Wayland is an absolute nightmare.

Except Fedora updates almost as quickly as Arch, and in some cases more quickly. Arch ain’t quite as bleeding edge as it once was. Most distros also don’t ship a pure mainline kernel, so something baked into the Arch kernels (or something left out) may be causing it. There are also more pieces in play than just the kernel and drivers. I think the desktop environment's compositor is to blame in some instances, along with some genuine application-level issues, at least on Wayland, especially if the application uses hardware acceleration and something is wrong with its config. For reference, I am on kernel 6.1.10.

So, I am seeing this on 6.1.11, 6.2-rc8, Linux-Clear, and the Arch LTS kernel. That is why, at this point, I am pretty sure the issue is not in the kernel: I have tested too many different kernels for it to be kernel-specific. I deliberately used Linux-Clear and mainline as controls to check whether it was limited to the Arch kernel.

My Mesa version is 22.3.4-1. I tried reverting to an earlier Mesa version but ended up breaking the links between Mesa and X11/Wayland. That said, while in this broken state I had zero hangs…

@Maddison, thanks for testing this out. Please send us some kernel logs from just before a freeze so we can investigate in this direction. Thanks!

@Maddison, I am on Mesa 22.3.4-1 as well with no problems. Are you on Xorg or Wayland?

I am having issues on both! I use i3 and Sway. I wonder if I should remove all the options in my /etc/modprobe.d/i915.config and rebuild the UKIs.

@Anachron, there is quite literally nothing in the logs. The last entries I see are x86/split lock detection warnings from Steam.

So, after some more experiments, including moving mostly to Wayland, I can report more on this.

The following error keeps cropping up in my logs:

00:08:04.838 [ERROR] [wlr] [backend/drm/atomic.c:72] connector eDP-1: Atomic commit failed: Device or resource busy.

These errors seem to become unrecoverable, especially at higher CPU frequencies. About the only reliable way I have found to make them less frequent is to run the machine with capped CPU frequencies under the powersave governor.
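
A frequency cap like this can be set through the standard cpufreq sysfs interface. The usual CLI route is `cpupower frequency-set -g powersave -u <max>`; the Python sketch below does the same thing by writing each CPU's `scaling_governor` and `scaling_max_freq` files (values in kHz, clamped to the hardware range). Run without arguments, it only prints the current settings; with a kHz value as its first argument, it applies the cap, which requires root. The script name `cap_cpufreq.py` and its argument handling are hypothetical, and the cap does not survive a reboot.

```python
import sys
from pathlib import Path

CPU_BASE = Path("/sys/devices/system/cpu")


def cpufreq_dirs(base=CPU_BASE):
    """Return the cpufreq directory of every CPU that exposes one, in CPU order."""
    dirs = [p / "cpufreq" for p in base.glob("cpu[0-9]*") if (p / "cpufreq").is_dir()]
    return sorted(dirs, key=lambda d: int(d.parent.name[3:]))


def clamp_khz(requested, lo, hi):
    """Keep a requested maximum frequency (kHz) inside the hardware limits."""
    return max(lo, min(requested, hi))


def read_int(path):
    """Read a sysfs file holding a single integer."""
    return int(path.read_text().strip())


def show(dirs):
    """Print the governor and frequency cap of each CPU."""
    for d in dirs:
        governor = (d / "scaling_governor").read_text().strip()
        cap = read_int(d / "scaling_max_freq")
        print(f"{d.parent.name}: governor={governor} max={cap // 1000} MHz")


def apply_limit(dirs, max_khz, governor="powersave"):
    """Set the governor and cap scaling_max_freq on each CPU (needs root)."""
    for d in dirs:
        lo = read_int(d / "cpuinfo_min_freq")
        hi = read_int(d / "cpuinfo_max_freq")
        (d / "scaling_governor").write_text(governor)
        (d / "scaling_max_freq").write_text(str(clamp_khz(max_khz, lo, hi)))


def main(argv):
    dirs = cpufreq_dirs()
    if not dirs:
        print("no cpufreq interface found", file=sys.stderr)
        return
    if len(argv) > 1:
        apply_limit(dirs, int(argv[1]))
    show(dirs)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `sudo python3 cap_cpufreq.py 2000000` would cap every core at 2 GHz under the powersave governor.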

Okay, now this is interesting. For the sake of troubleshooting, did this occur when a display was connected to the module, or simply with the module installed?

eDP-1 should be the internal monitor…

I have refined things further: I removed a bunch of applications that reset the screen and stopped running things under XWayland, and I seem to have gained a major increase in stability.

Ah, you’re correct. I missed this.

Ah, this is entirely possible.

So, all things considered, running Steam seems to be a pretty big risk factor. XWayland applications also seem prone to causing issues, though I am not sure which ones yet. Running every application I can in native Wayland mode seems to be the best way to stay stable.

My laptop froze while I was writing this. It was running Steam, but 35 seconds before the freeze the clight daemon had reported a compositor error, so I uninstalled clight to see whether that helps.
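
To see everything logged in the last seconds before a freeze (such as that clight error 35 seconds earlier), something like the following sketch could help. It assumes the previous boot's journal was first saved with `journalctl -b -1 -o short-monotonic --no-pager > prev-boot.log`, whose lines start with a `[seconds.microseconds]` monotonic timestamp. The script name `last_seconds.py` is hypothetical. It keeps only the timestamped lines that fall within the given window before the latest entry.

```python
import re
import sys

# journalctl -o short-monotonic lines start like "[ 1234.567890] host unit[pid]: message".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def window_before_end(lines, seconds):
    """Return the timestamped lines logged at most `seconds` before the latest one."""
    stamped = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamped.append((float(match.group(1)), line))
    if not stamped:
        return []
    end = max(ts for ts, _ in stamped)
    return [line for ts, line in stamped if end - ts <= seconds]


def main(argv):
    if len(argv) < 2:
        print("usage: last_seconds.py LOGFILE [SECONDS]")
        return
    seconds = float(argv[2]) if len(argv) > 2 else 60.0
    with open(argv[1], encoding="utf-8", errors="replace") as fh:
        lines = fh.read().splitlines()
    print("\n".join(window_before_end(lines, seconds)))


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python3 last_seconds.py prev-boot.log 40` would show the final 40 seconds of the saved boot, which should include any compositor or clight messages that reached the journal.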
