[TRACKING] Fedora Freezes and Flickering on newer Kernels

@nadb Congrats! Would you mind checking if the nightlight flickering still happens directly after boot though?

For me on 6.0.13, turning night light on and off some time after boot doesn’t cause the flickering, but toggling it within the first few minutes after boot still triggers it.

The only kernel parameters on my side are the light sensor thing and the nvme.noacpi=1 from the Linux battery tuning thread.
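
In case it helps anyone following along, the usual Fedora way to add parameters like that is with grubby; this is just a sketch using the one parameter mentioned above:

# append a kernel parameter to every installed kernel entry
sudo grubby --update-kernel=ALL --args="nvme.noacpi=1"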

Fantastic!

You’re right, same here. Right after boot it still does it. Regardless, it’s an improvement; it was doing it before no matter when I tried it. Still not sure what is causing it or even what to look for in the logs.

To be ultimately clear (for my sake), is this flickering happening with the night light feature exclusively, or in general use as well?

For me, when turning on the night light feature, it does its thing and then there is odd flickering, almost like a frame drop. Then it settles down, and then it acts up again for about a minute after you turn it off.

Fedora 37 just got an update to kernel 6.0.14 and an intel-gpu-firmware update. The night light issue is now gone immediately following boot as well. I am also seeing a lot fewer gnome-shell errors in journalctl (referring to a variety of previously spammed messages). Will keep monitoring and see if this continues.
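
For reference, this is roughly how I have been eyeballing those messages; add a priority filter like -p warning to narrow the output if it’s noisy:

# gnome-shell messages from the current boot
journalctl -b --identifier=gnome-shell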

Since your experience has matched mine exactly (same config), I’m borrowing a link to this for another thread.

@Matt_Hartley go for it. The only difference I see based on your /etc/default/grub is that I am using LUKS encryption, which is reflected in mine. I have also seen even greater stability after the 3.06 beta BIOS update. Really, if you are buying a processor that is less than one year old and plan on running Linux, you have to be willing to live on the bleeding (or not-so-bleeding) edge :slight_smile:
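
For context, the LUKS difference just means my GRUB_CMDLINE_LINUX line carries an rd.luks.uuid entry alongside everything else. A hypothetical sketch (the UUID and the rest of the parameter list are placeholders, not my actual config):

GRUB_CMDLINE_LINUX="rhgb quiet rd.luks.uuid=luks-<your-uuid> nvme.noacpi=1"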

I agree with every single word said here. All of it. :slight_smile:

Yeah, I have LUKS on my Fedora 37 install on the (11th Gen) Framework lappy.

Have a great holiday weekend!

@Matt_Hartley enjoy the holiday break, new year projects are around the corner.

So I finally experienced a hard freeze last night, after almost three weeks without one. So what changed? Well, about a week ago I removed i915.enable_guc=3 from my kernel parameters. I am currently on 6.0.15-300.fc37.x86_64 and I don’t think this was a regression; I have been reading the changelogs with every release. I am enabling this kernel parameter again to see how long it takes until the next hard freeze, and whether this is a contributing factor.

Logs were inconclusive as to a cause, just the usual bluetooth and gnome-settings messages after a screen lock. Yes, I have narrowed it down to screen lock and not suspend. Apparently this was also the source of my losing the external displays attached to my dock, with no discernible pattern. I now use suspend instead of screen lock and so far the displays are working as they should.

I realize GuC is enabled by default for Alder Lake processors but HuC is not. Since HuC deals with CBR rate control on SKL low-power encoding mode, and a web browser is a common theme with the freezing issues (I have Firefox open in the background with every freeze), I am going to see how this plays out now. I had two weeks of no freezes with it on previously. I am going to leave it in place for a longer period of time and then remove it again, and see if I get my weekly visitor.
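
For anyone who wants to check what their kernel is actually doing with GuC/HuC, something along these lines should work (the sysfs read may need root, and the parameter is a bitmask where 1 = GuC submission and 2 = HuC loading, so 3 means both):

# current value of the i915.enable_guc module parameter
sudo cat /sys/module/i915/parameters/enable_guc
# what firmware the driver reported loading at boot
sudo dmesg | grep -iE "guc|huc"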

Some additional observations regarding suspend instead of screen lock: I no longer lose all my Bluetooth devices; they remain connected instead. I really need to dig into how GNOME screen lock works.

Okay, now this is interesting and something I can work on testing. I will share this with my Fedora contact as well.

Ah, there we are. Dock and external displays…we used to see a LOT of this at my previous job. So this seems to happen at screen locking?

The screen lock would not always disconnect the displays, but after a prolonged screen lock it was very likely. Additionally, what was interesting to me was that the ethernet connection did not get dropped; however, Bluetooth devices would.

I disabled hibernation and hybrid-sleep in /etc/systemd/sleep.conf and modified /etc/UPower/UPower.conf to PowerOff on critical battery, just to make sure the system was not trying to hibernate or anything else of the sort while screen locked. These modifications had no effect on the undesirable behavior. I then removed the Super+L shortcut, reassigned it to systemctl suspend, and removed the automatic screen locking behavior in settings.
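
Roughly, those edits amount to the following; these are the stock systemd and UPower key names, so double-check them against the comments in your own copies of the files:

# /etc/systemd/sleep.conf
[Sleep]
AllowHibernation=no
AllowHybridSleep=no

# /etc/UPower/UPower.conf
[UPower]
CriticalPowerAction=PowerOff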

This had the desired effect. I am also hopeful it will further reduce the freezes, as every freeze I have had has come after a recent screen lock. I just figured screen lock would turn into a suspend given sufficient time; guess I was wrong. Regardless, there is definitely something wrong with screen lock in Wayland. I have now used suspend to “lock” my screen at least a dozen times, and the external displays come back up every time. With screen lock I would have had to remove the Thunderbolt cable at least four times by now to get them back. This is something I was even seeing on my old ThinkPad T480s, so I don’t think it is hardware specific.

So for you at least, this appears to have been the cause, and the workaround the resolution? I agree, there is definitely something not right with screen lock in Wayland with external displays attached. I have not experienced any issues with just the laptop display itself.

Neither have I. Come to think of it, all of my freezes have occurred while docked as well. None have occurred with just the laptop on its own, charging directly from a charger.

Docks, a wonderful asset until they’re not. :confused:

While this is not amazing, it’s likely giving us a general direction in which to keep poking.

This may be a red herring. I’m somewhat confident that I’ve seen freezes with no dock/external monitor attached.

In general, it seems that the circumstances under which this occurs are rather hard to determine. We’ve seen a lot of “I did x and now the bug doesn’t appear anymore”, which then later turned out to be wrong. (It may very well be the case that certain circumstances make freezing more likely to occur – but I don’t think this will take us further toward the root cause.)

So I did some more digging today. 1) GuC and HuC are enabled by default on Fedora 37, so no need for the kernel parameters. 2) The following error:

i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}

appears to be the culprit; the dates match my memory of the hard freezes. 3) I am also experiencing non-fatal errors in intel_ddi_sync_state, which may be related.
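
If anyone wants to check their own logs for the same pattern, something along these lines should do it (the --grep expression is just my guess at a useful filter):

# kernel messages from the previous boot, filtered for the GPU reset chatter
journalctl -k -b -1 --grep="heartbeat|rcs0"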

Additional troubleshooting revealed that multiple monitors may contribute to the crashing. Logs are clean when the laptop is not attached to the dock. Once it is attached, the number of errors in the logs reaches spam levels. Even with no extensions, entering the Activities overview creates a bunch of errors. I am going to do additional testing here with USB-C connections and an active DisplayLink adapter to see if that helps.

Another contributing factor may be hardware acceleration in browsers; however, I believe this should be resolved once the new intel-media-driver hits RPM Fusion. It is already available on Arch Linux, so I expect this will drop within the next week or so.
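
A quick way to see whether hardware video acceleration is actually being picked up is vainfo from the libva-utils package; the reported driver and profiles will of course differ depending on which media driver is installed:

# lists the VA-API driver in use and the supported codec profiles
vainfo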

@Matt_Hartley Sorry, I didn’t see that you had replied to me until yesterday!

I actually just experienced the issue again a few minutes ago, so I looked at the journal and found this from just before I put my computer to sleep, since that seems to unfreeze it:

Jan 06 08:05:18 framestancies plasmashell[6071]: libva info: va_openDriver() returns 0
Jan 06 08:05:18 framestancies plasmashell[6071]: ATTENTION: default value of option mesa_glthread overridden by environment.
Jan 06 08:05:18 framestancies systemd-timesyncd[1288]: Timed out waiting for reply from 220.158.215.21:123 (1.opensuse.pool.ntp.org).
Jan 06 08:05:28 framestancies systemd-timesyncd[1288]: Timed out waiting for reply from 51.38.105.7:123 (1.opensuse.pool.ntp.org).
Jan 06 08:05:32 framestancies systemd-logind[1348]: Power key pressed short.
Jan 06 08:05:32 framestancies dbus-daemon[1920]: [session uid=1000 pid=1920] Activating service name='org.kde.LogoutPrompt' requested by ':1.13' (uid=1000 pid=2068 comm="/usr/bin/ksmserver")
Jan 06 08:05:32 framestancies dbus-daemon[1920]: [session uid=1000 pid=1920] Successfully activated service 'org.kde.LogoutPrompt'
Jan 06 08:05:39 framestancies systemd-timesyncd[1288]: Timed out waiting for reply from 162.159.200.1:123 (2.opensuse.pool.ntp.org).
Jan 06 08:05:48 framestancies kernel: usb 3-7: USB disconnect, device number 3
Jan 06 08:05:48 framestancies kernel: usb 3-7: new full-speed USB device number 6 using xhci_hcd
Jan 06 08:05:48 framestancies kernel: usb 3-7: device descriptor read/64, error -71
Jan 06 08:05:49 framestancies kernel: usb 3-7: device descriptor read/64, error -71
Jan 06 08:05:49 framestancies kernel: usb 3-7: new full-speed USB device number 7 using xhci_hcd
Jan 06 08:05:49 framestancies systemd-timesyncd[1288]: Timed out waiting for reply from 204.2.134.162:123 (2.opensuse.pool.ntp.org).
Jan 06 08:05:49 framestancies kernel: usb 3-7: device descriptor read/64, error -71
Jan 06 08:05:49 framestancies systemd-timesyncd[1288]: Contacted time server 17.253.2.123:123 (2.opensuse.pool.ntp.org).
Jan 06 08:05:49 framestancies systemd-timesyncd[1288]: Initial clock synchronization to Fri 2023-01-06 08:05:49.931245 CST.
Jan 06 08:05:50 framestancies kernel: usb usb3-port7: attempt power cycle
Jan 06 08:05:50 framestancies systemd-logind[1348]: Lid closed.
Jan 06 08:05:50 framestancies systemd-logind[1348]: The system will suspend now!

Looking further back through the journal to yesterday, at a time when it also happened, I found this common thread:

Jan 04 10:33:46 framestancies plasmashell[22493]: ATTENTION: default value of option mesa_glthread overridden by environment.

However, I’m not entirely sure this is helpful, as that message is extremely common throughout my log (appearing about 2000 times in the last month).
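
That rough count came from something like the command below; the one-month window is just what I happened to use, and journalctl’s --since parsing may want a slightly different time spec on other setups:

journalctl --since "1 month ago" | grep -c mesa_glthread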

I also did see your message about the best course of action perhaps being just to wait for an update, and that Red Hat may be working on it. But hopefully this late response is at least somewhat helpful in tracking this down. Maybe there’s something else in the log I posted that could be a clue.