Help: Severe graphics instability on FW13 i5-1340P

Over the past month or so, I have been having increasingly severe graphical glitching, freeze-ups, and crashes. Until this started, I had a perfectly working graphics setup that let me play most things in Steam, Wine, Linux-native games, etc. just fine on either the iGPU or my NVIDIA eGPU.

Now the Steam UI will not even launch reliably, let alone get into games under any version of Proton. The Lutris UI launches, but any “3D” game I try crashes shortly after launch or on level load. “2D” games (e.g. Freeciv, Endless Sky) seem to be okay.

The crash occurs both with the iGPU in use (eGPU not connected) and with the NVIDIA 1050 Ti eGPU connected and verified, via nvidia-smi, to be actively rendering the DE, Steam, and games. It has progressed to affecting other programs as well (Vivaldi browser, Orca Slicer, etc.). The DE sometimes recovers from the freeze, but usually the glitching is followed by a soft crash back to a TTY or a complete hard freeze of whatever DE/OS I am trying. I have seen the graphics issues in each of these environments:

  • Hyprland (0.42-0.45), Wayland-native
  • GNOME Shell 47.1, both Wayland and X11 sessions
  • KDE Plasma (6.1 to 6.2), both Wayland and X11 sessions
  • Windows 11 24H2

(The session type in use, Wayland or X11, was verified via loginctl show-session <ID>.)
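For anyone repeating that check, here is a minimal Python sketch of it. It assumes that loginctl show-session <ID> -p Type prints a single Type=<value> line (e.g. Type=wayland or Type=x11); the helper names are hypothetical, and the session ID still has to be looked up with loginctl list-sessions.

```python
import subprocess


def parse_session_type(output: str) -> str:
    """Extract the value of the Type= property from loginctl output."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Type":
            return value.strip()
    raise ValueError("no Type= property found in loginctl output")


def session_type(session_id: str) -> str:
    """Ask loginctl for one session's Type property (needs systemd-logind)."""
    result = subprocess.run(
        ["loginctl", "show-session", session_id, "-p", "Type"],
        capture_output=True,
        text=True,
        check=True,  # raise if loginctl exits non-zero (e.g. unknown session)
    )
    return parse_session_type(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to check against pasted output without a live session.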

The glitch looks either like random colored pixels sprinkled across the screen or like blocks overlaid on it (think of a damaged AVI video file). See the example pictures at the end of this post.

I am not seeing anything actionable in dmesg or journalctl on any system, but here are some relevant crash outputs anyway:

Which Linux distro are you using?

The symptom appears in each of the following OSes when installed on bare metal (not in a virtual machine):

  • fully updated Arch Linux x86_64,
  • freshly installed, fully updated Fedora 41 (iGPU only; no NVIDIA drivers loaded, no eGPU connected), and
  • freshly installed, fully updated Windows 11 24H2 with the Framework-supplied 13th-gen driver pack for the iGPU (no NVIDIA drivers loaded, no eGPU connected)

Which kernel are you using?

Symptoms appear in all of:

  • linux (I first noticed the issue around 6.11.2; it continues with 6.11.6)
  • linux-lts (circa 6.6.58 to 6.6.60)
  • linux-zen (circa 6.11.6)
  • linux-hardened (6.10.? to 6.11.7)
  • the Windows 11 kernel that ships with 24H2 (build number unknown)

So I really doubt this is kernel-related at all. OpenGL, Mesa/Vulkan, Intel’s proprietary drivers, etc. are certainly suspects, but probably not the kernel.

Which BIOS version are you using?

From lshw:

          description: BIOS
          vendor: INSYDE Corp.
          physical id: 0
          version: 03.05
          date: 06/04/2024
          size: 128KiB
          capacity: 16MiB

Which Framework Laptop 13 model are you using?

An FW13 i5-1340P is my daily driver, with a single 32 GB Lexar RAM module that passes every memtest run with no errors. Crashes occur with and without AC power, with and without expansion cards installed, etc.

The graphics problem does NOT happen if I move my SSD into an 11th-gen i5 mainboard (running BIOS 3.19, I think). Graphics problems also do not seem to occur in Windows 11 24H2 run via QEMU or a Docker container, though of course everything runs unusably slowly.

I am usually pretty good at troubleshooting Linux issues, but this one stubbornly refuses to get any better despite rolling tons of Arch packages forward and backward, fresh-installing Framework’s officially supported OSes, etc. Pulling out the hair I do not have over here…

First, is this affecting anyone else with an Intel 13th-gen mainboard?

If anyone has any ideas for troubleshooting that I missed, please let me know.

Otherwise I am going to have to pin it on failing hardware and try to RMA this mainboard, which I would really prefer not to do unless I actually have to.

Thanks for reading.

Hi,

Some people have reported problems with Bluetooth causing general instability. Try with Bluetooth disabled.

Thanks for the suggestion. Unfortunately, there was no improvement after running:

systemctl stop bluetooth.service
systemctl disable bluetooth.service

(and systemctl status bluetooth.service to confirm it was in fact unloaded)

Have you tried kernel 6.11.7?
It has a corruption fix that might help.

commit def7d40da0333e6f3afad12c2333cceac1d368b6
Author: Frank Min <Frank.Min@amd.com>
Date: Thu Oct 10 16:41:32 2024 +0800

drm/amdgpu: fix random data corruption for sdma 7

Just tried linux-hardened 6.11.7. No change with that either.

The crash appears to be caused by “Hyprland”.
I don’t have Hyprland running on my Linux system, so I don’t know what it is.
Maybe see if you can avoid Hyprland somehow; perhaps try Xorg mode instead of Wayland, just to see if you can narrow down the problem a bit.

As I said in the OP, the problem persists in Gnome and KDE, on X11, and even in Windows.

OK. Please can you post the logs of a failure when running Xorg, i.e. something simpler than Hyprland?
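One way to collect those logs after a freeze is to reboot and read the previous boot’s kernel messages with journalctl -k -b -1, then filter them. A minimal sketch, assuming the interesting lines mention i915, xe, or GPU HANG; those patterns are a guess rather than an exhaustive list, the helper names are hypothetical, and reading a previous boot requires a persistent journal.

```python
import re
import subprocess

# Assumed markers for Intel iGPU trouble; adjust for your hardware.
GPU_PATTERN = re.compile(r"\bi915\b|\bxe\b|GPU HANG", re.IGNORECASE)


def gpu_lines(log_text: str) -> list[str]:
    """Return only the log lines that match the GPU-related patterns."""
    return [line for line in log_text.splitlines() if GPU_PATTERN.search(line)]


def previous_boot_kernel_log() -> str:
    """Read kernel messages from the previous boot (needs a persistent journal)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage would be something like print("\n".join(gpu_lines(previous_boot_kernel_log()))) right after rebooting from a hard freeze.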

Framework Support replied (clearly without reading the OP of this thread, which I linked in my initial contact with them, yay)… They asked me to try an Ubuntu 24.04 live USB, which I’m exploring right now (24.04.1, technically).

It boots to GNOME Shell on X11 by default, and the graphics glitches start right at DE load. However, it does seem to recover from the GPU hang events better, and I get minutes instead of seconds before 3D programs crash.

See screenshots

Hi. I think there are a few threads on here that talk about graphics corruption with more recent kernels and Mesa.
My guess is that it could be:

  1. RAM
  2. Kernel driver bug
  3. Mesa bug
  4. GPU firmware

I think you have discounted (1) RAM, so since it says “GPU hang”, it’s most likely (4) GPU firmware, perhaps with a recent change in Mesa or the kernel triggering the bug in the firmware.

So I guess the only thing to do is try older kernels, Mesa, and GPU firmware until you find a stable combination, then move forward until it breaks again, to narrow down the problem a bit.
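Finding a stable version and moving forward until it breaks is essentially a bisection. A minimal sketch, assuming the versions are ordered oldest to newest, the oldest is known good, the newest is known bad, and a hypothetical is_bad callback stands in for installing that version and running a test game:

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first version where is_bad() is True.

    Assumes versions[0] is good, versions[-1] is bad, and the regression
    stays present in every later version (a single good-to-bad transition).
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return versions[bad]
```

Since every probe here means a reinstall plus a gaming session, halving the range each time (about log2(n) tests instead of n) matters. The result is only attributable if one component (kernel, Mesa, or firmware) is varied at a time.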

I think AMD GPUs get round this sort of problem in many, but not all, cases by resetting themselves very quickly, so the user hardly notices.
I don’t know what Intel do in these situations.

Small update:

I did a full install of Ubuntu 24.04.1 to a spare hard drive, taking care not to update any packages after the install unless I absolutely had to. GPU freeze/thaw cycles were very apparent, maybe two a minute, but this older GNOME Shell seems to recover from them a bit better. I got through most of a 50-turn Civ VI scenario (maybe an hour?) before it finally froze and couldn’t recover. I’ll update this post with the exact “working” versions of all the relevant pieces of the iGPU stack next time I boot into that drive.

So yes, the symptoms are definitely software-version dependent. But since no one else is chiming in to say this affects them too, a defect in my specific mainboard remains possible as well…

Updates:

  1. Support asked me to do two different kinds of mainboard reset:
    1a. With the computer off, cycle the chassis intrusion switch 10 times (hold about 2 seconds, then release). Results: afterwards, my 3D printer slicer of choice (Orca Slicer) started working again. Steam games went from crashing within seconds to lasting roughly 3-10 minutes before locking up.
    1b. Full power-down: remove the RAM, SSD, and battery, hold the SW1 switch for at least 20 seconds, then leave the mainboard unpowered for 15 minutes. Results: no improvement.

Just for troubleshooting purposes, I ran a full software update on that Ubuntu 24.04.1 LTS install on the external hard drive. The update ruined what little stability I had: games were back to crashing within a minute or so. So I did a fresh reinstall of Ubuntu on that drive, this time with no internet connection, no third-party drivers, and definitely no updates. Results: Steam games still crash in under a minute.

For now, I am using NVIDIA GeForce NOW as a workaround for games, just so I do not have to keep bailing on gaming sessions with my cousin. But this is far from a real fix.

Usually I can separate a software problem from a hardware one pretty quickly, but this one is turning into a huge mess. The fact that stability is near zero in Windows, Ubuntu LTS, etc. points to a hardware issue, in my mind.
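To make that hardware-versus-software reasoning explicit, the runs reported in this thread can be tabulated and checked for a factor present in every failing run and absent from every passing one. A minimal sketch: the observations are transcribed from this thread (the VM runs are left out because they don’t drive the iGPU directly), and the helper name is hypothetical. It only finds the separating factor; it cannot tell a defective board apart from a driver or firmware bug specific to the 13th-gen iGPU.

```python
# Each run: (factors, glitch_seen). Transcribed from the reports in this thread.
Observation = tuple[dict[str, str], bool]

RUNS: list[Observation] = [
    ({"board": "13th gen", "os": "Arch"}, True),
    ({"board": "13th gen", "os": "Fedora 41"}, True),
    ({"board": "13th gen", "os": "Windows 11 24H2"}, True),
    ({"board": "13th gen", "os": "Ubuntu 24.04.1"}, True),
    ({"board": "11th gen", "os": "Arch"}, False),  # same SSD moved to the 11th-gen board
]


def separating_factors(runs: list[Observation]) -> set[tuple[str, str]]:
    """Return (factor, value) pairs shared by all failing runs and absent from all passing runs."""
    failing = [set(f.items()) for f, bad in runs if bad]
    passing = [set(f.items()) for f, bad in runs if not bad]
    if not failing:
        return set()
    common = set.intersection(*failing)
    for p in passing:
        common -= p  # drop anything that also appears in a good run
    return common
```

With the runs above, the only factor that survives is the 13th-gen board, which matches the suspicion in this post but, as the Ubuntu update and 6.12.1 results show, does not rule out a software component that only misbehaves on that board.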

Kernel 6.12.1 has been released in Arch’s mainline linux package. Graphics stability improved significantly in this morning’s testing on Hyprland: no visible GPU freeze/thaw cycles, and nothing in dmesg or journalctl.

However, I am still seeing some stuttering, with up to 1-2 seconds of lag on terminal input.