[RESPONDED] Blocky artifacts on AMD Framework laptop 13

Try

amdgpu.graphics_sg=0 amdgpu.sg_display=0

Can you please try downgrading the GPU firmware from the January release to the December one and see if this goes away? Don’t forget to rebuild the initrd if the firmware is included in it.

Tried this, sadly does not work. ):

What do you mean by downgrading the GPU firmware?

There is a file, dcn_3_1_4_dmcub.bin, stored in /lib/firmware/amdgpu.

I suspect an update released in January is causing this problem. If you can revert to the version before it, that could confirm it.

See if the version before January’s helps.

Hey, I’ve got the exact same issue. Framework 13 AMD, 64 GB of RAM, with Arch and i3.

How would I downgrade the driver? Is it just a case of replacing that file with an older version then rebooting?

Download this version

And replace it in your filesystem and rebuild the initrd and reboot.
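
For anyone who wants to confirm that the swap actually took effect, here is a minimal sketch that prints a SHA-256 checksum for each DMCUB firmware file, so you can run it before and after replacing the file and compare against the download. The directory is the one mentioned above; the function names are my own, and the glob is deliberately loose because some distros ship the files compressed (an assumption on my part). It only checks the copy on disk, not whatever is packed into the initrd.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dmcub_checksums(firmware_dir="/lib/firmware/amdgpu"):
    """Map each *dmcub* file name in firmware_dir to its checksum."""
    directory = Path(firmware_dir)
    return {
        p.name: sha256_of(p)
        for p in sorted(directory.glob("*dmcub*"))
        if p.is_file()
    }


if __name__ == "__main__":
    for name, digest in dmcub_checksums().items():
        print(digest, name)
```

If the checksum of the file on disk matches the older download after the swap, the replacement worked and only the initrd rebuild and reboot remain.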

I faced this on 6.7.0 when I got my laptop last week but haven’t faced this after downgrading to linux-lts which is 6.6.x.

I also use GNOME on Wayland + Arch.

Would you mind bisecting it?
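
In case bisecting is unfamiliar: it is a binary search over the commit history between a known-good and a known-bad kernel, which is what git bisect automates. The sketch below only illustrates the idea on a made-up history; the commit list and the check function are hypothetical stand-ins for "build this commit, boot it, and see whether the corruption shows up".

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: 1000 commits, regression introduced at index 637.
    history = list(range(1000))
    tested = []

    def check(commit):
        tested.append(commit)
        return commit >= 637

    print(first_bad(history, check), "found after", len(tested), "tests")
```

The point is that the number of kernels to build and test grows with the logarithm of the number of commits, so even a full release cycle is a manageable number of steps.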

Sure, I will spend some time tomorrow on this and get back.

Screen corruption similar to the OP on kernel 6.7.0 (linux-6.7.arch3-1-x86_64.pkg.tar.zst) : https://i.imgur.com/6nBuB0L.jpg

Firmware package: linux-firmware-20240115.9b6d0b08-1-any.pkg.tar.zst
DE: GNOME on Wayland
Distro: Arch

Now I have downgraded to linux-lts which is 6.6.13. Here I expect to experience white screen issues randomly throughout the day.

I will share a picture when I do. I was able to solve the white screen problem by using amdgpu.sg_display=0. I posted my experience here a week ago: [RESPONDED] FW13 AMD 7840U Arch Graphics Output Corruption - #2 by bullza

Is there any way we can fix this issue without disabling scatter/gather?

There are two issues here - the blocky corruption and the white screen.

The blocky corruption seems to be either a linux-firmware regression or a kernel 6.7 regression (unclear which right now).

The white screen issue with scatter/gather doesn’t have an identified root cause yet.

I can’t reproduce either of these myself.

After ~3 hours of use, I now faced the white screen problem on linux-lts v6.6.13 : https://i.imgur.com/bFNGhoc.jpg

I have now added amdgpu.sg_display=0 back again and I expect things to be stable.
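
A quick way to double-check that the parameter really made it onto the running kernel's command line is to read /proc/cmdline after rebooting. This is just a small sketch; the helper names are made up for illustration, and it does not handle quoted values that contain spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Quoted values with spaces are not handled.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def amdgpu_params(text):
    """Return only the amdgpu.* parameters from a command line string."""
    return {
        name: value
        for name, value in parse_cmdline(text).items()
        if name.startswith("amdgpu.")
    }


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        print(amdgpu_params(handle.read()))
```

If amdgpu.sg_display does not show up in the output, the bootloader configuration was probably not regenerated or the wrong boot entry was edited.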

You might want to use an external monitor (via USB-C). I forgot to mention that I use an external monitor as well. Using a web browser with a maximized window makes the white screen problem occur sooner. I have tried both Firefox (manually enabled VA-API in about:config) and Vivaldi (Chromium-based, default settings).

Yeah I’ve tried multiple external monitors connected to a dock as well as VRAM stress workloads running on the system. I can’t reproduce it.

Please let me know if there is something else I can do to help you.

I also read somewhere that using UMA_Game_Optimized in UEFI reduces the probability of this occurring. I have it disabled (i.e., I am using the default setting).

  1. For the first issue (blocky artifacts): if you can bisect it down to a kernel commit, that would be really helpful.
  2. For the second issue (white screen): pay attention to the specific sequence of events that causes it. When it happens, can you try to capture as much state as you can over SSH from another machine? That means /sys/bus/pci/drivers/amdgpu/*/mem_info*, a scan of the GC and SDMA registers using UMR, and a kernel log. Please put these in a bug report on AMD’s Gitlab. Right now I don’t know if it’s a Mesa bug, a platform firmware bug, or a graphics bug. Until AMD can reproduce it, it’s just guesses.
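
For the mem_info part of item 2, a rough sketch like the one below could be run over SSH while the screen is white, with the output pasted into the bug report. It only reads the sysfs files matching the glob given above; the UMR scan and the kernel log still have to be captured separately. The function name is made up for illustration.

```python
import glob
import os


def read_mem_info(pattern="/sys/bus/pci/drivers/amdgpu/*/mem_info*"):
    """Read every file matching pattern and return {path: contents}."""
    snapshot = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as handle:
                snapshot[path] = handle.read().strip()
        except OSError as err:
            # Keep going: a partial snapshot is still useful in a bug report.
            snapshot[path] = f"<unreadable: {err}>"
    return snapshot


if __name__ == "__main__":
    for path, value in read_mem_info().items():
        print(f"{path}: {value}")
```

Taking one snapshot right after boot and another when the problem hits would give something to compare against.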

I will start with 2 as it won’t interfere much with my work hours.

A bit off-topic, I only get the following message when I run: umr --logscan -O bits,follow,empty_log

Cannot open DRI name under debugfs: No such file or directory
ERROR: amdgpu.ko is loaded but /sys/kernel/debug/dri/0/name is not found
[WARNING]: Unknown ASIC [amd15bf] should be added to pci.did to get proper name
sh: line 1: /sys/kernel/debug/tracing/events/amdgpu/amdgpu_mm_wreg/enable: No such file or directory
sh: line 1: /sys/kernel/debug/tracing/events/amdgpu/amdgpu_mm_rreg/enable: No such file or directory
[ERROR]: Could not enable mm tracers

I think I need to enable some debug flags or something.

Sounds like debugfs isn’t enabled in your kernel.
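
One way to narrow that down is to look at /proc/filesystems (is debugfs built into the kernel at all?) and /proc/mounts (is it mounted anywhere?). The sketch below does both; the function names are made up for illustration. Note that debugfs can also be present and mounted but readable only by root, which this does not detect.

```python
def debugfs_supported(proc_filesystems):
    """True if 'debugfs' is listed in the text of /proc/filesystems."""
    return any(
        line.split()[-1] == "debugfs"
        for line in proc_filesystems.splitlines()
        if line.strip()
    )


def debugfs_mountpoints(proc_mounts):
    """Return mount points of type debugfs from the text of /proc/mounts."""
    points = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        # Fields are: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    with open("/proc/filesystems") as handle:
        print("debugfs supported:", debugfs_supported(handle.read()))
    with open("/proc/mounts") as handle:
        print("debugfs mounted at:", debugfs_mountpoints(handle.read()))
```

If it is supported but the mount list comes back empty, mounting debugfs and running umr as root would be the next thing to try.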

Nothing to add here except to point out that everything Mario indicated and requested would be helpful for troubleshooting.