[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

SDDM is Plasma/KDE’s display manager so I guess not.

Thanks! I’m running Fedora Workstation KDE, and the issue I’m describing is on the SDDM login screen. This is on the internal display, and there are no other displays connected. At the time of my original post, 6.5.9 was the latest kernel - I upgraded to 6.5.10 yesterday when it became available and confirmed the issue is still present with it as well.

I’m seeing this problem too, with two displays, when resuming from suspend on Fedora 39 with the AMD 3.03 BIOS. I will try the kernel params mentioned.

I just experienced the screen flickering issue on Fedora 39 (kernel 6.5.10-300, GNOME) when I opened the laptop lid. No external displays were attached before or after closing and opening the lid.
This time I saw many IOMMU callback failures. I still have UMA Game Optimized enabled and have 2x48GB of RAM.

I’m on Fedora 39 GNOME, with energy saving enabled in Settings, BIOS 3.03, 32GB RAM, no NVMe (only the 250GB expansion storage), 2x USB-C, 1x USB-A.
I haven’t tried this fix (amdgpu.sg_display=0) yet. Does anybody know what performance or battery hit it will have?

What I can say:
It happens on the internal display, with GNOME (dark mode), when I lock the device, close the lid, and open it again after a while.
It seems that suspend breaks it, at least for me (I have disk encryption enabled via the Fedora 39 installer, at least for now).
Running sudo systemctl restart gdm fixes the issue and is faster than rebooting, although I think the display once went black afterwards and only a reboot helped. The downside is that all programs are closed and I have to log in again. Even while it flickers, the laptop still responds, so a terminal can be opened.

Probably none; that only changes the way the GPU allocates system memory.

So, in the spirit of testing one variable at a time, I have applied this kernel param (amdgpu.sg_display=0), rebooted, and done nothing else. I have not updated my BIOS (I will when this is all over).

So far that seems to have fixed it. I have not seen the white blocks since. Unfortunately I left my unit unplugged, so I’m currently only at 1 day and 20 hours of uptime. Still, I would have expected to see the white patches already if it were not fixed.

After 2 days of uptime I intend to do some stress testing by changing scales, resolutions, and refresh rates to be really confident that it is resolved. I will report back after that.

I’m also seeing this on Fedora 39 Silverblue, with no external monitor (I don’t own one). I have not updated the BIOS or tried amdgpu.sg_display=0 yet, but I do strongly suspect it has to do with the GPU. It only occurs after I’ve opened a fair number of browser tabs, especially ones with lots of images/videos. And, interestingly, it goes away for a second when I do a three-finger swipe up to view applications (or press Meta). With the three-finger swipe method I can actually swipe up, then down until everything is back to where it should be, and it will remain flicker-free until I lift my fingers off the touchpad. I suspect that at that point the GPU is no longer applying some kind of transition effect.

Being able to have better control over this and, in particular, to choose 16GB or more for the UMA frame buffer would hugely help with running ML tasks on the FW13 AMD. Most of the current libraries and projects (Stable Diffusion, for example) do not support APUs natively and require pre-allocated VRAM. Otherwise they assume the system doesn’t have enough.

I really hope you guys can work this out with AMD. It would potentially turn the FW13 AMD into the best portable choice for ML/AI-related work, rivaling the MacBooks.

The BIOS on the ThinkPad P14s with the AMD Pro 7840U allows setting this to up to 8GB, in addition to “Auto”. Just as an example of another system with almost the same processor (UMA management doesn’t seem to be a “Pro” version feature, as far as I can tell).
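
To see how much memory the current UMA setting actually hands to the GPU, something like the following may help. It is only a sketch: it assumes the amdgpu driver publishes mem_info_vram_total and mem_info_gtt_total (in bytes) under /sys/class/drm, which I believe it does on recent kernels, and bytes_to_mib is just a helper name made up for this example.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (helper name is illustrative only).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Assumption: amdgpu exposes these per-card totals in sysfs.
# Cards without the files (or systems without amdgpu) are skipped.
for dev in /sys/class/drm/card[0-9]/device; do
    [ -r "$dev/mem_info_vram_total" ] || continue
    [ -r "$dev/mem_info_gtt_total" ] || continue
    vram=$(cat "$dev/mem_info_vram_total")
    gtt=$(cat "$dev/mem_info_gtt_total")
    echo "$dev: VRAM $(bytes_to_mib "$vram") MiB, GTT $(bytes_to_mib "$gtt") MiB"
done
```

The VRAM figure should track the UMA frame buffer size chosen in the BIOS, while GTT is system memory the GPU can map on demand.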

Just here to add that I have the exact same issue with my AMD FW13. Video below:
AMD FW 13 display issues

How exactly did you apply this change? I’ve tried running the command:

sudo sysctl -w amdgpu.sg_display=0

What I got in return was “cannot stat /proc/sys/amdgpu/sg_display: No such file or directory”

Any help is appreciated, thank you.
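
For context on the error above: as far as I know, sysctl only manages keys under /proc/sys, while amdgpu.sg_display is a parameter of the amdgpu kernel module, which is why the stat fails. Here is a minimal sketch for reading the value the driver is currently using. It assumes your kernel's amdgpu exposes sg_display under /sys/module, and SYSFS_ROOT is a hypothetical override that exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print the current value of an amdgpu module parameter, if it is exposed.
# SYSFS_ROOT is a made-up override for testing; it defaults to /sys.
amdgpu_param() {
    file="${SYSFS_ROOT:-/sys}/module/amdgpu/parameters/$1"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
        return 1
    fi
}

# Shows the active setting, or "unavailable" if the driver does not expose it.
amdgpu_param sg_display || true
```

As far as I can tell the parameter is read-only at runtime, so changing it means adding it to the kernel command line and rebooting.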

After applying the kernel param (amdgpu.sg_display=0), at an uptime of 3 days, 20 hours and some minutes, I have not seen the flickering white boxes/patches. I’ve tried different resolutions and refresh rates on i3 and sway. Looks fixed to me.

I used grubby.
sudo grubby --update-kernel=ALL --args="amdgpu.sg_display=0"
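
After rebooting, it can be worth confirming that the argument actually reached the running kernel. The following is a rough sketch, not the only way to check: cmdline_value is a helper name invented for this example, and it simply scans /proc/cmdline (or a string passed as a second argument) for a key=value entry.

```shell
#!/bin/sh
# Print the value of a kernel command-line argument such as amdgpu.sg_display.
# Reads /proc/cmdline by default; a string may be passed as $2 instead.
cmdline_value() {
    key="$1"
    cmdline="${2:-}"
    if [ -z "$cmdline" ]; then
        cmdline=$(cat /proc/cmdline 2>/dev/null) || cmdline=""
    fi
    for word in $cmdline; do
        case "$word" in
            "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

if value=$(cmdline_value amdgpu.sg_display); then
    echo "amdgpu.sg_display=$value is active for this boot"
else
    echo "amdgpu.sg_display is not on the kernel command line"
fi
```

To undo the change later, grubby also has a --remove-args option that takes the same "amdgpu.sg_display=0" string.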

Here’s some info on it:

Coming back to this thread to say that, 2 days after running this command, it did help a lot. It did not eliminate the issue completely, though. It occurs again whenever I adjust the display scale using GNOME’s experimental fractional scaling feature. So it seems like this fix is only a partial workaround for now. Whatever the root issue is, I hope that it can get resolved as soon as possible.

For me, since the last update (via GNOME Software, including a restart), the white boxes have not appeared again (before, they only appeared when waking the laptop after it had been closed). Now I have the problem that sometimes I just seem to get logged out: I end up at the login screen, and when I log in every window is closed. (I have not used amdgpu.sg_display=0 so far.)
This issue makes it even more unreliable. With the white boxes I could at least see a bit and maybe save something, instead of getting kicked out at some point. Restarting didn’t help; I even searched for testing drivers (LVFS testing), but found nothing new. It happens at least when running Brave in Wayland mode in GNOME (display scaled to 150%). I’m also running a few GNOME extensions and am not sure whether they break something.

If I were to bet, it would be on gnome-shell crashing on either suspend or resume. Do any interesting entries show up in abrt (aka Problem reporting)?

Also try journalctl --since '10 minutes ago' after you log in again (adjust the time to a point just before suspend); something about the crash should show up there too.

Edit: If you had a stack trace from either of these places, it would probably be extra useful in troubleshooting this.
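
The log check suggested above could be narrowed down roughly like this. It is a sketch under assumptions: the pattern list (gnome-shell, amdgpu, iommu) is simply my guess at the relevant components from this thread, filter_graphics_lines is a made-up helper name, and coredumpctl is only useful if systemd-coredump is capturing crashes on your install.

```shell
#!/bin/sh
# Keep only log lines that mention the components discussed in this thread.
filter_graphics_lines() {
    grep -E -i 'gnome-shell|amdgpu|iommu' || true
}

if command -v journalctl >/dev/null 2>&1; then
    # Current boot only; adjust --since to a point just before the suspend.
    journalctl -b --since '10 minutes ago' --no-pager | filter_graphics_lines
fi

if command -v coredumpctl >/dev/null 2>&1; then
    # Lists recorded gnome-shell crashes, if any were captured.
    coredumpctl list gnome-shell --no-pager || true
fi
```

If a crash is listed, coredumpctl info followed by its PID should print more detail, including a stack trace when one is available.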

Forgive me if this is off-topic, but this isn’t limited to Fedora 39. This seems like it’s an upstream AMDGPU bug. Is there some place where we can collect more generic details to help upstream?

It seems like some people are more likely to hit it than others. Could a machine spec detail play a role as well (timing, amount of RAM, etc.)?

I’ve had this happen to me on Ubuntu 23.10 with both the 6.5 and 6.6 (mainline) kernels. So far so good on 22.10 with OEM C kernel.

Best,

I never experienced this graphical corruption until I was participating in a Zoom call using my TV as an external monitor. Note that I use a 32" external monitor at my desk and haven’t seen this there before. This is without setting any kernel parameters, and I am running Fedora 39 (for now, until Ubuntu gets more stable). I haven’t installed any GNOME extensions, but did have Synology DiskStation, Thunderbird, and a terminal session running. journalctl showed a meta_window_set_stack_position_no_sync error in the logs, but otherwise no errors.

Nov 14 16:46:25 cogsworth gnome-shell[4030]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed

video of flashing here

https://www.phoronix.com/news/AMD-Scatter-Gather-Re-Enabled
