[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

Ash_Guy · November 20, 2023, 8:07pm

Just chiming in with my own experience of this. Like others, I see it on resume: almost vanilla Fedora GNOME 39 install / 3.03 BIOS / 64GB RAM / Linux 6.5.11-300.fc39.x86_64. I only got halfway through the thread because I’ve got to get to work shortly, but will loop back with more detail this evening if I find it hasn’t been solved (there were some promising things in posts a few weeks ago).

In my case it was just on battery. Would have been resuming Firefox, Thunderbird, Alacritty, Terminal, Docker with Postgres and Redis and a Rust Web server app. Of those Alacritty is gpu accelerated (and I kind of love the idea that the gpu intense task that triggered the graphical issues was a terminal…).

Not sure if it was suspended or hibernating, probably the former as far as I understand the defaults? Most of the screen was artifacting (White with some visible regions of the login down the right) so I couldn’t get in and just had to hard-reboot. The only different thing I’ve done with the machine is that yesterday I did a full reinstall from KDE Spin to Gnome because of cascading instability with KDE (though I suspect that was all software and my fault as I had been playing with tiling plugins without cleaning them up).

Will return later on and read the rest of the thread / provide more context / logs as necessary.

Updated w/Additional Info: Running 150% scaled, I haven’t attached an external monitor at all to the laptop yet (though I do have the HDMI expansion card in there), have only heard the fans once for a 30 second period in the week I’ve had it. Also was running Slack which I just spotted reported that it had crashed.

I too don’t think this looks like a hardware fault at this stage, definitely feels different to hardware failures causing graphical corruptions I’ve seen in the past. Onward to Kernel 6.6 :).

jwp · November 20, 2023, 10:16pm

@mikeymop ; hate to break it to you but I’m running all the various included amdgpu patches that were added both for the 6.6 and 6.7 (6.7 has a lot more of them). And I am still getting freezes.

6.7 with SG disabled it’s far less frequent tho.

ollien · November 21, 2023, 7:04pm

Just chiming in with a datapoint: I’ve had my machine on Fedora 39 (XFCE/Xorg) for about two weeks now and have experienced no issues. I have not added any amdgpu boot params. One thing I wonder is if this is related to the amount of memory in the system; I have 32G in mine and haven’t yet filled it up.

qemu-system-x86_64 · November 21, 2023, 7:19pm

As far as I know, the issue arises only when there is a memory capacity of 64GB or more.

https://gitlab.freedesktop.org/drm/amd/-/issues/2354
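
A quick way to see which side of that 64GB line a machine falls on is to read MemTotal from /proc/meminfo. The sketch below is only an illustration: the function names and the 56 GiB cutoff are my own assumptions (the kernel reports less than the installed amount because of firmware and iGPU carve-outs), not something taken from the upstream bug.

```python
import re
from pathlib import Path


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    # /proc/meminfo reports kB (really KiB); 1 GiB = 1024 * 1024 KiB.
    return int(match.group(1)) / (1024 * 1024)


def looks_like_64gb_or_more(meminfo_text: str, cutoff_gib: float = 56.0) -> bool:
    """Assumed heuristic: usable memory above the cutoff means 64GB or more installed."""
    return mem_total_gib(meminfo_text) >= cutoff_gib


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        text = meminfo.read_text()
        print(f"MemTotal: {mem_total_gib(text):.1f} GiB")
        print("64GB-or-more class:", looks_like_64gb_or_more(text))
```

Note that reports further down the thread come from 32GB machines too, so treat the result as one datapoint rather than a verdict.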

Jason5 · November 21, 2023, 10:15pm

Setting amdgpu.sg_display=0 seemed to work for me.

Specs:
PopOS 22.04
2x 1440p monitors
32GB RAM
X11
6.5.6-76060506-generic
3.03 BIOS

White flickering was really bad before and I needed to pretty much restart my computer to buy me another couple hours before it happened again.
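
If you want to confirm the parameter actually made it onto the running kernel's command line, something like the following should do. It is a rough sketch: the helper names are made up for illustration, and the simple whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into a {name: value} mapping.

    Flags without '=' (for example 'quiet') map to an empty string.
    """
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def sg_display_disabled(cmdline: str) -> bool:
    """True if amdgpu.sg_display=0 is present on the given command line."""
    return parse_cmdline(cmdline).get("amdgpu.sg_display") == "0"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        current = proc_cmdline.read_text()
        print("scatter/gather display disabled:", sg_display_disabled(current))
```

The same parser works for the other boot parameter mentioned in this thread, rtc_cmos.use_acpi_alarm=1.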

Ash_Guy · November 22, 2023, 12:46am

Another datapoint from me. Left the machine off overnight again but this time Slack wasn’t open but Alacritty still was (which I thought might have been the culprit as the only obviously gpu accelerated app I had open the first time around). Opened fine this morning-- I think someone mentioned a theory about memory pressure / electron / node. May be onto something there.

mikeymop · November 23, 2023, 2:17am

I’ve been able to associate the flickering with AMD-Vi. I have the suspicion that this is an IOMMU grouping issue. I’ve had similar experiences with the root cause on a previous Intel desktop motherboard. That issue was resolved with a BIOS update.

Having captured the logs a few times successfully I reported the flickering issue on the amdgpu drm bug-tracker.

With the suggested 6.7-rc2 and rtc_cmos.use_acpi_alarm=1 I do not get any ACPI errors on boot. Package wattage sits around 2.7 W when idle, with the total system power consumption around 5-6 W, and at 8 W when the wifi chip is transmitting data.

When I re-awake from suspend I do get one error, however the logs are much quieter on this kernel. I’m going to run this rc for a few days and see how it goes.
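
For anyone who wants to look at the IOMMU grouping on their own machine, the kernel exposes it under /sys/kernel/iommu_groups, with one numbered directory per group and a devices subdirectory inside each. Below is a minimal sketch that lists them; the function name is hypothetical, and it only prints the grouping, it does not diagnose anything.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the sorted device names it contains.

    Returns an empty dict if the directory is missing (IOMMU disabled,
    or not running on Linux).
    """
    base = Path(root)
    groups: dict[int, list[str]] = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    # Sort by group number so the output is stable.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for number, devices in iommu_groups().items():
        print(f"group {number}: {', '.join(devices)}")
```

Comparing the output with lspci should show which group the iGPU lands in.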

Matt_Hartley · November 28, 2023, 1:57am

Do keep us posted, we are tracking this actively.

neeku · December 5, 2023, 3:33am

Enabling UMA_Game_Optimized worked for me.

Arch on Gnome (Wayland) and Hyprland
Docks/Hubs: Literally any, displaylink and not, USB3.2 and USB4, dp alt mode and not.
Bios: 03.03
Kernel: 6.6.3-arch1-1

RiverShen · December 5, 2023, 5:30am

Enabling UMA_Game_Optimized worked fine for normal productivity, but when I ran a game for a while, the issue reappeared in the middle of the gaming session. Disabled scatter-gather and no more issues, but curious how much performance (or efficiency) is lost from this.
6.6.2-201.fc39

Rijnder_Wever · December 5, 2023, 9:09am

Other datapoint: like others, this issue often pops up whenever I make an app fullscreen in GNOME.

However, and strangely enough, the issue does not affect fullscreen Emacs windows. One might think that this relates to Emacs running under X (or xwayland), but this is not the case, since I run the Wayland-native Emacs with pgtk support enabled. That is, I’ve installed Emacs from Arch Linux’s extra/emacs-wayland.

(6.6 kernel)

Nosut · December 7, 2023, 4:21am

Batch 9 reporting in.
Fedora 39, GNOME
AMD Ryzen 7, 64GB RAM
After opening the lid to wake the laptop from sleep I get the white screen; it also sometimes pops up suddenly after around 3-4 hours of use.

jwp · December 7, 2023, 5:32am

Unfortunately amdgpu.sg_display=0 is still required with 6.7rc# - it just takes longer to trigger without it than previously.

Today on 6.7rc4 with patches for the AMDfw I hit it after running presentation via HDMI out for 4 hours - sleeping and resuming. Back to disabling SG for now.

jwp · December 7, 2023, 5:53am

From prior discussions this is possibly solved by including an updated vbios for Phoenix, which gets redistributed as part of the BIOS for the platform. AMD have provided test firmware for a related APU in one of the upstream bug trackers for this issue, so I would expect this to get rolled into an official OEM package delivered by the integrator at some point, specifically for Phoenix. But my reading is a bit murky as I’m no expert in the workings. I still have fewer issues with the amdgpu stack than with the Intel Arc one, and considerably fewer than with nvidia’s approach, which delivers what could possibly be an entire OS inside their proprietary driver module, based on size alone.

dimitris · December 7, 2023, 9:18pm

It’s possible to use the new video decoding firmware from the upstream bug just by running dracut manually to update the boot image after manipulating a file that’s part of the amd-gpu-firmware package.

Does the normal update process usually depend on more than just the vendor (AMD in this case) officially releasing the firmware in linux-firmware or similar upstream?

jwp · December 7, 2023, 9:40pm

Different blobs

NickCao · December 7, 2023, 9:54pm

And your distribution maintainers picking up these changes.

nlordell · December 13, 2023, 6:43pm

Just to add another data point, I am running F39 with Sway (not the spin, just installed on top of the regular F39 Workstation install). Previously, the whole screen would flicker white; this usually seemed to be triggered by things that render over the entire screen (for example, grimshot’s area grab, swaylock, fullscreen Firefox, etc.). Additionally, I experienced the usual full white external monitor connected over USB-C that others are reporting (oddly, the cursor would still render over the white screen).

Disabling scatter/gather (amdgpu.sg_display=0) has solved the issues as far as I can tell.

Furthermore, Zoom used to crash Firefox tabs. This typically happened (and was not very difficult to reproduce) when I would change workspaces while in an on-going Zoom call. I have no idea if it is related or just a coincidence, but this has also not happened since disabling scatter/gather.

Edit: In case this is relevant:

Model: Framework Laptop 13 AMD Ryzen 7 7840U Batch 7
Memory: 32.0 GiB

jwp · December 13, 2023, 9:08pm

I managed to trigger this WITH scatter gather disabled on the latest drm-firmware and 6.7-rc4 the other day.

Much less frequent; iommu seems implicated from logs. The fact I have now seen it trigger with Scatter gather disabled is alarming.

AkechiShiro · December 13, 2023, 9:32pm

@Matt_Hartley Is it possible to maybe create a wiki page, or update the one pinned : Active upstream AMDGPU issues affecting Ryzen 7840U (iGPU 780M)
with the issues to let people know that some issues have been fixed on the firmware side (VP9 encoding according to @dimitris I think)

I would also add that if/when a patch lands that fixes this graphical issue, can we track when each distribution ships it (to be able to look at the state of an issue across all Linux distros)?

It would be extremely helpful for the community to have such a wiki page, or maybe just one page listing all the current issues with their latest status (waiting on an AMD fix, currently being investigated, or needing a BIOS update release), and also specifying the conditions for hitting each issue, such as having 64GB of RAM or more, or less, being plugged into power, and whether a USB-C external display or hub is in use.

Basically, a table that lists the conditions for reproducing each issue and whether it is fixed, along with the fixing commit or the recommended workaround.

I’m confident the issues are being worked on, I’d just like to be able to open a single page that shows the current status of all issues.
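
To make the table idea concrete, here is a rough sketch of how such a status page could be generated from structured reports. The rows are copied from posts in this thread ("not stated" where the poster did not say); the Report fields and the render_table helper are hypothetical names chosen for illustration, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Report:
    user: str
    kernel: str
    ram: str
    workaround: str
    outcome: str


# Rows taken from posts in this thread; nothing here is newly measured.
REPORTS = [
    Report("Jason5", "6.5.6-76060506-generic", "32GB", "amdgpu.sg_display=0", "seemed to work"),
    Report("neeku", "6.6.3-arch1-1", "not stated", "UMA_Game_Optimized", "worked"),
    Report("RiverShen", "6.6.2-201.fc39", "not stated", "scatter-gather disabled", "no more issues"),
    Report("nlordell", "not stated", "32.0 GiB", "amdgpu.sg_display=0", "solved as far as I can tell"),
    Report("jwp", "6.7-rc4", "not stated", "scatter-gather disabled", "still triggered, less frequent"),
]


def render_table(reports: list[Report]) -> str:
    """Render the reports as a plain-text table with padded columns."""
    headers = ["User", "Kernel", "RAM", "Workaround", "Outcome"]
    rows = [[r.user, r.kernel, r.ram, r.workaround, r.outcome] for r in reports]
    # Each column is as wide as its longest cell, header included.
    widths = [max([len(h)] + [len(row[i]) for row in rows]) for i, h in enumerate(headers)]

    def fmt(cells: list[str]) -> str:
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(REPORTS))
```

Extra columns for distro, power state, or external display use would slot in the same way.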