[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

I’m running the Fedora 40 branch (i.e. upgraded from fc39 to fc40 with dnf system-upgrade).

Don’t do this as a novice! The beta isn’t even out yet.

Just install the kernel manually.

There is nothing that solves everything, but there are workarounds.

  • Kernel >= 6.8 has many improvements, but still has some white screen issues (very weird that it’s worse for you)
  • Change the BIOS settings from Auto to UMA_Game_Optimized.
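To check whether a machine already runs a kernel at or above the 6.8 mentioned above, a minimal sketch could look like this. The helper name kernel_at_least is made up for illustration, and it assumes a sort that supports -V (GNU coreutils does).

```shell
# Hypothetical helper: succeeds if the kernel version is >= the wanted one.
# Usage: kernel_at_least WANTED [VERSION]; VERSION defaults to `uname -r`.
kernel_at_least() {
    want="$1"
    have="${2:-$(uname -r)}"
    # sort -V orders version strings; if WANTED sorts first (or ties),
    # the version being checked is at least WANTED.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For example: kernel_at_least 6.8 && echo "kernel is 6.8 or newer"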

This shouldn’t be necessary anymore in 6.8, but if you encounter issues feel free to set it:

  • Set kernel arg amdgpu.sg_display=0
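One way to set and verify that kernel argument on Fedora is sketched below. It assumes grubby is installed (it is on a default Fedora install); the two function names are made up for illustration. Nothing runs until you call the functions yourself.

```shell
# Hypothetical helper: succeeds if a kernel command line contains
# amdgpu.sg_display=0. With no argument it checks the running kernel.
sg_display_disabled() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" amdgpu.sg_display=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: add the argument to every installed kernel entry.
# Requires root; reboot afterwards for it to take effect.
set_sg_display_off() {
    sudo grubby --update-kernel=ALL --args="amdgpu.sg_display=0"
}
```

After a reboot, sg_display_disabled && echo "scatter/gather display is off" confirms the argument is active. To undo it, grubby also accepts --remove-args with the same value.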

Yes, this is what I read a couple of months ago, and it motivated me to migrate to Fedora 40 (which I do at beta time anyway).
That’s why I’m describing every case where I encounter screen corruption: it is now unusable. I have to restart twice a day.

I already did, but perhaps it switched back to AUTO, I’ll double-check

I’ve created a framework backports repo. It currently only contains the kernel from Fedora 40. Let me know if more packages are required.

It’s updated once a day by downloading the packages from Fedora 40 and uploading them to this repo. Nothing is modified, so I’ve enabled the Fedora GPG keys.

USE AT YOUR OWN RISK!

sudo tee /etc/yum.repos.d/framework-backports.repo << "EOF" > /dev/null
[framework-backports]
name=Framework backports
baseurl=https://principis.fedorapeople.org/framework-backports/$releasever/
enabled=1
metadata_expire=7d
repo_gpgcheck=0
type=rpm
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-40-$basearch file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-41-$basearch file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-rawhide-$basearch
EOF

To remove this repo (and installed packages):

sudo rm -f /etc/yum.repos.d/framework-backports.repo
sudo dnf distrosync --refresh
sudo dnf remove "kernel-*fc40*" # THIS SHOULD ONLY REMOVE KERNEL* PACKAGES!!!

I still get the flicker occasionally.

I’m running VMs via libvirt and when resizing or fullscreening them on my external 4k monitor, I occasionally either freeze the VM or my entire system with this kernel error:

[ 2034.021192] gmc_v11_0_process_interrupt: 46 callbacks suppressed
[ 2034.021197] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32770, for process qemu-system-x86 pid 5607 thread qemu-syste:cs0 pid 5642)
[ 2034.021205] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000000018000000 from client 10
[ 2034.021209] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
[ 2034.021211] amdgpu 0000:c1:00.0: amdgpu: 	Faulty UTCL2 client ID: SQC (data) (0xa)
[ 2034.021213] amdgpu 0000:c1:00.0: amdgpu: 	MORE_FAULTS: 0x0
[ 2034.021216] amdgpu 0000:c1:00.0: amdgpu: 	WALKER_ERROR: 0x0
[ 2034.021217] amdgpu 0000:c1:00.0: amdgpu: 	PERMISSION_FAULTS: 0x3
[ 2034.021219] amdgpu 0000:c1:00.0: amdgpu: 	MAPPING_ERROR: 0x0
[ 2034.021221] amdgpu 0000:c1:00.0: amdgpu: 	RW: 0x0
[ 2044.180516] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
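To see whether your own system logs the same kind of errors, a small filter like the following may help. It is only a sketch: the function name amdgpu_faults is made up, and the pattern is based solely on the messages quoted above, so other amdgpu errors will not be caught.

```shell
# Hypothetical helper: keep only kernel log lines that look like the
# amdgpu page faults and ring timeouts quoted above. Reads stdin.
amdgpu_faults() {
    grep -E 'amdgpu.*(page fault|PROTECTION_FAULT_STATUS|ring .* timeout)'
}
```

Typical use would be journalctl -k -b | amdgpu_faults for the current boot, or sudo dmesg | amdgpu_faults.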

In my case, on Fedora branched/40, with 6.8.1-300.fc40, I was seeing rather frequent white screen corruption, always full-screen, and after setting amdgpu.sg_display=0, have not seen it since, though it’s only been around 2 days. I have 64 GB of RAM (2x32) from FW.

Setting amdgpu.sg_display=0 seems like it should be redundant, given the SG disabling code at ≥ 64 GB. I wonder whether that check is no longer in the most recent branched kernel for some reason, whether it was removed in an attempt to get SG legitimately working in this case, or something else. I haven’t had a chance to check.

That’s interesting, as I have 32 GB of memory and experienced the glitches as well. However, rather than disabling scatter/gather, I set UMA_GAME_OPTIMIZED to lock the GPU memory size, and this has proven stable as well.

I’m not sure which method is preferred but it definitely seems like a subtle memory leak or overflowed buffer happening around the shared memory region.

I have only used the UMA_GAME_OPTIMIZED method, and never have any glitches when it’s enabled.

Btw, after the recent BIOS update, I got more available RAM: 29 GB vs 27 GB.

3.02 to 3.03 or 3.03 to 3.03b?

Also, after the BIOS update, is UMA_GAME_OPTIMIZED still set?

Sorry, I should’ve specified… it’s 3.03 to 3.03b.

UMA_GAME_OPTIMIZED is still set; without it the system is more prone to glitches.

Thanks. Weird that an EC only update, focused on power/charging, would change available memory…

I disabled UMA_GAME_OPTIMIZED today. Btw, it’s called “Gaming” after 3.05.

I tried to trigger the corruption the way I did last time: opening many tabs running YouTube, playing a 1080p video in Totem, and plugging and unplugging my USB-C dock with a monitor connected.

I opened 30 tabs in the background, plugged and unplugged the dock, closed and opened the lid, and toggled fractional scaling on and off. None of it triggered the white graphical corruption.

I used mine the whole day without any issue.

Worth noting: I’m on kernel 6.8 and BIOS 3.05, without amdgpu.sg_display=0.

So the settings for iGPU memory are as follows:
Auto: Allocate 512MB to the GPU if system memory is below 64GB, else 2GB.
Gaming: Allocate 2GB to the GPU if system memory is between 8-24GB, 4GB if system memory is 24GB and above.
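The two rules above can be written out as a small lookup, purely as an illustration of what was posted. The function name igpu_vram_mb is made up. Two points are assumptions, because the post does not settle them: exactly 24GB is treated as "24GB and above", and Gaming mode below 8GB prints "unknown".

```shell
# Hypothetical helper: print the iGPU carve-out in MB for a BIOS mode
# ("auto" or "gaming") and the installed system RAM in whole GB.
igpu_vram_mb() {
    mode="$1"
    ram_gb="$2"
    case "$mode" in
        auto)
            # Below 64GB: 512MB, otherwise 2GB.
            if [ "$ram_gb" -lt 64 ]; then echo 512; else echo 2048; fi
            ;;
        gaming)
            if [ "$ram_gb" -lt 8 ]; then
                echo unknown      # not covered by the post (assumption)
            elif [ "$ram_gb" -lt 24 ]; then
                echo 2048         # 8GB up to (not including) 24GB
            else
                echo 4096         # 24GB and above (assumption for exactly 24)
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, igpu_vram_mb gaming 32 prints 4096. On a running system the amdgpu driver reports the actual carve-out, in bytes, in /sys/class/drm/card*/device/mem_info_vram_total; the card index varies per machine.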

Is 512MB enough for normal use?

When S/G is on, absolutely it is. Once that 512MB is exhausted, the kernel driver will pull from GTT.

After more than a day of using it, I will fairly confidently say that BIOS 3.05 (still in beta, officially) + kernel 6.8.x fixes all corruption problems!

No more restarts and workarounds :partying_face:

So we can turn scatter gather back on?

Uhu! Provided you update your bios, ofc.

Gotta try that at some point then.

Well, looks like a win to me. I just ran the Cyberpunk benchmark with SG on and iGPU memory on Auto, and it ran fine with none of the artifacts that previously required turning SG off.

I’ll leave it on for now.

Has anyone tried to reproduce this issue using beta BIOS 3.05?