[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

AMD GPU Driver Crash

Summary: I found a way to reproduce the crash/hang reliably and wrote a guide at the bottom of this post.

Background

When running specific GPU loads, my system sometimes crashes. I managed to capture the kernel log, and every single time I get this page fault:

amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32772, for process qemu-system-x86 pid 3344 thread qemu-syste:cs0 pid 3369)
amdgpu:   in page starting at address 0x000000003f800000 from client 10
amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu: 	 MORE_FAULTS: 0x0
amdgpu: 	 WALKER_ERROR: 0x0
amdgpu: 	 PERMISSION_FAULTS: 0x0
amdgpu: 	 MAPPING_ERROR: 0x0
amdgpu: 	 RW: 0x0
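To compare faults across several crashes, the fields of such a report can be pulled out of the kernel log with a small script. This is only a sketch: the regular expressions are built from the excerpt above and nothing else, so other kernel versions may format these lines differently, and the function name `parse_page_faults` is made up for illustration.

```python
import re

# Patterns derived only from the log excerpt above (an assumption, not a
# documented format); adjust them if your kernel prints these lines differently.
FAULT_RE = re.compile(
    r"amdgpu: \[(?P<hub>\w+)\] page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<address>0x[0-9a-fA-F]+)")


def parse_page_faults(log_text):
    """Return one dict of string fields per page-fault report in log_text."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            fault = match.groupdict()
            fault["address"] = None  # filled in by the following log line
            faults.append(fault)
            continue
        match = ADDR_RE.search(line)
        if match and faults and faults[-1]["address"] is None:
            faults[-1]["address"] = match.group("address")
    return faults
```

After a forced poweroff, the log of the previous boot can be saved with `journalctl -k -b -1 > crash.log` and then passed in as `parse_page_faults(open("crash.log").read())`.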

This issue occurs frequently when running 3D workloads inside a QEMU guest VM. I was not able to reproduce it by running the same workload outside the VM. However, this is not a QEMU bug, because the crash itself happens on the host system, and guests should never be able to crash their host.

System

Framework 13
AMD Ryzen 7840U
64 GB Memory
BIOS: 03.05

| | Configuration 1 | Configuration 2 |
|---|---|---|
| Host | Fedora 40 | Ubuntu 24.04 LTS (Live Environment) |
| Host Kernel | 6.8.10-300.fc40, 6.8.11-300.fc40 | 6.8.?? (not completely sure) |
| Guest | Alpine 3.19.1 | Fedora 39 |
| Crash Behavior | Screen(s) turn black forever until poweroff is forced | Full system freeze for 10 - 20 seconds. Then it normalizes, until the still running workload triggers it again |

It makes no difference whether I run on battery or with the charger. Booting with or without amdgpu.sg_display=0, and enabling or disabling GamingMode in the BIOS, makes no difference either.
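When testing with and without a kernel parameter such as amdgpu.sg_display, it helps to confirm what the running kernel was actually booted with. The helper below is a rough sketch; `get_kernel_param` is a name chosen here for illustration, and keeping the last occurrence of a repeated parameter is an assumption about how duplicates are treated.

```python
def get_kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line string.

    Returns None if the parameter is absent and an empty string for a bare
    flag without '='. If the parameter appears more than once, the last
    occurrence is kept (an assumption made for this sketch).
    """
    value = None
    for token in cmdline.split():
        key, _sep, val = token.partition("=")
        if key == name:
            value = val
    return value
```

On the host this could be used as `get_kernel_param(open("/proc/cmdline").read(), "amdgpu.sg_display")`, which would give `"0"` when the parameter is set as described above and `None` when it is not present.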

Guide To Reproduce The Issue

Time required: ~15 Minutes
Example OS: Ubuntu 24.04 LTS (Live Environment)

1. Set up a VM with 3D acceleration

  1. Install gnome-boxes through the software app
  2. Open gnome-boxes and click the :heavy_plus_sign: on the top left corner to download an OS
  3. Select Fedora 39 and do a full installation. Just testing in the live image itself is not enough
  4. Reboot Fedora 39. Gnome-boxes may not autostart the VM, but you can start it by double-clicking on the VM's icon. Gnome-boxes sometimes crashes here, but just start it again and retry
  5. Gnome-boxes always boots into the live-image/installer, not the system which you just installed. So you have to interrupt GRUB and move down to Troubleshooting. There you can select the first partition. Note: You have to do this on every VM reboot to prevent starting the live-image/installer again
  6. Update Fedora 39 through the software center. I don’t know if this is required to reproduce the crash, but I did it anyway. Note that you don’t need a full upgrade to Fedora 40, just the basic updates are enough. Then reboot if the updater tells you to
  7. Shut down the VM
  8. Gnome-boxes has a 3D acceleration setting in the VM's properties, but unfortunately this does not work. Therefore we will boot our freshly installed Fedora 39 image with Virt-Manager
  9. sudo apt install virt-manager
  10. Start Virt-Manager. If it complains about not connecting to Qemu’s system session, ignore it. Create a new session and select “user session”
  11. In the menu where you can create a new VM, there is an option to import an existing VM image. Select it. The image should be located somewhere in ~/snap/gnome-boxes/...
  12. It asks you for an OS name. Enter Fedora 39
  13. Select “configure VM after creation” (or something like that)
  14. You now need to enable several GPU-related settings. I don’t know which order is the right one, but Virt-Manager will tell you. Settings to enable:
  • Memory
    • Enable shared memory: :heavy_check_mark:
  • Video Virtio
    • Model: Virtio
    • 3d acceleration: :heavy_check_mark:
  • Display Spice
    • Listen type: None
    • OpenGL: :heavy_check_mark:
  15. Press the button on the top left of this window which says something like “complete installation”
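To double-check that the settings from step 14 ended up in the VM definition, the libvirt domain XML can be inspected. The sketch below assumes the usual libvirt element names (`memoryBacking/access`, `video/model/acceleration`, `graphics/listen`, `graphics/gl`); I have not verified that Virt-Manager writes exactly these on every version, and `check_gpu_settings` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def check_gpu_settings(domain_xml):
    """Report which of the GPU-related settings are present in a domain XML.

    The element and attribute names are assumptions based on the common
    libvirt domain format, not taken from this post.
    """
    root = ET.fromstring(domain_xml)

    def present(path):
        return root.find(path) is not None

    return {
        # Memory: "Enable shared memory"
        "shared_memory": present("./memoryBacking/access[@mode='shared']"),
        # Video: model Virtio with 3D acceleration
        "virtio_3d": present(
            "./devices/video/model[@type='virtio']/acceleration[@accel3d='yes']"
        ),
        # Display Spice: listen type None
        "spice_listen_none": present(
            "./devices/graphics[@type='spice']/listen[@type='none']"
        ),
        # Display Spice: OpenGL enabled
        "spice_opengl": present("./devices/graphics[@type='spice']/gl[@enable='yes']"),
    }
```

The XML for a user-session VM can be obtained with `virsh --connect qemu:///session dumpxml NAME`, where NAME is whatever the VM was called during import. Any entry that comes back `False` points at a setting that did not get saved.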

2. Trigger the crash

  1. Boot your VM and open Firefox
  2. Search for basemark webgl and it will send you to this page: https://web.basemark.com/
  3. Run the benchmark. The first 6 tests will pass, but the crash happens on test 7/20

Note: When I tested this on two different Linux distributions, there were some differences in behaviour. On the Fedora 40 host the crash happens within seconds. On the Ubuntu Live host it took around a minute.
