[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

I’ve just hit my first instance of this bug on Fedora Rawhide (6.7 kernel, with xxmitsu/mesa-git). For me it looks like it’s preceded by Steam runtime coredumps and/or IOMMU page faults. My guess is that something is corrupting, or not freeing, a memory region that amdgpu accesses during a power state change.

Steps to reproduce:

a) Launch Steam and run something intensive for a while (I used Civ 6 with proton-ge23).
b) Observe Steam runtime coredumps and/or IOMMU errors in the journal.
c) Change the power state from AC to battery.
d) Wait for the machine to go into sleep/idle.
e) Quickly resume from sleep.
f) Get graphical corruption on resume (in my case a whited-out screen where the sddm/Plasma unlock should be). My cursor was still responsive, but VT switching was broken.
g) Hold the power button to hard reset.
h) Look at the last 10 minutes of the journal from before the reset.
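
For step h, here is a minimal sketch of how the tail of the previous boot's journal could be pulled after the hard reset. It assumes journald is storing logs persistently, so that `journalctl -b -1` can still see the boot that was reset. The function names, the 2000-line default and the two filter strings are my own placeholders, not anything mandated by journald.

```python
import subprocess

# Substrings worth looking for, taken from the messages in this report.
MARKERS = ("IO_PAGE_FAULT", "systemd-coredump")


def previous_boot_journal_cmd(lines=2000):
    """Build a journalctl command for the last `lines` entries of the previous boot."""
    return ["journalctl", "-b", "-1", "--no-pager", "-n", str(lines)]


def is_interesting(line):
    """True if a journal line mentions an IOMMU page fault or a coredump."""
    return any(marker in line for marker in MARKERS)


def read_previous_boot_journal(lines=2000):
    """Run journalctl and return its output as a list of lines."""
    result = subprocess.run(
        previous_boot_journal_cmd(lines),
        capture_output=True,
        text=True,
        check=False,  # a missing previous boot should not raise here
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for entry in read_previous_boot_journal():
        if is_interesting(entry):
            print(entry)
```

Depending on how the system is set up, reading the full system journal may need root or membership in a group that is allowed to read it.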

Attached is the output from my journal:

Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc0000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc1000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc2000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc3000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc4000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc5000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc6000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc7000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc8000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dd4000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amd_iommu_report_page_fault: 87654 callbacks suppressed
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83400000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83401000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83402000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83403000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83404000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83405000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83406000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83407000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83408000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83414000 flags=0x0000]
Nov 15 06:58:08 emiemi kernel: i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
Nov 15 06:58:10 emiemi kernel: amd_iommu_report_page_fault: 87201 callbacks suppressed
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83640000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83641000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83642000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83643000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83644000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83645000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83646000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83647000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83648000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83654000 flags=0x0000]
Nov 15 06:58:15 emiemi kernel: amd_iommu_report_page_fault: 86848 callbacks suppressed
                                                #26 0x00007fb1552efdda n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #6  0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)
                                                
                                                Stack trace of thread 36523:
                                                #0  0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
                                                #1  0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
                                                #2  0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
                                                #3  0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
                                                #4  0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
                                                #5  0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
                                                #6  0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)
                                                
                                                Stack trace of thread 36520:
                                                #0  0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
                                                #1  0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
                                                #2  0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
                                                #3  0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
                                                #4  0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
                                                #5  0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
                                                #6  0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)
                                                
                                                Stack trace of thread 36529:
                                                #0  0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
                                                #1  0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
                                                #2  0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
                                                #3  0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
                                                #4  0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
                                                #5  0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
                                                #6  0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)
                                                
                                                Stack trace of thread 36521:
                                                #0  0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
                                                #1  0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
                                                #2  0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
                                                #3  0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
                                                #4  0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
                                                #5  0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
                                                #6  0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)
                                                
                                                Stack trace of thread 36509:
                                                #0  0x00007fb152d1e61d __poll (libc.so.6 + 0x10161d)
                                                #1  0x00007fb15297d0ba _xcb_conn_wait.part.0 (libxcb.so.1 + 0xe0ba)
                                                #2  0x00007fb15297f1ac xcb_wait_for_special_event (libxcb.so.1 + 0x101ac)
                                                #3  0x00007fb141c6ade1 dri3_wait_for_event_locked (libGLX_mesa.so.0 + 0x51de1)
                                                #4  0x00007fb141c6c35b loader_dri3_wait_for_msc (libGLX_mesa.so.0 + 0x5335b)
                                                #5  0x00007fb141c5da33 dri3_drawable_get_msc (libGLX_mesa.so.0 + 0x44a33)
                                                #6  0x00007fb14103dc86 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
                                                #7  0x00007fb14103f887 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
                                                #8  0x00007fb140f2c429 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
                                                #9  0x00007fb140e5295b n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
                                                #10 0x00007fb158f02a2b n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #11 0x00007fb158ef1e24 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #12 0x00007fb158eed2a3 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #13 0x00007fb157c16176 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #14 0x00007fb157c26bbc n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #15 0x00007fb157bdfa6a n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #16 0x00007fb157c27284 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #17 0x00007fb157bfee3e n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #18 0x00007fb15c830712 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #19 0x00007fb157a69c2a n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #20 0x00007fb157a6add2 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #21 0x00007fb15a3e012f n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #22 0x00007fb15a3e067e n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #23 0x00007fb157a69101 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #24 0x00007fb157afb93c n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #25 0x00007fb157ad4e5d n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #26 0x00007fb1552efdda n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
                                                #27 0x00000000005c48b0 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/steamwebhelper>
                                                #28 0x00000000005871ab n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/steamwebhelper>
                                                ELF object binary architecture: AMD x86-64
Nov 15 06:57:23 emiemi systemd[1]: systemd-coredump@7-54213-0.service: Deactivated successfully.
Nov 15 06:57:23 emiemi audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=syst>
Nov 15 06:57:23 emiemi systemd[1]: systemd-coredump@7-54213-0.service: Consumed 2.057s CPU time.
Nov 15 06:57:23 emiemi audit: BPF prog-id=175 op=UNLOAD
Nov 15 06:57:23 emiemi audit: BPF prog-id=174 op=UNLOAD
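
Since most of the IO_PAGE_FAULT events are rate-limited ("callbacks suppressed"), it can help to tally them rather than read them line by line. Below is a rough sketch that counts the logged faults per second, sums the suppressed counts and reports the lowest and highest faulting address. The regular expressions are written against the line format shown above and are an assumption about how other kernels format these messages. `summarize_faults` is a name I made up.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   Nov 15 06:58:00 host kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ...]
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) .*?amdgpu (?P<dev>\S+): AMD-Vi: "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<addr>0x[0-9a-fA-F]+) flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)
# Matches the rate-limit notice emitted between bursts.
SUPPRESSED_RE = re.compile(
    r"amd_iommu_report_page_fault: (?P<n>\d+) callbacks suppressed"
)


def summarize_faults(lines):
    """Tally logged and suppressed IOMMU page faults from journal lines."""
    per_second = Counter()
    addresses = []
    suppressed = 0
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            per_second[fault["ts"]] += 1
            addresses.append(int(fault["addr"], 16))
            continue
        dropped = SUPPRESSED_RE.search(line)
        if dropped:
            suppressed += int(dropped["n"])
    return {
        "logged": sum(per_second.values()),
        "suppressed": suppressed,
        "per_second": dict(per_second),
        "lowest": min(addresses) if addresses else None,
        "highest": max(addresses) if addresses else None,
    }


if __name__ == "__main__":
    summary = summarize_faults(sys.stdin)
    print(f"logged faults:     {summary['logged']}")
    print(f"suppressed faults: {summary['suppressed']}")
    if summary["lowest"] is not None:
        print(f"address range:     {summary['lowest']:#x} .. {summary['highest']:#x}")
```

Saved under a name of your choosing, it can be fed from a pipe, for example the output of `journalctl -b -1 --no-pager`.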