I’ve just hit my first instance of this bug on Fedora Rawhide (6.7 kernel, with xxmitsu/mesa-git). For me it looks like it is preceded by Steam runtime coredumps and/or IOMMU errors. My guess is that something is corrupting, or not freeing, a memory region that amdgpu accesses during a power state change.
Steps to reproduce:
a) Launch Steam and run something intensive for a while (I used Civ6 with proton-ge23)
b) Observe Steam runtime coredumps and/or IOMMU errors in the journal
c) Change the power state from AC powered to battery
d) Watch as the machine goes into sleep/idle
e) Quickly resume from sleep
f) Get graphical corruption on resume (in my case a whited-out screen where the sddm/plasma unlock should be); my cursor was still responsive/active, but VT (tty) switching was broken
g) Hold power to hard reset
h) Look at the last 10 minutes of the previous boot's journal
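For anyone wanting to repeat step h, here is a rough Python sketch of how the last 10 minutes before the hard reset could be pulled out. It assumes a persistent journal, so that `journalctl -b -1` can still see the crashed boot, and it relies on the `__REALTIME_TIMESTAMP` field (microseconds since the epoch) from `journalctl -o json`. The function names are my own, not part of any tool.

```python
import json
import subprocess


def parse_json_lines(text):
    """Parse `journalctl -o json` output: one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tail_window(entries, minutes=10):
    """Keep only entries logged within `minutes` of the newest entry."""
    if not entries:
        return []
    stamps = [int(e["__REALTIME_TIMESTAMP"]) for e in entries]
    cutoff = max(stamps) - minutes * 60 * 1_000_000  # timestamps are in microseconds
    return [e for e, t in zip(entries, stamps) if t >= cutoff]


def read_previous_boot():
    """Read the previous boot's journal (assumes persistent journald storage)."""
    cmd = ["journalctl", "-b", "-1", "-o", "json", "--no-pager"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_json_lines(out)


if __name__ == "__main__":
    for entry in tail_window(read_previous_boot(), minutes=10):
        print(entry.get("MESSAGE"))
```

The window is measured back from the newest entry in that boot rather than from the current time, since the machine has already been rebooted by the time you look.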
Attached is the output from my journal:
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc0000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc1000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc2000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc3000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc4000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc5000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc6000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc7000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dc8000 flags=0x0000]
Nov 15 06:58:00 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff90dd4000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amd_iommu_report_page_fault: 87654 callbacks suppressed
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83400000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83401000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83402000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83403000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83404000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83405000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83406000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83407000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83408000 flags=0x0000]
Nov 15 06:58:05 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83414000 flags=0x0000]
Nov 15 06:58:08 emiemi kernel: i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
Nov 15 06:58:10 emiemi kernel: amd_iommu_report_page_fault: 87201 callbacks suppressed
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83640000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83641000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83642000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83643000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83644000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83645000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83646000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83647000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83648000 flags=0x0000]
Nov 15 06:58:10 emiemi kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff83654000 flags=0x0000]
Nov 15 06:58:15 emiemi kernel: amd_iommu_report_page_fault: 86848 callbacks suppressed

#26 0x00007fb1552efdda n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#6 0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)

Stack trace of thread 36523:
#0 0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
#1 0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
#2 0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
#3 0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
#4 0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
#5 0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
#6 0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)

Stack trace of thread 36520:
#0 0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
#1 0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
#2 0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
#3 0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
#4 0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
#5 0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
#6 0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)

Stack trace of thread 36529:
#0 0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
#1 0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
#2 0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
#3 0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
#4 0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
#5 0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
#6 0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)

Stack trace of thread 36521:
#0 0x00007fb152ca8799 __futex_abstimed_wait_common (libc.so.6 + 0x8b799)
#1 0x00007fb152cab139 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8e139)
#2 0x00007fb13e76e64d cnd_wait (radeonsi_dri.so + 0x16e64d)
#3 0x00007fb13e74d4bb util_queue_thread_func (radeonsi_dri.so + 0x14d4bb)
#4 0x00007fb13e76e57c impl_thrd_routine (radeonsi_dri.so + 0x16e57c)
#5 0x00007fb152cabec7 start_thread (libc.so.6 + 0x8eec7)
#6 0x00007fb152d2c19c __clone3 (libc.so.6 + 0x10f19c)

Stack trace of thread 36509:
#0 0x00007fb152d1e61d __poll (libc.so.6 + 0x10161d)
#1 0x00007fb15297d0ba _xcb_conn_wait.part.0 (libxcb.so.1 + 0xe0ba)
#2 0x00007fb15297f1ac xcb_wait_for_special_event (libxcb.so.1 + 0x101ac)
#3 0x00007fb141c6ade1 dri3_wait_for_event_locked (libGLX_mesa.so.0 + 0x51de1)
#4 0x00007fb141c6c35b loader_dri3_wait_for_msc (libGLX_mesa.so.0 + 0x5335b)
#5 0x00007fb141c5da33 dri3_drawable_get_msc (libGLX_mesa.so.0 + 0x44a33)
#6 0x00007fb14103dc86 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
#7 0x00007fb14103f887 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
#8 0x00007fb140f2c429 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
#9 0x00007fb140e5295b n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libGLESv2.so +>
#10 0x00007fb158f02a2b n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#11 0x00007fb158ef1e24 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#12 0x00007fb158eed2a3 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#13 0x00007fb157c16176 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#14 0x00007fb157c26bbc n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#15 0x00007fb157bdfa6a n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#16 0x00007fb157c27284 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#17 0x00007fb157bfee3e n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#18 0x00007fb15c830712 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#19 0x00007fb157a69c2a n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#20 0x00007fb157a6add2 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#21 0x00007fb15a3e012f n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#22 0x00007fb15a3e067e n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#23 0x00007fb157a69101 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#24 0x00007fb157afb93c n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#25 0x00007fb157ad4e5d n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#26 0x00007fb1552efdda n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/libcef.so + 0x>
#27 0x00000000005c48b0 n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/steamwebhelper>
#28 0x00000000005871ab n/a (/home/aenertia/.local/share/Steam/ubuntu12_64/steamwebhelper>
ELF object binary architecture: AMD x86-64

Nov 15 06:57:23 emiemi systemd[1]: systemd-coredump@7-54213-0.service: Deactivated successfully.
Nov 15 06:57:23 emiemi audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=syst>
Nov 15 06:57:23 emiemi systemd[1]: systemd-coredump@7-54213-0.service: Consumed 2.057s CPU time.
Nov 15 06:57:23 emiemi audit: BPF prog-id=175 op=UNLOAD
Nov 15 06:57:23 emiemi audit: BPF prog-id=174 op=UNLOAD
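Since the kernel rate-limits these messages, the handful of IO_PAGE_FAULT lines above is only a small sample of what happened. As a rough aid for comparing reports, here is a small Python sketch that counts the logged faults, the addresses they span, and the number of suppressed callbacks in a pasted journal excerpt. The regular expressions are written against the message format shown in this log only, and `summarize_faults` is just a name I picked.

```python
import re

# Patterns modelled on the messages in the journal excerpt above.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+domain=(0x[0-9a-fA-F]+)\s+"
    r"address=(0x[0-9a-fA-F]+)\s+flags=(0x[0-9a-fA-F]+)"
)
SUPPRESSED_RE = re.compile(
    r"amd_iommu_report_page_fault: (\d+) callbacks suppressed"
)


def summarize_faults(text):
    """Summarize AMD-Vi IO_PAGE_FAULT events found in journal text."""
    faults = list(FAULT_RE.finditer(text))
    addresses = [int(m.group(2), 16) for m in faults]
    return {
        "logged": len(faults),
        "suppressed": sum(int(m.group(1)) for m in SUPPRESSED_RE.finditer(text)),
        "domains": sorted({m.group(1) for m in faults}),
        "lowest": hex(min(addresses)) if addresses else None,
        "highest": hex(max(addresses)) if addresses else None,
    }


if __name__ == "__main__":
    import sys

    # Usage: journalctl -b -1 -k | python3 this_script.py
    print(summarize_faults(sys.stdin.read()))
```

The patterns use `\s+` between fields so they still match if a paste wraps a message in the middle, as happened in my original post.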