Since a recent update (I’m guessing sometime earlier this week), my screen has started randomly locking up completely and then turning itself off and on several times.
I looked at the journalctl logs, and it appears to be caused by a page fault somewhere in the AMD GPU driver:
Aug 14 16:49:20.864423 kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32787)
Aug 14 16:49:20.866611 kernel: amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 2944 thread firefox:cs0 pid 12238)
Aug 14 16:49:20.866782 kernel: amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x0000800106f86000 from client 18
Aug 14 16:49:20.867086 kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00203A11
Aug 14 16:49:20.867350 kernel: amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
Aug 14 16:49:20.867682 kernel: amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
Aug 14 16:49:20.867835 kernel: amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 14 16:49:20.867977 kernel: amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x1
Aug 14 16:49:20.868159 kernel: amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 14 16:49:20.868406 kernel: amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
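For anyone comparing their own logs against this one, here is a minimal sketch of pulling the `NAME: 0x...` fields out of a fault record like the one above. It assumes the lines look exactly like the journalctl output pasted here; the function name `parse_fault_fields` is made up for illustration and is not part of any amdgpu tooling.

```python
import re

# Matches the trailing "amdgpu: NAME: 0xVALUE" or "amdgpu: NAME:0xVALUE" part of a line.
# Assumption: field names are upper-case with digits/underscores, as in the log above.
FIELD_RE = re.compile(r"amdgpu:\s+([A-Z0-9_]+):\s*(0x[0-9A-Fa-f]+)\s*$")

def parse_fault_fields(lines):
    """Return a dict mapping each fault field name to its integer value."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields

sample = [
    "Aug 14 16:49:20.867086 kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00203A11",
    "Aug 14 16:49:20.867350 kernel: amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)",
    "Aug 14 16:49:20.867977 kernel: amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x1",
    "Aug 14 16:49:20.868159 kernel: amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0",
]

fault = parse_fault_fields(sample)
for name, value in fault.items():
    print(f"{name} = {value:#x}")
```

Lines that do not follow the `NAME: 0x...` shape (such as the "Faulty UTCL2 client ID" line) are skipped rather than guessed at.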
The screen turning off and on seems to happen while the GPU is trying to reset itself in order to recover (but fails):
Aug 14 16:49:31.087880 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Aug 14 16:49:31.658494 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:31.876468 kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000300 != 0x00000280n
Aug 14 16:49:31.890438 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:31.890589 kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Aug 14 16:49:31.932529 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 14 16:49:31.933181 kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Aug 14 16:49:31.933250 kernel: [drm] VRAM is lost due to GPU reset!
Aug 14 16:49:31.933295 kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Aug 14 16:49:31.936445 kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Aug 14 16:49:31.938428 kernel: [drm] DMUB hardware initialized: version=0x08003D00
Aug 14 16:49:32.347458 kernel: [drm] kiq ring mec 3 pipe 1 q 0
Aug 14 16:49:32.603702 kernel: amdgpu 0000:c1:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
Aug 14 16:49:32.604502 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v4_0> failed -110
Aug 14 16:49:32.604572 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed
Aug 14 16:49:32.604891 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -110
Aug 14 16:49:32.605129 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Aug 14 16:49:33.895440 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:34.147454 kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000040 != 0x00000000n
Aug 14 16:49:36.709902 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
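As a rough way to see which part of the reset fails, the sketch below scans lines like the ones above for the "resume of IP block <...> failed" message and names the error code. It assumes the message format shown in this log. The helper name `failed_ip_blocks` is hypothetical, and the errno table is hardcoded with the Linux value only (110 is ETIMEDOUT on Linux), so nothing here comes from an amdgpu API.

```python
import re

# Assumption: the failure line has the same shape as in the log above.
RESUME_FAIL_RE = re.compile(r"resume of IP block <(\w+)> failed (-?\d+)")

# Linux errno value; hardcoded because other platforms number it differently.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

def failed_ip_blocks(lines):
    """Return (ip_block, return_code, errno_name) for each failed resume."""
    results = []
    for line in lines:
        match = RESUME_FAIL_RE.search(line)
        if match:
            code = int(match.group(2))
            name = LINUX_ERRNO_NAMES.get(-code, "unknown")
            results.append((match.group(1), code, name))
    return results

sample = [
    "Aug 14 16:49:31.932529 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume",
    "Aug 14 16:49:32.604502 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v4_0> failed -110",
    "Aug 14 16:49:32.604572 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed",
]

failures = failed_ip_blocks(sample)
for block, code, name in failures:
    print(f"{block}: {code} ({name})")
```

On the log above this points at the vcn_v4_0 block timing out during resume, which matches the failed vcn_unified_0 ring test a few lines earlier.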
Edit: To clarify, I’m on Fedora 40.