I’m running KDE Plasma on Arch with kernel 6.10.2, and every so often my laptop freezes and the screen shows graphical glitches. The laptop has been extremely stable over the past few months; this issue has only cropped up recently. I’ve tried with the scatter-gather display kernel option (sg_display) both enabled and disabled, with no luck. The following messages appear in the kernel log:
Aug 04 14:56:03 framework kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Aug 04 14:56:03 framework kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 04 14:56:21 framework kernel: amdgpu 0000:c1:00.0: amdgpu: failed to write reg 1a774 wait reg 1a786
Aug 04 14:56:21 framework kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Aug 04 14:56:21 framework kernel: [drm] VRAM is lost due to GPU reset!
Aug 04 14:56:21 framework kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Aug 04 14:56:21 framework kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Aug 04 14:56:21 framework kernel: [drm] DMUB hardware initialized: version=0x08003D00
Aug 04 14:56:22 framework kernel: [drm] kiq ring mec 3 pipe 1 q 0
Aug 04 14:56:22 framework kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 04 14:56:22 framework kernel: amdgpu 0000:c1:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
Aug 04 14:56:22 framework kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v4_0> failed -110
Aug 04 14:56:22 framework kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) failed
Aug 04 14:56:22 framework kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -110
Aug 04 14:56:22 framework kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Aug 04 14:56:22 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:23 framework kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 04 14:56:23 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:23 framework kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000040 != 0x00000000n
Aug 04 14:56:24 framework kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 04 14:56:24 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:24 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:25 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:25 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:26 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:26 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:27 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:27 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:28 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:28 framework kernel: [drm] Fence fallback timer expired on ring sdma0
Aug 04 14:56:32 framework kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=8209, emitted seq=8209
Aug 04 14:56:32 framework kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 5372 thread firefox:cs0 pid 5426
Aug 04 14:56:32 framework kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
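For anyone who wants to check their own log for the same pattern, here is a rough sketch that pulls out which ring timed out and which process was named alongside it. The function name `summarize_timeouts` and the regular expressions are my own and are written only against the message wording shown above; other kernel versions may word these lines differently.

```python
import re

# Matches the "ring <name> timeout" lines printed by amdgpu_job_timedout.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the "Process information" line that follows a timeout.
PROCESS_INFO = re.compile(
    r"Process information: process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)


def summarize_timeouts(log_text):
    """Return one dict per ring timeout, merged with the process line after it."""
    events = []
    for line in log_text.splitlines():
        match = RING_TIMEOUT.search(line)
        if match:
            events.append({
                "ring": match["ring"],
                "signaled": int(match["signaled"]),
                "emitted": int(match["emitted"]),
            })
            continue
        match = PROCESS_INFO.search(line)
        # Attach process details to the most recent timeout that lacks them.
        if match and events and "process" not in events[-1]:
            events[-1].update(
                process=match["process"],
                pid=int(match["pid"]),
                thread=match["thread"],
            )
    return events


# Two lines copied from the log above.
sample = """\
Aug 04 14:56:32 framework kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=8209, emitted seq=8209
Aug 04 14:56:32 framework kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 5372 thread firefox:cs0 pid 5426
"""

for event in summarize_timeouts(sample):
    print(event)
```

To run it on a real log, save the kernel messages to a file and pass the file contents to `summarize_timeouts` in place of `sample`.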
More details on this:
It happened a few minutes after booting: Plasma crashed and restarted itself, and then the whole graphical session froze. Here is the kernel log:
[ 201.841263] usb 1-1: USB disconnect, device number 2
[ 234.269521] ucsi_acpi USBC000:00: unknown error 0
[ 234.269533] ucsi_acpi USBC000:00: GET_CABLE_PROPERTY failed (-5)
[ 725.120960] gmc_v11_0_process_interrupt: 65 callbacks suppressed
[ 725.120967] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.120974] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.120977] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x00008001055f4000 from client 18
[ 725.120980] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
[ 725.120983] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
[ 725.120985] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
[ 725.120987] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.120989] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x1
[ 725.120990] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.120992] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.120996] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.120999] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121002] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000080010551c000 from client 18
[ 725.121004] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 725.121006] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: VMC (0x0)
[ 725.121009] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[ 725.121010] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121012] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 725.121014] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121016] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.121019] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.121022] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121024] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000080010551b000 from client 18
[ 725.121027] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 725.121028] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: VMC (0x0)
[ 725.121030] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[ 725.121032] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121034] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 725.121036] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121037] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.121444] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.121448] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121450] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000080010551c000 from client 18
[ 725.121453] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
[ 725.121455] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
[ 725.121457] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
[ 725.121459] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121461] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x1
[ 725.121462] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121464] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.121468] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.121471] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121473] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x00008001055f4000 from client 18
[ 725.121475] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 725.121477] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: VMC (0x0)
[ 725.121479] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[ 725.121481] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121483] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 725.121485] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121486] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.121490] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.121493] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121495] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000080010551b000 from client 18
[ 725.121497] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 725.121499] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: VMC (0x0)
[ 725.121501] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[ 725.121502] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121504] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 725.121506] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121507] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 725.121513] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32809)
[ 725.121516] amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 4130 thread firefox:cs0 pid 4148)
[ 725.121518] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000080010551c000 from client 18
[ 725.121520] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 725.121522] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: VMC (0x0)
[ 725.121523] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[ 725.121525] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 725.121527] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 725.121529] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 725.121530] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[ 735.282156] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=8626, emitted seq=8629
[ 735.282818] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 4130 thread firefox:cs0 pid 4148
[ 735.283497] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 735.573529] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 735.811063] [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000280 != 0x00000200n
[ 736.051069] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 736.057197] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[ 736.093976] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 736.094811] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[ 736.094956] [drm] VRAM is lost due to GPU reset!
[ 736.094961] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 736.095973] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 736.098239] [drm] DMUB hardware initialized: version=0x08003D00
[ 736.407961] [drm] kiq ring mec 3 pipe 1 q 0
[ 736.663268] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 736.663427] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 736.664240] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 736.664244] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 736.664247] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 736.664249] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 736.664252] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 736.664254] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 736.664256] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 736.664258] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 736.664260] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 736.664262] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 736.664265] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 736.664267] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 736.664270] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 736.667054] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[ 736.667057] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[ 736.667072] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
[ 736.669846] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 736.832229] firefox:cs0[4148]: segfault at 0 ip 000055d89a73bf36 sp 00007f80fd5ff9c0 error 6 in firefox[1cf36,55d89a72a000+5e000] likely on CPU 5 (core 2, socket 0)
[ 736.832245] Code: 48 8b 2d bd 50 06 00 48 8b 75 00 e8 54 ee fe ff 48 8b 75 00 bf 0a 00 00 00 e8 d6 ee fe ff 48 8d 05 8f 53 06 00 48 89 18 31 d2 <89> 14 25 00 00 00 00 0f 0b f3 0f 1e fa 50 58 48 8d 3d 0c f0 04 00
[ 737.952899] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
[ 738.190826] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
[ 739.605231] lockdown_is_locked_down: 5 callbacks suppressed
[ 739.605236] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 739.618105] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 739.618273] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 739.618419] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 739.618434] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 776.755691] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=8636, emitted seq=8638
[ 776.756314] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 7214 thread firefox:cs0 pid 7580
[ 776.756846] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 777.065320] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 777.282149] [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x000000c0 != 0x00000040n
[ 777.499099] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 777.505171] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[ 777.542106] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 777.542923] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[ 777.542970] [drm] VRAM is lost due to GPU reset!
[ 777.542973] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 777.545160] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 777.546808] [drm] DMUB hardware initialized: version=0x08003D00
[ 777.858451] [drm] kiq ring mec 3 pipe 1 q 0
[ 778.102969] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[ 778.103465] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 778.104057] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 778.104061] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 778.104063] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 778.104065] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 778.104067] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 778.104069] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 778.104071] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 778.104073] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 778.104075] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 778.104077] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 778.104079] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 778.104081] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 778.104084] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 778.105579] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[ 778.105583] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[ 778.105602] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[ 778.173163] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 778.291828] firefox:cs0[7580]: segfault at 0 ip 00005611225b3f36 sp 00007f4da39ff9c0 error 6 in firefox[1cf36,5611225a2000+5e000] likely on CPU 1 (core 0, socket 0)
[ 778.291846] Code: 48 8b 2d bd 50 06 00 48 8b 75 00 e8 54 ee fe ff 48 8b 75 00 bf 0a 00 00 00 e8 d6 ee fe ff 48 8d 05 8f 53 06 00 48 89 18 31 d2 <89> 14 25 00 00 00 00 0f 0b f3 0f 1e fa 50 58 48 8d 3d 0c f0 04 00
[ 779.356441] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
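The negative numbers in these logs (-110, -125, -5) are kernel return codes, that is, negated Linux errno values. Below is a small lookup sketch for the three that appear here. The table is hardcoded with the standard Linux numbers instead of being read from the running system, so it gives the same answer on any platform. The names `LINUX_ERRNO` and `describe` are only for illustration.

```python
# Linux errno numbers that show up in the logs above. Hardcoded on purpose:
# these are the standard Linux values, not looked up on the local machine.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}


def describe(ret):
    """Translate a negative kernel return code such as -110 into its errno name."""
    name, text = LINUX_ERRNO.get(abs(ret), ("UNKNOWN", "not in this table"))
    return f"{ret}: {name} ({text})"


for code in (-110, -125, -5):
    print(describe(code))
```

Read that way, the ring test and GPU recovery failures (-110) are timeouts, the "Failed to initialize parser" error (-125) is a cancelled operation, and the GET_CABLE_PROPERTY failure (-5) is a generic I/O error.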