[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1002
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Failed to evict queue 1
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Failed to evict process queues
[Sat Oct 11 22:19:13 2025] amdgpu: Failed to quiesce KFD
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Dumping IP State
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Dumping IP State Completed
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MODE2 reset
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset succeeded, trying to resume
[Sat Oct 11 22:19:13 2025] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: SMU is resuming...
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: SMU is resumed successfully!
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f2c000000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f32c00000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f4b200000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f69400000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f6aa00000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x09002600
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 4 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring jpeg_dec_1 uses VM inv eng 6 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vpe uses VM inv eng 7 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset(88) succeeded!
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: [drm] device wedged, but recovered through reset
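If you want to check whether this same hang-and-reset sequence keeps recurring, here is a rough sketch that scans `dmesg -T` output for it. It only matches a few message strings copied from the log above: the MES timeout, the reset start, and the "wedged, but recovered" line. The marker names and the function name are made up for illustration. Other amdgpu failure modes will not be caught.

```python
import re

# Matches dmesg lines with human-readable timestamps (dmesg -T), e.g.
# "[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")

# Hypothetical marker names; the substrings are copied from the log above.
MARKERS = {
    "mes_timeout": "MES failed to respond",
    "reset_begin": "GPU reset begin!",
    "recovered": "device wedged, but recovered through reset",
}


def find_mes_reset_events(log_text):
    """Return (timestamp, kind, message) tuples for matching lines, in order."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a [timestamp] prefix
        msg = m.group("msg")
        for kind, needle in MARKERS.items():
            if needle in msg:
                events.append((m.group("ts"), kind, msg))
                break  # one marker per line is enough
    return events
```

To use it, save the output of `sudo dmesg -T` to a file, read the file, and pass its text to `find_mes_reset_events`. Several `mes_timeout` / `recovered` pairs over a long run would suggest the same issue.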
I also have the coredump (from /sys/class/drm/card1/device/devcoredump/data) if that helps. I'd have to upload it to Google Drive or something.
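To get the coredump into a file you can upload, this is a small sketch that copies the sysfs `data` file out. It assumes the `card1` path from the log. The card index may differ on other machines. Reading the file usually needs root. As far as I know, the kernel discards a devcoredump after a timeout (five minutes by default) or as soon as anything is written to `data`, so copy it promptly and only read it. The function name and default output filename are just illustrative.

```python
import shutil
from pathlib import Path

# Path printed in the kernel log above; "card1" may differ per machine.
COREDUMP = Path("/sys/class/drm/card1/device/devcoredump/data")


def save_coredump(src=COREDUMP, dest="amdgpu-devcoredump.bin"):
    """Copy the devcoredump out of sysfs; return the saved path, or None if absent."""
    src = Path(src)
    if not src.exists():
        return None  # no dump pending (never created, expired, or already cleared)
    # Read-only copy in chunks; writing to the sysfs file would discard the dump.
    with src.open("rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dest)
```

Run it as root right after a reset, then attach or upload the resulting file.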
I have something that helped on my FW16, which showed a similar (but not identical) error in dmesg.
You're on a FW Desktop, so the workaround might not help there, but it's worth a try.
It's more of a workaround than a real fix, and it might cause some games to stop working.
But it seems to fix the ROCm / LLM problems for me.
This looks like the long-running compute issue. There is a fix in the proposed 6.14 OEM kernel (kernel version -1014). Enable the proposed repository and try that kernel.
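After rebooting, you can confirm you are actually running that kernel with a quick check like the one below. It assumes Ubuntu-style OEM release strings such as `6.14.0-1014-oem` (upstream version, then the ABI number, then the flavour). It also assumes that later 6.14 OEM ABIs keep the fix. Neither assumption comes from this thread. `has_oem_fix` is a made-up helper name.

```python
import platform
import re

# Assumed Ubuntu OEM release format, e.g. "6.14.0-1014-oem".
OEM_RE = re.compile(r"^(\d+)\.(\d+)\.\d+-(\d+)-oem$")


def has_oem_fix(release=None, min_abi=1014):
    """True if the release looks like a 6.14 OEM kernel at or above min_abi."""
    release = release or platform.release()  # same string as `uname -r`
    m = OEM_RE.match(release)
    if not m:
        return False  # generic, HWE, or non-Ubuntu kernels don't match
    major, minor, abi = (int(g) for g in m.groups())
    return (major, minor) == (6, 14) and abi >= min_abi
```

Calling `has_oem_fix()` with no argument checks the running kernel. Passing a string lets you test a version name before installing it.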