Hi,
just wanted to know if anyone else has a similar issue with amdgpu crashes on Linux:
I’m using Arch, up to date as of 29.04.2025, kernel 6.14.4-arch1-1 on FW 13 with HX 370.
Usually, while using Firefox (especially when starting videos) or the Kitty terminal, the amdgpu module crashes at a random moment. Sometimes the affected process continues to work after amdgpu resumes; sometimes the process crashes as well.
Typical crash log:
amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=77533, emitted seq=77535
amdgpu 0000:c1:00.0: amdgpu: Process information: process RDD Process pid 61318 thread firefox:cs0 pid 62586
amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x0000008001700000).
amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x09001B00
amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 1 on hub 8
amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
amdgpu 0000:c1:00.0: amdgpu: ring vpe uses VM inv eng 4 on hub 8
amdgpu 0000:c1:00.0: amdgpu: GPU reset(3) succeeded!
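For anyone comparing logs, here is a minimal sketch for pulling the ring name and sequence numbers out of the "ring ... timeout" line shown above. The function name `parse_ring_timeout` and the `pending` field are my own, and reading `emitted - signaled` as the number of jobs still outstanding on the ring is an assumption on my part, not something the driver output states.

```python
import re

# Matches the amdgpu "ring ... timeout" line as it appears in the log above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return the ring name and sequence numbers, or None if the line does not match."""
    match = RING_TIMEOUT.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Assumption: the gap is the number of submissions that never completed.
        "pending": emitted - signaled,
    }
```

Feed it lines from `dmesg` or `journalctl -k` one at a time. Lines that the pager has truncated (ending in `>`) will not match and return `None`.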
Also, during load, the amdgpu module says Optional firmware ... was not found:
amdgpu: ATOM BIOS: 113-STRIXEMU-001
amdgpu 0000:c1:00.0: amdgpu: VPE: collaborate mode false
amdgpu 0000:c1:00.0: amdgpu: [drm] Optional firmware "amdgpu/isp_4_1_0.bin" was not found
amdgpu 0000:c1:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
amdgpu 0000:c1:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
amdgpu 0000:c1:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[drm] amdgpu: 512M of VRAM memory ready
[drm] amdgpu: 31787M of GTT memory ready.
Probably an unrelated problem, but I’m also getting USB-C errors from time to time:
I confirm I have both problems as stated in my posts. For the GPU issue we need to wait for a new kernel and/or firmware. For the PD issue we probably need a new BIOS/firmware update from Framework. The hardware is still new, but software will catch up pretty soon.
I’m using an extra monitor connected to an HDMI expansion card.
amdgpu crashes, then resumes. This causes the extra monitor to go black and the system to become unresponsive, while the audio keeps playing.
My crash log:
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=206212>
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: Process information: process RDD Process pid 34>
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
may 06 17:00:48 cheshire kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008001700000).
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
may 06 17:00:48 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
may 06 17:00:48 cheshire kernel: [drm] DMUB hardware initialized: version=0x0000D500
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 1 on hub 8
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: ring vpe uses VM inv eng 4 on hub 8
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
may 06 17:00:49 cheshire kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
I think I am in the same boat. However, I could tie my GPU driver instability specifically to Firefox hardware acceleration after its recent 138.0.3 update.
FW 13 Ryzen AI HX 370
96 GB RAM
CachyOS (experienced on Kernel 6.14.4 through 6.14.8)
Firmware 3.03
Running KDE 6.3 on Wayland
External monitors do not seem to play a role.
[18551.996374] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[18551.998145] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[18551.998196] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=3602701, emitted seq=3602703
[18551.998198] amdgpu 0000:c1:00.0: amdgpu: Process information: process RDD Process pid 3568 thread firefox:cs0 pid 10821
[18551.998201] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[18552.165466] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[18552.183668] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[18552.184212] [drm] PCIE GART of 512M enabled (table at 0x0000008001700000).
[18552.184282] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[18552.188578] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[18552.196621] [drm] DMUB hardware initialized: version=0x09001B00
[18552.322599] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[18552.322605] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[18552.322607] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[18552.322608] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[18552.322608] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[18552.322609] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[18552.322610] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[18552.322611] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[18552.322612] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[18552.322612] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[18552.322613] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[18552.322614] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 1 on hub 8
[18552.322615] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[18552.322616] amdgpu 0000:c1:00.0: amdgpu: ring vpe uses VM inv eng 4 on hub 8
[18552.326663] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
I have confirmed that downgrading Firefox to 138.0.1 resolves the issue for me completely. The 138.0.3 update seems to have addressed some bugs with both WebGL and video acceleration (changelog). “vcn_unified_0” being the source of the driver crash points to the video decoder. I was close to opening a bug on Mozilla’s bugtracker, but the fact that I couldn’t find anyone else with the problem outside of this forum thread makes me wonder whether it’s an issue specific to the new FW13 mainboard.
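To make the "which engine hung" reasoning easier to repeat, here is a rough sketch that maps the ring names from the reset logs to the hardware block they belong to. The prefix table is my own reading of the ring names in the logs above (VCN for video encode/decode, VPE for video processing, SDMA for DMA copies, MES for the scheduler), and `classify_ring` is a hypothetical helper, not anything the driver provides.

```python
# Ring-name prefixes seen in the reset logs, with a rough description of each block.
# This table is an illustrative assumption, not an exhaustive or official list.
RING_PREFIXES = [
    ("vcn", "video encode/decode (VCN)"),
    ("jpeg", "JPEG decode"),
    ("vpe", "video processing (VPE)"),
    ("gfx", "graphics"),
    ("comp", "compute"),
    ("sdma", "DMA copy (SDMA)"),
    ("mes", "hardware scheduler (MES)"),
]


def classify_ring(ring_name):
    """Return a short description of the block a ring belongs to, or 'unknown'."""
    for prefix, description in RING_PREFIXES:
        if ring_name.startswith(prefix):
            return description
    return "unknown"
```

For the crashes in this thread, `classify_ring("vcn_unified_0")` lands on the video encode/decode block, which matches the observation that the hangs are triggered by video playback.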
Were you folks also running Firefox, or one of its forks/redistributions (e.g. LibreWolf, Zen, Tor Browser…) when the crashes happened?
EDIT: I forgot: the problem frequency went way down on newer Firefox versions when disabling hardware video acceleration, another indicator pointing towards VCN.
I was just looking through the thread on Mesa’s bugtracker. I had completely overlooked your first post when scrolling through this one here. I just switched to 6.15-rc7 and will report whether that fixes the issue. Thanks for your work on getting to the bottom of this!
Did this end up fixing the issue for you? Curious if I should bother switching or just deal with it for a few days until Arch gets the update.
I have similar logs, but I’ll add another symptom for other folks searching for this: I’m running Hyprland and while I do have the same occasional flashes of a black screen like the others mentioned, every now and then Hyprland itself will crash, too. The last line in this crash report seems like it might be related (but I’m not really sure):
I wanted to make sure and used it for two full work days. I can say that at least for myself, on linux-cachyos-rc 6.15-rc7 and then linux-cachyos 6.15.0, the problem completely went away and I haven’t had a single GPU reset in four days.
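If you want to check the same thing on your own machine, here is a small sketch that counts GPU resets in kernel log text. It only looks for the two message strings that appear in the logs posted in this thread ("GPU reset begin!" and "GPU reset(N) succeeded!"); `summarize_resets` is a hypothetical helper name, and other kernel versions may word these messages differently.

```python
import re

RESET_BEGIN = "GPU reset begin!"
# Matches "GPU reset(1) succeeded!" but not "GPU reset succeeded, trying to resume".
RESET_DONE = re.compile(r"GPU reset\(\d+\) succeeded!")


def summarize_resets(log_text):
    """Count how many GPU resets were started and how many completed."""
    begun = 0
    succeeded = 0
    for line in log_text.splitlines():
        if RESET_BEGIN in line:
            begun += 1
        elif RESET_DONE.search(line):
            succeeded += 1
    return {"begun": begun, "succeeded": succeeded}
```

Pass it the saved output of `journalctl -k` or `dmesg`. Zero in both fields over a few days of normal use would match what I am seeing on 6.15.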
It did. Nevertheless, there are more amdgpu issues that affect older kernels so, while this is much more stable now, I sometimes get random freezes caused by another issue.
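For those waiting on their distro, a quick sketch for checking whether the running kernel is at least 6.15. The 6.15 threshold comes only from the reports in this thread, not from any official fix announcement, and the helper names `kernel_version` and `at_least` are hypothetical. Release candidates such as 6.15-rc7 count as 6.15 here.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '6.14.4-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def at_least(release, minimum=(6, 15)):
    """Return True if the release is at or above the minimum (major, minor)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(release, "meets 6.15:", at_least(release))
```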
My Ryzen AI 7 350 Framework 13 has had the same problems. I will try to load a video on, say, Twitter, and then the video buffers, followed by a second-or-so-long GPU crash, and the video suddenly plays. Sometimes two or three hangs occur, and I am bumped out of my desktop to either the display manager or TTY (depending on whether or not I am using a DM). In addition, these issues are present in every single browser and Linux distro I’ve used: Fedora, Bluefin, NixOS, everything.
I hope this gets fixed soon. I’ve heard that AMD graphics on Linux are very unstable for around the first year before changes come upstream. I have not had these issues on Windows.