[SOLVED] Amdgpu crashes and artifacts with Mesa 25, kernel 6.13

Aron_Griffis · March 13, 2025, 9:25pm

And now it just happened on 6.12.10, which is from mid-January, specifically after resuming from sleep. I also have two external monitors attached via TB4, so my setup pushes the limits of the chipset, but that just means I trip on these things more often than people using only the laptop panel.

It clearly happens more often on 6.13.x versions, but I’m not sure how far back to go to find a kernel version that works reliably. I’m sure this wasn’t happening in the fall.

Christopher_R_Miller · March 13, 2025, 9:58pm

Reading this and several other threads about it, I believe the issue is with Mesa 25, not the kernel at all. I’m running Fedora 41, not Tumbleweed; however, after downgrading to Mesa 24 I haven’t seen issues. But I also haven’t stressed the system enough to call it fixed.

As far as I know, no simple reproduction has been reported to the Mesa project at this time.

Aron_Griffis · March 13, 2025, 10:29pm

Thanks, Christopher. I disabled the various updates-testing repos and then ran sudo dnf distro-sync --allowerasing to get back to Mesa 24. I’ll let you know how it goes.

waltercool · March 13, 2025, 11:27pm

Personal advice

Skip 6.12 and 6.13 if possible; they are very buggy.

Screen issues, VM issues, and ROCm not working.

Use 6.10, 6.11, or some 6.14 kernels. 6.14-rc5 works well for me.

David_Walker · March 14, 2025, 7:59pm

I am yet another Tumbleweeder using Gnome+Wayland, also with an AMD Ryzen™ 5 7640U laptop. Thanks for the tip about kernel-longterm; I just installed it.

FYI, here are a couple of openSUSE bugzilla reports that I’ve been following as (perhaps) being related to our problem:

1234732: System Freeze with AMD Vega GPUs After Mesa 24.3.x Update & Kernel Logs Reveal Multiple AMD Driver Issues. Takashi Iwai has been posting some test kernels here that have not solved this issue for me, but he’s paying attention. I’m coming to the conclusion, though, that it’s addressing a different problem.
1238204: Error in amdgpu - REG_WAIT timeout 1us * 10 tries - optc3_lock line:128. This is closed as fixed in Mesa 25.0, but I’m still having the problem in 25.0.1, as Christopher Miller seems to have noticed here.
1238361: Several second screen freezes after suspend/resume following an upgrade to 20250302. My original openSUSE report. It was marked as a duplicate of 1234732. I’m going to add a comment to question that decision.

I notice that BIOS 3.07 has recently been released. Has anyone seen if it resolves any of this?

fw13amd · March 14, 2025, 9:00pm

Thanks for the recap, I’ll keep an eye on those.

I installed 3.07 a few days ago and it didn’t change a thing. kernel-longterm is the only fix for me, or at least it seems that way so far (zero freezes).

David_Walker · March 15, 2025, 7:46pm

Kernel-longterm has also resolved the issue for me, at least for the past 24 hours. Also, the openSUSE bugzilla reports I mentioned above have been resolved without fixing our issue, so I’ve submitted a new one, 1239657 - Garbling screen artifacts after suspend/resume, specifically for what we’re seeing. Please add any additional information you may have. (It’ll show that more than one person is having the problem.)

I think I’ll hold off on 3.07 for now. No reason to roll too many dice at the same time.

lpapadakos · March 16, 2025, 1:57pm

Using Arch Linux. Framework 16 on Linux 6.13.7

I downgraded every package with mesa or radeon in its name that was at version 25.x (specifically 25.0.1) to v24.

On my kernel command line I have amdgpu.dcdebugmask=0x10 amdgpu.gpu_recovery=1

So far I don’t see the amdgpu page fault citing Firefox.
We’ll see. I won’t consider it a valid workaround yet, but I’m hopeful for now.

UPDATE 1 week later: Yup, it seems Mesa was the issue. 25.0.2 is out, but the latest post says it doesn’t help, so I’m staying on v24.
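To confirm which amdgpu options the running kernel actually received (for example the amdgpu.dcdebugmask=0x10 amdgpu.gpu_recovery=1 pair mentioned above), the live command line can be read from /proc/cmdline. Below is a minimal Python sketch that pulls out the amdgpu.* options. The function name amdgpu_params is hypothetical, and the sketch assumes a simple whitespace-separated command line with no quoted values.

```python
def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Return the amdgpu.* options from a kernel command line.

    Simplified: splits on whitespace, so quoted values that contain
    spaces are not handled.
    """
    params: dict[str, str] = {}
    for token in cmdline.split():
        if not token.startswith("amdgpu."):
            continue  # skip non-amdgpu options such as loglevel=4
        # "amdgpu.dcdebugmask=0x10" -> key "dcdebugmask", value "0x10";
        # a bare "amdgpu.flag" with no "=" gets an empty value.
        key, _sep, value = token[len("amdgpu."):].partition("=")
        params[key] = value
    return params
```

For example, amdgpu_params(open("/proc/cmdline").read()) on the machine above would return {'dcdebugmask': '0x10', 'gpu_recovery': '1'} if those are the only amdgpu options set.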

Filo · March 19, 2025, 10:36am

I’m yet another FW13 AMD running Fedora 41.
After some testing, I’m sure the problem is caused by Mesa 25 and is not necessarily tied to Firefox.
I’ve switched back and forth between Mesa 24.x and Mesa 25 a number of times, and every time I downgrade, the on-screen artifacts disappear. For now I’ve used dnf versionlock to pin mesa-dri-drivers to 24.2.4-1.fc41, and that seems to have fixed the problem.
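When switching between Mesa 24 and 25 packages like this, it helps to check which series is actually installed before assuming a pin took effect. A small Python sketch, assuming an RPM-style VERSION-RELEASE string such as the 24.2.4-1.fc41 above. Both function names are hypothetical, and treating every 25.x build as suspect is a heuristic drawn from this thread (reports cover 25.0.1 and 25.0.2), not an official list of affected versions.

```python
import re
from typing import Tuple


def parse_rpm_version(evr: str) -> Tuple[int, ...]:
    """Turn "24.2.4-1.fc41" (optionally "epoch:...") into (24, 2, 4).

    Simplification: drops the epoch and release, and ignores any
    non-numeric suffix in the version part.
    """
    version = evr.split(":")[-1].split("-")[0]  # strip epoch, then release
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version in {evr!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_suspect_mesa(evr: str) -> bool:
    """Heuristic from this thread: flag any Mesa 25.x build."""
    return parse_rpm_version(evr)[0] == 25
```

The input string could come from rpm -q --qf '%{VERSION}-%{RELEASE}' mesa-dri-drivers.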

fw13amd · March 19, 2025, 2:11pm

Thanks, folks, for pointing this in the right direction. By any chance, are any of you aware of a Mesa bug report, or have you filed one?

Thanks

EDIT: looks like a report exists

tripplehelix · March 20, 2025, 6:01pm

I’m getting graphics lock-ups that halt the whole system until it automatically resets.

[11371.628554] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[11371.630858] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[11371.640927] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=2396193, emitted seq=2396195
[11371.640932] amdgpu 0000:c1:00.0: amdgpu: Process information: process Diablo IV.exe pid 25118 thread vkd3d_queue pid 25189
[11371.640934] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[11373.644731] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[11373.644738] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[11373.644838] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[11373.644840] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[11375.703519] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[11375.703533] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[11375.974835] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[11375.976534] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[11376.009082] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[11376.009762] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[11376.009859] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[11376.011934] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[11376.019220] [drm] DMUB hardware initialized: version=0x08004800
[11376.344978] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[11376.344987] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[11376.344990] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[11376.344993] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[11376.344995] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[11376.344997] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[11376.344999] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[11376.345002] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[11376.345004] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[11376.345006] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[11376.345009] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[11376.345011] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[11376.345014] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[11376.347522] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[11384.960509] usb 1-4: reset full-speed USB device number 3 using xhci_hcd
[11385.232465] usb 1-4: reset full-speed USB device number 3 using xhci_hcd
[11407.601853] amdgpu 0000:c1:00.0: amdgpu: VM memory stats for proc Diablo IV.exe(25189) task vkd3d_queue(25118) is non-zero when fini
[12706.265489] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[14498.283304] perf: interrupt took too long (3134 > 3128), lowering kernel.perf_event_max_sample_rate to 63750
[15648.482345] ucsi_acpi USBC000:00: unknown error 256
[15648.482352] ucsi_acpi USBC000:00: GET_CABLE_PROPERTY failed (-5)
[15780.063892] usb 1-4: reset full-speed USB device number 3 using xhci_hcd
[15780.351902] usb 1-4: reset full-speed USB device number 3 using xhci_hcd
[18752.454834] perf: interrupt took too long (3926 > 3917), lowering kernel.perf_event_max_sample_rate to 50750
[22363.849334] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[22363.852200] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[22363.862251] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=13023333, emitted seq=13023335
[22363.862256] amdgpu 0000:c1:00.0: amdgpu: Process information: process Diablo IV.exe pid 107761 thread vkd3d_queue pid 107873
[22363.862259] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[22365.866087] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[22365.866093] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[22365.866199] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[22365.866202] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[22367.925667] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[22367.925683] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[22368.196607] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[22368.198294] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[22368.231906] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[22368.232606] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[22368.232716] [drm] VRAM is lost due to GPU reset!
[22368.232723] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[22368.234736] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[22368.242021] [drm] DMUB hardware initialized: version=0x08004800
[22368.567696] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[22368.567703] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[22368.567706] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[22368.567708] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[22368.567709] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[22368.567711] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[22368.567712] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[22368.567714] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[22368.567716] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[22368.567717] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[22368.567719] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[22368.567721] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[22368.567723] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[22368.569927] amdgpu 0000:c1:00.0: amdgpu: GPU reset(4) succeeded!

This has happened a number of times in the past while just browsing. I’ve had it pretty consistently happen while playing Diablo IV. It seems able to recover when playing a game, but is never recoverable when browsing with Firefox.

Not running Flatpaks, as mentioned early on in the thread.
OS: Debian Sid, XFCE (X11)
Running Linux 6.14.0-rc7
With Framework Laptop 13 - AMD Ryzen 7 7840U

Full kernel params: amdgpu.sg_display=0 amdgpu.dcdebugmask=0x10 usbcore.autosuspend=-1 loglevel=4 i8042.unlock=1
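Logs like the one above are long, but the useful part is short: each "ring ... timeout" line and the "Process information" line that follows it. A minimal Python sketch that extracts those, assuming the message formats exactly as printed in this log (other kernel versions may word them differently). RingTimeout and find_ring_timeouts are hypothetical names. The pending value is the emitted minus the signaled sequence number, i.e. roughly how many submitted fences had not completed when the timeout fired.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Formats as they appear in the log above; other kernels may differ.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] amdgpu \S+: amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches "process Diablo IV.exe pid 25118" (names may contain spaces).
PROCESS_RE = re.compile(r"process (?P<name>.+?) pid (?P<pid>\d+)")


@dataclass
class RingTimeout:
    timestamp: float
    ring: str
    signaled: int
    emitted: int
    process: Optional[str] = None

    @property
    def pending(self) -> int:
        # Fences emitted but not yet signaled when the timeout fired.
        return self.emitted - self.signaled


def find_ring_timeouts(lines: Iterable[str]) -> List[RingTimeout]:
    events: List[RingTimeout] = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append(RingTimeout(float(m["ts"]), m["ring"],
                                      int(m["signaled"]), int(m["emitted"])))
            continue
        # Attach the next "process NAME pid N" line to the most recent
        # timeout that does not have a process yet.
        p = PROCESS_RE.search(line)
        if p and events and events[-1].process is None:
            events[-1].process = p["name"]
    return events
```

For example, find_ring_timeouts(open("dmesg.txt")) on saved dmesg output (the file name is just an example) would report both gfx_0.0.0 timeouts above, each attributed to Diablo IV.exe with 2 pending fences.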

Rednax35 · March 21, 2025, 5:41pm

Running Fedora Silverblue 41 on a Framework Laptop 13 - AMD Ryzen 7 7840U with the 2.8K display, I was experiencing these graphical artifacts and GPU crashes; most of the time a crash would just kick me back to GDM.

I downgraded from Mesa 25.0.1 to Mesa 24.2.4 and that cleared up all the problems. I noticed Mesa 25.0.2 came out yesterday so I hope these issues are cleared up when it lands in Fedora.

inffy · March 21, 2025, 5:49pm

Seems Mesa 25.0.2 doesn’t fix all of these issues.

tripplehelix · March 23, 2025, 5:18pm

No, 25.0.2 hasn’t fixed the GPU crashing. Unfortunately, I can’t downgrade on Debian Sid because of dependency hell.

fw13amd · March 24, 2025, 5:56am

I don’t even have that kernel parameter enabled and I still face the issue.

If you read previous comments, it looks more likely to be caused by Mesa 25.

In a previous comment I pasted the link to the issue on the Mesa bug tracker.

fw13amd · March 24, 2025, 6:01am

Thread renamed, tags added to better describe the issue

efindus · March 24, 2025, 1:28pm

I’ve been having issues with amdgpu acting up for the better part of the last year, since around kernel 6.9. The solution I’ve been successfully using since February has been putting amdgpu.dcdebugmask=0x12 in the kernel command line. Right now I’m running 6.13.7 with mesa 25 without any issues. Good to hear though that 6.14 might be coming with an actual solution.
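amdgpu.dcdebugmask is a bitmask, so 0x12 is simply 0x10 | 0x02: the 0x10 bit used elsewhere in this thread plus one more. The decoder below is a sketch. Its bit names are assumed to match the DC_DEBUG_MASK enum in the kernel's drivers/gpu/drm/amd/include/amd_shared.h and cover only the low five bits, so check them against your kernel's source. decode_dcdebugmask is a hypothetical helper.

```python
from typing import List

# ASSUMPTION: low bits of DC_DEBUG_MASK as found in amd_shared.h;
# verify against the source of the kernel you are actually running.
DC_DEBUG_BITS = {
    0x1: "DC_DISABLE_PIPE_SPLIT",
    0x2: "DC_DISABLE_STUTTER",
    0x4: "DC_DISABLE_DSC",
    0x8: "DC_DISABLE_CLOCK_GATING",
    0x10: "DC_DISABLE_PSR",
}


def decode_dcdebugmask(value: str) -> List[str]:
    """Name the bits set in a dcdebugmask value such as "0x12"."""
    mask = int(value, 0)  # base 0 accepts "0x12" as well as "18"
    names = [name for bit, name in sorted(DC_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DC_DEBUG_BITS)  # bits not in the table above
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names
```

Under these assumptions, 0x12 decodes to DC_DISABLE_STUTTER plus DC_DISABLE_PSR, which fits the later note that the 6.14 change disabling PSR on eDP does essentially what the 0x10 flag does.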

lpapadakos · March 25, 2025, 1:30pm

I just got Linux 6.14 on Arch Linux.
I’ll do some experiments.

UPDATE: I rebooted with Mesa 25.0.2, and within 1-2 hours the problem described in this thread appeared. So Mesa 25 is still a no-go, even with Linux 6.14…
Specifically, this time it appeared when I opened a video with mpv. Everything except the cursor froze for 7 seconds, and sure enough, dmesg was spammed with amdgpu page faults in process mpv.

My hunch is that this bug appears when playing video: Firefox and mpv both end up playing video.

The good news is that people seem to have located the commit in Mesa that causes the issue.

openSUSE seems to be adding a patch to its builds to revert it.

Is that the same issue? They don’t mention page faults there, only the artifacts

UPDATE 2: It seems it isn’t necessarily video-specific, since in that thread one person got it in kwin_wayland. And yes, he reports the same page faults we see here, so it is the same issue. Arch could also adopt the revert patch, to be honest, but given its philosophy it might wait to see it upstreamed first.

–

What remains to be seen is whether I’ll get artifacts and freezes from PSR-SU (I have removed dcdebugmask from the cmdline). I think this is a separate issue from the one in this thread, but still under the amdgpu umbrella.

UPDATE 3: 3 days in, it seems like dcdebugmask might not be needed anymore in 6.14. Will continue testing.

efindus · March 28, 2025, 10:54pm

Great to hear about the Mesa progress. Regarding kernel 6.14, a patch has landed that temporarily disables PSR for eDP displays (essentially what the debug flag does), so no longer needing the kernel argument is sadly not a sign of a fix. Fortunately, this still means the bug is getting attention, so we might get an actual fix soon.

To_Chi · March 29, 2025, 12:18am

Not seeing any issues under 6.14 (via mainline) and Ubuntu 24.10, so that’s Mesa 24.2.8

Ubuntu has the 25.04 beta out right now, which uses the 6.14 kernel and probably Mesa 25, but I haven’t had a chance to check it with a live USB yet.
