DMCUB error on BIOS 3.05 + kernel 6.13.1: very nasty AMDGPU bug on Framework Laptop 13 (AMD Ryzen 7 7840U)

AkechiShiro · March 1, 2025, 4:55pm

Hi all,

I have 64GB of official Framework RAM, of which 4GB are allocated to the GPU (a UMA option in the BIOS; I don’t remember the exact name anymore).

USB-C dock: ThinkPad USB-C Dock Gen 2 with one 1920x1080@60Hz screen connected over HDMI.

I was just browsing the web with about 30 tabs open when the laptop froze a few times. The GPU recovered at first, but when I tried to fullscreen a video it froze completely, and no GPU recovery seems to have been attempted after that.

As you can see the input apparently was very slow.

I also saw these lines :

kwin_wayland_drm: Pageflip timed out! This is a kernel bug

I have more logs on my phone, as some were not written to disk because I had to hard shut down the laptop: I had no SSH session running and the keyboard was unresponsive. I will enable SysRq keys and hope I can reproduce it next time.

I tried to switch TTY and it did work; I saw a lot of drm errors (flip_done timed out, and CRTC:79:crtc-01/CONNECTOR:93eDP-1/PLANE:50:plane-3: commit wait timed_out). Neither the built-in keyboard nor an external USB one was usable. It isn’t that they weren’t working at all: the laptop was extremely slow, as in I type one key and the letter appears 10 minutes later.

Unplugging or replugging the USB-C dock did not change anything during the issue.

I’m very concerned about this issue because I have no clue how to reproduce it.

Tagging :

@Mario_Limonciello : if you know whether the firmware shipped in the next BIOS release gets a bump, or whether a similar bug has already been reported upstream, please feel free to let us know.
@Kieran_Levin : Please let us know any news on BIOS release 3.07; I won’t be testing the beta, however.
@Matt_Hartley : Could Framework work more closely with AMD on squashing those very edge cases ? Is there something we as users could do to help better pin those issues (given that reproducibility is very hard) ?

Currently I have rebooted on 6.13.4 #1-NixOS SMP PREEMPT_DYNAMIC Fri Feb 21 13:11:21 UTC 2025 x86_64 GNU/Linux
Linux firmware : /nix/store/n58qg129fcnbp52gwblp2rg00p1v0z3x-linux-firmware-20250211-zstd

May be related to : NixOS AMD Framework 13th AMD Ryzen 7 7840U/64GB Framework DDR5 RAM/UMA settings gamer on kernels 6.11.X have random heavy lags related to AMDGPU or possibly firmware

Kind regards,
Lahfa Samy

AkechiShiro · March 1, 2025, 7:55pm

Reproduced a less annoying issue (it did recover, and not freeze) :

This happened while I was on a listenonrepeat.com tab (embedded YouTube video playing) with Firefox 135.0.1 on kernel 6.13.4
Uptime doesn’t seem to matter.
KDE Plasma 6.2.5 (Wayland + XWayland)

[11493.749110] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[11493.751769] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[11493.761816] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=2246503, emitted seq=2246505
[11493.761819] amdgpu 0000:c1:00.0: amdgpu: Process information: process .firefox-wrappe pid 3642 thread .firefox-w:cs0 pid 3730
[11493.761822] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[11495.765672] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[11495.765679] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[11495.765852] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[11495.765854] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[11495.766076] Bluetooth: hci0: ACL packet for unknown connection handle 3837
[11497.977385] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[11497.977401] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[11498.246347] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[11498.248326] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[11498.285771] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[11498.286476] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[11498.286560] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[11498.289240] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[11498.296513] [drm] DMUB hardware initialized: version=0x08004B01
[11498.957456] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[11498.957464] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[11498.957466] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[11498.957469] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[11498.957470] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[11498.957472] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[11498.957473] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[11498.957474] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[11498.957476] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[11498.957478] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[11498.957480] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[11498.957481] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[11498.957483] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[11498.959553] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[11499.028123] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

EDIT (02/03/2025) :

Added KDE Plasma version
Added observation of a slowdown in Firefox during sweeping (hovering quickly across multiple YouTube tabs)
Not able to reproduce yet.

I suspect the Firefox tab preview of pinned YouTube tabs: sweeping the cursor across multiple tabs very fast slows the laptop down quite a bit, as if it were struggling to render frames, but only for the Firefox window. It doesn’t trigger the issue, though; I haven’t been able to reproduce that yet.

Without the preview, sweeping across the tabs does not cause a slowdown (the tabs are highlighted and nothing slows down).

The slowdown I can reproduce only happens in Firefox (whether a video is playing or not); the other windows refresh properly.

I can’t reproduce it on non-pinned tabs, but I haven’t tried unpinning the YouTube tabs and reproducing the slowdown again. No message is shown in dmesg. I’ll try launching Firefox from the command line to see if anything shows up. I also have some Firefox crash dumps, but I have no clue how to investigate them.

I’ll have to check whether I get the same behavior in a Chromium-based browser with YouTube tabs and the tab preview feature, sweeping the cursor across multiple tabs.

EDIT (08/03/2025) :

Reproduced part of the issue, but the GPU did recover (the screen flickered and went black during recovery). I’m still on 6.13.4 with amdgpu.dcdebugmask=0x10:

[601581.219983] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[601581.223062] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[601581.233116] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=46796963, emitted seq=46796965
[601581.233122] amdgpu 0000:c1:00.0: amdgpu: Process information: process .firefox-wrappe pid 3642 thread .firefox-w:cs0 pid 3730
[601581.233125] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[601583.236959] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[601583.236967] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[601583.237166] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[601583.237169] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[601585.288389] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[601585.288411] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[601585.551134] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[601585.553124] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[601585.591467] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[601585.592291] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[601585.592364] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[601585.595072] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[601585.601315] [drm] DMUB hardware initialized: version=0x08004B01
[601585.921724] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[601585.921730] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[601585.921732] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[601585.921734] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[601585.921735] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[601585.921737] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[601585.921738] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[601585.921740] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[601585.921741] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[601585.921743] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[601585.921744] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[601585.921746] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[601585.921748] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[601585.923402] amdgpu 0000:c1:00.0: amdgpu: GPU reset(6) succeeded!
[601585.950777] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Once I reboot with amdgpu.dcdebugmask=0x12, I’ll see if I can reproduce at some point.

Once BIOS release 3.07 is stable, I’ll also have to try and reproduce any issues with amdgpu.dcdebugmask=0x12, with amdgpu.dcdebugmask=0x10, and without either, I guess.

fish_177 · March 2, 2025, 5:22pm

Have you tried adding amdgpu.dcdebugmask=0x10 as a kernel parameter? I’ve needed this for basically every kernel since, I think, something in 6.10 or 6.11 to stop randomly occurring errors that slow the laptop’s input to a crawl until either a reboot or a forced AMD GPU reset.

E: some bug reports that have been filed on the PSR thing:

efindus · March 2, 2025, 6:14pm

I’ve actually had more luck with amdgpu.dcdebugmask=0x12 (2+ weeks without the bug happening right now), if I recall correctly, I tried just 0x10 first and something was still broken.

PS. Just to demystify the number a little, it comes from this enum in the kernel code and is a bitwise OR of its flags, written in hex. 0x12 (0x10 | 0x02) disables memory stutter mode and PSR, 0x10 disables just PSR.
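
To make the arithmetic concrete, here is a small sketch that splits a dcdebugmask value into the two flags mentioned above. The function name decode_dcdebugmask is made up for illustration, and only the two bits named in this thread (0x02 and 0x10) are covered; the kernel defines more flags that are not listed here.

```shell
#!/bin/sh
# Hypothetical helper: decode an amdgpu.dcdebugmask value.
# Only the two flags named in this thread are handled.
decode_dcdebugmask() {
    mask=$(( $1 ))   # accepts hex (0x12) or decimal (18)
    if [ $(( mask & 0x02 )) -ne 0 ]; then
        echo "0x02: memory stutter mode disabled"
    fi
    if [ $(( mask & 0x10 )) -ne 0 ]; then
        echo "0x10: PSR disabled"
    fi
}

decode_dcdebugmask 0x12
```

Calling it with 0x10 prints only the PSR line, which matches the difference between the two values discussed here.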

AkechiShiro · March 3, 2025, 11:38am

I do have the kernel parameter, to be more precise here is my kernel cmdline : amdgpu.dcdebugmask=0x10 amd_pstate=active rtc_cmos.use_acpi_alarm=1 loglevel=4

And I did have it on kernel 6.13.1 when I triggered the issue.

I’ll try to reproduce the issue with amdgpu.dcdebugmask=0x12
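
For anyone who wants to double-check which mask the running kernel actually booted with, the value can be pulled out of /proc/cmdline. This is only a sketch: get_dcdebugmask is a hypothetical helper name, and it assumes the parameter appears as a plain space-separated amdgpu.dcdebugmask=VALUE token, as in the command line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the amdgpu.dcdebugmask value found in a
# kernel command line string (prints nothing if the parameter is absent).
get_dcdebugmask() {
    for arg in $1; do            # rely on word splitting to walk the tokens
        case "$arg" in
            amdgpu.dcdebugmask=*) echo "${arg#amdgpu.dcdebugmask=}" ;;
        esac
    done
}

# Check the command line of the currently running kernel.
get_dcdebugmask "$(cat /proc/cmdline)"
```

If the amdgpu module is loaded, /sys/module/amdgpu/parameters/dcdebugmask should report the same value, though as far as I know it is printed in decimal (16 for 0x10, 18 for 0x12).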

efindus · March 10, 2025, 9:36am

Have you been able to reproduce the issue since?

PS. Looking over your kernel parameters, I think the rtc_cmos one is no longer necessary on newer kernels (I haven’t been using it for a while with no issues, but I can’t find a source for this right now).

AkechiShiro · March 10, 2025, 6:46pm

I haven’t rebooted with amdgpu.dcdebugmask=0x12 yet, but I will do so soon and will comment if I reproduce any of the issues I’ve had.

I will also remove that rtc_cmos parameter.

AkechiShiro · April 8, 2025, 8:01pm

Hey @efindus

I was able to reproduce the MES issue with amdgpu.dcdebugmask=0x12:

[1067406.235675] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[1067406.238089] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[1067406.248202] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=101872134, emitted seq=101872136
[1067406.248212] amdgpu 0000:c1:00.0: amdgpu: Process information: process .firefox-wrappe pid 3631 thread .firefox-w:cs0 pid 3957
[1067406.248216] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[1067408.252057] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[1067408.252069] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[1067408.252527] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[1067408.252532] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[1067410.475096] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[1067410.475109] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[1067410.681771] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[1067410.683674] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[1067410.728232] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[1067410.728618] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[1067410.728643] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[1067410.728936] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[1067410.738243] [drm] DMUB hardware initialized: version=0x08004D00
[1067411.422627] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[1067411.422639] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[1067411.422643] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[1067411.422646] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[1067411.422649] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[1067411.422652] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[1067411.422655] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[1067411.422658] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[1067411.422661] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[1067411.422664] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[1067411.422667] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[1067411.422670] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[1067411.422673] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[1067411.424948] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[1067411.479232] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[1067615.641387] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[1067615.644236] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[1067615.654262] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=101973553, emitted seq=101973555
[1067615.654272] amdgpu 0000:c1:00.0: amdgpu: Process information: process .firefox-wrappe pid 3631 thread .firefox-w:cs0 pid 3957
[1067615.654278] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[1067617.658116] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[1067617.658128] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[1067617.658621] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[1067617.658626] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[1067619.888808] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[1067619.888820] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[1067620.098275] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[1067620.107770] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[1067620.147468] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[1067620.148613] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[1067620.148662] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[1067620.150481] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[1067620.157176] [drm] DMUB hardware initialized: version=0x08004D00
[1067620.846328] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[1067620.846339] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[1067620.846343] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[1067620.846346] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[1067620.846349] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[1067620.846352] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[1067620.846354] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[1067620.846357] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[1067620.846360] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[1067620.846363] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[1067620.846366] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[1067620.846369] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[1067620.846372] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[1067620.848664] amdgpu 0000:c1:00.0: amdgpu: GPU reset(4) succeeded!
[1067620.900897] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Additional information :

After the GPU reset recovers, Firefox shows a black screen or the cursor leaves afterimages; at some point the afterimages stop and the window starts rendering properly again
/nix/store/70r77bb25qmfrjn7mwnzd4mzzp303ld7-mesa-24.2.8
/nix/store/ijaj1g829viydqk8lyajczak554h6zr0-linux-firmware-20250311-zstd
/nix/store/vwqkb8xx9vjg1jycp6k5732aw2ip9vbr-firefox-136.0.3
/proc/cmdline: initrd=\EFI\nixos\d14nn2i3skrl1m1lksz8vjfka9cnz2lh-initrd-linux-6.13.7-initrd.efi init=/nix/store/6arhxi68iiy118rd4b4pflmbj4a45rgd-nixos-system-Hostname-24.11.715908.7105ae395770/init amdgpu.dcdebugmask=0x12 amd_pstate=active loglevel=4
Linux 6.13.7 #1-NixOS SMP PREEMPT_DYNAMIC Thu Mar 13 12:08:08 UTC 2025 x86_64 GNU/Linux

I’m just noting the Nix store paths to see whether they get rebuilt later (a dependency could change or a patch could be backported, for instance).

efindus · April 11, 2025, 3:59pm

That is very curious… I guess one more datapoint but it doesn’t solve your problem. I haven’t gotten one of these myself since setting the parameter. Right now I’m on 17 days of uptime so it shouldn’t be that either. The only meaningful difference I can spot between our setups is that I am currently on 25.0.1 of mesa.

My versions for the record:

linux-firmware-20250311.b69d4b74-2
mesa-1:25.0.1-2
firefox-developer-edition-137.0b5-1
kwin-6.3.3.1-1
Linux 6.13.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 07 Mar 2025 20:19:00 +0000 x86_64 GNU/Linux

I’m also using Firefox Developer Edition but if anything that would be more unstable. Aside from Firefox video playback the other most consistent source of crashes for me has been playing Cyberpunk 2077 when my RAM wasn’t all basically free. I have since played with no issues for over 30h even when starting it when my ram usage was 24/32GB (I have additionally 16GB of swap).

AkechiShiro · April 14, 2025, 7:42pm

I’ll bump to (most likely) mesa 25.0.3 on 31 May 2025 and try to reproduce the issues, but I think it’s most likely a mesa bug if you can’t reproduce it on your side.

I don’t think it’s something specific to Firefox.

AkechiShiro · September 25, 2025, 11:56am

I haven’t been able to reproduce any of the above-mentioned issues with 0x12 for a while now; they have probably been gone since the Mesa bump.

I will move back to 0x10 and drive that for 6 months or so and report if I see any issue show up.

EDIT: At the time this comment was made, I’m running Mesa 25.0.7 and Linux kernel 6.16.8 with amdgpu.dcdebugmask=0x12 in my kernel cmdline, BIOS version 3.05

rozap · September 25, 2025, 6:20pm

Thanks for the details on which versions you’re running when. I’m struggling a lot with this issue.

I’m on Mesa 25.0.7, kernel 6.14.11, and with `amdgpu.dcdebugmask=0x10` I still get around 5 crashes per day. Kernel 6.16.8 kernel panics on boot, so I’m not sure what’s up there. Trying 0x12 (again) with updated Mesa but I don’t have high hopes that it’ll fix the issue.

AkechiShiro · September 26, 2025, 7:49am

Could you send your logs from the kernel panic ?

And the logs from the issue you’re running into (AMDGPU or sudo dmesg | grep -iE '(amd|amdgpu|oops)') ?

You can also check journalctl -b -1 for the last logs from the previous boot, in case you had to hard reboot your device.
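
As a rough sketch of how to pull the relevant events out of those logs: the filter below keeps only the ring timeout and GPU reset lines that show up in the dmesg excerpts earlier in this thread. filter_gpu_hangs is a hypothetical helper name, and the patterns are taken from those excerpts only, so other failure modes will not be matched.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and keep the
# amdgpu ring timeout / GPU reset lines seen earlier in this thread.
filter_gpu_hangs() {
    grep -E 'amdgpu: (ring [^ ]+ timeout|GPU reset begin|GPU reset\([0-9]+\) succeeded)'
}

# Example usage (run as root or as a user allowed to read the logs):
#   journalctl -k -b -1 | filter_gpu_hangs     # kernel log of the previous boot
#   dmesg | filter_gpu_hangs                   # kernel log of the current boot
```

The ring name in the timeout line (gfx_0.0.0 versus vcn_unified_0, for example) is worth including in a report, since it shows which engine hung.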

Just curious: your BIOS version is also pretty important, because AGESA could have been updated with new fixes or could have introduced a regression.

However, I believe AMD has testing platforms provided by OEMs on which it tests AGESA before rolling it out, but they’re not always exactly the same hardware as what we customers are using. I’ve run into an issue with a Lenovo ThinkPad that AMD cannot reproduce on its testing platform, yet my laptop is barely usable and the issue has become more and more reproducible over time. The GPU issues happen on both Windows and Linux; Windows is more stable, but some issues look the same on both: the screen becomes all gray, and the only way to recover is to suspend/resume (on either OS) or, on Linux, to trigger an AMDGPU reset over SSH.

I might be wrong regarding the above, especially for Framework Laptops, as some AMD employees may well use a Framework laptop as their personal machine.

I would also be curious whether your Framework motherboard SKU is the same as mine (and maybe the hardware revision too, but I don’t know if there were multiple ones; someone from Framework could probably chime in to let us know) :

I have the following :

sudo dmidecode | grep -i sku
SKU Number: FRANDGCP07

Note regarding the kernel oops, just FYI : https://unix.stackexchange.com/questions/91854/whats-the-difference-between-a-kernel-oops-and-a-kernel-panic

rozap · October 1, 2025, 4:04pm

So, good news. I was previously getting maybe 5-10 crashes per day on kernel 6.14. I tried a few other kernels with no luck. This is all with dcdebugmask=0x10 and mesa 25.0.4. But I installed 6.15.10-061510-generic a few days ago and haven’t had a crash since then. Fingers crossed, but it seems that may have fixed it.

Bios and SKU

chris@gribl:~$ sudo dmidecode -s bios-version
03.03
chris@gribl:~$ sudo dmidecode  | grep -i sku
	SKU Number: FRANVECP07
	SKU Number: FRANVECP07

(very abbreviated) Logs from one of the previous crashes

Sep 22 21:03:19 gribl systemd-resolved[1297]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.1.1.
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=378562, emitted seq=378563
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: Process information: process RDD Process pid 4294 thread browser 4 :cs0 pid 6729
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 22 21:04:49 gribl kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000F00000).
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Sep 22 21:04:49 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Sep 22 21:04:49 gribl kernel: [drm] DMUB hardware initialized: version=0x09001200
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 1 on hub 8
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: ring vpe uses VM inv eng 4 on hub 8
Sep 22 21:04:50 gribl kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
Sep 22 21:04:50 gribl kernel: ------------[ cut here ]------------
Sep 22 21:04:50 gribl kernel: WARNING: CPU: 4 PID: 24057 at amd/amdgpu/../display/dc/dml2/dml2_translation_helper.c:996 map_dc_state_into_dml_display_cfg+0x1630/0x3940 [amdgpu]
Sep 22 21:04:50 gribl kernel: Modules linked in: ccm snd_seq_dummy snd_hrtimer rfcomm cmac algif_hash algif_skcipher af_alg qrtr bnep snd_ctl_led binfmt_misc nls_iso8859_1 amd_atl intel_rapl_msr intel_rapl_common snd_acp_legacy_mach snd_acp_mach snd_soc_nau8821 snd_acp3x_rn snd_acp70 snd_acp_i2s snd_soc_dmic snd_acp_pdm snd_acp_pcm snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_hda_codec_realtek snd_sof_amd_acp snd_sof_pci snd_hda_codec_generic snd_sof_xtensa_dsp edac_mce_amd leds_cros_ec snd_hda_scodec_component snd_sof cros_ec_chardev cros_kbd_led_backlight led_class_multicolor cros_ec_hwmon cros_ec_sysfs cros_ec_debugfs snd_sof_utils gpio_cros_ec snd_hda_codec_hdmi snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd snd_hda_intel soundwire_generic_allocation kvm_amd snd_intel_dspcfg soundwire_bus snd_intel_sdw_acpi mt7925e spd5118 snd_soc_sdca cros_ec_dev snd_hda_codec mt7925_common snd_usb_audio snd_soc_core btusb kvm mt792x_lib hid_sensor_als btrtl mt76_connac_lib
Sep 22 21:04:50 gribl kernel:  snd_hda_core hid_sensor_trigger snd_usbmidi_lib snd_compress btintel industrialio_triggered_buffer ac97_bus snd_hwdep mt76 kfifo_buf btbcm snd_ump irqbypass snd_pcm_dmaengine hid_sensor_iio_common snd_rpl_pci_acp6x btmtk rapl industrialio uvcvideo bluetooth mac80211 snd_seq_midi videobuf2_vmalloc uvc snd_seq_midi_event snd_acp_pci videobuf2_memops videobuf2_v4l2 snd_acp_legacy_common wmi_bmof snd_rawmidi snd_pci_acp6x snd_seq videobuf2_common i2c_piix4 k10temp snd_pcm i2c_smbus videodev snd_seq_device snd_pci_acp5x mc snd_timer cfg80211 amd_pmf snd_rn_pci_acp3x snd_acp_config amdxdna amdtee gpu_sched snd libarc4 snd_soc_acpi amd_sfh soundcore ccp snd_pci_acp3x tee platform_profile cros_ec_lpcs cros_ec amd_pmc joydev input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_logitech_hidpp hid_logitech_dj uas usb_storage usbhid amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) drm_exec drm_suballoc_helper
Sep 22 21:04:50 gribl kernel:  amd_sched(OE) amdkcl(OE) drm_display_helper nvme cec nvme_core rc_core polyval_clmulni i2c_algo_bit polyval_generic ucsi_acpi drm_ttm_helper ghash_clmulni_intel hid_multitouch hid_sensor_hub hid_generic thunderbolt sha256_ssse3 typec_ucsi sha1_ssse3 serio_raw ttm nvme_auth typec video i2c_hid_acpi i2c_hid wmi hid aesni_intel crypto_simd cryptd
Sep 22 21:04:50 gribl kernel: CPU: 4 UID: 0 PID: 24057 Comm: kworker/u64:9 Tainted: G           OE      6.14.11-061411-generic #202506101206
Sep 22 21:04:50 gribl kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Sep 22 21:04:50 gribl kernel: Hardware name: Framework Laptop 13 (AMD Ryzen AI 300 Series)/FRANMGCP07, BIOS 03.03 03/10/2025
Sep 22 21:04:50 gribl kernel: Workqueue: events_unbound commit_work
Sep 22 21:04:50 gribl kernel: RIP: 0010:map_dc_state_into_dml_display_cfg+0x1630/0x3940 [amdgpu]
Sep 22 21:04:50 gribl kernel: Code: 01 48 83 bc 24 b8 00 00 00 06 0f 83 6b 0b 00 00 48 8b 44 24 48 8b bc 24 b0 00 00 00 39 78 38 0f 8f eb f6 ff ff e9 d7 f5 ff ff <0f> 0b e9 c0 fb ff ff 41 83 fe 13 0f 87 ed 02 00 00 41 83 fe 11 0f
Sep 22 21:04:50 gribl kernel: RSP: 0018:ffffd115c7c8f630 EFLAGS: 00010246
Sep 22 21:04:50 gribl kernel: RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff8a44e4406128
Sep 22 21:04:50 gribl kernel: RDX: ffff8a44e44062f8 RSI: 0000000000000005 RDI: ffff8a44e44072b8
Sep 22 21:04:50 gribl kernel: RBP: ffffd115c7c8f728 R08: ffff8a44a1899a28 R09: ffff8a443a2c0000
Sep 22 21:04:50 gribl kernel: R10: 0000000000000000 R11: ffff8a44a1899a28 R12: ffff8a44a1899a28
Sep 22 21:04:50 gribl kernel: R13: 0000000000000000 R14: ffff8a415cd4e800 R15: ffff8a44a1890000
Sep 22 21:04:50 gribl kernel: FS:  0000000000000000(0000) GS:ffff8a505e400000(0000) knlGS:0000000000000000
Sep 22 21:04:50 gribl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 22 21:04:50 gribl kernel: CR2: 00007fdd9157cfc0 CR3: 00000001b0811000 CR4: 0000000000f50ef0
Sep 22 21:04:50 gribl kernel: PKRU: 55555554
Sep 22 21:04:50 gribl kernel: Call Trace:
Sep 22 21:04:50 gribl kernel:  <TASK>
Sep 22 21:04:50 gribl kernel:  dml_mode_support_wrapper+0x146/0x1010 [amdgpu]
Sep 22 21:04:50 gribl kernel:  ? resource_build_scaling_params+0x6ca/0xa20 [amdgpu]
Sep 22 21:04:50 gribl kernel:  call_dml_mode_support_and_programming+0x85/0x9f0 [amdgpu]
Sep 22 21:04:50 gribl kernel:  ? __lruvec_stat_mod_folio+0xc7/0xf0
Sep 22 21:04:50 gribl kernel:  dml2_validate+0x172/0x720 [amdgpu]
Sep 22 21:04:50 gribl kernel:  dcn35_validate_bandwidth+0x37/0x90 [amdgpu]
Sep 22 21:04:50 gribl kernel:  update_planes_and_stream_state+0x25f/0x5d0 [amdgpu]
Sep 22 21:04:50 gribl kernel:  update_planes_and_stream_v2+0x494/0x790 [amdgpu]
Sep 22 21:04:50 gribl kernel:  dc_update_planes_and_stream+0x84/0x120 [amdgpu]
Sep 22 21:04:50 gribl kernel:  amdgpu_dm_atomic_commit_tail+0x19d5/0x42c0 [amdgpu]
Sep 22 21:04:50 gribl kernel:  ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10 [amdgpu]
Sep 22 21:04:50 gribl kernel:  ? amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu]
Sep 22 21:04:50 gribl kernel:  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x161/0x3b0
Sep 22 21:04:50 gribl kernel:  ? __wait_for_common+0x158/0x190
Sep 22 21:04:50 gribl kernel:  ? __pfx_schedule_timeout+0x10/0x10
Sep 22 21:04:50 gribl kernel:  ? drm_crtc_get_last_vbltimestamp+0x54/0x90
Sep 22 21:04:50 gribl kernel:  ? wait_for_completion_timeout+0x1d/0x30
Sep 22 21:04:50 gribl kernel:  commit_tail+0xca/0x1b0
Sep 22 21:04:50 gribl kernel:  ? __schedule+0x2ca/0x660
Sep 22 21:04:50 gribl kernel:  commit_work+0x12/0x20
Sep 22 21:04:50 gribl kernel:  process_one_work+0x174/0x350
Sep 22 21:04:50 gribl kernel:  worker_thread+0x34a/0x480
Sep 22 21:04:50 gribl kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
Sep 22 21:04:50 gribl kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 22 21:04:50 gribl kernel:  kthread+0xf9/0x230
Sep 22 21:04:50 gribl kernel:  ? __pfx_kthread+0x10/0x10
Sep 22 21:04:50 gribl kernel:  ret_from_fork+0x44/0x70
Sep 22 21:04:50 gribl kernel:  ? __pfx_kthread+0x10/0x10
Sep 22 21:04:50 gribl kernel:  ret_from_fork_asm+0x1a/0x30
Sep 22 21:04:50 gribl kernel:  </TASK>
Sep 22 21:04:50 gribl kernel: ---[ end trace 0000000000000000 ]---
...snip....
Sep 22 21:05:00 gribl kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* [CRTC:89:crtc-1] flip_done timed out

AkechiShiro · October 9, 2025, 10:22am

Maybe you should upgrade to the latest BIOS (or 3.05 at least) and attempt to reproduce, because I’m not running into this issue on my end with amdgpu.dcdebugmask set to 0x10 or 0x12 on kernel 6.17.X, nor did I run into it on kernel 6.14.X.

lroeper · January 6, 2026, 5:12pm

Hey there,

I’m experiencing the same issue (as described in the first post), currently on Fedora 43 with kernel 6.17.12, using a Framework Laptop 13 (AMD Ryzen 7040) on BIOS v03.17 with 32 GB of RAM. I’m not using any custom kernel parameters (and not really planning to on this system).

I experienced this issue with and without my Dell WD22TB4 dock connected (with a Dell display attached to it via DP), but especially with the dock and display connected I observed the following behavior:

behavior description

- The internal display freezes completely while the external display lags heavily for a few seconds
- The kernel/amdgpu driver writes messages like
  1. kernel: amdgpu 0000:c1:00.0: [drm] ERROR dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
  2. kernel: amdgpu 0000:c1:00.0: [drm] ERROR [CRTC:80:crtc-0] flip_done timed out
- Other Applications crash or report display errors
  1. gnome-shell[3624]: meta_wayland_surface_role_notify_subsurface_state_changed: assertion ‘klass->notify_subsurface_state_changed’ failed
  2. systemd-coredump[93154]: Process 4087 (gnome-software) of user 60456 terminated abnormally with signal 6/ABRT, processing…
  3. and others
- The internal display stays frozen, the external display recovers but shell elements are missing (most likely cannot be drawn), not-crashed applications continue to work
- After about a minute the external display lags heavily again for a few seconds and the same kernel messages reappear, some applications may crash again
- The system now stays in this state, (re)connecting displays causes them to no longer work and at some point you try to access something (for ex. a shell auth prompt) that no longer works
- Resetting the amdgpu driver using /sys/class/drm/cardX/device/reset is confirmed by the kernel (kernel: amdgpu 0000:c1:00.0: resetting and kernel: amdgpu 0000:c1:00.0: reset done) but does not change the situation
- Performing a clean shutdown is not possible, even with commandline access as the system never fully shuts down

Without the Dell dock connected the behavior is essentially the same (including the kernel messages) but as there is no external display the internal display just freezes and only a hard reset is possible.

The issue seems to appear about once a day or every few days currently and for me it’s not connected to hardware accelerated video playback (like in a browser). This install (storage and ram) was inside a Framework 16 Laptop (First Gen) until a few days ago and didn’t have this issue then.

Again, I do not have any custom kernel command-line parameters.
However, I find it odd that I do not get a kernel oops or panic from the amdgpu issues. Or are the kernel command-line parameters mentioned above required to see these messages?

I’m currently not really inclined to manually switch kernel versions or reinstall the system but are there any easy ways to further debug this issue?

Thanks in advance

AkechiShiro · January 11, 2026, 2:23pm

Hey,

I cannot help with debugging, but you can file a bug report upstream with the AMDGPU driver developers.

Mind that your bug may already be reported, so please check first. If you wish to work around your issue, the recommended workaround is to add amdgpu.dcdebugmask=0x10 to your kernel cmdline (this disables PSR, Panel Self Refresh); the driver is much, much more stable with it.
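
After adding the workaround and rebooting, it is worth confirming that the kernel really picked it up. A minimal sketch, assuming the parameter is spelled amdgpu.dcdebugmask as in the upstream amdgpu module parameters and that bit 0x10 is the PSR bit. The function name find_dcdebugmask is hypothetical.

```python
from pathlib import Path

PSR_BIT = 0x10  # assumption: bit 0x10 disables PSR, as stated in this thread


def find_dcdebugmask(cmdline: str):
    """Return the amdgpu.dcdebugmask value found in a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "amdgpu.dcdebugmask":
            try:
                value = int(raw, 0)  # accepts 0x10 as well as 16
            except ValueError:
                continue  # ignore malformed values
    return value


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        mask = find_dcdebugmask(cmdline_file.read_text())
        if mask is None:
            print("amdgpu.dcdebugmask is not set on the kernel command line")
        else:
            print(f"dcdebugmask={mask:#x}, PSR disabled: {bool(mask & PSR_BIT)}")
```

If the parameter does not show up here, the bootloader configuration was probably not regenerated after editing it.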

I haven’t run into any issue after the reported one so far and I’m daily driving kernel 6.18.1 currently.

I’m a bit sad that this issue is still happening under the latest kernel and one of the latest BIOSes. Judging by the number of issues currently in the issue tracker, it seems to be pretty hard to fix properly once and for all in the AMDGPU drivers.

Whether you get logs depends on how badly your laptop froze during the issue: if it is completely frozen, the logs may not have been written to disk and thus are not available after a forced reboot. You can try enabling an SSH server to see if you can still get into the system even though the GPU driver has crashed and the laptop seems frozen.

But IMO you should have logs in dmesg. If you don’t, there are ways to acquire further logs, but AMD driver engineers will be better able to help you, as I don’t have much experience debugging such issues.
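
If the logs did make it to disk, a small script can tell you how often the messages from this thread show up, which is handy to attach to a bug report. This is only a rough sketch: the three patterns are copied from the log lines posted in this thread, and the names SIGNATURES and count_signatures are made up for the example.

```python
import re
from collections import Counter

# Patterns taken from the kernel and compositor messages quoted in this thread.
SIGNATURES = {
    "dmcub_error": re.compile(r"DMCUB error"),
    "flip_done_timeout": re.compile(r"flip_done timed out"),
    "pageflip_timeout": re.compile(r"Pageflip timed out"),
}


def count_signatures(lines):
    """Count how many log lines match each known amdgpu failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Two lines as posted earlier in the thread, used as a small self-check.
SAMPLE = [
    "kernel: amdgpu 0000:c1:00.0: [drm] ERROR dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
    "kernel: amdgpu 0000:c1:00.0: [drm] ERROR [CRTC:80:crtc-0] flip_done timed out",
]

if __name__ == "__main__":
    print(dict(count_signatures(SAMPLE)))
```

To use it on real data, save the kernel messages of the previous boot to a file (for example with journalctl -k -b -1) and pass the opened file to count_signatures instead of SAMPLE.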

Lukas_Eickhoff · February 21, 2026, 9:26pm

Hey, I think I have the same issue. Pretty randomly my screen just freezes. The logs show: gnome-shell[2830]: g_source_get_id: assertion ‘source->context != NULL’ failed

and later

amdgpu 0000:c1:00.0: [drm] ERROR dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
kernel: amdgpu 0000:c1:00.0: [drm] ERROR dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
kernel: amdgpu 0000:c1:00.0: [drm] ERROR [CRTC:80:crtc-0] flip_done timed out

I am on Ubuntu 24.04 with BIOS 3.18, the latest firmware installed, and kernel Linux 6.17.0-14-generic, on a Framework Laptop 13 AMD Ryzen 7040 Series.

I will try the boot parameter. But is there any information about extra power consumption?

Edit: I had ASPM turned off before because of SSD issues. Since I turned it back on and disabled it only for the SSD, the issue mentioned above appeared. Is ASPM so buggy, or is this just bad luck?
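
To rule out confusion about which ASPM policy is actually active when comparing before and after, the state can be read from sysfs. A minimal sketch, assuming the kernel exposes /sys/module/pcie_aspm/parameters/policy and marks the active policy with square brackets. The helper name active_aspm_policy is hypothetical, and this only reports the global policy, not a per-device override such as the one for the SSD.

```python
from pathlib import Path

# Assumption: the kernel lists all policies here and brackets the active one,
# e.g. "default performance [powersave] powersupersave".
ASPM_POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text: str):
    """Return the bracketed (active) policy from the sysfs policy string, or None."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    if ASPM_POLICY_FILE.exists():
        print("active ASPM policy:", active_aspm_policy(ASPM_POLICY_FILE.read_text()))
    else:
        print("ASPM policy file not found on this system")
```

Noting the reported policy next to each freeze makes it easier to tell whether ASPM is related or whether it was just bad luck.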
