Amdgpu Error queueing DMUB command: status=2 when waking from suspend

I’m using Gentoo with KDE Plasma 6.1.4 Wayland with this linux firmware commit with BIOS 3.04beta

I’ve been experiencing this PSR issue since Linux 6.7.x ( I don’t remember 6.6.x ) and used 6.8.x, 6.9.x, 6.10.x

I decided to try Linux 6.11.0 sooner

Unfortunately the PSR issue is still there and I’m sure that any application that’s using Xwayland ( Linux 6.10.x and now 6.11.0 ) has a higher chance of triggering the PSR issue, especially when it comes to video ( youtube, reddit, twitch )

I’m having to disable Wayland usage in Firefox because of high CPU usage.

The PSR issue was also being triggered when using JetBrains editors until I enabled Wayland use in them and the PSR issues reduced significantly.

There have been some improvements during the PSR issue: the screen updates more frequently ( every 0.5-1.0 second(s) ) instead of every 3-5 seconds ( closer to 3 ).

I’ve tried the following command when on the KDE Plasma desktop and again logging out to the TTY

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

and both times the screen goes black ( turns off I guess ) but never comes back on, so I’m having to cold reboot ( holding down power button for about 8 seconds and then turn back on again )

and I have the kernel command line parameter

amdgpu.gpu_recovery=1

set too. I think this is required for the amdgpu_gpu_recover to work
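One way to confirm which value the driver actually picked up is to read the module parameter back from sysfs. This is a minimal sketch, assuming the amdgpu module is loaded and exposes its parameters under /sys/module/amdgpu/parameters; the function name is made up for illustration.

```shell
#!/bin/sh
# show_gpu_recovery is a hypothetical helper name, not part of any tool.
# It prints the gpu_recovery value the loaded amdgpu module reports.
# The path can be overridden with $1, which is handy for trying it out
# on a machine without amdgpu.
show_gpu_recovery() {
    param_file="${1:-/sys/module/amdgpu/parameters/gpu_recovery}"
    if [ -r "$param_file" ]; then
        printf 'gpu_recovery=%s\n' "$(cat "$param_file")"
    else
        # Module not loaded, or the parameter is not exposed on this kernel.
        printf 'gpu_recovery=unknown\n'
    fi
}

show_gpu_recovery
```

If this prints gpu_recovery=1, the kernel command line parameter was applied.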

The kernel log ( a different error to Linux 6.10 ) was showing a lot of the following ( though not as many as on Linux 6.10 )

amdgpu 0000:c2:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I’m wondering if there is a log stored somewhere

and one time I see ( in middle of previous error messages )

amdgpu 0000:c2:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out

and one time I see ( in middle of previous error messages )

amdgpu 0000:c2:00.0: [drm] enabling link 0 failed: 15

When Linux 6.12-rc1 is released ( 29th/30th September maybe? ), I think I’ll try that as it will have Panel Replay ( I’m interpreting it as a better alternative to PSR ) enabled and see how that goes.

So further up in the thread some people were hinting at that maybe the auto-brightness might be part of what ends up triggering this.
I was having this issue suddenly multiple times per week, which was probably roughly around when I enabled auto-brightness in gnome.
I turned it off about 1.5 weeks ago and it hasn’t happened a single time since.

Could of course be completely incidental, but worth trying for anyone that might be really disrupted by when this happens :person_shrugging:

Yeah I’ve mentioned that too earlier in this thread

Well, after fading into the background for a month or two this came back. I doubt I’ll see it again for another two months.

It was the same for me, happening about 1-2 times a month, but since Linux 6.10.10 ( I’m now using 6.11.0 ) it’s been happening to me multiple times a day ( 2-4 ), so I just disabled PSR for now using the kernel parameter

amdgpu.dcdebugmask=0x210

which disables PSR and PSR-SU ( although the FW16 display supports PSR-SU, amdgpu is not using it ).
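The value 0x210 is a bitmask, so it can be read as two flags OR-ed together. A small sketch of the arithmetic, assuming 0x10 is the "disable PSR" bit and 0x200 is the "disable PSR-SU" bit, which is what the description above implies; the variable and function names are made up for illustration.

```shell
#!/bin/sh
# Assumed bit values, inferred from the post (0x210 = PSR + PSR-SU disabled).
PSR_BIT=0x10
PSR_SU_BIT=0x200

# mask_has_bit MASK BIT: prints "set" if BIT is present in MASK, else "clear".
mask_has_bit() {
    if [ $(( $1 & $2 )) -eq $(( $2 )) ]; then
        echo set
    else
        echo clear
    fi
}

# OR the two flags together and print the resulting kernel parameter.
combined=$(printf '0x%x' $(( $PSR_BIT | $PSR_SU_BIT )))
echo "amdgpu.dcdebugmask=$combined"
```

The same helper can be used to check what an existing mask turns off, for example mask_has_bit 0x210 0x10.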

I just got hit with this for the second time, and had to reboot the first time. I’m running Fedora 40 KDE spin on kernel 6.10.11, on a 7840HS system with no dGPU at all, and no support for auto brightness in KDE (to the best of my knowledge). The suggested amdgpu_gpu_recover file causes the screen to go black and never return, forcing a hard reboot. I have the exact same errors as @sinatosk above, but this only started for me on this kernel version (I’ve yet to try downgrading). Increased power usage is already somewhat unacceptable for me as I’m trying to ensure the longest battery life possible right now, so the PSR and PSR-SU disable kernel command line isn’t a fix for me.

Ok, I was so bummed out that my expensive laptop purchase has this issue because other than that I love this laptop!

My experience/information so far:

I don’t have a discrete GPU and I don’t often use suspend. This happens while the system is running.

At very random times (I cannot pinpoint it to high demand times, it’s literally just at random moments while I’m using the laptop) the system goes into a state where everything is slowed down, I get 1 FPS while moving the cursor or typing or anything like that, 2-3 seconds delay between typing something and it appearing on the screen. I switch to TTY3 and try closing the display manager, seeing CPU usage (not high at all, it’s very low, so it’s not the CPU being stressed). And my dmesg is flooded with such messages:

Οκτ 02 23:55:17 framework kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I can open programs but everything is slow af. If I try to open an mp3 with mpv while in the TTY, it will take its sweet time to start playing, but once it opens after like a minute of waiting, the sound plays normally without distortion.

A lot of searching later.

I come upon this thread. :slight_smile:
I try running the command mentioned in here while I’m in the TTY:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

The screen goes black and never comes back. I’m forced to REISUB
(Side note: this took me a long time to find, but you can do SysRq on the Framework 16 by pressing Fn + Alt + F11 together. Then you can keep holding Alt while you successively type R E I S U B. Also, for the magic combo to work you’ll need to have sysrq_always_enabled=1 in your kernel parameters)
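To check whether a parameter such as sysrq_always_enabled actually made it onto the running kernel's command line, /proc/cmdline can be searched word by word. A minimal sketch; cmdline_has is a hypothetical helper name, and it matches the parameter with or without a value.

```shell
#!/bin/sh
# cmdline_has NAME [CMDLINE]
# Returns 0 if NAME (alone or as NAME=value) is one of the words in CMDLINE.
# CMDLINE defaults to the running kernel's /proc/cmdline.
cmdline_has() {
    wanted="$1"
    line="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    for word in $line; do
        case "$word" in
            "$wanted"|"$wanted"=*) return 0 ;;
        esac
    done
    return 1
}

if cmdline_has sysrq_always_enabled; then
    echo "sysrq_always_enabled is on the kernel command line"
else
    echo "sysrq_always_enabled is missing from the kernel command line"
fi
```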

I reboot.
I see the post above suggesting that gpu_recovery option of amdgpu driver might need to be set to 1 for the recover command to work properly.

I make the file /etc/modprobe.d/amdgpu.conf:

options amdgpu gpu_recovery=1

Regenerate initramfs.
Reboot again.

Now I try the recover command again. This time it works. The screen goes black and comes back again. But I have not yet tested it in the real world freeze scenario.

I will make a bash alias for this command so that I can run it easily the next time the semi-freeze happens
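A shell function is a bit more flexible than an alias here, because the DRI index is not the same on every machine. This is only a sketch: gpu_recover and gpu_recover_path are made-up names, and the default index of 1 is simply the one used in the command above.

```shell
#!/bin/sh
# gpu_recover_path [INDEX]: builds the debugfs path for a given DRI index.
# INDEX defaults to 1, matching the command used earlier in this post.
gpu_recover_path() {
    printf '/sys/kernel/debug/dri/%s/amdgpu_gpu_recover\n' "${1:-1}"
}

# gpu_recover [INDEX]: reads the file, which asks amdgpu to attempt a reset.
# Needs root because debugfs is normally only readable by root.
gpu_recover() {
    sudo cat "$(gpu_recover_path "$1")"
}

# Usage (not run automatically):
#   gpu_recover      # uses index 1
#   gpu_recover 2    # uses index 2
```

Putting these two functions in ~/.bashrc makes gpu_recover available from a TTY during the semi-freeze.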

I’ll also keep in mind the other workarounds like disabling PSR as @sinatosk mentioned. Really helpful stuff.

I’m relieved to not be alone in this.

I am missing something - following the above steps, the cat command simply returns 0.

Edit to add - with the dGPU installed it is 1, the other one is 2, so the command that worked for him was:

sudo cat /sys/kernel/debug/dri/2/amdgpu_gpu_recover
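Since the index differs between machines ( 1 for some, 2 for others ), it can help to list which DRI directories contain an amdgpu_gpu_recover file before reading one. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug; list_recover_nodes is a hypothetical name, and it has to be run as root to see anything on a real system.

```shell
#!/bin/sh
# list_recover_nodes [ROOT]
# Prints every amdgpu_gpu_recover file found one level below ROOT.
# ROOT defaults to /sys/kernel/debug/dri. This only lists the files,
# it does not read them, so it does not trigger a GPU reset.
list_recover_nodes() {
    root="${1:-/sys/kernel/debug/dri}"
    for f in "$root"/*/amdgpu_gpu_recover; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

list_recover_nodes
```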

Yeah, before it was like 1-2 times a month, but then for me, since Linux 6.10.10 ( later upgraded to 6.11.0 ), this went up to 1-4 times a day, so I just disabled PSR and PSR-SU

amdgpu.dcdebugmask=0x210

I’ve been on 6.12-rc1 with PSR, PSR-SU and PR ( Panel Replay ) enabled and all is good so far except this, but they’re ( AMD ) working on it. I don’t do much gaming and I use my screen at 165Hz, so I’m only affected by this when something runs at lower than 70 fps

==== offtopic
6.12-rc1 average power usage has dropped to where I’m getting 30-120 minutes more battery life

Ok so this just happened while watching a video and the recover command worked. @sinatosk I might try your debugmask config too.

Is this bug we’re experiencing known to the kernel devs?

EDIT: Seems like they are because of AMDGPU crash Error queuing DMUB command: status=2, Error waiting for DMUB idle: status=3 (#2862) · Issues · drm / amd · GitLab and similar issues

So you’re saying if I run 6.12 I don’t need to disable PSR?

No, I’m saying

and running 6.12 as a release candidate is riskier but I’m taking that risk, better than having to deal with the crap bugs waiting for AMD to fix it, I can instead give them feedback ( I should have done RC sooner… )

Yeah I see the issue you mention but that only happens for under 70 fps.

so for normal operation on 6.12 you don’t have the freeze happening?

I’ve had 6.12 for 1 1/2 weeks now and I also think the issue hasn’t appeared since.

I think it’s too soon to tell but usually by now I would have seen the issue ( especially since 6.10.10 ).

Since 6.12-rc1 ( using 6.12-rc2 now ), the issue hasn’t shown up, and I’ve had PSR/PSR-SU, PR and IPS enabled the whole time.

Since 6.12 ( and 6.11.3, @Mario_Limonciello says it’s been enabled there too ), AMD have enabled PR again. PSR and PR are mutually exclusive and PR has higher priority.

I’ll give it until the end of rc4/rc5 before I consider this issue gone, as it used to happen to me at least once a month, which still leaves time before 6.12 becomes stable

but I’m hoping AMD can fix that other issue I and someone else reported before 6.12.0, then I would say 6.12.0 is gonna be awesome ( for at least FW13/16 users )

======== offtopic
I’m loving this average lower power consumption. I’m seeing 5.9-6.0Wh in the evening ( screen brightness between 1-5% ) with ABM set to 2 and monitor at 165Hz

JFYI, panel replay is also enabled in 6.11.3

Huh. This just happened to me again on 6.12rc2 :frowning:

This time was different.
The screen went black and I could not do anything, no tty change, no commands.
REISUB.
journalctl -b -1 and towards the end I see the familiar
amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
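To get a rough idea of how often the error was logged during the previous boot, the kernel messages can be piped through a counter. A minimal sketch; count_dmcub_errors is a made-up helper name, and it simply counts lines containing the text "DMCUB error".

```shell
#!/bin/sh
# count_dmcub_errors: reads log text on stdin and prints how many lines
# contain "DMCUB error". grep -c exits non-zero when the count is 0,
# so "|| true" keeps the function's exit status at 0 either way.
count_dmcub_errors() {
    grep -c 'DMCUB error' || true
}

# Usage (kernel messages from the previous boot):
#   journalctl -k -b -1 | count_dmcub_errors
```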

So I will be disabling PSR again.

If you’re on Framework 16 and using Linux 6.11.3/6.12-rc1 or greater then PSR is already inactive, as amdgpu is using PR instead ( assuming you’ve not set the amdgpu.dcdebugmask kernel parameter )

I have interesting news. Looking at the log again, somehow I had booted to 6.11.2

That’s good news, means 6.12 still works.

I’ll find out why my mainline got nuked.

And you guys are saying we can use 6.11.3+ as well to avoid the problem

I missed it but yeah

So…

PSR - Panel Self Refresh
PSR2 ( PSR-SU ) - Panel Self Refresh 2 - only changed parts of the frame buffer get updated on the display - not sure what “SU” means, sub update maybe?
PR - Panel Replay

Using

I’ve been using 6.12 rc1-rc4 ( just updated to rc5 this morning but this message not accounting for that )

I’ve tried many times to trigger this issue

  • Firefox, switching tabs ( some with and without videos )
  • loading and watching videos ( VLC, MPV )
  • Using JetBrains RustRover ( this was where I first experienced this issue and happens less when using Wayland instead of Xwayland )
  • Changing screen brightness manually or automatically. I’ve written some code ( Rust ) that changes the screen brightness levels relative to the ambient light sensor
  • Changing the frequency ( in the code ) of brightness level adjustments ( currently capped at 1 millisecond )
  • Sometimes just starting KDE Plasma would cause the issue ( infuriating )

There are other things that I can somewhat trigger this but the most common ( for me ) was when using applications that made use of Xwayland

nothing so far has triggered the PSR issue for me and I’ve had PSR, PSR-SU ( I’m not quite sure if FW16 panel supports this ) and PR enabled since 6.12-rc1

Looking in /sys/kernel/debug/dri/0/eDP-1/ ( you may need to be a super user to read this directory ), the files that begin with psr_ ( for PSR ) show that it’s not enabled, while the files that begin with replay_ ( for PR ) show that it is enabled and in use ( either PSR or PR is enabled, not both ).
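A quick way to look at all of those files at once is to loop over them and print each name next to its contents. A minimal sketch, assuming the same directory as above ( the DRI index and connector name may differ on other machines ); dump_psr_replay is a hypothetical name and it only relies on the psr_ and replay_ prefixes, not on any particular file names.

```shell
#!/bin/sh
# dump_psr_replay [DIR]
# Prints "name: contents" for every readable psr_* and replay_* file in DIR.
# DIR defaults to the eDP-1 debugfs directory mentioned in the post.
dump_psr_replay() {
    dir="${1:-/sys/kernel/debug/dri/0/eDP-1}"
    for f in "$dir"/psr_* "$dir"/replay_*; do
        # Skips unmatched globs and files the current user cannot read.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
    return 0
}

# Usage (as root, since debugfs is normally root-only):
#   dump_psr_replay
```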

With all this said ( kinda wasteful me saying all this eh? ), it’s possible the PSR issues are still there as I’ve never disabled PR to see how PSR works as I’m preferring PR.

The only negative at the moment is this issue (link in previous post) but I don’t think it will be around for long. I’m hoping they ( AMD ) can fix it before 6.12 is stable and I think they’ll be able to backport those changes to 6.11.z

I’m now using 6.12-rc5 and intend to stay with the rest of 6.12 rc’s and into stable

============ off topic

  • Average battery life during usage ( based on my usage ) has increased by 30-90 minutes

  • I don’t remember if it was 6.11 or 6.12 this was added but you can now see the temperatures of each memory module

  • I’ve not tried battery controlling yet ( 6.11 or 6.12 I don’t remember ), some others have