Amdgpu Error queueing DMUB command: status=2 when waking from suspend

Is there any ETA on getting this fixed?

I would consider this sort of issue to be high priority. It’s not good for a laptop to become unusable like this. What if this happens in the middle of a critical business presentation or a remote job interview?

@Vadim_Peretokin, this is an issue with the AMD Linux driver, which is provided by the kernel, not by Framework. You have a few choices. Supposedly the newer AMD drivers have fixes that alleviate this issue, so you can wait for your kernel to pick up the latest driver version; depending on the Linux flavor you use, this can take a long time. Another option is to roll your own kernel and bring in the latest AMD driver. But there isn’t much for Framework to do here.

Personally, I removed the external GPU on my Framework and that seemed to help. People have also turned off auto-brightness, which helps a lot. There is also a way to mitigate the problem by resetting the driver when the issue pops up.

As a mitigation until a fix arrives, you can add this to the kernel command line:

amdgpu.dcdebugmask=0x210

which disables PSR and PSR-SU within amdgpu.

Here ( linux 6.11.0 ), lines 254-267 list the other flags you can enable or disable.
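For anyone wondering where 0x210 comes from: the value is a bitmask, with one bit per flag. Below is a minimal sketch of the arithmetic. The two bit values are an assumption based on how the DC debug mask flags are defined around Linux 6.11, so check them against the header in your own kernel tree, because the list changes between releases. The function names are made up for illustration.

```shell
# Assumed bit values for the amdgpu DC debug mask (Linux 6.11 era).
# Verify against your own kernel source before relying on them.
DC_DISABLE_PSR=0x10
DC_DISABLE_PSR_SU=0x200

# OR the two bits together to get the value passed to amdgpu.dcdebugmask.
psr_off_mask() {
    printf '0x%x\n' "$(( ${DC_DISABLE_PSR} | ${DC_DISABLE_PSR_SU} ))"
}

# Succeeds if the given bit is set in the mask, e.g.: mask_has_bit 0x210 0x10
mask_has_bit() {
    [ "$(( $1 & $2 ))" -ne 0 ]
}
```

Calling psr_off_mask prints the combined value, and mask_has_bit can be used to check whether an existing mask already includes a given flag before adding another one.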

Thanks! What does one lose in practice by disabling it?

I’ve noticed power usage is slightly higher ( roughly 0.8-1.2 W ).

I’m using Gentoo with KDE Plasma 6.1.4 on Wayland, with this linux firmware commit and BIOS 3.04beta.

I’ve been experiencing this PSR issue since Linux 6.7.x ( I don’t remember it on 6.6.x ) and have used 6.8.x, 6.9.x and 6.10.x.

I decided to try Linux 6.11.0 sooner

Unfortunately the PSR issue is still there, and I’m sure that any application using Xwayland ( on Linux 6.10.x and now 6.11.0 ) has a higher chance of triggering it, especially when it comes to video ( youtube, reddit, twitch ).

I’m having to disable Wayland in Firefox because of high CPU usage.

The PSR issue was also being triggered by JetBrains editors until I enabled Wayland in them, after which the PSR issues reduced significantly.

There has been some improvement in how the PSR issue behaves: the screen now updates more frequently ( every 0.5-1.0 seconds ) instead of every 3-5 seconds ( closer to 3 ).

I’ve tried the following command from the KDE Plasma desktop, and again after logging out to the TTY:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

Both times the screen goes black ( turns off, I guess ) and never comes back on, so I have to cold reboot ( hold down the power button for about 8 seconds, then turn it back on ).

and I have the kernel command line parameter

amdgpu.gpu_recovery=1

set too. I think this is required for amdgpu_gpu_recover to work.

The kernel log ( a different error from the one on Linux 6.10 ) was showing a lot of these ( though not as many as on Linux 6.10 ):

amdgpu 0000:c2:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I’m wondering if there is a log stored somewhere

and one time I saw ( in the middle of the previous error messages )

amdgpu 0000:c2:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out

and another time I saw ( again in the middle of the previous error messages )

amdgpu 0000:c2:00.0: [drm] enabling link 0 failed: 15

When Linux 6.12-rc1 is released ( 29th/30th September maybe? ), I think I’ll try that as it will have Panel Replay ( I’m interpreting it as a better alternative to PSR ) enabled and see how that goes.

So further up in the thread some people were hinting that auto-brightness might be part of what triggers this.
I was suddenly having this issue multiple times per week, starting roughly around when I enabled auto-brightness in GNOME.
I turned it off about 1.5 weeks ago and it hasn’t happened a single time since.

Could of course be completely coincidental, but worth trying for anyone who is really disrupted when this happens :person_shrugging:

Yeah I’ve mentioned that too earlier in this thread

Well, after fading into the background for a month or two, this came back. I doubt I’ll see it again for another two months.

It was the same for me: it was happening about 1-2 times a month, but since Linux 6.10.10 ( I’m now using 6.11.0 ) it’s been happening multiple times a day ( 2-4 ), so I just disabled PSR for now using the kernel parameter

amdgpu.dcdebugmask=0x210

which disables PSR and PSR-SU ( although the FW16 display supports PSR-SU, amdgpu is not using it ).
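To confirm the parameter actually made it onto the running kernel’s command line, something like the following should work. It is only a sketch: the function name cmdline_value is made up for illustration, and the optional second argument exists purely so the function can be pointed at a test file instead of /proc/cmdline.

```shell
# Print the value of a key=value kernel command line parameter.
# $1 = parameter name, $2 = optional file to read (defaults to /proc/cmdline).
cmdline_value() {
    cmdline_file="${2:-/proc/cmdline}"
    # Split the command line into one parameter per line, then print the
    # part after "name=" for the parameter whose name matches exactly.
    tr ' ' '\n' < "$cmdline_file" |
        awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# Usage:
#   cmdline_value amdgpu.dcdebugmask    # prints nothing if the parameter is absent
```

If this prints nothing after a reboot, the bootloader configuration probably was not regenerated, which is a distribution-specific step.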

I just got hit with this for the second time, and had to reboot the first time. I’m running Fedora 40 KDE spin on kernel 6.10.11, on a 7840HS system with no dGPU at all, and no support for auto brightness in KDE (to the best of my knowledge). The suggested amdgpu_gpu_recover file causes the screen to go black and never return, forcing a hard reboot. I have the exact same errors as @sinatosk above, but this only started for me on this kernel version (I’ve yet to try downgrading). Increased power usage is already somewhat unacceptable for me as I’m trying to ensure the longest battery life possible right now, so the PSR and PSR-SU disable kernel command line isn’t a fix for me.

Ok, I was so bummed out that my expensive laptop purchase has this issue because other than that I love this laptop!

My experience/information so far:

I don’t have a discrete GPU and I don’t often use suspend. This happens while the system is running.

At very random times (I cannot pinpoint it to high demand times, it’s literally just at random moments while I’m using the laptop) the system goes into a state where everything is slowed down, I get 1 FPS while moving the cursor or typing or anything like that, 2-3 seconds delay between typing something and it appearing on the screen. I switch to TTY3 and try closing the display manager, seeing CPU usage (not high at all, it’s very low, so it’s not the CPU being stressed). And my dmesg is flooded with such messages:

Οκτ 02 23:55:17 framework kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I can open programs but everything is slow af. If I try to open an mp3 with mpv while in the TTY, it will take its sweet time to start playing, but once it opens after like a minute of waiting, the sound plays normally without distortion.

A lot of searching later.

I come upon this thread. :slight_smile:
I try running the command mentioned in here while I’m in the TTY:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

The screen goes black and never comes back. I’m forced to REISUB.
(Side note: this took me a long time to find, but you can do SysRq on the Framework 16 by pressing Fn + Alt + F11 together; then keep holding Alt while you successively type R E I S U B. For the magic combo to work you’ll also need sysrq_always_enabled=1 in your kernel parameters.)

I reboot.
I see the post above suggesting that the gpu_recovery option of the amdgpu driver might need to be set to 1 for the recover command to work properly.

I make the file /etc/modprobe.d/amdgpu.conf:

options amdgpu gpu_recovery=1

Regenerate initramfs.
Reboot again.
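After the reboot it is worth checking that the option was actually picked up before relying on it. A small sketch, assuming the driver exposes its parameters under /sys/module/amdgpu/parameters as loaded modules normally do; the function name amdgpu_param and its optional second argument are made up for illustration (the second argument only exists so the function can be tested against a fake directory).

```shell
# Print the current value of an amdgpu module parameter.
# $1 = parameter name, $2 = optional parameter directory (defaults to sysfs).
amdgpu_param() {
    param_dir="${2:-/sys/module/amdgpu/parameters}"
    if [ -r "$param_dir/$1" ]; then
        cat "$param_dir/$1"
    else
        echo "parameter $1 not found in $param_dir" >&2
        return 1
    fi
}

# Usage:
#   amdgpu_param gpu_recovery    # expect 1 if the modprobe.d option applied
#   amdgpu_param dcdebugmask     # printed in decimal, so 0x210 shows up as 528
```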

Now I try the recover command again. This time it works. The screen goes black and comes back again. But I have not yet tested it in a real-world freeze scenario.

I will make a bash alias for this command so that I can run it easily the next time the semi-freeze happens
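Such an alias could look roughly like this in ~/.bashrc. A shell function is used instead of a plain alias so the DRI index can be passed as an argument; the names amdgpu_recover and amdgpu_recover_path are made up for this sketch, and the default index of 1 is simply what worked in this post, not a universal value.

```shell
# Build the path of the debugfs recover file for a given DRI index.
# $1 = DRI index (defaults to 1, which may be wrong on other machines).
amdgpu_recover_path() {
    printf '/sys/kernel/debug/dri/%s/amdgpu_gpu_recover\n' "${1:-1}"
}

# Trigger a GPU reset by reading the recover file as root.
# The screen is expected to go black briefly while the driver resets.
amdgpu_recover() {
    sudo cat "$(amdgpu_recover_path "$1")"
}

# Usage:
#   amdgpu_recover        # uses index 1
#   amdgpu_recover 2      # for setups where the GPU sits at index 2
```

Since the desktop is barely usable when the problem hits, having the short name ready to type in a TTY is the main point.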

I’ll also keep in mind the other workarounds like disabling PSR as @sinatosk mentioned. Really helpful stuff.

I’m relieved to not be alone in this.

I am missing something: following the above steps, the cat command simply returns 0.

Edit to add: with the dGPU installed, the dGPU is 1 and the other one is 2, so the command that worked for him was:

sudo cat /sys/kernel/debug/dri/2/amdgpu_gpu_recover
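Since the index differs between machines, it can help to list which DRI debugfs entries belong to amdgpu before picking one. This is a sketch under the assumption that each entry has a name file starting with the driver name followed by the PCI address; the function name amdgpu_dri_indices is made up, and the optional argument only exists for testing. debugfs is normally readable by root only, so run it from a root shell.

```shell
# List DRI debugfs entries that belong to the amdgpu driver.
# $1 = optional debugfs DRI root (defaults to /sys/kernel/debug/dri).
amdgpu_dri_indices() {
    dri_root="${1:-/sys/kernel/debug/dri}"
    for name_file in "$dri_root"/*/name; do
        # Skip entries we cannot read (or an unmatched glob).
        [ -r "$name_file" ] || continue
        if grep -q 'amdgpu' "$name_file"; then
            # Print "<index>: <contents of name file>" for each match.
            printf '%s: %s\n' "$(basename "$(dirname "$name_file")")" "$(cat "$name_file")"
        fi
    done
}
```

The PCI address in each line can then be matched against the address in the kernel error messages to decide which index to use for amdgpu_gpu_recover. More than one entry may show up for the same device.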

Yeah, before it was like 1-2 times a month, but for me since Linux 6.10.10 ( later upgraded to 6.11.0 ) this went up to 1-4 times a day, so I just disabled PSR and PSR-SU

amdgpu.dcdebugmask=0x210

I’ve been on 6.12-rc1 with PSR, PSR-SU and PR ( Panel Replay ) enabled, and all is good so far except this, but they ( AMD ) are working on it. I don’t do much gaming and I use my screen at 165Hz, so I’m only affected by this when something runs at lower than 70 fps.

==== offtopic
On 6.12-rc1, average power usage has dropped to the point where I’m getting 30-120 minutes more battery life.

Ok so this just happened while watching a video and the recover command worked. @sinatosk I might try your debugmask config too.

Is this bug we’re experiencing known to the kernel devs?

EDIT: Seems like they are, judging by "AMDGPU crash Error queuing DMUB command: status=2, Error waiting for DMUB idle: status=3" (#2862) on the drm / amd GitLab issue tracker, and similar issues.

So you’re saying if I run 6.12 I don’t need to disable PSR?

No, I’m saying

and running 6.12 as a release candidate is riskier, but I’m taking that risk. It’s better than having to deal with the crap bugs while waiting for AMD to fix them, and I can give them feedback instead ( I should have done RC sooner… )

Yeah I see the issue you mention but that only happens for under 70 fps.

so for normal operation on 6.12 you don’t have the freeze happening?

I’ve had 6.12 for 1 1/2 weeks now and I also think the issue hasn’t appeared since.

I think it’s too soon to tell but usually by now I would have seen the issue ( especially since 6.10.10 ).

Since 6.12-rc1 ( using 6.12-rc2 now ), the issue hasn’t shown up, and I’ve had PSR/PSR-SU, PR and IPS enabled the whole time.

Since 6.12 ( and 6.11.3; @Mario_Limonciello says it’s been enabled there too ), AMD have enabled PR again. PSR and PR are mutually exclusive and PR has higher priority.

I’ll give it until the end of rc4/rc5 before I consider this issue gone, as it used to happen to me at least once a month; that still leaves time before 6.12 becomes stable.

But I’m hoping AMD can fix that other issue I and someone else reported before 6.12.0; then I would say 6.12.0 is gonna be awesome ( at least for FW13/16 users ).

======== offtopic
I’m loving this lower average power consumption. I’m seeing 5.9-6.0 W in the evening ( screen brightness between 1-5% ) with ABM set to 2 and the monitor at 165Hz.

JFYI, panel replay is also enabled in 6.11.3

Huh. This just happened to me again on 6.12-rc2 :frowning:

This time was different.
The screen went black and I could not do anything, no tty change, no commands.
REISUB.
I ran journalctl -b -1 and towards the end I see the familiar
amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
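To get a rough idea of how bad a previous boot was, the kernel messages from that boot can be filtered for the error strings quoted in this thread. A minimal sketch; the function name count_dmub_errors is made up, and the pattern list only covers the three messages that appear above, so other related errors will not be counted.

```shell
# Count amdgpu display error lines in kernel log text read from stdin.
# Only the messages quoted in this thread are matched.
count_dmub_errors() {
    # grep -c exits non-zero when the count is 0, so mask that with "|| true".
    grep -c -E 'DMCUB error|Error queueing DMUB command|flip_done timed out' || true
}

# Usage (kernel messages from the previous boot):
#   journalctl -k -b -1 | count_dmub_errors
```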

So I will be disabling PSR again.