Amdgpu Error queueing DMUB command: status=2 when waking from suspend

I also have the problem, but what’s crazy is that my second screen, which is plugged directly into the GPU, works fine (there’s just some latency when the cursor moves back to the Intel screen).
Well, I made my second fw_ alias to try sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover. Thanks!

Can anyone who no longer has auto-brightness turned on confirm that it “solves” the problem?
Or does it still happen?

I’ve written some code to auto-adjust the brightness, and I’ve also been regularly updating the amdgpu firmware (I’m now on this commit).

Over the last 4-5 updates, without changing my code, the PSR issue has become much less frequent, but it still happens.

I’ve limited it to one write per millisecond to the brightness file (I’m not sure whether this helps).

Most of the time the PSR issue hits me now, I’m just doing regular browsing in Firefox or using an application that runs under Xwayland. For me that was JetBrains RustRover; I later configured it to use Wayland and the issue hasn’t happened since.

But it doesn’t happen often for me now… 1-3 times a month.

PR - Panel Replay
PSR - Panel Self Refresh

Following up on what I said recently:

I’ve deliberately started using more software that runs under Xwayland on KDE Plasma (Wayland), and I’ve been getting PSR hangs much more often (daily, sometimes several times a day: three times yesterday and twice this morning).

I’ve also noticed that when using Firefox (Wayland or Xwayland) on KDE Plasma Wayland, most of the hangs seem to happen on websites that have videos playing (YouTube, Reddit, Twitch, for example).

I’ve also started using HexChat (an IRC client), which runs via Xwayland (it doesn’t support Wayland yet), and again got more PSR hangs.

I’ve stopped using all that Xwayland software (except Firefox, while avoiding the websites I mentioned earlier) and have had no PSR hangs.

I’ve also noticed an increase in PSR hangs since Linux 6.10.10 on Gentoo (it happened twice while typing this post).

When a PSR hang happens, I log out of KDE Plasma, switch to a TTY, and then run

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

but the screen just stays blank (for over a minute), so I do a cold reboot.

I think I’ve found some triggers (I haven’t yet read what others have reported as triggers), but in my opinion it’s still not enough to narrow down what the issue is.

Linux 6.11.0 has been released, but I haven’t compiled/installed it because Gentoo (sys-kernel/gentoo-kernel) hasn’t packaged it yet. I’ve noticed numerous amdgpu changes related to PSR and PR; maybe those changes fix the PSR hangs. I think PR is being enabled in Linux 6.12.

Is there any ETA on getting this fixed?

I would consider this sort of issue high priority; it’s not good for a laptop to become unusable like this. What if this happens in the middle of a critical business presentation or a remote job interview?

@Vadim_Peretokin, this is an issue with the AMD Linux driver provided by the kernel; it is not provided by Framework. You have a few choices. Supposedly the newer AMD drivers have fixes that alleviate this issue, so you can wait for your kernel to pick up the latest driver version, which can take a long time depending on the Linux flavor you use. Another option is to roll your own kernel and bring in the latest AMD driver. But there isn’t much for Framework to do here.

Personally, I removed the external GPU from my Framework and that seemed to help. People have also turned off auto-brightness settings, which helps a lot. There is also a way to mitigate the problem by resetting the driver when the issue pops up.

As a mitigation until a fix lands, you can add this to the kernel command line:

amdgpu.dcdebugmask=0x210

This disables PSR and PSR-SU within amdgpu.

Here (Linux 6.11.0), lines 254-267 list the other flags you can enable/disable.
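For anyone wondering how 0x210 breaks down: dcdebugmask is a bitmask. The bit values below are assumed from the DC_DEBUG_MASK enum in the kernel’s amd_shared.h as I remember it (DC_DISABLE_PSR = 0x10, DC_DISABLE_PSR_SU = 0x200, and DC_DISABLE_REPLAY = 0x400 for Panel Replay on newer kernels), so please check them against the flag list mentioned above for your kernel version. This is a minimal bash sketch; decode_dcdebugmask is a hypothetical helper name, not a kernel tool.

```shell
# Hypothetical helper: decode an amdgpu.dcdebugmask value into flag names.
# Bit values are ASSUMED from the DC_DEBUG_MASK enum in amd_shared.h;
# verify them against the source of the kernel you actually run.
decode_dcdebugmask() {
    local mask=$(( $1 ))   # accepts hex (0x210) or decimal input
    [ $(( mask & 0x10 ))  -ne 0 ] && echo "DC_DISABLE_PSR"
    [ $(( mask & 0x200 )) -ne 0 ] && echo "DC_DISABLE_PSR_SU"
    [ $(( mask & 0x400 )) -ne 0 ] && echo "DC_DISABLE_REPLAY"
    return 0
}
```

`decode_dcdebugmask 0x210` should print DC_DISABLE_PSR and DC_DISABLE_PSR_SU on separate lines. Going the other way, `printf '0x%x\n' $((0x10 | 0x200))` prints 0x210. After rebooting, `cat /proc/cmdline` shows whether the parameter actually made it onto the kernel command line.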


Thanks! What does one lose in practice by disabling it?

I’ve noticed power usage is slightly higher (roughly 0.8-1.2 W).

I’m using Gentoo with KDE Plasma 6.1.4 Wayland, this linux-firmware commit and BIOS 3.04 beta.

I’ve been experiencing this PSR issue since Linux 6.7.x (I don’t remember about 6.6.x) and through 6.8.x, 6.9.x and 6.10.x.

I decided to try Linux 6.11.0 early.

Unfortunately the PSR issue is still there, and I’m sure that any application using Xwayland (on Linux 6.10.x and now 6.11.0) has a higher chance of triggering it, especially with video (YouTube, Reddit, Twitch).

I’ve had to disable Wayland in Firefox because of high CPU usage.

The PSR issue was also being triggered when using JetBrains editors, until I enabled Wayland in them; after that the PSR issues dropped significantly.

There has been some improvement in how the PSR issue behaves: during a hang the screen now updates every 0.5-1.0 seconds instead of every 3-5 seconds (closer to 3).

I’ve tried the following command both from the KDE Plasma desktop and after logging out to a TTY:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

Both times the screen goes black (turns off, I guess) and never comes back on, so I have to cold reboot (hold the power button for about 8 seconds, then turn it back on).

I also have the kernel command line parameter

amdgpu.gpu_recovery=1

set. I think this is required for amdgpu_gpu_recover to work.
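To check whether gpu_recovery actually took effect, the module parameter can be read back from sysfs; amdgpu describes it as 1 = enable, 0 = disable, -1 = auto. A minimal sketch; gpu_recovery_state is a hypothetical helper, and the path argument exists only so it can be pointed at a test file.

```shell
# Hypothetical helper: report the amdgpu gpu_recovery module parameter.
# Defaults to the standard module-parameter location in sysfs.
gpu_recovery_state() {
    local param_file="${1:-/sys/module/amdgpu/parameters/gpu_recovery}"
    if [ ! -r "$param_file" ]; then
        echo "unknown (amdgpu not loaded?)"
        return 1
    fi
    case "$(cat "$param_file")" in
        1)  echo "enabled" ;;
        0)  echo "disabled" ;;
        -1) echo "auto" ;;
        *)  echo "unexpected value" ;;
    esac
}
```

Running `gpu_recovery_state` with no argument should print "enabled" once amdgpu.gpu_recovery=1 is in effect.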

The kernel log (a different error from Linux 6.10) was showing a lot of these (though not as many as on 6.10):

amdgpu 0000:c2:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I’m wondering whether there is a log stored somewhere.

Once I also saw (in the middle of the previous error messages):

amdgpu 0000:c2:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out

And another time (also in the middle of those messages):

amdgpu 0000:c2:00.0: [drm] enabling link 0 failed: 15
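On whether a log is stored somewhere: on systemd-based distros with a persistent journal, the previous boot’s kernel log usually survives a hard reboot and can be read with `journalctl -k -b -1` (-k for kernel messages, -b -1 for the previous boot). On an OpenRC Gentoo setup it depends on whether a syslog daemon is installed. The sketch below is a hypothetical helper that just counts the three amdgpu errors quoted above in whatever log text is fed to it.

```shell
# Hypothetical helper: count the amdgpu errors quoted in this thread
# in kernel log text read from stdin.
summarise_amdgpu_errors() {
    awk '
        /DMCUB error/              { dmcub++ }
        /flip_done timed out/      { flip++ }
        /enabling link .* failed/  { lnk++ }
        END {
            printf "DMCUB errors: %d\n", dmcub
            printf "flip_done timeouts: %d\n", flip
            printf "link enable failures: %d\n", lnk
        }'
}
```

For example, `journalctl -k -b -1 | summarise_amdgpu_errors` after a cold reboot, assuming the journal is persistent.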

When Linux 6.12-rc1 is released (29th/30th September, maybe?), I think I’ll try it, since it will have Panel Replay enabled (I’m interpreting it as a better alternative to PSR), and see how that goes.


Further up in the thread, some people hinted that auto-brightness might be part of what triggers this.
I suddenly started having this issue multiple times per week, which was probably roughly when I enabled auto-brightness in GNOME.
I turned it off about 1.5 weeks ago and it hasn’t happened a single time since.

Could of course be completely coincidental, but it’s worth trying for anyone who is really disrupted when this happens :person_shrugging:

Yeah, I mentioned that earlier in this thread too.


Well, after fading into the background for a month or two, this came back. I doubt I’ll see it again for another two months.

Same for me: it was happening about 1-2 times a month, but since Linux 6.10.10 (I’m now using 6.11.0) it’s been happening multiple times a day (2-4), so I’ve just disabled PSR for now using the kernel parameter

amdgpu.dcdebugmask=0x210

which disables PSR and PSR-SU (although the FW16 display supports PSR-SU, amdgpu is not using it).

I just got hit with this for the second time, and had to reboot the first time. I’m running the Fedora 40 KDE spin on kernel 6.10.11, on a 7840HS system with no dGPU at all and no support for auto-brightness in KDE (to the best of my knowledge). The suggested amdgpu_gpu_recover file makes the screen go black and never come back, forcing a hard reboot. I have exactly the same errors as @sinatosk above, but for me this only started on this kernel version (I’ve yet to try downgrading). Increased power usage is already somewhat unacceptable for me, as I’m trying to get the longest battery life possible right now, so disabling PSR and PSR-SU on the kernel command line isn’t a fix for me.

OK, I was so bummed out that my expensive laptop has this issue, because other than that I love this laptop!

My experience/information so far:

I don’t have a discrete GPU and I don’t often use suspend. This happens while the system is running.

At very random times (I can’t tie it to periods of high demand; it literally happens at random moments while I’m using the laptop) the system goes into a state where everything slows down: I get 1 FPS while moving the cursor, typing or anything like that, with a 2-3 second delay between typing something and it appearing on the screen. I switch to TTY3, try closing the display manager and check CPU usage (it’s not high at all, it’s very low, so it’s not the CPU being stressed). And dmesg is flooded with messages like this:

Oct 02 23:55:17 framework kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

I can open programs, but everything is slow af. If I try to open an mp3 with mpv while in the TTY, it takes its sweet time to start playing, but once it opens after about a minute of waiting, the sound plays normally without distortion.

A lot of searching later.

I come upon this thread. :slight_smile:
I try running the command mentioned here while I’m in the TTY:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

The screen goes black and never comes back. I’m forced to REISUB
(Side note: this took me a long time to find, but you can trigger SysRq on the Framework 16 by pressing Fn + Alt + F11 together. Then keep holding Alt while you type R, E, I, S, U, B one after another. Also, for the magic combo to work you’ll need sysrq_always_enabled=1 in your kernel parameters.)
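Why sysrq_always_enabled=1 matters: without it, the kernel.sysrq sysctl (/proc/sys/kernel/sysrq) decides which SysRq functions are allowed, and many distros ship a restricted value. Per the kernel’s sysrq documentation, 1 enables everything; otherwise each bit enables a group of keys, and R-E-I-S-U-B needs bits 4 (R), 16 (S), 32 (U), 64 (E, I) and 128 (B). A minimal sketch with a hypothetical helper name:

```shell
# Hypothetical helper: does a kernel.sysrq value permit every key in R-E-I-S-U-B?
# Bits per the kernel sysrq docs: 4 = keyboard control (R), 16 = sync (S),
# 32 = remount read-only (U), 64 = signal processes (E, I), 128 = reboot (B).
# A value of 1 means all SysRq functions are enabled.
sysrq_allows_reisub() {
    local value="${1:-$(cat /proc/sys/kernel/sysrq)}"
    local needed=$(( 4 | 16 | 32 | 64 | 128 ))   # 244
    [ "$value" -eq 1 ] && return 0
    [ $(( value & needed )) -eq "$needed" ]
}
```

`sysrq_allows_reisub && echo ok` checks the running system. As a runtime alternative to the kernel parameter, `echo 1 | sudo tee /proc/sys/kernel/sysrq` enables all SysRq functions until the next reboot.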

I reboot.
I see the post above suggesting that the gpu_recovery option of the amdgpu driver might need to be set to 1 for the recover command to work properly.

I create the file /etc/modprobe.d/amdgpu.conf containing:

options amdgpu gpu_recovery=1

Regenerate initramfs.
Reboot again.
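How to regenerate the initramfs depends on the distro. As a rough sketch, assuming the usual defaults (update-initramfs on Debian/Ubuntu, mkinitcpio on Arch, dracut on Fedora and several others), this hypothetical helper only prints the matching command rather than running it.

```shell
# Hypothetical helper: print the usual initramfs regeneration command for
# whichever tool is installed (first match wins). It only prints the command.
# Run it from a root shell (sudo -i), since these tools often live in /usr/sbin,
# which may not be on a regular user's PATH.
initramfs_regen_cmd() {
    if command -v update-initramfs >/dev/null 2>&1; then
        echo "update-initramfs -u"
    elif command -v mkinitcpio >/dev/null 2>&1; then
        echo "mkinitcpio -P"
    elif command -v dracut >/dev/null 2>&1; then
        echo "dracut --force"
    else
        return 1
    fi
}
```

On Fedora this should come out as `dracut --force`. Gentoo’s gentoo-kernel setup may handle initramfs generation differently, so check your distro’s docs before relying on it.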

Now I try the recover command again. This time it works: the screen goes black and comes back again. But I have not yet tested it in a real-world freeze scenario.

I will make a bash alias for this command so that I can run it easily the next time the semi-freeze happens.
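Such an alias could look like the line below, added to ~/.bashrc. The name amdgpu_recover is just an illustration, and the dri/1 index is the one used in this thread; it depends on your hardware.

```shell
# Example alias (hypothetical name). The DRI index "1" is taken from the
# command used in this thread and may differ on your machine.
alias amdgpu_recover='sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover'
```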

I’ll also keep in mind the other workarounds like disabling PSR as @sinatosk mentioned. Really helpful stuff.

I’m relieved to not be alone in this.

I am missing something: following the above steps, the cat command simply returns 0.

Edit to add - with the dGPU installed it is 1, the other one is 2, so the command that worked for him was:

sudo cat /sys/kernel/debug/dri/2/amdgpu_gpu_recover
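Since the DRI index differs between machines, one option is to look it up by PCI address, taken from the dmesg error line (e.g. 0000:c2:00.0 or 0000:c1:00.0 in the logs quoted in this thread). This sketch assumes each /sys/kernel/debug/dri/N directory has a `name` file containing something like "amdgpu dev=0000:c2:00.0 ...", which is what I’d expect on recent kernels but haven’t verified everywhere. find_dri_dir is a hypothetical helper, and it must run as root because debugfs is root-only.

```shell
# Hypothetical helper: print the debugfs DRI directory for a given PCI address.
# ASSUMES each dri/N/name file mentions "dev=<PCI address>". Only directories
# that also contain amdgpu_gpu_recover are considered.
find_dri_dir() {
    local pci_addr="$1"
    local root="${2:-/sys/kernel/debug/dri}"
    local dir
    for dir in "$root"/*/; do
        [ -r "${dir}name" ] || continue
        [ -e "${dir}amdgpu_gpu_recover" ] || continue
        if grep -qF "dev=${pci_addr}" "${dir}name"; then
            echo "${dir%/}"
            return 0
        fi
    done
    return 1
}
```

From a root shell (`sudo -i`), something like `cat "$(find_dri_dir 0000:c2:00.0)/amdgpu_gpu_recover"` would then trigger the recovery on that GPU.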

Yeah, before it was about 1-2 times a month, but for me, since Linux 6.10.10 (later upgraded to 6.11.0), this went up to 1-4 times a day, so I just disabled PSR and PSR-SU with

amdgpu.dcdebugmask=0x210

I’ve been on 6.12-rc1 with PSR, PSR-SU and PR (Panel Replay) enabled, and all is good so far except this, but they (AMD) are working on it. I don’t do much gaming and I run my screen at 165 Hz, so I’m only affected by this when something runs below 70 fps.

==== offtopic
On 6.12-rc1, average power usage has dropped to the point where I’m getting 30-120 minutes more battery life.

OK, so this just happened while watching a video, and the recover command worked. @sinatosk I might try your debugmask config too.

Is this bug we’re experiencing known to the kernel devs?

EDIT: Seems like they are, judging by AMDGPU crash Error queuing DMUB command: status=2, Error waiting for DMUB idle: status=3 (#2862) · Issues · drm / amd · GitLab and similar issues.

So you’re saying if I run 6.12 I don’t need to disable PSR?

No, I’m saying

and running 6.12 as a release candidate is riskier, but I’m taking that risk. It’s better than dealing with these crap bugs while waiting for AMD to fix them; instead I can give them feedback (I should have tried the RCs sooner…).