Well, after fading into the background for a month or two, this came back. I doubt I’ll see it again for another two months.
It was the same for me: it used to happen about 1-2 times a month, but since Linux 6.10.10 ( I’m now using 6.11.0 ) it’s been happening multiple times a day ( 2-4 ), so I just disabled PSR for now using the kernel parameter
amdgpu.dcdebugmask=0x210
which disables PSR and PSR-SU ( although the FW16 display supports PSR-SU, amdgpu is not using it ).
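For anyone wondering what 0x210 actually switches off, here is a minimal sketch that pulls the value out of a kernel command line and decodes the two PSR bits. The bit values (0x10 for PSR, 0x200 for PSR-SU) are my assumption based on the DC_DEBUG_MASK enum in the amdgpu sources, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: report which panel self-refresh features a dcdebugmask value disables.
# Assumption: 0x10 = disable PSR, 0x200 = disable PSR-SU (amdgpu DC_DEBUG_MASK enum).

# Print the amdgpu.dcdebugmask value found in a kernel command line string
# (prints nothing if the parameter is absent).
dcdebugmask_from_cmdline() {
    for word in $1; do
        case "$word" in
            amdgpu.dcdebugmask=*) printf '%s\n' "${word#amdgpu.dcdebugmask=}" ;;
        esac
    done
}

# Describe the PSR-related bits of a mask such as 0x210.
describe_dcdebugmask() {
    mask=$(( ${1:-0} ))
    if [ $(( mask & 0x10 )) -ne 0 ]; then echo "PSR disabled"; fi
    if [ $(( mask & 0x200 )) -ne 0 ]; then echo "PSR-SU disabled"; fi
}
```

On a running system this could be called as `describe_dcdebugmask "$(dcdebugmask_from_cmdline "$(cat /proc/cmdline)")"`, which prints nothing when the parameter is not set.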
I just got hit with this for the second time, and had to reboot the first time. I’m running Fedora 40 KDE spin on kernel 6.10.11, on a 7840HS system with no dGPU at all, and no support for auto brightness in KDE (to the best of my knowledge). The suggested amdgpu_gpu_recover file causes the screen to go black and never return, forcing a hard reboot. I have the exact same errors as @sinatosk above, but this only started for me on this kernel version (I’ve yet to try downgrading). Increased power usage is already somewhat unacceptable for me as I’m trying to ensure the longest battery life possible right now, so the PSR and PSR-SU disable kernel command line isn’t a fix for me.
Ok, I was so bummed out that my expensive laptop purchase has this issue because other than that I love this laptop!
My experience/information so far:
I don’t have a discrete GPU and I don’t often use suspend. This happens while the system is running.
At very random times (I cannot tie it to high-demand moments, it’s literally just at random while I’m using the laptop) the system goes into a state where everything is slowed down: I get 1 FPS while moving the cursor or typing, with a 2-3 second delay between typing something and it appearing on the screen. I switch to TTY3 and try closing the display manager and checking CPU usage (it’s very low, so it’s not the CPU being stressed). And my dmesg is flooded with messages like this:
Oct 02 23:55:17 framework kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
I can open programs but everything is slow af. If I try to open an mp3 with mpv while in the TTY, it will take its sweet time to start playing, but once it opens after like a minute of waiting, the sound plays normally without distortion.
A lot of searching later.
I come upon this thread.
I try running the command mentioned in here while I’m in the TTY:
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
The screen goes black and never comes back. I’m forced to REISUB.
(Side note: this took me a long time to find, but you can do SysRq on the Framework 16 by pressing Fn + Alt + F11 together. Then keep holding Alt while you type R E I S U B in succession. For the magic combo to work you’ll also need to have sysrq_always_enabled=1
in your kernel parameters.)
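To confirm the parameter actually made it onto the booted command line before you need REISUB, something like this should do. It is only a sketch, and `has_kernel_param` is a name I made up.

```shell
#!/bin/sh
# Succeed if the kernel command line string $1 contains the exact word $2.
has_kernel_param() {
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}
```

Usage would be along the lines of `has_kernel_param "$(cat /proc/cmdline)" sysrq_always_enabled=1 && echo "SysRq is forced on"`.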
I reboot.
I see the post above suggesting that the gpu_recovery
option of the amdgpu driver might need to be set to 1 for the recover command to work properly.
I create the file /etc/modprobe.d/amdgpu.conf
:
options amdgpu gpu_recovery=1
Regenerate initramfs.
Reboot again.
Now I try the recover command again. This time it works. The screen goes black and comes back again. But I have not yet tested it in a real-world freeze scenario.
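If you want to double check that the modprobe option was picked up after the initramfs rebuild and reboot, the loaded module exposes its parameters under sysfs. A small sketch, assuming the usual /sys/module/amdgpu/parameters/gpu_recovery location (the function name is hypothetical):

```shell
#!/bin/sh
# Print the amdgpu gpu_recovery setting as the loaded module sees it.
# $1 is the sysfs module directory (defaults to /sys/module/amdgpu).
amdgpu_gpu_recovery_setting() {
    param_file="${1:-/sys/module/amdgpu}/parameters/gpu_recovery"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        # Module not loaded, or the parameter file is not readable.
        echo "unknown"
        return 1
    fi
}
```

After the steps above, calling `amdgpu_gpu_recovery_setting` with no arguments should print 1 if the option took effect.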
I will make a bash alias for this command so that I can run it easily the next time the semi-freeze happens
I’ll also keep in mind the other workarounds like disabling PSR as @sinatosk mentioned. Really helpful stuff.
I’m relieved to not be alone in this.
I am missing something: following the above steps, the cat command simply returns 0.
Edit to add: with the dGPU installed, it is device 1 and the other GPU is device 2, so the equivalent of the command that worked for him is:
sudo cat /sys/kernel/debug/dri/2/amdgpu_gpu_recover
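Since the number under dri/ differs between machines (1 on some, 2 with a dGPU installed), a helper that just lists the candidates can save some guessing. This is a rough sketch with a made-up function name. It only lists the files and does not read them, because reading one is what triggers the reset.

```shell
#!/bin/sh
# List every amdgpu_gpu_recover node under a debugfs DRI directory.
# $1 is the directory to search (defaults to /sys/kernel/debug/dri).
find_gpu_recover_nodes() {
    dri_root="${1:-/sys/kernel/debug/dri}"
    for node in "$dri_root"/*/amdgpu_gpu_recover; do
        # An unmatched glob stays literal, so check the path really exists.
        if [ -e "$node" ]; then
            printf '%s\n' "$node"
        fi
    done
}
```

debugfs is normally readable only by root, so this would have to run in a root shell. On some kernels several entries may point at the same device, so the same GPU can show up more than once.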
Yeah, before it was like 1-2 times a month, but for me since Linux 6.10.10 ( later upgraded to 6.11.0 ) this went up to 1-4 times a day, so I just disabled PSR and PSR-SU with
amdgpu.dcdebugmask=0x210
I’ve been on 6.12-rc1 with PSR, PSR-SU and PR ( Panel Replay ) enabled, and all is good so far except this, but they ( AMD ) are working on it. I don’t do much gaming and I use my screen at 165Hz, so I’m only affected by this when something runs at lower than 70 fps
==== offtopic
6.12-rc1 average power usage has dropped to where I’m getting 30-120 minutes more battery life
Ok so this just happened while watching a video and the recover command worked. @sinatosk I might try your debugmask config too.
Is this bug we’re experiencing known to the kernel devs?
EDIT: Seems like they are, given the drm/amd GitLab issue “AMDGPU crash Error queuing DMUB command: status=2, Error waiting for DMUB idle: status=3” (#2862) and similar issues.
So you’re saying if I run 6.12 I don’t need to disable PSR?
No, I’m saying
and running 6.12 as a release candidate is riskier but I’m taking that risk, better than having to deal with the crap bugs waiting for AMD to fix it, I can instead give them feedback ( I should have done RC sooner… )
Yeah I see the issue you mention but that only happens for under 70 fps.
so for normal operation on 6.12 you don’t have the freeze happening?
I’ve had 6.12 for 1 1/2 weeks now and I also think the issue hasn’t appeared since.
I think it’s too soon to tell but usually by now I would have seen the issue ( especially since 6.10.10 ).
Since 6.12-rc1 ( using 6.12-rc2 now ), the issue hasn’t shown up, and I’ve had PSR/PSR-SU, PR and IPS enabled the whole time.
Since 6.12 ( and 6.11.3, @Mario_Limonciello says it’s been enabled there too ), AMD have enabled PR again. PSR and PR are mutually exclusive and PR has higher priority.
I’ll give it until the end of rc4/rc5 before I consider this issue gone, as it used to happen to me at least once a month, and there is still time before 6.12 becomes stable
but I’m hoping that AMD can fix that other issue I and someone else reported before 6.12.0, then I would say 6.12.0 is gonna be awesome ( for at least FW13/16 users )
======== offtopic
I’m loving this lower average power consumption. I’m seeing 5.9-6.0 W in the evening ( screen brightness between 1-5% ) with ABM set to 2 and monitor at 165Hz
JFYI, panel replay is also enabled in 6.11.3
Huh. This just happened to me again on 6.12-rc2.
This time was different.
The screen went black and I could not do anything, no tty change, no commands.
REISUB.
I run journalctl -b -1 and towards the end I see the familiar
amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
So I will be disabling PSR again.
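To see quickly whether a given boot was hit, counting the error lines in that boot's kernel log is enough. A minimal sketch (the function name is mine):

```shell
#!/bin/sh
# Count log lines on stdin that contain the DMCUB error message.
count_dmcub_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches, so mask that status.
    grep -c 'DMCUB error' || true
}
```

For the previous boot this would be `journalctl -k -b -1 | count_dmcub_errors`, where `-k` restricts the output to kernel messages.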
If you’re on Framework 16 and using Linux 6.11.3/6.12-rc1 or greater then PSR is already inactive as amdgpu is using PR instead ( assuming you’ve not set the amdgpu.dcdebugmask kernel parameter )
I have interesting news. Looking at the log again, somehow I had booted to 6.11.2
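It is worth checking which kernel you actually booted before drawing conclusions. Here is a small sketch that compares a version string against 6.11.3 using GNU sort's version ordering. The function name is made up, and distro suffixes or -rc tags in `uname -r` may not always order the way you expect, so treat the result as a hint.

```shell
#!/bin/sh
# Succeed if version $1 is at least version $2, using GNU sort's version ordering.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, `kernel_at_least "$(uname -r)" 6.11.3 && echo "PR should be in use instead of PSR"`.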
That’s good news, means 6.12 still works.
I’ll find out why my mainline got nuked.
And you guys are saying we can use 6.11.3+ as well to avoid the problem
So…
PSR - Panel Self Refresh
PSR2 ( PSR-SU ) - Panel Self Refresh 2 with Selective Update - only changed parts of the frame buffer get updated on the display
PR - Panel Replay
Using
- Framework 16 7840HS + Radeon 780M with BIOS 3.04 beta
- Gentoo
- Linux 6.12-rc5, GCC, znver4 ( not from Gentoo sys-kernel/git-sources but from kernel.org git mainline, I find it easier to do git bisecting if needed )
- KDE Plasma 6.1.5 - Wayland
- linux firmware 51db3c110192c8f0e5dcaf9172cf01a374732709
- microcode 0x0a704107 upgraded from 0x0a704104 ( from BIOS )
I’ve been using 6.12 rc1-rc4 ( just updated to rc5 this morning, but this message doesn’t account for that )
I’ve tried many times to trigger this issue
- Firefox, switching tabs ( some with and without videos )
- loading and watching videos ( VLC, MPV )
- Using JetBrains RustRover ( this was where I first experienced this issue and happens less when using Wayland instead of Xwayland )
- Changing screen brightness manually or automatically. I’ve written some code ( Rust ) that changes the screen brightness levels relative to the ambient light sensor
- Changing the frequency ( in the code ) of brightness level adjustments ( currently capped at 1 millisecond )
- Sometimes just starting KDE Plasma would cause the issue ( infuriating )
There are other things that could somewhat trigger this, but the most common ( for me ) was using applications that made use of Xwayland
Nothing so far has triggered the PSR issue for me and I’ve had PSR, PSR-SU ( I’m not quite sure if the FW16 panel supports this ) and PR enabled since 6.12-rc1
Looking under the path /sys/kernel/debug/dri/0/eDP-1/
( you may need to be a super user to read this directory ), the files that begin with psr_
( for PSR ) show that it’s not enabled, and the files that begin with replay_
( for PR ) show that it is enabled and in use ( either PSR or PR is enabled, not both ).
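If you want to look at those files in one go, a loop like the following prints each one with its contents. It is a sketch that assumes the eDP-1 connector directory mentioned above, with a made-up function name. It does not assume any particular file names beyond the psr_ and replay_ prefixes.

```shell
#!/bin/sh
# Print "name: contents" for each psr_* and replay_* file in a connector's debugfs directory.
# $1 is the connector directory (defaults to /sys/kernel/debug/dri/0/eDP-1).
show_panel_refresh_state() {
    conn_dir="${1:-/sys/kernel/debug/dri/0/eDP-1}"
    for f in "$conn_dir"/psr_* "$conn_dir"/replay_*; do
        # Unmatched globs stay literal, so skip anything that is not a readable file.
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}
```

As with the rest of debugfs, this needs a root shell, and the dri/0 index may differ on your machine.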
With all this said ( kinda wasteful me saying all this eh? ), it’s possible the PSR issues are still there as I’ve never disabled PR to see how PSR works as I’m preferring PR.
The only negative at the moment is this issue (link in previous post) but I don’t think it will be around for long. I’m hoping they ( AMD ) can fix it before 6.12 is stable and I think they’ll be able to backport those changes to 6.11.z
I’m now using 6.12-rc5 and intend to stay with the rest of 6.12 rc’s and into stable
============ off topic
- Average battery life during usage ( based on my usage ) has increased by 30-90 minutes
- I don’t remember if it was 6.11 or 6.12 where this was added, but you can now see the temperatures of each memory module
- I’ve not tried battery controlling yet ( 6.11 or 6.12, I don’t remember ), some others have
I can trigger this issue 100% of the time when I disconnect my USB-C wireless HDMI transmitter (sold on Amazon as “Wireless HDMI Transmitter and Receiver, Upgrade Type-C 3.1 Port Wireless Hdmi Plug & Play Portable 2.4G/5G”). This happens on my Framework 13 AMD running NixOS 24.05 and 24.11, with every kernel version that I have tried so far.
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
fixes things, but no external display that I have tried works after that until I restart
Without amdgpu.dcdebugmask=0x210, I get the DMUB error the first time that I remove the USB-C wireless HDMI transmitter. (I never get to disconnect the transmitter a second time because the computer is unusable after this.) With the kernel option, I get the error when I plug the transmitter back in after removing it. This is using NixOS 24.11, kernel 6.11.x (as well as 6.6.x) and GNOME 47.
The crashes significantly decreased with kernel 6.11. While I still experienced some minor UI glitches, crashes occurred only every 1-2 weeks.
After installing kernel 6.12, the glitches are gone. I’m hopeful that this update has fully resolved the issue, as @sinatosk mentioned, but only time will tell.
Oops. I wasn’t specific about my kernel version. I can still reproduce this on my AMD FW 13 running NixOS 24.11 with kernel 6.12.5 by unplugging my wireless display adapter