This looks like what 11752c013f562a1124088a35bd314aa0e9f0e88f fixed, but that’s definitely already in your kernel.
What is the version of your linux-firmware package? I want to double check how old the MES microcode is.
Also, besides the version of the linux-firmware package, can you provide the output of amdgpu_firmware_info from debugfs?
I hope this is the information you need:
dpkg -l gives me linux-firmware version 20240318.git3b128b60-0ubuntu2.14
/sys/kernel/debug/dri/0000:c1:00.0/amdgpu_firmware_info gives me:
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x0000000b
PFP feature version: 35, firmware version: 0x0000000d
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x11520400
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x11520400
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x0000000d
IMU feature version: 0, firmware version: 0x0b342000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648353, firmware version: 0x210000e1
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x17000041
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 11, firmware version: 0x0b650400 (101.4.0)
SDMA0 feature version: 60, firmware version: 0x0000000c
VCN feature version: 0, firmware version: 0x0711300d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x09002400
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x00000078
MES feature version: 1, firmware version: 0x00000062
VPE feature version: 60, firmware version: 0x0000000f
VBIOS version: 113-STRIXEMU-001
And this might be worth mentioning:
For day-to-day use I run the generic kernel. It also had the initial freeze problem, but there it is very rare.
Linux jan-framwork13 6.11.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Under the OEM kernel I get the freeze issue very often. This is that kernel version:
Linux jan-framwork13 6.14.0-1007-oem #7-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 30 08:41:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Can you please try to upgrade the firmware binary manually from upstream linux-firmware?
gc_11_5_0_mes_2.bin
Back up the old one in /lib/firmware/amdgpu, replace it with this one, then rebuild your initramfs and reboot.
If the old one is compressed you’ll need to compress this one too.
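The backup-and-replace step can be sketched in Python, assuming the blob is installed as gc_11_5_0_mes_2.bin, .bin.zst or .bin.xz, and that the downloaded replacement has already been compressed to the same suffix (for example with the `zstd` CLI). The function names are hypothetical, and on a real system this has to run as root:

```python
import shutil
from pathlib import Path

# Compression suffixes assumed to be understood by the kernel firmware loader.
COMPRESSED_SUFFIXES = (".zst", ".xz")


def find_installed(fw_dir: Path, base_name: str) -> Path:
    """Return the installed copy of base_name, compressed or not."""
    for suffix in ("",) + COMPRESSED_SUFFIXES:
        candidate = fw_dir / (base_name + suffix)
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{base_name} not found in {fw_dir}")


def replace_firmware(fw_dir: Path, base_name: str, new_file: Path) -> Path:
    """Back up the installed blob as <name>.backup, then copy new_file over it.

    new_file must carry exactly the installed file name (same compression
    suffix), so only one variant of the blob stays in the directory.
    """
    installed = find_installed(fw_dir, base_name)
    if new_file.name != installed.name:
        raise ValueError(f"expected {installed.name}, got {new_file.name}")
    backup = installed.with_name(installed.name + ".backup")
    shutil.copy2(installed, backup)   # keep the old blob for rollback
    shutil.copy2(new_file, installed)  # overwrite the active blob in place
    return backup
```

Usage, as root: `replace_firmware(Path("/lib/firmware/amdgpu"), "gc_11_5_0_mes_2.bin", Path("gc_11_5_0_mes_2.bin.zst"))`, followed by `sudo update-initramfs -u` and a reboot. Refusing a mismatched name is a deliberate choice: it avoids ending up with both a compressed and an uncompressed copy and having to guess which one was picked up.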
After rebooting, check that debugfs file again to make sure the MES version has increased.
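That check can be scripted with a small Python sketch: it parses the `NAME feature version: N, firmware version: 0x...` lines of amdgpu_firmware_info and compares MES against the 0x62 reported earlier in this thread, assuming a higher number means newer firmware. The function names and the default path are assumptions (on this machine the directory was dri/0000:c1:00.0, not dri/0), and reading debugfs requires root:

```python
import re
from pathlib import Path

# Matches e.g. "MES feature version: 1, firmware version: 0x00000062"
# and the SMC variant that has an extra "program: N," field.
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\S+?), "
    r"(?:program: \d+, )?firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)

# MES version reported in this thread before the update.
OLD_MES = 0x62


def parse_firmware_info(text: str) -> dict[str, int]:
    """Map each firmware block name (ME, MEC, MES, ...) to its version."""
    versions = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            versions[match.group("name")] = int(match.group("fw"), 16)
    return versions


def mes_updated(versions: dict[str, int], baseline: int = OLD_MES) -> bool:
    """True if the loaded MES firmware reports a higher version than baseline."""
    return versions.get("MES", 0) > baseline


def read_firmware_info(path: str = "/sys/kernel/debug/dri/0/amdgpu_firmware_info") -> dict[str, int]:
    """Read and parse the debugfs file (needs root; the dri/ index may differ)."""
    return parse_firmware_info(Path(path).read_text())
```

For example, `mes_updated(read_firmware_info("/sys/kernel/debug/dri/0000:c1:00.0/amdgpu_firmware_info"))` should return True once the new MES blob is actually loaded.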
Here is what I did - just to make sure everything is correct:
But I still get these versions:
MES_KIQ feature version: 6, firmware version: 0x00000078
MES feature version: 1, firmware version: 0x00000062
Any further ideas from anyone?
I’m running the generic kernel because it is much more stable (just one or two crashes a week).
But I’d really like to fix this and get a truly stable system.
Hmm; I’m not sure you updated properly. I just double checked on a Strix Framework 13 that I’m running all the latest firmware from linux-firmware.git.
❯ cat /sys/kernel/debug/dri/0/amdgpu_firmware_info | grep MES
MES_KIQ feature version: 6, firmware version: 0x0000006d
MES feature version: 1, firmware version: 0x00000074
Here are the checksums for all the GC 11.5.0 binaries, although granted, my zstd compression on Arch might differ from yours.
❯ md5sum /lib/firmware/amdgpu/gc_11_5_0*
6d4e01ad0c42d32efa197d05d28ccc3e /lib/firmware/amdgpu/gc_11_5_0_imu.bin.zst
da7ea6bfe2a75952f16b516962a1d0b7 /lib/firmware/amdgpu/gc_11_5_0_me.bin.zst
51076b7f3333f1abeb3dbc4cec43bf28 /lib/firmware/amdgpu/gc_11_5_0_mec.bin.zst
04acdceed64db6d2a520826bec0f0e15 /lib/firmware/amdgpu/gc_11_5_0_mes1.bin.zst
e1934cf4e2ab511096cf3aec001042cd /lib/firmware/amdgpu/gc_11_5_0_mes_2.bin.zst
3cd20106fbde3882e364573195222bd0 /lib/firmware/amdgpu/gc_11_5_0_pfp.bin.zst
022eb948adefc47f8b71e113b0b96a97 /lib/firmware/amdgpu/gc_11_5_0_rlc.bin.zst
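Comparing against these checksums can be automated with a Python sketch like the following. The caveat above matters: a different zstd version or compression level produces different compressed bytes, so a mismatch on a .zst file is not conclusive on its own; decompressing (`zstd -d`) and comparing the raw .bin against upstream is the more reliable test. The helper names are hypothetical:

```python
import hashlib
from pathlib import Path

# md5 sums posted above (zstd-compressed on Arch).
REFERENCE_MD5 = {
    "gc_11_5_0_imu.bin.zst": "6d4e01ad0c42d32efa197d05d28ccc3e",
    "gc_11_5_0_me.bin.zst": "da7ea6bfe2a75952f16b516962a1d0b7",
    "gc_11_5_0_mec.bin.zst": "51076b7f3333f1abeb3dbc4cec43bf28",
    "gc_11_5_0_mes1.bin.zst": "04acdceed64db6d2a520826bec0f0e15",
    "gc_11_5_0_mes_2.bin.zst": "e1934cf4e2ab511096cf3aec001042cd",
    "gc_11_5_0_pfp.bin.zst": "3cd20106fbde3882e364573195222bd0",
    "gc_11_5_0_rlc.bin.zst": "022eb948adefc47f8b71e113b0b96a97",
}


def md5_of(path: Path) -> str:
    """Hex MD5 of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_checksums(fw_dir, reference=REFERENCE_MD5):
    """Map each file name to True (match), False (differs) or None (missing)."""
    results = {}
    for name, expected in reference.items():
        path = Path(fw_dir) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results
```

Usage: `compare_checksums("/lib/firmware/amdgpu")`.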
In short - try again, maybe you made a mistake along the way?
Sorry for the delay. I didn’t find the time to follow up.
I tried again with the same steps and the same result.
Next I tried with gc_11_5_3 because I have the following files in /lib/firmware/amdgpu:
ls -latr gc*mes_2*
-rw-r--r-- 1 root root 50566 Jul 8 23:43 gc_11_5_0_mes_2.bin.zst.backup
-rw-r--r-- 1 root root 53618 Jul 15 18:59 gc_11_5_3_mes_2.bin.zst
-rw-r--r-- 1 root root 52066 Jul 15 18:59 gc_11_5_2_mes_2.bin.zst
-rw-r--r-- 1 root root 53624 Jul 15 18:59 gc_11_5_1_mes_2.bin.zst
-rw-r--r-- 1 root root 46957 Jul 15 18:59 gc_11_0_4_mes_2.bin.zst
-rw-r--r-- 1 root root 48101 Jul 15 18:59 gc_11_0_3_mes_2.bin.zst
-rw-r--r-- 1 root root 48056 Jul 15 18:59 gc_11_0_2_mes_2.bin.zst
-rw-r--r-- 1 root root 47929 Jul 15 18:59 gc_11_0_1_mes_2.bin.zst
-rw-r--r-- 1 root root 47093 Jul 15 18:59 gc_11_0_0_mes_2.bin.zst
-rw-rw-r-- 1 jan jan 59691 Aug 28 23:14 gc_11_5_0_mes_2.bin.zst
But this didn’t load the AMD firmware and instead resulted in a fallback mode.
Do you have any additional idea here for me?
Thanks!
Jan
There has also been an updated linux-firmware package in proposed since about a week ago. You can use that if you’re having a hard time compressing the file manually.
The firmware from proposed didn’t work either.
What finally got the new firmware running was replacing all the firmware files with the latest versions. I guess update-initramfs picked up one of the old versions for some reason.
Three hours later it seems to run stable with the OEM kernel (both the AMD GPU and WLAN).
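One way to confirm that guess is to unpack the initramfs and compare the GC 11.5.0 blobs inside it with the ones in /lib/firmware/amdgpu. A hedged Python sketch, assuming Ubuntu's `unmkinitramfs` (from initramfs-tools) is available and the image lives at /boot/initrd.img-<kernel version>; the function names are hypothetical and reading the image needs root:

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_copies(unpacked_root: Path, fw_dir: Path,
                 pattern: str = "gc_11_5_0_*") -> list[str]:
    """Names of blobs inside the unpacked initramfs that differ from fw_dir."""
    stale = []
    for packed in sorted(unpacked_root.rglob(pattern)):
        if not packed.is_file():
            continue
        installed = fw_dir / packed.name
        if not installed.is_file() or sha256_of(packed) != sha256_of(installed):
            stale.append(packed.name)
    return stale


def check_initramfs(initrd: str, fw_dir: str = "/lib/firmware/amdgpu") -> list[str]:
    """Unpack initrd into a temp dir and report GC 11.5.0 blobs that differ."""
    with tempfile.TemporaryDirectory() as tmp:
        # unmkinitramfs extracts all segments of the image below tmp.
        subprocess.run(["unmkinitramfs", initrd, tmp], check=True)
        return stale_copies(Path(tmp), Path(fw_dir))
```

For example, `check_initramfs("/boot/initrd.img-6.14.0-1007-oem")` returning an empty list would mean the initramfs carries the same blobs as /lib/firmware/amdgpu.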
Thanks for the help!
Jan
Update: even after this the system still crashes, but now it takes a few hours instead of just a few minutes.
@Mario_Limonciello It seems that I’m the only person with this issue, right? Could this be a hardware issue rather than a firmware issue?
I will say there was an issue fixed by the updated microcode that could lead to graphics hangs, which manifested as a micro engine scheduler timeout. Assuming you upgraded properly, you should now have the fix for it. If you want to double-check, you can get the firmware version info from debugfs (/sys/kernel/debug/dri/0/amdgpu_firmware_info) and we can cross-reference it against the headers of the latest upstream version to confirm.
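For that cross-reference, the upstream amdgpu blobs begin with the `common_firmware_header` struct from the kernel's amdgpu_ucode.h (ten little-endian fields, 32 bytes). The sketch below dumps it from a decompressed blob (`zstd -d gc_11_5_0_mes_2.bin.zst`). It has not been confirmed here which header field, if any, equals the number debugfs prints for MES; MES blobs carry additional MES-specific fields after this common part, so treat the mapping as an assumption:

```python
import struct
from typing import NamedTuple


class CommonFirmwareHeader(NamedTuple):
    """Field order of struct common_firmware_header (amdgpu_ucode.h)."""
    size_bytes: int
    header_size_bytes: int
    header_version_major: int
    header_version_minor: int
    ip_version_major: int
    ip_version_minor: int
    ucode_version: int
    ucode_size_bytes: int
    ucode_array_offset_bytes: int
    crc32: int


# Little-endian: two u32, four u16, four u32 -> 32 bytes in total.
HEADER_FMT = "<IIHHHHIIII"


def parse_common_header(blob: bytes) -> CommonFirmwareHeader:
    """Decode the common header at the start of an *uncompressed* blob."""
    if len(blob) < struct.calcsize(HEADER_FMT):
        raise ValueError("blob too short for a common_firmware_header")
    return CommonFirmwareHeader._make(struct.unpack_from(HEADER_FMT, blob, 0))


def dump(path: str) -> None:
    """Print every header field of the blob at path in hex."""
    with open(path, "rb") as f:
        hdr = parse_common_header(f.read())
    for name, value in hdr._asdict().items():
        print(f"{name:26} {value:#x}")
```

Usage: `dump("gc_11_5_0_mes_2.bin")` after decompressing, then compare the output for the installed blob and the upstream one.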
But whether your remaining timeout is a hardware problem or not is a fuzzy line that’s really hard to judge as a stranger on the Internet looking at a single error message.
Normally the state of the graphics hardware needs to be dumped and analyzed, and to do that we need a solid reproducer. For example, the MES timeout that was fixed was identified through a specific action in Blender, even though it could also manifest in other uses of graphics shaders.
I updated to the kernel and firmware from proposed, which didn’t fix the issue.
And the problem is that there is no reliable way to reproduce the error. It usually appears while working in the Vivaldi or Firefox browser, but other apps cause the issue too. There is no action within the browsers that triggers the crash every time, so unfortunately I only have the error message from the logs.
Is there anything else I can do to track that issue?
If not, I’ll create a ticket and ask Framework to replace the AMD GPU.
Given you’re the only one reporting it I know of I think that’s a good idea.
Hello. I think I may have the same problem - but the thread is long and I have not assimilated all of it. My problem is that LibreOffice Impress causes a really hard hang when using an HDMI external monitor. (I reported that problem here on this board. I have worked around the problem with a kernel boot switch.)
Cf. also this problem, which is different but again involves external monitors, and is apparently caused by an interaction between the kernel (well, some versions of it) and the very latest AMD F13 BIOS (which I have yet to install).
I already posted the solution to the Xorg hang in that bug. Switch off Cinnamon in Xorg or apply the kernel patch.
I was not seeking help with what we can call the Xorg bug. Rather, I was trying to determine whether the following three problems are distinct:
1. The timeout bug that this thread is about.
2. The Xorg problem that I have.
3. The problem that is partially caused by the new BIOS and that has to do with resolution on external monitors.
I take it that each problem is distinct.
1 and 2 should be distinct. About 3 I don’t know; I would apply the solution for 2 first and then decide.
I use LibreOffice Impress a lot, including with external monitors, and have had no issues with it. On my system the freeze often appeared with no external screen attached at all (only the internal display).
After weeks of testing, it turned out not to be a hardware fault, but a software issue. Disabling PSR at the driver level stopped all GPU resets and display hangs.
For anyone running into the same issue, create
/etc/modprobe.d/amdgpu-disable-psr.conf
with
options amdgpu dcdebugmask=0x10
and rebuild the initramfs. That completely eliminated the freezes on my system.
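To sanity-check the value, here is a Python sketch that decodes `dcdebugmask` as a bit field and reads back what the loaded module is using. The bit names are our reading of the low bits of the `DC_DEBUG_MASK` enum in drivers/gpu/drm/amd/include/amd_shared.h (newer kernels define further bits, omitted here), and the sysfs path /sys/module/amdgpu/parameters/dcdebugmask is assumed; verify both against your kernel:

```python
from enum import IntFlag
from pathlib import Path


class DcDebugMask(IntFlag):
    # Assumed from enum DC_DEBUG_MASK in amd_shared.h; low bits only.
    DISABLE_PIPE_SPLIT = 0x1
    DISABLE_STUTTER = 0x2
    DISABLE_DSC = 0x4
    DISABLE_CLOCK_GATING = 0x8
    DISABLE_PSR = 0x10


def decode(mask: int) -> list[str]:
    """Names of the known bits set in mask (unknown bits are ignored)."""
    return [flag.name for flag in DcDebugMask if mask & flag]


def modprobe_line(mask: int) -> str:
    """Line for /etc/modprobe.d/amdgpu-disable-psr.conf."""
    return f"options amdgpu dcdebugmask={mask:#x}"


def psr_disabled(param_text: str) -> bool:
    """True if a dcdebugmask value (decimal or 0x-hex text) has the PSR bit."""
    return bool(int(param_text.strip(), 0) & DcDebugMask.DISABLE_PSR)


def loaded_mask(path: str = "/sys/module/amdgpu/parameters/dcdebugmask") -> str:
    """Value the running module was loaded with (sysfs path assumed)."""
    return Path(path).read_text()
```

After rebooting with the new initramfs, `psr_disabled(loaded_mask())` returning True suggests the option was actually applied rather than silently ignored.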
The MES issue still exists, but it’s clearly unrelated; it wasn’t the trigger for these freezes.
Hopefully a future firmware/kernel update will include an official fix so this workaround won’t be necessary anymore.