AMD GPU MES Timeouts Causing System Hangs on Framework Laptop 13 (AMD AI 300 Series)

Jan_Theofel · June 30, 2025, 4:18pm

System Information

Model: Framework Laptop 13 (AMD Ryzen AI 300 Series)
CPU: AMD Ryzen AI (specific model not specified)
GPU: AMD Radeon 860M
OS: Ubuntu 24.04.2 LTS
Kernel: 6.11.0-28-generic

Problem Description

Experiencing repeated system hangs caused by AMD GPU MES (Micro Engine Scheduler) timeouts. The system becomes completely unresponsive, requiring hard power-off via power button. This issue occurs multiple times per day.

Error Messages and Logs

Primary Error Pattern:

amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait

Critical GPU Timeout Leading to System Hang:

amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=33651, emitted seq=33652
amdgpu 0000:c1:00.0: amdgpu: Process information: process RDD Process pid 4263 thread firefox:cs0 pid 4708
amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!

Reproduction Steps

Cannot be reliably reproduced. Issue occurs randomly during normal system usage, for example when using Firefox with hardware acceleration enabled.

Impact

System becomes completely unusable
Requires hard power-off (potential data loss)
Occurs multiple times per day
Affects normal web browsing activities

Current Workaround

Temporary Fix: Adding kernel parameter amdgpu.mes=0 to disable MES

sudo vi /etc/default/grub
# Add to GRUB_CMDLINE_LINUX_DEFAULT: amdgpu.mes=0
sudo update-grub
sudo reboot

Verification: After reboot, check with cat /proc/cmdline to confirm parameter is active.
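
A minimal sketch of that check, assuming a POSIX shell. The function name has_kernel_param is made up for illustration. It matches the parameter as a whole token, so a longer parameter that merely contains the same text does not count as a match.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole token on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the override exists only for testing.
has_kernel_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    # Put each token on its own line, then require an exact full-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

if has_kernel_param "amdgpu.mes=0"; then
    echo "amdgpu.mes=0 is active"
else
    echo "amdgpu.mes=0 is NOT active"
fi
```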

Result: This workaround successfully prevents the MES timeouts and system hangs, but is only a temporary solution as it disables the MES system entirely.

Expected Behavior

The AMD GPU MES (Micro Engine Scheduler) should respond reliably to system commands without timing out, allowing normal system operation.

Mario_Limonciello · June 30, 2025, 7:44pm

Please use the OEM kernel. The VCN issue is fixed there. I haven’t seen the MES issue but all testing happens on OEM kernel.

Jan_Theofel · July 1, 2025, 6:18am

Didn’t know that yet. Thanks, I’ll install it. Might that also fix a Wi-Fi issue (it loses the connection a few times a day)?

Mario_Limonciello · July 1, 2025, 1:09pm

I haven’t seen that myself. But it may be access point dependent.

Btw there are two OEM kernels: 6.11 and 6.14. 6.14 will have better performance.

Jan_Theofel · July 3, 2025, 7:52pm

Using the OEM kernel did not make things better. Instead, I now get regular crashes every 10 minutes or so.

I upgraded to OEM kernel 6.14.0-1005-oem following Framework’s Linux compatibility recommendations for improved AMD AI 300 Series hardware support.

New Issue After Upgrade: Following the OEM kernel installation, I encountered new DisplayPort AUX communication failures causing:

Mouse and keyboard input freezes
X-Server crashes
System hangs requiring hard power-off
Primary trigger appears to be Firefox with hardware acceleration enabled

Attempted Resolution: I added the amdgpu.sg_display=0 parameter to disable Scatter/Gather for the display subsystem:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.mes=0 amdgpu.sg_display=0"

Current Situation: Unfortunately, despite both kernel parameters being active, DisplayPort AUX failures continue to occur with the same error patterns:

amdgpu: [drm] amdgpu: DP AUX transfer fail:4
amdgpu: [drm] amdgpu: AUX partially written
amdgpu: [drm] amdgpu: AUX reply command not ACK: 0x01
XIO: fatal IO error 4 on X server ":0"
(EE) AMDGPU(0): failed to set mode: Permission denied

But the issues I had with the Wi-Fi are gone with the OEM kernel.

If there is a workaround, please let me know. For production use I have to switch back to the regular kernel.

Mario_Limonciello · July 3, 2025, 8:13pm

The aux messages are just noise. There is a missing patch in OEM kernel that makes them quieter.

Don’t use parameters to turn off sg display or mes. Please drop all that and share specifics of your crash without them and I’ll advise as best I can.

Also I STRONGLY suggest you don’t use X11. There are power management features that only work in Wayland. You will have higher power consumption.

Jan_Theofel · July 3, 2025, 9:11pm

No Wayland please. I had multiple issues with it. I can try to solve those later when my system is running without this GPU bug.

I removed the two parameters from GRUB and loaded OEM kernel 6.14.0-1005.
It took maybe 10 minutes until the first crash happened.

This is what I find in the logs (the first two lines are repeated over and over again):

Jul 03 22:45:14 jan-framwork13 kernel: amdgpu 0000:c1:00.0: amdgpu: [drm] amdgpu: AUX partially written
Jul 03 22:45:14 jan-framwork13 kernel: amdgpu 0000:c1:00.0: amdgpu: [drm] amdgpu: AUX reply command not ACK: 0x01.
Jul 03 22:45:14 jan-framwork13 kernel: audit: type=1400 audit(1751575514.794:265): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/24718/usr/lib/snapd/snap-confine" pid=3098 comm="snap-confine" capability=12 capname="net_admin"
Jul 03 22:45:14 jan-framwork13 kernel: audit: type=1400 audit(1751575514.794:266): apparmor="DENIED" operation="capable" class="cap" profile="/snap/snapd/24718/usr/lib/snapd/snap-confine" pid=3098 comm="snap-confine" capability=38 capname="perfmon"
Jul 03 22:45:14 jan-framwork13 kernel: audit: type=1400 audit(1751575514.799:267): apparmor="DENIED" operation="open" class="file" profile="snap-update-ns.snapd-desktop-integration" name="/proc/3129/maps" pid=3129 comm="snap-update-ns" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 03 22:45:14 jan-framwork13 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Jul 03 22:45:14 jan-framwork13 kernel: rfkill: input handler disabled
Jul 03 22:45:15 jan-framwork13 kernel: audit: type=1326 audit(1751575515.098:268): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.snapd-desktop-integration.snapd-desktop-integration pid=3188 comm="snapd-desktop-i" exe="/snap/snapd-desktop-integration/315/usr/bin/snapd-desktop-integration" sig=0 arch=c000003e syscall=203 compat=0 ip=0x7274c4c5c531 code=0x50000
Jul 03 22:45:15 jan-framwork13 kernel: audit: type=1326 audit(1751575515.098:269): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.snapd-desktop-integration.snapd-desktop-integration pid=3188 comm="snapd-desktop-i" exe="/snap/snapd-desktop-integration/315/usr/bin/snapd-desktop-integration" sig=0 arch=c000003e syscall=203 compat=0 ip=0x7274c4c5c531 code=0x50000
Jul 03 22:45:15 jan-framwork13 kernel: audit: type=1326 audit(1751575515.098:270): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.snapd-desktop-integration.snapd-desktop-integration pid=3188 comm="snapd-desktop-i" exe="/snap/snapd-desktop-integration/315/usr/bin/snapd-desktop-integration" sig=0 arch=c000003e syscall=203 compat=0 ip=0x7274c4c5c531 code=0x50000
Jul 03 22:45:15 jan-framwork13 kernel: audit: type=1326 audit(1751575515.098:271): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.snapd-desktop-integration.snapd-desktop-integration pid=3188 comm="snapd-desktop-i" exe="/snap/snapd-desktop-integration/315/usr/bin/snapd-desktop-integration" sig=0 arch=c000003e syscall=203 compat=0 ip=0x7274c4c5c531 code=0x50000
Jul 03 22:45:22 jan-framwork13 kernel: show_signal_msg: 30 callbacks suppressed
Jul 03 22:45:22 jan-framwork13 kernel: gdbus[3794]: segfault at 0 ip 00007831fdd9b0c2 sp 00007831e9f1ea48 error 4 in libc.so.6[19b0c2,7831fdc28000+188000] likely on CPU 2 (core 2, socket 0)
Jul 03 22:45:22 jan-framwork13 kernel: Code: 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 89 f8 09 f0 c1 e0 14 3d 00 00 00 f8 0f 87 2a 03 00 00 62 e1 fe 28 6f 07 62 b2 7d 20 26 d0 <62> f1 7d 22 74 0e c5 fb 93 c9 ff c1 74 50 0f bc c9 0f b6 04 0f 0f
Jul 03 22:45:23 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 0
Jul 03 22:45:51 jan-framwork13 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Jul 03 22:46:28 jan-framwork13 systemd-journald[489]: Time jumped backwards, rotating.
Jul 03 22:50:31 jan-framwork13 kernel: warning: `Socket Thread' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
Jul 03 22:51:15 jan-framwork13 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Jul 03 22:51:22 jan-framwork13 kernel: input: WH-1000XM3 (AVRCP) as /devices/virtual/input/input16

Mario_Limonciello · July 4, 2025, 4:33am

I don’t see a graphics crash here. That looks like an unrelated userspace crash to me.

Jan_Theofel · July 4, 2025, 5:13am

Thanks for your answer. But how can I fix this? I can hardly use my system without a fix.

So here are the things I want to summarize:

The problem goes away when I use the standard kernel with the amdgpu.mes=0 option.
It becomes much more intense when switching from the standard kernel to the OEM kernel (from 1-2 times a day to every 5-15 minutes).
So there must be something kernel/GPU related to it, right?
So far it only happens when using Firefox, most of the time when I am using AI web pages. But the first two crashes happened while using canva.com.
I turned off hardware acceleration in Firefox because it might be related to a GPU issue. That didn’t help.

One workaround might be using a different web browser, which would also show whether this is indeed a Firefox-related issue.

But are there other things I can do to trace this and find a fix for it?

Thanks!

Jan_Theofel · July 5, 2025, 3:36pm

Update:
I used the generic kernel for the last two days without the MES parameter in GRUB and used the Vivaldi browser instead of Firefox, and I did not have a single crash. So it has to be related to Firefox somehow.

This is what I wanted to write.

But while I was writing, the system hung again while I was using Vivaldi and typing this text. So it is not just Firefox related.

I rebooted at 17:10 using the OEM kernel. And while I continued typing this text, the system hung again.

At 17:16 the system crashed again with the OEM kernel while using the Vivaldi browser.

At 17:20 the next crash happened while using the terminal.

There is no obvious difference in the usage that I could see.
I really need help to get a stable system, or to understand that this might be a hardware issue and that I need to change something physically.

Here are the kernel error logs.
Jul 05 17:03:47 jan-framwork13 wpa_supplicant[1233]: bgscan simple: Failed to enable signal strength monitoring
-- Boot 970949319c3d4f509b648d3cd88db06f --
Jul 05 17:10:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 256
Jul 05 17:10:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 0
Jul 05 17:10:37 jan-framwork13 kernel: failed to load firmware /amdtee/f29bb3d9-bd66-5441-afb88acc2b2b60d6.bin
Jul 05 17:10:37 jan-framwork13 kernel: failed to copy TA binary
Jul 05 17:10:37 jan-framwork13 kernel: Failed to open TEE session err:0x0, rc:-12
Jul 05 17:10:37 jan-framwork13 kernel: amd-pmf AMDI0107:00: Failed to open TA session (-12)
Jul 05 17:10:40 jan-framwork13 bluetoothd[1144]: profiles/sap/server.c:sap_server_register() Sap driver initialization failed.
Jul 05 17:10:40 jan-framwork13 bluetoothd[1144]: sap-server: Operation not permitted (1)
Jul 05 17:10:44 jan-framwork13 wpa_supplicant[1231]: bgscan simple: Failed to enable signal strength monitoring
Jul 05 17:10:51 jan-framwork13 gdm3[1755]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:10:56 jan-framwork13 gdm-password][2567]: gkr-pam: unable to locate daemon control file
Jul 05 17:10:57 jan-framwork13 gdm3[1755]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:10:58 jan-framwork13 systemd[2590]: Failed to start app-gnome-gnome\x2dkeyring\x2dsecrets-2920.scope - Application launched by gnome-session-binary.
Jul 05 17:11:00 jan-framwork13 systemd[2590]: Failed to start app-gnome-ubuntu\x2dreport\x2don\x2dupgrade-3215.scope - Application launched by gnome-session-binary.
Jul 05 17:11:01 jan-framwork13 gdm3[1755]: Gdm: on_display_removed: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
-- Boot 818a899ab8ea421eb771bc5c4f49e115 --
Jul 05 17:15:47 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 256
Jul 05 17:15:47 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 0
Jul 05 17:15:47 jan-framwork13 kernel: failed to load firmware /amdtee/f29bb3d9-bd66-5441-afb88acc2b2b60d6.bin
Jul 05 17:15:47 jan-framwork13 kernel: failed to copy TA binary
Jul 05 17:15:47 jan-framwork13 kernel: Failed to open TEE session err:0x0, rc:-12
Jul 05 17:15:47 jan-framwork13 kernel: amd-pmf AMDI0107:00: Failed to open TA session (-12)
Jul 05 17:15:49 jan-framwork13 bluetoothd[1151]: profiles/sap/server.c:sap_server_register() Sap driver initialization failed.
Jul 05 17:15:49 jan-framwork13 bluetoothd[1151]: sap-server: Operation not permitted (1)
Jul 05 17:15:53 jan-framwork13 wpa_supplicant[1216]: bgscan simple: Failed to enable signal strength monitoring
Jul 05 17:16:00 jan-framwork13 gdm3[1774]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:16:09 jan-framwork13 gdm-password][2583]: gkr-pam: unable to locate daemon control file
Jul 05 17:16:10 jan-framwork13 gdm3[1774]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:16:11 jan-framwork13 systemd[2606]: Failed to start app-gnome-gnome\x2dkeyring\x2dpkcs11-2939.scope - Application launched by gnome-session-binary.
Jul 05 17:16:14 jan-framwork13 gdm3[1774]: Gdm: on_display_removed: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
-- Boot f988383e531a4dd8b0b99e237bcbd250 --
Jul 05 17:21:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 256
Jul 05 17:21:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 0
Jul 05 17:21:37 jan-framwork13 kernel: snd_acp_pci 0000:c1:00.5: Unsupported device revision:0x71
Jul 05 17:21:37 jan-framwork13 kernel: snd_acp_pci 0000:c1:00.5: probe with driver snd_acp_pci failed with error -22
Jul 05 17:21:37 jan-framwork13 kernel: failed to load firmware /amdtee/f29bb3d9-bd66-5441-afb88acc2b2b60d6.bin
Jul 05 17:21:37 jan-framwork13 kernel: failed to copy TA binary
Jul 05 17:21:37 jan-framwork13 kernel: Failed to open TEE session err:0x0, rc:-12
Jul 05 17:21:37 jan-framwork13 kernel: amd-pmf AMDI0107:00: Failed to open TA session (-12)
Jul 05 17:21:40 jan-framwork13 bluetoothd[1418]: profiles/sap/server.c:sap_server_register() Sap driver initialization failed.
Jul 05 17:21:40 jan-framwork13 bluetoothd[1418]: sap-server: Operation not permitted (1)
Jul 05 17:21:45 jan-framwork13 wpa_supplicant[1503]: bgscan simple: Failed to enable signal strength monitoring
Jul 05 17:21:51 jan-framwork13 gdm3[1775]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:21:59 jan-framwork13 gdm-password][2649]: gkr-pam: unable to locate daemon control file
Jul 05 17:21:59 jan-framwork13 gdm3[1775]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 05 17:22:00 jan-framwork13 systemd[2673]: Failed to start app-gnome-gnome\x2dkeyring\x2dpkcs11-3001.scope - Application launched by gnome-session-binary.
Jul 05 17:22:00 jan-framwork13 systemd[2673]: Failed to start app-gnome-gnome\x2dkeyring\x2dsecrets-2999.scope - Application launched by gnome-session-binary.
Jul 05 17:22:00 jan-framwork13 systemd[2673]: Failed to start app-gnome-xdg\x2duser\x2ddirs-3018.scope - Application launched by gnome-session-binary.
Jul 05 17:22:03 jan-framwork13 gdm3[1775]: Gdm: on_display_removed: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed

Thanks!
Jan

Jan_Theofel · July 5, 2025, 5:06pm

I also installed all available updates today and rebooted. The next crash came just a few minutes later.

And I should add that I get a ton of these messages:
Jul 05 18:48:09 jan-framwork13 kernel: xhci_hcd 0000:c3:00.0: Refused to change power state from D0 to D3hot

Maybe this helps?

I also installed rasdaemon to check for hardware errors. It did not find anything.

Mario_Limonciello · July 5, 2025, 6:18pm

Can you try some Fedora 42 live media and see how things work? This will rule out anything in Ubuntu’s OEM kernel, the firmware, or anything else you’ve done. If that’s not stable, I feel you really should contact Framework support. I’ll try to explain a few other things you have in your message, though, so you can rule them out.

Jul 05 17:10:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 256
Jul 05 17:10:37 jan-framwork13 kernel: ucsi_acpi USBC000:00: unknown error 0

This is coming from the EC and PD controller related to their handling of some UCSI messages from the kernel requesting some information. It’s a red herring to a stability issue.

Jul 05 17:10:37 jan-framwork13 kernel: failed to load firmware /amdtee/f29bb3d9-bd66-5441-afb88acc2b2b60d6.bin
Jul 05 17:10:37 jan-framwork13 kernel: failed to copy TA binary
Jul 05 17:10:37 jan-framwork13 kernel: Failed to open TEE session err:0x0, rc:-12
Jul 05 17:10:37 jan-framwork13 kernel: amd-pmf AMDI0107:00: Failed to open TA session (-12)

This is related to a missing firmware binary in Ubuntu but that is upstream. It’s not required for framework systems so it’s just noise.

Jul 05 18:48:09 jan-framwork13 kernel: xhci_hcd 0000:c3:00.0: Refused to change power state from D0 to D3hot
Maybe this helps?

It’s really hard to look at a snippet totally out of context and tell you what’s going on. For example that message about failing to change power states can happen when unplugging a dock.
I have a patch series under review upstream for removing some of those messages in certain scenarios.

The best way to look at a problem from the previous boot is to look at the journal from that boot. Something like this:

journalctl -b-1

will get you the entire journal from that boot. Throw it on a gist and hopefully the actual problems stand out.
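
As a rough sketch of how to make the GPU-related lines stand out in such a journal, the filter below keeps only lines that resemble the amdgpu errors quoted earlier in this thread (MES not responding, ring timeouts, GPU resets). The function name filter_gpu_errors and the exact pattern are assumptions for illustration. Other relevant messages may use different wording, so the full journal is still the thing to share.

```shell
# filter_gpu_errors: read journal text on stdin and print only the lines
# that look like the amdgpu failures quoted in this thread.
filter_gpu_errors() {
    grep -E 'amdgpu.*(MES failed to respond|ring .* timeout|GPU reset)'
}

# Typical use on the affected machine (previous boot):
#   journalctl -b -1 --no-pager > previous-boot.log       # full journal, for sharing
#   journalctl -k -b -1 --no-pager | filter_gpu_errors    # kernel messages only, filtered
```

Here -k limits the output to kernel messages and --no-pager writes straight to the pipe or file instead of opening a pager.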

Mario_Limonciello · July 5, 2025, 6:20pm

Oh! I think I see what’s going on. I was looking at linux-oem-6.14 package : Ubuntu to see which patches are in which kernel.
The VCN patch is in 6.14.0-1007.7 which is in proposed.
The repo has 6.14.0-1006.6 as the default right now.

You need to pull that newer one from proposed for now.
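
To tell whether the running kernel is already at that build, a version-string comparison is enough. This is a minimal sketch, assuming GNU sort with -V (version sort) is available, as it is on Ubuntu. The helper name kernel_at_least is hypothetical, and the check only compares version strings, so it is meaningful only while booted into a 6.14 OEM kernel.

```shell
# kernel_at_least RUNNING REQUIRED
# Succeeds when RUNNING sorts at or after REQUIRED in version order.
kernel_at_least() {
    running="$1"
    required="$2"
    # The smaller of the two versions ends up on the first line.
    lowest=$(printf '%s\n%s\n' "$running" "$required" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# 6.14.0-1007 is the first OEM build named in this thread as carrying the VCN patch.
if kernel_at_least "$(uname -r)" "6.14.0-1007"; then
    echo "running kernel is 6.14.0-1007 or newer"
else
    echo "running kernel is older than 6.14.0-1007"
fi
```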

Here’s the detailed bug with more information: Bug #2112582 “HW accelerated video playback causes VCN timeout o...” : Bugs : linux package : Ubuntu

BTW - Besides the power management stuff I mentioned that X11 doesn’t handle well, it doesn’t handle GPU resets well, which could explain behavior if the VCN patch is missing.

Jan_Theofel · July 6, 2025, 7:06am

Thank you! I installed the proposed kernel and hope that this makes the bug vanish.

Jan_Theofel · July 6, 2025, 7:40am

Feels like the system is messing with me.
Just a few moments after my last message the system froze again.
While I was trying to write this, it froze once more. At that moment the system had no user input because I was talking to my daughter. And then a third freeze happened.

I understand that it is very hard to diagnose this issue from the information I have provided here. Can you please give me better instructions on what data to collect after the next system freeze?

And using a Fedora boot system won’t help. Even if it runs for two days without any crashes, that does not mean they are gone, because I also already had two stable days of working without any freezes under Ubuntu. And it would mean two days in which I can’t access my configured programs like email, browser, etc.

Mario_Limonciello · July 6, 2025, 9:18am

Well that’s too bad; I suspect you have multiple confounded issues. Can you please share a complete journal output from a boot that freezes using the kernel you got from proposed? Upload it to a GitHub gist or pastebin.

Hopefully it stands out what’s wrong.

Jan_Theofel · July 6, 2025, 4:47pm

I was waiting for the freeze for hours… But finally it happened again.

Log file can be found here: crash_boot_complete.log - Nextcloud

And the second freeze happened just a few minutes later: crash_boot_complete2.log - Nextcloud

I also checked the temperature. Now it is at 50-60°; the freezes this morning happened at 30-40°. I just wanted to add this because of the hot weather we are having in Germany right now.

Mario_Limonciello · July 7, 2025, 3:09am

Can you please try without the HDMI card in the slot connected to 0000:c3:00.0? It seems to me that malfunctions are happening associated with that, and I would like to see if they still happen without it.

Jan_Theofel · July 7, 2025, 6:52am

I’ll do that and let you know. I ordered another HDMI module last week, so I can test without it and also swap it for the new one.

Jan_Theofel · July 8, 2025, 8:46am

It also froze without the HDMI module.
But the “refused to change power state” messages are gone. So they clearly come from the HDMI module, and I will test with the new module when it arrives.

Here is the log from today’s crash:

It again shows a MES issue shortly before the freeze happened.
