FW Desktop locking up with the two latest kernels on Ubuntu 24.04

System: Ryzen™ AI Max+ 395 - 128GB
OS: Ubuntu 24.04 LTS
Drives: WD_BLACK SN850X 8000GB & WD_BLACK SN850X 2000GB
Display: Running X11 and not Wayland.

This system has been rock solid on kernel 6.17.0-19 and earlier kernels for months now. With kernels 6.17.0-20 and 6.17.0-22, however, it randomly hangs. Sometimes it reboots on its own a minute or so after the hang; other times I have to remove power and reboot.

I can’t find any artifacts left over from the crash; the logs all seem to stop at the point of the crash. It doesn’t appear to be a hardware problem, as it runs flawlessly on 6.17.0-19.

Curious whether anybody else is seeing this behavior, and whether anybody knows the cause and a cure.

I’m not running Wayland because this system runs VMware Workstation (latest version) very heavily, and Workstation doesn’t work so hot under Wayland.

Can you confirm it’s only in X11?

If so, try amdgpu.dcdebugmask=0x1000 to turn off IPS. This power-saving feature has known issues with how X11 handles vblank.
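To see what a dcdebugmask value actually switches off, here is a minimal Python sketch that decodes the bits. The bit names are my own transcription of the DC_DEBUG_MASK enum in the kernel’s amd_shared.h and should be treated as an assumption; check them against the source for the kernel you’re running, since the enum grows between releases.

```python
# Bit names ASSUMED from the DC_DEBUG_MASK enum in the kernel's
# drivers/gpu/drm/amd/include/amd_shared.h; verify against your kernel tree.
DC_DEBUG_MASK_BITS = {
    0x1: "DC_DISABLE_PIPE_SPLIT",
    0x2: "DC_DISABLE_STUTTER",
    0x4: "DC_DISABLE_DSC",
    0x8: "DC_DISABLE_CLOCK_GATING",
    0x10: "DC_DISABLE_PSR",
    0x20: "DC_FORCE_SUBVP_MCLK_SWITCH",
    0x40: "DC_DISABLE_MPO",
    0x80: "DC_ENABLE_DPIA_TRACE",
    0x100: "DC_ENABLE_DML2",
    0x200: "DC_DISABLE_PSR_SU",
    0x400: "DC_DISABLE_REPLAY",
    0x800: "DC_DISABLE_IPS",
    0x1000: "DC_DISABLE_IPS_DYNAMIC",
    0x2000: "DC_DISABLE_IPS2_DYNAMIC",
}


def decode_dcdebugmask(mask: int) -> list[str]:
    """Return the names of the known bits set in an amdgpu.dcdebugmask value."""
    names = [name for bit, name in sorted(DC_DEBUG_MASK_BITS.items()) if mask & bit]
    # Report set bits this table does not know about instead of silently dropping them.
    unknown = mask & ~sum(DC_DEBUG_MASK_BITS)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


if __name__ == "__main__":
    for value in (0x1000, 0x10):
        print(f"0x{value:x}: {decode_dcdebugmask(value)}")
```

Under this table, 0x1000 decodes to DC_DISABLE_IPS_DYNAMIC, while 0x10 (the value in the flag list further down the thread) decodes to DC_DISABLE_PSR, which is a different feature.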


I don’t really have a way of testing under Wayland. This is my daily driver, and VMware Workstation doesn’t cooperate with Wayland in a usable fashion.

I will try that debug mask tomorrow morning when I boot up and see how things go. Sometimes the system will run fine for several days before it locks up. I’ll make the change and keep an eye on things.

After a bunch of searching on the Internet and digging through more logs, I finally found this:

2026-04-29T13:40:48.256473-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
2026-04-29T13:40:48.256484-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: failed to reg_write_reg_wait
2026-04-29T13:40:48.352198-04:00 fmdskp-dev gsd-power[4229]: Failed to acquire idle monitor proxy: Timeout was reached
2026-04-29T13:40:49.311466-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
2026-04-29T13:40:49.311482-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: Failed to disable gfxoff!
2026-04-29T13:40:49.311483-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: Dumping IP State Completed
2026-04-29T13:40:49.312446-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
2026-04-29T13:40:49.312447-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
2026-04-29T13:40:49.312447-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1676897, emitted seq=1676900
2026-04-29T13:40:49.312448-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: Starting gfx_0.0.0 ring reset
2026-04-29T13:40:50.357423-04:00 fmdskp-dev gsd-power[4229]: Error setting property 'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Timeout was reached (g-io-error-quark, 24)
2026-04-29T13:40:50.809463-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
2026-04-29T13:40:50.809475-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: failed to reg_write_reg_wait
2026-04-29T13:40:52.914045-04:00 fmdskp-dev gsd-power[4229]: Failed to acquire idle monitor proxy: Timeout was reached
2026-04-29T13:40:53.376460-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=RESET
2026-04-29T13:40:53.376470-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: failed to reset legacy queue
2026-04-29T13:40:53.376470-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: reset via MES failed and try pipe reset -110
2026-04-29T13:40:53.376471-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: The CPFW hasn't support pipe reset yet.
2026-04-29T13:40:53.376471-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: Ring gfx_0.0.0 reset failed
2026-04-29T13:40:53.376472-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
2026-04-29T13:40:54.807713-04:00 fmdskp-dev gsd-power[4229]: Error setting property 'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Timeout was reached (g-io-error-quark, 24)
2026-04-29T13:40:55.929465-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
2026-04-29T13:40:55.929480-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: failed to reg_write_reg_wait
2026-04-29T13:40:57.733761-04:00 fmdskp-dev gsd-power[4229]: Failed to acquire idle monitor proxy: Timeout was reached
2026-04-29T13:41:00.255458-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
2026-04-29T13:41:00.255475-04:00 fmdskp-dev kernel: amdgpu 0000:c3:00.0: amdgpu: Failed to disable gfxoff!
2026-04-29T13:41:22.673501-04:00 fmdskp-dev kernel: ------------[ cut here ]------------
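If you want to check whether your own hangs show the same pattern, a rough Python sketch like this counts the amdgpu failure signatures from the excerpt above in a saved kernel log. The labels and regexes are my own shorthand for the lines quoted here, not an official amdgpu classification, and the script name in the comment is hypothetical.

```python
import re
import sys
from collections import Counter

# Shorthand labels for the failure lines in the log excerpt; not an official taxonomy.
SIGNATURES = {
    "mes_timeout": re.compile(r"MES failed to respond to msg="),
    "smu_busy": re.compile(r"SMU: I.m not done with your previous command"),
    "gfxoff": re.compile(r"Failed to disable gfxoff"),
    "ring_timeout": re.compile(r"ring \S+ timeout"),
    "gpu_reset": re.compile(r"GPU reset begin"),
}


def scan_amdgpu_log(lines):
    """Count how often each known amdgpu failure signature appears."""
    counts = Counter()
    for line in lines:
        if "amdgpu" not in line:
            continue  # skip gsd-power and other userspace noise
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: journalctl -k -b -1 | python3 scan_amdgpu.py
    print(dict(scan_amdgpu_log(sys.stdin)))
```

Feeding it the previous boot’s kernel log (journalctl -k -b -1, which only works if the journal is kept across boots) makes it easy to compare a crashing boot against a clean one.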

It seems there is a known, intermittent issue with the MicroEngine Scheduler (MES). I have added the following to the 6.17.0-22 kernel boot parameters:

amdgpu.mes=0

I will be running with this for a few days. Hopefully no more crashes.
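For anyone following along: on a stock Ubuntu install, a boot parameter like this usually goes into the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by sudo update-grub and a reboot. The helper below is a minimal, hypothetical Python sketch that only rewrites that line as text, replacing any existing value for the same parameter; it assumes the line uses double quotes and does not touch any files.

```python
import re


def add_kernel_param(grub_line: str, param: str) -> str:
    """Append a kernel parameter to a GRUB_CMDLINE_LINUX_DEFAULT="..." line.

    Text-only helper: editing /etc/default/grub and running update-grub
    afterwards is left to the user.
    """
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"', grub_line.strip())
    if match is None:
        raise ValueError("not a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line")
    prefix, value = match.groups()
    key = param.split("=", 1)[0]
    # Drop any existing setting for the same key so the new value wins.
    params = [p for p in value.split() if p.split("=", 1)[0] != key]
    params.append(param)
    joined = " ".join(params)
    return f'{prefix}"{joined}"'


if __name__ == "__main__":
    print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"', "amdgpu.mes=0"))
```

After rebooting, cat /proc/cmdline shows whether the parameter actually took effect.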

I can confirm these problems persist with 26.04 LTS, kernel 7, and Wayland.

Are you seeing the same MicroEngine Scheduler (MES) error messages?

And did this fix anything for you:

amdgpu.mes=0

This:

amdgpu.mes=0

did not fix the problem. After a few hours I ended up with this crash:

[Thu Apr 30 13:17:50 2026] amdgpu 0000:c3:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
[Thu Apr 30 13:17:50 2026] amdgpu 0000:c3:00.0: amdgpu: Failed to power gate VPE!
[Thu Apr 30 13:17:50 2026] [drm:amdgpu_dpm_enable_vpe [amdgpu]] *ERROR* Dpm disable vpe failed, ret = -62.
[Thu Apr 30 13:18:04 2026] amdgpu 0000:c3:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
[Thu Apr 30 13:18:04 2026] amdgpu 0000:c3:00.0: amdgpu: Failed to power ungate VCN instance 1!
[Thu Apr 30 13:18:04 2026] [drm:amdgpu_dpm_enable_vcn [amdgpu]] *ERROR* Dpm enable uvd failed, ret = -62. 
[Thu Apr 30 13:18:04 2026] amdgpu 0000:c3:00.0: amdgpu: failed to load ucode VCN1_RAM(0x3C) 
[Thu Apr 30 13:18:04 2026] amdgpu 0000:c3:00.0: amdgpu: psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
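Both crash logs report failures as negative kernel return codes (ret = -62 here, -110 in the MES reset message earlier). Kernel functions return -errno on failure, so a short Python sketch can translate them; the numeric errno values are Linux-specific, so run it on Linux.

```python
import errno
import os
import re


def decode_ret(code: int) -> str:
    """Translate a (negative) kernel return code into its errno name and message."""
    num = -code if code < 0 else code
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


def rets_in_line(line: str) -> list[int]:
    """Pull every 'ret = -N' value out of a kernel log line."""
    return [int(m) for m in re.findall(r"ret = (-\d+)", line)]


if __name__ == "__main__":
    line = "[drm:amdgpu_dpm_enable_vpe [amdgpu]] *ERROR* Dpm disable vpe failed, ret = -62."
    for code in rets_in_line(line):
        print(code, decode_ret(code))
    print(-110, decode_ret(-110))
```

On Linux, 62 is ETIME and 110 is ETIMEDOUT, so both crashes read as firmware not answering in time, which fits the “SMU: I’m not done with your previous command” lines.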


Sorry this is happening to you. This should be a supplier or retailer problem, not something consumers like us should have to deal with.

  • 3.0.4 firmware didn’t fix it.
  • 3.0.5 firmware didn’t fix it.
  • Wayland didn’t fix it.
  • kernel flags didn’t fix it.
  • Upgrading to 26.04 with kernel 7 didn’t fix it.

We waited 4 months to go from 3.0.4 to 3.0.5, and we got “memory improvements” and “fixed boot time.” Meanwhile, the GPU and fabric flood events keep streaming in, and those of us who bought the machine to actually use it are left with a $4000 paperweight.

Here’s the list of flags I’ve been asked to throw at this. None of them fixed it. YMMV:

  • GPU crashes
    • amdgpu.dcdebugmask=0x10
    • amdgpu.gpu_recovery=1
    • amdgpu.mes=0
  • Fabric floods
    • pcie_ports=native
    • pcie_ecrc=on

My “most stable” version was kernel 6.17.0-19 on firmware 3.0.3. I didn’t get much time to run anything below 3.0.3 because the system updated automatically when I got it, and I haven’t spent the extra time to roll back. But, to the point, I shouldn’t have to.

Yup, on days where I can’t afford downtime I’m running 6.17.0-19 because it has not crashed on me yet.
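If you’re pinning to 6.17.0-19 like this, a tiny Python check can flag when the running kernel is newer than that build. The KNOWN_GOOD tuple simply encodes the version reported as stable in this thread, and the parser assumes Ubuntu’s MAJOR.MINOR.PATCH-ABI-flavour release format; both are assumptions for illustration.

```python
import platform
import re

# Hypothetical pin: the build reported stable in this thread.
KNOWN_GOOD = (6, 17, 0, 19)


def ubuntu_kernel_tuple(release: str) -> tuple[int, ...]:
    """Parse an Ubuntu release string such as '6.17.0-19-generic' into numbers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def newer_than_known_good(release: str) -> bool:
    """True when the release sorts after KNOWN_GOOD (tuples compare element-wise)."""
    return ubuntu_kernel_tuple(release) > KNOWN_GOOD


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        if newer_than_known_good(running):
            print(f"{running} is newer than the known-good 6.17.0-19 build")
        else:
            print(f"{running} is at or below the known-good build")
    except ValueError as exc:
        print(exc)
```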

I’m assuming this is a kernel or kernel firmware issue, as reports I’m finding on the Internet show it happening on more than just the FW Desktop.

I’m still running firmware 3.0.3, as I tend to wait a while before updating firmware versions.

I guess I’m sort of stuck at a point in time now until somebody figures out what’s going on.



I’d ask the Ubuntu/Canonical people, given this behavior. At a guess, some new backport probably broke something. When they backport that much, there’s a lot of room for things to go sideways.

There is a bug report about this on the drm/amd GitLab.

The issue is a race condition in VPE power gating that is exposed by newer Mesa versions. AMD is looking into it.

But it’s exposed specifically by Chrome-based browsers with hardware video decode enabled. It doesn’t happen in Firefox.


Keep in mind that Chromium will be in non-obvious places: any Electron application could potentially trigger it too if it happens to use hardware video decode for something.
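To find those non-obvious places, this Python sketch lists running processes whose /proc/<pid>/cmdline contains --type=gpu-process, the switch Chromium and Electron pass to their GPU child process. Treat it as a rough inventory: depending on the Chromium version, video decode may run in a different helper process, and some Chromium processes rewrite their command line, so the parser splits on both NUL bytes and spaces.

```python
import os


def cmdline_args(cmdline: bytes) -> list[bytes]:
    """Split a /proc/<pid>/cmdline blob into arguments.

    Arguments are normally NUL-separated, but processes that rewrite their
    title may expose one space-separated string, so split on both.
    """
    return cmdline.replace(b"\0", b" ").split()


def is_chromium_gpu_process(cmdline: bytes) -> bool:
    """True if the command line carries Chromium's --type=gpu-process switch."""
    return b"--type=gpu-process" in cmdline_args(cmdline)


def find_gpu_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, executable) for every running Chromium-style GPU process."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read()
        except OSError:
            continue  # process exited or is not readable by this user
        if is_chromium_gpu_process(cmdline):
            exe = cmdline_args(cmdline)[0].decode(errors="replace")
            found.append((int(entry), exe))
    return sorted(found)


if __name__ == "__main__":
    for pid, exe in find_gpu_processes():
        print(pid, exe)
```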

Hmm… I’m pretty sure every time this has happened there has been a video playing in Chrome. I’ll probably try disabling accelerated video decode in Chrome and running a newer kernel over the weekend.

Do you happen to have a link to this?

Thanks much.


@Mario_Limonciello, I wrote up something here the other day that also hints at Chromium (Slack, which runs on Electron). Don’t know if this helps. I saw you were active in that GitLab thread. I don’t think this provides any smoking gun, but it is supporting evidence (I think).

Doesn’t look the same to me unfortunately.

That’s… unfortunate. I’ll keep beating on it. Looking forward to a resolution on the other issue, though.