[SOLVED] *ERROR* MES failed to response msg=14

Hi!

I’m seeing strange behavior on a Framework Laptop 16 without the dedicated graphics card: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Debian Trixie, GNOME Shell on Wayland, Linux kernel 6.7.12, BIOS 3.03.

As a user, it feels like this: the touchpad seems slower and freezes for about 200 ms every now and then, and the keyboard feels “sluggish” too, as if keystrokes were reported in bursts a few hundred milliseconds after I typed them.

It seems correlated with these events in the journal:

Jun 12 02:08:47 kalir kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Jun 12 02:08:47 kalir kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14

They show up in bursts in the logs: I got around 2.5k of them in the last few hours.
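To see how bursty the errors really are, they can be counted per minute. A minimal sketch, assuming the journal's default `short` output format (`Mon DD HH:MM:SS host ...`); `mes_errors_per_minute` is a hypothetical helper name:

```shell
# Hypothetical helper: count "MES failed to response" lines per minute.
# Reads journal lines in the default "short" format on stdin, e.g.
#   Jun 12 02:08:47 kalir kernel: [drm:...] *ERROR* MES failed to response msg=14
mes_errors_per_minute() {
  grep -F 'MES failed to response' |
    # keep "Mon DD HH:MM": month, day, and the first 5 chars of the time
    awk '{ print $1, $2, substr($3, 1, 5) }' |
    # merge adjacent identical minutes and prefix each with its count
    uniq -c
}
```

For example, `sudo journalctl -k | mes_errors_per_minute` would print one line per minute that had errors, with the error count in front. `uniq -c` only merges adjacent lines, which is fine here because journalctl prints entries in order.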

It does not always happen: I can use the laptop for hours before it starts, and it can also resolve by itself, after which I can use the laptop for hours without being bothered by it.

Another strange thing: the kernel messages are dated in the future in the journal. We’re on Jun 11, but they are dated Jun 12, which makes the journal a mess to read:

Jun 11 15:13:33 kalir rtkit-daemon[2213]: Supervising 1 threads of 1 processes of 1 users.
Jun 11 15:13:33 kalir rtkit-daemon[2213]: Supervising 1 threads of 1 processes of 1 users.
Jun 12 02:08:41 kalir kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
Jun 12 02:08:41 kalir kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[... many duplicates of the two last lines ...]
Jun 12 02:09:57 kalir kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
Jun 12 02:09:57 kalir kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Jun 11 15:15:05 kalir rtkit-daemon[2213]: Supervising 1 threads of 1 processes of 1 users.
Jun 11 15:15:06 kalir rtkit-daemon[2213]: Supervising 1 threads of 1 processes of 1 users.
Jun 12 02:09:58 kalir kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
Jun 12 02:09:58 kalir kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait

Haha, I just found that the slowness has been noticed and logged:

Jun 11 15:15:39 kalir gnome-shell[3738]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!

Does someone have any idea?

More info, one hour later:

Error 14 is not the only one I’m getting, but it’s the most common:

$ sudo journalctl --grep '\[drm:' -ocat | logtop
5147 elements
   1 2563 [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
   2 2563 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
   3    8 amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
   5    3 [drm:amdgpu_mes_self_test [amdgpu]] *ERROR* failed to add ring
   6    3 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=2
   7    2 [drm:amdgpu_mes_add_hw_queue [amdgpu]] *ERROR* failed to add hardware queue to MES, doorbell=0x80a
   8    1 [drm:amdgpu_mes_add_hw_queue [amdgpu]] *ERROR* failed to add hardware queue to MES, doorbell=0xa00
   9    1 amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
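If `logtop` isn't installed, a similar ranking can be had with coreutils alone. A sketch; `rank_drm_messages` is a hypothetical helper name:

```shell
# Hypothetical helper: rank identical [drm:...] messages by how often they
# occur, most frequent first (a one-off alternative to logtop).
rank_drm_messages() {
  grep -F '[drm:' |  # keep only DRM driver messages
    sort |           # bring identical messages next to each other
    uniq -c |        # collapse duplicates, prefixing each with its count
    sort -rn         # highest count first
}
```

For example: `sudo journalctl --grep '\[drm:' -o cat | rank_drm_messages | head`.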

I had a missing firmware file (found using journalctl --grep firmware) and added it. I don’t know yet whether it resolves my issue; I’ll try to report back here.
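Picking the missing file out of the `firmware` matches can be scripted. A sketch that assumes the kernel firmware loader's usual `Direct firmware load for <file> failed with error <n>` warning (error -2 is ENOENT, i.e. file not found); `missing_firmware` is a hypothetical helper name:

```shell
# Hypothetical helper: print each firmware file the kernel failed to load,
# once. Assumes warnings of the form:
#   amdgpu 0000:c1:00.0: Direct firmware load for amdgpu/xxx.bin failed with error -2
missing_firmware() {
  # -n + /p: print only lines where the substitution matched
  sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' |
    sort -u
}
```

For example, `sudo journalctl -k --grep firmware -o cat | missing_firmware` would list the missing paths, relative to the firmware directory.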

Here’s what I did:

$ wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_1_mes_2.bin
$ sudo mv gc_11_0_1_mes_2.bin /lib/firmware/amdgpu/

Then I rebooted, and the missing firmware error is gone, obviously.
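When fetching a firmware file by hand, it's worth checking that it really is a binary blob: on git.kernel.org a `/tree/` URL returns cgit's HTML view of the file, while the raw file is served under `/plain/`. A minimal sketch of such a check; `looks_like_html` is a hypothetical helper, and it is only a heuristic on the first 512 bytes:

```shell
# Hypothetical helper: succeed if FILE looks like an HTML page rather than
# a binary firmware blob (heuristic: HTML markers in the first 512 bytes).
looks_like_html() {
  # -a: treat binary input as text, -q: quiet, -i: case-insensitive
  head -c 512 "$1" | grep -aqiE '<!doctype html|<html'
}
```

For example: `looks_like_html /lib/firmware/amdgpu/gc_11_0_1_mes_2.bin && echo 'this is a web page, not firmware'`.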

Adding the missing firmware did not resolve my issue: I still see msg=14 errors popping up in my logs. This time it took about 24 h before they appeared, and I still don’t know what triggers them.

Thanks for this thread, @mdk, it could help others experiencing similar issues. Marking the thread as solved.

cheers! :slight_smile:

It’s not just one missing firmware file, it’s a totally outdated snapshot. Don’t update only that one file; update the whole snapshot of the amdgpu/ firmware. And please complain to Debian so they fix it.
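Updating the whole amdgpu/ directory from a linux-firmware checkout could look roughly like this. It is a sketch, not Debian's or upstream's install procedure: `sync_amdgpu_firmware` is a hypothetical helper, it only copies files over (files that no longer exist upstream are kept), and files owned by Debian's firmware-amd-graphics package may be overwritten again by the next package upgrade.

```shell
# Hypothetical helper: copy every file from a linux-firmware checkout's
# amdgpu/ directory over the installed one, keeping a backup first.
#   $1: path to a linux-firmware checkout (must contain amdgpu/)
#   $2: installed firmware directory, e.g. /lib/firmware/amdgpu
sync_amdgpu_firmware() {
  src="$1/amdgpu"
  dest="$2"
  if [ ! -d "$src" ]; then
    echo "no amdgpu/ directory in $1" >&2
    return 1
  fi
  # Keep the old snapshot around in case the new one misbehaves
  if [ -d "$dest" ]; then
    cp -a "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  mkdir -p "$dest" || return 1
  # "$src/." copies the directory's contents, including dotfiles
  cp -a "$src/." "$dest/"
}
```

For example, after `git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git`, run `sync_amdgpu_firmware linux-firmware /lib/firmware/amdgpu` as root, then `sudo update-initramfs -u` in case the amdgpu firmware is also packed into the initramfs, and reboot.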