Framework 16 AMD GPU Errors

Hello,

I have had my Framework 16 for a couple of weeks now and have been using it quite regularly. I am running Ubuntu 24.04. This morning I encountered an alarming error: a game I was playing on the dGPU closed unexpectedly. I went into System Monitor to kill some of the game's processes that were still hanging around, and when I did so GNOME froze completely except for mouse movement. After a hard reset I poked around in the system logs using journalctl and noticed AMDGPU errors. Is this something I need to be worried about? Do the errors suggest something is wrong with my RX 7700S?

Here are some of the errors:
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Jul 13 09:03:46 amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Jul 13 09:03:46 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:46 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:47 [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
Jul 13 09:03:47 [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
Jul 13 09:03:48 [drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] ERROR failed to halt cp gfx
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR ring_buffer_start = 00000000deeede98; ring_buffer_end = 00000000caf5a5d4; write_frame = 000000007dad6863
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR write_frame is pointing to address out of bounds
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR ring_buffer_start = 00000000deeede98; ring_buffer_end = 00000000caf5a5d4; write_frame = 000000007dad6863
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR write_frame is pointing to address out of bounds
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR ring_buffer_start = 00000000deeede98; ring_buffer_end = 00000000caf5a5d4; write_frame = 000000007dad6863
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR write_frame is pointing to address out of bounds
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR ring_buffer_start = 00000000deeede98; ring_buffer_end = 00000000caf5a5d4; write_frame = 000000007dad6863
Jul 13 09:03:48 [drm:psp_ring_cmd_submit [amdgpu]] ERROR write_frame is pointing to address out of bounds
Jul 13 09:03:49 [drm:psp_hw_init [amdgpu]] ERROR PSP firmware loading failed
Jul 13 09:03:49 [drm:amdgpu_device_fw_loading [amdgpu]] ERROR hw_init of IP block failed -22
Jul 13 09:03:49 amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
Jul 13 09:03:49 amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Jul 13 09:03:51 pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)
Jul 13 09:03:51 pci 0000:03:00.0: amdgpu: leaking bo va (-19)

The bottom 2 messages repeat for a large number of entries.

This is most likely an AMD driver or firmware bug. If your GPU was working OK after a reboot, it is a software bug.
Try to make sure you have the latest AMD firmware and the most up-to-date Linux kernel you can find.
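
To see what you are currently running before updating, something like the following rough sketch may help. It assumes a dpkg-based system such as Ubuntu, where the GPU firmware normally ships in the linux-firmware package; the function names are made up for illustration.

```python
import platform
import shutil
import subprocess


def kernel_release() -> str:
    """Return the running kernel release (the same string `uname -r` prints)."""
    return platform.release()


def package_version(package: str = "linux-firmware"):
    """Return the installed version of a Debian/Ubuntu package, or None.

    Assumption: firmware comes from the distro's linux-firmware package.
    Returns None when dpkg-query is unavailable or the package is not installed.
    """
    if shutil.which("dpkg-query") is None:
        return None  # not a dpkg-based system
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Version}", package],
        capture_output=True,
        text=True,
        check=False,
    )
    version = result.stdout.strip()
    return version if result.returncode == 0 and version else None


if __name__ == "__main__":
    print("kernel:", kernel_release())
    print("linux-firmware:", package_version() or "not found")
```

Including both values in any bug report makes it easier for others to tell whether a newer kernel or firmware already contains a fix.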


I'm not seeing any open bug reports for this, but it’s probably best to report such bugs here: https://gitlab.freedesktop.org/drm/amd/-/issues/

As James3 says, it sounds a lot like a driver/firmware issue.
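
Since many of the lines in the log repeat, it may be worth collapsing them before pasting into a report. Below is a minimal sketch, assuming the lines look like the ones posted above (a "Jul 13 09:03:46" style timestamp followed by the message); `summarize` is just an illustrative helper, not part of any tool.

```python
import re
from collections import Counter

# Matches a leading syslog-style timestamp such as "Jul 13 09:03:46 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")


def summarize(lines):
    """Count identical messages (timestamps ignored), in first-seen order."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if message:  # skip blank lines
            counts[message] += 1
    return list(counts.items())


if __name__ == "__main__":
    sample = [
        "Jul 13 09:03:49 [drm:psp_hw_init [amdgpu]] ERROR PSP firmware loading failed",
        "Jul 13 09:03:51 pci 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-19)",
        "Jul 13 09:03:51 pci 0000:03:00.0: amdgpu: leaking bo va (-19)",
    ]
    for message, count in summarize(sample):
        print(f"{count:5d}x  {message}")
```

The full, unfiltered log is still worth attaching to the report; the summary only makes the sequence of distinct errors easier to read.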