Batch 11 of Framework Laptop 13 (AMD Ryzen™ 7040 Series)
My Framework laptop keeps crashing, so I took a look at journalctl and noticed that it seems to be a problem with the GPU. Have others had this issue? How did you fix it?
Apr 05 01:06:32 df rtkit-daemon[1226]: Supervising 18 threads of 12 processes of 2 users.
Apr 05 01:06:32 df rtkit-daemon[1226]: Supervising 18 threads of 12 processes of 2 users.
Apr 05 01:06:48 df kernel: i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
Apr 05 01:07:02 df rtkit-daemon[1226]: Supervising 18 threads of 12 processes of 2 users.
Apr 05 01:07:02 df rtkit-daemon[1226]: Supervising 18 threads of 12 processes of 2 users.
Apr 05 01:07:12 df kernel: i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
Apr 05 01:07:17 df kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=74141, emitted seq=74143
Apr 05 01:07:17 df kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Apr 05 01:07:17 df kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:18 df kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Apr 05 01:07:18 df kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Apr 05 01:07:19 df kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Apr 05 01:07:19 df kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Apr 05 01:07:19 df kernel: [drm] VRAM is lost due to GPU reset!
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Apr 05 01:07:19 df kernel: [drm] DMUB hardware initialized: version=0x08000500
Apr 05 01:07:19 df kernel: [drm] Watermarks table not configured properly by SMU
Apr 05 01:07:19 df kernel: [drm] kiq ring mec 3 pipe 1 q 0
Apr 05 01:07:19 df kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 1
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
Apr 05 01:07:19 df kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
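To check whether a crash lines up with a GPU hang and reset like the one above, you can count the relevant amdgpu kernel messages. This is a minimal sketch, not an official tool: the function name summarize_amdgpu_resets is hypothetical, and the patterns are simply copied from the message text in this log, which may differ between kernel versions.

```shell
# summarize_amdgpu_resets (hypothetical helper): reads journal text on stdin
# and counts the amdgpu messages that mark a GPU hang and reset.
# Patterns are copied from the log in this thread and may not match every kernel.
summarize_amdgpu_resets() {
  awk '
    /amdgpu_job_timedout/ && /ring [^ ]+ timeout/ { timeouts++ }  # e.g. "ring sdma0 timeout"
    /amdgpu: GPU reset begin!/                     { resets++ }
    /VRAM is lost due to GPU reset/                { vram_lost++ }
    END {
      printf "ring_timeouts=%d gpu_resets=%d vram_lost=%d\n", timeouts, resets, vram_lost
    }
  '
}

# Example: kernel messages from the previous boot (e.g. after a hard crash):
#   journalctl -k -b -1 --no-pager | summarize_amdgpu_resets
```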
My firmware appears to have the latest updates. I try to keep on top of it, but maybe I’m missing something:
$ fwupdmgr get-updates
Devices with no available firmware updates:
• Fingerprint Sensor
• UEFI dbx
• WD BLACK SN850X 1000GB
Devices with the latest available firmware version:
• System Firmware
No updates available
I’d first try upgrading the GPU firmware, as Mario suggested: extract the amdgpu folder from upstream linux-firmware into /lib/firmware, then regenerate the initramfs with sudo update-initramfs -c -k $(uname -r).
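The firmware step above could be scripted roughly as follows. This is a hedged sketch: install_amdgpu_firmware is a hypothetical helper, the clone URL is the upstream linux-firmware repository on git.kernel.org, and the amdgpu.bak backup directory is an arbitrary name chosen here so the old files can be put back.

```shell
# install_amdgpu_firmware SRC_TREE FW_ROOT  (hypothetical helper)
# Copies SRC_TREE/amdgpu (from an upstream linux-firmware checkout) over
# FW_ROOT/amdgpu, keeping a one-time backup of the old files in FW_ROOT/amdgpu.bak.
install_amdgpu_firmware() {
  src="$1/amdgpu"
  dest="$2/amdgpu"
  if [ ! -d "$src" ]; then
    echo "install_amdgpu_firmware: $src not found" >&2
    return 1
  fi
  # Back up the installed blobs only once, so a second run keeps the original backup.
  if [ -d "$dest" ] && [ ! -e "$dest.bak" ]; then
    cp -a "$dest" "$dest.bak"
  fi
  mkdir -p "$dest"
  # Overwrite same-named files; files that exist only in dest are left alone.
  cp -a "$src/." "$dest/"
}

# Example, run as root:
#   git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
#   install_amdgpu_firmware ./linux-firmware /lib/firmware
#   update-initramfs -c -k "$(uname -r)"
```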
In my case, I ultimately had to upgrade Mesa to fix it (by apt-pinning trixie and installing from there). That caused enough dependency conflicts with other packages that I ended up upgrading fully to Debian trixie. Setting the kernel parameter amdgpu.sg_display=0 and setting GPU mode to UMA_GAME_OPTIMIZED in the BIOS settings also helped.
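For reference, on a Debian install that boots with GRUB (an assumption; systemd-boot and other boot loaders are configured differently), a kernel parameter like the one above is usually added in /etc/default/grub, which grub-mkconfig reads as a shell fragment. The quiet option shown here is only Debian's typical default and stands in for whatever is already on that line.

```shell
# /etc/default/grub (excerpt): append the parameter to the existing options.
GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.sg_display=0"
```

After editing, run sudo update-grub and reboot; cat /proc/cmdline should then include amdgpu.sg_display=0.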
I’m not sure. You can check the version of the equivalent Debian package, firmware-amd-graphics, with apt info firmware-amd-graphics (it should be 20230210 on bookworm or 20230625 on trixie). If necessary, you can revert to Debian’s version with sudo apt install --reinstall firmware-amd-graphics.
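If you want to compare the packaged firmware version in a script, the date part can be pulled out of the Version field. A minimal sketch, assuming apt info prints a line such as Version: 20230210-5 (the -5 Debian revision is only an example); the helper name fw_upstream_date is hypothetical.

```shell
# fw_upstream_date (hypothetical helper): reads `apt info` output on stdin and
# prints the 8-digit upstream snapshot date from the first Version: line.
fw_upstream_date() {
  sed -n 's/^Version: \([0-9]\{8\}\).*/\1/p' | head -n 1
}

# Example (apt warns on stderr that its CLI is not stable, so that is discarded):
#   d=$(apt info firmware-amd-graphics 2>/dev/null | fw_upstream_date)
#   [ "$d" -lt 20230625 ] && echo "older than the trixie package"
```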
@grant2 (or anyone else with this problem): could you upgrade your BIOS to version 3.05 and see whether the GPU resets still occur? I’ve done so myself and disabled a few other workarounds I had in place (apart from the upgraded Mesa), and I haven’t been able to reproduce the bug yet.
I am going to wait for the BIOS update to be distributed through a regular update mechanism (such as apt or fwupdmgr) instead of upgrading manually. I’m trying to use my machine in a production setting, and I want to fiddle with it as little as possible.
That said, I did change the BIOS settings so that GPU mode is now set to UMA_GAME_OPTIMIZED. However, I haven’t yet had an hour-long conference call to see whether my machine still dies.
Update: setting GPU mode to UMA_GAME_OPTIMIZED in the BIOS did not solve my problem. My computer still dies after about 40 minutes on a conference call.
I am going to take another look at the BIOS update mentioned by @northivanastan. I will provide further updates about the success or failure of that approach.
Unfortunately, Debian wouldn’t update stable for a bug fix like this; stable updates are generally reserved for security issues and serious bugs. In Debian, “stable” is meant to mean “not changing”, not “not buggy”. So stable doesn’t get new upstream versions; it gets fixes backported to the old packages as needed.
A better approach would be to see whether a newer Mesa package could be backported to stable and uploaded to https://backports.debian.org/
My GNOME session crashed, and I wonder whether it is related to this bug. I was able to press Ctrl+Alt+F1 to log out and switch users, then check the log:
Apr 30 15:33:53 df gnome-shell[2775]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Apr 30 15:35:48 df gnome-shell[2775]: Window manager warning: last_user_time (142244507) is greater than comparison timestamp (142244506). This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW. Trying to work around...
Apr 30 15:35:48 df gnome-shell[2775]: Window manager warning: W1016 appears to be one of the offending windows with a timestamp of 142244507. Working around...
Apr 30 15:44:38 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x81032b for 0x81034b window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:44:40 df gnome-shell[2775]: Window manager warning: Window 0x81035a sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Apr 30 15:44:40 df gnome-shell[2775]: Window manager warning: Window 0x81035a sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Apr 30 15:47:23 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810ad3 window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:47:23 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810adf window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:47:24 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810aed window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: amdgpu_cs_query_fence_status failed.
Apr 30 15:48:32 df gnome-shell[17171]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Apr 30 15:48:32 df gnome-shell[17171]: amdgpu: The process will be terminated.
Apr 30 15:48:32 df gnome-shell[2775]: Connection to xwayland lost
Apr 30 15:48:32 df gnome-shell[2775]: X Wayland crashed; attempting to recover
Apr 30 15:48:32 df systemd[2613]: Stopped target gnome-session-x11-services-ready.target - GNOME session X11 services.
░░ Subject: A stop job for unit UNIT has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit UNIT has finished.
░░
░░ The job identifier is 1197 and the job result is done.
Apr 30 15:48:32 df systemd[2613]: Stopping org.gnome.SettingsDaemon.XSettings.service - GNOME XSettings service...
░░ Subject: A stop job for unit UNIT has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit UNIT has begun execution.
░░
░░ The job identifier is 1198.
Apr 30 15:48:32 df gnome-shell[2775]: Using public X11 display :0, (using :1 for managed services)
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: amdgpu_cs_query_fence_status failed.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The CS has been rejected (-125). Recreate the context.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The process will be terminated.
Apr 30 15:48:32 df systemd[2613]: org.gnome.SettingsDaemon.XSettings.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit UNIT has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Apr 30 15:48:32 df nautilus[124600]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-clocks[43651]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-calendar[3559]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-terminal-[8989]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df xdg-desktop-por[3341]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df kdeconnectd[3150]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df evolution-alarm[3084]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-keyboard[3059]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-wacom[3083]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df xdg-desktop-por[3306]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df unknown[3047]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-power[3062]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df unknown[3061]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df systemd[2613]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit UNIT has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
I’m having the same problem: Framework Laptop 13 (AMD Ryzen 7040 Series), Debian GNU/Linux 12 (bookworm) x86_64, kernel 6.1.0-21-amd64.
After resuming from suspend, I can work normally for a while (about 2 hours!). Then the screen suddenly turns black and that’s it. Sound and input still work flawlessly. Closing and opening the lid doesn’t do anything.
For me, this only started happening after the 3.05 firmware update. Since then it has happened four times.
...
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir firefox.desktop[5698]: amdgpu: The CS has been rejected (-125). Recreate the context.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir firefox.desktop[5698]: amdgpu: The CS has been rejected (-125). Recreate the context.
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: The process will be terminated.
May 13 19:28:08 fir evolution-alarm[2260]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gnome-shell[2397]: (EE) failed to read Wayland events: Broken pipe
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: Failed to create EGLSurface!: 0x3000
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: Failed to create EGLSurface. 1 renderers, 1 active.
May 13 19:28:08 fir evince[12456]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-power[2217]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-media-keys[2215]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-color[2200]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-wacom[2247]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-keyboard[2211]: Error reading events from display: Broken pipe
May 13 19:28:08 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
May 13 19:28:08 fir keepassxc[2339]: Error reading events from display: Broken pipe
May 13 19:28:08 fir xdg-desktop-por[2684]: Error reading events from display: Broken pipe
May 13 19:28:08 fir firefox.desktop[5698]: ExceptionHandler::GenerateDump cloned child 55938
May 13 19:28:08 fir breezebird.desktop[3923]: Exiting due to channel error.
May 13 19:28:08 fir workplace.desktop[2400]: X connection to :0 broken (explicit kill or server shutdown).
May 13 19:28:08 fir xdg-desktop-por[2528]: Error reading events from display: Broken pipe
May 13 19:28:08 fir firefox.desktop[55938]: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
May 13 19:28:08 fir firefox.desktop[5698]: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
May 13 19:28:08 fir sudo[2251]: pam_unix(sudo:session): session closed for user root
...
May 13 19:28:08 fir gsd-keyboard[55956]: Cannot open display:
May 13 19:28:08 fir gsd-color[55955]: Cannot open display:
May 13 19:28:08 fir systemd[1563]: org.gnome.SettingsDaemon.Keyboard.service: Main process exited, code=exited, status=1/FAILURE
May 13 19:28:08 fir gsd-wacom[55959]: Cannot open display:
May 13 19:28:08 fir gsd-xsettings[55960]: Cannot open display:
...
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Keyboard.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.XSettings.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.MediaKeys.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Power.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Wacom.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: Stopped org.gnome.SettingsDaemon.Color.service - GNOME color management service.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Start request repeated too quickly.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Failed with result 'exit-code'.
...
The problem that I (and, I think, grant) reported does involve the displays turning off and on, but the Framework does not reboot itself; it simply kicks the user back to the login screen because the compositor crashes. In my case, firmware 3.05 partially fixed it: GPU resets still happen, but infrequently, and the desktop reloads cleanly.
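To see which programs a reset like this took down, you can list the processes that logged the "context isn't robust" message; as the logs above show, such a process is terminated after the reset, while clients that can recreate their context (like Firefox in the May 13 log) recover. A minimal sketch: the helper name list_killed_gpu_clients is hypothetical, and the pattern is copied from the messages in this thread.

```shell
# list_killed_gpu_clients (hypothetical helper): reads journal text on stdin and
# prints each "name[pid]" that logged the non-robust-context message, once.
list_killed_gpu_clients() {
  sed -n 's/^.* \([^ ][^ ]*\[[0-9][0-9]*]\): amdgpu: .*context isn.t robust.*/\1/p' | sort -u
}

# Example: the whole journal (not just kernel messages) for the current boot:
#   journalctl -b --no-pager | list_killed_gpu_clients
```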
Hmm, OK… I thought the issue was related because my log looks almost the same as the one grant posted. The weird thing is that, for me, the issues started only AFTER updating to 3.05! If the issue persists, I may have to open a separate thread…