[RESPONDED] VRAM is lost due to GPU reset! (followed by a crash)

Thank you for filing the bug report, @northivanastan; by helping Mario, you’re helping all of us. Much appreciated.

@grant2 (or anyone else with this problem): could you upgrade your BIOS to version 3.05 and see if the GPU resets still occur? I’ve done so myself and disabled a few other workarounds I had in place (apart from the upgraded Mesa), and I haven’t been able to reproduce the bug yet.

Thank you for bringing this to my attention.

I am going to wait for the BIOS upgrade to get mainlined and integrated into some kind of process (such as apt or fwupdmgr) instead of doing a manual upgrade. I’m trying to use my machine in a production setting, and I want to fiddle with it as little as possible.

That said, I did adjust the BIOS settings so that GPU mode now has the value UMA_GAME_OPTIMIZED. However, I have not yet had an hour-long conference call to see whether my machine dies.

As an update, toggling the BIOS setting so that GPU mode has the value UMA_GAME_OPTIMIZED did not solve my problem. My computer still dies after about 40 minutes of being on a conference call.

I am going to take another look at the BIOS update mentioned by @northivanastan. I will provide further updates about the success or failure of that approach.

Unfortunately, Debian won’t update stable for a bugfix like this, only for security bugs. For Debian, “stable” is meant to mean “not changing”, not “not buggy”. So stable doesn’t get new package versions; it gets security patches backported to its existing packages as needed.

A better route would be to see whether a newer Mesa package could be backported to stable and then offered for upload to https://backports.debian.org/

My GNOME session crashed, and I am wondering if it is related to this bug. I was able to press C-M-F1 to log out and switch users so I could check the log file:

Apr 30 15:33:53 df gnome-shell[2775]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Apr 30 15:35:48 df gnome-shell[2775]: Window manager warning: last_user_time (142244507) is greater than comparison timestamp (142244506).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
Apr 30 15:35:48 df gnome-shell[2775]: Window manager warning: W1016 appears to be one of the offending windows with a timestamp of 142244507.  Working around...
Apr 30 15:44:38 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x81032b for 0x81034b window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:44:40 df gnome-shell[2775]: Window manager warning: Window 0x81035a sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Apr 30 15:44:40 df gnome-shell[2775]: Window manager warning: Window 0x81035a sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Apr 30 15:47:23 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810ad3 window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:47:23 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810adf window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:47:24 df gnome-shell[2775]: Window manager warning: WM_TRANSIENT_FOR window 0x810a96 for 0x810aed window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x80004c.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: amdgpu_cs_query_fence_status failed.
Apr 30 15:48:32 df gnome-shell[17171]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Apr 30 15:48:32 df gnome-shell[17171]: amdgpu: The process will be terminated.
Apr 30 15:48:32 df gnome-shell[2775]: Connection to xwayland lost
Apr 30 15:48:32 df gnome-shell[2775]: X Wayland crashed; attempting to recover
Apr 30 15:48:32 df systemd[2613]: Stopped target gnome-session-x11-services-ready.target - GNOME session X11 services.
░░ Subject: A stop job for unit UNIT has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A stop job for unit UNIT has finished.
░░ 
░░ The job identifier is 1197 and the job result is done.
Apr 30 15:48:32 df systemd[2613]: Stopping org.gnome.SettingsDaemon.XSettings.service - GNOME XSettings service...
░░ Subject: A stop job for unit UNIT has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A stop job for unit UNIT has begun execution.
░░ 
░░ The job identifier is 1198.
Apr 30 15:48:32 df gnome-shell[2775]: Using public X11 display :0, (using :1 for managed services)
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: amdgpu_cs_query_fence_status failed.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The CS has been rejected (-125). Recreate the context.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Apr 30 15:48:32 df gnome-shell[2775]: amdgpu: The process will be terminated.
Apr 30 15:48:32 df systemd[2613]: org.gnome.SettingsDaemon.XSettings.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ An ExecStart= process belonging to unit UNIT has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
Apr 30 15:48:32 df nautilus[124600]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-clocks[43651]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-calendar[3559]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gnome-terminal-[8989]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df xdg-desktop-por[3341]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df kdeconnectd[3150]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df evolution-alarm[3084]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-keyboard[3059]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-wacom[3083]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df xdg-desktop-por[3306]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df unknown[3047]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df gsd-power[3062]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df unknown[3061]: Error reading events from display: Broken pipe
Apr 30 15:48:32 df systemd[2613]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ An ExecStart= process belonging to unit UNIT has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
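If you are wondering, as in the post above, whether a crash matches this bug, one rough check is to scan the previous boot’s journal for the amdgpu messages quoted in this thread. A minimal Python sketch, assuming the log has first been saved to a text file (for example with `journalctl -b -1 > prev-boot.log`); the function name is hypothetical, and the marker list is copied from the logs posted here rather than being an exhaustive catalogue of amdgpu reset messages:

```python
from typing import Iterable, List

# Marker strings copied from the logs posted in this thread. This list is an
# assumption for illustration, not a complete set of amdgpu failure messages.
GPU_RESET_MARKERS = (
    "VRAM is lost due to GPU reset",          # kernel message from the thread title
    "amdgpu_cs_query_fence_status failed",    # Mesa/libdrm side, seen in every log here
    "The CS has been rejected (-125)",        # -125 is -ECANCELED on Linux
    "context isn't robust",                   # process gets terminated after this
    "detected a device reset",                # Firefox's own reset notice
)


def find_gpu_reset_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain any of the known reset markers."""
    hits = []
    for line in lines:
        if any(marker in line for marker in GPU_RESET_MARKERS):
            # Drop the trailing newline so the result prints cleanly.
            hits.append(line.rstrip("\n"))
    return hits
```

Calling `find_gpu_reset_lines(open("prev-boot.log"))` returns the matching lines. An empty result only means none of these particular strings appeared; it does not prove the GPU never reset.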

Having the same problem: FW Laptop 13 (AMD Ryzen 7040 Series), Debian GNU/Linux 12 (bookworm) x86_64, kernel 6.1.0-21-amd64.
After waking up from suspend, I’m able to work normally for some time (about 2 hours!). Then suddenly the screen turns black and that’s it. Sound and input still work flawlessly. Closing and opening the lid doesn’t do anything.
For me, this only started happening after the 3.05 firmware update. Since then it has happened four times.

...
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir firefox.desktop[5698]: amdgpu: The CS has been rejected (-125). Recreate the context.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: amdgpu_cs_query_fence_status failed.
May 13 19:28:08 fir firefox.desktop[5698]: amdgpu: The CS has been rejected (-125). Recreate the context.
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
May 13 19:28:08 fir gnome-shell[1673]: amdgpu: The process will be terminated.
May 13 19:28:08 fir evolution-alarm[2260]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gnome-shell[2397]: (EE) failed to read Wayland events: Broken pipe
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: Failed to create EGLSurface!: 0x3000
May 13 19:28:08 fir firefox.desktop[5698]: [GFX1-]: Failed to create EGLSurface. 1 renderers, 1 active.
May 13 19:28:08 fir evince[12456]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-power[2217]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-media-keys[2215]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-color[2200]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-wacom[2247]: Error reading events from display: Broken pipe
May 13 19:28:08 fir gsd-keyboard[2211]: Error reading events from display: Broken pipe
May 13 19:28:08 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
May 13 19:28:08 fir keepassxc[2339]: Error reading events from display: Broken pipe
May 13 19:28:08 fir xdg-desktop-por[2684]: Error reading events from display: Broken pipe
May 13 19:28:08 fir firefox.desktop[5698]: ExceptionHandler::GenerateDump cloned child 55938
May 13 19:28:08 fir breezebird.desktop[3923]: Exiting due to channel error.
May 13 19:28:08 fir workplace.desktop[2400]: X connection to :0 broken (explicit kill or server shutdown).
May 13 19:28:08 fir xdg-desktop-por[2528]: Error reading events from display: Broken pipe
May 13 19:28:08 fir firefox.desktop[55938]: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
May 13 19:28:08 fir firefox.desktop[5698]: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
May 13 19:28:08 fir sudo[2251]: pam_unix(sudo:session): session closed for user root
...
May 13 19:28:08 fir gsd-keyboard[55956]: Cannot open display:
May 13 19:28:08 fir gsd-color[55955]: Cannot open display:
May 13 19:28:08 fir systemd[1563]: org.gnome.SettingsDaemon.Keyboard.service: Main process exited, code=exited, status=1/FAILURE
May 13 19:28:08 fir gsd-wacom[55959]: Cannot open display:
May 13 19:28:08 fir gsd-xsettings[55960]: Cannot open display:
...
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Keyboard.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.XSettings.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.MediaKeys.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Power.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Wacom.service: Scheduled restart job, restart counter is at 6.
May 13 19:28:10 fir systemd[1563]: Stopped org.gnome.SettingsDaemon.Color.service - GNOME color management service.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Start request repeated too quickly.
May 13 19:28:10 fir systemd[1563]: org.gnome.SettingsDaemon.Color.service: Failed with result 'exit-code'.
...

Update:

  1. Contrary to my initial statement, neither sound nor input works after the screens go black. Sometimes the laptop even reboots by itself.
  2. In my case the issue may be related to Firefox, as I can now trigger the crash fairly reliably.
  3. I have only experienced this issue while using my USB-C / DisplayLink dock. The problem could be related to the dock or to the DisplayLink driver.

I have managed to record the crash happening: https://kimendisch.de/fwcrashwithdock.mp4
Let me know whether your issues follow the same pattern!

The problem that I (and I think grant) reported does involve the displays turning on and off, but the Framework does not reboot itself. It simply kicks the user back to the login screen due to a compositor crash. And it was partially fixed by firmware 3.05, in my case. (GPU resets still happen, but infrequently, and the desktop reloads cleanly.)

Hmm, ok… I thought the issue was related because my log looks pretty much the same as the one posted by grant. The weird thing is that for me the issues started only AFTER updating to 3.05! If the issue persists, I may have to open another post…

I’m having this issue as well (Debian 12 / Kernel 6.9.3 / BIOS 3.05) while connected to a Surface Pro Dock (perhaps this dock is problematic?).

The system was idle, probably showing a screensaver via the xscreensaver* packages, and when I returned about an hour later it was on a fresh login screen, with my previous session nowhere to be found.

.xsession-errors.old

amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been rejected (-125), but the context isn't robust.
amdgpu: The process will be terminated.
X connection to :0.0 broken (explicit kill or server shutdown).
xfce4-panel-Message: 11:54:56.155: Plugin cpufreq-17 has been automatically restarted after crash.
XIO:  fatal IO error 2 (No such file or directory) on X server ":0.0"
      after 17 requests (17 known processed) with 0 events remaining.
XIO:  fatal IO error 4 (Interrupted system call) on X server ":0.0"
      after 27429 requests (27429 known processed) with 0 events remaining.
xscreensaver: 11:54:56: pid 1992976: xscreensaver-gfx exited unexpectedly with status 1: re-launching
X connection to :0.0 broken (explicit kill or server shutdown).
XGB: xgb.go:403: A read error is unrecoverable: read unix @->/tmp/.X11-unix/X0: read: connection reset by peer
XGB: xgb.go:403: A read error is unrecoverable: EOF
xscreensaver-systemd: 11:54:56: X connection closed
xscreensaver: 11:54:56: pid 3901: xscreensaver-systemd exited unexpectedly with status 1
panic: close of closed channel

There is also a flurry of bamfdaemon errors at the same time:

Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Consumed 9.779s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: vte-spawn-8cbb9485-b5d1-4d14-922f-9349e35f3b8c.scope: Consumed 25.013s>
Jun 16 11:54:56 jon-laptop systemd[2818]: vte-spawn-3f4c43d5-eab8-41b6-bace-c61ab23c520d.scope: Consumed 29.242s>
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Main process exited, code=exited, statu>
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Consumed 22.395s CPU time.
Jun 16 11:54:56 jon-laptop dbus-daemon[2840]: [session uid=1000 pid=2840] Monitoring connection :1.307 closed.
Jun 16 11:54:56 jon-laptop at-spi-bus-launcher[3055]: X connection to :0 broken (explicit kill or server shutdow>
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Scheduled restart job, restart counter is at 1.
Jun 16 11:54:56 jon-laptop systemd[2818]: Stopped bamfdaemon.service - BAMF Application Matcher Framework.
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Consumed 9.779s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: Starting bamfdaemon.service - BAMF Application Matcher Framework...
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Main process exited, code=exited, st>
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Consumed 3.627s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Main process exited, code=exited, stat>
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Consumed 1.147s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Main process exited, code=exited, status=1/FAIL>
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Consumed 1.878s CPU time.
Jun 16 11:54:57 jon-laptop bamfdaemon[2046245]: cannot open display: :0
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Failed with result 'exit-code'.
Jun 16 11:54:57 jon-laptop systemd[2818]: Failed to start bamfdaemon.service - BAMF Application Matcher Framewor>
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Scheduled restart job, restart counter is at 2.
Jun 16 11:54:57 jon-laptop systemd[2818]: Stopped bamfdaemon.service - BAMF Application Matcher Framework.
Jun 16 11:54:57 jon-laptop systemd[2818]: Starting bamfdaemon.service - BAMF Application Matcher Framework...
Jun 16 11:54:58 jon-laptop bamfdaemon[2046278]: Invalid MIT-MAGIC-COOKIE-1 key
Jun 16 11:54:58 jon-laptop bamfdaemon[2046278]: cannot open display: :0

I’m guessing the bamf errors are a result of the session crash.

Any ideas?

At first I also thought the issue was caused by my dock (a Lenovo one), but for me the issue has also appeared when on the go, with no dock or external power connected.
So far I have tried a lot of things that seem to make the issue happen less often, but sadly none of them made it go away. The things I have tried:

  • Set GPU RAM to 4 GB in the BIOS settings
  • Restrict the GPU clock rate to the lowest possible via /sys/class/drm/card0/device/power_dpm_force_performance_level
  • Manually install “missing” AMD firmware
  • Reinstall DisplayLink drivers
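The clock restriction in the second bullet works by writing a level name to the amdgpu `power_dpm_force_performance_level` sysfs file; `low` forces the lowest clocks and `auto` restores the driver’s default behaviour. A minimal Python sketch, assuming root privileges and that the GPU is `card0` as in the post (the card index may differ on your machine); the function name is hypothetical, and the setting does not persist, so it resets after a reboot:

```python
from pathlib import Path

# A subset of the levels amdgpu accepts in power_dpm_force_performance_level.
VALID_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk", "profile_min_mclk", "profile_peak",
}

# Path from the post above; "card0" is an assumption and may be another index.
DEFAULT_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def set_dpm_level(level: str, path: Path = DEFAULT_PATH) -> str:
    """Write a performance level and return the value read back from the file.

    On real sysfs this needs root; the write is lost on reboot.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    path.write_text(level + "\n")
    return path.read_text().strip()
```

For example, `set_dpm_level("low")` pins the clocks low for the current boot, and `set_dpm_level("auto")` hands control back to the driver. Reading the value back confirms the driver accepted it rather than assuming the write succeeded.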

By now the crashes happen at least once a week; when I’m doing heavy work, they even happen multiple times a day. I love my Framework laptop, but this issue is just really, really frustrating. I’m now at a point where I don’t even dare to go to openstreetmap.org anymore because it is prone to cause a crash.
Yes, I know that Debian is not officially supported, but come on… Ubuntu is based on Debian, and Debian is not some weird niche OS that only a few people use… I’m okay with it not being rock solid, but even if it is not officially supported, it should at least not crash every day!

I’ve had issues with this dock when using Bookworm on a Surface; I suspect the dock support is iffy.

There are some recommended docks here:

Another thing that helped immensely was installing a newer kernel and the latest amdgpu firmware.

If you can reproduce your issue on a mainline (upstream), non-EOL kernel with updated GPU firmware, you can report it to AMD’s bug tracker. No promises that it gets solved, but that’s the way things get fixed.

This just happened to me out of nowhere, too.

I just received my Framework 13 back from service and updated to the latest Arch Linux kernel and linux-firmware from Git, as well as the firmware from 3.03 to 3.05.

Which AMD bug tracker are you referring to?

I am using my Framework for work and this instability wasn’t there before updating.

I’ve now had this crash for the fourth time this week.
I’m on Fedora 40 with firmware 3.05; LVFS says no updates are available.
Framework 13 7840, 2nd batch.
journalctl says that amdgpu hits a page fault and the GPU resets.

I don’t know if there is a connection, but the device got really hot and unresponsive (I had to hard reset it) a day before the issues began. It felt like the fans didn’t turn on (or were running at very low RPM), and only after a restart did they spin up.

After about 50 crashes caused by this bug, my root partition was finally damaged so badly that I was not able to recover it. So I left Debian 12 behind and reinstalled with Fedora. I have been running Fedora 40 for about 2 weeks now without a single crash or “VRAM lost” problem. Everything that crashed 100% of the time on Debian now runs completely normally. (I am still having other issues that were not present on Debian, though, like buggy video playback.)

Sounds like outdated GPU firmware in Debian. The firmware there is from before Phoenix was in production. Stability has increased with the newer firmware, but Debian refuses to update it.