[RESPONDED] VRAM is lost due to GPU reset! (followed by a crash)

I’m having this issue as well (Debian 12 / Kernel 6.9.3 / BIOS 3.05) while connected to a Surface Pro Dock (perhaps this dock is problematic?).

The system was idle, probably showing a screensaver via the xscreensaver* packages, and when I returned about an hour later it was on a fresh login screen, with my previous session nowhere to be found.

.xsession-errors.old

amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been rejected (-125), but the context isn't robust.
amdgpu: The process will be terminated.
X connection to :0.0 broken (explicit kill or server shutdown).
xfce4-panel-Message: 11:54:56.155: Plugin cpufreq-17 has been automatically restarted after crash.
XIO:  fatal IO error 2 (No such file or directory) on X server ":0.0"
      after 17 requests (17 known processed) with 0 events remaining.
XIO:  fatal IO error 4 (Interrupted system call) on X server ":0.0"
      after 27429 requests (27429 known processed) with 0 events remaining.
xscreensaver: 11:54:56: pid 1992976: xscreensaver-gfx exited unexpectedly with status 1: re-launching
X connection to :0.0 broken (explicit kill or server shutdown).
XGB: xgb.go:403: A read error is unrecoverable: read unix @->/tmp/.X11-unix/X0: read: connection reset by peer
XGB: xgb.go:403: A read error is unrecoverable: EOF
xscreensaver-systemd: 11:54:56: X connection closed
xscreensaver: 11:54:56: pid 3901: xscreensaver-systemd exited unexpectedly with status 1
panic: close of closed channel

There is also a flurry of bamfdaemon errors at the same time:

Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Consumed 9.779s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: vte-spawn-8cbb9485-b5d1-4d14-922f-9349e35f3b8c.scope: Consumed 25.013s>
Jun 16 11:54:56 jon-laptop systemd[2818]: vte-spawn-3f4c43d5-eab8-41b6-bace-c61ab23c520d.scope: Consumed 29.242s>
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Main process exited, code=exited, statu>
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: gnome-terminal-server.service: Consumed 22.395s CPU time.
Jun 16 11:54:56 jon-laptop dbus-daemon[2840]: [session uid=1000 pid=2840] Monitoring connection :1.307 closed.
Jun 16 11:54:56 jon-laptop at-spi-bus-launcher[3055]: X connection to :0 broken (explicit kill or server shutdow>
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Scheduled restart job, restart counter is at 1.
Jun 16 11:54:56 jon-laptop systemd[2818]: Stopped bamfdaemon.service - BAMF Application Matcher Framework.
Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Consumed 9.779s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: Starting bamfdaemon.service - BAMF Application Matcher Framework...
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Main process exited, code=exited, st>
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gnome.service: Consumed 3.627s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Main process exited, code=exited, stat>
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xdg-desktop-portal-gtk.service: Consumed 1.147s CPU time.
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Main process exited, code=exited, status=1/FAIL>
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Failed with result 'exit-code'.
Jun 16 11:54:56 jon-laptop systemd[2818]: xfce4-notifyd.service: Consumed 1.878s CPU time.
Jun 16 11:54:57 jon-laptop bamfdaemon[2046245]: cannot open display: :0
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Failed with result 'exit-code'.
Jun 16 11:54:57 jon-laptop systemd[2818]: Failed to start bamfdaemon.service - BAMF Application Matcher Framewor>
Jun 16 11:54:57 jon-laptop systemd[2818]: bamfdaemon.service: Scheduled restart job, restart counter is at 2.
Jun 16 11:54:57 jon-laptop systemd[2818]: Stopped bamfdaemon.service - BAMF Application Matcher Framework.
Jun 16 11:54:57 jon-laptop systemd[2818]: Starting bamfdaemon.service - BAMF Application Matcher Framework...
Jun 16 11:54:58 jon-laptop bamfdaemon[2046278]: Invalid MIT-MAGIC-COOKIE-1 key
Jun 16 11:54:58 jon-laptop bamfdaemon[2046278]: cannot open display: :0

I’m guessing the bamf errors are a result of the session crash.
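
One rough way to check that guess is to group the journal lines by timestamp and see which user services failed in the same second. The Python sketch below assumes the short journalctl line format pasted above; the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines in the short journalctl format pasted above, e.g.
# "Jun 16 11:54:56 jon-laptop systemd[2818]: bamfdaemon.service: Failed with result 'exit-code'."
FAILED = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ systemd\[\d+\]: "
    r"(?P<unit>\S+\.service): Failed with result"
)


def failed_units_by_second(lines):
    """Map each timestamp to the set of units that reported a failure at that time."""
    failures = defaultdict(set)
    for line in lines:
        match = FAILED.match(line)
        if match:
            failures[match["stamp"]].add(match["unit"])
    return dict(failures)
```

If many unrelated services show up under the same second as the "X connection ... broken" messages, that is consistent with them being casualties of the X server going away rather than the cause.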

Any ideas?

At first I also thought the issue was caused by my dock (a Lenovo one), but for me the issue has also appeared when on the go, with no dock or external power connected.
So far I have tried a lot of things. They seem to make the issue happen less often, but sadly none of them made it go away. The things I have tried:

  • Set GPU RAM to 4GB in BIOS settings
  • Restrict GPU clock rate to lowest possible via /sys/class/drm/card0/device/power_dpm_force_performance_level
  • Manually install “missing” AMD firmware
  • Reinstall DisplayLink drivers
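
For anyone who wants to try the clock restriction from the list above, here is a minimal Python sketch. It assumes the GPU is card0 (as in the path above), that it runs as root, and that the level names match those listed in the kernel's amdgpu documentation; the function names are hypothetical.

```python
from pathlib import Path

# Path from the list above; the card index (card0) may differ on other machines.
DPM_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Level names documented for the amdgpu driver.
VALID_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk", "profile_min_mclk", "profile_peak",
}


def read_level(path: Path = DPM_PATH) -> str:
    """Return the currently forced performance level."""
    return path.read_text().strip()


def set_level(level: str, path: Path = DPM_PATH) -> None:
    """Write a new performance level; needs root on the real sysfs file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    path.write_text(level + "\n")


# Example (as root): set_level("low") pins the clocks to the lowest state,
# and set_level("auto") hands control back to the driver.
```

The setting is not persistent, so it has to be reapplied after every reboot.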

By now the crashes happen at least once a week, and when doing heavy work they can happen multiple times a day. I love my Framework laptop, but this issue is really frustrating. I’m now at a point where I don’t even dare to open openstreetmap.org anymore because it is prone to causing a crash.
Yes, I know that Debian is not officially supported, but come on… Ubuntu is based on Debian, and Debian is not some weird niche OS that only a few people use… I’m okay with it not running super stable, but even if it is not officially supported, it should at least not crash every day!

I’ve had issues with this dock when using Bookworm on a Surface, so I suspect the dock support is iffy.

There are some recommended docks here:

Another thing that helped immensely was installing a newer kernel and the latest amdgpu firmware.

If you can reproduce your issue on a mainline (upstream), non-EOL kernel with updated GPU firmware, you can report it to AMD’s bug tracker. There are no promises that the issue will be solved, but that’s the way things get fixed.

This just happened to me out of nowhere, too.

I just received my Framework 13 back from service and updated to the latest Arch Linux kernel and Linux firmware from Git, as well as the firmware from 3.03 to 3.05.

Which AMD bug tracker are you referring to?

I am using my Framework for work and this instability wasn’t there before updating.


I have now had this crash for the fourth time this week as well.
I’m on Fedora 40 with firmware at 3.05; LVFS says no updates are available.
Framework 13 7840, 2nd batch.
journalctl says that amdgpu has a page fault and the GPU resets.
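
To pull just those messages out of the kernel log, something like the Python sketch below may help. The search patterns are taken from wording in this thread and are an assumption about what the kernel actually prints on a given system; the function names are made up, and looking at earlier boots requires a persistent journal.

```python
import re
import subprocess

# Substrings taken from this thread; real kernel messages may be worded differently.
PATTERNS = re.compile(
    r"amdgpu.*(page fault|gpu reset|vram is lost)|the cs has been rejected",
    re.IGNORECASE,
)


def gpu_fault_lines(lines):
    """Keep only lines that look like amdgpu fault or reset messages."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def kernel_log(boot=0):
    """Return kernel messages for a boot (0 = current, -1 = previous) via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", str(boot), "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Example: print("\n".join(gpu_fault_lines(kernel_log())))
```

The filtered output is also the kind of excerpt that is useful to attach to an upstream bug report.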

I don’t know if there is a connection, but the device got really hot and unresponsive (had to hard reset) a day before the issues began. It felt like the fans didn’t go on (or were just really low rpm), and only after a restart did they spin up.

After about 50 crashes caused by this bug, my root partition was finally damaged so badly that I was not able to recover it. So I left Debian 12 behind and reinstalled with Fedora. I have been running Fedora 40 for about 2 weeks now without a single crash or “VRAM lost” problem. All the things that reliably caused a crash on Debian now run completely normally. (I still have other issues that were not present on Debian, though, like buggy video playback.)

Sounds like outdated GPU firmware in Debian. The firmware there is from before Phoenix was in production. Stability has increased with the newer firmware, but Debian refuses to update it.