Random freezes with Arch Linux (requires hard reboot)

Hello Framework people :wave:

It’s now been 2 months since I started using my Framework 16 and I absolutely love it. I spent a lot of time tinkering and fixing every issue I ran into on my Arch Linux install (KDE on Wayland). Everything now works wonderfully well; I’m won over and have finally started using this laptop for all my work.

However, one issue remains: random freezes (most likely kernel panics) that require a hard reboot with the power button. When it happens, I cannot do anything and the screen freezes, although the keyboard RGB still works (I can still turn it on and off). I think this is a kernel issue, since the virtual terminals are all unavailable when it happens. It often happens while the laptop is idle (e.g. I’m away or doing something else, and when I come back the screen is just frozen).

These freezes are very random: they can happen once or twice a week, but also a few times a day. Today, for example, it happened 3 times within one hour (while I was actively working with an external monitor) and then never again for the rest of the day. It feels terrible (and terrifying: what if something gets badly corrupted?).

What I have already tried:

  • Looked a lot through the Framework and Arch forums, but with no logs to go on, it’s hard to know whether any given suspect is the cause of the crash
  • Set up Kdump to get a clue of what may have gone wrong; sadly, I have not managed to get it working so far, mostly because I am using a UKI with an encrypted LUKS2 disk
  • Ran memtest for an extended period (at least 4 hours); no errors reported
  • Opened my Framework multiple times to see if I could spot anything wrong, and re-seated my RAM and SSD just in case; they seem to be properly seated, so that is not the problem

With all that in mind, I highly doubt it’s hardware-related, but then again, without any logs and with Kdump not working, it’s hard to know. Kdump does not work even when I trigger a kernel panic manually, so it’s quite possible that these freezes are kernel panics that simply never get captured.

It’s also worth noting that, on random occasions, the laptop restarts by itself instead of just freezing. However, it still shuts down ungracefully, with the same result as forcing it off with the power button.

I’m just hoping to get some ideas on what to try to pinpoint the problem, or perhaps someone here already had a similar issue and can lead me somewhere.

The “good” part of it is that it forced me into making backups weekly to an external drive, which relieves my mind a lot. :relieved:

System info:

OS: Arch Linux x86_64
Host: Laptop 16 (AMD Ryzen 7040 Series) (AJ)
Kernel: Linux 6.13.1-arch1-1
Packages: 932 (pacman)
Display (BOE0BC9): 2560x1600 @ 165 Hz (as 1706x1066) in 16" [Built-in]
DE: KDE Plasma 6.2.5
WM: KWin (Wayland)
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.26 GHz
GPU: AMD Radeon 780M [Integrated]
Memory: 8.14 GiB / 28.41 GiB (29%)
Swap: 0 B / 4.00 GiB (0%)
Disk (/): 87.14 GiB / 1.79 TiB (5%) - ext4

Same exact problem here. The only difference is that I have the Expansion Bay GPU (AMD Radeon RX 7700S). The system has been completely unusable after 5 minutes on the 6.13 kernel. I am going to try the LTS kernel and see if it is any better.

Sounds like you have already tried and ruled out most of the main possible causes. I run Arch without this issue; however, I switched to mainly booting the LTS kernel a while ago because of all the instability I was getting from the main kernel. I would try installing the LTS kernel and running it for a while to see if it still happens.

Other than that, try another distro and see if it still happens.

I had similar problems: random desktop freezes, a second SDDM login screen appearing 5 minutes after login, and graphical glitches in menus.

I always hoped it would get better after some updates. But (as far as I can tell after 3 to 5 hours of testing since) there were two things that made the system stable again:

  1. I switched to another KDE Plasma theme after I saw errors in journalctl referring to deprecated function calls in the theme I was using. Now I’m back to one of the stock themes (Breeze Dark).
  2. For the flickering: it went away after I applied this kernel parameter, found in this thread: Screen flickering on Linux kernel 6.12 - #17 by haykh

amdgpu.dcdebugmask=0x400
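A quick way to confirm that a dcdebugmask value actually reached the driver is to compare the kernel command line with what amdgpu reports. This is a sketch that assumes amdgpu exposes the parameter read-only at /sys/module/amdgpu/parameters/dcdebugmask and prints it as a decimal number (so 0x400 would show up as 1024); the function name is only for illustration.

```shell
#!/bin/sh
# Sketch: compare the requested amdgpu.dcdebugmask with what the driver reports.
# Assumption: amdgpu exposes the value at /sys/module/amdgpu/parameters/dcdebugmask.
check_dcdebugmask() {
  local cmdline_file="${1:-/proc/cmdline}"
  local param_file="${2:-/sys/module/amdgpu/parameters/dcdebugmask}"
  local requested value

  # What was requested on the kernel command line, if anything
  requested=$(grep -o 'amdgpu\.dcdebugmask=[^ ]*' "$cmdline_file" 2>/dev/null || true)
  if [ -n "$requested" ]; then
    echo "cmdline: $requested"
  else
    echo "cmdline: amdgpu.dcdebugmask not set"
  fi

  # What the loaded driver is actually using (reported in decimal)
  if [ -r "$param_file" ]; then
    value=$(cat "$param_file")
    printf 'driver: %s (0x%x)\n' "$value" "$value"
  else
    echo "driver: parameter not readable (is amdgpu loaded?)"
  fi
}

check_dcdebugmask
```

If the command line shows the parameter but the driver line does not match, the parameter most likely did not make it into the boot entry or UKI that was actually booted.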

Thanks for the suggestions. If I can’t figure something out, I will definitely try another distro and leave it idle for a while to see whether the freezes happen again. I did not go with the LTS kernel since it was a fresh install anyway, so I figured it would not be worth it.

I will certainly try the dcdebugmask parameter, which I have seen in a bunch of issues, including Artifacting and glitching on 7840HS/780M on Wayland (#3388) · Issues · drm / amd · GitLab. That bug has been affecting me, though less severely than some other people (it happens very rarely and only for a split second). I wonder whether it’s related to the freezes? It would explain why the screen freezes after being idle for a while (when not much is happening on screen).

Unfortunately, this appears to be a Linux kernel 6.13 regression. Installing the latest LTS kernel (6.12) fixes the freezing issue.

Arch knows about it now and I would highly suggest installing the linux-lts kernel using pacman and then setting your system up to boot from that instead until this issue is fixed with the stable kernel.
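A minimal sketch for checking which kernels are installed and whether the LTS one is actually the one booted. It assumes that each Arch kernel package leaves a pkgbase file in /usr/lib/modules/<version>/ and that linux-lts release strings (as printed by uname -r) end in “-lts”; the function names are hypothetical. How you select the LTS entry at boot depends on your bootloader, and if you build UKIs, the new /etc/mkinitcpio.d/linux-lts.preset may need the same UKI settings as your existing linux preset.

```shell
#!/bin/sh
# Install the LTS kernel alongside the current one (not run by this sketch):
#   sudo pacman -S --needed linux-lts
# Assumptions: each Arch kernel package leaves a "pkgbase" file in
# /usr/lib/modules/<version>/, and linux-lts release strings end in "-lts".

# Print "<package> <module directory>" for every installed kernel package
list_installed_kernels() {
  local modules_dir="${1:-/usr/lib/modules}"
  local f
  for f in "$modules_dir"/*/pkgbase; do
    [ -f "$f" ] || continue   # no match: the glob stays literal
    printf '%s %s\n' "$(cat "$f")" "$(basename "$(dirname "$f")")"
  done
}

# Succeed if the given (or running) kernel release looks like linux-lts
is_running_lts() {
  local release="${1:-$(uname -r)}"
  case "$release" in
    *-lts) return 0 ;;
    *) return 1 ;;
  esac
}

list_installed_kernels
if is_running_lts; then
  echo "booted into LTS: $(uname -r)"
else
  echo "not booted into LTS: $(uname -r)"
fi
```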

For me, the issue would appear when using Flatpaks, such as one for downloading YouTube videos. That was not the only application to hard-lock the system, though; Bottles did it too.

I don’t think that’s related to my issue, as I already experienced these crashes on 6.12 and updated to 6.13 in the hope of having them resolved. I also don’t use any Flatpaks. For now, I will try the kernel parameters related to power consumption and refresh rate and see what happens (hopefully no more crashes).

As with most things like this, the trick is to:

  1. Make the problem reproducible, so that the appropriate driver developer can reproduce and fix it.
  2. Capture stack traces when it fails.
    If the screen freezes and the laptop needs a hard reboot, the hard part is finding a crash dump or stack trace of what happened.
    Places to look for stack traces:
    a) the Linux kernel logs in /var/log
    b) journalctl -b -1
    c) /var/lib/systemd/pstore
    On Ubuntu, the dump from a crash that forces a hard reboot (or similar) is stored in the UEFI pstore.
    systemd then reads the pstore at startup and copies its contents to /var/lib/systemd/pstore.
    I don’t know whether Arch Linux does the same, but it is worth a look.
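To check (c) quickly, the sketch below counts records in both pstore locations: /sys/fs/pstore, where the kernel exposes records it saved (for example in UEFI variables) that have not been archived yet, and /var/lib/systemd/pstore, where systemd-pstore.service moves them at boot by default. It may need to run as root; the function name is illustrative.

```shell
#!/bin/sh
# Count pstore records in the live kernel view and in systemd's archive.
# Default paths: /sys/fs/pstore (not yet archived) and /var/lib/systemd/pstore
# (where systemd-pstore.service moves records at boot). May need root.
pstore_status() {
  local live_dir="${1:-/sys/fs/pstore}"
  local archive_dir="${2:-/var/lib/systemd/pstore}"
  local dir count
  for dir in "$live_dir" "$archive_dir"; do
    if [ -d "$dir" ]; then
      # $(( )) strips any padding wc adds around the number
      count=$(( $(find "$dir" -type f 2>/dev/null | wc -l) ))
      echo "$dir: $count file(s)"
    else
      echo "$dir: missing"
    fi
  done
}

pstore_status
```

Records that sit in /sys/fs/pstore but never show up in /var/lib/systemd/pstore would suggest that systemd-pstore.service is not running at boot.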

There will be times where none of the above helps, as is the case here:

But I thought I would mention the above a,b,c as not many people know about c.


I haven’t seen this issue myself on my Arch install, but I’m running the zen kernel rather than the stock Arch kernel, so perhaps that makes a difference.

I have had an issue with KDE Plasma (kwin specifically) pulling an egregious amount of power, but that’s a different issue.

Thank you for the advice. I tried multiple times to reproduce it, but it really happens out of nowhere, with no particular preconditions. The only hint I can give is that it happens most often when the laptop is idle. I have been reading the logs after each crash, but they never give any meaningful information; most of the time, the last entries are from minutes before the crash even happened. I definitely did not know about pstore, though; I will give that a shot next time it happens!
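One possible reason the last entries predate the freeze by minutes: according to journald.conf(5), journald only forces a sync to disk every SyncIntervalSec (5 minutes by default) for messages below CRIT priority, so a hard lock-up may lose entries that were not yet on disk. The sketch below writes a drop-in that shortens the interval. It assumes the standard /etc/systemd/journald.conf.d/ drop-in directory; the 10-second value and the file name are arbitrary choices, and more frequent syncs mean more disk writes.

```shell
#!/bin/sh
# Write a journald drop-in that syncs the journal to disk every 10 seconds
# instead of the default 5 minutes. Value and file name are arbitrary.
write_journald_dropin() {
  local dropin_dir="${1:-/etc/systemd/journald.conf.d}"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-sync-interval.conf" <<'EOF'
[Journal]
SyncIntervalSec=10s
EOF
}

# As root:
#   write_journald_dropin && systemctl restart systemd-journald
```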

As for the post you linked, I had already seen it, and I am following it closely to see where it goes. It is possible that this is an issue with the Framework BIOS and power management, which would again explain why it happens when idle.

I doubt it is related to the Arch kernel, but it’s definitely possible. I’ve been working on the laptop all day today and had zero freezes; I will report back if they really stop for a long time. The only thing I changed was adding amdgpu.dcdebugmask=0x12 to the kernel parameters to disable PSR, because of another freeze in which I finally caught the log “kwin_wayland_drm: Pageflip timed out! This is a kernel bug”. That one seemed to happen only with an external monitor connected, though, so it is probably not related to the usual freezes.
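Since dcdebugmask is a bitmask, 0x12 is 0x10 + 0x02, which should disable stutter mode as well as PSR. The sketch below decodes a mask into flag names. The bit values are assumed from the DC_DEBUG_MASK enum in the kernel’s drivers/gpu/drm/amd/include/amd_shared.h and cover only a subset of it; check that header for your kernel version before relying on them.

```shell
#!/bin/sh
# Decode an amdgpu dcdebugmask value into DC_DEBUG_MASK flag names.
# Assumption: bit values as in amd_shared.h; only a subset is listed here.
decode_dcdebugmask() {
  local mask entry bit name
  mask=$(( $1 ))
  for entry in \
    0x1:DC_DISABLE_PIPE_SPLIT \
    0x2:DC_DISABLE_STUTTER \
    0x4:DC_DISABLE_DSC \
    0x8:DC_DISABLE_CLOCK_GATING \
    0x10:DC_DISABLE_PSR \
    0x40:DC_DISABLE_MPO \
    0x200:DC_DISABLE_PSR_SU \
    0x400:DC_DISABLE_REPLAY
  do
    bit=${entry%%:*}     # part before the colon, e.g. 0x10
    name=${entry#*:}     # part after the colon, e.g. DC_DISABLE_PSR
    if [ $(( mask & $bit )) -ne 0 ]; then
      echo "$name"
    fi
  done
}

decode_dcdebugmask 0x12   # prints DC_DISABLE_STUTTER, then DC_DISABLE_PSR
```

With the same table, 0x400 (the value from the flickering thread) decodes to DC_DISABLE_REPLAY only.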

You can have both installed at the same time.

I didn’t know that, but it wouldn’t make much of a difference anyway, as the freezes already happened on the previous kernel.

As an update: I have now used the computer for 2 days (with and without an external display) with no freezes, except once when the computer restarted on its own.

I have set up pstore and a watchdog in the hope of catching a kernel panic, but nothing so far; it seems to be running smoothly. The only changes were adding amdgpu.dcdebugmask=0x12 and forcing all my applications to run natively on Wayland where possible, for better performance on battery. Will report if anything happens.
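pstore generally records the kernel log when the kernel panics, so a hang that never turns into a panic can leave nothing behind. One option, sketched below, is to have the kernel escalate oopses and detected soft/hard lockups into panics and then reboot on its own. The sysctl names are standard kernel ones; the 10-second reboot delay and the file name are arbitrary, kernel.hardlockup_panic only has an effect if the hard-lockup detector is built in and enabled, and a freeze that stalls the hardware outright may still escape all of this.

```shell
#!/bin/sh
# Write a sysctl drop-in that turns oopses and detected lockups into panics
# and reboots 10 seconds after a panic. Delay and file name are arbitrary.
write_panic_sysctl() {
  local target="${1:-/etc/sysctl.d/90-panic-on-hang.conf}"
  cat > "$target" <<'EOF'
# Reboot 10 seconds after a kernel panic
kernel.panic = 10
# Treat an oops as a panic instead of limping on
kernel.panic_on_oops = 1
# Escalate detected soft and hard lockups to panics
kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
EOF
}

# As root:
#   write_panic_sysctl && sysctl --system
```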

I had a freeze the other day.

OS: Arch
Kernel: 6.13.1-zen1-1-zen
WM/DE: LXQT with mutter as the window manager

Can confirm this freeze happens pretty frequently for no apparent reason on kernel 6.12.11 Fedora 41.

Might this be a KDE/wayland/kernel 6.13.x issue? I see the same on my old Thinkpad T450s.

Just had yet another freeze after 4 days of using the laptop without problems. What is weird is that the watchdog did not reboot the computer at all, and /var/lib/systemd/pstore is empty as well. Here are the very last logs before it happened:

Feb 10 21:19:00 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=526600
Feb 10 21:19:00 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:19:02 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=526600
Feb 10 21:19:12 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=468000
Feb 10 21:19:26 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:19:37 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-63 noise=9999 txrate=526600
Feb 10 21:19:38 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=526600
Feb 10 21:20:24 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=468000
Feb 10 21:20:27 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=468000
Feb 10 21:20:30 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=468000
Feb 10 21:20:32 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=468000
Feb 10 21:20:33 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=468000
Feb 10 21:20:34 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=468000
Feb 10 21:20:38 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:38 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=526600
Feb 10 21:20:39 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=526600
Feb 10 21:20:39 systemd[1]: Started dbus-:1.2-org.kde.powerdevil.backlighthelper@48.service.
Feb 10 21:20:39 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=526600
Feb 10 21:20:40 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=526600
Feb 10 21:20:41 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=526600
Feb 10 21:20:42 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:43 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=526600
Feb 10 21:20:44 kscreenlocker_greet[208656]: qt.qpa.wayland: Could not create EGL surface (EGL error 0x3000)
Feb 10 21:20:44 kscreenlocker_greet[208656]: Failed to write to the pipe: Bad file descriptor.
Feb 10 21:20:44 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=526600
Feb 10 21:20:45 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:45 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:46 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-67 noise=9999 txrate=526600
Feb 10 21:20:46 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:47 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=526600
Feb 10 21:20:47 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=526600
Feb 10 21:20:48 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=351000
Feb 10 21:20:48 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=351000
Feb 10 21:20:49 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=526600
Feb 10 21:20:49 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=526600
Feb 10 21:20:49 systemd[1]: dbus-:1.2-org.kde.powerdevil.backlighthelper@48.service: Deactivated successfully.
Feb 10 21:20:50 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=526600
Feb 10 21:20:50 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=526600
Feb 10 21:21:08 wpa_supplicant[2066]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=468000

So the only really relevant entry before the freeze would be “dbus-:1.2-org.kde.powerdevil.backlighthelper@48.service: Deactivated successfully.”, which I have seen quite a few times in the logs before freezes, but there is still some delay between that entry and the actual freeze. Could it be that this isn’t a kernel panic at all? It seems strange to get no logs and to have the watchdog do nothing.
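When reading logs like the ones above, the wpa_supplicant CTRL-EVENT-SIGNAL-CHANGE lines bury everything else. A small filter, sketched below, drops them and shows the tail of the previous boot’s journal; the function names are illustrative, and adding -k restricts journalctl to kernel messages if that is all you want.

```shell
#!/bin/sh
# Drop the wpa_supplicant signal-strength chatter from a log stream
strip_wifi_noise() {
  grep -v 'CTRL-EVENT-SIGNAL-CHANGE'
}

# Show the last N lines (default 60) of the previous boot's journal, minus the noise
last_boot_tail() {
  local lines="${1:-60}"
  journalctl -b -1 --no-pager -o short-precise | strip_wifi_noise | tail -n "$lines"
}

# Usage:
#   last_boot_tail 100
#   journalctl -b -1 -k --no-pager | tail -n 100   # kernel messages only
```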

This freeze happened like one minute after I logged back into my computer that was standing idle.

My freeze occurred on LXQT, not KDE.

Well, I had 2 more freezes, each with an instant reboot (presumably from the watchdog), within about 10 minutes. This is very weird after using the laptop for over 4 days without issues.

The last relevant logs (newest first) are:

Feb 10 22:25:48 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=234000
Feb 10 22:25:47 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-68 noise=9999 txrate=351000
Feb 10 22:24:46 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-64 noise=9999 txrate=292500
Feb 10 22:24:46 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-61 noise=9999 txrate=292500
Feb 10 22:24:33 sudo[9101]: pam_unix(sudo:session): session closed for user root
Feb 10 22:24:26 kwin_wayland[2218]: kwin_libinput: Libinput: event17 - PIXA3854:00 093A:0274 Touchpad: kernel bug: Touch jump detected and discarded.
                                             See https://wayland.freedesktop.org/libinput/doc/1.27.1/touchpad-jumping-cursors.html for details
Feb 10 22:24:16 sudo[9101]: pam_unix(sudo:session): session opened for user root(uid=0) by yam(uid=1000)
Feb 10 22:24:16 sudo[9101]:      yam : TTY=pts/3 ; PWD=*** ; USER=root ; COMMAND=/usr/bin/mkinitcpio -p linux
Feb 10 22:23:12 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=351000
Feb 10 22:23:12 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=351000
Feb 10 22:23:08 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=292500
Feb 10 22:23:08 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=292500
Feb 10 22:23:07 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-63 noise=9999 txrate=292500
Feb 10 22:23:07 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-62 noise=9999 txrate=292500
Feb 10 22:23:02 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=292500
Feb 10 22:22:52 wpa_supplicant[2073]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-66 noise=9999 txrate=263300
Feb 10 22:14:20 kernel: cros-ec-dev cros-ec-dev.2.auto: Some logs may have been dropped...
Feb 10 22:06:11 wpa_supplicant[2065]: wlan0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-61 noise=9999 txrate=292500
Feb 10 22:05:39 systemd[1]: Finished Cleanup of Temporary Directories.
Feb 10 22:05:39 systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Feb 10 22:05:39 systemd[1]: Starting Cleanup of Temporary Directories...
Feb 10 22:01:00 CROND[9724]: (root) CMDEND (run-parts /etc/cron.hourly)
Feb 10 22:01:00 CROND[9725]: (root) CMD (run-parts /etc/cron.hourly)
Feb 10 22:00:33 kded6[2377]: Service  ":1.114" unregistered
Feb 10 22:00:33 backintime[9278]: Main profile(1) :: INFO: Unlock
Feb 10 22:00:31 backintime[9278]: Main profile(1) :: ERROR: Snapshots directory not accessible. Tries stopped.

I doubt these errors are related to the crashes, given the time gap between them and the crashes. They’re also fairly common and not the kind that triggers kernel panics. What I don’t understand is why I get absolutely no logs anywhere about what might be going wrong; /var/lib/systemd/pstore is always empty. Do any of you have logs?

I’m also keeping a close eye on FRWK16 - Random Crash then Reboots - #50 by PureKrome, as it seems to be getting somewhere and looks closely related to the issues experienced here. Since people there are mostly reporting it on Windows, would that mean it is a Framework issue in general? That would explain why getting logs is so hard.

There is a way to double-check whether pstore is working.
Note: the following command deliberately crashes the kernel, so make sure you have saved all your work and run “sync” first.

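echo c | sudo tee /proc/sysrq-trigger

(Writing it as sudo echo c >/proc/sysrq-trigger does not work: the redirection is performed by your unprivileged shell, not by sudo, so it fails with “Permission denied”.)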

That will crash the system, which will probably look similar to the “random freeze” you are seeing.
You will need to hold the power button for a while to get the system to power off.
After powering on and booting into Linux, there should be a new folder in /var/lib/systemd/pstore.
In that sub-folder, there should be a dmesg file holding the crash dump caused by the sysrq “c” trigger.

If that works, it means you have pstore set up correctly.
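Once the test crash has been captured, a sketch like the following prints the end of the newest archived record. It assumes the default layout of one sub-folder per crash under /var/lib/systemd/pstore and matches file names loosely (dmesg*), since the exact names depend on the pstore backend; the function name is hypothetical.

```shell
#!/bin/sh
# Print the end of every dmesg* file in the newest pstore record directory
latest_pstore_dmesg() {
  local base="${1:-/var/lib/systemd/pstore}"
  local latest f
  latest=$(ls -1t "$base" 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no pstore records in $base"
    return 1
  fi
  for f in "$base/$latest"/dmesg*; do
    [ -f "$f" ] || continue
    echo "== $f"
    tail -n 40 "$f"
  done
}

# Usage (as root): latest_pstore_dmesg
```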