Random freezes with Arch Linux (requires hard reboot)

Hello Framework people :wave:

It’s now been 2 months since I started using my Framework 16 and I absolutely love it. I spent a lot of time tinkering and fixing every issue I ran into on my Arch Linux install (KDE on Wayland), and it’s all working wonderfully well. I’m won over and have finally started using this laptop for all my work.

However, one issue still remains: random freezes (most likely kernel panics) that require a hard reboot using the power button. When it happens, I cannot do anything. The screen freezes, although the RGB on the keyboard still works (and I can still turn it on/off). I think this is a kernel issue since the virtual terminals are all unavailable when it happens, and it often happens during idle times (e.g. I’m away or doing something else, and when I come back to the laptop the screen is just frozen).

These freezes are very random and can happen once or twice a week, but also a few times a day. Today, for example, it happened 3 times in the span of one hour (while actively working and using an external monitor) and then never again for the rest of the day. It feels terrible (and terrifying: what if something gets corrupted very badly?).

What I already tried :

  • Looked a lot through the Framework and Arch forums, but with no logs to go on, it’s hard to know whether any given cause is the reason for the crash
  • Setting up Kdump to get a clue of what may have gone wrong; sadly I never managed to get it working so far, mostly because I am using a UKI with an encrypted LUKS2 disk
  • Running memtest for an extensive amount of time (like at least 4 hours), no error reported
  • Opened my Framework multiple times to see if I could notice anything wrong, I also re-installed my RAM and SSD just in case, but they seem to be well positioned, not the problem

With all that in mind, I highly doubt it’s hardware related, but then again, without any logs and with Kdump not working, it’s hard to know. Kdump does not work even if I trigger a kernel panic manually, so it’s entirely possible that these freezes are kernel panics that simply never get recorded.

It’s also worth noting that on random occasions, the laptop restarts by itself instead of just freezing. However, it does not shut down gracefully, so the result is the same as if I had forced it off with the power button.

I’m just hoping to get some ideas on what to try to pinpoint the problem, or perhaps someone here already had a similar issue and can lead me somewhere.

The “good” part of it is that it forced me into making backups weekly to an external drive, which relieves my mind a lot. :relieved:

System info :

OS: Arch Linux x86_64
Host: Laptop 16 (AMD Ryzen 7040 Series) (AJ)
Kernel: Linux 6.13.1-arch1-1
Packages: 932 (pacman)
Display (BOE0BC9): 2560x1600 @ 165 Hz (as 1706x1066) in 16" [Built-in]
DE: KDE Plasma 6.2.5
WM: KWin (Wayland)
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.26 GHz
GPU: AMD Radeon 780M [Integrated]
Memory: 8.14 GiB / 28.41 GiB (29%)
Swap: 0 B / 4.00 GiB (0%)
Disk (/): 87.14 GiB / 1.79 TiB (5%) - ext4

Same exact problem here. The only difference is that I do have the Expansion Slot GPU (the AMD Radeon RX 7700S). System has been completely unusable after 5 minutes on the 6.13 kernel. I am going to try the LTS kernel and see if it is any better.

Sounds like you have tried most of the main possible causes and ruled them out already. I run Arch without this issue, however, I switched to mainly booting the LTS kernel a while ago due to all the instability I was getting from the main kernel. I would try installing the LTS and running that for a while to see if it still happens.

Other than that, try another distro and see if it still happens.

I had similar problems: random freezing of the desktop, a second SDDM login screen starting 5 minutes after login, and graphical glitches in menus.

I always hoped that it would get better after some updates. But (as far as I can tell after 3 or 5 hours of testing afterwards) there were two things that made the system stable again:

  1. I switched to another KDE Plasma theme after I saw errors in journalctl referring to deprecated function calls in the theme I was using. Now I’m back to one of the stock themes (Breeze Dark).
  2. For the flickering: it went away after I applied this kernel parameter, found in this thread: Screen flickering on Linux kernel 6.12 - #17 by haykh

amdgpu.dcdebugmask=0x400
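
To confirm after a reboot that a parameter like this actually reached the kernel, the running command line can be read from /proc/cmdline. The following is only a minimal sketch: the helper names (kernel_param, current_cmdline) are made up for illustration, and it assumes the parameter is passed with the amdgpu. module prefix, as it is written elsewhere in this thread.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line, or None if absent.

    Naive whitespace split: quoted values containing spaces are not handled.
    If the parameter appears more than once, the last occurrence is returned.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            # A bare flag such as "quiet" has no "=", so report it as "".
            found = value if sep else ""
    return found


def current_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```

On the laptop itself, kernel_param(current_cmdline(), "amdgpu.dcdebugmask") should return "0x400" once the bootloader or UKI configuration has picked up the change, and None if it has not.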

Thanks for the suggestions. I will definitely try another distro and keep it idle for a while to see if the freezes happen again if I can’t figure something out. I did not go with the LTS kernel as it was a fresh install anyway, I figured it would not be worth it.

I will certainly try the dcdebugmask parameter. I have seen it in a bunch of issues, including this one: Artifacting and glitching on 7840HS/780M on Wayland (#3388) · Issues · drm / amd · GitLab. That issue has been affecting me too, but not as badly as some other people (it happens very rarely and only for a split second). I wonder if it’s maybe related to the freezes? It would explain why the screen freezes after being idle for a while (not a lot happening on the screen).

Unfortunately, this appears to be a Linux kernel 6.13 regression. Installing the latest LTS kernel (6.12) fixes the freezing issue.

Arch knows about it now and I would highly suggest installing the linux-lts kernel using pacman and then setting your system up to boot from that instead until this issue is fixed with the stable kernel.

For me, the issue would appear attempting to use flatpaks such as one to download YouTube videos. This of course was not the only application to hard lock the system though; Bottles also did this hard lock up.

I don’t think that’s related to my issue, as I already experienced these crashes on 6.12 and updated to 6.13 in the hope of having it resolved. I also don’t use any flatpaks. For now, I will try the kernel parameters related to power consumption and refresh rate and see if anything changes (hopefully no more crashes).

As with most things like this, the trick is to:

  1. Make the problem reproducible so the appropriate driver developer can trigger it and fix it.
  2. Capture stack traces when it fails.
    Obviously, if the screen freezes and one needs to hard reboot the laptop, the problem is finding the crash dump / stack trace logging what happened.
    Places to look for stack traces:
    a) the Linux kernel logs in /var/log
    b) journalctl -b -1
    c) /var/lib/systemd/pstore
    On Ubuntu, any crash dump that requires a hard reboot or similar is stored in the UEFI pstore.
    systemd then reads the pstore at startup, and copies the output to /var/lib/systemd/pstore.
    I don’t know if Arch Linux does the same, but it is worth a look.
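
As a rough illustration of checking (b) and (c) after a forced reboot, here is a small sketch. The function names are hypothetical, and it assumes systemd's pstore archiving is active so that dumps land under /var/lib/systemd/pstore; reading that directory and the system journal may require root.

```python
from pathlib import Path
from typing import List

# Archive location mentioned in (c) above.
PSTORE_ARCHIVE = Path("/var/lib/systemd/pstore")


def list_pstore_dumps(root: Path = PSTORE_ARCHIVE) -> List[Path]:
    """Return every regular file under `root`, sorted by path.

    Returns an empty list if the directory does not exist, which is also
    what you would see if nothing was ever archived.
    """
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def previous_boot_kernel_errors_cmd() -> List[str]:
    """Build the journalctl call for (b), narrowed to kernel messages
    of priority 'err' or worse from the previous boot."""
    return ["journalctl", "-b", "-1", "-k", "-p", "err", "--no-pager"]
```

The command list can be passed to subprocess.run(), and each path from list_pstore_dumps() can be opened and read like a normal text file. An empty result from both does not rule out a kernel panic; it only means nothing was persisted before the machine locked up.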

There will be times where none of the above helps, as is the case here:

But I thought I would mention the above a,b,c as not many people know about c.

I haven’t seen this issue myself on my Arch install, but I’m running the zen kernel rather than the stock Arch kernel, so perhaps that makes a difference.

I have had an issue with KDE Plasma (kwin specifically) pulling an egregious amount of power, but that’s a different issue.

Thank you for the advice. I tried multiple times to reproduce it, but it really happens out of nowhere, with no prior conditions. The only hint I can give is that it happens most often when the laptop is idle. I have been reading the logs after each crash, but they never give any meaningful information; most of the time the last log entries are from minutes before the crash even happened. I definitely did not know about pstore though, I will give that a shot next time it happens!

For the post you linked, I had already seen it, and I am following it closely to see where it goes. It could be possible that this is an issue with the Framework BIOS and power management, and would once again explain why it happens when idle.

I doubt it is related to the Arch kernel, but it’s definitely possible. I’ve been working on my laptop all day today and experienced no freezes; I will report back if they really have stopped for good. The only thing I changed was adding amdgpu.dcdebugmask=0x12 to the kernel parameters to disable PSR, because of another freeze that I could finally catch in the logs: “kwin_wayland_drm: Pageflip timed out! This is a kernel bug”. That one seemed to happen only with an external monitor connected, though, so it is probably not related to the usual freezes.
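
Since dcdebugmask is a bitmask, a value like 0x12 combines more than one flag. The sketch below decodes a mask into flag names. The bit-to-name table is an assumption based on my recollection of the DC_DEBUG_MASK enum in the kernel's drivers/gpu/drm/amd/include/amd_shared.h, it is deliberately incomplete, and it should be checked against the source of the kernel version you actually run.

```python
from typing import List

# ASSUMPTION: partial table recalled from the amdgpu DC_DEBUG_MASK enum.
# Verify against amd_shared.h for your kernel before relying on it.
DC_DEBUG_BITS = {
    0x2: "DC_DISABLE_STUTTER",
    0x10: "DC_DISABLE_PSR",
    0x200: "DC_DISABLE_PSR_SU",
    0x400: "DC_DISABLE_REPLAY",
}


def decode_dcdebugmask(mask: int) -> List[str]:
    """List the flag names for every bit set in `mask`, lowest bit first.

    Bits missing from the table are reported as 'unknown bit 0x..'.
    """
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(DC_DEBUG_BITS.get(bit, f"unknown bit {bit:#x}"))
        bit <<= 1
    return names
```

If the table is right, decode_dcdebugmask(0x12) gives both the stutter and the PSR flag, so 0x12 would change slightly more than PSR alone, while 0x400 from earlier in the thread would map to the panel replay flag only.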

You can have both installed at the same time.

I didn’t know that, but it wouldn’t make much of a difference anyway, as the freezes already happened on the previous kernel.

As an update : I used the computer for 2 days now (with and without external display) with no freezes except one where the computer restarted automatically.

I have set up pstore and watchdog in the hope of catching a kernel panic, but none so far, it seems to be running smoothly. I did not do much except add amdgpu.dcdebugmask=0x12 and force all my applications to run in native Wayland if possible to have better performance when unplugged. Will report if anything happens.