Hello,
I got my FW16 about two weeks ago and installed Arch Linux with the archinstall script. I’ve been consistently running into a bug where the laptop’s screen freezes but everything else continues to run. Any audio that is playing keeps playing, and if I have a second monitor connected, it continues to work normally. When this happens I get no errors in journalctl or dmesg. The only way to unfreeze the screen is to hold the power button down until the device turns off or, if I have a second display, to reboot from there. My system dual-boots with Windows 11, and Windows does not experience these issues. If anyone can help me, that would be greatly appreciated. Thank you.
Which Linux distro are you using?
Arch Linux with KDE Plasma running with Wayland.
Which release version? (If rolling release, last date updated?)
Yesterday, August 13th.
Which kernel are you using?
6.15.9-arch1-1
Which BIOS version are you using?
6.03.05
Which Framework Laptop 16 model are you using? (AMD Ryzen™ 7040 Series)
Framework Laptop 16 AMD Ryzen 7040 Series with AMD Radeon 7700S GPU Module.
Logs from last screen freeze:
Memory: 3.66G / 30.7G
CPU: <2%
Time of crash: Aug 14th, 9:15:24pm
System uptime: 238s
Applications open: Spotify - Idle, Firefox 1 tab
I tried to take a screenshot of the screen being frozen, but the screenshot shows what the display should be showing and not the screen in its frozen state.
I don’t have FreeTube at all, and when it happens it never recovers. It just happened again while I was looking at my email in Thunderbird. I waited about 30 minutes and the screen never came back; after about 5 minutes of being frozen it started showing screen-tearing artifacts.
I’ve had this also. The most effective thing I’ve tried (at the recommendation of FW support) was enabling “gaming mode” in the BIOS. That didn’t eliminate the freezes but it seemed to make them less frequent for me.
If you find out a resolution definitely let me know… I’m still trying a couple of things so I will do the same if anything I am trying works out to resolve the issue too.
I just now took the time to read through the journal; there is nothing in there that refers to the freeze and crash except these drkonqi messages:
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-processor[1349]: "/app/extra/bin/lunarclient" 58104 "/var/lib/systemd/coredump/core.lunarclient.1000.6a6f80fa5d174f77a471a4f7ff91b83c.58104.1754100079000000.zst"
Aug 14 21:12:55 LAPTOP-LUG34II systemd[940]: Started Launch DrKonqi for a systemd-coredump crash (PID 1349/UID 1000).
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-processor[1349]: "/usr/bin/plasma-discover" 2614 "/var/lib/systemd/coredump/core.plasma-discover.1000.e7a856a7ceb845c1a976ea5935b3ca4c.2614.1754198378000000.zst"
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-launcher[2506]: Unable to find file for pid 58104 expected at "kcrash-metadata/lunarclient.6a6f80fa5d174f77a471a4f7ff91b83c.58104.ini"
Aug 14 21:12:55 LAPTOP-LUG34II systemd[940]: Started Launch DrKonqi for a systemd-coredump crash (PID 1349/UID 1000).
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-launcher[2507]: Unable to find file for pid 2614 expected at "kcrash-metadata/plasma-discover.e7a856a7ceb845c1a976ea5935b3ca4c.2614.ini"
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-processor[1349]: "/usr/bin/systemsettings" 2849 "/var/lib/systemd/coredump/core.kinfocenter.1000.320436a120f24d6bac3429ce0d5e9e50.2849.1754293327000000.zst"
Aug 14 21:12:55 LAPTOP-LUG34II systemd[940]: Started Launch DrKonqi for a systemd-coredump crash (PID 1349/UID 1000).
Aug 14 21:12:55 LAPTOP-LUG34II drkonqi-coredump-launcher[2512]: Unable to find file for pid 2849 expected at "kcrash-metadata/systemsettings.320436a120f24d6bac3429ce0d5e9e50.2849.ini"
Same for the dmesg, there’s nothing at all O.o
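A side note on the coredump file names in the drkonqi messages: assuming they follow the usual systemd-coredump pattern of core.<name>.<uid>.<boot id>.<pid>.<timestamp in microseconds>.zst (that layout is my assumption, nobody in this thread confirmed it), the last number can be decoded to see when each crash actually happened. A minimal Python sketch; parse_coredump_name is just a made-up helper name:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def parse_coredump_name(path: str) -> dict:
    """Split a coredump file name into its fields.

    Assumes the layout core.<name>.<uid>.<boot id>.<pid>.<timestamp us>.zst.
    """
    stem = PurePosixPath(path).name.removesuffix(".zst")
    # Split from the right so a dot inside the program name would not break parsing.
    prefix, uid, boot_id, pid, timestamp_us = stem.rsplit(".", 4)
    return {
        "name": prefix.removeprefix("core."),
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        # The timestamp is assumed to be microseconds since the Unix epoch.
        "crashed_at": datetime.fromtimestamp(int(timestamp_us) / 1_000_000, tz=timezone.utc),
    }


info = parse_coredump_name(
    "/var/lib/systemd/coredump/core.lunarclient.1000."
    "6a6f80fa5d174f77a471a4f7ff91b83c.58104.1754100079000000.zst"
)
print(info["name"], info["crashed_at"].isoformat())
```

If that layout assumption holds, the lunarclient dump decodes to early August, well before the Aug 14 freeze, which would suggest DrKonqi was re-processing old coredumps at login rather than reporting anything about the freeze itself.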
Which mesa-vulkan-drivers and plasmashell version do you have?
I’ll try enabling “gaming mode” and seeing if that will help at all. I did try to reach out to FW support to see if they had any solutions, and I’m awaiting a response. Thank you.
Thanks for taking the time to look through the log. I looked through the logs a few other times when the screen froze up, and I also haven’t been able to find any errors or anything relating to the freeze, which has left me very confused.
My Vulkan driver is vulkan-radeon version 1:25.1.7-1, and plasmashell’s version is 6.4.4.
Okay, interesting; I am on 25.0.7 and 6.4.4, so definitely not too old xD
But maybe too new? You could try downgrading to 25.0.7 and trying a newer/older kernel. But that’s kind of a shot in the dark…
Just a whole new idea: Have you tried reseating the interposer to the dGPU? Maybe it’s not seated perfectly?
It’s kind of a long shot, but maybe worth a try?
I just tried reseating the interposer, the dGPU module, as well as the whole screen and eDP cables, and I had a freeze almost instantly.
I’ve noticed the last few freezes have all occurred while in Firefox. I just tried changing a few of the hardware acceleration settings, but then it froze while I was writing this reply. I’m going to try turning off hardware acceleration and see if that helps; if it doesn’t, I might try downgrading.
I just got this in an incredibly weird fashion, I’m making a separate post. But yes I’ve observed this too, it happens commonly when editing text fields in Firefox and disabling acceleration seems to make it less likely (although it doesn’t solve it.)
I might go back to some of the AMD driver flags, people have reported good success with them in the past. I sort of gave up on them because I wasn’t able to arrive at a config that made the crash never happen, and it’s so intermittent that it’s easy to fool yourself into thinking that some particular set of changes was what fixed it, but I just saw people talking again about how they felt like they had fixed things, so maybe it’s worth a try.
You might have seen, there is tons of discussion about it here:
What one person recommended was amdgpu.dcdebugmask=0x400 amdgpu.sg_display=0, but I can’t really tell whether it’s even the same issue.
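If you do try those flags, it is worth confirming that they were actually passed at boot before judging whether they help. A minimal Python sketch, assuming a Linux system where /proc/cmdline is readable; parse_amdgpu_params is a hypothetical helper name, and it only shows what was on the kernel command line, not whether the driver honoured it:

```python
from pathlib import Path


def parse_amdgpu_params(cmdline: str) -> dict[str, str]:
    """Return the amdgpu.<option>=<value> pairs found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


# Example command line (made up apart from the two flags quoted in this thread).
example = "root=UUID=xxxx rw amdgpu.dcdebugmask=0x400 amdgpu.sg_display=0 quiet"
print(parse_amdgpu_params(example))

# On a real machine, read the command line of the running kernel instead.
cmdline_path = Path("/proc/cmdline")
if cmdline_path.exists():
    print(parse_amdgpu_params(cmdline_path.read_text()))
```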
Yes, the amdgpu driver has been quite buggy for a while. Beyond the various glitching bugs that still don’t seem to be fixed even in Linux 6.16.3, I currently get semi-regular freezes with amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data. The gpu_recovery mechanism doesn’t seem to do anything, or it’s very slow. So either I use the SysRq key combination (which doesn’t seem to be too reliable) or, when I’m at home, I ssh into my laptop and trigger it via a script, as just a reboot usually won’t do anything. Sadly, nobody at the AMD team seems to be able to replicate it, and even if I were able to go through commits to see which one caused this, without a reliable trigger for the issue it’s impossible to figure out.
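For anyone who wants to check whether they are hitting the same DMCUB message, here is a minimal Python sketch that filters kernel log text for it. It assumes you have saved the kernel log to a file first (for example the output of journalctl -k -b -1 for the previous boot); the function name and the search pattern are my own choices, based only on the error line quoted above:

```python
import re
import sys

# Pattern built from the single error line quoted in this thread; other
# kernel versions may word the message differently.
DMUB_PATTERN = re.compile(r"amdgpu .*\*ERROR\*.*DMCUB error")


def find_dmub_errors(log_text: str) -> list[str]:
    """Return every log line that looks like the amdgpu DMCUB error."""
    return [line for line in log_text.splitlines() if DMUB_PATTERN.search(line)]


sample_log = (
    "kernel: usb 1-1: new high-speed USB device number 2 using xhci_hcd\n"
    "kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: "
    "DMCUB error - collecting diagnostic data\n"
)
print(find_dmub_errors(sample_log))

# Optional: pass a saved log file as the first argument to scan it.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for hit in find_dmub_errors(handle.read()):
            print(hit)
```

An empty result would match what the original poster describes (a freeze with nothing logged), which hints that the two problems may not be the same one.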
amdgpu.sg_display=0
That shouldn’t be needed. If this helps any, you should check if you are in game mode. This should only cause issues with low VRAM, which is common with only 512 MB, but the 4 GB of game mode should suffice to prevent this one. And I think this does save a bit of power keeping sg around, especially when you need to disable PR to get rid of the flickers.
I have been having a similar issue with KDE under Wayland, using a 9070 XT in a desktop. For whatever reason I don’t experience this under Hyprland, so I’m inclined to believe this is down to how the compositors interact with the drivers. I haven’t been able to spend the time to nail down the exact issue though. On zen 6.16 it still happens for me.
So… when I started mucking around with this again, I saw that just last week the AMD development kernel, where up-to-the-minute fixes go in, got some fixes related to DMUB and VRAM, and since updating to that version I haven’t seen the issue.
It’s still too early to say that that means it “fixes” it (since this issue is so intermittent) – I’ve been bitten several times before by thinking that some workaround or other was “the fix” only to have the problem crop up again later. But, if you want to try out what I’m trying and give some additional testing to it, this is the NixOS stanza for it:
IDK what the equivalent is for Arch, but that rev variable is the commit to fetch, and the url variable is the repo to clone, if you have the ability to translate that to Arch and want to try it.
Sadly, nobody at the AMD team seems to be able to replicate it
So this is interesting to me – there are two obvious possibilities:
1. Framework devices have such a high DPI and refresh rate that they expose driver bugs that are not commonly seen.
2. Some Framework devices just have hardware bugs, unrelated to the driver.
I used to feel like it was probably option 1, but I’m actually a little bit more over time leaning more towards option 2. I have had problems with a Mediatek network card, an AX210 network card, and an AMD GPU, to an extent that seems like it would have popped up and got dealt with, if it was a general expansion hardware / driver issue.
Maybe not. I’m still not real sure what is causing the issues honestly. Definitely there are known bugs in the AMD driver, and I feel like I remember people in the forums talking about how they would have issues on Linux but not on Windows… but even that isn’t really definitive (the Linux driver could operate things in a way that exposes some kind of hardware fault that isn’t revealed using the Windows driver).
Currently, I’m mucking around with different AMD driver options seeing if I can find a set of them that eliminates the problem, and probably if that does not work I will go way way back to kernel 6.6 and see if that produces some positive impact.
That’s sadly the difficult truth. I don’t even know to what extent AMD gives test devices to their Linux devs. If you have something very rare (as far as I remember, only one other person or so reported the same or a similar bug), it becomes really difficult to figure these things out. One of the AMD guys even recommended that I try to bisect the problem to find the offending commit. But when you can’t even tell how to reliably trigger the issue, and when you don’t have the slightest clue about bisecting (compiling the kernel with a given config as a Debian package is the most I know, and that’s dead simple), that’s sadly also not helpful. As far as I remember, AMD has added some stuff to the kernel that may or may not help (no idea if for 6.16 or 6.17), but when it gets too intricate, no matter how much I’d like to help at least by providing logs and stuff, that’s just so far above my pay grade…
I used to feel like it was probably option 1, but I’m actually a little bit more over time leaning more towards option 2.
That’s also the direction I’m heading in. Sadly, the request form for the phase-change thermal pad is closed; the next time my current issue appears I’ll look at the thermals. Beyond that log message I posted, the only thing these occurrences have in common is that the device is quite warm (usually from charging). Maybe it’s actually a thermal issue, even though it never happens on the rare occasions when I actually have my FW in performance mode and, e.g., compile the kernel, which should produce a lot more heat than the situations where this occurs. But you never know.
I have had problems with a Mediatek network card, an AX210 network card, and an AMD GPU, to an extent that seems like it would have popped up and got dealt with, if it was a general expansion hardware / driver issue.
Wow, I have had a similar experience. Though MediaTek is just an abomination; it’s guaranteed to cause issues no matter the device and the OS. I also switched to the AX210, and the only issue I have is 6 GHz WiFi not working (I lack a decent AP for testing but just ordered one, and I see the same issue across distros, including Windows, and with the MTK module), though I never owned the dGPU expansion bay module. Additionally, I have a weird bug with the headphone jack module that appears at least with Debian, Ubuntu, and Debian with the upstream kernel compiled based on a Debian config, but not with Fedora… Also, I fear that, compared to e.g. the infamous Intel 13th and 14th gen processors that had these weird issues, the AMD Ryzen 7000 series (and especially the laptop SKUs) just isn’t used by enough people for problems to show up in numbers large enough to raise suspicion. I’m not sure with which kernel version all this started, but even Ubuntu 25.04 has “only” 6.14, and the Linux user base is probably just not massive enough for rare errors to crop up that much. So there’s basically nothing to build theories on, since it seems impossible to get hard evidence.
and probably if that does not work I will go way way back to kernel 6.6 and see if that produces some positive impact.
Good idea since that’s still LTS. Also, all the VRR/PSR/PR glitches would be gone since they weren’t supported back then. Maybe I’ll try that out soon too.
Yeah, all makes sense. I actually do know how to bisect through the kernel tree to find the offending commit… I’m honestly a little bit tempted to try rolling back to kernel 6.6 without changing anything else, and see if I can run for a couple of weeks without issues in that config, and then if I can (I don’t think I ever saw the crash under 6.6 for the time period when I was running it), then I can start to take that investigative method. It’ll take a long time with how long it is between failures and how much time has passed since the kernel which I didn’t see the problem on… but maybe worth a shot. And then of course various other things change during the time when I’m working on bisecting…
Where was it that you were talking with them when they recommended bisecting? I’ve been on the freedesktop bug tracker / code repo for AMD which seems fairly well monitored, and nowhere else.
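On the bisecting idea above, a rough feel for how long it would take: a bisection needs roughly log2(N) test rounds for N candidate commits, and here each round means running long enough to trust that the freeze is gone. A back-of-the-envelope Python sketch; the 100,000 commit count and the 14 days per round are placeholder assumptions of mine, not real figures for the 6.6 to 6.15 range:

```python
def bisect_steps(commit_count: int) -> int:
    """Approximate number of test rounds a bisection needs for commit_count commits."""
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (commit_count - 1).bit_length()


def bisect_duration_days(commit_count: int, days_per_test: int) -> int:
    """Total days if every round needs days_per_test days of freeze-free use."""
    return bisect_steps(commit_count) * days_per_test


# Hypothetical inputs, purely for illustration.
commits = 100_000
days_per_round = 14
print(bisect_steps(commits), "rounds,", bisect_duration_days(commits, days_per_round), "days")
```

With those placeholder numbers that is 17 rounds and 238 days, which is why an intermittent bug with weeks between failures is so painful to bisect, and why narrowing the range first (for example by confirming 6.6 is clean and checking a few releases in between) would pay off.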