Kernel panics and lockups in Linux (and Windows)

Hello, I received my Batch 5 i5 DIY in mid-November. I have had issues with kernel panics and lockups for as long as I have had it. The frequency of these panics seems to depend on how demanding the workload is. For example, it can typically run for a few hours if all I have running is a text editor (Kate) and a terminal emulator, but I'd be lucky if it lasted more than an hour on YouTube. Most of the time it just panics and flashes the Caps Lock light, but occasionally it will lock up and reboot after a few seconds.

I've also had some strange things happen that seem related, like KWin crashing or Terraria having a massive audio and visual freakout when I tried to play it. Occasionally, I've also seen a few small graphical artifacts when playing videos.

When I first noticed these issues I did some basic troubleshooting:
I checked the integrity of the SSD, which was fine.
I ran memtest, which passed.
I checked the journalctl logs, which didn't record anything from the panics.
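
For anyone repeating the journalctl step: a minimal sketch of pulling only the kernel messages from the previous boot, assuming a systemd-based distro with a persistent journal. The helper names are hypothetical, and a hard lockup may still leave nothing behind if the messages never reached the disk.

```python
import shutil
import subprocess


def previous_boot_kernel_cmd(priority="err"):
    """Build a journalctl command for kernel messages from the previous boot."""
    # -k: kernel messages only, -b -1: the boot before the current one,
    # -p: minimum priority to show, --no-pager: plain output for scripts.
    return ["journalctl", "-k", "-b", "-1", "-p", priority, "--no-pager"]


def read_previous_boot_errors():
    """Run the command if journalctl exists; return its output, else None."""
    if shutil.which("journalctl") is None:
        return None  # not a systemd system, or journalctl is not on PATH
    result = subprocess.run(
        previous_boot_kernel_cmd(), capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    output = read_previous_boot_errors()
    if output is None:
        print("journalctl not available")
    else:
        print(output or "no kernel errors recorded for the previous boot")
```

An empty result here matches what was seen above: the machine likely died before anything could be written.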

After these basic steps I decided to just live with the issue until the end of my semester, when I would have more time. I just used my desktop for most things. This past week I reinstalled Manjaro from a new ISO, which didn't fix the issue. So I tried running the Fedora live environment from a USB drive, and it locked up within a few minutes of starting a YouTube video.

I have no idea where to go from here, as my Linux skills aren't good enough to figure out what is causing the panics. I just hope it's not a hardware issue.

I’m running:
BIOS: 3.06
Kernel: 5.15.7-1 (I was running 15.13 earlier)
RAM: 2x8GB PNY (8GBU2X04JGGG39-12-K)
SSD: SK Hynix Gold P31 500GB

One thing you might try is changing the CPU performance mode in the BIOS settings. Does the issue persist if you turn off performance mode and switch it to non-boosting mode? Obviously it's not optimal long term to lose access to that performance, but it may help you isolate the issue.

If you're willing to give it a try, you could also see if the issue happens when you're running Windows. (You can run it without a license; you just won't be able to customize some aesthetic settings. It would be for testing purposes only, so you can go back to Linux later.)

You might also try running the latest beta BIOS which is 3.07, keeping in mind it is a beta.

Does this happen when plugged into the charger, or on battery, or both? What charger are you using?

If no one else has any other ideas, and my suggestions above don't help diagnose your issue, then I would suggest contacting Framework support.

Did you update the firmware on the P31 before you started using it? If not, this post over at MacRumors gave me the clearest guidance on how to do so.

Thanks for the replies.

I will try changing the performance mode and seeing if the issue persists in Windows 10.
I am a bit hesitant to update the BIOS or the SSD firmware while the system is unstable.

The panics happen both on and off battery power. I use the Framework charger.

Has anyone stably used PNY memory? If there is a hardware issue, I assume that might be it.

[Update]

I installed Windows 11 along with the driver package.

It ran stably for a few hours on YouTube, so I installed 3DMark and ran the Time Spy benchmark. The system locked up partway through the GPU test, so I rebooted it and tried the Night Raid benchmark. 3DMark then crashed when the benchmark loaded, and I had to manually kill the processes in Task Manager. I ran Night Raid again and the system locked up in the GPU test and left some artifacts on the screen. [Image below]


These artifacts are similar to those I saw in Linux.

After that I disabled Speed Step, Speed Shift, and Turbo Boost Max. I also set the boot performance mode to Max Non-Turbo. I ran Night Raid again and the laptop locked up again.

None of these lockups resulted in a BSOD, and I have no idea what exactly is causing them, but at this point I'm sure that it is some kind of hardware issue.

Should I contact Framework support?

It doesn’t sound like your testing has isolated the video chipset from the processor. Do you get kernel panics on high CPU-only loads, like compilation tasks?

(go and git clone something like clang and make -j8 it, or the rust compiler, and check if you can replicate the hang without touching graphics processing)
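
If a full compiler build is more than you want to set up, a rough stand-in is to pin every core with pure CPU work and see whether the hang still shows up. This is only a sketch, not a substitute for the compile test: it barely touches memory or disk, and the function names `burn` and `cpu_stress` are made up for illustration.

```python
import hashlib
import multiprocessing as mp
import time


def burn(seconds):
    """Hash data in a tight loop for roughly `seconds`; return iterations done."""
    deadline = time.monotonic() + seconds
    data = b"x" * 4096
    count = 0
    while time.monotonic() < deadline:
        # A SHA-256 digest is 32 bytes; repeating it 128 times keeps 4 KiB of input.
        data = hashlib.sha256(data).digest() * 128
        count += 1
    return count


def cpu_stress(seconds, workers=None):
    """Run one burn() per worker process; return per-worker iteration counts."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    counts = cpu_stress(seconds=60)
    print(f"{len(counts)} workers finished, {sum(counts)} hashes total")
```

Raise `seconds` to run it for longer; if the machine survives this but dies under a GPU benchmark, that points away from the CPU cores themselves.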

If the answer is no, my guess is that either the integrated GPU is dead or something it needs is screwy. If the answer is yes, my guess is that something shared by the iGPU and the CPU is screwy, e.g. memory (since the iGPU uses system memory) or power.


I know you said that you successfully ran memtest, but… Have you tried running the laptop with just one stick, and then with the other? Or with a known good memory stick?


Can you confirm that the Fedora live image you used was a squashfs or similar that lived entirely in memory? If it’s pulling stuff from across the USB, that’s just more possible failure area to consider with the live image.

Thank you,

Your reply brought up some things I hadn’t thought about.
I haven't tested each memory stick or channel individually yet, so I will do so using 3DMark. (3DMark is the only way I have been able to cause the crashes consistently.) I don't have access to any other DDR4 laptop memory, so I can't completely rule out the two PNY sticks I have.

If the issue still persists I will try the high CPU load test you recommended.

I do have some questions:
Can I run that compile test in a live environment?
How do I verify the live environment is running in memory?
If it isn't, how do I make it run in memory?

Can I run that compile test in a live environment?

Yes. Be aware that compiling can be memory intensive and that you’d probably be better off doing it from disk with some swap space.

How do I verify the live environment is running in memory?
If it isn't, how do I make it run in memory?

I must admit I didn’t fully think this one through when I mentioned it. That said:

That option might not be available on the Fedora image since apparently toram is ‘special’ in ways that are beyond my ken.
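
One rough way to check is to look at the kernel command line the live image booted with. The sketch below only looks for option names I believe are used for copy-to-RAM booting (`rd.live.ram` in dracut, `toram` on Debian/Ubuntu style live images, `copytoram` in archiso); treat that list as an assumption, and note that finding an option does not prove the copy actually succeeded. The function name is hypothetical.

```python
def live_ram_flags(cmdline):
    """Return the copy-to-RAM style boot options present in a kernel command line."""
    # Assumed option names; other live systems may use different ones.
    known = {"rd.live.ram", "toram", "copytoram"}
    found = []
    for token in cmdline.split():
        key = token.split("=", 1)[0]  # strip any "=value" part
        if key in known:
            found.append(token)
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the options the running kernel was booted with.
    with open("/proc/cmdline") as f:
        flags = live_ram_flags(f.read())
    print("copy-to-RAM options:", flags or "none found")
```

If nothing is found, the image is probably still reading from the USB drive as it runs, which keeps the drive and port in the list of suspects.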


Annnddd, new thought:

I’m not familiar with how much memory Night Raid uses (… or even what it is). Could you watch how much memory it’s using the next time you run it (on Windows I assume)? Given that it looks like some kind of triple-A game I assume it’s using a bunch of memory and it’s possible that the system freeze is from the GPU cache-thrashing against swap (or whatever the heck Windows calls it instead of swap).

Sorry it has taken me so long to reply. I was caught up with a lot of stuff around the holidays.

Night Raid is one of the benchmarks in 3DMark. I ran the CPU-specific tests a few times and they didn't cause any crashes. I did get one BSOD (HYPERVISOR_ERROR) while loading a video. That was the only BSOD I have gotten. All of the other times the system would just lock up and reboot after a few seconds. One of the suggested troubleshooting steps is to run the Windows Memory Diagnostic. I did this many times and there were no errors reported while the test was running, but the results would not show up in the Event Viewer for some reason. (Thanks Windows, very cool.)

I tested every possible memory config I could. I tested each stick individually in both channels, and both sticks with the channels swapped. 3DMark would still cause crashes in every config.
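
For anyone wanting to repeat this, the set of configs can be written out so none get skipped. A small sketch, with placeholder stick and slot labels (not anything printed on the hardware) and a hypothetical function name:

```python
from itertools import combinations, permutations


def memory_configs(sticks, slots):
    """List every way to place one or more sticks into distinct slots."""
    configs = []
    max_used = min(len(sticks), len(slots))
    for k in range(1, max_used + 1):
        for chosen_slots in combinations(slots, k):
            for chosen_sticks in permutations(sticks, k):
                # Map slot -> stick for this arrangement.
                configs.append(dict(zip(chosen_slots, chosen_sticks)))
    return configs


if __name__ == "__main__":
    for cfg in memory_configs(["stick A", "stick B"], ["slot 0", "slot 1"]):
        print(cfg)
```

With two sticks and two slots this gives six arrangements: four single-stick ones and two dual-stick ones, matching the tests described above.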

I then reinstalled Manjaro to run the compile test you suggested. I could not wrap my head around compiling clang manually, so I used the AUR to build llvm-git, which includes clang. I watched htop while the build was running: it pinned all 8 threads at 100%, reached a little over 7GB of RAM usage, and used the swap file quite a bit. It took roughly 2 hours and the system didn't lock up or crash.

At this point I’m pretty convinced that the source of the issue is related to the iGPU and I will be submitting a support ticket.

Thank you all for your amazing support. I really do appreciate it.

Any updates on this? Were you ever able to pinpoint the source of the error?

I have been having the exact same issues you have listed.

However, the only hardware we have in common are the PNY RAM sticks, of which I am currently the most suspicious after running some memory stress tests.

Sorry I didn't respond. The issue was the non-validated RAM sticks I was using. Once I switched to sticks that were on the validated list, the issues I was having were resolved.