Hardware issue or Linux freezing issue?

Which Linux distro are you using?
Pop!_OS
Which release version?
(if rolling release without a release version, skip this question)
22.04
(If rolling release, last date updated?)

Which kernel are you using?
Linux pop-os 6.9.3-76060903-generic
Which BIOS version are you using?
0.33 (I’m not 100% sure, but I think this is it)
Which Framework Laptop 16 model are you using? (AMD Ryzen™ 7040 Series)
AMD Ryzen 7040 Series

I am trying to figure out whether I have a hardware issue or a Linux freezing issue. I am trying to use psensor to monitor my CPU, but I am not sure what all the sensors are.

Screenshot from 2024-11-18 10-48-41

Monitoring the temps is not likely to tell you much about the “freezing issue”.

What is the issue, in detail?

When did it start?

What resolves it? A reboot? Self-resolve over time?

When does it mostly occur?

You really need to try a different distro to test if your freezing issue is the OS or not. Also, it would be a good idea to run some memtest86 passes on the RAM to rule out a memory issue, which would probably be the most likely cause.

My friend is the one having this issue. I’ve been trying to help him figure it out for the last few months. He’s new to Linux as of this year. I’m no expert, but I’ve been using Linux for a while, so we’ve been trying anything I can think of whenever I get the chance to sit down and call.

To answer questions:
@knipp30
When did it start? - Sometime in June or July of this year. At first it was just a random freeze here or there with no apparent pattern, so we didn’t do much about it.

What resolves it? - A hard reboot. The freeze either causes the desktop to stutter or freeze completely. When it stutters, sometimes you can wait it out and sometimes it eventually freezes completely. Another thing I noticed is that the first thing to go is mouse control. When it’s not completely frozen, you can sometimes navigate by keyboard for a few seconds (which acts smoothly until it freezes as well).

When does it mostly occur? - When doing homework. The problem was so occasional before that it wasn’t really a problem. However, since my friend started up classes, it’s become a consistent headache. Ironically, we’ve tried heavy workloads such as gaming and watching 4k videos. So far as I understand, it almost never happens when he’s doing that. But when he has his development environment open (primarily Firefox, Zed, and a terminal running live-server through npm) it happens regularly. It’s not tied to this specific setup though. It’s happened while using VS Code or even just an editor like Gedit.

@jared_kidd
Have you tried a different distro? - Yes. Originally, he was using Manjaro, since that is what I run on most of my devices and am familiar with. He’s also a gamer, so I figured that since the Steam Deck is Arch-based, he’d have better luck with something similar. The first thing we tried when things got annoying was a clean reinstall of Manjaro. When that didn’t work, he tried Pop!_OS. The problem doesn’t seem to be distro dependent. We’ve also tried rolling back a few kernels to see if it was an issue that was introduced this summer. It doesn’t seem to make a difference.

It would be a good idea to run some memtest86 passes - Thanks for the suggestion, I’ve downloaded it and am learning how to use it on my own computer so I can help him with it later today. If there’s anything specific I should watch for that is not obvious or easily found in the documentation, let me know.

Side note:
The reason we’ve been monitoring hardware was to see if there were any strange spikes that would indicate a thermal or memory issue. I’ve searched all over this forum and the web for similar issues. The only two threads that had anything turned out to be hardware issues. One had thermal issues that required a mainboard replacement, while the other had to do with specific cards in certain expansion slots causing the freeze.

When we first started monitoring temps, I thought we were fine until my friend pointed out that one of the sensors is regularly in the 80-90+ Celsius range. We think it’s one of the CPU cores, but I am not familiar enough with psensor to be certain. I was just trying to deduce it from the sensor details such as chip and ID.

Temp 4 in Linux is the “CPU temp”, and unless it’s pushing past 100 it’s likely not a thermal issue.

Mine spikes and holds 95-100 on an all core load. There are some known thermal issues FW is working on addressing. However, I don’t think that is the issue.
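
If it helps with working out which psensor entry is which, the kernel exposes the same readings under /sys/class/hwmon, each with a chip name and sometimes a label. Here is a minimal sketch, assuming the standard hwmon sysfs layout (hwmonN/name, tempN_input in millidegrees Celsius, optional tempN_label). The function name list_hwmon_temps is made up for illustration, and which chips show up will depend on the machine:

```shell
# List every temperature input the kernel exposes, with its chip name.
# Hypothetical helper; the optional argument overrides the sysfs root.
list_hwmon_temps() {
    hwmon_root="${1:-/sys/class/hwmon}"
    for chip_dir in "$hwmon_root"/hwmon*; do
        [ -r "$chip_dir/name" ] || continue
        chip_name=$(cat "$chip_dir/name")
        for input_file in "$chip_dir"/temp*_input; do
            [ -r "$input_file" ] || continue
            label_file="${input_file%_input}_label"
            if [ -r "$label_file" ]; then
                label=$(cat "$label_file")
            else
                # No label file: fall back to the raw input name, e.g. temp2
                label=$(basename "${input_file%_input}")
            fi
            # Some sensors return an error on read; skip those
            millideg=$(cat "$input_file" 2>/dev/null) || continue
            printf '%s\t%s\t%d C\n' "$chip_name" "$label" "$((millideg / 1000))"
        done
    done
}
```

Running list_hwmon_temps with no arguments prints one line per sensor (chip, label, whole degrees), which can then be matched against the chip and ID details psensor shows.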

I would use the laptop till it locks up again, do a reboot, and then check journalctl logs.

Note the exact time it locks up, and run the following - change the date and time to ~5 minutes before the freeze occurs:

journalctl --since "2023-04-05 08:00:00"
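
To save working out the timestamps by hand, something like the following could build the command from the noted freeze time. It is only a sketch: it assumes GNU date (for -d), and the function name journal_window and the default offsets (5 minutes before, 1 minute after) are my own choices. It prints the command rather than running it, so it can be checked first:

```shell
# Print a journalctl command covering a window around a freeze.
# Usage: journal_window "YYYY-MM-DD HH:MM:SS" [minutes_before] [minutes_after]
journal_window() {
    freeze_time="$1"
    minutes_before="${2:-5}"
    minutes_after="${3:-1}"
    # Convert to epoch seconds so the arithmetic is unambiguous
    freeze_epoch=$(date -d "$freeze_time" +%s) || return 1
    window_start=$(date -d "@$((freeze_epoch - minutes_before * 60))" '+%Y-%m-%d %H:%M:%S')
    window_end=$(date -d "@$((freeze_epoch + minutes_after * 60))" '+%Y-%m-%d %H:%M:%S')
    printf 'journalctl --no-pager --since "%s" --until "%s"\n' "$window_start" "$window_end"
}
```

For example, journal_window "2024-11-21 18:37:35" prints a command with --since set five minutes earlier. Redirecting that command's output to a file makes it easy to share.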

It does sound hardware related - just not sold on temps. Have you tried running with just one RAM stick?

Yes, it is just one RAM stick.

These were the memtest results

@jared_kidd

This is what I got for the 5 minutes surrounding the freeze:

https://raw.githubusercontent.com/danielvchambers/framework16_debugging/refs/heads/main/crashJournal.txt

I’ll preface this with the fact I am a virtualization guy… not a desktop/kernel/etc guy… but I can see a few things that stick out.

  1. Does this happen only when using the USB port on the GPU? Like… did you plug in something to the GPU before the freeze?

  2. What device is sda1?:

Nov 21 18:37:35 pop-os udisksd[2261]: Failed to mount '/dev/sda1': Input/output error
Nov 21 18:37:35 pop-os udisksd[2261]: NTFS is either inconsistent, or there is a hardware fault, or it's a
Nov 21 18:37:35 pop-os udisksd[2261]: SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
Nov 21 18:37:35 pop-os udisksd[2261]: then reboot into Windows twice. The usage of the /f parameter is very
Nov 21 18:37:35 pop-os udisksd[2261]: important! If the device is a SoftRAID/FakeRAID then first activate
Nov 21 18:37:35 pop-os udisksd[2261]: it and mount a different device under the /dev/mapper/ directory, (e.g.
Nov 21 18:37:35 pop-os udisksd[2261]: /dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
Nov 21 18:37:35 pop-os udisksd[2261]: for more details.
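
A quick way to pull the affected device names out of a saved journal file, as a sketch. It assumes the log has been saved locally and that the udisksd message format matches the lines quoted above; the function name failed_mount_devices is hypothetical:

```shell
# Print each device that udisksd reported a mount failure for, once.
# Usage: failed_mount_devices crashJournal.txt
failed_mount_devices() {
    grep -o "Failed to mount '[^']*'" "$1" | cut -d "'" -f 2 | sort -u
}
```

The parent disk can then be inspected with something like lsblk -o NAME,SIZE,FSTYPE,LABEL,MODEL,TRAN /dev/sda to see whether it is an internal drive, an expansion card, or something plugged in over USB.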

There are plenty of errors in there that someone else might recognize, and searching the file for “crash dump” returns a lot of hits…

Is there anything in /var/crash ?
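
For reference, a rough sketch of both checks: counting keyword hits in the saved journal and listing whatever is in the crash directory. The function names are made up for illustration, and the keywords are just examples of what one might search for:

```shell
# Count case-insensitive matching lines for each keyword in a saved log.
# Usage: count_log_matches crashJournal.txt "crash dump" "I/O error"
count_log_matches() {
    log_file="$1"
    shift
    for keyword in "$@"; do
        # grep -c prints 0 when nothing matches, so the output stays uniform
        printf '%s: %s\n' "$keyword" "$(grep -ci -- "$keyword" "$log_file")"
    done
}

# List crash reports, if any; the directory defaults to /var/crash.
list_crash_reports() {
    ls -A "${1:-/var/crash}" 2>/dev/null
}
```

An empty result from list_crash_reports just means no reports were written there, which does not by itself rule anything out.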

Others and I have a similar (possibly the same) issue with Pop!_OS on FW 13s. There are other threads on this forum with details.
I set up a boot entry to default to
Linux pop-os 6.6.10-76060610-generic #202401051437~1709764300~22.04~379e7a9 and it runs stably on that kernel.
It seems like the display side (GNOME, the display manager, etc.) is what freezes.
When I test newer kernels / RDs, I enable the SSH daemon so I can SSH into my Framework when it locks up, look at dmesg, logs, processes, etc., and then do a graceful shutdown/reboot.
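
Once logged in over SSH during a lockup, a small helper along these lines can grab the basics before rebooting. This is only a sketch: the function name collect_freeze_info and the output file names are made up, and dmesg may need sudo depending on kernel settings:

```shell
# Save kernel messages, journal warnings, and the busiest processes
# to a directory so they survive the reboot. Prints the directory used.
collect_freeze_info() {
    out_dir="${1:-$HOME/freeze-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$out_dir" || return 1
    # Each step is best-effort; a failing tool should not stop the rest
    dmesg -T > "$out_dir/dmesg.txt" 2>&1 || true
    journalctl -b --no-pager -p warning > "$out_dir/journal-warnings.txt" 2>&1 || true
    ps aux --sort=-%cpu | head -n 25 > "$out_dir/top-cpu.txt" 2>&1 || true
    echo "$out_dir"
}
```

On an Ubuntu-based system like Pop!_OS, the SSH server is usually enabled beforehand with sudo apt install openssh-server followed by sudo systemctl enable --now ssh. After collecting the files, sudo systemctl reboot should give a clean restart if the system is still responsive enough.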

GNOME’s tty shell/mode also works from the FW keyboard and screen for analysis and a graceful restart.
From the Gnome help pages:

To get to one of the four tty shells in GNOME, press Ctrl-Alt-F3 through Ctrl-Alt-F6.

To return to the graphical shell from a tty shell, press Alt-F2.

Alt-F1 takes you to a new login session (GDM).

Within a tty shell, Alt-arrow moves between tty shells.

You are not alone. I hope this info helps.
I think it is time to switch distros, but I was hoping the 24.10 version of Pop!_OS with the new COSMIC DE would be out by now.