[TRACKING] Linux freezing on multiple distros

I am seeing random intermittent hard freezes, but only when my eGPU is connected. Approximately one hard freeze every 6-8 hours or so. I also see random bluetooth failures (maybe 10 hours uptime between fails), and occasionally wifi takes a union break as well (maybe 1 in 20 hours uptime if that).

If the Framework is disconnected from the eGPU, I have not seen a single issue. Therefore, I chalk this up to problems with the proprietary nvidia driver, which unfortunately only has one fix: waiting.

Currently running 5.15.10.arch1-1 kernel in Endeavour, with nvidia 496.46-4. Similar freeze ups were present at the start of my testing all this (5.15.6 / 470.xx ish), in both arch and Fedora.

Thanks for the suggestions everyone.
@requiem @2disbetter
Checked the clock speeds and they seem ok.
Running
watch -n 1 'cat /proc/cpuinfo | grep -i mhz'
I can see that most cores are staying around 2.4GHz. For reference, I’m on power saver mode on Fedora 35 currently but I’ve also seen the issue on balanced.

I’ll keep that watch command up on the terminal to see what it says if/when it freezes.
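
Since the screen output is lost once the machine locks up, one option is to also append the readings to a file. A minimal sketch, assuming a bash-like shell; the helper names cpu_mhz_line and log_cpu_mhz and the log path are made up for illustration:

```shell
# Hypothetical helper: print all "cpu MHz" values from cpuinfo-formatted text on one line.
# Reads the file given as $1 (default: /proc/cpuinfo).
cpu_mhz_line() {
    awk -F': *' 'tolower($1) ~ /^cpu mhz/ { printf "%s%s", sep, $2; sep = " " }
                 END { print "" }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: append one timestamped sample to a log file and flush it to disk.
# $1 = log file, $2 = cpuinfo source (default: /proc/cpuinfo).
log_cpu_mhz() {
    logfile="$1"
    source_file="${2:-/proc/cpuinfo}"
    printf '%s %s\n' "$(date '+%F %T')" "$(cpu_mhz_line "$source_file")" >> "$logfile"
    sync   # flush so the latest sample is more likely to be on disk after a freeze
}

# Usage (runs until interrupted with Ctrl C):
#   while true; do log_cpu_mhz "$HOME/cpu-mhz.log"; sleep 1; done
```

After a forced reboot, the last lines of the log show the clock speeds right before the freeze. If the drive itself is what hangs, the final samples may still be missing.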

Might try installing windows as well and seeing if I can reproduce/install that bios update.

@D.H
That is unfortunate. For me, it’s so far happened when I have nothing plugged in at all and on battery. Haven’t been able to use the machine for more than an hour at a time. It even happened on a fresh boot at the sign-in screen once. I’m currently on kernel 5.15.8 that ships with Fedora 35.

More info:
I have the DIY i5 version.
16 gigs of ram sent from framework
500 gig sn850 wd_black drive I purchased elsewhere.
2 usb A
2 usb C

Just happened again. Most cores were still at 2.4GHz.
I did notice both this time and a couple previous times, it seems to get into some kind of degraded state before it freezes. I had another terminal instance going. If you try to type any terminal commands, they get stuck. Ctrl C won’t kill anything. Trying to shut down (tried this the previous time) during this degraded state does not work. From there, it just all freezes.

I had that issue today. I assumed it’s Microsoft’s fault, but maybe not.

Running Linux Mint 20.2. I had a bunch of tabs open with Outlook online, a Microsoft remote desktop web client, and a desktop version of teams running a meeting. Things seemed to be pokey, but I tried to open OneNote in the browser and everything just… stopped. No keyboard or mouse input worked. Since I was in a meeting, I rebooted rather than troubleshoot. No issues since. I had some performance issues prior, but there’s so many hops in this setup, I can’t be sure where the hiccup was.

You’re right, of course. I’m just used to assuming that if I have too much stuff running, the kernel can just surrender and die :stuck_out_tongue:

Tried to install Windows to see if I could reproduce and/or install that BIOS update.
The issue happened again while I was trying to reformat my disk via a Fedora boot USB, since I had the whole thing encrypted. Second time was the charm and I was able to wipe the drive and get it to a format Windows recognizes.

Installed windows via a windows boot usb. Fun fact, touchpad did not function during this process so I had to navigate around with tab. Not the end of the world. If anyone is stuck on accepting the tos, that checkbox only accepts space bar as an input, not enter. :slight_smile:

Got Windows all installed and ran into a few hurdles. Wifi was not working. Spent 10 minutes finding an ethernet cable and dongle. Right as I got back to the laptop, I was greeted by an immediate blue screen. Upon next boot, I got the message “Default Boot Device Missing”. Awesome.

Luckily, I came across a thread about this very thing while looking into my original issue.

When I find time to fiddle around some more, I’ll see if I can get Windows reinstalled, the WD software downloaded and the firmware installed. Hopefully I can manage it before a blue screen. I’ll report back if that fixes things and if it was related to my original issue. I suspect it’s possible.

Turns out after a hard shutdown, windows is actually still there so no need to reinstall. The new tricky issue is getting the laptop to behave long enough to install the firmware. Had just clicked on the installer when the laptop suddenly seems like it went to sleep. Screen had backlight on but power button and keyboard lights turned off. I was able to wake it by clicking on the trackpad but keyboard was non functional after that. Even a reboot isn’t fixing the keyboard. I am not catching a break at all here lol…

Edit: Drive firmware update installed. Discovered USB C dongle for ethernet died as well with the keyboard. Unplugging it and plugging it back in magically made the keyboard wake up and lights on keyboard and power button come back up.

Did some testing today. Drive firmware did not solve the freezing issue on linux.
Ended up updating the BIOS to 3.07 and have had no freezes on Fedora since. So far so good.

Quick followup.
I saw further issues with freezing on Windows and Linux. I decided to get a replacement SN850. This did the trick and I’ve been running stable on Fedora since then with no hiccups.

I just got a Framework and have been having the exact same issue on Fedora 35. Random lock ups (within 15-20 min of use) where everything freezes and then the machine reboots. Otherwise, all hardware seems to be working fine. My laptop already came with bios 3.07.
@Popbear Thanks for your posts and updates. This gives me hope that it’s just a faulty SSD. I should have just bought my own SSD. Here’s hoping their support comes through.

@mussa Did you solve your issue? I’ve got the same behavior and cannot pinpoint the problem. Kernel logs do not show anything as far as I’m aware. I’ve also got a SN850.
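
For anyone comparing notes on drives, the model and firmware revision can be read straight from sysfs. A minimal sketch, assuming the usual /sys/class/nvme layout; the function name nvme_info is made up for illustration:

```shell
# Hypothetical helper: print model and firmware revision for each NVMe controller.
# $1 = sysfs base directory (default: /sys/class/nvme).
nvme_info() {
    base="${1:-/sys/class/nvme}"
    for ctrl in "$base"/nvme*; do
        # skip if the glob matched nothing or the attribute is unreadable
        [ -r "$ctrl/model" ] || continue
        # sysfs pads these strings with trailing spaces, so strip them
        model=$(sed 's/ *$//' "$ctrl/model")
        firmware=$(sed 's/ *$//' "$ctrl/firmware_rev")
        printf '%s: model=%s firmware=%s\n' "${ctrl##*/}" "$model" "$firmware"
    done
}

# Usage:
#   nvme_info
```

Posting that one line along with the kernel version makes it easier to see whether the reports share a drive model or firmware.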

@43c No, their support stopped responding to me so I initiated an RMA and returned it. They told me they hadn’t received my last email, which is either BS to save face, or they have an issue with their email server, as I have read on here that someone else had the same issue. I really wanted to like it but this issue made it unusable.
Anyways sorry for the rant. Your best bet is to get a hold of another nvme and test it out with that one. Otherwise, they will have you do a gazillion inspections and tests which can take weeks.

@mussa That sucks.

I’ve ordered a Samsung 980 Pro for testing. Freezing on Linux seems to be an issue with WD NVMEs. There are at least a few other reports and articles around, like this one: “WD NVME SSD Freezing on Linux” (Esc.sh).

After a second look at the kernel logs while running a load, I’ve noticed that there are indeed NVME related warnings and errors:

I will take the time this weekend and do some testing. Maybe also test with a windows installation.
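
For anyone wanting to look for the same kind of messages on their own machine, here is a rough sketch of how the kernel log could be narrowed down. It assumes a systemd-based distro with a persistent journal (as on Fedora); the function name filter_nvme is made up for illustration:

```shell
# Hypothetical helper: keep only the lines on stdin that mention NVMe (case-insensitive).
filter_nvme() {
    grep -i 'nvme'
}

# Usage:
#   journalctl -k -b -1 --no-pager | filter_nvme    # kernel log of the previous boot
#   sudo dmesg --level=err,warn | filter_nvme       # current boot, warnings and errors only
```

The previous-boot variant is the useful one after a hard freeze, since the current boot’s log starts fresh after the reset.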

@43c I wish I had known that before returning it. Keep us posted about your results! Good luck

No issues with freezing on SK hynix P31 on Fedora 35 or Arch.

The new disk arrived and I had some time to reinstall the machine and do some testing. I think the WD SN850 was causing the issue since all NVME related errors and warnings are now gone from the kernel logs with the Samsung 980 Pro installed.

I did not have stability problems so far. I could previously reproduce a freeze consistently by stress testing. Now it seems to run smoothly.
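
I have not said which stress tool was used, so the following is only an illustration of the idea: hammer the drive with repeated writes and reads and see whether the machine survives. The function name disk_stress and its defaults are made up, and it assumes GNU dd:

```shell
# Hypothetical helper: write and read back a scratch file several times.
# $1 = directory on the drive under test, $2 = size in MiB, $3 = number of rounds.
disk_stress() {
    dir="$1"
    size_mb="${2:-1024}"
    rounds="${3:-5}"
    scratch="$dir/disk-stress.tmp"
    i=1
    while [ "$i" -le "$rounds" ]; do
        # conv=fsync makes dd flush the data to the drive before it exits
        dd if=/dev/urandom of="$scratch" bs=1M count="$size_mb" conv=fsync status=none || return 1
        # read the file back (small files may be served from the page cache)
        dd if="$scratch" of=/dev/null bs=1M status=none || return 1
        echo "round $i of $rounds done"
        i=$((i + 1))
    done
    rm -f "$scratch"
}

# Usage (writes 4 GiB per round, 10 rounds, into the home directory):
#   disk_stress "$HOME" 4096 10
```

Running it while following the kernel log in a second terminal shows whether drive-related messages appear under load.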

Let’s see if this change is constant. I’ll use the machine for work this week and will see if things are stable.

After a week of work the system seems stable. Looks like the WD SN850 does indeed have issues and can’t be recommended for use with Linux. I’ve actually put it into my Windows desktop and ran WD’s diagnostic tool, which, of course, did not find any problems with the drive. Another plus for the Samsung is that I can use the built-in encryption instead of LUKS, which gives slightly better performance, though still below what you’d expect when using no encryption at all.

I have the same issue occurring, and this is the first thread that actually looks like it identified the issue. Thank you 43c! I’ll test another hard drive. @Framework support would do well to check this out.

@T_RRR sure thing! I’ve actually contacted Framework support about the issue and pointed out that it might be something that a larger number of customers are facing. Let’s see what they make of it.

Been having this random freezing issue since the dawn of days (of this frame.work). Also have an SN850 installed.

@43c how did you diagnose that problem?

@Framework anything from your end?