The new disk arrived and I had some time to reinstall the machine and do some testing. I think the WD SN850 was causing the issue, since all NVMe-related errors and warnings are now gone from the kernel logs with the Samsung 980 Pro installed.
I have not had any stability problems so far. Previously I could reproduce a freeze consistently by stress testing. Now it seems to run smoothly.
Let’s see if this improvement lasts. I’ll use the machine for work this week and will see if things are stable.
After a week of work the system seems stable. Looks like the WD SN850 does indeed have issues and can’t be recommended for use with Linux. I’ve actually put it into my Windows desktop and ran WD’s diagnostic tool, which, of course, did not find any problems with the drive. Another plus for the Samsung is that I can use the built-in encryption instead of LUKS, which gives slightly better performance, though still below what you’d expect when using no encryption at all.
I have the same issue occurring, and this is the first thread that actually looks like it identified the issue. Thank you 43c! I’ll test another hard drive. @Framework support would do well to check this out.
@T_RRR sure thing! I’ve actually contacted Framework support about the issue and pointed out that it might be something that a larger number of customers are facing. Let’s see what they make of it.
@Michael_Siebert
Open a terminal and monitor the kernel logs while running a disk-heavy load (using KDiskBench/stress or similar tools). After a while NVMe-related errors (see my previous post) start to appear and finally the system freezes. It sometimes takes 2-3 hours until the freeze, sometimes less.
You can monitor kernel logs with
sudo journalctl -k -p4 -f
-k : show only kernel logs
-p4 : priority (show warnings and anything more severe, such as errors)
-f : follow log stream (print new logs)
If you just experienced a freeze on your last boot, you can display the logs for your last boot with
sudo journalctl -k -p4 -b-1
That is assuming your distro uses systemd.
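To pick just the NVMe-related lines out of that output, a small filter can help. This is only a sketch: the function name nvme_only is made up for illustration, and matching on the string "nvme" is my assumption about what the relevant messages contain, so widen the pattern if yours look different.

```shell
# Hypothetical helper: keep only log lines that mention "nvme" (case-insensitive).
# Reads from stdin, so it can sit at the end of any pipeline.
nvme_only() {
    grep -i 'nvme'
}

# Usage, assuming a systemd-based distro as above:
#   sudo journalctl -k -p4 -b-1 --no-pager | nvme_only
```

If nothing is printed, none of the warnings or errors from the previous boot contained "nvme".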
I did the same tests with a Samsung 980 Pro and did not experience any freezes. The system has been running solidly since I replaced my WD drive. This seems to be a firmware issue that affects Linux in particular and is only present on some drives. Maybe WD uses different parts depending on availability? This is quite common with flash-based storage.
I put the WD drive into my Windows desktop and also ran WD’s diagnosis software. It did not detect any issues whatsoever. The drive also has the latest firmware available.
On Windows it does not cause any problems for me, though I did not test Windows on the Framework in combination with the WD SN850, so take this with a grain of salt.
I’ve been using the SN850 1TB for a little over 6 weeks on the Framework now without any issues. FWIW I am using Slackware Linux (tried Windows 11 dual boot for a week too without issues), please let me know if there is anything that I can do to test/reproduce that might be of help.
So far I haven’t seen anything disk-related in my journal, including during or after crashes. So at the moment I don’t think my specific problem is the SSD.
But: I’ve now been running without a crash for almost 2 days! A colleague of mine asked me whether it might be Bluetooth, so I’m running with a tooooooo short headset cable right now and a Logi Unifying Receiver for my mouse, with Bluetooth deactivated.
If that’s what it takes, I’ll need a longer headset cable.
Interesting. My ThinkPad E590 kept freezing after the update to Ubuntu 22.04.
And I also had Bluetooth just turning off!
I updated the BIOS 3 days ago and since then there have been no freezes and Bluetooth has not turned off.
My freezes left the desktop like a still picture, with nothing happening at all. I needed to do a hard reboot.
It happened to me on another device, a ThinkPad from Lenovo.
I was just sharing that it might not be just Framework laptops. I never had such freezes with Ubuntu in the past 10 years. The Lenovo ThinkPad BIOS update solved it for me. No idea what has changed in this BIOS.
If that helps: I updated to ThinkPad BIOS 1.32; I think I was on 1.26 or so before.
Logs are quite obscure… maybe this Windows 11 stuff made it work…
CHANGES IN THIS RELEASE
Version 1.31
[Important updates]
Security issue update LEN-60179, LEN-68033, LEN-59195, LEN-66614, LEN-66615 and LEN-61632.
No issues with freezing on EndeavourOS (arch derivative), on BIOS 3.07, 3.09, or 3.10, with heavy/continuous bluetooth audio and HID use. Kernels 5.15.* and 5.18.*, have not gotten to 5.19 yet.
My Framework laptop also experiences occasional freezes in Gentoo. However, it only happens when the screen is blank and the system is idle. It never happens if there’s any significant load on the system (it has never frozen on me when I leave music playing on it). I configured my system to turn off the display after 10 minutes of idling, and never suspend to RAM unless instructed. I’ve been experiencing this issue since I got the machine (running BIOS 3.02 back then) up to now (running BIOS 3.10). No freezes under Windows, or on another laptop with a nearly identically configured Gentoo system. I don’t use Bluetooth, and I have a Samsung 970 Pro installed in my Framework.
Apart from the drive and BIOS, my system is identical and runs Ubuntu 22.04.1. Try upgrading the BIOS to the latest version, 03.10.
My Framework is running stable. The fan only comes on when the machine is working hard (ripping DVDs and the like), and my battery life is around 5 to 7 hours depending on what I am using it for.
@QRP Do you have the possibility to swap the disk with another model for testing? That did the trick for me, but there seem to be different causes for freezing (e.g. faulty WiFi/Bluetooth cards). Also, the same model of disk that caused my machine to crash works perfectly fine for other people here. I suggest you contact Framework support.
Are there any errors in your logs after a freeze? You can check with
sudo journalctl -k -p4 -b-1
Regarding the temperature, 167°F (75°C) is quite normal at 4.7 GHz. Try the balanced power profile (if you are using power-profiles-daemon) to bring that down at idle. Applying a different thermal paste also helps. I’ve used Thermal Grizzly’s Kryonaut, which works well for me. I’m at around 116.6°F (47°C) at idle, but it can easily reach 167°F (75°C) under load.
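For anyone comparing sensor readings in different units, the figures above follow the usual conversion C = (F - 32) × 5/9. A minimal sketch, assuming awk is available; the function name f_to_c is just illustrative.

```shell
# Hypothetical helper: convert a Fahrenheit reading to Celsius, one decimal place.
f_to_c() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", (f - 32) * 5 / 9 }'
}

f_to_c 167     # load figure from above, prints 75.0
f_to_c 116.6   # idle figure from above, prints 47.0
```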
I will check the log after the next freeze.
I have switched out mice, OS, browser, still getting random freezes. I will post after the next freeze.
Thanks for your help.
Just froze and rebooted.
sudo journalctl -k -p4 -b-1
Oct 29 19:37:42 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_TZ.ETMD>
Oct 29 19:37:42 pop-os kernel:
Oct 29 19:37:42 pop-os kernel: No Local Variables are initialized for Method [_OSC]
Oct 29 19:37:42 pop-os kernel:
Oct 29 19:37:42 pop-os kernel: Initialized Arguments for Method [_OSC]: (4 arguments def>
Oct 29 19:37:42 pop-os kernel: Arg0: 00000000fd4f0b40 Buffer(16) 5D A>
Oct 29 19:37:42 pop-os kernel: Arg1: 00000000effbaf21 Integer 0000000>
Oct 29 19:37:42 pop-os kernel: Arg2: 000000007a0c7c28 Integer 0000000>
Oct 29 19:37:42 pop-os kernel: Arg3: 000000006b1f36e3 Buffer(8) 00 00>
Oct 29 19:37:42 pop-os kernel:
Oct 29 19:37:42 pop-os kernel: ACPI Error: Aborting method _SB.IETM._OSC due to previous>
lines 26-54/54 (END)
BTW, Pop!_OS runs much cooler than Ubuntu.
BIOS Vendor: INSYDE Corp.
BIOS Version: 03.10
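The trailing > on the log lines above means the pager cut them off at the terminal width. A sketch for saving the full lines to a file so they can be posted whole: the function name and the default file name are hypothetical, and it assumes journalctl from systemd.

```shell
# Hypothetical helper: write the previous boot's kernel warnings/errors to a file.
# --no-pager keeps journalctl from going through less, so long lines stay intact.
save_prev_boot_log() {
    out="${1:-freeze-kernel.log}"   # default file name is an arbitrary choice
    sudo journalctl -k -p4 -b-1 --no-pager > "$out"
}

# Usage:
#   save_prev_boot_log              # writes freeze-kernel.log in the current directory
#   save_prev_boot_log /tmp/k.log   # or pick another path
```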