SSD Failed Twice, Maybe Temp Related

Which release version? 24.04 LTS
Which kernel are you using? 6.17.4-76061704-generic
Which BIOS version are you using? 4.02
Which Framework Laptop 16 model are you using? (AMD Ryzen™ 7040 Series) 7940HS
I bought the laptop in May, and the SSD failed in July. The SSD was the WD Black SN770 sold by Framework. Framework replaced it, and the new one failed again two months later. The failure mode both times was multiple bad sectors in the boot region and a bad superblock, reported by fsck and other tools.
StorageReview rates the MTBF as 1.75 million hours, so either something extremely unlikely occurred, or something in my environment is stressing the disk. My use case is mostly web browsing and email, with occasional software development, but not much I/O-intensive disk work like compiling or building. So temperature was my main suspect.
After the second drive replacement, I started collecting temperature stats with smartd. I was seeing disk temps in the high 70s and low 80s (Celsius) consistently under load. Temps at idle in the 60s. I recalled that the thermal pad attached to the midplate is pretty narrow, so I replaced it with one that covers the full width of the SSD. That brought a slight improvement in the SSD temps, but I’m still seeing load temps around 77 Celsius on the SSD, and idle temps around 50.
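(In case it's useful to anyone else: the smartd side of this is just a one-line config entry. A minimal sketch, assuming the Ubuntu smartmontools package; the thresholds are purely illustrative:)

# /etc/smartd.conf -- track NVMe temperature, report changes of 4 C or more,
# log a warning at 70 C and a critical message at 80 C
/dev/nvme0 -d nvme -a -W 4,70,80

Restart the smartd service after editing and the temperature readings show up in the journal.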
Am I right to be concerned about these temps? Do I need to set more aggressive fan curves? I haven't touched the fan curves so far, but I am monitoring the fans. They don't come on very often, and never run at more than a few hundred RPM.
Thanks, but what exactly does the operating temperature range mean? Could there be longevity implications of running consistently in the upper part of the range? An LLM answer about SSDs in general told me that 70–80 °C indicates poor airflow and that 80 °C and above can induce throttling. I didn't look into the sources for that answer, but I can. Either way, I'm still left wondering why I had two disk failures in two months; I don't want to just assume that was bad luck.
The operating temperature range is the range an SSD can safely run at without significantly increased degradation. WD/SanDisk use a controller that runs hotter than most brands, and the typical operating range for SSDs in general is a bit lower, so I could see an AI model quoting generic numbers; Crucial and Samsung SSDs, for instance, are typically rated for 0–70 °C. Under 85 °C on a WD SSD is within spec and keeps the warranty intact, and they wouldn't publish that number if it significantly increased failures, because the free replacements would be on them.
There is another thread that has been "tracking" these dead Framework-sold drives for almost two years, and I haven't seen any root cause posted by Framework yet. Buy another brand of SSD if you care about your data.
I don’t mind buying another SSD, but I want to make sure there is not an underlying problem with the Framework. If the problems are isolated to the WD drives, that would be good news.
I have not been a huge fan of WD/SanDisk SSDs. On paper they seem great, but I have had some bad experiences with drive quality (i.e. dead on arrival) and premature failures, and it is not a brand I even consider when buying SSDs any more. I hesitate to say they are worse than other brands, only because my sample size is too limited to draw that kind of conclusion, but personally I don't consider them for builds.
I think it is worth trying a different brand, and if that one also fails, I would assume it is the laptop and you can contact Framework about sorting everything out.
The most common M.2 slot failures I have seen show up with different symptoms than yours: either runaway heating from a short, or an unreliable solder connection that makes the disk randomly disconnect.
I wrote a Python app to track my SSD health, and I finished it just in time to see that a catastrophic failure may be imminent: it's racking up hard media errors. Clearly, something is wrong with my Framework 16. Three primary-SSD failures since purchasing it in May, with roughly a two-month MTBF.
I opened a support ticket, and Framework is currently looking at my log. If anyone would like to use the disk monitoring software I wrote, you can find it here. It's a command-line client with a rich UI that discovers all NVMe drives and shows current health and a histogram of temperature readings.
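(The app itself does more, but the core of it is just a thin wrapper around nvme-cli. A minimal sketch of the idea, assuming nvme-cli is installed and the script is run with sudo; the JSON key names are what my version of nvme-cli emits and may vary:)

#!/usr/bin/env python3
# Minimal sketch: poll SMART data for every NVMe controller via nvme-cli.
# Run with sudo; JSON key names may differ between nvme-cli versions.
import glob
import json
import subprocess

def smart_log(dev):
    # 'nvme smart-log -o json' reports the same fields as the text output shown later in this thread.
    out = subprocess.run(
        ["nvme", "smart-log", dev, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

for dev in sorted(glob.glob("/dev/nvme[0-9]")):
    log = smart_log(dev)
    temp_c = log.get("temperature", 273) - 273  # JSON output reports Kelvin
    print(f"{dev}: {temp_c} C, "
          f"media_errors={log.get('media_errors')}, "
          f"unsafe_shutdowns={log.get('unsafe_shutdowns')}, "
          f"critical_warning={log.get('critical_warning')}")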
That is a little toasty.
On the FW16 there are thermal pads near the SSD that help dissipate the heat. On a new FW16 they have a plastic film on them; you need to remove the plastic film for them to work.
Thanks, I did remove the plastic film after the first failure. After the second failure, I replaced the pad with a wider one (I noticed it only covered part of the width of the drive). Temperatures have been unproblematic since the first failure, generally below 60 Celsius. Still, there are numerous unsafe shutdowns and media errors in the SMART log, even though I have not done any hard shutdowns and there have not been any crashes or hangs.
I'll see what Framework says, but it seems to me there is likely an electrical problem. It's noteworthy that it only happens in the primary slot; the secondary disk has never had any problems.
These seem to get generated by the OS telling the drive to go into power-save mode when nothing is happening. There have been a number of threads about this and the way 'unsafe shutdown' counts grow exponentially in the logs.
That makes sense, because I have unsafe shutdown counts on the good disk as well. The real issue is the media_errors count, which appeared suddenly and kept increasing rapidly until I booted from the secondary and mounted the primary disk read-only.
The system is popping up a warning daily that a catastrophic SSD failure may be imminent. The kernel log also shows critical hardware errors with the drive. Still waiting to hear back from Framework after submitting the logs they requested.
Yeah, that's the command I'm running every 5 minutes as a service (see the sketch after the output below for one way to set that up) and then monitoring with a client app. It's what alerted me to the problem. Here is the current output. Media errors were at 85 when I first noticed them; by the time I rebooted to the secondary, they had climbed to 112. The disk has been mounted read-only for several days now, so the error count is stable.
sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0x4
temperature : 39 °C (312 K)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
endurance group critical warning summary: 0x4
Data Units Read : 5487928 (2.81 TB)
Data Units Written : 8771197 (4.49 TB)
host_read_commands : 251365795
host_write_commands : 55578575
controller_busy_time : 262
power_cycles : 50
power_on_hours : 160
unsafe_shutdowns : 12
media_errors : 112
num_err_log_entries : 112
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 52 °C (325 K)
Temperature Sensor 2 : 39 °C (312 K)
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
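(For anyone wanting to replicate the every-5-minutes polling, one way to do it is a systemd timer roughly like the following; the unit names and log path are just illustrative:)

# /etc/systemd/system/nvme-smart.service
[Unit]
Description=Snapshot NVMe SMART data

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'nvme smart-log /dev/nvme0 -o json >> /var/log/nvme-smart.jsonl'

# /etc/systemd/system/nvme-smart.timer
[Unit]
Description=Snapshot NVMe SMART data every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target

Enable it with sudo systemctl enable --now nvme-smart.timer, and the samples accumulate in the log file for whatever client you point at it.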