Maybe not, but you may be able to find out the MTBF. I know one of my colleagues who oversaw a bit barn (collecting data from satellites) worked out that the failure rate of drives he had matched the manufacturer MTBF when he worked the figures through.
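For anyone wanting to run the same sanity check, here is a rough sketch of how an MTBF figure translates into expected failures per year. It assumes a constant failure rate (the usual exponential model), and the MTBF and fleet size in the example are made-up illustrative numbers, not figures from any datasheet.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Chance that a single drive fails within one year of continuous
    operation, assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures(fleet_size: int, mtbf_hours: float, years: float = 1.0) -> float:
    """Expected number of failures in a fleet where dead drives get replaced."""
    return fleet_size * years * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 1_000_000  # hypothetical MTBF in hours
    fleet = 500       # hypothetical number of drives
    print(f"AFR per drive: {annualized_failure_rate(mtbf):.2%}")
    print(f"Expected failures per year: {expected_failures(fleet, mtbf):.1f}")
```

Comparing that expected count against the failures actually logged is roughly the exercise described above.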
My workplace uses a lot of Dell Latitudes, and it’s honestly surprisingly common for them to ship with hardware issues such as a bad motherboard, a bad sound device or speakers, a bad webcam, etc. Moreover, the number of issues that crop up over the course of one year of usage is crazy.
I’ve come to accept that hardware issues are just part of any manufactured item, and I understand the importance of a good warranty (and support for that warranty) on everything, because it means a company values a product that lasts. Unfortunately, components fail in so many ways that it’s hard to attribute a failure to anything in particular, but it happens, and often it’s not user error.
Please open a support ticket so we can drill down on this a bit further.
@Matt_Hartley I did, in the meantime. But sadly, the SSD was not sourced from Framework (weird rules between my institution and me), so the ticket was closed as a matter of policy. I was hoping that the case could provide your engineers with useful insight.
Appreciate you sharing this with us. I will keep an eye on my own SN850, which has been humming right along thus far. If we spot a pattern, we’ll track it and pass it along for sure.
This happened to me today. Also an 11th-gen Framework 13, from March 2022; mine has seen very heavy use, powered on pretty much continuously. The exact model of the SSD is WDS100T1X0E-00AFY0. I placed the SSD in a USB enclosure and attached it to another host, and it doesn’t look so good:
usb 3-3: new high-speed USB device number 7 using xhci_hcd
usb 3-3: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=20.01
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: RTL9210B-CG
usb 3-3: Manufacturer: Realtek
usb 3-3: SerialNumber: 012345678909
usb-storage 3-3:1.0: USB Mass Storage device detected
scsi host2: usb-storage 3-3:1.0
scsi 2:0:0:0: Direct-Access Realtek RTL9210 1.00 PQ: 0 ANSI: 6
sd 2:0:0:0: Attached scsi generic sg0 type 0
sd 2:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
sd 2:0:0:0: [sda] Sense Key : Illegal Request [current]
sd 2:0:0:0: [sda] Add. Sense: Invalid command operation code
sd 2:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
sd 2:0:0:0: [sda] 0-byte physical blocks
sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
sd 2:0:0:0: [sda] Asking for cache data failed
sd 2:0:0:0: [sda] Assuming drive cache: write through
sd 2:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
sd 2:0:0:0: [sda] Sense Key : Illegal Request [current]
sd 2:0:0:0: [sda] Add. Sense: Invalid command operation code
A bug in WD Dashboard is keeping me from launching it, so until they get back to me, I won’t know if it’s able to see the drive and maybe update its firmware.
You have my sincere sympathy.
Did yours die after going to sleep too? Mine says MDL: WDS100T1X0E-00AFY0, same as yours. And I didn’t mention that I also use my Framework heavily.
I didn’t put mine in an enclosure, but this is what the kernel says when it is in the M.2 slot:
nvme 0000:01:00.0: platform quirk: setting simple suspend
nvme nvme0: pci function 0000:01:00.0
nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0
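The `CSTS=0x0` in that last line is the NVMe Controller Status register. Below is a minimal sketch of decoding it, based on the bit layout in the NVMe base specification as I understand it. `decode_csts` is just a name made up for illustration, not a kernel or nvme-cli function.

```python
def decode_csts(csts: int) -> dict:
    """Split an NVMe Controller Status (CSTS) value into its fields."""
    shutdown_states = {
        0: "normal operation",
        1: "shutdown in progress",
        2: "shutdown complete",
        3: "reserved",
    }
    return {
        "ready": bool(csts & 0x01),                         # bit 0: RDY
        "controller_fatal_status": bool(csts & 0x02),       # bit 1: CFS
        "shutdown_status": shutdown_states[(csts >> 2) & 0x03],  # bits 3:2: SHST
        "nvm_subsystem_reset_occurred": bool(csts & 0x10),  # bit 4: NSSRO
        "processing_paused": bool(csts & 0x20),             # bit 5: PP
    }


if __name__ == "__main__":
    for field, value in decode_csts(0x0).items():
        print(f"{field}: {value}")
```

Read that way, `0x0` would mean the controller never raised its ready bit and did not flag a fatal error either, which fits a drive that simply stops answering.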
Yes, I shut my Framework down for about 36 hours, and then it was dead at next startup.
So - take this with a grain of salt, but this post has got me looking into whether I have got DEALLOCATE patched through correctly [nope, I forgot to pass my initramfs a kernel flag to do that when it mounted the encrypted root partition – thanks for making me think to double check!]. My anecdote is that I had not one, but two SSDs fail on me in short order (~6 months from purchase), which I eventually tracked down to their firmwares having A Bad Day because I was using full-disk encryption without TRIM passthrough. As soon as I got TRIM going through correctly, both those drives sprang back to life, and haven’t given me a single lick of trouble since. That was SATA, this is NVMe, apples, oranges, but it may still be worthwhile to make sure your filesystem supports DEALLOCATE (/TRIM/unmap), and that if you’ve got any intervening layers (LVM, cryptsetup), that they’re also passing the appropriate command through.
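A quick way to check every layer at once is a small sketch like the one below. It reads `queue/discard_max_bytes` for each entry under `/sys/block` on Linux; a value of 0 means that device does not accept discards. The function name and output format are my own, and a dm-crypt or LVM device showing 0 on top of an NVMe device showing a non-zero value would suggest the command is being dropped at that layer.

```python
from pathlib import Path


def discard_support(sys_block: str = "/sys/block") -> dict:
    """Map each block device name to True if the kernel advertises
    discard (TRIM/DEALLOCATE/unmap) support for it, else False."""
    root = Path(sys_block)
    if not root.is_dir():
        return {}
    result = {}
    for dev in sorted(root.iterdir()):
        attr = dev / "queue" / "discard_max_bytes"
        try:
            result[dev.name] = int(attr.read_text().strip()) > 0
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this device.
            continue
    return result


if __name__ == "__main__":
    for name, supported in discard_support().items():
        state = "supported" if supported else "NOT supported"
        print(f"{name:12} discard {state}")
```

This only shows what the kernel advertises per device; whether the filesystem actually issues discards (mount option or a periodic `fstrim`) is a separate question.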
As a second aside, the 1TB SN850X that I got (also not Framework sourced) defaulted to 512-byte sectors. I had to use the nvme tool (per Switching your NVME ssd to 4k - Bjonnh.net) to manually convert my drive to 4k blocks before I did my OS install. Having the block size be 4k may or may not have any long-term benefits as far as wear goes, but given that flash is frequently garbage collected in 4k (or larger) chunks, it makes me sleep a little bit better at night having mine in 4k mode. You can also use the nvme tool to dump SMART data and other stats/logs from the drive if it’s at least enumerating to a /dev/nvmeXnX device – perhaps you can glean more about the specific failure from there [derp… looks like you already did that!].
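To see which block size a namespace is currently using, the sketch below works on the parsed JSON from something like `nvme id-ns /dev/nvme0n1 -o json`. The key names (`flbas`, `lbafs`, `ds`) are what I believe nvme-cli emits, so treat them as an assumption and check against your own output; the sample dictionary is invented for illustration.

```python
def active_block_size(id_ns: dict) -> int:
    """Return the logical block size in bytes of the LBA format in use.

    The low four bits of FLBAS index into the LBA format list, and each
    format's 'ds' field is the log2 of its data size.
    """
    index = id_ns["flbas"] & 0x0F
    return 1 << id_ns["lbafs"][index]["ds"]


if __name__ == "__main__":
    # Hypothetical namespace offering 512-byte and 4096-byte formats,
    # currently formatted with the second one.
    sample = {
        "flbas": 1,
        "lbafs": [
            {"ms": 0, "ds": 9, "rp": 2},
            {"ms": 0, "ds": 12, "rp": 1},
        ],
    }
    print(active_block_size(sample), "bytes per logical block")
```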
FWIW it had discard configured end-to-end (you can make it stick in the LUKS v2 header, too). And it was using 4k blocks.
No SMART (or anything) now, but I do have about 6mo of daily smart stats logged. Other than an unclean shutdown every 25 power cycles on average, nothing suspicious.
I got the dreaded failure today. Exact same make and model and batch.
1TB WD Black SN850 - WDS100T1X0E-00AFY0
Mine was ordered as part of my DIY components.
Is there a wider ticket open with Western Digital for these issues, for some form of RMA?
@Matt_Hartley I would like to throw my hat in the ring here as well. Same failure. Drive is dead, tested via usb to m.2 adapter.
500GB WD Black SN850 - WDS500G1X0E-00AFY0
Looks like the same batch as others. I also have the same question as @Adam_Sproul; it feels unacceptable for a drive to die after so little use. I have yet to have any SSD fail this suddenly on me.
Okay folks, now we need to identify common factors so we can see if this is a bad batch or something else. Please use the following template. Please do not include tons of other details; we need to keep this spreadsheet friendly:
How is the SN850 attached: Internally or externally/USB? Do NOT include non-SN850 examples as this is not the same.
Was LUKS in use: Yes/No
Died after sleep or died on reboot/cold boot: Answer here as after sleep or reboot/cold boot.
Was the drive purchased from us? If so, which date(s): Yes/No, date if applicable.
Which Linux distro and kernel used in this instance: Please provide your distro and kernel version here.
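For anyone collating the replies, here is a rough sketch of turning filled-in templates into spreadsheet rows. It assumes each answer sits on one line in the form `question: answer`; `parse_report` and `to_csv` are names I made up, and this is not Framework's actual tooling.

```python
import csv
import io


def parse_report(text: str) -> dict:
    """Turn one filled-in template into a {question: answer} mapping."""
    row = {}
    for line in text.splitlines():
        question, sep, answer = line.partition(": ")
        if sep:  # ignore lines that are not in 'question: answer' form
            row[question.strip()] = answer.strip()
    return row


def to_csv(reports: list) -> str:
    """Render the parsed reports as CSV, one row per report."""
    fields = []
    for report in reports:
        for key in report:
            if key not in fields:
                fields.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reports)
    return buf.getvalue()


if __name__ == "__main__":
    example = "Was LUKS in use: Yes\nDied after sleep or died on reboot/cold boot: After sleep"
    print(to_csv([parse_report(example)]))
```

Replies that shorten or reword the questions end up in their own columns, which is one more reason to stick to the template wording.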
How is the SN850 attached: Internally.
Was LUKS in use: Yes.
Died after sleep: Probably after cold boot [1].
Was the drive purchased from us: No.
Which Linux distro and kernel: Arch, 6.5.9.arch2-1.
[1] — Unclear. The system entered s2idle, but was set up to wake up 60 min later, hibernate, and shut down. I found it powered off. I think that if the SSD died on resume from suspend, the kernel would either continue running, or panic, reboot after 120 seconds, and get stuck in UEFI. I think that only successfully resuming, hibernating, and powering off would leave it in the state I found it in, which is fully off.
How is the SN850 attached: Internally
Was LUKS in use: Yes
Died after sleep or died on reboot/cold boot: Probably cold boot
Was the drive purchased from us? If so, which date(s): Yes, 2022-03-15
Which Linux distro and kernel used in this instance: Qubes OS 4.1.x, Linux kernel 5.10 series
How is the SN850 attached: Internally
Was LUKS in use: No
Died after sleep or died on reboot/cold boot: After reboot
Was the drive purchased from us? If so, which date(s): No, 2022-04-08
Which Linux distro and kernel used in this instance: Ubuntu 22.04.4, kernel ??
I think I saw the exact same failure today. Same drive and fw gen. Running Ubuntu and the drive died after a restart. Fwiw I’ve been trying to update the fw BIOS over the last couple of days. I submitted a support request.
I just found this thread as I opened up my laptop to start work today and the drive was completely dead. I booted off a live USB and the drive doesn’t even show up in lsblk or nvme list. Nothing in dmesg. I put it in an M.2 USB-C enclosure and hooked it up to another machine and I don’t even see anything in dmesg, so this seems completely dead.
Attached: Internal M.2
LUKS: Is this the default full disk encryption with Fedora? If so, yes.
Died: After waking from sleep
Purchased from Framework in October 2011 as part of my DIY kit
Linux: Fedora 39, whatever the most recent kernel was for that. It was up to date.
WDS500G1XHE-00AFY0
DOM: 3-May-2021
Had this happen to me yesterday as well. All very similar, except that it seems to have taken the storage controller on the mainboard with it (12th-gen Intel i7-1280P). I can no longer boot to or properly see any internally attached M.2 drive. When booting from USB everything seems to work fine. Not sure if anyone else experienced this and managed to resolve it?
Support ticket is in. Loved the laptop otherwise so hopefully can get this resolved in a meaningful way.
How is the SN850 attached: Internally
Was LUKS in use: Yes
Died after sleep or died on reboot/cold boot: After sleep
Was the drive purchased from us? If so, which date(s): Yes, order date was 28 Jan 2022, 500GB WD Black SN850 - WDS500G1X0E-00AFY0
Which Linux distro and kernel used in this instance: Pop!_OS, 6.9.3-76060903-generic
I suspect you will find the M.2 module is OK, seeing as you can’t use any M.2 module. I’d wait until FW service resolves the problem before trashing your M.2 module.