Input/Output errors although SSD is fine

Distro: KDE Neon
Release: 22.04
Kernel: 6.9.1
BIOS: 3.03
Specs: Ryzen 7800, 7700S

I have a problem that only recently emerged: sometimes my NVMe SSD just becomes inaccessible.
I can't really troubleshoot it, because once this state is reached, I can't use a terminal anymore.

I did a quick search online, but most people just say it's either the SSD dying (I checked the SMART values and they seemed fine, and e2fsck reported no problems either) or a firmware problem (there is no new firmware for my Lexar NM710).

Does anyone know how I could troubleshoot this problem?

Some more things I know:

  • Nextcloud Desktop has already died by the time I notice it.
  • Most of the time I only notice it when trying to save a file or trying to use a terminal.
  • Kernel 6.9.1 worked without problems for a month; the problem started a week or two ago.
  • I was running a tuxedo-neon-ubuntu Frankenstein distro until this Monday (I had the sources all mixed up), but that had already been the case for two months.
  • I can still power off with SysRq, although filesystem syncing (s) and rebooting (b) aren't working.
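
One way to look for clues after a forced power-off is to filter the kernel log of the previous boot for storage-related messages. Here is a minimal Python sketch of that idea, not a definitive tool: the keyword list and the function name `find_storage_errors` are assumptions made up for illustration, and the log may simply be missing if the journal could not be written once the drive became inaccessible.

```python
import re

# Hypothetical keyword list: substrings that tend to appear in kernel
# messages about storage trouble. This is an assumption, not an official
# or exhaustive list, and "nvme" will also match harmless lines.
KEYWORDS = ("nvme", "i/o error", "ext4-fs error", "remounting filesystem read-only")

# One case-insensitive pattern that matches any of the keywords literally.
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def find_storage_errors(log_text: str) -> list[str]:
    """Return the lines of log_text that contain any keyword."""
    return [line for line in log_text.splitlines() if _PATTERN.search(line)]
```

The input text could come from `journalctl -k -b -1` (kernel messages of the previous boot), saved to a file and passed to the function as a string.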

Don’t trust that SSD. It might still be good, but in my experience, the symptoms that you’re reporting are a prelude to worse things to come. Maybe something gradual, but maybe a sudden and full no-workee-anymore. Keep your backups up-to-date, and I’d suggest using a filesystem that checksums its data so it can verify the integrity of every file, like Btrfs or ZFS, if at all possible.

With that out of the way: sorry, I don’t know of any way to troubleshoot something like this that happens at random, and that you can’t do anything with once it does.

OK, I contacted customer support right away and will probably move to ZFS soon, as I had already planned that.

I just recently had to replace a failing HDD in a RAID 5, despite the fact that all of the S.M.A.R.T. values were passing OK. I ran a more in-depth device test, and both it and another drive in the array failed gloriously. I replaced all drives in the array (given that they were all the same model, bought at the same time, etc.) and was able to avoid losing any data. I know it’s not an SSD, but the lesson may still apply: the S.M.A.R.T. failure heuristics are not the be-all and end-all of device diagnostics.

I have also had to replace an SSD in another RAID that did begin to trip the S.M.A.R.T. detectors, even though it was operating perfectly fine as far as I could tell. So I’ve seen the flip side of the coin as well.

The smartctl long test is what I ran; perhaps running it could help you diagnose your issue as well. Luckily, in that situation it was all a business expense, so if a drive merely looked like it might be failing, spending a couple hundred bucks was a no-brainer compared to the downtime and loss a failure could cost.
https://www.cyberciti.biz/tips/linux-find-out-if-harddisk-failing.html
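
For reference, the long test boils down to two smartctl calls: `-t long` to start the test and `-l selftest` to read the results later. The Python sketch below only builds and runs those command lines. The helper names are made up for illustration, the device path is a placeholder, and whether `-t long` works on an NVMe drive depends on the smartmontools version, so treat that part as an assumption.

```python
import subprocess


def long_test_command(device: str) -> list[str]:
    """Command that starts an extended (long) SMART self-test on device."""
    return ["smartctl", "-t", "long", device]


def selftest_log_command(device: str) -> list[str]:
    """Command that prints the self-test log once the test has finished."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd: list[str]) -> str:
    """Run a command (normally as root) and return its standard output.

    check=False because smartctl uses its exit status as a bitmask, so a
    non-zero status does not necessarily mean the call itself failed.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Something like `run(long_test_command("/dev/sda"))` would start the test, which then runs inside the drive; `run(selftest_log_command("/dev/sda"))` shows the outcome afterwards.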

Oh, that's interesting, thanks for the extensive answer!

Today I received a new SSD to replace the current one, and I will move to KDE Neon on ZFS. Once my current SSD has been replaced, I will have two SSDs and be able to use a ZFS mirror.

I highly recommend the SN850 if you are moving to ZFS.

Some NVMe drives love to drop off the PCIe bus when ZFS is involved; see the openzfs/zfs GitHub discussion #14793, "Unsuitable SSD/NVMe hardware for ZFS - WD BLACK SN770 and others".

And S.M.A.R.T. never reported anything for me before a drive failed :frowning:

Oh crap, I only had enough money lying around for an SN770, because I need a short SSD with 2TB and those are quite expensive :confused:

But it seems like there are no 2230 NVMe SSDs with a DRAM buffer, and the missing DRAM seemed to be the problem.

Yeah, I know how it feels. There are reports that the 770 could work, but I personally have no idea: https://www.reddit.com/r/zfs/comments/1ei46zo/comment/lg43ip1/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button

I don’t know if it’s the missing DRAM or just a firmware bug.

I personally decided to wait for the expansion bay before getting my Framework.

After digging a bit into this topic, I will try the 770 nevertheless and will definitely use an SSD from another manufacturer as its counterpart in the large slot.
I will report back in a month or two, unless the 770 dies on me the first day xD

Good luck bro, and I am sorry for relaying the bad news.

I hope the 770 will survive and serve you reliably for a long time.

And reporting back will be extremely appreciated.

Just remember when ordering that you need a 770M for the 2230 slot. If you order a plain 770, it is 2280 length.

Yes, I know, thanks :slight_smile:

I have it installed now. I struggled a lot with installing 24.04 with ZFS, but it seems to mostly work now.
I just need to figure out how to set up swap encryption so I can use it for hibernation.