Sudden loss of storage while laptop running

Hello,

(AMD batch 7)

My laptop suddenly went to a black screen and then displayed this message.

I opened the laptop, but the storage drive was properly seated.

Anyway, I removed the screw, reseated the drive, and the laptop was able to boot again. However, I'm a bit worried about a hardware failure in either the laptop or the drive.

Was the NVMe drive loose at all? Reseating it was a good idea. If this happens again, I would open a ticket with support, as you might have a hardware issue.

Hi @2disbetter

No, it was not loose. The screw was properly tightened, so I assume the drive could not have moved.

Ok, well hopefully it was just a slight glitch. Now that you have reseated the drive, and the connection is good and solid, we’ll see if it holds. It should. If it does not, contact support please. They’ll be able to remedy the problem at that point.

I would also check the SMART log. Under Linux, you can use “smartctl --all /dev/nvme0” or, with nvme-cli, “sudo nvme smart-log /dev/nvme0”.

Check the device names with “lsblk” or “nvme list”.
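To script that check, here is a minimal Python sketch. It assumes smartmontools 7.0 or newer (for `--json` output) and that the NVMe health fields appear under a `nvme_smart_health_information_log` key with names such as `media_errors` and `num_err_log_entries`. Those key names are from my recollection of smartctl's JSON output, so compare them with what your version actually prints. `read_smart_json` and `extract_health` are hypothetical helper names.

```python
import json
import subprocess

# Assumed smartctl JSON field names for NVMe drives; verify against your version.
HEALTH_KEYS = (
    "critical_warning",
    "available_spare",
    "available_spare_threshold",
    "percentage_used",
    "power_cycles",
    "power_on_hours",
    "unsafe_shutdowns",
    "media_errors",
    "num_err_log_entries",
)


def read_smart_json(device: str) -> dict:
    """Run smartctl with JSON output (needs root) and return the parsed report.

    smartctl uses non-zero exit bits for some warnings, so the exit code is
    not treated as fatal here; only the JSON on stdout is required.
    """
    result = subprocess.run(
        ["smartctl", "--json", "--all", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def extract_health(report: dict) -> dict:
    """Pick the NVMe health fields of interest; missing fields become None."""
    log = report.get("nvme_smart_health_information_log", {})
    return {key: log.get(key) for key in HEALTH_KEYS}
```

Usage, as root: `print(extract_health(read_smart_json("/dev/nvme0")))`, using whatever device name “nvme list” reports.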

I just had a Samsung 980 NVMe disk fail on my server because of critical media errors that happened to be at the “beginning” of the disk. They invalidated all existing boot blocks, so the system no longer recognized the disk.
Usually, NVMe SSDs remap bad blocks transparently when possible (that is, when the block is not in use). But if bad blocks start appearing on a fairly new disk, this can turn into a fatal issue fast.

Make sure there are no Media and Data Integrity Errors showing up in your log.
See this one as an example:

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    97%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    28,856,115 [14.7 TB]
Data Units Written:                 49,244,732 [25.2 TB]
Host Read Commands:                 268,782,702
Host Write Commands:                672,738,666
Controller Busy Time:               2,586
Power Cycles:                       106
Power On Hours:                     5,141
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    38   <=== This
Error Information Log Entries:      38   <=== This shows the existing log entries
Warning  Comp. Temperature Time:    226
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               35 Celsius
Temperature Sensor 2:               39 Celsius
Thermal Temp. 2 Transition Count:   59326
Thermal Temp. 2 Total Time:         18032
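If you only have pasted text output like the log above, a rough Python sketch can pull out the numeric fields and flag the values discussed in this thread: any Media and Data Integrity Errors, a non-zero Critical Warning, or Available Spare below its threshold. The parsing rule (label, colon, leading number) is an assumption based on the smartctl layout shown here, and `parse_health_text` / `health_warnings` are hypothetical names.

```python
import re

# A "Label:   value" line as printed in smartctl's NVMe SMART/Health section.
LINE_RE = re.compile(r"^(?P<label>[A-Za-z][^:]*?):\s+(?P<value>\S.*)$")
# Leading hex (e.g. 0x00) or decimal with thousands separators (e.g. 28,856,115).
NUMBER_RE = re.compile(r"0x[0-9A-Fa-f]+|[\d,]+")


def parse_health_text(text: str) -> dict:
    """Map each label to the integer at the start of its value.

    Lines without a colon, or whose value does not start with a number
    (e.g. "PASSED"), are skipped. Units such as "%" or "Celsius" are dropped.
    """
    fields = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        number = NUMBER_RE.match(match.group("value"))
        if not number:
            continue
        token = number.group(0)
        value = int(token, 16) if token.startswith("0x") else int(token.replace(",", ""))
        fields[match.group("label").strip()] = value
    return fields


def health_warnings(fields: dict) -> list:
    """Return human-readable warnings for the checks discussed in the thread."""
    warnings = []
    if fields.get("Critical Warning", 0):
        warnings.append("critical warning flag is set")
    errors = fields.get("Media and Data Integrity Errors", 0)
    if errors > 0:
        warnings.append(f"{errors} media and data integrity error(s) logged")
    spare = fields.get("Available Spare")
    threshold = fields.get("Available Spare Threshold")
    if spare is not None and threshold is not None and spare < threshold:
        warnings.append("available spare is below its threshold")
    return warnings
```

For the log above, this should report the 38 media and data integrity errors; for a log with zero errors and plenty of spare, it returns an empty list.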

According to a Samsung engineer, bad blocks that can be remapped should not show up as media and data integrity errors. If such errors do show up, something is wrong with the silicon and the drive needs replacing.
I got a replacement disk from Samsung in 5 days.

If this is on Linux, a few more details about the distro you are using, etc., would be helpful for this thread.

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    250,961 [128 GB]
Data Units Written:                 619,864 [317 GB]
Host Read Commands:                 1,826,774
Host Write Commands:                7,036,369
Controller Busy Time:               14
Power Cycles:                       185
Power On Hours:                     4
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

I use Fedora 39.

If I were you, I’d clean the contacts with electronics cleaner or just alcohol.
I’ve actually run into issues like this with PCIe cards.
And I see too many people putting their fingers on these little pads.

Power-on hours are very low, but power cycles are quite high. All spares are still available.
You could also run a self-test (short and extended), but IMHO that will not show us anything new. Just make sure the contacts are clean, as @Richard_Lee1 mentioned, and see how it goes.
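For the self-test suggestion, here is a small Python sketch built on nvme-cli's `device-self-test` subcommand. It assumes the `--self-test-code` option, where 1 starts a short test and 2 an extended one (the codes defined for the NVMe Device Self-test command). `self_test_command` and `start_self_test` are hypothetical names, and it is worth checking `nvme device-self-test --help` on your system before relying on it.

```python
import subprocess

# NVMe Device Self-test codes: 1 = short, 2 = extended ("long" kept as an alias).
SELF_TEST_CODES = {"short": 1, "extended": 2, "long": 2}


def self_test_command(device: str, kind: str = "short") -> list:
    """Build the nvme-cli argument list that starts a device self-test."""
    if kind not in SELF_TEST_CODES:
        raise ValueError(f"unknown self-test kind: {kind!r}")
    return ["nvme", "device-self-test", device, f"--self-test-code={SELF_TEST_CODES[kind]}"]


def start_self_test(device: str, kind: str = "short") -> None:
    """Start the self-test; must be run as root."""
    subprocess.run(self_test_command(device, kind), check=True)
```

After starting a test, `nvme self-test-log /dev/nvme0` shows its progress and result.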