Anyway, I removed the screw, plugged the storage back in, and the laptop was able to boot again. However, I’m a bit worried about a hardware failure in either the laptop or the storage.
Was the NVMe drive loose at all? Reseating it was a good idea. If this does happen again, I would open a ticket with support, as you might have a hardware issue.
Ok, well hopefully it was just a slight glitch. Now that you have reseated the drive, and the connection is good and solid, we’ll see if it holds. It should. If it does not, contact support please. They’ll be able to remedy the problem at that point.
I would also check the SMART log. Under Linux, you can use “smartctl --all /dev/nvme0” or the nvme tools: “sudo nvme smart-log /dev/nvme0”.
Check the device names with “lsblk” or “nvme list”.
I just had a Samsung 980 NVMe disk fail on my server because of critical media errors that happened to be at the “beginning” of the disk. They invalidated all existing boot blocks, so the system no longer recognized the disk.
Usually, NVMe SSDs will remap a bad block transparently if possible (that is, when the block is not in use). But if bad blocks show up on a fairly new disk, this can turn fatal fast.
Make sure there are no Media and Data Integrity Errors showing up in your log.
See this one as an example:
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 97%
Available Spare Threshold: 10%
Percentage Used: 4%
Data Units Read: 28,856,115 [14.7 TB]
Data Units Written: 49,244,732 [25.2 TB]
Host Read Commands: 268,782,702
Host Write Commands: 672,738,666
Controller Busy Time: 2,586
Power Cycles: 106
Power On Hours: 5,141
Unsafe Shutdowns: 17
Media and Data Integrity Errors: 38 <=== This
Error Information Log Entries: 38 <=== This shows the existing log entries
Warning Comp. Temperature Time: 226
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 35 Celsius
Temperature Sensor 2: 39 Celsius
Thermal Temp. 2 Transition Count: 59326
Thermal Temp. 2 Total Time: 18032
According to a Samsung engineer, blocks that can be remapped should not show up as media and data integrity errors. If that counter is non-zero, something is wrong with the silicon and the drive needs replacing.
I got a replacement disk from Samsung in 5 days.
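For anyone who wants to automate the check described above, here is a minimal Python sketch. It assumes the text layout that “smartctl --all” printed in the example log; the function names (parse_smartctl_health, leading_int, health_problems) are made up for illustration and are not part of smartmontools. The two samples are shortened copies of logs posted in this thread.

```python
import re

# Shortened copy of the failing Samsung 980 log above (annotations removed).
FAILING_SAMPLE = """\
Critical Warning: 0x00
Available Spare: 97%
Available Spare Threshold: 10%
Data Units Read: 28,856,115 [14.7 TB]
Media and Data Integrity Errors: 38
Error Information Log Entries: 38
"""

# Shortened copy of a healthy log from this thread.
HEALTHY_SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 10%
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
"""


def parse_smartctl_health(text):
    """Collect 'Name: value' lines into a dict of raw strings."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            fields[name.strip()] = value.strip()
    return fields


def leading_int(value):
    """Return the first number in values like '28,856,115 [14.7 TB]', '97%' or '0x00'."""
    match = re.match(r"0x[0-9a-fA-F]+|\d[\d,]*", value)
    if match is None:
        raise ValueError(f"no number in {value!r}")
    token = match.group(0)
    if token.startswith("0x"):
        return int(token, 16)
    return int(token.replace(",", ""))


def health_problems(fields):
    """Return a list of human-readable warnings; an empty list means nothing was flagged."""
    problems = []
    if leading_int(fields["Critical Warning"]) != 0:
        problems.append("critical warning flag is set")
    media_errors = leading_int(fields["Media and Data Integrity Errors"])
    if media_errors > 0:
        problems.append(f"{media_errors} media and data integrity errors")
    spare = leading_int(fields["Available Spare"])
    threshold = leading_int(fields["Available Spare Threshold"])
    if spare < threshold:
        problems.append("available spare is below its threshold")
    return problems


print(health_problems(parse_smartctl_health(FAILING_SAMPLE)))
print(health_problems(parse_smartctl_health(HEALTHY_SAMPLE)))
```

To use it on a real drive you would feed it the saved output of “smartctl --all” for your device; the field names may differ between smartmontools versions, so treat the keys above as an assumption.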
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 31 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 250,961 [128 GB]
Data Units Written: 619,864 [317 GB]
Host Read Commands: 1,826,774
Host Write Commands: 7,036,369
Controller Busy Time: 14
Power Cycles: 185
Power On Hours: 4
Unsafe Shutdowns: 2
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged
If I were you, I’d clean the contacts with electronics cleaner or just alcohol.
I’ve actually run into issues like this with PCIe cards.
And I see too many people putting their fingers on these little pads.
Power-on hours are very low, but power cycles are quite high. All spares are still available.
You could also initiate a self-test (short and extended). But IMHO that will not show us anything new. Just make sure the contacts are clean as @Richard_Lee1 mentioned and see how it goes.
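To put a number on the “low hours, high power cycles” observation, here is a small Python sketch that divides one counter by the other. The helper name cycles_per_hour is made up for illustration. Keep in mind that, as far as I understand the NVMe spec, power-on hours may not include time spent in low-power states, so this is only a rough indicator.

```python
def cycles_per_hour(power_cycles, power_on_hours):
    """Rough ratio of power cycles to power-on hours, both taken from the SMART log."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return power_cycles / power_on_hours


# Laptop drive from the log above: 185 power cycles in 4 power-on hours.
print(cycles_per_hour(185, 4))
# Server drive from the earlier log: 106 power cycles in 5,141 power-on hours.
print(cycles_per_hour(106, 5141))
```

The laptop drive comes out at roughly 46 cycles per counted hour, against about 0.02 for the server drive. A laptop that sleeps a lot will always score higher than a server, so the comparison says "unusual", not "broken".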
Got this again last night: I was on my laptop and it rebooted.
The storage device was not visible.
I put it this way on my desk, and this morning it boots …
(⎈|minikube:default)➜ ~ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 32 °C (305 K)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
endurance group critical warning summary: 0
Data Units Read : 1396065 (714.79 GB)
Data Units Written : 3658385 (1.87 TB)
host_read_commands : 12529858
host_write_commands : 66415326
controller_busy_time : 197
power_cycles : 379
power_on_hours : 68
unsafe_shutdowns : 17
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
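Since there are now two snapshots in this thread, it can help to look at how the counters moved between them. The Python sketch below assumes both logs come from the same drive, and assumes the “key : value” layout that “nvme smart-log” printed above. The names parse_nvme_smart_log and counter_deltas are made up for illustration, and the earlier values are copied by hand from the smartctl output posted before.

```python
import re

# Values copied from the earlier smartctl output in this thread.
EARLIER = {
    "power_cycles": 185,
    "power_on_hours": 4,
    "unsafe_shutdowns": 2,
    "media_errors": 0,
}

# Relevant lines copied from the nvme smart-log output above.
LATER_LOG = """\
Smart Log for NVME device:nvme0 namespace-id:ffffffff
power_cycles : 379
power_on_hours : 68
unsafe_shutdowns : 17
media_errors : 0
num_err_log_entries : 0
"""


def parse_nvme_smart_log(text):
    """Turn 'key : value' lines into a dict of integers, skipping non-numeric lines."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        match = re.match(r"\s*(\d+)", value)
        if sep and match:
            counters[name.strip()] = int(match.group(1))
    return counters


def counter_deltas(earlier, later):
    """Difference (later - earlier) for every counter present in both snapshots."""
    return {key: later[key] - earlier[key] for key in earlier if key in later}


later = parse_nvme_smart_log(LATER_LOG)
for name, delta in counter_deltas(EARLIER, later).items():
    print(f"{name}: +{delta}")
```

With the numbers from this thread, unsafe shutdowns went up by 15 while media errors stayed at 0, which fits a drive that drops off the bus and not one with failing flash. That is a reading of the counters, not a diagnosis.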
Ok. Go into the BIOS and load the failsafe defaults. Reboot and make sure it boots.
Shut down (really shut down, and wait 35 seconds), then boot into the BIOS again and load the performance defaults.
Also, check whether there are any firmware updates for the drive.
That is all I can advise, other than putting a different drive into it to see whether that one shows the same symptoms.
Not sure where you purchased the drive from, but if it happens again, it may be ticket worthy if the connections between the NVMe and the slot look healthy.