[RESPONDED] After forced reboot, NVMe SN850 SSD no longer shows up in UEFI

Details:

  • Ubuntu
  • 22.04
  • i5 11th gen
  • WD_Black SN850 500GB

Everything had been working just fine. While the laptop should have been on standby, the fan suddenly started spinning at full speed. The screen stayed off and the machine was completely unresponsive, so I held the power button until it rebooted. When it did, I got a message saying there was no boot device, so I booted from a USB drive, but the SSD didn’t show up. Then I checked in the UEFI setup, and the only drive listed was the USB one. I also tried booting a Windows 10 install image, but that crashed. I tried disabling Secure Boot, but nothing changed. The drive is encrypted with ZFS.

While I’d prefer not to, I’m OK with a solution that involves wiping the drive. If I can just get it to show up in UEFI again, I can probably take it from there.

What did you boot from the USB drive? Another Ubuntu 22.04 live image?

Is there anything NVMe-related in dmesg? (sudo dmesg | grep -i NVME)

Sorry, I should have said: it was the Ubuntu 22.04 image I installed from.
Output from (sudo dmesg | grep -i NVME):

[    2.334525] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    2.334568] nvme nvme0: pci function 0000:01:00.0
[    2.384153] nvme nvme0: Removing after probe failure status: -19
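For anyone reading along: the -19 in that last line is a negated Linux errno, and 19 is ENODEV ("No such device"), meaning the driver found the PCI function but the controller never answered as a usable device. A minimal sketch for decoding such lines follows; the function name decode_nvme_probe_failure is made up for illustration, and only a few common errno values are mapped.

```shell
# Hypothetical helper: reads dmesg text on stdin and prints one line per
# "Removing after probe failure" message, with the errno translated.
# Only a handful of Linux errno values are mapped; others print as numbers.
decode_nvme_probe_failure() {
    # Extract "<device> <errno>" from each matching line.
    sed -n 's/.*nvme \(nvme[0-9]*\): Removing after probe failure status: -\([0-9]*\).*/\1 \2/p' |
    while read -r dev code; do
        case "$code" in
            5)  name="EIO (I/O error)" ;;
            12) name="ENOMEM (out of memory)" ;;
            16) name="EBUSY (device busy)" ;;
            19) name="ENODEV (no such device)" ;;
            *)  name="errno $code" ;;
        esac
        printf '%s: %s\n' "$dev" "$name"
    done
}
```

After defining the function in a shell, it could be used as (sudo dmesg | decode_nvme_probe_failure).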

Did you previously use any boot parameters for your NVMe drive, like nvme.noacpi=<VALUE>, pcie_aspm=<VALUE> or nvme_core.default_ps_max_latency_us=<VALUE>?

ClearLinux had a similar issue.

Also, which kernel were you running when the drive was still detected? Is it possible that you did an update/upgrade before suspending, so that it would try to load a newer kernel which was faulty?

I never changed or added any boot parameters from the defaults. It might be possible that there was an update, but shouldn’t the drive show up in firmware at least? Specifically, I press F2 while booting, navigate to Boot>EFI Boot Order, and the only option listed is the USB thumb drive. My knowledge of EFI is rather surface level. I grew up with BIOS, so maybe I’m thinking about this wrong. In any event, I don’t remember my kernel version number offhand, and I can’t check because I can’t access the drive. I know that the one on my install media worked (5.15.0-43), but that same install media can’t see the drive now. It doesn’t show up with (sudo parted -l) or in “Disks”.

Can you try to boot into your install media with nvme_core.default_ps_max_latency_us=0 and nvme.noacpi=1?

The drive won’t show up anywhere for now: according to the kernel log, the device was removed from the list of known devices because the probe failed. The boot args mentioned above could help with this.

Another reason you’re not seeing the drive in the boot order could be that its EFI boot entry has been deleted. I’ve had this happen before.
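In case it helps: on the live USB, the parameters can be added for one boot by pressing e at the GRUB menu, appending them to the line that starts with linux, and booting with Ctrl+X or F10. To confirm they actually took effect, the running kernel’s command line can be checked. The sketch below assumes a POSIX shell; has_kernel_param is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeeds if the exact token NAME=VALUE appears on the
# kernel command line. Reads /proc/cmdline unless another file is given.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each space-separated token on its own line, then match whole lines.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example: (has_kernel_param nvme.noacpi=1 && echo "active"). Matching the whole token means nvme.noacpi=1 will not be confused with, say, nvme.noacpi=10.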

(sudo dmesg | grep -i NVME) now gives

[    0.000000] Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash nvme_core.default_ps_max_latency_us=0 nvme.noacpi=1 ---
[    0.056139] Kernel command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash nvme_core.default_ps_max_latency_us=0 nvme.noacpi=1 ---
[    1.589568] nvme nvme0: pci function 0000:01:00.0
[    1.636533] nvme nvme0: Removing after probe failure status: -19

That doesn’t look good.

Can you write a live ISO of a distro that ships a recent kernel, something like 6.1, and see if the issue persists with the same boot parameters?

This would be my last try before suggesting to move the troubleshooting to the Framework Support team.

I will try that. Do you have a distro to suggest?
Edit: no luck, I’ve submitted a ticket to Framework Support. @Anachron, thank you for the help.