NVMe Drive Enumeration Inconsistencies in Linux on FW16

I’ve noticed that NVMe drive enumeration in Linux can be inconsistent. It seems that as the kernel detects drives, it assigns device names dynamically, leading to variations between boots or installations.

I have four NVMe drives, including those installed in the M.2 adapter. During a reinstall of Ubuntu, I found that the drive I originally installed on was nvme2, but during the reinstall, it showed up as nvme0. I’ve encountered similar behavior with SAN drives on RHEL, where I resolved it by using custom udev rules to enforce consistent naming.

I’m wondering if this variability could be influenced by a BIOS setting I might be missing or if a BIOS update could help address this. Has anyone else experienced this issue, and are there known solutions at the firmware level?

Thank you
Joe

Under Linux, you usually access devices by device UUID.

$ lsblk -o NAME,ID,PARTTYPENAME,UUID
NAME                                          ID                                             PARTTYPENAME     UUID
sda                                           Generic_STORAGE_DEVICE_000000001531-0:0                         
nvme1n1                                       eui.aca32f037500e71e2ee4ac0000000001                            
└─nvme1n1p1                                   eui.aca32f037500e71e2ee4ac0000000001-part1     Linux filesystem 91a0b050-b995-4c07-af3d-d11e3578d9fb
nvme2n1                                       Sabrent_SB-2130-1TB_48797879300891                              
├─nvme2n1p1                                   Sabrent_SB-2130-1TB_48797879300891-part1       Linux filesystem b8ea6424-6007-4d26-86aa-bd1cde84f593
├─nvme2n1p2                                   Sabrent_SB-2130-1TB_48797879300891-part2       Linux filesystem 90fae8a8-7376-4131-bff6-415e8ab0f577
│ └─luks-90fae8a8-7376-4131-bff6-415e8ab0f577 name-luks-90fae8a8-7376-4131-bff6-415e8ab0f577                  cdd8f373-7fbb-4464-a2d8-38486625c08a
└─nvme2n1p3                                   Sabrent_SB-2130-1TB_48797879300891-part3       EFI System       8FFE-D3C4
nvme3n1                                       eui.002538d931a1777e                                            
└─nvme3n1p1                                   eui.002538d931a1777e-part1                     Linux filesystem 49234d38-8558-462f-82ab-d8a8ccd6b849
  └─luks-49234d38-8558-462f-82ab-d8a8ccd6b849 name-luks-49234d38-8558-462f-82ab-d8a8ccd6b849                  33852743-1d1d-4464-80b8-a899ba953228
nvme0n1                                       eui.36363030547918820025385800000001                            
├─nvme0n1p1                                   eui.36363030547918820025385800000001-part1     EFI System       D9CB-5FE2
└─nvme0n1p2                                   eui.36363030547918820025385800000001-part2     Linux filesystem 85d15bea-a969-4c6d-8f75-9d900ae69fe0

And the kernel will not number them the way Windows does, simply adding +1 when you add a new one.
These device UUIDs are unique every time you create a partition + filesystem, and that's what you use when fine-tuning /etc/fstab (UUID) or udev rules (ID).
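For example, an /etc/fstab built from the UUIDs in the listing above might look like this (the mount points and the ext4 filesystem type are just guesses for illustration):

# /etc/fstab -- mount by filesystem UUID, not by /dev/nvmeXnYpZ
UUID=85d15bea-a969-4c6d-8f75-9d900ae69fe0  /          ext4  errors=remount-ro  0  1
UUID=D9CB-5FE2                             /boot/efi  vfat  umask=0077         0  1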

Yes, with a running operating system this is a simple fix. During an installation, it poses a problem.

Long gone are the days of setting drive C, D, etc. in a BIOS for Windows to use. This is why in Linux, partitions are usually mounted by UUID these days: it doesn't matter which drive the kernel finds "first". I'm pretty sure Ubuntu installs that way, but I guess it's possible you changed it for some reason.

Simply put, it's generally considered "wrong" to address the drive device directly in fstab, because the enumeration order is not set in stone between reboots.

Adding custom udev rules as you have done is the correct way to resolve this.
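For anyone curious, a rule along these lines (just a sketch; the serial number and symlink name are made up, check yours with udevadm info or lsblk -o NAME,SERIAL) dropped into /etc/udev/rules.d/ gives you a stable symlink no matter which nvmeX number the kernel hands out:

# /etc/udev/rules.d/99-local-nvme.rules -- sketch only, serial and link name are examples
# Match the namespace block device by the controller's serial and add a stable symlink.
KERNEL=="nvme*n1", SUBSYSTEM=="block", ATTRS{serial}=="EXAMPLE-SERIAL-123", SYMLINK+="nvme-ubuntu"

Then reload with sudo udevadm control --reload and sudo udevadm trigger (or just reboot).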

@Joe_Name
What Linux distro are you using?
For a long time now, most distros have been using UUIDs, so that the kernel command line points to the UUID of the root partition and the installer puts UUIDs in /etc/fstab.
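It's easy to check on a running system; on a typical install the root device shows up on the kernel command line by UUID (this output is only an illustration, your UUID and kernel version will differ):

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-45-generic root=UUID=85d15bea-a969-4c6d-8f75-9d900ae69fe0 ro quiet splash
$ grep -c UUID= /etc/fstab
2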

If you are seeing nvme device names instead, something is wrong with the installer.

Generally I use NixOS. I have a need for some laser software that works a little better in an FHS environment, and wanted to set up a small partition for it. I built a working derivation, but further complications made it clear that a small separate installation would probably be easier. I had messed up two installations of Ubuntu, and that's when I noticed the different drive enumeration. It's something that could be addressed in firmware. Some boot managers (GRUB) default to friendly names instead of UUIDs. I agree that UUID is the way, completely agree and understand. It just surprised me when I found it, and I thought I would share. It also might have been complicated by my having four drives in the system with the M.2 expansion card.

The direct answer is no, the drive enumeration order is not deterministic on most computers running Linux. However, you can usually fix the boot config after the fact even if you have to use a live environment.
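If you do end up having to repair from a live USB, the usual dance is roughly this (a sketch only; substitute whatever your root and EFI partitions are actually called on that boot, and update-grub is the Debian/Ubuntu spelling; the # prompt is the root shell inside the chroot):

$ sudo mount /dev/nvme0n1p2 /mnt             # the installed root, whatever name it got this boot
$ sudo mount /dev/nvme0n1p1 /mnt/boot/efi    # the EFI system partition
$ for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
$ sudo chroot /mnt
# blkid          <- confirm the real UUIDs, then fix /etc/fstab to match
# update-grub    <- regenerate the boot entries with root=UUID set correctly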


In /dev/disk/by-path/ you can refer to disks by their physical slot configuration, which is stable.
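For example (illustrative output; the PCI addresses depend on which slot each drive sits in):

$ ls -l /dev/disk/by-path/ | grep nvme
lrwxrwxrwx 1 root root 13 Jan  1 00:00 pci-0000:01:00.0-nvme-1       -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Jan  1 00:00 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 13 Jan  1 00:00 pci-0000:02:00.0-nvme-1       -> ../../nvme1n1

So pci-0000:01:00.0-nvme-1 will always be the drive in that particular slot, even if it comes up as a different nvmeXn1 on the next boot.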

Do you know WHY it is not deterministic by chance?
Or can point me to some resources?

I just want to get to the bottom of it… out of curiosity.

Probably the EFI BIOS asks the drives "what's up?" and the first one to raise its hand gets slot 0.


The rather incomplete short version (because honestly I don't know myself) is that the hardware decides when to report itself ready to the host computer, that timing isn't deterministic, and the kernel assigns the dev nodes in the order the disks become ready. This Red Hat document backs up this idea, stating:

The major and minor number range and associated sd names are allocated for each device when it is detected. This means that the association between the major and minor number range and associated sd names can change if the order of device detection changes.

The document goes on to give some examples of when devices may not initialize, or may not do so in the expected order. For instance, the same HBA with different disks attached can initialize in a different order simply because of the disks themselves.

It's something that's been talked about since at least as far back as when /dev/hd* was deprecated, IIRC in the late 2000s. This stackexchange answer explains a bit of the background behind the naming in general; specifically, that /dev/sd* used to be deterministic based on the SCSI ID, and /dev/hd* was also in-order based on the drive's physical connection to the PATA bus. (The discussion is mainly around sd nodes, but nvme nodes follow the same rules AFAIK.) My semi-educated guess is that home users, who mostly used hd* until it was deprecated and often had only one disk anyway, didn't realize that the paths could change, and most people got lucky with all their drives initializing in the same order every time. However, because this isn't guaranteed due to the above, relying on that order is generally discouraged.

Ultimately, even if the Framework firmware (whatever part of it is used for system initialization) were deterministic, the non-Framework devices are still just doing whatever they want. Nothing stops one of the SSDs from, say, randomly deciding to wait 10 seconds before responding on boot, causing it to initialize after the other one and thus get a different dev node.
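If you want to see for yourself which physical controller a node landed on after a given boot, the sysfs link for the block device points at the PCI address (illustrative path, yours will differ):

$ readlink -f /sys/block/nvme0n1/device
/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0

Compare that across reboots and you can watch the nvmeX numbers move around while the PCI address stays put.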

And please do not forget that in the early days we had ATA connectors, and drives had to be jumper-configured as master/slave, which forced a specific order anyway. We still had to think about how to configure the drives before putting them into the system.