You still don’t have the firmware updated properly. Assuming it’s put into the filesystem properly, maybe it’s included in your initramfs and you forgot to rebuild it?
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
2024-08-31 17:18:16,223 DEBUG: firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: Direct firmware load for amdgpu/gc_11_0_1_mes_2.bin failed with error -2
2024-08-31 17:18:16,223 DEBUG: [drm] try to fall back to amdgpu/gc_11_0_1_mes.bin
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes.bin
2024-08-31 17:18:16,223 DEBUG: amdgpu 0000:c1:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes1.bin
When you’ve done it properly, that script won’t complain anymore.
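If it helps to see at a glance which firmware files the kernel still can’t find, lines like the ones above can be filtered mechanically. This is only a rough sketch in Python, assuming the log is available as plain text in the format quoted above; the function name `missing_firmware` is made up for illustration. It reports files that have a `failed to load` line and no `direct-loading` line for the same name.

```python
import re

# Patterns match the kernel firmware messages quoted in the log above.
FAILED = re.compile(r"firmware: failed to load (\S+) \((-?\d+)\)")
LOADED = re.compile(r"firmware: direct-loading firmware (\S+)")


def missing_firmware(log_text):
    """Return sorted firmware names that failed to load and were never loaded."""
    failed, loaded = set(), set()
    for line in log_text.splitlines():
        m = FAILED.search(line)
        if m:
            failed.add(m.group(1))
            continue
        m = LOADED.search(line)
        if m:
            loaded.add(m.group(1))
    # A name that only ever failed is a candidate for "not installed yet".
    return sorted(failed - loaded)
```

On the excerpt above this would return only `amdgpu/gc_11_0_1_mes_2.bin`, the file the driver falls back from. The `-2` in those lines is `ENOENT`, i.e. the file was not found where the kernel looked.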
And now the script runs without complaint! But the computer still takes over half a minute to wake. I think that is a few seconds worse than when I started, not sure. (And the draw on the battery when unplugged is as bad, maybe a little worse than when I started, not sure.)
Grrr.
-kb, the Kent who feels like he is making progress, but on the wrong axis.
If you don’t have any sort of passwords set in the firmware, the next thing I would suggest you do is check your NVMe firmware version against the latest that is present on the manufacturer’s website. Most manufacturers don’t publish firmware updates for their disks for Linux, unfortunately.
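For reading the currently installed version, one option is the `model` and `firmware_rev` attributes the kernel exposes under `/sys/class/nvme/`. A minimal sketch in Python, assuming those sysfs files are present on your kernel; the function name `nvme_firmware_versions` is just an illustrative choice, and comparing against the manufacturer’s latest version still has to be done by hand.

```python
from pathlib import Path


def nvme_firmware_versions(sysfs_root="/sys/class/nvme"):
    """Map each NVMe controller (e.g. 'nvme0') to (model, firmware revision)."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No NVMe controllers, or not a Linux system with sysfs mounted.
        return result
    for ctrl in sorted(root.iterdir()):
        model = ctrl / "model"
        fw = ctrl / "firmware_rev"
        if model.is_file() and fw.is_file():
            # The kernel pads these strings with spaces, so strip them.
            result[ctrl.name] = (model.read_text().strip(), fw.read_text().strip())
    return result


for name, (model, fw) in nvme_firmware_versions().items():
    print(f"{name}: {model} firmware {fw}")
```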
Since you’re seeing a page fault from the NVMe disk, in the interim some workarounds you can experiment with to see if they help are either turning off the IOMMU (amd_iommu=off on the kernel command line) or putting it in passthrough mode (iommu=pt on the kernel command line).
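After rebooting with one of those options, it is worth confirming that the running kernel actually received it. A small sketch in Python, assuming the command line is read from `/proc/cmdline`; the helper name `iommu_settings` is hypothetical, and it only looks at the two parameters mentioned above.

```python
def iommu_settings(cmdline):
    """Return the IOMMU-related key=value parameters in a kernel command line."""
    wanted = ("amd_iommu", "iommu")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only keep the two parameters discussed above, and only key=value forms.
        if sep and key in wanted:
            found[key] = value
    return found
```

Calling `iommu_settings(open("/proc/cmdline").read())` should show `{'amd_iommu': 'off'}` or `{'iommu': 'pt'}` if the option took effect. An empty result means neither parameter reached the running kernel, for example because the bootloader config was not regenerated.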
@Kent_Borg There’s a thread here for updating WD SNX50 NVMe drives without WD’s official Windows FW update tool… Apparently somebody even wrote a Python tool for updating the FW under Linux (which I haven’t used, YMMV)… all in the thread linked.
I’m suspicious that I have a bad component. (I’ve had my /boot partition get corrupted twice. Yes, I was messing with grub stuff at the time, so maybe I messed it up; that is why I am suspicious and not certain. smartctl -a /dev/nvme0 doesn’t show any obvious errors. I am running btrfs for / and /boot, and when I scrub I get no CRC errors.)
Yes, I have the same drive with the same firmware rev.
Rather than fiddling with a system whose current state one cannot properly determine, I’d much rather try something more recent than Debian with cherry-picked backports, e.g. a clean Fedora installation or one of the Arch Linux derivatives, or try swapping the NVMe drive, if you can.
Because, as @Mario_Limonciello pointed out above, the IOMMU errors and the resetting of the NVMe controller are surely not conducive to the process and should be alarming imo, even without suspend/resume cycle issues.