Wake up From Suspend…Tricky

You still don’t have the firmware updated properly. Assuming it’s put into the filesystem properly, maybe it’s included in your initramfs and you forgot to rebuild it?

2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
2024-08-31 17:18:16,223 DEBUG:	firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: Direct firmware load for amdgpu/gc_11_0_1_mes_2.bin failed with error -2
2024-08-31 17:18:16,223 DEBUG:	[drm] try to fall back to amdgpu/gc_11_0_1_mes.bin
2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes.bin
2024-08-31 17:18:16,223 DEBUG:	amdgpu 0000:c1:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes1.bin

When you’ve done it properly that script won’t complain anymore.
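If it helps to check a saved report without reading every line, here is a minimal Python sketch that lists the firmware files the kernel could not load. The function name `missing_firmware` and the regular expression are my own assumptions, based only on the "failed to load ... (-2)" lines quoted above. Error -2 is ENOENT, i.e. the file was not found where the kernel looked. Note that in the log above the driver falls back to gc_11_0_1_mes.bin, so a miss is not necessarily fatal.

```python
import re

# Matches lines such as:
#   amdgpu 0000:c1:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
# The pattern is an assumption derived from the log excerpt in this thread.
_FAILED_LOAD = re.compile(r"firmware: failed to load (\S+) \((-?\d+)\)")


def missing_firmware(log_text):
    """Return sorted, de-duplicated (path, error) pairs for failed firmware loads."""
    found = set()
    for line in log_text.splitlines():
        match = _FAILED_LOAD.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return sorted(found)
```

Feed it the text of the report (for example `missing_firmware(open("report.txt").read())`, where `report.txt` is a hypothetical file name); an empty list means no failed loads were logged.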

Okay: dpkg-reconfigure linux-image-6.1.0-25-amd64.

And now the script runs without complaint! But the computer still takes over half a minute to wake. I think a few seconds worse than when I started, not sure. (And the draw on the battery when unplugged is as bad, maybe a little worse than when I started, not sure.)

Grrr.

-kb, the Kent who feels like he is making progress, but on the wrong axis.

P.S. https://www.borg.org/s2idle_report-2024-09-01.txt

Is this a SED or do you have a BIOS password set in firmware?

I noticed an nvme page fault.

2024-09-01 08:24:18,648 DEBUG: nvme 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x58e5e000 flags=0x0000]

If you have a BIOS password set for your storage please turn it off.

I have an admin password that it is supposed to ask for on power up. (Sometimes it does not, however.)

I’ll try turning it off.

Thanks,

-kb

Turns out I reset the BIOS recently and do not have any passwords set now.

I tried with these firmware files and the 6.10.7 kernel I built a couple days ago. Same slow wake up (and same power consumption).

https://www.borg.org/s2idle_report-2024-09-01_on_6.10.7.txt

-kb

Your slow wake-up is definitely caused by the NVMe drive not coming back properly.

2024-09-01 17:24:08,564 DEBUG: nvme 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x58e5e000 flags=0x0000]

2024-09-01 17:24:08,564 DEBUG: nvme nvme0: 12/0/0 default/read/poll queues
2024-09-01 17:24:08,564 DEBUG: nvme nvme0: resetting controller due to AER
2024-09-01 17:24:08,564 DEBUG: nvme nvme0: Identify namespace failed (-4)
2024-09-01 17:24:08,564 DEBUG: nvme nvme0: 12/0/0 default/read/poll queues
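To compare reports before and after a change, a rough Python sketch like the following could count these NVMe events. The labels (`iommu_page_fault`, `controller_reset`, `identify_failed`) and the function name `nvme_resume_events` are made up for illustration, and the patterns are copied from the lines quoted here, so other kernels may word these messages differently.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this thread; labels are hypothetical.
PATTERNS = {
    "iommu_page_fault": re.compile(r"nvme \S+: AMD-Vi: Event logged \[IO_PAGE_FAULT"),
    "controller_reset": re.compile(r"nvme \S+: resetting controller"),
    "identify_failed": re.compile(r"nvme \S+: Identify namespace failed"),
}


def nvme_resume_events(log_text):
    """Count how many log lines match each NVMe trouble pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return dict(counts)
```

An empty dict means none of these messages appeared, which is what a clean resume should look like as far as these three patterns go.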

If you don’t have any sort of passwords set in the firmware, the next thing I would suggest is checking your NVMe firmware version against the latest on the manufacturer’s website. Unfortunately, most manufacturers don’t publish firmware updates for their disks for Linux.

Since you’re seeing a page fault from the NVMe disk, in the interim there are two workarounds you can experiment with to see if they help: turning off the IOMMU (amd_iommu=off on the kernel command line) or putting it in passthrough mode (iommu=pt on the kernel command line).
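After rebooting it is easy to think a parameter took effect when it did not, so here is a small Python sketch for confirming which of the two workarounds above is actually on the running kernel's command line. `/proc/cmdline` is the standard place to read it; the function names `iommu_workaround` and `read_cmdline` are just illustrative.

```python
def iommu_workaround(cmdline):
    """Return the IOMMU workaround present on a kernel command line, or None."""
    params = cmdline.split()
    # Only the two parameters suggested in this thread are checked.
    for wanted in ("amd_iommu=off", "iommu=pt"):
        if wanted in params:
            return wanted
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

On the machine itself, `iommu_workaround(read_cmdline())` returning `None` would mean neither parameter made it onto the command line.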

@Kent_Borg There’s a thread here for updating WD SNX50 NVMe drives without WD’s official Windows FW update tool… Apparently somebody even wrote a python tool for updating the FW under linux (which i haven’t used - ymmv)… all in the thread linked.

Wow! I am impressed.

Working my way through the twisty passages it seems I need firmware <fwversion>620361WD</fwversion> but it seems I am already on that:

root@theseion:/home/kentborg# cat /sys/class/nvme/nvme0/firmware_rev
620361WD
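The same check as a minimal Python sketch, in case anyone wants to script it. The sysfs path is the one from the command above; the function names and the idea of passing the wanted revision as a string are my own assumptions.

```python
from pathlib import Path


def firmware_matches(sysfs_text, wanted):
    """Compare a firmware_rev value read from sysfs with the wanted revision."""
    # strip() drops the trailing newline that reading the file returns.
    return sysfs_text.strip() == wanted


def read_firmware_rev(path="/sys/class/nvme/nvme0/firmware_rev"):
    """Read the firmware revision the kernel reports for the first NVMe controller."""
    return Path(path).read_text()
```

On this machine `firmware_matches(read_firmware_rev(), "620361WD")` should come back true, going by the output shown above.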

I’m suspicious that I have a bad component. (I’ve had my /boot partition get corrupted twice. Yes, I had been messing with grub stuff at the time, so maybe I messed it up; that is why I am suspicious and not certain. smartctl -a /dev/nvme0 doesn’t show any obvious errors. I am running btrfs for / and /boot and when I scrub I get no CRC errors.)

-kb

Yes, i have the same drive with the same firmware rev.

Rather than fiddling with a system where one cannot determine its current state properly, i’d much rather try something more recent than Debian with cherry-picked backports, e.g. a clean Fedora installation or one of the Arch Linux derivatives, or try swapping the nvme drive, if you can.

Because, as @Mario_Limonciello pointed out above, the IOMMU errors and the resetting of the nvme controller are surely not conducive for the process and should be alarming imo, even without suspend/resume cycle issues.