What I think the issue is:
At boot, amdxdna fails to probe the NPU due to a firmware protocol mismatch:
amdxdna 0000:c2:00.1: enabling device (0000 -> 0002)
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_check_protocol: Incompatible firmware protocol major 7 minor 2
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_hw_start: firmware is not alive
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_exec: smu cmd 4 failed, 0xff
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_fini: Power off failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_init: start npu failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -22
amdxdna 0000:c2:00.1: probe with driver amdxdna failed with error -22
The probe error path calls aie2_smu_fini() as cleanup. Because aie2_hw_start() failed before SMU initialization completed, aie2_smu_fini() tries to power off hardware that was never fully initialized, and that power-off itself fails (Power off failed, ret -22). This leaves the NPU in an unknown hardware state.
Later, when the system suspends via s2idle, the last journal entry is:
PM: suspend entry (s2idle)
The system never resumes. The display stays blank and the machine must be hard-reset. This is reproducible on every suspend cycle.
Two distinct issues compound to produce the hang:
- Firmware/driver protocol mismatch: aie2_check_protocol() rejects firmware protocol major version 7. The linux-firmware package (20260221) ships /lib/firmware/amdnpu/17f0_11/npu.sbin with a protocol version the current driver does not accept.
- Broken error path in aie2_smu_fini: When called from the probe failure cleanup path, aie2_smu_fini() unconditionally issues a power-off SMU command with no guard for the case where SMU was never successfully initialized. This leaves the NPU in a partially powered, undefined hardware state that blocks s2idle power management, so the display fails to reinitialize on resume.
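To make the second point concrete, here is a minimal user-space sketch of the kind of guard I mean. It is not the amdxdna code: struct npu_dev, smu_ready, smu_power_off() and both smu_fini_* functions are hypothetical names invented for illustration, and the simulated SMU simply returns -EINVAL (-22) when it was never brought up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; NOT the real amdxdna structure. */
struct npu_dev {
    bool smu_ready;      /* set only after SMU init has fully succeeded */
    int  power_off_cmds; /* number of power-off commands sent to the simulated SMU */
};

/* Simulated SMU power-off: rejects the command if the SMU never came up. */
static int smu_power_off(struct npu_dev *dev)
{
    dev->power_off_cmds++;
    return dev->smu_ready ? 0 : -EINVAL;
}

/* Unguarded teardown: always sends power-off, as described in the report. */
static int smu_fini_unguarded(struct npu_dev *dev)
{
    assert(dev != NULL);
    return smu_power_off(dev);
}

/* Guarded teardown: skips power-off when init never completed. */
static int smu_fini_guarded(struct npu_dev *dev)
{
    int ret;

    assert(dev != NULL);
    if (!dev->smu_ready)
        return 0; /* nothing was set up, so there is nothing to undo */

    ret = smu_power_off(dev);
    if (ret == 0)
        dev->smu_ready = false;
    return ret;
}
```

With smu_ready left false, smu_fini_unguarded() sends a command and returns -EINVAL, while smu_fini_guarded() sends nothing and returns 0. Whether a flag like this is the right fix for the real driver is for the maintainers to decide; the sketch only shows the shape of the check.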
The following patches address related issues but do not appear to have landed in the 6.18 stable series:
- aie2_smu: treat power-off failure as unrecoverable during init (LKML 2025-11-13)
- amdxdna: update firmware version check to use feature table lookup accepting major version 7 (LKML 2026-01-07)
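For readers unfamiliar with the idea, a table-driven protocol check could look roughly like the sketch below. I have not reproduced the patch itself: struct fw_proto_entry, accepted_protos and fw_proto_supported() are hypothetical names, and the table values are illustrative only, not the versions the driver actually accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: an accepted protocol major and its minimum minor. */
struct fw_proto_entry {
    uint32_t major;
    uint32_t min_minor;
};

/* Illustrative values only; the real accepted versions are an assumption here. */
static const struct fw_proto_entry accepted_protos[] = {
    { 6, 0 },
    { 7, 0 },
};

/* Return true if (major, minor) matches any entry in the table. */
static bool fw_proto_supported(const struct fw_proto_entry *tbl, size_t n,
                               uint32_t major, uint32_t minor)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].major == major && minor >= tbl[i].min_minor)
            return true;
    }
    return false;
}
```

With this illustrative table, the "major 7 minor 2" firmware from the log above would be accepted, while an unlisted major such as 8 would still be rejected. Supporting a new firmware major then means adding a table row instead of changing the comparison logic.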