Blank screen on wake after latest "critical" update

What I think the issue is:

At boot, amdxdna fails to probe the NPU due to a firmware protocol mismatch:

amdxdna 0000:c2:00.1: enabling device (0000 -> 0002)
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_check_protocol: Incompatible firmware protocol major 7 minor 2
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_hw_start: firmware is not alive
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_exec: smu cmd 4 failed, 0xff
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_fini: Power off failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_init: start npu failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -22
amdxdna 0000:c2:00.1: probe with driver amdxdna failed with error -22

The probe error path calls aie2_smu_fini() as cleanup. Because aie2_hw_start() failed before SMU initialization completed, aie2_smu_fini() tries to power off hardware that was never fully initialized, and that power-off itself fails ("Power off failed, ret -22", i.e. -EINVAL). As far as I can tell, this leaves the NPU in an unknown hardware state.

Later, when the system suspends via s2idle, the last journal entry is:

PM: suspend entry (s2idle)

The system never resumes. The display stays blank and the machine must be hard-reset. This is reproducible on every suspend cycle.

Two distinct issues compound to produce the hang:

  1. Firmware/driver protocol mismatch: aie2_check_protocol() rejects firmware protocol major version 7. The linux-firmware package (20260221) ships /lib/firmware/amdnpu/17f0_11/npu.sbin with a protocol version the current driver does not accept.

  2. Broken error path in aie2_smu_fini: When called from the probe failure cleanup path, aie2_smu_fini() unconditionally issues a power-off SMU command, with no guard for the case where the SMU was never successfully initialized. I believe this leaves the NPU in a partially powered, undefined hardware state that blocks s2idle power management, so the system never completes resume and the display stays blank.
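To make issue 2 concrete, here is a minimal user-space sketch of the control flow I am describing. It is not the driver source: struct npu_model and every model_* function are hypothetical names I made up for illustration, and the "guarded" variant is only one possible shape for a fix, not the patch listed below.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for device state; NOT the real amdxdna structure. */
struct npu_model {
    bool smu_ready;      /* set only once SMU initialization has succeeded */
    int  poweroff_calls; /* number of power-off commands sent to the SMU */
};

/* Models the SMU power-off command: it fails with -EINVAL (-22)
 * when the SMU was never brought up, as seen in the log above. */
int model_smu_power_off(struct npu_model *npu)
{
    npu->poweroff_calls++;
    return npu->smu_ready ? 0 : -EINVAL;
}

/* Unguarded teardown: always sends power-off, even after a failed init. */
int model_fini_unguarded(struct npu_model *npu)
{
    return model_smu_power_off(npu);
}

/* Guarded teardown (assumed fix shape): skip the command entirely
 * if the SMU never reached the ready state. */
int model_fini_guarded(struct npu_model *npu)
{
    int ret;

    if (!npu->smu_ready)
        return 0; /* nothing was powered on, so nothing to undo */

    ret = model_smu_power_off(npu);
    if (ret == 0)
        npu->smu_ready = false;
    return ret;
}
```

With a zero-initialized struct npu_model (SMU never ready), model_fini_unguarded() sends one command and returns -EINVAL, matching the "Power off failed, ret -22" line, while model_fini_guarded() returns 0 without touching the hardware.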

The following patches address related issues but do not appear to have landed in the 6.18 stable series:

  • aie2_smu: treat power-off failure as unrecoverable during init (LKML 2025-11-13)
  • amdxdna: update firmware version check to use feature table lookup accepting major version 7 (LKML 2026-01-07)
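
For issue 1, this is roughly what I understand a table-based protocol check to look like. It is a sketch under assumptions, not the content of the patch above: struct fw_proto_entry, supported_protocols and model_check_protocol are hypothetical names, and the table values are placeholders, since I do not know which versions the real driver is meant to accept.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a protocol major version plus the lowest
 * minor version accepted for it. */
struct fw_proto_entry {
    uint32_t major;
    uint32_t min_minor;
};

/* Placeholder values for illustration only; the real supported set
 * is not stated anywhere in this report. */
static const struct fw_proto_entry supported_protocols[] = {
    { .major = 6, .min_minor = 0 },
    { .major = 7, .min_minor = 0 },
};

/* Returns 0 if (major, minor) matches a table entry, -EINVAL otherwise. */
int model_check_protocol(uint32_t major, uint32_t minor)
{
    size_t n = sizeof(supported_protocols) / sizeof(supported_protocols[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (supported_protocols[i].major == major &&
            minor >= supported_protocols[i].min_minor)
            return 0;
    }
    return -EINVAL;
}
```

With this placeholder table, the "major 7 minor 2" firmware from my log would be accepted, and an unknown major version would still be rejected with -EINVAL.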