Workaround for SMU deadlock / GPU freeze on Strix Halo — disable VPE idle power gating
TL;DR: A 3-line kernel patch adds an amdgpu.no_vpe_idle_pg module parameter. Setting it to 1 prevents the VPE (Video Processing Engine) from cycling power during normal use, which eliminates the SMU deadlock that causes hard freezes during browser hardware video decode. The system has been stable for 48+ hours with YouTube HW decode, where it previously crashed within 5-10 minutes.
System
- Framework Desktop, AMD Ryzen AI Max 300 Series (Strix Halo, gfx1151)
- BIOS INSYDE 03.04, PMFW 100.6.0
- Kernel 7.0.1 (CachyOS), Wayland/KDE Plasma
- Brave/Chrome with VAAPI hardware video decode enabled
The problem
When browsers use VAAPI hardware video decode (enabled by default since Chromium 143 / Brave 1.85 on Wayland), the amdgpu driver rapidly cycles the VPE power state (PowerDownVpe / PowerUpVpe) via SMU messages every time a video starts, stops, or changes. PMFW 100.6.0 cannot handle this cycling: even a single PowerDown→PowerUp cycle can leave the SMU in a corrupted state where resp_reg stays stuck at 0. A few seconds later, the next SMU message times out, cascading into:
SMU: No response msg_reg: 32 resp_reg: 0
Failed to power gate VPE!
Failed to disable gfxoff!
ring gfx_0.0.0 timeout
GPU reset begin!
This is followed by a hard freeze that requires a power cycle.
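To check whether an earlier freeze matches this signature, the kernel log from the previous boot can be scanned for the two most specific lines above. This is a minimal sketch: the function name is hypothetical, and it assumes a persistent journal. After a hard freeze the final messages may never reach disk, so a miss does not rule the bug out.

```shell
#!/bin/sh
# Reads kernel log text on stdin. Exits 0 only if both the SMU timeout
# and the VPE power-gate failure appear somewhere in the input.
# (has_smu_vpe_deadlock is a hypothetical helper name.)
has_smu_vpe_deadlock() {
    awk '
        /SMU: No response/         { smu = 1 }
        /Failed to power gate VPE/ { vpe = 1 }
        END { exit !(smu && vpe) }
    '
}
```

Usage: `journalctl -k -b -1 | has_smu_vpe_deadlock && echo "signature found"`.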
Root cause analysis
Using dynamic_debug tracing on smu_cmn.c, I captured the exact SMU message sequence before crashes. Key findings:
- VPE is always involved: every crash includes PowerDownVpe (msg 0x32). VCN alone (PowerDownVcn0/Vcn1) cycles fine without crashes.
- Timing doesn’t matter: tested settlement delays of 3ms (stock), 60ms, and 200ms between consecutive SMU messages. All crash. The bug is not about messages arriving “too fast.”
- A single cycle is enough: one PowerDownVpe followed by one PowerUpVpe can corrupt the firmware state. It doesn’t require accumulation.
- VCN cycling without VPE is stable: with VPE idle power gating disabled, VCN0 and VCN1 cycle freely (40+ transitions in 2 minutes) with zero errors.
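One way to reproduce the first finding from a captured log is to list the last few power messages logged before the first SMU timeout. The sketch below is not the exact tooling used for the analysis above. The function name is hypothetical, and it assumes only that message names such as PowerDownVpe appear verbatim in the traced log lines.

```shell
#!/bin/sh
# Print the last N power message names seen before the first
# "SMU: No response" line. Reads kernel log text on stdin; N defaults to 5.
# If no timeout line is present, the last N names in the whole input are printed.
# (last_power_msgs_before_timeout is a hypothetical helper name.)
last_power_msgs_before_timeout() {
    n="${1:-5}"
    awk -v n="$n" '
        /SMU: No response/ { exit }   # stop reading; END still runs
        match($0, /Power(Up|Down)(Vpe|Vcn[0-9]+)/) {
            buf[++count] = substr($0, RSTART, RLENGTH)
        }
        END {
            start = count - n + 1
            if (start < 1) start = 1
            for (i = start; i <= count; i++) print buf[i]
        }
    '
}
```

Usage: `journalctl -k -b -1 | last_power_msgs_before_timeout 10`. If the findings hold, a VPE message should show up in the tail of every crash log.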
The fix
The patch adds a module parameter amdgpu.no_vpe_idle_pg. When set to 1, vpe_ring_end_use() skips scheduling the idle work handler, so VPE stays powered after its first use. Suspend/resume is NOT affected — hw_fini / hw_init handle that path separately.
Power cost: ~0.5-1W idle (VPE block stays clocked). Negligible on a desktop system.
Patch (applies to kernel 7.0.x, should apply to 6.18+ with minor fuzz):
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -248,6 +248,7 @@
int amdgpu_umsch_mm_fwlog;
int amdgpu_rebar = -1; /* auto */
int amdgpu_user_queue = -1;
+int amdgpu_no_vpe_idle_pg;
uint amdgpu_hdmi_hpd_debounce_delay_ms;
@@ -424,6 +425,20 @@
module_param_named_unsafe(ip_block_mask, amdgpu_ip_block_mask, uint, 0444);
/**
+ * DOC: no_vpe_idle_pg (int)
+ * Disable VPE (Video Processing Engine) idle power gating (1 = VPE stays
+ * powered during normal use, 0 = normal power gating). Workaround for AMD
+ * Strix Halo PMFW 100.6.0 where PowerDownVpe/PowerUpVpe cycling causes an
+ * SMU deadlock during browser hardware video decode. Suspend/resume is not
+ * affected - hw_fini/hw_init handle that path separately. The default is 0
+ * (normal power gating behavior).
+ */
+MODULE_PARM_DESC(no_vpe_idle_pg,
+ "Disable VPE idle power gating (1 = skip, 0 = normal). "
+ "Workaround for Strix Halo PMFW 100.6.0 SMU deadlock (default: 0)");
+module_param_named(no_vpe_idle_pg, amdgpu_no_vpe_idle_pg, int, 0444);
+
+/**
* DOC: bapm (int)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -269,6 +269,7 @@
extern int amdgpu_wbrf;
extern int amdgpu_user_queue;
+extern int amdgpu_no_vpe_idle_pg;
extern uint amdgpu_hdmi_hpd_debounce_delay_ms;
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -896,7 +896,8 @@
 {
 	struct amdgpu_device *adev = ring->adev;
-	schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
+	if (!amdgpu_no_vpe_idle_pg)
+		schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
 }
How to use (requires a patched amdgpu module or kernel)
If your distro provides a way to patch the kernel (DKMS, out-of-tree module rebuild, or custom kernel), apply the patch above and add this to the kernel command line:
amdgpu.no_vpe_idle_pg=1
Or in /etc/modprobe.d/amdgpu-vpe.conf:
options amdgpu no_vpe_idle_pg=1
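Because the patch registers the parameter with mode 0444, its value should be readable under /sys/module/amdgpu/parameters/ once the patched module is loaded. The sketch below reads it back; the function name and the three status strings are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Report whether the workaround is in effect by reading the parameter back
# from sysfs. An optional first argument overrides the path.
# (vpe_workaround_status and its status strings are hypothetical.)
vpe_workaround_status() {
    param_file="${1:-/sys/module/amdgpu/parameters/no_vpe_idle_pg}"
    if [ ! -r "$param_file" ]; then
        # Parameter missing: module is unpatched or amdgpu is not loaded.
        echo "not-patched"
        return 2
    fi
    if [ "$(cat "$param_file")" = "1" ]; then
        echo "active"
    else
        echo "inactive"
        return 1
    fi
}
```

Running `vpe_workaround_status` after a reboot should print `active` if the option was picked up.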
How to verify it’s working
# Enable SMU message tracing
echo 'module amdgpu file smu_cmn.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
# Watch power messages (open/close YouTube videos in browser)
journalctl -kf -o short-precise | grep -iE "PowerUp|PowerDown"
Expected: PowerUpVcn0, PowerDownVcn0, PowerUpVcn1, PowerDownVcn1 cycling normally. No PowerUpVpe or PowerDownVpe messages after initial boot.
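For a longer test session, a per-message tally is easier to read than a live stream. The sketch below counts each power message name in log text read from stdin. The function name is hypothetical, and it assumes the message names appear verbatim in the traced lines, as the grep above implies.

```shell
#!/bin/sh
# Tally SMU power messages by name from kernel log text on stdin.
# Prints one "<name> <count>" line per message name, sorted by name.
# (count_power_msgs is a hypothetical helper name.)
count_power_msgs() {
    grep -oE 'Power(Up|Down)(Vpe|Vcn[0-9]+)' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Usage: `journalctl -k -b | count_power_msgs`. With the workaround active, the VCN counts should keep growing as videos start and stop, while the VPE counts stay where they were after boot.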
Request
If you have a Framework Desktop (Strix Halo) experiencing SMU deadlock / GPU freezes, please test this patch and report results. With 3-5 confirmations from different users I’ll submit it upstream to the amd-gfx mailing list.