[Arch/amdgpu] Screen freeze on zen kernel / new warnings on vanilla kernel - 6.12.5

Edit: disclaimer - it seems that the complete freeze happens on the -zen family of kernels, but part of this post (the other warning printed by the kernel) still applies to the vanilla kernel, so I'm leaving it up for better visibility, especially given that -zen kernels are quite popular.

Which Linux distro are you using?
Arch

If rolling release, last date updated?
Just a moment ago.

Which kernel are you using?
6.12.5-zen

Which BIOS version are you using?
3.05

Which Framework Laptop 16 model are you using? (AMD Ryzen™ 7040 Series)
Ryzen 7 7840HS

I know that the Panel Replay / PSR issues have been ongoing for some time already, but this one seems more problematic, and I want to give everyone a heads-up before they upgrade to the newest kernel. Instead of occasional refresh issues, this time the screen freezes as soon as Plasma loads (Adaptive Sync is enabled in Plasma settings; I guess the freeze happens when Plasma triggers it).

The problem goes away if I set amdgpu.dcdebugmask=0x610 (which effectively disables PSR and Panel Replay; I guess 0x400 alone would be sufficient, though).
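For anyone wondering what 0x610 actually switches off, here is a small sketch that splits a dcdebugmask value into its individual bits. The bit names and values are taken from my recollection of enum DC_DEBUG_MASK in the kernel's drivers/gpu/drm/amd/include/amd_shared.h, so treat them as an assumption and check them against your own kernel source. The function name decode_dcdebugmask is made up for illustration.

```python
# Bit values as recalled from enum DC_DEBUG_MASK in
# drivers/gpu/drm/amd/include/amd_shared.h (assumption: verify against
# the kernel version you are actually running).
DC_DEBUG_BITS = {
    0x010: "DC_DISABLE_PSR",
    0x200: "DC_DISABLE_PSR_SU",
    0x400: "DC_DISABLE_REPLAY",
}


def decode_dcdebugmask(mask: int) -> list[str]:
    """Return the names of the bits set in mask, lowest bit first.

    Bits not listed in DC_DEBUG_BITS are reported as their hex value.
    """
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(DC_DEBUG_BITS.get(bit, hex(bit)))
        bit <<= 1
    return names


if __name__ == "__main__":
    for value in (0x610, 0x400):
        print(hex(value), decode_dcdebugmask(value))
```

Under those assumed bit values, 0x610 is PSR, PSR-SU and Panel Replay all disabled, while 0x400 disables Panel Replay only, which matches the guess above.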

The following kernel warnings show up (the first one appears even with PR disabled; the second one is probably related to the screen hang and is printed only when PR is enabled):

  • one that always shows up (on both vanilla and zen kernels, but it doesn’t seem to cause any negative effects when PR is disabled)
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 12 at drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn30/dcn30_dpp.c:534 dpp3_deferred_update+0x101/0x330 [amdgpu]
kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device typec_displayport cmac algif_hash algif_skcipher af_alg bnep vfat fat ch341 snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_acpi
kernel:  hid_sensor_als snd_pcm_dmaengine hid_sensor_trigger snd_hda_codec snd_rpl_pci_acp6x industrialio_triggered_buffer iwlwifi snd_acp_pci kvm_amd kfifo_buf snd_hda_core snd_acp_legacy_common hid_sensor_
kernel:  aesni_intel drm_buddy nvme gf128mul drm_display_helper crypto_simd cryptd nvme_core cec ccp video crc16 nvme_auth wmi
kernel: CPU: 2 UID: 0 PID: 12 Comm: kworker/u64:1 Not tainted 6.12.5-zen1-1-zen #1 21e0c68887ee4451405af53cfb8d1434c391cada
kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.05 11/13/2024
kernel: Workqueue: events_unbound commit_work
kernel: RIP: 0010:dpp3_deferred_update+0x101/0x330 [amdgpu]
kernel: Code: 83 78 e1 00 00 0f b6 90 a8 02 00 00 48 8b 83 70 e1 00 00 8b b0 78 04 00 00 e8 db ad 12 00 8b 74 24 04 85 f6 0f 84 5d 01 00 00 <0f> 0b 0f b6 83 48 96 00 00 83 e0 f7 88 83 48 96 00 00 a8 01 0f 84
kernel: RSP: 0018:ffffbd4300187a40 EFLAGS: 00010202
kernel: RAX: 0000000000000066 RBX: ffff9d9bd21e0000 RCX: 0000000000000004
kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff9d9bd2080000
kernel: RBP: ffff9d9be3f00000 R08: ffffbd4300187a44 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: ffffbd43001879ec R12: 0000000000000000
kernel: R13: ffff9d9be3f00308 R14: ffff9d9be3f05dc8 R15: ffff9d9bcd866000
kernel: FS:  0000000000000000(0000) GS:ffff9da925b00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000076d2d5ec2bb0 CR3: 0000000dbf822000 CR4: 0000000000f50ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? dpp3_deferred_update+0x101/0x330 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? __warn.cold+0x93/0xed
kernel:  ? dpp3_deferred_update+0x101/0x330 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? report_bug+0xe7/0x210
kernel:  ? handle_bug+0x58/0x90
kernel:  ? exc_invalid_op+0x19/0xc0
kernel:  ? asm_exc_invalid_op+0x1a/0x20
kernel:  ? dpp3_deferred_update+0x101/0x330 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  dc_post_update_surfaces_to_stream+0x24f/0x470 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  amdgpu_dm_commit_planes+0x1379/0x1fe0 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  amdgpu_dm_atomic_commit_tail+0x1312/0x3170 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? dma_fence_default_wait+0x8b/0x240
kernel:  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? kvfree_call_rcu+0x26e/0x350
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? wait_for_completion_timeout+0x130/0x180
kernel:  commit_tail+0x91/0x130
kernel:  process_one_work+0x18f/0x350
kernel:  worker_thread+0x24c/0x380
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>
kernel: ---[ end trace 0000000000000000 ]---

  • one that shows up after the above one when PR is enabled on the zen kernel
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 10 PID: 215 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:89 dmub_replay_enable+0xfe/0x170 [amdgpu]
kernel: Modules linked in: ccm snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device typec_displayport cmac algif_hash algif_skcipher af_alg bnep vfat fat ch341 snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_
kernel:  cros_ec_chardev kvm_amd cros_usbpd_notify cros_ec_debugfs snd_hda_core snd_acp_legacy_common mousedev cros_ec_sysfs cros_usbpd_logger gpio_cros_ec led_class_multicolor btmtk hid_sensor_iio_common bl
kernel:  aesni_intel drm_buddy nvme gf128mul drm_display_helper crypto_simd nvme_core cryptd cec ccp video crc16 nvme_auth wmi
kernel: CPU: 10 UID: 0 PID: 215 Comm: kworker/u64:3 Tainted: G        W          6.12.5-zen1-1-zen #1 21e0c68887ee4451405af53cfb8d1434c391cada
kernel: Tainted: [W]=WARN
kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.05 11/13/2024
kernel: Workqueue: events_unbound commit_work
kernel: RIP: 0010:dmub_replay_enable+0xfe/0x170 [amdgpu]
kernel: Code: 00 00 00 3d ff 00 00 00 74 c9 45 84 ff 74 69 85 c0 75 5a bf ac c4 20 00 41 83 c6 01 e8 ab a8 2c e8 41 81 fe e9 03 00 00 75 a7 <0f> 0b 48 8b 44 24 48 65 48 2b 04 25 28 00 00 00 75 57 48 83 c4 50
kernel: RSP: 0018:ffffb92800937990 EFLAGS: 00010246
kernel: RAX: 00000021e9c5343e RBX: 0000000000000002 RCX: 000000000000000a
kernel: RDX: 00000000000cb01c RSI: 00000000000ca4d9 RDI: 00000021e9b88422
kernel: RBP: 0000000000000000 R08: 0000000000000002 R09: ffff8db81255a880
kernel: R10: 000000000000000d R11: 0000000000000001 R12: ffff8db8127f2b20
kernel: R13: ffffb92800937994 R14: 00000000000003e9 R15: 0000000000000001
kernel: FS:  0000000000000000(0000) GS:ffff8dc565f00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000062de55fdce38 CR3: 00000007ef022000 CR4: 0000000000f50ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? dmub_replay_enable+0xfe/0x170 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? __warn.cold+0x93/0xed
kernel:  ? dmub_replay_enable+0xfe/0x170 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? report_bug+0xe7/0x210
kernel:  ? handle_bug+0x58/0x90
kernel:  ? exc_invalid_op+0x19/0xc0
kernel:  ? asm_exc_invalid_op+0x1a/0x20
kernel:  ? dmub_replay_enable+0xfe/0x170 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  edp_set_replay_allow_active+0x149/0x1a0 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  amdgpu_dm_replay_enable+0xc1/0xf0 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  amdgpu_dm_commit_planes+0x1fd5/0x1fe0 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? __entry_text_end+0x101e46/0x101e49
kernel:  amdgpu_dm_atomic_commit_tail+0x1312/0x3170 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10 [amdgpu 33bcc48fbf6361168e552fb71591e6476021068b]
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x100/0x3b0
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? dma_fence_default_wait+0x8b/0x240
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? wait_for_completion_timeout+0x130/0x180
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  commit_tail+0x91/0x130
kernel:  process_one_work+0x18f/0x350
kernel:  worker_thread+0x24c/0x380
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>
kernel: ---[ end trace 0000000000000000 ]---
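If you want to check whether your own log contains the same two warnings, a rough sketch like the one below pulls the source file, line number and function out of each WARNING line. The regular expression is written against the format of the traces in this post only, and find_warnings is a hypothetical helper name; other kernel versions may format the line differently.

```python
import re

# Matches lines such as:
#   WARNING: CPU: 2 PID: 12 at <path>:<line> <function>+0x101/0x330 [amdgpu]
# (pattern derived from the traces in this post, not from a kernel spec)
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<path>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_warnings(log_text: str) -> list[tuple[str, int, str]]:
    """Return (source file name, line number, function) per WARNING line."""
    found = []
    for match in WARNING_RE.finditer(log_text):
        # Keep only the file name, dropping the long driver directory prefix.
        filename = match.group("path").rsplit("/", 1)[-1]
        found.append((filename, int(match.group("line")), match.group("func")))
    return found


if __name__ == "__main__":
    sample = (
        "kernel: WARNING: CPU: 2 PID: 12 at drivers/gpu/drm/amd/amdgpu/../"
        "display/dc/dpp/dcn30/dcn30_dpp.c:534 "
        "dpp3_deferred_update+0x101/0x330 [amdgpu]\n"
        "kernel: WARNING: CPU: 10 PID: 215 at drivers/gpu/drm/amd/amdgpu/../"
        "display/dc/dce/dmub_replay.c:89 "
        "dmub_replay_enable+0xfe/0x170 [amdgpu]\n"
    )
    for entry in find_warnings(sample):
        print(entry)
```

To use it on a real system, pass the text of your kernel log for the current boot to find_warnings and look for dcn30_dpp.c and dmub_replay.c in the result.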

Edit: forgot to mention - it is recoverable with cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover, but I haven’t used my laptop afterwards to assess how stable it is.

Edit2: I moved from the 6.12.4 kernel, which had no major issues apart from occasional panel refresh hangs; disabling Adaptive Sync in Plasma settings was enough to make it work, so I never went down that rabbit hole. More packages got updated in this batch, including xorg-server. I’m in fact using the -zen kernel; I’m going to check in a moment whether vanilla is affected as well and will drop another edit at the bottom of the post.