[Arch/amdgpu] Screen freeze on zen kernel / new warnings on vanilla kernel - 6.12.5

Same problem here on a Framework 13 AMD.

Linux kolme 6.11.11-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu, 05 Dec 2024 16:26:44 +0000 x86_64 GNU/Linux

It has appeared sporadically since about the end of October 2024.

After boot, I see the dmesg output below. The computer works fine until it suddenly becomes unresponsive. At that point I usually see kworker events_unbound processes taking 100% CPU. Only a reboot brings the machine back to a usable state.

[   60.543139] ------------[ cut here ]------------
[   60.543144] WARNING: CPU: 12 PID: 439 at drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn30/dcn30_dpp.c:534 dpp3_deferred_update+0x101/0x330 [amdgpu]
[   60.543599] Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp amd_atl intel_rapl_msr snd_sof_pci intel_rapl_common snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_hda_codec_realtek snd_amd_sdw_acpi soundwire_amd snd_hda_codec_generic mt7921e soundwire_generic_allocation snd_hda_scodec_component mt7921_common soundwire_bus mt792x_lib snd_hda_codec_hdmi mt76_connac_lib snd_soc_core snd_hda_intel mt76 snd_intel_dspcfg snd_compress uvcvideo cros_usbpd_charger leds_cros_ec ac97_bus snd_intel_sdw_acpi gpio_cros_ec cros_kbd_led_backlight led_class_multicolor cros_ec_sysfs cros_ec_chardev snd_pcm_dmaengine cros_usbpd_logger hid_sensor_als cros_charge_control mac80211 cros_usbpd_notify cros_ec_hwmon mousedev cros_ec_debugfs snd_hda_codec videobuf2_vmalloc snd_rpl_pci_acp6x hid_sensor_trigger snd_acp_pci uvc kvm_amd industrialio_triggered_buffer
[   60.543677]  snd_hda_core snd_acp_legacy_common videobuf2_memops kfifo_buf snd_pci_acp6x hid_sensor_iio_common snd_hwdep libarc4 btusb industrialio videobuf2_v4l2 cros_ec_dev spd5118 snd_pcm snd_pci_acp5x btrtl sp5100_tco cfg80211 videodev snd_timer btintel kvm snd_rn_pci_acp3x joydev hid_multitouch hid_sensor_hub btbcm ucsi_acpi snd_acp_config snd videobuf2_common typec_ucsi btmtk cros_ec_lpcs i2c_piix4 snd_soc_acpi cros_ec bluetooth mc wmi_bmof cdc_acm rapl typec pcspkr amd_pmf thunderbolt snd_pci_acp3x rfkill soundcore i2c_smbus k10temp roles amdtee i2c_hid_acpi amd_sfh platform_profile i2c_hid amd_pmc mac_hid i2c_dev crypto_user loop nfnetlink ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul crc16 crc32c_intel amdxcp i2c_algo_bit polyval_clmulni drm_ttm_helper polyval_generic ghash_clmulni_intel ttm serio_raw sha512_ssse3 drm_exec atkbd sha256_ssse3 libps2 sha1_ssse3
[   60.543800]  gpu_sched aesni_intel drm_suballoc_helper vivaldi_fmap nvme gf128mul drm_buddy crypto_simd drm_display_helper nvme_core cryptd xhci_pci i8042 ccp video cec xhci_pci_renesas nvme_auth serio wmi
[   60.543825] CPU: 12 UID: 0 PID: 439 Comm: kworker/u64:6 Not tainted 6.11.11-1-MANJARO #1 bd5d6dc86bb7f74bdd4f3d28006de2c066abf4fd
[   60.543831] Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
[   60.543834] Workqueue: events_unbound commit_work
[   60.543841] RIP: 0010:dpp3_deferred_update+0x101/0x330 [amdgpu]
[   60.544131] Code: 83 78 e1 00 00 0f b6 90 a8 02 00 00 48 8b 83 70 e1 00 00 8b b0 78 04 00 00 e8 7b c0 11 00 8b 74 24 04 85 f6 0f 84 5d 01 00 00 <0f> 0b 0f b6 83 48 96 00 00 83 e0 f7 88 83 48 96 00 00 a8 01 0f 84
[   60.544132] RSP: 0018:ffffb9878061fba0 EFLAGS: 00010202
[   60.544134] RAX: 0000000000000066 RBX: ffff94e1d4b80000 RCX: 0000000000000004
[   60.544135] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff94e1d3a80000
[   60.544136] RBP: ffff94e232080000 R08: ffffb9878061fba4 R09: ffffb9878061fbd0
[   60.544137] R10: ffffb9878061fb48 R11: 0000000000000000 R12: 0000000000000000
[   60.544138] R13: ffff94e2320840a8 R14: ffff94e232085f78 R15: ffff94e20411f200
[   60.544139] FS:  0000000000000000(0000) GS:ffff94e91e800000(0000) knlGS:0000000000000000
[   60.544140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.544141] CR2: 000072282420d000 CR3: 0000000732222000 CR4: 0000000000f50ef0
[   60.544142] PKRU: 55555554
[   60.544143] Call Trace:
[   60.544146]  <TASK>
[   60.544147]  ? dpp3_deferred_update+0x101/0x330 [amdgpu bb0261db58d620ad314df009a0edb3b84ae8ec8d]
[   60.544297]  ? __warn.cold+0x8e/0xe8
[   60.544300]  ? dpp3_deferred_update+0x101/0x330 [amdgpu bb0261db58d620ad314df009a0edb3b84ae8ec8d]
[   60.544444]  ? report_bug+0xff/0x140
[   60.544448]  ? handle_bug+0x58/0x90
[   60.544449]  ? exc_invalid_op+0x17/0x70
[   60.544451]  ? asm_exc_invalid_op+0x1a/0x20
[   60.544454]  ? dpp3_deferred_update+0x101/0x330 [amdgpu bb0261db58d620ad314df009a0edb3b84ae8ec8d]
[   60.544595]  dc_post_update_surfaces_to_stream+0x1b1/0x2b0 [amdgpu bb0261db58d620ad314df009a0edb3b84ae8ec8d]
[   60.544743]  amdgpu_dm_atomic_commit_tail+0x2cca/0x3ab0 [amdgpu bb0261db58d620ad314df009a0edb3b84ae8ec8d]
[   60.544912]  ? srso_alias_return_thunk+0x5/0xfbef5
[   60.544914]  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x162/0x3a0
[   60.544917]  ? srso_alias_return_thunk+0x5/0xfbef5
[   60.544919]  ? dma_fence_default_wait+0x8b/0x250
[   60.544922]  ? srso_alias_return_thunk+0x5/0xfbef5
[   60.544923]  ? wait_for_completion_timeout+0x130/0x180
[   60.544925]  ? srso_alias_return_thunk+0x5/0xfbef5
[   60.544926]  ? dma_fence_wait_timeout+0x108/0x140
[   60.544930]  commit_tail+0x91/0x130
[   60.544932]  process_one_work+0x17b/0x330
[   60.544936]  worker_thread+0x2ce/0x3f0
[   60.544938]  ? __pfx_worker_thread+0x10/0x10
[   60.544940]  kthread+0xcf/0x100
[   60.544942]  ? __pfx_kthread+0x10/0x10
[   60.544945]  ret_from_fork+0x31/0x50
[   60.544947]  ? __pfx_kthread+0x10/0x10
[   60.544949]  ret_from_fork_asm+0x1a/0x30
[   60.544953]  </TASK>
[   60.544954] ---[ end trace 0000000000000000 ]---
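
For anyone comparing traces across machines or kernel versions, here is a minimal sketch that pulls the warning site (source file, line, symbol, module) out of saved dmesg text. The regex is an assumption based only on the shape of the WARNING line above, and `find_warnings` and `SAMPLE` are names made up for this example, not part of any existing tool.

```python
import re

# Assumed shape of the kernel warning header, modelled on the trace above:
#   WARNING: CPU: <n> PID: <n> at <file>:<line> <symbol>+0x<off>/0x<size> [<module>]
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(dmesg_text):
    """Return one dict per WARNING header line found in dmesg_text."""
    hits = []
    for raw in dmesg_text.splitlines():
        match = WARN_RE.search(raw)
        if match is None:
            continue
        hit = match.groupdict()
        # Convert the numeric fields so they can be compared or sorted.
        for key in ("cpu", "pid", "line"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


# Two lines copied from the trace above, used as sample input.
SAMPLE = (
    "[   60.543144] WARNING: CPU: 12 PID: 439 at "
    "drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn30/dcn30_dpp.c:534 "
    "dpp3_deferred_update+0x101/0x330 [amdgpu]\n"
    "[   60.543834] Workqueue: events_unbound commit_work\n"
)

if __name__ == "__main__":
    for hit in find_warnings(SAMPLE):
        print(hit["module"], hit["symbol"], f'{hit["file"]}:{hit["line"]}')
```

To use it on a real log, save the kernel messages to a file first (for example the output of `dmesg`, or `journalctl -k -b -1` for the previous boot after a forced reboot) and pass the file contents to `find_warnings`.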