My FW13 AMD's UI started freezing this week. Not on wakeup, but during normal use.
Sometimes the journal shows a DMCUB error, sometimes not. When it does, I first get a bunch of:
amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
amdgpu 0000:c1:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out
Then, again, a bunch of:
amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
amdgpu 0000:c1:00.0: [drm] *ERROR* flip_done timed out
------------[ cut here ]------------
kernel: WARNING: CPU: 6 PID: 1488 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9205 amdgpu_dm_atomic_commit_tail+0x392f/0x3a00 [amdgpu]
kernel: Modules linked in: snd_seq_midi snd_seq_dummy snd_seq_midi_event snd_seq ixgbe xfrm_algo mdio_devres libphy mdio dca sd_mod scsi_mod scsi_common uhid usbhid ipmi_devintf ipmi_msgh>
kernel: gf128mul libarc4 snd_rpl_pci_acp6x hid_sensor_iio_common snd_seq_device snd_hwdep snd_pci_acp6x crypto_simd industrialio_triggered_buffer snd_pcm videobuf2_common amd_pmf cryptd >
kernel: drm_ttm_helper xhci_pci hid_generic xhci_hcd ttm i2c_hid_acpi i2c_hid cros_ec_dev drm_kms_helper hid nvme cros_ec_lpcs usbcore cros_ec thunderbolt nvme_core crc32_pclmul drm crc3>
kernel: CPU: 6 UID: 0 PID: 1488 Comm: Xorg Not tainted 6.12.13-amd64 #1 Debian 6.12.13-1
kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x392f/0x3a00 [amdgpu]
kernel: Code: d8 60 50 c1 e8 42 ea 86 ff e9 20 fe ff ff 49 8d 87 40 31 04 00 c6 85 38 fe ff ff 00 48 89 85 48 fe ff ff e9 f6 cc ff ff 0f 0b <0f> 0b e9 64 f3 ff ff 0f 0b e9 28 cd ff ff 0f >
kernel: RSP: 0018:ffffb0744298f798 EFLAGS: 00010002
kernel: RAX: 0000000000000286 RBX: 0000000000000286 RCX: ffff8b7e70fe0118
kernel: RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff8b7e79500178
kernel: RBP: ffffb0744298f9e0 R08: ffffb0744298f684 R09: 0000000000000000
kernel: R10: ffffb0744298f6f0 R11: ffffb0744298f6f4 R12: 0000000000000002
kernel: R13: 0000000000000000 R14: ffff8b8241256c00 R15: ffff8b7e70fe0000
kernel: FS: 00007f76e405eb00(0000) GS:ffff8b9481d00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f8b25cde6d0 CR3: 000000012934e000 CR4: 0000000000f50ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <TASK>
kernel: ? amdgpu_dm_atomic_commit_tail+0x392f/0x3a00 [amdgpu]
kernel: ? __warn.cold+0x93/0xf6
kernel: ? amdgpu_dm_atomic_commit_tail+0x392f/0x3a00 [amdgpu]
kernel: ? report_bug+0xff/0x140
kernel: ? handle_bug+0x58/0x90
kernel: ? exc_invalid_op+0x17/0x70
kernel: ? asm_exc_invalid_op+0x1a/0x20
kernel: ? amdgpu_dm_atomic_commit_tail+0x392f/0x3a00 [amdgpu]
kernel: ? amdgpu_dm_atomic_commit_tail+0x2c87/0x3a00 [amdgpu]
kernel: commit_tail+0x91/0x130 [drm_kms_helper]
kernel: drm_atomic_helper_commit+0x11a/0x140 [drm_kms_helper]
kernel: drm_atomic_commit+0xa6/0xe0 [drm]
kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
kernel: drm_atomic_helper_set_config+0x74/0xb0 [drm_kms_helper]
kernel: drm_mode_setcrtc+0x46c/0x8a0 [drm]
kernel: ? __pfx_drm_mode_setcrtc+0x10/0x10 [drm]
kernel: drm_ioctl_kernel+0xad/0x100 [drm]
kernel: drm_ioctl+0x277/0x4f0 [drm]
kernel: ? __pfx_drm_mode_setcrtc+0x10/0x10 [drm]
kernel: amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
kernel: __x64_sys_ioctl+0x91/0xd0
kernel: do_syscall_64+0x82/0x190
kernel: ? vfs_write+0x311/0x450
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? vfs_write+0x311/0x450
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? syscall_exit_to_user_mode+0x164/0x210
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? do_syscall_64+0x8e/0x190
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? amdgpu_drm_ioctl+0x6e/0x80 [amdgpu]
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? syscall_exit_to_user_mode+0x164/0x210
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? do_syscall_64+0x8e/0x190
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: ? do_syscall_64+0x8e/0x190
kernel: ? srso_alias_return_thunk+0x5/0xfbef5
kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f76e43fe37b
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 >
kernel: RSP: 002b:00007ffc853ff2e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 0000560a1af48ea0 RCX: 00007f76e43fe37b
kernel: RDX: 00007ffc853ff370 RSI: 00000000c06864a2 RDI: 000000000000000f
kernel: RBP: 00007ffc853ff370 R08: 0000000000000000 R09: 0000560a1e1d9e50
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c06864a2
kernel: R13: 000000000000000f R14: 0000560a1a0cab60 R15: 0000560a1a3fbdd0
kernel: </TASK>
---[ end trace 0000000000000000 ]---
After that, PID 1488 (Xorg) crashes repeatedly.
System:
- BIOS 3.05
- Debian testing/unstable
- kernel 6.12.13-amd64
- XFCE 4.20.1 on Xorg, no Wayland
Running
sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
always brings it back, but I have to do it over ssh.