Which Linux distro are you using?
NixOS
Which release version?
(if rolling release without a release version, skip this question)
24.05
(If rolling release, last date updated?)
Updates are sometimes backported unless they have major breaking changes, but I’m using the latest kernel package, not the one from the 24.05 release.
Which kernel are you using?
6.11.6
Which BIOS version are you using?
3.05
Which Framework Laptop 13 model are you using? (AMD Ryzen™ 7040 Series, Intel® Core™ Ultra Series 1, 13th Gen Intel® Core™ , 12th Gen Intel® Core™, 11th Gen Intel® Core™)
AMD Ryzen 7040
Hello again @Mario_Limonciello, could you share any insight on whether this could be firmware related or not? (I haven’t seen any clocksource issue on 6.11.6, but I haven’t run it for very long.)
Related to: Random hard freezes fw13 amd7840u win11 - #491 by Mario_Limonciello. I just ran into a similar issue again, and I also hit it on an earlier 6.11.x kernel (probably 6.11.4 or 6.11.5). The UI, and the laptop as a whole, becomes very slow: cursor movement is extremely sluggish, although sound still seems to play normally, even during the amdgpu recover. Sometimes lines showing the DMCUB error appear in dmesg, and sometimes nothing shows up at all.
I might be able to trigger it reliably at some point, but I’m not sure how I could get more detailed logs on the dc_dmub_srv DMCUB error. I can boot with some debug parameter and wait for the issue to trigger (my current uptime is 6 days).
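One candidate for the debug parameter mentioned above is the DRM core's drm.debug bitmask, which can be passed on the kernel command line. Whether it surfaces anything extra about the DMCUB error is not confirmed here, and it makes dmesg very verbose. The following is only a sketch of how the value is composed; the helper name drm_debug_param is hypothetical, and the table lists just a subset of the kernel's DRM debug categories.

```python
# Subset of the DRM debug category bits; the helper below is a made-up name
# used only for illustration.
DRM_DEBUG_BITS = {
    "CORE": 0x01,
    "DRIVER": 0x02,
    "KMS": 0x04,
    "PRIME": 0x08,
    "ATOMIC": 0x10,
    "VBL": 0x20,
}


def drm_debug_param(categories):
    """OR the requested category bits together and format a kernel parameter."""
    mask = 0
    for name in categories:
        mask |= DRM_DEBUG_BITS[name.upper()]
    return f"drm.debug={mask:#x}"


if __name__ == "__main__":
    # Driver and KMS messages only, to keep the log volume manageable.
    print(drm_debug_param(["driver", "kms"]))
```

On NixOS the resulting string would typically be added to the boot.kernelParams list before rebuilding and rebooting.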
In both cases, I triggered an amdgpu_recover manually, and that restored normal behavior. However, the desktop effects in KWin seem to go through a reset and no longer work normally after the recover: for example, when changing desktops, the animation doesn’t work as intended, although there is no visual glitch.
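For reference, the manual recover is assumed here to mean the amdgpu debugfs entry amdgpu_gpu_recover under /sys/kernel/debug/dri/<N>/, where reading the file asks the driver to reset the GPU. That mapping is an assumption about what was used above. The sketch below only locates candidate files; the function names are hypothetical, and actually reading the file needs root and will reset the GPU.

```python
from pathlib import Path


def find_gpu_recover_files(debugfs_root="/sys/kernel/debug"):
    """Return every dri/<N>/amdgpu_gpu_recover entry under debugfs_root, sorted."""
    try:
        return sorted(Path(debugfs_root).glob("dri/*/amdgpu_gpu_recover"))
    except OSError:
        # debugfs is usually readable by root only; treat that as "none found".
        return []


def trigger_recover(path):
    """Read the debugfs file, which is what requests the GPU reset (root only)."""
    return Path(path).read_text()


if __name__ == "__main__":
    # Only list the candidates; nothing is reset unless trigger_recover is called.
    for candidate in find_gpu_recover_files():
        print(candidate)
```

Taking the debugfs root as a parameter keeps the lookup testable against a temporary directory without touching the real GPU.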
I don’t see any way to reliably reproduce the issue. I just use Plasma KDE 6.0.1 (Wayland mostly), Firefox with lots of tabs and some desktop KDE effects/animations which are installed by default.
Here is the dmesg output for the occurrence I just hit, on Linux hostname 6.11.6 #1-NixOS SMP PREEMPT_DYNAMIC Fri Nov 1 01:02:44 UTC 2024 x86_64 GNU/Linux:
[544173.066901] ucsi_acpi USBC000:00: GET_CABLE_PROPERTY failed (-5)
[547164.003747] pcieport 0000:00:08.1: PME: Spurious native interrupt!
[549790.627589] pcieport 0000:00:08.1: PME: Spurious native interrupt!
[551240.944579] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551241.220808] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551241.525301] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551241.802013] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551242.646129] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551242.917583] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551243.930038] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551244.631666] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551244.900847] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[551347.902132] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[551349.824265] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[551349.864679] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[551349.865403] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[551349.865450] [drm] VRAM is lost due to GPU reset!
[551349.865453] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[551349.867396] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[551349.869980] [drm] DMUB hardware initialized: version=0x08004300
[551350.689489] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[551350.689498] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[551350.689501] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[551350.689503] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[551350.689505] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[551350.689507] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[551350.689509] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[551350.689511] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[551350.689513] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[551350.689515] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[551350.689517] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[551350.689519] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[551350.689521] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[551350.692304] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[551350.692309] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[551350.692325] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
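To make excerpts like the one above easier to compare across occurrences, here is a rough Python sketch that counts the DMCUB error lines and measures how long after the last one the GPU reset began. The function name summarize_dmcub and the matched substrings are illustrative choices based on the log lines in this post, not anything provided by the driver.

```python
import re

# Matches the default dmesg prefix "[seconds.micros] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_dmcub(lines):
    """Return (error_count, first_error_ts, last_error_ts, reset_ts)."""
    error_ts = []
    reset_ts = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped dmesg line
        ts, msg = float(m.group(1)), m.group(2)
        if "DMCUB error" in msg:
            error_ts.append(ts)
        elif "GPU reset begin" in msg and reset_ts is None:
            reset_ts = ts  # keep only the first reset in the excerpt
    if not error_ts:
        return 0, None, None, reset_ts
    return len(error_ts), error_ts[0], error_ts[-1], reset_ts


if __name__ == "__main__":
    sample = [
        "[551240.944579] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
        "[551244.900847] amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
        "[551347.902132] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!",
    ]
    count, first, last, reset = summarize_dmcub(sample)
    print(f"{count} DMCUB errors over {last - first:.1f}s, reset {reset - last:.1f}s later")
```

Run against the full excerpt, this would report nine DMCUB errors within about four seconds, with the manually triggered reset starting roughly 103 seconds after the last one.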
Checking the warnings, I also see the logs below from earlier, most likely from plugging in a USB-C dock (ThinkPad USB-C Dock Gen2) with a screen connected to it over HDMI, or after a suspend/resume with the dock still plugged in. Those seem unrelated to the issue above:
[460497.335450] amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[483054.591962] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-70)
[488479.807645] amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
Scrolling back a long way, I saw this stack trace (miru is an Electron app that tends to crash a lot; I believe running it with --no-gpu-sandbox helps a bit, but I’ve never found out exactly why those crashes happen):
[21867.427179] miru[1900449]: segfault at 60 ip 00007f3feba1eed8 sp 00007ffc4264ca00 error 4 in libLLVM-17.so[341eed8,7f3fe9127000+4707000] likely on CPU 15 (core 7, socket 0)
[21867.427195] Code: 24 38 0f b6 4c 24 18 88 4c 24 3c 8b 4c 24 2c 48 8b 54 24 38 89 4c 24 48 0f b6 4c 24 2a 48 89 54 24 40 88 4c 24 4c 48 8b 0c 24 <4c> 8b 51 60 4d 85 d2 4c 89 54 24 18 0f 84 e6 02 00 00 4c 89 6c 24
[21867.940880] miru[1900505]: segfault at 60 ip 00007fc1e601eed8 sp 00007ffdeb3d5840 error 4 in libLLVM-17.so[341eed8,7fc1e3727000+4707000] likely on CPU 10 (core 5, socket 0)
[21867.940896] Code: 24 38 0f b6 4c 24 18 88 4c 24 3c 8b 4c 24 2c 48 8b 54 24 38 89 4c 24 48 0f b6 4c 24 2a 48 89 54 24 40 88 4c 24 4c 48 8b 0c 24 <4c> 8b 51 60 4d 85 d2 4c 89 54 24 18 0f 84 e6 02 00 00 4c 89 6c 24
[21868.585313] miru[1900525]: segfault at 60 ip 00007fb1e6a1eed8 sp 00007ffc3b59e4f0 error 4 in libLLVM-17.so[341eed8,7fb1e4127000+4707000] likely on CPU 11 (core 5, socket 0)
[21868.585335] Code: 24 38 0f b6 4c 24 18 88 4c 24 3c 8b 4c 24 2c 48 8b 54 24 38 89 4c 24 48 0f b6 4c 24 2a 48 89 54 24 40 88 4c 24 4c 48 8b 0c 24 <4c> 8b 51 60 4d 85 d2 4c 89 54 24 18 0f 84 e6 02 00 00 4c 89 6c 24
[21869.310952] traps: VizCompositorTh[1900578] trap int3 ip:564ad3dfd143 sp:7f5eaeffdfa0 error:0 in miru[6c9d143,564acf112000+816b000]
[21870.515856] traps: VizCompositorTh[1900646] trap int3 ip:562bc76fd143 sp:7f0db7bfdfa0 error:0 in miru[6c9d143,562bc2a12000+816b000]
[21871.544647] traps: VizCompositorTh[1900729] trap int3 ip:55f16b2e9143 sp:7feda5dfdfa0 error:0 in miru[6c9d143,55f1665fe000+816b000]
[21872.468059] traps: VizCompositorTh[1900758] trap int3 ip:55b4060ef143 sp:7fd59d1fdfa0 error:0 in miru[6c9d143,55b401404000+816b000]
[21873.033814] traps: VizCompositorTh[1900787] trap int3 ip:557fea97b143 sp:7f51a9dfdfa0 error:0 in miru[6c9d143,557fe5c90000+816b000]
[21873.722247] traps: VizCompositorTh[1900816] trap int3 ip:5592fc0ee143 sp:7f04273fdfa0 error:0 in miru[6c9d143,5592f7403000+816b000]
[21874.304966] traps: miru[1897191] trap int3 ip:5648d12f02ca sp:7ffc65c62aa0 error:0 in miru[5bd92ca,5648cd6c9000+816b000]
[26895.270590] usb 5-1: USB disconnect, device number 2
[26895.270600] usb 5-1.3: USB disconnect, device number 3
[26895.270604] usb 5-1.3.2: USB disconnect, device number 4
[26895.297832] usb 6-1: USB disconnect, device number 2
[26895.297842] r8152-cfgselector 6-1.1: USB disconnect, device number 3
[26895.298025] r8152 6-1.1:1.0 enp195s0f4u1u1: Stop submitting intr, status -108
[26895.332747] [drm] DM_MST: stopping TM on aconnector: 00000000eb34a5dd [id: 112]
[26895.345893] usb 6-1.3: USB disconnect, device number 4
[26895.505905] usb 5-1.3.3: USB disconnect, device number 5
[26895.505914] usb 5-1.3.3.1: USB disconnect, device number 8
[26895.537785] usb 5-1.3.3.2: USB disconnect, device number 7
[26895.630787] amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[26895.630933] ------------[ cut here ]------------
[26895.630935] WARNING: CPU: 13 PID: 3005 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0xd4/0x230 [amdgpu]
[26895.631522] Modules linked in: tcp_diag inet_diag uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc usbhid cdc_ether usbnet r8152 mii libphy ccm qrtr vhost_net vhost vhost_iotlb iptable_mangle xt_CHECKSUM xt_multiport iptable_nat nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay af_packet uhid cmac algif_hash algif_skcipher af_alg bnep amdgpu mt7921e xt_conntrack mt7921_common snd_sof_amd_acp63 mt792x_lib snd_sof_amd_vangogh snd_sof_amd_rembrandt mt76_connac_lib snd_sof_amd_renoir snd_sof_amd_acp xt_policy mt76 snd_sof_pci snd_sof_xtensa_dsp snd_sof ip6t_rpfilter ipt_rpfilter snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd mac80211 soundwire_generic_allocation soundwire_bus xt_pkttype snd_soc_core hid_sensor_als amdxcp snd_hda_codec_realtek drm_exec xt_LOG hid_sensor_trigger snd_compress snd_hda_codec_generic gpu_sched
[26895.631604] nf_log_syslog industrialio_triggered_buffer ac97_bus kfifo_buf snd_pcm_dmaengine drm_buddy snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel xt_tcpudp drm_suballoc_helper edac_mce_amd hid_sensor_iio_common snd_rpl_pci_acp6x snd_intel_dspcfg cros_usbpd_charger leds_cros_ec nft_compat cros_charge_control cros_ec_sysfs mousedev industrialio cros_ec_hwmon cros_ec_debugfs cros_ec_chardev gpio_cros_ec cros_usbpd_logger cros_usbpd_notify led_class_multicolor cros_kbd_led_backlight spd5118 snd_acp_pci snd_intel_sdw_acpi edac_core drm_ttm_helper snd_acp_legacy_common intel_rapl_msr snd_hda_codec cfg80211 ttm amd_atl sp5100_tco snd_pci_acp6x snd_hda_core intel_rapl_common crct10dif_pclmul btusb drm_display_helper cros_ec_dev snd_hwdep crc32_pclmul watchdog snd_pci_acp5x polyval_clmulni amd_pmf btrtl polyval_generic nls_iso8859_1 ghash_clmulni_intel amdtee btintel snd_pcm snd_rn_pci_acp3x nls_cp437 btbcm sha512_ssse3 cec snd_acp_config i2c_piix4 hid_sensor_hub snd_soc_acpi amd_sfh snd_timer i2c_algo_bit
[26895.631707] hid_multitouch btmtk ucsi_acpi vfat cros_ec_lpcs sha256_ssse3 joydev hid_generic fat bluetooth nf_tables sha1_ssse3 cros_ec aesni_intel wmi_bmof evdev gf128mul sch_fq_codel snd platform_profile tiny_power_button crypto_simd rfkill cryptd rapl typec_ucsi video snd_pci_acp3x soundcore libarc4 k10temp i2c_smbus typec i2c_hid_acpi battery thermal roles i2c_hid ac wmi backlight hid uinput wireguard button tee tpm_crb amd_pmc curve25519_x86_64 input_leds libchacha20poly1305 led_class chacha_x86_64 mac_hid poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel serio_raw udp_tunnel loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth tun tap macvlan bridge stp llc kvm_amd ccp kvm fuse efi_pstore configfs nfnetlink efivarfs dmi_sysfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci atkbd xhci_pci_renesas firmware_class libps2 vivaldi_fmap thunderbolt nvme tpm_tis xhci_hcd tpm_tis_core nvme_core tpm crc32c_intel i8042 rng_core nvme_auth
[26895.631818] libaescfb rtc_cmos serio ecdh_generic ecc dm_mod dax
[26895.631826] CPU: 13 UID: 1000 PID: 3005 Comm: .kwin_wayland-w Not tainted 6.11.6 #1-NixOS
[26895.631830] Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
[26895.631832] RIP: 0010:dcn31_program_compbuf_size+0xd4/0x230 [amdgpu]
[26895.632078] Code: 48 8b 43 28 8b 88 b0 01 00 00 48 8b 43 20 0f b6 50 6c 48 8b 43 18 8b b0 14 01 00 00 e8 a5 a9 10 00 85 c0 0f 85 35 01 00 00 90 <0f> 0b 90 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 0f 85 37 01 00
[26895.632081] RSP: 0018:ffffb449c2483760 EFLAGS: 00010202
[26895.632083] RAX: 0000000000000001 RBX: ffff9efe2aedd400 RCX: 000000000000001f
[26895.632085] RDX: 0000000000000000 RSI: 00000000000015bf RDI: ffff9efe22980000
[26895.632087] RBP: 0000000000000004 R08: ffffb449c2483764 R09: 0000000000000019
[26895.632088] R10: ffffb449c24834c8 R11: ffffffff9f53bf68 R12: ffff9f061adc0000
[26895.632090] R13: ffff9efe33c00000 R14: ffff9efe2aedd400 R15: ffff9f061adc5f78
[26895.632091] FS: 00007efeeb191000(0000) GS:ffff9f0c62080000(0000) knlGS:0000000000000000
[26895.632093] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26895.632094] CR2: 00000fdc16f01000 CR3: 0000000116368000 CR4: 0000000000f50ef0
[26895.632096] PKRU: 55555554
[26895.632097] Call Trace:
[26895.632103] <TASK>
[26895.632107] ? __warn+0x80/0x120
[26895.632116] ? dcn31_program_compbuf_size+0xd4/0x230 [amdgpu]
[26895.632354] ? report_bug+0x164/0x190
[26895.632361] ? handle_bug+0x3d/0x80
[26895.632365] ? exc_invalid_op+0x17/0x70
[26895.632367] ? asm_exc_invalid_op+0x1a/0x20
[26895.632374] ? dcn31_program_compbuf_size+0xd4/0x230 [amdgpu]
[26895.632564] dcn20_optimize_bandwidth+0xe4/0x220 [amdgpu]
[26895.632822] dc_commit_state_no_check+0xc29/0xe60 [amdgpu]
[26895.633043] dc_commit_streams+0x2a1/0x400 [amdgpu]
[26895.633289] amdgpu_dm_atomic_commit_tail+0x5ef/0x3c40 [amdgpu]
[26895.633588] ? generic_reg_get+0x21/0x40 [amdgpu]
[26895.633841] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.633847] ? optc1_get_crtc_scanoutpos+0x7b/0xb0 [amdgpu]
[26895.634092] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.634095] ? dc_stream_get_scanoutpos+0x73/0xb0 [amdgpu]
[26895.634313] ? dm_crtc_get_scanoutpos+0x1/0x120 [amdgpu]
[26895.634555] ? ktime_get+0x3a/0xd0
[26895.634563] ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10 [amdgpu]
[26895.634774] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.634779] ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu]
[26895.634987] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.634990] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf3/0x3b0
[26895.635001] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.635004] ? wait_for_completion_timeout+0x135/0x160
[26895.635012] ? srso_alias_return_thunk+0x5/0xfbef5
[26895.635015] ? drm_crtc_get_last_vbltimestamp+0x55/0x90
[26895.635023] commit_tail+0x91/0x130
[26895.635031] drm_atomic_helper_commit+0x11a/0x140
[26895.635036] drm_atomic_commit+0xa8/0xe0
[26895.635042] ? __pfx___drm_printfn_info+0x10/0x10
[26895.635049] drm_mode_atomic_ioctl+0xb1f/0xd70
[26895.635057] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[26895.635060] drm_ioctl_kernel+0xb2/0x110
[26895.635068] drm_ioctl+0x274/0x4e0
[26895.635071] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[26895.635078] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[26895.635294] __x64_sys_ioctl+0x94/0xd0
[26895.635302] do_syscall_64+0xb7/0x200
[26895.635307] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[26895.635312] RIP: 0033:0x7efeefb189cf
[26895.635366] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[26895.635369] RSP: 002b:00007fff335b4da0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[26895.635373] RAX: ffffffffffffffda RBX: 0000000038bd4370 RCX: 00007efeefb189cf
[26895.635375] RDX: 00007fff335b4e90 RSI: 00000000c03864bc RDI: 0000000000000013
[26895.635377] RBP: 00007fff335b4e90 R08: 00007efe8c005230 R09: 0000000038466280
[26895.635379] R10: 0000000037d40ee0 R11: 0000000000000246 R12: 00000000c03864bc
[26895.635380] R13: 0000000000000013 R14: 0000000038bec4f4 R15: 0000000000000018
[26895.635386] </TASK>
[26895.635388] ---[ end trace 0000000000000000 ]---
[27835.218780] ucsi_acpi USBC000:00: unknown error 0
[27835.218801] ucsi_acpi USBC000:00: GET_CABLE_PROPERTY failed (-5)