Crashes probably related to amdgpu for the past 2 days

For the past 2 days (since Dec. 3, 2025) I have been having random crashes with my FW16. When a crash occurs, the display freezes for a few seconds and then goes black. The machine no longer responds, even though it is still powered (keyboard lights etc. stay on). Switching to another terminal (Ctrl-Alt-Fx) is not possible, and the machine has to be hard rebooted. After some time (minutes or hours) it crashes again, with no clear pattern.

At each crash I get a kernel message of this kind:

2025-12-05T10:52:46.200841+01:00 mobile-009 kernel: RIP: 0010:dcn314_smu_send_msg_with_param+0x11d/0x1c0 [amdgpu]

These messages never appeared before this date. I have tried booting the previous kernel (6.8.0-87-generic), but the crashes persist.

Does anyone have an idea?

Which Linux distro are you using? Ubuntu 24.04, mainline

Which kernel are you using? 6.8.0-88-generic

Which BIOS version are you using? 0.0.3.7

Which Framework Laptop 16 model are you using? AMD Ryzen 7 7840HS w/ Radeon 780M Graphics

Here is a kern.log sample of the crashes. The first error logged is always like:

**kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* ring sdma0 timeout, signaled seq=3503, emitted seq=3505
2025-12-04T10:18:47.568305+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* Process information: process  pid 0 thread  pid 0
2025-12-04T10:18:47.568307+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!
2025-12-04T10:18:53.408206+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2025-12-04T10:18:53.408230+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: Failed to disable gfxoff!**

More context (a longer log sample):

2025-12-04T10:18:47.568273+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* ring sdma0 timeout, signaled seq=3503, emitted seq=3505
2025-12-04T10:18:47.568305+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* Process information: process  pid 0 thread  pid 0
2025-12-04T10:18:47.568307+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!
2025-12-04T10:18:53.408206+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2025-12-04T10:18:53.408230+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: Failed to disable gfxoff!
2025-12-04T10:18:55.437219+01:00 mobile-009 kernel: \[drm\] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x0)
2025-12-04T10:18:57.437211+01:00 mobile-009 kernel: \[drm\] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x0)
2025-12-04T10:18:59.861397+01:00 mobile-009 kernel: ------------\[ cut here \]------------
2025-12-04T10:18:59.861419+01:00 mobile-009 kernel: WARNING: CPU: 6 PID: 597 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn314_smu.c:159 dcn314_smu_send_msg_with_param+0x11d/0x1c0 \[amdgpu\]
2025-12-04T10:18:59.862524+01:00 mobile-009 kernel: Modules linked in: vhost_net vhost vhost_iotlb tap exfat ccm rfcomm cmac algif_hash algif_skcipher af_alg xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 snd_seq_dummy snd_hrtimer xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc qrtr bnep sch_fq_codel msr evdi(O) intel_rapl_msr intel_rapl_common snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp edac_mce_amd snd_sof_pci snd_sof_xtensa_dsp kvm_amd snd_hda_codec_realtek snd_sof snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_utils kvm snd_hda_intel snd_soc_core snd_intel_dspcfg binfmt_misc irqbypass snd_compress snd_intel_sdw_acpi crct10dif_pclmul ac97_bus snd_hda_codec polyval_clmulni snd_pcm_dmaengine snd_hda_core polyval_generic snd_pci_ps mt7921e snd_hwdep uvcvideo btusb ghash_clmulni_intel snd_rpl_pci_acp6x mt7921_common snd_seq_midi videobuf2_vmalloc btrtl sha256_ssse3 snd_acp_pci mt792x_lib snd_seq_midi_event uvc btintel
2025-12-04T10:18:59.862535+01:00 mobile-009 kernel:  sha1_ssse3 nls_iso8859_1 snd_acp_legacy_common mt76_connac_lib hid_sensor_als snd_rawmidi videobuf2_memops btbcm aesni_intel snd_pci_acp6x hid_sensor_trigger mt76 videobuf2_v4l2 btmtk industrialio_triggered_buffer snd_seq snd_pcm crypto_simd videodev bluetooth mac80211 kfifo_buf snd_seq_device snd_pci_acp5x cryptd amd_pmf videobuf2_common hid_sensor_iio_common snd_rn_pci_acp3x ecdh_generic cros_ec_lpcs snd_timer amdgpu ecc snd_acp_config cros_ec mc amdtee rapl industrialio wmi_bmof cfg80211 snd_soc_acpi snd amd_sfh k10temp snd_pci_acp3x i2c_piix4 soundcore libarc4 ccp tee joydev amd_pmc input_leds platform_profile mac_hid amdxcp drm_exec gpu_sched drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 usbhid uas usb_storage hid_multitouch hid_sensor_hub
2025-12-04T10:18:59.862536+01:00 mobile-009 kernel:  hid_generic nvme ucsi_acpi nvme_core i2c_hid_acpi xhci_pci typec_ucsi crc32_pclmul video thunderbolt xhci_pci_renesas i2c_hid nvme_auth typec wmi hid
2025-12-04T10:18:59.862538+01:00 mobile-009 kernel: CPU: 6 PID: 597 Comm: kworker/u32:7 Tainted: G           O       6.8.0-88-generic #89-Ubuntu
2025-12-04T10:18:59.862540+01:00 mobile-009 kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.07 08/27/2025
2025-12-04T10:18:59.862541+01:00 mobile-009 kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout \[gpu_sched\]
2025-12-04T10:18:59.862542+01:00 mobile-009 kernel: RIP: 0010:dcn314_smu_send_msg_with_param+0x11d/0x1c0 \[amdgpu\]
2025-12-04T10:18:59.862543+01:00 mobile-009 kernel: Code: 41 5e 5d 31 d2 31 c9 31 f6 31 ff e9 d8 6d 52 c9 89 da 48 c7 c6 d8 32 b2 c1 48 c7 c7 e0 ec 6b c1 e8 68 b4 95 c8 e9 37 ff ff ff <0f> 0b 49 8b 3c 24 b9 80 84 1e 00 44 89 f2 44 89 ee e8 fd ae de ff
2025-12-04T10:18:59.862544+01:00 mobile-009 kernel: RSP: 0018:ffffd00f86e0b880 EFLAGS: 00010246
2025-12-04T10:18:59.862545+01:00 mobile-009 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
2025-12-04T10:18:59.862546+01:00 mobile-009 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2025-12-04T10:18:59.862546+01:00 mobile-009 kernel: RBP: ffffd00f86e0b8a0 R08: 0000000000000000 R09: 0000000000000000
2025-12-04T10:18:59.862547+01:00 mobile-009 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c50c80b7000
2025-12-04T10:18:59.862548+01:00 mobile-009 kernel: R13: 0000000000000012 R14: 0000000000000007 R15: ffff8c50cbc40000
2025-12-04T10:18:59.862549+01:00 mobile-009 kernel: FS:  0000000000000000(0000) GS:ffff8c677fd00000(0000) knlGS:0000000000000000
2025-12-04T10:18:59.862550+01:00 mobile-009 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-12-04T10:18:59.862551+01:00 mobile-009 kernel: CR2: 00001407a2f5e038 CR3: 000000011c8a8000 CR4: 0000000000f50ef0
2025-12-04T10:18:59.862552+01:00 mobile-009 kernel: PKRU: 55555554
2025-12-04T10:18:59.862553+01:00 mobile-009 kernel: Call Trace:
2025-12-04T10:18:59.862553+01:00 mobile-009 kernel:  <TASK>
2025-12-04T10:18:59.862554+01:00 mobile-009 kernel:  dcn314_smu_set_display_idle_optimization+0x6a/0x80 \[amdgpu\]
2025-12-04T10:18:59.863244+01:00 mobile-009 kernel:  ? dcn10_is_dig_enabled+0x44/0x80 \[amdgpu\]
2025-12-04T10:18:59.863274+01:00 mobile-009 kernel:  dcn314_update_clocks+0x489/0x590 \[amdgpu\]
2025-12-04T10:18:59.864257+01:00 mobile-009 kernel:  dcn20_optimize_bandwidth+0x143/0x290 \[amdgpu\]
2025-12-04T10:18:59.864261+01:00 mobile-009 kernel:  dc_commit_state_no_check+0x98b/0xdc0 \[amdgpu\]
2025-12-04T10:18:59.864262+01:00 mobile-009 kernel:  dc_commit_streams+0x312/0x6b0 \[amdgpu\]
2025-12-04T10:18:59.865267+01:00 mobile-009 kernel:  dm_suspend+0x274/0x2d0 \[amdgpu\]
2025-12-04T10:18:59.865277+01:00 mobile-009 kernel:  amdgpu_device_ip_suspend_phase1+0xb4/0x1c0 \[amdgpu\]
2025-12-04T10:18:59.865278+01:00 mobile-009 kernel:  amdgpu_device_ip_suspend+0x2a/0x80 \[amdgpu\]
2025-12-04T10:18:59.866215+01:00 mobile-009 kernel:  amdgpu_device_pre_asic_reset+0xd1/0x490 \[amdgpu\]
2025-12-04T10:18:59.866219+01:00 mobile-009 kernel:  amdgpu_device_gpu_recover+0x2f6/0x9b0 \[amdgpu\]
2025-12-04T10:18:59.866220+01:00 mobile-009 kernel:  amdgpu_job_timedout+0x182/0x270 \[amdgpu\]
2025-12-04T10:18:59.866225+01:00 mobile-009 kernel:  drm_sched_job_timedout+0x6d/0x110 \[gpu_sched\]
2025-12-04T10:18:59.866234+01:00 mobile-009 kernel:  ? wake_up_process+0x15/0x30
2025-12-04T10:18:59.866242+01:00 mobile-009 kernel:  process_one_work+0x181/0x3a0
2025-12-04T10:18:59.866244+01:00 mobile-009 kernel:  worker_thread+0x306/0x440
2025-12-04T10:18:59.866249+01:00 mobile-009 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
2025-12-04T10:18:59.867180+01:00 mobile-009 kernel:  ? \_raw_spin_lock_irqsave+0xe/0x20
2025-12-04T10:18:59.867183+01:00 mobile-009 kernel:  ? \__pfx_worker_thread+0x10/0x10
2025-12-04T10:18:59.867184+01:00 mobile-009 kernel:  kthread+0xef/0x120
2025-12-04T10:18:59.867185+01:00 mobile-009 kernel:  ? \__pfx_kthread+0x10/0x10
2025-12-04T10:18:59.867185+01:00 mobile-009 kernel:  ret_from_fork+0x44/0x70
2025-12-04T10:18:59.867186+01:00 mobile-009 kernel:  ? \__pfx_kthread+0x10/0x10
2025-12-04T10:18:59.867186+01:00 mobile-009 kernel:  ret_from_fork_asm+0x1b/0x30
2025-12-04T10:18:59.867187+01:00 mobile-009 kernel:  </TASK>
2025-12-04T10:18:59.867187+01:00 mobile-009 kernel: ---\[ end trace 0000000000000000 \]---
2025-12-04T10:19:02.416199+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:02.416224+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:02.561177+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:02.561200+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:02.705201+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:02.705212+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:02.851174+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:02.851183+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:02.996174+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:02.996182+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:03.140188+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:03.140211+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:03.285328+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:03.285338+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:03.429241+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:03.429255+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:03.575176+01:00 mobile-009 kernel: \[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 \[amdgpu\]\] *ERROR* MES failed to response msg=3
2025-12-04T10:19:03.575185+01:00 mobile-009 kernel: \[drm:amdgpu_mes_unmap_legacy_queue \[amdgpu\]\] *ERROR* failed to unmap legacy queue
2025-12-04T10:19:03.577205+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: MODE2 reset
2025-12-04T10:19:09.210223+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2025-12-04T10:19:09.210250+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: Mode2 reset failed!
2025-12-04T10:19:09.210252+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:c2:00.0
2025-12-04T10:19:09.210253+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: GPU reset(1) failed
2025-12-04T10:19:09.210255+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: GPU reset end with ret = -62
2025-12-04T10:19:09.210257+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* GPU Recovery Failed: -62
2025-12-04T10:19:19.312215+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* ring sdma0 timeout, signaled seq=3505, emitted seq=3507
2025-12-04T10:19:19.312237+01:00 mobile-009 kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] *ERROR* Process information: process  pid 0 thread  pid 0
2025-12-04T10:19:19.312238+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!
2025-12-04T10:19:19.312238+01:00 mobile-009 kernel: amdgpu 0000:c2:00.0: amdgpu: Failed to disallow df cstate

I am using the HWE (Hardware Enablement) kernel, which gives me a much more recent kernel and drivers, and I don’t see that issue. I am currently on 6.14.0-36-generic.

You could check your upgrade log and look for recent updates to amdgpu or drm with something like:

cat /var/log/dpkg.log | grep ' install \| upgrade ' | grep amdgpu
2025-12-02 11:30:32 upgrade xserver-xorg-video-amdgpu:amd64 23.0.0-1build1 23.0.0-1ubuntu0.24.04.1

You may have to go back through the log files a bit. Change the /var/log/dpkg.log file name above to something like /var/log/dpkg.log.x[.gz], where x is an integer from 1 to 7; for 2 and above, add .gz and use zcat:

cat /var/log/dpkg.log.1 | grep ' install \| upgrade ' | grep amdgpu
2025-11-20 07:33:38 upgrade libdrm-amdgpu1:amd64 2.4.122-1~ubuntu0.24.04.1 2.4.122-1~ubuntu0.24.04.2

or

zcat /var/log/dpkg.log.2.gz | grep ' install \| upgrade ' | grep amdgpu

If there are recent updates, you could try rolling back the update and seeing if the problem continues to occur.
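
The per-file commands above can be folded into a single pass over every rotated log. This is only a sketch: it assumes `zgrep` (shipped with gzip, and able to read both plain and gzip-compressed files) and the default `dpkg.log*` naming. The function name `recent_gpu_updates` and the package pattern are illustrative choices, not something taken from this thread.

```shell
# List install/upgrade events for graphics-related packages across all
# dpkg logs in a directory (current, rotated, and gzip-compressed).
# recent_gpu_updates is a hypothetical helper name; adjust the pattern as needed.
recent_gpu_updates() {
    # $1 is the log directory; it defaults to /var/log.
    # zgrep reads plain and .gz files alike; -h drops the file-name prefix.
    zgrep -hE ' (install|upgrade) ' "${1:-/var/log}"/dpkg.log* 2>/dev/null \
        | grep -E 'amdgpu|libdrm|mesa|linux-firmware|linux-image' \
        | sort
}
```

Calling `recent_gpu_updates | tail -n 20` should show the most recent matching entries, since each dpkg log line starts with its timestamp and therefore sorts chronologically.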

The output above is from my Kubuntu 24.04 LTS install (using X11, not Wayland) on a Framework 13 with an Intel Ultra 125H (I do not have display issues). Note the December 2, 2025 update to xserver-xorg-video-amdgpu that happened on my system.

I’ve been following the display issues Framework users are reporting on this forum and happened to notice the recent updates to drm and amdgpu. If you’re willing, would you mind reporting back whether you are using X11 or Wayland? I’d check your dmesg output above, but it’s difficult to read with the formatting. You can enclose the output between pairs of triple backticks (```) to make it easier to read.
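
For the X11-versus-Wayland question, the session type is normally exposed in the `XDG_SESSION_TYPE` environment variable on desktop logins. A minimal sketch, run from a terminal inside the graphical session; `session_type` is a hypothetical helper name:

```shell
# Print the display-server type of the current login session.
session_type() {
    # XDG_SESSION_TYPE is normally set at login (for example x11, wayland, tty).
    # Fall back to "unknown" when it is not set.
    printf '%s\n' "${XDG_SESSION_TYPE:-unknown}"
}

session_type
```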


Can you still ssh into it and look at the logs? But as the other person said, you should try a more recent kernel, or at least a different one. If you updated the kernel or firmware packages on the day the issues started, roll them back to the previous version.
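
If ssh is not available, the kernel messages from the crashed boot can often still be read after the reboot. This is a sketch, assuming either a persistent systemd journal or a `/var/log/kern.log` like the one quoted earlier in the thread; `gpu_errors` is a hypothetical helper name:

```shell
# Keep only amdgpu/drm kernel messages from a log stream on stdin.
# gpu_errors is a hypothetical helper name.
gpu_errors() {
    grep -E 'amdgpu|\[drm'
}
```

Typical use would be `journalctl -k -b -1 | gpu_errors` for the previous boot's kernel log, or `gpu_errors < /var/log/kern.log` where that file exists.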

Thanks everybody for your comments and suggestions!
I improved the formatting of the logs, thanks for noting.

@sn6526
I noticed this “xserver-xorg-video-amdgpu” update too, but as I’m running Wayland I suspect it has nothing to do with my issue. The libdrm update is too far in the past to really explain this … I am still puzzled about what happened.
Thanks for pointing out the unreadable log text!

@Jorg_Mertin, @jared_kidd
I will try upgrading the kernel if other measures don’t help. “Other measures” means I am currently trying amdgpu options to change the driver’s behaviour. A couple of hours ago I disabled “gfxoff”, and since then the machine has been running stably, but as you all know, with this sort of random low-level issue that doesn’t mean a lot yet.
If the amdgpu options don’t help, I will move to a newer kernel as you suggest. I am hesitating for now because a new kernel potentially means new problems.
Unfortunately I can’t ssh into the machine, as I am travelling and simply don’t have a second computer with me.
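
For readers wondering what disabling “gfxoff” can look like: one commonly used approach is to clear the GFXOFF bit in the `amdgpu.ppfeaturemask` module parameter. The post does not say which option was actually used, so treat this as an assumption rather than the exact change made here. The sketch assumes the bit value `0x8000` (`PP_GFXOFF_MASK` in the kernel sources); `mask_without_gfxoff` is a hypothetical helper name.

```shell
# Compute an amdgpu.ppfeaturemask value with the GFXOFF bit cleared.
# Assumption: GFXOFF is bit 0x8000 (PP_GFXOFF_MASK in the kernel sources).
mask_without_gfxoff() {
    # $1 is the current mask, decimal or 0x-prefixed hex.
    printf '0x%x\n' "$(( $1 & ~0x8000 ))"
}

# Read the running driver's mask, if the amdgpu module is loaded.
param=/sys/module/amdgpu/parameters/ppfeaturemask
if [ -r "$param" ]; then
    echo "amdgpu.ppfeaturemask=$(mask_without_gfxoff "$(cat "$param")")"
fi
```

The printed string would then go on the kernel command line, for example through `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `sudo update-grub` and a reboot. Keeping a note of the original value makes it easy to revert.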

I will update this thread with the progress.

Thanks frameworkers! :+1:
