Fedora 40 on the Framework Laptop 13

Nearly 2 months later and still getting hard unrecoverable freezes under Linux:

2024-07-10T04:41:02.084839Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=792, emitted seq=794
2024-07-10T04:41:02.085072Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 5302 thread firefox-bi:cs0 pid 12020
2024-07-10T04:41:02.085097Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
2024-07-10T04:41:08.333876Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:08.334688Z     amdgpu 0000:c1:00.0: amdgpu: Failed to disable gfxoff!
2024-07-10T04:41:08.926855Z     [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
2024-07-10T04:41:09.223852Z     [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000000 != 0x00000380n
2024-07-10T04:41:10.718284Z     [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
2024-07-10T04:41:10.718404Z     ------------[ cut here ]------------
2024-07-10T04:41:10.718426Z     WARNING: CPU: 6 PID: 34205 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn314_smu.c:159 dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.718453Z     Modules linked in: hid_logitech_hidpp snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi hid_logitech_dj mc r8153_ecm cdc_ether usbnet r8152 mii vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd virtiofs tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp bridge stp llc uinput rfcomm snd_seq_dummy snd_hrtimer uhid nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc binfmt_misc vfat fat snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi snd_hda_codec_realtek soundwire_amd soundwire_generic_allocation snd_hda_codec_generic soundwire_bus snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core btusb intel_rapl_msr mt7921e btrtl amd_atl
2024-07-10T04:41:10.718532Z      btintel mt7921_common intel_rapl_common snd_hda_intel btbcm snd_compress mt792x_lib snd_intel_dspcfg btmtk edac_mce_amd ac97_bus snd_intel_sdw_acpi mt76_connac_lib snd_pcm_dmaengine snd_hda_codec bluetooth snd_rpl_pci_acp6x mt76 snd_acp_pci kvm_amd snd_hda_core cros_ec_lpcs uas snd_acp_legacy_common usb_storage cros_ec snd_pci_acp6x snd_hwdep hid_sensor_als kvm mac80211 snd_seq hid_sensor_trigger wmi_bmof snd_seq_device hid_sensor_iio_common rapl industrialio_triggered_buffer kfifo_buf libarc4 pcspkr snd_pcm industrialio snd_pci_acp5x snd_rn_pci_acp3x cfg80211 snd_acp_config snd_timer snd_soc_acpi snd amd_pmf thunderbolt soundcore snd_pci_acp3x amdtee k10temp amd_sfh rfkill i2c_piix4 tee platform_profile amd_pmc joydev loop nfnetlink zram dm_crypt amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_exec crc32_pclmul crc32c_intel polyval_clmulni gpu_sched polyval_generic nvme drm_suballoc_helper drm_buddy nvme_core ghash_clmulni_intel drm_display_helper sha512_ssse3 hid_multitouch video
2024-07-10T04:41:10.718564Z      sha256_ssse3 ucsi_acpi hid_sensor_hub ccp cec typec_ucsi sha1_ssse3 nvme_auth sp5100_tco typec wmi i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse i2c_dev
2024-07-10T04:41:10.718583Z     CPU: 6 PID: 34205 Comm: kworker/u64:2 Not tainted 6.9.7-200.fc40.x86_64 #1
2024-07-10T04:41:10.718606Z     Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
2024-07-10T04:41:10.718623Z     Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
2024-07-10T04:41:10.718937Z     RIP: 0010:dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.718978Z     Code: be 93 62 01 00 5d 41 5c 41 5d e9 b3 7c de ff 44 89 ea 48 c7 c6 08 c5 3f c1 48 c7 c7 c0 aa f2 c0 e8 4d d3 e7 e4 e9 48 ff ff ff <0f> 0b 48 8b 3b b9 80 84 1e 00 44 89 e2 89 ee e8 74 30 df ff eb b5
2024-07-10T04:41:10.718999Z     RSP: 0018:ffffb4e40117f8b8 EFLAGS: 00010246
2024-07-10T04:41:10.719022Z     RAX: 0000b26e6a2c8b6b RBX: ffff9b3fc5bec400 RCX: 0000000000000006
2024-07-10T04:41:10.719044Z     RDX: 0000000000008ad5 RSI: 00000000000080a9 RDI: 0000b26e6a2c0096
2024-07-10T04:41:10.719062Z     RBP: 000000000000000d R08: 0000000000000000 R09: ffffb4e40117f830
2024-07-10T04:41:10.719079Z     R10: 0000000000000000 R11: 0000000000010000 R12: 0000000000000000
2024-07-10T04:41:10.719096Z     R13: 0000000000000000 R14: ffff9b3fd0049ff8 R15: ffff9b4446200908
2024-07-10T04:41:10.719113Z     FS:  0000000000000000(0000) GS:ffff9b4e21d00000(0000) knlGS:0000000000000000
2024-07-10T04:41:10.719131Z     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-10T04:41:10.719148Z     CR2: 00007f1b00de1000 CR3: 0000000afe428000 CR4: 0000000000f50ef0
2024-07-10T04:41:10.719165Z     PKRU: 55555554
2024-07-10T04:41:10.719182Z     Call Trace:
2024-07-10T04:41:10.719199Z      <TASK>
2024-07-10T04:41:10.720108Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.72015Z       ? __warn.cold+0x8e/0xe8
2024-07-10T04:41:10.720176Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.720207Z      ? handle_bug+0x3c/0x80
2024-07-10T04:41:10.720223Z      ? exc_invalid_op+0x17/0x70
2024-07-10T04:41:10.72024Z       ? asm_exc_invalid_op+0x1a/0x20
2024-07-10T04:41:10.720256Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.720843Z      ? dcn314_smu_send_msg_with_param+0xae/0x190 [amdgpu]
2024-07-10T04:41:10.720885Z      link_set_dpms_off+0xfe/0x980 [amdgpu]
2024-07-10T04:41:10.720904Z      ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.72194Z       ? generic_reg_set_ex+0xa8/0xf0 [amdgpu]
2024-07-10T04:41:10.721986Z      ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.722001Z      ? optc31_set_drr+0x128/0x1d0 [amdgpu]
2024-07-10T04:41:10.722019Z      dcn31_reset_hw_ctx_wrap+0x218/0x440 [amdgpu]
2024-07-10T04:41:10.722971Z      dce110_apply_ctx_to_hw+0x4e/0x320 [amdgpu]
2024-07-10T04:41:10.723012Z      dc_commit_state_no_check+0x618/0x1960 [amdgpu]
2024-07-10T04:41:10.723032Z      dc_commit_streams+0x299/0x5b0 [amdgpu]
2024-07-10T04:41:10.72305Z       ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.723851Z      dm_suspend+0x214/0x270 [amdgpu]
2024-07-10T04:41:10.723887Z      amdgpu_device_ip_suspend_phase1+0x9a/0x180 [amdgpu]
2024-07-10T04:41:10.723908Z      amdgpu_device_ip_suspend+0x29/0x70 [amdgpu]
2024-07-10T04:41:10.724843Z      amdgpu_device_pre_asic_reset+0xcd/0x420 [amdgpu]
2024-07-10T04:41:10.724884Z      amdgpu_device_gpu_recover.cold+0x475/0xb44 [amdgpu]
2024-07-10T04:41:10.724908Z      amdgpu_job_timedout+0x18e/0x1d0 [amdgpu]
2024-07-10T04:41:10.724927Z      drm_sched_job_timedout+0x73/0x100 [gpu_sched]
2024-07-10T04:41:10.724991Z      process_one_work+0x186/0x340
2024-07-10T04:41:10.725039Z      worker_thread+0x278/0x3b0
2024-07-10T04:41:10.725067Z      ? __pfx_worker_thread+0x10/0x10
2024-07-10T04:41:10.725092Z      kthread+0xcf/0x100
2024-07-10T04:41:10.725117Z      ? __pfx_kthread+0x10/0x10
2024-07-10T04:41:10.725142Z      ret_from_fork+0x31/0x50
2024-07-10T04:41:10.725166Z      ? __pfx_kthread+0x10/0x10
2024-07-10T04:41:10.725185Z      ret_from_fork_asm+0x1a/0x30
2024-07-10T04:41:10.725209Z      </TASK>
2024-07-10T04:41:10.725234Z     ---[ end trace 0000000000000000 ]---
2024-07-10T04:41:10.823897Z     [drm] DMUB HPD IRQ callback: link_index=7
2024-07-10T04:41:11.811864Z     [drm] DMUB HPD IRQ callback: link_index=7
2024-07-10T04:41:15.468888Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:15.469606Z     amdgpu 0000:c1:00.0: amdgpu: Failed to power gate VCN!
2024-07-10T04:41:15.46985Z      [drm:vcn_v4_0_stop [amdgpu]] *ERROR* Dpm disable uvd failed, ret = -62. 
2024-07-10T04:41:18.156869Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.157041Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.29886Z      [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.298999Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.440978Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.441112Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.583856Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.583956Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.725833Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.725946Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.867833Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.867899Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.010147Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.010275Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.152852Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.152909Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.29485Z      [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.294916Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.296847Z     amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
2024-07-10T04:41:23.28689Z      ACPI Error: Aborting method \_SB.A018 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.287222Z     ACPI Error: Aborting method \_SB.ALIB due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.287842Z     ACPI Error: Aborting method \_SB.PCI0.GP19.NHI0.PPS3 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.2879Z       ACPI Error: Aborting method \_SB.PCI0.GP19.NHI0._PS3 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:24.788851Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:24.789484Z     amdgpu 0000:c1:00.0: amdgpu: Mode2 reset failed!
2024-07-10T04:41:24.789724Z     amdgpu 0000:c1:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:c1:00.0
2024-07-10T04:41:24.790172Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed
2024-07-10T04:41:24.790546Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -62
2024-07-10T04:41:24.790898Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62

I’ve set up a logging server because none of this ever persists in the journal once the machine locks up. Existing network connections and running processes (like sound playing in the background) seem to keep working, though.
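In case it helps anyone comparing logs, here is a rough sketch of how the collected lines could be scanned for the initial ring timeout and the process that triggered it. It only assumes the two `amdgpu_job_timedout` message formats seen in the log above; the function name `summarize_timeouts` and the `SAMPLE` list are made up for illustration, and other kernel versions may word these messages differently.

```python
import re

# Matches the amdgpu ring-timeout line as it appears in the log above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the follow-up line naming the offending process and thread.
PROCESS_INFO = re.compile(
    r"Process information: process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)

# Two lines copied from the log above, used only as sample input.
SAMPLE = [
    "2024-07-10T04:41:02.084839Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring vcn_unified_0 timeout, signaled seq=792, emitted seq=794",
    "2024-07-10T04:41:02.085072Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "Process information: process RDD Process pid 5302 thread firefox-bi:cs0 pid 12020",
]


def summarize_timeouts(lines):
    """Return one dict per ring timeout, with process info attached if it follows."""
    events = []
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            events.append({
                "timestamp": line.split()[0],  # first whitespace-separated field
                "ring": match["ring"],
                "signaled": int(match["signaled"]),
                "emitted": int(match["emitted"]),
            })
            continue
        match = PROCESS_INFO.search(line)
        # Attach process details to the most recent timeout that lacks them.
        if match and events and "process" not in events[-1]:
            events[-1].update(
                process=match["process"],
                pid=int(match["pid"]),
                thread=match["thread"],
            )
    return events


if __name__ == "__main__":
    for event in summarize_timeouts(SAMPLE):
        print(event)
```

Running it over a whole capture (any iterable of lines, such as an open file) would make it easier to see whether every freeze starts on the same ring and from the same process.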

Other notes:

  • Is extremely reproducible if I let a LibreOffice presentation run on a loop and have hardware acceleration enabled
  • Never seems to happen under load - only when I’m using the machine lightly, and almost always while Firefox or LibreOffice is running
  • Seems to be completely independent of the power profile that’s set, even if I set the performance level of the GPU directly
  • Does not need any external display attached, and doesn’t seem to matter what expansion cards I use
  • Displays will either lock up completely and show the last static image, or just go black
  • Ctrl+Alt+F# does not function - even capslock doesn’t light up - the thing is mostly dead
  • Windows is completely unaffected - if I’m doing something important I’ve switched to using that for the time being

Is there anything at all I can do to further diagnose this? Should I raise a support case?
