Fedora 40 on the Framework Laptop 13

PSA for folks experiencing freezes: please consider booting with a previous version of the kernel. YMMV, but I have experienced freezing issues twice since I’ve owned a Framework, and both times booting with a previous version fixed the issue for me.

As of this note, I frequently get a blank screen and an unresponsive computer on wake from sleep with the latest Fedora kernel (6.8.9-300.fc40). However, I’ve not experienced a freeze running version 6.8.8-300.fc40.x86_64.

The Grub2 boot menu can be enabled as follows:

sudo grub2-editenv - unset menu_auto_hide

With that, you can choose the last known “good” kernel for your env to give it a shot, and, if it works, hold out until, hopefully, a future kernel update fixes the issue.
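If you would rather make the older kernel the default than pick it from the menu on every boot, something along these lines may work. This is only a sketch: `previous_kernel` is a hypothetical helper name, it assumes GNU `sort -V`, and it assumes Fedora's usual `/boot/vmlinuz-<version>` layout. The `rpm` and `grubby` calls are left commented out because they change the boot default.

```shell
#!/bin/sh
# Hypothetical helper: read kernel versions (one per line) on stdin and
# print the second-newest one. With a single version it prints that version.
previous_kernel() {
    sort -V | tail -n 2 | head -n 1
}

# Example use on a Fedora system (commented out; changes the boot default):
#   prev=$(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | previous_kernel)
#   sudo grubby --set-default "/boot/vmlinuz-$prev"
#   sudo grubby --default-kernel    # confirm which kernel is now the default
```

Once a fixed kernel lands, setting the default back to the newest entry the same way should undo this.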

It seems there is an issue where, when you plug in an HDMI device, it becomes the default audio output, and if you do not change the output back before unplugging HDMI, no sound plays through the Framework speakers.

EDIT: the problem for me is no longer occurring after latest updates.

There is an issue that affects color accuracy on Fedora 40 with the Framework 13 AMD when using the default power-profiles-daemon. It now decreases color accuracy to save power in the power saver and balanced profiles.

Currently, there is no direct way to disable this feature through the GUI. It may be beneficial to include this information in the Fedora 40 setup guide Fedora40-amd-fw13.md so that users are aware of this feature and know how to disable it if they prefer.
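For anyone looking for a workaround in the meantime: as far as I know, the power-profiles-daemon documentation describes turning this panel power saving off with the kernel parameter `amdgpu.abmlevel=0`. Treat that parameter as an assumption and check the upstream README for your version. A minimal sketch follows, where `abm_disabled` is a hypothetical helper name and the `grubby` call is commented out because it modifies boot entries.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line string already
# contains amdgpu.abmlevel=0 as a whole word.
abm_disabled() {
    case " $1 " in
        *" amdgpu.abmlevel=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out; modifies boot entries, takes effect after reboot):
#   if ! abm_disabled "$(cat /proc/cmdline)"; then
#       sudo grubby --update-kernel=ALL --args="amdgpu.abmlevel=0"
#   fi
```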

This sounds quite similar to (but not exactly like) my issue:
When I boot F40 on my Framework and have the dock attached while sddm runs, it always hard-freezes the first time. A hard power-off and reboot almost always works. Sounds crazy, but that’s how it behaves.
I tried plugging the dock in first, before power up, and also tried letting it boot to sddm without the dock first, and plugging in the dock just when the login prompt appears. Both show the same behaviour. But I get kernel oops dumps in journald, and it’s always the exact same spot:

Jul 06 10:15:31 fedora kernel: BUG: kernel NULL pointer dereference, address: 0000000000000920
Jul 06 10:15:31 fedora kernel: #PF: supervisor read access in kernel mode
Jul 06 10:15:31 fedora kernel: #PF: error_code(0x0000) - not-present page
Jul 06 10:15:31 fedora kernel: PGD 0 P4D 0 
Jul 06 10:15:31 fedora kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 06 10:15:31 fedora kernel: CPU: 14 PID: 1855 Comm: kwin_wayland Not tainted 6.9.6-200.fc40.x86_64 #1
Jul 06 10:15:31 fedora kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
Jul 06 10:15:31 fedora kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Jul 06 10:15:31 fedora kernel: Code: 04 dc 48 85 c0 74 dd 48 39 50 08 75 d7 48 8b a8 60 04 00 00 48 85 ed 74 cb 48 83 bd f8 11 00 00 00 74 c1 48 8b 85 e8 07 00 00 <80> b8 20 09 00 00 00 75 10 48 8b 85 f0 07 00 00 f6 80 84 02 00 00
Jul 06 10:15:31 fedora kernel: RSP: 0018:ffffa48985257960 EFLAGS: 00010286
Jul 06 10:15:31 fedora kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff90da8d080000
Jul 06 10:15:31 fedora kernel: RDX: ffff90da8bb49800 RSI: 0000000000000001 RDI: ffffa489852579b0
Jul 06 10:15:31 fedora kernel: RBP: ffff90da8a1fe000 R08: ffffa48985257980 R09: 0000000000000000
Jul 06 10:15:31 fedora kernel: R10: ffffa48985257a22 R11: ffff90da9cc43800 R12: ffff90dad1800000
Jul 06 10:15:31 fedora kernel: R13: ffff90daaf38a080 R14: 0000000000000000 R15: 0000000000000003
Jul 06 10:15:31 fedora kernel: FS:  00007f0f1dda5b40(0000) GS:ffff90e1de900000(0000) knlGS:0000000000000000
Jul 06 10:15:31 fedora kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 06 10:15:31 fedora kernel: CR2: 0000000000000920 CR3: 000000013041e000 CR4: 0000000000f50ef0
Jul 06 10:15:31 fedora kernel: PKRU: 55555554
Jul 06 10:15:31 fedora kernel: Call Trace:
Jul 06 10:15:31 fedora kernel:  <TASK>
Jul 06 10:15:31 fedora kernel:  ? __die_body.cold+0x19/0x27
Jul 06 10:15:31 fedora kernel:  ? page_fault_oops+0x15a/0x2c0
Jul 06 10:15:31 fedora kernel:  ? exc_page_fault+0x7e/0x180
Jul 06 10:15:31 fedora kernel:  ? asm_exc_page_fault+0x26/0x30
Jul 06 10:15:31 fedora kernel:  ? is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Jul 06 10:15:31 fedora kernel:  pre_validate_dsc+0x22a/0x730 [amdgpu]
Jul 06 10:15:31 fedora kernel:  ? dm_update_crtc_state+0x420/0x810 [amdgpu]
Jul 06 10:15:31 fedora kernel:  amdgpu_dm_atomic_check+0x8f0/0x1570 [amdgpu]
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? __ww_mutex_lock.constprop.0+0x5b/0x9a0
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  drm_atomic_check_only+0x633/0xae0
Jul 06 10:15:31 fedora kernel:  drm_mode_atomic_ioctl+0x853/0xd20
Jul 06 10:15:31 fedora kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Jul 06 10:15:31 fedora kernel:  drm_ioctl_kernel+0xb0/0x100
Jul 06 10:15:31 fedora kernel:  drm_ioctl+0x28b/0x540
Jul 06 10:15:31 fedora kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Jul 06 10:15:31 fedora kernel:  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
Jul 06 10:15:31 fedora kernel:  __x64_sys_ioctl+0x94/0xd0
Jul 06 10:15:31 fedora kernel:  do_syscall_64+0x82/0x160
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? do_syscall_64+0x8e/0x160
Jul 06 10:15:31 fedora kernel:  ? do_syscall_64+0x8e/0x160
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? do_syscall_64+0x8e/0x160
Jul 06 10:15:31 fedora kernel:  ? do_syscall_64+0x8e/0x160
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 06 10:15:31 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 06 10:15:31 fedora kernel: RIP: 0033:0x7f0f23f26d5d
Jul 06 10:15:31 fedora kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
Jul 06 10:15:31 fedora kernel: RSP: 002b:00007ffde49ff3e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 06 10:15:31 fedora kernel: RAX: ffffffffffffffda RBX: 000055e15a03de60 RCX: 00007f0f23f26d5d
Jul 06 10:15:31 fedora kernel: RDX: 00007ffde49ff4d0 RSI: 00000000c03864bc RDI: 0000000000000011
Jul 06 10:15:31 fedora kernel: RBP: 00007ffde49ff430 R08: 000055e15a03e088 R09: 0000000000000007
Jul 06 10:15:31 fedora kernel: R10: 000055e15a03e010 R11: 0000000000000246 R12: 00007ffde49ff4d0
Jul 06 10:15:31 fedora kernel: R13: 00000000c03864bc R14: 0000000000000011 R15: 0000000000000004
Jul 06 10:15:31 fedora kernel:  </TASK>
Jul 06 10:15:31 fedora kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc r8153_ecm cdc_ether usbnet r8152 mii rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 n>
Jul 06 10:15:31 fedora kernel:  snd_rpl_pci_acp6x mac80211 snd_acp_pci snd_hda_core snd_acp_legacy_common cros_ec_lpcs snd_pci_acp6x snd_hwdep cros_ec snd_seq snd_seq_device hid_sensor_als libarc4 hid_sensor_trigger snd_pci_acp5x snd_pcm rapl hid_sensor_iio_common industrialio_triggered_buffer wmi_bmof cfg80211 k>
Jul 06 10:15:31 fedora kernel: CR2: 0000000000000920
Jul 06 10:15:31 fedora kernel: ---[ end trace 0000000000000000 ]---

I have multiple of these dumps and all of them crash in is_dsc_need_re_compute+0xef/0x3b0. There have been some additional fixes to that function in 6.10 that could (in my basic understanding of C) prevent null dereferences. I’ll likely wait for that to hit F40 and see if it fixes the issue.
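To check whether several dumps really fault at the same place, one option is to pull the kernel-mode `RIP:` lines out of the log and list the distinct locations. This is a rough sketch. `crash_sites` is a hypothetical helper name, and it assumes the oops lines look like the ones pasted above (`RIP: 0010:<symbol>+<offset>/<size>`).

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each distinct
# kernel-mode faulting location (the symbol after "RIP: 0010:").
crash_sites() {
    sed -n 's/.*RIP: 0010:\([^ ]*\).*/\1/p' | sort -u
}

# Example use against the previous boot's kernel messages:
#   journalctl -k -b -1 | crash_sites
```

If this prints a single line across several boots, the crashes are very likely the same bug.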

I have the same issue. Unplugging the dock, rebooting after an update, and then plugging it back in…just works. Leave it plugged in and you run into an issue where it boots slowly, hits the display manager, and never gets to the graphical target. I ran into this same problem months ago, and it ended up being one of the xdg packages.

Nearly 2 months later and still getting hard unrecoverable freezes under Linux:

2024-07-10T04:41:02.084839Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=792, emitted seq=794
2024-07-10T04:41:02.085072Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 5302 thread firefox-bi:cs0 pid 12020
2024-07-10T04:41:02.085097Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
2024-07-10T04:41:08.333876Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:08.334688Z     amdgpu 0000:c1:00.0: amdgpu: Failed to disable gfxoff!
2024-07-10T04:41:08.926855Z     [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
2024-07-10T04:41:09.223852Z     [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000000 != 0x00000380n
2024-07-10T04:41:10.718284Z     [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
2024-07-10T04:41:10.718404Z     ------------[ cut here ]------------
2024-07-10T04:41:10.718426Z     WARNING: CPU: 6 PID: 34205 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn314_smu.c:159 dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.718453Z     Modules linked in: hid_logitech_hidpp snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi hid_logitech_dj mc r8153_ecm cdc_ether usbnet r8152 mii vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd virtiofs tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp bridge stp llc uinput rfcomm snd_seq_dummy snd_hrtimer uhid nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc binfmt_misc vfat fat snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi snd_hda_codec_realtek soundwire_amd soundwire_generic_allocation snd_hda_codec_generic soundwire_bus snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core btusb intel_rapl_msr mt7921e btrtl amd_atl
2024-07-10T04:41:10.718532Z      btintel mt7921_common intel_rapl_common snd_hda_intel btbcm snd_compress mt792x_lib snd_intel_dspcfg btmtk edac_mce_amd ac97_bus snd_intel_sdw_acpi mt76_connac_lib snd_pcm_dmaengine snd_hda_codec bluetooth snd_rpl_pci_acp6x mt76 snd_acp_pci kvm_amd snd_hda_core cros_ec_lpcs uas snd_acp_legacy_common usb_storage cros_ec snd_pci_acp6x snd_hwdep hid_sensor_als kvm mac80211 snd_seq hid_sensor_trigger wmi_bmof snd_seq_device hid_sensor_iio_common rapl industrialio_triggered_buffer kfifo_buf libarc4 pcspkr snd_pcm industrialio snd_pci_acp5x snd_rn_pci_acp3x cfg80211 snd_acp_config snd_timer snd_soc_acpi snd amd_pmf thunderbolt soundcore snd_pci_acp3x amdtee k10temp amd_sfh rfkill i2c_piix4 tee platform_profile amd_pmc joydev loop nfnetlink zram dm_crypt amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_exec crc32_pclmul crc32c_intel polyval_clmulni gpu_sched polyval_generic nvme drm_suballoc_helper drm_buddy nvme_core ghash_clmulni_intel drm_display_helper sha512_ssse3 hid_multitouch video
2024-07-10T04:41:10.718564Z      sha256_ssse3 ucsi_acpi hid_sensor_hub ccp cec typec_ucsi sha1_ssse3 nvme_auth sp5100_tco typec wmi i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse i2c_dev
2024-07-10T04:41:10.718583Z     CPU: 6 PID: 34205 Comm: kworker/u64:2 Not tainted 6.9.7-200.fc40.x86_64 #1
2024-07-10T04:41:10.718606Z     Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
2024-07-10T04:41:10.718623Z     Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
2024-07-10T04:41:10.718937Z     RIP: 0010:dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.718978Z     Code: be 93 62 01 00 5d 41 5c 41 5d e9 b3 7c de ff 44 89 ea 48 c7 c6 08 c5 3f c1 48 c7 c7 c0 aa f2 c0 e8 4d d3 e7 e4 e9 48 ff ff ff <0f> 0b 48 8b 3b b9 80 84 1e 00 44 89 e2 89 ee e8 74 30 df ff eb b5
2024-07-10T04:41:10.718999Z     RSP: 0018:ffffb4e40117f8b8 EFLAGS: 00010246
2024-07-10T04:41:10.719022Z     RAX: 0000b26e6a2c8b6b RBX: ffff9b3fc5bec400 RCX: 0000000000000006
2024-07-10T04:41:10.719044Z     RDX: 0000000000008ad5 RSI: 00000000000080a9 RDI: 0000b26e6a2c0096
2024-07-10T04:41:10.719062Z     RBP: 000000000000000d R08: 0000000000000000 R09: ffffb4e40117f830
2024-07-10T04:41:10.719079Z     R10: 0000000000000000 R11: 0000000000010000 R12: 0000000000000000
2024-07-10T04:41:10.719096Z     R13: 0000000000000000 R14: ffff9b3fd0049ff8 R15: ffff9b4446200908
2024-07-10T04:41:10.719113Z     FS:  0000000000000000(0000) GS:ffff9b4e21d00000(0000) knlGS:0000000000000000
2024-07-10T04:41:10.719131Z     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-10T04:41:10.719148Z     CR2: 00007f1b00de1000 CR3: 0000000afe428000 CR4: 0000000000f50ef0
2024-07-10T04:41:10.719165Z     PKRU: 55555554
2024-07-10T04:41:10.719182Z     Call Trace:
2024-07-10T04:41:10.719199Z      <TASK>
2024-07-10T04:41:10.720108Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.72015Z       ? __warn.cold+0x8e/0xe8
2024-07-10T04:41:10.720176Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.720207Z      ? handle_bug+0x3c/0x80
2024-07-10T04:41:10.720223Z      ? exc_invalid_op+0x17/0x70
2024-07-10T04:41:10.72024Z       ? asm_exc_invalid_op+0x1a/0x20
2024-07-10T04:41:10.720256Z      ? dcn314_smu_send_msg_with_param+0x108/0x190 [amdgpu]
2024-07-10T04:41:10.720843Z      ? dcn314_smu_send_msg_with_param+0xae/0x190 [amdgpu]
2024-07-10T04:41:10.720885Z      link_set_dpms_off+0xfe/0x980 [amdgpu]
2024-07-10T04:41:10.720904Z      ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.72194Z       ? generic_reg_set_ex+0xa8/0xf0 [amdgpu]
2024-07-10T04:41:10.721986Z      ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.722001Z      ? optc31_set_drr+0x128/0x1d0 [amdgpu]
2024-07-10T04:41:10.722019Z      dcn31_reset_hw_ctx_wrap+0x218/0x440 [amdgpu]
2024-07-10T04:41:10.722971Z      dce110_apply_ctx_to_hw+0x4e/0x320 [amdgpu]
2024-07-10T04:41:10.723012Z      dc_commit_state_no_check+0x618/0x1960 [amdgpu]
2024-07-10T04:41:10.723032Z      dc_commit_streams+0x299/0x5b0 [amdgpu]
2024-07-10T04:41:10.72305Z       ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-10T04:41:10.723851Z      dm_suspend+0x214/0x270 [amdgpu]
2024-07-10T04:41:10.723887Z      amdgpu_device_ip_suspend_phase1+0x9a/0x180 [amdgpu]
2024-07-10T04:41:10.723908Z      amdgpu_device_ip_suspend+0x29/0x70 [amdgpu]
2024-07-10T04:41:10.724843Z      amdgpu_device_pre_asic_reset+0xcd/0x420 [amdgpu]
2024-07-10T04:41:10.724884Z      amdgpu_device_gpu_recover.cold+0x475/0xb44 [amdgpu]
2024-07-10T04:41:10.724908Z      amdgpu_job_timedout+0x18e/0x1d0 [amdgpu]
2024-07-10T04:41:10.724927Z      drm_sched_job_timedout+0x73/0x100 [gpu_sched]
2024-07-10T04:41:10.724991Z      process_one_work+0x186/0x340
2024-07-10T04:41:10.725039Z      worker_thread+0x278/0x3b0
2024-07-10T04:41:10.725067Z      ? __pfx_worker_thread+0x10/0x10
2024-07-10T04:41:10.725092Z      kthread+0xcf/0x100
2024-07-10T04:41:10.725117Z      ? __pfx_kthread+0x10/0x10
2024-07-10T04:41:10.725142Z      ret_from_fork+0x31/0x50
2024-07-10T04:41:10.725166Z      ? __pfx_kthread+0x10/0x10
2024-07-10T04:41:10.725185Z      ret_from_fork_asm+0x1a/0x30
2024-07-10T04:41:10.725209Z      </TASK>
2024-07-10T04:41:10.725234Z     ---[ end trace 0000000000000000 ]---
2024-07-10T04:41:10.823897Z     [drm] DMUB HPD IRQ callback: link_index=7
2024-07-10T04:41:11.811864Z     [drm] DMUB HPD IRQ callback: link_index=7
2024-07-10T04:41:15.468888Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:15.469606Z     amdgpu 0000:c1:00.0: amdgpu: Failed to power gate VCN!
2024-07-10T04:41:15.46985Z      [drm:vcn_v4_0_stop [amdgpu]] *ERROR* Dpm disable uvd failed, ret = -62. 
2024-07-10T04:41:18.156869Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.157041Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.29886Z      [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.298999Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.440978Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.441112Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.583856Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.583956Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.725833Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.725946Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:18.867833Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:18.867899Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.010147Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.010275Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.152852Z     [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.152909Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.29485Z      [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-07-10T04:41:19.294916Z     [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-07-10T04:41:19.296847Z     amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
2024-07-10T04:41:23.28689Z      ACPI Error: Aborting method \_SB.A018 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.287222Z     ACPI Error: Aborting method \_SB.ALIB due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.287842Z     ACPI Error: Aborting method \_SB.PCI0.GP19.NHI0.PPS3 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:23.2879Z       ACPI Error: Aborting method \_SB.PCI0.GP19.NHI0._PS3 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
2024-07-10T04:41:24.788851Z     amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001A SMN_C2PMSG_82:0x00000000
2024-07-10T04:41:24.789484Z     amdgpu 0000:c1:00.0: amdgpu: Mode2 reset failed!
2024-07-10T04:41:24.789724Z     amdgpu 0000:c1:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:c1:00.0
2024-07-10T04:41:24.790172Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed
2024-07-10T04:41:24.790546Z     amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -62
2024-07-10T04:41:24.790898Z     [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62

I’ve set up a logging server since none of this ever persists in the journal once it locks up, but existing networking and processes (like sound in the background) seem to be OK.

Other notes:

  • Is extremely reproducible if I let a LibreOffice presentation run on a loop and have hardware acceleration enabled
  • Never seems to happen under load - only if using the machine lightly, and almost always if it’s Firefox or LibreOffice
  • Seems to be completely independent of the power profile that’s set, even if I set the performance level of the GPU directly
  • Does not need any external display attached, and doesn’t seem to matter what expansion cards I use
  • Displays will either lock up completely and show the last static image, or just go black
  • Ctrl+Alt+F# does not function - even capslock doesn’t light up - the thing is mostly dead
  • Windows is completely unaffected - if I’m doing something important I’ve switched to using that for the time being

Is there anything at all I can do to further diagnose this? Should I raise a support case?


Do you have a site or (sharable) document that reproduces it?

I would suggest reporting this upstream to the Linux AMDGPU developers at the drm/amd project on GitLab, but it would also be good to see whether I or someone else can reproduce it.

Turns out I was somehow missing the command line tool used by the fedora script to change brightness. I installed via DNF and keys work again!

Hey, I have the same issue. Can I know which command line tool you are referring to?

dnf install light

I don’t know if they simply changed tools between 39 and 40. But I don’t recall changing anything manually.

Name : light
Version : 1.2.2
Release : 11.fc40
Architecture : x86_64
Size : 81 k
Source : light-1.2.2-11.fc40.src.rpm
Repository : @System
From repo : fedora
Summary : Control backlight controllers
URL : http://haikarainen.github.io/light/
License : GPL-3.0-only
Description : Light is a program to control backlight controllers under GNU/Linux,
: it is the successor of lightscript, which was a bash script
: with the same purpose, and tries to maintain the same functionality.
:
: Features
:
: - Works excellent where other software have been proven unusable
: or problematic, thanks to how it operates internally
: and the fact that it does not rely on X.
: - Can automatically figure out the best controller to use,
: making full use of underlying hardware.
: - Possibility to set a minimum brightness value, as some controllers
: set the screen to be pitch black at a value of 0 (or higher).
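If your brightness keys are dead too, a quick check along these lines may tell you whether the tool is missing. This is a sketch that assumes, as the post above suggests, that the brightness-key script calls `light`. `have_tool` is a hypothetical helper name, and the install and brightness commands are commented out.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Example use (commented out; installs a package and changes brightness):
#   have_tool light || sudo dnf install light
#   light -G       # print current brightness as a percentage
#   light -A 10    # raise brightness by 10 percentage points
#   light -U 10    # lower brightness by 10 percentage points
```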


Hi there, I’ve got exactly the same symptoms as yours under the same conditions.

I went through a lot of trial-and-error debugging. It was also quite difficult to link the dmesg or journalctl output to the exact problem.

Today I started using a dual-monitor setup and left dmesg running on the laptop screen (while using virt-manager on an external monitor). Here is what I saw.

I started to hear some sound artifacts when the “amdgpu: Failed to export SMU metrics table!” message occurred.

After that, the system lagged, music looped, and Bluetooth hung, and it “recovered” a few times before ending up in a continuous loop of:
`amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!`
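To line the symptoms up with the log, it can help to find when the SMU message first shows up in the previous boot. This is a minimal sketch. `first_smu_stall` is a hypothetical helper name, and it assumes journalctl's default short timestamp format, where the first three fields are month, day and time.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the timestamp
# (first three fields) of the first "SMU: I'm not done ..." message.
# The "." in the pattern stands in for the apostrophe to keep quoting simple.
first_smu_stall() {
    awk '/SMU: I.m not done with your previous command/ { print $1, $2, $3; exit }'
}

# Example use against the previous boot's kernel messages:
#   sudo journalctl -b -1 -k | first_smu_stall
```

Comparing that timestamp with when the audio glitches started should show whether the SMU stall comes before or after the visible symptoms.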

Here is the full output:

sudo journalctl -b -1 -kp 0..6 --since 21:01:00
Jul 29 21:03:54 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:54 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:59 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:59 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:04 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:04 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:09 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:09 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:14 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:14 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:23 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:23 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:27 fedora kernel: perf: interrupt took too long (2639 > 2500), lowering kernel.perf_event_max_sample_rate to >
Jul 29 21:04:28 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:28 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:33 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:33 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:38 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:38 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:43 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:43 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:48 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:48 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:53 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:53 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:04:58 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:04:58 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:03 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:03 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:08 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:08 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:13 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:13 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:27 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
nolann@fedora:~/Documents/bash$ sudo journalctl -b -1 -kp 0..6 --since 21:01:00
Jul 29 21:03:05 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:05 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:10 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:10 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:15 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:15 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:20 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:20 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:24 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:24 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:29 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:29 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:34 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:34 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:39 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:39 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:44 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:44 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:49 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:03:49 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:03:52 fedora kernel: usb 1-4: reset full-speed USB device number 3 using xhci_hcd
Jul 29 21:03:53 fedora kernel: usb 1-4: reset full-speed USB device number 3 using xhci_hcd
Jul 29 21:05:13 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:18 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:27 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:27 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:32 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:32 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:37 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:37 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:42 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:42 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:47 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:47 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:52 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:52 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:05:57 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:05:57 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:02 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:02 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:07 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:07 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:12 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:12 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:17 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:17 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:22 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:26 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:26 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:31 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:31 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:36 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:36 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:41 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:41 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:46 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:46 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:51 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:51 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:06:56 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:06:56 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:01 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:01 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:06 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:06 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:11 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:11 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:16 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:16 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:21 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:21 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:25 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:25 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:30 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:30 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:35 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:35 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:40 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:40 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:45 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:45 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:50 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:50 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:07:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:07:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:00 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:00 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:05 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:05 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:10 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:10 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:15 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:15 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:18 fedora kernel: usb 1-1: USB disconnect, device number 2
Jul 29 21:08:20 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:20 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:21 fedora kernel: usb 1-1: new full-speed USB device number 5 using xhci_hcd
Jul 29 21:08:21 fedora kernel: usb 1-1: New USB device found, idVendor=32ac, idProduct=0002, bcdDevice= 0.00
Jul 29 21:08:21 fedora kernel: usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jul 29 21:08:21 fedora kernel: usb 1-1: Product: HDMI Expansion Card
Jul 29 21:08:21 fedora kernel: usb 1-1: Manufacturer: Framework
Jul 29 21:08:21 fedora kernel: usb 1-1: SerialNumber: 11AD1D00169F400930110B00
Jul 29 21:08:24 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=947448, emi>
Jul 29 21:08:24 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 259>
Jul 29 21:08:24 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Jul 29 21:08:25 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:25 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:30 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:30 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:35 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:35 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!
Jul 29 21:08:38 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:38 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:39 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:39 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2

Jul 29 21:08:44 fedora kernel: ------------[ cut here ]------------
Jul 29 21:08:44 fedora kernel: WARNING: CPU: 9 PID: 10442 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn314/dcn31>
Jul 29 21:08:44 fedora kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp>
Jul 29 21:08:44 fedora kernel:  btusb mt76_connac_lib snd_pci_acp6x btrtl snd_hwdep kvm_amd hid_sensor_als mt76 btintel sn>
Jul 29 21:08:44 fedora kernel: CPU: 9 PID: 10442 Comm: kworker/u48:2 Not tainted 6.9.11-200.fc40.x86_64 #1
Jul 29 21:08:44 fedora kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP05, BIOS 03.05 03/29/2024
Jul 29 21:08:44 fedora kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul 29 21:08:44 fedora kernel: RIP: 0010:dcn314_smu_send_msg_with_param+0x108/0x190 
Jul 29 21:08:44 fedora kernel:  </TASK>
Jul 29 21:08:44 fedora kernel: ---[ end trace 0000000000000000 ]---
Jul 29 21:08:45 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:45 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:45 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:45 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:46 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:46 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:46 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:46 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:49 fedora kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
Jul 29 21:08:49 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:49 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:49 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:49 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:49 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:49 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: ACPI Error: Aborting method \_SB.A018 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628>
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to r>
Jul 29 21:08:50 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 29 21:08:50 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Jul 29 21:08:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x>
Jul 29 21:08:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Mode2 reset failed!
Jul 29 21:08:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:c1:00.0
Jul 29 21:08:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) failed
Jul 29 21:08:55 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -62
Jul 29 21:08:55 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
Jul 29 21:08:55 fedora kernel: ACPI Error: Aborting method \_SB.A032 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628>
Jul 29 21:08:55 fedora kernel: ACPI Error: Aborting method \_SB.ALIB due to previous error (AE_AML_LOOP_TIMEOUT) (20230628>
Jul 29 21:08:55 fedora kernel: ACPI Error: Aborting method \_SB.APX8 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628>
Jul 29 21:08:55 fedora kernel: ACPI Error: Aborting method \_SB.PMF.PMF8 due to previous error (AE_AML_LOOP_TIMEOUT) (2023>
Jul 29 21:08:55 fedora kernel: ACPI Error: Aborting method \_SB.PMF.APMF due to previous error (AE_AML_LOOP_TIMEOUT) (2023>
Jul 29 21:08:55 fedora kernel: amd-pmf AMDI0102:00: APMF method:8 call failed
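
For anyone comparing logs like the one above, a small helper that collapses the repeated lines makes the order of events easier to see (SMU timeouts first, then the ring timeout, then the failed reset). This is only a minimal Python sketch. The function name `summarize` and the prefix pattern are my own assumptions, written against the journalctl short format shown in this thread.

```python
import re
from collections import Counter

# Matches the "Mon DD HH:MM:SS host kernel: " prefix seen in the logs above.
# Assumption: journalctl short output, optionally with fractional seconds.
PREFIX = re.compile(r"^\w{3}\s+\d+ [\d:.]+ \S+ kernel: ")


def summarize(lines):
    """Return (message, count) pairs in order of first appearance."""
    counts = Counter()  # keeps insertion order, like a dict
    for line in lines:
        message = PREFIX.sub("", line.rstrip("\n"))
        if message.strip():
            counts[message] += 1
    return list(counts.items())


sample = [
    "Jul 29 21:08:15 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!",
    "Jul 29 21:08:20 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to export SMU metrics table!",
    "Jul 29 21:08:24 fedora kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!",
]
for message, count in summarize(sample):
    print(f"{count:4d}x {message}")
```

To run it on a real boot, the output of `journalctl -k -b -1` (kernel messages from the previous boot) can be read from `sys.stdin` and passed to `summarize` instead of `sample`.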

Any further development or solution would be welcome :sweat_smile:

To be honest, I am sick of not being able to use my laptop the way I wanted to, almost 2 months after the purchase. I went through every more or less supported OS with the latest updates, but nothing removed the problem.

I am not 100% sure that this is a hardware problem or that the Framework team is at fault. But it seems that some people with the same configuration do not have this issue at all. Is that a sufficient reason to ask for a mainboard exchange? (I just want something that finally works without crashing every 15 minutes, without having to wait 4 months for a driver release :confused:)

PS: the configuration comes in the next message; there are not enough characters left here.

Here is the information on my configuration:

system:
  Kernel: 6.9.11-200.fc40.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.41-37.fc40
  Desktop: GNOME v: 46.3.1 Distro: Fedora Linux 40 (Workstation Edition)
Machine:
  Type: Laptop System: Framework product: Laptop 13 (AMD Ryzen 7040Series)
    v: A5 serial: <superuser required>
  Mobo: Framework model: FRANMDCP05 v: A5 serial: <superuser required>
    UEFI: INSYDE v: 03.05 date: 03/29/2024
Battery:
  ID-1: BAT1 charge: 45.2 Wh (87.8%) condition: 51.5/55.0 Wh (93.6%)
    volts: 16.7 min: 15.4 model: NVT Framewo status: discharging
CPU:
  Info: 6-core model: AMD Ryzen 5 7640U w/ Radeon 760M Graphics bits: 64
    type: MT MCP arch: Zen 4 rev: 1 cache: L1: 384 KiB L2: 6 MiB L3: 16 MiB
  Speed (MHz): avg: 655 high: 3468 min/max: 400/4971 cores: 1: 400 2: 400
    3: 400 4: 400 5: 400 6: 400 7: 400 8: 400 9: 400 10: 400 11: 3468 12: 400
    bogomips: 83837
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Phoenix1 vendor: Framework driver: amdgpu v: kernel
    arch: RDNA-3 bus-ID: c1:00.0 temp: 39.0 C
  Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 24.1.1 driver: X:
    loaded: amdgpu unloaded: fbdev,modesetting,vesa dri: radeonsi gpu: amdgpu
    resolution: 2256x1504~60Hz
  API: OpenGL v: 4.6 vendor: amd mesa v: 24.1.4 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon 760M (radeonsi gfx1103_r1 LLVM
    18.1.6 DRM 3.57 6.9.11-200.fc40.x86_64)
  API: Vulkan v: 1.3.283 drivers: N/A surfaces: xcb,xlib devices: 2
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
Audio:
  Device-1: AMD Rembrandt Radeon High Definition Audio vendor: Framework
    driver: snd_hda_intel v: kernel bus-ID: c1:00.1
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor vendor: Framework
    driver: snd_pci_ps v: kernel bus-ID: c1:00.5
  Device-3: AMD Family 17h/19h HD Audio vendor: Framework
    driver: snd_hda_intel v: kernel bus-ID: c1:00.6
  API: ALSA v: k6.9.11-200.fc40.x86_64 status: kernel-api
  Server-1: JACK v: 1.9.22 status: off
  Server-2: PipeWire v: 1.0.7 status: active
Network:
  Device-1: MEDIATEK MT7922 802.11ax PCI Express Wireless Network Adapter
    driver: mt7921e v: kernel bus-ID: 01:00.0
  IF: wlp1s0 state: up mac: <filter>
Bluetooth:
  Device-1: MediaTek Wireless_Device driver: btusb v: 0.8 type: USB
    bus-ID: 1-5:4
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
    lmp-v: 11
Drives:
  Local Storage: total: 1.82 TiB used: 256.08 GiB (13.7%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: WD BLACK SN770 2TB
    size: 1.82 TiB temp: 34.9 C
Partition:
  ID-1: / size: 960.16 GiB used: 136.54 GiB (14.2%) fs: ext4
    dev: /dev/nvme0n1p1
  ID-2: /boot/efi size: 4.87 GiB used: 19 MiB (0.4%) fs: vfat
    dev: /dev/nvme0n1p3
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: N/A mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 28 GiB available: 27.21 GiB used: 4.49 GiB (16.5%)
  Processes: 393 Uptime: 37m Init: systemd target: graphical (5)
  Packages: 6 Compilers: gcc: 14.1.1 Shell: Bash v: 5.2.26 inxi: 3.3.34

Here is my GRUB configuration:

GRUB_TIMEOUT=5
GRUB_TIMEOUT_STYLE=menu
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
# Trying adding amdgpu.dc=0 and amdgpu.dpm=0 one after another
GRUB_CMDLINE_LINUX="rhgb quiet amdgpu.sg_display=0 amdgpu.ppfeaturemask=0xfff7ffff amd_pstate=active"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true


# After that use sudo grub2-mkconfig -o /boot/grub2/grub.cfg and reboot
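
After the reboot it is worth confirming that the parameters actually reached the running kernel. Below is a minimal Python sketch of such a check. The helper names `parse_cmdline` and `missing_params` are hypothetical, and the sample command line is shortened from the `GRUB_CMDLINE_LINUX` value above with one parameter left out on purpose.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the names of expected parameters that are absent or set differently."""
    active = parse_cmdline(cmdline)
    wanted = parse_cmdline(" ".join(expected))
    return [
        name
        for name, value in wanted.items()
        if name not in active or active[name] != value
    ]


# Illustrative input only: amdgpu.ppfeaturemask is deliberately omitted here.
running = "rhgb quiet amdgpu.sg_display=0 amd_pstate=active"
expected = [
    "amdgpu.sg_display=0",
    "amdgpu.ppfeaturemask=0xfff7ffff",
    "amd_pstate=active",
]
print(missing_params(running, expected))
```

On a live system the `running` string would come from reading `/proc/cmdline`. An empty result means every expected parameter is active.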

The GRUB parameters amdgpu.ppfeaturemask=0xfff7ffff amd_pstate=active come from a link (which I still have to find again) suggesting this workaround for a quite similar issue. The funny thing is that since I added those parameters, the freeze is no longer instantaneous: I can fight with the buggy behavior for a few minutes before the hard reboot becomes necessary.
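
For reference, the mask value can be decoded to see how it differs from an all-ones value. This is a minimal Python sketch, assuming the mask is a 32-bit field and that comparing against 0xffffffff is the useful baseline. What the cleared bit controls is defined by the amdgpu driver and is not stated in this thread, so the sketch only reports the bit position.

```python
def cleared_bits(mask, width=32):
    """Return the bit positions that are 0 in mask, lowest first."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


mask = 0xfff7ffff
print(cleared_bits(mask))          # [19]
print(hex(0xffffffff & ~mask))     # 0x80000
```

So relative to an all-ones mask, this value switches off exactly one feature bit, bit 19.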

hey. I had exactly the same issue:

  • booting up with the TB4 dock and a second display connected = freeze on kernel load
  • restarting/shutting down with the same setup = hang on a black screen, which is sometimes solvable by unplugging the dock, but most of the time only by a hard reset

I remember that this behavior did not occur when I started using the laptop, but appeared after a bunch of updates: Fedora 40 Beta → stable, and the Framework firmware update to the latest version (3.5 on my 13" 7040).
After sorting out another issue with the right-side ports and receiving a mainboard replacement, I can confirm that this is in fact a firmware issue, because the replacement board has an older version, 3.3, which does not show the issue. NB: I never plugged my dock into the faulty ports, so it's not related.

Action: can you check your firmware version? And, if possible, try downgrading the firmware with fwupdmgr downgrade to confirm it, so we can hand this to support.
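
The BIOS version is also printed in the "Hardware name" line of every kernel trace, so it can be read straight from the logs already posted here. This is a minimal Python sketch. The function name `bios_info` and the regular expression are assumptions based on the two "Hardware name" lines in this thread.

```python
import re

# Assumed layout: "Hardware name: <board>, BIOS <version> <MM/DD/YYYY>"
HARDWARE_LINE = re.compile(
    r"Hardware name: (?P<board>.+), BIOS (?P<version>\S+) (?P<date>\d{2}/\d{2}/\d{4})"
)


def bios_info(log_line):
    """Return (board, version, date) from a 'Hardware name' line, or None."""
    match = HARDWARE_LINE.search(log_line)
    if match is None:
        return None
    return match.group("board"), match.group("version"), match.group("date")


line = (
    "Jul 29 21:08:44 fedora kernel: Hardware name: Framework Laptop 13 "
    "(AMD Ryzen 7040Series)/FRANMDCP05, BIOS 03.05 03/29/2024"
)
print(bios_info(line))
```

Both traces posted in this thread report BIOS 03.05, which matches the 3.5 firmware mentioned above.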

Hi kosta,

I am already in discussion with support to find a solution. So far they have given me some procedures to follow for further testing. I am waiting for their conclusions (maybe after some more testing) and will share the result if we find something interesting.

Have a great day.


Just wanted to note that I am experiencing this exact thing, with some additional messages like:


Aug 18 12:49:18.486228 archwork kernel: ACPI Error: Aborting method \_SB.A018 due to previous error (AE_AML_LOOP_TIMEOUT) (20240322/psparse-529)
Aug 16 19:40:27.345643 archwork kernel: cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum f5
Aug 16 19:40:27.365118 archwork kernel: cros_ec_lpcs cros_ec_lpcs.0: packet too long (65535 bytes, expected 100)
Aug 16 19:40:51.938533 archwork kernel: amdgpu 0000:c1:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2P>
Aug 16 19:40:51.939245 archwork kernel: amdgpu 0000:c1:00.0: amdgpu: Failed to retrieve enabled ppfeatures!

I’m in contact with support as well. This happens on both the 6.10.5 and 6.11 kernels. It is very frustrating and intermittent. Given that I crashed inside the BIOS a few times (with graphical errors reminiscent of a dying GPU), I think it’s more of a hardware issue, though these errors may be a coincidence and unrelated.


I’m having issues with my Anker 778 Thunderbolt Docking Station and an external display after updating to 6.10.x kernels. The external display is blank. 6.9.12-200 works fine, so I’m just booting to that for now.
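
When picking a fallback kernel from a list of installed ones, note that kernel release strings do not sort correctly as plain text ("6.10" sorts before "6.9"). This is a minimal Python sketch of a numeric comparison. The helper names `version_key` and `newest_older_than` are hypothetical, and the release strings are ones mentioned in this thread.

```python
import re


def version_key(release):
    """Turn '6.9.12-200.fc40.x86_64' into (6, 9, 12, 200) for numeric comparison."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def newest_older_than(installed, bad_release):
    """Return the newest installed release that sorts below bad_release, or None."""
    limit = version_key(bad_release)
    older = [release for release in installed if version_key(release) < limit]
    return max(older, key=version_key, default=None)


installed = [
    "6.10.4-200.fc40.x86_64",
    "6.9.12-200.fc40.x86_64",
    "6.9.11-200.fc40.x86_64",
]
print(newest_older_than(installed, "6.10.4-200.fc40.x86_64"))
```

The result is only a candidate to try from the boot menu. Whether that kernel actually avoids the problem still has to be tested on the machine.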

I had the same issue with a Lenovo M14T external USB-C monitor.

Sorry for the late response. Your problem seems a bit different to me. I can boot up to the login manager with the dock plugged in, and it only freezes there, not right when the kernel takes over.
Just now, I tried plugging the dock in after the login manager, with Plasma running. It worked for a few seconds, and even found the network and USB devices from the dock, then froze.

Log:

Aug 21 12:20:09 framework kernel: [drm] DMUB HPD IRQ callback: link_index=7
Aug 21 12:20:09 framework kernel: [drm] DMUB HPD IRQ callback: link_index=7
Aug 21 12:20:09 framework kernel: BUG: kernel NULL pointer dereference, address: 0000000000000920
Aug 21 12:20:09 framework kernel: #PF: supervisor read access in kernel mode
Aug 21 12:20:09 framework kernel: #PF: error_code(0x0000) - not-present page
Aug 21 12:20:09 framework kernel: PGD 0 P4D 0 
Aug 21 12:20:09 framework kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Aug 21 12:20:09 framework kernel: CPU: 15 PID: 2589 Comm: kwin_wayland Not tainted 6.10.4-200.fc40.x86_64 #1
Aug 21 12:20:09 framework kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
Aug 21 12:20:09 framework kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Aug 21 12:20:09 framework kernel: Code: 04 dc 48 85 c0 74 dd 48 39 50 08 75 d7 48 8b a8 90 64 00 00 48 85 ed 74 cb 48 83 bd f8 11 00 00 00 74 c1 48 8b 85 e8 07 00 00 <80> b8 20 09 00 00 00 75 10 48 8b 85 f0 07 00 00 f6 80 84 02 00 00
Aug 21 12:20:09 framework kernel: RSP: 0018:ffffb78b0103f698 EFLAGS: 00010286
Aug 21 12:20:09 framework kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8e535a800000
Aug 21 12:20:09 framework kernel: RDX: ffff8e534d356000 RSI: 0000000000000001 RDI: ffffb78b0103f6e8
Aug 21 12:20:09 framework kernel: RBP: ffff8e55edb68000 R08: ffffb78b0103f6b8 R09: 0000000000000000
Aug 21 12:20:09 framework kernel: R10: ffffb78b0103f75a R11: ffff8e56c2c98000 R12: ffff8e56be640000
Aug 21 12:20:09 framework kernel: R13: ffff8e53f90f1800 R14: 0000000000000000 R15: 0000000000000003
Aug 21 12:20:09 framework kernel: FS:  00007fee22f29b40(0000) GS:ffff8e5a9e980000(0000) knlGS:0000000000000000
Aug 21 12:20:09 framework kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 12:20:09 framework kernel: CR2: 0000000000000920 CR3: 0000000107ac6000 CR4: 0000000000f50ef0
Aug 21 12:20:09 framework kernel: PKRU: 55555554
Aug 21 12:20:09 framework kernel: Call Trace:
Aug 21 12:20:09 framework kernel:  <TASK>
Aug 21 12:20:09 framework kernel:  ? __die_body.cold+0x19/0x27
Aug 21 12:20:09 framework kernel:  ? page_fault_oops+0x15a/0x2f0
Aug 21 12:20:09 framework kernel:  ? exc_page_fault+0x7e/0x180
Aug 21 12:20:09 framework kernel:  ? asm_exc_page_fault+0x26/0x30
Aug 21 12:20:09 framework kernel:  ? is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Aug 21 12:20:09 framework kernel:  pre_validate_dsc+0x22a/0x730 [amdgpu]
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? dm_update_plane_state+0x582/0x680 [amdgpu]
Aug 21 12:20:09 framework kernel:  amdgpu_dm_atomic_check+0x8f0/0x1570 [amdgpu]
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? __ww_mutex_lock.constprop.0+0x5b/0x9a0
Aug 21 12:20:09 framework kernel:  drm_atomic_check_only+0x633/0xae0
Aug 21 12:20:09 framework kernel:  drm_mode_atomic_ioctl+0x853/0xd20
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 21 12:20:09 framework kernel:  drm_ioctl_kernel+0xb0/0x100
Aug 21 12:20:09 framework kernel:  drm_ioctl+0x28b/0x540
Aug 21 12:20:09 framework kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 21 12:20:09 framework kernel:  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
Aug 21 12:20:09 framework kernel:  __x64_sys_ioctl+0x94/0xd0
Aug 21 12:20:09 framework kernel:  do_syscall_64+0x82/0x160
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? drm_modeset_drop_locks+0x52/0x70
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? drm_mode_atomic_ioctl+0x3c7/0xd20
Aug 21 12:20:09 framework kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? __check_object_size+0x58/0x230
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? drm_ioctl+0x2ba/0x540
Aug 21 12:20:09 framework kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? __pm_runtime_suspend+0x69/0xc0
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? amdgpu_drm_ioctl+0x71/0x90 [amdgpu]
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  ? do_syscall_64+0x8e/0x160
Aug 21 12:20:09 framework kernel:  ? __irq_exit_rcu+0x4a/0xb0
Aug 21 12:20:09 framework kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 21 12:20:09 framework kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 21 12:20:09 framework kernel: RIP: 0033:0x7fee29125f2d
Aug 21 12:20:09 framework kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
Aug 21 12:20:09 framework kernel: RSP: 002b:00007ffdd9bb7a00 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 21 12:20:09 framework kernel: RAX: ffffffffffffffda RBX: 00005602535ed420 RCX: 00007fee29125f2d
Aug 21 12:20:09 framework kernel: RDX: 00007ffdd9bb7af0 RSI: 00000000c03864bc RDI: 0000000000000015
Aug 21 12:20:09 framework kernel: RBP: 00007ffdd9bb7a50 R08: 0000560253fa5678 R09: 00000005602535ed
Aug 21 12:20:09 framework kernel: R10: 0000560253fa5600 R11: 0000000000000246 R12: 00007ffdd9bb7af0
Aug 21 12:20:09 framework kernel: R13: 00000000c03864bc R14: 0000000000000015 R15: 0000000000000004
Aug 21 12:20:09 framework kernel:  </TASK>
Aug 21 12:20:09 framework kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc r8153_ecm cdc_ether usbnet r8152 mii tun overlay uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject>
Aug 21 12:20:09 framework kernel:  snd_hda_intel snd_compress edac_mce_amd ac97_bus mt76 snd_intel_dspcfg snd_pcm_dmaengine snd_intel_sdw_acpi snd_rpl_pci_acp6x kvm_amd snd_hda_codec cros_ec_lpcs snd_acp_pci cros_ec snd_acp_legacy_common snd_hda_core mac80211 hid_sensor_als hid_sensor_trigger kvm snd_pci_acp6x sn>
Aug 21 12:20:09 framework kernel:  i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse i2c_dev
Aug 21 12:20:09 framework kernel: CR2: 0000000000000920
Aug 21 12:20:09 framework kernel: ---[ end trace 0000000000000000 ]---
Aug 21 12:20:09 framework kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Aug 21 12:20:09 framework kernel: Code: 04 dc 48 85 c0 74 dd 48 39 50 08 75 d7 48 8b a8 90 64 00 00 48 85 ed 74 cb 48 83 bd f8 11 00 00 00 74 c1 48 8b 85 e8 07 00 00 <80> b8 20 09 00 00 00 75 10 48 8b 85 f0 07 00 00 f6 80 84 02 00 00
Aug 21 12:20:09 framework kernel: RSP: 0018:ffffb78b0103f698 EFLAGS: 00010286
Aug 21 12:20:09 framework kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8e535a800000
Aug 21 12:20:09 framework kernel: RDX: ffff8e534d356000 RSI: 0000000000000001 RDI: ffffb78b0103f6e8
Aug 21 12:20:09 framework kernel: RBP: ffff8e55edb68000 R08: ffffb78b0103f6b8 R09: 0000000000000000
Aug 21 12:20:09 framework kernel: R10: ffffb78b0103f75a R11: ffff8e56c2c98000 R12: ffff8e56be640000
Aug 21 12:20:09 framework kernel: R13: ffff8e53f90f1800 R14: 0000000000000000 R15: 0000000000000003
Aug 21 12:20:09 framework kernel: FS:  00007fee22f29b40(0000) GS:ffff8e5a9e980000(0000) knlGS:0000000000000000
Aug 21 12:20:09 framework kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 12:20:09 framework kernel: CR2: 0000000000000920 CR3: 0000000107ac6000 CR4: 0000000000f50ef0
Aug 21 12:20:09 framework kernel: PKRU: 55555554
Aug 21 12:20:09 framework kernel: note: kwin_wayland[2589] exited with irqs disabled

I am currently on BIOS 3.05. I will check whether fwupd lets me downgrade to 3.03 (and hope that doesn’t brick anything :wink:).

Well, I can at least confirm that it’s not just the firmware version. Exact same crash with 3.03/3.3:

Aug 21 12:45:32 framework kernel: [drm] DMUB HPD IRQ callback: link_index=8
Aug 21 12:45:32 framework kernel: BUG: kernel NULL pointer dereference, address: 0000000000000920
Aug 21 12:45:32 framework kernel: #PF: supervisor read access in kernel mode
Aug 21 12:45:32 framework kernel: #PF: error_code(0x0000) - not-present page
Aug 21 12:45:32 framework kernel: PGD 0 P4D 0 
Aug 21 12:45:32 framework kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Aug 21 12:45:32 framework kernel: CPU: 15 PID: 2868 Comm: kwin_wayland Not tainted 6.10.4-200.fc40.x86_64 #1
Aug 21 12:45:32 framework kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.03 10/17/2023
Aug 21 12:45:32 framework kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Aug 21 12:45:32 framework kernel: Code: 04 dc 48 85 c0 74 dd 48 39 50 08 75 d7 48 8b a8 90 64 00 00 48 85 ed 74 cb 48 83 bd f8 11 00 00 00 74 c1 48 8b 85 e8 07 00 00 <80> b8 20 09 00 00 00 75 10 48 8b 85 f0 07 00 00 f6 80 84 02 00 00
Aug 21 12:45:32 framework kernel: RSP: 0018:ffffafec8661b820 EFLAGS: 00010286
Aug 21 12:45:32 framework kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff9835ddc00000
Aug 21 12:45:32 framework kernel: RDX: ffff9835cce79800 RSI: 0000000000000001 RDI: ffffafec8661b870
Aug 21 12:45:32 framework kernel: RBP: ffff9835c98aa000 R08: ffffafec8661b840 R09: 0000000000000000
Aug 21 12:45:32 framework kernel: R10: ffffafec8661b8e2 R11: ffff98369ac88000 R12: ffff9836a4d00000
Aug 21 12:45:32 framework kernel: R13: ffff9835ce380c00 R14: 0000000000000000 R15: 0000000000000003
Aug 21 12:45:32 framework kernel: FS:  00007fb3540b2b40(0000) GS:ffff983d1e980000(0000) knlGS:0000000000000000
Aug 21 12:45:32 framework kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 12:45:32 framework kernel: CR2: 0000000000000920 CR3: 0000000109804000 CR4: 0000000000f50ef0
Aug 21 12:45:32 framework kernel: PKRU: 55555554
Aug 21 12:45:32 framework kernel: Call Trace:
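
To check that two oopses really hit the same spot without comparing the dumps by eye, something like the following could work. It is only a rough sketch: the helper name `faulting_locations` and the regular expression are my own, not part of any tool, and it assumes the kernel-mode `RIP: 0010:symbol+0xoffset/0xsize [module]` line format seen in the traces above.

```python
import re

# Matches the kernel-mode RIP line of an oops, e.g.
#   RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
# The user-space line (RIP: 0033:0x...) is deliberately not matched.
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def faulting_locations(log_text):
    """Return the set of (module, symbol, offset) tuples found in log_text.

    Hypothetical helper: a single-element result means every oops in the
    log faulted at the same instruction.
    """
    found = set()
    for line in log_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            found.add(
                (match.group("module"), match.group("symbol"), match.group("offset"))
            )
    return found


# RIP lines copied from the two crashes in this thread.
sample = """\
Aug 21 12:20:09 framework kernel: RIP: 0033:0x7fee29125f2d
Aug 21 12:20:09 framework kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
Aug 21 12:45:32 framework kernel: RIP: 0010:is_dsc_need_re_compute+0xef/0x3b0 [amdgpu]
"""

print(faulting_locations(sample))
```

For the lines above it reports a single location, `is_dsc_need_re_compute+0xef` in `amdgpu`. The input could come from saving the output of `journalctl -k -b -1` (kernel messages of the previous boot) to a file after each crash.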

After the downgrade it booted but did not detect the dock correctly. I had USB, but just USB (getting USB via USB isn’t exactly magic…). So after booting to Plasma I replugged the dock; it started bringing the screens up and froze. Here again, it’s not instant: it doesn’t crash at the first sight of a dock. It visibly starts to use the peripherals (network, screens), and then amdgpu does an oopsie and takes everything else with it.

I must also add that on a recent Fedora Kinoite I experienced a dreadful regression that resulted in frequent video freezes, corrupted image decoding, and high GPU load.
I’m not sure if it was the kernel or Plasma, but it did go away with an ostree rollback, and it seems to be fixed in the latest rolling release.
Not sure if that’s your case, just thought it was worth mentioning.

I am currently on Plasma 6.1.4-1.fc40 and kernel 6.10.4-200.fc40, and can confirm that anything that stresses the GPU is always scary, as I expect a crash. And if it crashes, it crashes badly. But I cannot say that it has gotten worse. It has more or less always been that way.

Just saw that, alongside updates to kernel 6.10.5-200.fc40 and Plasma 6.1.4-2.fc40, there are updates for “amd-gpu-firmware” and “amd-ucode-firmware”. I’m not expecting much of the kernel and Plasma updates, but maybe the firmware ones bring hope? Will test that soonish.
