Adding some data points. It looks like I might be the first one to provide a kernel crash trace (see logs below).
system data:
- Ryzen 7 7840U
- 16GiB RAM
- NixOS 23.11
- kernel 6.7.0
- KDE Plasma 5.27.10
- using X11, not wayland
- DMUB hardware initialized: version=0x08002A00
- no workarounds applied yet:
  - not in UMA game mode
  - `amdgpu.sg_display=0` not applied yet
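In case it helps others report their state consistently, here is a minimal sketch for confirming whether the parameter is on the running kernel's command line. It only parses `/proc/cmdline`; the helper name `kernel_param` is my own, and a missing entry just means the driver default applies, not that scatter/gather is necessarily on or off.

```python
from pathlib import Path


def kernel_param(cmdline: str, name: str):
    """Return the value of `name` on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # Keep the last occurrence if the parameter is repeated;
            # a bare flag without "=" yields an empty string.
            value = val if sep else ""
    return value


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    cmdline = path.read_text() if path.exists() else ""
    value = kernel_param(cmdline, "amdgpu.sg_display")
    if value is None:
        print("amdgpu.sg_display is not set on the command line")
    else:
        print(f"amdgpu.sg_display={value}")
```

On my machine this currently reports the parameter as not set, matching the list above.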
When the bug appeared: unfortunately I have no reliable reproducer yet. I'll keep experimenting with the workarounds disabled until I find one.
I’ve encountered the issue in 2 situations so far:
- 2 times when resuming from a suspend-to-ram state with lid closed
- 1 time during normal operation, when launching 2 hardware-accelerated video streams at once (in mpv)
@Matt_Hartley So `amdgpu.sg_display=0` appears to be the preferred workaround so far. But according to the Phoronix article "AMD Re-Enables Scatter/Gather Support For All APUs On Linux", AMD themselves consider scatter/gather an important feature. So if we need to keep this disabled long term, what should we expect? What exactly does scatter/gather even do?
system log (including a kernel trace)
Due to restrictions on post length, I had to cut parts out of my system log. Here’s the part with the kernel crash; the complete log can be found here.
Jan 15 02:59:08 framenix systemd[1]: Starting Pre-Sleep Actions...
Jan 15 02:59:08 framenix systemd[1]: pre-sleep.service: Deactivated successfully.
Jan 15 02:59:08 framenix systemd[1]: Finished Pre-Sleep Actions.
Jan 15 02:59:08 framenix systemd[1]: Reached target Sleep.
Jan 15 02:59:08 framenix systemd[1]: Starting System Suspend...
Jan 15 02:59:08 framenix systemd-sleep[21564]: Entering sleep state 'suspend'...
Jan 15 02:59:08 framenix kernel: PM: suspend entry (s2idle)
Jan 15 02:59:08 framenix kernel: Filesystems sync: 0.005 seconds
Jan 16 20:40:26 framenix kernel: Freezing user space processes
Jan 16 20:40:26 framenix kernel: Freezing user space processes completed (elapsed 0.016 seconds)
Jan 16 20:40:26 framenix kernel: OOM killer disabled.
Jan 16 20:40:26 framenix kernel: Freezing remaining freezable tasks
Jan 16 20:40:26 framenix kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
Jan 16 20:40:26 framenix kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jan 16 20:40:26 framenix kernel: queueing ieee80211 work while going to suspend
Jan 16 20:40:26 framenix kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jan 16 20:40:26 framenix kernel: ACPI: EC: interrupt blocked
Jan 16 20:40:26 framenix kernel: ACPI: EC: interrupt unblocked
Jan 16 20:40:26 framenix kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jan 16 20:40:26 framenix kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
Jan 16 20:40:26 framenix kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Jan 16 20:40:26 framenix kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Jan 16 20:40:26 framenix kernel: ------------[ cut here ]------------
Jan 16 20:40:26 framenix kernel: WARNING: CPU: 10 PID: 21581 at drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1526 dp_retrieve_lttpr_cap+0x122/0x1e0 [amdgpu]
Jan 16 20:40:26 framenix kernel: Modules linked in: usbhid sd_mod uas usb_storage scsi_mod r8153_ecm scsi_common ccm qrtr rfcomm af_packet cmac algif_hash algif_skcipher af_alg bnep mt7921e mt7921_common mt792x_lib mt76_connac_lib cdc_mbim cdc_wdm mt76 cdc_ncm cdc_ether usbnet mac80211 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_hda_codec_realtek snd_sof_utils snd_hda_codec_generic snd_soc_core ledtrig_audio snd_hda_codec_hdmi btusb btrtl snd_compress ac97_bus btintel hid_sensor_als snd_pcm_dmaengine hid_sensor_trigger snd_hda_intel btbcm mousedev snd_pci_ps industrialio_triggered_buffer btmtk kfifo_buf snd_rpl_pci_acp6x snd_intel_dspcfg snd_intel_sdw_acpi hid_sensor_iio_common snd_acp_pci bluetooth snd_hda_codec snd_acp_legacy_common cfg80211 industrialio snd_pci_acp6x edac_mce_amd snd_hda_core snd_pci_acp5x nls_iso8859_1 snd_hwdep intel_rapl_msr edac_core xt_conntrack snd_rn_pci_acp3x sp5100_tco nls_cp437 snd_pcm nf_conntrack intel_rapl_common
Jan 16 20:40:26 framenix kernel: snd_acp_config ucsi_acpi ecdh_generic watchdog crc32_pclmul snd_soc_acpi typec_ucsi hid_multitouch snd_timer vfat hid_sensor_hub rfkill polyval_clmulni cros_ec_lpcs nf_defrag_ipv6 polyval_generic joydev fat hid_generic r8152 gf128mul ecc ghash_clmulni_intel crc16 mii cros_ec snd typec rapl tiny_power_button k10temp soundcore snd_pci_acp3x i2c_piix4 libarc4 battery nf_defrag_ipv4 tpm_crb thermal ac roles i2c_hid_acpi button i2c_hid tpm_tis amd_pmf hid tpm_tis_core platform_profile amd_pmc ip6t_rpfilter evdev mac_hid ipt_rpfilter serio_raw xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables nfnetlink sch_fq_codel ctr loop tun tap macvlan bridge stp llc vboxnetflt(O) vboxnetadp(O) vboxdrv(O) kvm_amd ccp kvm irqbypass fuse efi_pstore configfs zstd zram efivarfs dmi_sysfs ip_tables x_tables autofs4 dm_crypt aes_generic cbc encrypted_keys trusted asn1_encoder tee tpm rng_core xhci_pci xhci_pci_renesas input_leds xhci_hcd led_class nvme sha512_ssse3 sha512_generic atkbd sha256_ssse3 sha1_ssse3 libps2
Jan 16 20:40:26 framenix kernel: vivaldi_fmap nvme_core thunderbolt usbcore aesni_intel t10_pi libaes crypto_simd crc64_rocksoft cryptd crc64 i8042 crc_t10dif crct10dif_generic usb_common crct10dif_pclmul crct10dif_common serio rtc_cmos dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq amdgpu i2c_algo_bit drm_ttm_helper ttm agpgart video wmi drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper drm backlight firmware_class
Jan 16 20:40:26 framenix kernel: CPU: 10 PID: 21581 Comm: kworker/u32:26 Tainted: G O 6.7.0 #1-NixOS
Jan 16 20:40:26 framenix kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.03 10/17/2023
Jan 16 20:40:26 framenix kernel: Workqueue: events_unbound async_run_entry_fn
Jan 16 20:40:26 framenix kernel: RIP: 0010:dp_retrieve_lttpr_cap+0x122/0x1e0 [amdgpu]
Jan 16 20:40:26 framenix kernel: Code: 21 c8 48 c1 e2 38 48 09 d0 48 89 85 98 02 00 00 f6 85 c4 02 00 00 02 74 44 e8 8a ed ff ff 84 c0 75 3b 48 8b 85 d8 01 00 00 90 <0f> 0b 90 c6 85 9c 02 00 00 80 48 8b 40 10 48 8b 30 48 85 f6 74 04
Jan 16 20:40:26 framenix kernel: RSP: 0018:ffffba7e489bbbf0 EFLAGS: 00010246
Jan 16 20:40:26 framenix kernel: RAX: ffffa2a8cb457200 RBX: 00000000ffffffff RCX: 00ffffffffffffff
Jan 16 20:40:26 framenix kernel: RDX: 0000000000000000 RSI: ffffba7e489bbbf0 RDI: 0000000000000000
Jan 16 20:40:26 framenix kernel: RBP: ffffa2a8d009e800 R08: 0000000000000008 R09: 0000000000000000
Jan 16 20:40:26 framenix kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffffa2a8d00a2a00
Jan 16 20:40:26 framenix kernel: R13: ffffa2a8cbc31c70 R14: ffffa2a8d009a800 R15: 0000000000000009
Jan 16 20:40:26 framenix kernel: FS: 0000000000000000(0000) GS:ffffa2ac3df00000(0000) knlGS:0000000000000000
Jan 16 20:40:26 framenix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 20:40:26 framenix kernel: CR2: 00007f7d48000b96 CR3: 0000000471a20000 CR4: 0000000000f50ef0
Jan 16 20:40:26 framenix kernel: PKRU: 55555554
Jan 16 20:40:26 framenix kernel: Call Trace:
Jan 16 20:40:26 framenix kernel: <TASK>
Jan 16 20:40:26 framenix kernel: ? dp_retrieve_lttpr_cap+0x122/0x1e0 [amdgpu]
Jan 16 20:40:26 framenix kernel: ? __warn+0x81/0x130
Jan 16 20:40:26 framenix kernel: ? dp_retrieve_lttpr_cap+0x122/0x1e0 [amdgpu]
Jan 16 20:40:26 framenix kernel: ? report_bug+0x171/0x1a0
Jan 16 20:40:26 framenix kernel: ? handle_bug+0x42/0x70
Jan 16 20:40:26 framenix kernel: ? exc_invalid_op+0x17/0x70
Jan 16 20:40:26 framenix kernel: ? asm_exc_invalid_op+0x1a/0x20
Jan 16 20:40:26 framenix kernel: ? dp_retrieve_lttpr_cap+0x122/0x1e0 [amdgpu]
Jan 16 20:40:26 framenix kernel: link_blank_all_dp_displays+0x56/0xd0 [amdgpu]
Jan 16 20:40:26 framenix kernel: dcn31_init_hw+0x1d4/0x840 [amdgpu]
Jan 16 20:40:26 framenix kernel: dc_set_power_state+0x5e/0xa0 [amdgpu]
Jan 16 20:40:26 framenix kernel: dm_resume+0xfc/0x880 [amdgpu]
Jan 16 20:40:26 framenix kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 16 20:40:26 framenix kernel: ? _dev_info+0x79/0xa0
Jan 16 20:40:26 framenix kernel: amdgpu_device_ip_resume_phase2+0x4f/0xc0 [amdgpu]
Jan 16 20:40:26 framenix kernel: amdgpu_device_resume+0xa0/0x2c0 [amdgpu]
Jan 16 20:40:26 framenix kernel: ? __pfx_pci_pm_resume+0x10/0x10
Jan 16 20:40:26 framenix kernel: amdgpu_pmops_resume+0x4a/0x80 [amdgpu]
Jan 16 20:40:26 framenix kernel: ? __pfx_pci_pm_resume+0x10/0x10
Jan 16 20:40:26 framenix kernel: dpm_run_callback+0x89/0x1b0
Jan 16 20:40:26 framenix kernel: device_resume+0x88/0x190
Jan 16 20:40:26 framenix kernel: async_resume+0x1e/0x60
Jan 16 20:40:26 framenix kernel: async_run_entry_fn+0x31/0x130
Jan 16 20:40:26 framenix kernel: process_one_work+0x173/0x340
Jan 16 20:40:26 framenix kernel: worker_thread+0x27b/0x3a0
Jan 16 20:40:26 framenix kernel: ? __pfx_worker_thread+0x10/0x10
Jan 16 20:40:26 framenix kernel: kthread+0xd4/0x100
Jan 16 20:40:26 framenix kernel: ? __pfx_kthread+0x10/0x10
Jan 16 20:40:26 framenix kernel: ret_from_fork+0x31/0x50
Jan 16 20:40:26 framenix kernel: ? __pfx_kthread+0x10/0x10
Jan 16 20:40:26 framenix kernel: ret_from_fork_asm+0x1b/0x30
Jan 16 20:40:26 framenix kernel: </TASK>
Jan 16 20:40:26 framenix kernel: ---[ end trace 0000000000000000 ]---
Jan 16 20:40:26 framenix kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Jan 16 20:40:26 framenix kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Jan 16 20:40:26 framenix kernel: [drm] ring gfx_32803.1.1 was added
Jan 16 20:40:26 framenix kernel: [drm] ring compute_32803.2.2 was added
Jan 16 20:40:26 framenix kernel: [drm] ring sdma_32803.3.3 was added
Jan 16 20:40:26 framenix kernel: [drm] ring gfx_32803.1.1 ib test pass
Jan 16 20:40:26 framenix kernel: [drm] ring compute_32803.2.2 ib test pass
Jan 16 20:40:26 framenix kernel: [drm] ring sdma_32803.3.3 ib test pass
Jan 16 20:40:26 framenix kernel: ucsi_acpi USBC000:00: GET_CONNECTOR_STATUS failed (-5)
Jan 16 20:40:26 framenix kernel: OOM killer enabled.
Jan 16 20:40:26 framenix kernel: Restarting tasks ...
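For anyone who wants to pull the same kind of excerpt out of their own log, here is a rough sketch that collects every block between the `cut here` and `end trace` markers seen above. The function name `extract_traces` is just illustrative, and it assumes the journal has already been saved as plain text, one message per line.

```python
import re

# Marker lines as they appear in the kernel log excerpt above.
START = re.compile(r"-+\[ cut here \]-+")
END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def extract_traces(lines):
    """Collect each block from a 'cut here' marker through its 'end trace' marker.

    A block that never reaches an 'end trace' line is dropped.
    """
    traces, current = [], None
    for line in lines:
        if current is None:
            if START.search(line):
                current = [line]
        else:
            current.append(line)
            if END.search(line):
                traces.append(current)
                current = None
    return traces
```

Feeding it the lines of a saved dump (for example, the output of `journalctl -k` redirected to a file) should return one list of lines per warning, which is easier to paste into a bug report than the full log.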
On reporting this upstream: is the kernel Bugzilla or freedesktop/drm the proper place for such bug reports?