Hi!
I just received my Framework Laptop 16, but I’m experiencing issues with what seems to be the amdgpu driver.
It happens randomly when the system load increases (for example on Google Maps; almost always with Firefox, and never with Chromium so far).
To reproduce it, I can just open “Lost-O-Images” on http://webglsamples.org/, check all the checkboxes and wait 3 to 5 seconds. It will either:
- Freeze the screen, flash to black but recover (not on this website though, the load is too high)
- Freeze the screen, flash to black and crash Xorg, so I land on the login screen
- Freeze the screen, flash to black and crash the whole OS, leading to a reboot (kernel panic I guess)
See the end of this post for an example crash log (dmesg).
System info:
- Debian stable (12) with latest kernel (6.10.11+bpo-amd64) and amdgpu driver (2.4.123-1~bpo12+1) from backports.
- Framework 16
- Currently used with a USB-C hub on port #2 (4 USB, 1 RJ45, 1 HDMI), power on port #4 and an audio jack on #5 (the bug also happens with nothing plugged in, so that should not matter).
I tried adding amdgpu.ppfeaturemask=0xfffd3fff or amdgpu.sg_display=0 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (and then running update-grub and rebooting), but neither helped.
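To rule out that the parameters were simply not applied, the running kernel's command line can be checked after the reboot. Below is a minimal Python sketch of such a check; the helper names `amdgpu_params` and `cleared_bits` are made up for illustration. It assumes the options show up in /proc/cmdline as `amdgpu.<name>=<value>`, and it only lists which bit positions of the mask are 0. What each bit means would have to be looked up in the kernel source (I believe in the `PP_FEATURE_MASK` enum in amd_shared.h), so the script does not try to name them.

```python
from pathlib import Path


def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Return the amdgpu.* options found in a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token.split("=", 1)
            params[key[len("amdgpu."):]] = value
    return params


def cleared_bits(mask: int, width: int = 32) -> list[int]:
    """Return the bit positions that are 0 in mask (features switched off)."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        active = amdgpu_params(cmdline_path.read_text())
        print("amdgpu options on the running kernel:", active or "none")
        if "ppfeaturemask" in active:
            # Base 0 lets int() accept both "0x..." and decimal values.
            mask = int(active["ppfeaturemask"], 0)
            print("bits cleared in ppfeaturemask:", cleared_bits(mask))
```

If the dictionary comes back empty, the GRUB change did not reach the booted kernel, and the test of the workaround was not valid.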
I was planning to use this laptop for work, but if I can’t get it working it’s just wasted money…
Thanks in advance for your help!
Crash log:
[ 78.581580] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=45147, emitted seq=45149
[ 78.581751] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-esr pid 2838 thread firefox-es:cs0 pid 2906
[ 78.581890] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 82.609877] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 82.609886] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 86.458455] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 86.458464] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 90.321521] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 90.321527] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 94.186092] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 94.186101] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 98.050104] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 98.050113] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 99.841252] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 99.841260] rcu: 10-....: (5249 ticks this GP) idle=be84/1/0x4000000000000000 softirq=12434/12434 fqs=2621
[ 99.841266] rcu: (t=5250 jiffies g=12653 q=2400 ncpus=16)
[ 99.841270] CPU: 10 PID: 115 Comm: kworker/u64:2 Not tainted 6.10.11+bpo-amd64 #1 Debian 6.10.11-1~bpo12+1
[ 99.841273] Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
[ 99.841275] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 99.841296] RIP: 0010:delay_halt_mwaitx+0x3c/0x50
[ 99.841305] Code: 31 d2 48 89 d1 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <5b> e9 09 5f 2a 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90
[ 99.841307] RSP: 0018:ffffba9ac0573960 EFLAGS: 00000297
[ 99.841309] RAX: 00000000000000f0 RBX: 0000000000001d65 RCX: 0000000000000002
[ 99.841310] RDX: 0000000000000000 RSI: 0000000000001d65 RDI: 00000063ab30a110
[ 99.841312] RBP: 0000000000001d65 R08: 0000000000000100 R09: 0000000000000003
[ 99.841314] R10: ffffba9ac0573a68 R11: ffffffff9ecca408 R12: 0000000000000040
[ 99.841315] R13: 00000000002dc6c0 R14: ffffa0a324c44290 R15: 0000000000000000
[ 99.841317] FS: 0000000000000000(0000) GS:ffffa0aa5e700000(0000) knlGS:0000000000000000
[ 99.841319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.841320] CR2: 00007fa0018f1000 CR3: 0000000849c20000 CR4: 0000000000750ef0
[ 99.841322] PKRU: 55555554
[ 99.841323] Call Trace:
[ 99.841326] <IRQ>
[ 99.841330] ? rcu_dump_cpu_stacks+0xcb/0x110
[ 99.841338] ? rcu_sched_clock_irq+0x347/0x1100
[ 99.841346] ? srso_alias_return_thunk+0x5/0xfbef5
[ 99.841351] ? notifier_call_chain+0x5a/0xd0
[ 99.841357] ? srso_alias_return_thunk+0x5/0xfbef5
[ 99.841358] ? timekeeping_update+0xdd/0x130
[ 99.841368] ? srso_alias_return_thunk+0x5/0xfbef5
[ 99.841369] ? timekeeping_advance+0x377/0x590
[ 99.841371] ? srso_alias_return_thunk+0x5/0xfbef5
[ 99.841372] ? tmigr_requires_handle_remote+0x8d/0x100
[ 99.841382] ? update_process_times+0x6d/0xc0
[ 99.841385] ? tick_nohz_handler+0x8f/0x140
[ 99.841394] ? __pfx_tick_nohz_handler+0x10/0x10
[ 99.841397] ? __hrtimer_run_queues+0x10f/0x2a0
[ 99.841400] ? hrtimer_interrupt+0xfa/0x230
[ 99.841403] ? __sysvec_apic_timer_interrupt+0x55/0x150
[ 99.841410] ? sysvec_apic_timer_interrupt+0x6c/0x90
[ 99.841414] </IRQ>
[ 99.841415] <TASK>
[ 99.841416] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 99.841431] ? delay_halt_mwaitx+0x3c/0x50
[ 99.841433] delay_halt+0x3c/0x70
[ 99.841438] amdgpu_fence_wait_polling+0x36/0x60 [amdgpu]
[ 99.841777] mes_v11_0_submit_pkt_and_poll_completion.constprop.0+0x2cc/0x3f0 [amdgpu]
[ 99.841935] mes_v11_0_unmap_legacy_queue+0x7f/0xd0 [amdgpu]
[ 99.842086] amdgpu_mes_unmap_legacy_queue+0x91/0xd0 [amdgpu]
[ 99.842231] amdgpu_gfx_disable_kcq+0xcf/0x190 [amdgpu]
[ 99.842375] gfx_v11_0_hw_fini+0x4d/0xf0 [amdgpu]
[ 99.842518] amdgpu_device_ip_suspend_phase2+0x102/0x1a0 [amdgpu]
[ 99.842633] ? amdgpu_device_ip_suspend_phase1+0x6c/0xe0 [amdgpu]
[ 99.842753] amdgpu_device_ip_suspend+0x40/0x70 [amdgpu]
[ 99.842872] amdgpu_device_pre_asic_reset+0xd0/0x2a0 [amdgpu]
[ 99.842992] amdgpu_device_gpu_recover+0x347/0xdc0 [amdgpu]
[ 99.843113] ? ___drm_dbg+0x90/0xd0 [drm]
[ 99.843134] amdgpu_job_timedout+0x13d/0x1f0 [amdgpu]
[ 99.843296] drm_sched_job_timedout+0x73/0x100 [gpu_sched]
[ 99.843300] process_one_work+0x179/0x390
[ 99.843304] worker_thread+0x265/0x380
[ 99.843307] ? __pfx_worker_thread+0x10/0x10
[ 99.843308] kthread+0xcf/0x100
[ 99.843311] ? __pfx_kthread+0x10/0x10
[ 99.843313] ret_from_fork+0x31/0x50
[ 99.843317] ? __pfx_kthread+0x10/0x10
[ 99.843319] ret_from_fork_asm+0x1a/0x30
[ 99.843324] </TASK>
[ 101.915761] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 101.915767] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 105.776549] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 105.776560] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 109.636910] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 109.636919] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 113.499030] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 113.499037] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 113.918974] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 113.920553] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[ 113.930700] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 113.931240] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[ 113.931467] [drm] VRAM is lost due to GPU reset!
[ 113.931474] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 113.933286] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 113.934867] [drm] DMUB hardware initialized: version=0x08000500
[ 114.749854] pcieport 0000:00:08.1: PME: Spurious native interrupt!
[ 114.755339] [drm] kiq ring mec 3 pipe 1 q 0
[ 114.757425] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 114.758323] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 114.758328] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 114.758332] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 114.758334] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 114.758337] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 114.758339] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 114.758341] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 114.758344] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 114.758347] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 114.758349] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 114.758352] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 114.758354] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 114.758357] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 114.770587] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[ 114.770589] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[ 114.770600] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[ 114.772660] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
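
Since logs like the one above get long, a small script can help compare crashes (which process triggered the timeout, how often MES failed to respond, how long the reset took). This is only a rough sketch that matches the message strings as they appear in this particular log; the function name `summarize_dmesg` and the counter keys are my own, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# "[ 78.581580] message" -> timestamp in seconds and the rest of the line
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
RING_TIMEOUT = re.compile(r"ring \S+ timeout")
PROCESS = re.compile(r"Process information: process (\S+) pid (\d+)")
RESET_DONE = re.compile(r"GPU reset\(\d+\) succeeded")


def summarize_dmesg(text):
    """Count amdgpu failure messages and measure the GPU reset duration."""
    counts = Counter()
    process = None
    reset_begin = reset_end = None
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if RING_TIMEOUT.search(message):
            counts["ring_timeout"] += 1
        if "MES failed to respond" in message:
            counts["mes_no_response"] += 1
        if "self-detected stall" in message:
            counts["rcu_stall"] += 1
        found = PROCESS.search(message)
        if found and process is None:
            process = (found.group(1), int(found.group(2)))
        if "GPU reset begin" in message and reset_begin is None:
            reset_begin = stamp
        if RESET_DONE.search(message):
            reset_end = stamp
    duration = None
    if reset_begin is not None and reset_end is not None:
        duration = round(reset_end - reset_begin, 1)
    return {"counts": dict(counts), "process": process, "reset_seconds": duration}


# A few lines copied from the crash log above, used as sample input.
SAMPLE = """\
[ 78.581580] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=45147, emitted seq=45149
[ 78.581751] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-esr pid 2838 thread firefox-es:cs0 pid 2906
[ 78.581890] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 82.609877] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 86.458455] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 114.770600] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
"""

if __name__ == "__main__":
    print(summarize_dmesg(SAMPLE))
```

To use it on a real crash, the saved output of dmesg can be read from a file and passed to `summarize_dmesg` instead of `SAMPLE`.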