[SOLVED] Amdgpu driver crash during high loads

Dagrut · October 29, 2024, 9:39pm

Hi!

I just received my Framework Laptop 16, but I’m experiencing issues with what seems to be the amdgpu driver.

It happens randomly, sometimes when the system load increases (it happened on Google Maps, for example, and almost always with Firefox; never with Chromium so far).

To reproduce it, I can just open “Lost-O-Images” on http://webglsamples.org/, check all the checkboxes and wait 3~5 seconds. It will either:

Freeze the screen, flash to black, but recover (not on this website, though; the load is too high)
Freeze the screen, flash to black, and crash Xorg, so I land on the login screen
Freeze the screen, flash to black, and crash the whole OS, leading to a reboot (a kernel panic, I guess)

See the end of this post for an example crash log (dmesg).

System info:

Debian stable (12) with the latest kernel (6.10.11+bpo-amd64) and the amdgpu libdrm package (2.4.123-1~bpo12+1) from backports.
Framework 16
Currently used with a USB-C hub on port #2 (4 USB, 1 RJ45, 1 HDMI), power on port #4, and the audio jack on #5 (but the bug also happens when nothing is plugged in, so that should not matter).

I tried adding amdgpu.ppfeaturemask=0xfffd3fff or amdgpu.sg_display=0 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (and then running update-grub), but it did not help.
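When testing boot parameters like these, it can be worth confirming that they actually reached the running kernel: an edit to /etc/default/grub has no effect until the GRUB config is regenerated (update-grub) and the machine is rebooted. Below is a minimal Python sketch, assuming a standard Linux /proc/cmdline; the helper name `amdgpu_params` is hypothetical. It only shows that the options were passed on the command line, not that the driver acted on them.

```python
"""List amdgpu.* options present on the running kernel command line (sketch)."""
import shlex
from pathlib import Path


def amdgpu_params(cmdline: str) -> dict:
    """Return amdgpu.* parameters from a kernel command line as {name: value}.

    Hypothetical helper for illustration, not part of any Debian or kernel tool.
    """
    params = {}
    # shlex.split keeps quoted values such as foo="a b" together
    for token in shlex.split(cmdline):
        if not token.startswith("amdgpu."):
            continue
        # Options written without '=' are flags; record them with an empty value
        name, _, value = token.partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # not Linux, or /proc is not mounted
    found = amdgpu_params(cmdline)
    if found:
        for name, value in found.items():
            print(f"{name} = {value or '(flag)'}")
    else:
        print("No amdgpu.* parameters on the running kernel command line")
```

If the output is empty after a reboot, the GRUB change most likely was not picked up, which would be a different problem from the parameters "not helping".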

I was planning to use this for work, but if I can’t get it working, it’s just wasted money…

Thanks in advance for your help!

Crash log:

[   78.581580] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=45147, emitted seq=45149
[   78.581751] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-esr pid 2838 thread firefox-es:cs0 pid 2906
[   78.581890] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[   82.609877] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   82.609886] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   86.458455] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   86.458464] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   90.321521] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   90.321527] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   94.186092] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   94.186101] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   98.050104] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   98.050113] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   99.841252] rcu: INFO: rcu_preempt self-detected stall on CPU
[   99.841260] rcu: 	10-....: (5249 ticks this GP) idle=be84/1/0x4000000000000000 softirq=12434/12434 fqs=2621
[   99.841266] rcu: 	(t=5250 jiffies g=12653 q=2400 ncpus=16)
[   99.841270] CPU: 10 PID: 115 Comm: kworker/u64:2 Not tainted 6.10.11+bpo-amd64 #1  Debian 6.10.11-1~bpo12+1
[   99.841273] Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
[   99.841275] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[   99.841296] RIP: 0010:delay_halt_mwaitx+0x3c/0x50
[   99.841305] Code: 31 d2 48 89 d1 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <5b> e9 09 5f 2a 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90
[   99.841307] RSP: 0018:ffffba9ac0573960 EFLAGS: 00000297
[   99.841309] RAX: 00000000000000f0 RBX: 0000000000001d65 RCX: 0000000000000002
[   99.841310] RDX: 0000000000000000 RSI: 0000000000001d65 RDI: 00000063ab30a110
[   99.841312] RBP: 0000000000001d65 R08: 0000000000000100 R09: 0000000000000003
[   99.841314] R10: ffffba9ac0573a68 R11: ffffffff9ecca408 R12: 0000000000000040
[   99.841315] R13: 00000000002dc6c0 R14: ffffa0a324c44290 R15: 0000000000000000
[   99.841317] FS:  0000000000000000(0000) GS:ffffa0aa5e700000(0000) knlGS:0000000000000000
[   99.841319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   99.841320] CR2: 00007fa0018f1000 CR3: 0000000849c20000 CR4: 0000000000750ef0
[   99.841322] PKRU: 55555554
[   99.841323] Call Trace:
[   99.841326]  <IRQ>
[   99.841330]  ? rcu_dump_cpu_stacks+0xcb/0x110
[   99.841338]  ? rcu_sched_clock_irq+0x347/0x1100
[   99.841346]  ? srso_alias_return_thunk+0x5/0xfbef5
[   99.841351]  ? notifier_call_chain+0x5a/0xd0
[   99.841357]  ? srso_alias_return_thunk+0x5/0xfbef5
[   99.841358]  ? timekeeping_update+0xdd/0x130
[   99.841368]  ? srso_alias_return_thunk+0x5/0xfbef5
[   99.841369]  ? timekeeping_advance+0x377/0x590
[   99.841371]  ? srso_alias_return_thunk+0x5/0xfbef5
[   99.841372]  ? tmigr_requires_handle_remote+0x8d/0x100
[   99.841382]  ? update_process_times+0x6d/0xc0
[   99.841385]  ? tick_nohz_handler+0x8f/0x140
[   99.841394]  ? __pfx_tick_nohz_handler+0x10/0x10
[   99.841397]  ? __hrtimer_run_queues+0x10f/0x2a0
[   99.841400]  ? hrtimer_interrupt+0xfa/0x230
[   99.841403]  ? __sysvec_apic_timer_interrupt+0x55/0x150
[   99.841410]  ? sysvec_apic_timer_interrupt+0x6c/0x90
[   99.841414]  </IRQ>
[   99.841415]  <TASK>
[   99.841416]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   99.841431]  ? delay_halt_mwaitx+0x3c/0x50
[   99.841433]  delay_halt+0x3c/0x70
[   99.841438]  amdgpu_fence_wait_polling+0x36/0x60 [amdgpu]
[   99.841777]  mes_v11_0_submit_pkt_and_poll_completion.constprop.0+0x2cc/0x3f0 [amdgpu]
[   99.841935]  mes_v11_0_unmap_legacy_queue+0x7f/0xd0 [amdgpu]
[   99.842086]  amdgpu_mes_unmap_legacy_queue+0x91/0xd0 [amdgpu]
[   99.842231]  amdgpu_gfx_disable_kcq+0xcf/0x190 [amdgpu]
[   99.842375]  gfx_v11_0_hw_fini+0x4d/0xf0 [amdgpu]
[   99.842518]  amdgpu_device_ip_suspend_phase2+0x102/0x1a0 [amdgpu]
[   99.842633]  ? amdgpu_device_ip_suspend_phase1+0x6c/0xe0 [amdgpu]
[   99.842753]  amdgpu_device_ip_suspend+0x40/0x70 [amdgpu]
[   99.842872]  amdgpu_device_pre_asic_reset+0xd0/0x2a0 [amdgpu]
[   99.842992]  amdgpu_device_gpu_recover+0x347/0xdc0 [amdgpu]
[   99.843113]  ? ___drm_dbg+0x90/0xd0 [drm]
[   99.843134]  amdgpu_job_timedout+0x13d/0x1f0 [amdgpu]
[   99.843296]  drm_sched_job_timedout+0x73/0x100 [gpu_sched]
[   99.843300]  process_one_work+0x179/0x390
[   99.843304]  worker_thread+0x265/0x380
[   99.843307]  ? __pfx_worker_thread+0x10/0x10
[   99.843308]  kthread+0xcf/0x100
[   99.843311]  ? __pfx_kthread+0x10/0x10
[   99.843313]  ret_from_fork+0x31/0x50
[   99.843317]  ? __pfx_kthread+0x10/0x10
[   99.843319]  ret_from_fork_asm+0x1a/0x30
[   99.843324]  </TASK>
[  101.915761] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[  101.915767] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[  105.776549] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[  105.776560] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[  109.636910] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[  109.636919] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[  113.499030] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[  113.499037] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[  113.918974] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[  113.920553] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[  113.930700] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  113.931240] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[  113.931467] [drm] VRAM is lost due to GPU reset!
[  113.931474] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[  113.933286] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[  113.934867] [drm] DMUB hardware initialized: version=0x08000500
[  114.749854] pcieport 0000:00:08.1: PME: Spurious native interrupt!
[  114.755339] [drm] kiq ring mec 3 pipe 1 q 0
[  114.757425] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[  114.758323] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  114.758328] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  114.758332] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  114.758334] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[  114.758337] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[  114.758339] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[  114.758341] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[  114.758344] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[  114.758347] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[  114.758349] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  114.758352] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[  114.758354] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[  114.758357] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[  114.770587] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[  114.770589] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[  114.770600] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[  114.772660] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
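For anyone comparing their own logs with this one, the key facts can be pulled out mechanically. My reading of the first line (hedged, based on how amdgpu reports job timeouts) is that "emitted seq" is the last fence submitted to the ring and "signaled seq" the last one the GPU completed, so here two gfx jobs from firefox-esr were stuck when the timeout fired; the reset then stalls on repeated MES REMOVE_QUEUE failures (nine in the full log above) before the MODE2 reset recovers the GPU with VRAM lost. A small Python sketch, with hypothetical names, that extracts those facts from a dmesg excerpt:

```python
"""Summarise an amdgpu job-timeout / GPU-reset excerpt from dmesg (sketch)."""
import re

# Patterns matched against the message text of the dmesg lines quoted above
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(r"Process information: process (?P<process>\S+) pid (?P<pid>\d+)")
MES_FAILURE = "MES failed to respond to msg=REMOVE_QUEUE"


def summarize_hang(log: str) -> dict:
    """Pull the key facts out of a hang excerpt (hypothetical helper)."""
    summary = {
        "ring": None,
        "pending_fences": None,
        "process": None,
        "mes_remove_queue_failures": log.count(MES_FAILURE),
        "vram_lost": "VRAM is lost due to GPU reset" in log,
    }
    if m := TIMEOUT_RE.search(log):
        summary["ring"] = m["ring"]
        # Fences submitted to the ring but not yet signaled by the GPU
        summary["pending_fences"] = int(m["emitted"]) - int(m["signaled"])
    if m := PROCESS_RE.search(log):
        summary["process"] = m["process"]
    return summary


# Short excerpt copied from the crash log above
SAMPLE = """\
[   78.581580] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=45147, emitted seq=45149
[   78.581751] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-esr pid 2838 thread firefox-es:cs0 pid 2906
[   82.609877] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[   86.458455] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[  113.931467] [drm] VRAM is lost due to GPU reset!
"""

if __name__ == "__main__":
    print(summarize_hang(SAMPLE))
```

In practice you would feed it the output of `sudo dmesg` (or `journalctl -k`) saved to a file rather than the hard-coded sample.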

lbkNhubert · October 29, 2024, 9:59pm

Maybe try setting the graphics memory to “Gaming” in the BIOS. I ran those samples on my setup with no crashes. Good luck, I hope you are able to get it resolved.

Dagrut · October 29, 2024, 10:12pm

Thanks, I just tried it, but it still crashed :-/

Also, I forgot to mention that upgrading the kernel did help; it was much more unstable before that.

lbkNhubert · October 29, 2024, 10:59pm

Sorry to hear that. The only other thing I can think of is to boot from a live USB, say Fedora, and see if you can reproduce it there. That will help in case you need to open a ticket with Support, as they will likely want you to test on a supported distro. I am on kernel 6.11.5-arch1-1 on Arch, if it matters.

Dagrut · October 30, 2024, 8:07am

I tested it with Fedora (this release: Fedora Xfce | The Fedora Project) and it does not crash. However, I noticed something weird: the interface was laggy. I could type on the keyboard or click, but the screen would only refresh when I moved the mouse or when there was a lot of activity. When I moved the window with the WebGL experiment to the HDMI screen, it suddenly ran smoothly, with no jerky screen refresh.

I’ll ask Support to see what they think, thanks!

Edit: The kernel on Fedora was 6.11.4-301.fc41.x86_64

Dagrut · November 1, 2024, 9:14pm

I contacted Support, and after quite a lot of questions and tests (RAM testing, resetting motherboard settings, etc.), we couldn’t find any way to make it work.

I tested Xubuntu from a live CD, and it seemed to work completely fine out of the box. I then decided to reinstall my system (which was quite painful, since the Xubuntu installer does not support using cryptsetup alone, so I had to install it on a separate hard drive, copy the files, update crypttab/fstab and rebuild /boot…). I wasn’t thrilled at first, since I had left Ubuntu a while ago because of long-term stability issues, but I hope it has improved.

And finally, it works! No crashes, no slowdowns, no freezes!!

FYI, here are the versions I have now:

Linux kernel 6.8.0-48-generic
libdrm-amdgpu1 2.4.120-2build1

Even though the issue itself isn’t solved, I guess I’ll close this thread.

Thanks again!

Edit: I’ll close it… if I can find out how to do that.

pkunk · November 2, 2024, 4:49am

Hi @Dagrut,

Glad to hear you got it resolved. There are so many different Linux variants that it would be impossible to test them all for compatibility. Glad you had the knowledge and experience to basically hand-install the distro to your liking!

I will mark the thread solved for you! Congrats on your Framework Laptop 16 and welcome to the community!

Tagging @Matt_Hartley in case he hasn’t come across this, as he is one of the Linux masters at Framework!

system · May 1, 2025, 4:49am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
