FW13 AMD CPU/GPU bug?

  • Which OS (Operating System)?
  • Fedora 41
  • Which Framework laptop (11th, 12th or 13th generation Framework laptop, Chromebook or Framework Laptop 16) are you asking for support with?
  • FW13 AMD

Hello folks,

I have been struggling with an issue on this laptop for a while.
The issue: the laptop unpredictably goes into a soft or hard lockup.
I am not sure what causes it. The laptop could be browsing the web, idling or even suspended, and it would suddenly lock up. The lockup makes the CPU/GPU ramp up to a 30+ W power draw and heat up. More than once I have pulled my laptop out of my bag and it felt hot enough to cook eggs. I don't think this is the infamous FW13 AMD suspend bug, since it also happens while the laptop is in use.
I have collected some systemd journals, but the lockups don't correlate with any particular software. They seem more likely the longer I go without powering off the laptop. I keep the kernel and firmware up to date, so I have no idea what is going on.
On top of that, the problem seems to be getting more and more frequent.
I remember that when I first got this laptop, I could keep it on for days before powering off. Now I live in fear that my laptop will burst into flames every time I put it in my bag while suspended.
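Since a hard lockup usually ends with a forced power-off, the relevant kernel messages end up in the previous boot's journal. Below is a minimal Python sketch for pulling them out, assuming persistent journald storage is enabled (otherwise the previous boot's log is lost) and that journalctl is on PATH. The function names and the keyword list are illustrative choices, not part of any existing tool.

```python
import re
import subprocess

# Illustrative keywords taken from the lockup log below (assumption: extend as needed).
PATTERNS = re.compile(r"amdgpu|soft lockup|Fence fallback timer expired")


def previous_boot_kernel_log() -> list[str]:
    """Return kernel messages from the previous boot (journalctl -k -b -1)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,  # raise if journalctl fails, e.g. no previous boot stored
    )
    return result.stdout.splitlines()


def gpu_related(lines: list[str]) -> list[str]:
    """Keep only lines that match the GPU/lockup keywords."""
    return [line for line in lines if PATTERNS.search(line)]
```

Usage: for line in gpu_related(previous_boot_kernel_log()): print(line). Filtering in Python rather than with journalctl --grep keeps the keyword list in one place and easy to tweak.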

Here are the systemd journal entries from an example lockup, in case they help:

Feb 09 01:13:51 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:13:51 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:13:51 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:13:51 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:13:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:13:54 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:13:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:13:54 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:15:42 localhost kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Feb 09 01:15:43 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:15:43 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:15:46 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:15:46 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:15:55 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:15:55 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:15:57 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:15:57 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:34 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:34 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:36 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:36 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:39 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:39 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:39 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:39 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:42 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:42 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:42 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:42 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:47 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
Feb 09 01:16:47 localhost kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Feb 09 01:16:47 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:09 localhost kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Feb 09 01:18:09 localhost kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Feb 09 01:18:11 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:11 localhost kernel: [drm] Fence fallback timer expired on ring sdma0
Feb 09 01:18:11 localhost kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Feb 09 01:18:12 localhost kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
Feb 09 01:18:13 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:16 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:16 localhost kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 2364299560 wd_nsec: 2364277512
Feb 09 01:18:18 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:24 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:26 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:29 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:31 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:37 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:38 localhost kernel: i2c_designware AMDI0010:00: controller timed out
Feb 09 01:18:39 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:39 localhost kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 4987196350 wd_nsec: 4987149343
Feb 09 01:18:42 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:18:44 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [kworker/u65:13:26756]
Feb 09 01:19:54 localhost kernel: CPU#8 Utilization every 4s during lockup:
Feb 09 01:19:54 localhost kernel:         #1:   2% system,          0% softirq,          0% hardirq,         99% idle
Feb 09 01:19:54 localhost kernel:         #2:   6% system,          0% softirq,          1% hardirq,         94% idle
Feb 09 01:19:54 localhost kernel:         #3:   3% system,          0% softirq,          1% hardirq,         98% idle
Feb 09 01:19:54 localhost kernel:         #4:   0% system,          0% softirq,          0% hardirq,         79% idle
Feb 09 01:19:54 localhost kernel:         #5:   1% system,          0% softirq,          0% hardirq,          0% idle
Feb 09 01:19:54 localhost kernel: Modules linked in: tun snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib ip_set vfat fat snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_acpi_amd_match snd_sof_amd_vangogh snd_sof_amd_rembrandt sn>
Feb 09 01:19:54 localhost kernel:  hid_sensor_trigger snd_acp_legacy_common hid_sensor_iio_common libarc4 kvm snd_seq_device industrialio_triggered_buffer snd_pci_acp6x kfifo_buf snd_pcm rapl industrialio wmi_bmof pcspkr snd_pci_acp5x cfg80211 snd_r>
Feb 09 01:19:54 localhost kernel: CPU: 8 UID: 0 PID: 26756 Comm: kworker/u65:13 Not tainted 6.12.11-200.fc41.x86_64 #1
Feb 09 01:19:54 localhost kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
Feb 09 01:19:54 localhost kernel: Workqueue: ttm ttm_bo_delayed_delete [ttm]
Feb 09 01:19:54 localhost kernel: RIP: 0010:handle_softirqs+0x8b/0x340
Feb 09 01:19:54 localhost kernel: Code: 00 e8 d9 76 07 00 89 6c 24 18 89 5c 24 0c 44 88 74 24 08 4c 89 3c 24 45 89 ef 31 c0 65 66 89 05 73 d3 f3 7d fb 0f 1f 44 00 00 <41> bd ff ff ff ff 48 c7 c3 c0 60 80 84 45 0f bc ef 41 83 c5 01 49
Feb 09 01:19:54 localhost kernel: RSP: 0018:ffffac788032cf80 EFLAGS: 00000246
Feb 09 01:19:54 localhost kernel: RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000000
Feb 09 01:19:54 localhost kernel: RDX: 000000000000006f RSI: 0000000026dc71fd RDI: 0000000000000002
Feb 09 01:19:54 localhost kernel: RBP: 0000000004208060 R08: 0000000000000000 R09: 0000000000000000
Feb 09 01:19:54 localhost kernel: R10: 0000000000000000 R11: ffffac788032cff8 R12: 0000000000000000
Feb 09 01:19:54 localhost kernel: R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000200
Feb 09 01:19:54 localhost kernel: FS:  0000000000000000(0000) GS:ffff9fb85e600000(0000) knlGS:0000000000000000
Feb 09 01:19:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 09 01:19:54 localhost kernel: CR2: 0000681f4a38d5f8 CR3: 00000006c582a000 CR4: 0000000000f50ef0
Feb 09 01:19:54 localhost kernel: PKRU: 55555554
Feb 09 01:19:54 localhost kernel: Call Trace:
Feb 09 01:19:54 localhost kernel:  <IRQ>
Feb 09 01:19:54 localhost kernel:  ? watchdog_timer_fn.cold+0x233/0x311
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ? __hrtimer_run_queues+0x113/0x280
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? hrtimer_interrupt+0xfa/0x210
Feb 09 01:19:54 localhost kernel:  ? __sysvec_apic_timer_interrupt+0x52/0x100
Feb 09 01:19:54 localhost kernel:  ? sysvec_apic_timer_interrupt+0x38/0x90
Feb 09 01:19:54 localhost kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Feb 09 01:19:54 localhost kernel:  ? handle_softirqs+0x8b/0x340
Feb 09 01:19:54 localhost kernel:  ? sched_clock_cpu+0xf/0x1f0
Feb 09 01:19:54 localhost kernel:  __irq_exit_rcu+0x97/0xb0
Feb 09 01:19:54 localhost kernel:  sysvec_call_function_single+0x71/0x90
Feb 09 01:19:54 localhost kernel:  </IRQ>
Feb 09 01:19:54 localhost kernel:  <TASK>
Feb 09 01:19:54 localhost kernel:  asm_sysvec_call_function_single+0x1a/0x20
Feb 09 01:19:54 localhost kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x1d/0x40
Feb 09 01:19:54 localhost kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d 74 4d e1 7c 74 05 e9 f0 2d 2e 00 0f 1f 44 00 00 e9 e6 2d
Feb 09 01:19:54 localhost kernel: RSP: 0018:ffffac7888fffa28 EFLAGS: 00000206
Feb 09 01:19:54 localhost kernel: RAX: 0000000000000000 RBX: ffffac7888fffbb8 RCX: 0000000000000000
Feb 09 01:19:54 localhost kernel: RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff9fb6c0245108
Feb 09 01:19:54 localhost kernel: RBP: ffff9fb6c0200000 R08: 0000000000000000 R09: 0000000000000000
Feb 09 01:19:54 localhost kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: 0000000000200b20
Feb 09 01:19:54 localhost kernel: R13: ffffac7888fffa78 R14: ffff9fb6c0244a30 R15: 0000000000000000
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  mes_v11_0_submit_pkt_and_poll_completion.constprop.0+0x284/0x440 [amdgpu]
Feb 09 01:19:54 localhost kernel:  mes_v11_0_misc_op+0xa0/0x150 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_mes_reg_write_reg_wait+0x66/0xc0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gmc_fw_reg_write_reg_wait+0x191/0x1f0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gmc_flush_gpu_tlb+0xe4/0x280 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gart_invalidate_tlb.part.0+0x5e/0x90 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gart_unbind+0x9c/0xd0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_ttm_backend_unbind+0x64/0xb0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_ttm_tt_unpopulate+0x16/0xd0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  ttm_tt_unpopulate+0x26/0x80 [ttm]
Feb 09 01:19:54 localhost kernel:  ttm_bo_cleanup_memtype_use+0x3a/0x70 [ttm]
Feb 09 01:19:54 localhost kernel:  ttm_bo_delayed_delete+0x44/0x80 [ttm]
Feb 09 01:19:54 localhost kernel:  process_one_work+0x176/0x330
Feb 09 01:19:54 localhost kernel:  worker_thread+0x252/0x390
Feb 09 01:19:54 localhost kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  kthread+0xcf/0x100
Feb 09 01:19:54 localhost kernel:  ? __pfx_kthread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ret_from_fork+0x31/0x50
Feb 09 01:19:54 localhost kernel:  ? __pfx_kthread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ret_from_fork_asm+0x1a/0x30
Feb 09 01:19:54 localhost kernel:  </TASK>
Feb 09 01:19:54 localhost kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
Feb 09 01:19:54 localhost kernel: watchdog: BUG: soft lockup - CPU#15 stuck for 26s! [kworker/u65:15:29746]
Feb 09 01:19:54 localhost kernel: CPU#15 Utilization every 4s during lockup:
Feb 09 01:19:54 localhost kernel:         #1:   0% system,          0% softirq,          0% hardirq,         17% idle
Feb 09 01:19:54 localhost kernel:         #2:   3% system,          0% softirq,          0% hardirq,        136% idle
Feb 09 01:19:54 localhost kernel:         #3:   1% system,          0% softirq,          0% hardirq,         35% idle
Feb 09 01:19:54 localhost kernel:         #4:   0% system,          0% softirq,          0% hardirq,         78% idle
Feb 09 01:19:54 localhost kernel:         #5:   0% system,          0% softirq,          0% hardirq,          0% idle
Feb 09 01:19:54 localhost kernel: Modules linked in: tun snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib ip_set vfat fat snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_acpi_amd_match snd_sof_amd_vangogh snd_sof_amd_rembrandt sn>
Feb 09 01:19:54 localhost kernel:  hid_sensor_trigger snd_acp_legacy_common hid_sensor_iio_common libarc4 kvm snd_seq_device industrialio_triggered_buffer snd_pci_acp6x kfifo_buf snd_pcm rapl industrialio wmi_bmof pcspkr snd_pci_acp5x cfg80211 snd_r>
Feb 09 01:19:54 localhost kernel: CPU: 15 UID: 0 PID: 29746 Comm: kworker/u65:15 Tainted: G             L     6.12.11-200.fc41.x86_64 #1
Feb 09 01:19:54 localhost kernel: Tainted: [L]=SOFTLOCKUP
Feb 09 01:19:54 localhost kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
Feb 09 01:19:54 localhost kernel: Workqueue: ttm ttm_bo_delayed_delete [ttm]
Feb 09 01:19:54 localhost kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x1d/0x40
Feb 09 01:19:54 localhost kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d 74 4d e1 7c 74 05 e9 f0 2d 2e 00 0f 1f 44 00 00 e9 e6 2d
Feb 09 01:19:54 localhost kernel: RSP: 0018:ffffac78861cba28 EFLAGS: 00000206
Feb 09 01:19:54 localhost kernel: RAX: 0000000000000000 RBX: ffffac78861cbbb8 RCX: 0000000000000000
Feb 09 01:19:54 localhost kernel: RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff9fb6c0245108
Feb 09 01:19:54 localhost kernel: RBP: ffff9fb6c0200000 R08: 0000000000000000 R09: 0000000000000000
Feb 09 01:19:54 localhost kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: 0000000000200b20
Feb 09 01:19:54 localhost kernel: R13: ffffac78861cba78 R14: ffff9fb6c0244a30 R15: 0000000000000000
Feb 09 01:19:54 localhost kernel: FS:  0000000000000000(0000) GS:ffff9fb85e980000(0000) knlGS:0000000000000000
Feb 09 01:19:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 09 01:19:54 localhost kernel: CR2: 00006d4c9aafecd8 CR3: 00000006c582a000 CR4: 0000000000f50ef0
Feb 09 01:19:54 localhost kernel: PKRU: 55555554
Feb 09 01:19:54 localhost kernel: Call Trace:
Feb 09 01:19:54 localhost kernel:  <IRQ>
Feb 09 01:19:54 localhost kernel:  ? watchdog_timer_fn.cold+0x233/0x311
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ? __hrtimer_run_queues+0x113/0x280
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  ? hrtimer_interrupt+0xfa/0x210
Feb 09 01:19:54 localhost kernel:  ? __sysvec_apic_timer_interrupt+0x52/0x100
Feb 09 01:19:54 localhost kernel:  </IRQ>
Feb 09 01:19:54 localhost kernel:  <TASK>
Feb 09 01:19:54 localhost kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Feb 09 01:19:54 localhost kernel:  ? _raw_spin_unlock_irqrestore+0x1d/0x40
Feb 09 01:19:54 localhost kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 09 01:19:54 localhost kernel:  mes_v11_0_submit_pkt_and_poll_completion.constprop.0+0x284/0x440 [amdgpu]
Feb 09 01:19:54 localhost kernel:  mes_v11_0_misc_op+0xa0/0x150 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_mes_reg_write_reg_wait+0x66/0xc0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gmc_fw_reg_write_reg_wait+0x191/0x1f0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gmc_flush_gpu_tlb+0xe4/0x280 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gart_invalidate_tlb.part.0+0x5e/0x90 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_gart_unbind+0x9c/0xd0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_ttm_backend_unbind+0x64/0xb0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  amdgpu_ttm_tt_unpopulate+0x16/0xd0 [amdgpu]
Feb 09 01:19:54 localhost kernel:  ttm_tt_unpopulate+0x26/0x80 [ttm]
Feb 09 01:19:54 localhost kernel:  ttm_bo_cleanup_memtype_use+0x3a/0x70 [ttm]
Feb 09 01:19:54 localhost kernel:  ttm_bo_delayed_delete+0x44/0x80 [ttm]
Feb 09 01:19:54 localhost kernel:  process_one_work+0x176/0x330
Feb 09 01:19:54 localhost kernel:  worker_thread+0x252/0x390
Feb 09 01:19:54 localhost kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  kthread+0xcf/0x100
Feb 09 01:19:54 localhost kernel:  ? __pfx_kthread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ret_from_fork+0x31/0x50
Feb 09 01:19:54 localhost kernel:  ? __pfx_kthread+0x10/0x10
Feb 09 01:19:54 localhost kernel:  ret_from_fork_asm+0x1a/0x30
Feb 09 01:19:54 localhost kernel:  </TASK>
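The excerpt above escalates in stages: repeated MES WAIT_REG_MEM timeouts, then "MES ring buffer is full", then fence fallback timeouts, and finally soft lockups in the ttm_bo_delayed_delete worker. A small Python sketch can report when each stage first appears and list the soft lockups. It assumes the default journalctl short output format shown here ("Mon DD HH:MM:SS host kernel: ..."); the stage labels are hypothetical names chosen for this example.

```python
import re

# Hypothetical stage labels -> substring that marks each stage in the kernel log.
STAGES = {
    "mes_timeout": "MES failed to respond",
    "mes_ring_full": "MES ring buffer is full",
    "fence_timeout": "Fence fallback timer expired",
}

# Matches e.g. "soft lockup - CPU#8 stuck for 23s! [kworker/u65:13:26756]".
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def timestamp(line: str) -> str:
    """Return the 'Mon DD HH:MM:SS' prefix of a journalctl short-format line."""
    return " ".join(line.split()[:3])


def summarize(lines):
    """Return (first-seen timestamp per stage, list of soft lockups)."""
    first_seen = {}
    lockups = []
    for line in lines:
        for stage, marker in STAGES.items():
            if marker in line and stage not in first_seen:
                first_seen[stage] = timestamp(line)
        match = LOCKUP_RE.search(line)
        if match:
            lockups.append(
                (timestamp(line), int(match["cpu"]), int(match["secs"]), match["task"])
            )
    return first_seen, lockups
```

Run over the excerpt above, this would show MES timeouts starting at 01:13:51, the MES ring filling at 01:16:47, fence timeouts from 01:18:09, and two soft lockups at 01:19:54 (CPU#8 for 23s and CPU#15 for 26s), which makes the roughly six-minute escalation easy to compare across incidents.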

Hi,

The log entries you posted point to a bug in the AMD GPU driver (amdgpu).
On Linux, two things affect this:

  1. The Linux kernel version (check it with uname -a).
  2. The Linux AMD firmware.
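A minimal Python sketch for collecting both items on Fedora. The kernel release comes from platform.release(), which is the same string uname -r prints. The firmware versions are read from the amdgpu debugfs file /sys/kernel/debug/dri/0/amdgpu_firmware_info, which assumes root access and a mounted debugfs; the DRI index 0 is an assumption and may differ on your system. The helper names are hypothetical.

```python
import platform
import re
from pathlib import Path

# Assumed debugfs path: needs root and debugfs mounted; the DRI index may not be 0.
FW_INFO = Path("/sys/kernel/debug/dri/0/amdgpu_firmware_info")


def parse_fedora_release(release: str) -> dict:
    """Split a Fedora kernel release like '6.12.11-200.fc41.x86_64' into parts."""
    match = re.fullmatch(
        r"(?P<version>\d+\.\d+\.\d+)-(?P<build>\d+)\.fc(?P<fedora>\d+)\.(?P<arch>\w+)",
        release,
    )
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return match.groupdict()


def amdgpu_firmware_info():
    """Return the debugfs firmware listing, or None if it cannot be read."""
    try:
        return FW_INFO.read_text()
    except OSError:  # file missing, debugfs not mounted, or not running as root
        return None


def report() -> str:
    """Build a short text report with the kernel release and firmware listing."""
    release = platform.release()  # same string that `uname -r` prints
    fw = amdgpu_firmware_info()
    fw_text = fw if fw is not None else "(unavailable; try running as root)"
    return f"kernel: {release}\nfirmware:\n{fw_text}"
```

For the kernel in the log above, parse_fedora_release("6.12.11-200.fc41.x86_64") splits out version 6.12.11, build 200, Fedora 41 and arch x86_64, which is handy when comparing against the kernel versions named in upstream bug reports.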

Also, since this is an amdgpu driver issue, it would be useful to report the bug to the AMD driver bug tracker here: