Edit: Happened again within 5 minutes. Switching to the full-card view in Google Photos seems to trigger it quite aggressively. I'll look into rocm-gdb. I have no idea how to use it yet, but if I figure it out I may be able to gather more data.
Edit2: Or maybe not… rocm-gdb appears to be a tool for debugging ROCm HIP kernels, unless I'm missing something.
Edit3: Stability is awful: a hang every 5-10 minutes. I rolled back the kernel to 6.18.3 to see whether this is a matter of my current workload or the kernel itself. linux-firmware also got updated in the meantime, but I assume the new release contains all the up-to-date firmware.
Edit4: The same thing is happening with 6.18.3. It may just be an impression, but it seems to trigger much faster on 6.18.4 - I managed to make it happen 3 times in a row within a 15-minute window, whereas it took me quite a while on 6.18.3 under a similar workload.
Happened again, so no, things are not resolved. This was while running the 6.18.4-arch kernel, browsing Google Photos with Brave.
The monitors didn't recover and remained black/disabled. This time I let it sit for a while before rebooting and took a look remotely. The hang log doesn't stand out, but this time I also caught hung tasks. I preserved the coredump and can provide it if needed.
[155552.381837] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32788)
[155552.381844] amdgpu 0000:c1:00.0: amdgpu: Process brave pid 3237 thread brave:cs0 pid 3262
[155552.381846] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x000000003f800000 from client 10
[155552.381848] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601430
[155552.381849] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[155552.381850] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x0
[155552.381851] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[155552.381852] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[155552.381852] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[155552.381853] amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
[155562.733993] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[155562.734971] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[155562.735061] amdgpu 0000:c1:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[155562.735063] amdgpu 0000:c1:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[155562.735064] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=5790839, emitted seq=5790841
[155562.735066] amdgpu 0000:c1:00.0: amdgpu: Process brave pid 3237 thread brave:cs0 pid 3262
[155562.735068] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[155564.738873] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[155564.738884] amdgpu 0000:c1:00.0: amdgpu: failed to reset legacy queue
[155564.738886] amdgpu 0000:c1:00.0: amdgpu: reset via MES failed and try pipe reset -110
[155564.738888] amdgpu 0000:c1:00.0: amdgpu: The CPFW hasn't support pipe reset yet.
[155564.738889] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failed
[155564.738891] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!. Source: 1
[155566.887593] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[155566.887599] amdgpu 0000:c1:00.0: amdgpu: failed to unmap legacy queue
[155567.076743] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[155567.078035] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[155567.104173] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[155567.104310] [drm] PCIE GART of 512M enabled (table at 0x000000801FB00000).
[155567.104324] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[155567.107817] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[155567.116905] amdgpu 0000:c1:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x09003600
[155567.123374] thunderbolt 0000:c3:00.6: 0: failed to allocate DP resource for port 7
[155577.582467] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155579.172466] thunderbolt 0000:c3:00.6: 0:6 <-> 702:10 (DP): not active, tearing down
[155587.822561] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155598.062430] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155608.302723] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155618.542853] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155628.783083] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155639.023064] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
[155649.263201] amdgpu 0000:c1:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
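For anyone hitting the same issue: this is roughly how I preserved the devcoredump mentioned in the log above before rebooting. The `card1` index comes from the dmesg line; it may differ on other systems. A sketch, assuming the standard devcoredump sysfs interface:

```shell
# Save the GPU coredump before rebooting -- the node disappears on reboot
# (and devcoredumps also expire on their own after a few minutes).
# card1 matches the path printed in dmesg above; adjust if needed.
cat /sys/class/drm/card1/device/devcoredump/data > amdgpu-devcoredump.bin

# Once saved, writing anything to the data node discards the dump.
echo 1 > /sys/class/drm/card1/device/devcoredump/data
```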
And the hung-task logs. A few kworkers are hanging, but all the traces are the same:
[155725.040900] INFO: task kworker/9:2:70120 blocked for more than 122 seconds.
[155725.040909] Tainted: G W OE 6.18.4-arch1-1 #1
[155725.040911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[155725.040912] task:kworker/9:2 state:D stack:0 pid:70120 tgid:70120 ppid:2 task_flags:0x4208060 flags:0x00080000
[155725.040918] Workqueue: events amdgpu_tlb_fence_work [amdgpu]
[155725.041145] Call Trace:
[155725.041146] <TASK>
[155725.041150] __schedule+0x418/0x1320
[155725.041159] ? ttwu_queue_wakelist+0xfe/0x120
[155725.041164] schedule+0x27/0xd0
[155725.041166] schedule_timeout+0xbd/0x100
[155725.041170] dma_fence_default_wait+0x196/0x270
[155725.041175] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[155725.041176] dma_fence_wait_timeout+0x129/0x150
[155725.041178] amdgpu_tlb_fence_work+0x2c/0xe0 [amdgpu 6422097874d6b256c402231ccda3be13871c9e72]
[155725.041274] process_one_work+0x193/0x350
[155725.041279] worker_thread+0x2d7/0x410
[155725.041281] ? __pfx_worker_thread+0x10/0x10
[155725.041282] kthread+0xfc/0x240
[155725.041285] ? __pfx_kthread+0x10/0x10
[155725.041286] ? __pfx_kthread+0x10/0x10
[155725.041286] ret_from_fork+0x1c2/0x1f0
[155725.041291] ? __pfx_kthread+0x10/0x10
[155725.041292] ret_from_fork_asm+0x1a/0x30
[155725.041297] </TASK>
I've seen some TLB fence changes in 6.18.4, not sure how related these are. Before this event I'd been using 6.18.3 for quite a while without issues, but gosh, I honestly feel that I was just lucky… /sad face/
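Next time it hangs I'll try to dump all blocked tasks over SSH with sysrq, to catch more than the one trace the hung-task detector printed. A sketch, assuming sysrq is available (the default mask may not allow the `w` function, hence the first line):

```shell
# Allow all sysrq functions (or use a mask that includes 'w')
echo 1 | sudo tee /proc/sys/kernel/sysrq

# Dump all blocked (D-state) tasks to the kernel log, then read the result
echo w | sudo tee /proc/sysrq-trigger
sudo dmesg | tail -n 200
```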