Amdgpu instability (6.13.4 + firmware 20250219)

I’m using openSUSE Tumbleweed (KDE Plasma + Wayland), and the last week or so has been painful in terms of GPU instability.

Sometimes the entire UI freezes for a few seconds; sometimes blocky artifacts appear here and there, or across the entire screen.

I’m on kernel 6.13.4-1-default (openSUSE’s build) and kernel-firmware-amdgpu 20250219.

Luckily I have snapshots, so I have now rolled back to 6.13.3-1-default and amdgpu firmware 20250206.

Here is an excerpt from my journal:

Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106200000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x1
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:128 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106207000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106206000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:8 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106205000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106204000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:160 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106203000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106202000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106201000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080010620d000 from client 10
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:6 pasid:32811)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:40 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080010620c000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
Feb 28 13:17:51 andromeda kernel: gmc_v11_0_process_interrupt: 30 callbacks suppressed
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e01000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0060113B
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x5
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e00000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0060113B
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x5
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e02000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e03000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e0a000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e08000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e09000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e0b000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e01000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:6 pasid:32811)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process chrome pid 10555 thread chrome:cs0 pid 10589)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800112e07000 from client 10
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0060113B
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x5
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x1
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
Feb 28 13:17:51 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
Feb 28 13:18:04 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
Feb 28 13:18:04 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
Feb 28 13:18:04 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
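For anyone trying to read the GCVM_L2_PROTECTION_FAULT_STATUS values: the sub-fields the kernel prints can be recovered from the raw register value. Here is a minimal Python sketch. The bit layout in FIELDS is an assumption inferred from the decoded values in the excerpt above (it reproduces both 0x00640051 and 0x0060113B), not something taken from AMD documentation, and decode_fault_status is just a hypothetical helper name.

```python
# ASSUMPTION: bit positions inferred from the log excerpt, not from AMD docs.
# Each entry is (shift, width) within the 32-bit status register.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault status value into the sub-fields shown in the log."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # The two raw values that appear in the excerpt above.
    for raw in (0x00640051, 0x0060113B):
        decoded = decode_fault_status(raw)
        fields = ", ".join(f"{k}={v:#x}" for k, v in decoded.items())
        print(f"{raw:#010x}: {fields}")
```

The raw value changes between the first burst of faults (client CB/DB) and the second (client TCP), so decoding it can help tell whether two reports are hitting the same kind of fault.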

Is this a known issue? Or should I ask my distro maintainers?

Thanks

@Mario_Limonciello FYI (sorry to bother you, but you’ve become my go-to person for amdgpu issues :smiley: )

6.13.4 (and versions before it) at least has a bug if you use Flatpaks: it will crash or freeze the whole system when you do something inside the Flatpak (for example, uploading a file to a Flatpak app).

The GCVM_L2_PROTECTION_FAULT_STATUS error also happens to me on an up-to-date Fedora with kernel 6.13.4 when using Firefox (it just occurred an hour ago while browsing LinkedIn).

Thanks for the feedback. Indeed, I also seem to notice a correlation with Flatpaks AND with Firefox (RPM package).

6.13.3 seems ok though, for the moment.

Oh, coming back to work after a while also seems to trigger this: right after I unlocked my KDE lock screen, my entire Plasma session started freezing.

Just curious, are you guys using Wayland or Xorg?

I’m on Wayland

Ah interesting. I’m on X11 and had the same thing happening twice this week. I wasn’t actively using flatpak apps.

I was able to reproduce most of this behavior on an AMD FW13 with openSUSE Tumbleweed and KDE Plasma, on 6.13.4 and also on 6.13.5, all on X11: Plasma freezing and blocky artifacts on the screen.

However, I do not have log entries like the ones OP pasted.

I do use Flatpaks though.

There were other issues with this board as well, see: USB C Error on boot – Framework has agreed to send me a new board, although I am skeptical that it’s a faulty board rather than a mix of kernel, driver, and BIOS issues.

I just had a hard freeze on 6.13.3, which I thought was immune. Happened while clicking a link in Firefox to open it in another tab. Firefox RPM, not flatpak.

Ctrl + Alt + F1 opened a TTY just fine, so it’s not like the entire system froze, just the graphics.

When trying to go back to the graphical session (on F7), all I could see was a message on the TTY - no way to go back to the DE. The message I saw is the one at the end of this excerpt (from sudo journalctl -k -b -1 | grep amdgpu):

Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170302000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00340051
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x1
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170303000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170308000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170307000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170300000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080017030b000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080017030e000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080017030a000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170306000 from client 10
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)
Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800170302000 from client 10
Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=10771392, emitted seq=10771394
Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Process information: process firefox pid 1079915 thread firefox:cs0 pid 1079989
Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Mar 05 15:01:18 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
Mar 05 15:01:18 andromeda kernel: [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
Mar 05 15:01:18 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
Mar 05 15:01:18 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
Mar 05 15:01:20 andromeda kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Mar 05 15:01:20 andromeda kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:225
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:233
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:241
Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:249
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Mar 05 15:01:21 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
Mar 05 15:01:24 andromeda kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
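Since these excerpts get long, here is a rough Python sketch for boiling kernel log lines down to a summary: page faults per process, ring timeouts, and full GPU resets. The function name summarize and the regular expressions are my own assumptions based on the message formats in the excerpt above; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the log lines in this thread (an assumption, not a
# stable kernel interface).
PROC_RE = re.compile(r"in process (\S+) pid \d+")
TIMEOUT_RE = re.compile(r"ring (\S+) timeout")
RESET_TEXT = "GPU reset begin!"


def summarize(lines):
    """Count page faults per process, ring timeouts per ring, and GPU resets."""
    faults = Counter()
    timeouts = Counter()
    resets = 0
    for line in lines:
        proc = PROC_RE.search(line)
        if proc:
            faults[proc.group(1)] += 1
            continue
        ring = TIMEOUT_RE.search(line)
        if ring:
            timeouts[ring.group(1)] += 1
        elif RESET_TEXT in line:
            resets += 1
    return {"faults": dict(faults), "ring_timeouts": dict(timeouts), "gpu_resets": resets}


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:3 pasid:32806)",
    "Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)",
    "Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:3 pasid:32806)",
    "Mar 05 15:01:05 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 1079915 thread firefox:cs0 pid 1079989)",
    "Mar 05 15:01:16 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=10771392, emitted seq=10771394",
    "Mar 05 15:01:18 andromeda kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!",
    "Mar 05 15:01:20 andromeda kernel: amdgpu 0000:c1:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:225",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on a real journal, pass an iterable of lines (for example the saved output of the journalctl command above) to summarize instead of SAMPLE.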

I am on an AMD FW 13 with openSUSE Tumbleweed and KDE Plasma (Wayland) as well, on kernel 6.13.5 now, but I’ve also used 6.13.4 before. I’ve had a few minor stability issues over the last few weeks, but nothing major. I haven’t seen blocky artifacts at all. Sometimes there’s a temporary freeze in Firefox when interacting with a page, but just waiting fixed the issue for me. I haven’t looked at the logs at all, though. I’ll watch out for freezes and check the logs if I notice anything.

Just had a freeze when waking up from standby; had to hard power off.

For the freeze while using Firefox, there’s an issue opened here:

I was getting a lot of graphical glitches for a while and now my login shell seems completely broken. FW13 AMD, Fedora 41, KDE Plasma, Wayland.

[gfxhub] page fault (src_id:0 ring:0 vmid:1 pasid:32777)
 in process kwin_wayland pid 7370 thread kwin_wayla:cs0 pid 7431)
  in page starting at address 0x0000800000000000 from client 10
GCVM_L2_PROTECTION_FAULT_STATUS:0x00101E11
     Faulty UTCL2 client ID: GCR (0xf)
     MORE_FAULTS: 0x1
     WALKER_ERROR: 0x0
     PERMISSION_FAULTS: 0x1
     MAPPING_ERROR: 0x0
     RW: 0x0

typed in manually from my phone ^

You deserve a prize for typing all that stuff :smiley:

For folks on Tumbleweed: I’m trying kernel-longterm, since I’m also seeing intermittent freezes on 6.13.3 and this is my main machine, so I can’t live with it.

Hoping this is a 6.13.x issue - kernel-longterm uses 6.12 for the moment. I’ve only been using it for a few minutes, so I can’t confirm it solves the problem.

More info here:

❯ uname -r
6.12.17-1-longterm

I think/hope you can do something similar on Fedora

FWIW, 6.13.5-1-default behaves the same. It takes a few minutes after resuming from suspend before artifacts start showing up, followed by the whole session going down. dmesg:

kernel: amdgpu 0000:c1:00.0: amdgpu:  in process steamwebhelper pid 22106 thread steamwebhe:cs0 pid 22110)
kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080010ca07000 from client 10
kernel: amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32781)
kernel: amdgpu 0000:c1:00.0: amdgpu:  in process steamwebhelper pid 22106 thread steamwebhe:cs0 pid 22110)
kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080010ca06000 from client 10
....
kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
kernel: amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=30603, emitted seq=30605
kernel: amdgpu 0000:c1:00.0: amdgpu: Process information: process steamwebhelper pid 22106 thread steamwebhe:cs0 pid 22110
kernel: amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
kernel: [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
kernel: amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
PackageKit[22395]: get-updates transaction /14_dcaecbbb from uid 1000 finished with success after 3164ms
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume

I’ve been getting this a bunch, too. FW13 AMD, Fedora 41, GNOME, Wayland

Currently trying 6.12.15 since that was still available in my grub menu.

For anybody on Fedora that wants to try older kernels from the official builds, you can get them from here

On systems that use DNF as their package manager:

  1. Install a 6.12 or older kernel (might still be present)
  2. Reboot to that kernel
  3. Remove 6.13 packages: dnf remove kernel*-6.13*
  4. Add the following config to /etc/dnf/versionlock.toml:
version = "1.0"

[[packages]]
name = "kernel*"
comment = "Exclude 6.13 due to stability issues"
[[packages.conditions]]
key = "evr"
comparator = ">="
value = "6.14"

This will effectively blacklist kernel 6.13 and its friends, but still allow updates to 6.14 and higher :slight_smile:
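To make the effect of that ">= 6.14" condition concrete, here is a small Python sketch of the intended filtering. It is a simplified model only: real DNF compares full RPM epoch-version-release strings, while this hypothetical is_update_allowed helper compares dotted numeric versions, and it assumes the condition selects which candidate kernel versions may be installed.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version like '6.13.6' into (6, 13, 6)."""
    return tuple(int(part) for part in text.split("."))


# Mirrors value = "6.14" with comparator = ">=" from the config above.
MINIMUM = parse_version("6.14")


def is_update_allowed(candidate: str) -> bool:
    """Simplified stand-in for the versionlock condition (an assumption,
    not DNF's actual EVR comparison)."""
    return parse_version(candidate) >= MINIMUM


if __name__ == "__main__":
    for version in ("6.12.17", "6.13.6", "6.14", "6.14.2"):
        verdict = "allowed" if is_update_allowed(version) else "filtered out"
        print(f"kernel {version}: {verdict}")
```

Note that under this model newer 6.12.x builds are filtered out as well, so the 6.12 kernel you booted into would stay at its installed version until 6.14 arrives.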

Just want to say that I have the same issue on Kernel 6.13.6, with openSUSE Tumbleweed + GNOME + Wayland on the Ryzen 5 7640U. I do also see the blocky artifacts on an external display. After the crash I get thrown out of my GDM session.

[ 9539.791102] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:7 pasid:32770)
[ 9539.791110] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791113] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c0a000 from client 10
[ 9539.791115] [      C7] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00740051
[ 9539.791117] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
[ 9539.791119] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[ 9539.791120] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 9539.791122] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[ 9539.791123] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 9539.791125] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x1
[ 9539.791128] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:7 pasid:32770)
[ 9539.791130] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791132] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c09000 from client 10
[ 9539.791136] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:7 pasid:32770)
[ 9539.791138] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791140] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c0b000 from client 10
[ 9539.791144] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:7 pasid:32770)
[ 9539.791146] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791148] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c08000 from client 10
[ 9539.791152] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:7 pasid:32770)
[ 9539.791153] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791155] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c06000 from client 10
[ 9539.791158] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:7 pasid:32770)
[ 9539.791160] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791162] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c0d000 from client 10
[ 9539.791165] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:7 pasid:32770)
[ 9539.791167] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791169] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c03000 from client 10
[ 9539.791173] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:7 pasid:32770)
[ 9539.791175] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791177] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c00000 from client 10
[ 9539.791181] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:7 pasid:32770)
[ 9539.791182] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791184] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c01000 from client 10
[ 9539.791187] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:7 pasid:32770)
[ 9539.791189] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process gnome-shell pid 2377 thread gnome-shel:cs0 pid 2460)
[ 9539.791191] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800101c04000 from client 10
[ 9549.842217] [  T77805] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[ 9549.844763] [  T77805] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[ 9549.844928] [  T77805] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
[ 9549.845096] [      C7] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:219 vmid:2 pasid:32773)
[ 9549.845101] [      C7] amdgpu 0000:c1:00.0: amdgpu:  in process firefox pid 5990 thread firefox:cs0 pid 6075)
[ 9549.845103] [      C7] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800000282000 from client 10
[ 9549.845106] [      C7] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x002009B6
[ 9549.845108] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPF (0x4)
[ 9549.845110] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 9549.845112] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x3
[ 9549.845113] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0xb
[ 9549.845115] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[ 9549.845117] [      C7] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[ 9560.082326] [  T77817] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State
[ 9560.085653] [  T77817] amdgpu 0000:c1:00.0: amdgpu: Dumping IP State Completed
[ 9560.095677] [  T77817] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=925484, emitted seq=925486
[ 9560.095696] [  T77817] amdgpu 0000:c1:00.0: amdgpu: Process information: process firefox pid 5990 thread firefox:cs0 pid 6075
[ 9560.095712] [  T77817] amdgpu 0000:c1:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[ 9562.099571] [  T77817] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=RESET
[ 9562.099588] [  T77817] [drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue
[ 9562.099765] [  T77817] amdgpu 0000:c1:00.0: amdgpu: Ring gfx_0.0.0 reset failure
[ 9562.099774] [  T77817] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 9564.326726] [  T77817] amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[ 9564.326738] [  T77817] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 9564.554526] [  T77817] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 9564.556659] [  T77817] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[ 9564.588115] [  T77817] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume

A fellow tumbleweeder, we’re growing by the day! :smiley:

Just in case you missed this :wink:

Unfortunately I just hit this problem on 6.12.15 as well. It doesn’t seem to be as frequent, but it still happens. I’ll probably try an earlier 6.12.x next.