[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

@Mario_Limonciello I’ve applied the patch you linked to 6.7.4 and I’m still seeing issues when unplugging my USB-C display. It coincides with the IOMMU reporting lots of errors:

Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: Using 44-bit DMA addresses
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc00000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc01000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc02000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc03000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc04000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc05000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc06000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc07000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc08000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc84000 flags=0x0000]
...
Feb 14 15:15:06 avalon kernel: amd_iommu_report_page_fault: 80096 callbacks suppressed
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c0000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c1000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c2000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c3000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c4000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c5000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c6000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c7000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3e0000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3cc000 flags=0x0000]

The addresses also look fishy. I only have 32 GB of RAM and the addresses are in the 16 TiB range?
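
For anyone wanting to sanity-check the magnitude of those addresses, here is a minimal shell sketch. The helper name `addr_to_size` is made up for illustration. As far as I understand, the values in IO_PAGE_FAULT lines are I/O virtual addresses handed out through the IOMMU rather than physical RAM addresses, so sitting far above the installed memory is not by itself a sign of trouble; treat that reading as an assumption, not a diagnosis.

```shell
# addr_to_size ADDR: split an address (hex such as 0xfffffc00000 is accepted)
# into whole TiB, GiB and MiB using integer arithmetic only.
addr_to_size() {
    local addr=$(( $1 ))
    local tib=$(( addr >> 40 ))
    local gib=$(( (addr >> 30) & 1023 ))
    local mib=$(( (addr >> 20) & 1023 ))
    echo "${tib} TiB ${gib} GiB ${mib} MiB"
}

addr_to_size 0xfffffc00000   # first fault in the log: 15 TiB 1023 GiB 1020 MiB
addr_to_size 0xffffe3c0000   # first fault after the suppressed callbacks: 15 TiB 1023 GiB 995 MiB
```

Both values land just under 16 TiB, which lines up with the "Using 44-bit DMA addresses" message, since 2^44 bytes is 16 TiB.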

The patch helps only with the graphical corruption, not this white screen + IOMMU issue that several people have reported. Yes, the IOMMU is just the messenger here. It’s still not clear where the bug that causes this actually is.

Ah, ok. Is there a known workaround besides disabling the IOMMU? Or an upstream bug report to follow?

Besides losing some capabilities in virtualizing PCIe devices, is there any downside to having the IOMMU disabled? I use this laptop mainly as a workstation and I do run some VMs, but not with PCIe devices assigned to them.

Apart from that, so far the issue still only happens after hibernate/suspend on Fedora 39 with kernel 6.7.3-200.fc39.x86_64, so no change as of yet :slight_smile:

I had issues with the sd-card controller on my old laptop without the IOMMU. iommu=soft worked around that.

The problem with disabling the IOMMU is that a buggy (or malicious) driver/device may now have an easier time corrupting memory in your system. That may or may not be an issue for you.

As a side note, I’m not sure if this is actually an IOMMU problem or a case of the GPU scribbling over random physical memory (on the driver’s mistaken directions) and mostly getting away with it unless the IOMMU catches it red-handed. Whether you’re fine allowing this to run and hoping for the best is for you to decide, of course.

Right, thanks, I’ll leave it enabled then (I only ever used it for virtualizing PCIe devices, but I never knew it also handled stuff like this ;-).

There’s an upstream bug report that was opened today.

So the graphical corruption isn’t specific to Framework, but to the AMD CPU?

Thus far it’s only been reported by users on Framework laptops. I have an educated but unsubstantiated suspicion it’s related to a BIOS interaction.

After enabling UMA_GAME_OPTIMIZED in the BIOS the issues became less frequent, but still appeared. I have also added amdgpu.sg_display=0 to my kernel boot parameters and so far the issues have disappeared. So the workaround posted by @Mario_Limonciello in the upstream bug seems to work just fine :slight_smile:
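
To double-check that the parameter really made it onto the running kernel's command line after a reboot, something like the following sketch should do. `has_karg` is a hypothetical helper name, not an existing tool; `/proc/cmdline` is the standard place where Linux exposes the boot arguments.

```shell
# has_karg CMDLINE PARAM: succeed if PARAM appears as a whole,
# space-separated word in CMDLINE (so "x=0" does not match "x=01").
has_karg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the running kernel; falls through to "not found" if /proc/cmdline
# is unreadable (for example on a non-Linux system).
if [ -r /proc/cmdline ] && has_karg "$(cat /proc/cmdline)" "amdgpu.sg_display=0"; then
    echo "amdgpu.sg_display=0 is on the kernel command line"
else
    echo "amdgpu.sg_display=0 not found on the kernel command line"
fi
```

The current value may also be readable at `/sys/module/amdgpu/parameters/sg_display` once the module is loaded, assuming your kernel exposes that parameter through sysfs.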

It will be interesting to see if the same issue is reproducible on the Zen 4 APU SKUs.

I haven’t seen it on the Zen 3 APUs I’ve tested with (5700G), but they are based on the much older Vega IP block.

There are at least some comments in the upstream report that the behavior doesn’t happen with 6.8-rc5. Perhaps those affected can try rc6 (it will have some other changes that fix PMF issues reported on Framework 13).

I’m running into this problem and am hoping to fix it, but am new to Linux. How would I actually go about applying this patch?

I wouldn’t try an rc kernel, but you can apply the workaround for now. When 6.8 is released for your distro, you should remove it.

Fedora (reboot afterwards):

sudo grubby --update-kernel=ALL --args 'amdgpu.sg_display=0'

Ubuntu: follow this guide.
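
Once a fixed kernel lands, the Fedora workaround can be undone with grubby's `--remove-args` option. A minimal sketch, wrapped in a function (the name `remove_sg_display_workaround` is purely illustrative) so nothing changes until you call it yourself:

```shell
# remove_sg_display_workaround: drop the amdgpu.sg_display=0 argument from
# every installed kernel entry, then print the entries so the result can be
# checked by eye. Requires Fedora's grubby and root privileges via sudo.
remove_sg_display_workaround() {
    sudo grubby --update-kernel=ALL --remove-args 'amdgpu.sg_display=0'
    sudo grubby --info=ALL
}

# Run it explicitly when you are ready, then reboot:
# remove_sg_display_workaround
```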

Hi All,

I’m wondering how this is all going?
Since my post a couple of months ago, we’ve seen some Linux kernel (and therefore AMD GPU driver) updates, which seem to have greatly improved the graphics stability.

What I am now finding (and I’m not sure if this should be a new thread) is that the AMD GPU driver resets the GPU hardware, closing all application windows and restarting the UI, which in my case is Wayland.

It does seem to be a memory thing, as I mainly work with a browser and a few terminal windows using ssh.

The pattern I’ve seen is, with a number of browser tabs open, maybe 10-15, and working via cli over ssh to servers, all of a sudden without warning, everything goes black/blank and then returns to either a GDM login or desktop screen with no application windows.

dmesg shows:
[653723.632174] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[653897.788061] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[653912.815342] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[654076.162672] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654091.381333] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654117.063877] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654203.486573] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[654462.866202] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[655525.281967] usb 3-1: USB disconnect, device number 5
[655623.531384] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[655703.681285] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[655964.993631] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[655972.063972] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[655981.833375] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656139.458248] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656139.458509] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[656192.653202] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656319.320017] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656329.733425] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656448.799002] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656462.800265] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656475.626385] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656513.306617] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656593.816364] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656651.151653] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656753.797088] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656806.629060] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[657086.685545] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[657122.544965] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657472.055380] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657644.117357] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[657733.382291] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657924.159076] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658116.520939] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[658199.105193] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658339.210908] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[658356.361229] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[658384.413485] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[658406.711407] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658641.746664] [drm:gfx_v11_0_priv_reg_irq [amdgpu]] ERROR Illegal register access in command stream
[658641.757000] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout, signaled seq=8072741, emitted seq=8072742
[658641.757204] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process Xwayland pid 41047 thread Xwayland:cs0 pid 41058
[658641.757330] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[658641.839808] [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Failed to initialize parser -125!
[658641.913434] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658641.913612] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.028328] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.028442] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.143113] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.143224] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.257835] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.257943] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.372496] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.372619] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.487318] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.487451] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.602141] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.602245] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.716935] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.717039] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.831764] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.831876] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658643.065000] [drm:gfx_v11_0_hw_fini [amdgpu]] ERROR failed to halt cp gfx
[658643.066478] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[658643.076536] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[658643.077089] [drm] PCIE GART of 512M enabled (table at 0x0000008000500000).
[658643.077281] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[658643.079117] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[658643.080964] [drm] DMUB hardware initialized: version=0x08000500
[658643.086250] [drm] Watermarks table not configured properly by SMU
[658643.535578] [drm] kiq ring mec 3 pipe 1 q 0
[658643.538210] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[658643.538373] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[658643.539035] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[658643.539037] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[658643.539037] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[658643.539038] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[658643.539039] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[658643.539039] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[658643.539040] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[658643.539040] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[658643.539041] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[658643.539041] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[658643.539042] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
[658643.539043] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 1
[658643.539043] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[658643.542753] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[658643.542756] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[658643.542759] [drm] Skip scheduling IBs!
[658643.544000] [drm] ring gfx_32777.1.1 was added
[658643.545420] [drm] ring compute_32777.2.2 was added
[658643.547072] [drm] ring sdma_32777.3.3 was added
[658643.547085] [drm] ring gfx_32777.1.1 test pass
[658643.547213] [drm] ring gfx_32777.1.1 ib test pass
[658643.547223] [drm] ring compute_32777.2.2 test pass
[658643.547247] [drm] ring compute_32777.2.2 ib test pass
[658643.547781] [drm] ring sdma_32777.3.3 test pass
[658643.547852] [drm] ring sdma_32777.3.3 ib test pass
[658643.550427] amdgpu 0000:c1:00.0: amdgpu: GPU reset(4) succeeded!

It kind of looks like this is a planned operation by the driver, but the recovery is a complete failure from the user’s point of view: it’s very frustrating to have to re-open all your windows and log in to everything again.
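
To see how often this happens and which process the driver names, the log can be summarised with a small filter. This is only a sketch: `summarize_gpu_resets` is a hypothetical helper, and it keys on the exact message strings shown in the dmesg output above, which may differ between kernel versions.

```shell
# summarize_gpu_resets: read kernel log text on stdin, count "GPU reset begin!"
# lines and tally the process names from "Process information" lines.
summarize_gpu_resets() {
    awk '
        /GPU reset begin!/ { resets++ }
        /Process information: process / {
            sub(/.*Process information: process /, "")  # strip everything before the name
            sub(/ pid .*/, "")                          # strip the pid/thread details
            procs[$0]++
        }
        END {
            printf "GPU resets: %d\n", resets
            for (p in procs) printf "blamed process: %s (%d)\n", p, procs[p]
        }
    '
}
```

Feed it the kernel log, for example with `sudo dmesg | summarize_gpu_resets` or `journalctl -k | summarize_gpu_resets`.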

So before everyone asks: yes, I’ve got the amdgpu kernel flag set, and the BIOS is the latest version. I understand Debian 12 is not “officially” supported by Framework, but it’s the Linux kernel, right? Same same but different :wink: Oh, and all Debian security and other updates are installed.

Kernel version:
Linux xxxx 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Running Gnome 45
Debian 12
64GB RAM

Oh FYI, this happens on the latest BPO kernel too, but the battery life sux, like 75% less time, which I guess is to be expected for a non-optimised kernel build.

Thanks for your time everyone, keep up the great product FW people.

Paul.

That looks a lot like the bugs that happen when you’re on old GPU firmware or an older Mesa. Try upgrading both?
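
A quick way to see what you are currently running before upgrading is sketched below. The helper name `mesa_version` is made up; `glxinfo` comes from the mesa-utils package on Debian, and `firmware-amd-graphics` is assumed here to be the package that carries the amdgpu firmware on Debian 12, so adjust for your distro.

```shell
# mesa_version: read `glxinfo -B` style output on stdin and print the first
# Mesa version number found (prints nothing if no "Mesa <version>" appears).
mesa_version() {
    sed -n 's/.*Mesa \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example with a line in the format glxinfo prints:
echo 'OpenGL version string: 4.6 (Compatibility Profile) Mesa 22.3.6' | mesa_version
# prints 22.3.6
```

On a live system that would be `glxinfo -B | mesa_version`, plus `dpkg-query -W firmware-amd-graphics` for the installed firmware package version.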

I have a similar issue: very colourful, mosaic-like flickering, different from the flickering caused by not setting amdgpu.sg_display=0. It usually happens on wake-up after disconnecting the laptop from a 3440x1440 DisplayPort external monitor during standby. This is on Fedora 39 with kernel 6.7.5 and BIOS 3.03, on a 7840U with 2 x 16 GB of RAM, and I have set amdgpu.sg_display=0. It can be temporarily resolved by changing the resolution after the flickering occurs.
Here is the relevant log:

12:22:57 kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
12:22:56 wpa_supplicant: bgscan simple: Failed to enable signal strength monitoring
12:22:53 kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
12:22:53 kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
12:22:53 kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
12:22:53 kernel: ucsi_acpi USBC000:00: failed to re-enable notifications (-22)
12:22:53 kernel: ucsi_acpi USBC000:00: failed to re-enable notifications (-22)
12:22:53 kernel: ucsi_acpi USBC000:00: possible UCSI driver bug 1
00:39:07 kernel: i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)

Here is the amdgpu log:

[    3.466464] [drm] amdgpu kernel modesetting enabled.
[    3.473596] amdgpu: Virtual CRAT table created for CPU
[    3.473612] amdgpu: Topology: Add CPU node
[    3.473747] amdgpu 0000:c1:00.0: enabling device (0006 -> 0007)
[    3.478413] amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
[    3.478416] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[    3.511171] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    3.527169] amdgpu 0000:c1:00.0: vgaarb: deactivate vga console
[    3.527178] amdgpu 0000:c1:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    3.527279] amdgpu 0000:c1:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
[    3.527282] amdgpu 0000:c1:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    3.527553] [drm] amdgpu: 512M of VRAM memory ready
[    3.527556] [drm] amdgpu: 15635M of GTT memory ready.
[    3.528763] amdgpu 0000:c1:00.0: amdgpu: Will use PSP to load VCN firmware
[    4.099518] amdgpu 0000:c1:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    4.107830] amdgpu 0000:c1:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    4.107833] amdgpu 0000:c1:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    4.141052] amdgpu 0000:c1:00.0: amdgpu: SMU is initialized successfully!
[    4.210676] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    4.228091] amdgpu: HMM registered 512MB device memory
[    4.229221] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    4.229238] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    4.229701] amdgpu: Virtual CRAT table created for GPU
[    4.229870] amdgpu: Topology: Add dGPU node [0x15bf:0x1002]
[    4.229873] kfd kfd: amdgpu: added device 1002:15bf
[    4.229886] amdgpu 0000:c1:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[    4.229893] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    4.229895] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    4.229896] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    4.229897] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    4.229899] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    4.229900] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    4.229901] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    4.229902] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    4.229904] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    4.229905] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    4.229906] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    4.229907] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[    4.229908] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[    4.234785] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:c1:00.0 on minor 1
[    4.259676] fbcon: amdgpudrmfb (fb0) is primary device
[    4.259680] amdgpu 0000:c1:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   19.405236] snd_hda_intel 0000:c1:00.1: bound 0000:c1:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[  129.875944] amdgpu 0000:c1:00.0: Using 44-bit DMA addresses
[80357.725448] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[80357.725777] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[80357.728079] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[80357.731419] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[80357.731484] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[80357.837550] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[80357.837909] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[80357.837913] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[80357.837915] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[80357.837917] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[80357.837919] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[80357.837921] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[80357.837923] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[80357.837924] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[80357.837927] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[80357.837928] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[80357.837930] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[80357.837932] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[80357.837934] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[83303.686862] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[83303.686875] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080003f350000 from client 10
[83303.686881] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[83303.686885] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[83303.686889] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[83303.686893] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[83303.686896] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[83303.686899] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[83303.686902] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[83313.936525] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[85977.270448] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[85977.270462] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[85977.270468] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[85977.270472] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[85977.270476] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[85977.270480] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[85977.270483] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[85977.270486] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[85977.270489] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[85987.583335] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[87413.903259] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[87413.903273] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x00004000bf004000 from client 10
[87413.903279] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[87413.903283] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[87413.903287] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[87413.903291] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[87413.903294] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[87413.903297] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[87413.903299] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[87424.246049] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[106001.728320] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[106001.728642] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[106001.731128] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[106001.734424] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[106001.734489] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[106001.840943] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[106001.841287] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[106001.841291] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[106001.841293] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[106001.841295] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[106001.841297] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[106001.841299] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[106001.841300] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[106001.841302] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[106001.841304] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[106001.841306] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[106001.841308] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[106001.841309] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[106001.841311] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[152930.943600] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[152930.943922] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[152930.946475] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[152930.949683] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[152930.949747] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[152931.056253] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[152931.056610] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[152931.056613] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[152931.056615] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[152931.056617] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[152931.056619] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[152931.056621] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[152931.056622] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[152931.056624] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[152931.056626] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[152931.056628] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[152931.056630] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[152931.056632] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[152931.056634] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0

In January, when resuming from suspend, I was getting the pink-blocky-flickering screen quite a bit. Suspending and resuming again would typically fix it.

Haven’t seen that issue for at least a couple weeks though, and I think the only thing I changed was the kernel.

Maybe try upgrading? I’m running 6.7.6-200.fc39.x86_64 and it’s feeling … I hate to jinx it, but it’s feeling pretty darn stable.

I still have amdgpu.sg_display=0 set, of course.

Yep, I can corroborate 6.7.6 seems better.