[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

I’m running into this problem and am hoping to fix it, but am new to Linux. How would I actually go about applying this patch?

I wouldn’t try an rc kernel, but you can apply the workaround for now. When 6.8 is released for your distro, you should remove it.

Fedora (reboot afterwards):

sudo grubby --update-kernel=ALL --args 'amdgpu.sg_display=0'

Ubuntu: follow this guide.
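
To confirm the workaround is actually active after the reboot, and to take it out again later, something like the following may help. This is a minimal sketch: `has_sg_display_workaround` is a hypothetical helper name, not part of any tool, and the removal line assumes Fedora's grubby with its `--remove-args` option.

```shell
# Return success if the given kernel command line contains amdgpu.sg_display=0.
# (Hypothetical helper; pass it the contents of /proc/cmdline.)
has_sg_display_workaround() {
    case " $1 " in
        *" amdgpu.sg_display=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system, after the reboot:
#   has_sg_display_workaround "$(cat /proc/cmdline)" && echo "workaround active"
#
# Removing the workaround once your distro ships 6.8 (Fedora, reboot afterwards):
#   sudo grubby --update-kernel=ALL --remove-args 'amdgpu.sg_display=0'
```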

Hi All,

I’m wondering how this is all going?
Since my post a couple of months ago, we’ve seen some Linux kernel (and therefore AMD GPU driver) updates, which seem to have greatly improved graphics stability.

What I am now finding (and I’m not sure whether this should be a new thread) is that the AMD GPU driver resets the GPU hardware, closing all application windows and restarting the UI, which in my case is Wayland.

It does seem to be a memory thing, as I mainly work with a browser and a few terminal windows using ssh.

The pattern I’ve seen: with maybe 10-15 browser tabs open, while working on the CLI over ssh to servers, everything suddenly goes black/blank without warning and then returns to either a GDM login or a desktop with no application windows.

dmesg shows:
[653723.632174] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[653897.788061] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[653912.815342] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[654076.162672] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654091.381333] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654117.063877] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[654203.486573] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[654462.866202] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[655525.281967] usb 3-1: USB disconnect, device number 5
[655623.531384] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[655703.681285] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[655964.993631] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[655972.063972] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[655981.833375] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656139.458248] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656139.458509] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[656192.653202] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656319.320017] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656329.733425] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656448.799002] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656462.800265] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656475.626385] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656513.306617] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656593.816364] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656651.151653] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[656753.797088] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[656806.629060] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[657086.685545] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[657122.544965] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657472.055380] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657644.117357] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[657733.382291] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[657924.159076] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658116.520939] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[658199.105193] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658339.210908] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[658356.361229] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[658384.413485] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[658406.711407] i2c_hid_acpi i2c-FRMW0005:00: failed to set a report to device: -121
[658641.746664] [drm:gfx_v11_0_priv_reg_irq [amdgpu]] ERROR Illegal register access in command stream
[658641.757000] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout, signaled seq=8072741, emitted seq=8072742
[658641.757204] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process Xwayland pid 41047 thread Xwayland:cs0 pid 41058
[658641.757330] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[658641.839808] [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Failed to initialize parser -125!
[658641.913434] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658641.913612] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.028328] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.028442] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.143113] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.143224] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.257835] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.257943] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.372496] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.372619] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.487318] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.487451] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.602141] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.602245] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.716935] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.717039] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658642.831764] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3
[658642.831876] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] ERROR failed to unmap legacy queue
[658643.065000] [drm:gfx_v11_0_hw_fini [amdgpu]] ERROR failed to halt cp gfx
[658643.066478] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[658643.076536] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[658643.077089] [drm] PCIE GART of 512M enabled (table at 0x0000008000500000).
[658643.077281] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[658643.079117] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[658643.080964] [drm] DMUB hardware initialized: version=0x08000500
[658643.086250] [drm] Watermarks table not configured properly by SMU
[658643.535578] [drm] kiq ring mec 3 pipe 1 q 0
[658643.538210] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[658643.538373] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[658643.539035] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[658643.539037] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[658643.539037] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[658643.539038] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[658643.539039] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[658643.539039] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[658643.539040] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[658643.539040] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[658643.539041] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[658643.539041] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[658643.539042] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
[658643.539043] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 1
[658643.539043] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[658643.542753] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[658643.542756] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[658643.542759] [drm] Skip scheduling IBs!
[658643.544000] [drm] ring gfx_32777.1.1 was added
[658643.545420] [drm] ring compute_32777.2.2 was added
[658643.547072] [drm] ring sdma_32777.3.3 was added
[658643.547085] [drm] ring gfx_32777.1.1 test pass
[658643.547213] [drm] ring gfx_32777.1.1 ib test pass
[658643.547223] [drm] ring compute_32777.2.2 test pass
[658643.547247] [drm] ring compute_32777.2.2 ib test pass
[658643.547781] [drm] ring sdma_32777.3.3 test pass
[658643.547852] [drm] ring sdma_32777.3.3 ib test pass
[658643.550427] amdgpu 0000:c1:00.0: amdgpu: GPU reset(4) succeeded!

It kind of looks like the reset is a planned operation by the driver, but the recovery is a complete failure from the user’s side: it’s very frustrating to have to re-open all your windows and log in to everything again.
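
To pull just the hang and reset events out of a log like the one above, a small filter can help. This is a rough sketch: the function names are made up for illustration, and the patterns are simply taken from the messages in this particular log, so other kernel versions may word them differently.

```shell
# Print only the lines that mark a GPU job timeout or reset.
# Reads a kernel log on stdin (patterns copied from the log above).
amdgpu_reset_events() {
    grep -E 'amdgpu_job_timedout|GPU reset|MODE2 reset'
}

# Count completed resets, i.e. lines like "GPU reset(4) succeeded!".
# Reads a kernel log on stdin; prints 0 when there are none.
count_gpu_resets() {
    grep -c -E 'GPU reset\([0-9]+\) succeeded' || true
}
```

For example, `sudo dmesg | amdgpu_reset_events` shows the events for the current boot, and `journalctl -k -b | count_gpu_resets` counts them. The number in parentheses in `GPU reset(4) succeeded!` looks like a running reset counter, but treat that reading as an assumption.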

Before everyone asks: yes, I’ve got the amdgpu kernel flag set and the BIOS is the latest version. I understand Debian 12 is not “officially” supported by Framework, but it’s the Linux kernel, right? Same same but different :wink: Oh, and all Debian security and regular updates are installed.

Kernel version:
Linux xxxx 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Running Gnome 45
Debian 12
64GB RAM

Oh FYI, this happens on the latest backports (BPO) kernel too, and there the battery life is terrible, roughly 75% less runtime, which I guess is to be expected for a non-optimised kernel build.

Thanks for your time everyone, keep up the great product FW people.

Paul.

That looks a lot like bugs that happen when you’re on old GPU firmware or an older mesa. Try upgrading both?

I have a similar issue: very colourful, mosaic-like flickering, different from the flickering caused by not setting amdgpu.sg_display=0. It usually happens on wake-up after disconnecting the laptop from a DP 3440x1440 external monitor during standby. This is Fedora 39 with kernel 6.7.5 and BIOS 3.03 on a 7840U with 2x 16GB RAM, and amdgpu.sg_display=0 is set. Changing the resolution after the flickering starts fixes it temporarily.
Here are the relevant log lines:

12:22:57 kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
12:22:56 wpa_supplicant: bgscan simple: Failed to enable signal strength monitoring
12:22:53 kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
12:22:53 kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
12:22:53 kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
12:22:53 kernel: ucsi_acpi USBC000:00: failed to re-enable notifications (-22)
12:22:53 kernel: ucsi_acpi USBC000:00: failed to re-enable notifications (-22)
12:22:53 kernel: ucsi_acpi USBC000:00: possible UCSI driver bug 1
00:39:07 kernel: i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)

Here is the amdgpu log:

[    3.466464] [drm] amdgpu kernel modesetting enabled.
[    3.473596] amdgpu: Virtual CRAT table created for CPU
[    3.473612] amdgpu: Topology: Add CPU node
[    3.473747] amdgpu 0000:c1:00.0: enabling device (0006 -> 0007)
[    3.478413] amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
[    3.478416] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[    3.511171] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    3.527169] amdgpu 0000:c1:00.0: vgaarb: deactivate vga console
[    3.527178] amdgpu 0000:c1:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    3.527279] amdgpu 0000:c1:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
[    3.527282] amdgpu 0000:c1:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    3.527553] [drm] amdgpu: 512M of VRAM memory ready
[    3.527556] [drm] amdgpu: 15635M of GTT memory ready.
[    3.528763] amdgpu 0000:c1:00.0: amdgpu: Will use PSP to load VCN firmware
[    4.099518] amdgpu 0000:c1:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    4.107830] amdgpu 0000:c1:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    4.107833] amdgpu 0000:c1:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    4.141052] amdgpu 0000:c1:00.0: amdgpu: SMU is initialized successfully!
[    4.210676] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    4.228091] amdgpu: HMM registered 512MB device memory
[    4.229221] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    4.229238] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    4.229701] amdgpu: Virtual CRAT table created for GPU
[    4.229870] amdgpu: Topology: Add dGPU node [0x15bf:0x1002]
[    4.229873] kfd kfd: amdgpu: added device 1002:15bf
[    4.229886] amdgpu 0000:c1:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[    4.229893] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    4.229895] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    4.229896] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    4.229897] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    4.229899] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    4.229900] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    4.229901] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    4.229902] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    4.229904] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    4.229905] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    4.229906] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    4.229907] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[    4.229908] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[    4.234785] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:c1:00.0 on minor 1
[    4.259676] fbcon: amdgpudrmfb (fb0) is primary device
[    4.259680] amdgpu 0000:c1:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   19.405236] snd_hda_intel 0000:c1:00.1: bound 0000:c1:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[  129.875944] amdgpu 0000:c1:00.0: Using 44-bit DMA addresses
[80357.725448] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[80357.725777] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[80357.728079] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[80357.731419] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[80357.731484] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[80357.837550] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[80357.837909] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[80357.837913] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[80357.837915] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[80357.837917] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[80357.837919] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[80357.837921] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[80357.837923] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[80357.837924] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[80357.837927] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[80357.837928] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[80357.837930] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[80357.837932] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[80357.837934] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[83303.686862] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[83303.686875] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080003f350000 from client 10
[83303.686881] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[83303.686885] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[83303.686889] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[83303.686893] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[83303.686896] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[83303.686899] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[83303.686902] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[83313.936525] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[85977.270448] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[85977.270462] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[85977.270468] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[85977.270472] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[85977.270476] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[85977.270480] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[85977.270483] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[85977.270486] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[85977.270489] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[85987.583335] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[87413.903259] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32771, for process firefox pid 5153 thread firefox:cs0 pid 5263)
[87413.903273] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x00004000bf004000 from client 10
[87413.903279] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[87413.903283] amdgpu 0000:c1:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[87413.903287] amdgpu 0000:c1:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[87413.903291] amdgpu 0000:c1:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[87413.903294] amdgpu 0000:c1:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[87413.903297] amdgpu 0000:c1:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[87413.903299] amdgpu 0000:c1:00.0: amdgpu: 	 RW: 0x0
[87424.246049] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[106001.728320] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[106001.728642] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[106001.731128] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[106001.734424] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[106001.734489] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[106001.840943] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[106001.841287] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[106001.841291] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[106001.841293] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[106001.841295] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[106001.841297] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[106001.841299] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[106001.841300] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[106001.841302] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[106001.841304] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[106001.841306] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[106001.841308] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[106001.841309] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[106001.841311] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[152930.943600] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[152930.943922] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[152930.946475] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[152930.949683] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[152930.949747] amdgpu 0000:c1:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[152931.056253] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[152931.056610] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[152931.056613] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[152931.056615] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[152931.056617] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[152931.056619] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[152931.056621] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[152931.056622] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[152931.056624] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[152931.056626] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[152931.056628] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[152931.056630] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[152931.056632] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[152931.056634] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0

In Jan, when resuming from suspend, I was getting the pink-blocky-flickering screen quite a bit. Suspending-and-resuming again would typically fix it.

Haven’t seen that issue for at least a couple weeks though, and I think the only thing I changed was the kernel.

Maybe try upgrading? I’m running 6.7.6-200.fc39.x86_64 and it’s feeling … I hate to jinx it, but it’s feeling pretty darn stable.

I still have amdgpu.sg_display=0 set, of course.

Yep, I can corroborate 6.7.6 seems better.

The new FL13 arrived a few days ago. I’ll be daily driving it for a few weeks while waiting for my FL16, and will then set it up for my sister.
I also experienced flickering/strobing at 30Hz between a full-screen white image with a few black spots and my normal screen. It happened after returning from sleep or when launching, for example, Rocket League, in both Plasma and Hyprland (both on Wayland; I didn’t try X11). Luckily I’m not epileptic, because that was really intense.
Since I’m on Arch, I was already on the newest kernel (6.7.5, upgraded to 6.7.6 in between), so I added the kernel parameter and switched to the game mode in UEFI.

After using it for a day, this seems to have fixed it!

The only issues remaining are a double tactile click in the bottom right of the touchpad (I might contact customer support, or fix it with some tape on the bottom or something) and Rocket League only launching to a black screen.

Not sure if my Rocket League issue relates to any of this, but I’ll just ask anyways.
Basically the game launches and stays black. It is still running, as I can hear music in the Epic Games version that definitely comes from Rocket League. It doesn’t crash, but I found the logs in the Proton prefix and the only issue they report is “Timed out while waiting for GPU to catch up. (500 ms)”.
Also, Dead by Daylight runs at less than 1 FPS while constantly loading two cores. Using nvtop I saw that the dedicated 4GB of VRAM is almost full, but it should be able to allocate up to 32GB of my 64GB of RAM (most of which was unused). I might try changing the memory clock speed from 5600MHz to 4800MHz and/or use memtest to check its stability. However, I doubt that this is actually the issue.
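
As a cross-check of what nvtop shows, the amdgpu driver exposes memory counters in sysfs. The sketch below assumes the `mem_info_vram_*` and `mem_info_gtt_*` files are present under the card's device directory, with values in bytes. `amdgpu_mem_usage` is a hypothetical helper, and the card index differs between systems.

```shell
# Print VRAM and GTT usage in MiB for one amdgpu device directory,
# e.g. /sys/class/drm/card1/device (the card index is an assumption).
amdgpu_mem_usage() {
    dev=$1
    for pool in vram gtt; do
        used=$(cat "$dev/mem_info_${pool}_used")
        total=$(cat "$dev/mem_info_${pool}_total")
        # The sysfs values are in bytes; convert to MiB for readability.
        echo "$pool: $((used / 1048576)) / $((total / 1048576)) MiB"
    done
}
```

Running `amdgpu_mem_usage /sys/class/drm/card1/device` while the game is open would show whether VRAM is full while GTT (the shared system memory pool) stays mostly empty.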

Is this the right thread to ask? Otherwise I’ll start a new one for this specific issue.

Might be worth taking this to another thread but …

For the Steam version I can only see the game if I switch Proton from Vulkan to OpenGL with this launch option:

PROTON_USE_WINED3D=1 %command%

I don’t want to jinx it, but I think my problems with grimshot and slurp are gone <3

Well, I’m back. It’s been a while, but I have had graphical corruption issues again. It started a few weeks ago, but I’ve been too busy to post. When I posted the first time, the specific corruption I had looked like what mopac posted in October (white blocky artifacts). To get rid of it I had to restart my computer. It occurred more often when the computer had been powered on for a few days. amdgpu.sg_display=0 ultimately fixed it and I haven’t had any issues for months.

This new graphical corruption looks more like what tokanda posted (colorful and with a seemingly more random distribution with minimal banding). It does not seem to matter how long my computer’s been up. I have only seen it happen after suspend or after my lock screen appears due to idling. This version of the corruption has also gone away without a restart, just by letting the computer sleep after idling (but that behavior is not consistent). It doesn’t happen very often (I’ve seen it probably two or three times) and I have no idea how to reproduce it.

@Kenneth_L_Rountree I presume you are running a Linux kernel earlier than 6.8. A lot of fixes for AMDGPU have gone into 6.8, and I’ve been running without “amdgpu.sg_display=0” for a few weeks now and haven’t had any issues in this regard anymore…
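
For anyone unsure which side of 6.8 they are on, a quick comparison against `uname -r` is enough. A minimal sketch, assuming GNU `sort -V` is available; `version_at_least` is a made-up helper name.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V), which is an assumption
# about the system this runs on.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   if version_at_least "$(uname -r)" 6.8; then
#       echo "running 6.8 or newer"
#   else
#       echo "older than 6.8, keep amdgpu.sg_display=0 for now"
#   fi
```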

Yes, 6.7.9, and dnf says I’m up to date. I thought dnf updated the kernel automatically. I guess I’ll look into updating it manually. Thanks.

@Kenneth_L_Rountree Well, I’m on NixOS and have been running release candidate kernels for the 6.8 series, but it is released now and I imagine it should appear pretty quickly in the Fedora repositories. I’m running 6.8.1 now, to be precise. Perhaps also of interest for other users: it appears that there is no need to run with “rtc_cmos.use_acpi_alarm=1” anymore since, iirc, this has become the new default for AMD systems too, thanks to @Mario_Limonciello. Getting there, IMHO…

I was encountering display strobing similar to what is described here, in addition to GPU hangs when playing 3D games, on Debian 12 (thread on the latter issue), even after backporting the kernel and firmware, and I suspect it’s a related issue. Working around it, for me, required both updating Mesa to a newer version than Debian provided (I eventually just moved to Debian testing), and setting the kernel parameter amdgpu.sg_display=0. I also turned on UMA_GAME_OPTIMIZED in BIOS settings at around the same time I set that kernel parameter.

Thanks Mario,
I’ve pulled Debian Trixie’s firmware packages and the firmware from the kernel git repo, copied it in for my current kernel, updated the initramfs and rebooted.

Let’s see if this stabilises things.

You’ll see 6.8 in Fedora 40. Since that’s the active branch, most 6.8 kernel builds will land there. The Fedora 40 beta was postponed to 3/26.
6.8.0-63.fc40.1.x86_64

I still manage to trigger it on each suspend with UMA_AUTO and amdgpu.sg_display=0 unset. I have also been able to trigger it reliably by changing GNOME’s scaling offset.

This is great news! I’ll remove it and try the suspend script again and see if it passes.

On Rawhide (40) and 6.8 kernels I have been able to trigger it with sg disabled and UMA pumped to 4GB when I am using HDMI external display.

I was running a trade booth screen off my Framework 13 a couple of weeks ago and managed to trigger it by disconnecting and reconnecting the external screen. The eDP panel remained unaffected, but I got the familiar whiteout/banding on the external display after a couple of plug/replug events.

I also still occasionally observe the white-screen flashing on Arch, which is currently on kernel 6.8.1, with UMA_GAME_OPTIMIZED enabled.

I need to do more testing to find out what, exactly, triggers it, but I’m leaning towards external monitors, XWayland, and/or GPU intensive applications (such as MS Windows games, which perhaps not coincidentally also tend to use XWayland). It would be interesting to find out if I can still reproduce the issue with Wayland native apps.

Also pointing out that switching back-and-forth between TTYs (ctrl+alt+F2/F3/F… in Gnome; or sudo chvt 2 on the terminal, where chvt 1 should switch you back) will generally restore stuff to normal. sudo systemctl soft-reboot also tends to work, but that tends to cause my wifi device to be unavailable.
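
The TTY round-trip can be scripted so it is a single command when the corruption shows up. This is only a sketch: `other_vt` and `bounce_vt` are hypothetical names, it assumes `fgconsole` and `chvt` from the kbd package are installed, and the one-second pause is a guess rather than a tested value.

```shell
# Pick a VT to bounce through: VT 2, unless we are already on VT 2, then VT 3.
other_vt() {
    if [ "$1" -eq 2 ]; then
        echo 3
    else
        echo 2
    fi
}

# Switch to another VT and back to the one we started on.
# fgconsole and chvt usually need root when run from a graphical session.
bounce_vt() {
    current=$(sudo fgconsole) || return 1
    sudo chvt "$(other_vt "$current")"
    sleep 1   # assumed pause; adjust if the display needs longer to settle
    sudo chvt "$current"
}
```

Binding `bounce_vt` to a keyboard shortcut would make it usable even when the screen is too corrupted to read.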

uhhh i need to try this, thanks :3