[TRACKING] Graphical corruption in Fedora 39 (AMD 3.03 BIOS)

The package in testing is amd-gpu-firmware-20231030-1.fc39 (Fedora Packages).

I pulled apart the change notes and got:

  • Update to upstream 20231030 release
  • Update firmware file for Intel Bluetooth AX203/AX210/AX211/
  • Update firmware file for Intel Bluetooth Magnetor AX101/AX201/AX211
  • rtl_nic: update firmware of RTL8156B
  • Update AMD cpu microcode
  • amdgpu: update SMU 13.0.0 firmware
  • add Amlogic bluetooth firmware
  • i915: Add GuC v70.13.1 for DG2, TGL, ADL-P and MTL
  • iwlwifi: add a missing FW from core80-39 release
  • WHENCE: add symlink for BananaPi M64
  • i915: Update MTL DMC to v2.17
  • amdgpu: update various firmware from 5.7 branch
  • iwlwifi: add FWs for new GL and MA device types with multiple RF modules
  • amd_pmf: Add initial PMF TA for Smart PC Solution Builder
  • Update FW files for MRVL PCIE 8997 chipsets
  • rtl_bt: Update RTL8851B BT USB firmware to 0x048A_D230
  • iwlwifi: add new FWs from core81
  • iwlwifi: update cc/Qu/QuZ firmwares for core81-65 release

I’d like to report that I’m seeing the same type of artifacts (big white squares) on Debian Trixie (current testing, future Debian 13) with bios 3.03 and kernel 6.5.8 using Plasma 5.27.8.

So it seems the issue is not Fedora-specific.

While this is admittedly a cool effect, I was able to trigger this issue again with a full-screen application (Skyrim Special Edition) using non-windowed mode in native Steam.

This was with 6.5.9-300 and the 20231030 amd-gpu-firmware package.

This is with 32GB of Framework-supplied DDR5 RAM and no external monitors attached (although replicated with an external monitor as well). 2x USB-C, 2X USB-A expansion cards.

I am definitely seeing an improvement with KDE, 3.03, Linux 6.5.9, and the new 20231030 firmware. Unplugging and plugging back in a dock no longer causes a rapidly flashing white artifact. I am not using amdgpu.sg_display=0 either.

Very exciting to see this much progress.

Per AMD themselves, please try this and report back:


If you are still experiencing this, can you enable the following BIOS setting as a test and report back whether it helps:
Advanced->iGPU Configuration->UMA_GAME_OPTIMIZED

This will allocate more memory to the GPU.
I discussed this with AMD, and they suspect that these visual artifacts are caused by high memory allocation on the GPU. So providing more memory may help alleviate this issue.


This is actually a great idea and aligns well with what is described in the Fedora Bugzilla thread.
There, they speculate that the issue is associated with near-full VRAM usage.

If I can reliably reproduce that with the WebGL fish demo, I’ll report back.
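
To check the near-full-VRAM theory while reproducing, one option is to poll the amdgpu sysfs counters. This is only a minimal sketch. It assumes the GPU is exposed at /sys/class/drm/card1/device (the card index is an assumption and may differ on your machine) and that the driver provides the mem_info_vram_used and mem_info_vram_total files there. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Tuple


def read_vram_bytes(device_dir: Path) -> Tuple[int, int]:
    """Return (used, total) VRAM in bytes from amdgpu's sysfs counters."""
    used = int((device_dir / "mem_info_vram_used").read_text().strip())
    total = int((device_dir / "mem_info_vram_total").read_text().strip())
    return used, total


def percent_used(used: int, total: int) -> float:
    """Percentage of VRAM in use; 0.0 if the total is reported as zero."""
    return 100.0 * used / total if total else 0.0
```

Calling `percent_used(*read_vram_bytes(Path("/sys/class/drm/card1/device")))` in a loop while the demo runs should show whether the artifacts start as usage approaches 100%.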

Oh, maybe that is why I never experienced this. Certainly worth a try for the people affected! The only time I saw it was in a memory-heavy game quite a while ago, and I toggled that setting quite a while ago as well.

  • Happens regardless of whether on AC power or battery
  • Internal display (have not tried an external display yet)
  • Have not tested with a live USB (if this would be helpful, I don’t mind testing it)

I’ve observed this on sway and i3. It occurs most frequently after waking from sleep if I did not shut the laptop lid myself. Once it occurs in a given boot, I can make it go away by killing and restarting sway or i3 (whichever I’m using). However, the longer the laptop has been on, the more frequently this occurs. After around a week or so it becomes so frequent that I just reboot. Rebooting keeps it at bay for a while (sometimes a day or two, sometimes only a few hours). I originally thought this was a Wayland issue until I tried switching to i3 to get rid of it (which failed; however, the only times I got it to happen on X11 were by messing with the refresh rates, or by having it happen in sway and switching to i3 before killing sway). This happens at all the scales I’ve tried, fractional or not, though it seems to happen less frequently at 1.

It was always in the back of my head to check the Framework forums, and I feel silly for not checking sooner.

I skimmed this thread and I saw:

  • Updating the BIOS to 3.03 (but someone reported that made it worse)
  • Adding the amdgpu.sg_display=0 kernel param (is that X11-specific?)

Which is the correct fix?

Also, do you still need people to file bug reports? And who do we file with: Red Hat, Fedora, or AMD? All have been mentioned in this thread.
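
On the kernel param question: amdgpu.sg_display is an option of the amdgpu kernel driver, so it is set on the kernel command line and is not tied to X11 or Wayland. When reporting results, it helps to confirm whether it was actually in effect for that boot. A rough sketch, assuming you read the command line from /proc/cmdline. The helper name is made up for illustration, and keeping the last occurrence of a repeated option is just a choice made in this sketch.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # if repeated, this sketch keeps the last one
    return value
```

For example, `kernel_param(open("/proc/cmdline").read(), "amdgpu.sg_display")` gives `"0"` if the workaround is active and `None` if the option was not passed.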

All,

Please try what is suggested here and report back.

We’re actively tracking this and need your A/B testing to confirm if this is helping.

Trying to understand this… the A/B testing is to see if this “may help alleviate this issue”… and not necessarily a final fix? Correct?

Yes. On/Off, UMA_GAME_OPTIMIZED in use or not.
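
For the A/B runs, it may be worth recording the VRAM size the driver reports on each boot, so every result is labelled with the setting that was really in effect. A small sketch, assuming the amdgpu driver logs a boot line of the form `amdgpu: VRAM: 4096M 0x... - 0x... (4096M used)` in dmesg. The regex and function name are illustrative only.

```python
import re
from typing import Optional

# Assumed line format: "... amdgpu: VRAM: 4096M 0x... - 0x... (4096M used)"
VRAM_RE = re.compile(r"amdgpu: VRAM: (\d+)M\b")


def vram_carveout_mib(dmesg_text: str) -> Optional[int]:
    """Return the VRAM size in MiB from the first matching dmesg line, if any."""
    match = VRAM_RE.search(dmesg_text)
    return int(match.group(1)) if match else None
```

Feed it the text of `dmesg` (which may need root) and note the number next to each "artifacts seen / not seen" report.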

UMA_GAME_OPTIMIZED is the proper workaround for me (as it increases VRAM).
But I can still reproduce it if I max out the VRAM another way (checked with “radeontop”):

// (write 2g from GTT to VRAM 1000 times)
amdgpu_stress -b g 2g -b v 2g -c 1 2 2g 1000

When I launch a YouTube video in fullscreen in Firefox, or attach my external monitor (4K 120 Hz, HDMI 2.1), it looks like all the pictures in this thread.
The log shows lots of:
amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffb1580000 flags=0x0000]

I am using Arch and KDE Plasma.
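
To put a number on "lots of" when comparing runs, the IO_PAGE_FAULT lines can be counted from a saved log. A minimal sketch, assuming the log lines look like the one quoted above. The regex and function name are illustrative, not part of any tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "[IO_PAGE_FAULT domain=0x0005 address=0xfffb1580000 flags=0x0000]"
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT domain=(0x[0-9a-fA-F]+) "
    r"address=(0x[0-9a-fA-F]+) flags=(0x[0-9a-fA-F]+)"
)


def count_page_faults(lines: Iterable[str]) -> Counter:
    """Count IO_PAGE_FAULT events per IOMMU domain ID."""
    counts: Counter = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running it once with UMA_GAME_OPTIMIZED on and once with it off, under the same stress command, gives a simple count to compare.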

If you have 32GB or more of RAM, UMA Game Optimized should really be set. Ideally this should be a tunable allocation in the BIOS. Some applications balk at the 4GB and refuse to load, which leads to dodgy hacks such as overriding the HII database, e.g. GitHub - DavidS95/Smokeless_UMAF

Frame.work, can you please make this user-selectable in the next BIOS release, with 512MB, 4GB and 8GB options?


We can only set the optimized option on or off, as this is only a binary flag we can send to the AMD firmware.
I also wish we had better control of this, so we could offer you more granular options.

Just as a note, if the GPU runs out of RAM, it will continue to allocate memory from system memory.


Initially, with BIOS 3.02, I was seeing graphical corruption, but after looking at the BIOS settings I enabled UMA Game Optimized as well. Since then I have not seen the graphical corruption or screen flickering. But I have another issue: when I connect my 4K display through the DisplayPort module and the screens go dark after inactivity, the laptop hangs or becomes extremely slow on resume.

This issue doesn’t happen with the HDMI cable and my 4K monitor, nor with DisplayPort and another 1440p monitor I have.

After switching to BIOS 3.03, I see the same issues: graphical corruption without UMA Game Optimized, and a system hang on resume with the 4K DisplayPort display with UMA Game Optimized on. I managed to disconnect the display and was patient enough to dump the dmesg for BIOS 3.03. (I have the 7840U variant fitted with 96GB (2x48GB) of Crucial DRAM, running Fedora 39 with GNOME; kernel: 6.5.8-300.fc39.x86_64.) Here are some of the messages after grepping for amdgpu:

[    3.168357] [drm] amdgpu kernel modesetting enabled.
[    3.175448] amdgpu: CRAT table disabled by module option
[    3.175454] amdgpu: Virtual CRAT table created for CPU
[    3.175485] amdgpu: Topology: Add CPU node
[    3.175631] amdgpu 0000:c1:00.0: enabling device (0006 -> 0007)
[    3.180188] amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
[    3.180189] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[    3.212899] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    3.237832] amdgpu 0000:c1:00.0: vgaarb: deactivate vga console
[    3.237839] amdgpu 0000:c1:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    3.237945] amdgpu 0000:c1:00.0: amdgpu: VRAM: 4096M 0x0000008000000000 - 0x00000080FFFFFFFF (4096M used)
[    3.237947] amdgpu 0000:c1:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    3.237948] amdgpu 0000:c1:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    3.238208] [drm] amdgpu: 4096M of VRAM memory ready
[    3.238211] [drm] amdgpu: 46098M of GTT memory ready.
[    3.241589] amdgpu 0000:c1:00.0: amdgpu: Will use PSP to load VCN firmware
[    3.783043] amdgpu 0000:c1:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    3.791355] amdgpu 0000:c1:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    3.791359] amdgpu 0000:c1:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    3.812385] amdgpu 0000:c1:00.0: amdgpu: SMU is initialized successfully!
[    3.904880] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    3.923140] amdgpu: HMM registered 4096MB device memory
[    3.924242] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    3.924247] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    3.924502] amdgpu: Virtual CRAT table created for GPU
[    3.925052] amdgpu: Topology: Add dGPU node [0x15bf:0x1002]
[    3.925054] kfd kfd: amdgpu: added device 1002:15bf
[    3.925069] amdgpu 0000:c1:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[    3.925214] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    3.925216] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    3.925217] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    3.925219] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    3.925220] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    3.925221] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    3.925222] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    3.925223] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    3.925224] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    3.925225] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    3.925226] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    3.925227] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[    3.925229] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[    3.931466] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:c1:00.0 on minor 1
[    3.936982] fbcon: amdgpudrmfb (fb0) is primary device
[    3.936986] amdgpu 0000:c1:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   20.345718] snd_hda_intel 0000:c1:00.1: bound 0000:c1:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 1136.599129] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[ 1136.599367] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 1136.601708] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 1136.605286] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 1136.711339] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 1136.711598] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 1136.711601] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 1136.711603] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 1136.711604] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 1136.711605] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 1136.711606] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 1136.711607] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 1136.711608] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 1136.711610] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 1136.711611] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 1136.711612] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 1136.711613] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 1136.711614] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 3322.975833] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[ 3322.976074] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 3322.978530] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[ 3322.981620] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[ 3323.355351] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 3323.355605] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 3323.355608] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3323.355610] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3323.355611] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 3323.355612] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 3323.355613] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 3323.355614] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 3323.355615] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 3323.355616] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 3323.355617] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 3323.355619] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 3323.355620] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 3323.355621] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 3381.540086] [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
[ 3409.925738] WARNING: CPU: 5 PID: 536 at drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1527 dp_retrieve_lttpr_cap+0x16f/0x1a0 [amdgpu]
[ 3409.926030]  hid_sensor_iio_common snd_timer irqbypass snd_acp_config industrialio_triggered_buffer kfifo_buf snd_soc_acpi rapl snd thunderbolt industrialio rfkill soundcore snd_pci_acp3x pcspkr i2c_piix4 k10temp amd_pmf joydev amd_pmc platform_profile loop zram dm_crypt amdgpu i2c_algo_bit drm_ttm_helper ttm drm_suballoc_helper amdxcp iommu_v2 crct10dif_pclmul drm_buddy nvme crc32_pclmul gpu_sched crc32c_intel polyval_clmulni polyval_generic nvme_core drm_display_helper video ucsi_acpi ghash_clmulni_intel hid_sensor_hub hid_multitouch sha512_ssse3 typec_ucsi ccp sp5100_tco cec typec nvme_common wmi i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse
[ 3409.926084] Workqueue: events_highpri dm_irq_work_func [amdgpu]
[ 3409.926264] RIP: 0010:dp_retrieve_lttpr_cap+0x16f/0x1a0 [amdgpu]
[ 3409.926447]  ? dp_retrieve_lttpr_cap+0x16f/0x1a0 [amdgpu]
[ 3409.926608]  ? dp_retrieve_lttpr_cap+0x16f/0x1a0 [amdgpu]
[ 3409.926761]  ? dp_retrieve_lttpr_cap+0x16f/0x1a0 [amdgpu]
[ 3409.926890]  ? dp_retrieve_lttpr_cap+0x114/0x1a0 [amdgpu]
[ 3409.927020]  retrieve_link_cap+0x7d/0xb90 [amdgpu]
[ 3409.927155]  ? dp_is_sink_present+0xbc/0x120 [amdgpu]
[ 3409.927284]  detect_link_and_local_sink+0xb24/0xfc0 [amdgpu]
[ 3409.927446]  link_detect+0x3a/0x480 [amdgpu]
[ 3409.927583]  ? dal_gpio_destroy_irq+0x25/0x40 [amdgpu]
[ 3409.927727]  ? query_hpd_status+0x6e/0xa0 [amdgpu]
[ 3409.927889]  handle_hpd_irq_helper+0xf9/0x170 [amdgpu]
[ 3410.109821] [drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
[ 3413.573401] [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
[ 3588.187339] [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
[ 3616.713695] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 3624.252516]  hid_sensor_iio_common snd_timer irqbypass snd_acp_config industrialio_triggered_buffer kfifo_buf snd_soc_acpi rapl snd thunderbolt industrialio rfkill soundcore snd_pci_acp3x pcspkr i2c_piix4 k10temp amd_pmf joydev amd_pmc platform_profile loop zram dm_crypt amdgpu i2c_algo_bit drm_ttm_helper ttm drm_suballoc_helper amdxcp iommu_v2 crct10dif_pclmul drm_buddy nvme crc32_pclmul gpu_sched crc32c_intel polyval_clmulni polyval_generic nvme_core drm_display_helper video ucsi_acpi ghash_clmulni_intel hid_sensor_hub hid_multitouch sha512_ssse3 typec_ucsi ccp sp5100_tco cec typec nvme_common wmi i2c_hid_acpi i2c_hid serio_raw ip6_tables ip_tables fuse
[ 3624.252560] Workqueue: events_highpri dm_irq_work_func [amdgpu]
[ 3624.252803]  dmub_srv_wait_for_idle+0x40/0x90 [amdgpu]
[ 3624.252964]  dc_dmub_srv_cmd_run_list+0xed/0x1b0 [amdgpu]
[ 3624.253110]  dcn31_link_encoder_is_in_alt_mode+0xae/0x100 [amdgpu]
[ 3624.253259]  detect_link_and_local_sink+0xc02/0xfc0 [amdgpu]
[ 3624.253427]  ? dm_read_reg_func+0x38/0xb0 [amdgpu]
[ 3624.253597]  link_detect+0x3a/0x480 [amdgpu]
[ 3624.253757]  ? query_hpd_status+0x6e/0xa0 [amdgpu]
[ 3624.253904]  handle_hpd_irq_helper+0xf9/0x170 [amdgpu]
[ 3625.113178] [drm:dc_dmub_srv_cmd_run_list [amdgpu]] *ERROR* Error queueing DMUB command: status=2
...
[ 3647.240539] [drm:dc_dmub_srv_cmd_run_list [amdgpu]] *ERROR* Error queueing DMUB command: status=2
[ 3647.433623] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 3647.481713] [drm:dc_dmub_srv_cmd_run_list [amdgpu]] *ERROR* Error queueing DMUB command: status=2
...
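
A dump like this is easier to compare across BIOS versions if the `*ERROR*` lines are grouped first. A rough sketch, assuming lines shaped like `[drm:<function> [amdgpu]] *ERROR* <message>` as in the output above. The regex and function name are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "[drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read."
ERROR_RE = re.compile(r"\[drm:([\w.]+) \[amdgpu\]\] \*ERROR\* (.*)")


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count amdgpu *ERROR* lines, keyed by (function name, message)."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts
```

`summarize_errors(open("dmesg.txt")).most_common()` would then list the most frequent failures first (dmesg.txt being whatever file you saved the output to).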


@Kieran_Levin: I haven’t tried Smokeless UMAF on the FW13. Given that the HII database seems to be exported by default as part of AGESA on most BIOSes (I have one 570x mainboard from ASUS where it’s a tunable), I would expect it to ‘somewhat work’. However, as there is no mention of Phoenix support, I have been leery of attempting it. It is definitely helpful on the 5600G I use as my HTPC box, though. Yeah, I appreciate that the static reserved portion is expandable into the 32GB of system RAM on demand, but there are some things which only check the reserved portion for reporting.

So my curiosity got the better of me: Smokeless-UMAF does indeed work with the FW13 AMD. However, as you’ve mentioned, the UMA buffer data structure only has Auto and Enhanced exposed.

May not be related (ignore this if that’s the case): What’s this 8GB VRAM thing on the Asus Ally?

For that processor / BIOS, they seem to be able to choose between Auto / 3 / 4 / 6 / 8 GB.
https://www.reddit.com/r/ROGAlly/comments/149h6hp/optimal_vram_allocation_2_gb_default_4_gb_8_gb/

Is Asus getting some special TLC from AMD with additional BIOS / flag / setting support?

I can reliably reproduce white graphical flickering 100% of the time by clicking the desktop session picker in SDDM before logging in on Fedora 39 KDE (kernel 6.5.9-300.fc39.x86_64) with all packages updated. While SDDM’s desktop session picker’s dropdown (where you choose between desktop environments and X11/Wayland) is open, all of the screen except the dropdown itself flickers white very rapidly. Setting UMA_GAME_OPTIMIZED did not change the behavior at all.
