[RESPONDED] AMD Ryzen 7040 (7840U) - Arch Linux amdgpu errors, blank screen on opening Steam

Apologies if there’s already a thread tracking this; I only found one comment in this thread.

System info:

  • Model: Framework 13 AMD Ryzen 7840U
  • Distro: Arch Linux
  • Kernel: 6.5.7-arch1-1
  • DE/WM: i3 (Xorg)
  • BIOS version: 03.02

Repro steps:

  1. On battery power, no AC (I am running TLP with default settings)
  2. Boot, launch i3
  3. Set up logging: sudo dmesg -W -T | tee ~/Documents/debug_dmesg
  4. Launch Steam: steam-runtime 2>&1 | tee ~/Documents/debug_steam

Outcome:

  • The display appears to freeze or flickers to a black screen.
  • I need to hold down the power button to shut down, then reboot. I have launched Steam at least once, so there may be some condition or race here that I haven’t found yet.

Logs etc.

Including the dmesg output in post for searching:

[Fri Oct 20 11:33:39 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[Fri Oct 20 11:33:39 2023] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[Fri Oct 20 11:33:39 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[Fri Oct 20 11:33:39 2023] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10317, emitted seq=10319
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kitty pid 5272 thread kitty:cs0 pid 5273
[Fri Oct 20 11:33:49 2023] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[Fri Oct 20 11:33:49 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:49 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:49 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:49 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:49 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:49 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:50 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:50 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:50 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:50 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:50 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:50 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:50 2023] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[Fri Oct 20 11:33:50 2023] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[Fri Oct 20 11:33:50 2023] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[Fri Oct 20 11:33:50 2023] [drm] DMUB hardware initialized: version=0x08001E00
[Fri Oct 20 11:33:50 2023] [drm] kiq ring mec 3 pipe 1 q 0
[Fri Oct 20 11:33:50 2023] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[Fri Oct 20 11:33:50 2023] [drm] ring gfx_32780.1.1 was added
[Fri Oct 20 11:33:50 2023] [drm] ring compute_32780.2.2 was added
[Fri Oct 20 11:33:50 2023] [drm] ring sdma_32780.3.3 was added
[Fri Oct 20 11:33:50 2023] [drm] ring gfx_32780.1.1 ib test pass
[Fri Oct 20 11:33:50 2023] [drm] ring compute_32780.2.2 ib test pass
[Fri Oct 20 11:33:50 2023] [drm] ring sdma_32780.3.3 ib test pass
[Fri Oct 20 11:33:50 2023] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
[Fri Oct 20 11:33:50 2023] [drm] Skip scheduling IBs!
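
To pull just the error lines out of a long capture like the one above, something along these lines may help. It is only a sketch: the function name summarize_amdgpu_errors is made up for illustration, and the path assumes the log was saved where step 3 of the repro steps puts it.

```shell
#!/bin/sh
# Group and count the *ERROR* lines in a saved dmesg capture.
# summarize_amdgpu_errors is a hypothetical helper name, not an existing tool.
summarize_amdgpu_errors() {
    # Drop the leading "[timestamp] " so identical messages collapse together,
    # then count each distinct message, most frequent first.
    grep -F '*ERROR*' "$1" \
        | sed -E 's/^\[[^]]*\] //' \
        | sort | uniq -c | sort -rn
}

# Path taken from the repro steps; change it if your log lives elsewhere.
log="$HOME/Documents/debug_dmesg"
if [ -f "$log" ]; then
    summarize_amdgpu_errors "$log"
fi
```

Each output line is a count followed by the message, which makes it easier to see whether the MES failures or the ring timeout dominate a given capture.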

Other references from Google:

Happy to add any other logs, configs etc. Hoping we can move the needle and get a fix upstream!


I have that behavior too, and currently my theory is that this is related to the issue with the 3.02 BIOS, which has been warned about several times. I suggest revisiting this after the 3.03 BIOS is released and you have updated to it (which may still be some time away).

Note that I had significantly more luck launching Steam in an X11 session than in a Wayland session, but I have experienced this issue in both.

EDIT: Confirmed fixed for me on 3.03

Hey, I wrote the linked post.

I did some more reading after posting, and these GPU resets with the “MES” error logs are indeed supposed to be fixed by the 3.03 BIOS - see comment 5 by Mario, an AMD employee, who mentions MES specifically.

We’ll have to wait for that beta release, which is well overdue and didn’t end up making it in time for Batch 2 laptops it seems. Hopefully validation on that is completed soon, surely it can’t be in a buggier state than BIOS 3.02 :wink:

I should note that, once I have restarted my crashed login screen on boot (which involves this GPU reset crash), I can launch Steam and fullscreen games in my KDE Plasma Wayland session with no graphical issues.

I do get massive white flickering artefacts all over both screens when connecting to an external USB-C monitor, however.


Batch 2 users might be in a better position than Batch 1’s if 3.03 is flashed at the factory for them.

According to one user, it appears Batch 2 is shipping with 3.02 flashed. Given that they already have an internal release of 3.03, and going by their latest email, hopefully it’ll be released shortly.

Amazing, thanks for the additional info @molzy !

And indeed looking forward to the BIOS betas to test

No worries @cfebs !

Here are some temporary fixes I just tested successfully for the issues I was seeing. The kernel parameter amdgpu.sg_display=0 might help with your Steam crash until BIOS 3.03 appears.
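
A quick way to confirm whether that parameter actually took effect after a reboot is to look at the running kernel's command line. The snippet below is a sketch under assumptions: has_sg_display_off is a made-up helper name, and the sysfs path is where the amdgpu module is expected to expose the value, so it is only read if it exists.

```shell
#!/bin/sh
# Succeeds if the given kernel command line contains amdgpu.sg_display=0
# as a whole word. has_sg_display_off is a hypothetical helper name.
has_sg_display_off() {
    case " $1 " in
        *" amdgpu.sg_display=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_sg_display_off "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "amdgpu.sg_display=0 is set on the running kernel"
else
    echo "amdgpu.sg_display=0 is not set on the running kernel"
fi

# Assumption: the module exposes the current value here; skipped if absent.
param_file=/sys/module/amdgpu/parameters/sg_display
if [ -r "$param_file" ]; then
    echo "amdgpu sg_display module value: $(cat "$param_file")"
fi
```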


Looks fixed after installing the 3.03 BIOS: AMD Ryzen 7040 Series BIOS 3.03 and Driver Bundle Beta

Tested launching Steam after a few reboots. No other changes or updates installed, did not use amdgpu.sg_display=0.

:partying_face:


Folks, please try this - should resolve multiple issues. Tested on Fedora 39 and Ubuntu 22.04.3.

For those on Arch, have you tried previous kernel releases?

I’m experiencing an issue where, sometimes on resuming from hibernation, the screen is just blank white. I see similar error messages from drm/amdgpu, in addition to stuff like this. I am on Arch and I’m going to try the linux-lts kernel for a bit to see if it stops happening.

[51086.190347] ferris kernel: PM: suspend entry (s2idle)
[51086.216302] ferris kernel: Filesystems sync: 0.025 seconds
[51086.220178] ferris kernel: Freezing user space processes
[51086.221701] ferris kernel: Freezing user space processes completed (elapsed 0.001 seconds)
[51086.221704] ferris kernel: OOM killer disabled.
[51086.221705] ferris kernel: Freezing remaining freezable tasks
[51086.222945] ferris kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[51086.222948] ferris kernel: printk: Suspending console(s) (use no_console_suspend to debug)
[51086.236967] ferris kernel: queueing ieee80211 work while going to suspend
[51086.423334] ferris kernel: ACPI: EC: interrupt blocked
[51986.242370] ferris kernel: ACPI: EC: interrupt unblocked
[51986.411489] ferris kernel: nvme nvme0: Shutdown timeout set to 10 seconds
[51986.413827] ferris kernel: nvme nvme0: 16/0/0 default/read/poll queues
[51986.511473] ferris kernel: atkbd serio0: Unknown key pressed (translated set 2, code 0x6b on isa0060/serio0).
[51986.511476] ferris kernel: atkbd serio0: Use 'setkeycodes 6b <keycode>' to make it known.
[51986.511946] ferris kernel: atkbd serio0: Unknown key released (translated set 2, code 0x6b on isa0060/serio0).
[51986.511947] ferris kernel: atkbd serio0: Use 'setkeycodes 6b <keycode>' to make it known.
[51986.543836] ferris kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[51986.544075] ferris kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[51986.546642] ferris kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[51986.546712] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[51986.550474] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[51986.709634] ferris kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
[51986.709796] ferris kernel: amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[51986.710055] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[51986.710057] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[51986.710059] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[51986.710060] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[51986.710061] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[51986.710061] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[51986.710062] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[51986.710063] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[51986.710064] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[51986.710064] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[51986.710065] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[51986.710065] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[51986.710066] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[51986.716029] ferris kernel: [drm] ring gfx_32780.1.1 was added
[51986.716561] ferris kernel: [drm] ring compute_32780.2.2 was added
[51986.717088] ferris kernel: [drm] ring sdma_32780.3.3 was added
[51986.717144] ferris kernel: [drm] ring gfx_32780.1.1 ib test pass
[51986.717175] ferris kernel: [drm] ring compute_32780.2.2 ib test pass
[51986.717270] ferris kernel: [drm] ring sdma_32780.3.3 ib test pass
[51986.722706] ferris kernel: OOM killer enabled.
[51986.722707] ferris kernel: Restarting tasks ... done.
[51986.724536] ferris kernel: random: crng reseeded on system resumption
[51088.183419] ferris systemd-sleep[81159]: System returned from sleep state.
[51986.726290] ferris kernel: PM: suspend exit
[51088.372785] ferris systemd-sleep[81159]: Entering sleep state 'hibernate'...
[51986.915676] ferris kernel: PM: hibernation: hibernation entry
[51088.385559] ferris systemd[1]: Starting autorandr execution hook...
[51986.940055] ferris kernel: Filesystems sync: 0.024 seconds
[51986.940244] ferris kernel: Freezing user space processes
[51986.941820] ferris kernel: Freezing user space processes completed (elapsed 0.001 seconds)
[51986.941823] ferris kernel: OOM killer disabled.
[51986.942549] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
[51986.942552] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff]
[51986.942553] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x09b00000-0x09dfffff]
[51986.942556] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x09f00000-0x09f3bfff]
[51986.942557] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x4049b000-0x4049bfff]
[51986.942558] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x4049f000-0x4049ffff]
[51986.942558] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x42360000-0x4455ffff]
[51986.942588] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x44569000-0x4456cfff]
[51986.942588] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x4456f000-0x4456ffff]
[51986.942588] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x4b83b000-0x4b889fff]
[51986.942589] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x4d16b000-0x4d16bfff]
[51986.942589] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x5077f000-0x5affefff]
[51986.942735] ferris kernel: PM: hibernation: Marking nosave pages: [mem 0x5b000000-0xffffffff]
[51986.943684] ferris kernel: PM: hibernation: Basic memory bitmaps created
[51986.943685] ferris kernel: PM: hibernation: Preallocating image memory
[51991.601164] ferris kernel: ucsi_acpi USBC000:00: failed to re-enable notifications (-110)
[51993.655810] ferris kernel: PM: hibernation: Allocated 1178390 pages for snapshot
[51993.655815] ferris kernel: PM: hibernation: Allocated 4713560 kbytes in 6.71 seconds (702.46 MB/s)
[51993.655817] ferris kernel: Freezing remaining freezable tasks
[51993.658214] ferris kernel: Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
[51993.658871] ferris kernel: printk: Suspending console(s) (use no_console_suspend to debug)
[51993.670698] ferris kernel: queueing ieee80211 work while going to suspend
[51993.882247] ferris kernel: ACPI: EC: interrupt blocked
[51993.886670] ferris kernel: Disabling non-boot CPUs ...
[51993.888125] ferris kernel: smpboot: CPU 1 is now offline
[51993.890467] ferris kernel: smpboot: CPU 2 is now offline
[51993.892728] ferris kernel: smpboot: CPU 3 is now offline
[51993.894924] ferris kernel: smpboot: CPU 4 is now offline
[51993.896969] ferris kernel: smpboot: CPU 5 is now offline
[51993.898848] ferris kernel: smpboot: CPU 6 is now offline
[51993.900762] ferris kernel: smpboot: CPU 7 is now offline
[51993.902668] ferris kernel: smpboot: CPU 8 is now offline
[51993.904514] ferris kernel: smpboot: CPU 9 is now offline
[51993.906358] ferris kernel: smpboot: CPU 10 is now offline
[51993.908304] ferris kernel: smpboot: CPU 11 is now offline
[51993.910251] ferris kernel: smpboot: CPU 12 is now offline
[51993.912159] ferris kernel: smpboot: CPU 13 is now offline
[51993.914135] ferris kernel: smpboot: CPU 14 is now offline
[51993.916060] ferris kernel: smpboot: CPU 15 is now offline
[51993.917059] ferris kernel: PM: hibernation: Creating image:
[51994.321520] ferris kernel: PM: hibernation: Need to copy 1254498 pages
[51994.321522] ferris kernel: PM: hibernation: Normal pages needed: 1254498 + 1024, available pages: 15308792
[51993.918216] ferris kernel: AMD-Vi: Virtual APIC enabled
[51993.918596] ferris kernel: AMD-Vi: Virtual APIC enabled
[51993.918651] ferris kernel: LVT offset 0 assigned for vector 0x400
[51993.919359] ferris kernel: Enabling non-boot CPUs ...
[51993.919650] ferris kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1
[51993.922446] ferris kernel: ACPI: \_SB_.PLTF.C001: Found 3 idle states
[51993.922638] ferris kernel: CPU1 is up
[51993.922941] ferris kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2
[51993.925659] ferris kernel: ACPI: \_SB_.PLTF.C002: Found 3 idle states
[51993.925802] ferris kernel: CPU2 is up
[51993.926083] ferris kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
[51993.928694] ferris kernel: ACPI: \_SB_.PLTF.C003: Found 3 idle states
[51993.928858] ferris kernel: CPU3 is up
[51993.929121] ferris kernel: smpboot: Booting Node 0 Processor 4 APIC 0x4
[51993.931887] ferris kernel: ACPI: \_SB_.PLTF.C004: Found 3 idle states
[51993.932018] ferris kernel: CPU4 is up
[51993.932291] ferris kernel: smpboot: Booting Node 0 Processor 5 APIC 0x5
[51993.934875] ferris kernel: ACPI: \_SB_.PLTF.C005: Found 3 idle states
[51993.935000] ferris kernel: CPU5 is up
[51993.935260] ferris kernel: smpboot: Booting Node 0 Processor 6 APIC 0x6
[51993.937925] ferris kernel: ACPI: \_SB_.PLTF.C006: Found 3 idle states
[51993.938059] ferris kernel: CPU6 is up
[51993.938335] ferris kernel: smpboot: Booting Node 0 Processor 7 APIC 0x7
[51993.940970] ferris kernel: ACPI: \_SB_.PLTF.C007: Found 3 idle states
[51993.941140] ferris kernel: CPU7 is up
[51993.941404] ferris kernel: smpboot: Booting Node 0 Processor 8 APIC 0x8
[51993.944011] ferris kernel: ACPI: \_SB_.PLTF.C008: Found 3 idle states
[51993.944166] ferris kernel: CPU8 is up
[51993.944439] ferris kernel: smpboot: Booting Node 0 Processor 9 APIC 0x9
[51993.947128] ferris kernel: ACPI: \_SB_.PLTF.C009: Found 3 idle states
[51993.947317] ferris kernel: CPU9 is up
[51993.947582] ferris kernel: smpboot: Booting Node 0 Processor 10 APIC 0xa
[51993.950228] ferris kernel: ACPI: \_SB_.PLTF.C00A: Found 3 idle states
[51993.950402] ferris kernel: CPU10 is up
[51993.950671] ferris kernel: smpboot: Booting Node 0 Processor 11 APIC 0xb
[51993.953371] ferris kernel: ACPI: \_SB_.PLTF.C00B: Found 3 idle states
[51993.953587] ferris kernel: CPU11 is up
[51993.953863] ferris kernel: smpboot: Booting Node 0 Processor 12 APIC 0xc
[51993.956653] ferris kernel: ACPI: \_SB_.PLTF.C00C: Found 3 idle states
[51993.956888] ferris kernel: CPU12 is up
[51993.957159] ferris kernel: smpboot: Booting Node 0 Processor 13 APIC 0xd
[51993.959860] ferris kernel: ACPI: \_SB_.PLTF.C00D: Found 3 idle states
[51993.960082] ferris kernel: CPU13 is up
[51993.960377] ferris kernel: smpboot: Booting Node 0 Processor 14 APIC 0xe
[51993.963331] ferris kernel: ACPI: \_SB_.PLTF.C00E: Found 3 idle states
[51993.963589] ferris kernel: CPU14 is up
[51993.963900] ferris kernel: smpboot: Booting Node 0 Processor 15 APIC 0xf
[51993.966695] ferris kernel: ACPI: \_SB_.PLTF.C00F: Found 3 idle states
[51993.966976] ferris kernel: CPU15 is up
[51993.969291] ferris kernel: ACPI: EC: interrupt unblocked
[51994.097858] ferris kernel: usb usb1: root hub lost power or was reset
[51994.097861] ferris kernel: usb usb3: root hub lost power or was reset
[51994.097865] ferris kernel: usb usb4: root hub lost power or was reset
[51994.097865] ferris kernel: usb usb2: root hub lost power or was reset
[51994.098417] ferris kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[51994.098458] ferris kernel: [drm] PSP is resuming...
[51994.098575] ferris kernel: usb usb5: root hub lost power or was reset
[51994.098579] ferris kernel: usb usb6: root hub lost power or was reset
[51994.098768] ferris kernel: usb usb7: root hub lost power or was reset
[51994.098772] ferris kernel: usb usb8: root hub lost power or was reset
[51994.122729] ferris kernel: [drm] reserve 0x4000000 from 0x8018000000 for PSP TMR
[51994.286894] ferris kernel: nvme nvme0: Shutdown timeout set to 10 seconds
[51994.290098] ferris kernel: nvme nvme0: 16/0/0 default/read/poll queues
[51994.370364] ferris kernel: usb 1-5: reset high-speed USB device number 3 using xhci_hcd
[51994.638687] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: RAS: optional ras ta ucode is not available
[51994.646372] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: RAP: optional rap ta ucode is not available
[51994.646374] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[51994.646378] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[51994.647877] ferris kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[51994.650246] ferris kernel: [drm] DMUB hardware initialized: version=0x08002300
[51994.705025] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b420000 flags=0x0000]
[51994.705033] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b421000 flags=0x0000]
[51994.705038] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b422000 flags=0x0000]
[51994.705041] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b423000 flags=0x0000]
[51994.705045] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b424000 flags=0x0000]
[51994.705048] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b425000 flags=0x0000]
[51994.705051] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b426000 flags=0x0000]
[51994.705054] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b427000 flags=0x0000]
[51994.705059] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b428000 flags=0x0000]
[51994.705063] ferris kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfff7b434000 flags=0x0000]

Hi there

I’m running the latest Manjaro, installed a couple of weeks ago, and I get more or less the same issue. Whenever I take a screenshot with flameshot, or play a video full screen in Kodi, part of the screen sometimes goes blank, as shown in the attached photo.

If I press Esc, sometimes it stops; sometimes it stays and I have to reboot.

dmesg says

[13194.577713] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c00000 flags=0x0000]
[13194.577748] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c01000 flags=0x0000]
[13194.577765] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c02000 flags=0x0000]
[13194.577780] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c03000 flags=0x0000]
[13194.577795] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c04000 flags=0x0000]
[13194.577810] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c05000 flags=0x0000]
[13194.577826] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c06000 flags=0x0000]
[13194.577841] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c07000 flags=0x0000]
[13194.577855] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c08000 flags=0x0000]
[13194.577870] amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffff0c24000 flags=0x0000]

I did upgrade the BIOS to 3.03:

sudo dmidecode | grep -A3 'Vendor:\|Product:' && sudo lshw -C cpu | grep -A3 'product:\|vendor:'
	Vendor: INSYDE Corp.
	Version: 03.03
	Release Date: 10/17/2023
	Address: 0xE0000

uname -a
Linux mylaptopname 6.5.11-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu Nov  9 02:34:53 UTC 2023 x86_64 GNU/Linux

So, what else can I do to troubleshoot this further or fix it?

Disable scatter/gather display support on the kernel command line and increase the reserved VRAM.

There are known issues with amdgpu enabling Scatter Gather VRAM by default, which it does from kernel 6.2 onwards.

Add amdgpu.sg_display=0 to your kernel command line, and change the VRAM allocation in the BIOS to UMA_GAME_OPTIMIZED.

These in combination alleviate (but not completely remove) the issue you are seeing.
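
How the parameter gets onto the kernel command line depends on the bootloader. As a rough sketch, assuming GRUB with its defaults in /etc/default/grub (systemd-boot and others are configured differently), a helper like the one below could add it. The name add_kernel_param is hypothetical, and the BIOS VRAM setting still has to be changed by hand in the firmware setup.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults
# file, unless it is already there. add_kernel_param is a hypothetical helper.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote; keeps a .bak copy.
    sed -i.bak -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 $param\"|" "$file"
}

# Assumed usage on a GRUB-based install (run as root, then regenerate the config):
#   add_kernel_param /etc/default/grub amdgpu.sg_display=0
#   grub-mkconfig -o /boot/grub/grub.cfg
```

The function is idempotent, so running it twice does not duplicate the parameter, and the .bak copy gives a way back if the edit turns out wrong.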


This is going to be your best bet. Using linux-lts may also help on Arch.
https://archlinux.org/packages/core/x86_64/linux-lts/

Just an FYI: I think the only reason LTS works is that the kernel is so old it doesn’t turn on Scatter Gather by default. You are much better off using a newer kernel and adding the kernel command-line parameter to disable it.


The mitigation worked well enough. Since I applied it, I have had a handful of blinking pixels, but nothing requiring a reboot. Thanks, folks!


Tentatively marking this as resolved (for you).


Hi all,

I think I’m facing the same issue on various other distros as newer kernels start coming in.

My question is simple: are these workarounds considered temporary? The scatter support thingy seems kernel related and up to the folks at AMD contributing to the kernel to solve (which can’t happen soon enough).

But the BIOS change to allocate more RAM to graphics? I need my RAM for work, and when this bug doesn’t appear, the default allocated value is plenty (I even get to play some GTA V in Steam at full resolution). So I would like to know if a more sustainable, long-term fix is in the works on Framework’s side, to avoid increasing the amount of RAM users have to reserve for the GPU.

Thanks


Linux fedora 6.8.0-0.rc2.20240130gt861c0981.219.vanilla.fc39.x86_64 seems pretty stable for me. Can somebody also please test it?

Are you testing without the SG disable flag?