[SOLVED] Framework website crashes/freezes on kernel 6.10 with Firefox on AMD Framework

SuperTux88 · August 8, 2024, 2:45am

I discovered a very strange problem with my AMD Framework Laptop 13 after recently updating to kernel 6.10. If I visit the Framework website in Firefox (specifically this page; I can also reproduce it with a stored offline copy of the page, in case the online version changes), the whole laptop freezes. The screen turns black and comes back a second later. The mouse still kind of moves, but everything else is frozen and the screen sometimes flickers. Most of the time I’m still able to reboot with a keyboard shortcut or over SSH. The problem is reproducible: as soon as I load this website in Firefox, everything freezes basically instantly. Other websites work fine (or at least I haven’t found another website that causes problems yet). The problematic website works fine in Chrome, but crashes in both Firefox Nightly (version 131) and Firefox ESR (version 115). If I boot kernel 6.9 again, everything works fine, even in Firefox.

I’m using Gentoo Linux with wayland/sway.
Tested broken kernel versions: 6.10.2 and 6.10.3
Tested working kernel versions: 6.9.10 and 6.9.12

dmesg output of one of the crashes:

[   78.882373] gmc_v11_0_process_interrupt: 35 callbacks suppressed
[   78.882380] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32782)
[   78.882388] amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 3923 thread browser {b:cs0 pid 3954)
[   78.882391] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080011ee4e000 from client 18
[   78.882395] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
[   78.882397] amdgpu 0000:c1:00.0: amdgpu: 	Faulty UTCL2 client ID: unknown (0x1d)
[   78.882400] amdgpu 0000:c1:00.0: amdgpu: 	MORE_FAULTS: 0x1
[   78.882402] amdgpu 0000:c1:00.0: amdgpu: 	WALKER_ERROR: 0x0
[   78.882404] amdgpu 0000:c1:00.0: amdgpu: 	PERMISSION_FAULTS: 0x1
[   78.882406] amdgpu 0000:c1:00.0: amdgpu: 	MAPPING_ERROR: 0x0
[   78.882407] amdgpu 0000:c1:00.0: amdgpu: 	RW: 0x0
[   78.882410] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32782)
[   78.882413] amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 3923 thread browser {b:cs0 pid 3954)
[   78.882415] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080011ed5a000 from client 18
[   78.882417] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   78.882419] amdgpu 0000:c1:00.0: amdgpu: 	Faulty UTCL2 client ID: VMC (0x0)
[   78.882421] amdgpu 0000:c1:00.0: amdgpu: 	MORE_FAULTS: 0x0
[   78.882422] amdgpu 0000:c1:00.0: amdgpu: 	WALKER_ERROR: 0x0
[   78.882424] amdgpu 0000:c1:00.0: amdgpu: 	PERMISSION_FAULTS: 0x0
[   78.882425] amdgpu 0000:c1:00.0: amdgpu: 	MAPPING_ERROR: 0x0
[   78.882427] amdgpu 0000:c1:00.0: amdgpu: 	RW: 0x0
[   78.882684] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32782)
[   78.882688] amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 3923 thread browser {b:cs0 pid 3954)
[   78.882690] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080011ed5a000 from client 18
[   78.882693] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
[   78.882694] amdgpu 0000:c1:00.0: amdgpu: 	Faulty UTCL2 client ID: unknown (0x1d)
[   78.882696] amdgpu 0000:c1:00.0: amdgpu: 	MORE_FAULTS: 0x1
[   78.882698] amdgpu 0000:c1:00.0: amdgpu: 	WALKER_ERROR: 0x0
[   78.882699] amdgpu 0000:c1:00.0: amdgpu: 	PERMISSION_FAULTS: 0x1
[   78.882701] amdgpu 0000:c1:00.0: amdgpu: 	MAPPING_ERROR: 0x0
[   78.882703] amdgpu 0000:c1:00.0: amdgpu: 	RW: 0x0
[   78.882706] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32782)
[   78.882708] amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 3923 thread browser {b:cs0 pid 3954)
[   78.882711] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x000080011ee4e000 from client 18
[   78.882713] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   78.882715] amdgpu 0000:c1:00.0: amdgpu: 	Faulty UTCL2 client ID: VMC (0x0)
[   78.882717] amdgpu 0000:c1:00.0: amdgpu: 	MORE_FAULTS: 0x0
[   78.882720] amdgpu 0000:c1:00.0: amdgpu: 	WALKER_ERROR: 0x0
[   78.882722] amdgpu 0000:c1:00.0: amdgpu: 	PERMISSION_FAULTS: 0x0
[   78.882724] amdgpu 0000:c1:00.0: amdgpu: 	MAPPING_ERROR: 0x0
[   78.882725] amdgpu 0000:c1:00.0: amdgpu: 	RW: 0x0
[   88.947442] [drm:amdgpu_job_timedout] *ERROR* ring vcn_unified_0 timeout, signaled seq=549, emitted seq=552
[   88.947467] [drm:amdgpu_job_timedout] *ERROR* Process information: process RDD Process pid 3923 thread browser {b:cs0 pid 3954
[   88.947478] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[   89.215806] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[   89.429945] [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x000000c0 != 0x00000040n
[   89.645651] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[   89.652502] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[   89.690199] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[   89.690938] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[   89.691120] [drm] VRAM is lost due to GPU reset!
[   89.691123] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[   89.693212] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[   89.694596] [drm] DMUB hardware initialized: version=0x08003D00
[   90.053993] [drm] kiq ring mec 3 pipe 1 q 0
[   90.310129] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[   90.310460] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init] JPEG decode initialized successfully.
[   90.310758] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[   90.310761] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[   90.310764] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[   90.310767] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[   90.310769] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[   90.310771] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[   90.310773] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[   90.310775] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[   90.310778] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[   90.310780] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[   90.310782] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[   90.310784] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[   90.310786] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[   90.313402] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[   90.313406] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[   90.313428] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
[   90.919743] browser {b:cs0[3954]: segfault at 0 ip 000056141f3c730d sp 00007fefaa6d6910 error 6 in firefox-bin[df30d,56141f30a000+107000] likely on CPU 5 (core 2, socket 0)
[   90.919765] Code: e5 41 56 53 48 89 fb 4c 8b 35 9f b5 04 00 49 8b 36 e8 77 8e 04 00 49 8b 36 bf 0a 00 00 00 e8 ba 8f 04 00 48 89 1d 9b eb 04 00 <c7> 04 25 00 00 00 00 23 00 00 00 e8 03 00 00 00 cc cc cc 55 48 89
[   91.561359] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
[   91.799086] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
[  104.812154] [drm:amdgpu_job_timedout] *ERROR* ring vcn_unified_0 timeout, signaled seq=558, emitted seq=560
[  104.812179] [drm:amdgpu_job_timedout] *ERROR* Process information: process RDD Process pid 4340 thread browser {2:cs0 pid 4354
[  104.812192] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[  105.071661] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[  105.287239] [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x000000c0 != 0x00000040n
[  105.502218] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[  105.508940] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[  105.547009] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  105.547735] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[  105.547825] [drm] VRAM is lost due to GPU reset!
[  105.547830] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[  105.550747] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[  105.552113] [drm] DMUB hardware initialized: version=0x08003D00
[  105.918756] [drm] kiq ring mec 3 pipe 1 q 0
[  106.162726] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
[  106.162819] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init] JPEG decode initialized successfully.
[  106.163066] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  106.163069] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  106.163072] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  106.163074] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[  106.163076] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[  106.163078] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[  106.163079] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[  106.163081] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[  106.163083] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[  106.163085] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  106.163087] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[  106.163091] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[  106.163094] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[  106.164705] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[  106.164707] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[  106.164723] amdgpu 0000:c1:00.0: amdgpu: GPU reset(2) succeeded!
[  106.220971] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[  106.602075] browser {2:cs0[4354]: segfault at 0 ip 000056141f3c730d sp 00007fefaa75c940 error 6 in firefox-bin[df30d,56141f30a000+107000] likely on CPU 6 (core 3, socket 0)
[  106.602096] Code: e5 41 56 53 48 89 fb 4c 8b 35 9f b5 04 00 49 8b 36 e8 77 8e 04 00 49 8b 36 bf 0a 00 00 00 e8 ba 8f 04 00 48 89 1d 9b eb 04 00 <c7> 04 25 00 00 00 00 23 00 00 00 e8 03 00 00 00 cc cc cc 55 48 89
[  107.430840] [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n
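
For anyone trying to read the MMVM_L2_PROTECTION_FAULT_STATUS values in the log above, here is a minimal Python sketch that splits the raw register value into the fields the driver prints. The bit positions are an assumption (they are not stated anywhere in this thread); they were picked because they reproduce the decoded lines in the log, including vmid:1. `decode_fault_status` and `FIELDS` are hypothetical names used only for this example.

```python
# Assumed bit layout of MMVM_L2_PROTECTION_FAULT_STATUS (not confirmed in this
# thread); it reproduces the decoded values printed in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),        # (shift, width in bits)
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),                # printed as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw status value into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the log above.
    for name, value in decode_fault_status(0x00103A11).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, a status of 0x00000000 decodes to all-zero fields, which matches the second kind of fault record in the log.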

Once it also caused a kernel panic and I wasn’t able to shut down cleanly anymore, but most of the crashes look similar to the one above: Framework kernel 6.10 firefox kernel panic · GitHub

Sometimes Firefox also creates a crash report: https://crash-stats.mozilla.org/report/index/e001cfa1-6cd1-464d-816f-607a00240808
But I’m not sure if that shows the real cause, or if it’s just Firefox crashing because the whole GPU crashed.

I have no idea where to start debugging this. Is it a kernel problem? It started when I upgraded the kernel to 6.10. Or is it a Firefox problem? It works in Chrome, but Firefox also works if I downgrade the kernel to 6.9 again, so it doesn’t really look like a Firefox problem. Since the problem is reproducible, I can also test different things or provide more logs.

Also I find it kinda ironic, that it’s the framework website itself which crashes my framework laptop.

jwp · August 8, 2024, 3:00am

Works fine for me in firefox wayland and a 6.10 kernel. I am guessing it’s to do with the way in which firefox is compiled against mesa bits in your gentoo install.

Try a flatpak/appimage of firefox and see if you can recreate.

Mario_Limonciello · August 8, 2024, 4:23am

If you can recreate it reliably with the rest of your software stack, the ideal thing to do is to perform a kernel bisect:

https://www.kernel.org/doc/html/next/admin-guide/bug-bisect.html

When you narrow down to a commit, raise a bug at drm / amd · GitLab
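
As a rough illustration of why a bisect between 6.9 and 6.10 needs only a modest number of build-and-test rounds, here is a minimal Python sketch of the halving idea. It is not how git implements `git bisect` (real kernel history is a graph with merges, which git handles for you); the linear `history` list, the `first_bad` helper and the `is_bad` callback are made up for this example.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1                # one "build, boot, open the page" round
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


if __name__ == "__main__":
    history = [f"c{i}" for i in range(1000)]  # made-up linear history
    culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 700)
    print(culprit, tests)
```

Each round halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits between the good and bad endpoints.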

SuperTux88 · August 9, 2024, 1:30am

Thanks for your response @Mario_Limonciello. I started the bisect, but it takes time, so I’m not done yet. This is the progress so far:

git bisect start
# status: waiting for both good and bad commits
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# status: waiting for bad commit, 1 good commit known
# bad: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
git bisect bad 0c3836482481200ead7b416ca80c68a29cfdaabd
# bad: [33e02dc69afbd8f1b85a51d74d72f139ba4ca623] Merge tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 33e02dc69afbd8f1b85a51d74d72f139ba4ca623
# good: [b850dc206a57ae272c639e31ac202ec0c2f46960] Merge tag 'firewire-updates-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
git bisect good b850dc206a57ae272c639e31ac202ec0c2f46960
# good: [46c6d2b186915176be5acc5d4b6f9793eb32a0c7] Merge tag 'asymmetric-keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
git bisect good 46c6d2b186915176be5acc5d4b6f9793eb32a0c7
# good: [d65bfb9546eb627e3c578336355c5b81797f2255] gpu: drm: exynos: hdmi: eliminate uses of of_node_put()
git bisect good d65bfb9546eb627e3c578336355c5b81797f2255
# good: [c3c5ac4bd7d7019f2e3ad1720572d53226fe656e] ASoC: Intel: updates for 6.10 - part7
git bisect good c3c5ac4bd7d7019f2e3ad1720572d53226fe656e

I’m currently at 6.9.0-rc6-01371-g4a56c0ed5aa0 (not in the log above yet, as I haven’t decided whether it’s good or bad), where I encountered an interesting new behavior. This time it didn’t crash instantly when I opened the website, but I saw green graphics glitches in the videos on the website. The first time, it still crashed after about 2 minutes. I then rebooted and tried again, and again saw the green glitches, but this time it didn’t crash right away (I only got it to crash after several minutes of scrolling up and down).

There are multiple videos on this site which are glitching (but not all of them glitch that hard), and I assume that this is what is triggering the crashes?

I’ll continue with the bisect tomorrow; there are still a few steps left. I’m not sure whether I should say “good” or “bad” here, but I’m leaning towards “bad”, as it crashed at least once, although at this commit it doesn’t crash as reliably as with the final 6.10 version.

I haven’t had time to test flatpak/appimage yet, but since I can trigger it by booting different kernel versions, I think something related to the kernel is a better guess, so I started with the bisect first. Firefox ESR 115 is installed through Gentoo and compiled by myself against the installed libs. Firefox Nightly is downloaded from Mozilla as a tar.gz, unpacked into my home directory, and updated through the built-in auto-updater. My Mesa version is 24.1.3.

Mario_Limonciello · August 9, 2024, 1:42am

If you have an indeterminate result then you can use skip. I think marking something like that good or bad will just confuse the results.

You can leave yourself some notes about it though in case it’s needed to revisit it.
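
If the test is ever automated with `git bisect run`, the same three outcomes map onto exit codes: 0 means good, 125 means skip, and any other code from 1 to 127 means bad. The Python sketch below shows one possible policy for an unreliable reproducer; `classify` and its rule (crashed in every attempt, in none, or only in some) are hypothetical and not something from this thread.

```python
# Exit codes understood by `git bisect run`: 0 = good, 125 = skip,
# any other code from 1 to 127 = bad.
EXIT_CODES = {"good": 0, "bad": 1, "skip": 125}


def classify(crashes: int, trials: int) -> str:
    """Hypothetical policy: crashed every time -> bad, never -> good, else skip."""
    if trials <= 0 or not 0 <= crashes <= trials:
        raise ValueError("need trials > 0 and 0 <= crashes <= trials")
    if crashes == trials:
        return "bad"
    if crashes == 0:
        return "good"
    return "skip"


if __name__ == "__main__":
    # Example: the page was opened three times and the GPU hung once.
    verdict = classify(crashes=1, trials=3)
    print(verdict, EXIT_CODES[verdict])
```

A stricter policy that treats any crash as bad is just as defensible; the point of the sketch is only that an indeterminate result has its own exit code and does not have to be forced into good or bad.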

pollomarzo · August 9, 2024, 2:19pm

same happened to me today
linux noob and no time to investigate rn, i’ll try keeping 6.9 or roll back to last snapper snapshot and wait a few days
on openSUSE TW, KDE6 on Wayland
firefox with many tabs open, a video playing on youtube and an external monitor was attached. both screens froze while audio continued, i hit spacebar. laptop screen went black, second monitor was frozen. i tried the SysRq REISUB keys but i forget if i enabled them, when that didn’t work i held the power button to get back to work
i don’t think the framework website is to blame, just videos in general.
after restart i saw similar green glitches while watching a video, so i’d expect the problem to be kernel-GPU related, or maybe video decoding?
unfortunately between yesterday and today i updated both kernel and plasma6 so…
leaving some more info to help anyone searching for this (maybe me in the near future)
@SuperTux88 i’ll be interested in hearing if you manage the bisect
here’s my dmesg output

    Aug 09 14:43:35 laptop_name kernel: gmc_v11_0_process_interrupt: 146 callbacks suppressed
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d56000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d51000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e18000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1b000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e18000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1b000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d51000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d56000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1d000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1b000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1b000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d51000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103d56000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1d000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32817)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 8148 thread firefox:cs0 pid 8646)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800103e1b000 from client 18
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: VMC (0x0)
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
    Aug 09 14:43:35 laptop_name kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0
    Aug 09 14:43:45 laptop_name kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=7507, emitted seq=7508
    Aug 09 14:43:45 laptop_name kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 8148 thread firefox:cs0 pid 8646

here’s my /var/log/zypp/history | grep kernel -d

# 2024-08-02 16:42:52 kernel-default-6.10.2-1.1.x86_64.rpm installed ok
2024-08-02 16:42:52|install|kernel-default|6.10.2-1.1|x86_64||download.opensuse.org-oss|976c7e192e488325762bca7e7a40666e13d272f078eda4c9ae181d02f61259457c19be6cb57cf3a592508f9b78eb6dc8e256c4fdb8f1c537220dbea3735d42b3|
# 2024-08-07 16:43:21 kernel-default-6.10.3-1.1.x86_64.rpm installed ok
2024-08-07 16:43:21|install|kernel-default|6.10.3-1.1|x86_64||download.opensuse.org-oss|5f4496e8d99e8d58251fefba5a8f999a7eb2f54c88c7095f7bd0399d36a851c0c85fab1655e9e53afe0d3b93d6e38f4316d8bc13eba03c14ac5a9dafdba6af55|
2024-08-09 14:18:00|remove |kernel-default|6.9.7-1.1|x86_64|root@laptop_name|
2024-08-09 14:18:02|remove |kernel-default|6.9.9-1.1|x86_64|root@laptop_name|

strobert · August 9, 2024, 6:34pm

Same here, jwp. FW16, Fedora 39 KDE Plasma (so Wayland) w/ 6.10.3-100.fc39 kernel (RPM has an 8/5 build date).

The page loads fine and I’m not seeing any ill side effects.

SuperTux88 · August 11, 2024, 9:31pm

Ok, I did some more bisecting. First I went with “bad” for this one commit, thinking that I want to find when the problems started (and I can always restart the bisect with some other commits to go a different route). But then I encountered another change of behavior:

When I tested 6.9.0-rc5-01145-gbbecb57e28e6, I didn’t see the green artifacts anymore. But there were still some video artifacts visible sometimes, a lot less noticeable and a lot less often, but I saw them. And then when I quit Firefox, it even crashed again, so something was still broken. But since it’s a lot harder to detect, I don’t know if I already marked some earlier commits as “good” that still had the same behavior, as at the beginning I was only looking for an instant crash and didn’t pay that much attention to the videos when it didn’t crash. I tried to go further back (so marking it as “bad” again, as something was still broken), but sometimes it was really hard to tell if it still had artifacts, and I also didn’t manage to crash it anymore. So I stopped that bisect, as I didn’t know if I had already marked some commits as “good” that would also have this very light problem.

But I think it was interesting to see that the further back I go, at some point the instant crash goes away, and if I go further back, at some point the green artifacts go away too. Something still remains, but the problem becomes less and less severe.

I then started again, with a commit that had the green artifacts (from the notes I took from the first bisect), and a commit that didn’t have the green artifacts, to find the commit where the green artifacts were introduced (so not the instant crash), and this is the result (the commits I skipped didn’t compile, so I just skipped until I had one that I was able to compile again):

git bisect start
# status: waiting for both good and bad commits
# good: [bbecb57e28e6fd3666e15142728029b084eee6b2] Merge tag 'exynos-drm-next-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
git bisect good bbecb57e28e6fd3666e15142728029b084eee6b2
# status: waiting for bad commit, 1 good commit known
# bad: [9aa99bb1977aab5f1a23780673f74db99d982632] drm/loongson: fix build after debugfs include change
git bisect bad 9aa99bb1977aab5f1a23780673f74db99d982632
# bad: [30ea09a182cb37c4921b9d477ed18107befe6d78] drm/bridge: tc358775: fix support for jeida-18 and jeida-24
git bisect bad 30ea09a182cb37c4921b9d477ed18107befe6d78
# skip: [c1696bf8d5f5389c5312aebf9e3ad0267149cdea] drm/tests: Add a test case for drm buddy clear allocation
git bisect skip c1696bf8d5f5389c5312aebf9e3ad0267149cdea
# skip: [e80c219f52861e756181d7f88b0d341116daac2b] drm/rockchip: vop2: Do not divide height twice for YUV
git bisect skip e80c219f52861e756181d7f88b0d341116daac2b
# skip: [f1e4db073f98f887122a10ef3c66a19e75516b48] drm/vc4: hdmi: switch to struct drm_edid
git bisect skip f1e4db073f98f887122a10ef3c66a19e75516b48
# skip: [9be3eb5d6ee57662a22b56153c7ee39265685455] dt-bindings: display: add #sound-dai-cells property to rockchip rk3066 hdmi
git bisect skip 9be3eb5d6ee57662a22b56153c7ee39265685455
# skip: [bd730c77fa37fe2dda4b6e23f6921ef8a9b1bb97] drm/sun4i: hdmi: switch to struct drm_edid
git bisect skip bd730c77fa37fe2dda4b6e23f6921ef8a9b1bb97
# bad: [51debb6d4a2118f6a46dd36b84a82a5a73fa8236] dt-bindings: display: bridge: tc358775: make stby gpio optional
git bisect bad 51debb6d4a2118f6a46dd36b84a82a5a73fa8236
# skip: [96950929eb232038022abd961be46d492d7a6f0f] drm/buddy: Implement tracking clear page feature
git bisect skip 96950929eb232038022abd961be46d492d7a6f0f
# bad: [c058e7a8f8af355e4a441c89400a6e95a16320e5] Merge drm/drm-next into drm-misc-next
git bisect bad c058e7a8f8af355e4a441c89400a6e95a16320e5
# skip: [105aa4c65b76c3a344ca89a2d2dc96c84cca557f] drm: Fix plane SIZE_HINTS property docs
git bisect skip 105aa4c65b76c3a344ca89a2d2dc96c84cca557f
# skip: [9a314ea512b7db9d38107ea0284b56f805b8fc9a] drm/panel: Add driver for EDO RM69380 OLED panel
git bisect skip 9a314ea512b7db9d38107ea0284b56f805b8fc9a
# skip: [8431f29d2f1deac06e120a1c5d9afb5d72def319] drm/gud: switch to struct drm_edid
git bisect skip 8431f29d2f1deac06e120a1c5d9afb5d72def319
# skip: [e0a200ab4b72afd581bd6f82fc1ef510a4fb5478] drm/edid: Parse topology block for all DispID structure v1.x
git bisect skip e0a200ab4b72afd581bd6f82fc1ef510a4fb5478
# skip: [685ba01ebedb8f87673f587f540ba84c444442d4] drm/rockchip: lvds: Remove include of drm_dp_helper.h
git bisect skip 685ba01ebedb8f87673f587f540ba84c444442d4
# skip: [a68c7eaa7a8ffdec9287ba1561a668d674c20a13] drm/amdgpu: Enable clear page functionality
git bisect skip a68c7eaa7a8ffdec9287ba1561a668d674c20a13
# skip: [0e353133816b3e3e4bf8a682de01506ebc2b1dee] drm/rockchip: cdn-dp: drop driver owner assignment
git bisect skip 0e353133816b3e3e4bf8a682de01506ebc2b1dee
# skip: [5c9837374ecf55a1fa3b7622d365a0456960270f] drm/meson: gate px_clk when setting rate
git bisect skip 5c9837374ecf55a1fa3b7622d365a0456960270f
# skip: [a9b7dfd1d1f96be3a3f92128e9d78719a8d65939] drm/panthor: clean up some types in panthor_sched_suspend()
git bisect skip a9b7dfd1d1f96be3a3f92128e9d78719a8d65939
# skip: [4f888782d30276b08a32fa3d9b5c13b7dc123e28] dt-bindings: display: panel: Add Raydium RM69380
git bisect skip 4f888782d30276b08a32fa3d9b5c13b7dc123e28
# skip: [b1ee6bd3ea954d081bfb1d5559ce3e78ef40443a] dt-bindings: display: add #sound-dai-cells property to rockchip inno hdmi
git bisect skip b1ee6bd3ea954d081bfb1d5559ce3e78ef40443a
# skip: [26f9339212db569310d4b0ef4284efcbb462a86f] drm/panel: add Khadas TS050 V2 panel support
git bisect skip 26f9339212db569310d4b0ef4284efcbb462a86f
# skip: [7e7dc3a9ae38711c5c7a4a88d71d8875849d8c5c] drm/panel-edp: switch to struct drm_edid
git bisect skip 7e7dc3a9ae38711c5c7a4a88d71d8875849d8c5c
# skip: [e69da902467f79d933543661b56101042a45e5a4] drm/panel: simple: switch to struct drm_edid
git bisect skip e69da902467f79d933543661b56101042a45e5a4
# skip: [7fa1d6c50a5f5ad140bca0c615e97221f042a7a5] drm/rockchip: inno_hdmi: switch to struct drm_edid
git bisect skip 7fa1d6c50a5f5ad140bca0c615e97221f042a7a5
# skip: [0546e01d5a0269f02b4aa227f44b30a5a5558792] dt-bindings: panel-simple-dsi: add Khadas TS050 V2 panel
git bisect skip 0546e01d5a0269f02b4aa227f44b30a5a5558792
# skip: [917ebdd0a89304fabab54f58062dbb35d413dcb3] drm/rockchip: cdn-dp: switch to struct drm_edid
git bisect skip 917ebdd0a89304fabab54f58062dbb35d413dcb3
# skip: [a9c428f1b2e203d35117ee60f43db0ebdab39e66] drm/panel-samsung-atna33xc20: switch to struct drm_edid
git bisect skip a9c428f1b2e203d35117ee60f43db0ebdab39e66
# skip: [e58414e44b5315230de829ed88a63611646907ac] dt-bindings: display: add #sound-dai-cells property to rockchip dw hdmi
git bisect skip e58414e44b5315230de829ed88a63611646907ac
# skip: [6221deb716b9d5397c09ba6567f7ae61d8cbeb98] drm/rockchip: rk3066_hdmi: switch to struct drm_edid
git bisect skip 6221deb716b9d5397c09ba6567f7ae61d8cbeb98
# only skipped commits left to test
# possible first bad commit: [c058e7a8f8af355e4a441c89400a6e95a16320e5] Merge drm/drm-next into drm-misc-next
# possible first bad commit: [6221deb716b9d5397c09ba6567f7ae61d8cbeb98] drm/rockchip: rk3066_hdmi: switch to struct drm_edid
# possible first bad commit: [7fa1d6c50a5f5ad140bca0c615e97221f042a7a5] drm/rockchip: inno_hdmi: switch to struct drm_edid
# possible first bad commit: [917ebdd0a89304fabab54f58062dbb35d413dcb3] drm/rockchip: cdn-dp: switch to struct drm_edid
# possible first bad commit: [8431f29d2f1deac06e120a1c5d9afb5d72def319] drm/gud: switch to struct drm_edid
# possible first bad commit: [f1e4db073f98f887122a10ef3c66a19e75516b48] drm/vc4: hdmi: switch to struct drm_edid
# possible first bad commit: [bd730c77fa37fe2dda4b6e23f6921ef8a9b1bb97] drm/sun4i: hdmi: switch to struct drm_edid
# possible first bad commit: [7e7dc3a9ae38711c5c7a4a88d71d8875849d8c5c] drm/panel-edp: switch to struct drm_edid
# possible first bad commit: [a9c428f1b2e203d35117ee60f43db0ebdab39e66] drm/panel-samsung-atna33xc20: switch to struct drm_edid
# possible first bad commit: [e69da902467f79d933543661b56101042a45e5a4] drm/panel: simple: switch to struct drm_edid
# possible first bad commit: [c1696bf8d5f5389c5312aebf9e3ad0267149cdea] drm/tests: Add a test case for drm buddy clear allocation
# possible first bad commit: [a68c7eaa7a8ffdec9287ba1561a668d674c20a13] drm/amdgpu: Enable clear page functionality
# possible first bad commit: [96950929eb232038022abd961be46d492d7a6f0f] drm/buddy: Implement tracking clear page feature
# possible first bad commit: [5c9837374ecf55a1fa3b7622d365a0456960270f] drm/meson: gate px_clk when setting rate
# possible first bad commit: [105aa4c65b76c3a344ca89a2d2dc96c84cca557f] drm: Fix plane SIZE_HINTS property docs
# possible first bad commit: [e0a200ab4b72afd581bd6f82fc1ef510a4fb5478] drm/edid: Parse topology block for all DispID structure v1.x
# possible first bad commit: [9a314ea512b7db9d38107ea0284b56f805b8fc9a] drm/panel: Add driver for EDO RM69380 OLED panel
# possible first bad commit: [4f888782d30276b08a32fa3d9b5c13b7dc123e28] dt-bindings: display: panel: Add Raydium RM69380
# possible first bad commit: [26f9339212db569310d4b0ef4284efcbb462a86f] drm/panel: add Khadas TS050 V2 panel support
# possible first bad commit: [0546e01d5a0269f02b4aa227f44b30a5a5558792] dt-bindings: panel-simple-dsi: add Khadas TS050 V2 panel
# possible first bad commit: [a9b7dfd1d1f96be3a3f92128e9d78719a8d65939] drm/panthor: clean up some types in panthor_sched_suspend()
# possible first bad commit: [0e353133816b3e3e4bf8a682de01506ebc2b1dee] drm/rockchip: cdn-dp: drop driver owner assignment
# possible first bad commit: [685ba01ebedb8f87673f587f540ba84c444442d4] drm/rockchip: lvds: Remove include of drm_dp_helper.h
# possible first bad commit: [b1ee6bd3ea954d081bfb1d5559ce3e78ef40443a] dt-bindings: display: add #sound-dai-cells property to rockchip inno hdmi
# possible first bad commit: [9be3eb5d6ee57662a22b56153c7ee39265685455] dt-bindings: display: add #sound-dai-cells property to rockchip rk3066 hdmi
# possible first bad commit: [e58414e44b5315230de829ed88a63611646907ac] dt-bindings: display: add #sound-dai-cells property to rockchip dw hdmi
# possible first bad commit: [e80c219f52861e756181d7f88b0d341116daac2b] drm/rockchip: vop2: Do not divide height twice for YUV

Since I skipped quite a few commits, it wasn’t able to narrow it down to a single commit, so I don’t know how useful that is, or if I can do something differently to get a better result?
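One way to get more out of a skip-heavy result like this is to tally the log and look at which of the remaining candidates touch code an AMD laptop actually runs. The following is a minimal sketch, not part of the original bisect: the function name and the default prefixes (drm/amdgpu, drm/buddy) are assumptions chosen for illustration, and a subject prefix is only a hint about relevance, not proof. The sample lines are copied from the log above; in practice the whole log would be pasted in.

```python
import re
from collections import Counter

# A few lines copied from the bisect log above; paste the full log in practice.
SAMPLE_LOG = """\
# good: [bbecb57e28e6fd3666e15142728029b084eee6b2] Merge tag 'exynos-drm-next-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
# bad: [9aa99bb1977aab5f1a23780673f74db99d982632] drm/loongson: fix build after debugfs include change
# skip: [a68c7eaa7a8ffdec9287ba1561a668d674c20a13] drm/amdgpu: Enable clear page functionality
# skip: [96950929eb232038022abd961be46d492d7a6f0f] drm/buddy: Implement tracking clear page feature
# possible first bad commit: [6221deb716b9d5397c09ba6567f7ae61d8cbeb98] drm/rockchip: rk3066_hdmi: switch to struct drm_edid
# possible first bad commit: [a68c7eaa7a8ffdec9287ba1561a668d674c20a13] drm/amdgpu: Enable clear page functionality
# possible first bad commit: [96950929eb232038022abd961be46d492d7a6f0f] drm/buddy: Implement tracking clear page feature
"""

# Matches the comment lines that git bisect writes into its log.
ENTRY = re.compile(
    r"^# (good|bad|skip|possible first bad commit): \[([0-9a-f]+)\] (.*)$"
)


def summarize_bisect_log(log_text, prefixes=("drm/amdgpu", "drm/buddy")):
    """Count good/bad/skip verdicts and pick out candidate commits whose
    subject starts with one of the given prefixes.

    The default prefixes are an assumption about which subsystems matter
    on this hardware; adjust them as needed.
    """
    verdicts = Counter()
    likely = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        kind, sha, subject = match.groups()
        if kind == "possible first bad commit":
            if subject.startswith(prefixes):
                likely.append((sha[:12], subject))
        else:
            verdicts[kind] += 1
    return verdicts, likely
```

Calling summarize_bisect_log(SAMPLE_LOG) returns the verdict counts together with the amdgpu and buddy-allocator candidates, which are the ones that would be worth trying to build and test first.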

Then I started a new bisect again, with a commit that had green artifacts but no instant crash, and a commit that had the instant crash, and this is the result of that one (again, skipped the ones that didn’t compile):

git bisect start
# status: waiting for both good and bad commits
# good: [4a56c0ed5aa0bcbe1f5f7d755fb1fe1ebf48ae9c] Merge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good 4a56c0ed5aa0bcbe1f5f7d755fb1fe1ebf48ae9c
# status: waiting for bad commit, 1 good commit known
# bad: [33e02dc69afbd8f1b85a51d74d72f139ba4ca623] Merge tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 33e02dc69afbd8f1b85a51d74d72f139ba4ca623
# good: [47e9bff7fc042b28eb4cf375f0cf249ab708fdfa] Merge tag 'erofs-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
git bisect good 47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
# good: [83127ecada257e27f4740dbca9644dd0e838bc36] Merge tag 'wireless-next-2024-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
git bisect good 83127ecada257e27f4740dbca9644dd0e838bc36
# good: [46c6d2b186915176be5acc5d4b6f9793eb32a0c7] Merge tag 'asymmetric-keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
git bisect good 46c6d2b186915176be5acc5d4b6f9793eb32a0c7
# skip: [395f23e9206d71a0090fc15a9062f93c6e4cd4bc] ASoC: Intel: updates for 6.10 - part6
git bisect skip 395f23e9206d71a0090fc15a9062f93c6e4cd4bc
# skip: [38068d91cf3948ffa220d45f738505cc9f6e13d0] ASoC: Intel: sof_sdw: Allocate snd_soc_card dynamically
git bisect skip 38068d91cf3948ffa220d45f738505cc9f6e13d0
# skip: [44f69ddccb66bcdf969c44d8bb5d4dea4d6b2933] ALSA: usb-audio: Add sampling rates support for Mbox3
git bisect skip 44f69ddccb66bcdf969c44d8bb5d4dea4d6b2933
# skip: [628cc5d0c4bd6a3f70c793968f8e2546afc8c3a3] ASoC: Intel: sof_sdw: Delay update of the codec_conf array
git bisect skip 628cc5d0c4bd6a3f70c793968f8e2546afc8c3a3
# skip: [8166bdd2c560e59e9a6ec0c868b996294d8428d1] ASoC: intel: soc-acpi: Add missing cs42l43 endpoints
git bisect skip 8166bdd2c560e59e9a6ec0c868b996294d8428d1
# skip: [8b6d678fede700db6466d73f11fcbad496fa515e] ASoC: SOF: mediatek: mt8195: Constify snd_sof_dsp_ops
git bisect skip 8b6d678fede700db6466d73f11fcbad496fa515e
# skip: [172811e3a557d8681a5e2d0f871dc04a2d17eb13] ALSA: hda/cs_dsp_ctl: Use private_free for control cleanup
git bisect skip 172811e3a557d8681a5e2d0f871dc04a2d17eb13
# good: [47aa51677c975a5f66bc93d1c527e8878cf34d6c] ASoC: sunxi: Use snd_soc_substream_to_rtd() for accessing private_data
git bisect good 47aa51677c975a5f66bc93d1c527e8878cf34d6c
# good: [3a07362fab1653d3aca31a9155c8cc776138fd02] Merge tag 'asoc-v6.10' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
git bisect good 3a07362fab1653d3aca31a9155c8cc776138fd02
# skip: [b587f413ca47530b41aadc6f6bda6fc76153f77f] drm/msm/gen_header: allow skipping the validation
git bisect skip b587f413ca47530b41aadc6f6bda6fc76153f77f
# skip: [07823889bf37e2ab8aca11e53743f98c779e9f03] drm/msm/dp: Use function arguments for timing configuration
git bisect skip 07823889bf37e2ab8aca11e53743f98c779e9f03
# skip: [0eb61e200e2425f905d7e102a6303daa58ccf353] drm/msm: Update a6xx registers XML
git bisect skip 0eb61e200e2425f905d7e102a6303daa58ccf353
# skip: [a39eec19753be43de10fd251191a3f9fc65dd8d1] drm/i915/dpio: s/VLV_PLL_DW9_BCAST/VLV_PCS_DW17_BCAST/
git bisect skip a39eec19753be43de10fd251191a3f9fc65dd8d1
# skip: [f3f8207d8aed806e99c30749b1f190a5b0330b37] drm/msm: Add devcoredump support for a750
git bisect skip f3f8207d8aed806e99c30749b1f190a5b0330b37
# skip: [53f72c19ffab53e2baff08d64f35883946ab70b8] drm/msm/dsi: drop mmss_cc.xml.h
git bisect skip 53f72c19ffab53e2baff08d64f35883946ab70b8
# skip: [69b79e8075ba23b77593f56ab128600d8afa01e9] drm/msm/a6xx: Cleanup indexed regs const'ness
git bisect skip 69b79e8075ba23b77593f56ab128600d8afa01e9
# skip: [7533c71316fabddc318c89e68dcc3397984f6361] drm/i915/dpio: s/port/ch/
git bisect skip 7533c71316fabddc318c89e68dcc3397984f6361
# good: [be2d3e9d061552af6c50220ee7b7e76458a3080f] drm/panthor: Kill the faulty_slots variable in panthor_sched_suspend()
git bisect good be2d3e9d061552af6c50220ee7b7e76458a3080f
# skip: [263ed349388e2cbe02ce45be7d9079c96ad21b87] drm/i915/dpio: Give VLV DPIO group register a clearer name
git bisect skip 263ed349388e2cbe02ce45be7d9079c96ad21b87
# good: [9367f430917a12d84f90516489c8b94cab5e6390] Revert "drm/display: Select DRM_KMS_HELPER for DP helpers"
git bisect good 9367f430917a12d84f90516489c8b94cab5e6390
# good: [d7c128cb775ef21c29c3ad7113f5bd4ba886efa9] Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on"
git bisect good d7c128cb775ef21c29c3ad7113f5bd4ba886efa9
# skip: [185f35fee2208b5e40769843ddfe1387ac0473e3] drm/msm: drop A3xx and A4xx headers
git bisect skip 185f35fee2208b5e40769843ddfe1387ac0473e3
# good: [2d9c72f676e6f79a021b74c6c1c88235e7d5b722] drm/xe: Use ordered WQ for G2H handler
git bisect good 2d9c72f676e6f79a021b74c6c1c88235e7d5b722
# good: [08f441360f760151e742768b379fede54b0cbf6c] drm: move DRM-related CONFIG options into DRM submenu
git bisect good 08f441360f760151e742768b379fede54b0cbf6c
# skip: [f4f392074fc5ada8d19a865fcaf30e50b19eb328] drm/msm: convert msm_format::unpack_align_msb to the flag
git bisect skip f4f392074fc5ada8d19a865fcaf30e50b19eb328
# skip: [e1c6c70abe8c3ea729479e113a8a2348d255396e] drm/i915: pass dev_priv explicitly to PALETTE
git bisect skip e1c6c70abe8c3ea729479e113a8a2348d255396e
# skip: [00f24897a49c6cdce19484d7f2a6e03fd2e801ae] drm/msm: drop msm_kms_funcs::get_format() callback
git bisect skip 00f24897a49c6cdce19484d7f2a6e03fd2e801ae
# skip: [5af5a636ae57395820b231a16d39f44ee8b337dd] drm/i915: pass dev_priv explicitly to PIPE_WGC_C02
git bisect skip 5af5a636ae57395820b231a16d39f44ee8b337dd
# skip: [104e548a7c97da24224b375632fca0fc8b64c0db] drm/msm/mdp4: use drmm-managed allocation for mdp4_plane
git bisect skip 104e548a7c97da24224b375632fca0fc8b64c0db
# skip: [366ec5a525c7c40f431bddc599fd7c959c40212e] drm/i915: pass dev_priv explicitly to PIPE_WGC_C12
git bisect skip 366ec5a525c7c40f431bddc599fd7c959c40212e
# skip: [328660262df89ab64031059909d763f7a8af9570] drm/msm/adreno: fix CP cycles stat retrieval on a7xx
git bisect skip 328660262df89ab64031059909d763f7a8af9570
# good: [275654c02f0ba09d409c36d71dc238e470741e30] Merge tag 'drm-xe-next-fixes-2024-05-09-1' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
git bisect good 275654c02f0ba09d409c36d71dc238e470741e30
# skip: [91bcea421ecece579eb283eb811d9eb197693772] fbdev: uvesafb: replace deprecated strncpy with strscpy_pad
git bisect skip 91bcea421ecece579eb283eb811d9eb197693772
# good: [d731b1ed15052580b7b2f40559021012d280f1d9] ALSA: hda/realtek: Drop doubly quirk entry for 103c:8a2e
git bisect good d731b1ed15052580b7b2f40559021012d280f1d9
# skip: [27d50646d0815f02677ff870a5691f36be67c08f] fbdev: au1200fb: replace deprecated strncpy with strscpy
git bisect skip 27d50646d0815f02677ff870a5691f36be67c08f
# skip: [688cf598665851b9e8cb5083ff1d208ce43d10ff] fbdev: sisfb: hide unused variables
git bisect skip 688cf598665851b9e8cb5083ff1d208ce43d10ff
# skip: [8667a004d6148351a5d66f67889291b8e7466941] fbdev: fsl-diu-fb: replace deprecated strncpy with strscpy_pad
git bisect skip 8667a004d6148351a5d66f67889291b8e7466941
# skip: [ce4a7ae84a58b9f33aae8d6c769b3c94f3d5ce76] fbdev: offb: replace of_node_put with __free(device_node)
git bisect skip ce4a7ae84a58b9f33aae8d6c769b3c94f3d5ce76
# skip: [fb3b9c2d217f1f51fffe19fc0f4eaf55e2d4ea4f] video: logo: Drop full path of the input filename in generated file
git bisect skip fb3b9c2d217f1f51fffe19fc0f4eaf55e2d4ea4f
# skip: [6ad959b6703e2c4c5d7af03b4cfd5ff608036339] fbdev: savage: Handle err return when savagefb_check_var failed
git bisect skip 6ad959b6703e2c4c5d7af03b4cfd5ff608036339
# skip: [5317797e9cd07ff48132a36d545c25c1687ee676] video: hdmi: prefer length specifier in format over string copying
git bisect skip 5317797e9cd07ff48132a36d545c25c1687ee676
# skip: [ada5caa4e081b067736e872f2701e1c677290f22] fbdev: omap2: replace of_graph_get_next_endpoint()
git bisect skip ada5caa4e081b067736e872f2701e1c677290f22
# skip: [536a82d8362b4e3801cf3ea04767c4dc715cfa1f] fbdev: add HAS_IOPORT dependencies
git bisect skip 536a82d8362b4e3801cf3ea04767c4dc715cfa1f
# skip: [26c8cfb9d1e4b252336d23dd5127a8cbed414a32] fbdev: shmobile: fix snprintf truncation
git bisect skip 26c8cfb9d1e4b252336d23dd5127a8cbed414a32
# skip: [51084f89d687e14d96278241e5200cde4b0985c7] fbdev: sh7760fb: allow modular build
git bisect skip 51084f89d687e14d96278241e5200cde4b0985c7
# bad: [d34672777da3ea919e8adb0670ab91ddadf7dea0] Merge tag 'fbdev-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
git bisect bad d34672777da3ea919e8adb0670ab91ddadf7dea0
# bad: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad db5d28c0bfe566908719bec8e25443aabecbb802
# first bad commit: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel

This time it did narrow it down to a single commit, but it’s a merge commit, and I assume it merged multiple commits in this one merge commit. So I think I would need to know which commits were merged here and then do another bisect with these commits?
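To see which commits that merge brought in, the range between its two parents can be listed. This is a minimal sketch, assuming a local kernel git checkout; the helper names are made up for illustration, and only the revision syntax (^1, ^2, A..B) and the git log options are standard git. It only lists the commits and does not say which of them is responsible.

```python
import subprocess

# First bad commit reported by the bisect above.
MERGE = "db5d28c0bfe566908719bec8e25443aabecbb802"


def merged_range(merge_sha):
    """Revision range for what a merge pulled in: commits reachable from
    the second parent (the merged branch) but not from the first parent
    (the branch that was merged into)."""
    return f"{merge_sha}^1..{merge_sha}^2"


def list_merged_commits(merge_sha, repo="."):
    """Return 'sha subject' lines for the non-merge commits in that range.

    Must be pointed at a git checkout that contains merge_sha; raises
    subprocess.CalledProcessError if git fails.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--oneline", "--no-merges",
         merged_range(merge_sha)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

With a kernel tree in the current directory, list_merged_commits(MERGE) would return the individual commits behind the drm-next merge, and the same range could then be narrowed by subject or path.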

I also don’t know if this is all just the same problem, or if I’m seeing multiple different problems here. I also had different types of crashes (sometimes with the same kernel version it crashed differently). Most of the time it crashes, and is able to at least recover partially (amdgpu: GPU reset(1) succeeded!), the screen is still mostly frozen, but somehow the mouse still moves again, and I can type reboot via ssh to do a clean reboot. But sometimes it crashes even harder with amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed, followed by kernel panics, and then I can still type reboot via ssh, but it never shuts down anymore, so I need to turn it off by holding the power button (but I only had this 3 or 4 times).

@Mario_Limonciello do you think the information I found with the bisect would already be enough to do a bug report, or should I test/debug something more?

SuperTux88 · August 11, 2024, 11:03pm

I also tested a few more things now. First I upgraded mesa to the latest testing version in Gentoo (24.1.3 was stable before, and now I have 24.1.5 from Gentoo testing), but it still crashes with this version.

Then I also tested the AppImage, and it also crashes with the Firefox AppImage version. But with the flatpak version it indeed doesn’t crash, but I don’t know what that means now. Can I somehow compare which library versions the flatpak version is using?

Mario_Limonciello · August 12, 2024, 1:19am

The flatpak will probably have a different mesa version than you have in your rootfs. So that could point at an interaction with a particular mesa version and the newer kernel.

Another wild thought; do you have anything using explicit sync? I heard recently that there are some bugs with implicit sync and explicit sync being used at same time.

Mario_Limonciello · August 12, 2024, 1:20am

You can certainly raise it; if others think you need to dig further, they’ll say so.

SuperTux88 · August 12, 2024, 1:34am

I have no idea. Nothing I’m aware of, I think, or at least nothing I configured manually. I don’t know what uses explicit/implicit sync by default.

Mario_Limonciello · August 12, 2024, 4:32am

Try to run the browser on a native Wayland backend instead of Xwayland.

Leonard · August 12, 2024, 1:02pm

After updating the kernel to 6.10 I also experience crashes with logs similar to OP’s. I’m using Firefox on Fedora 40 with VAAPI enabled. After disabling VAAPI everything seems to be well. A quick search suggests that mesa 24.1.6 (24.1.5 is currently on Fedora) may provide the fix; see 7840h/780m system crash after update to linux kernel 6.10 (#3497) · Issues · drm / amd · GitLab.

SuperTux88 · August 12, 2024, 10:51pm

The flatpak has mesa 24.1.3 (so the same version I also had on the rootfs before I updated to 24.1.5 yesterday).

I found another difference between the flatpak and the native Firefox instances: on about:support, my native Firefox has HARDWARE_VIDEO_DECODING as “available”, while in the flatpak it shows “Force disabled by gfxInfo” and “Blocklisted; failure code FEATURE_FAILURE_VIDEO_DECODING_TEST_FAILED”. Since I think it has something to do with the embedded videos on this website, it’s possible that it’s related to hardware decoding, which would explain why it doesn’t crash on the flatpak version without hardware decoding.

I did all my testing with native Wayland backend and no Xwayland (but I could test with Xwayland if needed).

Thanks for the link, that looks very similar to the problem I have. So I’ll wait for 24.1.6 to be released in a few days, or maybe I’m able to test the patch out tomorrow and see if the problem is fixed.

If 24.1.6 doesn’t fix it, I’ll create a separate new bug report, but if this is already fixed in the next version there is no need to create more noise than needed.

Mario_Limonciello · August 13, 2024, 12:56am

Yeah it sounds like there was likely a mesa bug that the new kernel exposed but it’s fixed in the newer mesa version.

Leonard · August 13, 2024, 10:18am

Unfortunately, I still managed to reproduce it with VAAPI off in Firefox:

Aug 13 12:11:35 framework kernel: gmc_v11_0_process_interrupt: 52 callbacks suppressed
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32778)
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 4743 thread firefox:cs0 pid 4821)
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800108208000 from client 18
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
Aug 13 12:11:35 framework kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0

etc, etc.
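The indented lines in these logs (MORE_FAULTS, PERMISSION_FAULTS and so on) are fields unpacked from the raw MMVM_L2_PROTECTION_FAULT_STATUS value. The sketch below shows that unpacking. The bit layout is an assumption that is not stated anywhere in this thread and should be checked against the mmhub register headers of the kernel in use; it does, however, reproduce the values printed in the log above for 0x00103A11.

```python
# Assumed bit layout of MMVM_L2_PROTECTION_FAULT_STATUS as (shift, width).
# This is NOT taken from the thread; verify it against the amdgpu register
# headers for your kernel before relying on it.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),       # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),     # printed in the "page fault (... vmid:N ...)" line
}


def decode_fault_status(status):
    """Split a raw fault status value into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }
```

For example, decode_fault_status(0x00103A11) yields MORE_FAULTS 1, PERMISSION_FAULTS 1, CID 0x1d and VMID 1, with WALKER_ERROR, MAPPING_ERROR and RW all 0, which matches the fields the driver printed.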

nlordell · August 14, 2024, 3:33pm

Since a recent update (I’m guessing sometime earlier this week), I’ve started seeing strange behaviour where my screen will randomly “completely lock up” and turn itself on-and-off multiple times.

I looked at journalctl logs, and it appears to be caused by a page fault somewhere in the AMD GPU driver:

Aug 14 16:49:20.864423 kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32787)
Aug 14 16:49:20.866611 kernel: amdgpu 0000:c1:00.0: amdgpu:  in process RDD Process pid 2944 thread firefox:cs0 pid 12238)
Aug 14 16:49:20.866782 kernel: amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000800106f86000 from client 18
Aug 14 16:49:20.867086 kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00203A11
Aug 14 16:49:20.867350 kernel: amdgpu 0000:c1:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x1d)
Aug 14 16:49:20.867682 kernel: amdgpu 0000:c1:00.0: amdgpu:          MORE_FAULTS: 0x1
Aug 14 16:49:20.867835 kernel: amdgpu 0000:c1:00.0: amdgpu:          WALKER_ERROR: 0x0
Aug 14 16:49:20.867977 kernel: amdgpu 0000:c1:00.0: amdgpu:          PERMISSION_FAULTS: 0x1
Aug 14 16:49:20.868159 kernel: amdgpu 0000:c1:00.0: amdgpu:          MAPPING_ERROR: 0x0
Aug 14 16:49:20.868406 kernel: amdgpu 0000:c1:00.0: amdgpu:          RW: 0x0

The turning on-and-off seems to happen as the GPU is trying to reset itself in order to recover (but fails):

Aug 14 16:49:31.087880 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Aug 14 16:49:31.658494 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:31.876468 kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000300 != 0x00000280n
Aug 14 16:49:31.890438 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:31.890589 kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Aug 14 16:49:31.932529 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 14 16:49:31.933181 kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Aug 14 16:49:31.933250 kernel: [drm] VRAM is lost due to GPU reset!
Aug 14 16:49:31.933295 kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Aug 14 16:49:31.936445 kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Aug 14 16:49:31.938428 kernel: [drm] DMUB hardware initialized: version=0x08003D00
Aug 14 16:49:32.347458 kernel: [drm] kiq ring mec 3 pipe 1 q 0
Aug 14 16:49:32.603702 kernel: amdgpu 0000:c1:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
Aug 14 16:49:32.604502 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v4_0> failed -110
Aug 14 16:49:32.604572 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed
Aug 14 16:49:32.604891 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset end with ret = -110
Aug 14 16:49:32.605129 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Aug 14 16:49:33.895440 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 14 16:49:34.147454 kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000040 != 0x00000000n
Aug 14 16:49:36.709902 kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n

Edit: To clarify, I’m on Fedora 40.
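When comparing crashes like the ones in this thread, it can help to reduce a journal excerpt to the handful of amdgpu events that matter: page faults, ring timeouts, and whether the reset recovered or failed. Here is a rough sketch of such a filter. The function and pattern names are illustrative, and the patterns are simply the message fragments that appear in the logs quoted in this thread, so other kernel versions may word them differently.

```python
import re
from collections import Counter

# Message fragments taken from the logs quoted in this thread.
PATTERNS = {
    "page_fault": re.compile(r"\[mmhub\] page fault"),
    "job_timeout": re.compile(r"ring \S+ timeout"),
    "ring_test_failed": re.compile(r"ring \S+ test failed"),
    "reset_begin": re.compile(r"GPU reset begin!"),
    "reset_failed": re.compile(r"GPU reset\(\d+\) failed"),
    "reset_recovered": re.compile(r"GPU reset\(\d+\) succeeded"),
}

# Lines copied from the journal excerpt above.
SAMPLE = """\
Aug 14 16:49:20.864423 kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32787)
Aug 14 16:49:31.087880 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Aug 14 16:49:31.932529 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 14 16:49:32.603702 kernel: amdgpu 0000:c1:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
Aug 14 16:49:32.604572 kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) failed
"""


def triage(journal_text):
    """Count the known amdgpu events and report the last reset outcome.

    Returns (counts, outcome) where outcome is "failed", "recovered",
    or "unknown" if no final reset message was seen.
    """
    counts = Counter()
    outcome = "unknown"
    for line in journal_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                if name == "reset_failed":
                    outcome = "failed"
                elif name == "reset_recovered":
                    outcome = "recovered"
    return counts, outcome
```

Run on SAMPLE, this reports one page fault, one reset attempt, one failed ring test and a failed reset. The -110 in those messages is the Linux errno for a timeout (ETIMEDOUT), which fits the vcn_unified_0 ring not responding after the reset.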

Mario_Limonciello · August 14, 2024, 3:46pm

You should file a bug report; this is very likely a mesa bug.

nlordell · August 15, 2024, 7:37am

Thanks for the tip. I do use some mesa packages from RPMFusion and noticed they are on slightly different versions than the system mesa packages, and I wonder if that might be the root of the problem. I’m going to try removing the packages for now and see if the issue persists. If it does, then I will submit an issue to mesa.

mesa-dri-drivers.x86_64                              24.1.5-2.fc40                       @updates                                 
mesa-va-drivers-freeworld.x86_64                     24.1.5-1.fc40                       @rpmfusion-free-updates   
mesa-vdpau-drivers-freeworld.x86_64                  24.1.5-1.fc40                       @rpmfusion-free-updates