I still have the issues if I turn off UMA_GAME_OPTIMIZED in the BIOS.
I hope the 6.7.x kernel will fix it.
I still get them with kernel 6.7.3 on Fedora 39.
A little update: I just updated the kernel to 6.7.4 and it has made things worse; the external screen turns white more frequently. I’ll just leave the UMA_GAME_OPTIMIZED option permanently on.
I don’t know if it’s a hardware or a driver issue, but it’s really frustrating.
Is it safe to say those of us with AMD GPUs should stick with the 6.6.x kernels?
No need to avoid updating. You can just enable the UMA_GAME_OPTIMIZED option and it will work fine. It’s good that AMD is looking into it.
Awesome, thank you for this. Has this been filed with Fedora, and if not, should it be? It looks brand new, so I’m sure it hasn’t been; I’m just asking to avoid duplicates.
Not filed.
This is the upstream submission:
[PATCH 1/2] drm/buddy: Fix alloc_range() error handling code (kernel.org)
Not sure if this is related, but external displays have started blowing up my CPU usage for some reason (7640U, F39, KDE, kernel 6.7.4-200).
@Mario_Limonciello I’ve applied the patch you linked to 6.7.4 and I’m still seeing issues when unplugging my USB-C display. It coincides with the IOMMU reporting lots of errors:
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: Using 44-bit DMA addresses
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc00000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc01000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc02000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc03000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc04000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc05000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc06000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc07000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc08000 flags=0x0000]
Feb 14 15:15:01 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc84000 flags=0x0000]
...
Feb 14 15:15:06 avalon kernel: amd_iommu_report_page_fault: 80096 callbacks suppressed
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c0000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c1000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c2000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c3000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c4000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c5000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c6000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3c7000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3e0000 flags=0x0000]
Feb 14 15:15:06 avalon kernel: amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffffe3cc000 flags=0x0000]
The addresses also look fishy. I only have 32 GB of RAM, yet the addresses are in the 16 TB range?
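For a sense of scale, here’s a small Python sketch that pulls the address field out of AMD-Vi IO_PAGE_FAULT lines like the ones above and compares it with the 44-bit DMA limit the driver reports ("Using 44-bit DMA addresses"). The regex and helper names are my own, for illustration only. Keep in mind these are I/O virtual addresses assigned through the IOMMU, not physical RAM addresses, and as I understand it Linux’s IOVA allocator tends to hand them out top-down from the device’s DMA limit, so values just below 2^44 bytes (16 TiB) can show up regardless of how much RAM is installed.

```python
import re

# Matches the domain and address fields of AMD-Vi IO_PAGE_FAULT log lines.
FAULT_RE = re.compile(r"IO_PAGE_FAULT domain=(0x[0-9a-f]+) address=(0x[0-9a-f]+)")

DMA_BITS = 44   # taken from "Using 44-bit DMA addresses" in the same log
TIB = 1 << 40
MIB = 1 << 20


def parse_faults(lines):
    """Return (domain, address) integer pairs for every IO_PAGE_FAULT line."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((int(m.group(1), 16), int(m.group(2), 16)))
    return faults


def describe(address, dma_bits=DMA_BITS):
    """Express an I/O virtual address in TiB and as a distance below the DMA limit."""
    limit = 1 << dma_bits
    return {
        "tib": address / TIB,
        "below_limit_bytes": limit - address,
        "within_mask": address < limit,
    }


if __name__ == "__main__":
    sample = [
        "amdgpu 0000:c1:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x0005 address=0xfffffc00000 flags=0x0000]",
        "amdgpu 0000:c1:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x0005 address=0xffffe3c0000 flags=0x0000]",
    ]
    for domain, addr in parse_faults(sample):
        info = describe(addr)
        print(f"domain={domain:#06x} addr={addr:#x} "
              f"~{info['tib']:.4f} TiB, "
              f"{info['below_limit_bytes'] // MIB} MiB below 2^{DMA_BITS}")
```

With those two sample lines, both addresses land within a few tens of MiB of the 2^44 limit (4 MiB and 28 MiB below it).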
The patch only helps with the graphical corruption, not with the white screen + IOMMU issue that several people have reported. Yes, the IOMMU is just the messenger here; it’s still not clear where the bug causing this lives.
Ah, ok. Is there a known workaround besides disabling the IOMMU? Or an upstream bug report to follow?
Besides losing some capabilities in virtualizing PCIe devices, is there any downside to having the IOMMU disabled? I use this laptop mainly as a workstation and I do run some VMs, but not with PCIe devices assigned to them.
Apart from that, so far the issue still happens only after hibernate/suspend on Fedora 39 with kernel 6.7.3-200.fc39.x86_64, so no change as of yet.
I had issues with the SD card controller on my old laptop without the IOMMU; iommu=soft worked around that.
The problem with disabling the IOMMU is that a buggy (or malicious) driver/device may now have an easier time corrupting memory in your system. That may or may not be an issue for you.
As a side note, I’m not sure if this is actually an IOMMU problem or a case of the GPU scribbling over random physical memory (on the driver’s mistaken directions) and mostly getting away with it unless the IOMMU catches it red-handed. Whether you’re fine allowing this to run and hoping for the best is for you to decide, of course.
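To make the "messenger" point concrete, here’s a toy Python model of what an IOMMU does. It is not how the real AMD-Vi hardware or the Linux driver is implemented; the class and method names are hypothetical. The driver maps I/O virtual pages to physical pages for a device, and a device access to an unmapped I/O virtual address is blocked and logged as a page fault instead of reaching memory. With the IOMMU disabled, that same bad address would go straight to physical memory.

```python
PAGE_SIZE = 4096


class ToyIommu:
    """Hypothetical, greatly simplified IOMMU: one flat page table per domain."""

    def __init__(self, domain):
        self.domain = domain
        self.page_table = {}  # IOVA page number -> physical page number
        self.fault_log = []

    def map(self, iova, phys):
        """Driver side: let the device reach one physical page via this IOVA."""
        self.page_table[iova // PAGE_SIZE] = phys // PAGE_SIZE

    def unmap(self, iova):
        """Driver side: revoke the mapping for the page containing this IOVA."""
        self.page_table.pop(iova // PAGE_SIZE, None)

    def translate(self, iova):
        """Device side: return the physical address, or log a fault and return None."""
        page = self.page_table.get(iova // PAGE_SIZE)
        if page is None:
            # The access is blocked; it never touches RAM, it only ends up in the log.
            self.fault_log.append(
                f"IO_PAGE_FAULT domain={self.domain:#06x} address={iova:#x}"
            )
            return None
        return page * PAGE_SIZE + iova % PAGE_SIZE


if __name__ == "__main__":
    mmu = ToyIommu(domain=5)
    mmu.map(0xfffffc00000, 0x12345000)
    print(hex(mmu.translate(0xfffffc00010)))  # mapped: translated to a RAM address
    mmu.unmap(0xfffffc00000)                  # hypothetical: buffer freed while still in use
    print(mmu.translate(0xfffffc00010))       # stale access: blocked and logged
    print(mmu.fault_log)
```

The last two calls show the scenario described above: the stale access never reaches RAM, it only shows up in the log.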
Right, thanks, I’ll leave it enabled then (I only ever used it for virtualizing PCIe devices, but I never knew it also handled stuff like this ;-).
Is the graphical corruption specific to the AMD CPU rather than to Framework?
Thus far it’s only been reported by users on Framework laptops. I have an educated but unsubstantiated suspicion it’s related to a BIOS interaction.
After enabling UMA_GAME_OPTIMIZED in the BIOS, the issues became less frequent but still appeared. I have now also added amdgpu.sg_display=0 to my kernel boot parameters, and so far the issues have disappeared. So the workaround posted by @Mario_Limonciello in the upstream bug seems to work just fine.
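If you try the same workaround, one way to confirm the parameter actually made it onto the running kernel’s command line is to read /proc/cmdline after rebooting. The Python sketch below splits the command line into key/value pairs; the helper names are made up for illustration, and it only does simple whitespace splitting, so quoted values containing spaces aren’t handled.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None.

    Simplified: splits on whitespace only, and later duplicates overwrite earlier ones.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def has_param(params, key, expected):
    """True if `key` is present with exactly the value `expected`."""
    return params.get(key) == expected


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if not path.exists():
        print("/proc/cmdline not available (not running on Linux?)")
    elif has_param(parse_cmdline(path.read_text()), "amdgpu.sg_display", "0"):
        print("amdgpu.sg_display=0 is active")
    else:
        print("amdgpu.sg_display=0 not found on the running kernel's command line")
```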
It will be interesting to see whether the same issue is reproducible on the Zen 4 APU SKUs.
I haven’t seen it on the Zen 3 APUs I’ve tested with (5700G), but they are based on the much older Vega graphics IP.