[RESPONDED] High CPU usage, low FPS when using eGPU

I’m using a Framework 13 (i5-1340P) to power my eGPU (Razer Core X, RX 5700 XT) on a 1080p screen. I’m currently having issues on Arch Linux + KDE + Wayland: stuttering, the eGPU not reaching 100% utilization, lower-than-expected FPS that does not improve when lowering graphics quality, and occasionally high CPU usage.

For example, in The Witcher 3 the frame rate varies between 20 and 90 FPS depending on what is being rendered, and at one point CPU usage reached 80%, causing even the audio to stutter and making the game unplayable. In another example, Civilization VI averaged 16 FPS on medium settings on the 5700 XT in the Gathering Storm benchmark, compared to 31 FPS on the iGPU.

I’m using mesa version 1:23.3.2-2, linux version 6.6.10-arch1-1, and kwin version 5.27.10-2.

Conversely, on Windows, my games run as expected and eGPU usage is always 95+%.

I have some Unigine Superposition benchmarks:

  • On 1080p extreme, the 5700 XT scored ~4550 on Windows and ~4450 on Linux (2% decrease in performance)
  • On 1080p medium, the 5700 XT scored ~12600 on Windows and ~7650 on Linux (39% decrease in performance)

I’ve tried the following to boost performance on Linux:

  • Disabling the laptop screen and only using my external monitor
  • Trying all methods provided by all-ways-egpu
  • Following this guide: sudo lspci -vv reported no change in bandwidth, and its output was no different from what I see on Windows (the exact check I ran is shown below).
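
For reference, the check was roughly this (grepping the full lspci -vv output for the link capability and status fields):

  sudo lspci -vv | grep -E "LnkCap|LnkSta"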

Only the first of these (disabling the laptop screen) improved performance, but there is still a big gap between Linux and Windows. Does anyone know what could be causing this?

Maybe KDE desktop services are consuming enough processing power to affect performance?
Have you tried lighter DEs?

btop reports something like 900% CPU usage by Witcher 3, so it’s probably not KDE desktop services.

Just to confirm, I tried using Hyprland and encountered the same terrible performance and audio stuttering issues.

I also tried putting RADV_PERFTEST=nosam into /etc/environment, as suggested on egpu.io, but I didn’t get a performance boost.
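
For anyone else trying this, the entry in /etc/environment is just the bare assignment (no export keyword), for example:

  RADV_PERFTEST=nosam

The same variable can also be set per game via the Steam launch options (RADV_PERFTEST=nosam %command%) if you don’t want it system-wide.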

Just to check, I tried running the 3DMark PCIe test, and the reported bandwidth exceeded what you would get on PCIe 2.0 x4, so bandwidth shouldn’t be the issue.

We do not test against eGPUs, and Arch is a community-supported distro. That said, it would be interesting to verify some of the following:

  • In the past I successfully used an eGPU on an Intel 12th gen board, though I have not had time to touch that project since, and it was with an NVIDIA card. It looks like you are using a 13th gen board.

  • Linux running Windows games through Proton will differ from Windows running them natively, eGPU or otherwise. I am sure you know this, but I am putting it out there for others reading this.

  • I’d start by verifying things against Thunderbolt - ArchWiki, and also run lspci -k | grep -A 2 -E "(VGA|3D|Display)" to make sure everything is correct, including the driver in use.

  • If we see the driver is active: right-click the game in Steam, go to Properties, then "Set Launch Options", enter DRI_PRIME=1 %command%, and close the dialog box. On a system with two AMD cards, for example, this tells the system not to use GPU 0 and to use GPU 1 instead, which corresponds to the dedicated dGPU. I suspect it should behave the same way for an AMD eGPU config (there is a quick way to confirm which GPU each value selects; see the commands after this list).

  • This guide doesn’t feel complete, so I’d also cross-check it against the Arch Wiki, even though the Arch Wiki leans more toward NVIDIA setups.
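
As a quick check (a rough sketch; it assumes glxinfo is installed, and the exact renderer strings will differ per system), you can confirm which GPU each DRI_PRIME value selects:

  glxinfo | grep "OpenGL renderer"
  DRI_PRIME=1 glxinfo | grep "OpenGL renderer"

The first line would normally name the iGPU and the second the 5700 XT; if they come out swapped, the eGPU may already be the default device without any launch option.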

To shorten a bit:

  • Verify the driver is active and the eGPU is active.
  • Make absolutely sure the cable you are using is a supported one, so the link can actually deliver the bandwidth that output shows.
  • Make sure you are launching games with DRI_PRIME=1 %command% (it’s what we use for dGPU).

Hmm, I had actually only skimmed over the Thunderbolt section of the Arch Linux eGPU guide.
Interestingly, when I run sudo dmesg | grep PCIe, I find:

[    0.776961] pci 0000:80:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    0.778124] pci 0000:82:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    0.788144] pci 0000:84:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)

…which I guess means the link is only running at PCIe 1.0 speed (2.5 GT/s per lane, about 2 Gb/s after 8b/10b encoding, so roughly 8 Gb/s across the x4 link)?

Just to check, I ran sudo lspci -vv -s 0000:00:07.3, which tells me:

00:07.3 PCI bridge: Intel Corporation Device a71f (prog-if 00 [Normal decode])
	Subsystem: Framework Computer Inc. Device 0003
        ...		
        LnkSta:	Speed 2.5GT/s, Width x4
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-

so I guess it is, for some reason, running in PCIe 1.0 mode? I don’t know why.

Re: the other suggestions: lspci does show the 5700 XT using the amdgpu module, and DRI_PRIME=1 switches things back to the iGPU (I guess because I’ve already applied all-ways-egpu, which makes the eGPU the default).

Have you tried this? AMD Performance Fixes · ewagner12/all-ways-egpu Wiki · GitHub

Yes, but those fixes haven’t resolved the issue. It seems like the laptop’s ports are the bottleneck rather than the eGPU enclosure’s controller.

EDIT: complicating matters, boltctl reports

 ● Razer Core X
   ├─ type:          peripheral
   ├─ name:          Core X
   ├─ vendor:        Razer
   ├─ uuid:          ce010000-0070-6518-23fe-a68e5034c902
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     77138780-814e-d6fe-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  boot
   ├─ authorized:    Thu 11 Jan 2024 10:07:52 UTC
   ├─ connected:     Thu 11 Jan 2024 10:07:52 UTC
   └─ stored:        Sat 06 Jan 2024 11:51:13 UTC
      ├─ policy:     iommu
      └─ key:        no

so is the transfer speed 8 Gb/s or 40 Gb/s?

Not necessarily 8 Gb/s. It is possible that the eGPU is slowing its interface down to save energy while it is idling.
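
One rough way to check (the 00:07.3 bridge address is taken from your lspci output above and may differ on other systems) is to watch the link status while the GPU is actually under load, e.g. while a benchmark runs:

  sudo watch -n 1 'lspci -vv -s 0000:00:07.3 | grep LnkSta'

If the speed jumps to 8 GT/s under load, the 2.5 GT/s reading was just idle power saving.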

There are patches waiting to be merged that fix this issue in the amdgpu driver. If you’re comfortable compiling a custom kernel, you can apply them to 6.7 (follow the thread to get both .patch files): [PATCH v2 1/2] drm/amd: Use the first non-dGPU PCI device for BW limits - Mario Limonciello
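
On Arch the cleaner route is probably to rebuild the linux package with the patch files added to its PKGBUILD, but against a plain kernel tree the process is roughly this (the file names below are placeholders; use whatever you save from the mailing list thread):

  cd linux-6.7
  # hypothetical file names for the two patches from the thread
  patch -p1 < ../0001-drm-amd-use-first-non-dgpu-pci-device-for-bw-limits.patch
  patch -p1 < ../0002-follow-up-patch-from-the-same-series.patch
  # then configure, build and install the kernel as usual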

Unfortunately, my eGPU isn’t working at the moment, so I won’t be able to test anything. That being said, Mario Limonciello says “don’t apply it when connected to an eGPU enclosure connected to an Intel host”.

The thread also mentions a GitLab issue, which notes that dmesg can report the bandwidth incorrectly when the eGPU is not hot-plugged, and which also provides a GPU bandwidth benchmark. I’ll run it when I can.