[TRACKING] Request: verify dGPU support

Does anyone have a summary or up-to-date info on what currently works? It seems there has been a lot of progress towards working systems, but the info is a bit scattered.

My plan is to use a PCIe 4.0 x4→x16 powered riser and a 9070 XT on CachyOS, but I’m curious about the current state of the known issues:

  • AMD-on-AMD power limit, where the APU TDP is applied to the dGPU, as reported here: External GPU – Strix Halo Wiki
  • PCIe 4.0 x4 slot negotiating down to 3.0 speeds, although it seems people did not have this issue when using the M.2 slot instead.

Is it still the case that nvidia GPUs are working better/more stable?

Any updated info on this stuff is appreciated! I’ve read this thread many times and it’s hard to know where things are at now.

I’ve been running a Radeon AI PRO R9700 (basically a pro 9070 XT) connected to the x4 PCIe slot without issue for a month or two. I am on the latest firmware, 3.04. I have used this setup for both gaming and running LLMs, in both Windows and Linux.

I initially had some trouble getting the card working: stuttering and driver crashes in both Windows and Linux. I even tried PCIe 4.0 redriver cards with all sorts of adapters and cables.

What worked for me was just passive cards with the shortest cable I could get (50cm) and disabling ASPM in the operating system. On Windows that means setting the power plan’s PCIe power saving to off; on Linux, adding the kernel arg amdgpu.aspm=0.
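For anyone wanting to confirm the Linux side actually took effect after a reboot, here is a minimal Python sketch that checks the running kernel command line for the flag. The helper names are made up for illustration, and reading `/sys/module/amdgpu/parameters/aspm` assumes the amdgpu module exposes its parameter there on your kernel; treat it as a rough check rather than a definitive one.

```python
from pathlib import Path


def read_text_or_none(path):
    """Return stripped file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def has_kernel_arg(cmdline, arg):
    """True if `arg` is present as a whole token on the kernel command line."""
    return arg in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the arguments the running kernel was booted with.
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("amdgpu.aspm=0 on cmdline:", has_kernel_arg(cmdline, "amdgpu.aspm=0"))
    # Assumption: the amdgpu module exposes this parameter file on your system.
    print("amdgpu aspm parameter:",
          read_text_or_none("/sys/module/amdgpu/parameters/aspm"))
```

If the flag is missing from the output, the bootloader config probably was not regenerated after editing it.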

About 1 in every 10 boots the card doesn’t enumerate properly on the PCIe bus, but it works most of the time and usually only needs a warm reboot. When it fails, I get no display output on the dGPU in either Windows or Linux. On Windows the dGPU will sometimes show as disabled in Device Manager after a cold boot, with no dGPU display output, but I can usually just disable/enable the card in Device Manager from a display connected to the iGPU, and the dGPU starts working without a reboot. Linux seems to boot more reliably with the ASPM-off kernel arg and rarely fails to load the driver for the dGPU.

I just ran FurMark in Linux on the R9700, and LACT reported it using the R9700’s max 300W. When I ran multiple instances of FurMark (one on the R9700 dGPU and one on the 8060S iGPU), the R9700 reached 200W and the 8060S reached 140W, about 340W combined. So there may still be some limit on dGPU power in an AMD iGPU + AMD dGPU combo, which would make sense as a way to stay under the 400W PSU included with the full Framework Desktop kit. I am assuming a higher-powered AMD dGPU would be limited to maybe ~350-400W if the iGPU isn’t being used. I haven’t put any effort into trying to override the power limitation with overclocking or power-profile tuning.
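If you want to log the same numbers without LACT, here is a rough Python sketch that reads GPU power from the hwmon files and sums them. It assumes the amdgpu driver exposes `power1_average` (or `power1_input` on some parts) in microwatts under each card’s hwmon directory; the function names and the example paths in the usage note are hypothetical.

```python
from pathlib import Path


def parse_microwatts(raw):
    """Convert a hwmon power reading (microwatts, as text) to watts."""
    return int(raw.strip()) / 1_000_000


def read_watts(hwmon_dir, names=("power1_average", "power1_input")):
    """Return the first readable power value in watts, or None if none exists."""
    for name in names:
        try:
            return parse_microwatts((Path(hwmon_dir) / name).read_text())
        except (OSError, ValueError):
            continue
    return None


def total_watts(readings):
    """Sum the readings that were actually available."""
    return sum(w for w in readings if w is not None)
```

Usage would look something like `total_watts([read_watts(d) for d in ("/sys/class/drm/card0/device/hwmon/hwmon3", "/sys/class/drm/card1/device/hwmon/hwmon4")])`, with the card and hwmon numbers swapped for whatever your system enumerates. With the figures above (200W and 140W) the sum comes out to 340W.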

I did migrate my Framework Desktop from the original Framework case into a larger case where I could fit the mainboard, dGPU and a 1000W PSU, so the power limit is probably somewhere on the APU firmware side of things and not down to the power available from the PSU.

Would be nice to have PCIe ASPM options in the UEFI menus along with a way to decouple the APU iGPU and dGPU power limits.

This is very very helpful. Thank you

Would you mind sharing the riser cable you are using and whether it’s externally powered?

Thank you!

I didn’t use a direct riser cable, but a powered adapter board connected to an x4 PCIe adapter using a SlimSAS cable.

Basically these:



Some amazon links (not the exact ones I have, but they are mostly generic passive boards):

I already had the adapter boards, so I just needed a cable to connect them.

Could maybe also do it with MCIO-based boards, but I haven’t seen many MCIO 4i cables and PCIe adapter boards that would fit in the Framework Desktop x4 slot.

Reporting back with my findings.

I installed a Reaper 9070 XT on the x4 port using a K23SL-TL riser. I’m using a 15cm one for now, but need a 25cm for my Fractal Terra.

BIOS version 3.05, although I saw no difference in behavior in 3.04

At first, I tried using the riser without external power, and everything “appeared” to be working. LACT showed the link negotiated at PCIe 4.0 x4 speeds, the 9070 XT got good FPS in games, benchmarks lined up with expectations, and I saw the 9070 XT draw up to 340W with no problem. This is on CachyOS.
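For a second opinion on the negotiated link outside of LACT, the kernel also reports it in sysfs. Below is a small Python sketch that maps the `current_link_speed` text (for example "16.0 GT/s PCIe") to a PCIe generation and reads the lane width. The function names are made up, and the device address in the usage note is a placeholder, not my actual one.

```python
import re
from pathlib import Path

# Per-lane transfer rate in GT/s for each PCIe generation.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(link_speed):
    """Map a sysfs link speed string such as '16.0 GT/s PCIe' to a generation."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", link_speed)
    if not match:
        return None
    return GEN_BY_RATE.get(float(match.group(1)))


def link_status(device_dir):
    """Return (generation, lane width) for a PCI device directory in sysfs."""
    def read(name):
        try:
            return (Path(device_dir) / name).read_text().strip()
        except OSError:
            return None

    speed = read("current_link_speed")
    width = read("current_link_width")
    gen = pcie_gen(speed) if speed else None
    lanes = int(width) if width and width.isdigit() else None
    return gen, lanes
```

Something like `link_status("/sys/bus/pci/devices/0000:03:00.0")` should return `(4, 4)` on a healthy 4.0 x4 link, and `(3, 4)` if it has negotiated down to 3.0. Keep in mind some cards drop link speed at idle to save power, so it is worth checking under load.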

I then tested on Windows and also got good results, except at some point there was a driver conflict. I booted back up without the 9070 XT, ran DDU, rebooted with the 9070 XT, and reinstalled the drivers from AMD’s website, and everything has been fine since. I got Steel Nomad scores slightly below average, but that is expected with the PCIe bottleneck.

I then attached a molex 12V connector to my riser so I could compare the differences. While everything “appeared” to be fine without external power, attaching the power cable boosted my FPS and power draw of the card, so I think it’s necessary for optimal performance.

Now the issues:

I didn’t have any boot issues or need any kernel flags (PCIe power management was already disabled in Windows), but when my two monitors were plugged into the dGPU, I could not come back from suspend in Linux. The machine would wake and I could SSH in, but there was no video. At suspend, the dGPU fans would ramp up for a minute or two. Reloading the GPU through a script seems to fix it, but my whole session would be restarted, so it was not ideal. I then tried plugging one of my monitors into the iGPU, and the suspend issue on the dGPU side went away, but then the CPU cooler fan would ramp up and stay ramped up when entering suspend. I tried a bunch of different kernel flags:

amdgpu.bapm=0 amdgpu.aspm=0 pm_async=0 amdgpu.sg_display=0 amdgpu.runpm=0 pcie_port_pm=off

None of them seemed to work, but I still have them applied and have not gone through and pruned them. At this point I gave up, and since I was having USB suspend issues on BIOS 3.05 anyway, I am just going to continue with suspend disabled. Windows has no such issues.
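For the pruning, one simple approach is to drop one flag per boot and see whether anything changes. Here is a small Python sketch that generates those leave-one-out combinations from the flags listed above; the function name is made up, and it only builds the lists, it does not touch the bootloader config.

```python
FLAGS = (
    "amdgpu.bapm=0 amdgpu.aspm=0 pm_async=0 "
    "amdgpu.sg_display=0 amdgpu.runpm=0 pcie_port_pm=off"
).split()


def leave_one_out(flags):
    """Return (dropped_flag, remaining_flags) pairs, one trial per flag."""
    return [(dropped, [f for f in flags if f != dropped]) for dropped in flags]


if __name__ == "__main__":
    for dropped, remaining in leave_one_out(FLAGS):
        # Each line is one boot to try: the flag removed, then what stays applied.
        print(f"without {dropped}: {' '.join(remaining)}")
```

Any flag whose removal changes nothing across a few suspend/boot cycles is a candidate to delete for good.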

I had a similar experience and landed on the same decision not to use suspend under Linux. Most of my usage under Linux is just running LLMs and accessing the system over the network, and then I shut down or reboot to Windows for certain games.

For example I ran into the fan spinning even in idle for my R9700 after some LLM use:

There are other RDNA4 linux fixes coming in 7.0+.

For the most part it has been pretty stable, but I do get occasional issues with RDNA4 dgpu display output not working on boot. It seems to be transient and may not be specific to just being hooked up to the framework desktop.