Anyone running a Framework DIY with external GPU via Thunderbolt?

Hey Everyone,

I’m experimenting with an external GPU (eGPU) setup via Thunderbolt on my Framework DIY Edition (13" AMD), mainly for occasional gaming and AI model testing.

Has anyone here tried a similar setup? Curious about:

- Compatibility with AMD vs Intel mainboards
- Performance bottlenecks (especially with TB3 vs TB4)
- Any BIOS tweaks or kernel flags (for Linux users)?

Also wondering if there’s an official plan to optimize eGPU support down the line or if it’s fully dependent on OS/firmware updates.

Would love to hear your experiences!

— Jhonn Mick

Plenty of folks have done it. I certainly did. There is little performance difference between TB4 and TB3; there is an ocean of difference between TB3/4 and TB5. It’s pretty much plug and play, although I can’t recommend Intel GPUs due to the need for ReBAR and some issues I ran into. I no longer own that GPU or my Framework, but hot-plugging/unplugging didn’t work that well; that was likely an Intel GPU problem. If going NVIDIA, make sure to get a 20-series or newer card. AMD is fine no matter what.

There is no stated plan to optimize for eGPU usage. BIOS update cadence has improved, but Framework is still battling a poor reputation in this area. All ports on Intel-based FW13 models are TB certified. Some ports on AMD-based products are TB capable, but not all. Consult the product pages to determine if that works for you.

3 Likes

On AMD, only TB3 is available, no TB4 or TB5. For AMD you should get a USB4 eGPU enclosure/connector. There is a noticeable and measurable performance increase when using USB4 instead of TB3. A little reminder: USB4 is not TB4 or TB5.

That’s quite a strong word there :smiley: I’d maybe say “sitting out”

To the point of the topic, I am seeing very good results even on AMD with a Thunderbolt 3 enclosure. The 3080 in an eGPU performs about 10-15% better than my 2080 Ti in a desktop, which is very close to the performance difference of those cards when both are in the same system.

So far, I’ve heard that AMD systems are better than Intel in terms of overhead with the Thunderbolt protocol. And of course, the more modern and powerful the card, the more the communication interface will be a bottleneck. But it seems the situation right now is miles better than it was when I was running a 1080 Ti with an Intel chip; there you’d easily lose 20% of performance running an eGPU compared to a desktop build.

Well, on Linux you may want to get something supported by the amdgpu driver (GCN 1/GCN 2 with a kernel flag, GCN 3+ without). But anything semi-recent should be just fine.
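For reference, these are the actual amdgpu/radeon module parameters involved; here’s a minimal sketch that checks whether they are on your kernel command line (only relevant for GCN 1/2 cards):

```python
#!/usr/bin/env python3
# Check /proc/cmdline for the flags that hand GCN 1/2 cards to amdgpu
# instead of the legacy radeon driver. si_* covers GCN 1 (Southern
# Islands), cik_* covers GCN 2 (Sea Islands); GCN 3+ needs none of this.
WANTED = [
    "amdgpu.si_support=1",
    "radeon.si_support=0",
    "amdgpu.cik_support=1",
    "radeon.cik_support=0",
]

with open("/proc/cmdline") as f:
    cmdline = f.read().split()

for flag in WANTED:
    print(f"{flag}: {'present' if flag in cmdline else 'MISSING'}")
```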

An AMD USB4 host plus an ASMedia-based eGPU can get close to the full USB4 bandwidth worth of PCIe tunneling, so it’s a pretty big difference compared to TB3/4-based setups. Of course TB5 is on another level there, but you can’t get that stuff integrated into an SoC yet, and external controllers come with their own issues.
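To put a rough number on “full USB4 bandwidth”, a back-of-the-envelope sketch (lane count and encoding per the USB4 spec; real PCIe tunneling lands noticeably below this ceiling due to protocol overheads):

```python
# Payload ceiling of a USB4 Gen 3 (40G) link: 2 lanes at 20 Gbit/s
# with 128b/132b encoding. PCIe tunneling and PCIe's own packet
# framing then take an additional, hard-to-pin-down slice of this.
link_payload_gbit = 2 * 20 * 128 / 132
print(f"~{link_payload_gbit / 8:.1f} GB/s raw link payload")  # ~4.8 GB/s
```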

2 Likes

Except for which ports support what, both AMD and Intel support the necessary protocols. I think overall USB4 support has been more flaky under Linux, with AMD still a bit worse because they got into it later. And AMD would be expected to still have slightly more compatibility issues with old TB3 stuff, because those devices no longer get firmware updates and back then only needed to be compatible with other Intel products. But this should be irrelevant for anything USB4-based, which tends to work equally well/badly across Intel and AMD, with the difference coming only from the mainboards, their firmware, and how much testing the manufacturer did with the sort of equipment you are running.

Only the recently added “USB4 Boot support” BIOS option, to prevent eGPUs from affecting TPM measurements. But that also disables NVMe boot over USB4, so it is already half broken.

Performance depends 99% on the hardware, which is all integrated into the CPU. Nothing to tweak here regarding performance.

My old GTX 980 Ti in a Razer Core X works exactly the same on the Strix Point and Alder Lake FW13, while blue-screening my work laptop (HP, also Strix Point) immediately. And that HP laptop is supposedly TB4 certified, while the Strix Point FW13 is not. Goes to show how worthless the TB4 certification can be for reliability.
Most important is that the hardware/chips used are certifiable (i.e. fulfill Intel’s minimum requirements and are not completely borked). This seems solved now; basically every USB4 controller has achieved TB4 certification on some board. And the remaining differences seem to be not well tested by the certification.

No.
AMD has USB4 40G ports with the mandatory TB3 backward compatibility, same as Intel.
There are differences in the details across generations and between Intel’s and AMD’s USB4 controllers.

TB4 and TB5 are certifications (tests and paperwork) for USB4. AMD has long surpassed the minimum requirements for TB4 certification. And nowadays there are even some laptops with AMD CPUs that are TB4 certified.
So TB4 is essentially just shorthand for USB4 40G plus a few optional USB4 features. And Microsoft forced shared drivers for all of them, so the OS is in charge of half of the stuff and treats all of it only according to the USB4 standard.

Regarding performance: this is down to the controller generations and the CPU-internal PCIe implementation.
In actuality, TB3 connections are slightly faster than USB4 40G connections and have the EXACT same overheads as the USB4 we have so far available from Framework.
What usually limits the performance is old TB3 controllers that have PCIe throughput limits due to limited PCIe speed support or other internal limits.

Similar with USB4. 11th gen still had a PCIe bandwidth bottleneck, which Intel has removed since 12th gen. Since then all CPU integrated USB4 controllers from Intel and AMD can saturate the entire USB4 40G connection with PCIe. They just need a controller on the other side that can also do that.

The actual overhead is constant and determined by the protocol.
It seems that in PCIe bandwidth tests, AMD achieves slightly higher H2D numbers, even when the D2H numbers are identical to Intel’s (so not a PCIe throughput limit of the USB4 controller, but something deeper in how their respective PCIe implementations interact with USB4/TB3 tunneling).
I don’t precisely know where this comes from yet, whether it is some result of latency or the structure of their PCIe implementations. But it also seems to be a small and hard-to-pin-down difference.
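For anyone who wants to reproduce that kind of H2D/D2H comparison themselves, here’s a minimal sketch using PyTorch (one common way to run such a test; the buffer size and iteration count are arbitrary choices, not from any standard benchmark):

```python
import torch

def bandwidth_gbs(size_mb=256, iters=20, direction="h2d"):
    # Pinned host buffer and a device buffer of the same size.
    host = torch.empty(size_mb * 2**20, dtype=torch.uint8, pin_memory=True)
    dev = torch.empty_like(host, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        if direction == "h2d":
            dev.copy_(host, non_blocking=True)   # host -> device
        else:
            host.copy_(dev, non_blocking=True)   # device -> host
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0   # elapsed_time is in ms
    return size_mb * iters / 1024 / seconds      # GB/s

print(f"H2D: {bandwidth_gbs(direction='h2d'):.2f} GB/s")
print(f"D2H: {bandwidth_gbs(direction='d2h'):.2f} GB/s")
```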

Optimize what? I don’t know the current state of ReBAR. But then again, most people don’t understand what it does and why it’s unlikely to be beneficial for USB4 eGPUs. In general you’d want it off for USB4/TB eGPUs anyway.
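If you want to check what your setup currently exposes, one quick way on Linux is to grep lspci’s verbose output (run as root so the extended capabilities are readable; the exact wording of the lines depends on your pciutils version):

```python
import subprocess

# Show any Resizable BAR lines from the verbose PCI dump. Devices with
# the capability print lines like "Physical Resizable BAR" along with
# the current/supported BAR sizes.
out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "Resizable BAR" in line or "current size" in line:
        print(line.strip())
```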

And a lot of testing. Stuff like certified 1.8 m TB4 cables being guaranteed to work reliably with certified devices is worth something, and also takes some effort on the manufacturer’s side compared to just making USB4 work at all. (Blue-screening or not is not really related to that level; the cert just verifies that signal integrity, the feature set, and the implementation of the protocol are right.)

Unless you want to use an Intel GPU, then you are just kinda screwed.

I am starting to question this very much. Yes, that would be the point why it’s worth something, but this is not consistent:
Dell U3225QE (TB4 certified) with HP EliteBook X G1a (AMD Strix Point, TB4 certified): does not work together at all.
Dell U3225QE (TB4 certified) with FW13 12th gen (TB4 certified): TB3 connection, basic fail.
Dell U3225QE (TB4 certified) with CalDigit Element Hub (TB4 certified): TB3 connection, basic fail.
Dell U3225QE (TB4 certified) with FW13 Strix Point (not certified at all): USB4 connection, works as expected (after the last BIOS update).

Also, Dell managed to get a product certified that violates the PD spec: they falsely advertise 140W PD but only support 90W PD SPR, which breaks the USB specs. Since TB4 advertising implies USB compliance, this should not be allowed.

So how much is that TB4 certification and the supposed testing worth? Maybe it’s some manufacturers that lie and cheat with the testing? But from the outside, it still looks worthless.

I think most of the value was from Intel testing their own chips and them being basically the only game in town (so only needing compatibility with themselves). Active TB4 cables may actually mandate using Intel’s cable retimer chips (and thus the length limitation). On the other hand, we do not even get to know which DP speed they actually include, even though they claim “universal cables” (= includes DP support for active cables, which is not mandatory in USB-C) over USB-certified 40G cables. So maybe there is testing. But we don’t even get to know the level of signal integrity they test for. For USB-IF certification we at least know that.

Are you? Do they perform with CPU-initiated transactions over USB4/TB3? Because of the higher latency with TB3/USB4, this usually hurts performance more than it helps. I know that Intel needs ReBAR to perform in a desktop.

Like I said, most people don’t understand what ReBAR does and how it plays into this, so they say they need it without realizing it is likely to make eGPU performance worse with TB3/USB4 eGPUs. It does for Nvidia and AMD. So is Intel’s implementation so bad that it still needs ReBAR for performance over TB3/USB4? Or would ReBAR only make them even worse for that use?

The testing is less about operating correctly and more about the signal integrity and protocol level.

That certainly is a factor and could explain certain blind spots in certification. It would be kinda neat if the USB-IF could take over the TB branding, with it basically just being an alias for certain feature sets of USB4, but that’s wishful thinking here.

Intel needs ReBAR to perform, period, hence the “screwed” bit. Their architecture was never designed to be used without it and likely just supports fixed-size access through an extremely inefficient compatibility layer. Whether the pain from not having ReBAR is worse than the pain of ReBAR over USB4, I don’t know, but it sounds like Intel eGPUs are not on the menu.

Well, not quite. My A770 was functional in games. Of course, I have no way of knowing if performance would have been enhanced with ReBAR.

They do function without ReBAR, but performance (especially 1% and 0.1% lows) takes a brutal hit, to the point where you would be better off with much lower-tier AMD or Nvidia cards.

I don’t have the ability to test it anymore, but that wasn’t the issue. The games that ran, ran well. Average frame rates were a problem if settings were cranked too high, but I don’t recall stuttering in game from poor 1% lows.

It might be that @Ray519 is correct and ReBAR is unnecessary in an eGPU context. Game compatibility was poor and there were plenty of other bugs to go with the A770 experience at the time.

That’s exactly what I wrote: TB4 and USB4 are not the same. TB4 is an addition to USB4.

You made it sound like USB4 was separate from TB4 and gave wrong info on TB3 performance.

And even I was imprecise. TB4 ensures a higher minimum feature level than what the “USB4 40G” label ensures. But all the features of TB4 are USB4 features; they are just optional.

Sadly, no manufacturer details the exact feature level of any TB4 or USB4 port. And AMD happens to deliver USB4 that matches or surpasses the TB4 minimums in every dimension. Just as Intel now has 5 hardware generations of TB4, where the first only provided the absolute minimums required by TB4 and the newest generation far surpasses them, and even AMD’s current capabilities (in some regards).

I’ve actually been considering looking into doing an eGPU setup, as that was advertised as possible with the FW16, but based on what I’m reading from you all, it doesn’t sound like a good idea? Honestly, I’ve never done it before, so I would have to do some reading on how it all works. I’ve done some browsing but not found reliable sources; can one of you kind folks point me in the right direction to learn more about this?

Edit: typos and punctuation

Thunderbolt eGPU is easy and plug-and-play.
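It mostly is, but it’s worth verifying that the kernel actually sees and authorizes the enclosure. A minimal sketch reading the standard Thunderbolt sysfs attributes (same path on TB3, TB4 and USB4 hosts):

```python
from pathlib import Path

# List Thunderbolt/USB4 devices the kernel has enumerated and whether
# each one is authorized (1 = tunnels established, 0 = blocked).
for dev in sorted(Path("/sys/bus/thunderbolt/devices").iterdir()):
    name = dev / "device_name"
    auth = dev / "authorized"
    if name.is_file():
        state = auth.read_text().strip() if auth.is_file() else "?"
        print(f"{dev.name}: {name.read_text().strip()} (authorized={state})")
```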

1 Like

TB3 performance with eGPU is indeed much lower than USB4 performance, nothing wrong about that.

1 Like

No. TB3 40G performance does not need to be lower than USB4 40G performance. The TB3 connection is ever so slightly faster than the USB4 40G connection actually.

The only thing that can make TB3 slower is bad controllers that have an internal PCIe bandwidth bottleneck or a limited PCIe port to the GPU.

But that is not TB3’s fault.
The ASM2464 controller has PCIe x4 Gen 4. That is why it’s faster than older Intel JHL7440 controllers (with their x4 Gen 3 port).
If you run the ASM2464 in TB3 mode instead of USB4, it actually gets slightly faster.

If you get a different USB4 controller that does not have a x4 Gen 4 port, then nothing will get faster with USB4, because USB4 vs. TB3 has nothing to do with it. It’s only about the controllers and their PCIe throughput.

And all but the 11th gen FW13 can max out the PCIe bandwidth, such that they will not be the bottleneck, only the other side will. And again, it’s not because of USB4 vs. anything, because the 11th gen Framework is still USB4. Its USB4 controller was just limited to x4 Gen 3 PCIe throughput.

Sure, nobody is producing new controllers that will be advertised as “TB3 controllers”, and over half of the TB3 eGPU enclosures used first-gen TB3 chips that had further internal limits on top (those that only reach 2.6-2.7 GB/s instead of 3.0-3.1 GB/s). But it’s not because they are TB3; it’s because they are really old.

Same with TB4. Intel’s first TB4 controllers only had x4 Gen 3 PCIe ports/bandwidth, or in the case of the JHL8440 even only x1 Gen 3 (on its actual PCIe port). But the newest Intel TB4 controller (JHL9440) has a x4 Gen 4 PCIe port just like the ASM2464. And it will likely be sold under the TB4 name and not USB4, even though it is a USB4 controller. Just one for which you can get the TB4 certification on top more cheaply than for the ASM2464.

Half the eGPU solutions that advertise USB4 are lying and are using an old JHL7440 TB3 controller (like the GPD G1, OneXGPU and many others). So looking for just “USB4” is not very helpful and can be misleading.
Only ones with a JHL9440, JHL9480 or ASM2464 would actually be as fast as our Framework notebooks can go.
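To put the controller differences into numbers, a small sketch of the theoretical ceilings (lane rates from the PCIe spec; real tunneled throughput lands below both limits, as the 2.6-3.1 GB/s figures above show):

```python
# Upstream PCIe port of each controller mentioned above vs. the
# payload ceiling of the 40G link it tunnels through. The practical
# bottleneck is min(port, link), minus protocol overhead.
def pcie_gbs(lanes, gts):
    return lanes * gts * 128 / 130 / 8   # 128b/130b encoding, GB/s

LINK_GBS = 2 * 20 * 128 / 132 / 8        # USB4 40G payload, ~4.8 GB/s

controllers = {
    "JHL7440 (TB3)":  pcie_gbs(4, 8),    # x4 Gen 3
    "JHL8440 (TB4)":  pcie_gbs(1, 8),    # x1 Gen 3 on its PCIe port
    "JHL9440 (TB4)":  pcie_gbs(4, 16),   # x4 Gen 4
    "ASM2464 (USB4)": pcie_gbs(4, 16),   # x4 Gen 4
}
for name, port in controllers.items():
    print(f"{name}: port {port:.2f} GB/s -> ceiling {min(port, LINK_GBS):.2f} GB/s")
```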

3 Likes

As always, I appreciate this level of technical knowledge from you Ray.

1 Like