What PCIE network card for 40+ GB/s

Peter_Drayton · January 25, 2026, 10:51am

I tried this on Fedora Server 43 but hit a couple small issues:

1. Trivial syntax issue with the miimon parameter, just needed to lose the =
2. Error adding the thunderbolt links to the bond as configured

For #2, apparently a bond with balance-rr requires the same MAC address on both links, but the thunderbolt-net driver doesn’t allow changing the MAC address. AI suggested changing to an active-backup bond mode, which would halve the bandwidth, or bonding them at the TCP level with MPTCP (not sure if this is a workable option for internode comms).

How did you sidestep these issues? What performance were you able to measure?

Lorphos · January 25, 2026, 1:31pm

I’m using a cheap cable myself (UGOURD USB4 0.3m long, with E-Marker chip according to the description).
The Ryzen 395 has two USB4 v1 (40Gbit/s) ports that are also capable of Thunderbolt 3 (also 40Gbit/s).
To me it seems right now that the reason we’re only getting 9Gbit/s or so via thunderbolt-net on Linux is the kernel layer and the Ethernet-via-Thunderbolt stack, not the cables.
I’m getting slightly over 9GBit/s per port in iperf3 on Linux.

Thomas_Munn · January 25, 2026, 6:24pm

Another note that this is AI generated, but I input your question into the harder thinker and it came out with this:

The Configuration (IPv6 MPTCP)

For MPTCP to work effectively on local links, you must put the two Thunderbolt links on different subnets so the kernel sees them as distinct paths.

1. Configure “Systemd-Networkd” (Corrected)

Delete the old bond0 files. We will treat the links as independent.

On Strix Halo A (Server):

Link 1 (thunderbolt0): IP fd00:1::1/64
Link 2 (thunderbolt1): IP fd00:2::1/64

On Strix Halo B (Client):

Link 1 (thunderbolt0): IP fd00:1::2/64
Link 2 (thunderbolt1): IP fd00:2::2/64

Example .network file for Strix Halo A (/etc/systemd/network/20-thunderbolt0.network):

[Match]
Name=thunderbolt0

[Network]
Address=fd00:1::1/64
IPv6AcceptRA=no

[Link]
MTUBytes=65528

(Repeat for thunderbolt1 with fd00:2::1/64).
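The "repeat for thunderbolt1" step can be scripted so both files stay in sync. This is a minimal sketch, not something tested on a Strix Halo: the function name `tb_network_unit` and the output filename `20-thunderbolt1.network` are assumptions made for illustration, while the unit contents mirror the example above.

```bash
# Print a systemd-networkd .network unit for one Thunderbolt link.
# Usage: tb_network_unit <interface> <ipv6-address/prefix> [mtu]
# (tb_network_unit is a hypothetical helper name, not an existing tool.)
tb_network_unit() {
    iface=$1
    addr=$2
    mtu=${3:-65528}   # default MTU copied from the example above
    cat <<EOF
[Match]
Name=$iface

[Network]
Address=$addr
IPv6AcceptRA=no

[Link]
MTUBytes=$mtu
EOF
}

# Example for Strix Halo A, second link. Review the output first, then
# redirect it as root (the target filename is an assumption):
# tb_network_unit thunderbolt1 fd00:2::1/64 > /etc/systemd/network/20-thunderbolt1.network
```

On Strix Halo B the same call would take fd00:2::2/64 instead.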

2. Enable MPTCP in Kernel

Create /etc/sysctl.d/99-mptcp.conf:

# Enable MPTCP
net.mptcp.enabled=1

# Use the in-kernel path manager (pm_type=0); subflow limits are set separately with `ip mptcp limits`
net.mptcp.pm_type=0
net.mptcp.allow_join_initial_addr_port=1

Apply with sudo sysctl --system.
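After applying the sysctl file it is worth confirming that the kernel really reports MPTCP as enabled. A small sketch, assuming the usual /proc/sys layout behind `sysctl net.mptcp.enabled`; the function name `mptcp_status` and its optional root argument are hypothetical and exist only to make the check easy to try out.

```bash
# Report whether MPTCP is enabled, reading the file behind
# `sysctl net.mptcp.enabled`.
# Usage: mptcp_status [proc-sys-root]   (root defaults to /proc/sys;
# the argument is a hypothetical hook for testing, not a standard option)
mptcp_status() {
    root=${1:-/proc/sys}
    f="$root/net/mptcp/enabled"
    if [ ! -r "$f" ]; then
        echo "unsupported"   # no MPTCP support in this kernel, or file unreadable
        return 2
    fi
    if [ "$(cat "$f")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
        return 1
    fi
}

# Example: mptcp_status
```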

3. Configure the Endpoints (The “Magic” Step)

By default, Linux might only use the interface the traffic started on. You must explicitly tell the kernel to advertise the second interface as an available path.

On BOTH machines (run via script on boot):

# Tell the MPTCP path manager about the second link
# (Adjust interface names/IPs based on which machine you are on)
sudo ip mptcp endpoint add fd00:2::1 dev thunderbolt1 signal

Note: The signal flag tells the other side “Hey, I also have this IP address available for data.”
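Since the address differs per machine, the "adjust interface names/IPs" step can be made explicit. The sketch below only prints the command for review rather than running it; the helper name `mptcp_endpoint_cmd` and the A/B labels are assumptions that follow the address plan above.

```bash
# Print (do not run) the endpoint command for one host.
# Usage: mptcp_endpoint_cmd A|B
# A and B refer to "Strix Halo A" and "Strix Halo B" in the address plan.
mptcp_endpoint_cmd() {
    case $1 in
        A) addr=fd00:2::1 ;;
        B) addr=fd00:2::2 ;;
        *) echo "usage: mptcp_endpoint_cmd A|B" >&2; return 1 ;;
    esac
    echo "ip mptcp endpoint add $addr dev thunderbolt1 signal"
}

# Example: mptcp_endpoint_cmd B
# After running the printed command as root, `ip mptcp endpoint show`
# lists the configured endpoints.
```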

How to Test (Performance Measurement)

Standard tools like scp do NOT support MPTCP yet. You must use tools that are MPTCP-aware or force them.

1. Install mptcpd:

sudo dnf install mptcpd

2. Run iperf3 with MPTCP: MPTCP requires the application to request IPPROTO_MPTCP instead of IPPROTO_TCP. mptcpize forces this for legacy apps.

Server (Halo A):

```
mptcpize run iperf3 -s
```
Client (Halo B):

```
mptcpize run iperf3 -c fd00:1::1
```

Expected Result: You should see the bandwidth sum up (~40Gbps+). If you check ip mptcp monitor during the test, you will see it join the subflows.

Peter_Drayton · January 25, 2026, 9:32pm

@Thomas_Munn just to be clear - you aren’t saying that you actually did any of this and got 40Gbps, you are saying that you asked Gemini/whatever and it suggested this? Or did you try the original suggestion, hit the snag, swapped to MPTCP, then got it working?

I’m asking because I actually tried it, hit the snags I reported earlier, and had Gemini spin its wheels for a bunch of time suggesting all sorts of things, with ever-increasing “OMG I can’t believe this didn’t work, now this will truly fix it” hallucinations, until it eventually landed on “ok, I give up on balance-rr because of thunderbolt-net, go use MPTCP”. This was the point at which I asked how you got around the issue.

I’d love to see the results if it worked. If not, it seems we are using the same AIs, since they suggest pretty much identical things, even down to tuning the MTU size (65528 vs 9000).

BTW, I’m sticking w. ipv4 for now because while ipv6 is a teeny bit faster in theory, all the tools seem better at dealing w. ipv4.

Bottom line? with 2 links (cable1 and cable2), running bi-di tests, I can see ~10Gbps TX and RX on both cable1 and cable2, on both FWD1 and FWD2. But 20Gbps aggregate bandwidth on a USB4 port is a far cry from the claimed 40Gbps.

Other things I have verified / areas I still need to test some more:

Both physical links are reporting as 2x20Gbps i.e. 40Gbps so I think the cables are fine. The cables are <1ft each.
All interfaces on both machines are in the trusted zone so no firewall overhead
Kernel params are iommu=pt and usb4_dma_protection=off. Fun fact was that I learned amd_iommu=on is not even a valid parameter, so articles saying turn that on are hallucinating
I tried all of the sysctl net.core.* settings you cited (my AI session proposed similar ones) but none of them had any appreciable effect on the measured 10Gbps soft cap
My AI session was worrying about power management on the USB4 controllers causing them to maybe flap, causing a downgrade to USB3.2 speeds (10Gbps)
Lots of the suggestions from the AIs were “go change XYZ in the BIOS” which on the FWD is a complete non-starter

I’m still going to test all the random avenues that are being proposed including MPTCP but I’m not super hopeful. I really want someone to post actual iperf3 results that show something meaningfully over 10Gbps in one direction on the FWD via a USB4 ports. That way at least I know I’m not spinning my wheels.

James3 · January 25, 2026, 9:45pm

Hi,
When measuring network performance, one cannot fill the pipe with a single tcp stream. So, you need to run a test that is not only multithreaded, but also creates multiple tcp links.
So, maybe an iperf command will do that, or run multiple iperf instances on different ports, or use some other test tool that can create multiple tcp links.
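One way to picture the "multiple iperf on different ports" idea is a helper that prints one client command per instance. This is only a sketch: `iperf_fanout` is a hypothetical name, the address 192.0.2.10 is a placeholder, and the defaults of 8 streams and 30 seconds are arbitrary assumptions. It prints the commands rather than running them.

```bash
# Print one iperf3 client command per instance, each on its own port.
# Usage: iperf_fanout <server-ip> <base-port> <instances> [streams] [seconds]
iperf_fanout() {
    server=$1
    base=$2
    n=$3
    streams=${4:-8}    # parallel TCP streams per instance (-P)
    secs=${5:-30}      # test duration in seconds (-t)
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "iperf3 -c $server -p $((base + i)) -P $streams -t $secs"
        i=$((i + 1))
    done
}

# Example with a placeholder address:
# iperf_fanout 192.0.2.10 5201 4
```

Each printed client needs its own listener on the matching port (`iperf3 -s -p <port>`), because a single iperf3 server handles one test at a time.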

Peter_Drayton · January 25, 2026, 10:52pm

I welcome suggestions on how to test this better, either with iperf or some other set of tools.

Here’s what I’m getting so far, pretty consistent results doing bidi tests on both links, measured both ends:

Throughput on client (fwd1):

Interface    | RX Gbps    | TX Gbps
-------------|------------|------------
thunderbolt0 | 11.99      | 11.93
thunderbolt1 | 11.93      | 11.97

Throughput on server (fwd2):

Interface    | RX Gbps    | TX Gbps
-------------|------------|------------
thunderbolt0 | 12.05      | 11.98
thunderbolt1 | 12.01      | 12.08

Watcher script is:

watch -n 1 '
echo "Interface    | RX Gbps    | TX Gbps";
echo "-------------|------------|------------";
for dev in thunderbolt0 thunderbolt1; do
  R1=$(cat /sys/class/net/$dev/statistics/rx_bytes)
  T1=$(cat /sys/class/net/$dev/statistics/tx_bytes)
  sleep 0.1
  R2=$(cat /sys/class/net/$dev/statistics/rx_bytes)
  T2=$(cat /sys/class/net/$dev/statistics/tx_bytes)
  
  # Calculate (Delta * 8 bits / 0.1 seconds) / 10^9 to get Gbps
  RX_GBPS=$(echo "scale=2; ($R2 - $R1) * 8 / 100000000" | bc)
  TX_GBPS=$(echo "scale=2; ($T2 - $T1) * 8 / 100000000" | bc)
  
  printf "%-12s | %-10s | %-10s\n" "$dev" "$RX_GBPS" "$TX_GBPS"
done'
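The conversion used in the watcher script can be pulled out into a small helper for checking the arithmetic. A minimal sketch, assuming awk is available (it stands in for bc here); the function name `gbps` is hypothetical and not part of the script above.

```bash
# Convert two byte-counter samples into Gbps.
# Usage: gbps <bytes_before> <bytes_after> <interval_seconds>
gbps() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 8 / t / 1000000000 }'
}

# 125,000,000 bytes in 0.1 s is 1,000,000,000 bits in 0.1 s:
gbps 0 125000000 0.1    # prints 10.00
```

One caveat when reading the numbers: the real gap between the two samples is the 0.1 s sleep plus the time taken by the `cat` calls, so dividing by exactly 0.1 tends to over-read slightly.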

Client (fwd1) tests are run with:

iperf3 -c 10.0.0.2 -p 5202 -P 8 -t 150 --bidir
iperf3 -c 10.0.1.2 -p 5212 -P 8 -t 150 --bidir

Server (fwd2) listener is spun up with:

iperf3 -s -B 10.0.0.2 -p 5202 -D
iperf3 -s -B 10.0.1.2 -p 5212 -D

Client and server setup is similar except for IP addresses ofc. IOMMU and DMA protection set appropriately. MTU set to 9000 (also tested higher). etc.

Thomas_Munn · January 26, 2026, 2:52am

I did note that I hadn’t tried this yet! I don’t have 2 thunderbolt devices to test with yet! the desktop is upstairs and the framework is downstairs……Hence the “AI” disclaimer…….

Thomas_Munn · January 26, 2026, 2:54am

10GBPS is a LOT better than 5gbps, however! No cutter saw needed for the framework desktop, too.

Peter_Drayton · January 26, 2026, 4:25am

Understood! I saw the note that the writeup was AI generated, but I assumed the steps had been tested. I’m pretty new to all this so when I discovered that balance-rr was a non-starter I wanted to determine if the issue was my implementation or AI hallucination. My own experience with e.g. Gemini is that it’s very helpful for learning but it does go enthusiastically off the rails so for my part I’m striving to validate & document my steps.

Peter_Drayton · January 26, 2026, 4:37am

Ah yes, it does seem like an improvement over the FWD’s 5GbE RJ45, although in the Strix Halo Discord someone pointed out that the thunderbolt-net driver is locked to one core and built around a shared Tx/Rx queue across both USB4 ports. If true, very not ideal.

It’s unclear whether faster networking would actually benefit real scenarios, and if so how much faster networking / how much benefit. I’m trying to be very deliberate and to automate my settings and tests so that I can replicate them on different network setups. IMHO our community would derive benefit from more concrete data & repeatable tests.

Peter_Drayton · January 27, 2026, 5:36am

Hi @James3, I took your suggestion and modified things to run a variable # of the tests in parallel. Same results, but that does support there being a hard limit somewhere in the software or hardware stack.

My tests are doing bidi (tx+rx) on both USB4 ports (thunderbolt0 and thunderbolt1). Running 1 instance, we see a little over 10Gbps on each port and in each direction, around 42-45Gbps total across both USB4s. Scaling up the # of instances 1→2→4→8→etc the individual throughput of each test drops proportional to the # of instances, and the total aggregate bandwidth stays at around the same level of 42-45Gbps.

pdrayton@fwd1:~$ ./run_multiple.sh 1
Launching 1 test instances…
Aggregated total for 1 instances (2 total links): 45.73 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 2
Launching 2 test instances…
Aggregated total for 2 instances (4 total links): 45.72 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 4
Launching 4 test instances…
Aggregated total for 4 instances (8 total links): 45.72 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 8
Launching 8 test instances…
Aggregated total for 8 instances (16 total links): 45.74 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 16
Launching 16 test instances…
Aggregated total for 16 instances (32 total links): 45.82 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 32
Launching 32 test instances…
Aggregated total for 32 instances (64 total links): 45.74 Gbps (Tx+Rx)
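To relate the aggregate Tx+Rx figure back to a per-port, per-direction number, divide by the number of physical links times two directions. A quick sketch of that arithmetic; `per_link_dir` is a hypothetical helper, and "links" here means the two physical USB4 cables, not the "total links" count printed by run_multiple.sh.

```bash
# Average throughput per physical link and per direction,
# given an aggregate Tx+Rx figure.
# Usage: per_link_dir <aggregate_gbps> <physical_links>
per_link_dir() {
    awk -v total="$1" -v links="$2" \
        'BEGIN { printf "%.2f\n", total / (links * 2) }'
}

# 45.73 Gbps across 2 cables, both directions:
per_link_dir 45.73 2    # prints 11.43
```

That works out to roughly 11.4 Gbps per port per direction, regardless of the instance count.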

Reports are that other platforms’ USB4 does not have this issue, so it seems to be a Strix Halo failing. I’ve not verified this myself yet, but I will eventually get around to testing it with two Nvidia GB10 units.

Djip · January 27, 2026, 7:53pm

For it there is only USB 3.2 (2x2) (@20Gb/s), no USB4/Thunderbolt => no chance of having thunderbolt_net.

Has anyone benchmarked the USB4 ports at their maximum with high-speed storage, to know what the hardware can do?

Marinas_Florin · January 28, 2026, 7:19am

what cables are you using?

Peter_Drayton · January 29, 2026, 1:34am

I have a few different TB4/TB5 cables, all short (shorter than the length at which active starts being a thing). In particular these tests were done most recently with this: Amazon.com: Silkland 80Gbps USB 4 for Short Thunderbolt 5 Cable 1.6FT, 120Gbps Bandwidth for 16K/Dual 8K, 240W Charing, Braided USB C Data Cable Compatible Thunderbolt 4, SSD,MacBook M4 Pro/Max,iPhone 17/16, Dock : Electronics

They are 1.5 foot, TB5, rated to 80Gb/s (120Gb/s if asymmetric, but AFAIK that’s more of a Mac thing?). I have seen the recommendation to get Active cables but AFAICT that applies on longer runs of cables, I wasn’t even able to find active TB4/5 cables at 1 → 1.5ft lengths.

Digging through /sys/bus/thunderbolt/devices/*-* from both machines, they claim to be negotiated at the full 40Gbps (2x20) on both ends. From FWD1 we see these links to FWD2, and then the similar thing in reverse:

Device: 0-2 (fwd2)
Negotiated: RX 20.0 Gb/s x 2 lanes | TX 20.0 Gb/s x 2 lanes
Total Bandwidth: 40.0 Gbps
Device: 1-2 (fwd2)
Negotiated: RX 20.0 Gb/s x 2 lanes | TX 20.0 Gb/s x 2 lanes
Total Bandwidth: 40.0 Gbps
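A report like the one above can be produced by reading the per-device link attributes in sysfs. This is a sketch rather than the script actually used in the post: it assumes the kernel exposes `rx_speed`, `rx_lanes`, `tx_speed`, `tx_lanes` and `device_name` for the peer, and the function name `tb_link_report` plus its optional root argument are hypothetical.

```bash
# Print negotiated speed and lane count for each Thunderbolt/USB4 peer.
# Usage: tb_link_report [sysfs-devices-dir]
# (The argument is a hypothetical testing hook; it defaults to the real path.)
tb_link_report() {
    root=${1:-/sys/bus/thunderbolt/devices}
    for d in "$root"/*-*; do
        # Skip entries that do not expose link attributes.
        [ -r "$d/rx_speed" ] || continue
        name=$(cat "$d/device_name" 2>/dev/null || echo unknown)
        echo "Device: $(basename "$d") ($name)"
        echo "Negotiated: RX $(cat "$d/rx_speed") x $(cat "$d/rx_lanes") lanes | TX $(cat "$d/tx_speed") x $(cat "$d/tx_lanes") lanes"
    done
}

# Example: tb_link_report
```

Note that these values describe what the link negotiated, not what thunderbolt-net achieves on top of it.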

I have more advanced scripts that saturate both links at once and aggregate results across many clients, but the simplest version that also shows the issue is this:

Server started on FWD2: iperf3 -s -B 10.0.0.2 -p 5202 -D
Client started on FWD1: iperf3 -c 10.0.0.2 -p 5202 -P 8 -t 30 --bidir

I’m consistently seeing it cap out at ~11Gbps in each direction:

Interface    | RX Gbps    | TX Gbps
-------------|------------|------------
thunderbolt0 | 11.02      | 10.98
Thomas_Munn · January 29, 2026, 7:16pm

Still better than 5Gbps, even if it’s 10 (and better than a hacksaw!)

Djip · January 29, 2026, 8:54pm

Yes, but the main subject here is whether we can get a network faster than 40Gbps…
For now it looks like the only possibility is to get a dual 25Gbps network card working and aggregate the 2 links.

With Thunderbolt we might get a 20Gbps network if we can figure out how to configure an aggregated link…

Korvin · February 8, 2026, 3:33pm

What’s the end purpose of this network? To fully utilize such bandwidth for inference you would need hardware support, since software alone would not be able to saturate it.

You can look into Infiniband network cards that provide support for RDMA.

Peter_Drayton · February 8, 2026, 8:17pm

OK, I had promised to do this a while back, got sidetracked with the MCX5 cards. Finally got around to measuring USB4 performance on the FWD to a high-speed storage device, and I can confirm the USB4 ports are entirely capable of pushing over 30Gbps sustained.

My tests were done w. a USB4/TB4 dock containing two different PCIe 5.0 M.2 drives. Same two USB4 cables that I’ve been using for my thunderbolt-net tests, same two FWD machines. The Fedora installs on both machines are done using an automated Kickstart file that images them from bare metal, so we can safely say that between the USB networking tests and the USB4 storage IO tests, everything is identical except for software stack (network vs storage) and the actual device on the other end of the cable.

Raw Storage (30+ Gbps)

fio → libaio → Block layer → NVMe driver → PCIe tunnel (hardware DMA) → USB4 PHY → SSD

Networking (11 Gbps)

iperf3 → sockets → TCP/IP/Ethernet → thunderbolt-net driver (software) → USB4 PHY → Remote system

Pretty sure this isn’t a Framework issue, this is a TB software stack issue. It is leaving >60% of potential on the cutting-room floor.
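The ">60%" figure follows directly from the two measurements above. A small sketch of the arithmetic, taking 11 Gbps for networking and 30 Gbps as a lower bound for storage; `unused_pct` is a hypothetical helper name.

```bash
# Percentage of a reference throughput that a measured throughput leaves unused.
# Usage: unused_pct <measured_gbps> <reference_gbps>
unused_pct() {
    awk -v got="$1" -v ref="$2" \
        'BEGIN { printf "%.1f\n", (1 - got / ref) * 100 }'
}

# Networking (11 Gbps) against raw storage (30 Gbps):
unused_pct 11 30    # prints 63.3
```

Because storage was measured at 30+ Gbps, the real gap is at least this large.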

Despite this lacklustre result from thunderbolt-net, TB latency is still better than Ethernet, and TB throughput is better than anything up-to-and-including 10Gbps Ethernet. So USB4/TB is worth using in 2-node Strix Halo clusters for anyone not able to use >=25Gbps networking.

Djip · February 8, 2026, 9:07pm

Great!
Do you happen to have 2 boxes? If so, could you plug each into a USB4 port and test read/write on a RAID0 or equivalent, to measure the aggregate throughput?

Maybe we have to create something new with these USB4 links… It looks like Apple did an RDMA network on Thunderbolt 5 … (https://appleinsider.com/articles/25/12/20/ai-calculations-on-mac-cluster-gets-a-big-boost-from-new-rdma-support-on-thunderbolt-5)

Peter_Drayton · February 8, 2026, 9:50pm

I don’t have 2x TB5 docks, I only grabbed the one from Amazon as a test. I do have identical multiples of everything else though, but I realistically have no use for a 2nd dock so I am loath to buy a second.
