What PCIe network card for 40+ GB/s

Hi @James3, I took your suggestion and modified things to run a variable number of tests in parallel. The results are the same, which does support there being a hard limit somewhere in the software or hardware stack.

My tests run bidirectionally (tx+rx) on both USB4 ports (thunderbolt0 and thunderbolt1). With 1 instance, we see a little over 10 Gbps on each port in each direction, around 42-45 Gbps total across both USB4 ports. Scaling up the number of instances (1→2→4→8→etc.), the throughput of each individual test drops in proportion to the number of instances, while the total aggregate bandwidth stays at around the same 42-45 Gbps.
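The arithmetic behind these numbers can be sketched as follows. It assumes each test instance drives 4 unidirectional streams (2 ports × tx/rx) and that a single shared cap is split evenly between all streams; both the cap value (roughly the observed aggregate) and the even split are assumptions for illustration, not something measured directly.

```python
# Sketch of the "single shared cap" hypothesis (an assumption, not a measured fact):
# every stream competes for one fixed pool of bandwidth, split evenly.

PORTS = 2          # thunderbolt0 and thunderbolt1
DIRECTIONS = 2     # tx + rx (bidirectional test)
STREAMS_PER_INSTANCE = PORTS * DIRECTIONS


def per_stream_gbps(cap_gbps: float, instances: int) -> float:
    """Bandwidth each unidirectional stream gets if the cap is shared evenly."""
    return cap_gbps / (instances * STREAMS_PER_INSTANCE)


def aggregate_gbps(cap_gbps: float, instances: int) -> float:
    """Sum over all streams; under this model it equals the cap for any N."""
    return per_stream_gbps(cap_gbps, instances) * instances * STREAMS_PER_INSTANCE


if __name__ == "__main__":
    cap = 45.7  # assumed cap, roughly the observed aggregate, in Gbps
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} instances: {per_stream_gbps(cap, n):6.3f} Gbps/stream, "
              f"{aggregate_gbps(cap, n):.1f} Gbps total")
```

With 1 instance this model gives about 11.4 Gbps per stream, which matches "a little over 10 Gbps on each port in each direction", and the total stays pinned at the cap however many instances are added.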

```
pdrayton@fwd1:~$ ./run_multiple.sh 1
Launching 1 test instances…
Aggregated total for 1 instances (2 total links): 45.73 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 2
Launching 2 test instances…
Aggregated total for 2 instances (4 total links): 45.72 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 4
Launching 4 test instances…
Aggregated total for 4 instances (8 total links): 45.72 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 8
Launching 8 test instances…
Aggregated total for 8 instances (16 total links): 45.74 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 16
Launching 16 test instances…
Aggregated total for 16 instances (32 total links): 45.82 Gbps (Tx+Rx)
pdrayton@fwd1:~$ ./run_multiple.sh 32
Launching 32 test instances…
Aggregated total for 32 instances (64 total links): 45.74 Gbps (Tx+Rx)
```
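To check the "flat aggregate" observation numerically rather than by eye, a small Python sketch can parse the summary lines from the runs and compute the mean, the spread, and each instance's average share. It relies only on the summary-line format shown in the output; how run_multiple.sh itself collects and sums the per-link numbers is not shown in this post and is not modeled here.

```python
import re
import statistics

# Summary lines copied verbatim from the run_multiple.sh output.
LOG = """\
Aggregated total for 1 instances (2 total links): 45.73 Gbps (Tx+Rx)
Aggregated total for 2 instances (4 total links): 45.72 Gbps (Tx+Rx)
Aggregated total for 4 instances (8 total links): 45.72 Gbps (Tx+Rx)
Aggregated total for 8 instances (16 total links): 45.74 Gbps (Tx+Rx)
Aggregated total for 16 instances (32 total links): 45.82 Gbps (Tx+Rx)
Aggregated total for 32 instances (64 total links): 45.74 Gbps (Tx+Rx)
"""

# Captures the instance count, link count and aggregate Gbps on each summary line.
SUMMARY_RE = re.compile(
    r"Aggregated total for (\d+) instances \((\d+) total links\): ([\d.]+) Gbps"
)


def parse(log: str) -> list[tuple[int, int, float]]:
    """Return (instances, links, aggregate_gbps) for every summary line."""
    return [(int(n), int(links), float(gbps)) for n, links, gbps in SUMMARY_RE.findall(log)]


def summarize(rows: list[tuple[int, int, float]]) -> dict:
    totals = [gbps for _, _, gbps in rows]
    return {
        "mean_gbps": statistics.mean(totals),
        # Max minus min of the aggregates; a small spread is consistent with a fixed ceiling.
        "spread_gbps": max(totals) - min(totals),
        # Average share of the aggregate that each instance gets.
        "per_instance_gbps": {n: gbps / n for n, _, gbps in rows},
    }


if __name__ == "__main__":
    stats = summarize(parse(LOG))
    print(f"mean {stats['mean_gbps']:.2f} Gbps, spread {stats['spread_gbps']:.2f} Gbps")
    for n, share in stats["per_instance_gbps"].items():
        print(f"{n:2d} instances: {share:.2f} Gbps per instance")
```

Across all six runs the aggregate varies by only about 0.1 Gbps, while the average per-instance share falls from roughly 45.7 Gbps at 1 instance to about 1.4 Gbps at 32.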

Reports are that USB4 on other platforms does not have this issue, so it seems to be a Strix Halo failing. I haven't verified this myself yet, but I will eventually get around to testing it with two Nvidia GB10 units.