Unable to exceed 2 Gbit/s over USB4/Thunderbolt

I’m using a Sonnet Solo10G SFP+ Thunderbolt 3 to 10 Gigabit Ethernet adapter (06:00.0 Ethernet controller: Aquantia Corp. AQtion AQC100S NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] (rev 02))

When plugged into an old Intel laptop running the same Linux kernel, I have no trouble getting the full 10 Gbit/s, but when plugged into the Framework Desktop, I can’t exceed 2 Gbit/s. I noticed this statement in the kernel logs:

[  357.664468] pci 0000:06:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:01.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
# lspci -s 00:01.1 -vvv
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo PCIe USB4 Bridge (rev 02) (prog-if 00 [Normal decode])
  <snip>
	Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0
        <snip>
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-

Why does the kernel think the PCIe link capability of the USB4 controller maxes out at 2.5 GT/s x1? This seems incredibly anemic for a USB4 bus that supports up to 40 Gbit/s. What’s going on here?
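Incidentally, the ~2 Gbit/s ceiling matches the advertised link exactly: PCIe gen1 uses 8b/10b line coding, so a 2.5 GT/s x1 link carries 2.0 Gb/s of payload bandwidth. A quick sanity check (plain awk arithmetic, nothing platform-specific):

```shell
# Usable PCIe bandwidth = raw rate (GT/s) * lanes * 8/10 (8b/10b coding overhead)
awk 'BEGIN { printf "%.3f Gb/s\n", 2.5 * 1 * 8 / 10 }'   # prints "2.000 Gb/s"
```

That is exactly the “2.000 Gb/s available PCIe bandwidth” figure from the kernel log above.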

Make sure you are plugged into the correct ports; not all of them are USB4. The front ports are not USB4, but the rear ports are.

I tried that command and the LnkSta shows x16 for me.

$ sudo lspci -s 00:01.1 -vvv |grep Lnk
                LnkCap: Port #247, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
                LnkSta: Speed 2.5GT/s, Width x16 (overdriven)
                LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-

On my machine, 00:01.1 shows x16 capabilities when I unplug the Thunderbolt cable, but when I plug it in, it shows x1 for both capabilities and status. (boltctl shows the Thunderbolt link running at 2x 20 Gbit/s, and the exact same cable and NIC work fine on an Intel box.)

Regardless, I don’t really care what lspci shows as long as the NIC can run at wire speed, but it can’t. Does Thunderbolt/USB4 on AMD platforms just suck? Has anyone been able to show Thunderbolt PCIe links with reasonable end-to-end bandwidth?

This is using the rear ports, and in any case, if it were connected over USB 3 it wouldn’t show up in lspci at all…

@John_Myers, which kernel are you running? I’m running 6.12.48+deb13-amd64, which is the stock Debian 13 kernel.

For what it’s worth, this diff shows the changes before and after plugging the thunderbolt cable in:

 	IOMMU group: 4
 	Bus: primary=00, secondary=60, subordinate=be, sec-latency=0
-	I/O behind bridge: 3000-6fff [size=16K] [16-bit]
+	I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
 	Memory behind bridge: 80000000-97ffffff [size=384M] [32-bit]
 	Prefetchable memory behind bridge: 2800000000-47ffffffff [size=128G] [32-bit]
-	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
+	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
 	Capabilities: [50] Power Management version 3
@@ -22,21 +22,21 @@
 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
 			MaxPayload 128 bytes, MaxReadReq 512 bytes
 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
-		LnkCap:	Port #247, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
+		LnkCap:	Port #1, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
 			ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
-		LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
+		LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
-		LnkSta:	Speed 2.5GT/s, Width x16 (overdriven)
-			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
+		LnkSta:	Speed 2.5GT/s, Width x1
+			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
 			Slot #0, PowerLimit 0W; Interlock- NoCompl+
 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg+
 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
-		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
+		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
 			Changed: MRL- PresDet- LinkState-
 		RootCap: CRSVisible+
 		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
-		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
+		RootSta: PME ReqID 000a, PMEStatus+ PMEPending+
 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
 			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
@@ -44,7 +44,7 @@
 			 AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
 			 AtomicOpsCtl: ReqEn- EgressBlck-
-			 IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
+			 IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
 			 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
 		LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-

# boltctl
 ● Sonnet Technologies, Inc. Echo 11 Thunderbolt 4 Dock
   ├─ type:          peripheral
   ├─ name:          Echo 11 Thunderbolt 4 Dock
   ├─ vendor:        Sonnet Technologies, Inc.
   ├─ uuid:          a1c38780-00c2-30a8-ffff-ffffffffffff
   ├─ generation:    USB4
   ├─ status:        authorized
   │  ├─ domain:     f43e3804-61fb-d78d-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Mon 20 Oct 2025 04:37:51 AM UTC
   ├─ connected:     Mon 20 Oct 2025 04:37:11 AM UTC
   └─ stored:        no

 ● Sonnet Technologies, Inc Solo 10G SFP+ Thunderbolt 3 Edition
   ├─ type:          peripheral
   ├─ name:          Solo 10G SFP+ Thunderbolt 3 Edition
   ├─ vendor:        Sonnet Technologies, Inc
   ├─ uuid:          cb010000-0090-8518-233d-c8331cc01127
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     f43e3804-61fb-d78d-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Mon 20 Oct 2025 04:37:59 AM UTC
   ├─ connected:     Mon 20 Oct 2025 04:37:13 AM UTC
   └─ stored:        no

I’m using Fedora 42 with kernel 6.16.12-200.fc42.x86_64.

This is strange. I wonder whether external USB4 SSD enclosures (like ASM2464-based ones) would hit the same limit.

As far as I can tell from the schematic, the PCIe link between the SoC’s root complex and the “PCIe USB4 Bridge” is not real; this data never leaves the SoC and is probably not using real PCIe SERDES. However, the fact that the kernel thinks this “link” is slow might cause things to be throttled unnecessarily elsewhere… On Intel platforms (which give me the full 10 Gbit/s), this link is advertised as an x4 link…

Some guidance from Framework folks with access to the SoC datasheets would be helpful here…


I have a 10G SFP+ Thunderbolt adapter.
I get about 9 Gbit/s using a network test tool (on my FW16, AMD 7840HS).
The speeds lspci reports for Thunderbolt devices appear to be wrong:
I see the same x1 2.5 GT/s link, but the actual network throughput is faster.

@James3, which kernel are you running? I’ve tried 6.12.48+deb13-amd64 and the latest torvalds kernel (6.17.0), and I can’t get network speeds above 2 Gbit/s on the receive path. I’ve also installed the latest System Firmware from Framework (0.0.3.3), but to no avail.

I’ve just realized that I’m getting different speeds in the transmit and receive directions:

receiving (NIC to host): 2.06 Gbits/sec
transmitting (host to NIC): 8.66 Gbits/sec

It almost feels like the NIC is throttling the rate of PCIe TLPs towards the host to avoid stressing the “fake” PCIe bottleneck upstream…
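For anyone wanting to reproduce the two directions separately, iperf3’s reverse mode measures both paths from the same client (192.0.2.10 is a placeholder for the server’s address; substitute your own):

```shell
# On the Framework Desktop with the Solo 10G adapter (server side):
iperf3 -s

# From another 10 Gbit/s host (client side):
iperf3 -c 192.0.2.10        # client sends: exercises the Framework NIC's receive path
iperf3 -c 192.0.2.10 -R     # reverse mode: server sends, exercising the transmit path
```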

After profiling a bit more, it looks like the kernel is spending most of its time in amd_iommu_iotlb_sync. Perhaps this Aquantia atlantic driver does not play well with the AMD IOMMU.

- 99.88% net_rx_action                                              
   - 99.18% __napi_poll                                             
      - 98.59% aq_vec_poll                                          
         - 82.29% aq_ring_rx_fill                                   
            - 78.14% dma_unmap_page_attrs                           
               - iommu_dma_unmap_page                               
                  - 77.52% __iommu_dma_unmap                        
                     - 74.07% amd_iommu_iotlb_sync                  
                        - 73.31% domain_flush_complete              
                           - 73.15% iommu_completion_wait.isra.0    
                              + 65.02% delay_halt                   
                        + 0.56% amd_iommu_domain_flush_pages        
                     + 1.26% __iommu_unmap                          
                       0.89% iommu_dma_free_iova.isra.0             
            + 1.70% dma_map_page_attrs                              
            + 0.82% __alloc_pages_noprof                            
         + 8.02% aq_ring_tx_clean                                   
         + 7.67% aq_ring_rx_clean                                   
   + 0.65% napi_consume_skb                                         


Try disabling the IOMMU?

It might be worth optimizing the driver so it doesn’t need to change IOMMU mappings on every net_rx_action() call:
allocate the mappings at module load and then reuse them, rather than mapping/unmapping per packet.
Updating the IOMMU IOTLB is a relatively expensive operation.
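Short of driver changes, one thing worth trying is IOMMU passthrough mode, which keeps DMA on an identity map and should avoid the per-packet IOTLB syncs. A sketch, assuming GRUB (adjust for your distro and bootloader):

```shell
# Add iommu=pt to the kernel command line in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
# Then regenerate the GRUB config and reboot:
sudo update-grub                                  # Debian/Ubuntu
# sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # Fedora
sudo reboot

# After reboot, confirm the default domain type changed:
sudo dmesg | grep -i 'iommu'
```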

Alternatively, RTL8159-based USB NICs are now somewhat available, and they can do 8 Gbit/s in iperf over a 10 Gbit/s USB 3 connection.