Curious Thunderbolt 3 eGPU link speed case (Linux)

I’m facing an issue that has already been mentioned on this forum, but the discussions are scattered across multiple threads, none of which addresses link speed directly. I also found some mentions of such cases on the Internet, but without clear conclusions.

I’m using Arch Linux with the 6.13.1 kernel. Given that (as I mentioned) this seems to be a more widespread problem, I decided to post on the main forum.

Let’s start from the very root of the problem - I’ve bought an AKiTiO Node Titan eGPU enclosure. Upon connection it refuses to negotiate a 40Gb/s link speed, at least according to boltctl:

 ● AKiTiO Node Titan
   ├─ type:          peripheral
   ├─ name:          Node Titan
   ├─ vendor:        AKiTiO
   ├─ uuid:          -
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     -
   │  ├─ rx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  ├─ tx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Sun 02 Feb 2025 12:04:50 PM UTC
   ├─ connected:     Sun 02 Feb 2025 12:04:50 PM UTC
   └─ stored:        Sat 01 Feb 2025 12:22:32 PM UTC
      ├─ policy:     iommu
      └─ key:        no
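To check this from a script rather than by eye, the rx/tx speed lines can be pulled out of the boltctl output. This is only a minimal sketch: the regular expression is written against the line format shown above, the function and constant names are made up for illustration, and treating anything below 40 Gb/s as "degraded" is my own assumption.

```python
import re

# Matches lines such as "rx speed:   20 Gb/s = 2 lanes * 10 Gb/s"
# (format taken from the boltctl output above).
SPEED_RE = re.compile(
    r"(?P<dir>rx|tx) speed:\s*(?P<total>\d+) Gb/s = "
    r"(?P<lanes>\d+) lanes \* (?P<per_lane>\d+) Gb/s"
)


def parse_link_speeds(boltctl_text):
    """Return {'rx': (total, lanes, per_lane), 'tx': (...)}, all in Gb/s."""
    speeds = {}
    for m in SPEED_RE.finditer(boltctl_text):
        speeds[m["dir"]] = (int(m["total"]), int(m["lanes"]), int(m["per_lane"]))
    return speeds


# Two lines copied from the output above, used as sample input.
SAMPLE = """\
   │  ├─ rx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  ├─ tx speed:   20 Gb/s = 2 lanes * 10 Gb/s
"""

if __name__ == "__main__":
    for direction, (total, lanes, per_lane) in parse_link_speeds(SAMPLE).items():
        # Assumption: 40 Gb/s is what a healthy TB3 link should report.
        verdict = "full speed" if total >= 40 else "degraded"
        print(f"{direction}: {total} Gb/s ({lanes} x {per_lane} Gb/s) -> {verdict}")
```

In practice the text would come from the real boltctl output instead of the hard-coded sample.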

This is a battle-tested eGPU enclosure with a well-known controller:

62:00.0 PCI bridge: Intel Corporation JHL7440 Thunderbolt 3 Bridge [Titan Ridge DD 2018] 

The first thing that drew my attention was the kernel complaining about limited PCI bandwidth due to the Thunderbolt bridge running in PCIe x1 mode. However, I don’t believe this is the culprit, as the nvidia driver and lspci report full speed on the card itself (I also see similar logs on a different system that negotiates full speed, but more on that later).
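For anyone who wants to compare what each PCI device reports, here is a rough sketch that reads the link attributes the kernel exposes in sysfs (`current_link_speed`, `current_link_width`, `max_link_speed`, `max_link_width`). The helper names are made up for illustration, and as discussed, the values reported for the tunneled bridge may not reflect the real bandwidth.

```python
from pathlib import Path

LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_link(dev_dir):
    """Read the PCIe link attributes of one device directory.

    Attributes that are missing or unreadable are returned as None.
    """
    dev_dir = Path(dev_dir)
    result = {}
    for attr in LINK_ATTRS:
        try:
            result[attr] = (dev_dir / attr).read_text().strip()
        except OSError:
            result[attr] = None
    return result


def main(root="/sys/bus/pci/devices"):
    root = Path(root)
    if not root.is_dir():
        print(f"{root} not found; this needs a Linux system with sysfs")
        return
    for dev in sorted(root.iterdir()):
        link = read_link(dev)
        if link["current_link_speed"] is None:
            continue  # device does not expose PCIe link attributes
        print(
            f'{dev.name}: {link["current_link_speed"]} x{link["current_link_width"]} '
            f'(max {link["max_link_speed"]} x{link["max_link_width"]})'
        )


if __name__ == "__main__":
    main()
```

Comparing the line for the Thunderbolt bridge with the line for the GPU itself shows whether the card reports a faster link than the bridge above it.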

So I started experimenting, things I tried so far (with a bit of context):

  1. different cables - I have two TB3-certified cables, one 70cm (27.5 inch) and another 50cm (20 inch), as cables longer than 1m may actually downgrade the link speed to 20Gb/s,
  2. different kernels - because of the limitations of a rolling-release distro I only tried 6.12 and 6.13, I didn’t go as far as compiling older revisions,
  3. attaching the original PSU - I normally power my laptop via a dock integrated into my monitor, which is rated for 100W only, so I made sure to try the original 180W PSU
  4. tweaking UEFI settings - there aren’t many, but I tried disabling “PCIE Link Power Management” (which apparently affects link parameters when running on battery) and adaptive battery, in case it was triggering some link power management logic
  5. attaching the eGPU to another, Intel-based system - it negotiated 40 Gb/s (2 lanes, 20Gb/s each) without the slightest issue
  6. disconnecting the battery and letting the system discharge - just to be super sure that the hardware didn’t get into some ephemeral, broken state
  7. disabling AMD IOMMU - it didn’t seem to be connected with link speed in any way, but it was reported broken some time ago
  8. both USB4-capable ports - I also tried swapping the expansion adapters to rule out a fault on their side

None of the above helped.

Things I’m still considering trying, but only if I don’t find any other low-hanging fruit (or if the kids and wife let me):

  1. compiling kernel 6.8, as I’ve seen reports of folks getting full speed on it while having had issues with older revisions
  2. getting and installing Windows 11 to check how it behaves
  3. downgrading UEFI to 3.03, as I run the newest version atm
  4. pray to the gods of hardware so they automagically fix the problem overnight for me

That said, for now I’m stuck and out of ideas. I may also try to enable / build a kernel with more Thunderbolt logging and see if it reports something valuable.

Has anyone experienced something similar, root-caused the problem, or maybe even found a solution?

That seems to be just a reporting issue. The amd pcie tunnel device reports as pcie1x1 but has way more effective bandwidth than that.

The JHL7440 in my TH4G3 seems to effectively do well over 20gbit on my amd fw13 (though it gets nowhere near as close to 40gbit as the asm controller does), but I don’t recall what it reports in boltctl.

Hi,
I have an asm controller for an nvme enclosure.
I get a similar pcie 1x report and pcie warning, but in a disk speed test it transfers at the expected max bandwidth of about 3 GBytes/sec.
So, the pcie 1x and warnings seem bogus.

Assuming you mean GB/s, still you should get close to 4 (3.7+) with a pcie4 ssd on an asm controller and amd host. 2.8-3GB/s is well within the range I get with even first gen intel controllers.
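Since the thread mixes GB/s (disk benchmarks) and Gb/s (link speed), here is a tiny conversion sketch to put the figures above on the same scale. It is plain unit arithmetic (1 byte = 8 bits) and ignores any protocol overhead; the function name is made up for illustration.

```python
def gbytes_to_gbits(gbytes_per_s):
    """Convert GB/s (gigabytes) to Gb/s (gigabits): 1 byte = 8 bits."""
    return gbytes_per_s * 8


if __name__ == "__main__":
    # Throughput figures quoted in the posts above.
    for gbytes in (2.8, 3.0, 3.7):
        print(f"{gbytes} GB/s = {gbytes_to_gbits(gbytes):.1f} Gb/s")
```

So a measured 3 GB/s is already more payload than a 20 Gb/s link could carry, which is why the disk benchmark is a useful sanity check against the reported link values.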

Jup, my ass ran after that red herring for a while last year when it turned out I just need a multi threaded load to get the full bandwidth XD

Thanks for the answers, folks. I believe it’s your post that I found about this being a red herring, hence my remark that I don’t really believe this is the culprit here. Especially since the speeds on the device itself are reported correctly, and if they were inherited from the bridge they should’ve been capped.

I just tried certified Thunderbolt 4 cables (80cm/31.5inch), as I’ve seen a report that it helped someone from this community, but no luck in my case. I have a feeling that this may be somehow driver related.

I’m going to try (1) another distro from liveusb, (2) once I find a bit more time - Windows 11.

If that doesn’t help, my attention is going to shift to hardware and I’ll be contacting customer support.

Oh I’ve found something interesting.

I’m editing the whole post, as I was looking at it the wrong way. I mixed up the domains in the tbdump command, here’s what’s really happening:

# tbdump -d 1 -r 0 -a 2 -vv -N2 LANE_ADP_CS_0 
0x0080 0x003c01c0 0b00000000 00111100 00000001 11000000 .<.. LANE_ADP_CS_0  
  [00:07]       0xc0 Next Capability Pointer
  [08:15]        0x1 Capability ID
  [16:19]        0xc Supported Link Speeds
  [20:21]        0x3 Supported Link Widths (SLW)
  [22:23]        0x0 Gen 4 Asymmetric Support (G4AS)
  [26:26]        0x0 CL0s Support
  [27:27]        0x0 CL1 Support
  [28:28]        0x0 CL2 Support
0x0081 0x4828003c 0b01001000 00101000 00000000 00111100 H(.< LANE_ADP_CS_1  
  [00:03]        0xc Target Link Speed → Router shall attempt Gen 3 speed
  [04:05]        0x3 Target Link Width → Establish a Symmetric Link
  [06:07]        0x0 Target Asymmetric Link → Establish Symmetric Link
  [10:10]        0x0 CL0s Enable
  [11:11]        0x0 CL1 Enable
  [12:12]        0x0 CL2 Enable
  [14:14]        0x0 Lane Disable (LD)
  [15:15]        0x0 Lane Bonding (LB)
  [16:19]        0x8 Current Link Speed → Gen 2
  [20:25]        0x2 Negotiated Link Width → Symmetric Link (x2)
  [26:29]        0x2 Adapter State → CL0
  [30:30]        0x1 PM Secondary (PMS)

So it does fall back to Gen2 speed, even though Gen3 is supported (both sides). I’m stuck now.
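To double-check the decoding by hand, here is a small sketch that extracts the same bit fields from the raw LANE_ADP_CS_1 value (0x4828003c). The bit ranges are copied from the tbdump output; the 0x8 → Gen 2 mapping also comes from the dump, while 0x4 → Gen 3 is my assumption based on the register definitions in the Linux thunderbolt driver. The function and dictionary names are made up.

```python
def bits(value, lo, hi):
    """Extract bits lo..hi (inclusive, bit 0 = least significant) from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# 0x8 -> Gen 2 is what tbdump printed above; 0x4 -> Gen 3 is an assumption
# based on the Linux thunderbolt driver's register definitions.
CURRENT_SPEED = {0x8: "Gen 2", 0x4: "Gen 3"}


def decode_lane_adp_cs_1(reg):
    """Split LANE_ADP_CS_1 into the fields listed in the tbdump output."""
    return {
        "target_speed": bits(reg, 0, 3),
        "target_width": bits(reg, 4, 5),
        "lane_bonding": bits(reg, 15, 15),
        "current_speed": bits(reg, 16, 19),
        "negotiated_width": bits(reg, 20, 25),
        "adapter_state": bits(reg, 26, 29),
    }


if __name__ == "__main__":
    fields = decode_lane_adp_cs_1(0x4828003C)
    for name, val in fields.items():
        print(f"{name:>17}: {val:#x}")
    print("current speed:", CURRENT_SPEED.get(fields["current_speed"], "unknown"))
```

The decoded values match the dump: target speed 0xc, current speed 0x8, negotiated width 0x2.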

From what I know, this usually means that the cable is too long, but that simply can’t be the case here. I’ve tried a USB4 cable, two certified TB3 cables and a certified TB4 cable. Apparently it may also be caused by traces that are too long between the port and the controller (in this case the CPU, as far as I understand), but come on, it works for others… Unless this is faulty hardware.

OK, the different-distro test is next.

I’ve tried Ubuntu 24.04 LTS, running the 6.8 kernel, with the same result - only 20 Gb/s link speed. I may need to experiment with Windows next, but I have a feeling that it won’t make any difference.

I didn’t find anything fishy in dyndbg thunderbolt kernel driver logs.

I also triple-checked that it runs at the full 40Gb/s on a ThinkPad 14s (Intel system) with the cables I use - it does.