Details about USB, Thunderbolt and dock operation

Hey all, thanks for your replies, kind words and additions. It has indeed taken quite a bit of time to dig through everything and produce this post, but it’s always fun to learn stuff. I had some work to get out of the way the last two weeks, but now I should finally have a bit of time to go through the replies, update my post a bit and maybe even actually pick a dock to buy :-p

However, I just noticed that I can no longer edit my post; I guess that Discourse has a time limit on edits… I do think it would be useful to update the post a bit, so maybe the mods could remove this timeout for this post somehow, or otherwise convert it into a wikipost?


Thanks for putting it like that. This is how I thought about it, but I wanted to stay a bit more objective in my post :stuck_out_tongue:

[Lanes vs lines]

I had indeed also seen confusing things where e.g. DP uses unidirectional lanes and USB/PCIe use bidirectional lanes. To prevent confusion, I used “lane” to mean a single differential pair everywhere in my post, which is consistent but indeed does not match other sources, and apparently is not sufficient to prevent confusion (as seen from the discussion). I guess I could just avoid the word “lane” entirely (or maybe only use it with sufficient qualification in relation to specific protocols), use “line” for a single unidirectional wire pair (which is what I used “lane” for until now), and keep using “full-duplex channel” for a pair of lines.

Cool, I’ll add a bit about this in the post (see below).

Shouldn’t this be Maple Ridge? According to Thunderbolt (interface) - Wikipedia, Titan Ridge only does TB3, not TB4?

Thanks, I qualified the line you quoted a bit below.

Ah, I see, that clarifies things a bit. I added some remarks about this.

I am under the impression that the USB4 protocol is designed to just generically support arbitrary tunnels, but actual implementations will be limited by the number of DP IN/OUT adapters they have. E.g. the USB4 spec says “Each Router contains up to 64 Adapters.” and “A Router may contain one or more DP IN Protocol Adapters, one or more DP OUT Protocol Adapters, or a combination of DP IN and DP OUT Protocol Adapters.”

It also says “A USB4 host shall support DP tunneling. A Host Router shall contain at least one DP IN Adapter and may optionally contain one or more DP OUT Adapters. A USB4 hub shall support DP Tunneling. A USB4 Hub shall contain at least one DP OUT Adapter and may optionally contain one or more DP IN Adapters. A USB4 peripheral device may optionally support DP Tunneling. If a USB4 peripheral device supports DP Tunneling, it shall contain at least one DP Adapter.”, so it seems the USB4 spec only requires a single DP link to be supported. (I find it interesting that a hub is required to contain a DP OUT adapter: does that mean it must also have a connector for that? Or is the expectation to be able to route it to DP-alt-mode on one or more downstream USB-C ports?)

I also added some info about this below.

I can imagine that bulk USB traffic will fill up any such leftover bandwidth, but does that also work for PCIe? Doesn’t PCIe also need/negotiate some reserved bandwidth?

Good point, copied your suggestion below. I also copied your list of raw vs encoded bitrates, and added a few more entries below. I did find an error in your USB3.1 bandwidth, which I calculated as 10/132*128 = 9.697 Gbps, but you had 9.846Gbps.
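
As a side note, here is a minimal sketch of that calculation; my guess (an assumption on my part, not something stated above) is that the 9.846 Gbps figure came from applying PCIe's 128b/130b ratio instead of the 128b/132b line coding used by 10 Gbps USB3:

```python
# Effective payload bandwidth of a 10 Gbps raw link under two line codings.
raw_gbps = 10.0

usb3_gen2 = raw_gbps * 128 / 132  # 128b/132b (USB 3.1 gen 2) -> ~9.697 Gbps
pcie_3_0  = raw_gbps * 128 / 130  # 128b/130b (PCIe 3.0)      -> ~9.846 Gbps

print(f"128b/132b: {usb3_gen2:.3f} Gbps")
print(f"128b/130b: {pcie_3_0:.3f} Gbps")
```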

Thanks, fixed MST below and added info about DSC.

Why 10Gbps? If the hub connects to the USB host in Goshen Ridge using USB3 gen2x2, it could use two 10Gbps duplex lines, so 20Gbps upstream bandwidth, right?

Right, so this essentially over-allocates the two links, in the knowledge that the actual bandwidth to be used will fit, even though two full HBR3 links would not. I guess this works if the host and dock can agree on this; then the GPU and display can just use two HBR3 links (with added stuffing) and never be the wiser. This does mean that the GPU and display cannot change the resolution to one that needs more bandwidth, so some integration between the GPU driver / display configuration and the TB controller in the host is required.

This essentially contradicts what @Ray519 said about bandwidth allocations happening based on the max bandwidth for the negotiated DP bitrate, rather than actual bandwidth, but I guess the “trick” here is that rather than just negotiating the DP link bandwidth over the actual DP channel between the GPU and TB controller (which I guess is how it normally works?), this involves some OS-level integration between the different drivers involved (to communicate about actual bandwidth needed)?

Things to change in my post
Below are some things to be added or changed in my original post. In addition to changes based on replies, I also added some info on single vs multi-TT hubs that I missed originally.

Edit: I’ve updated the original post, and removed the proposed changes below for clarity


Oh yes, that sounds very much like it is only an Intel limitation. Apparently I ignored / forgot this part back when I read through the spec (way back, before I had devices to test with).

Mhh, I do not know PCIe in detail. But since it is packet based and already supports complex topologies with shared upstream bandwidth, just like USB3, I cannot imagine that bandwidth can be reserved throughout the whole topology. But I may be wrong on that, and it is just not used or just a form of prioritization. However, a chipset with a 4x PCIe 4.0 uplink and multiple 4x PCIe 4.0 downlinks worth of modern NVMe SSDs could not work as it does if each device actually reserved its max bandwidth on the upstream link.

Sadly, I currently do not own any PCIe TB devices to test this with, as I do with USB.

Sounds like it. I mostly verified my claims with my Alder Lake + Maple Ridge + Goshen Ridge system with Windows (and a bit of Linux), no Apple stuff at all. I could totally see Apple doing things like this for their own exact display, where they know precisely what it can and cannot do. As Apple refuses MST, they can disregard the whole topic of additional displays appearing on an existing MST connection, thereby exceeding the actually available bandwidth and running into a problem that the DP protocol probably cannot communicate.

In macOS, 10bpc is required to support the HDR features of the XDR display so that’s why a dual tile HBR3 connection method exists. Dual tile HBR2 can be used for 5K but I don’t know about 6K. Maybe it’s possible - you have to check the EDIDs of both connections to make sure they have the tile info for 6K.

Ice Lake and Tiger Lake have “integrated” Thunderbolt controller(s). I wouldn’t call them Titan Ridge or Maple Ridge which are “discrete” Thunderbolt controllers. They might share some PCI ids but that’s all.

What makes integrated Thunderbolt controllers interesting is that their upstream is not real PCIe so they don’t have a PCIe 3.0 x4 (31.5 Gbps) upstream limit per Thunderbolt bus/controller (two ports per bus). The upstream limit is more like 40 Gbps (unrelated to Thunderbolt’s 40 Gbps - use a benchmark of a software RAID 0 between multiple Thunderbolt ports to discover this limit).

I believe the latter: a USB4 hub’s USB-C ports that support USB4 are expected to also support USB 3.x and DisplayPort Alt Mode.

PCIe is packetized so doesn’t require a minimum to be reserved. I expect USB to be similar? There might be some USB 3.x devices that don’t like having less than 4 Gbps to play with? An XDR display using dual tile HBR3 connection for 6K 60Hz 10bpc only has ≈1 Gbps remaining for USB functionality. Apple describes this case as having a USB 2.0 limit, but I haven’t seen anyone measure that to see if > 480 Mb/s is possible, especially for reading since DisplayPort uses mostly transmit bandwidth. The XDR uses tunnelled PCIe for its USB controller. I guess it doesn’t mind being limited to 1 Gbps even though the slowest PCIe link is PCIe 1.0 x1 = 2 Gbps.
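
As a rough sanity check on that ≈1 Gbps figure, here is a back-of-the-envelope sketch. It counts active pixels only for the XDR's native 6016x3384 mode and ignores blanking, DP framing and Thunderbolt protocol overhead (which eat up most of the apparent remainder), so treat it as an approximation:

```python
# Rough pixel-data bandwidth of 6K 60 Hz 10 bpc versus the ~40 Gbps Thunderbolt 3 budget.
width, height, refresh = 6016, 3384, 60
bits_per_pixel = 3 * 10  # RGB, 10 bits per component

video_gbps = width * height * refresh * bits_per_pixel / 1e9
tb3_budget_gbps = 40.0   # nominal, before encoding/protocol overhead

print(f"active pixel data: {video_gbps:.1f} Gbps")            # ~36.6 Gbps
print(f"apparent leftover: {tb3_budget_gbps - video_gbps:.1f} Gbps")
# ~3.4 Gbps here; blanking and protocol overhead reduce this to roughly the 1 Gbps quoted above.
```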

Good catch.

I don’t know of any USB4 hosts (M1, Tiger Lake, Maple Ridge) or peripherals (Goshen Ridge) that support USB 3.2 gen2x2 (20 Gbps).

I think the software just needs to know that the XDR is connected via Thunderbolt 3 at 40 Gbps, and then it can just poke “HBR3” into the DisplayPort link rate register for both of the DisplayPort Out adapters of the Titan Ridge Thunderbolt controller in the XDR display. Everything after that should be automatic. The GPU will see two HBR3 connections to a dual tile display and output the appropriate HBR3 signals to the host Thunderbolt controller’s DisplayPort In adapters which will convert them to tunnelled DisplayPort. I wonder what kind of wonderful things happen if you do that with a couple of HBR3 displays that can exceed 20 Gbps?

Right. This would cause the “exceed 20 Gbps” per connection wonderment. The only guard against this is the fact that only the XDR currently uses this mode and the EDIDs do not describe any modes that would exceed 20 Gbps per tile.

Usually Thunderbolt would negotiate the greatest DisplayPort link rate/width supported by a GPU and display depending on any Thunderbolt DisplayPort links that already exist. This can happen without OS support but I guess the OS has the ability to override the choices made by the Thunderbolt firmware. The chosen link rate/width then affects the range of display modes presented to the user. The link rate/width doesn’t change if you lower the resolution/refresh rate because you want that bandwidth to be there when you increase the resolution/refresh rate.

Here are some examples:

  1. connect a 5K dual tile display and a 4K display to a Thunderbolt dock. If the 5K dual tile display is discovered first, then the 4K won’t work. If the 4K is discovered first then the 5K can only work at 4K.

  2. connect two HBR3 displays to a Thunderbolt dock. The first one discovered will work at HBR3 link rate. The second can only work at HBR link rate. You can shuffle in an HBR2 display in order to get both HBR3 displays to work at HBR2 link rate, but it’s annoying to have to do that every time you reboot.

Should be 20.625Gbps.

I guess the same place DisplayPort SST-encoding happens - in the GPU part of the chip, before it gets sent to the DisplayPort In Adapter of the integrated Thunderbolt controller.

I’m finally looking to see what dock I should get myself, and now I need a shorter summary of all the detail above, intended for selecting a dock. So, here’s an attempt at such a summary; once this is finished I think it could be added to the (other) dock megathread wikipost. If anyone has any additions or corrections, let me know. Also, if anyone has some practical advice to add about when to prefer a TB3 vs a TB4 dock (it seems TB4 advantages are minimal?), that’d be great.


There are a few kinds of docks currently available:

  • USB-C (non-Thunderbolt) docks. These are usually cheaper and use (typically) USB3 to connect to the laptop. Essentially these are just a USB3 hub with a bunch of USB devices (USB network card, USB card reader, USB soundcard, etc.).
    • Some docks also connect display outputs as a USB device (e.g. “DisplayLink” is the most common technology for that), but you’ll want to avoid this for anything but very simple office work on Windows (Linux has only closed-source DisplayLink drivers).
    • Some docks connect display outputs using DP alt mode (on the upstream USB-C connection), which uses one of the two lanes (four wires) for a two-lane DP signal that is routed pretty much directly to a single display output (or multiple outputs when the dock has a DisplayPort MST hub built in). Some docks only support USB 2.0, so they have all wires available for a 4-lane DP signal.
    • Upstream USB3 bandwidth is 5Gbps or 10Gbps, and can be 20Gbps (total, full-duplex, before encoding) when you are not using DP-alt-mode. The exact speed depends on the USB version/speed implemented by the dock.
  • Thunderbolt 3 (TB3) docks. These repurpose the wires in the USB-C connector to use the TB3 protocol, and then tunnel PCIe and DisplayPort over that protocol (and USB3 over PCIe).
    • Display signals are tunneled over the TB3 connection (sharing bandwidth, no dedicated wires). Two independent signals can be tunneled, each of which can be routed to either one or more (using MST) dedicated DisplayPort/HDMI connectors and/or DP-alt-mode on downstream USB-C connectors. One or both signals can also be forwarded entirely (no MST-splitting) to the downstream TB3 connector.
    • These docks might also support DisplayLink (or similar technologies) to stream display data to a USB device (probably for supporting additional outputs), but again you’ll want to avoid this for anything but very simple office work.
    • The downstream TB3 port is required to support DP-alt-mode as well.
    • Older TB3 docks (based on Alpine Ridge) are a little less capable (only DP1.2, no DP-alt-mode support on the upstream port for non-Thunderbolt hosts) than the newer ones (based on Titan Ridge).
    • Upstream bandwidth is 41.25Gbps (total, full-duplex, before encoding).
  • Thunderbolt 4 (TB4) docks. These are very similar to TB3 docks, except:
    • TB4 supports up to 2m passive cables.
    • TB4 supports waking up the host from sleep (e.g. standby) from downstream USB devices (e.g. a keyboard). This probably works through the lower-speed USB2.0 connection. TB3 did not support this, though there were some manufacturer-specific workarounds.
    • TB4 supports 2xDP1.4 25.92Gbps streams (subject to total bandwidth limit), which is the same as newer TB3 docks, but more than older TB3 docks.
    • TB4 requires 32Gbps PCIe instead of 16Gbps and DMA protection, but this is mostly a host requirement, so you’ll get this even when connecting TB3 devices to a TB4 host.
    • Supports multiple downstream TB ports (TB3 supports only daisy-chaining through one downstream port).
    • Upstream bandwidth is slightly lower: 40Gbps (total, full-duplex, before encoding) vs 41.25Gbps for TB3.
    • USB3 traffic is tunneled directly, not inside PCIe (which removes the need for a USB controller driver and might improve performance).
    • TB4 docks (and devices) can also fall back to TB3 (both upstream and downstream), but then they no longer support TB4 features. This makes a TB4 dock more flexible than a TB3 dock: both support TB3 devices, but when you connect USB4/TB4 devices behind a TB3 dock, all will run in TB3-compatibility mode, but behind a TB4 dock all can run using USB4/TB4.
  • USB4 (non-Thunderbolt) docks could technically exist too, but since TB4 is really just USB4 with most optional features made mandatory (and some additional certification), and most of these optional features are already mandatory for USB4 hubs (including even TB3 compatibility), it seems likely that such hubs will just be made TB4-compatible anyway.
  • USB1/2 traffic goes over its own pair of wires, so it has its own dedicated 480Mbps of bandwidth (except when using Thunderbolt 3, where it is tunneled over PCIe over TB3) and works pretty much the same across all docks (though integrated USB2 hubs can be single-TT or multi-TT, with multi-TT being preferable, especially when USB soundcards are involved).

Available bandwidth for displays depends on the connection (all bandwidths after encoding). To see how much bandwidth you need for a specific video mode, see this table on Wikipedia.

  • Full DP alt mode (all four lines/two lanes used for DP, no USB3, e.g. a direct connection to a monitor): 4x6.48 = 25.92Gbps (DP 1.4/HBR3). The newer DP Alt mode 2.0 could in theory go up to 4x19.39Gbps = 77.58Gbps (DP2.0/UHBR20), but the Framework does not support this. (How these per-link numbers are derived is sketched below this list.)
  • Half DP alt mode (only two lines/one lane used for DP, the others for USB3) has half of that: 2x6.48 = 12.96Gbps (DP1.4/HBR3).
  • TB3 supports up to 2x20 = 40Gbps, but some protocol overhead has to be subtracted and this is total bandwidth shared between DP, PCIe and USB3 traffic. Also, each of the (max) two DP streams inside is subject to limitations imposed by the tunnel endpoints in the used chips: e.g. 17.28Gbps (DP1.2 / 4xHBR2) for Alpine Ridge-based docks, 25.92Gbps (DP1.4 / 4xHBR3) for Titan Ridge-based docks and the Tiger Lake CPU used in the Framework laptop.
  • USB4 is very much like TB3 here, except it has a slightly lower maximum bandwidth, up to 2x19.39 = 38.78Gbps (40Gbps before encoding). Again, protocol overhead must be subtracted and this is shared between DP, PCIe and USB3. Again, the used chips limit bandwidth (e.g. 2 streams, each 25.92Gbps (DP1.4 / 4xHBR3) for Goshen Ridge-based docks and the Tiger Lake CPU used in the Framework laptop). In theory, USB4 could support more than two streams, but no current hardware implements this (and given the total bandwidth limit, it seems unlikely to change, especially since MST can support extra outputs if needed).
  • TB4 is the same as USB4, except that it requires that the maximum bandwidth is supported (while USB4 also allows running at half speed).
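
For reference, here is a small sketch of where the per-link numbers above come from: the raw lane rate multiplied by the line-coding efficiency and the number of lanes. The lane rates and encodings reflect my reading of the DP, TB3 and USB4 specs; actual usable bandwidth is further reduced by protocol overhead.

```python
# Effective DisplayPort bandwidth: raw lane rate x encoding efficiency x lane count.
dp_lane_rates = {                        # raw Gbps per lane, encoding efficiency
    "HBR2 (DP 1.2)":   (5.4,  8 / 10),       # 8b/10b    -> 4.32 Gbps per lane
    "HBR3 (DP 1.4)":   (8.1,  8 / 10),       # 8b/10b    -> 6.48 Gbps per lane
    "UHBR20 (DP 2.0)": (20.0, 128 / 132),    # 128b/132b -> 19.39 Gbps per lane
}
for name, (raw, eff) in dp_lane_rates.items():
    print(f"{name}: x4 link = {4 * raw * eff:.2f} Gbps")
# HBR2 x4 = 17.28, HBR3 x4 = 25.92, UHBR20 x4 = 77.58

# Total link bandwidth of the tunneling protocols (2 lanes per direction):
print(f"TB3:  {2 * 20.625 * 64 / 66:.2f} Gbps")   # 41.25 Gbps raw, 64b/66b   -> 40.00
print(f"USB4: {2 * 20.0 * 128 / 132:.2f} Gbps")   # 40 Gbps raw,    128b/132b -> 38.79
```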

Furthermore:

  • Display routing and bandwidth allocation within a dock is sometimes complex, especially when MST is supported and/or there are multiple display outputs (either dedicated, or DP-alt-mode on downstream USB-C ports). Unfortunately manufacturers do not seem to provide many details about this.
  • Docks can supply power to the laptop using USB-PD (Power Delivery). This is a property of the USB-C connector and really separate from Thunderbolt support. Thunderbolt does specify some minimum supported power figures, but in practice you should just check whether the dock you want supports the power you need.
  • Cables are not always interchangeable; in particular, active cables can only be used for the protocol(s) they are designed for. Active TB3 cables need to support TB3 and USB2, while active TB4 cables need to support TB4, USB3 and DP. Passive cables are usually more flexible, but might also be more limited in speed and length.

Specifically about the Framework laptop:

  • The laptop supports USB4. It is intended to become TB4 certified, so it seems safe to assume that it should work with TB3 and TB4 already right now.
  • The Tiger Lake CPU/GPU supports driving 4 independent displays (including the built-in flat panel). It can generate up to 4 DP1.4 (4xHBR3 each) streams (which can each support multiple displays using MST, still observing a limit of 4 displays in total). Each pair of these streams can be divided among a pair of USB-C ports (probably 2 streams left and 2 streams right), either two streams on one port (using TB3/USB4/TB4), or one stream on both ports (using TB3/USB4/TB4/DP-alt-mode). For maximum bandwidth (8xHBR3), a pair of ports can be combined and connected to a single display (this probably counts as two displays for the total limit).
  • The laptop is designed for 60W+ chargers, supports up to 100W (and also down to 15W, but then it charges slowly and when running might drain even when plugged in).

Summarizing:

  • If you need just a single (non-Thunderbolt) display output and do not need maximum bandwidth for e.g. external hard drives or eGPUs, a USB-C (non-Thunderbolt) dock can be a good and cheap option. You’ll probably want to avoid DisplayLink-based docks and look for docks that use DP-alt-mode for getting the display signal from the laptop.
  • Otherwise, you’ll want to look for a Thunderbolt dock. TB3 and TB4 docks are very similar and support roughly the same total bandwidth, but see above for some differences (and you probably want to avoid the older Alpine Ridge-based TB3 docks that only support the lower DP1.2 bandwidth).
  • If you have multiple very high resolution displays you might run into bandwidth limits with a single TB3/TB4 connection, and might need to use multiple docks (and/or connect displays to the laptop directly using USB-C or a DP/HDMI expansion card).

Here are my comments for a dock-focused overview.

USB-C Alt Mode Docks: USB Bandwidth can never be 20 Gbps, because then there would be no DP.

TB3 Docks:
I would distinguish old / Alpine Ridge and new / Titan Ridge.
Alpine Ridge is limited to HBR2. I have no experience with Alpine Ridge docks, but with my Asus Alpine Ridge PCIe card there is a problem if the monitor and host support HBR3, because they seem not to detect the limitations of the TB controller (even when attaching the display directly, without an actual TB connection in between). This resulted in flickering black screens and the PC endlessly redetecting the display, without me being able to change any of the display settings. So I would be careful getting Alpine Ridge for newer hosts; it might be a time bomb if displays are ever upgraded. Also, earlier Alpine Ridge firmware versions prevented HDR.

Titan Ridge on the other hand is backward compatible with USB-C DP Alt Mode hosts, which increases compatibility with other hosts, such as cellphones, cheaper laptops, or AMD based laptops for the time being. Also, one hears much less about these docks stopping working after a while or being as finicky.

TB4:
Most things you list are guarantees of a TB4 host, not a dock.
TB4 docks do not require TB4 hosts with PCIe tunneling. They will also work on USB4-only hosts in the future (so far I am not aware of any TB4 dock using the x1 PCIe port, which is the most Goshen Ridge provides).
TB4 docks do nothing for DMA protection. In fact they do not need it, because they are not using PCIe (if connected via USB4/TB4). This should allow booting from USB sticks behind TB4 docks, without having to enable non-default and quite insecure options such as PCIe BootROM support behind TB, as was necessary on TB3 hosts in order to get this working. This of course only works with TB4 docks. Sadly my desktop PC has no option to allow booting behind TB4, but that is a firmware limitation, distinct from the technical capability.

Wakeup from sleep states means waking up from USB devices such as keyboards or mice behind TB4 docks (probably through the separately kept USB2 connection, which TB3 does not have).

My TB4 host will not wake from sleep from keyboard interactions behind my TB3 dock, but will from my TB4 dock. Certain manufacturers have implemented workarounds for TB3, such as my Dell XPS with my WD19TB. I am guessing the wake-on-USB is handled by the dock and communicated via PD, in the same way the proprietary power LED and button of the dock are, because there are multiple seconds of latency compared to using the built-in keyboard.

Also, TB4 devices should not be put behind TB3 devices, because then they operate in TB3 legacy mode, which forces TB4 docks to use PCIe instead of USB tunneling again and would prevent such things as waking from standby.
This leads me to recommend TB4 for anything that supports daisy-chaining, because it would be hard to detect this TB3 limitation later (although it can be exploited to increase USB bandwidth, which is otherwise limited to one 10G USB3 link per TB4 port; I did some testing with Maple Ridge + Goshen Ridge).

TB4 mandates that active cables be backward compatible with USB3 and DP, while TB3 only mandated USB2 support. But the spec excludes fiber-optic cables from that; those do not even need to support USB2 (or power). Passive cables are probably more future proof for DP 2.0 or not-yet-specified things (I do not know how precisely the active cables work, whether they are protocol aware or only work for specific frequencies…).

A completely different aspect: Tiger Lake supports Adaptive Sync via DP with modern G-Sync / FreeSync displays etc. So far, every dock I have heard about that includes MST hubs breaks this functionality (raw output via TB-out is not affected). On Intel one can still enable Adaptive Sync, but the screen turns black as soon as the GPU actually tries to go below the main refresh rate. Also, docks with built-in DP-HDMI converters might be picky about the supported HDMI displays. My Dell WD19TB for example does not like my ultra-wide display on the built-in HDMI port. Exotic monitors are usually not tested with those docks. If you care about this, get a dock that offers native TB outputs or raw DP ports. If a TB dock currently supports more than 2 displays or HDMI, it contains MST hubs and adapters that will most likely mess with some exotic functionality.

As I had already foreshadowed, I’ve moved the bulk of the content of my original post to my blog (and made some small revisions along the way) where it can now again be read in a single piece. I’ll also post to this topic when I make significant changes in the future. The full post can be seen here: (...) — USB, Thunderbolt, Displayport & docks


Good point, I’ve clarified this.

Thanks, I’ve added some notes.

What part are you referring to here?

Right, though it seems some docks actually connect ethernet via PCIe, and there might also be downstream TB connections that do use PCIe. But now that you say this, it actually seems likely that a TB4-capable host can also just implement DMA protection when talking to a TB3 dock, unless there is something in the TB3 protocol that prevents this? Do you happen to know?

What TB3 limitation is in play here, then? AFAIU a USB2 stick would just be connected to the USB2 wires and pretty much talk to the host directly? Or are you talking about a USB3 stick, which would need (potentially insecure) USB-over-PCIe to work at all? Or (as you suggest further on) does TB3 actually not use the USB2 wires at all? I tried finding some spec or documentation, but TB3 is really underdocumented…

Tnx, I added some notes about these things.

I’ve wondered the same. I have the impression that it’s at least frequency and encoding, but I haven’t read anything that’s clear on this subject.

Ah, interesting. A quick google for “adaptive sync MST” shows that some MST hubs advertise explicit support for adaptive sync, but I can imagine that older/other ones indeed break this. I’ve added a note to the original post; it seems too much detail for the summary.

Good point, also added a note to the original post.

This: OWC miniStack STX Review: The Perfect Partner for the Mac mini | PetaPixel ?

Speed and Performance

The company claims the miniStack STX can run at 770 MB/s of “real” performance through the SSD drive and my own testing can confirm those numbers are pretty accurate. This means even if the drives connected inside of the miniStack STX are capable of faster performance, due to the limitations of the Thunderbolt PCI-E channel allocations, the max read/write speeds will be limited to about 770 MB/s. It would be amazing to be able to get peak performance out of the drives installed, but the 770 MB/s speeds are still more than enough for the kinds of work photographers and even many videographers are doing.

It’s an optional standard, take a look here: https://www.reddit.com/r/UsbCHardware/comments/sjpm0r/can_thunderbolt_4_use_usb_32_gen_2x2_to_its_full/ | USB 3.2 Gen 2x2 on Thunderbolt 4 - Apple Community

I recently learned about two docks that (AFAIU) use the PCIe x1 port to attach a 2.5Gbit ethernet adapter. See this blog for an excellent teardown of both:

  • There exist some USB-C docks that only support USB 2.0 (480 Mb/s) so that they can have 4 lanes (8 wires) of DisplayPort.
  • Using or not using DP-alt-mode does not affect USB speed for a USB-C dock since the 4 lines (8 wires) are divided between USB and DisplayPort so that they don’t affect each other (unlike Thunderbolt/USB4 where DisplayPort and USB share bandwidth and high DisplayPort bandwidth can lower USB bandwidth).
  • USB-C displays are a kind of USB-C dock. Some displays can switch between 4 lanes of DisplayPort + USB 2.0 and 2 lanes of DisplayPort + USB 3.0 or USB 3.1 gen 2.
  • A Titan Ridge or Goshen Ridge dock will usually limit all its capabilities to USB devices only so that all the devices can easily work with a non-Thunderbolt host. This means they share a 10 Gbps USB connection.
  • Alpine Ridge based TB3 docks will use their 4 PCIe lanes for various types of controllers (USB, Ethernet, SATA, FireWire, etc.). For example, the CalDigit TS3+ has 4 USB controllers allowing ≈22 Gbps of USB data instead of just 10 Gbps (but the TS3+ only has one port that can do the full 10 Gbps - the downstream Thunderbolt 3 port; it has another 10 Gbps port but it’s limited by 8 Gbps PCIe; the other ports are limited by 4 Gbps PCIe).
  • 25.92 Gbps. The limit is 40Gbps for Thunderbolt but usually the limit for DisplayPort in Thunderbolt is 34.56 Gbps. I can’t think of a way to get more than 34.56 Gbps of DisplayPort except Apple’s method of doing two HBR3 x4 links for the XDR.
  • An MST hub may be a more efficient method of dividing DisplayPort bandwidth than Thunderbolt. With Thunderbolt, you have to dedicate two DisplayPort links with fixed link rate/link width. With MST, there’s only one DisplayPort link but you can have multiple streams that share the link. Also, an MST hub can use DSC compression on the input and decompress the output for displays that don’t support DSC.
  • This is a property of the host, not dock. There’s a few TB3 hosts (Macs / PCs) that are limited to 16 Gbps PCIe. All TB4 hosts have full PCIe bandwidth (at least up to what the Thunderbolt connection supports per port which is something like 22 to 24 or 25 Gbps).
  • The host always has a USB controller. The TB4 dock has a USB controller which gets used when the upstream device or host is Thunderbolt 3.
  • M1 Macs have a slower than expected USB controller, so you may want to separate a TB4 dock from the TB4 host using a TB3 device to disable USB tunnelling. I haven’t seen benchmarks for USB tunnelling from other TB4 hosts such as Tiger Lake.
  • This is not true for TB3. A TB3 dock may have a USB Billboard Device that can notify the OS when it is connected to a non-Thunderbolt port.

The Dell UP3218K (8K60) and Acer XV273K (4K144) are examples of displays that can use dual HBR3 x4 connections (they are dual tile displays). Now that DSC exists, we probably won’t be seeing tiled displays like those anymore. The LG UltraFine 5K and UltraWide HDR 5K2K displays have tiled modes over Thunderbolt. All the displays have single tile modes. The Apple Studio Display (5K60) and XDR display (6K60) are dual tile displays (using Thunderbolt) but also have a single tile mode that can use DSC.


Seems not. My Dell WD19TB with Titan Ridge uses its internal USB2 Controller. USB2 over wire is only used in USB-C Backcompat mode.

Those docks would then lose functionality on pure USB4 ports. Great to see Lenovo going the extra mile and also providing a USB ethernet controller for this case.
I meant that, as long as the docks do not use PCIe internally, all features can work the same on a USB4 host without PCIe tunneling, or even via USB3/USB-C (with slower display connections).

I have a Samsung 970 Evo in a USB3 10G enclosure that can do 800MB/s, so this speed alone would also be possible over USB3. But it would be smarter to use PCIe, because then that bandwidth is not directly shared with the SATA drive / other USB ports. Since they state that SATA also works over USB-A but NVMe does not, it does look like PCIe though.

So even my Titan Ridge notebook, which launched with the security levels and explicit confirmation, was silently upgraded to relying on DMA protection. First, on Windows 10, it connected the Titan Ridge dock as “Allow Once” automatically, and with the Windows 11 update these security levels are now gone; Titan Ridge just seems to be trusted (at least if it is not using PCIe) and always recognized, even before login. The same goes for the TB3 dock on a TB4 host.

I have no idea how secure Intel’s DMA protection actually is. From reading other sources, it seems like the IO groups that are protected from each other and limited in memory address ranges are rather coarse, so they could include other devices / drivers, and do include other Linux kernel data that lands in the same page by accident (no idea whether Windows is smart enough to align its data so that that does not happen). And I do not know whether a Titan Ridge controller could be manipulated enough in firmware to do malicious DMA accesses from the trusted USB controller.

But PCIe BootROMs should be even more dangerous because they can inject code, instead of being just an optional boot target selected by the BIOS or user. Although Secure Boot should validate the signature of the BootROM, and measured boot via TPM should detect injections in the boot process and refuse to hand out keys such as BitLocker’s.

TL;DR:
Maple, Goshen and Titan Ridge (dock version) seem to have the same USB3 controllers. When acting as a PCIe USB3 controller, using 2 TB-Outs (or 1 TB-Out + 1 USB3 10G) I could achieve 650MB/s + 300MB/s on USB drives that can each sustain 800MB/s and 400MB/s on their own. Goshen Ridge was a perfect USB3 hub, distributing the 800MB/s between both drives if attached via USB4/TB4. Great, compared to the USB3 hubs inside my WD19TB, which fell to <300MB/s for each drive when used simultaneously. Full bandwidth could be reached when attaching one drive via USB4 and the other via PCIe/Titan Ridge USB3 (behind USB4).

Question: if I have a Thunderbolt 4 dock and a Thunderbolt 3 eGPU, does the Thunderbolt bandwidth get divided between each of the controllers (each side of the motherboard), or only if multiple Thunderbolt devices are connected to the same controller (same side)?

Each controller can sustain the full TB connection speed for each port.
But that alone does not do anything. Relevant for all applications is the performance of the connections tunneled through TB.

I believe that the lowest limit is usually not what the controllers can route, but the connections to the rest of the system (USB2, PCIe, DP). Although there have been reports of performance issues / stuttering when using the older TB3 controllers both as PCIe bridge in an eGPU enclosure as well as USB controller for further peripherals (both need to be transferred via PCIe though, so it might not even have been a TB issue).

I do not know / have no measurements, whether Tiger Lake can actually support the full bandwidth of two 4x PCIe 3.0 connections to each TB controller simultaneously. It could potentially achieve even higher throughput, as these controllers are on-die and there might not actually be a reason to strictly limit this connection to the same speed as a physical PCIe connection.

But if you are using a Dock (presumably to attach monitors) and eGPU at the same time, the eGPU will have to transfer every frame through PCIe tunneled through TB back to the CPU/iGPU which will then output those frames via DP through the other TB controller.

So no matter where you attach the dock, you further limit the already limited PCIe bandwidth between eGPU and CPU, by not using the display outputs of the GPU. How much bandwidth is used for this depends solely on the resolution and frame rate of the output calculated by the eGPU. USB2 is handled distinctly with USB4/TB4, so USB2 usage on the dock should not change anything. USB3 usage on the same TB controller will probably go through PCIe just as with the discrete TB controllers. @all: can anybody confirm whether USB3 devices on Tiger Lake’s USB4 ports appear on the system-wide USB3 controller or on the TB controller’s own USB3 controller, just like with Maple Ridge? (with USBTreeView for example)

You can look at the many benchmarks available looking into performance loss when using the integrated display of a notebook with eGPU instead of the direct display output of the eGPU.


Thanks for your replies again, I’ve updated both my posts with some additional notes based on your info.

As an overall observation while reading these replies, I realized that bandwidth and output number limitations with all this tunneling really mostly come from the hardware interfaces present on the controllers/routers used (e.g. dual 4xHBR2 interfaces on Alpine Ridge, dual 4xHBR3 interfaces on Titan Ridge). These limitations are mostly linked to the hardware used, not so much limited by the protocol (though the protocols do seem to typically specify minimum bandwidth/features for each port).

I suspect that the PCIe bandwidth limits (often documented as 16Gbps for TB3 and 32Gbps for TB4) and the 10Gbps vs 20Gbps USB3 tunneling specified for USB4 are also really not protocol limitations, but limitations of the interfaces on (currently available) hardware (or in the case of USB, maybe of the USB controller integrated into the TB controller).

Understanding this really helps to make sense of the otherwise seemingly random limitations on the number of outputs and bandwidth, so I added a section dedicated to explaining this to my blog post (“Understanding tunneling and bandwidth limitations”).

I’m not sure I follow this. Where does this 22Gbps limit come from? It sounds like all USB goes through PCIe, so is this the total PCIe bandwidth? But I thought that was only 16Gbps for TB3? Also, why the 8/4Gbps limits? Does Alpine Ridge have a HW interface for 4 lanes of PCIe, one at 8Gbps and three at 4Gbps or so?

Isn’t this also a property of the protocol? I.e. if you connect a TB3 dock to a TB4 port, you’d be limited to 16Gbps PCIe whatever you connect downstream? Or can TB3 also happily tunnel 32Gbps (or 22/24/25 as you say), and it’s just that TB3 controllers currently (and probably forever) have just a 16Gbps (PCIe 3.0 x2?) uplink to the CPU?

Wow, when I thought this couldn’t get more complex in terms of options you have, then there’s this :-p Makes sense, though I’m not sure where and if at all I should put this in my post…

This is rather unexpected, but I guess it might be because TB3 evolved from TB1/2 over a DP connector without USB 2.0 wires.

I read that and was slightly confused. Duplicating functionality like that seems wasteful and a recipe for problems even, and I wondered why not just use USB-connected 2.5Gbps ethernet (USB has enough bandwidth), but I guess this might boil down to hardware availability (PCIe allows using any existing PCIe ethernet chip), and/or bandwidth considerations (PCIe might be more efficient, and does not share bandwidth with other USB devices).

So IIUC, what you’re saying means that DMA protection is even possible with TB3, it’s just that TB4 requires it for hosts (and software) to be TB4 certified? And that means that it is really irrelevant when deciding what dock to get for a Framework laptop since you’ll get the protection in any case?

Doesn’t TB support routing DP data from the eGPU directly to a display attached to the same dock (or even somewhere in the same TB tree), without going through the host? I cannot find anything definitive on this, but skimming the USB4 spec I do see that docks/hubs can contain DP IN adapters (which would allow attaching the eGPU DP output to the TB network), and I cannot find any explicit restriction that all tunnels should either start or end at the host (though all examples I can find do involve the host)…

This is probably only the certification threshold for TB3, as there existed TB3 controllers with only a PCIe x2 interface (and only 1 DP input). In addition there are a few TB3 devices that only attached x2 PCIe lanes, because of chipset limitations. The host controllers with 2 DP inputs all had PCIe 3.0 x4 input (Solution Briefs | Thunderbolt Technology Community).

TB4 only raised the certification threshold to the full bandwidth.

Yes. I believe DMA protection only depends on the capabilities of the IOMMU and the groups to which IOMMU protection is applied (they seem not to be granular enough to protect every PCIe device separately; the motherboard / BIOS groups some PCIe devices together with shared IOMMU protection). And all core drivers for the platform must be compliant with DMA protection, as in, they must support more modern structures, where the OS can separate the different drivers into different memory regions mapped differently/not at all for each IOMMU group.

I do not know whether reversed flow direction is supported here (from eGPU back up the topology, possibly through the host controller).
But the classic eGPU enclosures have just a normal GPU with all native GPU ports exposed out the back. Getting access to DP would require those DP GPU outputs be somehow wired back to the TB controller, which I have never seen claimed by any reviewer or producer, even with eGPUs with integrated GPU instead of regular PCIe slotted desktop GPUs.

A conventional eGPU will be handled the same as integrated hybrid graphics. Windows will copy the frame-buffers from the calculating GPU (possibly only a window, not entire desktop) to the outputting GPU. This takes up PCIe bandwidth (not tested by me, but I do not see why they would develop an entirely different solution. Also this explains the performance loss when not attaching displays to the eGPU, basically the same as with integrated hybrid graphics on a narrow PCIe connection).

In M1, all Thunderbolt ports are a separate Thunderbolt controller. In Alpine Ridge, Titan Ridge, Maple Ridge, Ice Lake, and Tiger Lake, a Thunderbolt controller has two ports (usually two per side of a laptop in the case of Ice Lake and Tiger Lake). Each Thunderbolt controller has two DisplayPort In Adapters, so in Ice Lake and Tiger Lake, you can only connect two displays per side.

With integrated Thunderbolt implementations such as M1, Ice Lake, and Tiger Lake, I believe there’s no division of PCIe bandwidth because the upstream of the Thunderbolt controller isn’t really PCIe. So you can software RAID-0 two Thunderbolt NVMe together and get the same forty-something Gbps bandwidth from any two Thunderbolt ports but you can’t get 22Gbps from all ports at the same time (at least in the case of Ice Lake - I haven’t seen benchmarks for M1 or Tiger Lake).

In your case, instead of two Thunderbolt NVMe, you want to connect a dock and an eGPU. Like I said above, you won’t see a difference no matter which two ports you select. If you look at Device Manager in Windows and view by Connection Type (or use HWiNFO), you’ll see that each Thunderbolt port is a separate PCIe root port (but it’s not really PCIe so you can ignore the ridiculous/inaccurate/meaningless PCIe 1.0 2.5 GT/s link speed).

You’ll definitely get better performance if the display you are outputting to is connected directly to the GPU in the eGPU enclosure. See the eGPU.io website for benchmarks and info.

All eGPUs are Thunderbolt 3 (Alpine Ridge or Titan Ridge) so their USB is controlled by their USB controllers which will use tunnelled PCIe. I don’t think I’ve seen a tunnelled USB on Tiger Lake (using a Thunderbolt 4 dock/hub) so a screenshot of Device Manager or HWiNFO or USBTreeView would be interesting.

The dual HBR3 problem with Titan Ridge is similar to the dual HBR2 problem with Falcon Ridge (Thunderbolt 2). Alpine Ridge is easy since it can only do HBR2 and two HBR2 can’t exceed 40 Gbps. But if you connect with a 20 Gbps cable, then you have the Falcon Ridge problem.

It’s a limit of PCIe data through Thunderbolt. 22Gbps is one of the earliest numbers given by Intel in their marketing materials. It is 2750 MB/s which is a number often seen on the product pages for Thunderbolt NVMe storage devices. This is sometimes rounded up to 2800 MB/s. There have been benchmarks that reach 3000 MB/s (24 Gbps) and sometimes exceed that so the max is something less than 25 Gbps. The difference between 22 and 25 Gbps may be a setting in the firmware of the Thunderbolt peripheral. For example, eGPU enclosures had unexpected lower H2D or D2H bandwidth (host to device or device to host) benchmarks as measured by CL!ng.app in macOS or AIDA64 GPGPU Benchmark or CUDA-Z in Windows. A firmware update of the eGPU enclosure would often correct this. But in the USB4 spec there’s mention of a tradeoff between bandwidth and latency in regards to PCIe and USB tunnelling. So maybe the eGPUs with the lower bandwidth were tuned more for latency?

A Thunderbolt 3 host controller (Alpine Ridge or Titan Ridge) has a max upstream link of PCIe 3.0 x4 (31.5 Gbps) same as Maple Ridge (Ice Lake and Tiger Lake don’t have this limit since they are integrated Thunderbolt controllers). The host controller is a PCIe device, so really, you can connect it to any PCIe connection all the way down to PCIe 1.0 x1 (2 Gbps). Some PC laptops used PCIe 3.0 x2 (15.75 Gbps) (some Mac laptops too) or PCIe 2.0 x4 (16 Gbps) which saves power or lanes. Regardless of the PCIe limit, the Thunderbolt controller can still do 34.56Gbps of DisplayPort if it has two DisplayPort connections to the GPU so it’s not a complete waste.

There’s two Fresco Logic FL1100 USB controllers which each use a PCIe 2.0 x1 connection (5 GT/s * 8b/10b = 4 Gbps PCIe, or 5 Gbps * 8b/10b = 4 Gbps USB). The ASMedia ASM1142 USB controller uses a PCIe 3.0 x1 connection (8 GT/s * 128b/130b = 7.877 Gbps). The USB controller in the Alpine Ridge Thunderbolt controller is not limited by PCIe (10 Gbps * 128b/132b = 9.7 Gbps USB). Actually, in regards to the FL1100, although both PCIe and USB are limited to 4 Gbps, the extra PCIe overhead compared to USB overhead makes it slower than USB 3.0 from the ASM1142 or Alpine Ridge. In regards to the differences between 5 and 10 Gbps USB, the 10 Gbps USB is more than twice as fast because it uses a more efficient encoding.
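
To make those per-controller bottlenecks concrete, here is a small sketch of the effective link rates mentioned above (line coding only; PCIe/USB packet overhead is ignored, so real-world throughput is somewhat lower):

```python
# Effective uplink bandwidth of each USB controller in an Alpine Ridge dock like the TS3+:
# raw line rate scaled by the line-coding efficiency.
def effective(raw_gbps, payload_bits, total_bits):
    return raw_gbps * payload_bits / total_bits

links = {
    "FL1100 #1 (PCIe 2.0 x1, 8b/10b)":   effective(5.0, 8, 10),      # ~4.0 Gbps
    "FL1100 #2 (PCIe 2.0 x1, 8b/10b)":   effective(5.0, 8, 10),      # ~4.0 Gbps
    "ASM1142 (PCIe 3.0 x1, 128b/130b)":  effective(8.0, 128, 130),   # ~7.88 Gbps
    "Alpine Ridge USB (10G, 128b/132b)": effective(10.0, 128, 132),  # ~9.70 Gbps
}
for name, gbps in links.items():
    print(f"{name}: {gbps:.2f} Gbps")

# The sum (~25.6 Gbps) exceeds the ~22-25 Gbps of PCIe that a TB3 link can actually tunnel,
# so in practice the PCIe tunnel is the ceiling for aggregate USB throughput.
print(f"sum of uplinks: {sum(links.values()):.1f} Gbps")
```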

Alpine Ridge and Titan Ridge in a Thunderbolt peripheral have 4 PCIe 3.0 downstream lanes. They can be divided by the manufacturer as x1x1x1x1, x2x1x1, x2x2, or x4. The TS3+ uses x1x1x1x1 and the fourth PCIe lane to the Alpine Ridge Thunderbolt controller is used by an Intel Gigabit Ethernet controller. A PCIe device can connect using any link rate up to the max supported by the PCIe device and the root port or downstream bridge device it is connected to. You can change the link rate while they are connected (using pciutils) so you can simulate a lower speed device if you like. On a Mac Pro 2008, a PCIe 3.0 device installed in a PCIe 2.0 slot will boot at PCIe 1.0 link rate but you can set it to PCIe 2.0 using pciutils.

Maybe there are situations where having more USB controllers is beneficial. It depends on the capabilities of the USB controller that would be used for USB tunnelling compared to the USB controller in the downstream Thunderbolt peripheral.

I think it’s just following the USB-C Alt Mode spec.

The Thunderbolt 3 device doesn’t do USB so it shouldn’t try to do USB except to provide a USB 2.0 billboard device in case the USB-C port can’t switch from USB mode to Thunderbolt alt mode. Windows will see the billboard device and display a message. The same may exist in USB-C to DisplayPort adapters and cables.

The only eGPUs that use the DisplayPort In Adapter of their Thunderbolt controller are the Blackmagic eGPUs and the Sonnet eGPU Breakaway Puck RX 5500 XT/5700. I don’t know if these work with Windows. I don’t know if the DisplayPort from the eGPU can be tunnelled upstream or only downstream (I guess it’s a decision made by Apple’s software Thunderbolt connection manager). I don’t know if the DisplayPort from the host Thunderbolt controller can be tunnelled downstream of the eGPU such that there are 4 tunnelled DisplayPort connections downstream of the eGPU.

I do know one DisplayPort tunnelling trick where the DisplayPort from one host is tunnelled to the DisplayPort of another host’s Thunderbolt controller. This is called Thunderbolt Target Display Mode which allows an old Thunderbolt iMac (1440p) to be a display for another Mac. So it seems you can have software that pokes some arbitrary DisplayPort paths even across domains. The iMac though has a switch to move its display connection from its GPU to a DisplayPort Out Adapter of its Thunderbolt controller so it can receive the tunnelled DisplayPort. I don’t know if the DisplayPort tunnelling path could extend deeper into the iMac’s Thunderbolt domain.

Not what I was asking; I was already aware of that. And Tiger Lake should very much support tunneling of USB3 through USB4, as that is precisely what the USB4 spec demands. What I meant was the following: Maple, Alpine and Titan Ridge host controllers contain their own PCIe USB3 controllers. If you directly attach a USB3 device to one of those, it appears on the TB controller’s internal USB3 controller. But Titan Ridge and Maple Ridge forward USB2 signals (see all Titan Ridge AICs having an additional USB2 input connected to the motherboard). Consequently, USB2 devices attached to my Titan Ridge notebook and my Maple Ridge desktop appear not on the TB controller’s own USB controller, as they did with Alpine Ridge or my Titan Ridge dock, but on the chipset’s controller. When using USB4 hubs on Maple Ridge, nothing changes: USB2 devices appear on the chipset controller, USB3 devices on the TB controller. Only my Alpine Ridge AIC uses its own USB3 controller also for USB2 devices.

My question was, since Tiger Lake has on-die TB controllers, whether USB2 and USB3 are handled just as with Maple Ridge, in that every TB controller has its own USB3 controller, plus the chipset having its own (3rd) hard-wired USB3 controller on the same die. Is USB2 still forwarded to the chipset’s own controller? This would give an indication of how closely related the implementations actually are, because I am assuming that the reason for still having dual-port TB controllers is simply that this is what they were going to use for the dedicated controllers, so they can share more HW and just strip away the physical PCIe parts, because backwards compatibility and configurable lanes are not needed on-die.
Then I would presume that, although the uplink might not be limited to 32 GBit/s, the TB controller itself is most likely still somewhat PCIe bandwidth limited, because it never had a need to achieve much more than 32 GBit/s throughput. Apple showcases that, if you design a TB controller only for on-die purposes, there is no need for dual-port controllers, because every subcomponent is either per-port anyway, or can be combined into one component for the whole die, or whatever amount is still feasible.

Is that directed at me?

Yellow is Maple Ridge USB-Controller, Green is Alder Lake Chipset USB-Controller.
Blue is a USB2 stick, Magenta a USB3 stick.
Both sticks are attached to the USB3-A ports of the CalDigit ElementHub behind Maple Ridge in the screenshot (physical ports right next to each other). The cursor highlight shows that USBTreeView detects the paired USB2 and USB3 ports of the CalDigit hub, even though the billboard device only shows up on USB2.
As you can see, the USB2 stick shows up on the chipset controller in both cases (I am only allowed one picture per post, so the other one showing both attached to the chipset via front USB3-A ports is missing).
I speculated previously that this is at least in part to achieve the energy-efficient waking up from keyboard and mouse etc., as the TB controller needs no active PCIe connection to forward USB2 data to the chipset, and the existing wake-on-USB HW inside the chipset can handle it in basically the same way as without TB.

When connecting my WD19TB to a simple USB3 port, a USB2 device behind the dock shows up on the USB2 hierarchy of the chipset and not the USB3 hierarchy, so presumably Titan Ridge can still forward USB2 signals in back-compat mode, just not use them in TB3 mode. As to why USB2 lines remain unused in TB3, I do not know. But since it is optional for TB4 fiber cables to even support USB2, it might have just been to simplify the different possible configurations with different types of TB cables. Or like Matthiijs speculated, due to TB’s history.

I think the dual port decision was based more on the number of available DisplayPort connections in relation to the number of Thunderbolt ports. The Tiger Lake GPU only has 4 possible DisplayPort connections. The simplest method of allowing two DisplayPort connections for a single Thunderbolt port (a requirement of Thunderbolt 4) when there are four Thunderbolt ports is to pair two Thunderbolt ports to a controller which has two of the DisplayPort connections. Then the controller can output the two DisplayPort connections to the same port (to meet the Thunderbolt 4 requirement) or to different ports. Intel would have needed to add more complexity to get 4 DisplayPort connections to a single side of the laptop. Apple’s Mac Pro 2019 has a lot of DisplayPort muxes to get DisplayPort from the two MPX module slots (4 DisplayPort connections per slot) to the two built-in Thunderbolt controllers (two DisplayPort connections each).

M1 Macs are weird - For example, the Mac Studio can have 8 DisplayPort connections, two per Thunderbolt port, but can only connect 4 displays (up to two displays per Thunderbolt port to meet the Thunderbolt 4 requirement) - the extra DisplayPort connections can only be used for tiled displays like the LG UltraFine 5K or Dell UP2715K. I don’t know if there’s a 9th DisplayPort connection for the HDMI port.

There is a limit but it’s not exactly PCIe since none of the upstream is real PCIe. There’s the tunnelled PCIe limit and there’s whatever limit exists in the Tiger Lake CPU. The TB controller needs to allow the PCIe bandwidth of at least one port (22-25Gbps). Ice Lake can do around 38 Gbps from all ports:
https://egpu.io/forums/laptop-computing/ice-lake-cpu-on-die-thunderbolt-3-controller-bandwidth/paged/2/
(I think ATTO multi-disk non-raid test would be better than AJA in macOS, or AmorphousDiskMark for software RAID 0)

For a Tiger Lake laptop, the following link shows the PCI devices:
https://egpu.io/forums/builds/2021-framework-11th4cg-2x-3090-2070-1080ti-32gbps-2x-razer-core-x-agb-3090-agb-2070-win10-21h1/

  • There’s a separate root port for each Thunderbolt port. Since the Thunderbolt controller is integrated, the root ports don’t have an upstream PCIe device that is limited to 8 GT/s x4.
  • There’s a USB 3.x controller specifically for the Thunderbolt ports but the question about USB 2.0 for Tiger Lake is not answered here.
  • I wonder how fast a software raid of 4 USB NVMe devices can get when connected to the Tiger Lake Thunderbolt USB controller?
  • The Thunderbolt NHI #0 and #1 are the two Thunderbolt controllers responsible for setting up paths for PCIe and DisplayPort tunnelling and for doing Thunderbolt networking (using DMA tunnelling).
    An image from Device Manager or HWiNFO would be best because they include both PCIe and USB devices (and other devices). ioreg or IORegistryExplorer.app in macOS are similar but also include drivers (so does Device Manager but not in the connection view).

Right, the Thunderbolt 3 device I was referring to uses Alpine Ridge. Titan Ridge (like Goshen Ridge) peripherals usually can connect as USB 3.x with DisplayPort Alt Mode devices to non-Thunderbolt USB-C ports. And with USB 3.x there is usually also a USB 2.x connection, and this connection may be made with a different USB controller than the Thunderbolt USB controller that is used for the USB 3.x connection.
