Details about USB, Thunderbolt and dock operation

Wow, that has to be one of the most thorough writeups of the various layers of technologies that have been stacked up over the years to make USB the confusing ecosystem it is.

As to the Framework, I and several others are using it successfully with TB3 eGPU enclosures. I also have a USB3 dock that does both DisplayLink and DisplayPort to HDMI. I can plug both the eGPU and the dock into separate USB-C ports on the Framework simultaneously, and almost everything works as expected in EndeavourOS + Plasma + X11 session + nvidia-dkms drivers. The only thing I haven’t gotten working is DisplayPort to HDMI through the dock, but that’s probably me not taking enough time to nail down the right X11 config, and not really caring enough to fix it.

You are quite right about DisplayLink being a flaming pile of garbage. It does work under Linux and certain Android phones, but only if you display a very static image, like a slide deck without animations or a spreadsheet. Don’t even attempt web browsing, as the tearing and blockiness make it basically unusable, let alone any kind of video over that protocol.


This is the most thorough and complete explanation of protocols and connectors I’ve ever seen, regardless of the hardware they’re going to be connected to. Thank you so much; I’m bookmarking this for future use whenever I need to explain a connector or protocol. I can’t imagine the amount of documentation you had to read through for this!


This is an absolutely fantastic resource - thank you for sharing it, and for the clearly significant amount of time you put into it!

I just wish manufacturers of USB-C and Thunderbolt peripherals would be more upfront about what they are selling and clearly state exactly what they do and what compatibility they have :frowning: Hopefully that’s something LTT Labs and Gamers Nexus will be able to expose to the community, and in good time improve life for all of us who just want to know what works with what!


Pinned! Well worth the read. Thanks!


Great write up! Learned some stuff myself here.

Here is my addition on CPUs and Thunderbolt 4.

Before the Tiger Lake and Ice Lake CPUs, computer manufacturers had to come up with their own designs for incorporating a Thunderbolt controller. This meant routing high-speed differential PCIe traces to an expensive controller, plus some system-integration effort to make it work with their design. That is what made Thunderbolt a premium feature with a premium price tag.

What sets Tiger Lake and Ice Lake apart is that the controller got integrated into the CPU die: Thunderbolt 3 for Ice Lake, Thunderbolt 4 for Tiger Lake. This brought down power consumption and improved processing latency, which is a win-win for everyone.

So Tiger Lake has a Titan Ridge Thunderbolt controller on the CPU die. This controller converts 4 lanes of PCIe 3.0 to Thunderbolt 4 on up to 4 ports. The bandwidth is shared, so using all 4 ports will degrade performance across all devices.

On another note, Framework calls it USB4, which it is, but it is really Thunderbolt 4. Thunderbolt 4 and USB4 are not explicitly overlapping. I’ve heard more and more people say that USB4 is eGPU compatible. Well… not quite. The USB4 spec does call out PCIe tunneling. I’m just not sure whether that is its own implementation (and we have yet to see silicon to make it happen) or whether it’s borrowed from Thunderbolt.

These CPUs mean that very little effort is required to get Thunderbolt out: a timing circuit and a Thunderbolt-compatible USB-C controller is all that’s left. That’s why so many Thunderbolt-enabled laptops have surfaced.


A few remarks/confirmations/additions:

USB4 has 20/40 GBit/s before encoding, while TB3 actually had 20/40 GBit/s after encoding, so TB3 really ran at 41.25 GBit/s on the wire (called TB3 legacy speeds in the USB4 spec). Whether USB4 makes this up somewhere in the protocol, and whether TB4 will only use USB4 speeds, I do not know. In practice it is hard to come close to that bandwidth anyway, so it has not mattered so far.
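Since the encodings involved both work out to 32 data bits per 33 wire bits, the difference is easy to sanity-check in a few lines of Python (a sketch using only the spec figures quoted above):

```python
# 64b/66b and 128b/132b both carry 32 data bits per 33 bits on the wire.
ENC = 64 / 66

tb3_payload = 40.0              # TB3: 40 GBit/s is the rate *after* encoding
tb3_wire = tb3_payload / ENC    # so the wire actually runs at 41.25 GBit/s

usb4_wire = 40.0                # USB4: 40 GBit/s is the rate *on* the wire
usb4_payload = usb4_wire * ENC  # so only ~38.79 GBit/s of data

print(tb3_wire, usb4_payload)
```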

Correct. In TB3 (and a TB4 hub operating in TB3 mode), the dock will have a PCIe-attached USB controller (for USB3 and USB2). In USB4/TB4 with TB4 docks, the dock just looks like a USB hub to the host: the host’s TB4 controller is the PCIe/USB3 controller, and (at least with Maple Ridge) the regular USB2 controller of the system is used, with USB2 simply routed through the TB4 controller. Probably something to do with the wake-from-sleep-by-input features, standby power saving and early-boot input.

Somewhat. 1st gen / Alpine Ridge Controllers only supported HBR2, but 2x 4xHBR2 was possible. Titan Ridge added support for HBR3, but TB does not have enough bandwidth for 2 full 4xHBR3 connections. Same holds for USB4. More on that below.

Whether it is a TB4 certification requirement to support 2 HBR3 links I do not know. The Intel-provided table of minimum monitor support (2x4K60 or 1x8K60) could be satisfied by a 4xHBR2 link + a 4xHBR3 link, which still does not fit through the same TB link. Although I have not seen such asymmetric configs from Intel systems, only from AMD desktop systems, which may not be certified. I would translate the spec into:

  • must support 2 distinct DP links, each at minimum 4xHBR2
  • must support at least 1 DP link with HBR3 speeds AND DSC to achieve 8K60

Correct. So far, the host TB controllers only had at most 2 DP links available. But you can access these 2 links at any point of a TB3 / TB4 hierarchy. E.g. one TB dock uses 1 DP link; a second dock daisy-chained off its TB-Out can access the other DP link. Or you can even access one of those DP links via a second TB hierarchy attached to the same host controller. In that case one is no longer limited by TB bandwidth and can access 2 full HBR3 links, if the host has that available.
I do not know whether it is a protocol limitation of USB4 to only support 2 DP links at most. It may just be an Intel limitation as so far, all Intel TB controllers, even those inside Tiger Lake (2 controllers for 2 TB-ports each) only have 2 DP links available, which they may distribute across the 2 possible TB-Ports.
So for the Framework, 2 of the USB4 ports will internally share a controller and therefore share 2 DP links between them. As other notebooks with 4 TB ports and Tiger Lake have one controller per side, this is probably also the case for the Framework.

Bandwidth allocation for tunneling DP though TB:
As far as I know and tested, DP links through TB have their bandwidth statically allocated on a first-come-first-serve basis. As DP SST displays only use the connection speed required for the desired resolution, if you connect a 4K60 display, only a 4xHBR2 link will be established, with the max 4xHBR2 data rate of ~17.3 GBit/s allocated to that link.
A second display can then still establish a second 4xHBR2 connection, as there is enough bandwidth left. If the first connection established uses 4xHBR3, however, ~25.9 GBit/s get allocated. If not enough unallocated bandwidth is available, the DP links will be downgraded, similar to when using lesser cables. In practice I have seen 4xHBR3 + 4xHBR1 and 4xHBR3 + 2xHBR2. Dell lists their docks as using 4xHBR3 + 1xHBR3. I have no idea whether DisplayPort devices are smart and flexible enough to find the exact configuration that can still supply the most bandwidth (i.e. 2xHBR2 provides more bandwidth than 1xHBR3), or whether they will just downgrade the speed at whatever number of lanes they want. (USB-C monitors are, so far, statically configured via the OSD whether to seek 2xDP+2xUSB or 4xDP Alt Mode.)
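The per-link allocation figures above (~17.3 and ~25.9 GBit/s) follow directly from the DP lane rates and 8b/10b encoding. A small Python sketch of that arithmetic:

```python
# DisplayPort per-lane raw rates in GBit/s (standard DP link rates).
RATES = {"RBR": 1.62, "HBR": 2.7, "HBR2": 5.4, "HBR3": 8.1}

def dp_payload_gbps(rate: str, lanes: int = 4) -> float:
    """Usable DP bandwidth of a link at this speed (8b/10b encoding)."""
    return RATES[rate] * lanes * 8 / 10

# 4xHBR2 reserves ~17.3 GBit/s, 4xHBR3 ~25.9 GBit/s, so two full
# 4xHBR3 links (~51.8 GBit/s) cannot fit in one 40 GBit/s TB link.
print(dp_payload_gbps("HBR2"), dp_payload_gbps("HBR3"))
```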

This concept of allocating the maximum data rate of the link also applied in 20 GBit/s TB4 mode, where either only one 4xHBR2 link or, for example, two 4xHBR1 links could be established.

If a monitor is driven with less than its native resolution (not upscaled by GPU) the Displayport link may also get downgraded and require less bandwidth of Thunderbolt.
However in my experience MST-Hubs, like those in DP daisy-chainable displays or the popular Dell, Lenovo, HP docks will always establish the highest link speed supported, so they do not have to renegotiate when new displays are attached. But they seem to also only establish their connection, when at least a single sink/monitor is attached, so DP bandwidth allocation can be controlled by connecting displays one after the other in the desired order.
If however a TB Hub / Dock with multiple displays already attached is connected to the host, the internal display outputs of each TB-controller seem to have a static priority, that is used to decide the order in which DP connections are established and get their bandwidth allocated.

Within each DP link speed, the actually used DP bandwidth does not matter for the above described allocation process. But if there is bandwidth left over, like when driving a single FHD60 display behind a TB3 dock with built-in HBR3 MST Hub (4xHBR3 / ~25.9 GBit/s allocated for this link, but only ~3.4 GBit/s actually used) the difference in bandwidth can still be used by other data than Displayport, like USB and PCIe traffic.
But it seems, at least so far, that TB will never thin-provision Displayport links depending on what amount of bandwidth is currently taken up by displays (even though Intel describes in one of their whitepapers that TB controllers sniff that number out of the DP links, to correctly prioritize DP communication above all else).

Some thoughts (some of which mentioned already by others):

Most of the numbers shown here are for bits on the wire. However, that does not represent the amount of actual data being sent. For example, SATA, PCIe 2.0, USB 3.0, HDMI, and DisplayPort 1.4 transmit 10 bits on the wire for every 8 bits of data, otherwise known as 8b/10b encoding. This means USB 3.0 is more like 4 Gbps instead of 5 Gbps.

Higher speed modes use a more efficient encoding. Below is a list with numbers for single lane/single direction speed:

  • PCIe 3.0 8 Gbps using 128b/130b = 7.877 Gbps
  • PCIe 6.0 64 Gbps using 242b/256b = 60.5 Gbps
  • USB 3.1 gen 2 10 Gbps using 128b/132b = 9.846 Gbps
  • USB4 10 Gbps uses 64b/66b = 9.697 Gbps
  • USB4 20 Gbps uses 128b/132b = 19.39 Gbps
  • Thunderbolt 10 Gbps and 20 Gbps use 64b/66b, but the numbers are after applying the encoding, so the actual bits on the wire are 10.3125 Gbps or 20.625 Gbps. This makes Thunderbolt slightly faster than USB4 (when USB4 is not doing Thunderbolt).

This means USB 3.1 gen 2 is faster than USB 3.2 gen 1x2 even though both have a total of 10 Gbps on the wire.
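As a rough check of that last claim (formulas only, a sketch): both modes put 10 Gbps on the wire, but the encoding overhead differs:

```python
def payload(wire_gbps: float, data_bits: int, total_bits: int) -> float:
    """Data rate left over after line encoding."""
    return wire_gbps * data_bits / total_bits

gen2   = payload(10, 128, 132)  # USB 3.1 gen 2: one 10 Gbps lane, 128b/132b
gen1x2 = 2 * payload(5, 8, 10)  # USB 3.2 gen 1x2: two 5 Gbps lanes, 8b/10b

# gen 2 carries more data despite the identical total wire rate
print(gen2, gen1x2)
```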

You are mixing up lane and line.
A lane is two lines, one for each direction.

  • 4 wires
  • 2 lines (1 line = 1 differential pair = 2 wires)
  • 1 lane (a pair of lines, one for Rx and one for Tx)
    USB-C has room for 4 lines (for DisplayPort) or 2 lanes (for USB 3.2 or Thunderbolt)
    or 1 lane of USB 3.1 gen 2 and 2 lines of DisplayPort.

Search and replace lane with line:
“Adds two extra lines”, “the newly added lines”, “TX line”, “RX line”, “the 4 high-speed lines”, “single line speed”, “all 4 high-speed lines”, “10Gbps-per-line”.

Same lane/line confusion. DisplayPort lines are one direction so I wouldn’t call them a lane.

DP++: usually DP++ is limited to HDMI 1.4. Maybe there exist some DP++ ports that can do HDMI 2.0 - I guess it would just require a voltage shifter that can handle the 6 Gbps per line bandwidth.

MTS: Should be MST (Multi-Stream Transport). MST is interesting because an MST hub can convert DisplayPort signals of different link width and link rates like a PCIe switch can. Also, MST can use DSC on the input and decompress that on the output for displays that don’t support DSC. DSC allows connecting 3 4K 60Hz displays from a single HBR3 x4 link.

Same lane/line confusion. Thunderbolt uses 8 wires, 4 lines, 2 lanes. Link width can be one or two lanes.

Read the USB4 spec to understand how Thunderbolt works since they are very similar. Thunderbolt always sends a Thunderbolt signal (when a Thunderbolt device is connected). DisplayPort, PCIe, (and now USB with Thunderbolt 4) are encapsulated into Thunderbolt packets (they are said to be tunnelled). It’s like how Ethernet can do https and smb and telnet on the same wire.

Thunderbolt 1 and Thunderbolt 2 have the same bandwidth but Thunderbolt 2 can combine the 2 lanes (also called channels) into a single link (channel aggregation). In Thunderbolt 1, the two 10 Gbps lanes are separate. Thunderbolt 1 and Thunderbolt 2 can handle two 10 Gbps displays, but only Thunderbolt 2 can handle a 20 Gbps display. While Thunderbolt 1 can do two channels of DisplayPort (one per lane), I don’t know if it can do two DisplayPort signals on a single lane to allow the other lane to be dedicated to PCIe, or if there’s a way for Thunderbolt 1 to do more than one lane worth of PCIe. The channel aggregation of Thunderbolt 2 and later makes things easier and more efficient.

Thunderbolt 4 hubs can be used with Thunderbolt 3 hosts in macOS Big Sur and later because macOS uses its own Thunderbolt connection manager. Windows uses the connection manager that is built into the firmware of the host Thunderbolt controller (ICM - Internal Connection Manager). Linux has a software connection manager, but only for Macs?

USB in Thunderbolt 1 and 2 docks is done using tunnelled PCIe to USB controllers in the docks. The Thunderbolt 3 controller has its own USB controller. For Alpine Ridge, this is used for the Thunderbolt ports only. Titan Ridge has an extra USB port not connected to the Thunderbolt ports.

USB in Thunderbolt 4 docks (using Goshen Ridge) when connected to a Thunderbolt 4 host (such as an M1 Mac) is controlled by the USB controller of the host, which uses USB tunnelling to a four-port USB hub in the dock (3 downstream Thunderbolt ports and one USB port). I think the hub is part of Goshen Ridge. When connected to a Thunderbolt 3 host, PCIe tunnelling is used to communicate with a USB controller in the Goshen Ridge. In that case, the USB hub is still used, limiting upstream bandwidth to 10 Gbps. Intel could have done that differently to fully utilize the ≈23 Gbps PCIe bandwidth, but I guess the hub method was easier?

USB4 and Thunderbolt 4 use different signalling bandwidth (as described earlier). USB4 hosts are not required to support Thunderbolt (but I think all current hosts do?). Thunderbolt supports a depth (chain length) of 6 but USB4 only supports a depth of 5.

Thunderbolt 3/4 cannot support two HBR3 x4 DisplayPort signals because that would require 51.84 Gbps. Usually you are limited to two HBR2 or one HBR3 with one HBR (34.56 Gbps total). For the XDR display when connected to a GPU that doesn’t support DSC, Apple has a trick to force two HBR3 links (using their software Thunderbolt connection manager). The trick works because the XDR only requires 19.5 Gbps per connection (38.93 Gbps total) and Thunderbolt doesn’t transmit the stuffing symbols used to fill the HBR3 bandwidth (25.92 Gbps).

The DisplayPort Alt Mode support of TB3 and TB4 ports also includes USB support so you can connect USB-C docks or USB devices.

Same lane/line confusion. 8 wires, 4 lines. I wouldn’t use lanes in this description since not all the alt modes supported by USB-C are bi-directional.

USB4/Thunderbolt 4
I don’t think DisplayPort 2.0 Alt Mode is used or supported by anything yet? So it’s not a requirement of USB4 or Thunderbolt 4.

For the DisplayPort requirement of Thunderbolt 4, it just means that the Thunderbolt 4 host controller has two DisplayPort 1.4 HBR3x4 inputs from a GPU. 8K60 is not possible uncompressed (unless you count 4:2:0 8bpc using a non-HDMI timing).

Regarding DisplayPort tunnelling, Apple’s Thunderbolt Target Display Mode used in old Thunderbolt iMacs appears to use a cross-domain path for DisplayPort tunnelling. A domain consists of a Thunderbolt host and its connected Thunderbolt devices. You can connect two domains together so that the hosts can communicate with each other (Thunderbolt IP in Windows, macOS, Linux; Thunderbolt Target Disk Mode between macOS and a Mac’s EFI; Thunderbolt Target Display Mode between macOS and macOS). It would be interesting for Linux to support a software Thunderbolt connection manager that can support cross-domain DisplayPort tunnelling. It could be used as a Thunderbolt KVM.

Framework laptop
You can connect many MST hubs to all 4 Thunderbolt ports in chains and trees, so you can have like 100 DisplayPort ports, but the iGPU can only support 4 displays.

Each Thunderbolt port can only have two DisplayPort signals, which means each Thunderbolt port can have two trees of MST hubs, but the iGPU only has 4 DisplayPort signals to divide among all the 8 possible routes (9 including the built-in display).

On the other hand, the Apple M1 Max has 8 DisplayPort signals to divide between 8 routes (3 Thunderbolt 4 ports, one HDMI port, and the built-in display). However, three of those signals can only be used when connecting a tiled display like the LG UltraFine 5K or the Dell UP2715K - it’s an interesting way of handling tiled displays - like an extra abstraction layer that didn’t exist in Intel Macs.

The DP 1.2 spec calls this lanes, so a full DP connection consists of 4 lanes. And yes, to my surprise the USB spec actually includes receive and transmit in what one of their “lanes” entails. Personally I would have expected lanes to be one-way as that seems way more natural to me.
The 1.2 spec is openly available on the internet, but since I do not know how official that is, I am not going to link or quote it.

Have you found any source for this? Sure, every TB4 device must also support backward compatibility with TB3 and thus support the higher wire speed of TB3. But is TB4 actually using it? Because that would make it no longer just a certification of USB4, but technically a competing standard. (The difference is marginal and USB4 speeds cannot be exhausted by either PCIe or DP connections alone, so it could only be measurable with combined usage and very deterministic, constant bandwidth on PCIe and USB links.)

It is actually a USB4 requirement for all USB4 hubs to support TB3 compatibility, and it works on Windows and Linux with Titan Ridge and Alpine Ridge controllers (Alpine Ridge only supports daisy-chain topologies though). It should have nothing to do with how macOS implements something.


Why would they need such special behavior? If you tile the display into 2 halves, you are already well below an HBR2 connection. Also, the smaller Intel-based MacBooks (without dGPU) that already supported the XDR display were not capable of HBR3 due to their older iGPUs.

I forgot about that. I was thinking PCIe lanes and Thunderbolt lanes which are definitely 1 lane, 2 lines or differential pairs (one for Rx, another for Tx), 4 wires or pins.

You’re right, I’m not sure what speed a Thunderbolt 4 host (Maple Ridge, Apple Silicon, Tiger Lake) uses when communicating with a Thunderbolt 4 device (usually Goshen Ridge). While there may exist in the future other USB4 peripheral controllers, I don’t know that anyone will make a Thunderbolt 4 controller or if a Thunderbolt 4 controller is any different than a USB4 controller. One might hope that the Thunderbolt 4 host would use Thunderbolt speed when communicating with a Thunderbolt 4 peripheral like it would for a Thunderbolt 3 peripheral to gain 1.25 Gbps of speed (156 MB/s) but that amount is not enough for a benchmark to tell the difference. I suppose one could check link rate by examining USB4 registers in Linux? macOS shows link rate and link width for all Thunderbolt connections but I don’t think it differentiates between 40 Gbps and 41.25 Gbps. I do know that the Maple Ridge host controller uses the USB4 PCIe class code but I haven’t tried programming a Maple Ridge controller yet. USB tunnelling doesn’t extend through a Thunderbolt 3 peripheral, so maybe up to that point USB4 timing is used?

You’re right. That compatibility is described in this example:
CalDigit Thunderbolt 4 / USB 4 Element Hub Compatibility and Limitations on Windows – CalDigit
On the other hand, Sonnet Thunderbolt 4 dock compatibility doesn’t include Thunderbolt 3 PCs and explicitly excludes them in their compatibility pdf. Maybe their info is out of date?
Echo 11 Thunderbolt 4 Dock (for M1 and Intel Macs and Windows) - Sonnet
OWC has similar exclusions as Sonnet.
Thunderbolt Hub - OWC Digital
OWC Thunderbolt Dock - OWC Digital

XDR 6K 60Hz timing is either a single tile using DSC:

  • 6016x3384@60.000Hz 210.960kHz 1286.01MHz h(8 32 40 +) v(118 8 6 -)

or two tiles without DSC:

  • 3008x3384@60.000Hz 210.959kHz 648.91MHz h(8 32 28 +) v(118 8 6 -)

In the DSC case, multiply the pixel clock by 12bpp = 15.4 Gbps which is low enough for HBR2 so you can connect two of them to a single Thunderbolt port.

In the non-DSC case, multiply the pixel clock by 30bpp (10bpc) = 19.5 Gbps or 38.9 Gbps for both tiles. HBR2 only supports up to 17.28 Gbps (remember that you need to take into account the 8b/10b encoding for the 21.6 Gbps HBR2 signal). The AGDCDiagnose command in macOS shows two HBR3 connections in this case.
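The two cases can be checked with a bit of Python (pixel clocks taken from the timings listed above; a sketch):

```python
def stream_gbps(pixel_clock_mhz: float, bpp: int) -> float:
    """Uncompressed (or DSC-compressed) video stream rate in GBit/s."""
    return pixel_clock_mhz * bpp / 1000

hbr2_payload = 5.4 * 4 * 8 / 10       # usable bandwidth of a 4xHBR2 link

dsc_tile = stream_gbps(1286.01, 12)   # single tile with DSC at 12 bpp
raw_tile = stream_gbps(648.91, 30)    # one of two tiles at 10bpc (30 bpp)

# The DSC tile fits in HBR2; the uncompressed 10bpc tile needs HBR3.
print(dsc_tile, raw_tile, hbr2_payload)
```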

The MacBook without Thunderbolt or DSC support can only support 4K 60Hz on the XDR. Do you mean MacBook Pro or MacBook Air? Which specific model?

The XDR does support HBR2 input:

for non-DSC single tile for 4K 60Hz:

  • 3840x2160@60.000Hz 134.699kHz 528.02MHz h(8 32 40 +) v(7 8 70 -)

and non-DSC dual tile 5K 60Hz:

  • 2560x2880@59.999Hz 179.578kHz 481.27MHz h(8 32 80 +) v(99 8 6 -)

and DSC 5K 60Hz:

  • 5120x2880@60.000Hz 179.579kHz 933.81MHz h(8 32 40 +) v(99 8 6 -)

PCs can do 5K 60Hz on the XDR without DSC using HBR2 6bpc (18bpp). I’ve never seen the XDR use HBR3 for single tile.

The EDID for the XDR is huge (896 bytes = 1 base block + 6 extension blocks) because of all these modes multiplied by 5 refresh rates each.

I assumed it would use 8 Bit color depth on older devices not capable of HBR3+DSC, because then each tile would still fit into an HBR2 connection without any custom non-compliant handling (also assuming that not only the last Thunderbolt controller is responsible for allocating Displayport bandwidth).

Hey all, thanks for your replies, kind words and additions. It has indeed taken quite a bit of time to dig through everything and produce this post, but it’s always fun to learn stuff. I had some work to get out of the way the last two weeks, but now I should finally have a bit of time to go through the replies, update my post a bit and maybe even actually pick a dock to buy :-p

However, I just noticed that I can no longer edit my post; I guess Discourse has a time limit on edits… I do think it would be useful to update the post a bit, so maybe the mods could remove this timeout for this post somehow, or otherwise convert it into a wikipost?


Thanks for putting it like that. This is how I thought about it, but I wanted to stay a bit more objective in my post :stuck_out_tongue:

[Lanes vs lines]

I had indeed also seen confusing things where e.g. DP uses unidirectional lanes and USB/PCIe use bidirectional lanes. To prevent confusion, I used “lane” to mean a single differential pair everywhere in my post, which is consistent, but indeed does not match other sources, and apparently is not sufficient to prevent confusion (as seen from the discussion). I guess I could avoid the word lane entirely (or only use it with sufficient qualification in relation to specific protocols), use “line” for a single unidirectional wire pair (instead of “lane” as I do now), and keep using “full-duplex channel” for a pair of lines.

Cool, I’ll add a bit about this in the post (see below).

Shouldn’t this be Maple Ridge? According to Thunderbolt (interface) - Wikipedia Titan Ridge only does TB3, not TB4?

Thanks, I qualified the line you quoted a bit below.

Ah, I see, that clarifies things a bit. I added some remarks about this.

I am under the impression that the USB4 protocol is designed to just generically support arbitrary tunnels, but actual implementations will be limited by the amount of DP in/out adapters they have. E.g. the USB4 spec says “Each Router contains up to 64 Adapters.” and “A Router may contain one or more DP IN Protocol Adapters, one or more DP OUT Protocol Adapters, or a combination of DP IN and DP OUT Protocol Adapters.”

It also says “A USB4 host shall support DP tunneling. A Host Router shall contain at least one DP IN Adapter and may optionally contain one or more DP OUT Adapters. A USB4 hub shall support DP Tunneling. A USB4 Hub shall contain at least one DP OUT Adapter and may optionally contain one or more DP IN Adapters. A USB4 peripheral device may optionally support DP Tunneling. If a USB4 peripheral device supports DP Tunneling, it shall contain at least one DP Adapter.”, so it seems the USB4 spec only requires a single DP link to be supported (I think it’s interesting that a Hub is required to support a DP out adapter, does that mean it must also have a connector for that? Or is the expectation to be able to route it to DP-alt-mode on (a) downstream USB-C port(s)?)

I also added some info about this below.

I can imagine that bulk USB traffic will fill up any such leftover bandwidth, but does that also work for PCIe? Doesn’t PCIe also need/negotiate some reserved bandwidth?

Good point, copied your suggestion below. I also copied your list of raw vs encoded bitrates, and added a few more entries below. I did find an error in your USB3.1 bandwidth, which I calculated as 10/132*128 = 9.697 Gbps, but you had 9.846Gbps.

Thanks, fixed MST below and added info about DSC.

Why 10Gbps? If the hub connects to the USB host in Goshen Ridge using USB3 gen2x2, it could use two 10Gbps duplex lines, so 20Gbps upstream bandwidth, right?

Right, so this essentially over-allocates the two links, in the knowledge that the actual bandwidth to be used will fit, even when two HBR3 links will not. I guess this works if the host and dock can agree on this, then the GPU and display can just use two HBR3 links (with added stuffing) and never be the wiser. This does mean that the GPU and display cannot change the resolution to one that needs more bandwidth, so this means that some integration between the GPU driver / display configuration and TB controller in the host is required.

This essentially contradicts what @Ray519 said about bandwidth allocations happening based on the max bandwidth for the negotiated DP bitrate, rather than actual bandwidth, but I guess the “trick” here is that rather than just negotiating the DP link bandwidth over the actual DP channel between the GPU and TB controller (which I guess is how it normally works?), this involves some OS-level integration between the different drivers involved (to communicate about actual bandwidth needed)?

Things to change in my post
Below here, some things to be added / changed in my original post. In addition to changes based on replies, I also added some info on single vs multi-TT hubs that I missed originally.

Edit: I’ve updated the original post, and removed the proposed changes below for clarity


Oh yes, that sounds quite like it is only an Intel limitation. Apparently I ignored / forgot this part back when I read through the spec (way back, before I had devices to test with).

Mhh, I do not know PCIe in detail. But since it is packet based and already supports complex topologies with shared upstream bandwidth, just like USB3, I cannot imagine that bandwidth can be reserved throughout the whole topology. But I may be wrong on that, and it may just not be used, or just be a form of prioritization. But a chipset with a 4x PCIe 4.0 uplink and multiple 4x PCIe 4.0 downlinks worth of modern NVMe SSDs could not work as it does if devices actually reserved their max bandwidth on the upstream link.

Sadly, I do not currently own any PCIe TB devices to test this with, as I have done with USB.

Sounds like it. I mostly verified my claims with my Alder Lake + Maple Ridge + Goshen Ridge system on Windows (and a bit of Linux), no Apple stuff at all. I could totally see Apple doing things like this for their own exact display, where they know precisely what it can and cannot do. As Apple refuses MST, they can disregard the whole topic of additional displays appearing on an existing MST connection, exceeding the actually available bandwidth and running into a problem the DP protocol probably cannot communicate.

In macOS, 10bpc is required to support the HDR features of the XDR display so that’s why a dual tile HBR3 connection method exists. Dual tile HBR2 can be used for 5K but I don’t know about 6K. Maybe it’s possible - you have to check the EDIDs of both connections to make sure they have the tile info for 6K.

Ice Lake and Tiger Lake have “integrated” Thunderbolt controller(s). I wouldn’t call them Titan Ridge or Maple Ridge which are “discrete” Thunderbolt controllers. They might share some PCI ids but that’s all.

What makes integrated Thunderbolt controllers interesting is that their upstream is not real PCIe so they don’t have a PCIe 3.0 x4 (31.5 Gbps) upstream limit per Thunderbolt bus/controller (two ports per bus). The upstream limit is more like 40 Gbps (unrelated to Thunderbolt’s 40 Gbps - use a benchmark of a software RAID 0 between multiple Thunderbolt ports to discover this limit).

I believe the latter: a USB4 hub’s USB-C ports that support USB4 are expected to also support USB 3.x and DisplayPort Alt Mode.

PCIe is packetized so doesn’t require a minimum to be reserved. I expect USB to be similar? There might be some USB 3.x devices that don’t like having less than 4 Gbps to play with? An XDR display using dual tile HBR3 connection for 6K 60Hz 10bpc only has ≈1 Gbps remaining for USB functionality. Apple describes this case as having a USB 2.0 limit, but I haven’t seen anyone measure that to see if > 480 Mb/s is possible, especially for reading since DisplayPort uses mostly transmit bandwidth. The XDR uses tunnelled PCIe for its USB controller. I guess it doesn’t mind being limited to 1 Gbps even though the slowest PCIe link is PCIe 1.0 x1 = 2 Gbps.
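The ≈1 Gbps figure can be back-of-enveloped from the numbers above (a sketch, assuming TB3’s 40 Gbps payload rate and the dual-tile 10bpc timing):

```python
tb3_payload = 40.0                  # GBit/s available after encoding
xdr_tile = 648.91e6 * 30 / 1e9      # ~19.47 GBit/s per tile (10bpc = 30 bpp)

# Two tiles leave roughly 1 GBit/s for tunnelled PCIe/USB traffic.
leftover = tb3_payload - 2 * xdr_tile
print(leftover)
```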

Good catch.

I don’t know of any USB4 hosts (M1, Tiger Lake, Maple Ridge) or peripherals (Goshen Ridge) that support USB 3.2 gen2x2 (20 Gbps).

I think the software just needs to know that the XDR is connected via Thunderbolt 3 at 40 Gbps, and then it can just poke “HBR3” into the DisplayPort link rate register for both of the DisplayPort Out adapters of the Titan Ridge Thunderbolt controller in the XDR display. Everything after that should be automatic. The GPU will see two HBR3 connections to a dual tile display and output the appropriate HBR3 signals to the host Thunderbolt controller’s DisplayPort In adapters which will convert them to tunnelled DisplayPort. I wonder what kind of wonderful things happen if you do that with a couple of HBR3 displays that can exceed 20 Gbps?

Right. This would cause the “exceed 20 Gbps” per connection wonderment. The only guard against this is the fact that only the XDR currently uses this mode and the EDIDs do not describe any modes that would exceed 20 Gbps per tile.

Usually Thunderbolt would negotiate the greatest DisplayPort link rate/width supported by a GPU and display depending on any Thunderbolt DisplayPort links that already exist. This can happen without OS support but I guess the OS has the ability to override the choices made by the Thunderbolt firmware. The chosen link rate/width then affects the range of display modes presented to the user. The link rate/width doesn’t change if you lower the resolution/refresh rate because you want that bandwidth to be there when you increase the resolution/refresh rate.

Here’s some examples:

  1. connect a 5K dual tile display and a 4K display to a Thunderbolt dock. If the 5K dual tile display is discovered first, then the 4K won’t work. If the 4K is discovered first then the 5K can only work at 4K.

  2. connect two HBR3 displays to a Thunderbolt dock. The first one discovered will work at HBR3 link rate. The second can only work at HBR link rate. You can shuffle in a HBR2 display in order to get both HBR3 displays to work at HBR2 link rate but it’s annoying to have to do that everytime you reboot.

Should be 20.625Gbps.

I guess the same place DisplayPort SST-encoding happens - in the GPU part of the chip, before it gets sent to the DisplayPort In Adapter of the integrated Thunderbolt controller.

I’m finally looking to see what dock I should get myself, and now I have need for a shorter summary of all the detail above intended for selecting a dock. So, here’s an attempt at such a summary, once it this is finished I think it could be added to the (other) dock megathread wikipost. If anyone has any additions or corrections, let me know. Also, if anyone has some practical advice to add about when to prefer a TB3 vs a TB4 dock (it seems TB4 advantages are minimal?), that’d be great.

There a few kinds of docks currently available:

  • USB-C (non-Thunderbolt) docks. These are usually cheaper and use (typically) USB3 to connect to the laptop. Essentially these are just a USB3 hub with a bunch of USB devices (USB network card, USB card reader, USB soundcard, etc.).
    • Some docks also connect display outputs as a USB device (e.g. “DisplayLink” is the most common technology for that), but you’ll want to avoid this for anything but very simple office work on Windows (Linux has only closed-source DisplayLink drivers).
    • Some docks connect display outputs using DP alt mode (on the upstream USB-C connection), which uses one of the two lanes (four wires) for a two-lane DP signal, which is routed pretty much directly to a single display output, (or multiple when the dock has a Displayport MST hub builtin). Some docks only support 2.0, so have all wires available for a 4-lane DP signal.
    • Upstream USB3.0 bandwidth is 5Gbps or 10Gbps, and can be 20Gbps (total, full-duplex, before encoding) when you are not using DP-alt-mode. Exact speed depends on the USB version/speed implemented by the dock.
  • Thunderbolt 3 (TB3) docks. These repurpose the wires in the USB-C connector to use the TB3 protocol, and then tunnel PCIe and Displayport over that protocol (and USB3 over PCIe).
    • Display signals are tunneled over the TB3 connection (sharing bandwidth, no dedicated wires). Two independent signals can be tunneled, each of which can be routed to either one or more (using MST) dedicated DisplayPort/HDMI connectors and/or DP-alt-mode on downstream USB-C connectors. One or both signals can also be forwarded entirely (no MST-splitting) to the downstream TB3 connector.
    • These docks might also support DisplayLink (or similar technologies) to stream display data to an USB device (probably for supporting additional outputs), but again you’ll want to avoid this for anything but very simple office work.
    • The downstream TB3 port is required to support DP-alt-mode as well.
    • Older TB3 docks (based on Alpine Ridge) are little less capable (only DP1.2, no DP-altmode support on the upstream port for non-thunderbolt hosts) than the newer (based on Titan Ridge).
    • Upstream bandwidth is 41.25Gbps (total, full-duplex, before encoding).
  • Thunderbolt 4 (TB4) docks. These are very similar to TB3 docks, except:
    • TB4 supports up to 2m passive cables.
    • TB4 supports waking up the host from sleep (e.g. standby) from downstream USB devices (e.g. a keyboard). This probably works through the lower-speed USB2.0 connection. TB3 did not support this, though there were some manufacturer-specific workarounds.
    • TB4 supports 2xDP1.4 25.92Gbps streams (subject to total bandwidth limit), which is the same as newer TB3 docks, but more than older TB3 docks.
    • TB4 requires 32Gbps PCIe instead of 16Gbps and DMA protection, but this is mostly a host requirement, so you’ll get this even when connecting TB3 devices to a TB4 host.
    • Supports multiple downstream TB ports (TB3 supports only daisy-chaining through one downstream port).
    • Ustream bandwidth is slightly lower: 40Gbps (total, full-duplex, before encoding) vs 41.25Gbps for TB3.
    • USB3 traffic is tunneled directly, not inside PCIe (which removes the need for a USB controller driver and might improve performance).
    • TB4 docks (and devices) can also fall back to TB3 (both upstream and downstream), but then they no longer support TB4 features. This makes a TB4 dock more flexible than a TB3 dock: both support TB3 devices, but when you connect USB4/TB4 devices behind a TB3 dock, all will run in TB3-compatibility mode, but behind a TB4 dock all can run using USB4/TB4.
  • USB4 (non-Thunderbolt) docks could technically exist too, but since TB4 is really just USB4 with most optional features made mandatory (and some additional certification), and most of these optional features are already mandatory for USB4 hubs (including even TB3 compatibility), it seems likely that such hubs will just be made TB4-compatible anyway.
  • USB1/2 traffic goes over its own pair of wires, so has its own dedicated 480Mbps of bandwidth (except when using Thunderbolt 3, where it is tunneled over PCIe over TB3) and works pretty much the same across all docks (though integrated USB2 hubs can be single-tt or multi-tt, with multi-tt can be preferable, especially when USB soundcards are involved).

Available bandwidth for displays depends on the connection (all bandwidths after encoding). To see how much bandwidth you need for a specific video mode, see this table on Wikipedia.

  • Full DP alt mode (all four lines/two lanes used for DP, no USB3, e.g. direct connection to monitor): 4x6.48 = 25.92Gbps (DP 1.4/HBR3). The newer DP Alt mode 2.0 could in theory go up to to 4x19.39Gbps = 77.58Gbps (DP2.0/UHBR20), but the Framework does not support this.
  • Half DP alt mode (only two lines/one lane used for DP, the others for USB3 has half of that: 2x6.48 = 12.96Gbps (DP1.4/HBR3).
  • TB3 supports up to 2x20 = 40Gbps, but some protocol overhead has to be subtracted and this is total bandwidth shared between DP, PCIe and USB3 traffic. Also, each of the (max) two DP streams inside is subject to limitations imposed by the tunnel endpoints in the used chips: e.g. 17.28Gbps (DP1.2 / 4xHBR2) for Alpine ridge-based docks, 25.92Gbps (DP1.4 / 4xHBR3) for Titan Ridge-based docks and the Tiger Lake CPU used in the framework laptop.
  • USB4 is very much like TB3 here, except it has slightly lower maximum bandwidth, up to 2x19.39 = 38.78Gbps (40Gbps before encoding). Again, protocol overhead must be subtracted and this is shared between DP, PCIe and USB3. Again, the used chips limit bandwidth (e.g. 2 streams, each 25.92Gbps (DP1.4 / 4xHBR3) for Goshen ridge-based docks and the Tiger Lake CPU used in the framework laptop). In theory, USB4 could support more than two streams, but no current hardware implements this (and given the total bandwidth limit, it seems unlikely to change, especially since MST can support extra outputs if needed).
  • TB4 is the same as USB4, except that it requires that the maximum bandwidth is supported (while USB4 also allows running at half speed).


  • Display routing and bandwidth allocation within a dock is sometimes complex, especially when MST is supported and/or there are multiple display outputs (either dedicated, or DP-alt-mode on downstream USB-C ports). Unfortunately manufacturers do not seem to provide much details about this.
  • Docks can supply power to the laptop using USB-PD (Power Delivery). This is a property of the USB-C connector and really separate of Thunderbolt support. Thunderbolt does specify some minimum supported power figures, but in practice you can should just see if the dock you want supports the power you need.
  • Cables are not always interchangeable, especially active cables can only be used for the protocol(s) they are designed for. Active TB3 cables need to support TB3 and USB2, while active TB4 cables need to support TB4, USB3 and DP. Passive cables are usually more flexible, but might also be more limited in speed and length.

Specifically about the Framework laptop:

  • The laptop supports USB4. It is intended to become TB4 certified, so it seems safe to assume that it should work with TB3 and TB4 already right now.
  • The tiger lake CPU/GPU supports driving 4 independent displays (including the builtin flat panel). It can generate up to 4 DP1.4 (4xHBR3 each) streams (which can each support multiple displays using MST, still observing a limit of 4 displays in total). Each pair of these streams can be divided among a pair of USB-C ports (probably 2 streams left and 2 streams right), either two streams on one port (using TB3/USB4/TB4), or one stream on both ports (using TB3/USB4/TB4/DP-alt-mode). For maximum bandwidth (8xHBR3), a pair of ports can be combined and connected to a single display (this probably counts as two displays for the total limit).
  • The laptop is designed for 60W+ chargers, supports up to 100W (and also down to 15W, but then it charges slowly and when running might drain even when plugged in).


  • If you need just a single (non-Thunderbolt) display output and do not need maximum bandwidth for e.g. external harddrives or eGPUs, using a USB-C (non-Thunderbolt) dock can be a good and cheap option. You’ll probably want to avoid DisplayLink-based docks and look for docks that use DP-alt-mode for getting the display signal from the laptop.
  • Otherwise, you’ll want to look for a Thunderbolt dock. TB3 and TB4 docks are very similar and support roughly the same total bandwidth, but see above for some differences (and you probably want to avoid the older Alpine Ridge-based TB3 docks that only support the lower DP1.2 bandwidth).
  • If you have multiple very high resolution displays you might run into bandwidth limits with a single TB3/TB4 connection, and might need to use multiple docks (and/or connecting displays to the laptop directly using USB-C or a DP/HDMI expansion card).

Here my comments for a dock-focused overview.

USB-C Alt Mode Docks: USB Bandwidth can never be 20 Gbps, because then there would be no DP.

TB3 Docks:
I would distinguish old / Alpine Ridge and new / Titan Ridge.
Alpine Ridge is limited to HBR2. I have no experience with Alpine Ridge Docks, but with my Asus Alpine Ridge PCIe Card there is a problem if the monitor and host support HBR3, because they seem not to detect the limitations by the TB controller (even for directly attaching the display, without an actual TB connection in between). This resulted in flickering black screens and the pc endlessly redetecting the display, without me being able to change any of the display settings. So I would be careful getting Alpine Ridge for newer hosts, might be a timebomb, if displays are ever upgraded. Also earlier Alpine Ridge firmware versions prevented HDR.

Titan Ridge on the other hand is backward compatible to USB-C DP Alt Mode hosts, which increases compatibility with other hosts, such as cellphones or cheaper laptops, or AMD based laptops for the time being. Also one hears much less about the docks stopping to work after a while or being as finicky.

most things you list are guarantees of a TB4 host, not a dock.
TB4 Docks do not require TB4 hosts with PCIe tunneling. They will also work on USB4 only hosts in the future (so far I am not aware of any TB4 dock using the x1 PCIe port Goshen Ridge provides at most).
TB4 Docks do nothing for DMA protection. In fact they do not need it, because they are not using PCIe (if connected via USB4/TB4). This should allow booting from USB sticks behind TB4 docks, without having to enable non-default and quite insecure options such as PCIe BootROM support behind TB, as was necessary on TB3 hosts in order to get this working. This of course only works with TB4 docks. Sadly my desktop PC has no option to allow booting behind TB4, but this should be distinct from the technical capability.

Wakeup from sleep states means, waking up from USB devices such as keyboard or mice behind TB4 docks. (probably through the USB2 connection separately kept, which TB3 does not have).

My TB4 host will not wake from sleep from keyboard interactions behind my TB3 dock, but will from my TB4 dock. Certain manufacturers have implemented workarounds for TB3, such as my Dell XPS with my WD19TB. I am guessing the wake-on-USB is handled by the dock and communicated via PD in the same way the proprietary power LED and button of the dock is, because there is multiple seconds worth of latency, compared to using the builtin keyboard.

Also, TB4 devices cannot be put behind TB3 devices, because then they operate in TB3 legacy mode, which forces TB4 docks use PCIe instead of USB tunneling again and would prevent such things as waking from standby.
This leads me to recommend TB4 for anything that supports daisy-chaining, because it would be a hard to detect this TB3 limitation later (although it can be exploited to increase USB-bandwidth, which is actually limited to one 10G USB3 link per TB4 port. I did some testing with maple ridge + goshen ridge).

TB4 mandates active cables be backward compatible to USB3 and DP, while TB3 only mandated USB2 support. But the spec excludes fiber-optic cables from that, those do not even need to support USB2 (or power). But passive cables are probably more future proof for DP 2.0 or not yet specific things (I do not know how precisely the active cables work, whether they are protocol aware or only work for specific frequencies…)

A completely different aspect: Tiger Lake supports Adaptive Sync via DP with modern G-Sync / FreeSync displays etc. So far, every dock I have heard about that includes MST Hubs breaks this functionality (raw output via TB-out not affected). On Intel one can still enable Adaptive Sync, but the screen turns black as soon as the GPU actually tries to go below the main refresh rate. Also docks with builtin DP-HDMI converters might be picky with the supported HDMI displays. My Dell WD19TB for example does not like my Ultra-Wide screen display on the builtin HDMI. Exotic monitors are usually not tested for with those docks. If you care about this, one should get a dock that offers the native TB-Outs or raw DP-Ports. If a TB dock currently supports more than 2 displays or HDMI it contains MST-Hubs and adapters that will most likely mess with some exotic functionality.

As I had already foreshadowed, I’ve moved the bulk of the content of my original post to my blog (and made some small revisions along the way) where it can now again be read in a single piece. I’ll also post to this topic when I make significant changes in the future. The full post can be seen here: (...) — USB, Thunderbolt, Displayport & docks

1 Like

Good point, I’ve clarified this.

Thanks, I’ve added some notes.

What part are you referring to here?

Right, though it seems some docks actually connect ethernet via PCIe, and there might also be downstream TB connections that do use PCIe. But now you say this, it actually seems likely that a TB4-capable host can also just implement DMA protection when talking to a TB3 dock, unless there is something in the TB3 protocol that prevents this? Do you happen to know?

What TB3 limitation is in play here, then? AFAIU an USB2 stick would just be connected to the USB2 wires and pretty much talk to the host directly? Or are you talking about a USB3 stick, that would need (potentially insecue) USB-over-PCIe to work at all? Or (as you suggest further on) does TB3 actually not use the USB2 wires at all? I tried finding some spec or documentation, but TB3 is really underdocumented…

Tnx, I added some notes about these things.

I’ve wondered the same. I have the impression that it’s at least frequency and encoding, but I haven’t read anything that’s clear on this subject.

Ah, interesting. A quick google for “adaptive sync MST” shows that some MST hubs advertise explicit support for adaptive sync, but I can imagine that older/other ones indeed break this. I’ve added a note to the original post, seems to much detail for the summary.

Good point, also added a note to the original post.

This: OWC miniStack STX Review: The Perfect Partner for the Mac mini | PetaPixel ?

Speed and Performance

The company claims the miniStack STX can run at 770 MB/s of “real” performance through the SSD drive and my own testing can confirm those numbers are pretty accurate. This means even if the drives connected inside of the miniStack STX are capable of faster performance, due to the limitations of the Thunderbolt PCI-E channel allocations, the max read/write speeds will be limited to about 770 MB/s. It would be amazing to be able to get peak performance out of the drives installed, but the 770 MB/s speeds are still more than enough for the kinds of work photographers and even many videographers are doing.