@kyuz0 maybe I wasn’t clear, my issue is specifically about the combination of Mellanox cards and the Framework motherboard.
wrt the Intel cards: I have multiple pairs of them, both the 1-port E810-CQDA1 and the 2-port E810-CQDA2, and they all perform just as they should. Meaning that the cards (which are PCIe 4.0 x16) happily negotiate down to PCIe 4.0 x4 in the Framework Desktop, and I see the expected >50 Gbps throughput and sub-5 µs latency. If the InfiniBand experiment fails I’ll use these.
BUT the Mellanox cards (which are also PCIe 4.0 x16) point-blank refuse to negotiate a stable PCIe 4.0 x4 link. Instead they drop all the way down to PCIe 3.0 x4. At that point I see the expected 28 Gbps throughput and sub-1 µs latency (InfiniBand). That very low latency number is what has kept me exploring the problem.
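For anyone who wants to verify what their own slot negotiated, one way is to read it straight out of sysfs (a minimal sketch, assuming a Linux host; the `0000:01:00.0` address is a placeholder, find your card’s with `lspci`):

```python
#!/usr/bin/env python3
"""Print the negotiated vs. maximum PCIe link for one device (Linux sysfs)."""
from pathlib import Path

BDF = "0000:01:00.0"  # placeholder bus/device/function -- replace with your card's
DEV = Path("/sys/bus/pci/devices") / BDF

def attr(name: str) -> str:
    return (DEV / name).read_text().strip()

if __name__ == "__main__":
    # A solid Gen4 x4 link reports "16.0 GT/s PCIe" and width 4;
    # "8.0 GT/s PCIe" means the link has fallen back to Gen3.
    print(f"current: {attr('current_link_speed')} x{attr('current_link_width')}")
    print(f"maximum: {attr('max_link_speed')} x{attr('max_link_width')}")
```

`sudo lspci -vv -s 01:00.0` shows the same information in the LnkCap/LnkSta lines.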
Apparently the Mellanox cards have very strict timing requirements and tax the PCIe slot a lot harder than the Intel cards do. Internet pundits point at signal degradation in the path between the PCIe slot and the NIC, which isn’t just “card in slot” because the x4 slot is close-ended, so an x16 card can’t seat directly and something has to sit in between.
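If it really is signal degradation, one tell-tale (on kernels that expose the AER counters in sysfs) is a correctable-error count that keeps climbing while the link is under load. A hedged sketch, same placeholder address as above:

```python
#!/usr/bin/env python3
"""Watch PCIe AER correctable-error counters for one device (Linux sysfs)."""
import time
from pathlib import Path

BDF = "0000:01:00.0"  # placeholder -- replace with your card's address
COUNTER = Path("/sys/bus/pci/devices") / BDF / "aer_dev_correctable"

def correctable_total() -> int:
    # The file holds "Name count" lines (BadTLP, BadDLLP, ...) plus a
    # TOTAL_ERR_COR summary; use the summary if present, else sum the rest.
    counts = dict(line.rsplit(None, 1)
                  for line in COUNTER.read_text().splitlines() if line.strip())
    if "TOTAL_ERR_COR" in counts:
        return int(counts["TOTAL_ERR_COR"])
    return sum(int(v) for v in counts.values())

if __name__ == "__main__":
    # Run a throughput/latency test in another terminal; a count that keeps
    # growing points at a marginal physical link, not a config problem.
    prev = correctable_total()
    while True:
        time.sleep(5)
        cur = correctable_total()
        print(f"correctable errors: {cur} (+{cur - prev} in the last 5 s)")
        prev = cur
```

This needs AER enabled; if the sysfs file isn’t there, `dmesg | grep -i aer` is the fallback.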
Anyway, typical solutions involve a rigid adapter (as you suggested), short PCIe riser cables, OCuLink, MCIO with redrivers, and of course powered risers. I’m slowly working my way through the solution matrix, but I can say that so far the short PCIe riser cables, OCuLink cards and powered risers have not solved it.
Again, if I could just plug the damn card into the motherboard, this would have been a 15-minute quest, not a 2-week exercise. It has been instructive, though: I’ve learned about some more esoteric solutions like MCIO redrivers, and the ultrasonic knife (for opening up the close-ended slot) is looking intriguing. IDK if the Framework board is even capable of negotiating Gen4 with Mellanox ConnectX-5 cards. Weird, since the FWD handles the Intel cards just fine.
Ideally someone with an FWD and some CX5 cards would post “hey, this combination worked” and I’d just copy them. Until then I’ll keep trying alternatives (all hail the Amazon return policy!).