How are you testing the speed of your nvme device?
I ran into that a while ago, and in my case it turned out to be just a testing and reporting issue: single-threaded dd maxed out way below the expected 3.5ish GB/s, so I was chasing my tail because the USB4 root kept reporting as PCIe Gen 1.
Turns out single-threaded dd just can’t saturate the disk. KDiskMark did the 3.5 GB/s just fine, and the USB4 root always reports the same speed, but that has no actual impact on throughput.
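If you want to reproduce that comparison from a terminal, here is a rough sketch. As far as I know KDiskMark is a front end for fio, so a multi-job fio run is the closer equivalent of what it measures. The helper name, the /dev/nvme0n1 default and the job parameters are my assumptions, not the exact settings used above. It only prints the two commands so you can check them before running anything as root.

```shell
# Hypothetical helper: prints a single-stream dd read and a multi-job fio read
# for the same device. /dev/nvme0n1 is an assumed default, adjust to your disk.
print_read_tests() {
    dev="${1:-/dev/nvme0n1}"
    # One reader with one direct-I/O request in flight at a time.
    echo "dd if=$dev of=/dev/null bs=1M count=8192 iflag=direct"
    # Four jobs with 32 requests queued each; --readonly makes fio refuse writes.
    echo "fio --name=seqread --filename=$dev --readonly --rw=read --bs=1M" \
         "--direct=1 --ioengine=libaio --iodepth=32 --numjobs=4" \
         "--runtime=30 --time_based --group_reporting"
}
```

Run `print_read_tests /dev/yourdisk`, then paste each line into a root shell. If the fio number is near the rated speed and the dd number is far below it, the disk and link are fine and the test was the bottleneck.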
@Mario_Limonciello @James3 sorry for the delay: the Windows installer decided to throw a tantrum… For now I’ve attached lspci outputs on Linux to the kernel bugzilla report. I will continue trying to install Windows today.
@James3, the lower speed is almost certainly NV’s power management. For example, here are two lspci excerpts: the first is from when the card is idle, the second from when cuda-z is running:
@knipp30, regarding attachments on kernel bugzilla: there’s an “Add an attachment” link near the top of the page, right under the list of existing attachments.
Forgot to mention, I received the below message from Minisforum today, I’ll let it speak for itself:
Dear Customer,
Thank you for your patience.
After further discussion regarding your issue, we sincerely apologize that we are unable to conduct accurate debugging on Linux systems due to technical limitations. This problem is likely an isolated compatibility issue specific to Linux. We recommend installing Windows 11 for testing to verify if the random restart still occurs.
Best regards,
[footer truncated to protect the privacy of the representative]
Please reconnect all the parts according to this video: https://www.youtube.com/watch?v=ObK8BskOYPQ
Keep the computer powered off, then turn on your power supply first. Then observe if the graphics card fan starts to rotate. If it doesn’t start rotating, press the forced power-on button on the deg1 panel to observe if the graphics card fan starts rotating. If it still doesn’t rotate, check if the oculink cable is damaged. Try replacing it with a new one. If the fan starts rotating, press the power button on the computer to boot it up. After booting up, enter the system. First, connect the HDMI cable to the HDMI port on the computer, then enter the device manager to check if your graphics card is recognized. If it is recognized, first install the graphics driver from the graphics card official website, then connect the HDMI port to the graphics card and check if it works.
This is after telling them I am using USB4, I have a DEG2, and sending a video of the failure to them…
I have 2 eGPU docks:
UT3G - This works, 100% and is working “well”
DEG2 - This does not work, not at all, not with USB4, not with OCuLink
I am sure I am doing something wrong with OCuLink, as there are Framework users stating in other threads that OCuLink with the 50 series cards is working (with the DEG1 and DEG2). What I’m not sure about is whether this is on Linux.
With the DEG2 and OCuLink, the GPU shows up in the lspci output but is not picked up by the Nvidia driver. I have tried every combination of kernel parameters I could find, and it never seems to work.
I’m open to ideas here, because the improved performance seems like it might be worth at least checking out (and, hell, I bought all the parts).
Anything interesting in the logs? (sudo journalctl -b | grep -iE 'nvidia|nvl|nvr')
If there’s nothing there, then maybe try rescanning the PCIe bus (as root: echo 1 >/sys/bus/pci/rescan)
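To see at a glance whether the card is enumerated but has no driver bound, something like the following may help. It is only a sketch: the function name is made up, and it relies on lspci -nnk printing a "Kernel driver in use:" line for bound devices and on 10de being NVIDIA’s PCI vendor ID.

```shell
# Reads `lspci -nnk` output on stdin and prints each NVIDIA function
# (PCI vendor ID 10de) with the kernel driver bound to it, or "(none)".
nvidia_binding() {
    awk '
        /^[0-9a-f]/ {                      # device header line (not indented)
            nv = ($0 ~ /\[10de:[0-9a-f]+\]/)
            if (nv) { dev = $1; order[++n] = dev; drv[dev] = "(none)" }
            next
        }
        nv && /Kernel driver in use:/ { drv[dev] = $NF }
        END { for (i = 1; i <= n; i++) print order[i], drv[order[i]] }
    '
}
```

Usage: `lspci -nnk | nvidia_binding`, once before and once after the rescan. A VGA function that still shows "(none)" after the rescan means the device is visible on the bus but the driver did not attach, and the journalctl output should say why.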
I think your problems are maybe being caused by this device:
63:00.0 Ethernet controller [0200]: Motorcomm Microelectronics. YT6801 Gigabit Ethernet Controller [1f0a:6801] (rev 01)
So, please try physically removing that device if you can, and try again.
Another aspect might be the retimers.
While retimers are not mentioned in the lspci output on Windows, they are mentioned in the Linux lspci output, and show as disabled on the Nvidia GPU card.
So, maybe linux does not support the particular retimer on that particular gpu card yet.
This is the most common internal error of the NV driver. Judging by the previous line (RmInitAdapter: Cannot initialize GSP firmware RM), this is a conflict between NV’s and Framework firmwares. Therefore check if you have the latest Framework firmware, the latest vBIOS for your 5060ti and that you are using the latest NV driver (595.71.x).
Also, have you tried rescanning the PCIe bus?
As a crude workaround, you can try the same trick as for TB mode: use the proprietary flavor of the driver (install the cuda-drivers package instead of nvidia-open) and disable GSP firmware (options nvidia NVreg_EnableGpuFirmware=0). Not sure if it will help here, however, as this may be due to the vBIOS rather than GSP, but it’s worth trying. …And of course, even if it does, the performance penalty may be up to 50% in some scenarios…
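In case it’s unclear where that options line goes: it belongs in a modprobe config file. A minimal sketch follows; the helper and the default file name are my own invention, and any *.conf file under /etc/modprobe.d should do.

```shell
# Hypothetical helper: writes the modprobe option quoted above to a file.
# The default file name is an assumption; pass another path as the first argument.
write_gsp_off_conf() {
    conf="${1:-/etc/modprobe.d/nvidia-gsp-off.conf}"
    printf 'options nvidia NVreg_EnableGpuFirmware=0\n' > "$conf"
}
```

Run it as root and reboot. Depending on the distro, you may also need to regenerate the initramfs if the nvidia module is loaded from there. Afterwards, `grep EnableGpuFirmware /proc/driver/nvidia/params` should, as far as I remember, show the value the driver actually picked up.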
Finally, I’ve just had a look and there haven’t been any Framework+Blackwell builds posted on egpu.io yet (Best External Graphics Card Builds | eGPU.io), neither Win nor Linux and neither on Intel nor on AMD. Try asking the folks who reported success in the other threads which OS and CPU they have.
It’s a built-in NIC, so I’m not sure if it’s possible to remove it or if it’s soldered. I’ll try opening the laptop later today to check, but I honestly doubt it has anything to do with the TB5 problem, as none of the 5 other laptop models on which the same problem was reported uses this NIC.
Why do you think it may be related?
Do you mean retimers on the GPU card or on the DEG2 adapter? Because, as mentioned in the first entry in the kernel bugzilla report, the same physical card works perfectly fine with this laptop on Linux when connected with any other non-TB5 adapter (USB4 UT4G, TB3 TH3P4G3, OCuLink DEG1 and DEG2).
The lspci -vvv looks like the retimers are not being used on either the DEG2 or the eGPU card.
Retimers work best on the receiving side of a link, so the DEG2 retimers are the best ones to switch on as they are on the receiving side of the USB4 thunderbolt cable.
I don’t know how to switch them on. It is normally done using firmware in the device (DEG2)
I’ve just had a look at the UT4G: they are marked as unsupported there too, but maybe TB5 needs them “more critically” than USB4? @Mario_Limonciello what do you think?
On Windows, can you do a dump of the PCIe config space in hex?
i.e.
lspci -xxxx (4 x is the best, as it captures more). It should capture 4096 bytes from each device. If you see less, try running it as an admin user or something like that.
and then do the same on Linux.
sudo lspci -xxxx
We can then look for any differences at the byte level
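For the byte-level comparison, a small script along these lines could do it. This is a sketch under assumptions: it expects the plain lspci -xxxx layout (a device header line followed by rows of "offset: 16 hex bytes"), the function names are made up, and it assumes both dumps use the same bus:device.function numbers for the same device.

```shell
# Turns an `lspci -xxxx` dump into one "device offset byte" line per byte.
flatten_dump() {
    awk '
        function hex2dec(h,    i, n) {
            h = tolower(h); n = 0
            for (i = 1; i <= length(h); i++)
                n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return n
        }
        { sub(/\r$/, "") }                 # drop CR from Windows line endings
        /^[0-9a-f]+: / {                   # hex row: "offset: b0 b1 ... b15"
            base = $1; sub(/:$/, "", base)
            for (i = 2; i <= NF; i++)
                printf "%s 0x%03x %s\n", dev, hex2dec(base) + i - 2, $i
            next
        }
        /^[0-9a-f:]+\.[0-7] / { dev = $1 } # device header, e.g. "00:00.0 ..."
    ' "$1"
}

# Diffs two dumps byte by byte; returns the exit status of diff.
diff_dumps() {
    a=$(mktemp); b=$(mktemp)
    flatten_dump "$1" > "$a"
    flatten_dump "$2" > "$b"
    diff "$a" "$b"; rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Usage: `diff_dumps linux.txt windows.txt`, with whatever file names the dumps were saved under. Each differing byte shows up as a pair of lines with the device, the config-space offset and the two values, which can then be looked up in the decoded lspci -vvv output.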
It appears that the windows lspci does not capture some of the PCIe gen 4 bits, so getting a hex dump might capture everything.
The windows output is different from the linux output of lspci -xxxx.
This might be a permissions thing.
Notice how the linux output is about 4096 bytes for each device, but the windows output is about 256 bytes for each device.
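A quick way to check how many bytes each device actually has in a dump is sketched below. The function name is made up, and it assumes the plain lspci -xxxx layout of a device header line followed by hex rows.

```shell
# Prints "device byte_count" for every device in an `lspci -xxxx` dump.
# Reads the files given as arguments, or stdin if there are none.
count_config_bytes() {
    awk '
        { sub(/\r$/, "") }                              # tolerate Windows line endings
        /^[0-9a-f]+: / { bytes[dev] += NF - 1; next }   # hex row: offset plus its bytes
        /^[0-9a-f:]+\.[0-7] / { dev = $1; order[++n] = dev; bytes[dev] = 0 }
        END { for (i = 1; i <= n; i++) print order[i], bytes[order[i]] }
    ' "$@"
}
```

Usage: `count_config_bytes windows.txt`. A count of 256 means only the legacy PCI config space was read; 4096 means the PCIe extended config space, which starts at offset 0x100 and holds the extended capabilities, was captured as well.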
Did you use run-as-administrator for the windows one?
It might just be a permissions thing, but also this device is missing on the windows side:
The IOMMU device is missing on the windows side.
I right-clicked on the PowerShell and chose “Run as administrator” or something along those lines, and the window had ‘Administrator’ in its title. I’d guess that means yes, but honestly I know nothing about Windows (before this DEG2 thing, the last time I had Windows installed was like in 2001).
Thank you for the lspci -xxxx output.
I can then use the lspci -F option on Linux to view both windows and linux output.
Was the lspci -xxxx done while the GPU was in use on Windows?
I.e. playing a game or something like that.
Some of the PCIe config on the windows side appears to be disabled when I would not expect it:
In windows: