USB-PD with Thunderbolt eGPU Resetting

I’ve had an interesting problem with my Framework for quite a while now, and I think I have finally root-caused it. I’d like to share my experience for anyone who may be equally lost with a future device, or ideally to convince Framework to handle this edge case in a software update. Sorry, this will be a wall of text, but I figure more info is better for the team, and it hits as many keywords as possible.

Now, to preface this, I understand Thunderbolt isn’t officially supported, but I’m not sure this problem is strictly related to that interface. In fact, I think it’s more related to the USB-PD handshake, especially when the laptop handshakes with a device that may not have a completely robust USB-PD controller. Since I don’t have a way to truly diagnose this on an electrical level, I think I have to just explain the situation and see if anyone disagrees.

By and large I’ve not experienced any major issues with my laptop, and my unit does seem to be typical, judging by the common fixes it has needed (trackpad not registering detents, USB-C EMI shielding touching the board). That is to say, I think this problem is repeatable and not unique to my hardware, and I’d be willing to help test that theory.

Anyways, the setup:

  • Typically I use my laptop docked, so it is always charging with the Framework OEM 60W USB-C brick. I wanted to get an eGPU for my laptop since it seemed like a lot of people were having success (especially with the Razer Core X eGPU enclosures). I personally wasn’t a huge fan of the price and ports of the Razer Core compared to the alternatives. This is how I ended up with a version of the Zotac AMP BOX. It has the extra USB ports, plenty of room for a full-size card, and 100W charging over the Thunderbolt port at a substantially lower price. Perfect. I bought it with an Nvidia RTX 2060. Now I’m hundreds of dollars into this with a GPU and locked into my future stubbornness.

The issue:

  • Sometimes I’d plug in the eGPU and it would immediately start connecting and disconnecting, with Windows playing the connect/disconnect boops continuously until a blue screen occurred. My reaction was to unplug it and try other ports on both the left and right side of the laptop. It was annoying because sometimes I’d be rewarded and it’d just work. If left unattended, it’d trash Device Manager and eventually corrupt the registry/Windows so badly that I had to reformat, as the eGPU and USB devices would no longer be recognized. At that point I thought I had somehow broken the Thunderbolt/USB chipset.

  • Sometimes everything was fine for hours and then the eGPU would just disconnect and reconnect. Sometimes it’d be recoverable, other times it’d put me back into the above situation. Completely unpredictable for the longest time and as a streamer, it’d be a real bummer to just have OBS crash out of the blue.

  • Sometimes the connection would drop very briefly, just long enough to crash the display driver and leave my mouse cursor on a black screen, with the laptop otherwise completely unusable. I’d have to hard reset to recover. This was the most commonly observed issue for me.

Observations

  • There were a couple of clues that kept me trying: the charge LED on the Framework laptop and a single LED on the eGPU adapter board. If I could catch it, the LED on the eGPU would blink or turn off.

  • Could not reliably use the laptop w/ eGPU in Best Performance energy mode while streaming or gaming. It was only a matter of time before it disconnected, so Best Battery Life was the only option. It took a long time to connect the energy mode to the random disconnects.

  • Could not use the laptop w/ eGPU if I plugged in the eGPU at anything less than 100% state of charge. This correlation also took me a while to find. 98%? It’d almost assuredly blink the LED on the eGPU and crash the connection and/or the laptop.

  • If I did not have a program using the GPU (e.g. OBS on in the background), the driver would crash and my mouse would be stuck in blackhole purgatory (cursor moves on the completely black integrated display)

Attempts

  • Driver update - Framework bundle or Nvidia basically made no difference. I’m convinced it was placebo here.

  • BIOS update - This was interesting as it included the battery charge cut-off (v3.07). As mentioned before, I have my laptop docked 90% of the time. When I set the charge limit to, I think, 80%, the laptop became completely unusable while attached to the eGPU. The laptop charge LEDs blinked erratically, the eGPU never synced, and the laptop stopped charging at odd intervals. It was so unusable I immediately reverted to a 100% charge limit to try to get my ports/charging working again. I wrote off trying this again in case I had found a massive firmware bug, but it did at least give me a hint: this seems charging related.

  • The only way I could make the eGPU work with my laptop was to charge with the Framework charger on the right side of the laptop and plug in the eGPU on the left side only after the laptop read “Plugged In” (instead of “Charging”) at 100% SOC, with the laptop in Best Battery Life mode. The laptop’s charge LED would always be lit on the same side as the 100W eGPU, so I assume the Framework 60W charger wasn’t doing much. This was the only “reliable” way to keep the eGPU stable. I used it this way for months.
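The workaround in the last bullet has enough moving parts that it may be easier to read as a checklist. Here is a minimal Python sketch of it; the function name `workaround_ready` and its parameters are made up for illustration, and the values have to be read off the laptop by hand (nothing here queries the hardware).

```python
def workaround_ready(charger_side: str, egpu_side: str, soc_percent: int,
                     battery_status: str, power_mode: str) -> list[str]:
    """Return the unmet preconditions; an empty list means the eGPU can be plugged in.

    This only encodes the conditions described in the post. It is a
    hypothetical helper, not a tool that talks to the laptop.
    """
    problems = []
    if charger_side != "right":
        problems.append("Framework charger should be on the right side")
    if egpu_side != "left":
        problems.append("eGPU should go on the left side")
    if soc_percent < 100:
        problems.append("wait for 100% state of charge")
    if battery_status != "Plugged In":
        problems.append('wait for "Plugged In" instead of "Charging"')
    if power_mode != "Best Battery Life":
        problems.append("switch to Best Battery Life mode")
    return problems


# Example: the 98% case from the observations, in Best Performance mode.
for problem in workaround_ready("right", "left", 98, "Charging", "Best Performance"):
    print(problem)
```

Every condition had to hold at once, which is part of why the pattern took so long to spot.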

Final Learnings and why I think this is a USB-PD handshake problem
Today, I received a 100W USB-C charger from a Kickstarter campaign from 2 years ago (ChargeASAP, not a recommendation yet). I had switched to using a USB-C 10 Gbps dongle hub with HDMI to avoid the stability issues whenever I didn’t need the eGPU. I had to reboot, and on a whim I decided to plug in the eGPU again with my new 100W USB-PD charger going through my USB-C hub.

For the first time ever, the laptop charge LED associated itself with the new 100W dedicated charger. I also intentionally let the laptop discharge to about 90% before plugging in the eGPU. I knew from other posts that the laptop picks the “biggest” power source and just associates with that, and the eGPU supposedly does ~85W-100W. Now that I had given the laptop a bigger charger than the eGPU to switch to, I could test out the charging theory.
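To make the theory concrete, here is a toy Python model of it. It assumes, as those other posts suggest, that the laptop simply draws from whichever attached source advertises the most wattage. The names `PowerSource` and `pick_source` are invented for illustration, the 85W figure is just the low end of the range quoted for the eGPU, and none of this is Framework's actual firmware logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PowerSource:
    name: str           # label for the attached device (illustrative)
    side: str           # "left" or "right" port on the laptop
    watts: int          # advertised USB-PD wattage (assumed values below)
    carries_pcie: bool  # True if the same cable also carries the eGPU link


def pick_source(sources: list[PowerSource]) -> Optional[PowerSource]:
    """Assumed rule: choose the source advertising the most power."""
    if not sources:
        return None
    return max(sources, key=lambda s: s.watts)


egpu = PowerSource("Zotac eGPU", "left", 85, True)
oem = PowerSource("Framework 60W", "right", 60, False)
big = PowerSource("100W charger", "right", 100, False)

# Old setup: the eGPU outranks the 60W brick, so charging rides on the eGPU cable.
old_choice = pick_source([egpu, oem])
# New setup: the dedicated charger outranks the eGPU, so charging moves off that cable.
new_choice = pick_source([egpu, big])

print(old_choice.name, old_choice.carries_pcie)
print(new_choice.name, new_choice.carries_pcie)
```

If the eGPU really advertises a full 100W, the two sources would tie, and this sketch says nothing reliable about what the laptop does in that case.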

I then bumped the laptop up to Best Performance energy settings and it continued to charge, with no eGPU LED blinking, no crashing, and no goofy behavior. The laptop charge LED also eventually went to solid white and the laptop reported 100% SOC, still on the same side as the dedicated charger. Many things along the way here would have assuredly crashed something before, so I felt confident enough to share what I think is the root cause.

I have yet to see whether the BIOS 3.07 battery SOC limiter works with this setup. This is only day 1 of feeling something other than pure skepticism.

Proposed Solutions

  1. Firmware bugfix: Framework gets their hands on my same setup and figures out exactly what is happening with the USB-PD toggling that causes the PCIe bus to crash. Maybe something dumb here is causing the charge controller(s) to reset, and that just takes the bus out with it. Maybe this is unique to my Zotac eGPU, and that is what’s violating some protocol standard. I’m more than happy to get part numbers or even do some scope debugging if anyone thinks that’s remotely practical. I’m here to help the Framework ethos.

  2. Firmware/BIOS new feature: Let users choose which port/side USB-PD is honored on. I think this is a stopgap honestly, but it’d at least give power users a get-out-of-jail card. It also rules out the whole one-cable docking life. I don’t know how reasonable it is to root-cause normal users’ situations if they had the same problem, though.

  3. Do both 1 and 2 :)

Thanks for making it this far,
Sam


I had a similar issue with my gen13 Framework running a Razer Chroma egpu.

I tried a separate charger, tweaked settings, drivers, etc., but nothing really helped. I had been using the same cable that came with my Razer Chroma and had worked flawlessly with a Dell XPS 15. After trying everything I could find, I finally purchased a new TB4 cable, and everything appears to be solid.

It appears the cable is more critical with my Framework laptop. If you are looking to buy one, make sure it has the number “4” printed on the plug.

Hope that helps anyone else having problems. The config I have been using has been solid, with just the single TB4 cable used to charge and connect to the eGPU. No separate charger required:

- Razer Chroma eGPU with Nvidia 1060

- Nvidia mobile Game Ready graphics driver. In the Nvidia Control Panel: Manage 3D settings > Power management mode = Prefer maximum performance

- “Cable Matters” 3.3 TB4 cable

I have been working through a similar issue with my bare-mainboard Framework 11th-gen i7, using a TH3P4G2 with an RTX 3060.

Sometimes it would work perfectly and boot as expected on the eGPU, sometimes it wouldn’t enable the eGPU at all, and sometimes it would enable the eGPU but not produce any display output.

A common symptom is Code 12 (“not enough resources”) displayed in Device Manager.

I had varying results if I plugged power into the right side of the mainboard, and the eGPU into the left.

I think in my case, though, the real culprit was the RTC battery (a recurring issue for me). I removed the RTC battery for five minutes, then tried again, and had no issue booting to eGPU with GPU and power both on the left side (my desired configuration).

I have a bare 11th-gen i5 in a Cooler Master case and an EXP GDC TH3P4G3 (using a 600W power supply and an RTX 2080 Super), and I cannot get the power delivery to work for me at all. If I plug in both a USB-PD power source and the eGPU, the eGPU works fine, but the eGPU doesn’t provide any power when it’s the only source. It’s terribly frustrating, since obviously there’s no battery in my setup and I can’t boot it without using up another port. I thought it might be a cable issue (and it still might be), but I’ve tried two different TB4 cables (a Cable Matters cable allegedly rated for 240W and an Anker 100W cable) and neither works.

I’ve noticed a few things about the TH3P4G3:

  • On my unit anyways, the ports are mislabeled for PD. Try the other port on the eGPU perhaps?

  • USB PD did not work well (or at all) with an older PSU. It doesn’t seem related to capacity, as the old one is rated for 650W and the new one for 500W; when I changed to an HDPlex Solid State 500W, though, PD started working, and much more consistently.

Just a few things to try based on my experiences.
