USB-PD with Thunderbolt eGPU Resetting

I’ve had an interesting problem with my Framework for quite a while now that I think I have finally root-caused. I’d like to share my experience for anyone who may be equally lost with a future device, or ideally to convince Framework to handle this edge case in a software update. Sorry, it will be a word-wall, but I figure more info is better for the team and helps hit as many keywords as possible.

Now, to preface this, I understand Thunderbolt isn’t officially supported, but I’m not sure this problem is strictly related to that interface. In fact, I think it’s more related to the USB-PD handshaking, especially when it handshakes with a device that may not have a completely robust USB-PD controller. Since I don’t have a way to truly diagnose this on an electrical level, I think I have to just explain the situation and see if anyone disagrees.

By and large I’ve not experienced any major issues with my laptop, and it seems to be an average unit based on some of the fixes it has needed (trackpad not registering detents, USB-C EMI shielding touching board). This is to say that I think this problem would be repeatable and is not unique to my hardware, but I’d be willing to help test this theory.

Anyways, the setup:

  • Typically I use my laptop docked, so it is always charging with the Framework OEM 60W USB-C brick. I wanted to get an eGPU for my laptop since it seemed like a lot of people were having success (especially with the Razer Core X eGPU enclosures). I personally wasn’t a huge fan of the price and ports of the Razer Core compared to the alternatives. This is how I ended up with a version of the Zotac AMP BOX. It has the extra USB ports, plenty of room for a full-size card, and 100W charging over the Thunderbolt port at a substantially lower price. Perfect. I bought it with an Nvidia RTX 2060. Now I’m hundreds of dollars into this with a GPU and locked into my future stubbornness.

The issue:

  • Sometimes I’d plug in the eGPU and it would immediately start connecting and disconnecting (Windows boops and all), continuously, until a blue screen occurred. My reaction was to unplug and try multiple ports, on the left vs right side of the laptop. It was annoying because sometimes I’d be rewarded and it’d just work. If left unattended, it’d trash the Device Manager and eventually corrupt the registry/Windows so badly that I had to reformat, as the eGPU and USB devices would no longer be recognized. At that point I thought I had somehow broken the Thunderbolt/USB chipset.

  • Sometimes everything was fine for hours and then the eGPU would just disconnect and reconnect. Sometimes it’d be recoverable, other times it’d put me back into the above situation. It was completely unpredictable for the longest time, and as a streamer, it’s a real bummer to have OBS crash out of the blue.

  • Sometimes it’d disconnect very briefly, just long enough to crash the display driver and leave my mouse cursor stuck on a black screen with the laptop otherwise completely unusable. I’d have to hard reset to recover. This was the most commonly observed issue for me.

Observations

  • There were a couple of clues that kept me trying: the charge LED on the Framework laptop and a single LED on the eGPU adapter board. If I could catch it, the LED on the eGPU would blink or turn off.

  • Could not reliably use the laptop w/ eGPU in Best Performance energy mode while streaming or gaming. It was only a matter of time before it disconnected, so Best Battery Life was the only option. It took a long time to connect this to the random disconnects.

  • Could not use the laptop w/ eGPU if I plugged in the eGPU at anything less than 100% state of charge (SOC). It also took me a while to find this correlation. 98%? It’d almost assuredly blink the LED on the eGPU and crash the connection and/or the laptop.

  • If I did not have a program using the GPU (e.g. OBS running in the background), the driver would crash and my mouse would be stuck in black-hole purgatory (the cursor moves on the completely black integrated display).

Attempts

  • Driver update - Framework bundle or Nvidia basically made no difference. I’m convinced it was placebo here.

  • BIOS update - This was interesting as it included the battery charge cut-off (v3.07). As mentioned before, I have my laptop docked 90% of the time. When I set the charge limit to, I think, 80%, the laptop became completely unusable while attached to the eGPU. The laptop charge LEDs blinked erratically, the eGPU never synced, and the laptop stopped charging at weird cycles. It was so unusable I immediately reverted to a 100% charge limit to try to get my ports/charging working again. I wrote off trying this again in case I had found a massive firmware bug, but it did at least give me a hint: this seems charging related.

  • I could only make the eGPU work with my laptop by charging with the Framework charger on the right side of the laptop and only plugging in the eGPU on the left side after the laptop read “Plugged In” (instead of “Charging”) at 100% SOC, with the laptop in Best Battery Life mode. The laptop’s charge LED would always be on the same side as the 100W eGPU, so I assume the Framework 60W wasn’t doing much. This was the only “reliable” way to have the eGPU remain stable. I used it this way for months.
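
The workaround above is really a checklist, so here is a minimal sketch of it as a pre-flight check. Everything in it is hypothetical and for illustration only: the `DockState` fields and the `blockers` function are made-up names, the values are typed in by hand from what Windows shows, and nothing here queries the laptop or reflects any Framework tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DockState:
    """Hand-entered snapshot of the laptop before plugging in the eGPU (hypothetical)."""
    charger_side: str     # "left" or "right": where the Framework 60W charger is plugged in
    battery_percent: int  # state of charge as shown by the OS
    power_status: str     # "Plugged In" or "Charging", as shown by Windows
    energy_mode: str      # e.g. "Best Battery Life" or "Best Performance"


def blockers(state: DockState) -> List[str]:
    """Return the reasons to hold off plugging in the eGPU; an empty list means go ahead."""
    problems = []
    if state.charger_side != "right":
        problems.append("move the Framework charger to the right side")
    if state.battery_percent < 100 or state.power_status != "Plugged In":
        problems.append("wait for 100% SOC and 'Plugged In'")
    if state.energy_mode != "Best Battery Life":
        problems.append("switch to Best Battery Life mode")
    return problems


if __name__ == "__main__":
    now = DockState("right", 98, "Charging", "Best Battery Life")
    for reason in blockers(now):
        print("not yet:", reason)
```

Only when `blockers(...)` comes back empty would the eGPU go into the left-side port.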

Final Learnings and why I think this is a USB-PD handshake problem

Today, I got a 100W USB-C charger in from a Kickstarter from 2 years ago (ChargeASAP, not a recommendation yet). I had switched to using a USB-C 10Gbps dongle hub with HDMI to avoid the stability issues whenever I didn’t need the eGPU. I had to reboot, and on a whim decided to plug in the eGPU again with my new 100W USB-PD charger going through my USB-C hub.

For the first time ever, the laptop charge LED associated itself with the new 100W dedicated charger. I also intentionally let the laptop discharge to about 90% before plugging in the eGPU. I knew from other posts that the laptop picks the “biggest” power source and just associates with that, and the eGPU supposedly does ~85W-100W. Now that I had given the laptop a bigger charger than the eGPU to switch to, I could test out the charging theory.

I then bumped the laptop up to Best Performance energy settings and it still continued to charge, with no eGPU LED blinking, no crashing or goofy behavior. The laptop charge LED also eventually went to solid white and the laptop reported 100% SOC, still on the same side as the dedicated charger. Many things along the way here would have assuredly crashed something before, so I felt confident enough to share what I think is the root cause.
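
A minimal sketch of the “biggest source wins” behavior this test relies on, assuming the laptop simply charges from whichever port advertises the most power. This is only a model of the theory, not Framework’s actual firmware logic. The function name `pick_source`, the port labels, and the 85W figure for the eGPU are illustrative assumptions.

```python
from typing import Dict, Optional


def pick_source(offers_watts: Dict[str, int]) -> Optional[str]:
    """Return the source advertising the most power, or None if nothing is attached.

    On a tie, the first source listed wins (that is just how max() behaves here).
    """
    if not offers_watts:
        return None
    return max(offers_watts, key=offers_watts.get)


# Old setup: the eGPU out-advertises the 60W brick, so the laptop charges from the eGPU.
old_setup = {"framework_60w": 60, "egpu": 85}   # 85W is an assumed figure
# New setup: the dedicated 100W charger is now the biggest source on offer.
new_setup = {"charger_100w": 100, "egpu": 85}

if __name__ == "__main__":
    print(pick_source(old_setup))
    print(pick_source(new_setup))
```

Under this model the eGPU stops being the charging source as soon as something larger is attached, which matches the charge LED moving to the dedicated charger’s side.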

I have yet to see if the BIOS 3.07 battery SOC limiter still works; it’s just day 1 of not being purely skeptical.

Proposed Solutions

  1. Firmware bugfix: Framework gets their hands on my same setup and figures out what exactly is happening with the USB-PD toggling that causes the PCIe bus to crash. Maybe something dumb here is causing the charge controller(s) to reset and that just takes the bus out with it. Maybe this is very unique to my Zotac eGPU and that’s what is violating some protocol standard. More than happy to get part numbers or even do some scope debugging if anyone thinks that’s remotely practical - here to help the Framework ethos.

  2. Firmware/BIOS new feature: Let users choose which port/side the USB-PD is honored on. I think this is a stopgap honestly, but it’d at least give power users a get-out-of-jail card. It also prevents the whole 1-cable docking life. I don’t know how reasonable it is to root-cause normal users’ situations if they had the same problem, though.

  3. Do both 1 and 2 :)
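
For proposal 2, a rough sketch of what a user-selectable charging port could look like as a policy. All names here (`choose_charging_port`, `pinned`, the port labels) are hypothetical, and the fallback rule of taking the highest-wattage offer is an assumption about current behavior, not something confirmed by Framework.

```python
from typing import Dict, Optional


def choose_charging_port(offers_watts: Dict[str, int],
                         pinned: Optional[str] = None) -> Optional[str]:
    """Pick the port to charge from.

    If the user pinned a port and a source is attached there, honor the pin.
    Otherwise fall back to the port advertising the most power (assumed default).
    """
    if pinned is not None and pinned in offers_watts:
        return pinned
    if not offers_watts:
        return None
    return max(offers_watts, key=offers_watts.get)


if __name__ == "__main__":
    offers = {"left": 85, "right": 60}  # e.g. eGPU on the left, 60W brick on the right
    print(choose_charging_port(offers))                  # no pin: biggest offer wins
    print(choose_charging_port(offers, pinned="right"))  # pin forces the 60W brick
```

Falling back when the pinned port has nothing attached keeps the laptop charging if the preferred charger is unplugged.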

Thanks for making it this far,
Sam
