[SOLVED] eGPU keeps disconnecting on Fedora 38

Alright forum peeps, I’ve got a strange one for you. I own an EG200 eGPU enclosure from Cooler Master, and inside it is an Intel Arc A770 LE. It keeps disconnecting on me! Here’s the kicker: the enclosure itself does not disconnect. Peripherals attached via its dedicated USB hub stay attached, and the dock itself stays connected. The GPU inside, on the other hand…

If the card is recognized when I disconnect my laptop, it seems 50/50 whether it will be recognized when I reconnect. The only thing that fixes it is reseating the GPU in the enclosure; then on the next boot it’ll be there waiting for me. I’m on the 12th gen, updated to the beta BIOS, and everything is up to date on Fedora 38 Kinoite. Any suggestions on what could be going wrong?

What do journalctl and dmesg say?

Does your eGPU enclosure supply enough power for both the GPU and the attached devices?

Try removing all devices attached to the GPU.
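Something like this is what I’d run, with a grep filter for the subsystems involved (the exact terms are just my guess at what’s relevant for a Thunderbolt eGPU; shown here against sample kernel-log lines so you can see what matches):

```shell
# On a live system: journalctl -k -b 0 | grep -Ei 'thunderbolt|pciehp|xhci'
# or watch live with: dmesg --follow
# Demonstrated on sample kernel-log lines:
sample='thunderbolt 1-3: new device found, vendor=0x283 device=0x1
pcieport 0000:00:07.3: pciehp: Slot(6): Card present
audit: type=1131 unrelated noise'

# Only the Thunderbolt and PCIe-hotplug lines should survive the filter
printf '%s\n' "$sample" | grep -Ei 'thunderbolt|pciehp|xhci'
```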


@Anachron I’ll report back when it does it again. I really wanted to play some games last night, so I went ahead and reseated it as usual and gamed the night away.

Edit: That didn’t take long. dmesg has some curious events. It explicitly mentions my enclosure by name and states that a card is recognized.

[Sun Apr  2 20:07:15 2023] thunderbolt 1-0:3.1: new retimer found, vendor=0x8087 device=0x15ee
[Sun Apr  2 20:07:16 2023] thunderbolt 1-3: new device found, vendor=0x283 device=0x1
[Sun Apr  2 20:07:16 2023] thunderbolt 1-3: Cooler Master Technology,Inc MasterCase EG200
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3: pciehp: Slot(6): Card present
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3: pciehp: Slot(6): Link Up
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: [8086:15ef] type 01 class 0x060400
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: enabling Extended Tags
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: supports D1 D2
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: PTM enabled, 4ns granularity
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: Adding to iommu group 18
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3: ASPM: current common clock configuration is inconsistent, reconfiguring
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: [8086:15ef] type 01 class 0x060400
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: enabling Extended Tags
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: supports D1 D2
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: Adding to iommu group 19
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: [8086:15ef] type 01 class 0x060400
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: enabling Extended Tags
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: supports D1 D2
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: Adding to iommu group 20
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: PCI bridge to [bus 7e-a5]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [io  0x0000-0x0fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [mem 0x00000000-0x000fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: PCI bridge to [bus 7f-a5]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [io  0x0000-0x0fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [mem 0x00000000-0x000fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci_bus 0000:7f: busn_res: [bus 7f-a5] end is updated to 7f
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: [8086:15f0] type 00 class 0x0c0330
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: reg 0x10: [mem 0x00000000-0x0000ffff]
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: enabling Extended Tags
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: supports D1 D2
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: Adding to iommu group 21
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: PCI bridge to [bus 80-a5]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [io  0x0000-0x0fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [mem 0x00000000-0x000fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci_bus 0000:80: busn_res: [bus 80-a5] end is updated to 80
[Sun Apr  2 20:07:16 2023] pci_bus 0000:7e: busn_res: [bus 7e-a5] end is updated to 80
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: BAR 14: assigned [mem 0x52000000-0x5e1fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: BAR 15: assigned [mem 0x6060000000-0x607bffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: BAR 13: assigned [io  0x7000-0x8fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: BAR 14: assigned [mem 0x52000000-0x580fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: BAR 15: assigned [mem 0x6060000000-0x606dffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: BAR 14: assigned [mem 0x58100000-0x5e1fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: BAR 15: assigned [mem 0x606e000000-0x607bffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: BAR 13: assigned [io  0x7000-0x7fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: BAR 13: assigned [io  0x8000-0x8fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0: PCI bridge to [bus 7f]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [io  0x7000-0x7fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [mem 0x52000000-0x580fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:01.0:   bridge window [mem 0x6060000000-0x606dffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: BAR 0: assigned [mem 0x58100000-0x5810ffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0: PCI bridge to [bus 80]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [io  0x8000-0x8fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [mem 0x58100000-0x5e1fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7e:02.0:   bridge window [mem 0x606e000000-0x607bffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0: PCI bridge to [bus 7e-80]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [io  0x7000-0x8fff]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [mem 0x52000000-0x5e1fffff]
[Sun Apr  2 20:07:16 2023] pci 0000:7d:00.0:   bridge window [mem 0x6060000000-0x607bffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3: PCI bridge to [bus 7d-a5]
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3:   bridge window [io  0x7000-0x8fff]
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3:   bridge window [mem 0x52000000-0x5e1fffff]
[Sun Apr  2 20:07:16 2023] pcieport 0000:00:07.3:   bridge window [mem 0x6060000000-0x607bffffff 64bit pref]
[Sun Apr  2 20:07:16 2023] pcieport 0000:7d:00.0: enabling device (0000 -> 0003)
[Sun Apr  2 20:07:16 2023] pcieport 0000:7e:01.0: enabling device (0000 -> 0003)
[Sun Apr  2 20:07:16 2023] pcieport 0000:7e:02.0: enabling device (0000 -> 0003)
[Sun Apr  2 20:07:16 2023] pci 0000:80:00.0: enabling device (0000 -> 0002)
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: xHCI Host Controller
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: new USB bus registered, assigned bus number 5
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: hcc params 0x200077c1 hci version 0x110 quirks 0x0000000200009810
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: xHCI Host Controller
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: new USB bus registered, assigned bus number 6
[Sun Apr  2 20:07:16 2023] xhci_hcd 0000:80:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[Sun Apr  2 20:07:16 2023] usb usb5: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.02
[Sun Apr  2 20:07:16 2023] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[Sun Apr  2 20:07:16 2023] usb usb5: Product: xHCI Host Controller
[Sun Apr  2 20:07:16 2023] usb usb5: Manufacturer: Linux 6.2.8-300.fc38.x86_64 xhci-hcd
[Sun Apr  2 20:07:16 2023] usb usb5: SerialNumber: 0000:80:00.0
[Sun Apr  2 20:07:16 2023] hub 5-0:1.0: USB hub found
[Sun Apr  2 20:07:16 2023] hub 5-0:1.0: 2 ports detected
[Sun Apr  2 20:07:16 2023] usb usb6: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.02
[Sun Apr  2 20:07:16 2023] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[Sun Apr  2 20:07:16 2023] usb usb6: Product: xHCI Host Controller
[Sun Apr  2 20:07:16 2023] usb usb6: Manufacturer: Linux 6.2.8-300.fc38.x86_64 xhci-hcd
[Sun Apr  2 20:07:16 2023] usb usb6: SerialNumber: 0000:80:00.0
[Sun Apr  2 20:07:16 2023] hub 6-0:1.0: USB hub found
[Sun Apr  2 20:07:16 2023] hub 6-0:1.0: 2 ports detected
[Sun Apr  2 20:07:21 2023] usb usb2-port4: attempt power cycle
[Sun Apr  2 20:07:30 2023] usb usb2-port4: unable to enumerate USB device
[Sun Apr  2 20:11:37 2023] pci_bus 0000:7f: Allocating resources
[Sun Apr  2 20:11:37 2023] pci_bus 0000:80: Allocating resources
[Sun Apr  2 20:11:37 2023] pci_bus 0000:7f: Allocating resources
[Sun Apr  2 20:11:37 2023] pci_bus 0000:80: Allocating resources
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: [8086:4fa0] type 01 class 0x060400
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: reg 0x10: [mem 0x00000000-0x007fffff 64bit pref]
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: PME# supported from D0 D3hot D3cold
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.3 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: Adding to iommu group 22
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[Sun Apr  2 20:14:05 2023] pci_bus 0000:80: busn_res: [bus 80] end is updated to 80
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: devices behind bridge are unusable because [bus 80] cannot be assigned for them
[Sun Apr  2 20:14:05 2023] pcieport 0000:7e:01.0: bridge has subordinate 7f but max busn 80
[Sun Apr  2 20:14:05 2023] pci_bus 0000:7f: Allocating resources
[Sun Apr  2 20:14:05 2023] pci 0000:7f:00.0: BAR 0: assigned [mem 0x6060000000-0x60607fffff 64bit pref]
[Sun Apr  2 20:14:05 2023] pci_bus 0000:80: Allocating resources
[Sun Apr  2 20:14:05 2023] pcieport 0000:7f:00.0: enabling device (0000 -> 0002)
[Sun Apr  2 20:14:05 2023] pcieport 0000:7f:00.0: bridge configuration invalid ([bus 80-80]), reconfiguring
[Sun Apr  2 20:14:05 2023] pci_bus 0000:80: busn_res: [bus 80] end is updated to 80
[Sun Apr  2 20:14:05 2023] pcieport 0000:7f:00.0: devices behind bridge are unusable because [bus 80] cannot be assigned for them
[Sun Apr  2 20:14:05 2023] pcieport 0000:7e:01.0: bridge has subordinate 7f but max busn 80
[Sun Apr  2 20:14:05 2023] pci_bus 0000:7f: Allocating resources
[Sun Apr  2 20:14:05 2023] pci_bus 0000:80: Allocating resources
[Sun Apr  2 20:14:25 2023] thunderbolt 0000:00:0d.3: 0:3: failed to reach state TB_PORT_UP. Ignoring port...
[Sun Apr  2 20:14:25 2023] thunderbolt 0000:00:0d.3: 0:3: lost during suspend, disconnecting
[Sun Apr  2 20:14:25 2023] thunderbolt 1-0:3.1: retimer disconnected
[Sun Apr  2 20:14:25 2023] thunderbolt 1-3: device disconnected
[Sun Apr  2 20:14:26 2023] pcieport 0000:7d:00.0: not ready 1023ms after resume; giving up
[Sun Apr  2 20:14:26 2023] pcieport 0000:7d:00.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] pcieport 0000:00:07.3: pciehp: Slot(6): Card not present
[Sun Apr  2 20:14:26 2023] pcieport 0000:7e:01.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] pcieport 0000:7f:00.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] pcieport 0000:7e:02.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: Unable to change power state from D3cold to D0, device inaccessible
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: Controller not ready at resume -19
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: PCI post-resume error -19!
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: HC died; cleaning up
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: remove, state 4
[Sun Apr  2 20:14:26 2023] usb usb6: USB disconnect, device number 1
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: USB bus 6 deregistered
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: remove, state 4
[Sun Apr  2 20:14:26 2023] usb usb5: USB disconnect, device number 1
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: Host halt failed, -19
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: Host not accessible, reset failed.
[Sun Apr  2 20:14:26 2023] xhci_hcd 0000:80:00.0: USB bus 5 deregistered
[Sun Apr  2 20:14:26 2023] pci 0000:7f:00.0: Removing from iommu group 22
[Sun Apr  2 20:14:26 2023] pci_bus 0000:7f: busn_res: [bus 7f] is released
[Sun Apr  2 20:14:26 2023] pci 0000:7e:01.0: Removing from iommu group 19
[Sun Apr  2 20:14:26 2023] pci 0000:80:00.0: Removing from iommu group 21
[Sun Apr  2 20:14:26 2023] pci_bus 0000:80: busn_res: [bus 80] is released
[Sun Apr  2 20:14:26 2023] pci 0000:7e:02.0: Removing from iommu group 20
[Sun Apr  2 20:14:26 2023] pci_bus 0000:7e: busn_res: [bus 7e-80] is released
[Sun Apr  2 20:14:26 2023] pci 0000:7d:00.0: Removing from iommu group 18
[Sun Apr  2 20:14:34 2023] usb usb2-port4: attempt power cycle

Additionally, I managed to get the card working without reseating it by power-cycling the enclosure. I had tried this before without success, so I’m not sure why it worked this time. On top of that, the attached peripherals were non-functional (despite obviously receiving power) until I reseated the USB-A expansion card that connects to the hub in the enclosure. All very odd to me.

Well, now it’s even weirder. The eGPU gets recognized without issue now, but I can’t hotplug it. Every time I unplug the cable, no matter the order of steps I take, the system freezes and the mouse becomes unresponsive. Once it even resulted in a kernel panic!

Is it just your window manager crashing or an actual system freeze? Can you get into a TTY with Ctrl+Alt+F2?

I initially switched to Wayland because it handles hot-unplug more gracefully than Xorg.

@Adrian_Joachim Ah, thank you. I couldn’t remember the shortcut to reset my WM and kept forgetting to google it; I’ll try that the next time it happens. I’m on Wayland, and most of the programs I use are Wayland-native.

Did you start your WM with the eGPU attached or without it? If you pull the GPU out from under the WM, it could definitely crash.

I’ve done both. I’ve also tried making sure the laptop display is active before pulling the cable, to ensure the iGPU has taken over. I get a few seconds of visible mouse movement and then it freezes. I haven’t used my eGPU yet today, so I’ll check tonight.

In my case (T480s + ghetto-rigged 5700 XT eGPU), it works best when attaching the eGPU after starting the window manager. I do need to tell the driver before detaching if I want to be able to smoothly hotplug it afterwards, though it doesn’t crash the WM if I don’t.

How do you do that? I got hotplug to work, but unplugging and replugging does not.

Without telling the driver, hotplugging and unplugging works once in my case, and it never crashes the WM. How you tell the driver is driver-dependent, and I only know how to do it for AMD GPUs, so I can’t help you with your Intel one there.

It did, however, crash my Xorg WM back when I was running that, which was the main reason I switched to Wayland.

If you just unplug the eGPU, the driver is going to get very unhappy, which is why replugging does not work. In my case the eGPU was the only card using the amdgpu driver, so I initially just tried reloading that, which sometimes worked and sometimes didn’t because it was still in use. Then I figured out how to tell it to unbind the specific GPU, and I didn’t even need to reload the driver.
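For the AMD case, roughly this is what it looks like on my end: unbind the card from amdgpu via sysfs, then remove it from the PCI tree so the unplug is clean, and rescan after replugging. A sketch only; the PCI address is an example (find yours with lspci), and the sysfs root is parameterized purely so the snippet can be dry-run against a fake tree instead of a real /sys:

```shell
# Cleanly detach an eGPU from the amdgpu driver before unplugging.
detach_egpu() {
  gpu="$1"            # PCI address, e.g. 0000:03:00.0 (example; see lspci)
  sysfs="${2:-/sys}"  # real systems use /sys; overridable for dry runs
  # release the device from the amdgpu driver
  echo "$gpu" > "$sysfs/bus/pci/drivers/amdgpu/unbind"
  # drop the device from the PCI tree entirely
  echo 1 > "$sysfs/bus/pci/devices/$gpu/remove"
}

# After replugging, ask the kernel to rediscover the card.
rescan_pci() {
  sysfs="${1:-/sys}"
  echo 1 > "$sysfs/bus/pci/rescan"
}
```

On a real system you’d run these as root with the default /sys root.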

Does the Intel driver even support hotplug? The AMD one didn’t for a while.

I have an AMD GPU on Xorg, but unbinding causes several issues: either Xorg crashes, the eGPU crashes, and/or the kernel simply doesn’t accept the command.

I have an RX 6600, so maybe it’s GPU-related. Hotplug support on AMD is quite new and seems unstable, to say the least.

Interesting. Once I figured it out, it was surprisingly stable, though Xorg is a lot more sensitive about stuff like that.

It isn’t that new; pretty sure it was added a couple of years ago.

@Adrian_Joachim I’d cautiously say the Intel driver does support it, since for a brief amount of time everything was functioning as intended.

I’m not pissed off to the nth degree about all this, but I’d be lying if I said I wasn’t more than a little annoyed that Thunderbolt certification appears to be meaningless here. I went the least janky route to assure myself of hardware compatibility and smooth functioning. That has clearly not been the case. The only thing that solved my first problem was reseating the GPU and just not putting the side panels and top panel back on the enclosure, which is kinda dumb. The kernel panic was weird, but I can accept it as a one-off. This WM stuff, if that’s what it is, is equally dumb, although maybe this behavior wouldn’t exist on Windows and I’m just getting the shaft on UX because I dare to use Linux… wouldn’t be the first time. And on top of all that, every stage here is Intel: the driver work is done by Intel, the CPU is Intel, Thunderbolt is an Intel certification. It’s all Intel.

Why would Thunderbolt certification have anything to do with any of this? The TB certification covers the Thunderbolt part, which seems to work just fine here.

First-gen Intel dGPU and “least janky” don’t really go together in my head XD

But yeah, eGPU on Linux is not for the faint of heart at this point.

New idea to test tomorrow: I’ll remove the HDMI cable first and then the Thunderbolt cable. It only messes up when the display is connected.

Well, I have a solution now. I have no idea why it works, but it does: removing the display cable first and then the TB cable results in seamless hotplugging.
