Hey everyone, I’m having pretty severe issues with my eGPU setup and could use some help diagnosing and hopefully fixing the problem.
My setup:
- Framework Laptop 13 w/ Ryzen 7640U (BIOS version 03.05)
- 96 GB DDR5 RAM
- Ubuntu 22.04 (Wayland) with the 6.5.0-1025-oem kernel
- Nvidia RTX 3090 in Razer Core X eGPU enclosure
The issue:
My eGPU randomly crashes and becomes unusable. The behavior is highly inconsistent, making it difficult to pinpoint the cause. Here’s what I’ve observed:
- Sometimes it crashes during high load, especially when running Stable Diffusion (using the Fooocus web UI as a frontend). This can happen even after just 2 minutes of use.
- Other times, it runs fine for 20+ minutes at almost 100% load without issues.
- Occasionally, it crashes even when there’s barely any load.
Because crashes also happen at near-idle, high GPU usage alone doesn’t seem to be the cause.
Additional observations:
- To get the eGPU detected, I have to power off the laptop, plug in the eGPU, and then turn the laptop on. This works about 8 out of 10 times.
- Once detected, it works fine initially. I use the 3090 solely for AI workloads, like local LLMs, Stable Diffusion and others.
- I’ve installed NVIDIA drivers 535.183.01 and blacklisted the open-source Nouveau driver.
- When the eGPU crashes, the whole system becomes stuttery, with the laptop freezing regularly or sometimes outright restarting.
I’ve attached two log files:
- logs_before_crash.txt: Contains journalctl, dmesg and other logs collected when the eGPU was working properly.
- logs_after_crash.txt: Contains the same logs collected immediately after the eGPU crash.
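In case it helps anyone comparing the two files, here is a minimal sketch of how the relevant lines could be filtered out. It assumes the NVIDIA driver reports faults as `NVRM: Xid` lines in the kernel log; the other patterns (`fallen off the bus`, `AER`, `thunderbolt`, `pcieport`) and the function name `find_suspect_lines` are just my guesses at what is worth looking for, not a complete list.

```python
import re
import sys

# Assumed patterns for GPU / PCIe / Thunderbolt trouble in kernel logs.
# These are guesses at useful keywords, not an exhaustive list.
PATTERNS = re.compile(
    r"NVRM: Xid|fallen off the bus|\bAER\b|thunderbolt|pcieport",
    re.IGNORECASE,
)


def find_suspect_lines(lines):
    """Return every line that matches one of the patterns, without its newline."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a log file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for hit in find_suspect_lines(fh):
                print(f"{path}: {hit}")
```

Saved under a hypothetical name like `filter_logs.py`, it could be run as `python3 filter_logs.py logs_before_crash.txt logs_after_crash.txt` to print only the matching lines from each file.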
I’ve tried updating drivers and checking connections, but the issue persists. The randomness of the crashes and the system-wide impact make the situation particularly frustrating.
Any ideas on what could be causing this or what else I should check? Thanks in advance for any help!