I recently tried running Stable Diffusion to test a stubborn eGPU, and while that still isn’t working, I did manage to get it working on the AMD Framework iGPU. I thought I would share because it is pretty nifty, and because I did a lot of unnecessary things along the way; writing it up could save other people some trouble if they’re interested.
Your mileage may vary as needed for whatever distro you’re using, but this is what I did.
- The main struggle is the amount of RAM available to the iGPU. There are a few ways to deal with this:
  a. Enable game mode in the BIOS, which will allot 4GB of RAM as VRAM for the iGPU.
  b. Trim down any VRAM-hogging programs. I noticed my browser was the biggest culprit, even with only one empty tab open. If you have any Electron apps, those will probably be big problems too. I decided to set one browser to be CPU-only and use that while using the iGPU. I used Firefox, which can be set to avoid the GPU by opening Settings, scrolling to near the bottom, and unchecking both “Performance” checkboxes. This doesn’t make for a great multimedia browsing experience, but I have another browser for that.
  c. You can check VRAM usage with radeontop (which we will install in a later step). You’ll need roughly 2.5GB free, though if you get close to filling it up, the iGPU seems more likely to crash. I have 320MB used at the moment, mostly by GNOME if I had to guess.
- Decide where to install. I am using Silverblue and Distrobox, and decided to make a container for this. I used Ubuntu 22.04 because it is supposed to have ideal compatibility with ROCm for AI on AMD hardware, though I’m not sure how much it matters. Much to my surprise, I did not have to deal with /dev/kfd permissions or anything. Simply:
distrobox create --name igpu --home ~/podhome/igpu --image ubuntu:22.04
distrobox enter igpu
(I like to keep the home directory separate).
- Install some things if you don’t have them:
sudo apt install git radeontop
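Once radeontop is installed, you don’t have to watch its interactive UI; it also has a dump mode that prints text samples, which is handy for a quick VRAM check. A sketch of the idea follows; the sample line below is made up for illustration, and the exact field layout can vary by radeontop version:

```shell
# radeontop can dump samples as text instead of the ncurses UI:
#   radeontop -d - -l 1      (dump one sample to stdout, then quit)
# A dump line looks roughly like this (values here are made up):
sample='1700000000.0: bus 03, gpu 4.17%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%, vram 8.12% 332.50mb, gtt 1.50% 230.00mb'

# Pull out just the VRAM field:
echo "$sample" | grep -o 'vram [^,]*'   # -> vram 8.12% 332.50mb
```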
- Set up a Python virtual environment. This assumes you already have Python (3.11 in my case, but anything recent should work):
cd ~
python -m venv pyenv
./pyenv/bin/pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
git clone https://github.com/comfyanonymous/ComfyUI.git
./pyenv/bin/pip install -r ComfyUI/requirements.txt
- Then download a Stable Diffusion checkpoint, such as Copax TimeLessXL - SDXL1.0 - V8 | Stable Diffusion Checkpoint | Civitai, and mv it into ComfyUI/models/checkpoints/.
- Now run ComfyUI with options that maximally limit the amount of VRAM that gets used:
cd ComfyUI
HSA_OVERRIDE_GFX_VERSION=11.0.0 ../pyenv/bin/python main.py --novram --cpu-vae
Note that we use HSA_OVERRIDE_GFX_VERSION=11.0.0 because the 780M iGPU is gfx1103 (version 11.0.3), which ROCm does not support; in my experience, using the override to tell ROCm to pretend it is gfx1100 works without issue.
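As I understand the naming convention, the digits in a gfx target are the version number, with the last two characters read as single hex digits (minor and stepping), so gfx1103 is 11.0.3 and gfx1100 is 11.0.0. A small bash illustration of that mapping (my own sketch of the convention, not ROCm code):

```shell
# Translate a gfx target name into the dotted string HSA_OVERRIDE_GFX_VERSION
# expects. The last two characters are single hex digits, e.g. gfx90a -> 9.0.10.
gfx_to_override() {
    local digits=${1#gfx}
    local major=${digits%??}               # everything before the last two chars
    local minor=$((16#${digits: -2:1}))    # second-to-last char, as hex
    local step=$((16#${digits: -1}))       # last char, as hex
    echo "$major.$minor.$step"
}

gfx_to_override gfx1103   # -> 11.0.3 (the 780M)
gfx_to_override gfx1100   # -> 11.0.0 (what we pretend to be)
```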
- Mostly done! You can go to http://localhost:8188 and see the ComfyUI interface. You can try a simple workflow like this one: { "last_node_id": 10, "last_link_id": 18, "nodes": [ { "id" - Pastebin.com
- I said mostly done because you may experience a little “warm-up” issue. Often, when generating the first image, the screen will go black for a second, return for a second, go black again for a second, and then return. This is the iGPU crashing and resetting, as can be seen in dmesg. It resets successfully, but after that Stable Diffusion will be hung and you’ll usually need to restart it. It typically works on the second try.
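If you want to confirm the blackout really was a GPU reset, the kernel log is where it shows up. The excerpt below is a made-up approximation of what such lines look like (the exact wording varies by kernel version); on the real system, the grep runs against `sudo dmesg`:

```shell
# On the real system: sudo dmesg | grep -iE 'amdgpu.*reset'
# Made-up excerpt of what the reset looks like in the log:
log='[ 1234.5] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[ 1236.7] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!'

echo "$log" | grep -icE 'amdgpu.*reset'   # -> 2 (matching lines)
```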
Inference is much faster on the iGPU than on the CPU. After the first image (which goes slower due to model loading), it took 195.3 seconds to generate an image with the workflow linked above. Using ComfyUI’s --cpu option, the same workflow took 1215.28 seconds. I have energy-saving settings on, so both could be a bit faster, but in any case the iGPU is over 6x faster.
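For what it’s worth, the arithmetic behind that "over 6x" claim, using the two timings above:

```shell
# Seconds per image from the two runs above (after model loading):
igpu=195.3     # iGPU, --novram --cpu-vae
cpu=1215.28    # ComfyUI --cpu

awk -v c="$cpu" -v g="$igpu" 'BEGIN { printf "%.2fx faster\n", c / g }'   # -> 6.22x faster
```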
I only wish there were a way to further increase the RAM available to the GPU, as it could be faster (and more stable) if everything, including the VAE (variational autoencoder), could be offloaded to the iGPU. 4GB seems like an arbitrary limit. I guess I also wish I could use the AI co-processor built into this chip, which is sitting idle.
- Bonus: I also updated my kernel, though I doubt this was necessary for the iGPU; it was needed for an eGPU to connect. In any case, so I don’t forget: on Silverblue one can download the Rawhide kernels without debugging enabled from here: Index of /pub/alt/rawhide-kernel-nodebug/x86_64 (kernel, kernel-core, kernel-modules, kernel-modules-core, and kernel-modules-extra are what I used). Then overlay with:
sudo rpm-ostree override replace ./kernel*.rpm
Sample image from the above workflow (I know nothing about using Stable Diffusion; I’m just here to test some ROCm functionality, so yes, it is bad):