Just want to share how to get Ollama running on the iGPU.
Used container
The script uses a container provided by the linked repository. I had quite a similar script to the one from the reddit post, but mine used version 6.4.1 instead of the latest, so don't just copy mine.
Original reddit post
I struggled to get ollama with GPU support working on my new FW13 with the AMD Ryzen AI 9 HX 370, and then stumbled over the reddit post mentioned above.
Missing permissions on bluefin
I tried to get it to work on Bluefin with podman, but the container did not have access to the /dev/dri folder, so I had to add one more device permission.
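For anyone running into the same thing, here is a minimal sketch of the flags I mean, using the stock ollama/ollama:rocm image purely for illustration (the container from the linked repository needs the same device flags):

# pass both GPU device nodes into the container: /dev/kfd (ROCm compute) and /dev/dri (render nodes)
# --group-add keep-groups keeps the host's render/video group membership in the rootless container
# --security-opt label=disable stops SELinux from blocking the device nodes on Bluefin
podman run -d --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add keep-groups \
  --security-opt label=disable \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  docker.io/ollama/ollama:rocm

In a quadlet the same thing should be two AddDevice= lines plus SecurityLabelDisable=true and PodmanArgs=--group-add keep-groups, if I read the docs right.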
Yep this works for me on Bazzite. I’m the one who posted the quadlet suggestion in the reddit thread too.
Now, if we can just find a container or something to get stable diffusion and text-to-video (or image-to-video) working as well, these “AI” 300 series chips will actually do “AI” (or at least LLMs)
I am not yet familiar with stable diffusion et al., but that would definitely be nice!
How is your experience using the GPU so far? Actually, it seems like my CPU is faster for inference. I am using gemma3:4b and get around 21 t/s on the CPU, while the GPU only reaches around 13 t/s. Each is utilized at around 70% when used on its own.
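For reference, a quick way to get comparable numbers, assuming a recent stock ollama install:

# prints prompt eval rate and eval rate (t/s) after the answer
ollama run gemma3:4b --verbose "Explain what an iGPU is in one paragraph."
# shows whether the loaded model sits on the GPU, the CPU, or is split between both
ollama ps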
Btw., in my BIOS I have set the iGPU memory allocation to medium, which is around 16 GB for me.
Not sure if that is relevant.
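If you want to double-check what the driver actually sees, a hedged suggestion using the standard ROCm tools (not something from the guide above):

# total and used VRAM as seen by ROCm; should match the BIOS carve-out (about 16 GB here)
rocm-smi --showmeminfo vram
# GTT, i.e. how much regular system memory the iGPU can additionally borrow
rocm-smi --showmeminfo gtt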
Haven’t benchmarked it on CPU vs GPU, but the performance was acceptable for me running through that podman container. Like you, I also have my iGPU memory set to medium, and with my 64 GB of system memory it allocated 16 GB of it as dedicated VRAM.
It did spit out a gibberish image when I downloaded one of the models from the in-app model browser, but it ran on the GPU the whole time, so that’s an improvement. It might have been due to my render settings, though. It’s possible that if I tweaked things I’d get a good output.
If other folks want to experiment with this and see if they can get it to output a non-gibberish image using the GPU that’d be cool, since I won’t have time to explore more for a few days at least.
No problems running on Bazzite with a Fedora distrobox for ROCm. I suspect distrobox is going to be the easiest way to do this without needing to fiddle with anything further.
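A minimal sketch of that setup, with the box name and package list being my own guesses (ROCm package names vary a bit between Fedora releases):

# create and enter a Fedora box; distrobox shares /dev/dri and /dev/kfd with the host by default
distrobox create --name rocm-box --image registry.fedoraproject.org/fedora-toolbox:42
distrobox enter rocm-box
# inside the box: install the basic ROCm userspace tools and check that the iGPU shows up
sudo dnf install -y rocminfo rocm-smi
rocminfo | grep -i gfx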
I tried both approaches, but both times, when I run the tutorial model in ComfyUI, its process is killed with:
0%| | 0/20 [00:00<?, ?it/s]
:0:rocdevice.cpp :2993: 6067000492 us:
Callback: Queue 0x7f57f0600000 aborting with error :
HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
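From what I understand, HSA_STATUS_ERROR_INVALID_ISA means the ROCm build being used has no kernels for this iGPU’s gfx target. A first thing to check (untested on my side; the override value is the one mentioned for Lemonade further down):

# see which gfx target the ROCm runtime reports for the iGPU
rocminfo | grep -i gfx
# make ROCm treat the iGPU as a supported gfx11 target before launching ComfyUI
export HSA_OVERRIDE_GFX_VERSION=11.0.0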
Did you try the tutorial model? (The first one that is visible when opening ComfyUI for the first time.)
I am wondering if it has anything to do with the model I had to download. I am new to stable diffusion models and not sure yet what is recommended.
Lemonade SDK
Btw., I also tried the Lemonade SDK via an Ubuntu toolbox, and it runs smoothly. Unfortunately I was not able to benchmark t/s via Open WebUI, but I had the feeling the LLM was a tiny bit faster.
I set up an ubuntu-toolbox and followed the instructions on this site (Linux llama.cpp):
In my toolbox terminal I installed Miniforge as described, used the commands on the website, and afterwards started the server via:
lemonade-server-dev serve
(If that does not work, consider exporting this variable: export HSA_OVERRIDE_GFX_VERSION=11.0.0)
Then I could access it via localhost:8000, I think, and downloaded a model. It is also easy to integrate into Open WebUI via the OpenAI API connector.
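Condensed from memory, the whole thing was roughly this (the toolbox image name is my assumption, and the actual Miniforge/Lemonade install commands are the ones from the linked page):

# create and enter an Ubuntu toolbox on the immutable host
toolbox create --image quay.io/toolbx/ubuntu-toolbox:24.04 ubuntu-toolbox
toolbox enter ubuntu-toolbox
# inside: install Miniforge and Lemonade as described on the linked page, then
export HSA_OVERRIDE_GFX_VERSION=11.0.0   # only if the server does not pick up the iGPU
lemonade-server-dev serve                # afterwards reachable at localhost:8000 (per above)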
I am just not sure if it utilizes the NPU, because I am missing a tool to monitor it.
OK, to get rid of my issue with stable diffusion I have to use the argument --force-fp32, since float16 does not seem to work. When running in the toolbox I have to set it via:
python main.py --force-fp32
I guess for the podman version it would be via this line:
-e CLI_ARGS="--force-fp32" \
but I haven't tried that yet.
I just wanted to say thank you for your post. I used your container to get Ollama running. I have Framework Desktops on order, but I got this running on a non-Framework machine (also a Strix Halo 395). I’d ideally like to get Ollama (or vLLM) running across 4 or 6 Strix Halo machines, but that’s another problem.