Ollama with GPU on Linux Framework 13 AMD Ryzen HX 370

Hi,

I just want to share how to get Ollama running on the iGPU.

Used container

The command below uses a container image provided by the repository linked in the Reddit post. I already had a quite similar script, but it used the 6.4.1 tag instead of the latest one, so don't use that version.

Original reddit post

I struggled to get Ollama running with GPU support on my new FW13 with the AMD Ryzen HX 370, until I stumbled over a Reddit post describing a similar setup.

Missing permissions on bluefin

I tried to get it to work on Bluefin with Podman, but the container did not have access to the devices in /dev/dri. I had to add one more option:

podman run --name ollama \
  -v /var/home/martind/OllamaModels/:/root/.ollama \
  -e OLLAMA_FLASH_ATTENTION=true \
  -e HSA_OVERRIDE_GFX_VERSION="11.0.0" \
  -e OLLAMA_KV_CACHE_TYPE="q8_0" \
  -e OLLAMA_DEBUG=0 \
  --device /dev/kfd \
  --device /dev/dri \
  --security-opt label=type:container_runtime_t \
  -p 127.0.0.1:11434:11434 \
  ghcr.io/rjmalagon/ollama-linux-amd-apu:latest \
  serve

Compared to the reddit post, I had to add this: --security-opt label=type:container_runtime_t

([Fedora Silverblue] Ollama with ROCM - failed to check permission on /dev/kfd: open /dev/kfd: invalid argument - #10 by garrett - Fedora Discussion)
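A quick sanity check that the container actually sees the iGPU (assuming the container name ollama from the command above):

podman logs ollama | grep -iE "rocm|gfx|amdgpu"   # startup logs should mention the detected GPU
podman exec -it ollama ollama ps                  # after loading a model, the PROCESSOR column should say GPU, not CPU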

Which Linux distro are you using? Fedora 41 with bluefin-dx:gts

Which kernel are you using? 6.14.6

Which BIOS version are you using? 3.0.3

Which Framework Laptop 13 model are you using? AMD Ryzen™ AI 300 Series

Yep, this works for me on Bazzite. I’m the one who posted the quadlet suggestion in the Reddit thread too.
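For anyone who wants the quadlet variant: the same flags just move into a unit file, roughly like this (an untested sketch based on the podman run command above; adjust the model path, the file name and location follow the usual quadlet conventions):

# ~/.config/containers/systemd/ollama.container
[Unit]
Description=Ollama with ROCm on the AMD iGPU

[Container]
ContainerName=ollama
Image=ghcr.io/rjmalagon/ollama-linux-amd-apu:latest
Exec=serve
Volume=/var/home/martind/OllamaModels/:/root/.ollama
Environment=OLLAMA_FLASH_ATTENTION=true
Environment=HSA_OVERRIDE_GFX_VERSION=11.0.0
Environment=OLLAMA_KV_CACHE_TYPE=q8_0
AddDevice=/dev/kfd
AddDevice=/dev/dri
SecurityLabelType=container_runtime_t
PublishPort=127.0.0.1:11434:11434

[Service]
Restart=always

[Install]
WantedBy=default.target

After a systemctl --user daemon-reload you can start it with systemctl --user start ollama; the [Install] section makes it come up with the user session.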

Now, if we can just find a container or something to get Stable Diffusion and text-to-video (or image-to-video) working as well, these “AI” 300 series chips will actually do “AI” (or at least LLMs).


What about NPU support?


I am not yet familiar with Stable Diffusion et al., but that would definitely be nice!

How is your experience using the GPU so far? Actually, it seems like my CPU is faster for inference. I am using gemma3:4b and get around 21 t/s on the CPU, while the GPU only reaches around 13 t/s. Both are utilized at around 70% when used individually.

Btw., in my BIOS I have set the iGPU memory allocation to Medium, which is around 16 GB for me.
Not sure if that is relevant.
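In case someone wants to reproduce the comparison: ollama run prints per-response timings when you pass --verbose, including an eval rate in tokens per second. Inside the container that would be something like:

podman exec -it ollama ollama run gemma3:4b --verbose
# after each reply the printed stats include "eval count", "eval duration"
# and an "eval rate: ... tokens/s" line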

Haven’t benchmarked it on CPU vs GPU, but performance was acceptable for me running through that Podman container. Like you, I also have my iGPU memory set to Medium, and with my 64 GB of system memory it allocated 16 GB of it as dedicated VRAM.

I DID get ComfyUI to run and go through the GPU with this container:
ComfyUI-Docker/rocm/README.adoc at main · YanWenKun/ComfyUI-Docker · GitHub.

After building it according to the instructions on the repo… I ran it with:

podman run -it --rm \
  --name comfyui-rocm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add=video --ipc=host --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --security-opt label=disable \
  -p 8188:8188 \
  -v "$(pwd)"/storage:/root \
  -e CLI_ARGS="" \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
  yanwk/comfyui-boot:rocm

It did spit out a gibberish image when I downloaded one of the models from the in-app model browser, but it ran through the GPU the whole time, so that’s an improvement. It might have been due to the settings I had for the render, though. It’s possible that if I tweaked things I’d get a good output.

If other folks want to experiment with this and see if they can get it to output a non-gibberish image using the GPU that’d be cool, since I won’t have time to explore more for a few days at least.


No problems running on Bazzite using a Fedora distrobox for ROCm. I’d suggest a distrobox is going to be the easiest way to do this without needing to fiddle with anything further.
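Roughly like this (just a sketch; the box name is arbitrary and the Fedora ROCm package names may differ slightly from what I list here):

distrobox create --name rocm-box --image registry.fedoraproject.org/fedora-toolbox:41
distrobox enter rocm-box
# inside the box: install the ROCm userspace from the Fedora repos (package names may vary)
sudo dnf install rocminfo rocm-hip rocm-smi
rocminfo | grep gfx   # should list the iGPU's gfx target alongside the CPU agent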

Hi,

ComfyUI issues

I tried both approaches, but both times when I run the tutorial model in ComfyUI, its process is killed with:

  0%|                                                    | 0/20 [00:00<?, ?it/s]
:0:rocdevice.cpp            :2993: 6067000492 us:  
Callback: Queue 0x7f57f0600000 aborting with error : 
HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f

Did you try the tutorial model? (The first one that is visible when opening ComfyUI for the first time.)
I am wondering if it has anything to do with the model I had to download. I am new to Stable Diffusion models and not sure yet what is recommended.

Lemonade SDK

Btw., I also tried the Lemonade SDK via an Ubuntu toolbox; it runs smoothly. Unfortunately I was not able to benchmark t/s via Open WebUI, but I had the feeling the LLM was a tiny bit faster.
I set up an Ubuntu toolbox and followed the instructions on this site (Linux llama.cpp):

In my toolbox terminal I installed Miniforge as described, used the commands on the website, and afterwards started the server via:

lemonade-server-dev serve

(If that does not work, maybe consider exporting this variable: export HSA_OVERRIDE_GFX_VERSION=11.0.0.)
Then I could access it via localhost:8000, I think, and downloaded a model. It is also easy to integrate into Open WebUI via the OpenAI API connector.
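Roughly, the last steps inside the toolbox look like this (a sketch; the install itself follows the linked site, and the /api/v1 path is just my assumption for the OpenAI-compatible endpoint, so check the Lemonade docs):

# inside the Ubuntu toolbox, after installing Miniforge and Lemonade as described on the linked site
export HSA_OVERRIDE_GFX_VERSION=11.0.0   # only if the server does not pick up the iGPU on its own
lemonade-server-dev serve

# from the host: quick check that the server answers (endpoint path assumed, verify against the docs)
curl http://localhost:8000/api/v1/models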

I am just not sure if it utilizes the NPU, because I am missing a tool to monitor it.

OK, to get rid of my Stable Diffusion issue I have to use the argument --force-fp32, since float16 does not seem to work. When running in the toolbox I set it via python main.py --force-fp32. I guess for the Podman version it would be set via this line: -e CLI_ARGS="--force-fp32" \ (but I haven’t tried that yet).

I just wanted to say thank you for your post. I used your container to get Ollama running. I have Framework Desktops on order but I got this running on a non-Framework machine (which is also a Strix Halo 395). I’d ideally like to get Ollama (or vLLM) running across 4 or 6 Strix Halo machines, but that’s another problem :joy:


Credit goes mainly to @Nitrousoxide :slight_smile:
Happy to hear about your experience with the 395! Also ordered one :slight_smile:
