Just an update: I have gotten vLLM at least nominally running. It’s still not for the faint of heart, but it is at least possible now:
- This was only tested with TheRock nightly builds of ROCm. I’d suggest using the latest one: TheRock/RELEASES.md at main · ROCm/TheRock · GitHub
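For reference, a minimal sketch of pulling a nightly ROCm from TheRock’s pip index into a fresh venv. The `gfx1151` index is what I understand to be the Strix Halo target, but confirm the current index URL against RELEASES.md before using it:

```shell
# Sketch: install a TheRock nightly ROCm build via pip (check
# TheRock's RELEASES.md for the current index URL and targets).
python3 -m venv ~/venv-rocm
source ~/venv-rocm/bin/activate
# gfx1151 = Strix Halo; swap for your GPU family if different.
python -m pip install \
  --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ \
  "rocm[libraries,devel]"
```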
- TheRock’s PyTorch did not work for me. You can reference TheRock’s external build scripts: TheRock/external-builds/pytorch at main · ROCm/TheRock · GitHub, but I had to do a bunch of my own work here: strix-halo-testing/torch-therock at main · lhl/strix-halo-testing · GitHub. This is a script that works for me (WFM), but there are a lot of moving parts, so you will probably need to put in some elbow grease
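The general shape of that step, heavily hedged (the `build.sh` entry point below is illustrative; check the repo for the actual script name and the env vars it expects):

```shell
# Sketch: build your own PyTorch against a TheRock nightly ROCm.
# Repo is real; the build script name here is a placeholder for
# whatever entry point the repo actually provides.
git clone https://github.com/lhl/strix-halo-testing
cd strix-halo-testing/torch-therock
# Expect to edit paths and environment variables for your setup
# before this works; there are a lot of moving parts.
./build.sh
```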
- Then there’s building vLLM itself. Note that if you use TheRock’s version of PyTorch, vLLM segfaults immediately at the moment, so you can’t skip the previous step of building your own torch. Even then, I found that some models don’t run, but I didn’t extensively test what worked and what didn’t. These scripts are rougher, but they basically serve as documentation for how you would, in principle, get vLLM working: strix-halo-testing/vllm at main · lhl/strix-halo-testing · GitHub
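A rough sketch of the vLLM build itself, assuming the self-built ROCm PyTorch is already installed in the active venv (the `PYTORCH_ROCM_ARCH` value is my assumption for Strix Halo; see the strix-halo-testing/vllm scripts for the details that actually mattered for me):

```shell
# Sketch: build vLLM from source against an already-installed ROCm torch.
# --no-build-isolation makes the build use the torch in the current venv
# rather than pulling in a CUDA wheel during the build.
git clone https://github.com/vllm-project/vllm
cd vllm
# gfx1151 targets Strix Halo (assumption; verify for your hardware).
export PYTORCH_ROCM_ARCH=gfx1151
python -m pip install -e . --no-build-isolation
```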
I made a dedicated thread for discussing PyTorch and vLLM on the Framework Desktop (Strix Halo): PyTorch w/ Flash Attention + vLLM for Strix Halo