Large language models, such as ChatGPT, have stirred up quite a storm. To put the tempest in my teapot, I installed ollama on my Framework Laptop 13 AMD. Works great… but slowly. Notably, the Radeon chip remained idle while ollama labored. Looking more closely at Ollama now supports AMD graphics cards · Ollama Blog, I noticed its supported-Radeon list did not include my 760M (nor even the 7700S that's available with the Framework 16 humpback). Is there a way to hook up a card on ollama's list to a Framework 13 laptop, or should I get Nvidia next time?
OK, so here we go.
Ollama is great as it tremendously simplifies the process of getting an AI running. If you've ever tried llama.cpp, you know how tiring it is to hunt down the model in the compatible new format released two weeks ago because the devs borked the old format, find the best quantization and prompt for your use case, etc. The ollama devs do all that work for you, so it's really clean and tidy, working across operating systems and CPU architectures with minimal overhead. They have really done an outstanding job.
Now, ollama has a few caveats:
- Only half of the cores seem to be used to run the model, so as not to choke the CPU on platforms like phones. The downside is that it also leaves the performance of high-core-count CPUs on the table, and it's a shame this isn't easy to change (though the API does expose a num_thread option; see the sketch after this list).
- I tried on a supported GPU with Ubuntu 22.04 and 24.04, Pop!_OS 22.04, and Arch Linux. I have never been able to run ollama with a GPU. I tested it on 3 different computers, with 3 different RX Vega 56 GPUs. I still believe GPU compatibility is somewhat experimental in ollama right now. Implementing GPU compatibility is very time- and resource-expensive, as you have to tweak settings and recompile all models for all GPUs: not only llama3 and phi3, for example, but every single variant of them for every single variant of GPU… which is a heavy task. So it's still experimental for now. No idea how it works on Nvidia, though; I won't touch Nvidia products since they are an overpriced proprietary mess and largely incompatible with Linux.
- My AMD Ryzen 5 5800X laptop runs ollama on the iGPU: no idea how it manages that, since the GPU isn't even on the compatibility list, but my processor isn't used at all. Maybe in your case it is in fact using your 760M even though compatibility isn't listed. Or, if it's slow as heck, it's using the CPU.
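On the thread-count point, ollama's HTTP API accepts a per-request options object, and num_thread is one of the documented options, so you can at least experiment with more cores without rebuilding anything. A minimal Python sketch against a local ollama server (the model name and thread count are placeholders for whatever you actually run):

```python
import requests

# Ask the local ollama server (default port 11434) for a completion,
# overriding the number of CPU threads for this request only.
# "llama3" and num_thread=16 are placeholders; adjust for your machine.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain what a quantized model is in one sentence.",
        "stream": False,
        "options": {"num_thread": 16},
    },
    timeout=600,
)
resp.raise_for_status()
data = resp.json()
print(data["response"])

# The response also reports eval_count (tokens generated) and eval_duration
# (nanoseconds), which gives a rough tokens-per-second figure to compare runs.
print(data["eval_count"] / (data["eval_duration"] / 1e9), "tokens/s")
```

Whether more threads actually helps depends on memory bandwidth; on many laptops the model is bandwidth-bound long before it is core-bound.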
Maybe you also have unrealistic expectations after seeing ChatGPT being faster than Elon Musk's SpaceX rockets. AIs aren't fast unless you throw a 2 kW, 40,000 € GPU at them like OpenAI does. If you get a word per second, that's fast.
The advice differs depending on whether you use Linux or Windows as your host.
Linux: See Integrated GPU support · Issue #2637 · ollama/ollama · GitHub
Windows: there is a draft PR in progress: Allow AMD iGPUs on windows by dhiltgen · Pull Request #5347 · ollama/ollama · GitHub
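On the Linux side, the workaround that comes up most often for RDNA3 iGPUs like the 760M/780M is to set ROCm's HSA_OVERRIDE_GFX_VERSION environment variable so the runtime treats the chip as a supported gfx target before starting the ollama server. The value below (11.0.0) is an assumption that is commonly suggested for RDNA3 parts; check the issue thread for what matches your chip and ROCm version. A sketch:

```python
import os
import subprocess

# Start `ollama serve` with ROCm told to treat the iGPU as a supported
# gfx target. HSA_OVERRIDE_GFX_VERSION=11.0.0 is an assumed value that is
# often suggested for RDNA3 iGPUs; the right value for your chip may differ.
env = os.environ.copy()
env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"

subprocess.run(["ollama", "serve"], env=env, check=True)
```

If you run ollama as a systemd service instead, the equivalent is putting that variable in the service's environment rather than launching it by hand.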
Some additional info: I personally don't use ollama, but I have gotten my own models to run on the FW13's AMD GPU through rusticl. Assuming ollama performs similarly to the setup I've got working, you could expect between 4 and 6 TFLOPS for most single-precision compute operations. Obviously that isn't particularly competitive with, say, a desktop GPU or an M3 Max, according to benchmarks.
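To put a number like that in context, here is a back-of-envelope peak-FP32 estimate. The CU count and boost clock are assumptions for a 780M-class iGPU (the 760M has 8 CUs and clocks a bit lower), and real workloads land well below the theoretical peak:

```python
# Theoretical peak FP32 throughput for an RDNA3 iGPU:
#   CUs * 64 lanes per CU * 2 FLOPs per FMA * clock (Hz)
# 12 CUs and ~2.7 GHz are assumed values for a 780M-class part; the 760M
# has 8 CUs. RDNA3 dual-issue can double this on paper, but real kernels
# rarely get there.
cus = 12
lanes = 64
flops_per_fma = 2
clock_hz = 2.7e9

peak_tflops = cus * lanes * flops_per_fma * clock_hz / 1e12
print(f"~{peak_tflops:.1f} TFLOPS peak FP32")  # ≈ 4.1 TFLOPS
```

So a measured 4-6 TFLOPS is in the right ballpark for this class of iGPU.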
llama.cpp works with GPU acceleration; I believe it uses Vulkan. I don't know what ollama uses, but I use Jan (a UI around llama.cpp, available as an AppImage) and GPU acceleration works fine once I enable it in the settings. There is also LocalAI.
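If you'd rather script llama.cpp than go through a UI, the llama-cpp-python bindings expose the same layer-offload knob. This only does anything if the underlying llama.cpp build was compiled with a GPU backend (Vulkan, ROCm/HIP, etc.), and the model path below is a placeholder for whatever GGUF file you have locally:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload every layer it can to the GPU;
# it silently falls back to CPU if the build has no GPU backend.
# The model path is a placeholder for a local GGUF file.
llm = Llama(
    model_path="models/dolphin-llama3-Q5_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```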
I use the dolphin llama3 13b model with Q5_K_M quantization. It's not really fast, but I still find it decent on my FW13 Ryzen.
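For a sense of why a 13B model at Q5_K_M is workable but not snappy on a laptop, a rough size estimate (assuming Q5_K_M averages about 5.5 bits per weight; the exact figure varies a little by tensor, and the KV cache and runtime buffers add more on top):

```python
# Rough memory footprint of a 13B-parameter model at Q5_K_M quantization.
# ~5.5 bits per weight is an assumed average for this quant type.
params = 13e9
bits_per_weight = 5.5
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB for the weights alone")  # ≈ 8.9 GB
```

All of that has to stream through shared memory on every token, which is usually the real bottleneck on these machines.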