Hey everyone, I decided to take this out for a spin with my new-to-me Ryzen Framework and see how it performs.
Lengthy details on how I got it working are in the “unboxing” blog post on my website, but TL;DR: there’s an open PR to ollama that lets you unlock the full RAM of the system, not just the allocated VRAM. Here are the results. Sadly, it looks like the benchmark y’all were using has completely swapped out its models, so I can’t really compare to the others in this thread.
CPU-only inference looks like this:
{
"mistral:7b": "11.99",
"llama3.1:8b": "10.81",
"phi3:3.8b": "19.74",
"qwen2:7b": "12.07",
"gemma2:9b": "8.62",
"llava:7b": "12.71",
"llava:13b": "6.93",
"uuid": "0f96d8fc-0390-5f84-bd51-3595976f0b2d",
"ollama_version": "0.0.0"
}
{
"system": "Linux",
"memory": 27.205055236816406,
"cpu": "AMD Ryzen 7 7840U w/ Radeon 780M Graphics",
"gpu": "Phoenix1",
"os_version": "Bazzite 41 (FROM Fedora Kinoite)",
"system_name": "Linux",
"uuid": "0f96d8fc-0390-5f84-bd51-3595976f0b2d"
}
GPU acceleration with the experimental VRAM PR looks like this:
{
"mistral:7b": "17.15",
"llama3.1:8b": "10.84",
"phi3:3.8b": "24.39",
"qwen2:7b": "12.12",
"gemma2:9b": "11.62",
"llava:7b": "17.40",
"llava:13b": "9.62",
"uuid": "0f96d8fc-0390-5f84-bd51-3595976f0b2d",
"ollama_version": "0.0.0"
}
{
"system": "Linux",
"memory": 27.205055236816406,
"cpu": "AMD Ryzen 7 7840U w/ Radeon 780M Graphics",
"gpu": "Phoenix1",
"os_version": "Bazzite 41 (FROM Fedora Kinoite)",
"system_name": "Linux",
"uuid": "0f96d8fc-0390-5f84-bd51-3595976f0b2d"
}
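To make the two runs easier to compare, here is a quick sketch that divides each GPU number by the matching CPU number. The values are copied straight from the JSON above; the `speedups` helper is just a name I made up for illustration, and I'm assuming the benchmark reports a rate where higher is better.

```python
# Per-model results copied from the two benchmark outputs above.
# The benchmark emits the numbers as strings, so they are kept that way here.
cpu = {
    "mistral:7b": "11.99",
    "llama3.1:8b": "10.81",
    "phi3:3.8b": "19.74",
    "qwen2:7b": "12.07",
    "gemma2:9b": "8.62",
    "llava:7b": "12.71",
    "llava:13b": "6.93",
}
gpu = {
    "mistral:7b": "17.15",
    "llama3.1:8b": "10.84",
    "phi3:3.8b": "24.39",
    "qwen2:7b": "12.12",
    "gemma2:9b": "11.62",
    "llava:7b": "17.40",
    "llava:13b": "9.62",
}


def speedups(cpu_results, gpu_results):
    """Return the GPU/CPU ratio for every model present in both runs.

    Hypothetical helper for this post, not part of the benchmark tool.
    """
    return {
        model: float(gpu_results[model]) / float(cpu_results[model])
        for model in cpu_results
        if model in gpu_results
    }


speedup = speedups(cpu, gpu)

# Print models from largest to smallest GPU gain.
for model, ratio in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<12} {ratio:.2f}x")
```

Going by these ratios, mistral:7b gains the most from the GPU path, while llama3.1:8b and qwen2:7b are essentially unchanged.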
Sadly, since I made that post, I’ve been running into some trouble: it looks like the system just fails to run properly if it’s under a fair bit of memory pressure. I can’t run a 14B model when 14GB of RAM is already in use; it just never finishes loading. But here’s hoping it improves and gets merged soon!