Hello Framework-Support,
Since it affects both the Framework 13 and the Framework Desktop, I am asking the question here:
As far as I remember, at the 2nd Framework event it was said that the Framework 13 with AMD Ryzen AI and the Framework Desktop would be fully supported by the mentioned Linux distributions and LLM frameworks.
However, searching around the internet I cannot find any evidence that llama.cpp or ollama support the AMD NPUs.
Maybe the Framework support team can shed some light on this?
I am thinking about getting one, maybe two, desktops, but no support would be a dealbreaker for me.
Thanks for the link.
In principle you are right, but if I am not mistaken these LLM frameworks were mentioned (in the background) during the presentation. I hope that Framework and AMD will help those frameworks gain NPU support. Otherwise those devices are a lot less useful, having an NPU which cannot be used.
Hello,
I may have expressed myself incorrectly. I wasn’t really asking for concrete support, but rather for a statement regarding NPU support from a Framework employee who reads the forum.
I asked the question here because both Framework 13 and Framework Desktop are affected and I didn’t want to post the same question in both sub-forums.
If necessary, I can of course contact the official support, but I wanted to avoid unnecessary requests.
That being said, with the Framework Desktop what you really want to be doing is running models on the integrated GPU using ROCm. The packages and libraries are in the repos for Fedora 42 and will be entering other distros in the future.
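For reference, a minimal sketch of what that could look like on Fedora 42, assuming llama.cpp’s HIP backend and Fedora’s ROCm package names (both are assumptions on my part, not something Framework has documented):

```shell
# Sketch: running a GGUF model on the iGPU via llama.cpp's ROCm/HIP
# backend. Package names follow Fedora's ROCm packaging and may differ;
# the model path is a placeholder.
sudo dnf install rocminfo rocm-hip-devel hipblas-devel

# Check that ROCm sees the integrated GPU (look for a gfx11xx entry)
rocminfo | grep -i gfx

# Build llama.cpp with the HIP backend enabled
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DGGML_HIP=ON -DCMAKE_BUILD_TYPE=Release
cmake --build llama.cpp/build -j

# Offload all layers to the GPU (-ngl 99) and run a prompt
./llama.cpp/build/bin/llama-cli -m ./model.gguf -ngl 99 -p "Hello"
```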
No, I already run my LLMs on a discrete GPU with ollama, llama_cpp_python and GPT4All.
For that I don’t need a Framework Desktop, and ROCm is not needed either.
In conclusion:
If I want to leverage the full potential of the AMD Ryzen AI NPU, I need to use ONNX and yet another special AMD execution provider.
More and more libraries which can break and need maintenance over time. Sigh.
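To make the ONNX route concrete: AMD’s NPU path goes through ONNX Runtime with the Vitis AI execution provider shipped in the Ryzen AI SDK. A hedged sketch; the `pick_providers` helper and the preference order below are my own illustration, not an official API:

```python
# Sketch: prefer AMD's NPU execution provider (VitisAIExecutionProvider,
# from the Ryzen AI SDK) when ONNX Runtime reports it, falling back to
# the GPU (ROCm) and finally the CPU. The helper and its ordering are
# illustrative, not part of onnxruntime itself.
PREFERRED = [
    "VitisAIExecutionProvider",  # AMD XDNA NPU (Ryzen AI SDK)
    "ROCMExecutionProvider",     # AMD GPU via ROCm
    "CPUExecutionProvider",      # always-available fallback
]

def pick_providers(available):
    """Return the subset of PREFERRED that is actually available, in order."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]

try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    available = ["CPUExecutionProvider"]  # onnxruntime not installed

providers = pick_providers(available)
print("Would create session with providers:", providers)
# e.g. ort.InferenceSession("model.onnx", providers=providers)
```

Whether the Vitis AI provider actually shows up on Linux is exactly the open question in this thread.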
Isn’t it just that the built-in NPUs on current CPUs are really meant for on-device image signal processing for webcams, and not really designed for heavy AI workloads?
Given that most seem to have a capacity of around 50 TOPS or so, whereas I think I saw that the new RX 9070 XT graphics card is capable of around 1,500 TOPS… it’s not really that useful if you need serious AI horsepower.
The NPU has access to all of the (CPU) RAM, so 64 or even 128 GB (and nearly 256 GB on a desktop with 4 DIMMs…), and the power needed is far less.
So 50 TOPS is really usable locally, even for the bigger LLMs (24B or even 70B).
If RDNA 4 - Wikipedia is right, we have “only” 97.32 TFLOPS in BF16, so maybe ~200 TFLOPS in FP8 and ~400 TOPS in INT4 (if usable).
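Those FP8/INT4 numbers follow from the usual rule of thumb that throughput roughly doubles each time the datatype width halves; a quick sanity check (ideal 2× scaling assumed, which real hardware rarely hits exactly):

```python
# Sanity-check the precision scaling: BF16 -> FP8 -> INT4, assuming an
# ideal 2x throughput gain per halving of the datatype width.
bf16_tflops = 97.32              # RDNA 4 BF16 figure quoted above

fp8_tflops = bf16_tflops * 2     # ~195, close to the "~200 TFLOPS" guess
int4_tops = fp8_tflops * 2       # ~389, close to the "~400 TOPS" guess

print(f"FP8 : ~{fp8_tflops:.0f} TFLOPS")
print(f"INT4: ~{int4_tops:.0f} TOPS")
```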
For now the firmware (MES?) is a bit unstable, but I can get that perf on the FW16 with its iGPU at “only” ~4–5 TFLOPS (out of the 17 TFLOPS possible…).
With only the NPU and the use of “Block FP16” we can expect a ×6 gain in perf.
I ordered the Framework mainboard only, but it’s functionally the same as the Desktop.
Before I ordered, I researched this same topic (AI using Radeon). What I found suggested that the primary, existing AI value would be provided almost exclusively by the GPU with ROCm, and that the NPU would be marginally additive… at this time.
I don’t know this for a fact, because I’ve been using Nvidia exclusively, but that’s why I ordered… to be able to test.
…just sharing my expectations.
The XDNA on this board tops out at 50 TOPS… I’m not 100% sure how; it looks like that figure is for the Block FP16 type, and I may have to try it, but the idea looks promising (maybe better than FP8?).
NPUs are all about low-power, efficient processing of small FP8 or similar workloads. Like you said: webcams, signal processing, small LLMs, etc. They’re not meant for distillation, complex generation, etc. Sure, you can do it on high-end NPUs, but you’re often hitting some kind of limitation of the hardware. GPU and CPU working together (with the NPU handling some smaller workloads) is how you make an efficient AI platform.
Each has strengths and weaknesses. It will be nice when we get the XDNA NPU going, but don’t think you will be using a 50 TOPS NPU to do the heavy lifting on a 27B model. It can help, but that RDNA GPU is going to be doing a lot of the operations. remembers when people started playing with NPUs on Pis and first-gen NPUs