Status of AMD NPU Support

Hello Framework-Support,
Since this affects both the Framework 13 and the Framework Desktop, I'm asking the question here:
As far as I remember, at the second Framework event it was said that the Framework 13 with AMD AI and the Framework Desktop would be fully supported by the Linux distributions and LLM frameworks mentioned there.
However, searching around the internet, I cannot find any evidence that llama.cpp or ollama supports AMD NPUs.
Maybe the Framework support team can shed some light on this?

I'm thinking about getting one, maybe two, desktops, but a lack of support would be a dealbreaker for me.

Thank you

2 Likes

Specifically for Linux, I think this may be relevant to what you want to know:

Whether this leads to llama.cpp and ollama supporting it will depend on the contributors to those projects.

3 Likes

Thanks for the link.
In principle you are right, but if I am not mistaken, these LLM frameworks were mentioned (in the background) during the presentation. I hope that Framework and AMD support those projects in gaining NPU support. Otherwise these devices are a lot less useful, having an NPU which cannot be used.

The Framework team does not provide support here :slight_smile: this is the community support category. Hope this helps!

1 Like

Hello,
I may have expressed myself incorrectly. I didn't really want concrete support, but rather a statement regarding NPU support from a Framework employee who reads the forum.
I asked the question here because both the Framework 13 and the Framework Desktop are affected, and I didn't want to post the same question in both sub-forums.
If necessary, I can of course contact official support, but I wanted to avoid unnecessary requests.

The way to interact with the NPU under Linux is going to be ONNX Runtime.
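For reference, a minimal sketch of what that looks like in Python, assuming a Ryzen AI build of ONNX Runtime that ships AMD's Vitis AI execution provider (the actual build may require extra provider options or a config file, and "model.onnx" is just a placeholder):

```python
# Ask ONNX Runtime for the Vitis AI execution provider (the EP the Ryzen AI
# stack uses to reach the NPU), falling back to CPU if it isn't available.
import onnxruntime as ort

# Show which execution providers this particular build actually supports.
print(ort.get_available_providers())

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to a quantized ONNX model
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)
```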

Right now the full stack for running an LLM on the NPU using OGA (Running LLMs — Ryzen AI Software 1.3 documentation) doesn't really exist, but that should change in the future.

That being said, with the Framework Desktop what you really want to be doing is running models on the integrated GPU using ROCm. The packages and libraries are in the repos for Fedora 42 and will be entering other distros in the future.
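As a quick way to confirm the ROCm stack can see the iGPU, here is a small sketch assuming a ROCm-enabled build of PyTorch is installed (ROCm builds expose HIP devices through the `torch.cuda` API, so the same code runs unchanged on AMD GPUs):

```python
# Sanity-check that the HIP runtime can see the integrated GPU.
import torch

print(torch.cuda.is_available())      # True if a HIP-visible GPU was found
print(torch.cuda.get_device_name(0))  # device name of the iGPU
print(torch.version.hip)              # HIP/ROCm version (None on CUDA builds)
```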

3 Likes

No, I already run my LLMs on a discrete GPU with ollama, llama_cpp_python and GPT4All.
For that I don't need a Framework Desktop, and ROCm isn't needed either.
In conclusion:
If I want to leverage the full potential of the AMD AI hardware, I need to use ONNX and a special AMD execution provider.
More and more libraries which can break and need maintenance over time. *sigh*
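For reference, that kind of GPU offload is a one-liner in llama_cpp_python; a minimal sketch (the model path is a placeholder):

```python
# Run a local GGUF model with llama-cpp-python, offloading all layers to the
# GPU. The same call works whether the wheel was built for CUDA, ROCm/HIP,
# or Vulkan.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.Q4_K_M.gguf",  # placeholder model file
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU
    n_ctx=4096,       # context window size
)

result = llm("Q: What does an NPU do? A:", max_tokens=64)
print(result["choices"][0]["text"])
```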

Yeah, it's early days for software that can use both the NPU and GPU at the same time. But OGA is the way that will happen.

Reading the linked page, I noticed some funny things:

  • to run the model, you need Windows 11
  • to prepare and generate the model, a Linux computer with an AMD/NVIDIA GPU is needed

What worries me is that there seem to be .bin files generated. Hopefully these are not executables…

Isn't it just that the built-in NPUs on current CPUs are really for on-device image signal processing for webcams, and that they're not really designed for actual AI workloads?

Given that most seem to have a capacity of around 50 TOPS or so, whereas I think I saw the new RX 9070 XT graphics card is capable of around 1,500 TOPS… it's not really that useful if you require serious AI horsepower.

The NPU has access to all of the (CPU) RAM, so 64 or even 128 GB (and nearly 256 GB on a desktop with 4 DIMMs…), and the power needed is far less.
So 50 TOPS is really usable even for the bigger LLMs (24B or even 70B) locally.

If RDNA 4 - Wikipedia is right, we have “only” 97.32 TFLOPS in BF16, so maybe ~200 TFLOPS in FP8 and ~400 TOPS in INT4 (if usable :wink: )

OK: https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html
reports a Peak INT4 Performance of 1557 TOPS… I expect that is with sparse matrices, so ~800 dense?
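The arithmetic being assumed here, as a quick sketch (using the common rule of thumb that peak throughput doubles each time the data type is halved, and that “sparse” figures are roughly 2x the dense ones):

```python
# Back-of-the-envelope scaling of the RDNA 4 figures quoted above.
bf16_tflops = 97.32           # BF16 figure from the Wikipedia RDNA 4 page
fp8_tflops = bf16_tflops * 2  # ~195 TFLOPS, if FP8 doubles BF16 throughput
int4_tops = fp8_tflops * 2    # ~389 TOPS dense INT4 by the same rule

sparse_int4 = 1557            # AMD's quoted peak INT4 TOPS for the RX 9070 XT
dense_int4 = sparse_int4 / 2  # ~779 TOPS if that quote assumes 2:4 sparsity

print(fp8_tflops, int4_tops, dense_int4)
```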

For now the firmware (MES?) is a bit unstable, but I can get that performance on the FW16 iGPU with “only” ~4-5 TFLOPS (out of the 17 TFLOPS possible…)

With only the NPU and the use of “BF16 block”, we can expect a ~6x gain in performance. :crossed_fingers:

1 Like

I ordered the Framework mainboard only, but it's functionally the same as the Desktop.
Before I ordered, I researched this same topic (AI using Radeon). What I found suggested that the primary existing AI value would be provided almost exclusively by the GPU with ROCm, and the NPU would be marginally additive… at this time.
I don't know for a fact, because I've been using NVIDIA exclusively, but that's why I ordered: to be able to test.
…just sharing my expectations.

The XDNA NPU on this board tops out at 50 TOPS… I'm not 100% sure how; it looks like that is with a BF16 block type. I may have to try it, but the idea looks promising (maybe more than FP8?)

NPUs are all about low power and efficient processing of small FP8 or similar workloads. Like you said: webcams, signal processing, small LLMs, etc. They're not meant for distillation, complex generation, etc. Sure, you can do it on high-end NPUs, but you're often hitting some kind of limitation of the hardware. The GPU and CPU working together (with the NPU handling some smaller workloads) is how you make an efficient AI platform.
Each has strengths and weaknesses. It will be nice when we get the XDNA NPU going, but don't think you will be using a 50 TOPS NPU to do the heavy lifting on a 27B model. It can help, but that RDNA GPU is going to be doing a lot of the operations. *remembers when people started playing with NPUs on Pis and first-gen NPUs*