[GUIDE] Use NPU (XDNA2) with Arch Linux and FastFlowLM!

Good News, Everyone!
We can now use the NPU for LLM inference with FastFlowLM. Here’s my little guide to get it working.

To use the NPU on Arch Linux, you need some patches that will soon be merged into the Linux kernel, xrt-plugin-amdxdna, and xrt.

You need to:

  1. Build the linux-git kernel package with patches
  2. Build xrt and xrt-plugin-amdxdna with patches
  3. Compile FastFlowLM or use my AUR package

1. Build linux-git with patches
Using this AUR package, AUR (en) - linux-git, you have to change the source URL in the PKGBUILD from:

$_srcname::git+https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux

to:

$_srcname::git+https://gitlab.freedesktop.org/drm/misc/kernel/#branch=drm-misc-fixes

Then run makepkg -si in the package directory.
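For readers who prefer to script the URL change, here is a minimal Python sketch. It assumes the linux-git PKGBUILD contains the torvalds source entry exactly as quoted above (possibly followed by a suffix such as a branch fragment). The helper names `switch_kernel_source` and `patch_pkgbuild` are hypothetical and are not part of makepkg or any AUR tooling.

```python
import re
from pathlib import Path

# Replacement source taken from the guide: the drm-misc-fixes branch.
NEW_URL = "git+https://gitlab.freedesktop.org/drm/misc/kernel/#branch=drm-misc-fixes"

# Assumption: the PKGBUILD uses the torvalds URL quoted in the guide. Any suffix
# (e.g. a "#branch=..." fragment) up to the closing quote is replaced as well.
OLD_ENTRY = re.compile(
    r"\$_srcname::git\+https://kernel\.googlesource\.com/pub/scm/linux/kernel/git/torvalds/linux[^\s\"')]*"
)


def switch_kernel_source(pkgbuild_text: str) -> str:
    """Return the PKGBUILD text with the kernel source pointed at drm-misc-fixes."""
    new_text, count = OLD_ENTRY.subn(lambda _m: "$_srcname::" + NEW_URL, pkgbuild_text)
    if count == 0:
        # Fail loudly instead of silently building the unpatched kernel.
        raise ValueError("torvalds source entry not found; has the PKGBUILD changed?")
    return new_text


def patch_pkgbuild(path: str = "PKGBUILD") -> None:
    """Hypothetical helper: rewrite the PKGBUILD in the current checkout in place."""
    p = Path(path)
    p.write_text(switch_kernel_source(p.read_text()))
```

Call `patch_pkgbuild()` from inside a clone of the package (for example after `git clone https://aur.archlinux.org/linux-git.git`), then run makepkg -si. Raising an error when nothing matched is deliberate: if the upstream PKGBUILD changes, you want to find out before spending an hour compiling the wrong tree.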

2. Build xrt and xrt-plugin-amdxdna with patches

~~https://gitlab.archlinux.org/superm1/xrt/-/tree/update-version~~
~~https://gitlab.archlinux.org/superm1/xrt-plugin-amdxdna/-/tree/update-version~~

In each repository, run makepkg -si.

3. Compile FastFlowLM
Follow this link to build FastFlowLM

Alternatively, you can use my AUR package: AUR (en) - fastflowlm-git

You may also need to edit /etc/systemd/system.conf and change the commented-out #DefaultLimitMEMLOCK line to DefaultLimitMEMLOCK=infinity.
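To confirm the MEMLOCK change took effect, a small Python sketch can read the setting back from system.conf and report the locked-memory limit your session actually received. The function names are hypothetical and it relies only on the standard `resource` module. Since system.conf sets defaults for processes started by systemd, a reboot is the simplest way to make sure the new value applies.

```python
import resource


def parse_default_memlock(conf_text: str):
    """Return the value of an active DefaultLimitMEMLOCK= line, or None if unset.

    Lines starting with '#' or ';' are comments in systemd config files and
    are skipped; if the key appears more than once, the last value wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "DefaultLimitMEMLOCK":
            value = val.strip()
    return value


def memlock_unlimited() -> bool:
    """True if this process's soft RLIMIT_MEMLOCK is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY
```

After the edit, `parse_default_memlock(open("/etc/systemd/system.conf").read())` should return `"infinity"`, and after a reboot `memlock_unlimited()` should return True (the shell equivalent is `ulimit -l` printing `unlimited`).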

Happy inference!

Special thanks to @Mario_Limonciello


Very interesting. Can any GGUF model from Hugging Face be used?

How does it compare to the iGPU performance-wise, for tg (text generation) and pp (prompt processing)?

Can we run iGPU+NPU, with the NPU for prompt processing and the iGPU for text generation?

  1. No, models need to be adapted to work on the NPU. The list of available models is here: Models · FastFlowLM. They are also building tools to automate converting GGUF models to an NPU-compatible format.
  2. There are some benchmarks here: Benchmarks · FastFlowLM
  3. Not with FastFlowLM, which is NPU-only, at least for now.

I suppose this will not work on the AMD 7040 mainboards considering the repo seems to require XDNA2?

Thank you for the detailed guide, however!

For people who don’t want to build the Linux kernel, I’ve pushed a DKMS package that should work with kernel 6.18+. Here I am testing the NPU in Lemonade.
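Before and after installing the DKMS package, you can check the two relevant conditions with a short Python sketch: that the running kernel is 6.18 or newer, and that the amdxdna module is loaded. The helper names are hypothetical. Detecting the module via `/sys/module/amdxdna` is my assumption about a convenient check, not something the package provides.

```python
import os
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare the leading "X.Y" of a release string such as "6.18.2-arch1-1"."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def amdxdna_loaded(sys_module_dir: str = "/sys/module") -> bool:
    """True if <sys_module_dir>/amdxdna exists, i.e. the module is loaded."""
    return os.path.isdir(os.path.join(sys_module_dir, "amdxdna"))


def running_kernel_ok() -> bool:
    """Check the running kernel against the 6.18+ requirement from the post."""
    return kernel_at_least(platform.release(), 6, 18)
```

Comparing `(major, minor)` tuples rather than raw strings avoids the classic mistake where `"6.9" > "6.18"` as text.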


This is awesome stuff, guys, thanks for sharing. For Ubuntu, AMD now have a PPA for the xdna2 DKMS package. There is some more info at FastFlowLM/docs/linux-getting-started.md at main · FastFlowLM/FastFlowLM · GitHub. That doc refers to a .deb package for fastflowlm which doesn’t exist yet, but I’m sure it’s only a matter of time.

I was able to build FastFlowLM from source, and after a little fiddling (I had to make a symlink from my build directory to src/xclbins), I now have gpt-oss-20b running at 18 tok/s on the NPU, even with my CPU in power-saving mode. Very cool.


Thank you. I had to install your amdxdna-dkms package to get it to work on CachyOS because Cachy’s firmware was newer than the driver’s version or something. PSA: Do not set amd_iommu=off as a kernel boot parameter if you want the NPU to work.
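Following up on the PSA, here is a minimal Python sketch that checks a kernel command line (for example the contents of /proc/cmdline) for amd_iommu=off. The function names are hypothetical. Treating a comma-separated value that contains `off` as disabling the IOMMU is a conservative assumption on my part, not a documented parsing rule.

```python
def amd_iommu_off(cmdline: str) -> bool:
    """True if the command line passes "off" to the amd_iommu= parameter."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Conservative assumption: flag "off" even inside a comma-separated value.
        if key == "amd_iommu" and sep and "off" in value.split(","):
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

If `amd_iommu_off(read_cmdline())` returns True, remove the parameter from your bootloader configuration and reboot before trying the NPU again.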
