No, they need to adapt models to make them work with the NPU. The list of available models is here: Models · FastFlowLM. They are also making tools to automate conversion of GGUF to NPU-compatible
For people who don’t want to build the Linux kernel, I’ve pushed a DKMS package that should work with kernel 6.18+. Here is me testing the NPU in Lemonade
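If you want to check whether a machine meets the 6.18+ requirement before installing, something like the following should do. This is only a rough sketch: the function name `kernel_at_least` is made up for illustration, and it assumes the release string starts with `major.minor` (as in `6.18.2-arch1-1`), which is the usual form but not guaranteed on every distro.

```python
import platform
import re


def kernel_at_least(release: str, major: int = 6, minor: int = 18) -> bool:
    """Return True if a kernel release string is at least major.minor.

    Hypothetical helper: only the leading "major.minor" part is compared,
    and an unparseable string is treated as "not new enough".
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string on Linux.
    release = platform.release()
    verdict = "ok" if kernel_at_least(release) else "older than 6.18"
    print(f"{release}: {verdict}")
```

Comparing the two numbers as a tuple avoids the classic mistake of comparing version strings alphabetically, where "6.9" would sort after "6.18".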
This is awesome stuff guys, thanks for sharing. For Ubuntu, AMD now have a PPA for the xdna2 DKMS package. There is some more info in docs/linux-getting-started.md in the FastFlowLM/FastFlowLM repo on GitHub. That doc refers to a .deb package for fastflowlm which doesn’t exist yet, but I’m sure it’s only a matter of time.
I was able to build FastFlowLM from source and after a small amount of fiddling around (had to make a symlink from my build directory to src/xclbins) now have gpt-oss-20b running at 18 tok/s via the NPU, even with my CPU in power-saving mode. Very cool.
Thank you. I had to install your amdxdna-dkms package to get it to work on CachyOS because Cachy’s firmware was newer than the driver’s version or something. PSA: Do not set amd_iommu=off as a kernel boot parameter if you want the NPU to work.
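For anyone unsure whether that parameter is set, here is a small sketch that looks for it in the running kernel's command line. The helper name `iommu_disabled` is made up for illustration; it only checks for the exact `amd_iommu=off` token mentioned above and reads `/proc/cmdline`, which exists on Linux.

```python
from pathlib import Path


def iommu_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains amd_iommu=off.

    Hypothetical helper: matches the exact whitespace-separated token only.
    """
    return "amd_iommu=off" in cmdline.split()


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        if iommu_disabled(path.read_text()):
            print("amd_iommu=off is set; the NPU will likely not work")
        else:
            print("amd_iommu=off is not set")
    else:
        print("/proc/cmdline not found (not a Linux system?)")
```

If it reports the parameter as set, the usual place to remove it is your bootloader's kernel command line configuration, followed by a reboot.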