Framework 13 + Ryzen AI + Linux Distro + LLM

Hi,
I would like to buy the Framework Laptop 13 with Ryzen™ AI 300 Series. I plan to install either Ubuntu or Fedora, and another important consideration is running an LLM locally (at most a 14B model). Do you have any benchmarks for various models like Gemma 3, the Llama series, or DeepSeek running on the Ryzen AI 300 series, so I can pick the right processor and RAM? What is the recommended software stack for running these LLMs locally on Linux?

Regards
Alex

1 Like

For now you cannot use the NPU for inference under Linux. We’re waiting for support: the driver is there, but no apps are using it at the moment.

2 Likes

Too bad, any timeline? What use is an NPU if it is not supported (yet)?

Thanks for replying, though it’s sad news.

Where are we actually stuck? Is it the Linux kernel, the AMD driver, or a lack of support in inference engines like llama.cpp? Does ROCm support NPUs and iGPUs, or a hybrid model? Why did Framework choose Ryzen AI series processors over the Intel Core Ultra series?

Quite surprised to learn that AMD is favouring only Windows with its AI series processors, using tools like GAIA. Hope they will expand support to GNU/Linux systems very soon. They should have done it the other way around, since most of the universities, schools, enterprises, etc. that I know or work with use GNU/Linux for running servers and business-as-usual applications.

Linux now supports the amdxdna driver. However, popular inference engines like llama.cpp currently lack integration. Ideally, these applications and libraries should begin leveraging the amdxdna driver, either directly or through appropriate abstractions.
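As a quick sanity check, the kernel side can be verified on its own. A minimal sketch, assuming a 6.14+ kernel (where amdxdna was merged); the device node name may vary:

# does the running kernel ship the amdxdna module, and is it bound?
modinfo amdxdna | head -n 3
lsmod | grep amdxdna
ls /dev/accel/    # amdxdna exposes the NPU as a DRM accel device, e.g. accel0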

I don’t know about ROCm.

AMD SoCs are years ahead of the Intel equivalents. I don’t know about Framework’s choice.
To help strengthen Linux support for XDNA, consider upvoting relevant GitHub issues. This will signal demand and could encourage AMD to prioritize support more quickly.

3 Likes

Could you please share the links for the feature requests or issues to upvote?

9 Likes

Some more links:

1 Like

~2 months in - has there been any movement?

Yes, an AMD dev confirmed that we’ll have NPU support in the next Ryzen AI Software release, 1.5.

2 Likes

This?

Not seeing any Linux callouts there.

NPU on Linux: the upcoming Ryzen AI 1.5.0 release is supposed to have stronger LLMs-on-Linux support, which may enable us to support NPU-only on Lemonade on Linux.

2 Likes

Thought about a new thread but figured this is relevant here… has anyone successfully gotten llama.cpp or similar to work with the Framework 13 on Linux?

  • Using an AI 5 340 board in DIY form with Fedora 42
  • I’ve followed a number of guides; llama.cpp builds, but it consistently gives the same errors and will not load models:
    • load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-rpc.so
      load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-vulkan.so
      load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-cpu.so
    • the compilation doesn’t error, and trying to build with -DGGML_BACKEND_DL=ON does cause errors, so I have not tested that
    • no public help has actually pointed to a method to fix this
    • I know this seems like a llama.cpp-specific GitHub issue, but it is heavily tied to the hardware configuration, so I’m asking the community (minimal isolation sketch after this list)
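For reference, this is the minimal rebuild-and-check sequence I plan to use to isolate it. A hedged sketch, assuming the load_backend warnings are separate from the actual model-load failure; the model path is a placeholder:

# clean rebuild, then test against a known-good GGUF to see if the warnings alone are fatal
cd ~/opt/llama.cpp-vulkan
rm -rf build
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j 11
./build/bin/llama-cli --list-devices    # the load_backend warnings may still appear here
./build/bin/llama-cli -m /path/to/known-good.gguf -ngl 99 -p "hello"    # if this loads, the warnings are cosmetic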

I’ve followed the desktop guides from @lhl and @Lars_Urban with contributions by @kyuz0 (though I’d rather not use an arbitrary container and the toolbox tool was not found in the repo anyway).

https://community.frame.work/t/amd-strix-halo-llama-cpp-installation-guide-for-fedora-42/75856

Trying to use Vulkan, not ROCm. I think I’ve done all the things:

  • setup the environment and build
  • check devices
  • try to load a model

See code below

# root user
dnf install gcc.x86_64    # says installed 15.2.1-1
dnf install gcc-c++       # says installed 15.2.1-1
dnf install libstdc++     # says installed 15.2.1-1
dnf install python3-devel
dnf install python3-pip
dnf install mesa-vulkan-drivers.x86_64
dnf install vulkan-tools.x86_64
dnf install vulkan-headers.noarch
dnf install vulkan-loader-devel
dnf install curl.x86_64     # but already there
dnf install curlpp.x86_64	# A C++ wrapper for libcURL
dnf install curlpp-devel	# Development files for curlpp
dnf install libcurl-devel   # says already installed
dnf install glslc.x86_64
dnf install rocm.noarch
dnf install cmake
sudo grubby --update-kernel=ALL --args='amd_iommu=off amdgpu.gttsize=98304 ttm.pages_limit=25165824' # for 96 GB
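
# (hedged aside: the gttsize/pages_limit values above assume 96 GB of RAM;
# after a reboot, confirm the args actually applied)
cat /proc/cmdline    # should include amd_iommu=off amdgpu.gttsize=98304 ttm.pages_limit=25165824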

# regular user
cd ~/opt/llama.cpp-vulkan/
git pull
cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j 11
~/opt/llama.cpp-vulkan/build/bin/llama-cli --list-devices
        ggml_vulkan: Found 1 Vulkan devices:
        ggml_vulkan: 0 = AMD Radeon 840M Graphics (RADV GFX1152) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
        register_backend: registered backend Vulkan (1 devices)
        register_device: registered device Vulkan0 (AMD Radeon 840M Graphics (RADV GFX1152))
        register_backend: registered backend RPC (0 devices)
        register_backend: registered backend CPU (1 devices)
        register_device: registered device CPU (AMD Ryzen AI 5 340 w/ Radeon 840M)
        load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-rpc.so
        load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-vulkan.so
        load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-cpu.so
        Available devices:
          Vulkan0: AMD Radeon 840M Graphics (RADV GFX1152) (65877 MiB, 65707 MiB free)

~/opt/llama.cpp-vulkan/build/bin/llama-cli -m /home/<username>/opt/llm_models/models/mistral_models/7B-Instruct-v0.3/model.q8_0.gguf -ngl 99
        bunch of output that includes the same three errors above and a “failed to load model” message

        load_tensors: loading model tensors, this can take a while... (mmap = true)
        llama_model_load: error loading model: missing tensor 'token_embd.weight'
        llama_model_load_from_file_impl: failed to load model
        common_init_from_params: failed to load model '~/opt/llm_models/models/mistral_models/7B-Instruct-v0.3/model.q8_0.gguf', try reducing --n-gpu-layers if you're running out of VRAM
        main: error: unable to load model

You will generally get better performance with ROCm than Vulkan. The catch is you need either ROCm 6.4.4 from the 6.4.x series or ROCm 7.0.2 from the 7.0.x series.
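If you go the ROCm route, the llama.cpp HIP build looks roughly like this. A hedged sketch, assuming ROCm is already installed and that gfx1152 (as reported by RADV above) is the right offload target for the 840M:

# build llama.cpp against ROCm/HIP instead of Vulkan
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1152 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j 11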

Yes. Please see this thread. Also, what Mario said.

1 Like

thanks for the reply both of you …

ehsanj’s post points to LM Studio; I swear I looked at it before and identified some issue. Guess I should look at it again. [edit: it annoys me that it’s only an AppImage and they don’t seem to publish any verification method to confirm the download]

I’d made notes on ROCm but had tried to use Vulkan. I’d seen contradictory evidence on ROCm vs Vulkan performance that was use-case specific.

Okay, further information: LM Studio did work, but to get the AppImage to work I had to extract it first. Running it directly gave the error “dlopen(): error loading libfuse.so.2”, but Fedora says fuse2 and fuse3 are installed. A search did not find libfuse in the packages.
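For anyone else hitting this: on Fedora the fuse2 runtime library is packaged as fuse-libs rather than anything literally named libfuse, so this may be the missing piece (an assumption I haven’t verified on this machine):

sudo dnf install fuse-libs          # provides libfuse.so.2 on Fedora
ldconfig -p | grep libfuse.so.2     # confirm the dynamic loader can find it

Either way, extraction also works: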

$ chmod +x ~/opt/LM-Studio-0.3.30-1-x64.AppImage 
$ ./opt/LM-Studio-0.3.30-1-x64.AppImage --appimage-extract
$ ./opt/lmstudio/squashfs-root/lm-studio    # works
  • 3.91 tok/sec using Mistral Small 2509, but not sure if it’s GPU or CPU
  • Models are stored to ~/.lmstudio/models/lmstudio-community
  • If you symbolically link existing model folders into that path, it will auto-add them to LM Studio (see the sketch after this list)
  • I learned my existing GGUF exports from Hugging Face safetensors using llama.cpp are bad; maybe the error I got before is a false positive, or something prevents correct GGUF creation via llama.cpp
  • confirmed llama-cli works with good GGUF models:
    • the prior conversion command, python llama.cpp/convert-hf-to-gguf.py ./phi3 --outfile output_file.gguf --outtype q8_0, was suspect
    • whereas ~/opt/llama.cpp-vulkan/build/bin/llama-cli -m /home/<username>/.lmstudio/models/lmstudio-community/Magistral-Small-2509-GGUF/Magistral-Small-2509-Q4_K_M.gguf -ngl 99 works
  • also my fan definitely works
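The symlink trick mentioned above looks like this; the source path is an assumption from my layout, and LM Studio expects a publisher/model/file.gguf nesting under its models directory:

# hypothetical example: expose an existing model folder to LM Studio without copying
mkdir -p ~/.lmstudio/models/lmstudio-community
ln -s ~/opt/llm_models/models/mistral_models \
      ~/.lmstudio/models/lmstudio-community/mistral_models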
1 Like

This made me chuckle :rofl:. My fans spin up to the high 3,000s to low 5,000s RPM when using gpt-oss-20b (MXFP4) for the duration of a query, and I get ~23 tok/sec. When using a Q4_K_M model (e.g., mistral-7b-instruct-v0.3), it drops to ~16 tok/sec.

You can try btop (CPU+GPU) and/or nvtop (GPU-only) to confirm GPU usage. If you have enough memory to offload your model to the GPU, that’ll improve performance significantly.
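Both are a package install away; a minimal sketch, assuming the standard Fedora repos:

sudo dnf install btop nvtop    # both are packaged in Fedora
nvtop                          # watch iGPU utilization and VRAM while a query runs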

As a side note, I use Gear Lever to manage AppImages. It makes the overall experience of working with them much nicer.

1 Like

glad you enjoyed the dry humor …

The model comparison table below shows that with Vulkan I’m getting roughly 70% of your throughput, though I don’t know if you have the higher-end GPU or not. I asked the same dumb question over and over again, “you are an intelligent AI, how many licks does it take to get to the center of a sucker”, to compare the models. Obviously it recycled the answer and got cheeky.

| model | time to first token (s) | tokens/sec | memory (GB) | parameters (B) |
|---|---|---|---|---|
| gpt OSS 20B | 12.44 | 18.57 | 16.5 | 20 |
| mistral 7B Instruct v0.3 | 22.6 | 11.42 | 8.9 | 7 |
| mistral/magistral small 2509 | 85.2 | 3.65 | 20.6 | 24 |
| mistral/devstral small 2507 | 143.16 | 3.26 | 20.9 | 24 |
| microsoft/phi-4-reasoning-plus | 70.75 | 5.67 | 15.8 | 15 |

Fan use is much higher for the Mistral 24B models and the Phi 4 model. Phi 4 is odd: I dumped the markdown comparison table in and asked for the best model, and it only took 7.25 s to first token, but throughput dropped to 3.86 tokens/s. With very limited empirical testing, the other models seem more consistent.

Thanks for the help, I’ll eventually get around to testing with ROCm also.

Those numbers are pretty solid, actually. I got the Ryzen AI HX 370 with 96 GB memory and 2 TB storage. I also have allocated the minimum VRAM (0.5 GB) in the BIOS and rely on UMA to allow the iGPU to use system memory as needed.
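One hedged way to confirm what the iGPU can actually address under that setup (the amdgpu sysfs names are standard, though the card index may differ):

cat /sys/class/drm/card*/device/mem_info_vram_total    # BIOS-carved VRAM, in bytes
cat /sys/class/drm/card*/device/mem_info_gtt_total     # GTT (UMA system memory the iGPU can use), in bytes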