AMD Strix Halo Llama.cpp Installation Guide for Fedora 42

Hi. Thanks for the guide! I got lost in the forest on my first try but got Qwen3 Q8 XL running on my 64GB desktop on the second. I wrote down my steps (some of which are unique to my setup) and want to share here:

  1. Install Fedora from a bootable USB drive, selecting it as the boot device in the BIOS. See the Framework website for the Fedora 43 install ( Fedora 43 Installation on the Framework Desktop - Framework Guides )
  2. From ( linux-docs/framework-desktop/Fedora-all.md at main · FrameworkComputer/linux-docs · GitHub ) : $ sudo dnf upgrade (then reboot)
  3. Install llama.cpp on Fedora ( AMD Strix Halo Llama.cpp Installation Guide for Fedora 42 )
    1. $ sudo grubby --update-kernel=ALL --args='amd_iommu=off amdgpu.gttsize=49152 ttm.pages_limit=12288000' (for 64 GB RAM; search Google for how to calculate ttm.pages_limit)
      1. Verify: $ sudo grubby --info=ALL | grep args
      2. Reboot
      3. Verify after reboot: $ cat /proc/cmdline
    2. In the BIOS, leave the memory allocated to the iGPU at its default, which is the 512 MB (0.5 GB) minimum
    3. Check if toolbox is installed: $ toolbox --version
    4. Add user to GPU groups:
      1. $ sudo usermod -aG video $USER
      2. $ sudo usermod -aG render $USER
    5. Choose and create a toolbox:
      1. Create some boxed backends, e.g.:
        1. $ toolbox create llama-rocm-6.4.4-rocwmma --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.4-rocwmma -- --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined
        2. $ toolbox create llama-vulkan-radv --image docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-radv -- --device /dev/dri --group-add video --security-opt seccomp=unconfined
    6. Enter the toolbox: $ toolbox enter llama-rocm-6.4.4-rocwmma
      1. Inside the toolbox, verify: $ llama-cli --list-devices
      2. ‘exit’ the toolbox
    7. Download a model:
      1. Create a models dir: $ mkdir -p ~/Development/ai/models
      2. Install pip: $ sudo dnf install -y python3-pip
      3. Install huggingface-cli: $ pip install --user "huggingface_hub[hf_transfer]"
      4. Make sure ~/.local/bin is in your PATH:
        1. $ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
        2. $ source ~/.bashrc
      5. Actually download the model, e.g.:
        1. $ HF_HUB_ENABLE_HF_TRANSFER=0 huggingface-cli download unsloth/Qwen3-30B-A3B-GGUF Qwen3-30B-A3B-UD-Q8_K_XL.gguf --local-dir ~/Development/ai/models/qwen3-30B-A3B-Q8_K_XL/
    8. Run the model
      1. $ toolbox enter llama-rocm-6.4.4-rocwmma
      2. $ llama-cli --no-mmap -ngl 999 -m ~/Development/ai/models/qwen3-30B-A3B-Q8_K_XL/Qwen3-30B-A3B-UD-Q8_K_XL.gguf
      3. ‘exit’ toolbox when done
    9. Should you want to return the memory allocations to their defaults (e.g. to play games or use other memory-intensive apps):
      1. $ sudo grubby --update-kernel=ALL --remove-args='amd_iommu=off amdgpu.gttsize ttm.pages_limit'
      2. Then reboot. To go back to using AI models, run step (3.1) again. You can switch back and forth this way.
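
On the ttm.pages_limit calculation mentioned in step (3.1): as far as I can tell, the value is a page count, so it is the GTT size in bytes divided by the page size. The sketch below assumes 4096-byte pages (compare with `getconf PAGESIZE`), and `pages_limit_for_mib` is just a made-up helper name, not part of any tool. Under that assumption 49152 MiB works out to 12582912 pages, while the 12288000 used above corresponds to 48000 MiB, a little under the GTT size.

```shell
#!/usr/bin/env bash
# Sketch: derive a ttm.pages_limit value from a GTT size given in MiB.
# Assumption: the limit is counted in kernel pages of 4096 bytes.

# pages_limit_for_mib is a hypothetical helper, not part of grubby or amdgpu.
pages_limit_for_mib() {
    local gtt_mib=$1
    local page_size=${2:-4096}   # bytes per page; override if yours differs
    echo $(( gtt_mib * 1024 * 1024 / page_size ))
}

# 48 GiB, matching the amdgpu.gttsize=49152 used in step (3.1)
pages_limit_for_mib 49152
```

Pass a different MiB figure for other RAM sizes, and keep amdgpu.gttsize and ttm.pages_limit describing roughly the same amount of memory.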
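
For steps (3.1) and (3.4), a quick way to confirm that the changes took effect is to look at the current session's groups and at /proc/cmdline. This is only a rough sketch: `has_word` is a made-up helper, and the check matches kernel arguments by name only, since the values depend on how much RAM you have. Group changes from usermod only show up in a session started after the change, so log out and back in (or reboot) first.

```shell
#!/usr/bin/env bash
# Sketch: check group membership and kernel arguments after the setup steps.

# has_word is a hypothetical helper: succeeds if $1 is one of the
# space-separated words in $2.
has_word() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Groups of the current login session
for group in video render; do
    if has_word "$group" "$(id -nG)"; then
        echo "group $group: ok"
    else
        echo "group $group: missing (log out and back in, or reboot)"
    fi
done

# Kernel arguments of the running kernel; '=' is turned into a space so
# that each argument name becomes a word of its own.
if [ -r /proc/cmdline ]; then
    cmdline=$(tr '=' ' ' < /proc/cmdline)
    for arg in amd_iommu amdgpu.gttsize ttm.pages_limit; do
        if has_word "$arg" "$cmdline"; then
            echo "kernel arg $arg: present"
        else
            echo "kernel arg $arg: not set"
        fi
    done
fi
```

After step (3.9) and a reboot, the three kernel arguments should be reported as not set.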