Hi. Thanks for the guide! I got lost in the forest on my first try but got Qwen3 Q8 XL running on my 64GB desktop on the second. I wrote down my steps (some of which are unique to my setup) and want to share them here:
- Install Fedora 43 from a bootable USB stick (select it as the boot device in the BIOS). See the Framework website for the install guide (Fedora 43 Installation on the Framework Desktop - Framework Guides)
- From (linux-docs/framework-desktop/Fedora-all.md at main · FrameworkComputer/linux-docs · GitHub): $ sudo dnf upgrade (then reboot)
- Install llama.cpp on Fedora (AMD Strix Halo Llama.cpp Installation Guide for Fedora 42)
- $ sudo grubby --update-kernel=ALL --args='amd_iommu=off amdgpu.gttsize=49152 ttm.pages_limit=12288000' (for 64 GB RAM; search Google for how to calculate ttm.pages_limit)
- Verify: $ sudo grubby --info=ALL | grep args
- Reboot
- Verify after reboot: $ cat /proc/cmdline
- The BIOS setting for memory allocated to the iGPU should be left at the default: the 512 MB (0.5 GB) minimum
- Check if toolbox is installed: $ toolbox --version
- Add user to GPU groups:
- $ sudo usermod -aG video $USER
- $ sudo usermod -aG render $USER
- Choose and create a toolbox:
- Create some boxed backends, e.g.:
- $ toolbox create llama-rocm-6.4.4-rocwmma --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.4-rocwmma -- --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined
- $ toolbox create llama-vulkan-radv --image docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-radv -- --device /dev/dri --group-add video --security-opt seccomp=unconfined
- Enter the toolbox: $ toolbox enter llama-rocm-6.4.4-rocwmma
- Inside the toolbox, verify: $ llama-cli --list-devices
- ‘exit’ the toolbox
- Download a model:
- Create a models dir: $ mkdir -p ~/Development/ai/models
- Install pip: $ sudo dnf install -y python3-pip
- Install huggingface-cli: $ pip install --user "huggingface_hub[hf_transfer]"
- Make sure ~/.local/bin is in your PATH:
- $ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
- $ source ~/.bashrc
- Actually download the model, e.g.:
- $ HF_HUB_ENABLE_HF_TRANSFER=0 huggingface-cli download unsloth/Qwen3-30B-A3B-GGUF Qwen3-30B-A3B-UD-Q8_K_XL.gguf --local-dir ~/Development/ai/models/qwen3-30B-A3B-Q8_K_XL/
- Run the model
- $ toolbox enter llama-rocm-6.4.4-rocwmma
- $ llama-cli --no-mmap -ngl 999 -m ~/Development/ai/models/qwen3-30B-A3B-Q8_K_XL/Qwen3-30B-A3B-UD-Q8_K_XL.gguf
- ‘exit’ toolbox when done
- Should you want to return the memory allocations to their defaults (e.g., to play games or use other memory-intensive apps):
- $ sudo grubby --update-kernel=ALL --remove-args='amd_iommu=off amdgpu.gttsize ttm.pages_limit'
- Then reboot. To go back to using AI models, run the grubby --args command from earlier again and reboot. You can switch back and forth as often as you like:
- $ sudo grubby --update-kernel=ALL --args='amd_iommu=off amdgpu.gttsize=49152 ttm.pages_limit=12288000' (for 64 GB RAM; search Google for how to calculate ttm.pages_limit)
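
On the ttm.pages_limit calculation: as far as I understand it, amdgpu.gttsize is given in MiB and ttm.pages_limit is counted in memory pages, so both can be derived from a single GTT budget. Here is a minimal sketch of that arithmetic, assuming 4 KiB pages (check with `getconf PAGESIZE`). `gtt_args` is a made-up helper name for illustration, not part of grubby or the kernel:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the two kernel arguments from a GTT budget in GiB.
# Assumes 4 KiB memory pages; verify on your system with: getconf PAGESIZE
gtt_args() {
    local gib="$1"
    local mib=$(( gib * 1024 ))          # amdgpu.gttsize is expressed in MiB
    local pages=$(( mib * 1024 / 4 ))    # MiB -> KiB, then 4 KiB per page
    echo "amdgpu.gttsize=${mib} ttm.pages_limit=${pages}"
}

# Example: a 48 GiB GTT budget on a 64 GB machine
gtt_args 48
```

For 48 GiB this prints amdgpu.gttsize=49152 ttm.pages_limit=12582912. The 12288000 used in the steps above is slightly lower (it works out to 48000 MiB worth of 4 KiB pages), so treat either value as a starting point rather than an exact requirement.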