Llama.cpp/vLLM Toolboxes for LLM inference on Strix Halo

Thanks!
Like kyuz0 noticed, I can confirm that ROCm 6.4.4 is much faster than the current 7.0.1 release (and on par with TheRock builds).

| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B BF16 | 15.26 GiB | 8.19 B | ROCm 7.0.1 | 999 | 4096 | 1 | 0 | pp512 | 325.95 ± 0.22 |
| qwen3 8B BF16 | 15.26 GiB | 8.19 B | ROCm 6.4.4 | 999 | 4096 | 1 | 0 | pp512 | 1132.26 ± 2.42 |
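For anyone wanting to reproduce these numbers, a `llama-bench` invocation matching the table's settings would look roughly like this (the model filename is an assumption; the flags map to the ngl/n_ubatch/fa/mmap/test columns):

```shell
# Hypothetical reproduction of the benchmark above.
# -ngl 999     -> offload all layers to the GPU
# -ub 4096     -> n_ubatch, physical batch size
# -fa 1        -> flash attention enabled
# --mmap 0     -> memory mapping disabled
# -p 512 -n 0  -> run only the pp512 (prompt processing) test
llama-bench -m qwen3-8b-bf16.gguf -ngl 999 -ub 4096 -fa 1 --mmap 0 -p 512 -n 0
```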