Llama.cpp/vLLM Toolboxes for LLM inference on Strix Halo

Thanks!
Like kyuz0 noticed, I can confirm that ROCm 6.4.4 is much faster than the current 7.0.1 release (and on par with TheRock builds).

| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B BF16 | 15.26 GiB | 8.19 B | ROCm 7.0.1 | 999 | 4096 | 1 | 0 | pp512 | 325.95 ± 0.22 |
| qwen3 8B BF16 | 15.26 GiB | 8.19 B | ROCm 6.4.4 | 999 | 4096 | 1 | 0 | pp512 | 1132.26 ± 2.42 |
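For anyone wanting to reproduce these numbers, a `llama-bench` invocation matching the table's settings would look roughly like this (the model filename is an assumption; the flags map to the ngl/n_ubatch/fa/mmap/test columns):

```shell
# Hypothetical reproduction of the benchmark above.
# -ngl 999     -> offload all layers to the GPU
# -ub 4096     -> n_ubatch, physical batch size
# -fa 1        -> flash attention enabled
# --mmap 0     -> memory mapping disabled
# -p 512 -n 0  -> run only the pp512 (prompt processing) test
llama-bench -m qwen3-8b-bf16.gguf -ngl 999 -ub 4096 -fa 1 --mmap 0 -p 512 -n 0
```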