FW13 AMD, Radeon AI PRO R9700, Razer Core X V2 eGPU, Fedora 44, kernel 6.19: ROCm and Vulkan performance and a working setup

I hope this will be useful for anyone who tries an eGPU with AMD Framework laptops.

Full setup:

  • Laptop: Framework Laptop 13, AMD 7040
  • RAM: 48 GB
  • GPU: Radeon AI PRO R9700 32 GB
  • Enclosure: Razer Core X V2
  • eGPU power supply: Thermaltake 750 W
  • Connection: top-left USB-C port
  • OS: Fedora 44, KDE
  • Linux kernel: 6.19

Kernel parameters (I have not yet had time to trim the extras, but I assume some of them can be removed without issues):

pci=hpmmioprefsize=128G,realloc=on amd_iommu=off thunderbolt.host_reset=0 amdgpu.runpm=0 amdgpu.pcie_gen_cap=0x4 amdgpu.vm_update_mode=3 amdgpu.lockup_timeout=10000,10000,10000,10000 amdgpu.gpu_recovery=0  
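On Fedora, one way to apply parameters like these is `grubby --update-kernel=ALL --args="..."` followed by a reboot. Below is a minimal sketch for checking afterwards that they actually reached the running kernel. `missing_params` is a hypothetical helper name, not part of any tool, and it only does an exact whole-token match against `/proc/cmdline`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every wanted kernel parameter that is absent
# from a kernel command line (exact whole-token match).
#   $1 = wanted parameters, space-separated
#   $2 = command line to check (defaults to the contents of /proc/cmdline)
missing_params() {
    local wanted="$1"
    local cmdline="${2:-$(cat /proc/cmdline)}"
    local p
    for p in $wanted; do
        case " $cmdline " in
            *" $p "*) ;;               # parameter is present
            *) printf '%s\n' "$p" ;;   # parameter is missing
        esac
    done
}

# The parameter list from this post.
WANTED='pci=hpmmioprefsize=128G,realloc=on amd_iommu=off thunderbolt.host_reset=0 amdgpu.runpm=0 amdgpu.pcie_gen_cap=0x4 amdgpu.vm_update_mode=3 amdgpu.lockup_timeout=10000,10000,10000,10000 amdgpu.gpu_recovery=0'

# To add them on Fedora (needs root, then reboot):
#   sudo grubby --update-kernel=ALL --args="$WANTED"

# After the reboot, list anything that did not make it onto the command line:
#   missing_params "$WANTED"
```

An empty result means every parameter is active. The same helper can be reused while removing the extras one at a time, to confirm which set the kernel actually booted with.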

llama.cpp was built locally (build b9016) from the official llama.cpp container files (llama.cpp/.devops at master in ggml-org/llama.cpp on GitHub).

Vulkan

podman run --rm -it --device /dev/kfd --device /dev/dri localhost/llama-cpp-vulkan:b9016 --bench -ngl 999 -fa 1

ROCm

podman run --rm -it -e ROCR_VISIBLE_DEVICES=1 --device /dev/kfd --device /dev/dri --ipc=host localhost/llama-cpp-rocm:b9016 --bench -ngl 999 -fa 1
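The two invocations differ only in the image name and the ROCm-specific options (`ROCR_VISIBLE_DEVICES` and `--ipc=host`). A small sketch of a wrapper that prints the command for either backend so it can be inspected before running. `bench_cmd` is a hypothetical name, and the flags are simply the ones from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the podman benchmark command for one backend.
# The flags mirror the commands in this post; only the image name and the
# ROCm-specific options differ between the two backends.
bench_cmd() {
    local backend="$1" extra=""
    case "$backend" in
        vulkan) ;;
        rocm)   extra="-e ROCR_VISIBLE_DEVICES=1 --ipc=host " ;;
        *)      echo "usage: bench_cmd vulkan|rocm" >&2; return 1 ;;
    esac
    printf 'podman run --rm -it %s--device /dev/kfd --device /dev/dri localhost/llama-cpp-%s:b9016 --bench -ngl 999 -fa 1\n' \
        "$extra" "$backend"
}

# Print the command first, then run it once it looks right:
#   bench_cmd rocm
#   eval "$(bench_cmd rocm)"
```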

Benchmarks

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | Vulkan | 999 | 1 | pp512 | 1510.08 ± 16.15 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | Vulkan | 999 | 1 | tg128 | 28.09 ± 0.15 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | ROCm | 999 | 1 | pp512 | 1782.84 ± 37.40 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | ROCm | 999 | 1 | tg128 | 29.88 ± 0.11 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 999 | 1 | pp512 | 141.96 ± 0.07 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 999 | 1 | tg128 | 223.23 ± 6.43 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | ROCm | 999 | 1 | pp512 | 13469.18 ± 1879.66 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | ROCm | 999 | 1 | tg128 | 183.70 ± 4.60 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | Vulkan | 999 | 1 | pp512 | 101.33 ± 0.67 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | Vulkan | 999 | 1 | tg128 | 199.11 ± 0.90 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | ROCm | 999 | 1 | pp512 | 12940.78 ± 78.43 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | ROCm | 999 | 1 | tg128 | 106.74 ± 1.09 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | Vulkan | 999 | 1 | pp512 | 9173.58 ± 11.28 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | Vulkan | 999 | 1 | tg128 | 147.55 ± 1.53 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | ROCm | 999 | 1 | pp512 | 11726.52 ± 889.51 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | ROCm | 999 | 1 | tg128 | 123.31 ± 0.49 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | Vulkan | 999 | 1 | pp512 | 4778.50 ± 16.40 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | Vulkan | 999 | 1 | tg128 | 117.70 ± 0.75 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | ROCm | 999 | 1 | pp512 | 6367.38 ± 267.15 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | ROCm | 999 | 1 | tg128 | 110.33 ± 0.50 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | Vulkan | 999 | 1 | pp512 | 5040.53 ± 164.22 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | Vulkan | 999 | 1 | tg128 | 89.76 ± 0.29 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | ROCm | 999 | 1 | pp512 | 7301.09 ± 329.32 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | ROCm | 999 | 1 | tg128 | 84.32 ± 0.24 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | Vulkan | 999 | 1 | pp512 | 4697.22 ± 147.91 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | Vulkan | 999 | 1 | tg128 | 75.74 ± 0.30 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | ROCm | 999 | 1 | pp512 | 6757.58 ± 381.84 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | ROCm | 999 | 1 | tg128 | 63.07 ± 0.10 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | Vulkan | 999 | 1 | pp512 | 4368.91 ± 382.59 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | Vulkan | 999 | 1 | tg128 | 86.44 ± 0.21 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | ROCm | 999 | 1 | pp512 | 6018.52 ± 312.61 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | ROCm | 999 | 1 | tg128 | 71.13 ± 0.14 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | Vulkan | 999 | 1 | pp512 | 2859.78 ± 442.46 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | Vulkan | 999 | 1 | tg128 | 71.74 ± 0.21 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | ROCm | 999 | 1 | pp512 | 3297.82 ± 112.40 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | ROCm | 999 | 1 | tg128 | 69.62 ± 0.15 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | Vulkan | 999 | 1 | pp512 | 2734.57 ± 521.28 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | Vulkan | 999 | 1 | tg128 | 57.32 ± 0.06 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 3127.03 ± 101.47 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 54.14 ± 0.12 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | Vulkan | 999 | 1 | pp512 | 2188.18 ± 337.89 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | Vulkan | 999 | 1 | tg128 | 57.18 ± 0.10 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | ROCm | 999 | 1 | pp512 | 2385.88 ± 48.51 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | ROCm | 999 | 1 | tg128 | 48.87 ± 0.03 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Vulkan | 999 | 1 | pp512 | 5053.65 ± 126.80 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Vulkan | 999 | 1 | tg128 | 115.33 ± 1.02 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | ROCm | 999 | 1 | pp512 | 5759.44 ± 61.50 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | ROCm | 999 | 1 | tg128 | 91.89 ± 2.62 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | pp512 | 3067.63 ± 512.73 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | tg128 | 106.24 ± 0.41 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | pp512 | 4210.16 ± 51.53 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | tg128 | 96.92 ± 0.39 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | Vulkan | 999 | 1 | pp512 | 1312.18 ± 62.28 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | Vulkan | 999 | 1 | tg128 | 34.43 ± 0.02 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | ROCm | 999 | 1 | pp512 | 1570.13 ± 18.79 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | ROCm | 999 | 1 | tg128 | 34.51 ± 0.13 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 999 | 1 | pp512 | 2616.68 ± 395.65 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 999 | 1 | tg128 | 57.51 ± 0.21 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | ROCm | 999 | 1 | pp512 | 2289.09 ± 6.00 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | ROCm | 999 | 1 | tg128 | 48.54 ± 0.28 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | Vulkan | 999 | 1 | pp512 | 2484.60 ± 487.31 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | Vulkan | 999 | 1 | tg128 | 54.11 ± 0.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 2707.23 ± 19.42 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 43.16 ± 0.28 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | Vulkan | 999 | 1 | pp512 | 894.04 ± 0.61 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | Vulkan | 999 | 1 | tg128 | 2.14 ± 0.00 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | ROCm | 999 | 1 | pp512 | 691.71 ± 3.23 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | ROCm | 999 | 1 | tg128 | 19.28 ± 0.01 |

I have no idea what is wrong with the bench of Qwen 3.6 27B (the models are 3.6, not 3.5), but in the CLI on Vulkan it produces the same 17-22 t/s. With about 50k tokens of prompt history (agentic work), generation dropped significantly, to roughly 3-5 tokens per second.
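
To compare the two backends row by row, the ROCm/Vulkan throughput ratio can be computed directly from the benchmark output. This is a sketch, assuming the results are available as a pipe-separated table with the eight columns listed above (llama-bench prints Markdown tables by default). `backend_ratio` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pipe-separated llama-bench rows on stdin and
# print the ROCm/Vulkan throughput ratio for each model and test.
backend_ratio() {
    LC_ALL=C awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            model = trim($2); backend = trim($5); tname = trim($8)
            tps = trim($9) + 0          # "96.92 ± 0.39" -> 96.92 (mean only)
            # Skip the header, the separator row, and any other backend.
            if (backend != "Vulkan" && backend != "ROCm") next
            key = model " " tname
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
            val[key, backend] = tps
        }
        END {
            # Report only pairs where both backends have a result.
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (val[k, "Vulkan"] > 0 && val[k, "ROCm"] > 0)
                    printf "%s: ROCm/Vulkan = %.2f\n", k, val[k, "ROCm"] / val[k, "Vulkan"]
            }
        }'
}

# Example with two rows from the results above.
# prints: gpt-oss 20B Q4_K - Medium tg128: ROCm/Vulkan = 0.91
backend_ratio <<'EOF'
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | tg128 | 106.24 ± 0.41 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | tg128 | 96.92 ± 0.39 |
EOF
```

A ratio above 1 means ROCm was faster for that test. The helper uses only the mean and ignores the ± spread, so pp512 rows with a large spread should be read with care.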

gpt-oss 120B (--n-cpu-moe 20)
Vulkan: Prompt: 2.5 t/s | Generation: 13.1 t/s

Qwen Coder Next 80B A3B (--n-cpu-moe 19)
ROCm: Prompt: 30.1 t/s | Generation: 15.3 t/s
Vulkan: Prompt: 27.3 t/s | Generation: 14.6 t/s

Qwen3.5-122B-A10B-Q3_K_M-00001-of-00003.gguf (--n-cpu-moe 25)
ROCm: Prompt: 12.8 t/s | Generation: 9.3 t/s
Vulkan: Prompt: 13.5 t/s | Generation: 7.8 t/s

The 24B-27B dense models and the 35B A3B MoE are pretty useful in pi.code. For Qwen Next 80B, I have not yet found the right balance that leaves enough context size.
