I hope this will be useful for anyone who tries an eGPU with an AMD Framework laptop.
Full setup:
- Framework Laptop 13 AMD 7040
- RAM: 48GB
- GPU: Radeon AI PRO R9700 32GB
- Enclosure: Razer Core X V2
- eGPU power supply: Thermaltake 750W
- Connection: top-left USB-C port
- OS: Fedora 44, KDE
- Linux kernel: 6.19
Kernel params (I have not yet had time to eliminate the extras, but I assume some of them can be removed without issues):
pci=hpmmioprefsize=128G,realloc=on amd_iommu=off thunderbolt.host_reset=0 amdgpu.runpm=0 amdgpu.pcie_gen_cap=0x4 amdgpu.vm_update_mode=3 amdgpu.lockup_timeout=10000,10000,10000,10000 amdgpu.gpu_recovery=0
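When trimming this list one parameter at a time, it helps to confirm after each reboot which parameters the running kernel actually received. The following is a minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; the `WANTED` list and the `missing_params` helper are hypothetical names introduced here for illustration, and the values are copied from the line above.

```python
from pathlib import Path

# Parameters copied from the post; trim this list as you eliminate extras.
WANTED = [
    "pci=hpmmioprefsize=128G,realloc=on",
    "amd_iommu=off",
    "thunderbolt.host_reset=0",
    "amdgpu.runpm=0",
    "amdgpu.pcie_gen_cap=0x4",
    "amdgpu.vm_update_mode=3",
    "amdgpu.lockup_timeout=10000,10000,10000,10000",
    "amdgpu.gpu_recovery=0",
]


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the wanted parameters that are absent from the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    # Fall back to an empty command line on systems without /proc/cmdline.
    cmdline = path.read_text() if path.exists() else ""
    for param in missing_params(cmdline, WANTED):
        print("missing:", param)
```

The check is an exact token match, so a parameter set to a different value than the one listed is reported as missing too.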
llama.cpp build b9016, built locally from the official llama.cpp container files (llama.cpp/.devops at master, ggml-org/llama.cpp on GitHub)
Vulkan
podman run --rm -it --device /dev/kfd --device /dev/dri localhost/llama-cpp-vulkan:b9016 --bench -ngl 999 -fa 1
ROCm
podman run --rm -it -e ROCR_VISIBLE_DEVICES=1 --device /dev/kfd --device /dev/dri --ipc=host localhost/llama-cpp-rocm:b9016 --bench -ngl 999 -fa 1
Benchmarks
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | Vulkan | 999 | 1 | pp512 | 1510.08 ± 16.15 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | Vulkan | 999 | 1 | tg128 | 28.09 ± 0.15 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | ROCm | 999 | 1 | pp512 | 1782.84 ± 37.40 |
| falcon-h1 1.5B Q8_0 | 92.72 MiB | 91.13 M | ROCm | 999 | 1 | tg128 | 29.88 ± 0.11 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 999 | 1 | pp512 | 141.96 ± 0.07 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 999 | 1 | tg128 | 223.23 ± 6.43 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | ROCm | 999 | 1 | pp512 | 13469.18 ± 1879.66 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | ROCm | 999 | 1 | tg128 | 183.70 ± 4.60 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | Vulkan | 999 | 1 | pp512 | 101.33 ± 0.67 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | Vulkan | 999 | 1 | tg128 | 199.11 ± 0.90 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | ROCm | 999 | 1 | pp512 | 12940.78 ± 78.43 |
| llama 1B Q8_0 | 2.59 GiB | 2.61 B | ROCm | 999 | 1 | tg128 | 106.74 ± 1.09 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | Vulkan | 999 | 1 | pp512 | 9173.58 ± 11.28 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | Vulkan | 999 | 1 | tg128 | 147.55 ± 1.53 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | ROCm | 999 | 1 | pp512 | 11726.52 ± 889.51 |
| qwen3 1.7B Q8_0 | 1.70 GiB | 1.72 B | ROCm | 999 | 1 | tg128 | 123.31 ± 0.49 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | Vulkan | 999 | 1 | pp512 | 4778.50 ± 16.40 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | Vulkan | 999 | 1 | tg128 | 117.70 ± 0.75 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | ROCm | 999 | 1 | pp512 | 6367.38 ± 267.15 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | ROCm | 999 | 1 | tg128 | 110.33 ± 0.50 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | Vulkan | 999 | 1 | pp512 | 5040.53 ± 164.22 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | Vulkan | 999 | 1 | tg128 | 89.76 ± 0.29 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | ROCm | 999 | 1 | pp512 | 7301.09 ± 329.32 |
| phi3 3B Q8_0 | 3.80 GiB | 3.84 B | ROCm | 999 | 1 | tg128 | 84.32 ± 0.24 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | Vulkan | 999 | 1 | pp512 | 4697.22 ± 147.91 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | Vulkan | 999 | 1 | tg128 | 75.74 ± 0.30 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | ROCm | 999 | 1 | pp512 | 6757.58 ± 381.84 |
| gemma3 4B Q8_0 | 3.84 GiB | 3.88 B | ROCm | 999 | 1 | tg128 | 63.07 ± 0.10 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | Vulkan | 999 | 1 | pp512 | 4368.91 ± 382.59 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | Vulkan | 999 | 1 | tg128 | 86.44 ± 0.21 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | ROCm | 999 | 1 | pp512 | 6018.52 ± 312.61 |
| qwen3 4B Q8_0 | 3.98 GiB | 4.02 B | ROCm | 999 | 1 | tg128 | 71.13 ± 0.14 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | Vulkan | 999 | 1 | pp512 | 2859.78 ± 442.46 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | Vulkan | 999 | 1 | tg128 | 71.74 ± 0.21 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | ROCm | 999 | 1 | pp512 | 3297.82 ± 112.40 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | ROCm | 999 | 1 | tg128 | 69.62 ± 0.15 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | Vulkan | 999 | 1 | pp512 | 2734.57 ± 521.28 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | Vulkan | 999 | 1 | tg128 | 57.32 ± 0.06 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 3127.03 ± 101.47 |
| qwen35 9B Q4_K - Medium | 5.28 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 54.14 ± 0.12 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | Vulkan | 999 | 1 | pp512 | 2188.18 ± 337.89 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | Vulkan | 999 | 1 | tg128 | 57.18 ± 0.10 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | ROCm | 999 | 1 | pp512 | 2385.88 ± 48.51 |
| llama 13B Q4_K - Medium | 6.96 GiB | 12.25 B | ROCm | 999 | 1 | tg128 | 48.87 ± 0.03 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Vulkan | 999 | 1 | pp512 | 5053.65 ± 126.80 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Vulkan | 999 | 1 | tg128 | 115.33 ± 1.02 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | ROCm | 999 | 1 | pp512 | 5759.44 ± 61.50 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | ROCm | 999 | 1 | tg128 | 91.89 ± 2.62 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | pp512 | 3067.63 ± 512.73 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | tg128 | 106.24 ± 0.41 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | pp512 | 4210.16 ± 51.53 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | tg128 | 96.92 ± 0.39 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | Vulkan | 999 | 1 | pp512 | 1312.18 ± 62.28 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | Vulkan | 999 | 1 | tg128 | 34.43 ± 0.02 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | ROCm | 999 | 1 | pp512 | 1570.13 ± 18.79 |
| mistral3 24B Q4_0 | 12.56 GiB | 23.57 B | ROCm | 999 | 1 | tg128 | 34.51 ± 0.13 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 999 | 1 | pp512 | 2616.68 ± 395.65 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 999 | 1 | tg128 | 57.51 ± 0.21 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | ROCm | 999 | 1 | pp512 | 2289.09 ± 6.00 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | ROCm | 999 | 1 | tg128 | 48.54 ± 0.28 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | Vulkan | 999 | 1 | pp512 | 2484.60 ± 487.31 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | Vulkan | 999 | 1 | tg128 | 54.11 ± 0.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 2707.23 ± 19.42 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.60 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 43.16 ± 0.28 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | Vulkan | 999 | 1 | pp512 | 894.04 ± 0.61 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | Vulkan | 999 | 1 | tg128 | 2.14 ± 0.00 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | ROCm | 999 | 1 | pp512 | 691.71 ± 3.23 |
| qwen35 27B Q6_K | 20.97 GiB | 26.90 B | ROCm | 999 | 1 | tg128 | 19.28 ± 0.01 |
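
To compare the two backends row by row, the table can be reduced to a ROCm/Vulkan ratio per model and test. The sketch below assumes the llama-bench markdown layout shown above (eight columns, throughput as `mean ± stddev`); `parse_bench` and `rocm_over_vulkan` are hypothetical helper names, and the sample rows are the gpt-oss 20B results copied from the table.

```python
SAMPLE = """\
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | pp512 | 3067.63 ± 512.73 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | Vulkan | 999 | 1 | tg128 | 106.24 ± 0.41 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | pp512 | 4210.16 ± 51.53 |
| gpt-oss 20B Q4_K - Medium | 10.81 GiB | 20.91 B | ROCm | 999 | 1 | tg128 | 96.92 ± 0.39 |
"""


def parse_bench(markdown: str) -> dict:
    """Map (model, test) -> {backend: mean t/s} from llama-bench markdown rows."""
    results = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip anything that is not a data row: wrong width, header, separator.
        if len(cells) != 8 or cells[0] == "model" or set(cells[0]) <= {"-"}:
            continue
        model, backend, test, tps = cells[0], cells[3], cells[6], cells[7]
        mean = float(tps.split("±")[0])  # drop the standard deviation
        results.setdefault((model, test), {})[backend] = mean
    return results


def rocm_over_vulkan(results: dict) -> dict:
    """Return ROCm t/s divided by Vulkan t/s for every row pair that has both."""
    return {
        key: by_backend["ROCm"] / by_backend["Vulkan"]
        for key, by_backend in results.items()
        if {"ROCm", "Vulkan"} <= by_backend.keys()
    }


if __name__ == "__main__":
    for (model, test), ratio in rocm_over_vulkan(parse_bench(SAMPLE)).items():
        print(f"{model} {test}: ROCm/Vulkan = {ratio:.2f}")
```

A ratio above 1 means ROCm was faster for that row. Pasting the full table into `SAMPLE` makes the outliers easy to spot.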
I have no idea what is wrong with the Vulkan tg128 bench result for the Qwen 27B model (it is Qwen 3.6, not 3.5, despite the qwen35 label). In the CLI on Vulkan it produces the same 17-22 t/s, not the 2.14 t/s shown in the table. With about 50k tokens of prompt history (agentic work), generation dropped significantly, to approximately 3-5 t/s.
gpt-oss 120B (--n-cpu-moe 20)
Vulkan: Prompt: 2.5 t/s | Generation: 13.1 t/s
Qwen Coder Next 80b A3B --n-cpu-moe 19
ROCm: Prompt: 30.1 t/s | Generation: 15.3 t/s
Vulkan: Prompt: 27.3 t/s | Generation: 14.6 t/s
Qwen3.5-122B-A10B-Q3_K_M-00001-of-00003.gguf --n-cpu-moe 25
ROCm: Prompt: 12.8 t/s | Generation: 9.3 t/s
Vulkan: Prompt: 13.5 t/s | Generation: 7.8 t/s
The 24B-27B dense models and the 35B A3B MoE model are pretty useful in pi.code. For Qwen Next 80B I have not yet found the right balance that leaves enough context size.