AMD Strix Halo (Ryzen AI Max+ 395) GPU LLM Performance Tests

On and off over the past couple of months I’ve been doing ML/AI/LLM testing on Strix Halo (Ryzen AI Max+ 395), specifically with the gfx1151 GPU, in my spare time.

I was given the go-ahead a while back to mention that this has been on pre-production Framework Desktop hardware. Since I only just recently finished this latest set of sweeps, and since release is, I assume, getting pretty close (nope, I don’t know anything, don’t ask, lol), I figured I’d share the most detailed/definitive LLM inference performance testing that’s been done on the Ryzen AI Max so far.

A few things that differentiate this from any prior testing I’ve seen:

  • Run on the latest software - Linux 6.15.5+ w/ the latest linux-firmware, BIOS/EC, TheRock ROCm nightly releases w/ gfx1151-targeted kernels, and recent llama.cpp builds compiled directly from source
  • Testing of multiple backends and flags including HIP w/ rocBLAS and hipBLASLt, Vulkan w/ 2^n batching tests for MoEs, multiple MoE and dense model architectures and quants
  • Full sweeps of pp (compute bound), tg (memory-bandwidth bound), and memory usage (w/ and w/o FA)

These tests all use llama-bench, so they are repeatable and statistically meaningful (the default of 5 runs per data point).
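As a rough illustration, here’s a minimal sketch of how a sweep like this can be driven. This is not my actual harness: the per-backend binary names and the model path are placeholders, and the real sweeps cover more flag combinations.

```python
# Minimal sweep driver sketch. Assumes llama-bench was built separately per backend
# and the binaries are on PATH under the (hypothetical) names below, and that MODEL
# points at a local GGUF file.
import itertools
import subprocess

MODEL = "/models/Qwen3-30B-A3B-UD-Q4_K_XL.gguf"        # placeholder path

backends = ["llama-bench-hip", "llama-bench-vulkan"]   # one build per backend (hypothetical names)
flash_attn = ["0", "1"]                                # with and without FA
batch_sizes = [2**n for n in range(5, 10)]             # 32..512, the 2^n batch sweep for MoEs

for binary, fa, b in itertools.product(backends, flash_attn, batch_sizes):
    cmd = [
        binary,
        "-m", MODEL,
        "-p", "512",        # pp512 (prompt processing)
        "-n", "128",        # tg128 (token generation)
        "-fa", fa,
        "-b", str(b),
        "-r", "5",          # llama-bench's default 5 repetitions, stated explicitly
        "-o", "md",         # markdown output for easy table building
    ]
    print(">>>", " ".join(cmd))
    subprocess.run(cmd, check=True)
```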

For those interested in specific models, the raw data and individual sweeps for each model are available (in chart and graph form) here: https://github.com/lhl/strix-halo-testing/tree/main/llm-bench

For everyone else, here are the topline results:

Strix Halo LLM Benchmark Results

All testing was done on pre-production Framework Desktop systems with an AMD Ryzen AI Max+ 395 (Strix Halo) / 128GB LPDDR5x-8000 configuration. (Thanks Nirav, Alexandru, and co!)

Exact testing/system details are in the results folders, but roughly these are running:

  • Close to production BIOS/EC
  • Relatively up-to-date kernels: 6.15.5-arch1-1/6.15.6-arch1-1
  • Recent TheRock/ROCm-7.0 nightly builds with Strix Halo (gfx1151) kernels
  • Recent llama.cpp builds (e.g. b5863 from 2025-07-10)

Just to get a ballpark on the hardware:

  • ~215 GB/s max GPU MBW out of a 256 GB/s theoretical (256-bit 8000 MT/s)
  • theoretical 59 FP16 TFLOPS (VOPD/WMMA) on RDNA 3.5 (gfx11); effective throughput is much lower (rough math sketched below)
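The rough math behind those two numbers is below. The CU count, boost clock, and per-CU FP16 rate are my assumptions about the gfx1151 iGPU, not measured values; they just reproduce the ballpark figures above.

```python
# Back-of-the-envelope for the two ballpark figures above.
bus_width_bits = 256          # 256-bit LPDDR5x bus
transfer_rate_mts = 8000      # LPDDR5x-8000
mbw_gbs = bus_width_bits / 8 * transfer_rate_mts / 1000
print(f"theoretical memory bandwidth: {mbw_gbs:.0f} GB/s")    # -> 256 GB/s

cus = 40                      # assumed CU count for the gfx1151 iGPU
clock_ghz = 2.9               # assumed boost clock
fp16_per_cu_per_clk = 512     # assumed FP16 FLOPS/CU/clock via VOPD/WMMA
tflops = cus * fp16_per_cu_per_clk * clock_ghz / 1000
print(f"theoretical FP16 compute: {tflops:.0f} TFLOPS")       # -> ~59 TFLOPS
```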

Results

Prompt Processing (pp) Performance

| Model Name | Architecture | Weights (B) | Active (B) | Backend | Flags | pp512 | tg128 | Memory (Max MiB) |
|---|---|---|---|---|---|---|---|---|
| Llama 2 7B Q4_0 | Llama 2 | 7 | 7 | Vulkan | | 998.0 | 46.5 | 4237 |
| Llama 2 7B Q4_K_M | Llama 2 | 7 | 7 | HIP | hipBLASLt | 906.1 | 40.8 | 4720 |
| Shisa V2 8B i1-Q4_K_M | Llama 3 | 8 | 8 | HIP | hipBLASLt | 878.2 | 37.2 | 5308 |
| Qwen 3 30B-A3B UD-Q4_K_XL | Qwen 3 MoE | 30 | 3 | Vulkan | fa=1 | 604.8 | 66.3 | 17527 |
| Mistral Small 3.1 UD-Q4_K_XL | Mistral 3 | 24 | 24 | HIP | hipBLASLt | 316.9 | 13.6 | 14638 |
| Hunyuan-A13B UD-Q6_K_XL | Hunyuan MoE | 80 | 13 | Vulkan | fa=1 | 270.5 | 17.1 | 68785 |
| Llama 4 Scout UD-Q4_K_XL | Llama 4 MoE | 109 | 17 | HIP | hipBLASLt | 264.1 | 17.2 | 59720 |
| Qwen 3 32B Q8_0 | Qwen 3 | 32 | 32 | HIP | hipBLASLt | 226.1 | 6.4 | 33683 |
| Shisa V2 70B i1-Q4_K_M | Llama 3 | 70 | 70 | HIP | rocWMMA | 94.7 | 4.5 | 41522 |
| dots1 UD-Q4_K_XL | dots1 MoE | 142 | 14 | Vulkan | fa=1 b=256 | 63.1 | 20.6 | 84077 |

Text Generation (tg) Performance

| Model Name | Architecture | Weights (B) | Active (B) | Backend | Flags | pp512 | tg128 | Memory (Max MiB) |
|---|---|---|---|---|---|---|---|---|
| Qwen 3 30B-A3B UD-Q4_K_XL | Qwen 3 MoE | 30 | 3 | Vulkan | b=256 | 591.1 | 72.0 | 17377 |
| Llama 2 7B Q4_K_M | Llama 2 | 7 | 7 | Vulkan | fa=1 | 620.9 | 47.9 | 4463 |
| Llama 2 7B Q4_0 | Llama 2 | 7 | 7 | Vulkan | fa=1 | 1014.1 | 45.8 | 4219 |
| Shisa V2 8B i1-Q4_K_M | Llama 3 | 8 | 8 | Vulkan | fa=1 | 614.2 | 42.0 | 5333 |
| dots1 UD-Q4_K_XL | dots1 MoE | 142 | 14 | Vulkan | fa=1 b=256 | 63.1 | 20.6 | 84077 |
| Llama 4 Scout UD-Q4_K_XL | Llama 4 MoE | 109 | 17 | Vulkan | fa=1 b=256 | 146.1 | 19.3 | 59917 |
| Hunyuan-A13B UD-Q6_K_XL | Hunyuan MoE | 80 | 13 | Vulkan | fa=1 b=256 | 223.9 | 17.1 | 68608 |
| Mistral Small 3.1 UD-Q4_K_XL | Mistral 3 | 24 | 24 | Vulkan | fa=1 | 119.6 | 14.3 | 14540 |
| Qwen 3 32B Q8_0 | Qwen 3 | 32 | 32 | Vulkan | fa=1 | 101.8 | 6.4 | 33886 |
| Shisa V2 70B i1-Q4_K_M | Llama 3 | 70 | 70 | Vulkan | fa=1 | 26.4 | 5.0 | 41456 |
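Since tg is memory-bandwidth bound, you can sanity-check these numbers by multiplying tokens/s by the bytes of weights read per token. Using the GGUF file size as a proxy for bytes/token is a simplification (it ignores the KV cache and any overlap), and the ~3.6 GB figure below is an assumed file size, but it lands in the right neighborhood of the ~215 GB/s measured peak:

```python
# Rough effective-bandwidth estimate from a tg result (simplified: weights only).
def effective_bandwidth_gbs(tg_tokens_per_s: float, weights_gb: float) -> float:
    # Every generated token reads (roughly) the full set of active weights once.
    return tg_tokens_per_s * weights_gb

# Llama 2 7B Q4_0: ~3.6 GB of weights (assumed), 45.8 t/s tg128 from the table above.
print(f"{effective_bandwidth_gbs(45.8, 3.6):.0f} GB/s")  # ~165 GB/s of ~215 GB/s peak
```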

Testing Notes

The best overall backend and flags were chosen for each model family tested. You can see that oftentimes the best backend for prefill vs. token generation differs. Full results for each model (including pp/tg graphs across context lengths for all tested backend variations) are available in their respective folders, since which backend performs best will depend on your exact use case.

There’s still a lot of performance left on the table, especially for pp. Since these results should be close to optimal as of when they were tested, I might add dates to the table (adding kernel, ROCm, and llama.cpp build numbers might be a bit much).

For additional discussion/reference:


I’m curious about this comment, @lhl. Can you expand a little more, please?

And, thank you! A great post and wonderfully helpful information. Much appreciated!

You can see from my mamf-finder and hgemm results that perf can be extremely low for different shapes: Strix Halo

But actually, if you take a look at this issue I’ve filed: [Issue]: gfx1151 rocBLAS/hipBLAS performance regression vs gfx1100 code path · Issue #4748 · ROCm/ROCm · GitHub

In some tests you can see that using the gfx1100 kernels can be many times faster than the gfx1151 kernels:

  • gfx1100 rocBLAS has 2.5-6X the performance of gfx1151 rocBLAS
  • gfx1100 rocBLAS is 1.5-3X faster than gfx1151 hipBLASLt

That’s free real estate.
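If you want to try the gfx1100 code path yourself, one approach is to re-run the same benchmark with HSA_OVERRIDE_GFX_VERSION set so ROCm loads the gfx1100 kernels. Whether that's representative on your particular stack is an assumption on my part; the actual measured numbers are in the linked issue, and the model path below is a placeholder.

```python
# Sketch: run the same llama-bench command twice, once on the native gfx1151 path
# and once with HSA_OVERRIDE_GFX_VERSION forcing the gfx1100 kernels, then compare.
import os
import subprocess

cmd = ["llama-bench", "-m", "/models/llama-2-7b.Q4_0.gguf", "-r", "5"]  # placeholder model path

subprocess.run(cmd, check=True)                            # native gfx1151 code path

env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION="11.0.0")  # pretend to be gfx1100
subprocess.run(cmd, env=env, check=True)                   # forced gfx1100 code path
```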


That’s very helpful. Thanks again @lhl
