[ci] Adding gfx1103 coverage by geomin12 · Pull Request #1854 · ROCm/TheRock · GitHub
Need more time to look at it.
What does this mean? ROCm has already worked fine on the FW16 since the beginning.
Fine… not really with the iGPU.
TheRock is an official AMD build (even if it is still in preview for now).
For example, with my FW16 (128 GB of RAM and no dGPU), a llama.cpp build with rocm-7.10.0a20251025 gets:
| model | size | params | test | t/s |
|---|---|---|---|---|
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1 | 13.43 ± 0.51 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1 | 10.68 ± 0.06 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp2 | 17.20 ± 1.25 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp3 | 23.14 ± 1.00 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp4 | 26.34 ± 0.92 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp8 | 32.19 ± 0.11 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp12 | 33.76 ± 0.39 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp16 | 34.30 ± 0.13 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp24 | 34.60 ± 0.73 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp32 | 36.51 ± 0.43 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp48 | 37.35 ± 1.42 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp64 | 38.03 ± 2.25 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp96 | 39.72 ± 0.56 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp128 | 40.03 ± 0.31 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp192 | 39.52 ± 1.03 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp256 | 39.00 ± 0.69 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp384 | 38.80 ± 0.68 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp512 | 37.32 ± 0.16 |
| model | size | params | test | t/s |
|---|---|---|---|---|
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1 | 17.69 ± 0.19 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp2 | 18.69 ± 0.57 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp3 | 24.74 ± 1.28 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp4 | 27.78 ± 0.31 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp8 | 37.84 ± 2.63 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp12 | 44.73 ± 3.98 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp16 | 52.02 ± 4.58 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp24 | 54.92 ± 2.74 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp32 | 60.35 ± 6.97 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp48 | 54.14 ± 0.80 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp64 | 114.08 ± 1.87 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp96 | 128.74 ± 1.28 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp128 | 146.75 ± 1.92 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp192 | 162.75 ± 2.25 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp256 | 184.04 ± 1.42 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp384 | 202.39 ± 0.15 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp512 | 216.78 ± 1.26 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp768 | 230.29 ± 0.72 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1024 | 242.97 ± 1.37 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1536 | 249.27 ± 0.29 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp2048 | 251.07 ± 1.70 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp3072 | 238.61 ± 0.19 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp4096 | 238.47 ± 0.17 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | tg16 | 17.83 ± 0.06 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp512+tg64 | 96.45 ± 0.18 |
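The t/s columns above report mean ± standard deviation across llama-bench repetitions. A minimal Python sketch of that aggregation (the sample timings below are made up for illustration):

```python
import statistics

def summarize(tokens_per_sec):
    """Aggregate per-repetition throughput the way llama-bench reports it:
    mean plus/minus sample standard deviation."""
    mean = statistics.mean(tokens_per_sec)
    stdev = statistics.stdev(tokens_per_sec) if len(tokens_per_sec) > 1 else 0.0
    return f"{mean:.2f} ± {stdev:.2f}"

# Hypothetical repetitions of one pp512 run
runs = [37.20, 37.45, 37.31]
print(summarize(runs))  # 37.32 ± 0.13
```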
But it can be faster if someone takes the time to write an optimized backend. For example, with only the CPU, the ik_llama.cpp fork gets:
| model | size | params | test | t/s |
|---|---|---|---|---|
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp1 | 12.74 ± 0.53 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp2 | 20.43 ± 0.47 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp3 | 24.47 ± 1.33 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp4 | 29.05 ± 0.51 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp8 | 39.80 ± 1.77 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp12 | 43.89 ± 1.01 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp16 | 48.19 ± 0.14 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp24 | 50.36 ± 1.30 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp32 | 57.38 ± 0.30 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp48 | 69.28 ± 1.89 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp64 | 76.33 ± 4.12 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp96 | 87.84 ± 2.44 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp128 | 97.41 ± 2.51 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp192 | 107.12 ± 1.62 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp256 | 116.30 ± 2.83 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp384 | 124.47 ± 2.08 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp512 | 126.85 ± 1.06 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp768 | 136.04 ± 2.22 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp1024 | 138.26 ± 1.67 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp1536 | 138.63 ± 1.32 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp2048 | 136.38 ± 0.79 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp3072 | 131.53 ± 0.57 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp4096 | 123.31 ± 1.45 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | tg16 | 13.39 ± 1.39 |
| gpt-oss ?B MXFP4 - 4.25 bpw | 59.02 GiB | 116.83 B | pp512+tg64 | 62.91 ± 1.21 |
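Assuming the first table in this thread is the stock llama.cpp CPU backend, the prompt-processing gap at pp512 works out to roughly 3.4x; a quick check (pp512 values copied from the tables above):

```python
# pp512 throughput, copied from the tables in this thread
stock_cpu = 37.32   # stock llama.cpp, CPU only (assumed from the first table)
ik_cpu = 126.85     # ik_llama.cpp fork, CPU only

speedup = ik_cpu / stock_cpu
print(f"{speedup:.1f}x")  # 3.4x
```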
Ahh okay. I use the dGPU with ROCm. Never thought to try the iGPU lol.
I do not have the dGPU… and the iGPU can use all the RAM, so in my case I can run large models like gpt-oss-120B, etc. Pretty good for large MoE models.
Or Mistral-Nemo in FP16…
| model | size | params | test | t/s |
|---|---|---|---|---|
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp1 | 2.82 ± 0.00 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp1 | 2.81 ± 0.01 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp2 | 5.33 ± 0.17 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp3 | 7.54 ± 0.02 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp4 | 9.38 ± 0.00 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp8 | 15.66 ± 0.03 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp12 | 23.00 ± 0.31 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp16 | 30.82 ± 0.06 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp24 | 45.14 ± 0.08 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp32 | 58.77 ± 0.10 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp48 | 84.73 ± 0.38 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp64 | 106.35 ± 0.12 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp96 | 147.68 ± 0.22 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp128 | 173.62 ± 2.78 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp192 | 147.58 ± 0.67 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp256 | 171.93 ± 0.57 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp384 | 158.70 ± 3.12 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp512 | 157.58 ± 4.81 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp768 | 158.31 ± 3.40 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp1024 | 175.72 ± 11.71 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp1536 | 178.50 ± 2.54 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp2048 | 171.71 ± 0.17 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp3072 | 173.16 ± 0.58 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp4096 | 158.70 ± 7.36 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | tg16 | 2.67 ± 0.05 |
| Mistral-Nemo-Instruct-2407 | 22.81 GiB | 12.25 B | pp512+tg64 | 21.19 ± 0.22 |
Or Mistral Small… but the tg (token generation) speed is low.
Note: something looks wrong with my FW16, it only draws ~40 W and runs at low temps…
Seriously? How? The BIOS limits the maximum to 8 GB, so how are you able to use more?
Which BIOS are you on? 3.06/3.07 did that, and 3.05 would also do it in certain circumstances.
HIP (and llama.cpp with the GGML_CUDA_ENABLE_UNIFIED_MEMORY environment variable) can allocate in system RAM on Linux (hipHostMalloc); there is no such limit that way.
And since kernel 6.11 (or 6.12, I can't remember), AMD changed the driver so that on iGPUs device allocations can use VRAM+GTT on Linux (so no special code is needed to use host allocation).
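To see how much VRAM vs GTT (system RAM) the amdgpu driver actually exposes, you can read the standard amdgpu sysfs files; a small sketch (the card index may differ on your machine, and the paths only exist on Linux with amdgpu):

```python
from pathlib import Path

def read_mib(path):
    """Read an amdgpu sysfs memory counter (bytes) and convert to MiB.
    Returns None when the file does not exist (e.g. not an amdgpu system)."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text()) // (1024 * 1024)

vram = read_mib("/sys/class/drm/card0/device/mem_info_vram_total")
gtt = read_mib("/sys/class/drm/card0/device/mem_info_gtt_total")
print(f"VRAM: {vram} MiB, GTT: {gtt} MiB")
```

On an iGPU with a small BIOS carve-out, GTT is the number that matters, since that is the pool device allocations can spill into on recent kernels.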
About the 40 W, I don't know (I haven't benchmarked recently; I mostly use the Framework Desktop), but after a restart (not sure if it was needed… and for that run I changed the USB port…) I am back to 55/65 W.
With that (backend: ROCm):
| model | size | params | test | t/s |
|---|---|---|---|---|
| llama 13B F16 | 22.81 GiB | 12.25 B | pp1 | 3.26 ± 0.03 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp1 | 3.28 ± 0.01 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp2 | 6.42 ± 0.07 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp3 | 9.06 ± 0.07 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp4 | 11.35 ± 0.11 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp8 | 17.11 ± 0.07 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp12 | 25.65 ± 0.17 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp16 | 33.91 ± 0.24 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp24 | 49.08 ± 0.56 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp32 | 64.24 ± 0.33 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp48 | 92.70 ± 1.08 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp64 | 116.14 ± 0.88 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp96 | 161.09 ± 0.54 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp128 | 189.96 ± 0.85 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp192 | 159.92 ± 0.53 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp256 | 184.93 ± 0.48 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp384 | 168.24 ± 0.70 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp512 | 174.95 ± 2.95 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp768 | 171.82 ± 1.88 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp1024 | 186.54 ± 5.03 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp1536 | 192.09 ± 2.03 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp2048 | 184.93 ± 11.78 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp3072 | 187.04 ± 2.65 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp4096 | 177.66 ± 5.01 |
| llama 13B F16 | 22.81 GiB | 12.25 B | tg16 | 3.31 ± 0.00 |
| llama 13B F16 | 22.81 GiB | 12.25 B | pp512+tg64 | 25.63 ± 0.03 |
| model | size | params | test | t/s |
|---|---|---|---|---|
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1 | 18.60 ± 0.76 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1 | 18.83 ± 0.03 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp2 | 20.75 ± 0.21 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp3 | 27.65 ± 2.58 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp4 | 33.04 ± 1.56 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp8 | 50.08 ± 2.96 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp12 | 49.36 ± 0.97 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp16 | 60.69 ± 1.54 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp24 | 63.38 ± 3.18 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp32 | 71.41 ± 1.29 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp48 | 67.93 ± 5.06 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp64 | 124.51 ± 1.76 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp96 | 140.86 ± 2.72 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp128 | 161.33 ± 2.23 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp192 | 173.46 ± 2.72 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp256 | 199.94 ± 5.16 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp384 | 220.50 ± 4.32 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp512 | 241.97 ± 4.74 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp768 | 259.88 ± 3.90 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1024 | 270.64 ± 1.32 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp1536 | 275.05 ± 2.88 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp2048 | 282.40 ± 1.73 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp3072 | 271.99 ± 0.93 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp4096 | 270.66 ± 0.47 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | tg16 | 18.98 ± 0.12 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | pp512+tg64 | 104.82 ± 0.97 |
+15% gain… yes!
(BIOS 3.07.)