I just ran some tests with a 7B model. The GPU version compiled with the LLAMA_HIP_UMA=ON
option outperforms the CPU by an order of magnitude in prompt processing, ~172 t/s vs ~15 t/s, while text generation is closer, ~8.6 t/s vs ~5.8 t/s (this is on battery, Power Save profile):
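For reference, the build was roughly like this (a sketch; the compiler paths assume a standard ROCm install under /opt/rocm, adjust for your setup):

```shell
# Build llama.cpp with the ROCm/HIP backend and unified memory enabled.
# LLAMA_HIP_UMA=ON lets the iGPU allocate model weights from system RAM
# instead of being limited to the small dedicated VRAM carve-out.
cmake -S . -B build \
  -DLLAMA_HIPBLAS=ON \
  -DLLAMA_HIP_UMA=ON \
  -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
  -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
cmake --build build --config Release -j
```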
llama-bench using ROCm on the iGPU
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------- | ---------------: |
| llama 7B Q8_0 | 6.67 GiB | 6.74 B | ROCm | 99 | pp 512 | 171.59 ± 1.90 |
| llama 7B Q8_0 | 6.67 GiB | 6.74 B | ROCm | 99 | tg 128 | 8.58 ± 0.06 |
versus
llama-bench on the CPU
$ llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---------: | ---------- | ---------------: |
| llama 7B Q8_0 | 6.67 GiB | 6.74 B | CPU | 8 | pp 512 | 14.68 ± 0.53 |
| llama 7B Q8_0 | 6.67 GiB | 6.74 B | CPU | 8 | tg 128 | 5.77 ± 0.03 |
build: unknown (0)
Compiled without the UMA option, the model doesn’t fit into the iGPU’s available VRAM:
llama-bench using ROCm on the iGPU, no dynamic VRAM allocation
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------- | ---------------: |
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 6695.84 MiB on device 0: cudaMalloc failed: out of memory
main: error: failed to load model 'models/7B/llama-2-7b-chat.Q8_0.gguf'
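A possible workaround without UMA (untested here) would be to offload only part of the layers to the iGPU so the allocation fits in the available VRAM, e.g.:

```shell
# Offload only some layers to the iGPU; the rest run on the CPU.
# The layer count (-ngl 16) is a guess — tune it to whatever fits in VRAM.
HSA_OVERRIDE_GFX_VERSION=11.0.0 llama-bench \
  -m models/7B/llama-2-7b-chat.Q8_0.gguf -ngl 16
```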
Haven’t tried 70B yet - not sure it’ll fit at all, since I “only” have 64 GB of RAM total.