This is also something I’ve been interested in, especially with news of the patch @Wrybill_Plover linked; so I popped the linux-mainline kernel onto my Arch install (currently 6.10rc3-1) and compiled llama.cpp from the current HEAD as of today (172c825). Notably, since I’m on the 6.10 release candidate, I did not use the HIP_UMA flag. All these runs were made in performance mode.
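For anyone who wants to reproduce the “performance mode” part: I’m describing the general mechanism below rather than transcribing my exact commands, so treat it as a sketch (power-profiles-daemon, the ACPI sysfs knob, and the CPU governor are all ways to get there).

```
# Assumption: "performance mode" means the platform power profile, set via
# power-profiles-daemon; the sysfs knob and CPU governor are equivalents.
powerprofilesctl set performance
# or, directly through sysfs:
echo performance | sudo tee /sys/firmware/acpi/platform_profile
# and pinning the CPU frequency governor (cpupower is packaged on Arch):
sudo cpupower frequency-set -g performance
```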
llama-bench on the iGPU
```
# compiled with `HSA_OVERRIDE_GFX_VERSION="11.0.0" make LLAMA_HIPBLAS=1 -j 8`
rufo@framework-linux (git)-[master]-% HSA_OVERRIDE_GFX_VERSION="11.0.0" ./llama-bench -m ~/Downloads/llama-2-7b-chat.Q8_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 780M, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl |         test |             t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----------: | --------------: |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | ROCm       |  99 |        pp512 |   259.26 ± 0.70 |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | ROCm       |  99 |        tg128 |    10.69 ± 0.13 |

build: 172c8256 (3145)
HSA_OVERRIDE_GFX_VERSION="11.0.0" ./llama-bench -m   73.24s user 1.50s system 100% cpu 1:14.63 total
```
CPU (OpenBLAS)
```
# compiled with `HSA_OVERRIDE_GFX_VERSION="11.0.0" make LLAMA_OPENBLAS=1 -j 8`
rufo@framework-linux (git)-[master]-% ./llama-bench -m ~/Downloads/llama-2-7b-chat.Q8_0.gguf
| model                          |       size |     params | backend    | threads |         test |             t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----------: | --------------: |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | BLAS       |       8 |        pp512 |    13.44 ± 0.07 |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | BLAS       |       8 |        tg128 |     6.99 ± 0.20 |

build: 172c8256 (3145)
./llama-bench -m ~/Downloads/llama-2-7b-chat.Q8_0.gguf  3987.82s user 11.98s system 1247% cpu 5:20.56 total
```
CPU (no flags)
```
# compiled with `make -j 8`
rufo@framework-linux (git)-[master]-% ./llama-bench -m ~/Downloads/llama-2-7b-chat.Q8_0.gguf
| model                          |       size |     params | backend    | threads |         test |             t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----------: | --------------: |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | CPU        |       8 |        pp512 |    44.67 ± 1.07 |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | CPU        |       8 |        tg128 |     7.72 ± 0.13 |

build: 172c8256 (3145)
./llama-bench -m ~/Downloads/llama-2-7b-chat.Q8_0.gguf  1535.49s user 9.35s system 794% cpu 3:14.55 total
```
So I’m observing quite a good speed bump on 6.10 even without the HIP_UMA flag enabled.
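If someone on a pre-6.10 kernel wants to compare against the UMA path, my understanding is that llama.cpp exposes it as a Makefile flag (LLAMA_HIP_UMA, which switches the HIP allocations to managed memory) — I didn’t rebuild with it for these numbers, so treat the exact flag name as my assumption for this revision:

```
# Assumption: LLAMA_HIP_UMA enables the HIP_UMA (managed-memory) path at this revision.
HSA_OVERRIDE_GFX_VERSION="11.0.0" make LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 -j 8
```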
EDIT: I noticed the other benches had a CPU backend and not BLAS, so I reran them with the default compiled backend. I may re-run them with clang using the command @Nils_Ponsard posted out of curiosity, but I have to run for the moment.
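(If anyone wants to try the clang run before I get back to it: I don’t have @Nils_Ponsard’s exact command in front of me, so this is only a generic sketch of overriding the compilers for a plain CPU build.)

```
# Generic clang rebuild -- not necessarily the exact command from the thread.
make clean
make CC=clang CXX=clang++ -j 8
```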
EDIT 2: Ugh, realized I had a rogue process running during the CPU run. Re-ran it with gcc and it came out about 50% faster… not going to bother with the OpenBLAS bench again, but I presume that would also be about 50% faster.
EDIT 3: clang seems to be about the same speed, at least with the flags make uses by default. Done with this set of experimentation for now.