VRAM allocation for the 7840U frameworks

Anyone know how the VRAM allocation will work? Is it BIOS controlled? Or can it be controlled somehow at runtime (in Linux).

I ask because, for machine learning work, I’m curious to see the performance of the integrated GPU with a lot of VRAM allocated to it (as in >70 GB; I think the AMD Framework 13 can support up to 96 GB of DDR5).

10 Likes

The RAM should be dynamically allocated as needed between the CPU and iGPU (I think it might be limited to allocating at most half of system RAM to the iGPU, but I’m not sure about that).

Some portion can be dedicated to the iGPU because some programs can’t handle dynamically shared VRAM. However, most of those programs don’t need much dedicated VRAM, so a lot of laptops don’t allow more than 256–512 MB to be dedicated to the iGPU in the BIOS.

I guess the question for Framework will be: is it going to be possible to have a BIOS/firmware setting that lets us override the static portion and dedicate a lot more?
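In the meantime, on Linux you can at least inspect how the current split looks via the amdgpu driver’s sysfs nodes. A sketch, using the standard amdgpu paths (the card index varies per machine):

```shell
# mem_info_vram_total = the dedicated carveout (the BIOS "UMA" portion);
# mem_info_gtt_total  = the pool the driver can dynamically share with the iGPU.
for f in /sys/class/drm/card*/device/mem_info_vram_total \
         /sys/class/drm/card*/device/mem_info_gtt_total; do
  if [ -e "$f" ]; then
    printf '%s: %d MiB\n' "$f" "$(( $(cat "$f") / 1048576 ))"
  fi
done
```

On machines without an amdgpu device the loop simply prints nothing.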

I also found this: How to allocate more memory to my Ryzen APU's GPU? · Issue #2014 · RadeonOpenCompute/ROCm · GitHub

Your particular use case is on the enterprise side of things, not something an iGPU / RDNA3 is targeted at.

The A6000 has 48 GB of VRAM.

For the next step up, you’re looking at the H100 series (80 GB+):

2 Likes

Oh

I am aware, I dabble in localllama, which is what prompted me to make this post. More VRAM will at least let me load a 70B model (although with some quantizations I can load it on my 24GB desktop GPU today), but this in theory lets me test larger models, even if the tokens/second are a lot slower. My hope was that it’d be at least faster than using the CPU + RAM alone, if this were possible.

2 Likes

With unreleased products in general, no one knows the implemented limitations until the product is in hand. With that, I’d say, hope for the best, plan for the worst.

Update: There doesn’t appear to be a BIOS setting for this, as of 3.03.

3 Likes

There actually are two settings, but they’re not clearly labeled.

I’m not at my Framework laptop right now, but there was something along the lines of UMA_AUTO set as the default, with the alternative option UMA_GAME_OPTIMIZED. Auto dedicates 512 MB to VRAM (out of the 64 GB I had installed); Game Optimized dedicates 4 GB out of 64 GB. Not sure if there are other changes associated with this option in addition to the VRAM amount, though.

2 Likes

I’ve been wondering the same thing. Would be nice to have that support in the bios.

It does seem to be true that it will dynamically allocate RAM for the GPU as needed (or up to half of RAM, maybe?), but a lot of the tools query the available VRAM up front and fail.

On the plus side, enough people are interested in APUs that they’re working on shared ram issues. I saw this the other day:

The code is tiny, and I saw one report that it worked on a 6800HS, but it’s definitely a bit of a hack, and it’s specifically for PyTorch.
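For what it’s worth, these hacks generally take the shape of an LD_PRELOAD shim that intercepts ROCm’s device allocations and redirects them to host (i.e. dynamically shared) memory. The library and script names below are hypothetical, purely to illustrate the invocation pattern:

```shell
# Hypothetical: libshim.so would intercept hipMalloc() and call hipHostMalloc()
# instead, so tensors land in ordinary RAM that the iGPU can map, rather than in
# the small fixed VRAM carveout that the up-front capacity query reports.
LD_PRELOAD=./libshim.so HSA_OVERRIDE_GFX_VERSION=11.0.0 python train.py
```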

I wouldn’t want to count on Framework adding these options to the BIOS, so we’ll probably have to cross our fingers that the interest in shared RAM turns into more code soon.

I’m still unsure whether the max dynamic allocation is half of system RAM, or whether that also varies between individual laptops, and I already feel like I’m making a lot of tradeoffs to support Framework haha

1 Like

Max dynamic allocation is half, which IIRC is a limitation of either the drivers or the OS (can’t remember which). I’ve seen people mention there’s a workaround to allow more, although I haven’t found it.

Edit: Here’s some discussion about this (including how to override it on Linux).
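For reference, on Linux the override is usually done with module parameters rather than in the BIOS. A sketch for raising the shareable (GTT) pool to ~48 GiB on a 64 GB machine — the exact parameters and safe values depend on your kernel, so treat the numbers as illustrative:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages (both ~48 GiB here).
echo 'options amdgpu gttsize=49152'     | sudo tee    /etc/modprobe.d/amdgpu-gtt.conf
echo 'options ttm pages_limit=12582912' | sudo tee -a /etc/modprobe.d/amdgpu-gtt.conf
sudo update-initramfs -u   # or `dracut -f`, depending on the distro; then reboot
```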

1 Like

@Kyle_Reis
Would you have a link explaining how to make a program/app that uses dynamically shared VRAM?

It would be nice if we had more options for setting UMA in the BIOS (UMA_AUTO / UMA_GAME_OPTIMIZED).
It looks like there is a hard way to do it:

The A6000 and H100 are good for training and/or running large batch sizes of AI models (LLMs in this case), but for local inference of an LLM (i.e. batch size 1), only the memory really matters.

open-mixtral-8x7b works on a 16-core Ryzen 5950X (slowly). I can’t test it, but I think it would run at a good speed on Zen 4 (AMD Ryzen 9 7950X3D). So I’m pretty sure it could be even faster on the RDNA3 iGPU of the 7840 (U/HS)
(and maybe even better once AMD lets us use the NPU … :crossed_fingers:)

Another example with an “old” APU (pre-RDNA GPU) and the Stable Diffusion AI:
https://www.gabriel.urdhr.fr/2022/08/28/trying-to-run-stable-diffusion-on-amd-ryzen-5-5600g/#allocating-more-vram-to-the-igpu
It looks like some BIOSes allow reserving a user-defined amount of VRAM.

Indeed. And, for what it’s worth, we have a request to add that to the FW BIOS on the forums here: BIOS Feature Request: Add ability to specify UMA size on AMD APUs

By the way, llama.cpp now supports dynamic VRAM allocation on the APUs: ROCm AMD Unified Memory Architecture (UMA) handling by ekg · Pull Request #4449 · ggerganov/llama.cpp · GitHub
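For anyone wanting to try it, a build sketch with that option enabled — flag names are as of the PR’s merge (later llama.cpp versions renamed the build options), and the compiler paths assume a default /opt/rocm install:

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# LLAMA_HIP_UMA=ON makes the ROCm backend allocate from shared system memory
# instead of the fixed VRAM carveout.
cmake -S . -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON \
      -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
      -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
cmake --build build --config Release -j
```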

Unfortunately, it doesn’t look like any Stable Diffusion implementations do the same yet.

And somehow the GPU still performs worse than the CPU itself XD

Really? I didn’t try this llama.cpp version. Do you mind sharing the results you’ve seen?

The performance of StableDiffusion was much better, in my experience, when it used the iGPU than on the CPU alone. But, I could only use it within the UMA memory limits, of course.

It was a while ago and I didn’t store the results. I was playing with Llama 2 70B and got around 2 tokens/s on the CPU and a bit over 1 on the iGPU. I did verify that it was using the GPU; amdgpu_top showed full load and appropriate VRAM usage. I don’t really know what I’m doing, though.

Dynamic memory allocation did look like it worked.

I’ve also tried llama.cpp, and yeah, it does seem quite a bit slower on the GPU than on just the CPU on the 7840U.

On really large models the GPU crashed on me (I forget which kernel/driver versions I had, though); I only did a short test.

In theory, with 96 GB of memory I could run really large models, but they take a long time right now, and I haven’t really found a use case to explore this further.

I just ran some tests with a 7B model. The GPU version, compiled with the LLAMA_HIP_UMA=ON option, outperforms the CPU by an order of magnitude on prompt processing: ~172 t/s vs ~15 t/s (this was on battery, Power Save profile):

llama-bench using ROCm on the iGPU
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl | test       |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------- | ---------------: |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | ROCm       |  99 | pp 512     |    171.59 ± 1.90 |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | ROCm       |  99 | tg 128     |      8.58 ± 0.06 |

versus

llama-bench using on the CPU
$ llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
| model                          |       size |     params | backend    |    threads | test       |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---------: | ---------- | ---------------: |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | CPU        |          8 | pp 512     |     14.68 ± 0.53 |
| llama 7B Q8_0                  |   6.67 GiB |     6.74 B | CPU        |          8 | tg 128     |      5.77 ± 0.03 |

build: unknown (0)

Compiled without the UMA option, the model doesn’t fit into memory:

llama-bench using ROCm on the iGPU, no dynamic VRAM allocation
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 llama-bench -m models/7B/llama-2-7b-chat.Q8_0.gguf
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl | test       |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------- | ---------------: |
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 6695.84 MiB on device 0: cudaMalloc failed: out of memory
main: error: failed to load model 'models/7B/llama-2-7b-chat.Q8_0.gguf'

Didn’t try 70B yet - not sure it’ll fit at all. I “only” have 64GB total.
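For a rough sense of whether it would fit: weight size is roughly parameters × bits-per-weight / 8, ignoring the KV cache and runtime overhead. The bits-per-weight figures below are approximate for the GGUF quant formats:

```shell
# Back-of-the-envelope GGUF weight sizes for a ~70B-parameter model.
awk 'BEGIN {
  params = 70e9                                    # llama-2-70b, roughly
  printf "Q8_0:   ~%.0f GiB\n", params * 8.5 / 8 / 2^30
  printf "Q4_K_M: ~%.0f GiB\n", params * 4.8 / 8 / 2^30
}'
```

So a Q8_0 70B likely won’t fit in 64 GB alongside the OS, but a 4-bit quant should.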

1 Like