iGPU VRAM - How much can be assigned?

Hey folks, I ordered the entry-level Framework Desktop with the Ryzen AI Max 385 and 32 GB of RAM as a mini gaming PC for our living room. Normally I'd say that 12 to 16 GB of VRAM and 16 to 20 GB of system RAM fit well for mid-level gaming at Full HD. Now I'm wondering whether it is actually possible to assign that much "dedicated" VRAM to the iGPU, or whether there are restrictions.

Does anyone have an idea?

1 Like

The Strix Halo APU uses a unified memory model, much like Apple's M-series chips.

The amount of RAM that can be allocated to the iGPU is variable, from as low as 1 GB (citation needed, but I'm pretty sure that's the low end of the range) up to the maximum available to the system. In practice, my understanding is that you can assign up to "system memory minus 8 GB" to the GPU under Windows, and rather more under Linux when not running a desktop environment. (Citation needed; this is just my understanding from video reviews and articles about the Framework Desktop and other Strix Halo devices.)

So a 16/16GB split as you suggest should be more than achievable.

I’ve recently purchased Crucial 128GB Kit (64GBx2) DDR5-5600 SODIMM - CT2K64G56C46S5 for my FW16.

The BIOS has reserved about 4.36 GB (including the 2 GB VRAM carve-out), which leaves me with about 123.64 GB, of which I've allocated 122 GB.

My guess is that (on Linux; I'm not sure about Windows, as I don't use it) all of it can be assigned, assuming Strix Halo uses the same memory-architecture concept as Phoenix.

Using Gentoo ( Linux 6.16.0 ) with KDE Plasma 6.3.6 Wayland

We listed some information on this in the Machine Learning tab for Framework Desktop: https://frame.work/desktop?tab=machine-learning

1 Like

Thanks for pointing to the site. I wasn't aware of it. So up to 24 GB of VRAM can be assigned to the iGPU on the basic machine with its 32 GB of RAM under Windows. That sounds nice.

If you are doing LLM work or using ROCm, you don't need VRAM, you need GTT, which can go up to 96 GB without any BIOS setting needed.

I don't believe this is properly documented anywhere, but on Linux, amdgpu.gttsize has historically been the way to assign GTT on AMD APUs; that parameter is now deprecated. The appropriate way to assign things now is via TTM.

In the BIOS, GART (base VRAM) can be set to 512 MB or some other low value, and you can use a file like /etc/modprobe.d/amdgpu_llm_optimized.conf to load module options. Here's an example setting the max GTT pool to 120 GB (with 60 GB preallocated):

# Maximize GTT for LLM usage on a 128 GB UMA system
options amdgpu gttsize=120000
# pages_limit: 120 GiB in 4 KiB pages = 31457280
options ttm pages_limit=31457280
# page_pool_size: 60 GiB in 4 KiB pages = 15728640
options ttm page_pool_size=15728640

Note: the ttm options are specified in 4 KiB pages, if you want to do your own math.
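If it helps, the page math above can be double-checked with a quick sketch (the 4 KiB page size and GiB targets are the assumptions here):

```python
PAGE_SIZE = 4096  # TTM page size in bytes (4 KiB)

def gib_to_pages(gib: int) -> int:
    """Convert a GiB target into a TTM page count."""
    return gib * 1024**3 // PAGE_SIZE

print(gib_to_pages(120))  # pages_limit for a 120 GiB GTT cap -> 31457280
print(gib_to_pages(60))   # page_pool_size for a 60 GiB pool  -> 15728640
```

Those two results match the pages_limit and page_pool_size values in the conf above.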

Also, while it's not a massive difference, I've found a 5-10% memory-bandwidth boost when setting amd_iommu=off (tested with memtest_vulkan and ROCm Bandwidth Test).
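For anyone who hasn't set a kernel parameter before, here's roughly what that looks like on a GRUB-based distro (file paths and the update command vary by distro, so treat this as a sketch):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=off"

# then regenerate the config, e.g.:
#   sudo update-grub                               (Debian/Ubuntu)
#   sudo grub-mkconfig -o /boot/grub/grub.cfg      (Arch/Gentoo)
```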

Some additional real-world usage notes if you are using something llama.cpp based for inferencing:

  • By default llama.cpp sets mmap=on, which is marginally bad for Vulkan model loading once you're over 1/2 memory usage, but is catastrophic for ROCm model loading. You should use the appropriate command option (it differs between llama.cpp binaries) to disable mmap in general on Strix Halo.
  • You should also be sure to use --ngl 99 (or 999 if the model has more than 99 layers). While memory is shared, 1) CPU memory bandwidth is basically half of the GPU's due to Strix Halo's memory architecture (I've heard this is down to the GMI link design, but the practical upshot is much lower CPU vs. GPU memory bandwidth), and 2) having the model in GPU address space is of course more efficient for GPU inferencing.
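Putting both points together, an invocation might look something like this (binary name and model path are placeholders; check your build's --help, since flag names have shifted between llama.cpp releases):

```
# --no-mmap : disable mmap model loading (the ROCm-catastrophic default)
# -ngl 99   : offload all layers to the GPU
llama-server -m /path/to/model.gguf --no-mmap -ngl 99
```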
8 Likes

Yes and no.

VRAM is dedicated memory for the iGPU. Yes, it surprised me too: it's faster than GTT. The reasons are simple: because it's allocated at system startup, it is one linear chunk, it is generally mapped as a one-to-one aperture, and the APU memory access subsystem is specifically tuned for GPU usage, including bypassing the cache.

GTT, on the other hand, even when used by the iGPU, is still “mapped” memory. For all practical purposes, this means it must translate addresses and that it may disrupt the TLB when the CPU uses it concurrently. The benefit is that if you’re not using your desktop primarily as a low-power server or LLM box sitting in the corner, GTT allows the CPU to use all that memory when the GPU isn’t using it.

1 Like

This may require some empirical testing to see if there's any impact on modern AMD APUs like the Strix Halo APU. A quick search (and a more thorough follow-up) suggests that if the mapping isn't being flushed and there isn't fragmentation, there probably isn't a perf difference.

I'm running sweeps on my box at the moment, so I'll leave it for someone else to test if they care (although in practice, since you can't expand GART to where GTT goes, even if there is a difference, the trade-off is that you can't access full memory from your GPU in that case).

Here’s my memtest_vulkan results:

1: Bus=0xC3:00 DevId=0x1586   79GB Radeon 8060S Graphics (RADV GFX1151)
Tester worker logging started at 2025-07-29T13:37:58.308548Z
Standard 5-minute test of 1: Bus=0xC3:00 DevId=0x1586   79GB Radeon 8060S Graphics (RADV GFX1151)
      1 iteration. Passed  0.6668 seconds  written:   72.5GB 228.6GB/sec        checked:   76.1GB 217.7GB/sec
      3 iteration. Passed  1.3345 seconds  written:  145.0GB 229.4GB/sec        checked:  152.2GB 216.7GB/sec
     11 iteration. Passed  5.3006 seconds  written:  580.0GB 231.3GB/sec        checked:  609.0GB 218.0GB/sec
     57 iteration. Passed 30.6377 seconds  written: 3335.0GB 229.5GB/sec        checked: 3501.8GB 217.4GB/sec
    102 iteration. Passed 30.0156 seconds  written: 3262.5GB 229.4GB/sec        checked: 3425.6GB 216.9GB/sec
    147 iteration. Passed 30.0060 seconds  written: 3262.5GB 229.2GB/sec        checked: 3425.6GB 217.2GB/sec
    192 iteration. Passed 30.0038 seconds  written: 3262.5GB 229.3GB/sec        checked: 3425.6GB 217.2GB/sec
    237 iteration. Passed 30.0072 seconds  written: 3262.5GB 229.4GB/sec        checked: 3425.6GB 217.0GB/sec
    283 iteration. Passed 30.6680 seconds  written: 3335.0GB 229.5GB/sec        checked: 3501.8GB 217.0GB/sec
    328 iteration. Passed 30.0094 seconds  written: 3262.5GB 229.2GB/sec        checked: 3425.6GB 217.1GB/sec
    373 iteration. Passed 30.0128 seconds  written: 3262.5GB 229.3GB/sec        checked: 3425.6GB 217.0GB/sec
    418 iteration. Passed 30.0270 seconds  written: 3262.5GB 229.3GB/sec        checked: 3425.6GB 216.8GB/sec

BTW, that’s with amd_iommu=off. You do take a measurable hit if you don’t set that:

1: Bus=0xC3:00 DevId=0x1586   79GB Radeon 8060S Graphics
Tester worker logging started at 2025-07-29T14:31:22.839542Z
Standard 5-minute test of 1: Bus=0xC3:00 DevId=0x1586   79GB Radeon 8060S Graphics
      1 iteration. Passed  0.6745 seconds  written:   68.9GB 214.7GB/sec        checked:   72.5GB 204.9GB/sec
      3 iteration. Passed  1.3518 seconds  written:  137.8GB 215.4GB/sec        checked:  145.0GB 203.6GB/sec
     11 iteration. Passed  5.4121 seconds  written:  551.0GB 214.9GB/sec        checked:  580.0GB 203.6GB/sec
     56 iteration. Passed 30.4277 seconds  written: 3099.4GB 215.2GB/sec        checked: 3262.5GB 203.6GB/sec
    101 iteration. Passed 30.4181 seconds  written: 3099.4GB 215.2GB/sec        checked: 3262.5GB 203.7GB/sec
    146 iteration. Passed 30.4382 seconds  written: 3099.4GB 215.0GB/sec        checked: 3262.5GB 203.6GB/sec
    191 iteration. Passed 30.4201 seconds  written: 3099.4GB 215.1GB/sec        checked: 3262.5GB 203.8GB/sec
    236 iteration. Passed 30.4362 seconds  written: 3099.4GB 215.2GB/sec        checked: 3262.5GB 203.5GB/sec
    281 iteration. Passed 30.4166 seconds  written: 3099.4GB 215.2GB/sec        checked: 3262.5GB 203.7GB/sec
    326 iteration. Passed 30.4324 seconds  written: 3099.4GB 215.0GB/sec        checked: 3262.5GB 203.7GB/sec
    371 iteration. Passed 30.4028 seconds  written: 3099.4GB 215.4GB/sec        checked: 3262.5GB 203.8GB/sec
    416 iteration. Passed 30.4137 seconds  written: 3099.4GB 215.1GB/sec        checked: 3262.5GB 203.8GB/sec
2 Likes

I see no difference between preallocated GPU memory and unified memory. I tested this on my HP ZBook Ultra G1a 14 (also a Strix Halo APU). Preallocation is useful for older apps that just look at the amount of memory the GPU reports; some applications otherwise think there isn't enough memory.

1 Like