Updated commands to increase max unified memory usage on Framework Desktop under Fedora 43?

Been looking high and low for the proper commands to run on a fresh Fedora 43 install on the Framework Desktop to increase the VRAM allocation for running larger LLM models in LM Studio. I have the desktop configuration with the AMD Ryzen AI Max+ 395 (Strix Halo) and 128GB of RAM.

I followed Jeff Geerling's blog post here: Increasing the VRAM allocation on AMD AI APUs under Linux | Jeff Geerling

But I am not sure if this is correct. Is setting the amd_iommu=off parameter still recommended for better inference speeds?

Other Google searches show the old method of increasing GTT memory with amdgpu.gttsize=X, which has apparently been deprecated?

What is the proper way? And how can I check to confirm indeed my machine is ready to load up large models from LM Studio?

=====

Just as a note I ran the following:

sudo grubby --update-kernel=ALL --args='ttm.pages_limit=27648000'
sudo grubby --update-kernel=ALL --args='ttm.page_pool_size=27648000'
sudo reboot

and then after rebooting and grepping for amdgpu memory, I am seeing:

amdgpu: 512M of VRAM memory ready
amdgpu: 108000M of GTT memory ready.

108GB for VRAM allocation. Jeff claims he’s seeing segfault errors over 108G, but he was running on a cluster. Anyone have it set higher with no issues?
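For reference, here's how the pages_limit value maps to the size dmesg reports — a quick sketch of my own, assuming TTM pages are 4 KiB on this platform:

```shell
# ttm.pages_limit is counted in 4 KiB pages, so MiB = pages * 4 / 1024.
pages=27648000
mib=$(( pages * 4 / 1024 ))
echo "${mib}M"   # 108000M, matching the dmesg line above
```

So the dmesg figure is just the pages_limit value converted to MiB.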

I have been trying to tinker with these, but for example with Jeff's settings, I only get these:

[    4.673285] amdgpu 0000:c2:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
[    4.673286] amdgpu 0000:c2:00.0: amdgpu: amdgpu: 64038M of GTT memory ready.

Have you changed any BIOS settings? Strange that I cannot get it to allocate more than the 64G

Yes! I made sure the iGPU allocation was as small as possible in the BIOS settings (512MB), as that is what I've read is ideal: we want Linux to handle the VRAM allocation rather than having it be a BIOS-level setting. Windows has the fancy AMD driver program in which you can set VRAM up to 96GB (the documentation on Framework's page says as much).

Jeff's post unfortunately uses the amdttm.* args, which didn't work for me; it should be just ttm.*, as people in his comments noted.

But that's the thing: I don't know of any better way to confirm that the VRAM allocation is indeed maxed to what I set.

Jeff recommends running sudo dmesg | grep "amdgpu.*memory" after reboot to confirm the settings.

Which, for me, confirms my setting of 27648000 as amdgpu: 108000M of GTT memory ready.

But I don't know if this is indeed correct for LM Studio or llama.cpp. I did load up the gpt-oss-120B model and maxed out the GPU offload and the context length. LM Studio said it would eat up roughly ~70GB, and I had it answer a question about woodchucks at 46.9 tk/sec with 0.29s to first token, which is decent speed.

I am just looking for confirmation here, or if anyone else is using their Framework Desktop as a Local AI server.

Yeah, I just figured out that you have to use just ttm and not amdttm.

I also have LM Studio in use with gpt-oss-120B, and now I can fully max out its context length. Seems to run pretty nicely; I'm getting numbers like yours, ~48 tk/sec etc.

were you not able to max context length before when constrained by 64G?

I had my BIOS at custom settings, which might have had an effect on it. They're on defaults now, but I didn't test the 64G settings; I just changed the kernel args to the correct ones.

But with the custom BIOS settings (I had it set to 96G in the BIOS), LM Studio wouldn't load gpt-oss-120B with max context length. So it was mostly bad BIOS settings to begin with. Though seeing that the model takes about 70 GB when fully loaded, I don't think it would have allowed max context length anyway.

If you are seeing crashes when using ROCm for LLMs etc., try this:

You can use amdgpu_top to see if the GTT config has worked.

I was following the Strix Halo wiki (AI Capabilities Overview – Strix Halo Wiki) for the parameters and calculations to use.

Since I'm on Fedora Silverblue, I had to adapt the commands to rpm-ostree.

rpm-ostree kargs --append=amdgpu.gttsize=107520 --append=ttm.pages_limit=27525120 --append=ttm.page_pool_size=15728640 --append=amdgpu.vm_fragment_size=8

Unlike the wiki, I adjusted my memory to 105G, as my goal is to still use the system while getting the most out of LLMs.
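As a sanity check on those numbers (my own arithmetic, assuming amdgpu.gttsize is in MiB and ttm.pages_limit in 4 KiB pages), the two arguments agree with each other and with the 105G figure:

```shell
gttsize_mib=107520       # from amdgpu.gttsize
pages_limit=27525120     # from ttm.pages_limit
echo $(( pages_limit * 4 / 1024 ))   # 107520 MiB, same as gttsize
echo $(( gttsize_mib / 1024 ))       # 105 GiB
```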

Oh, and I kept my BIOS setting at the default (Auto (512MB)). amdgpu_top accurately reports 105GB of memory available, and I'm able to fit gpt-oss-120b into memory just fine now.


Linux newbie here. I have Ubuntu installed. Where can I install the grubby command from? I’m not having much luck finding it. Thanks!

I’ll answer my own question… On Ubuntu we edit the GRUB configuration directly, not with the grubby command.

# Edit the GRUB defaults file

sudo nano /etc/default/grub


# Find the line that starts with `GRUB_CMDLINE_LINUX_DEFAULT` and add your TTM parameters to it. For example, if it currently looks like:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"


#Change it to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=27648000 ttm.page_pool_size=27648000"

#Update GRUB configuration

sudo update-grub

#Reboot

sudo reboot
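One extra check after the reboot (not from the blog, just a habit): confirm the parameters actually reached the kernel command line. The real check is a grep against /proc/cmdline; the sample string below just demonstrates the pattern:

```shell
# On the running system: grep -o 'ttm\.[a-z_]*=[0-9]*' /proc/cmdline
# Demonstration against a sample cmdline string:
cmdline="BOOT_IMAGE=/boot/vmlinuz quiet splash ttm.pages_limit=27648000 ttm.page_pool_size=27648000"
printf '%s\n' "$cmdline" | grep -o 'ttm\.[a-z_]*=[0-9]*'
```

This prints each ttm.* argument on its own line, so a missing or misspelled arg (e.g. amdttm.*) shows up immediately.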

For llama.cpp on Linux (I don't know what happens on Windows) you do not need to change anything; GTT can stay at the default, but:

  • use the HIP/ROCm backend (it works well with the native rocm-6.4.4 of Fedora 43)
  • export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON before running llama.cpp

That's all! It can use all available RAM with no limit (with the BIOS VRAM conf as low as possible: 512MB).

I don't like amd_iommu=off; it removes some functionality. Try rocminfo with the defaults and you can see 3 devices (the NPU is the last one); with amd_iommu=off defined, the NPU is not usable (yes, for now I am not sure we can use it anyway :wink: )

for me:

# build
cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j 16

# run
GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON \
  llama-server --no-mmap -fa on -ub 2048 -b 8192 \
  -m <model>

Thanks @Djip ! I guess forgot to qualify that I was trying to run models with LM Studio. I’ll try llama.cpp

LM Studio, I think, uses the Vulkan backend… I don't know if the same possibility exists there for now.

good luck with llama.cpp and rocm backend :wink:

I was having this exact same problem, and I'm glad I came here and found others with the same issue. When I first installed Fedora a few weeks ago I used the older amdgpu.gttsize command, and when I reinstalled I took the amdttm commands from Jeff Geerling's blog. I couldn't figure out why it only showed 64038M of GTT memory ready, because I checked the commands and the number was correct. I just lived with it for a few days and then decided I needed to look for answers today.

Fedora uses systemd and the kernel's own config mechanisms.

Config for GTT (if really needed… in many cases it is not):

# sudo nano /etc/modprobe.d/increase_amd_memory.conf
# 4k per page, 90GB total
options ttm pages_limit=22500000
# options ttm page_pool_size=22500000  # not needed :)
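One caveat to add (an assumption on my part, not stated in the thread): on Fedora the amdgpu/ttm modules are typically loaded from the initramfs, so a new modprobe.d option may not take effect until the initramfs is regenerated. A hedged sketch:

```shell
# Assumption: ttm is loaded early from the initramfs on this machine.
sudo dracut -f    # rebuild the initramfs so the modprobe.d option is included
sudo reboot
# After reboot, verify the value actually applied:
cat /sys/module/ttm/parameters/pages_limit
```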

And do not use this GRUB "hack"; even GRUB cmdline changes have to be done another way on newer Fedora.

If I remember correctly (not sure…), kernel cmdline args have to be set with

grubby --update-kernel=ALL --args="<cmd to define>"

OK, thanks. I had used grubby to configure the ttm settings. I undid the changes and rebooted. It shows 64G now but it loaded Qwen3-VL-32B-Instruct-BF16 and didn’t crash.

[ 4.445984] amdgpu 0000:c3:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
[ 4.445986] amdgpu 0000:c3:00.0: amdgpu: amdgpu: 64038M of GTT memory ready.

⬢ [firebaugh@toolbx models]$ llama-bench --mmap 0 -ngl 999 -fa 1 -m /home/firebaugh/_code/models/unsloth/BF16/Qwen3-VL-32B-Instruct-BF16-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32

| model            |      size |  params | backend | ngl | fa | mmap | test  |           t/s |
| ---------------- | --------: | ------: | ------- | --: | -: | ---: | ----- | ------------: |
| qwen3vl 32B BF16 | 61.03 GiB | 32.76 B | ROCm    | 999 |  1 |    0 | pp512 | 343.29 ± 0.77 |
| qwen3vl 32B BF16 | 61.03 GiB | 32.76 B | ROCm    | 999 |  1 |    0 | tg128 |   3.45 ± 0.00 |

OK, update. Qwen3-VL-32B crashed almost immediately in ComfyUI. I used grubby to set the ttm values back the way they were and now it’s not crashing.

What does this report:

cat /sys/module/ttm/parameters/pages_limit
# cat /sys/module/ttm/parameters/page_pool_size  # looks like it doesn't matter

For the llama.cpp/ggml ROCm backend there is no need for GTT tuning; you can use

export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON

Did you use some toolbox/container or a "pure" FC43 build/run? (Note: I don't know what ComfyUI uses for its backend/config :wink: )

Did you try the /etc/modprobe.d/ config in place of the kernel params?

You don’t need to set the page pool size. The page limit is enough.


Earlier when I checked the ttm parameters they both showed 27648000 but now it shows 26880000 because I backed off a little from 108 GB down to 105 GB.

I have llama.cpp running in a Toolbox with ROCm 7.1.1 right now. I’m not sure if I’m going to keep it that way or not yet. I’ve only had my system together about two weeks so I’m still learning how to do things differently for AMD and Fedora. Nearly all of my Linux use has been with Debian, Ubuntu, or others derived from them.

I hadn’t tried /etc/modprobe.d/increase_amd_memory.conf before but I tried it this morning and it’s been good so far. Thanks for suggesting that way.

ComfyUI is using ROCm 7.1.1 backend. It’s installed in a miniconda venv.

Some toolboxes have export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON defined by default; if that's the case, llama.cpp with the ROCm backend can use all RAM, not only the GTT.

Nice! I find it simpler than changing kernel params. Good that it works.