Been looking high and low for the proper commands to run under a fresh Fedora 43 install on a Framework Desktop to increase the VRAM allocation for running larger LLM models in LM Studio. I have the desktop configuration with the AMD Strix Halo AI MAX+ 395 128GB.
After rebooting and grepping dmesg for amdgpu memory, I am seeing:
amdgpu: 512M of VRAM memory ready
amdgpu: 108000M of GTT memory ready.
That's 108GB available for VRAM allocation. Jeff claims he's seeing segfault errors over 108G, but he was running on a cluster. Anyone have it set higher with no issues?
Yes! I made sure the iGPU allocation was as small as possible in the BIOS settings (512MB), since from what I've read that's ideal: we want Linux to handle the VRAM allocation rather than have it be a BIOS-level setting. Windows has the fancy AMD driver utility that can set VRAM up to 96GB (the documentation on Framework's page says as much).
Jeff's post unfortunately uses the amdttm.* args, which didn't work for me; it should be just ttm.*, as people in his comments noted.
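For anyone else on Fedora: a quick way to apply the corrected ttm.* args is grubby, which edits the cmdline of installed kernels directly. A sketch, using the 27648000 page count discussed in this thread (adjust the number to your target):

```shell
# Append the corrected ttm parameters to every installed kernel's cmdline
sudo grubby --update-kernel=ALL \
  --args="ttm.pages_limit=27648000 ttm.page_pool_size=27648000"

# Check that the args were actually added, then reboot
sudo grubby --info=ALL | grep ttm
```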
But that's the thing: I don't know of any better way to confirm whether the VRAM allocation is indeed maxed out at what I set.
Jeff recommends running: sudo dmesg | grep "amdgpu.*memory" after reboot to confirm settings.
Which, for me, shows my setting of 27648000 reflected as amdgpu: 108000M of GTT memory ready.
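Besides dmesg, you can also read the values the ttm module actually picked up at boot straight from sysfs. These paths assume the in-tree ttm module on a recent kernel:

```shell
# Values are in 4 KiB pages; 27648000 pages corresponds to 108000 MiB
cat /sys/module/ttm/parameters/pages_limit
cat /sys/module/ttm/parameters/page_pool_size
```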
But I don't know if this is indeed correct for LM Studio or llama. I did load up the gpt-oss-120B model and maxed the GPU offload and the context length. LM Studio did say it would eat up roughly ~70GB, and I had it run a question about woodchucks at 46.9tk/sec with 0.29s to first token, which is decent for speed.
I am just looking for confirmation here, or if anyone else is using their Framework Desktop as a Local AI server.
Yeah, I just figured out that you have to use just ttm and not amdttm.
I also have LM Studio in use with gpt-120B, and now I can fully max out the context length for it. Seems to run pretty nicely; I'm getting something like you, ~48tk/sec.
I had my BIOS at custom settings, which might have had an effect on it. They're on defaults now, but I didn't test the 64G setting and just changed the kernel args to the correct ones.
But with the custom BIOS settings (I had it set to 96 in the BIOS), LM Studio wouldn't load gpt-120B with max context length, so it was mostly bad BIOS settings to begin with. Still, seeing that the model takes about 70GB when fully loaded, I don't think it would have fit with max context length anyway.
I adjusted my memory to 105G unlike the wiki, as my goal is to use the system while getting the most out of LLMs.
Oh, and I kept my BIOS setting the default (Auto (512mb)). amdgpu_top is accurately reporting 105GB of memory available and I’m able to fit gpt-oss-120b into memory just fine now.
I’ll answer my own question… On Ubuntu we edit the GRUB configuration directly, not with the grubby command.
# Edit the GRUB defaults file
sudo nano /etc/default/grub
# Find the line that starts with `GRUB_CMDLINE_LINUX_DEFAULT` and add your TTM parameters to it. For example, if it currently looks like:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=27648000 ttm.page_pool_size=27648000"
# Update the GRUB configuration
sudo update-grub
# Reboot
sudo reboot
For llama.cpp on Linux (I don't know what happens on Windows) you don't need to change anything; it lives in GTT by default. But:
use the HIP/ROCm backend (it works well with the native rocm-6.4.4 of Fedora 43)
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON before running llama.cpp
That's all! It can use all available RAM with no limit (with the BIOS VRAM set as low as possible: 512MB).
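A minimal sketch of that workflow. The binary and model paths are placeholders, and it assumes a llama.cpp build with the HIP/ROCm backend:

```shell
# Let ggml spill into unified (GTT) memory instead of failing when VRAM runs out
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON

# Run llama.cpp as usual with full GPU offload (-ngl 99)
./llama-server -m ./gpt-oss-120b.gguf -ngl 99
```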
I don't like the amd_iommu=off option; it removes some functionality. Try rocminfo with the defaults and you can see 3 devices (the NPU is the last one); with amd_iommu=off defined, the NPU is not usable. (Granted, for now I'm not sure we can use it anyway.)
I was having this exact same problem and I’m glad that I came here to search for others having the same issue. When I first installed Fedora a few weeks ago I had used the older amdgpu.gttsize command and when I reinstalled I got the amdttm commands from Jeff Geerling’s blog. I couldn’t figure out why it only showed 64038M of GTT memory ready because I checked the commands and the number was correct. I just lived with it for a few days and then decided that I needed to look for answers today.
OK, thanks. I had used grubby to configure the ttm settings. I undid the changes and rebooted. It shows 64G now but it loaded Qwen3-VL-32B-Instruct-BF16 and didn’t crash.
OK, update. Qwen3-VL-32B crashed almost immediately in ComfyUI. I used grubby to set the ttm values back the way they were and now it’s not crashing.
Earlier, when I checked the ttm parameters, they both showed 27648000, but now they show 26880000 because I backed off a little, from 108 GB down to 105 GB.
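For anyone converting between the MiB figure dmesg reports and the page counts these parameters take: ttm pages are 4 KiB, so pages = MiB × 256. The 105000 MiB figure here is my target; plug in your own:

```shell
# ttm pages are 4 KiB each, so pages = MiB * 1024 / 4 = MiB * 256
mib=105000                 # target GTT size in MiB (105 GB)
echo $((mib * 256))        # page count to pass to ttm.pages_limit
```

This checks out against the numbers above: 108000 MiB × 256 = 27648000 and 105000 MiB × 256 = 26880000.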
I have llama.cpp running in a Toolbox with ROCm 7.1.1 right now. I’m not sure if I’m going to keep it that way or not yet. I’ve only had my system together about two weeks so I’m still learning how to do things differently for AMD and Fedora. Nearly all of my Linux use has been with Debian, Ubuntu, or others derived from them.
I hadn’t tried /etc/modprobe.d/increase_amd_memory.conf before but I tried it this morning and it’s been good so far. Thanks for suggesting that way.
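For reference, the modprobe.d route sets the same two parameters as module options instead of kernel args. A sketch of what such a file might contain (the 26880000 value matches the 105 GB setting discussed above):

```shell
# /etc/modprobe.d/increase_amd_memory.conf
# Syntax: options <module> <param>=<value>
options ttm pages_limit=26880000 page_pool_size=26880000
```

On Fedora you'd typically regenerate the initramfs afterwards (sudo dracut --regenerate-all --force) and reboot so the options apply at boot.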
ComfyUI is using ROCm 7.1.1 backend. It’s installed in a miniconda venv.
Some toolboxes have export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON defined by default; if that's the case, llama.cpp's ROCm backend can use all RAM and not only the GTT.
Nice! I find it simpler than changing kernel params. Good that it works.