Updated commands to increase max unified memory usage on Framework Desktop under Fedora 43?

Been looking high and low for the proper commands to run on a fresh Fedora 43 install on the Framework Desktop to increase the VRAM allocation for running larger LLM models under LM Studio. I have the desktop configuration with the AMD Ryzen AI Max+ 395 (Strix Halo) and 128GB.

I followed Jeff Geerling's blog post here: Increasing the VRAM allocation on AMD AI APUs under Linux | Jeff Geerling

But I am not sure if this is correct. Is setting the kernel parameter amd_iommu=off still recommended for better inference speeds?

Other Google searches show the old method of increasing GTT memory with amdgpu.gttsize=X, which has apparently been deprecated?

What is the proper way? And how can I check to confirm indeed my machine is ready to load up large models from LM Studio?

=====

Just as a note I ran the following:

sudo grubby --update-kernel=ALL --args='ttm.pages_limit=27648000'
sudo grubby --update-kernel=ALL --args='ttm.page_pool_size=27648000'
sudo reboot
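For reference, ttm.pages_limit counts 4 KiB pages (the x86-64 default page size), so the value can be derived from the target GTT size. A quick sanity check of the arithmetic behind the 27648000 figure, under that assumption:

```shell
# ttm.pages_limit is in 4 KiB pages, so: pages = MiB * 1024 / 4 = MiB * 256
target_mib=108000
pages=$(( target_mib * 256 ))
echo "$pages"             # 27648000, matching the kernel arg above

# And the reverse check, pages back to MiB:
echo $(( pages / 256 ))   # 108000, matching "108000M of GTT memory ready"
```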

and then after rebooting and grepping dmesg for amdgpu memory, I see:

amdgpu: 512M of VRAM memory ready
amdgpu: 108000M of GTT memory ready.

108GB for VRAM allocation. Jeff claims he’s seeing segfault errors over 108G, but he was running on a cluster. Anyone have it set higher with no issues?

I have been trying to tinker with these, but for example with Jeff's settings, I only get these:

[    4.673285] amdgpu 0000:c2:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
[    4.673286] amdgpu 0000:c2:00.0: amdgpu: amdgpu: 64038M of GTT memory ready.

Have you changed any BIOS settings? Strange that I cannot get it to allocate more than 64G.

Yes! I made sure the iGPU allocation was as small as possible in the BIOS settings (512MB), since from what I've been reading that's ideal: we want Linux to handle the VRAM allocation rather than having it be a BIOS-level setting. Windows has the fancy AMD driver utility, which can set VRAM up to 96GB (the documentation on Framework's page says as much).

Jeff's post unfortunately uses the amdttm.* args, which didn't work for me; it should be just ttm.*, as people in his comments noted.

But that's the thing: I don't know of a better way to confirm that the VRAM allocation is indeed maxed to what I set.

Jeff recommends running sudo dmesg | grep "amdgpu.*memory" after reboot to confirm settings.

Which, for me, shows my setting of 27648000 pages reported as amdgpu: 108000M of GTT memory ready.
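If it helps anyone, there are really two checks to combine: first that the args actually made it onto the running kernel's command line, and then that the driver picked them up at boot (the commands are standard, though the exact dmesg wording can vary by kernel version):

```shell
# 1) Confirm the ttm args are on the running kernel's command line
tr ' ' '\n' < /proc/cmdline | grep '^ttm\.' || echo "no ttm.* args on cmdline"

# 2) Confirm the amdgpu driver reports the enlarged GTT pool
sudo dmesg | grep -i 'amdgpu.*memory' || echo "no amdgpu memory lines in dmesg"
```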

But I don't know if this is indeed correct for LM Studio or llama.cpp. I did load up the gpt-oss-120b model and maxed out the GPU offload and context length. LM Studio said it would eat up roughly ~70GB, and I had it run a question about woodchucks at 46.9 tk/sec with 0.29s to first token, which is decent speed.

I am just looking for confirmation here, or if anyone else is using their Framework Desktop as a Local AI server.

Yeah, I just figured out that you have to use ttm and not amdttm.

I also have LM Studio in use with gpt-oss-120b, and now I can fully max out the context length for it. Seems to run pretty nicely, getting something like you, ~48 tk/sec etc.

Were you not able to max the context length before, when constrained by 64G?

I had my BIOS at custom settings, which might have had an effect on it. They're on defaults now, but I didn't test the 64G settings and just changed the kernel args to the correct ones.

But with the custom BIOS settings (I had it set to 96 in the BIOS), LM Studio wouldn't load gpt-oss-120b with max context length. So it was mostly bad BIOS settings to begin with. But seeing that the model takes about 70 gigs when fully loaded, I don't think 64G would have allowed it with max context length.

If you are seeing crashes when using ROCm for LLMs etc., try this:

You can use "amdgpu_top" to see if the GTT config has worked.
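For a quick non-interactive check, amdgpu_top can print a one-shot dump instead of opening the interactive view (check `amdgpu_top --help` on your version for the exact flag; newer builds have -d/--dump):

```shell
# One-shot dump, filtered to the GTT lines
amdgpu_top -d | grep -i gtt || echo "amdgpu_top not available here"
```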

I was following the Strix Halo wiki (AI Capabilities Overview – Strix Halo Wiki) for the parameters and calculations to use.

Since I'm on Fedora Silverblue, I had to adapt the commands to rpm-ostree:

rpm-ostree kargs --append=amdgpu.gttsize=107520 --append=ttm.pages_limit=27525120 --append=ttm.page_pool_size=15728640 --append=amdgpu.vm_fragment_size=8

Unlike the wiki, I adjusted my memory to 105G, as my goal is to use the system while getting the most out of LLMs.
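For anyone decoding those numbers, my understanding of the units (assuming 4 KiB pages, with amdgpu.gttsize given in MiB; verify against the wiki) works out like this:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit and ttm.page_pool_size are in
# 4 KiB pages, so pages / 256 = MiB, and / 1024 again = GiB
echo $(( 107520 / 1024 )) GiB           # amdgpu.gttsize     -> 105 GiB
echo $(( 27525120 / 256 / 1024 )) GiB   # ttm.pages_limit    -> 105 GiB
echo $(( 15728640 / 256 / 1024 )) GiB   # ttm.page_pool_size -> 60 GiB
```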

Oh, and I kept my BIOS setting the default (Auto (512mb)). amdgpu_top is accurately reporting 105GB of memory available and I’m able to fit gpt-oss-120b into memory just fine now.
