Been looking high and low for the proper commands to run on a fresh Fedora 43 install on the Framework Desktop to increase the VRAM allocation for running larger LLM models in LM Studio. I have the desktop configuration with the AMD Strix Halo (Ryzen AI Max+ 395) and 128GB.
And then after reboot, running grep for amdgpu memory, I am seeing:
amdgpu: 512M of VRAM memory ready
amdgpu: 108000M of GTT memory ready.
108GB for VRAM allocation. Jeff claims he’s seeing segfault errors over 108G, but he was running on a cluster. Anyone have it set higher with no issues?
Yes! I made sure the iGPU allocation was as small as possible in the BIOS settings (512MB), since that is what I've been reading is ideal: we want Linux to handle the VRAM allocation rather than have it be a BIOS-level setting. Windows has the fancy AMD driver program in which you can set VRAM up to 96GB (the documentation on Framework's page says as much).
Jeff's post unfortunately uses the args amdttm.*, which didn't work for me; it should be just ttm.*, as the people in his comments noted.
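For anyone else on Fedora, a sketch of how I understand the change is made, using grubby to append the ttm.* args. The parameter names (ttm.pages_limit, ttm.page_pool_size) are the upstream kernel ones; the value is a count of 4KiB pages, so 27648000 pages works out to the 108000M figure dmesg reports. Double-check the value against your own RAM before running:

```shell
# Sketch: raise the GTT limit to ~108GB.
# 27648000 pages * 4 KiB/page = 110592000 KiB = 108000 MiB.
sudo grubby --update-kernel=ALL \
  --args="ttm.pages_limit=27648000 ttm.page_pool_size=27648000"

# Confirm the args landed on every kernel entry, then reboot.
sudo grubby --info=ALL | grep ttm
```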
But that's the thing: I don't know of any better way to confirm that the VRAM allocation is indeed maxed to what I set.
Jeff recommends running sudo dmesg | grep "amdgpu.*memory" after reboot to confirm settings.
Which for me shows my setting of 27648000 reflected as "amdgpu: 108000M of GTT memory ready".
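The two numbers do line up: the ttm.pages_limit value is a count of 4KiB pages, so the dmesg figure can be sanity-checked with simple arithmetic (a quick sketch, not anything LM Studio itself reports):

```python
PAGE_SIZE_KIB = 4  # ttm page counts are in 4 KiB pages

def pages_to_mib(pages: int) -> int:
    """Convert a ttm.pages_limit page count to MiB."""
    return pages * PAGE_SIZE_KIB // 1024

# 27648000 pages -> 108000 MiB, matching "108000M of GTT memory ready"
print(pages_to_mib(27_648_000))  # 108000
```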
But I don't know if this is indeed correct for LM Studio or llama. I did load up the gpt-oss-120B model and maxed out the GPU offload and the context length. LM Studio said it would eat up roughly ~70GB, and I had it run a question about woodchucks at 46.9tk/sec with 0.29s to first token, which is decent speed.
I am just looking for confirmation here, or if anyone else is using their Framework Desktop as a Local AI server.
Yeah, just figured out that you have to use just ttm and not amdttm.
I also have LM Studio in use with gpt-oss-120B, and now I can fully max out the context length for it. Seems to run pretty nicely, getting something like you, ~48tk/sec.
I had my BIOS at custom settings, which might have had an effect on it. They're on defaults now, but I didn't test the 64G setting and just changed the kernel args to the correct ones.
But with the custom BIOS settings (I had it set to 96 in the BIOS), LM Studio wouldn't load gpt-oss-120B with max context length. So it was mostly bad BIOS settings to begin with. But seeing the model takes about 70GB when fully loaded, I don't think it would have allowed it with max context length anyway.
I adjusted my memory limit to 105G, unlike the wiki, as my goal is to use the system normally while getting the most out of LLMs.
Oh, and I kept my BIOS setting at the default (Auto (512MB)). amdgpu_top accurately reports 105GB of memory available, and I'm able to fit gpt-oss-120b into memory just fine now.