Since you suggest it, I will check it out – but I’ve split my RAM 50/50 between the iGPU (VRAM?) and regular system RAM because I’m using RAM for the audio stuff (TTS, STT), so I’m not sure I can run gpt-oss-120B even if it is “sparse”!
In Linux, you can just allocate the minimum (512MB or 1GB) to iGPU, and it will function as unified memory.
OK you’ve lost me but intrigued me… which will function as unified memory? the 1GB allocated to iGPU or the 127 GB not allocated to iGPU?
but even more important (to me) is whether some particular LLM functionality requires dedicated VRAM (or whatever it’s going to be called), or whether everything is smart enough (in Linux) to “share and share alike” from the common unified RAM pool. I have already seen error messages to the effect of “insufficient VRAM” because another process had it locked or whatever…
if there really is a way to just let everything take what it needs at the moment without precluding other processes, that would be ideal (I guess) – but I’m not sure that is the case.
Not allocated to the GPU. I don’t know about all tools, but llama.cpp, vLLM, Vulkan, and PyTorch can all work with unified memory on Strix Halo just fine. Check out this page: AI Capabilities Overview – Strix Halo Wiki
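For what it’s worth, on Linux the shared pool the amdgpu driver lets the iGPU borrow from system RAM shows up as “GTT” memory in sysfs. Here’s a quick sketch to check how big it is – the sysfs path is the standard amdgpu location, but the `card0` index and the 64 GiB fallback value are assumptions for illustration:

```shell
#!/bin/sh
# Read the total GTT (shared/unified) pool the amdgpu driver reports.
# The card index may differ on your machine (card0, card1, ...).
GTT_FILE=/sys/class/drm/card0/device/mem_info_gtt_total

if [ -r "$GTT_FILE" ]; then
    gtt_bytes=$(cat "$GTT_FILE")
else
    # Fallback example value so the script runs anywhere: 64 GiB in bytes.
    gtt_bytes=$((64 * 1024 * 1024 * 1024))
fi

# Convert bytes to whole GiB for display.
gtt_gib=$((gtt_bytes / 1024 / 1024 / 1024))
echo "GTT (shared) pool: ${gtt_gib} GiB"
```

If the number is small, the `amdgpu.gttsize=` kernel parameter (in MiB) can raise it, so the iGPU can address most of system RAM without changing the BIOS carve-out.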
now it’s definitely worth looking into, because I would prefer Coqui TTS over Kokoro if nothing else in the stack started squabbling about VRAM – I think Coqui requires dedicated VRAM, whereas Kokoro uses system RAM.