I’m using LM Studio with the llama.cpp Vulkan runtime to run LLMs on my FW13 with 32GB of RAM.
As long as the model uses less than half of the RAM (16GB), everything works fine, but when a model exceeds that, it starts hitting swap (visible as maxed-out disk activity) and then fails.
It feels to me like there is a software limit that prevents the Radeon 760M from requesting more than half of system RAM as VRAM. Is there a way to unlock this limit and increase the VRAM allocation?
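In case it’s relevant, here is a minimal sketch of the knobs I understand to control this, assuming Linux with the amdgpu driver (the half-of-RAM ceiling would match the amdgpu GTT default); the paths, card index, and values below are assumptions I haven’t verified on the 760M:

```
# Sketch, assuming Linux + amdgpu; card0 may be card1 on some systems.
# Check the GTT (shared GPU memory) ceiling the driver currently reports:
cat /sys/class/drm/card0/device/mem_info_gtt_total   # bytes
cat /sys/class/drm/card0/device/mem_info_vram_total  # bytes (BIOS-carved VRAM)

# The GTT ceiling can reportedly be raised with kernel boot parameters,
# e.g. appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:
#   amdgpu.gttsize=24576      # GTT size in MiB (here ~24 GiB)
#   ttm.pages_limit=6291456   # TTM limit in 4 KiB pages (6291456 * 4 KiB = 24 GiB)
# then: sudo update-grub && reboot
```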
The reason I’m trying a larger model is that Qwen3 30B A3B is a MoE model with only 3B active parameters, so it should theoretically be much faster than a dense 14B model. E.g. on my desktop the MoE model achieves 80 tokens/s. I suspect that if I can load it into the full 32GB of RAM, I can boost the speed and use a bigger model.
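Back-of-envelope for why I expect the speedup, assuming decode is memory-bandwidth bound; the ~80 GB/s bandwidth and ~0.6 bytes/param (Q4-ish quant) figures are rough assumptions, not measurements:

```
# Decode ceiling ~= bandwidth / (active_params * bytes_per_param).
awk 'BEGIN {
  bw = 80e9;                                   # bytes/s, assumed
  printf "MoE, 3B active: ~%.0f tok/s\n", bw / (3e9  * 0.6);
  printf "dense 14B:      ~%.0f tok/s\n", bw / (14e9 * 0.6);
}'
```

The absolute numbers will be off, but the ratio (3B active vs 14B dense weights read per token) is the point, and it only holds if the whole model stays resident in RAM instead of swapping.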