AMD AI Max+ 395 128GB with cline

There is a recent blog post from Cline about this that was pretty good: Cline + LM Studio: the local coding stack with Qwen3 Coder 30B - Cline Blog

They suggested using Qwen3 Coder 30B, and they have an option for a “compact prompt” to reduce context use. I’ve only started playing with it a little, so I don’t have a good impression yet of how useful it is; it’s definitely slower than cloud models, but not unusable.

Note that to get the model to load with its full context size, I had to increase the GTT limit (on Linux), as shown here: iGPU VRAM - How much can be assigned? - #7 by lhl
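For reference, here's a rough sketch of what raising the GTT limit looks like. The `ttm.pages_limit`/`ttm.page_pool_size` values are in 4 KiB pages, and `amdgpu.gttsize` is in MiB; the 96 GiB figure below is just an example, not what the linked post uses — pick a value that fits your own split of the 128 GB.

```shell
# Illustrative only: raise the GTT limit via kernel boot parameters.
# 25165824 pages x 4 KiB = 96 GiB; amdgpu.gttsize=98304 MiB = 96 GiB.
# Adjust these numbers for how much of your 128 GB you want the iGPU to use.
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&amdgpu.gttsize=98304 ttm.pages_limit=25165824 ttm.page_pool_size=25165824 /' /etc/default/grub
sudo update-grub   # or: grub2-mkconfig -o /boot/grub2/grub.cfg on Fedora-style distros
# Reboot for the change to take effect.
```

On newer kernels the `ttm.*` parameters are the ones that matter; `amdgpu.gttsize` is deprecated but harmless to set alongside them.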