I’m asking for suggestions on running a local LLM for Cline agent coding, since there’s not much info online and asking GPT or Claude doesn’t seem like a reliable option here. I’ve read almost everything I can find and still can’t reach a definite answer.
I’m in one of the late Framework Desktop batches and want to try out local LLMs when it arrives. I primarily use Cline + Gemini 2.5 Flash for Unity/Go backend work, and occasionally for languages like Rust, Python, and TypeScript when I feel like coding a small tool for faster iteration.
Would it feel worse on a local server? And what model should I go for?
A local LLM server will feel much slower. Much slower.
As for which model you should run, if you’re disappointed with Claude, that might be difficult… It’s unlikely anything running locally can do much better.
They are all unreliable. AI tools are exactly that… tools. You still need to know what you are doing. I regularly correct any LLM I am using, but the upshot is that its responses improve over time, and even though I have to correct it, the LLM saves me time by creating files I can edit quickly that would take me a long time to write wholesale from scratch. No matter what you do, the human will still need to do the final heavy lifting. I just get to do a lot more of that and a lot less pure grunt work.
Oh, I get what you mean now. What I said about Claude in my post was about the chats I use to research the Framework Desktop; I’m currently on the free version of their service.
MoE models like qwen-coder 30B-A3B and GPT-OSS-120B (and even -20B) should work well on the Framework Desktop. I use qwen-coder mostly, because it’s very fast on my existing system (with an RTX 4090), but I really like gpt-oss-120b (although its chat format confuses Cline sometimes).
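In case it helps, here’s a minimal sketch of how I’d smoke-test a local model before pointing Cline at it, assuming you serve it through an OpenAI-compatible endpoint (e.g. llama.cpp’s llama-server, LM Studio, or Ollama). The port, model name, and prompt below are placeholders for whatever your setup actually uses.

```python
# Quick sanity check for a local OpenAI-compatible server before wiring it into Cline.
# Assumes something like llama.cpp's llama-server is already listening on
# localhost:8080 with a qwen-coder model loaded -- adjust host, port, and
# model name to your setup (these are placeholders, not required values).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: your local server's endpoint
    api_key="not-needed-locally",         # most local servers ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen3-coder-30b-a3b",  # placeholder: whatever model name your server reports
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Go function that reverses a string."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```

If that returns sensible code at a tolerable speed, you can point Cline’s OpenAI-compatible provider at the same base URL and model name and see how it handles real agent tasks.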
They suggested using Qwen3 Coder 30B, and there’s an option for a “compact prompt” to reduce context use. I’ve only started playing with it a little, so I don’t have a good impression yet of how useful it can be; it’s definitely slower compared to cloud models, but not unusable.
It’s hit and miss, frankly. In my experience, even Claude 4 struggles with large tasks, so I prefer breaking them down into smaller, localized chunks instead. I think gpt-oss follows instructions better, but both can stray off track pretty quickly.