AMD AI Max+ 395 128GB with Cline

I’ve also posted this article on another website.

I’m asking for suggestions on running a local LLM for Cline agent coding, since there’s not much info online and GPT and Claude don’t seem like reliable options to ask. I’ve read almost everything I can find and still can’t reach a definite answer.

I’m in one of the late Framework Desktop batches, and I want to try out local LLMs when it arrives. I primarily use Cline + Gemini 2.5 Flash for Unity/Go backend work, and occasionally for languages like Rust, Python, and TypeScript if I feel like coding a small tool for faster iteration.

Would it feel worse on a local server? And what model should I go for?

A local LLM server will feel much slower. Much slower.

As for which model you should run, if you’re disappointed with Claude, that might be difficult… It’s unlikely anything running locally can do much better.

How about trying some models on OpenRouter first?

They are all unreliable. AI tools are exactly that: tools. You still need to know what you are doing. I regularly correct any LLM I’m using, but the upshot is that its responses improve over time, and even though I have to correct it, the LLM saves me time by creating files I can edit quickly that would take me a long time to create wholesale from scratch. No matter what you do, though, the human will still need to do the final heavy lift. I just get to do a lot more heavy lifting and a lot less pure grunt work.

I never said I don’t like Claude; I just happen to need to run AI inference offline.

In fact, if I ever get a job in the future, I’ll consider buying a yearly Pro plan and using their services to code often.

That doesn’t quite answer my question, so I’ll be more clear.

I know they’re unreliable, but Flash does pick up my work and write the code in my structure, the way I want it to.

Can a local AI achieve that? That’s the question I want answered.

Oh, I get what you mean now. What my article said about Claude refers to the chats I did while researching the Framework Desktop; I currently use the free version of their service.

MoE models like qwen-coder 30B-A3B and GPT-OSS-120B (and even -20B) should work well on the Framework Desktop. I mostly use qwen-coder, because it’s very fast on my existing system (with an RTX 4090), but I really like gpt-oss-120b (although its chat format confuses Cline sometimes).
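
If you end up serving one of these through llama.cpp, a minimal sketch looks something like this; the GGUF filename is a placeholder for whichever quant you download, and the numbers are just starting points:

```bash
# Serve a local GGUF through llama.cpp's OpenAI-compatible server so Cline
# can point at it. -c sets the context window, -ngl offloads all layers to
# the GPU, --port picks where the API listens.
llama-server -m ./Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -c 32768 -ngl 99 --port 8080
```

Cline’s OpenAI-compatible provider can then be pointed at http://localhost:8080/v1.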

Would you be able to perform complex tasks with it?
Let’s say I have a very detailed PRD with the system design layout completed; would it follow it?

There is a recent blog post from Cline about this that was pretty good: Cline + LM Studio: the local coding stack with Qwen3 Coder 30B - Cline Blog

They suggested using Qwen3 Coder 30B, and they have an option for a “compact prompt” to reduce context use. I’ve only started playing with it a little, so I don’t have a good impression yet of how useful it can be; it’s definitely slower compared to cloud models, but not unusable.
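
As a quick sanity check before wiring up Cline: LM Studio’s local server speaks the OpenAI API on port 1234 by default, so something like this should get a completion back (the model id is a placeholder; use whatever id LM Studio lists for the model you have loaded):

```bash
# Smoke-test LM Studio's OpenAI-compatible endpoint.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder-30b",
    "messages": [{"role": "user", "content": "Write a Go hello world."}]
  }'
```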

Note that to get the model to load with full context size, I had to increase GTT limits (on Linux), as shown here: iGPU VRAM - How much can be assigned? - #7 by lhl
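
For reference, the change amounts to kernel parameters; the values below are illustrative for a 128GB machine, not recommendations, so check the linked thread for numbers that match your kernel and RAM:

```bash
# In /etc/default/grub: amdgpu.gttsize is in MiB (deprecated on newer
# kernels in favor of the ttm limits); ttm.pages_limit counts 4 KiB pages.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.gttsize=122880 ttm.pages_limit=31457280"
# Then apply and reboot:
sudo update-grub
```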

Looks good, I’ll try it on my M4 32GB and share my thoughts later, ty.

It’s hit and miss, frankly. In my experience, even Claude 4 struggles with large tasks, so I prefer breaking them down into smaller, localized chunks instead. I think gpt-oss follows instructions better, but both can stray pretty quickly.

I know, I know.
I just have trauma from Copilot with GPT-4o myself.
That thing is total miss and miss.

The M4 Air can’t handle that lol, looks like a fan is still needed.