Coding with local LLMs

I currently use GitHub Copilot integrated into VS Code for the majority of my coding-related tasks (Claude Sonnet and GPT Codex are the models I use most).

I want to supplement this setup with locally running, coding-oriented LLMs that I can integrate with the Claude Code VS Code plugin.

I want to be able to switch between Copilot and Claude, and in doing so move between cloud and local models as needed.

I am running models using llama.cpp and configuring Claude Code in VS Code to use a locally running server.
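
For anyone wanting to replicate this, the wiring is a handful of environment variables. A minimal sketch, assuming your local endpoint is listening on port 8080 and speaks the Anthropic-style messages API (depending on your llama-server version you may need a small translation proxy in front of it for that); the URL, token and model name are placeholders for my setup:

# Point Claude Code at a local endpoint instead of the Anthropic cloud.
# The variable names are Claude Code's standard overrides; the values
# are placeholders (local servers typically ignore the auth token).
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
export ANTHROPIC_AUTH_TOKEN="local-placeholder"
export ANTHROPIC_MODEL="local-coding-model"
claude   # launch Claude Code as usual; unset these to fall back to the cloud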

Current models I am trialling are:

An example llama-server configuration that gives acceptable results for the Qwen model above (performance-related settings shown):

--device Vulkan0
--metrics
--verbose
--jinja
--ctx-size 131072
--threads -1
--batch-size 2048
--ubatch-size 512
--parallel 1
--cont-batching
--cache-prompt
--fit on
--flash-attn on
--repeat-penalty 1.05
--temp 1.0
--top-p 0.95
--top-k 40
--min-p 0.01

Some of these settings are recommended by Unsloth on their Hugging Face model card; for others I have leaned on GitHub Copilot to help with configuration.
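
These flags go straight onto the llama-server command line. For completeness, a full invocation might look like the sketch below; the GGUF path, host and port are placeholders for my setup, and the /health check is just a quick way to confirm the server is up:

# Hypothetical full launch command combining the flags above with a model file.
llama-server \
  --model /models/qwen-coder.gguf \
  --host 127.0.0.1 --port 8080 \
  --device Vulkan0 --metrics --jinja \
  --ctx-size 131072 --threads -1 --batch-size 2048 --ubatch-size 512 \
  --parallel 1 --cont-batching --cache-prompt --fit on --flash-attn on \
  --repeat-penalty 1.05 --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01

# Quick sanity check once it is running:
curl -s http://127.0.0.1:8080/health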

I have two questions for the community:

Which coding-oriented models are you seeing the best performance from when running on the Framework Desktop (I am running the 128GB model)?

What settings are you using when running llama-server to achieve optimal performance?

I'm interested to hear how others have configured their local development environments.

Thanks!

Hi! I tried testing some models on my Framework 13 using LM Studio. I also tried local autocomplete in VS Code, but it gave almost no suggestions, which puzzled me. Overall it works, and GPT OSS really feels like a chat, but it's not super clear what I'm supposed to do with all of it (I'm a developer).

I think the most useful thing would be a tool that spends 12 hours of computation overnight (while I sleep) thoroughly searching for bugs based on a description. I haven't tested anything like that yet.
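
Since llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, a crude version of the idea could be a loop like this (the file glob, prompt wording and report directory are all invented for illustration, and it assumes jq 1.6+ and a server on port 8080):

# Rough overnight bug-hunting loop against a local llama-server.
mkdir -p reports
for f in src/*.ts; do
  jq -n --rawfile code "$f" \
    '{messages: [{role: "user",
                  content: ("Carefully review this file for bugs and explain each finding:\n\n" + $code)}]}' \
  | curl -s http://127.0.0.1:8080/v1/chat/completions \
      -H "Content-Type: application/json" -d @- \
  | jq -r '.choices[0].message.content' > "reports/$(basename "$f").md"
done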

I’m curious — what kind of tasks are you all using this for?

I have a GitHub Copilot subscription and I use Claude and Codex for most coding tasks. It’s hard to drag myself away from that tooling when the quality of the output is so high.

I have also recently started using OpenSpec, which is brilliant in combination with those tools.

I want to find a use case for the local models, maybe for simpler, targeted refactoring tasks?

It may be possible to use something like Sonnet/Opus with OpenSpec to create the proposal, spec, design and task list and then switch to a local model to execute the tasks (use the best models to plan, the cheap models to implement).
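
If that works out, the switch itself can be a one-liner. A hedged sketch, reusing the placeholder endpoint and token from the configuration earlier in the thread:

# Hypothetical wrapper: `claude` stays on the cloud for planning, while
# `claude-local` runs Claude Code against the local server for implementation.
claude-local() {
  ANTHROPIC_BASE_URL="http://127.0.0.1:8080" \
  ANTHROPIC_AUTH_TOKEN="local-placeholder" \
  claude "$@"
}

The per-invocation variables only affect that one run, so planning and implementation sessions can be interleaved freely.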

I am finding myself using local models integrated into applications to power end-user features more than for development, but I'm confident this can change as the models evolve.

I created a somewhat related thread, and this reply is potentially useful for the coding use case:

And another thread which may offer some insight: