After I became comfortable running local models via llama.cpp/llama-swap on my framework desktop mainboard, I was frustrated by the lack of a good visibility into local LLM’s. So I spent the last few weeks building llama-dash. I’m a web dev by day, so I really put a lot of effort into making this a polished project. I’d love to hear any feedback you all may have!
llama-dash turns a self-hosted local inference box into an observable, policy-controlled AI gateway: one UI for model state, request history, API keys, routing rules, proxy metrics, and client setup. The implemented inference backend is currently llama-swap over llama.cpp, but it’d be easy to add Ollama, etc.
From an infra perspective it fits here:
OpenAI SDK / Claude Code / Continue / Open WebUI
│
▼
llama-dash :3000
dashboard · auth · logs · routing · metrics
│ │
▼ ▼
llama-swap :8080 direct /v1 upstreams
llama.cpp models · peers OpenAI · Anthropic