Strix Halo self hosted LLM Ops dashboard

After I became comfortable running local models via llama.cpp/llama-swap on my framework desktop mainboard, I was frustrated by the lack of a good visibility into local LLM’s. So I spent the last few weeks building llama-dash. I’m a web dev by day, so I really put a lot of effort into making this a polished project. I’d love to hear any feedback you all may have!

llama-dash turns a self-hosted local inference box into an observable, policy-controlled AI gateway: one UI for model state, request history, API keys, routing rules, proxy metrics, and client setup. The implemented inference backend is currently llama-swap over llama.cpp, but it’d be easy to add Ollama, etc.

From an infra perspective it fits here:

OpenAI SDK / Claude Code / Continue / Open WebUI
                    │
                    ▼
              llama-dash :3000
      dashboard · auth · logs · routing · metrics
             │                     │
             ▼                     ▼
      llama-swap :8080         direct /v1 upstreams
  llama.cpp models · peers      OpenAI · Anthropic