Linux documentation to run Ollama, llama.cpp, or vLLM?

Just an update: I have gotten vLLM at least nominally running. It's still not for the faint of heart, but it is at least possible now.

I made a dedicated thread for discussing PyTorch and vLLM on the Framework Desktop (Strix Halo): PyTorch w/ Flash Attention + vLLM for Strix Halo
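For anyone who wants a quick smoke test once their install is working, here is a minimal sketch using vLLM's offline Python API. This assumes a ROCm-enabled PyTorch + vLLM build as discussed in the linked thread; the model name is just a placeholder, not a recommendation.

```python
# Minimal vLLM smoke test using the offline inference API.
# Assumes vLLM is installed against a ROCm-enabled PyTorch build.
from vllm import LLM, SamplingParams

# Placeholder model -- substitute any model you have locally or can pull.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

# Generate a completion for a single prompt and print it.
outputs = llm.generate(["Hello from a Framework Desktop!"], params)
for out in outputs:
    print(out.outputs[0].text)
```

If that runs end to end and prints a completion, the stack is at least nominally working; see the dedicated thread for the build details.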