A Home Assistant Whisper / Piper docker-compose designed for use with Strix Halo

I had trouble finding ROCm setups (especially for recent ROCm, i.e. 7+) for Whisper and/or Piper. So I put together a little setup in this repository. Hope it helps others looking to run the HA voice stack locally on their new Framework Desktop (Mainboards)!
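For context, the part that usually trips people up is the GPU passthrough: ROCm containers need the kernel compute and render devices mapped in. A minimal sketch of what such a service entry can look like (the image name, port, and `HSA_OVERRIDE_GFX_VERSION` value are illustrative assumptions, not taken from the repository):

```yaml
# Hypothetical sketch of a Wyoming Whisper service with ROCm GPU access.
services:
  whisper:
    image: my-wyoming-whisper-rocm:latest   # placeholder image name
    ports:
      - "10300:10300"                       # default Wyoming Whisper port
    devices:
      - /dev/kfd                            # ROCm compute interface
      - /dev/dri                            # GPU render nodes
    group_add:
      - video
      - render
    environment:
      # Strix Halo is gfx1151; some ROCm builds need an override to match
      # a supported target. This value is an assumption - check your build.
      - HSA_OVERRIDE_GFX_VERSION=11.5.1
```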


Just added support for both Qwen3-TTS and Pocket TTS, each wrapped in Wyoming!

Unfortunately, Qwen3-TTS does not generate fast enough to be practical for voice assistant use. It does work, though, specifically with the Qwen3-TTS-12Hz-1.7B-VoiceDesign model, which allows designing the voice via a prompt at startup. An audio snippet for a ~10-word sentence took anywhere from 30 s to 1 min to generate.

I think this is mostly down to missing flash-attn in the container. I wasn’t able to get that to build with either ROCm 7.1.1 or 7.2.

Pocket TTS, on the other hand, works a treat! It is a good bit better than Piper at generating natural-sounding speech. You can play around with the available voices on their demo page here.

I made Kokoro-FastAPI work on Strix Halo, if you’re interested in TTS: Add permutation that support running on ROCm gfx1151 devices such as Strix Halo by projects-land · Pull Request #431 · remsky/Kokoro-FastAPI · GitHub

(At the time I got it working, ROCm 7.2 hadn’t been released yet, so it uses the nightly build. It should now work with stable ROCm 7.2 as well.)

Nice one! Any idea if these Strix Halo boxes can generate audio at ~real-time rates via Kokoro?

It is fast even on CPU. On GPU it is very fast. Give it a go.

I actually had a Kokoro-FastAPI ROCm fork running already, but it turns out it was using the CPU still :joy:.

I got your patch running with some minor tweaks (commented in the PR) and wow - it’s definitely usable for real-time voice assistant use-cases now! :heart:
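For anyone wanting to sanity-check real-time performance on their own box: Kokoro-FastAPI exposes an OpenAI-compatible speech endpoint, so a rough timing test can be done with curl (the port and voice name below are the project’s defaults at time of writing; adjust to your setup):

```shell
# Rough latency test against a locally running Kokoro-FastAPI instance.
time curl -s http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "input": "Testing real-time speech generation on Strix Halo.",
        "voice": "af_heart"
      }' \
  -o out.mp3
# If the wall-clock time is well under the clip's duration, you're real-time.
```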

What would you recommend (in terms of where to start the learning process) to be able to implement this? I am a quick learner but relatively new to running the HA voice stack locally…

thanks!

Hmm, best to learn the basics of Docker and Linux sysadmin stuff first. But if you’re reasonably well versed in those - just dive in! Try to run some models; Ollama and llama.cpp are good starting points.
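To make “just dive in” a bit more concrete, a first session might look like this (the install script and build commands are the upstream defaults; the model tag and GGUF path are just examples):

```shell
# Install Ollama via the official install script, then run a first model.
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.2   # pulls the model on first run, then drops into a chat

# Or build llama.cpp from source and chat with a local GGUF model:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build -j
./build/bin/llama-cli -m /path/to/model.gguf -p "Hello"
```

Once you’re comfortable running models this way, the Wyoming-wrapped services above are just more containers to compose.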