It performs significantly better than I expected… but since it’s only a Q3 quant, you have to expect some errors or even hallucinations. Generation also slows down considerably at 42k context… I think 16k is realistic, though. That’ll be enough for “technical discussions,” but for coding I’ll definitely use Qwen3 Coder Next at q6_k_m… I really just wanted to test the maximum possible on the Strix Halo.
Yes. Normally I have it at 64 GB; for this I had to set it to 96 GB. I was using the ROCm version of the llama.cpp binaries before and have now downloaded the Vulkan release… it took some time to get it running.
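For anyone trying the same switch: the Vulkan build of llama.cpp uses the same command-line flags as the ROCm one, so only the binary changes. A minimal sketch of how a run like this might look — the model filename is a placeholder, and the exact context/offload values depend on your setup:

```shell
# Sketch only: model path is a placeholder, adjust to your GGUF file.
# -c sets the context window (e.g. 16384 instead of 42k to keep speed usable);
# -ngl 99 offloads all layers to the GPU (the Strix Halo iGPU via Vulkan).
./llama-server \
  -m ./Qwen3-Coder-Next-Q6_K_M.gguf \
  -c 16384 \
  -ngl 99 \
  --port 8080
```

With the unified memory set to 96 GB in the BIOS, the iGPU can address enough of it for larger quants; at 64 GB the bigger models simply won’t fit alongside the OS.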