DGX Spark vs. Strix Halo - Initial Impressions

Getting 20 t/s on dual Sparks using vLLM in tensor-parallel mode over InfiniBand with RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4.

The same workflow over Ethernet was giving me 16 t/s.

That was on the same physical port and cable.


It turned out that GB10 is not yet optimized for FP4 quants, so switching to an AWQ quant gave me 25 t/s on the same model.
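To put those numbers side by side, here is a quick sketch of the relative gains. It uses only the t/s figures quoted above; the dictionary labels and the `speedup_pct` helper are made up for illustration, and it assumes the AWQ run used the same InfiniBand link as the 20 t/s NVFP4 run.

```python
# Throughput figures reported in this thread, in tokens per second.
# Labels are illustrative; the AWQ entry is assumed to be over InfiniBand.
throughput = {
    "nvfp4_ethernet": 16.0,
    "nvfp4_infiniband": 20.0,
    "awq_infiniband": 25.0,
}


def speedup_pct(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


ib_vs_eth = speedup_pct(throughput["nvfp4_infiniband"], throughput["nvfp4_ethernet"])
awq_vs_fp4 = speedup_pct(throughput["awq_infiniband"], throughput["nvfp4_infiniband"])
awq_vs_eth = speedup_pct(throughput["awq_infiniband"], throughput["nvfp4_ethernet"])

print(f"InfiniBand vs Ethernet (NVFP4): +{ib_vs_eth:.0f}%")
print(f"AWQ vs NVFP4 (InfiniBand):      +{awq_vs_fp4:.0f}%")
print(f"AWQ vs NVFP4 over Ethernet:     +{awq_vs_eth:.0f}%")
```

So each change is worth roughly a quarter more throughput on its own, and the two together compound to a bit over half again.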

Also, 40 t/s on MiniMax M2 in 4-bit AWQ is very usable for coding.

Wow, I was able to run GLM-4.6 in 4-bit AWQ on my dual Sparks and the performance was acceptable. 16 t/s is not fast by any measure, but usable. Prompt processing speeds were pretty decent too.

I could only fit 50K of context, though. I guess if I optimized my memory footprint, I could ramp it up to 64K.
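For a rough idea of how much memory that would take, here is a back-of-the-envelope sketch. It assumes the KV cache for a single sequence grows linearly with context length, and that "50K" and "64K" mean 50 × 1024 and 64 × 1024 tokens. The function name is hypothetical, and it says nothing about the absolute cache size, which depends on model details not covered here.

```python
def extra_kv_cache_fraction(current_ctx: int, target_ctx: int) -> float:
    """Fractional increase in KV cache needed to go from current_ctx to
    target_ctx tokens, assuming the cache scales linearly with context."""
    if current_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    return target_ctx / current_ctx - 1.0


current_ctx = 50 * 1024  # assumed meaning of "50K"
target_ctx = 64 * 1024   # assumed meaning of "64K"

extra = extra_kv_cache_fraction(current_ctx, target_ctx)
print(f"Going from {current_ctx} to {target_ctx} tokens needs "
      f"about {extra:.0%} more KV cache.")
```

In other words, whatever the KV cache takes at 50K, that much again times roughly 0.28 has to be freed up elsewhere to reach 64K.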