Compiling vLLM from source on Strix Halo

Is there any way to get MTP (multi-token prediction) working with vLLM/ROCm?

Haven’t tried yet, so no idea.

Got MTP working, but throughput dropped sharply, from 15 t/s to 8 t/s.

Yeah, I tried it on my Spark too, and it dropped from 43 t/s to about 30 t/s.
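To compare the two reports on a common scale, here is a quick sketch that turns the before/after tokens-per-second figures quoted above into a relative change. The function name `relative_change` is just for illustration, and the inputs are the rough single-run numbers from this thread, not benchmarks.

```python
def relative_change(before_tps: float, after_tps: float) -> float:
    """Return the fractional throughput change (negative means slower)."""
    return (after_tps - before_tps) / before_tps


# Approximate figures reported in this thread (with MTP off -> on).
reports = {
    "Strix Halo (vLLM/ROCm)": (15.0, 8.0),
    "Spark": (43.0, 30.0),
}

for name, (before, after) in reports.items():
    # Format as a signed percentage with one decimal place.
    print(f"{name}: {relative_change(before, after):+.1%}")
```

This works out to roughly a 47% slowdown on Strix Halo versus about 30% on the Spark, so MTP hurt on both machines, and the penalty was larger on Strix Halo.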