Any way to get MTP (multi-token prediction) working with vLLM/ROCm?
Haven’t tried yet, so no idea.
Got MTP working. TPS dropped sharply, from 15 t/s to 8 t/s.
Yeah, I tried it on my Spark too, and it dropped from 43 t/s to around 30 t/s.