In theory, NPUs should handle tensor processing (the matrix multiplications at the core of PyTorch and TensorFlow, the frameworks that run open-weight models) more efficiently. I'm not sure that's the case here, i.e. whether ROCm (AMD's equivalent of CUDA) actually utilizes these tensor cores.
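Whether a given PyTorch install is even built against ROCm can at least be checked from Python. A minimal sketch, assuming a standard PyTorch build (on ROCm builds `torch.version.hip` is a version string, on CUDA builds it is `None`); note this only detects the backend, not whether the matrix/tensor cores are actually exercised, which would require a profiler such as AMD's rocprof:

```python
def detect_backend():
    """Report which GPU backend, if any, the local PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "pytorch not installed"
    if torch.version.hip:    # populated only on ROCm (HIP) builds
        return "rocm"
    if torch.version.cuda:   # populated only on CUDA builds
        return "cuda"
    return "cpu-only"

print(detect_backend())
```

Even on a ROCm build, kernels may still run on the plain shader cores; confirming matrix-core utilization means profiling an actual matmul workload.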