AVX512 Support:

I’ve always wondered whether Strix Halo has full-width AVX512 support or, like the other Ryzen AI 300 processors, a double-pumped AVX512 executed on 256-bit units.

Since this information is hard to find, I ended up doing something simpler: a small benchmark.

Goal: benchmark AVX512_BF16 and find the maximum achievable performance, using a matrix product. The point isn’t to have a fully working GEMM from the start, but simply to measure how many AVX512 operations per clock cycle are possible.
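For reference, GEMM throughput is conventionally counted as 2·M·N·K FLOP (one multiply and one add per inner-product term), and with AVX512_BF16 a single 512-bit VDPBF16PS instruction contributes 64 of those FLOP. A minimal sketch of that accounting; the matrix sizes and timing below are hypothetical, not the parameters of the benchmark in this post:

```python
# Simplified FLOP accounting for a BF16 matrix-product benchmark (illustrative only).

# VDPBF16PS on 512-bit registers: each source holds 32 BF16 values (16 pairs),
# and each of the 16 FP32 accumulator lanes computes acc += a0*b0 + a1*b1,
# i.e. 2 multiplies + 2 adds = 4 FLOP per lane.
FP32_LANES_512 = 512 // 32
FLOP_PER_VDPBF16PS_512 = FP32_LANES_512 * 4


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C[m x n] += A[m x k] @ B[k x n]: one multiply and one add per (i, j, p)."""
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float, repeats: int = 1) -> float:
    """Achieved GFLOP/s when `repeats` products of this shape take `seconds` in total."""
    return gemm_flops(m, n, k) * repeats / seconds / 1e9


# Hypothetical timing: 1000 products of 1024 x 1024 x 1024 completing in 0.5 s.
print(f"{gflops(1024, 1024, 1024, 0.5, repeats=1000):.2f} GFLOP/s")
print(f"FLOP per 512-bit VDPBF16PS: {FLOP_PER_VDPBF16PS_512}")
```

Counting FLOP from the problem shape rather than from executed instructions keeps the figure comparable across CPUs with different vector widths.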

Here are the results I obtained:

  • For reference, an “old” FW16 with a 7940HS: 8 Zen 4 cores, with a “simple” double-pumped AVX512 (executed as 2×AVX256):
    => 2063.51 GFLOP/s

  • and an FD (Framework Desktop) with a Ryzen AI Max+ 395: 16 Zen 5 cores:
    => with 8 threads active => 4409.93 GFLOP/s
    => with 16 threads active => 7873.82 GFLOP/s

So:

  • Zen 4 can do 64 FLOP/cycle/core (32 mul and 32 add) with BF16
    (in fact 2xFMA_AVX256 + 2xFADD_AVX256 per core)
  • full-width Zen 5 can do 128 FLOP/cycle/core (64 mul and 64 add) with BF16
    (in fact 2xFMA_AVX512 + 2xFADD_AVX512 per core)
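One way to sanity-check these per-cycle figures is to divide each measurement by (active cores × FLOP/cycle/core) and see whether the implied sustained clock is plausible. A minimal sketch using the numbers above, assuming each benchmark thread runs on its own physical core:

```python
# Measured results from this post: (label, GFLOP/s, active cores, BF16 FLOP/cycle/core).
# Assumption: each benchmark thread runs on its own physical core (no SMT sharing).
RESULTS = [
    ("FW16, 7940HS, 8 Zen 4 cores", 2063.51, 8, 64),
    ("FD, Max+ 395, 8 threads", 4409.93, 8, 128),
    ("FD, Max+ 395, 16 threads", 7873.82, 16, 128),
]


def implied_clock_ghz(gflops: float, cores: int, flop_per_cycle: int) -> float:
    """Sustained clock (GHz) needed to reach `gflops` at the given per-core rate."""
    return gflops / (cores * flop_per_cycle)


for label, measured, cores, fpc in RESULTS:
    print(f"{label}: {implied_clock_ghz(measured, cores, fpc):.2f} GHz")

# Counter-hypothesis: a double-pumped Zen 5 limited to 64 FLOP/cycle/core.
print(f"16 threads at 64 FLOP/cycle: {implied_clock_ghz(7873.82, 16, 64):.2f} GHz")
```

With 128 FLOP/cycle/core the implied clocks come out around 4.03, 4.31 and 3.84 GHz, all realistic all-core frequencies, whereas a 64 FLOP/cycle/core Zen 5 would need about 7.69 GHz for the 16-thread result, well beyond any shipping clock.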

So yes, the Zen 5 cores in the Max+ 395 have the same full-width AVX512 units as desktop Zen 5 (Ryzen 9XX0)!


I benchmarked FP32 too and got ~4 TFLOP/s.
I expect (untested) ~2 TFLOP/s with FP64, i.e. about 2x the 928 GFLOP/s possible with the GPU :wink:
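The FP32 and FP64 figures follow from the same per-core arithmetic. A minimal sketch, assuming two 512-bit FMA pipes per full-width Zen 5 core and the ~3.84 GHz all-core clock implied by the 16-thread BF16 run (both are assumptions, not measurements from this thread):

```python
CORES = 16
CLOCK_GHZ = 3.84   # assumption: same all-core clock as the 16-thread BF16 run
FMA_PIPES = 2      # assumption: two 512-bit FMA units per full-width Zen 5 core
VECTOR_BITS = 512


def peak_gflops(element_bits: int) -> float:
    """Theoretical peak GFLOP/s for FMA-only code on elements of the given width."""
    lanes = VECTOR_BITS // element_bits
    flop_per_cycle = FMA_PIPES * lanes * 2  # an FMA counts as one multiply plus one add
    return CORES * CLOCK_GHZ * flop_per_cycle


print(f"FP32: {peak_gflops(32):.0f} GFLOP/s")
print(f"FP64: {peak_gflops(64):.0f} GFLOP/s")
```

Under these assumptions FP32 lands just under 4 TFLOP/s, matching the measurement, and FP64 at half that, just under 2 TFLOP/s.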

Then my expectation was correct. Thanks for the confirmation.