AVX512 Support:
I’ve always wondered whether the Strix Halo has full AVX512 support or, like other Ryzen AI 300 processors, a double-pumped AVX256 implementation.
Since this information is hard to find, I ended up doing something simpler: a small benchmark.
Goal: benchmark AVX512_BF16 and see the maximum performance achievable, using a matrix product. The goal isn’t to have a fully operational gemm to start with, but simply to evaluate how many AVX512 operations per clock are possible.
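For context, a matrix-product benchmark usually reports throughput as 2·M·N·K floating-point operations divided by the elapsed time. The snippet below is only a sketch of that bookkeeping: the function name and the sizes are hypothetical and not taken from the actual benchmark.

```python
def gemm_gflops(m, n, k, seconds):
    """GFlop/s for C[m x n] = A[m x k] * B[k x n] computed in `seconds`."""
    # Each of the m*n output elements needs k multiplies and k adds.
    flop = 2 * m * n * k
    return flop / seconds / 1e9


# Hypothetical run: a 4096^3 product finishing in 50 ms.
print(f"{gemm_gflops(4096, 4096, 4096, 0.05):.1f} GFlop/s")
```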
Here are some of the results:
- For reference, an “old” FW16 with a 7940HS (8 Zen 4 cores, with a “simple” double-pumped AVX256):
  => 2063.51 GFlop/s
- And an FD with a Ryzen AI Max+ 395 (16 Zen 5 cores):
  => with 8 threads active => 4409.93 GFlop/s
  => with 16 threads active => 7873.82 GFlop/s
So:
- zen4 can do 64 FLOP/cycle/core (32 mul and 32 add) with BF16
  (in fact 2xFMA_AVX256 + 2xFADD_AVX256 per core)
- zen5_full can do 128 FLOP/cycle/core (64 mul and 64 add) with BF16
  (in fact 2xFMA_AVX512 + 2xFADD_AVX512 per core)
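As a sanity check on these per-core figures, here is a rough sketch of the arithmetic. It assumes the kernel is built on a BF16 dot-product instruction such as VDPBF16PS (2 multiplies and 2 adds per 32-bit lane) with two such operations issued per cycle, and that the benchmark runs at peak. Under those assumptions, the measured GFlop/s imply a sustained clock. The function names are made up for the illustration.

```python
def bf16_flop_per_cycle(vector_bits, units_per_core=2):
    """Peak BF16 FLOP/cycle/core for a given vector width (assumed model)."""
    lanes = vector_bits // 32   # 32-bit accumulator lanes per vector
    flop_per_lane = 4           # assumption: 2 BF16 multiplies + 2 adds per lane
    return lanes * flop_per_lane * units_per_core


def implied_clock_ghz(gflops, cores, flop_per_cycle):
    """Clock each core would have to sustain if the kernel ran at peak."""
    return gflops / (cores * flop_per_cycle)


zen4 = bf16_flop_per_cycle(256)       # 64 FLOP/cycle/core
zen5_full = bf16_flop_per_cycle(512)  # 128 FLOP/cycle/core

# Measured results quoted in this post.
for label, gflops, cores, peak in [
    ("7940HS, 8 cores", 2063.51, 8, zen4),
    ("Max+ 395, 8 threads", 4409.93, 8, zen5_full),
    ("Max+ 395, 16 threads", 7873.82, 16, zen5_full),
]:
    print(f"{label}: {implied_clock_ghz(gflops, cores, peak):.2f} GHz implied")
```

The implied clocks come out at roughly 4.03, 4.31 and 3.84 GHz, which are plausible all-core frequencies. If the Max+ 395 were double-pumped at 64 FLOP/cycle/core, the same results would require about twice those clocks.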
So yes, the Zen 5 core in the Max+ 395 has the same full-width AVX512 as the desktop Zen 5 (Ryzen 9XX0)!