ROCm 6.4.1 has basic (rocBLAS) support for gfx1151, but for all the kernels/best performance you should use one of the gfx1151 nightlies available for download here: Releases · ROCm/TheRock · GitHub
For regular llama.cpp-based inference, I’ve just posted my testing, and I believe you’re mostly fine/better off with Vulkan generally (HIP sometimes does much better on prefill, but for token generation, Vulkan is almost always faster atm).
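If it helps, here's a rough sketch of building llama.cpp with each backend so you can compare on your own hardware. The `GGML_VULKAN`/`GGML_HIP` CMake flags are the standard llama.cpp build options; the `HIPCXX`/`HIP_PATH` env vars and the gfx1151 target are assumptions based on a typical ROCm install, so adjust paths for your setup:

```shell
# Vulkan backend (usually faster token generation on gfx1151 in my testing)
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j

# HIP backend (sometimes much better prefill) -- target gfx1151 explicitly.
# HIPCXX / HIP_PATH assume a standard /opt/rocm layout; adjust as needed.
HIPCXX=/opt/rocm/llvm/bin/clang++ HIP_PATH=/opt/rocm \
  cmake -B build-hip -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build-hip --config Release -j
```

Then run the same model/prompt through each build (e.g. with `llama-bench`) and compare prefill vs. token-generation numbers for yourself.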