Experiments with using ROCm on the FW16 AMD

EDIT:
The 34GB GTT limit can be raised by passing a few module options to amdgpu and ttm.
You can try increasing the GTT pool by putting something like this in:
/etc/modprobe.d/increase_amd_memory.conf

# Otherwise GTT is capped to only half the RAM
# gttsize is in MB
options amdgpu gttsize=90000
# 4k per page, 90GB total
options ttm pages_limit=22500000
options ttm page_pool_size=22500000
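
To check that the options were picked up once the modules have been reloaded (in practice, after a reboot), the live values can usually be read back from sysfs. This is a minimal sketch that assumes the standard /sys/module/<module>/parameters/<name> layout. The helper name read_module_param is made up for illustration, and some parameters may only be readable as root.

```python
from pathlib import Path


def read_module_param(module, name, sysfs_root="/sys/module"):
    """Return a module parameter as a string, or None if it cannot be read."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exposed, or insufficient permissions.
        return None


if __name__ == "__main__":
    for module, name in [("amdgpu", "gttsize"),
                         ("ttm", "pages_limit"),
                         ("ttm", "page_pool_size")]:
        value = read_module_param(module, name)
        shown = value if value is not None else "unavailable"
        print(f"{module}.{name} = {shown}")
```

If a value prints as unavailable, try running the script as root before assuming the option was ignored.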

Note:
ttm pages = (gttsize * 1024) / 4.096, which works out to gttsize * 250

So, if you wish to use 60GB of RAM with ROCm:

# gttsize is in MB
options amdgpu gttsize=60000
# 4k per page, 60GB total
options ttm pages_limit=15000000
options ttm page_pool_size=15000000
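
The conversion from the note above can be written as a small helper for other sizes. This is only a sketch of the arithmetic used in this post: the function names ttm_pages_for_gttsize and modprobe_lines are made up, and the factor of 250 comes from the formula above, not from kernel documentation. Counting exact 4096-byte pages in binary megabytes would give 256 pages per MB instead, which is not what was used here.

```python
def ttm_pages_for_gttsize(gttsize_mb):
    """Apply the post's rule: pages = (gttsize * 1024) / 4.096."""
    # 1024 / 4.096 is exactly 250, so integer arithmetic is enough.
    return gttsize_mb * 250


def modprobe_lines(gttsize_mb):
    """Build the three option lines for the modprobe.d file."""
    pages = ttm_pages_for_gttsize(gttsize_mb)
    return [
        f"options amdgpu gttsize={gttsize_mb}",
        f"options ttm pages_limit={pages}",
        f"options ttm page_pool_size={pages}",
    ]


if __name__ == "__main__":
    # Reproduces the 60GB example from this post.
    print("\n".join(modprobe_lines(60000)))
```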

Previously, with only 34GB of GTT RAM, the largest matrix multiplication I could run was 30000 x 30000.
Now, with 60GB of GTT RAM, I can do:
40000 x 40000 matrix multiplication with complex f32 values:
Duration: 387.57099167s
50000 x 50000 matrix multiplication with complex f32 values:
Duration: 886.901997998s
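
A rough size estimate is consistent with these limits. The sketch below assumes that a complex f32 value takes 8 bytes (two 4-byte floats) and that two input matrices plus one output matrix are resident at the same time. Both are assumptions about the benchmark, not details taken from it, and matmul_footprint_gb is a made-up helper name.

```python
BYTES_PER_COMPLEX_F32 = 8  # assumption: two 32-bit floats per element


def matmul_footprint_gb(n, matrices=3):
    """Estimated GB (10^9 bytes) for `matrices` dense n x n complex f32 arrays."""
    return n * n * BYTES_PER_COMPLEX_F32 * matrices / 1e9


if __name__ == "__main__":
    for n in (30000, 40000, 50000):
        print(f"{n} x {n}: about {matmul_footprint_gb(n):.1f} GB")
```

Under those assumptions the three sizes come to about 21.6 GB, 38.4 GB and 60.0 GB, which would explain why 40000 x 40000 did not fit under the 34GB limit and why 50000 x 50000 sits right at the 60GB setting.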
