Fast finetuning of LLMs like Gemma-3 on Strix Halo (Framework Desktop) using Unsloth and distributed multi-node training

Repo: kyuz0/amd-strix-halo-llm-finetuning (GitHub)


Nice!

For those who missed it (like me), the cluster configuration guide: amd-strix-halo-vllm-toolboxes/rdma_cluster/setup_guide.md at main · kyuz0/amd-strix-halo-vllm-toolboxes · GitHub

Have you seen Geramy/OdinLink-Five (GitHub)? It's a high-performance RCCL (ROCm Communication Collectives Library) plugin for Thunderbolt 5 that enables GPU-to-GPU communication across Thunderbolt connections with RDMA support. It's still WIP, though, so it may not be usable for now.
With ~20 µs latency over Thunderbolt, it's much better than the ~80 µs of Ethernet, but yes, still not as fast as RDMA…
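For anyone who wants to sanity-check link latency between two nodes themselves: dedicated tools like `qperf` or `ib_send_lat` are what you'd normally use, but as a rough illustration, here is a minimal UDP round-trip probe sketch (not from the repo above; function names are my own, and it runs both ends on localhost for demonstration; on a real cluster you'd run the echo half on the remote box).

```python
import socket
import statistics
import threading
import time

def echo_loop(sock):
    # Echo every datagram back to its sender, forever.
    while True:
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)

def measure_rtt_us(host="127.0.0.1", pings=200):
    # Stand up a local UDP echo server in a background thread.
    # Between real nodes, this half would run on the remote machine.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))
    port = srv.getsockname()[1]
    threading.Thread(target=echo_loop, args=(srv,), daemon=True).start()

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.settimeout(1.0)
    samples = []
    for _ in range(pings):
        t0 = time.perf_counter()
        cli.sendto(b"ping", (host, port))
        cli.recvfrom(64)
        samples.append((time.perf_counter() - t0) * 1e6)  # microseconds
    # Median is less noisy than mean for latency measurements.
    return statistics.median(samples)

if __name__ == "__main__":
    print(f"median UDP round-trip: {measure_rtt_us():.1f} us")
```

Note this measures kernel-socket round-trip, so it will read higher than the raw link latency an RDMA path sees; it's only useful for comparing links relative to each other.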

Yes, and I had started working with the author, but we ran into some issues and then I had to go to SF for a week. This week I won't have time to work on this either, so I'll probably take a look next week and see if I can get it working. That said, I'm not expecting great performance from USB4 on that.


:+1:
Nice to see you're giving it a try!
Take your time, and let us know if we can help.