Fast finetuning of LLMs like Gemma-3 on Strix Halo (Framework Desktop) using Unsloth and distributed multi-node training

Repo: kyuz0/amd-strix-halo-llm-finetuning (GitHub)


Nice!

For those who missed it (like me), the cluster configuration guide: amd-strix-halo-vllm-toolboxes/rdma_cluster/setup_guide.md at main · kyuz0/amd-strix-halo-vllm-toolboxes · GitHub

Have you seen Geramy/OdinLink-Five (GitHub)? It's a high-performance RCCL (ROCm Communication Collectives Library) plugin for Thunderbolt 5 that enables GPU-to-GPU communication across Thunderbolt connections with RDMA support. It's still WIP, though, so it may not be usable for now.
With ~20 µs latency over Thunderbolt, it's much better than the ~80 µs of Ethernet, but yes, still not as fast as RDMA…
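For anyone who wants to sanity-check link latency between two nodes themselves: dedicated tools like `qperf` or `ib_send_lat` are what you'd normally use, but as a rough illustration, here is a minimal UDP round-trip probe sketch (not from the repo above; function names are my own, and it runs both ends on localhost for demonstration; on a real cluster you'd run the echo half on the remote box).

```python
import socket
import statistics
import threading
import time

def echo_loop(sock):
    # Echo every datagram back to its sender, forever.
    while True:
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)

def measure_rtt_us(host="127.0.0.1", pings=200):
    # Stand up a local UDP echo server in a background thread.
    # Between real nodes, this half would run on the remote machine.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))
    port = srv.getsockname()[1]
    threading.Thread(target=echo_loop, args=(srv,), daemon=True).start()

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.settimeout(1.0)
    samples = []
    for _ in range(pings):
        t0 = time.perf_counter()
        cli.sendto(b"ping", (host, port))
        cli.recvfrom(64)
        samples.append((time.perf_counter() - t0) * 1e6)  # microseconds
    # Median is less noisy than mean for latency measurements.
    return statistics.median(samples)

if __name__ == "__main__":
    print(f"median UDP round-trip: {measure_rtt_us():.1f} us")
```

Note this measures kernel-socket round-trip, so it will read higher than the raw link latency an RDMA path sees; it's only useful for comparing links relative to each other.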

Yes, and I had started working with the author, but we ran into some issues and then I had to go to SF for a week. This week I won't have time to work on this either, so I'll probably take a look next week and see if I can get it working. That said, I'm not expecting great performance from USB4 on that.


:+1:
Nice to see you're giving it a try!
Take your time, and let us know if we can help.