Finetuning LLMs on Strix Halo – Full, LoRA, and QLoRA on Gemma-3, Qwen-3, and GPT-OSS-20B

I’ve just released a toolbox and a tutorial on how to finetune open-weight LLMs like Gemma-3, Qwen-3 and gpt-oss on AMD’s new Strix Halo processor (specifically, I use the Framework Desktop). The 128GB of unified memory makes it possible to train models of up to 12B parameters in full, while LoRA and various quantization strategies also make it possible to tune ~20-30B models locally:
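For a rough idea of what the LoRA path looks like in code, here is a minimal sketch assuming the Hugging Face transformers + peft stack; the model name, target modules and hyperparameters are illustrative, not necessarily the toolbox defaults:

```python
# Minimal LoRA fine-tuning setup sketch (illustrative, not the toolbox's exact config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-3-1b-it"  # any causal LM from the Hub works the same way

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of the full weight set,
# which is what keeps ~20-30B models within the 128GB unified memory budget.
# (For QLoRA, the base model would additionally be loaded quantized to 4-bit.)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
```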

12 Likes

Thank you! I needed something to talk me down from canceling both my desktop and mainboard orders. It’s been a few months since this video [this video is from the 19th of August, correct?]. Are you happy with the progress of software/driver development?

So, a bit off topic, but in your opinion: no complaints about the Desktop’s PSU behavior at idle? And no complaints about system stability under heavy GPU load?

1 Like

Hey, the video I posted here is from yesterday - I’d check it out. The stack is definitely more stable and performant; AMD is slowly trying to improve this.

I should say that I have a pre-production unit, but I’ve never had any issues with the PSU. Mine sits in my lounge as a quirky tech piece and has never given me noise issues.

My question is: what is your use case? What do you hope to do with the Framework Desktop? There are inherent limitations - the software stack will get better - but the laws of physics still apply: you only have so much memory, bandwidth and GPU cores. That’s a hard ceiling you won’t escape from.

1 Like

@kyuz0 in your video, you confuse August with October a couple of times, giving the impression it’s an old video :slight_smile:
It’s a good video. I’ll give it a try.

2 Likes

Most certainly. I’ve gotten back to watching it now. I started watching, but a few minutes in I had a few things to attend to, so I paused the playback until I could give it my full attention - which is now.

But as to my comment regarding the date: it was in response to your statement after the 1:26 mark. From the transcript, rather than me transcribing what my ear heard:

1:26 … has not been fantastic from day one. However, since my first video in August
1:32 and we’re now today uh on the 19th of August when I’m recording this support
1:37 has improved considerably. uh and there is a lot of work going on on uh AMD’s

So I gather the “…19th of August” was accidental and you meant 19th of October. Got it. No problemo. It happens. Glad you cleared that up.

I’m relieved to hear that. More than I can emphasize. Thank you for the feedback.

Understood. The initial reason for the purchase was 100% to learn and experiment with open-source LLMs, with the goal of eventually, and experimentally, working toward self-directed autonomous stock trading. Perhaps one day successfully automating the investment thesis I’ve been following manually and attending to daily. And should this transpire and I be successful, stability is of paramount importance.

But it has also evolved into potentially taking over my daily-driver computer duties, and very rarely serving as a gaming system - super rarely, and I can continue using the newest computer I built for that if/when I want it. Without a doubt, though, the primary use case is running open-source LLMs in LM Studio (and other runner services when they better serve my needs), along with MCP servers and my own code, either within them or as a custom service that interacts with the LLM. I plan to initially run Windows 11 Pro on it, maybe use Docker, maybe use WSL, maybe dual-boot a Linux distribution - and thank you for the “how-to” if I go that route ;).

1 Like

Lol - I did realise that and I was able to at least correct one instance with a note, but the other instance I didn’t notice. That tells you how fried my brain is!

3 Likes

Tested @kyuz0’s fine-tuning toolbox today on Framework Desktop (Strix Halo, 128GB, Linux Mint 22.2, kernel 6.17.0-061700-generic).

Results on Gemma-3-1B (2 epochs):

  • Full fine-tuning: 19.43 GB, ~3 min

  • LoRA: 15.40 GB, ~2 min

  • 8-bit + LoRA: 13.06 GB, ~11 min

  • QLoRA: 13.08 GB, ~3 min

All methods completed successfully, and memory usage matches the benchmark table. The ROCm 7 nightly detected the GPU correctly after I fixed /dev/kfd permissions with udev rules.
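For anyone who hits the same /dev/kfd permission error, a rule along these lines is the usual fix (the filename and group are the common ROCm convention; adjust for your distro):

```
# /etc/udev/rules.d/70-amdgpu-kfd.rules (illustrative path)
KERNEL=="kfd", GROUP="render", MODE="0660"
```

After reloading the rules (`sudo udevadm control --reload-rules && sudo udevadm trigger`) and making sure your user is in the render and video groups, ROCm can open /dev/kfd without root.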

Submitted PR to fix hardcoded paths in inference sections.

Setup worked smoothly following the repo instructions.
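For context, the four rows above differ mainly in how the base weights are held in memory. A sketch of the 8-bit vs 4-bit (QLoRA) loading step, assuming a bitsandbytes-style quantization config works on your ROCm build (the toolbox may use a different backend):

```python
# Illustrative: how "8-bit + LoRA" and "QLoRA" differ at model load time.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-3-1b-it"  # illustrative

# "8-bit + LoRA": base weights held in int8, LoRA adapters trained on top.
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)

# "QLoRA": base weights in 4-bit NF4 with bf16 compute, LoRA adapters on top.
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Swap in bnb_8bit for the 8-bit variant.
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_4bit)
```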

2 Likes

Thank you for testing this out!

1 Like

This is a great video. One question: how would I format, say, a list of passwords that I want the system to NOT flag as ‘secret’? My use case would be noseyparker, which looks for secrets in code. I had tried using an encoder-only model, but it is WAY too complex. What would the data format need to look like, and what would be a good link on formatting data for fine-tuning models?

Hey,

Probably this is not the best place to discuss this. I’d say it needs way more context and way more detail. A classifier would probably be a better fit, and you’d need positive and negative samples in your dataset → but again, if this is specific to a fixed list of passwords, most definitely do not use an LLM; I can’t imagine a worse way of approaching the problem. Just use string search. If you don’t want the model to detect “generic passwords” as secrets, that might be much harder due to the nature of passwords. But again, I’d say this is not the right place for this conversation.
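That said, to make “positive and negative samples” concrete: a classifier dataset is just labelled examples, e.g. JSONL along these lines (field names and strings are purely illustrative):

```jsonl
{"text": "password = 'correct horse battery staple'", "label": "not_secret"}
{"text": "aws_access_key_id = 'AKIAIOSFODNN7EXAMPLE'", "label": "secret"}
```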