Framework Desktop for Local AI

Hi all, I am interested in getting a capable PC for local AI. The Framework Desktop seems to be a great choice for it

However, I have been looking at the forum, and it seems there are quite a few compatibility issues. But I also see that things seem to have improved in the latest releases: Linux + ROCm: January 2026 Stable Configurations Update

Do you think the Framework is worth getting for local AI? (I will most likely use it only for this purpose, as I have a laptop as a daily driver.)

Or should I build my own PC around an RTX 3090?

Or should I wait for the AMD Ryzen AI Halo?

Or should I get a Mac mini M4 Pro 64GB?

Which is the most performant?

And what do you think overall?

Thanks!

I use 3 Strix Halo machines (one of which is a Framework Desktop) in a cluster for local AI. Whether it’s worth getting over the other options depends on which models you want to run. I was aiming for large LLMs like GLM (currently testing GLM 5 performance to see if I can replace the 4.6 I’m mostly using), and for me it works great. It’s fast, the thermals are decent, and I’ve yet to have an issue (I’m using Fedora 43, so your mileage may vary).


With the FD, and Strix Halo as an architecture, what you get is:

  • A large(r) memory pool than a PC
  • That is FAR faster than you can get on a traditional PC, and can be used for either CPU or GPU compute. To get a PC with 100GB of GPU VRAM you’ll spend $10,000+ USD.

But:

  • AMD AGESA code has always been “bleeding edge” and has always had teething pains. Strix Halo isn’t the newest anymore, so its bugs are more ironed out than they were.
  • AMD ROCm is in very active development. Getting it to run and work is a project. It isn’t a set-it-and-forget-it solution, yet.
  • CUDA is simply more mature…but you lose out on the pros above.

A Mac Mini would have nearly half the memory bandwidth (120GB/s vs. 200GB/s), can’t be configured with nearly as much memory at the top end, and has much less CPU compute, while costing more. Does that matter to your application? IDK. BUT: with the FD you do get up to 128GB of memory, but are you running models that can use that pool of memory and still get acceptable token rates?

Whereas a Mac Studio M4 would have double the memory bandwidth (500+GB/s) of the Framework Desktop, but getting the same amount of RAM would cost about 50% more, because Apple’s pricing on memory and storage has always been extremely high, since their memory and drives are fast.


Thanks for the replies!

My main use case would be local inference for development.

Any data on tokens/second for models like the latest Qwen 3.5 27B on the Framework?

@entropy4936 what are your other two Strix Halo machines?

Both are Minisforum MS-S1 MAX. At the time I got them, they were actually cheaper than the Framework Desktop (because of the launch promo deal), so I went with them, cancelling the Desktop preorder I had back then.


I realize this is late, but maybe it’ll be useful. These are llama-bench results; I get nearly identical tps with a test prompt.

Qwen 3.5 27B runs fairly slow:

$ llama-bench -m ./unsloth_Qwen3.5-27B-GGUF_Qwen3.5-27B-UD-Q4_K_XL.gguf
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 ?B Q4_K - Medium        |  16.40 GiB |    26.90 B | ROCm       |  99 |           pp512 |        299.82 ± 4.47 |
| qwen35 ?B Q4_K - Medium        |  16.40 GiB |    26.90 B | ROCm       |  99 |           tg128 |         10.60 ± 0.01 |

$ llama-bench -m ./unsloth_Qwen3.5-27B-GGUF_Qwen3.5-27B-UD-Q6_K_XL.gguf
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 ?B Q6_K                 |  23.90 GiB |    26.90 B | ROCm       |  99 |           pp512 |        250.13 ± 3.50 |
| qwen35 ?B Q6_K                 |  23.90 GiB |    26.90 B | ROCm       |  99 |           tg128 |          7.70 ± 0.00 |

$ llama-bench -m ./unsloth_Qwen3.5-27B-GGUF_Qwen3.5-27B-UD-Q8_K_XL.gguf
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 ?B Q8_0                 |  33.08 GiB |    26.90 B | ROCm       |  99 |           pp512 |        288.82 ± 4.64 |
| qwen35 ?B Q8_0                 |  33.08 GiB |    26.90 B | ROCm       |  99 |           tg128 |          5.94 ± 0.00 |

Qwen 3.5 35B-A3B runs much better:

$ llama-bench -m ./unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf 
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35moe ?B Q6_K              |  29.86 GiB |    34.66 B | ROCm       |  99 |           pp512 |        775.93 ± 3.43 |
| qwen35moe ?B Q6_K              |  29.86 GiB |    34.66 B | ROCm       |  99 |           tg128 |         36.91 ± 0.06 |

$ llama-bench -m ./unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf 
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35moe ?B Q8_0              |  45.33 GiB |    34.66 B | ROCm       |  99 |           pp512 |        609.43 ± 3.95 |
| qwen35moe ?B Q8_0              |  45.33 GiB |    34.66 B | ROCm       |  99 |           tg128 |         25.21 ± 0.01 |
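Token generation on these machines is largely memory-bandwidth bound, which is why the dense 27B is so much slower than the 35B-A3B MoE (which only reads its ~3B active parameters per token). A rough back-of-envelope sketch, using the ~200GB/s bandwidth figure quoted earlier in the thread (an assumption, not a measurement on this hardware):

```python
# Rough sanity check: for a dense model, every weight is read once per
# generated token, so tg t/s is roughly bounded by bandwidth / model size.
GIB = 1024 ** 3

def est_tg(bandwidth_gb_s, weights_gib):
    """Rough estimate of dense-model tokens/s from memory bandwidth."""
    return bandwidth_gb_s * 1e9 / (weights_gib * GIB)

# Dense 27B at Q4_K (16.40 GiB of weights), assuming ~200 GB/s:
print(round(est_tg(200, 16.40), 1))  # ~11.4 t/s, close to the measured 10.60
```

The estimate lands near the measured tg128 numbers for the dense model; for the MoE the same logic applies to the active parameters only, which is why it generates several times faster despite being a bigger download.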

Thanks, that’s really useful

What are tg128 and pp512?


pp512 is the prompt-processing rate with a 512-token input: how many prompt tokens are processed per second.

tg128 is the token-generation rate over a 128-token output: how many tokens are generated per second.

Those are the defaults when running llama-bench.
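Since both are rates in tokens/s, they convert directly into latency: per-token generation latency is the inverse of the tg rate, and time-to-first-token is roughly prompt length divided by the pp rate. A small sketch using numbers from the tables above (the 2000-token prompt is just an example):

```python
def ms_per_token(tg_tps):
    # per-token generation latency in milliseconds
    return 1000.0 / tg_tps

def ttft_seconds(prompt_tokens, pp_tps):
    # rough time-to-first-token: prompt length / prompt-processing rate
    return prompt_tokens / pp_tps

# Qwen 3.5 35B-A3B Q6_K above: tg128 = 36.91 t/s, pp512 = 775.93 t/s
print(round(ms_per_token(36.91), 1))        # 27.1 ms per generated token
print(round(ttft_seconds(2000, 775.93), 1)) # 2.6 s to process a 2000-token prompt
```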

Looks like I should give Qwen 3.5 35B a go. Tripling token generation is a great improvement.

I will come back on the weekend with some Win11 results.
