TBH I think I’m going to cancel my pre-order until some genuine numbers are released, rather than speculate over device X doing Y tokens a second. I’m sure Framework could get something working, even if they preface it with ‘hey, this is just a basic test and not all features are fully working’ – just give us something.
What should we test? Any ideas or suggestions?
For me – ollama run --verbose on a few popular models of different sizes, to check tokens/s and prompt eval rate. I’d like to compare 1:1 to my 7940 machine; a comparison to the recent 13" AI platforms would also be interesting.
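Roughly something like this, as a sketch (the model tag is just an example – any of the popular sizes would do):

```
# --verbose makes ollama print timing stats after each response
ollama run llama3.1:8b --verbose
# the summary it prints includes lines like:
#   prompt eval rate:   ... tokens/s   <- prompt processing speed
#   eval rate:          ... tokens/s   <- generation speed, the headline tokens/s figure
```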
The biggest thing in my opinion is probably models above 40GB in size - 70B q4 and up, maybe some 32B with 130k context.
This chip is being marketed hard as a device for running inference with models that won’t easily fit in consumer GPU VRAM, but basically nobody seems to be testing it with models above 13B parameters.
It would be nice to see if it’s actually fast enough to provide usable performance for this use case.
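As a concrete sketch of that kind of test (the tags and the context value below are illustrative – check the ollama library for the exact names):

```
# ~40GB of q4 weights, i.e. the 70B-class case mentioned above (tag is an example)
ollama run llama3.3:70b --verbose

# 32B with long context: start the model, then raise num_ctx in the REPL
# before sending the prompt (131072 is just a 128k-class value)
ollama run qwen2.5:32b --verbose
#   /set parameter num_ctx 131072
```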
That page now says:
2025-05-30 UPDATE: I am now able to reveal that all my Strix Halo testing has been done on pre-release Framework Desktop systems. Per the published specs page, it is able to boost to 140W and sustain at 120W. I won’t be going deep into any hardware/system benchmarks (will leave it for others) but in my llama-bench runs it does not appear to thermal throttle.
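(For anyone who hasn’t used it, llama-bench ships with llama.cpp; a rough sketch of that kind of sustained run – the model path and values are placeholders, not his actual setup:)

```
# llama-bench reports prompt processing (pp) and token generation (tg) rates in tokens/s;
# -ngl 99 offloads all layers to the GPU, the model path is a placeholder
./llama-bench -m models/llama-3.3-70b-instruct-q4_k_m.gguf -ngl 99 -p 512 -n 128
# a run like this keeps the chip loaded long enough that throttling would show up
# as the tg numbers sagging across repetitions
```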
Someone finally tested and put out results with large models for this SiP, although with a different manufacturer’s product:
He appears to be using LM Studio with the Vulkan backend and getting about 5 tps with Llama 70B q4 (a 40GB model). A little better than I expected, actually, but below the threshold I personally would consider usable (for me that’s around 8 tps; obviously ‘usable’ is subjective).
32B looks fine-ish though.