What AI/ML Use Cases Should We Demo?

Hi everyone,

We’re planning our next round of AI/ML tests and demos, and we’d love your input!

What AI use cases would you like to see us run on Framework Desktop?

Drop your suggestions below, and we’ll pick a few to explore!

3 Likes

Benchmarks on LLM models of different sizes would be great. Right now, all we have for how this will perform in the real world is speculation extrapolated from existing Strix devices and other comparable hardware.

3 Likes

A locally run multimodal model in full control of a smart home (strong preference for Home Assistant integration): high-speed conversational voice commands, searches, music control, out-of-the-ordinary notifications, all while simultaneously monitoring and reporting on activity from multiple security cams.

At least that’s what I’m aiming for at some point anyway.

4 Likes

See how much of Msty you can utilize - loading larger models that can use its RAG function against Obsidian vaults with thousands of entries. Its ability to use ROCm, and hopefully the Framework Desktop’s AI capabilities, looks like a match made in heaven on paper.

1 Like

Really want to see prompt processing speed. Also, specifically, what 128 GiB opens up that can’t reasonably be accomplished on a 24 GiB GPU. Maybe also mix in tests using speculative decoding, where you use a small model (for speed) to front-run a larger model (for accuracy).

Also, what happens when a dGPU is combined with the Framework Desktop motherboard (i.e., using an x4-to-x16 adapter/riser card) with something like a Radeon RX 7900 XTX?
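On the speculative-decoding idea, here is a rough sketch of how that comparison could be run using Hugging Face Transformers’ assisted generation. The model names are placeholders (the draft model has to share a tokenizer with the target model), and whether the 70B target even fits is exactly what the 128 GiB question would answer:

```python
# Rough sketch: compare plain decoding vs. speculative (assisted) decoding.
# Model IDs are placeholders; the draft model must use the same tokenizer
# as the target model for assisted generation to work.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-3.1-70B-Instruct"  # large model (accuracy)
draft_id = "meta-llama/Llama-3.2-1B-Instruct"    # small model (speed)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in two sentences.",
                   return_tensors="pt").to(target.device)

for label, extra in [("baseline", {}), ("speculative", {"assistant_model": draft})]:
    start = time.perf_counter()
    out = target.generate(**inputs, max_new_tokens=256, do_sample=False, **extra)
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{label}: {new_tokens / elapsed:.1f} tokens/s")
```

The speedup depends heavily on how often the draft model’s tokens are accepted, so it would be worth reporting both numbers for each model pair tested.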

6 Likes

Hi,
I’m personally interested in LLM benchmarks on Linux systems.
But AMD currently seems to have better software support on Windows via ONNX and the Lemonade SDK, so it would be nice to see a comparison. On Windows, with the correct setup, AMD can utilize the NPU, CPU, and GPU all together in the best way.
I think only with that setup can we see the full potential.
It would make sense to test models ranging from 7B and 14B up through 32B and 70B.

Maybe testing Microsoft’s BitNet would be interesting too. Not sure about its quality, but it might have decent t/s: GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs

2 Likes

Generation speed and prompt processing speed in tokens/s for popular models like Gemma 3 / Llama 4, on Linux with Ollama. Test a model that requires ~100 GB of RAM (I’m mostly interested in stability on Linux; my 7940 dies from time to time when loading larger models). A comparison to the 7940 in the same setup would let me extrapolate from the performance of my current setup.
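Both numbers are straightforward to pull out of Ollama’s API response, so a small harness along these lines (a sketch, assuming a local Ollama server and a placeholder model tag) could sweep several models in one run:

```python
# Rough sketch: measure prompt-processing and generation speed via Ollama's
# HTTP API. Assumes a local Ollama server with the named model already pulled.
import requests

MODEL = "gemma3:27b"  # placeholder model tag

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL,
        "prompt": "Summarize the plot of Hamlet in one paragraph.",
        "stream": False,
    },
    timeout=600,
).json()

# Ollama reports durations in nanoseconds.
pp = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
gen = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{MODEL} prompt processing: {pp:.1f} tokens/s")
print(f"{MODEL} generation:        {gen:.1f} tokens/s")
```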

5 Likes

Beyond the obvious, as mentioned: performance testing current quantized models requiring more than 32 GB of VRAM (and even more than the 16 and 24 GB of most GPUs).

Additionally, run the same models on two or more motherboards to see how performance scales. I suspect there may be some disappointment in the performance of these largish models on a single board, so it would be interesting to see whether multiple boards help. I doubt it would be worth trying to load something like the full DeepSeek across multiple systems, but measuring how performance scales across motherboards could still be worthwhile.

2 Likes

Whatever you do, if you use a quant, please let us know what quant you’re using.

All we have right now is that we can run “a 70B” at “conversational speed”.

For those of us who preordered because we want to actually build things with this, the details matter more than the price.

3 Likes

This! This is the type of question I really want answered - what can this desktop do that GPUs of similar prices can’t?

Yes, this is essential. Any benchmark that doesn’t disclose quant size is useless at best, misleading at worst.

1 Like

Fine-tuning would also be a great test. Fine-tune a 70B model using PyTorch 2 + ROCm on the integrated RDNA 3.5 GPU, together with Hugging Face Transformers + PEFT (LoRA/QLoRA); that should establish a great baseline for what this system is capable of.
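If a concrete starting point helps, a minimal QLoRA sketch along these lines could serve as that baseline. The model ID, the tiny demo dataset, and 4-bit quantization via bitsandbytes are all my assumptions; bitsandbytes on ROCm may need a ROCm-enabled build, so treat this as a sketch rather than a verified recipe:

```python
# Rough QLoRA sketch with Transformers + PEFT. Model, dataset, and 4-bit
# bitsandbytes quantization are placeholder assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder 70B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_quant_type="nf4",
                                           bnb_4bit_compute_dtype=torch.bfloat16),
    device_map="auto")

# Attach low-rank adapters to the attention projections only.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

data = load_dataset("Abirate/english_quotes", split="train[:1000]")  # tiny demo set
data = data.map(lambda x: tokenizer(x["quote"], truncation=True, max_length=256),
                remove_columns=data.column_names)

Trainer(model=model,
        args=TrainingArguments(output_dir="qlora-out",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1, bf16=True, logging_steps=10),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```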

5 Likes

So excited for these to ship! I think these are going to be very fun to mess around with.

Clustering has been talked about with these systems, and it would be nice to see what methods the Framework team is using for interconnect and software, and whether it really is practical to run a large model across multiple systems for a usable LLM experience (some tokens-per-second response metrics and such). And if not for LLMs, some demos of other cluster-based AI/ML applications.

And more of a secondary subject to a main demo: I’d love any demo to mention some more specifics about what the Framework team is experiencing or planning regarding AI/ML-specific driver/software support. For example, while ROCm support for the 395 is listed in the AMD docs on the Windows compatibility matrix at the time I write this, it’s not referenced at all on the Linux compatibility matrix. Obviously, responsibility for that support falls on AMD’s shoulders (and the ROCm docs are notoriously jumbled), but it would be good to know what software and drivers the Framework team has been engineering around, so users can plan what will be possible with ML on this hardware.
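As a small aside, if any demo does land on a ROCm build of PyTorch, even a tiny sanity check like the one below would be worth showing, since it immediately reveals whether the stack sees the GPU and how much memory it reports (nothing here is specific to the 395; these are just standard PyTorch calls):

```python
# Quick sanity check that a ROCm build of PyTorch actually sees the GPU.
# On ROCm builds the HIP backend is exposed through the torch.cuda namespace.
import torch

print("PyTorch:", torch.__version__)
print("HIP runtime:", torch.version.hip)        # None on CUDA/CPU-only builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", torch.cuda.get_device_name(0))
    print(f"Memory reported: {props.total_memory / 2**30:.1f} GiB")
```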

Finally, just to touch on what was mentioned earlier about Home Assistant and the smart-home use case: I envision using the Framework Desktop as a home server that can truly be a compact and power-efficient “brain” for the house. In this regard, I could see hurdles around properly leveraging the APU’s hardware in virtualized environments. It would be great to see an example of running fully hardware-accelerated AI/ML workloads in a VM or container on Proxmox alongside other traditional VMs, with shared hardware resources.

3 Likes

I’d also like to see an AutoGPT-style search agent tackle a difficult search problem. Something like “what’s the maximum sustainable human population of the Earth, with citations and references”.

Another idea: automated tagging of the content of a very large (1TB+ ?) personal image and video collection.
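For the tagging idea, a small-scale version is easy to sketch with a CLIP-style model and Transformers’ zero-shot image classification pipeline (the tag list, model, and folder below are placeholders); the interesting part of a demo would be how it holds up against 1 TB+ of media:

```python
# Rough sketch of automated photo tagging via zero-shot image classification.
# The tag list, model choice, and directory are placeholder assumptions.
from pathlib import Path
from transformers import pipeline

TAGS = ["family", "pets", "landscape", "food", "documents", "vehicles", "birthday"]

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

for image_path in Path("~/Pictures").expanduser().rglob("*.jpg"):
    scores = classifier(str(image_path), candidate_labels=TAGS)
    tags = [s["label"] for s in scores if s["score"] > 0.3]
    print(f"{image_path}: {', '.join(tags) or 'untagged'}")
```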

1 Like

Beyond the standard performance of the current top models on a single system, I would love to see a comparison when load balancing across 1/2/3/4 systems using USB4,
e.g. this is the prompts/s using a single machine, this is it with 2 machines, 3, etc., using a large model.
Testing an extra-large model balanced across more than one device would be amazing too, for example the Qwen3 235B model across 2/3 machines, maybe using something like Xos? NetworkChuck on YouTube has a video of himself doing this on 5 Mac Studios with Thunderbolt networking, which was interesting. Can the same be done with these AMD devices, and how do they perform?

There was also some talk in the previous video about working with AMD on the AI cluster. What work was done together? Are any significant improvements coming soon (such as to ROCm, or to load balancing over networks) that we can look forward to?

For true transparency and comparability, I’d love to see a selection of training and inference benchmarks from, e.g., here:

As important as the benchmark results themselves, I’d consider the log of what it took to make them run…

For me, training/fine-tuning would be more important than inference, as I believe you don’t need to run a 70B+ model these days to get decent consumer-grade inference results. In other words: my use case for the Framework Desktop would lean towards AI/ML development. To that end, Linux would be more important than Windows 11, as I’ve had more trouble using Python AI/ML libraries under Windows than under Linux, to the point that Fedora eventually became my OS of choice for AI/ML-related tasks.

I just want a proper CPU benchmark (such as PassMark CPU Mark) for the 395 at full wattage (without GPU load), and to know whether that performance can be sustained.

1 Like

When are you planning to make this information and the demos available, @Destroya?
Thanks

A clean install of Proxmox on a Max+ 395 with 128 GB, running Open WebUI with GPU passthrough and 10 concurrent users.
I have no idea how to simulate a “normal ChatGPT” usage pattern (see the sketch below for one rough approach).

Setting: an SME wanting to run this on premises.
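There’s no standard definition of a “normal” usage pattern, but a rough load-simulation sketch like the one below at least gives repeatable numbers for 10 concurrent users. It assumes an OpenAI-compatible endpoint (such as the one Ollama exposes behind Open WebUI), a placeholder model tag, and a guessed think-time between messages:

```python
# Rough sketch: simulate N concurrent chat users against an OpenAI-compatible
# endpoint. The URL, model tag, prompts, and think-time cadence are assumptions,
# not a validated usage model.
import random
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
MODEL = "llama3.1:8b"  # placeholder model tag
PROMPTS = ["Draft a short email to a supplier about a late delivery.",
           "Summarize the pros and cons of hybrid work for a small company.",
           "Explain VAT registration thresholds in one paragraph."]

def one_user(user_id: int, turns: int = 5) -> float:
    """Send a few chat turns with pauses in between; return mean response latency."""
    latencies = []
    for _ in range(turns):
        time.sleep(random.uniform(2, 10))  # crude "user is reading/typing" pause
        start = time.perf_counter()
        client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": random.choice(PROMPTS)}],
            max_tokens=300)
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

with ThreadPoolExecutor(max_workers=10) as pool:
    per_user = list(pool.map(one_user, range(10)))
print("mean response latency per user (s):", [round(x, 1) for x in per_user])
```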

Quick note: the max VRAM assignable to the GPU is 96 GB.