What AI/ML Use Cases Should We Demo?

Hi everyone,

We’re planning our next round of AI/ML tests and demos, and we’d love your input!

What AI use cases would you like to see us run on Framework Desktop?

Drop your suggestions below, and we’ll pick a few to explore!

3 Likes

Benchmarks on LLM models of different sizes would be great. Right now, all we have for how this will perform in the real world is speculation extrapolated from existing Strix devices and other comparable hardware.

3 Likes

A locally run multimodal model in full control of a smart home (strong preference for Home Assistant integration): high-speed conversational voice commands, searches, music control, out-of-the-ordinary notifications, all while simultaneously monitoring and reporting on activity from multiple security cams.

At least that’s what I’m aiming for at some point anyway.

4 Likes

See how much of Msty you can utilize - loading larger models that can use its RAG function against Obsidian vaults with thousands of entries. Its ability to use ROCm, and hopefully the Framework Desktop’s AI capabilities, looks like a match made in heaven on paper.

1 Like

Really want to see prompt processing speed. Also, specifically, what 128 GiB opens up that can’t reasonably be accomplished on a 24 GiB GPU. Maybe also mix in tests using speculative decoding, where you use a small model (for speed) to front-run a larger model (for accuracy).

Also, what happens when a dGPU is combined with the Framework Desktop motherboard (i.e., using an x4-to-x16 adapter/riser card) with something like a Radeon RX 7900 XTX?
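On the speculative-decoding idea, here is a rough sketch of how that comparison could be run using Hugging Face Transformers’ assisted generation. The model names are placeholders (the draft model has to share a tokenizer with the target model), and whether the 70B target even fits is exactly what the 128 GiB question would answer:

```python
# Rough sketch: compare plain decoding vs. speculative (assisted) decoding.
# Model IDs are placeholders; the draft model must use the same tokenizer
# as the target model for assisted generation to work.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-3.1-70B-Instruct"  # large model (accuracy)
draft_id = "meta-llama/Llama-3.2-1B-Instruct"    # small model (speed)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in two sentences.",
                   return_tensors="pt").to(target.device)

for label, extra in [("baseline", {}), ("speculative", {"assistant_model": draft})]:
    start = time.perf_counter()
    out = target.generate(**inputs, max_new_tokens=256, do_sample=False, **extra)
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{label}: {new_tokens / elapsed:.1f} tokens/s")
```

The speedup depends heavily on how often the draft model’s tokens are accepted, so it would be worth reporting both numbers for each model pair tested.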

6 Likes

Hi,
I’m personally interested in LLM benchmarks on Linux systems.
But AMD currently seems to have better software support on Windows via ONNX and the Lemonade SDK, so it would be nice to see a comparison. On Windows, with the correct setup, AMD can utilize the NPU, CPU, and GPU all together in the best way.
I think only with that setup can we see the full potential.
It would make sense to test models ranging from 7B and 14B up through 32B and 70B.

Maybe testing Microsoft’s BitNet would be interesting too. Not sure about its quality, but it might have decent t/s: GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs

2 Likes

Generation speed and prompt processing speed in tokens/s for popular models like Gemma 3 / Llama 4, on Linux with Ollama. Test a model that requires ~100 GB of RAM (I’m mostly interested in stability on Linux; my 7940 dies from time to time when loading larger models). A comparison to the 7940 in the same setup would let me extrapolate from the performance of my current setup.
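Both numbers are straightforward to pull out of Ollama’s API response, so a small harness along these lines (a sketch, assuming a local Ollama server and a placeholder model tag) could sweep several models in one run:

```python
# Rough sketch: measure prompt-processing and generation speed via Ollama's
# HTTP API. Assumes a local Ollama server with the named model already pulled.
import requests

MODEL = "gemma3:27b"  # placeholder model tag

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL,
        "prompt": "Summarize the plot of Hamlet in one paragraph.",
        "stream": False,
    },
    timeout=600,
).json()

# Ollama reports durations in nanoseconds.
pp = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
gen = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{MODEL} prompt processing: {pp:.1f} tokens/s")
print(f"{MODEL} generation:        {gen:.1f} tokens/s")
```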

5 Likes

Beyond the obvious, as mentioned: performance testing current quantized models requiring more than 32 GB of VRAM (and even more than the 16 and 24 GB of most GPUs).

Additionally, run the same models on two or more motherboards to see how performance scales. I suspect there may be some disappointment in the performance of these largish models on a single board, so it would be interesting to see whether multiple boards help. I doubt it would be worth trying to load something like the full DeepSeek across multiple systems, but measuring how performance scales across motherboards could still be worthwhile.

2 Likes

Whatever you do, if you use a quant, please let us know what quant you’re using.

All we have right now is that we can run “a 70B” at “conversational speed”.

For those of us who preordered because we want to actually build things with this, the details matter more than the price.

3 Likes

This! This is the type of question I really want answered - what can this desktop do that GPUs of similar prices can’t?

Yes, this is essential. Any benchmark that doesn’t disclose quant size is useless at best, misleading at worst.

1 Like

Fine-tuning would also be a great test. Fine-tune a 70B model using PyTorch 2 + ROCm on the integrated RDNA 3.5 GPU, together with Hugging Face Transformers + PEFT (LoRA/QLoRA); that should establish a great baseline for what this system is capable of.
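If a concrete starting point helps, a minimal QLoRA sketch along these lines could serve as that baseline. The model ID, the tiny demo dataset, and 4-bit quantization via bitsandbytes are all my assumptions; bitsandbytes on ROCm may need a ROCm-enabled build, so treat this as a sketch rather than a verified recipe:

```python
# Rough QLoRA sketch with Transformers + PEFT. Model, dataset, and 4-bit
# bitsandbytes quantization are placeholder assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder 70B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_quant_type="nf4",
                                           bnb_4bit_compute_dtype=torch.bfloat16),
    device_map="auto")

# Attach low-rank adapters to the attention projections only.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

data = load_dataset("Abirate/english_quotes", split="train[:1000]")  # tiny demo set
data = data.map(lambda x: tokenizer(x["quote"], truncation=True, max_length=256),
                remove_columns=data.column_names)

Trainer(model=model,
        args=TrainingArguments(output_dir="qlora-out",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1, bf16=True, logging_steps=10),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```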

5 Likes

So excited for these to ship! I think these are going to be very fun to mess around with.

Clustering has been talked about with these systems, and it would be nice to see what methods the Framework team is using for interconnect and software, and whether it really is practical to run a large model across multiple systems for a usable LLM experience (some tokens-per-second response metrics and such). And if not for LLMs, some demos of other cluster-based AI/ML applications.

And more of a secondary subject to a main demo: I’d love any demo to mention some more specifics about what the Framework team is experiencing or planning regarding AI/ML-specific driver/software support. For example, while ROCm support for the 395 is listed in the AMD docs on the Windows compatibility matrix at the time I write this, it’s not referenced at all on the Linux compatibility matrix. Obviously, responsibility for that support falls on AMD’s shoulders (and the ROCm docs are notoriously jumbled), but it would be good to know what software and drivers the Framework team has been engineering around, so users can plan what will be possible with ML on this hardware.
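As a small aside, if any demo does land on a ROCm build of PyTorch, even a tiny sanity check like the one below would be worth showing, since it immediately reveals whether the stack sees the GPU and how much memory it reports (nothing here is specific to the 395; these are just standard PyTorch calls):

```python
# Quick sanity check that a ROCm build of PyTorch actually sees the GPU.
# On ROCm builds the HIP backend is exposed through the torch.cuda namespace.
import torch

print("PyTorch:", torch.__version__)
print("HIP runtime:", torch.version.hip)        # None on CUDA/CPU-only builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", torch.cuda.get_device_name(0))
    print(f"Memory reported: {props.total_memory / 2**30:.1f} GiB")
```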

Finally, just to touch on what was mentioned earlier about Home Assistant and the smart-home use case: I envision using the Framework Desktop as a home server that can truly be a compact and power-efficient “brain” for the house. In this regard, I could see hurdles around properly leveraging the APU’s hardware in virtualized environments. It would be great to see an example of running fully hardware-accelerated AI/ML workloads in a VM or container on Proxmox alongside other traditional VMs, with shared hardware resources.

3 Likes

I’d also like to see an AutoGPT-style search agent tackle a difficult search problem. Something like “what’s the maximum sustainable human population of the Earth, with citations and references”.

Another idea: automated tagging of the content of a very large (1TB+ ?) personal image and video collection.
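For the tagging idea, a small-scale version is easy to sketch with a CLIP-style model and Transformers’ zero-shot image classification pipeline (the tag list, model, and folder below are placeholders); the interesting part of a demo would be how it holds up against 1 TB+ of media:

```python
# Rough sketch of automated photo tagging via zero-shot image classification.
# The tag list, model choice, and directory are placeholder assumptions.
from pathlib import Path
from transformers import pipeline

TAGS = ["family", "pets", "landscape", "food", "documents", "vehicles", "birthday"]

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

for image_path in Path("~/Pictures").expanduser().rglob("*.jpg"):
    scores = classifier(str(image_path), candidate_labels=TAGS)
    tags = [s["label"] for s in scores if s["score"] > 0.3]
    print(f"{image_path}: {', '.join(tags) or 'untagged'}")
```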

1 Like

Beyond the standard performance of the current top models on a single system, I would love to see a comparison when load balancing across 1/2/3/4 systems using USB4,
e.g. this is the prompts/s using a single machine, this is it with 2 machines, 3, etc., using a large model.
Testing an extra-large model balanced across more than one device would be amazing too, for example the Qwen3 235B model across 2/3 machines, maybe using something like Xos? NetworkChuck on YouTube has a video of himself doing this on 5 Mac Studios with Thunderbolt networking, which was interesting. Can the same be done with these AMD devices, and how do they perform?

There was also some talk in the previous video about working with AMD on the AI cluster. What work was done together? Are any significant improvements coming soon (such as to ROCm, or to load balancing over networks) that we can look forward to?

For true transparency and comparability, I’d love to see a selection of training and inference benchmarks from, e.g., here:

As important as the benchmark results themselves, I’d consider the log of what it took to make them run…

For me, training/fine-tuning would be more important than inference, as I believe you don’t need to run a 70B+ model these days to get decent consumer-grade inference results. In other words: my use case for the Framework Desktop would lean towards AI/ML development. To that end, Linux would be more important than Windows 11, as I’ve had more trouble using Python AI/ML libraries under Windows than under Linux, to the point that Fedora eventually became my OS of choice for AI/ML-related tasks.

I just want a proper CPU benchmark (such as PassMark CPU Mark) for the 395 at full wattage (without GPU load), and to know whether that performance can be sustained.

1 Like

When are you planning to make this information and the demos available, @Destroya?
Thanks

A clean install of Proxmox on a Max+ 395 with 128 GB, running Open WebUI with GPU passthrough and 10 concurrent users.
I have no idea how to simulate a “normal ChatGPT” usage pattern (see the sketch below for one rough approach).

Setting: an SME wanting to run this on premises.
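There’s no standard definition of a “normal” usage pattern, but a rough load-simulation sketch like the one below at least gives repeatable numbers for 10 concurrent users. It assumes an OpenAI-compatible endpoint (such as the one Ollama exposes behind Open WebUI), a placeholder model tag, and a guessed think-time between messages:

```python
# Rough sketch: simulate N concurrent chat users against an OpenAI-compatible
# endpoint. The URL, model tag, prompts, and think-time cadence are assumptions,
# not a validated usage model.
import random
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
MODEL = "llama3.1:8b"  # placeholder model tag
PROMPTS = ["Draft a short email to a supplier about a late delivery.",
           "Summarize the pros and cons of hybrid work for a small company.",
           "Explain VAT registration thresholds in one paragraph."]

def one_user(user_id: int, turns: int = 5) -> float:
    """Send a few chat turns with pauses in between; return mean response latency."""
    latencies = []
    for _ in range(turns):
        time.sleep(random.uniform(2, 10))  # crude "user is reading/typing" pause
        start = time.perf_counter()
        client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": random.choice(PROMPTS)}],
            max_tokens=300)
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

with ThreadPoolExecutor(max_workers=10) as pool:
    per_user = list(pool.map(one_user, range(10)))
print("mean response latency per user (s):", [round(x, 1) for x in per_user])
```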

Quick note: the max VRAM assignable to the GPU is 96 GB.