What AI/ML Use Cases Should We Demo?

AFAIK it’s 96GB in Windows, 110GB in Linux.

2 Likes

With the correct API there is no limit on Linux (well, only the size of the RAM: 128GB :wink: minus the RAM needed for other active programs… )
This is the case for other AMD APUs too (when the driver doesn't crash :wink: )

4 or 8GB may be enough for the OS, which leaves 120/124GB for the LLM tensors.

For the AI/ML use case, maybe report results for https://www.localscore.ai/ with different models.

  • Three configs are interesting: CPU, sgemm GPU, and a full HIP rebuild.
  • Q6_K is fast with good quality (Q4_K_M is good for benchmarks but hallucinates too much for me…)
  • bartowski (Bartowski) has many pre-quantized models; maybe a selection of different model sizes, from Llama 3B to Mistral Large 123B…

There is bf16 CPU perf with llama.cpp too that may be interesting to have a look at.
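To make those configs concrete, here is a minimal sketch of how one might script llama-bench across a couple of quants; the GGUF filenames are placeholders and llama-bench is assumed to already be built and on PATH.

```python
# Hedged sketch: sweep llama-bench (from llama.cpp) over a few models and two
# offload settings. Filenames and layer counts are illustrative only.
import subprocess

models = ["Llama-3B-Q6_K.gguf", "Llama-3B-bf16.gguf"]   # assumption: your local files
configs = {
    "cpu": ["-ngl", "0"],    # no offload: pure CPU (covers the bf16 CPU case)
    "gpu": ["-ngl", "99"],   # offload everything (HIP/Vulkan build)
}

for model in models:
    for name, extra in configs.items():
        print(f"== {model} / {name} ==")
        subprocess.run(
            ["llama-bench", "-m", model, "-p", "512", "-n", "128", *extra],
            check=True,
        )
```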

If needed I can give some more specific commands to run for the different cases.

I don't know if I can finish the FP8 backend of llama.cpp, but if you have time: I am working on a special iGPU backend for llama.cpp. For now only FP16/BF16 is supported, and it is optimized for the Ryzen 7940HS iGPU, which has 12 CUs… I don't know what the best/correct config is for this 40-CU chip… (GitHub - Djip007/llama.cpp at feature/igpu). It may also need the rocm-6.4 from Fedora 43 (beta)…

And for the “cluster”, some benchmarks with a big MoE like Llama 4 Maverick (if possible…) or the smaller Mixtral 8x22B (in bf16 quant?).

Yes, I would like to know that as well. I really need some hard performance figures before spending that amount of money.

A small llama.cpp bench (it's a start; we can get better… :crossed_fingers: )

You’ve probably seen this, but here’s an AMD benchmark (on the ROG Flow Z13 GZ302) trying to highlight the benefit of 128GB over 48GB of memory, where

During testing, the M4 Pro 48GB was observed to rely on swap memory, which significantly slowed its performance.
AMD Ryzen AI Max+395: A Leap Forward in Generative AI Performance with Consumer PC

Stable Diffusion 3.5 Large has 8.1 billion parameters at 2 bytes each for fp16, so it seems like it should fit into 48GB or even 24GB. So maybe the bottleneck in the first test was not memory capacity. The swapping probably occurred only in the second test, which ran Stable Diffusion + Phi-4 concurrently, though the post didn't explain the benefit of doing so.
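As a sanity check on that claim, the arithmetic is simple (the parameter count and byte width here are just the figures quoted above):

```python
# Rough weight footprint of Stable Diffusion 3.5 Large at fp16.
params = 8.1e9            # parameters, as quoted above
bytes_per_param = 2       # fp16
print(f"{params * bytes_per_param / 2**30:.1f} GiB")   # ~15.1 GiB
```

About 15 GiB of weights, which indeed fits comfortably in 24GB, let alone 48GB.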

Strangely, DeepSeek-R1 70B is mentioned in the configuration and footnote but not in the text or charts. (Speculation: maybe AMD tried it and didn't find a benefit in that case. Further speculation: that might be related to DeepSeek's claim that the 671B version activates only 37B of its 671B parameters for any one token. So for AMD's test prompt, maybe it didn't need to swap in the entire DeepSeek model, just a small fraction? I don't know.)

I'm all for AMD making great headway in the AI world; it's needed. But I'm not a fan of cherry-picking tests to skew results. AI will suffer horribly when there isn't enough RAM. This is more a marketing showcase than a rigorous, unbiased benchmark.

In addition to performance benchmarks, how about some compatibility demonstrations?

In particular, provide sample code that will load a model using PyTorch and Transformers/Accelerate - you know, the standard HuggingFace setup. Ensure that the code runs on Windows and Linux.
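Something along these lines, say. This is only a sketch: the model ID is an arbitrary example, and it assumes torch, transformers, and accelerate are installed on top of a ROCm-enabled PyTorch (or the Windows equivalent).

```python
# Minimal "standard HuggingFace setup": load a causal LM and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"   # example model; swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 2 bytes/param, a good fit for this much RAM
    device_map="auto",            # Accelerate places layers on GPU/CPU for you
)

inputs = tokenizer("Why buy a 128GB desktop for AI?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If that runs unmodified on both OSes and actually uses the GPU, it answers most of the compatibility question.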

That would show that the machine - and, in particular, the GPU - is a solid consumer choice for running AIs. The reason this is important is that, so far, Nvidia cards are the only reliable such choice. AMD has a few cards which can be made to work, but my experiences with the one in the Framework 16 indicate that these require a degree of fiddling that a non-specialist is unlikely to put up with.

2 Likes

I'm an audiovisual creator, and have been training a lot of image models, and lately also audio models with Stable Audio Open. First of all I'd like to see proof that basic things work: training/inference on Stable Audio Open, a full checkpoint training of Flux Dev (not possible even on a 40GB A100), and just in general the ability to run ComfyUI workflows with some common custom nodes. Obviously this depends on ROCm etc., but in my case, bracketing for good training values gets expensive, so it would be great to be able to try out values locally and only then commit to a full run in the cloud.

1 Like

I want to see out-of-the-box LM Studio use cases. The way to get ahead of the inevitable reviews and hearsay is to provide benchmarks of larger but still GPU-viable models (32GB and lower) with some GPU comparisons; then models larger than 32GB with a GPU offloading some of the layers; and of course the same models on just the Framework Desktop.

If the facts aren't out there before units ship, these are some of the tests I would run to decide whether I'm better off returning the desktop and spending on GPUs instead.
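For the "larger than 32GB with partial offload" case, here is a hedged llama-cpp-python sketch; the model path and layer count are placeholders (LM Studio does the same thing through its GPU-offload slider):

```python
# Partial offload: put some layers on the GPU, keep the rest in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-72B-Q4_K_M.gguf",   # assumption: a >32GB local GGUF
    n_gpu_layers=40,                        # tune to the VRAM you actually have
    n_ctx=4096,
)

result = llm("Summarize mixture-of-experts in one paragraph.", max_tokens=128)
print(result["choices"][0]["text"])
```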

1 Like

We just posted a blog around this, let us know what you think! Using a Framework Desktop for local AI

5 Likes

Daisy-chain multiple desktops for large models (200B+).

About that: NVIDIA's AI SoC uses specialized networking to accomplish this (either a ConnectX-7 or -8). Is the plan for multiple Framework Desktops to be clustered just over Ethernet, or pooled with some kind of PCIe mesh/fabric? I don't think 5GBASE-T is gonna cut it for models and environments that need a coherent pool of VRAM.
Three groups of 4 PCIe lanes exist that could be used, either through the NVMe slots or the unused x4 slot. I think those top out around 60Gbit. Or maybe through 40Gbit USB4?

I’m curious what AMD says about this use case.

1 Like

Point-to-point USB4 networking is probably the easiest; all you need is USB4 cables, and it's somewhat plug-and-play. You aren't getting 40Gbit, but definitely a lot more than 5.
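For what it's worth, llama.cpp already has an RPC backend that works over plain TCP, so it should run over Ethernet or a USB4 point-to-point link alike. A minimal sketch, assuming llama.cpp was built with GGML_RPC enabled; the addresses and model file are placeholders:

```python
# Head node: run inference with layers split across remote rpc-server workers.
# On each worker machine, start: rpc-server --host 0.0.0.0 --port 50052
import subprocess

workers = "192.168.100.2:50052,192.168.100.3:50052"   # assumption: your peers

subprocess.run(
    [
        "llama-cli",
        "-m", "some-big-moe-Q4_K_M.gguf",   # placeholder model file
        "--rpc", workers,                    # distribute layers across workers
        "-ngl", "99",
        "-p", "Hello from the cluster",
    ],
    check=True,
)
```

Whether the link bandwidth becomes the bottleneck is exactly what a benchmark like this would show.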

I'd like to see the benchmarks as well, especially if they could use Nix to install and run them. That would potentially allow you to run them on any Linux distro with Nix installed, and possibly on macOS. Maybe even under WSL on Windows, though I'm not sure how that would affect performance.

Under WSL it will be hard to get real access to the GPU, if it's possible at all.