Framework Desktop Deep Dive: Ryzen AI Max

We dedicated a lot of our launch presentation of Framework Desktop to the Ryzen AI Max processor it uses, and for good reason. These truly unique, ultra-high-performance parts are the culmination of decades of technology and architecture investments by AMD, going all the way back to its acquisition of ATI in 2006. For our first technical deep dive on Framework Desktop, we’re going even deeper into Ryzen AI Max and what makes it a killer processor for gaming, workstation, and AI workloads.

What makes Ryzen AI Max special is the combination of three elements: full desktop-class Zen 5 CPU cores, a massive 40-CU Radeon RDNA 3.5 GPU, and a giant 256-bit LPDDR5x memory bus to feed the two, supporting up to 128GB of memory. Chips and Cheese did an excellent technical overview of the processor with AMD that goes even deeper on this, and we’ll pull out some of the highlights along with our own insights.

We’ll start with the CPUs. Ryzen AI Max supports up to 16 CPU cores split across two 4nm FinFET dies that AMD calls CCDs. These dies are connected using an extremely wide, low-power, low-latency bus across the package substrate. The CPUs are full Zen 5 cores with 512-bit FPUs and support for AVX-512, a vector processing instruction set otherwise available only on Intel’s top-end server CPUs. We’re excited for you to see the multi-core performance numbers these CPUs can deliver in our upcoming press review cycle!
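If you want to confirm AVX-512 support on your own Linux machine, the kernel exposes CPU feature flags in `/proc/cpuinfo`. A minimal sketch (the parsing helpers here are our own illustrative names, not a standard API; `avx512f` is the foundation flag the kernel reports):

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def has_avx512(flags):
    # avx512f is the foundation subset; real checks may also want
    # avx512vl, avx512bw, etc., depending on the workload.
    return "avx512f" in flags

# On a Linux system you could feed it the real file:
# with open("/proc/cpuinfo") as f:
#     print(has_avx512(parse_flags(f.read())))
```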

The GPU in Ryzen AI Max is discrete-class, with 40 RDNA 3.5 Compute Units in the Radeon 8060S configuration. For reference, the discrete Radeon 7700S GPU in Framework Laptop 16 has 32 RDNA 3 CUs. The GPU sits on a separate, even larger 4nm FinFET die from the CPU CCDs. This die also carries the large NPU, video encode/decode blocks, 32MB of additional MALL cache, and the memory and peripheral interfaces. The GPU handles essentially all current PC titles well at 1080p with high graphics settings, and most at 1440p as well.

To feed a GPU of this class, the processor needs a ton of memory bandwidth. Mobile and desktop processors like the Ryzen AI 300 Series used in Framework Laptop 13 top out at 128-bit memory buses, and Ryzen AI Max doubles that to 256-bit at 8000 MT/s, enabling a massive 256GB/s of bandwidth. That is similar to the throughput that the discrete 7700S GPU achieves. With eight 32-bit memory packages, the processor can support a colossal 128GB of LPDDR5x. On Windows, up to 96GB can be dedicated to the GPU, and we’ve seen even higher numbers on Linux, making Ryzen AI Max excellent for AI workloads. We’ll have a dedicated deep dive on the AI use case soon.
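The bandwidth figure above falls straight out of the bus width and transfer rate: bytes per transfer times transfers per second. A quick sketch of the arithmetic (function name is our own):

```python
def peak_bandwidth_gb_s(bus_width_bits, rate_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers/second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * rate_mt_s / 1000  # MB/s -> GB/s

print(peak_bandwidth_gb_s(256, 8000))  # Ryzen AI Max: 256.0 GB/s
print(peak_bandwidth_gb_s(128, 8000))  # a 128-bit mobile part: 128.0 GB/s
```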

One tradeoff of the memory, though, is that fanning out that giant 256-bit memory bus requires the LPDDR5x to be soldered. When we learned about Ryzen AI Max, our first question to AMD was whether LPCAMM2 could be used to modularize the memory. Instead of immediately saying “No, it’s not possible,” AMD allocated technical architects and engineers to spend days testing different layouts and running simulations. They ultimately concluded that it was in fact not possible without massively downclocking the memory, which would defeat the purpose of having a wide memory bus and large GPU. We accepted the tradeoff of soldered memory, and unlike some electronics brands, we aren’t using it as an excuse to charge obscene sums for higher memory capacities.

What makes Ryzen AI Max especially interesting in the Framework Desktop is that we were able to unlock every bit of its power. Because we use a desktop-style 6-heatpipe heatsink from Cooler Master and a 120mm fan, we can run it at its maximum sustained power of 120W along with 140W boost, while keeping the system quiet. We were also able to break out 2x USB4, 2x DisplayPort, HDMI, and all three PCIe x4 interfaces: two for M.2 SSDs and one as an x4 PCIe slot. All of this makes it great in the tiny Framework Desktop form factor, but also makes it excellent to drop the Mainboard into any standard Mini-ITX case. This is, after all, a standard PC! It’s just one that uses a one-of-a-kind, monstrous processor from AMD. Pre-orders for Framework Desktop are open now, with new orders shipping in Q3.


I’m excited to see how this performs with models like the new Llama 3.3 and QwQ. I’m growing more and more interested in local AI the more I play with it.

Why don’t they (AMD) call it L4/5 cache [faster], and give us upgradeable memory [relatively slower]?

That’s how ‘memory’ has been dealt with all these years…faster-closer-smaller size, slower-farther-larger size.

(L3 cache was introduced in 2003)

The price would be more than most people could pay…

Consider how expensive Intel Optane was, and consider what cache sizes we’ve seen with AMD’s L3 3D V-Cache (96MB for 1 CCD) and Intel’s Broadwell (128MB L4).

I don’t know if it’s even possible to have 128GB on die/chip and maintain the same bandwidth as current L3/L4.

Memory bandwidth. Apple silicon sees the impressive performance it does because of the bandwidth possible on its SoCs. This is not a conspiracy to get rid of socketable RAM. It simply is the nature of the beast for attaining this kind of performance on AI workloads.


I probably didn’t describe what I was thinking very well:
Keep L1-L3 cache on-die.
Keep the current on-chip/on-SoC (not on-die) memory, give it faster bandwidth like Apple silicon, and call that L4 cache (or whatever marketing label). This is the 32GB–512GB non-upgradable memory.
Have the cache / memory hierarchy also reach out to off-chip / on-board / DIMM memory. This will be the current DDR5 sticks.

I’m butchering the needed complexity here, but it’s to introduce an additional layer in the memory hierarchy.

Yes, cost is definitely one of the factors to be considered.

Like, over the years, they’ve been slipping in additional layers between L1 cache, on-board memory, and disk storage to get a smooth transition from one layer to another, to close the performance gap / step size between layers.
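To make the "step size between layers" idea concrete, here’s a rough sketch of that hierarchy with on-package memory slotted in as the extra "L4" layer. The latency figures are order-of-magnitude illustrations only, not measurements of any specific part:

```python
# Illustrative (assumed, not measured) latencies for each layer of the
# proposed hierarchy, in nanoseconds.
latency_ns = [
    ("L1 cache (on-die)",         1),
    ("L3 cache (on-die)",         10),
    ("on-package LPDDR5x ('L4')", 100),
    ("off-package DDR5 DIMMs",    120),
    ("NVMe SSD",                  100_000),
]

# Step size between adjacent layers -- the gaps an extra layer
# is meant to smooth out.
for (a, la), (b, lb) in zip(latency_ns, latency_ns[1:]):
    print(f"{a} -> {b}: ~{lb / la:.1f}x slower")
```

Note how the big remaining cliff is between memory and storage; the on-package layer mostly buys capacity at high bandwidth rather than a latency win over DIMMs.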