Confused: Flux 2 Dev - 130GB Overload

I’m very new to Framework, and I’m also new to APUs/iGPUs, so I’d appreciate some straightforward advice.

I’ve got a Framework Desktop Max+ 395 with 128GB RAM on the way.

Specs:

  • 16 cores / 32 threads

  • 3.0GHz base clock

  • Up to 5.1GHz boost

  • 64MB L3 cache

  • Radeon 8060S integrated graphics

  • 128GB LPDDR5x-8000

  • Wi-Fi 7 and 5Gbit Ethernet

I already own a separate machine with a 9950X3D, 2 x RTX 3090 Founders Editions, and 96GB RAM. It’s a very strong machine, but the obvious limitation is VRAM, with 24GB per card.

One of the reasons I was torn between buying something like a 48GB workstation GPU and going for the Framework Desktop instead was because I use ComfyUI a lot, and more and more models are now appearing that simply do not fit into 24GB VRAM.

What’s confusing me is this: when I try to run something like Flux 2 Dev in ComfyUI, it seems to use huge amounts of memory — around 130GB — and then stops. I was under the impression this machine would be able to handle larger models because of the shared memory setup, so I’m struggling to understand why it’s eating all available memory and still not completing.

I’m not especially technical, so I’d really appreciate replies in plain English rather than anything too deep into Python, coding, or command-line fixes.

Has anyone got any advice on what I should realistically expect from this machine, and whether I’m misunderstanding how the memory works?

At the moment I’m feeling rather deflated.

Flux 2 Dev is a 64GB safetensors file, so the weights themselves should fit in memory. I suspect the KV cache is what’s pushing your machine over 130GB. Compressing it, or limiting it to an 8-bit quant, may help.
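As a rough back-of-the-envelope sketch (the parameter count and overhead figures below are illustrative assumptions, not measured Flux 2 Dev numbers), this is why a 64GB checkpoint can still blow past 128GB at 16-bit precision, while an 8-bit quant leaves headroom:

```python
# Illustrative memory arithmetic -- not measured Flux 2 Dev figures.
def weight_footprint_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: 1B params at 8 bits is roughly 1 GB."""
    return n_params_billion * bits_per_weight / 8

# A 64 GB bf16 (16-bit) checkpoint implies roughly 32B parameters.
params_b = 32
bf16_gb = weight_footprint_gb(params_b, 16)  # ~64 GB of weights
fp8_gb = weight_footprint_gb(params_b, 8)    # ~32 GB of weights

print(f"bf16 weights: ~{bf16_gb:.0f} GB, fp8 weights: ~{fp8_gb:.0f} GB")
# On top of the weights come activations, caches, the text encoder, VAE,
# and the OS itself -- easily tens of GB more. At 16-bit that overhead can
# push the total past 128 GB of unified memory; quantizing the weights to
# 8-bit roughly halves their footprint and brings the total back in range.
```

The exact overhead depends on resolution, batch size, and how ComfyUI loads the components, but the general shape of the problem is the same: shared memory gives you a big pool, not an unlimited one.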

I assume your machine is running Windows, which takes up additional memory as well?

In any case, have you considered this path?


Hi Jason,

Thanks for your reply — that makes sense around the runtime overhead and how memory usage can go well beyond the raw model size. I’ll have a closer look at how things are being loaded and whether limiting or compressing the cache has any impact.

Just to clarify my setup — I’m running Linux across both machines (Kubuntu on my main rig and Ubuntu on the Framework), so there’s no Windows overhead in play here.

I’ve actually got two separate machines by design. The main system started as a dual 3090 setup, and I’ve now added a PNY NVIDIA RTX Pro 5000 Blackwell (48GB), so I’ve got a bit more headroom on the CUDA side.

Alongside that, I’ve picked up the Framework/Strix Halo machine to explore unified memory. That’s less about replacing CUDA and more about seeing how far I can push larger models that don’t comfortably fit within even 48GB VRAM.

So effectively I’m running a stable CUDA setup alongside a higher-ceiling experimental system.

The toolbox approach you shared looks interesting — it seems like a more structured way of handling the ROCm and memory side of things, so I’ll take a proper look rather than trying to brute-force it in a standard setup.

Appreciate you pointing me in that direction.

Best,
Dunc