I’m considering getting the 395 Max with 128 GB but am on the fence at the moment regarding local LLM performance. From what I’ve seen, the memory bandwidth is 256 GB/s, which some have suggested caps generation at roughly 6 t/s. Is it possible to see any real-world usage of a 32B or 70B model running and the kind of performance we could expect?
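The back-of-the-envelope ceiling comes from decode being memory-bandwidth-bound: every generated token has to stream (roughly) all of the model weights from memory. A minimal sketch of that estimate, assuming ~4-bit quantization (≈0.5 bytes per parameter) and ignoring KV-cache traffic and any compute limits:

```python
def max_tokens_per_sec(params_billions: float,
                       bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Theoretical upper bound on decode speed for a bandwidth-bound LLM.

    Each token requires reading all weights once, so the ceiling is
    bandwidth divided by the model's weight footprint.
    """
    weight_gb = params_billions * bytes_per_param  # model size in GB
    return bandwidth_gb_s / weight_gb

# 70B model at ~4-bit (0.5 bytes/param) on a 256 GB/s bus:
print(round(max_tokens_per_sec(70, 0.5, 256), 1))  # -> 7.3 t/s ceiling

# 32B model under the same assumptions:
print(round(max_tokens_per_sec(32, 0.5, 256), 1))  # -> 16.0 t/s ceiling
```

Real-world numbers land below these ceilings once KV-cache reads, prompt processing, and driver overhead are factored in, which is roughly where the ~6 t/s figure for 70B-class models comes from.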
It’s going to be a mix: the NPU has non-unified drivers on Windows but might be able to run inference at a higher rate with better configuration, and the ROCm and Vulkan drivers also need work. For an example of what this looks like on another Ryzen AI board with similar power dissipation, see the Level1Techs video about the Minisforum X1 AI mini PC equipped with 96 GB of memory.