Could we have the 16-inch laptop with a Ryzen AI Max 385/395 and 128 GB of RAM?
That would be awesome.
It would be especially nice to have in 2027: an AMD AI Max with 128 GB of DDR6 CAMM2 RAM… it's on my wish list for upgrading from my 12th-gen 13”. That would be enough to run local LLMs, gaming, and anything else I could possibly throw at it.
At €1900 this would be a good buy for me.
Better than the current option here, with 96 GB of RAM:
PS: Crazy brainstorming: even being able to put two motherboards inside a 16” laptop for advanced AI tasks. It's crazy because it already failed back in the Acorn era, where you could have several CPU boards in one PC. This would require advanced hardware and software integration… but for AI it might be of interest: Risc PC "Duet" second ARM processor card - stardot.org.uk
Not really, that's 100 TOPS total, with massive hardware and software complexity (LLM clusters are still in their very early days and often end up with 100% overhead, i.e. with two nodes you get the performance of one node :P)
A 5090 is, meanwhile, 1800 TOPS - EIGHTEEN times more, with zero complications.
It depends on how much you're prioritizing speed vs. raw model size. Dual AI Max systems would allow you to run any model that fits within 256 GB of VRAM, giving a lot more capability to run large models than the 32 GB that a 5090 has (or 64 GB if you had two 5090s, to match the dual-mainboard configuration being discussed).
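To put rough numbers on that trade-off, here is a back-of-envelope Python sketch. It assumes decode speed is memory-bandwidth-bound and uses assumed figures (~256 GB/s for an AI Max, ~1.8 TB/s for a 5090, ~4.5 bits per weight), so treat the outputs as illustrations, not benchmarks:

```python
# Back-of-envelope: token generation is roughly memory-bandwidth-bound, so
# tokens/s ~ usable bandwidth / bytes of active weights read per token.
# All figures below are assumptions for illustration, not measured specs.

def tokens_per_second(bandwidth_gb_s: float, active_params_b: float,
                      bytes_per_param: float = 0.56) -> float:
    """Estimate decode speed from bandwidth and active parameters (~4.5-bit quant)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

systems = {
    "AI Max (~256 GB/s, 128 GB)": 256,     # assumed unified LPDDR5X bandwidth
    "RTX 5090 (~1800 GB/s, 32 GB)": 1800,  # assumed GDDR7 bandwidth
}
models = {
    "MoE, ~5B active params": 5,   # large MoE, small active set per token
    "Dense 70B": 70,               # wouldn't even fit in 32 GB at this quant
}

for sys_name, bw in systems.items():
    for model_name, active in models.items():
        print(f"{sys_name} | {model_name}: ~{tokens_per_second(bw, active):.0f} tok/s")
```

The point being: the 5090 wins heavily on speed for anything that fits in 32 GB, while the big unified-memory machines win on what you can run at all.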
OSS-120 (and other MoE models) that need a lot of (V)RAM (> 80 GB) run pretty well on the AI Max (~1200 t/s prompt processing and 45-50 t/s token generation).
Bear in mind that you would need three RTX 5090s to make it run…
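A quick capacity check of that claim (all sizes are assumptions, not official specs): a ~120B-parameter model quantized to roughly 4 bits per weight is already ~64 GB of weights before KV cache and runtime overhead, which lands above 80 GB in total.

```python
import math

# Rough memory-footprint estimate; sizes are assumptions, not official specs.
TOTAL_PARAMS_B = 120    # ~120B-parameter MoE model
BITS_PER_WEIGHT = 4.25  # assumed ~4-bit quantization
OVERHEAD_GB = 20        # assumed KV cache + activations + runtime buffers

weights_gb = TOTAL_PARAMS_B * BITS_PER_WEIGHT / 8  # billions of params * bits / 8 = GB
total_gb = weights_gb + OVERHEAD_GB

GPU_VRAM_GB = 32  # RTX 5090
print(f"Weights: ~{weights_gb:.0f} GB, total: ~{total_gb:.0f} GB")
print(f"32 GB GPUs needed (ignoring sharding overhead): {math.ceil(total_gb / GPU_VRAM_GB)}")
```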
But the main question was: AI Max on the FW16… I would say it was not designed for that; the FW16 was designed for a CPU + add-on (dGPU), in two separate parts…
If you ask me, I would like a 16" laptop with an AI Max, but the case was designed for a ~60 W CPU + 180 W GPU. Drawing 140 W for the CPU means no dGPU is possible, and possibly no way to cool the CPU well either…
And keep in mind that the maximum USB-C power is 240 W…
So who wants an AI Max capped to 120 W without dGPU compatibility, or one capped to 60 W with dGPU support?
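A quick sanity check on the power budget, using the rough wattages mentioned in this thread (not official specs):

```python
# Power-budget sanity check; all wattages are the rough figures from this thread.
USB_C_PD_MAX = 240  # W, USB PD 3.1 EPR ceiling

scenarios = {
    "Current FW16 (~60 W CPU + ~180 W dGPU)": 60 + 180,
    "AI Max at ~140 W, no dGPU": 140,
    "AI Max at ~140 W + ~180 W dGPU": 140 + 180,
    "AI Max capped to ~60 W + ~180 W dGPU": 60 + 180,
}

for name, total in scenarios.items():
    verdict = "fits within" if total <= USB_C_PD_MAX else "exceeds"
    print(f"{name}: {total} W ({verdict} the {USB_C_PD_MAX} W USB-C limit)")
```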
I saw a fun article about token speed.
They analyzed what token speed a human could achieve if the human were acting as the LLM:
Information rate of human behaviors:
| Behavior/activity | Time scale | Information rate (bits/s) | References |
| --- | --- | --- | --- |
| Binary digit memorization | 5 min | 4.9 | International Association of Memory |
| Blindfolded speedcubing | 12.78 s | 11.8 | Guinness World Records Limited |
| Choice-reaction experiment | min | 5 | Hick; Hyman; Klemmer and Muller |
| Listening comprehension (English) | min–h | ~13 | Williams |
| Object recognition | 0.5 s | 30–50 | Sziklai |
| Optimal performance in laboratory motor tasks | ~15 s | 10–12 | Fitts; Fitts and Peterson |
| Reading (English) | min | 28–45 | Rayner |
| Speech in 17 languages | < 1 min | 39 | Coupé et al. |
| Speed card | 12.74 s | 17.7 | International Association of Memory |
| StarCraft (e-athlete) | min | 10 | Guinness World Records Limited |
| Tetris | min | ~7 | Tetra Channel |
| Typing (English) | min–h | 10 | Dhakal et al.; Shannon |
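Just for fun, here's a rough conversion of those figures into LLM-style tokens per second, assuming ~1 bit of information per English character (Shannon's classic estimate) and ~4 characters per token (a typical tokenizer average); both are assumptions, so take the result as a ballpark:

```python
# Convert human information rates (bits/s) into rough "tokens per second".
# Assumptions: ~1 bit per English character (Shannon), ~4 characters per token.
BITS_PER_CHAR = 1.0
CHARS_PER_TOKEN = 4.0

human_rates_bits_s = {
    "Speech in 17 languages": 39,
    "Reading (English)": 45,
    "Typing (English)": 10,
}

for activity, bits in human_rates_bits_s.items():
    tokens_per_s = bits / BITS_PER_CHAR / CHARS_PER_TOKEN
    print(f"{activity}: ~{tokens_per_s:.1f} tokens/s")
```

So a human "LLM" generates on the order of 10 tokens/s, which makes the 45-50 t/s figure mentioned above for the AI Max look fairly respectable.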