Bringing Framework to the enterprise

After the first time I upgraded the motherboard in my Framework 13 I couldn’t help but think “I wish I could have done this on the servers I managed in the datacenter”. I rolled the idea around in my head for a long time, wondering what a Framework rack server would look like.

I’m not an engineer of any kind, just an experienced IT guy who spent forever installing, upgrading, and managing rack servers, and I have thoughts about the kind of server I wish I’d had. So here’s my idea; please humor me, give it a thought, and then tell me why you think it’s good or bad or otherwise. I’m particularly interested in anyone with hardware design or engineering experience who can point out whether I’m asking for something impossible (or just way too expensive).

The Framework Matrix, a flexible forever-platform

Philosophy: Bring Framework’s repairability and upgradability to the datacenter. Everything is user-upgradable and replaceable just like the laptops, and wherever possible it uses commodity components. It can run incredibly dense workloads and support the widest array of interfaces and modules of any server platform.

Specs:
  • 3U standard rack server, 32” deep
  • 16x front 2.1”x2.6” bays, 15” deep (this is the full front of the chassis, one big open space)
  • 12x rear 2.1”x2.6” bays, 9” deep (the rear has 12 bays instead of 16 to accommodate the three (N+1) power supplies, management IO, and buttons)
  • Modules slide in on rails along the top and bottom edges (modules in the bottom row mount upside down)

Besides Framework’s philosophy, what makes this special is the flexibility of a PCIe 6.0 matrix switch. Each of the 28 bays has an x8 PCIe connector on its backplane, and the chassis system board can map lanes from any bay to any other bay. (x8 because I think anything wider forces a much more expensive design to build a PCIe matrix, but that’s something an AI told me, so maybe not.)

This means I can put a compute module in a front bay and a bunch of NVMe drives in a rear bay, or an OCP card, or a generic PCIe slot; basically anything that can use PCIe lanes can go in a module, and the system can map hosts to devices regardless of where they physically sit in the chassis.
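
To make that concrete, here’s a rough sketch of the kind of mapping table the chassis controller would keep. This is purely illustrative Python – the class and method names are mine, not a real switch API – but the bay counts and lane widths are the ones from this post.

```python
# Toy model of the chassis PCIe matrix: 28 bays, x8 to each, any-to-any mapping.
# Purely illustrative -- the names and structure here are my own invention.

FRONT_BAYS = [f"front-{n}" for n in range(1, 17)]   # 16 front bays
REAR_BAYS = [f"rear-{n}" for n in range(1, 13)]     # 12 rear bays
LANES_PER_BAY = 8

class PcieMatrix:
    def __init__(self, bays, lanes_per_bay):
        self.bays = set(bays)
        self.lanes_per_bay = lanes_per_bay
        self.links = {}  # device bay -> host bay it is currently mapped to

    def total_lanes(self):
        return len(self.bays) * self.lanes_per_bay  # 28 * 8 = 224 lanes on the switch

    def attach(self, host_bay, device_bay):
        """Map a device bay's x8 link to a host bay, wherever each one sits."""
        assert host_bay in self.bays and device_bay in self.bays
        self.links[device_bay] = host_bay

matrix = PcieMatrix(FRONT_BAYS + REAR_BAYS, LANES_PER_BAY)
matrix.attach("front-1", "rear-3")   # NVMe carrier in the back -> compute node in front
matrix.attach("front-1", "rear-4")   # a generic PCIe-slot module mapped to the same host
print(matrix.total_lanes())          # 224
```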

The chassis system board will also carry Ethernet, BMC, serial, USB, and power to each bay, and has an integrated 2.5GbE switch with two rear-panel uplinks. Faster networking is available via a 4-bay add-in module, and of course any host can connect to a generic PCIe slot module, so you can add any kind of specialized networking or storage and map it to whichever bay you want.

Most of these modules are essentially carriers for commodity parts, with maybe a few chips on the board (a PCIe repeater, a SATA controller, whatever), but of course there will also be compute modules. Framework could make them in 1–4 bay widths, or spanning multiple bays vertically, or both, allowing fairly sizable computers to be used.

For compute modules they could make a 1-bay ‘light’ node, a 2-bay ‘standard’ node, and a 4-bay ‘monster’ node. With 2 bays you get room for a roughly 4”-wide, 15”-deep motherboard; you could fit a lot of power on that.

A high-speed networking module could expose eight 100Gb interfaces that map to host bays, and on the outside present 2x 400Gb uplinks or more (with PCIe 6.0 it might be double that). It would also interface with the internal 2.5GbE switch. A simpler 10GbE switch module could also be available.
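
The back-of-envelope math behind that “might be double” (nominal PCIe 6.0 rates, ignoring protocol overhead – my arithmetic, not a vendor spec):

```python
# Bandwidth budget for a 4-bay networking module on the PCIe 6.0 matrix.
# Nominal raw rates, no protocol overhead -- rough numbers only.

gbps_per_lane = 64                  # PCIe 6.0: 64 GT/s per lane, ~64 Gb/s per direction
lanes_per_bay, bays = 8, 4

backplane_gbps = gbps_per_lane * lanes_per_bay * bays
print(backplane_gbps)               # 2048 Gb/s per direction from the matrix into the module

print(8 * 100)                      # 800 Gb/s of host-facing 100Gb interfaces
print(2 * 400, 4 * 400)             # 800 or 1600 Gb/s of uplinks -- even doubled, it still fits
```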

Another module could be 1 bay wide and hold clusters of tiny machines like Raspberry Pi Compute Modules, or just be a ‘tinkerer’ module with no board at all, just the backplane connectors broken out to standard ones.

This platform has other tricks:

  • a 4-bay-wide front module can contain two mini-ITX motherboards, one behind the other
  • a 2-bay (vertical) module that can hold two Framework 13 motherboards, repurposed as compute modules. They’d have HDMI and USB on the outside so you can get a console on them with a crash cart, and each one even gets x4 PCIe to the backplane from its M.2 slot if you’re OK using USB storage
  • a 3-bay (horizontal) PCIe module that can house almost any consumer-grade GPU (enterprise GPUs might fit in a 2-bay module, perhaps even in the back – some are under 9” deep)
  • a 1-bay NVMe storage module could fit as many as 24 sticks (a quick geometry check is sketched after this list). I suspect cooling could be an issue at that point, but the density would be nuts. It would also be straightforward to make U.2 2.5” disk modules, or even SATA 3.5” disk modules, if you use 2 or 3 bays.
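
Here’s the quick geometry check behind the 24-stick claim. The bay dimensions are from the specs above; the stick pitch and connector depth are my guesses, and it ignores heatsinks and airflow entirely.

```python
# Can 24 M.2 sticks fit in one 2.1" x 2.6" x 15" bay? Rough check, rounded to mm.
# Pitch and connector depth are assumptions; cooling is ignored.

bay_w_mm, bay_h_mm, bay_d_mm = 53, 66, 381  # ~2.1" x 2.6" cross-section, ~15" deep

stick_pitch_mm = 8    # a 2280 stick on edge: ~4 mm thick plus clearance (assumed)
slot_depth_mm = 95    # 80 mm stick plus connector and latch (assumed)
# A 2280 stick stood on edge is 22 mm tall, well under the 66 mm bay height.

per_row = bay_w_mm // stick_pitch_mm   # 6 sticks side by side
rows = bay_d_mm // slot_depth_mm       # 4 rows front to back
print(per_row * rows)                  # 24 -- tight, but it checks out on paper
```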

The chassis fans will be standard parts (92mm). So will the power supplies (M-CRPS). The modules’ specifications would be open so people could make their own easily. And of course there will be some laptop-style removable ports on the back to access the platform’s management interface.

Critically, the brains of the chassis consist of a single central board and two backplanes, and all of them are upgradable – at some point in the future a user could swap in a PCIe 7.0 system board with a new 512-lane PCIe matrix giving 16 lanes to each bay, who knows. The important thing is that the only permanent part of the system is the metal box.

And hey, why not – the front bezel can house some of those little tiles from the Framework Desktop.

I’m interested in everyone’s thoughts.

A few things shaped my design. In my career I found that we very rarely had a chance to hot-swap anything upon failure; most failures take a system down anyway. I think I swapped exactly one failed power supply live in over 20 years, and of course hard drives failed all the time, but a system fan? Some internal module? Nah. So my thought is that the PSUs need to be hot-swappable and there must be storage options that expose the drives externally, but in my experience solid-state drives fail so infrequently that as long as you have hot spares you can often wait for a downtime window. Still, all the options should exist, and can, and YMMV. I suspect most people would be OK with taking a module offline to fix it, as long as the rest of the platform stays up.
Another thing – even though we had service contracts with Lenovo and NetApp at my IT job, we kept a bunch of spare parts around, not just drives and power supplies but proprietary stuff like fans and certain cable assemblies. We did this because it can take a while for a tech to arrive with a part, and we could often fix things ourselves, but it meant buying a lot of overpriced parts. I’d much rather keep cheap commodity parts around.

My wild guess is that a system like this without modules would probably have to sell for maybe $5k, which is a lot, but the modules should be much cheaper than competitors’, and I bet a bare blade chassis of this size from the bigger companies costs even more. That being said, I don’t know who the target audience is – it’s too expensive and technical for companies without an IT staff, and probably not that useful for a company with an IT budget in the millions, but in between I think there are plenty of mid-size businesses that buy servers in this range. I think. :sweat_smile:

Oh duh, I missed an obvious bit of synergy – the Framework Desktop motherboard is mini-ITX sized, so it would only need a different heatsink to be usable here: two motherboards back to back in a 4-bay front module, and two more could go in shallow bays on the back while still leaving room for storage and advanced networking.

10 machines in 3U isn’t bad.
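
For anyone counting along, here’s how I get to 10 (assuming two boards fit per 4-bay front module, which is my guess about fit, not anything Framework has said):

```python
# How "10 machines in 3U" breaks down -- my assumed packing, not an official layout.

front_bays = 16
module_width = 4
boards_per_front_module = 2   # two mini-ITX boards, one behind the other
rear_boards = 2               # two more in shallow rear bays

front_boards = (front_bays // module_width) * boards_per_front_module  # 4 modules * 2 = 8
print(front_boards + rear_boards)                                      # 10
```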

These are really intriguing ideas. I don’t build enterprise servers, but I’m fascinated by the trickle-down of retired hardware available on sites like eBay.

How far does Open Compute Project cut into margins for this or provide ‘already solved’ approaches to some of the design?

Is there a gap in the market for homelab people who want to give OCP devices a second life but lack bridges/adapters or case mounts? Would OCP connectors for PCIe be of interest?

Does Compute Express Link (a way to put trays of remote RAM on a PCIe link and assign it to VMs inside a host) appeal? Would it put constraints on the PCIe matrix?

You can’t have an enterprise system when the motherboards are sourced and supported by a business that doesn’t offer any enterprise support, which is the case with Framework.

What do we have here? Support requests via web form and email to support@frame.work with no tracking number? Response takes days; resolution takes weeks… The return/exchange process takes weeks… Order cancellations and refunds are handled weeks after the initial request (I know this from my own experience: the unfulfilled order I cancelled shortly after placing it is still not refunded 11 days later despite numerous requests, and I now have to pay my $3k credit card bill for something I never received).

So yeah, enterprise systems are built from enterprise-grade components and receive enterprise-level support, which is not the case with Framework. At its current level, it is more a product for enthusiasts and tinkerers building stuff at home. I won’t risk selling their systems to our customers, at least not until their support improves and they implement proper processes to support their products.

This is what happens when someone totally unqualified has ideas…

I’m only vaguely aware of the specifics of OCP so, yeah, I could be reinventing the wheel. It appears they have a standardized connector for module sideband and for the stuff that goes on a motherboard (NICs, USB, serial, etc.), which I’d adopt, but otherwise OCP seems to have a lot less flexibility, and I don’t think it’s cheaper; it really seems designed for massive-scale datacenters. I do believe the power supplies I’d use (M-CRPS) are OCP-spec compatible – that standard has been adopted by the major server vendors anyway. I’d also try to adopt any standards they have for internal connectors, cabling, and anything else that doesn’t break my design.

Re remote RAM – this is a very cool idea I hadn’t thought about. My probably-wrong analysis (i.e., thanks Gemini!) of PCIe matrix chips tells me that I can’t do more than 8 lanes per bay without a very expensive system with multiple tiers of switch chips. But a 4-bay module would have access to 4 backplane connectors, giving it 32 lanes. Assigning half of them to a CXL module should work well, I think, and the chassis could supposedly move the module between hosts dynamically because they’re hot-pluggable. (The CXL spec allows for memory sharing, but my brain broke a little while trying to understand what that takes.) This strikes me as interesting because it allows temporary assignment of additional RAM to a host that needs it for a particular task.
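
The lane and bandwidth budget I’m imagining for that, with the same x8-per-bay assumption as everywhere else (nominal rates, no overhead):

```python
# Rough budget for a hypothetical 4-bay CXL memory module (x8 per bay assumed).

lanes_per_bay, bays = 8, 4
total_lanes = lanes_per_bay * bays        # 32 lanes reach a 4-bay module

cxl_lanes = total_lanes // 2              # give x16 of it to the CXL memory controller
gbytes_per_lane = 8                       # PCIe 6.0: ~8 GB/s per lane per direction, raw
print(cxl_lanes * gbytes_per_lane)        # ~128 GB/s -- in the ballpark of a few DDR5 channels
```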

I think the whole idea lives and dies by the viability of the PCIe matrix; getting a solid PCIe 6.0 signal from potentially as far as the front of the chassis to the back is non-trivial. BTW, all “homebrew” and lower-end modules would probably run at PCIe 5.0 unless they have a retimer chip; at PCIe 6.0 my understanding is that the cable/trace length limits are stricter. Modules custom-designed for this chassis could add a retimer chip at a reasonable cost.

Apologies for the verbosity, it’s a bad habit I’ve stopped fighting.

Oh, and OCP network cards would take up 2 or 3 bays (3 for the LFF size), but note that they’re thin – there’s enough room to stack 4 OCP cards in that space, though that would probably demand too many PCIe lanes for that number of bays. A 3-bay module has three 8-lane connectors, so it could pass those straight through to the cards. It makes sense to limit it to three cards for this reason even if four could technically fit – PCIe 6.0 x8 gives roughly 64GB/s in each direction, so this isn’t really limiting for most scenarios. So if you had, say, eight 2-bay hosts in the front, you could use 8 (out of 12) of the rear bays to give 8 lanes from each host to a dedicated OCP card. A host can connect to a normal PCIe card module the same way, just at half the density.
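
Spelling out that “isn’t really limiting” arithmetic (nominal PCIe 6.0 numbers, no protocol overhead):

```python
# Why one x8 link per OCP card is plenty -- nominal PCIe 6.0 rates, no overhead.

gbytes_per_lane = 8                     # PCIe 6.0: ~8 GB/s per lane per direction
per_card_gbytes = 8 * gbytes_per_lane   # an x8 link: ~64 GB/s each direction
per_card_gbits = per_card_gbytes * 8    # ~512 Gb/s -- more than a 400Gb NIC can push
print(per_card_gbits)

hosts = 8                               # eight 2-bay hosts up front
rear_bays_used = hosts                  # one x8 backplane link (one bay's worth) per card
print(rear_bays_used, "of 12 rear bays used")
```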

Yeah, moving into the enterprise would certainly require changes to the way they do support. I assumed they would contract out all of the on-site work and part replacements, but I have no idea how this kind of thing is organized. Your point is definitely an important one, and yeah, it might very well scuttle the whole idea.