ECC support?

Do we have any updates on this now that it’s launching?

1 Like

I don’t think there’s anything to update. See this post from October:

1 Like

Oops! I was wrong here. It looks like Framework has addressed this concern in the knowledgebase, and ECC is not supported at the moment: What DRAM/memory is supported by Framework Laptop 16?

I thought the answer was yes because the CPU supports ECC?

Taking a look at what works with the 13, it looks like ECC is not supported there either: What DRAM/memory is supported by Framework Laptop 13 (AMD Ryzen™ 7040 Series)?

1 Like

Sooo a few months later: will we get a Framework laptop with ECC supporting AMD Pro CPUs this year or in 2025? Any refresh with ECC support on the roadmap?

3 Likes

There are no AMD Pro CPUs on the roadmap. There are also no other CPUs on the roadmap. Actually, there is nothing on the roadmap, because Framework doesn’t talk about their future plans.

6 Likes

Let me rephrase that: It would be great if Framework would announce an additional laptop configuration with AMD Pro CPU and ECC memory from the factory for this year or the next.

3 Likes

Can anybody share what RAM they got? According to Notebookcheck it was A-DATA. I am planning to use Kingston Fury as I’ll be running some Linux VMs and testing some processes. It is on sale right now in their US store.

The Ryzen 7840HS supports ECC. I’m not sure about the motherboard, obviously, but even if it doesn’t, I still feel this RAM would work better for my use case.

1 Like

Welcome to the community!

In the knowledgebase article about RAM, it seems a couple of ECC RAM modules have been tested and found to boot. However, according to AMD, only the Pro versions of the Ryzen processors actually support ECC. The 7940HS/7840HS does not support ECC. So even if ECC RAM will technically boot and operate on a machine with an HS or U processor, the ECC functionality will not operate. This has been confirmed by Framework as well.

However, the kit you linked is not “ECC” in the traditional sense. All DDR5 memory has “on die ECC,” which is just some error checking that happens on the RAM itself. This is not what was traditionally called ECC and does not require any special compatibility from the processor or memory controller. The RAM sticks you linked to should work just fine. I believe they are listed as working on the RAM compatibility chart Framework has.

5 Likes

The 16 GB per module variant of that is actually one of Framework’s officially tested memory kits. The 32 GB per module variant that you linked to should work fine.

Not exactly. The 7840HS does not support full ECC. It supports on-die ECC, which is required by the DDR5 spec and supported by all DDR5 modules and computers. On-die ECC protects against errors that occur within the module. Full ECC also protects against errors that occur in the communication between the module and the memory controller.

2 Likes

If someone has gotten ECC RAM for the FW16, could you please run the following command under Linux and paste the output here?
The output below is from my server, which uses an AMD Ryzen Embedded V2748 that supports ECC RAM.

# dmidecode --type memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x000A, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
====>   Error Correction Type: Multi-bit ECC
        Maximum Capacity: 32 GB
        Error Information Handle: 0x0009
        Number Of Devices: 2

The Error Correction Type will tell you if it is being used or not.
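For anyone who wants to check this in a script rather than by eye, here is a minimal Python sketch that pulls the `Error Correction Type` field out of `dmidecode --type memory` output. The function names are made up for illustration, and the check only reflects what the SMBIOS table claims, not whether correction is actually happening.

```python
import re

# Values the SMBIOS "Physical Memory Array" record uses for real ECC.
ECC_VALUES = {"single-bit ecc", "multi-bit ecc"}


def error_correction_types(dmidecode_text: str) -> list[str]:
    """Return every 'Error Correction Type' value found in the text."""
    pattern = re.compile(r"Error Correction Type:\s*(.+)")
    return [m.group(1).strip() for m in pattern.finditer(dmidecode_text)]


def smbios_reports_ecc(dmidecode_text: str) -> bool:
    """True if at least one memory array is reported as single- or multi-bit ECC."""
    return any(t.lower() in ECC_VALUES for t in error_correction_types(dmidecode_text))
```

One way to use it is to capture the command output with `subprocess.run(["dmidecode", "--type", "memory"], capture_output=True, text=True)` (as root) and pass the resulting `stdout` to `smbios_reports_ecc`.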

1 Like

After 190 posts in this thread we know it’s not being used. That ship has sailed. :ship:

I would hope that Framework announces a new configuration with Pro CPU that we can configure with ECC memory from the factory and where Framework guarantees that it does support Multi-bit ECC. The :basketball: is in Framework’s court now.

3 Likes

Well, when I received my server board, ECC was officially not supported (I think I was one of the first in Europe to receive it).
But I received the board and checked with ECC, and it worked and was supported.
So, just checking :wink:

1 Like

I have heard of scenarios where the error correction type was shown as ECC but the actual correction/logging did not happen. So to be absolutely sure one needs to introduce errors and check if they are handled correctly. Some implementations allow “ECC poisoning” which means they deliberately introduce errors for the memory controller/OS to find. A more low tech approach is overclocking or undervolting until errors happen. Both are probably not realistic in a laptop.
For what it’s worth, on my FW13 AMD neither dmidecode nor other methods of ECC detection show any indication of ECC actually working.
AMD just backpedaled on its claim that the 8000G desktop APUs support ECC, just as it did for the 7000 series mobile APUs. https://www.tomshardware.com/pc-components/cpus/amd-confirms-ryzen-8000g-apus-dont-support-ecc-ram-despite-initial-claims

2 Likes

DDR5 has a limited form of ECC built in (1 bit per 12? I can’t remember), so that answers your question.

I don’t know of any typical DDR5 ECC yet, but they could exist. Would they fit? Likely not. They are also likely to be very expensive.

Also yes, a lot of OSes, when running on ECC, will just … run on ECC. They wouldn’t do anything even if something did occur.

In theory, with the checksum/parity you can undo the error, but it’s very complicated.
We should just have parity on everything. That way we wouldn’t even really need ECC.

1 Like

I think you mean On-Die-ECC which is NOT comparable at all with normal ECC. But sadly marketing does not care and it is once again on the buyers to filter the truth from misleading marketing materials.

I have DDR5 ECC in my FW13 AMD, made by Kingston. ECC function is not available/working, otherwise it’s fine.

It is not complicated. As soon as the hardware and firmware support it, it just works. 1-bit errors are corrected, regardless of OS. Only logging and reacting to 1- or 2-bit errors needs OS support.

There are systems that can simulate ECC by writing everything to RAM twice and comparing the copies on every read. That is very inefficient and halves your available RAM. Real ECC between memory controller and RAM sticks, with a bit of extra memory on the stick for parity, is much better. We have hardware parity or redundancy almost everywhere else in our computers: SATA, PCIe, inside HDDs/SSDs, CPU caches, but interestingly not our main system memory. We have Intel to thank for that…
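To illustrate why a few extra parity bits are enough to correct a flipped bit, here is a toy Hamming(7,4) sketch in Python. It is only an illustration of the principle: real ECC memory uses wider codes that can also detect double-bit errors, and this small code would miscorrect if two bits flipped. The function names are made up for this example.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> tuple[list[int], int]:
    """Return (data bits, 1-based position of the corrected bit, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # points at the flipped position
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome
```

Encoding `[1, 0, 1, 1]`, flipping any single bit of the codeword, and decoding it again returns the original four bits plus the position that was repaired. The overhead here is 3 parity bits per 4 data bits; wider codes need proportionally far fewer check bits, which is why an ECC stick only needs a little extra memory rather than double.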

4 Likes

Actually, there are tools under Linux that you need to install to log these errors correctly.
Check out: Monitoring ECC memory on Linux with rasdaemon | Just another blog
I have configured logging on my server, and so far I have seen one error, on a stormy day when lightning struck the street in front of our house. I’m happy that I had protection built into the house’s system. 3 other houses had practically 50% of their devices destroyed that day.
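As a lighter-weight complement to rasdaemon, the kernel's EDAC subsystem exposes corrected and uncorrected error counters in sysfs when an EDAC driver is loaded. The sketch below is a hedged Python example that reads them; the function name is hypothetical, and on a machine without working ECC the directory may simply be missing or contain no memory controllers, in which case the result is empty.

```python
from pathlib import Path


def read_edac_counts(base: str = "/sys/devices/system/edac/mc") -> dict[str, dict[str, int]]:
    """Read ce_count / ue_count for each EDAC memory controller under base."""
    counts: dict[str, dict[str, int]] = {}
    root = Path(base)
    if not root.is_dir():
        return counts  # EDAC not available on this system
    for mc in sorted(root.glob("mc[0-9]*")):
        entry: dict[str, int] = {}
        for name in ("ce_count", "ue_count"):  # corrected / uncorrected errors
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts
```

Calling `read_edac_counts()` with no arguments reads the live counters. The `base` parameter exists only so the function can be pointed at a test directory.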

3 Likes

Agreed. Hopefully Framework can add a SKU for this.

Though, imo, this artificial segmentation from AMD’s side is more than just a little annoying. ECC is not like the dedicated enterprise SW stack they need to fund via a PRO SKU. It’s already baked into the IO die.

:unamused:

6 Likes

I see issues with just about every point you made:

The on-die ECC of DDR5 is indistinguishable from higher-reliability dies without ECC.
It has nothing to do with classic multi-bit ECC (which this thread is about), because you don’t get the errors reported. We already noted this in #84 and #123+.

Yes, they do actually exist (and fit). And they are not very expensive either. Framework has even tested them and written blog posts about it. They work, but they don’t offer ECC protection with the current Framework model. Those blog posts are already linked 8 times in this thread! (e.g. #83 and #120)

If you configure the OS correctly they will log either with WHEA in Windows or rasdaemon under Linux. Please do some research before you post misleading information.

No, you can’t protect memory in software alone, without hardware support. The page table and other critical structures reside in memory, and you can’t simply double-write them. And I am not aware of any software support for that in mainstream OSes.

3 Likes

Yes, I know. I meant cases where dmidecode showed error correction, but hammering the RAM with known data produced errors in the data, which should not happen with ECC active. In case of uncorrectable errors there should at least be machine check exceptions logged in the kernel ring buffer, no rasdaemon needed. I assume the same holds under Windows with WHEA.
dmidecode reads SMBIOS data tables, so in theory the BIOS/UEFI could just lie about ECC being present/working.
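A quick way to look for such entries is to filter the kernel log text, for example the output of `dmesg` or `journalctl -k`. The Python sketch below does that. The patterns are an assumption about what typical machine-check and EDAC lines look like, not an exhaustive list, and EDAC drivers also print ordinary startup messages, so a match is a hint to read the line, not proof of a memory error.

```python
import re

# Assumed markers for machine-check / EDAC related kernel messages.
MCE_PATTERN = re.compile(
    r"(\bmce:|Machine check|\[Hardware Error\]|\bEDAC\b)", re.IGNORECASE
)


def find_memory_error_lines(kernel_log: str) -> list[str]:
    """Return the kernel log lines that look machine-check or EDAC related."""
    return [line for line in kernel_log.splitlines() if MCE_PATTERN.search(line)]
```

An empty result on a machine that has been hammered with a memory test says nothing by itself; it is only meaningful together with evidence that errors actually occurred.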

1 Like

Well, I actually looked at the kernel ring buffer by chance after the entire neighborhood went dark (after the lightning). That’s where I spotted the MCE errors and decided to actively log them. After all, I have these :smiley:

1 Like