ECC can’t prevent Windows or drivers crashing by itself, which is the source of most Windows-related BSODs. ECC also can’t prevent straight-up hardware failure that’s not on RAM.
Some other evidence for ECC not being particularly relevant:
We’ve observed uncorrectable ECC error instances exactly 6 times over the last 5 years in a bit over a million server installations. Every one of these >1 ECC errors was tracked down to faulty hardware and not “sunspots”, “glitches”, or any such folklore. These are multi-bit ECC errors, so obviously single-bit ECC errors would be far more common, but in every instance of multi-bit ECC error, when the hardware was subsequently tested directly, each one resulted in a constant >2bit error (meaning it was always a cascading failure at the time once that memory address block was accessed). Summary of anecdotal evidence supporting very low memory error rate: in >1mil ECC server installs, 6 of 6 multi-bit ECC errors were due to faulty hardware that would be found immediately upon boot up if POST testing was set to “FULL”.
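For scale, the figures above can be turned into a rough rate. This is only a back-of-the-envelope sketch: it assumes exactly one million installations all running for the full five years, which the post does not state, so treat the result as an order-of-magnitude estimate.

```python
# Rough rate of uncorrectable (multi-bit) ECC events from the figures above.
# Assumption: exactly 1,000,000 installs, each in service for all 5 years.
events = 6
installs = 1_000_000
years = 5

install_years = installs * years
rate_per_install_year = events / install_years

print(f"{rate_per_install_year:.1e} uncorrectable events per install-year")
print(f"about one event per {install_years / events:,.0f} install-years")
```

Note that this covers only the multi-bit errors; the post gives no count for corrected single-bit errors.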
Contrary to what others have (incorrectly) stated here, all single-bit ECC errors are corrected on-the-fly. Any multi-bit ECC errors result in an immediate kernel panic so that the impact is isolated to service availability but never data corruption.
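To illustrate why single-bit errors can be corrected while double-bit errors can only be detected, here is a toy sketch of a SECDED (single-error-correct, double-error-detect) extended Hamming code over 4 data bits. It is not what any particular memory controller implements; real ECC memory protects much wider words (DDR4 ECC DIMMs, for example, carry 8 check bits per 64 data bits), and the function names are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity bits at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes total parity even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but even parity: two bits flipped, cannot repair.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                 # one flipped bit: corrected transparently
print(decode(word))
word[2] ^= 1                 # a second flipped bit: detected, not corrected
print(decode(word))
```

The "uncorrectable" branch is the point where a real system would raise a machine check or panic rather than hand back possibly wrong data.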
ECC errors are so uncommon that, to combat potential memory corruption on my non-ECC Optiplex Micro 3050 Proxmox server (running 6x16TB SATA drives in RZ2), I have simply scheduled a nightly reboot and always fully test memory on boot.
Now you’re just being silly for the sake of it. The subject is ECC. Obviously, ECC will detect and correct some failures in your ram. I don’t use Windows so I can’t help you with any of that and none of that had anything to do with ECC ram.
As for your wall of random ECC statistics, they don’t agree with my real-world experience. In my profession, we see memory errors quite often among our many servers. In any case, it’s always better to have the option to use it if possible.
Which would not even have been detectable without ECC.
Yes, single bit errors happen.
They are not corrected without ECC and do not automatically cause kernel panics without ECC.
Prevention seems to need active steps like rebooting regularly which would not be necessary with ECC.
I do not take your post as a particularly good reason for not having ECC.
In conclusion, it all amounts to this: errors happen, and ECC is useful. Whether your data is worth it is your decision. I would like hardware vendors to leave that choice to me. No real server vendor will even sell servers without ECC except in the absolute bargain-bin tier of hardware.
It becomes more relevant the more memory you use. Most applications do not use a lot of memory, and some memory processes, like heap stacks, cause more errors than others. A mostly idle server, a laptop used only to browse the internet, and other devices with low memory use (and especially little critical memory) likely have no real use for it.
Developers and other occupations with high computer utilization see a lot more of this. These same people are also a lot more picky with their computers and a lot more frustrated with designed to fail devices, especially on the corporate side. Hence why it is very popular here. Normal computer users would just get a good value laptop. Not something this expensive.
ECC error logging just told me that the overclock on my desktop PC’s RAM is not stable any more: about 1 corrected error every hour. I configured that overclock about a year ago with extensive (days of memory load) and error-free stability testing. So either my memory has degraded, my power supply is less stable than before, or any of a plethora of other possible reasons is the cause.
But this example shows how ECC is a useful addition even for non professionally used systems. These hourly errors would have slowly corrupted my data and it probably would need to get a lot worse for me to actually notice something being wrong with hardware. Instead of knowing there are corrected errors happening I would see software crash every now and then, maybe my checksumming file system would catch some incorrectly checksummed files or stuff like that. Everything easily attributable to software bugs, updates or whatever else changed. Hardware is usually the last thing I think about in such cases.
One could make the argument that it is my fault for overclocking the RAM, but think about how many people run XMP/EXPO profiles on non-ECC memory. Typical advice for such configs is to do some rounds of Memtest86+ and call it a day. My RAM runs at settings not even close to those the same Samsung B-die memory was used for in “gaming” sticks.
I’m happy to be in the know on this potential hardware problem and can now dial the overclock a bit back or give it a bit more voltage and just keep an eye on my error logs.
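For anyone who wants to keep an eye on the same kind of log on Linux, here is a minimal sketch that reads the corrected (`ce_count`) and uncorrected (`ue_count`) error counters the kernel's EDAC subsystem exposes in sysfs. It assumes an EDAC driver for your memory controller is loaded; without one the directory is empty or missing and the function simply returns nothing. The function name is made up for this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    An empty dict means no EDAC memory controller was found under root.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


for controller, entry in read_edac_counts().items():
    print(controller, entry)
```

Running this periodically and comparing `ce_count` against the previous value gives roughly the "errors per hour" figure mentioned above.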
Sooo a few months later: will we get a Framework laptop with ECC supporting AMD Pro CPUs this year or in 2025? Any refresh with ECC support on the roadmap?
There are no AMD Pro CPUs on the roadmap. There are also no other CPUs on the roadmap. Actually, there is nothing on the roadmap, because Framework doesn’t talk about their future plans.
Let me rephrase that: It would be great if Framework would announce an additional laptop configuration with AMD Pro CPU and ECC memory from the factory for this year or the next.
Can anybody share what RAM they got? According to Notebookcheck it was A-DATA. I am planning to use Kingston Fury as I’ll be running some Linux VMs and testing some processes. It is on sale right now in their US store.
The Ryzen 7840HS supports ECC. I’m not sure about the motherboard, obviously, but even if it doesn’t, I still feel this RAM would work better for my use case.
In the knowledgebase article about RAM, it seems there are a couple of ECC RAM modules that have been tested and found that they will boot. However, according to AMD, only the Pro versions of the Ryzen processors actually support ECC. The 79/7840HS does not support ECC. So even if ECC RAM will technically boot and operate on a machine with an HS or U processor, the ECC functionality will not operate. This has been confirmed by Framework as well.
However, the kit you linked is not “ECC” in the traditional sense. All DDR5 memory has “on die ECC,” which is just some error checking that happens on the RAM itself. This is not what was traditionally called ECC and does not require any special compatibility from the processor or memory controller. The RAM sticks you linked to should work just fine. I believe they are listed as working on the RAM compatibility chart Framework has.
The 16 GB per module variant of that is actually one of Framework’s officially tested memory kits. The 32 GB per module variant that you linked to should work fine.
Not exactly. The 7840HS does not support full ECC. It supports on-die ECC, which is required by the DDR5 spec and supported by all DDR5 modules and computers. On-die ECC protects against errors that occur within the module. Full ECC also protects against errors that occur in the communication between the module and the memory controller.
If someone has gotten ECC RAM for the FW16, could you please run the following command under Linux and paste the output here?
The output below is from my server, which uses an AMD Ryzen Embedded V2748 that supports ECC RAM.
# dmidecode --type memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x000A, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
====>   Error Correction Type: Multi-bit ECC
        Maximum Capacity: 32 GB
        Error Information Handle: 0x0009
        Number Of Devices: 2
The Error Correction Type will tell you if it is being used or not.
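If you would rather check this from a script, here is a small sketch that pulls the "Error Correction Type" field out of saved `dmidecode --type memory` output. The function names are hypothetical, and the check only tells you what the firmware reports in SMBIOS, not whether correction actually happens.

```python
def error_correction_types(dmidecode_output):
    """Return every 'Error Correction Type' value found in dmidecode text."""
    found = []
    for line in dmidecode_output.splitlines():
        # Drop indentation and any hand-added "====>" marker before the field.
        text = line.strip().lstrip("=>").strip()
        if text.startswith("Error Correction Type:"):
            found.append(text.split(":", 1)[1].strip())
    return found


def reports_ecc(dmidecode_output):
    """True if any memory array reports an ECC type (e.g. 'Multi-bit ECC')."""
    return any("ECC" in t for t in error_correction_types(dmidecode_output))


sample = """\
Handle 0x000A, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
====> Error Correction Type: Multi-bit ECC
Maximum Capacity: 32 GB
"""
print(error_correction_types(sample), reports_ecc(sample))
```

To use it on a live system, save the command's output to a file first (dmidecode normally needs root) and pass the file's contents to `reports_ecc`.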
After 190 posts in this thread we know it’s not being used. That ship has sailed.
I would hope that Framework announces a new configuration with a Pro CPU that we can configure with ECC memory from the factory and where Framework guarantees that it does support multi-bit ECC. The ball is in Framework’s court now.
Well, when I received my server board, ECC was officially not supported (I think I was one of the first in Europe to receive it).
But I received the board, checked for ECC, and it worked and was supported.
So, just checking.
I have heard of scenarios where the error correction type was shown as ECC but the actual correction/logging did not happen. So to be absolutely sure one needs to introduce errors and check if they are handled correctly. Some implementations allow “ECC poisoning” which means they deliberately introduce errors for the memory controller/OS to find. A more low tech approach is overclocking or undervolting until errors happen. Both are probably not realistic in a laptop.
For what it’s worth, on my FW13 AMD neither dmidecode nor other methods of ECC detection show any indication of ECC actually working.
AMD just backpedaled on its claim that the 8000G desktop APUs support ECC, just like they did for the 7000 series mobile APUs. https://www.tomshardware.com/pc-components/cpus/amd-confirms-ryzen-8000g-apus-dont-support-ecc-ram-despite-initial-claims