ECC (error-correcting code) is a technology that allows computers to detect and correct memory errors. The most popular type of ECC used in memory modules is single-bit error correction. This enables the detection and correction of single-bit errors within each 64-bit word (8 bytes) of data. It will also detect two-bit and some multiple-bit errors, but is unable to correct them.
Take the most common single-bit error correction as an example. For each 64-bit word sent across the memory bus, eight check bits are generated, which works out to one check bit per byte of data. Each check bit is the Exclusive OR (XOR) of a different subset of the data bits. The check bits are stored in a separate memory chip. That is why memory modules with ECC capabilities typically have nine memory chips on each side, rather than the eight chips per side often seen on non-ECC memory modules.
The system uses the check bits to verify that the data is correct, and to correct a single-bit error if there is one. The check bits are transferred together with the original data, so the ECC memory bus is 72 bits wide, as opposed to 64 bits for non-ECC memory. Remember that only 64 of the 72 bits count toward bandwidth and application usage; the other 8 bits are all check bits, so the effective bandwidth of ECC and non-ECC memory is identical.
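To make the check-bit idea concrete, here is a minimal Python sketch of a 72-bit code that corrects single-bit errors and detects double-bit errors in a 64-bit word. It uses a textbook extended Hamming layout, which is an assumption for illustration; real memory controllers may use a different bit arrangement. The names `encode` and `decode` are hypothetical.

```python
# Illustrative extended Hamming (72,64) code: 64 data bits + 8 check bits.
# Assumption: textbook bit layout, not any specific controller's scheme.
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]   # 7 Hamming check bits
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]
# Position 0 holds an eighth check bit: parity over the whole codeword.


def encode(data):
    """Return a 72-element list of bits for a 64-bit integer."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 72):
            # Check bit p covers every position whose index has bit p set.
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
    overall = 0
    for bit in code[1:]:
        overall ^= bit
    code[0] = overall
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 72):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        code[syndrome] ^= 1          # syndrome points at the flipped bit
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. two bits flipped
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << i
    return data, status


word = 0xDEADBEEFCAFEF00D
stored = encode(word)

one_flip = list(stored)
one_flip[37] ^= 1                    # simulate a single-bit soft error
print(decode(one_flip)[1])           # corrected

two_flips = list(one_flip)
two_flips[5] ^= 1                    # a second error in the same word
print(decode(two_flips)[1])          # uncorrectable
```

Note that the codeword is exactly 72 bits long for 64 bits of data, which matches the 72-bit bus width described above.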
Do I Need to Get ECC Memory?
To answer the question above, we first have to figure out where memory errors come from. There are two major causes of the so-called “soft” errors:
- Naturally occurring radioactive isotopes (which emit alpha particles)
- High-energy cosmic rays from supernovas
Both of these can change the value of data stored in a memory chip. These errors are called “soft” errors because they can be repaired by correcting the value of the memory bit, which is exactly what ECC does.
A single-bit soft error occurs about once per 1GB of memory per month of uninterrupted operation. Since most desktop computers do not run 24 hours a day, the odds are not actually that high. For example, if your computer (with 1GB of memory) runs 4 hours a day, a single-bit soft error can be expected (while the system is running) about once every six months. Even if an error occurs, it won’t be a big issue for most users, as the affected bit may not even be accessed at that time. If the system does access the bad bit, this little error won’t result in a disaster either: the system may crash, but a restart will fix that. That’s why ECC memory is not a necessity for most home users.
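The arithmetic behind the six-month figure can be sketched in a few lines of Python. The function name `months_between_errors` is hypothetical, and the default rate of one error per GB per month is simply the figure quoted above, not a measured value.

```python
def months_between_errors(memory_gb, hours_per_day, errors_per_gb_month=1.0):
    """Expected calendar months between single-bit soft errors.

    Assumes the error rate scales linearly with memory size and with
    the fraction of each day the machine is powered on.
    """
    return 24.0 / (errors_per_gb_month * memory_gb * hours_per_day)


# Desktop from the example above: 1GB of memory, 4 hours a day.
print(months_between_errors(1, 4))    # 6.0
```

Under the same assumptions, doubling either the memory size or the daily running time halves the expected interval.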
Things are very different when it comes to workstations and servers. To begin with, these systems often use many gigabytes of memory, and they usually run 24/7 as well. Both of these factors increase the probability of a soft error. More importantly, an unnoticed error is not tolerable in a mission-critical workstation or server, where a system crash is the least of the worries. What really matters is the erroneous data itself: you can imagine the issues that could arise from a soft error in a banking system or a flight control computer. Therefore, ECC memory is definitely required for mission-critical applications.
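To get a feel for how memory size and uptime combine, the sketch below estimates the probability of at least one soft error in a given period. It assumes errors arrive independently (a Poisson model, which is an assumption of this example and not a claim from the text) and reuses the rate of about one error per GB per month. The function name and the 730-hour month are illustrative choices.

```python
import math


def prob_at_least_one_error(memory_gb, hours, errors_per_gb_month=1.0,
                            hours_per_month=730.0):
    """Probability of one or more soft errors during `hours` of operation.

    Assumes independent errors (Poisson model) at the quoted rate.
    """
    expected = errors_per_gb_month * memory_gb * hours / hours_per_month
    return 1.0 - math.exp(-expected)


# One day of use: a 1GB desktop running 4 hours vs. a 16GB server running 24.
desktop = prob_at_least_one_error(memory_gb=1, hours=4)
server = prob_at_least_one_error(memory_gb=16, hours=24)
print(f"desktop: {desktop:.3%}  server: {server:.3%}")
```

The 16GB figure is only an example configuration; the point is that the server's daily risk is far larger than the desktop's under the same error rate.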
Finally, if you do need ECC memory, you’ll have to buy a motherboard that supports ECC memory modules in addition to the ECC memory modules themselves. Without motherboard support (or memory controller support, to be more accurate), the ECC memory module is effectively the same as non-ECC memory.
What’s the Difference between Registered Memory and Unbuffered Memory?
The difference between registered memory and unbuffered memory is whether there are registers on the memory module. Registers are logic components rather than memory; their job is to buffer the address and command signals going to the memory module. With unbuffered memory, the memory controller directly addresses each memory chip on every module in the system. With registered memory, the memory controller sees only the registers, of which there is one per physical bank of memory.
Almost all system memory in today’s PCs is unbuffered. As the amount of system memory grows, stability and performance inevitably deteriorate: as mentioned above, the memory controller has to address each memory chip on all modules directly, which results in high electrical loads. To solve this problem, higher-density systems use registered memory instead. Registered memory modules contain registers that hold the address and command signals (not the data itself) for one clock cycle before passing them on. This increases the reliability of high-speed access to high-density memory but sacrifices some performance, since there is one additional clock cycle between the Chip Select and the Bank Activate command.
Supplemental Information: Why is “Unbuffered” the Counterpart of “Registered”?
Buffers are known as “asynchronous” components, which is to say signals on the input pins appear directly on the output pins. By contrast, registers are known as “synchronous” components: new signals on the input pins do not show up immediately on the output pins. Instead, they wait for the next tick of the system clock. “Buffered” memory modules existed in the days of the old EDO and Fast Page Mode modules, which were both asynchronous DRAMs.
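The difference in timing can be illustrated with a toy Python model. The `Buffer` and `Register` classes and the `simulate` function are hypothetical names for this sketch, and it ignores real-world details such as propagation delay and setup and hold times.

```python
class Buffer:
    """Asynchronous: the output follows the input right away."""

    def __init__(self):
        self.output = 0

    def drive(self, value):
        self.output = value


class Register:
    """Synchronous: the output changes only on a clock tick."""

    def __init__(self):
        self._pending = 0
        self.output = 0

    def drive(self, value):
        self._pending = value        # held at the input until the next tick

    def tick(self):
        self.output = self._pending


def simulate(inputs):
    """Feed one input value per clock cycle; return both output traces."""
    buf, reg = Buffer(), Register()
    buf_out, reg_out = [], []
    for value in inputs:
        reg.tick()                   # clock edge latches the previous input
        buf.drive(value)
        reg.drive(value)
        buf_out.append(buf.output)
        reg_out.append(reg.output)
    return buf_out, reg_out


print(simulate([1, 2, 3]))           # ([1, 2, 3], [0, 1, 2])
```

The register's trace lags the buffer's by exactly one cycle, which corresponds to the extra clock cycle of latency that registered modules add to address and command signals.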
Who Needs Registered Memory?
For a home user, registered memory may not be useful at all; in fact, it comes with a small performance drop. For those who need more than 4GB of memory in a system, registered memory deserves serious consideration. Of course, you’ll have to choose a motherboard that supports registered memory modules. Also, some server and workstation motherboards require registered memory, in which case you don’t really have a choice.