ECC memory module modification?

Some time ago I found an Intel video that shows the benefit of ECC memory. It presents a modified ECC memory module with a switch that induces a memory error, then shows that an i5 can't handle the error while a Xeon processor can. Here is the video in question; the module is shown at 28-29 s:

formatting link

I'd like to have such a module for demonstration purposes. I asked on the Intel forum, but after some time it became clear that it is impossible to get the schematic of the modification from them (obviously...).

Is it possible to figure out how to make such a modification? Theoretically, what modification would probably work without damaging the motherboard or CPU?

I would appreciate your thoughts.

thanks geos

Reply to
geos

Error detection and correction coding involves adding check bits to individual words of data.

The scheme I was supposed to implement on one system that never actually got built would have needed seven extra bits to check and correct 32-bit words, and eight to check and correct 64-bit words.

That means that your "64-bit" memory words have to be stored in 72-bit wide words, and routed through a 72-bit wide error checking and correction module on the way to the 64-bit wide processor.
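The seven and eight check bits quoted above are what a single-error-correcting, double-error-detecting (SECDED) Hamming code needs. A minimal sketch of the arithmetic, assuming that is the kind of code meant; the function name `secded_check_bits` is made up for illustration:

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a SECDED Hamming code for a word of data_bits bits."""
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1, so that the
    # syndrome can name every bit position plus the "no error" case.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit adds double-error detection.
    return r + 1


for width in (32, 64):
    k = secded_check_bits(width)
    print(f"{width}-bit word: {k} check bits, {width + k} bits stored")
```

For 64 data bits this gives 8 check bits, hence the 72-bit wide storage.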

The error-checking and correction process takes a finite time, which changes the clock timings around the system.

It wouldn't be easy to retrofit.

--
Bill Sloman, Sydney
Reply to
bill.sloman


I don't think he is talking about adding ECC to a non-ECC computer. He is asking about making a mod to the memory module to introduce errors to verify the unit is correcting the error. The thing I'm unclear on is what type of memory module this is. I guess they can use a module in a non-ECC computer that supports the extra bits. The computer just ignores them.

Rick C.

Reply to
gnuarm.deletethisbit

My understanding is that this is an ECC module inserted into an ECC-supporting platform (a Xeon CPU with a matching chipset/motherboard). The switch induces a 1-bit error, and the whole combo handles it using the additional correction bits available in the module.

thanks geos

Reply to
geos


Since the ECC circuitry is in the CPU the same motherboard can be used for two different processors, one with and one without ECC.

Bill's description is how they used to implement ECC in the old days when it was done in discrete logic. I worked on an array processor which used ECC this way. I believe they used an entire clock cycle to perform the error detection and correction function. But then the memory was using 8 clock cycles for every data access. To get data on each clock they used 8 banks of memory interleaved. This is essentially what is inside each SDRAM chip these days, not including the ECC of course.

Do you have a computer with ECC? I don't know what details they implemented, but if I were designing this I would cut one data line and add a FET switch inline. When the switch is enabled it should transmit the signal without significant delay (for some value of "significant"). When the switch is off the state of the signal will float and data will be wrong, causing your crash if no ECC is used. Just make sure you grab one of the data lines that aren't part of the ECC code.

Rick C.

Reply to
gnuarm.deletethisbit

Yes, I have a complete ECC-supporting server platform, but ECC error injection is disabled in its firmware. For this reason, memory testing software that relies on error injection cannot exercise the ECC feature. A modified memory module would bypass the firmware restriction and let me see clearly that the whole combo is working properly.

On another group someone suggested tying one data line to power through a resistor and a switch. I had a similar idea, but tying it to ground. Does either of these ideas make sense to you?

thanks geos

Reply to
geos


Yes, I expect a resistor to either power or ground would do the job. Just don't make the resistor too small or it may damage the chips, or too large and it won't disrupt the signal. I seem to recall the newer DRAM types use a differential signal. They are usually terminated into a resistance equal to the characteristic impedance, so fairly small.

Rick C.

Reply to
gnuarm.deletethisbit

There's additional complexity with memory technologies above "DDR2" SDRAM, because multiple memories are connected to the same data lines and data strobes, and the data strobe drivers and receivers are constantly changing at the discretion of the memory controller and might not even be on the same DIMM. That is to say, with that memory technology it might not be possible to tell externally which data line is being used for ECC at any particular time.

Reply to
bitrex

Thanks Rick. Do you have any suggestion for the value of such a small resistance?

thanks geos

Reply to
geos


(An edited version of my followup from the original group.)

Interesting problem. Two memory modules need to be hacked. i5 processors apparently use either DDR3 or DDR4. You need a pinout:

formatting link

Then you need to debounce a push button switch and twiddle one bit for one read cycle, regardless of how long the button's pushed. So, you need a timing diagram.
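The "one bit flip per button press" idea boils down to a debouncer followed by a one-shot. A minimal software model of that logic, assuming the button is sampled at a fixed rate; the name `one_shot` and the three-sample stability window are illustrative choices, and a real circuit would also have to synchronize the pulse to the memory clock, which this sketch ignores:

```python
def one_shot(samples, stable_count=3):
    """Return a list with a single 1 for each debounced press in samples.

    samples is a sequence of raw 0/1 button readings. The input must hold
    the same value for stable_count consecutive samples before the
    debounced state changes; a pulse is emitted only on the 0 -> 1 change.
    """
    out = []
    state = 0      # debounced button state
    last_raw = 0   # previous raw reading
    run = 0        # length of the current run of identical readings
    for raw in samples:
        run = run + 1 if raw == last_raw else 1
        last_raw = raw
        pulse = 0
        if run >= stable_count and raw != state:
            state = raw
            if state == 1:
                pulse = 1  # fires once, however long the button is held
        out.append(pulse)
    return out


# A bouncy press, a long hold, and a bouncy release.
readings = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(one_shot(readings))
```

Holding the button longer only extends the run of 1s at the input; the output still contains one pulse per press.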

Oh, BTW, FWIW. My sacrificial mobos take a lot of abuse and still keep on going. Obviously if you connect a mobo directly into a wall outlet, something's going to give. But, other than that, mobos made by Intel seem to contain plenty of protection circuitry to protect the board internals from "public" interface mishaps. Just saying.

Thank you, 73,

--
Don, KB7RPU 
There was a young lady named Bright
Whose speed was far faster than light;
She set out one day
In a relative way
And returned on the previous night.
Reply to
Don KB7RPU


Why do you need to worry about producing exactly ONE error? The idea is to test that the ECC is working, not to test it on individual cycles. The test criteria is for the computer to lock up. If the bit you corrupt is data it may well not lock up but keep merrily along. Mashing a data line is much more likely to corrupt instructions and massive amounts of data, which will invariably cause a software failure.
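A quick way to see what a permanently mashed data line does: it damages every word whose bit on that line differs from the forced level, yet never more than one bit in any single word. A minimal sketch with random 64-bit words standing in for memory contents; the helper name and the choice of line 48 are arbitrary:

```python
import random


def read_with_stuck_line(word: int, bit: int, stuck_value: int) -> int:
    """Return word as read back when data line `bit` is forced to stuck_value."""
    if stuck_value:
        return word | (1 << bit)
    return word & ~(1 << bit)


random.seed(1)
words = [random.getrandbits(64) for _ in range(10_000)]
line = 48  # arbitrary data line held low

corrupted = sum(1 for w in words if read_with_stuck_line(w, line, 0) != w)
worst = max(bin(w ^ read_with_stuck_line(w, line, 0)).count("1") for w in words)

print(f"{corrupted} of {len(words)} words corrupted")
print(f"worst case: {worst} wrong bit(s) in a single word")
```

Roughly half of random words are affected, which is plenty to crash a machine without ECC, while each individual word stays within what single-bit correction can repair.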

Rick C.

Reply to
gnuarm.deletethisbit


Not really. I would want to research the specs and see what they are using for a terminator and how much current they are driving. Then if you have an idea of what the thresholds are you can calculate a resistor. Actually, if I am right about the data lines being differential, you may need to mash the pair in opposite directions to be sure you mess up the data. I suspect the signals are similar to LVDS, but the devil is in the details. Do you know which flavor of SDRAM you are using?
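Once the drive strength and input threshold are known, the resistor calculation is a plain voltage divider. A rough static sketch for the pull-to-ground variant, assuming a single-ended data line (as I understand DDR3 data lines to be, with only the strobes differential). Every number below is a placeholder to be replaced from the datasheet: 1.5 V supply, 34 ohm driver on-resistance, 0.65 V low threshold. Termination and signal dynamics are ignored:

```python
def line_voltage_high(vddq: float, r_on: float, r_pull_down: float) -> float:
    """Static voltage on a line driven high through r_on with a pull-down to ground."""
    return vddq * r_pull_down / (r_on + r_pull_down)


def driver_current(vddq: float, r_on: float, r_pull_down: float) -> float:
    """Current the driver sources while fighting the pull-down, in amps."""
    return vddq / (r_on + r_pull_down)


def max_pull_down(vddq: float, r_on: float, v_low_threshold: float) -> float:
    """Largest pull-down that still drags a driven-high line below the threshold.

    From vddq * R / (r_on + R) < v_low_threshold, solved for R.
    """
    return r_on * v_low_threshold / (vddq - v_low_threshold)


VDDQ = 1.5    # volts, assumed
R_ON = 34.0   # ohms, assumed driver on-resistance
V_LOW = 0.65  # volts, assumed "definitely low" input threshold

r_max = max_pull_down(VDDQ, R_ON, V_LOW)
print(f"pull-down must be below about {r_max:.1f} ohm")
print(f"driver current at that value: {driver_current(VDDQ, R_ON, r_max) * 1e3:.1f} mA")
```

The same divider run the other way gives the pull-up case; the current figure is the number to compare against whatever the driver is rated to deliver.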

Rick C.

Reply to
gnuarm.deletethisbit

I think that for testing I will go with a Micron 1GB ECC DDR3 module first:

formatting link

thanks geos

Reply to
geos

A simple "bit masher" is a lot easier to design too. But my ignorance of the ECC spec leaves me with the open question, "Does ECC correct an

Thank you, 73,

--
Don, KB7RPU 
Reply to
Don KB7RPU

Just open one of the data lines.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin


ECC is a generic term, but in this case I believe the correction covers a word at a time. Any one bit error is corrected and double bit errors (or most of them) are detected, as well as some three bit errors if memory serves. This is optimal when the errors are random and infrequent, such as radiation-caused bit errors.

Flash can have errors that occur in bursts, so they use ECC on a block at a time. I'm not sure how many bits can be corrected, but more important is to detect the errors and flag bad blocks. Flash is actually pretty bad in that regard. Even with the ECC you should never trust anything important to Flash without another backup to your backup.
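The correct-one, detect-two behaviour can be tried out on a 64-bit word with a textbook extended Hamming code. This is only a sketch of the principle: real memory controllers typically use a differently arranged (72,64) code, and the function names here are invented for the example.

```python
def encode(data):
    """Extended Hamming encode a list of 0/1 bits; index 0 holds overall parity."""
    r = 0
    while (1 << r) < len(data) + r + 1:
        r += 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):                 # check bits live at positions 1, 2, 4, ...
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2            # overall parity over everything else
    return code


def decode(code):
    """Return (status, data) with status 'ok', 'corrected' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos            # XOR of set positions is 0 for a valid word
    odd_parity = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1 and syndrome <= n:
        fixed[syndrome] ^= 1           # single error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        status = "uncorrectable"       # even number of flips: detected only
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = 0xDEADBEEFCAFEF00D
data = [(word >> i) & 1 for i in range(64)]
stored = encode(data)                  # 72 bits in total

one_bad = list(stored)
one_bad[48] ^= 1                       # one flipped bit
two_bad = list(one_bad)
two_bad[49] ^= 1                       # a second flipped bit in the same word

for label, cw in (("clean", stored), ("one flip", one_bad), ("two flips", two_bad)):
    print(label, "->", decode(cw)[0])
```

The three cases should come back as ok, corrected, and uncorrectable respectively, with the original data recovered in the first two.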

Rick C.

Reply to
gnuarm.deletethisbit


The video clearly shows a +5V molex connector soldered to (one of the two) memory modules. Do you suppose it powers a switch?

Thank you, 73,

--
Don, KB7RPU 
Reply to
Don KB7RPU

The video seems to show the ECC memory module hack near the start. But it's hard to tell for certain because a big fat label covers up ?four? chips on the left side of the module. The hack itself entails +5V, Ground, a PCB, and two chips on the right side of the module. It's blurry, but one white wire goes down to connect to DQ48, DQ49, or some other data line in that neighborhood.

Thank you, 73,

--
Don, KB7RPU 
Reply to
Don KB7RPU

why only a single cycle? ECC is on a word (=32bits + ECC) by word basis.

Reply to
Jasen Betts


The details of the ECC mobo spec are unknown to me.

Doesn't the processor read 64 bits in a single cycle? Does ECC simultaneously correct two bits, one for each word? Does ECC correct an infinite string of single bit errors within a word? Is this link applicable to the hack?

formatting link

Thank you, 73,

--
Don, KB7RPU 
Reply to
Don KB7RPU
