Creating RAM faults


Brendan 19 years ago

Hi,

I'm working on 80x86 desktop PCs (and I've never done anything with embedded systems), however I have a strange problem that I think the people here are best qualified to help me with.

I've written some code to test RAM for faults, but have no faulty RAM to test it on. I'm looking for ways to create RAM faults in good RAM.... :-)

To make this sound even more silly, I can't necessarily disable the BIOS's memory test in most of my test computers. Some of them have a "fast boot" option that uses a minimal memory test instead of a more thorough test, and some of them have a "press ESC to skip" option. In any case I'd need to create faults that pass the BIOS's test in cases where it's unavoidable.

Mostly what I'm hoping for is intermittent and/or subtle RAM faults that usually require burn-in testing to detect, although any fault that passes the BIOS's tests would be very helpful.

So far I've thought of 3 methods that may or may not work.

The first is finding a computer that allows the RAM settings to be overclocked in the BIOS, and then overclocking the RAM until it fails. Unfortunately, I've tried setting the fastest possible RAM timing on 4 of my test machines, which didn't change anything (no sign of unreliability).

The next method would be to put small pieces of paper (or some other material) between the RAM and its socket to create artificially unreliable connections on selected pins. Alternatively, I could place "things" (resistors and/or small capacitors perhaps) between pins (although I wouldn't want to damage the memory controller, unless it was a pre-planned "once only" final test).

The last method involves taking some good memory and permanently damaging it. In this case I'd be looking at damaging some spare 72-pin EDO, but I have no idea what the best way to damage the RAM would be - it'd be hard to predict what effect zapping it with some static (or a known voltage that exceeds its specifications) would have.

If anyone has some comments, suggestions or advice I'd be interested in hearing them... :-)

Thanks,

Brendan


Jim Granville 19 years ago

'Hard' or fixed errors should be the easiest to test for, so you do not want to damage or zap a fixed line. The most comprehensive ram tests cover moving patterns, that catch 'ram crosstalk', where a write corrupts another cell, not just lossy cells themselves.
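To illustrate why a moving pattern catches crosstalk that a plain write-then-read pass can miss, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `CoupledRAM` is a hypothetical software model in which a 0-to-1 transition of bit 0 in one "aggressor" cell flips bit 0 of a "victim" cell, and `walking_test` is a simplified walking pattern, not any particular published test. It is quadratic in the memory size, so it is only practical on small regions.

```python
class CoupledRAM:
    """Hypothetical model of RAM with one coupling fault: a 0->1 transition
    of bit 0 in the aggressor cell flips bit 0 of the victim cell."""

    def __init__(self, size, aggressor, victim):
        self.cells = bytearray(size)
        self.aggressor = aggressor
        self.victim = victim

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, addr):
        return self.cells[addr]

    def __setitem__(self, addr, value):
        rising = (self.cells[addr] & 1) == 0 and (value & 1) == 1
        self.cells[addr] = value & 0xFF
        if addr == self.aggressor and rising:
            self.cells[self.victim] ^= 0x01  # crosstalk into the victim cell


def static_pattern_test(mem, pattern=0x55):
    """Write every cell, then read every cell back. Returns failing addresses."""
    for addr in range(len(mem)):
        mem[addr] = pattern
    return [addr for addr in range(len(mem)) if mem[addr] != pattern]


def walking_test(mem, background=0x00, probe=0xFF):
    """Move a probe value through memory; after each probe write, check that
    every other cell still holds the background value.
    Returns (written address, disturbed address) pairs."""
    size = len(mem)
    for addr in range(size):
        mem[addr] = background
    failures = []
    for addr in range(size):
        mem[addr] = probe
        for other in range(size):
            if other != addr and mem[other] != background:
                failures.append((addr, other))
                mem[other] = background  # repair so later steps stay meaningful
        mem[addr] = background
    return failures


if __name__ == "__main__":
    print("static :", static_pattern_test(CoupledRAM(16, aggressor=5, victim=9)))
    print("walking:", walking_test(CoupledRAM(16, aggressor=5, victim=9)))
```

In this model the static pass overwrites the victim after it has been disturbed, so the damage is hidden; the walking pattern checks the other cells after every write and reports which write disturbed which cell.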

So, I'd take a ram module, and remove all the decoupling caps :)

-jg


Brendan 19 years ago

This sounds good, but I can't find any decoupling capacitors!

I've looked at some SDRAM, some 72-pin EDO and some old 32-pin stuff - in all cases there's either 16 or 8 surface mount ICs and nothing else. I'm not sure if the decoupling capacitors are built into the ICs, or if I should be looking on the motherboard itself (which doesn't make as much sense as "inbuilt" to me, but...).

The other thing I was wondering is where the refresh comes from in 72-pin DRAM modules - if there's an external refresh signal, then I'd assume disconnecting it would cause some RAM faults. Is there a diagram somewhere showing what each of those 72 pins is for?

Thanks,

Brendan


Tom Lucas 19 years ago

Quick Google....

formatting link

You could open circuit one of the RAS or CAS lines and that will give some funky fun.


Brendan 19 years ago

Google....

formatting link

Ok - I found these:

formatting link

The latter diagram looks like a newer version of the same thing (previous N/C pins used for additional RAS and presence detect lines). Neither shows anything that looks (to me) like a "bypass-able" refresh - I'm assuming the RAS lines are used instead of something separate.

Hmm - catastrophic funkiness, or subtle funkiness that'd get past BIOS tests? :-)

Thanks,

Brendan


Cupid 19 years ago

Software-based fault injection methods are best suited to situations where faulty chips are not available. For example, stuck-at and coupling faults can be simulated by fault injection. (see www3.informatik.uni-erlangen.de/Publications/Articles/hoexer_hase2005.pdf and

formatting link

Here's a brief overview of tools and techniques for fault injection (dslab.epfl.ch/courses/pods/winter06-07/readings/hsueh-injection.pdf). Also see "SWIF-IT: A Tool for Memory Fault Injection and Protection"

formatting link

VLSI test literature includes extensive coverage of memory tests. Here are some main types of faults you should test for:

permanent faults:
- missing/extra electrical connections
- faulty components
- burnt-out chip wire
- corroded connection between chip and package
- logic errors

transient faults:
- temporary open/short due to air pollution/humidity/pressure/vibrations
- temporary logic malfunction due to high or low temperatures/power supply fluctuations
- signal coupling due to electromagnetic interference
- state changes due to static electrical discharges
- misinterpretation of logic values because of ground loops

intermittent faults:
- loose connections
- old components causing changing signal arrival times
- hazards and races in critical paths (poor design)
- timing faults due to varying resistors, capacitors, and inductors
- state changes due to electrical noise

Several different memory tests (MATS, MATS+, MATS++, MARCH X, MARCH C-, MARCH A, MARCH B, MARCH Y) can be used to achieve different levels of fault coverage.
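As a rough illustration of how the march tests named above are structured, here is a Python sketch of a generic march runner with MATS+ and March C- written as tables of elements. The memory is modelled as a bit-addressable sequence, and `StuckAtRAM` is a hypothetical stand-in for a faulty chip; a real tester would work on words and physical addresses, so treat this as a sketch under those assumptions.

```python
UP, DOWN = 1, -1

# Each march element is (address order, operations applied to every cell).
# "w0"/"w1" write a bit; "r0"/"r1" read and expect that bit.
MATS_PLUS = [
    (UP, ["w0"]),
    (UP, ["r0", "w1"]),
    (DOWN, ["r1", "w0"]),
]

MARCH_C_MINUS = [
    (UP, ["w0"]),
    (UP, ["r0", "w1"]),
    (UP, ["r1", "w0"]),
    (DOWN, ["r0", "w1"]),
    (DOWN, ["r1", "w0"]),
    (UP, ["r0"]),
]


def run_march(mem, elements):
    """Apply a march test to a bit-addressable memory.
    Returns a list of (address, failed read operation) pairs."""
    size = len(mem)
    failures = []
    for order, ops in elements:
        addrs = range(size) if order == UP else range(size - 1, -1, -1)
        for addr in addrs:
            for op in ops:
                bit = int(op[1])
                if op[0] == "w":
                    mem[addr] = bit
                elif mem[addr] != bit:
                    failures.append((addr, op))
    return failures


class StuckAtRAM:
    """Hypothetical bit memory in which one cell always reads a fixed value."""

    def __init__(self, size, stuck_addr, stuck_value):
        self.bits = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value

    def __len__(self):
        return len(self.bits)

    def __getitem__(self, addr):
        return self.stuck_value if addr == self.stuck_addr else self.bits[addr]

    def __setitem__(self, addr, value):
        self.bits[addr] = value


if __name__ == "__main__":
    print(run_march(StuckAtRAM(8, stuck_addr=3, stuck_value=1), MARCH_C_MINUS))
```

Keeping the tests as data makes it easy to add the other variants as further tables and compare what each one reports on the same simulated fault.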


Vladimir Vassilevsky 19 years ago

You can always simulate the faults by the software :-)
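One way to read "simulate the faults by the software" is to run the test routines against a wrapper that misbehaves on purpose. The Python sketch below is an assumption-laden illustration, not a description of any existing tool: `FaultyRAM`, its `stuck_high`, `stuck_low` and `flaky` tables, and the `burn_in` helper are all hypothetical names. The intermittent fault is modelled as a bit that flips on a read with a given probability, which is the kind of fault that needs repeated passes to show up.

```python
import random


class FaultyRAM:
    """Hypothetical byte memory that injects faults when it is read."""

    def __init__(self, size, seed=0):
        self.cells = bytearray(size)
        self.stuck_high = {}  # addr -> mask of bits that always read as 1
        self.stuck_low = {}   # addr -> mask of bits that always read as 0
        self.flaky = {}       # addr -> (bit mask, probability of a flip per read)
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def __len__(self):
        return len(self.cells)

    def __setitem__(self, addr, value):
        self.cells[addr] = value & 0xFF

    def __getitem__(self, addr):
        value = self.cells[addr]
        value |= self.stuck_high.get(addr, 0)
        value &= ~self.stuck_low.get(addr, 0) & 0xFF
        mask, probability = self.flaky.get(addr, (0, 0.0))
        if mask and self.rng.random() < probability:
            value ^= mask  # intermittent bit flip
        return value


def pattern_pass(mem, pattern):
    """Fill memory with one pattern and read it back. Returns failing addresses."""
    for addr in range(len(mem)):
        mem[addr] = pattern
    return [addr for addr in range(len(mem)) if mem[addr] != pattern]


def burn_in(mem, passes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Repeat the pattern passes many times. Returns {address: failure count}."""
    counts = {}
    for _ in range(passes):
        for pattern in patterns:
            for addr in pattern_pass(mem, pattern):
                counts[addr] = counts.get(addr, 0) + 1
    return counts


if __name__ == "__main__":
    ram = FaultyRAM(32, seed=1)
    ram.stuck_high[2] = 0x04        # bit 2 of cell 2 stuck at 1
    ram.flaky[10] = (0x01, 0.05)    # bit 0 of cell 10 flips on about 5% of reads
    print("single pass:", pattern_pass(ram, 0x00))
    print("burn-in    :", burn_in(ram, passes=500))
```

The stuck bit fails on every pass that writes a 0 into it, while the flaky cell may pass a single quick check and only accumulate failures over a long run, which mirrors the difference between a BIOS-style test and burn-in testing.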

The only things I can think of are to tweak the power supply voltage on the RAM, and/or add capacitors to the ADDR/DATA lines to mess up the signal integrity. You can also play with the line terminators.

Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant

formatting link


Mike Anton 19 years ago

You could try freezing it. Most commercial temperature RAM will start to have problems around -40C. It's worth a try at least.

Mike


Hans-Bernhard Bröker 19 years ago

So you want to make your RAM ill, but only if and when it's convenient.

Well, a dose of radiation can do that. Getting yourself and your lab licensed to work with the kind of radioactive substances that would be sufficiently disruptive for an efficient test might be too expensive, though. I participated in a rather extreme version of such a test, once: we used a full-blown particle accelerator shooting heavy ions to test-fry some ICs considered for use in the International Space Station. Believe me: that beast can disrupt pretty much everything electronic. The tricky part in such an experiment actually is to keep transient defects caused by localized short-circuits induced by the beam from turning into chip-wide, everlasting damage.

For a less dramatic and lasting effect, forget about damaging the RAM itself --- just disturb its communication with the CPU. The CPU won't be able to tell if incorrect signals originated in the RAM itself, or got garbled underway. An electrostatic discharge close to (but not! onto) the board often works. The easiest way to get one is a piezo-based lighter.


Paul Keinanen 19 years ago

Early DRAMs (4-16 Kibit) had a lot of soft errors, so you had to use parity or ECC even with small boards (a few dozen chips). These soft errors were caused by alpha particles and later on, it was discovered that these particles were emitted by the plastic used in the ICs. Changing the plastic reduced the problem very much.

Getting hold of an alpha source should be enough to cause soft errors even in current chips. However, even a sheet of paper will stop alpha particles, so the plastic around the actual chip would have to be removed to hit the memory capacitors or sense lines directly.

Paul


CBFalconer 19 years ago

... snip ...

I used to keep a stock of bogey RAM chips for just that purpose. Whenever final test of a memory system showed up bad chips, we kept them in the bogey bin. Some were only uncovered by thrashing with a separate parity board. The parity interrupt software detailed the chip and bit, so we could use it in the field to uncover marginal things.
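The post does not say how the parity interrupt software worked out the chip and bit, so the following Python sketch is only a guess at the general idea: compare the expected and observed data to find the failing bit, and use the address to find the bank. The layout (one chip per data bit, 64 KiB banks) and the name `locate_fault` are assumptions made purely for illustration.

```python
def locate_fault(addr, expected, observed, bank_size=0x10000, width=8):
    """Return (bank, bit) pairs for every data bit that differs.

    Assumes a hypothetical board with one memory chip per data bit and
    banks of `bank_size` addresses; real boards differ.
    """
    diff = (expected ^ observed) & ((1 << width) - 1)
    bank = addr // bank_size
    return [(bank, bit) for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Expected 0x55 at address 0x12345 but read 0x5D: one data bit differs.
    print(locate_fault(0x12345, expected=0x55, observed=0x5D))
```

On a board laid out this way, each (bank, bit) pair would map to a single socket, which is what makes a bin of known-bad chips easy to build up.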

"A man who is right every time is not likely to do very much." -- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action." -- Thomas Matthews


Colin Paul Gloster 19 years ago

Hans-Bernhard_Broeker posted: "[..]

Well, a dose of radiation can do that. Getting yourself and your lab licensed to work with the kind of radioactive substances that would be sufficiently disruptive for an efficient test might be too expensive, though. [..]

[..]"

Many possibilities have been mentioned to Brendan, and I will continue with the topic of radiation whether or not it is ideal for his situation.

A cheap way to access a radiation source for Single Event Effects is by being nice to a hospital's staff and asking for permission to subject your hardware to an x-ray machine.

Brendan mentioned desktops, but if a laptop or notebook would be acceptable, the chance of memory being disruptively zapped by neutrons is significantly higher in an airplane than at sea level or at the altitudes of spacecraft.

Regards, Colin Paul Gloster


Vladimir Vassilevsky 19 years ago

I admire theoreticians.

Have you ever tried to apply X-rays to a semiconductor device? BTW, I did. And here is what happens:

Up to some critical exposure, everything looks normal. After the threshold, there is a massive permanent damage.

Will you please separate your groundless speculations from what you really know.

Vladimir Vassilevsky



Tom Lucas 19 years ago

In spite of all the other excellent suggestions, I usually find the best way to induce a fault in a device is to be absolutely dependent on it and have no means of replacing it. To increase the chance of failure, tell your boss that you absolutely guarantee you'll meet your deadline. :-)


Colin Paul Gloster 19 years ago

Dear Vladimir Vassilevsky,

Vladimir Vassilevsky posted on Thu, 15 Feb 2007 15:32:03 GMT on Usenet:

"Colin Paul Gloster wrote:

[..]

I admire theoreticians."

Theoreticians often make models of disaster for radiation affecting electronics, whereas empirical results have shown otherwise. Nobody knows what really happens, but interpretations are made of data.

"Have you ever tried to apply X-rays to a semiconductor device?"

Not personally and I prefer to leave the dangerous work to people who do not mind destroying their health.

"BTW, I did. And here is what happens:

Up to some critical exposure, everything looks normal. After the threshold, there is a massive permanent damage."

This does not always happen. This does not always happen with gamma rays either, but the suggestion of x-rays from a hospital seemed to be easier for Brendan. People are very susceptible to making simple-minded -- incorrect but not necessarily harmful -- generalizations re radiation and safety.

"> Brendan mentioned desktops, but if a laptop or notebook would be

Will you please separate your groundless speculations from what you really know."

No thing exists such that someone exists such that someone knows that thing. As knowledge is impossible, anything which is either groundless speculation of mine or what I really know is groundless speculation. I used to say "maybe" in almost every sentence I spoke as an infant, but I have stopped doing so even though it ideally fits into everything one says.

Spacecraft electronics are not particularly prone to Single Event Effects (S.E.E.s, see if I might have made that term up) induced by neutrons. A neutron is uncharged and so does not readily interact with matter, and a neutron with enough energy to cause an S.E.E. would be traveling so quickly that it has an infinitesimal probability of striking a particle in an atom it is passing through: the summed volume of the electrons and nucleons in an atom is exceedingly small in relation to the volume of the atom (in much the same way that the summed volume of the planets in the solar system is much smaller than the volume of the solar system). S.E.E.-inducing neutrons are not mentioned in the list of the types of particles handled by SPENVIS on

formatting link

(N.B. "the current models of the neutral atmosphere" mentioned on that webpage have nothing to do with energetic, S.E.E.-inducing neutrons), and I plan to follow up to this post with an attachment of a censored file from SPENVIS re particles for radiation for S.E.E.s. N.B. I do not wish to give SPENVIS undue credibility in this paragraph: I have noticed a number of inaccuracies (mildly expressed!), e.g. as I had mentioned on 25 Aug 2005 15:08:22 GMT on

formatting link

: "I have a warning about SPENVIS's critical charge-based SEU calculator with default settings. Today I ran this SEU calculator on a non-SSETI SPENVIS project I had worked on for course work (I gave you a copy of the course work's report), and it claimed SEU rates which were at least three orders of magnitude worse than SEU rates calculated (and agreed by the examiner) for the same SPENVIS project (which was based on a real mission which has been flying for years and has not been suffering SEU rates similar to those from SPENVIS's critical charge-based SEU calculator).

The radiation results from SPENVIS are not to be discounted: the inaccuracies mentioned above are specific to one of the SEU-calculating sections of SPENVIS."

I had mentioned other points re memory errors induced on spacecraft by radiation on:

formatting link

and

formatting link

and

formatting link

but neutrons were so insignificant that I had not mentioned them there.

I mentioned the idea of a hospital for radiation as a cheapskate who does not believe that supposedly space qualified devices are truly of good enough quality to use in space told me that his company managed to avoid spending a lot of money by using a device with a radiation source in a hospital. To be honest, off the top of my head I do not remember whether it was an x-ray machine, but I can check, but I do remember the following well enough (because my telephone logs for 2005 are within an arm's reach) to post about it now: inspired by that cheapskate, I sought to investigate how to access medical radiation equipment on July 25th, 2005. However, for whatever reason (perhaps photons were not important for that orbit; or perhaps because I wanted to determine how much shielding for other particles for estimated radiation energies of protons/ions/electrons would be needed (unlike photons, they can be completely shielded against)), I did not want photons but unfortunately Prof. E. K. J. Pauwels in the Division of Nuclear Medicine in the Department of Radiology in the Leiden University Medical Center informed me that in the medical field, x-rays are the only type of radiation used when I phoned him on July 25th, 2005 (not that he would recognize my name - no point in telling him my name with the conversation so abruptly terminated with an unsatisfactory outcome).

Yours sincerely, Colin Paul Gloster


Paul E. Bennett 19 years ago

[%X]

You roused some interesting and even wacky responses with this one.

With a little hardware interposition between the (good) RAM and the PC (some buffers and gates on all the lines) you could create a RAM Fault Simulator. By adjusting strobe delays, data-bit delays or just plain swapping or blocking of address or data-lines you could create the sort of faults you would be interested in. You would have a few lines off the card in order to apply the fault conditions when you needed to. Just remember to use the appropriate logic family components for the memory bus.
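Before building such a card, the blocking part of the idea can be tried out in software. The Python sketch below is a hypothetical stand-in for the interposer: `InterposedRAM` sits in front of good storage and can force one address line or one data line low on request, and two simple bus tests show what each fault looks like from the CPU side. All names are invented for this example, and strobe or data-bit delays are not modelled.

```python
class InterposedRAM:
    """Hypothetical software model of an interposer between the bus and good RAM."""

    def __init__(self, address_bits):
        self.address_bits = address_bits
        self.cells = bytearray(1 << address_bits)
        self.blocked_address_line = None  # index of an address line forced low
        self.blocked_data_line = None     # index of a data line forced low on reads

    def _route(self, addr):
        if self.blocked_address_line is not None:
            addr &= ~(1 << self.blocked_address_line)
        return addr

    def write(self, addr, value):
        self.cells[self._route(addr)] = value & 0xFF

    def read(self, addr):
        value = self.cells[self._route(addr)]
        if self.blocked_data_line is not None:
            value &= ~(1 << self.blocked_data_line) & 0xFF
        return value


def address_line_test(ram):
    """Return the address lines whose power-of-two address aliases address 0."""
    bad = []
    for line in range(ram.address_bits):
        ram.write(0, 0x55)
        ram.write(1 << line, 0xAA)
        if ram.read(0) != 0x55 or ram.read(1 << line) != 0xAA:
            bad.append(line)
    return bad


def data_line_test(ram, addr=0):
    """Walk a single 1 across the data bus; return the bits that fail."""
    bad = []
    for bit in range(8):
        ram.write(addr, 1 << bit)
        if ram.read(addr) != 1 << bit:
            bad.append(bit)
    return bad


if __name__ == "__main__":
    ram = InterposedRAM(address_bits=10)
    ram.blocked_address_line = 3   # "apply the fault condition" on demand
    print("address lines failing:", address_line_test(ram))
    ram.blocked_address_line = None
    ram.blocked_data_line = 6
    print("data lines failing   :", data_line_test(ram))
```

One thing this model makes visible: a fixed swap of two address lines is a one-to-one remapping applied to both reads and writes, so in a model like this it would not be seen by software at all, whereas a blocked line causes aliasing that a test can detect.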

********************************************************************
Paul E. Bennett ....................
Forth based HIDECS Consultancy .....
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-811095
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************


Hans-Bernhard Bröker 19 years ago

That may not be as easy as you think. X-Rays meant to make images are optimized for the wrong kind of usage. They aren't point-like enough to create truly localized defects.


Grant Edwards 19 years ago

Therapeutic X-ray sources are a lot more focussed (and stronger). Of course in the die-layout realm the therapeutic sources aren't at all point-like either. You should be able to cook something pretty good with one though...

Grant Edwards (grante at visi.com)
Yow! I'm pretending that we're all watching PHIL SILVERS instead of RICARDO MONTALBAN!


Hans-Bernhard Bröker 19 years ago

... but for someone who had to ask the original question, the odds of getting this plastic removal done without causing accidental damage to the chip are just about negligible.

That was one of the reasons why we used a particle accelerator. It shoots atomic nuclei a good deal heavier than alphas, and at a specified energy per nucleon higher than most alpha sources, so you can control just how deep they will penetrate. And it's a thin ray, so you can even control where you hit. The drawback: there are only a handful of machines of this kind, worldwide, and getting ray time there is hard.


CBFalconer 19 years ago

All you need is a Wimshurst machine, a suitable ion generator, a vacuum column, and suitable vacuum pumps. The result is called a Van de Graaff machine. :-)

