Continuous EEPROM checksum on a microcontroller

Hi, has anybody come across continuous background checksum tests on EEPROM? Is it worth doing?

Reply to
Vishal

Yes, I do it all the time. Whether it's worth doing depends on whether you need to know if the memory has failed. Probably not for a musical greeting card, probably yes for a radiotherapy dosing controller.

Cheers, Alf

Reply to
Unbeliever
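Alf's approach can be sketched as a slice-at-a-time background check, so each main-loop pass only touches a few bytes. This is a minimal illustration, not anyone's production code: the EEPROM is simulated with a plain array so the logic can run on a host, and the size, slice length, and routine names are invented placeholders.

```c
#include <stdint.h>

#define EEPROM_SIZE    1024u   /* illustrative size */
#define BYTES_PER_PASS   16u   /* bytes checked per main-loop pass */

/* Simulated EEPROM for host-side demonstration; on a real target this
 * would be the device access routine or a memory-mapped read. */
static uint8_t sim_eeprom[EEPROM_SIZE];
static uint8_t eeprom_read_byte(uint16_t addr) { return sim_eeprom[addr]; }

static uint16_t ee_addr;   /* next address to check */
static uint16_t ee_sum;    /* running 16-bit additive checksum */

/* Check a small slice per call. Returns 1 when a full pass completed and
 * the checksum matched the stored reference, -1 on mismatch, and 0 while
 * a pass is still in progress. */
int eeprom_check_step(uint16_t expected_sum)
{
    for (uint16_t i = 0; i < BYTES_PER_PASS; i++) {
        ee_sum += eeprom_read_byte(ee_addr++);
        if (ee_addr == EEPROM_SIZE) {
            int ok = (ee_sum == expected_sum) ? 1 : -1;
            ee_addr = 0;               /* restart the sweep */
            ee_sum  = 0;
            return ok;
        }
    }
    return 0;
}
```

Keeping the per-call slice small bounds the worst-case latency the check adds to the main loop.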

a) I've implemented it. b) I think so.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Reply to
Spehro Pefhany

"Spehro Pefhany" wrote in message news: snipped-for-privacy@4ax.com...

The question is what to do when an error is detected...

*** Keyboard not present *** Press [F1] to continue.

The only thing I once did was a small database with maximum and minimum allowed values and a preferred default for each parameter. These were used to validate user input during setup and for sanity checks during start-up. Out-of-limit values would be put back to their defaults. Silently. When more than three errors were found, the entire set of parameters was reset to defaults, which was useful for production.

I have had some problems with first-generation EEPROMs, but none in the last five years or so, not that I noticed anyway ;)

--
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)
Reply to
Frank Bemelman
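Frank's min/max/default database might look something like the sketch below. The parameter names and limits are invented for illustration; the silent clamp-to-default, and the "more than three errors means reset everything" rule, follow his description.

```c
#include <stdint.h>

/* Per-parameter limits and preferred default, as in Frank's database. */
typedef struct {
    int16_t min;
    int16_t max;
    int16_t def;
} param_limits_t;

static const param_limits_t limits[] = {
    /* min,  max, default -- illustrative entries only */
    {    0,  100,   50 },  /* e.g. pump speed, %   */
    {  -40,  125,   25 },  /* e.g. temperature, C  */
    {    1,   10,    3 },  /* e.g. retry count     */
};
#define NUM_PARAMS (sizeof limits / sizeof limits[0])

/* Silently put out-of-limit values back to their defaults. Returns the
 * number of parameters repaired; if more than three were bad, the whole
 * set is reset to defaults. */
unsigned sanitize_params(int16_t params[NUM_PARAMS])
{
    unsigned errors = 0;
    for (unsigned i = 0; i < NUM_PARAMS; i++) {
        if (params[i] < limits[i].min || params[i] > limits[i].max) {
            params[i] = limits[i].def;
            errors++;
        }
    }
    if (errors > 3) {                  /* too much damage: reset all */
        for (unsigned i = 0; i < NUM_PARAMS; i++)
            params[i] = limits[i].def;
    }
    return errors;
}
```

The same table can drive input validation during setup, so limits live in exactly one place.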

Can be worthwhile in a high-reliability system. Compared to the usual power-on checksum, it has the advantage of testing in the actual execution environment: elevated device temperatures, supply voltage levels and noise affected by active peripherals, etc. In a dialysis machine years ago, I tested ROM and RAM in a background task. The RAM was tested a byte at a time, disabling interrupts briefly while the RAM contents were modified.

Assuming the device is on an external bus, you're really testing all of the signals on the bus - decodes, control lines, address and data - so failures can be from problems other than in the memory itself.

--
Jim McGinnis
Reply to
Jim McGinnis
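Jim's byte-at-a-time RAM test, with interrupts disabled only while the byte's contents are modified, can be sketched roughly as follows. The interrupt macros are placeholders for whatever primitives the real platform provides; here they expand to nothing so the logic can run on a host.

```c
#include <stdint.h>

/* Placeholders for the platform's real primitives,
 * e.g. __disable_irq()/__enable_irq() on a Cortex-M. */
#define DISABLE_IRQ()  /* platform-specific */
#define ENABLE_IRQ()   /* platform-specific */

/* Test one byte non-destructively with complementary patterns, restoring
 * the original value. Interrupts are off only for this brief window.
 * Returns 1 if the cell stored both patterns correctly, 0 on failure. */
int ram_test_byte(volatile uint8_t *p)
{
    int ok;
    DISABLE_IRQ();
    uint8_t saved = *p;
    *p = 0x55; ok  = (*p == 0x55);
    *p = 0xAA; ok &= (*p == 0xAA);
    *p = saved;                       /* put the original value back */
    ENABLE_IRQ();
    return ok;
}
```

A background task would walk this across the RAM one address per call, much like the EEPROM sweep.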

Loss of calibration data can be the equivalent of a hard failure: there are no obvious default values to set it to.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Reply to
Spehro Pefhany

"Spehro Pefhany" wrote in message news: snipped-for-privacy@4ax.com...

Yes, for calibration data some redundancy is nice, and an auto-repair facility. Or a calibration certificate, on luxury paper with interesting stamps all over it.. or display a nag-screen "calibration required" ;)

--
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)
Reply to
Frank Bemelman

If the integrity requirements for your system indicate that such checking is useful, then it is definitely worth doing. As others have indicated, it can catch problems with shared bus address and/or data pathways. The most difficult thing about deciding to use a continuous EEPROM checksum is working out what to do if you discover a problem.

--
********************************************************************
Paul E. Bennett ....................
Reply to
Paul E. Bennett

The answer to this, like to all truly interesting questions, is: "It depends".

In the case at hand, it depends on things like:

1) What's more likely: the EEPROM to fail in operation, but without triggering a watch-dog reset or other safeguards; or your EEPROM checking routine itself to fail, for some unrelated reason?

2) Can you afford the extra CPU load for doing this?

3) Does it pay off? Or is it perhaps even required?

And remember: be careful asking questions if you can't do anything about the answers. In other words, if the only known cure is worse than the disease, it may not be worth diagnosing the disease.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

Hi,

The most important question when asking whether any sort of error checking is worthwhile is what you do if the answer is that there is a problem. If you cannot do something practical/realistic with the answer, then do not bother. Anything that can actually be checked, and whose result can realistically be used to improve reliability, is worthwhile.

Regards, Anton Erasmus

Reply to
Anton Erasmus

It can be useful to ask the question even if nothing can be done about the fault condition other than letting the system die (hopefully with the outputs all set to a safe state).

As for errors in the checking code itself: this should be simple enough that a Fagan Inspection can determine whether it is flawed or operating correctly. There are other sanity-checking measures that can be taken to ensure the system is generally behaving, too.

--
********************************************************************
Paul E. Bennett ....................
Reply to
Paul E. Bennett

Thanks, I am doing it. I can't afford to overlook a wrong calibration. Thanks.

Reply to
Vishal

They ask for it, they pay for it --- that's obviously worth something.

But to be sure, ask your customer. If they thought of requesting such a feature by themselves, odds are they have reasons.

Seems your customer knows more about the project than you do --- this might be a danger to your long-term business prospects if you don't get up to speed fast.

Reply to
Hans-Bernhard Broeker

I would ask for the justification behind the assumption that the place you get the data to be written to the register from is less likely to be corrupt than the data already in the register.

You might wish to look at my resume (see below) and then call me in for a one-day requirements review.

--
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
Reply to
Guy Macon

OK, I will rephrase the question. Is the register area of a micro more reliable than static RAM? I got this guideline doc (requirements doc, as they call it) which asks for register refresh. Since the document is quite old, I was wondering if it still makes sense to do register refresh.

Thanks.

Reply to
Vishal

As the data there is continually changing, keeping a checksum is not practical.

Some old and new mainframes have ECC error correction (1 or 2 bits per word) for the external RAM. A background task ("scrub") reads and rewrites the memory, preventing correctable 1- and 2-bit errors from accumulating into 3-bit errors that cannot be corrected. For that kind of "refresh" of external RAM one would have to implement the ECC in software. For registers in the controller it is not practical either.

Regards, JRD

Reply to
Rafael Deliano
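To make the scrubbing idea concrete in software, here is a toy single-error-correcting Hamming(7,4) code with a scrub pass over an array. Real mainframe ECC uses wider SEC-DED codes in hardware, so this is only a sketch of the principle: read each word, correct any single-bit error, and write the clean word back before a second error can accumulate.

```c
#include <stdint.h>

/* Encode a 4-bit nibble into a 7-bit Hamming codeword.
 * Bit layout (LSB first): p1 p2 d0 p3 d1 d2 d3. */
uint8_t hamming74_encode(uint8_t d)
{
    d &= 0x0F;
    uint8_t d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;
    uint8_t p2 = d0 ^ d2 ^ d3;
    uint8_t p3 = d1 ^ d2 ^ d3;
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p3 << 3 |
                     d1 << 4 | d2 << 5 | d3 << 6);
}

/* Decode a 7-bit codeword, correcting any single-bit error. */
uint8_t hamming74_decode(uint8_t c)
{
    uint8_t s1 = ((c >> 0) ^ (c >> 2) ^ (c >> 4) ^ (c >> 6)) & 1;
    uint8_t s2 = ((c >> 1) ^ (c >> 2) ^ (c >> 5) ^ (c >> 6)) & 1;
    uint8_t s3 = ((c >> 3) ^ (c >> 4) ^ (c >> 5) ^ (c >> 6)) & 1;
    uint8_t syn = (uint8_t)(s1 | s2 << 1 | s3 << 2);
    if (syn)                              /* syndrome = error position 1..7 */
        c ^= (uint8_t)(1u << (syn - 1));
    return (uint8_t)(((c >> 2) & 1) | ((c >> 4) & 1) << 1 |
                     ((c >> 5) & 1) << 2 | ((c >> 6) & 1) << 3);
}

/* One scrub pass: correct and rewrite every codeword in 'mem'. */
void scrub(uint8_t *mem, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        mem[i] = hamming74_encode(hamming74_decode(mem[i]));
}
```

On a real system the scrub would run a few words per tick, just like the background checksum.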

It is fully practical (for the chip maker) and it has been used, at least in the past. Some ECL (Emitter Coupled Logic) processors, which definitely are not sensitive to radiation, had parity bits on internal data paths and internal scratch-pad memories.

Even on some ALU operations it makes sense to calculate the parity. For instance, on addition, compute the parity of each operand separately, combine the two parity bits with the parity of the carry chain (from the full adder), and compare the result with the parity calculated from the ALU sum.

Such techniques are currently used in various satellites, where cosmic radiation flips individual memory cells.

Paul

Reply to
Paul Keinanen
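The parity-prediction trick rests on the identity that, for a ripple-carry addition, parity(sum) equals parity(a) XOR parity(b) XOR parity(carry-in vector), since each sum bit is a_i XOR b_i XOR c_i. In hardware the carries fall out of the adder's carry chain for free; this host-side sketch recomputes them bit by bit just to demonstrate the check.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of an 8-bit value. */
static uint8_t parity8(uint8_t v)
{
    v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1;
}

/* Carry-in vector of a + b: bit i is the carry into bit position i
 * (carry-in to bit 0 is 0; the final carry-out is dropped, matching a
 * result truncated to 8 bits). */
static uint8_t carries8(uint8_t a, uint8_t b)
{
    uint8_t c = 0, cin = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        c |= (uint8_t)(cin << i);
        cin = (uint8_t)((ai & bi) | (ai & cin) | (bi & cin));  /* majority */
    }
    return c;
}

/* Returns 1 if the ALU result 'sum' passes the parity check, 0 if not. */
int check_add_parity(uint8_t a, uint8_t b, uint8_t sum)
{
    uint8_t predicted = parity8(a) ^ parity8(b) ^ parity8(carries8(a, b));
    return predicted == parity8(sum);
}
```

A single bit flip anywhere in the sum changes its parity, so the check catches exactly the odd-bit-count errors parity schemes are meant to detect.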

It is the job of the person making the request to prove any reliability differences between register and RAM. It is your job to indentify any outdated requirements. A guideline is not a requirement and a requirement is not a guideline. You really, really need an experienced embedded systems engineer to do a one-day requirements review and later to do a one-day design review. You are in dire need of help. The cost of an airplane ticket and of hiring a consultant is cheap compared to the cost of project failure, and I fear that project failure is where you are headed. Get help.

--
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
Reply to
Guy Macon

I did some radiation effects testing on PICs a while ago. From memory, on the particular PICs I tested, random single-bit errors (soft errors) were just as likely in SRAM as in registers.

PROM/EPROM is more reliable for soft errors, but develops hard errors over time (though it took a lot of exposure).

The test involved running checksum tests on program PROM and registers, checkpoints to catch flipped PC bits, and testing for watchdog resets. One version of the test also covered serial EEPROM. The chips were put in a beam line and exposed until none of them worked.

For high-reliability applications, it's a good idea to program very defensively and use the watchdog reset. Basically, expect single-bit errors in calculations and flipped bits in registers, SRAM, and the PC.

There was an article (referenced on Slashdot or embedded.com, I think) about programming the space shuttle computer. Much of the effort went into code reviews and talking through what-ifs, e.g. "if I detect an error here, what do I do?"

See ya, -ingo

--
/* Ingo Cyliax, cyliax@ezcomm.com, Tel: 812-391-0895 */
Reply to
Ingo Cyliax
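For running checksum tests of the kind Ingo describes, a CRC is usually preferred over a plain additive sum because it also catches burst and multi-bit errors. A byte-at-a-time CRC-16/CCITT step (polynomial 0x1021, MSB-first, initial value 0xFFFF) that a background task can call once per byte might look like this:

```c
#include <stdint.h>

/* Feed one byte into a running CRC-16/CCITT (poly 0x1021, MSB-first).
 * A background task calls this per byte of program memory and compares
 * the final value against a stored reference once the sweep completes. */
uint16_t crc16_step(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)byte << 8;
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}
```

Starting from 0xFFFF, the standard check string "123456789" yields 0x29B1, which makes a handy power-on self-test of the CRC routine itself.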

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.