Continuous EEPROM checksum on a microcontroller

Hi, has anybody come across continuous background checksum tests on EEPROM? Is it worth doing?

Reply to
Vishal

Yes, I do it all the time. Whether it's worth doing depends on whether you need to know if the memory has failed. Probably not for a musical greeting card, probably yes for a radiotherapy dosing controller.

Cheers, Alf

Reply to
Unbeliever
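Alf's approach can be sketched as a slice-at-a-time background check, so each main-loop pass only touches a few bytes. This is a minimal illustration, not anyone's production code: the EEPROM is simulated with a plain array so the logic can run on a host, and the size, slice length, and routine names are invented placeholders.

```c
#include <stdint.h>

#define EEPROM_SIZE    1024u   /* illustrative size */
#define BYTES_PER_PASS   16u   /* bytes checked per main-loop pass */

/* Simulated EEPROM for host-side demonstration; on a real target this
 * would be the device access routine or a memory-mapped read. */
static uint8_t sim_eeprom[EEPROM_SIZE];
static uint8_t eeprom_read_byte(uint16_t addr) { return sim_eeprom[addr]; }

static uint16_t ee_addr;   /* next address to check */
static uint16_t ee_sum;    /* running 16-bit additive checksum */

/* Check a small slice per call. Returns 1 when a full pass completed and
 * the checksum matched the stored reference, -1 on mismatch, and 0 while
 * a pass is still in progress. */
int eeprom_check_step(uint16_t expected_sum)
{
    for (uint16_t i = 0; i < BYTES_PER_PASS; i++) {
        ee_sum += eeprom_read_byte(ee_addr++);
        if (ee_addr == EEPROM_SIZE) {
            int ok = (ee_sum == expected_sum) ? 1 : -1;
            ee_addr = 0;               /* restart the sweep */
            ee_sum  = 0;
            return ok;
        }
    }
    return 0;
}
```

Keeping the per-call slice small bounds the worst-case latency the check adds to the main loop.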

a) I've implemented it. b) I think so.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Reply to
Spehro Pefhany

"Spehro Pefhany" wrote in message news: snipped-for-privacy@4ax.com...

The question is what to do when an error is detected...

*** Keyboard not present *** Press [F1] to continue.

The only thing I once did was a small database with maximum and minimum allowed values and a preferred default for each parameter. These were used to validate user input during setup and for sanity checks during start-up. Out-of-limit values would be put back to their defaults. Silently. When more than three errors were found, the entire set of parameters was reset to defaults, which was useful for production.

I have had some problems with first-generation EEPROMs, but none in the last five years or so, not that I noticed anyway ;)

--
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)
Reply to
Frank Bemelman
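Frank's min/max/default database might look something like the sketch below. The parameter names and limits are invented for illustration; the silent clamp-to-default, and the "more than three errors means reset everything" rule, follow his description.

```c
#include <stdint.h>

/* Per-parameter limits and preferred default, as in Frank's database. */
typedef struct {
    int16_t min;
    int16_t max;
    int16_t def;
} param_limits_t;

static const param_limits_t limits[] = {
    /* min,  max, default -- illustrative entries only */
    {    0,  100,   50 },  /* e.g. pump speed, %   */
    {  -40,  125,   25 },  /* e.g. temperature, C  */
    {    1,   10,    3 },  /* e.g. retry count     */
};
#define NUM_PARAMS (sizeof limits / sizeof limits[0])

/* Silently put out-of-limit values back to their defaults. Returns the
 * number of parameters repaired; if more than three were bad, the whole
 * set is reset to defaults. */
unsigned sanitize_params(int16_t params[NUM_PARAMS])
{
    unsigned errors = 0;
    for (unsigned i = 0; i < NUM_PARAMS; i++) {
        if (params[i] < limits[i].min || params[i] > limits[i].max) {
            params[i] = limits[i].def;
            errors++;
        }
    }
    if (errors > 3) {                  /* too much damage: reset all */
        for (unsigned i = 0; i < NUM_PARAMS; i++)
            params[i] = limits[i].def;
    }
    return errors;
}
```

The same table can drive input validation during setup, so limits live in exactly one place.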

Can be worthwhile in a high-reliability system. Compared to the usual power-on checksum, it has the advantage of testing in the actual execution environment: elevated device temperatures, supply voltage levels and noise affected by active peripherals, etc. In a dialysis machine years ago, I tested ROM and RAM in a background task. The RAM was tested a byte at a time, disabling interrupts briefly while the RAM contents were modified.

Assuming the device is on an external bus, you're really testing all of the signals on the bus - decodes, control lines, address and data - so failures can be from problems other than in the memory itself.

--
Jim McGinnis
Reply to
Jim McGinnis
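Jim's byte-at-a-time RAM test, with interrupts disabled only while the byte's contents are modified, can be sketched roughly as follows. The interrupt macros are placeholders for whatever primitives the real platform provides; here they expand to nothing so the logic can run on a host.

```c
#include <stdint.h>

/* Placeholders for the platform's real primitives,
 * e.g. __disable_irq()/__enable_irq() on a Cortex-M. */
#define DISABLE_IRQ()  /* platform-specific */
#define ENABLE_IRQ()   /* platform-specific */

/* Test one byte non-destructively with complementary patterns, restoring
 * the original value. Interrupts are off only for this brief window.
 * Returns 1 if the cell stored both patterns correctly, 0 on failure. */
int ram_test_byte(volatile uint8_t *p)
{
    int ok;
    DISABLE_IRQ();
    uint8_t saved = *p;
    *p = 0x55; ok  = (*p == 0x55);
    *p = 0xAA; ok &= (*p == 0xAA);
    *p = saved;                       /* put the original value back */
    ENABLE_IRQ();
    return ok;
}
```

A background task would walk this across the RAM one address per call, much like the EEPROM sweep.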

Loss of calibration data can be the equivalent of a hard failure: there are no obvious default values to set it to.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Reply to
Spehro Pefhany

"Spehro Pefhany" wrote in message news: snipped-for-privacy@4ax.com...

Yes, for calibration data some redundancy is nice, and an auto-repair facility. Or a calibration certificate, on luxury paper with interesting stamps all over it.. or display a nag-screen "calibration required" ;)

--
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)
Reply to
Frank Bemelman

If the integrity requirements for your system indicate that such checking is useful, then it is definitely worth doing. As others have indicated, it can catch problems with shared bus address and/or data pathways. The most difficult thing about deciding to use a continuous EEPROM checksum is working out what to do if you discover a problem.

--
********************************************************************
Paul E. Bennett ....................
Reply to
Paul E. Bennett

The answer to this, like to all truly interesting questions, is: "It depends".

In the case at hand, it depends on things like:

1) What's more likely: the EEPROM to fail in operation, but without triggering a watch-dog reset or other safeguards; or your EEPROM checking routine itself to fail, for some unrelated reason?

2) Can you afford the extra CPU load for doing this?

3) Does it pay off? Or is it perhaps even required?

And remember: be careful asking questions if you can't do anything about the answers. In other words, if the only known cure is worse than the disease, it may not be worth diagnosing the disease.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

Hi,

The most important question when asking whether any sort of error checking is worthwhile is what you do if the answer is that there is a problem. If you cannot do something practical/realistic with the answer, then do not bother. Anything that can actually be checked, and whose result can realistically be used to improve reliability, is worthwhile.

Regards, Anton Erasmus

Reply to
Anton Erasmus

It can be useful to ask the question even if nothing can be done about the fault condition other than letting the system die (hopefully with the outputs all set to a safe state).

As for errors in the checking code itself: this should be simple enough that a Fagan Inspection can determine whether it is flawed or operating correctly. There are other sanity-checking measures that can be taken to ensure the system is generally behaving, too.

--
********************************************************************
Paul E. Bennett ....................
Reply to
Paul E. Bennett

Thanks, I am doing it. I can't afford to overlook a wrong calibration. Thanks.

Reply to
Vishal

They ask for it, they pay for it --- that's obviously worth something.

But to be sure, ask your customer. If they thought of requesting such a feature by themselves, odds are they have reasons.

Seems your customer knows more about the project than you do --- this might be a danger to your long-term business prospects if you don't get up to speed fast.

Reply to
Hans-Bernhard Broeker

I would ask for the justification behind the assumption that the place you get the data to be written to the register from is less likely to be corrupt than the data already in the register.

You might wish to look at my resume (see below) and then call me in for a one-day requirements review.

--
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
Reply to
Guy Macon

OK, I will rephrase the question. Is the register area of a micro more reliable than static RAM? I got this guideline doc (requirements doc, as they call it) which asks for register refresh. Since the document is quite old, I was wondering if it still makes sense to do register refresh.

Thanks.

Reply to
Vishal

As the data there is continually changing, keeping a checksum is not practical.

Some old and new mainframes have ECC error correction (1 or 2 bits per word) for the external RAM. A background task ("scrub") reads and rewrites the memory, preventing correctable 1- and 2-bit errors from accumulating into 3-bit errors that cannot be corrected. For that kind of "refresh" of external RAM one would have to implement the ECC in software. For registers in the controller it is not practical either.

Regards, JRD

Reply to
Rafael Deliano
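To make the scrubbing idea concrete in software, here is a toy single-error-correcting Hamming(7,4) code with a scrub pass over an array. Real mainframe ECC uses wider SEC-DED codes in hardware, so this is only a sketch of the principle: read each word, correct any single-bit error, and write the clean word back before a second error can accumulate.

```c
#include <stdint.h>

/* Encode a 4-bit nibble into a 7-bit Hamming codeword.
 * Bit layout (LSB first): p1 p2 d0 p3 d1 d2 d3. */
uint8_t hamming74_encode(uint8_t d)
{
    d &= 0x0F;
    uint8_t d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;
    uint8_t p2 = d0 ^ d2 ^ d3;
    uint8_t p3 = d1 ^ d2 ^ d3;
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p3 << 3 |
                     d1 << 4 | d2 << 5 | d3 << 6);
}

/* Decode a 7-bit codeword, correcting any single-bit error. */
uint8_t hamming74_decode(uint8_t c)
{
    uint8_t s1 = ((c >> 0) ^ (c >> 2) ^ (c >> 4) ^ (c >> 6)) & 1;
    uint8_t s2 = ((c >> 1) ^ (c >> 2) ^ (c >> 5) ^ (c >> 6)) & 1;
    uint8_t s3 = ((c >> 3) ^ (c >> 4) ^ (c >> 5) ^ (c >> 6)) & 1;
    uint8_t syn = (uint8_t)(s1 | s2 << 1 | s3 << 2);
    if (syn)                              /* syndrome = error position 1..7 */
        c ^= (uint8_t)(1u << (syn - 1));
    return (uint8_t)(((c >> 2) & 1) | ((c >> 4) & 1) << 1 |
                     ((c >> 5) & 1) << 2 | ((c >> 6) & 1) << 3);
}

/* One scrub pass: correct and rewrite every codeword in 'mem'. */
void scrub(uint8_t *mem, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        mem[i] = hamming74_encode(hamming74_decode(mem[i]));
}
```

On a real system the scrub would run a few words per tick, just like the background checksum.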

It is fully practical (for the chip maker) and it has been used, at least in the past. Some ECL (Emitter Coupled Logic) processors, which definitely are not sensitive to radiation, had parity bits on internal data paths and internal scratch-pad memories.

Even on some ALU operations it makes sense to calculate the parity. For instance, on addition, compute the parity of each operand separately, combine the two parity bits with the parity of the carry chain (from the full adder), and compare the result with the parity calculated from the ALU sum.

Such techniques are currently used in various satellites, where cosmic radiation flips individual memory cells.

Paul

Reply to
Paul Keinanen
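The parity-prediction trick rests on the identity that, for a ripple-carry addition, parity(sum) equals parity(a) XOR parity(b) XOR parity(carry-in vector), since each sum bit is a_i XOR b_i XOR c_i. In hardware the carries fall out of the adder's carry chain for free; this host-side sketch recomputes them bit by bit just to demonstrate the check.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of an 8-bit value. */
static uint8_t parity8(uint8_t v)
{
    v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1;
}

/* Carry-in vector of a + b: bit i is the carry into bit position i
 * (carry-in to bit 0 is 0; the final carry-out is dropped, matching a
 * result truncated to 8 bits). */
static uint8_t carries8(uint8_t a, uint8_t b)
{
    uint8_t c = 0, cin = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        c |= (uint8_t)(cin << i);
        cin = (uint8_t)((ai & bi) | (ai & cin) | (bi & cin));  /* majority */
    }
    return c;
}

/* Returns 1 if the ALU result 'sum' passes the parity check, 0 if not. */
int check_add_parity(uint8_t a, uint8_t b, uint8_t sum)
{
    uint8_t predicted = parity8(a) ^ parity8(b) ^ parity8(carries8(a, b));
    return predicted == parity8(sum);
}
```

A single bit flip anywhere in the sum changes its parity, so the check catches exactly the odd-bit-count errors parity schemes are meant to detect.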

It is the job of the person making the request to prove any reliability differences between register and RAM. It is your job to indentify any outdated requirements. A guideline is not a requirement and a requirement is not a guideline. You really, really need an experienced embedded systems engineer to do a one-day requirements review and later to do a one-day design review. You are in dire need of help. The cost of an airplane ticket and of hiring a consultant is cheap compared to the cost of project failure, and I fear that project failure is where you are headed. Get help.

--
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
Reply to
Guy Macon

I did some radiation effects testing on PICs a while ago. From memory, on the particular PICs I tested, random single-bit errors (soft errors) were just as likely in SRAM as in registers.

PROM/EPROM is more reliable for soft errors, but develops hard errors over time (though it took a lot of exposure).

The test involved running checksum tests on program PROM and registers, checkpoints to catch flipped PC bits, and testing for watchdog resets. One version of the test also covered serial EEPROM. The chips were put in a beam line and exposed until none of them worked.

For high-reliability applications, it's a good idea to program very defensively and use the watchdog reset. Basically, expect single-bit errors in calculations and flipped bits in registers, SRAM, and the PC.

There was an article (referenced on Slashdot or embedded.com, I think) about programming the space shuttle computer. Much of the effort went into code reviews and talking through what-ifs, e.g. "if I detect an error here, what do I do?"

See ya, -ingo

--
/* Ingo Cyliax, cyliax@ezcomm.com, Tel: 812-391-0895 */
Reply to
Ingo Cyliax
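For running checksum tests of the kind Ingo describes, a CRC is usually preferred over a plain additive sum because it also catches burst and multi-bit errors. A byte-at-a-time CRC-16/CCITT step (polynomial 0x1021, MSB-first, initial value 0xFFFF) that a background task can call once per byte might look like this:

```c
#include <stdint.h>

/* Feed one byte into a running CRC-16/CCITT (poly 0x1021, MSB-first).
 * A background task calls this per byte of program memory and compares
 * the final value against a stored reference once the sweep completes. */
uint16_t crc16_step(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)byte << 8;
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}
```

Starting from 0xFFFF, the standard check string "123456789" yields 0x29B1, which makes a handy power-on self-test of the CRC routine itself.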

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.