EEPROM/Flash data loss: probability & patterns?

Hi group, I have been asked to rewrite a stable-storage scheme on a Motorola HC12 (in its 4K EEPROM). The module currently in use keeps an array of small (16-byte) records, each with a checksum and a 1-bit correction code. The fixed-size records are fine (they avoid fragmentation), but I strongly doubt that 1-bit correction gains anything against faulty data in an electrically rewritable memory. What are the most common patterns of corrupted data for the following three error sources?

- data loss due to wear: towards the end of the part's lifetime errors will increase, but do they appear one bit at a time? How reliable is a cell once it has shown its first error? Is it better to retire it after that?

- data loss due to power-down while erasing/writing: I suspect that whole words of memory come out wrong in that case, hardly ever a single bit alone.

- wrong data due to spikes on the data/address lines: this sounds like the most probable source of single-bit errors, but it should be less of a problem for single-chip parts like the HC12. Any hints on that?

My impression is that for the kinds of errors that actually show up in an EEPROM, you have to invest a considerable number of error-correcting bits to catch a significant percentage of them. I will gladly stand corrected if someone with the expertise says otherwise...

regards, Mark

PS: the persistent data will of course not rely on the correction code but on the right storage strategy; the correction is only meant to improve the odds of recovering the "youngest" data.
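PPS: for the sake of discussion, the record layout I'm talking about is roughly the following (the field names are made up). Note that a real single-error-correcting Hamming code over 128 data bits already needs 8 check bits, i.e. a whole extra byte per record:

#include <stdint.h>

/* Illustrative layout only - names are not from the real module. */
typedef struct {
    uint8_t payload[16];  /* application data                        */
    uint8_t checksum;     /* simple 8-bit sum/XOR over the payload   */
    uint8_t ecc;          /* 1-bit-correcting code; a Hamming code
                             over 128 data bits needs at least 8
                             check bits (2^8 >= 128 + 8 + 1)         */
} nv_record_t;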

-- Mark Piffer MCU and DSP programming & software design

Reply to
Mark Piffer

Check out this article

formatting link

Reply to
jtp

I agree. I don't think a checksum plus a single correction bit is a feasible data-correction strategy. A plain checksum is barely reliable enough to detect anything beyond a single-bit fault in a packet. If you can spare the CPU overhead, a CRC is more desirable. You could adopt a full error-correction scheme, but that would be governed by the criticality of the device and the cost of the extra EEPROM usage.
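For what it's worth, a bitwise CRC-16/CCITT is small enough even for a part like the HC12; a minimal sketch (not taken from any particular vendor library):

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-16/CCITT (poly 0x1021, init 0xFFFF): no lookup table,
   so it costs cycles rather than code/data space - acceptable for
   16-byte records that are written infrequently. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    while (len--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (uint8_t bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

Store the CRC alongside each record and recompute it on every read; a mismatch then marks the record as invalid rather than pretending it is correctable.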

I would think that once a wear-induced error is detected, it becomes a degenerative fault for that location.

Basically this fault will never happen unless the hold-up capacitors and/or the low-power detection circuitry are faulty, in which case you have a systematic fault and the device is operating out of spec. I'm assuming the device was designed from the start to cope with power being removed.

In most cases the power supply of the device will filter out the spikes. In a quiescent state it's unlikely that the EEPROM would be affected; obviously it is most vulnerable while it is writing. Exposure of the micro's interfaces to the outside world (that is, outside its box) also increases the risk. As I see it, mitigation takes two forms: 1) a watchdog or similar mechanism for program-execution recovery, and 2) a reliable EEPROM fault-detection scheme with a reset back to "safe" default values.
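A rough sketch of point 2, assuming the CRC routine above and a hypothetical eeprom_read() driver call (both names are placeholders, not from any real library):

#include <stdint.h>
#include <string.h>   /* memcpy */

#define REC_SIZE 16u

/* Placeholder for whatever EEPROM driver routine the application provides. */
extern void eeprom_read(uint16_t addr, uint8_t *dst, size_t len);

static const uint8_t safe_defaults[REC_SIZE] = { 0 };

/* Read a record and its stored CRC; on mismatch fall back to known-good
   defaults and tell the caller the data was reset. */
static int load_record(uint16_t addr, uint8_t buf[REC_SIZE])
{
    uint16_t stored_crc;

    eeprom_read(addr, buf, REC_SIZE);
    eeprom_read(addr + REC_SIZE, (uint8_t *)&stored_crc, sizeof(stored_crc));

    if (crc16_ccitt(buf, REC_SIZE) != stored_crc) {
        memcpy(buf, safe_defaults, REC_SIZE);
        return 0;
    }
    return 1;
}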

This is my take on it also. It all depends on how valuable the data is.

Ken..

+====================================+ I hate junk email. Please direct any genuine email to: kenlee at hotpop.com
Reply to
Ken Lee

Also check out my ESP articles "Forget Me Not" and "A Version Therapy", which cover a double-buffering technique and some tips on ensuring that your EEPROM contents remain valid when the s/w is upgraded.

The articles are available online at

formatting link
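Very roughly, one common way to do the double buffering looks like the sketch below (names are made up and the articles have the full treatment, so treat this only as the general idea): two slots, each with a sequence counter and a CRC; writes always go to the stale slot, and on power-up the newer valid slot wins.

#include <stdint.h>
#include <stddef.h>

/* Two copies of the data; an interrupted write can at worst destroy
   the copy that was already stale. */
typedef struct {
    uint8_t  seq;          /* incremented (mod 256) on every write    */
    uint8_t  version;      /* layout version, bumped on s/w upgrades  */
    uint8_t  payload[16];
    uint16_t crc;          /* CRC over seq, version and payload       */
} nv_slot_t;

/* Placeholder: re-checks the CRC (and version) of one slot. */
extern int slot_valid(const nv_slot_t *s);

/* Return the valid slot with the newer sequence number, or NULL if
   neither copy is usable; (int8_t)(a - b) handles counter wrap-around. */
static const nv_slot_t *newest_valid(const nv_slot_t *a, const nv_slot_t *b)
{
    int a_ok = slot_valid(a);
    int b_ok = slot_valid(b);

    if (a_ok && b_ok)
        return ((int8_t)(a->seq - b->seq) > 0) ? a : b;
    return a_ok ? a : (b_ok ? b : NULL);
}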

=========================== See the User Interfaces for Embedded Systems Page at

formatting link

Reply to
Niall Murphy
