EEPROM guarantees after power loss during a write

John Devereux · 2008-02-05T18:28:45+00:00

Hi,

I am wondering what guarantees are there for existing EEPROM data, after power is lost during a write operation?

I am writing a datalogging routine that writes records to an EEPROM. It's an Atmel 24C1024, although the question is probably applicable to other devices too. This uses "page mode" for writes - the device seems to be organised as 256 byte pages.

Say power is lost during a write to a single byte in a page. What can I assume? Is just that byte suspect, or the whole page (or the whole device)?

The microcontroller has brownout protection, so isn't going to run wild - but what about the EEPROM internal state machine? Are they generally protected against brownout?

If I write a single byte, does this in fact involve a hidden erase/write of the whole page?

I can't find any information on this stuff.

-- John Devereux

ssubbarayan 18 years ago

..

Hi John, I will continue using brackets when posting long links. By the way, I have a question regarding your implementation. In your algorithm, which keeps two copies of the data in NVRAM, what if a power brownout hits while you are updating both copies? Since brownouts are unpredictable, how can we guarantee that at least one good copy exists? The scenario I am referring to is the very first update: during the first update, you won't be able to ascertain whether the copy is good or bad.

Another question: at what point would you update the validity flag for the data? On what basis would you know the data is valid, given that you don't have a checksum?

I am sorry if these questions look amateur. I am trying to understand it, and felt your algorithm is simpler than mine except for the extra memory needed for the copies.

Looking forward to your reply, and thanks in advance. Regards, s.subbarayan

CBFalconer 18 years ago

... snip ...

FYI the proper way to transmit links is within <> pairs. See the page URL in my sig. below for an example. Another would be:

[mail]: Chuck F (cbfalconer at maineline dot net) [page]: Try the download section.

John Devereux 18 years ago

[...]

They have to be *angle* brackets, < >. But you are using google groups, which usually scrambles everything up anyway.

update():

a) mark copy 1 invalid
b) write new copy 1
c) mark copy 1 valid

[same again for copy 2]

startup(): any copy marked invalid is replaced by the copy marked valid.

The steps happen in strict order. Each previous step must complete successfully before the next is started. So the only way the valid flag can be set is if the data has been successfully written, without interruption.

The data is marked valid only *after* it has been successfully written. If writing of data is interrupted, then the flag is never set either. So next time it powers up we know that copy may be bad, and restore from the good one.

There is always at least one good copy.

Let us look at what happens if programming is interrupted during a, b, c above.

a) The copy 1 valid *flag* is left in an unknown state. But the actual data is valid. So either the startup will see it invalid and restore the data, or it sees it valid and all is OK.

b) The data is marked invalid, and the *data* is left in an unknown state. This is OK, the startup will see the invalid flag and restore the data.

c) The data has been correctly written, but the valid flag is left in an unknown state. If the startup sees the flag as valid, that is OK, because the data is in fact valid. If it sees it as invalid, the data will be restored from the other copy. Still OK.
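The case analysis above can be checked with a small simulation. This is only a sketch: `SimEeprom`, `PowerCut`, the flag values, and the layout (one flag byte followed by a 4-byte record) are all hypothetical, and it assumes an interrupted write can only damage the bytes being written, which is exactly the open question in this thread. It also models a flag as reading the same way every time, so it does not cover the "half programmed" case.

```python
class PowerCut(Exception):
    """Raised by the simulated EEPROM when power fails during a write."""

class SimEeprom:
    def __init__(self, size):
        self.mem = bytearray(b"\xFF" * size)
        self.writes = 0        # number of completed write operations
        self.fail_at = None    # index of the write that gets interrupted
        self.garbage = 0x5A    # value left behind by an interrupted write

    def write(self, addr, data):
        if self.writes == self.fail_at:
            # Assumption: only the addressed bytes are left in an unknown state.
            self.mem[addr:addr + len(data)] = bytes([self.garbage]) * len(data)
            raise PowerCut()
        self.mem[addr:addr + len(data)] = data
        self.writes += 1

VALID, INVALID = 0xA5, 0x00   # hypothetical flag values
REC_LEN = 4
COPY_BASE = (0, 16)           # hypothetical start addresses of copy 1 and copy 2

def write_copy(ee, base, data):
    ee.write(base, bytes([INVALID]))   # a) mark copy invalid
    ee.write(base + 1, data)           # b) write new data
    ee.write(base, bytes([VALID]))     # c) mark copy valid

def update(ee, data):
    for base in COPY_BASE:
        write_copy(ee, base, data)

def is_valid(ee, base):
    return ee.mem[base] == VALID

def startup(ee):
    """Repair any copy not marked valid from a valid one; return the data."""
    good = [b for b in COPY_BASE if is_valid(ee, b)]
    if not good:
        return None
    data = bytes(ee.mem[good[0] + 1:good[0] + 1 + REC_LEN])
    for base in COPY_BASE:
        if not is_valid(ee, base):
            write_copy(ee, base, data)
    return data

def simulate_cut(step, garbage):
    """Interrupt the given write (0..5) of an update, then 'reboot'."""
    ee = SimEeprom(32)
    update(ee, b"OLD!")
    ee.fail_at = ee.writes + step
    ee.garbage = garbage
    try:
        update(ee, b"NEW!")
    except PowerCut:
        pass
    ee.fail_at = None          # power is back
    return startup(ee)
```

Calling `simulate_cut(step, garbage)` for each of the six write steps, with the interrupted flag reading as valid, invalid, or something else, should always return either the old or the new record, never nothing.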

Obviously this makes a few assumptions: the eeprom has not worn out, and that there is some brownout protection so that the CPU does not go crazy and erase everything.

Another assumption is that the flags are either programmed or not programmed. But what if the flag programming gets interrupted so that the flag state is not only unknown, but is actually *unreliable*? That is, it is only "half programmed" (or half erased), so sometimes reads "valid" and sometimes "invalid"? In this condition the state read could depend on temperature, age or supply noise.

It would require a very unlikely sequence of events, but you could have:

update()
...
mark copy 2 invalid
write copy 2
mark copy 2 valid

Then on power up, the copy 2 valid flag is unreliable, but at startup it happens to read OK.

Then next time we do an update, we get *another* power cut, this time during copy 1 update. And at power up, this time copy 2 reads *invalid*. So we have no valid copies.

I think the solution is to reprogram the "valid" flags every startup.

I find it a difficult area, too. (And it gets harder if you start thinking about wear-levelling or if you don't want to allocate a whole page to a record, or if the record does not fit in a single page...)

John Devereux

David Brown 18 years ago

A better method is to have a version stamp along with your data. You have two blocks, each structured as "version stamp, data". At startup, you verify each block based on having a valid version (and possibly a checksum as well, if you are particularly paranoid). The latest valid version shows which block you use as your data.

For an update, you erase the block containing the older version of the data. Then you save your data to this block, then you write your new version stamp. There is no need to write your data a second time - it gives no advantages, and halves your eeprom/flash life expectancy.
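A rough sketch of the version-stamp scheme described above, using a `bytearray` as a stand-in for the EEPROM. The block layout (4-byte version, 8 bytes of data, 4-byte CRC), the names `load` and `save`, and the `steps` parameter used to simulate a power cut between write phases are all assumptions made for illustration. An erased stamp of 0xFFFFFFFF is treated as "no valid block", and version wrap-around is ignored.

```python
import struct
import zlib

ERASED = 0xFFFFFFFF   # version stamp of a blank (erased) block
DATA_LEN = 8          # hypothetical record size
BLOCK_SIZE = 16       # 4-byte version + 8-byte data + 4-byte CRC

def read_block(mem, idx):
    """Return (version, data) for block idx, or None if it is not valid."""
    base = idx * BLOCK_SIZE
    version, = struct.unpack_from("<I", mem, base)
    data = bytes(mem[base + 4:base + 4 + DATA_LEN])
    crc, = struct.unpack_from("<I", mem, base + 4 + DATA_LEN)
    if version == ERASED or crc != zlib.crc32(data):
        return None
    return version, data

def load(mem):
    """Return (idx, version, data) of the newest valid block, or None."""
    best = None
    for idx in (0, 1):
        blk = read_block(mem, idx)
        if blk and (best is None or blk[0] > best[1]):
            best = (idx, blk[0], blk[1])
    return best

def save(mem, data, steps=3):
    """Write data to the older block; 'steps' phases complete before power loss."""
    assert len(data) == DATA_LEN
    cur = load(mem)
    idx = 0 if cur is None else 1 - cur[0]
    version = 1 if cur is None else cur[1] + 1
    base = idx * BLOCK_SIZE
    if steps < 1:
        return
    mem[base:base + BLOCK_SIZE] = b"\xFF" * BLOCK_SIZE        # 1) erase older block
    if steps < 2:
        return
    mem[base + 4:base + 4 + DATA_LEN] = data                  # 2) write data and CRC
    struct.pack_into("<I", mem, base + 4 + DATA_LEN, zlib.crc32(data))
    if steps < 3:
        return
    struct.pack_into("<I", mem, base, version)                # 3) version stamp last
```

Because the stamp is written last, a cut at any earlier phase leaves the target block looking erased, and `load` falls back to the other block. The sketch does not model a cut in the middle of the stamp write itself.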

John Devereux 18 years ago

[...]

That does seem a better idea. I have used versioned structures before, for a flash based system. So I don't know why I did not suggest it here too.

John Devereux

ssubbarayan 18 years ago

John, my only worry was getting at least one good copy. In your whole algorithm, you have assumed at least one good copy exists. I was wondering what the situation would be the first time (no copy available, you are writing the data fresh) if you encounter a power brownout. I guess in this scenario there's nothing you can do about it. However, if you have any solutions in mind for this, please let me know.

Regards, s.subbarayan

John Devereux 18 years ago

[...]

Firstly, David's algorithm is better - use a version number based system like he describes.

For any possible algorithm, if the power fails during writing of data, you are always going to lose *that version*. Just as if the power failed before you started to write it.

Assuming your eeprom is initially filled with 0xff, and a 32 bit version number, then a version number of 0xffffffff (or -1) would indicate a missing copy.

John Devereux

John Devereux 18 years ago

Cool - I was thinking of avoiding the wrap entirely by having a range so high it would never happen :)

I would still love to know, for sure, that a write to part of a page does not involve an internal erasure of the entire page. Without knowing this, each version stamp needs a page of its own as far as I can see. The act of writing the version number must be guaranteed not to upset the data it refers to, if it gets interrupted.

I think I will have to try and test this.

John Devereux

David Brown 18 years ago

It's actually enough with the versioning stamp to distinguish between invalid, and older or newer versions. All you really need are versions 1, 2, and 3, and wrap to 1 again after 3. Anything other than 1, 2, or 3 is invalid.
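The three-value wrap can be sketched as below. The function names are made up for illustration. The idea is that with two blocks, the newer stamp is always the successor of the older one in the cycle 1 -> 2 -> 3 -> 1, so no larger counter is needed.

```python
VALID_VERSIONS = (1, 2, 3)

def next_version(v):
    """Successor in the cycle 1 -> 2 -> 3 -> 1."""
    return v % 3 + 1

def newest_block(v0, v1):
    """Return the index (0 or 1) of the block to use, or None if neither is valid."""
    ok0 = v0 in VALID_VERSIONS
    ok1 = v1 in VALID_VERSIONS
    if ok0 and ok1:
        # Block 1 is newer only if its stamp directly follows block 0's stamp.
        return 1 if v1 == next_version(v0) else 0
    if ok0:
        return 0
    if ok1:
        return 1
    return None
```

For example, stamps (3, 1) select block 1, because 1 follows 3, while (1, 3) select block 0. An erased stamp such as 0xFF is simply treated as invalid.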

One thing to watch out for, however, is the possibility of corruption at addresses other than the one you are writing. External serial eeproms generally have protection against this, but Atmel AVRs are known to be able to corrupt byte 0 of the eeprom if they get a reset during a write (the address register gets cleared to 0, but the write continues - thus the data at address 0 may be half overwritten). The same problem can probably occur on many other eeproms - I don't know if the AVRs are a particularly high risk, or if Atmel is just unusually honest!

Arlet 18 years ago

To test the system, you could make a simple test jig that switches the power to your board. Use another controller to switch the power at random intervals. The random interval timing should match the discharge rate of the power supply capacitors such that the board suffers a lot of brownout conditions. Add an extra R/C filter if necessary.

On the device you're testing, set up some special firmware that continuously writes updates to the EEPROM. Instead of real data, write a verifiable test pattern, and have the software check it regularly. If it finds corrupted data in a 'valid' block, trigger an alarm.

Then leave the test setup in a corner of the lab, 24/7.
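One possible shape for a "verifiable test pattern" is sketched below. The pattern itself (each byte derived from a sequence number and its offset) and the function names are assumptions chosen for illustration, not anything specified in this thread. The point is that a record can be checked without knowing in advance what was written, and that an all-0xFF or all-0x00 record fails the check.

```python
def make_pattern(seq, length):
    """Build a record whose byte i is (seq + 13*i) mod 256; byte 0 carries seq."""
    return bytes((seq + 13 * i) & 0xFF for i in range(length))

def check_pattern(record):
    """Return True if every byte is consistent with the sequence number in byte 0."""
    if not record:
        return False
    return bytes(record) == make_pattern(record[0], len(record))
```

The test firmware would write `make_pattern(seq, n)` with an incrementing `seq`, and raise the alarm whenever a block marked valid fails `check_pattern`.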

John Devereux 18 years ago

That sounds like a good idea to test a finished routine.

But to get the initial information needed to write it, I am thinking of this:

- Hack the electronics so the EEPROM can be powered from an output pin
- hack my eeprom_write routine so that a timer can interrupt power to the EEPROM and hold the I2C pins low (so the eeprom is definitely unpowered).

That allows the timer to interrupt programming using a precise time delay that I can sweep through a range of values. For each value I can

- print the eeprom page contents (to a serial port)
- reprogram the entire page with a test pattern
- start the timer and the page programming test (different pattern, only alters part of page)

I should be able to see any partial erasures, partial programming, and also any erasure of bytes on the same page outside of the program area.
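The page dumps from such a sweep could be analysed on the host with something like the sketch below. The function name and report format are hypothetical. It assumes the two test patterns differ at every byte in the written window and that neither uses 0xFF, so that "unchanged", "programmed" and "erased" can be told apart.

```python
def classify_page(before, intended, after, start, length):
    """Compare a page dump taken after an interrupted write with the old
    contents ('before') and the contents a completed write would give ('intended').
    'start' and 'length' describe the part of the page that was being written."""
    report = {"unchanged": 0, "programmed": 0, "erased": 0,
              "partial": 0, "collateral": []}
    for i, got in enumerate(after):
        if not (start <= i < start + length):
            # Outside the write window nothing should have changed.
            if got != before[i]:
                report["collateral"].append(i)
        elif got == intended[i]:
            report["programmed"] += 1     # new value fully written
        elif got == before[i]:
            report["unchanged"] += 1      # write had not reached this byte
        elif got == 0xFF:
            report["erased"] += 1         # erased but not yet programmed
        else:
            report["partial"] += 1        # neither old, new, nor erased
    return report
```

A non-empty `collateral` list for any delay value would suggest that a partial-page write disturbs bytes outside the written range.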

Perhaps I would print out extra regions like address 0 and parts of adjacent pages.

This does not simulate a real system since there is no "brownout" state. So I still need something like your setup as a final verification.

John Devereux
