EEPROM guarantees after power loss during a write


John Devereux 18 years ago

Hi,

I am wondering what guarantees there are for existing EEPROM data after power is lost during a write operation.

I am writing a datalogging routine that writes records to an EEPROM. It's an Atmel 24C1024, although the question is probably applicable to other devices too. This uses "page mode" for writes - the device seems to be organised as 256 byte pages.
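A minimal sketch of why the page organisation matters for multi-byte writes, assuming 256-byte pages as described above. On parts of this family a page write that runs past the end of a page typically wraps around inside the same page, so a logical write is usually split at page boundaries first. The function name `split_into_page_writes` is hypothetical, not from any driver.

```python
PAGE_SIZE = 256  # assumption: page size as described in the post above


def split_into_page_writes(addr, data, page_size=PAGE_SIZE):
    """Split one logical write into chunks that never cross a page boundary.

    Returns a list of (start_address, chunk) tuples, one per page write.
    """
    chunks = []
    offset = 0
    while offset < len(data):
        # Bytes remaining between the current address and the end of its page.
        room = page_size - ((addr + offset) % page_size)
        chunk = data[offset:offset + room]
        chunks.append((addr + offset, chunk))
        offset += len(chunk)
    return chunks
```

Each returned chunk would then be sent as its own page write, which also means each chunk is a separate window in which power could fail.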

Say power is lost during a write to a single byte in a page. What can I assume? Is just that byte suspect, or the whole page (or the whole device)?

The microcontroller has brownout protection, so isn't going to run wild - but what about the EEPROM internal state machine? Are they generally protected against brownout?

If I write a single byte, does this in fact involve a hidden erase/write of the whole page?

I can't find any information on this stuff.

John Devereux


larwe 18 years ago

The answer to this question depends rather much on whether your external brownout protection also asserts the write protect pin...


John Devereux 18 years ago

I would like to know the situation where this does not happen (i.e. no external brownout detection).

Actually, in the case of the AT24C1024, the write protect pin looks pretty useless anyway. It is active high, which still leaves the question of brownout behaviour open. And the datasheet implies it only provides write protection if asserted *before* the write.

John Devereux


Arlet Ottens 18 years ago

You'd have to ask Atmel to know for sure.

In a similar situation, I've added an electrolytic cap big enough to power the MCU and EEPROM for the time necessary to complete the write.
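As a rough sizing sketch for such a cap, assuming the load draws a roughly constant current while the rail sags, the usual approximation is C = I * t / (V_start - V_min). The numbers below are illustrative assumptions only, not measurements from any real board.

```python
def holdup_capacitance(load_current_a, hold_time_s, v_start, v_min):
    """Minimum capacitance in farads to keep the rail above v_min for hold_time_s.

    Constant-current approximation: C = I * t / (V_start - V_min).
    """
    if v_start <= v_min:
        raise ValueError("v_start must be greater than v_min")
    return load_current_a * hold_time_s / (v_start - v_min)


# Assumed example: 20 mA load, 10 ms write, rail allowed to sag from 5.0 V to 4.5 V.
c = holdup_capacitance(0.020, 0.010, 5.0, 4.5)
print(f"{c * 1e6:.0f} uF")
```

With those assumed figures the result is about 400 uF; a real design would add margin for capacitor tolerance, ageing and the time it takes to detect the power loss.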


John Devereux 18 years ago

I did that - no answer I'm afraid. :(

Maybe Ulf knows? (hint, hint) :)

I am trying to avoid any hardware changes. I can afford to dedicate whole pages to records, and to record status "bytes". But if the entire memory is suspect there is not much I can do.

John Devereux


Vladimir Vassilevsky 18 years ago

I don't think anybody can tell for sure what can happen to the flash write state machine when the power goes down suddenly. Hopefully it will not have enough time to destroy the whole device, so something like a journaling file system could help.
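A minimal host-side sketch of the journaling idea, under stated assumptions: one record per 256-byte page, a 4-byte sequence number in front and a CRC-32 at the end. The layout and the names `encode_record`, `decode_record` and `recover_latest` are hypothetical; the EEPROM is modelled as a plain `bytearray`.

```python
import struct
import zlib

SLOT_SIZE = 256                 # assumption: one record per 256-byte page
PAYLOAD_SIZE = SLOT_SIZE - 8    # 4-byte sequence number + 4-byte CRC-32
ERASED_SEQ = 0xFFFFFFFF         # an all-0xFF slot is treated as never written


def encode_record(seq, payload):
    """Build one slot image: sequence number, zero-padded payload, CRC-32."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload too large for one slot")
    body = struct.pack("<I", seq) + payload.ljust(PAYLOAD_SIZE, b"\x00")
    return body + struct.pack("<I", zlib.crc32(body))


def decode_record(slot):
    """Return (seq, payload) if the slot holds an intact record, else None."""
    body = slot[:-4]
    (crc,) = struct.unpack("<I", slot[-4:])
    (seq,) = struct.unpack("<I", body[:4])
    if seq == ERASED_SEQ or zlib.crc32(body) != crc:
        return None
    return seq, bytes(body[4:])


def recover_latest(eeprom):
    """Scan every slot and return the intact record with the highest sequence number."""
    best = None
    for start in range(0, len(eeprom), SLOT_SIZE):
        rec = decode_record(eeprom[start:start + SLOT_SIZE])
        if rec is not None and (best is None or rec[0] > best[0]):
            best = rec
    return best
```

New records always go to a fresh slot, so an interrupted write can at worst spoil the record being written, provided the damage really is confined to that page. Sequence-number wraparound is not handled in this sketch.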

You can also consider the autostore NVRAMs from Simtek:

formatting link

Those parts are designed for random power outages. They work very well indeed.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link


Grant Edwards 18 years ago

Atmel was very up front with me when I e-mailed their support address with that exact question. They said that the byte being written to when the power failed will be undefined, but everything else will be OK.

Grant Edwards (grante at visi.com) Yow! It was a JOKE!! Get it?? I was receiving messages from DAVID LETTERMAN!! !


John Devereux 18 years ago

Thanks Grant, that's great - it gives me a lot of confidence it is worth doing this.

John Devereux


Jim Granville 18 years ago

If this is important, it sounds like the sort of thing you should run some aggressive tests on. Make the power fail during a write, and see what happens. All writes have to have a 'hidden erase', so check you can see that 'on demand' and then look around for collateral damage....

There are FRAMs, and I saw someone just released a 32KB SPI SRAM too.

-jg


Tim Wescott 18 years ago

And a status pin that tells you that power's still up before you commence the write, and something to make sure that you don't branch off to a higher priority task during the write, and ...

Tim Wescott Control systems and communications consulting http://www.wescottdesign.com Need to learn how to apply control theory in your embedded system? "Applied Control Theory for Embedded Systems" by Tim Wescott Elsevier/Newnes, http://www.wescottdesign.com/actfes/actfes.html


John Devereux 18 years ago

That could work - at least to answer the question of whether an entire page is erased as part of a single-byte write. Perhaps a timer with an output pin hooked up to disconnect power, then vary the delay until I see something interesting. I don't think manually unplugging the supply will work: the page write time is 10 ms, and for all I know the vulnerable period could be a lot less than this.
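A rough sketch of the bookkeeping for such a test, assuming the 10 ms page write time and 256-byte pages mentioned in this thread. It only generates the cut-off delays and classifies what changed between two full device dumps; actually cutting power and reading the device back needs hardware and is not shown. All names and the step size are hypothetical.

```python
def cutoff_delays_us(write_time_us=10_000, step_us=250, margin_us=1_000):
    """Delays (microseconds after the write starts) at which to cut power."""
    return list(range(0, write_time_us + margin_us + 1, step_us))


def classify_damage(before, after, target_addr, new_value, page_size=256):
    """Compare full device dumps taken before and after one interrupted byte write."""
    changed = [i for i, (b, a) in enumerate(zip(before, after)) if a != b]
    target_page = target_addr // page_size
    if after[target_addr] == new_value:
        target_state = "new"
    elif after[target_addr] == before[target_addr]:
        target_state = "old"
    else:
        target_state = "corrupt"
    return {
        "target_state": target_state,
        # Other bytes in the same page that changed (hints at a page-wide erase).
        "same_page_collateral": [
            i for i in changed if i // page_size == target_page and i != target_addr
        ],
        # Bytes in other pages that changed (the worst case asked about).
        "other_page_collateral": [i for i in changed if i // page_size != target_page],
    }
```

Filling the device with a known non-0xFF pattern before each trial should make a hidden page erase easier to spot in the "same page" list.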

John Devereux


Grant Edwards 18 years ago

I can't find that e-mail, and it could have been a different vendor (I've used EEPROMs from several different ones). You probably should press Atmel for an answer.

Grant Edwards (grante at visi.com) Yow! PEGGY FLEMMING is stealing BASKET BALLS to feed the babies in VERMONT.


Robert Adsett 18 years ago

My experience would suggest brownout protection on the devices themselves may be minimal. Brownout protection on the micro may actually make the problem worse. Do you know (is it documented) what the state of the micro's pins is during reset as opposed to coming out of reset?

Not usually for conventional EE. If it's flash masquerading as EE.....

There does seem to be a fair amount of resistance to providing full details.

Let me share a previous experience with EE

- Environment: bit-banged Microwire/SPI, electrically a bit noisy (100's of amps switching nearby). Hold-up cap to maintain power when it is detected that power is removed. Power-off detect comparator used to let the micro know when power was removed.

- EEPROM used to store operating parameters, operating clock and faults. Clocks written every 3 to 6 min to reduce wear on the EE to a tolerable level. Clock data protected with an ECC code. Fault flags unprotected. Parameters stored in two blocks, each protected by a Fletcher checksum. Both banks would be read on startup, and if one block was bad it would be restored from the other.

- Writes to EE would check that power was valid before starting.

- In operation, occasional field returns due to parameter corruption.

Results of improvement attempts. Each one of these resulted in an improvement.

- Hold-up cap size increased.

- Write sequence changed so one parameter block was completely updated, with checksum, before the next was written. It should have been written that way to begin with, of course.

- Extra decoupling.

- Redundant pull-up (or was it pull-down?) on some of the lines. It 'shouldn't' have been necessary as I recall.
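The two-block scheme described above can be sketched roughly as follows. This is a host-side model only, not the original firmware: the checksum width (Fletcher-16 here) is an assumption, since the post only says "a Fletcher checksum", and the function names are hypothetical.

```python
def fletcher16(data):
    """Fletcher-16 checksum (modulo 255) over a bytes-like object."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def seal(params):
    """Return the parameter block with its checksum appended."""
    return bytes(params) + fletcher16(params).to_bytes(2, "big")


def is_intact(block):
    """True if the trailing two bytes match the checksum of the rest."""
    return len(block) >= 2 and fletcher16(block[:-2]) == int.from_bytes(block[-2:], "big")


def store_parameters(banks, params):
    """Update one bank completely (data plus checksum) before touching the next."""
    for bank in banks:
        bank[:] = seal(params)


def load_parameters(banks):
    """Startup check: return good parameters, restoring a bad bank from a good one."""
    good = [is_intact(bank) for bank in banks]
    if not any(good):
        return None  # every bank is corrupt: the caller must fall back to defaults
    source = bytes(banks[good.index(True)])
    for bank, ok in zip(banks, good):
        if not ok:
            bank[:] = source  # restore the damaged bank from the intact one
    return source[:-2]
```

One known weakness worth keeping in mind: with modulo-255 arithmetic, Fletcher-16 cannot tell a 0x00 byte from a 0xFF byte, so a block of all zeros passes the check.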

Although all of these helped, none eliminated the problem. Unfortunately it happened rarely enough that we didn't find a way to duplicate it in the lab. As a test I recommended a switch to FRAM to reduce the window of vulnerability, but that hadn't happened by the time I left, so I don't know if it would have helped.

Some of the reading I did at the time suggested that if the EE state machine were interrupted things could go very wrong.

Try a search for something like reliable EE. I did find something moons ago but as I recall it was from a vendor so judge that as you will.

Robert

Posted via a free Usenet account from http://www.teranews.com


Robert Adsett 18 years ago

Except, of course, when they don't. I had to modify a test bench at one point to add a test to write to such devices, power off, power on and read the device to see if the values were actually stored.

Apparently there was a bad batch and the only way to check was to power cycle them (with them being off for a significant time before repowering them).

You may want to add a check like that if you use them.

Robert

Posted via a free Usenet account from http://www.teranews.com


John Devereux 18 years ago

I naively assumed that "brownout protection" would prevent the micro from sending arbitrary data over the I/O pins. It's an ATMega128. The EEPROM is an I2C device (with 10k pullups on the 2 wires). The datasheet does say that the microcontroller I/O pins go to their "initial state" during a reset, i.e. high impedance inputs. So the I2C lines should get pulled high. Briefly...

I don't think so - but it's possible I suppose!

This is basically what I will be doing (just the software part of the above).

Interesting, thanks for sharing that.

In my application there are a few hundred units in the field that have no protection at all. I.e. the software is written ignoring power failure. And we are not getting problems. But it is obviously a possibility, so I am attempting to address it. Of course this will add complexity and be quite awkward to test. If I am not careful I could introduce a bug that would make things *worse*. So I want to have some clue that it is worth doing.

John Devereux


ssubbarayan 18 years ago

John, we encountered the same problem with our product (and are still encountering it!). Even though we did not have a proper fix, this is the workaround we used: we implemented a checksum in our software to detect data corruption in the EEPROM, and in case we find corruption, we keep a known good backup copy of the EEPROM data in ROM (external flash). This data is copied back to the EEPROM during bootup, which ensures the customer has good data when he boots up. When wrong data is written due to brownouts, the checksum will no longer match. We back up the data whenever we can conclude that at least one known good set of data is there. (This can again be ascertained by comparing with the known checksum.)

We have used this workaround, and after it was implemented we never faced any problems with the content of the EEPROM. Even though brownouts still happen, the impact was greatly reduced.

As for the brownout itself: like your situation, we also had neither an external capacitor nor a brownout protection pin on our board. We use ST's EEPROM. I raised a similar query a couple of months back. Given below are the links:

1)

formatting link

2)Regarding checksum:

formatting link

P.S.: Ours is a consumer electronics product. Processor: ST, EEPROM: ST's M24128BW.

This solution may or may not be suitable for you, depending on your product. Hope this helps. Regards, s.subbarayan


John Devereux 18 years ago

This is equivalent to what I was planning. Although I don't think I need a checksum. I was going to have "valid" markers, separate from the data blocks. So it would go

    mark copy 1 invalid
    write new copy 1
    mark copy 1 valid
    mark copy 2 invalid
    write new copy 2
    mark copy 2 valid

On power up both copy valid flags would be checked, and any "invalid" copy overwritten with the valid one. The "copy valid" markers would be stored on separate pages from the data (and each other), so hopefully will not get corrupted at the same time as the data they refer to.
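A small host-side model of that sequence, with power loss injected between steps, might look like this. It assumes each step is one page write that either happens completely or not at all (which is exactly the property in question), and the marker values and function names are hypothetical. One case the description does not spell out is power failing after copy 1 is marked valid but before copy 2 is marked invalid: both flags are then valid but the copies differ, so the sketch treats copy 1 as authoritative because it is always written first.

```python
VALID, INVALID = 0xA5, 0x00  # hypothetical marker values


def update(store, new_data, power_fails_after=None):
    """Run the six-step update; stop early to simulate power loss between steps."""
    steps = [
        ("flag1", INVALID), ("data1", new_data), ("flag1", VALID),
        ("flag2", INVALID), ("data2", new_data), ("flag2", VALID),
    ]
    for done, (page, value) in enumerate(steps):
        if power_fails_after is not None and done >= power_fails_after:
            return False  # power was lost before this step could run
        store[page] = value
    return True


def _rewrite(store, n, data):
    """Rewrite copy n using the same invalid/write/valid ordering."""
    store[f"flag{n}"] = INVALID
    store[f"data{n}"] = data
    store[f"flag{n}"] = VALID


def recover(store):
    """Power-up check: repair the store and return the surviving data."""
    ok1 = store["flag1"] == VALID
    ok2 = store["flag2"] == VALID
    if ok1:
        # Copy 1 is updated first, so it wins when both are valid but differ.
        if not ok2 or store["data2"] != store["data1"]:
            _rewrite(store, 2, store["data1"])
        return store["data1"]
    if ok2:
        _rewrite(store, 1, store["data2"])
        return store["data2"]
    return None  # should not happen if every step is atomic
```

Running `update` with `power_fails_after` set to each value from 0 to 6 and then calling `recover` is a cheap way to check that every interruption point ends with either the old or the new data in both copies.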

Only problem with this is it requires 4 pages to be written instead of one. Using a checksum to replace the separate flags could mean just two pages - perhaps that is better after all.

1)

formatting link

I will look at these.

By the way, long links often get scrambled up on usenet. You can make it easier for some people if you enclose in angle brackets

This seems to stop them getting split up by news readers.

Thank you.

John Devereux


Marra 18 years ago

A good way around this problem is to have a power monitor function on the micro.

If this shows the power is going then you shouldn't write to the EEPROM. Depending on the power supply it might give you time to write one or more pages of data to the EEPROM.

I used to do work with dataloggers, and if the power supply went we had enough time to write all the data to the EEPROM before the power supply died. But we did have a pin on the micro that showed power was dying. You might even need to beef up the power supply caps to give you a bit longer.


Robert Adsett 18 years ago

There's another question I've remembered when dealing with brownout. Not only the question of whether I/O is the same in reset as on its rising edge but also over what range reset is asserted and will hold those values.

The problem can occur (or so I've heard) if the voltage drops to a value at which the brownout circuit can no longer hold the micro in reset but the voltage is still high enough for the EE to be operating. Not normally an issue, since most I/O fails when the voltage drops that far anyway, but apparently it can be an issue with some EEs. And when you have a hold-up cap, any transition through such a zone will be slow.

It will certainly help to have a checksum of some sort on the data if you can. At least then you know something went wrong. Otherwise, if a random byte changed, would you be able to tell?

If you are not getting problems I'd be tempted to make my first step just making sure that problems will be detected if they occur.

Robert

Posted via a free Usenet account from http://www.teranews.com


John Devereux 18 years ago

The problem is that this information does not seem to be available.

I was just thinking that a "hold up" cap could be a bad idea in this respect. Might be best just to get rid of the supply ASAP - the opposite of a hold up cap, get it through the "dangerous" region quickly.

John Devereux
