Saving data in CPU on-chip EEPROM

- yossi_sr

Tue, Jun 14, 2005 8:33 PM

Hi all,

What happens if a power-down occurs while a parameter byte is being saved to EEPROM (a write takes about 5-10 ms)? What will the programmed byte contain: the old value, the new value, or something undetermined?

I developed a diesel controller card that uses the AT89C51ED2 processor (2 KB on-chip EEPROM). The system keeps a failure table for various parameters, and when a failure occurs the corresponding counter in EEPROM is incremented (counting failures). Each failure counter is 2 bytes (counting up to 0xFFFF). The system may occasionally be switched off (powered down) during programming, and this can lead to errors such as:

1) If a power-down occurs while the low-order byte is being written (incremented, its value being less than 0xFF), I am not sure what this byte will contain after the next power-on.
2) When the counter value is 0x00FF, incrementing means writing 0x00 to the low-order byte and then writing 0x01 to the high-order byte. If the power-down happens after the low-order byte has been written, the system will show the wrong value 0x0000 after the next power-on.

One solution may be to use a checksum over the whole block of data and recalculate it after each parameter change. If the checksum is not OK, I can clear all parameters. I don't like this solution because a single error would force me to clear the whole list of failures. Is there any other solution or suggestion for how to solve this problem?

Thanks!
Joseph
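Joseph's case 2 can be reproduced with a small model. This is a minimal sketch, assuming the counter is stored low byte first in a two-byte array `ee` that stands in for the EEPROM; `naive_increment()` and its power-loss flag are hypothetical names for illustration.

```c
#include <stdint.h>

/* RAM stand-in for the two EEPROM bytes of one counter (low byte first). */
static uint8_t ee[2];

uint16_t read_counter(void)
{
    return (uint16_t)(ee[0] | (ee[1] << 8));
}

/* Naive increment: write the low byte, then the high byte on a carry.
   A nonzero 'power_lost_after_low' simulates a power-down right after
   the first byte write. */
void naive_increment(int power_lost_after_low)
{
    uint16_t v = read_counter();
    if (v == 0xFFFF)
        return;                          /* saturate, as in the original design */
    v++;
    ee[0] = (uint8_t)v;                  /* first EEPROM byte write */
    if (power_lost_after_low)
        return;
    if ((v & 0xFF) == 0)
        ee[1] = (uint8_t)(v >> 8);       /* second write, only on carry */
}
```

Starting from 0x00FF, an uninterrupted increment gives 0x0100, while a power loss between the two writes leaves 0x0000.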

- Grant Edwards

Tue, Jun 14, 2005 8:48 PM

Undetermined.

You have no way of knowing.

Bummer, eh?

Put two copies in EEPROM with checksums and some sort of sequence number or "active" flag, so that you use the most recently written valid copy and write to the older one each time. That way, if you lose power during an update, you don't lose everything; you just fall back on the previous set of data.
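A minimal sketch of this two-copy scheme, assuming two RAM-modelled slots, an 8-bit sequence number that wraps, and a complemented 8-bit sum as the checksum (a CRC would catch more errors). `record_t`, `save()` and `newest_valid()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_COUNTERS 20   /* the 20 two-byte failure counters from the thread */

typedef struct {
    uint16_t counters[N_COUNTERS];
    uint8_t  seq;        /* incremented on every save, wraps at 255 */
    uint8_t  checksum;   /* complemented 8-bit sum of the bytes above */
} record_t;

static record_t slot[2];  /* stand-ins for two EEPROM areas */

static uint8_t calc_checksum(const record_t *r)
{
    const uint8_t *p = (const uint8_t *)r;
    uint8_t sum = 0;
    for (size_t i = 0; i < offsetof(record_t, checksum); i++)
        sum = (uint8_t)(sum + p[i]);
    return (uint8_t)~sum;   /* complemented so an all-zero slot does not verify */
}

static int is_valid(const record_t *r)
{
    return calc_checksum(r) == r->checksum;
}

/* Index of the slot to trust, or -1 if neither copy verifies. */
int newest_valid(void)
{
    int v0 = is_valid(&slot[0]), v1 = is_valid(&slot[1]);
    if (v0 && v1) {
        uint8_t d = (uint8_t)(slot[0].seq - slot[1].seq);  /* wrap-safe compare */
        return (d != 0 && d < 128) ? 0 : 1;
    }
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Write 'data' into the older slot so the newer one survives a power cut. */
void save(const uint16_t data[N_COUNTERS])
{
    int cur = newest_valid();
    int dst = (cur == 0) ? 1 : 0;
    uint8_t next_seq = (cur < 0) ? 0 : (uint8_t)(slot[cur].seq + 1);

    memcpy(slot[dst].counters, data, sizeof slot[dst].counters);
    slot[dst].seq = next_seq;
    slot[dst].checksum = calc_checksum(&slot[dst]);   /* written last */
}
```

The checksum is written last, so a slot only verifies once its new contents are complete, and the slot being overwritten is always the older one.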

--
Grant Edwards                   grante             Yow!  Am I accompanied by
                                  at               a PARENT or GUARDIAN?
                               visi.com

- Joerg

Tue, Jun 14, 2005 9:03 PM

Hello Joseph,

Read about how that is done in mission critical applications, such as navigation systems or the actual controls.

Some systems store values in more than one location and upon wake-up can then perform a majority decision. That requires at least three writes, but I have seen as many as five. IOW, one value might be corrupted, but when the others corroborate each other they will be selected as valid.
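The majority decision, reduced to three copies of a single 16-bit value. This is a sketch only: `vote3()` is a hypothetical helper that accepts a value when at least two copies agree and reports failure otherwise; with five copies the same idea extends to a 3-of-5 vote.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-of-3 vote over three stored copies of one counter. Returns true and
   writes the agreed value to *out if at least two copies match; returns
   false if all three disagree. */
bool vote3(uint16_t a, uint16_t b, uint16_t c, uint16_t *out)
{
    if (a == b || a == c) { *out = a; return true; }
    if (b == c)           { *out = b; return true; }
    return false;
}
```

After a successful vote, the disagreeing copy would normally be rewritten with the agreed value so the next power-up starts with three good copies again.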

Regards, Joerg


- Kelly Hall

Wed, Jun 15, 2005 1:10 AM

Perhaps a small battery or supercap that can keep the CPU alive during write operations in spite of system power going down?

Kelly

- Hans-Bernhard Broeker

Wed, Jun 15, 2005 8:37 AM

A disaster. So you must not let that happen, or you must build your code so it can survive the disaster.

The usual suggestions are:

1) control the power. Use a back-up supply (e.g. a large capacitor) and an external power-off detector on the "upstream" side of that, which let you know at least a couple of milliseconds in advance before power to the CPU actually goes down.

2) Change the data pattern used by the counters to a one-way modification, i.e. each fault will clear one bit, instead of re-programming an entire byte, which will go through an erase to value 0xff, and could lose you not just the current error you're trying to record, but also the count of previous ones.
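Suggestion 2 can be sketched as a tally where each fault clears one more bit. This assumes the EEPROM can program individual bits from 1 to 0 without an automatic erase; whether the AT89C51ED2's on-chip EEPROM allows that is not stated in the thread and should be checked in its datasheet. `tally`, `TALLY_BYTES` and the functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TALLY_BYTES 4   /* hypothetical size: 4 bytes hold up to 32 counts */

/* Erased state is all ones; each recorded fault clears one more bit. */
static uint8_t tally[TALLY_BYTES] = {0xFF, 0xFF, 0xFF, 0xFF};

static unsigned zero_bits(uint8_t b)
{
    unsigned n = 0;
    for (int i = 0; i < 8; i++)
        if (!(b & (1u << i)))
            n++;
    return n;
}

/* The count is simply the number of cleared bits. */
unsigned tally_read(void)
{
    unsigned n = 0;
    for (size_t i = 0; i < TALLY_BYTES; i++)
        n += zero_bits(tally[i]);
    return n;
}

/* Returns 0 on success, -1 if the tally is full. Only one byte is
   written, and it differs from its old value by a single 1->0 bit. */
int tally_increment(void)
{
    for (size_t i = 0; i < TALLY_BYTES; i++) {
        if (tally[i] != 0x00) {
            tally[i] = (uint8_t)(tally[i] & (tally[i] - 1));  /* clear lowest set bit */
            return 0;
        }
    }
    return -1;
}
```

The trade-off is storage: one bit per count instead of 16 bits for up to 65535 counts.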

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- Tauno Voipio

Wed, Jun 15, 2005 12:03 PM

Welcome to the world of transaction processing!

What you need is a way to write multiple bytes so that the whole write either succeeds or is not done at all.

One way of processing a transaction is to use a journal: an area in the EEROM large enough to house one write set (two bytes in your example) and the necessary control data. Usually the control data consists of one status byte, the starting address and the length of the write.

The operation is done:

1. mark status as tentative,
2. copy the write data to the journal,
3. mark status as journal written,
4. copy the write data to the final destination,
5. mark the status as free.

If the operation chain is broken, it's possible to roll-back the write from the journal. A broken operation before end of step 3 is simply discarded by marking status as free. A broken operation after step 3 can be completed by copying the data from the journal to the final destination.
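The five journal steps can be sketched in C as below. This is a minimal model, assuming the EEPROM is a plain RAM array where each assignment stands for one or more byte writes; the layout (`journal_t`, `J_FREE` and friends) and the `stop_after` power-loss hook are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a data area plus a one-entry journal
   (status byte, start address, length, up to JMAX data bytes). */
#define DATA_SIZE 64
#define JMAX      2

enum { J_FREE = 0x00, J_TENTATIVE = 0x01, J_WRITTEN = 0x02 };

typedef struct {
    uint8_t status;
    uint8_t addr;          /* start address in ee_data[] */
    uint8_t len;
    uint8_t buf[JMAX];
} journal_t;

static uint8_t   ee_data[DATA_SIZE];
static journal_t jrn;

/* 'stop_after' simulates power loss right after step 1..4; 0 = no loss. */
void journaled_write(uint8_t addr, const uint8_t *src, uint8_t len, int stop_after)
{
    if (len > JMAX || addr + len > DATA_SIZE)
        return;
    jrn.status = J_TENTATIVE;                 /* 1. mark status as tentative */
    if (stop_after == 1) return;
    jrn.addr = addr;                          /* 2. copy write data to journal */
    jrn.len  = len;
    memcpy(jrn.buf, src, len);
    if (stop_after == 2) return;
    jrn.status = J_WRITTEN;                   /* 3. mark status as journal written */
    if (stop_after == 3) return;
    memcpy(&ee_data[addr], jrn.buf, len);     /* 4. copy to final destination */
    if (stop_after == 4) return;
    jrn.status = J_FREE;                      /* 5. mark status as free */
}

/* Power-up recovery: complete a write broken after step 3,
   otherwise discard whatever is in the journal. */
void journal_recover(void)
{
    if (jrn.status == J_WRITTEN && jrn.len <= JMAX &&
        jrn.addr + jrn.len <= DATA_SIZE)
        memcpy(&ee_data[jrn.addr], jrn.buf, jrn.len);
    jrn.status = J_FREE;
}
```

Step 3 acts as the commit point: the destination is never touched before the status says "journal written", so anything broken earlier leaves the old data intact.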

Please note that journaling in this form is ill suited for Flash memories, as the status byte and journal must be repetitively overwritten.

HTH

--

Tauno Voipio
tauuno voipio (at) iki fi

- Grant Edwards

Wed, Jun 15, 2005 2:29 PM

[...]

Doesn't the same note apply to EEPROM?

--
Grant Edwards                   grante             Yow!  DIDI... is that a
                                  at               MARTIAN name, or, are we
                               visi.com            in ISRAEL?

- yossi_sr

Wed, Jun 15, 2005 2:57 PM

Thank you for your suggestions. The hardware design is already closed, so I cannot add a power-off detector. The other solutions given here seem too complicated for this small project. To remind you, I'd like to summarise the system:

1) The EEPROM is on-chip in the AT89C51ED2 microcontroller (2 KB). There is no page write, just byte write. Programming one byte takes 10 ms typical.
2) The recorded data consist of 20 failure counters (when one of the system failures occurs, the 2-byte counter corresponding to that failure is incremented by one). There are no preset or default values. If a counter reaches 0xFFFF, no further increments take place for that variable until it is cleared. The EEPROM data array may be cleared by the user via an RS-232 terminal command. All failure counters start from zero, and after some time it is possible to read the failure counters and display them on the terminal. The data block also includes 3 additional hourmeters, which hold the total time the various parts of the system have been working. These parameters are updated every 15 min (to reduce writes to EEPROM). There is no RTC; time is counted in software.
3) There is no RTOS running on this project.
4) A write to EEPROM always changes only one parameter at a time.

In my opinion the following solution will be safe and sufficient:

1) I will hold two identical blocks of data in EEPROM (block A and block B), each ending with a checksum.
2) When a parameter has to be incremented:
   a) update the parameter in A;
   b) write the new checksum in A;
   c) update the parameter in B;
   d) write the new checksum in B.
3) At power-on the checksums of both blocks are checked:
   a) If A and B are both OK, we check which block is the latest (by comparing the parameter values of both blocks), then copy the newer block over the older one. (Normally there should be only one difference, so the copy reduces to a single parameter.)
   b) If one of the blocks is not OK, we copy the good block over the bad one.
   c) If both blocks are not OK, we clear both blocks.
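The scheme above can be sketched as follows. Assumptions: only the 20 failure counters are modelled (all 16-bit), the checksum is a complemented 16-bit sum, and `blk_a`/`blk_b` are RAM stand-ins for the two EEPROM blocks; all names are hypothetical. One observation from the write order: since A is always updated before B, whenever both checksums verify, A is never older than B, so this sketch copies A to B rather than comparing values. Comparing values could pick the wrong block if power fails partway through a clear command (assuming the clear also writes A before B), because the freshly cleared block looks older.

```c
#include <stdint.h>
#include <string.h>

#define N_PARAMS 20   /* the failure counters; hourmeters left out here */

typedef struct {
    uint16_t param[N_PARAMS];
    uint16_t checksum;        /* complemented 16-bit sum of param[] */
} block_t;

static block_t blk_a, blk_b;  /* stand-ins for the two EEPROM blocks */

static uint16_t block_sum(const block_t *b)
{
    uint16_t s = 0;
    for (int i = 0; i < N_PARAMS; i++)
        s = (uint16_t)(s + b->param[i]);
    return (uint16_t)~s;
}

static int block_ok(const block_t *b)
{
    return block_sum(b) == b->checksum;
}

/* Steps 2a-2d: always A first, then B. */
void increment_param(int i)
{
    if (blk_a.param[i] == 0xFFFF)
        return;                               /* saturate at 0xFFFF */
    blk_a.param[i]++;                         /* a) */
    blk_a.checksum = block_sum(&blk_a);       /* b) */
    blk_b.param[i] = blk_a.param[i];          /* c) */
    blk_b.checksum = block_sum(&blk_b);       /* d) */
}

/* Step 3, run at power-on. A real version would write only the
   bytes that differ instead of copying whole blocks. */
void power_on_check(void)
{
    int a_ok = block_ok(&blk_a), b_ok = block_ok(&blk_b);
    if (a_ok) {                 /* 3a/3b: A is never older than B */
        blk_b = blk_a;
    } else if (b_ok) {          /* 3b: A is damaged, fall back to B */
        blk_a = blk_b;
    } else {                    /* 3c: both damaged, clear */
        memset(&blk_a, 0, sizeof blk_a);
        blk_a.checksum = block_sum(&blk_a);
        blk_b = blk_a;
    }
}
```

A power loss during steps a) or b) loses at most the count being recorded; a loss during c) or d) loses nothing.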

I think this approach will sufficiently protect against data corruption in EEPROM when the power is switched off during a write cycle. I am not concerned about other sources of disturbance. I would prefer to improve the hardware rather than invest in complicated software algorithms. And this approach is probably reliable enough here because the stored data are simple and sequential.

What do you think? Will the approach described above be sufficient? Are there any potential errors in the logic that I don't see? Please let me know before I start writing this software. Thanks!

- David Brown

Wed, Jun 15, 2005 3:27 PM

That should work out fine. When you have the space, duplicating the data is easier than using a log journal (which would be the way to do it with less extra data). The checksum is essential so that you know which copy is correct at power-up. It is also possible to include a "version number" parameter, which would allow you to update only one copy at a time and thereby save half your writes. Just don't copy Microsoft's design for FAT: for "safety", they made two copies of the FAT, but forgot to include any way to spot corruption in the event of discrepancies between them!

- Tauno Voipio

Wed, Jun 15, 2005 4:56 PM

Not so far - a Flash must be block erased, which complicates the thing quite remarkably.

--

Tauno Voipio
tauno voipio (at) iki fi

- Grant Edwards

Wed, Jun 15, 2005 5:06 PM

Right, but you still have to worry about repeatedly erasing a specific location and exceeding the "number of writes" spec.

--
Grant Edwards                   grante             Yow!  Wow! Look!! A stray
                                  at               meatball!! Let's interview
                               visi.com            it!

- Tauno Voipio

Wed, Jun 15, 2005 5:13 PM

Right.

That's why I'm using a FRAM for this kind of use.

--

Tauno Voipio
tauno voipio (at) iki fi

- Noone

Wed, Jun 15, 2005 5:58 PM

One caution to add. We used an 8051 microcontroller in the past that exhibited "random program counter jumps" when powering down. In the absence of a voltage supervisor, the processor would randomly execute program threads. Since the EEPROM erase and program routines were in firmware, every once in a while the EEPROM would get clobbered. One way to add protection against this is to not use immediate values to enable erase/write access. Use a RAM variable that is defined only within specific windows. XOR a register value into the key. Build up a valid access key incrementally over a long sequence of code. A stray bulk erase could really bite you.
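A portable sketch of the access-key idea. Everything here is hypothetical: `hw_write()` stands in for the device's real unlock-and-write sequence and accepts a write only when handed `HW_UNLOCK`, and the key parts 0x3C and 0x99 are made up. No single code path holds the complete key as an immediate, so a wayward jump straight into `ee_write_byte()` finds `ee_key` empty and writes nothing.

```c
#include <stdint.h>

static uint8_t ee_mem[16];          /* simulated EEPROM */
static volatile uint8_t ee_key;     /* RAM key, valid only inside a write window */

/* Simulated hardware: the cell is written only if the enable value is
   exactly HW_UNLOCK (a made-up value standing in for the real sequence). */
#define HW_UNLOCK 0xA5u
static void hw_write(uint8_t enable, uint8_t addr, uint8_t val)
{
    if (enable == HW_UNLOCK && addr < sizeof ee_mem)
        ee_mem[addr] = val;
}

/* Key parts, applied at different points of the normal code path. */
static void key_step1(void) { ee_key = 0x3Cu; }
static void key_step2(void) { ee_key ^= 0x99u; }   /* 0x3C ^ 0x99 == 0xA5 */

/* Low-level write: uses whatever is in ee_key, then closes the window. */
static void ee_write_byte(uint8_t addr, uint8_t val)
{
    hw_write(ee_key, addr, val);
    ee_key = 0;
}

void save_byte(uint8_t addr, uint8_t val)
{
    key_step1();
    /* ... other work would normally sit between the key steps ... */
    key_step2();
    ee_write_byte(addr, val);
}
```

This only reduces the odds of a stray write; as the following posts stress, a voltage supervisor addresses the cause.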

A good way to test for this is to create a dirty power connection to the processor. We used a rotating metal can on a rotisserie motor with a stranded wire rubbing against the body. Pieces of tape were used to create islands of insulation. We fortified the code until we had minimized random EEPROM hits, and then added an external voltage supervisor for good measure.

Newer processors usually have some form of internal brownout/Vdd dropout detection. The point of all this is simply: test and verify that your system works the way it is expected to. It is the unexpected events that really hurt you.

- David Brown

Thu, Jun 16, 2005 6:55 AM

I would think that a microcontroller that jumps around randomly on power-down would cause a lot more worries than just overwriting EEPROM! In particular, brown-outs could cause disaster (and even with the best power supply, brown-outs can occur: think of users giving the system a "quick reset, just to be sure everything is working"). You'd definitely want to add an external reset device to such a micro.

- Tauno Voipio

Thu, Jun 16, 2005 7:31 AM

My vote to the voltage monitor, too.

Years ago, I had problems with EEPROM being clobbered despite triple redundancy and voting reads. The culprit was the 8051 processor going mad just before passing out when power was going down.

The problem was completely cured with a voltage monitor / reset chip.

--

Tauno Voipio
tauno voipio (at) iki fi

- Meindert Sprang

Thu, Jun 16, 2005 8:13 AM

I have seen the same happen with the first 8515 AVRs (the AT90S8515), which lost or corrupted their internal EEPROM at power-down. The ATmega8515 solved it with brown-out detection.

Meindert

- Anton Erasmus

Thu, Jun 16, 2005 10:30 AM

Use Gray code for the counters. Only one bit changes between successive counts, which of course means only one byte changes. Make sure that if there is a power failure, you have enough power to at least complete any write you are busy with. That way, if power fails, you have either incremented the counter or missed only one count. I have used this method on an AVR that counts operation cycles on a unit. Of over 300 units that have been operating over the last 4 years, I have had no corrupt counters.
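A Gray-coded counter could look roughly like this. A minimal sketch, assuming a 16-bit counter stored low byte first in two EEPROM bytes (modelled as `ee[2]`); `gray_increment()` and its return convention are hypothetical. It converts to binary, increments, converts back, and writes only the byte that differs. Note that the code cannot guarantee the write in progress completes; that still depends on the hold-up power described above.

```c
#include <stdint.h>

static uint16_t to_gray(uint16_t b)
{
    return (uint16_t)(b ^ (b >> 1));
}

static uint16_t from_gray(uint16_t g)
{
    uint16_t b = g;
    while (g >>= 1)          /* b = g ^ (g>>1) ^ (g>>2) ^ ... */
        b ^= g;
    return b;
}

/* Increment a Gray-coded counter stored as ee[0] (low), ee[1] (high).
   Exactly one bit, hence one byte, changes. Returns the index of the
   byte that was written, or -1 if the counter is saturated. */
int gray_increment(uint8_t ee[2])
{
    uint16_t g = (uint16_t)(ee[0] | (ee[1] << 8));
    uint16_t b = from_gray(g);
    if (b == 0xFFFF)
        return -1;
    uint16_t diff = (uint16_t)(to_gray((uint16_t)(b + 1)) ^ g);
    if (diff & 0x00FF) {
        ee[0] = (uint8_t)(ee[0] ^ diff);          /* single low-byte write */
        return 0;
    }
    ee[1] = (uint8_t)(ee[1] ^ (diff >> 8));       /* single high-byte write */
    return 1;
}
```

The troublesome binary step 0x00FF to 0x0100 becomes Gray 0x0080 to 0x0180, a one-bit change in the high byte only.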

Regards Anton Erasmus

- Jonathan Kirwan

Thu, Jun 16, 2005 4:28 PM

I've been there, too, with the Atmel AT90S2313 AVRs.

Jon

- Jim Granville

Thu, Jun 16, 2005 7:27 PM

Sounds fine, but what actually happens in the EEPROM update process is that the buried EE state engine first erases the byte (or page) and then replaces it with the new value(s). Some have page schemes but allow single-byte replace; that just means they read the page, XOR it with the new info, and write the whole page back.

Thus, even with Gray code, there are finite time windows where you have [OldValue][Erasing to 0FFH][0FFH][Writing NewValueZeroes][NewValue]
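The erase-then-program sequence can be made concrete with a simplified model of one internal byte update. Assumptions: the cell is erased to 0xFF and then programmed in one step each; real parts may also leave partially erased or partially programmed bits that read back as anything. `byte_after_power_loss()` and its `phase` argument are hypothetical.

```c
#include <stdint.h>

/* Value left in one EEPROM cell when power is lost during an update.
   phase 0: before the cycle starts; 1: after the erase; 2: after programming. */
uint8_t byte_after_power_loss(uint8_t old_val, uint8_t new_val, int phase)
{
    uint8_t cell = old_val;
    if (phase == 0)
        return cell;
    cell = 0xFF;                         /* erase: all bits to 1 */
    if (phase == 1)
        return cell;
    cell = (uint8_t)(cell & new_val);    /* program the 0 bits of the new value */
    return cell;
}
```

For a Gray-coded low byte going from 0x80 to 0x81, a power loss in phase 1 leaves 0xFF, which is neither the old nor the new value, even though only one bit was meant to change.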

The issue with wayward PgmCtrs (mainly on ramp-down, but also EMC) is one reason there is demand for OTP Flash schemes -> devices that _cannot_ update their own code in the field.

-jg

- Jonathan Kirwan

Thu, Jun 16, 2005 7:41 PM

I'm interested in the exact details of this. It is easy for me to imagine how this might be done: use a routine to read a Gray code from non-volatile memory and convert it into multi-byte binary form in RAM, where it is incremented, and then convert this result back to Gray code before writing it back to non-volatile memory.

But I don't know of a direct method to simply read out the current gray code value and more directly figure out which byte may have changed. It seems to me that the conversion to binary, with an increment taking place in that domain, is necessary.

The conversion back and forth is relatively easy, but one of the conversions involves a loop and I'm wondering if there is a method that does not involve a loop and could be used to operate in an expression form.
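For the conversion question, two well-known loop-free bit tricks apply to a 16-bit word. This is a sketch, not something proposed in the thread, and the function names are hypothetical. `gray16_to_bin()` computes the prefix XOR with shifts of 1, 2, 4 and 8. `gray16_next_mask()` increments directly in the Gray domain using the standard parity rule (even parity: flip bit 0; odd parity: flip the bit just left of the lowest set bit), and the mask it returns has a single bit set, so it tells you which byte changes without a binary round trip.

```c
#include <stdint.h>

/* Gray to binary without a loop: prefix XOR over log2(16) = 4 steps. */
static inline uint16_t gray16_to_bin(uint16_t g)
{
    g ^= (uint16_t)(g >> 1);
    g ^= (uint16_t)(g >> 2);
    g ^= (uint16_t)(g >> 4);
    g ^= (uint16_t)(g >> 8);
    return g;
}

/* Parity of a 16-bit word: 1 if an odd number of bits are set. */
static inline unsigned parity16(uint16_t x)
{
    x ^= (uint16_t)(x >> 8);
    x ^= (uint16_t)(x >> 4);
    x ^= (uint16_t)(x >> 2);
    x ^= (uint16_t)(x >> 1);
    return x & 1u;
}

/* Bit to toggle to go from Gray code g to the next Gray code.
   Returns 0 when g encodes 0xFFFF (no next value in 16 bits). */
static inline uint16_t gray16_next_mask(uint16_t g)
{
    if (!parity16(g))
        return 1u;
    return (uint16_t)((g & (uint16_t)(-g)) << 1);   /* lowest set bit, shifted left */
}
```

Usage: `uint16_t m = gray16_next_mask(g);` then write only the low byte if `m & 0x00FF`, otherwise only the high byte.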

Jon