EEPROM checksum error

What are the possible causes of an EEPROM checksum error? Could a magnetic field corrupt the EEPROM data? Are there any design guidelines to prevent this potential failure?

Reply to
Dummy

Eeproms, as far as I know, do not have this feature. Something else (the programming device) must be making one up and storing it in the eeprom - somewhere.

You will need to provide more specific information.

--
Luhan Monat (luhanis 'at' yahoo 'dot' com)
"The future is not what it used to be..."
http://members.cox.net/berniekm
Reply to
Luhan Monat

When did this occur? After a long time in circuit? After programming?

Ken

Reply to
Ken Taylor

The time in service ranged from 6 months to 2 years. The EEPROM is used in mobile radios, all of which are installed in cars or trucks. We suspected that noise on the supply line caused the EEPROM checksum error, but experiments showed that noise injected on the supply line is filtered out by the regulator circuit. The power supply line to the EEPROM is confirmed to be clean regardless of how much noise is present on the main supply. What could have caused the checksum error?

Reply to
Dummy

Make sure you follow all the recommendations of the manufacturer of the eeprom.

Is the eeprom programmed in the field, or is it just programmed once at the factory and then used from then on? Is it possible there is a software error causing this? EEPROMs usually use keyed programming sequences to prevent inadvertent corruption.

Make sure you lock out interrupts while programming the thing.

--
Regards,
   Robert Monsen

"Your Highness, I have no need of this hypothesis."
     - Pierre Laplace (1749-1827), to Napoleon,
        on why his works on celestial mechanics make no mention of God.
Reply to
Robert Monsen

All that good stuff....

Also, are these two-way radios? Do the EEPROM's get altered during normal use, in which case is it possible RF is causing problems?

Ken

Reply to
Ken Taylor

*What* checksum ? How do you calculate 'your' checksum ?

What type of Eeprom ? 24Cxx family for example ?

Far too little info supplied to meaningfully respond.

Graham

Reply to
Pooh Bear

The serial interface is timing tolerant IME. Never seen false data as a result of background interrupts.

Graham

Reply to
Pooh Bear

Checksum is just the addition of each byte of data. At the end of the file, another byte or word is added that will total all of the bytes to zero. If any of the bytes are corrupt, the total sum of all the bytes will not be zero. A loose wire or strong magnetic field may cause this problem. Harold
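
A minimal sketch of the zero-sum scheme described above, assuming an 8-bit checksum byte stored after the data. The function names are made up for illustration and are not from any particular programmer or library.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte, keeping only the low 8 bits (arithmetic modulo 256). */
static uint8_t sum_bytes(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Byte to append so that payload plus this byte sums to zero (mod 256). */
uint8_t checksum_make(const uint8_t *payload, size_t len)
{
    return (uint8_t)(0u - sum_bytes(payload, len));
}

/* Returns 1 if the image (payload followed by its checksum byte) sums to zero. */
int checksum_ok(const uint8_t *image, size_t len)
{
    return sum_bytes(image, len) == 0;
}
```

Note that a plain sum like this cannot see errors that cancel out (one byte up by one, another down by one), and an image of all zeros passes the check. A CRC is the usual step up if that matters.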

Reply to
Harold Ryan

What kind of EEPROM? Data corruption in EEPROMs is not uncommon. It is caused directly by electrical noise, or by faulty design of the controlling microprocessor system with respect to either EMI or power supply supervision. Redesign to decrease EMI susceptibility and PS issues first; then (and ONLY then) tweaks that add redundancy to the non-volatile storage can reduce the issue to insignificance, even for large quantities of units in challenging applications.
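
One possible form of the redundancy tweak, as a rough sketch only: keep two copies of the record, each with its own zero-sum check byte, and fall back to the second copy when the first fails. The record layout, `CFG_LEN` and the function names are assumptions for illustration; the records here live in RAM, and reading them out of the real part is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define CFG_LEN 16u  /* hypothetical payload size */

typedef struct {
    uint8_t payload[CFG_LEN];
    uint8_t check;  /* chosen so that all bytes of the record sum to zero */
} cfg_record;

/* Byte sum of the whole record, modulo 256. */
static uint8_t record_sum(const cfg_record *r)
{
    uint8_t sum = r->check;
    for (size_t i = 0; i < CFG_LEN; i++) {
        sum = (uint8_t)(sum + r->payload[i]);
    }
    return sum;
}

/* Fill in the check byte after the payload has been set. */
void cfg_seal(cfg_record *r)
{
    r->check = 0;
    r->check = (uint8_t)(0u - record_sum(r));
}

/* Returns 0 if copy A was used, 1 if copy B was used, -1 if both are bad. */
int cfg_load(const cfg_record *a, const cfg_record *b, cfg_record *out)
{
    if (record_sum(a) == 0) { *out = *a; return 0; }
    if (record_sum(b) == 0) { *out = *b; return 1; }
    return -1;
}
```

This only masks the symptom. If the supply or EMI problem is still there, both copies can be hit, which is why the hardware fixes come first.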

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com
Reply to
Spehro Pefhany

I'm broadly familiar with this thanks. I'm less familiar with why Eprom programmers of old seemed to produce different checksums according to manufacturer.

The OP still hasn't explained *what checksum* he's talking about under what conditions.

Can he even validate the file ?

Graham

Reply to
Pooh Bear

The EEPROM is programmed in the factory before shipping to the customer. Every time the radio is turned on, the checksum is verified. A checksum error occurs when any bytes in the EEPROM are corrupted. If data is corrupted while the radio is on, the checksum error won't be detected until the radio is next turned off and on again.

The corrupted bytes are at random EEPROM addresses. Some of the parts could be recovered by re-programming, while others could not. For the parts that were permanently damaged, failure analysis showed overwritten cells. Injecting noise onto the EEPROM data or supply lines during a write operation can cause a checksum error. But all the voltages supplied to the EEPROM are clean in normal use; the filter and regulator take care of the noise. So it's not right to point to noise as the culprit.

Most of the radios failed after being in the field from 6 months to 2 years.

Reply to
Dummy

You say they are preprogrammed, but this implies that you are writing them during normal operation. Which is it?

"The filter and regulator have taken care of the noises. So it's not right to point to the noise as the culprit."

^ Famous last words. :)

If the eeproms aren't being reprogrammed in the field during normal use, then a software error is unlikely, unless the magic write sequence is stumbled upon during a freak crash. If they *are* being reprogrammed (ie, you are saving some value when the user retunes the radio) then I'll again say software. I'm telling you, lock out those interrupts!
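
To make the "freak crash" case less likely on the firmware side, one common trick is to apply the same keyed idea in software: the write routine refuses to run unless the caller armed it immediately beforehand. This is only a sketch of that technique, not anything taken from the OP's code. `ee_sim` stands in for the real part, and `WRITE_KEY`, `ee_arm` and `ee_write_byte` are names made up here.

```c
#include <stdint.h>

#define EE_SIZE   64u          /* size of the simulated part (assumption) */
#define WRITE_KEY 0x5AA5C33Cu  /* arbitrary magic value (assumption) */

static uint8_t ee_sim[EE_SIZE];      /* stands in for the real EEPROM */
static volatile uint32_t write_key;  /* zero unless a write is intended */

/* Call immediately before an intentional write. */
void ee_arm(void)
{
    write_key = WRITE_KEY;
}

/* Returns 1 if the byte was written, 0 if the write was refused. */
int ee_write_byte(uint16_t addr, uint8_t value)
{
    int ok = 0;
    if (write_key == WRITE_KEY && addr < EE_SIZE) {
        /* Real firmware would lock out interrupts around the device
           write here and restore them afterwards. */
        ee_sim[addr] = value;
        ok = 1;
    }
    write_key = 0;  /* always disarm: one arm allows at most one write */
    return ok;
}

uint8_t ee_read_byte(uint16_t addr)
{
    return (addr < EE_SIZE) ? ee_sim[addr] : 0xFFu;
}
```

A runaway program counter that lands in the middle of `ee_write_byte` still has to find the key already set, which makes a stray write far less likely, though not impossible.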

The other possibility is a bad batch of eeproms. This is fairly unlikely, but not without precedent*. Attempt to correlate the bad ones with some lot. Talk to the manufacturer, and ensure that they don't have a 'known' problem. Also, I wouldn't reuse the corrupted ones just because you managed to program them. I'd swap them out as soon as practical.

* A company I used to work for decided to save 10 cents a RAM chip and forgo individual testing of the chips by the manufacturer. Sadly, it turned out that those chips were bad 5 to 10 percent of the time. They were selling high availability purple ethernet switches for hundreds of thousands of dollars each. The engineer responsible was of course promoted to VP, and given vast new responsibilities.
--
Regards,
   Robert Monsen

"Your Highness, I have no need of this hypothesis."
     - Pierre Laplace (1749-1827), to Napoleon,
        on why his works on celestial mechanics make no mention of God.
Reply to
Robert Monsen

Okay. As I suspected.

Well, what about "abnormal" use, say something that might happen only rarely? Are you claiming that the supply voltage on these parts was maintained at 5.0V +/- 5% constantly, never straying lower or higher, from factory to failure? And noise injected from the supply or other pins could cause the micro's PC to point to random bits of code.

I sure don't think you can conclude that.

My original comments definitely apply to this situation. Can you post a link to the schematic of the power supply, micro and EEPROM?

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com
Reply to
Spehro Pefhany

Take a GOOD look at power up and power down sequences. A few years ago, a vendor of mine was having problems with a similar situation, where an EEPROM kept getting programmed to random bits here and there. It seemed that on start up (this was on a parallel port) there were voltage glitches that JUST HAPPENED to mimic the programming sequence on the device, which was not supposed to be field programmable! Since this was a security dongle, and the bits were sometimes the security ID codes, this was considered a very bad thing!

So, take a look at what occurs during start up and shut downs, and see if there are any glitches then that can cause you problems!

--
Charlie
--
Edmondson Engineering
Unique Solutions to Unusual Problems
Reply to
Charles Edmondson

Were they actually able to observe this, or was it assumed? Dealing with hardware/software interfaces, it is quite common for programmers to blame software bugs on hardware 'glitches'. I've seen this again and again. It is usually a bug that just seems to come and go, possibly due to some unrelated change in the software that changes the timing or place in memory where a random pointer is hitting. I have made a living out of consulting on these kinds of issues.

Yet another goblin to beware of. Thanks.

--
Regards,
   Robert Monsen

"Your Highness, I have no need of this hypothesis."
     - Pierre Laplace (1749-1827), to Napoleon,
        on why his works on celestial mechanics make no mention of God.
Reply to
Robert Monsen

From their rep, it had definitely been observed. The device had worked for years, then they came out with a new package. The new package also came at the same time they went to a new fab, which had different processes. The new processes made the programming sequence MUCH MORE sensitive, so that random glitches now created random bits programmed in their devices. They needed to replace a whole lot of devices in the field, and got a lot of bad will because of the random failures. We are still replacing these as they go bad...

You see, we only use one small field of the whole EEPROM. The problem doesn't happen every time, and may be worse on some systems and less on others. Also, some people just don't use the things often enough to break them!

--
Charlie
--
Edmondson Engineering
Unique Solutions to Unusual Problems
Reply to
Charles Edmondson

Introducing random noise on the supply line doesn't cause any glitches on the EEPROM lines, because the noise just rides on the supply. However, when glitches are introduced on the main supply line by creating a temporary voltage dip for a certain period, the glitches can pass through to the EEPROM lines.

Previously, we were able to reproduce the EEPROM checksum error by introducing noise to the EEPROM directly, bypassing the regulator. So I reckon that if glitches get through the regulator, a checksum error will most probably occur. We are checking on that. We thought of noise, but overlooked glitches.

If that's the root cause, is there any method to prevent glitches? I guess the regulator can only filter noise that rides on Vcc; a sudden dip in voltage is not recoverable.

Reply to
Dummy

For you, this might be a design issue. As power ramps up and down, different components react differently. Some have internal caps that make them hold state a little longer than others, or are just more sensitive to power supply levels. Think about the programming sequence. What could provide it in your circuit? What could PREVENT it in your circuit?

I have often found that start up conditions are not fully considered in design. You just assume that the power comes up all at once, smoothly. In reality, different voltage rails come up differently. Filter caps take time to charge up to voltage. Good design takes that into account, sometimes adding POR circuits to make sure that power is steady before starting things up, and quick shut down sequences to turn everything off before the power goes below limits. It's like preventing race conditions and logic glitches. Sometimes, you just have to take a good look at the failure modes...
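
As a rough firmware-side illustration of "make sure that power is steady before starting things up", the sketch below allows EEPROM writes only after the supply has read inside its window for several consecutive samples, and revokes permission on the first bad sample. The 4750 to 5250 mV window (5.0 V +/- 5%), the sample count and the names are assumptions; a real design would normally rely on a hardware reset supervisor or brown-out detector, since software polling is too slow to catch fast dips.

```c
#include <stdint.h>

#define VCC_MIN_MV   4750u  /* assumed lower limit: 5.0 V - 5% */
#define VCC_MAX_MV   5250u  /* assumed upper limit: 5.0 V + 5% */
#define STABLE_COUNT 8u     /* consecutive good samples required (assumption) */

typedef struct {
    uint8_t good_samples;   /* consecutive in-window samples seen so far */
} supply_monitor;

/* Feed one supply reading in millivolts.
   Returns 1 when the supply has been stable long enough to allow writes. */
int supply_sample(supply_monitor *m, uint16_t vcc_mv)
{
    if (vcc_mv >= VCC_MIN_MV && vcc_mv <= VCC_MAX_MV) {
        if (m->good_samples < STABLE_COUNT) {
            m->good_samples++;
        }
    } else {
        m->good_samples = 0;  /* any excursion restarts the count */
    }
    return m->good_samples >= STABLE_COUNT;
}
```

The same gate can be checked on the way down: once a sample falls out of the window, finish nothing new and keep the write lines inactive until the count recovers.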

--
Charlie
--
Edmondson Engineering
Unique Solutions to Unusual Problems
Reply to
Charles Edmondson

Typically, parallel eeproms have active low write enable and chip select pins. If those pins are slow to come up, I guess it can cause problems.

However, one typical example is the Atmel parallel EEPROMs. They require an 0xAA, followed by an 0x55, to be written to special addresses before they'll go into write enabled mode. The address pins need to be set to 0x1555 for the 0xAA, and 0x0AAA for the 0x55. Getting this to happen because of a startup seems incredibly far-fetched.

There is also a set of hardware features that protect against inadvertent writes. The AT28BV64B has a power on delay of 10ms after Vcc comes up; thus, any glitch would have to come at least 10ms after power is applied. If OE is low, writes are inhibited; thus, the glitch would have to be restricted to the WE and CE pins. If either CE or WE is high, writes are inhibited; thus, the glitch would have to be on both of these, but not on OE. Also, pulses of less than 15ns on either WE or CE won't initiate a write cycle. After this dance, one has to go through the software programming sequence to get it to really go into write mode.

Again, startup writes to these things sounds incredibly far-fetched. Perhaps after trillions of power cycles inducing random noise (1,000,000 monkeys?).
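
To put a rough number on that, here is a toy model of the unlock check, using only the two steps described above (the datasheet's full sequence may contain further steps, so treat the table as an assumption). The types and `sdp_feed` are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t addr;
    uint8_t  data;
} unlock_step;

/* Steps as described in this post; check the datasheet for the full list. */
static const unlock_step UNLOCK[] = {
    { 0x1555u, 0xAAu },
    { 0x0AAAu, 0x55u },
};
#define UNLOCK_LEN (sizeof UNLOCK / sizeof UNLOCK[0])

typedef struct {
    size_t matched;  /* number of leading steps seen so far, in order */
} sdp_state;

/* Feed one bus write cycle. Returns 1 once the whole sequence has been
   seen in order with no other write cycle in between. */
int sdp_feed(sdp_state *s, uint16_t addr, uint8_t data)
{
    if (s->matched >= UNLOCK_LEN) {
        return 1;  /* already unlocked */
    }
    if (addr == UNLOCK[s->matched].addr && data == UNLOCK[s->matched].data) {
        s->matched++;
    } else if (addr == UNLOCK[0].addr && data == UNLOCK[0].data) {
        s->matched = 1;  /* a stray first step restarts the sequence */
    } else {
        s->matched = 0;
    }
    return s->matched == UNLOCK_LEN;
}
```

Assuming an 8K x 8 part (13 address bits, 8 data bits) and write cycles that are uniformly random and independent, each step matches with probability 2^-21, so two in a row is about 2^-42, or roughly 1 in 4.4 trillion. Real glitches are not uniformly random, so this is only an order-of-magnitude feel for the "trillions" figure.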

I'm guessing your example eeprom wasn't one of these, and was protected in some other way. Either that, or the chip firmware was flawed, and the sequence above could be circumvented in some situations.

The OP hasn't indicated what EEPROM he is using, or what kind of usage (whether it's getting programmed in the field or not). Thus, we are all speculating without any real information.

--
Regards,
   Robert Monsen

"Your Highness, I have no need of this hypothesis."
     - Pierre Laplace (1749-1827), to Napoleon,
        on why his works on celestial mechanics make no mention of God.
Reply to
Robert Monsen
