Crc16 on power failure

- David R Brooks
posted
17 years ago

Sat, Jan 20, 2007 1:52 AM

OK, I found the decompressor (uncr233.exe). I've put the documentation & the disassembled source on my website:

formatting link

I'll leave it to others to tease out the interesting bit :)

- David R Brooks
posted
17 years ago

Sat, Jan 20, 2007 10:03 AM

Hmm, having disassembled that code, I can't find a DAA instruction in it? Maybe the disassembler is acting up, but it doesn't seem so. As per my other post, that code is at

formatting link

- CBFalconer
posted
17 years ago

Sat, Jan 20, 2007 1:10 PM

I can't either. It may be a very early version, before I found the high speed code. Too bad the library had been repacked, as I normally put date stamps in my LBRs.

I may pass the source through id2id to make things more readable. Your disassembler did a nice job; whose is it? I can spot my techniques in its output quite nicely. You might want to try id2id-20 also. It is available on my download page.

--
 

 "A man who is right every time is not likely to do very much."
                           -- Francis Crick, co-discover of DNA
 "There is nothing more amazing than stupidity in action."
                                             -- Thomas Matthews

- David R Brooks
posted
17 years ago

Sat, Jan 20, 2007 9:33 PM

That disassembler is one I wrote myself, many years ago. I could post it, if anyone's interested.

- Tom
posted
17 years ago

Mon, Jan 22, 2007 4:27 AM

One technique that I have used in the past is to connect the backup battery to the CPU through a pair of diodes and some kind of switch (either a MOSFET or a relay). When the main power goes out, the CPU can save its state to BBM and then turn off the switch. This way it doesn't matter if you need 90 ms or 400 ms. Backup capacitors are not required at all and the net effect on battery life is minimal.

This of course assumes that your battery has a high enough voltage and low enough ESR to run your CPU.

--Tom.

- Mr. C
posted
17 years ago

Mon, Jan 22, 2007 2:21 PM

So, if you can limit your errors to only one bit, you can use whatever is the fastest. That's what we need, an error bit reducer! :-)

Because they might happen. How can you accurately predict what will happen to the contents of RAM if a battery is getting low, or is intermittent?

I am pretty sure Crenshaw's article was written with serial communications in mind. But, memories can also fail such that as bytes are processed, there are groups or "bursts" of errors. Yes, you can arrange the array any way you want, but how do you know how to arrange it for better error detection?

Then you have to ask yourself if you are willing to take the chance at having an error go undetected. You might be right about the corruption patterns in memory (although I have seen fairly random looking ones too), but what happens if it does not fail that way?

There are several "tricks" you can play in an attempt to reduce the computing cost of a high-reliability detection scheme. I know, I have used many of them. But all of them are compromises. The question that needs to be asked before any compromise is made is, "What error rate can I afford to tolerate?"

IMO, the OP should start with a CRC calculation and try every possible way to make it work. Then, if it is not possible to compute the CRC, he should look at the next best compromise to see what it delivers in terms of errors detected and at what computing cost.

- Arlet
posted
17 years ago

Mon, Jan 22, 2007 2:54 PM

The definition of 'burst', as it is used in articles about CRC checksums, is a series of consecutive bits, that are *all* in error. So, a burst error of 8 bits may turn 0x00 into 0xff, or 0xaa into 0x55.

Now, errors often tend to occur in bursts, but that doesn't mean each individual bit is neatly inverted. A burst error that happens in typical communication links may have a high probability of toggling a bit, but usually that probability is less than 50%. For example, you may have a case where a burst of bits is all cleared, turning 0xaa into 0x00. This is, however, not a 'burst error' that CRC codes are supposed to protect against.

In the case of memories, I'd expect burst errors that invert groups of bits to be rare. Instead, I would expect blocks of memory to be zeroed, overwritten with the wrong data/garbage, or randomized by power loss. CRC codes aren't especially suited for any of these cases.

- David Empson
posted
17 years ago

Tue, Jan 23, 2007 9:52 AM

No it isn't. A "burst error" is one in which the first and last bits are flipped, and the ones in the middle are randomly correct or wrong. A burst error of length N may contain any number of single bit errors from 2 to N.

(Reference: Tanenbaum's "Computer Networks", all editions so far. It is on page 196 of my 4th edition, in section 3.2.2 "Error-Detecting Codes". My first edition has it on page 129, section 3.5.3.)

Yes it is. A proper 16-bit CRC will detect errors in a transmission as follows (from earlier editions of Tanenbaum):

- All single bit errors.

- All burst errors of length 16 or less.

- 99.997% of 17-bit burst errors.

- 99.998% of 18-bit or longer burst errors.

This is based on the definition of a burst error being as I described: the first and last bit are flipped, all others in between are random.
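The two percentages in the list above can be reproduced with a little arithmetic. The sketch below assumes the usual textbook result for a CRC of width r whose generator has a nonzero constant term: bursts of length r+1 slip through with probability 2^-(r-1), and longer bursts with probability 2^-r. The function name is made up for illustration.

```python
def detected_percent(crc_bits, burst_bits):
    """Percentage of burst errors of a given length that a CRC detects.

    Assumed textbook result for a CRC of width r:
      burst length <= r   : always detected
      burst length == r+1 : a fraction 2**-(r-1) goes undetected
      burst length >  r+1 : a fraction 2**-r goes undetected
    """
    if burst_bits <= crc_bits:
        missed = 0.0
    elif burst_bits == crc_bits + 1:
        missed = 2.0 ** -(crc_bits - 1)
    else:
        missed = 2.0 ** -crc_bits
    return 100.0 * (1.0 - missed)


if __name__ == "__main__":
    for length in (16, 17, 18, 1002):
        print(length, round(detected_percent(16, length), 3))
```

For a 16-bit CRC this gives 99.997% at 17 bits and 99.998% from 18 bits upward, matching the figures quoted from Tanenbaum.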

There is a critical point, however: the CRC can only meet these standards for a single error within a transmission (either a single bit error or a single burst error).

For example, if there were two single bit errors, with 1000 valid bits in between, that is a 1002 bit burst error, not two single bit errors, so the CRC has a 99.998% chance of catching it.

Agreed, for the most part (ignoring the "inversion" aspect).

If you are using a CRC to detect memory errors, you have to consider the memory as a bit array, with bit order within bytes determined by the order in which bytes are shifted into the CRC calculation (low order bit first is more common).

If there are any errors in the memory, then the distance between the first and last error bits determines the size of the "burst error", and a 16-bit CRC is only guaranteed to detect an error where a single bit is flipped, or where all errors occur within 16 bits of each other (either two adjacent bytes, or straddling two byte boundaries). For anything further apart it only has a high probability of detecting the error.
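To make the "distance between the first and last error bits" concrete, here is a small sketch that measures the burst length between a good and a corrupted copy of a block. It assumes both copies have the same length and numbers bits low-order first within each byte, as described above; `burst_length` is a hypothetical helper, not part of any library.

```python
def burst_length(good, bad):
    """Span in bits from the first to the last differing bit.

    Bits are numbered low-order bit first within each byte, bytes in
    address order. Returns 0 when the two blocks are identical.
    """
    positions = [
        8 * index + bit
        for index, (g, b) in enumerate(zip(good, bad))
        for bit in range(8)
        if ((g ^ b) >> bit) & 1
    ]
    return positions[-1] - positions[0] + 1 if positions else 0


if __name__ == "__main__":
    good = bytes(200)
    bad = bytearray(good)
    bad[0] ^= 0x01      # flip bit 0
    bad[125] ^= 0x02    # flip bit 1001 (125 * 8 + 1)
    print(burst_length(good, bad))
```

Two isolated bit flips with 1000 good bits between them count as one 1002-bit burst, so a 16-bit CRC only catches that case with high probability rather than with certainty.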

There are some tricks you can pull. For example, all memory being zeroed is a common error situation. If you preset your CRC to zero, append the CRC to the data and expect a zero remainder, then a zeroed memory block will pass a CRC check. This can be avoided by storing the result of the CRC calculation separately and comparing it with the result of the previous calculation.

Note that this trick doesn't work for variable length data: a CRC preset to zero with any number of zero bytes will result in a zero CRC remainder. In this case, the best technique (e.g. used by HDLC in data transmission) is to preset the CRC to 0xFFFF, append the complemented CRC to the data, and include it in the calculation at the receiving end, where a constant value is expected (0xF0B8 for HDLC, which uses the CRC-CCITT 16-bit polynomial).
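The difference between the two presets can be shown with a short sketch. It uses a plain bitwise CRC with the CRC-CCITT polynomial in its bit-reversed form (0x8408), shifting low-order bit first; the helper names are invented for illustration and this is not taken from any particular HDLC implementation.

```python
POLY = 0x8408  # x^16 + x^12 + x^5 + 1, bit-reversed for LSB-first shifting


def crc16_update(reg, data):
    """Shift bytes through a 16-bit CRC register, low-order bit first."""
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ POLY if reg & 1 else reg >> 1
    return reg


def append_plain(data):
    """Preset 0x0000, CRC appended unmodified (low byte first)."""
    crc = crc16_update(0x0000, data)
    return data + bytes([crc & 0xFF, crc >> 8])


def check_plain(block):
    """Expect a zero remainder over data plus CRC."""
    return crc16_update(0x0000, block) == 0x0000


def append_hdlc(data):
    """Preset 0xFFFF, complemented CRC appended (low byte first)."""
    crc = crc16_update(0xFFFF, data) ^ 0xFFFF
    return data + bytes([crc & 0xFF, crc >> 8])


def check_hdlc(block):
    """Expect the constant 0xF0B8 over data plus complemented CRC."""
    return crc16_update(0xFFFF, block) == 0xF0B8


if __name__ == "__main__":
    zeroed = bytes(10)  # a block that has been wiped to all zeros
    print(check_plain(zeroed), check_hdlc(zeroed))
```

With the zero preset a wiped block is accepted as valid, whatever its length; with the 0xFFFF preset and complemented CRC the same wiped block is rejected.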

--
David Empson
dempson@actrix.gen.nz

- nospam
posted
17 years ago

Tue, Jan 23, 2007 11:52 AM

So unless the errors in the OP's 16kB memory array are restricted to 17 consecutive bits (in the order you choose to arrange them), a 16 bit CRC has a 1 in 2^16 (about 0.002%) chance of not detecting them, just the same as a 16 bit checksum generated in a variety of simpler ways.

This is why I suggested a simpler 32 bit checksum which would be more effective and faster to calculate. The extra 16 bits is a trivial overhead on 16kB of data.

- Mr. C
posted
17 years ago

Tue, Jan 23, 2007 3:07 PM

Then what would you recommend? As David Empson mentions in this thread, the CRC-16 will catch 99.998% of situations where there are scattered errors (i.e. long bursts), worst case. I would consider that to be "good enough" for me.

Consider duplicate storage of data, say in 2 places. Upon power-up the two areas could be compared for equality. If they are not exactly the same, there is an error somewhere. So if there were errors, the only way the errors could not be detected is if they appeared identically in BOTH memory areas. I wonder what the probability of that would be? Any thoughts on that?

- Francois Grieu
posted
17 years ago

Tue, Jan 23, 2007 6:08 PM

Are you sure that the DAA instruction can help computing the 16-bit CRC CCITT? And if yes, how the hell? The more I think about it, the more I am puzzled, and becoming a tad incredulous.

Francois Grieu

- Arlet
posted
17 years ago

Tue, Jan 23, 2007 7:20 PM

*Any* 16-bit checksum will catch 99.998% (65535/65536) of random, uncorrelated errors, so you might as well pick one that's easy to calculate, such as simply summing/XOR'ing 16-bit words. This has the added advantage that you can quickly update your checksum when only changing one variable. Simple summation will also catch many of the burst errors (it's easy to see that all bursts within a single 16-bit memory word are caught).
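The quick-update property is easy to sketch. The helpers below are hypothetical and assume the protected data is a list of 16-bit words summed modulo 2^16; when one word changes, the stored checksum is adjusted without rescanning the whole block.

```python
def sum16(words):
    """Checksum: sum of all 16-bit words, modulo 2**16."""
    return sum(words) & 0xFFFF


def update_sum16(checksum, old_word, new_word):
    """Adjust a stored checksum after one word changes from old to new."""
    return (checksum - old_word + new_word) & 0xFFFF


if __name__ == "__main__":
    words = [0x1234, 0xABCD, 0xFFFF, 0x0001]
    stored = sum16(words)

    # Change a single variable and patch the checksum in constant time.
    stored = update_sum16(stored, words[2], 0x0042)
    words[2] = 0x0042
    print(hex(stored), hex(sum16(words)))
```

Any corruption confined to one word changes that word's value, and therefore the sum, which is why bursts inside a single 16-bit word are always caught.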

Depends on the size of the areas, and the chance that there's any correlation. For example, if you have a 32KB memory chip, divided into 2x 16KB memory areas, and the MSB address line to your chip happens to be broken, the two areas will always look the same.
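The broken address line case can be modelled in a few lines. This is a toy fault model, assuming a stuck-low address bit simply aliases both copies onto the same physical cells; the `Ram` class and helper names are invented for the example.

```python
HALF = 16 * 1024  # size of each copy; the second copy starts at this address


class Ram:
    """Toy 32KB RAM whose address lines can be stuck low (hypothetical)."""

    def __init__(self, stuck_low=0):
        self.cells = bytearray(2 * HALF)
        self.stuck_low = stuck_low  # mask of address bits forced to 0

    def _physical(self, addr):
        return addr & ~self.stuck_low

    def write(self, addr, value):
        self.cells[self._physical(addr)] = value

    def read(self, addr):
        return self.cells[self._physical(addr)]


def store_twice(ram, data):
    """Write the same data to both 16KB areas."""
    for offset, value in enumerate(data):
        ram.write(offset, value)
        ram.write(HALF + offset, value)


def copies_match(ram, length):
    """Power-up check: compare the two areas byte by byte."""
    return all(ram.read(i) == ram.read(HALF + i) for i in range(length))


if __name__ == "__main__":
    data = bytes(range(64))
    for mask in (0, HALF):           # healthy chip, then broken MSB line
        ram = Ram(stuck_low=mask)
        store_twice(ram, data)
        ram.cells[5] ^= 0xFF         # corrupt one physical cell
        print(hex(mask), copies_match(ram, len(data)))
```

On the healthy chip the comparison spots the corrupted cell; with the top address line stuck, both "copies" read back the same corrupted cell and the comparison passes.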

- CBFalconer
posted
17 years ago

Tue, Jan 23, 2007 9:14 PM

Yes. I forget the details, but it involved the auxiliary carry flag on the 8080 and Z80.

--
 

 "A man who is right every time is not likely to do very much."
                           -- Francis Crick, co-discover of DNA
 "There is nothing more amazing than stupidity in action."
                                             -- Thomas Matthews

- Francois Grieu
posted
17 years ago

Wed, Jan 24, 2007 7:51 AM

Could the idea be: the auxiliary carry bit is a fast way to keep a temporary copy of bit 3 (compare to 8 and the complement of bit 3 goes into H).

Later, to test H, and since there is no JR H,nn instruction, use DAA to bring H into C or Z, and branch according to this.

So DAA would not be used for its effect on an 8-bit data chunk, but "only" as a way to access the H bit.

Francois Grieu

- Mr. C
posted
17 years ago

Wed, Jan 24, 2007 9:03 PM

OK, I think I see your point now. For random errors that may occur "far apart" in memory, making the burst length very long, even a checksum will be as good as a CRC-16.

OK, but ... suppose I have a 16K memory and I swap data in the first and last locations (and let's say they are not equal, since that would be boring). Now I have a very long burst length of errors since the first and last locations are wrong. Using a simple checksum will not catch the error since all bytes are summed - there is no consequence as to their location.

I guess my brain was in the mode of serial and internal memories where address and data lines are not an issue.
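The swap case is easy to try out. The sketch below assumes a byte-wise sum as the "simple checksum" and a bitwise CRC-16 with the CCITT polynomial 0x1021 and preset 0xFFFF as the CRC; both choices, and the helper names, are illustrative only.

```python
def byte_sum16(data):
    """Simple checksum: sum of all bytes, modulo 2**16."""
    return sum(data) & 0xFFFF


def crc16_ccitt(data, reg=0xFFFF):
    """Bitwise CRC-16, polynomial 0x1021, high-order bit first."""
    for byte in data:
        reg ^= byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ 0x1021) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
    return reg


def swap_ends(data):
    """Return a copy with the first and last bytes exchanged."""
    swapped = bytearray(data)
    swapped[0], swapped[-1] = swapped[-1], swapped[0]
    return bytes(swapped)


if __name__ == "__main__":
    block = bytes(range(256)) * 64        # 16K of test data, ends differ
    swapped = swap_ends(block)
    print(byte_sum16(block) == byte_sum16(swapped))    # sum is blind to it
    print(crc16_ccitt(block) == crc16_ccitt(swapped))  # CRC sees it
```

The sum is position-independent, so the swap goes unnoticed, while the CRC value changes for this block. That is one specific error pattern, not a claim that a CRC catches every reordering.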

- Robert Adsett
posted
17 years ago

Wed, Jan 24, 2007 10:58 PM

And you would also be able to find, in that raw set of all the possibilities that could fit into the space, two sets that will pass a CRC check but fail a checksum. Which one is more likely depends on the error distribution compared to the check distribution.

Consider a 2 bit check. That gives you 4 possible check values and a 1/4 chance of any random set of bits producing a given check value. Now consider two ways of producing those check bits:

1 - check bit one is the parity bit of all the even bits, check bit two is the parity bit of all the odd bits

2 - the check bits are the simple checksum of all the bit pairs.

They both will produce the same possibility of matching a random set of bits and it's easy enough to come up with patterns that will pass one but fail the other.
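The 2 bit example is small enough to enumerate. The sketch below treats a message as a tuple of bits taken in pairs; the parity scheme then amounts to XOR-ing the pairs together, and the checksum scheme to adding them modulo 4. Function names are invented for the illustration.

```python
from collections import Counter
from itertools import product


def bit_pairs(bits):
    """Group a flat bit sequence into 2-bit values (even bit is the high bit)."""
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def parity_check(bits):
    """Scheme 1: parity of the even bits and parity of the odd bits."""
    value = 0
    for pair in bit_pairs(bits):
        value ^= pair
    return value


def sum_check(bits):
    """Scheme 2: sum of all bit pairs, modulo 4."""
    return sum(bit_pairs(bits)) % 4


if __name__ == "__main__":
    messages = list(product((0, 1), repeat=4))
    # Both schemes spread the 16 messages evenly over the 4 check values.
    print(Counter(parity_check(m) for m in messages))
    print(Counter(sum_check(m) for m in messages))

    # Same parity check, different sum check:
    print(parity_check((0, 0, 0, 0)), parity_check((0, 1, 0, 1)))
    print(sum_check((0, 0, 0, 0)), sum_check((0, 1, 0, 1)))
    # Same sum check, different parity check:
    print(sum_check((0, 1, 0, 1)), sum_check((1, 0, 0, 0)))
    print(parity_check((0, 1, 0, 1)), parity_check((1, 0, 0, 0)))
```

Each scheme gives every check value to exactly a quarter of the messages, yet each one misses a corruption that the other detects.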

Similarly, a simple 16 bit checksum and a 16 bit CRC both have a 1/64K chance of a random set of bits matching. If a checksum misses some errors that a CRC catches, it follows that the reverse will also be true. If it weren't, one of the two would have fewer unique check values.

The only real way of choosing is if you can show the pattern of expected errors is better detected with one versus the other. Since a CRC is better at detecting short bursts of errors and comms are often characterized by such error sources, that makes a good match. Cryptographic hashes are a good choice when the goal is to make it especially difficult to produce a second set of bits that will produce the same check value.

To determine which one is actually better for this application you would first need to determine what the failure patterns are likely to be. If failures are random then even a simple check will work.

I've not seen any investigation into failure modes of battery backed memory, so I don't know which might work better. I do know that in one case I dealt with, one of the failure modes damaged both the active and backup copies of configuration data in an EE, so it may not make sense to assume that failures are restricted in area. That is, however, just a single data point.

Robert


- CBFalconer
posted
17 years ago

Thu, Jan 25, 2007 12:26 PM

... snip ...

If it is written in a higher level language, so as to be usable on other systems, I would definitely like to see it.

I had a little time last night, so I started clearing up the CCITCRC disassembly. The beginning of an IDPAIRS file follows. I simply use "id2id ccitcrc.chk" and have both ccitcrc.chk and idpairs in editor windows. The editor (textpad) automatically reloads an altered file, so the identify - propagate - read cycle is very quick. You may want to continue it. id2id-20 is available at:

The file so far:

    i10    fclose
    l005c  fcb_dv
    l005d  fcb_fn
    l04cc  abort
    l04cf  quit
    l04da  Eclsm
    l04dd  tstr
    l04f6  bdos
    l04e7  couta
    l04fa  Edir
    l0520  Efull
    l0541  Eclose
    l055d  Eabort
    l056b  Enofil
    l057e  Efopen
    l058e  Efread
    l05a2  msgins
    l05ad  msgrmv
    l05b6  msgdun
    l05bb  msghlp
    l07ed  msgpws
    l0839  msgcrc
    l0851  crcfil
    l0872  crclst
    l087e  last

--
Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.

- maxthebaz
posted
17 years ago

Mon, Jan 29, 2007 7:54 AM

After my post, I had to turn to another problem. Now, I can work on it again. I'm going to read all your suggestions: thank you all! Max

- David R Brooks
posted
17 years ago

Mon, Jan 29, 2007 8:56 AM

CBFalconer wrote: [snip]

Yes, id2id ran just fine. I sure would like to see how DAA was used :)