I notice that the disk drive industry uses CRC error correction codes. Can anybody tell me where these come from, do they have their own standards?
The question arises after study of the academic literature in communications. There are many codes, going way back: BCH, convolutional, etc. But no mention of CRC. I don't get it; didn't the storage engineers learn from the same textbooks as everyone else?
The disk drive industry uses CRC error _detection_ codes, not CRC _correction_ codes. The drive industry is only interested in whether the data block is completely OK or faulty.
A faulty disk block usually contains a large burst error, which would be impossible to correct anyway, even with advanced ECCs, unless the data is interleaved between several blocks or even disk drives (RAID). The deinterleaving converts a burst error into random errors which may be recoverable with a suitable ECC.
In principle, the ordinary CRCs could be used to correct _one_ bit error in a message or block. If the CRC check fails, invert one bit at a time in the message frame and recalculate the CRC until a match is found :-). This makes sense only if the error probability is very low, errors are almost certainly single-bit, and retransmission would be unacceptable, e.g. due to the half-duplex round-trip delay in space communication.
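A brute-force sketch of that idea in Python, assuming zlib's CRC-32 as the checksum (the frame contents here are purely illustrative):

```python
import zlib

def correct_single_bit(frame: bytearray, expected_crc: int):
    """Try flipping each bit in turn; return the bit index that makes the
    CRC match, or None if no single-bit flip fixes the frame."""
    for i in range(len(frame) * 8):
        frame[i // 8] ^= 1 << (i % 8)            # flip bit i
        if zlib.crc32(bytes(frame)) == expected_crc:
            return i                             # frame is now corrected in place
        frame[i // 8] ^= 1 << (i % 8)            # undo the flip and keep searching
    return None

data = bytearray(b"hello world")
crc = zlib.crc32(bytes(data))
data[3] ^= 0x10                                  # inject a single-bit error
fixed_bit = correct_single_bit(data, crc)        # restores b"hello world"
```

As noted above, this only pays off when single-bit errors dominate; a two-bit error will either fail to match or, worse, "correct" to the wrong codeword.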
As has been pointed out, Cyclic Redundancy Check codes are for error detection only. They generally do not have error correction capability.
There are tons of different CRCs, but a particular CRC is usually specified in standards that use them. Many communication standards include CRCs at the end of headers or data packets so that if undetected errors get through the FEC or ECC, the CRC will still catch the error.
BCH and convolutional codes are used for error correction and actually don't do nearly as good of a job at error detection as a CRC does. The two jobs are different.
That's wrong. These days all physical (hard) disk blocks have large error-checking/correcting blocks attached, and at current densities it's rare (to the point of non-existence) that a block reads completely cleanly. The Seagate Cheetahs, for example, can correct error bursts up to 320 bits. The newer Savvios can correct bursts up to 520 bits. And both of those are with 512-byte sectors. Non-enterprise drives are a bit skimpier, though.
Well, the two jobs overlap to a fair degree, and there's a tradeoff - for a fixed size check block, increasing the number of bits correctable decreases the size of detectable errors, and vice-versa. The exact codes used and the expected error rates and patterns have a huge impact on the balance. In addition, different check codes have different biases on the patterns of errors detected - CRCs are particularly good at detecting single bursts up to the size of the CRC (which matches the error patterns expected for the serial communications devices for which they were originally developed).
The tradeoff you mention makes a great deal of sense to me. You can detect errors better by separating valid codes by larger Hamming distances. But "correction" implies that you accept nearby (Hamming distance wise) codes as valid so that you can then correct them to the actual nearby valid code. But doing that means you must sacrifice error detection for the manifest reasons.
Many standard CRC polynomials, including that one, are reducible. Specifically they have a factor of (x + 1) to detect any odd number of bit errors with certainty.
Right. The table-driven software method is reasonably fast, but where they really shine is in hardware. A linear feedback shift register with taps corresponding to the positions of the 1 coefficients computes the CRC on the fly using very few gates.
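A bit-serial software model of such an LFSR, shown here with the CRC-16/CCITT-FALSE parameters (a sketch of the shift-register structure, not any particular hardware design):

```python
def crc16_lfsr(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bit-serial CRC-16, mirroring a hardware LFSR: one shift per input bit,
    XOR-ing in the polynomial taps whenever the feedback bit is 1."""
    reg = init
    for byte in data:
        for bit in range(7, -1, -1):               # MSB first, like a serial line
            fb = (reg >> 15) ^ (byte >> bit & 1)   # bit leaving register XOR input bit
            reg = (reg << 1) & 0xFFFF
            if fb:
                reg ^= poly                        # taps at the 1-coefficients
    return reg

print(hex(crc16_lfsr(b"123456789")))  # 0x29b1, the standard check value
```

In hardware, the inner loop collapses to 16 flip-flops and a handful of XOR gates, clocked once per input bit.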
I'm reading this on sci.crypt, where we deal with the frightening record of CRC misuse. They're great for detecting accidental errors, but intelligent adversaries easily exploit their linear mathematical properties.
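That linearity is easy to demonstrate. For zlib's CRC-32 (which has an init and final XOR, so it is affine rather than strictly linear), the CRC of a tampered message is predictable from the CRC of the original plus the XOR difference alone; the message strings here are purely illustrative:

```python
import zlib

a = b"pay $100 to alice"
b_msg = b"pay $999 to mallo"
delta = bytes(x ^ y for x, y in zip(a, b_msg))   # attacker-chosen bit flips

# Affine property: crc(a ^ delta) == crc(a) ^ crc(delta) ^ crc(zeros),
# so an attacker who knows crc(a) can forge the CRC without recomputing it.
zeros = bytes(len(a))
predicted = zlib.crc32(a) ^ zlib.crc32(delta) ^ zlib.crc32(zeros)
assert predicted == zlib.crc32(b_msg)
```

That is why a CRC is no substitute for a MAC: anyone can adjust the check value to match their modifications.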
Yup. That's why ECCs aren't good at residual error detection; decoding is essentially selecting the valid codeword with the minimum Hamming distance to the received pattern. This will be the most likely transmitted codeword, but it could still be the wrong codeword, and the decoder has no way of knowing that.
A CRC or other residual error detecting code, however, doesn't need to care about correction, so a properly designed system gets away with only checking whether the codeword is valid or not. It is not difficult to constrain the system such that detection of an invalid codeword is a highly reliable indicator of remaining error(s); far more reliable than the ECC/FEC.
CRC codes produce check-bits for information bits so that [information bits]+[check-bits] can be thought of as a polynomial over F_2, the field with two elements, that is divisible by a fixed generator polynomial q(x) (or something like that).
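That construction can be shown in a few lines. A toy sketch with a 3-bit CRC, using q(x) = x^3 + x + 1 and integers as bit-vectors of polynomial coefficients (the message value is arbitrary):

```python
def poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor as polynomials over F_2
    (each integer's bits are the polynomial's coefficients)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

q = 0b1011                        # q(x) = x^3 + x + 1
msg = 0b110101                    # information bits
check = poly_mod(msg << 3, q)     # append 3 zero bits, take the remainder
codeword = (msg << 3) | check     # [information bits] + [check bits]
assert poly_mod(codeword, q) == 0 # the transmitted codeword is divisible by q(x)
```

The receiver repeats the division; any nonzero remainder means the block is faulty.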
One common polynomial is a 32-bit polynomial:
"The design of the 32-bit polynomial most commonly used by standards bodies, CRC-32-IEEE, was [...]"
This article below makes it clear that Reed-Solomon codes are very much used in CDs.
"The Ubiquitous Reed-Solomon Codes" by Barry A. Cipra
"The paper, "Polynomial Codes over Certain Finite Fields," by Irving S. Reed and Gustave Solomon, then staff members at MIT's Lincoln Laboratory, introduced ideas that form the core of current error-correcting techniques for everything from computer hard disk drives to CD players."
"Disk drives pack data so densely that a read/write head can (almost) be excused if it can't tell where one bit stops and the next one (or zero) begins."
Maybe disk failure is a different thing from read errors.
RAID arrays use several (physical?) disks in case one fails or gets corrupted.
I've used that in a memory system, where CRCs were cheap and fast and bit errors scarce. Is there a faster way to correct than the suck-it-and-see method outlined above? I couldn't find one, but I'm no expert.
Well, with CRCs, if you flip a bit at a specific position (say, bit 7) of the input, a specific pattern of bits will flip in the CRC (and yes, for any decent CRC, this pattern will differ for every position). So what you do is exclusive-or the CRC you received with the CRC you calculated, and look that result up in a table of precomputed error vectors (for example, what the error vector would look like if bit 7 were flipped). This tells you immediately which single-bit error occurred and which bit needs to be flipped (if any; the error might have been in the CRC itself).
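A sketch of that syndrome-table lookup, again assuming zlib's CRC-32 (because of its init/final XOR, the syndrome of an error pattern e is crc(e) ^ crc(zeros), which depends only on e, not on the message):

```python
import zlib

def build_syndrome_table(nbytes: int) -> dict:
    """Map each single-bit error's syndrome to the bit position that caused it."""
    zeros = bytes(nbytes)
    base = zlib.crc32(zeros)
    table = {}
    for i in range(nbytes * 8):
        e = bytearray(zeros)
        e[i // 8] ^= 1 << (i % 8)                # the error vector for bit i
        table[zlib.crc32(bytes(e)) ^ base] = i
    return table

table = build_syndrome_table(11)
msg = bytearray(b"hello world")
sent_crc = zlib.crc32(bytes(msg))

msg[3] ^= 0x10                                   # single-bit error in transit
syndrome = zlib.crc32(bytes(msg)) ^ sent_crc
bit = table[syndrome]                            # O(1) lookup, no bit-by-bit retries
msg[bit // 8] ^= 1 << (bit % 8)                  # msg is b"hello world" again
```

One table per block length, and a missing syndrome means the error wasn't a single bit flip in the data (it may have hit the CRC itself, or be multi-bit).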