Question regarding a data integrity audit procedure

We have a requirement to develop an audit procedure that ensures the integrity of our data. We have a hierarchical, embedded, real-time system, and our data is distributed over multiple cards. Each higher-level card stores the data and executable images of its lower-level cards. All requests to update the data emanate from external management systems and arrive at the highest-level card, where some basic validation is performed.

Management update requests are processed in a trickle-down manner. An update request is routed to the target card, where it is processed. The target card is given an opportunity to validate the request based on the run-time conditions that prevail on that card (which only the target card knows about, since run-time data is not maintained at higher levels). If the target card explicitly rejects the request, and the reject response is not lost before it reaches the top level, the update request is rejected, and the rejection response is sent to the external management entity that issued the request. If the request times out (due to link failure or card failure), the higher level card goes ahead and updates its version of the data.
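The top-level card's decision rule described above can be sketched as follows. The names are assumptions for illustration; the point is that only an explicit reject that actually reaches the top level prevents the top-level card from committing its copy:

```c
/* Hypothetical response outcomes seen at the top-level card
 * (names are assumptions, not from the actual system). */
enum target_response { TARGET_ACK, TARGET_REJECT, TARGET_TIMEOUT };

/* As described in the thread: an explicit reject that reaches the top
 * level causes the request to be rejected back to the management entity;
 * an ACK, or a timeout (link failure, card failure), both cause the
 * top-level card to go ahead and update its own version of the data. */
int top_level_commits(enum target_response r)
{
    return r != TARGET_REJECT;  /* commit on ACK and on timeout */
}
```

Note that the timeout branch is exactly where the copies can diverge, which is what the audit has to catch.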

Our requirement is to audit the management data, since that is the only data that survives a process restart or a card reset. We have looked at two approaches to handle this:

(1) Periodically obtain the checksum of the files at all the levels and compare them. In case of a discrepancy, we always defer to the higher-level card. While this seems reasonable, the cards use different processors, which may not yield consistent checksums for the same data file.
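The cross-processor inconsistency usually comes from word-oriented sums being sensitive to endianness and word size. A sum that consumes the file strictly one octet at a time, in file order, gives the same answer on every card. A minimal sketch (illustrative only; a CRC, suggested later in the thread, is a stronger choice):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise additive checksum. Because it processes one octet at a time
 * in file order, it is independent of the processor's endianness and
 * word size, unlike checksums computed over 16- or 32-bit words. */
uint32_t checksum8(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```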

(2) When the highest-level card successfully updates the data pertaining to the target card, it logs the management request. As mentioned above, the highest-level card will always update its version of the data, as long as the basic validation is successful. The target card will also log the update requests that caused it to modify data. Periodically, the top-level card will send a message telling the target card which updates it (the top-level card) has made for the target card. The target card will compare this information with the log of update commands it maintains. The comparison is based on a correlation tag generated by the top-level card.

If the top-level card's log contains more successful updates since the last audit cycle than the target card's log, the top-level card processed update commands that the target card missed. Moreover, the target card will know exactly which commands it missed, and it executes those commands on itself (albeit in a time-delayed manner). Upon success, both the top-level card and the target card delete these entries from the log files they maintain. In case of a failure, the entries are not deleted, and the reconciliation should take place in the next audit cycle.
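The correlation-tag comparison above can be sketched like this. Tags are assumed to be 32-bit values assigned by the top-level card (an assumption for illustration); the target card collects the tags present in the master's log but absent from its own, which are exactly the commands it must replay:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconciliation step: given the correlation tags logged by
 * the top-level (master) card and by the target card since the last audit
 * cycle, write the tags the target missed into 'missed' (sized at least
 * n_master) and return how many there are. Quadratic scan is fine for the
 * small per-cycle logs assumed here. */
size_t find_missed_updates(const uint32_t *master, size_t n_master,
                           const uint32_t *target, size_t n_target,
                           uint32_t *missed)
{
    size_t n = 0;
    for (size_t i = 0; i < n_master; i++) {
        int seen = 0;
        for (size_t j = 0; j < n_target; j++) {
            if (master[i] == target[j]) { seen = 1; break; }
        }
        if (!seen)
            missed[n++] = master[i];  /* target never applied this update */
    }
    return n;
}
```

Replaying the missed tags in order, then deleting matched entries from both logs, corresponds to the success path described above.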

After reconciliation, if there are any intermediate cards between the top-level card and the target card, the data on those cards will simply be overwritten by the top-level card. This minimizes the risk of the top-level card (which is really the data master) and the intermediate cards getting out of sync.

If anyone can think of other approaches that we can consider, kindly post them here.

Thanks, Zahid


Stop. Do not pass go, do not collect $200. First establish a predictable, platform-independent method of integrity checking your data!


I agree with Lewin in this respect. Look for properties of your data that let you judge whether it is reasonable. It is quite possible to have wrong data with perfectly valid checksums. Range checking and check digits would help as well. However, what you do will depend on your data.

Merry Christmas to you Lewin by the way.

Paul E. Bennett ....................

A note on terminology. An audit is a procedure to independently *test* an assertion, in this case that multiple copies of your data are consistent. The primary result of an audit is a report of discrepancies. What you describe later is a synchronization technique; the primary result of a synchronization technique is consistent data.

This seems to be an area that might cause discrepancies. Can you flag the update as unverified and, when the card is restored, test and optionally reload the data in question?

This is worthwhile. It doesn't restore consistency per se, but it is a good test of whether consistency exists. As others noted, define a processor-independent signature procedure. I would want to use something more robust than a checksum, perhaps a 32-bit CRC.
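A minimal bitwise sketch of such a CRC, using the standard IEEE 802.3 polynomial in reflected form (0xEDB88320). Because it consumes the file one octet at a time in file order, it is identical on every processor, and unlike an additive checksum it catches reordered and burst errors:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Byte-at-a-time processing makes the result independent of the
 * processor's endianness and word size. A table-driven version would
 * be faster at the cost of 1 KB of ROM. */
uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
    }
    return ~crc;
}
```

The standard check value for this variant is crc32 of the ASCII string "123456789", which should come out to 0xCBF43926.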

While this is useful for determining which data wasn't updated, and thus helps restore consistency after a partial update, it isn't as reliable as a signature of the data itself, such as a CRC, because an update may be misapplied.

I think the signature method is good because of the reliability. You can extend it to perform signatures on sections of the data to determine which areas contain mismatches. That may be easier to do than tracking updates.
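The sectioned-signature idea can be sketched as follows. The section size and the `sign` helper are assumptions for illustration (`sign` is a placeholder for whatever platform-independent signature, e.g. the CRC-32, is chosen); the master sends one signature per section, and the target locates the first region that differs:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder signature; in practice this would be the chosen
 * platform-independent signature such as a 32-bit CRC. */
static uint32_t sign(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = s * 31u + p[i];
    return s;
}

/* Compare the local data, section by section, against the master's
 * per-section signatures. Returns the index of the first mismatching
 * section (so only that region need be re-fetched), or -1 if all match. */
int first_mismatch(const uint8_t *local, const uint32_t *master_sig,
                   size_t total_len, size_t section_len)
{
    size_t n_sections = (total_len + section_len - 1) / section_len;
    for (size_t s = 0; s < n_sections; s++) {
        size_t off = s * section_len;
        size_t len = (off + section_len <= total_len) ? section_len
                                                      : total_len - off;
        if (sign(local + off, len) != master_sig[s])
            return (int)s;
    }
    return -1;
}
```

Halving the section size on each pass (a binary search over the data) would narrow the mismatch further at the cost of extra audit rounds.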

Thad Smith
