Question regarding a data integrity audit procedure

We have a requirement to develop an audit procedure that ensures the
integrity of our data.  We have a hierarchical, embedded, real-time
system, and our data is distributed over multiple cards.  Each
higher-level card stores the data and executable image of the cards
below it.  All requests to update the data originate from external
management systems and arrive at the highest-level card, where some
basic validation is performed.

Management update requests are processed in a trickle-down manner.  An
update request is routed to the target card, where it is processed.
The target card is given an opportunity to validate the request based
on the run-time conditions that prevail on that card (which only the
target card knows about, since run-time data is not maintained at
higher levels).  If the target card explicitly rejects the request, and
the reject response is not lost before it reaches the top level, the
update request is rejected, and the rejection response is sent to the
external management entity that issued the request.  If the request
times out (due to link failure or card failure), the higher-level card
goes ahead and updates its version of the data.
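
To make the flow concrete, this is roughly the decision the top-level
card makes (a sketch only; the type and function names are placeholders,
not our actual code):

/* Sketch of the trickle-down flow on the top-level card.
 * Type and function names are placeholders, not real code. */
#include <stdbool.h>

typedef struct {
    unsigned int target_card;   /* card the update is destined for   */
    unsigned int corr_tag;      /* correlation tag, assigned up here */
    /* ... payload ... */
} update_req_t;

typedef enum { RSP_ACCEPT, RSP_REJECT, RSP_TIMEOUT } rsp_t;

extern bool  basic_validation(const update_req_t *req);
extern rsp_t forward_to_target(const update_req_t *req);   /* blocks, with timeout */
extern void  apply_local_copy(const update_req_t *req);
extern void  notify_manager(const update_req_t *req, rsp_t result);

void handle_mgmt_update(const update_req_t *req)
{
    if (!basic_validation(req)) {
        notify_manager(req, RSP_REJECT);    /* rejected at the top level */
        return;
    }

    rsp_t rsp = forward_to_target(req);

    if (rsp == RSP_REJECT) {
        /* Explicit reject from the target card: do not touch our copy. */
        notify_manager(req, RSP_REJECT);
        return;
    }

    /* Accept, or timeout (link/card failure): update our version anyway. */
    apply_local_copy(req);
    notify_manager(req, RSP_ACCEPT);
}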

Our requirement is to audit the management data, since that is the only
data that survives a process restart or a card reset.  We have looked
at two approaches to handle this:

(1) Periodically obtain the checksum of the files at all the levels
and compare them.  In case of a discrepancy, we always defer to the
higher-level card.  While this seems reasonable, the cards themselves
have different processors, and they may not produce the same checksum
for the same data file.
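
If the inconsistency comes from checksumming the native in-memory
representation (word size, endianness, padding), one way around it is to
checksum the file strictly as a byte stream.  A rough sketch of what I
mean, assuming the file is reachable through standard file I/O on each
card and the stored bytes are identical across cards:

#include <stdio.h>
#include <stdint.h>

/* Checksum the file as a plain byte stream so that the word size and
 * endianness of the local processor cannot influence the result.
 * (A CRC would be a stronger choice than this simple sum.) */
int file_checksum(const char *path, uint32_t *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    uint32_t sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum += (uint32_t)(unsigned char)c;   /* byte at a time */

    fclose(fp);
    *out = sum;
    return 0;
}

If the stored representation itself differs between processors (for
example, multi-byte fields written in native byte order), we would also
have to serialize the records in a fixed byte order before checksumming.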

(2) When the highest-level card successfully updates the data
pertaining to the target card, it logs the management request.  As
mentioned above, the highest-level card will always update its version
of the data, as long as the basic validation is successful.  The target
card will also log the update management requests that caused it to
update some data.  Periodically the top-level card will send a message
indicating to the target card which updates it (the top-level card) has
made on behalf of the target card.  The target card will compare this
information with the log of update commands that it is maintaining.
The comparison will be made based on a correlation tag that is
generated by the top-level card.
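
Roughly, a log entry keyed by the correlation tag might look like this
(field names are illustrative only):

#include <stdint.h>

#define MAX_CMD_LEN 128

/* One entry in the update log kept on both the top-level card and the
 * target card.  The correlation tag is generated by the top-level card
 * and travels with the request, so the two logs can be matched
 * entry-for-entry during the audit. */
typedef struct {
    uint32_t corr_tag;          /* generated by the top-level card     */
    uint32_t timestamp;         /* when the update was applied locally */
    uint16_t cmd_len;
    uint8_t  cmd[MAX_CMD_LEN];  /* the management command itself       */
} log_entry_t;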

If the top-level card's log contains more entries for successful
updates made since the last audit cycle than the target card's log, the
top-level card has processed update commands that the target card
missed.  Moreover, the target card will know exactly which commands it
missed, and it executes those commands on itself (albeit in a
time-delayed manner).  Upon success, both the top-level card and the
target card will delete these entries from the log files that they are
maintaining.  In case of a failure, these entries will not be deleted,
and hopefully the reconciliation will take place in the next audit
cycle.
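
The comparison on the target card would then go something like this
(a sketch; log handling and replay are stubbed out with placeholder
names):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Placeholders for the target card's local facilities. */
extern bool local_log_contains(uint32_t corr_tag);
extern bool replay_update(uint32_t corr_tag);           /* re-execute a missed command      */
extern void ack_entry_to_top_level(uint32_t corr_tag);  /* lets both sides purge the entry  */

/* The top-level card sends the correlation tags of all updates it has
 * applied for this card since the last audit cycle.  Any tag missing
 * from our own log is an update we missed and must replay. */
void reconcile(const uint32_t *top_level_tags, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t tag = top_level_tags[i];

        if (local_log_contains(tag) || replay_update(tag)) {
            /* Already applied, or replay succeeded: both sides can purge it. */
            ack_entry_to_top_level(tag);
        }
        /* On failure the entry stays in both logs and is retried next cycle. */
    }
}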

After reconciliation, if there are any intermediate cards between the
top-level card and the target card, the data on those cards will be
blindly overwritten by the top-level card.  This minimizes the risk of
the top-level card (which is really the data master) and the
intermediate cards getting out of sync.
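
The overwrite of the intermediate cards is then just a push of the
master copy down the path, something like (placeholder names again):

#include <stddef.h>

extern size_t       intermediate_card_count(unsigned int target_card);
extern unsigned int intermediate_card(unsigned int target_card, size_t i);
extern void         push_master_copy(unsigned int card);  /* overwrite that card's copy */

/* After the top-level and target cards agree, overwrite every
 * intermediate card on the path with the top-level (master) copy. */
void resync_intermediates(unsigned int target_card)
{
    size_t n = intermediate_card_count(target_card);
    for (size_t i = 0; i < n; i++)
        push_master_copy(intermediate_card(target_card, i));
}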

If anyone can think of other approaches that we can consider, kindly
post them here.

Thanks,
Zahid


Re: Question regarding a data integrity audit procedure



Stop. Do not pass go, do not collect $200. First establish a
predictable, platform-independent method of integrity checking your
data!


Re: Question regarding a data integrity audit procedure


I agree with Lewin in this respect.  You should look for clues in your
data that tell you how to decide whether it is reasonable.  It is quite
possible to have wrong data with perfectly valid checksums.  Range
checking and check digits would help as well.  However, what you do
will depend on your data.
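
For example, a record can carry a perfectly valid checksum and still be
nonsense, so a separate sanity check on the fields catches a different
class of error.  Something like this (the field names and limits are
invented purely for illustration):

#include <stdbool.h>
#include <stdint.h>

/* Illustration only: field names and limits are made up. */
typedef struct {
    uint16_t port_id;
    uint8_t  admin_state;    /* 0 = down, 1 = up */
    uint32_t rate_kbps;
} port_cfg_t;

/* Sanity-check a record independently of any checksum. */
bool port_cfg_is_sane(const port_cfg_t *cfg)
{
    if (cfg->port_id == 0 || cfg->port_id > 512)  return false;
    if (cfg->admin_state > 1)                     return false;
    if (cfg->rate_kbps > 10000000u)               return false;  /* 10 Gbit/s cap */
    return true;
}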

Merry Christmas to you Lewin by the way.

Re: Question regarding a data integrity audit procedure

A note on terminology.  An audit is a procedure to independently
*test* an assertion, in this case that multiple copies of your data
are consistent.  The primary result of an audit is a report of
discrepancies.  What you describe later is a synchronization technique,
whose primary result is consistent data.


The timeout fallback seems to be an area that might cause
discrepancies.  Can you flag the update as unverified and, when the
card is restored, test and optionally reload the data in question?
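
Something along these lines, perhaps (a sketch; the names are invented):

#include <stdbool.h>
#include <stdint.h>

/* When the target card does not answer, record the update as unverified
 * instead of silently trusting it.  When the card comes back, re-check
 * (or reload) just the flagged data. */
typedef struct {
    uint32_t corr_tag;
    bool     verified;       /* false until the target confirms it */
} update_status_t;

extern bool target_confirms(uint32_t corr_tag);   /* query the restored card  */
extern void reload_data_for(uint32_t corr_tag);   /* push the data down again */

void reverify_after_restore(update_status_t *tbl, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        if (tbl[i].verified)
            continue;
        if (target_confirms(tbl[i].corr_tag))
            tbl[i].verified = true;
        else
            reload_data_for(tbl[i].corr_tag);     /* confirm on a later pass */
    }
}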


The periodic checksum comparison is worthwhile.  It doesn't restore
consistency per se, but it is a good test of whether consistency
exists.  As others noted, define a processor-independent signature
procedure.  I would want to use something more robust than a checksum,
perhaps a 32-bit CRC.
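
For instance, the usual reflected CRC-32 (polynomial 0xEDB88320),
computed a byte at a time so the host CPU's word size and endianness
cannot affect the result.  A sketch:

#include <stdint.h>
#include <stddef.h>

/* Bytewise CRC-32 (IEEE 802.3, reflected, polynomial 0xEDB88320).
 * Working on the data one byte at a time keeps the result identical
 * across processors. */
uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}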


Update logging is useful for determining what data isn't updated, and
thus helps restore consistency after a partial update, but it isn't as
reliable as a signature of the data itself, such as a CRC, because an
update can be misapplied.

I think the signature method is good because of its reliability.  You
can extend it to compute signatures over sections of the data to
determine which areas contain mismatches.  That may be easier to do
than tracking updates.
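
For example, divide the data into fixed-size blocks and keep one CRC per
block; comparing the two cards' block-CRC lists narrows any mismatch
down to the blocks that actually differ.  A sketch, using a crc32() like
the one above (names are illustrative):

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 1024u

extern uint32_t crc32(const uint8_t *data, size_t len);   /* as sketched above */

/* Compute one CRC per fixed-size block of the data.  Mismatching blocks
 * between two cards point to the regions that need to be resynchronized. */
size_t block_signatures(const uint8_t *data, size_t len,
                        uint32_t *sigs, size_t max_sigs)
{
    size_t n = 0;
    for (size_t off = 0; off < len && n < max_sigs; off += BLOCK_SIZE, n++) {
        size_t chunk = (len - off < BLOCK_SIZE) ? (len - off) : BLOCK_SIZE;
        sigs[n] = crc32(data + off, chunk);
    }
    return n;   /* number of block signatures written */
}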

--
Thad
