Embedding a Checksum in an Image File

Rick C

Thu, Apr 20, 2023 2:06 AM

This is a bit of a chicken-and-egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, which you then need to anticipate further... ad infinitum.

I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.

I keep thinking there is a different way of looking at this to achieve the result I want...

Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file in a way that changes X to Y. Given the properties of the modulo N checksum, this would appear to be impossible in the general case, unless... Add another data value, called a checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum to give zero, leaving the original checksum intact.
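The normalizer idea can be sketched for a modulo 2^16 sum. This is a minimal sketch, assuming the image is summed as 16-bit little-endian words and that two word-aligned slots (the hypothetical offsets `ck_off` and `norm_off`) are reserved for the checksum and the normalizer:

```python
import struct

MOD = 1 << 16


def word_sum(image: bytes) -> int:
    """Modulo-2^16 sum of the image taken as 16-bit little-endian words."""
    if len(image) % 2:
        raise ValueError("image length must be even")
    total = 0
    for (word,) in struct.iter_unpack("<H", image):
        total = (total + word) % MOD
    return total


def embed_checksum(image: bytes, ck_off: int, norm_off: int) -> bytes:
    """Store the checksum X at ck_off and the normalizer (-X) mod 2^16 at norm_off."""
    buf = bytearray(image)
    # Zero both slots first, so X is the checksum of everything else.
    struct.pack_into("<H", buf, ck_off, 0)
    struct.pack_into("<H", buf, norm_off, 0)
    x = word_sum(buf)
    struct.pack_into("<H", buf, ck_off, x)
    struct.pack_into("<H", buf, norm_off, (-x) % MOD)
    return bytes(buf)


# A toy 8-byte image: checksum slot at offset 2, normalizer slot at offset 6.
image = bytes([0x34, 0x12, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00])
patched = embed_checksum(image, ck_off=2, norm_off=6)
stored = struct.unpack_from("<H", patched, 2)[0]
print(hex(stored), hex(word_sum(patched)))  # 0x1233 0x1233
```

Because the checksum and the normalizer cancel modulo 2^16, the sum of the patched image equals the stored value. With a sum of bytes instead of words the two 16-bit fields would not cancel in general, which is the carry problem raised later in the thread.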

This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.

Niklas Holsti

Thu, Apr 20, 2023 9:14 AM

Some decades ago I was involved with a project for an 8052-based device, which was required to perform a code checksum check at boot.

We decided to use a byte-by-byte XOR checksum and make the correct checksum be zero. We had a code module (possibly in assembler, I don't remember) that defined a one-byte "adjustment" constant in code memory. For each new version of the code, we first set the adjustment constant to zero, then ran the program, and it usually reported an error at boot because the checksum was not zero. We then changed the adjustment constant to the actual reported checksum, C say, and that zeroed the checksum because C xor C = 0. Bingo. You can use this method to make the checksum anything you like, for example hex 55.
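The adjustment-byte procedure can be simulated in a few lines. A minimal sketch, assuming the image is available as a byte array and the adjustment constant sits at a known offset (`adj_off`, a hypothetical parameter); for a target other than zero the adjustment becomes C xor target:

```python
from functools import reduce


def xor8(data) -> int:
    """Byte-by-byte XOR checksum."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def adjustment_for(image: bytearray, adj_off: int, target: int = 0x00) -> int:
    """Value for the adjustment byte at adj_off that makes xor8(image) == target."""
    saved = image[adj_off]
    image[adj_off] = 0       # build with the adjustment constant set to zero...
    c = xor8(image)          # ...and read back the checksum C reported at boot
    image[adj_off] = saved
    return c ^ target        # C xor C = 0, so C xor target gives target


image = bytearray(b"\x75\x81\x30\x12\x00\x00\x80\xFE")  # adjustment byte at offset 5
image[5] = adjustment_for(image, 5)
print(hex(xor8(image)))                 # 0x0
image[5] = adjustment_for(image, 5, target=0x55)
print(hex(xor8(image)))                 # 0x55
```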

With a more advanced, order-sensitive checksum such as a CRC, you could use the same method if you also ensure (by linker commands) that the adjustment value is always the last value that enters the computed checksum (assuming that the linking order of the other code modules is not incidentally changed when the value of the adjustment constant is changed).

Peter Heitzer

Thu, Apr 20, 2023 11:30 AM

What about putting the following structure at a fixed address at the end of ROM: <startaddr><len><checksum>?

Your check function then, for example, does a 16-bit sum of the bytes from <startaddr>..<startaddr>+<len>-1 and compares it with <checksum>.

<startaddr>, <len> and <checksum> can be evaluated at compile time.
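As a sketch of this suggestion, a host-side build step can fill the trailer and a check routine can re-sum the range. The field widths, byte order, ROM size and 0xFF fill below are assumptions, not part of the original suggestion:

```python
import struct

# Hypothetical trailer layout: 32-bit startaddr, 32-bit len, 16-bit checksum, little-endian.
TRAILER = struct.Struct("<IIH")


def sum16(data: bytes) -> int:
    """16-bit sum of bytes."""
    return sum(data) & 0xFFFF


def build_rom(code: bytes, rom_size: int, base: int = 0) -> bytes:
    """Place the code at the start of ROM and the trailer at the very end."""
    if len(code) + TRAILER.size > rom_size:
        raise ValueError("code and trailer do not fit in ROM")
    rom = bytearray(b"\xFF" * rom_size)  # assume unused flash reads as 0xFF
    rom[: len(code)] = code
    trailer_off = rom_size - TRAILER.size
    TRAILER.pack_into(rom, trailer_off, base, len(code), sum16(code))
    return bytes(rom)


def check_rom(rom: bytes, base: int = 0) -> bool:
    """Re-sum <startaddr>..<startaddr>+<len>-1 and compare with <checksum>."""
    start, length, stored = TRAILER.unpack_from(rom, len(rom) - TRAILER.size)
    off = start - base  # convert the address into an offset within the image
    return sum16(rom[off : off + length]) == stored
```

Since the trailer lies outside the summed range, storing the checksum does not disturb it.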

dalai lamah

Thu, Apr 20, 2023 11:47 AM

One fine day Rick C typed:

I'm probably not understanding what you mean, but normally the checksum is stored in a memory section which is not subjected to the checksum calculation itself.

The actual implementation depends on the tools you are using. Many linkers support this directly: you specify the memory section(s) subjected to checksum calculation, the type of checksum (CRC16, CRC32 etc) and the memory section that will store the checksum.

Here is a technical note for IAR:

formatting link

A "poor man's" solution is to do it manually:

- In the source code, declare your checksum, initialized to a known, fixed value (e.g. 0xDEADBEEF).

- Run the program with a debugger; set a breakpoint where it calculates the checksum (and fails), and write down the correct checksum.

- Using a binary editor, find the fixed value in the executable binary, and replace it with the correct value.
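The binary-editor step can also be scripted. A minimal sketch, assuming the placeholder is a 32-bit little-endian word that occurs exactly once in the image, and that the firmware computes a zlib-compatible CRC-32 over the image with that word left out; the debugger step is only skippable if the firmware really computes the same thing:

```python
import struct
import zlib

# Assumed: a 32-bit little-endian slot initialized to 0xDEADBEEF in the source.
PLACEHOLDER = struct.pack("<I", 0xDEADBEEF)


def patch_checksum(image: bytes) -> bytes:
    """Replace the placeholder with a CRC-32 of the rest of the image."""
    off = image.find(PLACEHOLDER)
    if off < 0 or image.find(PLACEHOLDER, off + 1) >= 0:
        raise ValueError("placeholder must occur exactly once")
    # The slot itself is excluded, like a checksum kept outside the checked region.
    crc = zlib.crc32(image[:off] + image[off + 4 :])
    return image[:off] + struct.pack("<I", crc) + image[off + 4 :]
```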

Rick C

Thu, Apr 20, 2023 1:04 PM

Yes, I didn't explain it clearly. I am not looking for a way to calculate the checksum from a processor. That would be trivial. I want to embed the checksum in the code, so that it can be provided at run time as an ID, a way to validate the version number.

I wish to perform this checksum on the executable file.

Yeah, this is not useful, because changing the value stored changes the checksum. It also makes assumptions about the target.

Maybe this was not the best group to ask the question in. When I started writing the question, I thought this was more of a math problem, and that the embedded community had already dealt with it.

Rick C

Thu, Apr 20, 2023 1:18 PM

Yes, it had occurred to me that a simple checksum could be used with adjustment codes. But I don't want the checksum to be set to some value in this way. I would like to embed the checksum generated from the file. The way to do this is to embed the checksum in the spot where it can be read for reporting. Then another value can be embedded elsewhere that complements the checksum, keeping the file checksum constant.

Your mention of the XOR checksum makes me realize that if I use addition rather than XOR, a 16-bit checksum only has a complement if the data used in the calculation are 16-bit quantities. If the 16-bit checksum is calculated using 8-bit data, there will be a carry out of the lower 8 bits, changing the final checksum. The XOR checksum is really the equivalent of 8 separate bit-level checksums. This has the shortcoming that a one-bit change is detected, but two changes in the same bit position of two different bytes are not. But since I'm not trying to protect against changes, this isn't really a problem. I'm using this as a verification of the version number.
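The XOR weakness is easy to demonstrate: a single flipped bit changes the checksum, but flipping the same bit position in two different bytes cancels out. A short illustration on arbitrary test data:

```python
from functools import reduce


def xor8(data) -> int:
    """Byte-by-byte XOR checksum: effectively 8 independent one-bit parity checks."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


original = bytes(range(16))

one_flip = bytearray(original)
one_flip[0] ^= 0x08          # flip bit 3 of byte 0

two_flips = bytearray(one_flip)
two_flips[5] ^= 0x08         # flip bit 3 of byte 5 as well

print(xor8(original) != xor8(one_flip))   # True: single-bit change detected
print(xor8(original) == xor8(two_flips))  # True: the pair goes unnoticed
```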

David Brown

Thu, Apr 20, 2023 2:46 PM

I am not sure what your intended use-case is here. But it is very common to add a checksum of some sort to binary image files after generating them. This is done post-link. You have a struct in your read-only data that you link at a known fixed point in the binary. Your post-link patcher can read this struct (for example, to get the program version number that is then used to rename the final image file). It can modify the struct (such as inserting the length of the image). Then it calculates a CRC and appends it to the end of the image.
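A post-link patcher along these lines might look like the sketch below. The header offset, the field layout, the magic value and the output file name are all hypothetical; the CRC is zlib's CRC-32, appended little-endian:

```python
import struct
import zlib

# Hypothetical header linked at a fixed offset: magic, version, length (u32, little-endian).
HEADER_OFF = 0x100
HEADER = struct.Struct("<III")
MAGIC = 0x31415926


def post_link(image: bytes) -> tuple[str, bytes]:
    """Read the version, fill in the length, append a CRC-32; return (name, image)."""
    buf = bytearray(image)
    magic, version, _ = HEADER.unpack_from(buf, HEADER_OFF)
    if magic != MAGIC:
        raise ValueError("no header at the expected offset")
    HEADER.pack_into(buf, HEADER_OFF, magic, version, len(buf))  # length excludes the CRC
    crc = zlib.crc32(buf)
    return f"firmware_v{version}.bin", bytes(buf) + struct.pack("<I", crc)
```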

George Neuner

Thu, Apr 20, 2023 3:33 PM

Take a look at the old xmodem/ymodem CRC. It was designed such that when the CRC was sent immediately following the data, a receiver computing CRC over the whole incoming packet (data and CRC both) would get a result of zero.

But AFAIK it doesn't work with CCITT equation(s) - you have to use xmodem/ymodem.

Sorry, I don't know a way to do it with a modular checksum. YMMV, but I think 16-bit CRC is pretty simple.

George

Rick C

Thu, Apr 20, 2023 4:45 PM

CRC is not complicated, but I would not know how to calculate an inserted value to force the resulting CRC to zero. How do you do that?

Even so, I'm not trying to validate the file. I'm trying to come up with a substitute for a time stamp or version number. I don't want to have to rely on my consistency in handling the version number correctly. This would be a backup in case more than one differing version was released, even if only within the "lab". A checksum that could be read by the controlling software would do the job.

I have run into this before, where the version number was not a 100% indication of the uniqueness of an executable. The checksum would be a second indicator.

I should mention that I'm not looking for a solution that relies on any specific details of the tools.

Tauno Voipio

Thu, Apr 20, 2023 5:17 PM

The method to check for a proper constant value after the whole block and CRC are received and put through the generator works with the CRC-CCITT (actually ITU-T). The proper final value depends on the initial CRC and whether the CRC is inverted before sending. The limitation is that the CRC has to be sent least significant octet first.

For a reference, see RFC1662, Appendix C.
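For reference, the FCS-16 from RFC 1662 Appendix C can be transcribed as below. This sketch uses a bitwise loop rather than the RFC's 256-entry table (the two compute the same value); the constants are those defined in the RFC:

```python
# Constants as defined in RFC 1662, Appendix C.
PPPINITFCS16 = 0xFFFF  # initial FCS value
PPPGOODFCS16 = 0xF0B8  # result over frame + FCS when the frame is intact


def pppfcs16(fcs: int, data: bytes) -> int:
    """FCS-16 update, bit by bit (the RFC uses an equivalent lookup table)."""
    for b in data:
        fcs ^= b
        for _ in range(8):
            fcs = ((fcs >> 1) ^ 0x8408) if fcs & 1 else (fcs >> 1)
    return fcs


def append_fcs(frame: bytes) -> bytes:
    """Complement the FCS and append it least significant octet first."""
    fcs = pppfcs16(PPPINITFCS16, frame) ^ 0xFFFF
    return frame + bytes([fcs & 0xFF, fcs >> 8])


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver check: run the FCS over frame and FCS together."""
    return pppfcs16(PPPINITFCS16, frame_with_fcs) == PPPGOODFCS16
```

The good value is the constant 0xF0B8 rather than zero because the FCS is complemented before it is sent, which matches the point that the final value depends on the initial value and on the inversion.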

David Brown

Thu, Apr 20, 2023 8:26 PM

You "insert" the value at the end. Anything else is insane.

CRC's are quite good hashes, for suitable sized data. There are perhaps some special cases, but basically you'd be doing trial-and-error searches to find an inserted value that gives you a zero CRC overall.

2^16 is not an overwhelming search space, but the whole idea is pointless.
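To put a number on that search space, the sketch below finds by trial and error the 16-bit value at an arbitrary offset (the hypothetical `off` parameter) that makes a CRC-16/XMODEM over the whole image zero. It assumes a big-endian 16-bit field; the prefix is processed once, so each trial only re-runs the CRC over the field and the bytes after it. Exactly one value works at any position, because a degree-16 CRC detects every error burst of 16 bits or fewer, so no two values of the field can give the same result:

```python
def make_table(poly: int = 0x1021) -> list[int]:
    """Lookup table for a non-reflected CRC-16."""
    table = []
    for i in range(256):
        crc = i << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
        table.append(crc & 0xFFFF)
    return table


TABLE = make_table()


def crc16_xmodem(data: bytes, crc: int = 0x0000) -> int:
    """CRC-16/XMODEM, continuing from a given register value."""
    for b in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[((crc >> 8) ^ b) & 0xFF]
    return crc


def find_patch(image: bytes, off: int) -> int:
    """Search for the 16-bit value at image[off:off + 2] that zeroes the CRC."""
    prefix_state = crc16_xmodem(image[:off])  # computed only once
    suffix = image[off + 2 :]
    for value in range(1 << 16):
        state = crc16_xmodem(value.to_bytes(2, "big"), prefix_state)
        if crc16_xmodem(suffix, state) == 0:
            return value
    raise ValueError("no value found")  # not expected for a CRC-16
```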

A CRC is fine for that.

A table-based CRC is easy, runs quickly, and can be quickly ported to pretty much any language (the C and Python code, for example, is almost the same).

George Neuner

Thu, Apr 20, 2023 8:44 PM

It's implicit in the equation they chose. I don't know how it works - just that it does.

You have some block of data |....data....|

You compute CRC on the data block and then append the resulting value to the end of the block. xmodem CRC is 16-bit, so it adds 2 bytes to the data.

So now you have a new extended block |....data....|crc|

Now if you compute a new CRC on the extended block, the resulting value /should/ come out to zero. If it doesn't, either your data or the original CRC value appended to it has been changed/corrupted.
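The reason it works: with CRC-16/XMODEM parameters (polynomial 0x1021, initial value 0, no bit reflection, no final XOR), the CRC is the remainder of the message, shifted by 16 bits, modulo the generator polynomial. Appending that remainder (high byte first) makes the extended message an exact multiple of the generator, so its remainder is zero. A minimal bitwise sketch:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: poly 0x1021, init 0, MSB-first, no final XOR."""
    crc = 0x0000
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


block = b"....data...."
crc = crc16_xmodem(block)
extended = block + crc.to_bytes(2, "big")  # |....data....|crc|
print(hex(crc16_xmodem(extended)))         # 0x0
```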

I've actually done this: in the early 90s I designed a system that used a CRC based scheme to identify load modules and track inter-module code dependencies.

I computed both 16-bit Xmodem and CCITT CRCs on the modules and concatenated the two values into a 32-bit identifier. That identifier then was used to sign the module and to demand load (or unload) it when needed.
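A 32-bit identifier in the spirit of this scheme could be sketched as below. Which "CCITT" variant was combined with XMODEM is not stated; this sketch assumes CRC-16/KERMIT, the bit-reflected form of the same polynomial. A variant that differed from XMODEM only in its initial value would add little, since for inputs of equal length the two results always differ by the same constant:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: poly 0x1021, init 0, MSB-first, no final XOR."""
    crc = 0x0000
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def crc16_kermit(data: bytes) -> int:
    """CRC-16/KERMIT: poly 0x1021 reflected (0x8408), init 0, LSB-first, no final XOR."""
    crc = 0x0000
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = ((crc >> 1) ^ 0x8408) if crc & 1 else (crc >> 1)
    return crc


def module_id(code: bytes) -> int:
    """Concatenate the two 16-bit CRCs into one 32-bit identifier."""
    return (crc16_xmodem(code) << 16) | crc16_kermit(code)


print(f"{module_id(b'123456789'):08X}")  # 31C32189
```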

At the time it worked quite well: the system had quite limited memory, so code modules were small enough that even a 16-bit CRC could uniquely identify most/all of them. Combining the two different CRCs into a 32-bit identifier provided more than enough uniqueness, it was fast and easy to compute, and it saved a lot of space vs using something with stronger guarantees like a UUID or crypto-strength signing hash. [A lot of the hashing functions available today either didn't exist or just weren't widely known back then. And still most of them that even have 32-bit variants are weak in guarantees for those variants.]

I made it the basis of dependency checking. Version numbers were secondary and for the benefit of the programmer.

YMMV. George

George Neuner

Thu, Apr 20, 2023 8:49 PM

I remember seeing an explanation of it decades ago, but I never would have been able to find it again.

Thanks, George

Richard Damon

Fri, Apr 21, 2023 2:09 AM

If I understand you correctly, what you want is for the file to compute to some "checksum" that comes from the basic contents of the file, and then you want to add the "checksum" into the file so the program itself can print its checksum.

One fact to remember is that "cryptographic hashes" were invented because it was too easy to create a faked file that matches a non-cryptographic hash/checksum. Such a checksum couldn't be the key to making sure you really had the right file in the presence of a determined enemy, but the checksums were good enough to catch "random" errors.

This means that you can add the checksum into the file, plus some additional bytes (likely at the end), and, knowing the properties of the checksum algorithm, compute a value for those extra bytes such that they "undo" the changes caused by adding the checksum bytes to the file.

I'm not sure exactly how to compute these, but the key is that you add something at the end of the file to get the checksum back to what the original file had before you added the checksum into the file.

Rick C

Fri, Apr 21, 2023 2:41 AM

Yeah, for a simple checksum, I think that would be easy, at least if "checksum" means a bitwise XOR operation. If the checksum and extra bytes are both 16 bits, this would also work for an arithmetic checksum where each 16-bit word is added into the checksum. All the carries from adding the inserted checksum and its 2's complement would cascade out of the upper 16 bits.

I don't even want to think about using a CRC to try to do this.

Brian Cockburn

Fri, Apr 21, 2023 8:53 AM

David Brown

Fri, Apr 21, 2023 10:43 AM

George's earlier posts made it look like the algorithm was inserting ("embedding") a value somewhere inside the image, so that the CRC over the modified image was zero. This is easy to do for simple checksums such as XOR's or a sum-of-bytes checksum, but infeasible for CRC's.

It is a much easier matter when appending the checksum. Depending somewhat on the details of the CRC (such as bit/byte reversals, inversions, starting values, etc.) it is typically the case that for a binary blob A, crc(A ++ crc(A)) = 0. i.e., if you append the CRC of your data to the data, the CRC of the whole thing is 0.

Of course, this is pretty much irrelevant - whether you check the integrity of the final image by running CRC over it all and comparing to 0, or running it over all but the last word and comparing to the last word, is a minor matter.
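Both checks are easy to try with zlib's CRC-32. Its bit reflection and final inversion mean the appended value goes least significant byte first, and running the CRC over data plus CRC gives the fixed constant 0x2144DF1C rather than 0, an example of how the result depends on the details of the CRC. A minimal sketch:

```python
import struct
import zlib


def append_crc32(blob: bytes) -> bytes:
    """Append zlib's CRC-32, least significant byte first."""
    return blob + struct.pack("<I", zlib.crc32(blob))


image = append_crc32(b"program image bytes")

# Check 1: CRC over all but the last word, compared with the last word.
print(zlib.crc32(image[:-4]) == struct.unpack("<I", image[-4:])[0])  # True

# Check 2: CRC over everything, compared with the fixed residue for this CRC.
print(hex(zlib.crc32(image)))  # 0x2144df1c
```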

No, that is pointless for something like a code image. It just adds needless complexity to your CRC algorithm.

You should already have checks that would eliminate an all-zero image or other "obviously corrupt" data. You'll be checking the image for a key or "magic number" that identifies the image as "program image for board X, project Y". You'll be checking version numbers. You'll be reading the length of the image so you know the range for your CRC function, and where to find the appended CRC check. You might not have all of these in a given system, but you'll have some kind of check which would fail on an all-zero image.

Again, pointless.

Salt is important for security-related hashes (like password hashes), not for integrity checks.

"Integrity" has nothing to do with the motivation for change. /Security/ is concerned with intentional modifications that deliberately attempt to defeat /integrity/ checks. Integrity is about detecting any changes.

If you are concerned about the possibility of intentional malicious changes, CRC's alone are useless. All the attacker needs to do after modifying the image is calculate the CRC themselves, and replace the original checksum with their own.

Using non-standard algorithms for security is a simple way to get things completely wrong. "Security by obscurity" is very rarely the right answer. In reality, good security algorithms, and good implementations, are difficult and specialised tasks, best left to people who know what they are doing.

To make something secure, you have to ensure that the check algorithms depend on a key that you know, but that the attacker does not have. That's the basis of digital signatures (though you use a secure hash algorithm rather than a simple CRC).
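The difference can be shown in a few lines. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in for a keyed check; note that it is a symmetric MAC, not the public-key digital signature described above, so the verifier would also need the secret key. The key and image contents are made up for illustration:

```python
import hashlib
import hmac
import struct
import zlib


def verify_crc(image: bytes, tag: bytes) -> bool:
    """Integrity check only: anyone can compute a matching tag."""
    return struct.pack("<I", zlib.crc32(image)) == tag


def verify_hmac(key: bytes, image: bytes, tag: bytes) -> bool:
    """Keyed check: a matching tag requires the secret key."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


tampered = b"modified firmware image"

# The attacker recomputes the CRC of the modified image, and it verifies.
forged_crc = struct.pack("<I", zlib.crc32(tampered))
print(verify_crc(tampered, forged_crc))  # True

# Without the vendor's key, the attacker cannot produce a matching HMAC.
vendor_key = b"hypothetical secret key"
forged_mac = hmac.new(b"attacker's guess", tampered, hashlib.sha256).digest()
print(verify_hmac(vendor_key, tampered, forged_mac))  # False
```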

Rick C

Fri, Apr 21, 2023 12:12 PM

This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same.

The less work involved, the better.

David Brown

Fri, Apr 21, 2023 2:50 PM

No - but "code images" is the topic here.

However, in almost every case where CRC's might be useful, you have additional checks of the sanity of the data, and an all-zero or all-one data block would be rejected. For example, Ethernet packets use CRC for integrity checking, but an attempt to send a packet type 0 from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would be rejected anyway.

I can't think of any use-cases where you would be passing around a block of "pure" data that could reasonably take absolutely any value, without any type of "envelope" information, and where you would think a CRC check is appropriate.

See above.

Do you actually know what is meant by "salt" in the context of hashes, and why it is useful in some circumstances? Do you understand that "salt" is added (usually prepended, or occasionally mixed in in some other way) to the data /before/ the hash is calculated?

I have not given the slightest indication to suggest that "0" is a special value. I fully agree that the value you get from the checking algorithm does not have to be 0 - I already suggested it could be compared to the stored value. I.e., you build your image file as "data ++ crc(data)", and check it by re-calculating "crc(data)" on the received image and comparing the result to the received crc. There is no necessity or benefit in having a crc calculated over the received data plus the received crc come out as 0.

"Salt" is used in cases where the original data must be kept secret, and only the hashes are transmitted or accessible - by adding salt to the original data before hashing it, you avoid a direct correspondence between the hash and the original data. The prime use-case is to stop people being able to figure out a password by looking up the hash in a list of pre-computed hashes of common passwords.

You are making no sense at all. Are you suggesting that it would be a good idea to add some value to the start of the image so that the resulting crc calculation gives a nice recognisable product code? This "salt" would be different for each program image, and calculated by trial and error. If you want a product code, version number, etc., in the program image (and it's a good idea), just put these in the program image!

Accidental changes (such as human error, noise during data transfer, memory cell errors, etc.) do not pass integrity tests unnoticed. To be more accurate, the chances of them passing unnoticed are of the order of 1 in 2^n, for a good n-bit check such as a CRC check. Certain types of error are always detectable, such as single and double bit errors. That is the point of using a checksum or hash for integrity checking.

/Intentional/ changes are a different matter. If a hacker changes the program image, they can change the transmitted hash to their own calculated hash. Or for a small CRC, they could change a different part of the image until the original checksum matched - for a 16-bit CRC, that only takes 65,535 attempts in the worst case.

That is why you need to distinguish between the two possibilities. If you don't have to worry about malicious attacks, a 32-bit CRC takes a dozen lines of C code and a 1 KB table, all running extremely efficiently. If security is an issue, you need digital signatures - an RSA-based signature system is orders of magnitude more effort in both development time and in run time.
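For scale, here is a table-driven CRC-32 (the common reflected form with polynomial 0xEDB88320, as used by zlib and Ethernet). It is written in Python rather than C, but the structure is the same: a 256-entry table of 32-bit values (1 KB) and a short loop. Comparing it with zlib.crc32 is a simple way to validate a port:

```python
import zlib


def make_crc32_table() -> list[int]:
    """256 entries of 32 bits (1 KB in C) for the reflected polynomial 0xEDB88320."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = ((c >> 1) ^ 0xEDB88320) if c & 1 else (c >> 1)
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """CRC-32 with pre- and post-inversion; crc allows continuing a running value."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ CRC32_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32(b"123456789")), hex(zlib.crc32(b"123456789")))  # 0xcbf43926 0xcbf43926
```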

And what, exactly, do you think that anecdote tells us about CRC checks for image files? It reminds us that we are all fallible, but does no more than that.

Security by obscurity is not security. Having a hidden signature or other mark can be useful for proving ownership (making an intentional mistake is another common tactic - such as commercial maps having a few subtle spelling errors). But that is not security.

Any serious manufacturer /does/ invite challenges to their security.

There are multiple reasons why a manufacturer (such as a semiconductor manufacturer) might be guarded about the details of their security systems. They can be avoiding giving hints to competitors. Maybe they know their systems aren't really very secure, because their keys are too short or they can be read out in some way.

But I think the main reasons are often:

They want to be able to change the details, and that's far easier if there are only a few people who have read the information.

They don't want endless support questions from amateurs.

They are limited by idiotic government export restrictions made by ignorant politicians who don't understand cryptography.

Some things benefit from being kept hidden, or under restricted access. The details of the CRC algorithm you use to catch accidental errors in your image file is /not/ one of them. If you think hiding it has the remotest hint of a benefit, you are doing things wrong - you need a /security/ check, not a simple /integrity/ check.

And then once you have switched to a security check - a digital signature - there's no need to keep that choice hidden either, because it is the /key/ that is important, not the type of lock.

David Brown

Fri, Apr 21, 2023 3:02 PM

Run a simple 32-bit crc over the image. The result is a hash of the image. Any change in the image will show up as a change in the crc.
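For the use-case described earlier (telling builds apart regardless of the version number), a host-side tool that reports such an identifier for an image file might be as simple as the sketch below. It uses zlib's CRC-32, reading the file in chunks; the script and its command-line use are illustrative:

```python
import sys
import zlib


def image_id(path: str, chunk_size: int = 64 * 1024) -> int:
    """CRC-32 of a whole file, read in chunks, for use as a build identifier."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{image_id(path):08X}  {path}")
```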