Embedding a Checksum in an Image File

I have seen 16-bit checksums done both ways. Summing 16-bit units does eliminate the issue of clustering, and it makes adjacent byte swaps detectable.
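To make the difference concrete, here is a minimal C sketch (the function names and the big-endian unit order are my own choices):

#include <stdint.h>
#include <stddef.h>

/* 8-bit additive checksum: reordering the bytes never changes the sum,
   so a swap of two adjacent bytes goes undetected. */
uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--)
        s += *p++;
    return s;
}

/* 16-bit additive checksum over big-endian 16-bit units (n assumed even):
   swapping the two bytes inside a unit changes that unit's value, so the
   swap is detected unless the two bytes happen to be equal. */
uint16_t sum16(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        s += (uint16_t)((p[i] << 8) | p[i + 1]);
    return s;
}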

Reply to
Richard Damon

If you don't understand, you are making this far more complicated than it is. I don't know what to tell you. There are no other details that are relevant. Don't read into this what is not there.

Why? What makes a CRC an "appropriate" choice? Normally, when I design something, I establish the requirements. What requirements are you assuming that would make a CRC more desirable than a simple checksum?

What have I said that makes you think security is an issue??? I don't recall ever mentioning anything about security. Do you recall what I did say?

The fact that they are "simple and efficient" is not a reason to use them. I repeat, what are the requirements?

I suppose there is that possibility. But when people make claims about something being good or "better", without substantiation, there's not much to learn.

If you think a discussion of CRC calculations would be useful, why don't you open a thread and discuss them, instead of insisting they are the right solution to my problem, when you don't even know what the problem requirements are? It's all here in the thread. You only need to read, without projecting your opinions on the problem statement.

Reply to
Rick C

Long ago, there used to be a definite risk of mixing up endianness when dealing with program images burned to flash or EEPROM. Popular "hex" formats like Intel Hex and Motorola S-Record could differ in endianness. So byte swaps in the entire image were a real possibility, and good to guard against. But it's hard to imagine how an individual byte swap could occur - I see bigger movements and re-arrangements as being more likely, and using 16-bit units will not help much there. Still, I think there is little doubt that using 16-bit units is better than using 8-bit units in many ways (except for efficient implementation on small 8-bit devices).

Reply to
David Brown

I've already explained this in quite a lot of detail in this thread (as have others). If you don't like my explanation, or didn't read it, that's okay. You are under no obligation to learn about CRCs. Or if you prefer to look it up in other sources, that's obviously also an option.

I've asked you this before - are you /sure/ you understand how Usenet works?

Reply to
David Brown

Hmmm... I ask you a question about why you think CRC is better for my application and you respond oddly. So you can't explain why the CRC would be better for my application? OK, thanks anyway.

I will say this again: rather than burying your comments on CRC in this thread about checksums, why not open a new thread and allow the world to read what you have to say, instead of commenting as a side topic in a thread where most people tuned out long ago? You can use an appropriate subject line like "Why CRC is better than checksums for some applications".

Or you can continue to muddy up the waters here by discussing something that is of no value in this application.

Reply to
Rick C

I am assuming that the CRC is there to determine the integrity of the data in the face of possible unintentional errors. That's what CRC checks are for. They have nothing to do with the content of the data, or the type of the data package or image.

As an example of the use of CRCs in messaging, look at Ethernet frames.

The CRC does not care about the content of the data it protects.
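The CRC used by Ethernet is CRC-32 (polynomial 0x04C11DB7, commonly implemented in reflected form as 0xEDB88320). A bit-at-a-time version fits in a few lines of C - this is for illustration, not an optimized implementation:

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 (IEEE 802.3): initial value 0xFFFFFFFF, reflected
   polynomial, final complement. */
uint32_t crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

The bytes go in one end and 32 check bits come out the other; nothing in the algorithm depends on what the bytes mean.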

I'm sorry, I simply cannot see your point. Identifying the version of a protocol, or other protocol type information, is a totally orthogonal task to ensuring the integrity of the data. The concepts should be handled separately.

Can we agree that this is called an "initial value", not "salt" ?

No. They are different things.

An important difference is that adding "salt" to a password hash is an important security feature. Picking a different initial value for a CRC instead of having appropriate protocol versioning in the data (or a surrounding envelope) is a misfeature.

The second difference is the purpose of the hashing. The CRC here is for data integrity - spotting mistakes in the data during transfer or storage. The hash in a password is for security, avoiding the password ever being transmitted or stored in plain text.

Any coincidence in the way these might be implemented is just that - coincidence.

I looked up RMI on Wikipedia before asking, and saw nothing of relevance to CRCs or checksums. I noticed no mention of "OCL" in your posts, and looking it up on Wikipedia gives no clues.

So for now, I'll assume you don't want anyone to know what you meant and I can safely ignore anything you write in connection with the terms.

I did. Zero benefit.

Actually, it is worse than useless - it makes it harder to identify the protocol, and reduces the information content of the CRC check.

A /single/ result does not matter (other than needlessly complicating things). Having multiple different valid results /does/ matter.

Do you bother to read my posts at all? Or do you prefer to make up things that you imagine I write, so that you can make nonsensical attacks on them? Certainly there is no sane reading of my posts (written and sent from an /open/ OS) where "do not rely on security by obscurity" could be taken to mean "rely on obscured and closed platforms".

Why are you trying to "persuade" me that manufacturer obscurity is a bad thing? You have been promoting obscurity of algorithms as though it were helpful for security - I have made clear that it is not. Are you getting your own position mixed up with mine?

No, you can't. The sources for Linux and *BSD /are/ all freely available. The private signing keys used by, for example, Red Hat or Debian, are /not/ freely available. You cannot make changes to a Red Hat or Debian package that will pass the security checks - you are unable to sign the packages.

This is precisely because something /outside/ the box /is/ involved - the private half of the public/private key used for signing. The public half - and all the details of the algorithms - is easily available to let people verify the signature, but the private half is kept secret.
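To make that split concrete, here is a sketch of detached-signature verification using libsodium (my choice of library for illustration - the distros actually use GPG-based tooling, not this API). Verifying needs only the public key; producing a valid signature in the first place needs the secret key that only the distro holds:

#include <sodium.h>

/* Returns 1 if 'sig' is a valid signature of msg[0..len) under the
   public key 'pk'. Anyone may hold pk and verify; only the holder of
   the matching secret key could have produced sig. */
int package_is_authentic(const unsigned char *msg, unsigned long long len,
                         const unsigned char sig[crypto_sign_BYTES],
                         const unsigned char pk[crypto_sign_PUBLICKEYBYTES])
{
    if (sodium_init() < 0)
        return 0;   /* library failed to initialize */
    return crypto_sign_verify_detached(sig, msg, len, pk) == 0;
}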

(Sorry, but I've skipped and snipped the rest. I simply don't have time to go through it in detail. If others find it useful or interesting, that's great, but there has to be limits somewhere.)

Reply to
David Brown

On 2023-04-20 at 04:06, Rick C wrote:

The proper way to do this is to have a directive in the linker. This reserves space for the CRC and defines the area over which the CRC is calculated. I am not aware of any linker which supports this.

Two months ago, I added the DIGEST directive to the GNU linker in binutils. It was committed, but then people realized that I had not signed an agreement with the Free Software Foundation. Since part of the code I pushed was from a third party which released their code under the MIT license, the licensing has not yet been resolved; the patch is in binutils git, but reverted.

You would write (IIRC): DIGEST "CRC64-ECMA", (from, to) and the linker would reserve 8 bytes which are filled with the CRC in the final link stage.

/Ulf

Reply to
Ulf Samuelsson


In all projects I have been involved with, the application binary starts with a header looking like this.

MAGIC WORD 1
CRC
Entry Point
Size
other info...
MAGIC WORD 2
APPLICATION_START
...
APPLICATION_END   (aligned with flash sector)

The bootloader first checks the two magic words. It then computes the CRC over the region from the header's Entry Point field to APPLICATION_END.
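A sketch of what such a header might look like in C (the field widths, names and magic values are my assumptions, not Ulf's actual layout):

#include <stdint.h>
#include <stddef.h>

#define MAGIC1 0xDEADBEEFu   /* placeholder values */
#define MAGIC2 0xCAFEF00Du

struct app_header {
    uint32_t magic1;        /* MAGIC WORD 1 */
    uint32_t crc;           /* CRC over Entry Point .. APPLICATION_END */
    uint32_t entry_point;
    uint32_t size;
    /* other info... */
    uint32_t magic2;        /* MAGIC WORD 2 */
};

extern uint32_t calc_crc(const uint8_t *p, size_t n);  /* stand-in routine */

/* Bootloader-side check before jumping to the application. */
int app_is_valid(const struct app_header *h, const uint8_t *app_end)
{
    if (h->magic1 != MAGIC1 || h->magic2 != MAGIC2)
        return 0;
    const uint8_t *from = (const uint8_t *)&h->entry_point;
    return calc_crc(from, (size_t)(app_end - from)) == h->crc;
}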

I ported the IAR ielftool (open source) to Linux. This can insert the CRC in the ELF file, but it needs tweaks to work with an ELF file generated by the GNU tools.

/Ulf

Reply to
Ulf Samuelsson

Another thing I added (and which was later removed) was a timestamp directive: a 64-bit integer with the number of seconds since 1970-01-01 00:00.

/Ulf

Reply to
Ulf Samuelsson

That assumes there is a linker. How does the application access this information?

You are making a lot of assumptions about the tools. I'm pretty sure they don't apply to my case. I'm not at all clear how this is workable, anyway. Adding the checksum to the file changes the checksum, which is where this conversation started... unless I'm missing something significant.

Reply to
Rick C

Almost all toolchains have a linker.

In Ulf's suggestion, it seems the DIGEST directive emits 8 bytes of checksum at the current point (usually the linker "." symbol). I assume one can give that point in the image a linkage symbol, perhaps like

_checksum DIGEST "CRC64-ECMA", (from, to)

or like

_checksum EQU. . DIGEST "CRC64-ECMA", (from, to)

(This is schematic linker code, not necessarily proper syntax.)

One can then access the "checksum" location from the application code as an externally defined object, say:

extern uint8[8] checksum;

The linker will connect that C identifier to the actual address of the DIGEST checksum. Here I assumed that the C compiler mangles C identifiers into linkage symbols by prefixing an underscore; YMMV.

But you have insisted that your "checksum" is for the purpose of identifying the version of the program, not for checking the integrity of the memory image. If so, that checksum does not have to be the checksum of the whole memory image, as long as it is the checksum of the part of the image that contains the actual code and constant data, and so will change according to changes in those parts of the image.

Reply to
Niklas Holsti

should be

_checksum EQU .

(Thunderbird inserted an extra period out of "friendliness"...)

should be (my C is rusty):

extern uint8 checksum[8];

Reply to
Niklas Holsti

Linker command file:

public CRC64; start, stop

HEADER = .;
QUAD(MAGIC);
CRC64 = .;
DIGEST "CRC64-ECMA", (start, stop)
start = .;
# Your data to be protected
...
stop = .;

C source code:

extern uint64_t CRC64;
extern char* start;
extern char* stop;

uint64_t crc64;

crc64 = calc_crc64_ecma(start, stop);
if (crc64 == CRC64) {
    /* everything is OK */
}

I am assuming that no tool supports this off the shelf; the patches are inside binutils, but reverted.

/Ulf

Reply to
Ulf Samuelsson

Yes, that is more or less it.

No, you reserve room for the checksum, but that needs to be outside the checked area. The address of the checksum needs to be known to the application, as do the limits of the checked area. That is why the application has a header in front in my projects. The application is started by the bootloader, which checks a number of things before starting it. The application can read the header as well, to allow checking the code area at runtime.

/Ulf

Reply to
Ulf Samuelsson

I'm nit-picking, but that C code does not look right to me. The extern declarations for "start" and "stop" claim them to be names of memory locations that contain addresses, but the linker file just places them at the starting and one-past-end locations of the block to be protected. So the "start" variable contains the first bytes of the "data to be protected", and the contents of the "stop" variable are not defined because it is placed after the "data to be protected", where no code or data is loaded (it seems).

It seems to me that the call to calc_crc64_ecma should get the addresses of "start" and "stop" as arguments (&start, &stop), instead of their values. But perhaps calc_crc64_ecma is not a function, but a macro that can itself take the addresses of its parameters.
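With the symbols placed that way, the usual C idiom would be something along these lines (a sketch, keeping the same hypothetical names):

extern char start;   /* linker symbol: first byte of the protected block */
extern char stop;    /* linker symbol: one past the last byte */

crc64 = calc_crc64_ecma(&start, &stop);   /* pass addresses, not values */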

Reply to
Niklas Holsti

That can work for some microcontrollers, but is unsuitable for others - it depends on how the flash is organised. For an msp430, for example, it would be fine, as the interrupt vectors (including the reset vector) are at the end of flash. But for most ARM Cortex M devices, it would not be suitable - they expect the reset vector and initial stack pointer at the start of the flash image. Some devices have a boot ROM, and then you have to match their specifics for the header - or you can have your own boot program, and make the header how ever you like.

I am absolutely a fan of having some kind of header like this (and sometimes even a human-readable copyright notice, identifier and version information). And having it as near the beginning as possible is good. But for many microcontrollers, having it at the start is not feasible. And if you can't put the CRC at the start like you do, you have to put it at the end of the image.
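On a Cortex-M part the image layout therefore tends to end up something like this (schematic):

0x0000_0000   initial stack pointer      (position fixed by the hardware)
0x0000_0004   reset vector               (position fixed by the hardware)
...           rest of the vector table
              header: magic, version, size, ...   (as early as the vectors allow)
              code and constant data
              CRC                        (the last word of the image)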

I've never really thought about trying to inject a CRC into an elf file. I use elfs (or should that be "elves" ?) for debugging, not flash programming. And usually the main concern for having a CRC at the end of the image is when you have an online update of some kind, to check that nothing has gone wrong during the transfer or in-field update.

Reply to
David Brown

Timestamping a build in some way (as part of the "make", using __DATE__ or __TIME__ in source code, or some feature of a revision control system) is very tempting, and can be helpful for tracking exactly what code you have on the system.

However, IMHO having reproducible builds is much more valuable. I am not happy with a project build until I am getting identical binaries built on multiple hosts (Windows and Linux). That's how you can be absolutely sure of what code went into a particular binary, even years or decades later.

A compromise that can work is to distinguish development builds and production builds, and have timestamping in development builds. That also reduces the rate at which your minor version number or build number goes up, and avoids endless changes to your "version.h" include file.
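For example (a sketch; DEVELOPMENT_BUILD is a hypothetical macro defined only in the development configuration):

/* version.c */
#ifdef DEVELOPMENT_BUILD
/* __DATE__ and __TIME__ are standard C, but they make builds
   non-reproducible - keep them out of production builds. */
const char build_timestamp[] = __DATE__ " " __TIME__;
#else
const char build_timestamp[] = "release";
#endif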

Reply to
David Brown

I like that. Thanks for doing that work.

Is there also a way to get the length of the final link, and insert it near the beginning of the image? I suppose that would be another kind of DIGEST where the algorithm is simply (to - from). (I assume that "to" and "from" may be linker symbols.)
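With GNU ld you can get part of the way there already, since output-section data commands accept symbol arithmetic (a sketch; the symbol names are mine, and it assumes a 4-byte length field is enough):

SECTIONS
{
  .header : {
    LONG(_image_end - _image_start)  /* image length, filled in at final link */
  }
  .text : {
    _image_start = .;
    *(.text*)
    /* ... remaining sections ... */
    _image_end = .;
  }
}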

Reply to
David Brown

It is possible that Rick is using Forth, rather than C (or other languages traditionally compiled in a similar manner, such as C++ and Ada). There are also some commercial C toolchains for brain-dead 8-bit CISC devices that are monolithic and offer very little control over the linking.

Ulf is correct that the ideal place to handle this is part of the linking process. I do it with a post-link Python script run during the build, because the linkers I use can't handle this at the moment. But if Ulf's patch works its way into binutils then I'll be able to do it directly during linking, which is neater. (I will still have post-link scripts to handle things like renaming image files according to version, making zips for sending to customers, etc. - linkers can't do /everything/ !)
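The post-link step itself can be tiny. Here is a sketch in C (the crc32() routine is assumed to exist, e.g. the bitwise version earlier in the thread; a real tool would also pad the image to its final length first):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

extern uint32_t crc32(const uint8_t *p, size_t n);  /* assumed CRC routine */

/* Append a little-endian CRC32 of the whole image to the binary. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s image.bin\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb+");
    if (!f)
        return 1;
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    if (n <= 0) {
        fclose(f);
        return 1;
    }
    uint8_t *buf = malloc((size_t)n);
    if (!buf) {
        fclose(f);
        return 1;
    }
    fseek(f, 0, SEEK_SET);
    if (fread(buf, 1, (size_t)n, f) != (size_t)n) {
        fclose(f);
        return 1;
    }
    uint32_t crc = crc32(buf, (size_t)n);
    uint8_t le[4] = { (uint8_t)crc, (uint8_t)(crc >> 8),
                      (uint8_t)(crc >> 16), (uint8_t)(crc >> 24) };
    fseek(f, 0, SEEK_END);   /* required between read and write on "rb+" */
    fwrite(le, 1, 4, f);
    fclose(f);
    free(buf);
    return 0;
}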

Reply to
David Brown
