Saving C structures to file -- best practices

I'm doing some work on a 16-bit micro and one of the things the software does is store data in a structure for easy access. Currently, I save the entire structure to file and get it back from file when the program is restarted. No problem so far.

However, if I need to change the structure (add a member, remove one, change one, etc.), the data, when read in from file, gets corrupted as one may expect. In addition, this code may be ported to a 32-bit controller, which means the alignment of the data in the structure will be different. So even if the structure doesn't change, when reading data created by a 16-bit processor I would expect the 32-bit processor to corrupt the data.

So here is my question -- what are the best practices for saving data from a C structure so it is universally readable -- even if the structure definition changes? I'm leaning toward a character array (buffer) that uses name-value pairs, but that is a lot more work, and may be overkill. Any thoughts?

Dave,

Dave Boland

Dave Boland wibbled on Wednesday 24 March 2010 15:16

Name value pairs are good. But for a tiny system, how about replacing the name with an integer (16 bits perhaps) in binary or hex, depending on whether you wish to ASCII-edit the file or not? You could even have a dump at the start that lists the name-values with their corresponding human-readable names for easy reference. That would consume a little flash but, apart from the grab and write, would not really affect the program, which would work entirely with the integer "names" (using macro defines, of course, for program readability).

So you could have either, for maximum human readability:

10A6="Wibble"
206C=89fe66bb
EOF

etc

or a binary file of the form ( represents one or more bytes)

Tim Watts

Add to the structure a version and a size member in a fixed location (most easily at the top of the structure). That gives your application code at least some chance of doing the right thing if it knows, e.g., that member X is only present in version Y and later.

Once the layout has mostly settled down, only add new members at the end. Never remove (or change) old members, just mark them as "reserved" or "optional" and fill them with benign values.
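A rough sketch of the version-plus-size idea, for what it's worth. The record type `settings_t`, its fields, the default timeout of 30 and the function `settings_load()` are all made up for illustration; nothing here comes from Dave's actual code. It also assumes the image was written by a target with the same byte order and padding, which is exactly the limitation discussed elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical settings record; names and fields are illustrative only. */
typedef struct {
    uint16_t version;   /* layout version, always first            */
    uint16_t size;      /* bytes the writer stored, header included */
    uint16_t baud_div;  /* present since version 1                  */
    uint16_t contrast;  /* present since version 1                  */
    uint16_t timeout;   /* appended in version 2                    */
    uint16_t reserved;  /* spare, written as 0                      */
} settings_t;

#define SETTINGS_VERSION 2u
#define SETTINGS_V1_SIZE offsetof(settings_t, timeout)

/* Load a stored image of any known version.
 * Returns 0 on success, -1 if the image is unusable. */
int settings_load(settings_t *out, const uint8_t *image, size_t len)
{
    settings_t tmp;
    uint16_t hdr[2];

    if (len < sizeof hdr)
        return -1;
    memcpy(hdr, image, sizeof hdr);          /* hdr[0] = version, hdr[1] = size */
    if (hdr[0] == 0 || hdr[1] < SETTINGS_V1_SIZE || hdr[1] > len)
        return -1;

    /* Start from defaults so members missing from an old image are sane. */
    memset(&tmp, 0, sizeof tmp);
    tmp.timeout = 30;                        /* assumed default for pre-v2 images */

    /* Copy only what both the writer and this build know about. */
    size_t n = hdr[1] < sizeof tmp ? hdr[1] : sizeof tmp;
    memcpy(&tmp, image, n);

    tmp.version = SETTINGS_VERSION;          /* now upgraded in RAM */
    tmp.size = (uint16_t)sizeof tmp;
    *out = tmp;
    return 0;
}
```

Because new members only ever go on the end, an old image is simply a shorter prefix of the new layout, and the size member says where that prefix stops.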

For alignment, make the first element of the structure a 32-bit quantity. The standard requires that a pointer to the structure also points to the first element (and vice versa, of course). That should "square up" the overall alignment between 16- and 32-bit environments.

--
Rich Webb     Norfolk, VA
Rich Webb

This way eventually leads to disaster, because the order of the members, alignment, size, endianness and padding all depend on the platform, toolset and settings.

If you want to save structures, write explicit functions that convert each struct member to an array of bytes and back.
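Something along these lines, as a minimal sketch. The record `record_t`, its fields, and the choice of big-endian order for the stored bytes are assumptions made for the example, not anything prescribed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; the fields are only for illustration. */
typedef struct {
    uint16_t id;
    uint32_t count;
    int16_t  offset;
} record_t;

/* Stored size is fixed by the format, not by sizeof(record_t). */
#define RECORD_WIRE_SIZE 8u

/* Big-endian byte order is an arbitrary choice made for this sketch. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Convert each member to bytes at a fixed offset in the stored image. */
void record_pack(uint8_t out[RECORD_WIRE_SIZE], const record_t *r)
{
    put_u16(out + 0, r->id);
    put_u32(out + 2, r->count);
    put_u16(out + 6, (uint16_t)r->offset);
}

/* Rebuild the struct member by member; padding and endianness of the
 * host never enter into it. The int16_t cast assumes two's complement. */
void record_unpack(record_t *r, const uint8_t in[RECORD_WIRE_SIZE])
{
    r->id     = get_u16(in + 0);
    r->count  = get_u32(in + 2);
    r->offset = (int16_t)get_u16(in + 6);
}
```

The stored image is 8 bytes on every target, even where the compiler would pad `record_t` to 12 bytes in memory.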

The only portable way of saving data to files is in text form. This also allows for easy editing of the files. I like the Windows style (the format that GetPrivateProfileString() reads):

[Setup_Data]
foo = 12345;
bar = 6789;

[Test_Data]
bla_bla_bla = 123.456;

etc. etc.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Vladimir Vassilevsky

I'd suggest coding in XML (or similar) ... pretty much any arbitrary structure can be serialized, but you'll need a parser which can handle your variant to read the structures back in ...

Spam

I thought about XML, but that will eat up too much storage (I have 2 KB). I also thought about JSON, and the way web page data is encoded when a POST occurs (name1=data1&name2=data2a+data2b).

I'm going to look at all of the suggestions to see what makes the best compromise between robust, small, flexible, easy to do (in order).

Dave,

Dave Boland

Dave Boland wibbled on Wednesday 24 March 2010 19:13

Also, depending on how fussy you are, SGML-based markup can be quite expensive to parse correctly. It's certainly been my observation in Perl, using optimised libraries, that loading a data set from XML was nearly an order of magnitude slower than run-time loading the same data set from a data definition file written in Perl itself.

--
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.
Tim Watts

If space is no object (in the code to write *and* read -- as well as the bulk medium itself), then push everything in ASCII and pull it back the same way. Add a version number to the file format so your code can look at that and adapt to the specifics of what it should expect to encounter.

As with any input, validate each parameter before using it (since it can be corrupted -- even intentionally). If you need to ensure the contents are not corrupted, sign them with a secure hash.

If you are limited on space, do something like the way TIFF files are encoded -- tag each parameter with a unique *numeric* identifier. (TIFF would be good to study in this regard as it makes this pretty clear; you might also look at the way NCD X Terminals store their NVRAM parameters). If you decide you need a new parameter -- or need to entirely change its syntax -- you can just "invent" another identifier.

In either case, either push a flag into the file to indicate byte ordering (if you want to store "raw binary" and need to be able to read it back into a different architecture) or force a particular byte ordering (e.g., analogous to "network byte order").

Avoid just pushing things like floats, doubles, etc. as binary unless you are willing to adopt the same encoding universally in their representation (e.g., IEEE 754).

Also remember that the way a struct is represented in memory varies from compiler to compiler. So instead of pushing the bytes that the struct occupies, push the *values* contained within the struct as if they were individual "parameters" (possibly *within* a single parameter -- delimited with something contextually appropriate).

D Yuniskis

That is the way I usually handle such data (mostly in eeprom rather than a file, but the principle is the same). It's compact and efficient, and you can handle smooth upgrades (and even downgrades).

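Aligning the first element may not be enough. On an 8-bit or 16-bit machine, struct { uint32_t a; uint16_t b; uint32_t c; } will take 10 bytes, with c at offset 6, which is unaligned for a 32-bit access. A 32-bit compiler will normally insert two bytes of padding after b instead, so the two layouts no longer match. Some 32-bit CPUs can work with unaligned values, but others cannot.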

So make sure that elements are aligned to their "natural" alignments. And of course use only sized types.

As a check, simply compile the struct typedef on a 32-bit compiler and check its size is the same as on the 16-bit compiler.
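The size check can be made automatic, so that the build fails on whichever compiler disagrees. This is only a sketch: the struct `stored_t` and its explicit `pad` member are invented for the example, and `_Static_assert` needs a C11 compiler (older toolchains would need a negative-array-size trick instead).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stored record with every member on its natural alignment. */
typedef struct {
    uint32_t a;
    uint16_t b;
    uint16_t pad;   /* explicit filler so c lands on a 4-byte boundary */
    uint32_t c;
} stored_t;

/* These fail at compile time if any compiler lays the struct out
 * differently from what the file format expects. */
_Static_assert(sizeof(stored_t) == 12, "stored_t has unexpected padding");
_Static_assert(offsetof(stored_t, b) == 4, "b is not where the format expects");
_Static_assert(offsetof(stored_t, c) == 8, "c is not naturally aligned");
```

With the filler written out by hand, the 16-bit and 32-bit builds should agree on both the size and every offset; byte order still has to be dealt with separately.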

David Brown

Yes, I've been meaning to pull the thread mentioned in this, er, thread:

Ran across it after I had ported uip to an MSP430-series board and was thinking about a port to one of NXP's LPC2xxx chips.

From the link above: "As Patrick pointed out, it's more like an alignment problem, on arm that's a heavy issue. On arm7, an unaligned access doesn't generate a trap, instead, it's accepted on a very specific arm way that internally rotates the bits not in the way one would expect. The compiler assumes you know what you're doing, and will not try to cope with unaligned accesses unless the data type (or the attributes, such as packed) implies that."

I really haven't taken the time to suss it out in detail, though. Maybe this weekend ...

--
Rich Webb     Norfolk, VA
Rich Webb

On Thu, 25 Mar 2010 15:19:51 +0100, Rich Webb wrote:

Only with word-size transfers in ARM mode. In Thumb mode or with halfword transfers, it is worse; the result is UNPREDICTABLE.

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
Boudewijn Dijkstra

It depends upon how much inefficiency you can tolerate, in terms of data size, code size and execution time.

A context-free grammar such as XML, JSON, YAML, etc will provide maximum flexibility at the cost of efficiency.

The most efficient solution is likely to be a binary structure with a version field. However, code size can suffer if the structure changes frequently and the code needs to be able to handle all known versions.

I wouldn't seriously consider XML unless you actually need to be able to manipulate the data with external tools. JSON, YAML, etc will provide the same level of flexibility with a fraction of the code size (there's no point in using XML unless you're going to write a 100%-conformant parser, which is harder than it looks).

Or you can use a binary equivalent of something like JSON, where keys are unique integers instead of strings, and values consist of a type byte followed by binary data appropriate to the type.
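A minimal sketch of such a format, assuming a 16-bit key stored high byte first, one type byte, and then the value. The type codes `T_U8`, `T_U16`, `T_U32` and the functions `tv_put()` and `tv_get()` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; a real format would document these. */
enum { T_U8 = 1, T_U16 = 2, T_U32 = 3 };

static size_t type_size(uint8_t type)
{
    switch (type) {
    case T_U8:  return 1;
    case T_U16: return 2;
    case T_U32: return 4;
    default:    return 0;   /* unknown type */
    }
}

/* Append one record: key (2 bytes, big-endian), type byte, value bytes.
 * Returns the new length, or 0 if the type is unknown or it does not fit. */
size_t tv_put(uint8_t *buf, size_t len, size_t cap,
              uint16_t key, uint8_t type, uint32_t value)
{
    size_t n = type_size(type);

    if (n == 0 || len > cap || cap - len < 3 + n)
        return 0;
    buf[len++] = (uint8_t)(key >> 8);
    buf[len++] = (uint8_t)key;
    buf[len++] = type;
    while (n--)
        buf[len++] = (uint8_t)(value >> (8 * n));   /* high byte first */
    return len;
}

/* Find a key, stepping over records with other keys.
 * Returns 0 and sets *value if found, -1 if missing or the data is bad. */
int tv_get(const uint8_t *buf, size_t len, uint16_t key, uint32_t *value)
{
    size_t pos = 0;

    while (len - pos >= 3) {
        uint16_t k = (uint16_t)(((uint16_t)buf[pos] << 8) | buf[pos + 1]);
        size_t n = type_size(buf[pos + 2]);

        pos += 3;
        if (n == 0 || len - pos < n)
            return -1;                  /* unknown type or truncated value */
        if (k == key) {
            uint32_t v = 0;
            for (size_t i = 0; i < n; i++)
                v = (v << 8) | buf[pos + i];
            *value = v;
            return 0;
        }
        pos += n;
    }
    return -1;
}
```

One weakness of this particular sketch: a reader that meets a type code it doesn't know cannot tell how many bytes to skip, so it has to give up. Storing an explicit length byte per record would let old code step over new types.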

From a time-efficiency perspective, writing numbers in hex rather than decimal can provide a significant gain.

Nobody

There are many feasible ways to do this. The important aspect is to adequately document your choice.

Consider ASN.1 BER encoding for a well-defined, fairly efficient mechanism. The ASN.1 packed encoding, PER, is more storage-efficient but much more complex in the worst case. ASN.1 provides a formal way to document the data structure.

--
Thad
Thad Smith

I don't recall ASN.1 being anything but unfriendly. BER is easier than PER, but as I recall some of the tag values in the encoding will change with changes of the structure.

OTOH I too have been bitten by software that writes out images of structs, so something needs to be done. In a recent case the front 'header' part of the struct was read back in by the UI when constructing menus, so needed to be timely. Even so, it's hard to justify when there are twenty-odd versions of that struct used in the one app (actually derived classes with no virtual methods).

I think that I would use a text file with one line per datum or the name=value style. Then in the code have a table of name, offsetof(field), type-indicator. Fill the struct with 0; scan the table filling fields with the default value for their type (if not zero); read the file line by line, match name, parse according to type-indicator and save at offsetof in struct. If you delete a field from the struct then the table won't compile. If you add a field without updating the table it will at least be zeroed.
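Roughly like this, as a sketch only. The struct `config_t`, the field names foo and bar, the two type indicators and the per-field default column are assumptions for the example (the default is stored per field here rather than per type, just to keep the table short). No whitespace trimming is done around the name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration struct. */
typedef struct {
    uint16_t foo;
    uint32_t bar;
} config_t;

typedef enum { F_U16, F_U32 } ftype_t;

typedef struct {
    const char *name;    /* name as it appears in the file */
    size_t      offset;  /* offsetof() the member          */
    ftype_t     type;    /* how to parse and store it      */
    uint32_t    def;     /* default if the file omits it   */
} field_t;

/* Deleting a member from config_t makes this table fail to compile. */
static const field_t fields[] = {
    { "foo", offsetof(config_t, foo), F_U16, 0    },
    { "bar", offsetof(config_t, bar), F_U32, 6789 },
};
#define NFIELDS (sizeof fields / sizeof fields[0])

static void store(config_t *c, const field_t *f, uint32_t v)
{
    unsigned char *p = (unsigned char *)c + f->offset;

    if (f->type == F_U16) {
        uint16_t x = (uint16_t)v;
        memcpy(p, &x, sizeof x);
    } else {
        memcpy(p, &v, sizeof v);
    }
}

/* Zero the struct, then apply the defaults from the table. */
void config_defaults(config_t *c)
{
    memset(c, 0, sizeof *c);
    for (size_t i = 0; i < NFIELDS; i++)
        store(c, &fields[i], fields[i].def);
}

/* Apply one "name=value" line. Returns 1 if the name was known,
 * 0 if the line was ignored. Values may be decimal or 0x-prefixed hex. */
int config_apply_line(config_t *c, const char *line)
{
    const char *eq = strchr(line, '=');

    if (eq == NULL)
        return 0;
    size_t nlen = (size_t)(eq - line);
    for (size_t i = 0; i < NFIELDS; i++) {
        if (strlen(fields[i].name) == nlen &&
            strncmp(fields[i].name, line, nlen) == 0) {
            store(c, &fields[i], (uint32_t)strtoul(eq + 1, NULL, 0));
            return 1;
        }
    }
    return 0;
}
```

Typical use would be config_defaults() once, then config_apply_line() for every line read from the file.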

*.comp.lang.c people aren't interested in this sort of stuff so removed.

Peter

Peter Dickerson

...

There are different ways to use ASN.1. A simple way is to use the BER low-level encoding and make up your own tags -- done. Another way is to create a formal ASN.1 description of the data. The language has provisions for extensibility -- in fact, I consider that one of its strengths. There are compatible ways within the language to extend specifications, so that old data is represented the same, yet new data can be encoded.

That is fine, too.

--
Thad
Thad Smith

Dave,

May be missing something here, but how about this ?:

typedef struct { ... ... ... } DATA;

typedef union { DATA sData; U8 u8Data [sizeof (DATA)]; } uData;

You can add elements at will and the compiler takes care of all the alignment and padding issues. You access structure elements via the union's sData member and use the union's u8Data byte array to save to a file. It should also be portable across any word-size architecture.

In short, make the compiler do the work for you :-)...

Regards,

Chris

ChrisQ

This achieves nothing useful at all. There's nothing to be gained by having that union, compared to a plain and simple equivalent of a memcpy() or fwrite()/fread() function, operating directly on a pointer to the struct (cast to unsigned char*).

In a nutshell this is the opposite of best practice. It's about the worst possible way to express worst practice.

Hans-Bernhard Bröker

I wasn't sure what to say to this, at first. There are so many perspectives and you didn't narrow this down then.

However, you have clarified some of it with the mention of 2k memory available, for example.

I've used several different methods for similar situations. (In some of them I was limited to 256 bytes, so I used a different method for robust/small/flexible there than the one I'll suggest here.)

I liked one of the suggestions I've seen here for using a text format with name=value. I've used that method successfully for systems where versions undergo (and are expected to undergo) many revisions. For example, I have one system that is now in its 57th incarnation in 3 years' time. And I used this method without any serious problems, at all, despite many new features being added and modified over that time. That system is where a user (or computer) may interact and change parameters that need to be saved by command to do so, but where the system needs to start up from such a saved state, too.

One of the things I do is to require the program to read _all_ of the lines of that file. The parsing structure is consistent, so the field can be easily adduced. If the name doesn't match something in the current incarnation of the program, it is ignored. However, anytime the current state is saved, the entire file is re-read, all known fields changed, and all unknown ones preserved in the new save file.

Old systems that know nothing about new parameters will therefore preserve those values they don't know how to use. New systems that don't require older parameters also still preserve those older parameters. Only those parameters that mean the same thing between different incarnations are modified by "saving" the state. Finally, every parameter for any system (new or old) has a default value that takes effect if the parameter isn't found at all when a save state is loaded. And saving a state writes out all internal state values even if they weren't in the earlier save state, so the text save state may grow when saving.

It has saved me trouble. Old systems run as expected on new save states. New systems run as expected on old save states. Etc.
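In case it helps, here is a small sketch of the save step only, working on in-memory text rather than a real file. The type `param_t` and the function `save_state()` are invented for this illustration and are not my production code; values are limited to integers to keep it short.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parameter known to the running program. */
typedef struct {
    const char *name;
    long        value;
    int         written;   /* scratch flag used while saving */
} param_t;

/* Rewrite old_text into out: known names get the current value, unknown
 * lines are copied through untouched, and known names missing from the
 * old text are appended. Returns the new length, or -1 if out is too small. */
int save_state(const char *old_text, param_t *params, size_t nparams,
               char *out, size_t cap)
{
    size_t used = 0;
    int n;

    for (size_t i = 0; i < nparams; i++)
        params[i].written = 0;

    for (const char *line = old_text; *line != '\0'; ) {
        const char *end = strchr(line, '\n');
        size_t llen = end ? (size_t)(end - line) : strlen(line);
        const char *eq = memchr(line, '=', llen);
        param_t *hit = NULL;

        if (eq != NULL) {
            size_t nlen = (size_t)(eq - line);
            for (size_t i = 0; i < nparams; i++)
                if (strlen(params[i].name) == nlen &&
                    memcmp(params[i].name, line, nlen) == 0)
                    hit = &params[i];
        }
        if (hit != NULL) {
            n = snprintf(out + used, cap - used, "%s=%ld\n", hit->name, hit->value);
            hit->written = 1;
        } else {
            /* Not ours: preserve it for whichever version does use it. */
            n = snprintf(out + used, cap - used, "%.*s\n", (int)llen, line);
        }
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
        line += llen + (end ? 1 : 0);
    }

    /* Parameters the old file never mentioned are added at the end. */
    for (size_t i = 0; i < nparams; i++) {
        if (params[i].written)
            continue;
        n = snprintf(out + used, cap - used, "%s=%ld\n",
                     params[i].name, params[i].value);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

So a program that knows foo and new, saving over a file containing foo and old, ends up with all three names in the file.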

Jon

Jon Kirwan

That's very good. Two paragraphs having nothing constructive to say at all :-).

We can agree to differ, but aliasing a structure to an array is a good technique to ensure that the correct number of bytes get written to the file, irrespective of the cpu architecture, compiler and future changes to the structure.

If you don't understand the importance of data structures in programming, then I guess there's no point in continuing...

Regards,

Chris

ChrisQ

And yes, where do you get the count for your memcpy()? The usual hack is to take sizeof (struct), then use that count in memcpy() with a pointer to the start of the struct.

If this is the best you can do, it's no surprise that software has so many bugs :-)...

Regards,

Chris

ChrisQ
