What is your strategy to save non volatile data?

- P
- pozz
  

Thu, Aug 11, 2016 12:48 PM

I often work on embedded projects that need to save some data in a non-volatile way. Usually I use an external (to the main microcontroller) serial/parallel EEPROM or internal Flash memory (with a software layer for EEPROM emulation).

However my goal isn't to discuss the hw alternatives, but the data format and strategy.

I know there are many serialization data formats (XML, JSON, Protocol Buffers, SQLite), but I don't think they are valid solutions for medium/low-end microcontrollers. They are complex to implement and they aren't really necessary. Data are written and read on the same platform, so endianness (byte order) is not a problem.

IMHO directly saving a C structure to the non-volatile memory is the best method. When you load the bytes back from the non-volatile memory, your C structure in RAM is magically filled with the saved values.

Is it the same strategy that you use?

- D
- Don Y
  

Thu, Aug 11, 2016 1:27 PM

There's no universally "best" approach -- much depends on the nature of the data, its criticality, the frequency of update, quantity, etc. *And*, the hardware facilities available (i.e., if you don't have a "power fail" signal available, saving megabytes of data *at* loss-of-power can be impractical).

[There is also a dependency on the software structure -- if you can't get your hands on an "intact" dataset at any given instant, then that limits what you can *do* with that data when you "need" to preserve it.]

Just "storing structs" (assuming each has a way of identifying itself, later -- even if that is implicitly indicated by WHERE it is stored) means you have to ensure they are stored when they are in a "self-consistent" state (i.e., when all members "make sense" in the context of their peers). You also don't have any explicit way of later verifying (at "restore" time) that the data is intact (checksum/hash/Hamming code, etc.).

[likewise, are these data consistent with those *other* data that you're also saving AT THIS POINT IN TIME?]

It also assumes all the data in a struct *needs* to be saved. E.g., if I have a struct that is used to define a text window/pad, its *dimensions* may be worth preserving -- but, probably not its *content*, cursor, etc.

Or, that data have the same "value" (i.e., the window's size/shape/position vs. contents).

Remember, you invariably have to use the saved data to bring the NON-saved data to some consistent state on "restore". So, *you* decide what you need to be able to make that "reinitialization" proper.

The underlying hardware technology also has a significant impact on how and when you store. Updating FLASH (limited write cycles, *long* erase cycles) is much different than BBSRAM; updating a "local" store is much different than updating a *remote* store (esp via a service!).

I've seen naive approaches that (literally) stored three copies of data in an attempt to ensure its integrity (despite the inefficiency and lack of robustness that implies). At the other extreme, I've seen "raw" data stored with *no* attempt made to vouch for its integrity (i.e., a three-valued byte-wide variable could conceivably be recovered with 253 bogus states; yet the software ASSUMED it would be in ONLY one of the three legitimate states).
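A minimal sketch of checking such a byte on restore instead of assuming it is legitimate. The enum, its values and the function name are hypothetical and only illustrate the point about the 253 bogus states:

```c
#include <stdint.h>

/* Hypothetical three-state setting. 0x00 and 0xFF are deliberately not
 * legal values, so an erased or zeroed cell never looks valid. */
typedef enum { RUN_OFF = 1, RUN_AUTO = 2, RUN_MANUAL = 3 } run_mode_t;

/* Return the stored value only if it is one of the three legal states;
 * any of the other 253 byte values yields the caller's fallback. */
run_mode_t run_mode_restore(uint8_t raw, run_mode_t fallback)
{
    switch (raw) {
    case RUN_OFF:
    case RUN_AUTO:
    case RUN_MANUAL:
        return (run_mode_t)raw;
    default:
        return fallback;
    }
}
```

This only catches values outside the legal set; a corrupted byte that happens to land on another legal state still needs a checksum or similar to be detected.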

I've used approaches where I constantly maintained a live set of diff(1)'s against some known system state -- so a "restore" was "restore to known state followed by this set of (TAGGED!) diff's".

Currently, my only persistent store is a RDBMS. So, I rely on *it* to ensure data integrity (I just have to make sure I get the data *to* the RDBMS before losing power). I deal with the other issues (frequency of update, underlying hardware technology, etc.) by partitioning data into different "tablespaces" (i.e., *where*, physically, each datum gets saved "in memory") -- so, I can decide how important various data are /ex post facto/ (cuz I have a limited amount of each type of "memory" and don't want to have to rewrite my code when I have to move dataX into "precious" memory and kick dataY *out* to make room for it!).

- D
- David Brown
  

Thu, Aug 11, 2016 2:12 PM

As Don says, there is no single best method. The answer to almost every question in embedded development is "it depends".

But for the common case of storing a small set of parameters, I usually work with a struct definition. Along with the parameters in question, I include a "magic number" for identification, a "parameter structure version number" (very useful for upgrading the parameter set if a later version of the firmware uses more parameters), an update serial number, a CRC check field, and some space reserved for future additional parameters.

Typically I have a statically allocated struct in ram, and two copies in eeprom or flash. On startup, I check both non-volatile copies to see which is the latest version (highest serial number) that has a valid CRC, and copy that into RAM. If the parameter structure version number does not match the current firmware requirements, I upgrade the parameters.

When I need to save the parameters, I use the oldest of the two NV sets. First, I erase it (for eeprom, that's just clearing the magic number). Then I copy in most of the data and CRC, leaving the magic number for last.
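The scheme described above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: the field names, the magic value, the CRC-32 routine and the RAM array `nv_copy` (standing in for the two EEPROM/flash copies) are all hypothetical, and serial-number wraparound is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_MAGIC   0x50415231u   /* hypothetical identification value */
#define PARAM_VERSION 1u

typedef struct {
    uint32_t magic;         /* cleared first, written last */
    uint16_t version;       /* parameter structure version */
    uint16_t reserved0;     /* explicit padding, keeps layout obvious */
    uint32_t serial;        /* update serial number */
    uint16_t baud_div;      /* example parameter */
    uint16_t contrast;      /* example parameter */
    uint8_t  reserved[16];  /* space for future parameters */
    uint32_t crc;           /* CRC of all bytes before this field */
} params_t;

static params_t nv_copy[2]; /* stand-in for the two non-volatile copies */

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_calc(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static int slot_valid(const params_t *p)
{
    return p->magic == PARAM_MAGIC &&
           p->crc == crc32_calc(p, offsetof(params_t, crc));
}

/* Copy the newest valid slot into RAM. Returns its index, or -1 if
 * neither slot is valid (caller should then apply factory defaults). */
int params_load(params_t *ram)
{
    int best = -1;
    for (int i = 0; i < 2; i++)
        if (slot_valid(&nv_copy[i]) &&
            (best < 0 || nv_copy[i].serial > nv_copy[best].serial))
            best = i;
    if (best >= 0)
        *ram = nv_copy[best];
    return best;
}

/* Save RAM parameters into an invalid slot, or else the older one.
 * Returns the index of the slot that was written. */
int params_save(params_t *ram)
{
    uint32_t newest = 0;
    int target;

    for (int i = 0; i < 2; i++)
        if (slot_valid(&nv_copy[i]) && nv_copy[i].serial > newest)
            newest = nv_copy[i].serial;

    if (!slot_valid(&nv_copy[0]))      target = 0;
    else if (!slot_valid(&nv_copy[1])) target = 1;
    else target = (nv_copy[0].serial < nv_copy[1].serial) ? 0 : 1;

    ram->magic   = PARAM_MAGIC;
    ram->version = PARAM_VERSION;
    ram->serial  = newest + 1u;
    ram->crc     = crc32_calc(ram, offsetof(params_t, crc));

    nv_copy[target].magic = 0;                       /* 1: invalidate */
    memcpy((uint8_t *)&nv_copy[target] + sizeof(uint32_t),
           (const uint8_t *)ram + sizeof(uint32_t),
           sizeof(params_t) - sizeof(uint32_t));     /* 2: data + CRC */
    nv_copy[target].magic = PARAM_MAGIC;             /* 3: magic last */
    return target;
}
```

If power fails anywhere inside `params_save`, the slot being written has either no magic number or a bad CRC, so `params_load` falls back to the other, older copy. On real hardware the three numbered steps would be separate EEPROM or flash write operations.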

I have had occasion to use a log structured parameter "file system" in flash, but usually such complications are unnecessary. And certainly SQLite, or JSON, etc., are normally overkill.

- B
- bitrex
  

Thu, Aug 11, 2016 4:19 PM

Why not? There are implementations of the FAT32 filesystem for SD cards and XML libraries for platforms like the Arduino that are really compact.

SD cards are cheap; I think I'd consider using them as an alternative to external EEPROM. There's also the advantage that you can just stick them in a PC and edit the data, if you want.

- P
- Paul Rubin
  

Thu, Aug 11, 2016 6:29 PM

Two problems come to mind:

1) if the C structure has pointers in it, then on reload everything must end up at the same address as before. That constrains you a lot and can be a source of bugs and can get in your way if you want to change the software then reload the parameters. So you are probably better off serializing the data somehow--a simple binary format is fine, json etc. are overkill.

2) As David Brown indicated, you can get in trouble if the program crashes (loses power, etc) during the save operation. You need some way to deal with that, such as by writing out two copies as he describes. Having a version number in the data as he mentions is also often worthwhile.
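As a rough illustration of point 1, here is a sketch of a simple binary format in which a pointer member is stored as a table index instead of an address. The struct, the label table and the 4-byte layout are assumptions made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    threshold;
    uint8_t     mode;
    const char *label;      /* pointer: its address must not be stored */
} settings_t;

/* Hypothetical table the pointer refers to; only the index is saved. */
static const char *const label_table[] = { "low", "mid", "high" };
#define LABEL_COUNT (sizeof label_table / sizeof label_table[0])
#define SETTINGS_WIRE_SIZE 4u

/* Serialize field by field; an unknown label is stored as index 0. */
size_t settings_pack(const settings_t *s, uint8_t out[SETTINGS_WIRE_SIZE])
{
    uint8_t idx = 0;
    for (uint8_t i = 0; i < LABEL_COUNT; i++)
        if (label_table[i] == s->label)
            idx = i;
    out[0] = (uint8_t)(s->threshold & 0xFFu);   /* low byte first */
    out[1] = (uint8_t)(s->threshold >> 8);
    out[2] = s->mode;
    out[3] = idx;
    return SETTINGS_WIRE_SIZE;
}

/* Rebuild the struct; returns -1 if the stored index is out of range. */
int settings_unpack(settings_t *s, const uint8_t in[SETTINGS_WIRE_SIZE])
{
    if (in[3] >= LABEL_COUNT)
        return -1;
    s->threshold = (uint16_t)(in[0] | (in[1] << 8));
    s->mode      = in[2];
    s->label     = label_table[in[3]];
    return 0;
}
```

Because the saved bytes no longer depend on where anything lives in memory, the firmware can be relinked or changed without invalidating the stored data.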

- J
- John Devereux
  

Thu, Aug 11, 2016 7:50 PM

Yay, that's *exactly* what I have ended up with.

Binary copying of structs seems to be frowned on but it saves a lot of work (and code). AFAICS it's fine since it is defined by the ARM EABI (in my case). And I am hardly going to be unsoldering the ARM and replacing it with a MIPS.
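One way to make that reliance on the ABI explicit is to pin the layout with compile-time checks, so a compiler or target change that alters it fails the build instead of silently corrupting saved data. A small sketch assuming C11 and a hypothetical image struct:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct that is copied byte-for-byte to NV memory. */
typedef struct {
    uint32_t magic;
    uint16_t version;
    uint16_t gain;
    uint32_t crc;
} nv_image_t;

/* Build fails if size or member offsets ever differ from what was
 * assumed when the stored data was written. */
_Static_assert(sizeof(nv_image_t) == 12, "nv_image_t size changed");
_Static_assert(offsetof(nv_image_t, version) == 4, "version moved");
_Static_assert(offsetof(nv_image_t, gain) == 6, "gain moved");
_Static_assert(offsetof(nv_image_t, crc) == 8, "crc moved");
```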

You've copied my stuff haven't you.

It may be needed if you want to store a counter rather than some occasionally altered settings. That's when I did it and it was a PITA.

--

John Devereux

- P
- pozz
  

Thu, Aug 11, 2016 9:49 PM

On 11/08/2016 18:19, bitrex wrote:
> On 08/11/2016 10:12 AM, David Brown wrote:
>> On 11/08/16 14:48, pozz wrote:
>>> Many times I work on embedded projects where it is required to save some
>>> data in a non volatile way. Usually I use external (to the main
>>> microcontroller) serial/parallel EEPROM or internal Flash memory (with a
>>> software layer for EEPROM emulation).
>>>
>>> However my goal isn't to discuss the hw alternatives, but the data
>>> format and strategy.
>>>
>>> I know there are many serialization data formats (XML, JSON, Protocol
>>> Buffers, SQLite), but I don't think they are valid solutions for
>>> medium/low end microcontrollers.
>
> Why not? There are implementations of the FAT32 filesystem for SD cards
> and XML libraries for platforms like the Arduino that are really compact.

Even if you're right (I don't know how compact those libraries are, or how simple they are to integrate on microcontrollers such as a Cortex-M0+), price is the first reason to avoid this. A connector and an SD card are much more expensive than a simple serial EEPROM or the internal Flash memory that you already have for free.

Another reason could be additional space on the board.

Anyway, could you link those compact libraries for FAT32 and XML?

- P
- pozz
  

Thu, Aug 11, 2016 10:16 PM

On 11/08/2016 16:12, David Brown wrote:
> On 11/08/16 14:48, pozz wrote:
>> Many times I work on embedded projects where it is required to save some
>> data in a non volatile way. Usually I use external (to the main
>> microcontroller) serial/parallel EEPROM or internal Flash memory (with a
>> software layer for EEPROM emulation).
>>
>> However my goal isn't to discuss the hw alternatives, but the data
>> format and strategy.
>>
>> I know there are many serialization data formats (XML, JSON, Protocol
>> Buffers, SQLite), but I don't think they are valid solutions for
>> medium/low end microcontrollers. They are complex to implement and they
>> aren't really necessary.
>> Data are written and read from the same platform, so the endianness and
>> byte order are not a problem.
>>
>> IMHO directly saving a C structure to the non-volatile memory is the
>> best method. When you will load bytes from the non volatile memory, you
>> will have magically your C structure in RAM filled with the saved values.
>>
>> Is it the same strategy that you use?
>
> As Don says, there is no single best method. The answer to almost every
> question in embedded development is "it depends".

Of course, yes. However if we define and restrict the requirements, the number of good strategies is reduced.

Indeed, that is exactly the situation I have in mind.

Interesting approach.

I don't understand what the "magic number" for identification is for. The CRC can be used to check whether the saved struct is valid or not, so the "magic number" is useless for that purpose.

Maybe you have more than one struct saved to the non-volatile memory, and the "magic number" identifies the type of the struct? In this case, I think the physical address where you save the struct automatically identifies the struct itself. Do you have a different "magic number", "serial number" and CRC for each saved struct?

Or do you have a dynamic list of structs that could be saved at different places of non volatile memory (non fixed address)? In this case, you have a file list somewhere, similar to a FAT table. Maybe you have implemented a minimal filesystem where a file is a struct.

- D
- Don Y
  

Thu, Aug 11, 2016 10:37 PM

Another reason could be the issue of now having to deal with a memory card that can be arbitrarily removed DURING a write!

- P
- Paul
  

Fri, Aug 12, 2016 2:34 AM

You really want to be sure that you have no other problems, such as a hardware fault that addresses the wrong area of memory, which happens to hold something that appears to give a valid CRC; then you watch everything fail in a misleading way. It could simply be a firmware change where someone has got the save and read addresses wrong (even the internal addresses of external devices) because part of the upgrade was a different size of 'EEPROM'.

I have seen some systems fail when the external storage has been fully erased, and give the wrong configuration for many other reasons.

You could use magic numbers that way, but I tend to use an additional member for the expected type and a different magic number as a double check, so that structs of different sizes and a misleading CRC cannot appear 'valid'.

Assuming that you have the right data because you think you are only accessing one area of memory, especially if it is external (or even an SD card), can lead to all sorts of confusion.

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk 
    PC Services 
  Raspberry Pi Add-ons 
 Timing Diagram Font 
 For those web sites you hate

- D
- David Brown
  

Fri, Aug 12, 2016 9:59 AM

The "magic number" is to identify the type of record. Perhaps the system will have more than one possible set of data, or the same board is used for two completely different applications with different parameter sets.

- J
- Joe Chisolm
  

Fri, Aug 12, 2016 5:10 PM

I think JSON, XML, etc. are overkill for data that will never be exposed to the outside world. If you are going to have an editable config file, then yes.

Writing structs directly to storage can introduce upgrade/downgrade issues. But depending on the stability of the structs and how you manage introducing new data sets into the struct it can work. Saving the entire config data set works well with flash where you are block erasing, block writing anyway.

I have always used either on-chip or external byte-writable EEPROM. Address zero is always a version ID of the data in the EEPROM. I decouple the FW rev from the EEPROM rev. All new items are added at the end. FW rev 1 reads 0x00 to, say, 0x50 with known offsets. FW rev 2 reads 0x00-0x50 for existing items, and 0x51... for new items. If I need to downgrade to FW rev 1, the EEPROM items at 0x51 and beyond are simply ignored.

Another simple way is a tag-length-value list. You have a search routine that reads through the eeprom looking for the specific tag.
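A minimal sketch of such a search routine, assuming one-byte tags, one-byte lengths and an erased byte (0xFF) marking the end of the list. The names and the end marker are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_END 0xFFu   /* assumed: an erased EEPROM byte ends the list */

/* Walk [tag][len][value...] records in buf. Returns a pointer to the
 * value of the first record with the given tag and stores its length
 * in *len_out, or returns NULL if the tag is not present. */
const uint8_t *tlv_find(const uint8_t *buf, size_t size,
                        uint8_t tag, uint8_t *len_out)
{
    size_t pos = 0;
    while (pos + 2 <= size && buf[pos] != TLV_END) {
        uint8_t len = buf[pos + 1];
        if (pos + 2 + len > size)
            break;                  /* record runs past the buffer */
        if (buf[pos] == tag) {
            *len_out = len;
            return &buf[pos + 2];
        }
        pos += 2u + len;            /* skip unknown or unwanted record */
    }
    return NULL;
}
```

Firmware that does not know a tag simply skips over it, which is what makes downgrades tolerable with this layout. The sketch trusts the length bytes, so the whole area should be verified (for example with a CRC) before it is searched.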

With either of these approaches you have to deal with the case where config item "X" is a u_int8 and now needs to be a u_int32. How do you manage the upgrade path fw ver 1 -> fw ver 2 (which needs u_int32) and then the downgrade back to fw ver 1 because of some fatal flaw in ver 2 that will take you a month to fix and QA? You have to roll back the fw so the units will be operational until you can fix ver 2.

--
Chisolm 
Republic of Texas

- L
- Les Cargill
  

Fri, Aug 12, 2016 5:15 PM

They call these "serialization formats".

The simplest XML solution I've found is ezXML. It's sort of fiddly. JSON is another choice. IMO, JSON is a little easier to look at.

That said, plain dotted names in a text file are equivalent to each of these. A line such as Class.instance.42.attribute.thing="86.42"\n works just as well.

Text formats make this problem go away.

Unless the shape of the struct changes. Having to maintain and feed a version ID to cover all possible struct shapes is a pain.

I have, but there's some advantage to mapping the struct to a text file of attribute-value pairs through a table of names. How much trouble this is depends on how many attribute value pairs you're storing.
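A rough sketch of mapping a struct to attribute-value text through a table of names. The struct, the dotted names and the restriction to `int` members are assumptions chosen to keep the example short:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int baud; int contrast; int timeout_s; } config_t;

/* Table of names: one entry per saved member (all int in this sketch). */
typedef struct { const char *name; size_t offset; } field_t;

static const field_t fields[] = {
    { "uart.baud",     offsetof(config_t, baud)      },
    { "lcd.contrast",  offsetof(config_t, contrast)  },
    { "net.timeout_s", offsetof(config_t, timeout_s) },
};
#define FIELD_COUNT (sizeof fields / sizeof fields[0])

/* Write "name=value\n" lines. Returns bytes written, or -1 if the
 * output buffer is too small. */
int config_write_text(const config_t *c, char *out, size_t size)
{
    size_t used = 0;
    for (size_t i = 0; i < FIELD_COUNT; i++) {
        const int *v = (const int *)((const char *)c + fields[i].offset);
        int n = snprintf(out + used, size - used, "%s=%d\n",
                         fields[i].name, *v);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

/* Apply one "name=value" line. Returns 0, or -1 for a malformed line
 * or an unknown name (which the caller may choose to ignore). */
int config_read_line(config_t *c, const char *line)
{
    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return -1;
    size_t nlen = (size_t)(eq - line);
    for (size_t i = 0; i < FIELD_COUNT; i++) {
        if (strlen(fields[i].name) == nlen &&
            memcmp(line, fields[i].name, nlen) == 0) {
            *(int *)((char *)c + fields[i].offset) =
                (int)strtol(eq + 1, NULL, 10);
            return 0;
        }
    }
    return -1;
}
```

Members are matched by name, not by position, so adding, removing or reordering struct members does not invalidate an existing text file; a member missing from the file just keeps whatever default the caller loaded first.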

--
Les Cargill

- P
- Paul
  

Fri, Aug 12, 2016 9:00 PM

I try to make sure that struct control parameters (version, magic number, type, byte count, CRC/checksum) never have 0, 0xFF or -1 as legal values, so that the various erased-block values cannot be confused with valid ones.

Personally, rather than keeping tables of which version has how many bytes, I store a byte count as well, and compare it against the size of the software version's defaults to work out which count to use: either they are the same, or the defaults size is smaller and so is used (downgrade), or the external count is smaller (upgrade). This works if you are conscientious in your

If changing the size of a parameter, I put the NEW type on the end and leave the old one at its last used setting. For your example I would not want the new version to use the same position in the struct, for exactly the reasons above. That sort of change is going to require a code change anyway, and really should be noted as more than just the change of a declaration. The operations on that parameter

If you have to do special conversions, then do them based on knowing the difference between the defaults version (firmware version) and the external version. As per your example, the upgrade changes the type, so the default would either need conversion or a new default would be used. This should be part of your update design. I try to work on using new defaults for the extra value as the norm, as an update usually means recalibration or a similar task on first use.
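The stored byte count idea could look something like the sketch below. It is only an illustration under assumptions: the struct, the default values and the function name are made up, new members are assumed to be appended at the end, and integrity checks (CRC, magic number) are assumed to have been done before this is called.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  volume;
    uint8_t  brightness;
    uint16_t timeout;
    uint16_t new_limit;     /* appended by a later firmware version */
} params_t;

/* Firmware defaults for the layout this firmware knows about. */
static const params_t defaults = { 5, 128, 300, 1000 };

/* Start from defaults, then overlay the smaller of the stored byte
 * count and this firmware's struct size. Counts of 0 and 0xFFFF are
 * rejected because they are what erased or zeroed storage reads as. */
int params_merge(params_t *out, const uint8_t *stored,
                 uint16_t stored_count)
{
    *out = defaults;
    if (stored_count == 0u || stored_count == 0xFFFFu)
        return -1;
    size_t n = stored_count < sizeof *out ? stored_count : sizeof *out;
    memcpy(out, stored, n);
    return 0;
}
```

After an upgrade the stored count is smaller, so the appended members keep their new defaults; after a downgrade the struct is smaller, so the extra stored bytes are ignored.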


- T
- Tim Wescott
  

Sat, Aug 13, 2016 5:06 PM

And then, when you upgrade your software, your data structure is automatically corrupted! Woo hoo!

You can prevent this by assuming that memory will never be corrupted and only growing the structure at the end (which is oh so convenient when, for instance, you want to obsolete a parameter or grow it from one byte to two), or by making 0xff the default value for every byte (again, oh so convenient in the code).

Personally, I use a list of data items arranged in records. Each record has a length byte, an ID byte (or word), and data. It's accompanied by software that lets you define the default value for each ID, so when the item isn't found the software automatically gets the factory default (I use C++, so the default value is done as a parameter in a template). It all goes into a block with a 16-bit CRC word at the end. It's extravagant in the use of overhead, but I have yet to come close to using 256 bytes, much less the amount of memory commonly found in a block of flash or an external EEPROM.
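A sketch of that record layout with fall-back to a default, written in C here instead of the C++ templates mentioned above. The exact byte order, the choice of CRC-16 polynomial and the function names are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 (polynomial 0x1021, initial value 0xFFFF). */
static uint16_t crc16(const uint8_t *p, size_t len)
{
    uint16_t crc = 0xFFFFu;
    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Assumed block layout: [len][id][data...] records followed by a
 * 16-bit CRC (high byte first); len counts data bytes only. Returns
 * the stored 16-bit little-endian value for id, or dflt if the CRC is
 * bad or no matching record exists. */
uint16_t param_get_u16(const uint8_t *blk, size_t size,
                       uint8_t id, uint16_t dflt)
{
    if (size < 2)
        return dflt;
    size_t body = size - 2;
    uint16_t stored = (uint16_t)((blk[body] << 8) | blk[body + 1]);
    if (crc16(blk, body) != stored)
        return dflt;                /* whole block is untrusted */

    size_t pos = 0;
    while (pos + 2 <= body) {
        uint8_t len = blk[pos];
        if (pos + 2 + len > body)
            break;
        if (blk[pos + 1] == id && len == 2)
            return (uint16_t)(blk[pos + 2] | (blk[pos + 3] << 8));
        pos += 2u + len;
    }
    return dflt;                    /* item not found: factory default */
}
```

Each record costs two bytes of overhead, which is the extravagance mentioned above, but in exchange parameters can be added or retired without any version table.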

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com

- P
- pozz
  

Sat, Aug 13, 2016 9:42 PM

You can include a "data layout version" that increases when one parameter is added, removed or changed when a new firmware version is created. During loading process, the "data layout version" is checked against the corresponding version of the current firmware.

Immediately after an upgrade, the data version doesn't match the firmware version so a "data upgrade" process is triggered.

Of course, for complex firmware and persistent data that changes frequently, this task could be tricky: the stored data could be at any previous data version (or else the upgrade *must* be done incrementally from one version to the next, without gaps).
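One way to handle "any previous version" without the user stepping through releases is to keep every single-step converter in the current firmware and chain them. A minimal sketch, assuming a hypothetical three-version history and a RAM struct that is already large enough for the newest layout:

```c
#include <stdint.h>

#define CURRENT_LAYOUT 3u

typedef struct {
    uint16_t layout;        /* data layout version of this image */
    uint16_t timeout_s;     /* present since layout 1 */
    uint16_t retries;       /* added in layout 2 */
    uint32_t timeout_ms;    /* added in layout 3, replaces timeout_s */
} params_t;

/* One converter per step; each bumps the layout number by one. */
static void v1_to_v2(params_t *p) { p->retries = 3; p->layout = 2; }
static void v2_to_v3(params_t *p)
{
    p->timeout_ms = (uint32_t)p->timeout_s * 1000u;
    p->layout = 3;
}

typedef void (*upgrade_fn)(params_t *);
static const upgrade_fn steps[] = { v1_to_v2, v2_to_v3 };

/* Run every step from the stored layout up to the current one.
 * Returns -1 for layout 0 or a layout newer than this firmware. */
int params_upgrade(params_t *p)
{
    if (p->layout == 0u || p->layout > CURRENT_LAYOUT)
        return -1;
    while (p->layout < CURRENT_LAYOUT)
        steps[p->layout - 1u](p);
    return 0;
}
```

The firmware grows by one small function per layout change, but an image saved by any older version is brought up to date in a single boot. Downgrades are not covered by this sketch.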

How do you save variable-length data, for example a string? Do you always reserve the maximum length for that parameter? Otherwise, how do you manage an increase in the length of that parameter (when the length of a string increases)? You can't overwrite only the old value; you need to rewrite the "database" from the beginning.

- D
- Don Y
  

Sat, Aug 13, 2016 10:55 PM

This places a burden on the user(s): they have to track your releases even if they have no need/desire to do so. And, *you* have to keep all of the "updates" available (web site?) so a user who is still running v1.0 can apply 2.0, then 3.0, then 4.0, etc.

Or, offer a set of shortcuts: 1.0 to 3.0, 1.1 to 2.0, 2.0 to 4.0, etc.

(this gets messy really quick -- esp for a *device*, not a "desktop application")

And, you need some way of telling the user that the update does NOT apply to his current version; that he must first update to version X (which, in turn, may require yet another update)

You can use P-strings (length-prefixed strings). And, to minimize the number of "database rewrites", allow a *shorter* string to replace an existing string (padding the excess so your parser knows to skip over it as "empty space"). Then, the only times you need to rewrite are when you need a *longer* string OR when you need to GC some of the "empty space" in order to accommodate some other increased memory need (i.e., some *other* string that has grown).

(Of course, you can also use ASCIIZ strings)
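A small sketch of the shorter-string-in-place idea with length-prefixed strings. It assumes byte-writable memory, a pad byte of 0x00 (so empty strings cannot be stored this way) and hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD 0x00u   /* assumed filler; the parser skips it as empty space */

/* Overwrite the P-string starting at buf[pos] with s, padding the
 * excess. Returns 0 on success, or -1 if s is empty or longer than the
 * existing string (the caller must then rewrite the database). */
int pstr_replace(uint8_t *buf, size_t pos, const char *s)
{
    size_t old_len = buf[pos];
    size_t new_len = strlen(s);
    if (new_len == 0 || new_len > old_len)
        return -1;
    buf[pos] = (uint8_t)new_len;
    memcpy(&buf[pos + 1], s, new_len);
    memset(&buf[pos + 1 + new_len], PAD, old_len - new_len);
    return 0;
}

/* Return the offset of the n-th string (0-based), skipping padding,
 * or size if there is no such string or a length byte is corrupt. */
size_t pstr_nth(const uint8_t *buf, size_t size, unsigned n)
{
    size_t pos = 0;
    while (pos < size) {
        if (buf[pos] == PAD) {
            pos++;
            continue;
        }
        if (pos + 1 + buf[pos] > size)
            break;                  /* length runs past the dataset */
        if (n-- == 0)
            return pos;
        pos += 1u + buf[pos];
    }
    return size;
}
```

The in-place update is not atomic: a power loss between writing the new length and finishing the padding leaves a record the parser may misread, so the integrity check on the whole dataset is still needed.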

In each case, you have to consider how you handle corruption in the dataset; if a tag suggests N bytes of data *should* follow (e.g., a 6-octet MAC) but the tag has been corrupted and, really, only M bytes actually are present (because it was really an IP address stored there!), then you quickly can get out of sync and find yourself reading "past" the end of the dataset. Or, coming up short.

(hence the need for a checksum/hash AND the precaution of not trusting ANY of the data until you verify that the entire object appears intact)

Note that this requires you to have enough working memory to store the entire "database" while it is being rewritten. Or, best case, the largest object contained therein (assuming you can repeatedly, incrementally rewrite the database as you shuffle entries around).

- P
- Paul
  

Sat, Aug 13, 2016 11:08 PM


For a lot of applications that is fine, as long as you add a safeguard to ensure that, whether the software is newer or older, you only load a valid number of parameters.

Even with a record set, after a downgrade you will have to skip newer records that the software knows nothing about because it has been downgraded.

For the vast majority of embedded applications there is a limited number of strings that need to be saved, and in MOST cases these are things like hostname, filename and path, all of which have system-defined LIMITS. It is rare that I come across a CONFIGURATION setting (not a saved state) that needs a variable-length string. If any strings are involved, they are normally ones with fixed limits.

Not all systems need a saved state that is more complex than a set of variables; I rarely find embedded applications that have to save system state or loaded sub-application state.

A lot of embedded systems cannot do more than skip unrecognised parameters, as they have no way to inform anyone.


- T
- Tim Wescott
  

Sun, Aug 14, 2016 12:06 AM

Basically, everything I've done with this sort of thing is on a processor big enough that all of the individual parameters can be stored in RAM. In fact, the process is that each parameter is echoed in memory, and at startup each one is either loaded with the value from flash (if it is there), or the default value (if it isn't).

Because I usually end up using a block of the processor's flash, the "database" is, indeed, rewritten from the beginning on save (and woe be unto the service guy who says "save" and then pulls the plug).

I generally don't store strings in this sort of space, so it hasn't been an issue for me.

If it is an issue, you can always have a special data ID for "erasure". You'll still need a way to rebuild memory if you're constantly changing the size of things.

Just for reference, I'm doing this sort of thing on chips with 32K or more of RAM, and 128K or more of flash -- I'm sure it works for smaller processors than that, but at some point space will get constrained enough that it's not reasonable.


- P
- pozz
  

Mon, Aug 22, 2016 7:46 AM

Do you use any mechanism to avoid corrupting the "database" if the saving process is interrupted for any reason (for example, a power failure)?

Do you restart from "all default" values, losing all the previously saved values?

If you rewrite the whole database from the beginning (in a new memory area or in the same memory area), you can increase or decrease the length of strings without any problem. The overall size of the database will simply be larger or smaller than before.

There's a problem only if you want to rewrite/overwrite only the single parameter that is changed *and* the size of its value changes.