Linux embedded: how to avoid corruption on power off

I'm playing with a Raspberry Pi system, but I think my question applies
to embedded Linux in general.

We all know that the OS (Linux, Windows or whatever) *should* be
gracefully powered down with a shutdown procedure (the shutdown command
in Linux). We must avoid cutting the power abruptly.

While this is possible for desktop systems, IMHO it's impossible to achieve
in embedded systems.  The user usually switches off a small box by
pressing an OFF button that is typically wired to the main power
supply input.  In any case, he could simply unplug the power cord
without waiting for the shutdown procedure to finish.

I'm interested to know what methods can be used to reduce the
probability of corruption.

For example, I chose to use a sqlite database to save non-volatile
user-configurable settings. sqlite is transaction based, so a power
interruption in the middle of a transaction shouldn't corrupt the entire
database. With plain text files this would be much harder to guarantee.
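
For illustration, a minimal C sketch of what such a transactional
settings write might look like (the settings table and the helper
function are hypothetical, not from the original post):

    #include <stddef.h>
    #include <sqlite3.h>

    /* Save one setting atomically: either the whole statement commits,
       or, after a power cut, sqlite rolls back to the old value. */
    int save_setting(sqlite3 *db, const char *key, const char *value)
    {
        sqlite3_stmt *st;
        int rc;

        sqlite3_exec(db, "BEGIN IMMEDIATE;", NULL, NULL, NULL);

        rc = sqlite3_prepare_v2(db,
                "INSERT OR REPLACE INTO settings(key, value) VALUES(?, ?);",
                -1, &st, NULL);
        if (rc == SQLITE_OK) {
            sqlite3_bind_text(st, 1, key, -1, SQLITE_STATIC);
            sqlite3_bind_text(st, 2, value, -1, SQLITE_STATIC);
            rc = (sqlite3_step(st) == SQLITE_DONE) ? SQLITE_OK : SQLITE_ERROR;
            sqlite3_finalize(st);
        }

        sqlite3_exec(db, rc == SQLITE_OK ? "COMMIT;" : "ROLLBACK;",
                     NULL, NULL, NULL);
        return rc;
    }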

I know that write requests to non-volatile memories (HDDs, embedded flash
memories) are usually buffered by the OS, and we don't know when they
will actually be committed to the medium by the kernel. Is there a method
to force the buffered writes to be flushed immediately?

Other aspects to consider?

Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 18:10, pozz wrote:

sync should do this, but you can never be sure that the buffers and caches
of the medium itself will also be flushed.
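
For completeness, a minimal C sketch of forcing writes out from an
application, with the same caveat: the medium's own cache is beyond the
kernel's control (the path handling is illustrative only):

    #include <fcntl.h>
    #include <unistd.h>

    /* Write a buffer, then flush the file's data and metadata to the
       storage device before returning. */
    int write_and_flush(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        /* sync() flushes everything the kernel has buffered, system-wide;
           syncfs(fd) does the same for one filesystem.  Neither empties
           the drive's or card's internal write cache. */
        return close(fd);
    }
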
I have several devices that *must* not get corrupted (helicopter navigation).
But there I don't have many writes.
I mount all partitions read-only. When a write is needed, the partition gets
temporarily remounted read-write (mount -o remount,rw ...) and after the
write (and sync) is done it gets remounted read-only.
This, together with a journalling filesystem, keeps the "window of
vulnerability" as small as possible.
With more than 100 systems flying for more than 10 years I did not have one
corrupted file system.
The first half of those systems used mechanical hard disks in this high
vibration environment without problems - and no, no shock mounts, just
fixed rigid mounting.
It's no guarantee, but at least a low probability of problems.
  
--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off
On 16/06/2017 12:45, Reinhardt Behm wrote:

When exactly do you call sync?
I can call sync every time my applications write something to
non-volatile memory, but I don't know when the OS itself writes something.



What is your Linux distribution?  I know it's not simple to mount the
root filesystem read-only.  Some Linux tasks start from the
assumption that they can write to the root filesystem.



Do you use ext3 as the journalling filesystem?




Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 19:19, pozz wrote:


After closing all files that were open for writing, so there is nothing
buffered inside the application. Files open read-only are not affected.
Writes are quite short; mostly files are newly created, written and closed
again, as when the pilot changes some waypoint(s).
  

The first was based on SuSE 9.3, later Arch. /etc, /var and /tmp are in RAM;
/etc is created and filled during startup from a tarball in the initrd.
Changing some configs is not simple, but this is rarely done. The software
is certified (DO-178 DAL D), so changes require re-certification and a lot
of paperwork.
  

Yes, the first system used Reiser (2002).


--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off

You put a big capacitor on the power line so that when the main power is
cut, the capacitor keeps the system running long enough to shut down
gracefully.

Bye Jack

Re: Linux embedded: how to avoid corruption on power off

I second this.

E.g. for 24 V (typically diesel engines) use a series diode, a _big_
storage capacitor and an SMPS with an input voltage range of at least
8-28 V; this should give enough time to sync out the data.

Using an SMPS input range much wider than 3:1 might not be that
productive, since at the lower capacitor voltages the remaining stored
energy is very small (stored energy scales with the square of the voltage).
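
A rough worked example (all numbers are illustrative, not from the post
above): the usable energy between the supply's upper and lower input
limits is E = C*(V1^2 - V2^2)/2, so for a load drawing P watts the
hold-up time is

    t = C * (V1^2 - V2^2) / (2 * P)

With C = 10 mF, V1 = 24 V, V2 = 8 V and P = 5 W, that gives
t = 0.01 * (576 - 64) / 10 = about 0.5 s - and note that all the charge
below 8 V is unusable, which is exactly the point about wide input ranges.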


Re: Linux embedded: how to avoid corruption on power off

You'll also need some hardware to signal the software that external
power has been lost.  That event tells your system that only a little
time remains to shut down before the caps drain and the world ends.
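
As a minimal sketch of the software side, assuming the power-fail signal
is wired to a GPIO (the pin number, polarity and sysfs setup are all
assumptions):

    #include <fcntl.h>
    #include <poll.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Wait for the power-fail input (exported via sysfs with edge set to
       "falling"), then start the emergency shutdown.  GPIO 17 is made up. */
    int main(void)
    {
        char buf[8];
        int fd = open("/sys/class/gpio/gpio17/value", O_RDONLY);
        if (fd < 0)
            return 1;

        read(fd, buf, sizeof buf);           /* consume the initial state */

        struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
        poll(&pfd, 1, -1);                   /* blocks until the edge fires */

        lseek(fd, 0, SEEK_SET);
        read(fd, buf, sizeof buf);

        return system("/sbin/shutdown -h now");  /* the caps buy this time */
    }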

JJS


Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 11:24 AM, John Speth wrote:

IME, you need TWO signals (if you have a physical power switch):
- one to indicate that the user has turned the power off (perhaps
   because it is easier than using whatever "soft" mechanisms have
   been put in place; or, because the system has crashed or is
   otherwise not responsive ENOUGH to use that primary approach)
- one to indicate that the user had hoped to keep the device running
   but the mains power has been removed

Depending on the topology of the power supply, you can get a lot
of advance warning (e.g., monitoring the "rectified mains" on
a switcher) or very little (e.g., monitoring the regulated output(s)
to detect when they are falling below their minimum required voltages).

Note that your advance warning has to tell you when ANY of the supplies
you will be *requiring* is about to fall out of spec.

Re: Linux embedded: how to avoid corruption on power off

Very little extra hardware is required. Use an optoisolator and you
can easily monitor the primary input voltage, including the mains voltage.
Add some comparator functionality and it can drive one of the RS-232
handshake pins, which then generates an interrupt.

One issue with big storage capacitors is how fast they charge when
power is restored. The SMPS reset circuitry should operate
reliably even when the input voltage is restored slowly.

With big diesels, especially at cold temperatures, the starting current
will be huge, dropping the battery voltage momentarily. This voltage
drop will initiate the data save, but after that the routines should
check whether the input voltage has returned and, based on that, either
continue normal operation or perform a full shutdown.


Re: Linux embedded: how to avoid corruption on power off


I've done a UPS for a Linux system with supercaps: the CPU is held in reset
until the supercaps are fully charged. One pin is used to monitor the input
supply and issue a shutdown if it goes away; another pin is used with the
gpio-poweroff driver to issue a reset in case the power has returned by the
time the shutdown is complete.

-Lasse

Re: Linux embedded: how to avoid corruption on power off
On 16/06/17 12:10, pozz wrote:

Some storage media, such as SD cards, are very sensitive to being
corrupted if they are powered off unexpectedly.  There is just no way to
make SD card storage reliable in the face of unexpected power-offs.

In general, using read-only mounts helps enormously.  In finer detail,
some filesystems are more robust against unexpected power-offs, such as
LogFS, NILFS and FFS.  And some file types are more susceptible to
problems - sqlite databases are notorious for being corrupted if writes
are interrupted.

The only way to be really safe is to have enough capacitance on board to
be able to finish your writes when the supply is cut off.  Keep your
writes short, use kernel parameter tuning (or filesystem mount options)
to minimise the time dirty data sits in the buffers, and make sure that
as much as possible is mounted read-only, with things like /tmp,
/var/lock, etc. on tmpfs mounts.  And when a power failure occurs,
remount your r/w filesystems read-only - that should force buffers to be
written out as fast as possible and block any further writes, and it can
perhaps be done early in the controlled shutdown (if you don't need to
write too much log data on shutdown).
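
To illustrate the kernel-tuning part, an embedded init program might
shrink the dirty-writeback window like this (the values are arbitrary
examples, not recommendations):

    #include <stdio.h>

    /* Write one value into a /proc/sys tunable. */
    static int write_sysctl(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
    }

    int main(void)
    {
        /* Start writeback almost immediately and expire dirty pages fast,
           so very little data ever sits unwritten in RAM. */
        write_sysctl("/proc/sys/vm/dirty_writeback_centisecs", "100");
        write_sysctl("/proc/sys/vm/dirty_expire_centisecs", "200");
        write_sysctl("/proc/sys/vm/dirty_background_ratio", "1");
        return 0;
    }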



Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 19:31, David Brown wrote:


You never know when the SD card is done with writing, which can include
internal re-organisation for wear levelling.
--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off
On 16/06/2017 13:31, David Brown wrote:

What?  I chose sqlite because they say corruption of a database is
a very rare event.


https://www.sqlite.org/howtocorrupt.html

Re: Linux embedded: how to avoid corruption on power off


Sections 3.1 and 4 of that page.

Bye Jack

Re: Linux embedded: how to avoid corruption on power off
On 19/06/2017 13:27, Jack wrote:


Ok, but those problems (memories that lie about whether a write has
really finished) are common to all other file types, not just sqlite.



Re: Linux embedded: how to avoid corruption on power off
On 19/06/17 12:47, pozz wrote:

Sorry for causing alarm - I was mixing this up with something else.

You are, of course, still reliant on the OS and the hardware acting
together to get the right behaviour here.  The OS will tell the sqlite
library when it believes the data has been written to the disk, but a
memory card may still mess around with writes, garbage collection, etc.,
after that.


Re: Linux embedded: how to avoid corruption on power off

It depends.  Sqlite is designed to be embedded in a host program,
so it is vulnerable to developer errors in ways that server-based DBMSs
are not.

But that's only part of it.  There also are hardware - primarily
storage - reliability issues to consider.


As part of the embedded focus, by default Sqlite does not enable
"WAL", which stands for "write-ahead logging".  More generally, this
capability is referred to as "journaling".

You must enable WAL for the database if you want maximum safety.  But
even with WAL enabled, the database can be corrupted by hardware
hiccups during a write.

The point I've been debating with Don Y in another thread is that, when
using WAL, a database corrupted during normal operation SHOULD ALWAYS
BE RECOVERABLE.

The problem is that tuning WAL use is a balancing act: the safer you
want to be, the more extra file space you need for the log.
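
In sqlite terms, turning WAL on and bounding the log's growth is a
handful of pragmas (the numeric limits here are arbitrary examples):

    #include <stddef.h>
    #include <sqlite3.h>

    /* Switch a database to write-ahead logging and cap the log's growth. */
    int enable_wal(sqlite3 *db)
    {
        /* FULL synchronous makes sqlite fsync on every commit,
           trading write speed for safety. */
        sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
        sqlite3_exec(db, "PRAGMA synchronous=FULL;", NULL, NULL, NULL);

        /* Checkpoint after ~1000 pages and truncate the WAL back toward
           1 MB when possible - the space/safety balancing act above. */
        sqlite3_exec(db, "PRAGMA wal_autocheckpoint=1000;", NULL, NULL, NULL);
        sqlite3_exec(db, "PRAGMA journal_size_limit=1048576;", NULL, NULL, NULL);
        return SQLITE_OK;
    }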


SSDs mess with the DBMS's ability to recover from a write failure: the
fact that the update/erase "block" size != the write "page" size is a
serious problem.  It's a failure mode that [essentially] can't happen
with other types of storage systems, and something that DBMSs were never
designed to handle.

If the SSD erase block (not "page") size is not ridiculously large,
you can configure the DBMS's "file block" size to match it.  It will
require additional RAM, may slow down I/O (depending on the SSD), and it
will require a lot of additional file space for the WAL, but it will
greatly improve the odds that your database won't get corrupted.
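
With sqlite specifically, that match is one pragma; it only takes effect
on a fresh database or after a VACUUM, and the 4096 here is an assumption
to be replaced by the medium's real erase-block size:

    #include <stddef.h>
    #include <sqlite3.h>

    /* Rebuild the database file with a page size matching the flash. */
    int match_page_size(sqlite3 *db)
    {
        sqlite3_exec(db, "PRAGMA page_size=4096;", NULL, NULL, NULL);
        return sqlite3_exec(db, "VACUUM;", NULL, NULL, NULL);
    }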

If you can't match block sizes, then you have a problem.  The best
solution is multiple SSDs in a self-repairing RAID.  If that isn't
possible, then the only viable solution is frequent offline backups of
the database.

George  

Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 3:10 AM, pozz wrote:

The easiest -- and most reliable -- approach is to have an "early warning"
of an impending power loss (regardless of whether that is a genuine power
failure, the power switch being flipped off *or* the power cord being
unplugged).  The notice must allow the system to operate *reliably* long
enough to preserve whatever it *decides* to preserve (which might not be
"everything", depending on the magnitude of the warning interval).

Consider that most/many devices are normally "shut down" in an orderly
fashion.  So, the application can go about saving EVERYTHING that it wants
to save before removing power from itself (most power switches, nowadays,
are *soft* power switches -- merely REQUESTS for power to be removed).

Buffered log files -- and other objects which aren't *essential* to be
flushed to permanent store -- can be written out in an orderly fashion,
etc.  (remember, each FILE* also typically represents a buffer in the
application's space)

OTOH, if you are powering down as a result of an alert of impending
power FAILURE, you have a fixed amount of time ("energy") to get work
done.  The system designer knows how long the system can be kept operational
*and* the conditions that have to be met to do so.  E.g., the software
may have to immediately shut down peripherals that will sap the reserve
power from the power supply's filters, extend the up-time from any on-board
battery backup, etc.

The software then has to decide which subset of everything that it would LIKE
to preserve absolutely MUST be preserved.

It's important not to be caught in the middle of a write to (most) permanent
memory systems.  Most technologies have the potential to corrupt large
portions of the store if a "write cycle" is not strictly implemented.
It's not like just *a* datum will have a bogus value; rather, an entire
page of data may be corrupted, etc.

So, the last part of your shutdown routine is spinning in a tight loop
(or executing HALT), deliberately NOT trying to do anything lest some
"write" be corrupted at the instant the power falls below the
level needed for reliable operation.

I've designed products where the power switch had a second set of contacts
that allowed the software to sense that the switch had been flipped (vs.
power failing) so that I could alter the shutdown routine accordingly (the
product being capable of extended operation on battery, as it was used in
a high availability application).


That's not true.  If the DBMS software is in the process of writing to
persistent store *as* the power falls into the "not guaranteed to work"
realm, it can corrupt OTHER memory beyond that related to your current
transaction -- memory for transactions that have already been *committed*.

Given a record (in the DBMS) that conceptually looks like:
     char name[20];
     ...
     char address[40];
     ...
     time_t birthdate;
an access to "address" that is in progress when the power falls into the
realm that isn't guaranteed to yield reliable operation can corrupt
ANY of these stored values.  Similarly, an access to some datum not
shown, above, can corrupt any of *these*!  You need to understand where
each datum resides if you want to risk an "interrupted write".

When you are alerted of an impending power failure, you want to finish
ALL writes in the "window of reliable operation" that remains.  If
that means prematurely terminating a commit, then that's what you do,
else you risk corrupting previously completed commits!


You can flush the buffer caches.  Many DBMS's do exactly this to keep the
persistent store up to date.

I use different tablespaces for different resources based on the technology
that I want "backing" those stores.  I.e., do I want this part of the
database to reside on disk, in DRAM, BBDRAM, BBSRAM or in FLASH?  Note that
I also have to take durability into consideration, not just persistence
(I don't want to be hammering on FLASH with frequent updates lest I "wear
a hole through it" from overuse :> ).

[BBSRAM is the safest bet as I can ensure the time for individual write
cycles, giving me a finer "safe" granularity; with BBDRAM, it's possible
to "blow a row" if a write is interrupted (ditto FLASH)]


Of course!
- mechanisms have to be brought to safe states in an orderly fashion
- some indication of whether the shutdown was completed successfully
   (or not) made so the bootstrap will know how exhaustively it should
   test the preserved state
- external protocol connections have to be treated as local resources
   potentially left in an inconsistent state (they think everything is
   OK even though your device is now OFF)
- is there a risk of those connections being hijacked by an adversary
   killing YOUR power with the knowledge that you DON'T notify those
   connections?
- informing your user that you are attempting a recovery (on next powerup)
   so he is aware of the potential for data loss (and, if the recovery
   process is time consuming, he doesn't fret over the long delay)
- handling "double faults" (what happens if you are in the middle of
   recovering and power fails again?  This can be a common experience,
   esp. for times when power is coming up hesitantly)
- etc.

It's relatively easy to design a product that works "steady state".
Most of the blemishes appear getting into -- or out of -- that
steady-state operating condition!

Re: Linux embedded: how to avoid corruption on power off

Hi Don,

I know *you* know this (because we've been over it before) ... I'm
just providing information for the group.


On Fri, 16 Jun 2017 11:26:39 -0700, Don Y wrote:

That's correct as far as it goes, but from the DBMS point of view
transient data in RAM is expected to be lost in a failure ... the
objective of the DBMS is to protect data on stable storage.  

If you are using a journal (WAL) - and you need your head examined if
you aren't - transactions already committed are guaranteed to be
recoverable unless the journal copy and the stable copy BOTH are
corrupted.  

Normally, the only way this can happen is if the OS/hardware lies to
the DBMS about whether data really has been written to the media.  
If the media is such that a failure during a write can corrupt more
than the one block actually being written, then the media has to be
protected against failures during writes.


The journal copy and the stable copy of any given record are never
written simultaneously, so even if the journal and the stable file
both are being written to at the time of failure, the data involved in
the respective writes will be different.



No.  Records may be written or updated in pieces if the data involved
is not contiguous, but the journal process guarantees that, seen from
on high, the update occurs atomically.

The *entire* record to be updated will first be copied unmodified into
the journal.  Then the journal will record the changes to be made to
the record.  After journaling is complete, the changes will be made to
the record in the stable file.

At every point in the update process, the DBMS has a consistent
version of the *entire* record available to be recovered.  The
consistent version may not be the latest from a client's perspective,
but it won't be a corrupt, partially updated version.
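
Boiled down to file operations, the journaling sequence described above
looks roughly like this sketch (the file names, the fixed-size record
and the missing recovery path are all illustrative simplifications):

    #include <fcntl.h>
    #include <unistd.h>

    #define RECSZ 64

    /* Update one fixed-size record so that a crash at any point leaves
       either the old or the new version recoverable, never a mix. */
    int journaled_update(const char *dbfile, off_t off, const void *newrec)
    {
        char old[RECSZ];
        int db = open(dbfile, O_RDWR);
        int jn = open("journal.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (db < 0 || jn < 0)
            return -1;

        /* 1. copy the unmodified record into the journal, and make sure
              it is on the medium before touching the stable file */
        pread(db, old, RECSZ, off);
        write(jn, &off, sizeof off);
        write(jn, old, RECSZ);
        fsync(jn);
        close(jn);

        /* 2. only now modify the stable file */
        pwrite(db, newrec, RECSZ, off);
        fsync(db);
        close(db);

        /* 3. the update is durable; discard the journal.  On startup,
              if journal.tmp exists, copy the saved record back instead. */
        unlink("journal.tmp");
        return 0;
    }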


Ideally, journaling will be done at the file block level, so as to
protect data that is co-resident in the same block(s) as the target
record.  But not all DBMSs do this, because a block journal can become
very large, very quickly.  Sqlite (and MySQL and others) journal at a
meta level, recording just the original data and the changes to be made
to it - which isn't quite as safe as block journaling, but is acceptable
for most purposes.


Note that OS/filesystem journaling is not a replacement for DBMS
journaling.  The addition of filesystem journaling has no effect on
DBMS reliability, but it does affect DBMS I/O performance.

YMMV,
George

Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 10:33 PM, George Neuner wrote:

That's exactly the problem with these types of media.  If you violate
the parameters of the write/erase cycle, all bets are off -- especially
if the power to the device may be at a dubious level, etc.

Devices with built-in controllers (e.g., SD cards, SSD's, etc.) do lots
of behind the scenes juggling when you do a "simple" write to them so
you don't know when the ramifications of your "write" are done.  I.e.,
the nonvolatile components *in* the device may be accessed differently
than your mental model of the device would expect.

Without knowing where each datum resides on the individual memory
components at any given time, you can't predict what can be corrupted
by a botched write/erase.  Data that was committed three weeks ago
(and not "touched" in the intervening time) can be clobbered -- how
will you know?


You're missing the point.  Imagine you are writing to <something> -- but
I reach in and twiddle the electrical signals that control that write,
and do so in a manner that isn't easily predicted.  You (the software and
hardware) THINK you are doing one thing but, in fact, something else
entirely is happening.  The hardware doesn't think anything "wrong" is
happening (its notion of right and wrong is suspect!)

Remember that a FLASH write (parallels exist for other technologies) is
actually an *erase* operation followed by a write.  And you're actually
dealing with "pages"/blocks of data, not individual bytes.  The erase
may take a fraction of a millisecond ("milli", not "micro"), followed by
a slightly shorter time to actually write the new contents back into the
cells.

[The times are often doubled for MLC devices!]

During this "window of vulnerability", if the power supplies (or signal
voltages) go out of spec, the device can misbehave in unpredictable ways.

[This assumes the CPU itself isn't ALSO misbehaving as a result of the same
issues!]

This can manifest as:
- the wrong value getting written
- the right value getting written to the wrong location
- the wrong value getting written to the wrong location
- the entire page being partially erased
- some other page being erased
etc.

Change the manufacturer -- or part number -- and the behavior can change
as well.

What's even worse is that this sort of failure can have FUTURE consequences
for reliability.  Given that power was failing at the time of the event,
the system probably doesn't know (nor can it easily remember) what it
was in the process of doing when the power failed.  So, it doesn't know
that a certain page may have only been partially programmed -- even if
it has the correct "values", there might not be the required amount of
charge stored in each cell (you can't count electrons from the pin interface!)

So, data that *looks* good may actually be more susceptible to things like
read (or write) disturbances LATER.  You operated the device outside its
specified operating conditions so you can't rely on the data retention
that the manufacturer implies you'd have!

If your "memory device" has any "smarts" in it (SSD, SD card, etc.)
then it can perform many of these operations for *each* operation
that you THINK it is performing (e.g., as it shuffles data around
for load leveling and to accommodate failing pages/blocks).

*You* think you're updating the "name" portion of a record but
the actual memory device decides to stuff some bogus value in the
"birthdate" portion -- perhaps of a different record!


And, a week later, the data that I had stored in "birthdate" is no longer
present in the journal, as it has previously been committed to the store.
So, when it gets corrupted by an errant "name update", you'll have no
record of what it *should* have been.

The DBMS counts on the store having "integrity" -- so, all the DBMS has
to do is get the correct data into it THE FIRST TIME and it expects it
to remain intact thereafter.  It *surely* doesn't expect a write of
one value to one record to alter some other value in some other
(unrelated) record!

[Recall this was one of the assets I considered when opting to use a
DBMS in my design instead of just a "memory device"; it lets me perform
checks on the data going *in* so I don't have to check the data coming
*out* (the output is known to be "valid" -- unless the medium has
failed -- which is exactly what power sequencing can cause on many
(most?) nonvolatile storage media).]


