Linux embedded: how to avoid corruption on power off

I'm playing with a Raspberry Pi system, but I think my question applies
to embedded Linux in general.

We all know that the OS (Linux, Windows or whatever) *should* be
gracefully powered down with a shutdown procedure (the shutdown command
in Linux). We must avoid cutting the power abruptly.

While this is possible for desktop systems, IMHO it's impossible to achieve
in embedded systems.  The user usually switches off a small box by
pressing an OFF button that is typically wired to the main power
supply input.  In any case, he could simply unplug the power cord
without waiting for the shutdown procedure to finish.

I'm interested to know what methods can be used to reduce the
probability of corruption.

For example, I chose to use a sqlite database to save non-volatile
user-configurable settings. sqlite is transaction based, so a power
interruption in the middle of a transaction shouldn't corrupt the entire
database. With plain text files this would be much harder to guarantee.
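
For illustration, a minimal C sketch of what such a transactional
settings write might look like (the settings table and the helper
function are hypothetical, not from the original post):

    #include <stddef.h>
    #include <sqlite3.h>

    /* Save one setting atomically: either the whole statement commits,
       or, after a power cut, sqlite rolls back to the old value. */
    int save_setting(sqlite3 *db, const char *key, const char *value)
    {
        sqlite3_stmt *st;
        int rc;

        sqlite3_exec(db, "BEGIN IMMEDIATE;", NULL, NULL, NULL);

        rc = sqlite3_prepare_v2(db,
                "INSERT OR REPLACE INTO settings(key, value) VALUES(?, ?);",
                -1, &st, NULL);
        if (rc == SQLITE_OK) {
            sqlite3_bind_text(st, 1, key, -1, SQLITE_STATIC);
            sqlite3_bind_text(st, 2, value, -1, SQLITE_STATIC);
            rc = (sqlite3_step(st) == SQLITE_DONE) ? SQLITE_OK : SQLITE_ERROR;
            sqlite3_finalize(st);
        }

        sqlite3_exec(db, rc == SQLITE_OK ? "COMMIT;" : "ROLLBACK;",
                     NULL, NULL, NULL);
        return rc;
    }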

I know that write requests to non-volatile memories (HDDs, embedded flash
memories) are usually buffered by the OS, and we don't know when they
will actually be committed to the medium by the kernel. Is there a method
to force the buffered writes to be flushed immediately?

Other aspects to consider?

Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 18:10, pozz wrote:

sync should do this, but you can never be sure that the buffers and caches
of the medium itself will also be flushed.
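
For completeness, a minimal C sketch of forcing writes out from an
application, with the same caveat: the medium's own cache is beyond the
kernel's control (the path handling is illustrative only):

    #include <fcntl.h>
    #include <unistd.h>

    /* Write a buffer, then flush the file's data and metadata to the
       storage device before returning. */
    int write_and_flush(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        /* sync() flushes everything the kernel has buffered, system-wide;
           syncfs(fd) does the same for one filesystem.  Neither empties
           the drive's or card's internal write cache. */
        return close(fd);
    }
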
I have several devices that *must* not get corrupted (helicopter navigation).
But there I don't have many writes.
I mount all partitions read-only. When a write is needed, the partition gets
temporarily remounted read-write (mount -o remount,rw ...) and after the
write (and sync) is done it gets remounted read-only.
This, together with a journalling filesystem, keeps the "window of
vulnerability" as small as possible.
With more than 100 systems flying for more than 10 years I did not have one
corrupted file system.
The first half of those systems used mechanical hard disks in this high
vibration environment without problems - and no, no shock mounts, just
fixed rigid mounting.
It's no guarantee, but at least a low probability of problems.
  
--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off
On 16/06/2017 12:45, Reinhardt Behm wrote:

When exactly do you call sync?
I can call sync every time my applications write something to
non-volatile memory, but I don't know when the OS itself writes something.



What is your Linux distribution?  I know it's not simple to mount the
root filesystem read-only.  Some Linux tasks start from the
assumption that they can write to the root filesystem.



Do you use ext3 as the journalling filesystem?




Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 19:19, pozz wrote:


After closing all files that were open for writing, so there is nothing
buffered inside the application. Files open read-only are not affected.
Writes are quite short; mostly files are newly created, written and closed
again, as when the pilot changes some waypoint(s).
  

The first was based on SuSE 9.3, later Arch. /etc, /var and /tmp are in RAM;
/etc is created and filled during startup from a tarball in the initrd.
Changing some configs is not simple, but this is rarely done. The software
is certified (DO-178 DAL D), so changes require re-certification and a lot
of paperwork.
  

Yes, the first system used Reiser (2002).


--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off

You put a big capacitor on the power line so that when the main power is
cut, the capacitor keeps the system running long enough to shut down
gracefully.

Bye Jack

Re: Linux embedded: how to avoid corruption on power off

I second this.

E.g. for 24 V (typically diesel engines) use a series diode, a _big_
storage capacitor and an SMPS with an input voltage range of at least
8-28 V; this should give enough time to sync out the data.

Using an SMPS input range much wider than 3:1 might not be that
productive, since at the lower capacitor voltages the remaining stored
energy is very small (stored energy scales with the square of the voltage).
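
A rough worked example (all numbers are illustrative, not from the post
above): the usable energy between the supply's upper and lower input
limits is E = C*(V1^2 - V2^2)/2, so for a load drawing P watts the
hold-up time is

    t = C * (V1^2 - V2^2) / (2 * P)

With C = 10 mF, V1 = 24 V, V2 = 8 V and P = 5 W, that gives
t = 0.01 * (576 - 64) / 10 = about 0.5 s - and note that all the charge
below 8 V is unusable, which is exactly the point about wide input ranges.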


Re: Linux embedded: how to avoid corruption on power off

You'll also need some hardware to signal the software that external
power has been lost.  That event tells your system that only a little
time remains to shut down before the caps drain and the world ends.
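
As a minimal sketch of the software side, assuming the power-fail signal
is wired to a GPIO (the pin number, polarity and sysfs setup are all
assumptions):

    #include <fcntl.h>
    #include <poll.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Wait for the power-fail input (exported via sysfs with edge set to
       "falling"), then start the emergency shutdown.  GPIO 17 is made up. */
    int main(void)
    {
        char buf[8];
        int fd = open("/sys/class/gpio/gpio17/value", O_RDONLY);
        if (fd < 0)
            return 1;

        read(fd, buf, sizeof buf);           /* consume the initial state */

        struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
        poll(&pfd, 1, -1);                   /* blocks until the edge fires */

        lseek(fd, 0, SEEK_SET);
        read(fd, buf, sizeof buf);

        return system("/sbin/shutdown -h now");  /* the caps buy this time */
    }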

JJS


Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 11:24 AM, John Speth wrote:

IME, you need TWO signals (if you have a physical power switch):
- one to indicate that the user has turned the power off (perhaps
   because it is easier than using whatever "soft" mechanisms have
   been put in place; or, because the system has crashed or is
   otherwise not responsive ENOUGH to use that primary approach)
- one to indicate that the user had hoped to keep the device running
   but the mains power has been removed

Depending on the topology of the power supply, you can get a lot
of advance warning (e.g., monitoring the "rectified mains" on
a switcher) or very little (e.g., monitoring the regulated output(s)
to detect when they are falling below their minimum required voltages).

Note that your advance warning has to tell you when ANY of the supplies
you will be *requiring* is about to fall out of spec.

Re: Linux embedded: how to avoid corruption on power off

Very little extra hardware is required. Use an optoisolator and you
can easily monitor the primary input voltage, including the mains voltage.
Add some comparator functionality and it can drive one of the RS-232
handshake pins, which then generates an interrupt.

One issue with big storage capacitors is how fast they charge when
power is restored. The SMPS reset circuitry should operate
reliably even when the input voltage is restored slowly.

With big diesels, especially at cold temperatures, the starting current
will be huge, dropping the battery voltage momentarily. This voltage
drop will initiate the data save, but after that the routines should
check whether the input voltage has returned and, based on that, either
continue normal operation or perform a full shutdown.


Re: Linux embedded: how to avoid corruption on power off


I've done a UPS for a Linux system with supercaps: the CPU is held in reset
until the supercaps are fully charged. One pin is used to monitor the input
supply and issue a shutdown if it goes away; another pin is used with the
gpio-poweroff driver to issue a reset in case the power has returned by the
time the shutdown is complete.

-Lasse

Re: Linux embedded: how to avoid corruption on power off
On 16/06/17 12:10, pozz wrote:

Some storage media, such as SD cards, are very sensitive to being
corrupted if they are powered off unexpectedly.  There is just no way to
make SD card storage reliable in the face of unexpected power-offs.

In general, using read-only mounts helps enormously.  In finer detail,
some filesystems are more robust against unexpected power-offs, such as
LogFS, NILFS and FFS.  And some file types are more susceptible to
problems - sqlite databases are notorious for being corrupted if writes
are interrupted.

The only way to be really safe is to have enough capacitance on board to
be able to finish your writes when the supply is cut off.  Keep your
writes short, use kernel parameter tuning (or filesystem mount options)
to minimise the time dirty data sits in the buffers, and make sure that
as much as possible is mounted read-only, with things like /tmp,
/var/lock, etc. on tmpfs mounts.  And when a power failure occurs,
remount your r/w filesystems read-only - that should force buffers to be
written out as fast as possible and block any further writes, and it can
perhaps be done early in the controlled shutdown (if you don't need to
write too much log data on shutdown).
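
To illustrate the kernel-tuning part, an embedded init program might
shrink the dirty-writeback window like this (the values are arbitrary
examples, not recommendations):

    #include <stdio.h>

    /* Write one value into a /proc/sys tunable. */
    static int write_sysctl(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
    }

    int main(void)
    {
        /* Start writeback almost immediately and expire dirty pages fast,
           so very little data ever sits unwritten in RAM. */
        write_sysctl("/proc/sys/vm/dirty_writeback_centisecs", "100");
        write_sysctl("/proc/sys/vm/dirty_expire_centisecs", "200");
        write_sysctl("/proc/sys/vm/dirty_background_ratio", "1");
        return 0;
    }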



Re: Linux embedded: how to avoid corruption on power off
On Friday 16 June 2017 19:31, David Brown wrote:


You never know when the SD card is done with writing, which can include
internal re-organisation for wear levelling.
--  
Reinhardt


Re: Linux embedded: how to avoid corruption on power off
On 16/06/2017 13:31, David Brown wrote:

What?  I chose sqlite because they say corruption of a database is
a very rare event.


https://www.sqlite.org/howtocorrupt.html

Re: Linux embedded: how to avoid corruption on power off


Sections 3.1 and 4 of that page.

Bye Jack

Re: Linux embedded: how to avoid corruption on power off
On 19/06/2017 13:27, Jack wrote:


Ok, but those problems (memories that lie about whether a write has
really finished) are common to all other file types, not just sqlite.



Re: Linux embedded: how to avoid corruption on power off
On 19/06/17 12:47, pozz wrote:

Sorry for causing alarm - I was mixing this up with something else.

You are, of course, still reliant on the OS and the hardware acting
together to get the right behaviour here.  The OS will tell the sqlite
library when it believes the data has been written to the disk, but a
memory card may still mess around with writes, garbage collection, etc.,
after that.


Re: Linux embedded: how to avoid corruption on power off

It depends.  Sqlite is designed to be embedded in a host program,
so it is vulnerable to developer errors in ways that server-based DBMSs
are not.

But that's only part of it.  There also are hardware - primarily
storage - reliability issues to consider.


As part of the embedded focus, by default Sqlite does not enable
"WAL", which stands for "write-ahead logging".  More generally, this
capability is referred to as "journaling".

You must enable WAL for the database if you want maximum safety.  But
even with WAL enabled, the database can be corrupted by hardware
hiccups during a write.

The point I've been debating with Don Y in another thread is that, when
using WAL, a database corrupted during normal operation SHOULD ALWAYS
BE RECOVERABLE.

The problem is that tuning WAL use is a balancing act: the safer you
want to be, the more extra file space you need for the log.
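
In sqlite terms, turning WAL on and bounding the log's growth is a
handful of pragmas (the numeric limits here are arbitrary examples):

    #include <stddef.h>
    #include <sqlite3.h>

    /* Switch a database to write-ahead logging and cap the log's growth. */
    int enable_wal(sqlite3 *db)
    {
        /* FULL synchronous makes sqlite fsync on every commit,
           trading write speed for safety. */
        sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
        sqlite3_exec(db, "PRAGMA synchronous=FULL;", NULL, NULL, NULL);

        /* Checkpoint after ~1000 pages and truncate the WAL back toward
           1 MB when possible - the space/safety balancing act above. */
        sqlite3_exec(db, "PRAGMA wal_autocheckpoint=1000;", NULL, NULL, NULL);
        sqlite3_exec(db, "PRAGMA journal_size_limit=1048576;", NULL, NULL, NULL);
        return SQLITE_OK;
    }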


SSDs mess with the DBMS's ability to recover from a write failure: the
fact that the update/erase "block" size != the write "page" size is a
serious problem.  It's a failure mode that [essentially] can't happen
with other types of storage systems, and something that DBMSs were never
designed to handle.

If the SSD erase block (not "page") size is not ridiculously large,
you can configure the DBMS's "file block" size to match it.  It will
require additional RAM, may slow down I/O (depending on the SSD), and it
will require a lot of additional file space for the WAL, but it will
greatly improve the odds that your database won't get corrupted.
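
With sqlite specifically, that match is one pragma; it only takes effect
on a fresh database or after a VACUUM, and the 4096 here is an assumption
to be replaced by the medium's real erase-block size:

    #include <stddef.h>
    #include <sqlite3.h>

    /* Rebuild the database file with a page size matching the flash. */
    int match_page_size(sqlite3 *db)
    {
        sqlite3_exec(db, "PRAGMA page_size=4096;", NULL, NULL, NULL);
        return sqlite3_exec(db, "VACUUM;", NULL, NULL, NULL);
    }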

If you can't match block sizes, then you have a problem.  The best
solution is multiple SSDs in a self-repairing RAID.  If that isn't
possible, then the only viable solution is frequent offline backups of
the database.

George  

Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 3:10 AM, pozz wrote:

The easiest -- and most reliable -- approach is to have an "early warning"
of an impending power loss (regardless of whether that is a genuine power
failure, the power switch being flipped off *or* the power cord being
unplugged).  The notice must allow the system to operate *reliably* long
enough to preserve whatever it *decides* to preserve (which might not be
"everything", depending on the magnitude of the warning interval).

Consider that most/many devices are normally "shut down" in an orderly
fashion.  So, the application can go about saving EVERYTHING that it wants
to save before removing power from itself (most power switches, nowadays,
are *soft* power switches -- merely REQUESTS for power to be removed).

Buffered log files -- and other objects which aren't *essential* to be
flushed to permanent store -- can be written out in an orderly fashion,
etc.  (remember, each FILE* also typically represents a buffer in the
application's space)

OTOH, if you are powering down as a result of an alert of impending
power FAILURE, you have a fixed amount of time ("energy") to get work
done.  The system designer knows how long the system can be kept operational
*and* the conditions that have to be met to do so.  E.g., the software
may have to immediately shut down peripherals that will sap the reserve
power from the power supply's filters, extend the up-time from any on-board
battery backup, etc.

The software then has to decide which subset of everything that it would LIKE
to preserve absolutely MUST be preserved.

It's important not to be caught in the middle of a write to (most) permanent
memory systems.  Most technologies have the potential to corrupt large
portions of the store if a "write cycle" is not strictly implemented.
It's not like just *a* datum will have a bogus value; rather, an entire
page of data may be corrupted, etc.

So, the last part of your shutdown routine is spinning in a tight loop
(or executing HALT), deliberately NOT trying to do anything lest some
"write" be corrupted at the instant the power falls below the
level needed for reliable operation.

I've designed products where the power switch had a second set of contacts
that allowed the software to sense that the switch had been flipped (vs.
power failing) so that I could alter the shutdown routine accordingly (the
product being capable of extended operation on battery, as it was used in
a high availability application).


That's not true.  If the DBMS software is in the process of writing to
persistent store *as* the power falls into the "not guaranteed to work"
realm, it can corrupt OTHER memory beyond that related to your current
transaction -- memory for transactions that have already been *committed*.

Given a record (in the DBMS) that conceptually looks like:
     char name[20];
     ...
     char address[40];
     ...
     time_t birthdate;
an access to "address" that is in progress when the power falls into the
realm that isn't guaranteed to yield reliable operation can corrupt
ANY of these stored values.  Similarly, an access to some datum not
shown, above, can corrupt any of *these*!  You need to understand where
each datum resides if you want to risk an "interrupted write".

When you are alerted of an impending power failure, you want to finish
ALL writes in the "window of reliable operation" that remains.  If
that means prematurely terminating a commit, then that's what you do,
else you risk corrupting previously completed commits!


You can flush the buffer caches.  Many DBMS's do exactly this to keep the
persistent store up to date.

I use different tablespaces for different resources based on the technology
that I want "backing" those stores.  I.e., do I want this part of the
database to reside on disk, in DRAM, BBDRAM, BBSRAM or in FLASH?  Note that
I also have to take durability into consideration, not just persistence
(I don't want to be hammering on FLASH with frequent updates lest I "wear
a hole through it" from overuse :> ).

[BBSRAM is the safest bet as I can ensure the time for individual write
cycles, giving me a finer "safe" granularity; with BBDRAM, it's possible
to "blow a row" if a write is interrupted (ditto FLASH)]


Of course!
- mechanisms have to be brought to safe states in an orderly fashion
- some indication of whether the shutdown was completed successfully
   (or not) made so the bootstrap will know how exhaustively it should
   test the preserved state
- external protocol connections have to be treated as local resources
   potentially left in an inconsistent state (they think everything is
   OK even though your device is now OFF)
- is there a risk of those connections being hijacked by an adversary
   killing YOUR power with the knowledge that you DON'T notify those
   connections?
- informing your user that you are attempting a recovery (on next powerup)
   so he is aware of the potential for data loss (and, if the recovery
   process is time consuming, he doesn't fret over the long delay)
- handling "double faults" (what happens if you are in the middle of
   recovering and power fails again?  This can be a common experience,
   esp. for times when power is coming up hesitantly)
- etc.

It's relatively easy to design a product that works "steady state".
Most of the blemishes appear getting into -- or out of -- that
steady-state operating condition!

Re: Linux embedded: how to avoid corruption on power off

Hi Don,

I know *you* know this (because we've been over it before) ... I'm
just providing information for the group.


On Fri, 16 Jun 2017 11:26:39 -0700, Don Y wrote:

That's correct as far as it goes, but from the DBMS point of view
transient data in RAM is expected to be lost in a failure ... the
objective of the DBMS is to protect data on stable storage.  

If you are using a journal (WAL) - and you need your head examined if
you aren't - transactions already committed are guaranteed to be
recoverable unless the journal copy and the stable copy BOTH are
corrupted.  

Normally, the only way this can happen is if the OS/hardware lies to
the DBMS about whether data really has been written to the media.  
If the media is such that a failure during a write can corrupt more
than the one block actually being written, then the media has to be
protected against failures during writes.


The journal copy and the stable copy of any given record are never
written simultaneously, so even if the journal and the stable file
both are being written to at the time of failure, the data involved in
the respective writes will be different.



No.  Records may be written or updated in pieces if the data involved
is not contiguous, but the journal process guarantees that, seen from
on high, the update occurs atomically.

The *entire* record to be updated will first be copied unmodified into
the journal.  Then the journal will record the changes to be made to
the record.  After journaling is complete, the changes will be made to
the record in the stable file.

At every point in the update process, the DBMS has a consistent
version of the *entire* record available to be recovered.  The
consistent version may not be the latest from a client's perspective,
but it won't be a corrupt, partially updated version.
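
Boiled down to file operations, the journaling sequence described above
looks roughly like this sketch (the file names, the fixed-size record
and the missing recovery path are all illustrative simplifications):

    #include <fcntl.h>
    #include <unistd.h>

    #define RECSZ 64

    /* Update one fixed-size record so that a crash at any point leaves
       either the old or the new version recoverable, never a mix. */
    int journaled_update(const char *dbfile, off_t off, const void *newrec)
    {
        char old[RECSZ];
        int db = open(dbfile, O_RDWR);
        int jn = open("journal.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (db < 0 || jn < 0)
            return -1;

        /* 1. copy the unmodified record into the journal, and make sure
              it is on the medium before touching the stable file */
        pread(db, old, RECSZ, off);
        write(jn, &off, sizeof off);
        write(jn, old, RECSZ);
        fsync(jn);
        close(jn);

        /* 2. only now modify the stable file */
        pwrite(db, newrec, RECSZ, off);
        fsync(db);
        close(db);

        /* 3. the update is durable; discard the journal.  On startup,
              if journal.tmp exists, copy the saved record back instead. */
        unlink("journal.tmp");
        return 0;
    }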


Ideally, journaling will be done at the file block level, so as to
protect data that is co-resident in the same block(s) as the target
record.  But not all DBMSs do this, because a block journal can become
very large, very quickly.  Sqlite (and MySQL and others) journal at a
meta level, recording just the original data and the changes to be made
to it - which isn't quite as safe as block journaling, but is acceptable
for most purposes.


Note that OS/filesystem journaling is not a replacement for DBMS
journaling.  The addition of filesystem journaling has no effect on
DBMS reliability, but it does affect DBMS I/O performance.

YMMV,
George

Re: Linux embedded: how to avoid corruption on power off
On 6/16/2017 10:33 PM, George Neuner wrote:

That's exactly the problem with these types of media.  If you violate
the parameters of the write/erase cycle, all bets are off -- especially
if the power to the device may be at a dubious level, etc.

Devices with built-in controllers (e.g., SD cards, SSD's, etc.) do lots
of behind the scenes juggling when you do a "simple" write to them so
you don't know when the ramifications of your "write" are done.  I.e.,
the nonvolatile components *in* the device may be accessed differently
than your mental model of the device would expect.

Without knowing where each datum resides on the individual memory
components at any given time, you can't predict what can be corrupted
by a botched write/erase.  Data that was committed three weeks ago
(and not "touched" in the intervening time) can be clobbered -- how
will you know?


You're missing the point.  Imagine you are writing to <something> -- but
I reach in and twiddle the electrical signals that control that write,
and do so in a manner that isn't easily predicted.  You (the software and
hardware) THINK you are doing one thing but, in fact, something else
entirely is happening.  The hardware doesn't think anything "wrong" is
happening (its notion of right and wrong is suspect!)

Remember that a FLASH write (parallels exist for other technologies) is
actually an *erase* operation followed by a write.  And you're actually
dealing with "pages"/blocks of data, not individual bytes.  The erase
may take a fraction of a millisecond ("milli", not "micro"), followed by
a slightly shorter time to actually write the new contents back into the
cells.

[The times are often doubled for MLC devices!]

During this "window of vulnerability", if the power supplies (or signal
voltages) go out of spec, the device can misbehave in unpredictable ways.

[This assumes the CPU itself isn't ALSO misbehaving as a result of the same
issues!]

This can manifest as:
- the wrong value getting written
- the right value getting written to the wrong location
- the wrong value getting written to the wrong location
- the entire page being partially erased
- some other page being erased
etc.

Change the manufacturer -- or part number -- and the behavior can change
as well.

What's even worse is that this sort of failure can have FUTURE consequences
for reliability.  Given that power was failing at the time of the event,
the system probably doesn't know (nor can it easily remember) what it
was in the process of doing when the power failed.  So, it doesn't know
that a certain page may have only been partially programmed -- even if
it has the correct "values", there might not be the required amount of
charge stored in each cell (you can't count electrons from the pin interface!)

So, data that *looks* good may actually be more susceptible to things like
read (or write) disturbances LATER.  You operated the device outside its
specified operating conditions so you can't rely on the data retention
that the manufacturer implies you'd have!

If your "memory device" has any "smarts" in it (SSD, SD card, etc.)
then it can perform many of these operations for *each* operation
that you THINK it is performing (e.g., as it shuffles data around
for load leveling and to accommodate failing pages/blocks).

*You* think you're updating the "name" portion of a record but
the actual memory device decides to stuff some bogus value in the
"birthdate" portion -- perhaps of a different record!


And, a week later, the data that I had stored in "birthdate" is no longer
present in the journal, as it has previously been committed to the store.
So, when it gets corrupted by an errant "name update", you'll have no
record of what it *should* have been.

The DBMS counts on the store having "integrity" -- so, all the DBMS has
to do is get the correct data into it THE FIRST TIME and it expects it
to remain intact thereafter.  It *surely* doesn't expect a write of
one value to one record to alter some other value in some other
(unrelated) record!

[Recall this was one of the assets I considered when opting to use a
DBMS in my design instead of just a "memory device"; it lets me perform
checks on the data going *in* so I don't have to check the data coming
*out* (the output is known to be "valid" -- unless the medium has
failed -- which is exactly what power sequencing can cause on many
(most?) nonvolatile storage media).]


