Re: Have ext3 on SanDisk CF but can't disable write-back caching as kernel instructs

Kevin Nathan · 2003-07-11T01:27:19+00:00

This post was probably refused because it *looks* like you're asking for free technical support on a commercial product -- they probably felt that you should hire a consultant. Just a guess.

How about feeding it a stream of known data and then just shutting down the power at various points? There's nothing like a pile of empirical data for answering questions . . . :-)

-- Kevin Nathan (Montana, USA)
Open standards. Open source. Open minds.
The command line is the front line.
Linux 2.4.20-4GB-athlon 7:20pm up 2 days 10:00, 7 users, load average: 0.00, 0.00, 0.03

Wolfgang Mües 23 years ago

A NAND Flash Sector is 16 KBytes and is erased in 2 ms.
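
Taking these figures at face value (16 KB erase unit, 2 ms per erase), here is a back-of-the-envelope sketch of the minimum time spent just erasing flash for a given amount of data. It assumes blocks are erased strictly one at a time and ignores page-programming time and controller overhead, so a real CF card will be slower; the function name is hypothetical.

```python
# Erase-time floor for NAND flash using the figures quoted above:
# 16 KiB erase unit, 2 ms per erase. Assumes one block is erased at a time
# and ignores programming time and controller overhead.
import math

ERASE_BLOCK_BYTES = 16 * 1024   # 16 KiB erase unit
ERASE_TIME_S = 0.002            # 2 ms per block erase

def erase_time_seconds(payload_bytes: int) -> float:
    """Minimum time spent erasing enough blocks to hold payload_bytes."""
    blocks = math.ceil(payload_bytes / ERASE_BLOCK_BYTES)
    return blocks * ERASE_TIME_S

if __name__ == "__main__":
    for mib in (1, 16, 256):
        t = erase_time_seconds(mib * 1024 * 1024)
        print(f"{mib:4d} MiB -> {t:8.3f} s of erase time alone")
```

For example, 1 MiB needs 64 erases, i.e. about 0.128 s, and 256 MiB needs 16384 erases, about 32.8 s, before a single byte has been programmed.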

My advice to the original poster is: use CF cards from SanDisk (the "industrial" grade, I think), and add some sort of backup accumulator (rechargeable battery) and a power-fail line so you can sync the CF before shutdown. And don't use a journaling filesystem!

JFFS2 may be a better filesystem for flash, but it is mainly intended for NOR flash.

best regards Wolfgang Mües

(who uses JFFS2 on Strata Flash)

Trevor Barton 23 years ago

Are you convinced that powering off an ext3 filesystem is completely robust? It isn't. All the journalling in ext3 does is reduce the chance, compared with plain ext2, that the filesystem will lose data on power-off. There is still a finite, albeit smaller, probability that you will either lose some data or corrupt the underlying filesystem if it isn't unmounted cleanly.

The only way to eliminate that risk is to mount all your filesystems read-only. This of course imposes some restrictions, and you probably need to introduce a ramdisk for the things the OS needs to write, which is of course volatile.

Part of the problem with disk accesses is that the structure of the information on disk is quite critical, because you have to modify quite a lot of information to successfully write a single byte of data into a file. The system may have to record that the file structure has changed (because it has started using a new sector, for example), that the access time has changed, that the modification time has changed, and so on. Corrupt one of those writes and you risk corrupting the entire disk structure.

If it were me that had to robustly write stuff to flash, I'd be looking at a non-filesystem method. For example, if I knew my data was bounded and smaller than the flash, I'd just start writing data in blocks at the start of flash, and mark the last block to say it was the end block. That way it'll all be OK unless the last block isn't written. Or, on power-up you could clear the flash to a known non-zero value, and then write each block such that it has a known checksum that differs from that of unwritten blocks. That way, you know the data up to the first block with the wrong checksum is OK; all the rest is either unwritten or corrupt. I'm sure you could come up with similar schemes for data pools that always grow, too, perhaps wrapping the buffer and putting a sequence number in each block, so that after a reboot you could tell which was the earliest bit of data and where the end of good data was.
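
A minimal Python sketch of the checksum scheme above, combined with the wrapping, sequence-numbered variant mentioned at the end. The 512-byte block size, the header layout (sequence number, payload length, CRC32) and the 0xFF "erased" fill are assumptions for illustration; real code would operate on the raw device rather than a `bytearray`, and would also have to handle sequence-number wraparound, which this sketch ignores.

```python
# Checksummed, sequence-numbered record log on a raw region (sketch).
# Block size, header layout and the 0xFF "erased" fill are assumptions.
import struct
import zlib

BLOCK_SIZE = 512
ERASED = 0xFF
HEADER = struct.Struct("<IHI")          # sequence number, payload length, CRC32
MAX_PAYLOAD = BLOCK_SIZE - HEADER.size

def erase(num_blocks: int) -> bytearray:
    """Fill the whole region with the known 'unwritten' value."""
    return bytearray([ERASED]) * (num_blocks * BLOCK_SIZE)

def encode_block(seq: int, payload: bytes) -> bytes:
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one block")
    crc = zlib.crc32(struct.pack("<IH", seq, len(payload)) + payload)
    block = HEADER.pack(seq, len(payload), crc) + payload
    return block.ljust(BLOCK_SIZE, bytes([ERASED]))

def write_record(region: bytearray, seq: int, payload: bytes) -> None:
    """Write record `seq` into slot seq % num_blocks (wrapping buffer)."""
    num_blocks = len(region) // BLOCK_SIZE
    offset = (seq % num_blocks) * BLOCK_SIZE
    region[offset:offset + BLOCK_SIZE] = encode_block(seq, payload)

def decode_block(block: bytes):
    """Return (seq, payload) if the block's CRC checks out, else None."""
    seq, length, crc = HEADER.unpack_from(block)
    if length > MAX_PAYLOAD:            # erased (all 0xFF) blocks fail here
        return None
    payload = bytes(block[HEADER.size:HEADER.size + length])
    if zlib.crc32(struct.pack("<IH", seq, length) + payload) != crc:
        return None                     # torn or corrupted write
    return seq, payload

def recover(region: bytes) -> list:
    """Return payloads of the newest unbroken run of records, oldest first."""
    found = {}
    for off in range(0, len(region), BLOCK_SIZE):
        rec = decode_block(region[off:off + BLOCK_SIZE])
        if rec is not None:
            found[rec[0]] = rec[1]
    if not found:
        return []
    # Walk back from the highest sequence number until a gap appears.
    newest = max(found)
    seq = newest
    while seq - 1 in found:
        seq -= 1
    return [found[s] for s in range(seq, newest + 1)]
```

Because `recover()` walks back from the highest valid sequence number until it hits a gap, a torn write of the newest block just loses that one block instead of poisoning the rest of the log.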

Of course, it all depends on the relative cost to you of reformatting a corrupt flash filesystem from time to time after pulling your unit out of the plane, versus the extra cost of implementing a more robust storage mechanism. *No* filesystem is incorruptible, but ext3 is relatively good, and you might only have to pull the system out once in every 1000 power-offs, for example. This may be acceptable, but if it isn't, then you need to spend the time (and the money) on some other solution.

{ I know bugger all about flash, per se, but I wouldn't be surprised if there were mechanisms by which unrelated blocks could be written to in error if the power went down during a write. Things are all a bit unstable while the power is being lost, although I guess with a good design that could be avoided at the hardware level. }

Trev

Paul Taylor 23 years ago

When you say reliable, is this mostly a concern because of the effects of vibration?

I know that ruggedized hard drives are available for aircraft applications, where the drive sits on vibration-proof mounts that extend its life under vibration. However, they are big and expensive.

Have you checked out Microdrives? I would imagine they would perform much better than a standard hard drive with regard to vibration. If you have to use a hard drive, it may be worth checking its vibration specs.

Regards,

Paul..

Dan Harkless 23 years ago

No. Because of the "sync and then lock out all tasks that write to disk when we go onto capacitor power" behavior I described before, ext3 is just there as an added protective measure against corruption (and to prevent time-consuming fscks at boot time).
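
A minimal sketch of the "sync, then lock out writers" behavior, assuming all of the application's own writes go through a single gate object. The class name `WriteGate` and the `on_power_fail()` hook are hypothetical; how the low-voltage signal actually reaches software is not covered here, and other processes (syslogd, for instance) would have to be stopped by other means.

```python
# "Sync, then lock out all writers" on power fail (sketch).
# WriteGate and on_power_fail() are hypothetical names; delivery of the
# low-voltage signal to this code is outside the scope of the sketch.
import os
import threading

class WriteGate:
    def __init__(self):
        # Serializes writes through the gate and lets on_power_fail()
        # wait for an in-flight write to finish.
        self._lock = threading.Lock()
        self._closed = False

    def write(self, path: str, data: bytes) -> bool:
        """Append data to path unless the power-fail lockout is active."""
        with self._lock:
            if self._closed:
                return False            # refuse: running on capacitor power
            with open(path, "ab") as f:
                f.write(data)
            return True

    def on_power_fail(self) -> None:
        """Block new writes, wait for in-flight ones, then flush everything."""
        with self._lock:
            self._closed = True
        os.sync()                       # push all dirty kernel buffers out
```

Closing the gate before calling `os.sync()` is the important ordering: otherwise a write could sneak in after the sync and leave new dirty buffers behind.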

Not possible for our application. We need non-volatile storage for our data.

The problem is that we really need a filesystem. We have dozens of discrete data files, saved-state files, application logs, syslog logs, etc.

The _hope_ is that, if we do the sync-and-then-prevent-further-writes-until-power-down thing, the filesystem won't get corrupted unless there's a bug in the ext3 drivers.

-- Dan Harkless snipped-for-privacy@harkless.org

Trevor Barton 23 years ago

Yes, as I said, it's a trade-off. I thought there was some uncertainty about whether the capacitor power reserve would sometimes be too short, or was that some other poster's speculation?

Trev

Trevor Barton Isotek Electronics Ltd, 9 Clayton Wood Bank, Leeds, LS16 6QZ, UK. Tel: +44 (113) 275 1339, Fax +44 (113) 224 9827 Remove X from tmb@Xisotek.co.uk before replying. Views expressed are my own and not necessarily those of Isotek Electronics Ltd.

Steve Watt 23 years ago

Yikes, a certification nightmare from the start...

As has been said many times throughout the thread, this is probably not a good way to do things, because of how the ext3 journal is laid out.

You've said, later in the thread, that you have a power-off detector and a cap to provide some amount of run time after power failure.

I'd solve this with two separate CF devices: One to boot & root from, that gets mounted read-only. The other one holds volatile data. Create a ramdisk the size of the volatile data CF, and use that exclusively until you get the power-off signal. Then dump the whole thing to the volatile data CF, while holding off writes as you had described elsewhere. You might even unmount the ramdisk (if possible) before dumping it to the flash so the image is clean.
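
A rough sketch of the dump step in this scheme, assuming the ramdisk and the data CF are visible as device nodes and the ramdisk is already unmounted (or at least synced). `dump_image()` is an illustrative name, not an existing tool; `dd` would do the same job from a shell.

```python
# Copy a ramdisk image onto the data CF in chunks, then fsync (sketch).
# The destination must already exist (a block device, in practice).
import os

CHUNK = 64 * 1024  # copy granularity; an arbitrary choice for the sketch

def dump_image(src_path: str, dst_path: str) -> int:
    """Copy src to dst in fixed-size chunks, then fsync; returns bytes copied."""
    copied = 0
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY)
        try:
            while True:
                chunk = os.read(src_fd, CHUNK)
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:                 # handle short writes
                    n = os.write(dst_fd, view)
                    view = view[n:]
                copied += len(chunk)
            os.fsync(dst_fd)                # don't report success before this
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    return copied
```

On a target this might be called as `dump_image("/dev/ram0", "/dev/hdc1")`, with both device names being placeholders. Note that `os.fsync()` only flushes the kernel's buffers to the device; whether the card's own write-back cache gets flushed is a separate question, and is the subject of this thread.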

You need to be sure you have enough time on the backup cap for that write to complete, which shouldn't be all that much longer than the worst case for a whole boatload of dirty cache blocks.

Depending on how large your budget is (we *are* talking aircraft system prices here), you could also go to some kind of battery-backed RAM for the read/write filesystem. It's not that much more expensive than CF, and you get more flexibility.

Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9" Internet: steve @ Watt.COM Whois: SW32 Free time? There's no such thing. It just comes in varying prices...

Dan Harkless 23 years ago

Yes, there is currently uncertainty about that. The hope is that after doing some timing tests we'll find that under normal conditions we always have enough capacitor time to do the sync(). There's still a worry about _unusual_ conditions resulting in an abnormally large amount of data that needs to be sync()ed, though, which is why I'm reluctant to give up on a journalled filesystem.

Multiple people have said that ext3 is unwise because it'll cause more writes to the Flash, but again, unless it's an order of magnitude more, I don't see how that would be an issue.

-- Dan Harkless snipped-for-privacy@harkless.org

Dan Harkless 23 years ago

Heh...

And as I've said a couple of times, unless the ext3 journal causes an order of magnitude more writes to be done, or unless SanDisk's wear-leveling algorithm is bogus, I don't see how it's significant for Flash lifetime that we'll be using ext3 rather than ext2.

No, we may accumulate many megabytes of data during a given uptime. I'm sure we wouldn't always have time to dump that to Flash during the 200 ms to 1 s we'll have between the low-voltage signal and capacitor power drainage.

Even if we're talking on the order of 256 MB worth?
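
The question can be answered with plain arithmetic: the sustained write rate needed to flush a given amount of data inside the 200 ms to 1 s window quoted above. The 2 MB/s card speed used in the second loop is hypothetical, chosen only for illustration.

```python
# Plain arithmetic for a hold-up window of 0.2 s to 1 s.
def required_rate_mb_per_s(data_mb: float, window_s: float) -> float:
    """Sustained write rate needed to flush data_mb within window_s."""
    return data_mb / window_s

def flushable_mb(rate_mb_per_s: float, window_s: float) -> float:
    """Amount of data that fits in the window at a given sustained rate."""
    return rate_mb_per_s * window_s

for window in (0.2, 1.0):
    print(f"256 MB in {window} s needs {required_rate_mb_per_s(256, window):.0f} MB/s")

# Hypothetical card speed of 2 MB/s, for illustration only.
for window in (0.2, 1.0):
    print(f"at 2 MB/s, {window} s flushes {flushable_mb(2, window):.1f} MB")
```

Even the best case needs 256 MB/s (1280 MB/s for the 200 ms case), far beyond what a CF card of that era could sustain.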

Unfortunately it's a moot point, because the hardware has already been designed, initial units built, and has gone through initial qual-testing. Hardware was a done deal before I joined the company -- now I and the other software engineers need to determine how to make it work and be robust and reliable.

-- Dan Harkless snipped-for-privacy@harkless.org

John Tetreault 23 years ago

Could you incorporate some sort of small battery into the unit to provide additional power beyond what a cap will provide? I mean... if you could even give yourself 4 seconds of power instead of 1, you could probably avoid any problem.

John Tetreault 23 years ago

Hell... Send it back to the hardware engineers and tell them to design something that actually works... And "gone through initial qual-testing"... I'd say somebody in QC should be fired for signing off on something with that major a design flaw. (Truth is, that is what you have: a design flaw that some bean counter now expects you to fix with software.) It's the old GIGO principle... you can't take a piece of hardware that is unreliable and unstable by design and make it reliable and robust with software. It simply doesn't work. What kind of idiot hardware engineer would design a piece of hardware that only gives you 1 second or less of write time in the event of a power failure, especially in a system that is guaranteed to have power failures on a regular basis? Anyone with half a brain would have thought... gee, maybe some sort of battery backup would be a good idea here for when we lose power... Obviously they knew of the problem, or they wouldn't have given you the 1 second of cap power... but that also means they could have, and should have, done more.

Michael Schnell 23 years ago

It can't be fixed with software, as it partly has nothing to do with software. IMHO the CF manufacturer does not guarantee that the card can cope with the power being turned off a second after the last write access. First they need to find out _this_ specification. If it can't be determined, you just can't use a CF. Then someone needs to define the time the software needs to clean up when a power fail is detected. The sum of the two is the time power needs to stay intact after the power-fail signal.
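
This budget can be written down as a simple check: required hold-up equals the software cleanup time plus the time the card needs after its last write. A sketch, with `HoldUpBudget` as a hypothetical helper; every number fed to it has to come from measurement on the target or from the card vendor, since none is given here.

```python
# Hold-up budget check: required = cleanup time + card settle time (sketch).
# All inputs are placeholders; cf_settle_s must come from the CF vendor.
from dataclasses import dataclass

@dataclass
class HoldUpBudget:
    cleanup_s: float      # sync/flush time measured on the target system
    cf_settle_s: float    # time the card needs after its last write access
    available_s: float    # guaranteed hold-up after the power-fail signal

    def required_s(self) -> float:
        return self.cleanup_s + self.cf_settle_s

    def margin_s(self) -> float:
        return self.available_s - self.required_s()

    def ok(self) -> bool:
        return self.margin_s() >= 0.0
```

A negative `margin_s()` means the design cannot meet the requirement no matter what the software does, which is exactly the point being made: the card's part of the budget has to be known first.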

-Michael

Edward Lee epl 23 years ago

Probably one that got fired.

Not just the CF write time, but also the filesystem sync/flush time. I would allow at least one minute. Furthermore, the CPU needs to be running on backup power as well.

CFs are designed for devices with auto power-off, i.e. cameras. Of course, if you yank out the battery while taking a picture, there is no guarantee of the result. Using CFs for mission-critical data storage, as the OP is attempting (or is being forced) to do, will never fly.

Sales for IDE-CF flash drive 23 years ago

...

Just to have fun with it, we took a brand-new 4 MB CF and wrote random data to it continuously. It crashed (drive not ready) after approx. 25,000 write cycles in three days. We will run a few more tests to confirm it and post the results later.
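
A sketch of the kind of endurance loop described above, not the poster's actual test: fill the card with random data, flush, read it back, and count verified passes until an I/O error or a mismatch. The function name and parameters are hypothetical, and running this against a real card will wear it out.

```python
# Write-endurance loop (sketch): count verified full-card passes until an
# I/O error or a read-back mismatch. Target must already exist (a device).
import os

def endurance_test(path: str, size: int, max_passes: int, chunk: int = 4096) -> int:
    """Return the number of completed, verified passes before failure or max_passes."""
    passes = 0
    try:
        while passes < max_passes:
            data = os.urandom(size)
            with open(path, "r+b") as f:
                for off in range(0, size, chunk):
                    f.write(data[off:off + chunk])
                f.flush()
                os.fsync(f.fileno())
                f.seek(0)
                # NOTE: this read may be served from the page cache, not the card.
                if f.read(size) != data:
                    break                   # silent corruption counts as failure
            passes += 1
    except OSError:
        pass                                # e.g. EIO once the card stops responding
    return passes
```

Because the read-back after `fsync()` may come from the page cache, a stricter test would drop caches or open the device with `O_DIRECT` before verifying.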

Bernhard Roessmann 22 years ago

Maybe a reformat helps?

There are some differences between CF cards from different vendors, so please post vendor and exact part information too.

Regards,

Bernhard Roessmann Don't Fear The Penguins!

Sales for IDE-CF flash drive 22 years ago

Nope, can't access it anymore (drive not ready).

Controller: Feiya 1106ET-C314
Flash Memory: SEC KOREA 943 KM29W32000ATS

Wolfgang Denk 22 years ago

Probably not.

Try to get information from your vendor on _when_ the CF performs its wear levelling. Chances are that it does so only when power cycling. Running the CF continuously == ruining it.

Wolfgang Denk

Software Engineering: Embedded and Realtime Systems, Embedded Linux Phone: (+49)-8142-4596-87 Fax: (+49)-8142-4596-88 Web: www.denx.de Artificial Intelligence is no match for natural stupidity.

fishbowl 22 years ago

....literally, according to the original message.

How bad will it be for the plane when this thing fails? I know it can't be a mission-critical piece, or else the design wouldn't call for cheap nonvolatile memory, right?

I live in a fantasy world where people who design electronics for aerospace don't use consumer components, or when they do, they get full disclosure of specs, real guarantees, that sort of thing.

This isn't so much a linux issue, as it is an issue with CF in general. In practice, the OP's box will probably never have a problem, but depending on how critical a failure would be, the knowledge that CF devices will go bad should have been a showstopper a long time before now. I don't envy his position. I don't think I could work very long in an environment where a poor decision is forced to stand, simply because "the decision was made." -- If this device is for in-flight entertainment, then yeah, that's ok. On the other hand, nav, commo, or flight control?
