Ext3 Corruption CF

I'm working on a small embedded Linux device and am having fs corruption issues using ext3. Here is the scenario.

The device has a 512MB CF (SanDisk) and an on-board flash. Kernels sit in the on-board flash and are booted via RedBoot. They are all custom 2.4.18 kernels with the ext3 options on and DMA off.

The CF is partitioned with fdisk into two equal slices, hda1 and hda2. They are then formatted ext3 with standard options, i.e. just the -j flag to mke2fs.

On the first run the system starts successfully! When we reboot or power cycle and start again, the kernel hangs at the partition check. To recover the system we boot to a maintenance kernel and run fsck. Note that we must run fsck with the -f flag, as fsck otherwise thinks the fs is OK. A number of orphan inodes are repaired, and upon reboot the system starts successfully.
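As background on why -f is needed here: without it, e2fsck normally skips the full check when the superblock says the file system is clean. The following is a minimal sketch, not part of the original post, of how those superblock flags could be inspected. The field offsets and bit values are taken from the standard ext2/ext3 on-disk layout; the function names are made up for illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # primary superblock starts 1 KiB into the partition
EXT2_MAGIC = 0xEF53           # magic number shared by ext2 and ext3
STATE_VALID = 0x0001          # s_state bit: cleanly unmounted
STATE_ERROR = 0x0002          # s_state bit: errors were detected
COMPAT_HAS_JOURNAL = 0x0004   # s_feature_compat bit: file system has a journal
INCOMPAT_RECOVER = 0x0004     # s_feature_incompat bit: journal needs replay


def read_ext3_flags(image: bytes) -> dict:
    """Decode a few superblock fields from the first 2 KiB of a partition."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 104:
        raise ValueError("image too short to contain a superblock")
    # s_magic at byte 56, s_state at byte 58 (little-endian 16-bit values)
    magic, state = struct.unpack_from("<HH", sb, 56)
    # s_feature_compat at byte 92, s_feature_incompat at byte 96 (32-bit values)
    compat, incompat = struct.unpack_from("<II", sb, 92)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    return {
        "marked_clean": bool(state & STATE_VALID) and not (state & STATE_ERROR),
        "has_journal": bool(compat & COMPAT_HAS_JOURNAL),
        "needs_recovery": bool(incompat & INCOMPAT_RECOVER),
    }


def read_ext3_flags_from_device(path: str) -> dict:
    """Hypothetical helper: read the flags straight from a block device."""
    with open(path, "rb") as dev:
        return read_ext3_flags(dev.read(2048))
```

Calling read_ext3_flags_from_device("/dev/hda1") as root (device name assumed from the post above) would show whether the partition is marked clean while still having a journal that needs recovery. This only reads the superblock; it does not replace fsck.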

I have attempted to circumvent the partition check with hda=noprobe and passing the drive geometry at boot. This passes the partition check but I get a panic when the kernel is unable to mount the root fs.

This has been a very frustrating issue. If anyone could provide some insight it would be greatly appreciated.

Matthew C. Higgins

When you say "reboot or power cycle", are you shutting the system down properly or just turning off the power? Ext3 is more robust than ext2 against losing power (more precisely, shutdown without a disk sync), but it is not immune to problems.

Trev

--
Trevor Barton
Isotek Electronics Ltd, 9 Clayton Wood Bank, Leeds, LS16 6QZ, UK.
Tel: +44 (113) 275 1339, Fax +44 (113) 224 9827
Remove X from tmb@Xisotek.co.uk before replying.
Views expressed are my own and not necessarily those of Isotek Electronics Ltd.
Trevor Barton

Don't use ext3 on compactflash. That's not the answer you want to hear, but it's the right one.

--
Alex Pavloff - remove BLAH to email
Software Engineer, ESA Technology
Alex Pavloff

It fails in both cases. I wanted to make clear that it will fail on a standard reboot as well as on a power failure.

Matthew C. Higgins

I do not have a choice of file systems. The decision comes from on high and I just have to make it work. I assume you think we should be using jffs, or do you have another suggestion?

Matthew C. Higgins

Don't use jffs either. On compactflash, use a file system without journaling. Mount everything you can read-only. You just can't write to compactflash on a regular basis like you would with a hard drive and expect the system to still be running in a month, especially if you're buying consumer grade compactflash.

My system running off compactflash uses freedos+loadlin (I'll move to SYSLINUX soon), doesn't change the FAT format of the compactflash, and has the main filesystem as an ext2 loopback image mounted read-only. /tmp, /var and all those directories go into a ram disk.

The few writes that will be done go right to the FAT, and they're done in extremely rare circumstances prompted by a user.
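One way to check that a layout like this really keeps the card read-only is to look at the mount table. The sketch below is an illustration only, not the poster's tooling. It assumes the usual /proc/mounts field order (device, mount point, type, options) and that the CF shows up as an IDE device under /dev/hd*; the function name and the default prefix are made up, and a root file system listed under another name such as /dev/root would need a different prefix.

```python
def writable_mounts(mounts_text: str, device_prefix: str = "/dev/hd") -> list:
    """Return (device, mount_point, fstype) for read-write mounts on matching devices."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank, malformed, or comment lines
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        device, mount_point, fstype, options = fields[:4]
        # The options field is a comma-separated list such as "rw,noatime,sync"
        if device.startswith(device_prefix) and "rw" in options.split(","):
            found.append((device, mount_point, fstype))
    return found


def check_live_system(path: str = "/proc/mounts") -> list:
    """Hypothetical helper: run the check against the running system."""
    with open(path) as f:
        return writable_mounts(f.read())
```

On a system set up as described, check_live_system() should return an empty list during normal operation, since the ram disk mounts do not match the device prefix.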

If you really can't fix your system to behave without writing to the CF, you might want to see if you can put a Microdrive in. Then you could write all day long to the thing.

--
Alex Pavloff - remove BLAH to email
Software Engineer, ESA Technology
Alex Pavloff

What filesystem have you specified in /etc/fstab ?

I had a problem on my workstation that resembles yours.

  • Bought a new disk
  • Formatted it ext3
  • Copied a filesystem (including /etc) from an ext2 partition
  • Rebooted - all was well
  • Tried a 2.5 kernel - crashed
  • Rebooted 2.4 - did not boot! (message about feature not supported)
  • ... time ...
  • Ran an ext3-enabled e2fsck from a boot CD (knoppix) - it cleaned the journal
  • Happily, could boot again
  • Changed the file system type from ext2 to auto in /etc/fstab...

BTW, make sure you do not write to the CF if you abruptly remove power. It can destroy the CF!

/RogerL

--
Roger Larsson
Skellefteå
Sweden
Roger Larsson

...

You do NOT want to power-cycle a system where some code might be writing to the CF that very moment. This may KILL your CF - not only corrupt the filesystem, but destroy the CF at controller level.

CF was not designed for power failure while writing, so don't use it under such conditions.

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88   Web: www.denx.de
Microsoft Multimedia:
You have nice graphics, sound and animations when the system crashes.
Wolfgang Denk

I have auto set across the board in fstab; the further options I have set are noatime,sync.

Matthew C. Higgins

I've heard this, but never an explanation of the mechanism that's causing the failure. Does anyone have any idea why powering off while writing might cause a permanent failure, other than corruption of the filesystem?

Trev

--
Trevor Barton
Isotek Electronics Ltd, 9 Clayton Wood Bank, Leeds, LS16 6QZ, UK.
Tel: +44 (113) 275 1339, Fax +44 (113) 224 9827
Remove X from tmb@Xisotek.co.uk before replying.
Views expressed are my own and not necessarily those of Isotek Electronics Ltd.
Trevor Barton

I don't have a validated explanation, but I actually fried some CF cards that way, so I am pretty sure about the fact.

A theory: as the controller internal to the CF has to perform some wear levelling, it will have to record a certain amount of metadata. Probably it does this recording to flash, too. Maybe some controllers can run into situations where they lose or corrupt their metadata. Note that this is 100% speculation.

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88   Web: www.denx.de
As far as the laws of mathematics refer to reality, they are not cer-
tain, and as far as they are certain, they do not refer  to  reality.
                                                   -- Albert Einstein
Wolfgang Denk

It sounds good, but it is not what I've seen (note: these were NOT SanDisk CF cards): in my case, the cards were dead. NO access would work, including formatting.

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88   Web: www.denx.de
The use of COBOL cripples the mind; its teaching  should,  therefore,
be regarded as a criminal offense.                   - E. W. Dijkstra
Wolfgang Denk

Obviously the CF uses an internal table of replaced sectors and other things that have to do with the degradation of usability after many write cycles (perhaps write counters and such). If power goes off while the controller is writing to that table, the table can be corrupted and the device is no longer accessible. I suppose the table can be rewritten with special means by the manufacturer, but they don't seem to provide this to users.

-Michael

Michael Schnell

How long ago was this? Maybe the problem has been fixed by now? A CF card being rendered useless just because of a power failure sounds like a major bug in the design of the CF, and you would think that it would be high on anyone's agenda to be fixed.

Regards,

PT

Paul Taylor

Could this have been due to some chicken-and-egg problem (i.e. the driver scans for devices at boot time, marks the CF as unavailable because it doesn't respond as expected, and then doesn't really attempt to format it since it thinks the device isn't there)?

Just a wild guess.

Rob

--
Robert Kaiser                     email: rkaiser AT sysgo DOT de
SYSGO AG                          http://www.elinos.com
Klein-Winternheim / Germany       http://www.sysgo.de
Robert Kaiser

Below is part of a mail I recently received from a SanDisk technician:

"Dear Bernhard,

The industrial grade compact flash card offers exceptional endurance and reliability. The rumor about the card being "totally destroyed" after a few power cycles has no basis. Please understand that if power is taken out during a write operation to the card, it is possible that the data will be corrupted in the sector it was being written to. If the data corruption happens to be of uncorrectable ECC type, then the controller inside the card has no way of correcting the data and writing it back to the card. In that case, the card will fail. However, one can recover from it by simply reformatting the card."

Regards,

--
Bernhard Roessmann
Don't Fear The Penguins!
Bernhard Roessmann

...

I've been told that such versions of CF cards do indeed exist from some manufacturers (SimpleTech?). However, I have not had one of those in my hands yet.

Best regards,

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88   Web: www.denx.de
Anyone who isn't confused here doesn't really know what's going on.
Wolfgang Denk
