We have these boxes with no monitor, keyboard, or mouse access, just ssh. Sometimes when we reboot we get one of those "fsck could not fix the errors, press Enter or Control-D" prompts. Is there a way to tell fsck to ignore that and run anyway? We have tried ext3 and reiserfs; both come up with the same error if we treat the box badly enough :-(
Did you think to read the man page? There is a -y option, but you should be aware that this may help you shoot yourself in both feet at the same time.
The obvious solution is to not treat the system that badly. If you feel that you must be able to do so, then make the file system Read Only either by mounting it so, or making it on a CD or DVD, and send the volatile data over the network to a system that won't be so abused.
Of course we read it. According to the guy working on it, we are already using every option possible to make it work on its own, including -y.
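For what it's worth, fsck(8) documents its exit status as a bit mask, and a status with the 4 bit set ("errors left uncorrected") is the case where -y was not enough. A minimal sketch of decoding that status follows; describe_fsck_status is a made-up helper name, not part of any package.

```shell
# Decode an fsck exit status using the bit meanings listed in fsck(8).
# describe_fsck_status is a hypothetical helper for illustration only.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    # Each bit is tested separately because fsck ORs the conditions together.
    [ $((status & 1)) -ne 0 ]   && out="$out, errors corrected"
    [ $((status & 2)) -ne 0 ]   && out="$out, reboot needed"
    [ $((status & 4)) -ne 0 ]   && out="$out, errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && out="$out, operational error"
    [ $((status & 16)) -ne 0 ]  && out="$out, usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && out="$out, cancelled by user"
    [ $((status & 128)) -ne 0 ] && out="$out, shared library error"
    # Strip the leading ", " left by the first match.
    echo "${out#, }"
}
```

Run by hand, something like "fsck -y /dev/hda1; describe_fsck_status $?" would show which case you hit (the device name is only an example, and the check has to be done on an unmounted or read-only filesystem).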
The problem with the system is that we have customers who love to just pull the plug on the back without shutting it down properly. I know my Tivo doesn't even have a power button, and I have had to unplug it while it was recording and it has never frozen on reboot, but then maybe they use a different FS or they don't run fsck. We are considering the read-only mounting.
Mounting your root filesystem read-only seems the best option. Whatever data you are trying to save can be put onto a separate partition/filesystem. In that case you might lose data, but you will never end up with a frozen system.
That makes for problems. If you can't smack the customers upside the head a few times to indicate that's a bad thing, about the only thing you can do is to provide a split power source, so that the lights and displays die immediately on loss of power and the CPU and disk retain power long enough to shut down cleanly after recognizing the loss of power. Probably not an easy thing to put into practice.
The "program" portion of the system is RO media, so yanking the power can't screw that. There _MAY_ be a mechanism to recognize when the power has failed, and to immediately stop writing to disk (and hope that any buffers get written correctly) so as to not squitter all over the platters when the power is insufficient to control where the head is located.
That's a highly desirable solution. Used to be you could buy the better grade of disks that had a write-enable jumper. With the jumper (or an external switch) in one position, the entire drive was read-only. Moving the jumper (or changing the switch position) allowed the drive to be made writable. This was entirely independent of how the drive was mounted. We used this feature on our public facing systems, so that even if someone cracked in, they had no place to even temporarily store their rootkit or whatever, never mind altering the files. We're now using bootable CDs or DVDs as those hard drives are somewhat scarce today.
In Message-ID: , you also ask:
Standard answer: unused RAM is wasted RAM - Linux uses it to cache a lot of things.
Both - and don't forget that most hard drives also have a cache. Turning caching off completely makes little sense, even in your "pull the plug" scenario. You can turn off the write caching, depending on the file system you are using, though it will slow the system down slightly. Look at the 'sync' option to 'mount', and depending on your kernel, you may be able to tweak the timing of 'bdflush' if your kernel uses it. The command 'apropos sync tune' might offer further clues.
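To see what there is to tune, here is a rough sketch that prints whichever write-back settings the running kernel exposes. It assumes the usual locations: /proc/sys/vm/bdflush on 2.4 kernels, and the dirty_* files on 2.6 and later. The function name and the optional directory argument are made up for illustration.

```shell
# Print whichever write-back tunables this kernel exposes.
# show_writeback_tunables is a hypothetical name; the directory argument
# defaults to /proc/sys/vm and exists only so the function can be pointed
# at a scratch directory.
show_writeback_tunables() {
    dir=${1:-/proc/sys/vm}
    for name in bdflush dirty_writeback_centisecs dirty_expire_centisecs \
                dirty_background_ratio dirty_ratio; do
        # Files that this kernel does not provide are silently skipped.
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}
```

For the 'sync' route, "mount -o remount,sync /data" would switch an already mounted filesystem to synchronous writes, where /data is only an example mount point. Lowering the dirty_* values makes the kernel flush sooner; the right numbers depend on the workload.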
Hadn't considered the rootkit stuff; that makes read-only even more desirable. Can we remount the root partition for writing while we are logged into that system over ssh?
:-) But there is also never such a thing as too much RAM!
Actually we have a lot more caching, because sometimes we run virtual machines, so the virtual OS caches, VMware caches, Linux caches, and the drive caches! We wouldn't be so worried about write caching, except that we have so much free RAM that we are worried Linux is going to cache a lot, and then pulling the plug on the system is going to be really bad!
Another help - not having the compiler (gcc) or the development libraries (for example glibc-devel-2.3.5, as opposed to the required glibc-2.3.5) on the system. Note that if someone gets in AND knows what they are doing, they can remount the system RW. That's why we use 'read only' media. Don't forget that if you are using a package manager (rpm) and the system gets compromised, they can use rpm to load pre-compiled binaries directly over the net.
Look at the boot scripts - it's doing something like this already. When you boot, the file system check MUST be done on an unmounted filesystem OR ONE MOUNTED READ ONLY. For a typical Red Hat boot, you see something like
mount -n -o remount,ro /
when it discovers that the quicky fsck shows problems.
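The same remount trick works from an ssh session. As a sketch, the helper below reports whether a mount point is currently "ro" or "rw" by reading /proc/mounts, which is handy for confirming the state before and after a maintenance window. mount_mode is a made-up name, and the second argument is there only so a sample file can be substituted.

```shell
# Report whether a mount point is currently mounted "ro" or "rw".
# mount_mode is a hypothetical helper; the table argument defaults to
# /proc/mounts.
mount_mode() {
    target=$1
    table=${2:-/proc/mounts}
    # Fields: device, mount point, fs type, options, dump, pass.
    # The last matching line wins, since later mounts shadow earlier ones.
    awk -v mp="$target" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode != "") print mode; else exit 1 }
    ' "$table"
}
```

As root you would then run "mount -o remount,rw /", make the changes, and finish with "mount -o remount,ro /". Expect the switch back to read-only to fail with a "busy" error if any process still has a file open for writing.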
At one time, there was. Older motherboards could only cache up to so much RAM - maybe 16 or 64 Megs. When you added more RAM (especially with windoze which loaded into the top of RAM), the system actually got slower because you were using uncached RAM.
Mounting RO is going to help, but remember that you need writable stuff for /var/run/ /var/lock/ and similar, and the logs have got to go _somewhere_ to be useful. If you are fat with RAM, you could consider using a RAM disk for some stuff (certainly /var/lock and /var/run).
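One way to get that RAM disk is tmpfs. The sketch below just prints fstab-style lines for whichever directories you name, so they can be reviewed before being pasted into /etc/fstab. The function name, the 8m size, and the option list are assumptions for illustration, not a recommendation for any particular distribution.

```shell
# Emit fstab-style lines that put small volatile directories on tmpfs.
# tmpfs_fstab_lines is a hypothetical helper; the first argument is the
# size limit, the remaining arguments are mount points.
tmpfs_fstab_lines() {
    size=$1
    shift
    for dir in "$@"; do
        printf 'tmpfs %s tmpfs rw,nosuid,nodev,size=%s 0 0\n' "$dir" "$size"
    done
}

# Example: print entries for the two directories mentioned above.
tmpfs_fstab_lines 8m /var/run /var/lock
```

Anything on tmpfs is gone after a power cut, so the boot scripts have to recreate any subdirectories that daemons expect to find there.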
As far as Linux is concerned, look at tuning bdflush if your system is using it. I've never used virtual machines, so I don't know what might be in that ball of tar.