Backup Software

Please accept my apology for this not being exactly an embedded question, but as this is my favorite Linux group, it is the primary place for me to seek advice. (And in special cases this _can_ be useful for embedded devices.)

I am planning to write a highly automated backup program, to be released under the GPL.

It should use an external hard disk (or flash disk) and a DVD (or CD) burner as backup media.

Depending on how long a version of a file has been in the source tree, files should be stored on both media, using a suitable algorithm: the external disk should hold several versions of any file that changes between backup cycles (daily versions for some days, fewer versions for less recent changes), while long-lived versions are eventually stored permanently on the DVD instead. Thus both media together hold an image of the source tree, while the external hard disk is not likely to overflow. (Of course, special manual actions should be provided that move everything to the DVD chain, etc.) I plan to use a database (e.g. MySQL) to hold a directory of all versions of all files on the backup media.

Some questions:

- Does such a project already exist, so that I don't need to start from scratch?

- Does it make sense to use a compressing file system for the external disk? Which one is known to be safe?

- How is it possible to write to a DVD? I suppose the easiest way is to prepare the content on the hard disk and use a command-line tool to move it onto the DVD, roughly as sketched below. Any pointers?
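
What I have in mind is roughly this (an untested sketch; I am assuming the usual dvd+rw-tools package, where growisofs builds the ISO file system via mkisofs options, and /dev/dvd as the burner device - all of that is just an assumption on my side):

  # stage the long-lived versions in a directory on the hard disk
  mkdir -p /backup/dvd-staging
  cp -a /backup/long-lived/. /backup/dvd-staging/

  # build the ISO file system and burn it to the DVD in one step
  growisofs -Z /dev/dvd -R -J /backup/dvd-staging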

Thanks a lot,

- Michael

--


With kind regards,

Michael Schnell
- Firmware Development -

LUMINO Licht Elektronik GmbH, Germany
Europark Fichtenhain A8, D-47807 Krefeld
www.lumino.de
fon: +49 (2151) 8196-172
fax: +49 (2151) 8196-272
Reply to
Mikey Quick

If you are backing up to DVD or CD, you have to be aware that these media have limited lifetimes (although no one is quite sure what that lifetime might be...).

Before thinking of a backup strategy, think about your restore strategy. People often forget that, and come up with a system that requires searching through a dozen "incremental" backup CDs to get the files they are really looking for. The other key point for your backups is that they must be kept off site - your backups are of little value if a thief or a fire destroys the backups as well as the originals.

The backup system I use is "dirvish". It is basically a wrapper around rsync - each backup is a direct full copy of the source tree. There are a few smart things about it - using rsync, it only copies over the differences between the current source and the latest backup. It also snapshots the source, so that in your backup directory you have a series of directories labelled by date, each containing an image of the original source tree on that date. It uses hard links during the backup procedure, so that unchanged files are multiply linked and thus do not take up extra space. Finally, it has a flexible system for giving expiry dates to backups, such as daily backups being kept for a month, weekly backups kept for three months, and monthly backups kept for a year.
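
If you want to see the core idea without dirvish, here is a rough sketch using plain rsync (paths and dates are only examples; dirvish adds the configuration, logging and expiry handling on top of this):

  # yesterday's snapshot is used as the hard-link reference for today's run;
  # unchanged files become extra hard links, changed files are copied anew
  rsync -a --delete \
        --link-dest=/backup/vault/20060716 \
        /home/source/ /backup/vault/20060717/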

The rsync protocol and the use of hard links mean that only changes are ever backed up, yet the backup always appears as a full backup. I use it for doing backups of the office servers (something like 200 GB of data) to a server at my home over an ADSL link - it seldom takes more than an hour, since there is not a huge amount of change to the data set each day, and a couple of ordinary hard disks on the backup machine are sufficient for years (LVM is your friend - it's important that all the disks are part of the same file system).

And if I need to restore anything, I've got online snapshots of all the data, for any given date.

Reply to
David Brown

David, Thanks a lot for answering

Of course you are right. That is true for tapes as well. I have even read the recommendation to use DVDs instead of tapes, as the (admittedly small) likelihood that in a few decades no functional drive will be available for reading the tapes is the largest danger to consider.

Again you are right. This concept is primarily meant for recovering versions lost by human actions and only secondarily meant for disaster recovery. So fully automatic backup and the possibility to restore single files is the _primary_ intention.

There are lots of backup solutions for disaster recovery that might be used in addition. In case of a disaster you would need to set up a working system before you can use the backup media, so this is not a quick solution anyway. To be fast, a mirrored system is the way to go. Of course a restore strategy also suitable for disasters and other mass restores might be a good enhancement, once the system handles its primary purpose OK.

Right again. It's the same issue with any backup solution, and very often forgotten. Doing the backup remotely over a network helps, independent of the software used. An incremental ("intelligent") strategy reduces the network traffic greatly. IMHO the strategy I have in mind is very useful for "over the wire" backups.

Thanks a lot for the pointer. I'll take a look at dirvish/rsync. (If I can find something usable for me, I'm not going to do it myself)

How does this rsync handle versions of modified files? Any chance to get back a previous version after some days?

How does this rsync handle deleted files? Any chance to get one back after some days? Don't they clutter the backup disk?

Thanks again,

-Michael

Reply to
Michael Schnell

Most disasters are caused by human actions (hopefully the reverse is not true...)

One of the great advantages of dirvish is that you have a mirror, and thus quick and easy restores of either single files or whole trees.

Dirvish lets you use incremental backups, with the underlying rsync even supporting incremental backups of changes to large files, so that things like mailboxes are handled efficiently. An incremental backup of our office data normally takes under an hour - the single original full backup would have taken weeks over the same wire, so I cheated and plugged the backup server into our local network for the first run.

Dirvish gives you a higher level viewpoint - it uses the goodies from rsync, without you having to read the details in the man page, or write your own scripts. But it's also worth reading about rsync, to see what's happening underneath.

Using dirvish (or your own rsync scripts), yes, no problem. All the old backup snapshots are easily available (depending on your "expire" policies for backup snapshots). When a file has not changed between backups, extra copies are hard linked to the old copies, avoiding wasted disk space. Changed versions get their own files - but the network copy may only send the changed parts, reducing network traffic.
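
You can verify this on the backup disk yourself - an unchanged file shows the same inode number (and a link count greater than one) in consecutive snapshots. The paths below are only an illustration of a typical vault layout:

  ls -li /backup/vault/20060716/tree/etc/fstab \
         /backup/vault/20060717/tree/etc/fstab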

If I delete a file today, the original remains in yesterday's backup snapshot, and is gone from this evening's backup run. Files eventually disappear off the disk if all their links are removed during "expire" runs to clear out old backups.

One thing to remember about dirvish (or other similar rsync-based backup systems, as there are several around) is that you end up with a lot of files on the disk, and a huge number of links. It's also much more efficient if you have a single file system rather than spreading the backups (for any given "vault") over separate file systems, since you can then use hard links. You can also expect the backup system to grow larger than you thought when you first set up the system.

The best way to set up the system (IMHO), therefore, is to use LVM and reiserfs. Reiserfs copes better with large numbers of files and links than most file systems, and has no (realistic) limits. It also supports on-the-fly size changes, with no issues about limited inode tables. So when you start getting low on disk space, you simply connect up a new hard disk, make it an LVM physical volume, add it to your volume group, and then add as much space as you need to your backup logical volume. I use LVM for most of my data volumes - any time I need more space, assuming there is free space left on the disk, I can grow the file systems in 10 seconds with a couple of commands, without even unmounting the file systems.
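
In practice, growing the backup volume looks something like this (device and volume names are just examples; resize_reiserfs can grow a mounted reiserfs file system):

  pvcreate /dev/sdc1                       # prepare the newly added disk
  vgextend backupvg /dev/sdc1              # add it to the existing volume group
  lvextend -L +200G /dev/backupvg/backup   # grow the backup logical volume
  resize_reiserfs /dev/backupvg/backup     # grow the file system, still mounted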

Reply to
David Brown

All this sounds really promising. (As the unsolved backup problem is the reason why I have not yet migrated the server to Linux, I now have hope that it can be done soon.)

Just the question of deleted files is not clear to me yet. My hope is that they are handled like versions: if I set the expire time to, say, 10 days (or whatever a suitable setting is), a deleted file should stay accessible on the backup for 10 days too, and get deleted on day 11.

Thanks again,

-Michael

Reply to
Mikey Quick

That is correct.

A more usual arrangement for your expiry times is to have a variety - for example, this is an excerpt from my dirvish "master.conf" file:

# Most backups kept for 30 days
# Monday backups kept for 6 months
# First Monday in month kept for 3 years
# Quad-yearly backups kept forever

expire-default: +30 days
expire-rule:
#   Min  Hr   DOM   Mon        DOW   Expire
    *    *    *     *          2     +6 months
    *    *    1-7   *          2     +3 year
    *    *    1-7   1,4,7,11   2     never

Reply to
David Brown

Thanks a lot !

I definitely will do the server migration soon and use dirvish.

-Michael

Reply to
Michael Schnell

Still another question.

To move the backup drive location away from the server, would it be possible to use a NAS device (instead of a USB or FireWire disk, as I intended before) as a backup medium? I understand that this NAS would need to use a network protocol that supports hard links (NFS? SSH?) and should use ReiserFS as its file system.

-Michael

Reply to
Michael Schnell

I have not used NAS (or NFS, for that matter), so I couldn't tell you. I also don't know how LVM would work with removable drives. But there is no requirement to use LVM or reiserfs - these are just my personal favourites for such systems (I've set up two such backup arrangements). I find the backup over ADSL to be far more convenient than using removable media - quite simply, it is far less effort and therefore far more reliable. But of course, that requires you having two locations linked by broadband.

Reply to
David Brown

I do follow your argument for preferring ReiserFS, of course.

For this small site, I would find it secure enough to install the backup drive in the basement, which is very secure against fire. But it's a little damp so I don't want to install the server there.

-Michael

Reply to
Michael Schnell

Also check out Amanda... It's an enterprise backup solution. It backs up to tapes primarily. Its biggest advantage is that restores can be handled via ordinary tools - tar and friends - so a bare metal restore is easier.

Although dirvish sounds really nice for a small, single server system.

--Yan

--
  o__
  ,>/'_          o__
  (_)\(_)        ,>/'_        o__
Yan Seiner, PE  (_)\(_)       ,>/'_     o__
Certified Personal Trainer   (_)\(_)    ,>/'_        o__
Licensed Professional Engineer         (_)\(_)       ,>/'_
Who says engineers have to be pencil necked geeks?  (_)\(_)
Reply to
Captain Dondo

Thanks for the pointer

-Michael

Reply to
Michael Schnell

Of course all files will need to stay on the backup unless they are overwritten on the working disk.

(How) Does this setting provide for this?

Thanks,

-Michael

Reply to
Michael Schnell

All files stay on the backup even if they are overwritten on the original disk. That's the point of a backup. When dirvish runs and makes a directory called "20060717" for the 17th July, 2006 backup, then that directory contains a snapshot copy of the original disk when the backup was run. The contents of the 20060717 backup directory never change after that, unless it is "expired" (using the dirvish-expire script) according to the expire rules used when the backup was taken. When it is "expired", the directory is completely removed.

Remember, all this stuff about hard links and rsync differential backups is just to make the process more efficient (both in terms of disk space and network bandwidth). The backups appear as simple full copies of the original source.

Reply to
David Brown

Really great !

Thanks a lot for taking the time to explain dirvish to me.

-Michael

Reply to
Michael Schnell

I hope it works out for you (and anyone else who was following this thread - I know it is off-topic in an embedded group, but backups are useful for everyone).

Reply to
David Brown
