What do you use for RPi backups?

Re: What do you use for RPi backups? By: Doc O'Leary , to All on Fri Apr 08 2022 04:47 pm

Not necessarily a Pi specific solution, but I find rsync to be fine.

Using tar with a pipe should not be very taxing to your resources, but big tar files are not very manageable.

Definitely try the rsync time machine described in Linux Magazine #258. Basically, you run $ rsync -a Source_Dir First_Backup the first time you take a backup, and then $ rsync -a --link-dest=First_Backup Source_Dir Second_Backup for the following ones. That way the first backup takes a long time, but the subsequent ones are very quick, because unchanged files are simply hardlinked to the previous backup.
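
As a rough sketch of how that scheme can be scripted with dated backup directories (the paths and naming here are just examples, not from the article):

#!/bin/sh
# Incremental "time machine" backups with rsync --link-dest.
# SRC and DEST are placeholders - adjust for your own setup.
SRC=/home/pi/data/
DEST=/mnt/backup
TODAY=$(date +%Y-%m-%d)

# Find the most recent previous backup, if any.
LAST=$(ls -1d "$DEST"/????-??-?? 2>/dev/null | tail -n 1)

if [ -n "$LAST" ]; then
    # Unchanged files are hardlinked against the previous backup.
    rsync -a --link-dest="$LAST" "$SRC" "$DEST/$TODAY"
else
    # Very first backup: a full copy.
    rsync -a "$SRC" "$DEST/$TODAY"
fi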

Reply to
Richard Falken

I’ve tried a few “common” packages, and they all seem to fall down for me. I have 500K+ files to warehouse, ranging in size from tiny up to 10GB+. The biggest problem seems to be RAM usage, and I’d like to have something that works on a 3B, and ideally a Zero W.

Does anyone have something that still scales up when scaling down to an RPi?

Thanks.

Reply to
Doc O'Leary ,

On a sunny day (Fri, 8 Apr 2022 16:47:00 -0000 (UTC)) it happened Doc O'Leary <snipped-for-privacy@2017usenet1.subsume.com> wrote in <t2pou4$n66$snipped-for-privacy@dont-email.me>:

I have 3.4 TB USB harddisks connected to my Pi4. They show up as /dev/sda; I made a partition sda2, formatted it with ext4, and created the directories I normally work with (root, that is, in my case). Every now and then I copy all data with cp -rp or cp -urp to that sda2 directory. I also wrote a script to back up mail to it, or I back up from the laptop with scp -p. Never a problem.

Backups go to both Pi4 harddisks, so even if you drop one USB harddisk you still have your data. Scripting is a good idea; it prevents typing errors.
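
A minimal sketch in the spirit of the script described above (all paths, mount points and hostnames are placeholders, not Jan's actual setup):

#!/bin/sh
# Simple copy-based backup to two USB disks, as described above.
# DISK1/DISK2 and the laptop hostname are only examples.
DISK1=/mnt/sda2/backup
DISK2=/mnt/sdb2/backup

# -r recurses, -p preserves ownership/timestamps, -u only copies files
# newer than the existing copy, so repeat runs are quick.
cp -urp /root "$DISK1/"
cp -urp /root "$DISK2/"

# Pull the laptop's home directory over SSH, preserving timestamps.
scp -rp user@laptop:/home/user "$DISK1/"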

Reply to
Jan Panteltje

Does rsync not get the job done?

Reply to
Ahem A Rivet's Shot

Absolutely, but strange as it may seem, some folks haven't (yet) heard of rsync or rsnapshot.

I use rsync for my weekly backups. Recommended, because it never takes longer or does more work than absolutely necessary. I make weekly backups of two Linux laptops, my RPi and my Linux-based house server to a cycle of two 1TB USB drives (WD Essentials). Currently the complete set of backups occupies about 40% of these disks. The process is manually controlled from an SSH login to the machine which has the current USB backup drive connected to it, and is immediately followed by a weekly software update.

FWIW the house server uses a nightly cron job to make an rsnapshot backup to a single 2TB USB disk - this is more for recovering from any previous day's finger trouble than anything else.
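
For reference, a nightly rsnapshot arrangement like that typically boils down to something like this (snapshot root, retention count and backed-up paths are examples, not Martin's actual configuration):

# /etc/rsnapshot.conf (excerpt) - fields must be separated by tabs
snapshot_root   /mnt/usb2tb/rsnapshot/
retain  daily   7
backup  /home/  localhost/
backup  /etc/   localhost/

# /etc/cron.d/rsnapshot - take the "daily" snapshot at 03:00 every night
0 3 * * *   root    /usr/bin/rsnapshot daily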

Reply to
Martin Gregorie

The directories which I want to save, I copy with scp to dedicated directories on PCs.

The backup itself is then done by the PC.

Reply to
Hermann Riemann

Well, it gets *a* job done, but a good backup is more than just efficient file copying. Since I have multiple RPi devices (along with some non-RPi machines), my data is best managed with something that does deduping and snapshotting. Of all the things I tried, bup had the most useful features, but it struggled to run on my RPi with the files I have.

Reply to
Doc O'Leary ,

Have you looked at rsnapshot? I use that to back up to S3-compatible offsite storage, but it has several storage backends available.

Reply to
scott

My problem with rsync-based solutions is that they seem to do *way* too much processing in order to figure out how to do an efficient transfer. To my mind, if I change 3 files out of 500K, for a total of 3MB out of 1TB of data, backing that up should be *fast*. I just don’t see that when I use rsync-based solutions.

I *do* already use rsync extensively for my current “backup” needs, but it just doesn’t have the smarts I’d like to see in a proper backup tool. Like I said, bup has features that are more in line with what I need, but it seems to have trouble scaling down to an RPi sized server.

I was just hoping that something out there with good support was better than the scripts I’ve written myself.

Reply to
Doc O'Leary ,

I make nightly rsync differential backups of the SD cards of 15 Raspberry Pis to SD card image files on a USB SSD drive connected to one of the Pi 4s. The Pis are of various generations, from 2 to 4 with lots of Zero Ws; most are connected via WiFi, 3 are on Ethernet, and a couple are at remote sites. On average 961MB of data in 4426 files is transferred, and the time to back up all of them sequentially is 14m26s, which I don't think is too bad.
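
druck doesn't give the script here, but one way a differential rsync into an existing SD-card image file can work is sketched below (image path, mount point and partition layout are assumptions, and pulling from the remote Pis over SSH is left out):

#!/bin/sh
# Differential backup of a running Pi's filesystem into an .img file.
# IMG and MNT are placeholders; the image must already exist with the
# usual two-partition Raspberry Pi layout (boot + root).
IMG=/mnt/ssd/backups/$(hostname).img
MNT=/mnt/imgroot

LOOP=$(losetup --show -fP "$IMG")   # attach the image, scan its partitions
mount "${LOOP}p2" "$MNT"            # root partition inside the image
mount "${LOOP}p1" "$MNT/boot"       # boot partition inside the image

# -x keeps rsync on this filesystem; --delete mirrors removals into the image
rsync -aHx --delete / "$MNT/"
rsync -aHx --delete /boot/ "$MNT/boot/"

umount "$MNT/boot" "$MNT"
losetup -d "$LOOP"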

---druck

Reply to
druck

On 08.04.2022 at 18:47, Doc O'Leary wrote:

Overgrive

FW

Reply to
F. W.

Doc O'Leary wrote on 2022-04-12 at 20:54:

According to an article on Ars Technica, rsync has to inspect every file to determine which differences are to be sent over the wire... that's the processing you experience.

Sometimes using ZFS is better ...

I'm actually trying out this way with a Pi 4 ...
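
Not from the thread, but for anyone curious what the ZFS route looks like in practice, a minimal sketch of snapshot-plus-incremental-send backups (the pool and dataset names, dates and backup host are all invented for illustration):

# Take a cheap, near-instant snapshot of the dataset.
$ sudo zfs snapshot tank/data@2022-04-13

# Send only the blocks changed since the previous snapshot to another pool,
# here on a remote machine over SSH.
$ sudo zfs send -i tank/data@2022-04-12 tank/data@2022-04-13 | \
      ssh backuphost sudo zfs receive backuppool/data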

Reply to
DeepCore

By default rsync just looks at file metadata: are the file's length, date and attributes the same? If so, skip it. If they differ, go through the file, work out what changed, then send the changes. That means it has to inspect every inode, but not read every file. You can make it checksum the contents of every file, rather than just those with differing metadata, which of course is a lot slower.
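
The two behaviours described above correspond roughly to these invocations (paths and host are examples):

# Default quick check: compare size and modification time only,
# then delta-transfer whatever looks different.
$ rsync -a --itemize-changes /data/ backuphost:/backup/data/

# Force a full-content checksum of every file on both sides
# (reads every byte, so it is much slower).
$ rsync -a --checksum /data/ backuphost:/backup/data/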

If you want to avoid the inode inspection you need help from the filesystem to keep track of changes when you save them, rather than when you backup. Filesystems like ZFS inherently do that. It's also possible to install software that monitors file changes dynamically. That makes backups faster at the expense of making file accesses slower.

Theo

Reply to
Theo

That’s true, but there’s another way that rsync can be inefficient for backups, depending what you’re trying to achieve.

If your model is that the backup is a single tree, mutated by each successive backup, then the cost of rsync is reading all the metadata on both sides, and copying the changes. The downside is that you don’t get historical backups.

If your model is that you make a fresh tree for each backup, with hardlinks between unchanged files, then you have the additional cost of creating all the directories and making links to the unchanged files. Cheaper than copying everything, but still relatively expensive.

For me the fact that each backup is a complete tree, that can be navigated and restored with quite basic tooling, is enough of an advantage that I can accept that extra cost compared to a more efficient design.

Reply to
Richard Kettlewell

What do you mean by "have trouble scaling down to an RPi sized server"? Have you tried bup? What were the problems you had?

I've not come across bup before and am curious.
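
For context (not from the thread): bup stores backups in a git-style packfile repository with rolling-checksum deduplication, and its basic workflow looks roughly like this, with the paths and backup name being arbitrary examples:

# Create the repository (defaults to ~/.bup unless BUP_DIR is set).
$ bup init

# Build/update bup's index of the tree, then save it under a named
# branch; unchanged chunks are deduplicated against earlier saves.
$ bup index /home/pi/data
$ bup save -n pi-data /home/pi/data

# List and restore later.
$ bup ls pi-data/latest
$ bup restore -C /tmp/restore pi-data/latest/home/pi/data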

Reply to
Jim Jackson

I would expect a “smart” backup tool to actually track changes rather than simply rescanning everything every time like rsync does. The scripts I use now do some of that, and it’s not perfect, but it’s still a lot faster than just throwing rsync at a folder hierarchy and letting it work out what to do.

I’m pretty sure inotify support is a default part of the Linux kernel already. I’ve certainly been thinking of tapping into it to make my scripts even more efficient, but I really was hoping there was some custom backup software that has already done the heavy lifting for me.
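
As a sketch of that idea using the inotify-tools package (paths and the backup host are placeholders, and deleted files are simply skipped for simplicity):

# Record every path that is written, created or moved under /data.
$ inotifywait -m -r -e close_write,create,moved_to \
      --format '%w%f' /data >> /var/log/changed-files.log &

# At backup time, transfer only the recorded paths, then reset the log.
# --ignore-missing-args (rsync >= 3.1) skips files deleted since logging.
$ sort -u /var/log/changed-files.log | sed 's|^/||' > /tmp/changed.list
$ rsync -a --files-from=/tmp/changed.list --ignore-missing-args / backuphost:/backup/
$ : > /var/log/changed-files.log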

Reply to
Doc O'Leary ,

And it does work. I had a Pi fail to reboot yesterday; I found the superblocks on the root partition had been corrupted, so it was unfixable. I just used dd to overwrite it with last night's backup image, and it was working again within 5 minutes.
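
That restore step comes down to something like this (image and device names are examples; double-check the target device before running dd):

# Write last night's image back over the failed SD card.
$ sudo dd if=/mnt/ssd/backups/pi-hostname.img of=/dev/sdX bs=4M \
      conv=fsync status=progress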

---druck

Reply to
druck

inotify has been in place for years, but it doesn’t support whole-filesystem notifications. It looks like you want fanotify for that.

Reply to
Richard Kettlewell

I didn’t dig deep into the root causes, but it started choking on large file sets. What it looked like to me in testing is that it was sucking up a ton of RAM for indexes/hashes/whatever when I threw a lot of files at it. Like it was designed with the assumption that “big data” necessarily required a big machine to handle it. I even gave it a ton of swap space so that it could complete rather than die on my 1GB RPi, but it churned so much and went so slow that I had to kill it anyway.

On the whole, I like its approach. I use git myself as a software developer and my own backup scripts borrow a lot of the same concepts as bup. If I had more time, I’d look into what needed to be rearchitected in bup to make it work for my use case. Until then, I’ll stick with my scripts and keep looking for someone to champion a leaner solution.

Reply to
Doc O'Leary ,

That’s indeed fantastic; thank you for sharing those stats. All I can say is that the behavior I see from rsync isn’t nearly as impressive. I have single machines with *no* changed data that take over 15 minutes for rsync to process.

Reply to
Doc O'Leary ,
