Disaster recovery, how to automate as far as possible?

Recently (last weekend) the Pi that runs my local DNS died. I did an
'apt update; apt upgrade', the /boot partition went read-only and there
was no kernel to boot from (the "7 short flashes" code from the LEDs
was actually quite handy).  Presumably the SD card failed in some way.

I have good backups and was able to recover everything but it took
quite a while (like Sunday morning) and, as a result I've been
wondering if there's something I can do better.

My existing backups are incremental backups of /etc and /home, taken
daily, so restoring the system meant that I got a new clean image,
wrote my backups of /etc and /home to it and was back running again.
However there were a few gotchas on the way: a few symbolic links from
/etc had to be mended and some extra packages had to be installed
(fortunately I list packages I've added when I add them - very
important!).


So, what could I have done to make things quicker/easier?

My incremental backups are automatic and run every night, when they're
more than a month old the system weeds some of them out but I have
monthly backups for some years back now.  I'm not aiming to change this.

However some sort of nightly 'image copy' would make what I had to do
on Sunday much easier.  It has to be automatic (i.e. run by anacron or
cron) or it won't get done.  This is a headless system so it has to
be non-GUI too (with the advantage that, as it's a "Lite" installation,
there's only about 1.5GB of it to back up).

Is there some sort of "make an image copy of this system" I can run on
the Pi to write an image to another system?  I'd just keep a single
image I think and overwrite it every night, it's a "restore and get it
running quickly" backup I want, not the same as I have with the
incremental backups.  If the image is bad for some reason then I can
always fall back to my incremental backups.


What's recommended (if anything)?


--  
Chris Green


Disaster recovery, how to automate as far as possible?
  Re: Disaster recovery, how to automate as far as possible?
  By: Chris Green to All on Mon Aug 02 2021 10:29 am

 > I have good backups and was able to recover everything but it took
 > quite a while (like Sunday morning) and, as a result I've been
 > wondering if there's something I can do better.
 >  

It depends on your need for availability. I suppose you don't want to do
anything complex. I assume you aren't using any CoW filesystem or snapshotting
volume manager on your Pi.

You cannot really take an SD card snapshot from a running system without
risking serious problems. In fact, problems are close to guaranteed unless you
use some snapshotting system.

For my personal servers I use a cron job that turns off the services that may
change the filesystems and then dumps the filesystems sequentially over a
network. The drawback is that the services go down while you back them up.
Thankfully my personal servers don't need 24/7 availability, and the backup is
done and the services are operational again by the time they are needed. You
can script a backup service that dumps the filesystems of the Pi to an SD card
over a network and then sets up a bootloader for it, but honestly I think it is
more trouble than it is worth.
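
For illustration, a stop/dump/restart job of that kind might look roughly like the sketch below. It is not the exact job described above: it assumes systemd, uses tar over ssh as the "dump", and the service name, host name and paths are all placeholders. Setting RUN=echo prints the steps instead of running them.

```shell
#!/bin/sh
# Sketch only. SERVICES, BACKUP_HOST and BACKUP_DIR are hypothetical placeholders.
SERVICES="${SERVICES:-dnsmasq}"
BACKUP_HOST="${BACKUP_HOST:-backuphost}"
BACKUP_DIR="${BACKUP_DIR:-/srv/pi-backup}"

backup_with_services_stopped() {
    # Stop anything that writes to the filesystem while we copy it.
    for s in $SERVICES; do ${RUN:-} systemctl stop "$s"; done

    if [ -n "${RUN:-}" ]; then
        # Dry run: just show the dump command.
        $RUN "tar --one-file-system -cpf - / | ssh $BACKUP_HOST 'cat > $BACKUP_DIR/root.tar'"
        rc=0
    else
        # --one-file-system keeps tar out of /proc, /sys, /dev and other mounts.
        tar --one-file-system -cpf - / |
            ssh "$BACKUP_HOST" "cat > '$BACKUP_DIR/root.tar'"
        rc=$?   # exit status of ssh, the last command in the pipeline
    fi

    # Always bring the services back, even if the dump failed.
    for s in $SERVICES; do ${RUN:-} systemctl start "$s"; done
    return "$rc"
}
```

To run it from cron, add a final line to the script that calls backup_with_services_stopped, and try it with RUN=echo first.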

--
gopher://gopher.richardfalken.com/1/richardfalken

Re: Disaster recovery, how to automate as far as possible?
The Pi I want to back up runs only DNS/DHCP for my LAN, there are
other services running but none are actually in use.

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?
  Re: Re: Disaster recovery, how to automate as far as possible?
  By: Chris Green to Richard Falken on Mon Aug 02 2021 05:51 pm

 > The Pi I want to back up runs only DNS/DHCP for my LAN, there are
 > other services running but none are actually in use.
 >  
 > --
 > Chris Green

Then, your work is easy.

Get PiCore. It is a Tiny Core Linux port for the Raspberry Pi. You can
configure it to boot from the SD card, copy the whole operating system to RAM,
and then forget the SD card exists.

You just remaster the SD card with the configuration you want for your DNS and
DHCP. The content of the SD card is never changed. There is no data persistence
unless you command the OS, manually or from a script, to copy data to disk.

This allows you to have a BASE operating system which never changes stored in
the SD. If there is data you want to save, you trigger a "store to disk"
command, which saves the changes between the running system in RAM and the base
system in the SD as a tar file.

And then you can send that tar file over a network.

Which is GREAT because you may have two PiCore SDs ready, use one for
production, command it to save the tar overlay every 24 hours and send the tar
to a backup server periodically. If you ever need to recover using the second
SD, you just copy the tar file to the second SD and boot. Just copy. No
imaging, no uncompressing, no anything.

Still, I recommend doing the backup with the services stopped.

--
gopher://gopher.richardfalken.com/1/richardfalken

Re: Disaster recovery, how to automate as far as possible?
Now that is quite a neat idea, thank you.

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?
Chris Green wrote on 02-08-2021 at 11:29:

Apart from any backup strategy: boot from a USB disk (SSD would be good),
which is almost certainly more reliable and probably faster than an SD card.

Re: Disaster recovery, how to automate as far as possible?

Yes, I'm considering that too! :-)

Can an old Pi 2 boot from SSD?  I know a Pi 4 can.

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?
Chris Green wrote on 02-08-2021 at 12:46:

Only the 2B v1.2, apparently:  
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bootmodes/msd.md

But because it's not USB3 the speed is probably no better than an SD card,
I think, and an SSD might be overkill. Maybe a "high endurance" SD card
would be a better investment.

Re: Disaster recovery, how to automate as far as possible?

Which is sort of where I came in. :-)

How do I make (and maintain up to date) an image of the SD card?

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?

Well, a high-quality, high-endurance SD card would certainly be an
additional benefit.


I don't think you can get much closer than you already are, except with
that other suggestion: mirrored disks, which you rightly deemed to be
overkill. You can however dramatically accelerate the restore from fresh
image with something like Ansible playbooks (or your own bash script
implementation).





Re: Disaster recovery, how to automate as far as possible?

It is faster than the SD card. The SD card interface is not particularly
fast.


I have 2 Pi servers, a Pi3B and an old Pi1b+.

Each boots from its SD card, but the cmdline.txt specifies that the
root filesystem is on the USB-attached HD (spinning rust). The disks are
both WD disks with a built-in USB2 interface.

I have it set up so the SD card /boot partition is mounted read-only, and
there is a second SD partition (usually unmounted) where I keep a copy of
the root partition on the USB disk. An rsync script(*), running at least
daily, mounts the SD card's second partition, updates it from the running
root partition, then unmounts it. That way the SD card is only "at risk"
during the update. I get approx. 1.3MB of updates per day - log files mostly.

If the hard drive dies I will take out the SD card and, on my desktop
computer, alter the cmdline.txt file to specify root on the SD card
(altering the /etc/fstab file as needed), put it back in the Pi and reboot.
Then go about replacing the failed HD.
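
The cmdline.txt change can be sketched as a one-line sed edit. This is only an illustration: the file path and the device name in the usage line are examples, and it assumes GNU sed (for -i) and a single root= entry on the line.

```shell
# Sketch: point root= in a cmdline.txt at a different device.
# Usage: set_root_device /path/to/cmdline.txt /dev/mmcblk0p2
# (the new value must not contain '|' or '&', which sed would interpret)
set_root_device() {
    sed -i "s|root=[^ ]*|root=$2|" "$1"
}
```

Keeping a copy of the original cmdline.txt next to the edited one makes it easy to switch back once the replacement disk is in.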

The other data partitions on the hard drive are backed up elsewhere and
would have to be restored onto any replacement USB disk, along with a
copy of the root partition.

I use rsync for my backup purposes, and have recovered my Linux Intel
desktop when its boot disk died, using a similar scheme (it has 2 disks
- but grub is a Bl**dy can of worms and a grub rescue CD was needed).

Jim

(*) make sure you specify the -XAH options to rsync as well as the usual  
-a and -x (don't cross file system boundaries), and whatever delete  
option you prefer.
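
A minimal sketch of such a mount/rsync/unmount job, using the options listed above. The device name and mount point are guesses for illustration and will differ per system, and --delete is just one of the possible delete options.

```shell
# Sketch: refresh a normally-unmounted spare partition from the running root.
# SPARE_DEV and SPARE_MNT are hypothetical; adjust to your own layout.
SPARE_DEV="${SPARE_DEV:-/dev/mmcblk0p2}"
SPARE_MNT="${SPARE_MNT:-/mnt/rootcopy}"

sync_root_to_spare() {
    mount "$SPARE_DEV" "$SPARE_MNT" || return 1
    # -a archive, -x stay on one filesystem, -X xattrs, -A ACLs, -H hard links
    rsync -aXAHx --delete / "$SPARE_MNT/"
    rc=$?
    # Unmount even if rsync complained, so the copy is not left exposed.
    umount "$SPARE_MNT" || return 1
    return "$rc"
}
```

The function has to run as root; calling it from a small script in /etc/cron.daily would be one way to get the daily run.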




Re: Disaster recovery, how to automate as far as possible?
On 02/08/2021 10:29, Chris Green wrote:
Well, one way is to take the SD card out, put it in a reader and take an image.

I am not sure if you can do that over a network on a live system.



--  
Karl Marx said religion is the opium of the people.
But Marxism is the crack cocaine.

Re: Disaster recovery, how to automate as far as possible?
That's a little difficult to automate! :-)

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?
On 02/08/2021 11:29 am, The Natural Philosopher wrote:

You don't want an _image_ copy of a running system - there's all sorts  
of things you don't need/want copied - /dev /proc /sys etc.
Just do an ssh/rsync --one-file-system from the backup machine.
E.g.

rsync --archive --one-file-system --exclude=lost+found --exclude=media \
  --exclude=save --exclude=var/cache --exclude=swap --exclude='*~' --delete \
  --delete-excluded --info=STATS1 "machine-to-backup:/" "directory-for-backup"


--  
Chris Elvidge
England

Re: Disaster recovery, how to automate as far as possible?
  Re: Re: Disaster recovery, how to automate as far as possible?
  By: Chris Elvidge to The Natural Philosopher on Mon Aug 02 2021 11:56 am

 > You don't want an _image_ copy of a running system - there's all sorts
 > of things you don't need/want copied - /dev /proc /sys etc.
 > Just do an ssh/rsync --one-file-system from the backup machine.
 > E.g.
 >  
 > rsync --archive --one-file-system --exclude=lost+found --exclude=media
 > --exclude=save --exclude=var/cache --exclude=swap --exclude=*~ --delete
 > --delete-excluded --info=STATS1 "machine-to-backup:/" "directory-for-backup"
 >  

Yeah, but he wants a bootable backup image I think.

Also rsyncing a live system may cause inconsistencies in the backup image. You
have to be careful with those.

--
gopher://gopher.richardfalken.com/1/richardfalken

Re: Disaster recovery, how to automate as far as possible?

I don't think you're going to get that with any kind of online backup.  I
image Raspberry Pi systems by popping out the SD card, shrinking the root
filesystem as small as possible with gparted (running on another computer),
and piping dd into xz (or gzip if I'm in a hurry) to make an image backup.  
Clonezilla would probably work too, but it's a bit trickier to use if you
need to restore to a smaller SD card.
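
As a rough sketch of the dd-into-xz step (with the card in a reader on another machine, not the live card), something like the following would do. The device path in the usage line is a placeholder, the compressor is a parameter so gzip can be swapped in, and GNU dd is assumed for status=none.

```shell
# Sketch: compress a whole card (or any block device/file) into an image.
# Usage: image_card /dev/sdX pi-backup.img.xz [compressor]
image_card() {
    src=$1; out=$2; comp=${3:-xz}
    # Write to a temporary name first so a failed run never replaces a good image.
    dd if="$src" bs=4M status=none | "$comp" -c > "$out.part" && mv "$out.part" "$out"
}

# Restore is the reverse: decompress and write back to a card.
# Usage: restore_card pi-backup.img.xz /dev/sdX [compressor]
restore_card() {
    src=$1; dst=$2; comp=${3:-xz}
    "$comp" -dc "$src" | dd of="$dst" bs=4M conv=fsync status=none
}
```

Double-check the device name before restoring, since dd will happily overwrite whatever it is pointed at.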

  _/_
 / v \ Scott Alfter (remove the obvious to send mail)
(IIGS( https://alfter.us/ Top-posting!
 \_^_/                              >What's the most annoying thing on Usenet?


Re: Disaster recovery, how to automate as far as possible?
It needs far more than that for what I want to do.

Apart from anything else there are two file systems to back up:-

    Filesystem     Type 1M-blocks  Used Avail Use% Mounted on
    /dev/root      ext4     29270  1318 26731   5% /
    /dev/mmcblk0p1 vfat       253    49   205  20% /boot


I want to have available some sort of clone/image of the system that I
can quickly create a Pi SD image from.  So the nightly backup will
create a clone/image/whatever (on another system preferably) and from
that clone/image/whatever I will be able to (quickly) create a working
SD card to plug into the Pi.

--  
Chris Green


Re: Disaster recovery, how to automate as far as possible?
On Mon, 02 Aug 2021 12:23:39 +0100, Chris Green wrote:

Set up your backup cards (use at least two and cycle them) using the
same partition scheme as your prime card, and run rsync twice (once on
each partition for each backup session). That's fast: if there are no
changes, rsync only scans the partition being backed up.

If you install your locally developed scripts, programs and man pages in
the /usr/local/* file structure, make sure that's backed up too.

BTW, full marks for keeping a list of the extra packages you've
installed, but have you thought of structuring that list as a script
containing a list of "apt-get install package-name" statements? It's a
nice, zero-cost way of speeding up disaster recovery.
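
One possible variation on that idea, sketched below, keeps the package names in a plain text file (one per line, '#' comments allowed) and lets a tiny script feed them to apt-get. The file layout and function names are made up for illustration, and xargs -r assumes GNU xargs.

```shell
# Sketch: print the package names from a list file, ignoring comments and blanks.
packages_from_list() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Reinstall everything on the list in one apt-get run (needs root).
# -r stops xargs from calling apt-get at all if the list is empty.
reinstall_packages() {
    packages_from_list "$1" | xargs -r apt-get install -y
}
```

If the list lives under /etc or /home it is picked up by the existing nightly backups with no extra work.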


Using rsync the way I've suggested will do that: cloning the backup disk  
then requires:
- creating the two partitions on the new disk
- running dd to copy their contents to the new disk.


--  
Martin    | martin at
Gregorie  | gregorie dot org


Re: Disaster recovery, how to automate as far as possible?



Actually I'd say that's a large part of it. I'll sketch here how you
could make an image of your Pi:

- Create an empty file the size of your SD card or at least large enough
  to hold the file systems.
- Put your partition table from Pi's SD card on there and create the
  root file system.
- On my Pi at least, the /boot partition can be unmounted at runtime,
  so you can do that and then image it directly into your backup
  SD card image and remount. Offset calculation left as an exercise.
- For the / partition, mount your corresponding partition inside the SD
  card image and use the rsync stanza from your namesake above.

And that should do it. I've never done this exactly but I don't see an
issue either. You could even use a real SD card as the target instead of
an image but if you do it every night it might eat up the card...
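
The offset calculation mentioned in the list is just the partition's start sector times the sector size. A small sketch, assuming 512-byte sectors unless told otherwise and start sectors read from `fdisk -l` on the image (function names are made up for illustration):

```shell
# Sketch: byte offset of a partition inside a whole-card image.
# Usage: partition_offset START_SECTOR [SECTOR_SIZE]
partition_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Mount one partition of an image file via a loop device (needs root).
# Usage: mount_image_partition IMAGE START_SECTOR MOUNTPOINT
mount_image_partition() {
    img=$1; start=$2; mnt=$3
    mount -o loop,offset="$(partition_offset "$start")" "$img" "$mnt"
}
```

For example, a partition starting at sector 8192 sits at byte offset 4194304; the actual start sectors have to be checked against your own card.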

For a more advanced solution you could go to the zfs filesystem and use
the snapshot feature there. I don't know zfs nearly well enough to do
that though.  

Re: Disaster recovery, how to automate as far as possible?
On Mon, 2 Aug 2021 10:29:44 +0100


    There are many possibilities.

    Running from mirrored drives provides the fastest recovery from
drive failure - not possible with SD cards on a Pi but should be possible
booting over USB. You do need to check from time to time and replace bad
drives promptly.

    Running a failover server provides the fastest recovery from system
failure - setting up can be tricky and you need something to tell you when
a system is down otherwise you may not notice until the failover goes down.

    Network booting everything from a central server is great and makes
it easy to arrange snapshots - until the central server dies - so don't go
that way unless you can afford to fail over that server. It's also terrible
for maintaining partial service on batteries in a power outage.


    A better option is probably to keep a master image (several copies)
and make a regular backup that is the increment against the master - then
restoring is a bit like assembling a docker image. Copy the master and
apply the latest increment.

    Whatever you do, do not, I implore you, overwrite a single image every
night. Murphy will ensure that the crash happens in the middle of that
operation and then your system and the backup are both hosed[1]. Keep at
least two images.
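
    A minimal sketch of one way to keep two generations: write the new image
under a temporary name, and only once it is complete move the current one aside
and put the new one in its place. The file names pi.img.1 and pi.img.2 are made
up for illustration.

```shell
# Sketch: rotate a freshly written image into a two-generation set.
# Usage: rotate_in NEW_FILE DEST_DIR
rotate_in() {
    new=$1; dir=$2
    # Refuse a missing or empty image rather than rotate a good one away.
    [ -s "$new" ] || return 1
    if [ -e "$dir/pi.img.1" ]; then
        mv -f "$dir/pi.img.1" "$dir/pi.img.2" || return 1
    fi
    mv -f "$new" "$dir/pi.img.1"
}
```

    If the nightly job dies halfway, the half-written file is still under its
temporary name and both older images are untouched.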

[1] This is not imagination - where I saw it happen they hadn't changed the
backup tape in the several months the system had been there and the disc
failed in the middle of the backup run. You should have seen the faces when
I said "Where's last night's tape, this one is corrupt ?". They were
*really* good at changing tapes and rotating the weeklies after that!

--  
Steve O'Hara-Smith                          |   Directable Mirror Arrays
C:\>WIN                                     | A better way to focus the sun