Oops - sorry, the details of the count calculation were wrong. It should
have been 512-byte sectors rather than 1024-byte blocks:
How to calculate count:
count can be found from fdisk:
take "End sector address of last partition" + 1,
multiply by 512 bytes per sector, then
divide by your bs (here 1M).
Example which I used above:
# fdisk -l
Device       Start      End   Blocks  Id  System
/dev/sdc1     8192   131071    61440  0c  FAT32
/dev/sdc2   131072  7684095  3776512  83  Linux
So count = (7684095+1) x 512 / 1048576
         = 7684096 / 2048
         = 3752
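For example, a sketch of the dd run that would use this count; the
image filename here is only a placeholder:
# dd if=/dev/sdc of=rpi-backup.img bs=1M count=3752
That copies just the partitioned part of the card rather than the whole
device.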
Just an aside: if your backup images are important to you, you should
never compress them, especially with gzip. A single bit error in a .gz
file makes it just so much junk bits. With a gzip recovery program you
might get something resembling your original data, but that might
still be useless as a filesystem image.
If you really have to, bzip2 compresses in independent blocks, so you
just lose the block with the error. Block-oriented backup apps like
fsarchiver and partimage do likewise, and they usually have the smarts
to leave out parts of the filesystem that aren't in use, so you get
less data that way.
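If a .bz2 does get damaged, the bzip2 package ships a helper for
salvaging the intact blocks; a rough sketch, with placeholder filenames:
$ bzip2recover rpi-backup.img.bz2
$ bzip2 -t rec*rpi-backup.img.bz2
bzip2recover writes each compressed block out as
rec00001rpi-backup.img.bz2, rec00002rpi-backup.img.bz2 and so on; test
them, drop the broken ones and concatenate the rest.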
So, instead of compression, I'd propose generating error correction data
with par2 for your images.
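Roughly like this with par2cmdline, assuming about 10% redundancy and
placeholder filenames:
$ par2 create -r10 rpi-backup.img
$ par2 verify rpi-backup.img.par2
$ par2 repair rpi-backup.img.par2
The recovery volumes cost a fraction of the image size, and they let
you repair scattered corruption rather than merely detect it.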
I prefer my filesystem images with zero bit errors anyway... so I
put them on hard disks with ECC.
And in fact I prefer a "completely junk bits" situation over a nasty
single bit error that results in mysterious, unexplainable problems.
Your ECC hard disk does you no good when the file is corrupted DURING
TRANSFER, which is the scenario under discussion. As the OP stated,
there is little hope of recovery if the file is compressed. With an
uncompressed (or block compressed) file, at least you can recover
important data. Even with an ECC hard disk, RAID, or whatever scheme
you choose, you can STILL EXPERIENCE BIT CORRUPTION! The ECC presumes
that the data stream written to it was uncorrupted to begin with. That
is far from a reasonable assumption - if you are copying files over a
net connection (TCP delivery is NOT guaranteed), buffers can
overflow/underflow, pipes can fail, any number of things can happen
outside the control of the destination file system. Critical data
should NEVER be compressed with an algorithm that cannot be reversed in
the presence of bit corruption. Period.
All of the above transfer methods have checking algorithms in place
that guard against single-bit corruption.
And again, I prefer a completely garbled image over a single bit
corruption any time. Nothing worse than an error that isn't obvious.
Currently I am using a computer without ECC memory, but after some
experiences with memory errors in the past years I am pretty sure that
the next one will have ECC RAM again.
(previously I always made sure there was at least parity checking)
Pardon me, I accidentally sent that before I finished typing.
The underlying IP layer which TCP sits on top of has no guaranteed
delivery. But that's beside the point - a failure can happen at
multiple endpoints and in multiple failure modes (missing data, single
bit errors, bit reversals, double bit errors, etc.). The ECC of a hard
disk plays no role in ensuring data integrity until AFTER the data has
been written to disk. It cannot detect whether or not that data had
been corrupted EN ROUTE to the hard disk.
And w.r.t. your other remark: such checks have been added to other
places as well. For example, the transfer from the motherboard to
the disk used to be unchecked, but since the introduction of SATA there
is also a CRC check on that connection.
I am with Rob in not wanting an almost correct file.
Rather than using par2 error correction techniques, I would create the
uncompressed image, then checksum it before gzipping. I can then unzip
it and recheck, which would validate the gzipping. I could go further
and check the unzipped file byte for byte against the original with a
re-read, or perhaps just read the original source into a checksum
program. In that case I don't need the initial intermediate uncompressed
image.
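In shell terms, roughly (filenames are placeholders):
$ sha256sum rpi-backup.img > rpi-backup.img.sha256
$ gzip rpi-backup.img
$ zcat rpi-backup.img.gz | sha256sum
If the digest printed by the last command matches the stored one, the
gzip/gunzip round trip is validated.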
A further thought is to use tee when reading the initial file to create
the checksum while creating the gz file.
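Something like this, assuming bash for the process substitution, with
placeholder names throughout:
# dd if=/dev/sdc bs=1M count=3752 | \
    tee >(sha256sum > rpi-backup.img.sha256) | gzip > rpi-backup.img.gz
The checksum is taken over the uncompressed stream, so you can later
run zcat rpi-backup.img.gz | sha256sum and compare digests without ever
writing the uncompressed image to disk.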
Alternatively: use rsync to maintain uncompressed backups.
This is fast because rsync only does the minimum work needed to make the
backup an exact copy of the filing system being backed up. This means
that the first backup is slower than making a dd copy or using tar, but
subsequent backups are a lot faster because unchanged files are not
copied: deleted files are removed from the backup and added or changed
files are backed up again.
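The core of it is a single command; a rough sketch with placeholder
paths:
# rsync -aHAX --delete /mnt/rpi-root/ /mnt/backup-disk/rpi-root/
-a preserves permissions, ownership and timestamps, -H, -A and -X keep
hard links, ACLs and extended attributes, and --delete removes files
from the backup that no longer exist on the source. The trailing
slashes matter to rsync: they mean "the contents of this directory".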
My main backups, which are kept offline, are done this way - I back up
all my systems (all Linux - house server, laptop and RPi) to a USB drive
mounted on the house server and kept offline in a firesafe. There are two
backup disks which are used alternately, so there is a backup copy in the
firesafe at all times. If you're just backing up an RPi, the same trick
will work just as well with two or more SD cards and a USB SD card reader.
rsync is a standard Linux utility program and is part of the Raspbian
distro. You can use it 'bare', but backups are easier if you run it from
a bash script which mounts the SD card, runs rsync configured as you want
it [*] and then unmounts the SD card.
[*] rsync has a lot of options for excluding files, etc. You may also
want to use separate rsync runs for each partition on the main SD card.
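A minimal sketch of such a wrapper; the device names, mount point and
excludes below are examples only, so adjust them to suit:

#!/bin/bash
# Sketch: copy the running Pi onto a spare SD card in a USB reader.
# Adjust the devices and mount point for your own hardware.
set -e
CARD_BOOT=/dev/sda1
CARD_ROOT=/dev/sda2
MNT=/mnt/backup-card

mkdir -p "$MNT"

# Boot partition gets its own rsync run
mount "$CARD_BOOT" "$MNT"
rsync -a --delete /boot/ "$MNT/"
umount "$MNT"

# Root filesystem next; -x keeps rsync from descending into /proc,
# /sys, /boot and other mount points
mount "$CARD_ROOT" "$MNT"
rsync -aHAX -x --delete --exclude='/tmp/*' / "$MNT/"
umount "$MNT"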
martin@ | Martin Gregorie
gregorie. | Essex, UK
I use rsync to back up data using a script, and as you say it is very
efficient. I have a NAS which uses two 2TB drives in RAID 0, and I back
this
up once a day to a 2TB drive on my PC. Might sound a** about face but
the NAS is also used by other machines. I also back up online to Amazon
S3 using a set of bash scripts I wrote.
But for my PC systems I use system images - I like to make a complete
image and keep it intact as a historic copy, not updating it. I figure
systems only change gradually, so occasional copies are fine: restoring
from a slightly old image doesn't need much updating afterwards, and that
is an automatic process anyway. Personal files are different: updating
old copies is a manual operation and it's damn near impossible to find
the incremental source info, so you do need very frequent backups,
preferably with a means of getting back to any older version. I used
SpiderOak for this for a while
because they save older versions, but their Linux support was
non-existent when I had an issue. Wuala was even worse. Which is why I
wrote my own. If an old version isn't on my current local backup I can
find it online.