Failure to clone SD card with compression

Russell Gadd · 2016-01-07T20:09:40+00:00

I am having difficulty using gzip for compressing images of SD cards. If I use command lines such as

    # dd bs=1M count=3752 if=/dev/sdc conv=sync,noerror of=/
    # dd bs=1M count=3752 if=/ of=/dev/sdc conv=sync,noerror

for making the image and restoring it to another card, it works ok. However if I use compression like this:

    # dd bs=1M count=3752 if=/dev/sdc conv=sync,noerror | gzip -c >/
    # gunzip -c / | dd bs=1M count=3752 of=/dev/sdc conv=sync,noerror

it fails to restore the image correctly. The MBR and the partitions cloned are not the same as the original. What error(s) am I making?

Russell Gadd 10 years ago

Oops - sorry, the details of the calculation of count were wrong. It should have used 512-byte sectors rather than 1024-byte blocks:

How to calculate count: take the "End" sector address of the last partition from fdisk, add 1, multiply by 512 (bytes per sector), then divide by your bs in bytes (here 1M = 1048576).

Example which I used above:

    # fdisk -l
    ...
                Start      End   Blocks  Id  System
    /dev/sdc1    8192   131071    61440  0c  FAT32
    /dev/sdc2  131072  7684095  3776512  83  Linux

So count = (7684095 + 1) x 512 / 1048576 = 7684096 / 2048 = 3752
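The same arithmetic can be checked with shell integer arithmetic. This is only a sketch: the variable names are made up for illustration, and the end sector is the one from the fdisk listing above. The division rounds up, which is an added precaution for cards where the last partition does not end on a 1M boundary.

```shell
# End sector of the last partition, from the fdisk listing above.
end_sector=7684095
sector_bytes=512               # bytes per sector
bs_bytes=$((1024 * 1024))      # dd block size, bs=1M

# Sectors are numbered from 0, so the image covers end_sector + 1 sectors.
total_bytes=$(( (end_sector + 1) * sector_bytes ))

# Round up so that a partial final block would still be copied.
count=$(( (total_bytes + bs_bytes - 1) / bs_bytes ))
echo "count=$count"
```

For the card in this thread the result is 3752, matching the hand calculation.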

Adam Funk 10 years ago

Yes, I'm not saying "conv=sync,noerror" is right....

Anssi Saari 10 years ago

Just an aside: if your backup images are important to you, you should never compress them, especially with gzip. A single bit error in a gz file makes it just so much junk bits. Or with the program gzip recovery you might get something resembling your original data but that might still be useless for a file system image.

If you really have to, bzip2 uses blocks so you just lose the block with the error. Block oriented backup apps like fsarchiver and partimage do likewise and they usually have the smarts to just leave out parts of the filesystem that aren't in use so you get less data that way.

So, instead of compression, I'd propose generating error correction data with par2 for your images.

Rob 10 years ago

I prefer my filesystem images with zero bit errors anyway... so I put them on harddisks with ECC.

And in fact I prefer a "completely junk bits" situation over a nasty single bit error that results in mysterious, unexplainable problems.

CPMDude 10 years ago

Your ECC hard disk does you no good when the file is corrupted DURING TRANSFER, which is the scenario under discussion. As the OP stated, there is little hope of recovery if the file is compressed. With an uncompressed (or block compressed) file, at least you can recover important data. Even with an ECC hard disk, RAID, or whatever scheme you choose, you can STILL EXPERIENCE BIT CORRUPTION! The ECC presumes that the data stream written to it was uncorrupted to begin with. That is far from a reasonable assumption - if you are copying files over a net connection (TCP delivery is NOT guaranteed), buffers can overflow/underflow, pipes can fail, any number of things can happen outside the control of the destination file system. Critical data should NEVER be compressed with an algorithm that cannot be reversed in the presence of bit corruption. Period.

Rob 10 years ago

All of the above transfer methods have checking algorithms in place that guard against single-bit corruption.

And again, I prefer a completely garbled image over a single bit corruption any time. Nothing worse than an error that isn't obvious.

Currently I am using a computer without ECC memory, but after some experiences with memory errors in the past years I am pretty sure that the next one will have ECC RAM again. (previously I always made sure there was at least parity checking)

The Natural Philosopher 10 years ago

Yes, it is.

CPMDude 10 years ago

No, it's not.

CPMDude 10 years ago

Pardon me, I accidentally sent that before I finished typing.

The underlying IP layer which TCP sits on top of has no guaranteed delivery. But that's beside the point - a failure can happen at multiple endpoints and in multiple failure modes (missing data, single bit errors, bit reversals, double bit errors, etc.). The ECC of a hard disk plays no role in ensuring data integrity until AFTER the data has been written to disk. It cannot detect whether or not that data had been corrupted EN ROUTE to the hard disk.

The Natural Philosopher 10 years ago

No, that's what the TCP stack is for.

TCP is guaranteed to either be uncorrupted, or to fail completely.

Rob 10 years ago

The TCP header has a checksum field that guards the data against single-bit errors that might occur in the IP layer.

Rob 10 years ago

And w.r.t. your other remark: such checks have been added to other places as well. For example, the transfer from the motherboard to the disk used to be unchecked, but since the introduction of SATA there also is a CRC check on that connection.

The Natural Philosopher 10 years ago

More than single bit. It has to be a pretty organised sort of corruption that defeats the error checking.

Russell Gadd 10 years ago

I am with Rob in not wanting an almost correct file.

Rather than using par2 error correction techniques, I would create the uncompressed image then checksum it before gzipping. I can then unzip it and recheck which would validate the gzipping. I could go further and check the unzipped file byte for byte against the original with a re-read, or perhaps just read the original source into a checksum program. In that case I don't need the initial intermediate uncompressed image.

A further thought is to use tee when reading the initial file to create the checksum while creating the gz file.
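A rough sketch of the tee idea follows, assuming sha256sum as the checksum program. SRC and OUT are hypothetical names for the card device and the compressed image; when they are not set, a small throwaway file stands in so the sketch can be tried without touching a real card. A named pipe is used so that it also works in shells without process substitution.

```shell
# Hypothetical names: SRC is the device or file to image, OUT the gzipped image.
workdir=$(mktemp -d)
src=${SRC:-$workdir/demo.img}
out=${OUT:-$workdir/image.gz}
# Stand-in data for a dry run when no real source is given.
[ -e "$src" ] || head -c 1048576 /dev/urandom > "$src"

# tee copies the stream into a named pipe for the checksum program
# while gzip compresses the very same bytes.
fifo=$workdir/sum.fifo
mkfifo "$fifo"
sha256sum < "$fifo" > "$workdir/image.sha256" &
dd if="$src" bs=1M 2>/dev/null | tee "$fifo" | gzip -c > "$out"
wait    # let the background checksum finish

# Validate the gzipping: decompress and compare against the stored checksum.
orig_sum=$(cut -d' ' -f1 "$workdir/image.sha256")
gz_sum=$(gunzip -c "$out" | sha256sum | cut -d' ' -f1)
[ "$orig_sum" = "$gz_sum" ] && echo "image verified"
```

For a real card, a count= limit as calculated earlier in the thread would be added to the dd command. The source is read only once, so no intermediate uncompressed image is needed.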

Martin Gregorie 10 years ago

Alternatively: use rsync to maintain uncompressed backups.

This is fast because rsync only does the minimum work needed to make the backup an exact copy of the filing system being backed up. This means that the first backup is slower than making a dd copy or using tar, but subsequent backups are a lot faster because unchanged files are not copied: deleted files are removed from the backup and added or changed files are backed up again.

My main backups, which are kept offline, are done this way - I back up all my systems (all Linux - house server, laptop and RPi) to a USB drive mounted on the house server and kept offline in a firesafe. There are two backup disks which are used alternately, so there is a backup copy in the firesafe at all times. If you're just backing up an RPi, the same trick will work just as well with two or more SD cards and a USB SD card reader.

rsync is a standard Linux utility program and is part of the Raspbian distro. You can use it 'bare', but backups are easier if you run it from a bash script which mounts the SD card, runs rsync configured as you want it [*] and then unmounts the SD card.

[*] rsync has a lot of options for excluding files, etc. You may also want to use separate rsync runs for each partition on the main SD card.

fruit 10 years ago

+1

For those who find it confusing there is also grsync - rsync with a gui wrapper.

If nothing else it's useful for generating suitable command lines - and a useful tutorial here

fruit

Russell Gadd 10 years ago

I use rsync to back up data using a script, and as you say it is very efficient. I have a NAS which uses two 2TB drives in RAID 0 and I back this up once a day to a 2TB drive on my PC. Might sound a** about face but the NAS is also used by other machines. I also back up online to Amazon S3 using a set of bash scripts I wrote.

But for my PC systems I use system images - I like to make a complete image and keep it intact as a historic copy, not updating it. I figure systems only change gradually so occasional copies are fine. Restoration from a backup doesn't need much updating and it's an automatic process anyway, whereas for personal files, updating old copies is a manual operation and damn near impossible to find the incremental source info anyway, so you do need very frequent updates, preferably with means to get back to any older version. I used SpiderOak for this for a while because they save older versions, but their Linux support was non-existent when I had an issue. Wuala was even worse. Which is why I wrote my own. If an old version isn't on my current local backup I can find it online.

Rob Morley 10 years ago

Not useful when your backup is a drive image, which is what we were discussing here (and, I suspect, how Pi users normally roll).

Rob 10 years ago

gzip already does that internally...

Russell Gadd 10 years ago

I guess the advantage is that you can re-read the restored card to check you get the same checksum. Then you're just relying on the first read which created the gz file not to be in error.
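Re-reading the restored card might look something like the sketch below. IMG and TARGET are hypothetical names for the gzipped image and the card being written; when they are not set, throwaway files stand in so nothing real is overwritten. GNU dd documents conv=sync as padding every input block up to the block size with zero bytes, and reads from a pipe can come back shorter than bs, so that option is left out here. Whether that is what broke the restore in this thread is only a guess.

```shell
# Hypothetical names: IMG is the gzipped image, TARGET the card to restore to.
workdir=$(mktemp -d)
img=${IMG:-$workdir/image.gz}
target=${TARGET:-$workdir/card.img}
# Stand-in image for a dry run when no real one is given.
[ -e "$img" ] || head -c 1048576 /dev/urandom | gzip -c > "$img"

# Checksum and byte length of the uncompressed image.
want_sum=$(gunzip -c "$img" | sha256sum | cut -d' ' -f1)
img_bytes=$(gunzip -c "$img" | wc -c | tr -d ' ')

# Restore without conv=sync, so short reads from the pipe are not zero-padded.
gunzip -c "$img" | dd of="$target" bs=1M 2>/dev/null

# Re-read only as many bytes as the image holds; the card may be larger.
got_sum=$(head -c "$img_bytes" "$target" | sha256sum | cut -d' ' -f1)
[ "$want_sum" = "$got_sum" ] && echo "restore verified"
```

On real hardware it is probably worth removing and re-inserting the card before the re-read, so that the data comes from the card itself rather than from the kernel's cache.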
