Shrinking the SD card occupancy

David Taylor · 2013-06-21T07:48:18+00:00

When you start up an RPi you get an option to expand the file system to fill the card. An unfortunate side effect of this is that if you have an 8 GB card (for example), the image of that card you take for backup occupies 8 GB, even if most of the card is empty. Is there a similarly easy way to shrink the card? Say only 2 GB is currently occupied: is there a way to shrink the occupancy back to 2.5 GB or so, so that image backups take less space? I'm using the recommended Win32DiskImager program. (Yes, I know I can compress the resulting .img file with 7-zip, but that does take time...)

Thanks, David

Rob 13 years ago

It depends on the distribution you use. I have always used SuSE variants, and having a separate /home has been standard there for ages. However, I always override it and create a /local containing subdirectories such as /local/home, /local/local and /local/www, with symlinks made from /home, /usr/local, /srv/www etc.

NIS may not be useful at home, but in a company it is just as useful as a Windows domain.

That is not good. When your new version has changes due to updates in the packages, they show up as diffs, and patch would reverse them. What you need is a list of the things that YOU changed in the OLD version; then apply those same changes to the NEW version. If that can still be done easily, do it with patch; if it is more intricate, do it by hand. Sometimes I diff our saved OLD version against the NEW version, but only to see what global changes have been made and whether it makes sense to use patch. (That is OK if there are minor changes, not if the whole structure has been overturned.)
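A minimal Python sketch of the "list of things YOU changed" idea, assuming you kept a pristine copy of the OLD file exactly as the package shipped it. Diffing that pristine copy against your edited copy yields only your own edits, never the distribution's. The function name `my_changes` and the sample sshd_config lines are illustrative, not from the thread.

```python
import difflib


def my_changes(pristine_old: str, edited_old: str, name: str = "config") -> str:
    """Return a unified diff containing only the edits YOU made to the old file.

    pristine_old: the file as the old package shipped it (assumed to be saved).
    edited_old:   your customised copy of that same old file.
    Package changes in the NEW version cannot appear here, so the result is
    exactly the list of changes to re-apply, by patch or by hand.
    """
    diff = difflib.unified_diff(
        pristine_old.splitlines(keepends=True),
        edited_old.splitlines(keepends=True),
        fromfile=f"{name}.orig",
        tofile=f"{name}.local",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    shipped = "Port 22\nPermitRootLogin yes\nX11Forwarding no\n"
    mine = "Port 22\nPermitRootLogin no\nX11Forwarding no\n"
    print(my_changes(shipped, mine, "sshd_config"))
```

The output is an ordinary unified diff, so if the NEW file is still close to the OLD one you can try it with `patch --dry-run` first and fall back to hand-editing when hunks fail.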

That is the kind of change that requires major manual work. Fortunately all our servers are virtual, so it is easy to just create a new server, work on it a bit, test it, and later make it the production server and remove the old one. In the days of physical hardware one was often forced to work over the weekend to do an upgrade, and pray that everything was functional again on Monday morning. No fun.

Disk crashes are of course best cured by RAID and other larger disasters by image backups. Not a problem in a virtual environment or with LVM.

Anssi Saari 13 years ago

Well, at least some versions of dd print stats about their progress when they receive a USR1 signal, and also display the transfer rate when they finish. You could use pv for that info too, though.

Martin Gregorie 13 years ago

Thanks for the info. I think the last time I used it at work it was to let a pair of servers share a common filing system. They also used some form of heartbeat monitoring over a serial link that was supposed to cause comms lines to fail over, etc. In practice it wasn't very reliable at crash detection, let alone any form of failover.

I wasn't at all impressed, but then at the time I was used to using a Stratus or Tandem NonStop box if I needed fault tolerance.

I normally do it by hand: it's unusual to find I need more than 3-4 changes unless I'm dealing with something like Samba or Apache. Fortunately these config files almost never change, so usually the old one works OK.

However, I'll have a think about your method.

Indeed.

Martin Gregorie | Essex, UK | martin@gregorie.org

Martin Gregorie 13 years ago

I don't; I almost never use it. I think the last time I did, it was to make an image file from an OS-9/68K-formatted floppy so I could access it with the os9exec system emulator.

Currently tar (possibly to be replaced by rsnapshot) and rsync are handling my backup needs well enough.

Martin Gregorie | Essex, UK | martin@gregorie.org

Windmill 13 years ago

With dd you can also say 'bs=1024k', which seems in some cases to speed things up (whereas 'bs=1' causes things to crawl along).

Windmill, TiltNot@Nonetel.com (use tmill@Onetel.com) | J.R.R. Tolkien: All that is gold does not glister / Not all who wander are lost

Ivan D. Reid 13 years ago

On Fri, 28 Jun 2013 23:50:28 GMT, Windmill wrote:

I've been known to use much larger block sizes, it seems especially to help when transferring ISOs to USB memory sticks.

Ivan Reid, School of Engineering & Design, Brunel University | CMS Collaboration, Room 40-1-B12, CERN | Ivan.Reid@[brunel.ac.uk|cern.ch] | KotPT -- "for stupidity above and beyond the call of duty".

Rob Morley 13 years ago

Using it with a hard drive I think it makes sense to specify the cache size of the drive as the block size. Specifying numbers lower than the size of the smallest addressable block in the file system is just silly.

Windmill 13 years ago

Well, silly except as an easy way to prove that speed is affected by the block size. Though to what extent I'm not sure. I've never actually timed it, or at any rate not recently, but seem to recall that quite large sizes can be helpful. (To me, a megabyte counts as large, though that's giving away my age.)

Maybe it depends on the output device.

Windmill, TiltNot@Nonetel.com (use tmill@Onetel.com) | J.R.R. Tolkien: All that is gold does not glister / Not all who wander are lost

Robert Riches 13 years ago

If I understand correctly, the kernel does read-ahead on reads and combines adjacent writes. That would seem to imply that there wouldn't be a _HUGE_ difference in performance as long as the block size being considered is large enough that system call CPU overhead is not a big factor.

A block size of 1 would, of course, have a large overhead of a system call per byte. I would guess a block size of even as small as 128 would be sufficient to get _close_ (within about 10 percent) to optimal performance. However, I'd probably use a block size of 1024 or 4096 just to be on the safe side.

Does anyone here have experimental experience with the effect of "reasonable" block sizes on dd performance?
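One way to gather such numbers is a small timing harness. This Python sketch (the function name and the default list of block sizes are assumptions) reads a file sequentially with `os.read()` at several block sizes and reports MB/s. Bear in mind that Python's per-call interpreter overhead is far larger than dd's, so small block sizes will look worse here than with dd itself, and that repeat passes are normally served from the page cache. The demo re-opens a temporary file by name, which works on Linux and macOS but not on Windows.

```python
import os
import tempfile
import time


def read_throughput(path: str, block_sizes=(512, 4096, 1 << 20, 4 << 20)) -> dict:
    """Read `path` sequentially once per block size and return {bs: MB/s}.

    Only the first pass can come from the device; later passes are usually
    served from the page cache. For cold-cache figures, drop caches between
    runs (as root: echo 3 > /proc/sys/vm/drop_caches) or use a file larger
    than RAM.
    """
    results = {}
    for bs in block_sizes:
        fd = os.open(path, os.O_RDONLY)
        total = 0
        start = time.perf_counter()
        try:
            while True:
                chunk = os.read(fd, bs)  # one read(2) system call per block
                if not chunk:
                    break
                total += len(chunk)
        finally:
            os.close(fd)
        elapsed = time.perf_counter() - start
        results[bs] = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return results


if __name__ == "__main__":
    import sys

    with tempfile.NamedTemporaryFile() as tmp:
        if len(sys.argv) > 1:
            target = sys.argv[1]  # e.g. a disk image or a device you may read
        else:
            tmp.write(os.urandom(8 << 20))  # 8 MiB of scratch data for a demo run
            tmp.flush()
            target = tmp.name
        for bs, mbps in read_throughput(target).items():
            print(f"bs={bs:>8}  {mbps:10.1f} MB/s")
```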

Robert Riches spamtrap42@jacob21819.net (Yes, that is one of my email addresses.)

Bob Martin 13 years ago

I use bs=4M when copying SD cards and 3GB is pretty quick.

Martin Gregorie 13 years ago

I think the following is relevant because dd has to acquire a number of block buffers to operate at all, and with very small (1-byte) buffers it will require a lot of them.

I've not repeated any of this under Linux, but the results are likely to be similar, since most library function implementations probably use the same algorithms, if not the same source code. On some *NIX systems the overhead of malloc() calls is surprisingly high. Here are some numbers from DEC Tru64 UNIX, which used a Mach-based kernel and ran on Alpha chips. It was a decent OS that surprised me with the number of developers a rather small server could support without responses becoming noticeably slow.

At the time we were bottlenecked on random access to a rapidly growing data warehouse. We were getting around 350 inserts/sec when we needed at least 1500/sec. We'd done about all we could with DB tuning and decided to try using an in-memory approach using B-trees.

My first attempt used the standard UNIX library tsearch functions for manipulating red-black binary trees. To my surprise this topped out at about 700 inserts/sec. Closer inspection showed that it was mallocing like mad: each insert involved three mallocs (one for space to hold the left, right, key and data pointers, a second for somewhere to put the key and a third for somewhere to put the data). At this point I dug out my copy of Sedgewick and re-implemented the code to use a single piece of memory, i.e. a single malloc(), to hold the left and right pointers as well as the key and data. This pushed the speed up to around 2500 inserts/sec, but that still left little headroom for growth, and startup was slow, since the binary tree had to be built from scratch each time the program started, and it held a LOT of data. Put it this way: my test dataset contained 500,000 data items and the production DB was many times that size; it had hit 300M items the last time I saw it.

My final version initially malloc()ed 1 MB of RAM and doubled that for each following chunk until it reached 16 MB, after which the chunk size stayed constant. This version used its own addressing scheme (chunk number + offset inside the chunk) so that the chunks could be periodically saved to disk, and a restart merely read the chunks back into memory. The new addressing scheme meant that it didn't matter where the chunks were put in memory, as the content addresses were not dependent on system memory addresses. This version ran at 25,000 inserts/sec on the same hardware despite the overheads of its custom addressing scheme.
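The chunk-plus-offset scheme can be sketched in Python as below. This is a hedged illustration of the idea, not Martin's original code: the class name, the single-file save format, the use of `bytearray` chunks and the rule that a record must fit in the first chunk are all assumptions, and the sizes are taken as 1 MiB and 16 MiB.

```python
from __future__ import annotations

import os
import struct
import tempfile


class ChunkedArena:
    """Append-only record store addressed by (chunk number, offset).

    Chunk sizes start at `first` bytes and double until they reach `cap`;
    every later chunk is `cap` bytes. Addresses never depend on where a
    chunk sits in RAM, so the chunks can be saved to disk and read back
    on restart without fixing up any pointers.
    """

    def __init__(self, first: int = 1 << 20, cap: int = 16 << 20):
        self.first, self.cap = first, cap
        self.chunks: list[bytearray] = []
        self.used = 0  # bytes already handed out in the newest chunk

    def _next_size(self) -> int:
        return min(self.first << len(self.chunks), self.cap)

    def store(self, record: bytes) -> tuple[int, int]:
        """Copy `record` into the arena and return its (chunk, offset) address."""
        if len(record) > self.first:
            raise ValueError("sketch assumes records no larger than the first chunk")
        if not self.chunks or self.used + len(record) > len(self.chunks[-1]):
            self.chunks.append(bytearray(self._next_size()))  # one allocation per chunk
            self.used = 0
        chunk_no, offset = len(self.chunks) - 1, self.used
        self.chunks[chunk_no][offset:offset + len(record)] = record
        self.used += len(record)
        return chunk_no, offset

    def load(self, addr: tuple[int, int], length: int) -> bytes:
        chunk_no, offset = addr
        return bytes(self.chunks[chunk_no][offset:offset + length])

    def save(self, path: str) -> None:
        with open(path, "wb") as f:
            f.write(struct.pack("<4Q", self.first, self.cap, len(self.chunks), self.used))
            for chunk in self.chunks:
                f.write(struct.pack("<Q", len(chunk)))
                f.write(chunk)

    @classmethod
    def restore(cls, path: str) -> ChunkedArena:
        with open(path, "rb") as f:
            first, cap, count, used = struct.unpack("<4Q", f.read(32))
            arena = cls(first, cap)
            for _ in range(count):
                (size,) = struct.unpack("<Q", f.read(8))
                arena.chunks.append(bytearray(f.read(size)))
            arena.used = used
        return arena


if __name__ == "__main__":
    arena = ChunkedArena()
    addr = arena.store(b"left|right|key|data")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "arena.bin")
        arena.save(path)
        again = ChunkedArena.restore(path)  # same addresses are valid after restart
    print(addr, again.load(addr, 19))
```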

Bottom line: if you're using a reasonable number of statically allocated buffers the overheads should be small, but if a lot of address calculation is involved, i.e. you have a large number of very small buffers, that may become significant. However, malloc() and free() operations are always likely to be expensive, particularly if there's any garbage collection involved. FWIW the Java VM avoids malloc/free whenever possible by using its own scheme for creating and destroying object instances, and only seems to acquire and release large chunks of system memory through malloc/free calls.

Martin Gregorie | Essex, UK | martin@gregorie.org

Richard Kettlewell 13 years ago

dd uses at most two buffers; for simple copies like the ones discussed here, it only uses one.
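A minimal Python sketch of that simple-copy case, assuming unbuffered file objects so that each `readinto()` is one read(2) call. A single buffer is allocated once and reused for every block, and the loop counts read calls, which shows why bs=1 costs one call per byte. The name `simple_copy` is hypothetical and this is not dd's source.

```python
from __future__ import annotations

import io


def simple_copy(src, dst, bs: int = 512) -> tuple[int, int]:
    """Copy src to dst with one reusable buffer; return (bytes_copied, read_calls).

    With unbuffered files (open(..., buffering=0)) each readinto() is a single
    read(2), so bs=1 means one read and one write system call per byte.
    """
    buf = bytearray(bs)  # the only data buffer, allocated once
    view = memoryview(buf)
    copied = calls = 0
    while True:
        n = src.readinto(buf)
        calls += 1
        if not n:  # 0 bytes read means end of input (blocking files assumed)
            break
        written = 0
        while written < n:  # raw files may accept fewer bytes than offered
            written += dst.write(view[written:n])
        copied += n
    return copied, calls


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10240 bytes of sample data
    for bs in (1, 512, 4096):
        print(bs, simple_copy(io.BytesIO(data), io.BytesIO(), bs))
```

For a real card you would open the device unbuffered, e.g. `open("/dev/sdX", "rb", buffering=0)`, where /dev/sdX is a placeholder for your card reader.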

http://www.greenend.org.uk/rjk/

Rob Morley 13 years ago

Think of the process involved in reading or writing blocks of data smaller than the actual block size that the device uses:

1. read the whole block (minimum 512 bytes, typically 4 kB these days)
2. take a bit of that and copy it to output
3. read the same block again (at least it should be cached now)
4. take a different bit of it and copy it to output (will that cause the previously cached write to be written to disk, or will it be smart enough to combine them? Dunno, I must be feeling stupid ATM)
5. repeat until the end of the block
6. read the next block

That's a different issue from deciding between bs=4k and bs=8M to optimise copy speed. The default dd block size of 512 B is mostly obsolete now that file systems tend to use 4 kB blocks on modern drives.

Richard Kettlewell 13 years ago

Physical writes are deferred, so you'd expect a sequence of sub-block write(2) calls to result in only a single transfer to the physical storage device. The device may do its own buffering too, providing a further opportunity to combine writes.
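A toy Python model of the deferral described above: small `write()` calls accumulate in a block-sized buffer, and the simulated device only ever sees whole blocks, plus a short final one on flush. It is deliberately simplified and does not model the Linux page cache, writeback timing, or the drive's own buffering; `WriteCombiner` is a hypothetical name.

```python
from __future__ import annotations


class WriteCombiner:
    """Toy model: small writes are held back and reach the 'device' as whole blocks."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.pending = bytearray()  # deferred data not yet sent to the device
        self.device_writes: list[bytes] = []  # what the simulated device receives

    def write(self, data: bytes) -> int:
        self.pending += data
        while len(self.pending) >= self.block_size:  # a full block is ready
            self.device_writes.append(bytes(self.pending[: self.block_size]))
            del self.pending[: self.block_size]
        return len(data)

    def flush(self) -> None:
        """Push out any partial block, as a sync or writeback eventually would."""
        if self.pending:
            self.device_writes.append(bytes(self.pending))
            self.pending.clear()


if __name__ == "__main__":
    wc = WriteCombiner()
    for _ in range(10_000):
        wc.write(b"x")  # 10,000 one-byte write() calls
    wc.flush()
    print(len(wc.device_writes), [len(w) for w in wc.device_writes])
```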

(Assumption: modern Linux, and none of O_SYNC, O_DIRECT, or raw devices are in play.)

As for 1-byte buffers, well, with bs=4096 I get ~140 MB/s with a cold cache. With bs=1 I get about 1.7 MB/s (and the state of the cache is rather irrelevant). The back of this envelope says that each iteration of the call stack from dd_copy() down to copy_user_generic() takes about half a microsecond. Not much in isolation, but if you are copying millions or billions of bytes, you are going to notice.
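The envelope arithmetic, written out. Taking the bs=1 rate as almost pure per-call cost gives the per-iteration time (lumping together the read and the write for one block). Adding a crude model, which is an assumption and not something measured in the thread, where each block costs one fixed per-call overhead plus streaming time at about 140 MB/s, lets you estimate other block sizes:

```latex
\begin{aligned}
t_{\text{call}} &\approx \frac{1\ \text{B}}{1.7\times10^{6}\ \text{B/s}} \approx 0.59\ \mu\text{s} \\
\frac{1}{R(b)} &\approx \frac{t_{\text{call}}}{b} + \frac{1}{R_{\max}},
  \qquad \frac{1}{R_{\max}} = \frac{1}{140\ \text{MB/s}} \approx 7.14\ \text{ns/B} \\
b = 128:\quad \frac{1}{R} &\approx 4.61 + 7.14 \approx 11.75\ \text{ns/B} \;\Rightarrow\; R \approx 85\ \text{MB/s} \\
b = 1024:\quad \frac{1}{R} &\approx 0.58 + 7.14 \approx 7.72\ \text{ns/B} \;\Rightarrow\; R \approx 130\ \text{MB/s} \\
8\times10^{9}\ \text{B},\ b = 1:\quad T &\approx 8\times10^{9} \times 0.59\ \mu\text{s} \approx 4700\ \text{s} \approx 78\ \text{min}
\end{aligned}
```

For comparison, the same 8 GB at 140 MB/s takes about 57 s.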

http://www.greenend.org.uk/rjk/

Rob 13 years ago

In the past I used a program called "buffer" to keep a tape streaming when backing up with tar. It allowed a "huge" buffer (for those days) while still writing in the 10K chunks that are standard with tar.

Morten Reistad 13 years ago

Still useful for copying to the net or to USB devices. "apt-get install buffer" will install it.

    tar cf - /usr /var /local /etc /bin /sbin /opt /lib /run /selinux /srv /sys | \
        buffer > /mnt/usbdisk/tarbackup.tar

takes a backup of the standard disk layout, sans /dev /proc
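The idea behind buffer can be sketched in Python with a reader thread and a bounded queue: the reader keeps filling memory while the writer drains it in fixed 10240-byte records (tar's default of 20 x 512-byte blocks). This illustrates the concept under those assumptions and is not how buffer(1) is implemented; `buffered_copy` and the queue depth of 1024 records are hypothetical.

```python
from __future__ import annotations

import io
import queue
import threading

RECORD = 10240  # tar's default record size: 20 blocks x 512 bytes


def buffered_copy(src, dst, record: int = RECORD, max_records: int = 1024) -> int:
    """Copy src to dst through a large in-memory queue; return the number of writes.

    A reader thread keeps filling the queue (up to max_records records, 10 MiB
    with the defaults) while this thread drains it, writing whole `record`-sized
    chunks except possibly the last.
    """
    q: queue.Queue = queue.Queue(maxsize=max_records)
    errors: list[BaseException] = []

    def reader() -> None:
        pending = bytearray()
        try:
            while True:
                data = src.read(record)
                if not data:
                    break
                pending += data
                while len(pending) >= record:  # re-block into fixed-size records
                    q.put(bytes(pending[:record]))
                    del pending[:record]
            if pending:
                q.put(bytes(pending))  # short final record
        except BaseException as exc:  # hand the error over to the writer side
            errors.append(exc)
        finally:
            q.put(None)  # end-of-stream marker

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    writes = 0
    while (chunk := q.get()) is not None:
        dst.write(chunk)
        writes += 1
    t.join()
    if errors:
        raise errors[0]
    return writes


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 25_000)
    dst = io.BytesIO()
    print(buffered_copy(src, dst), "writes,", len(dst.getvalue()), "bytes")
```

To use it as a filter in a pipeline like the one above, a script could call `buffered_copy(sys.stdin.buffer, sys.stdout.buffer)`.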

-- mrr

Anssi Saari 13 years ago

I just made a few tries on my Linux desktop, just reads from the hard disk to /dev/null. No difference between 4k and 1M blocks or dd's default of 512 bytes: the same 86.1 MB/s, which is pretty much what my oldish disk can deliver.

Robert Riches 13 years ago

That's good news. It means the kernel's readahead strategy works well. I'm not surprised at that.

Robert Riches spamtrap42@jacob21819.net (Yes, that is one of my email addresses.)

Rob 13 years ago

Some things work well, others not so well. On my desktop systems I always use RAID-1, these days with two SATA drives. You would expect double the read rate from a clever RAID-1, but you won't get it. It can schedule reads to both drives (it is not as stupid as old "hardware RAID" controllers that direct all reads to a single drive), but the read-ahead apparently is not clever enough to distribute reads across the drives.

Gordon Henderson 13 years ago

No, you wouldn't, and I don't know why people expect this. RAID-0, yes: double the speed, as it can read from both drives concurrently. But RAID-1, no. It's just not designed to work that way.

And don't just take my word for it: go read the archives of the linux-raid mailing list. This crops up about 2 or 3 times a year.

RAID-1 reads sequentially from all drives (there can be more than 2 in a RAID-1), but never from both drives concurrently. Each drive has an identical copy of the data, whereas with RAID-0 each drive has half the data.
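A small Python sketch of the two layouts described here, assuming a simple striped RAID-0 with a fixed chunk size measured in blocks and a plain mirror for RAID-1. Real md arrays add details, such as members of unequal size, that are ignored here, and the function names are hypothetical.

```python
from __future__ import annotations


def raid0_location(lba: int, disks: int, chunk: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk, block on that disk) in a striped RAID-0 set.

    `chunk` is the stripe unit in blocks: consecutive chunks go to consecutive
    disks, so each disk holds 1/disks of the data and a long sequential read
    keeps every disk busy at once.
    """
    stripe, within = divmod(lba, chunk)
    disk = stripe % disks
    block_on_disk = (stripe // disks) * chunk + within
    return disk, block_on_disk


def raid1_locations(lba: int, disks: int) -> list[tuple[int, int]]:
    """In a RAID-1 mirror every disk holds every block, at the same position."""
    return [(d, lba) for d in range(disks)]


if __name__ == "__main__":
    for lba in range(6):
        print(lba, "RAID-0:", raid0_location(lba, 2), "RAID-1:", raid1_locations(lba, 2))
```

In the RAID-0 mapping, consecutive blocks alternate between disks. In the mirror, any single copy can serve a given block, so the layout itself gives a single sequential stream no built-in reason to run faster than one disk.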

So if Linux does read ahead then it is possible to get a margin of gain when you've asked it to read more than one block at a time, but in practice it rarely happens.

You want speed, use RAID-0, or RAID-10. Want redundancy, use RAID-1 (or RAID-5,6,10)

Gordon
