Disk imaging strategy

[I hope you mean 'tar -zcvf partition_sda1_image.tgz /mnt' or something like that; otherwise, you'll get *really* good compression.]

It depends.

'dd'ing the raw partition is almost guaranteed to produce a working image after unpacking. If you 'tar' a mounted file system, the operating system you run the 'tar' on must support all nuances of the file system you want to clone. Back in Win95 days, cloning (or backup/restore) a Win95 installation using 'tar' from Linux did not work, because it did not restore all required file attributes. I wouldn't expect Linux 'tar' to capture all NTFS attributes (like "compressed", ACLs, ADS) either. Copying the partition blockwise would not have any of these problems.
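
To make that concrete, here is a rough, untested sketch of the two approaches; the device name and mount point are made up:

    # Block-level: copies every sector, attributes and all, without
    # needing to understand the filesystem.
    dd if=/dev/sda1 of=sda1.img bs=1M

    # File-level: only safe if the running kernel fully understands
    # the filesystem (attributes, ACLs, ADS, ...).
    mount /dev/sda1 /mnt
    tar -zcvf partition_sda1_image.tgz -C /mnt .
    umount /mnt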

Here's an interesting read that also shows some of the problems of copying large amounts of data file-wise:

formatting link
But in this case, the user couldn't avoid it.

Stefan

Reply to
Stefan Reuther

Then you have to be working directly with the drive interface because the BIOS block interface in many systems isn't able to fully address a large drive.

I'm sure you're familiar with the [paraphrased] warning: "this partition may not be bootable because it starts or lies partly above XXXX cylinders". That's telling you the BIOS interface can't handle it.

Try MP4s of the top 40 rap music videos ... you'll wind up erasing everything on the drive 8-)

George

Reply to
George Neuner

On a sunny day (Mon, 03 Nov 2014 10:17:18 -0700) it happened Don Y wrote:

??? Did you read that I typed 'mount'??? Maybe you should read my other replies, where I stated you need a similar file system for 'regeneration', for example for file name length.

When you use mount it will TELL you the filesystem. I have tgz'ed whole partitions of real servers with no problem, made a new partition on my laptop, untarred it there, added a grub boot entry, and it runs there too, without a kernel recompile. OK, only in normal resolution, as the old kernel has no drivers for my super new dual graphics card laptop, but it is very useful to run all my scripts. And as a backup. A sketch of the workflow follows below.
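
Roughly, as an untested sketch (device names, mount points and the tgz name are made up; ext3 is just an example of a compatible filesystem):

    # Archive the files of one partition...
    mount /dev/sda3 /mnt/src
    tar -C /mnt/src -zcf server_root.tgz .
    # ...and replant them on a freshly made, compatible filesystem.
    mkfs.ext3 /dev/sdb5
    mount /dev/sdb5 /mnt/dst
    tar -C /mnt/dst -zxpf server_root.tgz
    # then add a grub boot entry pointing at the new partition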

That only matters if you copy images. Keep good documentation of the images or tgzs you create and you KNOW the sizes.

Maybe you live on another planet, but on mine I have found that disk sizes always increase, same for (SD) card sizes; I think nobody is going to copy an image to a smaller disk.

I have made copies that way, by dd if=/dev/sda of=my_sda_backup.img, and then dd it to another hard disk. Problem there is you f*ck up the partition table too. If you do it one partition at a time you do not have that problem, but then you have the huge file sizes, even if there is nothing of value in them.
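
The two variants, as an untested sketch (device and file names made up):

    # Whole disk: captures the partition table too, and clobbers the
    # target's table when you dd it back.
    dd if=/dev/sda of=my_sda_backup.img bs=1M

    # One partition at a time: leaves the target's partition table
    # alone, but the image is as big as the partition, used or not.
    dd if=/dev/sda1 of=my_sda1_backup.img bs=1M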

For small things this works OK; for example Navigatrix on a 16 GB USB stick, I have made several copies that way. But on a 1 TB disk with 250 and 500 GB partitions it is not practical. If you really want to clone for students, you destroy all old work on the target that way. It is then much safer to use some backup utility, as others pointed out.

No, you must live in another world.

You should not change partition IDs, nor mess at that level with the system. Why do that? File systems are there to allow many files. I have stored many movies as images on DVD-R... but that is different: sectors sequential, no authoring... write on the disc what it is and how to play it back. Here is an entry from my database, just a text file basically:

814 DVD+R Verbatim inkjet printable 16x NEC 7173 the_gumball_rally.ts as image

-rw-r--r-- 1 root root 3153536064 2011-04-03 18:45 the_gumball_rally.ts Burned 2.4x

That is a transport stream file (as recorded from satellite) with all PIDs relevant to that program in it, often including teletext (Ceefax) for that day. You can play that with mplayer /dev/dvd
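
One way to do such an image burn is with the dvd+rw-tools (shown as an illustration only; the exact tool and options may differ):

    # Burn the transport stream byte-for-byte as the disc contents...
    growisofs -dvd-compat -Z /dev/dvd=the_gumball_rally.ts
    # ...and play the raw device back directly:
    mplayer /dev/dvd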

All disks are numbered, all cards and USB sticks are numbered, all is in the database.

Deleted data is a filesystem-specific issue. You are confusing two things. Either you make an image with everything (compressed or not), or you use a filesystem and compress the current files only.

Deleted or not deleted, and what is put in the sectors that are deleted, is filesystem specific. And if you do not even know what filesystem you are using, you should not be messing with disks on a computah at all. Really.

Reply to
Jan Panteltje

On a sunny day (Mon, 03 Nov 2014 12:11:28 -0500) it happened George Neuner wrote:

Almost all my systems use reiserfs, and most of it is full with video content for video editing: very long files (several GB each). Most access is sequential; I have never ever had a speed problem, even with a < 1 GHz processor.

What is funny is that I bought a key for the Raspberry Pi MPEG2 decoder, and somehow was running it at full speed from SD card (not a high-end one), and even then it was running on that 500 GHz or so processor at > 50 fps. The raspi does HD too, from SD card. For video, the codecs and maybe the graphics card limit the speed. When I run Linux transcode, that sets the speed (ffmpeg etc.). It is possible the occasional seek on a hard disk happens, but for example when using mplayer I always use -cache with some megabytes, and Linux caches everything anyway. So it really makes no difference in performance at all, in my experience.

Reply to
Jan Panteltje

On Mon, 03 Nov 2014 18:47:37 GMT, Jan Panteltje Gave us:

> running on that 500 GHz or so processor
                  ^^^^^^^

Is that all you could get it to do?

Why didn't you calculate the TOE for us with it?

Reply to
DecadentLinuxUserNumeroUno

On a sunny day (Mon, 03 Nov 2014 18:56:05 +0100) it happened Stefan Reuther wrote:

Of course, I think I mentioned that compatibility requirement.

I take your word for it; I left MS software in 1998 when I found a copy of SLS Linux on a CD. I do have an old system with Win 98 in a partition. Win 98 runs, but the screen is low-res; it has no driver for the newer graphics card, or I did not look hard enough. It does have a driver for my Canon flatbed scanner though, something Linux does not. It is a > 10 year old Seagate that was on 24/7; so far it still seems error free. I did copy that partition with dd to some place as backup, but hey, I still have the original Win 98 disk... if things go wrong. Not sure I would bother though...

True

That is why I tar things; for MS Windows I really do not know if that works. I burned my XP disk and made a video of it. It is available for 98 Euro; before playing it you need to glue a sticker on your PC, blah blah. Man, that XP sucked. I have heard MS has gotten worse since.

Reply to
Jan Panteltje

Yes! And you should read that I said "REGARDLESS OF THE FILESYSTEM(s) contained thereon". You've missed the very ESSENCE of my question!

By typing "mount", you are relying on mount's UNDERSTANDING of the filesystem. What do you do when you type "mount" and you get a reply: Operation not supported by device (or, whatever the equivalent "kernel lacks support for the filesystem indicated by the device's contents)?

I.e., add a test after the "mount" in your script:

if [ "$?" -ne 0 ]; then
    echo "Gee, I can't mount that sort of filesystem!" \
         "I'll handle this in some other way..."
fi

where the body of the "if" contains whatever commands you deem necessary to create that image!

No, it won't. It will only tell you which filesystems it *recognizes* (typically by examining magic numbers). And, it will only actually *do* the mount if the specific mount_* helper is executable on your system.
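
(For the curious: on a Linux box you can probe without mounting; a sketch, with a made-up device name:)

    # Ask what's on the partition without mounting it.
    blkid /dev/sdb1        # prints TYPE="ntfs", TYPE="ext4", ... if recognized
    file -s /dev/sdb1      # examines the superblock/magic bytes directly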

I have a laptop in my hands. (Actually, I have six of them.) They were purchased in their current state. The partition table contains partitions having magic numbers of 0xde, 0x07, 0x0f and 0xdb.

How does *your* script handle them? Take your time replying. Make sure you look at the sources instead of just SPECULATING about how THOSE SPECIFIC FS TYPES are handled! You may be surprised.

(Hint, a reasonably current Clonezilla won't recognize them!)

That's not the question I posed: "I went skiing in Switzerland"

Have you missed the entire point of this thread? Even the subject line makes that pretty clear.

Disk sizes *can* increase. Partition sizes (which is what we are concerned with; filesystems don't deal with "disks" but, rather, *partitions*) can move up or down at will.

Last month, I moved a NetBSD system that I initially created as a single partition on a 12G (!) disk onto newer hardware. The 12G single partition had value when the media was only 12G -- why risk partitioning it into multiple partitions and possibly ending up with "extra space" in one partition and "not enough" in another?

New disk was 640G. Foolish to treat the entire disk as a single partition. Equally foolish to treat it as a 12G partition (mounted as /) with a 628G partition (mounted as /ExtraSpace). So, *shrink* the partition to a suitable size for the new volume even though the new volume is 50 times larger!

I have a disk. It currently contains 1% of its capacity as "in use". The *one* file just happens to be located in sector 100 (out of 100 total sectors in that partition). I now want to shrink the partition to be 90 sectors. WITHOUT MOVING THE SECTOR'S CONTENTS (because you claim that is not required on YOUR planet), how do you do this?
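
Note that the fs-aware tools answer this by MOVING data. E.g., on Linux/ext4 (an untested sketch; device name and sizes made up):

    # resize2fs relocates in-use blocks below the new boundary --
    # i.e. it MOVES the data -- before the filesystem shrinks.
    umount /dev/sdb1
    e2fsck -f /dev/sdb1        # resize2fs insists on a clean filesystem
    resize2fs /dev/sdb1 90M    # relocate data, then shrink
    # only then can the partition itself safely be shrunk (e.g. with parted)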

Ask the appliance manufacturer why he wants to introduce a new/proprietary partition format! Do you think he is going to put a label on the outside of the box that says, "Warning! Proprietary filesystem format used. You won't be able to use standard tools to manage this media!"?

What does this have to do with my question? (Jeez, and Clifford claims I don't "read carefully"...)

You should probably go back and dig through the sources. Start with mount(8) so you understand the concept of how different filesystems are detected *by* mount. Use the four filesystem types mentioned above as examples so you know how mount *will* handle them! Then look through the various mount_*(8) executables for more specific issues related to each specific filesystem implementation.
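
On a BSD box you can see at a glance which filesystems mount(8) can actually hand off to, since each type has its own helper (illustrative only; the exact list varies per system):

    # mount(8) dispatches to per-fstype helpers; list what's installed.
    ls /sbin/mount_*
    # typically mount_ffs, mount_msdos, mount_cd9660, ... on NetBSD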

*Then* tell me how you're going to know which "files" are present on the media -- and where they reside! So your "image" contains them and not "deleted cruft".

Until then, you're just speculating.

Reply to
Don Y

Exactly. But, you don't want your image to HAVE TO BE as large as the original. Esp as most disks have a fair bit of unused space.

Hence the problem I posed: how do you sort out what is "unused space" from "used space" -- in a manner that allows you to ignore the actual metadata/etc. imposed by the particular filesystem implementation.

E.g., I have a list of >100 different fstypes. Why even bother to sort out what they all mean and how they all perform? What's the likelihood that you will CORRECTLY (bug free) implement handlers for each of those types??

OTOH, dd | gzip preserves everything about the filesystem "ignorantly": " Nah, I don't know what all these bits mean. I just make sure I've got EVERY LAST ONE OF THEM!"

I contend that you can improve the "dd|compress" approach by putting highly compressible data into the "unused" sectors of the media. In doing so, you EFFECTIVELY remove the unused sectors from the image that you create. In much the same way that a fs-aware utility effectively notes "this sector was not copied because it was not in use".
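
A minimal, untested sketch of that trick (device name, mount point and file names made up):

    # Stuff the free space with highly compressible data (zeros)...
    mount /dev/sdb1 /mnt/target
    dd if=/dev/zero of=/mnt/target/FILLER bs=1M   # runs until "No space left on device"
    rm /mnt/target/FILLER
    umount /mnt/target
    # ...then the "unused" sectors compress to almost nothing:
    dd if=/dev/sdb1 bs=1M | gzip > sdb1.img.gz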

Reply to
Don Y

All the machines I've seen handle LBA48 in the BIOS. I think that goes back at least 10 years (?). Anything that old wouldn't be worth the time to install an OS! :-/ (let alone trying to find drivers, etc.).

I think the partition table craps out at 2-4TB (but, unlikely to have that large a disk spinning on the boot drive for anything *I* will ever use -- or encounter in my pro bono gigs!)

Some of my SPARCs still have the 2G limit on the bootstrap's location.

I've been told that causes the heads to drop onto the platters WITH EXTREME PREJUDICE! :>

Reply to
Don Y

I'm going to GUESS that your FS is "proprietary" (not mainstream).

So, a potential test for my envisioned approach! :>

First, can a user create files having arbitrary names and contents under your FS?

Can he copy & rename files?

E.g., could he introduce a file having some particular contents (like "DELETEDDELETEDDELETEDDELETED...") to your FS? Then, could he replicate it many times? (copy to a different filename)

Having done that until the copy failed ("No space left on device"), presumably, he could delete each of them? (perhaps made simpler by creating them all in a single subdirectory/folder and then just deleting the folder AND its contents)

Could I then examine your disk AT THE SECTOR LEVEL and expect to find lots of "DELETEDDELETEDDELETED..." in sectors?

In doing so, effectively know which sectors are currently "unused"? (Or, at the very least, know which are safe to restore with "DELETEDDELETEDDELETED..." as their contents, WITHOUT actually having to store an instance of "DELETEDDELETEDDELETED..." for EACH such sector?)
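
In script form, the probe might look like this (untested; mount point and names made up):

    # Flood the filesystem with a recognizable pattern, then delete it.
    # Any sector that still reads back "DELETED..." afterwards is
    # (almost certainly) unused space.
    mkdir /mnt/target/fill
    yes DELETEDDELETEDDELETED > /mnt/target/fill/filler   # stops on "No space left"
    rm -rf /mnt/target/fill
    # a raw scan of the device can now map the "unused" sectors
    # (one big file may hit per-file size limits; hence the
    # copy-it-many-times variant described above)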

[BTW, we are now below 25C. Almost feels comfortable!! :> ]
Reply to
Don Y

Don - using a "boot loader" approach does sound good, especially if you're using it often.

Hul

Reply to
Hul Tytus

I am always leery of anything to do with PCs and their ilk. They seem to undergo frequent fundamental changes, at times "unnecessarily" -- as if to change just for the sake of change!

So, an approach that I *think* will work well with the machines that I have available to me *today* may end up completely useless with the next model year, etc.

If you're doing something "for yourself", this is a manageable risk -- you simply decide when the added effort to "chase the newest" is worth the effort TO YOU. OTOH, when you are doing something for *others*, it's really hard not to just throw up your hands and say, "Sorry, I've got better things to do with my time than REDO something that was already working!"

Reply to
Don Y

I've seen that before; unfortunately he's off on some of the details. First, it's a 3390 module (an "HDA", which was actually two drives or "actuators"), so it's from no earlier than 1989 (not late 70s/early 80s), and it's not 10MB, it's about 1.89GB (assuming it's a model-1), or 3.78GB (if it's a double density model-2), and there were additional models of higher capacity later. It's also not worth $250K - you could buy an entire -B28 for $275K at the time, and that contained six double density modules (HDAs) of the type he disassembled. You'd usually buy a "string" of three units (one -Axx and two -Bxx units), for a total of 16 HDAs, which would set you back about $750K. So the value is more along the lines of $50K (assuming the enclosure is free).

You'd also need a controller.

Still a cool tear-down.

Reply to
Robert Wessel

Of course it can, and it does. Unless you can imagine some way it could always assure a contiguous allocation for a file, whether written all at once or in parts, it will have to fragment. There are, of course, various strategies to reduce fragmentation, most basically some additional cleverness in selecting the next disk block to add to a file, but even some DOS/FAT systems did some of those things.

In the case of ext#, rather more aggressive anti-fragmentation strategies are in place (most notably consecutively created files are allocated somewhat scattered across the disk, making it likelier that there will be unused blocks immediately after the file when additional writes occur), but they still break down as the disk fills up.

If you're running ext4, try an "e4defrag -v" to see just how much fragmentation is on your volume.

Or do an "fsck -nvf /dev/hda###". For any of the ext FS's, that should report the number of non-contiguous files.
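
There's also filefrag from e2fsprogs for a per-file view (the path is made up):

    # List every extent a particular file occupies (ext2/3/4).
    filefrag -v /home/user/bigfile.ts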

OTOH, the "need" for defragmentation, even on FAT volumes, is greatly overstated. Yes sometimes you do, but nothing like what's generally assumed.

Reply to
Robert Wessel

On Mon, 03 Nov 2014 17:52:40 -0600, Robert Wessel Gave us:

Only on a FAT type system. The "previously in use" sectors of a deleted ext2 file become immediately available, AFAIR. Things seldom "have to fragment", and if it is a large file, those "fragments" are going to be HUGE, and are NOT what the term was coined for.

Reply to
DecadentLinuxUserNumeroUno

No, on _any_ general purpose file system. It is inevitable. You seem to be of the impression that Linux is somehow the ultimate system; I've noticed it in a couple of threads now. It isn't, and it certainly can't do the impossible. The only sure ways to avoid fragmentation impose a loss of generality. ISTR at least one of the mainframe systems required you to specify the maximum size on file creation so the space could be set aside for it (may have been MVS, but I may be wrong; it's before my time), but that isn't compatible with the way most software operates, which expects files to be growable. Similarly, some WORM systems never fragment, since a modified file gets re-written in its entirety (e.g. multisession ISO9660), but the cost in performance and storage space would again be unattractive for general use.

FAT was particularly prone to fragmentation by modern standards thanks to its heritage - it was designed for small floppy disks and the principal design constraint was memory consumption. To that end the free list wasn't sorted in any manner, which is where the bulk of the fragmentation came from. On the other hand, if a file has to grow by a block but the block after the end of the file is already in use, it _has_ to be fragmented - well, you could move one or both files, but if they're 2GB apiece that costs more than a small amount of fragmentation ever would. Modern file systems are designed according to a range of criteria, and minimising fragmentation is on that list, but it isn't the most important criterion, or even top of the list where performance is concerned. Techniques such as cylinder grouping improve access time but _cause_ fragmentation to some extent. If a file is larger than a group it will be fragmented, even if it is the only file on the disk.

None of this really matters except to the "my system is better than yours" advocates. Obviously if that 2GB file is broken down into a million fragments averaging 2K apiece that is a problem, and you could find yourself in situations almost that bad with FAT. On the other hand if it's broken into a hundred fragments of 20MB each, no-one should really care for most purposes.

--
Andrew Smallshaw 
andrews@sdf.lonestar.org
Reply to
Andrew Smallshaw

I might have considered being insulted if this line of argument wasn't so comical.

Name me a single file on your hard drive that was _not_ zero bytes long on creation. Ignore directories or device nodes, whose implementation is inherently implementation specific; just a normal, non-special file that started life _greater_ than zero bytes long. If you had done _any_ programming you would have known this. I take it you don't have a hard drive full of zero byte files, so how did they grow to their current size?

--
Andrew Smallshaw 
andrews@sdf.lonestar.org
Reply to
Andrew Smallshaw

On Tue, 4 Nov 2014 03:35:30 +0000 (UTC), Andrew Smallshaw Gave us:

Oh boy! A goddamned semantical total retard too, I see.

Straw man, much?

Damn... now I have to laugh.

BRL!!

Reply to
DecadentLinuxUserNumeroUno

(snip)

For OS/360, and I believe also MVS, you specify the initial (primary) file size, which must be allocated in four or fewer extents (fragments). You can also specify a secondary amount, such that the file can grow, in units of the secondary allocation, up to 15 extents. If you specify RLSE, then unused tracks will be deallocated on close.

If you have large disks, specify a large primary; if possible the system will allocate it in one extent, and then RLSE will free the unused space. If you don't RLSE, you can later append (DISP=MOD) to use the rest of the space.

The RT-11 file system can only have contiguous files. When you open the first file for writing, it will start at the beginning of the largest available region. If you open a second file, it will either start at the beginning of the second largest, or split the already used region in half. Files are never fragmented, but free space can get fragmented, and there is a process to move files such that the free space is contiguous.

-- glen

Reply to
glen herrmannsfeldt

(snip)

I think that there was also a 3390 model that ran at 1200 RPM, instead of the usual 3600 RPM, and stored three logical tracks on one physical track. Access time is longer, so it doesn't always help.

-- glen

Reply to
glen herrmannsfeldt
