Disk imaging strategy

OK, that sounds like a reasonable strategy. But merely _containing_ some byte pattern will not be safe enough. You will need to examine the whole content of the sector; otherwise you will falsely detect "deleted" sectors. Perhaps you could even devise some compression algorithm based on sectors with equal contents, some variation of Huffman coding.

The problem is now to persuade the OS to mark deleted sectors this way. Just filling all unused sectors once does not guarantee they will contain the pattern when used and freed again.

But you could perhaps create an imperfect strategy from that. Regularly fill all unused sectors with a pattern. Let the system operate. It will overwrite some of your marked sectors and later free them without marking them. You will have to save those used-and-freed sectors, since you do not know better. But for a large disk that will always be mostly unused, it might at least save a lot of backup space.
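Something like this minimal sketch could do the fill-and-delete step (untested; the 512-byte marker, mount point and scratch file name are only placeholders, and reserved blocks or data kept in FS metadata mean the coverage is only approximate):

import os

SECTOR = 512
MARKER = b"DELETED!" * (SECTOR // 8)      # hypothetical 512-byte marker pattern
FILL_PATH = "/mnt/target/fill.bin"        # hypothetical mount point of the partition

def fill_free_space(path=FILL_PATH, chunk_sectors=2048):
    """Write the marker into a scratch file until the filesystem is full,
    then delete the file so the marked sectors return to the free pool."""
    chunk = MARKER * chunk_sectors
    try:
        with open(path, "wb", buffering=0) as f:
            while True:
                f.write(chunk)            # stops with ENOSPC once the disk fills
    except OSError:
        pass                              # "no space left on device" is expected
    finally:
        os.sync()                         # push everything out to the medium
        if os.path.exists(path):
            os.remove(path)               # free the sectors; their contents remain

if __name__ == "__main__":
    fill_free_space()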

This strategy will have the disadvantage of requiring more time than a FS-aware method: initial and regular filling of free sectors, plus reading all sectors during backup. But at least you save the write time on the backup medium.

--
Reinhardt
Reply to
Reinhardt Behm

Correct. It's a probabilistic issue -- the larger the "required flag string", the lower the chance of it being accidentally encountered. (Of course, that is a lie, as some very long strings could well be encountered: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." Possibly not as likely to be encountered on a "programmer's" computer. OTOH, an American History student might find this on *his* computer in a "live" file!)

Note that you can still "be 100% safe" if you only use this "flag" as a highly compressible string. I.e., if you detect a sector containing 512 specific bytes (as alluded to, above), then "encode" that sector's contents as "VERY_SPECIAL_512_CHARACTER_STRING" -- using essentially the same sort of data reduction technique that a FS-aware scheme would use to indicate "nothing in this sector".
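As a sketch of the idea (not a finished tool -- the device path, image name and one-byte tags are just placeholder choices): every sector that exactly matches the fill pattern collapses to a single tag byte, everything else is kept verbatim.

SECTOR = 512
MARKER = b"DELETED!" * (SECTOR // 8)   # must match the fill pattern byte-for-byte
DEAD   = b"\x00"                       # tag: "sector matched the marker"
LIVE   = b"\x01"                       # tag: raw sector data follows

def make_image(device="/dev/sdX", image="disk.img"):   # hypothetical paths
    with open(device, "rb") as src, open(image, "wb") as dst:
        while True:
            sector = src.read(SECTOR)
            if len(sector) < SECTOR:       # end of device (or short trailing read)
                break
            if sector == MARKER:           # whole-sector compare, as discussed above
                dst.write(DEAD)            # 512 bytes become 1
            else:
                dst.write(LIVE + sector)   # live data is preserved exactly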

Correct. Can't expect the OS to fill them with anything when unlinked. Also can't prevent the OS from reusing them in a temporary file just after you've finished unlinking them (if the OS uses them in a PERMANENT file, then you WANT to preserve its contents!).

So, you wouldn't want to do this on a system that isn't reasonably quiescent.

[But, the goal is to do this imaging just prior to *deploying* a system so it isn't "active"]

Exactly. My approach will be to fill, unlink and then shutdown. This minimizes the window in which the system can decide to reallocate some of those "marked, deleted sectors" for other uses.

It also slyly addresses the possibility of the system "getting very upset" when it runs out of disk space! (I.e., it will be taken down shortly thereafter so won't be expected to provide services in that stressed condition)

The image is created once. So, the costs of creating it aren't as important as the costs in "keeping the image around" (in case it is EVER needed).

The more practical cost is incurred when/if the "deleted flag" is used as a RELIABLE indication that the sector is "not in use" and, as a result, the sector is not restored (written) back to the medium. On a really large drive with low utilization, a fair bit of time can be spent writing "DELETEDDELETEDDELETED..." over and over and over again!
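Continuing the sketch above, the restore side that treats the flag as RELIABLE simply skips the write for tagged sectors -- which is exactly where the time saving (and the risk) lives:

def restore_image(image="disk.img", device="/dev/sdX"):   # hypothetical paths
    with open(image, "rb") as src, open(device, "r+b") as dst:
        offset = 0
        while True:
            tag = src.read(1)
            if not tag:
                break
            if tag == LIVE:
                dst.seek(offset)
                dst.write(src.read(SECTOR))   # put live data back where it was
            # tag == DEAD: trust the flag and skip the write entirely --
            # the alternative is writing the marker (or zeros) back out,
            # which is where all the "DELETEDDELETED..." write time goes
            offset += SECTOR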

But, this can be offset by limiting the size of the *partition* that is being imaged (and, noting that partitions and disks are not the same thing!). E.g., for the student laptops, I can look at the amount of unused space available when the system is built. And, a rough metric on how quickly that CPU can push bytes at that disk drive. If this looks like the "fill deleted sectors" portion of the operation will be painfully lengthy, I can choose to split the partition into two partitions (e.g., SYSTEM and DATA). And, provide no "restore guarantees" on the non-SYSTEM partition (which, in all probability, will be "empty" when the machine is first built!)

I.e., the policy decision becomes mine instead of determined by an algorithm.
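The "will the fill step be painfully lengthy?" test is just a division; a throwaway helper with made-up numbers (sustained rate only, ignoring seeks and FS overhead):

def fill_hours(free_bytes, sustained_write_bytes_per_s):
    """Very rough lower bound on the 'fill deleted sectors' step."""
    return free_bytes / sustained_write_bytes_per_s / 3600.0

# e.g. ~400 GB of free space at a sustained 60 MB/s is on the order of 2 hours,
# which might argue for splitting off a DATA partition with no restore guarantee.
print(round(fill_hours(400e9, 60e6), 1))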

Reply to
Don Y

I might have considered being insulted if this line of argument wasn't so comical.

Name me a single file on your hard drive that was _not_ zero bytes long on creation. Ignore directories or device nodes, which are inherently implementation-specific -- just a normal, non-special file that started life _greater_ than zero bytes long. If you had done _any_ programming you would have known this. I take it you don't have a hard drive full of zero-byte files, so how did they grow to their current size?

--
Andrew Smallshaw 
andrews@sdf.lonestar.org
Reply to
Andrew Smallshaw

On Tue, 4 Nov 2014 03:35:30 +0000 (UTC), Andrew Smallshaw Gave us:

Oh boy! A goddamned semantical total retard too, I see.

Straw man, much?

Damn... now I have to laugh.

BRL!!

Reply to
DecadentLinuxUserNumeroUno

(snip)

For OS/360, and I believe also MVS, you specify the initial (primary) file size, which must be allocated in four or fewer extents (fragments). You can also specify a secondary amount, such that the file can grow, in units of the secondary allocation, up to 15 extents. If you specify RLSE, then unused tracks will be deallocated on close.

If you have large disks, specify a large primary (if possible the system will allocate it in one extent), and then RLSE to free the unused space. If you don't RLSE, you can later append (DISP=MOD) to use the rest of the space.

The RT-11 file system can only have contiguous files. When you open the first file for writing, it will start at the beginning of the largest available region. If you open a second file, it will either start at the beginning of the second largest, or split the already used region in half. Files are never fragmented, but free space can get fragmented, and there is a process to move files such that the free space is contiguous.

-- glen

Reply to
glen herrmannsfeldt

(snip)

I think that there was also a 3390 model that ran at 1200 RPM, instead of the usual 3600 RPM, and stored three logical tracks on one physical track. Access time is longer, so it doesn't always help.

-- glen

Reply to
glen herrmannsfeldt

IIRC, the Bullet Server could (did?) create contiguously stored files. But, that was largely possible because of its "write once" semantics (size declared a priori).

Reply to
Don Y

Yes, the initial version (with 8.4 names) was introduced in 1994 (no way was this 20 years ago...). The long-name directories (within the same filesystem) are much newer, added these last few years.

We could do that, sure. Your compress idea is really good; I can't think of a reason why it should not work. Then one can have the files as mounted disk images to access, which should make the exchange easy enough.

Yes of course. Pretty much everything you would expect from a filesystem.

Yes, what filesystem would it be without that :D .

I'd be tempted to go with all-0 fill for the "highly compressed" pattern - for no good reason really, except perhaps because disks come all 0 from the factory. But you will want to fill them up anyway, so this is not a consideration.

Yes, all of the above. Making multiple copies of a single file will take 2-3 lines of script, to increment the name somehow. Deleting everything in a directory goes the usual del * way; if you want recursion there is a script doing it (rm path/ -R). I have deliberately kept recursive disk operations in scripts: it makes new bugs show up, costs no overhead to speak of, can be retried/resumed, prompts me to write the necessary extensions when there is some new need, etc.

Well, you can have the disk image as a file and do with it whatever you please under any OS. Or under DPS, but you don't have a DPS machine.

I am not sure I get this; we may be thinking somewhat differently about how you will implement that. My understanding of your idea is that you will take the disk image - say a 20 GB one - and copy it elsewhere in 512-byte (say) pieces, skipping those 512-byte pieces which are all 0 (or all "deleteddeleted", whatever you opt for). You just write (say, in another file) where these blocks were, as position:length.

Oops, I think I get it now. You can know which sectors have been full of "deleteddeleted", but how will you know you don't have to restore them? What if there has been an allocated sector full of exactly this pattern? Maybe it is still practical to come up with some 512-byte pattern which will never occur on normal disks, but this will be a vulnerability, perhaps an acceptable one. Clearly the "all 0" I thought of earlier is not applicable then. But I'd go for restoring all data (or at least have the option to, if the shortcut does not get me anywhere). At, say, 20 MB/s a 20 GB disk would get restored in what, 1000 seconds? Not that long for such a massive intervention, I suppose.
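Anyway, the position:length bookkeeping itself is simple enough; something like this sketch (just my reading of it, untested, with placeholder path and pattern):

SECTOR = 512
PATTERN = b"\x00" * SECTOR        # or the "deleteddeleted..." marker, if you use that

def live_runs(device="/dev/sdX"): # hypothetical path
    """Yield (position, length) runs, counted in sectors, of data to keep."""
    start = None
    pos = 0
    with open(device, "rb") as src:
        while True:
            piece = src.read(SECTOR)
            if len(piece) < SECTOR:
                break
            if piece != PATTERN and start is None:
                start = pos                      # a run of live sectors begins
            elif piece == PATTERN and start is not None:
                yield (start, pos - start)       # the run just ended
                start = None
            pos += 1
    if start is not None:
        yield (start, pos - start)               # run reaching the end of the device

# The side file then holds each (position, length) pair plus the corresponding
# bytes; anything not covered by a run is assumed to be the pattern on restore.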

Hey, if 25C is "almost" comfortable then I don't know what is "really comfortable" in your book :D . We already hover between 0 and 5C, sometimes 10C; winter is coming... (we do hate it).

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

Reply to
Dimiter_Popoff

When you post to comp.arch.embedded, thread drift is a risk - when you cross-post to sci.electronics.design, thread drift is guaranteed or your money back :-)

I think there are four ways to identify the data that you need to store:

  1. Metadata to tell you what is live.
  2. Metadata to tell you what is dead.
  3. Identify the live data from its contents.
  4. Identify the dead data from its contents.

You have ruled out 1 and 2 by not allowing the imaging system to know anything about the filesystem. I doubt that 3 is practical - certainly not for a general purpose filesystem. That leaves number 4 - identify the dead data from its contents (alternatively, fill it with highly compressible zeros).

It's easy enough to zero out the whole disk before using it, which gives you a start. Writing out a large file full of zeros, as you suggested before, is the only general-purpose way to put zeros into the empty space. I gave some disadvantages of that earlier, but it may be the only general solution given the constraints you have set yourself.

On Linux, an alternative technique would be to look at the "fstrim" command - intercept the generated SATA TRIM commands and replace them with commands to write zero blocks. I believe that should be safe, and it should let you zero out all the dead data. You could even just store the fstrim output and use that as a list of dead blocks, if you have some way of getting the information to the imaging software - but be sure nothing more is written to the filesystem after the fstrim.
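Short of actually intercepting the TRIM commands (which would need lower-level tooling than I can sketch here), the simple version is just to run fstrim in verbose mode on the quiescent, mounted filesystem and keep its report -- whether the trimmed blocks then read back as zeros depends on the drive, so check that first. Paths below are placeholders:

import subprocess

def trim_and_log(mountpoint="/mnt/target", log="fstrim.log"):  # hypothetical paths
    """Run 'fstrim -v' on an otherwise idle filesystem and save its output
    (which reports how much was trimmed) for the imaging step to consult."""
    result = subprocess.run(["fstrim", "-v", mountpoint],
                            capture_output=True, text=True, check=True)
    with open(log, "w") as f:
        f.write(result.stdout)

if __name__ == "__main__":
    trim_and_log()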

This suggests two different setups to me. One should be a Clonezilla server system that will work for the Windows systems (and also, I think, Macs and "simple" Linux or BSD setups). This will cover the bulk of your systems, especially all of your pro bono stuff. The Windows systems will all be identified automatically, and since the filesystem is known, there is no problem imaging just the live data.

Then you have another system for your "weird" systems. You may find that these ones are small enough that you can just image the entire disk with "dd". Yes, it is a little inefficient - but it will be safe, reliable, and easy to understand and implement.

And if you walk under a bus, or decide to retire, there is a chance that someone else will understand how it all works.

For unusual systems, images will be taken much more often than they will be restored - the image is for emergencies. For duplicating (nearly) identical machines, images will be "restored" much more often than they are made. These are different purposes, and I believe you should be looking at different setups for these purposes.

OK, I've got a lot better picture of your aims now. And as I said above, I really think you should separate this into two systems - I believe each will be much easier than trying to make a general system that covers both.

Reply to
David Brown

On a sunny day (Mon, 03 Nov 2014 13:51:59 -0700) it happened Don Y wrote in :

Well, by now I think I have figured out the essence: what you are trying to do is impossible.

Well, I told you all that in my first reply (with script). Use fdisk, remove all partitions, make new ones, put one or more decent filesystems on it, and untar the data (you DID have some files at some point, or else this discussion is pointless or file-less) and you're done. If not, install Linux on it and start fresh all over.

Look, you want to compress a partition image without knowing what filesystem is on it; that leaves gzip (or maybe zip).

Get a life.

Reply to
Jan Panteltje

It's been the reduction in seek times over the years that's driven the increase in rotational rates. 2400 RPM was a semi-standard for much of the sixties, when average seek times were 40+ms (at which point an average 12.5ms latency is not the major contributor to access time). In the seventies and eighties IBM (and others) went to 3600 RPM (8.4ms latency), as seek times slowly dropped (30ms average on the 3330 in 1973, to 17ms on the 3380 in 1983, and then to 9.5ms on the single-density, 4200 RPM, 3390s in 1989).

On the flip side, the biggest contribution to seek time reduction since the 80s has been the reduction in disk diameter. Prior to that the 14 inch size was very common on large systems; the 3390s reduced that to 11 inches. Modern 15K RPM drives are usually 2.5 (or so) inch platters, even when they're in a nominal 3.5 inch chassis.

3390s are still the DASD of choice in MVS/z/OS, but are now, of course, emulated on top of ordinary fixed-block disks, and the performance characteristics are quite different, and larger 3390 model volumes are not slower (although they may have more contention for an I/O path if parallel access volumes are not in use). So these days, most people use 3390-27s or -54s (which were never real devices, but the common names do indicate the relative capacities), unless they're using EAV, which allows more than 64K cylinders -- but the 3390 track geometry is still maintained.
Reply to
Robert Wessel

Yeah, well... I can *try*! :>

Agreed. Though doing any "in-band" signaling opens the door to coincidentally encountering that same data in the LIVE data. You're trying to store a sector's worth of data *plus* one extra bit -- "This data is live" -- in a sector's worth of data (WITHOUT that extra bit!)

In that case, you risk silently corrupting the restored image if you opt NOT to restore it (on the assumption that it is DEAD and, thus, "don't care"). I.e., the contents of that sector may no longer coincide with what they were -- yet restoring the metadata elsewhere on the medium will tell the recreated filesystem that those contents are actually "valid"/live!

Note that you can use anything to mark the sector (when you "fill" it with dummy files) -- as long as you recognize that "pattern" as indicative of "this is one of the sectors that I filled -- and then deleted -- so is (probably) empty"

Doing this before "building" the system is counterproductive (it was my initial approach). Far easier/quicker to write a tight routine that just pumps zeroes (or any data) out than it is to try to co-operate with an OS that has been installed.

But, the software installation process (esp. Windows) dirties far too much of the medium (temporary files, etc). So, you have to go back and fill the "empty space" (defined by "wherever the OS lets you create a new file!").

The appeal of the fs-agnostic approach is that it (should) work universally. And, "restores" would be the same effort regardless of platform ("imaging" would vary from platform to platform as file creation, etc. would have to operate *in* the host environ)

The pro bono systems have to be recoverable without my involvement. And, as I mentioned, expecting a (homeless) student to keep track of a "restore DVD" (or three) is an invitation to ongoing involvement ("Can't you make me a new DVD?" "Yeah, and how long before you misplace THIS one?")

Jaded?? :>

So, I would have to install Clonezilla on the "recovery/restore partition" on each of these machines. Then, dumb it down so it was a turn-key operation for them. So, you put this big OS in place that tries to bring up all the hardware in the machine (instead of just the disk), then take away all of that functionality -- just to be fs-aware...

The laptops that I have in front of me have four (MBR) partitions. At least two of which are not recognized by Clonezilla (CZ). So, it just does the dd | gzip trick when commanded to image/restore those.

[I'd also have to script CZ so that it would restore all three of the "non-recovery" partitions.]

And, I'm not sure how the UEFI (Secure Boot) issue will come to bite CZ *and* "my proposed solution" when W8 machines start appearing in donations! (a couple of years, tops?)

As for the machines that I have here, the SPARCs are problematic for CZ. And, the (x86) *BSD machines have a different disklabel(8) approach which doesn't expose them via the MBR.

I.e., I can't even image (ALL) my "computers", here, with CZ!

[that doesn't even address the appliances!]

I think the printers are probably small enough (

> And if you walk under a bus, or decide to retire, there is a chance that someone else will understand how it all works.

When I'm gone, my systems are coming with me! :>

As for the pro bono stuff, my "absence" has already shown that entropy quickly governs! E.g., they can't even keep track of the *two* "privileged" account passwords on the systems I built for them last year! And, their (contract) IT guy is only a 20W lightbulb... as evidenced by his solutions to trivial problems: "I need to install some new printer drivers on these machines" when, in fact, the "problem" was a loose cable on the back of the printer! (Gee, the machines WERE working. Then, ALL of them stopped being able to print. Their configurations are LOCKED DOWN. Wouldn't you go looking for something OUTSIDE the machines -- like the PRINTER -- for the cause of the problem??)

That may not be the case. E.g., I would only image the printers once. Install all the cruft that they "need", image them, then USE them in that configuration, thereafter. (getting stuff onto them is more tedious than "real computers" so the advantage of the image can pay off big if I have to replace or upgrade a disk)

Off to vote...

Reply to
Don Y

(sigh) And it is obvious that you didn't look through the sources.

Given the above, I suggest you "get an education".

Reply to
Don Y

(snip, I wrote)

But note that increasing rotational rate decreases capacity if you can't increase the bit rate. Also, at constant bit rate, decreasing rotational rate increases capacity.

Along with putting a cache local to the drive.

As I understand it, 3390 always emulated CKD using fixed-sized blocks, but in earlier models they were internal to the drive and controller. Later on, they were emulated using ordinary FB-512 drives. (I believe this is visible in the blocks-per-track calculation, in that physical gaps between C, K, and D don't exist like on previous drives.)

But there was discussion -- and right now I don't see any reference -- of a physical 1200 RPM drive, which I believe was the 3390-27, meant to increase capacity without increasing bit rate by slowing the drive. (At some point, bit rate is limited by head inductance and other physical factors.)

But yes, once FB-512 emulation came along, that model may have been an early favorite for it, with its nice large size.

-- glen

Reply to
glen herrmannsfeldt

On a sunny day (Tue, 04 Nov 2014 08:27:39 -0700) it happened Don Y wrote in :

No, you are clueless, and try to invent things that have already been invented. Try: gzip -c /dev/sdaX > my_gzipped_partition.gz

If ANY regular struture is present on that device, then it will be replaced with some token.

You're rude too; well, no worry, I'm sure you will not invent a better gzip...

You are clueless!

Reply to
Jan Panteltje

On Tue, 04 Nov 2014 08:31:57 -0600, Robert Wessel Gave us:

1.8 inch. Usually in a 2.5" form factor case.

On-the-drive caching has also made up for a lot of the problems fragmentation DID cause for FAT in the early years.

Fragmentation WAS a problem, when all those seek transitions across those expansive platters added up.

Now, with full sector caching, and even full cylinder caching, all this goes away.

Fragmentation is NOT a problem today, even when a drive IS fragmented, and ONLY poses a small problem for the pro set, and they manage it out 100% on a daily basis.
Reply to
DecadentLinuxUserNumeroUno

It could, but usually did not, because the client generally could not give it a size. Typically, a Bullet file was allocated in one or more largish extents and then consolidated when the file was closed. Any copies made of an existing file - e.g., to/from a remote server - were always stored contiguously.

George

Reply to
George Neuner

True, but it's been the management of the physical magnetic spot size that's been the main driver in recent decades, not so much the actual data rate.

The slow (one-third speed) 3390-9s are an interesting case. The normal-speed 3390s (-1/2/3s) were already maxing out the data rate of parallel channels, so 3390s with triple the physical track capacities would have been exceptionally poor performers, needing all I/Os to go completely through the speed-matching buffers. OTOH, ESCON was announced less than a year before the 3390-9s, and that *could* have handled the higher data rate, but 3390-9s could, IIRC, still be attached to parallel channels. OTTH, tripling the data rate would likely have required a much bigger electronics upgrade in the 3390 (and 3990 controller). So while I don't know for sure, I've always assumed that the -9s were more a product of opportunity, building on the existing 3390 hardware and having to live within some of the limits imposed by that, and not really any demonstration of the state of the art of disk technology (which had already passed to fixed-block devices by that time anyway - IBM itself shipped the 0681 5.25 inch drive, with 850MB, just a few months after the 950MB 3390-1s were announced).

Cached accesses don't really count as "normal" accesses with seek time and rotational latency. And FSVO "local": even 3380s and 3390s could be attached to controllers with cache.

3380s and 3390s both physically used fixed size cells on disks. In the case of 3390s, these were 34 bytes in size. There was still (considerable) overhead for the key and data segments, but it all rounded to an integer number of 34 byte cells. The size (in cells) calculation for 3390s is on page 10 of:

formatting link

But even if there was not a physical, 2314-style gap between the count, key and data sections, there was still overhead. You needed about 4% more nominal cell bytes than your key or record, *plus* there was a fixed overhead of nine cells for each segment (plus ten cells for the count, in all cases), which is certainly a gap of sorts.

The net result is that there were gaps; the exact calculation is different, but the general form is the same. On the 3350, for example, you assumed overhead of 185 or 267 bytes (the latter if you had keys) for each record - that's pretty much the same concept as 19 or 29 cells of overhead per record on 3390s.

We can quibble over whether or not any of that is emulation.

3380s were similar, although with 32-byte cells and a somewhat simpler overhead calculation.
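To put rough numbers on that cell accounting (an approximation built only from the figures in this post -- 34-byte cells, ~4% encoding overhead, nine cells per segment, ten for the count, and the 1749 visible cells per track mentioned below -- not IBM's exact published formula):

import math

CELL = 34            # bytes per physical cell on a 3390 (per the description above)
TRACK_CELLS = 1749   # visible cells per 3390 track (figure quoted below)

def segment_cells(length):
    """Cells for one key or data segment: ~4% encoding overhead rounded up
    to whole cells, plus nine cells of fixed per-segment overhead."""
    if length == 0:
        return 0
    return 9 + math.ceil(length * 1.04 / CELL)

def record_cells(key_len, data_len):
    return 10 + segment_cells(key_len) + segment_cells(data_len)  # 10 for the count

def records_per_track(key_len, data_len):
    return TRACK_CELLS // record_cells(key_len, data_len)

# e.g. unkeyed 4 KB blocks: 10 + 9 + ceil(4096 * 1.04 / 34) = 145 cells each,
# so about 12 such records per track -- in the right ballpark for a 3390.
print(records_per_track(0, 4096))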

My post would have been clearer, had I not managed to clip the first paragraph:

"The "real" single, double and triple density 3390s (model-1/-2/-3) all ran at 4200 RPM (faster than the preceding 3380s at 3600 RPM). The

9X density 3390-9s were the ones that ran at a third that speed."

There were never "real" -27s or -54s; it was the -9s which were slow.

All of the 3390s had the same track geometry, and pretty much all of the 3390 emulations work on a per-track basis. IOW, they simulate, in some form, what you can store in the 1749 physical 34-byte cells of the (visible) track on any 3390 device. 3390-1/2/3/9s all had ~56KB tracks and 15 of those per cylinder, as far as the OS could tell; emulation of a particular model really just altered how many cylinders were on the emulated volume. The large volumes came as the OSes cleaned up their support for larger numbers of cylinders.

Reply to
Robert Wessel

I have often used the "fill the file system with a file that is all zeroes" trick you already mentioned, so I cannot add anything new for that, other than a "+1, yes this works".

Stefan

Reply to
Stefan Reuther

Am 04.11.2014 um 16:11 schrieb Don Y:

And the true problem of it is that it cannot possibly exist. Your approach is _not_ fs-agnostic. On the contrary. It doesn't even manage to be OS-agnostic.

You can't even begin to implement any such "fill disk before creating image" approach without violating that "fs-agnostic" idea. You have to know what that FS is, then you have to be running some OS that knows not just what the FS is, but actually knows how to _write_ to it without corrupting anything. Then you have to run (or even create) a program that writes data to that filesystem, on that OS platform.

Any overall procedure that can reliably tell which parts of a disk partition are currently used by the file system, and which aren't, must contain some part that _is_, for all practical intents and purposes, an implementation of that file system.

You can throw the problem over the nearest fence and claim it's gone, but you can't actually make it go away. E.g. using Linux or the FS's usual host OS to mount the partition and write to it does just that: it "removes" the need for knowing the FS's innards by pretending that Linux can do the job without needing that knowledge. Well: it couldn't.

In short: it is brutally obvious that it is impossible to do what you're trying to do without diluting that "FS-agnostic" promise down to the kind of meaningless gobbledygook we engineers have learned to expect from the marketroids.

Reply to
Hans-Bernhard Bröker
