Disk imaging strategy

It doesn't give you 1 TB on each machine - that's the point. Keep your main data safe in some big system (with RAID, backups, etc., in whatever way you see fit) and have only the software and /necessary/ working sets on the local machines. So instead of having 5-10% of 1 TB "in use" and the rest "dirty" or "semi-offline", you have 90% of 100 GB in use.

Such a system won't suit everyone, but when it works, it simplifies backup and machine independence significantly. It also makes it a lot easier to track versions and "current" data, instead of having local copies and server copies that are a bit different - you have /only/ server copies and tracked backups of them.

Reply to
David Brown

That's exactly what *I* do:

The problem comes with newer disks. E.g., I keep ~1T on each workstation and *only* drag "current projects" onto them, counting on the file servers to maintain most of my stuff "semi-offline".

Executables (and their documentation, support, etc.) are typically in the ~100G ballpark. The balance of the 1T is for whatever documents and "originals", libraries, etc. that I happen to be working on at the time.

[dynamically loading executables from an off-line store just doesn't work on many machines. And, none of this would work for a student's laptop!]

The point of my 1T example (pick ANY number for "total system capacity") is that most of the sectors can be "dirty" -- have "seen" data at some point in the past -- so you can't assume that "empty" would mean "compress readily" (as would be the case for a solution that was FS *aware*!)

I.e., the advantage of a FS-aware approach is you know which portions of the medium are significant -- "worth preserving" -- so the balance can compress to take NO space in the image.

That's exactly what I do -- though I may keep multiple branches on the machine while I am working on it so I can spin down the archive until I really need to check something back *in* (assuming I *will* do so)

Reply to
Don Y

On Sun, 02 Nov 2014 15:36:11 -0700, Don Y Gave us:

Have them each get one of these. Look at the "frequently bought together" section; there are also USB enclosures.

They could even boot and run from the detachable drive, and back up to the internal, fully bootable mirror, and compressed backup volume(s).

That way, a guy could put his dead laptop down, put his drive into another student's laptop, boot up *HIS* own system, and finish working until his own machine charges back up. Each laptop could even be further configured to have a guest partition and back up session data for guest sessions there. That one would be tougher, though, as hardware IDs would have to be utilized.

Nifty like... Like... nifty, man.

Run from the detachable, and back up to the laptop itself. A person would protect his "system on a leash" like any other valued item, such as a wallet.

This also makes the laptop itself a bit more interchangeable for the student, should it get stolen. He gets a new one and keeps running, while "the system" hunts down the stolen unit, which carries non-erasable ID info in a couple of places -- some of it even electronically accessible. The MAC ID, of course, as you know, and others too. The ones the NSA would like to use to backdoor you, plant evidence, gather it back up... use it against you, etc.

You know their drill. Just ask Monica's friend.

Reply to
DecadentLinuxUserNumeroUno

On Sun, 02 Nov 2014 18:32:06 -0800, DecadentLinuxUserNumeroUno Gave us:

Ooops... forgot the link.

formatting link

Reply to
DecadentLinuxUserNumeroUno

The problem with dd is that there is no actual guarantee that what you read will work if written back. Even under the "raw" block devices there is a lot of translation going on.

More like a matter of "necessity". You get exactly 1 block - anything beyond that is your own responsibility.

The media itself may be damaged. It's less worrisome now due to sector remapping, but there's still a possibility that the restore may not work.

You can't fix damaged media.

Calling it an "emergency partition" rather than a "backup" is just semantics.

And also means "doesn't need to be restored".

George

Reply to
George Neuner

dd(1) is an abbreviation for "low level access to block device". I'm not running a UN*X (or any other OS).

Yup.

If the medium is damaged, then it needs to be replaced. I.e., there is no need for a "restore" to work when the hardware doesn't.

Exactly.

In the "students" case, the chances are the restore will be necessitated from their system getting munged with spyware, downloaded cruft, etc. Expecting/requiring me (or someone else) to rebuild their system because they were irresponsible is silly. Give them the ability to do it... and, the COST of doing it (i.e., the potential to lose anything that THEY don't explicitly save before the restore)

There's a difference in expectations. I do "backups" all the time. I *seldom* do "restores".

This mechanism is intended for folks who never do *backups* but often/sometimes do "restores"!

(i.e., *I* will never be building a new "image" for a machine once it has left my hands)

Well, *likely* doesn't need to be restored (depends on how unique that string can be -- "Copyright 2014 Microsoft" would probably be a bad choice...)

Reply to
Don Y

FWIW, Windows has had an option ("/w") on the "cipher" command to wipe all unused areas on an NTFS volume since the W2K days, and it will do that on a live volume.

I'm not sure what it leaves in the empty space, but at least a decade ago it took several passes (writing zeros, writing ones, writing random numbers, etc.) over all of the unallocated space on the volume. For your purposes* it would hopefully not have that random-data pass as the last one. I also don't know if that ever worked on non-NTFS volumes.

IIRC, this was an add-on you had to download from MS in W2K, and part of the standard installation of *some* XP versions (Pro and server, I think), and has been standard on all Vista, Win7 and Win8 versions.

*The purpose of the command/option is to prevent people from recovering data from deallocated space, not prepping a volume for (image) compression.
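If memory serves, the invocation looks something like this (run from a command prompt; the path only identifies which volume's free space gets scrubbed, so the drive letter here is just an example):

  cipher /w:C:\

Expect it to take a while on a large volume, since it really does overwrite all of the unallocated space.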
Reply to
Robert Wessel

Hmmm... interesting!

I will look for it out of curiosity.

That makes sense. Poor man's approach to a self-scrubbing filesystem.

I think I really want to pursue a more universal strategy.

E.g., I can "fill" the unused areas on my NAS devices by mounting the shares/exports and playing the create/fill/unlink game with the same sort of results as on a local filesystem. Regardless of level of RAID in place, etc. Writing files is a relatively common activity (for storage devices! :> ). Anything beyond that gets if-fy...

[Of course, for the NAS *appliances* I'd then have to physically remove the drive and install it somewhere that I could run the imaging executable]
Reply to
Don Y
[app that fills the disk with a file full of zeroes]

yeah, unless you can find a mount option or equivalent that does "overwrite with zeros on unlink"

filling the free space with zeroes could take a lot of time if you have a lot free.

--
umop apisdn
Reply to
Jasen Betts

Hi Don,

since obviously there is no common solution for all filesystems (unless you want to copy the entire medium, which is impractical), your best bet is to go minimalistic about it. Recognize which filesystem it is, then find your way to the allocated space and store it in some indexed format - such that you can subsequently recover it. On some filesystems it will be easier than on others - e.g. on DPS you will need to locate a file in the root directory, unitcat.syst, which is a bitmap of the allocated clusters; and you have to read logic block 0 to see how large the "device" (i.e. partition) is, what block size it assumes and how many blocks there are per cluster. On FAT it will be easier, I think (no need to read the root or any directory). But you can't get around this minimum, I suppose. Then there are not that many filesystems in mass use anyway (I think George Neuner already said that), so the effort will not be that huge.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

formatting link

------------------------------------------------------

formatting link

Reply to
Dimiter_Popoff

On Sun, 02 Nov 2014 23:11:52 -0600, Robert Wessel Gave us:

Hey, you could image your drive like this guy did.

He made a video "image" of his drive. Hehehehe... BRL!

He doesn't know how to count heads or platters though.

Don't know if I have seen bigger idiots.

Well... there is Sloman.

They inscribed 10 MB onto a clear roll of shipping tape to demonstrate how a laser cube storage medium would work. Two intersecting lasers scan each layer in a single pass, and the entire datagram (whole page) is read.

Too fragile, and it would require a rigid, miniature optical bench inside something about the size of an old 5.25" full-height form factor canister for a 1" cube device. Way too fragile.

Reply to
DecadentLinuxUserNumeroUno

On Sun, 02 Nov 2014 22:08:06 -0800, DecadentLinuxUserNumeroUno Gave us:

DAMN!!! I forgot the link again!

formatting link

I think that might be Sloman...

Bwuahahaha! BRL!!

Reply to
DecadentLinuxUserNumeroUno

On Sun, 02 Nov 2014 22:08:06 -0800, DecadentLinuxUserNumeroUno Gave us:

Hey guys!

Image THIS quarter million dollar hard drive!

formatting link

I really wish I had it. Damned Aussie lucky dogs!

He talks funny too. :-)

Reply to
DecadentLinuxUserNumeroUno

On a sunny day (Sun, 02 Nov 2014 15:07:33 -0700) it happened Don Y wrote in :

Not if you tar a filesystem (on that partition) - see the script I posted. So mount the partition, say:

  mount /dev/sdd1 /mnt/sdd1

then tar that filesystem:

  tar -zcvf my_sdd1_backup.tgz /mnt/sdd1/*

If the partition has no files the tgz will be very very small.

If you want it back, create any filesystem, and:

  tar -zxvf my_sdd1_backup.tgz

All links and timestamps are preserved too that way.

I don't think I have ever 'defragmented' anything in my life in Linux; there is no need.

Sorry cannot follow you there....

Reply to
Jan Panteltje

I am thoroughly confused as to what you are trying to do here.

On the one hand, you want a filesystem aware process so that you can image only real files, not empty space (or leftovers in deleted space). On the other hand, you want something completely independent of the filesystem.

On the one hand, you have only a small amount of real data in use, and on the other hand you might have large amounts of data copies that you also want to image.

So let's get back to basics, and try to understand your setup.

First, how many machines are we talking about? What variety of systems do you have, in terms of OSes and filesystems? What about the different developers and users - are they at similar levels, and are they cooperative/competent, or are you going to have to do these backups and images because they are not good at following version control routines?

The way I would organise this all is that the server (or servers) are masters. You have full control of these - you use raid to protect against hardware failures, and regular snapshot-style backups (such as with rsync or btrfs snapshots). /All/ your data is there. Where practical, it is in the form of version control repositories. Other data, especially more static data, may be just an area of shared files, which is backed up with snapshots.

On local machines, you have only temporary copies of any data while working. Losing that data is an inconvenience, but should not be a disaster. Users check out from the repositories, do their work, and check in changes. If you have need of data that is not part of the repositories, but should be kept safe, it is either accessed directly as part of the server's shared files, or you use rsync or similar backup strategies to copy from the local machine to a safe area on the server.

Imaging of the local machines is just a convenience to get the system running again faster if there is a hardware failure. It is usually done after setting up the basic system and installing key programs, and perhaps on occasions afterwards after major upgrades or installations. It is not about data backup, but merely saving time. Usually something like clonezilla or Norton Ghost to an external disk will be fine - if something goes wrong with the main disk, you can simply put in the imaged copy. Imaging can also be useful if you have multiple systems with the same setup - you might want images stored on a central server in this case. But you don't image the data - you only image the OS, programs, and setup.

Is there something special about your needs that makes such a system impractical?

Reply to
David Brown

Because I'm trying NOT to let the thread drift! :>

That conclusion doesn't necessarily follow. What I want to do is not waste "image space" on "dead data" (deleted files) -- WITHOUT EXPLICITLY KNOWING WHAT IS DEAD (because I have no metadata from a "file system" to tell me what is live/dead)

Correct.

I want to image the entire disk -- because I can't know what's live/dead. But, I don't want to waste space/time on "dead content". So, I want a scheme (which may include "procedures" and not just "code") that will effectively give me that information without EXPLICITLY seeking it.

E.g., as I proposed, if I create files filled with some highly compressible data ("Dear Compressor, when you encounter me in your input stream, please represent me in your compressed output stream by the special token 'BIG_CANNED_STRING'. In doing so, you will know exactly how to reconstruct me without wasting much space on my actual content. Chances are, you will encounter me many, many, many times as you scan through the blocks of this drive..."), that data gets moved onto real platters (when disk cache is flushed, etc.).

Once I "run out of room" in the filesystem (more or less), I will have consumed the previous "dead space" (free space) with files of this type. I can then unlink all of these files thereby recreating the "dead space". But, while the previous content of this dead space was unrecognizable (without knowledge of the filesystem), it can now be recognized as such a filesystem agnostic piece of code (later).

Personally, about 30 or 40 drives (e.g., some "machines" have multiple drives). Note that a "machine" does not have to be a PC. Nor a SPARCstation. etc.

For my pro bono work, probably 200 - 400 yearly (but, that will hopefully only be 20-40 different "model numbers", 10 or more instances of each)

Personally, three different flavors of Solaris, three Windows, three NetBSD, a couple of oddball "OTS" systems (Jaluna, Inferno, etc.) and probably a dozen different "appliance"/proprietary implementations (effectively black boxes).

Pro bono is much easier. They'll either be PC's or Macs. But, their OS's will largely be defined by whatever happens to *run* on that particular hardware (donations may be of various "ages"). I'm guessing three different Windows (though within that, there can be minor variations like Home, Pro, Business, etc. editions -- possibly even on the same make/model hardware). Probably two different OSX versions (??).

They aren't "backups" (see my thread to George). They are "restore images". They aren't regularly performed (like backups would be). Rather, a machine is imaged (typically *once*) and the image saved in order to recreate the machine's state at a future time (if it gets munged).

So, I expect this to be far more involved than the routine "backups" I do for my working files/configuration. But, I expect the "restore" to be far simpler (UX) -- "push this button and wait".

For example, when I build a new system, here, I image the disk at various stages in that process. This lets me quickly return to one of those points in time (if, for example, I make some annoying mistake in a subsequent stage and want to "undo" it). Prior to putting the system into daily use, I have a final snapshot image. I.e., I can reproduce the software installation and configuration process very quickly if I have to at a later date (because a disk died, because some app scribbled somewhere that it shouldn't, etc.). Instead of DAYS to rebuild the system (individually installing and configuring each application, etc.), I can do it all in a matter of minutes.

And not worry about the things that an incremental approach might fail to address (have I removed ALL the cruft? have I added all the changes back in? etc.)

For the students, they tend to be careless users. And, there's always a certain amount of "I didn't pay for it so I take it for granted" attitude involved. ("If it breaks, I'll just ask for a new one. THAT won't cost (me) anything, either!")

Originally, I was pursuing a "build a set of CD's/DVD's for each machine" that would allow them to restore their machines (without my involvement). But, these would get lost, misplaced, etc. No real incentive for the student to keep track of them (most are also homeless so that would be one more thing for them to keep track of, "just in case"). You'd be naive to imagine that this WOULDN'T turn into "I lost my restore DVD. Can you just make me another one (DVD)?" "Well, if you can't do that, can I just trade this machine in and get ANOTHER machine?"

If, instead, the restore mechanism is on the disk (just like a factory restore partition -- but, with the *final* disk image instead of the *initial*/factory image), the student has no excuse to claim he's lost the DVD or "doesn't know how to repair/restore the machine".

Additionally, if the student feels his machine may have been "compromised" (perhaps an AV update points to the presence of a virus on his machine), he can "clean it" himself. (Hysterically, this has resulted in machines being returned and the system being rebuilt from scratch. *I* have no desire to be in that business -- ESP in an unpaid capacity! :> )

That's what I do for my personal machines. But, that doesn't mean it is easily usable in that form!

E.g., I have ISOs of every CD/DVD I've purchased. But, if I have to go through the trouble of *installing* it to be able to *use* it, then having the original is just a small part of the solution.

Typically, I build specific machines for specific roles/purposes. Once built, I image their drives and preserve the images on a bunch of (removable) SATA/PATA/SCA/FW drives (depending on how I will ultimately need to restore that image) -- along with the installation log that documents every step in that particular build process.

This just saves me from having to repeat all that labor (install & configure) after a hardware failure, a screwup on my part, or if I just want to upgrade the local drives to larger ones, etc.

The "data" is the key, here. I can drag the documentation, schematics, PCB artwork, sources, etc. for a project out of the repository, work on it and then discard or recommit any changes -- PAINLESSLY. Because it's just "data" and not "workstation executables". There is no installation or configuration involved. I can erase it and know that there are no vestiges hiding somewhere unseen.

  cd /Playpen
  rm -r *

Exactly.

I image each machine exactly once. I rarely update applications, especially if the reason for the update is solely security related. When the machine is upgraded, I move on to the newer applications which get folded into the image for that newer machine.

Exactly. That is the case with the pro bono effort: archive images for each type (make/model) of machine encountered. Install the COMPLETE image (including the "recovery partition") from the server. If I encounter another "identical" (make/model) machine in the future (often!), then I don't have to bother recreating a suitable image for that machine.

And, thereafter, the user (student) can restore the "system partition" (but none of their data -- because the recovery partition has no knowledge of their data!) at will instead of relying on me to perform that task for them.

I.e., they no longer have an "excuse" to ask (expect) someone to solve THEIR problem (because, chances are, the reason their computer is "all gunked up" is because of poor practices on their part!)

Those that try to "beat the system" by actually BREAKING their computer are "rewarded" by going to the end of the line: "Gee, it's too bad you dropped your laptop off the bus! We'll see if we can salvage any PARTS off of it. And, put you in line to get a replacement. But, there are 187 people ahead of you so you probably won't get one before sometime in the NEXT school year!"

[This is not an exaggeration. :< ]

Reply to
Don Y

That isn't entirely true - at least not with inode filesystems. The n-way tree structuring and inode caching reduce the need to defragment, but where sequential read performance is important, it still pays to defragment.

George

Reply to
George Neuner

mount(8) brings filesystem specific code into the environment. Tell me how you are going to do this WITHOUT invoking the mount command!

Try gzip'ing /dev/sdd1 and look at the size of the resulting file! (i.e., /dev/sdd1 being the raw/block device without ANY knowledge of the filesystem it is currently supporting!)
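
(Concretely -- the device name is just an example:

  gzip < /dev/sdd1 > sdd1_raw.gz
  ls -lh sdd1_raw.gz

Every sector goes into that image, "live" or "dead", so it compresses only as well as whatever junk happens to be sitting in the free space.)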

Then, you can't arbitrarily shrink a "filesystem" because you don't know where the "live data" resides on it, currently. A file could sit in the last N sectors of the partition and you wouldn't know it. Shrinking the partition by M>N sectors means your file gets cut off the end!

(you have to explicitly or implicitly MOVE the file to ensure it doesn't fall past the end of the trimmed partition)

I'm going to give you a RAW disk. It has data on it and "deleted data". I don't want to waste space preserving the "deleted data" in the image that I create.

When you install that disk in your machine, you are going to discover that you can't "mount" it! I have changed the partition ID to some wacky value that the system from which I pulled it recognizes as "Customized FFSv2 Partition". The only thing that is really "customized" about it is this oddball partition type identifier *and* a macro wrapper that causes each reference to an inode to refer, instead, to "~inode".

The system on which the drive was mounted (containing these two changes) has no problem creating, accessing and deleting data on that medium. With virtually identical performance to a "genuine" FFSv2 filesystem.

But, YOUR tools won't recognize its contents. (I suspect simply changing the magic number assigned as the partition type would be enough to cause problems!)

[Of course, this is a hypothetical machine. I pose this to illustrate the case for ANY FILESYSTEM TYPE NOT CURRENTLY KNOWN TO YOUR IMAGING TOOLS!]

By contrast, the scheme that I outlined (upthread) will allow me to "fill" unused areas of the drive with "predictable, highly compressible content" USING THE NORMAL USER TOOLS PRESENT ON THAT ORIGINAL SYSTEM. Then, unlink those files. And, finally, run my executable OUTSIDE the scope of that OS (as it only needs to deal with the raw disk hardware).

You, OTOH, can best hope to do something like:

  dd if=/dev/raw_drive | gzip > image.gz

And, your image.gz will typically be much larger because it will not be able to determine which is "deleted data" in the raw disk contents.

Reply to
Don Y

On Mon, 03 Nov 2014 12:11:28 -0500, George Neuner Gave us:

There would be no fragmentation unless those sequentially read files were constantly being opened and added to, and even THOSE file writes are full commits, free of fragmentation on those file systems.

Kind of like saying "inconceivable".

"I do not think that word means what you think it means."

Sequential read performance is ONLY degraded on FILE reads of fragmented files.

So unless you are operating a database, and keep all your dynamic data on the same volumes as your system and static files, you would see the same number, even if the volume does have some fragmented files on it.

But again, you speak of the file system with seemingly intimate knowledge.

But I was under the impression that this file system operates in such a way that fragmentation like that which occurs on a FAT type system, never happens.

You are saying EXT fs DO fragment files?

I think the actual file sizes might play into one's thinking here too.

Sequentially reading large scattered chunks is not that hard either. It is the database file that has had 50 0.5 kB commits done on it in the last hour that fragments a FAT drive.

Unchanging files do not fragment. The "holes" between them and the deleted files do not pose a huge problem either. It is that ONE file that has so many segmented locations to string together in a single "read".

Still... I did not know that ext fs drives fragment.

Reply to
DecadentLinuxUserNumeroUno

On Mon, 03 Nov 2014 10:17:18 -0700, Don Y Gave us:

You need to author your own version of a forensic duplicator.

You are not duplicating a volume or its contents. You are duplicating every last sector on the drive, and IF you insist on not "mounting" a volume the ONLY type of success you will get would be an entire copy, including deleted data. Bit for bit, then compress that datagram.

If ANY "deterministic" cues are used to decide what is or is not deleted data, you ARE looking at files and you ARE looking at them via the file system, and WILL have to mount the drive and use its tables to do so.

Reply to
DecadentLinuxUserNumeroUno
