IIRC, the Bullet Server could (did?) create contiguously stored files. But, that was largely possible because of its "write once" semantics (size declared a priori).
Yes, the initial version (with 8.4 names) being introduced in 1994 (no way was this 20 years ago....). Now the long name directories (within the same filesystem) are much newer, added these last few years.
We could do that, sure. Your compress idea is really good, I can't think of a reason why it should not work. Then one can have files as mounted disk images to access which should make the exchange easy enough.
Yes of course. Pretty much everything you would expect from a filesystem.
Yes, what filesystem would it be without that :D .
I'd be tempted to go to all 0 files for the "highly compressed" pattern - for no good reason really, except perhaps because disks come as all 0 from the factory. But you will want to fill them up anyway so this is not a consideration.
Yes, all of the above. Making multiple copies of a single file will take 2-3 lines of script, to increment the name somehow. Deleting all in a directory goes the usual del * way, if you want recursion there is a script doing it (rm path/ -R ). I have deliberately kept recursive disk operations in scripts, makes new bugs show up, costs no overhead to speak of, can be retried/resumed, prompts me to write necessary extensions when there is some new need etc.
Well you can have the disk image as a file and do with it whatever you please under any OS. Or under DPS, but you don't have a dps machine.
I am not sure I get this, we may be thinking somewhat differently on how you will implement that. My understanding of your idea is that you will take the disk image - say a 20G one - and copy it elsewhere in 512 byte (say) pieces by skipping those 512 byte pieces which are all 0 (or all "deleteddeleted", whatever you opt for). You just write (say in another file) where these blocks were, position:length.
Oops, I think I get it now. You can know which sectors have been full of "deleteddeleted" but how will you know you don't have to restore them? What if there has been an allocated sector full of exactly this pattern? Maybe it is still practical to come up with some 512 byte pattern which will never occur on normal disks but this will be a vulnerability, perhaps an acceptable one. Clearly the "all 0" I thought earlier of is not applicable then. But I'd go for restoring all data (at least have the option to do so if the shortcut does not get me anywhere). At say 20M/S a 20G disk would get restored in what, 1000 seconds. Not that long for such a massive intervention I suppose.
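The "skip the marker sectors" copy described above can be sketched in a few lines. This is a hedged illustration, not anyone's actual tool: the 512-byte `MARKER` pattern, the file names, and the `position:length` index format are all assumptions, and the sketch inherits the vulnerability just mentioned (a live sector that happens to equal the pattern is silently dropped).

```python
SECTOR = 512
MARKER = b"deleted!" * 64   # hypothetical 512-byte fill pattern

def image(src_path, out_path, index_path):
    """Copy only non-marker sectors; record live runs as position:length."""
    with open(src_path, "rb") as src, \
         open(out_path, "wb") as out, \
         open(index_path, "w") as idx:
        pos = run_start = run_len = 0
        while True:
            sec = src.read(SECTOR)
            if len(sec) < SECTOR:
                break
            if sec == MARKER:
                if run_len:                         # close the current run
                    idx.write(f"{run_start}:{run_len}\n")
                    run_len = 0
            else:
                if run_len == 0:                    # start a new live run
                    run_start = pos
                run_len += SECTOR
                out.write(sec)                      # only live data is kept
            pos += SECTOR
        if run_len:                                 # close a trailing run
            idx.write(f"{run_start}:{run_len}\n")
```

The restore side would walk the index, seek to each `position`, and write `length` bytes from the payload back onto the medium, filling everything else with the marker (or zeros).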
Hey, if 25C is "almost" comfortable then I don't know what is "really comfortable" in your book :D . We already hover between 0 and 5C, sometimes 10C, winter is coming... (we do hate it).
------------------------------------------------------ Dimiter Popoff, TGI
When you post to comp.arch.embedded, thread drift is a risk - when you cross-post to sci.electronics.design, thread drift is guaranteed or your money back :-)
I think there are four ways to identify the data that you need to store:
You have ruled out 1 and 2 by not allowing the imaging system to know anything about the filesystem. I doubt that 3 is practical - certainly not for a general purpose filesystem. That leaves number 4 - identify the dead data from its contents (alternatively, fill it with highly compressible zeros).
It's easy enough to zero out the whole disk before using it, which gives you a start. Writing out a large file full of zeros, as you suggested before, is the only general-purpose way to put zeros into the empty space. I gave some disadvantages of that earlier, but it may be the only general solution given the constraints you have given yourself.
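For what it's worth, the "big file full of zeros" trick can be sketched as follows. A hedged illustration: the file name and chunk size are arbitrary, and the `limit` cap is an addition of mine (it is not part of the idea above, it just makes a partial fill possible and the sketch testable).

```python
import os

def zero_fill(mount_point, chunk=1 << 20, limit=None):
    """Write zeros until the filesystem is full (or limit bytes), then unlink.

    Returns the number of bytes written, i.e. roughly the free space zeroed.
    """
    path = os.path.join(mount_point, "ZEROFILL.TMP")
    zeros = bytes(chunk)
    written = 0
    try:
        with open(path, "wb") as f:
            while limit is None or written < limit:
                try:
                    f.write(zeros)
                    f.flush()
                    os.fsync(f.fileno())   # push the zeros onto the medium
                except OSError:            # ENOSPC: the filesystem is full
                    break
                written += chunk
    finally:
        os.unlink(path)                    # free the space again
    return written
```

After the unlink, the formerly free space holds zeros, which any generic compressor (gzip, etc.) will squeeze down to almost nothing.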
On Linux, an alternative technique would be to look at the "fstrim" command - intercept the generated SATA TRIM commands and replace them with commands to write zero blocks. I believe that should be safe, and it should let you zero out all the dead data. You could even just store the fstrim output and use that as a list of dead blocks, if you have some way of getting the information to the imaging software - but be sure nothing more is written to the filesystem after the fstrim.
This suggests two different setups to me. One should be a clonezilla server system that will work for the windows systems (and also, I think Macs and "simple" Linux or BSD setups). This will cover the bulk of your systems, especially all of your pro bono stuff. The Windows systems will all be identified automatically, and since the filesystem is known, there is no problem imaging just the live data.
Then you have another system for your "weird" systems. You may find that these ones are small enough that you can just image the entire disk with "dd". Yes, it is a little inefficient - but it will be safe, reliable, and easy to understand and implement.
And if you walk under a bus, or decide to retire, there is a chance that someone else will understand how it all works.
For unusual systems, images will be taken much more often than they will be restored - the image is for emergencies. For duplicating (nearly) identical machines, images will be "restored" much more often than they are made. These are different purposes, and I believe you should be looking at different setups for these purposes.
OK, I've got a lot better picture of your aims now. And as I said above, I really think you should separate this into two systems - I believe each will be much easier than trying to make a general system that covers both.
On a sunny day (Mon, 03 Nov 2014 13:51:59 -0700) it happened Don Y wrote in :
Well, by now I think I've figured out the essence: what you are doing is impossible.
Well, I told you all that in my first reply (with script). Use fdisk, remove all partitions, make new ones, put one or more decent filesystems on it, and untar the data (you DID have some files at some point, or else this discussion is pointless, or file-less) and you're done. If not, install Linux on it and start fresh all over.
Look, you want to compress a partition image without knowing what filesystem is on it, that leaves gzip (or maybe zip).
Get a life.
It's been the reduction in seek times over the years that's driven the increase in rotational rates. 2400 RPM was a semi-standard for much of the sixties, when average seek times were 40+ms (at which point an average 12.5ms latency is not the major contributor to access time). In the seventies and eighties IBM (and others) went to 3600 RPM (8.4ms latency), as seek times slowly dropped (30ms average on the 3330 in 1973, to 17ms on the 3380 in 1983, and then to 9.5ms on the single density, and 4200 RPM, 3390s in 1989).
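A quick sanity check on the latency figures quoted above: average rotational latency is simply the time for half a revolution, 60/RPM/2 seconds. Illustrative Python:

```python
def avg_latency_ms(rpm):
    """Average rotational latency (half a revolution) in milliseconds."""
    return 60.0 / rpm / 2.0 * 1000.0

assert round(avg_latency_ms(2400), 1) == 12.5  # the sixties semi-standard
assert round(avg_latency_ms(3600), 1) == 8.3   # the ~8.4 ms quoted above
assert round(avg_latency_ms(4200), 1) == 7.1   # the 4200 RPM 3390s
```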
On the flip side the biggest contribution to seek time reduction since the 80s has been the reduction in disk diameter. Prior to that the 14 inch size was very common on large systems; the 3390s reduced that to 11 inches. Modern 15K RPM drives are usually 2.5 (or so) inch platters, even when they're in a nominal 3.5 inch chassis. 3390s are still the DASD of choice in MVS/zOS, but are now, of course, emulated on top of ordinary fixed-block disks, and the performance characteristics are quite different, and larger 3390 model volumes are not slower (although they may have more contention for an I/O path if parallel access volumes are not in use). So these days, most people use 3390-27s or -54s (which were never real devices, but the common names do indicate the relative capacities), unless they're using EAV, which allows more than 64K cylinders, but the 3390 track geometry is still maintained.
Yeah, well... I can *try*! :>
Agreed. Though doing any "in-band" signaling opens the door to coincidentally encountering that same data in the LIVE data. You're trying to store a sector's worth of data *plus* one extra bit -- "This data is live" -- in a sector's worth of data (WITHOUT that extra bit!)
In that case, you risk silently corrupting the restored image if you opt NOT to restore it (on the assumption that it is DEAD and, thus, "don't care"). I.e., the contents of that sector may no longer coincide with what they were -- yet restoring the metadata elsewhere on the medium will tell the recreated filesystem that those contents are actually "valid"/live!
Note that you can use anything to mark the sector (when you "fill" it with dummy files) -- as long as you recognize that "pattern" as indicative of "this is one of the sectors that I filled -- and then deleted -- so is (probably) empty"
Doing this before "building" the system is counterproductive (it was my initial approach). Far easier/quicker to write a tight routine that just pumps zeroes (or any data) out than it is to try to co-operate with an OS that has been installed.
But, the software installation process (esp windows) dirties far too much of the medium (temporary files, etc). So, you have to go back and fill "empty space" (defined by "wherever the OS lets you create a new file!").
The appeal of the fs-agnostic approach is that it (should) work universally. And, "restores" would be the same effort regardless of platform ("imaging" would vary from platform to platform as file creation, etc. would have to operate *in* the host environ)
The pro bono systems have to be recoverable without my involvement. And, as I mentioned, expecting a (homeless) student to keep track of a "restore DVD" (or three) is an invitation to ongoing involvement ("Can't you make me a new DVD?" "Yeah, and how long before you misplace THIS one?")
So, I would have to install Clonezilla on the "recovery/restore partition" on each of these machines. Then, dumb it down so it was a turn-key operation for them. So, you put this big OS in place that tries to bring up all the hardware in the machine (instead of just the disk), then take away all of that functionality -- just to be fs-aware...
The laptops that I have in front of me have four (MBR) partitions. At least two of which are not recognized by Clonezilla (CZ). So, it just does the dd | gzip trick when commanded to image/restore those. [I'd also have to script CZ so that it would restore all three of the "non-recovery" partitions.]
And, I'm not sure how the UEFI (secure boot issue) will come to bite CZ *and* "my proposed solution" when W8 machines start appearing in donations! (a couple of years, tops?)
As for the machines that I have here, the SPARCs are problematic for CZ. And, the (x86) *BSD machines have a different disklabel(8) approach which doesn't expose them via the MBR.
I.e., I can't even image (ALL) my "computers", here, with CZ! [That doesn't even address the appliances!]
I think the printers are probably small enough (...)

> And if you walk under a bus, or decide to retire, there is a chance that someone else will understand how it all works.
When I'm gone, my systems are coming with me! :>
As for the pro bono stuff, my "absence" has already shown that entropy quickly governs! E.g., they can't even keep track of the *two* "privileged" account passwords on the systems I built for them last year! And, their (contract) IT guy is only a 20W lightbulb... as evidenced by his solutions to trivial problems: "I need to install some new printer drivers on these machines" when, in fact, the "problem" was a loose cable on the back of the printer! (Gee, the machines WERE working. Then, ALL of them stopped being able to print. Their configurations are LOCKED DOWN. Wouldn't you go looking for something OUTSIDE the machines -- like the PRINTER -- for the cause of the problem??)
That may not be the case. E.g., I would only image the printers once. Install all the cruft that they "need", image them, then USE them in that configuration, thereafter. (getting stuff onto them is more tedious than "real computers" so the advantage of the image can pay off big if I have to replace or upgrade a disk)
Off to vote...
(sigh) And it is obvious that you didn't look through the sources.
Given the above, I suggest you "get an education".
(snip, I wrote)
But note that increasing rotational rate decreases capacity if you can't increase the bit rate. Also, at constant bit rate, decreasing rotational rate increases capacity.
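The trade-off above is just arithmetic: at a constant read/write bit rate, one track holds the bit rate times the time per revolution, so halving the RPM doubles the track capacity. A minimal illustration (the numbers are made up, not any real drive's):

```python
def bits_per_track(bit_rate_hz, rpm):
    """Bits per track at a constant bit rate: one revolution's worth."""
    return bit_rate_hz * 60.0 / rpm

# Halving the rotational rate doubles the capacity of each track:
assert bits_per_track(24_000_000, 1800) == 2 * bits_per_track(24_000_000, 3600)
```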
Along with putting a cache local to the drive.
As I understand it, 3390 always emulated CKD using fixed-sized blocks, but in earlier models they were internal to the drive and controller. Later on, they were emulated using ordinary FB-512 drives. (I believe this is visible in the blocks per track calculation. That physical gaps between C, K, and D don't exist like on previous drives.)
But there was discussion (though right now I can't find a reference) of a physical 1200 RPM drive, which I believe was the 3390-27, to increase capacity without increasing bit rate by slowing the drive. (At some point, bit rate is limited by head inductance and other physical factors.)
But yes, with FB-512 emulation, that may have been an early favorite for emulation, with its nice large size.
On a sunny day (Tue, 04 Nov 2014 08:27:39 -0700) it happened Don Y wrote in :
No, you are clueless, and try to invent things that have already been invented. Try: gzip -c /dev/sdaX > my_gzipped_partition.gz
If ANY regular structure is present on that device, then it will be replaced with some token.
You're rude too; well, no worry, I'm sure you will not invent a better gzip...
You are clueless!
On Tue, 04 Nov 2014 08:31:57 -0600, Robert Wessel Gave us:

1.8 inch. Usually in a 2.5" form factor case.
On-the-drive caching has also made up for a lot of the problems fragmentation DID cause for FAT in the early years.
Fragmentation WAS a problem, when all those seek transitions across those expansive platters added up.
Now, with full sector caching, and even full cylinder caching, all this goes away.
Fragmentation is NOT a problem today, even when a drive IS fragmented, and ONLY poses a small problem for the pro set, and they manage it out 100% on a daily basis.
It could, but usually did not because the client usually could not give it a size. Typically, a Bullet file was allocated in one or more largish extents and then consolidated when the file was closed. Any copies made of an existing file - e.g., to/from a remote server - always were contiguously stored.
True, but it's been the management of the physical magnetic spot size that's been the main driver in recent decades, not so much the actual data rate.
The slow (one-third speed) 3390-9s are an interesting case. The normal speed 3390s (-1/2/3s) were already maxing out the data rate of parallel channels, so 3390s with triple the physical track capacities would have been exceptionally poor performers, needing all I/Os to go completely through the speed matching buffers. OTOH, ESCON was announced less than a year before the 3390-9s, and that *could* have handled the higher data rate, but 3390-9s could, IIRC, still be attached to parallel channels. OTTH, tripling the data rate would likely have required a much bigger electronics upgrade in the 3390 (and 3990 controller). So while I don't know for sure, I've always assumed that the -9s were more a product of opportunity, building on the existing 3390 hardware, and having to live within some of the limits imposed by that, and not really any demonstration of the state-of-the-art of disk technology (which had already passed to fixed block devices by that time anyway - IBM itself shipped the 0681 5.25 drive, with 850MB, just a few months after the 950MB 3390-1s were announced).
Cached accesses don't really count as "normal" accesses with seek time and rotational latency. FSVO "local", even 3380s and 3390s could be attached to controllers with cache.

3380s and 3390s both physically used fixed-size cells on disk. In the case of 3390s, these were 34 bytes in size. There was still (considerable) overhead for the key and data segments, but it all rounded to an integer number of 34-byte cells. The size (in cells) calculation for 3390s is on page 10 of:
But even if there was not a physical, 2314-style gap between the count, key, and data sections, there was still overhead. You needed roughly 4% more cell bytes than your key or record, *plus* there was a fixed overhead of nine cells for each segment (plus ten cells for the count, in all cases), which is certainly a gap of sorts.
The net result is that there were gaps, and the exact calculation is different, but the general form is the same. On the 3350, for example, you assumed overhead of 185 or 267 bytes (the latter if you had keys) for each record - that's pretty much the same concept as 19 or 29 cells overhead per record on 3390s.
We can quibble over whether or not any of that is emulation. 3380s were similar, although with 32-byte cells and a somewhat simpler overhead calculation.
My post would have been clearer, had I not managed to clip the first paragraph:
"The "real" single, double and triple density 3390s (model-1/-2/-3) all ran at 4200 RPM (faster than the preceding 3380s at 3600 RPM). The 9X density 3390-9s were the ones that ran at a third that speed."
There were never "real" -27s or -54s, it was the -9s which were slow.
All of the 3390s had the same track geometry, and pretty much all of the 3390 emulations work on a per-track basis. IOW, they simulate, in some form, what you can store in the 1749 physical 34 byte cells of the (visible) track on any 3390 device. 3390-1/2/3/9s all had ~56KB tracks and 15 of those per cylinder, as far as the OS could tell, emulation of a particular model really just altered how many cylinders were on the emulated volume. The large volumes came as the OS's cleaned up their support for larger number of cylinders.
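Rough arithmetic tying the track numbers above together. The 4% figure is the earlier post's approximation, so this is only a sanity check, not an exact 3390 capacity calculation:

```python
cells, cell = 1749, 34                   # visible cells per track, cell size
raw = cells * cell                       # 59,466 bytes of raw cells/track
data_cells = cells - 10 - 9              # minus count (10) + segment (9) cells
usable = int(data_cells * cell / 1.04)   # minus the ~4% encoding overhead

assert raw == 59_466
assert 55_000 < usable < 58_000          # lands near the ~56KB the OS sees
```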
I have often used the "fill the file system with a file that is all zeroes" trick you already mentioned, so I cannot add anything new for that, other than a "+1, yes this works".
On 04.11.2014 at 16:11, Don Y wrote:
And the true problem with it is that it cannot possibly exist. Your approach is _not_ fs-agnostic. On the contrary. It doesn't even manage to be OS-agnostic.
You can't even begin to implement any such "fill disk before creating image" approach without violating that "fs-agnostic" idea. You have to know what that FS is, then you have to be running some OS that knows not just what the FS is, but actually knows how to _write_ to it without corrupting anything. Then you have to run (or even create) a program that writes data to that filesystem, on that OS platform.
Any overall procedure that can reliably tell which parts of a disk partition are currently used by the file system, and which aren't, must contain some part that _is_, for all practical intents and purposes, an implementation of that file system.
You can throw the problem over the nearest fence and claim it's gone, but you can't actually make it go away. E.g. using Linux or the FS's usual host OS to mount the partition and write to it does just that: it "removes" the need for knowing the FS's innards by pretending that Linux can do the job without needing that knowledge. Well: it couldn't.
In short: it is brutally obvious that it is impossible to do what you're trying without diluting that "FS-agnostic" promise down to the kind of meaningless gobbledygook we engineers learned to expect from the marketroids.
On a sunny day (Tue, 04 Nov 2014 21:35:22 +0100) it happened:
You're making the assumption that all those zeros will fill up the disk.
If the filesystem does data compression, you're just burning processor cycles for no gain.
If the file system does hash-based storage sharing, you're running in place and the task will only complete when you run out of file names.
-- umop apisdn
To be clear, by "user" I mean can a human being walk up and enter "arbitrary text" into a file, etc.? I.e., I had assumed your filesystem handled files that the *instrument* created (e.g., observational data, instrument generated reports, etc.). Could a purchaser store his email addresses in a file called MYADDRS.TXT? Could he, likewise, create a file filled with repeated strings of "Kilroy was here!"?
Again, my questions are meant to clarify that a *user* can do these things on demand -- not just the *instrument* deciding that it needs to do a "COPY", etc. (for its own purposes)
OK. So, I *could* fill YOUR disk with files containing a specific 512-character string (or larger). Then, once the OS complains "no space left on device", I could delete them all thereby freeing up all that space -- yet, leaving that 512-character string on the media (in the deleted files).
I would encode their locations "in-line". So, you open a *bit* stream and start pulling bits out of it in ~512 *byte* chunks. Any sector that was OBSERVED as having the "magic string" in it would be represented by a single bit in this bitstream. So, when *restoring* it, if the bit is set, you generate a copy of the "magic string" that you then store in this sector on the medium. If the bit is NOT set, you take the next 512 bytes worth of *bits* and store them as the "live data". Then, move to the next sector and repeat the process. I.e., a sector either takes a (single) bit or 512*8 bits to store in an image.
(You can compress this bitstream separately to achieve even higher compression ratios)
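The encoding described in the two paragraphs above can be sketched directly: one bit for a "magic string" sector, a zero bit followed by the raw 512*8 data bits for a live sector. This is a hedged illustration of that scheme, not the actual tool; the `MARKER` pattern is a placeholder.

```python
SECTOR = 512
MARKER = b"deleted!" * 64           # hypothetical 512-byte "magic string"

def encode(disk: bytes) -> bytes:
    """One bit per marker sector; 1 + 512*8 bits per live sector."""
    bits = []
    for off in range(0, len(disk), SECTOR):
        sec = disk[off:off + SECTOR]
        if sec == MARKER:
            bits.append(1)                    # "deleted" sector: single bit
        else:
            bits.append(0)                    # live sector: flag bit ...
            for b in sec:                     # ... then the raw data bits
                bits.extend((b >> i) & 1 for i in range(7, -1, -1))
    out = bytearray()                         # pack MSB-first, zero-pad tail
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        out.append(byte << (8 - len(chunk)) % 8)
    return bytes(out)

def decode(stream: bytes, n_sectors: int) -> bytes:
    """Rebuild the full medium contents from the bitstream."""
    bits = [(b >> i) & 1 for b in stream for i in range(7, -1, -1)]
    pos, disk = 0, bytearray()
    for _ in range(n_sectors):
        if bits[pos]:                         # set bit: regenerate pattern
            disk += MARKER
            pos += 1
        else:                                 # clear bit: next 512*8 bits
            pos += 1
            for _ in range(SECTOR):
                byte = 0
                for _ in range(8):
                    byte = (byte << 1) | bits[pos]
                    pos += 1
                disk.append(byte)
    return bytes(disk)
```

A mostly "empty" medium thus costs about one bit per dead sector before the bitstream itself is even compressed.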
I *don't* know that! I *do* restore them. But, I don't need to save that "magic string" IN the image (for each such "deleted sector"). Instead, I can save a single bit (or less) and know that this bit represents that "magic string" -- regardless of whether or not the magic string is part of a deleted file (as would most typically be the case -- given that I would have written them deliberately as the contents of those files that I used to fill the medium).
Then I will have recreated that pattern! See above. Note that you MUST restore the entire contents of the disk as you have no idea what sort of "corruption/damage" it may have experienced (i.e., to have necessitated the "restore" operation). You can't even count on the disk to have any legitimate vestiges of the previous filesystem remaining (intact) on it!
You don't want to have to "format" the medium and build a filesystem BEFORE you restore the data. That would require a specific algorithm for each filesystem that you needed to recreate -- BEFORE you even started restoring the DATA! (e.g., it is not uncommon to have two or three different filesystems on a SINGLE Windows laptop's disk) [So, if you are only going to selectively restore parts of the image, then you need another operation that will ensure any other disk structures required by the SYSTEM (incl filesystem) are present, as well! E.g., do a "format" before doing the restore...]
I don't think it is worth the effort. Having to reexpand the "single bit" into that magic string AND write that string onto the physical disk eats up time during the restore operation. But, even 100+GB isn't going to take forever! Remember, I am contrasting this with having to rebuild a system from scratch! (how long does it take to reformat the disk, install the OS, install all the applications, configure things, etc.) And, this operation could be performed unattended (which would not be possible if rebuilding the system from scratch!)
The key here is that I am imaging the system AFTER it is "built". It is not an ongoing operation: build system, image, add more apps, image again, add even more apps, image yet again, change some part of the configuration, image once more...
So, you can size the partition (medium) to fit the "live data". Let the user store *their* data on another partition (that I don't have to be concerned with).
This doesn't necessarily want to be a "free" operation. If you are using it often, there is something wrong with your usage habits! E.g., I've set up one of our laptops that we use EXCLUSIVELY for on-line financial transactions such that it wipes the disk after each reboot. Essentially equivalent to running off R/O media (as the machine is only powered up to perform the necessary transaction(s) and then powered off/scrubbed)
Well, it's been above 30 (35 to 40-ish) for MONTHS so being able to stand outside without melting is a huge improvement! :> Already amusing to see the "faint hearted" wearing jackets, gloves, etc. :-/
Unfortunately, too many other activities come with the improving temperatures so it gets harder to keep control of your time... :<
I usually reserve the last quarter (of the year) for equipment/tool upgrades/replacements, organizing the accumulated mess in the office, closing out the books, etc. Hard to do any "real work" as there are lots of activities that eat into your waking hours (parties, friends returning to town for the cooler weather, community events, etc.). And, a fair bit of time baking for the holidays.
I've got four specs that I would be *thrilled* to have formalized by year end so I can start work on implementing them after the new year. Plus a couple of significant chores that I need to tackle before the weather gets much cooler (pour some concrete, do some outdoor painting, finish laying some irrigation line, some body work on SWMBO's vehicle, etc.)
(sigh) No time for "work"! ;-)
"Blank" doesn't mean "full of zeroes". Rather, it means devoid of data. I explored various different "magic strings" to fill the "to be unused" portions of the disk.
One algorithm takes a fixed (const) 512-byte array and repeatedly pushes it into file(s) until the current file's write(2)'s fail and/or further creat(2)'s fail. Then, unlinks all of this.
I've tried with "compressible" patterns in that array as well as "less compressible" patterns (i.e., where the compressor would have to operate over longer distances -- between sectors instead of within *that* sector).
I have another that creates "random" data to fill the "to be unused" portions of the medium (the goal being to make compression *hard*).
In each case, those portions of the medium are "blank" when the files are later unlinked. Yet, completely restorable! The amount of space they require in the image can vary significantly -- hence the purpose of the experiment(s).
See above. You can push uncompressible data into those "empty"/unused parts of the media -- yet still restore it from a HIGHLY compressed form! You just have to choose data that the COTS compander can't easily compress -- but that *you* can!! :>
You can't come up with a naive "one-size fits all" approach. E.g., some filesystems may place limits on how large a file can be. So, you have to be able to create multiple files! (e.g., imagine FAT32 with >2G of space available)
Some filesystems may place limitations on how many names can fit in a container. So, you need to be able to create *new* containers (to let you restart the "per container" name count).
Some filesystems may have limits on the character set used for the names and the number of characters in an identifier. So, you have to be able to adjust this to support all of the above.
But, it's still an amazingly simple piece of code! And, doesn't need to understand any of the particulars of the filesystem it is deployed on. It just needs to be able to create/write/unlink files!
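That fill loop, with the per-file size limit and per-directory name limit handled, can be sketched roughly as below. Hedged heavily: the pattern, directory naming, and limits are illustrative, and the `max_total` cap is an addition of mine (not part of the scheme above) so the sketch can stop short of filling a real disk.

```python
import os
import shutil

PATTERN = b"deleted!" * 64        # hypothetical 512-byte fill pattern

def fill_and_unlink(root, max_file=2**31 - 512, max_names=500, max_total=None):
    """Fill the medium with PATTERN, then unlink it all.

    Rolls to a new file at max_file bytes, to a new directory after
    max_names files, and stops on ENOSPC (or at max_total, for testing).
    Returns the total bytes written.
    """
    top = os.path.join(root, "FILL")
    total, d, full = 0, 0, False
    while not full:
        subdir = os.path.join(top, f"D{d:04d}")
        os.makedirs(subdir)
        for n in range(max_names):            # per-directory name limit
            path = os.path.join(subdir, f"F{n:04d}")
            try:
                with open(path, "wb") as fh:
                    written = 0
                    while written < max_file:  # per-file size limit
                        if max_total is not None and total >= max_total:
                            full = True
                            break
                        fh.write(PATTERN)
                        written += len(PATTERN)
                        total += len(PATTERN)
            except OSError:                    # write(2)/creat(2) failed: full
                full = True
            if full:
                break
        d += 1
    shutil.rmtree(top)                         # unlink it all: "blank" again
    return total
```

The point stands: nothing here knows anything about the filesystem beyond "it can create, write, and unlink files".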
Remember, any filesystem has to be *usable*! FAT16 would be silly on a TB medium! So, if my "fill" code was run on that medium, you wouldn't fault it for not being able to fully consume the available free space!
Hey, of course you can do that. To cut things short, all of my programming is done under DPS, using its text editor, writing dps shell scripts etc. etc. The emails we exchange have never been here on a disk other than a dps one. *All* my design & programming work is done under dps, I can happily survive the day when windows and unix disappear from the face of the Earth :-).
Well yes, though mostly using command line (shell). I have started a file browser thing (began to make sense to have one once I introduced the longnamed directories) but at the moment it is on hold, other tasks of higher priority are in the way. Hopefully I can resume next month.
Yes, this would work OK.
Dimiter (sorry for cutting it somewhat short, those tasks of "higher priority" have me at the moment....)
There are many different ways of "conditioning" the media to minimize the size of the image that you obtain (obviously, tailoring the imaging process to the conditioning that you employed). I started out by characterizing the content of each of the machines that I have here along with those to which I have access. Then, exploring how well various archivers (an obvious choice for an imaging solution!) process each of these (I've encountered some *spectacular* compression rates with home-grown archivers that far outpace what can be done with generic, OTS companders... rather easy to achieve when you control the data that is being compressed! ;> )
As I'm after a "restore" solution and not a "backup AND restore" solution, far more effort can be expended in creating the initial image if the restore can be relatively clean. Just like an archiver/compander, I can look at the content, evaluate multiple different imaging approaches (different companders), then choose the appropriate one to implement "portably" to give me a common "restore" algorithm that handles multiple different targets with comparable results on each. I.e., store an "image type" code in the image that then drives the restorer!
Being stuck living with the "GCD" of operators that are "always" exported by a disk operating system means you can really only safely create/delete, read/write files -- and possibly subdirectories. Unless you want to tie your solution to a particular OS/filesystem (which, in my case, would mean solving the same problem a dozen different times -- especially as the next batch of donated machines might introduce some new, "proprietary" filesystem that must be reverse-engineered/accommodated). Of course, anything you want to do to the medium *before* the OS and applications are in place gives you free rein! :>
Similarly, the restore operation wants to run on bare metal so it needs to be well defined without any supporting framework.
So far, it looks like I can achieve image sizes on a par with that of archivers (which KNOW about the OS on which their archives were created) WITHOUT specific knowledge of the OS/filesystem. (Of course, *I* don't have to deal with unconstrained data/environments so it's an unfair comparison!)