Archiving very old paper diagrams, drawings and text

Sheesh, you're extravagant! Penciled design sketches couldn't possibly deserve more than 200 dpi scans. And 46 MB per scan means that you aren't using compression.

I scan things like tax returns, medical receipts, letters to scholarship funds, etc. For the most part, I go for the minimum resolution, minimum color depth scan which will preserve the information that I need. Typically that means

150 dpi black-and-white, saved in PNG format.

Full color scans are for old (pre-digital) family photos, which I convert to JPEG at a modest compression ratio.

Jerry

Reply to
Jerry

Could you mention the tools that you use to do this? Also, the processing steps for docs that were meant to be black and white when made. Thanks in advance.

--
--Larry Brasfield
email: donotspam_larry_brasfield@hotmail.com
Above views may belong only to me.
Reply to
Larry Brasfield

I scan as "text" at 150dpi into PDF's.

But REALLY GOOD STUFF I re-enter into PSpice Schematics, then print to PDF.

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
|  Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
|  Phoenix, Arizona            Voice:(480)460-2350  |             |
|  E-mail Address at Website     Fax:(480)460-2142  |  Brass Rat  |
|       http://www.analog-innovations.com           |    1962     |
             
I love to cook with wine.      Sometimes I even put it in the food.
Reply to
Jim Thompson

Those are unbelievably large files, even at 400dpi (which is good enough to record the printing details of postage stamps). What format are you saving these with? I would volunteer to show you how to shrink one of these while retaining effectively all visible information, but my mail box is limited to 5 meg.

Reply to
John Popelish

Digitizing any art that is not produced as pure black and white always results in distortion, even before the compression is done. This makes lines look jagged-edged. If, instead, you reduce the image to just a few shades or colors (say 4 bits, 16 levels), black and white line art has enough levels to produce visually smooth edges. GIFs can be encoded with any power of 2 levels (up to 8 bits, 256 levels, I think). This makes them just a bit larger than pure black and white, but gives a much cleaner visual appearance.

JPEG compression is designed for photographs, not line art. Its compression distortion is less noticeable on continuous tone images than high contrast line art. For a given amount of visible distortion (hard to compare when the distortion is of two different kinds) GIFs or the similar PNG compression usually results in a smaller file for line art.

It also helps if you do a bit of preprocessing before lowering the number of color levels, like edge preserving smoothing and despeckling. This really improves the compression efficiency, as well as the visual appearance.
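As a rough illustration of what a despeckle pass does, here is a minimal sketch using a plain 3x3 median filter in pure Python. The median filter is an assumption about the general kind of filter involved, not a description of Paint Shop Pro's actual algorithm, and the name `despeckle` is made up for this example.

```python
from statistics import median

def despeckle(img):
    """Apply a 3x3 median filter to a grayscale image (list of rows of ints).

    Border pixels use only the neighbours that actually exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out

# White page (255) with one isolated dark speck in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = despeckle(page)
print(cleaned[2][2])  # the speck takes the value of the surrounding paper
```

A 3x3 median also wipes out strokes that are only one pixel wide, so a filter like this suits scans where the lines are several pixels thick. Removing isolated specks leaves longer uniform runs of pixels, which is what lossless compressors exploit.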

Reply to
John Popelish

I usually use Paint Shop Pro 7.04, but an earlier version, 4.1, which I downloaded as shareware, had most of the tools.

In PSP7, the tools I often use for line art and text scans done at 256 gray levels or full color are: Effects, Noise, Edge Preserving Smooth and maybe Effects, Noise, Despeckle.

Then I adjust the brightness and contrast to clean up the blacks and whites. Then reduce to 16 shades of gray before saving as GIF or PNG.
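The two steps just described (clean up the blacks and whites, then drop to 16 shades) can be sketched in pure Python as below. This is only an approximation of the idea: the cut-off values `black=60` and `white=200` and both function names are assumptions chosen for illustration, not settings taken from Paint Shop Pro.

```python
def stretch_contrast(img, black=60, white=200):
    """Map values <= black to 0 and >= white to 255, linearly in between."""
    scale = 255 / (white - black)
    return [[min(255, max(0, round((v - black) * scale))) for v in row]
            for row in img]

def reduce_levels(img, levels=16):
    """Quantize 8-bit gray values to `levels` evenly spaced shades."""
    step = 255 / (levels - 1)
    return [[round(round(v / step) * step) for v in row] for row in img]

# One scan line: grayish paper, the soft edge of a stroke, dark ink, and back.
scan = [[250, 230, 120, 40, 30, 110, 240]]
cleaned = reduce_levels(stretch_contrast(scan))
print(cleaned[0])
```

The paper and ink end up as pure white and pure black, while the edge pixels keep an intermediate shade, which is what makes the lines look smooth at 4 bits per pixel.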

Before resizing down a high resolution image that has already been reduced to 2 levels, I often increase the gray levels to 256 and use: Effects, Blur, Gaussian Blur, 0.7 to 2 pixels, to soften the jaggies a bit.

Then after resizing (using the bicubic resample), I slightly increase the contrast to get back to cleaner blacks and whites (that were softened by the blur and resample) and then reduce the gray levels to 16 to preserve enough of the shading at the edges to smooth them out, visually. Saving at 4 bits per pixel makes a larger GIF file than saving at only 1 bit per pixel, but the visual improvement is worth the space for the visually more pleasing image. This is especially true of pencil drawings, where the line darkness varies quite a bit.
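To show why shrinking a 2 level image into 16 gray levels smooths the edges, here is a minimal sketch. It uses simple block averaging as a stand-in for the Gaussian blur plus bicubic resample described above (a deliberate simplification), and the function name `shrink_to_gray` is hypothetical.

```python
def shrink_to_gray(bitmap, factor, levels=16):
    """Shrink a 1-bit image (rows of 0=black / 1=white) by `factor`.

    Each output pixel is the average of a factor x factor block, rounded to
    one of `levels` evenly spaced gray shades in the range 0..255.
    """
    h, w = len(bitmap) // factor, len(bitmap[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            white = sum(
                bitmap[y * factor + j][x * factor + i]
                for j in range(factor)
                for i in range(factor)
            )
            fraction = white / (factor * factor)
            row.append(round(fraction * (levels - 1)) * 255 // (levels - 1))
        out.append(row)
    return out

# An 8x8 patch with a diagonal edge: white above the diagonal, black below.
patch = [[1 if x > y else 0 for x in range(8)] for y in range(8)]
print(shrink_to_gray(patch, 4))
```

Blocks that straddle the edge come out as an intermediate gray instead of being forced to black or white, so the staircase of the original turns into a soft transition.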
Reply to
John Popelish

I am posting on a.b.s.e a resized, 16 gray level example of a high-resolution but 2 level drawing recently posted there. If you saved the original, you can compare quality and file size.
Reply to
John Popelish

I have these things, they have been used a lot in the past, design sketches made with pencil, diagrams, most on A4... The systems (it was for) still exist. Today I started scanning these in with a Canon scanner. This gives about 46 MB per scan (at 400 dpi in photo mode). So I can get about 100 on a DVD. I use photo mode because this way most detail is preserved. Including the coffee spots :-) Since it is mostly diagrams, OCR has little effect here. But at least now I can throw out all that old paper :-)
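The 46 MB figure is about what an uncompressed A4 scan at 400 dpi comes to if "photo mode" means 24-bit color (an assumption about the scanner setting). A quick sketch of the arithmetic, with a nominal 4.7 GB single-layer DVD also assumed and `raw_scan_bytes` being a made-up helper name:

```python
A4_MM = (210, 297)        # ISO A4 page size in millimetres
MM_PER_INCH = 25.4
DVD_BYTES = 4.7e9         # nominal single-layer DVD capacity (assumption)

def raw_scan_bytes(dpi, bits_per_pixel, page_mm=A4_MM):
    """Approximate uncompressed size in bytes of a scan of `page_mm` at `dpi`."""
    width_px = round(page_mm[0] / MM_PER_INCH * dpi)
    height_px = round(page_mm[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel / 8

for dpi, bpp, label in [(400, 24, "400 dpi color"),
                        (400, 8, "400 dpi gray"),
                        (150, 4, "150 dpi 16-level"),
                        (150, 1, "150 dpi black-and-white")]:
    size = raw_scan_bytes(dpi, bpp)
    print(f"{label:26s} {size / 1e6:7.2f} MB raw, {DVD_BYTES / size:8.0f} per DVD")
```

These are raw sizes before any compression, so they only show how much resolution and bit depth alone contribute; a lossless compressed format reduces them further.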

My question related to this is: 'How does everybody else do it?'

And of course I will upload the lot to a free unlimited email account ;-)

Reply to
Jan Panteltje

Think about using a compressed format, and use black and white (or gray scale). Think about data loss this way: unlike a photograph, where the loss of fine detail may not be desirable, line drawings still convey exactly the same information until the data loss reaches a point at which errors in the interpretation of the drawing begin to occur. Detail beyond this point conveys no additional information.

--
Paul Hovnanian     mailto:Paul@Hovnanian.com
------------------------------------------------------------------
"Grant me the strength to change what I can, the ability to accept
what I can't, and the incapacity to tell the difference."
        -- Calvin (of Calvin and Hobbes)
Reply to
Paul Hovnanian P.E.

Doesn't matter how far into the future you can get a CD drive. Once the images are digitized, it's trivial to migrate them from one storage medium to another as each becomes obsolescent. That's just impossible with paper. And, whatever "quality" loss is experienced due to the digitizing process, *that's all*. It'll never degrade further the way paper would.

Plus, once the documents are digitized, it's possible to have several *identical* copies in disparate locations, which adds even more durability to the documents.

Indexing, cataloging, accessing and so forth are also far easier with a set of computer files.

Isaac

Reply to
Isaac Wingfield

300 DPI is more than sufficient. If you treat the documents as line art, then all you have is B&W, so use GIF for archived format - maximum compression with zero loss. Now one could use JPG with a fair amount of compression for smaller files, but the visual result would get "muddier" as the compression increases, and that process has great loss. Unless some part(s) of a document have meaningful greys (shaded art drawing or product photos), use line art B&W ---> GIF. You will save a *LOT* of space and still have the detail.
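To see how well pure black-and-white line art compresses without any loss, here is a small stdlib-only sketch. It uses zlib (DEFLATE, the method PNG uses) rather than GIF's LZW, and the page is a synthetic grid of thin lines, so the ratio it prints is far more favorable than a real scan would give. The function name and page dimensions (A4 at roughly 300 dpi) are assumptions for illustration.

```python
import zlib

def line_art_rows(width=2480, height=3508, spacing=100):
    """Synthetic 1-bit page: a grid of one-pixel-wide black lines.

    Pixels are packed 8 per byte, 1 = white, 0 = black.
    """
    bytes_per_row = width // 8
    ruled_row = bytearray(b"\xff" * bytes_per_row)
    for x in range(0, width, spacing):
        ruled_row[x // 8] &= ~(0x80 >> (x % 8)) & 0xFF   # clear one pixel
    black_row = bytes(bytes_per_row)                      # a full horizontal line
    return [black_row if y % spacing == 0 else bytes(ruled_row)
            for y in range(height)]

raw = b"".join(line_art_rows())
packed = zlib.compress(raw, 9)
assert zlib.decompress(packed) == raw       # lossless: bit-for-bit identical
print(f"raw {len(raw)} bytes, compressed {len(packed)} bytes")
```

The round trip through `decompress` is the point: unlike JPEG, nothing about the image changes, however small the compressed file gets.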
Reply to
Robert Baer

Chances are the paper will live longer than the e-image. Paper lives a *long* time. Disk drives? CDs? The idea is good, but I'm not so sure about the implementation.

...and no, I don't keep paper either. I'm trying to get rid of as much "stuff" as I can.

--
  Keith
Reply to
keith

Try scanning at 100 to 150 dpi and save as 16 color (4bits/pixel) PNG files. I have had little success with B&W scans on anything at all non-uniform in color/contrast. 16 color usually works as well or better than grey scale and takes less space since greyscale is usually 8 bits per pixel.

I have been helping my daughter with a calculus course and with a couple thousand miles between us, the only viable scheme is e-mail with the odd phone call. Questions and answers have been exchanged with scans as above. Files are generally between 50KB and 200KB. i.e. ~50,000 pages per DVD.

Ted

Reply to
Ted Edwards

Ok. I have a tape reel around here from a 60's or 70's-vintage IBM mainframe. Oh, I think I have one of the big (8 inch?) floppy disks, too. I'm sure somebody could still read them now, but for how much longer? That also assumes the magnetic encoding is still strong enough for normal vintage methods.

Assume I had some Betamax tapes. If there were no players available, how much would it cost to recreate one? Is there enough detailed documentation from that period available to actually do it?

What you are saying is true assuming someone is diligent enough to re-encode to state-of-the-art every few years. There is also the possibility of the media losing its contents even assuming the reader devices remain available and functional.

I bought a commercially produced DVD a few years ago. I recently discovered that its entire internal encoded layer had turned brown and no device seems able to read any of what was once there.

In principle our fancy digital encoding techniques seem permanent, but in fact there are many issues that can render the data unobtainable in a surprisingly short time. As in the example above, how do we know the media is any good, other than by allowing time to pass and seeing if we lost the data?

I still like using it, but I have learned that if I really care, I should check and re-burn the stuff periodically.
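One way to do the "check" part of that routine is to keep a list of checksums alongside the archive and compare against it before each re-burn. The sketch below is one possible approach using only Python's standard library; the function names and the manifest layout (a dict of relative path to SHA-256 digest) are assumptions for illustration.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder):
    """Map each file's path (relative to `folder`) to its digest."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*")) if p.is_file()
    }

def changed_files(folder, manifest):
    """List files that are missing or whose contents no longer match."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

Typical use would be to call `build_manifest` once when the scans are first written, store the result with each copy, and later run `changed_files` on a copy to find anything that has gone missing or no longer reads back identically.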

Hey hey, just now thought that copyright violators may be performing a public service by introducing redundancy. Also, removing the encryption may make it easier for future generations to reclaim some interesting stuff.

Actually, I thought this thread was going to be about how to remove the acid from cheap paper of old documents so that it will last for a very much longer time. That would also be a worthwhile discussion.

Reply to
rex

...and paper degrades over time - even if kept mostly in the dark, with controlled atmosphere (eg: the US Constitution). Use leather stored in the caves of Israel...

Reply to
Robert Baer

For many years, I had a 286 *underclocked* to match the original IBM PC 4.7MHz clock, tied to two Qume DT-6 floppy drives (special controller), a 360K, a 720K and a 1.2Mbyte floppy (all 5.25" drives) as well as a 1.44Mbyte 3.5" drive. Was able to read 8 inch floppies from any of the CP/M systems, IBM mainframes, and Unix systems. But as time passed beyond 10 years, the readability of the floppies went sour. Only floppies from high quality makers lasted to about 15 years.

Reply to
Robert Baer

It's said by those who know of these things, that the most permanent archive is simply to photostat the material using an acid free paper.

46MB for a single handcrafted A4 is big. Counterfeit money can be made for less.

I've found a 300dpi JPG scan of a 'busy' A4 page may give a file of, say, 2-3MB, with loss of detail only becoming apparent at the X3, X4 magnification level. A 150dpi (manually increased compression) JPG scan (say 500kB) can still retain a vast amount of detail and (with lesser magnification) appear identical to the 300dpi/original version.

For less intense stuff, i.e. A4 hand circuit sketches (say a dozen transistors, 5 ICs, 30 R/C/Ls etc.) with (normal!) hand printing, I've found it simpler just to stick with a 150dpi JPG scan, as the resulting file sizes are much smaller than a 'standard 8bit' grey scale GIF or PNG, yet still retain the detail.

Simple sketches, or pure black and white, machine/PC produced text/artworks (hard edges) seem best scanned as single bit (i.e. black/white) PNG. An A4 page comes to maybe 30kB to 150kB. PNG format gives about 30% smaller files than the same GIF, yet doesn't carry the licensing mess GIF was dumped with a couple of years ago. Even better are the black/white files resulting from a 'level2 Fax encoding', but I've only ever seen this storage option built into the occasional PDF writer, even though it's part of the PDF spec.

Methods abound to reduce the archive file sizes even further, but pretty much any normal, straightforward JPG/GIF/PNG scan should surely be preferable to losing the will to live as you hang around waiting on a single page, high dpi scan to finish.

regards
john
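For anyone curious what a single bit PNG actually contains, here is a minimal stdlib-only sketch that writes one. It is a bare-bones encoder for illustration (filter type 0 on every scanline, no interlacing, no optional chunks), not how any particular scanner program does it, and `png_1bit` and `_chunk` are made-up names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))

def png_1bit(rows):
    """Encode rows of 0 (black) / 1 (white) pixels as a 1-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)                       # filter type 0 (None) for this scanline
        for start in range(0, width, 8):    # pack 8 pixels per byte, MSB first
            bits = row[start:start + 8]
            byte = 0
            for bit in bits:
                byte = (byte << 1) | bit
            raw.append(byte << (8 - len(bits)))   # zero-pad a short final byte
    # IHDR: width, height, bit depth 1, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))
            + _chunk(b"IEND", b""))

# A 64x64 white page with a black cross through the middle.
page = [[1] * 64 for _ in range(64)]
for i in range(64):
    page[32][i] = 0
    page[i][32] = 0
png_bytes = png_1bit(page)
print(len(png_bytes), "bytes")
```

Writing `png_bytes` to a file with a `.png` extension should give an image any viewer can open. The pixel data is just the packed bits run through zlib, which is why clean line art with long white runs comes out so small.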
Reply to
john

On a sunny day (Mon, 30 May 2005 17:08:57 -0400) it happened John Popelish wrote:

Ah, I stored in .tif format. Of course one can use png. But I want a lossless format, so jpg likely will not do. Yes, the high resolution is needed: when I was young I used to write numbers so small I can now only read them with a magnifying glass. Since it is pencil, storing as pure BW graphics needs a slice level, and that does not work very well (I have tried). But probably I will change what I have to a lossless compressed format, you are right.
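The slice level problem with pencil can be shown with a toy example. The gray values below are invented for illustration (real scans will differ), and `threshold` is a hypothetical helper; the point is only that when a faint stroke is lighter than a stain, no single level separates them.

```python
def threshold(img, slice_level):
    """Convert gray values (0=black, 255=white) to 1-bit: 0 if darker than the level."""
    return [[0 if v < slice_level else 1 for v in row] for row in img]

# One scan line crossing clean paper, a firm pencil stroke, a faint stroke,
# and a coffee stain. The gray values are made up for illustration.
PAPER, FIRM, FAINT, STAIN = 235, 90, 180, 170
line = [[PAPER, FIRM, PAPER, FAINT, PAPER, STAIN, STAIN, PAPER]]

for level in (128, 175, 200):
    print(level, threshold(line, level)[0])
```

At a low level the faint stroke disappears; at a high level it survives, but the stain turns solid black along with it. Keeping a few gray levels instead of one bit avoids having to make that choice.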

Reply to
Jan Panteltje

On a sunny day (Mon, 30 May 2005 18:17:34 -0400) it happened John Popelish wrote:

OK, I will experiment with some GIF formats, seems a good way.

Reply to
Jan Panteltje

On a sunny day (Mon, 30 May 2005 22:12:37 -0400) it happened keith wrote:

Yes, interesting, see this document for some lifetime tests on DVD:

formatting link

Some of these old diagrams I have, have gone all yellow, and are falling apart. Once digital, you can always make a copy to a new medium without losses. Probably, as we will move towards 200GB or bigger blue light DVD perhaps in a few years, you can have all your life's work on one disk ;-) Much better than all these maps, I think.

Reply to
Jan Panteltje
