OT: card storage

Sounds a lot like an X-terminal. Due to volume issues the laptop may yet be lower cost.

Reply to
JosephKK

No. An X terminal has a processor in it, understands the X protocol, has a network interface, etc.

I.e., if I gave you an LCD monitor and a keyboard, you could never run xdm -- unless you added a processor and a NIC. The device I am describing could be used "as a TV" (with an NTSC-VGA adapter) -- something you aren't going to do with an X Terminal.

Reply to
D Yuniskis

I have seen video footage of a machine used by a library (sorry, I did not record the details), which opened the book about 120 degrees. One arm took the next page down to horizontal level, a horizontal glass sheet was put on the page to make sure the page was truly horizontal, the flash light was activated, the glass sheet was removed, and the sequence restarted.

The sequence took about 2-3 seconds. Apparently some auto-focus was used, since the distance between the lens and the paper changed each time a new page was added.

The odd pages could be processed in one run and a separate run would be required to process the even pages (including flipping the page and inverting the picture order).

Put a heavy glass sheet on the page you are photographing; this will flatten the page and give you equal focus across it.

If you go to the trouble of carrying the manuals to the print shop, why not let them scan the pages?

In a law-abiding print shop you may have to prove that you hold the copyright before you can make your own copies.

----------------

Then there is the question of how to store the scanned pages, and also how to distribute the (web) pages in a bandwidth-efficient way.

I previously thought that storing and displaying the scanned pages as simple bilevel (1 bit/pixel) bitmaps (typically run-length encoded, as in faxes) would be sufficient. However, such page images look horrible and the OCR software does not reliably make sense of the text.
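For readers unfamiliar with the fax-style encoding mentioned above, here is a minimal sketch of the run-length step only. It is an illustration, not a real Group 3 coder: actual fax formats go on to replace the run lengths with variable-length codes, and the function names here are made up for the example.

```python
def run_lengths(row):
    """Return alternating white/black run lengths for a row of 0/1 pixels.

    Convention assumed here: 0 = white, 1 = black, and every row starts
    with a white run, so a row that begins with black yields a leading 0.
    """
    runs = []
    current = 0   # colour of the run being counted; rows start "white"
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand(runs):
    """Inverse of run_lengths: rebuild the pixel row from its runs."""
    row = []
    colour = 0
    for n in runs:
        row.extend([colour] * n)
        colour = 1 - colour
    return row
```

Mostly white text pages collapse into a handful of long runs per row, which is why this works well for clean scans and poorly for grainy ones.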

1 bit/pixel is really too little and 8 bits/pixel would be excessive. How many bits/pixel would be sufficient for pleasant visual rendering, or are required by OCR software?
Reply to
Paul Keinanen

I suspect such a device is considerably beyond my *practical* budget! :>

Yes. I do similarly when running the sheets through the document feeder. Once "prepared", I can do 5 or 6 pages a minute -- not too bad but, when you have tens of thousands of pages... :<

1) I didn't realize they could do this; 2) it's probably not inexpensive; 3) copyright issues.

Exactly. It seems like the attitude towards this waxes and wanes. And, no doubt, varies based on who's working on that day, etc.

That is where the manual aspects come into play. You need to review the results of the scan to decide how best to proceed. I've not found any "magic bullet" -- unless you don't care about size (or quality).

It depends on the sizes of the typefaces used. Note that this can vary within a document.

And, whether there are illustrations, etc.

Sometimes, you get really grainy images -- as if there was dust on the scanner (though it is NOT the scanner that is the source of the problem).

For decent typeface sizes, I will use 1bpp at 400-600dpi. This is readable *and* OCR-able (not to be confused with ocre-able -- which is the ability to turn something into ocre!) Other times, I will use 8bpp and drop down to 300dpi (trying to balance the added image depth against the decreased resolution).
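To put rough numbers on that trade-off, here is a small sketch that computes the uncompressed size of one page image. The US Letter page size (8.5 x 11 inches) and the byte-aligned row padding are assumptions made for the example, not something stated above; compressed sizes will of course be much smaller and depend on the content.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page.

    Each pixel row is padded up to a whole number of bytes (an assumption
    that matches how many bitmap formats lay out their rows).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi, bpp in [(400, 1), (600, 1), (300, 4), (300, 8)]:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{dpi} dpi at {bpp} bpp: {size / 1e6:.2f} MB raw")
```

Under these assumptions, 1bpp at 600dpi comes to about 4.2 MB per page before compression and 8bpp at 300dpi to about 8.4 MB, so halving the resolution does not fully pay for the extra depth.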

I wrote some utilities to create *4* bit TIFFs but very few programs will recognize this encoding (despite adhering to the letter of the spec).

I generally avoid the OCR stage as it requires *lots* of proofreading. Images often get mishandled. Text often gets misrecognized (remember, these are "computer manuals" so "pigx" and "pigy" might be real "words" despite the OCR package's attempts to "fix" them into "pigs" and "piggy", etc.). I figure just creating the (electronic) documents is enough of a "donation" so if folks want to grumble, they can go find better versions (hint: most of this stuff is simply NOT AVAILABLE). :>

"If you don't like what I'm serving for dinner, you're welcome to eat elsewhere..."

Reply to
D Yuniskis

My standard comment to those that "make the decision" at places like Kinko's is: "These copies are being made because the technicians at my shop are complete assh**es and destroy original manuals every time they pick one up. Now if I can keep them from writing on the monitor screens with a Sharpie I'd be a happy camper."

Usually I get the eye roll, but after they stop laughing, they authorize the full copying and/or scanning of the documents.

Jeff

--
"Egotism is the anesthetic that dulls the pain of stupidity."
Frank Leahy, Head coach, Notre Dame 1941-1954

http://www.stay-connect.com
Reply to
Jeffrey D Angus

Ha! I'm not sure I want to rely on that sort of response... :<

Reply to
D Yuniskis

All read up. Now I see what you want. The best approach still looks like a really serious hack of a laptop. The monitor part is going to be really tough. Keyboard and one or more pointing devices should be pretty easy. You may have to hack the power brick, or the batteries, or both. Once USB3 (5 Gb/s) becomes common you only need the one interface.

Reply to
JosephKK

I think it is important to keep the distinction between the scanning/storage format on the one hand and the publishing format on the other.

These days 1 TB of storage costs practically nothing (plus another TB for backup). IMHO the source should be scanned and stored with the best available resolution and bit depth, possibly with some very mild compression.

You can then make a 1 bit/pixel encoding with heavy compression for publishing.

After a few years, when better software is available, you can reprocess your digital source archives without rescanning the original documents, in order to produce smaller or higher quality publishing formats.

4 bit/pixel might be a usable format for _storage_, since this can register the varying illumination, the whiteness of the paper and how black the ink is. This might be usable information when postprocessing to 1 bit/pixel.
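As a rough illustration of how that extra information could be used, here is a minimal sketch that reduces 4-bit grey samples to 1 bit/pixel with a threshold placed halfway between the darkest ink and the lightest paper found on the page. Using the plain minimum and maximum as the ink and paper estimates is a simplifying assumption for the example; real postprocessing tools use more robust statistics.

```python
def to_bilevel(pixels):
    """Reduce 4-bit grey samples (0 = black, 15 = white) to 0/1 pixels.

    Returns 1 for ink and 0 for paper. The threshold adapts to the page:
    it sits midway between the darkest and lightest sample, so yellowed
    paper or faded ink does not push everything to one colour.
    """
    if not pixels:
        return []
    ink = min(pixels)      # crude estimate of how black the ink is
    paper = max(pixels)    # crude estimate of how white the paper is
    threshold = (ink + paper) / 2
    return [1 if p < threshold else 0 for p in pixels]


# A yellowed page: paper reads 7, ink reads 2-3. A fixed mid-scale
# threshold of 7.5 would turn this whole row black.
print(to_bilevel([7, 7, 3, 2, 7]))
```

The same row stored as 1 bit/pixel at scan time would already have lost the paper and ink levels, which is the argument for keeping the deeper format as the archive copy.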

As a compromise, you might publish the scans as bit maps, however, it might be a good idea to run your original scans through some OCR software and use the result to build an index. While a "pig" might be a bit unexpected in a computer manual index, there is much less manual proofreading.
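A minimal sketch of that indexing idea, assuming the OCR step has already produced one text string per page (the OCR itself is not shown, and the function name and `min_length` parameter are invented for the example):

```python
import re
from collections import defaultdict


def build_index(pages, min_length=3):
    """Map each lower-cased word to the sorted page numbers it appears on.

    `pages` is a list of OCR text strings, one per page, page 1 first.
    Words shorter than `min_length` are skipped to keep the index small.
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"[A-Za-z0-9_]+", text):
            if len(word) >= min_length:
                index[word.lower()].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


ocr_pages = ["The DROP word", "drop and SWAP", "pigx register"]
index = build_index(ocr_pages)
print(index["drop"])
```

Because the published pages stay as images, a misrecognized word only costs a missed or spurious index entry rather than a corrupted document.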

IMHO the worst problem with scanned documents is that they do not usually contain a searchable index, so including even a somewhat flaky index would be a great service.

Scanning fragile (and often disintegrating) paper documents is a way of preserving our cultural heritage.

Unfortunately, intellectual property laws (with protection times lasting decades after the IP holder's death) may in fact cause a loss of human intellectual heritage.

Reply to
Paul Keinanen

Aye, but it works. That's the bottom line.

Jeff

Reply to
Jeffrey D Angus

Ooooh... I want one of those for plagiarism. If I'm going to break the law, I might as well go first class.

I copy a few service manuals, where the original is in ring binder format. One of my customers has a Canon ImageRunner 5000 copier, scanner, printer, etc. conglomeration. Here's a video clip of it scanning both sides of a service manual: (4MBytes) Unfortunately, scanning large size foldout pages had to be done by hand and usually in pieces. Some of the results are here:

Bottom line is that it's a HUGE waste of time trying to scan anything on a typical home bed scanner. The 180 page AN/SRD-22 manual took about 45 minutes (including screwups) on the Canon ImageRunner. I once did a similar manual at home on my HP bed scanner which took a total of about 6 hours to scan, cleanup, make searchable, and assemble into a document.
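Putting the two figures above side by side, a quick sketch of the throughput arithmetic. It assumes the home-scanned manual was also about 180 pages, which the post only describes as "similar".

```python
def pages_per_minute(pages, minutes):
    """Average throughput over the whole job, screwups and cleanup included."""
    return pages / minutes


canon = pages_per_minute(180, 45)        # ImageRunner: 180 pages in 45 minutes
flatbed = pages_per_minute(180, 6 * 60)  # assumed 180 pages in about 6 hours

print(f"Canon ImageRunner: {canon:.1f} pages/min")
print(f"Home flatbed:      {flatbed:.1f} pages/min")
print(f"Speed-up:          {canon / flatbed:.0f}x")
```

The flatbed figure includes cleanup and making the document searchable, so the two rates are not measuring exactly the same work.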

--
Jeff Liebermann     jeffl@cruzio.com
150 Felker St #D    http://www.LearnByDestroying.com
Santa Cruz CA 95060 http://802.11junk.com
Skype: JeffLiebermann     AE6KS    831-336-2558
Reply to
Jeff Liebermann

In my case, they are one and the same. I'm not in this as a "business" (I am uncompensated for the *many* hours it takes to convert the documents).

You'd be amazed at how quickly that eats up disk space! I scanned a disintegrating book on origami a few years ago seeking to preserve color, etc. It was over 100MB compressed. You can't store very many books if you preserve that much detail. :<

I don't know about *you*, Paul, but I don't get enough sleep as it is! :> I want things over and done with *now*. :-/

But, it's a "proprietary format", then. I used this on a manual I produced and it was nothing but trouble since I had to explicitly "unpack" each image before I could create the final artwork... then, repack everything to conserve space on disk.

I guess I look at it differently. The original PAPER document didn't have an (electronic) searchable index and "somehow" seemed to work. So, if the electronic document doesn't have that searchable index, it's no *loss* (it's just not a *gain*!).

E.g., I have lots of novels that I would love to preserve in this way. I don't care if they are available as text. I just want to be able to re-read them after the paper versions have disintegrated (paperbacks being notoriously short-lived). So, an "image" of a page that my brain can process -- even if it doesn't have enough fidelity for an OCR package to handle -- is quite adequate.

See AEK's work at bitsavers.org. Be prepared to be blown away! (be friendly to the server as I think it's his personal expense)

Reply to
D Yuniskis

I know it works. :)

I just thought it was only fair to issue the standard warning: 'This individual is classed as "Mostly Harmless!"' Do not look him directly in his good eye, or take his last doughnut and your chances of survival will be 93%. ;-)

--
Anyone wanting to run for any political office in the US should have to
have a DD214, and a honorable discharge.
Reply to
Michael A. Terrell

I have to use formats that are "open" and/or widely accepted (which often ends up with them being "open"). I don't live in *just* the "Windows World".

Reply to
D Yuniskis

Paperport will print to a PDF driver program if you want. I like it for storing the raw scans because the file size vs. image quality is great. I don't know whether it works with other OSes or not.

Reply to
Michael A. Terrell

Man, that's a looooooong download for 15 seconds of video.

Thanks, Rich

Reply to
Rich Grise on Google groups

Good advice.

It seems that Adobe has software to add OCR to a bitmap document. That means the text is searchable. For an example, see the old issues of Forth Dimensions under the heading Forth Online documentation. So although you're looking at a scan, you can search for e.g. DROP and get it right most of the time.

(But I'm convinced that there will come a time when you OCR a 19th-century book and the result will be better than the original.)

See above.

This is one of my grave concerns. The program SchoonSchip of (Nobel Prize winner) Veltman has a nice manual that is free. The original manual (197x, mainly of historic interest) sits behind a (ca.) 30 Euro fee. (I'm involved with this, trying to port SchoonSchip from 68K assembler to Intel.) It is not hard to imagine a hardcore Elsevier executive deciding to drop all papers not downloaded for 5 years. (This has been a seminal activity for the "standard model" in physics, but what do they know ...)

Throughout history it has been a fight to keep libraries in shape. We don't need another destructive force besides wars and ignorance.

Please note that IP laws give protection. We are under no obligation to exercise these rights to the full. A movement that establishes the habit of pushing all legacy documentation into the public domain would get my backing.

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Reply to
Albert van der Horst
