I have a boatload of books, papers, etc. in a variety (too many!) of different formats (TXT, PDF, PS, CBR/CBZ, MOBI, EPUB, DOC, RTF, etc.). I'd like to pick *one* format and convert them *all* and forget about having to make sure I have the *right* reader on the right *device*, etc.
Any comments (pointers to references) to help in deciding which would be the "least bad" choice? And, what I risk losing in the process?
txt is portable, compact, and easily parsed - but you lose all the formatting, structure, images, etc., and it is not necessarily easy to convert other formats to txt.
pdf is the most portable (while still retaining structure and formatting) - there are readers for it on every device and every OS. You have the advantage and disadvantage that the page layout is fixed and the same on every screen or printed page. When done properly, you can copy-and-paste from it. While there are utilities for manipulating pdf files a bit, it is basically a non-editable format.
ps is a little like the pdf of two decades ago. You need a PC (with Ghostscript) to view it, but that is not a common program on non-*nix systems (Windows, pads, etc.). It also has less structure (indexes, table-of-contents, etc.) than pdf.
I don't know what CBR/CBZ and MOBI are - and therefore cannot recommend them. If they are sufficiently obscure that I don't recognize them, they are not well enough supported for your needs.
epub is a possibility if the majority of your reading will be on small-screen devices (small pads, telephones). epub readers reflow the text to suit the screen. But that also means that the appearance of the document changes depending on the device used to view it. epub readers are available for most systems, but are not nearly as common or mature as pdf readers.
doc and rtf are word processor formats, with all the disadvantages that brings - being editable means the data can be changed, but that's a disadvantage for this sort of use. You need a PC to display them, and the appearance changes depending on the version of office program used and the fonts on the system. You view them using a word processor, which is not an optimal program for reading - you don't get convenient links, contents, cross-references, etc., and your screen is filled with editing controls (or a hideous "ribbon" if you use MS Word).
In my view, this is an easy decision - pdf is the only practical choice. It also has the advantage that the majority of the stuff you have will probably already be in pdf format. doc and rtf (and even txt) can be easily converted using LibreOffice (which can be automated from the command line if you have lots of files). ps2pdf will handle any ps files you have.
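A minimal sketch of the batch conversion suggested above, in Python. It assumes LibreOffice's `soffice` binary and Ghostscript's `ps2pdf` are on the PATH; the function names and the extension list are hypothetical choices for illustration. By default it only builds the command lines, so nothing is converted until `dry_run=False` is passed.

```python
import subprocess
from pathlib import Path

# Extensions handed to LibreOffice in this sketch (an assumption; extend as needed).
OFFICE_EXTS = {".doc", ".rtf", ".txt"}

def conversion_command(src: Path, out_dir: Path):
    """Return the command line that converts src to PDF, or None if src is not handled."""
    ext = src.suffix.lower()
    if ext in OFFICE_EXTS:
        # LibreOffice in headless mode writes <stem>.pdf into out_dir.
        return ["soffice", "--headless", "--convert-to", "pdf",
                "--outdir", str(out_dir), str(src)]
    if ext == ".ps":
        # ps2pdf (shipped with Ghostscript) takes an input file and an output file.
        return ["ps2pdf", str(src), str(out_dir / (src.stem + ".pdf"))]
    return None

def convert_tree(root: Path, out_dir: Path, dry_run: bool = True):
    """Walk root and convert every handled file; with dry_run, only collect the commands."""
    commands = []
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(root.rglob("*")):
        if not src.is_file():
            continue
        cmd = conversion_command(src, out_dir)
        if cmd is None:
            continue
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Note that this flattens everything into one output directory, so two source files with the same base name would overwrite each other; a real run would want to mirror the source tree or rename on collision.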
I use Calibre for my books, papers,... library. It takes care of opening the files with the right tool on the PC, and converts them to the appropriate format when I move a file to another device (ebook reader,...).
Bye Jack
--
Yoda of Borg am I! Assimilated shall you be! Futile resistance is, hmm?
I went through this some time ago. I picked MOBI, for a reason I don't remember and am not sure I could have explained even back then; maybe I had read a few books in it. Someone mentioned Calibre; I have used it to convert.
On Android devices I use a reader written by a Russian guy, let me check.... CoolReader, the icon is a brownish/yellowish open book. Works really well; maybe it has a preference for MOBI, so I went there (but I don't remember if this is the case).
Too often, formatting contains information, itself. E.g., most documents contain multiple flows; how do you distinguish between these in a pure text document format?
You seldom want to *modify* a document (that you haven't authored). OTOH, it is frequently desirable to be able to *annotate* a document. In this regard, Adobe smartly foresaw this need. I miss being able to annotate (Windows) "Help" files...
CBR is commonly used for comic books; it's the equivalent of storing TIFFs of scanned pages in a PDF (except the scans are typically JPEGs -- because comics can easily tolerate the losses that JPEG introduces!). The page images are packed into a RAR archive.
CBZ is the same thing with the page images packed into a ZIP archive instead.
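Since a CBZ is conventionally just a ZIP archive of page images, it can be inspected with nothing more than a ZIP library. A small Python sketch, assuming pages are meant to be read in sorted-filename order (the usual convention, but not guaranteed for every file); the function name and the list of image extensions are hypothetical.

```python
import zipfile
from pathlib import Path

# Image types treated as pages in this sketch (an assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def cbz_pages(cbz_path):
    """Return the image entries of a CBZ archive, sorted by name."""
    with zipfile.ZipFile(cbz_path) as archive:
        names = [name for name in archive.namelist()
                 if Path(name).suffix.lower() in IMAGE_EXTS]
    return sorted(names)
```

This is also why converting CBZ to another format loses little: the "document" is only the ordered images, with no text layer or structure to preserve.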
MOBI is a format designed for small screen devices (e.g., Palm PDA's).
I also forgot to mention DJVU -- which is similar to CBR/TIFF PDFs.
Yes. Put all of these in the same category as HTML documents.
This ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ is the biggest win. The biggest down-side is viewing documents on "smaller screens" (where "small" is defined as "less than the size of the original medium"). E.g., most "papers" would require at least a ~15" diagonal screen to avoid the "pan and scan" interface.
Other advantages are support for multimedia, interactive documents and as a "packaging" mechanism (e.g., I "include" attachments in the PDFs that I create instead of having to add other "files" to a repository and somehow tag them as "belonging" to a particular "document").
I'd like to move my book/paper archives onto "dedicated devices". Presently, these things are stored on NAS devices just because of their sheer quantity.
So, to access one, I have to fire up a PC, fire up the appropriate NAS, then, invoke the right "reader" (software).
I'd like, instead, to just copy everything onto the internal drive of a "tablet PC". This makes access easier: fire up the tablet PC and browse for the file/document of interest.
The pen interface would also make it easy to make annotations to those documents (esp drawings).
And, by treating the documents as part of a *collection* (instead of as individual documents), I can better organize and cross reference them (instead of just lumping them into dubious "groups"/folders based on very limited descriptions: medical, mathematics, metal-working, etc.). I.e., adding/maintaining metadata would be facilitated because it's all part of the same "collection".
I want to move them all onto a couple of tablet PC's (~12" dia screen) which will be the "normal" (portable) means of accessing them. Anything that I need to reference for longer periods of time (e.g., while writing code, designing hardware, etc.) I will network mount and access from my regular workstations. (I don't use/carry "mobile devices" so I'm not bound by their tiny screen sizes)
Some of the novels *might* be nice to read on a paper-back sized device but most that I've seen would be too tiring on my eyes.
I'd prefer something even larger (e.g., TRULY page-sized) but that's not essential; viewing half a page, magnified, with the tablet in landscape orientation will probably be sufficient.
epub is good if it's all text (if you would like to be able to read them on a phone or tablet). Epub is OK with footnotes, but if there are photos, drawings, diagrams, tables, or example code snippets, then epub isn't great, and pdf is probably best.
--
Grant Edwards grant.b.edwards Yow! I had a lease on an
at OEDIPUS COMPLEX back in
Good plan, but it's easier to just have a decent format converter such as Calibre in case you run into problems. I carry a flash drive with a mix of documents. If I need to read something, I convert it to MOBI mostly because that's what my various Kindle readers seem to like and because I like to email documents directly to my Kindle readers which requires a MOBI file.
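For converting outside the GUI, Calibre ships a command-line tool, `ebook-convert`, that takes an input file and an output file and picks the target format from the output extension. A hedged Python sketch that builds and runs that command; it assumes `ebook-convert` is on the PATH, and the function names and the default of MOBI are illustrative only.

```python
import subprocess
from pathlib import Path

def ebook_convert_command(src, target_ext="mobi", out_dir=None):
    """Build the ebook-convert command line for one file (nothing is executed here)."""
    src = Path(src)
    out_dir = Path(out_dir) if out_dir else src.parent
    dst = out_dir / f"{src.stem}.{target_ext.lstrip('.')}"
    return ["ebook-convert", str(src), str(dst)]

def convert(src, target_ext="mobi", out_dir=None):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(ebook_convert_command(src, target_ext, out_dir), check=True)
```

For example, `convert("paper.epub")` would write `paper.mobi` next to the source file.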
See the comments from the author of Calibre on the topic: Note that he considers PDF the worst choice. Reading between the lines, I think he prefers: LIT, MOBI, AZW, EPUB in that order.
I've found that converting web pages into various formats is possible. Calibre has templates for handling various news web pages:
--
Jeff Liebermann jeffl@cruzio.com
150 Felker St #D http://www.LearnByDestroying.com
I think all of them suck; there's something about the user experience of a "real" book that is hard to replicate electronically. So, any electronic version thereof has to add capabilities that simply aren't possible with print (e.g., multimedia, interaction, etc.) to compensate for the munged interface.
I plan on doing the conversion *once* -- instead of each time I need to view a document (I'm only targeting *one* sort of device; ebook devices just don't cut it, for me).
Note that his bias against PDF seems to be rooted in the fact that PDF isn't intended to *be* converted -- except to another identically formatted *page*!
You can apply similar logic to any of the other formats that he describes -- none have "ideal" presentations so you're always letting the targeted device impose its constraints on the content.
Imagine viewing schematics ("oh, my! the screen is too small!"), sheet music, complex illustrations, etc.
I'm not interested in consuming and archiving "news" pages. The few HTML documents that I keep are just technical documents that their authors opted to create in that format.
As a result, they don't fit onto "physical" pages very well -- because they weren't conceived with page sizes or boundaries in mind. Converting them to other formats is too "involved" -- you have to re-layout the document in a way that (hopefully) is visually appealing, efficient and doesn't change the content appreciably ("Hmmm... this illustration won't fit, here -- I cram it on the next page. Oh, crap! That's a verso page so the text will now be separated from it!")
?? just because of the "serial" nature of the flow? E.g., I know that when I create documents, I go to great lengths to keep "non-text" objects and the associated descriptive text "nearby" -- deliberately anchoring the former to the latter and shepherding its placement (instead of letting the tools automatically "fit them where they may")
As I target PDF for my container of choice, I am keenly aware of what the final presentation will be like -- whether viewed "online" or rendered to paper (e.g., I will avoid downsampling certain photos and illustrations if I think an online viewer might want to zoom for greater detail -- detail that would not be reproducible in print).
Several formats appear to just be tweaks to others with DRM additions (I don't purchase "best sellers" so there's no appeal to supporting any of those)
This is also what I've found useful. There's an import function (just point it at a directory full of e-books and it gloms 'em), lots of metadata support, works for MacOS or Linux or Wintel...
Trying to sort out formats is futile; even 'PDF' is a kind of envelope around multiple internal formats. Calibre supports conversions, on a case-by-case basis, and keeps all the formats archived nicely, which is what one really wants.
Even a LOT of e-books fits nicely on any modern hard drive. Space used is not a real problem.
My hardest issue is merging the collected e-books (books, catalogs, datasheets and app notes) from the work laptop and my home machine. I've got thousands, and it's still possible to just copy the whole thing to a new disk, but not to copy the organization (tags, metadata, corrected author fields all confuse the merge process). I've had a new file rejected for inclusion because of some similar preexisting entry.
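One way to see what a merge would have to deal with, before letting any tool do it, is to compare the two trees by file content rather than by name or metadata. The Python sketch below is not how Calibre detects duplicates; it is a plain content-hash comparison, and the function names and result keys are hypothetical. It will miss copies whose bytes differ (for example, files that had metadata rewritten into them).

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large scans do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def index_tree(root):
    """Map content digest -> list of paths (relative to root) with that content."""
    root = Path(root)
    index = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            index.setdefault(file_digest(p), []).append(p.relative_to(root).as_posix())
    return index

def compare_trees(work_root, home_root):
    """Report byte-identical files present in both trees and files unique to each."""
    work, home = index_tree(work_root), index_tree(home_root)
    shared = sorted(set(work) & set(home))
    return {
        "duplicates": [(work[d], home[d]) for d in shared],
        "only_work": sorted(p for d in set(work) - set(home) for p in work[d]),
        "only_home": sorted(p for d in set(home) - set(work) for p in home[d]),
    }
```

Only the "only_work" (or "only_home") list then needs importing; the "duplicates" list shows where tags and corrected author fields have to be reconciled by hand.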
I think there is some almost hidden setting to decide what to do with similar entries: don't import, import and merge, import without merging, etc. Also there are some plugins that help with merging similar entries.
Bye Jack
On a *desktop*, perhaps not. My "paper archive" resides on a 1T NAS (mirrored onto another 1T NAS).
This includes:
- research papers (usually < 100pp; usually scanned TIFFs wrapped in PDF)
- magazines
- novels
- "references" (formal specs/standards, docs of historical interest, etc.)
- texts (what most folks would colloquially call "books")
- "manuals" (for computers, test equipment, cars, appliances, etc.)
- devices (service manuals, etc.)
About 400G falls into these categories. *But*, there are some executables thrown in there, as well (e.g., drivers for the various computers that are archived therein; firmware images; etc.). Just looking at the *obvious* "paper" portions of the archive (i.e., all except "devices") du(1) reports ~120G. If just 10% of the balance is "paper", that's easily 150GB.
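The estimate above works out as follows, reading "the balance" as the 400G total minus the 120G that du(1) reports for the obvious paper categories. A short Python check; the variable names are just labels for the figures quoted in the text.

```python
total_gb = 400                  # everything in the listed categories
known_paper_gb = 120            # du(1) figure for all categories except "devices"
paper_fraction_of_rest = 0.10   # the "just 10%" guess from the text

balance_gb = total_gb - known_paper_gb
estimate_gb = known_paper_gb + paper_fraction_of_rest * balance_gb

print(f"balance: {balance_gb} GB, estimated paper: {estimate_gb:.0f} GB")
```

That gives a balance of 280 GB and an estimate of about 148 GB, consistent with the "easily 150GB" figure.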
Note that this doesn't include datasheets (which are now ginormous) or any "project related" materials (source code, PCB artwork, etc.)
Of course, I *could* spend days sorting through and pruning items from the collection: "Hmmm... I no longer own a 9-track tape drive so why should I waste some bytes on a service manual for it?" Murphy, of course, teaches that things discarded are *immediately* needed! :-/
Part of my reason for wanting to put all of this stuff on *a* "device" is to better integrate them with each other. Currently, I have to rely on my own familiarity with "what I have" -- and *where* I have it!
I would like to be able to "hot link"/"cross reference"/"see elsewhere"/etc. documents as I am using them.
E.g., when I was working on my first formant synthesizer, I had opportunity to explore different methods of implementing digital resonators. It would be nice to be able to leave a "trail of breadcrumbs" *in* the document that discussed formant synthesizers to lead me to the resonator documents (when I revisit the subject at a later date).
While consulting the resonator documents, there was need to explore the effects of limited precision representations of the coefficients and "signals" -- which leads to still other papers. Again, a trail of breadcrumbs would be helpful so I don't have to *remember* that there was a document that addressed this issue... "but what was it called? and, where was it located in my file hierarchy??"
And, of course, to be able to annotate the documents with general personal notes for future reference -- as well as being able to *find* those notes (even if the document itself isn't searchable because it was a scanned image, photo, drawing, etc.).
If individual "files" can be removed, then you need some sort of URN by which you can track (and eventually re-locate) each of these references. Easier to just put everything in one device and ensure they all remain together!
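To make the "URN" idea concrete, here is a rough Python sketch of one possible approach: identify each document by a hash of its content, and store the breadcrumbs and personal notes against that identifier rather than against a path. The `urn:sha256:` prefix, the `Collection` class and its method names are all made up for illustration; this is not an existing tool or a registered URN scheme.

```python
import hashlib
from pathlib import Path

def content_id(path):
    """Identifier derived from file content, so it survives renames and moves."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"urn:sha256:{digest}"  # hypothetical naming scheme

class Collection:
    """Breadcrumbs and notes keyed by content id, stored outside the documents."""

    def __init__(self):
        self.locations = {}  # id -> last known path
        self.links = {}      # id -> set of ids it points to
        self.notes = {}      # id -> list of note strings

    def add(self, path):
        doc_id = content_id(path)
        self.locations[doc_id] = Path(path)
        return doc_id

    def link(self, from_id, to_id):
        self.links.setdefault(from_id, set()).add(to_id)

    def annotate(self, doc_id, note):
        self.notes.setdefault(doc_id, []).append(note)

    def search_notes(self, term):
        """Ids of documents whose notes mention term, even if the document is a scan."""
        term = term.lower()
        return sorted(doc_id for doc_id, notes in self.notes.items()
                      if any(term in note.lower() for note in notes))

    def relocate(self, root):
        """Rescan root and refresh the paths of known documents that have moved."""
        for p in Path(root).rglob("*"):
            if p.is_file():
                doc_id = content_id(p)
                if doc_id in self.locations:
                    self.locations[doc_id] = p
```

The catch with this design is that writing annotations *into* a PDF changes its bytes and therefore its identifier, which is why the notes here live in a separate index. Keeping everything on one device, as suggested above, sidesteps the relocation problem entirely.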
You might think so... but, when I walk into LARGE municipal libraries, I encounter thousands of documents bound and sitting on the shelves. And, the "tool" that I use to view each of them is exactly the same! Regardless of whether they are fat/tall/wide/short, contain stories of adventure, lists of words and their definitions, pretty pictures, etc.
OTOH, if I insisted on viewing them with a TELESCOPE (or a microscope!), I might find that the tool was improperly targeted to the material presented!