OT: Email to Database

What's the best (i.e., easiest and cheapest) way to scrape an email and stick some of that content in a database flat file? Must be robust.

I'm hoping for something "off the shelf".

I am aware of email2db, email2rss, etc..., but some of those cost $1250 and up.

My business partner could almost certainly code this, but getting him to do ANYTHING on-time is a major pain in the ass.

Basically, we need a way for ZenDesk to backfeed some data to an in-house program to keep things in sync. ZenDesk has an API, but see the aforementioned ass statement.

I have no problem buying a solution - even the $1250 one, but that sure seems awfully expensive for a program to scrape incoming email that can be so ridigdly formatted to spec, that even Mr. Monk would be proud.

Thanks.

Reply to
mpm
Loading thread data ...

What do you mean by "content"? Just the "text" that follows the headers? Can you scrub ("86") attachments? Do you need to keep track of who to/from?

What's robust? How do you *envision* it failing?

By "database flat file", do you mean *a* generic file used as a database (in the sense that passwd(5) is a database?)

A small perl script should do the trick "wired" to the mailbox in question (i.e., so your mailer "delivers" mail to an alias that invokes the script). Assuming you aren't hammering away at it with hundreds of messages hourly.

I'd start by gutting Majordomo and arranging for the back end to issue an "INSERT" (or whatever) specific to your particular DBMS (or, whatever is serving in that capacity -- even if just a big file onto which you append new "content"!)

Does that mean you don't have coding skills?

Your business partner should be able to twist something like Majordomo to fit these needs. Or, spend a few minutes looking at it to see how to hook the message; then, gut it and pipe the content to

(got any Linux children??)

Reply to
Don Y

Ask the NSA, they have it figured out.

Reply to
Bill Martin

Can you scrub ("86") attachments? Do you need to keep track of who to/fr om?

Just text. No images, hyperlinks, sound files, etc.. The outbound emails will always originate from us. We can include a key or special format to a ssist in jettisoning everything else. We will not publish the email addres s.

Email not received, not parsed, or parsed incorrectly, and then failing to connect or update the database. Clobbering the database, etc...

(in the sense that passwd(5) is a database?)

Don't know passwd(5). Database could be text file, dBase, Excel, Informix, Oracle, FoxPro (showing my age now??), SQL, even Access (ack!!). My point actually was that there is no requirement for it to be a relational databa se. Maybe 4 columns of data tops. One row is one record. SQL is what the recipient program already uses.

n (i.e., so your mailer "delivers" mail to an alias that invokes the script ). Assuming you aren't hammering away at it with hundreds of messages hour ly.

My thoughts exactly! I think those $1250 guys are basically offering a per l script. (?) And as to loading, we're talking maybe 50 transactions a day (database writes). By the way, "append-only" is fine as far as this piece of it goes. The recipient program would take it from there.

I'd start by gutting Majordomo and arranging for the back end to issue an " INSERT" (or whatever) specific to your particular> DBMS

Thanks, I will Google that and kick the tires.

It means I don't have the time!! I could code this in Visual Studio (probably?) if I had time to dedicate to it. Like they say, when all you got is a hammer, everything looks like a nail.

I'm trying not to say business partner and twist in the same sentence. His work is top-notch (eventually, after what I consider to be too much han d-holding), but he clearly has no sense of time management. "I'll have it in two weeks" translates roughly to a random calendar: could be 8 weeks, could be never. But more often than not, you get something at 2.8x the original time estimate that sorta looks like what you spec'd but somehow still manages not to get the job done. It's frustrating to say the least, which is why I don't want to task him with this in the first place.

I'm pretty sure the $1250 approach will work, I just don't see where it's w orth $1250. Unless we're talking $1250 in aspirin and blood pressure medic ation. :)

formatting link

Does my Asterisk box count? :)

Thanks for the tips.

Reply to
mpm

Warning: encryption scheme is a bit wild.

Tell me about it: I'm no fan of Outlook. (even though I use it daily.)

To put a fine edge on it: One program sends (via email, not that it matters) jobs needing attention t o a second program. Let's call those source and destination.

The source is a very complex program that we wrote in-house. The destination (ZenDesk) is a CRM program and once it has the job, it pros ecutes it to completion. Think: dispatching field crews & follow-up, etc..

What we want: is for the source program to "know" what the latest is in th e ZenDesk ticket (destination) so we can speed-up creation of new jobs by b asically avoiding (ignoring) jobs that pop up again in the source, but whic h have already been floated over to the destination (Zendesk). In other wo rds, we just want to close the loop between these two separate programs so the left hand knows what the right hand is doing. We will only need to pas s about 4 text variables to really speed-up the whole process. It will be a huge time-saver, once implemented.

It is enterprise, not desktop. And for reasons not particularly relevant, email is the best (and perhaps only?) way to do it. It is certainly trivia l for Zen to spit out an email using triggers and automation. If we can ju st get that email content into a database that the source program can acces s - Bingo! Loop closed.

Reply to
mpm

Simply ctrl-c and ctrl-v into a text file. Hire a high school intern. In my day, $1250 was huge! And now with the economy gone elsewhere, it may again be huge.

Reply to
RobertMacy

Well, OE and Outlook itself, IS a database. Warning: encryption scheme is a bit wild.

Reply to
Robert Baer

Sorry, the (last) point I was trying to draw attention to was how much (if any) *parsing* of the message is required.

E.g., you can only append messages to a "one line per record" file if each message is ONLY a single line! :> (Unless multi-line messages are equivalent to multiple *messages*/records!).

If your DB is expected to have more than one "field" (i.e., "columns" per "line/record"), then what goes in each column? Or, do you *hope* this is encoded in the message's format (tab delimited, etc.)

If you want to, for example, extract the "From" address and preserve it (in your DB) as "Problem Originator", then you have to extract that from the message and know where to put it in the "DB record". If you want to record the "Date Created" and "Problem Explanation", etc.

There's no way to know if an email got dropped along the way. But, every email "presented" to this agent can be noted. E.g., you can arrange to deliver messages to an alias that includes some other "log" recipient (whose mailbox ends up with a copy of every discrete message). I.e., you can cover that possibility outside the scope of this "project".

Bugs not allowed. :>

Bigger problem is how do you handle "message incorrectly formed"?

OK.

Sorry. What I was getting at was whether a "flat file" could be used in which the *format* of the file embedded the structure of the data.

E.g.,

11/09/2013*M. P. Millard*U27 overheating*Bob Flunky 11/12/2013*Bob Flunky*Added heatsink to U27*RESOLVED

where each line (record) is of the form:

  • * *

Obviously, this means nothing can contain a '*' (or, you have to take measures to encode that in the file in a way that removes its "special meaning" as a "field delimiter")

It also means (I am lying, here) that each "field" can not contain line-breaks and probably wants to be reasonably short (you don't want lines with 10,000 characters -- you'll never be able to review them!)

OK. Flat files are easier to create (and "visually inspect"). But, they impose some constraints (see above) on what you can include in each "field"/column.

See comments re: what these fields are and contain.

You could write a little script to pack the "field values" into an SQL statement and go that route.

To be fair, they are probably also talking directly to your mail host. I.e., they could probably interact with *gmail* (residing on someone else's server). My approach shortcuts that by expecting the mail server to deliver each message to a *pipe* that feeds a program directly.

I.e., AS IF someone sat watching for each new incoming message. And, when one arrived, started this little program (perl script) and *typed* the message into it, verbatum. So, the script only has to deal with parsing the message and massaging it to a form that the "DB" can handle.

This is how "bug reports" are automated.

Majordomo has a lot more "mechanism" than you probably need. I.e., it is intended to manage mailing lists so it allows messages to contain "commands" that it will interpret. (I didn't know how much you initially wanted in terms of complexity)

A better (smaller) place to start might be an OLD version of GNATS:

To give you an idea of how it was *used*

Ain't life a bitch? :> By the time you've picked up all the skills you need, you don;t have the time to *use* them!! :<

I think that would be harder. Try to exploit some existing facilities to reduce the problem to more of a "script" than a real program. That's where the "perl"s come in handy!

Standard estimation process! Double quantity and bump to next higher unit of measure: 2 weeks -> 4 months.

So, problem is actually one of constraining the problem ("Shoot the Engineer") and not letting it be an opportunity for "creative thinking"

Given how quickly you can spend $1K in *time*, you might find it cheapest to just buy a solution! That way, you don't have to deal with "Gee, our email provider does X, Y and Z which makes approach Q a bit of a chore..."

This is how we end up with less than ideal -- but WORKABLE -- solutions.

Exactly. I don't even bother *evaluating* $100 products. The time I spend thinking about it exceeds the cost of the product! Buy it and throw it away if it was a mistake. :<

Sorry, I meant any *kids* (humans) who like to tinker. It's the perfect sort of "text processing" application for someone who wants to "write a program" yet may not be savvy enough to know much beyond "sitting at a PC".

Good luck!

Reply to
Don Y

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.