OT: Need program to extract data from fields.

OT: Need a stand-alone program to extract data from MS Office fields. Specifically, extract data from user fields in PowerPoint slides.

IDEALLY, one would drag-and-drop the PPT onto a desktop icon. From there, the program would spit out a CSV file (or whatever) that provides the following:

Field Name, Value (All the values will be text.)

There are roughly 120 fields, and it varies maybe 10%, depending...

We want to avoid macros in the PPT for several reasons -- primarily because you can't easily open *.pptm files on most mobile devices. That's a deal killer for the application right there. There are also business-process issues at play that simply rule out macros as a viable solution.

Yes: We "could" code this "extractor program" ourselves, but we just don't have the time to do it, and the clock is ticking on this one.

My thinking is that (at least for now..), this wheel has GOT to already be invented somewhere, but I'm having trouble finding it.

Got any ideas??

My apologies, I know this is the wrong group, but there's clearly a wide range of knowledge here (and yes, on occasion, a wide range of ignorance here too.) On this issue, unfortunately I think I'm in the latter category! Yikes!!

All help is appreciated though. Truly.

-mpm

Reply to
mpm

What is a "field" in the context of PowerPoint, a text box?

Then what is a "user field"? Something opposite to an "Administrator Field" or...?

Are you using PPT somehow for data entry? bizarre if so...

--
Chris
Reply to
Chris

I think you need to define your problem somewhat better. A sample of the sort of page you mean would help clarify things.

Do you just want to grab the text from various PPT objects?

Sounds like a rivetting presentation to have to sit through - its effect on the unlucky audience almost as good as Vogon poetry.

There may well be tools to grab all text from an Office PPT file but whether or not that does what you want is another matter.

Save as HTML followed by a grab everything inside would be one simple way to convert it with minimal effort. No reason why the macro can't be in a Word or Excel document that selects and opens PPT files and populates a table or spreadsheet from the results.

--
Regards, 
Martin Brown
Reply to
Martin Brown

If you need to automate this, libreoffice can be run from the command line (and therefore a program script) to do things like convert formats.
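A minimal sketch of that idea in Python, assuming LibreOffice is installed and its `soffice` binary is on the PATH. The function names and the default `odp` target are illustrative choices, not anything from this thread, and which output format is actually easiest to pull field text from is something to check by hand first.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="odp", soffice="soffice"):
    """Return the argument list for a headless LibreOffice conversion.

    `target` is the output extension handed to --convert-to; "odp" is only
    an assumed default for illustration.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target="odp"):
    """Run the conversion; raises CalledProcessError if soffice reports failure."""
    subprocess.run(build_convert_command(source, out_dir, target), check=True)
    # LibreOffice names the result after the input file, with the new extension.
    return Path(out_dir) / (Path(source).stem + "." + target)
```

Something like `convert("survey.pptx", "converted")` would then be the first step of a script, with the parsing of the converted file done separately.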

Reply to
David Brown

Much like other PowerPoint pitches. Edward Tufte did a priceless fisking of the whole PP thing in "The Cognitive Style of PowerPoint".

Vogon poetry has its potential uses: "Turn off the projector, or I will rend thee in the gobblewarts with my blurglecruncheon, see if I don't."

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC 
Optics, Electro-optics, Photonics, Analog Electronics 

160 North State Road #203 
Briarcliff Manor NY 10510 

hobbs at electrooptical dot net 
http://electrooptical.net
Reply to
Phil Hobbs

Thanks Martin, Chris...

First - this is not my idea (trust me!). The client wants to use PowerPoint because they feel it is the least common denominator when it comes to their staff being able to update and annotate with little colored arrows and what-not. Personally, I think this is stupid because what they truly need to roll out is something like a SharePoint application, not individual PPT presentations.

Anyway, these PPTs are simply containers. While they could be shown to an audience, that is not their intended use. Nobody will ever have to sit through one of these. They're best described as containers for collecting field data, which consists of end-user entered text in mostly pre-formatted fields, and photographic images with minimal markups and annotations. At the end of this process, I'd like to extract the "content" automatically so as to avoid double-entry (duplication of effort), etc...

I like the idea of HTML tags. If that works, I can probably find time this afternoon to throw that together. That would at least get the wolf away from the front door vis-a-vis the clock.

Don't mean to give you the bum's rush, but got to get moving this morning. If I can post a better explanation of what's needed later on today, I will.

Thanks again!

Reply to
mpm

[SNIP]

Excel or Word sound like a better fit to what he wants to do anyway.

I have a how not to do it Powerpoint presentation writ large (every other one is a how to). The one I am most proud of has meaningless distracting animations and fonts which vary type, size and style randomly per character. It looks for all the world like a ransom note.

It is astonishingly difficult to read even when you know what it says!

It drives me crazy when someone stands up and reads in a monotone from 20 lines of dense text on a PPT slide followed by more of the same.

Used well to show graphs and diagrams it can be a powerful tool. Used badly it can be awful...

I prefer the Klingon approach myself. More direct!

--
Regards, 
Martin Brown
Reply to
Martin Brown

You do not quite give enough detail for evaluation of the problem. Are you talking about Access or about Excel for the data source? Terms like "field" are normally used in a database-like environment like dBase(TM) or Access.

If the data source is in Excel, then explicit details like what columns (or rows) are which "fields" would be needed for a macro to be written. Yes, in that case, you do not (for various reasons) want a macro to be a part of the Excel database. The "work-around" is simple: a second XLS "worksheet" is used, and its macro fills its workspace from the first, and then parses away, and (say) writes PDF documents, or HTML documents, or javascript or combinations as desired (I have done that and it is fairly easy).

If the data source is in Access, then a similar work-around can be done; data output might be limited to whatever Access allows. Never used it so cannot say.

If the data source is in Nerd, the only viable process would be to output it as raw text and parse that with a custom program (which _might_ be Excel).

Reply to
Robert Baer

Ideally, somebody would use something other than Powerpointless for data entry.

catppt, which is part of the catdoc package, says it will dump out the plain text from a Powerpoint file. I just tried it on an old (2004) .ppt I had laying around and it does seem to work. I don't know if it knows about .pptx files. You'll have to build it for Windows yourself, though.

You'd probably have to follow catppt with further processing (awk/sed, Perl, Python) to extract just the field names and values that you want. You can get versions of all of these programs for Windows.

There might be a .Net library to read Office files, but it probably costs money and doesn't work very well.

Matt Roberds

Reply to
mroberds

Years ago I got sucked into the PPT presentation thingy when I was doing a lot of stuff for Atmel, and that's what they (thought) they wanted.

Used it about a year, then realized that such presentations can be done much more nicely with PDF, allowing hierarchical navigation, marvelous for design reviews.

See an example at...

On the S.E.D/Schematics Page of my website. ...Jim Thompson

--
| James E.Thompson                                 |    mens     | 
| Analog Innovations                               |     et      | 
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    | 
| San Tan Valley, AZ 85142     Skype: skypeanalog  |             | 
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  | 
| E-mail Icon at http://www.analog-innovations.com |    1962     | 
              
I love to cook with wine.     Sometimes I even put it in the food.
Reply to
Jim Thompson

It sounds like something I would do in Python. Use odfpy to convert the PPT slides to HTML. Use Beautiful Soup to parse the HTML.

Reply to
Wanderer

Actually the python-pptx Python module might work better.

Reply to
Wanderer

The data source is POWERPOINT. When I say "field", what I mean is text boxes that have unique names assigned to them in the Properties field. Think VBA controls.

The text in the textbox will ultimately go to a field in a flat-file database, like Excel or whatever, or maybe even SharePoint. TBD. Right now, I just want to parse the data and get it out. The client ultimately needs to make up their minds where they want it to go.

Reply to
mpm

The newer versions of Office store their files in an XML-type format, maybe that could be post processed to make sense of the data?

HTH

--
Cheers, 
Chris.
Reply to
Chris

That sounds nice in theory, until you see what is inside their XML formats - it's just their original inconsistent, incomprehensible and inefficient binary formats written out long-hand and then wrapped inside a vast array of MS-specific tag names.

It is certainly easier for third-party software to parse than the original binary, but it is not a "proper" XML format like odt.
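For the narrow job of pulling named text boxes out of a .pptx, the XML can be read with nothing but the Python standard library. What follows is only a sketch under assumptions: it treats the file as a ZIP archive with one `ppt/slides/slideN.xml` part per slide, takes the shape name from the `name` attribute of `p:cNvPr`, and collects the `a:t` text runs inside each shape's `p:txBody`. The function name is made up for illustration, and shapes nested in groups or tables are not given any special handling.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")


def named_text_from_pptx(source):
    """Return (shape name, text) pairs for every shape that has a text body.

    `source` is a path or a binary file-like object holding a .pptx file.
    """
    pairs = []
    with zipfile.ZipFile(source) as archive:
        parts = [n for n in archive.namelist() if SLIDE_PART.fullmatch(n)]
        # Sort by part number; this need not match the on-screen slide order.
        parts.sort(key=lambda n: int(SLIDE_PART.fullmatch(n).group(1)))
        for part in parts:
            root = ET.fromstring(archive.read(part))
            for shape in root.iter("{%s}sp" % NS["p"]):
                props = shape.find("p:nvSpPr/p:cNvPr", NS)
                body = shape.find("p:txBody", NS)
                if props is None or body is None:
                    continue
                # One string per paragraph, with its text runs joined together.
                paragraphs = [
                    "".join(t.text or "" for t in para.iter("{%s}t" % NS["a"]))
                    for para in body.findall("a:p", NS)
                ]
                pairs.append((props.get("name"), "\n".join(paragraphs)))
    return pairs
```

Pictures and other non-text shapes are skipped because they are not `p:sp` elements with a text body. The older binary .ppt format is a different animal and would not open this way.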

Reply to
David Brown

I'm guessing you don't do Python. But just out of curiosity doesn't that last example do exactly what you want?

Extract all text from slides in presentation

    from pptx import Presentation

    prs = Presentation(path_to_presentation)

    # text_runs will be populated with a list of strings,
    # one for each text run in presentation
    text_runs = []

    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    text_runs.append(run.text)
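A variation closer to the "Field Name, Value" CSV asked for at the top of the thread might look like the sketch below. It assumes the fields are ordinary text-box shapes whose names were set in PowerPoint, which python-pptx exposes as `shape.name`; if they are really ActiveX text box controls, they probably will not show up as shapes with a text frame and this will miss them. The helper names and the `.csv` output path are illustrative only.

```python
import csv


def field_rows(slides):
    """Yield (shape name, text) for each shape on each slide that holds text."""
    for slide in slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            yield shape.name, shape.text_frame.text


def write_csv(rows, out_file):
    """Write a 'Field Name, Value' table to an already-open text file."""
    writer = csv.writer(out_file)
    writer.writerow(["Field Name", "Value"])
    writer.writerows(rows)


def main(pptx_path):
    # Imported here so the helpers above can be exercised without python-pptx.
    from pptx import Presentation

    prs = Presentation(pptx_path)
    # Assumed naming scheme: write the CSV next to the presentation.
    with open(pptx_path + ".csv", "w", newline="", encoding="utf-8") as out:
        write_csv(field_rows(prs.slides), out)
```

Calling `main(sys.argv[1])` from a small script gets close to the drag-and-drop idea, since on Windows a file dropped onto a script shortcut is normally passed as the first argument. Titles and other placeholders also have text frames, so some filtering on the name would likely be needed.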

Reply to
Wanderer

I haven't looked at it yet. Will have to do this weekend. I don't really need all text, just the text in the VBA text boxes.

I did take a quick look at the XML - and it is ugly. Possibly "do-able", but ugly. Since that doesn't look quick or easy, we would just code it. Although that said, the clock is still ticking... :(

Reply to
mpm

If you need to use the XML, I would suggest using Beautiful Soup.

I think Python is easier than trying to figure out how to use somebody's user interface. Need to do something, just Google it.

Reply to
Wanderer

[snip]

Is there an extremely good reason why the clueless muppets are not using Excel to enter their list of items in two columns?

Then you can make it so that they can only alter the input column.

I can't think of a more horrible user interface for lists than PPT.

BTW beware drag and drop input of images into Office documents. The file size can grow exponentially under some circumstances as orphaned binary data from dropped-in images clogs up the file.

I have seen old files of 100MB with only 5MB of live data inside! (the sort that are reports of inspections with new photos added)

--
Regards, 
Martin Brown
Reply to
Martin Brown
