Need small XML parser, BSD license or similar

Hi all,

We have a common requirement: do some mild XML parsing in an app targeted at the low-end ARM family. Our team has started building an XML parser from scratch, but since that is a huge task I'm looking for a lib that can work for us.

What we need:

1) BSD or MIT license, or similar
2) C language
3) Takes around 30K of space in our static lib, which already uses 260K of our 300K maximum.

I looked at expat and libxml2. The latter claims to be suitable for embedded use, but my simple gcc test with './configure --with-minimum' created a 1.2 MB static lib!

Any advice appreciated, Robert

robert

XML parsing a "huge task"? Maybe I'm missing something, but I don't see why. Especially if by "mild" you mean you have an idea of the kind of XML to expect, in terms of max depth etc.

Once parsed, how is the data used?

Steve


Steve at fivetrees

I wrote a tiny xml parser in some part of one day, by first setting up the tokenizer, then feeding tokens to a shit-slow brute force reducer. Similar to a yacc/bison parser, except that you feed tokens into the system instead of the system calling some subfunction to get a token.

It was very very tiny, very easy to work with, and relatively slow compared to a yacc/bison parser, however that was irrelevant.

XML is difficult to parse, unless you've had experience with LR grammars and parser generators -- in which case it's trivial.

My code assumed the xml was kosher -- it didn't verify anything. I just checked and it was 439 lines of code, one .c file and one .h file.

Tokens were:

CommentOpen =
Equals = =
Open2 =
String = "text" | text
Open = <
Close = >

Eof = end of file token

So you feed tokens in to a stack, then keep looking on the top of stack for patterns to reduce, meaning you replace the pattern with something else and take some action.
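
As an illustration only (this is not the 439-line code described above), the push-and-reduce idea might be sketched in C as follows. The `TAG` nonterminal, the names `parser_feed`, `reduce_once` and `emit`, the fixed stack size, and the text log of actions are all assumptions made for the sketch. Comments and self-closing tags are left out, and nothing is validated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Token kinds follow the list in the post; TAG is an assumed nonterminal
   meaning "an open tag whose name has been seen". */
enum kind { OPEN, OPEN2, CLOSE, EQUALS, STRING, END_OF_FILE, TAG };

struct item { enum kind kind; const char *text; };

#define STACK_MAX 16
#define LOG_MAX 256

struct parser {
    struct item stack[STACK_MAX];
    int depth;          /* number of items currently on the stack */
    char log[LOG_MAX];  /* actions taken so far, recorded as text */
};

static void parser_init(struct parser *p) { p->depth = 0; p->log[0] = '\0'; }

/* Record one action, e.g. "start:cfg;" or "attr:rate=9600;". */
static void emit(struct parser *p, const char *what, const char *a, const char *b)
{
    size_t used = strlen(p->log);
    snprintf(p->log + used, LOG_MAX - used, "%s:%s%s%s;",
             what, a, b ? "=" : "", b ? b : "");
}

/* True if the top n stack items have the given kinds, deepest first. */
static int top_is(const struct parser *p, int n, const enum kind *want)
{
    assert(n > 0 && n <= STACK_MAX);
    if (p->depth < n) return 0;
    for (int i = 0; i < n; i++)
        if (p->stack[p->depth - n + i].kind != want[i]) return 0;
    return 1;
}

/* Try each pattern against the top of the stack; return 1 if one was reduced. */
static int reduce_once(struct parser *p)
{
    struct item *top = &p->stack[p->depth];  /* one past the top item */

    if (top_is(p, 2, (enum kind[]){OPEN, STRING})) {          /* <name */
        emit(p, "start", top[-1].text, NULL);
        p->depth -= 1;
        p->stack[p->depth - 1].kind = TAG;
        return 1;
    }
    if (top_is(p, 4, (enum kind[]){TAG, STRING, EQUALS, STRING})) {
        emit(p, "attr", top[-3].text, top[-1].text);          /* name=value */
        p->depth -= 3;
        return 1;
    }
    if (top_is(p, 2, (enum kind[]){TAG, CLOSE})) {            /* > */
        p->depth -= 2;
        return 1;
    }
    if (top_is(p, 3, (enum kind[]){OPEN2, STRING, CLOSE})) {  /* </name> */
        emit(p, "end", top[-2].text, NULL);
        p->depth -= 3;
        return 1;
    }
    if (p->depth == 1 && p->stack[0].kind == STRING) {        /* text */
        emit(p, "text", p->stack[0].text, NULL);
        p->depth = 0;
        return 1;
    }
    return 0;
}

/* Push one token, then reduce until no pattern matches.
   Returns 0 on success, -1 on stack overflow or leftovers at end of file. */
static int parser_feed(struct parser *p, enum kind kind, const char *text)
{
    if (kind == END_OF_FILE) return p->depth == 0 ? 0 : -1;
    if (p->depth == STACK_MAX) return -1;
    p->stack[p->depth].kind = kind;
    p->stack[p->depth].text = text;
    p->depth++;
    while (reduce_once(p)) { }
    return 0;
}
```

Feeding the tokens for `<cfg rate="9600">on</cfg>` one at a time (Open, String, String, Equals, String, Close, String, Open2, String, Close, Eof) leaves `start:cfg;attr:rate=9600;text:on;end:cfg;` in `log`. A real application would replace `emit` with whatever action it needs.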

I've never understood why the standard xml parsers were so huge, bloated + seemingly obfuscated to the point of impenetrability...

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
David Ashley

In C++, not C, but can be used as a proof that this is not a "huge task" (~6000 lines):

formatting link

Roberto Waltman

expat is big - it runs the same code through the compiler multiple times with different macro definitions to get versions of the parser that should be better for particular encodings - a technique I always thought was fundamentally misguided and baroque, besides being unlikely to help overall.

However, your mini-parser cannot correctly handle XML. In particular, the presence or absence of whitespace is lost, you have no handling for CDATA, etc... That's not to start on what it takes to parse DTDs, internal and external, etc. ... In other words, you parse some subset of XML, but have no right to call it XML.

If you want a data language that's rigorously defined, easy to parse, read and write, and aren't wedded to using XML, you could do much worse than to support YAML.

Clifford Heath.

Clifford Heath

Easily fixed with a state machine-based character parser.
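
A minimal sketch of what such a character-level state machine could look like, assuming the goal is only to keep text (whitespace included) and pass CDATA sections through verbatim. All names here (`scanner_feed`, `ST_CDATA`, the fixed 128-byte buffer) are made up for the example. It skips tag contents entirely, does not decode entities, and would be fooled by a `>` inside a comment or a quoted attribute value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum state { ST_TEXT, ST_TAG_START, ST_TAG, ST_CDATA };

struct scanner {
    enum state state;
    size_t count;    /* ST_TAG_START: chars of "![CDATA[" matched so far;
                        ST_CDATA: pending ']' chars not yet emitted (0..2) */
    char text[128];  /* collected character data, NUL-terminated */
    size_t len;
};

static void scanner_init(struct scanner *m)
{
    m->state = ST_TEXT;
    m->count = 0;
    m->len = 0;
    m->text[0] = '\0';
}

/* Append one character of text; silently drops input once the buffer is full. */
static void put(struct scanner *m, char c)
{
    if (m->len + 1 < sizeof m->text) {
        m->text[m->len++] = c;
        m->text[m->len] = '\0';
    }
}

/* Advance the state machine by one input character. */
static void scanner_feed(struct scanner *m, char c)
{
    static const char cdata_open[] = "![CDATA[";

    assert(m != NULL);
    switch (m->state) {
    case ST_TEXT:
        if (c == '<') { m->state = ST_TAG_START; m->count = 0; }
        else put(m, c);                     /* whitespace is kept as-is */
        break;
    case ST_TAG_START:                      /* just after '<' */
        if (c == cdata_open[m->count]) {
            if (++m->count == sizeof cdata_open - 1) {
                m->state = ST_CDATA;
                m->count = 0;
            }
        } else {
            m->state = (c == '>') ? ST_TEXT : ST_TAG;
        }
        break;
    case ST_TAG:                            /* ordinary tag: skip to '>' */
        if (c == '>') m->state = ST_TEXT;
        break;
    case ST_CDATA:                          /* copy verbatim until "]]>" */
        if (c == ']') {
            if (m->count < 2) m->count++;
            else put(m, ']');               /* a third ']' frees the oldest */
        } else if (c == '>' && m->count == 2) {
            m->state = ST_TEXT;
            m->count = 0;
        } else {
            while (m->count > 0) { put(m, ']'); m->count--; }
            put(m, c);
        }
        break;
    }
}

/* Convenience wrapper: feed a whole NUL-terminated string. */
static void scanner_run(struct scanner *m, const char *s)
{
    while (*s) scanner_feed(m, *s++);
}
```

Running it over `<a> x <![CDATA[1 < 2 ]] ok]]> y</a>` collects the text ` x 1 < 2 ]] ok y`, with the spaces and the `<` inside the CDATA section preserved.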

Now there's a thing. If you need to refer to the DTD, and cope with all its variations, then yes, it's a big job. However, if the OP meant "XML" in the sense of structured data, it's fairly easy.

Interesting. I shall look that up.

Steve


Steve at fivetrees

You might think about ezxml. I have never used it, and it has quite a few limitations, but it does compile and link statically at 12K. I hope this helps.

Wulf

XML is what you make of it. All I know about XML comes from looking at the APIs of things like expat or libxml and the code that uses them, and from seeing xml text. I have no clue what your CDATA and DTD even mean.

But then, the little bit of time I spent on the problem and the little bit I knew just happened to solve the problem *perfectly* in our specific case.

The original poster spoke of small, and this is an embedded newsgroup. So I think it would be a safe assumption that the need was for something to parse some xml-like text and do something useful with the information.

Was there a mention of parsing a complicated format like perhaps one of the new office XML output formats? Probably not.

People are frequently using XML as a way to encode information in such a way that it's human readable/editable, and painlessly extensible. I can see it being perfect for configuration of an embedded system. So to use the nice + convenient features of XML, do you need to bring in any of the giant libraries? Of course not.

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
David Ashley

Wulf wrote:

Exactly what I was looking for, thanks! Robert

robert

Steve at fivetrees wrote:

This is all true ... however, since this thread acknowledges that xml parsing is common in embedded, I'd prefer to solve the issues particular to this project rather than write yet another xml parser, have to maintain it, and then wonder whether it covers all the corner cases.

To give an idea, this project is writing a generic, reusable web browser component to be used on cell phones by a famous vendor. The idea is that this component will need to be ported to other ARM targets for future cell phones. With that background I'd like to ask one more question of this knowledgeable group ... roll your own or use something simple like ezxml? This project may end up rolling its own, but I thought I'd ask for advice.

Thanks! Robert

robert
