lex/yacc efficiency tradeoffs

Hi,

Does anyone have any rule-of-thumb guidelines for the relative efficiencies of doing promotions in the lexer vs the formal grammar? Both in terms of time and space?

E.g., imagine:

[1-9][0-9]{2,4} return VALUE;

in the lexer vs. the same explicit definition in the *grammar*? (i.e., using ". return (int) yytext[0];" in the lexer)
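To make that concrete, the two placements I'm comparing look roughly like this (just a sketch; VALUE and the yylval handling are illustrative, and atoi() would need <stdlib.h> pulled into the lexer's %{ %} block):

    /* promotion in the lexer: yacc sees a single VALUE token */
    [1-9][0-9]{2,4}   { yylval = atoi(yytext); return VALUE; }

    /* vs. a pass-through lexer: yacc sees the raw characters as
       tokens and the *grammar* has to assemble the number itself */
    .                 { return (int) yytext[0]; }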

Are there any easily identifiable criteria that affect the tradeoff between the two approaches?

Is there any (significant) difference in lex/yacc implementations wrt this issue (e.g., vs flex/bison)?

I can run some empirical tests but I am not expert enough on either to know how to push the limits with test cases...

Thx,

--don

Reply to
D Yuniskis

Unless your compiler pass is compiling 24/7 and has to be near real-time, you absolutely shouldn't care, and should write your compiler to be the most maintainable thing ever. Speed doesn't matter for 99% of compilation passes.

Type promotion is done either in the type checking phase of the compiler, after parsing, or in the code generator. It is better to make type promotions explicit in a pass of your compiler, since then you can warn when type promotions lose data (among other things). Of course, all of the phases can be ad-hoc'ed together, but then you end up with unmaintainable and fragile junk.

Well, that's my opinion anyway... :)

-pete

Reply to
Peter Keller

Parsing is such a small part of a compiler that I would trade time differences for clarity, especially for embedded systems' compile-once, execute-many application environments.

Regards,

Walter..

--
Walter Banks
Byte Craft Limited



Reply to
Walter Banks

(sigh) Sorry, I should have clarified. I'm not using them to write a compiler but, rather, a parser for "messages". While speed isn't a real issue (though I wouldn't want it to be a *pig*!), I *am* concerned with "footprint": size, stack penetration, etc.

As such, I am concerned with how often the machine "backtracks", how much state it carries, etc.

Reply to
D Yuniskis

Not part of a compiler -- please see my reply to Peter Keller's post.

The point of using a formal grammar instead of an ad-hoc implementation is exactly that -- to make the formats of the messages parsed very obvious. And, to make it easy for others to write similar parsers just by cutting and pasting from an existing one.

However, since it *does* sit in a run-time loop, I can't

*ignore* runtime costs (time *and* space).
Reply to
D Yuniskis

If I were doing it I would still make a similar comment about doing it for clarity. My personal approach would be regular expressions rather than doing it in the lexer, as the function of these is more likely to be understood without error, and they use common library code.

Regards,

Walter..

--
Walter Banks
Byte Craft Limited




Reply to
Walter Banks

This depends on your REs in the lexer and your grammar. From your other posts it seems you are at the stage where you can control the message syntax and thus your grammar. This is a good thing. If you can put a lot of terminal symbols in your grammar you can control how much backtracking the grammar has to do. Backtracking in your lexer depends a lot on the longest-match-first rules and your REs.

For simple grammars it can sometimes be better to let the lexer be just a tokenizer and push some of the rules back into the grammar -- take your digit RE above, for example. But the bottom line is the number of messages you will have to process per unit of time. Unless you have a complex grammar I don't think your memory footprint is going to be much of an issue vs. a hand-coded parser.
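For instance, keeping the lexer as a bare tokenizer and letting the grammar do the accumulation might look something like this (a sketch; DIGIT and the value rule are just illustrative names):

    /* lexer: one character class, no counting */
    [0-9]   { yylval = yytext[0] - '0'; return DIGIT; }

    /* grammar: the parser does the accumulation */
    value : value DIGIT   { $$ = $1 * 10 + $2; }
          | DIGIT
          ;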

--
Joe Chisolm
Marble Falls, Tx.
Reply to
Joe Chisolm

Well, in some cases, I have complete control over the grammar. In other cases, I am interfacing to existing protocols so I'm stuck with whatever was thrown together (usually with little concern for this sort of thing!) previously.

Currently, I'm experimenting with the lexer doing damn little at all! E.g., things like:

\r              return CR;
\n              return LF;
    [folks seem to more readily relate to seeing "CR LF" in messages]

\040            return SP;
    [eliminate the ambiguity of ' ' being a space or a coincidental tab]

[\200-\277]     return NONASCII;
    [8b clear channel can see bogus "characters" if UART misconfigured]

.               return (int) yytext[0];
    [push everything up to the formal grammar]

But, doing this (last rule) leaves me worrying: "lex exists for a reason; I've just made it redundant! :< "
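(For reference, the whole "do damn little" lexer, pulled together as an actual spec, is just something like the following -- a sketch, with CR/LF/SP/NONASCII being whatever token codes the yacc grammar exports via y.tab.h:)

    %{
    #include "y.tab.h"    /* token values defined by the yacc grammar */
    %}
    %%
    \r              return CR;
    \n              return LF;
    \040            return SP;
    [\200-\277]     return NONASCII;
    .               return (int) yytext[0];
    %%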

I'm more concerned with rules that have long chains of terminals that can force lots of (consecutive) states into the FSM. I've not yet looked at the actual code generated to see how well yacc "folds" those tables (implication analysis). Nor whether it is clever enough to shrink the "other" tables to their minimum sizes (i.e., omit the "can't happen" state transitions).

For example, parsing a fixed width data field:

value:  [1-9][0-9][0-9][0-9][0-9]'.'[0-9][0-9]
     |  SP [1-9][0-9][0-9][0-9]'.'[0-9][0-9]
     |  SP SP [1-9][0-9][0-9]'.'[0-9][0-9]
     |  SP SP SP [1-9][0-9]'.'[0-9][0-9]
     |  SP SP SP SP [1-9]'.'[0-9][0-9]
     |  undefined

Note these sorts of rules are more expensive than a more typical:

value: [0-9]+.[0-9]+

As I say, lex exists for a reason. Which god am I angering by failing to pay it tribute? :-/

Reply to
D Yuniskis

Agreed as to "clarity", ease of maintenance, etc.

But, you can push some stuff into the lexer instead of putting it *all* in the yacc grammar. Roughly the same in terms of readability:

[1-9]   return NONZERO;
0       return ZERO;

with:

digit:  NONZERO
     |  ZERO

value:  NONZERO DIGIT DIGIT DIGIT DIGIT '.' DIGIT DIGIT
     |  SP NONZERO DIGIT DIGIT DIGIT '.' DIGIT DIGIT

(ad nauseum)

[this is admittedly a bad example]

My question regards the differences in the generated code and memory requirements (e.g., transition tables in the FSM) that are consequential to one approach over another.

(and, whether different lex/yacc implementations are better or worse than others)

Keep in mind, lex and yacc were intended to generate code that runs in resource rich environments...

Reply to
D Yuniskis

Hi Don,

Point of clarification: a lexer specification *is* a formal grammar ... only for a simpler language (in the Chomsky sense).

For one thing, using (f)lex, it won't work if you are using a yacc or bison parser above it. The parser expects the return value to be a numeric token identifier ... returning an arbitrary integer extracted from the input will cause a fatal error.

What you want to do is something like the following:

[1-9][0-9]{2,4}    { yylval = atoi(yytext); return VALUE; }

The token's value is passed separately through a variable called "yylval", which defaults to an integer but can be redefined to be a union of differing types.
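In the grammar file that redefinition looks something like the following (member and token names here are just for illustration):

    %union {
        int    ival;   /* numeric tokens */
        char  *sval;   /* string tokens  */
    }
    %token <ival> VALUE
    %token <sval> NAME

and the lexer then writes into the appropriate member, e.g. "yylval.ival = atoi(yytext); return VALUE;".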

lex/yacc is before my time, but using flex/bison you can redefine the default parser and lexer functions to include arbitrary parameters. I've used this ability to, e.g., parse out of memory buffers rather than from file streams.
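For the memory-buffer part, flex's stock buffer API is one way to do it (a sketch; msg is whatever buffer you have in hand):

    YY_BUFFER_STATE buf = yy_scan_string(msg);  /* copies msg into a flex buffer */
    yyparse();                                  /* parser pulls tokens via yylex() */
    yy_delete_buffer(buf);                      /* release the buffer when done */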

In general there is no performance difference between extracting non-character data in the lexer vs extracting it at a higher level ... you'll make the same number of passes over it.

However, the lexer does not have to preserve input characters across calls to yylex(), so extraction at a higher level may mean making a copy of the input characters.

No ... your issue is outside the scope of either lexer or parser generation.

flex and bison are MAJOR improvements on lex and yacc, but the differences are optimizations of their internal DFA and PDA processes, support for programmer extensions, and for bison, support for handling more complex grammars: yacc is LALR, bison supports LALR, LR, SLR and GLR (depending on version).

George

Reply to
George Neuner

The lexer has to examine each and every character of input and there is overhead for each call to it. The more times the parsing code calls the lexing code, the slower the parse becomes.

Regardless of implementation, tokenization and lexical analysis of the input stream are the limiting factors for parser performance. In a typical optimizing compiler, lexing accounts for 70-80% of the running time while parsing accounts for less than 1%.

No they aren't. lex and yacc are products of the 1970s, when the average program executing on a mainframe had 32KB of RAM. They were designed around a particular space/speed performance point that was acceptable to programmers and within which the tools themselves could manage the limited space available.

Newer tools like flex and bison have other options. bison has even tighter table packing modes than yacc and can produce smaller parsers using the same grammars. But bison can also produce much faster parsers, or reentrant parsers, or handle more complex grammars, etc. Similarly, flex can make smaller lexers than lex or it can make lexers that run somewhat faster.

George

Reply to
George Neuner

Yes, but the grammars that lex can parse are quite simple. It's mainly used to do the "grunt work" (e.g., gobbling up groups of characters that the yacc grammar would eventually treat as single tokens -- NUMBER, STRING, PATHNAME, etc.) that would otherwise consume lots of state transitions in the yacc FSM.

[I can't speak for flex...]

lex begins assigning token values at 256 -- conveniently so that you can pass any ASCII value (any 8 bit value, to be pedantic) directly through. E.g., "(int) yytext[0]" just takes the "current character" (matching "dot") and returns it AS IF it were a "token". (technically, the cast should be a bit more explicit to handle signed chars properly)

So, the rule I cited essentially turns lex into a raw pipe...

Yes, I understand. I was pointing out the difference between the lex+yacc approach and the "dumb lex"+"do-it-all-in-yacc" approach.

I.e., the traditional approach would be to let lex do the grunt work of gathering up all the "digits", converting them to an integer (or whatever the grammar really wants), stuffing the value of that "integer" into u.int and reducing the character sequence to a single token: INTEGER (etc.).

By contrast, I am comparing the approach where the individual digits are passed directly through lex as tokens in their own right. I.e., '0' is the token having the value 0x30 (call it ZERO if you like); '1' is the token having the value 0x31 ("ONE"), etc.

Then, letting yacc directly operate on these "characters" (tokens) and fabricate its own concept of "integer:" (vs. INTEGER).

So, my question is, how expensive does it get to do that grunt work in yacc (instead of lex)? E.g., yacc has to track N different states (for the N-digit "integer:") instead of responding to a single INTEGER token in *a* particular state.

[grrr... this is clear in my head but I seem to be having a tough time expressing it :< ]

Consider:

You can "scan" a 5 character sequence of digits with a counter and a single state (if char is digit, stay in this state and increment count; if char is not digit, move to someother state as defined by the grammar).

If, however, yacc "can't count", then it effectively counts a 5 character sequence of digits by:

state1: if digit, move to state 2, else state X
state2: if digit, move to state 3, else state ...
state3: if digit, move to state 4, else state ...
state4: if digit, move to state 5, else state ...
state5: if digit, move to state 6, else state ...

i.e., in state2, one digit has been accumulated; in state3, two digits have been accumulated; ... until, finally, in state6, five (consecutive) digits have been accumulated.

If each state requires a transition table with 256+ entries (for each of the possible stimuli -- tokens -- that can be received *in* that state), then having "state strings" like this gets expensive.
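(To make the "counting" contrast concrete, here's roughly what the one-state-plus-counter alternative looks like -- a sketch in C, not anything lex or yacc would actually emit, with get() standing in for lex's input():)

    #include <ctype.h>

    /* scan exactly five digits from get(), returning the value,
       or -1 at the first non-digit (the "move to some other state" case) */
    static int scan_five_digits(int (*get)(void))
    {
        int value = 0;
        for (int i = 0; i < 5; i++) {
            int c = get();
            if (!isdigit(c))
                return -1;
            value = value * 10 + (c - '0');
        }
        return value;
    }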

E.g., imagine a grammar that parses:

sentence: "The quick brown fox stepped on the lazy dog's nose"

entirely within yacc (I am guessing that implication analysis won't let you reduce that to anything less than strlen(...) discrete states!)

Yes, this is an inherited capability.

I'm concerned with the efficiency of the FSM generation. (consider the "sentence" above) And, how deep the pushdown stack gets to cover backing out of "wrong promotions" that are attempted as yacc encounters new tokens (which, in the case I am suggesting, are actually *characters*!)

The grammars are all relatively simple. Rather, I was interested in any differences in the automata they construct (for the previously illustrated examples).

I should just build some bogus grammars and tweak them to see how the tables change in size. Perhaps I can deduce the basic behavior of the FSM generator empirically (though I suspect these things are too clever for that sort of simplistic experimentation).

[if I've still not managed to highlight the issue that is of interest to me, let me know and I'll try to come up with some real examples to *quantitatively* illustrate]
Reply to
D Yuniskis

The grammars lex can parse are a superset of regular expressions.


lex adds to regex limited context sensitivity and limited ability to do counting. Both of these abilities are simple hacks, however, and neither is sufficient to promote the recognized language to the next Chomsky level.

Read (f)lex as shorthand for "flex or lex"

So does flex.

Yes, bison predefines 1..255 as tokens (I don't know about yacc). However zero is an illegal token - (f)lex returns zero to mean EOF. So you can't parse binary that includes zero by doing this.

I don't think anyone has measured this.

I would expect the __raw__ regex ((f)lex) and PDA (yacc/bison) state machines to be roughly comparable in performance. The main difference is in where and when any additional user code is executed.

For regex, the user code is always executed after the FSM has finished so there is no impact on recognition speed. If you try to recognize simple things in the PDA however, the intermediate rules may have to execute code: consider the difference between recognizing the word "integer" and recognizing an actual integer value:

%%

input : word    { printf( "recognized word \"integer\"\n" ); }
      | value   { printf( "recognized value : %d\n", $1 ); }
      ;

word  : 'i' 'n' 't' 'e' 'g' 'e' 'r' ;

value : value digit  { $$ = ($1 * 10) + $2; }
      | digit
      ;

digit : '0' { $$ = 0; }
      | '1' { $$ = 1; }
      | '2' { $$ = 2; }
      | '3' { $$ = 3; }
      | '4' { $$ = 4; }
      | '5' { $$ = 5; }
      | '6' { $$ = 6; }
      | '7' { $$ = 7; }
      | '8' { $$ = 8; }
      | '9' { $$ = 9; }
      ;

%%

Accumulating any kind of value requires execution of user code. To accumulate an integer, you *must* construct a value for EACH digit individually and pass it up the chain.

So for example, the following won't work:

value : value digit  { $$ = ($1 * 10) + ($2 - '0'); }
      | digit        { $$ = $1 - '0'; }
      ;

because by default the current state identifier is passed up the chain ... that is, the subrule digit will receive some bogus internal yacc value instead of the input character.
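If you do want the character itself to come through, the (pass-through) lexer can copy it into yylval as well as returning it as the token; something like:

    [0-9]   { yylval = yytext[0]; return yytext[0]; }

With that, and digit reduced to just "digit : '0' | '1' | ... | '9' ;" (so the default $$ = $1 action hands the actual character up), the "$2 - '0'" arithmetic above works as intended.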

As a matter of fact, yacc CAN count - indirectly via the PDA's stack - which is why yacc can handle recursive grammars while lex cannot.

But I know what you mean.

(f)lex constructs a pure state machine. yacc/bison constructs a state machine with a stack. Aside from that you are dealing with how they pass user values around.

George

Reply to
George Neuner

On 23/7/2010 05:47, D Yuniskis wrote:

[snipped]

don,

Having arrived late in this thread, and understood that you're trying to use lex/yacc for a [comm] protocol, I wonder whether, in your searches, you considered using tools closer to your problem at hand.

Something lighter than Estelle, for example?

--
Cesar Rabak
GNU/Linux User 52247.
Reply to
Cesar Rabak

Hard to consider anything "lighter" than lex/yacc (without resorting to ad-hoc approaches)

I've never used Estelle. But, it seems a lot more heavy-handed than simply specifying a grammar in lex/yacc. I.e., it seems like it wants to extend *beyond* just the "interface specification" and deeper into the interactions between competing/cooperating processes that happen to be "communicating" (gee, how's that for a mouthful? :> )

I just want to create a simple template/framework that allows others to create similar interfaces just by tweaking an existing one (since most folks seem to go "deer-in-headlights" when faced with creating something "from scratch").

lex/yacc can do that. But, the question is, where various bits of the grammar should reside to most effectively use these capabilities (as with most things, there are lots of different ways to approach it WITHIN the lex/yacc framework; but there are consequences to each approach!). Since I am anticipating others will copy/modify what I have done, there is extra incentive to "get it right" (lest *everyone* end up repeating my mistakes!)

:-/

Reply to
D Yuniskis

If all you need is regex there is a simple preprocessing tool called RE2C that lets you embed the expression right into your C code. I used it a number of years ago and it worked quite well. The project has a new owner and supposedly there have been improvements since then.


If you can use a C++ compiler (even for writing C) Boost Spirit is a lightweight library for regex and LL parsing. A number of Boost libraries are function based rather than object based and intermix easily with C code ... but I haven't used Spirit so I don't know which category it is in.


George

Reply to
George Neuner

Yes, I understood your notation. Rather, I was commenting that I don't know if *flex* makes the same assumptions re: token "values" (starting at 256)

OK

I'm not worried about binary values -- just ASCII (I don't consider 'NUL' :> )

I was wondering if anyone had "looked under the hood" enough to be able to posit a guess.

lex seems to like to "write spaghetti code" (an overstatement but, contrasted to yacc's more rigid "tables", I think you know what I mean).

For the sorts of "traditional" tokens that it is meant to parse, I think lex is probably a win. E.g.,

[0-9]+      return INTEGER;
[a-zA-Z]+   return WORD;

etc. yacc would probably not take a big hit, either, as it's just a state looping back to itself repeatedly. Even ad hoc approaches can be efficient with these. (i.e., lots of actions result in "same state" transitions so the total number of states is "fewer" than if you had to *count* "digits" or "letters")

OTOH, if your grammar has more rigidly defined tokens like:

[0-9]{4,4} return INTEGER;

or (gasp):

  [1-9][0-9][0-9][0-9]
| ' '[1-9][0-9][0-9]
| ' ' ' '[1-9][0-9]
| ' ' ' ' ' '[0-9]

then there are a lot *more* state transitions involved. Here, yacc's approach could get pricey (since you can't "cheat" and wrap lots of character sequences into a *few* consecutive states like:

{' '}*[1-9]?[0-9]+

(which isn't a 1:1 replacement for the above)

I.e., for rigid message formats, you can end up with lots of sequential states, each of which defines the *few* expected "inputs" that will advance to the "next state" while many/most other "inputs" will signal an error (syntax).

Yes, but you could do:

integer: digit digit digit digit
    { $$ = ((($1 - '0') * 10 + ($2 - '0')) * 10 + ($3 - '0')) * 10 + ($4 - '0'); }

(assuming I've counted my parens properly :-/ )

I'll have to build some toy grammars and move the promotions around between lex and yacc and see what the resulting code looks like. I'll try to get around to that in the next few days and post some examples

[currently preoccupied retiring a server and replacing another so I live in fear of "forgetting something" if distracted while in the process :-/ Unfortunately, I should have installed the Gb switch *before* doing this as moving that much data at 100M is *painfully* slow! :< Sheesh! To think back to the SneakerNet days... :-/ ]
Reply to
D Yuniskis

Agreed. This suggests letting lex gobble up as much input as it can (reasonably).

OTOH, the parser has more knowledge of what's going on than the lexer. Once invoked, the lexer will gladly process lots of input... THAT THE PARSER MIGHT CONSIDER AS "INAPPROPRIATE" (given context)!

It's this that I think will force me to move everything into yacc, regardless of "efficiency". Thinking aloud: if context suggests to yacc that a NUMBER is *required*, lex can merrily encounter alphabetic characters in the input and patiently gobble up as many [A-Za-z]+ as happen to be present -- EVEN THOUGH "return STRING" will cause yacc to signal an error!

If, OTOH, lex simply passed each character (in essence) to yacc as discrete tokens and let yacc promote each accordingly based on the rules of the grammar, then yacc can detect invalid "messages" earlier!

Recall, the goal here is to parse messages, not "text files" (e.g., "source code"). And, that processing is not "off-line"/batch oriented. I.e., lex's "input()" will be pulling characters off of a live input stream.

That input stream is not "received" in nice clean units (i.e., it's not like a text file where full lines can be extracted at a time). E.g., it could *pause* "mid-message" for an indeterminate amount of time (subject to the communication protocol in place).
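(FWIW, the hookup to that live stream would go through lex's input() / flex's YY_INPUT. A sketch of the flex flavor, where read_one_byte() is a placeholder for the real driver call:

    %{
    /* feed the scanner one character at a time from the live link;
       returning YY_NULL tells flex the stream ended */
    #define YY_INPUT(buf, result, max_size)                   \
        {                                                     \
            int ch = read_one_byte();  /* may block/pause */  \
            result = (ch < 0) ? YY_NULL : (buf[0] = ch, 1);   \
        }
    %}
)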

As a (trivial) example, consider a grammar that supports two message types:

"Hello, my name is " | " "

(where and are what you suspect them to be)

[granted, a silly example but should suffice to illustrate my point]

If the parser has processed "Hello, my name is ", then it is obviously awaiting <name>. If, OTOH, the input contains "1234567.8" and has *stalled* at that point, then having lex process <number> as "[0-9]+'.'[0-9]*" means lex will dutifully hang waiting for the next character (after the '8') to see if it can fold that into its rule. Only when *lex* has decided where that token's boundary lies will it pass NUMBER to yacc.

OTOH, if the "1" had been passed to yacc, it would have already decided that "Hello, my name is 1..." is not a valid sentence in the grammar and signalled an error. (by contrast, if it has to wait for lex -- and lex has to wait for a *paused* input stream -- then the error is not detected until well after the fact!)

So, it looks like lex can only do context-independent promotions *if* I want to keep the error path expedited.

But, they had secondary storage available, could make multiple passes over the input, weren't time constrained, etc. I.e., they could concentrate on *just* that one aspect.

The details of their respective (and relative) automata are needed to sort out just how efficient each *might* be.

Reply to
D Yuniskis

That's not a convincing argument.

First of all, every successful non-human language and protocol is context free (in the Chomsky sense). "Context free" does not mean the parser works without knowledge of its surroundings ... it means that the syntax is such that any legal "sentence" can have only one possible interpretation.

Secondly, to be efficient the parser needs to work at a higher level than input characters. Lexers were created specifically so that parsers can work with abstract idea "tokens" rather than strings.

That's right. But if the surroundings (let's avoid the word "context" to prevent Chomsky based misunderstandings) dictate that a number is required, then it would be a syntax error if the lexer found a string.

But at the expense of a huge increase in grammar complexity and correspondingly larger parsing code, increased processing per input character and possibly a dead programmer who, in deep despair, blew his own brains out before finishing the job because he didn't realize what he was getting himself into.

Live input doesn't matter in any theoretical sense. As a practical matter, however, if you must provide real time feedback you simply create an input driven "push" parser rather than an expectation driven "pull" parser. In a push parser, the lexer drives parsing, throwing tokens at the parser which accumulates them until they make sense. You'll get your error at the first "word" that doesn't fit into any of the possible "sentences" in the match set.

yacc/lex can't create push parsers, but bison/flex and some other modern tools like ANTLR and Yacc++ can.
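With bison the push variant looks roughly like this (available since bison 2.4; the exact %define spelling has varied across versions, and next_token() here is a placeholder for whatever feeds you input):

    /* in the .y file */
    %define api.push-pull push

    /* driver: hand tokens to the parser as they arrive */
    yypstate *ps = yypstate_new();
    int status;
    do {
        YYSTYPE lval;
        int tok = next_token(&lval);
        status = yypush_parse(ps, tok, &lval);
    } while (status == YYPUSH_MORE);   /* parser wants more input */
    yypstate_delete(ps);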

That's as it should be. The "words" of a "sentence" have to be delineated. Parsing 1492 as "I've got '149' so far and I'm stuck waiting for more input" is semantically wrong. Until you see something that doesn't fit into the assumptions of the current match set, you *cannot* know that you've matched anything.

Honestly, it doesn't matter whether you are matching characters or abstract tokens. Whenever matching is applied over sequences, the matching *process* itself is the limitation on what's possible.

You're going to have to take my word for it that you likely will be unable to write a non-trivial grammar that can do that. Go ahead and try, but if you blow your brains out don't say I didn't warn you.

My advice is to investigate push parsers instead.

(f)lex has a feature that can help: start states (flex calls them start "conditions"). Normally all the individual regex patterns are coalesced into one giant DFA which matches input against all of them simultaneously and stops when it has a match against any of them. Start states group related patterns into sub-DFAs which are only matched against when the lexer is in their related state.

Flex is more flexible 8) than lex in that it allows the same pattern to belong to more than one state group (in lex you have to repeat the pattern with a different state prefix). State changes are entirely handled by user code, so, for example, the state could be a parameter passed in by the parser.
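A sketch of what start conditions look like in flex (HEADER/BODY are made-up state names and the token names are illustrative):

    %x HEADER BODY
    %%
    <HEADER>[A-Za-z]+   { return FIELDNAME; }           /* only tried in HEADER */
    <HEADER>\r\n        { BEGIN(BODY); return CRLF; }   /* user code switches state */
    <BODY>.|\n          { return (int) yytext[0]; }
    %%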

Overall flex is a much better tool than lex. It is *much* more flexible in the way it interfaces to the parser and in how it handles input. Unlike lex, there are well documented hooks into most of the core functions. Flex lexers are intended to be customizable. Flex also can create measurably faster lexers.

George

Reply to
George Neuner

Sorry, I meant "context" in the sense that it is used in defining syntax (e.g., "left context", "right context") of symbols. E.g., a given token may be valid in *some* portions of a particular grammar but would never be valid with a left context of "GET".

But wasn't *efficiency* the whole point of my query? :>

I.e., I am already acknowledging this (potential) "hit" and trying to gauge how big it will be.

OTOH, eliminating the lexer results in a savings, there.

As always, the devil is in the details (of the grammar, etc.)

But, if <string> is something like [a-zA-Z]+, then the lexer can sit there gobbling an indeterminate number of characters even though the *first* such character would be enough for the parser to realize a syntax error has occurred!

I don't see that. You're thinking in terms of languages, not communication protocols. E.g., you could parse SMTP/NNTP/etc. messages without a lexer and without unduly complicating the parser's grammar. The question is, how much cost in terms of performance (space+time) does such an approach incur (?). Note that SMTP/NNTP/etc. don't have the same issues that I'm addressing, so this approach is needlessly complex there (e.g., who cares if the SMTP client/server gobbles up a 200 character "string" as a single token and *later* (algorithmically) determines it to be invalid *as* an email address).

Gries' top-down vs. bottom up.

The goal here is to stick with lex and yacc (with some *possible* allowances for flex/bison). Otherwise, it would be easier to just write some tight "hand-crafted" parsers and document how they could be modified for other, similar grammars.

But that is not consistent with the goal of detecting errors *early*.

Unlike pulling text from a file (foo.c, etc.), this is a real-time application -- there is information to be gleaned from the temporal aspects of the data stream. E.g., if characters are to arrive over a serial link at a particular rate, then they *must* arrive at that rate... waiting indefinitely ignores an error condition. Likewise, waiting some amount of time "per character" allows a trickle of characters to keep tickling the timeout (thus avoiding a temporal error indication) yet *still* satisfying an inappropriate (for the current "surroundings") token for the particular grammar.

Lex and yacc weren't written for the use to which I am applying them. But, there are enough hooks in their implementations that I think I *can* trick them into doing what I want within my design constraints (I have to look more closely at how to throttle their stack usage, etc.).

I must be missing something. :< I threw together some simple grammars and got yacc to perform as I had expected. I was unhappy with the footprint but haven't yet looked to see how much initialization code is dragged in by default, how I can tweak the size of the pushdown stack, etc. (but I suspect there is a way to adjust those things).

Moving recognition of the individual tokens into a lexer front end only marginally cleans up the grammars. And, adds in the "uncertainty" that I mentioned above. It also adds more of those "configuration" items that I'll have to chase down (to more closely "fit" the lexer -- to whatever extent it *is* used -- to the particulars of the grammar in question)

Won't help. Unless I move parts of the grammar specification into this realm just as a "hack".

The advantage of lex/yacc is that it would let developers "simply" formalize the interface grammars and leave the code writing to lex/yacc. A cleaner implementation that should be a lot easier to validate (instead of walking through a hand-crafted parser verifying that each path in the syntax tree is handled correctly).

I'll be traveling for the next few weeks. I'll try to tabulate the results I got from the first tests I ran (along with the grammars) so you can see the issues and their resolutions in the two approaches (lex vs. yacc).

Thx,

--don

Reply to
D Yuniskis
