Command language parsing - how formal to get?

Question

Hi:

Having completed enough serial driver code for a TMS320F2812 microcontroller to talk to a terminal, I am now trying different approaches to command interpretation.

I have a very simple command set consisting of several single-letter commands which take no arguments. A few additional single-letter commands take arguments:

Command language (somewhat generalized):

    command  argument(s)     description
    a                        do action_a
    b                        do action_b
    c                        do action_c
                             ...
    n                        do action_n
                             ...
    y        [var_name]      display variable(s)
    z        var_name value  set variable to value

where var_name is a variable identifier. The 'y' command lets one display the value of a variable identified by var_name. If no var_name is provided, this command will display all variables. The 'z' command lets one set the value of the variable identified by var_name to value.

Presently I am using the simplest yet least flexible approach to parse commands. I scan and lexically analyze on the fly, and use if, else if, else logic blocks to parse. (I know, pathetic.) This approach requires one if, else if, else block for each field of the command language.

Next I plan to split the lexical analysis into a first pass that creates a list of tokens, with additional processing such as converting keywords and variable names to keys, and numeric text fields to machine numbers. This will eliminate the scanning/lexing on the fly and make the parsing code much more readable.

Then a second pass of parsing could, it seems, still be done using if, else if, else in the case that a language has a fixed and relatively small maximum number of fields per command.

The question is then, for a language similar to the above and considering these factors:

1. need to implement in an embedded microcontroller in not more than a few K of program memory
2. not a particularly high speed of parsing is required. ...

Jonathan Kirwan · Accepted Answer

Hi, Chris. Just a few comments.

(1) I wouldn't design the commands and queries as you did. I like the recommendations of IEEE-488.2. One of them is to use a ? for queries and to reply to such queries with a command that can also be parsed. For example, in your above list, you list 'y' and 'z' for reading and setting variable values. The recommended procedure would be to use the same text for both actions, but in this way:

    var? foo

which would cause the following reply:

    var foo 5.3

and you'd say:

    var foo 7

to change it to 7.

The query form is just a ? added to the end of a command form, in other words.

It also recommends that you implement a single, simple command that dumps out the current state. That state should be listed out in such a way that if the receiver of it simply sent the entire body straight back to your device, it would have the effect of restoring the state. So if you had three variables that represented all of the important state in your device, call them S1, S2, and S3, then...

Robert Adsett · Answer

For a simple human (as opposed to machine) interface I've often used a simple lookup table composed of command name/function pointer pairs. Simple brute force lookup, and you can add a simple help string for each command. The functions themselves were responsible for parsing any arguments.

Simple but limited.
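
A minimal sketch of the name/function-pointer table described above, not Robert's actual code. The names `cmd_table`, `dispatch`, the handlers, and the `gain` variable are all made up for illustration. Each handler gets the rest of the line and parses its own arguments, here with `sscanf`.

```c
#include <stdio.h>
#include <string.h>

/* Each handler receives the text after the command word. */
typedef int (*cmd_fn)(const char *args);

struct cmd_entry {
    const char *name;   /* command word as typed */
    cmd_fn      fn;     /* handler, parses its own arguments */
    const char *help;   /* one-line help string */
};

static int last_action;  /* hypothetical device state */
static int gain;         /* hypothetical settable variable */

static int cmd_a(const char *args) { (void)args; last_action = 'a'; return 0; }

/* "z var_name value": the handler does its own argument parsing. */
static int cmd_z(const char *args)
{
    char name[16];
    int value;
    if (sscanf(args, "%15s %d", name, &value) != 2) return -2; /* bad args */
    if (strcmp(name, "gain") != 0) return -3;                  /* unknown var */
    gain = value;
    return 0;
}

static int cmd_help(const char *args);

static const struct cmd_entry cmd_table[] = {
    { "a", cmd_a,    "do action_a" },
    { "z", cmd_z,    "var_name value: set variable" },
    { "?", cmd_help, "list commands" },
};
#define CMD_COUNT (sizeof cmd_table / sizeof cmd_table[0])

static int cmd_help(const char *args)
{
    (void)args;
    for (size_t i = 0; i < CMD_COUNT; i++)
        printf("%-4s %s\n", cmd_table[i].name, cmd_table[i].help);
    return 0;
}

/* Brute-force lookup of the first word; returns -1 for an unknown command. */
int dispatch(const char *line)
{
    while (*line == ' ') line++;
    size_t len = strcspn(line, " ");
    if (len == 0) return -1;
    for (size_t i = 0; i < CMD_COUNT; i++) {
        if (strlen(cmd_table[i].name) == len &&
            strncmp(line, cmd_table[i].name, len) == 0) {
            const char *args = line + len;
            while (*args == ' ') args++;
            return cmd_table[i].fn(args);
        }
    }
    return -1;
}
```

Calling `dispatch("z gain 7")` would find the `z` entry and hand `"gain 7"` to `cmd_z`. Adding a command is one new handler plus one new table row.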

Robert

Chris Carlen · Answer

Thanks for the input.

Interesting suggestion. I suppose since I have 0 experience with designing command languages, most suggestions will seem interesting :-)

Actually, I have taken inspiration for this command set from DOS debug. I'm also writing a hex line editor in Python (as a Python learning exercise) which also uses this single-letter business.

I just downloaded IEEE-488.2. Good grief! I have often shied away from standards since they are always long and agonizing to read. It would take months just to get familiar with this. So I have no interest in making the gadget fully 488 compliant. It doesn't appear that you were implying that I should.

I had thought about becoming familiar with SCPI to see if it would be applicable to this project as well. It just all seems way too complicated.

This is pretty cool.

commandname will get filled with a command keyword by parsecmd()?

parsecmd() is basically a lexer which extracts the next word?

Lookup and execute a function...

Gene · Answer

Don't apologize! Table-driven methods are nearly always easier to understand and maintain than equivalent "hard code." And they avoid redundancy (such as the repeated comparisons in an else-if chain) that you can't afford in a scarce-memory environment. That is why all the old assemblers and BIOS interfaces are implemented that way. Even if you choose a more sophisticated command parsing method than table search, table-oriented implementations of DFAs and parsers are among the most compact, and yet they are quite fast.

Steve at fivetrees · Answer

Instead of a function pointer, think in terms of an enum of commands, a structure (class, if you prefer) defining all the properties of the command, and a table of such (constant) structures. One structure element could indeed be a handler function, called with an enum reason value, decoded via a switch within the handler (query vs command etc). Others could define the number of operands and their types (handlers) etc...
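
One way this could look in C, as a rough sketch only. The enum names, the `cmd_def` fields, and the `run_cmd` helper are assumptions for illustration, not something Steve posted. The handler is called with a reason value and decodes it with a switch.

```c
/* Hypothetical command set: one settable/queryable variable and a reset. */
enum cmd_id   { CMD_VAR, CMD_RESET, CMD_COUNT };
enum reason   { REASON_SET, REASON_QUERY };
enum arg_type { ARG_NONE, ARG_INT };

struct cmd_def {
    const char   *name;      /* text of the command */
    unsigned char nargs;     /* operands expected by the set form */
    enum arg_type arg_type;  /* how the operand is to be converted */
    int (*handler)(enum reason why, int arg, int *reply);
};

static int var_value;        /* hypothetical device state */

static int handle_var(enum reason why, int arg, int *reply)
{
    switch (why) {
    case REASON_SET:   var_value = arg;    return 0;
    case REASON_QUERY: *reply = var_value; return 0;
    }
    return -1;
}

static int handle_reset(enum reason why, int arg, int *reply)
{
    (void)arg; (void)reply;
    if (why != REASON_SET) return -1;  /* reset has no query form */
    var_value = 0;
    return 0;
}

/* Constant table indexed by the command enum (C99 designated initializers). */
static const struct cmd_def cmd_defs[CMD_COUNT] = {
    [CMD_VAR]   = { "var",   1, ARG_INT,  handle_var },
    [CMD_RESET] = { "reset", 0, ARG_NONE, handle_reset },
};

/* Check the operand count against the table, then call the handler. */
int run_cmd(enum cmd_id id, enum reason why, int nargs, int arg, int *reply)
{
    if ((unsigned)id >= CMD_COUNT) return -1;
    const struct cmd_def *d = &cmd_defs[id];
    if (why == REASON_SET && nargs != d->nargs) return -2;
    return d->handler(why, arg, reply);
}
```

Because the operand count and type live in the table, the generic code can reject a malformed command before any handler runs.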

As for keeping state while handling other tasks: JK made it sound complex ;). A state machine makes it trivial to pick up the position when a character comes in, and parse the complete line on CR/LF (etc).

Steve


Jonathan Kirwan · Answer

On this point, I agree that state machines aren't hard to do when you have a complete specification. They can be rather hard to document inside the code, though.

One example where I am really glad to use state machines is driving things where I have clock edges to generate in software, with special timing of the signals relative to those edges. They suit cases like this because the hardware is usually well specified and not likely to change in the future, except in very predictable ways (increased addressing width in a serial protocol, for example).

In parsing cases, it is much more natural to use a recursive descent approach and the co-routine/cooperative switch mechanism fits the paradigm much better in my opinion. You often go around changing parsing in ways that would make maintaining a state machine a bit tricky. This is one reason why there are compiler-compilers, in fact, to automate the construction of state machines for parsing. If you are doing the parsing and execution coding by hand, though, and not with some tool to automate the generation of a state machine, I would rather use a cooperative hand-off mechanism to keep the parsing code simple and straightforward.

Jon

Mark Borgerson · Answer

I find it useful to buffer the input line, then activate the parser when the CR/LF is received. This allows you to support the backspace key for users who are not perfect typists.

I used a structure to hold the command characters (I used 2-letter commands), a pointer to the command function, and the number and type of parameters.
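
A small sketch of the buffering step, under assumptions of my own: the `line_buf` type, `line_feed` name, and 40-character limit are hypothetical, and echoing is left out. Characters are fed in one at a time, and the parser is only run once a terminator arrives.

```c
#include <stddef.h>

#define LINE_MAX_LEN 40   /* hypothetical buffer size */

struct line_buf {
    char   text[LINE_MAX_LEN + 1];
    size_t len;
};

/* Feed one received character.
   Returns 1 when a complete line is ready in lb->text (NUL terminated,
   terminator removed), otherwise 0.
   Note: a CR LF pair reports a second, empty line; the caller can
   simply ignore empty lines. */
int line_feed(struct line_buf *lb, char c)
{
    if (c == '\r' || c == '\n') {
        lb->text[lb->len] = '\0';
        lb->len = 0;               /* the next character starts a new line */
        return 1;
    }
    if (c == '\b' || c == 0x7f) {  /* backspace or DEL: drop last char */
        if (lb->len > 0) lb->len--;
        return 0;
    }
    if (lb->len < LINE_MAX_LEN)
        lb->text[lb->len++] = c;   /* characters past the limit are dropped */
    return 0;
}
```

Since `line_feed` never blocks, it can be called from a receive interrupt or a polling loop, with the parser run from the main loop whenever it returns 1.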

Mark Borgerson

CC · Answer

Yeah. I have made a function:

    int SCI_gets(char *str, int count);

which behaves much like fgets(), except that a global enum:

    typedef enum {NO_ECHO=0, ECHO=1} SCIgetsEchoModes;
    static SCIgetsEchoModes Gets_echo_mode = ECHO;

controls whether it echoes. It also handles backspace, and interprets any of '', '', or '' as EOL.

It will return an SCI read error passed up from the underlying SCI_getc(), or an EOF if nothing was available when first called. It blocks, however, once one char has been read. It also echoes non-printing characters as '?' with red coloring on ANSI-compliant terminals, or just a plain '?' if a #define is set to NON_ANSI.

So I plan to parse only buffered, terminated strings.

Christopher R. Carlen

Robert Adsett · Answer

I'm sorry, I didn't mean to apologize :)

Really, I was just mentioning a simple, quick method. I was under the impression Chris was really looking for something with a little more sophisticated syntax checking than a strcmp loop.

Speaking of which, a friend of mine implemented a simple command interpreter on a PDP (actually a clone, I think) in Fortran that used two characters in the command. The characters were in a 6-bit encoding, and two of them formed a 12-bit word that was used in a command lookup. Apparently you can get quite fast with two-character commands if you don't have to worry about things like whitespace and other syntactic fluff.

Robert

John Larkin · Answer

It's cleaner to do a table-driven thing, rather than a bunch of case statements. With a single letter per command, you can just index to 26 small table entries, each with a handler address, maybe a pointer to a variable, and maybe a flag word.
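
For the single-letter case, the direct index might look roughly like this in C. The `slot` layout, the `counter` variable, and the two handlers are invented for the example, and the code assumes a character set where 'a' to 'z' are contiguous (ASCII).

```c
#include <stddef.h>

typedef void (*handler_fn)(int *var);

struct slot {
    handler_fn handler;  /* NULL means the letter is unused */
    int       *var;      /* optional variable the handler works on */
    unsigned   flags;    /* flag word, unused in this sketch */
};

static int counter;      /* hypothetical variable */

static void do_inc(int *var)   { (*var)++; }
static void do_clear(int *var) { *var = 0; }

/* 26 entries, one per letter; unlisted letters stay zeroed (unused). */
static const struct slot slots[26] = {
    ['i' - 'a'] = { do_inc,   &counter, 0 },
    ['c' - 'a'] = { do_clear, &counter, 0 },
};

/* Index straight into the table: no search and no case statements. */
int run_letter(char c)
{
    if (c < 'a' || c > 'z') return -1;
    const struct slot *s = &slots[c - 'a'];
    if (s->handler == NULL) return -1;
    s->handler(s->var);
    return 0;
}
```

The table costs 26 entries whether or not every letter is used, which is usually a fair trade for having no lookup loop at all.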

I recently did a parser with command lines like...

ADelay 12.5n; TRigger REmote; FIre

where each command is a 2-char token (more chars are allowed but ignored), a token plus an arg, or two tokens. All table driven, with maybe 140 commands. The parser proper is about 100 lines of assembly code. It just packs the command chars into a longword as "AD " or "TRRE" or "FI ", looks it up in the table, and dispatches to the named handler.

XTABLE:

;       TOKEN1  TOKEN2  HANDLER  PARAM

.WORD   "US",   0,      XLONG,   FUSE         ; USec timer
.WORD   "UW",   0,      QWORD,   FUSE+2       ; USec timer, as a word!
.WORD   "WA",   0,      XWAIT,   0            ; WAit nnn microseconds
.WORD   "FI",   0,      XFIRE,   FIRE         ; FIre
.WORD   "GA",   "FI",   XGFIR,   FBFIRE       ; GAte FIre
.WORD   "FE",   0,      XFEOD,   FEOD         ; FEod
.WORD   "CO",   0,      XOK,     0            ; COmment!

.WORD   "AD",   0,      XTSET,   CCBA+CDHI    ; ADelay 123.45N
.WORD   "AW",   0,      XTSET,   CCBA+CWHI    ; AWidth 123.45N
.WORD   "AS",   "ON",   XSON,    PCBA         ; ASet ON
.WORD   "AS",   "OF",   XSOF,    PCBA         ; ASet OFf
.WORD   "AS",   "PO",   XSPOS,   PCBA         ; ASet POs
.WORD   "AS",   "NE",   XSNEG,   PCBA         ; ASet NEg
.WORD   "AS",   0,      XAINQ,   CCBA         ; ASet inquiry
.WORD   "AP",   0,      XAINQ,   PCBA         ; APend pending inquiry
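
The same packing idea can be sketched in C; this is not a translation of John's assembly, and every name here (`pack_token`, `xtable`, `run_command`, the handlers and their parameter values) is hypothetical. It handles one command at a time, so splitting a line on ';' would be a separate step.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the first two letters of a word, upper-cased, into 16 bits.
   Further letters are skipped. *pp is advanced past the word and any
   following spaces. Returns 0 if *pp does not start with a letter. */
static uint16_t pack_token(const char **pp)
{
    const char *p = *pp;
    uint16_t tok = 0;
    int n = 0;
    while (isalpha((unsigned char)*p)) {
        if (n < 2)
            tok = (uint16_t)((tok << 8) | (uint16_t)toupper((unsigned char)*p));
        n++;
        p++;
    }
    while (*p == ' ') p++;
    *pp = p;
    return tok;
}

#define KEY(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

typedef int (*xhandler)(int param, const char *rest);

struct xentry { uint32_t key; xhandler fn; int param; };

static int         last_param;  /* records what ran, for the demo */
static const char *last_rest;

static int x_note(int param, const char *rest)
{
    last_param = param;
    last_rest  = rest;  /* unparsed argument text, e.g. "12.5n" */
    return 0;
}

static const struct xentry xtable[] = {
    { KEY('F','I', 0,  0 ), x_note, 1 },  /* FIre */
    { KEY('A','D', 0,  0 ), x_note, 2 },  /* ADelay 12.5n */
    { KEY('A','S','O','N'), x_note, 3 },  /* ASet ON */
    { KEY('A','S', 0,  0 ), x_note, 4 },  /* ASet (inquiry) */
};

/* Build a 32-bit key from up to two tokens and search the table. */
int run_command(const char *cmd)
{
    const char *p = cmd;
    while (*p == ' ') p++;
    uint32_t key = (uint32_t)pack_token(&p) << 16;
    key |= pack_token(&p);  /* stays 0 if the next field is not a word */
    for (size_t i = 0; i < sizeof xtable / sizeof xtable[0]; i++)
        if (xtable[i].key == key)
            return xtable[i].fn(xtable[i].param, p);
    return -1;
}
```

With this layout, `run_command("ADelay 12.5n")` produces the key for "AD" with an empty second token and passes `"12.5n"` on to the handler, while `run_command("ASet ON")` matches the two-token entry.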

John

Stefan Reuther · Answer

Hi there,

The parsing style I choose also depends on how robust (against users that mistype commands) it has to be. For a general-purpose, shell-style command language, I usually split the input into an argv-style array and hand that to an interpreter function. This allows you to implement a uniform way of quoting arguments, and use the usual idioms and standard functions (such as getopt()) to parse it.

For things that have irregular syntax, I often use sscanf for parsing:

    if (sscanf(input, "#ifdef %31s", variable) == 1)
        handleIfdef();

or

    if (sscanf(option, "--size=%i%c", &size, &dummy) == 1)
        ...

If you already use sscanf in your program, this may be a nice low-overhead way of parsing; otherwise it's probably quite expensive. It's a little harder to get sensible error messages from the scanf solution (mine would always reply 'unknown option' if you say '--size=X', not '--size wants a number as argument'), but for a debugging shell or internal tools I don't...
