Parser, again!

jmariano 12 years ago

Dear All,

I'm in need of some advice (not the full solution)! A word of caution: I'm not a computer science guy, so I'm probably not using the correct terms!

I'm a part-time prototype developer at my university. My latest project is a box with a microcontroller that measures and actuates on stuff. The box is under the command of a PC, using RS232 or USB, in a master-slave model, the PC being the master. I want to use a message-based command language, similar to SCPI, but not so complicated (no tree structure). I was thinking of something like START, STOP, SETADC 1000, REAADC 1, etc.

I have to define the syntax and program the parser (in C) on the uC side. Since I don't have a very strict specification for the syntax, I can define it in a way that makes it easier to analyse, more robust, etc.

So, my questions are:

1 - Regarding the language definition: Are there god examples of such a language that I can get inspiration from? And references? I'm sure someone has already thought about this in a formal way. I'm looking for practical advice, like: shall I use fixed-length commands (6 characters, for example)? A start-of-message character (#)? Why? And the arguments: separated by commas, spaces, etc.?

2 - Regarding the parser: Is it really a parser that I need or is it something else? Where can I read about this? I just don't want to read the full dragon book only to reach the conclusion that it was the wrong book!

Any thoughts are welcome.

Regards

Mariano

dp 12 years ago

Here is a protocol I defined some 5-10 years ago to talk to a slave; it was meant for machine use (i.e. commands do not have to sound "natural language"-like), but they were still ASCII so one could talk/debug etc. using a terminal. Might be useful reading, pretty short I hope.

formatting link

Dimiter

Dimiter Popoff, TGI

Mel Wilson 12 years ago

jmariano wrote: [ ... ]

[ ... ]

I've got a dead-simple "language" that I've used a few times in uC systems. It can accumulate a single numeric value from successive input digits, and apply postfix operators to that value. A simple example might code START and STOP as A and Z, for instance, and SET and READ as S and R, then a command sequence could be

A1000S1RZ

Usually I use "!" as a reset that clears any accumulated value that might be left over from anything that may have happened before; whitespace characters are explicit no-ops in case people are typing this stuff in. Generally speaking an operator using the numeric value consumes it, and zeros it so the next value can be accumulated.

It's dead simple to code the interpreter, and it's dead simple to create these command strings in a PC-based program. As long as the actions you have to perform are equally simple.
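A minimal sketch of the scheme described above, assuming the letter assignments from the example (A = START, Z = STOP, S = SET, R = READ). The state variables `running`, `adc_setting` and `last_read` are hypothetical stand-ins for whatever the real device would do, and overflow of the accumulated value is not handled.

```c
/* Hypothetical device state, for illustration only. */
int  running;      /* set by 'A' (start), cleared by 'Z' (stop) */
long adc_setting;  /* last value consumed by 'S' */
long last_read;    /* channel number consumed by the last 'R' */

/* Feed one received character to the interpreter. */
void interp_char(char c)
{
    static long acc;  /* value accumulated from digits so far */

    if (c >= '0' && c <= '9') {
        acc = acc * 10 + (c - '0');
        return;
    }
    switch (c) {
    case '!':                        /* reset: drop any leftover value */
        acc = 0;
        break;
    case ' ': case '\t': case '\r': case '\n':
        break;                       /* whitespace is an explicit no-op */
    case 'A': running = 1; break;
    case 'Z': running = 0; break;
    case 'S': adc_setting = acc; acc = 0; break;  /* consume and zero */
    case 'R': last_read = acc; acc = 0; break;    /* consume and zero */
    default:
        break;                       /* unknown operator: ignored in this sketch */
    }
}

/* Convenience wrapper for a whole command string. */
void interp_string(const char *s)
{
    while (*s)
        interp_char(*s++);
}
```

Feeding it `!A1000S1RZ` starts the device, sets the value 1000, requests a read of channel 1 and stops again. A real implementation would probably report unknown operators instead of ignoring them.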

Mel.

Les Cargill 12 years ago

So this is a rough sketch of how I'd attack this. I may not even use this style ( bracing and what not ) as production code, but it's shortened to fit in a Usenet post. I have not run it thru a compiler.

    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    enum { OK, WTF, RANGE } errcode; // almost certainly incorrect

    int procSETADC(const char *cmd)
    {
        // parse out the argument
        const int LARGEST = 4200;
        const int SMALLEST = 12;
        const char SETADC[] = "SETADC ";
        const char *rhs = strstr(cmd, SETADC);

        if (rhs == NULL) return WTF;

        const char *converter = rhs + strlen(SETADC);

        if (!isdigit((unsigned char)*converter)) return WTF;

        int number = atoi(converter);

        if (number < SMALLEST) return RANGE;
        if (number > LARGEST) return RANGE;
        // Now actually write the ADC, with the error checking
        // and the validation and the glaivin...
        ...
        return OK;
    }

    int procSTART(const char *cmd) { ... }

    typedef struct {
        const char *command;
        int (*callback)(const char *command);
    } msgentry;

    static const msgentry table[] = {
        { "START ", procSTART },
        { "SETADC ", procSETADC },
        ...
    };

    static const int TABLESIZE = (sizeof(table) / sizeof(table[0]));

    int eval(char *cmdstr)
    {
        int i;
        for (i = 0; i < ...

2 - Regarding the parser: Is it really a parser that I need or is it [ ... ]

Les Cargill
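The body of `eval` did not survive in the post above. What follows is only a guess at the general shape of a table-driven dispatcher, not the original code: it assumes keywords are matched as a prefix followed by a space or end of string, and the handlers `procSTART`/`procSTOP` and the variable `started` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { OK, WTF, RANGE };

typedef struct {
    const char *command;
    int (*callback)(const char *cmd);
} msgentry;

int started;  /* hypothetical device state */

int procSTART(const char *cmd) { (void)cmd; started = 1; return OK; }
int procSTOP(const char *cmd)  { (void)cmd; started = 0; return OK; }

static const msgentry table[] = {
    { "START", procSTART },
    { "STOP",  procSTOP  },
};

#define TABLESIZE (sizeof table / sizeof table[0])

/* Look the keyword up in the table and call its handler. */
int eval(const char *cmdstr)
{
    size_t i;

    for (i = 0; i < TABLESIZE; i++) {
        size_t n = strlen(table[i].command);

        /* keyword must match and be followed by a space or end of string */
        if (strncmp(cmdstr, table[i].command, n) == 0 &&
            (cmdstr[n] == '\0' || cmdstr[n] == ' '))
            return table[i].callback(cmdstr);
    }
    return WTF;  /* no table entry matched */
}
```

Adding a command then means writing one handler and adding one table row; the lookup loop itself does not change.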

Vladimir Vassilevsky 12 years ago

MODBUS protocol?

Vladimir Vassilevsky DSP and Mixed Signal Designs

Robert Wessel 12 years ago

Assuming you control both devices, and your commands and responses are as simple as you're describing, don't overthink it. You don't want to reinvent XML here.

You can make things pretty simple. Make a command a line, delimited by a CR (makes it easy to type these from a terminal emulator for testing), and the command is then everything from one CR to the next. Set a reasonable limit on the length of commands (and responses). Make the command start in the first column, then a space, then comma delimited parameters. Allow both numbers and quoted strings. Set a reasonable limit on the number of parameters. Parsing that is pretty simple; remember to allow escapes for quotes if you want those in your strings.

Keep the command handling fairly ad hoc: each handler routine would start by verifying the number and type of received parameters, then validate the parameters themselves, then execute the action and format the response.

Make the responses simple too. Start with a numeric result code ("0" = OK, anything else is an error of some sort), followed by a space, and any returned values (again, numbers and strings, comma separated). Again, minimal work to parse.

If it's an error, you can leave the rest of the line as descriptive text.
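A rough sketch of that layout, assuming the CR has already been stripped and only numeric parameters are used (quoted strings and escapes are left out). The command names `SETADC`/`GETADC`, the result codes 1 and 2, and the variable `adc_value` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 4

long adc_value;  /* hypothetical device state */

/* Split "NAME p1,p2,..." in place. Returns the parameter count,
   or -1 if there are too many parameters or one is not a number. */
int split_command(char *line, char **name, long params[MAX_PARAMS])
{
    int n = 0;
    char *p = strchr(line, ' ');

    *name = line;
    if (p == NULL)
        return 0;                    /* command without parameters */
    *p++ = '\0';                     /* terminate the command name */
    for (;;) {
        char *end;

        if (n == MAX_PARAMS)
            return -1;               /* too many parameters */
        params[n] = strtol(p, &end, 10);
        if (end == p)
            return -1;               /* no digits where a number was expected */
        n++;
        if (*end == '\0')
            return n;
        if (*end != ',')
            return -1;               /* junk after the number */
        p = end + 1;
    }
}

/* Execute one command line and write the response line into out. */
void handle_line(char *line, char *out, size_t outsz)
{
    char *name;
    long params[MAX_PARAMS];
    int n = split_command(line, &name, params);

    if (n < 0) {
        snprintf(out, outsz, "2 bad parameters");
    } else if (strcmp(name, "SETADC") == 0 && n == 1) {
        adc_value = params[0];
        snprintf(out, outsz, "0");
    } else if (strcmp(name, "GETADC") == 0 && n == 0) {
        snprintf(out, outsz, "0 %ld", adc_value);
    } else {
        snprintf(out, outsz, "1 unknown command or wrong parameter count");
    }
}
```

With this sketch, `SETADC 1000` answers `0`, a following `GETADC` answers `0 1000`, and anything malformed answers with a non-zero code followed by descriptive text.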

Absolutely include a version command that returns an identification string and a version code for the communications protocol/command set the device supports. The controlling PC should use that at startup to adapt to the device (or refuse to communicate if it doesn't know the protocol version).

Be somewhat strict in what you accept (on both sides), and don't worry too much.

Now if you need high performance, complicated commands or responses, really long commands/responses, high enough reliability that you need to detect transmission errors, you need authentication, etc., you'd probably want to be more sophisticated than that.

Tom Gardner 12 years ago

One trap that I fell into myself once, and have seen other people fall into /many/ times, is:

- it is a very limited requirement, just a very few "peek/poke" commands
- but we don't know everything at the outset, so we'll put in hooks to add new commands
- let's implement it as nothing more than a set of macros
- oh, we need to have arguments evaluated
- it would be much neater if we had if-then-else
- and loops

At which point the very limited requirements have mutated to produce a bastard language that grows like Topsy.

If that is a possibility, it is probably much cleaner, simpler, and faster (in both execution speed and delivery date) to embed a Forth interpreter from the outset.

Yes, I know the XP/agile fraternity will frown on that. Tough; some of that brigade doesn't know their limits!

Alternatively, I'm merely reminding you of the aphorism "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

upsidedown 12 years ago

There are advantages to also having a separate character as the start-of-message indicator, such as SOH, "!" (exclamation mark), colon, etc. A line on which the other end has just powered up (transient), or a line that has been idle for a long time, can collect garbage characters, which can easily be ignored until the next valid start character (Modbus RTU is an example that violates this principle).

If CR is used as a message terminator, an additional LF can be appended after a valid message to tidy up the display on a terminal emulator, but this does not harm the decoding of the next message, since everything (including LF) is ignored until the start character.
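As an illustration of the start-character idea, here is a small receive routine, assuming ':' as the start character, CR as the terminator and an 80-character limit (all three are arbitrary choices for this sketch). Everything outside a start/CR pair, including a trailing LF, is dropped.

```c
#include <stddef.h>

#define START_CHAR ':'
#define END_CHAR   '\r'
#define MAX_MSG    80

/* Feed one received character. Returns a pointer to a complete,
   NUL-terminated message when END_CHAR arrives, otherwise NULL. */
const char *rx_char(char c)
{
    static char buf[MAX_MSG + 1];
    static size_t len;
    static int in_msg;           /* nonzero between START_CHAR and END_CHAR */

    if (c == START_CHAR) {       /* (re)start, dropping any partial message */
        in_msg = 1;
        len = 0;
        return NULL;
    }
    if (!in_msg)
        return NULL;             /* garbage, LF, idle-line noise: ignored */
    if (c == END_CHAR) {
        buf[len] = '\0';
        in_msg = 0;
        return buf;
    }
    if (len == MAX_MSG) {        /* too long: discard, wait for the next start */
        in_msg = 0;
        return NULL;
    }
    buf[len++] = c;
    return NULL;
}
```

One consequence of this design is that the start character cannot appear inside a message unless some escape mechanism is added.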

With manual command entry, there is an issue with BCC/CRC, since it is quite hard to generate those manually on the fly. There should be a way to switch this feature on and off, either by a separate command or by specifying that some specific CRC value such as 00 or FFh means that the CRC check should not be performed (compare with the UDP header checksum).
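One way the "magic value disables the check" idea could look, assuming the check is a simple XOR block check character sent as two hex digits at the end of the message, with 00 meaning "do not check". The function names are hypothetical, and a real CRC would replace `bcc` in practice.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* XOR of all payload bytes: a simple block check character (BCC). */
unsigned char bcc(const char *s, size_t n)
{
    unsigned char x = 0;

    while (n--)
        x ^= (unsigned char)*s++;
    return x;
}

/* msg is the payload followed by two hex digits.
   Returns the payload length, or -1 if the check fails.
   A transmitted value of 00 disables the check. */
int check_message(const char *msg)
{
    size_t len = strlen(msg);
    unsigned long sent;

    if (len < 2 ||
        !isxdigit((unsigned char)msg[len - 2]) ||
        !isxdigit((unsigned char)msg[len - 1]))
        return -1;               /* no room for, or malformed, check digits */
    sent = strtoul(msg + len - 2, NULL, 16);
    if (sent != 0 && sent != bcc(msg, len - 2))
        return -1;               /* check enabled and mismatched */
    return (int)(len - 2);
}
```

The trade-off is that a message whose real check value happens to be 00 is indistinguishable from an unchecked one.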

I have used one (or more) spaces and/or tabs as token separators to help readability. On the receiving side, the message function code and the positional parameters are easily separated into tokens (e.g. null-terminated strings), after handling the escape sequences if needed for strings with spaces.

Since CPU power is not going to be an issue, after splitting the message into token strings, try to decode each token into a decimal/hexadecimal value regardless of whether it is needed or not.

Then perform a table search for supported function codes, such as "W" (Write), "RD" (ReaD) or "ACK", to get a function code index, and use that index with a branch table. Once the function and hence the required parameters are known, just fetch the predecoded numeric parameters with a single assignment for each parameter.
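A sketch of that tokenize, predecode and look-up sequence, assuming at most 8 tokens per line and no quoted strings. The `token` structure and the function names are invented for this example; `strtol` with base 0 is used so that both decimal and 0x-prefixed hexadecimal decode (note that it also treats a leading 0 as octal).

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 8

typedef struct {
    char *text;    /* token as a NUL-terminated string */
    long  value;   /* decoded number, meaningful only if is_num is set */
    int   is_num;  /* nonzero if the whole token parsed as a number */
} token;

/* Split line in place on spaces/tabs and predecode every token. */
int tokenize(char *line, token tok[MAX_TOKENS])
{
    int n = 0;
    char *t;

    for (t = strtok(line, " \t");
         t != NULL && n < MAX_TOKENS;
         t = strtok(NULL, " \t")) {
        char *end;

        tok[n].text = t;
        tok[n].value = strtol(t, &end, 0);   /* decimal, 0x hex, 0 octal */
        tok[n].is_num = (end != t && *end == '\0');
        n++;
    }
    return n;
}

static const char *const functions[] = { "W", "RD", "ACK" };

/* Table search: index of the function code, or -1 if unsupported. */
int function_index(const char *name)
{
    int i;

    for (i = 0; i < (int)(sizeof functions / sizeof functions[0]); i++)
        if (strcmp(name, functions[i]) == 0)
            return i;
    return -1;
}
```

The returned index can then drive a `switch` or an array of handler pointers, and each handler reads `tok[1].value`, `tok[2].value` and so on after checking `is_num`.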

If you are really low on RAM (but have plenty of ROM), i.e. can't accommodate an 80 - 255 character command/message line, you have to do the parsing on a token-by-token basis.

In the worst case with _very_ little RAM, you might even have to do the ASCII numeric (or hexadecimal) to integer conversion for each received numeric digit in the receiver interrupt service routine. However, this requires quite a complex state machine, and the error recovery might be quite nasty.

Depending on the capabilities available on the slave device, the available solutions might be quite different.

While in the past devices communicated at very low data rates such as 110, 300 or 1200 bit/s, the absolute minimum capability these days seems to be 9600 bit/s (about 1 byte/ms), while 115k2 (about 12 bytes/ms) seems to be normal, with multibyte receiver FIFOs. Thus, there is not much point in trying to optimize message frame sizes with binary (or even compressed binary) communication for ad-hoc protocols; just use an ASCII protocol, which simplifies development and helps problem solving with simple tools such as terminal emulators.

Since the PC software is easy to update, but the slave firmware is not, the master must know how to handle those old PLCs.

In the real world, you need to be quite strict in what you send, but you should expect any interpretation (according to the standards) of what the other partner wants to tell you. In many standards, read carefully which features are designated as Mandatory/Optional.

Tauno Voipio 12 years ago

Before he starts on it: DO NOT!

Modbus has been designed with far too many networking blunders.

-Tauno Voipio

Vladimir Vassilevsky 12 years ago

What do you think is so wrong with MODBUS ?

Vladimir Vassilevsky DSP and Mixed Signal Designs

upsidedown 12 years ago

Modbus ASCII is OK (start character, end characters), and Modbus/TCP is OK (frame size in header), but Modbus RTU is not, since it depends on very critical timing (which is a no-no for any PC-based system).

While you might be able to work around the timing issues on an RTU master (but not as a terminal emulator), things get quite ugly on the slave side, especially in multidrop circuits.

Daniel Murphy 12 years ago

I second this; a fairly simple Forth interpreter can create a very powerful command language. I've used one inspired by jonesforth on AVR chips.

jonesforth:

formatting link

Tauno Voipio 12 years ago

Agreed. The problem is in finding the framing in the binary Modbus RTU format. The message boundaries should be findable without parsing the whole message. This breaks the protocol layering badly and complicates the data link layer code. The addition of timing constraints to framing is poison to sensible line drivers, e.g. a FIFO-buffered interface cannot be used, as it destroys the inter-character timing information. The same applies to delays caused by the other tasks of the operating system. I have not yet met a Modbus handler in a PC which did obey the timing constraints.

Modbus/TCP is wrong because it uses TCP. The Modbus messages are, by definition, datagram messages, and the only correct transport in the TCP/IP suite is UDP. Using TCP ports the frame boundary problem to the receiving program. Contrary to popular belief, TCP does not preserve record boundaries. The only transport guarantee is that all octets sent will arrive in the same order as sent, but they may be packed into an entirely different set of TCP segments.
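To make the record-boundary point concrete: a receiver on a byte stream has to rebuild frames itself, whatever chunks the transport hands over. The sketch below assumes a made-up framing with a one-byte length prefix (this is not the Modbus/TCP header layout); `reasm` and `reasm_byte` are hypothetical names.

```c
#include <stddef.h>

#define MAX_FRAME 255

typedef struct {
    unsigned char buf[MAX_FRAME];  /* payload of the frame being rebuilt */
    size_t need;                   /* payload length announced by the prefix */
    size_t have;                   /* payload bytes collected so far */
    int    got_len;                /* nonzero once the prefix has been seen */
} reasm;

/* Feed one byte from the stream. Returns 1 when buf holds a complete
   frame of `need` bytes, 0 otherwise. The state must start zeroed. */
int reasm_byte(reasm *r, unsigned char b)
{
    if (!r->got_len) {             /* first byte of a frame is its length */
        r->need = b;
        r->have = 0;
        r->got_len = 1;
    } else {
        r->buf[r->have++] = b;
    }
    if (r->have == r->need) {      /* frame complete (also for length 0) */
        r->got_len = 0;
        return 1;
    }
    return 0;
}
```

Because the state lives in the `reasm` structure, it does not matter whether a frame arrives in one read, split across several reads, or glued to the next frame.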

-T.

Stephen Pelc 12 years ago

There are plenty of Forth interpreters available in C, including a commercial one from MPE.

Stephen

Stephen Pelc, stephenXXX@mpeforth.com MicroProcessor Engineering Ltd - More Real, Less Time 133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web: http://www.mpeforth.com - free VFX Forth downloads

Les Cargill 12 years ago

I don't believe the OP's topology is multidrop. It's possible to have RS232 multidrop, but it's not advisable - if you must have multidrop, use 485 or 422. There are lots of converters available.

I still think a simple ASCII protocol is easier than any of those. MODBUS is for when you're interfacing devices made by two separate teams or vendors and there would be gain from having a standard; if you own both ends, a simple ASCII protocol is easier.

The OP also mentioned SCPI; that's less like MODBUS.

Les Cargill

Les Cargill 12 years ago

Great idea, although it depends on how much trouble that is to actually do in a given case.

I'd think Agile would *support* that, since it should be easier to test.

Les Cargill

Tom Gardner 12 years ago

It tends to go against the holy commandments of "don't do big up-front design" and "do the simplest thing", because "you can always refactor it later".

Too many XPers/agilistas, IMNSHO, treat XP/agile as a religion, i.e. something in which there are (12 IIRC) Commandments To Be Obeyed in order that the magic recipe works.

That's ridiculous, of course, since XP/agile is immensely valuable when deployed intelligently in appropriate circumstances.

I've even seen Them strip out all comments from a well-documented library on the /principle/ that "comments get out of sync with code" and "good code doesn't need comments". Often valid, but not when the comments describe the subtleties of /why/ the library is implemented that way and /how/ to use it.

Don Y 12 years ago

I.e., the "commands" are going to be algorithmically generated. Not subject to the sorts of errors humans would make!

For example, you don't (??) have to support the possibility of a human typing "STAG" (instead of "STAR"), followed by a backspace or delete character (to "erase" the erroneous preceding 'G') and then "RT" (for "START").

Does the message stream need to be recognizable by mere mortals? I.e., if USB connected, is anyone snooping the line to see that you sent "START" and not "GO" or "XZ", for that matter?

What sort of integrity do you require in the communications? I.e., are you likely to encounter transmission errors (noise) that you would like to safeguard against? "LAUNCH THE ROCKET" (NO!!! I meant "LUNCH AT THE ROCKET CAFE!")

How large is the command set? If you just have a handful of commands (message formats), why not just pick a single letter for each command? If possible, relate them to natural language terms IF YOU ARE OBSESSIVE ABOUT THESE SORTS OF THINGS (i.e., so you *could* examine a message "unaided" and suss out what it means). Otherwise, just pick letters that are far enough apart to not easily be "corrupted" (e.g., if you have the baudrate, character size, etc. set incorrectly on the EIA232 version of your device AND/OR encounter transmission errors -- unduly long serial cables, etc)

Then, the message can be something like: and "parsing" can be purely ad hoc:

    switch (command) {
    case 'A':
        // parse in the context that A requires
        // do whatever A means
        break;
    case 'B':
        // parse in the context that B requires
        // do whatever B means
        break;
    default:
        panic();
    }

God's got nothing to do with it... :>

Depends. Do your commands all seem to need "more or less" 6 characters to encode their intent? If one needs 3 and another needs 8, then there may still be advantages to packing everything into an 8-character frame.

If, for example, many messages have similar argument requirements, then you could factor out these requirement and have a first pass deliver , and (for example) for interpretation by each "action routine"

You either have an end of message, start of message or both. Otherwise, you have no way of putting the contents of the message in context.

What 3-digit numbers am I trying to send from A to B in the following *series* of messages: 12830395834953? Are you sure the first one is "128" and not actually "283", with the "1" being the last digit of the preceding message (that you didn't see because the cable wasn't yet attached, because the first two digits of that message had parity errors, because...)?

Read a book that has no capitalization or punctuation and see how much sense you can make out of it WITHOUT BACKING UP!

Who's your audience? If it's just a machine, is there enough information present without these extra delimiters to RELIABLY understand the content of the message? E.g., if you always send numerics as fixed-width fields (with leading and/or trailing zeroes), then 00000123 can reliably be interpreted as:

00, 0, 001, 23 without the benefit of additional delimiters!
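The fixed-width example above can be sketched as follows, assuming the four field widths 2, 1, 3 and 2 are agreed in advance by both ends. `parse_fixed` is a hypothetical helper; it rejects any message that is not exactly the expected length or that contains a non-digit.

```c
#include <string.h>

/* Split a digit string into nfields fixed-width decimal fields.
   Returns 0 on success, -1 if the length is wrong or a character
   is not a digit. */
int parse_fixed(const char *msg, const int *widths, int nfields, int *out)
{
    size_t total = 0;
    int f, i;

    for (f = 0; f < nfields; f++)
        total += (size_t)widths[f];
    if (strlen(msg) != total)
        return -1;               /* rigid structure: exact length required */

    for (f = 0; f < nfields; f++) {
        int v = 0;

        for (i = 0; i < widths[f]; i++, msg++) {
            if (*msg < '0' || *msg > '9')
                return -1;
            v = v * 10 + (*msg - '0');
        }
        out[f] = v;
    }
    return 0;
}
```

With widths {2, 1, 3, 2}, the string 00000123 yields the values 0, 0, 1 and 23, while a string that has lost or gained a character is rejected outright.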

IME, when talking between machines, you want as rigid a structure as possible so the recipient can reassure itself that nothing has been "lost in transmission". OTOH, when interacting with people, you tend to want convenience -- don't force me to type six digits for a single value if they're all going to be 0!

Les Cargill 12 years ago

How is Forth anything *but* the simplest thing? My own uncertainty is simply the actions necessary to install the interpreter for a given environment - the various ones linked to look pretty easy to use.

This being said, if you just have a simple line-oriented thing, a table-driven "parser" is pretty easy. I posted some actual partial-code in the thread...

Forth is *extremely* hackerish, and therefore ( IMO ) consonant with XP. Flexibility is the goal, right?

We can't help that; my takeaway from Agile/XP is test-first, pairing in some cases, and otherwise eschewing big waterfall.

All programming paradigms uncover the latent "Spanish Inquisitor" in people.

Of course. I'd say I first used it in the late '80s, although it wasn't called that. We got stuff done.

Not if you have a good shop culture to support comments. Reviews help, too, if you can manage them well.

Les Cargill
