'Favourite' methods for scanning/parsing input data?

Hi,

I was reading an interesting article on Embedded.com about using Lex & Yacc in embedded applications. Here's the URL (please don't think I'm being patronising to the regulars - I just thought including it may be useful)

formatting link

It just started me wondering - are there any 'pet' methods that you'd like to share?

Regards, Mike

Mike G

I use a loop, reading characters with all error handling (including timeouts) in one place, and a state machine that effectively maps the syntax structure of the data being read, character by character.
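A minimal sketch of this kind of character-by-character state machine in C. The frame layout (start byte, length byte, payload, 8-bit additive checksum) and all names here are hypothetical, chosen only for illustration; the post does not describe a specific protocol.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAME_START 0x02u   /* hypothetical start-of-frame marker */
#define MAX_PAYLOAD 16u     /* hypothetical payload limit */

typedef enum { WAIT_START, WAIT_LEN, IN_PAYLOAD, WAIT_SUM } rx_state_t;
typedef enum { RX_PENDING, RX_FRAME_OK, RX_ERROR } rx_result_t;

typedef struct {
    rx_state_t state;
    uint8_t    len;                  /* payload length announced by the frame */
    uint8_t    count;                /* payload bytes received so far */
    uint8_t    sum;                  /* running 8-bit sum of the payload */
    uint8_t    payload[MAX_PAYLOAD];
} rx_parser_t;

/* Return to the idle state; also the single place to recover from a timeout. */
void rx_reset(rx_parser_t *p)
{
    p->state = WAIT_START;
    p->len = 0;
    p->count = 0;
    p->sum = 0;
}

/* Feed one received byte; each state mirrors one element of the frame syntax. */
rx_result_t rx_feed(rx_parser_t *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_START:
        if (byte == FRAME_START) {
            p->state = WAIT_LEN;
        }
        return RX_PENDING;           /* anything else is skipped as noise */
    case WAIT_LEN:
        if (byte == 0u || byte > MAX_PAYLOAD) {
            rx_reset(p);             /* syntax violated: reject immediately */
            return RX_ERROR;
        }
        p->len = byte;
        p->count = 0;
        p->sum = 0;
        p->state = IN_PAYLOAD;
        return RX_PENDING;
    case IN_PAYLOAD:
        p->payload[p->count++] = byte;
        p->sum = (uint8_t)(p->sum + byte);
        if (p->count == p->len) {
            p->state = WAIT_SUM;
        }
        return RX_PENDING;
    case WAIT_SUM:
        p->state = WAIT_START;       /* len and payload stay valid for the caller */
        return (byte == p->sum) ? RX_FRAME_OK : RX_ERROR;
    }
    rx_reset(p);
    return RX_ERROR;
}
```

The receive loop would call `rx_feed()` for every byte and `rx_reset()` whenever its inter-character timeout expires, so a lost byte costs one frame rather than shifting garbage into later ones.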

As we've discussed here before, reading a block of data and casting it to a packet structure is a really bad idea. Yet it seems quite common.

Steve

Steve at fivetrees

String compare in assy.

Jim Stewart

If I'm designing protocols myself for embedded, I prefer to make the command side of things fixed-length binary packets, with data (of length described in the command packet) following. Sometimes you can't get away with that though ;)

A halfway house between text and binary that's a useful "cheat" for small homebrewed protocols that still need to be vaguely readable is using short (1, 2 or 4 octet) fixed-length commands, then using their representation as 8, 16 or 32-bit integers as cases in a switch statement or entries in a hash table of functions to dispatch to...
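A short sketch of the switch-statement variant in C, assuming 4-octet commands. The command names (`READ`, `WRIT`, `RSET`) and the `CMD4` macro are made up for illustration and are not from any real protocol.

```c
#include <stdint.h>

/* Pack four command octets into one 32-bit key, first octet most significant.
 * Building the key by shifting (rather than casting the buffer to uint32_t)
 * keeps the result independent of the host's endianness and alignment. */
#define CMD4(a, b, c, d)                                              \
    (((uint32_t)(uint8_t)(a) << 24) | ((uint32_t)(uint8_t)(b) << 16) | \
     ((uint32_t)(uint8_t)(c) << 8)  |  (uint32_t)(uint8_t)(d))

typedef enum { CMD_UNKNOWN, CMD_READ, CMD_WRITE, CMD_RESET } command_t;

/* Map a 4-octet command to an enum; unknown commands are rejected. */
command_t decode_command(const uint8_t octets[4])
{
    uint32_t key = CMD4(octets[0], octets[1], octets[2], octets[3]);

    switch (key) {
    case CMD4('R', 'E', 'A', 'D'): return CMD_READ;   /* hypothetical commands */
    case CMD4('W', 'R', 'I', 'T'): return CMD_WRITE;
    case CMD4('R', 'S', 'E', 'T'): return CMD_RESET;
    default:                       return CMD_UNKNOWN;
    }
}
```

The commands stay readable on a terminal or in a hex dump, while the dispatch costs a single integer comparison per case instead of a string compare.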

In a PPOE we decided against lex for text-based parsers and rolled our own very tight lexer (there were only really four types of token it needed to recognise: keyword, number, identifier and quoted string) and interfaced that to a Bison-generated parser. The same lexer has been reused across many different projects, and did a decent enough job for us!

pete

--
pete@fenelon.com "I once coaxed a dog into a library" - Tommy Saxondale
Pete Fenelon

The code generated by Lex & Yacc requires a lot of RAM, which is a problem on most MCUs. Even for quite simple things it easily requires more than 64K of RAM. Parsifal Soft had a program called Anagram that could generate a state-machine-based parser that could easily run on small MCUs. Unfortunately the owner died, and it has been impossible to get hold of a copy ever since. A very good example of why open source is a good idea. Does anybody have a copy of Anagram they would be willing to sell?

Regards Anton Erasmus

Anton Erasmus

Other than all of the compiler portability issues, why don't you like casting to a packet structure after the full packet has been received?

Curious, Bob

BobH

I'm glad you asked ;).

Let's assume the comms line (ignore TCP/IP for the moment) is slightly flakey, and one byte gets lost (perhaps a parity error) along the way. Consider what the effect would be on a packet digester. Also, if we were expecting a packet of a certain size, perhaps we won't see the full complement. What would happen then? Timeout? Or maybe use the first byte from the next packet?

With TCP/IP, it's slightly easier. But even then, suppose we receive a malformed packet. Consider the implications.

In all of these cases, we've probably cast garbage onto the structure. That, at best, means we have to validate each element of the structure. Which means we're back to ensuring the syntax was enforced at the protocol level. Which is where I came in.

And then there's the portability (endianness etc) you mentioned...

Steve

Steve at fivetrees

I am not all that thrilled with the casting to a structure method, but I have used it. My main objections are the portability issues. As I recall from the C standard, structure packing, padding, bit-field order and such are left up to the compiler writer. Then you get object size issues: as fond as everybody is of 8-bit accessibility, some processors (DSPs) have a 16-bit minimum access size. Endianness is yet another issue. Most of these issues can be sorted out either with compiler flags or ifdefs, but it is painful to change compiler vendors or, worse yet, processors.

The packet validity can't be assumed unless your protocol has checksum or better validation, regardless of how the fields are extracted from the input data. The timeout and header/trailer fields on the packet help, but in a binary protocol, unless you do bit stuffing, it is possible for the header values to appear in the data stream.

The structure method lets the compiler deal with the offset calculations (for better or worse) instead of manually walking a pointer through the data and assigning it to variables. The structure method is probably smaller code-wise than manually walking a pointer through.
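For comparison, a sketch of the manual alternative in C: each field is assembled from individual bytes at an explicit offset, so the result does not depend on the compiler's padding or the host's endianness. The 7-byte little-endian wire layout and the `sample_t` fields are hypothetical, and the sketch assumes a target with 8-bit bytes (so `uint8_t` exists), which would not hold on the 16-bit-minimum DSPs mentioned above.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical decoded record; its in-memory layout is irrelevant here. */
typedef struct {
    uint8_t  id;
    uint16_t count;
    uint32_t timestamp;
} sample_t;

/* Hypothetical wire format: id (1 byte), count (2, LE), timestamp (4, LE). */
#define SAMPLE_WIRE_SIZE 7u

static uint16_t get_u16_le(const uint8_t *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one record; refuses buffers that are not exactly the wire size. */
bool sample_decode(const uint8_t *buf, size_t len, sample_t *out)
{
    if (len != SAMPLE_WIRE_SIZE) {
        return false;
    }
    out->id        = buf[0];
    out->count     = get_u16_le(buf + 1);
    out->timestamp = get_u32_le(buf + 3);
    return true;
}
```

Note that this only addresses layout portability. The decoded values still need a checksum and range checks before they can be trusted, whichever extraction method is used.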

My thinking is that if bad things happen as a result of bad values, sanity check them, regardless of how they are parsed.

Thanks for your thoughts, Bob

BobH

As long as the limits are fixed at compile time, the limit checking of a floating point value does not cost much, even with 8 bit integer instructions only. No floating point subtraction (which involves costly denormalisation) is required.

To check if a received floating point value is above a limit, just compare the exponent part of the value with the exponent part of the limit. If the value exponent is greater, the whole value is greater than the limit. If the value exponent is less than the limit exponent, the value is definitely less than the limit. Only when the value exponent and the limit exponent are the same is there a need to compare the mantissa parts. Starting with the most significant part, compare bytes/words until a difference is found.
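The exponent-first comparison can be sketched in C as follows, assuming `float` is an IEEE 754 32-bit single (1 sign bit, 8 exponent bits, 23 mantissa bits). The function names are made up for illustration, and the check is on magnitude, so the sign bit is simply ignored.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

/* Copy the raw bit pattern of a float (assumed IEEE 754 single precision). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Return true if |value| > |limit|, using integer compares only. */
bool magnitude_exceeds(float value, float limit)
{
    uint32_t v = float_bits(value);
    uint32_t l = float_bits(limit);
    uint32_t v_exp = (v >> 23) & 0xFFu;   /* biased exponent of the value */
    uint32_t l_exp = (l >> 23) & 0xFFu;   /* biased exponent of the limit */

    if (v_exp > l_exp) {
        return true;                      /* larger exponent: above the limit */
    }
    if (v_exp < l_exp) {
        return false;                     /* smaller exponent: below the limit */
    }
    /* Same exponent: only now do the mantissas need comparing. */
    return (v & 0x7FFFFFu) > (l & 0x7FFFFFu);
}
```

With a finite limit, infinities and NaNs are also reported as exceeding it, because their exponent field is all ones. On an 8-bit target the final mantissa compare would be done byte by byte from the most significant end, stopping at the first difference.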

Paul

Paul Keinanen

I think that would work pretty well in the most common cases that I encounter: a gyro rate that should be inside +/-300 deg/sec ends up at 85314, or some other very large number.

The problem with your test algorithm where error density is low is that it requires the most cycles when values are inside limits.

I may have to try your approach, but limit it to just the exponent test, then see what percentage of the actual errors it catches. Since the numbers go through a digital filter, error values which are less than 2X the limiting value won't have as much disruptive effect as numbers that are 2000 times the limit.

Since my particular limits are symmetric about zero, I may even be able to work a bit of magic with the sign bit and perform only one test.

(magic = as-yet-undefined shifts and masking operations)
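One possible reading of that "magic", offered as a sketch rather than as what was actually intended: for IEEE 754 singles, non-negative values sort in the same order as their bit patterns taken as unsigned integers, so clearing the sign bit and doing one unsigned compare checks a symmetric limit in a single test. The function names are hypothetical, and `float` is assumed to be an IEEE 754 32-bit single.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

/* Copy the raw bit pattern of a float (assumed IEEE 754 single precision). */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* True if -positive_limit <= value <= positive_limit.
 * positive_limit must be a non-negative finite constant. */
bool within_symmetric_limit(float value, float positive_limit)
{
    /* Mask off the sign bit, then compare magnitudes as plain integers. */
    return (f32_bits(value) & 0x7FFFFFFFu) <= f32_bits(positive_limit);
}
```

For a gyro limit of 300.0f this accepts -150.0f and rejects 85314.0f with one mask and one 32-bit compare. NaNs are rejected too, since their masked bit pattern is larger than that of any finite value.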

Mark Borgerson

Mark Borgerson

This is definitely true if the valid range is very limited, e.g. [23.0 .. 29.9], in which case a single sign test, a single exponent compare and two (at least partial) mantissa compares are required.

However, with a larger valid range, e.g. [1.5 .. 100.0], any sample in the [2.0 .. 64.0] range will require only (two sign tests and) two exponent compares. If the sample value is in the [1.5 .. 2.0] range or the [64.0 .. 100.0] range, _one_ additional (usually partial) mantissa compare is required. With a uniform sample distribution, about 2/3 of the cases in this example could be handled with just the exponent compares. Of the remaining 1/3, most would be handled with a single integer compare, unless the value is very close to the limit.

With an IEEE 32-bit float, just clear the sign bit (leftmost bit). On an 8-bit processor, get the next 8 bits (exponent) and compare them to the limit exponent. If the exponents are equal and an accurate limit test is required, compare the three rightmost bytes in the sample and in the limit. The comparison can be terminated as soon as a difference is found.

On a 16-bit processor, get the leftmost 16 bits, mask off the sign bit and mantissa bits, and compare the whole 16-bit value with the limit value (with mantissa bits masked off). If the masked sample value exponent is less, the sample is OK; if greater, the value is invalid.

If the exponents are equal and a more accurate check is required, get the original leftmost 16 bits, mask off the sign bit, and compare them with the leftmost 16 bits of the limit. There is no need to mask off the exponent, since it is the same in both values and any possible difference will only occur in the mantissa bits. Only if this comparison produces an equal result must the rightmost 16-bit word also be compared.
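The 16-bit procedure can be sketched in C like this, with the sample and the limit each passed as two 16-bit words of an IEEE 754 single (high word: sign, 8 exponent bits, top 7 mantissa bits). The function name and the word-pair interface are assumptions for illustration; the limit is taken to be a positive constant fixed at compile time.

```c
#include <stdint.h>
#include <stdbool.h>

#define F32_HI_EXP_MASK 0x7F80u  /* exponent bits within the high word */
#define F32_HI_MAG_MASK 0x7FFFu  /* everything except the sign bit */

/* True if |sample| <= limit, using 16-bit integer compares only. */
bool within_limit16(uint16_t v_hi, uint16_t v_lo,
                    uint16_t lim_hi, uint16_t lim_lo)
{
    /* Step 1: compare exponents only (sign and mantissa bits masked off). */
    uint16_t v_exp = (uint16_t)(v_hi & F32_HI_EXP_MASK);
    uint16_t l_exp = (uint16_t)(lim_hi & F32_HI_EXP_MASK);
    if (v_exp < l_exp) {
        return true;
    }
    if (v_exp > l_exp) {
        return false;
    }

    /* Step 2: exponents equal, so compare the high words with only the sign
     * masked off; any difference is in the top mantissa bits. */
    uint16_t v_mag = (uint16_t)(v_hi & F32_HI_MAG_MASK);
    uint16_t l_mag = (uint16_t)(lim_hi & F32_HI_MAG_MASK);
    if (v_mag != l_mag) {
        return v_mag < l_mag;
    }

    /* Step 3: high words equal, so the low mantissa word decides. */
    return v_lo <= lim_lo;
}
```

As a usage example, 300.0f encodes as 0x43960000 and -150.0f as 0xC3160000, so `within_limit16(0xC316, 0x0000, 0x4396, 0x0000)` is decided by the exponent compare in step 1 alone.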

Paul

Paul Keinanen

Erm... I presume these FP values are within a checksummed or CRC'ed packet?

Steve

Steve at fivetrees
