Optimize a serial protocol between two applications

I have two C applications running on embedded platforms. They communicate over a serial connection (38400 bps) using a proprietary binary protocol.

The master application sends commands to the slave application, requesting its status or changing some settings. The answer to the status request is a sequence of binary data: numerical values of 1, 2 or 4 bytes (in some cases organized in arrays), small null-terminated strings, bitmasks and so on.

At first this approach was good, but now I'm finding many problems. Mostly I have to keep the two applications synchronized: if I add a parameter (maybe in the middle of the answer, because an array size has increased), or enlarge a numerical value (from 1 to 2 bytes), or something else, I have to change both applications. I usually want the master application to be backward compatible (the master should communicate well with both new and older slave applications), so the code ends up filled with many ifs (if the slave answer version is 1.0 then there is one parameter here, if it is 1.2 the array size is 3 and not 2...).

Is there a better approach to exchanging data on the wire between two C applications, considering the memory limitations, the low link speed and the message lengths of a small embedded platform?

Reply to
pozz

I'd approach this by having the slave announce its version number when it's introduced to the master. Incoming bytes in the master are stored in a buffer of unsigned char (suitably aligned if the processor so requires) and then decoded by accessing the buffer through a pointer to struct of the type required for that message and version number. I'd build the messages with a fixed-size header and variable-length body. Header contains the message type (and perhaps version number, instead of a one-time version announcement at the start), size of the body, CRC values, etc.

Could get similar behavior with a union of all possible structs overlaid with an array of unsigned char of suitable size. Somewhat a matter of taste, although the union method verges on unspecified behavior.
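As a sketch of that idea (both the pointer-to-struct and the union flavour), with hypothetical field names, sizes and type codes; on a real target the structs would also have to match the wire exactly in packing and byte order:

#include <stdint.h>

/* Hypothetical fixed-size header; names, widths and type codes are only
   an illustration, not the poster's actual layout. */
struct msg_header {
    uint8_t  type;       /* message type */
    uint8_t  version;    /* sender's protocol version */
    uint16_t body_len;   /* number of body bytes that follow */
    uint16_t body_crc;   /* CRC over the body */
};

/* One possible body layout for a status reply, protocol version 2. */
struct status_v2 {
    uint16_t voltage_mv;
    uint8_t  flags;
    char     name[8];    /* short NUL-terminated string */
};

/* Receive buffer: raw bytes overlaid with the message structs, so the
   buffer is aligned for whichever struct is used to decode it. */
static union {
    unsigned char raw[64];              /* bytes as they arrive */
    struct {
        struct msg_header hdr;
        union {
            struct status_v2 status_v2;
            /* struct status_v3 status_v3;   added as the protocol grows */
        } body;
    } msg;
} rx;

void decode_message(void)
{
    const struct msg_header *h = &rx.msg.hdr;

    if (h->type == 0x01 /* hypothetical STATUS type */ &&
        h->version >= 2 &&
        h->body_len >= sizeof(struct status_v2)) {
        const struct status_v2 *s = &rx.msg.body.status_v2;
        /* ... consume s->voltage_mv, s->flags, s->name ... */
        (void)s;
    }
}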

--
Rich Webb     Norfolk, VA
Reply to
Rich Webb

How do you think you might avoid this? Even were the protocol extensible, both sides would need to know what to do with new or different data.

Given that the protocol wasn't designed to be extensible from the beginning, there's likely nothing you can do now. If you need to support unmodified old clients, the server has to speak all the old versions of the protocol and either somehow recognize which is being used (if possible) or be configured for the particular client's needs.

For the future you can use - or devise - an extensible marshalling protocol that specifies the number/types/lengths/endianness of fields. Such protocols have already been devised. Google is your friend.

Reply to
George Neuner

There is a small infinity of ways to do this. Since you now have a legacy system, you can add a layer of indirection and hook routines for each message type into a table, and once the protocol version has been negotiated, change the table. If you need to support multiple slaves and the slave versions can be heterogeneous, you can have multiple tables. #defines can hide the tables if you desire.
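A minimal sketch of such a table, with hypothetical message types and handler names; the point is that only the table changes once a different slave version has been negotiated, not the parser:

#include <stdint.h>
#include <stddef.h>

/* Handler signature: decode one message body of a given length. */
typedef void (*msg_handler)(const unsigned char *body, size_t len);

/* Hypothetical handlers for two protocol generations. */
static void status_v1(const unsigned char *body, size_t len)
{ (void)body; (void)len; /* decode the version-1 status layout */ }

static void status_v2(const unsigned char *body, size_t len)
{ (void)body; (void)len; /* decode the version-2 status layout */ }

static void settings_any(const unsigned char *body, size_t len)
{ (void)body; (void)len; /* settings message, same in both versions */ }

#define MSG_TYPES 8   /* illustrative table size */

/* One table per protocol generation the master knows how to speak;
   unfilled slots stay NULL. */
static const msg_handler handlers_v1[MSG_TYPES] = { status_v1, settings_any };
static const msg_handler handlers_v2[MSG_TYPES] = { status_v2, settings_any };

/* Selected once the slave's version has been negotiated. */
static const msg_handler *active = handlers_v1;

void negotiate(unsigned slave_version)
{
    active = (slave_version >= 2) ? handlers_v2 : handlers_v1;
}

void dispatch(uint8_t type, const unsigned char *body, size_t len)
{
    if (type < MSG_TYPES && active[type] != NULL)
        active[type](body, len);
}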

In general, any argument you can make in favor of a binary protocol can usually be overcome by doing a little thinking. It's not 1978 any more.

If you're actually bandwidth-constrained over a 38.4k link, maybe the product needs a better comms solution. At least experiment with 115.2k.

--
Les Cargill
Reply to
Les Cargill

I'd consider using framing resembling that of PPP (RFC 1662).

It is normal in an asynchronous serial connection that sooner or later a character is missed on receive. Without a reliable framing method, both ends of the system will be out of sync until the next reboot.
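For illustration, a sketch of the RFC 1662 transparency scheme on the transmit side (0x7E flag, 0x7D escape, escaped octet XOR 0x20); put_byte() stands in for whatever the UART driver provides, and the FCS/CRC step is omitted:

#include <stdint.h>
#include <stddef.h>

#define FLAG 0x7E   /* frame delimiter (RFC 1662) */
#define ESC  0x7D   /* control escape octet */

/* put_byte() is a hypothetical UART output routine. */
extern void put_byte(uint8_t b);

/* Send one frame with RFC 1662-style byte stuffing; a CRC would
   normally be appended to the payload before stuffing. */
void send_frame(const uint8_t *payload, size_t len)
{
    put_byte(FLAG);
    for (size_t i = 0; i < len; i++) {
        uint8_t b = payload[i];
        if (b == FLAG || b == ESC) {
            put_byte(ESC);
            put_byte(b ^ 0x20);     /* transparency per RFC 1662 */
        } else {
            put_byte(b);
        }
    }
    put_byte(FLAG);
}

The receiver simply hunts for the next flag octet after any error, so a dropped character costs at most the frame it occurred in.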

--

Tauno Voipio
Reply to
Tauno Voipio

Absolutely, yes. One needs to have framing (magic numbers) and validity checks (structure size, framing positions, consistency, and checksum/CRC tests) before declaring it a valid message.

For an example of a message format that can encompass multiple sources and vendors, look at the eXtended Triton Format (XTF) used in sonar data recording. Very flexible, although perhaps a bit messy, reflecting how it has accreted different capabilities as it has grown.

--
Rich Webb     Norfolk, VA
Reply to
Rich Webb

Are you referring to aleph-zero? :-)

The typical signature delimiter is "DASH DASH SPACE NEWLINE"

--
Here be dragons!
Reply to
Noob

Something like that :

This work?

--
Les Cargill
Reply to
Les Cargill

[...]

Perhaps the minimal solution is a memory-mapped protocol. There are only two commands: read a block of X bytes from location Y, and write a block of X bytes to location Y. Memory setup and data update logic is up to you. A very simple protocol without any overhead.

Vladimir Vassilevsky DSP and Mixed Signal Consultant


Reply to
Vladimir Vassilevsky

This does not solve the problem of having to change both sides of the application at the same time when something changes. And if you really use memory addresses, they can change often. Some of our old products did that, requiring extensive emulation in the new products to keep the same wire protocol.

I would at least use "virtual addressing", i.e. "read block of X bytes from location Y within block Z", where Z is used to map the location into a real address, and validate the length. If you then adhere to a convention of extending blocks only at the end (i.e. old clients only read/write the initial part of the block), you'll be able to evolve the software quite a bit before needing to throw away blocks.
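A minimal sketch of such a block table, with made-up parameter blocks: the handler maps the block ID to a real address, validates the offset, and clamps the length to the block's true size.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter blocks; only ever extended at the end. */
static struct { uint16_t speed; uint8_t mode; uint8_t spare[5]; } settings;
static struct { uint16_t adc[4]; uint8_t flags; } status;

/* Table mapping block IDs (the "Z" in "read X bytes at Y in block Z")
   to real addresses and sizes. */
static const struct {
    void  *base;
    size_t size;
} blocks[] = {
    { &settings, sizeof(settings) },   /* block 0 */
    { &status,   sizeof(status)   },   /* block 1 */
};

/* Copy up to 'len' bytes from offset 'off' of block 'id' into 'out'.
   Returns the number of bytes actually copied (0 on a bad request). */
size_t read_block(unsigned id, size_t off, void *out, size_t len)
{
    if (id >= sizeof(blocks) / sizeof(blocks[0]) || off >= blocks[id].size)
        return 0;
    if (off + len > blocks[id].size)
        len = blocks[id].size - off;           /* clamp to the real size */
    memcpy(out, (const uint8_t *)blocks[id].base + off, len);
    return len;
}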

Stefan

Reply to
Stefan Reuther

Yes, this is a big problem. However, this is not a problem of the protocol; it is a design problem. IMO this is the kind of problem where a universal approach results in tremendous overhead and rarely justifies itself.

The first step is making the transport independent of the application.

Vladimir Vassilevsky DSP and Mixed Signal Consultant


Reply to
Vladimir Vassilevsky

Tremendous overhead? Come on. For the minimal version, you need just a table that maps block identifiers to real address/size pairs. And you need that table anyway for ISO9000/SPICE audits :-)

I build most of my protocols this way ("truncate if the sender sends too much, zero-pad if it sends too little, never send any raw pointers") and that little overhead has paid off a thousandfold already. Whereas others have to rebuild the whole world after changing a structure, my component in version X usually works together with version X+1 on the other side. Usually even X+20. Of course this also needs a little thought. E.g., sometimes store "not X" instead of "X" to make use of the zero-pad rule.
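That rule is only a few lines of C; this sketch assumes each side knows its own field size from its own message layout, and everything else about the call is illustrative:

#include <stddef.h>
#include <string.h>

/* "Truncate if the sender sends too much, zero-pad if it sends too
   little": copy a received field into the local structure regardless
   of which side has the newer, longer layout. */
void copy_field(void *dst, size_t dst_len, const void *src, size_t src_len)
{
    size_t n = (src_len < dst_len) ? src_len : dst_len;
    memcpy(dst, src, n);                           /* truncate extra bytes */
    if (n < dst_len)
        memset((char *)dst + n, 0, dst_len - n);   /* zero-pad the rest */
}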

That is a different story.

Stefan

Reply to
Stefan Reuther

I'll mention something I've used that I'm not entirely sure is better for your needs, but it provides backward and forward compatibility (to the degree possible). In those cases, each binary packet includes a type and a size. If the receiving unit doesn't recognize the type, it is at least able to skip the 'size' number of bytes and get on to the next packet. This provides a kind of backward compatibility where a newer unit that produces newer data elements can be attached to an older unit that can't process them. It's not always necessary that the older unit process them, and in those cases this saves the day, so to speak. Newer units receiving data from older units will use defaults for new type values not sent by the older unit. So it works, for example, for cases where customers need to buy a newer product and use it with older equipment and don't need any of the newer features of the newer product.
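A sketch of that receive loop, assuming a one-byte type and a one-byte size; the packet types and layout are invented purely for illustration:

#include <stdint.h>
#include <stddef.h>

/* Walk a buffer of consecutive { type, size, payload } packets and
   skip anything we don't recognize. */
void parse_stream(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    while (pos + 2 <= len) {
        uint8_t type = buf[pos];
        uint8_t size = buf[pos + 1];          /* payload bytes that follow */
        if (pos + 2 + size > len)
            break;                            /* truncated packet: stop */
        switch (type) {
        case 0x01: /* decode a known packet ... */ break;
        case 0x02: /* decode another known packet ... */ break;
        default:   /* unknown type from a newer peer: just skip it */ break;
        }
        pos += 2 + (size_t)size;
    }
}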

I have also had cases where the 'size' field itself (because of low-speed comms) needed to be variable in size. In such cases, I may arrange for the size field to be extensible. For example, I might choose to set the type field to be 3 bits and the size field to be 5, for a total of 8 (or one byte). But the size field could be designed so that the value 0 indicates that the next entire byte is to be taken as the size (adding another byte to the stream). Or you could use one bit of the 5 bits (leaving only four for most packets) as an 'extension bit' indicating that another byte is required, if you felt you might need an entire 12-bit size field sometimes. So you don't always have to waste a lot of space on it.
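As a sketch of that particular split (3-bit type, 5-bit size, size 0 escaping to a full extra size byte); the exact widths are a design choice, not anything fixed:

#include <stdint.h>
#include <stddef.h>

/* Decode a packet header whose first byte packs a 3-bit type and a
   5-bit size, where size == 0 means "the real size is in the next
   byte".  Returns the number of header bytes consumed (0 if more
   bytes are needed) and fills *type and *size. */
size_t decode_header(const uint8_t *buf, size_t len,
                     uint8_t *type, uint16_t *size)
{
    if (len < 1)
        return 0;
    *type = buf[0] >> 5;              /* top 3 bits: packet type */
    *size = buf[0] & 0x1F;            /* low 5 bits: payload size */
    if (*size != 0)
        return 1;                     /* short form: one header byte */
    if (len < 2)
        return 0;                     /* extended size byte not received yet */
    *size = buf[1];                   /* escape value 0: next byte is the size */
    return 2;
}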

There are lots of variations on these themes which permit extending the size of some value or parameter you want to transmit/receive, allowing compatibility backward and forward as things change without slowing down traditional comms by much at all. Just use your imagination.

A nice thing about a simple type field and a size field, even an extensible size field, is that the code to handle it remains fairly simple and usually small. It doesn't take much to extract fields and it doesn't take much more to extend the size field if you use an extensible form. (You could even allow the type field to be extensible, I suppose.)

This has also worked pretty well when dealing with software updates to old hardware, where the user parameters are stored in EEROM. The newer software can readily read the old data, use default values for what isn't there, and the next time it saves, it updates the EEROM with the new fields. If, for some reason, the customer insists there is a problem with the new software, we can easily revert back to the old software even though the EEROM now holds newer fields (which it skips.) Either way, it works. So it's a nice flexibility, at times.

Jon

Reply to
Jon Kirwan

These ideas are similar to the principles behind Google Protocol Buffers. That specific encoding format may not be what you want, but read the description and you'll get the idea that they thought about these issues carefully and solved them, and you might be inspired by what they did.

Chris

Reply to
Christopher Head

Sounds as if you are trying to reinvent ASN.1, which also might or might not be appropriate for the OP.

Reply to
upsidedown

Doesn't look the same to me, and it looks more complex, which one might expect from committees and ISO/IEC processes. I can't see the similarity.

I used what I discussed here at least 25 years ago. And what I used (modified to my needs) wasn't a new idea for me. It came from certain OBJ file formats I'd developed and worked on 15 years before that, almost 40 years ago.

Jon

Reply to
Jon Kirwan

I'll have to look over that. What I was thinking of is quite simple to implement and doesn't require protocol compilers or anything like that. It's just too easy to do. I'd just recommend that the OP think about mods that will only take a few lines of extra code over what he is already doing.

But thanks. I didn't know about this and it's something to read about. I went so far as to see their discussion of varints as well as a simple int32 transmission sequence... and from that, I suspect it's worth more time to look at. Not for existing or old tools, but as a suggestion for others.

Why didn't you post about this for the OP before? (Might have been a fit with lots of doc rather than me just blabbering.)

Jon

Reply to
Jon Kirwan

Thus we are approaching the reinvention of MODBUS.

Reply to
baranov

You don't *have* to use the compiler to use the protobuf wire format. The wire format is fully documented on their site, so you can just pick that up and write your own parser and encoder if you prefer.

The varint encoding is an interesting way to save some space, and the negative number thing as well.
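For reference, the varint itself is just base-128 with a continuation bit, and the negative-number trick is the ZigZag mapping; a small sketch for 32-bit values (written with an explicit sign test rather than the arithmetic-shift form the protobuf docs use):

#include <stdint.h>
#include <stddef.h>

/* Protobuf-style base-128 varint: 7 payload bits per byte, MSB set on
   every byte except the last.  Returns the number of bytes written. */
size_t varint_encode(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t b = v & 0x7F;
        v >>= 7;
        out[n++] = b | (v ? 0x80 : 0x00);
    } while (v);
    return n;
}

/* ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so that small
   negative numbers also encode in few varint bytes. */
uint32_t zigzag32(int32_t v)
{
    return ((uint32_t)v << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}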

Something else that's nice about the protobuf wire format is that the field identifier (e.g. field "name") and a rough approximation to the field type (varint, 64-bit, 32-bit, or length-delimited) are packed into a single byte. That's not enough to fully unpack a message (you don't know what a length-delimited value represents, or whether a 32-bit value is an integer or a float, or whatever), but it *IS* enough to skip over fields you don't recognize (backwards compatibility!) because each of the above field types contains enough detail to figure out the length of the field payload. But it does this without actually including a length field in most cases, so encoding lots of primitives ends up smaller on the wire than a simple list of key/length/value tuples.
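A sketch of that skipping logic; read_varint() is a hypothetical base-128 decoder (the counterpart of the encoder above), and error handling is omitted:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical varint reader: decodes a base-128 varint at buf[*pos],
   advances *pos, and returns the value. */
extern uint32_t read_varint(const uint8_t *buf, size_t *pos);

/* Skip one field whose key has already been read.  Knowing only the
   wire type is enough to find the next field, which is what gives the
   format its forward/backward compatibility. */
void skip_field(const uint8_t *buf, size_t *pos, uint32_t wire_type)
{
    switch (wire_type) {
    case 0: (void)read_varint(buf, pos);    break;  /* varint */
    case 1: *pos += 8;                      break;  /* fixed 64-bit */
    case 2: *pos += read_varint(buf, pos);  break;  /* length-delimited */
    case 5: *pos += 4;                      break;  /* fixed 32-bit */
    default: /* group wire types 3/4 not handled in this sketch */ break;
    }
}

/* A key is itself a varint: (field_number << 3) | wire_type. */
void parse_message(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    while (pos < len) {
        uint32_t key   = read_varint(buf, &pos);
        uint32_t field = key >> 3;
        uint32_t wire  = key & 0x7;
        /* A real decoder would switch on 'field' here;
           this sketch just skips every field. */
        (void)field;
        skip_field(buf, &pos, wire);
    }
}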

The wire protocol probably isn't ideal for embedded applications. You might want to invent something similar but using different field types for better space efficiency. Maybe you don't want to bother with varints or don't care about 64-bit integers, but if you allocated two field type codes to the purpose, you could encode an entire boolean field in a single byte by having separate field types for TRUE and FALSE, for instance: the payload length is zero, and the field value is encoded by means of the field type.

It's not really targeted at embedded systems, so it wasn't really in mind at the time :)

Reply to
Christopher Head

Hehe. So my comments triggered the addition. Resonance between minds. Nifty.

Anyway, thanks for it. I will spend a little time reading and learning about it until I more fully apprehend its scope, benefits and problems. If it can function both for communication and for EEROM parameter storage with equal facility (and EEROM is limited in size, so again size is very important), I would much prefer having something to point out as a reference, documenting slight differences in implementation, instead of having to spend time documenting the entirety of my own design. It's less work that way. And so long as I don't sacrifice something along the way, that's a good thing.

Been using similar ideas, but not the same, for almost 40 years (in OBJ file formats back in the day when disk space was very precious and main memory even more so.) Nice to see this, though.

Jon

Reply to
Jon Kirwan
