Suggestions for custom application-layer protocol?

- Jonathan Bartlett
posted
18 years ago

Wed, May 25, 2005 4:54 PM

If the throughput is low, binary would be better.

For example, "1.0000236" is four bytes as a single-precision binary float, but nine bytes as text (ten with a delimiter).
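For anyone who wants to check size comparisons like this one, here is a minimal Python sketch. Reading "binary" as a little-endian single-precision IEEE 754 float is an assumption; the post does not say which binary encoding it has in mind.

```python
import struct

value = 1.0000236

# Assumed binary encoding: little-endian single-precision float.
# struct's "f" format always packs to 4 bytes.
binary = struct.pack("<f", value)

# The same number as ASCII text, using the literal from the post.
text = "1.0000236".encode("ascii")

print("binary bytes:", len(binary))
print("text bytes:  ", len(text))

# A single-precision float only approximates the decimal value.
(round_trip,) = struct.unpack("<f", binary)
print("round trip:  ", round_trip)
```

Note that the binary form trades readability and some precision for size, which only matters if bandwidth is actually scarce.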

Mark wrote: "I also want the protocol to be based on TCP/IP. Does [...]"

Read the HTTP protocol. If you really want text-based, HTTP is probably the way to go. My suggestion would be just to use CORBA, since it maps easily into most languages, has a binary representation, and is fairly standard.

What sort of complexities do you wish to avoid? Most of these complexities exist with all text-based messages. With XML you have the advantage that you don't have to write your own parser.

Use CORBA. There are lots of good, free ORBs, including MICO, TAO, and ORBit. There are also some Java ORBs.

Jon

--
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017

- Mark
posted
18 years ago

Wed, May 25, 2005 5:01 PM

I need to implement a simple application layer protocol that will be used to communicate between an embedded device (single-board computer running Linux) and a monitoring terminal (running Windows). There isn't a lot of data being passed around, mostly status information collected by the embedded device, and control messages from the monitoring terminal. Messages will consist of between 1 and 10 fields of data.

I want the protocol to be text based rather than binary since the data throughput is low. I also want the protocol to be based on TCP/IP. Does anyone here have any suggestions on the design of a simple protocol? Are there simple, standard ways of formatting text messages to be sent over a TCP socket (such as comma-separated)? I would prefer to avoid the complexities of XML. Since either the embedded device or the monitoring terminal can initiate a message, is it preferable to have TCP servers running on both sides? Or is it better to simply leave a TCP connection open between client and server? Any suggestions on ultra-simple standard protocols that do something similar to this?

Any suggestions or pointers to further reading are much appreciated.

Thanks, Mark

- Walter Roberson
posted
18 years ago

Wed, May 25, 2005 5:13 PM

Mark wrote:
:I need to implement a simple application layer protocol that will be used to
:communicate between an embedded device (single-board computer running Linux)
:and a monitoring terminal (running Windows). There isn't a lot of data being
:passed around, mostly status information collected by the embedded device,
:and control messages from the monitoring terminal. Messages will consist of
:between 1 and 10 fields of data.

:I want the protocol to be text based rather than binary since the data
:throughput is low.

OK.

:I also want the protocol to be based on TCP/IP.

For the purposes you describe, it sounds as if UDP might be a better fit. UDP has a lot fewer states to worry about. The main question, though, would be how you want either end to react if it notices that a packet has gone missing.

:Does anyone here have any suggestions on the design of a simple protocol? Are
:there simple, standard ways of formatting text messages to be sent over a
:TCP socket (such as comma-separated)?

If the fields are consistent (e.g., if there are 3 fields then the first 2 are exactly the same as if there were only 2 fields, or if there are 3 fields they are always the -same- 3 fields) then you can just list the values with some convenient delimiter character.

If the fields are not consistent (including, e.g., only transferring data that changed) then keyword/value pairs would be typical: TEMP3=46 STATE=stable
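A keyword/value line like the one above can be handled in a few lines. This is only a sketch in Python: the function names `parse_pairs` and `format_pairs` are made up for illustration, and it assumes that neither keys nor values contain whitespace or "=" inside a key.

```python
def parse_pairs(line):
    """Split 'KEY=value KEY=value' into a dict; values stay strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed field: {token!r}")
        fields[key] = value
    return fields


def format_pairs(fields):
    """Inverse of parse_pairs: join the pairs with single spaces."""
    return " ".join(f"{key}={value}" for key, value in fields.items())


message = parse_pairs("TEMP3=46 STATE=stable")
print(message)
print(format_pairs(message))
```

Because only changed fields need to be sent, the receiver can simply merge each parsed dict into its current view of the device state.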

The protocol doesn't start to get interesting until you have "fields" which are lists of values, or some fields may contain arbitrary text (including your standard delimiter character.) If it's less complex than that, just go ahead and do whatever seems natural.

:Since either the embedded device or the monitoring
:terminal can initiate a message, is it preferable to have TCP servers
:running on both sides? Or is it better to simply leave a TCP connection
:open between client and server?

Sounds like UDP. If you use TCP servers on both sides, then -every- message requires the full 3-way negotiation. If you leave the TCP connection open, you have to worry about the possibility of the connection dropping and needing to detect and recover from that. [e.g., suppose you have to reboot the Windows machine.]

--
Warning: potentially contains traces of nuts.

- Eric Sosman
posted
18 years ago

Wed, May 25, 2005 5:22 PM

Could you use telnet, plus your own convention for formatting the messages?

--
Eric.Sosman@sun.com

- Peter Andersen
posted
18 years ago

Wed, May 25, 2005 5:43 PM

Take a look at netcat (nc). It opens a TCP port to which you can send any data, binary or ASCII. You can find nc for both Windows and Linux.

--
/ Peter Andersen
/ Skanderborg

- Gordon Burditt
posted
18 years ago

Wed, May 25, 2005 6:03 PM

Take a look at protocols like SMTP, NNTP, POP3, IMAP, etc. Another example to look at is the "AT" command set of modems (although this doesn't use TCP). A command is one line of text. It starts with some kind of command verb that identifies what follows. The response is usually one line of text, and something at the beginning of the line identifies what kind of response it is (status code) and may identify whether it's the last line or not. The status codes are in groups so a client doesn't have to be aware of all the status codes (e.g. 4NN is a temporary failure of some kind, 5NN is a permanent failure).

Spontaneous responses make the protocol a bit more complicated. You need to be able to identify a spontaneous response (status code) and perhaps the spontaneous response only says there IS data, and the client needs to ask for it to actually get it sent. IMAP uses tags on requests and replies so you can match them up, and deals with spontaneous responses.

Use any kind of field separator that's appropriate for the data you are sending. POP3 and IMAP use spaces. Commas might be more appropriate for certain kinds of data.

Testing is generally easy: telnet to the appropriate port and manually type commands. Read the responses.
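As a rough illustration of the one-line command / status-coded response style described in this post, here is a Python sketch. The GET and NOOP verbs and the specific codes 250, 500 and 550 are hypothetical choices borrowed loosely from SMTP conventions; only the 4NN/5NN grouping comes from the post itself.

```python
def handle_command(line, status):
    """Turn one command line into one response line (without the newline)."""
    verb, _, arg = line.strip().partition(" ")
    verb = verb.upper()
    if verb == "GET":
        if arg in status:
            return f"250 {arg} {status[arg]}"
        return f"550 unknown item {arg}"
    if verb == "NOOP":
        return "250 OK"
    return "500 unknown command"


def parse_response(line):
    """Split '250 some text' into (250, 'some text')."""
    code_text, _, text = line.rstrip("\r\n").partition(" ")
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"bad status line: {line!r}")
    return int(code_text), text


def classify(code):
    """A client only needs the first digit to decide what to do next."""
    families = {2: "success", 4: "temporary failure", 5: "permanent failure"}
    return families.get(code // 100, "other")


reply = handle_command("GET uptime", {"uptime": "3600"})
code, text = parse_response(reply)
print(reply, "->", classify(code))
```

Since every exchange is plain text lines, the same commands can be typed by hand over telnet or netcat while debugging.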

Gordon L. Burditt

- Måns Rullgård
posted
18 years ago

Wed, May 25, 2005 6:33 PM

I think the OP means that the amount of data is small, so the larger size of a text based protocol doesn't matter.

HTTP is a document retrieval protocol, not a two-way message passing protocol, despite its frequent abuse as such.

CORBA is also difficult to use and debug, has a large overhead in code, and is largely incompatible with itself. Avoid at all cost.

Have you ever read the XML specification, and all the other required documents (DTD spec, etc.)? Do so, and let's discuss complexities afterwards.

With XML you have the disadvantage that you have to write a complicated parser, or integrate a huge library, for little gain.

Just keep in mind that most of these ORBs are incompatible with each other, often even between different versions of the same one.

--
Måns Rullgård
mru@inprovide.com

- Sean Burke
posted
18 years ago

Wed, May 25, 2005 7:21 PM

One option to consider is to embed a web interface into your application. This has the advantage that you can use any web browser as the client side of the interface.

There are a variety of very small web servers that are suitable for embedding. One such that I have used successfully is "pserv". From the FreeBSD ports description:

| pServ is a small, portable HTTP server. It is written in pure C for speed and
| portability. It runs as a standalone program and does not require inetd.
| It should be small enough to be used in a mobile computer or to be run on your
| obsolete workstation you have somewhere.
|
| This port contains a patched version to support the execution of php scripts.
|
| WWW:

formatting link

-SEan

- Pascal Bourguignon
posted
18 years ago

Wed, May 25, 2005 7:33 PM

This is the best advice. The design of Internet protocols follows good principles, and leads to simple implementations.

More complex protocols such as HTTP are needed only when your requests cannot fit in one command with a few arguments. Then we format commands like email headers, with several lines of 'field: value' ended by an empty line.

Read:

formatting link

Note that AT commands are cryptic (no HELP command like in the Internet protocols, and no mnemonic commands). AT was chosen for the unique bit pattern A and T have in ASCII, which helped the modem detect and adjust the bit rate automatically. You don't need that for a network protocol.

--
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.

- Mark
posted
18 years ago

Wed, May 25, 2005 7:57 PM

Thanks for all the excellent suggestions so far. I'll read through the RFCs for various Internet application protocols. I've also found some good information in Eric S. Raymond's online book "The Art of Unix Programming". It would be nice to be able to use a simple standard protocol of some kind. It seems strange to have to come up with ways of formatting text fields, delimiting messages, etc., since I'm sure this has been done many times before.

Thanks, Mark

- David Schwartz
posted
18 years ago

Wed, May 25, 2005 8:07 PM

You can take a page from the URL encoding scheme and use a list of name=value pairs delimited by & signs. Encode all less than signs, greater than signs, & signs, = signs, percent signs, whitespace, control characters, and non-ascii characters as '%xy' where x and y are the first and second hexadecimal digits of the character's value.

So a query might be:

query=get&parameter=uptime

And a reply might be:

query=get&parameter=uptime&value=1%20hour,%205%20minutes

Specify that each end ignore any parameters they don't understand (rather than considering it an error), and your protocol is extensible.
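Python's standard `urllib.parse` module already implements this kind of percent-encoding, so a sketch of the scheme is short. The names `encode_fields`, `decode_fields` and `KNOWN_PARAMS` are hypothetical, and `quote` escapes a few more characters (commas, for instance) than the list in this post, which is harmless because the decoder accepts both forms.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Hypothetical set of parameter names this end understands.
KNOWN_PARAMS = {"query", "parameter", "value"}


def encode_fields(fields):
    """Build 'name=value&name=value', with spaces encoded as %20."""
    return urlencode(fields, quote_via=quote)


def decode_fields(line):
    """Decode a message, silently dropping parameters we don't know."""
    pairs = parse_qsl(line, keep_blank_values=True)
    return {name: value for name, value in pairs if name in KNOWN_PARAMS}


reply = "query=get&parameter=uptime&value=1%20hour,%205%20minutes"
print(decode_fields(reply))
print(encode_fields({"query": "get", "parameter": "uptime"}))
```

Dropping unknown names in `decode_fields` is what makes the "ignore what you don't understand" rule a one-line policy rather than a special case.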

DS

- Grant Edwards
posted
18 years ago

Wed, May 25, 2005 8:36 PM

Not sure what you mean by "monitoring terminal"?

Is the data intended for use by man or by machine? If the former, then why not just use the telnet protocol and telnet into your SBC?

If by machine, then simple ASCII newline-terminated messages are often the easiest to implement and debug.

Telnet, SMTP, IMAP, NNTP, etc. are all worth looking at.

No, there aren't really any overall standards. I'd stick with something plain ASCII and line-oriented.

I would hope so. :)

Who can initiate a message and who can initiate a connection are orthogonal issues. We have no way of knowing what approach will work best for your application.

You haven't given enough details for us to know what might be similar and what wouldn't.

Whatever you pick, my main recommendation would be to do something you can test/debug by using a telnet client and netcat.

--
Grant Edwards                   grante             Yow!  OVER the
                                  at               underpass! UNDER the
                               visi.com            overpass! Around the FUTURE
                                                   and BEYOND REPAIR!!

- Walter Roberson
posted
18 years ago

Wed, May 25, 2005 10:20 PM

Mark wrote:
:It would be nice to be able to use a simple standard protocol of some kind. It
:seems strange to have to come up with ways of formatting text fields,
:delimiting messages etc since I'm sure this has been done already many times
:before.

Well, you could use XDR, but chances are that is overkill.

There are lots of different standards you could use, but if the standard is flexible then you end up putting in a lot more logic than you need for your particular application.

For example, a flexible standard needs to be able to account for nested lists of values, including the possibility that the nesting is not the same at all points. Then there are sparse representations to be handled. And it doesn't take much in a flexible standard before "something comes up" and you need to refer to a previously transmitted value in order to make sense of what follows...

--
Studies show that the average reader ignores 106% of all statistics
they see in .signatures.

- Walter Roberson
posted
18 years ago

Wed, May 25, 2005 10:40 PM

David Schwartz wrote:
: You can take a page from the URL encoding scheme and use a list of
:name=value pairs delimited by [...]

: Specify that each end ignore any parameters they don't understand
:(rather than considering it an error), and your protocol is extensible.

I disagree. If you ignore parameters you don't understand, then you may well be ignoring something that changes the meaning of something else that you thought you understood.

When I last did this kind of protocol definition work, the first thing I had the two sides exchange was their respective protocol revision numbers. Each side would have a range of versions over which it was able to support backwards compatibility; if the two sides were too far apart, then either end could say "Sorry, I don't know how to talk to you."

You could offer "ignore what you don't understand" extensibility within a protocol release provided that the extensions don't change the meaning of any established parameter, and the extensions don't command anything critical. For example, if one added an arrow-shape parameter to a graph, then likely it would be considered acceptable if the other end ignored the shape parameter -- but a new parameter that indicated (e.g.) "Turn off motor #7!" is probably too important to ignore. Any time there is a notable semantic change or critical command change, the version number should get a boost.
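One way to picture the revision exchange is as an overlap check between two version ranges. This Python sketch is an assumption about how such a check might look, not a description of the code discussed in this post; `negotiate` and the (oldest, newest) tuples are invented for illustration.

```python
def negotiate(local_range, peer_range):
    """Return the highest revision both ends support, or None.

    Each range is (oldest revision still supported, newest revision).
    """
    low = max(local_range[0], peer_range[0])
    high = min(local_range[1], peer_range[1])
    return high if low <= high else None


# Device speaks revisions 2..5, terminal speaks 4..7: they agree on 5.
print(negotiate((2, 5), (4, 7)))

# No overlap: this is the "Sorry, I don't know how to talk to you" case.
print(negotiate((1, 2), (4, 7)))
```

Both ends can run the same function on the same two ranges, so they reach the same answer without a further round trip.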

Note to the original poster:

One of the other issues that I ran into, which might or might not be relevant to you, is that some of my operations could take a long time. I needed command response timeouts; and I needed keep-alives so the one end would know the other end was working instead of hung up; and I needed a way to interrupt executing commands. [My code was single threaded and non-preemptive, so I had to put in co-operative multitasking hooks.] There's no clean equivalent to ^C over a network connection ;-)

--
  The rule of thumb for speed is:

  1. If it doesn't work then speed doesn't matter.  -- Christian Bau

- David Schwartz
posted
18 years ago

Wed, May 25, 2005 11:15 PM

That's not a problem. Just have a parameter like 'minimum-version' that goes in each request. If the minimum version of the query is higher than the version you are coded to, you know you don't understand the protocol. You can also use ':' instead of '=' for "critical" parameters, that is, ones that should cause you to reject the request if you don't understand them.
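The 'minimum-version' and critical-parameter ideas can be sketched together in Python. Everything named here (`MY_VERSION`, `UNDERSTOOD`, `check_request`, the `shutdown` parameter) is hypothetical, and percent-decoding of the values is left out to keep the example short.

```python
MY_VERSION = 3  # hypothetical: the protocol version this end is coded to
UNDERSTOOD = {"query", "parameter", "minimum-version"}


def check_request(line):
    """Parse 'name=value&name:value'; ':' marks a critical parameter."""
    fields = {}
    for item in line.split("&"):
        # The separator is whichever of '=' or ':' appears first.
        positions = [i for i in (item.find("="), item.find(":")) if i != -1]
        if not positions:
            raise ValueError(f"malformed parameter: {item!r}")
        cut = min(positions)
        name, sep, value = item[:cut], item[cut], item[cut + 1:]
        if name not in UNDERSTOOD:
            if sep == ":":
                raise ValueError(f"unknown critical parameter: {name}")
            continue  # unknown but optional: ignore it
        fields[name] = value
    if int(fields.get("minimum-version", "0")) > MY_VERSION:
        raise ValueError("request needs a newer protocol version")
    return fields


print(check_request("query=get&colour=red&minimum-version=2"))
```

A request such as `query=get&shutdown:now` would be rejected by this sketch, while the same request with `shutdown=now` would be accepted with the unknown parameter dropped.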

The only problem with this is that you can't easily have multiple overlapping changes. For example, suppose one person adds a compression option to the protocol while independently someone adds a security option. No implementation exists with both options because they're temporarily (or maybe even fundamentally) incompatible.

This is a really hard problem. I've seen many "negotiation" specifications that didn't actually do anything more than announce capabilities. That's not negotiation.

That works fine if and only if one person controls all implementations of and changes to the protocol.

Yeah, we can make this as hard as we want to. ;)

DS

- Barry Margolin
posted
18 years ago

Thu, May 26, 2005 12:03 AM

Right. Look at HTTP as an example of a protocol that has evolved pretty gracefully by making use of the "ignore what you don't understand" philosophy. Many of the features that eventually became codified in HTTP/1.1 were originally done as nonstandard extensions in HTTP/1.0. For instance, the "Host:" header, which is used by virtual servers, was nonstandard in 1.0; it would have slowed down growth of the web significantly if we had to wait for a new version of the standard to be published before browsers and servers could start using this feature.

What you have to do when you start creating nonstandard extensions, though, is ensure that they're not critical to operation of the protocol. The protocol has to operate properly if the recipient ignores them. You need to bump up the protocol version number, and make use of some form of negotiation, when you introduce new required features. An example in HTTP/1.1 is "chunked" encoding -- if a server sends this and the client doesn't understand it, the document will look all wrong and the client and server will get out of sync in communication (the recipient expects the sender to close the connection when it's done).

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

- Walter Roberson
posted
18 years ago

Thu, May 26, 2005 12:51 AM

David Schwartz wrote:
: The only problem with this is that you can't easily have multiple
:overlapping changes. For example, suppose one person adds a compression
:option to the protocol while independently someone adds a security option.
:No implementation exists with both options because they're temporarily (or
:maybe even fundamentally) incompatible.

: This is a really hard problem. I've seen many "negotiation"
:specifications that didn't actually do anything more than announce
:capabilities. That's not negotiation.

This discussion reminds me of the old VMS options processing setup. It was table driven, but it had the flexibility to know what options took what kind of parameter, and to know what options were not valid in combination with other options.

--
Warning: potentially contains traces of nuts.

- Dave
posted
18 years ago

Thu, May 26, 2005 5:11 AM

I don't think this link even exists.

D

- Glyn Davies
posted
18 years ago

Thu, May 26, 2005 8:09 AM

Hi Mark,

You seem to be implying that you have a network connection to your SBC, but don't confirm it!

For simplicity of implementation, I'd recommend implementing a message layer over the TCP socket - something as simple as a two byte binary header to indicate the length of the message to follow.

This means you can then read a whole message, text based or otherwise, and then process it accordingly. You don't need to mess about with parsing lines and stuff like that. It drops out to:

1. Read Header
2. Read Data
3. Process Request

Repeat (ad infinitum)
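The read-header / read-data loop above might look like this in Python. It is a sketch under stated assumptions: the two-byte length is taken to be unsigned and in network byte order (the post does not specify), and `encode_frame`, `recv_exactly` and `read_frame` are illustrative names.

```python
import struct


def encode_frame(payload):
    """Prefix the payload with its length as a 2-byte unsigned integer."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length header")
    return struct.pack("!H", len(payload)) + payload


def recv_exactly(sock, count):
    """Keep calling recv() until exactly `count` bytes have arrived."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def read_frame(sock):
    """Steps 1 and 2: read the header, then read that many data bytes."""
    (length,) = struct.unpack("!H", recv_exactly(sock, 2))
    return recv_exactly(sock, length)
```

The `recv_exactly` helper matters because TCP is a byte stream: a single `recv()` may return only part of a message, or the end of one message together with the start of the next.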

I'd suggest just running the server on the SBC. It's always possible you might need to run multiple monitoring terminals in other places, and there is no need for the SBC to connect to the monitoring terminal if the software for the monitoring is not running (by the sound of it).

After that, the format of your payload could be a simple text header indicating the message type, with the specific format of data to follow. CSV for the data, for example.

Hope this helps,

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Tom Anderson
posted
18 years ago

Thu, May 26, 2005 2:51 PM

I've written a bunch of little protocols like this. I think it really is as simple as it sounds - use TCP, frame messages by terminating with a line break (i'd tolerate CR, LF or CR+LF, and consistently emit CR + LF, i think), split each message into fields with tab characters, use the first field for a command or status code, and put your data in the other fields. If you want to be able to have multiple commands in flight at once, use the first field for a transaction tag (as in IMAP), and the second field for command or status.
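A sketch of that framing in Python, using the tagged variant (tag first, then command or status, then data). The names `encode_line` and `split_lines` are illustrative, and holding back a trailing CR until more data arrives is one possible way to tolerate all three line endings, not the only one.

```python
def encode_line(tag, command, *fields):
    """Join tag, command and data fields with tabs; always emit CR+LF."""
    parts = [tag, command, *fields]
    for part in parts:
        if any(ch in part for ch in "\t\r\n"):
            raise ValueError(f"field contains a separator: {part!r}")
    return ("\t".join(parts) + "\r\n").encode("ascii")


def split_lines(buffer):
    """Return (list of complete messages, leftover bytes).

    Accepts CR, LF or CR+LF as the terminator. Each message is a list
    of fields: [tag, command_or_status, data...].
    """
    held = b""
    if buffer.endswith(b"\r"):
        # Might be the first half of CR+LF, so wait for more data.
        buffer, held = buffer[:-1], b"\r"
    lines = buffer.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    rest = lines.pop() + held
    messages = [line.decode("ascii").split("\t") for line in lines if line]
    return messages, rest


received = encode_line("a1", "GET", "uptime") + b"b2\tPING\nc3\tNO"
messages, rest = split_lines(received)
print(messages)
print(rest)
```

The caller appends each new chunk from the socket to `rest` and calls `split_lines` again, so a message split across two reads is reassembled naturally.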

The latter. Engineer the protocol to be stateless, though, so if you lose the connection, you can just throw up a new one and carry on.

As for versioning, if you think implementations of this might persist in the wild for several years, or if anyone else is going to implement it, by all means do some version negotiation, but otherwise, if you control both ends, i'd forget about it.

Add a no-op command so you can do keepalives. Maybe add ping and pong commands, so you can see if the far end is still alive. If you have long-running transactions and are paranoid, go with the tagged command approach and add a 'query transaction status' command and appropriate responses.

If you want to encode complex data structures, use JSON:

formatting link

JSON encodings are valid javascript and, apart from the true/false/null literals, valid python source, which makes parsing them pretty easy!
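For what it's worth, a dedicated JSON parser is safer than evaluating received text as source code. A minimal Python sketch using the standard `json` module follows; the field names in `status` are made up for illustration.

```python
import json

# Hypothetical status report with nested and mixed-type values.
status = {"temps": [46, 47.5], "state": "stable", "alarm": None, "fan_on": True}

# Without an indent argument, json.dumps produces a single line, with
# any newline inside a string escaped, so one document fits on one line
# of a line-oriented protocol.
line = json.dumps(status, separators=(",", ":"))
print(line)

decoded = json.loads(line)
print(decoded["temps"], decoded["state"])
```

This combines well with newline-terminated framing: each message is one line, and the line is one JSON document.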

Ultra-simple and standard: pick one!

And, while i'm slinging aphorisms: make it work, then make it right, then make it fast. Oh, and do the simplest thing that could possibly work.

tom

--
Punk's not sexual, it's just aggression.