Suggestions for custom application-layer protocol?

- Walter Roberson

Thu, May 26, 2005 6:11 PM

Glyn Davies wrote:
: For simplicity of implementation, I'd recommend implementing a message
: layer over the TCP socket - something as simple as a two byte binary header
: to indicate the length of the message to follow.
:
: This means you can then read a whole message - text based or otherwise,
: and then process it accordingly. You don't need to mess about with parsing
: lines and stuff like that.

You still need to "mess about" if your data might not fit in a single packet. Recall that when you read() from a socket you are not certain to get all of the data unless the data length is no greater than your system's atomicity limits [PIPE_BUF]. The POSIX minimum value for PIPE_BUF is 512 -- so in the general case a write() of more than 512 bytes may require multiple read() statements.

--
Ceci, ce n'est pas une idée.

- David Schwartz

Thu, May 26, 2005 6:13 PM

And even if it does fit in a single packet.

These atomicity limits don't apply to TCP connections, which have their own segmentation rules.

DS

- Grant Edwards

Thu, May 26, 2005 6:22 PM

And even if it does fit in a single packet.

The data may have passed through a bridge or router that may have broken it up even further. It's a bad idea to make any assumptions about any correspondence between write() and read() calls on a TCP connection. It's just a byte stream: you oughtn't depend on any sort of datagram behavior whatsoever.

A 500 byte "message" may have been sent using a single call to write(), but the receiver had better be prepared to call read() anywhere from 1 to 500 times.
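A minimal sketch of the receive loop this implies, in Python rather than C for brevity. The helper name `recv_exact` is made up for illustration; the only library behaviour relied on is that `recv()` may return fewer bytes than requested and returns an empty result once the peer has closed.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, looping over short reads."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return anywhere from 1 to `remaining` bytes
        if not chunk:
            # An empty result means the peer closed before the message was complete.
            raise ConnectionError(f"connection closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Small local demonstration: one 500-byte "message" sent as two separate writes.
left, right = socket.socketpair()
message = bytes(range(250)) * 2
left.sendall(message[:123])
left.sendall(message[123:])
received = recv_exact(right, len(message))
left.close()
right.close()
```

The caller never learns how many `recv()` calls were needed, which is the point: the loop hides the lack of any write()/read() correspondence.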

--
Grant Edwards                   grante             Yow!  Two with FLUFFO,
                                  at               hold th' BEETS...side of
                               visi.com            SOYETTES!

- Mark

Thu, May 26, 2005 10:36 PM

Thanks a lot for all the responses, there are some excellent ideas here which I will investigate further. I looked briefly at the BEEP protocol but it seems to support a lot more features than I need. I control both ends of the protocol, so I don't need to overdo it on versioning, robustness, etc. To clarify a few responders' questions ... by "monitoring terminals" I meant Windows-based computers with a simple GUI that displays the data and allows users to control various aspects of the embedded device from a remote location. I expect I will end up with a text-based protocol, with a header that indicates the number of bytes in the payload. The payloads will be composed of field=value pairs.
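A rough sketch of what such a message could look like. Nothing here was settled in the thread: the header layout (a decimal byte count followed by a newline), the one-pair-per-line payload, and the function names are all assumptions made for illustration.

```python
def encode_message(fields: dict) -> bytes:
    """Build '<byte count>\\n' followed by one 'field=value' pair per line."""
    payload = "".join(f"{key}={value}\n" for key, value in fields.items())
    payload_bytes = payload.encode("ascii")
    return str(len(payload_bytes)).encode("ascii") + b"\n" + payload_bytes


def decode_message(data: bytes) -> dict:
    """Parse one complete message produced by encode_message()."""
    header, sep, rest = data.partition(b"\n")
    if not sep:
        raise ValueError("missing header terminator")
    length = int(header.decode("ascii"))
    payload = rest[:length]
    if len(payload) != length:
        raise ValueError("incomplete payload")
    fields = {}
    for line in payload.decode("ascii").splitlines():
        key, eq, value = line.partition("=")  # split on the first '=' only
        if not eq:
            raise ValueError(f"malformed field: {line!r}")
        fields[key] = value
    return fields


example = encode_message({"temp": "21.5", "mode": "auto"})
```

Note the limits of this sketch: a field name cannot contain "=" and no name or value can contain a newline, so some escaping or per-field length prefix is still needed for arbitrary data.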

Thanks guys I really appreciate it, this discussion has helped me out a great deal!

Mark

- Grant Edwards

Fri, May 27, 2005 12:04 AM

If you're using a text-based protocol, you don't need a header with a payload count to tell you where the message ends. Use an end-of-message character or string. The CR/LF pair is used as the end-of-message delimiter by a lot of text-based protocols that run on top of TCP. That approach works wonderfully with line-oriented high-level text-handling libraries and utilities -- you can do all sorts of useful stuff using nothing but a few utilities like netcat, expect, and so on.
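A minimal sketch of CR/LF framing on the receiving side, assuming the bytes arrive in arbitrary chunks. The class name `LineSplitter` is hypothetical; it simply buffers input and hands back every complete line seen so far.

```python
class LineSplitter:
    """Collects stream data and returns complete CR/LF-terminated messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        self._buf += data
        # Everything before the last CR/LF is complete; the tail is kept for later.
        *lines, rest = bytes(self._buf).split(b"\r\n")
        self._buf = bytearray(rest)
        return lines


splitter = LineSplitter()
first = splitter.feed(b"HELLO\r\nSTAT")   # one whole line, one partial
second = splitter.feed(b"US\r")           # CR seen, LF still missing
third = splitter.feed(b"\nQUIT\r\n")      # completes two lines
```

This sketch places no limit on line length and does nothing about a lone CR or LF, so it is only suitable when both ends are trusted.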

I've also seen protocols that use STX/ETX to frame messages, and that works nicely as well, but isn't as compatible with pre-existing utilities.

That sounds like a good plan.

--
Grant Edwards                   grante             Yow!  Hey, LOOK!! A pair of
                                  at               SIZE 9 CAPRI PANTS!! They
                               visi.com            probably belong to SAMMY
                                                   DAVIS, JR.!!

- James Antill

Fri, May 27, 2005 5:04 AM

Not a terrible idea, a simple HTTP/1.0 server can be pretty small esp. if you don't mind stopping as soon as something works (the very basics can be done in < 20 lines of C).

Yeh, pretty good ... I stopped looking after seeing 2 major vulnerabilities on the first Google page. The code also looked promising ... for more exploits.

Writing a custom "simple protocol" is likely to be much easier, using netstrings is probably more likely to make you do the right thing ... but a simple "CMD arg1 arg2" type telnet/SMTP/NNTP like protocol isn't hard to get right.

--
James Antill -- james@and.org
http://www.and.org/vstr/httpd

- James Antill

Fri, May 27, 2005 5:12 AM

It interacts well with telnet, which is about the best thing that can be said for it. There are large problems with how you limit "too long" lines, and what happens when you hit a CR or LF on its own, esp. if you put something like a Perl regexp directly on the socket. The HTTP response splitting problem is entirely the fault of pretending CRLF terminated lines are a good idea.
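One way to guard against both problems is sketched below: cap the line length and reject any CR or LF that is not part of a CR/LF pair. The class name, the default limit of 512 bytes, and the choice to raise an error (rather than truncate or skip) are all assumptions made for illustration.

```python
class StrictLineReader:
    """CR/LF line reader that enforces a length cap and rejects bare CR or LF."""

    def __init__(self, max_line: int = 512) -> None:
        self._buf = bytearray()
        self._max = max_line

    def feed(self, data: bytes) -> list:
        self._buf += data
        lines = []
        while True:
            end = self._buf.find(b"\r\n")
            if end < 0:
                # No terminator yet. Allow one extra byte for a CR whose LF
                # has not arrived, but refuse to buffer without bound.
                if len(self._buf) > self._max + 1:
                    raise ValueError("line too long")
                return lines
            if end > self._max:
                raise ValueError("line too long")
            line = bytes(self._buf[:end])
            if b"\r" in line or b"\n" in line:
                raise ValueError("bare CR or LF inside line")
            lines.append(line)
            del self._buf[:end + 2]


reader = StrictLineReader(max_line=8)
first = reader.feed(b"PING\r\nPO")
second = reader.feed(b"NG\r\n")
```

Whether an offending connection should be dropped or merely resynchronised is a policy decision the sketch leaves to the caller.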

--
James Antill -- james@and.org
http://www.and.org/vstr/httpd

- Mark

Fri, May 27, 2005 7:34 AM

This is the main reason I'm leaning towards a byte count to delimit the message. But I think I'll end up with some concept of special characters in the protocol (= , crlf). Is there a standard way to escape these characters if they appear in the body of the message? I suppose I could use nested netstrings to encode each individual field in the message. Thoughts?

- Glyn Davies

Fri, May 27, 2005 10:52 AM

Yes - but you do know how much you are getting. You can handle all this stuff in a comms layer, then concentrate on your application logic.

I've done countless systems using this technique. Works well, and is simple.
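A minimal sketch of such a comms layer, using the two-byte binary length header suggested earlier in the thread. Network byte order and the names `pack_message` and `MessageReader` are assumptions; the thread does not specify them.

```python
import struct

HEADER = struct.Struct("!H")  # two-byte unsigned length, big-endian (assumed)


def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length header")
    return HEADER.pack(len(payload)) + payload


class MessageReader:
    """Accumulates raw stream bytes and returns whole messages as they complete."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        self._buf += data
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # header seen, body still incomplete
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


# Feed a stream of three messages in awkward 7-byte pieces.
stream = pack_message(b"hello") + pack_message(b"") + pack_message(b"x" * 300)
reader = MessageReader()
decoded = []
for offset in range(0, len(stream), 7):
    decoded += reader.feed(stream[offset:offset + 7])
```

The application code only ever sees the entries of `decoded`; how the bytes were split on the wire stays inside the reader.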

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Glyn Davies

Fri, May 27, 2005 10:55 AM

Most STX/ETX stuff I have seen was over serial comms. Not seen it in use over TCP - except where there was an annex box feeding out a bunch of serial lines converted to TCP.

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Glyn Davies

Fri, May 27, 2005 11:06 AM

Standard way of escaping a 'special character' is to put it in twice.

i.e., if % were your special character, to have a '%' in your message you'd put %%
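A small sketch of the doubling rule, assuming "%" is the only special character. The function names are made up, and what a lone "%" means is left to the protocol; here it is simply reported as an error.

```python
SPECIAL = "%"


def escape(text: str) -> str:
    """Double every special character so it can travel as literal data."""
    return text.replace(SPECIAL, SPECIAL * 2)


def unescape(text: str) -> str:
    """Collapse doubled special characters; a lone one is rejected."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == SPECIAL:
            if text[i + 1:i + 2] != SPECIAL:
                raise ValueError(f"unescaped special character at offset {i}")
            i += 1  # skip the second half of the pair
        out.append(ch)
        i += 1
    return "".join(out)


escaped = escape("100% sure")
restored = unescape(escaped)
```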

If you do put a byte count in, don't put it in as text. It might make testing easier, but I doubt it.

Being able to telnet in and test things is the only advantage of CR/LF protocols. It's not hard to knock up test apps which can send a message, though, once you have done your basic comms library (if using a binary header).

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Måns Rullgård

Fri, May 27, 2005 11:12 AM

The protocol used by Micros Fidelio point of sale terminals uses STX/ETX framing when talking to the property management system. The reason for this is probably that these devices historically were connected with serial links.

--
Måns Rullgård
mru@inprovide.com

- Glyn Davies

Fri, May 27, 2005 1:37 PM

There is a multi-drop protocol (I think that's the term, my brain is a bit frazzled today) which was used to connect many terminals to a host controller.

ISO/ASYNC I think it was called. Relied on a host polling each terminal in turn, at which point they were allowed to speak.

Was used extensively with serial driven ATMs - most are TCP/IP these days I believe.

But we digress...

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Måns Rullgård

Fri, May 27, 2005 1:53 PM

I've never had the misfortune of having to deal with such a thing.

I once saw an ATM display an MS Windows dialog box with a DHCP-related error message.

--
Måns Rullgård
mru@inprovide.com

- Glyn Davies

Fri, May 27, 2005 2:38 PM

:-) By the time I got to it we were moving people off them to dedicated lines. Protocol stayed though as it worked fine.

Yeah - they obviously didn't have the clever* stuff we knocked up to stop that kind of thing happening.

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- Grant Edwards

Fri, May 27, 2005 2:59 PM

If you handle IAC sequences -- at least well enough to ignore them.

And it's a pretty big thing, in my experience.

What problems? If a message has to be X bytes long, it's X bytes long regardless of how you choose to delimit it. Handling long messages is a general issue and not specific to the delimiter character chosen.

Yes, that can be a problem if you have to be able to include arbitrary strings in the message body. I was under the impression that this wasn't the case for the OP's application. I could be wrong.

My other suggestion to use STX/ETX is pretty simple and is quite robust if you implement an escape mechanism for when STX, ETX, or the escape character occur in the message body.

--
Grant Edwards                   grante             Yow!  Sometime in 1993
                                  at               NANCY SINATRA will lead a
                               visi.com            BLOODLESS COUP on GUAM!!

- Grant Edwards

Fri, May 27, 2005 3:04 PM

Assuming you're using a 7-bit character set, one of the easiest schemes I've seen is, on transmit, to convert each special character to

[Escape] [Special]|0x80

Remember to include the [Escape] character in your set of special characters that get escaped.

When sending a message body:

STX becomes ESC, STX|0x80
ETX becomes ESC, ETX|0x80
ESC becomes ESC, ESC|0x80

When receiving a message body:

ESC, X becomes X&0x7f
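A sketch of that scheme in code, assuming the usual ASCII values for STX (0x02), ETX (0x03) and ESC (0x1B) and a strictly 7-bit message body. The function names `frame` and `unframe` are hypothetical.

```python
STX, ETX, ESC = 0x02, 0x03, 0x1B
SPECIALS = {STX, ETX, ESC}


def frame(body: bytes) -> bytes:
    """Wrap a 7-bit body in STX ... ETX, escaping special bytes as ESC, byte|0x80."""
    out = bytearray([STX])
    for byte in body:
        if byte > 0x7F:
            raise ValueError("body must be 7-bit")
        if byte in SPECIALS:
            out += bytes([ESC, byte | 0x80])
        else:
            out.append(byte)
    out.append(ETX)
    return bytes(out)


def unframe(framed: bytes) -> bytes:
    """Reverse frame(): strip STX/ETX and turn ESC, X back into X & 0x7F."""
    if len(framed) < 2 or framed[0] != STX or framed[-1] != ETX:
        raise ValueError("not a complete STX/ETX frame")
    out = bytearray()
    inner = iter(framed[1:-1])
    for byte in inner:
        if byte == ESC:
            escaped = next(inner, None)
            if escaped is None:
                raise ValueError("dangling escape at end of frame")
            out.append(escaped & 0x7F)
        else:
            out.append(byte)
    return bytes(out)


wire = frame(b"a\x02b")
```

Because every escaped byte has its high bit set, a raw STX or ETX can never appear between the framing bytes, which is what makes the delimiters unambiguous.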

--
Grant Edwards                   grante             Yow!  Yow! Are you the
                                  at               self-frying president?
                               visi.com

- Grant Edwards

Fri, May 27, 2005 3:07 PM

A byte stream is a byte stream. The serial (as in RS-232) byte stream isn't reliable, but I can't see any difference between a serial comm link and a TCP link when it comes to message framing.

--
Grant Edwards                   grante             Yow!  How do I get HOME?
                                  at               
                               visi.com

- Glyn Davies

Fri, May 27, 2005 3:15 PM

Nope, there is no general difference... The serial streams were generally 7-bit though, which would stop you using a binary length header.

Also, as you point out the serial stream is unreliable. That makes a difference in the protocol you choose. With STX/ETX framing if you get garbage you can resync at the next STX. If you are using length headers and you get some noise on the line, you are lost and have no way to resync.
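The resync property can be sketched as a small state machine that ignores everything until it sees an STX. This illustration assumes STX and ETX never occur raw inside a body (for example because they are escaped), and the function name `extract_frames` is made up.

```python
STX, ETX = 0x02, 0x03


def extract_frames(stream: bytes) -> list:
    """Return the bodies of all well-delimited frames, skipping line noise."""
    frames = []
    current = None  # None means "hunting for the next STX"
    for byte in stream:
        if byte == STX:
            current = bytearray()  # (re)start; any partial frame is dropped
        elif byte == ETX:
            if current is not None:
                frames.append(bytes(current))
            current = None
        elif current is not None:
            current.append(byte)
        # Bytes seen while hunting for STX are treated as garbage and ignored.
    return frames


noisy = b"\x7f\x55" + b"\x02one\x03" + b"garbage" + b"\x02tw" + b"\x02two\x03" + b"\x03"
recovered = extract_frames(noisy)
```

A corrupted frame costs only that frame. With a length header, one damaged length field leaves the receiver unable to tell where any later message starts.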

Glyn

--

------------------------------------------------------------------------
Glyn Davies / gryn@plig.net / www.technobobbins.com / Insert quote here?
------------------------------------------------------------------------

- James Antill

Fri, May 27, 2005 4:25 PM

Generally I'd say one of the major advantages of having leading byte counts is that you don't have to escape anything. Where you need multiple arguments just enclose them all in a single netstring. I.e., for abcd = xyz do [1]:

13:4:abcd,3:xyz,,

...if you want to get really fancy (and allow easier extensibility) you can do key/value pairs [2].
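A short sketch of netstring encoding and decoding that reproduces the nested example above. The function names are hypothetical, and the decoder is deliberately loose (for instance, it does not reject leading zeros in the length).

```python
def encode_netstring(data: bytes) -> bytes:
    """Encode as '<decimal length>:<data>,'."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes) -> tuple:
    """Decode one netstring from the front of buf; return (payload, remainder)."""
    head, sep, tail = buf.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("bad netstring length")
    length = int(head)
    if len(tail) < length + 1 or tail[length:length + 1] != b",":
        raise ValueError("netstring is truncated or lacks its trailing comma")
    return tail[:length], tail[length + 1:]


def decode_all(buf: bytes) -> list:
    """Decode a run of back-to-back netstrings."""
    items = []
    while buf:
        item, buf = decode_netstring(buf)
        items.append(item)
    return items


# Two arguments wrapped in one outer netstring, as in the example above.
nested = encode_netstring(encode_netstring(b"abcd") + encode_netstring(b"xyz"))
outer, leftover = decode_netstring(nested)
arguments = decode_all(outer)
```

No byte in either argument needs escaping, because the decoder trusts the counts rather than scanning for delimiters.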

That works "ok" if you will only ever have a single character that needs escaping, like IAC escaping in telnet ... it works much worse if you have to do more than one (CSV is the obvious example here). Then doing html/C/url numbered escaping works better (but again, if you can avoid it I'd do so).
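As an illustration of URL-style numbered escaping, the sketch below percent-encodes field names and values with Python's standard `urllib.parse` module, so "=", ",", CR and LF can all appear in the data. The helper names and the "key=value" layout are assumptions carried over from earlier in the thread.

```python
from urllib.parse import quote, unquote


def encode_pair(key: str, value: str) -> str:
    """Percent-encode both halves so no special character appears literally."""
    return f"{quote(key, safe='')}={quote(value, safe='')}"


def decode_pair(text: str) -> tuple:
    """Split on the one literal '=' and undo the percent-encoding."""
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError("missing '=' separator")
    return unquote(key), unquote(value)


encoded = encode_pair("note", "a=b,c\r\n")
decoded = decode_pair(encoded)
```

Every special character gets the same treatment, including the "%" escape character itself, which is why this scales better than doubling once there is more than one special character.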

Putting a byte count in as a binary encoding should always be a last resort. You save little, and you are then tied to that byte size[3].

[1] formatting link
[2] formatting link
[3] formatting link

--
James Antill -- james@and.org
http://www.and.org/vstr/httpd