Simple BUT reliable serial protocol

pozz

Mon, Jan 4, 2016 1:01 AM

I'm trying to implement a simple protocol for a point-to-point full-duplex serial link. It could be a reliable link, such as a connection between two nearby MCUs on the same PCB, or a noisy link, such as an RF link.

The application layer should send and receive generic messages: -> How are you?

Tom Gardner

Mon, Jan 4, 2016 1:12 AM

See

[link]

Les Cargill

Mon, Jan 4, 2016 2:25 AM

Does "as" mean "in the same manner that"?

Congratulations. You can use SLIP over a serial port and reuse TCP wholesale.

Then something like ":,;" can be your application transport protocol, for ready parsing. Use "\:", "\," and "\;" to escape those characters.
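A minimal sketch of that escaping in C, assuming NUL-terminated text fields. The function names `escape_field` and `unescape_field` are hypothetical, and escaping the backslash itself is an addition not in the post: without it, a payload that legitimately contains "\" would be ambiguous.

```c
#include <stddef.h>

/* Reserved characters: the three delimiters plus the escape character
 * itself (escaping '\' is an assumption added to keep parsing unambiguous). */
static int is_reserved(char c)
{
    return c == ':' || c == ',' || c == ';' || c == '\\';
}

/* Copy src into dst, prefixing every reserved character with '\'.
 * Returns the escaped length, or (size_t)-1 if dst is too small.
 * dst is NUL-terminated on success. */
size_t escape_field(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        size_t need = is_reserved(*src) ? 2 : 1;
        if (n + need + 1 > dst_size)   /* +1 for the terminating NUL */
            return (size_t)-1;
        if (need == 2)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}

/* Reverse of escape_field: drop each '\' and keep the character after it. */
size_t unescape_field(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        if (*src == '\\' && src[1] != '\0')
            ++src;                     /* skip the escape, keep next char */
        if (n + 1 >= dst_size)
            return (size_t)-1;
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

When splitting a received message on ':', ',' and ';', the parser must likewise skip any delimiter that immediately follows an escaping backslash.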

Or constrain messages to ASCII and use STX, ETX and other < 0x20 characters as delimiters. Or something else.
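A minimal receive-side sketch of STX/ETX framing in C, assuming the payload is restricted to printable ASCII (0x20 to 0x7E) so no escaping is needed. STX and ETX are the standard ASCII codes 0x02 and 0x03; `FRAME_MAX`, `stx_rx` and `stx_rx_byte` are hypothetical. No checksum is shown; a real link would add one before ETX.

```c
#include <stddef.h>
#include <stdint.h>

#define STX 0x02
#define ETX 0x03
#define FRAME_MAX 64   /* hypothetical maximum payload length */

typedef struct {
    char   buf[FRAME_MAX + 1];
    size_t len;
    int    in_frame;   /* nonzero between STX and ETX */
} stx_rx;

/* Feed one received byte. Returns 1 when a complete frame is in rx->buf
 * (NUL-terminated), 0 otherwise. Any other control character, or an
 * overlong frame, discards the partial frame. */
int stx_rx_byte(stx_rx *rx, uint8_t b)
{
    if (b == STX) {                 /* STX always (re)starts a frame */
        rx->len = 0;
        rx->in_frame = 1;
        return 0;
    }
    if (!rx->in_frame)
        return 0;                   /* noise between frames is ignored */
    if (b == ETX) {
        rx->buf[rx->len] = '\0';
        rx->in_frame = 0;
        return 1;
    }
    if (b < 0x20 || b > 0x7E || rx->len == FRAME_MAX) {
        rx->in_frame = 0;           /* illegal byte or overflow: drop frame */
        return 0;
    }
    rx->buf[rx->len++] = (char)b;
    return 0;
}
```

Because STX unconditionally restarts reception, the receiver resynchronises on the next frame after any corruption.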

Or you could do without TCP and use the same basic mechanism.

Don't forget NACK and WACK (wait ACK) sequences... When you send a WACK and the processing for it is done, send an ACK for that message. Or don't.

So now we're up to: :!,;

Or make it such that duplicate messages don't matter.

He can't. See also "The Two Generals Problem". He can, however, be assured that he'll receive an ACK (or NACK, or WACK) within seconds of transmission.

Have the receiver send NAK sequences when an error is detected. It's your call whether to try to use a sequence number in NACK sequences.

Indeed; or with processing if that's okay at the application level.

It's often useful to have a state of "oh, the line is dead" based on the timeout.

Other than repealing the second law of thermodynamics, there isn't one.

Correct.

[link: ...'_Problem]

--
Les Cargill

Robert Wessel

Mon, Jan 4, 2016 5:43 AM

And a partial solution is something like the two-phase commit protocol (also well described in the obviously named Wikipedia article). It's not 100% reliable (it may require administrator intervention to resolve certain failures), but that's how distributed databases maintain coherency over unreliable links.
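The coordinator's decision rule in two-phase commit can be sketched in a few lines of C (all names hypothetical). Phase 1 asks each participant to prepare and vote; phase 2 broadcasts the outcome, repeated until every participant acknowledges it. The weak spot is a participant that voted YES and then loses contact with the coordinator: it can neither commit nor abort on its own, which is where manual intervention comes in.

```c
#include <stddef.h>

typedef enum { VOTE_YES, VOTE_NO, VOTE_TIMEOUT } vote;
typedef enum { DECIDE_COMMIT, DECIDE_ABORT } decision;

/* Phase 1 result: commit only if every participant voted YES.
 * A NO, or no reply before the timeout, forces an abort. */
decision two_pc_decide(const vote *votes, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (votes[i] != VOTE_YES)
            return DECIDE_ABORT;
    return DECIDE_COMMIT;
}
```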

pozz

Mon, Jan 4, 2016 9:26 AM

Yes, sorry for my poor English.

I was thinking of using the full lwip TCP/IP stack for this simple problem: a reliable full-duplex point-to-point protocol. lwip comes with SLIP and TCP and is highly customizable. I gave it a try and, of course, it works well... but afterwards I think it is too complicated for my very simple task.

First of all, frames are at least 40-50 bytes long even without payload (TCP header + IP header + SLIP framing).

IP packets have 32-bit source and destination addresses, and TCP segments have 16-bit source and destination ports. For a simple point-to-point link this seems too complicated.

IP packets and TCP segments have 16-bit checksums.

lwip needs a dynamic memory allocator, even if it can be a very simple one.

Finally I think I need another solution, similar to TCP in many aspects but without many of its features (port multiplexing, IP layer, ...)

Indeed, that is the way I think I will go. It seems very strange to me that there isn't a ready-to-use or standard protocol to reuse.

Are you talking about application or transport protocol?

Here I'm interested in the *transport protocol* (which doesn't know anything about the application protocol). Transport SHOULD guarantee a reliable connection between two hosts. If a frame is received, the receiver MUST send an ACK to the sender (never a NACK). For me ACK means "OK, I received the frame" and not "I received the frame, but the message is syntactically wrong or the command received can't be processed". Those kinds of things belong at the application layer.

Still, a NACK can be used in the transport protocol when a frame is received with a non-sequential sequence number, perhaps because the previous message was lost. Here a NAK can speed up the retransmission of the lost frame. But the same result can be obtained without NACKs: the receiver simply drops all frames received with a non-sequential sequence number, and the sender retransmits the un-ACKed frames when a timeout expires.
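The drop-and-retransmit scheme is essentially classic go-back-N ARQ. A sketch in C, assuming 8-bit wrapping sequence numbers and cumulative ACKs; `WINDOW` and all names are hypothetical. One addition to the description above: a frame that was already delivered (its ACK was lost) is ACKed again rather than dropped silently, otherwise the sender would keep retransmitting it until it gave up.

```c
#include <stdint.h>

#define WINDOW 4u   /* hypothetical max un-ACKed frames; must stay well below 128 */

typedef enum { RX_DELIVER, RX_DUPLICATE, RX_DROP } rx_result;

typedef struct { uint8_t expected; } gbn_rx;   /* next in-order seq number */

/* Classify an incoming, CRC-checked frame. For RX_DELIVER and RX_DUPLICATE
 * the caller sends an ACK carrying *ack_out; for RX_DROP it sends nothing. */
rx_result gbn_rx_frame(gbn_rx *rx, uint8_t seq, uint8_t *ack_out)
{
    if (seq == rx->expected) {
        *ack_out = rx->expected++;                 /* uint8_t wraps 255 -> 0 */
        return RX_DELIVER;
    }
    if ((uint8_t)(rx->expected - 1u - seq) < WINDOW) {
        *ack_out = (uint8_t)(rx->expected - 1u);   /* old frame: re-ACK only */
        return RX_DUPLICATE;
    }
    return RX_DROP;                                /* out of sequence: drop */
}

typedef struct {
    uint8_t base;   /* oldest un-ACKed sequence number */
    uint8_t next;   /* sequence number for the next new frame */
} gbn_tx;

/* Cumulative ACK: everything up to and including ack is confirmed. */
void gbn_tx_ack(gbn_tx *tx, uint8_t ack)
{
    uint8_t in_flight = (uint8_t)(tx->next - tx->base);
    uint8_t acked = (uint8_t)(ack - tx->base + 1u);
    if (acked <= in_flight)                        /* ignore stale/bogus ACKs */
        tx->base = (uint8_t)(ack + 1u);
}

/* On timeout the caller retransmits frames base..next-1; this is their count. */
uint8_t gbn_tx_outstanding(const gbn_tx *tx)
{
    return (uint8_t)(tx->next - tx->base);
}
```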

The same goes for WACK. If it is used by the application ("hey, this command takes time, please wait"), it should be moved to another protocol layer. Still, it can be used in the transport protocol for flow control ("hey, I received the last frame, but you are too fast for me. Please stop transmitting now, I'll tell you when you can continue again").
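A sketch of that flow-control use in C, with high and low watermarks so the receiver doesn't flood the link with stop/go frames (similar in spirit to XON/XOFF). The thresholds and names are hypothetical; the gap between the high watermark and the real buffer size must cover whatever the sender transmits before the stop frame reaches it.

```c
#include <stddef.h>

#define HIGH_WATER 192u   /* hypothetical: ask the sender to pause here */
#define LOW_WATER   64u   /* hypothetical: let it resume here */

typedef enum { FC_NONE, FC_SEND_STOP, FC_SEND_GO } fc_action;

typedef struct {
    size_t used;      /* bytes waiting for the application */
    int    stopped;   /* nonzero after we told the peer to stop */
} fc_state;

/* Call whenever the receive buffer fill level changes; the return value
 * says which control frame, if any, to send to the peer. */
fc_action fc_update(fc_state *fc, size_t used)
{
    fc->used = used;
    if (!fc->stopped && used >= HIGH_WATER) {
        fc->stopped = 1;
        return FC_SEND_STOP;
    }
    if (fc->stopped && used <= LOW_WATER) {
        fc->stopped = 0;
        return FC_SEND_GO;
    }
    return FC_NONE;
}
```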

Yes.

Yes, it's possible to design an *application* protocol with messages that could be duplicated without any side effect (for example, all the

measure").

However, in this case you are designing a transport protocol with a particular application protocol in mind. Maybe tomorrow you will want to change the application protocol or use another one, and a conflict with the transport protocol could arise.

Now I understand why I didn't find a solution to my problem... it doesn't have a real solution.

Here the problem is a sender that doesn't receive the ACK from the receiver. It isn't an error detected by the receiver.

Of course, yes. However, what happened to my last message before the link died?

Ok, thank you for your observations and suggestions. Now I know a good solution doesn't exist.

In the end, I think I will try to implement a simple HDLC ABM protocol for my needs.

pozz

Mon, Jan 4, 2016 9:29 AM

On 04/01/2016 06:43, Robert Wessel wrote:

Of course, it is too complicated for simple serial links, such as one between a PC and an embedded board.

Fortunately I don't work on medical or mission-critical applications. Still, I'm curious how these kinds of problems are solved in applications where *no* error can be accepted.

Tom Gardner

Mon, Jan 4, 2016 12:00 PM

That's a good question and attitude :)

Not all problems can be solved. A classic example is synchroniser metastability. The best that can be achieved is to reduce the probability of failure to an acceptable level. That, of course, requires that all the implementation details are correct and that they match the design presumptions, which isn't easy.

Another example is two-phase transaction commit protocols, which also have a finite failure probability. That probability can be reduced by three-phase commit protocols, and I think you can guess the next step :)

In practice it would be wise to assume that failures will occur, and to have other mechanisms to detect and recover from them. Alternatively, architect the system so that failures are tolerated; such architectures often have other system-level advantages.

Always bear in mind the "eight fallacies of distributed computing",

[link]

and that "a distributed system is one where your application can be broken by a failure in a computer that you didn't know existed"

Tom Gardner

Mon, Jan 4, 2016 12:04 PM

Yes.

Especially if that results in the explicit design of an FSM, with a corresponding simple, easily readable and easily modifiable implementation.

The latter should strongly shape the implementation techniques, because it is all too common that "doing the simplest thing" at each modification leads to an unmaintainable ball of string.

Nobody

Mon, Jan 4, 2016 12:31 PM

By waiting for an ACK; forever if necessary.

It just keeps sending.

If an ACK arrives, the data was received. If no ACK arrives, you don't know whether it was received so you assume that it wasn't and resend eventually.

You cannot be certain that data *wasn't* received; the lack of an ACK can be caused either by the data getting lost or by the ACK getting lost. If the remote system suddenly goes silent and stays silent, it's impossible to distinguish between received-but-not-ACK'd and not-received data.

The simple solution is not to give up. Note that TCP will give up eventually; this is required e.g. for the situation where the remote computer gets unplugged, tossed in the dumpster, and never replaced. No protocol can fix that.
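The "keep sending, maybe give up" policy, including the "the line is dead" state mentioned earlier in the thread, fits a small event-driven state machine. A sketch in C with hypothetical names, assuming the caller owns the ACK timer and the actual transmission; `max_tries == 0` means retry forever, as in the post above.

```c
#include <stdbool.h>

typedef enum { LINK_WAITING_ACK, LINK_IDLE, LINK_DEAD } link_state;

typedef struct {
    link_state state;
    unsigned   tries;       /* transmissions of the current frame so far */
    unsigned   max_tries;   /* 0 = retry forever */
} retry_ctx;

/* Call right after the first transmission of a frame. */
void retry_start(retry_ctx *r)
{
    r->state = LINK_WAITING_ACK;
    r->tries = 1;
}

/* Call when the matching ACK arrives. */
void retry_on_ack(retry_ctx *r)
{
    r->state = LINK_IDLE;
}

/* Call when the ACK timer expires. Returns true if the caller should
 * retransmit. LINK_DEAD only means "no ACK": the peer may still have
 * received and processed the frame. */
bool retry_on_timeout(retry_ctx *r)
{
    if (r->state != LINK_WAITING_ACK)
        return false;
    if (r->max_tries != 0 && r->tries >= r->max_tries) {
        r->state = LINK_DEAD;
        return false;
    }
    r->tries++;
    return true;
}
```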

Jack

Mon, Jan 4, 2016 1:46 PM

If you can set one MCU as master and the other as slave, I find that MODBUS RTU is simple and robust enough for most of the communication that small MCUs need to do. There are of course limits: one MCU is the master and the other(s) are slaves, which means that one asks and the other(s) answer. No communication can be initiated by the slaves. This means that the communication is half-duplex (even if you use two lines, RX and TX).
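For reference, each Modbus RTU frame ends with a CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF, no final XOR, sent low byte first), which is part of what makes it usable on a noisy line. A bitwise sketch in C; the function name is hypothetical, and a table-driven version would be faster on a small MCU.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/MODBUS over n bytes. */
uint16_t modbus_crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFF;
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                             : (uint16_t)(crc >> 1);
    }
    return crc;
}
```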

Bye Jack

Tauno Voipio

Mon, Jan 4, 2016 2:48 PM

Modbus RTU is, from the data communication standpoint, one of the very worst protocols. For the receiving station, there is no way of knowing the packet boundaries without parsing the whole protocol with all its variants.

The timing limits set for packet transmission (obviously an attempt to patch the original bad decision) make it impossible to use an interface with a built-in FIFO, as the timing information is then lost.
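To put numbers on the timing objection: Modbus RTU delimits frames by at least 3.5 character times of line silence (t3.5). A sketch of the calculation in C, assuming 11 bits per character, as the Modbus serial-line specification uses for RTU, and its recommended fixed 1750 µs above 19200 baud; the function name is hypothetical.

```c
#include <stdint.h>

/* Inter-frame silence t3.5 in microseconds (baud must be nonzero). */
uint32_t modbus_t35_us(uint32_t baud)
{
    if (baud > 19200u)
        return 1750u;                 /* fixed value recommended at high rates */
    /* 3.5 chars * 11 bits * 1e6 us = 38,500,000; divide by baud, rounding up */
    return (uint32_t)((38500000ull + baud - 1u) / baud);
}
```

At 9600 baud this gives about 4 ms. A UART that buffers bytes in a FIFO and interrupts only at a fill threshold or on its own receive timeout can easily hide or fake a gap of that size.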

For simple async serial transmission, the HDLC-style encapsulation of PPP is one of the very best.
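A sketch of the transmit side of the HDLC-like framing PPP uses (RFC 1662), in C. Flag bytes (0x7E) delimit the frame; 0x7E and 0x7D inside it are sent as 0x7D followed by the byte XOR 0x20; a 16-bit FCS is appended low byte first. Escaping every byte below 0x20 corresponds to PPP's default async control character map; the function names are hypothetical. Because a flag can never appear inside a frame, a receiver resynchronises on any 0x7E regardless of timing or FIFOs.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG 0x7E
#define ESC  0x7D

/* FCS-16 per RFC 1662: reflected polynomial 0x8408, init 0xFFFF,
 * ones-complemented at the end. */
static uint16_t fcs16(const uint8_t *p, size_t n)
{
    uint16_t fcs = 0xFFFF;
    while (n--) {
        fcs ^= *p++;
        for (int bit = 0; bit < 8; ++bit)
            fcs = (fcs & 1u) ? (uint16_t)((fcs >> 1) ^ 0x8408u)
                             : (uint16_t)(fcs >> 1);
    }
    return (uint16_t)~fcs;
}

/* Append one byte, escaping flag, escape and control characters. */
static size_t put_stuffed(uint8_t *out, size_t n, uint8_t b)
{
    if (b == FLAG || b == ESC || b < 0x20) {
        out[n++] = ESC;
        out[n++] = (uint8_t)(b ^ 0x20);
    } else {
        out[n++] = b;
    }
    return n;
}

/* Build FLAG, stuffed payload, stuffed FCS (low byte first), FLAG.
 * out must hold at least 2 * len + 6 bytes. Returns the frame length. */
size_t hdlc_encode(const uint8_t *payload, size_t len, uint8_t *out)
{
    size_t n = 0;
    uint16_t fcs = fcs16(payload, len);
    out[n++] = FLAG;
    for (size_t i = 0; i < len; ++i)
        n = put_stuffed(out, n, payload[i]);
    n = put_stuffed(out, n, (uint8_t)(fcs & 0xFF));
    n = put_stuffed(out, n, (uint8_t)(fcs >> 8));
    out[n++] = FLAG;
    return n;
}
```

The receiver does the reverse: collect bytes between flags, remove the escapes, and run the FCS over payload plus FCS; RFC 1662 gives 0xf0b8 as the good final value of the un-complemented FCS register.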

--

-TV

Paul Rubin

Mon, Jan 4, 2016 6:21 PM

There's a standard dilemma in distributed systems, where you want to send messages with a guarantee that they're delivered exactly once, but you can't do that because of the two generals problem. You can choose between "at least once" (but it might be delivered more than once), and "at most once" (but it might not get there at all). So the basic idea is choose "at least once" and design things such that multiple deliveries don't cause problems.

upsidedown

Mon, Jan 4, 2016 9:13 PM

At least you need to define much more accurately what you really want to do.

Les Cargill

Tue, Jan 5, 2016 5:43 AM

+1.

It's absolutely vital.


Les Cargill

Tue, Jan 5, 2016 5:49 AM

Half-duplex RTU can therefore offer only limited utilization of those half-duplex circuits.

Sometimes that matters.


Ed Prochak

Sat, Jan 9, 2016 10:59 PM


You got many other good comments; I just wanted to add one more about something not really covered yet.

There is a reason the network layer model includes an application layer. That example of charging the account twice is clearly an application-layer issue.

Lower protocols can never solve that problem because it doesn't belong at the lower protocol levels in the stack.

HTH, ed

Vincent vB

Thu, Jan 14, 2016 8:01 AM

On 4-1-2016 at 2:01, pozz wrote:

It's quite simple to add a message counter. Each time a message is sent with a counter value, the remote side has to ACK it with the same counter value. If no ACK is received within a certain time, the message is sent again. The remote side keeps track of the last received counter value. If that value is received again, the message is not processed, but another ACK is sent.
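The receive side of this counter scheme can be sketched in C as follows (names hypothetical, 8-bit counter assumed). The sender increments the counter for each new message and reuses the same value for retransmissions; the receiver ACKs every copy but processes only the first. This is stop-and-wait: with more than one message outstanding, comparing against the last counter alone would not be enough.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t last_counter;   /* counter of the last processed message */
    bool    have_last;      /* false until the first message arrives */
} ctr_rx;

/* Returns true if the message is new and should be processed. Either way
 * the caller sends an ACK carrying the same counter value. */
bool ctr_rx_accept(ctr_rx *rx, uint8_t counter)
{
    if (rx->have_last && counter == rx->last_counter)
        return false;              /* retransmission: ACK again, don't process */
    rx->last_counter = counter;
    rx->have_last = true;
    return true;
}
```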

I have implemented this scheme over UDP, but with framing and a checksum/crc it could be applied to a serial communication stream as well.

Regards, Vincent

Robert Wessel

Thu, Jan 14, 2016 8:53 AM

Which doesn't solve the problem of the remote side receiving and processing the message, but going offline before the ACK can be sent back (perhaps the ACK was half-way out the network card at the instant the backhoe hit the cable). In that case the local node can retransmit all it wants, but it will presumably eventually give up and assume incorrectly that the remote side did *not* process the message.

Clifford Heath

Thu, Jan 14, 2016 9:06 AM

Or, like TCP, it can wait a short period to see if further messages are closely following, and just ack the highest sequence number that forms a continuous sequence. Fewer ACKs are needed that way.
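A sketch of delayed cumulative ACKs in C, assuming the receiver can buffer a few out-of-order frames in a small window (`RWIN` and all names are hypothetical). Frames are recorded as they arrive; when the delay timer fires, a single ACK covers the highest sequence number that forms a continuous run, and the caller delivers those frames in order.

```c
#include <stdbool.h>
#include <stdint.h>

#define RWIN 8u   /* hypothetical receive window, in frames */

typedef struct {
    uint8_t next;          /* lowest sequence number not yet received */
    bool    got[RWIN];     /* got[i]: frame next+i already buffered */
} cum_rx;

/* Record an arriving frame. Frames outside the window are ignored. */
void cum_rx_frame(cum_rx *rx, uint8_t seq)
{
    uint8_t off = (uint8_t)(seq - rx->next);
    if (off < RWIN)
        rx->got[off] = true;
}

/* Delayed-ACK timer: slide past every contiguous received frame and return
 * one cumulative ACK value, or false if nothing new can be ACKed. */
bool cum_rx_ack(cum_rx *rx, uint8_t *ack_out)
{
    unsigned k = 0;
    while (k < RWIN && rx->got[k])
        ++k;
    if (k == 0)
        return false;
    for (unsigned i = 0; i < RWIN; ++i)          /* shift window by k */
        rx->got[i] = (i + k < RWIN) ? rx->got[i + k] : false;
    rx->next = (uint8_t)(rx->next + k);
    *ack_out = (uint8_t)(rx->next - 1u);
    return true;
}
```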

TCP also piggy-backs bi-directional ACKs on outgoing data frames, which saves frames.

In order to determine the throughput of the slowest hop, it also does a "slow start", increasing in speed until packet loss is detected (TCP infers loss from timeouts or duplicate ACKs rather than from a NACK), then adaptively backing off.
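A much simplified sketch of that window adjustment in C. It counts frames rather than bytes, reacts to any loss only by halving (real TCP, per RFC 5681, also collapses to one segment after a retransmission timeout), and all names are hypothetical. On a single point-to-point serial link, where the "slowest hop" is the link itself, this is arguably unnecessary, but it shows the shape of the idea.

```c
#include <stdint.h>

typedef struct {
    uint32_t cwnd;       /* frames allowed in flight */
    uint32_t ssthresh;   /* switch from slow start to linear growth here */
    uint32_t acked;      /* ACKs counted toward the next linear increase */
} cc_state;

void cc_on_ack(cc_state *cc)
{
    if (cc->cwnd < cc->ssthresh) {
        cc->cwnd++;                    /* slow start: doubles per round trip */
    } else if (++cc->acked >= cc->cwnd) {
        cc->acked = 0;
        cc->cwnd++;                    /* about +1 frame per round trip */
    }
}

void cc_on_loss(cc_state *cc)
{
    uint32_t half = cc->cwnd / 2;
    cc->ssthresh = half > 2 ? half : 2;   /* halve, with a floor of 2 */
    cc->cwnd = cc->ssthresh;
    cc->acked = 0;
}
```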

None of these algorithms are difficult, but collectively, they "make the magic".

Why should it make that assumption? It might equally wrongly make the opposite assumption. Until a mutual closing handshake has been completed there is no certainty.

And even then, the change of state at one end might not have been persisted properly, and lost during a reboot.

It's important to design protocols based on "desired state" where possible, and not on "state change" requests.

Idempotence rules. So does idempotence, and idempotence.
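The difference between "desired state" and "state change" requests, in a few lines of C (device and command names are hypothetical): under at-least-once delivery a retransmitted "increase by 5" is applied twice, while a retransmitted "set to 15" is harmless.

```c
#include <stdint.h>

typedef struct { int32_t level; } device;

/* "State change" request: applying it twice changes the result. */
void cmd_increase_by(device *d, int32_t delta) { d->level += delta; }

/* "Desired state" request: applying it twice is harmless (idempotent). */
void cmd_set_level(device *d, int32_t level) { d->level = level; }
```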

Clifford Heath

Vincent vB

Thu, Jan 14, 2016 10:17 AM

Yes, of course that is possible. I have actually implemented something similar, but there the ACK piggy-backs on messages from the remote side.

There is no way to know. Message ordinals/ACKs and CRCs/checksums will make your connection quite reliable. The idea is to minimize the types and rate of errors that can occur, and to design the higher layer of the software so that it can deal with special circumstances.

That would do it. If messages are lost and the connection is regained, verify the state of the remote system and update it where needed. It's all a matter of careful thinking.

Regards, Vincent