MQTT QoS vs TCP

- pozz

Fri, May 19, 2023 9:39 AM

I know TCP is able to guarantee the delivery of messages from sender to receiver, without corruption (thanks to checksums), in order and without duplication (thanks to sequence numbers).

So I'm confused by what I read about MQTT QoS. For example, QoS 1 (at least once) uses acknowledgements (PUBACK) and retransmissions (with packet ids) to guarantee that a message is delivered at least once.

I'm wondering how this is possible over a TCP connection that uses the same technique of ack and sequence numbers.

From what I know about TCP, if some data isn't acknowledged by the receiver, the sender automatically resends the unacknowledged segments. This is done by the TCP/IP stack or the kernel, without the application knowing anything about it.

It seems to me that an MQTT client should never need to resend an MQTT message because the broker didn't receive it. If that happens, TCP should signal the error to the application, which should then close the connection and try to reopen it.
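To make the QoS 1 mechanism in the question concrete, here is a minimal Python sketch of the sender-side bookkeeping. The names (`Qos1Sender`, `on_puback`, `on_reconnect`) are invented for illustration and do not come from any MQTT library. It assumes MQTT 3.1.1 behaviour, where PUBLISH packets that were never acknowledged are resent with the same packet id and the DUP flag set when the client reconnects to an existing session. That is a case TCP's own retransmission cannot cover, because the old connection and its state are gone.

```python
from dataclasses import dataclass, field


@dataclass
class Qos1Sender:
    """Tracks QoS 1 PUBLISH packets until the matching PUBACK arrives."""

    next_id: int = 1
    in_flight: dict = field(default_factory=dict)  # packet id -> payload

    def publish(self, payload: bytes) -> tuple:
        # Packet ids are 16-bit and non-zero; skip ids that are still in flight.
        # Assumption: fewer than 65535 messages are in flight at once.
        while self.next_id in self.in_flight:
            self.next_id = self.next_id % 0xFFFF + 1
        packet_id = self.next_id
        self.next_id = self.next_id % 0xFFFF + 1
        self.in_flight[packet_id] = payload
        return (packet_id, payload, False)  # (packet id, payload, DUP flag)

    def on_puback(self, packet_id: int) -> None:
        # The receiver has accepted the message, so it can be forgotten.
        self.in_flight.pop(packet_id, None)

    def on_reconnect(self) -> list:
        # Resend everything still unacknowledged: same ids, DUP flag set.
        return [(pid, data, True) for pid, data in sorted(self.in_flight.items())]
```

If the connection drops after `publish()` but before `on_puback()`, the message is still in `in_flight`, so `on_reconnect()` returns it for redelivery on the new connection.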

- Ed Prochak

Fri, May 19, 2023 3:14 PM

TCP is only one of the lower levels of the protocol stack. Data can sometimes be lost in the higher levels.

Secondly, there is the issue of resend timeouts. If TCP has not delivered the message by the time the MQTT retry limit expires, then MQTT will resend the message.

HTH, Ed

- Ivan Shmakov

Fri, May 19, 2023 4:51 PM

After which an MQTT client will need to retransmit its message, no? The difference between QoS 0 and QoS 1 boils down to whether the sender of the message actually bothers to do that.

(I /think/ QoS 1 also allows for reliable delivery along the entire client-to-server-to-another-client path, but I'm not sure about that.)

TCP will also signal an error when the message has successfully reached its destination, but the respective acknowledgement has not; as such, an application level protocol running over TCP generally needs the means to weed out the duplicates that are bound to happen in this case. Which is what MQTT QoS 2 does.
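The duplicate filtering mentioned here can be sketched as follows. This is a simplified Python model of the receiving side of the QoS 2 handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP), not code from any real broker. The class name and the choice to hand the message to the application on the first PUBLISH (rather than on PUBREL) are assumptions made for the sketch.

```python
class Qos2Receiver:
    """Delivers each QoS 2 message to the application exactly once."""

    def __init__(self):
        self.pending_ids = set()  # ids seen but not yet released by PUBREL
        self.delivered = []       # payloads handed to the application

    def on_publish(self, packet_id, payload):
        # A repeated PUBLISH with a pending id is a duplicate: acknowledge it
        # again, but do not deliver it a second time.
        if packet_id not in self.pending_ids:
            self.pending_ids.add(packet_id)
            self.delivered.append(payload)
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id):
        # The sender will not resend this PUBLISH any more, so the id can be
        # forgotten and later reused for a new message.
        self.pending_ids.discard(packet_id)
        return ("PUBCOMP", packet_id)
```

If the PUBREC is lost and the sender repeats the PUBLISH with the same packet id, `on_publish()` answers with PUBREC again and `delivered` still contains the payload only once.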

- pozz

Mon, May 22, 2023 7:11 AM

In this case, there's only one higher level, namely the MQTT application. How could an application running on a machine lose something? Network links aren't reliable, but applications running on a processor are. Are you thinking of an application crash, or a crash of the entire machine that needs a reboot? In that case, after the reboot, the MQTT application usually doesn't know anything about the previous connection, timeouts and lost messages... unless it saved something in non-volatile memory.

What happens in this case? Suppose one TCP segment carrying a single MQTT message (just for simplicity), sent by a client to the server (the broker), was lost. After a TCP timeout, the network stack autonomously resends the segment until an ACK is received. Even if the MQTT application resends the MQTT message *before* the TCP timeout, the TCP layer will not send it until the previous segment is acked. Or, more exactly, on the receiving machine the TCP layer will not pass the resent message to the application (the MQTT broker) before the lost TCP segment has been received as well. When the lost TCP segment finally arrives, the broker will receive two MQTT messages: the "original" and the resent one. I think it's impossible for the broker to receive the second transmission without receiving the first.
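The ordering argument above can be illustrated with a toy Python model of TCP's receive side. It is a deliberate simplification: `InOrderReceiver` is a made-up name, sequence numbers start at 0, and segments are assumed never to overlap partially. It only shows that bytes reach the application strictly in sequence order.

```python
class InOrderReceiver:
    """Toy model: the application sees bytes only in sequence order."""

    def __init__(self):
        self.next_seq = 0        # next byte offset the application expects
        self.out_of_order = {}   # seq -> data held back until the gap is filled
        self.app_stream = b""    # what the application (the broker) has read

    def on_segment(self, seq, data):
        # Ignore segments that were already delivered or are already buffered.
        if seq < self.next_seq or seq in self.out_of_order:
            return
        self.out_of_order[seq] = data
        # Release every buffered segment that is now contiguous.
        while self.next_seq in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_seq)
            self.app_stream += chunk
            self.next_seq += len(chunk)


rx = InOrderReceiver()
rx.on_segment(5, b"PUB#1")   # MQTT-level resend arrives first and is held back
rx.on_segment(0, b"PUB#1")   # TCP retransmission of the lost original arrives
```

After the first call `rx.app_stream` is still empty; after the second it holds both copies, original first. On a single connection the resent message can therefore only ever arrive as a duplicate, never as a replacement for the lost one.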

So it seems to me that retransmission at the MQTT level is completely useless... but I suspect I'm missing the real point here.

- David Brown

Mon, May 22, 2023 8:09 AM

I haven't used MQTT much, but generally if an application gets a timeout and wants to retry, it will close the TCP/IP connection and open a new one. (Or rather, open a new one while the old one is closing - closing a failing TCP/IP connection can be slow.)

I would actually have thought that UDP was a more natural choice for MQTT, rather than TCP - although older versions of MQTT did not have QoS and were therefore reliant on TCP's acknowledgements and retries.

(I always think it's a shame that SCTP never caught on - among its many benefits, you don't have this head-of-line blocking issue.)

- pozz

Mon, May 22, 2023 9:08 AM

I'm quite sure that the MQTT retransmission mechanism is *not* based on a new TCP connection. In MQTT, the TCP connection is persistent. It can stay open for days without exchanging any real data. In that case, the keepalive facility is there to detect a broken link.
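A rough Python sketch of the client side of such a keepalive follows. The class name, the `ping_timeout_s` parameter and the returned strings are assumptions for illustration. The MQTT specification defines the keep-alive interval and the PINGREQ/PINGRESP packets, but how long a client waits for PINGRESP is left to the implementation.

```python
class KeepAlive:
    """Decides when to send PINGREQ and when to declare the link dead."""

    def __init__(self, keep_alive_s: float, ping_timeout_s: float):
        self.keep_alive_s = keep_alive_s      # interval announced in CONNECT
        self.ping_timeout_s = ping_timeout_s  # assumed wait for PINGRESP
        self.last_sent = 0.0                  # time the last packet was sent
        self.ping_sent_at = None              # set while a PINGREQ is pending

    def on_packet_sent(self, now: float) -> None:
        # Any packet sent to the broker resets the keep-alive timer.
        self.last_sent = now

    def on_pingresp(self) -> None:
        self.ping_sent_at = None

    def poll(self, now: float) -> str:
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at >= self.ping_timeout_s:
                return "link_dead"   # caller should drop the TCP connection
            return "waiting"
        if now - self.last_sent >= self.keep_alive_s:
            self.ping_sent_at = now
            self.last_sent = now
            return "send_pingreq"
        return "idle"
```

With `KeepAlive(60, 10)` and no other traffic, `poll(60)` asks for a PINGREQ, and `poll(70)` reports a dead link if no PINGRESP has arrived in between.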

- David Brown

Mon, May 22, 2023 10:54 AM

If the TCP/IP connection is working correctly, messages will be transmitted correctly to the broker. If a QoS message fails to be transmitted - the MQTT client or server does not receive an acknowledgement in time - then there are two possible issues. One is that the server/broker application is in trouble. The other is that there is an issue with the network. In most cases, I would suspect the network first. TCP/IP already has acknowledgements and timeouts, so if it is a temporary problem then it is likely to be handled there. By the time it reaches the attention of the application protocol's QoS handling, you are definitely at the point where a new TCP/IP connection is the right way to go - perhaps targeting a different IP address or via a different route.

The MQTT application already has to handle dropping and making new TCP/IP connections - even if the norm is for the connection to last for weeks at a time or more. So creating a new TCP/IP link has a lot to gain, and very little to lose, and it is the standard way to handle such issues.

- pozz

Tue, May 23, 2023 6:53 AM

Yes, this is the only solution for me too. Anyway, I don't know if this behaviour (closing and reopening TCP connection) is described in the MQTT specifications.

Here [1] is the MQTT client implementation of lwIP, a popular TCP/IP stack for embedded systems. When the timeout for the ACK expires, this client only calls an application callback with ERR_TIMEOUT. Maybe the decision to close the TCP connection and open a new one is left to the application. I don't know whether other MQTT clients have a built-in mechanism that automatically tries to recover from lost ACKs by reopening the TCP connection.

[1] (link not preserved)
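The pattern described in this post (the client library only reports the timeout, and the application decides what to do) might look like the following Python sketch. It is not the lwIP API: `PendingRequests`, `Status` and the callback signature are hypothetical names used only to show the division of responsibility.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    TIMEOUT = "timeout"


class PendingRequests:
    """Reports each request's outcome to a callback; takes no recovery action."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.pending = {}  # packet id -> (deadline, callback)

    def add(self, packet_id, now, callback):
        self.pending[packet_id] = (now + self.timeout_s, callback)

    def on_ack(self, packet_id):
        entry = self.pending.pop(packet_id, None)
        if entry is not None:
            entry[1](Status.OK)

    def tick(self, now):
        # Collect expired ids first so the dict is not modified while iterating.
        expired = [pid for pid, (deadline, _) in self.pending.items()
                   if now >= deadline]
        for pid in expired:
            _, callback = self.pending.pop(pid)
            callback(Status.TIMEOUT)  # the application decides what happens next
```

An application callback that receives `Status.TIMEOUT` could then close the connection and open a new one, which is the recovery discussed in this thread.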

- David Brown

Tue, May 23, 2023 7:55 AM

I haven't read the MQTT specifications - I don't even know what documentation exists for the protocol. But implementation details like this are not always covered in such documents, as it is really at a level below the protocol itself. (The specifications for HTTP, for example, don't say how many simultaneous connections a browser should have to a web server, or when it should give up and retry.) So don't be surprised if this is /not/ in the specs - that does not mean a client cannot or should not make new TCP/IP connections.

This is a bit muddled. I am familiar with LWIP, but I don't know whether you are talking about an MQTT client that you wrote yourself, or which comes as part of newer LWIP, or which someone else contributed as a sample.

Yes, that would be the normal behaviour.

I don't know either. I can only tell you that if you are failing to communicate on a TCP/IP connection, then making a new one (possibly after a delay) is the normal way to handle things if you want automatic recovery.
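Reconnecting "possibly after a delay" is often done with an increasing delay between attempts. Here is a small Python sketch under that assumption. `reconnect_with_backoff` and its parameters are illustrative names, and `connect` stands for whatever function opens the TCP connection and the MQTT session (for example a wrapper around `socket.create_connection`).

```python
import time


def reconnect_with_backoff(connect, max_attempts=5, base_delay_s=1.0,
                           max_delay_s=60.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay after each failure."""
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            # Socket-level failures (refused, unreachable, timed out) are
            # OSError subclasses; give up after the last attempt.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

The `sleep` parameter exists only so that the delay can be replaced in tests. A real client would probably also add some random jitter so that many devices do not all reconnect at the same moment.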

- pozz

Tue, May 23, 2023 9:00 PM

The link points to the official MQTT client implementation of the lwIP project.