High packet rate Ethernet with LwIP

To transport several high-speed serial ports, we want to use a serial-over-Ethernet approach. At a minimum I need to carry four ports, each delivering a 512 byte packet every 40 ms at 2 Mbps.

A search for a ready-to-use device (at least for testing) has turned up nothing. Standard devices usually go up to 230 kbps, some to 1 Mbps, and most of the faster ones are single port only. It should also be a small, embeddable device; logic-level signals are all that is needed. If anyone knows of a usable device...

Since the board that this is on incorporates an Ethernet-capable controller (LPC4088) that has not that much work to do right now, I thought it should be possible to add Ethernet to it.

For testing I got an LPC4088 demo board with an Ethernet interface and got LwIP running on it (standalone, no OS).

I got it working well enough to simulate four ports sending slow data to four TCP ports that can be opened with a PC terminal program.

But when I turn up the data volume and packet rate, it stops working. In some cases it ends up stuck in the hard fault handler, in other cases in a timed waiting loop in the LwIP stack.

At this moment it is running fine with 2 ports open and a 512 byte packet every 100 ms on each. When I increase the rate to a packet every 50 ms, it stops after a while.

I have no previous experience with LwIP so where to start looking?

- First of all, does it sound feasible over a 100 Mbit Ethernet connection using TCP?

- What can be tuned in LwIP to improve this? I have already increased PBUF_POOL_SIZE from the default 7 to 64, and MEM_SIZE from 12k to 32k (sketched below).
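
For reference, the options I have touched so far, plus others that look relevant to me. The option names are standard lwIP; the values beyond the two I mentioned are just guesses on my part, not a verified configuration:

/* lwipopts.h -- illustrative values, not a verified configuration */
#define MEM_SIZE         (32 * 1024)    /* heap, used e.g. for tcp_write() copies */
#define PBUF_POOL_SIZE   64             /* RX pbufs (raised from the default 7)   */
#define TCP_MSS          1460           /* full-size segments on Ethernet         */
#define TCP_SND_BUF      (8 * TCP_MSS)  /* per-connection send buffer             */
#define TCP_WND          (4 * TCP_MSS)  /* per-connection receive window          */
#define MEMP_NUM_TCP_SEG 32             /* queued TCP segments across all ports   */
#define MEMP_NUM_PBUF    32             /* PBUF_ROM/PBUF_REF descriptors          */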

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

A man who keeps stealing mopeds is an obvious cycle-path.
Reply to
Stef

Have you looked at the Ethernet traffic with e.g. Wireshark? Look for TCP retransmissions (the display filter "tcp.analysis.retransmission" finds them). If each serial line pumps data at 2 Mbit/s without any flow control, a TCP retransmission will upset the Ethernet traffic for a while, possibly overflowing buffers. In any case, if retries are expected, the gross (Ethernet) speed must be several times larger than the net data rate.

Is it possible to use UDP and accept that some packets may be lost? Include a sequence number in each frame and the receiver can detect missing packets (including those discarded after a failed checksum) and mark the missing data accordingly. This way the net speed can be much closer to the raw data rate.
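
For instance, a hypothetical frame layout (the field names and sizes are invented for illustration, not taken from any standard):

#include <stdint.h>

typedef struct {
    uint8_t  port;      /* which of the four serial ports (0..3) */
    uint8_t  flags;     /* reserved */
    uint16_t seq;       /* per-port sequence number, incremented per frame */
    uint8_t  data[512]; /* raw serial payload */
} serial_frame_t;

/* In real code, serialize the header explicitly (e.g. with htons()) rather
   than sending the struct raw, to avoid padding and endianness surprises. */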

Reply to
upsidedown

Post to the lwip-users list - they may be able to help. It's very active:

formatting link

--
Chisolm 
Texas-American
Reply to
Joe Chisolm

If you can consider USB instead of Ethernet, an FTDI FT4232H chip will handle 4 ports at 3 Mbps without trouble.

But LWIP's speed is no limit for what you need. I have run a board with a Modbus TCP/IP server handling 6 simultaneous connections, each with a transaction (request and reply) every millisecond - far more packets than you need here.

Your problem might be limiting parameters in the LWIP configuration, or it might be in the basic structure of the code. You can try increasing configuration parameters to see if that helps. You can also enable statistics for LWIP, which can give you a clue - if the failure counters are incrementing, they tell you what you need to increase.
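
Turning the statistics on is just a few lwipopts.h settings (these are standard lwIP options; the output goes through LWIP_PLATFORM_DIAG, normally printf):

/* lwipopts.h */
#define LWIP_STATS         1
#define LWIP_STATS_DISPLAY 1   /* makes stats_display() available */
#define MEM_STATS          1   /* heap: watch the "err" and "max" counters  */
#define MEMP_STATS         1   /* pools: PBUF_POOL exhaustion shows up here */
#define TCP_STATS          1

Then call stats_display() (declared in lwip/stats.h) from your main loop now and then, and look for non-zero error counters.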

Another poster suggested using Wireshark to see the traffic - that is definitely a useful tool.

Reply to
David Brown

That is essentially a half-duplex environment with inherent flow control. So if there are congestion issues, they will clear sooner or later.

The OP did not specify whether those 2 Mbit/s streams have some kind of flow control. If not, a congestion problem could cause an overflow and possibly a lockup.

Anyway, I had not noticed that the packets come at such a low duty cycle, so even a 10BASE-T connection should be able to handle four streams, even with TCP/IP and some retransmissions, provided that a missed frame is reacted to in less than 40 ms.

Reply to
upsidedown

It is half duplex on each connection, but full duplex overall (you can receive a request on one connection while sending out a reply on another, and there is always other traffic such as ARP or broadcasts flying around). That is always the case with TCP/IP and Ethernet.

Of course he needs to support the UART side of this with the four channels coming in - I assumed that much was obvious, and that he could handle that side of the problem easily enough. And I also assumed that there is no problem with a bit of variable latency on the transmission of packets through the system and over the Ethernet, otherwise the whole idea is infeasible.

Yes. It should be entirely possible, without pushing anything to its limits.

Reply to
David Brown

While it is legal for a Modbus/TCP client to send out multiple requests to a single server without first waiting for responses to previous requests, there is a practical maximum number of outstanding requests - if nothing else, then at least when the client send window is full (requests not yet TCP-acknowledged by the server's TCP/IP stack).

On the server side, if there are performance issues, it should artificially delay the responses, thus ultimately forcing the client to suspend the next request. When receiving requests from multiple clients, the TCP/IP stack should not acknowledge a request until there are Rx buffers available, or until the Modbus/TCP server is ready to handle the next Modbus request/reply sequence.

All this reduces throughput during congestion, but at least maintains link sanity.

If there is no flow control on the serial side, then this is a _hard_ real-time case: the system *must* handle each packet, no matter what, before the next one arrives - in this case within 40 ms. In some cases it might be possible to do some buffering on the serial side, but the buffer needs to be large to survive longer hiccups. Either way, the high-level system design must know how to handle overload (alarming, freezing, interpolation etc.) in case of failure.

Reply to
upsidedown

The clients here were multiple independent processes on the same PC. This is perfectly reasonable for any type of TCP/IP service. On the server side, being a single-core processor, it is entirely possible to have multiple requests waiting, though only one request is processed at a time. And of course only one request packet can be coming in on the Ethernet MAC at a time (though multiple packets can be in flight, buffered at switches along the way from client to server).

I was not pushing requests as fast as I could - they were being sent at approximately 1 ms intervals from each client.

Of course incoming packets are delayed if there are no Rx buffers available. But 6 buffers are all that are needed for 6 simultaneous connections (plus a buffer or two for "noise" Ethernet packets).

The Modbus server code sees there is a request from a client, processes it, generates a reply, and sends that out to the LWIP stack. The replies are buffered and passed out through the MAC when it is free.
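
In outline it is the usual raw-API receive callback pattern - roughly like this sketch, where modbus_process() is a stand-in for the application logic and pbuf chaining is ignored for brevity:

#include "lwip/tcp.h"
#include "lwip/pbuf.h"

/* hypothetical application function: parse the request, build the reply */
extern const void *modbus_process(const void *req, u16_t len, u16_t *reply_len);

static err_t server_recv(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err)
{
    if (p == NULL) {              /* remote end closed the connection */
        tcp_close(pcb);
        return ERR_OK;
    }
    u16_t reply_len;
    const void *reply = modbus_process(p->payload, p->len, &reply_len);
    tcp_recved(pcb, p->tot_len);  /* re-open the receive window */
    pbuf_free(p);
    /* COPY flag: lwIP copies the reply, so the buffer may be reused at once */
    tcp_write(pcb, reply, reply_len, TCP_WRITE_FLAG_COPY);
    tcp_output(pcb);              /* push it out instead of waiting for timers */
    return ERR_OK;
}
/* registered with tcp_recv(pcb, server_recv) in the accept callback */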

The link has a bandwidth ceiling - 100 Mbps total - the processor has speed limits for handling the packets, and there is a limit to the LWIP buffers. Unlimited links are hard to find.

I am afraid I don't see where you are going with this. My point was merely to show that an LWIP-based microcontroller card can handle far more packet throughput than the OP needs for his use - LWIP itself is not the limitation he feared.

We have no information about what flow control there might be, or what real-time and latency requirements there might be (and hard time limits are independent of flow control - no flow control does not imply hard timing requirements).

Obviously.

Also obvious.

I think it is fair to assume that the OP is aware of such things until he indicates otherwise. He merely wants to know if his throughput problems are due to a limitation of LWIP, or if LWIP can be configured to improve the throughput here.

Reply to
David Brown

On 2019-09-20 snipped-for-privacy@downunder.com wrote in comp.arch.embedded:

No, not done that yet, but indeed something I need to do.

That is something that came to my mind just after posting the question. Would UDP be a better match for this application? The whole TCP retransmission and timer machinery may be what is causing the problem. I think the standard TCP timeouts are far too slow for this kind of thing? I think I saw a 250 ms timer tick in the LwIP docs? (That matches TCP_TMR_INTERVAL, lwIP's TCP timer interval.)

Missing packets may not be a problem. It is measurement data, and if any retransmit takes over 40 ms there is no point anyway - there is already new data. At a much slower rate, a control connection does have to be reliable. That could use TCP (if it does not disturb the parallel UDP flow in LwIP), or use UDP as well and implement the retries in the application.
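
For the measurement data, something like this is what I have in mind for the send side (lwIP raw UDP API; the destination address and port are placeholders, and the error handling is only sketched):

#include "lwip/udp.h"
#include "lwip/ip_addr.h"

static struct udp_pcb *pcb;    /* created once at init with udp_new() */

static void send_frame(const void *frame, u16_t len,
                       const ip_addr_t *dst, u16_t port)
{
    struct pbuf *p = pbuf_alloc(PBUF_TRANSPORT, len, PBUF_RAM);
    if (p == NULL)
        return;                /* out of pbufs: count it and drop the frame */
    pbuf_take(p, frame, len);  /* copy the payload into the pbuf */
    udp_sendto(pcb, p, dst, port);
    pbuf_free(p);              /* udp_sendto() does not free the pbuf */
}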

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Your temporary financial embarrassment will be relieved in a surprising manner.
Reply to
Stef

Indeed looks very active, thanks for the link. Never used a mailing list before :$.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
Reply to
Stef

Funny you should mention this - this exact combination is what we want to get rid of.

The FTDI chip is much more sensitive to electromagnetic disturbances than anything else we have ever used. :-( EFT in particular is a problem. Testing with an FTDI eval kit even resulted in failures when the kit was merely in the proximity of the EFT tester, not even connected - even though the EFT test is a conducted immunity test.

To add to the trouble, the FTDI drivers throw blue screens when the chip hangs during the test.

Good to know.

Any idea which parameters to look at? I don't see too many in lwipopts.h; I have already increased PBUF_POOL_SIZE and MEM_SIZE with no effect.

Or parameters like LPC_NUM_BUFF_TXDESCS in lpc_17xx40xx_emac_config.h?
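
I.e. something like this - LPC_NUM_BUFF_TXDESCS is what is actually in the file; I am assuming there is a matching RX-side macro, and the values are just guesses:

/* lpc_17xx40xx_emac_config.h -- values are guesses, not verified */
#define LPC_NUM_BUFF_TXDESCS 16  /* more TX descriptors let bursts queue up */
#define LPC_NUM_BUFF_RXDESCS 16  /* each RX descriptor pins a pbuf from
                                    PBUF_POOL, so PBUF_POOL_SIZE must stay
                                    well above this */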

Agreed.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

"It was a virgin forest, a place where the Hand of Man had never set foot."
Reply to
Stef

Yes, four ports with data coming in without flow control. It is measurement data that is produced at this rate.

A bit of latency is allowed, but not too much. That is partly why the serial bit rate is that high: at this rate a 512 byte packet is transferred in under 3 ms (512 bytes x 10 bits / 2 Mbps = 2.56 ms, assuming one start and one stop bit) instead of using the 'whole' 40 ms.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Linux: Where do you want to GO... Oh, I'm already there!  
Reply to
Stef

UDP may be the better option here. TCP does not guarantee to preserve packet boundaries, whereas UDP is just a lightweight wrapper around raw IP packets.

If your raw data comes in over asynchronous serial lines, the unreliability of datagram traffic is already there, and what UDP adds does not matter much.

Your processor may be overwhelmed by the serial-line character interrupt load.

--

-TV
Reply to
Tauno Voipio

If occasional missing packets aren't an issue but timely delivery is, then TCP isn't really the right choice. TCP is designed to get packets there eventually (if possible), at the expense of possible delays. It is also generally tuned for long paths (and thus long delays before retransmission).

Reply to
Richard Damon

TCP/IP timeouts are slow, yes - you need to consider whether that will be an issue. But TCP/IP gives you better guarantees about delivery: either the packets will all arrive, with correct contents, sorted in the right order, or you will get a clear failure indication for the whole connection. With UDP, you need to handle this manually - you might need more checksumming, re-ordering, retries, etc., depending on the application. But UDP has less overhead, and with manual coding comes greater control and flexibility.

TCP/IP is also /much/ easier to handle over larger networks - through NAT routers, VPNs, ssh tunnels, and the like.

Note that if you are getting retries or re-ordering of TCP/IP packets on a local network, you have something badly wrong with the software or the network - retries should be very rare.

Reply to
David Brown

Generally true, but you can easily be surprised - as I was a while ago. A new Windows 10 based laptop, a roughly top-of-the-line Acer I had bought less than a year earlier, was communicating with my DPS system a lot slower than my older Windows XP based laptop (also an Acer, about 10 years old). This was via FTP at 100 Mbps, all three devices connected to the same switch (I let some huge file, 1 GB or more, transfer and watched how it went).

I started to investigate, and it turned out the Windows 10 machine was simply losing too many packets, and my retries (well, those of the DPS machine) were not aggressive enough. Making them more aggressive took care of that, but the surprise was there all right. I did not seriously investigate the Windows side, as it was easier for me to fix it on the DPS side.

Dimiter

======================================================
Dimiter Popoff, TGI
formatting link
======================================================
formatting link

Reply to
Dimiter_Popoff

TCP will get the octets (bytes) there in the correct order, but it is explicitly specified that it does not preserve packet boundaries. Many TCP stacks transfer the data in the same size chunks the user gives them for transport, but the octet stream may be re-packaged in any way the nodes along the way see fit.

UDP does not need any more checksumming: the Ethernet, IP and UDP layers have already taken care of that, discarding bad datagrams. There are good reasons, though, to add sequence numbering to the datagrams sent, to track possibly discarded datagrams.
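
A receiver-side sketch of such tracking, assuming a 16-bit per-port sequence number like the hypothetical frame layout sketched earlier in the thread:

#include <stdint.h>

static uint16_t next_seq[4];   /* expected sequence number per serial port */

/* returns how many frames were lost before this one (0 = none) */
static uint16_t count_lost(uint8_t port, uint16_t seq)
{
    uint16_t lost = (uint16_t)(seq - next_seq[port]);  /* wrap-safe */
    next_seq[port] = (uint16_t)(seq + 1);
    return lost;
}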

--

-TV
Reply to
Tauno Voipio

There you go - something was badly wrong with the network. It had a Windows 10 machine on it :-)

Reply to
David Brown

Looking more closely at your original question, the actual duty cycle is less than 10 %, so you could do the same thing with four 230 kbit/s streams; the speed requirement is no big deal. There are lots of eth/serial converters supporting 4-8 serial lines at 230 kbit/s.

Using MAC or UDP framing is just using Ethernet as a set of very high speed serial connections. The advantage of UDP over raw MAC addressing is that ARP works and the more user-friendly IP addresses can be used instead of Ethernet MAC addresses. In addition, UDP support in Windows is better than support for raw Ethernet MAC framing.

You really do have to complete the transmission and any retransmission within your 40 ms time window.

Just make sure you know when frames are missing (e.g. by sequence numbering).

Why not - you can use both TCP and UDP as two separate links, even with the same IP address and port number.

Reply to
upsidedown

On 2019-09-23 snipped-for-privacy@downunder.com wrote in comp.arch.embedded:

Yes, I found those. Unfortunately most are quite bulky and include level conversion. And at 230 kbit/s, a packet would need almost 25 ms to transfer. Although this would fit in the 40 ms interval, the latency is higher than desired.

Yes, and I think (but have not yet been able to verify) this is the problem with the (setup of the) LwIP stack right now.

Yes, we need something to detect missing packets.

Yes, that is true, but does LwIP support this? I don't know how the final output to the MAC is handled, so in a (dumb) implementation a TCP retransmission could hold up everything. Not that I expect this from LwIP (as it is widely used), but it is something to check.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Your own mileage may vary.
Reply to
Stef
