Serial Bus Speed on PCs

Rick C 3 years ago

I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps. It's the interface to the serial port.

The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message-level handshake creates a bottleneck in every interface I've looked at.

FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if there were no other delays.

While investigating other units, I found some Ethernet to serial devices and found some claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or maybe 2 pairs per second per channel. That's slow!

They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

I've also found another vendor with a similar product, and I've asked about that too.

I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

I could add a module to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive the 16 responses on separate lines and stream them out to the PC port as one continuous message. This is a bit messier, since the 16 lines from this new module would need to be marked: they have to plug into the right test fixture each day.

Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC. The application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single, broadcast command since the same tests are done on all targets in parallel. So using an RJ-45 connector, there would be the two pairs for the serial port, and two pairs for the priority daisy-chain.

I guess I'm thinking out loud here.

LOL, so now I'm leaning back toward the USB based FTDI RS-422 cable and a priority scheme so every target gets many more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times per second and a 100 Mbps Ethernet port won't keep up.

I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.

Richard Damon 3 years ago

You can get much more than 8000 cps with an FTDI interface. This is because you can send/receive more than one character per "poll".

My first thought is why are you trying to combine everything into one USB serial port. Why not give each test fixture its own serial port (or lump just a few onto a given port) and let the USB bus do the bulk of the multi-drop.

The Ethernet unit might be just a 10 Mbit device, or maybe a 100 Mbit one, and you need to send a whole message block, process it, then send the data in it, and then it can send back the answer when it figures the full answer has come back. It likely doesn't even TRY to transmit on a character basis, but because of the much larger overhead of an Ethernet packet, presumes network bandwidth is more important than delay.

Also, they may be quoting figures with typical routing delays assuming a multi-hop route from computer to destination, which adds to the delay, since that is the sort of application you use those for. Ethernet is a "long haul" medium, not normally thought of as short haul, particularly when talking about lower bandwidth applications.

Rick C 3 years ago

Yes, I'm aware of that. I suppose I didn't spell out everything in my post, but the 8,000 per second polling rate translates into 4,000 message pairs, one Tx, one Rx. With 256 end points to be controlled, this is just 16 message pairs per second per end point. The length of the messages is around 15 char, so this gives a bit over 1 Mbps. The RS-422 FTDI adapter can manage 3 Mbps, or the TTL, hi-speed adapter can be set for up to 12 Mbps, but I'm still waiting to hear from them about any internal or software overhead that would slow the message rate.
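
The figures in this reply can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: one USB poll is consumed per message, and each character costs 10 bits on the wire (8N1 framing, which the thread does not state explicitly).

```python
# Assumptions (not from measurements): one poll per message, 8N1 framing.
POLLS_PER_SECOND = 8_000     # hi-speed USB polling rate quoted in the thread
ENDPOINTS = 256
CHARS_PER_MESSAGE = 15
BITS_PER_CHAR = 10           # start bit + 8 data bits + stop bit (assumed)

# One Tx message and one Rx message make a pair, so two polls per pair.
pairs_per_second = POLLS_PER_SECOND // 2
pairs_per_endpoint = pairs_per_second / ENDPOINTS

# Both directions added together, as a rough serial bit rate.
bits_per_second = pairs_per_second * 2 * CHARS_PER_MESSAGE * BITS_PER_CHAR

print(pairs_per_second, pairs_per_endpoint, bits_per_second)
```

Under those assumptions the combined bit rate comes out a little over 1 Mbps, which matches the estimate above.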

I don't know if that will work any better. I have questions in to the various vendors.

10, 100 Mbps and 1 Gbps.

"It"??? What is "it" exactly? The message blocks are 15 characters. The bus runs with a single command from the master resulting in a single response from the slave, lather, rinse, repeat. The short message size results in a low bit rate, or, really, the message rate is the choke point, not the bit rate.

I don't know where you got the "character" idea. I don't know what the adapter decides is a block to send, but I assume there is a maximum size and short of that, there's a time out.

No one said anything about Ethernet "routing" delays. I've explained to them what I'm doing and one vendor said there is a 1 ms delay in handling each "message" as I described it.

I could go with something much fancier, where the same command is sent to all slaves, and the slaves respond in turn, controlled by a separate signal controlling priority to write the reply onto the shared bus. The message from the master can be a single broadcast message, with 128 replies.

So far, no one has indicated the specific baud rates they support. They only list the maximum rate. I have to design the slaves with a clock for the baud rate times X. It would be nice to share that with the rest of the design which needs a clock around 33 MHz for comms to the UUTs.

It's kind of odd that FTDI has a hi-speed serial adapter with a TTL level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not have message handling delays. Same concern with the 12 Mbps TTL level interface.

Bernd Linsel 3 years ago

So what is there against you using such a 12 Mbps USB/serial thing and attaching an RS-422/485 transceiver (e.g.

formatting link

That should meet all your requirements mentioned so far.

Regards, Bernd

Rick C 3 years ago

I heard back from FTDI and they only support polling rates up to 1 kHz. So I guess I'm stuck with Ethernet. I might be stuck with changing the protocol. Someone suggested that the OS will interject delays as well. So I might have to either install 16 serial ports directly in the PC, or change the protocol so the master talks to all the slaves in a burst or a single broadcast command, and the replies are controlled by a priority scheme so they are back to back.

I didn't expect this to be the difficult part of the job.

I could also automate the test steps into the FPGA on each test fixture board. But that makes the whole thing much less flexible while developing.

David Brown 3 years ago

The general issue is that PCs are great at throughput, but poor at latency. USB in particular has a scheduler and polls the devices on the bus at regular intervals. (This can't really be avoided in a half-duplex master-slave system.) For Ethernet, a gigabit switch will usually have a latency of 50 - 125 us. Even with a direct connection with no switch, you'll be hard pushed to get latencies lower than 50 us, and thus a query-reply peak rate of 10,000 telegram pairs a second.
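
The 10,000 pairs per second figure follows from treating each query-reply as two one-way trips that cannot overlap. A minimal sketch, assuming the 50 us latency applies equally in both directions and that the slave's own turnaround time is negligible:

```python
def max_query_reply_rate(one_way_latency_s: float) -> float:
    """Peak rate of strictly sequential query-reply pairs per second.

    Each pair costs one trip out and one trip back, and the next query
    is not sent until the previous reply has arrived.
    """
    return 1.0 / (2.0 * one_way_latency_s)


rate = max_query_reply_rate(50e-6)   # 50 us each way
print(f"{rate:.0f} pairs per second")
```

The bound applies per outstanding conversation, so it only limits a master that waits for each reply before sending the next query.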

You can get higher throughput if you have multiple outstanding query-replies going to different USB devices or different IP connections. So while you are not going to get more than 4000 send/receive transactions a second to one USB 2.0 high speed FTDI serial port device, you could probably do that simultaneously to several such devices on the same bus as long as you don't need to wait for the reply from one target before sending a message to a different target. (The same principle goes for Ethernet.)

A communication hierarchy is likely the best way to handle this.

Alternatively, the messages from the PC can be large and broadcast, rather than divided up. You could even make an EtherCAT-style serial protocol (using the hybrid RS-422 bus you suggested earlier). The PC could send a single massive serial telegram consisting of multiple small ones:

Each slave would reply after hearing its own telegram, fast enough to be complete in good time before the next slave starts. (Adjust padding as necessary to give this timing.)

Then from the PC side, you have one big telegram out, and one big telegram in - using 3 MBaud if you like.

Dimiter_Popoff 3 years ago

David, that kind of detailed problem solving should not go out free of charge you know :-). Of course this is the way to do it.

antispam 3 years ago

I am not sure if you get that there are two issues: throughput and latency. If you wait for the answer before sending the next request you will be bounded by latency. OTOH if you fire several requests without waiting, then you will be limited by throughput. With relatively cheap convertors on Linux, to handle 10000 round trips of 15-byte messages I need the following times:

CH340,  2 Mb/s, waiting:    6.890s
CH340,  2 Mb/s, overlapped: 1.058s
CP2104, 2 Mb/s, waiting:    2.514s
CP2104, 2 Mb/s, overlapped: 1.214s

The other end was an STM32F030, which simply echoed back the received characters.

Note: these results are not fully comparable. Apparently CH340 will silently drop excess characters, so for overlapped operation I simply sent more characters than I read. OTOH CP2104 seems to stall when its receive buffer overflows, so I limited overlap to avoid stalls. Of course a real application would need some way to ensure that receive buffers do not overflow.

So, you should easily be able to handle 10000 round trips per second provided there is enough overlap. For this you need to ensure that only one device is transmitting to the PC at a time. If you have several FPGAs on a single board, coordinating them should be easy. Of course, you need some free pins and extra tracks. I would use a single transceiver per board, depending on coordination to ensure that only one FPGA controls the transceiver at a given time. Anyway, this would allow overlapped transmission to all devices on a single board. With multiple boards you would need some hardware or software protocol to decide which board can transmit. On the hardware side a single pair of extra wires could carry the needed signals (that is your "priority daisy chain").

As others suggested, you could use multiple convertors for better overlap. My convertors are "full speed" USB, that is they are half-duplex 12 Mb/s. USB has significant protocol overhead, so probably two 2 Mb/s duplex serial convertors would saturate a single USB bus. In desktops it is normal to have several separate USB controllers (buses), but that depends on the specific motherboard. Theoretically, when using "high speed" USB converters, several could easily work from a single USB port (provided that you have enough places in hub(s)).

An extra thing: there are reasonably cheap PC compatible boards, supposedly cheaper and easier to buy than a Raspberry Pi (but I did not try to buy them). If you need really large scale you could have a single such board per batch of devices and run a copy of your program there, with a single laptop connecting to the satellite boards via Ethernet and collecting results.

Rick C 3 years ago

Of course I'm aware of it. That's the entirety of the problem.

Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would not implement the data path in an FPGA.

I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.

Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine messages to multiple targets, into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read various registers that manipulate the details being tested.

Concerns with this include the need to reload all the FPGAs any time they are updated with a new test feature or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more problematic. Even 16 is a PITA.

Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled by commands as well.

That's 11.3 per target, per second. (128 targets)

That's pretty close to 74 per target, per second.
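
The two per-target figures above appear to come from dividing the CH340 timings by 128 targets. A short sketch of that conversion, using the four timings posted earlier in the thread (the even split across 128 targets is the assumption made in this reply, not something that was measured):

```python
ROUND_TRIPS = 10_000   # round trips in each timed run
TARGETS = 128          # assumed number of targets sharing the link

# Elapsed seconds for 10,000 round trips of 15-byte messages at 2 Mb/s.
measured_seconds = {
    ("CH340", "waiting"): 6.890,
    ("CH340", "overlapped"): 1.058,
    ("CP2104", "waiting"): 2.514,
    ("CP2104", "overlapped"): 1.214,
}


def per_target_rate(elapsed_s: float) -> float:
    """Round trips per second available to each target."""
    return ROUND_TRIPS / elapsed_s / TARGETS


for (chip, mode), elapsed in measured_seconds.items():
    total = ROUND_TRIPS / elapsed
    print(f"{chip:6s} {mode:10s} {total:7.0f}/s total, "
          f"{per_target_rate(elapsed):5.1f}/s per target")
```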

I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.

I don't know what the CP2104 is.

I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?

Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?

Wait, what? How would overlapped operation operate if you have to worry about lost characters???

I'm not sure what "stall" means. Did it send XOFF or something?

Any idea on what size of aggregated messages would prevent character loss? That's kind of important.

Yes, the test fixture boards have to be set up each day and to make it easy to connect, (and no backplane) I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.
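
The flag-passing idea can be modelled in a few lines to check the ordering it produces. This is a toy simulation only: the `Board` class and `collect_replies` function are hypothetical names, and nothing here models the electrical handshake lines or timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Board:
    """One test fixture board and the replies it owes for this cycle."""
    name: str
    replies: List[str]


def collect_replies(boards: List[Board]) -> List[str]:
    """Walk the daisy chain; only the flag holder may drive the reply bus."""
    bus: List[str] = []
    flag_holder = 0                  # master sets the flag; first board has it
    while flag_holder < len(boards):
        board = boards[flag_holder]
        bus.extend(board.replies)    # board sends all its replies back to back
        flag_holder += 1             # then passes the flag to the next board
    return bus                       # flag left the last board: cycle is done


chain = [Board("A", ["a1", "a2"]), Board("B", ["b1"]), Board("C", ["c1", "c2"])]
print(collect_replies(chain))
```

Because only one board holds the flag at a time, the replies reach the master as one uninterrupted sequence in chain order.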

I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version, their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422, full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

But the Ethernet implementations also have speed issues, likely because they are actually software based.

Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

The one approach that seems to have the best chance at running very fast, is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low end x86 CPUs. There's also the overhead of the PC OS, so maybe I need to do some testing before I worry with this further. I have one FTDI cable. I can use an embedded MCU board for the other end I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast the MSP430 UART will run? I might have an ARM board that runs Mecrisp, I can't recall.

David Brown 3 years ago

<snip>

I would be rather surprised if you were not aware of the difference - but your posts show you don't seem to be familiar with the level of the latencies inherent in USB and Ethernet. It seems you think it is just poor implementations of hardware or drivers. (Of course, limited implementations can make it worse.)

No one implements the data path of Ethernet in an FPGA. Sometimes a few bits (such as checksums) are accelerated in hardware, and there can even be filtering or re-direction done in hardware, but the data in Ethernet packets is always handled in software.

Even if it was all handled instantly in perfect hardware, an Ethernet frame is 72 bytes plus 12 bytes gap. Then there is at least 20 bytes of IP header, then 20 bytes for the TCP header. That's 124 bytes before there is any content whatsoever, or 10 us for 100 Mbps Ethernet.
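
The overhead sum in the paragraph above can be checked directly. The byte counts are the ones quoted in the post; the variable names are only for illustration.

```python
# Byte counts as quoted in the post above.
ETHERNET_FRAME_BYTES = 72
INTERFRAME_GAP_BYTES = 12
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20
LINK_BITS_PER_SECOND = 100_000_000   # 100 Mbps Ethernet

overhead_bytes = (ETHERNET_FRAME_BYTES + INTERFRAME_GAP_BYTES
                  + IP_HEADER_BYTES + TCP_HEADER_BYTES)
overhead_us = overhead_bytes * 8 / LINK_BITS_PER_SECOND * 1e6

print(f"{overhead_bytes} bytes of overhead, {overhead_us:.2f} us on the wire")
```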

USB works by cyclic polling. There is inevitably a latency. USB 1 had 1 kHz polling, while USB 2 has 8 kHz. (I don't know off-hand what USB 3 has, but USB serial devices are invariably USB 1 or 2.)

Most serial port drivers have lower polling rates than strictly necessary by USB cycle times, since polling very fast is difficult to do efficiently. I believe it is difficult on Windows to have periodic events at a resolution below 1 millisecond without busy-waiting, and drivers can't have busy-waiting - you can't have a driver that eats one of your cpu cores just because you've plugged in a USB to serial cable!

If you write your own code that accesses the USB lower levels directly (such as using Linux libusb, or its Windows port) then you can, I believe, call USB transfer functions faster, up to the base USB cycle rate.

None of this should make you wonder about audio. You just need enough buffering to cover USB cycles (125 us for USB 2). Any application delay is typically /far/ longer, such as when collecting streaming audio from a dodgy internet connection.

I wonder if you are confusing the two related kinds of latency - one-way latency (time difference between when an application starts to send something at one end, and the application at the other end has got the data), and two-way latency for a query-reply two-way communication. You might also be mixing up jitter in this.

I say this because there are such critical differences between the needs of audio and the needs of your communication. In particular, audio does not care about two-way latencies, and can cope with significant one-way latency (up to perhaps 20 ms) even when there is video. Without video, latency is irrelevant for audio as long as the jitter is low.

Rick C 3 years ago

I was warned that the polling rate in USB is at best 1 kHz for full-speed and 8 kHz for hi-speed, which definitely creates significant delays in this application. I've been told by FTDI (possibly in error) that even using the hi-speed interface, the best they can set their device for is 1 kHz polling. This does not result in a terrible data throughput, but it's not as fast as I'd like. If FTDI supported the hi-speed polling rate of 8 kHz, I would probably settle for that and quit looking.

I'm pretty confident there is nothing inherent in 100 Mbps Ethernet that would create delays significant to this application. I've been told by one supplier, their device has a 1 ms built in delay. I'm wondering if this is a timeout to indicate a packet should be sent, even if no more data is being received. But so far, no one has said this is adjustable. I just spoke with Perle and I was told of a 5 ms delay on their Ethernet unit. Again, that's not inherent in the Ethernet protocol.

You are the second person to tell me that I didn't design FPGAs for the TTC/Acterna/Viavi TBerd to process OC-12 data. I guess I just dreamed it.

I'd like to know what you base your assertion on.

10 uS would be wonderful! 100 times faster than anyone else. Where do you sell your devices?

So far, no one has said it was the PC software. They have *all* said the delays are in their box.

Please don't say USB 2. The number you cite is for hi-speed USB, regardless of the version of USB being used.

Or not. The application sends messages two-ways as a means of preventing collisions on the RS-485 bus. The delay at the slave is near zero, approximately 0.5 us. The two messages are each 150 bits long, which on a 1.5 Mbps bus take 100 us to transmit. Everything else is due to the equipment. With a 1 ms delay added, that's a 10x slowdown.

Ok, then forget about audio. Far too much has been said about that already. Thank you.

At this point I am looking at using an Ethernet to serial module on each test fixture card and an Ethernet switch to connect them all to the PC. I don't like this in terms of the connectivity and the reliance on not just one, but two different vendors to make it work. Also, most of the modules are either rather large or expensive, or from an Asian company with awkward documentation. They often design their modules without regard to height, which makes them skyscrapers compared to the rest of the board. I have a couple identified as potential candidates, but they will be much harder to test, since they need to be attached to a board.

Rick C 3 years ago

I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've been told that the messages will see delays in the adapters, but no one has indicated how they block the data. In the case of the FTDI adapter, the issue is the polling rate.

This is the format I'm currently thinking of

01 23 45 C\r\n - 11 chars
01 23 45 C 67\r\n - 14 chars

The transmitted message would add 15 char of padding for a total of 26 chars per end point. At 3 Mbps a message takes 87 us to transmit on the serial bus for 11,500 messages a second, or 90 messages per second per end point. That certainly would do the job, if I've done the math right. Even assuming other factors cut this rate in half, and it's still around 45 messages per end point each second.
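
The math in the paragraph above checks out under one assumption the post does not state: 10 bits on the wire per character (8N1 framing). A quick sketch with the 128 end points used elsewhere in the thread:

```python
BAUD = 3_000_000        # 3 Mbps serial bus
BITS_PER_CHAR = 10      # assumed 8N1: start + 8 data + stop
COMMAND_CHARS = 11      # master's command, per the format above
PADDING_CHARS = 15      # padding added after each command
ENDPOINTS = 128

slot_chars = COMMAND_CHARS + PADDING_CHARS
slot_us = slot_chars * BITS_PER_CHAR / BAUD * 1e6
messages_per_second = BAUD / (slot_chars * BITS_PER_CHAR)
per_endpoint = messages_per_second / ENDPOINTS

print(f"{slot_chars} chars, {slot_us:.1f} us per slot, "
      f"{messages_per_second:.0f} msg/s, {per_endpoint:.1f} per end point")
```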

I wish I had something I could run tests with. I suppose any old MCU board would do the job. All it needs to do is see the \n and return a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232 and I don't have RS-422 on an MCU card, but I do have TTL! I should be able to make that work from an RS-422 signal. The RS-422 receiver will work too, if I bias one input to ~1.5V.

Unfortunately I don't have the dongle yet, so the test will need to wait a bit. I could try it with an RS-232 dongle just to see how it will work at slower data rates. I think the fastest might be around 250 kbps.

Rick C 3 years ago

Actually, I wasn't taking into account that the dummy characters only need to provide a small amount of delay to prevent slave collisions. The padding doesn't need to be as long as a slave message. So, with a 3 character difference in length, 4 char of padding should suffice, and make the replies look almost like a continuous stream of characters.
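
A rough way to size the padding: if every slave starts its reply the same fixed time after its own command ends, consecutive replies cannot overlap as long as each command slot is at least as long as a reply. The sketch below encodes that rule; the function name, the one-character margin, and the 10 bits per character are assumptions for illustration.

```python
def min_padding_chars(command_chars: int, reply_chars: int,
                      margin_chars: int = 1) -> int:
    """Padding needed so one slave's reply ends before the next one starts.

    Assumes identical turnaround delay in every slave, so only the length
    difference plus a small safety margin has to be covered.
    """
    return max(0, reply_chars - command_chars + margin_chars)


BAUD = 3_000_000
BITS_PER_CHAR = 10      # assumed 8N1 framing
ENDPOINTS = 128

padding = min_padding_chars(command_chars=11, reply_chars=14)
slot_chars = 11 + padding
slots_per_endpoint = BAUD / (slot_chars * BITS_PER_CHAR) / ENDPOINTS

print(padding, slot_chars, slots_per_endpoint)
```

With the 11 and 14 character messages from the previous post this gives 4 characters of padding, in line with the estimate above, and a correspondingly higher message rate than the 15-character padding allowed.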

I hate sending dummy characters though. They get in the way of debugging if you connect to the bus with an analyzer. But that shouldn't be needed, right? LOL In the first iteration of this test fixture, I had a bug in the FPGA code that showed up as random characters being dropped or changed. It was hard to find because that code had been used elsewhere. It was a failure in the documentation (not unlike the Ariane rocket failure) that resulted in my omission of a synchronizing FF that should have been at the input. The protocol that echoes the command helped a LOT.

David Brown 3 years ago

Just to be clear - the slaves should not send any kind of dummy characters. When they have read their part of the incoming stream, they turn on their driver, send their reply, then turn off the driver.

The master side might need dummy characters for padding if the slave replies (including any handling delay - the slaves might be fast, but they still take some time) can be longer than the master side telegrams.

Each subtelegram in the master's telegram chain must be self-contained - a start character, an ending CRC or simple checksum, and so on. Replies from slaves must also be self-contained.
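
As an illustration of a self-contained subtelegram, here is one possible framing: a start byte, an address, a length, the payload, and a simple additive checksum. The layout, the `START` value, and the function names are all hypothetical; the thread does not define a frame format.

```python
from typing import Iterator, Tuple

START = 0x02   # hypothetical start-of-subtelegram marker (ASCII STX)


def checksum(data: bytes) -> int:
    """Simple 8-bit additive checksum over address, length and payload."""
    return sum(data) & 0xFF


def encode_subtelegram(address: int, payload: bytes) -> bytes:
    body = bytes([address, len(payload)]) + payload
    return bytes([START]) + body + bytes([checksum(body)])


def decode_stream(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (address, payload) for each valid subtelegram, skipping junk."""
    i = 0
    while i + 4 <= len(stream):            # shortest frame is 4 bytes
        if stream[i] != START:
            i += 1                         # padding or noise: keep scanning
            continue
        address, length = stream[i + 1], stream[i + 2]
        end = i + 3 + length               # index of the checksum byte
        if end < len(stream) and checksum(stream[i + 1:end]) == stream[end]:
            yield address, stream[i + 3:end]
            i = end + 1                    # resume after this frame
        else:
            i += 1                         # bad or incomplete frame: resync


stream = (b"\x00\x00" + encode_subtelegram(5, b"C")
          + b"\x00" * 4 + encode_subtelegram(6, b"C 67"))
print(list(decode_stream(stream)))
```

Because each frame carries its own start marker and checksum, the receiver can resynchronise after padding or a corrupted frame without relying on timing.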

It doesn't matter how the USB-to-serial or Ethernet-to-serial adaptors break up the messages - applications read the data as serial streams, not synchronous timed data. The only timing you have is a pause between master telegrams, which can be many milliseconds long, used to ensure that if something has gone wrong or lost synchronisation, their receiving state machine is reset and ready for the next round.

Rick C 3 years ago

It absolutely does matter how the messages get broken up. That's where the delays come in. If the slave replies are sent over the network/USB bus one at a time, it's not significantly better than the original approach.

antispam 3 years ago

How do you know that data path is not in hardware? One question is if hardware is able to operate with low latency. Another is if it should. And frequently the answer to the second question is no, it should not try to minimize latency. Namely, Ethernet has a minimal packet size which is about 60 characters. If you send each character in a separate packet, then there would be very bad utilization of the media. So, the converter is expected to wait till there are enough characters to transmit. Note that at 115200 bits/s a delay of 1 ms is roughly 11 characters, so not so big. At lower rates the delay becomes less significant and at higher rates people usually care more about throughput than latency. And do not forget that Ethernet is a shared medium: even if a convertor could manage to transmit with lower latency within the available Ethernet bandwidth, it could do that only at the cost of other users (possibly a second convertor).
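
To put the 1 ms figure in terms of characters, the delay can be converted at any baud rate. A small sketch, assuming 10 bits per character (8N1 framing); the 3 Mbps case is the rate discussed elsewhere in the thread:

```python
def chars_in_delay(baud: int, delay_s: float, bits_per_char: int = 10) -> float:
    """How many character times fit into a given delay at a given baud rate."""
    return baud * delay_s / bits_per_char


for baud in (9_600, 115_200, 3_000_000):
    print(f"{baud:>9} baud: 1 ms is about {chars_in_delay(baud, 1e-3):.1f} chars")
```

The same 1 ms that costs about 11 character times at 115200 baud costs about 300 at 3 Mbps.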

And from a bit different point of view: normally there will be software in the path, giving you 0.1 ms of latency on good modern unloaded hardware and much more in worse conditions. Also, Ethernet likes packets of about 1400 bytes size. On 10 Mbit/s Ethernet this is about 1.1 ms for transmission of a packet. If the network is not dedicated to the convertor, such packets are likely to appear from time to time and the convertor has to wait till such a packet is fully transmitted and only then gets a chance to transmit. So, you should regularly expect delays of order 1 ms. Of course, with 100 Mbit/s Ethernet or gigabit the media delays are smaller, but serial convertors are frequently deployed in legacy contexts where 10 Mbit/s matters.

Audio is quite different than serial. Audio can be pre-scheduled but in general you do not know when there will be traffic on serial port.

Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in, the convertor mostly sits idle. I previously wrote that CH340 did not work at 921600. More testing showed that it actually worked, but the speed was significantly different: I had to set my MCU to 847000 to communicate. This could be a bug in the Linux driver (there is a rather funky formula connecting speed to parameters and it looks easy to get it wrong). Similarly, when CH340 was set to 576800 I had to set the MCU to 541300. Even after matching speed, at nominal 576800, 921600 and 1152000 the test time was much (more than 10 times) higher than for other rates (I only tested 1 character messages at those rates, did not want to wait for the full test). Also, 500000 was significantly slower than 460800 (but "merely" 2 times slower for 1 character messages and catching up with longer messages). Still, ATM CH340 looks reasonably good.

Remark: I bought all my convertors from Chinese sellers. IIUC the FTDI chip is faked a lot, but others too. Still, I think they show what is possible and illustrate some difficulties.

It is a chip by Silicon Laboratories. Datasheet gives contact address in Austin, TX.

No. My slave simply returns back each received character. There is some software delay but it should be less than 2us. So even waiting test has some overlap at character level. To get more overlap above I cheated: my test program was sending 1 more character than it should. So sent message was 16 bytes, read was 15. After reading 15 another batch of 16 was sent and so on. In total there were 10000 more characters sent than received. My hope was that OS would read and buffer excess characters, but it seems that at least for CP2104 they cause trouble. My current guess is that OS is reading only when requested, but I did not investigate deeper...

OK, I tried a modified version of my test program. It first sends k messages without reading anything, then goes to the main loop where after sending each message it reads one. At the end there is a tail loop which reads the last k messages without sending anything. So, there are k + 1 messages in transit: after sending message k + i the program waits for the answer to message i. In total there are 10000 messages. Results are:

CH340      15 char message   20 char message
k = 0      6.869s            7.163s
k = 1      4.682s            1.320s
k = 2      0.992s            1.320s
k = 3      0.991s            1.319s
k = 4      0.991s            1.320s
k = 5      0.990s            1.319s
k = 8      0.992s            1.320s
k = 12     0.990s            1.320s
k = 20     0.992s            1.319s
k = 36     0.991s            1.321s
k = 128    0.991s            1.319s

CP2104     15 char message   20 char message
k = 0      2.508s            3.756s
k = 1      1.897s            1.993s
k = 2      1.668s            2.087s
k = 3      1.486s            1.887s
k = 4      1.457s            1.917s
k = 5      1.559s            1.877s
k = 8      1.455s            1.803s
k = 12     1.337s            1.501s
k = 20     1.123s            1.499s
k = 36     1.125s            1.502s

k = 128 reliably stalled, there were random stalls in other cases

FTDI232R, 2 Mbit/s   15 char message   20 char message
k = 0                5.478s            3.755s
k = 1                4.929s            3.030s
k = 2                2.506s            3.339s
k = 3                2.459s            2.020s
k = 4                1.708s            1.061s
k = 5                1.671s            1.032s
k = 8                0.764s            1.021s
k = 12               0.772s            1.014s
k = 20               0.763s            1.009s
k = 36               0.758s            1.007s
k = 128              0.757s            1.008s

FTDI232R, 3 Mbit/s   15 char message   20 char message
k = 0                8.216s            10.007s
k = 1                5.006s            4.344s
k = 2                3.338s            1.602s
k = 3                2.406s            1.444s
k = 4                1.766s            1.316s
k = 5                1.599s            1.673s
k = 8                1.040s            1.327s
k = 12               1.071s            1.312s

With k = 20, k = 36 and k = 128 communication stalled.

With PL2303HX at 2 Mbit/s I had a lot of transmission errors, so did not test speed.
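
The send-ahead structure of the test described above (send k messages, then alternate send and read, then drain the last k replies) can be sketched as follows. This is not the actual test program: `EchoPort` is a hypothetical in-memory stand-in for a serial port whose far end echoes every byte, so the example runs without hardware. Any object with `write(data)` and `read(n)` methods could be substituted.

```python
from collections import deque


class EchoPort:
    """In-memory stand-in for a serial port with an echoing far end."""

    def __init__(self) -> None:
        self._rx = deque()

    def write(self, data: bytes) -> None:
        self._rx.extend(data)            # echoed bytes land in the rx queue

    def read(self, n: int) -> bytes:
        if len(self._rx) < n:
            raise TimeoutError("fewer bytes available than requested")
        return bytes(self._rx.popleft() for _ in range(n))


def pipelined_round_trips(port, message: bytes, total: int, k: int) -> int:
    """Run `total` round trips with up to k + 1 messages in transit.

    Returns the number of replies that matched the message sent.
    """
    ahead = min(k, total)
    verified = 0
    for _ in range(ahead):               # prologue: send k, read nothing
        port.write(message)
    for _ in range(total - ahead):       # main loop: send one, read one
        port.write(message)
        verified += port.read(len(message)) == message
    for _ in range(ahead):               # tail: read the last k replies
        verified += port.read(len(message)) == message
    return verified


print(pipelined_round_trips(EchoPort(), b"0123456789ABCDE", 10_000, k=8))
```

With k = 0 this degenerates to the plain "waiting" test, where each message is sent only after the previous reply has been read.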

My program uses blocking system calls; it did not finish in reasonable time. I did not investigate deeper. ATM I assume that the OS/driver is correct, so that my program would get characters if the convertor delivered them. I also assume that the MCU is fast enough to avoid loss of any character (character processing should take less than 2 us; at 2 Mbit/s I have 5 us per character). In the initial test I sent more characters than I wanted to receive, so loss of some characters would not stop the program (OK, loss of more than 10000 would be too much). In this batch of tests I sent exactly the number of characters that I wanted to receive, so loss of any would cause an infinite wait.

Each convertor has finite transmission and receive buffers. According to the datasheet, CP2104 has a 576 character receive buffer. For the others I do not have numbers handy, but I would expect something between 200 characters and a kilobyte. When characters arrive via the serial port they fill the receive buffers. The driver/OS/user program has to promptly read them. When doing the first test my hope was that the OS/driver would read characters from the convertor and store them in a system buffer. But then I saw stalls with CP2104. After I had seen this my guess was that in my test I overflowed the CP2104 receive buffer (in my initial test I was sending 10000 characters more than I received, so much more than the receive buffer size). However I have seen stalls with k = 18 and message size 15. And even with k = 0 and message size 20. In both cases the new test program guaranteed that the amount of data in transit was much smaller than the stated buffer size. So, at least for CP2104 there must be some other reason.
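
The buffer argument above can be made concrete by comparing the worst-case data in transit, (k + 1) messages, against the 576 character receive buffer quoted from the CP2104 datasheet. The helper name is only for illustration.

```python
CP2104_RX_BUFFER_CHARS = 576   # receive buffer size quoted from the datasheet


def chars_in_transit(k: int, message_chars: int) -> int:
    """Worst case: k + 1 messages are outstanding at any moment."""
    return (k + 1) * message_chars


for k, size in [(18, 15), (0, 20), (128, 15)]:
    in_flight = chars_in_transit(k, size)
    verdict = "fits" if in_flight <= CP2104_RX_BUFFER_CHARS else "exceeds buffer"
    print(f"k = {k:3d}, {size} char messages: {in_flight:4d} chars, {verdict}")
```

The two stalling cases mentioned above stay well inside the buffer, which supports the conclusion that buffer overflow alone does not explain the stalls; only k = 128 exceeds it.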

The issues are more fundamental: both in USB and Ethernet there is per message/packet overhead. Low latency means sending data soon after it is available, which means small packets/messages. But due to overheads small packets are bad for throughput. So designers have to choose what they value more and in both cases the whole system is normally optimized for throughput.

MSP430G2553 theoretically allows setting quite high rates like 4 Mbit/s, but it is not clear if it will run (if noise immunity is good enough). AFAICS 1 Mbit/s is supposed to work. The other thing is software speed; I think that software can handle 1 Mbit/s, but probably not more.

Rick C 3 years ago

Not only did the vendor tell me it's through a CPU, he laughed at the idea of implementing Ethernet in an FPGA. That's when I sent him a link to the TBERD product line I had worked on around 2000.

At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What you are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.

We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.

Most Ethernet is not shared, rather point to point. In this case it definitely is not.

Ok, now it sounds like you are agreeing with me that the hardware is poor.

No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a bad idea to saddle them with such throughput killers.

Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?

Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.

FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.

Some of the results seem odd and hard to understand, like why the message rate improves so much as k is increased, and most dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k=0 they are around 1 ms per message, which is the polling rate... if you adjust it. I think the default for FTDI was 8 ms.

Thanks for doing this.

With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.

I have an FTDI adapter which I will try running my own tests with. To be realistic, they should be with a target, but that might be a problem just now. We'll see what I can cobble up. Most of my stuff is not convenient at the moment.

I was playing with it using Putty, but that's not the best terminal emulator in the world. I can't get it to show control characters or use different colors for transmit and receive. Heck, maybe I'm just being stupid, but I can't find how to send a file through the port. I'm pretty sure I've done that using Putty before, because that's how you compile programs on an embedded Forth. You simply send the file through the serial port like you were typing it.

antispam 3 years ago

Well, delay from Ethernet to serial port clearly means that the implementer did not spend enough effort to make it fast.

You were talking about connecting more convertors. Normally laptops have only a single Ethernet port, so all convertors that you connect will share a single Ethernet. If you use 100 convertors, 3 Mbits/s each, plus switches, it should be possible to get 30 Mbytes/s of aggregate bandwidth (assuming a gigabit port in the laptop and a gigabit switch at the top of the tree). But if each converter wastes a lot of bandwidth due to small payload per packet, then such a rate would be impossible.

It is your planned use that would kill throughput. I would expect that when the product is used as intended you would get a reasonable fraction (say 70%) of nominal throughput (that is 2*16*3Mbits/s). If not, then I will join you in calling it a bad product.

Sometimes you get a product designed for 10 Mbit/s which just got a faster Ethernet part to be a good citizen on a fast network. Above you wrote about a 16 port thing. That should be designed for a faster network, but on common 100 Mbit/s Ethernet, running ports in parallel, it would be limited by Ethernet throughput. And even on 1 Gbit/s Ethernet it needs enough bandwidth that you can not waste it even if it is the only thing on the network. And just a little thing: you wrote Ethernet, but raw Ethernet is problematic on PC OSes. So I would guess that you really mean TCP/IP over Ethernet. TCP requires every packet to be acknowledged, which may add more small-packet traffic.
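To put a rough number on the small-packet cost, here is a minimal sketch that counts the bytes one TCP segment occupies on the wire. It assumes IPv4 and TCP headers without options and standard Ethernet framing; the function names are made up for this illustration, and real converters may batch or frame data differently.

```python
# On-wire cost of one small TCP segment over Ethernet.
# Assumptions: IPv4 and TCP headers without options, no VLAN tag.
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter, bytes
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
ETH_FCS = 4           # frame check sequence
INTERFRAME_GAP = 12   # mandatory idle time, in byte times
MIN_FRAME = 64        # minimum frame size (header + payload + FCS)
IP_HEADER = 20
TCP_HEADER = 20


def wire_bytes(payload: int) -> int:
    """Byte times one segment with `payload` data bytes occupies on the wire."""
    frame = ETH_HEADER + IP_HEADER + TCP_HEADER + payload + ETH_FCS
    frame = max(frame, MIN_FRAME)  # short frames are padded
    return PREAMBLE_SFD + frame + INTERFRAME_GAP


def efficiency(payload: int) -> float:
    """Fraction of wire time that carries serial payload."""
    return payload / wire_bytes(payload)


if __name__ == "__main__":
    for n in (15, 20, 1460):
        print(f"{n:5d} payload bytes -> {wire_bytes(n):5d} on wire, "
              f"{efficiency(n):.1%} efficient")
```

Under these assumptions a 15 character message sent alone uses 93 byte times on the wire, and a bare acknowledgement coming back uses another 84, so only a small fraction of the Ethernet bandwidth carries serial data.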

Well, I am telling you what I observed. People say various things on the net. I was interested in whether the net knows something about my trouble with CP2104, so I googled for "CP2104 lockup". And I got a bunch of complaints about FTDI devices, solved by using CP2104. So, there is a lot of noise and ATM I prefer to stay with what I see.

AFAIK the Linux driver does not discriminate against non-FTDI devices. So the fact that a convertor works with the Linux driver tells you nothing about its origin. And for the record, I bought mine several years ago.

Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte blocks (they say that the actual payload is 62 bytes and there are 2 bytes of protocol info). With 15 character messages 0.764s really means 98% use of serial bandwidth, so essentially as good as possible. The corresponding k = 8 really means 9 messages in transit, so 135 characters, which is slightly more than 2 buffers. More data in transit does not help, but also does not make things worse. With 20 character messages the main improvement is at k = 4, which means 100 characters, which is smaller than 2 buffers, with extra improvements for more data in transit.

With CH340 and 15 char messages we see the main improvement for k = 2, which corresponds to 45 characters in transit. With 20 char messages we get improvement for k = 1, which is 40 characters in transit. CH340 uses 32 character transfer buffers, so the improvement corresponds to somewhat more than 1 buffer in transit.

Now, if transfers between converter and PC were at optimal times, then one buffer plus one character would be enough to get full serial speed. But USB transfers can not be started at arbitrary times; IIUC there are discrete time slots when a transfer can occur. When a transfer can not be done in a given slot it must wait for the next slot. So, depending on the locations of possible slots, more buffering and more data in transit may be needed for optimal performance. OTOH 2-3 buffers should be enough to allow the PC to get full bandwidth, and this is in good agreement with the FTDI results. In the case of CH340 there is an extra factor: CH340 also uses 8 byte transfers. I do not know what function they have, but a reasonably likely guess is that those 8 bytes pack the transfer control info that FTDI bundles with normal data. Anyway, those are "interrupt" transfers in the USB sense, so they have higher priority than data transfers. A reasonable guess is that they steal some USB bandwidth from data transfers. Also, a smaller than maximal data block size limits efficiency, so it is possible that CH340 is limited by USB bandwidth (lack of enough slots).
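The "characters in transit" figures above can be reproduced with a small sketch. It only restates the arithmetic from this post: k extra messages on top of the one being answered, 62 payload bytes per FTDI block and 32 per CH340 block. The function names are made up for the illustration.

```python
# Data in transit for a given k, compared with the converter's block size.
# Block sizes are the ones quoted in the post, not measured values.
FTDI_PAYLOAD = 62   # payload bytes per 64-byte FTDI block
CH340_BLOCK = 32    # CH340 transfer buffer size in characters


def chars_in_transit(k: int, msg_len: int) -> int:
    """k extra messages are outstanding besides the one being answered."""
    return (k + 1) * msg_len


def buffers_in_transit(k: int, msg_len: int, block: int) -> float:
    """How many converter blocks the outstanding data corresponds to."""
    return chars_in_transit(k, msg_len) / block


if __name__ == "__main__":
    cases = [
        ("FTDI", 8, 15, FTDI_PAYLOAD),
        ("FTDI", 4, 20, FTDI_PAYLOAD),
        ("CH340", 2, 15, CH340_BLOCK),
        ("CH340", 1, 20, CH340_BLOCK),
    ]
    for name, k, msg_len, block in cases:
        print(f"{name:5s} k={k} len={msg_len}: "
              f"{chars_in_transit(k, msg_len)} chars, "
              f"{buffers_in_transit(k, msg_len, block):.2f} buffers")
```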

Now, concerning 3 Mbits/s: due to the different serial speed, optimal times for transfers are different than in the 2 Mbits/s case. It is possible that there is a worse fit of desired and possible transfer times. Buffering allows to at least partially cure this, hence the initial improvement. But clearly, there is some extra bottleneck. Now some speculation: with a 1/8 ms USB-2.0 cycle, there are 1500 FS clocks per cycle. I would have to look at the spec to be sure, but this is close to a 150 byte worst case FS transfer. Besides data there is some USB protocol overhead and (speculatively) it is possible that the low level USB driver may refuse to schedule two 64-byte transfers in a single cycle. In such a case the effective bandwidth for serial data would be 4096000 bits/s, which corresponds to 5120000 serial bits/s (serial sends start and stop bits which are not needed for USB). This is less than full duplex 3 Mbits/s (both directions add up to 6 Mbits/s and must go through the same USB). With a larger amount of data in transit this could give wild oscillations in the amount of buffered data, leading to slowdown when buffers get empty and giving a stall when the receive buffer overflows.
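The numbers in this speculation can be checked with a few lines. The sketch takes the assumptions of the post at face value (one 64-byte transfer per 1/8 ms cycle, 10 serial bits per character for 8N1 framing); it says nothing about how a real host controller schedules transfers.

```python
# Arithmetic behind the speculated USB bottleneck at 3 Mbit/s.
# Assumptions (from the post): 8000 cycles per second, at most one
# 64-byte transfer per cycle, 8N1 serial framing (10 bits per character).
FS_BIT_RATE = 12_000_000        # USB full-speed signalling rate, bit/s
CYCLES_PER_SECOND = 8000        # 1/8 ms cycle
BLOCK_BYTES = 64                # one transfer per cycle
SERIAL_BITS_PER_CHAR = 10       # start + 8 data + stop
SERIAL_RATE = 3_000_000         # bit/s in each direction

fs_bits_per_cycle = FS_BIT_RATE // CYCLES_PER_SECOND
usb_bits_per_second = BLOCK_BYTES * 8 * CYCLES_PER_SECOND
serial_equivalent = usb_bits_per_second * SERIAL_BITS_PER_CHAR // 8
full_duplex_demand = 2 * SERIAL_RATE

if __name__ == "__main__":
    print("FS bit times per cycle:   ", fs_bits_per_cycle)
    print("USB payload, bit/s:       ", usb_bits_per_second)
    print("Equivalent serial bit/s:  ", serial_equivalent)
    print("Full duplex demand, bit/s:", full_duplex_demand)
    print("Bottleneck:", serial_equivalent < full_duplex_demand)
```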

Of course there is another speculation: the convertor may be fake. Supposedly fakes use MCUs with a special program. Software could create delays which limit the transfer rate at 3 Mbits/s and lead to data loss/stall with more data in transit.

Well, you wrote that you have needed experience, so do low-latency Ethernet-serial convertor based on FPGA. Give your numbers and look how many customers come in.

Rick C 3 years ago

I won't argue with that!

Good thing we aren't trying to use 100 converters. The vendors who produce 4, 8 and 16 port versions don't do much to make them fast actually. I think the matter of small messages just doesn't come up often enough to be on their radar.

"Intended"!? I saw nothing in any document that said serial port traffic had to meet any particular specifications. They didn't set this sort of spec when they designed the product. It happened that it had this limitation and someone said, "Good enough, ship it"!

I'm not sure what your point is. But it is not important. We are discussing nits at this point.

What sort of complaints about FTDI? Did you contact them about it?

I'm not using Linux. I don't have any FTDI fakes. I have some Prolific fakes somewhere, if I could find them. I never had one bricked, but I think it was Prolific that did that some years ago. Or, I may have them confused with FTDI. I remember the bricking driver was released with a Windows update and MS was pretty pissed off when the bricking hit the news.

Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???

It's too late for me to try to read all this.

No, I never said I've designed Ethernet interfaces. I said I worked on FPGA code in a comms tester, which also tested Ethernet. I worked on one of the telecom formats; OC-12 rings a bell. Besides, that would be a major project. I have two other major projects to work on. This should be something I can buy.

Reading your tests has made me realize, that while combining the messages for every target into one batch can be a bit unwieldy, I could limit the combinations to the end points on a single card. The responses have to be combined for the one driver anyway. Between the 8 end points on a single board I could easily combine those commands, and then stagger the replies without any extra signals between the boards and no special characters in the command stream.

Again, thinking out loud, at 3 Mbps, 8 * 150 bits per command is 1,200 bits or 400 us. That would greatly reduce the wasted time, even with a 1 ms polling period. It would allow an exchange of 8 commands and 8 replies every 2 ms, or 4,000 per second. That would be almost 32 per end point, which would be great! Actually, it could be faster than this, since the staggering of the replies doesn't require the first reply to wait for the last command. So replies will start at the end of the first command. The beauty of full-duplex!
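A quick sketch of that estimate, for anyone who wants to try other batch sizes or bit rates. The function names are made up, and the end point count of 128 is only an assumption chosen because it is consistent with "almost 32 per end point"; it is not stated in this post.

```python
# Batch timing estimate: several commands sent back to back, one exchange
# per polling round trip. ENDPOINTS is an assumed value, see the lead-in.
BITS_PER_COMMAND = 150      # 15 characters * 10 serial bits
BIT_RATE = 3_000_000        # serial bit rate, bit/s
ENDPOINTS = 128             # assumption, not given in the post


def burst_time_us(n_commands: int, bits_per_command: int, bit_rate: int) -> float:
    """Time on the wire for n commands sent without gaps, in microseconds."""
    return n_commands * bits_per_command * 1e6 / bit_rate


def exchanges_per_second(n_commands: int, cycle_ms: float) -> float:
    """Command/reply pairs per second if one batch completes every cycle."""
    return n_commands * 1000 / cycle_ms


if __name__ == "__main__":
    burst = burst_time_us(8, BITS_PER_COMMAND, BIT_RATE)
    rate = exchanges_per_second(8, 2.0)
    print(f"burst of 8 commands: {burst:.0f} us")
    print(f"exchanges per second: {rate:.0f}")
    print(f"per end point ({ENDPOINTS} assumed): {rate / ENDPOINTS:.2f}")
```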

Any chance you could run your test on the FTDI cable at 3 Mbps with a 1,200 bit block of data (120 characters)? I imagine the RS-232 waveform is getting a bit triangular at that speed.

David Brown 3 years ago

I mean it doesn't matter how the messages are broken up from the application code's viewpoint, as long as you handle it correctly as a stream and don't incorrectly assume you always read whole telegrams at a time.

You can expect the converter to buffer up the incoming data and send it in large lumps up the USB or Ethernet bus. That's how it can work at high baud rates and throughputs. You lose the precise timing information, however, and have extra latency and jitter, so be sure to treat the incoming data as a stream and then that does not matter.
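As a minimal sketch of what treating the data as a stream can look like, the class below accumulates whatever chunks the port delivers and only hands out complete telegrams. The class name and the newline terminator are assumptions for illustration; the actual message framing was not specified in this thread.

```python
# Reassemble telegrams from a byte stream that arrives in arbitrary chunks.
# Assumption: each telegram ends with a terminator byte (newline here).
class TelegramFramer:
    def __init__(self, terminator: bytes = b"\n"):
        self._buf = bytearray()
        self._term = terminator

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add received bytes; return every telegram completed so far."""
        self._buf.extend(chunk)
        telegrams = []
        while True:
            end = self._buf.find(self._term)
            if end < 0:
                break  # incomplete telegram stays buffered for the next read
            telegrams.append(bytes(self._buf[:end]))
            del self._buf[:end + len(self._term)]
        return telegrams


if __name__ == "__main__":
    framer = TelegramFramer()
    # The same three telegrams, split at inconvenient places by the converter.
    for chunk in (b"AB", b"C\nDE", b"F\nGH\n"):
        print(chunk, "->", framer.feed(chunk))
```

The application would call `feed()` with whatever each read from the port returns, so it never matters whether a reply arrives whole, split across reads, or lumped together with the next one.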
