Hi, I am working on a data acquisition and transmission application, and have been experimenting with the PICDEM board and the Microchip TCP/IP stack. Using TCP, I managed to get a data transmission rate of about 12 Kbyte/s from the board, even though the NIC is a 10 Mbit/s card. I was wondering if anyone has tested the maximum data transmission rate using the PIC chip and the RTL8019 NIC, and could suggest some way to boost the throughput?
I can't comment specifically on the Microchip implementation, but in general, the main throughput constraint on the PICmicro is its RAM size of 1.5 Kbyte.
A TCP stack has to keep a copy of all data in transit, so that the data can be re-sent in the event of network errors. The PICmicro RAM size imposes a severe limit on the amount of data that can be retained, so the PICmicro must wait until some or all of its buffered data is acknowledged before it can send any more. In many microcontroller stacks, TCP essentially runs lock-step, i.e. it sends one network frame, then waits for an acknowledgement before sending the next. This means that the throughput will be very dependent on the round-trip time, which may explain the figures you've measured.
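A back-of-the-envelope sketch of why lock-step sending ties throughput to round-trip time. The function name and the RTT values are illustrative assumptions, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on throughput for a sender that can have at most
 * `unacked_bytes` of data outstanding: it sends that much, then waits one
 * round trip for the acknowledgement before sending more. Wire time and
 * processing delays are ignored, so this only holds when the RTT dominates.
 * Returns payload bytes per second. */
static uint32_t max_bytes_per_sec(uint32_t unacked_bytes, uint32_t rtt_ms)
{
    assert(rtt_ms > 0);
    return (uint32_t)(((uint64_t)unacked_bytes * 1000u) / rtt_ms);
}
```

For example, with one 1460-byte frame in flight and a hypothetical 100 ms round trip, `max_bytes_per_sec(1460, 100)` gives 14600 bytes/s, however fast the 10 Mbit/s wire itself is.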
There are various workarounds for the RAM problem, such as assigning more of the NIC RAM for use as multiple transmit buffers, or re-fetching data that must be retransmitted from its original source, but the fundamental problem remains: it is difficult to achieve a high TCP throughput with small amounts of RAM.
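A minimal bookkeeping sketch of the first workaround, assuming an NE2000-style controller whose buffer RAM is addressed in 256-byte pages. The page numbers, slot count, and all names are hypothetical; the point is that each unacknowledged segment keeps its own slot in NIC RAM and is released only when a cumulative ACK covers it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: NIC buffer RAM addressed in 256-byte pages.
 * TX_FIRST_PAGE and TX_SLOTS are illustrative values only. */
#define TX_FIRST_PAGE   0x40u
#define PAGES_PER_SLOT  6u      /* 6 * 256 = 1536 bytes, room for one full frame */
#define TX_SLOTS        4u

typedef struct {
    bool     in_use;   /* slot holds a sent-but-unacknowledged segment */
    uint32_t seq;      /* TCP sequence number of the first payload byte */
    uint16_t len;      /* payload length in bytes */
} tx_slot_t;

static tx_slot_t tx_slots[TX_SLOTS];

/* NIC page where slot i starts; the frame is copied there once and can be
 * retransmitted later without re-fetching the data from its source. */
static uint8_t slot_page(unsigned i)
{
    return (uint8_t)(TX_FIRST_PAGE + i * PAGES_PER_SLOT);
}

/* Reserve a free slot for a new segment; returns -1 if all slots are busy,
 * in which case the sender must wait for an ACK. */
static int slot_alloc(uint32_t seq, uint16_t len)
{
    for (unsigned i = 0; i < TX_SLOTS; i++) {
        if (!tx_slots[i].in_use) {
            tx_slots[i] = (tx_slot_t){ true, seq, len };
            return (int)i;
        }
    }
    return -1;
}

/* Free every slot whose data is fully covered by the cumulative ACK.
 * The signed difference handles 32-bit sequence-number wraparound. */
static unsigned slots_ack(uint32_t ack)
{
    unsigned freed = 0;
    for (unsigned i = 0; i < TX_SLOTS; i++) {
        if (tx_slots[i].in_use &&
            (int32_t)(ack - (tx_slots[i].seq + tx_slots[i].len)) >= 0) {
            tx_slots[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

With four slots the sender can keep up to four segments outstanding instead of one, at the cost of 6 KB of NIC RAM rather than PIC RAM.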
To prove it isn't the NIC's fault: I've achieved 2 Mbit/s net data throughput on a 40 MHz PICDEM.net using UDP, but that was streaming uncompressed video with no error retries; see TCP/IP Lean, chapter 14, for details.
The RAM size of PIC microcontrollers is a severe limit, but if your TCP/IP stack supports dynamic memory allocation, real-time task context switching and good management of the interface queues, you will be able to improve the throughput of your system.
Your main problem here is that, to stop the remote TCP stack from deferring its ACKs, you need to put at least two segments on the wire at a time; the remote system will ACK immediately after the second one arrives. Unfortunately, this means maintaining two segments' worth of retransmission state, since you may have to retransmit either or both frames.
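To see why two segments matter, here is a simplified model of the receiving side's delayed-ACK decision. It is a sketch, not any particular OS's implementation: RFC 1122 phrases the rule as an ACK for at least every second full-sized segment, with the delay capped at 500 ms, and counting every segment as below is a simplification. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned pending;     /* segments received but not yet acknowledged */
    bool     timer_armed; /* delayed-ACK timer running */
} rx_ack_state_t;

/* Called for each in-order data segment; returns true if an ACK goes out now. */
static bool rx_on_segment(rx_ack_state_t *s)
{
    s->pending++;
    if (s->pending >= 2) {
        s->pending = 0;
        s->timer_armed = false;
        return true;          /* second segment: immediate ACK */
    }
    s->timer_armed = true;    /* lone segment: hold the ACK for the timer */
    return false;
}

/* Called when the delayed-ACK timer fires; returns true if an ACK is sent. */
static bool rx_on_timer(rx_ack_state_t *s)
{
    if (!s->timer_armed)
        return false;
    s->pending = 0;
    s->timer_armed = false;
    return true;
}
```

A lock-step sender only ever triggers the first branch, so every segment waits for `rx_on_timer()`; with two segments in flight, the second one produces the ACK straight away.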
If I only have 1.4KB for buffering the outgoing TCP stream, then I can only transmit 1.4KB of data before I must wait for an acknowledgement from the remote node. If the round-trip time is large, then I will have to wait a long time for that acknowledgement, and the data throughput will be poor.
I don't understand how any amount of dynamic memory allocation or task switching can get round this problem - please explain.
Thanks for your reply, Jeremy. I feel the throughput is severely limited by the RAM in the PIC. I did manage to almost triple my throughput, however, by performing the checksum calculation in a 256-byte array in the PIC. Due to the size limitation, the Microchip implementation reads back all the data from the NIC transmit buffer to perform the checksum calculation. It did this byte by byte, whereas I rewrote the code to read 256 bytes into RAM at once and then perform the calculation. This resulted in a significant increase in throughput.
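A sketch of the block-read approach: the Internet checksum (RFC 1071) accumulated over 256-byte chunks copied from the NIC into a local array, rather than one byte per NIC access. `nic_ram` and `nic_read_block()` are hypothetical stand-ins for the controller's remote-DMA read; they are not functions from the Microchip stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 256u

/* Hypothetical stand-in for the NIC transmit buffer and its block read. */
static uint8_t nic_ram[1536];

static void nic_read_block(size_t offset, uint8_t *dst, size_t n)
{
    memcpy(dst, nic_ram + offset, n);
}

/* Add big-endian 16-bit words of buf to a one's-complement accumulator.
 * An odd trailing byte is padded with zero, so only the last chunk may be odd. */
static uint32_t sum_words(uint32_t acc, const uint8_t *buf, size_t n)
{
    size_t i;
    for (i = 0; i + 1 < n; i += 2)
        acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (n & 1u)
        acc += (uint32_t)buf[n - 1] << 8;
    return acc;
}

/* Fold carries back in and complement to get the 16-bit Internet checksum. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);
    return (uint16_t)~acc;
}

/* Checksum `len` bytes starting at `offset` in NIC RAM, CHUNK bytes per read. */
static uint16_t nic_checksum(size_t offset, size_t len)
{
    static uint8_t buf[CHUNK];   /* static to avoid a large stack frame */
    uint32_t acc = 0;
    while (len > 0) {
        size_t n = len < CHUNK ? len : CHUNK;
        nic_read_block(offset, buf, n);
        acc = sum_words(acc, buf, n);
        offset += n;
        len -= n;
    }
    return fold(acc);
}
```

Because each full chunk has an even length, 16-bit word alignment is preserved across chunk boundaries, so the chunked result matches a single pass over the whole buffer.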
I was also interested in the UDP implementation you mentioned. What network did you test it over, and what were the data fidelity/loss figures?
We share your view of the limits of the PIC's RAM. The round-trip time of the acknowledgement strongly affects the performance of a stack running on any microcontroller with small RAM resources.
The concept explained in my reply was about a general TCP/IP stack implementation, not specifically the Microchip one running on a PIC.
This isn't right: the problem is the ACK delay when sending single frames. If you send short frames, then as long as more than one goes out at a time, you start to get immediate ACKs from the remote side instead of deferred ACKs. This lets you clear retransmission buffers much faster and send more data.
There's a lot more protocol overhead this way, but if memory is tight, this is about the only option.
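To put a number on that overhead, a sketch of the fraction of 10 Mbit/s Ethernet wire bytes that carry TCP payload, assuming IPv4 and TCP headers without options and counting preamble, FCS, and inter-frame gap. It ignores the ACK frames travelling in the other direction; the function name is hypothetical.

```c
/* Percentage of wire bytes that carry payload for a given TCP segment size. */
static unsigned payload_efficiency_pct(unsigned payload)
{
    /* Ethernet header (14) + IPv4 header (20) + TCP header (20) + FCS (4) */
    unsigned frame = 14u + 20u + 20u + payload + 4u;
    if (frame < 64u)
        frame = 64u;                 /* short frames are padded to the minimum */
    /* add preamble+SFD (8) and the inter-frame gap (12 byte times) */
    unsigned wire = frame + 8u + 12u;
    return (payload * 100u) / wire;
}
```

With these assumptions, 128-byte segments come out at about 62% payload, versus about 86% for 512-byte and about 94% for 1460-byte segments (integer percentages, rounded down).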
In our implementation we compute a data checksum while the Tx data is being sent to the NIC, and then adjust it for the IP & TCP headers - that way, we don't have to re-fetch the data from the NIC. For the UDP speed test we turned the UDP checksum off (which you can't do for TCP).
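A sketch of the same idea: sum the payload as it is copied to the NIC, then add the header sums once the headers are known. It relies on the one's-complement word sum being order-independent and on the payload starting at an even offset (the TCP header length is always a multiple of 4 bytes). The buffer and function names are hypothetical, and the 12-byte IPv4 pseudo-header is passed in as a precomputed partial sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add big-endian 16-bit words to a one's-complement accumulator; an odd
 * trailing byte is padded with zero. Only the last piece may be odd-length. */
static uint32_t csum_add(uint32_t acc, const uint8_t *p, size_t n)
{
    size_t i;
    for (i = 0; i + 1 < n; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    if (n & 1u)
        acc += (uint32_t)p[n - 1] << 8;
    return acc;
}

/* Fold the carries back in and complement. */
static uint16_t csum_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);
    return (uint16_t)~acc;
}

/* Hypothetical stand-in for the NIC transmit buffer. The payload is summed
 * as it is copied, so it never has to be read back from the NIC. */
static uint8_t nic_tx[1536];

static uint32_t copy_payload_to_nic(size_t nic_off, const uint8_t *data, size_t n)
{
    memcpy(nic_tx + nic_off, data, n);
    return csum_add(0, data, n);
}

/* Combine the partial sums once the TCP header has been built (with its
 * checksum field set to zero). pseudo_sum is csum_add() over the 12-byte
 * IPv4 pseudo-header. */
static uint16_t tcp_checksum(uint32_t pseudo_sum, const uint8_t *hdr,
                             size_t hdr_len, uint32_t payload_sum)
{
    return csum_fold(pseudo_sum + csum_add(0, hdr, hdr_len) + payload_sum);
}
```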
That was on a small zero-error local network; the data wasn't intended for the Internet, so the PICmicro just blasted it out as fast as possible.
Does anyone have an example of how to use the Ethernet RAM for retransmission buffers, or how to back up the write pointer to re-inject the checksum after the headers have been written?
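One possible answer to the second question, sketched against a simulated transmit buffer: write the headers with a zero checksum field, write the payload, then move the write pointer back to the checksum field, patch it, and restore the pointer. All names are hypothetical; on an NE2000-compatible controller such as the RTL8019 the rough equivalent would be starting a new remote-DMA write at the checksum's address. The offsets assume an untagged Ethernet frame carrying IPv4 and TCP headers without options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated NIC transmit buffer with a write pointer. */
static uint8_t nic_tx[1536];
static size_t  nic_wp;

static void nic_set_write_ptr(size_t off) { nic_wp = off; }

static void nic_write(const void *src, size_t n)
{
    memcpy(nic_tx + nic_wp, src, n);
    nic_wp += n;
}

/* 14-byte Ethernet header, 20-byte IP header, TCP checksum at byte 16
 * of the TCP header. */
#define ETH_HDR_LEN   14u
#define IP_HDR_LEN    20u
#define TCP_CSUM_OFF  (ETH_HDR_LEN + IP_HDR_LEN + 16u)

/* Back up to the checksum field, patch it in network byte order, then
 * restore the write pointer so later writes continue where they left off. */
static void nic_patch_tcp_checksum(uint16_t csum)
{
    size_t saved = nic_wp;
    uint8_t be[2] = { (uint8_t)(csum >> 8), (uint8_t)csum };
    nic_set_write_ptr(TCP_CSUM_OFF);
    nic_write(be, sizeof be);
    nic_set_write_ptr(saved);
}
```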
As for a smaller MTU in limited-RAM environments, this does indeed increase throughput until the round-trip time gets large. If you can buffer four 128-byte segments, it is possible to ping-pong between two sets of two, giving a large increase in throughput over a single 512-byte segment. One 512-byte segment stalls the pipe until the delayed ACK kicks in, whereas with four 128-byte segments you are better able to keep the pipe full. This holds true until the round-trip time exceeds the delayed-ACK time minus a small amount to cover the extra headers.
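A rough steady-state model of this comparison, under the same assumption the post makes: the receiver ACKs every second segment immediately and delays the ACK for an unpaired one. Wire time and header overhead are ignored, so the model omits the post's small correction for extra headers. The function name and the RTT/timer values below are hypothetical.

```c
#include <stdint.h>

/* Steady-state payload throughput (bytes/s) for a sender keeping `nseg`
 * segments of `seg_len` bytes in flight. An even number of segments is
 * ACKed after one round trip; an odd number leaves one segment waiting an
 * extra delack_ms for the receiver's delayed-ACK timer. */
static uint32_t steady_bytes_per_sec(uint32_t nseg, uint32_t seg_len,
                                     uint32_t rtt_ms, uint32_t delack_ms)
{
    uint32_t cycle_ms = rtt_ms + ((nseg % 2u) ? delack_ms : 0u);
    if (cycle_ms == 0u)
        return 0u;   /* degenerate input: avoid dividing by zero */
    return (uint32_t)(((uint64_t)nseg * seg_len * 1000u) / cycle_ms);
}
```

With a 10 ms round trip and a 200 ms delayed-ACK timer, `steady_bytes_per_sec(4, 128, 10, 200)` gives 51200 while `steady_bytes_per_sec(1, 512, 10, 200)` gives 2438; at a 300 ms round trip the figures become 1706 and 1024, so the advantage shrinks as the RTT grows.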
I've tested this, and on local LANs you can get a 10x or greater improvement over SRP-style TCP stacks.
Well, for TX, limiting the MTU or the MSS has the same result, but the peer will most likely advertise a larger MSS than described here, so if you want to use the term MSS, you'd be limiting it on the TX side rather than using what the peer is offering.
The same result could be achieved by lowering the MTU to allow only 128-byte segments in the above example, or by keeping a larger MTU and creating only 128-byte segments: set the MSS to 128 bytes and ignore the peer's value unless it is lower.
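A minimal sketch of the second option: cap the send-side segment size locally and only honour the peer's advertised MSS when it is smaller. `LOCAL_TX_MSS` and the function names are hypothetical, and this sketch assumes `peer_mss == 0` means the peer sent no MSS option, in which case RFC 1122 requires assuming 536 for IPv4.

```c
#include <stdint.h>

#define LOCAL_TX_MSS     128u   /* local cap, matching the 128-byte example */
#define DEFAULT_PEER_MSS 536u   /* IPv4 default when no MSS option is received */

/* Send-side MSS: the smaller of our cap and the peer's advertised value. */
static uint16_t effective_tx_mss(uint16_t peer_mss)
{
    uint32_t peer = peer_mss ? peer_mss : DEFAULT_PEER_MSS;
    return (uint16_t)(peer < LOCAL_TX_MSS ? peer : LOCAL_TX_MSS);
}

/* Length of the next segment to build from `pending` bytes of queued data. */
static uint16_t next_segment_len(uint32_t pending, uint16_t peer_mss)
{
    uint16_t mss = effective_tx_mss(peer_mss);
    return (uint16_t)(pending < mss ? pending : mss);
}
```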