Embedded TCP input-only optimizations

I'm looking to streamline a TCP stack for use on the Renesas H8 series microprocessors. I would like to solicit your opinions on a few stack optimizations I have in mind, on what others might be possible, and on the performance that might be achieved.

For my application only TCP input is required. This should eliminate the need to maintain timers and thus allow for a purely interrupt-driven implementation (i.e. no real-time OS).

If possible I would like to design the interrupt handler such that it pulls packets out of the NIC (RTL8019AS), and simultaneously:

  1. Checksums the payload for a final ACK/NAK decision
  2. Parses the incoming data, translating control codes into function pointers and placing these (or the non-control-code data) into the main-loop's receive buffer

By doing these two tasks simultaneously, the main-loop can be constructed by simply casting the receive buffer's head into a function pointer and then calling that function.
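A minimal C sketch of that receive buffer, to make the idea concrete. Everything named here is an assumption for illustration only: the control code 0x01, the handlers `on_data` and `on_reset`, the ring size, and the commit/abort split. The split is one way to keep entries hidden from the main-loop until the checksum decision has been made.

```c
#include <stdint.h>

/* Hypothetical handler type: every received byte maps to one of these. */
typedef void (*handler_fn)(uint8_t arg);

typedef struct {
    handler_fn fn;   /* chosen by the interrupt handler while parsing */
    uint8_t    arg;  /* the raw byte, passed through to the handler */
} rx_slot;

#define RX_SLOTS 16u             /* assumed size, must be a power of two */
#define RX_MASK  (RX_SLOTS - 1u)

static rx_slot rx_ring[RX_SLOTS];
static volatile uint8_t rx_head; /* committed entries end here (ISR writes) */
static volatile uint8_t rx_tail; /* next entry to run (main-loop writes) */
static uint8_t rx_stage;         /* ISR-private write index, not yet committed */

/* Illustrative handlers; a real application would act on the device. */
static unsigned data_sum, resets;
static void on_data(uint8_t b)  { data_sum += b; }
static void on_reset(uint8_t b) { (void)b; resets++; }

/* Translate a byte into its handler. 0x01 is a made-up control code. */
static handler_fn lookup(uint8_t byte)
{
    return (byte == 0x01u) ? on_reset : on_data;
}

/* ISR side: stage one parsed byte. Returns 0 if the ring is full. */
static int rx_push(uint8_t byte)
{
    uint8_t next = (uint8_t)((rx_stage + 1u) & RX_MASK);
    if (next == rx_tail)
        return 0;
    rx_ring[rx_stage].fn  = lookup(byte);
    rx_ring[rx_stage].arg = byte;
    rx_stage = next;
    return 1;
}

/* ISR side: checksum was good, publish the staged entries. */
static void rx_commit(void) { rx_head = rx_stage; }

/* ISR side: checksum was bad, discard the staged entries. */
static void rx_abort(void)  { rx_stage = rx_head; }

/* Main-loop side: run every committed entry in order. */
static void rx_dispatch(void)
{
    while (rx_tail != rx_head) {
        rx_slot s = rx_ring[rx_tail];
        rx_tail = (uint8_t)((rx_tail + 1u) & RX_MASK);
        s.fn(s.arg);
    }
}
```

The interrupt handler would call `rx_push()` for each byte as it leaves the NIC, then `rx_commit()` or `rx_abort()` once the checksum is known; the main-loop body is then just `rx_dispatch()`. Storing the pointer as a typed struct member avoids casting raw buffer bytes to a function pointer, at the cost of a few bytes per entry.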

Does such an implementation sound possible, and if so, how effective might it be in increasing device performance?

Are any other optimizations possible?

Thanks very much,

Albert Kennis

Reply to
akennis

Hmmm, depends on who is initiating the connection, but either way, with today's SYN flood prevention algorithms in place (SYN cookies), I think you will not get away without retransmitting either the initial SYN or the SYN/ACK if the need arises.

Um, I don't see why you should not be able to also generate an interrupt based on the timer and deal with segment retransmission there. Since this happens only occasionally anyway, I don't think that the penalty would be that noticeable.

Sounds good to me. Consider writing this function in assembly, because the main performance gain will come from the fact that you load each word of data from the controller into the CPU exactly once, add it to the checksum in a CPU register at the same time, and then store it in the buffer you intend to pass to the main loop. If you code that in C (on the H8), I bet the data will be stored on the stack, which obviously will need many more cycles. Also don't forget that your answering segments must have a proper checksum. You could prepare a default segment for this and only adjust the otherwise precalculated checksum for the changes you make.
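To make the two checksum ideas concrete, here is a rough C sketch of the logic only (the assembly version would follow the same steps). The function names are made up for illustration. The source pointer stands in for the NIC data port, which on real hardware would be a fixed address rather than an incrementing one, and the words are assumed to arrive already in the byte order the checksum expects. Odd-length payloads are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `words` 16-bit words from src to dst, reading each source word
 * exactly once and adding it to a running ones-complement sum.
 * `sum` may be seeded, e.g. with the TCP pseudo-header sum. */
static uint32_t copy_and_sum(uint16_t *dst, const volatile uint16_t *src,
                             size_t words, uint32_t sum)
{
    while (words--) {
        uint16_t w = *src++;  /* the single read of this word */
        sum += w;             /* accumulate while it is in a register */
        *dst++ = w;           /* store into the main loop's buffer */
    }
    return sum;
}

/* Fold the carries back in and complement to get the checksum field. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Adjust an existing checksum after one 16-bit word of a prepared
 * segment changes from old_word to new_word, following RFC 1624:
 * HC' = ~(~HC + ~m + m'). */
static uint16_t csum_update16(uint16_t check, uint16_t old_word,
                              uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return csum_fold(sum);
}
```

For a prebuilt ACK segment, only the fields that change (for example the acknowledgment number, one 16-bit half at a time) would go through `csum_update16()`, so the rest of the segment never needs to be summed again.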

Sure.

IMHO it depends on what you intend to optimize. If it's pure TCP throughput, then using a bigger receive window could help, at the cost of more elaborate segment buffering. A very important factor is whether simultaneous connections in parallel are possible or whether your stack should only deal with one connection at a time. In the latter case there will obviously be less overhead involved with buffering. If you are after optimizing CPU resource usage, you will want to have fewer buffers and rely on the remote end to do more retransmits if needed, etc. You could also consider whether you need SYN flood protection (assuming you accept incoming connections) or not, and also whether you want to accept early segments or not.

HTH

Markus

Reply to
Markus Zingg
