I need a Linux TCP stack guru

I am looking for someone who knows the internals of the TCP implementation
on Linux (2.6.10 or thereabouts).  Here's a brief overview of the issue I'm
trying to resolve:

Background:
I'm trying to optimize transfers over a local GigE connection.  The Linux
machine (MIPS) is supposed to send 500K+ of data using a single send()
function from the test application.  The socket buffer size is set to more
than 1MB.  Nagle is disabled (not that it should matter in this case).  I've
essentially disabled congestion control by initializing tcp_cwnd to something
like 128.  I've done everything I can think of to make sure the kernel and/or
TCP stack have no reason to do anything but send this chunk of TCP data as
fast as possible.
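For reference, the socket setup looks roughly like the sketch below.  It's only
an illustration of the configuration described above (the peer address, port,
and exact sizes are placeholders, not the real test application):

/* Minimal sketch of the kind of test sender described above.
 * The peer address, port, and sizes are placeholders, not the real test app. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    /* Socket buffer larger than the whole transfer (>1MB).
     * Note the kernel may clamp this to net.core.wmem_max. */
    int sndbuf = 2 * 1024 * 1024;
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    /* Disable Nagle, as in the test described above. */
    int one = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                        /* placeholder port */
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);  /* placeholder peer */

    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect"); return 1;
    }

    /* Hand the whole 500K+ chunk to the kernel in a single send(). */
    size_t len = 512 * 1024;
    char *buf = calloc(1, len);
    ssize_t sent = send(sock, buf, len, 0);
    printf("send() returned %zd\n", sent);

    free(buf);
    close(sock);
    return 0;
}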

Problem:
Whenever the Linux TCP stack receives a packet from the peer advertising a
larger window, there seems to be a delay of about 350 microseconds before any
further TCP processing occurs on this connection.  This happens BEFORE the
peer's window ever shrinks enough to force the Linux machine to stop sending,
so it's not a case of the window closing and Linux having to wait before
sending more data to the peer.

Analysis:
Doing the math, this chunk should transfer in under 5 milliseconds (really,
closer to 4 ms).  Instead, it's taking around 20 ms.  There are 41 of these
window-opening delay events in my test transfer, adding at least 15 ms to the
transfer time.

I don't know if I've explained this as clearly as I'd like.  I could really
use a quick chat with someone who knows the workings of the Linux stack
inside and out (especially with regards to congestion control and ACK/
window processing).

Patrick
========= For LAN/WAN Protocol Analysis, check out PacketView Pro! =========
    Patrick Klos                           Email: snipped-for-privacy@klos.com
    Klos Technologies, Inc.                Web:   http://www.klos.com /
==================== http://www.loving-long-island.com/ ====================

Re: I need a Linux TCP stack guru
Are you being bitten by TCP's slow start here?  TCP connections do a slow
start just in case the connection crosses a congested link, so that they don't
make the situation worse.  After some period of good ACKs and good RTTs, TCP
ramps up to full throughput.

It's a known problem with TCP on very fast, uncongested networks, and can
restrict TCP throughput.  It also hits applications where there are lots and
lots of small TCP sessions (like the web :-().

Check out RFC 2001; Google returns loads of references.
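
One way to see whether slow start is actually what's limiting the sender is to
poll the connection with getsockopt(TCP_INFO) and watch the congestion window
grow.  This is only a rough sketch (TCP_INFO is there on 2.6 kernels, but the
exact set of tcp_info fields varies with kernel version):

/* Rough sketch: sample the congestion window of a connected TCP socket.
 * Assumes a 2.6-era or later kernel; tcp_info fields vary by version. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

void dump_cwnd(int sock)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
        /* snd_cwnd is in segments; rtt is in microseconds. */
        printf("cwnd=%u ssthresh=%u rtt=%uus\n",
               info.tcpi_snd_cwnd, info.tcpi_snd_ssthresh, info.tcpi_rtt);
    }
}

Calling something like this from the sending loop (or a second thread) during
the transfer shows whether cwnd is still ramping up when the stalls happen.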



Re: I need a Linux TCP stack guru

Thanks for the reply.  Although slow start may also be involved, I determined
that the primary reason I was seeing these delays was interrupt coalescing.
When I disabled interrupt coalescing on the Ethernet adapter, my transfer
times became consistently shorter.
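
(For anyone hitting the same thing: interrupt coalescing is usually turned off
from user space with ethtool, e.g. "ethtool -C eth0 rx-usecs 0 rx-frames 1",
depending on what the driver supports.  The same can be done programmatically
through the SIOCETHTOOL ioctl; the sketch below is only illustrative, and the
interface name and supported fields depend on the NIC driver.)

/* Illustrative only: query and disable interrupt coalescing on "eth0"
 * via the SIOCETHTOOL ioctl.  Which fields matter (and whether the driver
 * supports the operation at all) varies; ethtool -C does the same job
 * from the command line. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct ifreq ifr;
    struct ethtool_coalesce ec;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);   /* placeholder interface */
    ifr.ifr_data = (char *)&ec;

    /* Read the current coalescing settings. */
    memset(&ec, 0, sizeof(ec));
    ec.cmd = ETHTOOL_GCOALESCE;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_GCOALESCE"); return 1; }
    printf("rx-usecs=%u rx-frames=%u\n",
           ec.rx_coalesce_usecs, ec.rx_max_coalesced_frames);

    /* Ask the driver to interrupt on every received frame. */
    ec.cmd = ETHTOOL_SCOALESCE;
    ec.rx_coalesce_usecs = 0;
    ec.rx_max_coalesced_frames = 1;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_SCOALESCE"); return 1; }

    close(fd);
    return 0;
}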


I'll check that out.  I'm still seeing symptoms that look slow-start-like, but
they don't happen all the time.  Does Linux TCP "remember" congestion
information on a per-interface basis rather than on a per-connection basis?

Patrick

Re: I need a Linux TCP stack guru


Can't see how it could, offhand.  It might cache connection info by
destination, just in case there are multiple TCP sessions to the same
endpoint - that sounds like it would be a neat optimisation - but sorry, I'm
no Linux kernel TCP gearhead, so I don't know.  What kernel version are you
using?




Re: I need a Linux TCP stack guru

It's kept in the metrics portion of the routing cache.  It's based on the
broader route selection criteria, not the interface.  The stored metrics
include things like RTT, cwnd, initial cwnd, slow-start threshold, PMTU,
negotiated MSS, etc.  TCP also has per-connection state, of course.
Storing metrics in the routing tables seems pretty common; I know of several
other TCP implementations that do the same (e.g. Sun Solaris, at least as of
a few years ago).  It's the obvious way of doing it, since the route picked
greatly affects network behavior, and two connections to the same address can
end up with different routes, so they may need different metrics.
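
So a fresh connection can inherit a cached ssthresh/cwnd from an earlier
connection to the same destination, which would explain intermittent
slow-start-like behavior.  If you want to rule that out, newer kernels have a
net.ipv4.tcp_no_metrics_save sysctl that stops TCP from saving metrics when a
connection closes (check that your 2.6.10 tree actually has it).  A trivial
way to flip it from a test harness, purely as an illustration:

/* Illustration only: disable saving of TCP metrics into the route cache
 * by writing to /proc/sys/net/ipv4/tcp_no_metrics_save (equivalent to
 * "sysctl -w net.ipv4.tcp_no_metrics_save=1").  The sysctl may not exist
 * on older 2.6 kernels. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_no_metrics_save", "w");
    if (!f) {
        perror("tcp_no_metrics_save");   /* kernel too old, or not root */
        return 1;
    }
    fputs("1\n", f);
    fclose(f);
    return 0;
}

Flushing the routing cache between test runs (ip route flush cache) is another
way to start each connection with clean metrics.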
