Extracting the Congestion Window from the TCP Linux Kernel

By: Leslie Raju Cherian

 

Introduction:

The Transmission Control Protocol (TCP) is the most commonly used transport protocol in the Internet. It provides reliable, in-order data transfer between two Internet hosts, freeing the programmer from details such as the retransmission of lost packets. Another aspect of TCP, its congestion control algorithms, is crucial to the stability of the Internet. These algorithms enable TCP to respond to signs of congestion in the network by reducing the rate at which it sends data. Before these algorithms were added to TCP, the Internet experienced episodes of congestion collapse in which the network was full of data, but nearly all of it was dropped before reaching its final destination.

Internet researchers have designed and implemented many variants of TCP congestion control algorithms. For example, the earliest TCP congestion control algorithms are often referred to as TCP Tahoe. Subsequent changes have gone by names such as TCP Reno, TCP New Reno, TCP SACK, TCP FACK, TCP Vegas and others. Many of these variants are available in the Linux kernel by changing the settings of certain compile-time flags.

Our goal in this Linux Challenge entry was to develop a framework for tracking the behavior of TCP’s congestion control algorithms in the Linux kernel. We begin by observing that most aspects of TCP behavior are visible in traces of network activity that can be captured with network protocol analyzers like Ethereal. However, many of the kernel variables that track congestion control, such as snd_cwnd, are not directly visible in these traces. We modified the Linux kernel to print out the values of important congestion control variables along with other information that allows us to correlate changes in congestion control with the network activity seen in Ethereal traces. By making these important parameters visible, we hope to both increase the understanding of these algorithms and provide a window into diagnosing any congestion control problems.

 

Exposing the TCP variables:

The first problem we ran into in trying to expose such parameters was identifying the actual variables in the kernel code. After a detailed examination of the kernel source, we identified snd_cwnd as the congestion window and rto as the retransmission timer. These variables are defined in the file /source/include/net/sock.h as fields in the TCP options structure tcp_opt.

We then printed these variables to the system log at /var/log/messages. Along with the congestion window and the retransmission timer, we printed the sequence number and the timestamp of the packets so that we could correlate our results with the Ethereal traces. The lines of code that we added to the tcp_sendmsg() function in /net/ipv4/tcp.c are shown below.

 

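/* Congestion window, counted in segments */
printk(" C %u", tp->snd_cwnd);

/* Retransmission timeout, in jiffies */
printk(" T %u", tp->rto);

/* Next sequence number to be sent */
printk(" SEQ %u", tp->snd_nxt);

/* Timestamp echo reply most recently received from the peer */
printk(" TSTMP %u", tp->rcv_tsecr);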
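To correlate these log entries with an Ethereal trace, the four values have to be pulled back out of /var/log/messages. The following is a minimal sketch of such a parser, not part of the original project. It assumes each record appears in the log as " C <n> T <n> SEQ <n> TSTMP <n>", which is what the four printk() calls above produce when their output lands on one line; the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One record as emitted by the four printk() calls (names are hypothetical). */
struct cwnd_sample {
    unsigned int cwnd;   /* tp->snd_cwnd  */
    unsigned int rto;    /* tp->rto       */
    unsigned int seq;    /* tp->snd_nxt   */
    unsigned int tstamp; /* tp->rcv_tsecr */
};

/*
 * Find the first complete record in a log line.
 * Returns 1 and fills *out on success, 0 if the line holds no record.
 * Assumed record layout: " C <n> T <n> SEQ <n> TSTMP <n>".
 */
int parse_cwnd_sample(const char *line, struct cwnd_sample *out)
{
    const char *p;
    struct cwnd_sample tmp;

    /* The syslog prefix may itself contain " C ", so try every candidate. */
    for (p = strstr(line, " C "); p != NULL; p = strstr(p + 1, " C ")) {
        if (sscanf(p, " C %u T %u SEQ %u TSTMP %u",
                   &tmp.cwnd, &tmp.rto, &tmp.seq, &tmp.tstamp) == 4) {
            *out = tmp;
            return 1;
        }
    }
    return 0;
}
```

A caller would read /var/log/messages line by line with fgets(), pass each line to parse_cwnd_sample(), and match the resulting seq value against the sequence numbers shown in the Ethereal trace.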

 

Now that we had access to the variables we needed, we ran a few tests to see how the congestion window actually responded to congestion in the network. To generate load, we ran the program ttcp to send varying amounts of data, repeating the runs with different values for the buffer size and the number of packets sent. We automated the whole process using a script, and we inserted a sleep command between ttcp runs so that the network had time to settle down.
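The original automation was a script that is not reproduced here. As a stand-in, the sweep can be sketched in C under some assumptions: the receiver is already running ttcp in receive mode, the sender uses the standard ttcp flags -t (transmit), -s (source a test pattern), -l (buffer length) and -n (number of buffers), and the host address, parameter lists and settle time are hypothetical values supplied by the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Build one ttcp sender command line.
 * Returns 0 on success, -1 if the command does not fit in `out`.
 */
int build_ttcp_command(char *out, size_t outlen, const char *host,
                       int buf_len, int num_bufs)
{
    int n = snprintf(out, outlen, "ttcp -t -s -l %d -n %d %s",
                     buf_len, num_bufs, host);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Run ttcp once for every (buffer length, buffer count) pair, pausing
 * settle_secs seconds after each run so the network can settle.
 * Returns the number of runs that could not be built or that failed.
 */
int run_sweep(const char *host,
              const int *buf_lens, int n_lens,
              const int *num_bufs, int n_nums,
              unsigned int settle_secs)
{
    char cmd[256];
    int i, j, failures = 0;

    for (i = 0; i < n_lens; i++) {
        for (j = 0; j < n_nums; j++) {
            if (build_ttcp_command(cmd, sizeof cmd, host,
                                   buf_lens[i], num_bufs[j]) != 0 ||
                system(cmd) != 0)
                failures++;
            sleep(settle_secs);
        }
    }
    return failures;
}
```

Calling run_sweep() with, say, three buffer lengths and three buffer counts produces nine ttcp runs, each of which leaves its own sequence of records in /var/log/messages.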

We expected the congestion window to keep increasing with every successful transmission, growing without bound over a sufficiently long run of successful transmissions. When we ran the tests, however, we found that the congestion window topped off at a value of 45 maximum segment sizes (MSS) and did not increase beyond that.

We also found that for our longer transmissions, the data in the /var/log/messages file showed retransmissions, but the values of the congestion window did not decrease at all. This was really puzzling. According to the congestion control algorithms that are implemented in the Linux kernel, the congestion window is supposed to decrease dramatically when a loss is detected, and this is what we expected to see. Instead, the congestion window seemed to remain unaffected, even after the loss. Does this mean that the implementation of TCP differs subtly from what is described in the published papers in the field? We would like to learn more about why TCP exhibited this behavior. It underscores why such variables need more visibility in network traces: increased visibility would expose such problems in TCP and help solve them.
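For reference, the behavior we expected can be written down as a simplified textbook model in the style of TCP Tahoe. This sketch is an illustration only and is not the Linux implementation: it updates the window once per round-trip time rather than per ACK, and the struct and function names are hypothetical.

```c
/* Simplified per-RTT model of a Tahoe-style congestion window, in segments. */
struct cc_state {
    unsigned int cwnd;     /* congestion window */
    unsigned int ssthresh; /* slow start threshold */
};

/* One round trip in which every segment was acknowledged. */
void cc_on_rtt(struct cc_state *s)
{
    if (s->cwnd < s->ssthresh) {
        /* Slow start: the window doubles each RTT, up to ssthresh. */
        s->cwnd *= 2;
        if (s->cwnd > s->ssthresh)
            s->cwnd = s->ssthresh;
    } else {
        /* Congestion avoidance: one extra segment per RTT. */
        s->cwnd += 1;
    }
}

/* A loss detected by retransmission timeout. */
void cc_on_timeout(struct cc_state *s)
{
    /* Remember half the current window, but never less than 2 segments. */
    s->ssthresh = (s->cwnd / 2 < 2) ? 2 : s->cwnd / 2;
    /* Restart from a single segment. */
    s->cwnd = 1;
}
```

Starting from cwnd = 1 and ssthresh = 16, five loss-free round trips in this model give windows of 2, 4, 8, 16 and 17 segments; a timeout then sets ssthresh to 8 and drops cwnd back to 1. It is this sharp drop that we looked for in the log and did not find.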

 

Future Work:

The main motivation for this project was to carry these variables in the options field of the TCP header so that they appear in Ethereal traces. Now that we have access to the variables, the next step is to add them to the TCP options portion of the header, so that each captured packet shows the congestion control state that went with it. We would also need to modify Ethereal, which is open source, so that it recognizes the new option.
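To make the idea concrete, here is one possible wire layout for such an option, written as a user-space sketch rather than kernel code. A TCP option consists of a kind byte, a length byte and the option data. The layout below is entirely hypothetical: it uses kind 253, one of the values reserved for experiments, carries snd_cwnd and rto as 32-bit big-endian values, and pads with two NOP options so that the total stays a multiple of four bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_NOP_KIND  1    /* standard one-byte padding option */
#define TCPOPT_EXP_KIND  253  /* experimental kind (hypothetical use) */
#define CWND_OPT_LEN     10   /* kind + length + two 32-bit values */
#define CWND_OPT_PADDED  12   /* option plus two NOPs */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write the option into buf. Returns bytes written, or 0 if buf is too small. */
size_t encode_cwnd_option(uint8_t *buf, size_t buflen,
                          uint32_t cwnd, uint32_t rto)
{
    if (buflen < CWND_OPT_PADDED)
        return 0;
    buf[0] = TCPOPT_EXP_KIND;
    buf[1] = CWND_OPT_LEN;
    put_be32(buf + 2, cwnd);
    put_be32(buf + 6, rto);
    buf[10] = TCPOPT_NOP_KIND;
    buf[11] = TCPOPT_NOP_KIND;
    return CWND_OPT_PADDED;
}

/* Read the option back. Returns 1 on success, 0 if buf does not hold it. */
int decode_cwnd_option(const uint8_t *buf, size_t buflen,
                       uint32_t *cwnd, uint32_t *rto)
{
    if (buflen < CWND_OPT_LEN ||
        buf[0] != TCPOPT_EXP_KIND || buf[1] != CWND_OPT_LEN)
        return 0;
    *cwnd = get_be32(buf + 2);
    *rto  = get_be32(buf + 6);
    return 1;
}
```

The decode function mirrors what a modified Ethereal dissector would have to do when it meets this option kind in a captured header.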

More information about this project is available here.