A tool, written in C [94], was created to analyse DCCP² connections and Second Life circuits. It was created as a single program so that it could also be used to analyse Second Life circuits transported by DCCP packets. The program can also analyse DCCP packets that are encapsulated inside UDP packets. It distinguishes connections by their source and destination addresses. TCP connections are also detected and their packets grouped, so that the graphs the program generates from the trace can include information about TCP connections.
1 V. Jacobson, C. Leres, S. McCanne, et al. Tcpdump. http://www.tcpdump.org/, 1989. [Online; accessed 03-November-2010]
2 DCCP is a protocol designed for datagram traffic like that of Second Life and OpenSim. We have developed a user space implementation and have demonstrated it working with a modified OpenSim server and client. This is documented in appendix B.
The number of packets and the number of bytes are counted for different intervals. The intervals used are 10 milliseconds, 100 milliseconds, 1 second and 1 minute. The numbers of packets and bytes per second are calculated from these counts. The numbers of bytes and packets for the different protocols are also counted and output for the different time intervals. The supported protocols are TCP, UDP, DCCP and Second Life circuits. The number of application-level bytes is calculated for the different units of time, both overall and per protocol. The maximum number of bytes per second and the maximum number of application bytes per second are also calculated. The distribution of packet sizes is recorded and output as a probability distribution function (PDF) and a cumulative distribution function (CDF). The frequency distribution of the number of bytes per second is recorded and output as a PDF.
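The interval counting and the packet size distributions described above can be sketched in C as follows. This is only an illustration and not the SLparse source: the names `interval_counter`, `size_hist` and their functions are hypothetical, timestamps are assumed to be in microseconds, and packet sizes above an assumed limit of 1500 bytes are folded into the last histogram bin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PKT_SIZE 1500  /* assumed largest size tracked; bigger packets share the last bin */

/* Hypothetical accumulator for one interval length (10 ms, 100 ms, 1 s or 1 min). */
struct interval_counter {
    uint64_t interval_us;  /* interval length in microseconds */
    uint64_t slot;         /* index of the interval currently being filled */
    uint64_t packets;      /* packets seen in the current interval */
    uint64_t bytes;        /* bytes seen in the current interval */
    uint64_t max_bytes;    /* largest byte count of any finished interval */
};

void counter_init(struct interval_counter *c, uint64_t interval_us)
{
    assert(interval_us > 0);
    memset(c, 0, sizeof *c);
    c->interval_us = interval_us;
}

/* Add one packet; when the timestamp falls in a new interval, close the old one first. */
void counter_add(struct interval_counter *c, uint64_t ts_us, uint32_t len)
{
    uint64_t slot = ts_us / c->interval_us;
    if (slot != c->slot) {
        if (c->bytes > c->max_bytes)
            c->max_bytes = c->bytes;
        c->slot = slot;
        c->packets = 0;
        c->bytes = 0;
    }
    c->packets++;
    c->bytes += len;
}

/* Maximum bytes per interval, including the interval still being filled. */
uint64_t counter_max_bytes(const struct interval_counter *c)
{
    return c->bytes > c->max_bytes ? c->bytes : c->max_bytes;
}

/* Histogram of packet sizes from which the PDF and CDF are derived. */
struct size_hist {
    uint64_t count[MAX_PKT_SIZE + 1];
    uint64_t total;
};

void hist_add(struct size_hist *h, uint32_t len)
{
    if (len > MAX_PKT_SIZE)
        len = MAX_PKT_SIZE;
    h->count[len]++;
    h->total++;
}

/* Fraction of packets with exactly this size. */
double hist_pdf(const struct size_hist *h, uint32_t len)
{
    assert(len <= MAX_PKT_SIZE);
    return h->total ? (double)h->count[len] / (double)h->total : 0.0;
}

/* Fraction of packets with this size or smaller. */
double hist_cdf(const struct size_hist *h, uint32_t len)
{
    uint64_t below = 0;
    assert(len <= MAX_PKT_SIZE);
    for (uint32_t i = 0; i <= len; i++)
        below += h->count[i];
    return h->total ? (double)below / (double)h->total : 0.0;
}
```

One counter would be kept per interval length and per protocol, with a zeroed `size_hist` filled once per packet; the values returned by `hist_pdf` and `hist_cdf` could then be written out one size per line for plotting.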
The number of bytes and packets per second is also recorded for each connection or flow. The data is output in the correct format for generating graphs using gnuplot³. The per-connection data is output separately for each connection so that it can all be drawn on a single graph.
The program also outputs a textual representation of the trace, similar to that of tcpdump, giving the basic information about each packet. Information about the state of the connection is included in this output.
3.2.1 Related work
Several other programs for analysing traffic are similar to SLparse and are described below; however, none of them has the desired features.
CoMo [78] is a system for generating statistics about network traffic. It has input plug-ins that read packets either from the network or from files containing previously captured traffic. The analysis is done by modules, each of which receives every packet along with a piece of memory that holds information about the flow to which the packet belongs. As CoMo is designed to generate statistics of values relative to time, the flow objects are stored away at each sampling interval. The system allows a large variety of modules to be created. The modules cannot be chained together, so two modules that perform similar tasks must each perform the task separately.
CoralReef [95] is a system that can be used to perform packet capture and analysis. It contains two software stacks for parsing network traffic: one that parses packets and one that handles flows. It can read packets from a variety of sources, including libpcap, libpcap capture files and files in its own format. It is implemented as a library that can be used to create network monitoring programs. It can use libpcap to apply BPF filters to the traffic that it reads in.
Bro [149]⁴ is an Intrusion Detection System (IDS) which uses port numbers [167, 168, 189] and pattern matching [36] to identify user-level protocols. It then runs the packets through modules that track the state of application and transport level sessions to detect suspect behaviour. Bro uses libpcap to capture packets and BPF filters to take only the packets that it needs from the network. The packets are then passed to a system that examines the behaviour of connections, protocols and hosts to look for anomalous behaviour and generate events. The events can cause a number of things to happen, including generating logs and running scripts.
Wireshark⁵ is a system that can perform live capture and analysis of traffic as well as offline analysis of captured traffic. It can identify protocols and capture files being transferred over the network as well as perform analysis of application level protocols. Wireshark is capable of reading and writing a number of different packet capture formats including the libpcap format.
The structure and operation of Nprobe [126] are described in [127]. Nprobe has modules to process and analyse the IP and TCP headers as well as modules to process application level protocols. The paper describes the Hypertext Transfer Protocol (HTTP) and HyperText Markup Language (HTML) modules, which can examine HTTP connections and extract the transmitted files. The HTML module can then parse the HTML to extract information on browsing patterns and references to other objects. This information can be used to create web load generators to test and compare web servers, and to identify unnecessary overheads in web browsers and servers. The paper also states that Nprobe has more modules and can be used for a variety of other purposes. A visualisation tool constructed as part of Nprobe is described in [70].
Nprobe has also been used to study the behaviour and performance of web browsers [71]. The transferred HTML is examined to identify the HTTP and Domain Name System (DNS) connections that are part of the page. This allows the performance of the web browsers to be examined.
Snort⁶ is an IDS which uses libpcap to capture traffic. It uses pattern matching and protocol analysers to detect and block unwanted use, or misuse, of a network.
3 T. Williams and C. Kelley. GNUplot: an interactive plotting program. http://www.gnuplot.info/, 1998.
4 Vern Paxson. Bro. http://bro-ids.org/ [Online; accessed 29-September-2010]
5 Gerald Combs. Wireshark. http://www.wireshark.org/, 1998. [Online; accessed 29-September-2010]
6
Benko and Veres [12] describe a method of estimating the packet loss rate. To give an accurate rate, the method ignores packets whose delivery status cannot be accurately determined. This only works if the packets whose delivery status can be accurately determined are representative of all of the packets.
The programs described above do not have the desired features. The only program that can recognise and parse SLP packets is Wireshark, but it cannot produce the desired analysis or the tables and graphs used in this thesis.
3.2.2 DCCP
For DCCP connections, SLparse keeps track of the state of the connection and of its features. Information about the number and size of packets is recorded and used to generate graphs. By keeping track of the features, the program can determine which congestion control system or systems the connection is using, and can then track the state of the congestion control system. Packet loss can be detected and the round trip time (RTT) calculated. DCCP uses time stamps and includes delay information in packets, so the network delay can be calculated separately from the other delays. When packets are captured at the ends of the connection, the sending and receiving times of the time stamp packets are known. The DCCP macro state is tracked using an implementation of the state machine that can cope with the loss of some of the packets that lead to state transitions. The state of the connection after each packet has been sent or received is included in the textual output, as are the contents of the headers and the options.
3.2.3 Second Life
Support was added to SLparse to allow it to track the state of Second Life [115] circuits. It groups together packets that belong to the same circuit and treats each circuit like a connection for the purposes of the functions that are not specific to Second Life traffic. Support is included to parse the headers of packets [153] and print this information. The packet type information is then used to group packets into the throttle groups and to generate graphs from this information. Information about the size of packets, for different types of packet and for different circuits, is extracted by the program along with the packet type.
3.2.4 Passive loss estimation
Second Life packets have sequence numbers which can be used to identify packet loss. The loss detection system used by Second Life is a timeout based system that operates at the receiving end of the connection and relies on detecting gaps in the sequence range. The timeout is 16 times the RTT. The system keeps track of the sequence number that it expects to see next. If this sequence number is seen, the expected value is incremented. Otherwise the sequence number is searched for in the list of sequence numbers of potentially lost packets and, if it is found, it is removed from the list. If it is not found and the packet is not marked as having been re-sent, all of the sequence numbers between the expected one and this one are added to the list of possibly lost packets, and the next expected sequence number is set to this packet's sequence number plus one. The detected loss can also be used in the calculation of fair transmission rates [27].
Oliver et al. [143] describe a method for estimating packet loss in TCP connections. The packets of each connection are first grouped together. For each connection, an object keeps track of which packets have been seen and which have been acknowledged. When the connection starts, the initial sequence numbers of each side are taken from the packets with the SYN flag set; this also identifies the client and server ends of the connection. When packets travel across the Internet there is a chance that they will arrive in a different order from that in which they were sent. The connection keeps track of the sequence number of the next expected in-order packet; when packets arrive in the correct order, this is advanced to the last sequence number of each packet as it is processed. When a packet's sequence number is greater than the expected value, the list of non-contiguous blocks of sequence numbers is examined. When a packet's sequence number matches the expected value, the value is updated and the non-contiguous blocks are examined. Packet loss estimation is also examined by Sommers et al. [178].
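The gap tracking described for Second Life circuits can be sketched in C as below. This is an illustrative reading of the description rather than the code used by Second Life or SLparse. The names `loss_tracker`, `tracker_packet` and `tracker_expire` are hypothetical, the list of possibly lost packets is a fixed-size array, sequence number wrap-around is ignored, and a packet below the expected value that is not in the list is simply ignored. Treating an entry as lost once it has waited longer than 16 times the RTT is an assumption about how the timeout is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 1024  /* hypothetical cap on the number of tracked gaps */

struct pending {
    uint32_t seq;       /* sequence number that has not been seen yet */
    double   noted_at;  /* time (seconds) at which the gap was noticed */
};

struct loss_tracker {
    uint32_t expected;                 /* next sequence number expected */
    struct pending list[MAX_PENDING];  /* possibly lost packets */
    size_t   n_pending;
    uint64_t lost;                     /* packets declared lost so far */
};

void tracker_init(struct loss_tracker *t, uint32_t first_seq)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    t->expected = first_seq;
}

/* Remove seq from the possibly-lost list; returns 1 if it was present. */
static int remove_pending(struct loss_tracker *t, uint32_t seq)
{
    for (size_t i = 0; i < t->n_pending; i++) {
        if (t->list[i].seq == seq) {
            t->list[i] = t->list[--t->n_pending];
            return 1;
        }
    }
    return 0;
}

/* Process one received packet. 'resent' is the packet's re-sent flag. */
void tracker_packet(struct loss_tracker *t, uint32_t seq, int resent, double now)
{
    if (seq == t->expected) {        /* in order: advance */
        t->expected++;
        return;
    }
    if (remove_pending(t, seq))      /* late arrival fills a gap */
        return;
    if (resent || seq < t->expected) /* nothing new to learn (assumption) */
        return;
    for (uint32_t s = t->expected; s < seq; s++) {
        if (t->n_pending < MAX_PENDING) {
            t->list[t->n_pending].seq = s;
            t->list[t->n_pending].noted_at = now;
            t->n_pending++;
        }
    }
    t->expected = seq + 1;
}

/* Declare entries lost once they have waited longer than 16 * rtt. */
size_t tracker_expire(struct loss_tracker *t, double now, double rtt)
{
    size_t expired = 0;
    size_t i = 0;
    while (i < t->n_pending) {
        if (now - t->list[i].noted_at > 16.0 * rtt) {
            t->list[i] = t->list[--t->n_pending];
            expired++;
        } else {
            i++;
        }
    }
    t->lost += expired;
    return expired;
}
```

A caller would invoke `tracker_packet` for every packet of a circuit and `tracker_expire` periodically with the current RTT estimate; the `lost` field then gives the running count of packets presumed lost.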
3.2.5 Passive RTT estimation
Second Life uses special packets for estimating the RTT of the circuit, which makes calculating the RTT straightforward. A packet of type StartPingCheck, which contains an ID number, is sent by the side wanting an RTT calculation. The other end sends back a packet of type CompletePingCheck in response; this packet carries the ID number of the StartPingCheck packet that it answers. When the CompletePingCheck packet is received, the time between its arrival and the time the last ping was sent is calculated. If the ID of the last StartPingCheck sent differs from this packet's ID, the difference is multiplied by 5 and added to the time, because pings are sent out at a constant rate. The maximum of this value, the old value and the time the system has been waiting for a reply with an ID higher than this packet's is taken. This value is then multiplied by 0.8 and added to the new sample multiplied by 0.2. This smooths the RTT while ensuring that the system reacts quickly to increases in RTT.
Jiang and Dovrolis [88] describe two methods for estimating the RTT of a connection. The first works by taking the time between the SYN and the ACK of the 3-way handshake. To achieve this, the time of the first packet in the connection is recorded and the number of packets in the connection is counted. If the third packet results in the connection reaching the ESTABLISHED state, and the time between the first and second packets and the time between the second and third packets are roughly equal, then the time between the first packet and the third packet is taken to be the RTT. This method results in one or zero RTT calculations per connection. For long connections the value is possibly only accurate for a small part of the connection. The technique is also not entirely reliable, as the handshake phase of a connection is not handled by the same code as the rest of the connection. The second method involves taking the time between the bursts of packets during the slow start phase of the connection.
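The handshake-based estimate can be sketched in C as follows. This is only an illustration of the check described above and is not taken from [88]. The names `handshake`, `handshake_packet` and `handshake_rtt` are hypothetical, and since the text says only that the two gaps must be roughly equal, the relative `tolerance` parameter is an assumption.

```c
#include <assert.h>

struct handshake {
    int    packets;  /* packets seen on the connection so far */
    double t[3];     /* arrival times of the first three packets, in seconds */
};

/* Record the arrival of one packet on the connection. */
void handshake_packet(struct handshake *h, double now)
{
    if (h->packets < 3)
        h->t[h->packets] = now;
    h->packets++;
}

/*
 * Return the handshake RTT estimate in seconds, or a negative value when no
 * estimate can be made. 'established' says whether the third packet moved
 * the connection to the ESTABLISHED state. 'tolerance' is the largest
 * accepted difference between the two gaps, as a fraction of the larger gap.
 */
double handshake_rtt(const struct handshake *h, int established, double tolerance)
{
    assert(tolerance >= 0.0);
    if (h->packets < 3 || !established)
        return -1.0;

    double gap1 = h->t[1] - h->t[0];  /* first to second packet */
    double gap2 = h->t[2] - h->t[1];  /* second to third packet */
    double diff = gap1 > gap2 ? gap1 - gap2 : gap2 - gap1;
    double larger = gap1 > gap2 ? gap1 : gap2;

    if (diff > tolerance * larger)    /* gaps are not roughly equal */
        return -1.0;
    return h->t[2] - h->t[0];
}
```

With a tolerance of 0.25, packets at 0.00 s, 0.05 s and 0.11 s would give an estimate of about 0.11 s, whereas packets at 0.00 s, 0.01 s and 0.20 s would be rejected because the two gaps differ too much.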
The technique described in [23] improves on the accuracy and quantity of the RTT estimations performed by other techniques. It is designed to closely follow the technique used by implementations of TCP, as described in [4]. The RTT is calculated by taking the time between a packet and its ACK. This gives the RTT between the monitor and the receiving host, so the RTT is calculated separately for each direction and the RTTs for both directions are then added together to get the RTT for the connection. The technique works by recording the time at which each sequence number was last received. When a packet with a valid acknowledgement number is received, all of the sequence numbers going in the opposite direction that are less than the acknowledgement number are examined to find the time between the ACK and the packet that triggered it. The record of the acknowledged packet is then removed from the list.
When packet retransmission happens, TCP implementations stop calculating the RTT, as the relationship between packets being sent and packets being received becomes less predictable. To achieve this, each connection uses a Boolean variable for each direction, which is set to false when a retransmission is detected. While this variable is false the RTT is not updated, but if the ACK number is larger than the largest retransmitted sequence number for that direction then the variable is set to true. If the variable is true the RTT for that direction is updated: the difference between the time of the ACK and the time taken from the list of sequence numbers is calculated, and if the RTT is zero it is set to the calculated value, otherwise it is set to 7/8 of the old RTT plus 1/8 of the calculated value, as described in [148]. This can result in a greater number of RTT calculations than other systems provide if there is useful traffic in both directions. If there is no traffic in one direction a connection may produce no RTT calculations.
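The per-direction bookkeeping described above can be sketched in C as below. This is an illustrative reading and not the implementation from [23] or SLparse. The names `rtt_dir`, `dir_data`, `dir_ack` and `connection_rtt` are hypothetical, outstanding segments are kept in a fixed-size table, and sequence number wrap-around is ignored. A retransmission is recognised only when a segment still in the table is seen again, and the acknowledged segment with the highest end sequence number is assumed to be the packet that triggered the ACK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OUTSTANDING 256  /* hypothetical cap on unacknowledged segments */

struct segment {
    uint32_t end_seq;  /* acknowledgement number that covers this segment */
    double   seen_at;  /* time (seconds) the segment was last seen */
    int      used;
};

struct rtt_dir {
    struct segment seg[MAX_OUTSTANDING];
    int      valid;         /* false after a retransmission is detected */
    uint32_t max_retx_seq;  /* largest retransmitted sequence number */
    double   rtt;           /* smoothed RTT in seconds, 0 = no estimate yet */
};

void dir_init(struct rtt_dir *d)
{
    assert(d != NULL);
    memset(d, 0, sizeof *d);
    d->valid = 1;
}

/* A data segment ending at end_seq was seen travelling in this direction. */
void dir_data(struct rtt_dir *d, uint32_t end_seq, double now)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!d->seg[i].used) {
            if (free_slot < 0)
                free_slot = i;
        } else if (d->seg[i].end_seq == end_seq) {
            /* Seen before: treat as a retransmission and pause sampling. */
            d->valid = 0;
            if (end_seq > d->max_retx_seq)
                d->max_retx_seq = end_seq;
            d->seg[i].seen_at = now;
            return;
        }
    }
    if (free_slot >= 0) {
        d->seg[free_slot].end_seq = end_seq;
        d->seg[free_slot].seen_at = now;
        d->seg[free_slot].used = 1;
    }
}

/* An ACK for this direction's data was seen travelling the other way. */
void dir_ack(struct rtt_dir *d, uint32_t ack, double now)
{
    int found = 0;
    uint32_t best_seq = 0;
    double best_time = 0.0;

    if (!d->valid && ack > d->max_retx_seq)
        d->valid = 1;  /* the ACK has passed the retransmitted data */

    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (d->seg[i].used && d->seg[i].end_seq <= ack) {
            if (!found || d->seg[i].end_seq > best_seq) {
                found = 1;
                best_seq = d->seg[i].end_seq;
                best_time = d->seg[i].seen_at;
            }
            d->seg[i].used = 0;  /* acknowledged: drop the record */
        }
    }
    if (found && d->valid) {
        double sample = now - best_time;
        d->rtt = (d->rtt == 0.0) ? sample
                                 : 0.875 * d->rtt + 0.125 * sample;
    }
}

/* RTT of the whole connection: the sum of the two directions. */
double connection_rtt(const struct rtt_dir *a, const struct rtt_dir *b)
{
    return a->rtt + b->rtt;
}
```

Each connection would hold two `rtt_dir` values, one per direction, calling `dir_data` for data packets and `dir_ack` on the opposite direction's state for acknowledgements.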
3.2.6 Summary
The SLparse program was created to analyse the packets that are investigated in this dissertation. It can examine DCCP, UDP, TCP and SLP. The loss of packets in a connection can be detected and RTT values can be calculated. The program can monitor the state of connections and can generate statistics from information extracted from the packets.