Library - Adaptive network traffic management for multi user virtual environments

C.3.1 Provided functionality

The library can send and receive DCCP packets carried directly in IP packets, and can also carry DCCP packets inside UDP packets. The packets it produces are valid DCCP packets and interoperate correctly with other implementations. It can run on operating systems that include DCCP support, even while the kernel-level implementation is in use. The library functions create a Unix socket that data can be read from and written to in the same way as a normal socket backed by a kernel-level protocol implementation. The functions provided by the library are modelled on the socket API and operate on this Unix socket. To allow an unmodified program to use the library, a preloadable version is provided that implements the socket API, using USDL for DCCP sockets and the OS socket API for non-DCCP sockets.

User Program Interface

Each interface function takes a file descriptor (FD). This is the FD of the Unix data socket used for reading and writing data on the connection; it is used as a key into a hash map to retrieve a data structure that holds the connection’s control socket. The other arguments to the function are then sent over the control socket to the library process, which calls a function that performs the action and sends back the result.
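The lookup and round trip described above can be sketched as follows. This is a minimal illustration, not the library's code: the table name `control_by_fd`, the helper names and the wire format (one opcode byte, a two-byte length, then an integer result) are all assumptions made for the example.

```python
import socket
import struct

# Hypothetical table: data-socket FD -> control socket for that connection.
control_by_fd = {}

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("control socket closed")
        buf += chunk
    return buf

def send_command(data_fd, opcode, payload=b""):
    """User side: look up the control socket by data FD and send one command."""
    control = control_by_fd[data_fd]
    control.sendall(struct.pack("!BH", opcode, len(payload)) + payload)

def read_result(data_fd):
    """User side: block until the library process returns an integer result."""
    (result,) = struct.unpack("!i", recv_exact(control_by_fd[data_fd], 4))
    return result

def serve_one(control, handler):
    """Library side: read one command, run the handler, send back its result."""
    opcode, length = struct.unpack("!BH", recv_exact(control, 3))
    payload = recv_exact(control, length)
    control.sendall(struct.pack("!i", handler(opcode, payload)))
```

In the real library the two sides run in different processes; here both ends of a `socket.socketpair()` can be driven from one process to exercise the exchange.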

A second library is provided to be preloaded before other libraries, allowing unmodified programs to use the library rather than the Linux kernel implementation. This library includes only the part that runs in the user process and communicates with the other half of the main library.

libpcap is used to receive DCCP/IP packets and raw sockets are used to send them. Packets sent using the raw sockets interface are valid DCCP packets inside IP packets and are compatible with the implementation in the Linux kernel. A filter is set on the incoming socket so that only DCCP packets for open ports are received by the library.

When a port is opened, the existing filter is altered to include the new port and the updated filter is applied to the incoming socket. When a packet is received, the connection to which it belongs is retrieved. The packet and connection are then passed to a function that updates the state of the connection. After the connection has been updated, another function is called to attempt to send a packet.
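One way to rebuild the filter each time the set of open ports changes is sketched below. The source does not give the filter expression the library uses; the one here is an assumption that matches IPv4 packets with protocol number 33 (DCCP) and compares the destination port field found two bytes into the DCCP header. `build_filter` is a hypothetical helper name.

```python
DCCP_PROTO = 33  # IANA-assigned IP protocol number for DCCP

# Offset of the DCCP destination port: IPv4 header length + 2 bytes.
DST_PORT_FIELD = "ip[((ip[0] & 0xf) << 2) + 2 : 2]"

def build_filter(open_ports):
    """Return a pcap filter expression for DCCP/IPv4 packets to open_ports.

    Returns None when no ports are open, meaning nothing should be captured.
    """
    if not open_ports:
        return None
    tests = " or ".join(
        "%s = %d" % (DST_PORT_FIELD, port) for port in sorted(open_ports)
    )
    return "ip proto %d and (%s)" % (DCCP_PROTO, tests)
```

The resulting string would be compiled and installed on the capture handle; regenerating the whole expression from the port set keeps the open and close paths identical.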

Each DCCP connection requires several timers. The OS part of the library keeps a list of functions to call at specific times, together with a function that adds an entry to be run at any given time. When a packet arrives or 10 ms passes, the list of timers is examined and the functions for the expired timers are run.
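A timer list of this kind can be sketched as below. The source describes a list; this sketch swaps in a binary heap so the earliest expiry is always at the front, and the class and method names are hypothetical.

```python
import heapq
import itertools

class TimerList:
    """Callbacks ordered by expiry time (heap used in place of a plain list)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def add(self, when, func, *args):
        """Schedule func(*args) to run once the clock reaches `when` (seconds)."""
        heapq.heappush(self._heap, (when, next(self._seq), func, args))

    def run_expired(self, now):
        """Run every callback whose time has passed; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, func, args = heapq.heappop(self._heap)
            func(*args)
            ran += 1
        return ran
```

The main loop would call `run_expired` with the current time after each packet and after each 10 ms wait that ends without input.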

The library can also carry DCCP packets inside UDP packets, using the normal UDP sockets provided by the OS. Using UDP to transport DCCP packets has the disadvantage of increasing the header overhead, but it also has some advantages. It allows the library to be used in situations where it is not possible to use the raw sockets interface to send DCCP packets, or not possible to capture incoming DCCP packets. Network address translation (NAT) is used on a number of networks on the Internet, and changes at least the source or destination IP address of a packet. As the DCCP header checksum covers a pseudo-header, in the same way as TCP and UDP, this change makes the packet invalid unless the checksum is also updated. NATs therefore need to know about the transport-level protocol. The Linux kernel currently supports NAT for DCCP, but this is not widely used. Carrying DCCP packets in UDP packets allows them to pass through a NAT that is set up to handle UDP.
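The effect of address translation on the checksum can be illustrated with a short sketch. It computes a 16-bit ones' complement checksum over an IPv4 pseudo-header and a segment, in the style shared by TCP, UDP and DCCP. It assumes the checksum covers the whole segment and that the checksum field inside the segment is zero; the function names and addresses are illustrative only.

```python
import socket
import struct

def ones_complement_sum(data):
    """16-bit ones' complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return total

def checksum_with_pseudo_header(src_ip, dst_ip, proto, segment):
    """Checksum over the IPv4 pseudo-header followed by the segment."""
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, proto, len(segment))
    )
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

Computing the checksum for the same segment with the original source address and with the address a NAT would substitute gives two different values, which is why a NAT that rewrites addresses without understanding DCCP produces packets the receiver discards.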

The library can be used both on systems that include support for DCCP and on operating systems that do not. On Linux, when the kernel supports DCCP, the firewall rules are changed to prevent the kernel from responding to packets intended for the library. The operating system and the library share the port range. To prevent both from using the same port, whenever a library socket is bound to a port the library also creates a kernel DCCP socket and attempts to bind it to that port.

Figure C.1: The overall structure of DCCP

C.3.2 Structure

The library is split into two parts: one is part of the user program and the other runs in a separate process (figure C.1). Communication between the user program’s processes and the library process uses Unix sockets: control of the DCCP sockets is carried over byte-stream sockets, and data is moved over ordered, packet-oriented sockets. This allows the library to keep working when the user program creates more processes, as the sockets are then duplicated and only appear closed when there are no more references to them.

When a program tries to use the library and the library process is not already running, the program’s attempt to connect to the process fails. The program then creates the library process and connects to it.

For each connection, a data structure holds the state of the connection and its queues. The state of each feature is stored, and options to be sent on the next non-data packet are queued. Figure C.2 shows the relationship of this data structure to the others. A hash map from port numbers to connections contains the open or opening client ports and the listening server ports. The connections received on each server port are stored in a hash map in the listening socket, keyed by remote port and address.
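The two-level lookup can be sketched as follows. The field and function names are hypothetical, and returning the listening socket itself when no child matches (so that a new request can be handled) is an assumption rather than something the source states.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    local_port: int
    state: str = "CLOSED"
    # Only used by listening sockets: (remote_addr, remote_port) -> Connection
    children: dict = field(default_factory=dict)

# Local port -> client connection or listening socket.
ports = {}

def lookup(local_port, remote_addr, remote_port):
    """Find the connection an incoming packet belongs to, or None."""
    conn = ports.get(local_port)
    if conn is None:
        return None
    if conn.state == "LISTEN":
        # Established children are keyed by the remote end; fall back to the
        # listener itself for a peer that is not yet known (assumption).
        return conn.children.get((remote_addr, remote_port), conn)
    return conn
```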

As the library cannot directly access the network hardware, packets must be passed to the operating system to be sent out. When a user program writes data to the UNIX socket for transmission to the other end of the DCCP connection, the data is first read from the UNIX socket and queued before an attempt is made to send it. Copying data from one buffer to another is time consuming, so when data is put into the queue sufficient space is reserved to construct the packet headers; this reduces the number of memory copies. The data packet is then passed to the operating system. Implementing the network stack in user space and using UNIX sockets for communication means that each piece of data to be sent is written to a socket twice, once by the user program and once by the library process. For this reason the implementation is designed to minimise the amount of other copying that is done. When data is received, it is read out of the packet and sent directly to the user program over the UNIX socket.
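Reserving header space when the payload is queued can be sketched as below. The 64-byte `HEADROOM` value and the function names are assumptions for illustration; the point is that the header is later written in front of the payload in the same buffer, so the payload is never copied a second time.

```python
HEADROOM = 64  # assumed upper bound on header bytes; not taken from the source

def queue_payload(payload):
    """Copy the payload once into a buffer with space reserved for headers."""
    buf = bytearray(HEADROOM + len(payload))
    buf[HEADROOM:] = payload
    return buf

def finish_packet(buf, header):
    """Write the header directly in front of the payload without moving it."""
    start = HEADROOM - len(header)
    if start < 0:
        raise ValueError("header larger than reserved headroom")
    buf[start:HEADROOM] = header
    # A view onto the same buffer: header followed by payload, no extra copy.
    return memoryview(buf)[start:]
```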

The delay between packets being received and being processed is important to the performance of a network protocol. Running the library in user space introduces extra delay into this process. To allow for delay before the transmission of packets, DCCP allows the time elapsed between the receipt of a packet and the sending of its acknowledgement to be reported to the other end; this should reduce the impact of the extra delay. The delayed sending of packets requires timers that expire at the correct time, and the accuracy of these timers is also affected by the library being in user space.

Figure C.2: The structure of the DCCP connection state struct (labels recovered from the figure: dccp_conection_state, connections, feature, head, tail, next, features, local, remote, shortseq, send_ack_vector, NDP, win, ckcksum, ackratio, ECN, CCID, mincksum, ackrec, ackrecs, next, databuf, tosend, ccid_func, local_ccid, remote_ccid)

A single process handles all of the connections. It consists of a loop that reads from a number of sockets and deals with the values read. Incoming packets are read from PCAP packet capture devices. Commands for the sockets are read from the control sockets, and the dispatchComand function is called to deal with them. Data is read from the data sockets. This can result in a large number of FDs to read from; to manage this, poll is used. select was initially used for waiting on the sockets, but it is limited to 1024 FDs, which would limit the library to around 500 connections, as each connection uses both a control socket and a data socket.
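Waiting on many descriptors with poll can be sketched as follows. `wait_ready` is a hypothetical helper; the 10 ms default timeout mirrors the timer tick mentioned earlier and is otherwise an assumption. Unlike select, poll takes a list of descriptors rather than a fixed-size bit set, so it is not capped at 1024 FDs.

```python
import select
import socket

def wait_ready(socks, timeout_ms=10):
    """Return the sockets in `socks` that have data to read within the timeout."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        poller.register(s, select.POLLIN)
        by_fd[s.fileno()] = s
    return [
        by_fd[fd]
        for fd, events in poller.poll(timeout_ms)
        if events & select.POLLIN
    ]
```

A real loop would keep one poller alive and register or unregister sockets as connections open and close, rather than rebuilding it on every call as this sketch does.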

The library has been designed so that it can work with multiple processes, both one program that forks another process and several separate programs. This is accomplished by having the connection handling code in the library process: when a process forks, the operating system duplicates the file descriptors that the process had. The library keeps a small amount of information about the socket in the process’s address space; this is not a problem, as the memory of a process is also duplicated when it forks. When one of the processes closes the socket, that process’s copy of the FD is closed and its data structure freed; this does not affect the FDs or data structures of other processes. The library process only detects that the socket has been closed when the last copy of the other end has been closed, so the earlier closes cause no reaction in the library process. This is consistent with the expected behaviour of sockets, where only the last close results in the connection being closed.

It would be possible to use a more complicated solution in which each process maintained separate connections to the library process for each DCCP connection. This would reduce the possibility of problems caused by several processes using the same UNIX socket, but it would be considerably more complex to implement and would require a larger number of sockets. The library could also have been implemented within a single process, with a thread handling the DCCP connections. This would have the advantage that both would share a single address space, reducing the number of memory copies that need to be performed. Using a thread would, however, have the disadvantage of only allowing the program to have a single process.
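The "only the last close is seen" behaviour can be demonstrated with a short sketch. `os.dup` is used here to stand in for the descriptor copy that a forked child would inherit, which keeps the example in one process; the function names are hypothetical.

```python
import os
import socket

def peer_sees_eof(sock):
    """True once every copy of the other end of `sock` has been closed."""
    sock.setblocking(False)
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False  # other end still open somewhere, just no data yet

def demo():
    lib_end, user_end = socket.socketpair()
    child_copy = os.dup(user_end.fileno())  # stands in for a forked child's copy
    user_end.close()                        # first close: library sees nothing
    after_first = peer_sees_eof(lib_end)
    os.close(child_copy)                    # last close: library sees end of file
    after_last = peer_sees_eof(lib_end)
    lib_end.close()
    return after_first, after_last
```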

The congestion control mechanisms are designed around a structure that points to all of the functions of the congestion control system. This confines the code that is specific to any one congestion control system to the system itself and to the code that selects it. The congestion control system could alternatively have been more tightly integrated into the rest of the library. This would have allowed calls to functions that do nothing to be eliminated, potentially making the implementation more efficient. However, it would make it far more complicated and time consuming to create new congestion control systems for the library.
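The function-table design can be sketched as below. Only `ccid_func` (from figure C.2) and requestSend (written `request_send` here) come from the source; `on_ack`, `ccid_state` and the toy fixed-window mechanism are hypothetical and exist only to show how a mechanism plugs itself into a connection.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class CcidFuncs:
    """Table of entry points that the generic code calls."""
    request_send: Callable[["Connection"], bool]
    on_ack: Callable[["Connection", int], None]  # hypothetical hook

@dataclass
class Connection:
    ccid_func: Optional[CcidFuncs] = None
    ccid_state: Any = None  # private memory owned by the mechanism

def _fw_request_send(conn):
    st = conn.ccid_state
    if st["in_flight"] >= st["window"]:
        return False
    st["in_flight"] += 1
    return True

def _fw_on_ack(conn, acked):
    st = conn.ccid_state
    st["in_flight"] = max(0, st["in_flight"] - acked)

def setup_fixed_window(conn, window=2):
    """Setup function: install the function table and allocate private state."""
    conn.ccid_state = {"window": window, "in_flight": 0}
    conn.ccid_func = CcidFuncs(request_send=_fw_request_send, on_ack=_fw_on_ack)
```

The generic code only ever calls through `conn.ccid_func`, so adding another mechanism means writing a new setup function and its callbacks, with no change elsewhere.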

When data is received from the user program it is first queued, and the congestion control mechanism is then queried to determine whether a packet can be transmitted. The attempt send function examines the queue and calls the requestSend function of the congestion control mechanism. If it succeeds in sending a packet, it repeats the process until there is no more queued data or the congestion control system does not allow any more data to be sent. This allows more than one packet to be sent at a time when the congestion control mechanism permits it. It also gives preference to the connections that are limited by the available bandwidth rather than by the speed at which the program is attempting to send data. The alternatives to using a queue in this manner would be to: 1) Send one or more packets at a time but read them directly from the data socket. This would have the advantage of not using memory for the queue, but would allow one connection to consume too much of the time. A counter could be included to limit the number of packets a connection can send in one attempt, but this would still favour the connections that produce the largest amount of data. 2) Send only one packet at a time. This would be the simplest solution, requiring no memory management, but it would also give more time to connections that are less limited by the congestion control system.
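The queue-draining loop described above can be sketched as follows. The function signature is an assumption: `request_send` stands in for the mechanism's requestSend function and `transmit` for whatever hands the packet to the operating system.

```python
from collections import deque

def attempt_send(queue, request_send, transmit):
    """Send queued packets until the queue is empty or sending is refused.

    Returns the number of packets handed to `transmit`.
    """
    sent = 0
    # The congestion control mechanism is only consulted while data is queued.
    while queue and request_send():
        transmit(queue.popleft())
        sent += 1
    return sent
```

Because the loop stops as soon as the mechanism refuses, a connection that is limited by congestion control keeps its remaining data queued and is served again on the next attempt.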

C.3.3 Congestion control system

There are currently two congestion control systems implemented. Both follow the same design, in which a setup function is called when the congestion control system is to be initialised. This function puts pointers to its functions into the connection structure passed to it, allocates and initialises memory for its own use, and puts a pointer to this memory in the connection structure as well.

TCP-like Congestion Control

CCID 2 [57] is designed to approximate the behaviour of the congestion control system used by TCP [160, 4]. A congestion window is maintained to control the number of packets that are in the network at any one time. As DCCP is packet oriented rather than byte oriented like TCP, the congestion window is in units of packets rather than bytes. TCP does not explicitly provide congestion control of acknowledgements, though a change in the rate of data packets will also change the number of ACKs. CCID 2 provides for the ratio of ACKs to data packets to be changed when the loss of ACKs is detected. CCID 2 requires that acknowledgements contain an Ack Vector option that indicates the arrival status of packets.
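A packet-counted congestion window in the TCP style can be sketched as below. This is a deliberately simplified illustration and not the library's CCID 2 code: it shows slow start, congestion avoidance and halving on loss with the window measured in packets, and it omits Ack Ratio control and the other details that the CCID 2 specification defines. All names and default values are assumptions.

```python
class PacketWindow:
    """Congestion window counted in packets rather than bytes."""

    def __init__(self, cwnd=1, ssthresh=64):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.in_flight = 0
        self._acked_in_avoidance = 0

    def can_send(self):
        return self.in_flight < self.cwnd

    def on_send(self):
        self.in_flight += 1

    def on_ack(self, newly_acked):
        self.in_flight = max(0, self.in_flight - newly_acked)
        for _ in range(newly_acked):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1  # slow start: one packet per acknowledged packet
            else:
                # congestion avoidance: one packet per window of acknowledgements
                self._acked_in_avoidance += 1
                if self._acked_in_avoidance >= self.cwnd:
                    self._acked_in_avoidance = 0
                    self.cwnd += 1

    def on_loss(self):
        self.cwnd = max(1, self.cwnd // 2)  # halve the window, never below one
        self.ssthresh = max(2, self.cwnd)
        self._acked_in_avoidance = 0
```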

This provides congestion control that rapidly changes the rate at which packets can be sent in order to take advantage of, and react to, changes in the available bandwidth. This is useful to applications that can react quickly and to applications that do not require data to be transmitted at a continuous rate, but instead need a large amount of data to be transmitted.

TCP-Friendly Rate Control

CCID 3 [58] is a slight variation of TFRC [72] for DCCP. It is designed to compete fairly with TCP whilst providing a smoother packet sending rate. It uses the following formula to determine the correct sending rate:

$$X = \frac{s}{R\sqrt{2bp/3} + t_{RTO}\left(3\sqrt{3bp/8}\right)p\left(1 + 32p^{2}\right)} \qquad \text{(C.1)}$$

X is the rate at which data is to be transmitted, in bytes/second.

s is the packet size in bytes.

R is the round trip time in seconds.

p is the loss event rate, between 0 and 1.0: the number of loss events as a fraction of the number of packets transmitted.

t_RTO is the retransmission timeout value in seconds that TCP would have calculated.

b is the number of packets acknowledged by a single TCP acknowledgement.
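The throughput equation can be evaluated directly, as in the sketch below. The function name is hypothetical. The defaults of b = 1 and t_RTO = 4R are the simplifications suggested in the TFRC specification and are assumptions here, not values stated in this document.

```python
import math

def tfrc_rate(s, R, p, t_rto=None, b=1):
    """Allowed sending rate X in bytes/second from the TFRC throughput equation.

    s: packet size in bytes, R: round trip time in seconds,
    p: loss event rate (0 < p <= 1), t_rto: TCP retransmission timeout in
    seconds, b: packets acknowledged by one TCP acknowledgement.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * R  # common simplification (assumption)
    denominator = (
        R * math.sqrt(2 * b * p / 3)
        + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    )
    return s / denominator
```

The rate falls as the loss event rate or the round trip time rises, and scales linearly with the packet size, which is what gives the smoothly varying rate described above.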

This system is intended to be more useful to the large number of applications that need to transmit packets of a fixed size at a rate that changes smoothly. Fortunately, a large number of streaming applications can operate in this fashion.

Of the two congestion control systems, CCID 3 is the more complex. The method of congestion control on which it is based has been used by other protocols, and as a result there are existing implementations of it. These libraries were not used in the implementation of the library; this allows the implementation to be specific to DCCP, rather than attempting to make an existing library fit the slightly modified version used by DCCP. Using such a library could potentially have allowed the congestion control system to be implemented more quickly, though this is not necessarily true: the library would have needed to be integrated into the DCCP implementation, and the complexity of doing so would determine whether using a library would have been quicker.

Figure C.3: The state transition diagram for DCCP

Figure C.4: The call graph of the bindconnection function (nodes recovered from the figure: bindconnection, addport, getCon, blockPort)
