
5.5 Direct Sockets over Extoll

5.5.2 Setup and Connection Management

Approximately 15% of the functions provided by the Sockets interface are related to data exchange. The connection setup is one of the most expensive interface calls, but it is performed only once per connection. For connection establishment, EXT-DS relies on standard TCP/IP behavior by using EXT-Eth, which creates a conventional connection between two TCP endpoints. The following sections describe address resolution and connection management, including TCP port mapping.

5.5.2.1 Address Resolution

EXT-DS relies on IP addressing (IPv4 or IPv6) and uses the EXT-Eth address resolution mechanism to map an IP address to the Extoll node ID that is needed to communicate between Extoll nodes. Instead of defining a new methodology, EXT-DS simply passes the IP address to EXT-Eth, which returns the MAC address for that IP address; the node ID is encoded in the MAC address. Thus, the EXT-DS protocol itself begins only after the source and destination IP addresses have been resolved during connection setup.
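Because the node ID is carried inside the MAC address, the final resolution step amounts to a bit extraction. The C sketch below assumes, purely for illustration, that EXT-Eth stores a 16-bit node ID in the two least significant bytes of the MAC address; the actual EXT-Eth layout is not specified here, and `mac_addr_t`, `ds_mac_to_node_id()`, and `ds_node_id_to_mac()` are hypothetical names.

```c
#include <stdint.h>

/* HYPOTHETICAL layout: the 16-bit Extoll node ID is assumed to occupy
 * the two least significant bytes of the 6-byte MAC address. The real
 * EXT-Eth encoding may differ. */
typedef struct {
    uint8_t addr[6];
} mac_addr_t;

/* Decode the (assumed) node ID from the MAC address returned by EXT-Eth. */
static uint16_t ds_mac_to_node_id(const mac_addr_t *mac)
{
    return (uint16_t)((mac->addr[4] << 8) | mac->addr[5]);
}

/* Inverse mapping under the same assumption. The prefix bytes are
 * placeholders; 0x02 marks a locally administered unicast address. */
static mac_addr_t ds_node_id_to_mac(uint16_t node_id)
{
    mac_addr_t mac = {{0x02, 0x00, 0x00, 0x00, 0x00, 0x00}};
    mac.addr[4] = (uint8_t)(node_id >> 8);
    mac.addr[5] = (uint8_t)(node_id & 0xff);
    return mac;
}
```

With this layout, resolving an IP address to a node ID is one EXT-Eth lookup followed by a constant-time decode, so EXT-DS needs no resolution table of its own.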

5.5.2.2 Connection Establishment and Port Mapping

The connection sequence of a typical server-client application is displayed in Figure 5.17. TCP is a connection-oriented protocol and only supports point-to-point connections. Before any data can be exchanged, two socket endpoints need to establish such a point-to-point connection. The basic function to create a socket handle that can be used with send and receive functions is the socket() call.

[Figure 5.17 diagram. Passive Socket (Server): socket(), bind(), listen(), accept() (blocks until client connects). Active Socket (Client): socket(), connect(); the server's accept() then resumes. (Possibly multiple) data transfers in either direction via read() and write(). Both sides finish with close().]

Figure 5.17: Overview of system calls used in stream sockets connection.

After a handle has been allocated, a connection can be established through the accept() and connect() calls. When using the EXT-DS protocol, the connect/accept sequence establishes a so-called shadow socket connection over EXT-Eth. This sequence returns a TCP/IP socket descriptor containing the IP addresses, MAC addresses, and ports used for the connection, and it allocates the intermediate kernel and RDMA buffers. In addition, EXT-DS performs an initial handshake between the two Extoll nodes to exchange the addresses of their RMA receive buffers. The shadow socket descriptor is used to interface with the user application.
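The initial handshake can be pictured as both endpoints writing a small descriptor of their RMA receive buffer to the freshly connected shadow socket and reading the peer's descriptor back. The following C sketch assumes the fields of `struct ds_handshake` and the helper names `xfer_all()` and `ds_exchange_rma_info()`; they are illustrative, not the EXT-DS wire format. Only the POSIX `read()` and `write()` calls are real APIs.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL handshake message describing the local RMA receive buffer. */
struct ds_handshake {
    uint64_t rma_rx_addr;   /* address of the local RMA receive buffer */
    uint64_t rma_rx_size;   /* size of that buffer in bytes */
    uint16_t node_id;       /* local Extoll node ID */
    uint16_t vpid;          /* VPID of the local virtual device */
};

/* Write (do_write != 0) or read exactly len bytes, retrying on short
 * transfers and EINTR. Returns 0 on success, -1 on error or early EOF. */
static int xfer_all(int fd, void *buf, size_t len, int do_write)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = do_write ? write(fd, p, len) : read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Symmetric exchange over the connected shadow TCP socket: send our
 * descriptor first, then receive the peer's. Both sides run this code. */
static int ds_exchange_rma_info(int shadow_fd,
                                const struct ds_handshake *local,
                                struct ds_handshake *remote)
{
    if (xfer_all(shadow_fd, (void *)local, sizeof *local, 1) < 0)
        return -1;
    return xfer_all(shadow_fd, remote, sizeof *remote, 0);
}
```

Sending the raw struct assumes both nodes share endianness and structure layout, which is plausible in a homogeneous cluster but would otherwise require explicit serialization. Because each side writes before it reads and the message is small, the symmetric exchange does not deadlock on a stream socket with ordinary kernel buffering.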

EXT-DS maps each TCP port to a virtual device, which provides an RMA handle and a VELO handle to interface with the respective functional units. A virtual device is essentially a management structure that is pinned to a specific Extoll virtual process ID (VPID). Extoll reserves a user-tunable number of RMA and VELO VPIDs for EXT-DS to allow concurrent sends and receives. The mapping relies on a simple modulo operation, which provides a static mapping between a port and a virtual device. For example, equation 5.3 performs the modulo operation for 16 VPIDs.
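Equation 5.3 is not reproduced in this excerpt; from the description it presumably maps a port p to virtual device p mod 16. A minimal C sketch of this static mapping follows; `DS_NUM_VPIDS` and `ds_port_to_vdev()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of VPIDs reserved for EXT-DS. The text states this is user
 * tunable; 16 matches the example given for equation 5.3. */
#define DS_NUM_VPIDS 16u

/* Static port-to-virtual-device mapping: device = port mod num_vpids.
 * A given port always lands on the same device, so no lookup table or
 * per-connection state is needed to locate it. */
static unsigned ds_port_to_vdev(uint16_t port, unsigned num_vpids)
{
    assert(num_vpids > 0);
    return port % num_vpids;
}
```

For example, ports 80 and 8080 both map to device 0, while port 443 maps to device 11. When the VPID count is a power of two, as with 16, the modulo is equivalent to masking the low bits (`port & 15`).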

[Figure 5.18 diagram. User Space: processes holding socket handles (Socket 0, Socket 1, Socket 2, ...). Kernel Space: TCP ports (Port 0, Port 1, ..., Port 2^16-1) mapped onto virtual devices (Device 0, Device 1, ..., Device N-1).]

Figure 5.18: Relation between socket handles, ports, and virtualized hardware.

Figure 5.18 displays the relation between the socket handles in user space, the TCP port numbers, and the Extoll virtual devices in kernel space. Since there are far more TCP ports (2^16) than reserved VPIDs, multiple ports are mapped to the same virtual device. The virtual device therefore also contains a pointer to the corresponding shadow socket descriptor, which is used to demultiplex incoming data to the correct TCP port.
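Because several ports share one virtual device, incoming data must be routed to the right shadow socket. The sketch below assumes that each virtual device keeps a linked list of the shadow sockets whose ports map to it, and that the destination port is known for every incoming message; `struct ds_vdev`, `struct ds_shadow_sock`, `ds_vdev_attach()`, and `ds_vdev_demux()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL shadow socket descriptor (only the fields needed here). */
struct ds_shadow_sock {
    uint16_t local_port;            /* TCP port of this connection */
    struct ds_shadow_sock *next;    /* next socket sharing the device */
};

/* HYPOTHETICAL virtual device pinned to one Extoll VPID. */
struct ds_vdev {
    unsigned vpid;                  /* VPID this device is pinned to */
    struct ds_shadow_sock *socks;   /* sockets whose ports map here */
};

/* Register a shadow socket with the device its port maps to. */
static void ds_vdev_attach(struct ds_vdev *dev, struct ds_shadow_sock *s)
{
    assert(dev != NULL && s != NULL);
    s->next = dev->socks;
    dev->socks = s;
}

/* Return the shadow socket for the destination port of an incoming
 * message, or NULL if no connection on this device uses that port. */
static struct ds_shadow_sock *ds_vdev_demux(struct ds_vdev *dev, uint16_t port)
{
    for (struct ds_shadow_sock *s = dev->socks; s != NULL; s = s->next)
        if (s->local_port == port)
            return s;
    return NULL;
}
```

A linear list suffices when only a few connections share a device. A full TCP stack would demultiplex on the complete address/port 4-tuple rather than on the local port alone.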

5.5.2.3 Connection Teardown

TCP defines two ways to tear down a connection: (1) a graceful close, where all posted and outstanding data transmissions are completed before the connection is closed, and (2) an abortive close, where the connection is terminated immediately. EXT-DS emulates both variants of the TCP connection teardown.

Graceful Close Depending on the socket options that are set, the graceful close, also referred to as a half-closed connection, can exhibit two types of behavior:

(1) Graceful shutdown with delay – close() does not return until all queued messages have been sent successfully or the linger timeout has expired.

(2) Graceful shutdown with immediate return – close() returns immediately, and the shutdown sequence completes in the background.

When the shutdown sequence is initiated by calling close(), EXT-DS checks whether the socket option SO_LINGER is set for the socket. If the option is set and the socket has outstanding data transmissions, close() blocks until all data has been transmitted or the current linger interval expires, whichever comes first. In addition, a VELO message with the user tag SHUTDOWN (refer to section 5.5.3.1) is sent to initiate the shutdown on both sides of the connection.
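The graceful-close logic can be sketched in C as follows. `enable_linger()` uses the standard `setsockopt()` call with `SO_LINGER`, which is how an application selects the delaying variant; everything prefixed `ds_` is a hypothetical model of the EXT-DS side, with `ds_progress()` standing in for polling Extoll completions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

/* Application side (standard sockets API): enable SO_LINGER so that a
 * later close() may wait up to `seconds` for queued data. */
static int enable_linger(int fd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* HYPOTHETICAL EXT-DS connection state mirroring the socket options. */
struct ds_conn {
    bool   linger_on;       /* SO_LINGER enabled on the socket */
    int    linger_sec;      /* linger interval in seconds */
    size_t outstanding;     /* bytes posted but not yet completed */
    bool   shutdown_sent;   /* SHUTDOWN-tagged VELO message posted */
};

/* HYPOTHETICAL stand-in for polling Extoll completions: pretend that up
 * to 4 KiB of outstanding data completes per call. */
static void ds_progress(struct ds_conn *c)
{
    size_t done = c->outstanding < 4096 ? c->outstanding : 4096;
    c->outstanding -= done;
}

/* HYPOTHETICAL: would post a VELO message carrying the SHUTDOWN tag. */
static void ds_send_shutdown(struct ds_conn *c)
{
    c->shutdown_sent = true;
}

/* Graceful close. With SO_LINGER set, block until all data is sent or
 * the linger interval expires; otherwise return immediately and let the
 * remaining transfers finish in the background. Returns the number of
 * bytes still outstanding when the call returns. */
static size_t ds_close(struct ds_conn *c)
{
    if (c->linger_on && c->outstanding > 0) {
        time_t deadline = time(NULL) + c->linger_sec;
        while (c->outstanding > 0 && time(NULL) < deadline)
            ds_progress(c);
    }
    ds_send_shutdown(c);
    return c->outstanding;
}
```

In this sketch the SHUTDOWN message is posted after the drain loop, so in the lingering case it follows the last completed data transfer.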

Table 5.3: Overview of VELO user tags.

Type Description

VELO The user tag VELO indicates that the payload of the incoming VELO message carries a fragment of user data, which needs to be copied into the application buffer space.

VELO_LAST The user tag VELO_LAST indicates that the payload of the incoming VELO message carries a fragment of user data, but also notifies the receiver that the complete buffer has been transmitted and can be passed to the application.

RMA_INFO The user tag RMA_INFO indicates that the next user buffer will be transferred through either an RMA PUT or an RMA GET operation. The tag ensures message ordering between VELO and RMA transfers.

RMA_AVAIL The RMA_AVAIL tag is used to notify the sender that a user application has read data from the RMA sink buffer. The VELO message also provides the amount of data (in bytes) that has been freed.

SHUTDOWN The SHUTDOWN user tag indicates that one side has triggered the TCP shutdown sequence.
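A dispatch on the user tags of Table 5.3 might look like the following C sketch. The numeric tag values, `struct ds_rx_state`, and `ds_handle_velo()` are assumptions; data copies and RMA operations are reduced to counters so that only the per-tag control flow is shown. Note that RMA_AVAIL is consumed by the sending side, whereas the other tags are consumed by the receiver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User tags from Table 5.3; the numeric values are assumptions. */
enum ds_velo_tag {
    DS_TAG_VELO      = 0,
    DS_TAG_VELO_LAST = 1,
    DS_TAG_RMA_INFO  = 2,
    DS_TAG_RMA_AVAIL = 3,
    DS_TAG_SHUTDOWN  = 4,
};

/* HYPOTHETICAL per-connection state touched by the dispatcher. */
struct ds_rx_state {
    size_t   buffered;       /* bytes of the current user buffer so far */
    size_t   delivered;      /* bytes handed to the application */
    size_t   buffers;        /* complete buffers handed to the application */
    bool     expect_rma;     /* next buffer arrives via RMA PUT/GET */
    uint64_t sink_credit;    /* freed RMA sink space reported by the peer */
    bool     peer_shutdown;  /* peer started the TCP shutdown sequence */
};

/* Handle one incoming VELO message. `len` is the payload length; for
 * RMA_AVAIL, `freed` is the byte count carried in the payload.
 * Returns -1 for an unknown tag (a protocol violation). */
static int ds_handle_velo(struct ds_rx_state *st, enum ds_velo_tag tag,
                          size_t len, uint64_t freed)
{
    switch (tag) {
    case DS_TAG_VELO:        /* fragment: would copy into app buffer */
        st->buffered += len;
        return 0;
    case DS_TAG_VELO_LAST:   /* last fragment: pass whole buffer on */
        st->delivered += st->buffered + len;
        st->buffered = 0;
        st->buffers++;
        return 0;
    case DS_TAG_RMA_INFO:    /* ordering marker: next buffer uses RMA */
        st->expect_rma = true;
        return 0;
    case DS_TAG_RMA_AVAIL:   /* sender side: remote sink space freed */
        st->sink_credit += freed;
        return 0;
    case DS_TAG_SHUTDOWN:    /* peer initiated the teardown */
        st->peer_shutdown = true;
        return 0;
    }
    return -1;
}
```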

Abortive Close With an abortive shutdown, the close() call returns immediately. If the EXT-DS protocol is violated, e.g., because an Extoll error occurs, the connection is closed abortively and all outstanding transmissions are dropped.
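On the application side, standard sockets typically request an abortive close by enabling `SO_LINGER` with a linger time of zero, which on common TCP stacks such as Linux makes close() discard unsent data and reset the connection. The C sketch pairs that real `setsockopt()` idiom with a hypothetical EXT-DS abort path (`struct ds_conn_state`, `ds_abort()`) that drops all outstanding transmissions, as described above for protocol violations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Standard sockets idiom: a zero linger timeout makes close() abortive
 * on common TCP stacks (e.g. Linux), discarding unsent data. */
static int request_abortive_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* HYPOTHETICAL EXT-DS connection state for the abort path. */
struct ds_conn_state {
    size_t outstanding;   /* bytes of posted, uncompleted transfers */
    size_t dropped;       /* bytes discarded by the abort */
    bool   open;          /* connection still usable */
};

/* Abortive close, e.g. after an Extoll error: return immediately and
 * drop all outstanding transmissions instead of draining them. */
static void ds_abort(struct ds_conn_state *c)
{
    c->dropped += c->outstanding;
    c->outstanding = 0;
    c->open = false;
}
```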