KATRAGADDA INNOVATIVE TRUST FOR EDUCATION NETWORK PROGRAMMING. Notes prepared by D. Teja Santosh, Assistant Professor, KPES, Shabad, R.R. District.

NETWORK PROGRAMMING


UNIT-I

Introduction and TCP/IP

INTRODUCTION

When writing programs that communicate across a computer network, one must first invent a protocol, an agreement on how those programs will communicate. Before delving into the design details of a protocol, high-level decisions must be made about which program is expected to initiate communication and when responses are expected. For example, a Web server is typically thought of as a long-running program (or daemon) that sends network messages only in response to requests coming in from the network. The other side of the protocol is a Web client, such as a browser, which always initiates communication with the server. This organization into client and server is used by most network-aware applications.

OSI Model

A common way to describe the layers in a network is to use the International Organization for Standardization (ISO) open systems interconnection (OSI) model for computer communications. This is a seven-layer model whose layers map approximately onto the layers of the Internet protocol suite.

The sockets programming interfaces described here are interfaces from the upper three layers (the "application") into the transport layer. Why do sockets provide the interface from the upper three layers of the OSI model into the transport layer? There are two reasons for this design:

First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details. The lower four layers know little about the application, but handle all the communication details: sending data, waiting for acknowledgments, sequencing data that arrives out of order, calculating and verifying checksums, and so on. The second reason is that the upper three layers often form what is called a user process while the lower four layers are normally provided as part of the operating system (OS) kernel. Unix provides this separation between the user process and the kernel, as do many other contemporary operating systems. Therefore, the interface between layers 4 and 5 is the natural place to build the API.

APPLICATION LEVEL VIEW OF A SOCKET

KERNEL LEVEL VIEW OF A SOCKET (IPv4)

The Big Picture

IPv4 Internet Protocol version 4. IPv4, which we often denote as just IP, has been the workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses. IPv4 provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.

IPv6 Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for IPv4. The major change is a larger address comprising 128 bits, to deal with the explosive growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP, SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP address, when the distinction between IPv4 and IPv6 is not needed.

TCP Transmission Control Protocol. TCP is a connection-oriented protocol that provides a reliable, full-duplex byte stream to its users. TCP sockets are an example of stream sockets. TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like. Most Internet application programs use TCP. Notice that TCP can use either IPv4 or IPv6.

UDP User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an example of datagram sockets. There is no guarantee that UDP datagrams ever reach their intended destination. As with TCP, UDP can use either IPv4 or IPv6.

SCTP Stream Control Transmission Protocol. SCTP is a connection-oriented protocol that provides a reliable full-duplex association. The word "association" is used when referring to a connection in SCTP because SCTP is multihomed, involving a set of IP addresses and a single port for each side of an association. SCTP provides a message service, which maintains record boundaries. As with TCP and UDP, SCTP can use either IPv4 or IPv6, but it can also use both IPv4 and IPv6 simultaneously on the same association.

ICMP Internet Control Message Protocol. ICMP handles error and control information between routers and hosts. These messages are normally generated by and processed by the TCP/IP networking software itself, not user processes, although the ping and traceroute programs are user programs that use ICMP. We sometimes refer to this protocol as ICMPv4 to distinguish it from ICMPv6.

IGMP Internet Group Management Protocol. IGMP is used with multicasting, which is optional with IPv4.

ARP Address Resolution Protocol. ARP maps an IPv4 address into a hardware address (such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet, token ring, and FDDI, and is not needed on point-to-point networks.

RARP Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4 address. It is sometimes used when a diskless node is booting.

ICMPv6 Internet Control Message Protocol version 6. ICMPv6 combines the functionality of ICMPv4, IGMP, and ARP.

BPF BSD packet filter. This interface provides access to the datalink layer. It is normally found on Berkeley-derived kernels.

DLPI Datalink provider interface. This interface also provides access to the datalink layer. It is normally provided with SVR4.

We use the terms "IPv4/IPv6 host" and "dual-stack host" to denote hosts that support both IPv4 and IPv6.

USER DATAGRAM PROTOCOL [UDP]:-

The User Datagram Protocol (UDP) provides a connectionless, unreliable transport service.

Connectionless means that a communication session between hosts is not established before exchanging data. UDP is often used for communications that use broadcast or multicast Internet Protocol (IP) packets. The UDP connectionless packet delivery service is unreliable because it does not guarantee data packet delivery or send a notification if a packet is not delivered.

Because delivery of UDP packets is not guaranteed, applications that use this protocol must supply their own mechanisms for reliability if necessary. Although UDP appears to have some limitations, it is useful in certain situations.

Each UDP datagram has a length. The length of a datagram is passed to the receiving application along with the data.

TRANSMISSION CONTROL PROTOCOL [TCP]:-

• Connection oriented: An application requests a "connection" to a destination and uses that connection to transfer data.

• Point-to-point: A TCP connection has exactly two endpoints (no broadcast or multicast).

• Reliability: TCP delivers data without loss, duplication, or transmission errors by using acknowledgments, checksums, and retransmissions. If delivery turns out to be impossible, TCP gives up and reports an error to the application.

• Full duplex: Endpoints can exchange data in both directions simultaneously.

• Delivering TCP: TCP segments travel in IP datagrams. Internet routers look only at the IP header to forward datagrams. Each segment contains a sequence number.

• Flow control: Flow control is necessary when one computer transmits data faster than the receiving computer can accept it. Flow control requires feedback from the receiving peer, which TCP provides through the advertised window, i.e., the amount of free space in the receiver's buffer.

• RTT estimation: TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment. For example, the RTT on a LAN can be milliseconds, while across a WAN it can be seconds. Furthermore, TCP continuously estimates the RTT of a given connection, because the RTT is affected by variations in the network traffic.

TCP Connection Establishment

Three-Way Handshake

The following scenario occurs when a TCP connection is established:

1. The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen and is called a passive open.

2. The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP options.

3. The server must acknowledge (ACK) the client's SYN and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.

4. The client must acknowledge the server's SYN.
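The passive open and active open described above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names passive_open and active_open are hypothetical, the server listens on the loopback address 127.0.0.1, and the port is chosen by the kernel. The three-way handshake itself is carried out by the kernel; connect returns once it has completed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: passive open (socket, bind, listen) on 127.0.0.1.
 * Returns the listening descriptor and stores the kernel-chosen port
 * (host byte order) in *port. Returns -1 on error. */
int passive_open(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* 0 = let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: active open. connect() sends the SYN and returns
 * after the server's SYN/ACK has arrived and been acknowledged. */
int active_open(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call passive_open once and then accept on the returned descriptor; a client would call active_open with the server's port.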

TCP Connection Termination

1. One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.

2. The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file (after any data that may have already been queued for the application to receive), since the receipt of the FIN means the application will not receive any additional data on the connection.

3. Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.

4. The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.

Since a FIN and an ACK are required in each direction, four segments are normally required.

We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close and could be combined into one segment.

Importance of TIME_WAIT State:

Undoubtedly, one of the most misunderstood aspects of TCP with regard to network programming is its TIME_WAIT state. The end that performs the active close goes through this state. The duration that this endpoint remains in this state is twice the maximum segment lifetime (MSL), sometimes called 2MSL.

Every implementation of TCP must choose a value for the MSL. The recommended value in RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have traditionally used a value of 30 seconds instead. This means the duration of the TIME_WAIT state is between 1 and 4 minutes. The MSL is the maximum amount of time that any given IP datagram can live in a network. We know this time is bounded because every datagram contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not a true time limit, the assumption is made that a packet with the maximum hop limit of 255 cannot exist in a network for more than MSL seconds.

The way in which a packet gets "lost" in a network is usually the result of routing anomalies. A router crashes or a link between two routers goes down and it takes the routing protocols seconds or minutes to stabilize and find an alternate path. During that time period, routing loops can occur (router A sends packets to router B, and B sends them back to A) and packets can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment, the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the final destination by some alternate path. But sometime later (up to MSL seconds after the lost packet started on its journey), the routing loop is corrected and the packet that was lost in the loop is sent to the final destination. This original packet is called a lost duplicate or a wandering duplicate. TCP must handle these duplicates.

THE FOLLOWING INFORMATION HAS BEEN TAKEN FROM:

http://sit.iitkgp.ernet.in/archive/teaching/internetTech/tcp/www.scit.wlv.ac.uk/%257Ejphb/comms/tcp.html

It should be noted that the exchange is really two independent exchanges and it is possible to close the connection in one direction but not the other. This is known as a half close. The following example (due to Stevens) demonstrates the use of the half-close.

Consider the Unix command rsh remote sort < datafile

The effect of this is that the local file datafile is sorted on the remote host and the results transferred back to the local host. The data flow is shown in the following diagram.

The problem here is that the sort program on the remote host will not start sorting until it has read all the data. The end of the data is indicated by the local host closing its side of the connection, and the sort program responds to the corresponding EOF indication. However, the "back" connection must remain open for the return of data.

Stevens suggests that the library call shutdown() be used with sockets programming to achieve a half close.
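The half close can be sketched as follows. This is an illustration only, under stated assumptions: the function name half_close_demo is hypothetical, and a connected pair of AF_UNIX stream sockets created with socketpair stands in for the TCP connection so that the example needs no network. The call of interest is shutdown with SHUT_WR, which closes only the sending direction.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo of a half close. Returns 0 if it behaved as described,
 * -1 otherwise. sv[0] plays the local host, sv[1] the remote sort. */
int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Local end sends its data, then closes only its sending direction. */
    if (write(sv[0], "3\n1\n2\n", 6) != 6)
        goto done;
    if (shutdown(sv[0], SHUT_WR) < 0)
        goto done;

    /* Remote end reads the data, then sees end-of-file (read returns 0). */
    if (read(sv[1], buf, sizeof(buf)) != 6)
        goto done;
    if (read(sv[1], buf, sizeof(buf)) != 0)
        goto done;

    /* The "back" direction is still open, so results can be returned. */
    if (write(sv[1], "1\n2\n3\n", 6) != 6)
        goto done;
    if (read(sv[0], buf, sizeof(buf)) != 6)
        goto done;
    if (memcmp(buf, "1\n2\n3\n", 6) == 0)
        rc = 0;

done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling close instead of shutdown on the local end would discard the descriptor entirely, so the returned data could no longer be read.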

Once the final ACK has been sent on an active close, the port/connection cannot be released and reused for the time period 2MSL. This is twice the maximum segment lifetime, and the constraint is imposed in case the final ACK is lost. If the final ACK is lost, the passive closing host will time out awaiting an ACK in response to its closing FIN and will resend the FIN. If the resent FIN arrives before the 2MSL time has expired, there is no problem; after this time, the FIN would not appear to belong to whatever connection might then exist between the two hosts.

RFC 793 defines MSL (Maximum Segment Lifetime) as 120 seconds, but some implementations use 30 or 60 seconds. It is, basically, the maximum time for which it is reasonable to wait for a segment; i.e., if a segment does not reach its destination within MSL, it probably will not get there at all and it can be assumed to have been lost.

There are two reasons for the TIME_WAIT state:

1. To implement TCP's full-duplex connection termination reliably

2. To allow old duplicate segments to expire in the network

The first reason can be explained by assuming that the final ACK is lost. The server will resend its final FIN, so the client must maintain state information, allowing it to resend the final ACK. If it did not maintain this information, it would respond with an RST (a different type of TCP segment), which would be interpreted by the server as an error. If TCP is performing all the work necessary to terminate both directions of data flow cleanly for a connection (its full-duplex close), then it must correctly handle the loss of any of these four segments. This example also shows why the end that performs the active close is the end that remains in the TIME_WAIT state: because that end is the one that might have to retransmit the final ACK.

To understand the second reason for the TIME_WAIT state, assume we have a TCP connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection is closed and then sometime later, we establish another connection between the same IP addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter connection is called an incarnation of the previous connection since the IP addresses and ports are the same. TCP must prevent old duplicates from a connection from reappearing at some later time and being misinterpreted as belonging to a new incarnation of the same connection. To do this, TCP will not initiate a new incarnation of a connection that is currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we successfully establish a TCP connection, all old duplicates from previous incarnations of the connection have expired in the network.

USEFUL LINKS FOR TIME_WAIT IMPORTANCE:

• http://support.citrix.com/article/CTX117910

• http://www.pcvr.nl/tcpip/tcp_time.htm

Port Numbers

ALLOCATION OF PORT NUMBERS

INTRODUCTION TO CONCURRENT SERVERS:

SOCKETPAIR:

The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network.

NOTE: FOR MORE INFORMATION ABOUT FIRST 6 UNITS, PLEASE GO THROUGH THE FOLLOWING LINK:

http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

UNIT-II

Socket Address Structure

Most socket functions require a pointer to a socket address structure as an argument. Each supported protocol suite defines its own socket address structure.

IPv4 Socket Address Structure(SAS)

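An IPv4 socket address structure, commonly called an "Internet socket address structure," is named sockaddr_in and is defined by including the <netinet/in.h> header. The POSIX definition of the IPv4 SAS is shown below:

struct in_addr {
    in_addr_t      s_addr;       /* 32-bit IPv4 address, network byte ordered */
};

struct sockaddr_in {
    uint8_t        sin_len;      /* length of structure (16); not present on all systems */
    sa_family_t    sin_family;   /* AF_INET */
    in_port_t      sin_port;     /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;     /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8];  /* unused */
};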

The diagrammatical representation of IPV4 SAS is:

Datatype, Description and Header File of IPV4 SAS Members

IMP NOTE: The 32-bit IPv4 address can be accessed in two different ways. For example, if serv is defined as an Internet socket address structure, then serv.sin_addr references the 32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same 32-bit IPv4 address as an in_addr_t (typically an unsigned 32-bit integer). We must be certain that we are referencing the IPv4 address correctly, especially when it is used as an argument to a function, because compilers often pass structures differently from integers.

Socket address structures are used only on a given host: The structure itself is not communicated between different hosts, although certain fields (e.g., the IP address and port) are used for communication.
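As an illustration of how such a structure is typically filled in before calling bind or connect, here is a minimal sketch. The helper name make_ipv4_addr is hypothetical. The sin_len member is deliberately left alone because some systems (Linux, for example) do not define it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *serv with the given dotted-decimal address and
 * port. Returns 0 on success, -1 if ip is not a valid IPv4 address string. */
int make_ipv4_addr(struct sockaddr_in *serv, const char *ip,
                   unsigned short port)
{
    memset(serv, 0, sizeof(*serv));     /* also clears sin_zero */
    serv->sin_family = AF_INET;
    serv->sin_port = htons(port);       /* port in network byte order */

    /* inet_pton stores into the in_addr structure (serv->sin_addr);
     * the in_addr_t integer inside it is serv->sin_addr.s_addr. */
    if (inet_pton(AF_INET, ip, &serv->sin_addr) != 1)
        return -1;
    return 0;
}
```

After a successful call, serv.sin_addr is the address as a structure and serv.sin_addr.s_addr is the same 32 bits as an integer in network byte order.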

Value-Result Arguments

Three functions, bind, connect, and sendto, pass a socket address structure from the process to the kernel. One argument to these three functions is the pointer to the socket address structure and another argument is the integer size of the structure. Since the kernel is passed both the pointer and the size of what the pointer points to, it knows exactly how much data to copy from the process into the kernel.

Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address structure from the kernel to the process, the reverse direction from the previous scenario. Two of the arguments to these four functions are the pointer to the socket address structure along with a pointer to an integer containing the size of the structure.

The size changes from an integer to a pointer to an integer because the size is both a value when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the end of the structure when filling it in) and a result when the function returns (it tells the process how much information the kernel actually stored in the structure). This type of argument is called a value-result argument.
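The difference between the two directions can be seen in a short sketch. The function name bind_ephemeral is hypothetical. bind receives the size by value, while getsockname receives a pointer to the size, which it reads on entry and overwrites on return.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds a TCP socket to the wildcard address with
 * port 0 and returns the port the kernel assigned (host byte order), or 0
 * on failure. *result_len receives the length stored by getsockname. */
unsigned short bind_ephemeral(socklen_t *result_len)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);        /* value: size of our structure */
    unsigned short port = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);

    /* Process to kernel: the size is passed as a plain integer. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        /* Kernel to process: the size is passed by pointer. */
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0) {
        port = ntohs(addr.sin_port);
        *result_len = len;               /* result: bytes actually stored */
    }
    close(fd);
    return port;
}
```

For a fixed-size structure such as sockaddr_in the returned length normally equals the size passed in; the result matters more for variable-length structures.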

Byte Ordering Functions

Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two bytes in memory: with the low-order byte at the starting address, known as little-endian byte order, or with the high-order byte at the starting address, known as big-endian byte order.

Network Byte Order – Big Endian Byte Order

Host Byte Order – Big Endian or Little Endian Byte Order

We must deal with these byte ordering differences as network programmers because networking protocols must specify a network byte order. For example, in a TCP segment, there is a 16-bit port number and a 32-bit IPv4 address. The sending protocol stack and the receiving protocol stack must agree on the order in which the bytes of these multibyte fields will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte integers.

In theory, an implementation could store the fields in a socket address structure in host byte order and then convert to and from the network byte order when moving the fields to and from the protocol headers, saving us from having to worry about this detail. But both history and the POSIX specification say that certain fields in the socket address structures must be maintained in network byte order. Our concern is therefore converting between host byte order and network byte order. We use the following four functions to convert between these two byte orders.

In the names of these functions, h stands for host, n stands for network, s stands for short, and l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on 32-bit values.
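The four functions are htons, htonl, ntohs, and ntohl, declared in <arpa/inet.h>. The following sketch shows their effect; the helper names host_is_little_endian and port_wire_bytes are hypothetical and exist only for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if this host stores the low-order byte
 * at the starting address (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t v = 0x0102;
    unsigned char b[2];

    memcpy(b, &v, sizeof(v));
    return b[0] == 0x02;
}

/* Hypothetical helper: copies the in-memory bytes of htons(v) into out.
 * Because network byte order is big-endian, out[0] is always the
 * high-order byte, whatever the host byte order is. */
void port_wire_bytes(uint16_t v, unsigned char out[2])
{
    uint16_t n = htons(v);

    memcpy(out, &n, sizeof(n));
}
```

On a big-endian host the four functions do nothing; on a little-endian host they swap the bytes. Portable code calls them in both cases.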

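NOTE: These functions only convert the byte order of 16-bit and 32-bit integers, such as the port number and IPv4 address stored in a socket address structure. They do not send or receive any data themselves.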

Byte Manipulation Functions

There are two groups of functions that operate on multibyte fields, without interpreting the data, and without assuming that the data is a null-terminated C string. We need these types of functions when dealing with socket address structures because we need to manipulate fields such as IP addresses, which can contain bytes of 0, but are not C character strings.

The first group of functions, whose names begin with b (for byte), are from 4.2BSD and are still provided by almost any system that supports the socket functions. The second group of functions, whose names begin with mem (for memory), are from the ANSI C standard and are provided with any system that supports an ANSI C library.

src might represent application space and dest might represent socket send buffer space (socket receive buffer space).
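The two groups are bzero, bcopy, and bcmp (declared in <strings.h>) and memset, memcpy, and memcmp (declared in <string.h>). The sketch below, with the hypothetical name byte_funcs_demo, applies both groups to a 4-byte IPv4 address that contains a 0 byte. Note the argument order: bcopy takes the source first, memcpy takes the destination first.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical demo: returns 0 if both function groups produce the same
 * result on binary data containing a 0 byte, -1 otherwise. */
int byte_funcs_demo(void)
{
    unsigned char src[4] = { 192, 0, 2, 1 };   /* not a C string */
    unsigned char a[4], b[4];

    bzero(a, sizeof(a));                 /* 4.2BSD: set bytes to 0 */
    memset(b, 0, sizeof(b));             /* ANSI C equivalent */

    bcopy(src, a, sizeof(src));          /* bcopy(src, dest, nbytes) */
    memcpy(b, src, sizeof(src));         /* memcpy(dest, src, nbytes) */

    if (bcmp(a, b, sizeof(a)) != 0)      /* 0 means equal */
        return -1;
    if (memcmp(a, src, sizeof(a)) != 0)  /* 0 means equal */
        return -1;
    return 0;
}
```

A string function such as strcpy would stop at the 0 byte in the second position, which is why these byte-oriented functions are needed.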

inet_aton, inet_addr, and inet_ntoa Functions

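The following functions convert IPv4 addresses between a dotted-decimal string (for example, "206.168.112.219") and the 32-bit network byte ordered binary value that is stored in a socket address structure. These functions work only with IPv4.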

inet_pton and inet_ntop Functions

The inet_pton and inet_ntop functions perform the same conversions for both IPv4 and IPv6 addresses; the family argument (AF_INET or AF_INET6) selects which one. In the function names, p stands for presentation (the text string) and n stands for numeric (the binary value).

sock_ntop Function

A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary address. This address is normally contained in a socket address structure, requiring the caller to know the format of the structure and the address family.

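To solve this problem, sock_ntop() is used. It takes a pointer to a socket address structure as an argument, looks inside the structure, calls the appropriate conversion function, and returns the presentation form of the address. Note that sock_ntop is a helper function from Stevens's text, not a standard library function.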

readn, writen, and readline Functions

Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that differs from normal file I/O. A read or write on a stream socket might input or output fewer bytes than requested, but this is not an error condition. The reason is that buffer limits might be reached for the socket in the kernel. All that is required to input or output the remaining bytes is for the caller to invoke the read or write function again. Some versions of Unix also exhibit this behavior when writing more than 4,096 bytes to a pipe. This scenario is always a possibility on a stream socket with read, but is normally seen with write only if the socket is nonblocking. Nevertheless, we always call our writen function instead of write, in case the implementation returns a short count.

The following functions overcome this problem.
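The original listings are not reproduced in these notes, so the following is only a sketch of how readn and writen might be written, based on the description above: each one loops until the requested number of bytes has been transferred. Restarting after EINTR is an assumption added here. readline is omitted.

```c
#include <errno.h>
#include <unistd.h>

/* Sketch: read exactly n bytes unless end-of-file or an error occurs.
 * Returns the number of bytes read (less than n only at EOF), or -1. */
ssize_t readn(int fd, void *buf, size_t n)
{
    size_t left = n;
    char *p = buf;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: call read again */
            return -1;
        }
        if (r == 0)
            break;                   /* end-of-file */
        left -= (size_t)r;
        p += r;
    }
    return (ssize_t)(n - left);
}

/* Sketch: write exactly n bytes. Returns n on success, -1 on error. */
ssize_t writen(int fd, const void *buf, size_t n)
{
    size_t left = n;
    const char *p = buf;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;            /* interrupted: call write again */
            return -1;
        }
        left -= (size_t)w;
        p += w;
    }
    return (ssize_t)n;
}
```

A caller that asks readn for 100 bytes gets fewer only if the peer closed the connection first, which is usually easier to handle than a short count from read.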

Elementary TCP Sockets

Socket functions for an elementary TCP client/server:

socket (af, type, protocol);

Creates a socket on demand (placing it in an unconnected state), returns an integer identifying the socket (descriptor), and specifies:

Address Family (af) - the protocol family to use, for example AF_INET for IPv4 or AF_INET6 for IPv6.

Type - type of communication socket:

SOCK_STREAM - connection-oriented.

SOCK_DGRAM - connectionless.

SOCK_RAW - access to low-level protocols or network interfaces.

Protocol - Accommodates multiple protocols within a family.

Bind:

bind (socket, localaddr, addrlen);

A socket is created without any association to local or destination addresses, so a program uses bind to establish a local address for it.

Socket - integer descriptor of the socket.

Localaddr - structure that specifies the local address to be bound.

Addrlen - integer length of the address (in bytes).

Listen:

listen (socket, qlength);

A server creates a socket, binds it to a well-known port, and waits for requests. So that connection requests arriving while the server is busy are not rejected, listen creates a queue for them and places the socket in passive mode, ready to accept incoming connections. listen only works with sockets using a reliable stream service.

Socket - integer descriptor.

Qlength - length of the request queue for that socket (early BSD systems limited this to 5; current systems allow larger values).

Connect:

connect (socket, destaddr, addrlen);

Binds a permanent destination to a socket, placing it in a connected state. Sockets using a connectionless service do not have to use connect (they can specify the destination address in every datagram), but they may.

Socket - socket descriptor.

Destaddr - sockaddr structure (which also includes the protocol port number) specifying the destination address.

Addrlen - length of destination address (in bytes).

Accept:

accept (socket, addr, addrlen);

bind associates a socket with a port, but that socket is not connected to a foreign destination. When a connection request comes in, accept returns a new socket descriptor for the completed connection. It blocks until a connection request arrives.

Addr - pointer to the sockaddr structure.

Addrlen - pointer to integer size of address.

Close: (a system call from the traditional UNIX environment)

close (socket);

When a client or server finishes with a socket, it calls close to deallocate its resources. close decrements the socket's reference count, and the connection is terminated only when the count reaches 0. If several processes share the same socket, the connection therefore stays open until all of them have closed it.

Order of Socket System Calls:

Client Side (depends on connection type):

Socket -> Connect -> Write (may be repeated) -> Read (may be repeated) -> Close

Server Side (depends on connection type):

Socket -> Bind -> Listen -> Accept -> Read (may be repeated) -> Write (may be repeated) -> Close (go back to Accept)

Shutdown:

shutdown (socket, direction);

The shutdown function applies to full-duplex sockets (such as a connected TCP socket) and is used to partially close the connection.

Socket - socket descriptor of a connected socket.

Direction - direction in which shutdown is desired:

0 = terminate further input (SHUT_RD).

1 = terminate further output (SHUT_WR).

2 = terminate input and output (SHUT_RDWR).

IMPORTANT NOTES:

File and Socket Descriptors:

A socket is a generalized UNIX file access mechanism that provides an endpoint for communication. Descriptors (maintained in the descriptor tables) are kept per process by the operating system to point to internal data structures for files and sockets. Descriptors are small integer values.

File Descriptor:

Bound to a file when open is called.

Socket Descriptor:

Created using socket; creating the socket does not bind it to a destination.

Unconnected - a UDP socket can specify the destination every time it sends a datagram.

Connected - a TCP socket specifies the destination once, when the connection is established with connect.

After a socket has been created (using socket), additional system calls are required to specify the details of its use.

Passive Socket - used by a server to wait for calls.

Active Socket - used by a client to initiate a connection.

Basic I/O Functions in UNIX:

UNIX and other operating systems provide a basic set of system functions used for I/O operations on files and other devices. Most operating systems provide similar variations to the five standard I/O operations that BSD UNIX uses.

I/O Functions:

Open - prepare for input / output.

Close - terminate the use of a device.

Write - transfer data from memory to an output device.

Read - transfer data from an input device to memory.

Lseek - move the current read/write position (offset) within a file, for example a file on disk.

The Socket Interface:

• The Berkeley socket interface provides generalized functions that support network communication using many possible protocols.

• Socket calls refer to all TCP/IP protocols as a single protocol family (protocol suite). The calls allow a programmer to specify the type of service required, rather than the name of a specific protocol.

• The socket interface was created because the TCP/IP standards do not define an API (application program interface) for network connections; the design of such an interface lies outside the scope of a protocol suite.

Concurrent Servers

getsockname and getpeername Functions

These two functions return either the local protocol address associated with a socket (getsockname) or the foreign protocol address associated with a socket (getpeername).

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);

int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);

Both return: 0 if OK, -1 on error

Notice that the final argument for both functions is a value-result argument. That is, both functions fill in the socket address structure pointed to by localaddr or peeraddr. The term "name" in these function names is misleading. These two functions return the protocol address associated with one of the two ends of a network connection, which for IPv4 and IPv6 is the combination of an IP address and port number. These functions have nothing to do with domain names.

These two functions are required for the following reasons:

• After connect successfully returns in a TCP client that does not call bind, getsockname returns the local IP address and local port number assigned to the connection by the kernel.

• After calling bind with a port number of 0 (telling the kernel to choose the local port number), getsockname returns the local port number that was assigned.

• getsockname can be called to obtain the address family of a socket.

• In a TCP server that binds the wildcard IP address, once a connection is established with a client (accept returns successfully), the server can call getsockname to obtain the local IP address assigned to the connection. The socket descriptor argument in this call must be that of the connected socket, and not the listening socket.

• When a server is execed by the process that calls accept, the only way the server can obtain the identity of the client is to call getpeername.
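Two of these cases can be checked with a small loopback experiment. This is a sketch under stated assumptions: the function name peer_matches_local is hypothetical, and both ends of the connection live in one process on 127.0.0.1 purely to keep the example self-contained. The client does not call bind, so getsockname after connect reports the address and port chosen by the kernel; getpeername on the server's connected socket should report the same values.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if the server's view of its peer matches
 * the client's view of its own local address, -1 otherwise. */
int peer_matches_local(void)
{
    struct sockaddr_in addr, local, peer;
    socklen_t len;
    int lfd, cfd, sfd = -1, rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* kernel chooses the port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 5) < 0)
        goto done;

    len = sizeof(addr);                       /* which port did we get? */
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* Client never calls bind: the kernel assigns its address at connect. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    len = sizeof(local);
    if (getsockname(cfd, (struct sockaddr *)&local, &len) < 0)
        goto done;

    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;
    len = sizeof(peer);                       /* connected socket, not lfd */
    if (getpeername(sfd, (struct sockaddr *)&peer, &len) < 0)
        goto done;

    if (peer.sin_port == local.sin_port &&
        peer.sin_addr.s_addr == local.sin_addr.s_addr)
        rc = 0;

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```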

UNIT-III

TCP Client/Server Example Introduction

Our simple example is an echo server that performs the following steps:

1. The client reads a line of text from its standard input and writes the line to the server.

2. The server reads the line from its network input and echoes the line back to the client.

3. The client reads the echoed line and prints it on its standard output.
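Step 2, the server side of the echo, can be sketched as a loop that reads whatever arrives and writes it straight back until the client closes its end. The function name echo_until_eof is hypothetical. Unlike the line-oriented description above, this sketch echoes raw bytes, which has the same visible effect for a client that sends complete lines.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical echo loop for one connected socket. Returns 0 when the
 * peer has closed its sending side (read returns 0), -1 on error. */
int echo_until_eof(int sockfd)
{
    char buf[1024];

    for (;;) {
        ssize_t n = read(sockfd, buf, sizeof(buf));
        ssize_t off = 0;

        if (n == 0)
            return 0;                    /* end-of-file: client is done */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: read again */
            return -1;
        }
        /* write may accept fewer bytes than requested, so loop */
        while (off < n) {
            ssize_t w = write(sockfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```

A server would call this function on the descriptor returned by accept and close that descriptor when the function returns.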

Normal Startup (with respect to the socket pair)

In order to initiate the communication between the client and server, we first start the server, which calls socket(). The socket pair at the server is:

SP = (IPs:Ps , IPc:Pc)

where

IPc – IP address of Client

IPs – IP address of Server

Pc – Port Number of Client

Ps – Port Number of Server

Next comes bind(), then SP = (localhost:33600 , IPc:Pc)

Then listen(), now SP = (localhost:33600 , IPc:Pc) [The wildcard character '*' may be written for IPs, IPc, and Pc when they are not known.]

So, at the server the status is "Passive Open" and the format is:

Server

socket() - SP = (IPs:Ps , IPc:Pc)

bind() - SP = (localhost:33600 , IPc:Pc)

listen() - SP = (localhost:33600 , IPc:Pc) or (*:33600 , *:*)

Now, the client requests a connection with the server. The first function call is socket(). The socket pair is:

SP = (IPc:Pc , IPs:Ps)

So, at the client side, the status is "Active Open". The connection is established when the client's connect() and the server's accept() both complete. This is the ordinary three-way handshake; it is not a TCP "simultaneous open", which is the rare case where both ends perform an active open at the same time.

At Client:

Call is connect() – SP = (localhost:33597 , x.y.z.w:33600)

At Server:

Call is accept() – SP = (localhost:33600 , a.b.c.d:33597)

The format is:

Client

socket() - SP = (IPc:Pc , IPs:Ps)

CONNECTION ESTABLISHMENT

connect() – SP = (localhost:33597 , x.y.z.w:33600)

accept() – SP = (localhost:33600 , a.b.c.d:33597)

At this point, the normal startup of the client and server is complete.

The following steps take place with our Client/Server example:

1. The client calls str_cli, which will block in the call to fgets, because we have not typed a line of input yet.

2. When accept returns in the server, it calls fork and the child calls str_echo. This function calls readline, which calls read, which blocks while waiting for a line to be sent from the client.

3. The server parent, on the other hand, calls accept again, and blocks while waiting for the next client connection.


Normal Termination

We can follow through the steps involved in the normal termination of our client and server:

1. When we type our EOF character, fgets returns a null pointer and the function str_cli returns.

2. When str_cli returns to the client main function, the latter terminates by calling exit.

3. Part of process termination is the closing of all open descriptors, so the client socket is closed by the kernel. This sends a FIN to the server, to which the server TCP responds with an ACK. This is the first half of the TCP connection termination sequence. At this point, the server socket is in the CLOSE_WAIT state and the client socket is in the FIN_WAIT_2 state.

4. When the server TCP receives the FIN, the server child is blocked in a call to readline, and readline then returns 0. This causes the str_echo function to return to the server child main.

5. The server child terminates by calling exit.

6. All open descriptors in the server child are closed. The closing of the connected socket by the child causes the final two segments of the TCP connection termination to take place: a FIN from the server to the client, and an ACK from the client. At this point, the connection is completely terminated. The client socket enters the TIME_WAIT state.

7. Finally, the SIGCHLD signal is sent to the parent when the server child terminates. This occurs in this example, but we do not catch the signal in our code, and the default action of the signal is to be ignored. Thus, the child enters the zombie state. We can verify this with the ps command.

wait and waitpid Functions

We call the wait function to handle the terminated child.

#include <sys/wait.h>

pid_t wait (int *statloc);

pid_t waitpid (pid_t pid, int *statloc, int options);

Both return: process ID if OK, 0 or -1 on error


wait and waitpid both return two values: the return value of the function is the process ID of the terminated child, and the termination status of the child (an integer) is returned through the statloc pointer. There are three macros that we can call that examine the termination status and tell us if the child terminated normally, was killed by a signal, or was just stopped by job control. Additional macros let us then fetch the exit status of the child, or the value of the signal that killed the child, or the value of the job-control signal that stopped the child.

We will use the WIFEXITED and WEXITSTATUS macros for this purpose. If there are no terminated children for the process calling wait, but the process has one or more children that are still executing, then wait blocks until the first of the existing children terminates.
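A minimal sketch of how the two macros are used, assuming a parent with a single child. The function name run_child_and_wait is hypothetical and exists only for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, block in wait until it terminates,
 * and decode the termination status. Returns the child's exit status, or
 * -1 if fork/wait failed or the child did not terminate normally. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: terminate immediately */

    int status;
    pid_t done = wait(&status);     /* blocks until a child terminates */
    if (done != pid)
        return -1;

    if (WIFEXITED(status))          /* true for a normal exit */
        return WEXITSTATUS(status); /* low-order 8 bits of the exit code */
    return -1;                      /* killed by a signal or stopped */
}
```

Because the status is returned through the statloc pointer, the same integer can afterwards be examined with the other status macros if the child did not exit normally.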

waitpid gives us more control over which process to wait for and whether or not to block.

First, the pid argument lets us specify the process ID that we want to wait for. A value of -1 says to wait for the first of our children to terminate. (There are other options, dealing with process group IDs, but we do not need them in this text.) The options argument lets us specify additional options. The most common option is WNOHANG. This option tells the kernel not to block if there are no terminated children.


Termination of Server Process

We will now start our client/server and then kill the server child process. This simulates the crashing of the server process, so we can see what happens to the client. The following steps take place:

1. We start the server and client and type one line to the client to verify that all is okay. That line is echoed normally by the server child.

2. We find the process ID of the server child and kill it. As part of process termination, all open descriptors in the child are closed. This causes a FIN to be sent to the client, and the client TCP responds with an ACK. This is the first half of the TCP connection termination.

3. The SIGCHLD signal is sent to the server parent and handled correctly.

4. Nothing happens at the client. The client TCP receives the FIN from the server TCP and responds with an ACK, but the problem is that the client process is blocked in the call to fgets waiting for a line from the terminal.

5. Running netstat at this point shows the state of the sockets.

linux % netstat -a | grep 9877
tcp        0      0 *:9877              *:*                 LISTEN
tcp        0      0 localhost:9877      localhost:43604     FIN_WAIT2
tcp        1      0 localhost:43604     localhost:9877      CLOSE_WAIT

6. We can still type a line of input to the client. Here is what happens at the client starting from Step 1:

linux % tcpcli01 127.0.0.1      start client
hello                           the first line that we type
hello                           is echoed correctly
                                here we kill the server child on the server host
another line                    we then type a second line to the client
str_cli: server terminated prematurely


When we type "another line," str_cli calls writen and the client TCP sends the data to the server. This is allowed by TCP because the receipt of the FIN by the client TCP only indicates that the server process has closed its end of the connection and will not be sending any more data. The receipt of the FIN does not tell the client TCP that the server process has terminated (which in this case, it has).

When the server TCP receives the data from the client, it responds with an RST since the process that had that socket open has terminated. We can verify that the RST was sent by watching the packets with tcpdump.

7. The client process will not see the RST because it calls readline immediately after the call to writen and readline returns 0 (EOF) immediately because of the FIN that was received in Step 2. Our client is not expecting to receive an EOF at this point so it quits with the error message "server terminated prematurely."

8. When the client terminates, all its open descriptors are closed.

Crashing of Server Host

The following steps take place:

1. When the server host crashes, nothing is sent out on the existing network connections. That is, we are assuming the host crashes and is not shut down by an operator.

2. We type a line of input to the client, it is written by writen, and is sent by the client TCP as a data segment. The client then blocks in the call to readline, waiting for the echoed reply.

3. If we watch the network with tcpdump, we will see the client TCP continually retransmitting the data segment, trying to receive an ACK from the server. Section 25.11 of TCPv2 shows a typical pattern for TCP retransmissions: Berkeley-derived implementations retransmit the data segment 12 times, waiting for around 9 minutes before giving up. When the client TCP finally gives up (assuming the server host has not been rebooted during this time, or, if the server host has not crashed but was unreachable on the network, assuming the host was still unreachable), an error is returned to the client process. Since the client is blocked in the call to readline, it returns an error. Assuming the server host crashed and there were no responses at all to the client's data segments, the error is ETIMEDOUT. But if some intermediate router determined that the server host was unreachable and responded with an ICMP "destination unreachable" message, the error is either EHOSTUNREACH or ENETUNREACH.

Crashing and Rebooting of Server Host

The following steps take place:

1. We start the server and then the client. We type a line to verify that the connection is established.

2. The server host crashes and reboots. We type a line of input to the client, which is sent as a TCP data segment to the server host.

3. When the server host reboots after crashing, its TCP loses all information about connections that existed before the crash. Therefore, the server TCP responds to the received data segment from the client with an RST.

4. Our client is blocked in the call to readline when the RST is received, causing readline to return the error ECONNRESET.
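The errors named in the last two sections can be told apart by inspecting errno after the failed read. The helper below is a hypothetical illustration of that mapping (the name describe_read_error and the message strings are not from the notes), and the comments describe only the scenarios discussed here; other causes of the same errno values exist.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map the errno left by a failed read on the client
 * socket to the failure scenario described in the text. */
const char *describe_read_error(int err)
{
    switch (err) {
    case ETIMEDOUT:
        /* server host crashed: retransmissions were never answered */
        return "timeout: no response from the server host";
    case EHOSTUNREACH:
    case ENETUNREACH:
        /* a router answered with ICMP "destination unreachable" */
        return "unreachable: ICMP destination unreachable received";
    case ECONNRESET:
        /* server host rebooted and its TCP answered with an RST */
        return "reset: peer TCP answered with an RST";
    default:
        return strerror(err);       /* anything else: system message */
    }
}
```

A client would call it as describe_read_error(errno) immediately after the read returns -1, before any other call can overwrite errno.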

Shutdown of Server Host

The previous two sections discussed the crashing of the server host, or the server host being unreachable across the network. We now consider what happens if the server host is shut down by an operator while our server process is running on that host.

When a Unix system is shut down, the init process normally sends the SIGTERM signal to all processes (we can catch this signal), waits some fixed amount of time (often between 5 and 20 seconds), and then sends the SIGKILL signal (which we cannot catch) to any processes still running. This gives all running processes a short amount of time to clean up and terminate. If we do not catch SIGTERM and terminate, our server will be terminated by the SIGKILL signal.

When the process terminates, all open descriptors are closed, and we then follow the same sequence of steps discussed in TERMINATION OF SERVER PROCESS. As stated there, we must use the select or poll function in our client to have the client detect the termination of the server process as soon as it occurs.


UNIT-IV

I/O Multiplexing: The select and poll Functions

Introduction

We saw our TCP client handling two inputs at the same time: standard input and a TCP socket. We encountered a problem when the client was blocked in a call to fgets (on standard input) and the server process was killed. The server TCP correctly sent a FIN to the client TCP, but since the client process was blocked reading from standard input, it never saw the EOF until it read from the socket (possibly much later). What we need is the capability to tell the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is ready to be read, or the descriptor is capable of taking more output). This capability is called I/O multiplexing and is provided by the select and poll functions. We will also cover a newer POSIX variation of the former, called pselect.
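The idea can be sketched with select as follows. This is not the str_cli function itself: wait_input_or_socket and detect_server_close_demo are hypothetical names, and the demo assumes a pipe in place of the idle terminal and an AF_UNIX socketpair in place of the TCP connection.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Block until input_fd or sock_fd is readable.
 * Returns 1 if the socket is readable (data or EOF from the server),
 * 0 if only the input descriptor is readable, -1 on error. */
int wait_input_or_socket(int input_fd, int sock_fd)
{
    fd_set rset;

    FD_ZERO(&rset);
    FD_SET(input_fd, &rset);
    FD_SET(sock_fd, &rset);
    int maxfd = (input_fd > sock_fd) ? input_fd : sock_fd;

    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(sock_fd, &rset) ? 1 : 0;
}

/* Demo: the user types nothing and the "server" end is closed.
 * Returns 1 if the client noticed the EOF without any input, 0 if it did
 * not, -1 on setup error. */
int detect_server_close_demo(void)
{
    int term[2], conn[2];

    if (pipe(term) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, conn) < 0) {
        close(term[0]);
        close(term[1]);
        return -1;
    }
    close(conn[1]);                 /* the "server" goes away */

    char c;
    int which = wait_input_or_socket(term[0], conn[0]);
    int eof = (which == 1 && read(conn[0], &c, 1) == 0);

    close(term[0]);
    close(term[1]);
    close(conn[0]);
    return eof;
}
```

With the blocking fgets of the earlier client, the same situation would go unnoticed until the user typed another line.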


I/O multiplexing is typically used in networking applications in the following scenarios:

• When a client is handling multiple descriptors (normally interactive input and a network socket), I/O multiplexing should be used.

• It is possible, but rare, for a client to handle multiple sockets at the same time.

• If a TCP server handles both a listening socket and its connected sockets, I/O multiplexing is normally used.

• If a server handles TCP and UDP, I/O multiplexing is normally used.

• If a server handles multiple services and perhaps multiple protocols, I/O multiplexing is normally used.

There are normally two distinct phases for an input operation:

1. Waiting for the data to be ready

2. Copying the data from the kernel to the process

For an input operation on a socket, the first step normally involves waiting for data to arrive on the network. When the packet arrives, it is copied into a buffer within the kernel. The second step is copying this data from the kernel's buffer into our application buffer.

I/O Models

The five I/O models that are available to us under Unix are:

• blocking I/O

• nonblocking I/O

• I/O multiplexing (select and poll)

• signal-driven I/O (SIGIO)

• asynchronous I/O (the POSIX aio_ functions)
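As one concrete instance of the list above, the nonblocking model can be sketched with fcntl. The helper names set_nonblocking and nonblocking_read_demo are assumptions for this illustration, and a pipe is used instead of a socket so the sketch needs no network.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd while preserving its other status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Demo: with no data ready (phase 1 incomplete), a nonblocking read
 * returns an error at once instead of putting the process to sleep.
 * Returns 1 if errno was EAGAIN/EWOULDBLOCK, 0 otherwise, -1 on setup
 * error. */
int nonblocking_read_demo(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int result = -1;
    if (set_nonblocking(p[0]) == 0) {
        char c;
        ssize_t n = read(p[0], &c, 1);
        result = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

An application using this model has to call read repeatedly (polling) until data arrives, which is why it tends to waste CPU time compared with I/O multiplexing.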


BLOCKING I/O MODEL:

NONBLOCKING I/O MODEL:


I/O MULTIPLEXING

SIGNAL-DRIVEN I/O


ASYNCHRONOUS I/O MODEL

SELECT FUNCTION

select(): Synchronous I/O Multiplexing

This function is somewhat strange, but it's very useful. Take the following situation: you are a server and you want to listen for incoming connections as well as keep reading from the connections you already have.

No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What if you're blocking on an accept() call? How are you going to recv() data at the same time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?

select() gives you the power to monitor several sockets at the same time. It'll tell you which ones are ready for reading, which are ready for writing, and which sockets have raised exceptions, if you really want to know that.


This being said, in modern times select(), though very portable, is one of the slowest methods for monitoring sockets. One possible alternative is libevent, or something similar, that encapsulates all the system-dependent stuff involved with getting socket notifications.

Without any further ado, I'll offer the synopsis of select():

#include <sys/time.h>

#include <sys/types.h>

#include <unistd.h>

int select(int numfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The function monitors "sets" of file descriptors; in particular readfds, writefds, and exceptfds. If you want to see if you can read from standard input and some socket descriptor, sockfd, just add the file descriptors 0 and sockfd to the set readfds. The parameter numfds should be set to the value of the highest file descriptor plus one. In this example, it should be set to sockfd+1, since it is assuredly higher than standard input (0).

When select() returns, readfds will be modified to reflect which of the file descriptors you selected are ready for reading. You can test them with the macro FD_ISSET(), below.

Before progressing much further, I'll talk about how to manipulate these sets. Each set is of the type fd_set. The following macros operate on this type:

FD_SET(int fd, fd_set *set);    Add fd to the set.
FD_CLR(int fd, fd_set *set);    Remove fd from the set.
FD_ISSET(int fd, fd_set *set);  Return true if fd is in the set.
FD_ZERO(fd_set *set);           Clear all entries from the set.
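Here is a small sketch of the four macros working together. The function names count_members and fd_set_demo are made up for this illustration, and the descriptor numbers 0, 3 and 5 are arbitrary; no I/O is performed on them.

```c
#include <sys/select.h>

/* Count how many descriptors in [0, maxfd] are members of *set. */
int count_members(fd_set *set, int maxfd)
{
    int n = 0;

    for (int fd = 0; fd <= maxfd; fd++)
        if (FD_ISSET(fd, set))
            n++;
    return n;
}

/* Demo: add descriptors 0, 3 and 5, remove 3, count what is left. */
int fd_set_demo(void)
{
    fd_set set;

    FD_ZERO(&set);          /* always clear a set before first use */
    FD_SET(0, &set);
    FD_SET(3, &set);
    FD_SET(5, &set);
    FD_CLR(3, &set);
    return count_members(&set, 5);  /* members are now 0 and 5 */
}
```

The loop bound in count_members plays the same role as numfds in select(): it has to reach the highest descriptor in the set.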

Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait forever for someone to send you some data. Maybe every 96 seconds you want to print "Still Going..." to the terminal even though nothing has happened. This time structure allows you to specify a timeout period. If the time is exceeded and select() still hasn't found any ready file descriptors, it'll return so you can continue processing.


The struct timeval has the following fields:

struct timeval {
    time_t      tv_sec;     // seconds
    suseconds_t tv_usec;    // microseconds
};

Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000 microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000 microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter μ (Mu) that we use for "micro". Also, when the function returns, timeout might be updated to show the time still remaining. This depends on what flavor of Unix you're running.

Yay! We have a microsecond resolution timer! Well, don't count on it. You'll probably have to wait some part of your standard Unix timeslice no matter how small you set your struct timeval.

Other things of interest: If you set the fields in your struct timeval to 0, select() will timeout immediately, effectively polling all the file descriptors in your sets. If you set the parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL in the call to select().
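The zero-timeout case might be wrapped up like this. It is only a sketch: readable_now and polling_demo are hypothetical names, and a pipe stands in for the socket so it can be tried without a network.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll fd once: returns 1 if it is readable right now, 0 if not,
 * -1 on error. */
int readable_now(int fd)
{
    fd_set rset;
    struct timeval tv = { 0, 0 };   /* both fields 0: do not wait */

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    int n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &rset);
}

/* Demo on a pipe: not readable while empty, readable after one write.
 * Returns 1 if both observations hold, 0 if not, -1 on setup error. */
int polling_demo(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int before = readable_now(p[0]);
    int wrote = (write(p[1], "x", 1) == 1);
    int after = readable_now(p[0]);

    close(p[0]);
    close(p[1]);
    return (before == 0 && wrote && after == 1);
}
```

The write and except sets are passed as NULL here because the helper only cares about readability.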

The following code snippet waits 2.5 seconds for something to appear on standard input:

/*
** select.c -- a select() demo
*/

#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/select.h>
#include <unistd.h>

#define STDIN 0  // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    if (select(STDIN+1, &readfds, NULL, NULL, &tv) == -1) {
        perror("select");
        return 1;
    }

    if (FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}

If you're on a line buffered terminal, the key you hit should be RETURN or it will time out anyway.

Now, some of you might think this is a great way to wait for data on a datagram socket—and you are right: it might be. Some Unices can use select in this manner, and some can't. You should see what your local man page says on the matter if you want to attempt it.

Some Unices update the time in your struct timeval to reflect the amount of time still remaining before a timeout. But others do not. Don't rely on that occurring if you want to be portable. (Use gettimeofday() if you need to track time elapsed. It's a bummer, I know, but that's the way it is.)

What happens if a socket in the read set closes the connection? Well, in that case, select() returns with that socket descriptor set as "ready to read". When you actually do recv() from it, recv() will return 0. That's how you know the client has closed the connection.

One more note of interest about select(): if you have a socket that is listen()ing, you can check to see if there is a new connection by putting that socket's file descriptor in the readfds set.

And that, my friends, is a quick overview of the almighty select() function.


But, by popular demand, here is an in-depth example. Unfortunately, the difference between the dirt-simple example, above, and this one here is significant. But have a look, then read the description that follows it.

This program acts like a simple multi-user chat server. Start it running in one window, then telnet to it ("telnet hostname 9034") from multiple other windows. When you type something in one telnet session, it should appear in all the others.

/*
** selectserver.c -- a cheezy multiperson chat server
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define PORT "9034"   // port we're listening on

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }

    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    fd_set master;    // master file descriptor list
    fd_set read_fds;  // temp file descriptor list for select()
    int fdmax;        // maximum file descriptor number

    int listener;     // listening socket descriptor
    int newfd;        // newly accept()ed socket descriptor
    struct sockaddr_storage remoteaddr; // client address
    socklen_t addrlen;

    char buf[256];    // buffer for client data
    int nbytes;

    char remoteIP[INET6_ADDRSTRLEN];

    int yes=1;        // for setsockopt() SO_REUSEADDR, below
    int i, j, rv;

    struct addrinfo hints, *ai, *p;

    FD_ZERO(&master);    // clear the master and temp sets
    FD_ZERO(&read_fds);

    // get us a socket and bind it
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    if ((rv = getaddrinfo(NULL, PORT, &hints, &ai)) != 0) {
        fprintf(stderr, "selectserver: %s\n", gai_strerror(rv));
        exit(1);
    }

    for(p = ai; p != NULL; p = p->ai_next) {
        listener = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listener < 0) {
            continue;
        }

        // lose the pesky "address already in use" error message
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));

        if (bind(listener, p->ai_addr, p->ai_addrlen) < 0) {
            close(listener);
            continue;
        }

        break;
    }

    // if we got here, it means we didn't get bound
    if (p == NULL) {
        fprintf(stderr, "selectserver: failed to bind\n");
        exit(2);
    }

    freeaddrinfo(ai); // all done with this

    // listen
    if (listen(listener, 10) == -1) {
        perror("listen");
        exit(3);
    }

    // add the listener to the master set
    FD_SET(listener, &master);

    // keep track of the biggest file descriptor
    fdmax = listener; // so far, it's this one

    // main loop
    for(;;) {
        read_fds = master; // copy it
        if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
            perror("select");
            exit(4);
        }

        // run through the existing connections looking for data to read
        for(i = 0; i <= fdmax; i++) {
            if (FD_ISSET(i, &read_fds)) { // we got one!!
                if (i == listener) {
                    // handle new connections
                    addrlen = sizeof remoteaddr;
                    newfd = accept(listener,
                        (struct sockaddr *)&remoteaddr,
                        &addrlen);

                    if (newfd == -1) {
                        perror("accept");
                    } else {
                        FD_SET(newfd, &master); // add to master set
                        if (newfd > fdmax) {    // keep track of the max
                            fdmax = newfd;
                        }
                        printf("selectserver: new connection from %s on "
                            "socket %d\n",
                            inet_ntop(remoteaddr.ss_family,
                                get_in_addr((struct sockaddr*)&remoteaddr),
                                remoteIP, INET6_ADDRSTRLEN),
                            newfd);
                    }
                } else {
                    // handle data from a client
                    if ((nbytes = recv(i, buf, sizeof buf, 0)) <= 0) {
                        // got error or connection closed by client
                        if (nbytes == 0) {
                            // connection closed
                            printf("selectserver: socket %d hung up\n", i);
                        } else {
                            perror("recv");
                        }
                        close(i); // bye!
                        FD_CLR(i, &master); // remove from master set
                    } else {
                        // we got some data from a client
                        for(j = 0; j <= fdmax; j++) {
                            // send to everyone!
                            if (FD_ISSET(j, &master)) {
                                // except the listener and ourselves
                                if (j != listener && j != i) {
                                    if (send(j, buf, nbytes, 0) == -1) {
                                        perror("send");
                                    }
                                }
                            }
                        }
                    }
                } // END handle data from client
            } // END got new incoming connection
        } // END looping through file descriptors
    } // END for(;;)--and you thought it would never end!

    return 0;
}

Notice I have two file descriptor sets in the code: master and read_fds. The first, master, holds all the socket descriptors that are currently connected, as well as the socket descriptor that is listening for new connections.

The reason I have the master set is that select() actually changes the set you pass into it to reflect which sockets are ready to read. Since I have to keep track of the connections from one call of select() to the next, I must store these safely away somewhere. At the last minute, I copy the master into the read_fds, and then call select().

But doesn't this mean that every time I get a new connection, I have to add it to the master set? Yup! And every time a connection closes, I have to remove it from the master set? Yes, it does.

Notice I check to see when the listener socket is ready to read. When it is, it means I have a new connection pending, and I accept() it and add it to the master set. Similarly, when a client connection is ready to read, and recv() returns 0, I know the client has closed the connection, and I must remove it from the master set.

If the client recv() returns a positive value, though, I know some data has been received. So I get it, and then go through the master list and send that data to all the rest of the connected clients.


And that, my friends, is a less-than-simple overview of the almighty select() function.

In addition, here is a bonus afterthought: there is another function called poll() which behaves much the same way select() does, but with a different system for managing the file descriptor sets.

http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#select

POLL FUNCTION

poll()

Test for events on multiple sockets simultaneously

Prototypes

#include <sys/poll.h>

int poll(struct pollfd *ufds, unsigned int nfds, int timeout);

Description

This function is very similar to select() in that they both watch sets of file descriptors for events, such as incoming data ready to recv(), socket ready to send() data to, out-of-band data ready to recv(), errors, etc.

The basic idea is that you pass an array of nfds struct pollfds in ufds, along with a timeout in milliseconds (1000 milliseconds in a second). The timeout can be negative if you want to wait forever. If no event happens on any of the socket descriptors by the timeout, poll() will return.

Each element in the array of struct pollfds represents one socket descriptor, and contains the following fields:

struct pollfd {
    int fd;         // the socket descriptor
    short events;   // bitmap of events we're interested in
    short revents;  // when poll() returns, bitmap of events that occurred
};
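A minimal sketch of filling in one struct pollfd and calling poll() follows. Several things here are assumptions rather than part of the text above: the helper names wait_readable and poll_demo are hypothetical, the POSIX header <poll.h> is used in place of <sys/poll.h>, POLLIN is the standard event bit for "data ready to read", and a pipe stands in for a socket.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;    /* the event we are interested in */
    pfd.revents = 0;        /* filled in by poll() */

    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return n > 0 && (pfd.revents & POLLIN) != 0;
}

/* Demo on a pipe: times out while empty, reports readable after a write.
 * Returns 1 if both observations hold, 0 if not, -1 on setup error. */
int poll_demo(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int before = wait_readable(p[0], 0);    /* 0 ms: just check */
    int wrote = (write(p[1], "x", 1) == 1);
    int after = wait_readable(p[0], 1000);  /* data is already waiting */

    close(p[0]);
    close(p[1]);
    return (before == 0 && wrote && after == 1);
}
```

Unlike select(), the array passed to poll() is not overwritten with the result: the request stays in events and the answer comes back in revents, so the same array can be reused on the next call.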