2.2 Network Protocol Stack
2.2.1 Application Layer Standards
Application layer protocols sit closest to the end user in the protocol stack: they interact directly with users and are shaped by their differing requirements. Protocols at this layer provide host-to-host communication for applications and address the service requirements of those applications. Several widely used application layer protocols are discussed in more detail below.
Figure 2.7 TCP/IP protocol suite
The Hypertext Transfer Protocol (HTTP) is the basis of data transfer on the World Wide Web. The version in common use is standardized in RFC 2616 [38] and runs on top of a reliable transport layer protocol such as the Transmission Control Protocol (TCP). HTTP mainly defines how data should be formatted and transferred, and how requests on the client side and responses on the server side interact. Although many streaming protocols use UDP, UDP traffic is sometimes blocked by firewalls. In this case, HTTP-based streaming provides a solution, as it runs on top of TCP and requires only a simple web server. For this reason, HTTP streaming has one of the largest market penetrations, and HTTP traffic accounts for a large fraction of the Internet bandwidth used for streaming.
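To make the request and response formatting concrete, the following minimal sketch builds the text of an HTTP/1.1 GET request and parses the status line of a response. The function names and the example host and path are illustrative assumptions, not part of any library; the sketch only formats and parses bytes and does not open a TCP connection.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request (hypothetical helper).

    Header lines end with CRLF, and an empty line ends the header section.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that terminates the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first response line into (version, status code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

A streaming client would write the bytes returned by build_get_request to a TCP socket and read the media segment from the response body once the status line indicates success.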
The File Transfer Protocol (FTP) is another widely deployed protocol, used for file transfer. It is standardized in RFC 959 [39]. FTP uses separate control and data connections between client and server. The control connection carries user authentication and command exchange and remains open while a file transfer is in progress, whereas the data connection is active only during data transmission.
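One way to see the split between the two connections is the passive mode reply defined in RFC 959: the server announces, over the control connection, the address and port the client should use for the data connection. The sketch below parses such a reply. The function name is a hypothetical helper and the example reply text is made up for illustration.

```python
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract the data connection endpoint from a '227' passive mode reply.

    The reply carries six numbers (h1,h2,h3,h4,p1,p2): the first four form
    the IPv4 address and the port is p1 * 256 + p2.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a passive mode reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For instance, the reply "227 Entering Passive Mode (192,168,1,2,19,137)" yields the endpoint 192.168.1.2 on port 5001, to which the client then opens the short-lived data connection.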
The Simple Mail Transfer Protocol (SMTP), defined in RFC 2821 [40], is a standard for email transmission. It is normally used for outgoing transfers, for example when a client sends email. Incoming mail, on the other hand, is fetched by the client using either the Post Office Protocol (POP), defined in RFC 1939 [41], or the Internet Message Access Protocol (IMAP), defined in RFC 3501 [42].
The Real-Time Transport Protocol (RTP) [43] supports end-to-end delivery of real-time audio and video streams. Applications typically run RTP on top of UDP to make use of its multiplexing and checksum capabilities. RTP itself does not guarantee timely or in-order packet delivery, nor does it provide QoS support. The sequence number carried in each RTP packet only allows the receiver to reconstruct the sender's packet sequence. An RTP session consists of one or more participants, each of which can send or receive media data. Each participant is identified by a network address and two port numbers: one port for media data and the other for the RTP Control Protocol (RTCP). Participants can choose the media types they are willing to receive. For example, a participant may want to receive only the audio part of a video stream.
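The role of the sequence number can be illustrated with a short sketch that decodes the 12-byte fixed RTP header and puts received packets back into sender order. The class and function names are assumptions made for this example, and the reordering deliberately ignores wraparound of the 16-bit sequence number, which a real receiver would have to handle.

```python
import struct
from typing import Iterable, List, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header (first 12 bytes, network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,          # top two bits of the first byte
        marker=bool(byte1 >> 7),     # top bit of the second byte
        payload_type=byte1 & 0x7F,   # remaining seven bits
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


def in_sender_order(packets: Iterable[bytes]) -> List[bytes]:
    """Sort packets by sequence number (simplification: no wraparound)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p).sequence_number)
```

Gaps in the sorted sequence numbers indicate lost packets; RTP reports nothing about them itself, which is why loss statistics are fed back through RTCP.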
The RTP Control Protocol (RTCP) [44] aims to maintain a high QoS level for RTP by providing feedback information, such as packet loss and jitter, to all participants. It works alongside RTP and does not carry any media content. The feedback information is used to adjust the media transfer rate. It can also be used to monitor network conditions and to diagnose problems in data distribution among receivers. RTCP packets are normally carried over the same transport as RTP, typically UDP on a separate port. Figure 2.8 illustrates an RTP session.
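As an illustration of how a receiver might derive the two feedback quantities mentioned above, the sketch below computes a running interarrival jitter estimate and a loss fraction. The jitter update follows the smoothing rule given in the RTP specification (RFC 3550), in which each new difference in transit time is weighted by 1/16. The function names and the input representation are assumptions for this example, and timestamps are assumed to be expressed in the same units.

```python
from typing import Iterable, Tuple


def interarrival_jitter(samples: Iterable[Tuple[float, float]]) -> float:
    """Running jitter estimate from (send_timestamp, arrival_time) pairs.

    For consecutive packets, D is the change in transit time; the estimate
    is updated as J = J + (|D| - J) / 16.
    """
    jitter = 0.0
    previous_transit = None
    for sent, arrived in samples:
        transit = arrived - sent
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return jitter


def fraction_lost(expected: int, received: int) -> float:
    """Share of expected packets that did not arrive in a reporting interval."""
    if expected <= 0:
        return 0.0
    return max(expected - received, 0) / expected
```

A sender that receives reports with a rising loss fraction or jitter could respond by lowering its media transfer rate, which is the adaptation described in the text.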
Figure 2.8 Illustration of an RTP session
The Real Time Streaming Protocol (RTSP) [45] is an application layer protocol designed to control streaming media servers used in entertainment and communications systems. It supports establishing and controlling media sessions between end points. The protocol itself does not transmit the streaming data; most RTSP servers use RTP and RTCP for data delivery. An RTSP session is not bound to any specific transport layer protocol, and the protocol provides an extensible framework for choosing media delivery channels such as UDP, multicast UDP and TCP, as well as RTP-based delivery mechanisms. An RTSP session starts when the client requests that a presentation or media stream be started at the server. The server labels each session with an identifier. This session identifier represents the state shared between server and client and is used in all subsequent control requests. If the state is lost, for example when RTCP messages stop arriving while RTP is in use, the server stops transmitting media. Figure 2.9 depicts an RTSP session and illustrates the basic requests used in RTSP.
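The way the session identifier ties control requests together can be sketched by formatting RTSP request messages as text. The helper name, URL and session value below are illustrative assumptions; the sketch only builds the message and does not talk to a server.

```python
from typing import Dict, Optional


def build_rtsp_request(
    method: str,
    url: str,
    cseq: int,
    session: Optional[str] = None,
    extra_headers: Optional[Dict[str, str]] = None,
) -> str:
    """Format an RTSP/1.0 request (hypothetical helper).

    CSeq numbers each request; the Session header is included once the
    server has issued an identifier, typically in its reply to SETUP.
    """
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

A client would typically send SETUP without a Session header, read the identifier from the response, and then pass it as the session argument for the PLAY, PAUSE and TEARDOWN requests that follow.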
The Session Description Protocol (SDP) [46] carries media details, transport addresses and other session description metadata to participants. It is normally used by other streaming protocols, such as RTSP, to describe a multimedia session for the purposes of session announcement, session invitation or other forms of multimedia session initiation. A typical SDP session description consists of the name and purpose of the session, the time during which the session is active, the types of media comprising the session, and the information needed to receive the data, such as address, port and format.
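An SDP description is a sequence of text lines of the form type=value, so the fields listed above can be pulled out with a few lines of code. The sketch below handles only the session name (s=), connection data (c=), timing (t=) and media (m=) lines; the function name and the returned dictionary layout are assumptions for this example, not a complete SDP parser.

```python
from typing import Any, Dict


def parse_sdp(text: str) -> Dict[str, Any]:
    """Extract a few fields from an SDP session description."""
    session: Dict[str, Any] = {"name": None, "connection": None,
                               "timing": None, "media": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not a type=value line
        kind, value = line[0], line[2:]
        if kind == "s":
            session["name"] = value
        elif kind == "c" and session["connection"] is None:
            session["connection"] = value  # first c= line only
        elif kind == "t":
            start, stop = value.split()
            session["timing"] = (int(start), int(stop))
        elif kind == "m":
            media_type, port, transport, *formats = value.split()
            session["media"].append({
                "type": media_type,
                "port": int(port),
                "transport": transport,
                "formats": formats,
            })
    return session
```

Applied to a description containing the lines "m=audio 49170 RTP/AVP 0" and "m=video 51372 RTP/AVP 96", the result lists one audio and one video stream together with the ports on which they are to be received.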
Figure 2.9 Illustration of an RTSP session and basic RTSP requests
The Session Initiation Protocol (SIP) [47] is an application layer protocol widely used for controlling multimedia communication sessions. It can be used for two-party or multi-party sessions involving audio and video communication. The design of SIP is similar to that of HTTP: the client issues a request using a particular method and receives a response from the server. SIP reuses some of the header fields, encoding rules and status codes of HTTP.
Dynamic Adaptive Streaming over HTTP (DASH) [48] is another protocol designed for multimedia transmission. DASH splits the multimedia content into segments delivered over HTTP, each of which is made available at various bitrates. The client selects, in real time, the segment with the highest bitrate that the current network conditions can sustain, so that playback remains smooth without stalls or rebuffering events.
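The selection step can be sketched as a simple throughput-based rule: pick the highest available bitrate that fits within a fraction of the measured throughput. This is only one possible adaptation rule; DASH leaves the choice to the client. The function name, the safety factor of 0.8 and the bitrate ladder in the usage note are assumptions chosen for illustration.

```python
from typing import Sequence


def select_bitrate(
    available_bps: Sequence[int],
    throughput_bps: float,
    safety_factor: float = 0.8,
) -> int:
    """Choose the bitrate for the next segment (hypothetical rule).

    Only a fraction of the measured throughput is budgeted, leaving
    headroom for fluctuations. If even the lowest bitrate exceeds the
    budget, the lowest one is returned so that playback can continue.
    """
    if not available_bps:
        raise ValueError("no bitrates available")
    budget = throughput_bps * safety_factor
    candidates = [rate for rate in available_bps if rate <= budget]
    return max(candidates) if candidates else min(available_bps)
```

With a ladder of 500 kbit/s, 1 Mbit/s, 2.5 Mbit/s and 5 Mbit/s and a measured throughput of 3.5 Mbit/s, the budget is 2.8 Mbit/s and the rule selects the 2.5 Mbit/s representation; if throughput drops to 400 kbit/s it falls back to 500 kbit/s.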