3.1.3 Adaptation
Although recent developments within the IETF focus on QoS issues and discuss how to provide QoS for packet-switched networks such as the Internet, none of the QoS mechanisms is widely deployed. Thus, today’s media streaming applications must still tolerate variations in the QoS delivered by the network (i.e. dynamic changes in delay variation, throughput and packet loss).
Mechanisms that provide continuous service even when external conditions change (e.g. network congestion, router queue overflows and processing overload) are commonly known as QoS adaptation mechanisms. Adaptive applications are able to gracefully adapt their service quality depending on the QoS received from lower-level services. Even severe service fluctuations can be accommodated by means of QoS adaptation mechanisms. In such cases, however, it is often appropriate to inform the application of the service degradation so that it can adjust to the new QoS level [CCH92]. If the delivered performance violates the negotiated QoS (for example, QoS reserved by means of RSVP), the user may choose to take some remedial action (e.g. adjust the application state to accommodate the current load conditions, re-negotiate the flow’s QoS, or disconnect from the service).
Application-level QoS adaptation mainly consists of raising or lowering the QoS properties of the application in response to variations in the network QoS characteristics. Adaptation, for example, changes the media stream (e.g. audio quality, encoding format), adds redundancy to the stream, or adjusts the receiver buffer size (i.e. playout point estimation) so that users perceive a constant network service quality. This trick, however, works only as long as the “real” QoS provided by the underlying network stays within a certain range. If the “real” QoS degrades below the “adaptation limits”, adaptation can no longer operate properly and the quality remains poor. An example of QoS control for adaptive distributed multimedia applications is given in [GS95].
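The idea of adapting within limits can be sketched in a few lines. The following is only an illustration, not a mechanism taken from [GS95]: the encoding ladder, its bit rates and the loss thresholds are hypothetical values chosen for the example.

```python
# Hypothetical encoding ladder (name, bit rate in kbit/s), lowest quality first.
ENCODINGS = [("low", 8), ("medium", 32), ("high", 64)]


def adapt_level(level, loss_rate, low=0.02, high=0.10):
    """Return the new index into ENCODINGS for a measured packet loss rate.

    The thresholds `low` and `high` are assumptions made for this sketch.
    """
    if loss_rate > high and level > 0:
        return level - 1          # degrade gracefully under heavy loss
    if loss_rate < low and level < len(ENCODINGS) - 1:
        return level + 1          # recover quality when the network improves
    return level                  # inside the dead band, or at a limit
```

Once the lowest level is reached, further degradation of the network cannot be compensated, which corresponds to the “adaptation limits” mentioned above.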
If the network supports QoS by means of resource reservation, applications have “guaranteed” resources for their media stream and hence need not adapt to changing network QoS characteristics. Thus, in an environment where resource reservation is available, adaptation is only user-initiated.
A recent comparison of best-effort service versus resource reservation [BS98] points out that adaptation mechanisms enable multimedia applications over simple best-effort networks to perform, from a user’s point of view, very similarly to applications that make use of resource reservation. The utility or value that users derive from adaptive applications under moderate QoS conditions is almost equal to that of rigid applications with resource reservation support. Since adaptation compensates only for limited service fluctuations, the remaining question is to what extent applications can adapt; the comparison did not find an answer. In summary, adaptive applications significantly improve performance under moderate to high network load. However, they can compensate for service degradation only up to a certain level.
3.1. APPLICATION LAYER QOS 79
3.1.4 Receiver Buffering
Since the quality of real-time media depends mainly on timely delivery and play out of the stream data, protocols and mechanisms must address the control of delay, jitter and reliability in an integrated fashion.
Receiver buffering is required to compensate for delay variations, also called jitter, introduced by the network and by processing in the end systems (see section 1.2.6). It is also important for resolving transmission anomalies such as packet reordering. The tasks of receiver buffering are to estimate the optimal playout delay, which is required to compute the buffering time, and to manage the queuing of the received packets until their playback point is reached. The playback time for each packet is usually determined by the timestamp assigned by the sender and an estimate of the network and processing delay.
$T_{Playback} = T_{Recording} + D_{Network} + D_{Processing}$ (3.4)
The processing delay estimate $D_{Processing}$ accounts only for the processing delay (i.e. decoding, decompression, scheduling) at the receiver. Since receivers cannot distinguish the delay (or jitter) caused by sender processing from that caused by the network, the network delay estimate $D_{Network}$ covers the entire packet delay up to the receiver.
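The two buffering tasks described above (computing the playback time according to equation 3.4 and queuing packets until that time) can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `PlayoutBuffer` and its methods are hypothetical, both delay estimates are fixed, and sender and receiver timestamps are assumed to share one time base in milliseconds.

```python
import heapq


class PlayoutBuffer:
    """Queue packets until their playback time (Eq. 3.4) is reached."""

    def __init__(self, d_network, d_processing):
        self.d_network = d_network        # network delay estimate (ms)
        self.d_processing = d_processing  # processing delay estimate (ms)
        self._heap = []                   # min-heap ordered by playback time

    def playback_time(self, t_recording):
        # T_Playback = T_Recording + D_Network + D_Processing
        return t_recording + self.d_network + self.d_processing

    def push(self, t_recording, payload, now):
        """Queue a packet; return False if it arrived after its playback time."""
        t_play = self.playback_time(t_recording)
        if now > t_play:
            return False                  # too late, treated as lost
        heapq.heappush(self._heap, (t_play, payload))
        return True

    def pop_due(self, now):
        """Return the payloads whose playback time has been reached, in order."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

Because the heap is ordered by playback time, packets that arrive out of order are still played back in recording order.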
The playout delay can be constant throughout the entire session or can be adjusted adaptively during the session. Since end-to-end delays in the Internet vary significantly over time [Bol93, S+97], a constant, non-adaptive playout delay estimate performs poorly.
Adaptive playout adjustment can be accomplished on a “per-talkspurt” or a “per-packet” basis. In the context of packet audio, the playout delay estimate for the first packet of a talkspurt (footnote 2) is crucial since it determines the playback time for all subsequent packets of the talkspurt. It should be noted that playback gaps in the audio signal are immediately noticed and perceived as disturbing crackles (compare section 1.3.1.3). In the case of video streaming, playout delay adjustment can be done on a “per-packet” basis if the video and the audio are coupled only loosely, or if the video is played on its own. Human image recognition does not notice small variations in the display time of individual frames.
As an example, the problem of playout estimation in the case of packet audio streaming is discussed here. Figure 3.1 illustrates the problem of delay variations caused by the network. The playout delay of the i-th packet, $d_{p_i}$, is the sum of the network delay $d_i$ and the buffering time $b_i$.
$d_i = a_i - t_i$ (3.5)
Footnote 2: Time interval that encompasses speech or music data rather than silence; transmission of silence is
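[Figure 3.1 placeholder: sender, arrival and playout (receiver) time axes showing the send times t_1 ... t_n, arrival times a_1 ... a_n and playout times p_1 ... p_n of packets in talkspurts separated by silence; the arrival a_n is marked as late.]
Figure 3.1: Timings associated with individual packets and their talkspurts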
$b_i = p_i - a_i$ (3.6)
$d_{p_i} = d_i + b_i = p_i - t_i$ (3.7)
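As a worked example of equations 3.5 to 3.7, consider illustrative values (not taken from the text) expressed on a common time base: a packet is sent at 100 ms, arrives at 160 ms and is played out at 200 ms.

```latex
\begin{align*}
t_i &= 100\,\text{ms}, \quad a_i = 160\,\text{ms}, \quad p_i = 200\,\text{ms} \\
d_i &= a_i - t_i = 160 - 100 = 60\,\text{ms} \\
b_i &= p_i - a_i = 200 - 160 = 40\,\text{ms} \\
d_{p_i} &= d_i + b_i = 60 + 40 = 100\,\text{ms} = p_i - t_i
\end{align*}
```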
The main problem of network delay estimation is in general that the sender and receiver clocks are not synchronized, and hence, it is not trivial to calculate the absolute delay. The delay variation required to estimate the playout delay, however, does not depend on absolute times. The jitter of the i-th packet can simply be calculated as follows:
$j_i = |(a_i - a_{i-1}) - (t_i - t_{i-1})|$ (3.8)
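Equation 3.8 translates directly into code. The sketch below uses a hypothetical function name and made-up timestamps in milliseconds; it also illustrates that a constant offset between the sender and receiver clocks cancels out.

```python
def packet_jitter(send_times, arrival_times):
    """Return j_i (Eq. 3.8) for every packet except the first.

    send_times are the sender timestamps t_i and arrival_times the receiver
    timestamps a_i; the two clocks need not be synchronized.
    """
    return [
        abs((arrival_times[i] - arrival_times[i - 1])
            - (send_times[i] - send_times[i - 1]))
        for i in range(1, len(send_times))
    ]


send = [0, 20, 40, 60]               # packets sent every 20 ms
arrive = [1050, 1075, 1090, 1110]    # receiver clock with an unknown offset
jitter = packet_jitter(send, arrive)

# Shifting the receiver clock by a constant does not change the result.
shifted = packet_jitter(send, [a + 500 for a in arrive])
```

Only differences of timestamps taken on the same clock enter the calculation, which is why no clock synchronization is needed.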
If packets arrived at equal time intervals, meaning that packet jitter is zero, the audio packets could be played back immediately on reception. However, since packets on store-and-forward networks experience different transmission delays ($j_i > 0$), receiver buffering is an absolute necessity.
The calculation of the optimal playback delay has the competing goals of minimizing the extra delay introduced by buffering while maximizing the number of packets that arrive prior to their playback time. Late packets that arrive after their playout point ($p_i < a_i$) are regarded as lost. Increasing the playout delay (in other words, the buffering time) to prevent packets from being late is, however, not a good solution. Long playout delays that compensate for extreme delay variations increase the total end-to-end delay and thus limit the usability of interactive real-time streaming applications.
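The trade-off can be made concrete with a small sketch. A packet is late when $p_i < a_i$, which by equations 3.5 and 3.7 is equivalent to its network delay exceeding the playout delay. The function name and the delay samples below are hypothetical.

```python
def late_fraction(network_delays, playout_delay):
    """Fraction of packets that miss their playout point.

    A packet is late when p_i < a_i, i.e. when its network delay d_i
    exceeds the playout delay d_p applied to it.
    """
    late = sum(1 for d in network_delays if d > playout_delay)
    return late / len(network_delays)


delays_ms = [40, 45, 50, 60, 120]    # made-up per-packet network delays

# A longer playout delay lowers the late-loss rate but adds end-to-end delay.
trade_off = {d_p: late_fraction(delays_ms, d_p) for d_p in (50, 60, 120)}
```

In this example, eliminating the last late packet requires doubling the playout delay from 60 ms to 120 ms, which is the kind of cost that makes long playout delays unattractive for interactive use.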
3.1.4.1 Network Delay
In order to compensate for the jitter introduced by the network, receiver applications need to know the current delay and delay variation of the network. Several methods for network delay estimation ($D_{Network}$) have been proposed [Mon83, AC+93, R+94, MKT98].
The network delay estimate is calculated either statically at the beginning of a session, or dynamically by continuously adjusting the delay according to the instantaneous network state. Buffering mechanisms that rely on dynamic delay estimation adapt the buffering time to changing network delay variations.
Adaptation to delay changes in the network requires some form of filtering of past samples, such as a low-pass filter modeled after the TCP round-trip time estimator. For wide-area Internet transmissions, sudden large changes in the delay (delay spikes) can skew network delay estimates badly. The study in [R+94] develops a network delay estimation algorithm that explicitly considers the phenomenon of delay spikes. Simulations based on wide-area Internet audio traces have shown that this estimator performs better than conventional buffering techniques without spike detection. A similar approach, used in the mechanism presented in [MKT98], is designed to recover quickly from sudden delay spikes and presents evidence of good performance.
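The following sketch shows the general shape of such an estimator: an exponentially weighted low-pass filter for the mean delay and its deviation, plus a deliberately simplified spike rule. It is not the algorithm of [R+94] or [MKT98]; the class name, the filter gain `alpha`, the spike threshold and the safety factor `k` are assumptions made for illustration.

```python
class DelayEstimator:
    """Low-pass delay filter with a simplified spike rule (illustrative only)."""

    def __init__(self, alpha=0.9, spike_threshold=100.0, k=4.0):
        self.alpha = alpha                      # filter gain (assumed value)
        self.spike_threshold = spike_threshold  # jump in ms treated as a spike
        self.k = k                              # safety factor on the deviation
        self.mean = None                        # smoothed delay estimate
        self.dev = 0.0                          # smoothed absolute deviation

    def update(self, delay):
        """Feed one measured delay sample (ms)."""
        if self.mean is None:
            self.mean = delay                   # the first sample seeds the filter
            return
        if delay - self.mean > self.spike_threshold:
            self.mean = delay                   # spike: follow the delay at once
        else:
            self.mean = self.alpha * self.mean + (1 - self.alpha) * delay
        self.dev = (self.alpha * self.dev
                    + (1 - self.alpha) * abs(delay - self.mean))

    def playout_delay(self):
        """Playout delay estimate; call only after at least one update()."""
        return self.mean + self.k * self.dev
```

Without the spike branch, a single very large sample would be averaged in slowly and the estimate would lag behind the actual delay for many packets. Because the jitter calculation does not depend on absolute times, the samples may carry an unknown constant clock offset.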
An interesting relationship between buffering techniques controlled simply by the packets’ transmission characteristics and open-loop error control schemes, such as FEC (see section 3.1.2), is documented in [D+94]. Generally, playout delay estimation based only on the network jitter does not provide adequate service when error control is also an issue. Supporting an open-loop error control scheme, such as FEC, requires modification of the receiver buffering algorithms. It is suggested that applications use buffering times long enough to ensure with high probability that a copy of a lost audio packet has arrived at the receiver before its playback point is reached.
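One simple way to picture this requirement is shown below. The sketch assumes a hypothetical redundancy scheme in which the copy of packet i is carried in packet i + `fec_offset` and packets are sent at a fixed interval; neither the scheme nor the function name is taken from [D+94].

```python
def fec_playout_delay(jitter_based_delay, fec_offset, packet_interval):
    """Playout delay (ms) that leaves time for the redundant copy to arrive.

    Assumption: the copy of packet i travels in packet i + fec_offset, which
    is sent fec_offset * packet_interval ms later and then experiences the
    same kind of network delay as any other packet.
    """
    return jitter_based_delay + fec_offset * packet_interval
```

With a jitter-based delay of 80 ms, copies carried two packets later and a 20 ms packet interval, the buffer would need a playout delay of 120 ms, so the added robustness is paid for with 40 ms of extra end-to-end delay.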
3.1.4.2 Processing Delay
Delay variations are introduced not only while the packet is transferred over the network. The receiving node, for example, introduces a so-called processing delay when decoding, decompressing, mixing, etc. the audio data. Since normal user workstations are the end systems, process scheduling delay variations appear if non-real-time operating systems are used (see section 1.2.6). As a result, buffering is required to compensate for jitter caused by irregularities in the process scheduling. These buffers, however, should not be controlled by user processes, because these processes clearly cannot compensate for scheduling delay variations. Therefore, audio devices usually provide special buffers for this purpose. Since scheduling delays usually do not vary greatly, these buffers can be fairly small.
The processing delay estimation ($D_{Processing}$) depends entirely on the end-node’s operating
mainly on the timer accuracy, whereas the accuracy of the exact playout time of the scheduled packets depends on the operating system’s scheduling granularity and, in particular, the minimum scheduling unit (footnote 3) of processes. Dynamic adjustment of the delay is preferable to static delay estimation, especially if the scheduling delay varies with different processing loads.
3.1.4.3 Summary
In summary, receiver buffering plays an important role in media streaming applications: it compensates for network delay variations and, less critically, makes up for processing delays. In the context of real-time streaming, however, buffering delays should be as small as possible to minimize the total end-to-end delay and as large as necessary to achieve the required loss characteristics. In general, adaptive (dynamic) buffering delay estimation is preferable to simple (static) buffering mechanisms since “optimal” playout delay estimation depends highly on the network dynamics.
3.1.5 Summary
This section summarizes the analysis of application layer QoS mechanisms with regard to their usability and importance for interactive media streaming. The following application layer techniques were examined: packet transfer, forward error correction, adaptation and receiver buffering.
With respect to packet transfer, interactive real-time streaming applications have to consider the following issues. First, RTP-on-UDP provides the best transport and streaming services among current Internet protocols. Second, the packet size used for media streams involves a tradeoff: it should be as small as necessary and as large as possible. Interactive streaming applications should use one or at most two media frames per packet to minimize packetization delay. Non-interactive streaming applications, in contrast, are recommended to use larger payloads in order to minimize packet overhead. Third, interactive real-time streaming applications are advised to “shape” their data traffic such that media packets are sent isochronously over time rather than in bursts. This has the potential to reduce packet loss rates since the likelihood of packet clustering is minimized.
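The packet size tradeoff can be quantified with a short sketch. The header size is the sum of the fixed IPv4, UDP and RTP headers (20 + 8 + 12 bytes, without options or CSRC entries); the frame duration of 20 ms and frame size of 160 bytes (64 kbit/s PCM audio) are assumptions chosen for illustration, as is the function name.

```python
HEADER_BYTES = 20 + 8 + 12   # fixed IPv4 + UDP + RTP headers, no options


def packetization(frames_per_packet, frame_ms=20, frame_bytes=160):
    """Return (packetization delay in ms, header overhead as a fraction).

    frame_ms and frame_bytes are assumed values (20 ms of 64 kbit/s PCM).
    """
    delay_ms = frames_per_packet * frame_ms
    payload = frames_per_packet * frame_bytes
    overhead = HEADER_BYTES / (HEADER_BYTES + payload)
    return delay_ms, overhead


interactive = packetization(1)   # small packets: low delay, high overhead
bulk = packetization(4)          # large packets: high delay, low overhead
```

Under these assumptions, one frame per packet costs 20 ms of packetization delay with 20% header overhead, whereas four frames per packet cost 80 ms with less than 6% overhead.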
Packet-based forward error correction mechanisms that are capable of correcting several consecutively lost packets provide good service for interactive real-time streaming in the Internet. Since the number of consecutively lost packets is usually small (in the order of 1
3.2. NETWORK LAYER QOS 83