MPEG packets) in order to decrease the packetization delay and possibly the decoding time if the decoder processes bits as soon as they enter its buffer.
Buffering in the Depacketization Function
We now assume that the depacketization function in the receiver reconstructs the CBR MPEG stream, as shown in Figure 77, by buffering incoming packets and feeding the decoder buffer at the constant rate $B$. Received bits are fed to the decoder only after the whole packet has been received. Thus, bits exiting the CBR MPEG encoder are held in the buffers of the packetization (sender) and depacketization (receiver) functions for an overall time
$$D_p = \frac{P_s}{B} + \frac{P_s}{C} \qquad (35)$$
The packetization delay introduced in this system configuration is $P_s/B$ larger than the one given by Equation (33). Thus, if the depacketization function reconstructs a CBR MPEG stream, the end-to-end delay is increased. The $P_s/B$ increment is necessary if Assumption 6 is dropped, so that the last bit of a picture is not necessarily the last bit of the packet in which it is sent. In fact, the sender packetization function delays the first bit of each packet by $P_s/B$, as shown in Figure 77; if decoding is not delayed by the same amount of time, the decoder buffer underflows when, for example, the last bit of a picture happens to be the first bit in a packet.
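As a rough numerical check of Equation (35), the sketch below evaluates the packetization delay with and without CBR reconstruction at the receiver. The function name and the sample values (packet size, encoder rate, link capacity) are illustrative assumptions, not taken from the text. The case without reconstruction is taken to be $P_s/C$, which follows from the statement that Equation (35) exceeds Equation (33) by $P_s/B$.

```python
def packetization_delay(packet_bits, encoder_rate_bps, link_rate_bps,
                        cbr_reconstruction=True):
    """Return the packetization delay D_p in seconds.

    With CBR reconstruction at the receiver, D_p = P_s/B + P_s/C (Equation 35).
    Without it, the P_s/B term is dropped, since the text states that
    Equation (35) exceeds Equation (33) by exactly P_s/B.
    """
    transmission = packet_bits / link_rate_bps            # P_s / C
    if not cbr_reconstruction:
        return transmission
    return packet_bits / encoder_rate_bps + transmission  # P_s / B + P_s / C


# Hypothetical values: 8000-bit packets, 1 Mb/s encoder, 10 Mb/s link.
P_S, B, C = 8_000, 1_000_000, 10_000_000
d_with = packetization_delay(P_S, B, C)
d_without = packetization_delay(P_S, B, C, cbr_reconstruction=False)
print(f"D_p with reconstruction:    {d_with * 1e3:.2f} ms")
print(f"D_p without reconstruction: {d_without * 1e3:.2f} ms")
print(f"increment P_s/B:            {(d_with - d_without) * 1e3:.2f} ms")
```

With these numbers the $P_s/B$ term dominates, which is why small packets are attractive when the encoder rate is much lower than the link capacity.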
[Figure 78 diagram; blocks shown: Frame Grabber, Encoder, Packetization, Decoder, Video Adaptor]
Figure 78: Architecture of a Videoconferencing System Exploiting a CBR MPEG Encoder and an Asynchronous Packet Switched Network.
5.7.2 Multi-Hop Configuration
Figure 78 shows the architecture of a videoconferencing system that exploits a CBR MPEG encoder and an asynchronous packet switched network. This configuration adds the queueing delay, the network resynchronization delay, and the excess resynchronization delay $E_r$ to the end-to-end delay given by Equation (34):
$$\Delta^{Async}_{CBR} = S_c + S_s + D_p + P + Q_M + E_r + D + P_d \qquad (36)$$
where $P$ is the propagation delay on the links on the path from source to destination, $Q_M$ is the maximum queueing delay experienced by packets in the network, $Q$ is the variation of the queueing delay, and $E_r \in [0, Q]$ is the excess resynchronization delay introduced by the replay buffer (whether it resides in the packetization function or in the decoder) due to the lack of synchronization between sender and receiver network interfaces (see Section 3.4.1). The packetization delay $D_p$ is given by Equation (33) or Equation (35), depending on the behavior of the depacketization function (see Section 5.7.1).
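To make the structure of Equation (36) concrete, the following sketch sums its terms and enforces the stated range of the excess resynchronization delay. The function name and every numerical value are hypothetical; the symbols not defined in this section keep the meaning given to them earlier in the document.

```python
def async_cbr_delay(S_c, S_s, D_p, P, Q_M, E_r, D, P_d, Q):
    """End-to-end delay of Equation (36); all arguments share one time unit.

    Q is the queueing delay variation and only bounds E_r; it is not
    itself a term of the sum.
    """
    if not 0.0 <= E_r <= Q:
        raise ValueError("E_r must lie in [0, Q]")
    return S_c + S_s + D_p + P + Q_M + E_r + D + P_d


# Hypothetical delay budget in milliseconds.
budget = dict(S_c=10.0, S_s=5.0, D_p=8.8, P=20.0, Q_M=15.0, D=10.0, P_d=5.0, Q=4.0)

best = async_cbr_delay(E_r=0.0, **budget)           # sender and receiver aligned
worst = async_cbr_delay(E_r=budget["Q"], **budget)  # largest excess resynchronization
print(f"end-to-end delay between {best:.1f} ms and {worst:.1f} ms")
```

Evaluating both ends of the $E_r$ range shows how much of the delay budget is consumed purely by the lack of synchronization between the network interfaces.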
Since bits are produced at the constant rate $B$ and packets are sent at a constant pace, the size of the replay buffer must be at least
$$2QB$$
This expression also gives the increase required in the size of the decoder buffer when that buffer is used to compensate for the network delay variation.
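The replay buffer bound can be evaluated directly, as in the minimal sketch below. The function name and the sample figures (4 ms of queueing delay variation, a 1 Mb/s stream) are assumptions chosen only for illustration.

```python
def min_replay_buffer_bits(queueing_delay_variation_s, encoder_rate_bps):
    """Lower bound 2*Q*B on the replay buffer size, in bits."""
    return 2 * queueing_delay_variation_s * encoder_rate_bps


# Hypothetical values: Q = 4 ms, B = 1 Mb/s.
bits = min_replay_buffer_bits(0.004, 1_000_000)
print(f"replay buffer of at least {bits:.0f} bits ({bits / 8:.0f} bytes)")
```

Because the bound grows linearly with both $Q$ and $B$, a network with loosely bounded jitter forces either a large buffer or a lower stream rate.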
5.7.3 Packetization and Startup Shaping Delay
The packetization function can affect the startup shaping delay. In many practical cases the mean bit production rate is higher than the target rate; thus the startup shaping delay is needed only while encoding the first MBs of a picture, in order to compensate for a possibly low instantaneous bit production rate. If the packetization function is included in the closed control loop of the encoder, the startup shaping delay can be reduced or even eliminated. The control function takes into account that bits are not removed from the encoder buffer for a time $P_s/B$. Thus, even if the initial bit production rate is low, the average over this time interval can be sufficient to produce a packet's worth of bits.
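The averaging argument can be illustrated with a small check: given the bits produced in successive short intervals, does the total over the first $P_s/B$ seconds reach one packet? This is only a sketch of the reasoning, not the encoder's control function; the function name, the 1 ms interval, and the bit counts are hypothetical.

```python
import math


def packet_filled_in_window(bits_per_interval, interval_s, packet_bits,
                            encoder_rate_bps):
    """True if the bits produced during the first P_s/B seconds fill a packet."""
    window_s = packet_bits / encoder_rate_bps              # P_s / B
    # Number of whole intervals in the window (small epsilon guards rounding).
    n_intervals = math.floor(window_s / interval_s + 1e-9)
    return sum(bits_per_interval[:n_intervals]) >= packet_bits


# Hypothetical trace: production starts well below the 1000 bits/ms target
# rate (B = 1 Mb/s) and then rises above it.
slow_start = [200, 400, 800, 1200, 1400, 1400, 1400, 1400]
print(packet_filled_in_window(slow_start, 0.001, 8_000, 1_000_000))

# A trace that stays at half the target rate never fills the packet in time.
too_slow = [500] * 8
print(packet_filled_in_window(too_slow, 0.001, 8_000, 1_000_000))
```

In the first trace the early shortfall is recovered within the window, so no startup shaping delay would be needed under these assumptions; in the second it is not.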
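| Video stream | Network | End-to-end delay | Constraints |
|---|---|---|---|
| Raw Video | Dedicated Link | $\Delta^{Ded}_{Raw} = F_r/C + P + P_d$ | |
| Raw Video | Circuit Switching | $\Delta^{CS}_{Raw} = S_n + P + S_w + P_d$ | |
| Raw Video | Time Driven Priority | $\Delta^{TDP}_{Raw} = S_n + L T_f + P_d$ | $S_n = S^{AS}_n + (N_r - 1) T_f$, $S^{AS}_n \in [0, T]$ |
| Raw Video | Asynchronous Packet Switching | $\Delta^{Async}_{Raw} = F_r/C + P + Q_M + E_r + P_d$ | $E_r \in [0, Q]$ |
| Raw Video | Asynchronous Packet Switching (Async-Sh) | $\Delta^{Async-Sh}_{Raw} = S_n + P_s/C + P + Q_M + E_r + P_d$ | |
| VBR MPEG | Dedicated Link | $\Delta^{Ded}_{VBR} = C_M + P + D + P_d$ | |
| VBR MPEG | Circuit Switching | $\Delta^{CS}_{VBR} = C_M + S_w + P + D + P_d$ | |
| VBR MPEG | Circuit Switching (CS-TS) | $\Delta^{CS-TS}_{VBR} = C_M + S^{CS}_n + P + S_w + D + P_d$ | |
| VBR MPEG | Time Driven Priority | $\Delta^{TDP}_{VBR-I} = C_M + S^{AS}_n + L T_f + D + P_d$ | $S^{AS}_n \in [0, T]$ |
| VBR MPEG | Time Driven Priority (TDP-CxSc) | $\Delta^{TDP-CxSc}_{VBR} = C_M + S^{Sched}_n + L T_f + D + P_d$ | $S^{Sched}_n \in [0, NT]$ |
| VBR MPEG | Asynchronous Packet Switching | $\Delta^{Async}_{VBR} = C_M + P_s/C + P + Q_M + E_r + D + P_d$ | |
| VBR MPEG | Asynchronous Packet Switching (Async-TS) | $\Delta^{Async-TS}_{VBR} = C_M + S^{TS}_n + P_s/C + P + Q_M + E_r + D + P_d$ | |
| CBR MPEG | Dedicated Link | $\Delta^{Ded}_{CBR} = S_c + S_s + P + D + P_d$ | |
| CBR MPEG | Circuit Switching | $\Delta^{CS}_{CBR} = S_c + S_s + S_w + P + D + P_d$ | |
| CBR MPEG | Time Driven Priority | $\Delta^{TDP}_{CBR} = S_c + S_s + L T_f + S_n + E_r + D + P_d$ | $S_n = S^{Pack}_n + S^{AS}_n$, $S^{Pack}_n \in [0, T_c]$, $S^{AS}_n \in [0, T_c]$, $E_r \in [0, T_c - P_f/B]$ |
| CBR MPEG | Asynchronous Packet Switching | $\Delta^{Async}_{CBR} = S_c + S_s + D_p + P + Q_M + E_r + D + P_d$ | $D_p = P_s/B + P_s/C$ |

Table 2: Summary of the Configurations Considered in this Work.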
6 Summary
In this work we analyzed the end-to-end delay of videoconferencing over packet switched networks. Our key findings are summarized in Table 2. The main design objective is to keep the end-to-end delay below 100 ms. This requirement stems from the sensitivity of human hearing to delays larger than 100 ms and from the need for lip synchronization, i.e., for the audio and video to be synchronized.
Controlling the delay also requires controlling the amount of buffering at the (i) sender, (ii) network, and (iii) receiver. Controlling the buffer sizes can have adverse consequences:
1. Decreasing the buffers inside the network can increase the packet loss inside the network and degrade the quality of the received video.
2. Limiting the MPEG encoder buffer size limits the maximum I-frame size and degrades the quality of the compressed MPEG video.
Thus, what we found and formulated in this study are tradeoffs between the perceived quality due to delay and the received picture quality due to loss and compression.
We found some interesting results, some of which are counterintuitive, which in turn illustrates the importance of this sort of study.
1. Transmission of raw video does not necessarily provide the shortest delay. This is because of the transmission time needed for the large number of bits of high definition pictures.
2. MPEG CBR encoding of a fixed scene introduces a long delay. This is because the inter-frame coding of P-frames requires few bits, and therefore a single I-frame can use all the CBR capacity allocated to the group of pictures (GOP). As a result, the I-frame transmission lasts the entire GOP period.
3. Using asynchronous packet switching with statistical multiplexing is challenging. This is because the distribution of the delay variation, or jitter, inside the network can be wide and heavy-tailed. Thus, a conservative design with a large replay buffer results in a large delay, while a small replay buffer results in occasional underflow/overflow of the replay buffer and distortion of the video seen by the user at the receiver.
4. If the capture card and the display buffer use the same reference clock, the delay can be decreased by one video frame period. At 10 frames per second this means a delay reduction of 100 ms, and at 30 frames per second a reduction of 33 ms.
5. MPEG VBR video over an asynchronous packet switched network requires a high equivalent capacity, or effective bandwidth, which can be too much to waste over wide area links. Therefore, in some cases it may be better to use only I-frames, i.e., to use for example Motion-JPEG.
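The figures in items 2 and 4 above follow from simple arithmetic on the frame rate, sketched below. The function names and the 15-picture GOP are assumptions used only for illustration; the source does not state a GOP length.

```python
def frame_period_ms(frames_per_second):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / frames_per_second


def gop_period_ms(pictures_per_gop, frames_per_second):
    """Duration of one group of pictures in milliseconds.

    For a fixed scene under CBR encoding, item 2 argues that the I-frame
    transmission can take this whole period.
    """
    return 1000.0 * pictures_per_gop / frames_per_second


# Item 4: delay saved by sharing a reference clock (one frame period).
for fps in (10, 30):
    print(f"{fps} frames/s: frame period {frame_period_ms(fps):.0f} ms")

# Item 2: hypothetical 15-picture GOP at 30 frames/s.
print(f"GOP period: {gop_period_ms(15, 30):.0f} ms")
```

Under the assumed GOP length, the I-frame transmission time alone would be several times the 100 ms delay objective, which is the essence of the second finding.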
In conclusion, MPEG-based videoconferencing is possible; however, in order to keep the end-to-end delay below the threshold of human perception, the following requirements must be met:
1. The capture (frame grabber) card and display (video) adapter should have a common time reference clock.
2. The pictures should be sent right after the compression is completed as a variable bit rate (VBR) stream.
3. The network jitter should be controlled with a well-defined bound.
We also showed that with time driven priority and complex scheduling it is possible to have the following properties for VBR MPEG:
1. A bound of 250 µs on the jitter.
2. No loss even if the link is fully utilized.
3. The end-to-end delay is dominated by the propagation delay plus $L \cdot 125$ µs ($L$ depends on the number of hops). This delay is shorter than the delay that can be obtained over a circuit switched network. On a circuit switched connection a CBR encoder must be deployed, which introduces a coding shaping delay larger than the video frame period. If the scene is slow, this delay can be as large as the duration of a group of pictures.
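The dominant term of the time driven priority delay in item 3 can be evaluated as in the sketch below, reading the per-hop constant as 125 microseconds. The function name, the propagation delay, and the value of $L$ are hypothetical; the text only states that $L$ depends on the number of hops.

```python
TIME_UNIT_US = 125  # per-unit term from item 3, read here as microseconds


def tdp_dominant_delay_us(propagation_us, L):
    """Propagation delay plus L * 125 microseconds, in microseconds."""
    return propagation_us + L * TIME_UNIT_US


# Hypothetical path: 20 ms of propagation delay and L = 10.
total_us = tdp_dominant_delay_us(20_000, 10)
print(f"dominant delay: {total_us / 1000:.2f} ms")
```

With these assumed values the network contribution beyond propagation is little more than a millisecond, small compared with the 100 ms objective.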