Transporting Real-time Video over the Internet:
Challenges and Approaches
Dapeng Wu, Student Member, IEEE, Yiwei ThomasHou,Member, IEEE,
and Ya-Qin Zhang Fellow,IEEE
Abstract|Delivering real-timevideoovertheInternet is
an importantcomponentof manyInternet multimedia
ap-plications. Transmissionofreal-timevideo hasbandwidth,
delay and loss requirements. However, the current
Inter-net does not oer anyqualityof service(QoS) guarantees
to video transmissionover the Internet. In addition, the
heterogeneityofthenetworksandend-systemsmakesit
dif-cult to multicast Internet video in an eÆcient and
ex-ible way. Thus, designing protocols and mechanisms for
Internet videotransmissionposesmanychallenges. Inthis
paper, wetakea holisticapproach to thesechallenges and
presentsolutionsfrombothtransportandcompression
per-spectives. With the holisticapproach, we design a
frame-work for transporting real-time Internet video, which
in-cludes twocomponents,namely,congestioncontroland
er-rorcontrol. Specically,congestioncontrol consistsofrate
control,rateadaptiveencoding,andrateshaping;error
con-trolconsistsofforwarderrorcorrection(FEC),
retransmis-sion,error-resilienceanderrorconcealment. Forthedesign
ofeachcomponentintheframework,weclassifyapproaches
andsummarizerepresentativeresearchwork. Wepointout
there existsadesignspace whichcanbeexploredbyvideo
applicationdesigners,andsuggestthatthesynergyofboth
transportandcompressioncouldprovidegoodsolutions.
Keywords|Internet, real-timevideo, congestion control,
errorcontrol.
I. Introduction
U
NICASTandmulticastdeliveryofreal-timevideoare
importantbuildingblocksofmanyInternet
multime-dia applications, such as Internet television (see Fig. 1),
videoconferencing,distancelearning,digitallibraries,
tele-presence,andvideo-on-demand. Transmissionofreal-time
video hasbandwidth, delay and lossrequirements.
How-ever,thereisnoqualityofservice(QoS)guaranteeforvideo
transmission over the current Internet. In addition, for
videomulticast,theheterogeneityofthenetworksand
re-ceivers makes it diÆcult to achieve bandwidth eÆciency
andserviceexibility. Therefore,therearemany
challeng-ingissuesthatneedtobeaddressedindesigningprotocols
andmechanismsforInternetvideotransmission.
WelistthechallengingQoSissuesasfollows.
1. Bandwidth. Toachieveacceptablepresentation
qual-ity,transmissionofreal-timevideotypicallyhasminimum
bandwidthrequirement(say,28Kbps). However,the
cur-ManuscriptreceivedFebruary2,2000;revisedAugust3,2000.This
paperwasrecommendedbyEditorJimCalder.
D. WuiswithCarnegieMellon University, Dept.of Electrical &
Computer Engineering, 5000 Forbes Ave., Pittsburgh, PA 15213,
USA.
Y.T.HouiswithFujitsuLaboratoriesof America,595 Lawrence
Expressway,Sunnyvale,CA94085,USA.
Y.-Q.Zhang iswithMicrosoftResearchChina,5F,BeijingSigma
Center, No. 49 Zhichun Road, Haidian District, Beijing 100080,
rent Internet does not provide bandwidth reservation to
meet such a requirement. Furthermore, since traditional
routerstypicallydo notactivelyparticipatein congestion
control [7],excessivetraÆccancause congestioncollapse,
which can further degrade the throughput of real-time
video.
2. Delay. Incontrasttodatatransmission,whichare
usu-allynotsubjecttostrictdelayconstraints,real-timevideo
requiresboundedend-to-enddelay(say,1second). Thatis,
everyvideopacketmustarriveatthedestinationintimeto
bedecodedanddisplayed. Thisisbecausereal-timevideo
mustbeplayedoutcontinuously. Ifthevideopacket does
notarrivetimely,the playoutprocesswill pause,whichis
annoyingtohumaneyes. Inotherwords,thevideopacket
thatarrivesbeyondatimeconstraintisuselessandcanbe
considered lost. Although real-time videorequires timely
delivery, the current Internet does notoer such a delay
guarantee. In particular, the congestion in the Internet
could incur excessive delay, which exceeds the delay
re-quirementofreal-timevideo.
3. Loss. Lossofpackets canpotentiallymakethe
presen-tationdispleasing tohumaneyes,or,in somecases,make
thepresentationimpossible. Thus,videoapplications
typ-icallyimposesome packet lossrequirements. Specically,
thepacketlossratioisrequiredtobekeptbelowa
thresh-old(say,1%)toachieveacceptablevisualquality. Although
real-time video hasa lossrequirement, the current
Inter-netdoesnotprovideanylossguarantee. Inparticular,the
packet loss ratio could bevery high during network
con-gestion,causingseveredegradationofvideoquality.
Besides the above QoS problems, for video multicast
applications, there is another challenge coming from the
heterogeneity problem. Beforeaddressingthe
heterogene-ity problem, we rst describe the advantages and
disad-vantages of unicast and multicast. The unicast delivery
of real-timevideouses point-to-point transmission,where
only one sender and one receiver are involved. In
con-trast, themulticastdeliveryofreal-timevideouses
point-to-multipointtransmission,whereonesenderandmultiple
receiversareinvolved. Forapplicationssuchasvideo
con-ferencingand Internet television, deliveryusing multicast
can achieve high bandwidth eÆciency since the receivers
can share links. On the other hand, unicast delivery of
such applicationsis ineÆcient in terms of bandwidth
uti-lization. An exampleisgivein Fig.2,where, for unicast,
vecopiesofthevideocontentowacrossLink1andthree
Fig.1. Internettelevisionusesmulticast(point-to-multipointcommunication)insteadofunicast(point-to-pointcommunication)todeliver
real-timevideosothatuserscansharethecommonlinkstoreducebandwidthusageinthenetwork.
Receiver
Receiver
Receiver
Receiver
Receiver
Sender
Link 1
Link 2
Receiver
Receiver
Receiver
Receiver
Receiver
Sender
Link 1
Link 2
(a) (b)
Fig.2. (a)Unicastvideodistributionusingmultiplepoint-to-pointconnections. (b)Multicastvideodistributionusingpoint-to-multipoint
transmission.
only onecopyof the video content traversingany link in
thenetwork(Fig.2(b)),resultinginsubstantialbandwidth
savings. However, the eÆciency of multicast is achieved
at the costof losingthe service exibilityof unicast (i.e.,
in unicast,eachreceivercan individuallynegotiateservice
parameters with the source). Such lack of exibility in
multicast can beproblematic in aheterogeneous network
environment,whichweelaborateas follows.
Heterogeneity. There are two kinds of heterogeneity,
namely, network heterogeneity and receiver heterogeneity.
Networkheterogeneityreferstothesub-networksinthe
In-ternethavingunevenlydistributedresources(e.g.,
process-ing, bandwidth, storage and congestion control policies).
Networkheterogeneitycouldmakedierentuserexperience
dierentpacketloss/delaycharacteristics. Receiver
hetero-geneitymeansthatreceivershavedierentorevenvarying
latency requirements, visual quality requirement, and/or
processing capability. For example, in live multicast of a
lecture, participantswhowantto askquestions and
inter-actwith thelecturerdesirestringentreal-timeconstraints
onthevideowhilepassivelisteners maybewillingto
sac-ricelatencyforhighervideoquality.
of networks and receiverssometimes present aconicting
dilemma. Forexample,the receiversin Fig.2(b) may
at-tempt to request fordierentvideoqualitywith dierent
bandwidth. Butonlyonecopyofthevideocontentissent
outfrom thesource. Asaresult, allthereceivershaveto
receivethe same videocontentwith thesame quality. It
is thus a challenge to design a multicast mechanism that
notonlyachieveseÆciencyinnetworkbandwidthbutalso
meetsthevariousrequirementsofthereceivers.
To addressthe above technical issues, two general
ap-proaches have been proposed. The rst approach is
network-centric. That is, the routers/switchesin the
net-work are required to provide QoS support to guarantee
bandwidth,boundeddelay,delayjitter,andpacketlossfor
video applications (e.g, integrated services [6], [11], [42],
[65] or dierentiated services [2], [27], [35]). The second
approach is solely end system-based and doesnot impose
any requirements on the network. In particular, the end
systemsemploycontroltechniquesto maximize thevideo
quality withoutanyQoSsupport from the transport
net-work. In this paper, we focus on the end system-based
andisapplicabletoboththecurrentandfuture Internet.
Extensive research based on the end system-based
ap-proachhasbeenconductedandvarioussolutionshavebeen
proposed. This paper aims at giving the reader a big
picture of this challenging area and identifying a design
space that can be explored by video application
design-ers. Wetakeaholisticapproachtopresentsolutionsfrom
both transport and compression perspectives. By
trans-portperspective,wereferto theuse ofcontrol/processing
techniqueswithoutregardof thespecic videosemantics.
Inotherwords,thesecontrol/processingtechniquesare
ap-plicable to generic data. By compression perspective, we
meanemployingsignalprocessingtechniqueswith
consid-eration of the video semantics on the compression layer.
With theholisticapproach,wedesignaframework,which
consistsoftwocomponents,namely,congestioncontroland
errorcontrol.
1. Congestion control. Burstylossand excessive delay
have devastatingeect onvideo presentation qualityand
theyareusuallycausedbynetworkcongestion. Thus,
con-gestioncontrolisrequiredtoreduce packetlossanddelay.
Onecongestioncontrolmechanismisratecontrol [5]. Rate
control attempts to minimize network congestion and the
amount of packet lossby matching the rate of the video
stream to the available network bandwidth. In contrast,
without rate control, the traÆc exceeding the available
bandwidth would be discarded in the network. To force
thesourcetosendthevideostreamattheratedictatedby
the rate control algorithm, rate adaptive video encoding
[63]orrateshaping[18]isrequired. Notethatratecontrol
isfromthetransportperspective,whilerateadaptivevideo
encodingisfromthecompressionperspective;rateshaping
isin bothtransportandcompressiondomain.
2. Error control. The purpose of congestion control is
topreventpacketloss. However,packetlossisunavoidable
intheInternetandmayhavesignicantimpacton
percep-tualquality. Thus, othermechanismsmustbein placeto
maximizevideopresentationqualityin presenceof packet
loss. Such mechanisms include error control mechanisms,
whichcanbeclassiedintofourtypes,namely,forward
er-rorcorrection (FEC),retransmission,error-resilience,and
error concealment. The principleof FEC is to add extra
(redundant)informationtoacompressedvideobit-stream
sothattheoriginalvideocanbereconstructedinpresence
of packet loss. There are three kinds of FEC: (1)
chan-nel coding, (2) source coding-based FEC, and (3) joint
source/channel coding. The use of FEC is primarily
be-cause of its advantage of small transmission delay [14].
ButFEC couldbe ineectivewhen burstypacket loss
oc-curs and such loss exceeds the recoverycapability of the
FEC codes. Conventional retransmission-based schemes
such as automatic repeat request (ARQ) are usually
dis-missed as a means for transporting real-time video since
the delay requirement may not be met. However, if the
one-way trip time is short with respect to the maximum
allowable delay, a retransmission-based approach (called
delay-constrainedretransmission)isaviableoptionfor
er-lossonthecompressionlayer. UnliketraditionalFEC(i.e.,
channelcoding),whichdirectlycorrectsbiterrorsorpacket
losses,error-resilientschemesconsiderthesemantic
mean-ingofthecompressionlayerandattempttolimitthescope
ofdamage(causedbypacketloss)onthecompressionlayer.
As a result, error-resilient schemes could reconstruct the
videopicturewithgracefullydegradedquality. Error
con-cealment is a post-processing technique used by the
de-coder. When uncorrectable bit errors occur, the decoder
useserror concealment to hide theglitch from the viewer
so that amorevisually pleasingrendition of the decoded
videocanbe obtained. Note that channelcoding and
re-transmission recover packet loss from the transport
per-spective, while source coding-based FEC,error-resilience,
anderrorconcealmentdealwithpacketlossfromthe
com-pression perspective; joint source/channel coding falls in
bothtransport andcompressiondomain.
Theremainderofthispaperisorganizedasfollows.
Sec-tion II presentstheapproachesfor congestion control. In
Section III, wedescribethemechanismsfor errorcontrol.
Section IV summarizes this paper and points out future
researchdirections.
II. Congestion Control
Therearethreemechanismsforcongestioncontrol: rate
control, rate adaptive video encoding, and rate shaping.
Rate control follows the transport approach; rate
adap-tivevideoencodingfollowsthecompressionapproach;rate
shapingcould follow either thetransport approach orthe
compressionapproach.
For thepurpose of illustration,wepresent an
architec-tureincludingthethreecongestioncontrolmechanismsin
Fig.3,wheretheratecontrolisasource-basedone(i.e.,the
sourceisresponsibleforadaptingtherate). Although the
architectureinFig.3istargetedattransportinglivevideo,
this architecture is also applicable to stored video if the
rateadaptiveencodingisexcluded. Atthesenderside,the
compressionlayercompressesthelivevideobasedonarate
adaptive encoding algorithm. After this stage, the
com-pressedvideobit-streamisrstlteredbytherateshaper
and then passedthrough theRTP/UDP/IP layersbefore
entering the Internet, where RTP is Real-time Transport
Protocol[41]. PacketsmaybedroppedinsidetheInternet
(due to congestion) or at the destination (due to excess
delay). For packets that are successfully deliveredto the
destination,theyrstpassthroughtheIP/UDP/RTP
lay-ersbeforebeingdecodedat thevideodecoder.
Underourarchitecture,aQoSmonitorismaintainedat
thereceiverside to infernetwork congestion status based
on the behavior of the arriving packets, e.g., packet loss
anddelay. Suchinformationisusedinthefeedbackcontrol
protocol,whichsendsinformationbacktothevideosource.
Basedonsuchfeedbackinformation,theratecontrol
mod-uleestimatestheavailablenetworkbandwidthandconveys
theestimatednetworkbandwidthto therateadaptive
en-coderortherateshaper. Then, therateadaptiveencoder
Compresion Layer
Rate Control
Rate Adaptive Encoding
Rate Shaper
Video Decoder
Protocol
QoS Monitor
Feedback Control
RTP Layer
UDP Layer
Receiver Side
Internet
IP Layer
IP Layer
RTP Layer
UDP Layer
Sender Side
Transport
Compression
Domain
Domain
Fig.3. Alayeredarchitecturefortransportingreal-timevideo.
is clear that thesource-basedcongestion control must
in-clude: (1)ratecontrol,and(2)rateadaptivevideo
encod-ingorrateshaping.
We organize therest of this sectionas follows. In
Sec-tionII-A,wesurveythe approachesforratecontrol.
Sec-tion II-B describes basicmethods for rate-adaptivevideo
encoding. In Section II-C, we classify methodologies for
rateshapingandsummarizerepresentativeschemes.
A. Rate Control: ATransportApproach
Since TCP retransmission introduces delays that may
notbeacceptableforreal-timevideoapplications,UDPis
usually employed as the transport protocol for real-time
video streams [63]. However, UDPis not ableto provide
congestioncontrolandovercomethelackofservice
guaran-teesintheInternet. Therefore,itisnecessarytoimplement
acontrolmechanismontheupperlayer(higherthanUDP)
to preventcongestion.
Therearetwotypesofcontrolforcongestionprevention:
one is window-based [26] and the other is rate-based [52].
The window-basedcontrolsuch asTCPworksasfollows:
it probes for the available network bandwidth by slowly
increasingacongestionwindow(usedtocontrolhowmuch
dataisoutstandinginthenetwork);whencongestionis
de-tected (indicatedbythe lossofoneormorepackets), the
protocolreducesthecongestionwindowgreatly(seeFig.4).
Therapidreductionofthewindowsizeinresponseto
con-gestionisessentialtoavoidnetworkcollapse. Ontheother
hand,therate-basedcontrolsetsthesendingratebasedon
the estimated available bandwidth in the network; if the
estimationoftheavailablenetworkbandwidthisrelatively
accurate,therate-basedcontrolcouldalsopreventnetwork
collapse.
Sincethewindow-basedcontrollikeTCPtypically
cou-plesretransmissionwhichcanintroduceintolerabledelays,
the rate-based control (i.e., rate control) is usually
em-ployed fortransportingreal-time video[63]. Existingrate
control schemes for real-time videocan be classied into
0
5
10
15
20
25
30
Time (in round trip time)
0
5
10
15
20
Congestion window (packets)
Packet loss
Fig.4. Congestionwindowbehaviorunderwindow-basedcontrol.
hybridratecontrol,whicharedescribedinSectionsII-A.1
toII-A.3,respectively.
A.1 Source-basedRateControl
Under the source-based rate control, the sender is
re-sponsible for adapting the transmission rate of the video
stream. The source-based rate control can minimize the
amount of packet lossby matching the rate of the video
stream to the available network bandwidth. In contrast,
without rate control, the traÆc exceeding the available
bandwidthcouldbediscardedin thenetwork.
Typically, feedback is employed by source-based rate
control mechanisms to conveythe changing status of the
Internet. Baseduponthefeedbackinformationabout the
network, the sender could regulate the rate of the video
stream. The source-basedrate control can be applied to
bothunicast[63]andmulticast[3].
Forunicastvideo,theexistingsource-basedratecontrol
mechanismscanbeclassiedintotwoapproaches,namely,
theprobe-basedapproach andthemodel-basedapproach,
whicharepresentedasfollows.
Probe-based approach. Such an approach is based on
0
50
100
150
Time (sec)
0
10
20
Rate (kbps)
Packet loss ratio > threshold
Fig.5. SourceratebehaviorundertheAIMDratecontrol.
so that someQoS requirements are met, e.g., packet loss
ratio p is below a certain threshold P
th
[63]. The value
ofP
th
isdeterminedaccordingtotheminimumvideo
per-ceptual quality required by the receiver. There are two
ways to adjust the sending rate: Additive Increase and
Multiplicative Decrease (AIMD) [63], and Multiplicative
Increase and Multiplicative Decrease (MIMD) [52]. The
probe-basedratecontrolcouldavoidcongestionsinceit
al-waystries toadapttothecongestionstatus, e.g.,keepthe
packetlossat anacceptablelevel.
Forthe purpose of illustration, we briey describe the
source-based rate control based on additive increase and
multiplicativedecrease. TheAIMDratecontrolalgorithm
isshownasfollows[63].
if(pP
th )
r:=minf(r+AIR );MaxRg;
else
r:=maxf(r);MinR g.
where p is the packet loss ratio; P
th
is the threshold for
the packet loss ratio; r is thesending rateat the source;
AIR is the additive increase rate; MaxR and MinR are
the maximum rate and the minimum rate of the sender,
respectively;andis themultiplicativedecrease factor.
Packet lossratiopismeasuredbythereceiverand
con-veyedbacktothesender. Anexamplesourceratebehavior
under theAIMDratecontrolisillustratedinFig.5.
Model-basedapproach. Dierentfromtheprobe-based
approach,whichimplicitlyestimatestheavailablenetwork
bandwidth, the model-based approach attempts to
esti-matetheavailablenetworkbandwidthexplicitly. Thiscan
be achieved by using athroughput model of aTCP
con-nection, which is characterized by the following formula
[19]: = 1:22MTU R TT p p ; (1)
where is the throughput of a TCP connection, MTU
(MaximumTransitUnit)isthemaximumpacketsizeused
by the connection, R TT is the round trip time for the
connection, p is the packet loss ratio experienced by the
connection. Under the model-based rate control, Eq. (1)
stream. That is, the rate-controlled video ow gets its
bandwidthshare likea TCPconnection. As aresult, the
rate-controlledvideoowcould avoidcongestion inaway
similartothatofTCP,andcanco-existwithTCPowsin
a\friendly"manner. Hence,themodel-basedratecontrol
isalsocalled\TCP friendly"ratecontrol[57]. Incontrast
to this TCP friendliness, a owwithout rate control can
getmuchmorebandwidththanaTCPowwhenthe
net-work is congested. This may lead to possible starvation
ofcompeting TCPowsdueto therapidreductionof the
TCPwindowsizein responsetocongestion.
To compute the sending rate in Eq. (1), it is
neces-saryforthesourceto obtaintheMTU,R TT,andpacket
loss ratiop. The MTU canbefound through the
mech-anism proposed by Mogul and Deering [34]. In the case
when the MTU information is not available, the default
MTU, i.e., 576bytes, will be used. The parameterR TT
can be obtained through feedback of timing information.
Inaddition,thereceivercanperiodicallysendthe
parame-terptothesourceinthetimescaleoftheroundtriptime.
Uponthereceiptoftheparameterp,thesourceestimates
thesending rate andthen arate control actionmaybe
taken.
Single-channel multicast vs. unicast. For multicast
underthesource-basedratecontrol,thesenderusesasingle
channel orone IPmulticast groupto transport the video
stream to the receivers. Thus, such multicast is called
single-channel multicast.
For single-channel multicast, only theprobe-based rate
control canbeemployed[3]. A representativeworkis the
IVS (INRIAVideo-conference System)[3]. Therate
con-trolinIVSisbasedonadditiveincreaseandmultiplicative
decrease, which is summarized asfollows. Each receiver
estimates its packet loss ratio, based on which, each
re-ceivercandeterminethenetworkstatustobeinoneofthe
threestates: UNLOADED,LOADED,andCONGESTED.
The source solicits the network status information from
thereceiversthrough probabilisticpolling, which helps to
avoidfeedbackimplosion. 1
Inthisway,thefractionof
UN-LOADED andCONGESTED receiverscanbeestimated.
Then,thesourceadjuststhesendingrateaccordingtothe
followingalgorithm. if(F con >T con ) r:=max(r=2;MinR); elseif(F un == 100%)
r:=minf(r+AIR);MaxRg;
where F con , F un , and T con
are fraction of CONGESTED
receivers,fraction of UNLOADEDreceivers,and apreset
threshold, respectively;r, MaxR, MinR, and AIRare the
sending rate, the maximum rate, the minimum rate, and
additiveincreaserate,respectively.
Single-channel multicast has good bandwidth eÆciency
sinceallthereceiversshareonechannel(e.g., theIP
mul-ticastgroupin Fig.2(b)). Butsingle-channel multicastis
unable to provide serviceexibility and dierentiation to
1
mes-Efficiency
low
high
Bandwidth
low
Flexibility
Service
high
Unicast
Receiver-based/Hybrid
Rate Control
Single-channel
Multicast
Fig.6. Trade-obetweenbandwidtheÆciencyandserviceexibility.
Fig.7. Layeredvideoencoding/decoding. Ddenotesthedecoder.
dierent receiverswith diverseaccess link capacities,
pro-cessingcapabilitiesandinterests.
Ontheotherhand,multicastvideo,deliveredthrough
in-dividualunicaststreams(seeFig.2(a)),canoer
dieren-tiatedservicestoreceiverssinceeachreceivercan
individu-allynegotiatetheserviceparameterswiththesource. But
the problem with unicast-based multicast video is
band-widthineÆciency.
Single-channelmulticastandunicast-basedmulticastare
twoextremecasesshowninFig.6. Toachievegood
trade-obetweenbandwidtheÆciency andserviceexibilityfor
multicast video, two mechanisms, namely, receiver-based
andhybridratecontrol,havebeenproposed,whichwe
dis-cussasfollows.
A.2 Receiver-basedRateControl
Underthereceiver-basedratecontrol,thereceivers
regu-latethereceivingrateofvideostreamsbyadding/dropping
channels. Incontrasttothesender-basedratecontrol,the
sender does not participate in rate control here.
Typi-cally, the receiver-basedrate control is applied to layered
multicast videoratherthanunicastvideo. Thisis
primar-ilybecausethesource-basedratecontrolworksreasonably
wellforunicastvideoandthereceiver-basedratecontrolis
targetedatsolvingheterogeneityproblemin themulticast
case.
Beforeweaddressthereceiver-basedratecontrol,werst
briey describelayeredmulticastvideoas follows. At the
senderside, arawvideosequence iscompressedinto
mul-tiple layers: a base layer (i.e., Layer 0) and one ormore
enhancementlayers(e.g., Layers1 and 2in Fig. 7). The
base layercan be independently decoded and it provides
basic video quality; the enhancement layers can only be
decodedtogether withthebase layerandtheyfurther
re-ne the quality of the base layer. This is illustrated in
64KbpsinFig.7); thehigherthelayeris,themore
band-width thelayerconsumes(seeFig.7). Aftercompression,
eachvideo layer issentto a separateIP multicast group.
At thereceiverside, each receiver subscribes to a certain
set of videolayersbyjoining the corresponding IP
multi-castgroup. In addition, each receivertries to achievethe
highestsubscriptionlevelofvideolayerswithoutincurring
congestion. IntheexampleshowninFig.8,eachlayerhas
a separate IP multicast group. Receiver 1 joins all three
IPmulticastgroups. Asaresult,itconsumes1Mbpsand
receivesall the three layers. Receiver 2 joins the two IP
multicast groupsfor Layer0and Layer1with bandwidth
usageof256Kbps. Receiver3onlyjoinstheIPmulticast
groupforLayer0withbandwidthconsumptionof64Kbps.
Like the source-based rate control, we classify
exist-ing receiver-based rate control mechanisms into two
ap-proaches, namely, the probe-based approach and the
model-basedapproach,whicharepresentedasfollows.
Probe-based approach. This approach was rst
em-ployed in Receiverdriven Layered Multicast (RLM) [33].
Basically, the probe-based rate control consists of two
parts:
1. Whennocongestionisdetected,areceiverprobesforthe
available bandwidth byjoining alayer, which leads to an
increaseof its receivingrate. Ifno congestion is detected
after the joining, the join-experiment is considered
\suc-cessful". Otherwise, the receiver drops the newly added
layer.
2. Whencongestionisdetected,thereceiverdropsalayer,
resultinginreductionof itsreceivingrate.
The above control has a potential problem when the
numberofreceiversbecomeslarge. Ifeachreceivercarries
out the above join-experiment independently, the
aggre-gatefrequencyofsuchexperimentsincreaseswiththe
num-berofreceivers. Sinceafailed join-experimentcouldincur
congestiontothenetwork,anincreaseofjoin-experiments
couldaggravatenetworkcongestion.
Tominimizethefrequencyofjoin-experiments,ashared
learningalgorithmwasproposedin[33]. Theessenceofthe
sharedlearningalgorithmistohaveareceivermulticastits
intent to the groupbefore it starts ajoin-experiment. In
thisway,eachreceivercanlearnfromotherreceivers'failed
join-experiments, resultinginadecrease ofthenumberof
failedjoin-experiments.
The sharedlearning algorithm in [33] requires each
re-ceivertomaintainacomprehensivegroupknowledgebase,
which contains the results of all the join-experiments for
the multicast group. In addition, the use of multicasting
to update the comprehensive group knowledge base may
decreaseusablebandwidth onlow-speedlinks andleadto
lowerquality forreceiversonthese links. Toreduce
mes-sageprocessing overhead at each receiverandto decrease
bandwidthusageof thesharedlearningalgorithm, a
hier-archicalratecontrolmechanismcalledLayeredVideo
Mul-ticast with Retransmissions (LVMR) [30] was proposed.
Themethodologyofthehierarchicalratecontrolisto
Fig.8. IPmulticastforlayeredvideo.
information (rather than all the information) to the
re-ceivers. Inaddition,thepartitioningofthecomprehensive
group knowledge base allows multiple experiments to be
conducted simultaneously, makingitfaster fortherateto
convergetothestablestate. Althoughthehierarchicalrate
controlcouldreducecontrolprotocoltraÆc, itrequires
in-stalling agentsin the network so that the comprehensive
groupknowledgebasecanbepartitionedandorganizedin
ahierarchicalway.
Model-based approach. Unlike the probe-based
ap-proach which implicitly estimates the available network
bandwidththroughprobingexperiments,themodel-based
approachattemptstoexplicitlyestimatetheavailable
net-work bandwidth. Themodel-based approach is based on
thethroughputmodelofaTCPconnection,whichwas
de-scribedinSectionII-A.1.
Fig.9showstheowchartofthebasicmodel-basedrate
controlexecutedbyeachreceiver,where
i
isthe
transmis-sion rateof Layeri. Inthe algorithm, it is assumedthat
eachreceiverknowsthetransmissionrateofallthelayers.
Forthe ease of description, we divide the algorithm into
thefollowingsteps.
Initialization: A receiverstarts withsubscribing thebase
layer(i.e.,Layer0)andinitializesthevariableLto0. The
variableLrepresentsthehighestlayercurrentlysubscribed.
Step 1: Receiver estimates MTU, R TT, and packet loss
ratio p for a given period. The MTU can be found
through the mechanism proposed by Mogul and Deering
[34]. Packetlossratiopcan beeasily obtained. However,
the R TT cannot be measured througha simple feedback
mechanismdue tofeedbackimplosionproblem. A
mecha-nism[53],basedonRTCPprotocol,hasbeenproposedto
estimatetheR TT.
Step 2: Upon obtaining MTU, R TT, and p for a given
period,thetargetratecanbecomputedthroughEq.(1).
Step 3: Upon obtaining , a rate control action can be
taken. If < , drop the base layerand stop receiving
video(the network cannotdelivereventhebase layerdue
tocongestion);otherwise,determineL 0
,thelargestinteger
such that P L 0 i=0 i . If L 0
> L, add the layers from
LayerL+1toLayerL 0
,andLayerL 0
becomesthehighest
layer currently subscribed (let L = L 0
); if L 0
< L, drop
layersfromLayerL 0
+1toLayerL,andLayerL 0
becomes
thehighestlayercurrentlysubscribed(letL=L 0
). Return
toStep1.
Theabovealgorithm hasapotentialproblem when the
numberofreceiversbecomeslarge. Ifeachreceivercarries
outtheratecontrolindependently,theaggregatefrequency
ofjoin-experimentsincreaseswiththenumberofreceivers.
Since a failed join-experiment could incur congestion to
thenetwork, anincreaseof join-experimentscould
aggra-vatenetworkcongestion.Tocoordinatethejoining/leaving
actionsofthereceivers,aschemebasedonsynchronization
points [55] wasproposed. With small protocol overhead,
theproposed schemein [55]helps toreduce thefrequency
and duration of join-experiments, resulting in a smaller
possibilityofcongestion.
A.3 HybridRateControl
Underthehybridratecontrol,thereceiversregulatethe
receivingrate of videostreamsby adding/dropping
chan-nelswhile thesenderalsoadjuststhetransmissionrateof
each channel based on feedback information from the
re-ceivers. Sincethehybridratecontrolconsists ofrate
con-trolatboththesenderandareceiver,previousapproaches
describedin SectionsII-A.1andII-A.2canbeemployed.
The hybrid rate control is targeted at multicast video
andisapplicabletobothlayeredvideo[44]andnon-layered
video [8]. Dierent from the source-based rate control
frameworkwherethesenderusesasinglechannel,the
hy-brid rate control framework uses multiple channels. On
theotherhand,dierentfromthereceiver-basedrate
con-trolframeworkwheretherateforeachchannelisconstant,
Estimate MTU, RTT,
and packet loss ratio p
Σ
i=0
L’
i
such that
γ <=λ
Determine L’, the largest integer
L’ > L ?
L’ < L ?
Let L=L’
Stop
Subscribe to the base layer
Let L=0
λ < γ ?
0
No
Subscribe to the base layer
λ
Compute through Eq. (1)
No
Let L=L’
Drop the base layer
Yes
Yes
No
Yes
(i.e., Layer 0)
Add the layers
Drop the layers
from Layer L+1 to Layer L’
from Layer L’+1 to Layer L
Fig.9. Flowchartofthebasicmodel-basedratecontrolforareceiver.
changetheratefor each channel basedoncongestion
sta-tus.
Onerepresentativeworkusinghybridratecontrolis
des-tinationsetgrouping(DSG)protocol[8]. Beforewepresent
theDSGprotocol,werstbrieydescribethearchitecture
associatedwith DSG. At thesender side, araw video
se-quence is compressed into multiple streams (called
repli-cated adaptive streams), which carry the same video
in-formation with dierent rate and quality. Dierent from
layered video, each stream in DSG can be decoded
inde-pendently. Aftercompression,eachvideostreamissentto
a separateIP multicast group. Atthe receiverside, each
receiver can choose a multicast group to join by taking
into account of its capability and congestion status. The
receiversalsosend feedbackto thesource, and thesource
usesthisfeedbacktoadjust thetransmissionrateforeach
stream.
TheDSGprotocolconsistsoftwomaincomponents:
1. Rate control at the source. For each stream, the
rate control at the source is essentially the sameas that
usedin IVS(seeSectionII-A.1). Butthefeedbackcontrol
foreachstreamworksindependently.
2. Rate control at a receiver. A receivercan change
its subscription andjoin ahigheror lower qualitystream
basedonnetworkstatus,i.e.,thefractionofUNLOADED,
LOADEDandCONGESTEDreceivers.Themechanismto
GESTEDreceiversissimilartothatusedinIVS.Therate
control at a receiver takes the probe-based approach as
presentedinSection II-A.2.
B. Rate-adaptive Video Encoding: A Compression
Ap-proach
Rate-adaptive video encoding has been studied
exten-sivelyforvariousstandardsandapplications,suchasvideo
conferencing withH.261 andH.263 [31],[61],storage
me-dia with MPEG-1 and MPEG-2 [16], [28], [48], real-time
transmissionwithMPEG-1andMPEG-2[17],[23],andthe
recent object-based coding with MPEG-4 [54], [63]. The
objectiveofarate-adaptiveencodingalgorithmisto
max-imizetheperceptualqualityunderagivenencodingrate. 2
Such adaptiveencoding canbeachievedbythealteration
of theencoder'squantizationparameter (QP) and/or the
alterationofthevideoframerate.
Traditionalvideoencoders(e.g.,H.261,MPEG-1/2)
typ-ically rely on altering the QP of the encoder to achieve
rate adaptation. These encoding schemes must perform
codingwith constantframe rates. This is becauseeven a
slightreductioninframeratecansubstantiallydegradethe
perceptualqualityat thereceiver,especially during a
dy-namic scenechange. Sincealtering theQP is notenough
to achieveverylowbit-rate, these encoding schemes may
2
chang-Fig.10. Anexampleof videoobject(VO)conceptinMPEG-4video. Avideoframe(left)issegmentedinto twoVOplaneswhereVO1
(middle)isthebackgroundandVO2(right)istheforeground.
notbesuitableforverylowbit-ratevideoapplications.
Onthecontrary,MPEG-4andH.263codingschemesare
suitableforverylowbit-ratevideoapplicationssincethey
allow the alteration of the frame rate. In fact, the
alter-ationoftheframerateisachievedbyframe-skip. 3
Speci-cally,iftheencoderbuerisindangerofoverow(i.e.,the
bitbudgetisover-usedbythepreviousframe),acomplete
frame canbeskipped at theencoder. Thiswill allow the
codedbitsofthepreviousframestobetransmittedduring
thetimeperiodofthisframe,thereforereducingthebuer
level(i.e.,keepingtheencodedbitswithinthebudget).
Inaddition, MPEG-4is therstinternationalstandard
addressingthecodingofvideoobjects(VO's)(seeFig.10)
[24]. WiththeexibilityandeÆciencyprovidedbycoding
videoobjects,MPEG-4iscapableofaddressinginteractive
content-basedvideoservicesaswellas conventionalstored
and live video [39]. In MPEG-4, a frame of a video
ob-jectiscalledavideoobjectplane(VOP),whichisencoded
separately. Suchisolationofvideoobjectsprovidesuswith
much greaterexibility to performadaptiveencoding. In
particular, we candynamically adjust target bit-rate
dis-tributionamongvideoobjects,inadditiontothealteration
of QP on each VOP (such a scheme is proposed in [63]).
Thiscanupgradetheperceptualqualityfortheregionsof
interest(e.g.,headandshoulder)whileloweringthequality
forotherregions(e.g.,background).
Forallthevideocodingalgorithms,afundamental
prob-lem ishowto determineasuitableQPtoachievethe
tar-getbit-rate. Therate-distortion(R-D)theoryisapowerful
tooltosolvethisproblem. UndertheR-Dframework,there
aretwoapproachesforencoding ratecontrolin the
litera-ture: the model-basedapproach andthe operationalR-D
basedapproach. The model-basedapproachassumes
var-ious input distributions and quantizer characteristics [9],
[63]. Under this approach, closed-form solutions can be
obtainedbyusing continuousoptimizationtheory. Onthe
other hand, the operational R-D based approach
consid-ers practical coding environments where only a nite set
of quantizersis admissible[23],[28],[48],[61]. Under the
operational R-D based approach, the admissible
quantiz-ers are used by the rate control algorithm to determine
theoptimalstrategy tominimize thedistortionunder the
constraintofagivenbitbudget. Theoptimaldiscrete
solu-tionscanbefoundthroughapplyingintegerprogramming
3
theory.
C. Rate Shaping
Rate shaping is a technique to adapt the rate of
com-pressedvideobit-streamsto thetarget rateconstraint. A
rate shaperis aninterface(or lter)between theencoder
andthenetwork,withwhichtheencoder'soutputratecan
bematchedtotheavailablenetworkbandwidth. Sincerate
shapingdoesnotrequireinteractionwiththeencoder,rate
shaping is applicable to any video coding scheme and is
applicabletobothliveandstoredvideo. Rateshapingcan
beachievedthroughtwoapproaches: oneisfromthe
trans-port perspective [22], [45], [67] and the other is from the
compressionperspective[18].
Arepresentativemechanismfromthetransport
perspec-tiveisserverselectiveframediscard [67].Theserver
selec-tiveframediscardismotivatedbythefollowingfact.
Usu-ally,aservertransmitseachframe withoutanyawareness
of the available network bandwidth and the client buer
size. As a result, the network may drop packets if the
available bandwidth is less than required, which leads to
framelosses. Inaddition,theclientmayalsodroppackets
that arrivetoolate for playback. This causes wastageof
networkbandwidthandclientbuerresources. Toaddress
this problem, the selectiveframe discardscheme
preemp-tively dropsframes at theserverin an intelligentmanner
byconsideringavailablenetworkbandwidthandclientQoS
requirements. Theselectiveframe discard hastwo major
advantages. First, by taking the network bandwidthand
clientbuerconstraintsintoaccount,theservercanmake
thebest useofnetworkresourcesbyselectivelydiscarding
framesinordertominimizethelikelihoodoffutureframes
being discarded, thereby increasing the overall quality of
thevideodelivered. Second, unlikeframedropping in the
networkorattheclient,theservercanalsotakeadvantage
ofapplication-specicinformationsuchasregionsof
inter-est andgroupofpictures(GOP) structure, in itsdecision
indiscardingframes. Asaresult,theserveroptimizesthe
perceived quality at the client while maintaining eÆcient
utilizationofthenetworkresources.
A representativemechanismfrom thecompression
per-spective is dynamic rate shaping [18]. Based onthe R-D
theory, the dynamic rate shaper selectively discards the
Discrete Cosine Transform (DCT) coeÆcients of thehigh
Internet
Request
Retransmission
Error Concealment
Error Resilient
Decoder
IP Layer
UDP Layer
Transport
Compression
Domain
Domain
IP Layer
RTP Layer
UDP Layer
Receiver Side
Loss Detection
Error Resilient
Sender Side
Retransmission
Control
FEC Encoder
FEC Decoder
Encoder
RTP Layer
Raw Video
Display
Video
Fig.11. Anarchitectureforerrorcontrolmechanisms.
namic rateshaperselectsthehighestfrequencies and
dis-cards the DCT coeÆcients of these frequencies until the
targetrateismet.
Congestion control attempts to prevent packet loss by
matching therateof videostreamsto theavailable
band-width inthenetwork. However,packet lossisunavoidable
intheInternetandmayhavesignicantimpacton
percep-tualquality. Therefore,weneedothermechanismsto
max-imize thevideopresentation quality in presenceof packet
loss. Such mechanisms include error control mechanisms,
whicharepresentedinthenextsection.
III. ErrorControl
IntheInternet, packets maybedroppeddue to
conges-tionatrouters,theymaybemis-routed,ortheymayreach
thedestinationwith suchalongdelayastobeconsidered
uselessorlost. Packetlossmayseverelydegradethevisual
presentationquality. Toenhancethevideoqualityin
pres-ence of packet loss, error control mechanisms have been
proposed.
Forcertaintypesofdata(suchastext),packetlossis
in-tolerablewhiledelayisacceptable. Whenapacketislost,
there are two ways to recover the packet: the corrupted
data must be corrected by traditional FEC (i.e., channel
coding),orthepacketmustberetransmitted. Ontheother
hand,forreal-timevideo,somevisualqualitydegradation
isoftenacceptablewhiledelaymustbebounded. This
fea-tureof real-timevideointroducesmanynewerrorcontrol
mechanisms,whichareapplicabletovideoapplicationsbut
notapplicable to traditionaldata suchastext. Basically,
the error control mechanisms for video applications can
beclassiedintofourtypes,namely,FEC,retransmission,
error-resilience, and errorconcealment. FEC,
retransmis-sion,anderror-resilienceareperformedatboththesource
and the receiver side, while error concealment is carried
out only at the receiver side. Fig.11 shows the location
of each errorcontrolmechanisminalayeredarchitecture.
As shown in Fig. 11, retransmission recovers packet loss
from the transport perspective; error-resilience and error
concealment deal with packet loss from the compression
siondomains. Fortherestofthissection,wepresentFEC,
retransmission,error-resilience,anderrorconcealment,
re-spectively.
A. FEC
Theuseof FECisprimarilybecauseofitsadvantageof
small transmission delay, compared with TCP [14]. The
principleof FECis to add extra(redundant)information
toacompressedvideobit-streamsothattheoriginalvideo
canbe reconstructedin presenceof packet loss. Based on
thekindofredundantinformationto beadded, the
exist-ing FEC schemes can be classied into three categories:
(1)channelcoding,(2) sourcecoding-basedFEC,and(3)
jointsource/channelcoding,whichwillbepresentedin
Sec-tionsIII-A.1to III-A.3, respectively.
A.1 ChannelCoding
For Internet applications, channel coding is typically
used in termsof blockcodes. Specically, avideostream
isrstchoppedinto segments,eachof which ispacketized
into k packets; then for each segment, a block code (e.g.,
Tornadocode [1])is applied to thek packets to generate
a n-packet block, where n > k. Specically, the channel
encoderplacesthekpacketsintoagroupandthencreates
additionalpacketsfrom them sothat thetotalnumberof
packets in the groupbecomes n, where n > k (shown in
Fig. 12). This group of packets is transmitted to the
re-ceiver, which receivesK packets. To perfectly recover a
segment, a user must receive K (K k) packets in the
n-packet block (see Fig. 12). Inother words, auseronly
needstoreceiveanykpacketsinthen-packetblocksothat
itcanreconstruct alltheoriginalkpackets.
Sincerecoveryiscarriedoutentirelyatthereceiver,the
channel coding approachcanscaleto arbitrarynumberof
receiversinalargemulticastgroup. Inaddition,duetoits
abilitytorecoverfromanykoutofnpacketsregardlessof
whichpacketsarelost,itallowsthenetworkand receivers
to discard some of the packets which cannot be handled
due to limited bandwidth or processing power. Thus, it
Encoder
Decoder
source
data
encoded
data
received
data
reconstructed
data
k
n
K>=k
k
Fig.12. Channelcoding/decodingoperation.
disadvantagesassociatedwithchannel codingas follows.
1. Itincreasesthetransmissionrate. Thisisbecause
chan-nelcodingaddsn kredundantpacketstoeverykoriginal
packets,whichincreasestheratebyafactorofn=k. In
ad-dition, thehigherthelossrateis,thehigherthe
transmis-sion rateis requiredto recoverfrom the loss. The higher
the transmission rate is, the more congested the network
gets, whichleadsto an evenhigher lossrate. This makes
channelcodingvulnerableforshort-termcongestion.
How-ever, eÆciency may be improved by using unequal error
protection[1].
2. Itincreasesdelay. Thisisbecause(1)achannelencoder
mustwaitforallkpacketsinasegmentbeforeitcan
gener-atethen kredundantpackets;and(2)thereceivermust
waitforatleastkpacketsofablockbeforeitcanplayback
thevideosegment. Inaddition, recoveryfrom burstyloss
requires the use of either longer blocks (i.e., largerk and
n)ortechniqueslikeinterleaving. Ineithercase,delaywill
befurtherincreased. Butforvideostreamingapplications,
which can tolerate relatively large delay, the increase in
delaymaynotbeanissue.
3. It is notadaptiveto varyinglosscharacteristicsand it
works best only when the packet loss rate is stable. If
morethann kpacketsofablockarelost,channelcoding
cannot recoveranyportion of the original segment. This
makeschannelcodinguselesswhentheshort-termlossrate
exceedstherecoverycapability ofthecode. Onthe other
hand,ifthelossrateis wellbelowthecode'srecovery
ca-pability,theredundantinformationismorethannecessary
(a smallerratio n=k would be moreappropriate). To
im-provethe adaptivecapability of channel coding, feedback
canbeused. Thatis,ifthereceiverconveystheloss
char-acteristicstothesource,thechannelencodercanadaptthe
redundancy accordingly. Note that this requires aclosed
loopratherthananopenloopintheoriginalchannel
cod-ingdesign.
Asignicantportionofpreviousresearchonchannel
cod-ingforvideotransmissionhasinvolvedequal error
protec-tion (EEP),in which allthe bitsof thecompressed video
stream aretreated equally, andgivenanequal amountof
redundancy. However, the compressed video stream
typ-ically does not consist of bits of equal signicance. For
example, in MPEG, anI-frameis moreimportant thana
P-framewhileaP-frameismoreimportantthanaB-frame.
Currentresearchisheavilyweightedtowardsunequalerror
informationbits aregivenmoreprotection. A
representa-tive work of UEP is the Priority Encoding Transmission
(PET) [1]. A key feature of the PET scheme is to allow
auserto setdierentlevels(priorities)oferrorprotection
for dierent segmentsof the videostream. This unequal
protectionmakesPETeÆcient(lessredundancy)and
suit-ablefor transporting MPEGvideowhich hasan inherent
priorityhierarchy(i.e.,I-,P-,andB-frames).
Toprovideerrorrecoveryinlayeredmulticastvideo,Tan
and Zakhor proposed a receiver-driven hierarchical FEC
(HFEC)[50]. InHFEC,additionalstreamswithonlyFEC
redundantinformationaregeneratedalongwiththevideo
layers. Each ofthe FECstreamsis used forrecoveryof a
dierentvideolayer,and each oftheFECstreams issent
to a dierent multicast group. Subscribing to more FEC
groupscorrespondstohigherlevelofprotection. Likeother
receiver-driven schemes, HFEC also achievesgood
trade-obetweenexibilityofprovidingrecoveryandbandwidth
eÆciency,that is:
Flexibilityof providing recovery: Each receiver can
inde-pendently adjust the desired level of protection based on
pastreception statisticsand theapplication's delay
toler-ance.
BandwidtheÆciency: Each receiverwillsubscribetoonly
asmany redundancy layersas necessary, reducing overall
bandwidthutilization.
A.2 SourceCoding-basedFEC
Source coding-based FEC(SFEC)is arecentlydevised
variant of FEC for Internet video [4]. Like channel
cod-ing,SFECalsoaddsredundantinformationtorecoverfrom
loss. Forexample,SFECcouldaddredundantinformation
asfollows: the nth packet containsthenth GOB (Group
ofBlocks)andredundantinformationaboutthe(n 1)th
GOB. If the (n 1)th packet is lost but the nth packet
isreceived,thereceivercanstillreconstruct the(n 1)th
GOBfromtheredundantinformationaboutthe(n 1)th
GOB,whichis containedinthenthpacket. However,the
reconstructed (n 1)th GOB has acoarser quality. This
isbecausetheredundantinformationaboutthe (n 1)th
GOBisacompressedversionofthe(n 1)thGOBwitha
largerquantizer,resultinginlessredundancyaddedtothe
nthpacket.
Themain dierencebetweenSFECand channel coding
isthekindofredundantinformationbeingaddedtoa
com-pressedvideostream. Specically,channelcodingadds
re-dundantinformation accordingtoablockcode(irrelevant
to the video) while the redundant information added by
SFEC is morecompressed versions of the raw video. As
a result, when there is packet loss, channel coding could
achieve perfect recovery while SFEC recovers the video
withreducedquality.
One advantage of SFEC over channel coding is lower
delay. ThisisbecauseeachpacketcanbedecodedinSFEC
while,underthechannelcodingapproach,boththechannel
encoderandthechanneldecoder havetowaitfor atleast
R
1
R
2
Distortion
Source rate
2
1
D
D
Distortion
o
1
2
D
D
D
o
R
1
R
R
2
Source rate
(a) (b)Fig.13. (a)Rate-distortion relation forsourcecoding. (b)
Rate-distortionrelationforthecaseofjointsource/channelcoding.
are: (1)anincreaseinthetransmissionrate,and(2)
inex-ible to varying losscharacteristics. However,such
inexi-bility to varying losscharacteristicscan alsobeimproved
through feedback [4]. That is, if thereceiverconveysthe
loss characteristicsto the source, the SFEC encoder can
adjusttheredundancyaccordingly. Notethatthisrequires
acloseloopratherthananopenloopintheoriginalSFEC
codingscheme.
A.3 JointSource/ChannelCoding
Due to Shannon's separation theorem [43], the coding
worldwasgenerallydividedintotwocamps: sourcecoding
and channel coding. Thecamp ofsource coding was
con-cerned with developing eÆcient source coding techniques
while thecampof channel coding wasconcernedwith
de-velopingrobust channel coding techniques [21]. In other
words, the camp of source coding did not take channel
coding into account and the camp of channel coding did
not consider source coding. However, Shannon's
sepa-ration theorem is not strictly applicable when the delay
is bounded, which is the case for such real-time services
as video over the Internet [10]. The motivation of joint
source/channel coding forvideocomes from thefollowing
observations:
CaseA:Accordingtotherate-distortiontheory(shown
inFig.13(a))[13],thelowerthesource-encodingrateRfor
avideounit,thelargerthedistortionD ofthevideounit.
That is,R#)D".
Case B: Suppose that the total rate (i.e., the
source-encoding rateR plusthechannel-codingredundancy rate
R 0
)isxedandchannellosscharacteristicsdonotchange.
The higher the source-encoding rate for a video unit is,
the lower the channel-coding redundancy rate would be.
ThisleadstoahigherprobabilityP
c
oftheeventthat the
video unit gets corrupted, which translates into a larger
distortion ofthevideounit. That is,R")R 0
#)P
c ")
D".
CombiningCasesA andB, itcan bearguedthat there
existsanoptimalsource-encodingrateR
o
thatachievesthe
minimum distortionD
o
(seeFig.13(b)), givenaconstant
totalrate. Asillustratedin Fig.13(b),theleftpartofthe
curveshowsCaseAwhiletherightpartofthecurveshows
CaseB.Thetwopartsmeetattheoptimalpoint(R
o ,D
o ).
The objective of joint source/channel coding is to
nd the optimal point shown in Fig. 13(b) and design
source/channel coding schemes to achieve the optimal
point. In other words, nding an optimal point in joint
source/channel coding is to make an optimal rate
alloca-tionbetweensourcecodingandchannelcoding.
Basically,jointsource/channelcodingisaccomplishedby
threetasks:
Task 1: nding an optimal rate allocation between
sourcecodingand channel coding foragivenchannelloss
characteristic,
Task 2: designing a source coding scheme (including
specifyingthequantizer)to achieveitstargetrate,
Task 3: designing/choosingchannelcodesto matchthe
channellosscharacteristicandachievetherequired
robust-ness.
For thepurpose of illustration, Fig.14 shows an
archi-tecturefor joint source/channel coding. Under the
archi-tecture,aQoSmonitoriskeptatthereceiversidetoinfer
the channel losscharacteristics. Such information is
con-veyedbacktothesourcesidethroughthefeedbackcontrol
protocol. Based on such feedback information, the joint
source/channeloptimizermakesanoptimalrateallocation
between the source coding and the channel coding (Task
1) and conveys the optimal rate allocation to the source
encoderandthechannelencoder. Thenthesourceencoder
choosesanappropriatequantizertoachieveitstargetrate
(Task 2)andthechannelencoderchoosesasuitable
chan-nelcodeto matchthechannellosscharacteristic(Task3).
Anexampleofjointsource/channelcodingisthescheme
introducedbyDavisandDanskin[14]fortransmitting
im-agesovertheInternet. Inthisscheme,sourceandchannel
coding bits are allocated in a way that can minimize an
expected distortion measure. As a result, more
percep-tually important low frequency sub-bands of images are
shieldedheavilyusingchannelcodeswhile higher
frequen-cies areshieldedlightly. This unequalerrorprotection
re-duceschannelcodingoverhead,whichismostpronounced
onburstychannelswhereauniformapplicationofchannel
codesisexpensive.
B. Delay-constrained Retransmission: A Transport
Ap-proach
A conventional retransmission scheme, ARQ, works as
follows: whenpackets arelost,thereceiversendsfeedback
to notify the source; then thesource retransmits the lost
packets. TheconventionalARQ isusually dismissed asa
method for transporting real-time video since a
retrans-mitted packet arrives at least 3 one-way trip time after
the transmission of the original packet, which might
ex-ceedthedelayrequiredbytheapplication. However,ifthe
one-way trip time is short with respect to the maximum
allowable delay, a retransmission-based approach, called
delay-constrainedretransmission,isaviableoptionfor
er-rorcontrol[37],[38].
Typically, one-way trip time is relatively small within
sen-Internet
Source Decoder
IP Layer
UDP Layer
IP Layer
RTP Layer
UDP Layer
Receiver Side
Sender Side
RTP Layer
QoS Monitor
Protocol
Feedback Control
Joint Source/Channel
Optimizer
Channel Decoder
Channel Encoder
Source Encoder
Raw Video
Compression
Domain
Transport
Domain
Fig.14. Anarchitectureforjointsource/channelcoding.
constrainedretransmissionforlossrecoveryinanLAN
en-vironment[15]. Delay-constrainedretransmissionmayalso
be applicable to streaming video, which can tolerate
rel-atively large delay due to a largereceiver buer and
rel-atively long delay for display. As a result, even in wide
area network (WAN), streaming video applications may
have suÆcient time to recoverfrom lost packets through
retransmissionandtherebyavoidunnecessarydegradation
in reconstructedvideoquality.
In the following, we present various delay-constrained
retransmission schemes for unicast (Section III-B.1) and
multicast (SectionIII-B.2),respectively.
B.1 Unicast
Based on who determines whether to send and/or
respond to a retransmission request, we design three
delay-constrained retransmissionmechanismsfor unicast,
namely,receiver-based,sender-based,andhybridcontrol.
Receiver-based control. Theobjectiveof the
receiver-based control is to minimize the requests of
retransmis-sion that will not arrive timely for display. Under the
receiver-based control,the receiverexecutes thefollowing
algorithm.
WhenthereceiverdetectsthelossofpacketN:
if(T c +R TT+D s <T d (N))
sendtherequestforretransmissionof
packetN to thesender;
where T
c
is thecurrenttime, R TT isan estimatedround
triptime, D
s
isaslackterm,andT
d
(N)is thetime when
packetN isscheduledfordisplay. TheslacktermD
s could
include toleranceoferrorinestimating R TT,thesender's
responsetimetoarequest,and/orthereceiver'sprocessing
delay(e.g.,decoding). IfT
c +R TT+D s <T d (N)holds,it
isexpectedthattheretransmittedpacketwillarrivetimely
fordisplay. Thetimingdiagram forreceiver-basedcontrol
isshowninFig.15.
Sender-basedcontrol. Theobjectiveofthesender-based
RTT
T
c
Sender
Receiver
request for packet 2
retransmitted packet 2
packet 1
packet 2 lost
packet 3
T (2)
d
D
s
Fig.15. Timingdiagramforreceiver-basedcontrol.
miss theirdisplaytimeat thereceiver. Under the
sender-basedcontrol,thesenderexecutesthefollowingalgorithm.
Whenthesenderreceivesarequestforretransmission
ofpacketN: if(T c +T o +D s <T 0 d (N))
retransmitpacketN to thereceiver
where T
o
is the estimated one-way trip time (from the
sendertothereceiver),andT 0 d (N)isanestimateofT d (N). To obtain T 0 d
(N), the receiver has to feedback T
d (N) to
the sender. Then, based on the dierences between the
sender's system time and the receiver's system time, the
sendercanderiveT 0
d
(N). The slacktermD
s
mayinclude
error termsin estimating T
o and T
0
d
(N), as well as
toler-ance in thereceiver'sprocessingdelay(e.g., decoding). If
T c +R TT +D s < T 0 d
(N) holds, it can be expected that
retransmittedpacketwillreachthereceiverintimefor
dis-play. Thetimingdiagramforsender-basedcontrolisshown
inFig.16.
Hybrid control. The objective of the hybrid control is
tominimizetherequestofretransmissionsthatwillnot
ar-rivefor timely display, and to suppress retransmission of
thepacketsthatwillmisstheirdisplaytimeatthereceiver.
Thehybridcontrolis asimplecombinationof the
sender-basedcontrol and thereceiver-basedcontrol. Specically,
thereceivermakesdecisionsonwhethertosend
retransmis-sionrequestswhilethesendermakesdecisionsonwhether
con-T
c
D
s
D
s
d
T (2)
Sender
Receiver
T
o
T’ (2)
d
packet 1
packet 2 lost
packet 3
request for packet 2
retransmitted packet 2
Fig.16. Timingdiagramforsender-basedcontrol.
complexity.
B.2 Multicast
Inthemulticastcase,retransmissionhastoberestricted
within closely located multicast members. This is
be-cause one-way trip timesbetweenthese memberstend to
be small, making retransmissions eective in timely
re-covery. In addition, feedback implosion of retransmission
requests is a problem that must be addressed under the
retransmission-based approach. Thus, methods are
re-quired to limit thenumber orscopeof retransmission
re-quests.
Typically, alogicaltree is conguredto limitthe
num-ber/scopeof retransmission requests and to achievelocal
recovery among closely located multicast members [29],
[32], [64]. The logical tree can be constructed by
stati-cally assigning Designated Receivers (DRs) at each level
ofthetreeto helpwithretransmissionoflost packets[29].
Oritcanbedynamicallyconstructedthroughtheprotocol
used inSTructure-OrientedResilientMulticast (STORM)
[64]. By adapting the tree structure to changing
net-worktraÆcconditionsandgroupmemberships,thesystem
couldachievehigherprobabilityofreceivingretransmission
timely.
Similartothereceiver-basedcontrolforunicast,receivers
inamulticastgroupcanmakedecisionsonwhethertosend
retransmission requests. By suppressing the requests for
retransmissionofthosepacketsthatcannotberecoveredin
time, bandwidtheÆciency can beimproved[29]. Besides,
using a receiving buer with appropriate size could not
only absorb the jitter but also increase the likelihood of
receiving retransmitted packets before their display time
[29].
To address heterogeneity problem, a receiver-initiated
mechanism for error recovery canbe adopted as done in
STORM [64]. Under this mechanism, each receiver can
dynamically select the best possible DR to achieve good
trade-o betweendesired latency and the degreeof
relia-bility.
C. Error-resilience: ACompressionApproach
Error-resilient schemes address loss recovery from the
compression perspective. Specically, they attempt to
prevent error propagation or limit the scope of the
damage (caused by packet losses) on the compression
layer. The standardized error-resilient tools include
re-Resilience
Error
low
high
Inter Mode
Intra Mode
Compression
Efficiency
low
high
Optimal in R-D sense
Fig.17. Illustrationofoptimalmodeselection.
covery(e.g.,reversiblevariablelengthcodes(RVLC))[24],
[49]. However,re-synchronizationmarking,data
partition-ing,anddatarecoveryaretargetedaterror-prone
environ-mentslikewireless channelsandmaynotbeapplicableto
theInternet. ForInternetvideo,theboundaryofapacket
already provides a synchronization point in the
variable-lengthcodedbit-stream atthereceiverside. Ontheother
hand,sinceapacketlossmaycausethelossofallthe
mo-tion data and its associated shape/texture data,
mecha-nismssuchasre-synchronizationmarking,data
partition-ing,anddatarecoverymaynotbeusefulforInternetvideo
applications. Therefore, we donot intend to present the
standardized error-resilienttools. Instead,wepresenttwo
techniques which are promising for robust Internet video
transmission,namely,optimal mode selection andmultiple
description coding.
C.1 OptimalModeSelection
Inmanyvideocodingschemes,ablock,whichisavideo
unit,is codedbyreference to apreviouslycoded blockso
thatonlythedierencebetweenthetwoblocksneedstobe
coded,resultinginhighcodingeÆciency. Thisiscalled
in-ter mode. Constantlyreferringtopreviouslycodedblocks
hasthedangeroferrorpropagation.Byoccasionally
turn-ing o this inter mode, error propagation canbelimited.
But it will be more costly in bits to code a block all by
itself, without any reference to a previously coded block.
Suchacodingmodeiscalledintramode. Intra-codingcan
eectively stoperror propagation at the cost of
compres-sion eÆciency while inter-codingcan achievecompression
eÆciencyattheriskoferrorpropagation. Therefore,there
isatrade-oinselectingacodingmodeforeachblock(see
Fig.17). Howtooptimallymakethese choices isthe
sub-jectofmanyresearchinvestigations[12],[62],[66].
Forvideocommunicationoveranetwork,ablock-based
coding algorithm such as H.263 or MPEG-4 [24] usually
employsratecontroltomatchtheoutputratetothe
avail-ablenetwork bandwidth. Theobjectiveof rate-controlled
compression algorithms is to maximize the video quality
under the constraint of a given bit budget. This can be
achievedbychoosingamodethatminimizesthe
quantiza-tion distortion between the original block and the
recon-structed one under agivenbit budget[36], [46], which is
theso-calledR-D optimizedmodeselection. Werefersuch
Raw Video
Video
Encoder
Video
Decoder
Transport
Protocol
Network
Protocol
Transport
Depacketizer
Packetizer
Source behavior
Receiver behavior
Path characteristics
Fig.18. Factorsthathaveimpactonthevideopresentationquality:
sourcebehavior,pathcharacteristics,andreceiverbehavior.
malityundertheerror-proneenvironmentsinceitdoesnot
considerthenetworkcongestionstatusandthereceiver
be-havior.
Toaddressthisproblem, an end-to-endapproach to
R-D optimizedmodeselectionwasproposed [62]. Under the
end-to-endapproach,three factorswere identiedto have
impact on the video presentation quality at the receiver:
(1) the source behavior,e.g., quantizationand
packetiza-tion, (2)the pathcharacteristics,and (3) thereceiver
be-havior,e.g.,errorconcealment(seeFig.18). Basedonthe
characterizations,atheory [62] forglobally optimalmode
selection wasdeveloped. By taking into considerationof
the network congestion status and the receiverbehavior,
the end-to-end approach is shown to be capable of
oer-ing superior performance over the classical approach for
Internetvideoapplications[62].
C.2 MultipleDescriptionCoding
Multiple description coding (MDC) is another way to
achieve trade-o between compression eÆciency and
ro-bustness to packet loss [59]. With MDC, araw video
se-quence iscompressedintomultiple streams(referredtoas
descriptions). Eachdescriptionprovidesacceptable visual
quality; more combined descriptions provide a better
vi-sual quality. Theadvantagesof MDC are: (1) robustness
to loss: evenif areceivergetsonlyonedescription(other
descriptionsbeinglost), itcanstill reconstructvideowith
acceptable quality; and(2) enhanced quality: ifareceiver
getsmultipledescriptions,itcancombinethemtogetherto
produce abetterreconstruction than that produced from
anysingledescription.
However,theadvantagesdonotcomeforfree. Tomake
eachdescriptionprovideacceptablevisualquality,each
de-scriptionmustcarrysuÆcientinformationaboutthe
origi-nalvideo. ThiswillreducethecompressioneÆciency
com-paredto conventional singledescriptioncoding(SDC). In
addition, although more combined descriptions provide a
better visual quality, a certain degree of correlation
be-tweenthemultipledescriptionshastobeembeddedineach
description,resultingin further reductionofthe
compres-sion eÆciency. Current research eort is to nd a good
trade-o between the compression eÆciency and the
re-D. ErrorConcealment: A CompressionApproach
When packet loss is detected, the receiver can employ
error concealment to concealthe lost data and make the
presentation morepleasing to humaneyes. Since human
eyes can tolerate a certain degree of distortion in video
signals,error concealment is aviabletechniqueto handle
packetloss[60].
There are two basic approaches for error concealment,
namely, spatialand temporalinterpolation. Inspatial
in-terpolation, missing pixel values are reconstructed using
neighboring spatial information, whereas in temporal
in-terpolation,thelostdataisreconstructedfromdatainthe
previousframes. Typically,spatialinterpolationisusedto
reconstructmissing datain intra-codedframeswhile
tem-poral interpolationis used to reconstruct missing data in
inter-codedframes.
In recent years, numerous error-concealment schemes
have been proposed in the literature (refer to [60] for a
good survey). Examples include maximally smooth
re-covery [58], projection onto convexsets [47], and various
motionvectorandcodingmoderecoverymethodssuchas
motion compensated temporal prediction [20]. However,
most error concealment techniques discussed in [60] are
only applicable to either ATM or wireless environments,
and require substantial additional computation
complex-ity, which is acceptable for decoding still images but not
tolerablein decoding real-time video. Therefore, weonly
describesimpleerrorconcealmentschemesthat are
appli-cableto Internetvideocommunication.
We describe three simple error concealment (EC)
schemesasfollows.
EC-1: Thereceiverreplacesthewholeframe(wheresome
blocksarecorrupteddue topacketloss)withtheprevious
reconstructedframe.
EC-2: The receiverreplaces acorrupted block with the
blockat thesamelocationfromthepreviousframe.
EC-3: Thereceiverreplacesthecorruptedblockwiththe
block from the previous frame pointed by a motion
vec-tor. Themotionvectoriscopiedfromitsneighboringblock
whenavailable,otherwisethemotionvectorisset tozero.
EC-1andEC-2arespecialcasesofEC-3. Ifthemotion
vectorofthecorruptedblockisavailable,EC-3canachieve
betterperformancethanEC-1andEC-2whileEC-1and
EC-2havelesscomplexitythanthatofEC-3.
IV. Summary
Transporting video over the Internet is an important
componentofmanymultimedia applications. LackofQoS
support in the current Internet, and the heterogeneityof
thenetworksandend-systemsposemanychallenging
prob-lemsfordesigningvideodeliverysystems. Inthispaper,we
identied fourproblemsfor videodeliverysystems:
band-width, delay, loss,and heterogeneity. There are two
gen-eralapproachesthataddresstheseproblems: the
network-centric approachand theendsystem-based approach. We