
Figure 6.4. Forward and backward motion estimation. Adapted from [39, Fig. 5.5].

anchor and tracked frames, respectively. In general, we can represent the motion field as $\mathbf{d}(\mathbf{x}; \mathbf{a})$, where $\mathbf{a} = [a_1, a_2, \ldots, a_L]^T$ is a vector containing all the motion parameters. Similarly, the mapping function can be denoted by $\mathbf{w}(\mathbf{x}; \mathbf{a}) = \mathbf{x} + \mathbf{d}(\mathbf{x}; \mathbf{a})$. The motion estimation problem is to estimate the motion parameter vector $\mathbf{a}$. Methods that have been developed can be categorized into two groups: feature-based and intensity-based. In the feature-based approach, correspondences between pairs of selected feature points in two video frames are first established. The motion model parameters are then obtained by a least squares fitting of the established correspondences into the chosen motion model. This approach is only applicable to parametric motion models and can be quite effective in, say, determining global motions. The intensity-based approach applies the constant intensity assumption or the optical flow equation at every pixel and requires the estimated motion to satisfy this constraint as closely as possible. This approach is more appropriate when the underlying motion cannot be characterized by a simple model, and when an estimate of a pixel-wise or block-wise motion field is desired.
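To make the notation $\mathbf{d}(\mathbf{x}; \mathbf{a})$ and $\mathbf{w}(\mathbf{x}; \mathbf{a})$ concrete, the following minimal sketch assumes a six-parameter affine model. The affine form and the function names `affine_motion` and `mapping` are illustrative assumptions, not something the text prescribes at this point.

```python
def affine_motion(x, y, a):
    """Motion vector d(x; a) under an assumed 6-parameter affine model."""
    a1, a2, a3, a4, a5, a6 = a
    dx = a1 + a2 * x + a3 * y  # horizontal displacement
    dy = a4 + a5 * x + a6 * y  # vertical displacement
    return dx, dy


def mapping(x, y, a):
    """Mapping function w(x; a) = x + d(x; a)."""
    dx, dy = affine_motion(x, y, a)
    return x + dx, y + dy


# A pure translation by (1, -1) is the special case a = (1, 0, 0, -1, 0, 0).
print(mapping(2, 3, (1, 0, 0, -1, 0, 0)))
```

With only `a1` and `a4` nonzero the model reduces to a global translation, which is the simplest parameter vector one could estimate.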

In this chapter, we only consider intensity-based approaches, which are more widely used in applications requiring motion compensated prediction and filtering. In general, the intensity-based motion estimation problem can be converted into an optimization problem, and three key questions need to be answered: i) how to parameterize the underlying motion field? ii) what criterion to use to estimate the parameters? and iii) how to search for the optimal parameters? In this section, we first describe several ways to represent a motion field. Then we introduce different types of estimation criteria. Finally, we present search strategies commonly used

6.2.1 Motion Representation

A key problem in motion estimation is how to parameterize the motion field. As shown in Sec. 5.5, the 2D motion field resulting from a camera or object motion can usually be described by a few parameters. However, there are usually multiple objects in the imaged scene that move differently. Therefore, a global parameterized model is usually not adequate. The most direct and unconstrained approach is to specify the motion vector at every pixel. This is the so-called pixel-based representation. Such a representation is universally applicable, but it requires the estimation of a large number of unknowns (twice the number of pixels!) and the solution can often be physically incorrect unless a proper physical constraint is imposed during the estimation step. On the other hand, if only the camera is moving or the imaged scene contains a single moving object with a planar surface, one could use a global motion representation to characterize the entire motion field. In general, for scenes containing multiple moving objects, it is more appropriate to divide an image frame into multiple regions so that the motion within each region can be characterized well by a parameterized model. This is known as region-based motion representation (see footnote 3), which consists of a region segmentation map and several sets of motion parameters, one for each region. The difficulty with such an approach is that one does not know in advance which pixels have similar motions. Therefore, segmentation and estimation have to be accomplished iteratively, which requires an intensive amount of computation that may not be feasible in practice.

Footnote 3: This is sometimes called object-based motion representation [27]. Here we use the word "region-based" to acknowledge the fact that we are only considering 2D motions, and that a region with

Figure 6.5. Different motion representations: (a) global, (b) pixel-based, (c) block-based, and (d) region-based. From [38, Fig. 3].

One way to reduce the complexity associated with region-based motion representation is by using a fixed partition of the image domain into many small blocks. As long as each block is small enough, the motion variation within each block can be characterized well by a simple model and the motion parameters for each block can be estimated independently. This brings us to the popular block-based representation. The simplest version models the motion in each block by a constant translation, so that the estimation problem becomes that of finding one MV for each block. This method provides a good compromise between accuracy and complexity, and has found great success in practical video coding systems. One main problem with the block-based approach is that it does not impose any constraint on the motion transition between adjacent blocks. The resulting motion is often discontinuous across block boundaries, even when the true motion field is changing smoothly from block to block. One approach to overcome this problem is by using a mesh-based representation, by which the underlying image frame is partitioned into non-overlapping polygonal elements. The motion field over the entire frame is described by the MVs at the nodes (corners of polygonal elements) only, and the MVs at the interior points of an element are interpolated from the nodal MVs. This representation induces a motion field that is continuous everywhere. It is more appropriate than the block-based representation over interior regions of an object, which usually undergoes a continuous motion, but it fails to capture motion discontinuities at object boundaries. Adaptive schemes that allow discontinuities when necessary are needed for more accurate motion estimation. Figure 6.5 illustrates the effect of several motion representations described above for a head-and-shoulder scene. In the next few sections, we will introduce motion estimation methods using different motion representations.
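The interpolation of interior MVs from nodal MVs in a mesh-based representation can be sketched as follows. This sketch assumes a rectangular element and bilinear interpolation; the text only says that interior MVs are interpolated from the nodal MVs, so the interpolation kernel, the corner ordering, and the name `mesh_mv` are assumptions made for illustration.

```python
def mesh_mv(nodal_mvs, u, v):
    """Bilinearly interpolate the MV at normalized position (u, v) in [0, 1]^2.

    nodal_mvs holds the MVs at the four corners of an (assumed) rectangular
    element, ordered top-left, top-right, bottom-left, bottom-right.
    """
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    dx = sum(w * mv[0] for w, mv in zip(weights, nodal_mvs))
    dy = sum(w * mv[1] for w, mv in zip(weights, nodal_mvs))
    return dx, dy


nodes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mesh_mv(nodes, 0.5, 0.5))  # MV at the element center
```

Because neighboring elements share their nodal MVs, a field built this way is continuous across element boundaries, whereas a block-based field assigns one constant MV per block and can jump at block edges.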

6.2.2 Motion Estimation Criteria

For a chosen motion model, the problem is how to estimate the model parameters. In this section, we describe several different criteria for estimating motion parameters.

Criterion based on Displaced Frame Difference

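The most popular criterion for motion estimation is to minimize the sum of the errors between the luminance values of every pair of corresponding points between the anchor frame $\psi_1$ and the tracked frame $\psi_2$. Recall that $\mathbf{x}$ in $\psi_1$ is moved to $\mathbf{w}(\mathbf{x}; \mathbf{a})$ in $\psi_2$. Therefore, the objective function can be written as

$$E_{\mathrm{DFD}}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \psi_2(\mathbf{w}(\mathbf{x}; \mathbf{a})) - \psi_1(\mathbf{x}) \right|^p, \tag{6.2.1}$$

where $\Lambda$ is the domain of all pixels in $\psi_1$, and $p$ is a positive number. When $p = 1$, the above error is called the mean absolute difference (MAD), and when $p = 2$, the mean squared error (MSE). (Strictly, Eq. (6.2.1) is a sum rather than a mean; dividing by the number of pixels $|\Lambda|$ does not change the minimizer.) The error image, $e(\mathbf{x}; \mathbf{a}) = \psi_2(\mathbf{w}(\mathbf{x}; \mathbf{a})) - \psi_1(\mathbf{x})$, is usually called the displaced frame difference (DFD) image, and the above measure the DFD error.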
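As a small illustration of Eq. (6.2.1), the sketch below evaluates the DFD error for a single integer translation applied to the whole frame. Frames are plain nested lists indexed as `frame[y][x]`; the function name `dfd_error` and the choice to skip pixels whose displaced position falls outside the tracked frame are assumptions of this sketch.

```python
def dfd_error(psi1, psi2, d, p=1):
    """Sum of |psi2(x + d) - psi1(x)|^p over pixels with x + d inside psi2."""
    dx, dy = d
    rows, cols = len(psi1), len(psi1[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            xs, ys = x + dx, y + dy
            # Assumed boundary handling: ignore pixels displaced out of frame.
            if 0 <= xs < cols and 0 <= ys < rows:
                total += abs(psi2[ys][xs] - psi1[y][x]) ** p
    return total


psi1 = [[0, 1, 2], [3, 4, 5]]
psi2 = [[9, 0, 1], [9, 3, 4]]  # psi1 shifted right by one pixel
print(dfd_error(psi1, psi2, (1, 0)), dfd_error(psi1, psi2, (0, 0)))
```

The correct displacement (1, 0) gives zero error on the overlapping pixels, while the zero displacement does not.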

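The necessary condition for minimizing $E_{\mathrm{DFD}}$ is that its gradient vanishes. In the case of $p = 2$, this gradient is

$$\frac{\partial E_{\mathrm{DFD}}}{\partial \mathbf{a}} = 2 \sum_{\mathbf{x} \in \Lambda} \left( \psi_2(\mathbf{w}(\mathbf{x}; \mathbf{a})) - \psi_1(\mathbf{x}) \right) \frac{\partial \mathbf{d}(\mathbf{x})}{\partial \mathbf{a}} \nabla \psi_2(\mathbf{w}(\mathbf{x}; \mathbf{a})), \tag{6.2.2}$$

where

$$\frac{\partial \mathbf{d}}{\partial \mathbf{a}} = \begin{bmatrix} \dfrac{\partial d_x}{\partial a_1} & \dfrac{\partial d_x}{\partial a_2} & \cdots & \dfrac{\partial d_x}{\partial a_L} \\ \dfrac{\partial d_y}{\partial a_1} & \dfrac{\partial d_y}{\partial a_2} & \cdots & \dfrac{\partial d_y}{\partial a_L} \end{bmatrix}^T.$$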
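As a worked illustration of the matrix $\partial \mathbf{d} / \partial \mathbf{a}$, consider two parameterizations that are assumed here only for concreteness: a pure translation ($L = 2$) and a six-parameter affine model ($L = 6$).

```latex
% Translation: d(x; a) = [a_1, a_2]^T, so the Jacobian is the identity
\frac{\partial \mathbf{d}}{\partial \mathbf{a}} = \mathbf{I}_{2 \times 2}
\quad\Longrightarrow\quad
\frac{\partial E_{\mathrm{DFD}}}{\partial \mathbf{a}}
  = 2 \sum_{\mathbf{x} \in \Lambda} e(\mathbf{x}; \mathbf{a})\,
    \nabla \psi_2(\mathbf{x} + \mathbf{a}).

% Affine: d_x = a_1 + a_2 x + a_3 y, \; d_y = a_4 + a_5 x + a_6 y
\frac{\partial \mathbf{d}}{\partial \mathbf{a}} =
\begin{bmatrix}
  1 & x & y & 0 & 0 & 0 \\
  0 & 0 & 0 & 1 & x & y
\end{bmatrix}^T .
```

In both cases the matrix is $L \times 2$, so its product with the $2 \times 1$ image gradient is an $L \times 1$ vector, matching the dimension of $\mathbf{a}$.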

Criterion based on Optical Flow Equation

Instead of minimizing the DFD error, another approach is to solve the system of equations established based on the optical flow constraint given in Eq. (6.1.3). Let $\psi_1(x, y) = \psi(x, y, t)$ and $\psi_2(x, y) = \psi(x, y, t + d_t)$. If $d_t$ is small, we can assume $\frac{\partial \psi}{\partial t} d_t = \psi_2(\mathbf{x}) - \psi_1(\mathbf{x})$. Then, Eq. (6.1.3) can be written as

$$\frac{\partial \psi_1}{\partial x} d_x + \frac{\partial \psi_1}{\partial y} d_y + (\psi_2 - \psi_1) = 0 \quad \text{or} \quad \nabla \psi_1^T \mathbf{d} + (\psi_2 - \psi_1) = 0. \tag{6.2.3}$$

This discrete version of the optical flow equation is more often used for motion estimation in digital videos. Solving the above equations for all $\mathbf{x}$ can be turned into a minimization problem with the following objective function:

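$$E_{\mathrm{flow}}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \nabla \psi_1(\mathbf{x})^T \mathbf{d}(\mathbf{x}; \mathbf{a}) + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right|^p. \tag{6.2.4}$$

The gradient of $E_{\mathrm{flow}}$ is, when $p = 2$,

$$\frac{\partial E_{\mathrm{flow}}}{\partial \mathbf{a}} = 2 \sum_{\mathbf{x} \in \Lambda} \left( \nabla \psi_1(\mathbf{x})^T \mathbf{d}(\mathbf{x}; \mathbf{a}) + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right) \frac{\partial \mathbf{d}(\mathbf{x})}{\partial \mathbf{a}} \nabla \psi_1(\mathbf{x}). \tag{6.2.5}$$

If the motion field is constant over a small region $\Lambda_0$, i.e., $\mathbf{d}(\mathbf{x}; \mathbf{a}) = \mathbf{d}_0$ for $\mathbf{x} \in \Lambda_0$, then $\partial \mathbf{d} / \partial \mathbf{d}_0$ is the identity matrix and Eq. (6.2.5) becomes

$$\frac{\partial E_{\mathrm{flow}}}{\partial \mathbf{d}_0} = 2 \sum_{\mathbf{x} \in \Lambda_0} \left( \nabla \psi_1(\mathbf{x})^T \mathbf{d}_0 + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right) \nabla \psi_1(\mathbf{x}). \tag{6.2.6}$$

Setting the above gradient to zero yields the least squares solution for $\mathbf{d}_0$:

$$\mathbf{d}_0 = \left( \sum_{\mathbf{x} \in \Lambda_0} \nabla \psi_1(\mathbf{x}) \nabla \psi_1(\mathbf{x})^T \right)^{-1} \left( \sum_{\mathbf{x} \in \Lambda_0} \left( \psi_1(\mathbf{x}) - \psi_2(\mathbf{x}) \right) \nabla \psi_1(\mathbf{x}) \right). \tag{6.2.7}$$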
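The closed-form solution in Eq. (6.2.7) amounts to solving a 2x2 linear system. The sketch below is one way to compute it, assuming frames stored as nested lists indexed `frame[y][x]`, spatial gradients of the anchor frame approximated by central differences, and a region made of interior pixels only. The function name `flow_least_squares` and these numerical choices are assumptions, not part of the text.

```python
def flow_least_squares(psi1, psi2, region):
    """Constant MV d0 over `region` (list of interior (x, y) pixels), Eq. (6.2.7)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y in region:
        # Assumed gradient estimate: central differences on the anchor frame.
        gx = (psi1[y][x + 1] - psi1[y][x - 1]) / 2.0
        gy = (psi1[y + 1][x] - psi1[y - 1][x]) / 2.0
        diff = psi1[y][x] - psi2[y][x]
        a11 += gx * gx
        a12 += gx * gy
        a22 += gy * gy
        b1 += diff * gx
        b2 += diff * gy
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradient matrix is singular (flat or 1-D texture)")
    # Explicit inverse of the symmetric 2x2 matrix applied to (b1, b2).
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

The singularity check reflects the case where all gradients in the region point in the same direction (or vanish), so that the sum of outer products cannot be inverted and the MV is not uniquely determined.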

When the motion is not a constant, but can be related to the model parameters linearly, one can still derive a similar least-squares solution. See Prob. 6.6 in the Problem section.

An advantage of the above method is that the function being minimized is a quadratic function of the MVs when $p = 2$. If the motion parameters are linearly related to the MVs, then the function has a unique minimum (provided the resulting normal equations are nonsingular) and can be solved easily. This is not true with the DFD error given in Eq. (6.2.1). However, the optical flow equation is valid only when the motion is small, or when an initial motion estimate $\tilde{\mathbf{d}}(\mathbf{x})$ that is close to the true motion can be found and one can pre-update $\psi_2(\mathbf{x})$ to $\psi_2(\mathbf{x} + \tilde{\mathbf{d}}(\mathbf{x}))$. When this is not the case, it is better to use the DFD error criterion, and find the minimal solution using the gradient descent or exhaustive search method.

Regularization

Minimizing the DFD error or solving the optical flow equation does not always give a physically meaningful motion estimate. This is partially because the constant intensity assumption is not always correct. The imaged intensity of the same object point may vary after an object motion because of various reflectance and shadowing effects. Secondly, in a region with flat texture, many different motion estimates can satisfy the constant intensity assumption or the optical flow equation. Finally, if the motion parameters are the MVs at every pixel, the optical flow equation does not constrain the motion vector completely. These factors make motion estimation an ill-posed problem.

To obtain a physically meaningful solution, one needs to impose additional constraints to regularize the problem. One common regularization approach is to add a penalty term to the error function in (6.2.1) or (6.2.4), which should enforce the resulting motion estimate to bear the characteristics of common motion fields. One well-known property of a typical motion field is that it usually varies smoothly from pixel to pixel, except at object boundaries. To enforce the smoothness, one can use a penalty term that measures the differences between the MVs of adjacent pixels, i.e.,

$$E_s(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \sum_{\mathbf{y} \in N_{\mathbf{x}}} \left\| \mathbf{d}(\mathbf{x}; \mathbf{a}) - \mathbf{d}(\mathbf{y}; \mathbf{a}) \right\|^2, \tag{6.2.8}$$

where $\| \cdot \|$ represents the 2-norm and $N_{\mathbf{x}}$ represents the set of pixels adjacent to $\mathbf{x}$. Either the 4-connectivity or 8-connectivity neighborhood can be used.
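The penalty in Eq. (6.2.8) can be evaluated directly on a dense MV field. The sketch below assumes the 4-connectivity neighborhood and a field stored as `field[y][x] = (dx, dy)`; the function name `smoothness_penalty` is hypothetical.

```python
def smoothness_penalty(field):
    """E_s of Eq. (6.2.8) for a dense MV field with 4-connected neighbors."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < rows and 0 <= nx < cols:
                    ddx = field[y][x][0] - field[ny][nx][0]
                    ddy = field[y][x][1] - field[ny][nx][1]
                    total += ddx * ddx + ddy * ddy
    return total


constant = [[(1.0, 2.0)] * 3 for _ in range(2)]
step = [[(0.0, 0.0), (1.0, 0.0)]]
print(smoothness_penalty(constant), smoothness_penalty(step))
```

As in the double sum of Eq. (6.2.8), each pair of neighbors is visited twice (once from each side), so a unit jump between two adjacent pixels contributes 2 to the penalty.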

The overall minimization criterion can be written as

$$E = E_{\mathrm{DFD}} + w_s E_s. \tag{6.2.9}$$

The weighting coefficient $w_s$ should be chosen based on the importance of motion smoothness relative to the prediction error. To avoid over-blurring, one should reduce the weighting at object boundaries. This, however, requires accurate detection

Bayesian Criterion

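The Bayesian estimator is based on a probabilistic formulation of the motion estimation problem, pioneered by Konrad and Dubois [22, 38]. Under this formulation, given an anchor frame $\psi_1$, the image function at the tracked frame $\psi_2$ is considered a realization of a random field $\Psi$, and the motion field $\mathbf{d}$ is a realization of another random field $\mathcal{D}$. The a posteriori probability distribution of the motion field $\mathcal{D}$ given a realization of $\Psi$ and $\psi_1$ can be written, using the Bayes rule, as

$$P(\mathcal{D} = \mathbf{d} \mid \Psi = \psi_2; \psi_1) = \frac{P(\Psi = \psi_2 \mid \mathcal{D} = \mathbf{d}; \psi_1) \, P(\mathcal{D} = \mathbf{d}; \psi_1)}{P(\Psi = \psi_2; \psi_1)}. \tag{6.2.10}$$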

In the above notation, the semicolon indicates that subsequent variables are deterministic parameters. An estimator based on the Bayesian criterion attempts to maximize the a posteriori probability. But for given $\psi_1$ and $\psi_2$, maximizing the above probability is equivalent to maximizing the numerator only. Therefore, the maximum a posteriori (MAP) estimate of $\mathbf{d}$ is

$$\mathbf{d}_{\mathrm{MAP}} = \arg\max_{\mathbf{d}} \left\{ P(\Psi = \psi_2 \mid \mathcal{D} = \mathbf{d}; \psi_1) \, P(\mathcal{D} = \mathbf{d}; \psi_1) \right\}. \tag{6.2.11}$$

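The first probability denotes the likelihood of an image frame given the motion field and the anchor frame. Let $\mathcal{E}$ represent the random field corresponding to the DFD image $e(\mathbf{x}) = \psi_2(\mathbf{x} + \mathbf{d}) - \psi_1(\mathbf{x})$ for given $\mathbf{d}$ and $\psi_1$; then

$$P(\Psi = \psi_2 \mid \mathcal{D} = \mathbf{d}; \psi_1) = P(\mathcal{E} = e),$$

and the above equation becomes

$$\mathbf{d}_{\mathrm{MAP}} = \arg\max_{\mathbf{d}} \left\{ P(\mathcal{E} = e) \, P(\mathcal{D} = \mathbf{d}; \psi_1) \right\} = \arg\min_{\mathbf{d}} \left\{ -\log P(\mathcal{E} = e) - \log P(\mathcal{D} = \mathbf{d}; \psi_1) \right\}. \tag{6.2.12}$$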

From the source coding theory (Sec. 8.3.1), the minimum coding length for a realization $x$ of a source $X$ is $-\log P(X = x)$, whose expected value is the entropy of the source. We see that the MAP estimate is equivalent to minimizing the sum of the coding length for the DFD image $e$ and that for the motion field $\mathbf{d}$. As will be shown in Sec. 9.3.1, this is precisely what a video coder using motion-compensated prediction needs to code. Therefore, the MAP estimate for $\mathbf{d}$ is equivalent to a minimum description length (MDL) estimate [34]. Because the purpose of motion estimation in video coding is to minimize the bit rate, the MAP criterion is a better choice than minimizing the prediction error.

The most common model for the DFD image is a zero-mean independently identically distributed (i.i.d.) Gaussian field, with distribution

$$P(\mathcal{E} = e) = (2 \pi \sigma^2)^{-|\Lambda|/2} \exp\left( -\frac{\sum_{\mathbf{x} \in \Lambda} e^2(\mathbf{x})}{2 \sigma^2} \right), \tag{6.2.13}$$

where $|\Lambda|$ denotes the size of $\Lambda$ (i.e., the number of pixels in $\Lambda$). With this model, minimizing the first term in Eq. (6.2.12) is equivalent to minimizing the previously

For the motion field $\mathcal{D}$, a common model is a Gibbs/Markov random field [11]. Such a model is defined by a neighborhood structure called a clique. Let $C$ represent the set of cliques; the model assumes

$$P(\mathcal{D} = \mathbf{d}) = \frac{1}{Z} \exp\left( -\sum_{c \in C} V_c(\mathbf{d}) \right), \tag{6.2.14}$$

where $Z$ is a normalization factor. The function $V_c(\mathbf{d})$ is called the potential function, which is usually defined to measure the difference between pixels in the same clique:

$$V_c(\mathbf{d}) = \sum_{(\mathbf{x}, \mathbf{y}) \in c} \left| \mathbf{d}(\mathbf{x}) - \mathbf{d}(\mathbf{y}) \right|^2. \tag{6.2.15}$$

Under this model, minimizing the second term in Eq. (6.2.12) is equivalent to minimizing the smoothing function in Eq. (6.2.8). Therefore, the MAP estimate is equivalent to the DFD-based estimator with an appropriate smoothness constraint.
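The equivalence can be made explicit by substituting Eqs. (6.2.13) and (6.2.14) into Eq. (6.2.12). The derivation below is a sketch that assumes the motion prior does not depend on $\psi_1$; the exact constant relating the smoothness weight to $\sigma^2$ also depends on how neighbor pairs are counted in the cliques.

```latex
-\log P(\mathcal{E} = e)
  = \frac{|\Lambda|}{2} \log(2 \pi \sigma^2)
    + \frac{1}{2 \sigma^2} \sum_{\mathbf{x} \in \Lambda} e^2(\mathbf{x}),
\qquad
-\log P(\mathcal{D} = \mathbf{d})
  = \log Z + \sum_{c \in C} V_c(\mathbf{d}).

% The terms (|\Lambda|/2) \log(2 \pi \sigma^2) and \log Z do not depend on d,
% so after multiplying by 2 \sigma^2:
\mathbf{d}_{\mathrm{MAP}}
  = \arg\min_{\mathbf{d}} \Big\{ \sum_{\mathbf{x} \in \Lambda} e^2(\mathbf{x})
    + 2 \sigma^2 \sum_{c \in C} V_c(\mathbf{d}) \Big\}.
```

This has the form of Eq. (6.2.9) with $p = 2$, where the weight on the smoothness term grows with the assumed noise variance $\sigma^2$ of the DFD image.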

6.2.3 Minimization Methods

The error functions presented in Sec. 6.2.2 can be minimized using various optimization methods. Here we only consider exhaustive search and gradient-based search methods. Usually, for the exhaustive search, the MAD is used for reasons of computational simplicity, whereas for the gradient-based search, the MSE is used for its mathematical tractability.

Obviously, the advantage of the exhaustive search method is that it guarantees reaching the global minimum. However, such a search is feasible only if the number of unknown parameters is small, and each parameter takes only a finite set of discrete values. To reduce the search time, various fast algorithms can be developed, which achieve sub-optimal solutions.
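A minimal sketch of an exhaustive search is given below for the simplest case of a single integer translation of the whole frame, using the MAD as the error measure. The function name `exhaustive_search`, the nested-list frame layout `frame[y][x]`, and the normalization by the number of overlapping pixels are assumptions of this sketch.

```python
def exhaustive_search(psi1, psi2, search_range):
    """Return (best_d, best_mad) over integer MVs with |dx|, |dy| <= search_range."""
    rows, cols = len(psi1), len(psi1[0])
    best_d, best_mad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            total, count = 0.0, 0
            for y in range(rows):
                for x in range(cols):
                    xs, ys = x + dx, y + dy
                    if 0 <= xs < cols and 0 <= ys < rows:
                        total += abs(psi2[ys][xs] - psi1[y][x])
                        count += 1
            # Normalize by the overlap size so that candidates with fewer
            # overlapping pixels are not favored (an assumed design choice).
            if count > 0 and total / count < best_mad:
                best_d, best_mad = (dx, dy), total / count
    return best_d, best_mad
```

The cost grows with the square of `search_range` for two unknowns, which illustrates why the method is practical only for a small number of discrete-valued parameters.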

The most common gradient descent methods include the steepest gradient descent and the Newton-Raphson method. A brief review of these methods is provided in Appendix B. A gradient-based method can handle unknown parameters in a high dimensional continuous space. However, it can only guarantee the convergence to a local minimum. The error functions introduced in the previous section in general are not convex and can have many local minima that are far from the global minimum. Therefore, it is important to obtain a good initial solution through the use of a priori knowledge, or by adding a penalty term to make the error function convex.

With the gradient-based method, one must calculate the spatio-temporal gradients of the underlying signal. Appendix A reviews methods for computing first and second order gradients from digital sampled images. Note that the methods used for calculating the gradient functions can have profound impact on the accuracy and robustness of the associated motion estimation methods, as has been shown by Barron et al. [4]. Using a Gaussian pre-filter followed by a central difference

One important search strategy is to use a multi-resolution representation of the motion field and conduct the search in a hierarchical manner. The basic idea is to first search the motion parameters in a coarse resolution, propagate this solution into a finer resolution, and then refine the solution in the finer resolution. It can combat both the slowness of exhaustive search methods and the non-optimality of gradient-based methods. We will present the multi-resolution method in more detail in Sec. 6.9.

6.3 Pixel-Based Motion Estimation

In pixel-based motion estimation, one tries to estimate a motion vector for every pixel. Obviously, this problem is ill-defined. If one uses the constant intensity assumption, for every pixel in the anchor frame, there are many pixels in the tracked frame that have exactly the same image intensity. If one uses the optical flow equation instead, the problem is again indeterminate, because there is only one equation for two unknowns. To circumvent this problem, there are in general four approaches. First, one can use the regularization technique to enforce some smoothness constraints on the motion field, so that the motion vector of a new pixel is constrained by those found for surrounding pixels. Second, one can assume the motion vectors in a neighborhood surrounding each pixel are the same, and apply the constant intensity assumption or the optical flow equation over the entire neighborhood. The third approach is to make use of additional invariance constraints. In addition to intensity invariance, which leads to the optical flow equation, one can assume that the intensity gradient is invariant under motion, as proposed in [29, 26, 15]. Finally, one can also make use of the relation between the phase functions of the frame before and after motion [9]. In [4], Barron et al. evaluated various methods for optical flow computation by testing these algorithms on both synthetic and real-world imagery. In this section, we will describe the first two approaches only. We will also introduce the pel-recursive type of algorithms, which are developed for video compression applications.

6.3.1 Regularization Using Motion Smoothness Constraint

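Horn and Schunck [16] proposed to estimate the motion vectors by minimizing the following objective function, which is a combination of the flow-based criterion and a motion smoothness criterion:

$$E(\mathbf{v}(\mathbf{x})) = \sum_{\mathbf{x} \in \Lambda} \left[ \left( \frac{\partial \psi}{\partial x} v_x + \frac{\partial \psi}{\partial y} v_y + \frac{\partial \psi}{\partial t} \right)^2 + w_s \left( \| \nabla v_x \|^2 + \| \nabla v_y \|^2 \right) \right]. \tag{6.3.1}$$

In their original algorithm, the spatial gradients of $v_x$ and $v_y$ are approximated by

$$\nabla v_x = [v_x(x, y) - v_x(x - 1, y), \; v_x(x, y) - v_x(x, y - 1)]^T, \quad \nabla v_y = [v_y(x, y) - v_y(x - 1, y), \; v_y(x, y) - v_y(x, y - 1)]^T.$$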
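To show how the two terms of Eq. (6.3.1) combine, the sketch below evaluates the objective for a given velocity field using the backward differences quoted above. It assumes that the partial derivatives of $\psi$ have already been computed and stored as nested lists indexed `[y][x]`, and it skips the first row and column where the backward differences are undefined. The function name `horn_schunck_energy` and this boundary handling are assumptions; the sketch only evaluates the objective and does not implement the Horn and Schunck minimization itself.

```python
def horn_schunck_energy(psi_x, psi_y, psi_t, vx, vy, ws):
    """Evaluate Eq. (6.3.1) for velocity fields vx, vy and smoothness weight ws."""
    rows, cols = len(vx), len(vx[0])
    total = 0.0
    for y in range(1, rows):          # skip row 0: no pixel at y - 1
        for x in range(1, cols):      # skip column 0: no pixel at x - 1
            flow = psi_x[y][x] * vx[y][x] + psi_y[y][x] * vy[y][x] + psi_t[y][x]
            smooth = ((vx[y][x] - vx[y][x - 1]) ** 2
                      + (vx[y][x] - vx[y - 1][x]) ** 2
                      + (vy[y][x] - vy[y][x - 1]) ** 2
                      + (vy[y][x] - vy[y - 1][x]) ** 2)
            total += flow * flow + ws * smooth
    return total
```

A larger `ws` makes departures from a smooth field more expensive relative to violations of the optical flow equation.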

Nagel and Enkelmann conducted a comprehensive evaluation of the effect of smoothness constraints on motion estimation [30]. In order to avoid over-smoothing of the motion field, Nagel suggested an oriented-smoothness constraint in which smoothness is imposed along the object boundaries, but not across the boundaries [29]. This has resulted in significant improvement in motion estimation accuracy [4].

6.3.2 Using a Multipoint Neighborhood

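In this approach, when estimating the motion vector at a pixel $\mathbf{x}_n$, we assume that the motion vectors of all the pixels in a neighborhood $B(\mathbf{x}_n)$ surrounding it are the same, being $\mathbf{d}_n$. To determine $\mathbf{d}_n$, one can either minimize the prediction error over $B(\mathbf{x}_n)$, or solve the optical flow equation using a least squares method. Here we present the first approach. To estimate the motion vector $\mathbf{d}_n$ for $\mathbf{x}_n$, we minimize the DFD error over $B(\mathbf{x}_n)$:

$$E_n(\mathbf{d}_n) = \frac{1}{2} \sum_{\mathbf{x} \in B(\mathbf{x}_n)} w(\mathbf{x}) \left( \psi_2(\mathbf{x} + \mathbf{d}_n) - \psi_1(\mathbf{x}) \right)^2, \tag{6.3.2}$$

where $w(\mathbf{x})$ are the weights assigned to pixel $\mathbf{x}$. Usually, the weight decreases as the distance from $\mathbf{x}$ to $\mathbf{x}_n$ increases.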

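The gradient with respect to $\mathbf{d}_n$ is

$$\mathbf{g}_n = \frac{\partial E_n}{\partial \mathbf{d}_n} = \sum_{\mathbf{x} \in B(\mathbf{x}_n)} w(\mathbf{x}) \, e(\mathbf{x}; \mathbf{d}_n) \left. \frac{\partial \psi_2}{\partial \mathbf{x}} \right|_{\mathbf{x} + \mathbf{d}_n}, \tag{6.3.3}$$

where $e(\mathbf{x}; \mathbf{d}_n) = \psi_2(\mathbf{x} + \mathbf{d}_n) - \psi_1(\mathbf{x})$ is the DFD at $\mathbf{x}$ with the estimate $\mathbf{d}_n$. Let $\mathbf{d}_n^{l}$ represent the estimate at the $l$-th iteration. The first order gradient descent method would yield the following update algorithm:

$$\mathbf{d}_n^{l+1} = \mathbf{d}_n^{l} - \alpha \, \mathbf{g}_n(\mathbf{d}_n^{l}), \tag{6.3.4}$$

where $\alpha$ is the step size. From Eq. (6.3.3), the update at each iteration depends on the sum of the image gradients at various pixels scaled by the weighted DFD values at those pixels.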
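The first order update can be sketched in a few lines. To avoid interpolating sampled images at non-integer positions, this sketch assumes the two frames and the gradient of the tracked frame are available as continuous functions (Python callables); the function name `gradient_descent_mv`, the step size `alpha`, and the fixed iteration count are all assumptions made for illustration.

```python
def gradient_descent_mv(psi1, psi2, grad_psi2, neighborhood, weights,
                        d0=(0.0, 0.0), alpha=0.1, iters=100):
    """Iterate d <- d - alpha * g_n(d), with g_n as in Eq. (6.3.3)."""
    dx, dy = d0
    for _ in range(iters):
        gx = gy = 0.0
        for (x, y), w in zip(neighborhood, weights):
            e = psi2(x + dx, y + dy) - psi1(x, y)   # DFD at this pixel
            px, py = grad_psi2(x + dx, y + dy)      # gradient of psi2 at x + d
            gx += w * e * px
            gy += w * e * py
        dx -= alpha * gx
        dy -= alpha * gy
    return dx, dy


# Synthetic example: psi2(x + t) = psi1(x) with true MV t = (0.3, -0.2).
tx, ty = 0.3, -0.2
psi1 = lambda x, y: x * y
psi2 = lambda x, y: (x - tx) * (y - ty)
grad_psi2 = lambda x, y: (y - ty, x - tx)
block = [(x, y) for y in (-1, 0, 1) for x in (-1, 0, 1)]
print(gradient_descent_mv(psi1, psi2, grad_psi2, block, [1.0] * 9))
```

On sampled frames, `psi2` and `grad_psi2` would have to be replaced by an interpolator and a numerical gradient, and the step size would need to be tuned to the image contrast.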

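One can also derive an iterative algorithm using the Newton-Raphson method. From Eq. (6.3.3), the Hessian matrix is

$$\mathbf{H}_n = \frac{\partial^2 E_n}{\partial \mathbf{d}_n^2} = \sum_{\mathbf{x} \in B(\mathbf{x}_n)} \left[ w(\mathbf{x}) \left. \frac{\partial \psi_2}{\partial \mathbf{x}} \left( \frac{\partial \psi_2}{\partial \mathbf{x}} \right)^T \right|_{\mathbf{x} + \mathbf{d}_n} + w(\mathbf{x}) \, e(\mathbf{x}; \mathbf{d}_n) \left. \frac{\partial^2 \psi_2}{\partial \mathbf{x}^2} \right|_{\mathbf{x} + \mathbf{d}_n} \right] \approx \sum_{\mathbf{x} \in B(\mathbf{x}_n)} w(\mathbf{x}) \left. \frac{\partial \psi_2}{\partial \mathbf{x}} \left( \frac{\partial \psi_2}{\partial \mathbf{x}} \right)^T \right|_{\mathbf{x} + \mathbf{d}_n}.$$

The Newton-Raphson update algorithm is then (see Appendix B):

$$\mathbf{d}_n^{l+1} = \mathbf{d}_n^{l} - \mathbf{H}_n(\mathbf{d}_n^{l})^{-1} \, \mathbf{g}_n(\mathbf{d}_n^{l}). \tag{6.3.5}$$

This algorithm converges faster than the first order gradient descent method, but it requires more computation in each iteration.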
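A corresponding sketch of the Newton-Raphson update uses the approximate Hessian, i.e., the weighted sum of outer products of the image gradient, and solves the resulting 2x2 system explicitly. As an assumption of this sketch, the frames and the gradient of the tracked frame are passed as continuous functions (callables), and the name `newton_raphson_mv` and the iteration count are hypothetical.

```python
def newton_raphson_mv(psi1, psi2, grad_psi2, neighborhood, weights,
                      d0=(0.0, 0.0), iters=20):
    """Iterate d <- d - H^{-1} g with the approximate Hessian of Sec. 6.3.2."""
    dx, dy = d0
    for _ in range(iters):
        gx = gy = h11 = h12 = h22 = 0.0
        for (x, y), w in zip(neighborhood, weights):
            e = psi2(x + dx, y + dy) - psi1(x, y)
            px, py = grad_psi2(x + dx, y + dy)
            gx += w * e * px
            gy += w * e * py
            h11 += w * px * px
            h12 += w * px * py
            h22 += w * py * py
        det = h11 * h22 - h12 * h12
        if abs(det) < 1e-12:
            break  # singular Hessian: stop rather than take an unstable step
        dx -= (h22 * gx - h12 * gy) / det
        dy -= (h11 * gy - h12 * gx) / det
    return dx, dy
```

Each iteration accumulates three extra sums and solves a small linear system, which is the additional per-iteration cost mentioned above; in exchange, far fewer iterations are typically needed near the solution.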

Instead of using gradient-based update algorithms, one can also use exhaustive search to find the $\mathbf{d}_n$ that yields the minimal error within a defined search range. This will lead to the exhaustive block matching algorithm (EBMA) to be presented in Sec. 6.4.1. The difference from the EBMA is that the neighborhood used here is a sliding window and a MV is determined for each pixel by minimizing the error in its neighborhood. The neighborhood in general does not have to be a rectangular block.
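The sliding-window variant can be sketched as follows: for every pixel, all integer candidates in the search range are tried, and the error is accumulated over a square window centered on that pixel. The square window, uniform weights, the MAD error measure, and the name `pixelwise_search` are assumptions of this sketch; frames are nested lists indexed `frame[y][x]`.

```python
def pixelwise_search(psi1, psi2, half_window, search_range):
    """Return a dense field of integer MVs, one per pixel of psi1."""
    rows, cols = len(psi1), len(psi1[0])
    field = [[(0, 0)] * cols for _ in range(rows)]
    for yn in range(rows):
        for xn in range(cols):
            best, best_err = (0, 0), float("inf")
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    err, count = 0.0, 0
                    for y in range(yn - half_window, yn + half_window + 1):
                        for x in range(xn - half_window, xn + half_window + 1):
                            ys, xs = y + dy, x + dx
                            if (0 <= y < rows and 0 <= x < cols
                                    and 0 <= ys < rows and 0 <= xs < cols):
                                err += abs(psi2[ys][xs] - psi1[y][x])
                                count += 1
                    if count > 0 and err / count < best_err:
                        best, best_err = (dx, dy), err / count
            field[yn][xn] = best
    return field
```

Compared with block matching, where one search serves a whole block, this repeats the search at every pixel, so the cost is higher by roughly the number of pixels per block.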

6.3.3 Pel-Recursive Methods

In a video coder using motion compensated prediction, one needs to specify both the MVs and the DFD image. With a pixel-based motion representation, one would need to specify a MV for each pixel, which is very costly. In pel-recursive motion
