Given the identiο¬ed dependencies, ideally the goal is to compute the full distribution of π(π,π’β£β) since the primary interest is to estimate the displacement ο¬eld and occu- pancy probabilities at time π‘ given image observations. Unfortunately this inference is intractable given the huge state spaces of π and π’. However the problem is well suited for an EM formulation [Dempster et al. (1977)] with some simpliο¬cations. By
focusing on estimating the Maximum A Posteriori (MAP) of π(πβ£β) and treating π’ as a latent, unobserved variable set, an EM procedure can be constructed that iteratively reο¬nes estimatesπ0, π1β β β , πβ of the motion ο¬eld π(M-step), while providing with each new iteration π+ 1 an estimate of π(π’β£β, ππ) (E-step). The latter term corresponds to
estimating voxel occupancy probabilities given the image data at time π‘ and using the previous iterationβs computed motion ο¬eld, thus providing a probabilistic shape esti- mate representation analog to previous probabilistic methods [Broadhurst et al. (2001); Franco and Boyer (2005)]. A MAP-EM formulation is chosen also because the tradi- tional EM formulation does not consider the prior π(π) on motion ο¬elds, which is used to enforce spatial continuity.
5.4.1
Expectation Maximization Formulation
Here is a MAP-EM formulation, whose goal is to ο¬nd the optimal πβ such that:
πβ = argmaxπ πΉ(π) with πΉ(π) =π(πβ£β). (5.4)
This goal is achieved iteratively starting from an initial guess π0, by building a se-
quence of motion ο¬eld estimatesπ0, π1,β β β , πβ which increase the log-posterior objective functionπΉ(π), i.e. πΉ(π0)β€ β β β β€πΉ(πβ). This is usually achieved at each step by max- imizing a lower bound of πΉ(π), whose maximum coincides with an analytically simpler function π(πβ£ππ). For a good choice of lower bound and π(πβ£ππ), the maximization guarantees an increase of the log-posterior, such that ππ+1 can be deο¬ned as follows:
The M-Step: ππ+1 = argmaxπ π(πβ£ππ). (5.5)
our problem using the following conditional expectation:
π(πβ£ππ) =πΈπ’β£β,ππ{lnπ(β,π’,π)}
=β
π’
π(π’β£β, ππ) lnπ(β,π’,π). (5.6)
The E-step evaluates π(π’β£β, ππ) of Eq. 5.6, i.e. the grid occupancy probabilities
given images βand the previously predicted displacement ππ. Given this distribution,
the log-expectation of π(β,π’,π) can be computed for all possible πas in Eq. 5.6. The E-step of the algorithm then consists in evaluating the π(π’β£β, ππ) term of the expression, i.e. the grid occupancy probabilities given images and the previously pre- dicted displacement. Both probability distributions can be obtained from Eq. 5.3. Each step is developed subsequently in further detail.
5.4.2
E-step: Occupancy Probability Update
In order to compute voxel probabilities at a given EM iteration, one needs to express
π(π’β£β, ππ) in terms of the joint probability distribution (5.3). Bayesβ rule is used (5.7) and
the summations is re-factored (5.8) to depending terms, withβdenoting proportionality up to a unit normalization factor:
π(π’β£β, ππ)ββ Β― π’ π( Β―π’π’ππβ) (5.7) ββ π Ξ¦(π’π)β β Β― π’ πβππ π π( Β―π’πβππ π)π(π’πβ£ Β― π’πβππ π), (5.8) whereβ Β― π’ πβππ π π( Β―π’πβππ π)π(π’πβ£ Β― π’πβππ
π) sums possibilities over occupancy states of the voxel π β ππ
π(π’πβ£π’Β―πβππ
π) is set deterministically: if the previous voxel πβπ
π
π was occupied (resp.
empty), then once displaced to πit is still occupied (resp. empty) with probability 1. Expression (5.8) then becomes:
π(π’β£β, ππ)ββ π ( Ξ¦(π’π)β π([ Β―π’πβππ π =π’π]) ) . (5.9)
For the purpose of providing a probabilistic shape estimate, each voxel πβs proba- bility after an E-step can thus be identiο¬ed as π(π’πβ£β, ππ)βΞ¦(π’π)β π([ Β―π’πβππ
π =π’π]), the product of current observation terms at time π‘, with the probability of the voxel mapped to πfrom π‘β1.
Ξ¦(π’π) can be computed by expliciting the image formation terms π(βπβ£π’π). For
every pixel π₯ in every image, it is assumed that the parameters β¬ of a background model have been learned oο¬ine from images of a quasi-static scene with no object of interest. Similar to Section 2.4.2, the image term π(βπβ£π’π) can then be set as following
[Franco and Boyer (2005)]:
π(βπβ£π’π) =π(π’π= 0)π(βπβ£β¬) +π(π’π= 1)π°(βπ),
where π°(βπ) is the uniform distribution over pixel color space, used to model the
aspect of objects of interest since no information about it is used, and π(βπβ£β¬) is the
probability ofβπto be drawn from the background modelβ¬. π(βπβ£β¬) can be, for instance,
a Normal or Gaussian Mixture Model distribution. Here the voxel πβs occupancy π’π
serves as a mixture label to drawβπ from foreground or background aspect distributions.
The silhouette information is latent in this representation and does not require any binary segmentation decision.
5.4.3
M-step: 3D Motion Field Update
To optimize Eq. 5.5, one still needs to expand the expression ofπ(πβ£ππ) in Eq. 5.6. The
distributionπ(β,π’,π) can be computed by marginalizing Eq. 5.3 over Β―π’,and simpliο¬ed
similarly to Eq. 5.8: π(β,π’,π)βπ(π)β π ( Ξ¦(π’π)β π([ Β―π’πβππ =π’π]) ) (5.10)
Taking the logarithm of Eq. 5.10 to computeπ(πβ£ππ), and noting that theΞ¦(π’π)term
does not depend on π, the M-step becomes:
ππ+1 = argmaxπ ln(π(π)) +β π β π’π π(π’πβ£β, ππ)β lnπ([ Β―π’πβππ =π’π]) (5.11)
where π(π’πβ£β, ππ) is computed in the E-step (Eq. 5.9).
To model 3D motion ο¬eld continuity, π(π) is chosen to be a Markov Random Field. The M-step in the EM framework becomes a standard ο¬rst-order MRF MAP problem. From time π‘ to π‘ + 1, if the displacement possibilities at every point is quantized to
π displacement options denoted as a label set β = {π1,β β β , ππ}, then Eq. 5.11 can be represented as standard graph optimization pairwise and unary energy terms:
πΈππ πΉ = β π β πβπ©(π) πΈππ(ππ, ππ) + β π πΈπ(ππ), (5.12)
where π©(π) is the neighborhood system of point πin the 3D graph. In Eq. 5.12, β
π
β
πβπ©(π)πΈππ(ππ, ππ) are the binary terms, and
β
They correspond to the negative of ln(π(π)) and β π β π’ππ(π’πβ£β, π π)β lnπ([ Β―π’ πβππ = π’π]) in Eq. 5.11 respectively.
The M-step in this EM framework thus becomes a discrete multi-labeling problem, with the goβ
π
β
πβπ©(π)πΈππ(ππ, ππ) are the binary terms, and
β
ππΈπ(ππ) are the
unary terms. They correspond to the negative of ln(π(π)) and β
π
β
π’π π(π’πβ£β, π
π)β
lnπ([ Β―π’πβππ =π’π]) in Eq. 5.11 respectively.
The M-step in our EM framework becomes a discrete multi-labeling problem, with the goal of computing a labeling πΏβ ββ£π³ β£, which assigns each grid node π β π³ a label fromβsuch that the energy πΈππ πΉ is minimized. Thus
ππ+1 = argminπΏπΈππ πΉ. (5.13)
The solution to this MRF thus provides the updated displacement ο¬eld in the EM iteration. Further details on MRF implementation is given in the sections below.