GREEDY ALGORITHM
4.1 MAP ESTIMATE
By formulating scene reconstruction as a Maximum A Posteriori (MAP) estimation problem, and using a voxel based scene representation, the reconstruction problem can be expressed as an optimisation problem where the objective is to assign an opacity and radiance to each voxel, so that the joint posterior probability distribution over these parameters is maximised. This can alternatively be expressed as a minimisation problem, where the objective is to minimise a weighted combination of the projected error and negative log prior probability of the scene.
Using $S = \{S_1, S_2, \ldots, S_M\}$ to represent the set of scene parameters and $C = \{C_1, C_2, \ldots, C_N\}$ to represent the set of camera pixel intensities, the MAP reconstruction problem can be expressed as: given $C = c$, find the most likely estimate of $S$. This can be written using Bayes' rule as
\[ S_{\mathrm{MAP}}(c) = \arg\max_{s} \left[ \frac{\rho_{C|S}(c|s)\,\rho_S(s)}{\rho_C(c)} \right]. \tag{4.1} \]
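As a minimal sketch of Eq. 4.1, the following toy example evaluates the MAP estimate for a hypothetical scene with only two possible states. The state names, the probability values, and the function names `map_estimate` and `posterior` are assumptions made for illustration and do not come from the text. It also checks that normalising by the evidence $\rho_C(c)$ leaves the maximising state unchanged.

```python
def map_estimate(likelihood, prior):
    """Return the state s that maximises likelihood[s] * prior[s].

    This is Eq. 4.1 with the evidence term rho_C(c) omitted.
    """
    return max(prior, key=lambda s: likelihood[s] * prior[s])


def posterior(likelihood, prior):
    """Return the full posterior, normalised by the evidence rho_C(c)."""
    evidence = sum(likelihood[s] * prior[s] for s in prior)
    return {s: likelihood[s] * prior[s] / evidence for s in prior}


# Hypothetical two-state scene: a single voxel that is opaque or transparent.
prior = {"opaque": 0.3, "transparent": 0.7}        # rho_S(s)
likelihood = {"opaque": 0.8, "transparent": 0.2}   # rho_{C|S}(c|s) for the observed c

post = posterior(likelihood, prior)
s_map = map_estimate(likelihood, prior)
s_map_normalised = max(post, key=post.get)
```

With these illustrative numbers the unnormalised products are 0.24 and 0.14, so both `s_map` and `s_map_normalised` select the opaque state.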
The denominator term, $\rho_C(c)$, represents the prior probability of obtaining the observed camera data. Since this term is independent of $s$, it can be removed from the expression without affecting the optimisation. The first numerator term, $\rho_{C|S}(c|s)$, represents the likelihood of observing the data $c$ given the estimate $s$. Using $\check{c}$ to represent the pixel intensities that would be recorded in the absence of any noise, aberrations, or modelling errors, this can be equivalently written as
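\[ \rho_{C|S}(c|s) = \int_{\check{c}} \rho_{C|\check{C},S}(c|\check{c},s)\,\rho_{\check{C}|S}(\check{c}|s), \tag{4.2} \]
where $\rho_{C|\check{C},S}(c|\check{c},s)$ is the probability distribution of obtaining $c$, given $\check{c}$ and $s$, and $\rho_{\check{C}|S}(\check{c}|s)$ is the probability distribution of $\check{c}$, given $s$. Assuming independent noise at each of the sensors, this can be simplified to give
\[ \rho_{C|S}(c|s) = \int_{\check{c}} \rho_{\check{C}|S}(\check{c}|s) \prod_{k=1}^{N} \rho_{C_k|\check{C}_k}(c_k|\check{c}_k). \tag{4.3} \]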
In most instances, the ideal intensity at each pixel will be uniquely determined by the scene parameters, and can be expressed as $\check{c}_k = \mathrm{proj}_k(s)$, where $\mathrm{proj}_k(s)$ is the projection of $s$ onto the $k$th pixel. For infinite scenes, or any semi-infinite scenes that include all points that are within the field of view of the cameras, this is guaranteed. It is also true for finite scenes, provided that there are no radiating or opaque regions outside the scene that are visible in any of the cameras. In situations where points outside the defined scene volume are visible, the probability distribution $\rho_{\check{C}|S}(\check{c}|s)$ can be simplified by assuming that the ideal pixel intensities depend on the radiance of regions inside or outside the scene but not both. This condition will be valid provided
that radiances outside the modelled scene volume are independent of those within the scene, there are no opaque or radiating surfaces between the defined scene volume and any of the cameras, and that the scene is either completely transparent or opaque along any pixel beam.
Assuming binary transmittances through the scene along each pixel beam, and using $\xi_k(s)$ to represent a boolean function that is equal to one if the transmittance along the $k$th pixel beam is zero, and zero otherwise, the conditional probability distribution $\rho_{\check{C}|S}(\check{c}|s)$ can be expanded to give
\[ \begin{aligned} \rho_{\check{C}|S}(\check{c}|s) &= \rho_{\xi_1(s)}(\check{c}_{\xi_1(s)}|s)\,\rho_{\xi_0(s)}(\check{c}_{\xi_0(s)}|s) \\ &= \prod_{k \in \check{C}_{\xi_1(s)}} \delta(\check{c}_k - \mathrm{proj}_k(s)) \times \rho_{\xi_0(s)}(\check{c}_{\xi_0(s)}|s), \end{aligned} \tag{4.4} \]
where $\check{C}_{\xi_1(s)}$ is the set of pixels for which $\xi_k(s) = 1$ and $\check{C}_{\xi_0(s)}$ is the remaining set of pixels, corresponding with $\xi_k(s) = 0$. For pixel rays outside the scene, $\xi_k(s) = 0$. The term $\rho_{\xi_0(s)}(\check{c}_{\xi_0(s)}|s)$ represents the joint probability distribution of obtaining the ideal pixel intensities in the set $\check{C}_{\xi_0(s)}$. This function is governed by the prior probability of the background radiances. In situations where the background radiance is known, this term will be a delta function. If the background radiances are unknown, a uniform distribution over the range of pixel intensities is usually assumed, allowing the term to be approximated by $1/\kappa^n$, where $n$ is the number of pixels in $\check{C}_{\xi_0(s)}$, and $\kappa$ is the dynamic range of the cameras.
This function can be further simplified by assuming the average transmittance through the scene along any pixel beam is zero. Such a scene is referred to as complete, as it completely defines all ideal pixel intensities [Seitz and Dyer 1999]. For scenes with binary regional opacities, this condition is ensured if there is at least one opaque region extending across every pixel ray. With infinite or semi-infinite scenes, an equivalent scene can always be found that is complete with respect to the set of camera images. This is achieved by replacing any transparent region extending to infinity along incomplete pixel rays with an opaque region. So long as the transmitted radiance of the two regions is the same, both will appear identical from all camera positions.
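The completeness condition for binary opacities can be sketched as follows. This is a simplified illustration only: it assumes each pixel ray is stored as its own list of binary opacities ordered from near to far, whereas in a real voxel grid rays share voxels. The function names `is_complete` and `make_complete` are hypothetical. The sketch handles opacity only; as the text notes, the replacement opaque region must also carry the same transmitted radiance as the transparent region it replaces.

```python
def is_complete(opacity_along_rays):
    """True if every pixel ray meets at least one opaque voxel."""
    return all(any(ray) for ray in opacity_along_rays)


def make_complete(opacity_along_rays):
    """Return a copy in which the farthest voxel on each clear ray is opaque."""
    completed = []
    for ray in opacity_along_rays:
        ray = list(ray)          # copy, so the input is left untouched
        if not any(ray):
            ray[-1] = 1          # stand-in for the opaque region at the far end
        completed.append(ray)
    return completed


# Two hypothetical pixel rays: the second one never hits an opaque voxel.
rays = [[0, 1, 0], [0, 0, 0]]
completed_rays = make_complete(rays)
```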
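By ensuring the scene estimate is complete, the conditional probability distribution $\rho_{\check{C}|S}(\check{c}|s)$ can be simplified to give
\[ \rho_{\check{C}|S}(\check{c}|s) = \prod_{k=1}^{N} \delta(\check{c}_k - \mathrm{proj}_k(s)). \tag{4.5} \]
Substituting this back into Eq. 4.3 gives
\[ \rho_{C|S}(c|s) = \prod_{k=1}^{N} \rho_{C_k|\check{C}_k}(c_k|\mathrm{proj}_k(s)). \tag{4.6} \]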
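A minimal sketch of Eq. 4.6 is given below. The per-pixel noise density is not specified at this point in the text, so a Gaussian with standard deviation `sigma` is assumed purely for illustration. The function names and the intensity values are hypothetical. The sketch shows that the product over pixels becomes a sum once logarithms are taken.

```python
import math


def gaussian_pdf(c, mean, sigma):
    """Normal density, used here as an assumed per-pixel noise model."""
    return math.exp(-(c - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def likelihood(observed, projected, sigma):
    """Eq. 4.6: product over pixels of rho(c_k | proj_k(s))."""
    result = 1.0
    for c, p in zip(observed, projected):
        result *= gaussian_pdf(c, p, sigma)
    return result


def log_likelihood(observed, projected, sigma):
    """Logarithm of Eq. 4.6: the product becomes a sum over pixels."""
    return sum(math.log(gaussian_pdf(c, p, sigma)) for c, p in zip(observed, projected))


observed = [10.0, 20.0, 30.0]       # recorded pixel intensities c_k
close_scene = [11.0, 19.0, 30.0]    # proj_k(s) for a well-matched scene estimate
poor_scene = [20.0, 10.0, 40.0]     # proj_k(s) for a poorly matched scene estimate

ll_close = log_likelihood(observed, close_scene, 2.0)
ll_poor = log_likelihood(observed, poor_scene, 2.0)
```

The scene estimate whose projections lie closer to the observations receives the higher log-likelihood.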
Assuming binary regional transmittances, the projection $\mathrm{proj}_k(s)$ of $s$ onto the $k$th pixel is given from Theorem 4 in Chapter 2 as
\[ \mathrm{proj}_k(s) = R_{\nu i}(x_k, y_k, Z_i^*(x_k, y_k, T_\nu)), \tag{4.7} \]
where $i$ is the index of the image containing pixel $k$, $Z_i^*(x_k, y_k, T_\nu)$ is the depth of the nearest opaque region along the $k$th pixel ray, and $x_k$ and $y_k$ are the $x$ and $y$ coordinates of the $k$th pixel.
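The idea behind Eq. 4.7 can be illustrated with a heavily simplified sketch. It assumes a single ray per pixel, voxels listed from near to far, view-independent radiances, and no filtering or interpolation, none of which is implied by the theorem itself. The function names `nearest_opaque_index` and `project_ray` are hypothetical.

```python
def nearest_opaque_index(opacities):
    """Index of the first opaque voxel along a ray, or None if the ray is clear."""
    for depth, alpha in enumerate(opacities):
        if alpha == 1:
            return depth
    return None


def project_ray(opacities, radiances, background=None):
    """Ideal pixel intensity: the radiance at the nearest opaque voxel.

    Falls back to `background` when no opaque voxel lies on the ray.
    """
    index = nearest_opaque_index(opacities)
    return background if index is None else radiances[index]


# Hypothetical ray with four voxels; the third is the first opaque one.
intensity = project_ray([0, 0, 1, 1], [5.0, 6.0, 7.0, 8.0])
clear_ray = project_ray([0, 0], [1.0, 2.0])
```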
Using a voxel-based scene model where each voxel is represented by its radiance, $r_j(\theta)$, and binary opacity, $\alpha_j$, the regional transmittance, $T_\nu$, and transmitted radiances, $R_{\nu i}$, for each camera are found by filtering and interpolating between the opacity and transmitted radiances of surrounding scene voxels. This complicates the inverse mapping, as the radiance of numerous voxels will affect the observed intensity of each pixel. To simplify the optimisation, the joint conditional probability function, $\rho_{C|S}(c|s)$, can alternatively be re-expressed so that the filtering and interpolation are performed in the image domain, rather than in scene space.
By modelling the observed pixel intensities in each image as sample points of a continuous intensity distribution, the joint conditional probability function, $\rho_{C|S}(c|s)$, can be closely approximated as a product of local conditional probability distributions over the set of perturbed pixel intensities, $\acute{C} = \{\acute{C}_1, \acute{C}_2, \ldots, \acute{C}_N\}$, where $\acute{C}_k$ lies within half a pixel width of $C_k$. The set of pixel rays corresponding with possible perturbed positions of a given pixel $C_k$ defines a rectangular cone in space that will be referred to as the extended pixel ray of pixel $C_k$. If the position of $\acute{C}_k$, denoted $\acute{x}_k$, is chosen to correspond with the projected image position of the nearest opaque voxel along the $k$th extended pixel ray, and the discrete depths, defined by $\acute{Z}^*_k(\acute{x}_k, \acute{y}_k, T_\nu)$, coincide with the voxel depths, then the imaging sample points will coincide with the voxel positions. This avoids the need for interpolating between scene voxels. Also, as discussed at the end of Section 2.3.1, provided that the voxel kernel is similar to the imaging kernel $W_i$, filtering of the samples can be ignored without too many adverse effects. This allows the conditional probability distribution, $\rho_{C|S}(c|s)$, to be expressed as
\[ \rho_{C|S}(c|s) = \prod_{k=1}^{N} \rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k)), \tag{4.8} \]
where $s_k$ are the states of the voxels $S_k$ located along the $k$th extended pixel ray, $\zeta_k(s_k)$ is the index of the nearest opaque voxel along the $k$th extended pixel ray, and $r_{\zeta_k(s_k)}(\theta_k)$ is the radiance of that voxel in the direction of the $k$th sensor element. Since the perturbed pixel positions depend on which opaque voxel in $S_k$ is nearest, the perturbed pixel intensities $\acute{c}_k(s_k)$ are also a function of $s_k$.
The term $\rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k))$ in Eq. 4.8 represents the probability distribution of the perturbed pixel intensity $\acute{c}_k(s_k)$, given the voxel radiance $r_{\zeta_k(s_k)}(\theta_k)$. This probability distribution is a function of the image noise and modelling errors. It can be modelled as a linear combination of two underlying distributions caused by different processes. Jaynes [2003], chapter 21, calls this a "two-model model", which is a mixture of a model that accounts for the regular observations and a second model that explains outliers.

[Figure 4.1: two panels, each plotting a uniform distribution, a Gaussian distribution, and the resulting mixed distribution. Panel (a) is labelled "Summation" and panel (b) is labelled "Maximum".]
Figure 4.1 (a) The probability distribution of the difference between the observed image intensity, $\acute{c}_k(s_k)$, and the ideal image intensity, $\check{\acute{c}}_k$, can be expressed using a mixed probability model that is the weighted summation of two independent distributions. The first, a Gaussian distribution, explains the majority of the variations due to sensor noise and small modelling errors. The second, a uniform distribution, accounts for outlying points caused by occasional large modelling errors. (b) If the probability of outliers is reasonably small, the mixed model can be closely approximated as a maximisation over the two component distributions rather than a summation. Theoretically, this must be scaled appropriately to ensure the integral over the resulting distribution equals one. However, in practice the scale factor will be close to unity and will not affect the MAP estimate, so it can be ignored.
Using the mixed model approach, the probability distribution is given by
\[ \rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k)) = (1-\nu)\,\rho_{\mathrm{image}}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k)) + \nu\,\rho_{\mathrm{model}}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k)), \tag{4.9} \]
where $\rho_{\mathrm{image}}$ is the probability density function (PDF) of the image noise plus any small modelling errors, $\rho_{\mathrm{model}}$ is the PDF of any outliers caused by occasional modelling errors, and $\nu$ is the probability of outlier observations occurring. As discussed in Section 2.2.4, the distribution of image noise can usually be closely approximated by a robust Gaussian function. Modelling errors, on the other hand, may cause significant variations in the observed pixel intensities from the ideal predicted intensity. This can be approximated using a Gaussian with a large variance, or a uniform distribution across the range of recordable pixel intensities.
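The mixed model of Eq. 4.9 and the maximum-based approximation illustrated in Fig. 4.1(b) can be compared numerically with a short sketch. It assumes Gaussian image noise and a uniform outlier density of $1/\kappa$ over the intensity range $[0, \kappa]$. The parameter values and the function names `mixed_pdf` and `max_pdf` are illustrative assumptions, not values from the text.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def mixed_pdf(c, r, sigma, nu, kappa):
    """Eq. 4.9, assuming rho_model is uniform with density 1/kappa."""
    return (1 - nu) * gaussian_pdf(c, r, sigma) + nu / kappa


def max_pdf(c, r, sigma, nu, kappa):
    """Maximum of the Gaussian and the weighted uniform term (left unnormalised)."""
    return max(gaussian_pdf(c, r, sigma), nu / kappa)


# Illustrative values only.
SIGMA, NU, KAPPA, R = 2.0, 0.05, 256.0, 100.0

# Midpoint-rule integral of the mixed PDF over [0, KAPPA].
N_STEPS = 25600
STEP = KAPPA / N_STEPS
total = sum(mixed_pdf((i + 0.5) * STEP, R, SIGMA, NU, KAPPA) for i in range(N_STEPS)) * STEP

# Ratio of the maximum-based model to the summed model at a few intensities.
samples = (100.0, 104.0, 107.0, 108.0, 150.0, 250.0)
ratios = [max_pdf(c, R, SIGMA, NU, KAPPA) / mixed_pdf(c, R, SIGMA, NU, KAPPA) for c in samples]
```

For these values the mixed PDF integrates to approximately one, and the two models agree closely at the peak and in the tails. The largest discrepancy occurs near the crossover between the two components, where the ratio stays between $1/(2-\nu)$ and $1/(1-\nu)$.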
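As shown in Fig. 4.1, the resulting mixed distribution can be closely approximated as a weighted maximum of the two component distributions, rather than a summation. Assuming Gaussian image noise with variance $\sigma^2$, and using $\lambda_p = \nu\rho_{\mathrm{model}}$ to represent the uniform PDF of the modelling errors, the individual probability terms are given by
\[ \rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k)\,|\,r_{\zeta_k(s_k)}(\theta_k)) = \max\left( \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2} \right),\ \lambda_p \right). \tag{4.10} \]
Substituting Eq. 4.10 into Eq. 4.8, and the result into Eq. 4.1 with the denominator removed, the MAP scene estimate can finally be expressed as
\[ S_{\mathrm{MAP}}(c) = \arg\max_{s} \left[ \prod_{k=1}^{N} \max\left( \exp\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2} \right),\ \lambda_p \sigma\sqrt{2\pi} \right) \times \rho_S(s) \right], \tag{4.11} \]
where the constant factor $1/(\sigma\sqrt{2\pi})$ has been removed from each term of the product, since it does not affect the MAP estimate.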
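By taking the logarithm of the objective, and then negating it so that the maximisation becomes a minimisation, this can alternatively be described as a summation, giving
\[ \begin{aligned} S_{\mathrm{MAP}}(c) &= \arg\max_{s} \left[ \sum_{k=1}^{N} \max\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2},\ \log(\lambda_p \sigma\sqrt{2\pi}) \right) + \log(\rho_S(s)) \right] \\ &= \arg\min_{s} \left[ \frac{1}{2\sigma^2} \sum_{k=1}^{N} \min\left( (\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2,\ \lambda_e \right) - \log(\rho_S(s)) \right], \end{aligned} \tag{4.12} \]
where
\[ \lambda_e = -2\sigma^2 \log(\lambda_p \sigma\sqrt{2\pi}) \tag{4.13} \]
is the robustness parameter.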
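The step from Eq. 4.11 to Eq. 4.12 can be checked numerically for a single pixel. The sketch below uses illustrative values for $\sigma$ and $\lambda_p$, and hypothetical function names, to confirm that the negative logarithm of one factor of Eq. 4.11 equals the corresponding truncated quadratic term of Eq. 4.12. It assumes $\lambda_p \sigma\sqrt{2\pi} < 1$, so that $\lambda_e$ is positive.

```python
import math


def robustness_parameter(lambda_p, sigma):
    """Eq. 4.13: lambda_e = -2 sigma^2 log(lambda_p sigma sqrt(2 pi))."""
    return -2 * sigma ** 2 * math.log(lambda_p * sigma * math.sqrt(2 * math.pi))


def data_term(residual, sigma, lambda_e):
    """One summand of Eq. 4.12: truncated quadratic scaled by 1/(2 sigma^2)."""
    return min(residual ** 2, lambda_e) / (2 * sigma ** 2)


def neg_log_max_term(residual, sigma, lambda_p):
    """Negative logarithm of one factor of the product in Eq. 4.11."""
    gauss = math.exp(-residual ** 2 / (2 * sigma ** 2))
    return -math.log(max(gauss, lambda_p * sigma * math.sqrt(2 * math.pi)))


# Illustrative values only.
SIGMA = 2.0
LAMBDA_P = 0.05 / 256.0
LAMBDA_E = robustness_parameter(LAMBDA_P, SIGMA)

residuals = [0.0, 3.0, 7.0, 8.0, 50.0]
differences = [
    abs(data_term(e, SIGMA, LAMBDA_E) - neg_log_max_term(e, SIGMA, LAMBDA_P))
    for e in residuals
]
```

Residuals whose square exceeds `LAMBDA_E` all contribute the same capped cost, which is what limits the influence of outliers on the estimate.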
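Instead of expressing the data error in Eq. 4.12 as a summation over the set of image pixels, it can be expressed as a summation over the scene voxels, giving
\[ S_{\mathrm{MAP}}(c) = \arg\min_{s} \left[ \frac{1}{2\sigma^2} \sum_{j=1}^{M} \sum_{k \in \{k : \zeta_k(s_k) = j\}} \min\left( (\acute{c}_k(s_k) - r_j(\theta_k))^2,\ \lambda_e \right) - \log(\rho_S(s)) \right], \tag{4.14} \]
where $j$ are the voxel indices and $\{k : \zeta_k(s_k) = j\}$ is the set of pixels for which $\zeta_k(s_k) = j$. This can more conveniently be expressed in terms of voxel visibilities $\Omega_j(s)$, where $\Omega_j(s)$ is the set of pixels which can observe voxel $j$, giving
\[ S_{\mathrm{MAP}}(c) = \arg\min_{s} \left[ \sum_{j=1}^{M} E_j(s_j, c, \Omega_j(s)) - \log(\rho_S(s)) \right], \tag{4.15} \]
where
\[ E_j(s_j, c, \Omega_j(s)) = \begin{cases} \dfrac{1}{2\sigma^2} \displaystyle\sum_{k \in \Omega_j(s)} \min\left( (\acute{c}_k(s_k) - r_j(\theta_k))^2,\ \lambda_e \right) & \text{if } \alpha_j = 1, \\ 0 & \text{otherwise}, \end{cases} \tag{4.16} \]
and $\alpha_j$ is the opacity of the $j$th voxel.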
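The regrouping from a sum over pixels (Eq. 4.12) to a sum over voxels (Eqs. 4.14 to 4.16) can be sketched as follows. The example is a simplification under stated assumptions: radiances are view-independent, so $r_j(\theta_k)$ is replaced by a single value per voxel; the visibility set $\Omega_j(s)$ is taken to be the pixels whose nearest opaque voxel is $j$; and the perturbed intensities are supplied directly as a list. All names and values are hypothetical.

```python
def pixel_sum(observed, zeta, radiance, sigma, lambda_e):
    """Data term of Eq. 4.12: one truncated residual per pixel k."""
    total = sum(min((c - radiance[j]) ** 2, lambda_e) for c, j in zip(observed, zeta))
    return total / (2 * sigma ** 2)


def voxel_energy(j, observed, zeta, radiance, opacity, sigma, lambda_e):
    """E_j of Eq. 4.16 for voxel j; zero for transparent voxels."""
    if opacity[j] != 1:
        return 0.0
    omega_j = [k for k, nearest in enumerate(zeta) if nearest == j]
    total = sum(min((observed[k] - radiance[j]) ** 2, lambda_e) for k in omega_j)
    return total / (2 * sigma ** 2)


# Hypothetical data: four pixels, three voxels (voxel 1 is transparent).
observed = [10.0, 12.0, 50.0, 30.0]   # perturbed pixel intensities
zeta = [0, 0, 2, 2]                   # nearest opaque voxel seen by each pixel
radiance = [11.0, 99.0, 31.0]         # one radiance per voxel
opacity = [1, 0, 1]
SIGMA, LAMBDA_E = 2.0, 55.0

by_pixel = pixel_sum(observed, zeta, radiance, SIGMA, LAMBDA_E)
by_voxel = sum(
    voxel_energy(j, observed, zeta, radiance, opacity, SIGMA, LAMBDA_E)
    for j in range(len(radiance))
)
```

Both groupings give the same data error, and the third pixel, whose squared residual exceeds `LAMBDA_E`, contributes only the capped value.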
This states that the most likely or probable estimate of $S$ is the one which minimises the right-hand objective function. This function consists of two terms: the first is an error function between the estimated and actual image intensities weighted by one over