MAP ESTIMATE - GREEDY ALGORITHM - 3 D Scene Reconstruction from Multiple Photometric Images

GREEDY ALGORITHM

4.1 MAP ESTIMATE

By formulating scene reconstruction as a Maximum A Posteriori (MAP) estimation problem, and using a voxel based scene representation, the reconstruction problem can be expressed as an optimisation problem where the objective is to assign an opacity and radiance to each voxel, so that the joint posterior probability distribution over these parameters is maximised. This can alternatively be expressed as a minimisation problem, where the objective is to minimise a weighted combination of the projected error and negative log prior probability of the scene.

Using $S = \{S_1, S_2, \ldots, S_M\}$ to represent the set of scene parameters and $C = \{C_1, C_2, \ldots, C_N\}$ to represent the set of camera pixel intensities, the MAP reconstruction problem can be expressed as: given $C = c$, find the most likely estimate of $S$. This can be written using Bayes' rule as

$$S_{\mathrm{MAP}}(c) = \arg\max_{s} \left[ \frac{\rho_{C|S}(c \mid s)\,\rho_{S}(s)}{\rho_{C}(c)} \right]. \tag{4.1}$$
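As a small illustration of Eq. 4.1, the sketch below evaluates the posterior for two candidate scenes and checks that dividing by $\rho_C(c)$ does not change which candidate wins. The scene labels and all probabilities are hypothetical values chosen for the example, not taken from the text.

```python
# Toy discrete MAP estimate over two hypothetical candidate scenes.
prior = {"s1": 0.7, "s2": 0.3}        # rho_S(s), assumed values
likelihood = {"s1": 0.2, "s2": 0.9}   # rho_{C|S}(c|s) for the observed c, assumed values

# Evidence rho_C(c): a single constant shared by every candidate scene.
evidence = sum(likelihood[s] * prior[s] for s in prior)

posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}
unnormalised = {s: likelihood[s] * prior[s] for s in prior}

# The arg max is the same with or without the evidence term.
map_with_evidence = max(posterior, key=posterior.get)
map_without_evidence = max(unnormalised, key=unnormalised.get)
print(map_with_evidence, map_without_evidence)
```

Scaling every candidate by the same positive constant preserves their ordering, which is why the denominator can be dropped from the optimisation.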

The denominator term, $\rho_{C}(c)$, represents the prior probability of obtaining the observed camera data. Since this term is independent of $s$, it can be removed from the expression without affecting the optimisation. The first numerator term, $\rho_{C|S}(c \mid s)$, represents the likelihood of observing the data $c$ given the scene estimate $s$. Using $\check{c}$ to represent the pixel intensities that would be recorded in the absence of any noise, aberrations, or modelling errors, this can be equivalently written as

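$$\rho_{C|S}(c \mid s) = \int_{\check{c}} \rho_{C|\check{C},S}(c \mid \check{c}, s)\,\rho_{\check{C}|S}(\check{c} \mid s)\,\mathrm{d}\check{c}, \tag{4.2}$$

where $\rho_{C|\check{C},S}(c \mid \check{c}, s)$ is the probability distribution of obtaining $c$, given $\check{c}$ and $s$, and $\rho_{\check{C}|S}(\check{c} \mid s)$ is the probability distribution of $\check{c}$, given $s$. Assuming independent noise at each of the sensors, this can be simplified to give

$$\rho_{C|S}(c \mid s) = \int_{\check{c}} \rho_{\check{C}|S}(\check{c} \mid s) \prod_{k=1}^{N} \rho_{C_k|\check{C}_k}(c_k \mid \check{c}_k)\,\mathrm{d}\check{c}. \tag{4.3}$$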

In most instances, the ideal intensity at each pixel will be uniquely determined by the scene parameters, and can be expressed as $\check{c}_k = \mathrm{proj}_k(s)$, where $\mathrm{proj}_k(s)$ is the projection of $s$ onto the $k$th pixel. For infinite scenes, or any semi-infinite scenes that include all points that are within the field of view of the cameras, this is guaranteed. It is also true for finite scenes, provided that there are no radiating or opaque regions outside the scene that are visible in any of the cameras. In situations where points outside the defined scene volume are visible, the probability distribution $\rho_{\check{C}|S}(\check{c} \mid s)$ can be simplified by assuming that the ideal pixel intensities depend on the radiance of regions inside or outside the scene but not both. This condition will be valid provided that radiances outside the modelled scene volume are independent of those within the scene, there are no opaque or radiating surfaces between the defined scene volume and any of the cameras, and that the scene is either completely transparent or opaque along any pixel beam.

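Assuming binary transmittances through the scene along each pixel beam, and using $\xi_k(s)$ to represent a boolean function that is equal to one if the transmittance along the $k$th pixel beam is zero, and zero otherwise, the conditional probability distribution $\rho_{\check{C}|S}(\check{c} \mid s)$ can be expanded to give

$$\begin{aligned}
\rho_{\check{C}|S}(\check{c} \mid s) &= \rho_{\xi_1(s)}(\check{c}_{\xi_1(s)} \mid s)\,\rho_{\xi_0(s)}(\check{c}_{\xi_0(s)} \mid s) \\
&= \prod_{k \in \check{C}_{\xi_1(s)}} \delta(\check{c}_k - \mathrm{proj}_k(s)) \times \rho_{\xi_0(s)}(\check{c}_{\xi_0(s)} \mid s),
\end{aligned} \tag{4.4}$$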

where $\check{C}_{\xi_1(s)}$ is the set of pixels for which $\xi_k(s) = 1$ and $\check{C}_{\xi_0(s)}$ is the remaining set of pixels, corresponding with $\xi_k(s) = 0$. For pixel rays outside the scene, $\xi_k(s) = 0$. The term $\rho_{\xi_0(s)}(\check{c}_{\xi_0(s)} \mid s)$ represents the joint probability distribution of obtaining the ideal pixel intensities in the set $\check{C}_{\xi_0(s)}$. This function is governed by the prior probability of the background radiances. In situations where the background radiance is known, this term will be a delta function. If the background radiances are unknown, a uniform distribution over the range of pixel intensities is usually assumed, allowing the term to be approximated by $1/\kappa^{n}$, where $n$ is the number of pixels in $\check{C}_{\xi_0(s)}$, and $\kappa$ is the dynamic range of the cameras.

This function can be further simplified by assuming the average transmittance through the scene along any pixel beam is zero. Such a scene is referred to as complete, as it completely defines all ideal pixel intensities [Seitz and Dyer 1999]. For scenes with binary regional opacities, this condition is ensured if there is at least one opaque region extending across every pixel ray. With infinite or semi-infinite scenes, an equivalent scene can always be found that is complete with respect to the set of camera images. This is achieved by replacing any transparent region extending to infinity along incomplete pixel rays with an opaque region. So long as the transmitted radiance of the two regions is the same, both will appear identical from all camera positions.
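The completeness condition for binary opacities can be sketched as a simple check over pixel rays. The representation below is an assumption made for illustration only: each ray is a list of binary opacities ordered from near to far, rays are treated as independent lists (in a real voxel grid, rays share voxels), every ray is assumed to contain at least one voxel, and radiance matching of the replacement region is not modelled.

```python
def is_complete(rays):
    """True if every pixel ray crosses at least one opaque voxel (opacity 1)."""
    return all(any(a == 1 for a in ray) for ray in rays)

def make_complete(rays):
    """Return a copy in which the farthest voxel of each incomplete ray is opaque."""
    completed = []
    for ray in rays:
        ray = list(ray)  # copy so the input is left unchanged
        if not any(a == 1 for a in ray):
            ray[-1] = 1  # cap the open-ended ray with an opaque voxel
        completed.append(ray)
    return completed

# Hypothetical opacities along three pixel rays (near to far).
rays = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
completed = make_complete(rays)
print(is_complete(rays), is_complete(completed))
```

Only the second ray is altered here, since it is the only one with non-zero transmittance along its whole length.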

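By ensuring the scene estimate is complete, the conditional probability distribution $\rho_{\check{C}|S}(\check{c} \mid s)$ can be simplified to give

$$\rho_{\check{C}|S}(\check{c} \mid s) = \prod_{k=1}^{N} \delta(\check{c}_k - \mathrm{proj}_k(s)). \tag{4.5}$$

Substituting this back into Eq. 4.3 gives

$$\rho_{C|S}(c \mid s) = \prod_{k=1}^{N} \rho_{C_k|\check{C}_k}(c_k \mid \mathrm{proj}_k(s)). \tag{4.6}$$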
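A minimal sketch of Eq. 4.6 in log form follows, assuming purely Gaussian sensor noise with a known standard deviation (the outlier model is introduced only later in the section). The function names, intensities and `sigma` are hypothetical, and the projected intensities are supplied directly rather than computed from a scene.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Gaussian density, used here as the per-pixel noise model (an assumption)."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(observed, projected, sigma):
    """log rho_{C|S}(c|s): the product in Eq. 4.6 becomes a sum of per-pixel logs."""
    return sum(math.log(gaussian_pdf(c, p, sigma)) for c, p in zip(observed, projected))

observed = [0.50, 0.20, 0.90]            # c_k, hypothetical pixel intensities
projected_good = [0.50, 0.25, 0.85]      # proj_k(s) for a scene that fits well
projected_bad = [0.10, 0.90, 0.20]       # proj_k(s) for a scene that fits poorly
sigma = 0.1

ll_good = log_likelihood(observed, projected_good, sigma)
ll_bad = log_likelihood(observed, projected_bad, sigma)
print(ll_good, ll_bad)
```

Working with the sum of logarithms avoids numerical underflow when the number of pixels $N$ is large.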

Assuming binary regional transmittances, the projection $\mathrm{proj}_k(s)$ of $s$ onto the $k$th pixel is given from Theorem 4 in Chapter 2 as

$$\mathrm{proj}_k(s) = R_{\nu i}(x_k, y_k, Z_i^{*}(x_k, y_k, T_\nu)), \tag{4.7}$$

where $i$ is the index of the image containing pixel $k$, $Z_i^{*}(x_k, y_k, T_\nu)$ is the depth of the nearest opaque region along the $k$th pixel ray, and $x_k$ and $y_k$ are the $x$ and $y$ coordinates of the $k$th pixel.

Using a voxel-based scene model where each voxel is represented by its radiance, $r_j(\theta)$, and binary opacity, $\alpha_j$, the regional transmittance, $T_\nu$, and transmitted radiances, $R_{\nu i}$, for each camera, are found by filtering and interpolating between the opacity and transmitted radiances of surrounding scene voxels. This complicates the inverse mapping, as the radiance of numerous voxels will affect the observed intensity of each pixel. To simplify the optimisation, the joint conditional probability function, $\rho_{C|S}(c \mid s)$, can alternatively be re-expressed so that the filtering and interpolation are performed in the image domain, rather than in scene space.

By modelling the observed pixel intensities in each image as sample points of a continuous intensity distribution, the joint conditional probability function, $\rho_{C|S}(c \mid s)$, can be closely approximated as a product of local conditional probability distributions over the set of perturbed pixel intensities, $\acute{C} = \{\acute{C}_1, \acute{C}_2, \ldots, \acute{C}_N\}$, where $\acute{C}_k$ lies within half a pixel width of $C_k$. The set of pixel rays corresponding with possible perturbed positions of a given pixel $C_k$ define a rectangular cone in space that will be referred to as the extended pixel ray of pixel $C_k$. If the position of $\acute{C}_k$, denoted $\acute{x}_k$, is chosen to correspond with the projected image position of the nearest opaque voxel along the $k$th extended pixel ray, and the discrete depths, defined by $\acute{Z}_k^{*}(\acute{x}_k, \acute{y}_k, T_\nu)$, coincide with the voxel depths, then the imaging sample points will coincide with the voxel positions. This avoids the need for interpolating between scene voxels. Also, as discussed at the end of Section 2.3.1, provided that the voxel kernel is similar to the imaging kernel $W_i$, filtering of the samples can be ignored without too many adverse effects. This allows the conditional probability distribution, $\rho_{C|S}(c \mid s)$, to be expressed as

$$\rho_{C|S}(c \mid s) = \prod_{k=1}^{N} \rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k)), \tag{4.8}$$

where $s_k$ are the states of voxels $S_k$ located along the $k$th extended pixel ray, $\zeta_k(s_k)$ is the index of the nearest opaque voxel along the $k$th extended pixel ray, and $r_{\zeta_k(s_k)}(\theta_k)$ is the radiance of that voxel in the direction of the $k$th sensor element. Since the perturbed pixel positions depend on which voxel in $S_k$ is nearest, the perturbed pixel intensities $\acute{c}_k(s_k)$ are also a function of $s_k$.
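The role of $\zeta_k(s_k)$ can be sketched as a first-hit lookup along a ray. In this illustration the ray is assumed to be a list of voxel indices ordered from near to far, the opacity and radiance tables are hypothetical, and the radiance is taken to be independent of viewing direction, which is a simplification of $r_j(\theta_k)$.

```python
def nearest_opaque(ray, opacity):
    """zeta_k: index of the first opaque voxel along a ray ordered near to far.

    Returns None when the ray meets no opaque voxel (an incomplete ray).
    """
    for j in ray:
        if opacity[j] == 1:
            return j
    return None

# Hypothetical voxel tables: binary opacity alpha_j and direction-independent radiance r_j.
opacity = {0: 0, 1: 1, 2: 1}
radiance = {0: 0.0, 1: 0.4, 2: 0.8}

ray_k = [0, 1, 2]                       # voxels crossed by the kth extended pixel ray
zeta_k = nearest_opaque(ray_k, opacity)
ideal_intensity = radiance[zeta_k]      # the radiance that conditions the kth term of Eq. 4.8
print(zeta_k, ideal_intensity)
```

Voxel 2 is opaque as well, but it is hidden behind voxel 1 and so does not contribute to this pixel.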

[Figure 4.1: two panels, (a) Summation and (b) Maximum, each plotting the Gaussian, uniform and mixed distributions.]

Figure 4.1 (a) The probability distribution of the difference between the observed image intensity, $\acute{c}_k(s_k)$, and the ideal image intensity, $\check{\acute{c}}_k$, can be expressed using a mixed probability model that is the weighted summation of two independent distributions. The first, Gaussian, distribution explains the majority of the variations due to sensor noise and small modelling errors. The second, uniform, distribution accounts for outlying points caused by occasional large modelling errors. (b) If the probability of outliers is reasonably small, the mixed model can be closely approximated as a maximisation over the two component distributions rather than a summation. Theoretically, this must be scaled appropriately to ensure the integral over the resulting distribution equals one. However, in practice the scale factor will be close to unity and will not affect the MAP estimate, so it can be ignored.

The term $\rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k))$ in Eq. 4.8 represents the probability distribution of the perturbed pixel intensity $\acute{c}_k(s_k)$, given the voxel radiance $r_{\zeta_k(s_k)}(\theta_k)$. This probability distribution is a function of the image noise and modelling errors. It can be modelled as a linear combination of two underlying distributions caused by different processes. Jaynes [2003], chapter 21, calls this a “two-model model”: a mixture of a model that accounts for the regular observations and a second model that explains outliers.

Using the mixed model approach, the probability distribution is given by

$$\rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k)) = (1-\nu)\,\rho_{\mathrm{image}}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k)) + \nu\,\rho_{\mathrm{model}}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k)), \tag{4.9}$$

where $\rho_{\mathrm{image}}$ is the probability density function (PDF) of the image noise plus any small modelling errors, $\rho_{\mathrm{model}}$ is the PDF of any outliers caused by occasional modelling errors, and $\nu$ is the probability of outlier observations occurring. As discussed in Section 2.2.4, the distribution of image noise can usually be closely approximated by a robust Gaussian function. Modelling errors, on the other hand, may cause significant variations in the observed pixel intensities from the ideal predicted intensity. This can be approximated using a Gaussian with a large variance, or a uniform distribution across the range of recordable pixel intensities.

As shown in Fig. 4.1, the resulting mixed distribution can be closely approximated as a weighted maximum of the two component distributions, rather than a summation. Assuming Gaussian image noise with variance $\sigma^2$, and using $\lambda_p = \nu\rho_{\mathrm{model}}$ to represent the uniform PDF of the modelling errors, the individual probability terms are given by

$$\rho_{\acute{C}_k|\check{\acute{C}}_k}(\acute{c}_k(s_k) \mid r_{\zeta_k(s_k)}(\theta_k)) = \max\left( \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2} \right),\ \lambda_p \right). \tag{4.10}$$
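To get a feel for how the maximum in Eq. 4.10 compares with the summation in Eq. 4.9, the sketch below evaluates both over a range of intensity errors. The noise level, outlier probability and dynamic range are hypothetical, and the outlier PDF is assumed uniform, so that $\rho_{\mathrm{model}} = 1/\kappa$.

```python
import math

def gaussian_pdf(e, sigma):
    """Zero-mean Gaussian density of the intensity error e."""
    return math.exp(-(e ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixed_sum(e, sigma, nu, rho_model):
    """Mixture of Eq. 4.9: weighted sum of the Gaussian and outlier densities."""
    return (1 - nu) * gaussian_pdf(e, sigma) + nu * rho_model

def mixed_max(e, sigma, nu, rho_model):
    """Approximation of Eq. 4.10: maximum of the Gaussian and lambda_p = nu * rho_model."""
    return max(gaussian_pdf(e, sigma), nu * rho_model)

sigma, nu, kappa = 5.0, 0.05, 255.0      # assumed noise level, outlier rate, dynamic range
rho_model = 1.0 / kappa                  # uniform outlier density (assumption)

errors = [0.5 * i for i in range(201)]   # intensity errors from 0 to 100
ratios = [mixed_sum(e, sigma, nu, rho_model) / mixed_max(e, sigma, nu, rho_model)
          for e in errors]
print(min(ratios), max(ratios))
```

In this sketch the ratio of the two stays between $1-\nu$ and 2. The two forms agree most closely near zero error and in the far tail, and differ most around the crossover where the Gaussian falls to $\lambda_p$.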

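Substituting Eq. 4.10 into Eq. 4.8, and the result into Eq. 4.1 with the denominator dropped, the MAP scene estimate can finally be expressed as

$$S_{\mathrm{MAP}}(c) = \arg\max_{s} \left[ \prod_{k=1}^{N} \max\left( \exp\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2} \right),\ \lambda_p \sigma\sqrt{2\pi} \right) \times \rho_{S}(s) \right], \tag{4.11}$$

where the constant $1/(\sigma\sqrt{2\pi})$ has been removed from the expression, since it does not affect the MAP estimate.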

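By taking the logarithm of the objective and negating it, the maximisation of a product becomes the minimisation of a sum, giving

$$\begin{aligned}
S_{\mathrm{MAP}}(c) &= \arg\max_{s} \left[ \sum_{k=1}^{N} \max\left( -\frac{(\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2}{2\sigma^2},\ \log(\lambda_p \sigma\sqrt{2\pi}) \right) + \log(\rho_{S}(s)) \right] \\
&= \arg\min_{s} \left[ \frac{1}{2\sigma^2} \sum_{k=1}^{N} \min\left( (\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k))^2,\ \lambda_e \right) - \log(\rho_{S}(s)) \right],
\end{aligned} \tag{4.12}$$

where

$$\lambda_e = -2\sigma^2 \log(\lambda_p \sigma\sqrt{2\pi}) \tag{4.13}$$

is the robustness parameter.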
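The step from the maximum of Eq. 4.11 to the truncated quadratic of Eq. 4.12 can be checked numerically. The sketch below compares the negated logarithm of one data term with $\min(e^2, \lambda_e)/(2\sigma^2)$; the values of `sigma` and `lambda_p` are hypothetical, and `e` stands for the intensity error $\acute{c}_k(s_k) - r_{\zeta_k(s_k)}(\theta_k)$.

```python
import math

def robustness_parameter(sigma, lambda_p):
    """lambda_e as defined in Eq. 4.13."""
    return -2 * sigma ** 2 * math.log(lambda_p * sigma * math.sqrt(2 * math.pi))

def truncated_quadratic(e, sigma, lambda_e):
    """Per-pixel data cost of Eq. 4.12: min(e^2, lambda_e) / (2 sigma^2)."""
    return min(e ** 2, lambda_e) / (2 * sigma ** 2)

def neg_log_max_term(e, sigma, lambda_p):
    """Negated logarithm of one factor of the product in Eq. 4.11."""
    gaussian_part = math.exp(-(e ** 2) / (2 * sigma ** 2))
    outlier_part = lambda_p * sigma * math.sqrt(2 * math.pi)
    return -math.log(max(gaussian_part, outlier_part))

sigma = 5.0                     # assumed noise standard deviation
lambda_p = 0.05 / 255.0         # assumed nu * rho_model for a uniform outlier PDF
lambda_e = robustness_parameter(sigma, lambda_p)

test_errors = [0.0, 5.0, 17.0, 18.0, 50.0, 200.0]
differences = [abs(neg_log_max_term(e, sigma, lambda_p) - truncated_quadratic(e, sigma, lambda_e))
               for e in test_errors]
print(lambda_e, max(differences))
```

Note that $\lambda_e$ is positive only when $\lambda_p \sigma\sqrt{2\pi} < 1$, that is, when the outlier density lies below the peak of the Gaussian. Squared errors larger than $\lambda_e$ all incur the same cost, which is what limits the influence of outliers.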

Instead of expressing the data error in Eq. 4.12 as a summation over the set of image pixels, it can instead be expressed as a summation over the scene voxels, giving

$$S_{\mathrm{MAP}}(c) = \arg\min_{s} \left[ \frac{1}{2\sigma^2} \sum_{j=1}^{M} \sum_{k \in \{k : \zeta_k(s_k) = j\}} \min\left( (\acute{c}_k(s_k) - r_j(\theta_k))^2,\ \lambda_e \right) - \log(\rho_{S}(s)) \right], \tag{4.14}$$

where $j$ are the voxel indices and $\{k : \zeta_k(s_k) = j\}$ is the set of pixels for which $\zeta_k(s_k) = j$.

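This can more conveniently be expressed in terms of voxel visibilities $\Omega_j(s)$, where $\Omega_j(s)$ is the set of pixels which can observe voxel $j$, giving

$$S_{\mathrm{MAP}}(c) = \arg\min_{s} \left[ \sum_{j=1}^{M} E_j(s_j, c, \Omega_j(s)) - \log(\rho_{S}(s)) \right], \tag{4.15}$$

where

$$E_j(s_j, c, \Omega_j(s)) = \begin{cases} \dfrac{1}{2\sigma^2} \displaystyle\sum_{k \in \Omega_j(s)} \min\left( (\acute{c}_k(s_k) - r_j(\theta_k))^2,\ \lambda_e \right) & \text{if } \alpha_j = 1, \\ 0 & \text{otherwise,} \end{cases} \tag{4.16}$$

and $\alpha_j$ is the opacity of the $j$th voxel.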
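The regrouping from a sum over pixels (Eq. 4.12) to a sum over voxels (Eqs. 4.15 and 4.16) can be sketched on a toy scene. All of the following are assumptions made for illustration: the scene is complete, rays are lists of voxel indices ordered near to far, the radiances are direction-independent, the observed intensities are fixed rather than perturbed with $s_k$, and the numerical values are hypothetical.

```python
def nearest_opaque(ray, opacity):
    """zeta_k: first opaque voxel along a ray ordered near to far (None if none)."""
    for j in ray:
        if opacity[j] == 1:
            return j
    return None

def robust_error(observed, predicted, lambda_e):
    """Truncated squared error min(e^2, lambda_e)."""
    return min((observed - predicted) ** 2, lambda_e)

def voxel_energy(j, opacity, radiance, observed, visible_pixels, sigma, lambda_e):
    """E_j of Eq. 4.16: zero for transparent voxels."""
    if opacity[j] != 1:
        return 0.0
    total = sum(robust_error(observed[k], radiance[j], lambda_e) for k in visible_pixels)
    return total / (2 * sigma ** 2)

# Hypothetical toy scene: four voxels and five pixel rays.
opacity = {0: 0, 1: 1, 2: 1, 3: 1}
radiance = {0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0}
rays = [[0, 1], [0, 2], [1, 3], [2, 3], [0, 3]]
observed = [11.0, 19.0, 10.0, 60.0, 30.0]
sigma, lambda_e = 2.0, 16.0

zeta = [nearest_opaque(ray, opacity) for ray in rays]

# Omega_j: the pixels whose nearest opaque voxel is j.
omega = {j: [k for k, z in enumerate(zeta) if z == j] for j in opacity}

total_by_pixel = sum(
    robust_error(observed[k], radiance[zeta[k]], lambda_e) for k in range(len(rays))
) / (2 * sigma ** 2)
total_by_voxel = sum(
    voxel_energy(j, opacity, radiance, observed, omega[j], sigma, lambda_e) for j in opacity
)
print(total_by_pixel, total_by_voxel)
```

Because every pixel is assigned to exactly one nearest opaque voxel, the two totals agree. The fourth pixel is an outlier whose squared error is clipped to $\lambda_e$. Grouping by voxel is convenient when the state of a single voxel is changed, since only the energies of voxels whose visibility sets are affected need to be re-evaluated.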

This states that the most likely estimate of $S$ is the one which minimises the objective function on the right-hand side. This function consists of two terms: the first is an error function between the estimated and actual image intensities weighted by one over
