Regularizing the Distance Field - Robust Surface Reconstruction

CHAPTER 5: LIVING 3D RECONSTRUCTIONS

5.1 Robust Surface Reconstruction

5.1.2 Regularizing the Distance Field

The TSDF aggregation procedure can be interpreted as the weighted least-squares solution that minimizes the energy functional

$$E(u(x)) = \int_{\Omega} \frac{1}{2} \sum_{(f_k, w_k) \in \mathcal{F}(x)} w_k \left(u(x) - f_k\right)^2 \, dx, \tag{5.2}$$

where the integral is taken over all coordinates $x$ in the 3D volume $\Omega$. In general, the set $\mathcal{F}(x)$ denotes all observed data points associated with a given point in space, for any choice of data association. In the case where the space is a voxelized TSDF, $\mathcal{F}(x)$ consists of all weighted distance values aggregated in the given voxel. Since the above energy functional can be evaluated point-wise, it is straightforward to see that $T(x)$ is the optimum at each point $x$:

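$$\begin{aligned}
\frac{\partial E}{\partial u}(u(x)) &\overset{!}{=} 0 \\
\sum_{(f_k, w_k) \in \mathcal{F}(x)} w_k \left(u(x) - f_k\right) &= 0 \\
u(x) \sum_{(f_k, w_k) \in \mathcal{F}(x)} w_k - \sum_{(f_k, w_k) \in \mathcal{F}(x)} w_k f_k &= 0 \\
u(x) &= T(x).
\end{aligned} \tag{5.3}$$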
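The point-wise optimality in Eq. (5.3) can be checked numerically. The following is a minimal sketch, assuming a single voxel with a handful of made-up $(f_k, w_k)$ pairs; the function names `weighted_mean` and `squared_energy` are illustrative and not part of any TSDF library.

```python
def weighted_mean(observations):
    """Return (T, W): the weighted average distance and the total weight."""
    total_weight = sum(w for _, w in observations)
    weighted_sum = sum(w * f for f, w in observations)
    return weighted_sum / total_weight, total_weight


def squared_energy(u, observations):
    """Point-wise integrand of Eq. (5.2) for a candidate distance u."""
    return 0.5 * sum(w * (u - f) ** 2 for f, w in observations)


# Hypothetical (f_k, w_k) pairs aggregated in one voxel.
observations = [(0.10, 1.0), (0.14, 2.0), (0.08, 0.5)]
T, W = weighted_mean(observations)

# Moving away from T in either direction increases the energy.
energy_at_T = squared_energy(T, observations)
for delta in (-0.05, -0.001, 0.001, 0.05):
    assert squared_energy(T + delta, observations) > energy_at_T
```

Because the integrand is a convex quadratic in $u$, the increase for a perturbation $\delta$ is exactly $\frac{1}{2} W \delta^2$, which is why every perturbation in the loop raises the energy.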

In practice, the surface extracted from a raw aggregated TSDF can be incomplete and, especially when derived from depthmap data obtained using MVS, noisy. To obtain a smooth geometry, Zach et al. (2007) proposed to minimize a total variation (TV) functional:

$$E(u(x)) = \int_{\Omega} |\nabla u(x)| + \lambda \, \Phi\left(u(x), \mathcal{F}(x)\right) \, dx. \tag{5.4}$$

The first term is the so-called total variation penalty that encourages a smooth zero-level set by selecting a distance field that undergoes minimal change. The second term, $\Phi(u, \mathcal{F})$, is a (potentially robust) data term weighted by some value $\lambda$. Applying a non-robust squared loss results in the well-known Rudin-Osher-Fatemi (ROF) model (Rudin et al., 1992):

$$\Phi_{\mathrm{ROF}}(u, \mathcal{F}) = \frac{1}{2} \sum_{(f_k, w_k) \in \mathcal{F}} w_k (u - f_k)^2, \tag{5.5}$$

which is the same as the integrand in Eq. (5.2). To make the data term robust to outlier observations, Zach et al. (2007) suggested a TV-L1 approach with

$$\Phi_{L1}(u, \mathcal{F}) = \sum_{(f_k, w_k) \in \mathcal{F}} w_k \, |u - f_k|. \tag{5.6}$$

Ummenhofer and Brox (2015) adopted a similar data term but used a small-threshold Huber model to maintain differentiability near zero.

Let us consider these data term options from the perspective of maximum likelihood estimation, i.e., minimizing the negative log-likelihood of an assumed probability distribution $p_k(f)$ for each distance observation. In the squared (ROF) case, the error from each observation to the true distance value is assumed to follow a Gaussian distribution with variance $\sigma^2 = 1/(\lambda w_k)$. Without a very effective prior on each $w_k$, this loss is going to be quite sensitive to spurious surface measurements brought about by incorrect estimations in the MVS depthmaps. The L1 model assumes an underlying Laplace distribution that lends greater probability to outlier observations; a good confidence estimate in $w_k$ can, of course, still help the approach. The result here is that the median observed distance is preferred, rather than the mean preferred in the ROF case. A Huber loss adopts the Gaussian up to a certain threshold, and then switches to the higher-probability tails of the Laplace distribution. Of course, all three of these data terms assume that non-outlier measurements follow some fixed distribution (Gaussian or Laplace) that may or may not model well the complex surface variations that can arise in MVS depth estimation for general image collections.
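To make the mean-versus-median behavior concrete, the sketch below computes the point-wise minimizer of each data term for one voxel that contains a single outlier. The observation values, the Huber threshold `delta`, and all function names are assumptions chosen for illustration; the Huber minimizer is found by simple bisection on the (monotone) derivative, which is just one convenient option and not the solver used in the cited works.

```python
def weighted_mean(obs):
    """Minimizer of the squared (ROF) data term, Eq. (5.5)."""
    return sum(w * f for f, w in obs) / sum(w for _, w in obs)


def weighted_median(obs):
    """A minimizer of the L1 data term, Eq. (5.6): the lower weighted median."""
    ordered = sorted(obs)
    half = 0.5 * sum(w for _, w in ordered)
    cumulative = 0.0
    for f, w in ordered:
        cumulative += w
        if cumulative >= half:
            return f
    return ordered[-1][0]


def huber_minimizer(obs, delta, iterations=60):
    """Minimizer of a weighted Huber data term, found by bisection."""
    def gradient(u):
        # The Huber derivative is the residual clamped to [-delta, delta].
        return sum(w * max(-delta, min(delta, u - f)) for f, w in obs)

    lo = min(f for f, _ in obs)
    hi = max(f for f, _ in obs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gradient(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Four consistent observations near 0.10 and one spurious one at 1.00.
obs = [(0.09, 1.0), (0.10, 1.0), (0.11, 1.0), (0.10, 1.0), (1.00, 1.0)]
mean_estimate = weighted_mean(obs)
median_estimate = weighted_median(obs)
huber_estimate = huber_minimizer(obs, delta=0.02)
```

For these inputs the weighted mean is pulled to 0.28 by the single outlier, the weighted median stays at 0.10, and the Huber estimate lands at roughly 0.105, since the outlier's influence is capped at `delta`.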

From a practical perspective, the squared loss has the nice property that only the aggregated value $T(x)$ and weight $W(x)$ need to be stored in order to compute its derivative (Eq. (5.2)). The L1 and Huber approaches require that all observations $\mathcal{F}(x)$ be kept in memory, which can be prohibitive when processing billions of points from a reconstruction with several thousand MVS depthmaps.
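The storage argument can be illustrated with a small sketch. It assumes the usual running weighted-average update for a voxel with strictly positive weights, and the class name `VoxelAccumulator` is hypothetical. The derivative of the squared data term computed from the stored pair $(T, W)$ agrees with the one computed from the full observation list, because $\sum_k w_k (u - f_k) = W (u - T)$.

```python
class VoxelAccumulator:
    """Keeps only the running weighted mean T and the total weight W."""

    def __init__(self):
        self.T = 0.0
        self.W = 0.0

    def add(self, f, w):
        # Running weighted average; assumes w > 0.
        self.T = (self.W * self.T + w * f) / (self.W + w)
        self.W += w

    def rof_gradient(self, u):
        """Derivative of Eq. (5.5) with respect to u, using only T and W."""
        return self.W * (u - self.T)


def full_gradient(u, observations):
    """Same derivative, but computed from every stored observation."""
    return sum(w * (u - f) for f, w in observations)


# Hypothetical observations streamed into one voxel.
observations = [(0.10, 1.0), (0.14, 2.0), (0.08, 0.5), (0.12, 1.5)]
voxel = VoxelAccumulator()
for f, w in observations:
    voxel.add(f, w)

u = 0.2
compact = voxel.rof_gradient(u)
reference = full_gradient(u, observations)
```

No comparable two-number summary exists for the L1 or Huber terms, whose derivatives depend on which side of $u$ (or within which threshold) each individual $f_k$ falls.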

Histogram binning approaches can potentially avoid this overhead, albeit with quantization error in the target distances (Zach, 2008). When scaling to very large spaces like those found in large-scale Internet photo-collections, an accurate, low-memory solution is preferred. So, unless we have a reason to assume that an L1 noise model would better characterize the non-outlier noise distribution, it follows that a perhaps more scalable approach is to derive a robust loss that uses the squared data term.

An approach that I have found effective is simple truncation of the ROF model:

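$$\Phi_{\mathrm{TRUNC}}(u, \mathcal{F}) = \begin{cases} \dfrac{1}{2} \displaystyle\sum_{(f_k, w_k) \in \mathcal{F}} w_k (u - f_k)^2 & \text{if } |u - T| < \tau_T \wedge W > \tau_W \\ 0 & \text{otherwise,} \end{cases} \tag{5.7}$$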

for TSDF error threshold $\tau_T$ and TSDF weight threshold $\tau_W$. This approach is intended for reconstructions with relatively large voxels (say, greater than 0.1 m³) that aggregate many surface observations from the individual depthmap pixels. In this case, the TSDF computation for well-supported surfaces is quite robust: a small number of spurious point measurements passing through the voxel will not strongly affect the overall weighted average distance. Outliers in the TSDF mainly come from spurious surface estimates for points that are actually in the air or underground. These points are usually only supported by a small number of images, and their associated voxels thus typically have a relatively low aggregated TSDF weight $W(x)$.
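A minimal sketch of the truncated data term in Eq. (5.7) for a single voxel is given below. The function names, threshold values, and voxel statistics are assumptions for illustration only. The derivative is written in terms of the stored $T$ and $W$ and is valid away from the truncation boundary, where the term is discontinuous.

```python
def truncated_data_term(u, observations, tau_T, tau_W):
    """Value of Eq. (5.7) for one voxel, given its (f_k, w_k) observations."""
    W = sum(w for _, w in observations)
    T = sum(w * f for f, w in observations) / W
    if abs(u - T) < tau_T and W > tau_W:
        return 0.5 * sum(w * (u - f) ** 2 for f, w in observations)
    return 0.0


def truncated_gradient(u, T, W, tau_T, tau_W):
    """Derivative of Eq. (5.7) with respect to u, using only T and W."""
    if abs(u - T) < tau_T and W > tau_W:
        return W * (u - T)
    return 0.0


# Hypothetical thresholds and voxel statistics.
tau_T, tau_W = 0.1, 5.0
u = 0.05

# Well-supported voxel near the current estimate: the data term is active.
strong = truncated_gradient(u, T=0.02, W=40.0, tau_T=tau_T, tau_W=tau_W)

# Weakly supported voxel: the data term is switched off entirely.
weak = truncated_gradient(u, T=-0.30, W=1.5, tau_T=tau_T, tau_W=tau_W)
```

When the data term is switched off for a voxel, only the total variation penalty in Eq. (5.4) acts on $u$ there, so the voxel's value is determined by its neighbors rather than by its own, possibly spurious, observations.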

Of course, it is also likely the case that valid surfaces exist whose computed $W(x)$ is also small, so the threshold $\tau_W$ may be too restrictive on its own. Accordingly, I start with a value of $\tau_W$ that is relatively high and progressively decrease it each iteration. The threshold $\tau_T$ helps to restrict spurious points while allowing weakly observed points to become active. The idea here is that the observed points grow "from the ground up," or to put it perhaps more accurately, outward from the existing surface. That is, only strongly supported surfaces are reconstructed in the early iterations; voxels away from these surfaces begin to adopt strong positive or negative distance values. When $\tau_W$ reaches a smaller value, spurious points that are, e.g., floating in the air find themselves in a [...] together to inhibit a surface from forming from this sporadic point. For weakly supported valid points, on the other hand, we have the constraint that no observable static object simply floats in the air or is buried in the ground; it must touch air and be connected to the ground. Weakly observed surfaces near the current estimated surface are therefore more likely to pass the $\tau_T$ check, and thus there is a better chance that the missing structure evolves. Note that this approach will not always work if structures are missed, for example if a street sign is reconstructed in MVS but the pole it is attached to is not.
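The effect of progressively lowering $\tau_W$ can be sketched as follows. This is only an illustration of the active-set test from Eq. (5.7) under strong simplifications: the current estimate `u` is held fixed instead of being updated by a TV solver, the geometric decay of the threshold is an assumed schedule (the text only states that $\tau_W$ is decreased each iteration), and all values and names are hypothetical.

```python
def weight_threshold_schedule(start, stop, decay):
    """Yield decreasing tau_W values; the geometric decay is an assumption."""
    tau_W = start
    while tau_W >= stop:
        yield tau_W
        tau_W *= decay


def active_voxels(u, T, W, tau_T, tau_W):
    """Per-voxel test from Eq. (5.7): is the data term switched on?"""
    return [abs(ui - Ti) < tau_T and Wi > tau_W
            for ui, Ti, Wi in zip(u, T, W)]


# Three hypothetical voxels:
#   0: strongly supported surface voxel
#   1: weakly supported voxel close to the current surface estimate
#   2: weakly supported spurious voxel in what the estimate says is free space
T = [0.01, 0.03, 0.00]
W = [50.0, 2.0, 2.0]
u = [0.02, 0.05, 0.80]  # current distance estimate, held fixed here

tau_T = 0.1
history = [active_voxels(u, T, W, tau_T, tau_W)
           for tau_W in weight_threshold_schedule(20.0, 1.0, 0.5)]
```

In this toy setting only voxel 0 is active at the first threshold, voxel 1 becomes active once $\tau_W$ drops below its weight, and voxel 2 never does because its aggregated value disagrees with the current estimate by more than $\tau_T$.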
