5.5
Joint Optimization of Geometry, Albedo, and Image Formation Model
One of the core ideas of our method is the joint optimization of the volumetric 3D reconstruction and the image formation model. In particular, we simultaneously optimize the signed distance and albedo values of each voxel of the volumetric grid, as well as the camera poses and the camera intrinsics: focal length, center pixel, and radial and tangential lens distortion coefficients. We stack all parameters in the unknown vector $\mathcal{X} = (\mathcal{T}, \tilde{D}, a, f_x, f_y, c_x, c_y, \kappa_1, \kappa_2, \rho_1)$ and formulate our minimization objective as follows:
$$E_{\text{scene}}(\mathcal{X}) = \sum_{v \in \tilde{D}_0} \lambda_g E_g + \lambda_v E_v + \lambda_s E_s + \lambda_a E_a, \tag{5.13}$$
with $\lambda_g$, $\lambda_v$, $\lambda_s$, $\lambda_a$ the weighting parameters that define the influence of each cost term. For efficiency, we only optimize voxels within a thin shell close to the current estimate of the iso-surface $\tilde{D}_0$, i.e., voxels with $|\tilde{D}| < t_{\text{shell}}$.
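The thin-shell restriction and the weighted sum of Equation (5.13) can be illustrated with a minimal Python sketch. The dictionary-based SDF, the function names, and the callback interface for the cost terms are hypothetical choices made for illustration only; they are not taken from the implementation described here.

```python
def thin_shell(sdf, t_shell):
    """Return all voxels v with |D(v)| < t_shell.

    sdf is assumed to be a dict mapping voxel indices (x, y, z) to
    signed distances (a hypothetical storage layout).
    """
    return [v for v, d in sdf.items() if abs(d) < t_shell]


def scene_energy(shell, terms, weights):
    """Sum the weighted cost terms over all shell voxels.

    terms maps a term name to a function v -> cost, and weights maps
    the same name to its lambda.
    """
    return sum(weights[name] * cost(v)
               for v in shell
               for name, cost in terms.items())
```

Only voxels returned by `thin_shell` contribute residuals, so the problem size scales with the surface area rather than with the volume of the grid.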
5.5.1
Camera Poses and Camera Intrinsics
For initial pose estimates, we use the poses obtained by the frame-to-model tracking of Voxel Hashing [86]. However, these merely initialize our global pose optimization, whose energy landscape is non-convex and which is performed jointly with the scene reconstruction (see below). To define the underlying residuals of the energy term, we project each voxel into its associated input views using the current estimate of the camera parameters. These parameters comprise not only the extrinsic poses, but also the intrinsics of the pinhole camera model: focal length, center pixel, and lens distortion parameters. During the coarse-to-fine pyramid optimization, we derive the camera intrinsics for each pyramid level according to its resolution.
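The following Python sketch illustrates how a point in camera coordinates might be projected with such a camera model, and how intrinsics could be derived for a coarser pyramid level. The exact distortion model is not stated in this section; the sketch assumes a Brown-Conrady-style model with two radial coefficients (here `kappa1`, `kappa2`) and a single tangential coefficient (`rho1`). The plain scaling of the intrinsics by a factor of two per level is likewise an assumption that ignores half-pixel offsets.

```python
def project(p, fx, fy, cx, cy, kappa1=0.0, kappa2=0.0, rho1=0.0):
    """Project a 3D point p = (X, Y, Z) in camera coordinates to pixels."""
    X, Y, Z = p
    x, y = X / Z, Y / Z                              # normalized coordinates
    r2 = x * x + y * y
    radial = 1.0 + kappa1 * r2 + kappa2 * r2 * r2    # radial distortion factor
    # ASSUMPTION: one tangential coefficient, applied like the first
    # tangential coefficient of the Brown-Conrady model.
    xd = x * radial + 2.0 * rho1 * x * y
    yd = y * radial + rho1 * (r2 + 2.0 * y * y)
    return fx * xd + cx, fy * yd + cy


def scale_intrinsics(fx, fy, cx, cy, level):
    """Intrinsics for a pyramid level; level 0 is the full resolution.

    ASSUMPTION: each level halves the resolution and the intrinsics
    scale linearly with it.
    """
    s = 0.5 ** level
    return fx * s, fy * s, cx * s, cy * s
```

Because every argument of `project` is a free variable, a residual built on top of it can be differentiated with respect to the pose, the intrinsics, and the distortion coefficients alike.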
5.5.2
Shading-based SDF Optimization
To optimize for the 3D surface that best explains the re-projection and follows the RGB shading cues, we directly solve for the parameters of the refined signed distance field $\tilde{D}$, which is coupled to the shading through its surface normals $n(v)$. In addition to the distance values, the volumetric grid also contains per-voxel albedo parameters, which are in turn coupled with the lighting computation (cf. Equation (5.9)); the surface albedo is initialized with a uniform constant value. Although solving for a distance field in this way follows the direction of Zollhöfer et al. [135], it differs at its core: here, we dynamically constrain the reconstruction with the RGB input images, in contrast to Zollhöfer et al., who rely solely on the initially pre-computed per-voxel colors. In the following, we introduce all terms of the shading-based SDF objective.

Gradient-based Shading Constraint. In our data term, we want to maximize the consistency between the estimated shading of a voxel and its sampled observations in the corresponding intensity images. Our objective follows the intuition that high-frequency changes in the surface geometry result in shading cues in the input RGB images, while more accurate geometry and a more accurate scene formation model result in better sampling of the input images.
We first collect all observations in which the iso-surface point $\psi(v)$ of a voxel $v$ is visible; to this end, we transform the voxel into each frame using the pose $T_i$ and check whether the sampled depth value in the respective depth map $Z_i$ is compatible. We collect all valid observations $O_v$, sort them according to their weights $w_i^v$ (cf. Equation (5.7)), and keep only the best $t_{\text{best}}$ views $V_{\text{best}} = \{I_i\}$. Our objective function is defined as follows:
$$E_g(v) = \sum_{I_i \in V_{\text{best}}} w_i^v \left\| \nabla B(v) - \nabla I_i(\pi(v_i)) \right\|_2^2, \tag{5.14}$$

where $v_i = g(T_i, \psi(v))$ is the 3D position of the voxel center transformed into the view's coordinate system. Observations are weighted with their view-dependent observation weights $w_i^v$. By transforming and projecting a voxel $v$ into its associated input intensity images $I_i$, our joint optimization framework optimizes for all parameters of the scene formation model, including camera poses, camera intrinsics, and lens distortion parameters. The shading $B(v)$ depends on both surface and material parameters and allows us to optimize for the signed distances, implicitly via the surface normals, and for the voxel albedo on the fly. Instead of comparing shading and intensities directly, we achieve improved robustness by comparing their gradients, which we obtain by discrete forward differences over neighboring voxels.
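As a rough illustration of Equation (5.14), the Python sketch below selects the best-weighted views and accumulates the weighted squared difference between shading and intensity gradients. It assumes that both gradients are forward differences over a voxel and its three positive-axis neighbors, and that the intensities have already been sampled at the projections of those four voxels; the data layout and function names are hypothetical.

```python
def best_views(observations, t_best):
    """Keep the t_best observations with the largest weights.

    observations is a list of (weight, samples) tuples.
    """
    return sorted(observations, key=lambda o: o[0], reverse=True)[:t_best]


def forward_gradient(values):
    """Forward differences from (f(v), f(v + x), f(v + y), f(v + z))."""
    f0, fx, fy, fz = values
    return (fx - f0, fy - f0, fz - f0)


def gradient_shading_cost(shading, observations, t_best):
    """Weighted squared gradient difference, summed over the best views.

    shading holds B at the voxel and its three forward neighbors; each
    observation holds the intensities sampled at the same four voxels.
    """
    grad_b = forward_gradient(shading)
    cost = 0.0
    for weight, intensities in best_views(observations, t_best):
        grad_i = forward_gradient(intensities)
        cost += weight * sum((b - i) ** 2 for b, i in zip(grad_b, grad_i))
    return cost
```

Comparing gradients rather than absolute values makes the cost insensitive to a constant brightness offset between the predicted shading and an observed image.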
To improve convergence, we compute an image pyramid of the input intensity images and run the optimization in a coarse-to-fine manner over all levels. This inner loop is embedded into a coarse-to-fine grid optimization strategy that increases the resolution of the SDF with each level.
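The nesting of the two coarse-to-fine loops can be sketched as a simple schedule in Python. The number of levels and the convention that level 0 denotes the finest resolution are assumptions made for illustration.

```python
def coarse_to_fine_schedule(num_sdf_levels, num_pyramid_levels):
    """Yield (sdf_level, pyramid_level) pairs; level 0 is the finest."""
    for sdf_level in range(num_sdf_levels - 1, -1, -1):          # outer: SDF grid
        for pyr_level in range(num_pyramid_levels - 1, -1, -1):  # inner: images
            yield sdf_level, pyr_level
```

For two grid levels and three pyramid levels, the full image pyramid is traversed once per grid level, so each grid refinement again starts from the coarsest images.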
Regularization. We add several cost terms to regularize our energy formulation, which is required by the ill-posed nature of Shape-from-Shading, and to mitigate the effect of noise. First, we use a Laplacian smoothness term to regularize our signed distance field. This volumetric regularizer enforces smoothness in the distance values between neighboring voxels:
$$E_v(v) = (\Delta \tilde{D}(v))^2. \tag{5.15}$$
To constrain the surface and keep the refined reconstruction close to the regularized original signed distances, we specify a surface stabilization constraint:
$$E_s(v) = (\tilde{D}(v) - D(v))^2. \tag{5.16}$$
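Both regularizers of Equations (5.15) and (5.16) are simple to evaluate per voxel, as the following Python sketch shows. The 6-neighbor stencil with unit voxel spacing is an assumed discretization of the Laplacian, and the dictionary-based grids are a hypothetical storage layout.

```python
def laplacian(sdf, v):
    """6-neighbor discrete Laplacian of the SDF at voxel v = (x, y, z).

    ASSUMPTION: unit voxel spacing and all six neighbors present in sdf.
    """
    x, y, z = v
    neighbors = [(x + 1, y, z), (x - 1, y, z),
                 (x, y + 1, z), (x, y - 1, z),
                 (x, y, z + 1), (x, y, z - 1)]
    return sum(sdf[n] for n in neighbors) - 6.0 * sdf[v]


def smoothness_cost(refined, v):
    """Squared Laplacian of the refined SDF at v."""
    return laplacian(refined, v) ** 2


def stabilization_cost(refined, original, v):
    """Squared deviation of the refined from the original distance at v."""
    return (refined[v] - original[v]) ** 2
```

A planar distance field has a zero Laplacian, so the smoothness term penalizes only curvature in the distance values, while the stabilization term penalizes any drift away from the original surface.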
Given spherical harmonics coefficients, the shading computed at a voxel depends on both its albedo and its surface normal. We control the degree to which the albedo or the normal is refined by introducing an additional term that regularizes the albedo. In particular, the 1-ring neighborhood $N_v$ of a voxel is used to constrain albedo changes based on the chromaticity differences between neighboring voxels. This follows the idea that chromaticity changes often coincide with changes of the intrinsic material:
$$E_a(v) = \sum_{u \in N_v} \phi(\Gamma(v) - \Gamma(u)) \cdot (a(v) - a(u))^2, \tag{5.17}$$
where the voxel chromaticity $\Gamma(v) = C(v)/I(v)$ is directly computed from the voxel colors and $\phi(x)$ is a robust kernel with $\phi(x) = 1/(1 + t_{\text{rob}} \cdot x)^3$.
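The effect of the chromaticity weighting in Equation (5.17) can be sketched in Python as follows. Several details are assumptions made for illustration: the intensity is taken as the mean of the color channels, the robust kernel is applied to the Euclidean norm of the chromaticity difference, and the albedo is treated as a scalar per voxel.

```python
import math


def chromaticity(color):
    """Color divided by intensity.

    ASSUMPTION: intensity is the channel mean; the color must not be black.
    """
    intensity = sum(color) / 3.0
    return tuple(c / intensity for c in color)


def robust_kernel(x, t_rob):
    """phi(x) = 1 / (1 + t_rob * x)^3."""
    return 1.0 / (1.0 + t_rob * x) ** 3


def albedo_cost(v, neighbors, colors, albedo, t_rob):
    """Chromaticity-weighted albedo differences over the 1-ring of v."""
    gamma_v = chromaticity(colors[v])
    cost = 0.0
    for u in neighbors:
        # ASSUMPTION: the kernel takes the norm of the chromaticity difference.
        diff = math.dist(gamma_v, chromaticity(colors[u]))
        cost += robust_kernel(diff, t_rob) * (albedo[v] - albedo[u]) ** 2
    return cost
```

Neighbors with equal chromaticity receive the full weight of 1, so their albedos are pulled together, whereas a large chromaticity difference drives the weight towards zero and lets the albedo change across a material boundary.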
5.5.3
Joint Optimization Problem
We jointly solve for all unknown scene parameters stacked in the unknown vector X by minimizing the proposed highly non-linear least squares objective:
$$\mathcal{X}^* = \arg\min_{\mathcal{X}} E_{\text{scene}}(\mathcal{X}). \tag{5.18}$$

We solve the optimization using the well-known Ceres Solver [10], which provides automatic differentiation and an efficient Levenberg-Marquardt implementation.
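To make the Levenberg-Marquardt principle concrete, the Python sketch below minimizes a toy two-parameter non-linear least squares problem with residuals $r_1 = x - 1$ and $r_2 = y - x^2$. It is not the Ceres Solver and does not use its API; the toy problem, the analytic Jacobian, and the simple damping update by factors of ten are assumptions chosen for brevity.

```python
def levenberg_marquardt(x, y, mu=1e-3, max_iters=100, tol=1e-20):
    """Minimize r1^2 + r2^2 with r1 = x - 1 and r2 = y - x^2."""
    def residuals(x, y):
        return x - 1.0, y - x * x

    r1, r2 = residuals(x, y)
    cost = r1 * r1 + r2 * r2
    for _ in range(max_iters):
        # Jacobian rows: dr1 = (1, 0) and dr2 = (j21, 1) with j21 = -2x.
        j21 = -2.0 * x
        # Gradient g = J^T r; stop once its squared norm is negligible.
        g1, g2 = r1 + j21 * r2, r2
        if g1 * g1 + g2 * g2 < tol:
            break
        # Damped normal matrix A = J^T J + mu * I (symmetric 2x2).
        a11, a12, a22 = 1.0 + j21 * j21 + mu, j21, 1.0 + mu
        det = a11 * a22 - a12 * a12
        # Solve A * delta = -g in closed form.
        dx = (-g1 * a22 + g2 * a12) / det
        dy = (g1 * a12 - g2 * a11) / det
        nr1, nr2 = residuals(x + dx, y + dy)
        new_cost = nr1 * nr1 + nr2 * nr2
        if new_cost < cost:
            # Accept the step and trust the Gauss-Newton model more.
            x, y, r1, r2, cost = x + dx, y + dy, nr1, nr2, new_cost
            mu /= 10.0
        else:
            # Reject the step and move towards gradient descent.
            mu *= 10.0
    return x, y, cost
```

Calling `levenberg_marquardt(-1.2, 1.0)` drives both parameters towards the minimizer at (1, 1). The damping parameter `mu` interpolates between Gauss-Newton steps (small `mu`) and short gradient descent steps (large `mu`), which is what makes the method robust on non-convex energies that start from an approximate initialization.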
By jointly refining the SDF and image formation model, we implicitly obtain optimal colors for the reconstruction at minimal re-projection error. In the optimization, the color and shading constraints are directly expressed with respect to the associated input images; however, for the final mesh generation, we recompute the voxel colors in a post-process after the optimization. Finally, we extract a mesh from the refined signed distance field using Marching Cubes [75].
Dataset          # frames   # keyframes   Color resolution   Depth resolution
Fountain [133]   1086       55            1280x1024          640x480
Lucy [135]       100        20            640x480            640x480
Relief [135]     40         8             1280x1024          640x480
Lion             515        26            1296x968           640x480
Tomb Statuary    523        27            1296x968           640x480
Bricks           773        39            1296x968           640x480
Hieroglyphics    919        46            1296x968           640x480
Gate             1213       61            1296x968           640x480
Table 5.1: Test RGB-D datasets used for the evaluation.