Chapter 2 Background
2.1.2 The 3D Reconstruction Process
The 3D reconstruction process produces a 3D digital model by acquiring and integrating multiple sets of range data of an object or environment. It is assumed that each set of range data is a range image—an organized point cloud—and this section focuses on reconstruction methods for organized point clouds.
The 3D reconstruction of an object or environment typically consists of the following steps, shown in Figure 2.1: (1) range acquisition, (2) range image registration, (3) merging of range images, and (4) post-processing. Registration, merging and post-processing are usually very time-consuming, so they are normally done offline, after all the range scans have been acquired. Each step is described below; more detailed examinations of some of the steps can be found in [Forsyth2002] and [Hilton1997a].
Figure 2.1: Steps in the 3D reconstruction process.
(1) Range acquisition. In the range acquisition step, the range scanner is positioned at a set of different poses (and/or the object is moved with respect to the scanner) to scan different parts of the object. Imaging parameters of the scanner, such as the sampling density and field of view, can be set differently at each pose. A range image is produced when the scanner makes a “scan” of the object from a given pose and obtains a 2D array of range points expressed in the scanner’s local imaging coordinate frame. The objective of the acquisition step is to acquire range images that cover as much of the object’s surface as possible. This almost always requires multiple scans, because of self-occlusion by the object and because of the imaging limitations of the scanner, such as a limited range and a limited field of view. If the 3D model to be reconstructed has to meet some quality requirements, then the acquisition process must attempt to satisfy them. For example, to achieve a certain minimum sampling density on all visible object surface regions, the scanner must be placed at an appropriate distance and orientation from each surface region. If the range images have to be registered after the acquisition step, then every scan must overlap sufficiently with at least one or two other scans, and the overlapping regions must have sufficient geometric complexity to constrain the relative 3D rigid-body transformation between the range images. To ensure that these constraints and requirements are satisfied, proper planning of the scanner’s positions and orientations (and imaging parameters) is necessary.
(2) Range image registration. The next step is to register the multiple range images so that they are aligned with one another in a common coordinate frame. If the pose of the scanner is precisely known when each scan is made, registration is unnecessary: the scans only have to be transformed into a common coordinate frame using the known poses. However, this is normally not the case, because positioning and pose measurement systems always have errors, and their accuracy is usually much lower than what the depth accuracy and sampling density of the scanner demand.
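The known-pose case mentioned above amounts to one rigid-body transformation per range point. The following is a minimal sketch, assuming the pose is given as a 3x3 rotation matrix `R` and a translation `t` that map scanner coordinates to the common frame, and that missing samples are stored as `None`. The names `rot_z` and `to_common_frame` are hypothetical and chosen only for illustration.

```python
import math

def rot_z(angle_rad):
    """Rotation about the z axis; used here only to build an example pose."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_common_frame(range_image, R, t):
    """Apply p' = R p + t to every valid sample of an organized point cloud.

    range_image: 2D list (rows x cols) of (x, y, z) tuples, None where the
    scanner returned no sample. The 2D organization is preserved.
    """
    out = []
    for row in range_image:
        out_row = []
        for p in row:
            if p is None:
                out_row.append(None)  # keep dropouts in place
            else:
                out_row.append(tuple(
                    R[i][0] * p[0] + R[i][1] * p[1] + R[i][2] * p[2] + t[i]
                    for i in range(3)))
        out.append(out_row)
    return out

# Example: a 2x2 range image with one missing sample, and an assumed pose
# (90 degree rotation about z, then a shift of 10 units along x).
scan = [[(1.0, 0.0, 2.0), None],
        [(0.0, 1.0, 2.0), (1.0, 1.0, 2.0)]]
aligned = to_common_frame(scan, rot_z(math.pi / 2), (10.0, 0.0, 0.0))
```

When the pose is only approximately known, the same transformation serves as the rough initial alignment that registration then refines.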
For all the range images to be registered successfully, every range image must overlap sufficiently with at least one other range image, and the overlapping regions must have sufficient shape complexity. Most registration algorithms require the range images to be roughly aligned before they start [Chen1992, Nishino2002]. This rough alignment may come from the approximate pose data provided by the tracking or positioning system of the scanner, or from a human operator who manually positions the range images. After that, an iterative optimization method is usually used to achieve a more precise alignment among all the range images [Pulli1999, Bergevin1996, Nishino2002]. Many of the registration algorithms that simultaneously align multiple range images are based on the idea of the Iterative Closest Point (ICP) algorithm [Besl1992, Chen1992, Rusinkiewicz2001]. The ICP algorithm itself performs only pair-wise registration, that is, it aligns two surfaces at a time.
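The pair-wise ICP idea can be sketched as a loop that alternates closest-point matching with a least-squares rigid fit. The sketch below is deliberately simplified and is not the formulation of any of the cited papers: it assumes 2D points rather than 3D surfaces (so the optimal rotation has a closed form), a brute-force nearest-neighbor search, a fixed iteration count, and no outlier rejection. All function names are hypothetical.

```python
import math

def nearest(p, pts):
    """Closest point to p in pts (brute force)."""
    return min(pts, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    scx = sum(p[0] for p in src) / n
    scy = sum(p[1] for p in src) / n
    dcx = sum(q[0] for q in dst) / n
    dcy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - scx, py - scy      # centered source point
        bx, by = qx - dcx, qy - dcy      # centered matched point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dcx - (c * scx - s * scy), dcy - (s * scx + c * scy)

def transform(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(moving, fixed, iterations=10):
    """Align `moving` to `fixed`; assumes they are already roughly aligned."""
    current = list(moving)
    for _ in range(iterations):
        matches = [nearest(p, fixed) for p in current]
        theta, tx, ty = best_rigid_2d(current, matches)
        current = transform(current, theta, tx, ty)
    return current

# Example: an L-shaped point set and a slightly rotated, shifted copy of it.
fixed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
moving = transform(fixed, 0.1, 0.1, -0.05)
aligned = icp_2d(moving, fixed)
```

The example starts from a small misalignment, which reflects the rough-alignment requirement stated above: with a poor starting pose the closest-point matches are mostly wrong and the loop can settle in a bad local minimum.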
(3) Merging of range images. After the scans have been registered, there is redundant data at the overlapping regions. The objective of the merging step is to combine surface information in all the scans into a single non-redundant model. An ideal merging algorithm should take into consideration the physical characteristics of the scanner and the measurement quality or uncertainty of each individual range point, so as to create the most plausible model. Many of the merging methods for organized point clouds can be classified as surface mesh integration approaches [Turk1994, Boissonnat1984, Rutishauser1994, Soucy1995, Pito1996b], or volumetric approaches [Curless1996, Hilton1997, Roth1997].
The zippering method [Turk1994] proposed by Turk and Levoy is a surface mesh integration approach. Each scan is first converted to a triangle mesh. The redundant surfaces at the overlapping regions between each pair of meshes are removed by eroding their boundaries until they just meet. Then, the boundary triangles on one mesh are clipped against those on the other mesh. Next, the redundant clipped triangles are removed, and the two triangle meshes are joined together by re-triangulation of the boundary region. After all the meshes have been zippered together, each vertex in the final model is moved to a consensus position given by a weighted average of positions from the original range images.
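The first step above, converting a scan to a triangle mesh, exploits the 2D organization of a range image: neighboring samples in the array are connected unless they are separated by a depth discontinuity. The following is a rough sketch under simplifying assumptions that are not taken from [Turk1994]: each 2x2 cell is split along a fixed diagonal, and a triangle is rejected when the depth spread of its three corners exceeds a hypothetical threshold `max_jump`.

```python
def triangulate_range_image(depth, max_jump):
    """Build triangles over an organized depth grid.

    depth: 2D list of depth values, None for missing samples.
    Returns a list of triangles, each a triple of (row, col) grid indices.
    """
    def usable(a, b, c):
        vals = [depth[r][k] for r, k in (a, b, c)]
        if any(v is None for v in vals):
            return False                         # a corner is a dropout
        return max(vals) - min(vals) <= max_jump  # reject depth discontinuities

    rows, cols = len(depth), len(depth[0])
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            p00, p01 = (r, c), (r, c + 1)
            p10, p11 = (r + 1, c), (r + 1, c + 1)
            # Two candidate triangles per grid cell, split along one diagonal.
            for tri in ((p00, p10, p01), (p01, p10, p11)):
                if usable(*tri):
                    triangles.append(tri)
    return triangles

# Example: the right-hand column lies on a far surface, so no triangle
# bridges the gap between the near and far samples.
depth = [[1.0, 1.0, 5.0],
         [1.0, 1.1, 5.0]]
mesh = triangulate_range_image(depth, max_jump=0.5)
```

Leaving discontinuities unbridged matters for the later steps, because a triangle that spans a depth jump would represent a surface the scanner never observed.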
An example of a volumetric approach is the method proposed by Curless and Levoy [Curless1996]. Each range image is first scan-converted to a discrete signed distance function represented in a 3D uniform grid. Then, one at a time, each signed distance function is weighted by its corresponding weight function and accumulated into a resulting signed distance function. The weight function represents the measurement confidence at each point of the range image. Finally, using the marching cubes algorithm [Lorensen1987], the final model is generated by extracting an isosurface from the volumetric grid of the resulting signed distance function.
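The weighted accumulation described above can be illustrated on a single line of voxels instead of a full 3D grid. This is a minimal sketch under stated assumptions, not the implementation of [Curless1996]: each scan contributes one surface depth and one constant weight along the line, the signed distance is taken as positive in front of the surface, contributions are truncated beyond a hypothetical distance `trunc`, and the zero crossing is found by linear interpolation as a 1D stand-in for isosurface extraction.

```python
def integrate(voxel_x, scans, trunc):
    """Accumulate weighted signed distances on a 1D row of voxels.

    voxel_x: voxel positions along the line of sight.
    scans:   list of (surface_depth, weight) pairs, one per range image,
             with weight > 0.
    Returns (D, W): the running weighted-average distance and total weight.
    """
    D = [0.0] * len(voxel_x)
    W = [0.0] * len(voxel_x)
    for depth, w in scans:
        for i, x in enumerate(voxel_x):
            d = depth - x               # positive in front of the surface
            if abs(d) > trunc:
                continue                # this scan says nothing about the voxel
            D[i] = (W[i] * D[i] + w * d) / (W[i] + w)
            W[i] += w
    return D, W

def zero_crossing(voxel_x, D, W):
    """Position where D changes sign between two observed voxels, or None."""
    for i in range(len(voxel_x) - 1):
        if W[i] > 0 and W[i + 1] > 0 and D[i] > 0 >= D[i + 1]:
            frac = D[i] / (D[i] - D[i + 1])
            return voxel_x[i] + frac * (voxel_x[i + 1] - voxel_x[i])
    return None

# Example: two scans see the surface at 1.2 and 1.4; the second is trusted
# three times as much as the first.
voxel_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
D, W = integrate(voxel_x, [(1.2, 1.0), (1.4, 3.0)], trunc=1.0)
surface = zero_crossing(voxel_x, D, W)
```

In this example the fused surface lands at the confidence-weighted mean of the two measurements, which is the behavior the weight function is meant to produce.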
The resulting 3D model from the above merging methods is a polygonal boundary representation of the 3D object.
(4) Post-processing. Many types of post-processing operations can be applied to the merged model. Some of the common ones are hole filling, simplification, surface fitting, and surface fairing. Holes in the reconstructed model can come from several sources. Self-occlusion by the object, together with the physical limitations in positioning the scanner, is one unavoidable cause. Another is difficult reflectance properties of the object surface: both overly absorbent and overly reflective surfaces can cause “drop outs,” that is, missing samples in the range images. A third is human error: some visible parts of the object may simply be missed during range acquisition. Holes in the reconstructed model are difficult to patch automatically unless additional knowledge about the object is provided to the hole-filling algorithm. This knowledge can be provided interactively by a human operator [Wang2002], or the algorithm can rely on heuristic assumptions about the missing surfaces [Wang2003a, Davis2002].
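Before a hole can be filled it has to be located. In a triangle mesh, a common way to do this is to look for boundary edges, that is, edges used by exactly one triangle. The following is a minimal sketch, assuming triangles are given as triples of integer vertex indices; the function name `boundary_edges` is hypothetical.

```python
from collections import Counter

def boundary_edges(triangles):
    """Return the edges that belong to exactly one triangle, sorted."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1   # undirected edge key
    return sorted(edge for edge, n in counts.items() if n == 1)

# Example: a closed tetrahedron has no boundary; removing one face
# leaves a triangular hole bounded by three edges.
closed = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
with_hole = closed[:3]
hole_edges = boundary_edges(with_hole)
```

Boundary edges also occur along the outer border of an open surface, so a hole-filling method still has to decide which boundary loops are genuine holes, which is one place where the additional knowledge mentioned above comes in.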
If the merged model is a polygon mesh, it usually contains an excessive number of polygons. The model can be simplified by reducing the number of polygons in low-curvature regions [Schroeder1992, Garland1997]. Another way of optimizing or approximating the merged model is to fit higher-order surface primitives to its surface; a good review of some of these methods can be found in [Söderkvist1999]. The model may also be smoothed by a surface fairing method, for example [Taubin1995], to reduce the effects of noise in the range images.
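One simple family of fairing schemes alternates a shrinking Laplacian step with a slightly stronger inflating step (often called λ|μ smoothing), so that high-frequency noise is damped while the overall shape is largely preserved. The sketch below illustrates that idea only. It assumes uniform neighbor weights, a vertex adjacency map supplied by the caller, and illustrative parameter values `lam=0.5` and `mu=-0.53`; it is shown on a closed 2D ring of vertices for brevity, although the same code works for 3D vertex tuples.

```python
import math

def laplacian_step(verts, neighbors, factor):
    """Move every vertex by factor * (mean of its neighbors - vertex)."""
    moved = []
    for i, v in enumerate(verts):
        ring = neighbors[i]
        mean = [sum(verts[j][k] for j in ring) / len(ring) for k in range(len(v))]
        moved.append(tuple(v[k] + factor * (mean[k] - v[k]) for k in range(len(v))))
    return moved

def fair(verts, neighbors, lam=0.5, mu=-0.53, iterations=10):
    """Alternate a shrinking step (lam > 0) and an inflating step (mu < -lam)."""
    for _ in range(iterations):
        verts = laplacian_step(verts, neighbors, lam)
        verts = laplacian_step(verts, neighbors, mu)
    return verts

def roughness(verts, neighbors):
    """Sum of squared distances between each vertex and its neighbor mean."""
    flat = laplacian_step(verts, neighbors, 1.0)
    return sum(sum((a - b) ** 2 for a, b in zip(v, f)) for v, f in zip(verts, flat))

# Example: 16 vertices on a unit circle with alternating radial noise of 0.1.
n = 16
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
noisy = [((1 + 0.1 * (-1) ** i) * math.cos(2 * math.pi * i / n),
          (1 + 0.1 * (-1) ** i) * math.sin(2 * math.pi * i / n)) for i in range(n)]
smooth = fair(noisy, ring)
```

Using a plain Laplacian step alone would also remove the noise, but repeated application shrinks the model; the inflating step is what counteracts that shrinkage.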