
RECONSTRUCTION TECHNIQUES

3.4 PRIOR INFORMATION

region to region depending on the profile of the surface. To deal with this, an adaptive window approach can be used, where the window size and shape are modified locally based on the accuracy of the constant depth assumption [Kanade and Okutomi 1994, Farid et al. 1994]. Another simple variation is the multiple window approach proposed by Fusiello et al. [1997]. Although this is presented as testing several different windows for each point, it can easily be implemented as a two-stage filtering process. Matching likelihoods are first mean filtered, as in standard window-based matching, and then maximum filtered before the minimum is selected.
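The two-stage view of multiple window matching can be sketched on a single scanline. This is a minimal illustration rather than the implementation of Fusiello et al.: the function names, the `cost[d][x]` layout and the clamped borders are all assumptions. It also assumes the matching score is a dissimilarity cost (lower is better), so the second stage is a minimum filter followed by a minimum over disparity; for a score where higher is better, the second stage would be a maximum filter and the final selection a maximum.

```python
def mean_filter(row, radius):
    """Average each value over a centred window, clamped at the borders."""
    n = len(row)
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n, x + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out


def min_filter(row, radius):
    """Keep the lowest value in each centred window, clamped at the borders."""
    n = len(row)
    return [min(row[max(0, x - radius):min(n, x + radius + 1)])
            for x in range(n)]


def multi_window_disparity(cost, radius):
    """Pick a disparity per pixel from cost[d][x] (hypothetical layout).

    Stage 1 aggregates the cost over a window (standard window matching).
    Stage 2 lets each pixel borrow the best window among its neighbours,
    which is equivalent to testing every shifted window containing it.
    """
    aggregated = [min_filter(mean_filter(row, radius), radius) for row in cost]
    width = len(cost[0])
    return [min(range(len(cost)), key=lambda d: aggregated[d][x])
            for x in range(width)]


# Toy scanline: the left half matches at disparity 0, the right half at 1.
cost = [
    [0, 0, 0, 0, 5, 5, 5, 5],  # cost of disparity 0 at each pixel
    [5, 5, 5, 5, 0, 0, 0, 0],  # cost of disparity 1 at each pixel
]
disparity = multi_window_disparity(cost, radius=1)
```

Because each pixel may take its support from a window lying entirely on one side of the depth step, the step in this toy example is recovered at its true position.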

To further improve performance the smoothing function can be extended to 3D by applying a 3D filter. This allows prior knowledge to be implemented more precisely and prevents fronto-planar surfaces from being preferentially reconstructed. However, this approach is not applicable to image based methods, where filtering can only be performed in two dimensions.

Feature-based matching can also be used, where higher level objects called image features are compared. This requires extracting these features from an image and then matching them using some criterion. Typical features include edges, corners, and textures. Because the distribution of features is usually sparse and uneven, the acquired depth map will be incomplete. Extra processing is also required to extract features.

Improved performance can also be achieved through better choice of filtering. Traditionally, smoothness priors have been implemented using mean filters. These are useful for certain types of scenes but are not appropriate around object boundaries where large discontinuities may occur. In such instances median or other such filtering may be more suitable.
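The difference between the two filters at a discontinuity can be seen on a one-dimensional depth profile. The following is a small sketch under assumed names (`filter_1d`, `depth`) with clamped borders; it is not taken from any of the cited methods.

```python
from statistics import median


def mean(window):
    """Arithmetic mean of a non-empty window."""
    return sum(window) / len(window)


def filter_1d(values, radius, reducer):
    """Apply `reducer` to a centred window around each sample (clamped)."""
    n = len(values)
    return [reducer(values[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


# A depth step, such as the boundary between a foreground object
# and the background.
depth = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

mean_smoothed = filter_1d(depth, 1, mean)      # blurs the step
median_smoothed = filter_1d(depth, 1, median)  # leaves the step intact
```

The mean filter pulls the samples either side of the step towards each other, whereas the median filter returns the profile unchanged.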

Another approach is to iteratively filter scene likelihoods using diffusion or relaxation based techniques [Marr and Poggio 1976, Zitnick and Kanade 1999, De Bonet and Viola 1999, Scharstein and Szeliski 1996, Lee et al. 2001]. These have proved popular in recent years due to their ability to perform complex filtering through a number of relatively simple iterative steps.
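As a rough illustration of how repeated simple steps build up a larger filter, the sketch below applies plain linear diffusion to a matching cost along one scanline. It is a generic example and does not reproduce any of the cited algorithms; the `cost[d][x]` layout, the `rate` and `iterations` parameters and the clamped borders are assumptions, and costs are taken to be dissimilarities (lower is better).

```python
def diffuse(cost, rate=0.5, iterations=10):
    """Blend each cell with its left/right neighbours at the same disparity.

    cost[d][x] is a hypothetical cost layout. A new list is returned and
    the input is left unmodified.
    """
    for _ in range(iterations):
        new_cost = []
        for row in cost:
            n = len(row)
            new_row = []
            for x in range(n):
                left = row[max(0, x - 1)]       # clamp at the left border
                right = row[min(n - 1, x + 1)]  # clamp at the right border
                new_row.append((1 - rate) * row[x]
                               + rate * 0.5 * (left + right))
            new_cost.append(new_row)
        cost = new_cost
    return cost


def winner_take_all(cost):
    """Select the lowest-cost disparity independently at each pixel."""
    return [min(range(len(cost)), key=lambda d: cost[d][x])
            for x in range(len(cost[0]))]


# Disparity 0 is correct everywhere, but pixel 2 has a noisy cost.
cost = [
    [0.0, 0.0, 4.0, 0.0, 0.0],  # disparity 0
    [1.0, 1.0, 1.0, 1.0, 1.0],  # disparity 1
]
before = winner_take_all(cost)
after = winner_take_all(diffuse(cost))
```

Each step only mixes immediate neighbours, yet after several iterations the isolated outlier is spread thinly enough that the consistent disparity wins at every pixel.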

3.4.2 Segmentation

As with scene opacities, surface radiances are usually correlated between nearby points. This leads to the important observation that discontinuities in depth generally correspond with sharp changes in intensity in each image. This information can be used to improve the scene estimate by favouring the reconstruction of surfaces whose boundaries correspond with intensity edges in the images. A number of recent algorithms have applied this idea using a segmentation process, where surfaces within the scene are fitted to segments within the images [Lin and Tomasi 2004, Birchfield and Tomasi 1999, Hong and Chen 2004, Bleyer and Gelautz 2007, Sun et al. 2003, Klaus et al. 2006, Yang et al. 2006]. This approach has proved particularly successful, with all of the current top four algorithms on the Middlebury test set¹ using some form of image segmentation.
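The idea of fitting surfaces to image segments can be illustrated in its simplest form by assigning one disparity to each segment. The sketch below is a deliberately reduced example and not the method of any cited paper, which typically fit richer surface models: it assumes the segment labels are already available from some image segmentation, that pixels are stored as flat lists, and that each segment is a constant-disparity surface fitted by the median.

```python
from statistics import median


def fit_constant_per_segment(disparity, labels):
    """Replace each pixel's disparity with the median of its segment.

    `disparity` and `labels` are flat, equal-length lists (hypothetical
    layout); `labels[i]` is the segment id of pixel i.
    """
    groups = {}
    for d, s in zip(disparity, labels):
        groups.setdefault(s, []).append(d)
    fitted = {s: median(values) for s, values in groups.items()}
    return [fitted[s] for s in labels]


# Two segments with one outlier each in the initial disparity estimate.
initial = [2, 2, 7, 2, 5, 5, 1, 5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
refined = fit_constant_per_segment(initial, labels)
```

Since the segment boundary follows an intensity edge, the depth discontinuity in the refined estimate is placed on that edge and the outliers inside each segment are suppressed.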

3.5 OCCLUSIONS

Optimising the objective function for a full system model is extremely difficult because of the complex visibility interaction that occurs between different regions of the scene. As a consequence, a large number of algorithms make various assumptions about the scene visibilities to simplify the system model and optimisation process.

To simplify the problem of dealing with visibilities, three basic approaches can be used. The simplest is to assume that all cameras can see all surface points. Although untrue in most situations, this is a reasonable approximation for scenes containing only a small number of occluded regions. This is the approach used by most traditional depth map based methods, where cameras are usually located close together and face in a similar direction. However, in most cases some occlusions will still occur, often leading to substandard reconstructions. Consequently, this approach is really only suitable for applications such as aerial photogrammetry [Gimel’farb and Zhong 2001, Gruen and Baltsavias 1988, Krupnik 1996], where the scene consists of a single surface visible from all camera positions.

The second approach is to try to estimate a point's visibility by comparing intensities between multiple cameras [Kang et al. 2001]. This can be done in a number of ways. The easiest is to assume that at least M out of N cameras observe each point, with the M cameras chosen to be those most consistent with the data. Alternatively, the visibility patterns can be constrained by fitting masks to the set of images [Park and Inoue 1998, Farid et al. 1994]. This allows the spatial relationship between cameras to be used in addition to the observed intensities. A detailed comparison of these techniques is given by Satoh and Ohta [1996].
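The M out of N assumption amounts to keeping only the M best-matching cameras when a candidate point is scored. A minimal sketch follows, assuming each camera contributes a non-negative dissimilarity cost for the point; the function name and return format are hypothetical.

```python
def select_visible_cameras(camera_costs, m):
    """Treat the m cameras with the lowest cost as those that see the point.

    Returns the indices of the selected cameras and their summed cost.
    """
    if not 1 <= m <= len(camera_costs):
        raise ValueError("m must be between 1 and the number of cameras")
    order = sorted(range(len(camera_costs)), key=lambda i: camera_costs[i])
    visible = sorted(order[:m])
    return visible, sum(camera_costs[i] for i in visible)


# Camera 2 disagrees strongly with the others, e.g. because the point
# is occluded in that view.
visible, cost = select_visible_cameras([1, 2, 90, 3], m=3)
```

Here the occluded view is excluded from the score, so a correct candidate is not penalised for the inconsistent intensity seen by camera 2.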

The third and most accurate approach is to iteratively calculate a point’s visibility based on the current scene estimate. The improved visibilities are then used to obtain a new, and hopefully more accurate, estimate of the scene. This process can be performed in a single sweep of the scene volume [Seitz and Dyer 1999] or iteratively over large local search spaces [Kolmogorov and Zabih 2001, Kolmogorov et al. 2003]. Although computationally more intensive, this approach leads to an estimate of the scene that is consistent with the detailed scene model described by Theorem 3. This approach is good for dealing with complex occlusions and is the basis of the dynamic belief propagation algorithm presented in Chapter 6.