Image-Based Modeling and Rendering - Previous Work on View Planning

Chapter 2 Background

2.2 Previous Work on View Planning

2.2.4 Image-Based Modeling and Rendering

In image-based modeling and rendering (IBMR), a 3D scene is modeled as a collection of reference images, rather than a set of conventional geometric primitives. Novel views of a scene can then be synthesized from the reference images using a variety of interpolation and re-projection techniques, for example [Chen1993] and [McMillan1995]. IBMR has several important advantages over traditional modeling and rendering: (i) the images can be of real-world scenes, thus making the task of modeling such scenes much easier; (ii) image-based rendering algorithms are typically inexpensive and can be performed on general-purpose computers; (iii) the rendering time is typically independent of the geometrical and physical complexity of the scene being rendered.

A complete IBMR solution involves both sampling and reconstruction of the plenoptic function [McMillan1995], which is a 5-dimensional function that describes all the infinite-resolution images that can be seen at all viewpoints in space (it can actually be reduced to a 4-dimensional function of all rays in space). Most of the research has been on reconstruction algorithms (efficient warping, splatting, etc.), and research on the important problem of properly sampling the plenoptic function has been less active. Proper sampling typically involves acquiring images of the scene from some appropriately selected viewpoints and view directions, and at the appropriate resolutions. The following three papers propose different techniques to attempt to solve the sampling problem for synthetic models.

Given a polygonal 3D model of a scene, a viewing volume, and a sampling quality threshold, Fleishman et al. [Fleishman1999] proposed a method to compute a small set of viewpoints in the viewing volume such that every polygon visible from the viewing volume is visible from at least one of the computed viewpoints. Each visible polygon is associated with only one viewpoint, even though it may be visible to many. The end result is an image-based model made up of a set of masked reference images, where each masked reference image from a viewpoint consists of pixels from the polygons associated with that viewpoint.

Their method starts by subdividing the polygons in the input 3D model to reduce the problem of polygons being partially visible from the viewpoints in the view volume. The next step evaluates the visibility of the polygons from each viewpoint in the view volume. Since rays that go into the interior of the view volume always intersect the volume’s boundary, every surface point visible from the view volume is visible from the volume’s boundary. Therefore, only viewpoints on the volume’s boundary are evaluated.

The boundary is first tessellated into small patches, and then a hemispherical image of the input scene is rendered from the center of each patch. Each polygon in the scene is rendered in its unique color. In practice, each hemispherical image is rendered as multiple planar images. Each planar image is traversed and the colors of the pixels are used to determine which scene polygons are visible, and the number of pixels of each color is used to determine the sampling quality of the polygon. Each visible polygon and its sampling quality are associated with the planar camera (view), and this information is put into a database.
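The per-image bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the image is assumed to be a 2D list of integer polygon IDs standing in for the unique colors, `BACKGROUND`, `tally_item_buffer`, and `min_pixels` are hypothetical names, and a plain pixel-count threshold stands in for the sampling quality measure, which the text does not define.

```python
from collections import Counter

BACKGROUND = 0  # assumed ID for pixels that show no polygon


def tally_item_buffer(item_buffer, min_pixels):
    """Summarize one planar image rendered with one unique ID per polygon.

    Returns {polygon_id: (pixel_count, adequately_sampled)}. The pixel-count
    threshold is an assumed stand-in for the sampling quality measure.
    """
    counts = Counter(
        pid for row in item_buffer for pid in row if pid != BACKGROUND
    )
    return {pid: (n, n >= min_pixels) for pid, n in counts.items()}


# Toy 3x4 image: polygon 1 covers six pixels, polygon 2 two, polygon 3 one.
image = [
    [0, 1, 1, 2],
    [0, 1, 1, 2],
    [3, 1, 1, 0],
]
result = tally_item_buffer(image, min_pixels=2)
# result == {1: (6, True), 2: (2, True), 3: (1, False)}
```

Each entry of `result` corresponds to one record that would be stored in the database for this planar camera.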

A greedy strategy is used to select a subset of cameras. For each camera in the database, a list of adequately sampled polygons is computed. At each iteration, the remaining cameras are ranked according to the number of adequately sampled polygons that each one adds to those already covered by previously selected cameras, and the top-ranked camera is selected. This repeats until all the visible polygons are adequately covered by the union of the selected cameras. For the alternative problem of selecting only k camera positions that cover as much of the scene as possible, Fleishman et al. proposed another camera ranking strategy. Polygons that are most likely to be seen should be favored, so for each scene polygon, the number of cameras that can see it is counted, and a camera is ranked according to the sum of these counts over its associated polygons. From these selected camera positions (and parameters), the masked reference images are generated.
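The first ranking strategy is essentially a greedy set cover, which can be sketched as below. The data layout (a dictionary from camera ID to the set of polygon IDs it samples adequately) and the name `greedy_select` are assumptions made for illustration; ties are broken by camera ID, which the source does not specify.

```python
def greedy_select(adequate):
    """Greedy cover over cameras.

    adequate: dict mapping camera ID -> set of adequately sampled polygon IDs.
    Returns the selected camera IDs in selection order.
    """
    uncovered = set().union(*adequate.values())
    selected = []
    while uncovered:
        # Rank cameras by how many still-uncovered polygons they would add.
        best = max(sorted(adequate), key=lambda c: len(adequate[c] & uncovered))
        gain = adequate[best] & uncovered
        if not gain:
            break  # defensive: no camera adds anything new
        selected.append(best)
        uncovered -= gain
    return selected


cameras = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1, 7},
}
print(greedy_select(cameras))  # ['C', 'A']
```

Camera C is picked first because it adds four polygons; A then covers the remaining three, so B and D are never needed.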

Stuerzlinger [Stuerzlinger1999] addressed a similar view planning problem for IBMR, but dealt with only the visibility portion of the problem, and did not consider surface sampling quality. A hierarchical visibility algorithm is used to compute the visibility between the viewing regions (viewing volumes) and the scene surfaces. A similar algorithm has been used in hierarchical radiosity [Hanrahan1991]. The basic idea is that if a surface polygon and a viewing region are partially visible to each other, then one of them is subdivided and the visibility computation is continued on the smaller viewing regions or polygons. This process repeats recursively until the “shaft” between the surface polygon and the viewing region is completely unoccluded or completely occluded, or when the potential visibility error is below a threshold. A link is created if the surface polygon and the viewing region are completely or partially visible to each other. Shaft culling [Haines1994] is used to speed the visibility determination. The result is a hierarchy of surface polygons and a hierarchy of viewing regions, with links between the polygons and viewing volumes.

The next step of the algorithm is to select a set of good viewing regions that can see all the visible surfaces. The method starts by enumerating all the finest viewing regions and all the finest subdivided polygons. The links between them are used to create a two-dimensional visibility array, which is indexed by viewing region and polygon number. Each array entry is set if the polygon and the viewing region are partially or completely visible to each other. After this, a global optimization search, using simulated annealing, is performed to select a small set of viewing regions. The search proceeds by randomly perturbing the current solution vector, that is, the current choice of viewing regions. The objective function is the total surface area of all polygons visible from the candidate viewing regions. The method calls the optimization procedure repeatedly with increasing numbers of viewing regions. As soon as the maximum total surface area is reached, the loop terminates.
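The selection loop can be sketched as follows, under several assumptions that are not taken from the paper: the perturbation swaps one chosen region for an unchosen one, the cooling schedule is geometric, and the step count, starting temperature, and all function names are hypothetical.

```python
import math
import random


def covered_area(regions, visible, area):
    """Total area of polygons visible from at least one of the given regions."""
    seen = set()
    for r in regions:
        seen |= {p for p, flag in enumerate(visible[r]) if flag}
    return sum(area[p] for p in sorted(seen))


def anneal(visible, area, k, rng, steps=2000, t0=1.0, cooling=0.995):
    """Search for k viewing regions that maximize the covered area."""
    n = len(visible)
    current = rng.sample(range(n), k)
    cur_val = covered_area(current, visible, area)
    best, best_val = list(current), cur_val
    t = t0
    for _ in range(steps):
        unused = [r for r in range(n) if r not in current]
        if not unused:
            break  # every region is already selected
        cand = list(current)
        cand[rng.randrange(k)] = rng.choice(unused)  # random perturbation
        val = covered_area(cand, visible, area)
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / t):
            current, cur_val = cand, val
            if val > best_val:
                best, best_val = list(cand), val
        t *= cooling
    return sorted(best), best_val


def select_regions(visible, area, seed=0):
    """Increase k until the maximum total visible area is reached."""
    rng = random.Random(seed)
    target = covered_area(range(len(visible)), visible, area)
    for k in range(1, len(visible) + 1):
        regions, val = anneal(visible, area, k, rng)
        if val >= target:
            return regions, val
    return list(range(len(visible))), target


# visible[r][p] is 1 if viewing region r and polygon p are linked.
visible = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
]
area = [2.0, 1.0, 3.0, 1.5, 0.5]
regions, val = select_regions(visible, area)
```

In this toy array no single region sees every polygon, and regions 2 and 3 are the only pair that together see all five, so the loop is expected to stop at two regions.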

Finally, for each selected viewing region, a good viewpoint in it is found. This search is also performed using simulated annealing, and the objective is to maximize the total visible surface area. However, the computed viewpoint does not always see all the surfaces that are visible from the viewing region. Stuerzlinger argued that this is not very common, and moreover, surfaces that are missed by this viewpoint may be visible to viewpoints in other viewing regions.

In the work by Wilson and Manocha [Wilson2003], the objective is to compute image-based simplifications of a large and complex synthetic environment so that the environment can be rendered at interactive rates for walkthrough applications. In the preprocessing phase, a set of viewpoints is computed, and from each of them, the environment is sampled with a panoramic color image with range (depth) at each pixel. Since the panoramic images may overlap one another, the next step removes redundant samples in the images. For each image, the remaining samples are used to create textured polygonal meshes, called textured depth meshes, and care is taken to avoid connecting samples across depth discontinuities. Each mesh is then simplified to reduce the number of polygons, and the result is stored in a database, ready to be used for rendering during walkthrough of the environment.

The set of viewpoints used to generate the panoramic images is computed in the following way. First, the large environment is decomposed into smaller sections, and a set of viewpoints is computed for each section independently. Each section has a navigable region, and all computed viewpoints must lie within it. A 2D rectangular region is assumed in the paper.

A greedy approach is used to select the viewpoints. First, viewpoints at the four corners of the navigable region are put into the solution set. Subsequently, at each iteration, a set of candidate viewpoints is generated, and an objective function is evaluated to select the best candidate as the next viewpoint. To generate the candidate viewpoints, a 2D Voronoi diagram is created from the viewpoints already in the solution set, and the candidate viewpoints are the Voronoi vertices inside the navigable region, and the intersections of the Voronoi edges with the navigable region’s boundary. The objective function is evaluated for each candidate viewpoint, and the one that can see the largest projected area of the global void surfaces (or occlusion surfaces) is chosen. The global void surfaces are the resulting skins when the surfaces and skins (occlusion surfaces) seen by each of the previous viewpoints are merged together.
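The candidate generation step can be illustrated with a brute-force sketch. It uses the fact that a Voronoi vertex is the circumcenter of three sites whose circumcircle contains no other site. This is an illustrative substitute rather than the construction used in the paper: it only finds Voronoi vertices inside an axis-aligned rectangular navigable region, omits the intersections of Voronoi edges with the region's boundary, and does not model the void-surface objective. All names are hypothetical.

```python
from itertools import combinations


def circumcenter(a, b, c):
    """Circumcenter of triangle abc, or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


def dist2(p, q):
    """Squared Euclidean distance between 2D points p and q."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def voronoi_vertex_candidates(sites, xmin, ymin, xmax, ymax, eps=1e-9):
    """Voronoi vertices of the sites that lie inside the rectangle."""
    found = []
    for a, b, c in combinations(sites, 3):
        center = circumcenter(a, b, c)
        if center is None:
            continue
        if not (xmin <= center[0] <= xmax and ymin <= center[1] <= ymax):
            continue
        r2 = dist2(center, a)
        # Reject if any site lies strictly inside the circumcircle.
        if any(dist2(center, s) < r2 - eps for s in sites):
            continue
        # Skip duplicates produced by cocircular sites.
        if all(dist2(center, f) > eps for f in found):
            found.append(center)
    return found


# Start from the four corners of a 4 x 2 navigable region.
corners = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(voronoi_vertex_candidates(corners, 0, 0, 4, 2))  # [(2.0, 1.0)]
```

With only the four corners in the solution set, the single interior candidate is the center of the rectangle. Once a viewpoint there is added to the sites, the same function yields new candidates at (0.75, 1.0) and (3.25, 1.0).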
