Volumetric modelling from multiple camera images


3.3.6 Volumetric modelling from multiple camera images

Volumetric reconstruction approaches [20], [112] are concerned with modelling the volume occupied by an object. [77] first described a method by which a volumetric "voxel" model of an object could be constructed from multiple views. A voxel may be regarded as a three-dimensional pixel: it is a discretisation of three-dimensional space. As such, the conceptual step from two-dimensional image pixels to voxels is a logical transition that is easy to understand.
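The discretisation can be illustrated with a minimal sketch that maps a point in space to the index of the voxel containing it, in the same way that an image coordinate is mapped to a pixel. The function names, the axis-aligned grid, and the cubic voxel of side `voxel_size` are assumptions made for illustration and are not taken from the cited works.

```python
from math import floor


def world_to_voxel(point, origin, voxel_size):
    """Return the integer (i, j, k) index of the voxel containing a 3D point.

    Assumes an axis-aligned grid whose corner is at `origin` and whose
    voxels are cubes of side `voxel_size` (illustrative assumptions).
    """
    return tuple(floor((p - o) / voxel_size) for p, o in zip(point, origin))


def voxel_centre(index, origin, voxel_size):
    """Return the 3D position of the centre of the voxel at `index`."""
    return tuple(o + (i + 0.5) * voxel_size for i, o in zip(index, origin))


if __name__ == "__main__":
    idx = world_to_voxel((0.26, 0.0, 0.99), (0.0, 0.0, 0.0), 0.25)
    print(idx, voxel_centre(idx, (0.0, 0.0, 0.0), 0.25))
```

The voxel centre is the point that would typically be projected into each camera image when testing whether the voxel is occupied.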

Early approaches created low resolution, fairly rough models. This is because extending the concept of a pixel into three dimensions significantly increases memory requirements, and correspondingly CPU overhead. To overcome this and allow the generation of higher definition models, the octree representation was conceived. Octrees are the three-dimensional equivalent of the two-dimensional quadtree and the one-dimensional binary tree, and were first described as an efficient method for representing geometric models by [57]. [22] used octrees to represent a volumetric model constructed from three orthographic projections. The octree representation of a voxel object significantly reduces the memory required to store the object by progressively subdividing space into smaller volumes only where the object exhibits more detail. As well as reducing memory requirements, the octree can be constructed from images in ways that greatly increase the speed of reconstruction. [96] described such an approach, which enabled octree models to be built from silhouette images of an object. This method was improved upon by [117], who introduced an incremental refining process resulting in near real-time reconstruction of a 64x64x64 voxel representation of an object.
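The memory saving offered by an octree can be seen in a small sketch. It builds the tree from an already known set of occupied voxels, which is a simplification: the methods of [96] and [117] construct the octree directly from silhouette images. The names `build_octree` and `count_nodes`, and the representation of leaves as the strings "full" and "empty", are assumptions chosen for illustration.

```python
def build_octree(occupied, x, y, z, size):
    """Recursively subdivide a cube of side `size` (a power of two).

    `occupied` is a set of (i, j, k) voxel indices. A cube that is
    entirely full or entirely empty becomes a single leaf; any other
    cube is split into eight children of half the side length.
    """
    count = sum(
        1
        for i in range(x, x + size)
        for j in range(y, y + size)
        for k in range(z, z + size)
        if (i, j, k) in occupied
    )
    if count == 0:
        return "empty"
    if count == size ** 3:
        return "full"
    half = size // 2
    return [
        build_octree(occupied, x + dx, y + dy, z + dz, half)
        for dx in (0, half)
        for dy in (0, half)
        for dz in (0, half)
    ]


def count_nodes(node):
    """Count all nodes (internal and leaf) in the octree."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


if __name__ == "__main__":
    # A 4x4x4 solid block in the corner of an 8x8x8 grid of 512 voxels.
    block = {(i, j, k) for i in range(4) for j in range(4) for k in range(4)}
    print(count_nodes(build_octree(block, 0, 0, 0, 8)))
```

Space is subdivided only where full and empty voxels are mixed, so large uniform regions collapse into single leaves.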

The space carving algorithm proposed by [64] is a volumetric reconstruction method that introduces the concept of the photo hull. This addresses two drawbacks of the visual hull concept. Firstly, the visual hull cannot be calculated for images where there are no background pixels, for example when reconstructing an entire scene rather than an object placed within the scene. Secondly, the visual hull cannot represent surface concavities, whereas the photo hull, which is constructed by adhering to photo-consistency constraints, can. However, the colour consistency check across multiple views is an expensive operation, and therefore this method is not usually considered a candidate for real-time applications.
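A photo-consistency check can be sketched as follows. The measure used here, a per-channel standard deviation compared against a fixed threshold, is one simple choice assumed for illustration and is not necessarily the test used in [64]. The `observe` callback is hypothetical and stands in for projecting a voxel into every image in which it is visible. The sketch also performs a single pass and ignores the visibility ordering that a full space carving implementation must maintain.

```python
from statistics import pstdev


def is_photo_consistent(colours, threshold=10.0):
    """Return True if the colours seen for one voxel agree across views.

    `colours` is a list of (r, g, b) tuples, one per view in which the
    voxel is visible. The threshold value is an illustrative assumption.
    """
    if len(colours) < 2:
        return True  # nothing to compare against
    return all(pstdev(channel) <= threshold for channel in zip(*colours))


def carve(voxels, observe, threshold=10.0):
    """Keep only the voxels whose observed colours are consistent.

    `observe(voxel)` is a hypothetical callback returning the list of
    colours the voxel projects to in the views that can see it.
    """
    return {v for v in voxels if is_photo_consistent(observe(v), threshold)}


if __name__ == "__main__":
    samples = {
        (0, 0, 0): [(200, 10, 10), (205, 12, 9)],   # similar in both views
        (1, 0, 0): [(200, 10, 10), (10, 200, 10)],  # conflicting colours
    }
    print(carve(samples.keys(), samples.get))
```

The cost noted above arises because this comparison must be repeated for every surface voxel, across every view, each time the visible surface changes.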

Volumetric reconstruction can easily be broken down into computational tasks that can be processed in parallel, because the occupancy of each voxel can be calculated independently of the others. [123] propose such an approach, in which computation is distributed over a network of several computers. Silhouette images are captured by capture nodes, which are also responsible for calculating their projection into the volume to be reconstructed. The reconstruction volume is broken down into a number of slices beginning with the "base plane". Each silhouette is projected onto the base plane using a homography, and its projection onto successively higher planes can then be obtained using computationally less expensive scale and translate operations. Since the volume is now broken down into many slices which can be worked upon independently of one another, groups of slices can be sent to processing nodes on the network, which calculate the intersection of the projected silhouettes on each plane. The resulting intersected slices are then brought back together to form the overall voxel set which constitutes the visual hull. In this way, higher resolution voxel reconstruction is achieved through distributed processing. The paper presents this as a method for capturing human motion.
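The slice-based scheme can be sketched under some simplifying assumptions: a pinhole camera with known centre (cx, cy, cz), slices parallel to the base plane z = 0, and each silhouette already projected onto the base plane and supplied as a footprint predicate. For such a camera, the ray through a base-plane point p reaches height h at C + s(p - C) with s = 1 - h/cz, so the footprint on the plane z = h is the base-plane footprint scaled by s and translated. All function names are hypothetical, and a thread pool merely stands in for the networked processing nodes of [123].

```python
from concurrent.futures import ThreadPoolExecutor


def to_base_plane(x, y, h, cam):
    """Map a point (x, y) on the plane z = h back to the base plane z = 0
    along the ray through the camera centre `cam` = (cx, cy, cz).
    Requires h < cz. Only a scale and a translation are involved."""
    cx, cy, cz = cam
    s = 1.0 - h / cz  # scale of plane z = h relative to the base plane
    return cx + (x - cx) / s, cy + (y - cy) / s


def carve_slice(h, cameras, footprints, cells):
    """Return the cells of one slice that lie inside every silhouette cone.

    `footprints[i](x, y)` is True where camera i's silhouette covers the
    base plane (assumed to have been computed already by a homography).
    """
    return {
        (x, y)
        for (x, y) in cells
        if all(fp(*to_base_plane(x, y, h, cam))
               for cam, fp in zip(cameras, footprints))
    }


def visual_hull(heights, cameras, footprints, cells, workers=4):
    """Process every slice independently and gather the results by height."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        slices = pool.map(
            lambda h: carve_slice(h, cameras, footprints, cells), heights
        )
        return dict(zip(heights, slices))


if __name__ == "__main__":
    cells = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
    disc = lambda x, y: x * x + y * y <= 4  # footprint of radius 2
    hull = visual_hull([0.0, 5.0], [(0.0, 0.0, 10.0)], [disc], cells)
    print({h: len(s) for h, s in hull.items()})
```

Because no slice depends on any other, the call to `carve_slice` is the unit of work that could be dispatched to a separate machine.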

The advent of general purpose GPU computing means that highly parallelisable numeric problems such as volumetric modelling can now be approached in a new way. CPUs and graphics hardware have followed different evolutionary paths over the preceding decades due to the differing requirements of a general purpose CPU compared with the specific requirements of a GPU. GPUs were originally introduced to remove the burden of graphics processing from the CPU and associated memory. Typically GPUs comprise many more processing cores than CPUs and offer very high bandwidth to video memory. General purpose interfaces to GPUs such as CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) now allow processing unrelated to graphics to be performed on GPUs, which can yield orders of magnitude performance increases for problems of a parallel, rather than sequential, nature.

Two volumetric reconstruction algorithms are implemented using CUDA and compared in [65]. One algorithm pre-computes the mapping of voxels to image pixel bounding boxes and stores these in a lookup table. The second does no pre-computation, but instead downsamples the silhouette images so that each voxel maps approximately to a single image pixel. Octree variants of both algorithms are also tested. The results show that although the first algorithm is faster for small voxel sets, the lookup tables soon become too large for GPU memory. The second algorithm does not exhibit this problem, since the downsampled silhouette images are significantly smaller than the lookup tables. In both cases the octree variant of the algorithm is faster, and this becomes increasingly apparent as the size of the voxel set increases. Furthermore, the lookup table algorithm can only be used when the cameras are stationary, whereas the downsampling approach could be employed for moving cameras and therefore provides the more general solution of the two.
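The difference in memory growth between the two approaches can be made concrete with a rough estimate. The storage layout below is assumed purely for illustration and is not taken from [65]: the lookup table is taken to hold one bounding box of four 2-byte coordinates per voxel per camera, and each downsampled silhouette is taken to be about n x n pixels of one byte for an n x n x n voxel set.

```python
def lookup_table_bytes(n, cameras, bytes_per_coord=2):
    """Estimated size of a voxel-to-bounding-box lookup table.

    Assumes one box (x_min, y_min, x_max, y_max) per voxel per camera,
    for an n x n x n voxel set. The layout is an illustrative assumption.
    """
    return n ** 3 * cameras * 4 * bytes_per_coord


def downsampled_bytes(n, cameras, bytes_per_pixel=1):
    """Estimated size of the downsampled silhouettes.

    Assumes each silhouette is reduced to about n x n pixels so that one
    voxel maps to roughly one pixel (an illustrative assumption).
    """
    return cameras * n * n * bytes_per_pixel


if __name__ == "__main__":
    for n in (64, 128, 256):
        print(n, lookup_table_bytes(n, 8), downsampled_bytes(n, 8))
```

Under these assumptions the lookup table grows with the cube of the resolution while the downsampled images grow with its square, which is consistent with the lookup table exhausting GPU memory first.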

In document Improving the performance of video based reconstruction and validating it within a Telepresence context (Page 60-62)