3.4 Arrangements
4.1.1 The Visibility Cube
A point sample is defined to be the set of polygons visible from a given point. A visibility cube (closely related to a radiosity hemi-cube [CG85]) is used to generate such samples (see Figure 12). It is created by treating each of the six sides of a tiny cube enclosing the sample point as independent depth and frame buffers onto which the scene is rendered. Depth buffers are supported by all modern consumer-level graphics hardware and ensure that only the pixels of those polygons visible from the point in question are rendered. Each polygon is assigned a distinct 32-bit colour. This allows a given pixel to be mapped back to the polygon responsible for its generation. Any polygon associated with a pixel is considered visible. The set of polygons that are mapped to by at least one pixel from any of the six frame buffers is considered to be the set of polygons visible from the sample point. The visibility cube can be considered a high-density sampling over the angular domain, for a fixed spatial position.
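The colour-coding step can be illustrated with a minimal sketch. It assumes that polygon indices are packed into the four 8-bit RGBA channels and that one 32-bit value is reserved as the clear colour. The function names, the channel order and the `BACKGROUND` value are illustrative assumptions rather than details given in the text.

```python
BACKGROUND = 0xFFFFFFFF  # assumed clear colour meaning "no polygon drawn here"


def index_to_rgba(index):
    """Pack a polygon index into four 8-bit channels (one distinct 32-bit colour)."""
    if not 0 <= index < 2 ** 32:
        raise ValueError("polygon index does not fit in 32 bits")
    return ((index >> 24) & 0xFF, (index >> 16) & 0xFF,
            (index >> 8) & 0xFF, index & 0xFF)


def rgba_to_index(rgba):
    """Recover the polygon index from a pixel read back from a frame buffer."""
    r, g, b, a = rgba
    return (r << 24) | (g << 16) | (b << 8) | a


def visible_set(faces):
    """Union of the polygon indices found in the six face buffers.

    `faces` is a sequence of six iterables of RGBA pixels.
    """
    visible = set()
    for pixels in faces:
        for rgba in pixels:
            index = rgba_to_index(rgba)
            if index != BACKGROUND:
                visible.add(index)
    return visible
```

A polygon enters the set as soon as a single pixel on any face carries its colour, which matches the definition of a point sample given above.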
The rendering process is not a traditional sampling mechanism, and differs in many ways from ray-casting through pixels. We highlight these differences in Table 2.
The intended application is visibility culling in a rasterisation engine. A useful heuristic for obtaining good accuracy for point samples in practice is to set the sampling parameters (frame buffer resolution, bit depth of the depth buffer, and near and far planes) similar to those of the desired output. For maximum accuracy, these factors should be set in accordance with the Nyquist limit.
Sub-sampling the intended rendering resolution is beneficial, however: it enhances performance by minimising frame buffer reads and reducing the required fill rate, which allows accuracy to be traded for speed. We have found sub-sampling to be necessary when trying to achieve the optimal combination of accuracy and performance. Although accuracy is reduced, sub-sampling results
Figure 12: The Visibility Cube. A sample of several visibility cubes over a surface. The visible geometry (of several teapots) has been projected onto the cubes.
| | Ray Casting | Rasterisation |
|---|---|---|
| Geometry | Infinitely thin half-line. | Sheared frustum. Size depends on sample resolution (pixel size). |
| Z-Fighting | – | Aliasing in the Z-buffer results in visual interlacing of polygons of similar depth. |
| Near/Far Clipping | – | Near and far planes may cause near or far geometry to be omitted. |
| Small Polygons | In the traditional sense of sampling, insufficient sampling results in polygons being omitted. | Small polygons are never “missed”; rather, only the nearest polygon intersecting a pixel frustum is selected. |
Table 2: Aliasing: Ray-Casting vs. Rasterisation. Ray-casting and rasterisation do not produce identical
results. We list several differences and catalogue the different types of aliasing artefacts that affect visibility.
only in the occasional omission of small polygons (with little perceptual impact). This is known as approximate culling or, equivalently, contribution culling [ASVNB00, BMH98, Zha98].
The concept of a “from-region” visibility set can be defined in terms of point samples. This is simply the union of the visible sets of all possible point samples taken within the rectangular region. Since there are an infinite number of these points, an exact evaluation via point sampling is impossible. Instead, we form the union of a finite subset of point samples.
Insufficient sampling leads to aliasing artefacts that manifest as the exclusion of visible polygons (false invisibility error) when rendering.
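The finite-union approximation can be sketched as follows. The regular grid used to pick sample points is purely an illustrative assumption (the text does not prescribe a placement strategy here), and `point_sample` stands in for a hypothetical routine that renders a visibility cube at a point and returns the visible polygon indices.

```python
from itertools import product


def grid_points(lo, hi, n):
    """Return n**d points on a regular grid spanning the box [lo, hi] (n >= 2)."""
    axes = [[l + (h - l) * i / (n - 1) for i in range(n)]
            for l, h in zip(lo, hi)]
    return list(product(*axes))


def from_region_visibility(points, point_sample):
    """Approximate the from-region visible set as a union of point samples."""
    visible = set()
    for p in points:
        visible |= set(point_sample(p))
    return visible
```

Adding sample points can only grow the resulting set, so denser sampling reduces false invisibility errors at the cost of more visibility cube renderings.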
Performance
The computation of a visibility cube consists of six renderings of scene geometry from a single point. For each render, the frame buffer needs to be read in order to obtain the visible polygon indices. In this section we examine the performance issues and propose several optimisations.
Firstly, we consider the rendering process. The generation of one side of a visibility cube is similar to standard rendering; however, several simplifications can be exploited:
1. Mapping is not required (texture maps, bump maps, environment maps, light maps)
2. Smooth shading (Gouraud/Phong) is not required
3. Lighting calculations are not required
4. All geometry in the preprocess is static
This implies that rendering can be performed more efficiently than traditional rendering, often approaching the peak efficiency of the graphics hardware. For further efficiency, our implementation incorporates the following:
1. High performance video/AGP memory is allocated for geometry when possible. Using (on our hardware) 230 MB of such memory allows 8 million triangles to be stored.
2. Triangle stripping enhances performance, and allows more geometry to be inserted into high performance memory.
3. Geometry could be uploaded to video/AGP memory while rendering, using synchronisation extensions to exploit CPU-GPU parallelism. We have not implemented this, since we have not found the amount of high performance memory to be a limitation. The large quantity of available high performance memory is a consequence of unused texture memory.
Current hardware (we use an NVidia GeForce4 Ti 4600) claims to be able to transform 136 million vertices per second. Already, top-end hardware, such as the ATI Radeon 9700, claims to be able to double or even triple this. In practice, we achieve a throughput of 17 million triangles per second. Because our samples are taken within the scene bounding box, the slow-down is due mainly to an insufficient fill rate: nearby triangles cover many pixels.
Any acceleration technique that can be applied to traditional point rendering can also be applied to visibility cube rendering. We implement frustum culling using bounding spheres. This accelerates our throughput by a factor of 4.2. We optimise the process by noting that six views (partitioning the full angular domain) need to be computed from the same point. Each of these six (infinite) frusta is bounded by four planes from a shared set of six. These six planes are well defined: they are the planes that pass through the centre of the visibility cube and contain two of its edges. We classify the bounding spheres with respect to the six planes once, and use this classification for all six sides.
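The shared-plane classification can be sketched as below, assuming an axis-aligned cube centred at the sample point, with sphere centres expressed relative to that point. Under this assumption the six planes are x = ±y, x = ±z and y = ±z. The plane ordering, the face labels and the function names are illustrative choices, not details taken from our implementation.

```python
import math

# Normals of the six planes through the cube centre (each contains two cube edges).
PLANES = [(1, -1, 0), (1, 1, 0), (1, 0, -1), (1, 0, 1), (0, 1, -1), (0, 1, 1)]

# For each face: the four (plane index, required side) pairs bounding its frustum.
FACES = {
    "+x": ((0, 1), (1, 1), (2, 1), (3, 1)),
    "-x": ((0, -1), (1, -1), (2, -1), (3, -1)),
    "+y": ((0, -1), (1, 1), (4, 1), (5, 1)),
    "-y": ((0, 1), (1, -1), (4, -1), (5, -1)),
    "+z": ((2, -1), (3, 1), (4, -1), (5, 1)),
    "-z": ((2, 1), (3, -1), (4, 1), (5, -1)),
}


def classify(centre, radius):
    """Classify a sphere once against all six planes: +1, -1, or 0 (straddling)."""
    sides = []
    for normal in PLANES:
        d = sum(n * c for n, c in zip(normal, centre)) / math.sqrt(2.0)
        sides.append(1 if d > radius else -1 if d < -radius else 0)
    return sides


def faces_touched(centre, radius):
    """Faces whose frustum the sphere may intersect (conservative test)."""
    sides = classify(centre, radius)
    return [face for face, bounds in FACES.items()
            if all(sides[i] != -side for i, side in bounds)]
```

Six signed distances are computed per sphere instead of the twenty-four needed when each face is tested independently. The test is conservative: a sphere is rejected for a face only when it lies entirely on the wrong side of one of that face's four planes.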
We consider the utilisation of point-based occlusion techniques as an area of future investigation. It should be noted that during our preprocess, visibility information that has already been computed is exploited (see Section 4.2.2).
The second main issue is that of frame buffer reading, which is often a bottleneck. Wonka et al. [WWS00] claim that this accounts for approximately 54% of their run-time. This is most likely because they render only simplified scenes (their 8 million triangle scene is represented by a much smaller building “facade”), so their figure may overstate the share of frame buffer read time for more general scenes. Frame buffer reading constitutes a considerably smaller part (20%) of our run-time. The performance of frame buffer reads has not improved at the same rate as triangle rendering. The read bottleneck is due to limitations of bus technology. Older UMA (Unified Memory Architecture) hardware such as the SGI Visual Workstation is on par with current hardware (Intel Pentium 4 with GeForce 2/3/4). Frame buffer reads (RGBA channels) occur at approximately 46 million pixels per second (for 512x512 pixel blocks). Much older UMA hardware such as the SGI O2 reads only 11 million pixels per second.
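To give a feel for the cost, the read-back time of one visibility cube can be estimated from the quoted read rate. This is a back-of-the-envelope sketch: the face resolutions are assumed values chosen for illustration, and the estimate ignores per-call overhead and any variation of the read rate with block size.

```python
READ_RATE = 46e6  # pixels per second, the RGBA read rate quoted above


def cube_read_seconds(face_resolution, read_rate=READ_RATE):
    """Estimated time to read back all six square faces of one visibility cube."""
    pixels = 6 * face_resolution * face_resolution
    return pixels / read_rate


# Halving the (assumed) face resolution quarters the estimated read time.
full = cube_read_seconds(512)
half = cube_read_seconds(256)
print(f"512x512 faces: {full * 1000:.1f} ms, 256x256 faces: {half * 1000:.1f} ms")
```

Under these assumptions a cube with 512x512 faces takes roughly 34 ms to read back, which illustrates why sub-sampling the intended resolution reduces the read cost so effectively.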
In Section 4.6 we give several suggestions on how specialised hardware could be engineered in order to enhance visibility cube rendering and frame-buffer processing.