Other Acceleration Structures - Higher Performance Traversal and Construction of Tree-Based Ray

2.4. Other Acceleration Structures

For the sake of completeness, before we proceed with BVH and kd-tree construction with the surface area metric in the next chapter, we close this chapter with a brief discussion of skd-trees/bounding interval hierarchies, graphs, and uniform grids. The first two are directly related to BVHs and kd-trees and can benefit from our contributions on hierarchy quality, out-of-core construction, and memory layouts.

Figure 2.17: Example scene with 16 primitives for two iterations (left and right image) of an skd-tree partition with the spatial median split construction strategy. The left partition has overlapping (gray area) implicit bounds, which are also larger than with a kd-tree. The right partition has smaller implicit bounds, but still larger bounds than a BVH with AABBs as bounding volumes.

Skd-Trees / Bounding Interval Hierarchies. The spatial kd-tree (skd-tree) was originally introduced by Ooi et al. [1987] in the context of spatial databases to store non-point objects (objects with an extent) without producing duplicates. Havran et al. [2006] and Wächter and Keller [2006] concurrently introduced skd-trees to ray tracing, where the latter coined the term bounding interval hierarchy as they were unaware of the work of Ooi et al. [1987]. Skd-trees can be interpreted as relaxed kd-trees in which the implicit bounds of the children are allowed to overlap. Instead of a single split plane, each inner node stores an upper bound for the left child and a lower bound for the right child in the split dimension. Another interpretation is to view skd-trees as restricted BVHs with AABBs as children bounding volumes, where five of the six bounding interval limits of each child are predetermined by the parent bounds. Figure 2.17 shows an skd-tree example. The main benefit of skd-trees is their lower memory footprint compared to kd-trees and BVHs. Compared to a BVH with its twelve bounding interval limits for each pair of children AABBs, the skd-tree only has to store two limits. As an skd-tree is an object partitioning structure, it has the same bounded number of nodes as a BVH (see Equation 2.19). Empirically, the number of skd-tree nodes is much lower than for a kd-tree, resulting in a lower memory footprint even though a kd-tree node is slightly smaller (see Wächter and Keller [2006]).

The disadvantage of an skd-tree is its traversal performance. Wächter and Keller [2006] reported a trace performance ranging from roughly the same to up to three times lower than that of kd-trees constructed with an unspecified construction strategy. Havran et al. [2006] observed a generally lower trace performance, which was even about one order of magnitude lower for some scenes. This can be partially explained by the possibly larger implicit bounds compared to kd-trees due to the allowed overlap (see Figure 2.17). Also, traversal is more involved. Two planes have to be intersected and more logic is needed to determine which children have been intersected, with the additional case that no child is intersected. In contrast to kd-trees, skd-trees have no early ray termination mechanism because of the possible node overlap.
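The case analysis described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the traversal code of Wächter and Keller [2006] or Havran et al. [2006]: the names `SkdNode`, `left_max`, `right_min`, and `clip_children` are hypothetical, and the node stores only the split axis and the two interval limits.

```python
from dataclasses import dataclass


@dataclass
class SkdNode:
    # Hypothetical node layout: one axis and the two stored interval limits.
    axis: int         # split dimension (0 = x, 1 = y, 2 = z)
    left_max: float   # upper bound of the left child along `axis`
    right_min: float  # lower bound of the right child along `axis`


def clip_children(node, origin, direction, t_min, t_max):
    """Return the children hit by the ray segment [t_min, t_max].

    Each entry is (child_name, (t_enter, t_exit)), ordered by entry distance.
    The result can contain both children, one child, or no child at all.
    """
    o, d = origin[node.axis], direction[node.axis]
    if d > 0.0:
        # Moving towards +axis: the ray leaves the left child at left_max
        # and enters the right child at right_min.
        left = (t_min, min(t_max, (node.left_max - o) / d))
        right = (max(t_min, (node.right_min - o) / d), t_max)
    elif d < 0.0:
        # Moving towards -axis: the roles of the two planes are swapped.
        left = (max(t_min, (node.left_max - o) / d), t_max)
        right = (t_min, min(t_max, (node.right_min - o) / d))
    else:
        # Ray parallel to both planes: a child is either hit on the whole
        # segment or not at all.
        whole, empty = (t_min, t_max), (1.0, 0.0)
        left = whole if o <= node.left_max else empty
        right = whole if o >= node.right_min else empty
    hits = [(name, iv) for name, iv in (("left", left), ("right", right))
            if iv[0] <= iv[1]]
    if d < 0.0:
        hits.reverse()  # the right child is entered first
    return hits
```

With overlapping limits (`left_max > right_min`) the two returned intervals overlap as well, which is why a hit found in the first child does not allow the second child to be skipped unconditionally.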

Figure 2.18: 2D example of the graph-based acceleration structure from Gribble and Naveros [2013]. It is based on the example kd-tree and scene in Figure 2.15 from the kd-tree section. The original kd-tree leaves are turned into sectors (colored) with explicit bounds. The left (L), right (R), top (T), and bottom (B) face of each sector references its adjacent sector. The bottom face of the purple sector and the right face of the yellow sector have no unambiguous neighbor. Thus, each of these faces references the deepest inner node in the original kd-tree (red nodes) which contains all adjacent sectors in its subtree. The referenced subtree makes it possible to disambiguate ambiguous sectors at traversal time.

Figure 2.19: Example of two uniform grids with different resolutions for a scene with 7 primitives. The left grid has resolution $r = \lceil 1.5 \cdot \sqrt[3]{7} \rceil = 3$ and the right grid has resolution $r = \lceil 3 \cdot \sqrt[3]{7} \rceil = 6$. The numbers in the voxels indicate the number of referenced primitives.

Graph-based Acceleration Structure. Gribble and Naveros [2013] very briefly hint at a novel graph-based spatial acceleration structure which has been developed with GPU ray tracing in mind. The goal is to get rid of the traversal stack (similar to uniform grids) while at the same time having a structure which adapts to the geometry distribution (Alexis Naveros, personal communication, March 18, 2013). Algorithmic details can be retrieved from the publicly available implementation (Naveros [2016]). The basis of the structure is a kd-tree, which can be constructed with any construction strategy. For every face of the implicit AABBs of the leaves (which are called sectors in this structure) a reference to the neighboring sector is stored. If several sectors touch an AABB face, the closest common ancestor node of those sectors in the kd-tree is referenced. The resulting graph structure makes it possible to traverse from sector to sector. Figure 2.18 depicts this graph-based acceleration structure for the example kd-tree in Figure 2.15 from the kd-tree section. We can see that all leaves of the original kd-tree have turned into sectors. Two inner nodes of the original kd-tree (orange circles) are still present in the graph to resolve neighborhood ambiguity for the pink and the leftmost yellow sectors.

For traversal, the ray must start in the sector which contains the ray origin. If the ray origin is outside of the scene bounds, the entry sector has to be identified first. For this, Gribble and Naveros [2013] sketch several methods to identify an initial starting sector in constant time. If those methods fail, the kd-tree is traversed down to find the sector. For secondary rays in ray tracing based global illumination algorithms the starting sector is known, as they usually start on surfaces. After the starting sector has been identified, the ray is tested for intersection against all primitives of the sector. On intersection, traversal can terminate immediately. Otherwise, the ray proceeds to the sector which is referenced by the sector face the ray intersects. If the face references an inner kd-tree node, the subtree is traversed to find the next sector. None of these operations requires a traversal stack. Gribble and Naveros [2013] report competitive trace performance with kd-trees.
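The sector-to-sector walk described above can be illustrated with a small 2D sketch. It is an assumption-laden toy and not the implementation of Naveros [2016]: the classes `Sector` and `KdNode`, the functions `locate`, `exit_face`, `walk`, and `build_example`, and the three-sector scene are all hypothetical. Primitive intersection tests are omitted, and the ray direction is assumed to be non-zero.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Sector:
    """Former kd-tree leaf with explicit 2D bounds (hypothetical layout)."""
    name: str
    lo: tuple
    hi: tuple
    # links[axis][side]: neighbor behind the low (0) or high (1) face.
    # A link is a Sector, a KdNode (ambiguous neighbor), or None (scene border).
    links: list = field(default_factory=lambda: [[None, None], [None, None]])


@dataclass(eq=False)
class KdNode:
    """Inner kd-tree node that is kept to disambiguate neighbors."""
    axis: int
    split: float
    left: object
    right: object


def locate(node, p):
    """Descend from `node` to the sector containing point `p` (no stack)."""
    while isinstance(node, KdNode):
        node = node.left if p[node.axis] < node.split else node.right
    return node


def exit_face(sector, o, d):
    """Return (t, axis, side) of the face through which the ray leaves."""
    best = (math.inf, None, None)
    for axis in range(2):
        if d[axis] > 0.0:
            t, side = (sector.hi[axis] - o[axis]) / d[axis], 1
        elif d[axis] < 0.0:
            t, side = (sector.lo[axis] - o[axis]) / d[axis], 0
        else:
            continue
        if t < best[0]:
            best = (t, axis, side)
    return best


def walk(start, o, d):
    """List the sectors a ray visits, starting in the sector containing `o`."""
    visited, sector = [], start
    while sector is not None:
        visited.append(sector.name)
        # Primitive tests for `sector` would go here; a hit ends traversal.
        t, axis, side = exit_face(sector, o, d)
        link = sector.links[axis][side]
        if isinstance(link, KdNode):
            # Ambiguous neighbor: resolve it with the exit point on the face.
            p = [o[a] + t * d[a] for a in range(2)]
            p[axis] = sector.hi[axis] if side == 1 else sector.lo[axis]
            link = locate(link, p)
        sector = link
    return visited


def build_example():
    """Scene [0,2]x[0,2]: sector A on the left, B (bottom) and C (top) right."""
    a = Sector("A", (0.0, 0.0), (1.0, 2.0))
    b = Sector("B", (1.0, 0.0), (2.0, 1.0))
    c = Sector("C", (1.0, 1.0), (2.0, 2.0))
    n = KdNode(axis=1, split=1.0, left=b, right=c)
    a.links[0][1] = n            # right face of A touches both B and C
    b.links[0][0], b.links[1][1] = a, c
    c.links[0][0], c.links[1][0] = a, b
    return a, b, c
```

In this toy scene, `walk(a, (0.5, 0.25), (1.0, 1.0))` starts in sector A, resolves the ambiguous right face of A through the kept inner node, and continues through B and C. Only the current sector and the exit point are carried along, so no stack is involved.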

Uniform Grids. Fujimoto et al. [1986] proposed uniform grids as a ray tracing acceleration structure. The scene bounds are subdivided into a grid of equally shaped and sized boxes or voxels (volumetric elements). Each voxel stores the list of all primitives which overlap with its volume. As a result, a primitive can be referenced in more than one voxel. According to Pharr et al. [2016], the number of voxels optimally is roughly proportional to the number $n$ of primitives. For uniformly distributed geometry this results in a grid resolution $r \in \mathbb{N}$ of $r \approx c\sqrt[3]{n}$ with some constant $c$ for an $r \times r \times r$ grid. Traversal of the grid processes the voxels which intersect the ray in front-to-back order. This does not require a traversal stack. For each voxel it visits, the ray is tested for intersection with all referenced primitives. Traversal can terminate as soon as an intersection has been found inside the current voxel. Figure 2.19 depicts a grid with two different resolutions. Fujimoto et al. [1986] proposed the 3DDDA (3D digital differential analyzer) algorithm as an efficient implementation of this traversal process, which Amanatides and Woo [1987] improved on. In the best case a grid can find an intersection after just a couple of steps if the intersection occurs in the starting cell or its neighborhood. This gives essentially an $O(1)$ traversal complexity in the number of grid cells in this case.
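The resolution heuristic and the front-to-back voxel walk can be sketched as follows. This is a minimal illustration in the spirit of the incremental stepping scheme of Amanatides and Woo [1987], not their original code. The function names `grid_resolution` and `traverse_grid` are hypothetical, the ray origin is assumed to lie inside the grid, and the primitive tests per voxel are left out.

```python
import math


def grid_resolution(n, c):
    """Resolution r = ceil(c * n^(1/3)) for n primitives and constant c."""
    return math.ceil(c * n ** (1.0 / 3.0))


def traverse_grid(origin, direction, lo, hi, r):
    """Yield the voxel indices a ray visits in front-to-back order.

    The grid spans the box [lo, hi] with r voxels per axis. `origin` is
    assumed to lie inside the grid. No traversal stack is needed.
    """
    dims = len(origin)
    size = [(hi[a] - lo[a]) / r for a in range(dims)]
    cell = [min(r - 1, max(0, int((origin[a] - lo[a]) / size[a])))
            for a in range(dims)]
    step, t_next, t_delta = [], [], []
    for a in range(dims):
        if direction[a] > 0.0:
            boundary = lo[a] + (cell[a] + 1) * size[a]
            step.append(1)
            t_next.append((boundary - origin[a]) / direction[a])
            t_delta.append(size[a] / direction[a])
        elif direction[a] < 0.0:
            boundary = lo[a] + cell[a] * size[a]
            step.append(-1)
            t_next.append((boundary - origin[a]) / direction[a])
            t_delta.append(-size[a] / direction[a])
        else:
            # The ray never crosses a voxel boundary along this axis.
            step.append(0)
            t_next.append(math.inf)
            t_delta.append(math.inf)
    while True:
        yield tuple(cell)  # primitive tests for this voxel would go here
        a = t_next.index(min(t_next))  # axis of the nearest voxel boundary
        if t_next[a] == math.inf:
            return  # zero direction vector: the ray never leaves the voxel
        cell[a] += step[a]
        if not 0 <= cell[a] < r:
            return  # the ray has left the grid
        t_next[a] += t_delta[a]
```

For the 7 primitives of Figure 2.19, `grid_resolution(7, 1.5)` and `grid_resolution(7, 3)` reproduce the two resolutions given in the caption. A caller would typically stop consuming the generator as soon as an intersection inside the current voxel has been found.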

Grids have some disadvantages, which can partially be seen in Figure 2.19. One problem is that they do not adapt well to the geometry distribution. Only the grid resolution $r$ can be adapted. If no intersection is found, tracing parallel to a coordinate axis requires roughly $r$ traversal steps. Tracing along the diagonal of the grid requires roughly $\sqrt{3}\,r$ traversal steps. Thus the worst case complexity of grid traversal is $O(r)$ if no intersection is found. Using the optimal resolution of $r \approx c\sqrt[3]{n}$, this results in a complexity of $O(\sqrt[3]{n})$, which is much higher than the $O(\log n)$ complexity of tree-based ray tracing. If the resolution $r$ is too high, too many traversal steps have to be performed to find an intersection and much time is spent traversing empty areas. A higher resolution also causes a higher memory consumption due to the increased number of voxels and duplicate primitive references. Sramek and Kaufman [2000] and Es and İşler [2007] proposed to enrich voxels with additional empty space information, computed in a preprocess, which makes it possible to perform larger leaps over empty regions. This drastically reduces the traversal cost for scenes with large empty regions, but further increases the memory overhead. If $r$ is too low, too many primitives are stored per voxel, which results in a larger number of unnecessary primitive intersection tests.
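To get a feeling for the gap between the two complexities, the following worked example plugs in concrete numbers. The values $n = 10^6$ and $c = 3$ are assumptions chosen purely for illustration ($c = 3$ is the constant of the right grid in Figure 2.19), and $\log_2 n$ serves only as a rough stand-in for the depth of a balanced binary tree.

```latex
r \approx c\sqrt[3]{n} = 3 \cdot \sqrt[3]{10^{6}} = 300,
\qquad
\log_2 n = \log_2 10^{6} \approx 19.9
```

Under these assumptions, an axis-parallel ray that misses all geometry takes roughly 300 grid steps, while a descent through a balanced tree over the same primitives passes about 20 levels.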

construction complexity. Hapala et al. [2011] analyzed when it is more beneficial to use grids based on the number of rays to trace and determined critical points in the number of rays at which the higher construction cost of trees is amortized. The number of primitives of their scenes ranged from hundreds to millions. For random rays starting outside of the scene, the critical point was approximately proportional to the number of primitives. For rays generated by a path tracer, critical points were heavily scene dependent and seemingly independent of the number of scene primitives. For more than half of the 24 test scenes, grids never paid off or had their critical point at a few thousand rays. For other scenes the critical point ranged from 1M to 50M rays. It would be interesting to see how the empty space leaping techniques from Sramek and Kaufman [2000] and Es and İşler [2007] affect the critical point.

In document Higher Performance Traversal and Construction of Tree-Based Raytracing Acceleration Structures (Page 42-46)