Generative reordering - Practical photon mapping in hardware

The naive ordering can be thought of as the base case for reordering. The first set of techniques examines and modifies the order in which the final gather ray directions are generated. These rays are determined before ray casting has been performed. As a consequence, the final locations of the photon gathers, denoted y_i in Section 2.6.2, are not yet known. Even without this knowledge, significant increases in temporal locality are possible.

In many scenes the eye rays from neighboring pixels will intersect the scene at points x in close proximity to each other (Wald et al., 2001). As a result of this coherence, the origins of the final gather rays cast during the Monte Carlo integration will tend to be similar for pixels that are near each other in the image.

Figure 3.3: A small image tile has the property that its pixels will tend to project to a small number of objects that are close to each other. The origins for the final gather rays will therefore also be closer together than those of large tiles.

3.3.1 Tiled reordering

It would seem that this coherence could be exploited by breaking the screen into tiles. The naive photon map algorithm would then be applied to each tile independently, with the pixels of each individual tile processed in scanline order. The list of associated photon gather sites, Y<a,b>, is an enumeration of the photon gather locations for the tile <a, b>.
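The traversal described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name `tiled_pixel_order` and the choice to visit the tiles themselves in scanline order are assumptions, since the text only specifies scanline order inside each tile.

```python
from typing import Iterator, Tuple

Pixel = Tuple[int, int]


def tiled_pixel_order(width: int, height: int, tile: int) -> Iterator[Pixel]:
    """Yield (x, y) pixels tile by tile, in scanline order inside each tile.

    Assumption: tiles are visited left to right, top to bottom. Tiles on the
    right and bottom edges are clipped to the image bounds.
    """
    for ty in range(0, height, tile):              # rows of tiles
        for tx in range(0, width, tile):           # tiles within one row
            for y in range(ty, min(ty + tile, height)):      # scanlines of the tile
                for x in range(tx, min(tx + tile, width)):   # pixels of the scanline
                    yield (x, y)


order = list(tiled_pixel_order(4, 4, 2))
```

With a 4×4 image and 2×2 tiles, the first four pixels visited all belong to the top-left tile, so their final gather origins are generated back to back.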

A similar technique is commonly used in graphics rasterization hardware to improve texture memory locality (McCormack et al., 1998). As demonstrated in Figure 3.3, image tiles of moderate size generally project onto just a small portion of the scene, causing all the gather rays in Y<a,b> to have similar origins. This will be true except in scenes that have significant depth discontinuities, such as a leafy forest. The problem with the tiled approach by itself is that while the origins, x, of the rays used by the Monte Carlo integration are similar, the directions, ω_i, remain spread across the hemisphere. In scenes that consist primarily of relatively open rooms, with or without complex objects or wall surface geometry, the resulting search locations y_i remain too scattered throughout the scene to improve cache efficiency. This lack of improvement can be seen in the top line of the graphs in Figure 3.5. As the tile size varies, there is no noticeable change in the bandwidth requirements of the naive tiled algorithm.

Figure 3.4: Although the division of the image into tiles brings coherence to the origins of the final gather rays, the directions of the rays remain random. To expose the coherence in the location of photon gathers, the tile is processed multiple times. During each pass only those final gather rays with similar directions are generated.

3.3.2 Tiled direction-binning reordering

The tiled algorithm can be improved by explicitly grouping the final gather rays by direction, ω_i, in addition to the implicit grouping by origin, x. The resulting rays will share both similar origins and directions (see Figure 3.4). They will therefore tend to be highly coherent and to intersect the scene at points y_i near each other (Wald et al., 2001).

This generative ordering can be implemented by performing multiple passes over the tile, after the initial ray casting has located the eye-ray intersections x. Each pass will only generate those rays, ω_i, that fall within a specified portion of the hemisphere. This is in contrast to the naive algorithm, which generates all the rays for a single eye ray before continuing. The hemisphere is divided into bins of equal solid angle. The number of bins is a system parameter; the smaller each direction bin is, the more coherent the generated rays will be. This benefit is offset by the repeated work incurred during each pass. The correct tradeoff is found experimentally.
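The multi-pass structure can be sketched as below. Several details are assumptions made for illustration only: the names `direction_in_bin` and `binned_gather_order`, the n_u × n_v grid of bins, the jittered sampling, and the choice to express each direction in a local frame whose z axis is the surface normal. The mapping cos θ = 1 − u, φ = 2πv is used because equal areas of the unit square then correspond to equal solid angles on the hemisphere; the sampling scheme actually used for the final gather may differ.

```python
import math
import random
from typing import Iterator, Sequence, Tuple

Pixel = Tuple[int, int]
Bin = Tuple[int, int]
Vec3 = Tuple[float, float, float]


def direction_in_bin(i: int, j: int, n_u: int, n_v: int, rng: random.Random) -> Vec3:
    """Sample a unit direction inside bin (i, j) of an n_u x n_v hemisphere grid.

    Local frame: z is the surface normal (an assumption of this sketch).
    With cos(theta) = 1 - u and phi = 2*pi*v, every grid cell covers the
    same solid angle, 2*pi / (n_u * n_v).
    """
    u = (i + rng.random()) / n_u          # jittered position inside the cell
    v = (j + rng.random()) / n_v
    cos_t = 1.0 - u
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * v
    return (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)


def binned_gather_order(tile_pixels: Sequence[Pixel], n_u: int, n_v: int,
                        rays_per_bin: int, seed: int = 0
                        ) -> Iterator[Tuple[Pixel, Bin, Vec3]]:
    """Yield final gather rays one direction bin at a time.

    The outer loops are the passes over the tile; the naive ordering would
    instead put the pixel loop outermost.
    """
    rng = random.Random(seed)
    for i in range(n_u):
        for j in range(n_v):                  # one pass per direction bin
            for pixel in tile_pixels:         # revisit every pixel of the tile
                for _ in range(rays_per_bin):
                    yield pixel, (i, j), direction_in_bin(i, j, n_u, n_v, rng)


rays = list(binned_gather_order([(0, 0), (1, 0)], n_u=2, n_v=2, rays_per_bin=1))
```

Each pixel is revisited once per bin, which is the repeated work the text mentions: with more bins, each pass is more coherent but there are more passes over the tile.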

The tiled direction-binned reordering algorithm requires less than a third of the naive algorithm's bandwidth for all tile sizes larger than 4×4 (Figure 3.5). With a tile size of 16×16, the modified Cornell scene is reduced from 50 GB per image to 13 GB, while the Sponza scene is reduced from 367 GB to 58 GB. An interesting feature of the graphs in Figure 3.5 is the knee at tiles of size 16×16. As mentioned above, larger tiles cover multiple surfaces, reducing the coherence of the final gather ray origins, x. The result is that the final gather rays for the tile spread further out in the scene as the tiles get large. For the test scenes in Chapter 1, this knee occurred at 16×16, so 16×16 tiles were used for all results presented in this chapter.
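As a quick check of the "less than a third" claim against the two figures quoted above, the ratios can be computed directly. The dictionary layout and variable names are illustrative; the numbers are the ones given in the text for 16×16 tiles.

```python
# Bandwidth per image in GB as quoted in the text: (before, after direction binning).
measurements = {
    "Modified Cornell": (50.0, 13.0),
    "Sponza": (367.0, 58.0),
}

# Fraction of the original bandwidth that remains after reordering.
ratios = {scene: after / before for scene, (before, after) in measurements.items()}

for scene, ratio in ratios.items():
    print(f"{scene}: {ratio:.1%} of the original bandwidth")
```

Both scenes come out well under one third: roughly 26% for the modified Cornell box and roughly 16% for Sponza.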

When generating an image using the photon map, highly specular surfaces such as mirrors and glass objects are treated specially. Instead of using a final gather immediately, the eye ray is reflected and/or refracted as a single ray until it reaches a diffuse or glossy surface. If this happens for some of the pixels in a tile, the origins of the final gather rays will be very far apart, eliminating the coherence of photon gather locations. The implementation used in this dissertation tackles this problem by creating a list of the affected eye rays and delaying all action on them until the final gathers for all other pixels in the tile have been handled.
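The deferral described above amounts to partitioning the pixels of a tile into two lists. The following is a minimal sketch under stated assumptions: `order_tile_pixels` and the `hits_specular` predicate are hypothetical names, and how the implementation actually detects and stores the affected eye rays is not specified in the text.

```python
from typing import Callable, Iterable, List, Tuple

Pixel = Tuple[int, int]


def order_tile_pixels(pixels: Iterable[Pixel],
                      hits_specular: Callable[[Pixel], bool]) -> List[Pixel]:
    """Return the tile's pixels with specular-hit pixels moved to the end.

    hits_specular is a hypothetical predicate that reports whether a pixel's
    eye ray first hit a mirror-like or glass surface.
    """
    immediate: List[Pixel] = []   # final gather starts at the eye-ray hit point
    deferred: List[Pixel] = []    # eye ray was reflected or refracted first
    for p in pixels:
        if hits_specular(p):
            deferred.append(p)
        else:
            immediate.append(p)
    return immediate + deferred   # deferred pixels are handled last


specular = {(1, 0)}
result = order_tile_pixels([(0, 0), (1, 0), (2, 0), (3, 0)], lambda p: p in specular)
```

Relative order within each list is preserved, so the non-specular pixels keep the coherence of the original tile traversal.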

(a) Modified Cornell box, N_FG = 33. Plot of bandwidth (GB) against tile size (1 to 128) for the Tiled, Tiled DirBin, Tiled DirBin Hash, and Tiled Hilbert reorderings.

(b) The Sponza atrium, N_FG = 200. Plot of bandwidth (GB) against tile size (1 to 128) for the Tiled, Tiled DirBin, Tiled DirBin Hash, and Tiled Hilbert reorderings.

Figure 3.5: The tiled Hilbert curve reordering results in the lowest bandwidth for each tile size. Only tiles of moderate size are practical due to internal storage constraints. Tiles of 16×16 and 32×32 are both feasible and perform well for the cost-effective tiled direction-binned reorderings, both with and without hash reordering.
