7.4 Simulation and analysis
7.5.4 Limitations
There are several limitations to the overall system design, the specific architecture presented here, and its evaluation. The most serious is the requirement that the photon map be pre-computed by the host and broadcast to each rendering chip. Depending on host speed, this precludes dynamic scenes with moving objects or light sources; only the camera is allowed to move. The cost of generating a photon map is significantly less than that of rendering from it using final gather visualization, so it is possible that an architecture could incorporate this task directly.
Although the architecture is not itself limited to Lambertian and Phong BRDFs, the simulation and performance analysis are. The performance of the system is sensitive to the choice of allowed BRDF in two ways. First, the computation cost of the BRDF can be significant; the cost of finishing the evaluation of the BRDF, C_B2 FLOPs, is particularly important because it is part of the photon gather cost for final gathers. Second, the storage size of the partially evaluated BRDF, S_B bytes, changes not only the size of the packets emitted by the ray caster but also the storage that the controller and photon gatherer must hold.
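The two-stage BRDF evaluation described above can be illustrated with a minimal sketch. It assumes the energy-normalized (modified) Phong model and a single-channel reflectance; the split point, the `PartialPhong` record layout, and the function names are hypothetical and are not taken from the architecture. The fields of the record stand in for the S_B bytes that travel with each packet, and `finish_phong` stands in for the per-photon work counted as C_B2.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass(frozen=True)
class PartialPhong:
    """Terms that depend only on the hit point and the viewer (hypothetical layout)."""
    diffuse: float    # kd / pi
    specular: float   # ks * (n + 2) / (2 * pi)
    exponent: float   # Phong exponent n
    mirror: Vec       # direction to the viewer reflected about the normal


def begin_phong(kd: float, ks: float, n: float,
                normal: Vec, to_viewer: Vec) -> PartialPhong:
    """First stage: everything that does not depend on the photon direction."""
    c = dot(normal, to_viewer)
    mirror = tuple(2.0 * c * nc - vc for nc, vc in zip(normal, to_viewer))
    return PartialPhong(kd / math.pi, ks * (n + 2.0) / (2.0 * math.pi), n, mirror)


def finish_phong(partial: PartialPhong, to_light: Vec) -> float:
    """Second stage: repeated once per gathered photon."""
    cos_alpha = max(0.0, dot(partial.mirror, to_light))
    return partial.diffuse + partial.specular * cos_alpha ** partial.exponent
```

Under these assumptions, a richer BRDF would grow both the record (more stored fields) and the second stage (more arithmetic per photon), which are the two sensitivities the paragraph identifies.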
One area where the architecture does not scale well is with increases in either scene complexity or photon map size. It is currently required that both of these data structures be fully replicated to the dedicated memory of each rendering chip. Relaxing this would require a complete redesign of the architecture.
Only a functional simulation was constructed. Each unit was written in a high-level language; it accepts the specified input and emits the described output. Every operation performed was justified in terms of analyzed computation cost, but was implemented with general-purpose libraries. No effort was made to be cycle accurate, so there is no simulated timing data. A logical extension of this research would be a low-level simulator, or an implementation of key units on an FPGA, to ensure the validity of the design.
CHAPTER 8
SUMMARY AND CONCLUSION
The growing demand for realistic image synthesis using global illumination is clear. Unfortunately, the interaction of light, in a general way, with multiple surfaces between the light source and the viewer is very costly to simulate, as it involves a potentially unbounded number of computations. This problem is particularly acute for interactive applications such as video games and training simulators. These applications must generate a new image dozens of times a second.
The research presented in this dissertation provides one possible solution to the problem by providing a novel hardware architecture that interactively renders scenes using the photon mapping algorithm. The photon mapping algorithm accurately renders many of the visual effects that are expected in real scenes. However, the resources it requires have precluded an interactive implementation of the full algorithm. In my dissertation, I have addressed this issue by presenting several techniques that reduce the resource requirements. A hardware architecture implementing these techniques was shown to be promising, and presents a potential direction for the next generation of graphics hardware.
8.1 Research Contributions
The research presented in this dissertation has made several contributions. These include novel techniques to dramatically reduce the bandwidth cost of photon mapping. These techniques were then combined into a feasible hardware architecture, which could be built in the next three years, that supports interactive applications.
Specifically, the research contributions of my work include:
Low bandwidth photon gathers using reordering: I presented photon gather reordering in Chapter 3. This technique generates the exact same images as the standard photon mapping algorithm, but reorders the computations such that the memory accesses become more coherent. The higher locality of reference increases the efficiency of a cache, reducing the bandwidth requirements.
Several reordering algorithms were introduced and compared. While the Hilbert reordering, applied to the entire image, reduced bandwidth requirements by four orders of magnitude, it required a prohibitive amount of intermediate storage. The combination of the tiled direction-binning generative reordering with the hashed deferred reordering is practical, easy to implement, and highly effective: it achieves over an order of magnitude reduction in bandwidth requirements, from 357 GB to 31 GB for the Sponza atrium image.
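To give a sense of why a space-filling curve improves locality, the sketch below sorts 2D integer coordinates by their position along a Hilbert curve, so that items adjacent in the processing order are also adjacent in space. It is only an illustration of the general idea, not the reordering evaluated in Chapter 3: the choice of 2D pixel coordinates as the sort key and the function names `hilbert_index` and `hilbert_order` are assumptions made here.

```python
def hilbert_index(order, x, y):
    """Distance of cell (x, y) along a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_order(points, order):
    """Return the points sorted by their Hilbert curve index."""
    return sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))


if __name__ == "__main__":
    pixels = [(x, y) for y in range(4) for x in range(4)]
    print(hilbert_order(pixels, 2))
```

Sorting an entire image this way requires every item to be generated and held before any is processed, which is consistent with the intermediate storage cost noted above.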
Tiled irradiance caching with pre-computed radius for split-sphere heuristic: Irradiance caching reduces the number of final gathers by interpolating previously computed values. Although it should only be used on purely diffuse surfaces, it can be highly effective at reducing the computation and memory bandwidth requirements of photon mapping for some scenes. The conventional irradiance caching algorithm, however, imposes a sequential dependency between the pixels of an image, preventing efficient parallel execution.
In Chapter 5, I laid out two solutions that, when implemented together, allow irradiance caching to be used in a parallel rendering system. The first used a separate irradiance cache for each tile, eliminating communication between processors. The second stored a pre-computed value for Ward’s split-sphere heuristic, permitting the combination of irradiance caching with photon gather reordering. The modified irradiance caching algorithm was able to avoid up to half of all gathers in compatible scenes.
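A minimal sketch of a per-tile cache lookup follows. It uses the standard weight from Ward's irradiance caching formulation, w = 1 / (|x - x_i| / R_i + sqrt(1 - n · n_i)), and accepts a record when w exceeds 1/a. The radius R_i is simply read from the record, standing in for the pre-computed value; how that value is obtained is described in Chapter 5 and is not reproduced here. The class and field names, and the scalar irradiance, are assumptions for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CacheRecord:
    position: tuple
    normal: tuple
    irradiance: float   # scalar for brevity; RGB in practice
    radius: float       # R_i, assumed to be supplied pre-computed


def ward_weight(rec, position, normal):
    """Ward's interpolation weight of a cached record at a query point."""
    dist = math.dist(rec.position, position)
    cos_n = min(1.0, sum(a * b for a, b in zip(rec.normal, normal)))
    denom = dist / rec.radius + math.sqrt(max(0.0, 1.0 - cos_n))
    return math.inf if denom == 0.0 else 1.0 / denom


@dataclass
class TileCache:
    """One independent cache per image tile, so tiles never communicate."""
    max_error: float                      # the user-chosen accuracy parameter a
    records: list = field(default_factory=list)

    def lookup(self, position, normal):
        """Interpolated irradiance, or None if a final gather is still needed."""
        num = den = 0.0
        for rec in self.records:
            w = ward_weight(rec, position, normal)
            if w > 1.0 / self.max_error:
                if math.isinf(w):
                    return rec.irradiance  # query coincides with a record
                num += w * rec.irradiance
                den += w
        return num / den if den > 0.0 else None
```

A renderer built on this sketch would keep one `TileCache` per tile, perform a final gather whenever `lookup` returns `None`, and append the result as a new `CacheRecord`.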
Combined importance sampling: Global importance sampling, which generates final gather rays in proportion to prior knowledge of both the incident radiance and the surface reflectance, reduces the number of final gather rays required to generate an image. Previously published sampling algorithms maintained the prior knowledge in two separate sampling strategies. Although this provides superior sampling, it raises costs.
Combined importance sampling was presented in Chapter 6 as a technique that lowers computational and storage costs by merging multiple probability distribution functions together, leaving only a single sampling strategy. Combined importance sampling was demonstrated to be effective in common scenes while being cheap to compute. For the test scenes shown in Chapter 7, with complex illumination and/or glossy surfaces, the number of final gather rays is reduced by approximately one third while generating higher quality images.
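The idea of collapsing two sources of prior knowledge into one sampling strategy can be sketched with tabulated distributions. The example below assumes that both the reflectance and the incident radiance are tabulated over the same direction bins and that they are merged by a bin-wise product followed by normalization; that merging rule and the function names are assumptions for illustration, and the actual construction is the one given in Chapter 6.

```python
import bisect
import random
from itertools import accumulate


def combine_pdfs(brdf_table, radiance_table):
    """Merge two same-sized tables into one normalized table (bin-wise product)."""
    product = [b * r for b, r in zip(brdf_table, radiance_table)]
    total = sum(product)
    if total == 0.0:
        # No overlap between the two tables: fall back to uniform sampling.
        return [1.0 / len(product)] * len(product)
    return [p / total for p in product]


def build_cdf(pdf):
    """Running sum of the table, used for inverse-CDF sampling."""
    return list(accumulate(pdf))


def sample_bin(cdf, u):
    """Pick a direction bin for a uniform random number u in [0, 1)."""
    # The clamp guards against round-off leaving cdf[-1] slightly below 1.
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)


if __name__ == "__main__":
    pdf = combine_pdfs([1.0, 1.0, 2.0, 0.0], [0.0, 1.0, 1.0, 4.0])
    cdf = build_cdf(pdf)
    bins = [sample_bin(cdf, random.random()) for _ in range(8)]
    print(pdf, bins)
```

Only one table and one cumulative distribution are stored and searched per gather point, which is where the storage and computation savings would come from under these assumptions. Each sample would still be weighted by the reciprocal of its bin's probability when the estimate is formed.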
A complete photon mapping architecture: In Chapter 7, these three techniques were combined into a complete hardware architecture, which was then functionally simulated. It was shown that this architecture would be capable of rendering scenes with complex illumination from a static pre-computed photon map. A target implementation, using two expansion boards in a workstation with a total of 8 replications of a custom designed chip, would be expected to render images of the test scenes at rates of at least 30 frames per second. As the scene geometry, textures and photon map are replicated to each chip, they must fit in the dedicated memory, proposed at 256 MB. This architecture was shown to be feasible, for the expected semiconductor technology of 2010, by measuring and/or calculating the external bandwidth, computation, inter-unit bandwidth and storage requirements.
8.2 Limitations
The techniques and architecture presented in this dissertation are promising, but there are limitations that should be addressed in future research. The first set of limitations concerns the types of applications and scenes that may be efficiently rendered. Because the photon map is pre-computed and broadcast to each rendering chip, the scene and the illumination must remain static. This presents an obstacle to applications where the dynamic motion of objects, not just the viewpoint, is required.
A second limitation on the scenes is the permissible size of the scene representation. The entire scene including material properties and textures must fit in the relatively small, dedicated memories of each rendering chip. This will be a concern for larger environments or scenes highly detailed with geometry or textures.
Scenes with many glossy surfaces or complicated geometry are unable to obtain a significant benefit from irradiance caching. Outdoor natural scenes with leafy plants and glass and chrome office interiors, for example, will require more final gathers. In contrast, importance sampling works best when the surfaces are glossy or the illumination localized to a few regions of the visible hemisphere. The current system relies on the operator to select the number of final gather rays to be generated. An adaptive system would be beneficial, but requires that some computations be fully resolved before determining if others should be computed. It is therefore unclear how to incorporate adaptive sampling with photon gather reordering.
A further limitation of combined importance sampling, as presented in this dissertation, is its use of a tabulated probability distribution function. High frequencies in the surface reflectance or incident radiance functions are lost or aliased, which reduces the efficiency of the sampling strategy. Research should be performed on applying the general idea of merging multiple sampling strategies into one to p.d.f.s that can represent sharp functions.
The architecture simulation was only performed at the functional level. A cycle-accurate simulation of an actual hardware design would provide not only higher confidence in the architecture but also a fuller sense of the cost of implementation, performance of operation, and correctness of design.