Limitations and Extensions - Interactive global illumination on the CPU

This section examines the work presented in this thesis and discusses the limitations of the current approaches, as well as a number of possible extensions and directions for further research.

During the examination of the impact of selective rendering on state-of-the-art ray tracing algorithms, the selective guidance consisted of sub-sampling the image with a regular stride. This provided a simple and controlled mechanism with which to degrade the overall spatial coherence. Of further interest would be the examination of more complex selective guidance methods, such as those outlined in Section 2.3.2, for example the saliency map [YPG01] and the task map [CCW03], to drive the computation. In certain cases, these methods would offer greater spatial coherence than the approach used in Chapter 4, because the samples are unevenly distributed and form groups of spatially coherent samples, potentially leading to improved performance. However, the addition of such a metric, which is not trivial to compute, would introduce a serial critical section into a highly parallel system. Two aspects would therefore need to be examined: the use of other computational resources, such as the GPU, to calculate these guidance metrics, as in Lee et al. [LKC09], and the trade-off between the computation of an expensive selective guidance metric and a more spatially coherent ray distribution.

For the implementation of adaptive interleaved sampling (Chapter 5), the ray tracing kernel could be enhanced with features such as packetisation [WSBW01]. Unlike most other adaptive approaches, the method should adapt well to a faster ray tracing kernel, since the guidance and sampling for each tile are computed at the same time, making it naturally coherent and suited to packetisation. Tracing the rays for the SS and ID components (Sections 5.3.3 and 5.3.2) could be packetised, as multiple rays would share an origin on a light source or at a VPL, owing to the way the methods reuse samples for the same pixel in the IS pattern as it is tiled. Another potential avenue of investigation would be more complex global heuristics. While only local heuristics that utilise information from a single tile and one component were implemented, global heuristics that use information from surrounding tiles as well as multiple components would be of interest. Finally, the applicability of the framework to other algorithms, such as GPU-based global illumination techniques, and how the criteria would need to be adjusted, would be worth examining.
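The regular-stride guidance discussed at the start of this section can be illustrated with a minimal sketch. The function names and the coherence proxy (a count of horizontally adjacent traced pixels) are assumptions made for illustration only; they are not the measures used in Chapter 4.

```python
def stride_mask(width, height, stride):
    """Mark the pixels traced when sub-sampling with a regular stride."""
    return [[x % stride == 0 and y % stride == 0 for x in range(width)]
            for y in range(height)]


def traced_fraction(mask):
    """Fraction of image pixels that are traced."""
    total = sum(len(row) for row in mask)
    return sum(cell for row in mask for cell in row) / total


def adjacent_traced_pairs(mask):
    """Count horizontally adjacent traced pixels.

    Hypothetical, crude proxy for spatial coherence: neighbouring
    primary rays tend to traverse similar parts of the scene.
    """
    return sum(row[x] and row[x + 1]
               for row in mask for x in range(len(row) - 1))


if __name__ == "__main__":
    for stride in (1, 2, 4):
        mask = stride_mask(8, 8, stride)
        print(stride, traced_fraction(mask), adjacent_traced_pairs(mask))
```

Under this proxy, any stride greater than one leaves no adjacent traced pixels at all, whereas a guidance map that clusters its samples would retain some neighbouring pairs for the same sample budget.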

of further research. One aspect common to most caching mechanisms is that the search for cached samples to interpolate from can impact the performance as the cache count increases. This problem is further accentuated since the cache count also affects the cost of the update computation. Since samples computed in earlier frames might not be re-used (if they do not contribute to the current view point), ageing methods similar to those presented by Tawara et al. [TMS04], whereby cached samples that are infrequently used are discarded, would directly improve the performance. The test for ageing would not affect the current temporal instant cache method and could be integrated as part of the update function when cycling through each cached sample (Line 2, Algorithm 1).
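A minimal sketch of how an ageing test might be folded into the per-sample update loop is given below. The `CachedSample` fields, the `max_age` threshold and the `update_cache` function are hypothetical names chosen for illustration; they do not reproduce Algorithm 1 or the criteria of Tawara et al. [TMS04].

```python
from dataclasses import dataclass


@dataclass
class CachedSample:
    position: tuple        # world-space location of the cached sample
    irradiance: float      # cached value (a scalar here for brevity)
    last_used_frame: int   # last frame in which it was interpolated from


def update_cache(cache, current_frame, max_age):
    """Cycle through every cached sample once, discarding aged samples.

    A sample is dropped when it has not been used for interpolation
    for more than `max_age` frames (assumed ageing criterion).
    """
    kept = []
    for sample in cache:
        if current_frame - sample.last_used_frame > max_age:
            continue  # aged out: skip the update and drop the sample
        # The temporal update of the sample would be performed here.
        kept.append(sample)
    return kept


if __name__ == "__main__":
    cache = [CachedSample((0, 0, 0), 1.0, 0),
             CachedSample((1, 0, 0), 0.5, 5),
             CachedSample((2, 0, 0), 0.2, 9)]
    print(len(update_cache(cache, current_frame=10, max_age=4)))
```

Because the test sits inside the loop that already visits every cached sample, it adds only a comparison per sample and also shortens later searches.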

Temporal instant caching supports on-demand re-evaluation of the visibility rays which, although it partially improves performance, still requires major parts of the computation for each sample to be executed every frame. An alternative approach would be to update cached samples only when they are requested for interpolation, so that the update is performed on demand. This would require maintaining a structure recording, for each frame, the VPLs that were invalidated and the objects that moved. The on-demand cached sample update would need to query this structure to identify which visibility rays to update. Ageing would benefit this method by placing an upper bound on the number of frames that the structure would need to store. Computing gradients for the instant cache, and possibly for the temporal method also, could, as with similar gradient methods [WH92], reduce the number of cached samples and help mitigate this issue.
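The per-frame structure described above could take a form along the following lines. This is only a sketch under stated assumptions: `ChangeLog`, `rays_to_retrace` and the conservative rule that any moved object forces all of a sample's visibility rays to be re-traced are illustrative choices, not part of the method as presented.

```python
class ChangeLog:
    """Per-frame record of invalidated VPLs and moved objects.

    Only the most recent `max_frames` frames are kept; ageing of
    cached samples is assumed to guarantee that no surviving sample
    is staler than that window.
    """

    def __init__(self, max_frames):
        self.max_frames = max_frames
        self.frames = {}  # frame number -> (invalid VPL ids, moved object ids)

    def record(self, frame, invalid_vpls, moved_objects):
        self.frames[frame] = (set(invalid_vpls), set(moved_objects))
        for old in [f for f in self.frames if f <= frame - self.max_frames]:
            del self.frames[old]  # outside the ageing window

    def changes_since(self, last_frame):
        """Union of all changes recorded after `last_frame`."""
        vpls, objects = set(), set()
        for frame, (invalid, moved) in self.frames.items():
            if frame > last_frame:
                vpls |= invalid
                objects |= moved
        return vpls, objects


def rays_to_retrace(sample_vpls, last_updated, log):
    """VPL ids whose visibility rays need re-tracing for one sample."""
    invalid, moved = log.changes_since(last_updated)
    if moved:
        # Conservative assumption: a moved object may occlude any ray.
        return set(sample_vpls)
    return set(sample_vpls) & invalid


if __name__ == "__main__":
    log = ChangeLog(max_frames=3)
    log.record(1, {"vpl0"}, set())
    log.record(2, set(), {"chair"})
    log.record(3, {"vpl2"}, set())
    log.record(4, set(), set())
    print(rays_to_retrace({"vpl1", "vpl2"}, last_updated=2, log=log))
```

A practical version would presumably test each ray against the bounds of the moved objects rather than re-tracing every ray, at the cost of storing more per-ray information.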

Both adaptive interleaved sampling and temporal instant caching rely on VPLs in a similar manner to instant radiosity. One of the limitations of methods based on instant radiosity is that their rendering time is, for the most part, linearly dependent on the number of VPLs that are shot. For the temporal instant cache, the situation is further aggravated since the temporal update is also dependent on the number of VPLs. Several approaches have been proposed to improve instant radiosity-based algorithms and alleviate this problem. Importance has been used in the past to direct VPL placement for complex environments [WBS03], thus reducing the number of VPLs that are required to be shot. Temporal awareness was used to ensure that coherence between VPL placements was maintained. This importance would allow the algorithms to operate without large numbers of VPLs for complex, highly occluded scenes. It could be integrated into both algorithms as a pre-process before shooting the VPLs, requiring few changes to the methods as presented and few modifications to the algorithm presented by Wald et al. [WBS03]. Methods such as those presented by Segovia et al. [SIMP06a, SIP07] and Wald et al. [WBS03] could also be investigated to help improve VPL distribution, especially for complex scenes and those with high levels of occlusion. Lightcuts [WFA∗05, WABG06] have been used to cluster point light sources (or VPLs) and reduce the VPL processing count. Their use in conjunction with the temporal instant cache and adaptive interleaved sampling could be mutually beneficial. This may require adding some form of temporal criterion to lightcuts to ensure that the VPL clustering does not change drastically between frames, given the interactivity of the systems. Lightcuts require building the binary tree of clusters and selecting, for each shading point, the most appropriate cut. These operations incur non-negligible overheads that might compromise interactivity; careful optimisation and eventual relaxation of the clustering and cut selection criteria might be required.
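The per-shading-point cut selection mentioned above can be sketched as a greedy refinement of a binary cluster tree. The `Node` layout and, in particular, the `cluster_error` heuristic are placeholders assumed for illustration; they are not the error bound of the lightcuts papers [WFA∗05, WABG06], and no temporal criterion is included.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    intensity: float                 # total intensity of the VPLs below
    position: tuple                  # representative position of the cluster
    radius: float                    # bounding radius of the cluster
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cluster_error(node, point):
    """Placeholder error estimate for shading `point` with one cluster."""
    if node.left is None:
        return 0.0  # a single VPL is evaluated exactly
    d2 = sum((a - b) ** 2 for a, b in zip(node.position, point))
    return node.intensity * node.radius / max(d2, 1e-9)


def select_cut(root, point, max_error):
    """Split the worst cluster until every estimate is within max_error."""
    counter = 0  # tie-breaker so that nodes themselves are never compared
    heap = [(-cluster_error(root, point), counter, root)]
    while True:
        neg_error, _, node = heap[0]
        if -neg_error <= max_error or node.left is None:
            break
        heapq.heappop(heap)
        for child in (node.left, node.right):
            counter += 1
            heapq.heappush(heap, (-cluster_error(child, point), counter, child))
    return [node for _, _, node in heap]


if __name__ == "__main__":
    a, b = Node(1.0, (0, 0, 0), 0.0), Node(1.0, (1, 0, 0), 0.0)
    c, d = Node(1.0, (10, 0, 0), 0.0), Node(1.0, (11, 0, 0), 0.0)
    near = Node(2.0, (0.5, 0, 0), 0.5, a, b)
    far = Node(2.0, (10.5, 0, 0), 0.5, c, d)
    root = Node(4.0, (5.5, 0, 0), 5.5, near, far)
    print(len(select_cut(root, (0.5, 1.0, 0.0), max_error=0.1)))
```

In this toy scene the nearby cluster is split into its individual VPLs while the distant one is shaded as a single light, which is the kind of reduction in VPL processing count the text refers to. The priority-queue traversal is also where the per-point overhead arises.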

Although the wait-free algorithm described in Chapter 7 has shown good scalability with up to eight threads, further investigation would be of interest to identify the limits of this trend by running the algorithm on machines with a larger number of processors sharing the same address space. The memory organisation might also affect the performance of the proposed algorithm, especially with an increased number of threads. Utilisation of the irradiance cache within dynamic environments, i.e., those where geometry might change between frames, would require the ability to remove from the shared data structure records that have become invalid as well as those that are no longer being used. Assessment of wait-free synchronisation algorithms supporting this removal operation would be of great interest.
