Distributed ray tracing - Foundations and Methods for GPU based Image Synthesis

Figure 2.10: Image (a) shows motion blur, depth-of-field, and soft shadows rendered using distributed ray tracing [Cook et al., 1984]. Image (b) illustrates motion blur introduced by camera and object motion.

2.4 Distributed ray tracing

Several phenomena, such as perfect specular reflections and transparent objects, can be rendered using recursive ray tracing [Whitted, 1980] (see Section 2.1.5). To simulate motion blur, depth-of-field, and soft shadows, it is necessary to generate several rays per pixel. Cook et al. [1984] proposed sampling strategies to incorporate these effects. For depth-of-field, each new ray is generated with a uniformly distributed origin on the camera lens, so the simplified pinhole camera model (Section 2.1.1) has to be replaced by a model with a lens and an aperture. In addition, the time domain has to be sampled to create motion blur.

Depth-of-field The basic idea of depth-of-field rendering using distributed ray tracing is to sample the lens aperture and to create a ray from the sampled lens position through the focal point on the focal plane. Therefore, we need to know the focal distance to compute the focal point.

First, we sample a point on the unit disk. To do so, we map a randomly selected point $(\xi_1, \xi_2)$ on the unit square to a point on the unit disk such that the resulting distribution is uniform. The most basic way to map the unit square is a polar mapping such as $r = \sqrt{\xi_1}$ and $\theta = 2\pi\xi_2$. Shirley and Chiu [1997] presented an algorithm that maps the square $(\xi_1, \xi_2) \in [-1, 1]^2$ onto the unit disk such that concentric squares are mapped to concentric circles. To compute the new ray origin, we multiply the sampled point by the lens radius (the size of the lens aperture).
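The two mappings can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, and the concentric variant uses a commonly seen simplified formulation of the Shirley and Chiu mapping that takes its input in [0, 1)^2 and remaps it to [-1, 1]^2 internally.

```python
import math


def sample_disk_polar(xi1, xi2):
    """Polar mapping: r = sqrt(xi1), theta = 2*pi*xi2, with xi1, xi2 in [0, 1]."""
    r = math.sqrt(xi1)
    theta = 2.0 * math.pi * xi2
    return r * math.cos(theta), r * math.sin(theta)


def sample_disk_concentric(xi1, xi2):
    """Concentric mapping (assumed simplified form): squares map to circles."""
    # Remap the unit square [0, 1]^2 to [-1, 1]^2.
    a = 2.0 * xi1 - 1.0
    b = 2.0 * xi2 - 1.0
    if a == 0.0 and b == 0.0:
        return 0.0, 0.0  # the square's center maps to the disk's center
    if abs(a) > abs(b):
        r = a
        theta = (math.pi / 4.0) * (b / a)
    else:
        r = b
        theta = math.pi / 2.0 - (math.pi / 4.0) * (a / b)
    # r may be negative; cos/sin then place the point in the opposite wedge.
    return r * math.cos(theta), r * math.sin(theta)


def sample_lens(xi1, xi2, lens_radius):
    """Scale the unit-disk sample by the lens radius to obtain a lens offset."""
    x, y = sample_disk_concentric(xi1, xi2)
    return lens_radius * x, lens_radius * y
```

Both functions yield a uniform distribution on the disk. The concentric variant additionally keeps neighboring samples on the square close together on the disk, which helps when stratified samples are used.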

To create the new ray direction, we assume that the focal plane is perpendicular to the z-axis. This simplifies the computation of the focal point $\vec{fp}$, as we know that a ray which passes through the center of the lens is not refracted, i.e., its direction is not changed (see Section 2.2.1). We need to compute the intersection of the original ray $r$, with origin $\vec{o}_r$ and direction $\vec{d}_r$, with the focal plane using the focal distance $i$ as follows:

$$\vec{fp} = \vec{o}_r + \frac{i}{d_{z_r}} \, \vec{d}_r, \tag{2.25}$$

where $d_{z_r}$ denotes the z-component of $\vec{d}_r$.
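The construction of a depth-of-field ray can be sketched as follows. This is a minimal example under stated assumptions: the ray is given in camera space with the lens center at the ray origin, the camera looks along the positive z-axis, and the focal plane lies at z equal to the focal distance. The function name `thin_lens_ray` and its parameters are hypothetical.

```python
import math


def thin_lens_ray(origin, direction, lens_point, focal_distance):
    """Return (new_origin, new_direction, focal_point) for a lens sample.

    origin, direction: the original ray through the lens center (camera space).
    lens_point: (u, v) offset on the lens, already scaled by the lens radius.
    focal_distance: distance of the focal plane along z (assumed > 0).
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0.0:
        raise ValueError("ray is parallel to the focal plane")

    # Focal point: follow the original ray until it reaches the focal plane.
    s = focal_distance / dz
    focal_point = (ox + s * dx, oy + s * dy, oz + s * dz)

    # New origin: shift the ray origin by the sampled lens offset.
    u, v = lens_point
    new_origin = (ox + u, oy + v, oz)

    # New direction: from the lens sample towards the focal point, normalized.
    d = tuple(f - o for f, o in zip(focal_point, new_origin))
    length = math.sqrt(sum(c * c for c in d))
    new_direction = tuple(c / length for c in d)
    return new_origin, new_direction, focal_point
```

All rays generated for one pixel meet in the same focal point, so geometry on the focal plane stays sharp while geometry in front of or behind it is blurred.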

Motion blur In addition to the spatial domain (lens position (u, v)), the time domain t is sampled to incorporate motion blur. All that is needed is to calculate the position of an object at a time t between two transformations given at t = 0 and t = 1. This calculation can be arbitrarily complex because, in general, the motion of an object does not have to be linear. To simplify the problem, it is possible to assume that the complex motion is piecewise linear, so we can simply interpolate between t = 0 and t = 1. Afterwards, the intersection of the transformed object with the ray has to be computed.
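As a minimal sketch of this idea, the following example linearly interpolates the center of a sphere between its positions at t = 0 and t = 1 and then intersects the ray with the sphere at the sampled time. The sphere primitive, the function names, and the assumption of a normalized ray direction are illustrative choices and are not taken from the text.

```python
import math


def lerp(p0, p1, t):
    """Linear interpolation between two points for t in [0, 1]."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))


def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter or None (direction normalized)."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for s in (-b - root, -b + root):
        if s > 1e-6:
            return s
    return None  # the sphere lies behind the ray origin


def intersect_moving_sphere(origin, direction, center0, center1, radius, time):
    """Move the sphere to its position at the sampled time, then intersect."""
    center = lerp(center0, center1, time)
    return intersect_sphere(origin, direction, center, radius)
```

Each ray carries its own time sample, for example `time = random.random()`, so averaging many rays per pixel integrates over the shutter interval.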

To accelerate distributed ray tracing, one can use specialized acceleration structures that encompass the motion trajectory. For example, a BVH can be constructed for a certain t (e.g., t = 0.5). During ray traversal, each bounding box must then be interpolated according to t. A tighter bounding volume can reduce the number of intersection tests, but the intersection test itself becomes more complex, and therefore the overall performance can suffer. Olsson [2007] extended kD-trees by adding a temporal split in the time domain. The increasing number of object references introduces a significant memory overhead, which limits the practical applicability of this approach. Grünschloß et al. [2011] proposed a 4D space-time extension to the spatial split BVH algorithm [Stich et al., 2009].
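The interpolation of bounding boxes during traversal can be illustrated with a short sketch. It assumes that each BVH node stores one axis-aligned box for t = 0 and one for t = 1 and that the motion in between is linear; this storage layout and the function names are assumptions made for illustration, not a description of a specific published method.

```python
import math


def lerp_box(box0, box1, t):
    """Interpolate two axis-aligned boxes, each given as (lower, upper) corners."""
    (lo0, hi0), (lo1, hi1) = box0, box1
    lo = tuple((1.0 - t) * a + t * b for a, b in zip(lo0, lo1))
    hi = tuple((1.0 - t) * a + t * b for a, b in zip(hi0, hi1))
    return lo, hi


def ray_hits_box(origin, direction, box, t_max=math.inf):
    """Slab test: True if the ray overlaps the box within [0, t_max]."""
    lo, hi = box
    t_near, t_far = 0.0, t_max
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab; it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0 = (l - o) / d
        t1 = (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

The extra interpolation per visited node is the added cost mentioned above: the box is tighter than one enclosing the whole trajectory, but every box test becomes more expensive.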

Part I

Chapter 3. Motivation

With the introduction of the programmable rendering pipeline, general-purpose computing on GPUs became available; it is no longer used only for computer graphics but has also gained wide acceptance in the high-performance and scientific computing community. In this part of the thesis, we introduce the current GPU architecture, discuss its implications for programming models, describe the mapping of ray tracing to a GPU, and outline possible drawbacks. We present a dynamic memory allocation algorithm (Chapter 4) for arbitrary allocation requests, optimized for many-core architectures such as GPUs. The allocator can be applied not only to ray tracing but to high-performance computing as well. In Chapter 5, we introduce a dynamic load balancing and distribution algorithm to support interactive applications in a GPU cluster environment. We use a classic bidirectional path tracer to evaluate the scalability and performance of our approach.

3.1 Introduction

Parallel architectures can be classified using Flynn’s taxonomy [Flynn, 1972]. They are based on either the single instruction, multiple data (SIMD) model or the multiple instructions, multiple data (MIMD) model. In the SIMD model, multiple processing units execute the same instruction on different data in parallel; in the MIMD model, they execute different instructions. Depending on the viewpoint, current CPUs and GPUs can be grouped into different classes. A single CPU core works in SIMD fashion, while multiple cores can execute different instructions in parallel and are therefore classified as MIMD; the same holds at larger scales, up to clusters of compute units.
