GPU Gems 3


GPU Gems 3 - Graphically Rich Book

Copyright

Foreword

Preface

Contributors

Part I: Geometry

Chapter 1. Generating Complex Procedural Terrains Using the GPU

Section 1.1. Introduction

Section 1.2. Marching Cubes and the Density Function

Section 1.3. An Overview of the Terrain Generation System

Section 1.4. Generating the Polygons Within a Block of Terrain

Section 1.5. Texturing and Shading

Section 1.6. Considerations for Real-World Applications

Section 1.7. Conclusion

Section 1.8. References

Chapter 2. Animated Crowd Rendering

Section 2.1. Motivation

Section 2.2. A Brief Review of Instancing

Section 2.3. Details of the Technique

Section 2.4. Other Considerations

Section 2.5. Conclusion

Section 2.6. References

Chapter 3. DirectX 10 Blend Shapes: Breaking the Limits

Section 3.1. Introduction

Section 3.2. How Does It Work?

Section 3.3. Running the Sample

Section 3.4. Performance

Section 3.5. References

Chapter 4. Next-Generation SpeedTree Rendering

Section 4.1. Introduction

Section 4.2. Silhouette Clipping

Section 4.3. Shadows

Section 4.4. Leaf Lighting

Section 4.5. High Dynamic Range and Antialiasing

Section 4.6. Alpha to Coverage

Section 4.7. Conclusion

Section 4.8. References

Chapter 5. Generic Adaptive Mesh Refinement

Section 5.1. Introduction

Section 5.2. Overview

Section 5.3. Adaptive Refinement Patterns

Section 5.4. Rendering Workflow

Section 5.5. Results

Section 5.6. Conclusion and Improvements

Section 5.7. References

Chapter 6. GPU-Generated Procedural Wind Animations for Trees

Section 6.1. Introduction

Section 6.2. Procedural Animations on the GPU

Section 6.3. A Phenomenological Approach

Section 6.4. The Simulation Step

Section 6.5. Rendering the Tree

Section 6.6. Analysis and Comparison

Section 6.7. Summary

Section 6.8. References

Chapter 7. Point-Based Visualization of Metaballs on a GPU

Section 7.1. Metaballs, Smoothed Particle Hydrodynamics, and Surface Particles

Section 7.2. Constraining Particles

Section 7.3. Local Particle Repulsion

Section 7.4. Global Particle Dispersion


Section 7.5. Performance

Section 7.6. Rendering

Section 7.7. Conclusion

Section 7.8. References

Part II: Light and Shadows

Chapter 8. Summed-Area Variance Shadow Maps

Section 8.1. Introduction

Section 8.2. Related Work

Section 8.3. Percentage-Closer Filtering

Section 8.4. Variance Shadow Maps

Section 8.5. Summed-Area Variance Shadow Maps

Section 8.6. Percentage-Closer Soft Shadows

Section 8.7. Conclusion

Section 8.8. References

Chapter 9. Interactive Cinematic Relighting with Global Illumination

Section 9.1. Introduction

Section 9.2. An Overview of the Algorithm

Section 9.3. Gather Samples

Section 9.4. One-Bounce Indirect Illumination

Section 9.5. Wavelets for Compression

Section 9.6. Adding Multiple Bounces

Section 9.7. Packing Sparse Matrix Data

Section 9.8. A GPU-Based Relighting Engine

Section 9.9. Results

Section 9.10. Conclusion

Section 9.11. References

Chapter 10. Parallel-Split Shadow Maps on Programmable GPUs

Section 10.1. Introduction

Section 10.2. The Algorithm

Section 10.3. Hardware-Specific Implementations

Section 10.4. Further Optimizations


Section 10.5. Results

Section 10.6. Conclusion

Section 10.7. References

Chapter 11. Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders

Section 11.1. Introduction

Section 11.2. An Overview of Shadow Volumes

Section 11.3. Our Implementation

Section 11.4. Conclusion

Section 11.5. References

Chapter 12. High-Quality Ambient Occlusion

Section 12.1. Review

Section 12.2. Problems

Section 12.3. A Robust Solution

Section 12.4. Results

Section 12.5. Performance

Section 12.6. Caveats

Section 12.7. Future Work

Section 12.8. References

Chapter 13. Volumetric Light Scattering as a Post-Process

Section 13.1. Introduction

Section 13.2. Crepuscular Rays

Section 13.3. Volumetric Light Scattering

Section 13.4. The Post-Process Pixel Shader

Section 13.5. Screen-Space Occlusion Methods

Section 13.6. Caveats

Section 13.7. The Demo

Section 13.8. Extensions

Section 13.9. Summary

Section 13.10. References

Part III: Rendering


Chapter 14. Advanced Techniques for Realistic Real-Time Skin Rendering

Section 14.1. The Appearance of Skin

Section 14.2. An Overview of the Skin-Rendering System

Section 14.3. Specular Surface Reflectance

Section 14.4. Scattering Theory

Section 14.5. Advanced Subsurface Scattering

Section 14.6. A Fast Bloom Filter

Section 14.7. Conclusion

Section 14.8. References

Chapter 15. Playable Universal Capture

Section 15.1. Introduction

Section 15.2. The Data Acquisition Pipeline

Section 15.3. Compression and Decompression of the Animated Textures

Section 15.4. Sequencing Performances

Section 15.5. Conclusion

Section 15.6. References

Chapter 16. Vegetation Procedural Animation and Shading in Crysis

Section 16.1. Procedural Animation

Section 16.2. Vegetation Shading

Section 16.3. Conclusion

Section 16.4. References

Chapter 17. Robust Multiple Specular Reflections and Refractions

Section 17.1. Introduction

Section 17.2. Tracing Secondary Rays

Section 17.3. Reflections and Refractions

Section 17.4. Results

Section 17.5. Conclusion

Section 17.6. References

Chapter 18. Relaxed Cone Stepping for Relief Mapping

Section 18.1. Introduction

Section 18.2. A Brief Review of Relief Mapping

Section 18.3. Cone Step Mapping

Section 18.4. Relaxed Cone Stepping

Section 18.5. Conclusion

Section 18.6. References

Chapter 19. Deferred Shading in Tabula Rasa

Section 19.1. Introduction

Section 19.2. Some Background

Section 19.3. Forward Shading Support

Section 19.4. Advanced Lighting Features

Section 19.5. Benefits of a Readable Depth and Normal Buffer

Section 19.6. Caveats

Section 19.7. Optimizations

Section 19.8. Issues

Section 19.9. Results

Section 19.10. Conclusion

Section 19.11. References

Chapter 20. GPU-Based Importance Sampling

Section 20.1. Introduction

Section 20.2. Rendering Formulation

Section 20.3. Quasirandom Low-Discrepancy Sequences

Section 20.4. Mipmap Filtered Samples

Section 20.5. Performance

Section 20.6. Conclusion

Section 20.7. Further Reading and References

Part IV: Image Effects

Chapter 21. True Impostors

Section 21.1. Introduction

Section 21.2. Algorithm and Implementation Details

Section 21.3. Results


Section 21.4. Conclusion

Section 21.5. References

Chapter 22. Baking Normal Maps on the GPU

Section 22.1. The Traditional Implementation

Section 22.2. Acceleration Structures

Section 22.3. Feeding the GPU

Section 22.4. Implementation

Section 22.5. Results

Section 22.6. Conclusion

Section 22.7. References

Chapter 23. High-Speed, Off-Screen Particles

Section 23.1. Motivation

Section 23.2. Off-Screen Rendering

Section 23.3. Downsampling Depth

Section 23.4. Depth Testing and Soft Particles

Section 23.5. Alpha Blending

Section 23.6. Mixed-Resolution Rendering

Section 23.7. Results

Section 23.8. Conclusion

Section 23.9. References

Chapter 24. The Importance of Being Linear

Section 24.1. Introduction

Section 24.2. Light, Displays, and Color Spaces

Section 24.3. The Symptoms

Section 24.4. The Cure

Section 24.5. Conclusion

Section 24.6. Further Reading

Chapter 25. Rendering Vector Art on the GPU

Section 25.1. Introduction

Section 25.2. Quadratic Splines

Section 25.3. Cubic Splines


Section 25.4. Triangulation

Section 25.5. Antialiasing

Section 25.6. Code

Section 25.7. Conclusion

Section 25.8. References

Chapter 26. Object Detection by Color: Using the GPU for Real-Time Video Image Processing

Section 26.1. Image Processing Abstracted

Section 26.2. Object Detection by Color

Section 26.3. Conclusion

Section 26.4. Further Reading

Chapter 27. Motion Blur as a Post-Processing Effect

Section 27.1. Introduction

Section 27.2. Extracting Object Positions from the Depth Buffer

Section 27.3. Performing the Motion Blur

Section 27.4. Handling Dynamic Objects

Section 27.5. Masking Off Objects

Section 27.6. Additional Work

Section 27.7. Conclusion

Section 27.8. References

Chapter 28. Practical Post-Process Depth of Field

Section 28.1. Introduction

Section 28.2. Related Work

Section 28.3. Depth of Field

Section 28.4. Evolution of the Algorithm

Section 28.5. The Complete Algorithm

Section 28.6. Conclusion

Section 28.7. Limitations and Future Work

Section 28.8. References

Part V: Physics Simulation

Chapter 29. Real-Time Rigid Body Simulation on GPUs

Section 29.1. Introduction

Section 29.2. Rigid Body Simulation on the GPU

Section 29.3. Applications

Section 29.4. Conclusion

Section 29.5. Appendix

Section 29.6. References

Chapter 30. Real-Time Simulation and Rendering of 3D Fluids

Section 30.1. Introduction

Section 30.2. Simulation

Section 30.3. Rendering

Section 30.4. Conclusion

Section 30.5. References

Chapter 31. Fast N-Body Simulation with CUDA

Section 31.1. Introduction

Section 31.2. All-Pairs N-Body Simulation

Section 31.3. A CUDA Implementation of the All-Pairs N-Body Algorithm

Section 31.4. Performance Results

Section 31.5. Previous Methods Using GPUs for N-Body Simulation

Section 31.6. Hierarchical N-Body Methods

Section 31.7. Conclusion

Section 31.8. References

Chapter 32. Broad-Phase Collision Detection with CUDA

Section 32.1. Broad-Phase Algorithms

Section 32.2. A CUDA Implementation of Spatial Subdivision

Section 32.3. Performance Results

Section 32.4. Conclusion

Section 32.5. References

Chapter 33. LCP Algorithms for Collision Detection Using CUDA

Section 33.1. Parallel Processing

Section 33.2. The Physics Pipeline

Section 33.3. Determining Contact Points

Section 33.4. Mathematical Optimization

Section 33.5. The Convex Distance Calculation

Section 33.6. The Parallel LCP Solution Using CUDA

Section 33.7. Results

Section 33.8. References

Chapter 34. Signed Distance Fields Using Single-Pass GPU Scan Conversion of Tetrahedra

Section 34.1. Introduction

Section 34.2. Leaking Artifacts in Scan Methods

Section 34.3. Our Tetrahedra GPU Scan Method

Section 34.4. Results

Section 34.5. Conclusion

Section 34.6. Future Work

Section 34.7. Further Reading

Section 34.8. References

Part VI: GPU Computing

Chapter 35. Fast Virus Signature Matching on the GPU

Section 35.1. Introduction

Section 35.2. Pattern Matching

Section 35.3. The GPU Implementation

Section 35.4. Results

Section 35.5. Conclusions and Future Work

Section 35.6. References

Chapter 36. AES Encryption and Decryption on the GPU

Section 36.1. New Functions for Integer Stream Processing

Section 36.2. An Overview of the AES Algorithm

Section 36.3. The AES Implementation on the GPU

Section 36.4. Performance

Section 36.5. Considerations for Parallelism

Section 36.6. Conclusion and Future Work


Section 36.7. References

Chapter 37. Efficient Random Number Generation and Application Using CUDA

Section 37.1. Monte Carlo Simulations

Section 37.2. Random Number Generators

Section 37.3. Example Applications

Section 37.4. Conclusion

Section 37.5. References

Chapter 38. Imaging Earth's Subsurface Using CUDA

Section 38.1. Introduction

Section 38.2. Seismic Data

Section 38.3. Seismic Processing

Section 38.4. The GPU Implementation

Section 38.5. Performance

Section 38.6. Conclusion

Section 38.7. References

Chapter 39. Parallel Prefix Sum (Scan) with CUDA

Section 39.1. Introduction

Section 39.2. Implementation

Section 39.3. Applications of Scan

Section 39.4. Conclusion

Section 39.5. References

Chapter 40. Incremental Computation of the Gaussian

Section 40.1. Introduction and Related Work

Section 40.2. Polynomial Forward Differencing

Section 40.3. The Incremental Gaussian Algorithm

Section 40.4. Error Analysis

Section 40.5. Performance

Section 40.6. Conclusion

Section 40.7. References

Chapter 41. Using the Geometry Shader for Compact and Variable-Length GPU Feedback

Section 41.1. Introduction

Section 41.2. Why Use the Geometry Shader?

Section 41.3. Dynamic Output with the Geometry Shader

Section 41.4. Algorithms and Applications

Section 41.5. Benefits: GPU Locality and SLI

Section 41.6. Performance and Limits

Section 41.7. Conclusion

Section 41.8. References

Addison-Wesley Warranty on the DVD

NVIDIA Statement on the Software

DVD System Requirements

Inside Back Cover

Geometry

Light and Shadows

Rendering

Image Effects

Physics Simulation

GPU Computing

Index

A

B

C

D

E

F

G

H

I

J


K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Z


GPU Gems 3

by Hubert Nguyen

Publisher: Addison Wesley Professional

Pub Date: August 02, 2007

Print ISBN-10: 0-321-51526-9

Print ISBN-13: 978-0-321-51526-1

eText ISBN-10: 0-321-54542-7

eText ISBN-13: 978-0-321-54542-8

Pages: 1008


Overview

"The GPU Gems series features a collection of the most essential algorithms required by Next-Generation 3D Engines."

—Martin Mittring, Lead Graphics Programmer, Crytek

This third volume of the best-selling GPU Gems series provides a snapshot of today's latest Graphics Processing Unit (GPU) programming techniques. The programmability of modern GPUs allows developers to not only distinguish themselves from one another but also to use this awesome processing power for non-graphics applications, such as physics simulation, financial analysis, and even virus detection—particularly with the CUDA architecture. Graphics remains the leading application for GPUs, and readers will find that the latest algorithms create ultra-realistic characters, better lighting, and post-rendering compositing effects.

Major topics include

Geometry

Light and Shadows

Rendering

Image Effects

Physics Simulation

GPU Computing


3Dfacto

Adobe Systems

Apple

Budapest University of Technology and Economics

CGGVeritas

The Chinese University of Hong Kong

Cornell University

Crytek

Czech Technical University in Prague

Dartmouth College

Digital Illusions Creative Entertainment

Eindhoven University of Technology

Electronic Arts

Havok

Helsinki University of Technology

Imperial College London

Infinity Ward

Juniper Networks

LaBRI–INRIA, University of Bordeaux

mental images

Microsoft Research

Move Interactive

NCsoft Corporation

NVIDIA Corporation

Perpetual Entertainment

Playlogic Game Factory

Polytime

Rainbow Studios

SEGA Corporation

UFRGS (Brazil)

Ulm University

University of California, Davis

University of Central Florida

University of Copenhagen

University of Girona

University of Illinois at Urbana-Champaign

University of North Carolina Chapel Hill

University of Tokyo

University of Waterloo

Section Editors include NVIDIA engineers: Cyril Zeller, Evan Hart, Ignacio Castaño, Kevin Bjorke, Kevin Myers, and Nolan Goodnight.


The accompanying DVD includes complementary examples and sample programs.


Copyright

About the Cover: The image on the cover has been rendered in real time in the "Human Head" technology demonstration created by the NVIDIA Demo Team. It illustrates the extreme level of realism achievable with the GeForce 8 Series of GPUs. The demo renders skin by using a physically based model that was previously used only in high-profile prerendered movie projects. Actor Doug Jones is the model represented in the demo. He recently starred as the Silver Surfer in Fantastic Four: Rise of the Silver Surfer.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

GeForce™, CUDA™, and NVIDIA Quadro® are trademarks or registered trademarks of NVIDIA Corporation.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

NVIDIA makes no warranty or representation that the techniques described herein are free from any Intellectual Property claims. The reader assumes all risk of any such claims based on his or her use of these techniques.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales

(800) 382-3419

[email protected]

For sales outside of the United States, please contact:

International Sales

[email protected]

Visit us on the Web: www.awprofessional.com


GPU gems 3 / edited by Hubert Nguyen. p. cm.

Includes bibliographical references and index.

ISBN-13: 978-0-321-51526-1 (hardback : alk. paper) ISBN-10: 0-321-51526-9

1. Computer graphics. 2. Real-time programming. I. Nguyen, Hubert.

T385.G6882 2007 006.6'6—dc22

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.

Rights and Contracts Department

501 Boylston Street, Suite 900

Boston, MA 02116

Fax: (617) 671-3447

ISBN-13: 978-0-321-51526-1

Text printed in the United States on recycled paper at Courier in Kendallville, Indiana.


Foreword

Composition, the organization of elemental operations into a nonobvious whole, is the essence of imperative programming. The instruction set architecture (ISA) of a microprocessor is a versatile composition interface, which programmers of software renderers have used effectively and creatively in their quest for image realism. Early graphics hardware increased rendering performance, but often at a high cost in composability, and thus in programmability and application innovation. Hardware with microprocessor-like programmability did evolve (for example, the Ikonas Graphics System), but the dominant form of graphics hardware acceleration has been organized around a fixed sequence of rendering operations, often referred to as the graphics pipeline. Early interfaces to these systems—such as CORE and later, PHIGS—allowed programmers to specify rendering results, but they were not designed for composition.

OpenGL, which I helped to evolve from its Silicon Graphics-defined predecessor IRIS GL in the early 1990s, addressed the need for composability by specifying an architecture (informally called the OpenGL Machine) that was accessed through an imperative programmatic interface. Many features—for example, tightly specified semantics; table-driven operations such as stencil and depth-buffer functions; texture mapping exposed as a general 1D, 2D, and 3D lookup function; and required repeatability properties—ensured that programmers could compose OpenGL operations with powerful and reliable results. Some of the useful techniques that OpenGL enabled include texture-based volume rendering, shadow volumes using stencil buffers, and constructive solid geometry algorithms such as capping (the computation of surface planes at the intersections of clipping planes and solid objects defined by polygons). Ultimately, Mark Peercy and the coauthors of the SIGGRAPH 2000 paper "Interactive Multi-Pass Programmable Shading" demonstrated that arbitrary RenderMan shaders could be accelerated through the composition of OpenGL rendering operations.

During this decade, increases in the raw capability of integrated circuit technology allowed the OpenGL architecture (and later, Direct3D) to be extended to expose an ISA interface. These extensions appeared as programmable vertex and fragment shaders within the graphics pipeline and now, with the introduction of CUDA, as a data-parallel ISA in near parity with that of the microprocessor. Although the cycle toward complete microprocessor-like versatility is not complete, the tremendous power of graphics hardware acceleration is more accessible than ever to programmers. And what computational power it is! At this writing, the NVIDIA GeForce 8800 Ultra performs over 400 billion floating-point operations per second—more than the most powerful supercomputer available a decade ago, and five times more than today's most powerful microprocessor. The data-parallel programming model the Ultra supports allows its computational power to be harnessed without concern for the number of processors employed. This is critical, because while today's Ultra already includes over 100 processors, tomorrow's will include thousands, and then more. With no end in sight to the annual compounding of integrated circuit density known as Moore's Law, massively parallel systems are clearly the future of computing, with graphics hardware leading the way.

GPU Gems 3 is a collection of state-of-the-art GPU programming examples. It is about putting data-parallel processing to work. The first four sections focus on graphics-specific applications of GPUs in the areas of geometry, lighting and shadows, rendering, and image effects. Topics in the fifth and sixth sections broaden the scope by providing concrete examples of nongraphical applications that can now be addressed with data-parallel GPU technology. These applications are diverse, ranging from rigid-body simulation to fluid flow simulation, from virus signature matching to encryption and decryption, and from random number generation to computation of the Gaussian.

Where is this all leading? The cover art reminds us that the mind remains the most capable parallel computing system of all. A long-term goal of computer science is to achieve and, ultimately, to surpass the capabilities of the human mind. It's exciting to think that the computer graphics community, as we identify, address, and master the challenges of massively parallel computing, is contributing to the realization of this dream.


Preface

It has been only three years since the first GPU Gems book was introduced, and some areas of real-time graphics have truly become ultrarealistic. Chapter 14, "Advanced Techniques for Realistic Real-Time Skin Rendering," illustrates this evolution beautifully, describing a skin rendering technique that works so well that the data acquisition and animation will become the most challenging problem in rendering human characters for the next couple of years.

All this progress has been fueled by a sustained rhythm of GPU innovation. These processing units continue to become faster and more flexible in their use. Today's GPUs can process enormous amounts of data and are used not only for rendering 3D scenes, but also for processing images or performing massively parallel computing, such as financial statistics or terrain analysis for finding new oil fields.

Whether they are used for computing or graphics, GPUs need a software interface to drive them, and we are in the midst of an important transition. The new generation of APIs brings additional orthogonality and exposes new capabilities such as generating geometry programmatically. On the computing side, the CUDA architecture lets developers use a C-like language to perform computing tasks rather than forcing the programmer to use the graphics pipeline. This architecture will allow developers without a graphics background to tap into the immense potential of the GPU.

More than 200 chapters were submitted by the GPU programming community, covering a large spectrum of GPU usage ranging from pure 3D rendering to nongraphics applications. Each of them went through a rigorous review process conducted both by NVIDIA's engineers and by external reviewers.

We were able to include 41 chapters, each of which went through another review, during which feedback from the editors and peer reviewers often significantly improved the content. Unfortunately, we could not include some excellent chapters, simply because of the book's space restrictions. It was difficult to establish the final table of contents, but we would like to thank everyone who sent a submission.

Intended Audience

For the graphics-related chapters, we expect the reader to be familiar with the fundamentals of computer graphics including graphics APIs such as DirectX and OpenGL, as well as their associated high-level programming languages,


will find in this book a wealth of applicable techniques for today's and tomorrow's GPUs.

Readers interested in computing and CUDA should be familiar with parallel computing concepts. C programming knowledge is also expected.

Trying the Code Samples

GPU Gems 3 comes with a disc that includes samples, movies, and other demonstrations of the techniques described in this book. You can also go to the book's Web page to find the latest updates and supplemental materials: developer.nvidia.com/gpugems3.

Acknowledgments

This book represents the dedication of many people, especially the numerous authors who submitted their most recent work to the GPU community by contributing to this book. Without a doubt, these inspirational and powerful chapters will help thousands of developers push the envelope in their applications.

Our section editors (Cyril Zeller, Evan Hart, Ignacio Castaño Aguado, Kevin Bjorke, Kevin Myers, and Nolan Goodnight) took on an invaluable role, providing authors with feedback and guidance to make the chapters as good as they could be. Without their expertise and contributions above and beyond their usual workload, this book could not have been published.

Ensuring the clarity of GPU Gems 3 required numerous diagrams, illustrations, and screen shots. A lot of diligence went into unifying the graphic style of about 500 figures, and we thank Michael Fornalski and Jim Reed for their wonderful work on these. We are grateful to Huey Nguyen and his team for their support for many of our projects. We also thank Rory Loeb for his contribution to the amazing book cover design and many other graphic elements of the book.

We would also like to thank Catherine Kilkenny and Teresa Saffaie for tremendous help with copyediting as chapters were being worked on. Randy Fernando, the editor of the previous GPU Gems books, shared his wealth of experience acquired in producing those volumes.

We are grateful to Kurt Akeley for writing our insightful and forward-looking foreword.


managed this project to completion before handing the marketing aspect to Curt Johnson. Christopher Keane did fantastic work on the copyediting and typesetting.

The support from many executive staff members from NVIDIA was critical to this endeavor: Tony Tamasi and Dan Vivoli continually value the creation of educational material and provided the resources necessary to accomplish this project.

We are grateful to Jen-Hsun Huang for his continued support of the GPU Gems series and for creating an environment that encourages innovation and teamwork.

We also thank everyone at NVIDIA for their support and for continually building the technology that changes the way people think about computing.


Contributors

Thomas Alexander, Polytime

Thomas Alexander cofounded Exapath, a startup focused on mapping networking algorithms onto GPGPUs. Previously he was at Juniper Networks working in the Infrastructure Product Group building core routers. Thomas has a Ph.D. in electrical engineering from Duke University, where he also worked on a custom-built parallel machine for ray casting.

Kavita Bala, Cornell University

Kavita Bala is an assistant professor in the Computer Science Department and Program of Computer Graphics at Cornell University. Bala specializes in scalable rendering for high-complexity illumination, interactive global illumination, perceptually based rendering, and image-based texturing. Bala has published research papers and served on the program committees of several conferences, including SIGGRAPH. In 2005, Bala cochaired the Eurographics Symposium on Rendering. She has coauthored the graduate-level textbook Advanced Global Illumination, 2nd ed. (A K Peters, 2006). Before Cornell, Bala received her S.M. and Ph.D. from the Massachusetts Institute of Technology, and her B.Tech. from the Indian Institute of Technology Bombay.

Kevin Bjorke, NVIDIA Corporation

Kevin Bjorke is a member of the Technology Evangelism group at NVIDIA, and continues his roles as editor and contributor to the previous volumes of GPU Gems. He has a broad background in production of both live-action and


games. Kevin has been a regular speaker at events such as SIGGRAPH and GDC since the mid-1980s. His current work focuses on applying NVIDIA's horsepower and expertise to help developers fulfill their individual ambitions.

Jean-Yves Blanc, CGGVeritas

Jean-Yves Blanc received a Ph.D. in applied mathematics in 1991 from the Institut National Polytechnique de Grenoble, France. He joined CGG in 1992, where he introduced and developed parallel processing for high-performance computing seismic applications. He is now in charge of IT strategy for the Processing and Reservoir product line.

Jim Blinn, Microsoft Research

Jim Blinn began doing computer graphics in 1968 while an undergraduate at the University of Michigan. In 1974 he became a graduate student at the University of Utah, where he did research in specular lighting models, bump mapping, and environment/reflection mapping and received a Ph.D. in 1977. He then went to JPL and produced computer graphics animations for various space missions to Jupiter, Saturn, and Uranus, as well as for Carl Sagan's PBS series "Cosmos" and for the Annenberg/CPB-funded project "The Mechanical Universe," a 52-part telecourse to teach college-level physics. During these productions he developed several other techniques, including work in cloud simulation, displacement mapping, and a modeling scheme variously called blobbies or metaballs. Since 1987 he has written a regular column in the IEEE Computer Graphics and Applications journal, where he describes mathematical techniques used in computer graphics. He has just published his third volume of collected articles from this series. In 1995 he joined Microsoft Research as a Graphics Fellow. He is a MacArthur Fellow, a member of the National Academy of Engineering, has an honorary Doctor of Fine Arts degree from Otis Parsons School of Design, and has received both the SIGGRAPH Computer Graphics Achievement Award (1983) and the Steven A. Coons Award (1999).


George Borshukov is a CG supervisor at Electronic Arts. He holds an M.S. from the University of California, Berkeley, where he was one of the creators of The Campanile Movie and real-time demo (1997). He was technical designer for the "bullet time" sequences in The Matrix (1999) and received an Academy Scientific and Technical Achievement Award for the image-based rendering technology used in the film. Borshukov led the development of photoreal digital actors for The Matrix sequels (2003) and received a Visual Effects Society Award for the design and application of the Universal Capture system in those films. Other film credits include What Dreams May Come (1998), Mission: Impossible 2 (2000), and Michael Jordan to the Max (2000). He is also a co-inventor of the UV pelting approach for parameterization and seamless texturing of polygonal or subdivision surfaces. He joined Electronic Arts in 2004 to focus on setting a new standard for facial capture, animation, and rendering in next-generation interactive entertainment. He conceived the Fight Night Round 3 concept and the Tiger Woods tech demos presented at Sony's E3 events in 2005 and 2006.

Tamy Boubekeur, LaBRI–INRIA, University of Bordeaux

Tamy Boubekeur is a third-year Ph.D. student in computer science at INRIA in Bordeaux, France. He received an M.Sc. in computer science from the University of Bordeaux in 2004. His current research focuses on 3D geometry processing and real-time rendering. He has developed new algorithms and data structures for the 3D acquisition pipeline, publishing several scientific papers in the fields of efficient processing and interactive editing of large 3D objects, hierarchical space subdivision structures, point-based graphics, and real-time surface refinement methods. He also teaches geometric modeling and virtual reality at the University of Bordeaux.


Ralph Brunner graduated from the Swiss Federal Institute of Technology (ETH) Zürich with an M.Sc. degree in computer science. He left the country after the bear infestation made the major cities uninhabitable and has been working in California on the graphics stack of Mac OS X since then.

Iain Cantlay, NVIDIA Corporation

Iain started his career in flight simulation, when 250 polys per frame was state of the art. With the advent of consumer-level 3D hardware, he moved to writing game engines, with published titles including Machines and MotoGP 3. In 2005 he moved to the Developer Technology group at NVIDIA, which is the perfect place to combine his passions for games and 3D graphics.

Ignacio Castaño Aguado, NVIDIA Corporation

Ignacio Castaño Aguado is an engineer in the Developer Technology group at NVIDIA. When not playing Go against his coworkers or hiking across the Santa Cruz Mountains with his son, Ignacio spends his time solving computer graphics problems that fascinate him and helping developers take advantage of the latest GPU technology. Before joining NVIDIA, Ignacio worked for several game companies, including Crytek, Relic Entertainment, and Oddworld Inhabitants.


Mark Colbert is a Ph.D. student at the University of Central Florida working in the Media Convergence Lab. He received both his B.S. and his M.S. in computer science from the University of Central Florida in 2004 and 2006. His current research focuses on user interfaces for interactive material and lighting design.

Keenan Crane, University of Illinois

Keenan recently completed a B.S. in computer science at the University of Illinois at Urbana-Champaign, where he did research on GPU algorithms, mesh parameterization, and motion capture. As an intern on the NVIDIA Demo Team, he worked on the "Mad Mod Mike" and "Smoke in a Box" demos. His foray into graphics programming took place in 1991 at Nishimachi International School in Tokyo, Japan, where he studied the nuances of the LogoWriter turtle language. This summer he will travel to Kampala, Uganda, to participate in a service project through Volunteers for Peace.

Eugene d'Eon, NVIDIA Corporation

Eugene d'Eon has been writing demos at NVIDIA since 2000, when he first joined the team as an intern, spending three months modeling, rigging, and rotoscoping the short film "Luxo Jr." for a real-time demo that was only shown once. After quickly switching to a more forgiving programming position, he has since been employing the most mathematical, overly sophisticated models available to solve the simplest of shading and simulation problems in NVIDIA's real-time demos. He constantly struggles between writing a physically correct shader and just settling for what "looks good." Eugene received an Honours B.Math. from the University of Waterloo, applied mathematics and computer science double major, and is occasionally known for his musical abilities (piano and Guitar Hero) and ability to juggle "Eric's Extension." Research interests include light transport, scattering, reflectance models, skin shading, theoretical physics, and mathematical logic. He never drives faster than c, and unlike most particles in the universe, neither his position nor his momentum can be known with any certainty. He never votes for someone who doesn't have a clear stance on the Axiom of Choice. Eugene uses Elixir guitar strings.

Bernard Deschizeaux, CGGVeritas

Bernard Deschizeaux received a master's degree in high energy physics in 1988 and a Ph.D. in particle physics in 1991. Since then he has worked for CGG, a French service company for the oil and gas industry, where he applies his high-performance computing skills and physics knowledge to solve seismic processing challenges. His positions within CGG have varied from development to high-performance computing and algorithm research. He is now in charge of a GPGPU project developing an industrial solution based on GPU clusters.

Franck Diard, NVIDIA Corporation

Franck Diard is a senior software architect at NVIDIA. He received a Ph.D. in computer science from the University of Nice Sophia Antipolis (France) in 1998. Starting with vector balls and copper lists on Amiga in the late 1980s, he then programmed on UNIX for a decade with Reyes rendering, ray tracing, and computer vision before transitioning to Windows kernel drivers at NVIDIA. His interests have always been around scalability (programming multi-core, multi-GPU render farms) applied to image processing and graphics rendering. His main contribution to NVIDIA has been the SLI technology.

Frank Doepke, Apple

After discovering that one can make more people's lives miserable by writing buggy software than becoming a tax collector, Frank Doepke decided to become a software developer. Realizing that evil coding was wrong, he set sail from Germany to the New World and has since been tracking graphic gems at Apple.


Henrik Dohlmann, 3Dfacto R&D

From 1999 to 2002, Henrik Dohlmann worked as a research assistant in the Image Group at the Department of Computer Science, University of Copenhagen, from which he later received his Cand. Scient. degree in computer science. Next, he took part in an industrial collaboration between the 3D-Lab at Copenhagen University's School of Dentistry and Image House. He moved to 3Dfacto R&D in 2005, where he now works as a software engineer.

Bryan Dudash, NVIDIA Corporation

Bryan entered the games industry in 1997, working for various companies in Seattle, including Sierra Online and Escape Factory. He has a master's degree from the University of Washington. In 2003 he joined NVIDIA and began teaching (and learning) high-end, real-time computer graphics. Having studied Japanese since 2000, Bryan convinced NVIDIA in 2004 to move him to Tokyo, where he has been supporting APAC developers ever since. If you are ever in Tokyo, give him a ring.

Kenny Erleben, University of Copenhagen

In 2001 Kenny Erleben received his Cand. Scient. degree in computer science from the Department of Computer Science, University of Copenhagen. He then worked as a full-time researcher at 3Dfacto A/S before beginning his Ph.D. studies later in 2001. In 2004 he spent three months at the Department of Mathematics, University of Iowa. He received his Ph.D. in 2005 and soon thereafter was appointed assistant professor at the Department of Computer Science, University of Copenhagen.


Ryan has been a pioneer in music visualization for many years. While working at Nullsoft, he wrote many plug-ins for Winamp, most notably the popular MilkDrop visualizer. More recently, he spent several years as a member of the NVIDIA Demo Team, creating the "GeoForms" and "Cascades" demos and doing other GPU research projects.

Nolan Goodnight, NVIDIA Corporation

Nolan Goodnight is a software engineer at NVIDIA. He works in the CUDA software group doing application and driver development. Before joining NVIDIA he was a member of the computer graphics group at the University of Virginia, where he did research in GPU algorithms and approximation methods for rendering with precomputed light transport. Nolan's interest in the fundamentals of computer graphics grew out of his work in geometric modeling for industrial design. He holds a bachelor's degree in physics and a master's degree in computer science.

Larry Gritz, NVIDIA Corporation

Larry Gritz is director and chief architect of NVIDIA's Gelato software, a hardware-accelerated film-quality renderer. Prior graphics work includes being the author of BMRT; cofounder and vice president of Exluna, Inc. (later acquired by NVIDIA), and lead developer of their Entropy renderer; head of Pixar's rendering research group; a main contributor to PhotoRealistic RenderMan; coauthor of the book Advanced RenderMan: Creating CGI for Motion Pictures; and occasional technical director on several films and commercials. Larry has a B.S. from Cornell University and an M.S. and Ph.D. from The George Washington University.


John Hable, Electronic Arts

John Hable is a rendering engineer at Electronic Arts. He graduated from Georgia Tech with a B.S. and M.S. in computer science, where he solved the problem of reducing the rendering time of Boolean combinations of triangle meshes from exponential to quadratic time. His recent work focuses on the compression problems raised by trying to render high-quality facial animation in computer games. Currently he is working on a new EA title in Los Angeles.

Earl Hammon, Jr., Infinity Ward

Earl Hammon, Jr., is a lead software engineer at Infinity Ward, where he assisted a team of talented developers to create the multiplatinum and critically acclaimed titles Call of Duty 2 and Call of Duty. He worked on Medal of Honor: Allied Assault prior to becoming a founding member of Infinity Ward. He graduated from Stanford University with an M.S. in electrical engineering, preceded by a B.S.E.E. from the University of Tulsa. His current project is Call of Duty 4: Modern Warfare.

Takahiro Harada, University of Tokyo

Takahiro Harada is an associate professor at the University of Tokyo. He received an M.S. in engineering from the University of Tokyo in 2006. His current research interests include physically based simulation, real-time simulation, and general-purpose GPU computation.


Mark Harris is a member of the Developer Technology team at NVIDIA in London, working with software developers all over the world to push the latest in GPU technology for graphics and high-performance computing. His primary research interests include parallel computing, general-purpose computation on GPUs, and physically based simulation. Mark earned his Ph.D. in computer science from the University of North Carolina at Chapel Hill in 2003 and his B.S. from the University of Notre Dame in 1998. Mark founded and maintains www.GPGPU.org, a Web site dedicated to general-purpose computation on GPUs.

Evan Hart, NVIDIA Corporation

Evan Hart is a software engineer in the Developer Technology group at NVIDIA. Evan got his start in real-time 3D in 1997 working with visual simulations. Since graduating from The Ohio State University in 1998, he has worked to develop and improve techniques for real-time rendering, having his hands in everything from games to CAD programs, with a bit of drivers on the side. Evan is a frequent speaker at GDC and he has contributed to chapters in the Game Programming Gems and ShaderX series of books.

Miloš Hašan, Cornell University

Miloš Hašan graduated with a degree in computer science from Comenius University in Bratislava, Slovakia. Currently he is a Ph.D. student in the Computer Science Department at Cornell University. His research interests include global illumination, GPU rendering, and numerical computations.


Jared Hoberock is a graduate student at the University of Illinois at Urbana-Champaign. He has worked two summers at NVIDIA as an intern and is a two-time recipient of the NVIDIA Graduate Fellowship. He enjoys spending time writing rendering software.

Lee Howes, Imperial College London

Lee Howes graduated with an M.Eng. in computing from Imperial College London in 2005 and is currently working toward a Ph.D. at Imperial. Lee's research relates to computing with FPGAs and GPUs and has included work with FFTs and financial simulation. As a distraction from education and to dabble in the realms of reality, Lee has worked briefly with Philips and NVIDIA.

Yuntao Jia, University of Illinois at Urbana-Champaign

Yuntao Jia is currently pursuing a Ph.D. in computer science at the University of Illinois at Urbana-Champaign. He is very interested in computer graphics, and his current research interests include realistic rendering (especially on the GPU), video and image processing, and graph visualizations.

Alexander Keller, Ulm University

Alexander Keller studied computer science at the University of Kaiserslautern from 1988 to 1993. He then joined the Numerical Algorithms Group at the same university and defended his Ph.D. thesis on Friday, the 13th of June, 1997. In 1998 he was appointed scientific advisor of mental images. Of the four professorship offers he received in 2003, he chose to become a full professor for computer graphics at the University of Ulm in Germany. His research interests include quasi-Monte Carlo methods, photorealistic image synthesis, ray tracing, and scientific computing. His 1997 SIGGRAPH paper "Instant Radiosity" can be considered one of the roots of GPGPU computing.

Alexander Kharlamov, NVIDIA Corporation

Alex is an undergraduate in the Department of Computational Mathematics and Cybernetics at Moscow State University. He became interested in video games at the age of ten and decided that nothing else interested him that much. Currently he works as a member of NVIDIA's Developer Technology team implementing new techniques and effects for games and general-purpose computation on GPUs.

Peter Kipfer, Havok

Peter Kipfer is a software engineer at Havok, where he works as part of the Havok FX team that is pioneering work in large-scale real-time physics simulation in highly parallel environments, such as multi-core CPUs or GPUs. He received his Ph.D. in computer science from the University of Erlangen-Nürnberg in 2003 for his work in the KONWIHR supercomputing project. He also worked as a postdoctoral researcher at the Technische Universität München, focusing on general-purpose computing and geometry processing on the GPU.

Rusty Koonce, NCsoft Corporation


in physics. He has worked on multiple shipped video game titles across a wide range of platforms, including console, PC, and Mac. Computer graphics has held his interest since his first computer, a TRS-80. Today he calls Austin, Texas, home, where he enjoys doing his part to "Keep Austin Weird."

Kees van Kooten, Playlogic Game Factory

Kees van Kooten is a software developer for Playlogic Game Factory. In 2006 he graduated summa cum laude for his master's degree at the Eindhoven University of Technology. The result of his master's project can be found in this book. His interests are closely related to the topics of his master's research: 3D graphics and real-time simulations. After working hours, Kees can often be found playing drums with "real" musicians.

Jaroslav Křivánek, Czech Technical University in Prague

Jaroslav Křivánek is an assistant professor at the Czech Technical University in Prague. He received his Ph.D. from IRISA/INRIA Rennes and the Czech Technical University (joint degree) in 2005. In 2003 and 2004 he was a research associate at the University of Central Florida. He received a master's in computer science from the Czech Technical University in Prague in 2001.

Bunny Laden, Apple

Bunny Laden graduated from the University of Washington with a Special Individual Ph.D. in cognitive science and music in 1989. She joined Apple in 1997, where she now writes documentation for Quartz, Core Image, Quartz Composer, and other Mac OS X technologies. She coauthored Programming with Quartz (Morgan Kaufmann, 2006) and Learning Carbon (O'Reilly, 2001).


musical acoustics, and other assorted topics.

Andrew Lauritzen, University of Waterloo

Andrew Lauritzen recently received his B.Math. in computer science and is now completing a master's degree in computer graphics at the University of Waterloo. To date, he has completed a variety of research in graphics, as well as theoretical physics. His current research interests include lighting and shadowing algorithms, deferred rendering, and graphics engine design. Andrew is also a developer at RapidMind, where he works with GPUs and other high-performance parallel computers.

Scott Le Grand, NVIDIA Corporation

Scott is a senior engineer on the CUDA software team at NVIDIA. His previous commercial projects include the game BattleSphere for the Atari Jaguar; Genesis, the first molecular modeling system for home computers, for the Atari ST; and Folderol, the first distributed computing project targeted at the protein folding problem. Scott has been writing video games since 1971, when he played a Star Trek game on a mainframe and was instantly hooked. In a former life, he picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from The Pennsylvania State University. In addition, he wrote a chapter for ShaderX and coedited a book on computational methods of protein structure prediction.

Ignacio Llamas, NVIDIA Corporation

Ignacio Llamas is a software engineer in NVIDIA's Developer Technology group. Before joining NVIDIA, Ignacio was a Ph.D. student at Georgia Tech's College of Computing, where he did research on several topics within computer graphics. In addition to the exciting work he does at NVIDIA, he also enjoys snowboarding.

Charles Loop, Microsoft Research

Charles Loop works for Microsoft Research in Redmond, Washington. He received an M.S. in mathematics from the University of Utah in 1987 and a Ph.D. in computer science from the University of Washington in 1992. His graphics research has focused primarily on the representation and rendering of smooth free-form shapes, including subdivision surfaces, polynomial splines and patches, and algebraic curves and surfaces.

Charles also works on interactive modeling and computer vision techniques. Lately, his efforts have gone into GPU algorithms for the display of curved objects.

Tristan Lorach, NVIDIA Corporation

Since graduating in 1995 with a master's in computer science applied to art and aesthetics, Tristan Lorach has developed a series of 3D real-time interactive installations for exhibitions and events all over the world. From the creation of a specific engine for digging complex galleries into a virtual solid, to the conception of new 3D human interfaces for public events, Tristan has always wanted to fill the gap between technology and artistic or ergonomic ideas. Most of his projects (such as "L'homme Transformé" and "Le Tunnel sous l'Atlantique") were presented in well-known exhibition centers like Beaubourg and Cité des Sciences in Paris. Now Tristan works at NVIDIA on the Technical Developer Relations team, based in Santa Clara, California.


David Luebke is a research scientist at NVIDIA. He received an M.S. and Ph.D. in computer science in 1998 from the University of North Carolina under Frederick P. Brooks, Jr., following a B.A. in chemistry from the Colorado College. David spent eight years on the faculty of the University of Virginia before leaving in 2006 to help start the NVIDIA Research group. His research interests include real-time rendering, illumination models, and graphics architecture.

Kenny Mitchell, Electronic Arts

Kenny is a lead engine programmer at Electronic Arts' UK Studio. His Ph.D. introduced the use of real-time 3D for information visualization on consumer hardware, including a novel recursive perspective projection technique. Over the past ten years he has shipped games using high-end graphics technologies including voxels, PN patches, displacement mapping and clipmaps. In between shipping games for EA's flagship Harry Potter franchise, he is also involved in developing new intellectual properties.

Jefferson Montgomery, Electronic Arts

Jefferson Montgomery holds a B.A.Sc. in engineering physics and an M.Sc. in computer science from the University of British Columbia. He is currently a member of the World Wide Visualization Group at Electronic Arts, tasked with adapting advanced techniques to the resource constraints faced by current game teams and producing real-time demonstrations such as those at Sony's E3 presentations in 2005 and 2006.

Kevin Myers, NVIDIA Corporation