2.4.2 Shadow Volumes on the GPU
Heidmann [236] was one of the first to present a GPU implementation of the original Crow [114] approach, which is referred to as the z-pass approach. The exact steps to achieve z-pass shadow volumes are as follows:
1. Enable the depth buffer in order to render the scene (without shadows) onto the color buffer.
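2. Clear the stencil buffer and disable writes to the depth buffer (depth testing remains enabled so that only shadow polygons in front of the visible surface are counted).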
3. Draw all front-facing shadow polygons, and increment the shadow count for the relevant pixels in the stencil buffer.
4. Draw all back-facing shadow polygons and decrement the shadow count for the relevant pixels in the stencil buffer.
5. Darken the pixels in the color buffer where the stencil values are not equal to 0.
The above steps, however, do not properly handle the combination of ambient, diffuse, and specular terms. An alternative is to change steps 1 and 5 as follows:
1. Enable the depth buffer in order to render the scene without shadows but only with the ambient term onto the color buffer.
5. Draw the diffuse and specular terms for the pixels when the stencil value is equal to 0.
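As a rough illustration of the counting in steps 3 to 5, the following Python sketch simulates the stencil value of a single pixel on the CPU. It is not GPU code: the `ShadowPolygonHit` record, the function names, and the representation of shadow polygons as depths along the pixel's ray are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowPolygonHit:
    """Hypothetical record: one shadow polygon crossed by the pixel's ray."""
    depth: float        # distance from the camera along the ray
    front_facing: bool  # True if the shadow polygon faces the camera


def z_pass_count(hits, surface_depth, initial_count=0):
    """Stencil value of one pixel after steps 3 and 4 (z-pass)."""
    count = initial_count
    for hit in hits:
        # Only fragments in front of the visible surface pass the depth test.
        if hit.depth < surface_depth:
            count += 1 if hit.front_facing else -1
    return count


def shade(ambient, diffuse_specular, stencil):
    """Modified steps 1 and 5: ambient everywhere, direct light where stencil == 0."""
    return ambient + (diffuse_specular if stencil == 0 else 0.0)
```

With one shadow volume entered at depth 2 and left at depth 6, a surface at depth 4 ends with a count of 1 (shadowed), while a surface at depth 8 ends with a count of 0 (lit).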
Unfortunately, when using the GPU, shadow polygons must be clipped against the view frustum, thus potentially introducing erroneous initial shadow counts for some of the image pixels if the camera view is inside the scene. One simple way to resolve this is to compute the initial shadow count E_svc for the camera point, then initialize the stencil buffer with E_svc instead of 0. This method is fairly accurate unless the camera is deep inside the scene and the near clipping plane intersects a shadow polygon. For example, Figure 2.23 (left) shows a case where a single initial shadow count is valid, but this is not the case for Figure 2.23 (right), where the shadow polygon cuts across the near clipping plane, so the initial shadow count differs from pixel to pixel. Figure 2.24 (left) shows the shadow errors that result when a single shadow count is assumed, and Figure 2.24 (right) illustrates the correct shadow result.
Figure 2.23. Potential errors in the initialization of a single camera shadow count when the near clipping plane intersects some shadow polygons. [Diagram labels, both panels: C, L, occluder, clip plane, shadow polygons.]
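The effect of the initial count can be sketched for a single pixel. In this hypothetical Python example, each shadow volume is reduced to an `(enter, exit)` pair of depths along the pixel's ray with the camera at depth 0; the function names and this one-dimensional representation are assumptions, not part of the cited methods. It covers only the case where one count is valid for the whole image, not the failure case of Figure 2.23 (right).

```python
def camera_shadow_count(volumes):
    """Number of shadow volumes that contain the camera (depth 0 on the ray)."""
    return sum(1 for enter, exit_ in volumes if enter < 0.0 < exit_)


def clipped_z_pass_count(volumes, surface_depth, near, initial_count):
    """Z-pass count when shadow polygons in front of the near plane are clipped away."""
    count = initial_count
    for enter, exit_ in volumes:
        if near <= enter < surface_depth:   # visible front-facing shadow polygon
            count += 1
        if near <= exit_ < surface_depth:   # visible back-facing shadow polygon
            count -= 1
    return count


# The camera sits inside one shadow volume whose front polygon lies behind it.
volumes = [(-1.0, 3.0)]
e_svc = camera_shadow_count(volumes)
wrong = clipped_z_pass_count(volumes, 2.0, 0.1, 0)      # starts at 0
right = clipped_z_pass_count(volumes, 2.0, 0.1, e_svc)  # starts at E_svc
```

Here `wrong` is 0, so the shadowed surface at depth 2 would be drawn as lit, while `right` is 1 and marks it as shadowed.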
Some authors [37, 393, 48] propose to cap the shadow volumes with new shadow polygons where the shadow volumes are clipped against the view frustum’s near clipping plane. Everitt and Kilgard [165] explain why it is difficult to implement this capping technique robustly, as it needs to account for all the different combinations. However, ZP+ shadow volumes [249] manage to properly initialize all the shadow counts in the stencil buffer by rendering the scene that resides between the light and the camera near clipping plane from the perspective of the light. Numerical precision issues can occur when the light or an object is very close to the camera near clipping plane or to another object. Z-fail algorithms and per-triangle shadow volumes (both described in the remainder of this section) seem more appropriate for robust behavior.
Figure 2.24. Shadow errors due to a single shadow count (left), with the correct shadows (right). Image posted by Aidan Chopra, courtesy of Google Sketchup and Phil Rader.
Z-fail Algorithms
Another set of solutions [50, 81, 165] to deal with a correct initial shadow count starts from the observation that this point-in-shadow-volume test need not be evaluated along the line of sight, but can also be evaluated from the visible surface to infinity instead (called the z-fail test). Thus, the shadow count for the viewpoint is simply initialized to 0: occluded back-facing shadow polygons increment their shadow counts, occluded front-facing ones decrement them, and shadow polygons need not be clipped by the near plane. A shadow count greater than 0 indicates shadowing. As can be seen in Figure 2.25 (right), the z-fail count starts from the right at 0; then, going left, it ends up at 2 when it hits the point P to be shaded. Note that z-fail can be considered the computation of the z-pass in the reverse direction, where the z-pass (Figure 2.25 (left)) starts at the left (camera) at 1, going right, and ending up at 2 when it reaches P.
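The two counting directions can be compared with a small Python sketch for one pixel. The tuple layout `(depth, front_facing)` and the function names are assumptions for illustration; the depths are chosen to mimic the situation of Figure 2.25, with the camera inside one shadow volume and P inside two.

```python
def z_pass_count(hits, surface_depth, initial_count=0):
    """Count shadow polygons in front of the visible surface (depth test passes)."""
    count = initial_count
    for depth, front_facing in hits:
        if depth < surface_depth:
            count += 1 if front_facing else -1
    return count


def z_fail_count(hits, surface_depth):
    """Count shadow polygons behind the visible surface (depth test fails)."""
    count = 0
    for depth, front_facing in hits:
        if depth > surface_depth:
            # Occluded back faces increment, occluded front faces decrement.
            count += -1 if front_facing else 1
    return count


# Shadow polygons that survive near-plane clipping along the ray to P (depth 5).
# The front polygon of the volume containing the camera lies behind the camera.
hits = [(2.0, True), (8.0, False), (9.0, False)]
z_fail = z_fail_count(hits, 5.0)                       # starts at 0 at infinity
z_pass = z_pass_count(hits, 5.0, initial_count=1)      # camera count of 1
```

Both `z_fail` and `z_pass` evaluate to 2 here, but only z-pass needed to know that the camera starts inside one shadow volume.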
Note that capping shadow volumes against the rest of the view frustum is still necessary to produce correct shadow counts. Some hardware extensions [301, 165] avoid the need to correctly compute the capping with respect to the view frustum. This has actually become standard in both DirectX (DepthClipEnable) and OpenGL (DepthClamp). Another solution simply eliminates the far clipping process with a small penalty to depth precision; as a bonus, capping at infinity for infinite light sources is not necessary, as all shadow polygons converge to the same vanishing point.
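One common way to eliminate far clipping is to take the limit of the perspective projection as the far distance goes to infinity. The text above does not give the matrix; the Python sketch below assumes the OpenGL convention (camera looking down the negative z-axis, column vectors, normalized depth in [-1, 1]), and the function names are hypothetical.

```python
import math


def infinite_perspective(fovy_radians, aspect, near):
    """Perspective matrix (list of rows) with the far plane pushed to infinity."""
    f = 1.0 / math.tan(fovy_radians / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, -1.0, -2.0 * near],  # limit of the finite-far entries
        [0.0, 0.0, -1.0, 0.0],
    ]


def ndc_depth(matrix, point):
    """Normalized device depth of a homogeneous point (x, y, z, w)."""
    z_clip = sum(m * p for m, p in zip(matrix[2], point))
    w_clip = sum(m * p for m, p in zip(matrix[3], point))
    return z_clip / w_clip


m = infinite_perspective(math.pi / 2.0, 1.0, 0.5)
on_near_plane = ndc_depth(m, (0.0, 0.0, -0.5, 1.0))  # -1.0
at_infinity = ndc_depth(m, (0.0, 0.0, -1.0, 0.0))    # direction with w = 0 maps to 1.0
```

Because points at infinity (w = 0) land exactly on the far end of the depth range, shadow polygons extruded to infinity are not clipped away.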
Everitt and Kilgard [165] and Lengyel [345] realize that the z-pass approach is generally much faster than the z-fail approach. They thus detect whether the camera is within a shadow volume and employ z-pass if not, and z-fail if so. However, the sudden changes in rendering speed when switching between the two may be disruptive to some applications. Similarly, Laine [330] determines such cases on a per-tile basis so that the z-pass approach can be used more often than the z-fail approach, while retaining correct shadows. This is done by comparing the contents of a low-resolution shadow depth map against an automatically constructed split plane.
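A minimal sketch of the per-frame choice between the two methods follows. The text does not say how the cited authors detect that the camera is inside a shadow volume; this example swaps in a simple stand-in, assuming a point light and testing whether the segment from the light to the camera is blocked by an occluder triangle (a standard Möller-Trumbore style intersection). All names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p0, p1, tri, eps=1e-9):
    """True if the open segment p0 -> p1 crosses the triangle."""
    v0, v1, v2 = tri
    d, e1, e2 = sub(p1, p0), sub(v1, v0), sub(v2, v0)
    h = cross(d, e2)
    a = dot(e1, h)
    if abs(a) < eps:           # segment parallel to the triangle plane
        return False
    s = sub(p0, v0)
    u = dot(s, h) / a
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, e1)
    v = dot(d, q) / a
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, q) / a
    return eps < t < 1.0 - eps


def choose_method(light, camera, occluder_triangles):
    """Use z-fail only when some occluder blocks the light from the camera."""
    blocked = any(segment_hits_triangle(light, camera, tri)
                  for tri in occluder_triangles)
    return "z-fail" if blocked else "z-pass"


light = (0.0, 0.0, 10.0)
occluders = [((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))]
```

With this setup, a camera at the origin lies in the triangle's shadow and selects z-fail, while a camera at (5, 0, 0) selects z-pass.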
Figure 2.25. Z-pass (left) versus z-fail (right) algorithm. Both algorithms arrive at the same shadow count of 2, but come from opposite directions.
Per-triangle Shadow Volumes
Very recently, Sintorn et al. [536] revisit the shadow volume algorithm, where each scene triangle generates its shadow volume for a point light source. The main idea is to compute a hierarchical Z-buffer of the image. In other words, a hierarchy of min-max 3D boxes is constructed from hierarchical rectangular regions of the image. Pixels with their shadowing status already known (e.g., background, back-facing the light, etc.) are not part of a 3D box. Shadowing is applied as deferred shading. All four support planes of a triangle shadow volume are rasterized together, and each min-max 3D box is tested against these four planes. Usually, a box will be efficiently culled as not intersecting the triangle shadow volume. If a box is completely enclosed in the shadow volume, the box and all its pixels are marked as in shadow. Otherwise, the box is refined at the next (more detailed) level of boxes.
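The box-versus-planes test can be sketched in Python as follows. This is a simplified CPU illustration in world space, not the CUDA implementation described in [536] (which works in homogeneous clip space); the plane construction, the corner-based classification, and all names are assumptions. The culling test is conservative: a box that is outside the volume but not outside any single plane is simply refined.

```python
from itertools import product


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def oriented_plane(a, b, c, inside_point):
    """Plane (n, d) through a, b, c with n.p + d >= 0 on the inside_point side."""
    n = cross(sub(b, a), sub(c, a))
    d = -dot(n, a)
    if dot(n, inside_point) + d < 0.0:
        n, d = (-n[0], -n[1], -n[2]), -d
    return n, d


def shadow_volume_planes(light, tri):
    """Four support planes: the triangle plane plus one plane per edge and the light."""
    v0, v1, v2 = tri
    centroid = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    behind = tuple(2.0 * c - l for c, l in zip(centroid, light))  # away from light
    return [oriented_plane(v0, v1, v2, behind),
            oriented_plane(light, v0, v1, v2),
            oriented_plane(light, v1, v2, v0),
            oriented_plane(light, v2, v0, v1)]


def classify_box(box_min, box_max, planes):
    """Return 'culled', 'in_shadow', or 'refine' for a min-max box."""
    corners = list(product(*zip(box_min, box_max)))  # the 8 box corners
    fully_inside = True
    for n, d in planes:
        distances = [dot(n, c) + d for c in corners]
        if all(x < 0.0 for x in distances):
            return "culled"          # entirely outside one support plane
        if any(x < 0.0 for x in distances):
            fully_inside = False     # straddles this plane
    return "in_shadow" if fully_inside else "refine"


planes = shadow_volume_planes(
    (0.0, 0.0, 10.0), ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0)))
```

For this triangle and light, a small box around the origin is classified as `in_shadow`, a box between the light and the triangle is `culled`, and a box straddling the triangle plane is `refine`.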
The rasterization of the triangle shadow volumes is made very efficient and is computed in the homogeneous clip space, thus avoiding problems with near and far clipping planes. Because each triangle shadow volume is treated independently, the method achieves stable frame rates, and it can work with any soup of polygons. Therefore, it can handle some ill-formed geometry and does not need to preprocess silhouette edges. It can be extended for textured and semitransparent shadow casters.
The method, implemented in CUDA, is always more competitive than z-fail methods, and for larger resolution images and/or higher antialiasing sampling, it outperforms z-pass methods. Although it could be less efficient for scenes with high depth variance and tiny triangles, the robustness of the method, its small memory footprint, and its generality show great promise. It is also not immediately obvious if it is worthwhile to consider the optimizations mentioned in Section 2.4.3 for this approach.
Other Considerations
Brabec and Seidel [67] use the GPU to compute the actual shadow polygons to further speed up shadow computations as well as to avoid differences in numerical properties (between the CPU and GPU) that can result in some shadow-leaking problems. In fact, numerical instability issues can potentially arise due to shadow polygons created from elongated polygons, or when the normal of the polygon is almost perpendicular to the light-source direction. Care must be taken in dealing with such cases.
If the silhouette optimization techniques discussed in Section 2.4.3 are not used, then other numerical problems are likely. In particular, because all shadow polygons (not just those on the silhouettes) are created for shadow volumes, visibility problems are likely due to the limited precision of the z-values; this problem is sometimes referred to as z-fighting.
Roettger et al. [487] replace the stencil buffer with the alpha or screen buffers (because the stencil buffer can be overloaded in its use, causing bottlenecks) and replace the computations of shadow counts with blending operations, i.e., instead of incrementing and decrementing shadow counts, they multiply and divide by 2 in the alpha buffer, respectively. As the shadow-polygon fill rate is the main bottleneck in shadow volume approaches, these two buffers can be computed at lower resolutions (at the additional cost of a rerendering at this lower resolution) and copied into textures, and the shadows can then be treated by exploiting texture bilinear interpolation. However, the basic approach may produce incorrect results with specular highlights, and its performance may suffer with complex objects.
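The multiplicative encoding of shadow counts can be illustrated with a few lines of Python. The initial alpha value of 0.25 and the function names are assumptions for this sketch; the text does not state the initial value used in [487], and a real fixed-point alpha buffer restricts the range of counts that can be represented.

```python
import math


def alpha_after_shadow_polygons(facings, initial_alpha=0.25):
    """Blend result for one pixel: x2 per front face, /2 per back face."""
    alpha = initial_alpha
    for front_facing in facings:
        alpha = alpha * 2.0 if front_facing else alpha / 2.0
    return alpha


def shadow_count_from_alpha(alpha, initial_alpha=0.25):
    """Recover the equivalent additive shadow count from the alpha value."""
    return round(math.log2(alpha / initial_alpha))


# Two front-facing and one back-facing shadow polygon in front of the surface.
alpha = alpha_after_shadow_polygons([True, True, False])
count = shadow_count_from_alpha(alpha)
```

Here `alpha` ends at 0.5, twice its initial value, which corresponds to a shadow count of 1; a pixel whose alpha equals the initial value is lit.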