Rendering Techniques
OPTIMIZING SOFT PARTICLES
To minimize the performance impact of adding soft particles to an existing engine, we need to do as much work outside the pixel shader as possible. We must therefore minimize the number of additional pixel shader instructions and make the extra texture fetch as cheap as possible. Fortunately, the depth buffer is sampled in a very coherent, nearly sequential pattern, so it should make good use of the texture cache. However, even better use of the GPU's texture cache can be achieved with a down-sampled version of the depth buffer.
A good way to down-sample the depth buffer is to render a screen-aligned quad with a viewport that is half the size along both the X and Y axes using a pixel shader that picks the min or max of the four adjacent texels.
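To make the down-sampling step concrete, here is a CPU-side Python sketch of what the down-sampling pixel shader computes for each output pixel. It is not the demo's shader. The function name and the `keep` switch are hypothetical, and whether to keep the min or the max is a choice the text leaves open.

```python
def downsample_depth(depth, keep="min"):
    """Halve a depth buffer along X and Y.

    Each output texel holds the min (or max) of the 2x2 block of input
    texels it covers, which is what the down-sampling pixel shader does
    for one output pixel. `depth` is a list of rows; width and height are
    assumed to be even.
    """
    pick = min if keep == "min" else max
    out = []
    for y in range(0, len(depth), 2):
        row = []
        for x in range(0, len(depth[0]), 2):
            row.append(pick(depth[y][x], depth[y][x + 1],
                            depth[y + 1][x], depth[y + 1][x + 1]))
        out.append(row)
    return out


full = [
    [0.90, 0.80, 0.50, 0.50],
    [0.70, 0.95, 0.50, 0.40],
    [0.30, 0.30, 1.00, 1.00],
    [0.30, 0.20, 1.00, 0.99],
]
half = downsample_depth(full)  # 2x2 result: a quarter of the texels
```

Each output texel reads four input texels exactly once, so the pass costs one full-resolution read of the depth buffer.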
A further optimization is to convert the depth values to view-space Z values in the same down-sampling pass. This removes the conversion from the particle pixel shader.
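The conversion depends on the projection matrix, which the excerpt does not specify. The following sketch assumes a Direct3D-style perspective projection in which view-space z = near maps to depth 0 and z = far maps to depth 1. Under that assumption, stored depth is d = far/(far - near) - near*far/((far - near)*z), and solving for z gives the expression used here.

```python
def depth_to_view_z(d, near, far):
    """Convert a [0, 1] hardware depth value to view-space Z.

    Assumes a Direct3D-style perspective projection (near -> 0, far -> 1):
        d = far / (far - near) - near * far / ((far - near) * z)
    Solving for z gives z = near * far / (far - d * (far - near)).
    """
    return near * far / (far - d * (far - near))


# In the down-sampling pass, the selected depth would be converted once
# per output texel, e.g. depth_to_view_z(min(d00, d01, d10, d11), near, far).
```

Because this mapping is monotonic, taking the min or max before or after the conversion selects the same sample.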
Now the added instructions to our particle shader have been greatly reduced, and the depth buffer texture has been reduced to a quarter of the original size.
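The excerpt does not restate the soft-particle fade itself. A common formulation, assumed here rather than taken from the article, scales the particle's alpha by a clamped linear ramp of the view-space depth difference. Once the down-sampled buffer already holds view-space Z, the per-pixel work reduces to roughly the following. `fade_distance` is a hypothetical tuning parameter.

```python
def soft_particle_alpha(particle_alpha, scene_view_z, particle_view_z, fade_distance):
    """Fade a particle fragment as it approaches the opaque scene behind it.

    Both depths are view-space Z (larger = farther). scene_view_z is read
    from the down-sampled buffer, so no depth conversion is needed here:
    one subtraction, one scale, and a clamp.
    """
    fade = (scene_view_z - particle_view_z) / fade_distance
    fade = min(max(fade, 0.0), 1.0)  # equivalent of saturate() in a shader
    return particle_alpha * fade


# At 1440x900, the down-sampled buffer is 720x450: a quarter of the texels.
```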
The demo on the accompanying DVD-ROM contains a sample implementation of the optimized effect.
RESULTS
Below are the results of our optimization on a GeForce 8800 GT graphics card when rendering 35,000 particles at 1440×900 resolution.
Hard particles: 4.25 ms
Soft particles: 5.5 ms
Optimized soft particles: 4.35 ms
Down-sampling the depth buffer: 0.3 ms
The results show that there is virtually no added cost to rendering soft particles on this graphics card using our optimized technique besides the fixed cost of down-sampling the depth buffer.
Here are the results for a similar scene running at 1440×900 resolution on a much weaker GPU, the GeForce 8400.
Hard particles: 30 ms
Soft particles: 60 ms
Optimized soft particles: 42 ms
Down-sampling the depth buffer: 1.8 ms
Soft particles still carry a significant cost on the GeForce 8400 GPU. In this particular scene, however, using the down-sampled, view-space depth buffer saved about 18 ms, or about 16 ms if the separate 1.8 ms down-sampling pass is counted.
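The overheads implied by the measurements can be checked with a little arithmetic. This sketch assumes, as the wording of the results suggests, that the "optimized" timings exclude the separate down-sampling pass, so that pass is added back in. The dictionary keys are illustrative names for the reported values.

```python
# Timings in milliseconds, as reported in the measurements above.
results = {
    "GeForce 8800 GT": {"hard": 4.25, "soft": 5.5, "optimized": 4.35, "downsample": 0.3},
    "GeForce 8400": {"hard": 30.0, "soft": 60.0, "optimized": 42.0, "downsample": 1.8},
}


def overheads(t):
    """Return (naive, optimized) cost of soft particles over hard particles."""
    naive = t["soft"] - t["hard"]
    optimized = t["optimized"] + t["downsample"] - t["hard"]
    return naive, optimized


for gpu, t in results.items():
    naive, optimized = overheads(t)
    print(f"{gpu}: naive +{naive:.2f} ms, optimized +{optimized:.2f} ms")
```

On the 8800 GT, the overhead drops from 1.25 ms to about 0.4 ms. On the 8400, it drops from 30 ms to 13.8 ms.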
CONCLUSION
In this article, we discussed a method for optimizing soft particle rendering. On a high-end GPU such as the GeForce 8800 GT, it makes soft particles nearly free apart from the fixed cost of down-sampling the depth buffer. Many other graphical effects can also be optimized with a down-sampled or view-space depth buffer, so adding soft particles using the technique described in this article should be practical for many existing game engines.
REFERENCES
[Gilham07] Gilham, David. "Real-Time Depth-of-Field Implemented with a Post-Processing only Technique," ShaderX5, Wolfgang Engel, Ed., Charles River Media, 2007, pp. 163–175.
2.3 Simplified High-Quality Anti-Aliased Lines
STEPHEN COY, MICROSOFT RESEARCH
ABSTRACT
In this article, we present a method for rendering high-quality anti-aliased lines. The method is easily implemented on modern GPUs with programmable shaders. Unlike previous methods that required significant fragment processing, our new method only requires a single texture lookup and a multiply in the pixel shader. The resulting lines can be of any width, and the method allows for an arbitrary filter kernel to be applied to the edges. Properly rounded line ends also result from this method. The method is trivially extended to render points.
For this method the filter kernel is precomputed and stored as a small (typically 16×16) texture. The lines are rendered as a strip of six triangles.
A vertex shader calculates the locations of the vertices based on the line endpoints and the line width. The filter texture is sampled to provide the coverage amount for each pixel.
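The article's vertex shader is not reproduced in this excerpt. As a CPU-side sketch of one plausible layout, the segment can be expanded into eight vertices (four columns of two) that form a six-triangle strip: a round cap quad, the body quad, and a second cap quad. The texture-coordinate convention is an assumption: (0, 0) is the centre of the rounded end, and magnitude 1 is the quad edge. Mirrored texture addressing is also assumed, so negative coordinates reuse the quarter-circle texture. The function name is hypothetical.

```python
import math


def line_strip_vertices(p0, p1, half_extent):
    """Expand a 2D segment into an 8-vertex triangle strip (6 triangles).

    p0, p1: distinct endpoints in pixels. half_extent: half the quad width
    in pixels (line half-width plus filter margin). Returns (x, y, u, v)
    tuples in strip order. Assumed convention: u runs along the line and
    is 0 between the endpoints, v runs across it; mirrored addressing maps
    negative u or v onto the same quarter-circle texture.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    tx, ty = dx / length, dy / length  # unit tangent along the line
    nx, ny = -ty, tx                   # unit normal across the line
    # Four columns: outer edge of cap 0, p0, p1, outer edge of cap 1.
    columns = [
        (p0[0] - tx * half_extent, p0[1] - ty * half_extent, -1.0),
        (p0[0], p0[1], 0.0),
        (p1[0], p1[1], 0.0),
        (p1[0] + tx * half_extent, p1[1] + ty * half_extent, 1.0),
    ]
    verts = []
    for cx, cy, u in columns:
        for side in (1.0, -1.0):  # one vertex on each side of the line
            verts.append((cx + nx * half_extent * side,
                          cy + ny * half_extent * side, u, side))
    return verts


strip = line_strip_vertices((0.0, 0.0), (10.0, 0.0), 2.0)
```

Between the endpoints, u stays 0, so coverage depends only on the distance across the line. In the caps, both coordinates vary, and the quarter-circle texture produces the rounded end.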
INTRODUCTION
Our primary motivation for the creation of this tool was to re-create the classic Atari game Battlezone. Released in 1980, Battlezone was one of the earliest 3D arcade games (if not the first). Battlezone used a vector display for its graphics, resulting in beautifully smooth wireframe images. In order to reproduce this level of quality on a conventional raster display, we needed to find a way to generate anti-aliased lines that approach the quality of a vector display. Previous work [McNamara et al. 00] approached the problem by rendering a box around the line.
For each pixel in this box the distance from the pixel center to the line is calculated, and the result is used to index into a 1D texture that represents the convolved filter weights. This requires considerable computation for each individual pixel. Their technique also required extra work to provide high-quality endpoints on the line segments.
METHOD
The key insight behind the new method is that the GPU hardware already does most of what is needed. Given a properly aligned texture, no per-pixel computation is needed beyond the built-in texture coordinate iteration and texture sampling.
The benefits of our approach are:
High-quality lines
Exact line widths
2D vector display lines that integrate properly with a 3D, solid shaded world
Low shader cost (about 34 instructions in the vertex shader and a single texture sample in the fragment shader)
TEXTURE CREATION
Unlike McNamara et al.'s 1D texture, we use a 2D texture that represents one quarter of the line's rounded end point. Figure 2.3.1 shows the texture used for the sample code and a close-up of it in use.
FIGURE 2.3.1 The sample code’s texture.
During rendering, the texture is flipped and stretched as needed to fit the shape of the line. Because the texture is 2D rather than a 1D ramp, the end points are rendered properly at no extra cost. For the sample implementation, we use a SmoothStep [Upstill 90] function to create the falloff at the edge of the line, but any function can be used. Since the texture is generated in a preprocessing step, the curve can be as complicated as required. We chose SmoothStep because the results look good enough and it is simple to calculate; in our testing, even a linear falloff performed well visually. Given the uncertainty in how individual monitor gamma curves affect the result, we judged it fruitless to try to be any more exact. Animating the lines also brings into play the rise and decay curves of the pixels on the display device, making the quality of the result dependent on the individual device and the speed of the animation. In other words, even though this method allows arbitrarily complex falloff curves, you are probably better off just using whatever looks good.
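As a sketch of the preprocessing step, the following Python fills a 16×16 quarter-circle coverage texture with a SmoothStep edge. This is an assumed layout, not the sample code. Texel (0, 0) is taken as the centre of the end cap, distances are in texture units where 1.0 is the quad edge, and the defaults describe a line covering half the quad width with a one-pixel filter on a four-pixel-wide quad. All of these parameters are illustrative.

```python
import math


def smoothstep(e0, e1, x):
    """Classic cubic SmoothStep: 0 below e0, 1 above e1, smooth in between."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def make_filter_texture(size=16, line_half=0.5, filter_half=0.25):
    """Build a size x size quarter-circle coverage texture.

    Coverage is 1 inside the line and falls to 0 with a SmoothStep ramp
    centred on the line edge (line_half), filter_half wide on each side.
    Distances are measured from the corner at texel (0, 0).
    """
    tex = []
    for j in range(size):
        row = []
        for i in range(size):
            u = (i + 0.5) / size  # texel centre
            v = (j + 0.5) / size
            d = math.hypot(u, v)
            row.append(1.0 - smoothstep(line_half - filter_half,
                                        line_half + filter_half, d))
        tex.append(row)
    return tex


tex = make_filter_texture()
```

Swapping `smoothstep` for a linear ramp, or any other curve, changes only this offline step. The shaders stay the same.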
In order to create accurate lines, you must understand the relationship between the filter texture, the desired line width, and the width of the rendered quad. Looking at the filter as a 1D slice, this relationship can be easily shown. For example, Figure 2.3.2 shows how a filter texture would be created for a two-pixel-wide line.
FIGURE 2.3.2 Generation of the filter texture for a two-pixel-wide line.
In this example, the applied filter had a width equal to the pixel width. By increasing this width, the lines can be made to appear softer. This example also shows the pre-filtered pixel width to be exactly half of the texture size. This is not required and is not always desirable. The example shown in Figure 2.3.3 would work fine for half-pixel-wide lines.
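One way to read Figure 2.3.2 is that the pre-filtered line spans a fixed fraction of the texture width. Under that reading, the quad size and the effective filter width in pixels both follow from the requested line width. The helper names and the fraction parameters below are hypothetical. This is a sketch of the proportionality, not the sample code.

```python
def quad_half_width(line_width_px, line_fraction):
    """Half-width of the rendered quad for a requested line width.

    line_fraction: fraction of the full texture width covered by the
    pre-filtered line when the texture was generated (0.5 for a texture
    laid out like Figure 2.3.2).
    """
    return 0.5 * line_width_px / line_fraction


def filter_width_px(line_width_px, line_fraction, filter_fraction):
    """Filter width in pixels after the texture is stretched over the quad.

    filter_fraction: filter width as a fraction of the full texture width.
    """
    return filter_fraction * 2.0 * quad_half_width(line_width_px, line_fraction)
```

With the Figure 2.3.2 layout, line_fraction is 0.5, and a one-pixel filter on a four-pixel-wide texture gives a filter_fraction of 0.25. A 2-pixel line then needs a 4-pixel-wide quad and gets a 1-pixel filter. Reusing the same texture for a 4-pixel line doubles the filter to 2 pixels, which softens the edge.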
FIGURE 2.3.3 Generation of the filter texture for a half-pixel-wide line.
One caveat that might not be immediately obvious is that mipmaps should not be used when sampling the texture in the shader. With bilinear filtering alone, the filtered texture is a piecewise-linear approximation of the ideal samples. If mipmapping is employed, the lower-resolution levels average away the shape of the falloff and the results will be erroneous.
The ability to employ arbitrary curves also opens the possibility of using the texture to create multiple stroked lines at no extra cost.
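The multiple-stroke idea can be sketched by replacing the single edge falloff with a radial profile made of several bands. Evaluating this profile at each texel's distance from the corner, as when filling the quarter-circle texture, yields concentric strokes, while the shaders are unchanged. The band radii and the `soft` parameter are illustrative assumptions.

```python
def smoothstep(e0, e1, x):
    """Classic cubic SmoothStep: 0 below e0, 1 above e1, smooth in between."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def stroke_profile(d, strokes, soft=0.05):
    """Coverage at distance d (texture units) for several parallel strokes.

    strokes: list of (inner, outer) radii. Each band ramps in and out with
    SmoothStep over +/- soft; a band with inner <= 0 is solid at the centre.
    Band contributions are summed and clamped to 1.
    """
    total = 0.0
    for inner, outer in strokes:
        rise = 1.0 if inner <= 0.0 else smoothstep(inner - soft, inner + soft, d)
        fall = 1.0 - smoothstep(outer - soft, outer + soft, d)
        total += rise * fall
    return min(total, 1.0)


# A centre stroke plus one outline stroke, separated by a gap.
double_line = [(0.0, 0.2), (0.45, 0.6)]
```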