Rendering Techniques
2.1 Quick Noise for GPUs
DOUG E. SMITH SCHOOL OF ANIMATION, ARTS & DESIGN, SHERIDAN COLLEGE, OAKVILLE, ON, CANADA
INTRODUCTION
Ken Perlin’s Noise function [Perlin85] is an invaluable primitive for producing controlled randomness in procedural shaders. For real-time purposes, developers have had to choose between a fast bilinear texture lookup, with its artifacts, and a slower, high-quality implementation [Green05]. In this article, we present faster alternatives whose precision is limited only by the underlying hardware.
BACKGROUND
Perlin Noise is an example of gradient noise. Space is divided into cubical cells with the boundaries defined at integer coordinates. The corners of the cells are set to a value of zero, and their coordinates are hashed to assign a pseudo-random gradient vector. Due to those shared gradients, the value of noise varies smoothly within and across the cells.
The regular zero crossings at the cell corners give noise a vaguely sinusoidal character. The downside is that this regularity can show up as visible grid artifacts, although in practice these artifacts can usually be obscured.
In [Perlin02], the original noise function was improved. The original 256 random gradients were replaced with just 12, arranged as vectors from the center of a cube to the midpoint of each of its edges. The cubic blending curve was also replaced with a fifth-order function to eliminate artifacts when bump-mapping. Lastly, Perlin established a standard permutation table for the hash to ensure that different implementations would produce the same results.
The steps involved in generating noise are summarized below:
1. Find which cell contains our sample point.
2. Calculate a smooth weighting factor based on the position within the cell.
3. Assign a pseudo-random gradient to each corner of the cell.
4. For each corner, calculate the dot product of the gradient and the relative sample position.
5. Linearly interpolate the products using the weighting factor.
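The five steps can be sketched as a CPU reference in C++. This is a minimal sketch under stated assumptions, not Perlin's reference code: the function names are ours, and the hypothetical integer hash `hashCorner` stands in for Perlin's standard permutation table, so the values will not match other implementations numerically. It does use the 12 edge-midpoint gradients and the fifth-order curve from [Perlin02].

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical integer hash standing in for Perlin's permutation table.
// Results therefore differ numerically from the reference implementation.
static std::uint32_t hashCorner(int x, int y, int z) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 73856093u ^
                      static_cast<std::uint32_t>(y) * 19349663u ^
                      static_cast<std::uint32_t>(z) * 83492791u;
    h ^= h >> 13;
    h *= 0x5bd1e995u;
    h ^= h >> 15;
    return h;
}

// The 12 improved-noise gradients: cube center to each edge midpoint.
static const float kGrad[12][3] = {
    {1, 1, 0}, {-1, 1, 0}, {1, -1, 0}, {-1, -1, 0},
    {1, 0, 1}, {-1, 0, 1}, {1, 0, -1}, {-1, 0, -1},
    {0, 1, 1}, {0, -1, 1}, {0, 1, -1}, {0, -1, -1}};

// Fifth-order weighting curve 6t^5 - 15t^4 + 10t^3 from [Perlin02].
static float fade(float t) { return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f); }

static float lerpf(float a, float b, float t) { return a + t * (b - a); }

// Steps 3 and 4: pick the corner's gradient and dot it with the vector
// from that corner to the sample point.
static float cornerTerm(int cx, int cy, int cz, float dx, float dy, float dz) {
    const float* g = kGrad[hashCorner(cx, cy, cz) % 12u];
    return g[0] * dx + g[1] * dy + g[2] * dz;
}

float gradientNoise3(float x, float y, float z) {
    // Step 1: the cell origin and the fractional position inside the cell.
    const int ix = static_cast<int>(std::floor(x));
    const int iy = static_cast<int>(std::floor(y));
    const int iz = static_cast<int>(std::floor(z));
    const float fx = x - ix, fy = y - iy, fz = z - iz;

    // Step 2: smooth weighting factors.
    const float u = fade(fx), v = fade(fy), w = fade(fz);

    // Steps 3 and 4 for all eight corners.
    const float n000 = cornerTerm(ix, iy, iz, fx, fy, fz);
    const float n100 = cornerTerm(ix + 1, iy, iz, fx - 1, fy, fz);
    const float n010 = cornerTerm(ix, iy + 1, iz, fx, fy - 1, fz);
    const float n110 = cornerTerm(ix + 1, iy + 1, iz, fx - 1, fy - 1, fz);
    const float n001 = cornerTerm(ix, iy, iz + 1, fx, fy, fz - 1);
    const float n101 = cornerTerm(ix + 1, iy, iz + 1, fx - 1, fy, fz - 1);
    const float n011 = cornerTerm(ix, iy + 1, iz + 1, fx, fy - 1, fz - 1);
    const float n111 = cornerTerm(ix + 1, iy + 1, iz + 1, fx - 1, fy - 1, fz - 1);

    // Step 5: trilinear interpolation of the eight products.
    const float nz0 = lerpf(lerpf(n000, n100, u), lerpf(n010, n110, u), v);
    const float nz1 = lerpf(lerpf(n001, n101, u), lerpf(n011, n111, u), v);
    return lerpf(nz0, nz1, w);
}
```

At integer coordinates every weight is zero and the relative position to the cell origin is the zero vector, so the function returns 0 there, matching the zero-valued corners described above.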
By design, texturing hardware is efficient at fetching vectors (colors) and linearly interpolating them. If we could postpone calculating the dot products, an implementation could take full advantage of the fast bilinear filtering present in GPUs.
MATH TO THE RESCUE
For simplicity, we will look only at the 2D case, shown in Figure 2.1.1.
FIGURE 2.1.1 The geometry of a cell. On the left are the corners’ gradient vectors (G) at half scale and the sample position (P). On the right are the relative position vectors for each corner (P). Notice that they can all be expressed in terms of the fractional position vector (F).
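With the corners indexed by $(i, j) \in \{0, 1\}^2$, corner gradients $G_{ij}$, relative position vectors $P_{ij} = F - (i, j)$, and smoothed weights $S = (s_x, s_y)$, 2D gradient noise is the bilinear interpolation of the four dot products:
$$n = \mathrm{bilerp}_S\!\left(G_{ij} \cdot P_{ij}\right), \qquad \mathrm{bilerp}_S(a_{ij}) = (1 - s_y)\big[(1 - s_x)\,a_{00} + s_x\,a_{10}\big] + s_y\big[(1 - s_x)\,a_{01} + s_x\,a_{11}\big]$$
Expanding and factoring out the fractional position results in Equation 2.1.3, since $G_{ij} \cdot P_{ij} = G_{ij} \cdot F - e_{ij}$ and bilinear interpolation is linear:
$$n = \mathrm{bilerp}_S(G_{ij}) \cdot F - \mathrm{bilerp}_S(e_{ij}), \qquad e_{ij} = i\,G_{ij,x} + j\,G_{ij,y} \tag{2.1.3}$$
All the elements are arranged in terms of a single bilinear interpolation. However, another channel is now required for the extra terms $e_{00} = 0$, $e_{10} = G_{10,x}$, $e_{01} = G_{01,y}$, and $e_{11} = G_{11,x} + G_{11,y}$. Define new vectors like so:
$$G'_{ij} = \left(G_{ij,x},\ G_{ij,y},\ e_{ij}\right), \qquad F' = \left(F_x,\ F_y,\ -1\right)$$
Giving our final result:
$$n = \mathrm{bilerp}_S\!\left(G'_{ij}\right) \cdot F'$$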
By using bilinear texture interpolation at the weighted point, the interpolated gradient (including the extra term), dotted with the fractional position extended by a constant −1, evaluates to exactly the same noise value. The 3D case is similar, except that it uses trilinear interpolation; the extended gradients then have four components, and the extra channel must also account for Z.
APPLYING IT IN THE REAL WORLD
Unfortunately, the extra terms complicate our implementation. We can put them in the alpha channel of a texture, but at the corner diagonally opposite the cell origin, the extra term is the sum of all the gradient channels. Fortunately, each of the 12 gradient vectors has two components of ±1 and one of 0, so that sum always lies in the range [−2, 2]. By storing half of the extra term, we match the [−1, 1] range of the other channels and only need to rescale it in the shader.
The other complication is that the alpha channel contents vary depending on which cell we are in, so adjacent cells cannot share the texels along their common edges. The simplest solution is to double the dimensions of our texture so that each cell is independent of its neighbors, as shown in Figure 2.1.2.
FIGURE 2.1.2 Relation between pixels (light gray) and cells (black).
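The storage scheme can be emulated on the CPU to confirm that halving the alpha channel and rescaling by −2 reproduces the noise. This C++ sketch assumes a hypothetical 4×4×4-cell volume (the article uses 32×32×32), a hypothetical wrapped integer hash in place of Perlin's table, and a layout in which cell (i, j, k) owns texels 2i..2i+1, 2j..2j+1, 2k..2k+1. Texture filtering is done in software, and the shader's texture-coordinate offsets are not modeled.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kCells = 4;         // hypothetical small volume (the article uses 32)
constexpr int kTex = 2 * kCells;  // doubled so each cell owns 2x2x2 texels

using Texel = std::array<float, 4>;  // RGB = gradient, A = half the extra term

static const float kGrad[12][3] = {
    {1, 1, 0}, {-1, 1, 0}, {1, -1, 0}, {-1, -1, 0},
    {1, 0, 1}, {-1, 0, 1}, {1, 0, -1}, {-1, 0, -1},
    {0, 1, 1}, {0, -1, 1}, {0, 1, -1}, {0, -1, -1}};

static int wrapCell(int v) { return ((v % kCells) + kCells) % kCells; }

// Hypothetical lattice hash (not Perlin's table), wrapped so the volume tiles.
static const float* latticeGradient(int x, int y, int z) {
    std::uint32_t h = static_cast<std::uint32_t>(wrapCell(x)) * 73856093u ^
                      static_cast<std::uint32_t>(wrapCell(y)) * 19349663u ^
                      static_cast<std::uint32_t>(wrapCell(z)) * 83492791u;
    h ^= h >> 13;
    h *= 0x5bd1e995u;
    h ^= h >> 15;
    return kGrad[h % 12u];
}

static float smoothWeight(float t) { return t * t * (3.0f - 2.0f * t); }
static int texelIndex(int tx, int ty, int tz) { return (tz * kTex + ty) * kTex + tx; }

// Texel (2i+a, 2j+b, 2k+c) holds corner (a, b, c) of cell (i, j, k).
std::vector<Texel> buildVolume() {
    std::vector<Texel> tex(kTex * kTex * kTex);
    for (int k = 0; k < kCells; ++k)
        for (int j = 0; j < kCells; ++j)
            for (int i = 0; i < kCells; ++i)
                for (int c = 0; c < 2; ++c)
                    for (int b = 0; b < 2; ++b)
                        for (int a = 0; a < 2; ++a) {
                            const float* g = latticeGradient(i + a, j + b, k + c);
                            const float extra = a * g[0] + b * g[1] + c * g[2];  // in [-2, 2]
                            tex[texelIndex(2 * i + a, 2 * j + b, 2 * k + c)] =
                                Texel{g[0], g[1], g[2], 0.5f * extra};
                        }
    return tex;
}

// Software stand-in for one trilinear fetch inside the cell, followed by the
// shader's dot product with (F, -2).
float sampleQuickNoise(const std::vector<Texel>& tex, float x, float y, float z) {
    const int i = static_cast<int>(std::floor(x));
    const int j = static_cast<int>(std::floor(y));
    const int k = static_cast<int>(std::floor(z));
    const float f[3] = {x - i, y - j, z - k};
    const float s[3] = {smoothWeight(f[0]), smoothWeight(f[1]), smoothWeight(f[2])};
    Texel acc{};
    for (int c = 0; c < 2; ++c)
        for (int b = 0; b < 2; ++b)
            for (int a = 0; a < 2; ++a) {
                const float w = (a ? s[0] : 1 - s[0]) * (b ? s[1] : 1 - s[1]) * (c ? s[2] : 1 - s[2]);
                const Texel& t = tex[texelIndex(2 * wrapCell(i) + a, 2 * wrapCell(j) + b,
                                                2 * wrapCell(k) + c)];
                for (int n = 0; n < 4; ++n) acc[n] += w * t[n];
            }
    // -2 restores the halved alpha and applies the extra term's negative sign.
    return acc[0] * f[0] + acc[1] * f[1] + acc[2] * f[2] - 2.0f * acc[3];
}

// Conventional gradient noise with the same lattice gradients, for comparison.
float referenceNoise(float x, float y, float z) {
    const int i = static_cast<int>(std::floor(x));
    const int j = static_cast<int>(std::floor(y));
    const int k = static_cast<int>(std::floor(z));
    const float f[3] = {x - i, y - j, z - k};
    const float s[3] = {smoothWeight(f[0]), smoothWeight(f[1]), smoothWeight(f[2])};
    float sum = 0.0f;
    for (int c = 0; c < 2; ++c)
        for (int b = 0; b < 2; ++b)
            for (int a = 0; a < 2; ++a) {
                const float* g = latticeGradient(i + a, j + b, k + c);
                const float w = (a ? s[0] : 1 - s[0]) * (b ? s[1] : 1 - s[1]) * (c ? s[2] : 1 - s[2]);
                sum += w * (g[0] * (f[0] - a) + g[1] * (f[1] - b) + g[2] * (f[2] - c));
            }
    return sum;
}
```

Every stored alpha value stays within [−1, 1], the same range as the gradient channels, which is the point of halving it.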
Here is the QuickNoiseSmall HLSL function, which uses a volume of 32×32×32 cells; the result is shown in Figure 2.1.3.
sampler3D smpNoiseSmall;
float CELLS = 32;
float QuickNoiseSmall(float3 texCoord) {
    // Calculate coords
    float3 intCoord = floor(texCoord);
    float3 fracCoord = frac(texCoord);
    float3 smoothCoord = smoothstep(float3(0,0,0), float3(1,1,1), fracCoord);
    float3 sampleCoord = (intCoord + 0.5*smoothCoord)/CELLS + 0.75/CELLS;
    // Sample texture: RGB = interpolated gradient, A = half the extra term
    float4 gradientVec = tex3D(smpNoiseSmall, sampleCoord);
    // -2 undoes the halved alpha and applies the extra term's negative sign
    return dot(gradientVec, float4(fracCoord, -2.0));
}
The resulting function compiles to just 9 arithmetic instructions and 1 texture access, occupying a total of 13 instruction slots. In comparison, Green’s optimized function requires 42 arithmetic instructions and 9 texture accesses and occupies 58 instruction slots.
FIGURE 2.1.3 Slices of noise on the XY plane: precomputed value noise (left), Quick Noise (center), and Green’s implementation (right). At this scale, Quick Noise and Green’s implementation are virtually indistinguishable. The linear artifacts of value noise, with only 2×2×2 samples per cell, are particularly ugly.
AND NOW THE BAD NEWS
Older GPUs have limited precision when filtering texels, and this is noticeable when Quick Noise is sampled at low frequencies. Banding is visible and tends to accentuate the inherent grid artifacts found in Perlin Noise. The newest GPUs are capable of higher-precision filtering, which should eliminate the banding.
FIGURE 2.1.4 Artifacts in a close-up of Quick Noise (left) and the scaled difference (×16) from the reference (right). Cell boundaries and banding may be noticeable at low frequencies.
Another serious limitation is the small size of the volume that can be represented. The full volume that Perlin supports is 256×256×256. This has proven to be large enough to avoid repeating patterns in most situations. Although the smaller 32×32×32 volume is identical to the subset of the noise volume it represents, it is far too small, and the repetition is very obvious at higher frequencies.
IMPLEMENTATION, THE SEQUEL
At first glance, such a small number of gradients seems inadequate to guarantee randomness. A quick check shows that with 12 gradients assigned independently to a cell’s eight corners, there are 12⁸, or approximately 430 million, unique cells. Since there are only some 16.8 million (256³) cells in the entire noise volume, there are clearly more than enough permutations.
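The counts quoted here, together with the single-plane figure that the text turns to next, can be checked at compile time. This is a small C++ sketch; `ipow` is a helper of our own, and the 144 × 144 factorization is only arithmetic, not a statement about the article's actual atlas layout.

```cpp
#include <cstdint>

// Integer power evaluated at compile time.
constexpr std::uint64_t ipow(std::uint64_t base, unsigned n) {
    return n == 0 ? 1 : base * ipow(base, n - 1);
}

// Eight corners per cell, each independently assigned one of 12 gradients.
constexpr std::uint64_t kCellCombos = ipow(12, 8);    // 429,981,696
// Cells in Perlin's full 256 x 256 x 256 noise volume.
constexpr std::uint64_t kVolumeCells = ipow(256, 3);  // 16,777,216
// Four corners of a single plane (one face of a cell).
constexpr std::uint64_t kPlaneCombos = ipow(12, 4);   // 20,736 = 144 * 144

static_assert(kCellCombos > kVolumeCells, "more gradient combinations than cells");
```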
However, on a single plane (the four corners of one cell face), we end up with 12⁴, or only 20,736, combinations. The randomness suffers, but this is also a great opportunity: we can easily make a 2D texture atlas of all those combinations. The full 3D volume can then be implemented by sampling both the front and back planes of the current cell and linearly interpolating between those samples, as shown in Figure 2.1.5.
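Given that perm is a random permutation table and (X,Y,Z) is the origin of the cell, Perlin’s hash takes the form:
$$\mathrm{hash}(X, Y, Z) = \mathrm{perm}\big[\,\mathrm{perm}[\,\mathrm{perm}[X] + Y\,] + Z\,\big]$$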
This complete hash can be calculated through texture lookups. The first texture is accessed at the X and Y coordinates and returns the starting offsets within the permutation table. One lookup will suffice since all four corners of the current cell are packed in the four channels of the texture. The Z value is then added to all four channels. One pair at a time, we use those to index into a second texture. The value returned is either the U or V coordinate of the appropriate cell in the texture atlas.
Since the Z offset is the last to affect the permutation, the corners at Z + 1 will always be the adjacent values in the table. This is exploited to return the coordinates for the Z + 1 cell at the same time as the Z cell coordinates.
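The two-stage lookup can be emulated with plain arrays to confirm that it reproduces Perlin's hash and that the Z + 1 corners come from adjacent table entries. In this C++ sketch, the permutation is a hypothetical shuffled table rather than Perlin's standard one, the names and the channel order (00, 10, 01, 11) are ours, and the lookups return raw hash values; according to the text, the article's second texture instead stores atlas coordinates derived from them, a step not modeled here.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

using Perm = std::array<int, 512>;

// Hypothetical table: 0..255 shuffled with a fixed seed and duplicated to 512
// entries, as in Perlin's reference code. This is NOT Perlin's standard table.
Perm makePerm(unsigned seed) {
    std::array<int, 256> base{};
    std::iota(base.begin(), base.end(), 0);
    std::shuffle(base.begin(), base.end(), std::mt19937(seed));
    Perm perm{};
    for (int i = 0; i < 512; ++i) perm[i] = base[i & 255];
    return perm;
}

// Perlin's hash evaluated directly: perm[perm[perm[X] + Y] + Z].
int hashDirect(const Perm& p, int x, int y, int z) { return p[p[p[x] + y] + z]; }

// First lookup (what one texel at (X, Y) would hold): the starting offsets
// perm[perm[X + dx] + Y + dy] of all four corners, packed together.
std::array<int, 4> packedOffsetXY(const Perm& p, int x, int y) {
    return {p[p[x] + y], p[p[x + 1] + y], p[p[x] + y + 1], p[p[x + 1] + y + 1]};
}

struct CornerHashes {
    std::array<int, 4> atZ;   // hashes of the four corners on plane Z
    std::array<int, 4> atZ1;  // the same corners on plane Z + 1
};

// Second stage: add Z to every channel, then read the table at offset + Z.
// Because Z is the last term added, the entry at offset + Z + 1 is the same
// corner's hash on plane Z + 1. Valid for X, Y, Z in [0, 255].
CornerHashes cornerHashes(const Perm& p, int x, int y, int z) {
    const std::array<int, 4> offsets = packedOffsetXY(p, x, y);
    CornerHashes h{};
    for (int c = 0; c < 4; ++c) {
        h.atZ[c] = p[offsets[c] + z];
        h.atZ1[c] = p[offsets[c] + z + 1];
    }
    return h;
}
```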
sampler2D smpPermXY;
sampler2D smpPermZ;
sampler2D smpCellAtlas;
float QuickNoiseFull(float3 texCoord) {
    // Prep coords
    float3 intCoord = floor(texCoord);
    float3 fracCoord = frac(texCoord);
    float3 smoothCoord = smoothstep(float3(0,0,0), float3(1,1,1), fracCoord);
    // Look up coords in textures: offsets of the four XY corners, then add Z
    float4 packedOffsetXY = tex2D(smpPermXY, (intCoord.xy/256 + 0.5/256));
    float4 plusZ1 = packedOffsetXY + intCoord.z/256;
    float2 plusZ2 = plusZ1.wz; // using ps_2_0 wzyx swizzle
    float4 coord1 = tex2D(smpPermZ, plusZ1.xy);
    float4 coord2 = tex2D(smpPermZ, plusZ2);
    coord1.zw = coord2.xy;
    // Calculate final sample coordinates (front plane Z, back plane Z + 1)
    float2 sampleCoord0 = coord1.xz + (smoothCoord.xy)/512.0;
    float2 sampleCoord1 = coord1.yw + (smoothCoord.xy)/512.0;
    // Sample from atlas and lerp to get gradient
    float4 gradientVec0 = tex2D(smpCellAtlas, sampleCoord0);
    float4 gradientVec1 = tex2D(smpCellAtlas, sampleCoord1);
    // Back-plane corners lie at Z + 1: add their halved Z extra term
    gradientVec1.w += 0.5*gradientVec1.z;
    float4 grad = lerp(gradientVec0, gradientVec1, smoothCoord.z);
    return dot(grad, float4(fracCoord, -2.0));
}
The resulting function compiles to 18 arithmetic instructions and 5 texture accesses occupying approximately 24 instruction slots. Refer to the RenderMonkey and shader files on the DVD-ROM for further details on the shader implementation and texture creation.
RESULTS
The Quick Noise functions occupy far fewer instruction slots than Green’s implementation. This makes procedural shaders feasible on older or limited hardware and leaves room for more complex shaders on newer hardware.
Noise is an ideal primitive for imitating the results of natural processes that would be too tedious to paint by hand. It may also be used to augment traditional texturing by adding detail to otherwise identical assets or in extreme close-ups. The need for very-high-resolution textures is reduced, and more assets can fit in the limited graphics memory.
Noise is also volumetric and does not require unwrapped texture coordinates. It is band-limited and may be procedurally filtered to avoid aliasing artifacts.
Table 2.1.1 Average values from the DirectX preview window in AMD RenderMonkey 1.71, measured on an IBM ThinkPad T41p with a 128 MB ATI Mobility FireGL T2 (Radeon 9600 class).
FUTURE WORK
To date, an elegant way of computing noise derivatives has proven elusive. Shader Model 4 implementations have not yet been explored.
REFERENCES
[Green05] Green, Simon. 2005. “Implementing Improved Perlin Noise.” In GPU Gems 2, edited by Matt Pharr, pp. 409–416. Addison-Wesley.
[Perlin02] Perlin, Ken. 2002. “Improving Noise.” In Computer Graphics (Proceedings of SIGGRAPH 2002), pp. 681–682.