2.4 Variable-Precision Applications
2.4.1 Graphics
Variable-Precision Rendering
Hao and Varshney looked in depth at variable-precision rendering in the geometry transform and lighting stage to accelerate 3D graphics (Hao and Varshney, 2001). It is important to note that their work focused on CPU-side rendering, so they exploited MMX instructions (MMX is a single-instruction multiple-data (SIMD) instruction set designed by Intel) and operated on integer and fixed-point representations. Further, they applied their work to the fixed-function pipeline, which has fallen by the wayside with the introduction of programmable shading. However, their work provides a foundation upon which to build a modern exploration. First, they present a breakdown of sources of error in data sets and computations for inputs with n bits, listing worst-case errors.
1. Representation error. These are statistical and observational uncertainties. At worst, the representation error is one half bit: $\epsilon_{\text{rep}} \le \frac{1}{2}$.
2. Addition error. Propagation error leads to at most one bit of lost accuracy for each addition.
3. Multiplication error. Using 2n bits to store the intermediate result, the worst-case error occurs when both operands are close to $2^{n-1}$ and the representation error is $\frac{1}{2}$: one bit of accuracy can be lost during each multiplication.
4. Division error. Assuming the division is in the transformation from homogeneous coordinates to 3D image-space coordinates, the loss of accuracy is:
$$\log_2\left(1 + \frac{\text{distance of far plane from eye}}{\text{distance of scene vertex to eye}}\right)$$
The total error incurred is a linear combination of the errors for each operation. Working backwards from, for example, 10 bits of precision in x and y for a 1024x1024 rendering window, one can find the number of input bits necessary to guarantee 10 output bits of precision. Sub-pixel accuracy is obtained by artificially enlarging the window size.
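The backward budget can be sketched as follows. This is a minimal illustration, not Hao and Varshney's formulation: it assumes the worst-case losses listed above (one bit per addition, one bit per multiplication, and the logarithmic division loss), ignores the half-bit representation error for simplicity, and uses a hypothetical operation count (one row of a 4x4 transform) and hypothetical distances. The function names are also illustrative.

```python
import math


def output_bits_for_window(size_px: int, subpixel_factor: int = 1) -> int:
    """Bits needed per window coordinate; sub-pixel accuracy enlarges the window."""
    return math.ceil(math.log2(size_px * subpixel_factor))


def division_loss_bits(d_far: float, d_vertex: float) -> float:
    """Worst-case bits lost in the homogeneous divide: log2(1 + d_far / d_vertex)."""
    return math.log2(1.0 + d_far / d_vertex)


def required_input_bits(output_bits: int, n_add: int, n_mul: int,
                        d_far: float, d_vertex: float) -> int:
    """Work backwards: desired output precision plus the summed worst-case losses.

    Assumes one bit lost per addition and per multiplication, plus the division
    loss; the half-bit representation error is left out of this sketch.
    """
    loss = n_add + n_mul + division_loss_bits(d_far, d_vertex)
    return output_bits + math.ceil(loss)


# Hypothetical case: one row of a 4x4 transform (4 multiplies, 3 additions)
# feeding a 1024x1024 window, with the far plane 100 times farther than the vertex.
out_bits = output_bits_for_window(1024)
in_bits = required_input_bits(out_bits, n_add=3, n_mul=4, d_far=1000.0, d_vertex=10.0)
print(out_bits, in_bits)  # 10 24
```

Requesting two bits of sub-pixel accuracy corresponds to `output_bits_for_window(1024, subpixel_factor=4)`, which raises the target to 12 bits before the same losses are added.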
Small objects in the distance do not need as much precision as large objects in the foreground. To take advantage of this observation, they propose an octree-based bounding volume hierarchy (BVH) that keeps track of the positions of rendered items in space. If the nearest and farthest vertices in a cell need the same number of bits to be represented accurately, then this number can be used for every vertex in the cell; otherwise, the cell must be split.
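The cell-splitting rule can be sketched as a recursive octree build. This is only an illustration under stated assumptions: the per-vertex precision model `bits_needed` is hypothetical (10 output bits plus the division loss from the error list above, so nearer vertices need more bits), and the depth limit and tuple-based cells are illustrative choices rather than details from the original work.

```python
import math

Point = tuple[float, float, float]


def bits_needed(distance: float, d_far: float, output_bits: int = 10) -> int:
    """Hypothetical precision model: nearer vertices lose more bits in the
    perspective divide, log2(1 + d_far / d), so they need more input bits."""
    distance = max(distance, 1e-9)  # guard against a vertex exactly at the eye
    return output_bits + math.ceil(math.log2(1.0 + d_far / distance))


def assign_precision(points: list[Point], lo: Point, hi: Point, eye: Point,
                     d_far: float, depth: int = 0, max_depth: int = 8):
    """Split an octree cell until one bit count fits all of its vertices.

    Returns leaf cells as (lo, hi, bits, points) tuples.
    """
    if not points:
        return []
    dists = [math.dist(p, eye) for p in points]
    b_near = bits_needed(min(dists), d_far)
    b_far = bits_needed(max(dists), d_far)
    if b_near == b_far or depth == max_depth:
        # Same requirement at both extremes (or depth limit reached): use the
        # nearest vertex's count, which is never smaller than any other vertex's.
        return [(lo, hi, b_near, points)]
    mid = tuple((l + h) / 2.0 for l, h in zip(lo, hi))
    children: list[list[Point]] = [[] for _ in range(8)]
    for p in points:
        octant = sum(1 << i for i in range(3) if p[i] >= mid[i])
        children[octant].append(p)
    leaves = []
    for octant, child_points in enumerate(children):
        c_lo = tuple(mid[i] if (octant >> i) & 1 else lo[i] for i in range(3))
        c_hi = tuple(hi[i] if (octant >> i) & 1 else mid[i] for i in range(3))
        leaves += assign_precision(child_points, c_lo, c_hi, eye, d_far,
                                   depth + 1, max_depth)
    return leaves
```

Because the hypothetical `bits_needed` decreases monotonically with distance, equal counts at the nearest and farthest vertices guarantee that every vertex in between needs the same count.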
Spatial coherence can be exploited in 3D models by encoding neighboring vertex positions as offsets from previous positions. Temporal coherence can be similarly exploited by expressing a transformed vertex as the sum of its previously transformed position and the original vertex transformed by the difference between the current and previous transformation matrices: $M_t v = M_{t-1} v + (M_t - M_{t-1}) v$.
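Both forms of coherence can be illustrated with a small sketch. Integers stand in for the fixed-point representations used by Hao and Varshney, and all helper names are hypothetical. The point is that offsets between neighboring vertices, and entries of the difference matrix for a small motion, are small values that need fewer bits than the absolute quantities.

```python
Vec = tuple[int, ...]


def delta_encode(vertices: list[Vec]) -> list[Vec]:
    """Keep the first vertex absolute; store each later one as an offset from its predecessor."""
    encoded = [vertices[0]]
    for prev, cur in zip(vertices, vertices[1:]):
        encoded.append(tuple(c - p for c, p in zip(cur, prev)))
    return encoded


def delta_decode(encoded: list[Vec]) -> list[Vec]:
    """Undo delta_encode by accumulating the offsets."""
    vertices = [encoded[0]]
    for offset in encoded[1:]:
        vertices.append(tuple(p + o for p, o in zip(vertices[-1], offset)))
    return vertices


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


def mat_sub(a, b):
    """Element-wise matrix difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def retransform(prev_result, v, m_prev, m_cur):
    """Temporal coherence: M_cur v = M_prev v + (M_cur - M_prev) v."""
    delta = mat_vec(mat_sub(m_cur, m_prev), v)
    return tuple(p + d for p, d in zip(prev_result, delta))


verts = [(100, 200, 300), (101, 199, 302), (103, 200, 301)]
print(delta_encode(verts))  # [(100, 200, 300), (1, -1, 2), (2, 1, -1)]
```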
There are further sources of error in lighting operations that were not present in vertex transformations.
1. Operands with different accuracy. When two operands have different precisions, the result takes on the precision of the less precise operand.
2. Dot products (of unit vectors). For dot products of two three-component vectors, the results will lose one to two bits of precision.
3. Square roots. When implemented with a lookup table, the result will have nearly the same precision as the input (as long as the input is greater than $2^{2n-2}$).
4. Exponentiation. This step in the calculation of the specular component incurs a loss of 6 bits of precision.
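These rules can be chained to track how much precision survives a lighting computation. The sketch below is a simplified forward propagation under assumptions of its own: it uses a Blinn-Phong-style specular term (N · H) as a stand-in for "the specular component," takes the worst case of two bits for each dot product, borrows the one-bit addition loss from the transformation list, and ignores material and light-color multiplications. The function names are hypothetical.

```python
def mixed(*bits: int) -> int:
    """Operands of different precision: the result keeps the lesser precision."""
    return min(bits)


def add(a_bits: int, b_bits: int) -> int:
    """Addition: at most one bit lost (from the transformation error list)."""
    return mixed(a_bits, b_bits) - 1


def dot3(a_bits: int, b_bits: int) -> int:
    """Dot product of two unit 3-vectors: assume the worst case of two bits lost."""
    return mixed(a_bits, b_bits) - 2


def specular_pow(x_bits: int) -> int:
    """Exponentiation in the specular term: six bits lost."""
    return x_bits - 6


def blinn_phong_precision(normal_bits: int, light_bits: int, half_bits: int):
    """Return (diffuse, specular, total) precision in bits."""
    diffuse = dot3(normal_bits, light_bits)                # N . L
    specular = specular_pow(dot3(normal_bits, half_bits))  # (N . H)^s
    return diffuse, specular, add(diffuse, specular)


print(blinn_phong_precision(16, 16, 16))  # (14, 8, 7)
```

With 16-bit inputs, the specular exponentiation dominates the loss, so the summed result retains only about 7 bits under these worst-case assumptions.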
Lighting computations can be treated just like spatially-coherent geometry, calculating one vertex’s lighting as an offset from a neighboring vertex’s result.
Minimum Triangle Separation
A common problem that has plagued graphics applications for years is called z-fighting, and it occurs when two triangles are (nearly) co-planar. The limited precision of the depth buffer cannot capture the correct rendering order across the entirety of the triangles. As a result, one triangle is rendered in front of the other triangle in some pixels, with the opposite ordering chosen for other pixels. The effect is exacerbated as the viewpoint moves, since the ordering is not spatially coherent. An example of z-fighting in the video game “Grand Theft Auto: IV” can be seen in Figure 2.3 (Rockstar Games, 2008). Apparent even when rendering a scene at full precision, this problem can become worse as geometric precision is reduced.
Figure 2.3: Z-fighting in the shoreline of a frame from “Grand Theft Auto: IV.”
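The mechanism behind z-fighting can be reproduced numerically. This sketch assumes a Direct3D-style perspective depth mapping into [0, 1], a 16-bit integer z-buffer, and a hypothetical scene with near = 0.1, far = 1000, and two parallel surfaces 0.01 units apart; the helper names are illustrative. Two surfaces become indistinguishable once they quantize to the same stored depth.

```python
def window_depth(z_eye: float, near: float, far: float) -> float:
    """Map eye-space distance in [near, far] to [0, 1] (Direct3D-style convention)."""
    return far / (far - near) * (1.0 - near / z_eye)


def quantize(depth: float, bits: int) -> int:
    """Store a [0, 1] depth in an unsigned integer z-buffer with the given bit count."""
    return round(depth * ((1 << bits) - 1))


def may_fight(z1: float, z2: float, near: float, far: float, bits: int) -> bool:
    """Two parallel surfaces are indistinguishable if they quantize to the same value."""
    return (quantize(window_depth(z1, near, far), bits)
            == quantize(window_depth(z2, near, far), bits))


print(may_fight(1.0, 1.01, 0.1, 1000.0, 16))      # False: resolved near the camera
print(may_fight(500.0, 500.01, 0.1, 1000.0, 16))  # True: same stored depth far away
```

Because the mapping is proportional to $1/z$, most of the representable values are spent close to the near plane, which is why distant, nearly co-planar geometry such as a shoreline is most prone to fighting.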
Akeley and Su analyze the minimum object-space triangle separation required for correct occlusion given a viewing environment: camera position, field of view (fov), and window-coordinate precision (Akeley and Su, 2006). Conversely, by beginning with a minimum triangle separation, an artist can calculate the minimum depth-buffer and geometric-transform precision needed when reducing the precision of an application that uses their 3D models.
Their method works as follows: an uncertainty cuboid is formed for each 3D location in window coordinates, the depth of which is the numeric distance between the representable z-buffer values nearest its location, and whose width and height (identical for all cuboids in a window) are determined by $b$, the precision of the window coordinates. Given a traditional z-buffer, cuboids near the near plane will be shallow; those near the far plane will be deep. Conversion to eye coordinates is done by inverting the projection and viewport transformations to reverse-map the cuboids, which become frusta.
Parallel triangles may swap order (fight) if and only if any of their uncertainty frusta overlap. The minimum distance, $S_{\min}$, is the length of the frustum’s longest diagonal.
A frustum in a screen corner will be highly sheared, meaning its diagonal will be longer than it would be at the center of the screen. This factor is labeled $K_{\text{fov}}$: the ratio of corner-screen to center-screen diagonal length for uncertainty frusta on a given $z_{\text{eye}}$ plane. The minimum separation depends on all of these factors; simulations show that discounting any one of them leads to an under-prediction and possible punch-through.
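A rough numeric feel for these quantities can be obtained with the sketch below. It is a deliberate simplification, not Akeley and Su's derivation: it assumes a Direct3D-style [0, 1] depth mapping, assumes $b$ counts fractional (sub-pixel) bits so that one window-coordinate quantum is $2^{-b}$ pixel, treats the center-screen uncertainty frustum as a box whose diagonal is $\sqrt{2w^2 + d^2}$, and takes $K_{\text{fov}}$ as a supplied parameter rather than computing the corner shear. All function names are hypothetical.

```python
import math


def window_depth(z_eye, near, far):
    """Eye distance in [near, far] -> [0, 1] depth (Direct3D-style convention assumed)."""
    return far / (far - near) * (1.0 - near / z_eye)


def eye_depth(w, near, far):
    """Inverse of window_depth: reverse-map a stored depth back to eye space."""
    return near / (1.0 - w * (far - near) / far)


def depth_extent(z_eye, near, far, depth_bits):
    """Eye-space depth between the two representable z-buffer values bracketing z_eye."""
    steps = (1 << depth_bits) - 1
    q = math.floor(window_depth(z_eye, near, far) * steps)
    return eye_depth((q + 1) / steps, near, far) - eye_depth(q / steps, near, far)


def lateral_extent(z_eye, fov_y, height_px, subpixel_bits):
    """Eye-space width of one window-coordinate quantum (2**-b of a pixel) at z_eye."""
    pixel = 2.0 * z_eye * math.tan(fov_y / 2.0) / height_px
    return pixel / (1 << subpixel_bits)


def s_min_estimate(z_eye, near, far, depth_bits, fov_y, height_px,
                   subpixel_bits, k_fov=1.0):
    """Box approximation of the uncertainty frustum's longest diagonal, scaled by k_fov."""
    d = depth_extent(z_eye, near, far, depth_bits)
    w = lateral_extent(z_eye, fov_y, height_px, subpixel_bits)
    return k_fov * math.sqrt(2.0 * w * w + d * d)


# Hypothetical viewing setup: 24-bit depth, 60-degree vertical fov, 1024 rows, 8 sub-pixel bits.
for z in (1.0, 100.0, 500.0):
    print(z, s_min_estimate(z, 0.1, 1000.0, 24, math.radians(60.0), 1024, 8))
```

The printed estimates grow rapidly with distance, mirroring the shallow-near, deep-far cuboids described above.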
Finite-precision projection, viewport, and rasterization (mapping) arithmetic can further increase the minimum separation. The authors modeled the error in these operations by performing them in double precision. The contribution of this mapping error to $S_{\min}$ is minor because spatial error dominates depth-related error: the 10.8 fixed-point spatial precision used to represent the window coordinates $x_{\text{win}}$ and $y_{\text{win}}$ is far below that of floating point.
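The gap between the two precisions can be checked directly. This sketch compares the rounding error of a 10.8 fixed-point window coordinate with IEEE single-precision rounding of the same value; single precision is an assumed stand-in for "floating point" here (the authors evaluated the mapping in double precision, which is finer still), and the coordinate value is hypothetical.

```python
import struct


def to_fixed_10_8(x: float) -> float:
    """Round to 8 fractional bits, as in a 10.8 fixed-point window coordinate."""
    return round(x * 256) / 256


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x_win = 1023.123456789  # hypothetical window x coordinate near the right edge
fixed_err = abs(to_fixed_10_8(x_win) - x_win)  # at most 2**-9 pixel
float_err = abs(to_float32(x_win) - x_win)     # at most 2**-15 for values in [512, 1024)
print(fixed_err, float_err)
```

Even at the right edge of a 1024-pixel window, where single-precision spacing is coarsest, the fixed-point quantization error bound is 64 times larger, so spatial quantization dominates.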
Texture Mapping
Textures, or pre-computed images, are often applied to triangles to add detail that is not captured by lighting equations alone. (While texture mapping can be performed at both the vertex and pixel shader stages, I will discuss texturing at the pixel level in particular.) These textures can represent color, normal, reflectance, and many other types of information. Special fixed-function hardware is used to determine which texture element, or texel, is to be applied to a particular pixel based on that pixel’s texture coordinates, effectively an address into the texture memory. This address, though, is often a floating-point number that selects an element a fraction of the way through the data. If this address falls outside the range [0, 1] (for example, greater than 1), either the address is clamped or the texture is treated like a periodic signal.
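The addressing step can be sketched as follows. The example uses nearest-texel lookup for simplicity and models the texture as a list of rows; the function names and the string-valued `mode` parameter are hypothetical. "Repeat" treats the texture as periodic by keeping only the fractional part of the address, while "clamp" pins out-of-range addresses to the edge.

```python
import math


def wrap(u: float) -> float:
    """Repeat addressing: treat the texture as a periodic signal."""
    return u - math.floor(u)


def clamp(u: float) -> float:
    """Clamp addressing: pin the address to [0, 1]."""
    return min(max(u, 0.0), 1.0)


def nearest_texel(texture, u: float, v: float, mode: str = "repeat"):
    """Return the texel whose area contains (u, v); texture is a list of rows."""
    address = wrap if mode == "repeat" else clamp
    h, w = len(texture), len(texture[0])
    # Scale normalized coordinates to texel space; min() keeps u == 1.0 in range.
    x = min(int(address(u) * w), w - 1)
    y = min(int(address(v) * h), h - 1)
    return texture[y][x]


tex = [[0, 1],
       [2, 3]]
print(nearest_texel(tex, 1.25, 0.25, "repeat"), nearest_texel(tex, 1.25, 0.25, "clamp"))  # 0 1
```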
Since floating-point addresses seldom land precisely on a single texel and a single pixel may cover several texels, the texture mapping hardware must decide what value to return. The simplest approach the hardware can take is to choose the nearest texel; this is seldom used in practice because of its poor quality and aliasing artifacts. Instead, filtering (i.e., interpolation) is often performed. By examining the four texels nearest to the pixel’s center and computing a weighted average of their values, the texture hardware can produce smoother gradients across texel boundaries. This is referred to as bilinear filtering. Trilinear filtering, on the other hand, performs bilinear filtering on two mipmap levels (Williams, 1983) and linearly interpolates between these two values to find a single result. This inclusion of mipmapping leads to gentle transitions when a texture is applied to triangles of varying sizes. Finally, anisotropic filtering is the highest-quality filtering commonly used; it addresses cases in which a texture is applied to a triangle at a high relative angle to the camera, meaning its footprint is much larger in one dimension than the other (not isotropic).
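Bilinear and trilinear filtering can be sketched in a few lines. The sketch assumes texel centers at half-integer positions, clamp-to-edge addressing, a mipmap chain supplied as a list of levels (each a list of rows), and an externally supplied level of detail `lod`; real hardware computes the level of detail from screen-space derivatives, which is omitted here. Function names are illustrative.

```python
import math


def texel(level, x: int, y: int) -> float:
    """Fetch with clamp-to-edge addressing; level is a list of rows."""
    h, w = len(level), len(level[0])
    return level[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def bilinear(level, u: float, v: float) -> float:
    """Weighted average of the four texels nearest to (u, v)."""
    h, w = len(level), len(level[0])
    x, y = u * w - 0.5, v * h - 0.5  # texel centers sit at half-integer positions
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * texel(level, x0, y0) + fx * texel(level, x0 + 1, y0)
    bottom = (1 - fx) * texel(level, x0, y0 + 1) + fx * texel(level, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def trilinear(mipmaps, u: float, v: float, lod: float) -> float:
    """Bilinear filtering on the two nearest mip levels, blended by the fractional LOD."""
    lod = min(max(lod, 0.0), len(mipmaps) - 1)
    lo = math.floor(lod)
    hi = min(lo + 1, len(mipmaps) - 1)
    t = lod - lo
    return (1 - t) * bilinear(mipmaps[lo], u, v) + t * bilinear(mipmaps[hi], u, v)


level0 = [[0.0, 1.0],
          [2.0, 3.0]]
level1 = [[1.5]]  # box-filtered average of level0
print(bilinear(level0, 0.5, 0.5), trilinear([level0, level1], 0.25, 0.25, 0.5))  # 1.5 0.75
```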
Chittamuru et al. present a method of trading off energy for quality in this texture mapping hardware (Chittamuru et al., 2003). They discuss two techniques for skipping certain multiply-accumulate (MAC) operations in texture filtering: weight-based and intensity-based techniques. If the weights of bilinearly or trilinearly sampled texels are small enough, those texels can be ignored. Similarly, if two neighboring texels are roughly equal, their two MAC operations can be transformed into an addition and a multiplication. This technique offers a tradeoff: comparing more bits of neighboring texels leads to more accurate results, while comparing fewer bits leads to fewer MAC operations. The authors also present an architecture for efficiently evaluating texel and weight similarities, so that the power spent on comparisons does not outweigh the savings realized. In total, the authors save 30–50% of the power and speculate that up to 80% could be saved with the use of multiple voltage supplies.
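The two skipping rules can be illustrated in software. This is a sketch of the idea only, not the authors' hardware architecture: it assumes 8-bit integer texels grouped into horizontally adjacent pairs, and the threshold and compared-bit values (`weight_threshold`, `compare_bits`) are hypothetical parameters. Similar neighbors share one multiplication, since $w_a t_a + w_b t_b \approx (w_a + w_b)\,t_a$ when $t_a \approx t_b$.

```python
def approx_filter(pairs, weight_threshold=1 / 64, compare_bits=4, texel_bits=8):
    """Approximate a filter sum over adjacent texel pairs.

    pairs: list of ((w_a, t_a), (w_b, t_b)) with integer texels of texel_bits bits.
    Returns (approximate weighted sum, number of multiplications performed).
    """
    shift = texel_bits - compare_bits
    total, multiplies = 0.0, 0
    for (wa, ta), (wb, tb) in pairs:
        # Weight-based skipping: drop texels whose weight is negligible.
        live = [(w, t) for w, t in ((wa, ta), (wb, tb)) if w >= weight_threshold]
        if len(live) == 2 and (ta >> shift) == (tb >> shift):
            # Intensity-based skipping: the top compare_bits bits match, so
            # replace two MACs with an addition of weights and one multiplication.
            total += (wa + wb) * ta
            multiplies += 1
        else:
            for w, t in live:
                total += w * t
                multiplies += 1
    return total, multiplies


def exact_filter(pairs):
    """Reference result: every texel contributes one MAC."""
    return sum(w * t for pair in pairs for w, t in pair)


# Hypothetical bilinear footprint: one pair of similar texels, one pair with a tiny weight.
pairs = [((0.25, 200), (0.25, 205)), ((0.49, 10), (0.01, 250))]
approx, mults = approx_filter(pairs)
print(approx, mults, exact_filter(pairs))
```

Raising `compare_bits` to 8 makes the similarity test exact, which restores one of the skipped multiplications in this example and illustrates the accuracy-versus-work tradeoff described above.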