Fast Skin Shading - Rendering Techniques - ShaderX7 Advanced Rendering Techniques (2009)

Rendering Techniques

2.4 Fast Skin Shading

JOHN HABLE, GEORGE BORSHUKOV, AND JIM HEJL

INTRODUCTION

Rendering realistic skin is a difficult problem in computer graphics, and especially in real-time. Most real-time applications that re-create heads use normal maps and diffuse maps to create an object with the shape and color of a head. However, these applications often use the same generic lighting model for skin that they use for cardboard, concrete, and plastic. Consequently, most heads rendered in-game do not “feel” like skin because the lighting model does not accurately model how light affects skin.

Skin looks very different from a pure diffuse surface such as cardboard. One of the fundamental differences between skin and other materials is that light bounces around inside skin. Skin has several layers with differing levels of translucency, whereas a “pure” diffuse model is based on the assumption that when light hits an object, the light scatters equally in all directions. This model is fast to compute, but in order to make CG skin look like real skin, it is necessary to take into account the way light bleeds throughout an object.

This article discusses an implementation of quickly simulating how light transfers underneath skin. This article contains few new advancements, but rather, it synthesizes other people’s great work and applies optimizations. Specifically, this article’s goal is to make realistic skin shading practical in typical games on the current generation of consoles, such as the PlayStation 3 and Xbox 360. For the theory of making realistic subsurface scattering, rather than adding new information, we will try to synthesize existing art into this standalone article. Then, we will discuss several new variations on these techniques that allow fast subsurface scattering to be used in several in-development games at Electronic Arts.

BACKGROUND AND EXISTING ART

The complete theory of light transfer underneath skin is quite complicated and beyond the scope of this article. For a great discussion, see d’Eon and Luebke’s chapter in GPU Gems 3 [d’EonGPUGems07]. In brief, subsurface scattering happens in three steps. First, the light hits the skin. Second, light bounces around underneath the skin. Third, light exits the skin and is viewed by the camera. We simulate the first step by rendering diffuse light to a light map. The bouncing of the light is simulated by blurring the light map. Finally, light exiting is simulated by multiplying the blurred light by the diffuse map of the face.

While there are numerous techniques for skin shading, this article will talk about texture space diffusion. The idea is quite simple: Render the lights into UV space, blur those lights to simulate subsurface scattering, and render. The first use of this technique was by Borshukov and Lewis in the Matrix sequels [Borshukov03, Borshukov05]. They rendered the light to a map in texture-space, blurred that map, and used the blurred light in lighting calculations. To increase realism, they used different blur kernels for the red, green, and blue color channels, since red, green, and blue light scatter differently in real skin. For the blur kernel, they used the formula 1/(1 + Radius)^p, where p was chosen based on extensive photographic reference. They used 2.75 for red, 3.25 for green, and 3.50 for blue (Figure 2.4.1).

FIGURE 2.4.1 The formula 1/(1 + r)^p when p is 3.50, 3.25, and 2.75. The bottom curve is for p = 3.50.
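To make the shape of these curves concrete, the falloff can be evaluated numerically. The following is a minimal Python sketch, not the production code of the original work; the function name `falloff` and the sample radii are chosen here for illustration, and the weights are left unnormalized.

```python
# Falloff used for the texture-space blur kernel: 1 / (1 + r)^p.
# The exponents are the per-channel values quoted in the text.
EXPONENTS = {"red": 2.75, "green": 3.25, "blue": 3.50}


def falloff(radius, p):
    """Return the unnormalized kernel weight at a given radius."""
    return 1.0 / (1.0 + radius) ** p


if __name__ == "__main__":
    # Sample radii are arbitrary illustrative values.
    for r in (0.0, 0.5, 1.0, 2.0):
        row = ", ".join(
            f"{name}={falloff(r, p):.4f}" for name, p in EXPONENTS.items()
        )
        print(f"r={r:.1f}: {row}")
```

At any radius greater than zero the red weight is the largest of the three, which is why red light appears to bleed farthest.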

This idea was applied to real-time by Gosselin [Gosselin04], using Poisson sampling, and Green [Green04], using a Gaussian blur. This work laid the foundation for the Adrianne and Doug Jones demos by NVIDIA [NVIDIA 06, NVIDIA 07] that set the current high bar for skin shading in real-time. The main technical leap in these demos was decomposing the dipole function into a sum of separable Gaussian blurs. These demos also contributed several other features, including finding a good set of shading parameters for real skin, compensating for UV stretching, popularizing a good specular model for skin [Kelemen01], and making this information accessible.

In our opinion, the Doug Jones demo sets the clear standard for high-quality skin shading in real-time. The only problem is that “real-time” for a tech demo is quite different from being fast enough to use in a game in real-use cases. The Doug Jones demo fully taxes the processor of a high-end graphics card. In contrast, most games must run on less powerful consoles. Also, commercial games must render an entire world, which only leaves a small fraction of time for skin shading. The premise of this article is that the Doug Jones demo is the high standard for skin shading, and the goal of this article is to show various ways to scale the Doug Jones demo down so that it is fast enough for a console but still retains as much quality as possible.

SPECULAR AND DIFFUSE

The most common lighting model in computer games is diffuse shading with a specular highlight. When light hits an object, it does one of two things. In the first case, light hits an object, gets absorbed, and then light of a different color is emitted. In the other case, light does not get absorbed and “skips” off the edge. The first case is called diffuse shading, and the second case is generally called specular. We will handle these cases in order.

DIFFUSE

The diffuse case is much more fundamentally difficult because rendering diffuse lighting at a single point on skin requires knowing the incoming light intensities at nearby points. The first problem is simulating this light transfer if we had infinite computational time, and the second problem is performing that calculation quickly.

Fortunately, the problem of finding a diffusion dipole is nearly a solved problem. Donner and Jensen [Donner05] performed extensive analysis. In their work, they found different curves that show the intensity of red, green, and blue at output points a given distance away from the source of incoming light. For the Doug Jones demo, d’Eon and Luebke found sums of Gaussian blurs that closely match those curves.

Decomposing a dipole into a sum of blurs has several benefits. The first is that it allows that demo to run in real-time, since performing five 7×7 Gaussian blurs is more than an order of magnitude faster than performing one 50×50 blur. The second benefit is that it allows an intuitive way to tweak the numbers. While the artistic effect of changing these numbers is not exactly obvious, it provides a good starting point for changing the look for different skin types. Table 2.4.1 shows the numbers that Eugene d’Eon and David Luebke list as a good starting point for light, Caucasian skin.

One thing to notice is that the red channel is far blurrier than the blue and green. This happens because in the real world, red light scatters farther in skin than the green and blue wavelengths.

Table 2.4.1 The weights used by Eugene d’Eon and David Luebke [d’EonGPUGems07]

The Doug Jones demo takes the incoming light and renders it to a light map. That demo blurs the light map several times, and then combines them together. This raises the issue of how the diffuse map alters the light. Does the diffuse map affect the light going in or the light going out? We will discuss that issue more later. The short answer is that we advocate applying the diffuse map to the outgoing light in most cases, but it depends on your specific circumstances.

One final improvement in the Doug Jones demo is using a texture to compensate for the stretching in the UV map. In a typical UV map, certain areas of the face will be greatly distorted—in particular, the nose and the ears. A stretch texture helps avoid this artifact. Also, the first weight set is not actually used in the blur. To preserve sharpness, that weight is used in the final lighting calculation. So you can think of that first blur as the non-scattered lighting that immediately enters the skin and exits without traversing. That is the core algorithm for subsurface scattering as described in GPU Gems 3. Now, we will explain our optimizations to improve performance.

OUR CONTRIBUTIONS

The first big problem with the technique as described is the blurs. If you use five blurs, since Gaussian blurs are separable, each blur is actually two passes, one for the horizontal and one for the vertical. At 7 taps per pass, each blur requires 14 taps, so the total cost is 14×5 = 70 taps. Then later on, we have the cost of reading those textures. This procedure very accurately calculates a 50×50 blur in a way that is much faster than a naïve implementation that would take 2,500 taps. However, with a carefully chosen sampling pattern, we can get significantly better performance with minimal additional error.
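The tap counts above follow from the separability of the Gaussian. As a rough Python illustration (the helper names and the value of sigma are assumptions made for this sketch, not values from the demo), the arithmetic and the separability property look like this:

```python
import math


def gaussian_1d(taps, sigma):
    """Normalized 1D Gaussian weights for an odd number of taps."""
    half = taps // 2
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(w)
    return [x / total for x in w]


def separable_taps(num_blurs, taps_per_pass):
    """Texture reads per pixel: one horizontal and one vertical pass per blur."""
    return num_blurs * 2 * taps_per_pass


def naive_taps(width):
    """Texture reads per pixel for a single non-separable width x width blur."""
    return width * width


k = gaussian_1d(7, 1.5)  # sigma is an arbitrary illustrative value
# The 2D kernel is the outer product of the 1D kernel with itself,
# so two 7-tap passes reproduce all 49 weights of the 7x7 kernel.
kernel_2d = [[a * b for b in k] for a in k]

print(separable_taps(5, 7))  # five blurs, two passes each, seven taps per pass
print(naive_taps(50))        # one naive 50x50 blur
```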

Our contribution to improve d’Eon and Luebke’s technique is to simulate that same blur in fewer taps, and we have achieved acceptable results using 12 or fewer samples. We generate our blur kernel based on the same dipole. After trying many approaches, we found that the best way to simulate this blur is to use two “rings.” We divide each ring into 6 sections for a total of 12 sections. Then, in each section, we do a jittered sample.

Then, we sample the full kernel with our 12 weights. Keep in mind that there are actually 13 weights because there is an implied weight in the center.

Note that you can use a different sampling pattern. The first weight is for the light that comes in and directly comes out. The next six weights simulate the “mid-level” blurring. The final six weights are mainly for the wide red blurring. Since the red bleeds much farther than the green and blue, the outer ring is primarily for the red channel.
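One possible way to generate such a pattern is sketched below in Python. This is only an illustration of the two-ring, six-section idea: the ring radii, the seed, and the function name `jittered_ring_offsets` are hypothetical, and the pattern actually used in production may differ.

```python
import math
import random


def jittered_ring_offsets(radii=((0.15, 0.45), (0.45, 1.0)), sections=6, seed=0):
    """Return one jittered (x, y) offset per section of each ring.

    radii holds hypothetical (inner, outer) bounds per ring, expressed as a
    fraction of the full kernel radius. The center tap is handled separately.
    """
    rng = random.Random(seed)
    offsets = []
    for inner, outer in radii:
        for s in range(sections):
            # Pick a random angle inside this section's angular wedge
            angle = (s + rng.random()) * (2.0 * math.pi / sections)
            # and a random radius inside this ring's radial band.
            radius = inner + rng.random() * (outer - inner)
            offsets.append((radius * math.cos(angle), radius * math.sin(angle)))
    return offsets


offsets = jittered_ring_offsets()  # 12 offsets: 6 inner ring, then 6 outer ring
```

Each offset would then be scaled by the kernel radius (and by the stretch texture) before being added to the UV coordinate.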

float3 blurJitteredWeights[13] = {
    { 0.220441, 0.437000, 0.635000 },
    { 0.076356, 0.064487, 0.039097 },
    { 0.116515, 0.103222, 0.064912 },
    { 0.064844, 0.086388, 0.062272 },
    { 0.131798, 0.151695, 0.103676 },
    ...

[...] process, we combine the sum of the blurs into a single blur, and then sample. This approach provides us with an intuitive way to tweak the blur for different skin types.

Plugging in these numbers, we actually have a different blur for each R, G, and B channel. Additionally, all of these samples can be done in exactly one pass instead of 10 passes, which greatly assists with memory. Note that these numbers are scaled by a constant. One other interesting feature is that since these are all done in the same pass, we don’t actually need to do a separate blur pass. In the final pixel shader, we can perform the 12 texture reads if we so desire.

The second optimization is that while writing the light map texture, we can poke a hole in the depth buffer and use High-Z to only perform the blur on front-facing polygons. During the initial render to light map pass, we can set the depth to dot(N,V) × 0.5 + 0.5, where N is the world normal vector and V is the normalized world-space vector from the point to the camera. Using this formula, a point facing directly at the camera would have an output depth of 1, and a point facing directly away would have an output depth of 0 (see Figure 2.4.2).
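The mapping can be checked with a few lines of Python. This sketch assumes that V points from the surface point toward the camera, which is the convention that yields the outputs of 1 and 0 stated above; it also assumes a less-or-equal depth test and a depth buffer cleared to 0. The function names are illustrative only.

```python
def facing_depth(normal, to_camera):
    """Map dot(N, V) from [-1, 1] to a depth value in [0, 1].

    Both arguments are assumed to be normalized 3-vectors, and to_camera
    is assumed to point from the surface point toward the camera.
    """
    n_dot_v = sum(n * v for n, v in zip(normal, to_camera))
    return n_dot_v * 0.5 + 0.5


def blur_runs(stored_depth, quad_depth=0.5):
    """Emulate an assumed less-or-equal depth test for the blur quad."""
    return quad_depth <= stored_depth


print(facing_depth((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))   # facing the camera
print(facing_depth((0.0, 0.0, 1.0), (0.0, 0.0, -1.0)))  # facing away
```

Under these assumptions, texels that were never written keep the cleared depth of 0 and are rejected, as are texels whose stored depth is below the quad depth.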

Here is the shader code for the blur pass.

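// Accumulated RGB light for this texel.
float3 totalColor = 0;
// Per-texel UV stretch correction.
float2 stretch = tex2D(StretchTextureBlurred, uv.xy).rg;
// Shadow term stored in the alpha channel of the light map.
float shadow = tex2D(LightMap, uv.xy).a;
// Samples 1 through 12 are the two rings; sample 0 is the center weight.
for (int i = 1; i <= 12; i++)
    totalColor += SubsurfaceJitterSampler(uv.xy, stretch, i);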

FIGURE 2.4.2 We can render the depth and poke a hole in the depth buffer during our light map render pass. Then, while rendering the blur, we can turn on High-Z, which will allow the blur to only be rendered on the visible pixels. This approach prevents wasted computation on either the black areas (which were never rendered to) or the gray areas (which fail the depth test).

During the blur pass, while rendering the quad, we can render with depth. If we desired, we could render to a depth of 0.5, which would only perform the blur on front-facing normals. In practice, we want to push this threshold back a little bit because while any single triangle is either visible or not visible, due to interpolation, some visible points may have a normal that points away from the viewer. Alternatively, if we are using DirectX 10 hardware, we can use a geometry shader to determine whether the face is front-facing in the final image and use that to poke a hole in the depth buffer.

One issue the GPU Gems chapter discusses is the idea that the diffuse map should affect the light coming in and the light coming out. For the Adrianne demo, the diffuse map primarily affected light coming in and less light coming out. In the Doug Jones demo, it was split evenly between the two.

We have found that for real video games, our diffuse textures for faces are DXT1-compressed. Additionally, we render the light map to an fp16 RGBA buffer. So while a 1024×1024 DXT1 diffuse texture takes about 512 KB, a 1024×1024 fp16 RGBA buffer requires 8 MB. As a result, the light map blur texture must have a much lower resolution than the source textures, and our light map buffer is generally 512×512 or smaller. When the light map texture gets this small, applying the diffuse map at all to the incoming light causes the diffuse texture to lose too much sharpness.
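The memory figures follow directly from the bits per pixel of each format: DXT1 stores each 4×4 block in 8 bytes (4 bits per pixel), while an fp16 RGBA texel takes four 16-bit channels (64 bits per pixel). A quick Python check, ignoring mipmaps and any platform-specific padding (the helper name is illustrative):

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of a single mip level with no padding."""
    return width * height * bits_per_pixel // 8


DXT1_BPP = 4        # 8 bytes per 4x4 block
FP16_RGBA_BPP = 64  # four 16-bit channels

print(texture_bytes(1024, 1024, DXT1_BPP) / 1024)               # size in KB
print(texture_bytes(1024, 1024, FP16_RGBA_BPP) / (1024 * 1024))  # size in MB
print(texture_bytes(512, 512, FP16_RGBA_BPP) / (1024 * 1024))    # size in MB
```

Halving the light map resolution to 512×512 cuts the fp16 buffer to a quarter of its 1024×1024 size.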

Additionally, our textures are based on photos, as opposed to painted by hand. When that happens, a certain amount of blurriness is built into the capture process. That is why we advocate using the diffuse texture only on outgoing light.

Due to this resolution issue, we apply the sharpest of the blurs during the final stage. During the final render, we sample the light map and multiply it by the diffuse map, but we also recalculate the diffuse shading and multiply by the diffuse map again for the sharpest blur.
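As a rough per-pixel illustration of this final combine, the Python sketch below adds the freshly computed direct diffuse term, scaled by the center weight, to the blurred light map sample and tints the result by the diffuse map. It assumes the blurred value already contains the weighted sum of the 12 ring samples; the function and argument names are hypothetical.

```python
# First entry of the weight table given in the text (the center weight).
CENTER_WEIGHT = (0.220441, 0.437000, 0.635000)


def final_diffuse(diffuse_map, direct_light, blurred_light):
    """Combine sharp and blurred lighting, then tint by the diffuse map.

    All arguments are RGB tuples. blurred_light is assumed to be the
    already-weighted sum of the 12 ring samples from the light map.
    """
    return tuple(
        albedo * (w * direct + blurred)
        for albedo, w, direct, blurred in zip(
            diffuse_map, CENTER_WEIGHT, direct_light, blurred_light
        )
    )


print(final_diffuse((0.8, 0.6, 0.5), (1.0, 1.0, 1.0), (0.3, 0.2, 0.1)))
```

Because the diffuse map is applied only here, at full texture resolution, its detail is not softened by the low-resolution light map.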

One final trick you can do is to put the shadow in the alpha channel of the light map. One disadvantage of doing the lighting this way is that we have to calculate the lighting twice per pixel. In reality, this is not much of a problem because usually calculating the diffuse component is cheaper than the specular component. However, this can get very expensive for shadows. In most games, one shadow is used for characters, and in this case, we can put the shadow term in the unused alpha channel. Then during the final pass, we just read the shadow. Note that this will blur the shadow slightly, but we have found that this adds a nice effect because it softens the shadow’s jagged edges (see Figure 2.4.3).

FIGURE 2.4.3 On the left is the shadow computed in the final pass, and on the right we have used the light-mapped shadow. This helps hide shadow artifacts.

Using these techniques, it is practical to render a high-quality head in real-time in a real game, but we have also found that we can have multiple heads rendered simultaneously with subsurface scattering. In most cases we have a fixed amount of memory dedicated to subsurface scattering.

How we allocate this buffer varies heavily based on the application. In the case of a two-person fighting game, an obvious choice would be one map for each person. For more general-purpose cases such as first-person shooters, it is impractical to have subsurface scattering on tens or hundreds of heads. However, with a first-person camera, it is very rare to have an extreme close-up of more than one character at a time. Games can take advantage of this by having more memory allocated to close characters’ heads than far characters’ heads. In many cases, games can get better results by dedicating a large portion of memory to the closest character and dividing the rest among remaining characters.
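One simple allocation policy along these lines is sketched below in Python. It is purely illustrative: the 50 percent share for the closest head, the even split of the remainder, and the function name are assumptions, not a description of any shipped game.

```python
def allocate_light_maps(distances, total_texels, closest_share=0.5):
    """Split a fixed texel budget among heads, favoring the closest one.

    distances holds one camera distance per head. Returns a list of texel
    budgets in the same order. closest_share is a hypothetical tuning value.
    """
    if not distances:
        return []
    if len(distances) == 1:
        return [total_texels]
    closest = min(range(len(distances)), key=lambda i: distances[i])
    closest_budget = int(total_texels * closest_share)
    # Divide whatever is left evenly among the remaining heads.
    rest = (total_texels - closest_budget) // (len(distances) - 1)
    return [closest_budget if i == closest else rest for i in range(len(distances))]


print(allocate_light_maps([2.0, 10.0, 30.0], 512 * 512))
```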

SPECULAR

As mentioned before, there are two classifications of light that shades an object. First, there is light that gets absorbed by the surface and then retransmitted, which we call diffuse. Second, there is light that skips right off the surface, which we call specular.

The specular case is much easier. To get real-world reference of specular highlights, you can perform tricks with polarized lenses to separate the diffuse and specular components. Then, modeling the specular term is as easy (or hard) as finding a mathematical model that looks similar to the images. One interesting note is that skin is actually very shiny. In fact, it almost looks like metal.

Most games use a Phong model to approximate the specular highlight of skin. However, when you look at the specular highlights of real skin, it becomes clear that the Phong model will not suffice, and that it is not possible to get an accurate specular falloff using a single Phong lobe. We use the same model that the Doug Jones head uses. We directly use the model that d’Eon and Luebke advocate [d’EonGPUGems07], which is Kelemen-Szirmay-Kalos [Kelemen01]. See Figure 2.4.4 for a comparison. A full discussion of this lighting model is beyond the scope of this article, so we recommend reading d’Eon and Luebke’s excellent discussion.

Still, we will reiterate a few points. There are three important qualities that a better specular model gives you.

1. The model is split into several lobes, and in our case, we use four. By having several lobes, we can get both the soft specular look as well as the tight specular look.

2. There is a built-in Fresnel term.

3. The specular highlight is brighter at grazing angles.

Granted, that model is pretty expensive for consoles, and if it is too expensive, you will have to find a good way to scale it down. In d’Eon and Luebke’s chapter where they discuss the Doug Jones demo, they advocate preintegrating the specular into a texture. If that solution does not appeal to you or is not feasible, here are some other ideas. For the first aspect, one cheap way is to use several Phong lobes.

FIGURE 2.4.4 The left image shows the Roberto head with Kelemen-Szirmay-Kalos specular, and the right image shows the same head with the Phong model.

For the Fresnel, you can add that as well to the Phong term. The specular highlight being brighter at grazing angles could be done with other approximations. Ideally, you should try to use Kelemen-Szirmay-Kalos, but even if it is too expensive, implement it anyway as reference.
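As a hedged illustration of this cheaper fallback, the Python sketch below sums several Phong lobes and scales the result by Schlick's approximation to the Fresnel term. Note that this is not the Kelemen-Szirmay-Kalos model; the lobe weights and exponents and the reflectance at normal incidence `f0` are assumed values for demonstration only.

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation to Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def multi_lobe_phong(r_dot_v, v_dot_h, lobes, f0=0.028):
    """Sum several Phong lobes and scale by a Fresnel term.

    r_dot_v is dot(reflection vector, view vector), v_dot_h is
    dot(view vector, half vector), and lobes is a list of
    (weight, exponent) pairs. f0 is an assumed reflectance value.
    """
    r_dot_v = max(r_dot_v, 0.0)
    spec = sum(weight * r_dot_v ** exponent for weight, exponent in lobes)
    return spec * schlick_fresnel(max(v_dot_h, 0.0), f0)


# Hypothetical lobes: one broad and soft, one tight and sharp.
LOBES = [(0.7, 8.0), (0.3, 64.0)]

print(multi_lobe_phong(0.95, 0.9, LOBES))
print(multi_lobe_phong(0.95, 0.1, LOBES))  # grazing angle: stronger Fresnel
```

The Fresnel factor grows as the view direction approaches a grazing angle, which gives a rough stand-in for the brighter grazing-angle highlights mentioned above.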

VARIATION ACROSS THE FACE

One other aspect of rendering is that the lighting model at all points on the face is not equal. Different parts of the face are shinier than others, and have different subsurface scattering effects. There are several ways to implement this variation.

The first and most standard way to add variation is by including a specular map. Additionally, we can also add a term for how tight the specular highlight is. For example, the nose usually has a tighter specular highlight than the cheek. MERL has done some research with measuring the

In document ShaderX7 Advanced Rendering Techniques (2009) (Page 94-100)