The Buffer-Template Method - How Does It Work?

Chapter 3. DirectX 10 Blend Shapes: Breaking the Limits

3.2. How Does It Work?

Note that we never perform any CPU readback of our data: everything is kept within vertex/stream buffers. Therefore, you have to tell this buffer that it is a receiver for streamed data (D3D10_BIND_STREAM_OUTPUT) and a source for the Input Assembler (D3D10_BIND_VERTEX_BUFFER).

3.2.4. The Buffer-Template Method

An alternative to using the CPU to drive the iteration over the active blend shapes, as occurs in the stream-out method, is for the GPU to perform the iterations. DirectX 10 enables this by providing flow control in the vertex shader for managing the iterations along with the ability to bind a buffer as a shader resource view to provide access to the data. This buffer is available through a template in HLSL, which can be read by using the Load() method:

Buffer<float3> myBuffer;

. . .

float3 weight = myBuffer.Load(x);

In the application, this shader resource view is created from any buffer that was created with the D3D10_BIND_SHADER_RESOURCE flag. Once the resource has been created, it is bound to the effect variable like this:

ID3D10EffectShaderResourceVariable *v;

. . .

v->SetResource(myRV);

Using a shader resource to hold the blend shape data, this method breaks the input data set into two types. The base mesh is read through the input assembler, while the blend shape data are all loaded explicitly in the shader from the resource view of the buffer. Utilizing loads from the buffer means that an effectively unlimited amount of blend shape data can be read in with a single invocation of the vertex shader.

In addition to the nearly unlimited loads, a buffer provides other advantages over alternative solutions. Textures are restricted to 8,192 elements in a single direction, and while 2D and 3D textures extend the total addressable size beyond the size of video memory, the extra arithmetic for computing row and column offsets is an undesirable complexity. On the other hand, buffers support more than sixteen million elements in a simple 1D package.
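To make the "extra arithmetic" concrete, here is a minimal C++ sketch that contrasts the two addressing schemes on the CPU. The names `Texel2D`, `bufferAddress`, and `textureAddress` are hypothetical helpers introduced only for this illustration, and the texture width is passed in as a parameter rather than taken from the API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of Direct3D.
struct Texel2D {
    std::uint32_t column;
    std::uint32_t row;
};

// Buffer case: the linear element index is used as-is.
inline std::uint32_t bufferAddress(std::uint32_t element) {
    return element;
}

// 2D texture case: the linear element index has to be split into a
// column and a row, which costs a modulo and a division per access.
inline Texel2D textureAddress(std::uint32_t element, std::uint32_t textureWidth) {
    assert(textureWidth > 0);
    return Texel2D{ element % textureWidth, element / textureWidth };
}
```

With a buffer, the index computed in the shader can be handed to Load() directly; with a 2D texture, every access would first pay for the split shown in `textureAddress`.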

With this method, we use this type of buffer to store all the blend shapes in one single big buffer. As mentioned previously, creation of this buffer requires a special binding, D3D10_BIND_SHADER_RESOURCE, so we can create a 1D (D3D10_SRV_DIMENSION_BUFFER) shader resource view. Additionally, because blend shapes are not modified at all at runtime, declaring the buffer as immutable (D3D10_USAGE_IMMUTABLE) ensures that it is allocated in the most optimal way. See Listing 3-3.

To address the blend shape components, the shader can utilize the SV_VertexID semantic introduced in DirectX 10. This semantic provides the element number currently being processed. By combining this element number with the stride and pitch of the blend shape elements, the shader can easily compute the proper offset for the Load() function to retrieve the necessary blend shape elements.
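The offset computation can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the buffer is taken to hold float3 elements, each vertex contributes three of them per blend shape (position, normal, tangent, in the order used by Listing 3-4), and the pitch of one blend shape is assumed to be three times the vertex count. The names `Attribute`, `blendShapePitch`, and `loadIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Attributes stored per vertex in each blend shape (assumed order).
enum Attribute : std::uint32_t {
    kPosition = 0,
    kNormal = 1,
    kTangent = 2,
    kAttributeCount = 3
};

// Number of float3 elements occupied by one blend shape (the "pitch").
// Assumption: shapes are stored back to back with no padding.
inline std::uint32_t blendShapePitch(std::uint32_t vertexCount) {
    return kAttributeCount * vertexCount;
}

// Element index to pass to Load() for one attribute of one vertex
// in one blend shape.
inline std::uint32_t loadIndex(std::uint32_t shapeIndex,
                               std::uint32_t vertexID,
                               Attribute attr,
                               std::uint32_t vertexCount) {
    assert(vertexID < vertexCount);
    return blendShapePitch(vertexCount) * shapeIndex
         + kAttributeCount * vertexID
         + attr;
}
```

For a mesh of 100 vertices, the normal of vertex 5 in blend shape 2 would sit at element 300 * 2 + 3 * 5 + 1 = 616 under these assumptions.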

Obviously, the shader must be restricted to process only those blend shapes currently in use. This is done by using an additional pair of buffers that store the indices and weights of the active blend shapes. The number of meaningful entries in these buffers is provided by the variable numBS. The index buffer, weight buffer, and the numBS variable are all updated every frame. To optimize this usage pattern, the buffers are declared with D3D10_USAGE_DYNAMIC (telling DirectX that they will be updated frequently) and D3D10_CPU_ACCESS_WRITE (telling DirectX that they will be updated directly from the CPU). Listing 3-4 shows how the blend shapes are accumulated in this method. Figure 3-5 illustrates the process.
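One possible way to prepare the per-frame data on the CPU is sketched below. It assumes the application keeps one weight per blend shape and treats a weight of zero as "inactive"; the `ActiveBlendShapes` struct and the `gatherActive` function are hypothetical names, not part of the original sample. Copying the resulting arrays into the dynamic buffers (for example by mapping them with a discard-style write) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU-side staging data for the index buffer, the weight buffer, and numBS.
struct ActiveBlendShapes {
    std::vector<std::uint32_t> indices;  // which blend shapes to apply
    std::vector<float> weights;          // weight matching each index
    std::uint32_t numBS = 0;             // number of meaningful entries
};

// Keeps only the blend shapes whose weight magnitude exceeds the threshold.
inline ActiveBlendShapes gatherActive(const std::vector<float>& allWeights,
                                      float threshold = 0.0f) {
    assert(threshold >= 0.0f);
    ActiveBlendShapes active;
    for (std::uint32_t i = 0; i < allWeights.size(); ++i) {
        const float w = allWeights[i];
        if (w > threshold || w < -threshold) {
            active.indices.push_back(i);
            active.weights.push_back(w);
        }
    }
    active.numBS = static_cast<std::uint32_t>(active.indices.size());
    return active;
}
```

Compacting the list this way means the shader loop runs numBS times rather than once per blend shape in the whole set.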

Figure 3-5. Using Loops Reduces the Number of API Calls to One

Listing 3-3. Data Declaration and Resources Creation

D3D10_BUFFER_DESC bufferDescMesh = {
    sizeBytes,                   // ByteWidth
    D3D10_USAGE_IMMUTABLE,       // Usage: never modified after creation
    D3D10_BIND_SHADER_RESOURCE,  // BindFlags: readable from the shader
    0,                           // CPUAccessFlags
    0 };                         // MiscFlags
D3D10_SUBRESOURCE_DATA data;
data.SysMemPitch = 0;
data.SysMemSlicePitch = 0;
data.pSysMem = pVtxBufferData;
hr = pd3dDevice->CreateBuffer( &bufferDescMesh, &data, &pVertexResource );

// View the buffer as a 1D array of float3 elements.
D3D10_SHADER_RESOURCE_VIEW_DESC SRVDesc;
ZeroMemory( &SRVDesc, sizeof(SRVDesc) );
SRVDesc.Format = DXGI_FORMAT_R32G32B32_FLOAT;
SRVDesc.ViewDimension = D3D10_SRV_DIMENSION_BUFFER;
SRVDesc.Buffer.ElementOffset = 0;
SRVDesc.Buffer.ElementWidth = numBlendShapes * vertexCount * (vtxBufferStrideBytes/(3 * sizeof(float)));
hr = pd3dDevice->CreateShaderResourceView( pVertexResource, &SRVDesc, &pVertexView );

To get to the final vertex position, the vertex shader simply loops over these two arrays of indices and weights, retrieves the corresponding vertex attributes in the blend shape pointed out by the index, and finally adds these contributions to the final vertex.

If you compare this approach with the previous method, you see that now the whole construction of the final shape is performed in one single draw call: we don't need to drive the iterations by sending additional draw calls. Instead, we stay in the vertex shader and loop in it depending on how many blend shapes need to be processed.

Listing 3-4. A More Flexible Way of Computing Blend Shapes

for(int i=0; i<numBS; i++) {
    uint offset = bsPitch * bsOffsets.Load(i);
    float weight = bsWeights.Load(i);
    dp = bsVertices.Load(offset + 3*vertexID+0);
    dn = bsVertices.Load(offset + 3*vertexID+1);
    dt = bsVertices.Load(offset + 3*vertexID+2);
    pos += dp * weight;
    normal += dn * weight;
    tangent += dt * weight;
}
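For checking shader output, a CPU reference of the same loop can be handy. The following C++ sketch mirrors Listing 3-4 for a single vertex, with std::vector standing in for the three buffers. `Float3`, `Vertex`, `madd`, and `blendVertex` are hypothetical names introduced for this illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };
struct Vertex { Float3 pos, normal, tangent; };

// Returns acc + d * w, component by component.
inline Float3 madd(Float3 acc, Float3 d, float w) {
    return Float3{ acc.x + d.x * w, acc.y + d.y * w, acc.z + d.z * w };
}

// CPU mirror of the accumulation loop for one vertex.
inline Vertex blendVertex(Vertex base,
                          std::uint32_t vertexID,
                          std::uint32_t bsPitch,
                          const std::vector<Float3>& bsVertices,
                          const std::vector<std::uint32_t>& bsOffsets,
                          const std::vector<float>& bsWeights,
                          std::uint32_t numBS) {
    assert(numBS <= bsOffsets.size() && numBS <= bsWeights.size());
    for (std::uint32_t i = 0; i < numBS; ++i) {
        const std::uint32_t offset = bsPitch * bsOffsets[i];
        const float weight = bsWeights[i];
        const std::uint32_t first = offset + 3 * vertexID;
        assert(first + 2 < bsVertices.size());
        base.pos     = madd(base.pos,     bsVertices[first + 0], weight);
        base.normal  = madd(base.normal,  bsVertices[first + 1], weight);
        base.tangent = madd(base.tangent, bsVertices[first + 2], weight);
    }
    return base;
}
```

Running `blendVertex` over every vertex of the base mesh should give the same positions, normals, and tangents as the single draw call, up to floating-point differences between CPU and GPU.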

Listing 3-5 shows the final sample code for this vertex shader.

Listing 3-5. Initialization Code Surrounding the Loop Featured in Listing 3-4

Head_VSOut VSFaceBufferTemplate(Head_VSIn input, uint vertexID : SV_VertexID)
{
    Head_VSOut output;
    float3 pos = input.pos;
    float3 normal = input.normal;
    float3 tangent = input.tangent;
    float3 dp, dn, dt;
    for(int i=0; i<numBS; i++) {
        uint offset = bsPitch * bsOffsets.Load(i);
        float weight = bsWeights.Load(i);
        dp = bsVertices.Load(offset + 3*vertexID+0);
        dn = bsVertices.Load(offset + 3*vertexID+1);
        dt = bsVertices.Load(offset + 3*vertexID+2);
        pos += dp * weight;
        normal += dn * weight;
        tangent += dt * weight;
    }
    . . .
}

Although this method is more efficient, you need to be aware of a limitation in DirectX 10 when using buffers to read vertex data:

In the shader code, it is impossible to use a user-defined type for data (for vertex attributes) in the Buffer<> template. Only basic types such as float, float3, and so on can be used.

In the application code, when we create a shader resource view to bind to the buffer, we face the same problem: only the types from the DXGI_FORMAT enum are available. There is no way to specify a complex input layout made of different formats in a resource view.

This issue is not a problem at all in our case because our blend shapes are made of three float3 attributes (position, normal, and tangent). So we can simply declare a buffer of float3 and step into it three by three. However, there is a problem if you want to read a set of vertex attributes made of different widths, say, float2 for texture coordinates, float4 for color, and so on.

The easiest workaround is to pad the shorter data. For example, a float2 texture coordinate will have to be float4 in memory, and the shader will use only the first two components. But this trick requires us to prepare data with some "holes" in it, which is not very elegant and takes more memory. A more complicated workaround would be to read a set of float4 values and to reconstruct the vertex attributes by ourselves in the shader. As an example, we may be able to use the third components of position, normal, and tangent to reconstruct another three-component vector. We didn't test anything related to this issue, and so we leave it to the reader to find some compression solutions.
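The padding workaround and its memory cost can be sketched in C++ as follows. The vertex layout here (float3 position, float2 texture coordinate, float4 color) is a made-up example, and `TightVertex`, `PaddedVertex`, `padTexCoord`, `wastedBytesPerVertex`, and `paddedElementIndex` are hypothetical names; the sketch only shows one way the data might be laid out on the CPU before it is uploaded as a buffer of float4 elements.

```cpp
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Hypothetical tightly packed vertex: 3 + 2 + 4 = 9 floats.
struct TightVertex {
    float position[3];
    float texCoord[2];
    float color[4];
};

// Same vertex with every attribute widened to float4: 12 floats.
struct PaddedVertex {
    Float4 position;  // w unused
    Float4 texCoord;  // z and w unused
    Float4 color;
};

static_assert(sizeof(TightVertex) == 9 * sizeof(float), "unexpected padding");
static_assert(sizeof(PaddedVertex) == 12 * sizeof(float), "unexpected padding");

constexpr std::size_t kAttributesPerVertex = 3;

// Widens a float2 texture coordinate; the shader would read only x and y.
inline Float4 padTexCoord(float u, float v) {
    return Float4{ u, v, 0.0f, 0.0f };
}

// Bytes spent on the "holes" for each vertex.
constexpr std::size_t wastedBytesPerVertex() {
    return sizeof(PaddedVertex) - sizeof(TightVertex);
}

// Element index of one attribute when stepping through the buffer
// three float4 elements at a time.
inline std::size_t paddedElementIndex(std::size_t vertexID, std::size_t attribute) {
    assert(attribute < kAttributesPerVertex);
    return vertexID * kAttributesPerVertex + attribute;
}
```

In this example the holes cost 12 bytes per vertex, a third more than the tight layout, which is the kind of overhead the more complicated repacking approach would try to avoid.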

In document GPU Gems 3 (Page 142-148)