Optimization of the Constraint Solver

Figure 3.12. Double loop of finding overlapped pairs.

Because the AABB array is a simple contiguous array, each SPU can use double buffering to run the search for overlapping AABBs in parallel with the DMA transfers. At a later stage of the process, when too few AABBs remain for double buffering to be useful, we load the rest of the AABBs into the local storage and process them one step at a time.

For example, Batch 3 in Figure 3.12 includes only three AABBs.
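The double-buffering pattern described above can be sketched in portable C++. This is only an illustration under stated assumptions: `dmaGet` is a hypothetical stand-in that copies synchronously (a real SPU would issue an asynchronous DMA Get and wait on it before using the buffer), and `countOverlaps`, `AABB`, and the `chunk` parameter are names invented for this sketch. It tests every box against a single query box rather than running the full double loop of Figure 3.12.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct AABB {
    float lo[3];  // minimum corner
    float hi[3];  // maximum corner
};

// True when the two boxes overlap on all three axes.
inline bool overlaps(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k) {
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    }
    return true;
}

// Hypothetical stand-in for an asynchronous DMA Get (here a plain copy).
inline void dmaGet(AABB* local, const AABB* mainMem, std::size_t count) {
    std::memcpy(local, mainMem, count * sizeof(AABB));
}

// Counts the boxes in mainMem that overlap query, streaming them through
// two local buffers that each hold up to `chunk` boxes.
std::size_t countOverlaps(const std::vector<AABB>& mainMem, const AABB& query,
                          std::size_t chunk) {
    assert(chunk > 0);
    std::vector<AABB> buffer[2] = {std::vector<AABB>(chunk), std::vector<AABB>(chunk)};
    std::size_t count[2] = {0, 0};
    const std::size_t total = mainMem.size();
    std::size_t hits = 0;
    int cur = 0;

    // Fetch the first chunk before the loop starts.
    count[cur] = std::min(chunk, total);
    if (count[cur] > 0) dmaGet(buffer[cur].data(), mainMem.data(), count[cur]);
    std::size_t fetched = count[cur];

    while (count[cur] > 0) {
        // Start fetching the next chunk into the other buffer ...
        const int next = cur ^ 1;
        count[next] = std::min(chunk, total - fetched);
        if (count[next] > 0) {
            dmaGet(buffer[next].data(), mainMem.data() + fetched, count[next]);
        }
        fetched += count[next];

        // ... while the current chunk is being processed.
        for (std::size_t i = 0; i < count[cur]; ++i) {
            if (overlaps(buffer[cur][i], query)) ++hits;
        }
        cur = next;
    }
    return hits;
}
```

When `chunk` is at least as large as the array, the loop degenerates to a single load followed by plain processing, which corresponds to the small-batch case mentioned above.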

The PPU has to allocate the buffers where output pairs are stored before starting the tasks because SPUs can’t allocate buffers in main memory. Each SPU finds overlapping pairs and writes them back to main memory whenever enough pairs have accumulated in the local storage. After the SPUs have found all overlapping pairs, the PPU gathers these pairs and merges them into one array.
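The output path described above might look like the following sketch. All names here (`Pair`, `PairWriter`, `mergePairs`) are hypothetical, and `flush` uses an ordinary copy where a real SPU would issue a DMA Put into the region the PPU allocated beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pair {
    std::uint32_t a;
    std::uint32_t b;
};

// Hypothetical per-SPU writer: collects pairs in a small local buffer and
// copies them to an output region that was allocated in advance.
class PairWriter {
public:
    PairWriter(Pair* out, std::size_t outCapacity, std::size_t localCapacity)
        : out_(out), outCapacity_(outCapacity),
          localCapacity_(localCapacity), written_(0) {
        assert(localCapacity > 0);
        local_.reserve(localCapacity);
    }

    // Stores one pair locally and flushes when the local buffer is full.
    void push(Pair p) {
        local_.push_back(p);
        if (local_.size() == localCapacity_) flush();
    }

    // Stand-in for a DMA Put of the locally stored pairs.
    void flush() {
        assert(written_ + local_.size() <= outCapacity_);
        std::copy(local_.begin(), local_.end(), out_ + written_);
        written_ += local_.size();
        local_.clear();
    }

    std::size_t written() const { return written_; }

private:
    Pair* out_;
    std::size_t outCapacity_;
    std::size_t localCapacity_;
    std::size_t written_;
    std::vector<Pair> local_;
};

// PPU side: gathers the valid part of each per-SPU buffer into one array.
std::vector<Pair> mergePairs(const std::vector<std::vector<Pair>>& buffers,
                             const std::vector<std::size_t>& counts) {
    assert(buffers.size() == counts.size());
    std::vector<Pair> merged;
    for (std::size_t s = 0; s < buffers.size(); ++s) {
        assert(counts[s] <= buffers[s].size());
        merged.insert(merged.end(), buffers[s].begin(),
                      buffers[s].begin() + counts[s]);
    }
    return merged;
}
```

Each writer must call `flush` once more after its last `push`, so that a partially filled local buffer also reaches main memory before the merge.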

3.4 Optimization of the Constraint Solver

A constraint is a condition that limits the behavior of a rigid body; in many real games it is used for collision response and ragdoll joints. Impulses, which directly change the velocity of a rigid body, are often used to control this behavior. To solve the constraint conditions for many related rigid bodies, the general approach is to construct a large system of linear equations and solve it with an iterative method known as projected Gauss-Seidel (PGS) or sequential impulse (SI) [Catto 05]. In this section, we don’t describe the details of this method but rather focus on how to optimize the solver computation for Cell/BE.

If needed, please refer to Chapter 2 for details.

3. Broad Phase and Constraint Optimization for PlayStation®3

Figure 3.13. Constraints for collision response.

3.4.1 Overview of Solver Calculation

In the iterative method, one row of the linear system represents one constraint. Each row is solved in turn, and its output is used immediately by the other rows. Repeating this calculation over all rows brings the result closer to the solution. The calculation continues until the number of passes reaches the specified iteration limit, and the result at that point is taken as the solution.

    for (int i = 0; i < iterationLimit; i++) {
        for (int n = 0; n < numConstraints; n++) {
            Constraint &constraint = constraintArray[n];
            RigidBody &rigidbodyA = constraint.getRigidBodyA();
            RigidBody &rigidbodyB = constraint.getRigidBodyB();
            solveConstraintRow(constraint, rigidbodyA, rigidbodyB);
        }
    }

Listing 3.3. Example code for the solver iteration.

Information about two related rigid bodies is necessary to calculate one constraint (see Figure 3.13). The output is applied to both rigid bodies immediately during the calculation, so constraints that share a rigid body depend on each other. We therefore can’t parallelize the solver simply by dividing the constraints among the SPUs.

3.4.2 Optimize Solver Calculation

The easiest way to convert the solver calculation for parallel computation is to break the constraints into small independent batches that share no rigid bodies with one another. It is too complicated to divide all constraints at the same time. However, it is easy to create a few small independent batches, which we then gather into a group. The independent batches in a group can be processed by SPUs in parallel. We continue to create groups in the same way until all constraints are assigned.

Figure 3.14 shows some groups containing batches that can be executed in parallel. After the batches in a group have been calculated, we synchronize all SPUs and continue with the batches of the next group until all groups are completed. Synchronization is not free, and its total cost grows with the number of groups. However, Cell/BE has a mechanism that synchronizes SPUs without PPU involvement, so the cost of SPU synchronization is low enough as long as the number of synchronizations is not too large.
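One way to build such groups is a greedy pass over the unassigned constraints, sketched below. This is an assumed scheme for illustration, not necessarily the one used by the authors; `ConstraintRef`, `buildGroups`, `batchesPerGroup`, and `maxPerBatch` are hypothetical names. Within a group, each rigid body is owned by at most one batch, and a constraint whose two bodies are already owned by different batches is deferred to a later group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ConstraintRef {
    int bodyA;
    int bodyB;
};

using Batch = std::vector<int>;    // constraint indices solved by one SPU
using Group = std::vector<Batch>;  // batches that may run in parallel

std::vector<Group> buildGroups(const std::vector<ConstraintRef>& constraints,
                               int numBodies, int batchesPerGroup,
                               std::size_t maxPerBatch) {
    assert(batchesPerGroup > 0 && maxPerBatch > 0);
    std::vector<Group> groups;
    std::vector<int> pending(constraints.size());
    for (std::size_t i = 0; i < pending.size(); ++i) pending[i] = static_cast<int>(i);

    while (!pending.empty()) {
        Group group(batchesPerGroup);
        std::vector<int> owner(numBodies, -1);  // batch owning each body, -1 if none
        std::vector<int> deferred;

        for (int c : pending) {
            const int a = constraints[c].bodyA;
            const int b = constraints[c].bodyB;
            const int oa = owner[a];
            const int ob = owner[b];
            int target = -1;
            if (oa < 0 && ob < 0) {
                // Both bodies are free: use the least-filled batch.
                target = 0;
                for (int k = 1; k < batchesPerGroup; ++k) {
                    if (group[k].size() < group[target].size()) target = k;
                }
            } else if (oa < 0 || ob < 0 || oa == ob) {
                // Only one batch is involved: the constraint must join it.
                target = (oa >= 0) ? oa : ob;
            }
            if (target >= 0 && group[target].size() < maxPerBatch) {
                group[target].push_back(c);
                owner[a] = target;
                owner[b] = target;
            } else {
                deferred.push_back(c);  // conflicts with this group; try the next one
            }
        }
        groups.push_back(group);
        pending.swap(deferred);
    }
    return groups;
}
```

Every pass assigns at least its first pending constraint, so the loop terminates. The number of groups returned equals the number of SPU synchronization points, which is the quantity the text above suggests keeping small.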

Double buffering with two phases.

In the constraint solver, we need to be careful about DMA transfers when double buffering. As described in the previous section, each SPU calculates an assigned batch. But the constraints in a batch have data dependencies if they share rigid bodies. In the worst case, Put and Get DMA operations on the same data occur at the same time, and double buffering then produces incorrect results.

Moreover, calculating one constraint requires the constraint itself and its two related rigid bodies. The constraint structure holds links to the related rigid bodies, so we have to get the constraint first, and only then can we get the two rigid bodies. Such a dependency prevents double buffering.

Figure 3.14. Constraints assigned to batches.

Figure 3.15. A data-holder structure.

To enable double buffering, we need to transfer all the data used for one calculation. As shown in Figure 3.15, we prepare an array of structures that hold the indices (or pointers) of the necessary data. Figure 3.16 then shows double buffering with two phases. In the first phase, we transfer the data-holder array with double buffering. In the second phase, we calculate the real addresses of the necessary data and issue Get DMA commands to transfer the data that will be used next to the local storage. We then continue processing in the same way as normal double buffering.

Figure 3.16. Double buffering with two phases.
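The second phase can be illustrated with a small sketch. It assumes the data-holder array has already arrived in local storage (the first phase) and uses a hypothetical `getSlot` helper that copies synchronously where a real SPU would issue Get DMA commands. The types `Holder` and `Slot`, the `gain` field, and the computed quantity are placeholders invented for this example. Writing results back to rigid bodies, and therefore the handling of shared bodies, is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RigidBody  { float velocity; };
struct Constraint { float gain; };

// Data holder: indices of everything one calculation needs.
struct Holder {
    std::uint32_t constraint;
    std::uint32_t bodyA;
    std::uint32_t bodyB;
};

// One local-storage slot holding the data for a single calculation.
struct Slot {
    Constraint c;
    RigidBody a;
    RigidBody b;
};

// Hypothetical stand-in for three Get DMA commands resolved from a holder.
inline void getSlot(Slot& slot, const Holder& h,
                    const std::vector<Constraint>& constraints,
                    const std::vector<RigidBody>& bodies) {
    assert(h.constraint < constraints.size());
    assert(h.bodyA < bodies.size() && h.bodyB < bodies.size());
    slot.c = constraints[h.constraint];
    slot.a = bodies[h.bodyA];
    slot.b = bodies[h.bodyB];
}

// Computes a placeholder value per constraint while prefetching the data
// for the next constraint into the other slot.
std::vector<float> relativeImpulses(const std::vector<Holder>& holders,
                                    const std::vector<Constraint>& constraints,
                                    const std::vector<RigidBody>& bodies) {
    std::vector<float> out(holders.size());
    if (holders.empty()) return out;

    Slot slots[2];
    getSlot(slots[0], holders[0], constraints, bodies);
    for (std::size_t i = 0; i < holders.size(); ++i) {
        if (i + 1 < holders.size()) {
            // The holder gives the addresses up front, so the next Get can
            // start before the current calculation runs.
            getSlot(slots[(i + 1) & 1], holders[i + 1], constraints, bodies);
        }
        const Slot& cur = slots[i & 1];
        out[i] = cur.c.gain * (cur.b.velocity - cur.a.velocity);
    }
    return out;
}
```

Because each holder lists the constraint and both bodies together, nothing has to be read from a fetched constraint before the rigid-body transfers can begin.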

To resolve dependencies between rigid bodies within one batch in double-buffering mode, we keep a rigid body in the local storage once it has been loaded and don’t load the same rigid body again. Updated rigid bodies are returned to main memory at the last step of processing a batch. This works well because no two SPUs share a rigid body within a group. The implementation is easy: we prepare a reference table of rigid bodies in the local storage and always check this table before transferring a rigid body. If a rigid body already exists in the local storage, we use that copy instead of transferring it again. We can limit the number of rigid bodies in a batch so that all rigid bodies in the batch fit in the local storage.
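A reference table of this kind could be sketched as follows. The class `LocalBodyCache`, the function `solveBatch`, and the velocity-averaging update are hypothetical placeholders for illustration; the copies stand in for DMA Get and Put, and the linear search assumes the number of rigid bodies per batch is small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RigidBody { float velocity; };
struct BodyPair  { std::uint32_t bodyA; std::uint32_t bodyB; };

// Hypothetical local-storage table: each rigid body is loaded at most once
// per batch and written back only when the batch is finished.
class LocalBodyCache {
public:
    LocalBodyCache(std::vector<RigidBody>& mainMem, std::size_t capacity)
        : main_(mainMem), capacity_(capacity), gets_(0) {
        // Reserving up front keeps references returned by acquire() valid.
        local_.reserve(capacity);
        ids_.reserve(capacity);
    }

    RigidBody& acquire(std::uint32_t id) {
        assert(id < main_.size());
        for (std::size_t i = 0; i < ids_.size(); ++i) {
            if (ids_[i] == id) return local_[i];  // already in local storage
        }
        assert(local_.size() < capacity_);
        local_.push_back(main_[id]);  // stand-in for a DMA Get
        ids_.push_back(id);
        ++gets_;
        return local_.back();
    }

    // Stand-in for the DMA Puts issued at the end of the batch.
    void writeBack() {
        for (std::size_t i = 0; i < ids_.size(); ++i) main_[ids_[i]] = local_[i];
    }

    std::size_t gets() const { return gets_; }

private:
    std::vector<RigidBody>& main_;
    std::size_t capacity_;
    std::size_t gets_;
    std::vector<RigidBody> local_;
    std::vector<std::uint32_t> ids_;
};

// Processes one batch and returns the number of rigid-body loads performed.
std::size_t solveBatch(std::vector<RigidBody>& mainMem,
                       const std::vector<BodyPair>& batch,
                       std::size_t capacity) {
    LocalBodyCache cache(mainMem, capacity);
    for (const BodyPair& p : batch) {
        RigidBody& a = cache.acquire(p.bodyA);
        RigidBody& b = cache.acquire(p.bodyB);
        // Placeholder update: both bodies take the mean velocity.
        const float mean = 0.5f * (a.velocity + b.velocity);
        a.velocity = mean;
        b.velocity = mean;
    }
    cache.writeBack();
    return cache.gets();
}
```

In a batch where two constraints share a body, the second constraint sees the value updated by the first, since both refer to the same local copy.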

In document Game Physics Pearls (Page 67-70)