An Iterative Method - Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

An alternative approach to solving the triangle-to-triangle distance problem is to parameterise the triangles such that the distance between them is formulated as a quadratic function. The overall scheme is then an iterative solution that approximates the minimum distance between two triangles using Newton's method.

Let x and y be two points belonging to triangles $T_A$ and $T_B$, respectively. Assuming that points A, B, C span $T_A$ and that points D, E, F span $T_B$, x and y can be defined using the following equations over their barycentric parameters:

\[ T_A:\quad x(a, b) = A + (B - A) \cdot a + (C - A) \cdot b \]

and

\[ T_B:\quad y(g, d) = D + (E - D) \cdot g + (F - D) \cdot d. \]

To find the minimum distance between $T_A$ and $T_B$ we minimise

\[ f(a, b, g, d) = \| x(a, b) - y(g, d) \|^2. \]

It is important that x and y stay within the area of the two triangles. The four parameters of the function f have to thus comply with six inequality constraints

CHAPTER 5. ALGORITHM OUTLINE AND VECTORISATION

Figure 5.3: Example of minimum distance and the corresponding barycentric points (parameters of the objective function) on a pair of triangles in 3D. Triangle X: $T_A$ has points A, B, C, where barycentric parameters a, b correspond to a point x on the triangle. Triangle Y: $T_B$ has points D, E, F, where barycentric parameters g, d correspond to a point y. The two barycentric points define the minimum distance between the two triangles in 3D.

such that

{a ≥ 0, b ≥ 0, a + b ≤ 1, d ≥ 0, g ≥ 0, g + d ≤ 1} .

The iterative method does not find the minimum distance primitive by primitive, but rather through nonlinear constrained optimisation. A point on a triangle can be defined using the triangle's barycentric coordinates.
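The parameterisation, the objective and the six constraints can be written down directly. The following is a minimal sketch in plain Python, not the thesis implementation; the function names and the example triangles are hypothetical, and the constraints are stored in the form c_i ≤ 0.

```python
def point_on_triangle(P0, P1, P2, u, v):
    """Return P0 + (P1 - P0) * u + (P2 - P0) * v for 3D points given as tuples."""
    return tuple(p0 + (p1 - p0) * u + (p2 - p0) * v
                 for p0, p1, p2 in zip(P0, P1, P2))


def squared_distance(TA, TB, a, b, g, d):
    """Objective f(a, b, g, d) = ||x(a, b) - y(g, d)||^2."""
    x = point_on_triangle(*TA, a, b)
    y = point_on_triangle(*TB, g, d)
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def constraints(a, b, g, d):
    """The six inequality constraints, each written as c_i <= 0."""
    return [-a, -b, a + b - 1.0, -g, -d, g + d - 1.0]


def is_feasible(a, b, g, d):
    """True if both points lie inside (or on the edge of) their triangles."""
    return all(c <= 0.0 for c in constraints(a, b, g, d))


# Hypothetical example: two parallel unit triangles, one unit apart in z.
TA = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
TB = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
f_value = squared_distance(TA, TB, 0.25, 0.25, 0.25, 0.25)
```

A parameter set such as (0.8, 0.3, 0, 0) violates a + b ≤ 1, so the corresponding point x lies in the plane of the triangle but outside its area.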

5.5.1 Penalty-Based Formalism

The penalty method enforces the constraints on the minimisation of f by adding a penalty term to the objective function, which penalises the solution whenever it lies outside the feasible region:

\[ P(x) = f(x) + r \cdot \sum_{i=1}^{6} \max(0, c_i(x))^2 \qquad (5.1) \]

where r > 0 is the penalty parameter, x = (a, b, g, d), and c_i(x) ≤ 0 is the i-th of the six constraints. Newton iterations always converge to a solution, but this solution might lie slightly outside the feasible region. Both the speed of convergence and the "outsideness" are controlled by the parameter r, which determines how sharply the penalty rises at the constraints. One aspect that requires care, however, is the invertibility of the Hessian $\nabla\nabla P$.
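As an illustration of (5.1), the sketch below evaluates the penalised objective in plain Python. It assumes the constraints are written in the form c_i(x) ≤ 0; the function names and the example data are hypothetical and not taken from the thesis code.

```python
def objective(TA, TB, x):
    """f(x) = ||x(a, b) - y(g, d)||^2 for x = (a, b, g, d)."""
    a, b, g, d = x
    (A, B, C), (D, E, F) = TA, TB
    total = 0.0
    for k in range(3):
        xk = A[k] + (B[k] - A[k]) * a + (C[k] - A[k]) * b
        yk = D[k] + (E[k] - D[k]) * g + (F[k] - D[k]) * d
        total += (xk - yk) ** 2
    return total


def constraints(x):
    """Six constraints in the form c_i(x) <= 0."""
    a, b, g, d = x
    return [-a, -b, a + b - 1.0, -g, -d, g + d - 1.0]


def penalised_objective(TA, TB, x, r):
    """P(x) = f(x) + r * sum_i max(0, c_i(x))^2, cf. (5.1)."""
    penalty = sum(max(0.0, c) ** 2 for c in constraints(x))
    return objective(TA, TB, x) + r * penalty


# Hypothetical example: two parallel unit triangles, one unit apart in z.
TA = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
TB = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
inside = (0.25, 0.25, 0.25, 0.25)    # feasible: the penalty term vanishes
outside = (-0.5, 0.25, 0.25, 0.25)   # violates a >= 0 by 0.5
```

Inside the feasible region P and f coincide; outside, the violated constraint adds r times the squared violation, so a larger r makes the penalty wall steeper.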


Figure 5.4: Illustration of a 2D problem showing the penalty function (red line) penalising the objective function f(x) (black line) under a constraint a (dashed line) to create the feasible region (blue line).

two triangles and there can be more than one solution. In such cases the Hessian matrix is not invertible, so the Newton direction cannot be computed from the Hessian and the gradient. This illustrates the fact that f has multiple minima and $\nabla\nabla f$ is singular. Consequently, $\nabla\nabla P$ is also singular inside the feasible region. Because of this ill-conditioning, we use a quasi-Newton approach in which the Hessian is approximated by the perturbed operator $\nabla\nabla P + \mathrm{eps} \cdot I$, where I is the identity matrix and eps is suitably small.
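The effect of the perturbation can be checked numerically. The sketch below, in plain Python with hypothetical helper names and an arbitrarily chosen eps, builds the Hessian of f for two parallel triangles, confirms that its determinant vanishes, and shows that adding eps · I makes it invertible.

```python
def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def objective_hessian(TA, TB):
    """Hessian of f = ||x - y||^2 with respect to (a, b, g, d)."""
    (A, B, C), (D, E, F) = TA, TB
    # Derivatives of x - y with respect to a, b, g, d; the minus signs of the
    # g and d directions are folded in by using D - E and D - F.
    dirs = [sub(B, A), sub(C, A), sub(D, E), sub(D, F)]
    return [[2.0 * dot(p, q) for q in dirs] for p in dirs]


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(M)
    M = [row[:] for row in M]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            return 0.0  # no usable pivot: the matrix is treated as singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det


# Hypothetical example: two parallel unit triangles, one unit apart in z.
TA = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
TB = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
H = objective_hessian(TA, TB)
eps = 1e-6  # assumed value, for illustration only
H_eps = [[H[i][j] + (eps if i == j else 0.0) for j in range(4)] for i in range(4)]
```

For this pair `determinant(H)` is zero while `determinant(H_eps)` is small but positive, so a direct solver can proceed on the perturbed matrix.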

The penalty algorithm shown in Algorithm 5 accepts the vertex vectors A, B, C, D, E, F of the triangles $T_A(A, B, C)$ and $T_B(D, E, F)$ as well as the parameters required by the solver. The penalty parameter r (rho in the listing) controls the steepness of the function P(x) in (5.1), eps is the perturbation parameter for the Hessian matrix, and Tol is the tolerance for convergence (floating point accuracy). At line 7 of Algorithm 5 the initial guess is chosen to be the centre of the two triangles; the for loop then runs the Newton iterations to find the points on the triangle planes under the constraints c. For each of the six constraints (line 15), the max function of the penalty is evaluated so that every active constraint is detected. In lines 20 and 21 the gradient and Hessian of P are evaluated. The Gaussian elimination direct solver then yields the Newton direction dx (line 22).

My optimised C implementation exploits matrix symmetries to avoid recomputing matrix elements; for this reason hf in Algorithm 5 utilises symmetry. For the same reason, X − Y is calculated only once and stored in a variable XY, so that its value is read instead of being recomputed. Inside the Newton loop, the optimised algorithm is totally different from the prototype.

Algorithm 5 Penalty Solver Algorithm.

1: function Penalty(A, B, C, D, E, F, rho, tol)
2:   BA ← B − A; CA ← C − A; ED ← E − D; FD ← F − D;
3:   hf ← [ 2·BA·BA′,  2·CA·BA′, −2·ED·BA′, −2·FD·BA′;
4:          2·BA·CA′,  2·CA·CA′, −2·ED·CA′, −2·FD·CA′;
5:         −2·BA·ED′, −2·CA·ED′,  2·ED·ED′,  2·FD·ED′;
6:         −2·BA·FD′, −2·CA·FD′,  2·ED·FD′,  2·FD·FD′];
7:   x ← [0.33; 0.33; 0.33; 0.33];
8:   for i ← 1 : 99 do
9:     X ← A + BA·x(1) + CA·x(2);
10:    Y ← D + ED·x(3) + FD·x(4);
11:    gf ← [ 2·(X − Y)·BA′;
12:           2·(X − Y)·CA′;
13:          −2·(X − Y)·ED′;
14:          −2·(X − Y)·FD′];
15:    h ← [−x(1); −x(2); x(1) + x(2) − 1; −x(3); −x(4); x(3) + x(4) − 1];
16:    dh ← [−1, 0, 1, 0, 0, 0; 0, −1, 1, 0, 0, 0;
17:          0, 0, 0, −1, 0, 1; 0, 0, 0, 0, −1, 1];
18:    mask ← h′ >= 0;
19:    dmax ← dh .· [mask; mask; mask; mask];
20:    gra ← gf + rho·dmax·max(0, h(:));
21:    hes ← hf + rho·dmax·dmax′ + I(4, 4)/rho²;
22:    dx ← hes \ gra;
23:    DX ← BA·dx(1) + CA·dx(2);
24:    DY ← ED·dx(3) + FD·dx(4);
25:    error ← sqrt(DX·DX′ + DY·DY′);
26:    if error < tol then
27:      break;
28:    end if
29:    x ← x − dx;
30:  end for
31: end function
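For readers who want to experiment with Algorithm 5, the following is a plain Python transcription of the prototype. It is a sketch under stated assumptions rather than the optimised C kernel: the helper names, the generic Gaussian elimination routine, the returned values (distance and parameters) and the example triangles are hypothetical additions, and the perturbation is taken as 1/rho² as in line 21.

```python
import math


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


# Derivatives of the six constraints h_i(x) <= 0 with respect to (a, b, g, d).
DH = [[-1, 0, 0, 0], [0, -1, 0, 0], [1, 1, 0, 0],
      [0, 0, -1, 0], [0, 0, 0, -1], [0, 0, 1, 1]]


def penalty_distance(TA, TB, rho, tol, max_iter=99):
    (A, B, C), (D, E, F) = TA, TB
    # dirs[k] is the derivative of X - Y with respect to x[k]; the minus signs
    # of the g and d directions are folded in by using D - E and D - F.
    dirs = [sub(B, A), sub(C, A), sub(D, E), sub(D, F)]
    hf = [[2.0 * dot(p, q) for q in dirs] for p in dirs]
    x = [0.33, 0.33, 0.33, 0.33]
    for _ in range(max_iter):
        XY = [A[k] - D[k] + sum(dirs[j][k] * x[j] for j in range(4))
              for k in range(3)]
        gf = [2.0 * dot(XY, p) for p in dirs]
        h = [-x[0], -x[1], x[0] + x[1] - 1.0,
             -x[2], -x[3], x[2] + x[3] - 1.0]
        active = [i for i in range(6) if h[i] >= 0.0]
        gra = [gf[j] + rho * sum(DH[i][j] * h[i] for i in active)
               for j in range(4)]
        hes = [[hf[j][k] + rho * sum(DH[i][j] * DH[i][k] for i in active)
                + (1.0 / rho ** 2 if j == k else 0.0)
                for k in range(4)] for j in range(4)]
        dx = solve(hes, gra)
        DX = [dirs[0][k] * dx[0] + dirs[1][k] * dx[1] for k in range(3)]
        DY = [dirs[2][k] * dx[2] + dirs[3][k] * dx[3] for k in range(3)]
        if math.sqrt(dot(DX, DX) + dot(DY, DY)) < tol:
            break
        x = [xi - di for xi, di in zip(x, dx)]
    XY = [A[k] - D[k] + sum(dirs[j][k] * x[j] for j in range(4))
          for k in range(3)]
    return math.sqrt(dot(XY, XY)), x


# Hypothetical example: vertex D of TB sits one unit above the interior of TA.
TA = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
TB = ((0.25, 0.25, 1.0), (1.25, 0.25, 2.0), (0.25, 1.25, 2.0))
distance, params = penalty_distance(TA, TB, rho=1000.0, tol=1e-8)
```

In this example the exact distance is 1 and the closest point on TB is the corner D, so the returned distance falls slightly below 1 and the parameters g, d come out marginally negative: the penalty lets the solution drift a little outside the feasible region, as described above.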

The derivatives of the constraints are stored in an array instead of a sparse matrix, so only the non-zeroes are used. The operator max(0, h(:)) is calculated without the std library function max(), which is too generic for the code. It is possible to replace it with if statements and directly assign values to an array of active constraint derivatives dmax, avoiding the masking operations in lines 18-19. The gradient gra is calculated so that any redundant operations are removed. The same techniques are used for the Hessian matrix hes. The point here is to end up with as few assignments as possible. The most significant aspect of the optimised algorithm is the linear solve in line 22. Unlike MATLAB, which calls a separate direct solver, our implementation merges the individual operations of a 4x4 Gaussian elimination with the rest of the algorithm in a monolithic manner. This means that the Gaussian elimination, the calculation of the gradient and the calculation of the Hessian are fused together into one compute kernel. Such a solution reduces the number of temporary variable assignments and floating point operations per iteration.
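The idea of replacing max() and the mask with explicit branches can be sketched as follows. This is an illustrative Python version, not the C kernel of the thesis; the function name is hypothetical, and the code only covers the penalty part rho · dmax · max(0, h) of the gradient, using the non-zero constraint derivatives directly.

```python
def penalty_gradient(x, rho):
    """Penalty part of the gradient, rho * dmax * max(0, h), using branches only."""
    a, b, g, d = x
    ga = gb = gg = gd = 0.0
    if a <= 0.0:          # h = -a is active; its derivative w.r.t. a is -1
        ga += rho * a
    if b <= 0.0:          # h = -b is active
        gb += rho * b
    s = a + b - 1.0
    if s >= 0.0:          # h = a + b - 1 is active; derivatives are +1, +1
        ga += rho * s
        gb += rho * s
    if g <= 0.0:          # h = -g is active
        gg += rho * g
    if d <= 0.0:          # h = -d is active
        gd += rho * d
    t = g + d - 1.0
    if t >= 0.0:          # h = g + d - 1 is active
        gg += rho * t
        gd += rho * t
    return [ga, gb, gg, gd]


# Hypothetical example: a < 0 and g + d > 1 are the only violated constraints.
example = penalty_gradient((-0.5, 0.25, 0.8, 0.4), 10.0)
```

Each branch touches only the entries whose constraint derivative is non-zero, so no 4x6 matrix product and no mask array are needed.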

Operations like division are kept to a minimum because of their latency; additions, subtractions and multiplications are preferred. Divisions are slow because they are solved iteratively in hardware (the floating point unit of a general purpose processor), although the actual hardware-based method may vary between processor manufacturers [62]. The penalty method is well-suited for SIMD optimisation because we can concurrently determine the distance between multiple triangle pairs as long as we use the same number of Newton steps: up to four or eight triangle pair distances can be determined at the same time, depending on the vector width. Such a speed-up statement, however, has to be read carefully. While the concurrency is high, it is not clear a priori how many Newton steps are required. A high number of Newton steps can render the penalty method slower than the brute force approach.

In the case of two parallel triangles in three dimensions, brute force and the iterative scheme may produce different points that define the same distance. When two triangles are parallel to each other, the iterative scheme is ill-conditioned as multiple solutions exist. Our novel iterative implementation fuses the direct Gaussian elimination phase with the Newton solver into a single monolithic algorithm, thus removing redundant computation.

5.5.2 Penalty Method Parameter Tuning

The serial penalty-based algorithm forms the baseline for our optimised implementation. For this reason, it is important to identify correlations between the optimisation parameters and individual triangle sizes. Such a relationship between the penalty parameter and the output error is identified and exploited (Figure 5.5). The error is defined as the difference between the solution of the penalty solver and that of the brute force solver.

To find an optimum penalty parameter, we tune the calculation on randomly sized triangles that reside in a boundary box domain. To keep the randomness of the triangle sizes consistent, triangles of various lengths are tested with random rotations. A wide spectrum of cases is thus examined for the purpose of parameter tuning. Using different scales of boundary boxes, a linear relationship is found between the size of the boundary space, the size of the triangles and the penalty parameter. Figure 5.5 shows that for different sizes of triangles, the penalty


Figure 5.5: Tuning data for the penalty method to find a scaling relationship between the size (upper left to lower right figure) of problem, penalty parameter and error - using logarithmic scales

parameters correlate linearly with respect to the optimal equation that is derived based on the triangles size:

\[ r_{\mathrm{optimal}} = s \cdot 10^{(\log_{10}(s) + 10)} \qquad (5.2) \]

where s is the size of the triangle boundary (average side length). The error is shown for a hundred thousand pairs of triangles.
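Equation (5.2) is cheap to evaluate. The sketch below assumes that the term in parentheses is the exponent of 10, which is how the formula is read here; the function name is hypothetical.

```python
import math


def optimal_penalty(s):
    """Penalty parameter from (5.2): r = s * 10^(log10(s) + 10)."""
    return s * 10.0 ** (math.log10(s) + 10.0)


# Hypothetical average side lengths.
r_unit = optimal_penalty(1.0)
r_small = optimal_penalty(0.1)
```

Under this reading the expression is algebraically equal to s² · 10^10, so halving the triangle size reduces the suggested penalty parameter by a factor of four.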

The Hessian regularisation parameter eps is tuned with respect to its effect on both the failure rate at specific triangle sizes and the number of Newton iterations. To achieve a low average of 4-5 iterations (400,000-500,000 iterations in total for 100,000 triangle pairs), eps has to be small for small triangles. As the size of the triangles increases, eps increases linearly. A low eps does not by itself give the optimal number of iterations, so eps is adjusted using a coefficient of the boundary size.

Debugging and experiments with the prototype code show that the first and second Newton iterations often oscillate around the convergence point, and the third converges to the solution (unless the point on the triangle is at a corner). The eps perturbation along the diagonal of the Hessian is increased at this point to complete


Figure 5.6: Relationship between eps and number of iterations for different lengths of triangles. Optimal eps should be low and increased depending on the r parameter.

Figure 5.7: Histogram retrieved when tuning thousands of triangle pairs with optimal penalty parameters and epsilon perturbation of the problem. It shows that most problems converge within six iterations.

the convergence, since it is known that the search is close to the convergence area. Statistically, the error is at its minimum at four iterations (Figure 5.7). Because a penalty is used, the solution inevitably lies at a position spatially


outside the feasible region (the difference is seen in the error values, see Figure 5.6). Initially, eps can be neither too high nor too low, because otherwise the solver cannot find the right direction (eps-r-iteration graph, Figure 5.6). The perturbation eps is increased after three steps to reach convergence with minimum error and minimum iterations.

The penalty parameter, which acts as a spring on the Newton step to keep it inside the feasible region, is tuned against the error. The regularisation parameter eps is tuned accordingly.
