Dirichlet Edge: DirichEdgeChange() Function
4.10 Discussion
For simplicity and clarity we split our discussion into two parts. Part one of the discussion corresponds to Examples 1 to 6 above, where only 2 or 4 processors are involved. The second part corresponds to Examples 7 to 12 above, where the number of processors involved is either 8 or 16.
4.10.1 Discussion I
The results of applying the four algorithms to each of the first six test problems are summarised in Table 4.7. In the table, MaxImb stands for the maximum imbalance (see §3.5 for a quantitative definition). Qualitatively, this is the largest percentage by which the total weight on any single processor exceeds the average weight per processor. CutWt stands for the cut-weight; as in Chapter 3, it is defined as the total weight of all those edges of the weighted dual graph of the root mesh which are cut by the partition boundary. The entry G/R Time denotes the generation or rebalance time of the corresponding mesh, and SolTime denotes the time taken by the solver to solve the simple PDE mentioned above numerically.
We start our discussion with Examples 1 and 5, in which the number of subdomains is exactly 2. As expected, all algorithms except VKV0 produce identical results: with only 2 processors, all they have to do is shift some load from the more heavily loaded processor to the more lightly loaded one using the concept of "gain density". The VKV0 algorithm does not use gain density, and so its results differ. Upon completion, all four algorithms produce perfectly load-balanced partitions.
There is a substantial increase in the cut-weight for the VKV0 algorithm in both Examples 1 and 5, whereas for the other algorithms there is only a small increase in Example 1 and a significant decrease in Example 5. There is very little saving in the solution time. This is to be expected, since the initial imbalance was not high (less than 1.7% in both cases). Indeed, this particular mesh generator is very good at producing well-balanced meshes in the case of two subdomains, which is why we present only two examples using 2 processors.
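Why a small initial imbalance leaves little to gain can be quantified under a simplifying assumption of our own: that the solver time is proportional to the load on the most heavily loaded processor, with communication costs ignored. Writing \bar{W} for the average load per processor and I for the initial maximum imbalance expressed as a fraction, the best possible relative saving S from perfect rebalancing is:

```latex
\begin{aligned}
T_{\mathrm{before}} &\propto (1 + I)\,\bar{W}, \qquad
T_{\mathrm{after}} \propto \bar{W} \quad \text{(perfectly balanced)},\\
S &= \frac{T_{\mathrm{before}} - T_{\mathrm{after}}}{T_{\mathrm{before}}}
   = \frac{(1+I)\,\bar{W} - \bar{W}}{(1+I)\,\bar{W}}
   = \frac{I}{1+I},\\
I = 0.017 &\;\Rightarrow\; S = \frac{0.017}{1.017} \approx 0.0167 \approx 1.7\%.
\end{aligned}
```

Under this idealised model, an initial imbalance below 1.7% caps the achievable saving in solver time at about 1.7%.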
final partitions with relatively high cut-weight. In the case of the VKV1 and New algorithms the cut-weight is smaller and almost identical (except in Example 2, where the cut-weight produced by the New algorithm is the smallest of the four). The cut-weights produced by the algorithm of Hu and Blake lie between these two extremes. The solution times are all roughly the same. Apart from Examples 2 and 3, the reduction in the solution time is not significant. This can be explained by observing that, although the initial imbalance here is higher than in the case of two subdomains (except in Example 4, where it is slightly less than the initial imbalance of Example 1), it is still not high enough for rebalancing to produce a significant reduction in the solution time when the solver is applied to the modified meshes. The time to rebalance the meshes is negligible, always at most 0.1 seconds (except in Example 4, where the algorithm of Hu and Blake took 0.2 seconds).
4.10.2 Discussion II
The results of applying the four algorithms to each of the last six test problems are summarised in Table 4.14. All the headings in this table are the same as in Table 4.7 and have the same meanings as described in §4.10.1.
There are a number of comments to be made concerning these results. Firstly, the cut-weights produced by the VKV0 algorithm are higher than the corresponding cut-weights produced by the VKV1 algorithm (indeed, the VKV0 algorithm produces the highest cut-weights of all four algorithms). This clearly shows the effect of using the concept of gains in the migration phase of the VKV1 algorithm. Also, on first inspection of the second problem (Example 8), it may appear that all four techniques perform quite poorly in terms of the maximum imbalance: the final maximum imbalance is well above the allowable target of 1% that we maintained throughout the chapter. This is not really the case, however, since the mesh refinement in this example is highly localised (as is typical in the adaptive solution of partial differential equations), and so some root elements have extremely large weights compared with others. This makes it impossible to achieve an exact load-balance in this case without increasing the cut-weight massively. The situation in the other five problems is much better. In these examples each algorithm consistently achieves a maximum imbalance of well under 1%, with two exceptions: in Example 7 the highest maximum imbalance, 1.4%, is produced by both the VKV1 and New algorithms, and in Example 10 both the VKV0 and VKV1 algorithms achieve a maximum imbalance of exactly 1.1%.
As mentioned above, in all problems the VKV0 algorithm produces the highest cut-weight of the four algorithms. Apart from Examples 7 and 9, the New algorithm produces the lowest cut-weight. In Example 9 the VKV1 algorithm produces the lowest cut-weight, and in Example 7 the VKV1 and New algorithms tie for the lowest.
It is interesting to observe that the parallel execution times of all these algorithms are generally quite similar. The one exception, which is rather surprising, is Example 10, in which the algorithm of Hu and Blake takes twice as long as the other two algorithms. In every case, however, the cost of rebalancing the mesh is only a fraction of the cost of generating it. Hence, for this class of problem with these reasonably good initial partitions, it would appear that minimising data migration is not as important as obtaining a high-quality partition.
A final point for this section concerns the parallel execution time taken by our simple solver, which is the most important measure of the effectiveness of any dynamic load-balancing algorithm. For the New algorithm, the net saving in solver time ranges from 3% to 15%. Except in Example 11, the SolTime using the partition from the New algorithm is less than the corresponding SolTime using the partition from the algorithm of Hu and Blake; in Example 11 the difference is negligible. In all cases the SolTime using the partition from the VKV0 algorithm is higher than the corresponding time for the other algorithms.
4.11 Conclusions
In this chapter we have introduced a post-processing algorithm for the parallel generation of unstructured meshes for use in parallel finite element or finite volume analysis. The algorithm is based upon a parallel implementation of the dynamic load-balancing algorithm of Chapter 3, used to perform a local modification of the partition of an underlying background grid from which the mesh was generated in parallel. This modification aims to improve the load-balance whilst respecting data locality and ensuring that the length of the partition boundary is not increased unnecessarily.
We have successfully demonstrated an implementation of this algorithm in two dimensions. In addition, it has been shown that the execution time of the code, implemented in C using MPI, is extremely competitive. It should be noted, however, that the post-processing step described here can only be as effective as the coarse mesh allows. For example, if the background grid has only a small number of elements, evenly spread across the domain, and the fine mesh is very fine in some highly localised regions, then even an optimal solution of the corresponding load-balancing problem may have a very large imbalance and/or cut-weight (e.g. Example 8 above).
As mentioned earlier, at the time this work was undertaken no public-domain dynamic load-balancing algorithm was available for comparison with our algorithm. Recently, parallel versions of the publicly available software packages METIS [63] and JOSTLE [109] have been released, so it would now be possible to use these within the post-processing step and compare their performance with that of the above algorithm. We have not made these comparisons here, however, since both packages are used extensively in the next chapter, in which the load-balancing algorithms are applied to a problem arising in the adaptive solution of 3-d time-dependent equations. Moreover, some further modifications to our new dynamic load-balancer have been made for this 3-d application, and it is with this final version that we compare the parallel versions of METIS and JOSTLE.