4.2 Technique
4.2.3 Domain Decomposition Method and ARD
I now discuss my work in the context of the Domain Decomposition Method (DDM). This serves to illuminate the choices made in ARD, its novel aspects, the main source of error, and ways to address it. Interestingly, the main motivation of the DDM when it was conceptualized more than a century ago was very similar in spirit to my approach: divide the domain into simpler-shaped partitions that can be analyzed more easily [22]. Since then, however, the principal goal in nearly all DDM approaches for wave propagation has been to divide and parallelize the workload across multiple processors. The chief requirement in such cases is therefore that the partitions be of near-equal size and have minimal interface area, since such a decomposition balances the computation and minimizes the communication cost between processors.
The motivation for partitioning the domain in ARD is different: it is done to ensure that the partitions have a rectangular shape, even if that implies partitions of highly varying sizes. This is because a rectangular shape admits an analytical solution, which in turn means reduced computation without compromising numerical accuracy for propagation within the partitions. This holds independently of any parallelization considerations, which have been the main focus of prior work on wave solvers in high-performance computing. However, by virtue of having performed a decomposition, parallelism is still available to be exploited in ARD. I have pursued this direction to some extent by off-loading the compute-intensive per-partition FFTs to the GPU, as will be described later in this chapter. It is possible to parallelize my approach further by allocating the partitions to different cores or machines and performing interface handling between them; this would be the way to scale ARD to very large scenes with billions of cells, just as with other DDM approaches. I am part of current work in this direction [57], which has already shown promising speedups of nearly ten times compared to the results I present in this thesis. Thus, decomposing the domain into partitions and performing interface handling between them are well-known techniques that are applicable to ARD, but they are not the novel contribution of this work, which is the restriction of the partition shape to rectangles and the exploitation of analytical solutions.
Another interesting question is whether the existing principles of DDM can be applied directly to the interface handling between partitions. Most DDM approaches have been designed for elliptic equations and require a global iterative solution in which each partition is updated multiple times before reaching the final, globally consistent solution. The time-domain wave equation, on the other hand, is hyperbolic, and using an explicit time-stepping scheme with it eliminates the need for global iteration. Thus, to ensure global consistency at each time-step with explicit wave equation solvers, including ARD, it is sufficient to apply interface operators between partitions, without any need for global iterations such as those of the Schwarz alternating method. The mathematical reason is that as long as the discrete spatial derivative operator reads its input values from the correct places, it is immaterial whether these values exist locally or not. Domain decomposition in the context of explicit schemes therefore takes a very simple form: the spatial derivative operator near the boundary is split additively into two parts, one local to the partition and the other non-local, potentially indexing anywhere, which is the interface operator. The local part is calculated on all processors in parallel, whereas the interface operator requires communication between processors. No errors are introduced by domain partitioning, no matter how the derivative operator is split additively into the two parts; the result is mathematically identical to the global solution, owing to the linearity of the numerical derivative operator.
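The additive split described above can be illustrated with a minimal one-dimensional sketch in Python with NumPy. It is not the ARD implementation: it assumes a standard sixth-order central-difference stencil on a single grid cut into two partitions, and the names `laplacian_matrix`, `split_operator`, `step` and `run`, as well as the grid size, cut position and Courant number, are hypothetical choices made only for illustration.

```python
import numpy as np

# Standard sixth-order central-difference coefficients for the second
# derivative on a unit-spaced grid, for offsets -3..3.
STENCIL = np.array([1/90, -3/20, 3/2, -49/18, 3/2, -3/20, 1/90])

def laplacian_matrix(n):
    """Dense 1D second-derivative matrix; samples outside the grid count as zero."""
    A = np.zeros((n, n))
    for offset, c in zip(range(-3, 4), STENCIL):
        A += c * np.eye(n, k=offset)
    return A

def split_operator(A, cut):
    """Split A additively into a partition-local part and an interface part."""
    local = np.zeros_like(A)
    local[:cut, :cut] = A[:cut, :cut]   # left partition reads only its own cells
    local[cut:, cut:] = A[cut:, cut:]   # right partition reads only its own cells
    interface = A - local               # entries that read across the cut
    return local, interface

def step(u_prev, u_curr, apply_laplacian, courant=0.5):
    """One explicit leapfrog step of the 1D wave equation (illustrative Courant number)."""
    return 2.0 * u_curr - u_prev + courant**2 * apply_laplacian(u_curr)

def run(n=64, cut=20, steps=50):
    """Advance the same pulse with the unsplit and the split operator."""
    A = laplacian_matrix(n)
    local, interface = split_operator(A, cut)
    x = np.arange(n)
    u0 = np.exp(-0.05 * (x - n / 2) ** 2)   # Gaussian pulse as initial condition
    g_prev, g_curr = u0.copy(), u0.copy()   # global (unsplit) solution
    p_prev, p_curr = u0.copy(), u0.copy()   # partitioned solution
    for _ in range(steps):
        g_prev, g_curr = g_curr, step(g_prev, g_curr, lambda u: A @ u)
        p_prev, p_curr = p_curr, step(
            p_prev, p_curr, lambda u: local @ u + interface @ u)
    return g_curr, p_curr, interface
```

Calling `run()` returns the field from the unsplit update, the field from the split update, and the interface matrix. In this sketch the two fields agree to floating-point round-off, with no iteration between the partitions, and the interface matrix is nonzero only in the rows of the three cells on each side of the cut.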
The interface errors in ARD arise because the above-mentioned decomposition of the discrete derivative operator is not perfect. Because ARD uses a spectral approximation (in the cosine basis), the spatial derivative operators are global in nature, extending over the whole domain. In other words, they are not compact, unlike those of finite-difference techniques. This is the price to pay for the great increase in accuracy. Consequently, the perfect interface operator is not compact either, making it computationally expensive to evaluate. The difference between the ideal global operator and the compact sixth-order finite-difference operator actually used produces the erroneous reflections. These errors can therefore be decreased by increasing the support of the interface operator, which reduces the difference between the ideal operator and the approximate one, preferably while examining the Fourier transform of the error, which corresponds to the frequency response of the artificial interface. Thus, the interface errors in ARD can be made arbitrarily small by optimizing the frequency response as discussed above, at the cost of increased computation. In particular, there is no need for global iteration between partitions.
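The trade-off between operator support and error can be made concrete with a rough sketch. The code below is only a proxy for the interface's actual frequency response: it assumes an infinite, unit-spaced one-dimensional grid and compares the Fourier symbol of standard central-difference second-derivative stencils of increasing width with the ideal spectral symbol, which is $-k^2$. The names `CENTRAL_STENCILS`, `fd_symbol` and `max_symbol_error`, and the choice of the band up to half the Nyquist wavenumber, are assumptions made for illustration.

```python
import numpy as np

# One-sided coefficients (offsets 0, 1, 2, ...) of the standard symmetric
# central-difference stencils for the second derivative, keyed by order.
CENTRAL_STENCILS = {
    2: [-2.0, 1.0],
    4: [-5/2, 4/3, -1/12],
    6: [-49/18, 3/2, -3/20, 1/90],
    8: [-205/72, 8/5, -1/5, 8/315, -1/560],
}

def fd_symbol(order, k):
    """Fourier symbol c_0 + 2 * sum_j c_j cos(j k) of a symmetric stencil."""
    c = CENTRAL_STENCILS[order]
    out = np.full_like(k, c[0], dtype=float)
    for j in range(1, len(c)):
        out += 2.0 * c[j] * np.cos(j * k)
    return out

def max_symbol_error(order, k_max=np.pi / 2, samples=256):
    """Largest deviation from the ideal symbol -k^2 over wavenumbers [0, k_max]."""
    k = np.linspace(0.0, k_max, samples)
    return float(np.max(np.abs(fd_symbol(order, k) - (-k**2))))

if __name__ == "__main__":
    for order in sorted(CENTRAL_STENCILS):
        print(order, max_symbol_error(order))
```

Under these assumptions, the worst-case mismatch over the sampled band shrinks as the stencil widens, which is consistent with the statement above that a larger interface support reduces the error at the cost of more computation.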