Chapter 3 Numerical model
3.1.3 Spanwise discretization
Flow field perturbations are assumed to be periodic in the spanwise direction and can therefore be adequately represented by Fourier expansions. This representation is built into the formulation of the mathematical model itself, so no further discretization is required beyond prescribing the particular spanwise mode β for each linear simulation carried out. In fact, in the linear model the spanwise modes are decoupled and can be calculated separately, which allows an efficient parallelisation scheme to be implemented relatively simply for a full 3D flow field. Moreover, once a Fourier expansion is defined over a finite set of spanwise modes, efficient FFT algorithms can be used to perform the full three-dimensional reconstruction. In such a case, the accuracy of the method in the spanwise direction is spectral. For example, as originally suggested in [37], non-interactive control calculations could be performed for groups of modes assigned to individual processors and run completely independently of each other. Following these ideas, a parallel implementation based on message passing was developed to speed up the simultaneous simulation of multiple spanwise modes. This parallel methodology was employed in two situations: (i) parametric optimisation of forcing and initial perturbation fields of the Linearised Navier-Stokes Equations (LNSE), and (ii) three-dimensional simulations of the boundary layer response to selected control cases.
Strictly speaking, parallelisation is not an essential element of the linearized mathematical model adopted here, since every spanwise mode can be run independently on a separate processor. However, a parallel implementation makes it easier to manage and control the numerical information coming from all the different modes. In addition, although it was not explored in the present work, if a non-linear extension of the present model were desired, a genuinely parallel implementation would be an efficient way to proceed: each processor would perform the single-mode calculations, and the parallel protocol would be used to evaluate the non-linear terms involving products of spanwise modes.
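The reconstruction of a physical-space signal from independently computed spanwise modes can be illustrated with a minimal sketch. It is not the implementation used in this work: a direct Fourier sum is used instead of an FFT for clarity, and the function name, the sample amplitudes and the convention of taking the real part of the complex sum are all assumptions made for illustration.

```python
import cmath
import math


def reconstruct_spanwise(amplitudes, betas, z_points):
    """Superpose independently computed spanwise modes at a fixed (x, y, t).

    amplitudes[k] is the complex amplitude of the mode with wavenumber betas[k].
    Assumed convention: q'(z) = Re( sum_k amplitudes[k] * exp(i * betas[k] * z) ).
    """
    if len(amplitudes) != len(betas):
        raise ValueError("one amplitude is needed per spanwise wavenumber")
    field = []
    for z in z_points:
        # Direct Fourier sum; an FFT would replace this loop for many modes.
        total = sum(a * cmath.exp(1j * b * z) for a, b in zip(amplitudes, betas))
        field.append(total.real)
    return field


# Hypothetical data: two modes, each of which could come from a separate run.
beta0 = 2.0 * math.pi  # fundamental wavenumber for a spanwise period of 1
amplitudes = [1.0 + 0.0j, 0.0 - 0.5j]
betas = [beta0, 2.0 * beta0]
z_points = [j / 8.0 for j in range(8)]
field = reconstruct_spanwise(amplitudes, betas, z_points)
```

Because each amplitude enters the sum separately, the modes may be computed in any order and on any processor before being combined.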
MPI Parallelisation
The Message Passing Interface (MPI) is currently one of the most powerful and widely accepted parallelisation techniques (though, at the time of writing, GPU-based ones are a strong competitor). Since its inception, MPI has been aimed at exploiting modern computational architectures, and in the past few years it has become a standard definition for message-passing libraries. For these reasons MPI has been adopted here to implement the parallelisation. Even more importantly, it has been chosen because it aims to provide independence for each processor involved in a given computational task, a feature that is fully compatible with the proposed methodology. A wide variety of texts and online resources are available to help programmers incorporate MPI in their code. For more details than those provided here, and for specific terminology, the reader is referred to Pacheco [99].
For this work, libraries based on Open MPI-2 were used as the MPI implementation. A single communicator, or group of cores, with no special topology was employed. The number of processors required is read from an input file and is therefore set at run time for each simulation. For most of the cases studied, the number of cores equals the number of spanwise modes to be simulated.
As in any standard MPI procedure, a master core initially performs the majority of the I/O tasks on disk, such as reading the simulation parameters from the same input file. Although in some cases every processor stores its own data in a separate folder, at the very end of each simulation the master core saves the results of a number of diagnostic checks in an output file. Likewise, at the end of every simulation each core passes its flow field variables to the master core to be saved. Any special post-processing of data, such as the estimation of time integrals or normalisation against the maximum in time, is performed by each individual processor. A short list of the most intensively used MPI subroutines, with a brief description of their use in the current implementation, is given below:
MPI_Bcast: Used to transmit information from the master core to the remaining members of the communicator.
MPI_Barrier: Called for synchronisation between different cores.
MPI_Gather: Used by the master core to collect data from the other cores at the end of the simulation.
MPI_Get_Address: Employed to obtain the memory addresses needed to construct derived data types.
MPI_Type_Create_Struct: Creates derived data types so that information can be transmitted efficiently between cores.
MPI_Type_Commit: Final instruction, which commits a particular derived data type so that it can be used in communications between cores.
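The broadcast, synchronise and gather pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this work: it is written in Python with mpi4py-style method names (Get_rank, Get_size, bcast, gather, Barrier), whereas the derived data type calls are not covered. SerialComm, simulate_mode, read_parameters and the round-robin assignment of modes are hypothetical; SerialComm is a one-process stand-in so that the sketch runs without an MPI installation.

```python
class SerialComm:
    """One-process stand-in exposing the mpi4py-style calls used below.

    Hypothetical helper so the sketch runs without MPI; with mpi4py
    installed, MPI.COMM_WORLD could be passed to run_modes instead.
    """

    def Get_rank(self):
        return 0

    def Get_size(self):
        return 1

    def bcast(self, obj, root=0):
        return obj

    def gather(self, obj, root=0):
        return [obj]

    def Barrier(self):
        pass


def simulate_mode(beta, n_steps):
    """Placeholder for one linear single-mode simulation (not a real solver)."""
    return {"beta": beta, "steps_done": n_steps}


def read_parameters():
    """Hypothetical stand-in for reading the simulation input file."""
    return {"betas": [0.1, 0.2, 0.3], "n_steps": 100}


def run_modes(comm, read_params):
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Only the master core reads the input; the others receive a broadcast.
    params = read_params() if rank == 0 else None
    params = comm.bcast(params, root=0)  # counterpart of MPI_Bcast

    # Assumed round-robin split: each core takes every size-th mode.
    my_betas = params["betas"][rank::size]
    my_results = [simulate_mode(b, params["n_steps"]) for b in my_betas]

    comm.Barrier()  # counterpart of MPI_Barrier
    gathered = comm.gather(my_results, root=0)  # counterpart of MPI_Gather

    if rank == 0:
        # Flatten the per-core lists so the master core can save everything.
        return [r for per_core in gathered for r in per_core]
    return None


results = run_modes(SerialComm(), read_parameters)
```

No messages are exchanged between the broadcast and the gather, which reflects the independence of the spanwise modes in the linear model.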
In the present code, every processor can perform either a single-mode or a multiple-mode simulation, depending on the ratio of the number of spanwise modes required to the number of cores available. Thanks to the inherent simplicity brought by linearization, along with the modal decomposition, it was possible to obtain a fully parallelised version of the serial code with relative ease and a reduced set of MPI instructions, and with a minimal penalty in computational cost.
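One possible way of assigning modes to cores when the two numbers differ is a contiguous block split, sketched below. The function name and the splitting rule are assumptions for illustration; the rule actually used in the present code is not specified here.

```python
def modes_for_rank(rank, n_ranks, n_modes):
    """Return the indices of the spanwise modes handled by one core.

    Contiguous block split: the first (n_modes % n_ranks) cores take one
    extra mode, so the loads differ by at most one mode.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_modes, n_ranks)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return list(range(start, start + count))


# One mode per core (the most common case) and several modes per core.
one_per_core = [modes_for_rank(r, 4, 4) for r in range(4)]
several_per_core = [modes_for_rank(r, 3, 8) for r in range(3)]
```

With as many cores as modes, each core receives exactly one mode; with fewer cores, each core loops over its own block of modes.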