
Chapter 3 Numerical Methods

3.3 Poisson Solver

The calculation of the change in pressure, $\delta p$, appears in equation (3.13) in the form:

$$DG\,\delta p = \frac{1}{\Delta t}\left(Du^{*} - cbc\right).$$

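This can be expanded and expressed as the Poisson equation:

$$\frac{\partial^2(\delta p)}{\partial x^2} + \frac{\partial^2(\delta p)}{\partial y^2} + \frac{\partial^2(\delta p)}{\partial z^2} = \frac{1}{\Delta t}\left(\frac{\partial u^{*}}{\partial x} + \frac{\partial v^{*}}{\partial y} + \frac{\partial w^{*}}{\partial z} - cbc\right) = f, \tag{3.20}$$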

where $f$ is the right-hand side of the Poisson equation, which can be calculated from the known intermediate velocity information. For channel flow the mesh is uniform in the $x$ and $z$ directions and the boundary conditions in these directions are periodic, so it is possible to take a two-dimensional Fourier transform. A fast Fourier transform (FFT) is used, which expresses the pressure variable as $\delta p = \sum_{k_x,k_z} \widehat{\delta p}\, e^{i k_x x} e^{i k_z z}$. When central differences are used to discretise the second-order derivative, the modified wavenumbers (which are found in the same way for $x$ and $z$) are calculated as:

$$k_x(i) = \frac{\cos\left(\frac{2\pi i}{N_x}\right) - 2}{\Delta x^2}. \tag{3.21}$$
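The role of a modified wavenumber can be checked numerically. The sketch below is an illustration in NumPy, not the thesis code: it verifies that the discrete Fourier transform of the periodic three-point second difference equals the transform of the field multiplied by the standard symbol $(2\cos(2\pi i/N_x) - 2)/\Delta x^2$ of that stencil. The function names and the grid are assumptions, and the sign and normalisation used here may differ from the convention adopted in (3.21).

```python
import numpy as np

def central_difference_symbol(n, dx):
    """Fourier symbol of the periodic 3-point second-derivative stencil."""
    i = np.arange(n)
    return (2.0 * np.cos(2.0 * np.pi * i / n) - 2.0) / dx**2

def second_difference(p, dx):
    """Periodic central difference (p[j+1] - 2 p[j] + p[j-1]) / dx^2."""
    return (np.roll(p, -1) - 2.0 * p + np.roll(p, 1)) / dx**2

# Hypothetical 1D periodic grid and random test data.
n, length = 16, 2.0 * np.pi
dx = length / n
rng = np.random.default_rng(0)
p = rng.standard_normal(n)

# Differencing in physical space versus multiplication in Fourier space.
lhs = np.fft.fft(second_difference(p, dx))
rhs = central_difference_symbol(n, dx) * np.fft.fft(p)
max_error = np.max(np.abs(lhs - rhs))
```

The symbol is never positive and vanishes only for the mean mode, which is why it behaves like $-k^2$ for a spectral derivative while remaining exactly consistent with the finite-difference operator.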

Equation (3.20), after applying an FFT, can be written:

$$-k_x^2\,\widehat{\delta p} + \frac{\partial^2 \widehat{\delta p}}{\partial y^2} - k_z^2\,\widehat{\delta p} = \hat{f}(k_x, y, k_z). \tag{3.22}$$

As $\widehat{\delta p}$ depends on $y$, a central-difference discretisation in $y$ gives a simple tridiagonal matrix equation for each pair of wavenumbers, with the influence of the wavenumbers on the diagonal. This can be solved with a 1D inversion algorithm in the wall-normal direction. The inverse transform is then performed on the resulting vector, $\widehat{\delta p}$, in order to evaluate the change in pressure.
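The solution procedure described here (transform in $x$ and $z$, tridiagonal solve in $y$, inverse transform) can be sketched in a few lines of NumPy. This is a minimal serial illustration under stated assumptions, not the solver used in this work: the $y$ grid is taken as uniform, the wall condition is a homogeneous Dirichlet condition chosen only to keep every system non-singular, and the names `thomas`, `solve_poisson` and `discrete_laplacian` are hypothetical.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve one tridiagonal system (Thomas algorithm); rhs may be complex."""
    n = len(diag)
    c = np.zeros(n, dtype=complex)
    d = np.zeros(n, dtype=complex)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):
        denom = diag[j] - lower[j] * c[j - 1]
        c[j] = upper[j] / denom
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / denom
    x = np.zeros(n, dtype=complex)
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        x[j] = d[j] - c[j] * x[j + 1]
    return x

def solve_poisson(f, dx, dy, dz):
    """f has shape (nx, ny, nz): periodic in x and z, zero Dirichlet walls in y (assumed)."""
    nx, ny, nz = f.shape
    # Symbols of the periodic 3-point stencil in x and z.
    lam_x = (2.0 * np.cos(2.0 * np.pi * np.arange(nx) / nx) - 2.0) / dx**2
    lam_z = (2.0 * np.cos(2.0 * np.pi * np.arange(nz) / nz) - 2.0) / dz**2
    f_hat = np.fft.fft2(f, axes=(0, 2))
    p_hat = np.empty_like(f_hat)
    off = np.full(ny, 1.0 / dy**2)
    for i in range(nx):
        for k in range(nz):
            # The wavenumbers only modify the diagonal of the y-direction system.
            diag = np.full(ny, -2.0 / dy**2 + lam_x[i] + lam_z[k])
            p_hat[i, :, k] = thomas(off, diag, off, f_hat[i, :, k])
    return np.fft.ifft2(p_hat, axes=(0, 2)).real

def discrete_laplacian(p, dx, dy, dz):
    """Apply the same 7-point operator in physical space, for checking."""
    lap = (np.roll(p, -1, 0) - 2.0 * p + np.roll(p, 1, 0)) / dx**2
    lap += (np.roll(p, -1, 2) - 2.0 * p + np.roll(p, 1, 2)) / dz**2
    padded = np.pad(p, ((0, 0), (1, 1), (0, 0)))  # zero values at the walls
    lap += (padded[:, 2:, :] - 2.0 * p + padded[:, :-2, :]) / dy**2
    return lap

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 6, 8))
dx, dy, dz = 0.5, 0.2, 0.25
p = solve_poisson(f, dx, dy, dz)
residual = np.max(np.abs(discrete_laplacian(p, dx, dy, dz) - f))
```

Each pair of wavenumbers gives an independent tridiagonal system, so the $y$ solves can be distributed freely over the $x$ and $z$ indices.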

3.4 Parallelisation

The aim of the current work was to create an efficient parallel code to perform DNS of channel flow at increased Reynolds numbers. The method presented by Kim et al. [2002], although algorithmically efficient, is slow, and its memory requirements limit the size of the simulation. The main drawback of this method, when considering parallelisation, is the need for one-dimensional banded matrix inversions, which prevent division of the domain in the direction of the algorithm. This means that the domain must be split onto the individual processors in a 2D fashion. This is performed using the library developed by Li and Laizet [2010], which contains some useful subroutines for an MPI parallelisation strategy. It adopts the pencil structure shown in figure 3.1.

Figure 3.1: Pencil structure for parallelisation. The notation used is (a) $x$ pencil, (b) $y$ pencil, (c) $z$ pencil.
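The index ranges owned by one process in each pencil orientation can be sketched as follows. This is an illustration of the idea only and not the API of the library cited above: the functions `split_range` and `pencil_extents`, the processor grid `(p_row, p_col)` and the choice of which direction is split over which processor-grid dimension are all assumptions, chosen so that an $x \leftrightarrow y$ transposition leaves the $z$ distribution unchanged and a $y \leftrightarrow z$ transposition leaves the $x$ distribution unchanged.

```python
def split_range(n, parts, index):
    """Return [start, stop) of chunk `index` when n points are split into `parts` chunks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop

def pencil_extents(shape, grid, coords, pencil):
    """Local (start, stop) in x, y, z for the process at `coords` of the 2D processor grid."""
    nx, ny, nz = shape
    p_row, p_col = grid
    row, col = coords
    if pencil == "x":  # x complete; y split over rows, z over columns
        return (0, nx), split_range(ny, p_row, row), split_range(nz, p_col, col)
    if pencil == "y":  # y complete; x split over rows, z over columns
        return split_range(nx, p_row, row), (0, ny), split_range(nz, p_col, col)
    if pencil == "z":  # z complete; x split over rows, y over columns
        return split_range(nx, p_row, row), split_range(ny, p_col, col), (0, nz)
    raise ValueError(f"unknown pencil: {pencil}")

# Hypothetical grid of 10 x 7 x 9 points on a 2 x 3 processor grid.
shape, grid = (10, 7, 9), (2, 3)
x_pencil_of_rank_00 = pencil_extents(shape, grid, (0, 0), "x")
```

In every orientation one direction is held completely on each process, which is what allows the one-dimensional inversions and transforms to run without communication.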

The advantage of this structure is that, for transpositions between the $x$ and $y$ pencils, data is shared between groups of processors, so the communication can be performed in parallel between processors in the $z$ direction. A similar situation occurs for transpositions between the $y$ and $z$ pencils. This is not the case when transferring from the $z$ pencil to the $x$ pencil or vice versa; the most efficient transposition of the data between these formats therefore requires an intermediate step in the $y$ pencil. The local data is also required in directions other than that of the one-dimensional algorithms, as the gradients must be calculated from the currently known information. To avoid unnecessary transpositions, the local data around the edge of the blocks in all three directions must be kept. Thus, two layers of halo cells are used; these are updated before they are used, and only when the interior information has been modified. An illustration of this is shown in figure 3.2.

Figure 3.2: Halo cells for parallelisation
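The halo idea can be illustrated on a single process, where periodic wrapping stands in for the exchange with neighbouring processors. The sketch below is an assumption-laden illustration and not the MPI implementation: the class `HaloField`, the helper `ddx_central` and the lazy "stale" flag are hypothetical names that mimic updating the halo only after the interior has been modified.

```python
import numpy as np

HALO = 2  # two layers of halo cells, as in the text

class HaloField:
    """Interior block plus a lazily rebuilt copy padded with halo cells."""

    def __init__(self, interior):
        self.interior = np.array(interior, dtype=float)
        self._padded = None  # halo copy, built on first use

    def set_interior(self, values):
        self.interior[...] = values
        self._padded = None  # interior changed, so the halo is stale

    def padded(self):
        if self._padded is None:
            # Periodic wrap stands in for receiving neighbour data over MPI.
            self._padded = np.pad(self.interior, HALO, mode="wrap")
        return self._padded

def ddx_central(field, dx, axis):
    """Central-difference gradient along any axis using only local (halo) data."""
    p = field.padded()
    core = [slice(HALO, -HALO)] * 3
    plus, minus = list(core), list(core)
    plus[axis] = slice(HALO + 1, p.shape[axis] - HALO + 1)
    minus[axis] = slice(HALO - 1, p.shape[axis] - HALO - 1)
    return (p[tuple(plus)] - p[tuple(minus)]) / (2.0 * dx)

values = np.random.default_rng(2).standard_normal((5, 4, 6))
field = HaloField(values)
gradients = [ddx_central(field, 0.1, axis) for axis in range(3)]
```

Because the halo is rebuilt only when `set_interior` has been called, repeated gradient evaluations on unchanged data cost no further exchange.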

To solve the Poisson equation a two-dimensional Fourier transform is required. This is performed by using FFTW in the $x$ direction, then transposing through the $y$ pencil into the $z$ pencil, and applying FFTW in the $z$ direction. The result must in turn be transferred to the $y$ pencil structure in order to perform a tridiagonal matrix inversion. This set of transpositions must be treated carefully, as there is a change from real to complex variables and hence a reduction in the length of the data set. This applies after the first use of FFTW and reduces the dimension of the arrays in $x$ from $N_x$ to $N_x/2 + 1$. The consequence is that subsequent transpositions must be performed using smaller arrays of complex type. After the calculation in $y$ is performed, a set of transposition steps in reverse to those performed initially, along with inverse Fourier transforms, is necessary to return the correct change in pressure.
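The change of array length and type after the first transform can be seen with NumPy's FFT routines, used here purely as a stand-in for FFTW. The array sizes are hypothetical, and the transposes between pencils are omitted because the sketch runs on one process.

```python
import numpy as np

nx, ny, nz = 16, 5, 8  # hypothetical grid
f = np.random.default_rng(3).standard_normal((nx, ny, nz))

# Real-to-complex transform in x: the x length drops from nx to nx // 2 + 1.
f_x = np.fft.rfft(f, axis=0)

# Complex-to-complex transform in z acts on the shortened complex array.
f_xz = np.fft.fft(f_x, axis=2)

# Reverse order on the way back: inverse in z, then complex-to-real in x.
back = np.fft.irfft(np.fft.ifft(f_xz, axis=2), n=nx, axis=0)
```

Passing `n=nx` to the inverse real transform restores the original length explicitly, since the shortened array alone does not distinguish an even from an odd $N_x$.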

Reading and writing the data in serial is time consuming, especially in a parallel system, which requires a copious amount of data transfer before the information is written to file. When writing output, namely turbulent statistics or data fields, the frequency of this action places a heavy load on I/O. To reduce this effect, MPI I/O is used, which writes the data from each processor to file simultaneously. This is implemented by calculating the blocks of contiguous data held on each processor, and the locations in the file into which each of those blocks fits, so as to create an ordered data file that is independent of the number of processors and of the decomposition geometry used.
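The offset calculation can be sketched without MPI by treating a flat NumPy array as the file and looping over the blocks that each process would hold. This is an illustration under assumptions: the file is taken to store the global array with $z$ varying fastest, the blocks are $x$ pencils, and `chunks`, `write_block` and `write_decomposed` are hypothetical helpers rather than MPI I/O calls.

```python
import numpy as np

def chunks(n, parts):
    """Split range(n) into `parts` consecutive (start, stop) intervals."""
    edges = np.linspace(0, n, parts + 1).astype(int)
    return list(zip(edges[:-1], edges[1:]))

def write_block(file_buffer, block, extents, shape):
    """Copy one local block into the global file, one contiguous run at a time."""
    (x0, x1), (y0, y1), (z0, z1) = extents
    nx, ny, nz = shape
    run = z1 - z0  # contiguous run along the fastest-varying direction
    for i in range(x0, x1):
        for j in range(y0, y1):
            offset = (i * ny + j) * nz + z0  # element offset of this run in the file
            file_buffer[offset:offset + run] = block[i - x0, j - y0, :]

def write_decomposed(data, p_row, p_col):
    """Emulate every process of a p_row x p_col grid writing its x pencil."""
    shape = data.shape
    file_buffer = np.full(data.size, np.nan)
    for y0, y1 in chunks(shape[1], p_row):
        for z0, z1 in chunks(shape[2], p_col):
            extents = ((0, shape[0]), (y0, y1), (z0, z1))
            write_block(file_buffer, data[:, y0:y1, z0:z1], extents, shape)
    return file_buffer

data = np.random.default_rng(4).standard_normal((4, 6, 5))
file_serial = write_decomposed(data, 1, 1)
file_2x3 = write_decomposed(data, 2, 3)
file_3x2 = write_decomposed(data, 3, 2)
```

The three buffers are identical, which is the property that makes the output file independent of the decomposition.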

3.5 Algorithm

The following parallel algorithm is performed at each time step.

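• Solve for $u$ velocity:
  – Calculate $\Delta t R_1$ in the $x$ pencil.
  – Transpose $x \to y$
  – Transpose $y \to z$
  – Calculate $\left(I + \Delta t M_{11}^{3}\right)^{-1} \Delta t R_1$
  – Transpose $z \to y$
  – Calculate $\left(I + \Delta t M_{11}^{2}\right)^{-1} \left(I + \Delta t M_{11}^{3}\right)^{-1} \Delta t R_1$
  – Transpose $y \to x$
  – Calculate $\delta u_1^{**} = \left(I + \Delta t M_{11}^{1}\right)^{-1} \left(I + \Delta t M_{11}^{2}\right)^{-1} \left(I + \Delta t M_{11}^{3}\right)^{-1} \Delta t R_1$

• Solve for $v$ velocity:
  – Calculate $\Delta t R_2^{**} = \Delta t \left(R_2 - M_{21}\,\delta u_1^{**}\right)$ in the $x$ pencil.
  – Transpose $x \to y$
  – Transpose $y \to z$
  – Calculate $\left(I + \Delta t M_{22}^{3}\right)^{-1} \Delta t R_2^{**}$
  – Transpose $z \to y$
  – Calculate $\left(I + \Delta t M_{22}^{2}\right)^{-1} \left(I + \Delta t M_{22}^{3}\right)^{-1} \Delta t R_2^{**}$
  – Transpose $y \to x$
  – Calculate $\delta u_2^{**} = \left(I + \Delta t M_{22}^{1}\right)^{-1} \left(I + \Delta t M_{22}^{2}\right)^{-1} \left(I + \Delta t M_{22}^{3}\right)^{-1} \Delta t R_2^{**}$

• Solve for $w$ velocity:
  – Calculate $\Delta t R_3^{**} = \Delta t \left(R_3 - M_{31}\,\delta u_1^{**} - M_{32}\,\delta u_2^{**}\right)$ in the $x$ pencil.
  – Transpose $x \to y$
  – Transpose $y \to z$
  – Calculate $\left(I + \Delta t M_{33}^{3}\right)^{-1} \Delta t R_3^{**}$
  – Transpose $z \to y$
  – Calculate $\left(I + \Delta t M_{33}^{2}\right)^{-1} \left(I + \Delta t M_{33}^{3}\right)^{-1} \Delta t R_3^{**}$
  – Transpose $y \to x$
  – Calculate $\delta u_3^{*} = \left(I + \Delta t M_{33}^{1}\right)^{-1} \left(I + \Delta t M_{33}^{2}\right)^{-1} \left(I + \Delta t M_{33}^{3}\right)^{-1} \Delta t R_3^{**}$

• Find intermediate velocities:
  – Calculate $\delta u_2^{*} = \delta u_2^{**} - \Delta t M_{23}\,\delta u_3^{*}$
  – Calculate $\delta u_1^{*} = \delta u_1^{**} - \Delta t M_{12}\,\delta u_2^{*} - \Delta t M_{13}\,\delta u_3^{*}$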

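  – Calculate $u_i^{*} = u_i^{n} + \delta u_i^{*}$ for $i = 1, 2, 3$

• Solve Poisson equation for pressure:
  – Calculate $f = \frac{1}{\Delta t}\left(Du^{*} - cbc\right)$ in the $x$ pencil.
  – Perform Fourier transform of $f$ in the $x$ direction.
  – Transpose $x \to y$
  – Transpose $y \to z$
  – Perform Fourier transform of $f$ in the $z$ direction.
  – Transpose $z \to y$
  – Calculate $\widehat{\delta p} = \left(-k_x^2 + \frac{\partial^2}{\partial y^2} - k_z^2\right)^{-1} \hat{f}$
  – Transpose $y \to z$
  – Perform inverse Fourier transform of $\widehat{\delta p}$ in the $z$ direction.
  – Transpose $z \to y$
  – Transpose $y \to x$
  – Perform inverse Fourier transform of $\widehat{\delta p}$ in the $x$ direction to find $\delta p$.

• Update new velocities and pressure:
  – Calculate $u_i^{n+1} = u_i^{*} - \Delta t \frac{\partial}{\partial x_i} \delta p$ for $i = 1, 2, 3$ in the $x$ pencil.
  – Calculate $p^{n+\frac{1}{2}} = p^{n-\frac{1}{2}} + \delta p$
  – Calculate turbulent statistics.
  – Write statistical data.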

In the above we calculate the right-hand side of the velocity equations as:

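$$R_i = \frac{1}{Re} L u_i^{n} - N u_i^{n} - \frac{\partial}{\partial x_i} p^{n-\frac{1}{2}} + mbc_i. \tag{3.23}$$

Also, the matrix $M$ is defined by:

$$M_{ij} = N_{ij} - \frac{1}{2Re} L_{ij}, \tag{3.24}$$

where $N_{ij}$ and $L_{ij}$ are discretisations of the velocity $u_i$ in direction $x_j$.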
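The velocity steps of the algorithm above reduce a three-dimensional solve to a sequence of one-dimensional solves, one direction at a time. The sketch below illustrates this on a single process, where choosing an array axis plays the role of a transposition between pencils. It is not the thesis implementation: $N$ is set to zero, $L$ is replaced by a periodic three-point second difference, dense 1D matrices are used instead of banded solvers, and all function names and parameter values are assumptions.

```python
import numpy as np

def second_difference_matrix(n, h):
    """Periodic 3-point second-derivative matrix (illustrative stand-in for L)."""
    eye = np.eye(n)
    return (np.roll(eye, 1, axis=1) - 2.0 * eye + np.roll(eye, -1, axis=1)) / h**2

def apply_along(matrix, field, axis):
    """Multiply every 1D line of `field` along `axis` by `matrix`."""
    moved = np.moveaxis(field, axis, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, axis)

def solve_along(matrix, field, axis):
    """Solve matrix @ x = line for every 1D line of `field` along `axis`."""
    moved = np.moveaxis(field, axis, 0)
    flat = moved.reshape(moved.shape[0], -1)
    out = np.linalg.solve(matrix, flat).reshape(moved.shape)
    return np.moveaxis(out, 0, axis)

def factored_solve(factors, rhs):
    """Apply the direction-3 inverse first, then direction 2, then direction 1."""
    out = rhs
    for axis in (2, 1, 0):
        out = solve_along(factors[axis], out, axis)
    return out

# Hypothetical parameters; M^k = -(1 / (2 Re)) L^k with N omitted.
dt, reynolds = 0.01, 100.0
shape, spacing = (6, 5, 4), (0.3, 0.2, 0.25)
factors = [np.eye(n) - dt / (2.0 * reynolds) * second_difference_matrix(n, h)
           for n, h in zip(shape, spacing)]
rhs = dt * np.random.default_rng(5).standard_normal(shape)
delta_u = factored_solve(factors, rhs)

# Check: re-applying the three factors recovers the right-hand side.
check = delta_u
for axis in (0, 1, 2):
    check = apply_along(factors[axis], check, axis)
max_error = np.max(np.abs(check - rhs))
```

Each call to `solve_along` needs complete lines in only one direction, which is why the parallel algorithm transposes to the matching pencil before every factor is inverted.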

3.6 Scaling

The parallel code was tested on HECToR using two different grid sizes in order to understand the scaling. Figure 3.3 plots the time taken per iteration against the number of cores used to parallelise the simulation. The scaling is close to ideal, and deviates only slightly as the number of processors becomes large.

Figure 3.3: Scaling up to 1000 cores using two grid sizes.

3.7 Statistical Analysis

3.7.1 Triple Decomposition

When performing simulations on large data sets, multiple simulations are required in order to achieve a long time average of the turbulent statistics. The length of simulations is currently limited on both the local machine and on HECToR, hence the requirement to perform successive simulations in which the output velocity and pressure fields of one are used as the input for the next. It was therefore necessary to develop a post-processing code in which the long time average from several consecutive simulations is combined to calculate the turbulent statistics. This code requires different averaging techniques to analyse the three waveforms involved in the streamwise travelling wave.

When calculating turbulent statistics for non-periodic channel flow, namely the no-control case, it is common to adopt a double decomposition of the form $u = U + u'$, where $U$ is the mean velocity averaged both temporally and spatially in $x$ and $z$, and $u'$ is the fluctuation from the mean. With a periodic element, it is necessary to use a triple decomposition. This can be expressed as:

$$u = U + \tilde{u} + u'.$$

In this form $U$ is the mean in space and time, as before, whereas $\tilde{u}$ is the purely periodic component of the mean (phase-averaged in time) and $u'$ is the fluctuation from the periodic mean. It is important to note here that, although the periodic mean is removed from the fluctuation, the intensity of the fluctuation can vary over the period, and hence the variation of the root-mean-square over the period can be of interest. An example of the format of presentation of this data is shown in figure 3.4.

Figure 3.4: Variation of $u'$ over the period in 2D format (a) and 1D format (b) (plotted at the locations of the coloured lines in (a)). Axis labels: $y^+$, $u^+$; legend: $\xi = 0, 0.1, 0.2, 0.3, 0.4$.

Figure 3.4a shows the rms fluctuation $u'$ over a period, with the periodic and time-averaged mean removed. The horizontal axis is the normalised period, $\xi$, which depends on the form of control used. For the oscillating wall, $\xi = t/T$, where $t$ is the time and $T$ is the period of oscillation. For the standing and travelling waves this is set to $\xi = x/\lambda - t/T$ (therefore $\xi = x/\lambda$ for the standing wave), as in Quadrio et al. [2009]. The coloured lines show the locations of the 1D plots presented in figure 3.4b, which shows the same data at five points in a half-period. Only a half-period is needed because of the symmetry of the oscillation and the independence of the statistics from the direction of motion; hence the variation repeats twice over each period. A time-averaged and space-averaged 1D profile can also be obtained by taking the average over $\xi$. This, notably, is different from performing a double decomposition, as the periodic component is also removed. Such an average is not always appropriate, given large oscillations in the statistics, but it can give a useful insight into the general trend throughout the oscillation.
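The triple decomposition and the phase dependence of the rms can be sketched for a single time series with $\xi = t/T$. The signal below is synthetic and purely illustrative (a mean, a sinusoidal periodic part, and noise whose intensity varies at twice the forcing frequency); the function name `triple_decomposition` and all amplitudes are assumptions, and the series is assumed to cover a whole number of periods with the same number of samples in each.

```python
import numpy as np

def triple_decomposition(u, n_phase):
    """Split a time series into mean U, periodic part u~(xi) and fluctuation u'."""
    by_period = u.reshape(-1, n_phase)          # rows: periods, columns: phase xi
    mean = by_period.mean()                     # U
    periodic = by_period.mean(axis=0) - mean    # phase average minus U
    fluct = by_period - mean - periodic         # u'
    rms = np.sqrt((fluct**2).mean(axis=0))      # rms of u' as a function of xi
    return mean, periodic, fluct, rms

# Synthetic data, illustrative only.
n_periods, n_phase = 200, 32
xi = np.arange(n_phase) / n_phase
rng = np.random.default_rng(6)
noise = rng.standard_normal((n_periods, n_phase)) * (1.0 + 0.5 * np.cos(4.0 * np.pi * xi))
u = (2.0 + 0.3 * np.sin(2.0 * np.pi * xi) + noise).ravel()

U, u_tilde, u_prime, u_rms = triple_decomposition(u, n_phase)
rms_profile_mean = u_rms.mean()  # single value obtained by averaging over xi
```

By construction the periodic part has zero mean over the period and the fluctuation has zero phase average, while `u_rms` retains the variation of the fluctuation intensity with $\xi$.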

3.7.2 Data Output

To express the turbulent statistics in the above form the data must be saved in several different ways. As writing 3D velocity data is expensive in terms of memory, it is useful to calculate the statistics at each time step and perform any necessary averaging before saving the data. For the wall-oscillation cases, spatial averaging may be performed in both the streamwise and spanwise directions. Because of the dependence on time, this 1D data must be written out at every (or possibly every $k$th) time step. For the standing wave, the statistical information cannot be averaged in the streamwise direction; however, time averaging can be performed. Therefore $xy$-dependent 2D data can be written at the end of each simulation. For the streamwise travelling wave, spanwise averaging can again be performed; however, a simple time average is not possible. Because the forcing depends on space and time only through the parameter

$$\xi = \frac{x}{\lambda} - \frac{t}{T},$$

a spatial shift can be performed based on the time, and then a time average can be taken. Hence $(\xi, y)$-dependent 2D data can be written at the end of each simulation.
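The shift-then-average step for the travelling wave can be sketched in one spatial dimension. The example is a simplified illustration: the data are synthetic, the domain is one wavelength long, and the time step is chosen so that the wave moves a whole number of grid points per step, which avoids interpolation. The names `average_in_wave_frame` and `profile` and all numerical values are assumptions.

```python
import numpy as np

def average_in_wave_frame(snapshots, shifts):
    """Shift each snapshot by the distance the wave has travelled, then time-average."""
    aligned = [np.roll(q, -int(s)) for q, s in zip(snapshots, shifts)]
    return np.mean(aligned, axis=0)

def profile(xi):
    """Synthetic quantity that depends on x and t only through xi."""
    return 1.0 + 0.5 * np.cos(2.0 * np.pi * xi)

nx, nt, steps_per_period = 32, 64, 16            # hypothetical resolution
x_over_lambda = np.arange(nx) / nx
t_over_T = np.arange(nt) / steps_per_period
# The wave moves nx / steps_per_period grid points per time step.
shifts = (np.arange(nt) * (nx // steps_per_period)) % nx

rng = np.random.default_rng(7)
snapshots = profile(x_over_lambda[None, :] - t_over_T[:, None])
snapshots = snapshots + 0.1 * rng.standard_normal((nt, nx))

wave_frame = average_in_wave_frame(snapshots, shifts)  # function of xi
naive = snapshots.mean(axis=0)                         # wave structure is averaged out
```

The average taken in the moving frame recovers the dependence on $\xi$, whereas the plain time average at fixed $x$ removes it.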

The post-processing code can then be used to gather the data from sequential simulations and either take the time average (in the standing or travelling wave cases) or concatenate the time-dependent information (for the wall oscillation). The phase averaging can then be performed by first interpolating onto a grid (in either $t$, $x$ or $\xi$) whose number of grid points is a multiple of the number of periods of the forcing. The phase average can then be taken simply. This data (now in the same format, independent of the control method) can be written in 2D or 1D
