### Ludwig - a general purpose Lattice-Boltzmann code on the Cray T3E

### J.-C. Desplat

### Edinburgh Parallel Computing Centre

### The University of Edinburgh, Edinburgh EH9 3JZ, U.K.,

### P. Bladon, V.M. Kendon, I. Pagonabarraga and M.E. Cates

### Department of Physics and Astronomy

### The University of Edinburgh, Edinburgh EH9 3JZ, U.K.

### September 21, 1999

Abstract

This paper describes Ludwig, a general purpose code for the simulation of Lattice-Boltzmann (LB) models in 3-D on cubic lattices. Ludwig is not a single code, but a set of codes that share certain common routines, such as I/O and communications. If Ludwig is used as intended, new models should be simple to code, so that one may concentrate on the physics of the problem, rather than on parallel computing issues.

We first explain the philosophy and structure of Ludwig, which is argued to be the most effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues such as parallel I/O, and the use of MPI to achieve full portability and good efficiency on MPP systems such as the Cray T3D and T3E.

Finally we briefly summarize our recent ground-breaking results obtained using Ludwig for the study of 3-D spinodal decomposition of binary fluids in the inertial regime, report some preliminary results on simulation of fluid mixtures under shear, and outline a novel scheme for the thermodynamically consistent simulation of wetting phenomena.

### 1 Objectives

The objective of our research effort has been to develop a general-purpose parallel Lattice-Boltzmann (LB) code, called Ludwig, capable of simulating the hydrodynamics of complex fluids in 3-D. Such a simulation program should be capable of handling multicomponent fluids, amphiphilic systems, and flow in porous media as well as colloidal particles and polymers. In due course we would like to address many problems including wetting and interfacial dynamics, detergency, binary fluids in porous media, mesophase formation in amphiphiles, colloidal suspensions, and liquid crystal flows. So far, however, we have restricted attention to simple binary fluids, and it is this version of the code that will be described below. We discuss in some detail how properly to include solid objects, such as static and moving walls and/or freely suspended colloids, in contact with a binary fluid. More generally, the modular structure of Ludwig should facilitate its extension to many of the other problems listed above without extensive redesign. Note, however, that for several of these problems (such as liquid crystal flows) it is not yet clear how to proceed even at the serial level.

### 2 The model

It can be shown that, at sufficiently large length and time scales, the Lattice Boltzmann (LB) model can simulate nearly incompressible viscous flows [4, 5]. For a one-component fluid, it describes the evolution of a discrete set of particle densities on the sites (or nodes) of a lattice:

$$f_i(\vec{r}+\vec{c}_i,\,t+1) - f_i(\vec{r},t) = -\omega\left(f_i(\vec{r},t) - f_i^{eq}(\vec{r},t)\right) \qquad (1)$$

The quantity $f_i(\vec{r},t)$ is the density of particles with velocity $\vec{c}_i$ resident at node $\vec{r}$ at time $t$. This particle density will, in unit time, be convected (or propagated) to a neighbouring site $\vec{r}+\vec{c}_i$. Hence $\vec{c}_i$ is a lattice vector, or link vector, and the model is characterised by a finite set of velocities $\{\vec{c}_i\}$. The quantity $f_i^{eq}(\vec{r},t)$ is the `equilibrium distribution' of $f_i(\vec{r},t)$. It is chosen both to reproduce the desired equilibrium properties of the system of interest and to ensure the appropriate hydrodynamics. The right-hand side of equation (1) describes a mixing of different particle densities, or collision: the $f_i$ distribution relaxes towards $f_i^{eq}$ at a rate determined by $\omega$, the relaxation parameter.
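As a minimal sketch of the update in equation (1), the following Python fragment performs collision followed by propagation on a one-dimensional periodic lattice. The three-velocity set (D1Q3), its weights and the second-order equilibrium are illustrative assumptions chosen for brevity; they are not Ludwig's D3Q15 or D3Q19 models, and all function names are hypothetical.

```python
# Illustrative D1Q3 lattice (an assumption; Ludwig itself uses 3-D models).
C = (0, 1, -1)                           # lattice velocities c_i
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)    # standard D1Q3 weights
CS2 = 1.0 / 3.0                          # speed of sound squared


def equilibrium(rho, u):
    """Second-order equilibrium f_i^eq for local density rho and velocity u."""
    return [
        w * rho * (1.0 + c * u / CS2
                   + (c * u) ** 2 / (2.0 * CS2 ** 2)
                   - u * u / (2.0 * CS2))
        for c, w in zip(C, W)
    ]


def collide(f, omega):
    """Collision stage: purely local relaxation towards f_i^eq."""
    out = []
    for site in f:
        rho = sum(site)
        u = sum(c * fi for c, fi in zip(C, site)) / rho
        feq = equilibrium(rho, u)
        out.append([fi - omega * (fi - fe) for fi, fe in zip(site, feq)])
    return out


def propagate(f):
    """Propagation stage: copy f_i from site x to site x + c_i (periodic)."""
    n = len(f)
    out = [[0.0] * len(C) for _ in range(n)]
    for x in range(n):
        for i, c in enumerate(C):
            out[(x + c) % n][i] = f[x][i]
    return out


def step(f, omega):
    """One LB time step: collide, then propagate."""
    return propagate(collide(f, omega))


def total_mass(f):
    return sum(sum(site) for site in f)


def total_momentum(f):
    return sum(c * fi for site in f for c, fi in zip(C, site))


# A small density bump on a fluid at rest, evolved for 50 steps.
f = [equilibrium(1.0 + (0.1 if x == 4 else 0.0), 0.0) for x in range(16)]
mass0, mom0 = total_mass(f), total_momentum(f)
for _ in range(50):
    f = step(f, omega=1.2)
```

With this choice of equilibrium the collision conserves mass and momentum at every site, and the periodic propagation only moves populations around, so both totals should stay constant up to rounding.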

The dynamics of LB, as expressed in equation (1), provides immediate insight into the actual implementation and underlying optimisation issues. It is characterized by two basic dynamic stages:

- The propagation stage (left-hand side of equation (1)) consists of a set of nested loops performing memory-to-memory copies.
- The collision stage (right-hand side) has a strong degree of spatial locality and relies on basic add/multiply operations: its implementation is straightforward and can be highly optimised.

The LB model described so far can be extended to a two-component convection-diffusion model by the addition of a second distribution function, $g_i$. We follow the procedure of Swift et al. [9] in which $f_i$ describes the density field, whilst $g_i$ describes the order parameter field, $\phi$. Both distribution functions will have dynamics of the type of equation (1) but will be characterized by different relaxation parameters $\omega_{\rho,\phi}$. By studying appropriate moments of the distribution functions, one can construct collision rules that describe, in the continuum limit, the dynamics of a near-incompressible, isothermal binary fluid with an arbitrary local free energy functional $F[\phi]$. The model chosen is a `$\phi^4$' or Cahn-Hilliard type free energy:

$$F = \int dr \left( -\frac{A}{2}\phi^2 + \frac{B}{4}\phi^4 + \tilde{\rho}\ln\tilde{\rho} + \frac{\kappa}{2}|\nabla\phi|^2 \right) \qquad (2)$$

where $A$, $B$ and $\kappa$ are model parameters and $\tilde{\rho}$ is the total density. In practice $\tilde{\rho}$ remains almost constant at a value which we choose to be unity.
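To make the role of the parameters concrete, the sketch below evaluates the $\phi$-dependent part of such a free energy, assuming the sign convention $-\frac{A}{2}\phi^2 + \frac{B}{4}\phi^4 + \frac{\kappa}{2}|\nabla\phi|^2$ with $A, B, \kappa > 0$. Under that assumption the coexisting compositions are $\phi = \pm\sqrt{A/B}$, and a flat interface has the profile $\phi^\ast\tanh(x/\xi)$ with $\xi = \sqrt{2\kappa/A}$ and tension $\sqrt{8\kappa A^3/(9B^2)}$. The numerical values and function names are hypothetical and are not taken from the runs reported here.

```python
import math


def bulk_density(phi, A, B):
    """phi-dependent bulk free energy density (assumed sign convention)."""
    return -0.5 * A * phi ** 2 + 0.25 * B * phi ** 4


def bulk_chemical_potential(phi, A, B):
    """Derivative of the bulk density with respect to phi."""
    return -A * phi + B * phi ** 3


def coexistence_value(A, B):
    """Magnitude of phi in the two coexisting bulk phases."""
    return math.sqrt(A / B)


def interface_width(A, kappa):
    """Width xi of the tanh profile of a flat interface."""
    return math.sqrt(2.0 * kappa / A)


def interfacial_tension(A, B, kappa):
    """Excess free energy per unit area of a flat interface."""
    return math.sqrt(8.0 * kappa * A ** 3 / (9.0 * B ** 2))


def tanh_profile(x, A, B, kappa):
    """Equilibrium order parameter profile across a flat interface at x = 0."""
    return coexistence_value(A, B) * math.tanh(x / interface_width(A, kappa))


# Hypothetical parameter values, for illustration only.
A, B, KAPPA = 0.0625, 0.0625, 0.04
PHI_STAR = coexistence_value(A, B)
```

The full chemical potential, $-A\phi + B\phi^3 - \kappa\nabla^2\phi$, should vanish everywhere along `tanh_profile`, which gives a quick consistency check on any choice of $A$, $B$ and $\kappa$.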

We emphasize that Ludwig is structured so that the free energy functional can be chosen at will. This is a desirable feature of LB over, for example, the dissipative particle dynamics algorithm (DPD), where the free energy being modelled has to be deduced a posteriori from the simulation results [11]. On the other hand, unlike DPD and some other competing mesoscale techniques, the LB model for binary fluids is not unconditionally stable. However, our experience suggests that even if the model eventually becomes unstable for a given set of parameter values, the instability arises so suddenly that it does not impede collection of robust and reliable data over long periods beforehand.

Note also that, although there is a long history of studying $\phi^4$ theory on the lattice, one needs to be aware of possible lattice artifacts in the thermodynamic, as well as the hydrodynamic, sectors of the model [10]. For example, the coefficient $\kappa$, which determines the thickness of the interface between two fluids [9], must be kept large enough to avoid a strong anisotropy of the interfacial tension. Moreover, the values of the different parameters should be carefully selected to ensure a reasonable physical response of the model [10]. However, since the same physical parameters (in a binary fluid: viscosity, density, interfacial tension) can be achieved with more than one set of simulation parameters, it is normally possible to steer around these problems, though they do present traps for the unwary.

### 3 Implementation details

Ludwig has been developed with a modular and hierarchical structure in mind. The package is split into three main components:

1. Model subdirectory: contains all the model-specific functions as well as main.c. Users can plug in their own routines (e.g. to implement a different free energy functional). Once the model is defined, the only modification required to run simulations is to edit main.c to call the relevant measuring functions. At this time, the models available include D3Q15 and D3Q19 (see [4, 5] for their definitions), i.e. 3-D cubic lattices with 15 and 19 velocity vectors respectively.

2. Common subdirectory: contains all the low-level calls such as the communication layer and the parallel I/O. These generic functions can be called by all models.

3. Utilities subdirectory: contains stand-alone pre-processors for setting up initial configurations, as well as a set of routines to provide real-time graphics during simulations. This functionality proves invaluable for gaining a better understanding of the dynamics and for debugging purposes.

The main advantage of this modular approach is that the computational complexity is hidden, which allows users to concentrate on the physical analysis of a given system rather than on implementation issues. Other advantages include code re-use, package extensibility, portability and efficiency.

Ludwig has been programmed using ANSI C and MPI 1.1, thus achieving a high level of portability: indeed, it has been successfully installed on a variety of platforms (Cray T3E, T3D and J90, SGI Origin 2000, Hitachi SR2201, and Sun E3500) with no modification required.

Although periodic boundary conditions are applied to the model, these can be modified by explicitly adding solid surfaces at the boundaries. In this case one has to consider three different kinds (or `colours') of sites on the lattice: solid, fluid and boundary sites (i.e., fluid sites with at least one neighbouring solid site). Accordingly, the links are classified as wet or dry depending on whether they join two fluid sites or a solid site to a fluid site, respectively. Then, in order to implement the appropriate boundary conditions at the walls (which we discuss in detail in the next section), the values of $f_i$ and $g_i$ corresponding to the boundary sites and dry links are stored in two separate linked lists, distinct from the basic vectors which store $f_i$ and $g_i$ for all sites.

3.1 Solid objects

Solid objects have been implemented by applying so-called stick boundary conditions, following the bounce-back on the links (BBL) scheme proposed by Ladd [5]. During propagation, the component of the distribution function that would propagate into the solid node is bounced back and ends up back at the fluid node, pointing in the opposite direction. This produces stick boundary conditions at one half the distance along the link vector joining the solid and fluid nodes. Let us assume that a solid-fluid boundary exists between a node at $\vec{r}$ and one at $\vec{r}+\vec{c}_i$, where $i$ labels the relevant lattice vector. Let $i'$ be the opposite lattice vector, so that $\vec{c}_i = -\vec{c}_{i'}$. Then, at the link, there are two incoming velocity distributions after the collision; let us call this time $t^+$. The post-collision distributions are $f_i(\vec{r},t^+)$ and $f_{i'}(\vec{r}+\vec{c}_i,t^+)$. These distributions are reflected so that:

$$
\begin{aligned}
f_i(\vec{r}+\vec{c}_i,\,t+1) &= f_{i'}(\vec{r}+\vec{c}_i,\,t^+) \qquad (3)\\
f_{i'}(\vec{r},\,t+1) &= f_i(\vec{r},\,t^+)
\end{aligned}
$$

If the solid is moving with a velocity $\vec{u}_b$, the previous boundary conditions have to be modified. If the densities $f_i$ and order parameters $g_i$ are allowed to `leak' across the boundary links, then the velocity at the link can be matched to the velocity of the wall [5]. In the case of a binary mixture, generalizing the results of Ladd [5], the basic BBL scheme is modified as follows:

$$
\begin{aligned}
f_i(\vec{r}+\vec{c}_i,\,t+1) &= f_{i'}(\vec{r}+\vec{c}_i,\,t^+) + 6\,t_i\,\vec{u}_b\cdot\vec{c}_i \qquad (4)\\
f_{i'}(\vec{r},\,t+1) &= f_i(\vec{r},\,t^+) - 6\,t_i\,\vec{u}_b\cdot\vec{c}_i\\
g_i(\vec{r}+\vec{c}_i,\,t+1) &= g_{i'}(\vec{r}+\vec{c}_i,\,t^+) + 6\,t_i\,\vec{u}_b\cdot\vec{c}_i\\
g_{i'}(\vec{r},\,t+1) &= g_i(\vec{r},\,t^+) - 6\,t_i\,\vec{u}_b\cdot\vec{c}_i
\end{aligned}
$$

where the quantities $t_i$ are geometric factors related to the weights of the different subsets of velocities $\vec{c}_i$, and are fixed when imposing the appropriate equilibrium distribution functions for $f_i$ and $g_i$. Note that the BBL rules given above require careful implementation if they are properly to account for the effect on the composition variable $\phi$ of motion in a direction normal to the solid-fluid boundary [3, 12].
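The link update of equations (3) and (4) can be sketched for a single boundary link as follows. The function below is a hypothetical illustration, not Ludwig's routine: it takes the two post-collision populations on the link, the factor $t_i$, the link vector and the wall velocity, and returns the two reflected populations. Setting the wall velocity to zero recovers the plain bounce-back of equation (3). The value $t_i = 1/36$ used in the usage lines is the usual D3Q19 weight for a diagonal link and is an assumption here.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def bounce_back_on_link(f_i_post, f_ip_post, t_i, c_i, u_b=(0.0, 0.0, 0.0)):
    """Reflect the two populations on one boundary link.

    f_i_post  : post-collision f_i at r       (heading along c_i)
    f_ip_post : post-collision f_i' at r+c_i  (heading along -c_i)
    Returns (f_i at r+c_i, f_i' at r) at time t+1.
    """
    leak = 6.0 * t_i * dot(u_b, c_i)   # vanishes for a wall at rest
    f_i_new = f_ip_post + leak
    f_ip_new = f_i_post - leak
    return f_i_new, f_ip_new


# Wall at rest: the two populations simply swap direction, as in equation (3).
static = bounce_back_on_link(0.05, 0.03, 1.0 / 36.0, (1, 1, 0))

# Wall moving along x: a small amount is transferred across the link.
moving = bounce_back_on_link(0.05, 0.03, 1.0 / 36.0, (1, 1, 0),
                             u_b=(0.01, 0.0, 0.0))
```

Because the same amount is added to one population and removed from the other, the sum of the two populations on the link is unchanged by the wall motion. The same function could be applied to the $g_i$ populations.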

The velocity of the solid particles can be fixed beforehand. In this case, one can use such moving objects, e.g., to apply a shear flow through parallel plates at the boundaries of a sample, or to study aspects of colloid hydrodynamics such as the steady-state sedimentation of an ordered array of colloidal spheres with a prescribed distribution. Alternatively, if the velocities of the solid particles are updated, one can for example simulate the dynamics of colloidal suspensions [5].

3.2 Wetting

Although wetting is known to play a major part in influencing the behaviour of complex fluids next to solid objects, its implementation in simulations remains in its infancy. We have devised a novel predictor-corrector scheme for a more accurate implementation of controlled wetting effects at the solid/fluid/fluid interface. Recalling that Ludwig uses a $\phi^4$ model free energy (see equation (2)), the best way to account for wetting properties is to associate with the solid surfaces an additional surface free energy $A_s(\phi_s)$, where:

$$\frac{dA_s}{d\phi_s} = \kappa\,\frac{d\phi}{dx} \qquad (5)$$

with

$$A_s = \frac{C}{2}\phi_s^2 - H\phi_s \qquad (6)$$

The parameters $C$ and $H$ can then be used to control the wetting properties of the surface in a thermodynamically controlled manner [3, 13], so that the various interfacial tensions can be varied at will.

The main difficulty in implementing the general boundary condition, equations (5) and (6), is that, due to BBL, the solid surface lies between the sites, thus making the calculation of $\nabla\phi$ and $\nabla^2\phi$ by finite differences from neighbouring sites impossible. To circumvent this drawback, we use a predictor-corrector scheme to estimate the gradient at the solid wall as follows:

1. determine which sites are next to a wall (boundary sites), and hence which links cross the wall (i.e., dry links);

2. estimate $\nabla\phi$ using finite differences on all wet links;

3. from this estimate of $\nabla\phi$, extrapolate to halfway along the dry links, and calculate $\phi_s$; using $\phi_s$ on the dry links, calculate $d\phi/dx|_s$ on these links;

4. calculate $\nabla\phi$ and $\nabla^2\phi$ for the boundary sites using all the gradients estimated on the links.

Figure 1: The wetting algorithm
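The steps of Figure 1 can be illustrated in one dimension. The sketch below is a simplified reading of the algorithm and not the authors' implementation. It assumes a boundary site at $x = 0$, a fluid neighbour at $x = 1$, a wall halfway along the dry link at $x = -1/2$, a unit lattice spacing, and a surface condition of the form $\kappa\,d\phi/dx|_s = C\phi_s - H$ with $x$ pointing into the fluid. All function names are hypothetical.

```python
def wall_gradient(phi_s, C, H, kappa):
    """Gradient at the wall implied by the surface free energy (assumed form)."""
    return (C * phi_s - H) / kappa


def boundary_site_derivatives(phi0, phi1, C, H, kappa):
    """Gradient and Laplacian of phi at a boundary site (1-D sketch).

    phi0 : order parameter at the boundary site (x = 0)
    phi1 : order parameter at its fluid neighbour (x = 1)
    Returns (phi_s, dphi/dx at x = 0, d2phi/dx2 at x = 0).
    """
    # Step 2: finite-difference gradient on the wet link (midpoint x = +1/2).
    grad_wet = phi1 - phi0
    # Step 3: extrapolate halfway along the dry link to get phi at the wall,
    # then use it to evaluate the gradient on the dry link (x = -1/2).
    phi_s = phi0 - 0.5 * grad_wet
    grad_dry = wall_gradient(phi_s, C, H, kappa)
    # Step 4: combine the link gradients into site derivatives.
    grad = 0.5 * (grad_wet + grad_dry)
    laplacian = grad_wet - grad_dry    # the link midpoints are one unit apart
    return phi_s, grad, laplacian


# A linear profile phi = 0.2 + 0.1 x that already satisfies the wall
# condition (C = 0, H = -kappa * 0.1) should give a vanishing Laplacian.
phi_s, grad, laplacian = boundary_site_derivatives(
    0.2, 0.3, C=0.0, H=-0.004, kappa=0.04)
```

In three dimensions the same idea would be applied link by link, with wet and dry links contributing to the gradient and Laplacian at each boundary site.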

The numerical stability and accuracy of this scheme can be improved by increasing the accuracy in the computation of the gradient of the order parameter [3]. This scheme gives wetting angles in good quantitative agreement with thermodynamic predictions. Results of typical case studies, both for a droplet and for planar interfaces, will be published elsewhere [3].

Most production runs have been carried out on the Cray T3D and the T3E-900 at EPCC and on the Cray T3E-1200 at CSAR. The benchmarks reproduced in figure 2 demonstrate that over 90% of the time is spent in the collision and propagation stages. The communications and BBL account for a mere 2-6%. As expected, the increased clock speed of the T3E-1200 benefits the collision stage, which is a highly localised algorithm using basic arithmetic operators (typically, add/multiply). This routine has been highly optimised and makes good use of the T3E memory hierarchy. On the other hand, the memory-to-memory copies performed in the propagation stage do not benefit from this increase in clock speed. However, this stage could be optimised by grouping loops to make efficient use of the T3E streams. The most efficient (and portable) way to optimise this stage is by an appropriate re-ordering of the velocity set $\{c_i\}$. Gains of up to 20% can be achieved this way.

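| Machine | PEs | BBL | Halo swap | Propagation | Collision |
|---|---|---|---|---|---|
| T3E900-344 | 16 | 0.8% | 1.2% | 46.3% | 51.6% |
| T3E900-344 | 32 | 1.6% | 1.8% | 44.6% | 52.0% |
| T3E900-344 | 64 | 1.6% | 3.7% | 43.1% | 51.6% |
| T3E900-344 | 128 | 1.6% | 4.6% | 43.4% | 50.3% |
| T3E1200-576 | 32 | 1.0% | 1.9% | 68.6% | 28.5% |
| T3E1200-576 | 64 | 1.0% | 3.8% | 64.2% | 31.0% |
| T3E1200-576 | 128 | 1.0% | 4.7% | 63.8% | 30.5% |

Figure 2: Fraction of time spent in each function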

As shown in figure 3, Ludwig demonstrates linear scaling from 16 up to 128 processors. Results are presented for simulations with and without I/O. Clearly, the I/O is a major bottleneck (e.g. a $256^3$ system will generate in excess of 4.5 GB for each data dump). This bottleneck became even more noticeable on the T3E-900, on which our consortium only had access to NFS-mounted volumes. Scaling figures have been normalised with respect to the time taken for the lowest number of processors (16 or 32 for the T3E-900 and T3E-1200 respectively). This explains the slight super-linear scale-up obtained on the T3E-1200.
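The normalisation described above can be written out explicitly. In the sketch below the scale-up at $P$ processors is taken as $P_{ref}\,T(P_{ref})/T(P)$, where $P_{ref}$ is the lowest processor count measured. The function name and the timings are hypothetical and serve only to show how a value above the ideal line can arise when the reference run is itself treated as perfectly efficient.

```python
def scale_up(times):
    """Scale-up normalised to the smallest processor count in `times`.

    times : dict mapping number of PEs -> wall-clock time for the same problem
    Returns a dict mapping number of PEs -> scale-up, with the reference
    run assigned a scale-up equal to its own number of PEs.
    """
    p_ref = min(times)
    t_ref = times[p_ref]
    return {p: p_ref * t_ref / t for p, t in sorted(times.items())}


# Hypothetical timings (arbitrary units), not measurements from this paper.
example = scale_up({32: 100.0, 64: 49.0, 128: 25.0})
```

Here the 64-processor run comes out slightly above 64 because it is a little more than twice as fast as the 32-processor reference.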

The I/O has been highly optimised by performing parallel I/O. The pool of PEs is split into $N$ groups of $p$ processors, thus providing $N$ concurrent I/O streams (typically, $N = 8$). Each group has a root PE which performs all I/O operations. The remaining $(p-1)$ PEs send their data in turn to the I/O PE of their group, which packs these data and writes them to disk. This approach is known to produce the highest possible bandwidth without having to use platform-specific calls such as disk striping. Note that MPI2-IO had to be discounted on the grounds of performance.
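The grouping can be sketched without any message passing. The fragment below assumes contiguous groups with the lowest rank of each group acting as its I/O PE; both choices, and the function names, are assumptions for illustration. In an MPI code a split of this kind could be obtained, for example, with `MPI_Comm_split` using the group index as the colour.

```python
def io_groups(n_pes, n_groups):
    """Map each rank to (group index, rank of the group's I/O PE)."""
    if n_groups <= 0 or n_pes % n_groups != 0:
        raise ValueError("n_pes must be a positive multiple of n_groups")
    p = n_pes // n_groups              # PEs per group
    layout = {}
    for rank in range(n_pes):
        group = rank // p
        layout[rank] = (group, group * p)
    return layout


def write_dump(data_by_rank, n_groups):
    """Emulate the dump: each I/O PE packs its group's data in rank order.

    data_by_rank : list whose entry r is the list of values held by rank r
    Returns a dict mapping group index -> packed data for that I/O stream.
    """
    layout = io_groups(len(data_by_rank), n_groups)
    streams = {g: [] for g in range(n_groups)}
    for rank in sorted(layout):
        group, _root = layout[rank]
        streams[group].extend(data_by_rank[rank])
    return streams


layout = io_groups(128, 8)             # 8 streams of 16 PEs each
streams = write_dump([[r] for r in range(16)], 4)
```

With 128 PEs and $N = 8$, rank 37 belongs to group 2 and sends its data to rank 32.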

One has to conclude this discussion by deploring the lack of MPI2 single-sided communications in Cray's own MPI library. Indeed, this functionality would have considerably simplified (and sped up) communications, e.g., when updating the contents of the boundary sites in the halo region.

[Plot: scale-up against number of PEs (N PEs) for $(128)^3$ and $(256)^3$ systems, with and without I/O, compared with the ideal line $y = x$.]

Figure 3: Scale-up graphs

### 4 Results

We briefly present results for two problems in binary fluid mixtures:

1. Establishment of the role of inertia in late-stage coarsening.

2. Study of the effect of an applied shear flow on the coarsening process.

We have simulated the late-stage coarsening process of a fully symmetric 0.5/0.5 mixture by applying a deep quench below its spinodal temperature. As phase separation occurs, the two domains form a bicontinuous structure with a sharp, well-defined interface (see figure 4).

Figure 4: Late-stage coarsening at $T = \{2000, 2600, 3400\}$ (left to right); the isosurface denotes the fluid-fluid interface

This process involves capillary forces, fluid inertia and viscous dissipation, which are controlled through three physical parameters: the interfacial tension $\sigma$, the fluid mass density $\rho$, and the shear viscosity $\eta$.

As mentioned previously, a manifold of simulation parameters exists for each physical set, and careful parameter steering allows the main pitfalls (lattice artifacts and numerical instability) to be avoided. Of major interest is the characteristic domain scale $L$ (defined through the first moment of the composition autocorrelation in reciprocal space [1]). This can be scaled to dimensionless form as $l = L\rho\sigma/\eta^2$, with a similar procedure for time, using $\eta^3/(\rho\sigma^2)$ as its unit. Theory [6] predicts an inertial scaling at $l, t \gg 1$ (see [1] for a discussion). By combining several large runs with different parameter choices, our simulations covered an unprecedented range of length scales ($1 \lesssim l \lesssim 10^5$) and time scales ($10 \lesssim t \lesssim 10^8$). At very late times (beyond $t \simeq 10^5$) we indeed recover the inertial scaling expected. The corresponding Reynolds number (as usually defined for this problem, $Re = l\,dl/dt$) attains values of a few hundred; we find no indication that $Re$ is self-limiting at values $Re \sim 100$, as suggested by Grant and Elder [7], which would lead to $l \sim t^{1/2}$ at large enough times.
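The reduction to dimensionless variables and the estimate of a growth exponent can be sketched as follows, taking $L_0 = \eta^2/(\rho\sigma)$ and $T_0 = \eta^3/(\rho\sigma^2)$ as the units of length and time. The offset `T_int`, the least-squares fit in log-log space and all function names are illustrative assumptions and need not match the fitting procedure of Ref. [1]. The numerical inputs are hypothetical.

```python
import math


def reduced_units(L, T, sigma, rho, eta, T_int=0.0):
    """Convert a domain size L and time T to reduced variables (l, t)."""
    L0 = eta ** 2 / (rho * sigma)
    T0 = eta ** 3 / (rho * sigma ** 2)
    return L / L0, (T - T_int) / T0


def fit_exponent(ts, ls):
    """Least-squares slope of log(l) against log(t)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(l) for l in ls]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def reynolds_number(l0, l1, t0, t1):
    """Re = l dl/dt estimated from two points in reduced variables."""
    return 0.5 * (l0 + l1) * (l1 - l0) / (t1 - t0)


# Synthetic data following a pure power law with exponent 2/3.
ts = [10.0 ** k for k in range(2, 8)]
ls = [2.0 * t ** (2.0 / 3.0) for t in ts]
alpha = fit_exponent(ts, ls)

l_red, t_red = reduced_units(10.0, 100.0, sigma=0.05, rho=1.0, eta=0.1)
```

For real data a free-exponent fit of this kind is sensitive to the choice of time origin, which is why an offset such as `T_int` is kept as an explicit parameter.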

[Plots. Left: $L$ (lattice units) against $T$ (time steps), with curves labelled $L_{max}$ and $L_{min}$. Right: reduced length $l = L/L_0$ against reduced time $t = (T - T_{int})/T_0$ on logarithmic axes, with reference slopes 2/3 and 1.]

Figure 5: L vs. T fits (left) and scaling plot in reduced variables (right)

Figure 5a demonstrates the fits for $\alpha = 1, 2/3$ (solid lines) and free exponent fits (dashed lines), with best results for values of $\alpha = 1.16, 0.69$, for two runs at the extremes of our range of simulated parameters.

Figure 5b illustrates the range of length and time scales covered by our simulation (solid lines) compared to that of other groups (squares, Ref. [18], triangles, Ref. [16], and circles, Ref. [17]) and a comparison with other simulation data from recent DPD runs (inset, Ref. [15]). See Ref. [1] for further details.

In a separate study, our group established that an applied shear flow has a growing inhibitory effect on the coarsening process as the shear is increased. As the system becomes anisotropic, one observes strong variations in the three characteristic length scales as well as the formation of an oriented domain texture (see figure 6a,b). Preliminary results suggest, however, that extreme anisotropy may not be enough to completely inhibit coarsening. A view down the streamlines shows the existence of a network of thin fluid necks, in the plane transverse to the flow, which allow coarsening to continue even once nearly all the interface is perpendicular to this plane (see figure 6c). This apparently contradicts the framework of Doi and Ohta's scaling theory [8], which assumes eventual saturation of the mean domain size at a scale set by the balance of viscous and interfacial tension terms. A further discussion of these preliminary results is presented in reference [2].

Figure 6: Oriented domain texture at $T = \{1000, 4000\}$; the shear is applied through the top plane moving left and the bottom plane moving right (left and centre). Narrow fluid necks connecting blocks of similar fluids; the block colours as in (a) and (b) show the identity of the two fluids, now on a vertical plane towards the back of the simulation cell (right)

### 5 Future plans

Some of the authors are currently working on the hydrodynamic simulation of multicomponent fluid flow in porous networks with controlled wetting; on the implementation of Lees-Edwards (sliding periodic) boundary conditions; on large-scale simulations under shear; and on the improvement of the gradients to make the thermodynamics of this model independent of the underlying symmetries of the lattice. Further plans include studying colloid hydrodynamics and extending Ludwig to study amphiphilic systems under shear (see [14] for an example of this studied by DPD).

The authors would like to acknowledge Simon Jury, Patrick Warren, and Julia Yeomans for valuable discussions. This work has been funded in part under the Maxwell Institute's project on `Fluid flow in soft and porous matter' and EPSRC E7 Grand Challenge (GR/M56234).

### References

[1] V.M. Kendon, J.-C. Desplat, P. Bladon, and M.E. Cates, Phys. Rev. Lett. 83, 576 (1999).

[2] M.E. Cates, V.M. Kendon, P. Bladon, and J.-C. Desplat, Faraday Discussions 112, 1 (1999).

[3] P. Bladon, J.-C. Desplat, I. Pagonabarraga, and M.E. Cates, in preparation, to be submitted to Comp. Phys. Comm.

[4] Y.H. Qian, D. d'Humières, and P. Lallemand, Europhys. Lett. 17, 479 (1992).

[5] A.J.C. Ladd, J. Fluid Mech. 271, 285 (1994).

[6] H. Furukawa, Phys. Rev. A 31, 1103 (1985).

[7] M. Grant and K.R. Elder, Phys. Rev. Lett. 82, 14 (1999).

[8] M. Doi and T. Ohta, J. Chem. Phys. 95, 1242 (1991).

[9] M.R. Swift, E. Orlandini, W.R. Osborn, and J. Yeomans, Phys. Rev. E 54, 5041 (1996).

[10] V. Kendon, Ph.D. Thesis, University of Edinburgh (1999).

[11] R.D. Groot and P.B. Warren, J. Chem. Phys. 107, 4423 (1997).

[12] We are grateful to Dr. P.B. Warren for a discussion on this point.

[13] J. Cahn, J. Chem. Phys. 66, 3667 (1977).

[14] S. Jury, P. Bladon, M.E. Cates, S. Krishna, M. Hagen, N. Ruddock, and P. Warren, Phys. Chem. Chem. Phys. 1, 2051 (1999).

[15] S.I. Jury, P. Bladon, S. Krishna, and M.E. Cates, Phys. Rev. E 59, R2535 (1999).

[16] M. Laradji, S. Toxvaerd, and O.G. Mouritsen, Phys. Rev. Lett. 77, 2253 (1996).

[17] S. Bastea and J.L. Lebowitz, Phys. Rev. Lett. 78, 3499 (1997).