Ludwig - a general purpose Lattice-Boltzmann code on the Cray T3E

J.-C. Desplat

Edinburgh Parallel Computing Centre

The University of Edinburgh, Edinburgh EH9 3JZ, U.K.,

P. Bladon, V.M. Kendon, I. Pagonabarraga and M.E. Cates

Department of Physics and Astronomy

The University of Edinburgh, Edinburgh EH9 3JZ, U.K.

September 21, 1999


This paper describes Ludwig, a general purpose code for the simulation of Lattice-Boltzmann (LB) models in 3-D on cubic lattices. Ludwig is not a single code, but a set of codes that share certain common routines, such as I/O and communications. If Ludwig is used as intended, new models should be simple to code, so that one may concentrate on the physics of the problem, rather than on parallel computing issues.

We first explain the philosophy and structure of Ludwig, which is argued to be the most effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues such as parallel I/O, and the use of MPI to achieve full portability and good efficiency on MPP systems such as the Cray T3D and T3E.

Finally we briefly summarize our recent ground-breaking results obtained using Ludwig for the study of 3-D spinodal decomposition of binary fluids in the inertial regime, report some preliminary results on simulation of fluid mixtures under shear, and outline a novel scheme for the thermodynamically consistent simulation of wetting phenomena.

1 Objectives

The objective of our research effort has been to develop a general-purpose parallel Lattice-Boltzmann (LB) code, called Ludwig, capable of simulating the hydrodynamics of complex fluids in 3-D. Such a simulation program should be capable of handling multicomponent fluids, amphiphilic systems, and flow in porous media as well as colloidal particles and polymers. In due course we would like to address many problems including wetting and interfacial dynamics, detergency, binary fluids in porous media, mesophase formation in amphiphiles, colloidal suspensions, and liquid crystal flows. So far, however, we have restricted attention to simple binary fluids, and it is this version of the code that will be described below. We discuss in some detail how properly to include solid objects, such as static and moving walls and/or freely suspended colloids, in contact with a binary fluid. More generally, the modular structure of Ludwig should facilitate its extension to many of the other problems above without extensive redesign. But note that, with several of these problems (such as liquid crystal flows), it is not yet clear how to proceed even at the serial level.

2 The model

It can be shown that, at sufficiently large length and time scales, the Lattice Boltzmann (LB) model can simulate nearly incompressible viscous flows [4, 5]. For a one-component fluid, it describes the evolution of a discrete set of particle densities on the sites (or nodes) of a lattice:

$$ f_i(\vec{r}+\vec{c}_i, t+1) - f_i(\vec{r}, t) = -\omega \left[ f_i(\vec{r}, t) - f_i^{eq}(\vec{r}, t) \right] \qquad (1) $$

The quantity $f_i(\vec{r}, t)$ is the density of particles with velocity $\vec{c}_i$ resident at node $\vec{r}$ at time $t$. This particle density will, in unit time, be convected (or propagated) to a neighbouring site $\vec{r}+\vec{c}_i$. Hence $\vec{c}_i$ is a lattice vector, or link vector, and the model is characterised by a finite set of velocities $\{\vec{c}_i\}$. The quantity $f_i^{eq}(\vec{r}, t)$ is the `equilibrium distribution' of $f_i(\vec{r}, t)$. It is chosen both to reproduce the desired equilibrium properties of the system of interest and to ensure the appropriate hydrodynamics. The right-hand side of equation (1) describes a mixing up of different particle densities, or collision: the $f_i$ distribution relaxes towards $f_i^{eq}$ at a rate determined by $\omega$, the relaxation parameter.
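Equation (1) can be read as a local relaxation followed by a shift of each particle density along its link vector. The following is a minimal sketch of that reading in pure Python, not Ludwig's actual C implementation. It assumes a D3Q19 velocity set with its standard weights and a textbook second-order equilibrium distribution; the function names (`collide`, `propagate`, `run`) and the tiny periodic test lattice are hypothetical choices made for illustration.

```python
import itertools

# D3Q19 velocity set: the rest vector, 6 axis vectors and 12 face diagonals.
VELOCITIES = [c for c in itertools.product((-1, 0, 1), repeat=3)
              if sum(abs(x) for x in c) <= 2]
# Standard D3Q19 weights, indexed by the squared length of each vector.
WEIGHTS = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(abs(x) for x in c)]
           for c in VELOCITIES]


def moments(fi):
    """Return the density and velocity carried by one site's distributions."""
    rho = sum(fi)
    u = [sum(fv * c[a] for fv, c in zip(fi, VELOCITIES)) / rho for a in range(3)]
    return rho, u


def equilibrium(rho, u):
    """Second-order equilibrium distribution (an assumed, textbook form)."""
    usq = sum(x * x for x in u)
    feq = []
    for c, w in zip(VELOCITIES, WEIGHTS):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, omega):
    """Collision stage: purely local relaxation of f_i towards f_i^eq."""
    for site, fi in f.items():
        feq = equilibrium(*moments(fi))
        f[site] = [fv - omega * (fv - fe) for fv, fe in zip(fi, feq)]


def propagate(f, n):
    """Propagation stage: copy each f_i to the neighbour at r + c_i (periodic)."""
    new = {site: [0.0] * len(VELOCITIES) for site in f}
    for (x, y, z), fi in f.items():
        for i, (cx, cy, cz) in enumerate(VELOCITIES):
            new[((x + cx) % n, (y + cy) % n, (z + cz) % n)][i] = fi[i]
    return new


def total_mass(f):
    """Sum of all particle densities over the whole lattice."""
    return sum(sum(fi) for fi in f.values())


def run(n=4, steps=5, omega=1.0):
    """Start from a small density ripple and advance `steps` time steps."""
    f = {site: equilibrium(1.0 + 0.01 * site[0], [0.0, 0.0, 0.0])
         for site in itertools.product(range(n), repeat=3)}
    start = total_mass(f)
    for _ in range(steps):
        collide(f, omega)    # right-hand side of equation (1)
        f = propagate(f, n)  # left-hand side of equation (1)
    return start, total_mass(f)
```

Calling `run()` returns the total mass before and after the time steps; with this equilibrium both stages conserve mass, so the two values should agree to rounding error.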

The dynamics of LB, as expressed in equation (1), provides immediate insight into the actual implementation and underlying optimisation issues. It is characterized by two basic dynamic stages:

• The propagation stage (left-hand side of equation (1)) consists of a set of nested loops performing memory-to-memory copies.

• The collision stage (right-hand side) has a strong degree of spatial locality and relies on basic add/multiply operations: its implementation is straightforward and can be highly optimised.

The LB model described so far can be extended to a two-component convection-diffusion model by the addition of a second distribution function, $g_i$. We follow the procedure of Swift et al. [9] in which $f_i$ describes the density field, whilst $g_i$ describes the order parameter field, $\phi$. Both distribution functions will have dynamics of the type of equation (1) but will be characterized by different relaxation parameters. By studying appropriate moments of the distribution functions, one can construct collision rules that describe, in the continuum limit, the dynamics of a near-incompressible, isothermal binary fluid with an arbitrary local free energy functional $F[\phi]$. The model chosen is a `$\phi^4$' or Cahn-Hilliard type free energy:

$$ F = \int d\vec{r} \left\{ -\frac{A}{2}\phi^2 + \frac{B}{4}\phi^4 + \tilde{\rho}\ln\tilde{\rho} + \frac{\kappa}{2}|\nabla\phi|^2 \right\} \qquad (2) $$

where $A$, $B$ and $\kappa$ are model parameters and $\tilde{\rho}$ is the total density. In practice the density remains almost constant at a value which we choose to be unity.
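For orientation, the bulk properties implied by a free energy of the form of equation (2) can be written out explicitly. The expressions below are the standard mean-field results for a $\phi^4$ free energy with $A, B, \kappa > 0$ and a flat interface; they are given as a reading aid and are not quoted from this paper. The symbols $\mu$ (chemical potential), $\phi^*$ (bulk order parameter), $\xi$ (interfacial width) and $\sigma$ (interfacial tension) are introduced here for this note.

```latex
\begin{align}
\mu &= \frac{\delta F}{\delta \phi} = -A\phi + B\phi^3 - \kappa \nabla^2 \phi, \\
\phi^* &= \pm\sqrt{A/B}, \\
\phi(x) &= \phi^* \tanh(x/\xi), \qquad \xi = \sqrt{2\kappa/A}, \\
\sigma &= \int_{-\infty}^{\infty} \kappa \left(\frac{d\phi}{dx}\right)^2 dx
        = \sqrt{\frac{8\kappa A^3}{9B^2}} .
\end{align}
```

Setting $\mu = 0$ for the tanh profile fixes both $\phi^*$ and $\xi$, which shows directly how the gradient coefficient $\kappa$ controls the interface thickness at fixed $A$.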

We emphasize that Ludwig is structured so that the free energy functional can be chosen at will. This is a desirable feature of LB over, for example, the dissipative particle dynamics algorithm (DPD), where the free energy being modelled has to be deduced a posteriori from the simulation results [11]. On the other hand, unlike DPD and some other competing mesoscale techniques, the LB model for binary fluids is not unconditionally stable. However, our experience suggests that even if the model eventually becomes unstable for a given set of parameter values, the instability arises so suddenly that it does not impede the collection of robust and reliable data over long periods beforehand.

Note also that, although there is a long history of studying $\phi^4$ theory on the lattice, one needs to be aware of possible lattice artifacts in the thermodynamic, as well as the hydrodynamic, sectors of the model [10]. For example, the coefficient $\kappa$, which determines the thickness of the interface between two fluids [9], must be kept large enough to avoid a strong anisotropy of the interfacial tension. Moreover, the values of the different parameters should be carefully selected to ensure a reasonable physical response of the model [10]. However, since the same physical parameters (in a binary fluid: viscosity, density, interfacial tension) can be achieved with more than one set of simulation parameters, it is normally possible to steer around these problems, though they do present traps for the unwary.

3 Implementation details

Ludwig has been developed with a modular and hierarchical structure in mind. The package is split into three main components:

1. Model subdirectory: contains all the model-specific functions as well as main.c. Users can plug in their own routines (e.g. to implement a different free energy functional). Once the model is defined, the only modification required to run simulations is to edit main.c to call the relevant measuring functions. At this time, the models available include D3Q15 and D3Q19 (see [4, 5] for their definitions), i.e. 3-D cubic lattices with 15 and 19 velocity vectors respectively.

2. Common subdirectory: contains all the low-level calls such as the communication layer and the parallel I/O. These generic functions can be called by all models.

3. Utilities subdirectory: contains stand-alone pre-processors for setting up initial configurations, as well as a set of routines to provide real-time graphics during simulations. This functionality proves invaluable for gaining a better understanding of the dynamics and for debugging purposes.

The main advantage of this modular approach is that the computational complexity is hidden, which allows users to concentrate on the physical analysis of a given system rather than on implementation issues. Other advantages include code re-use, package extendibility, portability and efficiency.

Ludwig has been programmed using ANSI C and MPI 1.1, thus achieving a high level of portability: indeed, it has been successfully installed on a variety of platforms (Cray T3E, T3D and J90, SGI Origin 2000, Hitachi SR2201, and Sun E3500) with no modification required.

Although periodic boundary conditions are applied to the model, these can be modified by explicitly adding solid surfaces at the boundaries. In this case one has to consider three different kinds of `coloured' sites on the lattice: solid, fluid and boundary sites (i.e., fluid sites with at least one neighbouring solid site). Accordingly, the links are classified as wet or dry depending on whether they join two fluid sites or a solid site to a fluid site, respectively. Then, in order to implement the appropriate boundary conditions at the walls (which we discuss in detail in the next section), the values of $f_i$ and $g_i$ corresponding to the boundary sites and dry links are stored in two separate linked lists, distinct from the basic vectors which store $f_i$ and $g_i$ for all sites.

3.1 Solid objects

Solid objects have been implemented by applying so-called stick boundary conditions, following the bounce-back on the links (BBL) scheme proposed by Ladd [5]. During propagation, the component of the distribution function that would propagate into the solid node is bounced back and ends up back at the fluid node, pointing in the opposite direction. This produces stick boundary conditions at one half the distance along the link vector joining the solid and fluid nodes. Let us assume that a solid-fluid boundary exists between a node at $\vec{r}$ and one at $\vec{r}+\vec{c}_i$, where $i$ labels the relevant lattice vector. Let $i'$ be the opposite lattice vector, so that $\vec{c}_i = -\vec{c}_{i'}$. Then, at the link, there are two incoming velocity distributions after the collision; let us call this time $t_+$. The post-collision distributions are $f_i(\vec{r}, t_+)$ and $f_{i'}(\vec{r}+\vec{c}_i, t_+)$. These distributions are reflected so that:

$$ \begin{aligned} f_i(\vec{r}+\vec{c}_i, t+1) &= f_{i'}(\vec{r}+\vec{c}_i, t_+) \\ f_{i'}(\vec{r}, t+1) &= f_i(\vec{r}, t_+) \end{aligned} \qquad (3) $$
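The site and link classification and the reflection rule of equation (3) can be sketched together. This is a simplified illustration under stated assumptions, not the Ludwig data layout: it uses Python sets and lists where the text describes linked lists, stores distributions on fluid sites only (so only the second line of equation (3) appears), and assumes a D3Q19 velocity set on a small periodic cube. All function names are hypothetical.

```python
import itertools

# D3Q19 velocity set (assumed for illustration).
VELOCITIES = [c for c in itertools.product((-1, 0, 1), repeat=3)
              if sum(abs(x) for x in c) <= 2]
# OPPOSITE[i] is the index i' such that c_i' = -c_i.
OPPOSITE = [VELOCITIES.index(tuple(-x for x in c)) for c in VELOCITIES]


def neighbour(r, c, n):
    """Site reached from r along link vector c on an n^3 periodic lattice."""
    return tuple((a + b) % n for a, b in zip(r, c))


def classify(n, is_solid):
    """Return the fluid sites, the boundary sites and the dry links."""
    fluid = {r for r in itertools.product(range(n), repeat=3) if not is_solid(r)}
    # A dry link (r, i) joins fluid site r to a solid site at r + c_i.
    dry_links = [(r, i) for r in sorted(fluid) for i, c in enumerate(VELOCITIES)
                 if neighbour(r, c, n) not in fluid]
    # Boundary sites are fluid sites that own at least one dry link.
    boundary = {r for r, _ in dry_links}
    return fluid, boundary, dry_links


def propagate_with_bbl(f, fluid, n):
    """Propagation stage with bounce-back on the links."""
    new = {r: [0.0] * len(VELOCITIES) for r in fluid}
    for r in fluid:
        for i, c in enumerate(VELOCITIES):
            target = neighbour(r, c, n)
            if target in fluid:
                new[target][i] = f[r][i]       # wet link: ordinary propagation
            else:
                new[r][OPPOSITE[i]] = f[r][i]  # dry link: reflected, eq. (3)
    return new


def demo(n=4):
    """Solid plane at z = 0; report site counts and mass before and after."""
    fluid, boundary, dry_links = classify(n, lambda r: r[2] == 0)
    f = {r: [1.0 + 0.1 * i + 0.01 * sum(r) for i in range(len(VELOCITIES))]
         for r in fluid}
    before = sum(sum(v) for v in f.values())
    f = propagate_with_bbl(f, fluid, n)
    after = sum(sum(v) for v in f.values())
    return len(fluid), len(boundary), len(dry_links), before, after
```

Because every reflected density stays on its fluid site, the total mass held on fluid sites is unchanged by the propagation step, which gives a simple consistency check on the classification.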

If the solid is moving with a velocity $\vec{u}_b$, the previous boundary conditions have to be modified. If the densities $f_i$ and order parameters $g_i$ are allowed to `leak' across the boundary links, then the velocity at the link can be matched to the velocity of the wall [5]. In the case of a binary mixture, generalizing the results of Ladd [5], the basic BBL scheme is modified as follows:

$$ \begin{aligned} f_i(\vec{r}+\vec{c}_i, t+1) &= f_{i'}(\vec{r}+\vec{c}_i, t_+) + 6 t_i\, \vec{u}_b \cdot \vec{c}_i \\ f_{i'}(\vec{r}, t+1) &= f_i(\vec{r}, t_+) - 6 t_i\, \vec{u}_b \cdot \vec{c}_i \\ g_i(\vec{r}+\vec{c}_i, t+1) &= g_{i'}(\vec{r}+\vec{c}_i, t_+) + 6 t_i\, \vec{u}_b \cdot \vec{c}_i \\ g_{i'}(\vec{r}, t+1) &= g_i(\vec{r}, t_+) - 6 t_i\, \vec{u}_b \cdot \vec{c}_i \end{aligned} \qquad (4) $$

where the quantities $t_i$ are geometric factors related to the weights of the different subsets of velocities $\vec{c}_i$, and are fixed when imposing the appropriate equilibrium distribution functions for $f_i$ and $g_i$. Note that the BBL rules given above require careful implementation if they are properly to account for the effect on the composition variable $\phi$ of motion in a direction normal to the solid-fluid boundary [3, 12].


The velocity of the solid particles can be fixed beforehand. In this case, one can use such moving objects, for example, to apply a shear flow through parallel plates at the boundaries of a sample, or to study aspects of colloid hydrodynamics such as the steady-state sedimentation of an ordered array of colloidal spheres with a prescribed distribution. Alternatively, if the velocities of the solid particles are updated, one can for example simulate the dynamics of colloidal suspensions [5].

3.2 Wetting

Although wetting is known to play a major part in influencing the behaviour of complex fluids next to solid objects, its implementation in simulations remains in its infancy. We have devised a novel predictor-corrector scheme for a more accurate implementation of controlled wetting effects at the solid/fluid/fluid interface. Recalling that Ludwig uses a $\phi^4$ model free energy (see equation (2)), the best way to account for wetting properties is to associate with the solid surfaces an additional surface free energy $A_s(\phi_s)$, where:

$$ \frac{dA_s}{d\phi_s} = \kappa \frac{d\phi}{dx} \qquad (5) $$

with

$$ A_s = \frac{C}{2}\phi_s^2 - H\phi_s \qquad (6) $$

The parameters $C$ and $H$ can then be used to control the wetting properties of the surface in a thermodynamically controlled manner [3, 13], so that the various interfacial tensions can be varied at will.

The main difficulty in implementing the general boundary condition, equation (6), is that, due to BBL, the solid surface lies between the sites, thus making the calculation of $\nabla\phi$ and $\nabla^2\phi$ by finite differences from neighbouring sites impossible. To circumvent this drawback, we use a predictor-corrector scheme to estimate the gradient at the solid wall as follows:

1. determine which sites are next to a wall (boundary sites), and hence which links cross the wall (i.e., dry links);

2. estimate $\nabla\phi$ using finite differences on all wet links;

3. from this estimate of $\nabla\phi$, extrapolate to halfway along the dry links, and calculate $\phi_s$; using $\phi_s$ on the dry links, calculate $d\phi/dx|_s$ on these links;

4. calculate $\nabla\phi$ and $\nabla^2\phi$ for the boundary sites using all the gradients estimated on the links.

Figure 1: The wetting algorithm
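The four steps of Figure 1 can be illustrated in one dimension. The sketch below is one possible reading of the scheme, not the authors' implementation: it assumes a wall half a lattice spacing to the left of the first fluid site, a single wet link per boundary site, linear extrapolation to the wall, and the sign convention of equation (5) taken literally with $x$ increasing into the fluid. The function name `wall_gradients` is hypothetical.

```python
def wall_gradients(phi, C, H, kappa):
    """Estimate dphi/dx and d2phi/dx2 at the boundary site phi[0].

    The wall sits half a lattice spacing to the left of site 0 (x = -1/2),
    as implied by bounce-back on the links; phi[1] is the next fluid site.
    """
    # Step 2: finite difference on the wet link between sites 0 and 1.
    grad_wet = phi[1] - phi[0]
    # Step 3 (predictor): extrapolate phi halfway along the dry link ...
    phi_s = phi[0] - 0.5 * grad_wet
    # ... then apply dA_s/dphi_s = C*phi_s - H = kappa * dphi/dx at the wall.
    grad_dry = (C * phi_s - H) / kappa
    # Step 4 (corrector): combine the two link gradients at the boundary site.
    grad = 0.5 * (grad_wet + grad_dry)
    laplacian = grad_wet - grad_dry
    return phi_s, grad, laplacian
```

For a linear profile whose slope already satisfies the wall condition, for example `wall_gradients([0.5, 0.7, 0.9], C=1.0, H=0.2, kappa=1.0)`, the two link gradients coincide, so the site gradient equals the slope and the Laplacian vanishes.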

The numerical stability and accuracy of this scheme can be improved by increasing the accuracy of the computation of the gradient of the order parameter [3]. This scheme gives good quantitative results for the wetting angles, in accord with thermodynamic predictions. Results of typical case studies, both for a droplet and for planar interfaces, will be published elsewhere [3].


Most production runs have been carried out on the Cray T3D and the T3E-900 at EPCC and on the Cray T3E-1200 at CSAR. Benchmarks reproduced in figure 2 demonstrate that over 90% of the time is spent in the collision and propagation stages. The communications and BBL account for a mere 2-6%. As expected, the increased clock speed of the T3E-1200 benefits the collision stage, which is a highly localised algorithm using basic arithmetic operators (typically, add/multiply). This routine has been highly optimised and makes good use of the T3E memory hierarchy. On the other hand, the memory-to-memory copies performed in the propagation stage do not benefit from this increase in clock speed. However, this stage could be optimised by grouping loops to make efficient use of the T3E streams. The most efficient (and portable) way to optimise this stage is by an appropriate re-ordering of the velocity set $\{\vec{c}_i\}$. Gains of up to 20% can be achieved this way.

                       T3E900-344                    T3E1200-576
PEs             16      32      64     128      32      64     128
BBL            0.8%    1.6%    1.6%    1.6%    1.0%    1.0%    1.0%
Halo swap      1.2%    1.8%    3.7%    4.6%    1.9%    3.8%    4.7%
Propagation   46.3%   44.6%   43.1%   43.4%   68.6%   64.2%   63.8%
Collision     51.6%   52.0%   51.6%   50.3%   28.5%   31.0%   30.5%

Figure 2: Fraction of time spent in each function

As shown in figure 3, Ludwig demonstrates linear scaling from 16 up to 128 processors. Results are presented for simulations with and without I/O. Clearly, the I/O is a major bottleneck (e.g. a $256^3$ system will generate in excess of 4.5 Gb for each data dump). This bottleneck became even more noticeable on the T3E-900, on which our consortium only had access to NFS-mounted volumes. Scaling figures have been normalised with respect to the time taken for the lowest number of processors (16 or 32 for the T3E-900 and T3E-1200 respectively). This explains the slight super-linear scale-up obtained for the scaling on the T3E-1200.

The I/O has been highly optimised by performing parallel I/O. The pool of PEs is split into $N$ groups of $p$ processors, thus providing $N$ concurrent I/O streams (typically, $N = 8$). Each group has a root PE which performs all I/O operations. The remaining $(p-1)$ PEs send their data in turn to the I/O PE, which packs these data and writes them to disk. This approach is known to produce the highest possible bandwidth without having to use platform-specific calls such as disk striping. Note that MPI2-IO had to be discounted on the grounds of performance.
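The grouping described above can be made concrete with a small bookkeeping sketch. It only computes which PE belongs to which I/O group and does no communication. It assumes that groups are made of consecutive ranks, that the first rank of each group acts as the root, and that the number of PEs is a multiple of the number of groups; the source does not state how Ludwig forms its groups, and the function names are hypothetical. In an MPI code a partition of this kind could be built with `MPI_Comm_split`.

```python
def io_layout(n_pes, n_groups):
    """Assign each PE rank to an I/O group and name the group's root PE.

    Assumes n_pes is a multiple of n_groups, so each group holds
    p = n_pes // n_groups consecutive ranks and the first rank is the root.
    """
    if n_pes % n_groups:
        raise ValueError("n_pes must be a multiple of n_groups")
    p = n_pes // n_groups
    layout = {}
    for rank in range(n_pes):
        group = rank // p
        layout[rank] = {"group": group, "root": group * p,
                        "is_root": rank % p == 0}
    return layout


def write_order(layout, group):
    """Ranks whose data the root of `group` packs and writes, in turn."""
    return [rank for rank, info in sorted(layout.items())
            if info["group"] == group]
```

With 128 PEs and 8 groups, `io_layout(128, 8)` yields eight root PEs, each collecting data from the 15 other members of its group, so eight files can be written concurrently.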

One has to conclude this discussion by deploring the lack of MPI2 single-sided communications in Cray's own MPI library. Indeed, this functionality would have considerably simplified (and sped up) communications, e.g., when updating the contents of the boundary sites in the halo region.


[Figure 3 plot: scale-up against number of PEs (N PEs) for $(128)^3$ and $(256)^3$ systems, each with and without I/O, compared with the line y = x.]

Figure 3: Scale-up graphs

4 Results

We briefly present results for two problems in binary fluid mixtures:

1. Establishment of the role of inertia in late-stage coarsening.

2. Study of the effect of an applied shear flow on the coarsening process.

We have simulated the late-stage coarsening process of a fully symmetric 0.5/0.5 mixture by applying a deep quench below its spinodal temperature. As phase separation occurs, the two domains form a bicontinuous structure with a sharp, well-defined interface (see figure 4).

Figure 4: Late-stage coarsening at T = {2000, 2600, 3400} (left to right); the isosurface denotes the fluid-fluid interface

This process involves capillary forces, fluid inertia and viscous dissipation, which are controlled through three physical parameters: the interfacial tension $\sigma$, the fluid mass density $\rho$, and the shear viscosity $\eta$.

As mentioned previously, a manifold of simulation parameters exists for each physical set, and careful parameter steering allows the main pitfalls (lattice artifacts and numerical instability) to be avoided. Of major interest is the characteristic domain scale $L$ (defined through the first moment of the composition autocorrelation in reciprocal space [1]). This can be scaled to dimensionless form as $l = L\rho\sigma/\eta^2$, with a similar procedure for time, using $\eta^3/(\rho\sigma^2)$ as its unit. Theory [6] predicts at $l, t \gg 1$ an inertial scaling $l \sim t^{2/3}$ (see [1] for a discussion). By combining several large runs with different parameter choices, our simulations covered an unprecedented range of length scales ($1 \lesssim l \lesssim 10^5$) and time scales ($10 \lesssim t \lesssim 10^8$). At very late times (beyond $t \sim 10^5$) we indeed recover the inertial scaling expected. The corresponding Reynolds number (as usually defined for this problem, $Re = l\,dl/dt$) attains values of a few hundred; we find no indication that $Re$ is self-limiting at values $Re \sim 100$ as suggested by Grant and Elder [7], which would lead to $l \sim t^{1/2}$ at large enough times.
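The conversion to reduced units and the Reynolds number defined above amount to a few lines of arithmetic. The sketch below assumes the length unit $L_0 = \eta^2/(\rho\sigma)$ and time unit $T_0 = \eta^3/(\rho\sigma^2)$, together with an optional time offset `T_int` as in the axis label of figure 5. The function names and the midpoint finite-difference estimate of $dl/dt$ are illustrative choices, not taken from the analysis code used for the paper.

```python
def reduced_units(L, T, eta, rho, sigma, T_int=0.0):
    """Convert a domain size L and time T (lattice units) to reduced l and t."""
    L0 = eta ** 2 / (rho * sigma)        # length unit
    T0 = eta ** 3 / (rho * sigma ** 2)   # time unit
    return L / L0, (T - T_int) / T0


def reynolds(l, t):
    """Finite-difference estimate of Re = l * dl/dt at the interval midpoints."""
    return [0.5 * (l[k] + l[k + 1]) * (l[k + 1] - l[k]) / (t[k + 1] - t[k])
            for k in range(len(l) - 1)]
```

For example, with `eta=2.0`, `rho=1.0` and `sigma=0.5` the units are $L_0 = 8$ and $T_0 = 32$, so `reduced_units(10.0, 100.0, 2.0, 1.0, 0.5)` gives `(1.25, 3.125)`. Because several parameter sets map to different $L_0$ and $T_0$, runs of the same lattice size can cover different windows of the reduced $l(t)$ curve.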

[Figure 5 plots. Left: domain size L (lattice units) against T (time steps), with Lmin and Lmax marked. Right: reduced length l = L / L0 against reduced time t = (T − Tint) / T0 on logarithmic axes, with reference slopes 2/3 and 1.]

Figure 5: L vs. T fits (left) and scaling plot in reduced variables (right)

Figure 5a demonstrates the fits with the exponent fixed at 1 and 2/3 (solid lines) and the free exponent fits (dashed lines), with best results for exponent values of 1.16 and 0.69, for two runs at the extremes of our range of simulated parameters.

Figure 5b illustrates the range of length and time scales covered by our simulation (solid lines) compared to that of other groups (squares, Ref. [18]; triangles, Ref. [16]; circles, Ref. [17]) and a comparison with other simulation data from recent DPD runs (inset, Ref. [15]). See Ref. [1] for further details.

In a separate study, our group clearly established the growing inhibitory effect that applying shear flow has on the coarsening process as shear is increased. As the system becomes anisotropic, one observes strong variations in the three characteristic length scales as well as the formation of an oriented domain texture (see figure 6a,b). Preliminary results suggest, however, that extreme anisotropy may not be enough to completely inhibit coarsening. A view down the streamlines shows the existence of a network of thin fluid necks, in the plane transverse to the flow, which allow coarsening to continue even once nearly all the interface is perpendicular to this plane (see figure 6c). This apparently contradicts the framework of Doi and Ohta's scaling theory [8], which assumes eventual saturation of the mean domain size at a scale set by the balance of viscous and interfacial tension terms. A further discussion of these preliminary results is presented in reference [2].

Figure 6: Oriented domain texture at T = {1000, 4000}; the shear is applied through the top plane moving left and the bottom plane moving right (left and centre). Narrow fluid necks connecting blocks of similar fluids; the block colours as in (a) and (b) show the identity of the two fluids, now on a vertical plane towards the back of the simulation cell (right)


5 Future plans

Some of the authors are currently working on the hydrodynamic simulation of multicomponent fluid flow in porous networks with controlled wetting; on the implementation of Lees-Edwards (sliding periodic) boundary conditions; on large-scale simulations under shear; and on the improvement of the gradients to make the thermodynamics of this model independent of the underlying symmetries of the lattice. Further plans include studying colloid hydrodynamics and extending Ludwig to study amphiphilic systems under shear (see [14] for an example of this studied by DPD).

The authors would like to acknowledge Simon Jury, Patrick Warren, and Julia Yeomans for valuable discussions. This work has been funded in part under the Maxwell Institute's project on `Fluid flow in soft and porous matter' and EPSRC E7 Grand Challenge (GR/M56234).


[1] V.M. Kendon, J.-C. Desplat, P. Bladon, and M.E. Cates, Phys. Rev. Lett. 83, 576 (1999).

[2] M.E. Cates, V.M. Kendon, P. Bladon, and J.-C. Desplat, Faraday Discussions 112, 1 (1999).

[3] P. Bladon, J.-C. Desplat, I. Pagonabarraga, and M.E. Cates, in preparation, to be submitted to Comp. Phys. Comm.

[4] Y.H. Qian, D. D'Humieres, and P. Lallemand, Europhys. Lett. 17, 479 (1992).

[5] A.J.C. Ladd, J. Fluid Mech. 271, 285 (1994).

[6] H. Furukawa, Phys. Rev. A 31, 1103 (1985).

[7] M. Grant and K.R. Elder, Phys. Rev. Lett. 82, 14 (1999).

[8] M. Doi and T. Ohta, J. Chem. Phys. 95, 1242 (1991).

[9] M.R. Swift, E. Orlandini, W.R. Osborn, and J. Yeomans, Phys. Rev. E 54, 5041 (1996).

[10] V. Kendon, Ph.D. Thesis, University of Edinburgh (1999).

[11] R.D. Groot and P.B. Warren, J. Chem. Phys. 107, 4423 (1997).

[12] We are grateful to Dr. P.B. Warren for a discussion on this point.

[13] J. Cahn, J. Chem. Phys. 66, 3667 (1977).

[14] S. Jury, P. Bladon, M.E. Cates, S. Krishna, M. Hagen, N. Ruddock, and P. Warren, Phys. Chem. Chem. Phys. 1, 2051 (1999).

[15] S.I. Jury, P. Bladon, S. Krishna, and M.E. Cates, Phys. Rev. E 59, R2535 (1999).

[16] M. Laradji, S. Toxvaerd, and O.G. Mouritsen, Phys. Rev. Lett. 77, 2253 (1996).

[17] S. Bastea and J.L. Lebowitz, Phys. Rev. Lett. 78, 3499 (1997).

