It can be shown [Care 1987a] that (4.1) is the most general form of potential for nearest neighbour interactions.

4.4 CHOICE OF PARALLEL ALGORITHM

Monte Carlo simulations are demanding of computer resources. The use of parallel computing is seen as a convenient way of obtaining the necessary computer power. Parallel computers are defined as computers having processors working simultaneously under the control of an application programme [Fincham 1987]. Such computers can be broadly classified as either SIMD (Single Instruction stream/Multiple Data stream) or MIMD (Multiple Instruction stream/Multiple Data stream).

SIMD machines run the same code on each processor, although the data on each may be different. An example of a SIMD computer is the Distributed Array Processor (DAP). Lattice simulations are suited to the DAP as lattice sites can be identified directly with processing elements, although, in our case, the existence of the amphiphile chains will make the algorithm complex in practice.

Unlike SIMD, MIMD machines are able to run different code on each processor simultaneously. This makes them very versatile, providing the opportunity to use different techniques in exploiting parallelism. An example of a MIMD machine is one built from Inmos transputers. The transputer is a high performance microprocessor designed specifically for concurrent processing. Each transputer has four links enabling "arrays" of transputers to be connected together in various topologies. Each transputer processes its own code, communicating data to and from other processors via the bi-directional links as necessary. The transputer also has the advantage of its own concurrent programming language based on the occam model of concurrency. The transputer is being used to provide the computing power required in the simulation considered here both because of its suitability and because it is readily available.

In many problems suitable for parallel programming there are several distinct ways to obtain concurrency and it is not always obvious which method will be best suited to the particular problem. As very significant speed increases can be obtained by optimising the parallelism, it is necessary to evaluate the different options available before starting to write the code.

There are three common categories of parallel algorithms which are used in scientific computing: processor farms, geometric parallelism and algorithmic parallelism. Both the geometric and algorithmic methods of parallelism take advantage of the inherent parallelism in the Metropolis Monte Carlo technique.

In the Monte Carlo simulation a molecule is taken at random and an attempted move is made. The move is accepted or rejected according to the change in energy of the system. As we have only nearest neighbour interactions this energy change can be calculated by examining only a small part of the lattice. Thus, in principle, there is no reason why moves may not be attempted on two or more non-interacting molecules concurrently.
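The acceptance step described above can be sketched as follows. This is a minimal illustration of the standard Metropolis criterion, not the thesis code: the function name `metropolis_accept`, the parameter `kT` (thermal energy in the same units as the energy change) and the optional `rng` argument are assumptions introduced here. The energy change `delta_e` is taken as given, since the text notes it depends only on a small part of the lattice.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Return True if a trial move with energy change delta_e is accepted.

    delta_e and kT are hypothetical names; both must be in the same units.
    rng is any object with a random() method returning a float in [0, 1).
    """
    if kT <= 0.0:
        raise ValueError("kT must be positive")
    if delta_e <= 0.0:
        # Moves that lower the energy (or leave it unchanged) are always accepted.
        return True
    # Uphill moves are accepted with the Boltzmann probability exp(-delta_e / kT).
    return rng.random() < math.exp(-delta_e / kT)
```

Because the decision needs only `delta_e`, which is local for nearest neighbour interactions, the same test can in principle be applied independently to molecules whose neighbourhoods do not overlap.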

In considering parallel algorithms and architectures a distinction is made between processes and processors. In the occam model a system is decomposed into parallel processes. Each processor (transputer) is able to run one or more processes.

4.4.1 Processor Farm

In order to run a Metropolis Monte Carlo simulation on a processor farm, each processor runs the same programme but with different starting conditions. If the time for thermalisation is $T_t$ and the production time on a single processor is $T_p$, the total effective simulation time on a farm of N processors is $T_F$, where

$$T_F = T_t + \frac{T_p}{N} \qquad (4.31)$$

This method has the advantage that only a small time is spent on communications, but the fixed thermalisation time limits the value of the method.
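The effect of the fixed thermalisation time in equation (4.31) can be seen numerically. The sketch below simply evaluates that equation; the function names `farm_time` and `farm_speedup` and the example values are assumptions made for illustration, and communication costs are ignored, as they are in (4.31).

```python
def farm_time(t_therm, t_prod, n_procs):
    """Effective simulation time on a farm of n_procs processors, eq. (4.31)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    # Every processor thermalises its own configuration; only production is shared.
    return t_therm + t_prod / n_procs


def farm_speedup(t_therm, t_prod, n_procs):
    """Speed-up of the farm relative to a single processor."""
    return farm_time(t_therm, t_prod, 1) / farm_time(t_therm, t_prod, n_procs)


if __name__ == "__main__":
    # Hypothetical case with equal thermalisation and production times.
    for n in (1, 2, 8, 64):
        print(n, farm_time(1.0, 1.0, n), round(farm_speedup(1.0, 1.0, n), 3))
```

With equal thermalisation and production times the speed-up can never reach 2, however many processors are added, because the thermalisation term is not divided by N.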

4.4.2 Geometric Parallelism

If geometric parallelism is used for this problem, the lattice is divided into cells and each cell is assigned to a different processor. The transputers must communicate in order to move amphiphiles which lie at the boundary of two or more cells. The simulation time for N processors in a geometric array will be

$$T_G = \frac{1}{N\,\mathrm{Eff}}\,(T_t + T_p) \qquad (4.32)$$

where Eff is the efficiency of the geometric algorithm and is a function of N. Note that 0 < Eff < 1.

If we assume that $T_t = T_p$, the geometric algorithm will be superior to the farm method provided the time given by (4.32) is less than that given by (4.31), that is, provided

$$\mathrm{Eff} > \frac{2}{N + 1} \qquad (4.33)$$

Unfortunately, for the model considered here, the amphiphiles will form clusters and this may cause severe load matching problems in a geometric array. Thus one transputer may be responsible for the Monte Carlo moves of significantly more amphiphiles than its neighbouring transputers. To overcome these problems a form of algorithmic parallelism was considered.

4.4.3 Algorithmic Parallelism

The simulation algorithm was decomposed such that the selection of the amphiphiles, the testing of the Monte Carlo criterion and the subsequent updating of the amphiphiles on the lattice were separate processes. The lattice was placed on a single transputer along with the selection and update processes, whilst the MC process was replicated on a farm of transputers. Each selected amphiphile was taken from the lattice along with its nearest neighbour sites and was sent to a vacant MC process, where a move was attempted. The updated section of the lattice was then returned to update the main lattice. Assuming enough transputers in the MC farm, the program speed would be limited by the time taken to remove and add amphiphiles to the lattice.

In practice, there were overheads involved with the selection of amphiphiles as amphiphiles with an overlapping region of nearest neighbour interaction could not be processed simultaneously. This meant that the selection process took a large amount of time compared to the time taken to do the MC test. For this reason the program was abandoned and an algorithm based on the processor farm method described above was adopted.
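The selection constraint that caused this overhead can be illustrated with a short sketch. It is not the occam program described above: it assumes a periodic two-dimensional square lattice, represents each amphiphile as a tuple of occupied sites, and uses the hypothetical names `interaction_region` and `select_batch`. The selector greedily accepts an amphiphile only if its chain sites and their nearest neighbours are disjoint from every region already claimed.

```python
def interaction_region(chain, size):
    """Sites occupied by a chain plus their nearest neighbours.

    Assumes a periodic square lattice of side `size`; chain is a tuple of (x, y).
    """
    region = set()
    for x, y in chain:
        region.update({
            (x, y),
            ((x + 1) % size, y),
            ((x - 1) % size, y),
            (x, (y + 1) % size),
            (x, (y - 1) % size),
        })
    return region


def select_batch(candidates, size, max_batch):
    """Greedily pick amphiphiles whose interaction regions do not overlap."""
    claimed = set()  # lattice sites already handed out to an MC process
    batch = []
    for chain in candidates:
        region = interaction_region(chain, size)
        if claimed.isdisjoint(region):
            batch.append(chain)
            claimed |= region
            if len(batch) == max_batch:
                break
    return batch


if __name__ == "__main__":
    chains = [((0, 0), (1, 0)), ((2, 0), (3, 0)), ((5, 5), (5, 6))]
    # The second chain neighbours the first, so it must wait for a later batch.
    print(select_batch(chains, size=10, max_batch=4))
```

Every candidate costs a set comparison against all claimed sites, and in a clustered system many candidates are rejected, which is consistent with the selection step dominating the much cheaper MC test.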

In document Lattice models of amphiphile and solvent mixtures (Page 97-101)