
RANDOM WAVE GENERATION ON PARALLEL COMPUTERS


8.5 Transim simulation

In addition to the ad hoc data-flow analysis by timing probes described in the previous section, the whole parallel algorithm was simulated using the proprietary package Transim (Hart and Flavell 1990). The main advantage of such a package is that distributed algorithms more extensive than the available hardware can be analysed. New ideas that the programmer feels could improve the performance of a distributed program can also be tried first in Transim. An additional bonus is that Transim allows debugging statements to be incorporated in the program; these produce information lines in a log file.

To prepare a simulation, the major independent computational units should be identified, together with the communications required between them. The computational units must then be timed in terms of CPU usage, and the communication buffer lengths must be specified. A skeletal program comprising only the computational units (expressed as time delays), the channel communications and the buffer lengths can then be fed into the package. The output of Transim contains the total elapsed time of the simulated run of the user's program, as well as profiles of the run at regular time intervals. The profiles, which show the percentage of the total time used by the individual processes and by channel communications, are useful in identifying deadlocks and idle units.
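The idea of a skeletal program, in which computation is reduced to time delays, can be illustrated with a toy model written in Python. This is not Transim input and makes no use of Transim's language; the function name, its parameters and the single first-come-first-served collector are assumptions of this sketch, which also ignores buffer lengths and any blocking of the workers.

```python
def simulate_skeleton(n_workers, production_ticks, service_ticks, packets_per_worker):
    """Toy skeleton: workers are pure time delays feeding one FIFO collector.

    Returns (total elapsed ticks, fraction of that time the collector is busy).
    All names and the queueing model are illustrative assumptions.
    """
    # Each worker emits a packet after every production delay; transmission
    # to the collector is taken to be instantaneous.
    arrivals = sorted(
        (k + 1) * production_ticks
        for _worker in range(n_workers)
        for k in range(packets_per_worker)
    )
    clock = 0  # time at which the collector becomes free
    busy = 0   # total ticks the collector spends serving packets
    for arrival in arrivals:
        start = max(clock, arrival)    # wait for the packet or for the collector
        clock = start + service_ticks  # serve the packet
        busy += service_ticks
    utilisation = busy / clock if clock else 0.0
    return clock, utilisation


if __name__ == "__main__":
    elapsed, utilisation = simulate_skeleton(
        n_workers=6, production_ticks=6000, service_ticks=500, packets_per_worker=10
    )
    print(f"elapsed = {elapsed} ticks, collector busy {utilisation:.0%} of the time")
```

A utilisation close to 1 in such a model points to the collector as the bottleneck, while a low value points to idle time at the collection point, which is the kind of information the Transim profiles are used for.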

Calibration.

Transim, as mentioned above, requires timings of the independent computational units and the buffer lengths. The latter can easily be provided if the transputer code already exists. The timings can be estimated from sample runs of the program. Thus the time required to calculate elevations for $N$ grid points, for a fixed number of wave directions, per timestep, can be found by solving the equation

$$T_1 + T_2 N = T_{\mathrm{production}} \qquad (8.2)$$

where $T_{\mathrm{production}}$ is the production time per packet, which can be measured as described in the previous section. Two measurements of $T_{\mathrm{production}}$ suffice to estimate the unknowns $T_1$ and $T_2$. Similarly, the delay at the collection point can be timed from the equation:

T3+ _ i^ tj^ (8.3)

where the arrival timing is again measured from previous experiments, for two cases. Using these equations, unit timings were obtained from three calibration trials which were "normal" cases (i.e. free of flow disruption). The production times in these trials ranged between approximately 4800 and 7200 ticks per packet. With all 6 transputers producing exactly identical parts of the sea, one packet would try to enter the root transputer every 800 to 1200 ticks (on average, and ignoring the fact that the arrivals are clustered). This should match the slope of Figure 8.9. Note that the figure of 930 is not the (average) time between entrances of packets into the receiver: it is the time one particular packet spends on the last stage of its journey. The transmission of a packet through a multiplexor was assumed to be instantaneous. The values found for the timings were (in low-priority timer ticks; multiply by 64 to obtain microseconds, to be used in Transim's SERV):

$T_1$    $T_2$    $T_3$    $T_4$

34       29       930      0.5
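The calibration arithmetic can be sketched in Python. The sketch assumes that the production time is linear in the number of grid points, with a fixed overhead and a per-point cost; the function names and the two sample measurements in the usage lines are hypothetical and are not taken from the calibration trials. The tick length of 64 microseconds and the division by the number of transputers follow the text.

```python
TICK_MICROSECONDS = 64  # one low-priority timer tick, as stated in the text


def fit_two_point(n_a, t_a, n_b, t_b):
    """Solve overhead + per_point * n = t from two measurements (n, t).

    Assumes a linear relation; returns (overhead, per_point) in ticks.
    """
    if n_a == n_b:
        raise ValueError("the two measurements need different grid sizes")
    per_point = (t_b - t_a) / (n_b - n_a)
    overhead = t_a - per_point * n_a
    return overhead, per_point


def ticks_to_microseconds(ticks):
    """Convert low-priority timer ticks to microseconds."""
    return ticks * TICK_MICROSECONDS


def mean_packet_interval(production_ticks, n_workers):
    """Average spacing of packets at the root when all workers produce at the same rate."""
    return production_ticks / n_workers


if __name__ == "__main__":
    # Hypothetical measurements: (grid points, ticks per packet)
    print(fit_two_point(100, 2934, 200, 5834))
    print(ticks_to_microseconds(930))
    print(mean_packet_interval(4800, 6), mean_packet_interval(7200, 6))
```

The last line reproduces the range quoted above: production times of 4800 and 7200 ticks shared among 6 transputers give one packet every 800 to 1200 ticks on average.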

Programming.

Version 3.2(7) of Transim, as used here, accepts an input program only in quasi-Occam; the Fortran code was therefore recast in this language. This gave rise to a difficulty, inasmuch as the target program used 3L Fortran's threads. Threads are parallel processes which may share data with the main thread and with each other by accessing data placed in COMMON. A system of semaphores exists to regulate access to the data. The semaphores are also variables in COMMON, which the threads may read and modify. Access regulation for the semaphores themselves is presumably handled by the package. Since there is no comparable mechanism in Occam (in fact, sharing of variables is strongly discouraged), it was necessary to emulate the semaphores by some explicit Occam code. This emulation does not cover all the functions of a semaphore but is more in the nature of a Boolean flag variable, shared, set and reset by the processes which require it. This facility is used, for example, in the swinging buffer operation between the production thread and the sender thread, as discussed above. In this way, our Transim code has been kept as close to the original 3L Fortran as possible.

The quasi-Occam algorithm devised to resemble the threads and semaphores of 3L Fortran, as they are used in our distributed program, is given below. The code is stripped of the parts that are irrelevant to the Boolean flag mechanism. It should be noted that the purpose of the WAIT(1) that appears in the checking of the values of the Booleans is to momentarily deschedule the running thread (process), so as to allow its parallel counterpart to resume execution and possibly change the flag before the latter is checked again.
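As an aside before the listing, the flag-polling idea can be illustrated outside Transim with Python threads. This is a minimal sketch and not the code used in the simulation: the function name, the two-slot buffer and the `current_slot` entry are assumptions, and `time.sleep` merely stands in for the descheduling role of WAIT(1). Only the two flag names and their initial FALSE values are taken from the listing.

```python
import threading
import time


def run_swinging_buffer(n_packets, poll_delay=0.001):
    """Producer and sender share a two-slot buffer, coordinated by Boolean flags."""
    state = {"sender_can_start": False, "sender_finished": False, "current_slot": 0}
    buffers = [None, None]
    sent = []

    def producer():
        for k in range(n_packets):
            slot = k % 2
            buffers[slot] = k  # fill one slot while the other may still be being sent
            if k > 0:
                # Poll the flag, yielding each time so the sender can run.
                while not state["sender_finished"]:
                    time.sleep(poll_delay)
                state["sender_finished"] = False
            state["current_slot"] = slot
            state["sender_can_start"] = True  # hand the filled slot to the sender

    def sender():
        for _ in range(n_packets):
            while not state["sender_can_start"]:
                time.sleep(poll_delay)
            state["sender_can_start"] = False
            sent.append(buffers[state["current_slot"]])  # "send" the packet
            state["sender_finished"] = True

    threads = [threading.Thread(target=producer), threading.Thread(target=sender)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent


if __name__ == "__main__":
    print(run_swinging_buffer(5))
```

As in the emulation described above, the flags are plain shared Booleans rather than full semaphores, so correctness relies on each flag being set by one process and reset by the other in a fixed order.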

BOOL sender.can.start, sender.finished:
SEQI worker  -- workers numbered from 1 to 6
{
  sender.can.start := FALSE
  sender.finished := FALSE
  work := 1
  PAR
  {