6.3 Performance Evaluation

6.3.1 Benchmarking Environment

Initial studies of AKAROA's performance were carried out on a local computer network (a multiprocessor SUN server with two SPARC CPUs, and various SUN 4 and SUN SPARC workstations) connected by 10 Mbps Ethernet. Apart from the obvious differences in processing power between the workstations available for our investigation, none of the machines was dedicated to AKAROA's use. Specifically, they were used concurrently by our stage 3 students (numbering approx. 55), the honours class (14), research students, and lecturers of the Department of Computer Science. Furthermore, the SUN server was used by several other departments in addition to ours. Our approach to evaluating AKAROA in this environment was to conduct all single-processor (P = 1) simulation experiments on our fastest machine during low-load periods (at night), whilst executing multi-machine experiments with AKAROA on a mix of the fast and lower-rated workstations during periods when higher competition for their use was expected. In this manner, we expect our results for speedup by AKAROA to be on the pessimistic (and hence safe) side. For added safety, all processes of AKAROA were executed at lower priorities for P > 1 cases than for single-processor runs.

The main performance measures considered were: real-time speedup (the ratio of the time needed to achieve an estimate at a given precision level on a single processor to the time required to achieve it on P processors), CPU-time speedup (the ratio of the CPU time needed to produce an estimate at a given precision level on a single processor to the CPU time required to achieve it on P processors), and coverage (the frequency with which the confidence intervals produced by the SA-PTS method contain the true parameter at a given confidence level). Analytically, we can write

$$\text{speedup}(P) = \frac{\text{simulation time on one processor}}{\text{simulation time on } P \text{ processors}} \tag{6.1}$$

$$\text{CPU-time-speedup}(P) = \frac{\text{CPU time on one processor}}{\text{average CPU time on } P \text{ processors}} \tag{6.2}$$

$$\text{coverage}(P) = \frac{\text{no. of } P\text{-processor experiments giving a CI that enclosed the true parameter}}{\text{total number of experiments using } P \text{ processors}} \tag{6.3}$$
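
The three ratios in Equations (6.1) to (6.3) can be sketched in a few lines of Python. This is only an illustration of the definitions, not code from AKAROA: the function names are hypothetical, and the timings and intervals in the usage lines are made-up numbers chosen to make the arithmetic easy to follow.

```python
from statistics import mean


def speedup(time_one: float, time_p: float) -> float:
    """Eq. (6.1): wall-clock time on one processor over wall-clock time on P."""
    return time_one / time_p


def cpu_time_speedup(cpu_one: float, cpu_per_processor: list[float]) -> float:
    """Eq. (6.2): CPU time on one processor over the average CPU time on P."""
    return cpu_one / mean(cpu_per_processor)


def coverage(intervals: list[tuple[float, float]], true_value: float) -> float:
    """Eq. (6.3): fraction of confidence intervals that enclose the true value."""
    hits = sum(1 for low, high in intervals if low <= true_value <= high)
    return hits / len(intervals)


# Illustrative, made-up measurements (not results from the experiments).
print(speedup(120.0, 40.0))                               # 3.0
print(cpu_time_speedup(100.0, [30.0, 20.0, 25.0, 25.0]))  # 4.0
print(coverage([(9.0, 11.0), (10.2, 11.5), (9.5, 10.5), (8.0, 9.9)], 10.0))  # 0.5
```

Note that Equation (6.2) divides by the average CPU time per processor, so it ignores idle and waiting time that the wall-clock ratio of Equation (6.1) includes.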

The speedup measures gauge AKAROA's potential for delivering results faster than simulations executed on a single uniprocessor machine. speedup(P) represents the reduction in real time (as observed by the user) achievable through the use of P processors to execute the simulation. Thus speedup(P) accounts for the overhead incurred in parallel processing, including the time required for process creation, management, and inter-machine interprocess communication (IPC), as well as the delays caused by non-AKAROA processes of other users sharing the machines we used. In comparison, CPU-time-speedup(P) measures improvement in terms of the reduction in computation time per machine engaged in the simulation. It thus measures the speedup attainable if IPC and the effects of non-AKAROA processes were negligible.

Whereas the speedup indexes report AKAROA's speed in producing estimates to a level of precision desired by the user, coverage(P) measures the quality of the resulting estimators when obtained using P workstations. All presented results were obtained during steady-state simulations of M/M/1/∞ queuing systems with a traffic load of 90%; each experiment was repeated 200 times. The parameter estimated by simulation is the mean delay experienced by a customer in the system. In all experiments, a level of precision of 5% or better was required of the final estimates. The level of confidence required was 95%. This means that the half-width of the confidence intervals for the mean delay produced by each experiment should be no greater than 5% of the estimated value, and that the confidence intervals produced should be correct (i.e. contain the true value of the expected delay) in 95% of experiments. Much is known analytically about the M/M/1/∞ system, including the true value of the expected delay. Hence the correctness of the confidence intervals for the delay obtained by our simulation experiments could be checked, and an estimate of the coverage obtained.
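
As a rough sketch of the two checks described above, the Python below computes the analytical mean delay of an M/M/1 queue with unlimited waiting room and tests whether a confidence interval meets the 5% relative precision target. Several things here are assumptions rather than details from the source: the function names, the service rate of 1.0, the use of a normal quantile for the interval (the SA-PTS method may compute its half-width differently), and the standard error values in the usage lines. Two delay formulas are given because "delay" may refer either to total time in the system or to waiting time in the queue only.

```python
from statistics import NormalDist


def mm1_mean_time_in_system(rho: float, service_rate: float = 1.0) -> float:
    """Mean time in system for M/M/1: W = 1 / (mu * (1 - rho))."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("traffic load must satisfy 0 <= rho < 1")
    return 1.0 / (service_rate * (1.0 - rho))


def mm1_mean_wait_in_queue(rho: float, service_rate: float = 1.0) -> float:
    """Mean waiting time in queue for M/M/1: Wq = rho / (mu * (1 - rho))."""
    return rho * mm1_mean_time_in_system(rho, service_rate)


def relative_half_width(estimate: float, std_error: float,
                        confidence: float = 0.95) -> float:
    """Half-width of a normal-theory CI, relative to the point estimate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return z * std_error / abs(estimate)


def precision_reached(estimate: float, std_error: float,
                      max_relative: float = 0.05,
                      confidence: float = 0.95) -> bool:
    """True when the relative half-width is within the required precision."""
    return relative_half_width(estimate, std_error, confidence) <= max_relative


# Hypothetical usage with made-up standard errors.
true_delay = mm1_mean_time_in_system(0.9)   # approximately 10 service times
print(precision_reached(10.0, 0.25))        # True: 1.96 * 0.25 / 10 is about 0.049
print(precision_reached(10.0, 0.30))        # False: 1.96 * 0.30 / 10 is about 0.059
```

With the true delay known, each experiment's interval can be tested for whether it encloses that value, which is what the coverage measure counts.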

Geometric checkpoint distribution (the default) and constant checkpoint distribution options are supported by AKAROA. Intuitively, geometrically distributed checkpoints would bias speedup results in favour of sequential experiments, since fewer checkpoints are expected to be reached than in parallel simulations, so less computation is required. Also, in parallel simulations, communication between processes on different machines would be performed at each checkpoint, unlike in sequential simulations, where interprocess communication is never needed. On the other hand, since parallel simulations result in more checkpoints being reached, and the precision of estimators is computed at each checkpoint, we would expect parallel simulations to stop at a level of precision closer to the minimum requirement than a corresponding sequential simulation, which we expect to be more likely to 'overshoot', stopping when the precision of estimates has exceeded the target level. These competing effects cloud our ability to apply intuition in deducing whether the geometric checkpoint option favours simulations executed sequentially or in parallel, and consequently it is hard to see in advance its effects as a function of the degree of parallelism employed. We considered both the geometric checkpoint distribution (default) and the constant checkpoint distribution versions of AKAROA.

In document Automating Parallel and Distributed Quantative Stochastic Simulation (Page 156-158)