
5.5 Evaluation

5.5.2 Feasibility Test

This experiment demonstrates that an AA-enabled application is easy for end users to run (transparency), supports dynamic resource allocation (flexible execution), and supports wide-area distributed computing. A more complex experiment that demonstrates the AA's ability to support interactive and dynamic applications is given in Appendix D.

We chose a simple application, Mandelbrot computation, for the demonstration. The calculation of the Mandelbrot set is an excellent candidate for parallelization: the set of dots to be calculated can be divided up and allocated to different processors, and no communication is required between the processors to calculate the value for each pixel. The only communication required is gathering all of the results into a single processor so that the dots can be plotted.
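To illustrate why no inter-processor communication is needed, the following is a minimal sketch of a per-row computation, not the code used in the experiment. The function names escape_time and compute_row and the viewport [-2, 1] × [-1.5, 1.5] are assumptions made for this illustration.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Escape-time iteration count for one point c. Returns max_iter if the
// orbit has not escaped after max_iter steps.
int escape_time(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    for (int i = 0; i < max_iter; ++i) {
        if (std::norm(z) > 4.0) return i;  // std::norm is |z|^2, so this is |z| > 2
        z = z * z + c;
    }
    return max_iter;
}

// Computes one row of the image. The result depends only on the row index
// and the image parameters, never on any other row.
// The viewport [-2, 1] x [-1.5, 1.5] is an illustrative assumption.
std::vector<int> compute_row(int row, int width, int height, int max_iter) {
    assert(width > 1 && height > 1 && row >= 0 && row < height);
    std::vector<int> counts(width);
    const double im = -1.5 + 3.0 * row / (height - 1);
    for (int col = 0; col < width; ++col) {
        const double re = -2.0 + 3.0 * col / (width - 1);
        counts[col] = escape_time(std::complex<double>(re, im), max_iter);
    }
    return counts;
}
```

Because compute_row reads no shared state, rows can be evaluated in any order and on any processor; only the returned vectors need to be gathered for plotting.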

The Mandelbrot computation is a typical example of a traditional static parallel application (e.g. gmandel, http://gmandel.sourceforge.net/). The static partition method divides the workspace into n equal rectangles, where n is the number of processors, so that each processor computes a contiguous block of rows. Resources are allocated in advance of the execution and cannot change during it. Here we change the application by introducing a dynamic load balancing algorithm so that it adapts to the dynamic environment and is suited to flexible execution.
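The static partition can be sketched as follows. This is an illustrative sketch rather than gmandel's actual code; the function name static_row_partition and the handling of a remainder (the first few processors take one extra row) are assumptions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns, for each of n processors, the half-open row range [first, last)
// it computes under a static partition fixed before execution starts.
std::vector<std::pair<int, int>> static_row_partition(int rows, int n) {
    assert(rows >= 0 && n > 0);
    std::vector<std::pair<int, int>> ranges;
    ranges.reserve(n);
    const int base = rows / n;
    const int extra = rows % n;  // the first `extra` processors get one more row
    int first = 0;
    for (int p = 0; p < n; ++p) {
        const int count = base + (p < extra ? 1 : 0);
        ranges.emplace_back(first, first + count);
        first += count;
    }
    return ranges;
}
```

With 3000 rows and 30 processors, each processor receives 100 contiguous rows. The assignment cannot react to processors that join late or run slowly, which is the limitation the dynamic scheme addresses.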

The tested application draws a Mandelbrot set on a 3000 × 3000 dot canvas with magnification = 1.0. The benchmark iteration count is 500,000. The workspace is broken down into 3000 computing objects, each of which is responsible for computing one row. The objects are distributed to a number of “herder” processes. The processes compute the objects and send the results back to a control program, which monitors the execution progress. The canvas is updated from the top down. The application implements a simple dynamic load balancing policy: each process balances its load with its two neighbour processes every 10 seconds, as stated in Algorithm 3. One advantage of this local load balancing algorithm is that adaptation is performed without the participation of a central process, which is normally started on the user's local machine and might be far from the computation cluster. The algorithm distributes the data from a central node to the computing nodes only once, reducing the distribution overhead introduced by the master-worker algorithm, in which a master/central process distributes data to worker processes whenever they become idle. The maximum number of “herder” processes is expected to be 30. A fragment of the control program Man_control.cpp is shown in Listing 5.2.

Listing 5.2: Control program code

    #include <AA.h>
    ...
    // Start an AA.
    AA *aa = new AA(arge, 0);
    ...
    // Request to add 30 "herder" processes.
    for (int i = 0; i < 30; i++) {
        aa->AddProcess("Herder");
    }
    // Distribute objects to the first added process.
    while (true) {
        if ((aa->GetNotification(PROCESS_ADDED)) > 0) {
            // Get the process's UPID.
            int upid = aa->UpkInteger();
            // Distribute 3000 objects to the first process.
            ...
            break;
        }
        ...
    }
    ...
    while (true) {
        // More processes are being added for the computation.
        if ((aa->GetNotification(PROCESS_ADDED)) > 0) {
            int upid = aa->UpkInteger();
            // Assign the neighbour processes' UPIDs to the added
            // process for local load balancing.
            ...
        }
        // Receive results, draw the Mandelbrot set.
        if (aa->NReceiveFrom(WILDCARD) > 0) {
            ...
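The neighbour-based balancing policy described before the listing can be illustrated with a small sequential simulation. Algorithm 3 is not reproduced in this section, so the pairwise "move half the difference" rule below is an assumption standing in for it, and the names balance_round and total_load are hypothetical. In the real application each process would run its own step every 10 seconds rather than being swept in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One balancing round over a chain of processes, where load[i] is the number
// of unfinished objects held by process i. Each adjacent pair evens out its
// counts by moving half of the difference to the lighter side.
// (Assumed rule for illustration; not the thesis's Algorithm 3.)
void balance_round(std::vector<int>& load) {
    for (std::size_t i = 0; i + 1 < load.size(); ++i) {
        const int diff = load[i] - load[i + 1];
        const int moved = diff / 2;  // positive: i -> i+1, negative: i+1 -> i
        load[i] -= moved;
        load[i + 1] += moved;
    }
}

// Total number of objects across all processes; a balancing round only
// moves objects, so this value is unchanged by balance_round.
int total_load(const std::vector<int>& load) {
    return std::accumulate(load.begin(), load.end(), 0);
}
```

Starting from one process holding all 3000 objects and four newly added idle neighbours, repeated rounds spread the objects along the chain without any central coordinator taking part in the exchange.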

We started the application on a laptop connected to the Internet through a BT home router. The invocation was achieved simply by running ./Man_control, rather than through any job submission language. Once the control process was started, the AA was automatically invoked to deploy the virtual machine that runs the application. It started the related daemons on the local laptop and automatically placed PMWComms on the hierarchical domain frontends amy.cs.ucl.ac.uk, condor.cs.ucl.ac.uk and morecambe.cs.ucl.ac.uk in order to connect the cluster resources. Figure 5.14 depicts how the AA discovered the path to allocate resources and deploy processes in the two clusters.

According to the demand of the application, the AA contacted the Condor and SGE clusters to obtain resources and deployed processes on the allocated resources. The AA could not collect 30 resources immediately because of competition for resources from other users. Instead of reserving and waiting for resources, the AA started the execution immediately with the resources that were available and added resources to the execution on the fly. The dynamic resource configuration is shown in Figure 5.15. Figure 5.16 shows that, after the first process was added from the Condor cluster, the master sent all 3000 objects to it and started the execution (the number of remaining objects began to decrease). Later, 4 more resources (1 from the Condor cluster and 3 from the HPC cluster) became available, probably because other users had finished their work, and the AA immediately deployed 4 processes on them. Load from the first process was immediately migrated to the 4 new processes for load balancing. During the rest of the execution, more resources/processes were gradually added to the computation and load was continually migrated to the under-loaded processes. In total, the execution used 30 hosts, 12 from the Condor cluster and 18 from the HPC cluster, and took about 10 minutes.

The execution speed changed as more processes joined the computation. During the execution, the control program drew the Mandelbrot set on the fly as it received finished dots. The user sitting in front of the laptop watched the Mandelbrot set being drawn without any knowledge of how the underlying resources were managed.

[Figure 5.14 diagram labels: Application, AA, Internet; amy.cs.ucl.., condor.cs.ucl.., morecambe.cs.ucl..; Condor Cluster (Condor), HPC Cluster (SGE); Allocate resources, Deploy processes]

Figure 5.14: The AA deploys an application in the Condor and HPC clusters from a laptop.