5.5 Clouds versus other Infrastructures
6.3.4 Simulation versus Execution
The previous simulation experiments have shown that the combination of ASKALON and GroudSim can be used in a scalable way for two simple parallel workflows and different resources. To verify the correctness of the simulation environment, we executed a real-world workflow called Montage with 649 activities in the Austrian Grid and in a private Cloud based on the Eucalyptus middleware, running on hardware with performance similar to Amazon EC2 c1.medium instances, and compared the real execution trace with the simulated one.
Figure 6.10 shows Gantt charts of the real and simulated executions of a subset of the Montage workflow containing 27 activities (for readability reasons), generated using the ASKALON monitoring tool. The real execution took 1,180 seconds (about 20 minutes), while the workflow simulation needed four seconds to simulate the same execution and had a simulated runtime of 1,290 seconds (21.5 minutes). The experiment shows that the scheduling and the execution of the simulated workflow are comparable with the real execution. The runtime difference of 110 seconds is due to inaccuracies of the prediction service used to estimate the activity and file transfer execution times in the simulation; its error is within a 10% range.

Workflow   Number    Total number    Average activities   Real execution   Simulation
name       of runs   of activities   per run              time [hours]     time [min.]
Blender        432         131,501                  305            41.20          17.2
LinMod          62         100,042                1,620            24.89          12.1
Montage        158         359,946                2,300            78.40          58.5
PovRay         182          78,402                  430         1,156.98           9.4
Wien2k         673         207,256                  310           108.21          29.3
Total        1,507         877,293                  580         1,409.68         126.0

Table 6.1: Run-times of all executed workflows and their simulated executions.
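The 110-second gap between the real and simulated runs of the 27-activity Montage subset can be checked against the stated 10% bound with a few lines of arithmetic. The following is a minimal sketch; the function and variable names are illustrative assumptions and are not part of ASKALON or GroudSim.

```python
def relative_deviation(real_s: float, simulated_s: float) -> float:
    """Return the simulated runtime's deviation from the real one as a fraction."""
    return abs(simulated_s - real_s) / real_s


# Values reported in the text for the 27-activity Montage subset.
real_runtime_s = 1180       # real execution
simulated_runtime_s = 1290  # simulated runtime of the same execution

gap_s = simulated_runtime_s - real_runtime_s
deviation = relative_deviation(real_runtime_s, simulated_runtime_s)

print(f"gap: {gap_s} s, deviation: {deviation:.1%}")  # gap: 110 s, deviation: 9.3%
```

The deviation of about 9.3% is consistent with the 10% range attributed to the prediction service.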
The real workflows executed using the ASKALON environment last several hours in most cases. Many of these executions are synthetic runs for tuning the middleware and the underlying methods, a process that could be significantly shortened if simulation replaced the expensive executions on real hardware. Table 6.1 summarizes the execution of several workflow applications submitted to the Austrian Grid using ASKALON. Submitting about 900,000 activities to GroudSim leads to a simulation duration of 20 seconds when a simple simulation is run without any other overheads, i.e. scheduling and resource management. Executing the same number of workflows and activities using ASKALON and simulating their execution takes longer, about two hours. The major part of this time is consumed in the EE2 and its communication with the other services, and only a marginal part (30 seconds) represents simulation overhead. Because the simulator does not receive all activities of a workflow execution simultaneously, but has to execute them in the order given by the workflow and wait for EE2 and the scheduler, it cannot reach its peak performance and requires 50% longer than when using synthetic jobs.
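The ordering constraint mentioned above, where activities reach the simulator only once their predecessors have completed, can be illustrated with a dependency-driven release of activities. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later); the toy workflow and its activity names are hypothetical and do not reflect how EE2 or the ASKALON scheduler are implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical toy workflow: each activity maps to the set of
# activities it depends on (its predecessors).
workflow = {
    "project_1": set(),
    "project_2": set(),
    "diff": {"project_1", "project_2"},
    "add": {"diff"},
}


def release_in_waves(dependencies):
    """Return the activities grouped into waves that can be submitted together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Only activities whose predecessors are all done are ready.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave as finished so successors become ready.
        sorter.done(*ready)
    return waves


waves = release_in_waves(workflow)
print(waves)  # [['project_1', 'project_2'], ['diff'], ['add']]
```

Because each wave must finish before the next one is released, the simulator never sees the full set of activities at once, unlike in the synthetic case where all jobs are submitted up front.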
As the executed workflows were all different and of many different sizes, we approximate the simulated workflow size for our evaluation by the average number of activities per workflow, as listed in Table 6.1. Analyzing the sum of all executions leads to a total average of 580 processed activities per executed workflow and 310 to 2,300 activities for the individual workflows. Simulating these executions took 2.5 to 19 seconds per workflow run, leading to an overall simulation runtime of 126 minutes for the complete set of workflows (more than 1,500 workflow runs). This results in a speedup of about 700 compared to the real execution on the Grid and Cloud, which took more than 1,400 hours (58 days). The presented approach can therefore significantly reduce the time to validate new research ideas in the area of scheduling or resource provisioning, and it closes the gap between developing simulation scenarios and porting them, once evaluated successfully, to the real system.
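The overall speedup follows directly from the totals in Table 6.1. The sketch below recomputes it, together with per-workflow speedups, from the real execution times (hours) and simulation times (minutes) listed in the table; the variable names are illustrative only.

```python
# (real execution time in hours, simulation time in minutes), from Table 6.1.
runtimes = {
    "Blender": (41.20, 17.2),
    "LinMod": (24.89, 12.1),
    "Montage": (78.40, 58.5),
    "PovRay": (1156.98, 9.4),
    "Wien2k": (108.21, 29.3),
}
total_real_hours = 1409.68   # "Total" row of Table 6.1
total_sim_minutes = 126.0    # "Total" row of Table 6.1


def speedup(real_hours: float, sim_minutes: float) -> float:
    """Ratio of real execution time to simulation time, both in minutes."""
    return real_hours * 60 / sim_minutes


overall_speedup = speedup(total_real_hours, total_sim_minutes)
real_days = total_real_hours / 24
per_workflow = {name: speedup(h, m) for name, (h, m) in runtimes.items()}

print(f"overall speedup: {overall_speedup:.0f}")  # overall speedup: 671
print(f"real execution: {real_days:.1f} days")    # real execution: 58.7 days
for name, value in per_workflow.items():
    print(f"{name}: {value:.0f}")
```

The computed overall ratio of roughly 671 matches the "about 700" quoted in the text, and the per-workflow values show that the speedup varies widely between applications.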
6.4 Related Work
GridSim [148] is a simulation toolkit for resource modeling and application scheduling for Grid computing. It uses SimJava [95], a process-based discrete event simulation package, as its underlying simulation framework. As it runs a separate thread for each entity in the system, it has poor runtime performance compared to GroudSim [120]. Evaluation results show that the toolkit suffers from memory limitations when simulating more than 2,000 Grid sites concurrently on a given machine. CloudSim [25] is an extension of GridSim for modeling and simulating Cloud infrastructures and shows the same scalability problems.
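To make the contrast with a thread-per-entity design concrete, the following minimal sketch shows a single-threaded discrete-event loop in which entities are plain callbacks driven by one time-ordered event queue. It is an illustration under our own assumptions only: the class `EventQueueSimulator` and the helper `make_job` are hypothetical and do not correspond to the API of GroudSim, GridSim, or SimJava.

```python
import heapq
import itertools


class EventQueueSimulator:
    """Single-threaded discrete-event loop: entities are callbacks, not threads."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        # Monotonic counter breaks ties between events with equal timestamps.
        self._counter = itertools.count()

    def schedule(self, delay, action):
        """Register `action(sim)` to fire `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self):
        """Process events in timestamp order, advancing the simulated clock."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)


finished = []


def make_job(name, runtime):
    """Build a start event that schedules the job's completion event."""
    def start(sim):
        sim.schedule(runtime, lambda s: finished.append((name, s.now)))
    return start


sim = EventQueueSimulator()
for index in range(3):
    sim.schedule(0.0, make_job(f"job{index}", runtime=10.0 * (index + 1)))
sim.run()

print(finished)  # [('job0', 10.0), ('job1', 20.0), ('job2', 30.0)]
```

In such a design the memory cost per entity is a few queue entries rather than a thread stack, which is one plausible reason why an event-queue simulator can scale to more simulated entities than a thread-per-entity one.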
SimGrid [27] is a simulation framework for evaluating cluster, Grid, and peer-to-peer algorithms and heuristics. Its approach is comparable to the one used in GroudSim, but it uses C instead of Java as the main development language, which makes its integration with existing Java tools and services such as the ASKALON Grid application development and computing environment [47] more difficult. We integrated GroudSim into ASKALON to allow the user to easily switch from simulations to real executions within the same graphical user interface. Integrating the C-based SimGrid into ASKALON using the Java Native Interface technology would break the "compile once, run everywhere" advantage of Java. Our goal is a simulation framework that can be run on any architecture directly from the browser using Java Web Start technology, which is not possible with C code that needs to be compiled for each architecture and operating system separately. This and other integration reasons, such as user interfaces for configuring the simulation and direct communication within a Java container to eliminate Web services overheads, led us to the decision to develop a new Java-based simulator. Furthermore, since SimGrid does not address the simulation of Cloud infrastructures, we were unable to compare its performance with GroudSim.
GridFlow [26] is an agent-based Grid middleware that supports execution and simulation of workflows, but it has important limitations compared to our approach. GridFlow simulates a workflow before it is executed to estimate the schedule time, and it supports neither Cloud resources nor modification of the scheduling scenarios, because the environment parameters are used as they are. Our approach allows reading the system state from an information service (like the Monitoring and Directory Service) to provide similar scenarios, and customizing the resource set using a graphical user interface.
The simulation framework presented in [62] is used for trace-based simulations, does not support Cloud environments, and is not tied to a real execution framework. In contrast to our approach, it provides no easy way to verify the correctness of the simulation results.
There are many more Grid-related simulation frameworks, such as OptorSim [13], CasSim [171], and GangSim [42], which are more specialized than our generic approach. Our approach covers a broader range of possible simulation scenarios, including Clouds and real executions.