The Virtual Grid Application Development Software (VGrADS) Project

“VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance”

VGrADS Goal: Distributed Problem Solving

• Where We Want To Be
o  Transparent computing
–  In an increasingly distributed space
–  Applications to HPC
–  Applications to cloud computing

• Where We Were (circa 2003)
o  Low-level hand programming
o  Programmer had to manage:
–  Heterogeneous resources
–  Scheduling of computation and data movement
–  Fault tolerance and performance adaptation

• What Progress Have We Made?
o  Separate application development from resource management
–  VGrADS provides a uniform “virtual grid” abstraction atop widely differing resources
o  Provide tools to bridge the gap
–  Scheduling, resource management, distributed launch, simple

Overview of SCʼxx Activities for VGrADS

• Built on previous SC demonstrations
o  Gradually built up the system to handle the LEAD workflow
o  Previous years focused on improved performance estimates, scheduling methods, and fault tolerance
o  Use LEAD as an application driver

• Current status
o  VGrADS integrates HPC and cloud resources
–  Using TeraGrid (HPC), Amazon EC2 (cloud), Eucalyptus (cloud) resources
–  Using reservations, batch queues, and on-demand clouds
o  Scheduling for balancing deadlines, reliability, and cost
–  vgES supports search for best set of resources
–  Application-specific trade-offs of reliability, time, cost
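
One way to picture a reliability, time, and cost trade-off is the small sketch below. Everything in it is an assumption made for illustration: the `Candidate` record, its fields, and the rule "cheapest candidate that meets the deadline and a reliability floor" are hypothetical and are not taken from vgES, whose search method these slides do not describe.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical resource set offered to the application."""
    name: str
    est_minutes: float   # estimated time to finish the work
    reliability: float   # estimated probability of success, 0..1
    cost: float          # estimated cost in arbitrary units


def pick_resources(candidates: Iterable[Candidate],
                   deadline_minutes: float,
                   min_reliability: float) -> Optional[Candidate]:
    """Return the cheapest candidate that meets both constraints, or None."""
    feasible = [c for c in candidates
                if c.est_minutes <= deadline_minutes
                and c.reliability >= min_reliability]
    # Break cost ties in favour of the more reliable candidate.
    return min(feasible, key=lambda c: (c.cost, -c.reliability), default=None)
```

A different application could rank the same candidates by another rule (for example, most reliable within a budget) by changing only the filter and the sort key.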

VGrADS Components

o  Virtual Grid Execution System (vgES)
–  Uses Amazon EC2 tools to interact with cloud resources
–  Uses QBETS and Globus to provision batch resources
–  Uses Personal PBS to control execution on batch resources
–  Provides a “resource Gantt chart” view of resources to aid the higher-level workflow orchestration tool

o  Eucalyptus – Developed for VGrADS (now Eucalyptus Inc.)
–  Implements cloud computing on Xen-enabled clusters
–  Open-source software infrastructure that is compatible with Amazon EC2
➔ vgES “thinks” a Eucalyptus cloud is EC2

o  Fault Tolerance (FTR)
–  Schedules a task so that its probability of successful execution rises to a desired level, constrained by resource availability and application deadlines
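
The idea of raising a task's success probability to a desired level under a resource limit can be sketched as follows. This is a minimal illustration, not the FTR implementation: it assumes fault tolerance is achieved by running independent replicas of the task, each succeeding with the same probability, and the names `replicas_needed`, `success_probability`, and `max_slots` are hypothetical.

```python
def success_probability(p_single: float, replicas: int) -> float:
    """Probability that at least one of the independent replicas succeeds."""
    return 1.0 - (1.0 - p_single) ** replicas


def replicas_needed(p_single: float, target: float, max_slots: int) -> int:
    """Smallest replica count that reaches the target, capped at max_slots.

    Assumes replicas fail independently, each succeeding with probability
    p_single. When max_slots is too small, the cap is returned and the
    target is not reached.
    """
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be in [0, 1]")
    if max_slots < 1:
        raise ValueError("max_slots must be at least 1")
    n = 1
    # Add replicas one at a time until the target is met or slots run out.
    while n < max_slots and success_probability(p_single, n) < target:
        n += 1
    return n
```

For example, with a per-replica success probability of 0.8 and a target of 0.99, three replicas are needed when at least three slots are free; with only two slots the sketch returns 2 and the achieved probability stays below the target.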

Executing LEAD Workflow Sets

o  Demonstrate planning and execution of LEAD workflow sets atop virtualized cloud and Grid resources.
o  LEAD Workflow Orchestration schedules a set of independent workflows with two characteristics:
–  a deadline D (e.g. 2 hours)
–  a fraction F such that at least F of the workflows finish by the deadline (e.g. 3/8)
o  Virtual Grid Execution System (vgES) provides an abstraction over batch and cloud systems including Amazon EC2 and Eucalyptus cloud sites.
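
The deadline D and fraction F requirement on a workflow set can be checked after the fact with a few lines. The function name `meets_requirement` and the convention of using `None` for a workflow that never finished are assumptions made for this sketch.

```python
from fractions import Fraction
from typing import Optional, Sequence


def meets_requirement(finish_minutes: Sequence[Optional[float]],
                      deadline_minutes: float,
                      required_fraction: Fraction) -> bool:
    """True if at least required_fraction of the workflows finish by the deadline.

    finish_minutes holds one entry per workflow; None marks a workflow that
    failed or never finished.
    """
    if not finish_minutes:
        raise ValueError("the workflow set is empty")
    on_time = sum(1 for t in finish_minutes
                  if t is not None and t <= deadline_minutes)
    # Compare exact fractions to avoid floating-point rounding at the boundary.
    return Fraction(on_time, len(finish_minutes)) >= required_fraction
```

With D = 120 minutes and F = 3/8, a set of eight workflows satisfies the requirement as soon as three of them finish within 120 minutes.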

Workflow Orchestration

o  Workflow Orchestration
–  Phase 1: Minimal Scheduling – Schedule the minimum fraction of workflows using a simple probabilistic DAG scheduler
–  Phase 2: Fault Tolerance Tradeoff – Compare scheduling additional workflows with increasing the fault tolerance of one or more tasks of the scheduled workflows
–  Phase 3: Additional Scheduling – Use available slots for other scheduling
–  Phase 4: EC2 Scheduling – For tasks below a certain threshold, schedule on EC2
o  Execution Manager
–  A prototype for ordered execution of tasks on the slots, based on the schedule determined by the orchestration.
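
The Phase 2 comparison can be illustrated with a short probability calculation. This is a sketch under stated assumptions, not the VGrADS scheduler: it treats each workflow as succeeding independently with a known probability, ignores the slots each option would consume, and uses hypothetical names (`prob_at_least`, `better_option`).

```python
from typing import List, Sequence


def prob_at_least(success_probs: Sequence[float], k: int) -> float:
    """Probability that at least k of the independent workflows succeed."""
    # dist[j] = probability that exactly j of the workflows seen so far succeed.
    dist: List[float] = [1.0]
    for p in success_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # this workflow fails
            nxt[j + 1] += mass * p       # this workflow succeeds
        dist = nxt
    return sum(dist[max(k, 0):])


def better_option(scheduled: Sequence[float], k: int,
                  extra_workflow_p: float,
                  boosted_index: int, boosted_p: float) -> str:
    """Name the option with the higher chance that at least k workflows succeed."""
    # Option 1: schedule one additional workflow.
    add = prob_at_least(list(scheduled) + [extra_workflow_p], k)
    # Option 2: raise the success probability of one scheduled workflow.
    boosted = list(scheduled)
    boosted[boosted_index] = boosted_p
    boost = prob_at_least(boosted, k)
    return "add workflow" if add >= boost else "increase fault tolerance"
```

With two scheduled workflows at 0.7 each and a requirement that two succeed, adding a third 0.7 workflow gives about 0.784, while raising one workflow to 0.95 gives about 0.665, so adding a workflow wins in that case.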

[Figure: Comparison of the LEAD-VGrADS collaboration system with cyberinfrastructure production deployments. Labeled components: User, Portal, Application Service, Workflow Engine, Workflow Planner, Execution Manager, Virtual Grid Execution System, and HPC and Cloud resources behind Globus and EC2 interfaces. Labeled interactions: resource planning, resource binding, query execution plan, standard execution, specific protocol based execution.]

[Figure: (a) Example Workflow Set with tasks A through L, need F=2/3; (b) Uncoordinated Schedule Without VGrADS; (c) Coordinated Schedule With VGrADS. The schedules place tasks on Batch Queue and Cloud resources between Start and Deadline D, with WAIT periods marked; one schedule also shows tasks E´ F´ G´ H´ I´ alongside E F G H I.]

[Figure: Interaction of system components for resource procurement and planning. Labeled components: Workflow Planner, DAG Scheduler, Fault tolerance, EC2, Virtual Grid Execution System, BQP, VARQ, GT4, Personal PBS, and HPC and Cloud resources behind Globus and EC2 interfaces. Labeled interactions: schedule, resource find & bind, query batch queue times, virtual res., submit job, setup virtual machines. Phase labels: Phase 1,2,3; Phase 2,3; Phase 4.]

LEAD / VGrADS Architecture : Putting It All Together

[Figure: architecture diagram spanning the LEAD System and the VGrADS System. Labeled elements: Portal, App. Factory, Application Service, Event Broker, BPEL Workflow Engine, Workflow Planner, DAG Scheduler, Fault tolerance, Execution Manager, Virtual Grid Execution System, BQP / VAR, GT4, Personal PBS, Globus, Batch Systems, HPC and Cloud Resources. Labeled interactions: schedule, resource find & bind, failures.]


Visualization Key

[Figure: key to the execution visualization. Task Status: Run, Done, Failed. Other labels: Workflow Color, Time, Slot Width, Resource Name.]

Infrastructure Timing Metrics

[Figure panels: Planning time line; Workflow execution time line]

Fault Tolerance Exploration

[Figure panels: Medium Reliability; Low Reliability]

Conclusions

• VGrADS’ virtual grid abstraction simplifies
o  Programming grid and cloud systems for e-Science workflows
o  Managing QoS (performance and reliability)

• The VGrADS system unifies workflow execution over
o  batch queue systems (with and without advanced reservations), and
o  cloud computing sites (including Amazon EC2 and Eucalyptus)

• The system provides an enabling technology for executing deadline-driven, fault-tolerant workflows

• The integrated cyber-infrastructure from the LEAD and VGrADS system components provides a strong foundation for next-generation dynamic and adaptive environments for

Thank You

• “VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance”, L. Ramakrishnan, D. Nurmi, A. Mandal, C. Koelbel, D. Gannon, T. M. Huang, Y. S. Kee, G. Obertelli, K. Thyagaraja, R. Wolski, A. Yarkhan and D. Zagorodnov

• Paper to be presented at 3:30 p.m., Thursday, Nov. 19 in
