pomsets: Workflow management for your cloud

Academic year: 2021

Michael J Pan

Nephosity

Outline

- Introduction to workflow management
  - Definition
  - Motivation
- Workflow management + cloud computing
  - Issues with workflow management + grid computing
  - Workflow management is crucial to cloud computing
- Workflow management challenges
  - Workflow structures
  - Ease of use
- pomsets

Workflow management is ...

the design, specification, and coordination of the execution of tasks and their dependencies.

Motivation

We have lots of data, and compute nodes to process that data. To minimize execution time, we need a tool to

- design and specify task parallelism and task dependency ordering
- coordinate the execution of tasks over large compute resources
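To make the execution-time argument concrete, here is a minimal sketch that compares running every task one after another with running independent tasks in parallel. The task names, durations, and dependencies are hypothetical and chosen only for illustration; with unlimited compute nodes, the best achievable time is the length of the longest dependency chain.

```python
from functools import lru_cache

# Hypothetical task durations (in minutes) and dependencies, for illustration only.
durations = {"A": 3, "B": 5, "C": 2, "D": 4}
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def serial_time(durations):
    """Total time when tasks run one after another."""
    return sum(durations.values())


def critical_path_time(durations, depends_on):
    """Shortest possible time with unlimited workers: the longest dependency chain."""

    @lru_cache(maxsize=None)
    def finish(task):
        # A task can start only after its slowest dependency has finished.
        start = max((finish(dep) for dep in depends_on[task]), default=0)
        return start + durations[task]

    return max(finish(task) for task in durations)


serial = serial_time(durations)
parallel = critical_path_time(durations, depends_on)
print(serial, parallel)
```

In this example B and C can overlap, so the parallel schedule is bounded by the chain A, B, D rather than by the sum of all durations.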

Why workflow management + cloud computing?

- Cloud computing provides the ability to scale compute resources with the work that needs to be done
- Better than what is available today, i.e. workflow management (WFM) on grid computing
- WFM is critical to a successful long-term cloud computing strategy
- A critical component of the cloud computing software stack
- Significant cloud computing community desire for WFM functionality

Workflow management + grid computing

Large computing resources have historically been available as grid computing. Issues with WFM on grids:

- Jobs submitted to grids are often queued behind the jobs of other users, which reduces the effectiveness of workflow management optimizations
- Heterogeneous compute environments may produce different task results and/or make the workflow specification unnecessarily complex
- Grids are not easily federated, which limits burst computing
- Available only to institutions with the resources to deploy their own grid and implement their own WFM

Components of a cloud computing software stack

- virtual machines (VMware, Xen, Virtuozzo, KVM)
- dynamic provisioning (Amazon EC2, Eucalyptus, GoGrid, Rackspace, Dell/Joyent)
- task partitions (MapReduce, Hadoop, Disco, Sphere)
- data distribution (GFS, HDFS, Ceph, Sector, Voldemort, MongoDB, CouchDB)
- unified messaging (Qpid, RabbitMQ, Amazon SNS)
- workflow management (Azkaban, Kepler, Oozie, Pipeline, Pegasus, Taverna, Triana, pomsets)
- monitoring & reporting (RightScale, Nagios, Ganglia, Graphite)

Significant community demand

Identification by the scientific community

“Beyond the Data Deluge” (Science, Vol. 323, no. 5919, pp. 1297-1298, 2009)

In the future, the rapidity with which any given discipline advances is likely to depend on how well the community acquires the necessary expertise in database, workflow management, visualization, and

Challenges with workflow management

- Ability to handle the various workflow structures
- Ease of use
- Others that we will not cover, including but not limited to
  - data management and distribution
  - validation of data (both inputs and outputs)
  - data provenance

Workflow structures

- Fan out
- Fan in
- Diamond
- Intermediary
- N
- Task partitioning (Parameter sweep, MapReduce)

What do they look like in a dependency graph, and when linearized (coded into a script)? What issues do they present?
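One way to see what "linearized" means is to enumerate every script ordering that respects a dependency graph. The sketch below is only an illustration, not part of pomsets: the function name `linearizations` is hypothetical, and the edge sets for fan out, fan in, and diamond are our reading of those structure names (A feeds B and C; A and B feed C; A feeds B and C, which both feed D).

```python
def linearizations(tasks, depends_on):
    """Yield every ordering of tasks in which each task follows its dependencies."""

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for task in sorted(remaining):
            # A task may be scheduled next only if all its dependencies already ran.
            if all(dep in prefix for dep in depends_on.get(task, ())):
                yield from extend(prefix + [task], remaining - {task})

    yield from extend([], frozenset(tasks))


# Assumed edge sets: each key lists the tasks it depends on.
fan_out = {"B": ["A"], "C": ["A"]}
fan_in = {"C": ["A", "B"]}
diamond = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

fan_out_orders = list(linearizations("ABC", fan_out))
fan_in_orders = list(linearizations("ABC", fan_in))
diamond_orders = list(linearizations("ABCD", diamond))

for name, orders in [("fan out", fan_out_orders),
                     ("fan in", fan_in_orders),
                     ("diamond", diamond_orders)]:
    print(name, ["; ".join(order) for order in orders])
```

Each structure admits more than one valid script, which is why a hand-written linear script hides the parallelism that the dependency graph makes explicit.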

Fan out

- A; B; C
- A; C; B

Fan in

- A; B; C
- B; A; C

Diamond

- A; B; C; D
- A; C; B; D

Intermediary

- A; B; C

Another variation that combines “fan in” and “fan out”. We need to ensure that C is not run twice.
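A common way to guarantee that a task runs exactly once is to count its unfinished dependencies and release it only when the count reaches zero (Kahn's topological ordering). The sketch below is a generic illustration, not the pomsets implementation. It assumes the intermediary structure has edges A to B, A to C, and B to C, which is our inference from the single linearization A; B; C; the function name `run_once` is hypothetical.

```python
from collections import deque


def run_once(depends_on, run):
    """Run every task exactly once, after all of its dependencies have run."""
    pending = {task: len(deps) for task, deps in depends_on.items()}
    dependents = {task: [] for task in depends_on}
    for task, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(task)

    ready = deque(sorted(task for task, count in pending.items() if count == 0))
    order = []
    while ready:
        task = ready.popleft()
        run(task)
        order.append(task)
        for child in dependents[task]:
            # A child is released only when its last dependency finishes,
            # so C is queued once even though both A and B point to it.
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(depends_on):
        raise ValueError("dependency cycle detected")
    return order


# Assumed edges for the intermediary structure: C depends on both A and B.
intermediary = {"A": [], "B": ["A"], "C": ["A", "B"]}
executed = []
order = run_once(intermediary, executed.append)
print(executed)
```

A naive scheduler that triggers every successor as soon as any one predecessor finishes would start C after A and again after B; counting dependencies avoids that.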

N

- A; C; B; D
- A; C; D; B
- C; A; B; D
- C; A; D; B
- C; D; A; B

Another variation that combines “fan in” and “fan out”. Computational linguistics theory: N structures in a pomset
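In the theory of partial orders, the N shape matters because an order built only from sequential and parallel composition never contains it, so a workflow with an N cannot be written as nested "run in sequence" and "run in parallel" blocks. The sketch below checks for an N in a set of dependency edges. The edge set (A before B, C before B, C before D) is inferred from the five orderings listed above, and the function names are hypothetical.

```python
from itertools import permutations


def transitive_closure(pairs):
    """Return the strict order implied by the given (before, after) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def contains_n(elements, pairs):
    """True if four elements form an N: a<b, c<b, c<d and nothing else."""
    less = transitive_closure(pairs)

    def comparable(x, y):
        return (x, y) in less or (y, x) in less

    for a, b, c, d in permutations(elements, 4):
        if ((a, b) in less and (c, b) in less and (c, d) in less
                and not comparable(a, c)
                and not comparable(b, d)
                and not comparable(a, d)):
            return True
    return False


# Assumed edges, written as (before, after) pairs.
n_shape = {("A", "B"), ("C", "B"), ("C", "D")}
diamond = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}

n_has_n = contains_n("ABCD", n_shape)
diamond_has_n = contains_n("ABCD", diamond)
print(n_has_n, diamond_has_n)
```

The diamond passes the check because it is a sequence of A, then B and C in parallel, then D; the N structure has no such decomposition.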

Task partitioning

- A1; A2; ...; An

Issues

- Dynamic generation of task partitions
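The difficulty with dynamic generation is that the number of tasks A1 ... An is not known when the workflow is designed; it depends on the input at run time. A minimal sketch of the idea, using only the Python standard library and a stand-in `process` function (all names here are hypothetical, and a thread pool stands in for real compute nodes):

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process(chunk):
    """Stand-in for one task partition Ai: sum of squares of its chunk."""
    return sum(x * x for x in chunk)


def run_partitioned(items, size, max_workers=4):
    # The number of partitions is decided here, at run time, from the input.
    chunks = partition(items, size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process, chunks))  # A1; A2; ...; An
    return sum(partials)  # fan in: combine the partial results


data = list(range(10))
chunks = partition(data, 3)
total = run_partitioned(data, size=3)
print(len(chunks), total)
```

Changing the input size or the chunk size changes n without any change to the workflow definition, which is the behaviour a workflow manager has to support for parameter sweeps and MapReduce-style jobs.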

Pig

Oozie

Usability Hypothesis

All else being equal (i.e. functionality), the product that is easiest to use becomes dominant

- Search and mail: Google
- Phone and tablet: Apple

Usability goals

- Visual: no user coding
- Simple: easy enough for non-programmers to design their workflows and to execute them on existing clouds
- Powerful: capable of specifying dependencies, task partitions, etc. if the user wants them, without overwhelming the user by default

pomsets is ...

- a mathematical model, first used in 1985 by Vaughan Pratt to describe concurrent processes
- an application that implements the mathematical model as the data structures that represent workflow components, facilitates the design and specification of workflows, and coordinates the execution of the workflows on cloud deployments

The mathematical model

A labelled partial order is a 4-tuple (V, Σ, ≤, µ) where

- V is a set of vertices
- Σ is the alphabet
- ≤ is the partial order on the vertices
- µ is the labelling function µ: V → Σ
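The definition above maps directly onto a small data structure. The sketch below is an illustration of the mathematical model only; the class and field names are hypothetical and are not taken from the pomsets code base. It stores the order as a set of pairs and checks the partial-order axioms and that every vertex has a label in the alphabet.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class LabelledPartialOrder:
    vertices: set   # V
    alphabet: set   # Σ
    order: set      # ≤, stored as (u, v) pairs meaning u ≤ v
    labels: dict    # µ: V → Σ

    def is_valid(self) -> bool:
        V, le = self.vertices, self.order
        closed = all(u in V and v in V for u, v in le)
        reflexive = all((v, v) in le for v in V)
        antisymmetric = all(
            u == v or not ((u, v) in le and (v, u) in le)
            for u, v in product(V, repeat=2)
        )
        transitive = all(
            (u, w) in le
            for (u, v), (x, w) in product(le, repeat=2)
            if v == x
        )
        labelled = all(self.labels.get(v) in self.alphabet for v in V)
        return closed and reflexive and antisymmetric and transitive and labelled


def reflexive_closure(vertices, pairs):
    """Add (v, v) for every vertex so the relation is reflexive."""
    return set(pairs) | {(v, v) for v in vertices}


# A diamond workflow; vertices 2 and 3 carry the same label, which is
# what makes the structure a multiset of labelled events.
V = {1, 2, 3, 4}
tasks = {"split", "process", "merge"}
mu = {1: "split", 2: "process", 3: "process", 4: "merge"}

diamond = LabelledPartialOrder(
    V, tasks,
    reflexive_closure(V, {(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)}),
    mu,
)
# Missing (1, 4), so the relation is not transitive.
broken = LabelledPartialOrder(
    V, tasks,
    reflexive_closure(V, {(1, 2), (1, 3), (2, 4), (3, 4)}),
    mu,
)
print(diamond.is_valid(), broken.is_valid())
```

Separating vertices from labels is what lets the same task (here "process") appear more than once in a workflow while each occurrence keeps its own position in the order.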

The workflow management system

Two main components

- the core is the backend and provides an API

Features

- Parallel computing
- Data flow
- Flow control
- Workflow reusability
- Compute cloud agnosticism
- Execution environment agnosticism
- MapReduce
- Intuitive GUI

Target users

- end users who have workflows that they run repetitively over different datasets
- subject matter experts who design workflows to share with their colleagues/collaborators
- developers who develop programs to be executed as workflow tasks
- developers who explicitly define workflows that their application executes

Future work

Apply workflow management to applications in various domains; make improvements as necessary

- rendering, animation, special effects
- medical imaging
- scientific computing

Getting to know pomsets

http://pomsets.org

- Current release is 1.0.6
- Download source
- Download Mac OS X application bundle
- Prepackaged binaries for other platforms coming soon
