Introduction The Design of WaFS The WaFS-aware Scheduler Simulation Study Conclusions and Future Work

On the Benefits of a Workflow-Aware File System in High-Performance Computing Systems

Yang Wang, Paul Lu

Department of Computing Science, University of Alberta, Canada
{yangwang, paullu}@cs.ualberta.ca

The 8th International Conference on High Performance Computing in Asia Pacific Region, 2005

Outline

1 Introduction

Workflow-based Workloads in HPC

The Problems

Our Contributions

2 The Design of WaFS

The Architecture of WaFS

3 The WaFS-aware Scheduler

Versioned Namespace (VNS) Policy

4 Simulation Study

The Experiment Setup

The Results

Workflow-based Workloads in HPC

Characteristics

A set of control/data-dependent jobs.

Multiple instances of the same workflow shape, but with different input, intermediate, and output files.

Example

[Figure: example workflow with jobs A to F and intermediate files out.A to out.E. The job labels in the figure are BLAST, Feature Extraction 1, Feature Extraction 2, Function Classifier, Localization Classifier, and Create Summary.]

1 A bioinformatics application for proteome classification.

2 The input is a large set of DNA/protein sequences.

3 BLAST finds homologs for a given unknown sequence.

4 Features or keywords are extracted from the homologs for the corresponding classifiers.

The Problems

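1 Filename Conflicts:

1,000 input sequences mean 1,000 workflow instances, but because every instance uses the same static filenames, concurrent instances conflict on those files.

Example

[Figure: two workflow instances, each with jobs A, B, C, and D, both using the same file Out.A.]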

Some Known Solutions

Serial execution (low concurrency and low performance)

Sub-directories (sometimes a burden on the user)

Overwrite-Safe Concurrency (OSC) (PDCAT 2005)

Our Contributions

1 The design of a Workflow-aware File System (WaFS):

captures and represents the workflow-specific information (e.g., dataflow information)

2 The design of a WaFS-aware Scheduler:

integrates the job scheduler with the file system, which makes a variety of performance optimizations and other benefits possible.

3 A simulation study of WaFS and its benefits for job

The Architecture of WaFS

[Figure: WaFS architecture. Components shown: Applications, Scheduler, and Users; File and Workflow API, DepSolver, QueryUtil, and Others; the Data Model (User, Workflow, Run, Job); Traditional File System and Other FS. Namespace hierarchy shown: UserSpace, WorkflowSpace, RunSpace, JobSpace, FileSpace.]

The Data Model

Namespace: Functionality

UserSpace: User information, access control

WorkflowSpace: Workflow information (e.g., shape parameters)

RunSpace: Workflow instance information (e.g., runtime parameters)

JobSpace: Detailed information about jobs
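
The namespaces above can be pictured as nested records. The following Python sketch is only an illustration: the nesting order (user, then workflow, then run, then job) is an assumption read off the architecture figure, and every class field, the user name `alice`, the workflow name `proteome`, and the file names are hypothetical rather than part of WaFS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobSpace:
    """Detailed information about one job (hypothetical fields)."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class RunSpace:
    """One workflow instance and its runtime parameters."""
    run_id: int
    runtime_params: Dict[str, str] = field(default_factory=dict)
    jobs: Dict[str, JobSpace] = field(default_factory=dict)


@dataclass
class WorkflowSpace:
    """One workflow and its shape parameters."""
    name: str
    shape_params: Dict[str, int] = field(default_factory=dict)
    runs: Dict[int, RunSpace] = field(default_factory=dict)


@dataclass
class UserSpace:
    """User information and access control."""
    user: str
    acl: Dict[str, str] = field(default_factory=dict)
    workflows: Dict[str, WorkflowSpace] = field(default_factory=dict)


def build_example() -> UserSpace:
    """Build one user with one workflow, one run, and one job."""
    user = UserSpace(user="alice", acl={"alice": "rw"})
    workflow = WorkflowSpace(name="proteome", shape_params={"stages": 3, "fan_out": 32})
    run = RunSpace(run_id=1, runtime_params={"input": "seq_0001.fasta"})
    run.jobs["A"] = JobSpace(name="A", inputs=["seq_0001.fasta"], outputs=["out.A"])
    workflow.runs[run.run_id] = run
    user.workflows[workflow.name] = workflow
    return user
```

Looking up `build_example().workflows["proteome"].runs[1].jobs["A"]` walks the hierarchy from UserSpace down to a single JobSpace.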

The Architecture of WaFS

1 Query Utility: provides remote access services to WaFS.

2 On-line Data Dependency Solver: constructs the data dependencies as the

Versioned Namespace (VNS) Policy

Goal: enable any existing scheduler to schedule and execute each workflow instance in its own namespace.

Example: the executions of WI1, WI2, and WI3 overlap, and each runs in its own namespace (NS1, NS2, and NS3).

[Figure: workflow instances WI1, WI2, and WI3, each with jobs A to F, running in namespaces NS1, NS2, and NS3.]
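
A minimal sketch of the idea, not the WaFS implementation: the same static filename resolves to a different path in each instance's namespace. The `/work` root, the `NS<i>` directory layout, the function names, and the use of a dictionary in place of a real file system are all assumptions made for illustration.

```python
import posixpath


def resolve_flat(filename: str) -> str:
    """Without namespaces, every instance maps a static name to the same path."""
    return posixpath.join("/work", filename)


def resolve_vns(instance_id: int, filename: str) -> str:
    """With a versioned namespace, the static name maps to a per-instance path."""
    return posixpath.join("/work", f"NS{instance_id}", filename)


def run_job(store: dict, path: str, data: str) -> None:
    """Pretend to run a job by writing its output into a dict-backed store."""
    store[path] = data


flat_store = {}  # stands in for a shared directory
vns_store = {}   # stands in for a file system with one namespace per instance

# Job A of three overlapping workflow instances writes the static name out.A.
for wi in (1, 2, 3):
    run_job(flat_store, resolve_flat("out.A"), f"result of A in WI{wi}")
    run_job(vns_store, resolve_vns(wi, "out.A"), f"result of A in WI{wi}")
```

In the flat store the three writes land on one path, so only the last result survives. In the namespaced store each instance keeps its own copy of `out.A`, which avoids the conflict at the cost of storing one file per instance.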

The Experiment Setup

The Workloads

Shape: Fork&Join (stage=3, fan-out=32)

# of Instances: 100 (total 9,600 jobs)

Avg. Interarrival Time: exponential distribution (µ ≥ 100 time units)

Job service time: uniform distribution on [500, 1000] time units
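
The workload parameters can be turned into a small generator. This is a sketch under stated assumptions, not the simulator used in the study: it assumes each of the 3 stages holds 32 jobs (3 x 32 = 96 jobs per instance, which matches the stated total of 9,600 jobs for 100 instances), and the function name, the seed, and the record layout are hypothetical.

```python
import random

STAGES = 3           # stage=3
FAN_OUT = 32         # fan-out=32
NUM_INSTANCES = 100  # number of workflow instances


def generate_workload(mean_interarrival: float, seed: int = 0) -> list:
    """Return one record per workflow instance with arrival and service times."""
    rng = random.Random(seed)
    arrival = 0.0
    instances = []
    for instance_id in range(NUM_INSTANCES):
        # Exponentially distributed gap between consecutive instance arrivals.
        arrival += rng.expovariate(1.0 / mean_interarrival)
        # One uniform service time on [500, 1000] per job, grouped by stage.
        service_times = [
            [rng.uniform(500, 1000) for _ in range(FAN_OUT)]
            for _ in range(STAGES)
        ]
        instances.append(
            {"id": instance_id, "arrival": arrival, "service_times": service_times}
        )
    return instances


workload = generate_workload(mean_interarrival=100.0, seed=1)
total_jobs = sum(len(stage) for inst in workload for stage in inst["service_times"])
```

With these constants `total_jobs` is 100 x 3 x 32 = 9,600. Changing `mean_interarrival` corresponds to sweeping the average interarrival time in the experiments.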

The Policies

Policy: BASE

Intra-Instance Concurrency: Control-flow

Inter-Instance Concurrency: Impossible in general case

The Results

[Figure: makespan (time units) and Avg. DOC versus average interarrival time (100 to 6400 time units) for BASE (# of files: 128) and VNS (# of files: 12800). Workload: Fork&Join (3x32), JST [500, 1000].]

[Figure: the policies CFG-BASE, Sub-dir, OSC, HB, and VNS compared by DOC and storage overhead.]

Conclusions and Future Work

1 Introduced the motivation, design, and simulation study of WaFS.

2 Proposed a Versioned Namespace (VNS) policy that integrates WaFS with the job scheduler for HPC workloads to address the filename-conflict and inefficient re-computation problems.

3 To address the storage overhead of VNS, we have started to explore other options, such as hybrids between dataflow-based policies and VNS.
