• No results found

Towards Autonomic Grid Data Management with Virtualized Distributed File Systems

N/A
N/A
Protected

Academic year: 2021

Share "Towards Autonomic Grid Data Management with Virtualized Distributed File Systems"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Towards Autonomic Grid Data

Management with Virtualized

Distributed File Systems

Ming Zhao

, Jing Xu, Renato Figueiredo

Advanced Computing and Information Systems

Electrical and Computer Engineering

University of Florida

(2)

GRID

z

Grid data provisioning

Wide-area latency/bandwidth, dynamism of resources …

How to provide data with application-tailored optimizations?

Motivation

Data server

data

Data server

data

job job job job job job

job job job

Compute server Sparse file access Highly reliable access Virtual machine Compute server Compute server Genome sequence Aggressive prefetching Simulation

(3)

GRID

Motivation

Data server data Data server data

job job job job job job

job job job

Compute server Sparse file access Highly reliable access Virtual machine Compute server Compute server Genome sequence Aggressive prefetching Simulation

z

Grid data management

Dynamic data access, resource sharing, changing resource availability …

How to manage data provisioning in dynamically changing environments?

(4)

Overview

z

Goal:

Efficient data management in heterogeneous, dynamic

and large-scale Grid environments

z

Challenges:

Application-tailored data provisioning

Data management in dynamically changing environment

z

Contribution: Autonomic Grid data management

Virtualized Grid-wide file systems

for user-transparent

Grid data access with application-tailored enhancements

Autonomic data management services

for self-managing,

(5)

Outline

z

Background

Grid virtual file system and data management

z

Architecture

Autonomic data management services

z

Evaluation

Experiments and analysis

z

Summary

(6)

GRID

Grid Virtual File Systems

GVFS Sessi on II job C1 $ Proxy $ Proxy C2 $ Proxy $ Proxy

GVFS Session I Proxy ProxyS1

data S2 Proxy Proxy data job job job

[Cluster Computing, Vol 9/1, 2006]

z

G

rid

V

irtual

F

ile

S

ystem (GVFS)

Distributed file system virtualization through user-level NFS proxies

Enhancements for Grid environment (disk caching, secure tunneling)

Dynamic, independent, application-tailored GVFS sessions

(7)

GRID

Data Management Services

GVFS Sessi on II C1 FSS M GVFS Session I Proxy Proxy C2 FSS Proxy Proxy S1 FSS Proxy Proxy S2 FSS Proxy Proxy DRS DSS z

D

ata

S

cheduler

S

ervice (DSS)

Scheduling and customization of GVFS sessions

z

D

ata

R

eplication

S

ervice (DRS)

Management of application data sets and their replicas

z

Server-side

F

ile

S

ystem

S

ervice (FSS)

Management of server-side GVFS proxies

z

Client-side

F

ile

S

ystem

S

ervice (FSS)

Management of client-side mount points, GVFS proxies and disk caches

[HPDC’2005]

(8)

GRID

Analyze

z Response z Reassign z Load balance Autonomic Server-side

File System Service

z Performance Plan Execute Monitor Analyze z Storage z Replicate z Replication Autonomic Data Replication

Service z Reliability Plan Execute Monitor

Autonomic Services

GVFS Sessi on II C1 Analyze z Response z Redirect z Prim-server Autonomic Client-side

File System Service

z Performance Plan Execute Monitor M Analyze z Storage z Reconfigure z Configuration Autonomic Data Scheduler

Service z Performance Plan Execute Monitor GVFS Session I Proxy Proxy FSS C2 Proxy Proxy FSS Proxy Proxy FSS Proxy Proxy FSS S2 S1 DRS DSS

(9)

z

Manages concurrent GVFS

sessions on shared resources

Considers both application performance

and resource utilization policy

Session

i

’s utility:

U

i

= Performance

i

* Priority

i

Configures sessions to maximize

z

E.g. Disk cache management

Hit rate is important in wide-area environment

Allocates client-side storage among sessions

Monitors storage usage via client-side FSS

Configures the sessions’ caches to maximize global utility

Applies configurations via client-side FSS

Autonomic Data Scheduler Service

Analyze zStorage zReconfigure zConfiguration zPerformance job C1 $ FSS Proxy S2 Proxy FSS Monitor Plan Execute job $ Proxy

i i

U

S1 Proxy FSS

(10)

z

Replication degree and placement

Increases reliability against server-side

failures (crash, network partitioning)

The reliability of a session’s data set

d

:

Reliability

d

> R

min

Reduces replication overhead

The cost of creating replicas for data set

d

Cost

d

< C

max

Replaces replicas based on utility

U

d

= Value

d

* Reliability

d

,

maximizes

z

Replica regeneration

Monitors server status via server-side FSS

Reconfigures sessions’ replicas after a failure

Generates replicas via server-side FSSs

Autonomic Data Replication Service

Analyze zStorage zReplicate zReplication zReliability C1 FSS FSS S2 Proxy FSS Monitor Plan Execute Proxy Proxy S1 replica

i d

U

(11)

z

Fail-over against server failures

Minimizes impact on the application

Non-interrupt session redirection

Monitors server response time

Detects failure when request times out

Redirects session to backup server

Regenerate replicas via DSS/DRS

z

Primary server selection

Maximizes performance against dynamic environment

Monitors connections with effective, low-overhead mechanism:

Small random writes to a hidden file on the mounted partition

Predicts performance with simple effective forecasting algorithm

Chooses the best connection and switches transparently

Autonomic File System Service

C1 FSS S2 Proxy FSS Analyze zResponse zRedirect zConfiguration zPerformance Monitor Plan Execute job $ Proxy Proxy FSS S1

(12)

Experimental Setup

z

File system clients and servers

VMware-based virtual machines

z

Wide-area networks

NIST Net emulated links

z

I/O part of Grid applications

IOzone file system benchmark

z

Grid data management

User-level NFS v3 based GVFS proxies

(13)

C1 FSS Proxy S1 FSS Proxy S2 FSS Proxy 0.03 0.04 0.05 0.06 0.07 0.08 0.09 10 110 210 310 410 510 610 710 810 Time (second) R e spons e Ti m e ( s ec) Server 1 0.03 0.04 0.05 0.06 0.07 0.08 0.09 10 110 210 310 410 510 610 710 810 Time (second) R e spons e Ti m e ( s ec ) Server 1 Server 2 0.03 0.04 0.05 0.06 0.07 0.08 0.09 10 110 210 310 410 510 610 710 810 Time (second) R e spons e Ti m e ( s ec )

Server 1 Server 2 Autonomic session redirection

Autonomic Session Redirection

z

Scenario (I)

Data transfer under

network fluctuation

E.g. caused by parallel TCP transfers

z

Setup

Client connected to two servers with

independently emulated WAN links

Randomly applied latencies with values

from a real wide-area measurement

z

Autonomic session redirection achieves the best performance by adapting

to the changing network condition

20 30 40 50 60 70 80 0 2 4 8 N e t w o r k R T T ( m s)

(14)

Autonomic Session Redirection

0 0.1 0.2 0.3 0.4 0.5 0.6 60 180 300 420 540 660 780 900 1020 1140 1260 Time (second) R e spons e Ti m e ( s ec) Server 1 0 0.1 0.2 0.3 0.4 0.5 0.6 60 180 300 420 540 660 780 900 1020 1140 1260 Time (second) R e s pons e Ti m e ( s ec) Server 1 Server 2 0 0.1 0.2 0.3 0.4 0.5 0.6 60 180 300 420 540 660 780 900 1020 1140 1260 Time (second) R e s pons e Ti m e ( s ec ) Server 1 Server 2

Autonomic session redirection

z

Scenario (II)

Data transfer under

server load variation

z

Setup

Randomly generated background load

applied on the data servers

C1 FSS Proxy S1 FSS Proxy S2 FSS Proxy

z

Autonomic session redirection achieves the best performance by adapting

(15)

Autonomic Data Replication

z

Scenario

Data transfer and replication under

dynamic server failures

z

Setup

Replica degree: 2 (1 primary + 1 backup)

Failures randomly injected on the servers

Consecutive executions of the benchmark

z

Case I:

Independent sessions

(different inputs)

primary server failed

primary server failed

new replica created

new replica created primary server failedprimary server failed

new replica created

new replica created

backup server failed

backup server failed

new replica created

new replica created

redirected to backup

redirected to backup

1st run done

1st run done 2nd run done2nd run done

backup server failed

backup server failed

new replica created

new replica created

3rd run done

3rd run done 4th run done4th run done

180 309 884 1015 1077 1213 2020 2159 198 676 1329 1996 2642 redirected to backup redirected to backup 1096 0 time (s) C1 FSS Proxy S1 FSS Proxy S2 FSS Proxy

(16)

z

Scenario

Data transfer and replication under

dynamic server failures

z

Setup

Replica degree: 2 (1 primary + 1 backup)

Failures randomly injected on the servers

Consecutive executions of the benchmark

z

Case II:

Dependent sessions

: same input, cache reuse

Autonomic Data Replication

C1 FSS Proxy S1 FSS Proxy S2 FSS Proxy

primary server failed

new replica created

redirected to backup

180 311

201 675

backup server failed

884

new replica created

1011 715 754 793 833 873 913 953 993 1033 7th run do ne 8th run do ne 9th run do ne 10 th r un don e 2nd r un don e 3rd r un don e 4th run do ne 5th run do ne 6th run do ne time (s): 0 1st ru n do n e

(17)

Related Work

z

Data management in Grid environment

GridFTP, GASS, Condor, Legion

z

Middleware control over Grid data transfers

Batch-Aware Distributed File System [NSDI’04]

Self-optimizing, fault-tolerant bulk data transfer

framework based on Condor and Stork [ISPDC’03]

z

Replication and storage management

Wide-area data replication based on Globus RFT

(18)

Summary

z

Problem:

Efficient data management in heterogeneous,

dynamic and large-scale Grid environments

z

Solution:

Autonomic data management system based on

GVFS and self-managing, goal-driven services

z

Future work:

Autonomic session checkpointing and migration

(19)

[1] M. Zhao, J. Zhang and R. Figueiredo, “Distributed File System Virtualization Techniques Supporting On-Demand Virtual Machine Environments for Grid Computing”, Cluster Computing, 2006.

[2] M. Zhao, V. Chadha and R. Figueiredo, “Supporting Application-Tailored Grid File System Sessions with WSRF-Based Services”, HPDC-2005.

[3] J. Xu, S. Adabala and J. A.B. Fortes, “Towards Autonomic Virtual Applications in the In-VIGO System”, ICAC-2005.

[4] L. W. Russell et al, “Clockwork: A New Movement in Autonomic Systems”, IBM Systems Journal, 42(1), 2003.

[5] J. Bent et al, “Explicit Control in a Batch-Aware Distributed File System”, NSDI-2004.

[6] T. Kosar et al, “A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment”, ISPDC-2003.

[7] A. Chervenak et al, “Wide Area Data Replication for Scientific Collaborations”, Grid-2005.

[8] Y. Zhong, S. G. Dropsho, C. Ding, “Miss Rate Prediction across All Program Inputs”, PACT-2003.

WSRF::Lite An Implementation of Web Services Resource Framework http://www.sve.man.ac.uk/Research/AtoZ/ILCT

GVFS Virtualized distributed file system for Grid environment http://www.acis.ufl.edu/~ming/gvfs

In-VIGO Virtualization middleware for computational Grids http://www.acis.ufl.edu/invigo

References

Grid virtualization middleware

(20)

Acknowledgments

z

In-VIGO team

http://invigo.acis.ufl.edu

z

NSF Middleware Initiative

z

NSF Research Resources

z

IBM Shared University Research

z

Intel ISTG R&D Council

References

Related documents

We rejoice in the gift of our Diocese and we pray that our Diocese will continue to announce the Good News of Jesus Christ in the Counties of Armstrong, Fayette, Indiana, and

independent of energy intake are highly warranted, and ideally be complemented by tracer studies or assessments of change in lean body mass. An important aspect to consider

Linux Data z/OS Disk System z z/VM Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux UPSTREAM z/OS Storage Server z/OS UPSTREAM Linux on System z Clients

Unidirectional IMS Data Replication SOURCE SERVER Classic Data Architect Replication Metadata Capture Services Source IMS Databases TARGET SERVER TCP/IP IMS Logs IMS DB RC A PI

For Switzerland’s union movement, this observa- tion still applied in the 1990s: membership was heavily dominated by production and maintenance workers in manufacturing,

Notes: All samples showed 100% of cell viability except in the case of time-varying magnetic fields applied on magnetically loaded cells, which caused 95% ± 5% cell

• In Windows, press the OneKey Rescue system button to launch Lenovo OneKey Recovery system.. • On the main screen, click Create

• Make sure that the hard disk drive is included in the Boot menu in the BIOS setup utility correctly. A