• No results found

Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang

N/A
N/A
Protected

Academic year: 2021

Share "Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Distributed RAID Architectures

Distributed RAID Architectures

for Cluster I/O Computing

for Cluster I/O Computing

Kai Hwang

Kai Hwang

Internet and Cluster Computing Lab.

University of Southern California

(2)

Presentation Outline :

n

Scalable Cluster I/O

n

The RAID-x Architecture

n

Cooperative disk drivers

n

Benchmark Experiments

n

Security and Fault Tolerance

(3)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

3

Scalable clusters providing SSI

services are gradually replacing the

SMP, cc-NUMA, and MPP in Servers,

(4)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

4

g

Size Scalability (physical & application)

g

Enhanced Availability (failure management)

g

Single System Image (Middleware,OS extensions)

g

Fast Communication (networks & protocols)

g

Load Balancing (CPU, Net, Memory, Disk)

g

Security and Encryption (clusters of clusters)

g

Distributed Environment (User friendly)

g

Manageability (Jobs and resources )

g

Programmability (simple API required)

g

Applicability (cluster- and grid-awareness)

(5)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

5

g

Sixteen Pentinum PCs are housed in

two 9-ft computer racks.

g

All PCs run with the RedHat

Linux v. 6.0 (Kernel v. 2.2.5)

g

All nodes are connected by a 100

Mbps Fast Ethernet

g

The cluster is ported with DQS,

LSF, MPI, PVM, TreadMarks,

Elias, and NAS benchmarks, etc.

g

Scalable to a future system with

100’s of future PC nodes

inter-connected by Gigabit networks

The USC Trojans Cluster Project

Internet and Cluster Computing Lab. EEB Rm.104

(6)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

6

Trojans Linux Cluster

with Middleware for Security

and Checkpoint Recovery

Pentium

PC

Pentium

PC

Pentium

PC

Gigabit Network Interconnect

Security and Checkpointing Middleware

Single-System Image and Availability Infrastructure

Programming Environments

(Java, EDI, HTML, XML)

Web Windows

User Interface

Other Subsystems

(Database, OLTP, etc.)

(7)

An I/O-centric cluster architecture

Entry

Partition

Fast Ethernet

Internet/Intranet

Client

Database

Partition

Service

Partition

Service Flow

Data Flow

(8)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

8

Distributed RAID

Embedded in

Clusters or Storage-Area Networks:

g

I/O Bottleneck in Scalable Cluster Computing

n

The gap between CPU/Memory and disk-IO widens

as the

µµP doubles in speed every year

n

Cluster applications are often I/O-bound

g

Disks connected to hosts are often subject to failure by

hosts themselves. Distributed RAID has much higher

availability by fault isolation, rollback recovery, and

automatic file migration.

(9)

Distributed RAID with a single

I/O space embedded in a cluster

Workstations

or PCs

Cluster Network (SAN or LAN)

(10)

Research Projects on

Parallel and Distributed RAID

System Attributes USC Trojans RAID-x Princeton TickerTAIP Digital Petal Berkeley Tertiary Disk HP AutoRAID RAID Architecture environment Orthogonal striping and mirroring in a Linux cluster RAID-5 with multiple controllers Chained Declustering in Unix cluster

RAID-5 built with a PC cluster Hierarchical with RAID-1 and RAID-5 Enabling Mechanism for SIOS Cooperative device drivers in Linux kernel Single RAID server implementation Petal device drivers at user level xFS storage servers at file level Disk array within single controller Data Consistency Checking Locks at device driver level Sequencing of user requests Lamport’s Paxos algorithm Modified DASH protocol in the xFS file system Use mark to update the parity disk Reliability and Fault Tolerance Orthogonal striping and mirroring Parity checks in RAID-5 Chained Declustering

SCSI disks with parity in RAID-5

Mirroring and parity checks

(11)

Four RAID architectures using different

mirroring and parity checking schemes

B8

M6 M7 M8

B6 B7

Disk 1 Disk 2 Disk 3 Disk 4

B0 B4 B8 P4 B12 B16 B1 B5 P3 B9 B13 B17 B2 P2 B6 B10 B14 P6 P1 B3 B7 B11 P5 B15

Disk 1 Disk 2 Disk 3 Disk 4

B0 B2 B4 B6 B8 B10 B1 B3 B5 B7 B9 B11 M0 M2 M4 M6 M8 M10 M1 M3 M5 M7 M9 M11

Disk 0 Disk 1 Disk 2 Disk 3 B0 B4 B1 B5 B2 M3 M4 M5 B3 M9 M10 M11 B9 B10 B11 M0 M1 M2

Disk 0 Disk 1 Disk 2 Disk 3 B0 M3 B4 M7 7 B1 M0 B5 M4 B2 M1 B6 M5 B3 M2 B7 M6 B8 M11 B9 M8 B10 M9 B11 M10 Data blocks Mirrored blocks

(a) Striped mirroring in RAID-10

(b) Parity checking in RAID-5

(c) Orthogonal striping and mirroring

(OSM) in the RAID-x

(d) Skewed striping in a chained

(12)

Theoretical Peak Performance of

Four RAID Architectures

7

Performance

Indicators

RAID-10

RAID-5

Chained

Declustering

RAID-x

Read

n B

n B

n B

n B

Large Write

n B

(n-1) B

n B

n B

Max. I/O

Bandwidth

Small Write

n B

nB / 2

n B

n B

Large Read

mR / n

mR / n

mR / n

mR / n

Small Read

R

R

R

R

Large Write

2 mW / n

mW / (n-1)

2 mW / n

mW / n +

mW / n(n-1)

Parallel

Read or

Parallel

Write Time

Small Write

2W

R+W

2W

W

Max. Fault Coverage

n/2 disk

failures

Single disk

failure

n/2 disk

failures

Single disk

failure

(13)

Distributed RAID-x architecture

Cluster Network

P/M CDD P/M CDD P/M CDD

Node 0

Node 1

Node 3

D0 P/M CDD

Node 2

D1 D2 D3 B0 B12 B24 M25 M26 M27 B1 B13 B25 M14 M15 M24 B2 B14 B26 M3 M12 M13 B3 B15 B27 M0 M1 M2 D4 D5 D6 D7 B4 B16 B28 M29 M30 M31 B5 B17 B29 M18 M19 M28 B6 B18 B30 M7 M16 M17 B7 B19 B31 M4 M5 M6 D8 D9 D10 D11 B8 B20 B32 M33 M34 M35 B9 B21 B33 M22 M23 M32 B10 B22 B34 M11 M20 M21 B11 B23 B35 M8 M9 M10

8

(14)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

14

Single I/O space in a Distributed RAID

enabled by CDDs at Linux kernel level

Cluster node Cluster node Cluster node

Interconnection Network

(b) A global virtual disk with a

SIOS formed by cooperative disks

CDD CDD CDD Cluster node Cluster node Central NFS Server

Interconnection Network

(a) Separate disks driven by

independent disk drivers (IDDs)

IDD

(15)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

15

(b) Using CDDs to achieve a SIOS in a serverless cluster.

User Level Kernel Level User Level Kernel Level

Client side Server side

CDD CDD

User

Application NFS Serveris bypassed

(a) Parallel disk I/O using the NFS in a server/client cluster.

User Level Kernel Level User Level Kernel Level

Client side Server side

NFS Client NFS Server User Application Traditional Device Driver

Remote disk access using central NFS server versus

using cooperative disk drivers in the RAID-x cluster

1 3 5 4 6 1 3 4 5 6 2 2

(16)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

16

Architectural design of

Architectural design of

cooperative device drivers

cooperative device drivers

Node 2

Node 1

CDD

CDD

Virtual disks

Physical disks

Communications

through the network

(a) Device masquerading

Cooperative Disk Driver (CDD)

Storage

Manager

CDD Client

Module

Data Consistency Module

Communications through the network

(17)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

17

Cluster node 1 Application

/sios

dir1 dir2 file1 file2 file3

Cluster node 2 Application

/sios

dir1 dir2

file1 file2 file3

Cluster node 3 Application

/sios

dir1 dir2 file1 file2 file3

Cluster node 4 Application

/sios

dir1 dir2 file1 file2 file3

CDD CDD CDD CDD

Cluster Network

Maintaining consistency of the global directory

(18)

Elapsed Time in Executing the Andrew

Benchmark on the Linux Cluster at USC

0 5 10 15 20 25 30 35 1 4 8 12 16 Number of Clients

Elapsed Time (sec)

Compile Read File Scan Dir Copy Files Make Dir 0 1 2 3 4 5 6 7 8 1 4 8 12 16 Number of Clients

Elapsed Time (sec)

(19)

Parallel Write Performance of four RAID

Architectures against Traffic Rate

13

Parallel writes (20MB per client)

0 2 4 6 8 10 12 14 16 18 1 4 8 12 16

Number of Clients

Aggregate Bandwidth (MB/s)

RAID-x Chained Declustering RAID-10 RAID-5 NFS

(20)

Parallel Write Performance of four

RAID architectures vs. Disk Array Size

13

Parallel write

0 2 4 6 8 10 12 14 16 18 2 4 8 12 16 Disk Numbers Aggregate Bandwidth (MB/s) RAID-x Chained Declustering RAID-10 RAID-5

(21)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

21

Achievable I/O Bandwidth and

Improvement Factor on Trojans Cluster

NFS

RAID-x

I/O

Operations

1 Client

16 Clients

Improve

1 Client

16 Clients

Improve

Large Read

2.58 MB/s

2.3 MB/s

0.89

2.59 MB/s

15.63 MB/s

6.03

Large Write

2.11 MB/s

2.77 MB/s

1.31

2.92 MB/s

15.29 MB/s

5.24

Small Write

2.47 MB/s

2.81 MB/s

1.34

2.35 MB/s

15.1 MB/s

6.43

Chained Declustering

RAID-10

Operations

1 Client

16 Clients

Improve

1 Client

16 Clients

Improve

Large Read

2.46 MB/s

15.8 MB/s

6.42

2.37 MB/s

10.76 MB/s

4.54

Large Write

2.62 MB/s

12.63 MB/s

4.82

2.31 MB/s

9.96 MB/s

4.31

Small Write

2.31 MB/s

12.54 MB/s

5.43

2.27 MB/s

9.98 MB/s

4.39

(22)

Effects of Stripe Unit Size on

I/O Bandwidth of RAID Architectures

Large write (320MB for 16 clients)

14

0 2 4 6 8 10 12 14 16 18 20 16 32 64 128

Stripe Unit Size (KB)

Aggregate Bandwidth (MB/s)

RAID-xChained Declustering

RAID-10 RAID-5

(23)

Bonnie Benchmark Results on Trojans Cluster

Bonnie Benchmark Results on Trojans Cluster

File rewrite

0 0.5 1 1.5 2 2.5 3 3.5 2 4 8 12 16

Number of Disks

Output Rate (MB/s)

RAID-x chained declustering RAID-10 RAID-5

(24)

Securing Networks, Intranets,

Clusters, or Grid Resources

with intrusion control and automatic

recovery from malicious attacks

Highly secured Intranet with

intrusion detection and

response, automatic

recovery from

malicious attacks,

and fault-tolerance

with distributed

storage for

reliable I/O

Gateway

firewall

to screen

traffic flow

between

networks

Cluster

with no

security

protection

Increasing

Reliability

SMP cluster Intranet

Grid

Increasing scalability

No data

protection

Fault

(25)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

25

Distributed Micro-Firewalls

(26)

Distributed Checkpointing on

Distributed Checkpointing on

The RAID

The RAID

-

-

x in Trojans Cluster

x in Trojans Cluster

Process 0 Time Process 2 Process 4 Process 1 Process 3 Process 5 Process 6 Process 7 Process 8 Process 9 Process10 Process11 Stripe0 Stripe1 Stripe2 C S C: Checkpointing overhead S: Synchronization overhead

18

(27)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

27

Security Component Technologies

F

Firewalls

and

Cryptography

F

Cluster Middleware for Security

F

Anti-virus and Immune Systems

F

Intrusion Detection and Response

F

Distributed Software RAIDs

(28)

Distributed Intrusion Detection and Responses

Security Threats

Effectiveness in using Micro-Firewalls

Insider attacks

Protect hosts against attack from insiders

Denial-of-Service attacks

Protect against denial-of-service attacks

from any source

Trojan Program

Protect hosts from trapdoors by any source

IP Address

Spoofing

Can be reconfigured to prevent IP spoofing

at the client host level

Probes and

Scans

Use with IDS to block the probes and scans

close to their sources

Unauthorized

External access

Can prevent unauthorized access to the

external networks at the source

Attacks on

Intranet

Infra-structure

Resist both internal and external attacks and

provide fine-grained access control

(29)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

29

Checkpointing overhead

on distributed RAIDs

0

0.5

1

1.5

2

2.5

3

3.5

1.21

2.21

3.21

4.21

5.21

6.21

7.21

8.11

checkpoint file size (MB)

checkpoint overhead (sec)

NFS

Vaidya

Striped

(30)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

30

Advantages and Shortcomings

of Distributed Checkpointing

Checkpointing Scheme Advantages Shortcomings Suitable applications

Simultaneous writing to a central storage

(The NFS scheme)

Simple,

no inconsistent state

Has network and I/O contentions, NFS is single point of failure

Small size of checkpoint, small number of nodes, low I/O operation Staggered writing to a

central storage (Vaidya scheme)

Eliminate the network and I/O contention Network bandwidth is wasted, NFS is a single point of failure Small size of checkpointers, small number of nodes, low I/O operations Striped staggering

checkpointing on any distributed RAID (Our scheme)

Eliminate network and I/O contentions, low checkpoint overhead, fully utilize network bandwidth, tolerate multiple failures among stripe groups

Can not tolerate more node failures within each stripe group

Large size of checkpointers, large number of nodes, low communication, I/O intensive applications

(31)

K. Hwang , March 15,2001 in Beijing

K. Hwang , March 15,2001 in Beijing

31

F

Distributed storage-area networks demands hardware or

software support of a single I/O space not only in clusters

but also in pervasive information grids.

F

Hierarchical checkpointing with striping and staggered

mirroring for building fault-tolerant clusters to provide

continuous network services

F

Hacker-proof clusters are in great demand for securing

E-business, distributed computing, and metacomputing

grid applications.

F

Exploring new applications in multiserver consolidation,

collaborative design, and pervasive network services.

(32)

Call for Participation

IEEE Third International

Conference on Cluster Computing

CLUSTER 2001

Sutton Place Hotel, Newport Beach, California

References

Related documents