A Broad Outline of Redundant Array of Inexpensive Disks


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)


Shaifali Shrivastava

Department of Computer Science and Engineering, AITR, Indore

shaifalishrivastav@gmail.com

Abstract:

This paper reviews several levels of RAID and briefly touches on RADD in the storage environment. RAID was originally developed at a time when hard disks were still very expensive and less reliable than they are today. We discuss the architecture and utility of each level, namely RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5 and RAID 6, based on our literature review. We also identify the factors that affect disk drive performance and the issues involved in improving the functioning of the storage infrastructure.

Keywords: RAID (Redundant Array of Independent or Inexpensive Disks), RADD (Redundant Array of Distributed Disks), Stripe (Number of Blocks), HS (Hot Spare), RAID CLTR (RAID Controller)

I. INTRODUCTION

Information is highly important in our daily lives. In the twenty-first century we have become information dependent, living in an on-command, on-demand world in which we need information when and where it is required.

The SLED (single large expensive disk) was once the most popular medium for storing information. To provide a large amount of space, an organization had to maintain a large number of disk drives. This is where the concept of RAID (redundant array of inexpensive disks) came into the picture.

Mean time between failures: The mean time between failures (MTBF) measures in hours the average life expectancy of an HDD.

Today, data centers deploy thousands of HDDs in their storage infrastructures. The greater the number of HDDs in a storage array, the greater the probability of a disk failure in the array. Consider a storage array of 100 HDDs, each with an MTBF of 650,000 hours. The MTBF of this collection of HDDs is therefore 650,000/100, or 6,500 hours. This means that some HDD in the array is likely to fail at least once every 6,500 hours.
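The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simple division used in the text; it assumes identical disks that fail independently, and the function name array_mtbf_hours is hypothetical.

```python
def array_mtbf_hours(disk_mtbf_hours: float, disk_count: int) -> float:
    """Approximate MTBF of a whole array.

    Assumes identical disks that fail independently, as in the example
    in the text: the array MTBF is the single-disk MTBF divided by the
    number of disks.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return disk_mtbf_hours / disk_count


mtbf = array_mtbf_hours(650_000, 100)
print(mtbf)       # 6500.0 hours
print(mtbf / 24)  # the same interval expressed in days
```

In other words, an array of 100 such disks can expect a disk failure roughly every nine months, even though each individual disk is rated for decades.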

There was a need for a way to prevent the failure of a single disk drive from causing data loss within the stack of disks. RAID is an enabling technology that leverages multiple disks as part of a set, which provides data protection against HDD failures.

The result was the set of RAID levels from RAID 0 through RAID 6. RAID can be implemented in two ways: in hardware and in software.

Before discussing RAID we should know some terms which play an important role in storage and its management.

Disk Striping: Disk striping breaks a body of data into units and spreads them across the available disks. There are two types of disk striping: single user and multi-user. Single user disk striping allows multiple hard disks to simultaneously service multiple I/O requests from a single workstation. Multi-user disk striping allows multiple I/O requests from several workstations to be sent to multiple hard disks. This means that while one hard disk is servicing a request from one workstation, another hard disk is handling a separate request from a different workstation.

FIG-1 DISK STRIPING (Disk One: Data A, Data C, Data E; Disk Two: Data B, Data D)
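As a minimal sketch of striping, the following Python fragment distributes consecutive blocks over the disks in round-robin order. The round-robin placement and the names locate_block and stripe are assumptions made for illustration; real controllers stripe in units of a configurable strip size.

```python
def locate_block(logical_block: int, disk_count: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, offset on that disk).

    Assumes simple round-robin placement, one block per strip.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return logical_block % disk_count, logical_block // disk_count


def stripe(blocks: list[str], disk_count: int) -> list[list[str]]:
    """Distribute the blocks over disk_count disks in round-robin order."""
    disks: list[list[str]] = [[] for _ in range(disk_count)]
    for number, block in enumerate(blocks):
        disk, _offset = locate_block(number, disk_count)
        disks[disk].append(block)
    return disks


layout = stripe(["A", "B", "C", "D", "E", "F"], 2)
print(layout[0])  # ['A', 'C', 'E']
print(layout[1])  # ['B', 'D', 'F']
```

With two disks, the odd and even blocks land on different drives, so consecutive blocks can be transferred in parallel.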

Mirroring: Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of information or data. In the event of one HDD failure, the data is intact on the surviving hard disk drive and the controller continues to service the host’s data requests from the surviving disk of a mirrored pair. When the failed disk is replaced with a new disk, the controller copies the data from the surviving disk of the mirrored pair. This particular activity is transparent to the host.


FIG-2 DISK MIRRORING (Disk One, Disk Two)
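The mirroring behaviour described above (write to both disks, serve reads from the survivor, copy back after replacement) can be sketched as a small in-memory model. The class MirroredPair and its method names are hypothetical and stand in for what a real controller does in firmware.

```python
class MirroredPair:
    """Toy model of a two-disk mirror; each disk maps block number -> data."""

    def __init__(self) -> None:
        self.disks: list[dict[int, bytes]] = [{}, {}]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all healthy disks of the pair.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Serve the read from the first healthy disk.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both disks of the mirrored pair have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True
        self.disks[index] = {}

    def replace(self, index: int) -> None:
        # Rebuild the new disk by copying everything from the survivor.
        survivor = 1 - index
        if self.failed[survivor]:
            raise IOError("no surviving disk to copy from")
        self.disks[index] = dict(self.disks[survivor])
        self.failed[index] = False


pair = MirroredPair()
pair.write(0, b"Data A")
pair.fail(0)                # disk one fails
print(pair.read(0))         # b'Data A', served by the surviving disk
pair.write(1, b"Data B")    # the host keeps writing during the failure
pair.replace(0)             # new disk is rebuilt from the survivor
print(pair.disks[0] == pair.disks[1])  # True
```

The host-facing calls (write and read) behave the same before, during and after the failure, which is the sense in which the rebuild is transparent to the host.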

Parity: Parity is a method of protecting striped data from HDD failure without the cost of mirroring. It also allows errors in the data to be detected, to ensure the data has not been corrupted. An additional HDD is added to the stripe width to hold the parity. The concept can be understood through a simple mathematical illustration, shown in Fig. 3.

FIG-3 PARITY GENERATION (Disk One: Data A, Data C, Data E; Disk Two: Data B, Data D, Data F; Disk Three: Check A+B, Check C+D, Check E+F)
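The figure writes the check value as a sum (Check A+B). In practice RAID parity is usually computed with the bitwise XOR operation, and the sketch below assumes XOR. It shows parity generation and the rebuilding of one lost strip from the survivors; the function names are hypothetical.

```python
from functools import reduce


def xor_parity(strips: list[bytes]) -> bytes:
    """XOR equal-length strips byte by byte to produce the parity strip."""
    if not strips:
        raise ValueError("at least one strip is required")
    length = len(strips[0])
    if any(len(strip) != length for strip in strips):
        raise ValueError("all strips must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing strip from the survivors and the parity."""
    return xor_parity(surviving + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the disk holding the second strip has failed.
rebuilt = rebuild_missing([data[0], data[2]], parity)
print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, the same routine serves for both generating parity and reconstructing a missing strip, but only one missing strip per stripe can be recovered this way.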

Hot Spare: A hot spare is a spare HDD in a RAID array that temporarily replaces a failed HDD of a RAID set. The hot spare takes the identity of the failed HDD in the array. Put simply, if any hard drive fails, the hot spare takes its place with little or no interruption.

FIG-4 HOT SPARE

RAID Array Components: A RAID array is an enclosure that contains a number of HDDs together with the supporting hardware and software to implement RAID. A subset of the disk drives within a RAID array can be grouped to form logical arrays, also known as a RAID set or a RAID group.

Logical arrays are composed of logical volumes (LVs). The operating system recognizes the LVs as if they were physical HDDs managed by a RAID controller. The number of hard disk drives in a logical array depends on the RAID level used.

FIG-5 COMPONENTS OF RAID ARRAY (Host, RAID controller, physical array, logical array)

Various levels of RAID:

LEVEL | BRIEF DESCRIPTION
RAID 0 | Striped stack with no fault tolerance
RAID 1 | Disk mirroring with fault tolerance
RAID 3 | Parallel access array with a dedicated parity disk
RAID 4 | Striped array with independent disks and a dedicated parity disk
RAID 5 | Striped array with independent disks and distributed parity
RAID 6 | Striped array with independent disks and dual distributed parity
0+1 | Striping, then mirroring
1+0 | Mirroring, then striping

1. RAID LEVELS

Depending on the business need, one can identify the RAID level that best suits the organization.


1.1 RAID 0

RAID 0 is not a true RAID: it offers no redundancy and simply distributes the data evenly across two or more drives. In level 0, data is striped across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss.

FIG-6 RAID 0

It utilizes the full disk capacity by distributing strips of data over multiple HDDs in the RAID set. To read data, all the strips are put back together by the controller. As the number of drives in the RAID array increases, performance increases, because more data can be read or written simultaneously.

1.2 RAID 1

In RAID 1 data is mirrored to improve fault tolerance. A RAID 1 group consists of at least two hard disk drives. As explained under mirroring, every write is written to both disks, which is transparent to the host in a hardware RAID implementation. In the event of disk failure, the impact on data recovery is the least among all RAID implementations. This is because the RAID controller uses the mirror drive for data recovery and continuous operation.

FIG-7 RAID 1 (two mirror sets)

RAID 1 is suitable for applications that require high availability.

1.3 RAID 3

RAID 3 stripes data for high performance and uses parity for improved fault tolerance. Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails.

FIG-8 RAID 3


RAID Level 3 stripes data at a byte level across several drives, with parity stored on one drive. For example, if four drives are used for data, one additional drive is used for parity. Therefore, the total disk space required is 1.25 times the size of the data disks.

RAID 3 always reads and writes complete stripes of data across all disks, as the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe.
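The 1.25 figure quoted for RAID 3 corresponds to four data disks plus one parity disk. The following sketch, with the hypothetical helper raw_to_usable_ratio, shows how the ratio of total to usable space changes with the number of data disks.

```python
def raw_to_usable_ratio(data_disks: int, parity_disks: int = 1) -> float:
    """Total disk space needed, as a multiple of the space holding data."""
    if data_disks < 1 or parity_disks < 0:
        raise ValueError("need at least one data disk and no negative parity count")
    return (data_disks + parity_disks) / data_disks


print(raw_to_usable_ratio(4))  # 1.25 (four data disks, one parity disk)
print(raw_to_usable_ratio(2))  # 1.5  (two data disks, one parity disk)
print(raw_to_usable_ratio(1))  # 2.0  (same overhead as a mirrored pair)
```

The overhead of a dedicated parity disk therefore shrinks as more data disks share it, whereas mirroring always doubles the space required.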

1.4 RAID 4

RAID Level 4 is quite similar to RAID 3. The only difference is that it stripes data at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive.

The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that the parity data be updated each time.

Unlike RAID 3, the data disks in RAID 4 can be accessed independently, so that a specific data element can be read or written on a single disk without reading or writing an entire stripe.
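The cost of updating parity on every write can be made concrete. Assuming XOR parity, a small write to one strip does not need the whole stripe: the new parity follows from the old data, the new data and the old parity. The sketch below uses hypothetical function names and checks the shortcut against a full recomputation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write parity update for a single-strip write.

    Removing the old data from the parity and adding the new data gives
    the same result as recomputing parity over the whole stripe.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


strips = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_bytes(xor_bytes(strips[0], strips[1]), strips[2])

new_strip = b"\xaa\x55"  # overwrite the second strip only
new_parity = updated_parity(strips[1], new_strip, parity)

full_recompute = xor_bytes(xor_bytes(strips[0], new_strip), strips[2])
print(new_parity == full_recompute)  # True
```

Each such small write costs two reads (old data, old parity) and two writes (new data, new parity), and in RAID 4 every one of them involves the same dedicated parity disk.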

1.5 RAID 5

RAID 5 is quite similar to RAID 4. The only difference is that the parity is distributed across all the disks. In RAID 4, parity is written to a dedicated drive, which creates a write bottleneck at the parity disk. The distribution of parity in RAID 5 overcomes this write bottleneck.
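One way to picture distributed parity is to rotate the parity strip by one disk from stripe to stripe. The rotation rule below is an assumption chosen for illustration (implementations differ in the exact layout), and the function names are hypothetical.

```python
def parity_disk(stripe_index: int, disk_count: int) -> int:
    """Disk that holds parity for the given stripe.

    Assumed layout: parity starts on the last disk and moves one disk
    to the left with every stripe, wrapping around.
    """
    if disk_count < 2:
        raise ValueError("at least two disks are required")
    return (disk_count - 1 - stripe_index) % disk_count


def stripe_layout(stripe_index: int, disk_count: int) -> list[str]:
    """Mark each disk of a stripe as data ('D') or parity ('P')."""
    holder = parity_disk(stripe_index, disk_count)
    return ["P" if disk == holder else "D" for disk in range(disk_count)]


for stripe_index in range(4):
    print(stripe_index, stripe_layout(stripe_index, 4))
# 0 ['D', 'D', 'D', 'P']
# 1 ['D', 'D', 'P', 'D']
# 2 ['D', 'P', 'D', 'D']
# 3 ['P', 'D', 'D', 'D']
```

Since consecutive stripes keep their parity on different disks, parity updates for writes to different stripes are spread over all drives instead of queuing on one.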

1.6 RAID 6

Two disk failures in a RAID set lead to data unavailability and data loss in single-parity schemes such as RAID 3, RAID 4 and RAID 5.

An increasing number of drives in an array and increasing drive capacity lead to a higher probability of two disks failing in a RAID set. RAID 6 protects against two disk failures by maintaining two parities: horizontal parity, which is the same as RAID 5 parity, and diagonal parity, which is calculated by taking diagonal sets of data blocks from the RAID set members. Even-Odd and Reed-Solomon are two commonly used algorithms for calculating parity in RAID 6.

1.7 Nested RAID (1+0, 0+1)

Most data centers require both data redundancy and performance from their RAID arrays. RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1. They use both the striping and the mirroring techniques.

A common misconception is that RAID 1+0 and RAID 0+1 are the same. Under normal conditions, RAID levels 1+0 and 0+1 offer identical benefits. However, rebuild operations in the case of disk failure differ between the two. RAID 1+0 is also called a striped mirror. The basic element of RAID 1+0 is a mirrored pair, which means that the data is first mirrored and then both copies of the data are striped across multiple HDDs in the RAID set. When a failed drive is replaced, only the mirror is rebuilt.

The basic element of RAID 0+1 is a stripe. This means that the process of striping data across HDDs is performed initially and then the entire stripe is mirrored. If one drive fails, then the entire stripe is faulted. A rebuild operation copies the entire stripe, copying data from each disk in the healthy stripe.
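The difference between the two nested levels shows up when more than one disk fails. The sketch below is a simplified model with hypothetical function names: in RAID 1+0, disks (0,1), (2,3), ... are assumed to be mirrored pairs, and in RAID 0+1 the first half of the disks is assumed to form one stripe set and the second half its mirror.

```python
from itertools import combinations


def raid10_survives(failed: set[int], disk_count: int) -> bool:
    """RAID 1+0: data is lost only if both disks of some mirrored pair fail."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("disk_count must be even and at least 4")
    return all(not {2 * i, 2 * i + 1} <= failed for i in range(disk_count // 2))


def raid01_survives(failed: set[int], disk_count: int) -> bool:
    """RAID 0+1: one failed disk faults its whole stripe set, so data
    survives only while at least one stripe set has no failed disk."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("disk_count must be even and at least 4")
    half = disk_count // 2
    first_set_ok = not any(disk < half for disk in failed)
    second_set_ok = not any(disk >= half for disk in failed)
    return first_set_ok or second_set_ok


two_disk_failures = [set(pair) for pair in combinations(range(4), 2)]
print(sum(raid10_survives(f, 4) for f in two_disk_failures))  # 4 of 6 survive
print(sum(raid01_survives(f, 4) for f in two_disk_failures))  # 2 of 6 survive
```

Under these assumptions a four-disk RAID 1+0 set survives more of the possible double failures than a four-disk RAID 0+1 set, in addition to rebuilding only one mirror rather than a whole stripe.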

FIG-9 RAID 5 (RAID controller striping data blocks A to S across five disks, with parity blocks Ep, Jp, Op and Tp distributed among them)

RAID | Min. HDDs | Cost | READ performance | WRITE performance
RAID 0 | 2 | Low | Very good for both sequential and random reads | Very good
RAID 1 | 2 | High | Good; better than a single disk | Good; slower than a single disk
RAID 3 | 3 | Moderate | Good for random reads | Poor to fair for small random writes
RAID 4 | 3 | Moderate | Good for random reads | Poor to fair for small random writes
RAID 5 | 3 | Moderate | Very good for random reads; good for sequential reads | Fair for random writes; slower due to parity overhead
RAID 6 | 4 | Moderate, but more than RAID 5 | Very good for random reads; good for sequential reads | Good for small, random writes (has write penalty)
0+1 | 4 | High | Very good | Good
1+0 | 4 | High | Very good | Good

II. CONCLUSION

Based on our discussion we derive several findings about the redundant array of independent disks. RAID 1 is preferable to RAID 0 when availability of data is the main concern, since RAID 0 offers no redundancy; RAID 0 suits cases where performance matters more than protection. RAID 3 and RAID 4 are quite similar; the only difference lies in the way they stripe data (at byte level and at block level respectively). Both protect data with parity rather than with the full mirroring of RAID 1. RAID 5 improves on RAID 4 by distributing parity, which removes the write bottleneck of a dedicated parity disk. Lastly, RAID 6 offers the strongest protection of the levels discussed, since it tolerates two disk failures, although its performance can still be improved.

