

The Strategy for Fast I/O in Degraded Mode of RAID-5

*DONG-JAE KANG, *CHANG-SOO KIM, **BUM-JOO SHIN

*Computer & System Lab. ETRI

161 Gajeong-Dong, Yuseong-Gu, Daejeon, 305-350

KOREA

**

1025-1 Naei-Dong Miryang Gyeongnam

KOREA

Abstract: - RAID has been used for high performance and high availability of data, but it has the problem that the cost of data regeneration for supporting high availability in disk failure mode (degraded mode) and the cost of data rebuilding for recovery of a failed disk are very expensive. In this paper, we propose a strategy for fast I/O in degraded mode of RAID5. For the proposed method, we add a spare disk to the traditional RAID5 architecture and use it to obtain better performance in degraded mode and to reduce the cost at disk recovery time. When an I/O request (a read or write operation) occurs for a block in the failed disk, the requested block can be regenerated correctly by reading the corresponding blocks from the surviving member disks and computing the exclusive-OR of their contents. The regenerated block is saved to the additional spare disk, and the recovery information is then marked in the S.D (Spare Disk) recovery bitmap. Requests after this process can use the block recovered in the spare disk, and this is done repeatedly.

This strategy allows data to be read or written with normal performance by accessing spare disk blocks instead of failed disk blocks, and it requires only the data not accessed in degraded mode to be recovered at rebuilding time. By using the proposed strategy, we obtain several advantages: better performance in degraded mode of RAID5, low cost at disk recovery time, and short response time.

Key-Words: - RAID5, Degraded mode, I/O performance, Spare disk, Rebuilding, Regeneration

1 Introduction

RAID improves I/O performance because it tends to balance a stream of either sequential or random I/O requests approximately evenly across the member disks, and it improves data availability by storing redundant data that allows user data to be regenerated if the disk on which it is stored fails. Additionally, it simplifies storage management by aggregating multiple physical disks into one large virtual disk and treating more storage capacity as a single management entity. RAID5 is most frequently used because it minimizes the write performance bottleneck and supports fault tolerance by distributing successive parity stripes cyclically across some or all member disks. But it has the problem that the cost of data regeneration for offering high availability in disk failure mode (degraded mode) and the cost of data rebuilding for recovery of a failed disk are very expensive. Degraded mode refers to the situation in which one disk has failed and no attempt has been made to rebuild the data and place it on another disk; the system can still work without interruption[7]. But the system load increases significantly when a single disk has failed, that is, in degraded mode, because a read operation for the failed disk causes n-1 read accesses to the related blocks on the surviving disks to regenerate the lost data block. A write operation for the failed disk causes n-1 read accesses to the related blocks on the surviving member disks to compute the parity block, and one write access to write the computed parity information to the block of the corresponding disk. As described above, RAID5 has a serious defect in degraded mode.

In this paper, we propose a strategy for fast I/O in degraded mode that also reduces the recovery cost when the data of the failed disk is rebuilt.

2 Related Work

When a single disk failure occurs in RAID5, the failed disk should be replaced with a new disk to rebuild the data. Otherwise, RAID5 performance slows down considerably, because data must be regenerated from the surviving member disks in the striping set for every read / write operation[1]. For the past several years, much research has been carried out on methods to improve the degraded-mode performance of RAID5.

When all disks in an array are operational, the array is said to be in normal mode. The array is said to be in degraded mode when a single disk in the RAID5 has failed and no attempt has been made to rebuild the data that used to be on that disk and place it on another disk. And the array is said to be in rebuild mode during the time that data on the failed disk is being rebuilt and placed on another disk[7]. To recover the data in a failed disk, RAID uses spare disk schemes: the dedicated sparing scheme, the distributed sparing scheme and the parity sparing scheme. Dedicated sparing keeps a spare drive so that data on a failed disk can be immediately rebuilt to it. Distributed sparing distributes the spare space among the disks in a RAID. Parity sparing is similar to distributed sparing but has no spare space; when a single disk fails, stripe units in different disks are combined to form a larger stripe set[2][7]. Distributed sparing and parity sparing relieve the data write bottleneck when failed data is recovered.

The spare disk schemes described above are recovery methods for data in a failed disk, and they do not guarantee I/O performance in degraded mode. But there are many methods to improve performance in degraded mode[5][7]. In one such method, if a single disk fails, the parity blocks in RAID5 are changed to normal data blocks and used as space for the data of the failed disk. This method can support normal-mode performance. But when data in the failed disk is rebuilt at recovery time, the cost of recovery, in which data is regenerated by exclusive-OR and written to the new disk, is still very expensive.

3 The Strategy for Fast I/O

Fig.1 shows the architecture for the strategy proposed in this paper. For the presented strategy, a spare disk is added to the traditional RAID5 architecture, together with an S.D (Spare Disk) Recovery Bitmap used to check whether a block in the failed disk has been recovered or not.

[Fig.1: a RAID-5 array made up of stripe units, each consisting of data units (d) and a parity unit (p), extended with a Spare Disk (s) and an S.D Recovery Bitmap]

Fig.1 Architecture for the proposed method

First, we define the terminology used in this paper. A data unit is the unit of data access supported by the array, and a parity unit represents the redundancy information generated from the bitwise exclusive-OR of a collection of data units. The data unit and the parity unit have the same size. The redundancy group formed by the data units and their parity unit is called a stripe unit.
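The relation between data units and the parity unit can be illustrated with a minimal sketch. The helper name xor_blocks and the 4-byte unit contents below are hypothetical and chosen only for illustration; the point is that the parity unit is the bitwise exclusive-OR of the data units, so any single missing unit equals the exclusive-OR of the remaining units of the same stripe unit.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise exclusive-OR of equally sized byte blocks (hypothetical helper)."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all units in a stripe must have the same size")
    # zip(*blocks) walks the blocks byte position by byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up 4-byte data units of one stripe unit.
d0 = bytes([0x11, 0x22, 0x33, 0x44])
d1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
d2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])

p0 = xor_blocks([d0, d1, d2])              # parity unit of the stripe
regenerated_d1 = xor_blocks([d0, d2, p0])  # d1 regenerated without reading d1
```

Here regenerated_d1 has the same contents as d1, which is the property the degraded-mode read relies on.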

The proposed method is intended to improve I/O performance in degraded mode. So, in this paper, we describe only the I/O strategy in single disk failure mode, that is, degraded mode.

3.1 Read Strategy

In this section, we describe the read strategy in the degraded mode of RAID5.

[Fig.2: disk0 (d0, d4, d7, p3), disk1 (d1, d5, p2, d10; failed disk), disk2 (d2, p1, d8, d11) and disk3 (p0, d6, d9, d12), with the Spare disk (s0 / d1, s1, s2, s3) and the S.D recovery Bitmap; a READ request is issued for d1]

Fig.2 Read operation in degraded mode of RAID5

Fig.2 presents the read operation in degraded mode in a system adopting the proposed strategy. To explain the read operation in degraded mode, let us assume that disk1 has failed. If a request for a data block, d1, in the failed disk occurs, the S.D Recovery Bitmap is checked before the read operation is executed to determine whether the failed block has been recovered or not. If the bitmap is marked as "not recovered", the read operation that maps to the failed disk regenerates the requested data from the surviving member disks, as illustrated in [Fig. 2]. To regenerate the block, it reads the data units (d0, d2) and the parity unit (p0) and then executes the exclusive-OR among the data read. The result is returned to the upper application. Additionally, it is saved to the block (s0) of the spare disk that belongs to the same stripe unit. When this is finished, the corresponding bit in the S.D Recovery Bitmap is marked as "recovered". Otherwise, if the bitmap is found to be "recovered" on a read operation, the request is a re-access to the block after the single disk failure. In this case, the read operation that maps to the block in the failed disk reads the data from the same block in the spare disk, without the cost of regenerating the failed data. So, a re-access request for data in the failed disk is processed with normal-mode performance. This has several advantages, such as no regeneration cost for the failed block and prevention of a sudden fall in performance, and it alleviates the problem of RAID5 in degraded mode.
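The read path described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the SANtopiaVM implementation: the class DegradedArray, its fields and the access counter are hypothetical names, and each disk is modelled as a list of byte blocks indexed by stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise exclusive-OR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class DegradedArray:
    """Hypothetical model: disks[i][s] is the unit of disk i in stripe s."""

    def __init__(self, disks, failed):
        self.disks = disks
        self.failed = failed
        stripes = len(disks[0])
        self.spare = [None] * stripes       # spare disk blocks s0, s1, ...
        self.recovered = [False] * stripes  # S.D Recovery Bitmap
        self.accesses = 0                   # disk accesses, counted for illustration

    def read_failed(self, stripe):
        """Read the failed disk's block of `stripe`."""
        if self.recovered[stripe]:
            # Re-access: one read from the spare disk, no regeneration.
            self.accesses += 1
            return self.spare[stripe]
        # First access: read every surviving unit of the stripe and XOR them.
        surviving = [d[stripe] for i, d in enumerate(self.disks) if i != self.failed]
        self.accesses += len(surviving)
        block = xor_blocks(surviving)
        # Save the regenerated block to the spare disk and mark the bitmap.
        self.spare[stripe] = block
        self.accesses += 1
        self.recovered[stripe] = True
        return block


# Stripe 0 of Fig.2 with made-up contents: d0, d1 (lost), d2 and p0.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0e"
p0 = xor_blocks([d0, d1, d2])
array = DegradedArray(disks=[[d0], [None], [d2], [p0]], failed=1)

first = array.read_failed(0)
first_cost = array.accesses                 # 3 reads plus 1 spare write
second = array.read_failed(0)
second_cost = array.accesses - first_cost   # 1 read from the spare disk
```

In this model the first read of d1 costs four accesses and every later read costs one, which matches the N and 1 access counts given for N = 4 in Section 3.4.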

3.2 Write Strategy

In this section, we describe the write strategy in the degraded mode of RAID5.

Fig.3 presents the write operation in degraded mode in a system adopting the proposed strategy. To describe the write operation in degraded mode, let us assume that disk1 has failed and that a request for data (d5) in the failed disk occurs.

[Fig.3: disk0 (d0, d4, d7, p3), disk1 (d1, d5, p2, d10; failed disk), disk2 (d2, p1, d8, d11) and disk3 (p0, d6, d9, d12), with the Spare disk (s0 / d1, s1 / d5, s2, s3) and the S.D recovery Bitmap; a WRITE request carries a New Block for d5]

Fig.3 Write operation in degraded mode of RAID5

Like the read operation, the write operation checks the S.D Recovery Bitmap before it is executed to determine whether the block is marked as "recovered" or not.

If the block is marked as "not recovered", the write operation reads the data (d4, d6) from the member disks in the stripe unit to update the corresponding parity data (p1). When this is complete, it executes the exclusive-OR operation between the data read and the buffer data to be written to the failed disk block. The result is the parity consistent with the new d5 data, and it is saved to p1 to update the parity data. Additionally, the data to be written is saved to the s1 block in the spare disk to recover the failed disk block. Finally, the corresponding bit in the S.D Recovery Bitmap is marked as "recovered". If the bitmap is marked as "recovered", the request is a re-access to a block after the disk failure. In this case, as in a normal write operation, the corresponding data blocks (d4, d6) are read to update the parity block (p1), and then the exclusive-OR operation is executed. When it is complete, the parity block and the requested data block in the spare disk are updated. When the requested block in the failed disk is a re-accessed block, the read operation for the d4 and d6 data blocks is not needed. So, the second and later accesses to the same block in the failed disk are identical to normal-mode operation. The data block recovered at write time can also be used by a subsequent read operation. The strategy supports normal-mode performance in degraded mode by allowing input and output from/to the spare disk, and it improves the delayed response time and the sudden fall of system performance in degraded mode.
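The two write cases can be sketched as follows. The function write_failed and its parameters are hypothetical. For the "not recovered" case the sketch follows the description above. For the "recovered" case it assumes the usual read-modify-write parity update (new parity = old parity XOR old data XOR new data, with the old data taken from the spare disk); this is one reading of the statement that d4 and d6 need not be read on a re-access, and the text does not spell out the exact procedure.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise exclusive-OR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def write_failed(units, failed, parity, stripe_no, new_data, spare, recovered):
    """Write new_data to the failed disk's unit of one stripe.

    units     -- list with one unit per disk for this stripe (failed entry unused)
    failed    -- index of the failed disk
    parity    -- index of the disk holding this stripe's parity unit
    spare     -- dict: stripe number -> block stored on the spare disk
    recovered -- set of stripe numbers marked "recovered" in the bitmap
    """
    if stripe_no not in recovered:
        # First access: read the surviving data units and rebuild the parity.
        others = [u for i, u in enumerate(units) if i not in (failed, parity)]
        units[parity] = xor_blocks(others + [new_data])
        recovered.add(stripe_no)
    else:
        # Re-access (assumed read-modify-write): old data comes from the spare.
        old_data = spare[stripe_no]
        units[parity] = xor_blocks([units[parity], old_data, new_data])
    spare[stripe_no] = new_data


# Stripe 1 of Fig.3 with made-up contents: d4, d5 (lost), p1 and d6.
d4, d5_old, d6 = b"\x04\x40", b"\x05\x50", b"\x06\x60"
units = [d4, None, xor_blocks([d4, d5_old, d6]), d6]
spare, recovered = {}, set()

new1, new2 = b"\x55\xaa", b"\x0f\xf0"
write_failed(units, 1, 2, 1, new1, spare, recovered)
parity_after_first = units[2]
write_failed(units, 1, 2, 1, new2, spare, recovered)
parity_after_second = units[2]
```

After either write, the parity unit equals the exclusive-OR of d4, d6 and the latest d5 contents, so the stripe stays consistent whichever path is taken.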

3.3 Data Rebuilding Strategy

Fig.4 presents the rebuilding process for data in the failed disk under the proposed scheme.

s0, s1 and s4 are blocks that were accessed by read and write operations in degraded mode, and they hold recovered blocks of the failed disk. When a block in the failed disk is accessed by an I/O operation, it is recovered to the same block in the spare disk, and when the block is re-accessed it can be served as in the normal disk state.

[Fig.4: RAID-5 array (disk0 to disk3) with the failed disk (d1, d5, p2, d10, d14, d17), the Spare disk (s0 / d1, s1 / d5, s2, s3, s4 / d14, s5) and the S.D Recovery Bitmap (entries 0 to 5)]

Fig.4 Rebuilding failed disk in the proposed strategy

As more I/O operations are processed for the failed disk in degraded mode, more data blocks of the failed disk are recovered in the spare disk. So the performance of a system adopting the proposed strategy approaches normal-mode performance, and the cost of I/O in degraded mode is reduced.

Rebuilding, or reconstruction, means making a replacement disk's contents consistent with those of the remaining normal member disks. This is done by reading the corresponding stripe unit data from each of the surviving original member disks, computing the exclusive-OR of these stripe units' contents, and writing the result to the replacement disk. Rebuilding is a time-consuming job. It can last up to several hours for a large disk. So a RAID system must be capable of operating while rebuilding is in progress. Operating during the rebuilding process tends to increase rebuild time, but provides the compensating benefit of continuous data availability to applications. In a traditional RAID5 system, rebuilding is processed over all blocks of the failed disk, so its overhead is very serious.


But the presented strategy has the advantage that it can execute selective rebuilding for only the blocks not accessed in degraded mode, that is, the blocks not yet recovered in the spare disk. In [Fig. 4], s0, s1 and s4 are the blocks recovered after the single disk, disk1, failed. So, when data rebuilding is needed to recover the failed disk, only s2, s3 and s5 are selected. The rebuilding process in the proposed strategy is the same as in a traditional RAID5 system, except that it supports selective rebuilding. When selective rebuilding is completed, the spare disk replaces the failed disk to return the RAID5 system to normal. Selective rebuilding has a low disk recovery cost compared with the traditional system, and it alleviates the sudden fall of system performance at recovery time.
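Selective rebuilding can be sketched as a loop over the S.D Recovery Bitmap that regenerates only the blocks still marked "not recovered". The function selective_rebuild and the block contents are hypothetical; the bitmap state mirrors Fig.4, where entries 0, 1 and 4 are already recovered. For brevity the sketch keeps the parity on one disk, which does not affect regeneration because the exclusive-OR of all units in a stripe is zero wherever the parity unit is placed.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise exclusive-OR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def selective_rebuild(disks, failed, spare, recovered):
    """Regenerate only the not-yet-recovered blocks onto the spare disk.

    Returns the list of stripe numbers that actually had to be rebuilt.
    """
    rebuilt = []
    for stripe, done in enumerate(recovered):
        if done:
            continue  # already recovered by an earlier read or write
        surviving = [d[stripe] for i, d in enumerate(disks) if i != failed]
        spare[stripe] = xor_blocks(surviving)
        recovered[stripe] = True
        rebuilt.append(stripe)
    return rebuilt


# Made-up array: 4 disks, 6 stripes, 2-byte units, parity kept on disk 3.
stripes = 6
full = [[bytes([16 * i + s, 7 * s + i]) for s in range(stripes)] for i in range(3)]
full.append([xor_blocks([full[0][s], full[1][s], full[2][s]]) for s in range(stripes)])

failed = 1
lost = full[failed]  # contents that were on the failed disk
disks = [col if i != failed else [None] * stripes for i, col in enumerate(full)]

# Bitmap state of Fig.4: entries 0, 1 and 4 were recovered in degraded mode.
recovered = [True, True, False, False, True, False]
spare = [lost[s] if recovered[s] else None for s in range(stripes)]

rebuilt = selective_rebuild(disks, failed, spare, recovered)
```

Only stripes 2, 3 and 5 are regenerated here, after which the spare disk holds the full contents of the failed disk and can take its place.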

3.4 Advantages of Supposed Strategy

In this section, we describe several advantages that the spare disk scheme provides. Assume that the number of disks making up the RAID5 array is N.

First, when a read operation is processed in degraded mode, regeneration of a failed disk block always requires N-1 disk accesses in traditional RAID5 systems: reading N-2 data blocks and 1 parity block from the corresponding member disks.

In the proposed method, the first read operation to a failed disk block requires N disk accesses, but read operations for re-accessed failed disk blocks need 1 disk access, as when the read operation is processed in normal mode.

Second, when a write operation is executed in degraded mode, a write to a failed disk block always requires N disk accesses and a bitwise operation in traditional RAID5 systems: reading N-2 data units and 1 parity unit in the stripe unit, an exclusive-OR operation among the data units and the parity unit, and 1 disk access to write the updated parity block to the related disk. In the proposed method, the first write to a failed disk block requires N+1 disk accesses, but write operations for re-accessed failed disk blocks require the same number of disk accesses as a write in normal mode.

Third, when disk recovery (rebuilding) is processed, a traditional RAID system requires recovery of all blocks in the failed disk, but the proposed method requires recovery of selected blocks only, namely the blocks not accessed by read or write operations in degraded mode.

The advantages described above can alleviate the sudden fall of system performance in degraded mode and reduce the cost of rebuilding at recovery time. They allow the system's performance in degraded mode to approach that of normal mode.
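The access counts above can be tabulated for a concrete array size. The sketch below simply encodes the counts stated in this section; the function names and the averaging step are illustrative additions. The average assumes that a fraction r of the reads to failed disk blocks are re-accesses, so the mean cost per read is (1 - r) N + r, which falls below the traditional N-1 once r > 1/(N-1).

```python
def disk_accesses(n):
    """Disk accesses per request to a failed-disk block, as counted in Section 3.4."""
    return {
        ("traditional", "read"): n - 1,
        ("proposed", "first read"): n,
        ("proposed", "re-read"): 1,
        ("traditional", "write"): n,
        ("proposed", "first write"): n + 1,
    }


def average_read_cost(n, reaccess_ratio):
    """Mean accesses per read of a failed-disk block in the proposed scheme.

    reaccess_ratio is the assumed fraction of reads that hit an already
    recovered block (1 access); the remaining reads cost n accesses.
    """
    return (1 - reaccess_ratio) * n + reaccess_ratio * 1


n = 4                      # array size used in the evaluation (Table 1)
counts = disk_accesses(n)
break_even = 1 / (n - 1)   # re-access ratio at which the mean cost equals n - 1
cost_at_break_even = average_read_cost(n, break_even)
```

For N = 4 the break-even re-access ratio is one third: below it the extra spare-disk write makes the proposed scheme slightly more expensive per read than traditional degraded mode, and above it the proposed scheme is cheaper.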

4 Evaluation and Analysis

In this section, we report the experimental results obtained with the proposed strategy. The experiments were designed to compare the performance of the degraded mode and the normal mode of RAID5 and to analyze the cost of the rebuilding process. The test environment is described in Section 4.1.

4.1 Environments for Evaluation

We used SANtopiaVM (SANtopia Volume Manager), a logical volume manager, to test the proposed strategy. It has been implemented at Computer & System Dep., ETRI, in Korea.

[Fig.5: Host Machine with a Fiber channel HBA, Switch, and Disk Array of data disks]

Fig.5 Environment for testing the strategy

Table 1 Detailed environmental value for testing

Env. elements        Specification
Computer             Compaq Server (smp)
O.S                  Linux 6.2 (kernel-2.2.18)
RAM                  512
Test System          SANtopia Volume Manager
Switch to Storage    SAN (Fibre channel)
Host to Switch       LAN
Storage              RAID-5 (500M * 4)
Stripe unit          128K

SANtopiaVM (SVM) presents the physical storage as one or more virtual disks (volumes) by converting I/O requests directed to a virtual disk into I/O operations on the underlying physical member disks. It can simplify the management of many storage devices. SVM supports various RAID levels: linear, striping, mirroring, and parity RAID. Fig. 5 represents the test environment. The whole system consists of a host machine, a switch and storage groups, and they are connected with fiber channel or ethernet. The connection between the host machine and the switch is a LAN, used to transfer TCP/IP based messages among the hosts. And the connection between the switch and the storage groups is fiber channel, used to transfer SCSI based I/O data. The detailed specification is given in Table 1.

4.2 Performance Evaluation

We present the performance of read / write operations in degraded mode and in normal mode, and analyze it. Then we compare the two cases to verify that the proposed method can improve the performance in degraded mode and reduce the cost of rebuilding. Each test is performed three times to obtain reliable values.

4.2.1 Performance of Read Operation

Fig.6 presents the performance of the read operation in each mode: normal mode, degraded mode and S.D mode. In this test, degraded mode means that of a traditional RAID5 not adopting the spare disk scheme, and S.D mode means degraded mode adopting the proposed method, the spare disk scheme.

Fig.6 shows that the I/O cost in degraded mode is about three times that of normal mode. A read operation in normal mode is satisfied with 1 disk access, but one in degraded mode requires N-1 disk accesses when the RAID5 array consists of N disks.

[Fig.6: Required Time (sec, axis 0 to 1200) versus I/O Block Size (1M, 10M, 100M, 500M, 1G; 1KB / block) for Normal mode, Failure mode and S.D mode]

Fig.6 Read performance in each mode

That is, a read operation in degraded mode reads the data units and the parity unit to regenerate the requested block. So the read operation in degraded mode suffers from a sudden fall of system performance. But the proposed method improves it by supporting I/O through the additional spare disk. The proposed strategy regenerates the failed disk blocks and saves them in the spare disk. Then, the next read operation for the same block in the failed disk is served by the recovered block in the spare disk. This ensures that performance in degraded mode is improved. Additionally, at rebuilding time, the spare disk is used to reduce the cost of recovery. In Fig.6, the read performance for 10M data in S.D mode is lower than that of degraded mode because of the additional write to the spare disk. But as read operations continue, it approaches the normal state.

4.2.2 Performance of Write Operation

Fig.7 presents the performance of the write operation in each mode.

[Fig.7: Required Time (sec, axis 0 to 12000) versus I/O Block Size (1M, 10M, 100M, 500M, 1G; 1KB / block) for Normal mode, Failure mode and S.D mode]

Fig.7 Write performance in each mode

Like the graph of read performance, write performance in S.D mode is lower than in degraded mode for less than 10M of data because of the additional write to the spare disk. But good performance is obtained at the cost of this one additional disk access: re-accesses to failed disk blocks can be processed like normal-mode operations by writing data to the spare disk. The shape of the graph can be affected by the stripe unit size, the size of the data to be written, the access frequency for the same data, etc. Many re-accesses to the same data give the system good performance.

[Fig.8: Access Rate to the Same Block (%, axis 0 to 100) versus I/O Block Size (1M, 10M, 100M, 500M, 1G)]

Fig.8 shows the ratio of accesses to the same blocks as the number of blocks written increases. It was tested with a 500M capacity disk, and the other environmental values are the same. When 500M of data blocks is written, the re-access ratio is about 37%, and for 1G of data blocks it is about 57%. So, if operations in degraded mode are executed continuously, the re-access ratio to the same block increases. And the performance in S.D mode improves, because failed disk blocks are recovered in the spare disk and I/O requests for failed disk blocks are served by spare disk blocks.

4.2.3 Performance in Rebuilding mode

The proposed strategy improves performance in degraded mode by recovering failed disk data to the spare disk and supporting I/O through it. When a single disk fails in RAID, it must be replaced with a new normal disk. In the proposed method, the failed disk blocks are rebuilt while I/O requests are performed. So, at recovery time, only selective rebuilding is needed, not rebuilding of all data blocks in the failed disk, which can reduce the cost of recovery.

Fig.9 shows the recovery ratio as the number of I/O blocks in degraded mode increases. We used a 500M capacity disk to test the recovery ratio. When I/O of 500M of data is performed, the recovery ratio of the failed disk reaches about 63%, and for 1G of data blocks the recovery ratio is about 86%. So, if 500M of data blocks has already been written or read by recovery time, only 37% of the data blocks of the whole disk need to be rebuilt.

[Fig.9: Disk Recovery Rate; Recovery Rate (%, axis 0 to 100) versus I/O Block Size (1M, 10M, 100M, 500M, 1G; 1KB / block)]

Fig.9 Recovery ratio according to the number of I/O blocks

In the presented method, if re-accesses to failed disk blocks are rare, I/O performance in degraded mode will be poor, but the recovery ratio for failed disk blocks will be high. Conversely, if re-accesses to a failed block are frequent, the result will be reversed. So I/O performance in degraded mode and the recovery ratio for failed disk blocks are inversely related. The reported results will change with many testing variables, such as the number of array disks, the stripe unit size, the capacity of the disks, etc.
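The ratios reported for Fig.8 and Fig.9 can be compared with a simple model. The sketch below assumes that every I/O targets a block chosen uniformly at random and independently; the access pattern used in the tests is not stated in this paper, so this is only an assumption, and the function name expected_ratios is hypothetical. Under that assumption, after n accesses to a disk of B blocks the expected number of distinct blocks touched is B (1 - (1 - 1/B)^n). The 1G case is taken as twice the 500M disk capacity.

```python
def expected_ratios(io_blocks, disk_blocks):
    """Expected (recovery ratio, re-access ratio) under uniform random access.

    recovery ratio  = distinct blocks touched / blocks on the disk
    re-access ratio = accesses that hit an already touched block / all accesses
    """
    distinct = disk_blocks * (1 - (1 - 1 / disk_blocks) ** io_blocks)
    recovery = distinct / disk_blocks
    reaccess = 1 - distinct / io_blocks
    return recovery, reaccess


disk_blocks = 500 * 1024  # 500M disk in 1KB blocks
rec_500m, re_500m = expected_ratios(disk_blocks, disk_blocks)
rec_1g, re_1g = expected_ratios(2 * disk_blocks, disk_blocks)
```

The model gives a recovery ratio near 0.63 and a re-access ratio near 0.37 for 500M of I/O, and near 0.86 and 0.57 for twice that amount, which is close to the reported 63%, 37%, 86% and 57%. A workload with more locality than uniform random access would show a higher re-access ratio and a lower recovery ratio for the same amount of I/O.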

5 Conclusion and Future Works

RAID5 is used for high performance and high availability of data, but it has the problem that the cost of data regeneration for offering data availability in degraded mode and the cost of data rebuilding for recovery of a failed disk are very expensive. In this paper, we proposed a strategy to resolve the problem and improve the performance in degraded mode. It supports fast I/O processing by recovering failed disk blocks to a spare disk and serving I/O requests through it. Additionally, at recovery time, it can reduce the cost of recovery by using the data blocks regenerated in the spare disk. The proposed method can process I/O in degraded mode like normal-mode operations after the first access to the failed blocks. So it gives the system better performance in degraded mode.

In future work, tests should be performed for various environmental items, such as the number of array disks, various stripe unit sizes, various disk capacities, etc., and the detailed results should be reported and analyzed.

References:

[1] Paul Massiglia, "The RAID book" 6th edition, RAID Advisory Board, 1997.

[2] Sanghoon Jeon, Byoungchul Ahn, A Cost-Effective Solution to the Single Disk Failure in RAID Architecture, Communications, Computers and Signal Processing, 1997, pp. 285-288 vol.1.

[3] Hai Jin, Kai Hwang, Jiangling Zhang, A RAID Reconfiguration Scheme for Gracefully Degraded Operations, Parallel and Distributed Processing, 1999, PDP ’99. Proceedings of the Seventh Euromicro Workshop, 1999, pp. 66-73.

[4] M. Holland, G. Gibson, and D. Siewiorek, Fast, On-line Failure Recovery in Redundant Disk Array, In 23rd Annual International Symposium on Fault-Tolerant Computing, 1993.

[5] Muppalaneni, N, Gopinath, K, A multi-tier I/O RAID storage system with RAID1 and RAID5, Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, 2000, pp. 663-671.

[6] David A. Patterson, Garth A. Gibson, Randy H. Katz, A Case for Redundant Arrays of Inexpensive Disks(RAID), ACM SIGMOD Conference Proceedings, 1988, pp. 109-116.

[7] Jai Menon, Dick Mattson, Comparison of Sparing Alternatives for Disk Arrays, Proceeding of International Symposium on Computer Architecture, 1992, pp. 318-329.

[8] Peter M. Chen and Edward K. Lee, Striping in a RAID Level 5 Disk Array, In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 1995, pp. 136-145.

[9] Kai Hwang, Hai Jin, Roy Ho, RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing, High-Performance Distributed Computing, 2000. Proceedings, 2000, pp. 279-286.

[10] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz and D. A. Patterson, RAID: High-Performance, Reliable Secondary Storage, ACM Computing Surveys, Vol. 26, No.2, 1994, pp. 145-185.

[11] G. Gibson and D. Patterson, Designing Disk Array for High Data Reliability, Journal of Parallel and Distributed Computing, January, 1993, pp. 4-27.

[12] Technology Forums, Ltd., Is RAID Five Levels of Confusion?, Computer Technology Review, February, 1993, pp. 37.

[13] N. H. Vaidya, A Case for Two-Level Distributed Recovery Schemes, Proc. ACM International Conf. on Measurement and Modeling of Computer Systems(Sigmetrics '95), pp. 64-73.

[14] P. Cao, S. B. Lim, S. Venkataraman, and J. Wilkes, The TickerTAIP Parallel RAID Architecture, ACM Trans. on Computer System, Vol.12, No.3, 1994, pp. 236-296.

[15] G. A. Alvarez, W. A. Burkhard, and F. Cristian, Tolerating Multiple Failures in RAID Architectures with Optimal Storage and Uniform Declustering, Proceedings of the 24th Annual ACM/IEEE International Symposium on Computer Architecture, 1997, pp. 62-72.

[16] A. Thomasian, and J. Menon, RAID 5 Performance with Distributed Sparing, IEEE Transactions on Parallel and Distributed Systems, Vol.8, No.6, 1997, pp. 640-657.

[17] M. Y. Lee, M. S. Park, Double Parity Sparing for Performance Improvement in Disk Arrays, Proceeding of the International Conference on Parallel and Distributed Systems, 1996.