Technical White paper RAID Protection and Drive Failure Fast Recovery


RAID protection is a key part of all ETERNUS® Storage Array products. Choosing the RAID level that meets a customer's application requirements involves evaluating several aspects of the application's demands. This paper addresses those aspects, including operation in degraded mode, the time to recover to a protected state, and the time to restore the fully configured state. The introduction of the Fast Recovery feature changes some long-held assumptions about the use of the different RAID levels and provides recovery to a protected state in a small fraction of the time required by the other choices.

Table of Contents

1 Introduction
2 Definitions
3 Failure / Protection Relationships
3.1 RAID1 and RAID10 Failed Drive Operations
3.2 RAID5 Failed Drive Operations
3.3 RAID6 Failed Drive Operations
3.4 RAID6-FR Failed Drive Operations
4 The Cost of Protection
5 Recovery of Protection
5.1 Copy-back Full Configuration Restore Considerations
6 Conclusions
7 Minimum and Nominal Rebuild Time Charts

List of Figures

Figure 1 - Minimum Rebuild Times with very low host traffic
Figure 2 - Nominal Rebuild Times with host traffic

List of Tables

Table 1 - Valid RAID6-FR RAID Group Combinations
Table 2 - Normalized Relative Rebuild Rates
Table 3 - Usability / Protection Relationships

1 Introduction

RAID has been the standard means of protecting against loss of data in storage arrays for many years, and several organizational forms are available within the ETERNUS Storage Array product family. There are three key aspects of protecting against data loss when drives fail: first, operation when the failure is recognized; second, recovery to a protected state; and third, when the failed drive is replaced, restoration to the fully configured state.

This paper is not intended to be a tutorial on RAID technology, as the technology is well covered in many existing documents, both within Fujitsu and in publicly available papers.

2 Definitions


■ Copy-back Mode – a mode of operation in which, once a failed drive is replaced, the protection data held on the Hot Spare is copied back to the replacement drive while the RAID Group remains in a Fully Protected state.

■ Copy-back-less Mode – a mode of operation where a replacement drive does not assume the role of the drive it is replacing, but leaves the protection data in the rebuilt target location(s).

■ Global Hot Spare (GHS) – one or more drives within an array that can be used in any of several RAID Groups to replace a failed drive through the rebuild process appropriate for that RAID organization.

■ Dedicated Hot Spare – a drive that is assigned to a RAID Group but does not hold active data; instead, it is available to replace a failed drive within that group only.

■ RAID6-FR – a special form of RAID6 Group that includes a Dedicated Hot Spare-like drive, with all drives active in the normal operation of the group. The Hot Spare space is distributed across all of the drives in the group. Only specific member drive combinations are valid, where “xD” represents the number of Data drives, “2P” designates two Parity drives, “1HS” indicates one Dedicated Hot Spare drive, and the multiplier after the parentheses gives the number of (xD+2P) sets that share the spare.

RAID6-FR Organization          User Drives       Total Drives
(Ordered by Total Drives)      per RAID Group    per RAID Group
(3D+2P)x2+1HS                  6                 11
(6D+2P)x2+1HS                  12                17
(9D+2P)x2+1HS                  18                23
(12D+2P)x2+1HS                 24                29
(5D+2P)x4+1HS                  20                29
(13D+2P)x2+1HS                 26                31
(3D+2P)x6+1HS                  18                31

Table 1 - Valid RAID6-FR RAID Group Combinations
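Because the notation fully determines the drive counts, the entries in a table like Table 1 can be derived mechanically. The sketch below is illustrative only: `parse_fr`, `FrGroup` and the regular expression are hypothetical helpers, not part of any ETERNUS tool, and they assume the "(xD+yP)xN+sHS" pattern described above.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the RAID6-FR notation, e.g. "(3D+2P)x2+1HS".
FR_PATTERN = re.compile(r"\((\d+)D\+(\d+)P\)x(\d+)\+(\d+)HS")


class FrGroup(NamedTuple):
    user_drives: int
    total_drives: int

    @property
    def usability_factor(self) -> float:
        # Share of the group's drives that hold user data.
        return self.user_drives / self.total_drives


def parse_fr(notation: str) -> FrGroup:
    """Derive drive counts from RAID6-FR notation (xD+yP)xN+sHS."""
    m = FR_PATTERN.fullmatch(notation)
    if m is None:
        raise ValueError(f"not RAID6-FR notation: {notation!r}")
    data, parity, sets, spares = (int(g) for g in m.groups())
    user = data * sets                       # only data drives hold user data
    total = (data + parity) * sets + spares  # data + parity + distributed spare
    return FrGroup(user, total)


for n in ["(3D+2P)x2+1HS", "(5D+2P)x4+1HS", "(3D+2P)x6+1HS"]:
    g = parse_fr(n)
    print(f"{n:16} user={g.user_drives:2} total={g.total_drives:2} "
          f"UF={g.usability_factor:.2f}")
```

For example, (5D+2P)x4+1HS gives 5 × 4 = 20 user drives and (5 + 2) × 4 + 1 = 29 total drives.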

■ Usability Factor – the portion of the total space in the drives of a group that can be used to hold user data (see Table 3). The higher the usability factor, the fewer drives are needed for a given amount of user storage; conversely, the lower the usability factor, the more drives are needed.

3 Failure / Protection Relationships

When a disk drive fails within a RAID protected set, the pattern of activity changes. Because the data space held on the failed drive is no longer available, accesses to it require special processing that depends upon the organization of the RAID Group. The following terms describe the relationship between failures and protection:

■ Failure Probability – indicates the probability that a device will fail, in this case that there will be a failure in a disk drive. There are a number of different failures that can occur within a disk drive that are important to consider in choosing a RAID Group organization.

■ Degraded Operation Time – indicates the period of time during which the RAID Group must reconstruct portions of the data that were held on the failed drive.

■ Recover Protection Time – indicates the period of time during which the RAID Group has less than the expected level of fault protection. This is the time during which a Hot Spare may be used to rebuild the content of the failed drive. During this time, host accesses will experience longer than normal response times.

■ Restore Protection Time – indicates the period of time required to fully restore the RAID Group to the planned configuration. This includes the time required to obtain the replacement drive, install it in the array, and restore it into the planned configuration role.

3.1 RAID1 and RAID10 Failed Drive Operations

In the case of RAID1 or RAID10, instead of Read operations being balanced across the two mirrored drives, all reads must be serviced by the surviving drive of the pair. Likewise, writes that would normally go to both drives of the pair can be written only to the surviving drive. Although a RAID10 group can survive the failure of another drive, if the second failing drive is the mate of the already failed drive, the second failure results in data loss.

If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin to restore protection to the group right away. The operation of rebuilding a RAID1 or RAID10 member involves copying the data from the surviving mate of the pair over to the replacement drive (either the Hot Spare or new replacement drive). The maximum rate of the rebuild is determined by the Write rate of the single drive in the copy operation.
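The RAID10 exposure described above can be quantified under a simple, hypothetical assumption: every drive is equally likely to fail first, and every surviving drive is equally likely to fail second. The sketch enumerates all ordered pairs of failures and counts those that hit a mirrored pair; `mate_loss_probability` is an illustrative helper, not an ETERNUS metric.

```python
from fractions import Fraction
from itertools import permutations


def mate_loss_probability(pairs: int) -> Fraction:
    """Probability that a second failure hits the mate of the first failed drive.

    Assumes (hypothetically) uniformly random failures. Drives 2k and 2k+1
    form mirror pair k.
    """
    drives = range(2 * pairs)
    outcomes = list(permutations(drives, 2))  # ordered (first, second) failures
    lost = sum(1 for a, b in outcomes if a // 2 == b // 2)
    return Fraction(lost, len(outcomes))


print(mate_loss_probability(4))  # RAID10(4+4): 8 drives
print(mate_loss_probability(1))  # RAID1: a single mirrored pair
```

For RAID10(4+4) this gives 1/7, since exactly one of the seven surviving drives is the mate; for RAID1 the probability is 1, because any second failure is the mate.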


3.2 RAID5 Failed Drive Operations

In the case of RAID5, Read accesses to the failed drive require the data to be reconstructed from the surviving drives within the RAID Group. For example, in a RAID5(4D+1P) Group, four drives must be read to reconstruct the data for an access to the failed drive. Write operations involve reading from all of the surviving drives and may require writing back to one or two drives, depending upon where the data is located within the stripe.
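The reconstruction step can be illustrated with single-parity XOR, the usual basis of RAID5. This is a minimal sketch of one stripe with made-up block contents; real arrays rotate parity across drives and operate on much larger blocks, and nothing here reflects ETERNUS firmware internals.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe of a RAID5(4D+1P) group; block contents are illustrative only.
data = [b"ETER", b"NUS-", b"RAID", b"5!!!"]
parity = xor_blocks(data)

# Drive 2 fails: a read of its block XORs the four surviving blocks
# (three data blocks plus parity).
failed = 2
survivors = [blk for i, blk in enumerate(data) if i != failed] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'RAID'
```

Every degraded read of the failed drive repeats this work across four drives, which is why response times rise while the group runs without the failed member.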

If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin to restore protection to the group right away. The rebuilding of a RAID5 group involves reading from all of the surviving drives and writing to the replacement drive. If another drive fails before the rebuilding process is completed, the failure will result in data loss.

If the RAID5 group is configured in Copy-back Mode, a copy-back operation is initiated when the failed drive is replaced. This involves copying the content of the used Hot Spare drive to the new drive, and is limited by the maximum Write rate of the single drive receiving the copy.

3.3 RAID6 Failed Drive Operations

As in the case of RAID5, Read accesses in RAID6 require recovery of the data from the surviving drives within the RAID Group. In the case of a RAID6(4D+2P) Group, five drives must be read to reconstruct the data for the failed drive access. Likewise, Write operations involve reading all the surviving drives and writing two or three drives, depending upon where the data is located within the stripe.

If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin to restore full protection to the group right away. Rebuilding in a RAID6 group involves reading from all of the surviving drives and writing to the replacement drive. If another drive fails while the rebuild is active, the rebuild can still complete without data loss, but the additional failed drive must then be treated as well. The maximum rate of the rebuild is limited by the Write rate of the single target drive.

If the RAID6 group is configured in Copy-back Mode and a Hot Spare drive was used for the initial recovery, a copy-back operation is initiated when the failed drive is replaced. This involves copying the content of the used Hot Spare drive to the new drive and is limited by the maximum Write rate of that single drive.
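RAID6 survives a second failure because each stripe carries two independent check values. The paper does not say which code ETERNUS uses, so the sketch below assumes the widely documented P/Q scheme (P is a plain XOR; Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and polynomial 0x11d). It shows a data drive being rebuilt while the P drive is also unavailable. All function names are illustrative.

```python
from __future__ import annotations


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the reducing polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every nonzero element satisfies a^255 = 1, so a^254 is its inverse.
    return gf_pow(a, 254)


def syndromes(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of the data blocks; Q = XOR of g^i * D_i with g = 2."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_with_q(data: list[bytes | None], q: bytes, x: int) -> bytes:
    """Rebuild data drive x when both it and the P drive are lost."""
    partial = bytearray(q)
    for i, block in enumerate(data):
        if i != x:  # strip the surviving drives' contributions from Q
            coef = gf_pow(2, i)
            for j, byte in enumerate(block):
                partial[j] ^= gf_mul(coef, byte)
    inv = gf_inv(gf_pow(2, x))  # what remains is g^x * D_x
    return bytes(gf_mul(inv, byte) for byte in partial)


# RAID6(4D+2P) stripe with illustrative contents.
data = [b"RAID", b"6-FR", b"DEMO", b"4D2P"]
p, q = syndromes(data)
lost = 1
damaged = list(data)
damaged[lost] = None  # data drive 1 is gone, and P is assumed lost too
print(recover_with_q(damaged, q, lost))  # b'6-FR'
```

Losing two data drives instead is also recoverable with P and Q together, but requires solving a two-unknown system and is omitted here for brevity.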

With large NL-SAS drives, both rebuild and copy-back can take many hours, and with host traffic active, very large drives can easily take more than a day to complete the rebuild or copy-back. Because RAID6 already protects against a second drive failure, and given a maintenance contract that provides a replacement for the failed drive within a day, the recommendation has been not to use Hot Spare drives for these RAID Groups.

3.4 RAID6-FR Failed Drive Operations

RAID6-FR introduces a new form of RAID6 Group that includes one extra drive in support of two or more RAID6 groups. The equivalent space of one drive is provided by reserving some space on every drive in the set. Data spans all of the drives, with dual parity protection over subgroups within the set using a rotating assignment scheme.

When a drive in a RAID6-FR group fails, the rebuild begins immediately, writing to the reserved space on all of the surviving drives. The rebuild proceeds more rapidly than in the other RAID organizations because its rate is not limited by the Write rate of a single drive.

Table 2 shows the relative rebuild rates for the standard RAID6 organization and for most of the RAID6-FR organizations. The key reason that the RAID6-FR rebuild rates are much higher is that all of the surviving drives in the RAID Group provide space for the recovered data, eliminating the bottleneck of the Write rate of a single drive. This results in a much shorter period during which the group has less protection than planned in the configuration. The rates are normalized to the rebuild rate (in MB/s) of a RAID6(3D+2P) group with no host traffic; this rate is determined by the Write rate on the target drive(s) for the rebuild. The time taken to rebuild depends upon the type of drive, the size of the drive, and the amount of host traffic in the system while the rebuild is running.

RAID Organization              Rate with          Rate with
(Ordered by Rebuild Rate)      No Host Traffic    Host Traffic
RAID6(3D+2P)                   1.0                0.5
RAID6-FR(3D+2P)x2+1HS          7.4                3.8
RAID6-FR(6D+2P)x2+1HS          12.5               4.2
RAID6-FR(9D+2P)x2+1HS          17.7               5.6
RAID6-FR(13D+2P)x2+1HS         22.2               6.3
RAID6-FR(3D+2P)x6+1HS          48.0               13.3

Table 2 - Normalized Relative Rebuild Rates
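Because the Table 2 figures are rates normalized to one baseline, they can be turned into relative rebuild times for the same drive by dividing a baseline time by the rate. The sketch assumes a hypothetical 10-hour idle RAID6(3D+2P) rebuild, and it assumes that time scales inversely with the normalized rate. `rebuild_time` is an illustrative helper, not a product tool.

```python
# Normalized rebuild rates from Table 2: (no host traffic, with host traffic).
RELATIVE_RATE = {
    "RAID6(3D+2P)": (1.0, 0.5),
    "RAID6-FR(3D+2P)x2+1HS": (7.4, 3.8),
    "RAID6-FR(6D+2P)x2+1HS": (12.5, 4.2),
    "RAID6-FR(9D+2P)x2+1HS": (17.7, 5.6),
    "RAID6-FR(13D+2P)x2+1HS": (22.2, 6.3),
    "RAID6-FR(3D+2P)x6+1HS": (48.0, 13.3),
}


def rebuild_time(baseline_hours: float, organization: str, host_traffic: bool) -> float:
    """Scale an idle RAID6(3D+2P) rebuild time by the Table 2 rate ratios."""
    idle, busy = RELATIVE_RATE[organization]
    rate = busy if host_traffic else idle
    return baseline_hours / rate  # time is inversely proportional to rate


baseline = 10.0  # hypothetical idle RAID6(3D+2P) rebuild time, in hours
for org in RELATIVE_RATE:
    print(f"{org:24} idle {rebuild_time(baseline, org, False):5.1f} h"
          f"  busy {rebuild_time(baseline, org, True):5.1f} h")
```

With this baseline, the RAID6 rebuild grows to 20 hours under host traffic, while RAID6-FR(3D+2P)x2+1HS finishes in roughly 1.4 hours idle and 2.6 hours busy.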


4 The Cost of Protection

Protecting against data loss does not come for free – there is a cost for various levels of protection offered by the different RAID organizations. One way to look at the cost of protection is to consider what portion of the total space offered by the drives is available for user data. This can be viewed from a high level as the ratio of user drives to total drives. This needs to be weighed against the cost of lost data when a drive fails.

Table 3 shows the levels of usability for approximately the same number of user drives and the associated protection level afforded by the different RAID organizations.

■ Protect Level 0 – indicates that there is data loss when any drive fails – the data is not protected at all.

■ Protect Level 1 – indicates that one drive in a group can fail and the data is protected against loss, but a second drive failure will cause loss.

■ Protect Level 2 – indicates that any two drives in a group can fail and the data is protected against loss.

RAID Organization               # RAID   User     Total    Usability   Protect
(Ordered by Usability Factor)   Groups   Drives   Drives   Factor      Level
RAID0(4D)                       6        24       24       1.00        0
RAID6-FR (13D+2P)x2+1HS         1        26       31       0.84        2
RAID6-FR (12D+2P)x2+1HS         1        24       29       0.83        2
RAID6-FR (9D+2P)x2+1HS          1        18       23       0.78        2
RAID5(4D+1P)+GHS                6        24       31       0.77        1
RAID6-FR (6D+2P)x2+1HS          2        24       34       0.71        2
RAID6-FR (5D+2P)x4+1HS          1        20       29       0.69        2
RAID6(4D+2P)+GHS                6        24       37       0.65        2
RAID6-FR (3D+2P)x6+1HS          1        18       31       0.58        2
RAID6-FR (3D+2P)x2+1HS          4        24       44       0.55        2
RAID10(4+4)+GHS                 6        24       49       0.49        1*

Table 3 - Usability / Protection Relationships

(1* indicates that in a RAID10 group there may be protection against another drive failure, provided it is not the mate of the first failed drive.)
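The Usability Factor column in Table 3 is simply user drives divided by total drives, where the total counts every group's data and protection drives plus any shared Global Hot Spares. A minimal sketch for a few of the rows, with `Layout` as a hypothetical helper type:

```python
from typing import NamedTuple


class Layout(NamedTuple):
    name: str
    groups: int
    data_per_group: int      # drives holding user data in each group
    overhead_per_group: int  # parity or mirror drives (plus any FR spare) per group
    global_spares: int = 0   # Global Hot Spares shared by all groups

    def user_drives(self) -> int:
        return self.groups * self.data_per_group

    def total_drives(self) -> int:
        per_group = self.data_per_group + self.overhead_per_group
        return self.groups * per_group + self.global_spares

    def usability(self) -> float:
        return self.user_drives() / self.total_drives()


layouts = [
    Layout("RAID0(4D)", 6, 4, 0),
    Layout("RAID5(4D+1P)+GHS", 6, 4, 1, 1),
    Layout("RAID6(4D+2P)+GHS", 6, 4, 2, 1),
    Layout("RAID10(4+4)+GHS", 6, 4, 4, 1),
    Layout("RAID6-FR(3D+2P)x2+1HS", 4, 6, 5),  # per group: 2 parity x 2 sets + 1 spare
]
for lay in sorted(layouts, key=Layout.usability, reverse=True):
    print(f"{lay.name:24} {lay.user_drives():3} {lay.total_drives():3} "
          f"{lay.usability():.2f}")
```

Modeling the RAID6-FR spare as per-group overhead and the GHS as array-wide overhead is what makes a single shared spare cheap in Usability Factor terms, at the cost of the exposure described in the note that follows.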

Note that Global Hot Spares are commonly used with some of the RAID organizations to reduce the number of drives and improve the Usability Factor. When the first drive failure occurs in any group protected by a Global Hot Spare, the rebuild can begin right away. If another group then encounters a failure and no spare remains, its rebuild is delayed, exposing that group to data loss if another drive fails before the Hot Spare has been replaced.
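The first-come allocation of a shared spare pool can be modeled in a few lines. This is a simplified, hypothetical model (group names such as "RG2" and spare names such as "GHS0" are invented for illustration) and does not reproduce the array's actual spare-selection rules.

```python
from collections import deque


def assign_spares(failures: list[str], spares: int) -> dict[str, str]:
    """Hand out Global Hot Spares in failure order; later failures must wait."""
    pool = deque(f"GHS{i}" for i in range(spares))
    status = {}
    for group in failures:
        if pool:
            status[group] = f"rebuilding to {pool.popleft()}"
        else:
            status[group] = "degraded, waiting for drive replacement"
    return status


# Six RAID5(4D+1P) groups share one GHS (as in Table 3); two groups lose a drive.
for group, state in assign_spares(["RG2", "RG5"], spares=1).items():
    print(group, "->", state)
```

The second group stays degraded until a drive is physically replaced, which is the exposure window the note above describes.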

5 Recovery of Protection

A key aspect of any protection mechanism is the time of exposure to additional failures and the time required to recover a degree of protection closer to that planned in the configuration. It should be clear to the reader that until the failed drive is replaced, the protection level is not at that planned for the configuration. Recovery of the primary level of protection should be completed as soon as possible to ensure against data loss on any subsequent failures. The amount of time to complete the rebuild after a drive fault varies quite widely depending upon several factors. These include:

■ RAID Organization – most RAID organizations (RAID1, RAID10, RAID5, and RAID6) require rebuilding to a single replacement or Hot Spare drive, which limits the rebuild rate to the Write rate of a single drive. RAID6-FR is able to use all of the surviving drives in the group during the rebuild process, therefore rebuilding at a much faster rate, reducing the exposure time.

■ Drive Size and Speed – the size and speed of the failed drive determine the rebuild rate and the time it takes to complete the rebuild; with larger, slower drives, rebuilding can take a long time.

■ Host Traffic – the level of host traffic on the system also affects the rebuild rate, because the rebuild normally runs at a lower priority than host demands; with heavy host traffic, the exposure time is therefore extended as well. The fastest rebuild rate, and therefore the minimum exposure time, occurs when there is very little host traffic. In this case, the RAID organization and drive type determine the exposure time.

5.1 Copy-back Full Configuration Restore Considerations

It is important to recognize that it is necessary to replace any failed drive in a timely manner, and RAID6-FR is no exception. Fast Recovery provides full protection for additional failures, but when the failed drive is replaced, it must be integrated into the RAID6-FR group to complete the restore operation. This operation requires rebuilding the content of the single drive that is being


6 Conclusions

This paper has shown that the RAID6-FR feature reduces the recovery time for the first disk failure to only one tenth of the time that other RAID organizations require. This reduces the possibility of data loss from a second disk failure. Because recovery is shorter, normal host response times return much more quickly than with conventional recovery procedures. In addition, RAID6-FR provides full protection during the drive replacement process, further guarding against data loss. Note, however, that recovery takes longer under heavy host traffic; even so, RAID6-FR recovery under traffic is still much shorter than with the other RAID organizations. As is always the case, recovery time is directly a function of the size and speed of the drives making up the RAID Group.


7 Minimum and Nominal Rebuild Time Charts

Figure 1 - Minimum Rebuild Times with very low host traffic

Figure 2 - Nominal Rebuild Times with host traffic


About Fujitsu America

Fujitsu America, Inc., is a leading ICT solutions provider for organizations in the U.S., Canada and the Caribbean. Fujitsu enables clients to meet their business objectives through integrated offerings and solutions, including consulting, systems integration, managed services, outsourcing and cloud services for infrastructure, platforms and applications; data center and field services; and server, storage, software and mobile/tablet technologies. For more information, please visit: http://solutions.us.fujitsu.com/ and http://twitter.com/fujitsuamerica

FUJITSU AMERICA, INC.

Address: 1250 East Arques Avenue Sunnyvale, CA 94085-3470, U.S.A. Telephone: 800 831 3183 or 408 746 6000

Website: http://solutions.us.fujitsu.com

Contact Form: http://solutions.us.fujitsu.com/contact


Fujitsu, the Fujitsu logo and ETERNUS are trademarks or registered trademarks of Fujitsu Limited in the United States and other countries. All other trademarks referenced herein are the property of their respective owners.

The statements provided herein are for informational purposes only and may be amended or altered by Fujitsu America, Inc. without notice or liability. Product description data represents Fujitsu design objectives and is provided for comparative purposes; actual results may vary based on a variety of factors. Specifications are subject to change without notice.

FPC65-7381-01 03/15