RAID technology allows a collection of disks to act as a single logical disk. For our data and log files, RAID can offer some or all of the advantages below, depending on the exact RAID configuration.
• Redundancy – if one of the drives fails then, depending on the RAID configuration, its data will either have been mirrored to a second drive or can be reconstructed from the other drives, so it remains accessible while the failed drive is replaced.
• Improved read and write I/O performance – reading from and writing to multiple disk spindles in a RAID array can dramatically increase I/O performance, compared to reading from and writing to a single (larger) disk.
• Higher storage capacity – by combining multiple smaller disks in a RAID array, we overcome the single-disk capacity limitations (while also improving I/O performance).
For our data files we would, broadly speaking, want a configuration optimized for maximum read performance and, for our log file, maximum write performance. For backup files, the simplest backup storage, if you're using DAS, may just be a separate, single physical drive.
Chapter 2: Planning, Storage and Documentation
However, if that drive were to become corrupted, we would lose the backup files on it, and there would be little we could do beyond sending the drive to a data recovery company, which can be both time-consuming and expensive, with no guarantee of success. Therefore, for backup files, it's just as important to take advantage of the redundancy offered by RAID storage.
Let's take a brief look at the more popular RAID configurations, as each one provides a different level of protection and performance.
RAID 0 (striping)
This level of RAID is the simplest and provides only performance benefits. A RAID 0 configuration uses multiple disk drives and stripes the data across them. Striping is simply a method of distributing data across multiple disks, whereby each successive block of data is written to the next disk in the stripe set. As a result, I/O requests to read and write data are spread across several disks, which improves performance.
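To make the round-robin idea concrete, here is a minimal Python sketch that maps a logical block number to the disk that holds it and its position on that disk. It assumes each "block" is one stripe unit (real controllers stripe in configurable chunks, often tens or hundreds of KB), and the function name `raid0_locate` is hypothetical.

```python
def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Hypothetical helper: blocks are dealt out round-robin, so block 0 goes to
    disk 0, block 1 to disk 1, and so on, wrapping back to disk 0.
    """
    if num_disks < 1:
        raise ValueError("a stripe set needs at least one disk")
    disk = logical_block % num_disks     # which disk in the stripe set
    offset = logical_block // num_disks  # how far down that disk
    return disk, offset


# Eight consecutive blocks on a three-disk stripe set (disks shown 1-based).
for block in range(8):
    disk, offset = raid0_locate(block, 3)
    print(f"block {block} -> disk {disk + 1}, offset {offset}")
```

Consecutive blocks land on different disks, which is why a large read or write can keep every spindle busy at once.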
There is, however, a major drawback: a RAID 0 setup has no fault tolerance. If any one of the disks in the array is lost, for whatever reason, the entire array breaks and the data becomes unusable.
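One way to see why this matters is a rough worked calculation. The sketch below assumes, purely for illustration, that each disk fails independently with the same probability over some period; since a RAID 0 array survives only if every disk survives, its survival probability is that per-disk figure raised to the number of disks. The function name and the 5% figure are hypothetical.

```python
def raid0_survival_probability(disk_failure_prob: float, num_disks: int) -> float:
    """Probability that a RAID 0 array survives a period in which each disk
    independently fails with probability disk_failure_prob (an assumption).

    Every disk must survive, because losing any one breaks the whole array.
    """
    if not 0.0 <= disk_failure_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return (1.0 - disk_failure_prob) ** num_disks


# Hypothetical figure: a 5% chance of any one disk failing over the period.
for n in (1, 2, 4, 8):
    print(n, "disks:", round(raid0_survival_probability(0.05, n), 4))
```

Under these assumptions, adding disks to a stripe set steadily lowers the chance that the array, and all the data on it, survives.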
RAID 1 (mirroring)
In a RAID 1 configuration we use multiple disks and write the same data to each disk in the array. This is called mirroring. This configuration offers read performance benefits (since the data can be read from multiple disks) but no write benefits, since every write must go to each disk and so write speed is still limited to that of a single disk.
However, since each disk in the array has a mirror image (containing exactly the same data), RAID 1 does provide redundancy and fault tolerance. One drive in the mirror set can be lost without the data being lost or becoming inaccessible. As long as one of the disks in the mirror stays online, a RAID 1 system will remain in working order, although read performance will suffer while a disk is offline.
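The behaviour described above can be modelled with a small in-memory toy. This sketch is an assumption-laden illustration, not how any real controller works: each "disk" is a Python dictionary, `MirrorSet` and its methods are hypothetical names, and rebuilding a replaced disk is not modelled.

```python
class MirrorSet:
    """Toy RAID 1 model: every write goes to all online disks, and any
    online disk can serve a read."""

    def __init__(self, num_disks: int = 2) -> None:
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.online = [True] * num_disks

    def write(self, block: int, data: bytes) -> None:
        # Mirroring: the same data goes to every online disk, so a write
        # is no faster than writing to a single disk.
        for disk, is_up in zip(self.disks, self.online):
            if is_up:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any online copy will do; a real controller can spread reads
        # across the mirrors, which is where the read benefit comes from.
        for disk, is_up in zip(self.disks, self.online):
            if is_up:
                return disk[block]
        raise OSError("all disks in the mirror set are offline")

    def fail(self, index: int) -> None:
        self.online[index] = False


mirror = MirrorSet(2)
mirror.write(0, b"orders")
mirror.fail(0)          # lose one side of the mirror
print(mirror.read(0))   # b'orders', still served by the surviving disk
```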
RAID 5 (striping with parity)
RAID 5 disk configurations use block striping to write data across multiple disks, and so offer increased read and write performance, but they also store parity data, distributed across every disk in the array, which can be used to rebuild the data from any single failed drive. Let's say, for example, we had a simple RAID 5 setup of three disks and were writing data to it. The first data block would be written to Disk 1, the second to Disk 2, and the parity data to Disk 3. The next data blocks would be written to Disks 1 and 3, with the parity data stored on Disk 2. The next data blocks would be written to Disks 2 and 3, with the parity stored on Disk 1. The cycle would then start over again.
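The rotating layout in this three-disk example can be generated with a short Python sketch. It assumes parity starts on the last disk and moves one disk to the left on each stripe, with data blocks filling the remaining disks in ascending order, which reproduces the example; real controllers support several rotation schemes, and the function name is hypothetical.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """For each stripe, list what each disk holds: 'D<n>' for data block n
    (numbered from 1) or 'P' for that stripe's parity.

    Assumed rotation: parity starts on the last disk and moves one disk to
    the left per stripe; data fills the other disks in ascending order.
    """
    layout = []
    block = 1
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


# Columns are Disk 1, Disk 2, Disk 3.
for stripe, row in enumerate(raid5_layout(3, 4), start=1):
    print(f"stripe {stripe}: " + "  ".join(row))
# stripe 1: D1  D2  P
# stripe 2: D3  P  D4
# stripe 3: P  D5  D6
# stripe 4: D7  D8  P
```

Because the parity moves from disk to disk, no single drive becomes a bottleneck for parity updates.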
This allows us to lose any one of those disks and still recover the data, since the parity data, combined with the data on the remaining disks, can be used to calculate what was stored on the failed drive. Many setups also include a hot spare disk that immediately takes the place of a failed disk; the lost data is then recalculated from the other drives using the parity data, and the parity data that was lost with the failure is recalculated too.
This parity does come at a cost, in that the parity has to be recalculated and written for every write to disk. This reduces write performance when compared to a similar RAID 10 array, but RAID 5 still offers excellent read performance, since data can be read from all drives simultaneously.
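The text does not say how the parity is calculated; in RAID 5 it is normally the bitwise XOR of the data blocks in a stripe. Because XOR is its own inverse, XOR-ing the parity with the surviving data blocks gives back the missing block. The sketch below demonstrates this with short byte strings standing in for disk blocks; `xor_blocks` is a hypothetical helper name.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(xor, column) for column in zip(*blocks))


disk1 = b"SQL "
disk2 = b"logs"
parity = xor_blocks(disk1, disk2)   # stored on the third disk

# Disk 2 fails: XOR the parity with the surviving data block to rebuild it.
rebuilt = xor_blocks(parity, disk1)
print(rebuilt)  # b'logs'
```

The same reasoning extends to wider arrays: with more data disks per stripe, the missing block is the XOR of the parity and every surviving data block.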
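A common rule of thumb makes the write cost concrete by counting the disk I/Os needed for one small random write. For RAID 5 the usual approach is read-modify-write: read the old data, read the old parity, write the new data, write the new parity, which is four I/Os; RAID 10 needs only two (one per mirror). The sketch below assumes that model and ignores controller write-back caching and full-stripe writes, both of which change the picture; the dictionary, function name, and disk figures are hypothetical.

```python
# I/O operations needed to service one small random write, assuming the
# classic read-modify-write approach for RAID 5 and no write-back cache.
WRITE_PENALTY = {
    "RAID 0": 1,   # write the data block once
    "RAID 1": 2,   # write the block to both mirrors
    "RAID 5": 4,   # read old data, read old parity, write new data, write new parity
    "RAID 10": 2,  # write the block to both disks of its mirrored pair
}


def random_write_iops(level: str, num_disks: int, iops_per_disk: float) -> float:
    """Rough ceiling on small random writes per second for the whole array."""
    return num_disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical array: eight disks, each good for about 150 random IOPS.
for level in ("RAID 5", "RAID 10"):
    print(level, random_write_iops(level, 8, 150))
# RAID 5 300.0
# RAID 10 600.0
```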
RAID 10 (striped pairs of mirrors)
RAID 10 is a hybrid RAID solution. Its name is not one of the standard single-digit RAID levels; RAID 10 is really RAID 1+0, a stripe set built across mirrored pairs. In this configuration, each disk in the array has at least one mirror, for redundancy, and the data is striped across all the mirrored pairs in the array. RAID 10 does not need to store parity data; recoverability comes from the mirroring of data, not from calculations made on the striped data.
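Combining the two earlier ideas, a logical block in RAID 10 is first striped to a mirrored pair (the RAID 0 part) and then written to both disks of that pair (the RAID 1 part). This Python sketch assumes a disk numbering in which pair p holds disks 2p and 2p + 1, and that each block is one stripe unit; the numbering and the function name are hypothetical.

```python
def raid10_locate(logical_block: int, num_pairs: int) -> tuple[int, tuple[int, int], int]:
    """Map a logical block to (mirror pair, the two disks in that pair, offset).

    Assumed numbering: pair p is made of disks 2p and 2p + 1. Blocks are
    striped round-robin across the pairs, and each block is written to both
    disks of its pair.
    """
    if num_pairs < 1:
        raise ValueError("RAID 10 needs at least one mirrored pair")
    pair = logical_block % num_pairs
    offset = logical_block // num_pairs
    return pair, (2 * pair, 2 * pair + 1), offset


# A six-disk array made of three mirrored pairs.
for block in range(6):
    pair, disks, offset = raid10_locate(block, 3)
    print(f"block {block} -> pair {pair}, disks {disks}, offset {offset}")
```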
RAID 10 gives us the performance benefits of data striping, allowing us to read and write data faster than a single drive can. RAID 10 also gives us the added security that losing a single drive will not bring the entire disk array down. In fact, with RAID 10, as long as at least one drive from every mirrored set is still online, more than one disk can be lost while the array remains online with all data accessible.
However, loss of both drives from any one mirrored set will result in a hard failure.
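The survival rule for RAID 10 can be written as a one-line check: the array stays up as long as no mirrored pair has lost both of its disks. This sketch assumes the same hypothetical numbering as before (pair p holds disks 2p and 2p + 1); `raid10_survives` is an illustrative name.

```python
def raid10_survives(failed_disks: set[int], num_pairs: int) -> bool:
    """True if every mirrored pair still has at least one working disk.

    Assumed numbering: pair p is made of disks 2p and 2p + 1.
    """
    return all(
        not {2 * p, 2 * p + 1} <= failed_disks  # pair p fully failed?
        for p in range(num_pairs)
    )


print(raid10_survives({0, 3, 5}, 3))  # True: one disk lost from each pair
print(raid10_survives({2, 3}, 3))     # False: both disks of pair 1 lost
```

With three pairs, the array can lose up to three disks and keep running, but the "wrong" two disk failures are enough to take it down.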
With RAID 10 we get excellent write performance, since we have redundancy without the need to deal with parity data. However, read performance will generally be lower than that of a RAID 5 configuration with the same number of disks, since data can be read simultaneously from only half the disks in the array.