RAID Solutions - 002Inside MS SQL Server 2000

Simply put, RAID solutions are usually the best choice for SQL Server for these reasons:

 RAID makes multiple physical disk drives appear logically as one drive.

 RAID provides different ways of distributing physical data across multiple disk drives.

 RAID can be provided as a hardware solution (with a special RAID disk controller card), but it doesn't necessarily imply special hardware. RAID can also be implemented via software.

 Windows NT and Windows 2000 Server provides RAID levels 0 (striping), 1 (mirroring), and 5 (striping with parity) for implementation with SQL Server.

Some RAID levels provide increased I/O bandwidth (striping), and others provide fault-tolerant storage (mirroring or parity). With a hardware RAID controller, you can employ a combination of striping and mirroring, often referred to as RAID-10 (and sometimes as RAID-1+0 or RAID-1&0). RAID solutions vary in how much additional disk space is required to protect the data and how long it takes to recover from a system outage. The type and level of RAID you choose depends on your I/O throughput and fault tolerance needs.

RAID-0

RAID-0, or striping, offers pure performance but no fault tolerance. I/O is done in "stripes" and is distributed among all drives in the array. Instead of the I/O capacity of one drive, you get the benefit of all the drives. A table's hot spots are dissipated, and the data transfer rates go up cumulatively with the number of drives in the array. So although an 18-GB disk might do 100 random I/Os per second, an array of eight 2.1-GB disks striped with Windows NT Server or Windows 2000 Server's RAID-0 might perform more than 400 I/Os per second.

Choosing an Appropriate Backup Strategy

Although you can use RAID for fault tolerance, it's no substitute for regular backups. RAID solutions can protect against a pure disk-drive failure, such as a head crash, but that's far from the only case in which you would need a backup. If the controller fails, garbage data might appear on both your primary and redundant disks. Even more likely, an administrator could accidentally clobber a table, a disaster (such as a fire, flood, or earthquake) could occur at the site, or a simple software failure (in a device driver, the operating system, or SQL Server) could threaten your data. RAID doesn't protect your data against any of those problems, but backups do. I'll talk about SQL Server's options for backing up your data in Chapter 5.

NOTE

You could argue that RAID-0 shouldn't be considered RAID at all because it provides no redundancy. That might be why it's classified as level 0.

Windows NT and Windows 2000 provides RAID-0; a hardware RAID controller can also provide it. RAID-0 requires little processing overhead, so a hardware implementation of RAID-0 offers at best a marginal performance advantage over the built-in Windows NT and Windows 2000 capability. One possible advantage is that RAID-0 in hardware often lets you adjust the stripe size, while the size is fixed at 64 KB in Windows NT and Windows 2000 and cannot be changed.

RAID-1

RAID-1, or mirroring, is conceptually simple: a mirror copy exists for every disk drive. Writes are made to the primary disk drive and to the mirrored copy. Because writes can be made concurrently, the elapsed time for a write is usually not much greater than it would be for a single, unmirrored device. The write I/O performance to a given logical drive is only as fast as that of a single physical drive.

Therefore, you can at best get I/O rates of perhaps 100 I/Os per second to a given logical RAID-1 drive, as opposed to the much higher I/O rates to a logical drive that can be achieved with RAID-0. You can, of course, use multiple RAID-1 drives. But if a single table is the hot spot in your database, even multiple RAID-1 drives won't be of much help from a performance perspective. (SQL Server 2000 lets you create filegroups within a database for, among other purposes, placing a table on a specific drive or drives. But no range partitioning capability is available for data within a table, so this option doesn't really give you the control you need to eliminate hot spots.) If you need fault tolerance, RAID-1 is normally the best choice for the transaction log. Because transaction log writes are synchronous and sequential, unlike writes to the data pages, they're ideally suited to RAID-1.

If price and performance are your only criteria, you might find that software-based RAID is perfectly adequate. If you have a fixed budget, the money you save on hardware RAID controllers might be better spent elsewhere, such as on a faster processor, more disks, more memory, or an uninterruptible power supply.

Purely in terms of performance, it might not be wise to spend extra money on low-end hardware RAID (if you have sufficient CPU capacity in the server), but mid-level to high-end RAID offers many other benefits, including hot swappability of drives and intelligent controllers that are well worth the extra money—especially for systems that need to run 24 hours a day, 7 days a week.

You might also combine hardware-based and software-based RAID to great advantage. For example, if you have three hardware-based RAID units, each with seven 9 GB disks, with RAID-5 you'd have 6 x 9 = 54 GB in each, and Windows NT and Windows 2000 would see three 54 GB disks. You can then use software-based RAID on

Windows NT and Windows 2000 to define a RAID-0 striped volume to get a single 162 GB partition on which to create SQL Server databases.

RAID-1 can significantly increase read performance because a read can be obtained from either the primary device or the mirror device. (A single read isn't performed faster, but multiple reads will have a faster overall time because they can be performed simultaneously.) If one of the drives fails, the other continues and SQL Server uses the surviving drive. Until the failed drive is replaced, fault tolerance is not available unless multiple mirror copies were used.

NOTE

The Windows NT and Windows 2000 RAID software doesn't allow you to keep multiple mirror copies; it allows only one copy. But some RAID hardware controllers offer this feature. You can keep one or more mirror copies of a drive to ensure that even if a drive fails, a mirror still exists. If a crucial system is difficult to access in a timely way to replace a failed drive, this option might make sense.

Windows NT and Windows 2000 Server provides RAID-1, and some hardware RAID controllers also provide it. RAID-1 requires little processing overhead, so a hardware implementation of RAID-1 offers at best a marginal performance advantage over the built-in Windows NT and Windows 2000 capability (although the hardware solution might provide the ability to mirror more than one copy of the drive).

RAID-5

RAID-5, or striping with parity, is a common choice for SQL Server use. RAID-5 not only logically combines multiple disks to act like one disk, but it also records extra parity information on every drive in the array (thereby requiring only one extra drive). If any drive fails, the others can reconstruct the data and continue without any data loss or immediate down time. By doing an Exclusive OR (XOR) between the surviving drives and the parity information, the bit patterns for a failed drive can be derived on the fly.

RAID-5 is less costly to implement than mirroring all the drives because only one additional drive is needed rather than the double drives that mirroring requires. Although RAID-5 is commonly used, it's often not the best choice for SQL Server because it imposes a significant I/O hit on write performance. RAID-5 turns one write into two reads and two writes to keep the parity information updated. This doesn't mean that a single write takes four or five times as long, because the operations are done in parallel. It does mean, however, that many more I/Os occur in the system, so many more drives and controllers are required to reach the I/O levels that RAID0 can achieve. So although RAID-5 certainly is less costly than mirroring all the drives, the performance overhead for writes is significant. Read performance, on the other hand, is excellent—it's almost equivalent to that of RAID-0.

RAID-5 makes sense, then, for the data portion of a database that needs fault tolerance, that is heavily read, and that does not demand high write performance. Typically, you shouldn't place the transaction log on a RAID-5 device unless the system has a low rate of changes to the data. (As mentioned before, RAID-1 is a better choice for the transaction log because of the sequential nature of I/O to the log.)

RAID-5 requires more overhead for writes than RAID-0 and RAID-1 do, so the incremental advantage provided by a hardware RAID-5 solution over the Windows NT and Windows 2000 software solution can be somewhat higher because more work must be offloaded to the hardware. However, such a difference would be noticeable only if the system were nearly at I/O capacity. If this is the case in your system, you probably shouldn't use RAID-5 in the first place because of the write performance penalty. You're probably better served by the combination of RAID-0 and RAID-1, known as RAID-10.

RAID-10

RAID-10, or mirroring and striping, is the ultimate choice for performance and recoverability. This capability is really not a separate type of RAID, but rather a combination of RAID-1 and RAID-0. It is sometimes also referred to as RAID-1+0 or RAID-1&0. A set of disks is striped to provide the performance advantages of RAID-0, and the stripe is mirrored to provide the fault-tolerance features and increased read performance of RAID-1. Performance for both writes and reads is excellent.

The most serious drawback to RAID-10 is cost. Like RAID-1, it demands duplicate drives. In addition, the built-in RAID software in the operating system doesn't provide this solution, so RAID-10 requires a hardware RAID controller. Some hardware RAID controllers explicitly support RAID-10, but you can achieve this support using virtually any RAID controller by combining its capabilities with those of Windows NT and Windows 2000. For example, you can set up two stripe sets of the same size and number of disks using hardware RAID. Then you can use Windows NT and Windows 2000 mirroring on the two stripe sets, which Windows NT and Windows 2000 sees as two drives of equal size. If you need high read and write performance and fault tolerance and you cannot afford an outage or decreased performance if a drive fails, RAID-10 is the best choice.

A separate RAID-10 array is usually not an appropriate choice for the transaction log. Because the write activity tends to be sequential and is synchronous, the benefits of multiple disks are not realized. A RAID-1 mirroring of the transaction log is preferable and cheaper. With internal Debit-Credit benchmarks, Microsoft has shown that the transaction log on a simple RAID-1 device is sufficient to sustain thousands of transactions per second—which is likely more than you need. Log records are packed, and a single log write can commit multiple transactions (known as group commit). In the Debit-Credit benchmarks, 40 transactions can be packed into a log record. At a rate of 100 writes per second (writes/second), the log on a RAID-1 mirror would not become a bottleneck until around 4000 tps.

In fact, this threshold is probably even higher than 4000 tps. For pure sequential I/O, 100 writes/second is a conservative estimate. Although a typical disk drive is capable of doing 100 random I/Os per second, the rate for pure sequential I/O—largely eliminating the seek time of the I/Os—is probably better than 200 writes/second. In a few cases, a physically separate RAID-10 array for the log might make sense—for example, if your system requires high online transaction processing (OLTP) performance but you also use replication of transactions. Besides performing sequential writes to the log, the system also does many simultaneous reads of the log for replication and might benefit from having the multiple disks available. (In most cases, however, the log pages are in cache, so RAID-10 probably won't provide a significant benefit. However, if replication is in a "catch-up" mode because the distribution database isn't available or for some other reason, some significant benefit might exist.)

RAID-01

Most of the discussion in the SQL Server documentation that concerns combining RAID0 and RAID-1 assumes that two stripes will be created first, and then one of the stripes will be defined as the mirror of the other. However, there is another possibility, which is to create multiple mirrored sets of disks first and then create a stripe from the mirrors. The first solution is referred to as mirrored stripes, and is the RAID 10 solution I discussed in the previous section. The second solution is referred to as striped mirrors and can be called RAID-01.

NOTE

that speed and fault tolerance are achieved, without the overhead of computing a parity value. However, the terminology is not standard. As mentioned previously, some sources will refer to RAID-1&0, or RAID-1+0. Some use the term RAID-10 to refer to striping a set of mirrored volumes, and use RAID-01 to refer to mirroring a striped volume, which is the opposite of the terminology I've been using. In many cases it might not matter which is done first, striping or mirroring, at least for the discussion at hand. I hope that in situations where it does matter, the authors of whatever sources you read will define the terms they use.

In document 002Inside MS SQL Server 2000 (Page 120-124)