Introduction to RAID
Team members (Electrical Engineering, first year): 王麒鈞 (B94901150), 吳炫逸 (B94901151), 孫維隆 (B94901154)
A. Motivation
Gosh, my hard disk has failed again, and my computer won't boot normally. I never even got the chance to burn my cartoons and dramas to DVD, and downloading them again will cost me a lot of time. Is there a way to back up my data automatically whenever I download a file? Yes: RAID, which stands for Redundant Array of Independent Disks (or Redundant Array of Inexpensive Disks).
B. RAID LEVEL
The basic concept of RAID is to combine several small, cheap drives into an array that offers greater capacity, reliability, and speed than a single drive. That is why the “I” has two meanings: Independent and Inexpensive. RAID spreads data across many drives by using disk striping (RAID 0), disk mirroring (RAID 1), or disk striping with parity (RAID 5).
Depending on the level chosen, the benefit of RAID is one or more of increased data integrity, fault tolerance, throughput, or capacity compared to a single drive. To spread data evenly across the drives, the data must be divided into many chunks of the same size (usually 32 KB or 64 KB). Each chunk is then written to the drives of the array according to the RAID level chosen. When data is read, the process is reversed. This creates the illusion that many drives are one big drive.
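The chunking described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each drive is modelled as a list of chunks, chunks are handed out round-robin, and the names `split_into_chunks`, `stripe`, and `unstripe` are made up for this example rather than taken from any real RAID software.

```python
def split_into_chunks(data, chunk_size):
    """Cut the data into same-sized chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def stripe(data, n_drives, chunk_size):
    """Write chunk i to drive i % n_drives (round-robin is an assumption)."""
    drives = [[] for _ in range(n_drives)]
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        drives[index % n_drives].append(chunk)
    return drives


def unstripe(drives):
    """Reverse the process: read one chunk from each drive in turn."""
    depth = max(len(drive) for drive in drives)
    chunks = []
    for row in range(depth):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


drives = stripe(b"ABCDEFGHIJ", n_drives=3, chunk_size=2)
restored = unstripe(drives)
```

To the caller, `unstripe(stripe(...))` returns the original bytes, which is the "one big drive" illusion in miniature.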
RAID 0
Simply put, we divide the data into parts and store them on two or more disks; because the disks work in parallel, the array is faster than a single disk. RAID 0 does not store any duplicate data, so for a given amount of data it has the lowest disk capacity requirement. But if any block in RAID 0 goes bad, the array has no way to recover the data.
RAID 1
[Figure: Block1 to Block4 of the logical drive mapped onto the physical drives by mirroring and by striping]
RAID 0+1
RAID 0 and RAID 1 are both simple, and each has its own advantages and disadvantages. If we combine them as RAID 0+1, we get a practical way to put RAID to use. First, we divide the data into many parts and store them on two (or three, four, …) disks, as in RAID 0. Second, we copy the divided data and store the copies on another two (or three, four, …) disks, as in RAID 1. This way, the combined array has the advantages of both RAID 0 and RAID 1. In other words, it works efficiently and can also recover from a failure.
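The two steps can be sketched as follows. This is a simplified model, not how a real controller is written: a drive is a Python list, a failed drive is represented by `None`, and the function names are hypothetical. The sketch also assumes that a stripe set with any failed drive is skipped entirely in favour of the other set.

```python
def stripe(chunks, n_drives):
    """Step 1 (RAID 0): spread the chunks round-robin over one set of drives."""
    drives = [[] for _ in range(n_drives)]
    for index, chunk in enumerate(chunks):
        drives[index % n_drives].append(chunk)
    return drives


def unstripe(drives):
    """Read the chunks back in their original order."""
    depth = max(len(drive) for drive in drives)
    return [drive[row] for row in range(depth) for drive in drives if row < len(drive)]


def raid01_write(chunks, drives_per_set):
    """Step 2 (RAID 1): copy the whole striped set onto a second set of drives."""
    primary = stripe(chunks, drives_per_set)
    mirror = [list(drive) for drive in primary]
    return primary, mirror


def raid01_read(primary, mirror):
    """Read from the first stripe set in which no drive has failed (None)."""
    for stripe_set in (primary, mirror):
        if all(drive is not None for drive in stripe_set):
            return unstripe(stripe_set)
    raise IOError("both stripe sets contain a failed drive")


blocks = ["Block1", "Block2", "Block3", "Block4"]
primary, mirror = raid01_write(blocks, drives_per_set=2)
primary[0] = None  # simulate one broken drive in the primary set
recovered = raid01_read(primary, mirror)
```

With one drive marked as broken, the read falls back to the mirrored set and still returns all four blocks in order.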
RAID2 & RAID3
RAID 2 and RAID 3 have the highest data transfer speed because the controller runs all drives simultaneously: each datum is divided at the bit or byte level and spread across all the drives. But they cannot service multiple requests at the same time, because reading a single datum keeps every drive busy and none is free to read another datum. For this reason they are not used today.
RAID4 & RAID5
RAID 4 and RAID 5 both use parity to provide fault tolerance. Parity is computed from the data on the other drives of the array. When one drive of the array fails, its data can be computed from the parity and the data on the remaining drives. When the broken drive is replaced by a drive of the same specification, the original data can be rebuilt on the new one.
In RAID 4, parity is stored on one dedicated drive, so the write speed is limited by that parity drive. RAID 5 removes this limitation by spreading the parity across all drives in the array. The remaining speed limitation is the parity computation itself, which can cost a lot of time.
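The difference in parity placement can be made concrete with a small layout sketch. The rotation rule used for RAID 5 here (parity moves one drive to the left on each stripe) is an assumption chosen for illustration; real arrays use one of several rotation schemes. All function names are hypothetical.

```python
def parity_drive_raid4(stripe_index, n_drives):
    """RAID 4: parity always lives on the same dedicated drive (the last one here)."""
    return n_drives - 1


def parity_drive_raid5(stripe_index, n_drives):
    """RAID 5: parity rotates across drives (this rotation rule is an assumption)."""
    return (n_drives - 1 - stripe_index) % n_drives


def layout(n_stripes, n_drives, parity_drive):
    """Return one row per stripe, marking data blocks D<n> and parity blocks P<stripe>."""
    rows = []
    block = 0
    for stripe_index in range(n_stripes):
        p = parity_drive(stripe_index, n_drives)
        row = []
        for drive in range(n_drives):
            if drive == p:
                row.append("P%d" % stripe_index)
            else:
                row.append("D%d" % block)
                block += 1
        rows.append(row)
    return rows


raid4_layout = layout(3, 3, parity_drive_raid4)
raid5_layout = layout(3, 3, parity_drive_raid5)
```

In the RAID 4 layout every parity block lands in the last column, so every write touches that one drive; in the RAID 5 layout the parity blocks fall on a different drive for each stripe.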
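Now let's see how RAID 5 works when a drive is broken. Suppose that RAID 5 uses XOR (the operator we learned in chapter 1) to compute parity. The table below shows why XOR can serve as parity:

data A   data B   parity AB
  1        1         0
  0        0         0
  1        0         1
  0        1         1

Because XOR undoes itself, a lost value can be recomputed from the two values that survive: data B XOR parity AB gives back data A.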
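The XOR relationship can also be checked with a short script. This is a minimal sketch, assuming each drive's contents are equal-length byte strings; `xor_blocks` is a helper name invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


data_a = b"\x0f\xa5"
data_b = b"\xf0\x5a"

# Parity is the XOR of the data blocks.
parity_ab = xor_blocks([data_a, data_b])

# If the drive holding data A breaks, XOR the survivors to rebuild it.
rebuilt_a = xor_blocks([data_b, parity_ab])

# The same holds for every single-bit combination in the truth table.
bit_check = all(((a ^ b) ^ b) == a for a in (0, 1) for b in (0, 1))
```

The same helper works for more than two data drives: the parity is the XOR of all data blocks, and any one missing block is the XOR of everything that remains.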
Recovering data A from data B and parity AB:

data B   parity AB   data A
  1         0          1
  0         0          0
  0         1          1
  1         1          0

Recovering data B from data A and parity AB:

data A   parity AB   data B
  1         0          1
  0         0          0
  1         1          0
  0         1          1

The illustrations show how RAID 5 works.
[Figure: RAID 5 in the normal situation and with one drive broken. Blocks A, B, C, D of the logical drive are stored on the physical drives together with parityAB and parityCD; labels include "Write AB", "XOR computing", and "broken".]

The table below compares the different RAID levels.

Name | Description of disk array | Data reliability | Data transfer rate | Max I/O transfer rate
RAID 0 | Stores data in parallel but has no fault tolerance | Lower than a single disk | Very high | High in read and write
RAID 1 | All data copied to N disks | Higher than RAID 2, 3, 4 but lower than RAID 6 | Read: higher than a single disk. Write: like a single disk | Read: twice that of a single disk. Write: like a single disk
RAID 4 | Data written to different disks in parallel | Much higher than a single disk, just like RAID 5 | Read: just like RAID 0. Write: much lower than a single disk | Read: like RAID 0. Write: much lower than a single disk
RAID 5 | Data written to different disks in parallel | Much higher than a single disk, just like RAID 4 | Read: just like RAID 0. Write: lower than a single disk | Read: like RAID 0. Write: usually lower than a single disk
RAID 1+0 | Has the advantages of RAID 0 and RAID 1 | Higher than RAID 2, 3, 4 | Very high | High in read and write
C. Hardware RAID & Software RAID
RAID implementations can be sorted into two groups, software and hardware, according to what controls the spanning, mirroring, or parity calculation. In a software RAID implementation, the operating system handles the disks of the array through a normal disk drive controller, under the control of program code. The speed of software RAID depends on how fast the computer's CPU is. Since CPUs have improved a lot recently, the software approach may well be better than the hardware one.
However, software RAID has disadvantages. Because the CPU has many other tasks to perform, using it for RAID calculations is not recommended, especially when the computer is busy. A crash can sometimes cause data loss, and waiting for the array to be rebuilt afterwards can take a long time.
That is where the hardware approach comes in. Unlike software RAID, this approach needs at least one dedicated RAID controller. Sometimes it is built into the motherboard, and sometimes it takes the form of an expansion card. The controller manages the parity calculations for the disks and lets the operating system rest. In addition, to boost speed, hardware RAID also has a special battery-backed write-back cache. This cache allows quick storage and access; the controller flushes its contents to the disk drives.
From the comparison above, the hardware implementation seems to be better, but the controllers themselves may sometimes make mistakes. So the most common form of RAID control nowadays is hybrid RAID (partly hardware, partly software). In this combined approach the controllers are normal ones, but there is a backup cache, and users can set up RAID control through the BIOS. To the OS, the disks still look like one big disk. The parity calculations are still performed by software, but the cache helps ensure the safety and speed of data transfers.
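The idea of a write-back cache can be sketched with a toy model. This is only an illustration under strong assumptions: the disk is a plain dictionary, the battery backup is not modelled at all, and the class and method names are invented for this example.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in memory first and reach the disk on flush."""

    def __init__(self, disk):
        self.disk = disk   # dict standing in for the slow disk drives
        self.dirty = {}    # blocks written to the cache but not yet to disk

    def write(self, block_no, data):
        # The write is acknowledged as soon as it is in the cache.
        self.dirty[block_no] = data

    def read(self, block_no):
        # Serve the newest copy: the cache first, then the disk.
        if block_no in self.dirty:
            return self.dirty[block_no]
        return self.disk.get(block_no)

    def flush(self):
        # The controller later copies all cached writes to the disk.
        self.disk.update(self.dirty)
        self.dirty.clear()


disk = {}
cache = WriteBackCache(disk)
cache.write(0, b"A")
before_flush = dict(disk)
cache.flush()
after_flush = dict(disk)
```

Between `write` and `flush` the data exists only in the cache, which is why a real controller protects that memory with a battery.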
D. Conclusion
Why hasn't RAID become popular? The answer is clear: most of us don't need it. In practice, only servers that must store large amounts of data in a short time, with protection against faults, need RAID. For PCs it isn't necessary, and for supercomputers only a high transfer rate for CPU and RAM is needed. Therefore, only servers that hold a lot of data, like those of BBS boards, need RAID.
Still, there is room for improvement in this interesting technology: even for the most common level, RAID 5, speed is still limited by the parity calculation, and the word “redundant” implies a “waste” of resources. Even though RAID is a useful concept, new approaches might replace it in the future.