Server Software RAID Recovery and Troubleshooting
(Version 1.0)
10(1)/2008-OTC/CHN-PROJECT-16/05/2013 – 216
OPEN TECHNOLOGY CENTRE, NATIONAL INFORMATICS CENTRE
DEPARTMENT OF INFORMATION TECHNOLOGY, CHENNAI
Prepared by Open Technology Centre National Informatics Centre
Version 1.0, 24/10/13. Prepared by M Dhanesh (System Admin); reviewed (1st level, 2nd level, final) by Manas Biswal and Raghuraman Balakrishnan.
Note: This document was tested on Ubuntu Server and Desktop 12.04, 64-bit.
Color Definitions
➢ Words in black indicate comments and descriptions in the document.
➢ Words in Bordeaux indicate values displayed on the console command line.
➢ Words in Turquoise indicate values provided by the user on the console command line.
➢ Words in orange1 indicate values provided by the user in configuration or ini files.
➢ Words in Light blue5 indicate commands entered by the user on the command line.
Table of Contents
Color Definitions
1. Replacing A Failed Hard Drive In Software RAID1 Array
2. How Do We Confirm a Hard Disk Has Failed
3. Removing The Failed Disk
4. Adding The New Hard Disk
1. Replacing A Failed Hard Drive In Software RAID1 Array
In this kickstart installation we have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda5 as well as /dev/sdb1 and /dev/sdb5. The partition information is defined in the preseed script.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0, and /dev/sda5 and /dev/sdb5 make up the RAID1 array /dev/md1:
/dev/sda1 + /dev/sdb1 = /dev/md0
/dev/sda5 + /dev/sdb5 = /dev/md1
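For reference, a minimal sketch of how such a RAID1 layout might be declared in the preseed file, using debian-installer's partman-auto-raid recipe format (the filesystem types and mount points below are placeholders, not values taken from the actual script):
d-i partman-auto/method string raid
d-i partman-auto/disk string /dev/sda /dev/sdb
d-i partman-auto-raid/recipe string \
    1 2 0 ext4 / /dev/sda1#/dev/sdb1 . \
    1 2 0 swap - /dev/sda5#/dev/sdb5 .
Each recipe entry lists the RAID level, the number of devices, the number of spares, the filesystem, the mount point, and the member partitions separated by “#”.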
2. How Do We Confirm a Hard Disk Has Failed
If a disk has failed, we will find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
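For example, we can search the syslog for md/RAID-related messages (a quick sketch; the exact message text varies with the kernel version):
#grep -iE 'md[0-9]|raid' /var/log/syslog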
If there are error messages related to software RAID in /var/log/syslog, run the command
#cat /proc/mdstat
Instead of the string [UU], we will see [U_] if we have a degraded RAID1 array.
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda5[0]
24418688 blocks [2/1] [U_]
unused devices: <none>
Here the output shows that hard disk /dev/sdb has failed.
If the RAID is optimal, the output shows [UU] and lists both sda and sdb.
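An array can also be queried directly with mdadm; for a degraded mirror the State field reports the condition (the full output is abbreviated here):
#mdadm --detail /dev/md0
Look for a line such as “State : clean, degraded” and for the missing member in the device listing at the bottom of the output.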
3. Removing The Failed Disk
In this recovery procedure the failed hard disk is “/dev/sdb” (please change the disk name to match your system).
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb5 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
#mdadm --manage /dev/md0 --fail /dev/sdb1
Run the command below to see the RAID status:
#cat /proc/mdstat
Then we remove /dev/sdb1 from /dev/md0:
#mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
#cat /proc/mdstat
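As a side note, mdadm can also perform both steps in a single invocation; the following one-liner should be equivalent to the separate fail and remove commands above:
#mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1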
Now we repeat the same steps for /dev/sdb5 (which is part of /dev/md1):
#mdadm --manage /dev/md1 --fail /dev/sdb5
#mdadm --manage /dev/md1 --remove /dev/sdb5
Afterwards /proc/mdstat shows both arrays running on /dev/sda alone:
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda5[0]
24418688 blocks [2/1] [U_]
unused devices: <none>
Shut down the system using the command below:
#shutdown -h now
4. Adding The New Hard Disk
Replace the failed hard disk (here “/dev/sdb”) with a new one, then boot the system. Create the same partitioning on the new disk as on /dev/sda using the command below:
#sfdisk -d /dev/sda | sfdisk /dev/sdb
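Note that the sfdisk version shipped with Ubuntu 12.04 handles only MBR (msdos) partition tables. If the disks are GPT-partitioned, a sketch of the equivalent copy using sgdisk from the gdisk package (mind the argument order: the destination disk /dev/sdb comes first):
#sgdisk -R /dev/sdb /dev/sda
#sgdisk -G /dev/sdb
The second command gives the copied partition table new random GUIDs so they do not clash with those on /dev/sda.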
We can verify the result by running the command:
#fdisk -l
and checking that both hard drives now have the same partitioning.
Now we add /dev/sdb1 to /dev/md0 and /dev/sdb5 to /dev/md1:
#mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
#mdadm --manage /dev/md1 --add /dev/sdb5
mdadm: re-added /dev/sdb5
Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run
#cat /proc/mdstat
to see the status.
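To follow the rebuild continuously instead of re-running the command by hand, one convenient option is:
#watch -n 2 cat /proc/mdstat
(press Ctrl+C to stop watching).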
During the synchronization the output will look like this:
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/1] [U_]
[=>...] recovery = 9.9% (2423168/24418688)
finish=2.8min speed=127535K/sec
md1 : active raid1 sda5[0] sdb5[1]
24418688 blocks [2/1] [U_]
[=>...] recovery = 6.4% (1572096/24418688)
finish=1.9min speed=196512K/sec
unused devices: <none>
When the synchronization is finished, the output will look like this:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/2] [UU]
md1 : active raid1 sda5[0] sdb5[1]
24418688 blocks [2/2] [UU]
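Once both arrays show [2/2] [UU], the rebuild is complete. As an optional final check, mdadm can report the state of each array (shown here for /dev/md0; /dev/md1 can be checked the same way):
#mdadm --detail /dev/md0
The State line should read “clean” and both member partitions should be listed as active sync devices.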