Server Software RAID Recovery and Troubleshooting
(Version 1.0)
10(1)/2008-OTC/CHN-PROJECT-16/05/2013 – 216
OPEN TECHNOLOGY CENTRE, NATIONAL INFORMATICS CENTRE
DEPARTMENT OF INFORMATION TECHNOLOGY, CHENNAI
Prepared by Open Technology Centre National Informatics Centre
Version 1.0, 24/10/13. Prepared by M Dhanesh (System Admin); reviewed (1st level, 2nd level, final) by Manas Biswal and Raghuraman Balakrishnan.
Note: This document was tested on Ubuntu Server and Desktop 12.04, 64-bit.
Color Definitions
➢ Words in black indicate comments and descriptions in the document.
➢ Words in Bordeaux indicate values displayed on the console command line.
➢ Words in Turquoise indicate values provided by the user on the console command line.
➢ Words in orange1 indicate values provided by the user in configuration or ini files.
➢ Words in Light blue5 indicate commands entered by the user on the command line.
Table of Contents
Color Definitions
1. Replacing A Failed Hard Drive In Software RAID1 Array
2. How Do We Confirm a Hard Disk Has Failed
3. Removing The Failed Disk
4. Adding The New Hard Disk
1. Replacing A Failed Hard Drive In Software RAID1 Array
In this kickstart installation we have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda5 as well as /dev/sdb1 and /dev/sdb5. The partition information is defined in the preseed script.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0, and /dev/sda5 and /dev/sdb5 make up the RAID1 array /dev/md1:
/dev/sda1 + /dev/sdb1 = /dev/md0
/dev/sda5 + /dev/sdb5 = /dev/md1
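For reference, a minimal sketch of how such a RAID1 layout might be declared in the preseed file, using debian-installer's partman-auto-raid recipe format (the filesystem types and mount points below are placeholders, not values taken from the actual script):
d-i partman-auto/method string raid
d-i partman-auto/disk string /dev/sda /dev/sdb
d-i partman-auto-raid/recipe string \
    1 2 0 ext4 / /dev/sda1#/dev/sdb1 . \
    1 2 0 swap - /dev/sda5#/dev/sdb5 .
Each recipe entry lists the RAID level, the number of devices, the number of spares, the filesystem, the mount point, and the member partitions separated by “#”.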
2. How Do We Confirm a Hard Disk Has Failed
If a disk has failed, we will find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
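For example, we can search the syslog for md/RAID-related messages (a quick sketch; the exact message text varies with the kernel version):
#grep -iE 'md[0-9]|raid' /var/log/syslog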
If there are error messages related to software RAID in /var/log/syslog, run the command
#cat /proc/mdstat
Instead of the string [UU], we will see [U_] if we have a degraded RAID1 array.
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda5[0]
24418688 blocks [2/1] [U_]
unused devices: <none>
Here the output shows that hard disk /dev/sdb has failed.
If the RAID is optimal, the output shows [UU] and lists both sda and sdb.
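An array can also be queried directly with mdadm; for a degraded mirror the State field reports the condition (the full output is abbreviated here):
#mdadm --detail /dev/md0
Look for a line such as “State : clean, degraded” and for the missing member in the device listing at the bottom of the output.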
3. Removing The Failed Disk
In this recovery procedure the failed hard disk is “/dev/sdb” (please change the disk name to match your system).
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb5 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
#mdadm --manage /dev/md0 --fail /dev/sdb1
Run the command below to see the RAID status:
#cat /proc/mdstat
Then we remove /dev/sdb1 from /dev/md0:
#mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
#cat /proc/mdstat
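As a side note, mdadm can also perform both steps in a single invocation; the following one-liner should be equivalent to the separate fail and remove commands above:
#mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1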
Now we repeat the same steps for /dev/sdb5 (which is part of /dev/md1):
#mdadm --manage /dev/md1 --fail /dev/sdb5
#mdadm --manage /dev/md1 --remove /dev/sdb5
Afterwards /proc/mdstat shows both arrays running on /dev/sda alone:
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda5[0]
24418688 blocks [2/1] [U_]
unused devices: <none>
Shut down the system using the command below:
#shutdown -h now
4. Adding The New Hard Disk
Replace the failed hard disk (here “/dev/sdb”) with a new one, then boot the system. Create the same partitioning on the new disk as on /dev/sda using the command below:
#sfdisk -d /dev/sda | sfdisk /dev/sdb
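Note that the sfdisk version shipped with Ubuntu 12.04 handles only MBR (msdos) partition tables. If the disks are GPT-partitioned, a sketch of the equivalent copy using sgdisk from the gdisk package (mind the argument order: the destination disk /dev/sdb comes first):
#sgdisk -R /dev/sdb /dev/sda
#sgdisk -G /dev/sdb
The second command gives the copied partition table new random GUIDs so they do not clash with those on /dev/sda.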
We can verify the result by running the command:
#fdisk -l
and checking that both hard drives now have the same partitioning.
Now we add /dev/sdb1 to /dev/md0 and /dev/sdb5 to /dev/md1:
#mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
#mdadm --manage /dev/md1 --add /dev/sdb5
mdadm: re-added /dev/sdb5
Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run
#cat /proc/mdstat
to see the status.
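To follow the rebuild continuously instead of re-running the command by hand, one convenient option is:
#watch -n 2 cat /proc/mdstat
(press Ctrl+C to stop watching).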
During the synchronization the output will look like this:
#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/1] [U_]
[=>...] recovery = 9.9% (2423168/24418688)
finish=2.8min speed=127535K/sec
md1 : active raid1 sda5[0] sdb5[1]
24418688 blocks [2/1] [U_]
[=>...] recovery = 6.4% (1572096/24418688)
finish=1.9min speed=196512K/sec
unused devices: <none>
When the synchronization is finished, the output will look like this:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6]
[raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/2] [UU]
md1 : active raid1 sda5[0] sdb5[1]
24418688 blocks [2/2] [UU]
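Once both arrays show [2/2] [UU], the rebuild is complete. As an optional final check, mdadm can report the state of each array (shown here for /dev/md0; /dev/md1 can be checked the same way):
#mdadm --detail /dev/md0
The State line should read “clean” and both member partitions should be listed as active sync devices.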