Navigating the Rescue Mode for Linux


ABOUT THIS GUIDE

This document will take you through the process of booting your Linux server into rescue mode to identify and fix the problem(s) that may be causing it to be unresponsive.

This guide will instruct you on how to:

• Log into rescue mode
• Identify disk partitions
• Detect physical disk problems
• Detect and fix file system errors
• Access and recover data

Logging into rescue mode

If your Linux dedicated server is unresponsive and fails to come online after a reboot, you can boot the server into rescue mode from the Tagadab control panel to identify and fix the problem.

1. Once rescue mode has been started on your dedicated server, log in to the system via SSH using your server's usual IP address and the root password that was set when the system was first installed (you can find this in your Tagadab control panel). You can also access the server in graphical mode using VNC if you have a VNC client installed.

Please be aware that the rescue mode system will have a different SSH host key from your normal server. If you are using PuTTY, you will see a warning like Screen 1:


2. Accept the warning by clicking the 'Yes' button, then log in. If you are using SSH from a Linux or Mac shell, you may need to remove the old version of the SSH host key from your known_hosts file (normally ~/.ssh/known_hosts) before logging in. Once you have finished with rescue mode and booted your server normally, it will return to using its usual SSH host key and you will see a similar warning again.

You should see a window similar to Screen 2 once you are logged in:

Screen 2
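For Linux or Mac users, the host key clean-up described above can be sketched as follows. This is a minimal example rather than an official procedure: the address 203.0.113.10 is a placeholder, and the run and connect_to_rescue helpers are hypothetical names introduced here. By default the helper only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch: clear a stale host key, then connect to the rescue system.
# 203.0.113.10 is a placeholder address; use your server's usual IP.

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

connect_to_rescue() {
    server_ip=$1
    # ssh-keygen -R removes all keys stored for this host from known_hosts
    run ssh-keygen -R "$server_ip"
    # Log in as root; ssh will ask you to confirm the rescue system's host key
    run ssh "root@$server_ip"
}

# Preview the commands:  connect_to_rescue 203.0.113.10
# Actually run them:     RUN=1 connect_to_rescue 203.0.113.10
```

Expect the same clean-up to be needed once more after the server has been booted normally, because the original host key returns.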

Identifying your disk partitions

1. Identify your disk partitions before recovering your system. Get a list of all of the disks connected to the system and their partitions by running the command 'fdisk -l' as shown in Screen 3:


2. The exact output will vary depending on the number of disks in your server, the number of partitions on each disk, and whether or not your system uses software RAID. Screen 3 shows one disk (/dev/sda) that contains four partitions (numbered 1, 2, 5 and 6). The first partition (/dev/sda1) is marked as bootable, so this would be the partition mounted under /boot.

The second partition (/dev/sda2) is an extended partition and is only used as a container for the other two partitions. It is not mountable. The third partition (/dev/sda5) is the swap space, and the fourth partition (/dev/sda6) is the root partition, normally mounted as /. If your server has two disks the output will look something like Screen 4:


If your system uses software RAID, it will look something like Screen 5:

Screen 5

3. If your system uses software RAID, there are additional steps you will need to take before attempting to fix disk issues or access your data. Please refer to the separate software RAID instructions in the following sections.

If no disks are displayed (or an incorrect number of disks is displayed), the disk(s) may have already suffered a catastrophic failure. In such an event, you will need to ask Tagadab Support to arrange for a replacement disk or server and then restore any backups.
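To check quickly whether the expected number of disks is present, the 'fdisk -l' output can be reduced to one line per device. The sketch below assumes the usual "Disk /dev/name: size" header lines that fdisk prints; list_disks is a hypothetical helper name, not part of fdisk.

```shell
#!/bin/sh
# Read `fdisk -l` output on stdin and print one device name per disk found.
list_disks() {
    # Header lines look like "Disk /dev/sda: 500.1 GB, ..."; strip the colon.
    awk '/^Disk \/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Usage on the rescue system (as root):
#   fdisk -l 2>/dev/null | list_disks
```

On a software RAID system, the RAID devices (such as /dev/md0) may appear in this list alongside the physical disks.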

Detecting physical disk problems

1. Your disk(s) may have physical errors that cannot be corrected and would require a disk replacement. You can use the smartctl program to test the disk to see if this is the case. First, check that the disk has its SMART capability enabled with the command 'smartctl -i /dev/diskname', swapping diskname for the correct device as shown in Screen 6.


Screen 6

2. Run a test on the disk using 'smartctl -t short /dev/diskname'. Further options are available (use 'man smartctl' to see them). You will see a message that the test will take around one minute to complete as shown in Screen 7:

Screen 7


Screen 8

4. Look out for a high error count next to any of the attributes with the type 'Pre-fail', as these may be an indication that the disk is going to fail soon. If any of your disks have this type of error, please contact Tagadab Support.

5. smartctl can be used on systems with multiple disks by running the above sequence of commands for each disk (not each partition).
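The sequence above can be sketched as a short shell session. This example assumes the disk is /dev/sda and that the attribute table is read with 'smartctl -A', which is one way to view it and not necessarily the command used for Screen 8. The prefail_counts helper is a hypothetical name; it simply lists each Pre-fail attribute with its raw count so that high values stand out.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin; print "name raw_value" for each
# attribute whose TYPE column (7th field) is Pre-fail.
prefail_counts() {
    awk '$7 == "Pre-fail" { print $2, $10 }'
}

# Usage on the rescue system (as root), once per disk:
#   smartctl -i /dev/sda                   # confirm SMART support is enabled
#   smartctl -t short /dev/sda             # start the short self-test
#   sleep 120                              # give the test time to finish
#   smartctl -l selftest /dev/sda          # show the self-test log
#   smartctl -A /dev/sda | prefail_counts  # list Pre-fail attributes
```

Some attributes report large raw values on healthy disks depending on the manufacturer, so treat the list as a prompt for closer inspection rather than a verdict.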

RAID Instructions

There are no separate instructions required for this section.

Detecting and fixing file system errors

1. Your server may fail to boot if there are errors with the file system. You can identify and correct these errors using the fsck tool. For example, if you have seen errors in the system logs indicating partition problems on the root partition (/dev/sda6 as shown in Screen 9), you can try to correct this by running the command 'fsck /dev/sda6'. This must be done while the partition is not mounted.


2. In Screen 9, there are a few minor errors that fsck has fixed. For more severe errors, fsck may prompt you to confirm each fix. To avoid being prompted and simply accept the default options, run the fsck command with the -a flag. Further details are available from the fsck manual (type 'man fsck').

3. If you fix any disk errors, exit rescue mode and attempt to boot the system normally. If the system still fails to boot, or fsck cannot fix the disk errors, you may need to recover any data that you did not back up (see the section on recovering data).

RAID Instructions

Perform fsck on the RAID device rather than on the member partitions to check the file system on both disks simultaneously. The RAID device will likely be either /dev/md0 or /dev/md1, whichever is the larger (the smaller RAID device will be swap space). In Screen 10, minor errors have been corrected.

Screen 10
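Because fsck must only be run on a file system that is not mounted, it can help to wrap the check in a small guard. The following is a sketch under that assumption: is_mounted and safe_fsck are hypothetical helper names, and the guard simply looks for the device in /proc/mounts before calling 'fsck -a'.

```shell
#!/bin/sh
# Succeed if the device is listed as a mount source in the mounts table.
# The second argument exists only so the check can be tried on a sample file.
is_mounted() {
    device=$1
    mounts_file=${2:-/proc/mounts}
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
}

# Run fsck with automatic repair, but refuse if the device is mounted.
safe_fsck() {
    device=$1
    if is_mounted "$device"; then
        echo "$device is mounted; unmount it before running fsck" >&2
        return 1
    fi
    fsck -a "$device"
}

# Usage on the rescue system (as root):
#   safe_fsck /dev/sda6    # single-disk example from this guide
#   safe_fsck /dev/md0     # software RAID: check the RAID device instead
```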

Accessing your data

1. If your disks did not show any errors, or you know your system failed to boot for reasons unrelated to the disks (e.g., an incorrectly enabled firewall, an incorrectly modified grub configuration, etc.), you will need to access your disk(s) to either correct the problem or recover the data before reimaging. To do this, the disk(s) need to be mounted.


3. To access the data on the root partition, create a mount point for the partition. For our one-disk system, it will be created at /mnt/sda6. We then mount the partition on this mount point, and cd into the directory to view our system as shown below in Screen 11:

Screen 11

RAID Instructions

Create a mount point at /mnt/md0, mount the RAID device here and cd into the directory as shown in Screen 12. You can now view and edit your files using standard Linux tools (such as less, cat, vi, nano).

Screen 12
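The mount steps above can be sketched as follows. The helper names mount_point_for and mount_for_rescue are hypothetical; they only follow this guide's convention of naming the mount point after the device (/mnt/sda6 for /dev/sda6, /mnt/md0 for /dev/md0).

```shell
#!/bin/sh
# Derive the mount point from the device name, e.g. /dev/sda6 -> /mnt/sda6.
mount_point_for() {
    echo "/mnt/$(basename "$1")"
}

# Create the mount point, mount the device there, and print the path.
mount_for_rescue() {
    device=$1
    mount_point=$(mount_point_for "$device")
    mkdir -p "$mount_point" &&
        mount "$device" "$mount_point" &&
        echo "$mount_point"
}

# Usage on the rescue system (as root):
#   cd "$(mount_for_rescue /dev/sda6)"   # single-disk example
#   cd "$(mount_for_rescue /dev/md0)"    # software RAID example
#   ls                                   # your server's root file system
```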

Chroot

1. Use the chroot command to change the root of the rescue system to the root on the disk. This is needed if you want to use the 'passwd' program to reset one of your system passwords.

2. Use 'chroot mountpoint' to change the root to the partition you have mounted. In Screen 13 we used 'chroot /mnt/sda6' or 'chroot /mnt/md0'. You may see an error such as:

chroot: failed to run command `/bin/zsh': No such file or directory

This indicates that the zsh shell used by the rescue system is not installed on your dedicated server. In this case, modify the command to run the bash shell: 'chroot mountpoint bash'


Screen 13
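One way to avoid the missing-shell error is to check which shell exists on the mounted partition before calling chroot. This is only a sketch: pick_chroot_shell is a hypothetical helper, and it assumes the root partition is already mounted (at /mnt/sda6 in the usage lines).

```shell
#!/bin/sh
# Print the first shell that exists inside the mounted root.
# Candidates: the rescue system's own $SHELL, then bash, then sh.
pick_chroot_shell() {
    new_root=$1
    for candidate in "${SHELL:-/bin/sh}" /bin/bash /bin/sh; do
        if [ -x "$new_root$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage on the rescue system (as root):
#   chroot /mnt/sda6 "$(pick_chroot_shell /mnt/sda6)"
#   passwd    # inside the chroot: reset the root password
#   exit      # leave the chroot when you have finished
```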

Recovering your data

If you are unable to fix your server, you will need to copy any data that is not backed up before requesting a reimage from the Tagadab control panel. If you have access to another server that runs FTP or SSH, use the command line FTP or SCP tools to upload your data to that server. Otherwise, you can connect an SCP client (such as WinSCP for Windows) to the rescue mode server, navigate to the point where you mounted the disk and download the data to your local system.
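Uploading with the command line SCP tool might look like the sketch below. The destination backup@backup.example.com:/srv/rescued/ and the source directory /mnt/sda6/home are placeholders, and run and upload_data are hypothetical helper names. As written, the helper prints the command for review and only executes it when RUN=1 is set.

```shell
#!/bin/sh
# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Copy a directory from the mounted disk to another server over SSH.
upload_data() {
    source_dir=$1
    destination=$2   # form: user@host:/path
    # -r copies directories recursively; -p preserves times and permissions
    run scp -rp "$source_dir" "$destination"
}

# Preview:  upload_data /mnt/sda6/home backup@backup.example.com:/srv/rescued/
# Execute:  RUN=1 upload_data /mnt/sda6/home backup@backup.example.com:/srv/rescued/
```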


Unmount

When you have finished making changes, unmount the disk, and end rescue mode by rebooting the server from the control panel as shown in Screen 15. If necessary, reimage the server via the Tagadab control panel.

Screen 15
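Unmounting usually fails with a "busy" error if your shell is still inside the mounted directory, so it is worth leaving it first. A minimal sketch, assuming the mount point /mnt/sda6 used earlier in this guide (finish_rescue is a hypothetical helper name):

```shell
#!/bin/sh
# Leave the mounted directory, then unmount it.
finish_rescue() {
    mount_point=$1
    # umount refuses to detach a file system that is the current directory
    cd / || return 1
    umount "$mount_point"
}

# Usage on the rescue system (as root), after exiting any chroot:
#   finish_rescue /mnt/sda6    # or /mnt/md0 on a software RAID system
```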


Need Further Assistance?