
Troubleshooting ZFS Data Errors in a Mirror Pool

In document D72965GC10_ag Solaris11 Adnace LAB (Page 82-88)

Task 1C: Troubleshooting a ZFS Device Error in a raid-z Pool

Task 2: Troubleshooting ZFS Data Errors in a Mirror Pool

In this task, you inject errors in your data file. Then you implement corrective measures to make sure that the data is restored from the mirror copy.

The following activities are covered in this task:

• Running an explicit scrub

• Restoring data from the mirror backup

1. Verify that the Sol11-SuperServer and Sol11-Serv1 virtual machines are running. If the virtual machines are not running, start them now.

2. Log in to the Sol11-Serv1 virtual machine as the oracle user. Use oracle1 as the password. Assume administrator privileges.

oracle@S11-serv1:~$ su -
Password: oracle1
root@S11-serv1:~#

Oracle Internal & Or

3. Use the zpool command and create a mirror pool. Check the health of the pool.

root@s11-serv1:~# zpool create assetpool mirror c7t3d0 c7t4d0 spare c7t5d0

root@s11-serv1:~# zpool status assetpool
  pool: assetpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        assetpool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       -     -     -
            c7t4d0  ONLINE       -     -     -
        spares
          c7t5d0    AVAIL

errors: No known data errors
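The health check in this step can also be scripted. The following is a minimal sketch, not part of the original lab: the function name pool_state and the embedded sample text are assumptions made for illustration. It reads zpool status text on standard input and prints the value of the state: line.

```shell
# pool_state: print the value of the "state:" line from `zpool status`
# text supplied on standard input (hypothetical helper name).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Sample text copied from the status output in this step. On a live
# system you would pipe `zpool status assetpool` into the function.
sample_status='  pool: assetpool
 state: ONLINE
  scan: none requested'

printf '%s\n' "$sample_status" | pool_state
```

On a live system, `zpool list -H -o health assetpool` is usually a more direct way to get the same value; the parsing approach is shown only because it can be tried without a pool.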

4. Use the tar command to create a demonstration data file. Let it run for a minute or more, and then interrupt the command by pressing Ctrl-C.

root@s11-serv1:~# tar cvf /assetpool/data.tar /usr
… … …
/usr/bin/nvidia-xconfig
/usr/bin/alacarte
/usr/bin/iceauth
/usr/bin/ps2ascii
/usr/bin/gvfs-mount
/usr/bin/pmap
/usr/bin/smproxy
/usr/bin/pkglint
/usr/bin/nautilus-connect-server
/usr/bin/luit
…
<CTRL-C>

root@s11-serv1:~# df -h | grep asset

assetpool 1016M 1.3M 1015M 1% /assetpool

For demonstration purposes, you are creating a data file with a significant amount of data in it.

Your display may differ slightly.


5. Using the prtvtoc command, save the VTOC (volume table of contents) of the first disk.

root@s11-serv1:~# prtvtoc /dev/dsk/c7t3d0 > /var/tmp/vtoc3

You are saving this vtoc because when you corrupt the data in the next step, the vtoc will also be corrupted. You will then need to restore it.
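Because the next step is destructive, it can be worth confirming that the saved VTOC file actually exists and is not empty before continuing. The following is a minimal sketch, not part of the original lab; the function name backup_ok is an assumption made for illustration.

```shell
# backup_ok FILE: succeed only if FILE exists and is not empty
# (hypothetical helper name).
backup_ok() {
    [ -s "$1" ]
}

# Path taken from the prtvtoc command in this step.
vtoc_backup=/var/tmp/vtoc3

if backup_ok "$vtoc_backup"; then
    echo "VTOC backup present: $vtoc_backup"
else
    echo "VTOC backup missing or empty: $vtoc_backup" >&2
fi
```

The check only proves that something was written to the file; it does not validate the contents of the saved table.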

6. Using the dd command, corrupt the data on the first disk.

root@s11-serv1:~# dd if=/dev/zero of=/dev/dsk/c7t3d0 bs=8192 count=10000 conv=notrunc

10000+0 records in
10000+0 records out

If you are not familiar with the dd command, refer to its man page. Here you are overwriting the first 10,000 blocks of 8 KB (8192 bytes) each on the disk with zeros.
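To see how much of the disk the dd command overwrites, multiply the block size by the block count. The following is a small worked example in shell arithmetic; the variable names are chosen for illustration only.

```shell
bs=8192       # block size in bytes, from the dd command above
count=10000   # number of blocks, from the dd command above

total_bytes=$((bs * count))
total_mib=$((total_bytes / 1024 / 1024))   # integer division, rounds down

echo "bytes overwritten: $total_bytes"
echo "whole MiB overwritten: $total_mib"
```

The result is 81,920,000 bytes, or roughly 78 MiB, starting at the beginning of the disk. That region includes the disk label, which is why the VTOC was saved in the previous step.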

7. Using the tar command, display your data.

root@s11-serv1:~# tar tvf /assetpool/data.tar

… …

drwxr-xr-x root/sys 0 2011-07-16 17:34 usr/
lrwxrwxrwx root/root 0 2011-07-16 17:34 usr/tmp -> ../var/tmp
lrwxrwxrwx root/root 0 2011-07-16 17:34 usr/mail -> ../var/mail
drwxr-xr-x root/bin 0 2011-07-16 17:34 usr/snadm/

… … …

Is your data still there? Yes

8. Using the zpool command, display the status of the pool.

root@s11-serv1:~# zpool status assetpool
  pool: assetpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        assetpool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       -     -     -
            c7t4d0  ONLINE       -     -     -
        spares
          c7t5d0    AVAIL

errors: No known data errors

Notice the error messages.

9. Attempt to correct the issues. Using the zpool command, take the first disk offline, bring it back online, and then clear the errors.

root@s11-serv1:~# zpool offline assetpool c7t3d0
root@s11-serv1:~# zpool online assetpool c7t3d0
...

root@s11-serv1:~# zpool clear assetpool

Note: Press Return to go back to the command prompt after putting the device back online.

10. Using the zpool command, display the pool’s status.

root@s11-serv1:~# zpool status assetpool
  pool: assetpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: resilvered 22.4M in 0h0m with 0 errors on Sun Oct 16 08:56:42 2011
config:

        NAME          STATE     READ WRITE CKSUM
        assetpool     DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            spare-0   DEGRADED     -     -     -
              c7t3d0  DEGRADED     -     -     -  too many errors
              c7t5d0  ONLINE       -     -     -
            c7t4d0    ONLINE       -     -     -
        spares
          c7t5d0      INUSE     currently in use

errors: No known data errors

Is the pool functional? Yes

What actions has ZFS taken? Due to data errors, it has placed the first disk in the degraded state and substituted it with the spare.

Note the amount of data resilvered.
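If you want to record the resilvered amount rather than read it off the screen, the scan: line can be parsed. The following is a minimal sketch, not part of the original lab; the function name resilvered_amount is an assumption, and the sample line is copied from the status output in step 10.

```shell
# resilvered_amount: print the size field of a "scan: resilvered ..." line
# read from standard input (hypothetical helper name).
resilvered_amount() {
    awk '$1 == "scan:" && $2 == "resilvered" { print $3; exit }'
}

# Sample line copied from the status output in step 10.
scan_line='  scan: resilvered 22.4M in 0h0m with 0 errors on Sun Oct 16 08:56:42 2011'

printf '%s\n' "$scan_line" | resilvered_amount
```

The function prints nothing when the scan line reports something other than a resilver, for example a scrub.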

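11. Using the fmthard command, restore the saved VTOC on the first disk. Then, using the zpool command, clear the errors, scrub the pool, and display the pool’s health. The spare disk is detached and returns to the AVAIL state.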

root@s11-serv1:~# fmthard -s /var/tmp/vtoc3 /dev/rdsk/c7t3d0
fmthard: New volume table of contents now in place.

root@s11-serv1:~# zpool clear assetpool
root@s11-serv1:~# zpool scrub assetpool
root@s11-serv1:~# zpool status assetpool
  pool: assetpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 7.64M in 0h0m with 0 errors on Sun Oct 16 08:58:41 2011
config:

        NAME        STATE     READ WRITE CKSUM
        assetpool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       -     -     -
            c7t4d0  ONLINE       -     -     -
        spares
          c7t5d0    AVAIL

errors: No known data errors

root@s11-serv1:~# zpool clear assetpool
root@s11-serv1:~# zpool status assetpool
  pool: assetpool
 state: ONLINE
  scan: scrub repaired 7.64M in 0h0m with 0 errors on Sun Oct 16 08:58:41 2011
config:

        NAME        STATE     READ WRITE CKSUM
        assetpool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       -     -     -
            c7t4d0  ONLINE       -     -     -
        spares
          c7t5d0    AVAIL

errors: No known data errors

With the spare detached and back in the AVAIL state, the pool is again using the two main disks in the mirror. The data has been resilvered onto the first disk.
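In the output above the spare is already back in the AVAIL state. If a spare were still listed as INUSE after the original disk had been repaired, zpool detach is the command that returns it to the spare list. The following is a minimal sketch, not part of the original lab: the function name spares_in_use and the sample text are assumptions, and the script only prints the command instead of running it.

```shell
# spares_in_use: print the names of devices whose state column is INUSE
# in `zpool status` text read from standard input (hypothetical helper name).
spares_in_use() {
    awk '$2 == "INUSE" { print $1 }'
}

# Sample text modeled on the spares section shown in step 10.
sample_spares='        spares
          c7t5d0    INUSE     currently in use'

# Only print the command that could be run; nothing is executed here.
for dev in $(printf '%s\n' "$sample_spares" | spares_in_use); do
    echo "zpool detach assetpool $dev"
done
```

Printing the command first leaves the decision to the administrator, which suits a step that changes the pool configuration.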

12. Using the tar command, display your data.

root@s11-serv1:~# tar tvf /assetpool/data.tar

… …

drwxr-xr-x root/sys 0 2011-07-16 17:34 usr/
lrwxrwxrwx root/root 0 2011-07-16 17:34 usr/tmp -> ../var/tmp
lrwxrwxrwx root/root 0 2011-07-16 17:34 usr/mail -> ../var/mail
drwxr-xr-x root/bin 0 2011-07-16 17:34 usr/snadm/

… … …

Is your data still there? Yes
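Listing the archive shows that it is readable, but it does not prove that the contents are unchanged. A stricter check, offered here as a sketch that is not part of the original lab, is to compare a checksum taken before the corruption step with one taken after the repair. The function name checksum_of is an assumption, and the example uses a scratch file so that it can be tried without a pool.

```shell
# checksum_of FILE: print the POSIX cksum CRC and byte count of FILE
# (hypothetical helper name).
checksum_of() {
    cksum < "$1"
}

# Demonstration on a scratch file. In the lab, the file would be
# /assetpool/data.tar, checked once before step 6 and again after step 11.
scratch=$(mktemp)
printf 'demo data\n' > "$scratch"

before=$(checksum_of "$scratch")
after=$(checksum_of "$scratch")

if [ "$before" = "$after" ]; then
    echo "unchanged"
else
    echo "changed"
fi

rm -f "$scratch"
```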

This concludes the data correction exercise.
