Instance Recovery and Database Availability

[Figure: database availability (None, Partial, Full) against elapsed time during instance recovery, with numbered steps 1 to 5 and lettered points A to H.]

In a single-instance environment, the instance startup combined with the crash recovery time is controlled by the setting of the FAST_START_MTTR_TARGET initialization parameter. You can set its value if you want incremental checkpointing to be more aggressive than the autotuned checkpointing. However, this comes at the expense of much higher I/O overhead.
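As a minimal sketch of how the parameter might be set, the statements below assume a server parameter file (SPFILE) is in use. The 60-second value is purely illustrative and is not a recommendation from this course.

```sql
-- Illustrative value only: request roughly 60 seconds for crash recovery.
-- FAST_START_MTTR_TARGET is expressed in seconds and can be changed
-- while the instance is running.
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 60 SCOPE = BOTH;

-- SQL*Plus command to confirm the value now in effect.
SHOW PARAMETER fast_start_mttr_target
```

A lower target means more aggressive incremental checkpointing, so it is worth watching the I/O load after any change.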

In a RAC environment, including the instance startup time in this calculation is not meaningful, because one of the surviving instances performs the recovery.

In a RAC environment, it is possible to monitor the estimated target, in seconds, for the duration from the start of instance recovery to the time when GCD is open for lock requests for blocks not needed for recovery. This estimation is published in the V$INSTANCE_RECOVERY view through the ESTD_CLUSTER_AVAILABLE_TIME column. Basically, you can monitor the time your cluster is frozen during instance-recovery situations.
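The estimate can be read with a simple query. The sketch below also selects TARGET_MTTR and ESTIMATED_MTTR, two other columns of the same view, for comparison; including them is a choice made for this example rather than something the course text prescribes.

```sql
-- TARGET_MTTR and ESTIMATED_MTTR are reported in seconds.
-- ESTD_CLUSTER_AVAILABLE_TIME is the estimate discussed in the text.
SELECT target_mttr,
       estimated_mttr,
       estd_cluster_available_time
FROM   v$instance_recovery;
```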

In a RAC environment, the FAST_START_MTTR_TARGET initialization parameter is used to bound the entire instance-recovery time, assuming it is instance recovery for single-instance death.

Note: If you deliberately set FAST_START_MTTR_TARGET low to obtain short instance recovery times, you can safely ignore the alert log messages advising you to raise its value.

Instance Recovery and RAC

[Figure: two recovery timelines. Single instance: instance crashes, instance starts, instance opens; the span "instance startup + crash recovery" is labeled FAST_START_MTTR_TARGET. RAC: instance crashes, instance recovery starts, first pass + lock claim, rolling forward ends; the span "instance recovery" is labeled FAST_START_MTTR_TARGET, and a second span is labeled V$INSTANCE_RECOVERY.ESTD_CLUSTER_AVAILABLE_TIME.]

Here are some guidelines you can use to make sure that instance recovery in your RAC environment is faster:

• Use parallel instance recovery by setting RECOVERY_PARALLELISM.

• Set PARALLEL_MIN_SERVERS to CPU_COUNT-1. This will prespawn recovery slaves at startup time.

• If a system fails when there are uncommitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by setting the FAST_START_PARALLEL_ROLLBACK parameter.

• Using asynchronous I/O is one of the most crucial factors in recovery time. The first-pass log read uses asynchronous I/O.

• Instance recovery uses 50 percent of the default buffer cache for recovery buffers. If this is not enough, some of the steps of instance recovery will be done in several passes. You should be able to identify such situations by looking at your alert.log file. In that case, you should increase the size of your default buffer cache.
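The guidelines above can be sketched as the following statements. Several things here are assumptions made for illustration: a CPU_COUNT of 8 (so CPU_COUNT-1 is 7), the value chosen for RECOVERY_PARALLELISM, the choice of HIGH for FAST_START_PARALLEL_ROLLBACK, and the use of DISK_ASYNCH_IO and FILESYSTEMIO_OPTIONS as the parameters to inspect for asynchronous I/O. An SPFILE shared by all instances is also assumed.

```sql
-- Assumption: CPU_COUNT is 8 on every node, so CPU_COUNT - 1 = 7.
-- RECOVERY_PARALLELISM is static, so it is written to the SPFILE and
-- takes effect at the next instance restart. It cannot exceed
-- PARALLEL_MAX_SERVERS.
ALTER SYSTEM SET RECOVERY_PARALLELISM = 7 SCOPE = SPFILE SID = '*';

-- Prespawn parallel execution servers.
ALTER SYSTEM SET PARALLEL_MIN_SERVERS = 7 SCOPE = BOTH SID = '*';

-- Accepted values are FALSE, LOW, and HIGH; HIGH is an illustrative choice.
ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH SCOPE = BOTH SID = '*';

-- Inspect the asynchronous I/O settings (assumed to be the relevant ones).
SELECT name, value
FROM   v$parameter
WHERE  name IN ('disk_asynch_io', 'filesystemio_options');
```

Whether asynchronous I/O is actually used also depends on the operating system and storage layer, which this query does not show.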

Instance Recovery and RAC

• Use parallel instance recovery.

• Set PARALLEL_MIN_SERVERS.

• Use asynchronous input/output (I/O).

• Increase the size of the default buffer cache.

Although RAC provides you with methods to avoid or to reduce downtime due to a failure of one or more (but not all) of your instances, you must still protect the database itself, which is shared by all the instances. This means that you need to consider disk backup and recovery strategies for your cluster database just as you would for a nonclustered database.

To minimize the potential loss of data due to disk failures, you may want to use disk mirroring technology (available from your server or disk vendor). As in nonclustered databases, you can have more than one mirror if your vendor allows it, to help reduce the potential for data loss and to provide you with alternative backup strategies. For example, with your database in ARCHIVELOG mode and with three copies of your disks, you can remove one mirror copy and perform your backup from it while the two remaining mirror copies continue to protect ongoing disk activity. To do this correctly, you must first put the tablespaces into backup mode and then, if required by your cluster or disk vendor, temporarily halt disk operations by issuing the ALTER SYSTEM SUSPEND command. After the statement completes, you can break the mirror and then resume normal operations by executing the ALTER SYSTEM RESUME command and taking the tablespaces out of backup mode.
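The split-mirror sequence described above can be sketched as follows. The tablespace name users is hypothetical, and the mirror split itself is done with vendor tools outside the database, so it appears only as a comment.

```sql
-- 'users' is a hypothetical tablespace name; repeat for each tablespace
-- that resides on the mirrored disks.
ALTER TABLESPACE users BEGIN BACKUP;

-- Only if your cluster or disk vendor requires I/O to be halted.
ALTER SYSTEM SUSPEND;

-- Optional check: DATABASE_STATUS should report SUSPENDED.
SELECT database_status FROM v$instance;

-- Break the mirror here with the storage vendor's tools.

-- Resume normal operations and leave backup mode.
ALTER SYSTEM RESUME;
ALTER TABLESPACE users END BACKUP;
```

Keeping the suspended window as short as possible limits the time during which sessions wait on I/O.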

Protecting Against Media Failure

[Figure: protection against media failure using database backups, mirrored disks, and archived log files.]

Media recovery must be user-initiated through a client application, whereas instance recovery is automatically performed by the database. In these situations, use RMAN to restore backups of the data files and then recover the database. The procedures for RMAN media recovery in Oracle RAC environments do not differ substantially from the media recovery procedures for single-instance environments.

The node that performs the recovery must be able to restore all of the required data files. That node must also be able to either read all the required archived redo logs on disk or be able to restore them from backups. Each instance generates its own archive logs that are copies of its dedicated redo log group threads. It is recommended that Automatic Storage Management (ASM) or a cluster file system be used to consolidate these files.

When recovering a database with encrypted tablespaces (for example, after a SHUTDOWN ABORT or a catastrophic error that brings down the database instance), you must open the Oracle Wallet after database mount and before you open the database, so the recovery process can decrypt data blocks and redo.
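The order of operations for encrypted tablespaces can be sketched in a SQL*Plus session connected AS SYSDBA. The wallet password shown is a placeholder, and the statement uses the wallet syntax of Oracle Database 11g; later releases use the ADMINISTER KEY MANAGEMENT statement instead, so treat this as an assumption about your release.

```sql
-- Mount the database without opening it.
STARTUP MOUNT

-- Open the wallet so that recovery can decrypt data blocks and redo.
-- "wallet_password" is a placeholder, not a real value.
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";

-- Open the database; any required instance recovery runs at this point.
ALTER DATABASE OPEN;
```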


Media Recovery in Oracle RAC

• Media recovery must be user-initiated through a client application.

• In these situations, use RMAN to restore backups of the data files and then recover the database.

• RMAN media recovery procedures for RAC do not differ substantially from those for single-instance environments.

• The node that performs the recovery must be able to restore all of the required data files.
