Oracle 11g RAC – Index Page 1 of 1830

Oracle Grid Infrastructure and Real Application Clusters

Manoj xerox

wilshire

Gayatri nager,behind HUDA complex, ameer pet

Index

Oracle Grid Infrastructure and Real Application Clusters ... 1

1. Real Application Clusters Concepts ... 9

1.1. Introduction to Clusters ... 9

1.1.1. Types of Clusters ... 10

1.1.1.1. High-Availability (HA) Clusters ... 10

1.1.1.2. Fail-Over Clusters ... 10

1.1.1.3. Load-Balancing Clusters ... 11

1.1.1.4. High-Performance (HPC) Clusters ... 11

1.1.2. Cluster Products ... 11

1.1.2.1. SUN Clusters ... 11

1.1.2.2. HP Serviceguard ... 12

1.1.2.3. IBM's HACMP ... 12

1.1.2.4. MC Service Guard ... 12

1.1.2.5. Red Hat Cluster Suite ... 12

1.1.2.6. Veritas Cluster Server (VCS) ... 13

1.1.2.7. Oracle Real Application Clusters (RAC) ... 13

1.1.3. Components of a Cluster ... 13

1.1.3.1. Cluster Nodes ... 13

1.1.3.2. Cluster Manager ... 14

1.1.3.3. Shared Data Storage ... 14

1.1.3.3.1. Storage Area Networks (SAN) ... 14

1.1.3.3.2. Network Attached Storage (NAS) ... 14

1.1.3.4. Networks ... 14

1.2. Benefits of Real Application Clusters ... 14

1.2.1. High Availability ... 14

1.2.2. Reliability ... 15

1.2.3. Recoverability ... 15

1.2.4. Error Detection ... 15

1.2.5. Continuous Operations ... 15

1.2.6. Scalability ... 15

2. RAC Architecture ... 16

2.1. RAC Architecture ... 16

2.1.1. Background processes in RAC ... 17

2.1.2. Oracle Clusterware ... 18

2.1.3. Hardware Architecture ... 18

2.1.4. File Systems ... 19

2.1.5. Virtual Internet Protocol Address (VIP) ... 19

2.2. 10g New Features for RAC Administration ... 19

2.2.2. Enterprise Manager Enhancements for RAC ... 20

2.2.3. Server Control (srvctl) Enhancements ... 20

2.2.4. Oracle Clusterware (10.2.0.1.0) ... 20

2.2.5. Cluster Verification Utility (10.2.0.1.0) ... 20

2.2.6. Oracle Load Balancing Advisory (10.2.0.1.0) ... 21

2.2.7. Oracle RAC Runtime Connection Load Balancing using JDBC and ODP.NET ... 21

2.2.8. Oracle Fast Connection Failover (FCF) (10.2.0.1.0) ... 21

2.2.9. Transparent Data Encryption and RAC (10.2.0.1.0) ... 21

2.2.10. RAC Configuration Assistant Enhancements (10.2.0.1.0) ... 21

2.2.11. ASM Storage Consolidation ... 22

2.2.12. Dynamic RMAN Channel Allocation for RAC Environments ... 22

2.2.13. Failover Improvements for Distributed Transaction Processing (DTP) in RAC... 22

2.2.14. Multiple Oracle Clusterware Files ... 22

2.2.15. Fast-Start Failover and Data Guard Environments ... 22

2.3. 11gR2 New Features for RAC Administration ... 22

2.3.1. Oracle Real Application Clusters One Node (Oracle RAC One Node) ... 22

2.3.4. Grid Plug and Play ... 23

2.3.5. Oracle Cluster Registry performance enhancements ... 23

2.3.6. Enhanced Cluster Verification Utility ... 23

2.3.7. Patch application with Oracle Enterprise Manager Database Control ... 23

2.3.8. Zero downtime for patching Oracle RAC ... 23

2.3.9. Oracle ASM Dynamic Volume Manager ... 23

2.3.10. Oracle Automatic Storage Management Cluster File System ... 24

3. Oracle Clusterware ... 25

3.1. Oracle Clusterware and RAC ... 25

3.2. The Oracle Clusterware Architecture and Processing ... 25

3.3. Oracle Clusterware Software Component Processing Details ... 25

3.3.1. Clusterware 11.1 Main Daemons ... 27

3.3.2. Grid Infrastructure 11.2 Main Daemons ... 27

3.4. The Oracle Clusterware Stack ... 28

3.5. Oracle Clusterware Components and High Availability ... 32

3.5.1. The Oracle Clusterware Voting Disk and Oracle Cluster Registry ... 32

3.6. New Features for Oracle Clusterware Installation ... 33

4. System Requirements for Oracle Grid Infrastructure and RAC ... 35

4.1. Requirements for Oracle Grid Infrastructure (includes both Clusterware and ASM binaries) and Oracle RAC software ... 35

4.1.1. Hardware Requirements ... 35

4.1.2. Network Hardware Requirements ... 35

4.1.3. IP Address Requirements ... 35

4.1.4. Node Time Requirements ... 36

4.1.5. Software Requirements for Oracle Clusterware ... 36

4.1.6. Shared Storage Requirements ... 36

4.2. Operating system Configuration Tasks for Oracle Installations ... 36

4.2.1. Overview of Cluster Verification Utility ... 38

4.2.2. Creating the Oracle Inventory Group ... 40

4.2.3. Creating the OSDBA Group ... 40

4.2.4. Creating an OSOPER Group (Optional) ... 40

4.2.5. Creating the Oracle Software Owner User ... 40

4.2.6. Verifying User “nobody” Exists ... 41

4.2.7. Creating Identical Users and Groups on Other Cluster Nodes ... 41

4.2.9. Configuring Kernel Tuning Parameters ... 41

4.2.9.1. Configuring Shared Memory ... 41

4.2.9.2. Configuring Semaphores ... 42

4.2.9.3. Configuring File Handles ... 43

4.2.10. Setting Shell Limits for the Oracle User ... 43

4.2.11. Identifying Required Software Directories ... 44

4.2.11.1. Oracle Base Directory ... 44

4.2.11.2. Oracle Inventory Directory ... 44

4.2.11.3. Oracle Grid Infrastructure Home Directory ... 44

4.2.11.4. Oracle Database Home Directory ... 45

4.2.12. Configuration of Hangcheck-Timer Module ... 45

4.3. General Storage Considerations ... 45

4.3.1. Identifying Required Partitions for Clusterware Files ... 45

4.3.2. Identifying Requirements for Using a File System for Oracle Clusterware Files ... 46

4.3.3. Creating Required Directories for Oracle Clusterware Files on Shared File Systems ... 48

4.3.4. Clusterware File Restrictions for Logical Volume Manager on Linux ... 48

5. Installing Oracle Grid Infrastructure and Oracle Database S/W ... 49

5.1. Verifying Oracle Grid Infrastructure Requirements with CVU ... 49

5.1.1. Troubleshooting Oracle Grid Infrastructure Setup ... 49

5.2. Preparing to Install Oracle Grid Infrastructure with OUI ... 50

5.3. Oracle Clusterware Installation Process Description ... 51

5.4. Installation of Oracle Database 11gR2 with RAC Using OUI ... 52

5.5. Administrative tools for RAC environments ... 52

6. Administering Oracle Clusterware ... 54

6.3. Administering Voting Disks in Oracle Clusterware ... 56

6.3.1. Backing up Voting Disks ... 56

6.3.2. Restoring Voting Disks (Till 11.1) ... 56

6.3.3. Restoring Voting Disks (In 11.2) ... 56

6.3.4. Moving Voting Disks from Non-ASM to ASM ... 57

6.4. Administering Oracle Cluster Registry in Oracle Clusterware ... 57

6.4.1. Adding, Replacing, Repairing, and Removing the OCR ... 57

6.4.2. Managing Backups and Recovering the OCR using OCR Backup Files ... 59

6.4.3. Restoring the OCR from Automatically Generated OCR Backups ... 59

6.4.4. Overriding the OCR Data Loss Protection Mechanism ... 59

6.4.5. Administering the Oracle Cluster Registry with OCR exports ... 60

6.4.6. Implementing Oracle Hardware Assisted Resilient Data Initiative for the OCR ... 60

6.4.7. Upgrading and Downgrading the OCR Configuration in RAC ... 61

7. RAC Database Creation... 62

7.1. Database Creation in RAC Environment ... 62

7.2. Storage Considerations for RAC Database ... 62

7.2.1. Introduction to ASM (Automatic Storage Management) ... 63

7.2.1.1. Identifying Storage Requirements for ASM ... 63

7.2.1.2. Configuring Disks for ASM ... 63

7.2.2. Oracle Cluster File System (OCFS) ... 64

7.2.3. Raw Devices ... 66

7.3. Additional Considerations for RAC Database ... 67

7.4. RAC Database Creation Consideration ... 68

7.4.1. Administering Storage ... 69

7.4.3. Redo Log File Storage in RAC ... 70

7.4.4. Automatic Undo Management in RAC ... 70

7.5. Administration of ASM in RAC ... 70

7.5.1. Automatic Storage Management components in RAC ... 70

7.5.2. Modifying Disk-group configurations for ASM in RAC ... 70

7.5.4. Administering ASM Instances with Enterprise Manager in RAC... 71

7.5.5. Administering ASM Instances with srvctl in RAC ... 71

7.5.6. Creation of ASM Instance in RAC ... 76

7.5.6.1. Preparing Additional Storage for ASM ... 76

7.5.6.2. Initialization Parameters for ASM. ... 77

7.5.6.3. Creating ASM Cluster Instances on RAC Nodes ... 77

7.5.6.4. Starting ASM Cluster Instances and Mounting ASM Disk-groups ... 78

7.5.6.5. Registering ASM Instances with Oracle Cluster Ready Services ... 78

7.6. Creating Database in RAC ... 78

7.6.1. Overview of Initialization Parameter Files in RAC Database ... 78

7.6.2. Setting Server Parameter File Parameter Values for RAC ... 79

7.6.3. Parameter File Search Order in RAC ... 79

7.6.4. Initialization Parameter for RAC Database ... 79

7.6.5. Backing up the Server Parameter File ... 83

8. Administering Database Instances ... 84

8.1. RAC Management Tools ... 84

8.1.1. Administering RAC with Enterprise Manager ... 84

8.1.2. Administering RAC with SQL*Plus ... 84

8.1.3. Administering RAC with srvctl ... 84

8.2. Starting and Stopping Instances and RAC Databases ... 84

8.2.1. Starting up and Shutting down with Enterprise Manager ... 85

8.2.2. Starting up and Shutting Down with SQL*Plus ... 85

8.2.3. Starting up and Shutting down with srvctl ... 86

8.3. Managing Archived Redo Logs ... 86

8.3.1. Guidelines and Considerations for Archived Redo Logs ... 86

8.3.2. Archived Redo Log File conventions in RAC ... 87

8.3.3. Archiving Configuration in RAC ... 87

8.4. Changing the Archiving Mode in RAC ... 89

9. Backup and Recovery ... 91

9.1. HOT Backup ... 91

9.2. Configuration of RMAN for RAC ... 91

9.2.1. Configuring Channels for RMAN in RAC ... 92

9.2.1.1. Configuring Channels to Use Automatic Workload Balancing ... 92

9.2.1.2. Configuring Channels to Use a Specific Channel ... 92

9.2.2. Configuring the RMAN Snapshot Control File Location ... 92

9.2.3. Configuring the RMAN Control File Autobackup Feature ... 93

9.2.4. RMAN Archiving Configuration Scenarios ... 93

9.2.4.1. Cluster File System Archiving Scheme ... 93

9.2.4.2. Non-Cluster File System Local Archiving Scheme ... 94

9.3. Managing Backup and Recovery ... 96

9.3.1. RMAN Backup Scenarios for RAC ... 96

9.3.1.1. Non-Cluster File System Backup Scheme ... 96

9.3.1.2. Cluster File System Restore Scheme ... 97

9.3.1.3. Non-Cluster File System Restore Scheme ... 97

9.3.1.5. RMAN recovery through resetlogs in RAC ... 97

9.3.1.6. RMAN and Oracle Networking in RAC ... 97

9.3.1.7. Instance Recovery in RAC ... 98

9.3.1.8. Using RMAN to Create Backups in RAC ... 98

9.3.1.9. Channel Connections to Cluster Instances ... 98

9.3.1.10. Node Affinity Awareness of Fast Connections ... 99

9.3.1.11. Deleting Archived Redo Logs after a Successful Backup... 99

9.3.1.12. Auto-location for Backup and Restore Commands ... 99

9.3.2. RMAN Restore Scenarios for Real Application Clusters ... 99

9.3.2.1. Cluster File System Restore Scheme ... 99

9.3.2.2. Non-Cluster File System Restore Scheme ... 100

9.3.3. RMAN Recovery through Resetlogs in Real Application Clusters ... 100

9.3.4. RMAN and Oracle Net in Real Application Clusters ... 100

9.3.5. Instance Recovery in Real Application Clusters ... 101

9.3.5.1. Single Node Failure in Real Application Clusters ... 101

9.3.6. Media Recovery in RAC ... 101

9.3.7. Parallel Recovery in RAC ... 101

9.3.8. Flash Recovery Area in RAC... 102

10. Workload Management ... 103

10.1. Workload Management and Application High Availability ... 103

10.2. Services Deployment Options ... 103

10.2.1. Using Oracle services ... 103

10.2.2. Default Service Connections ... 104

10.2.3. Connection Load Balancing ... 104

10.3. Fast Application Notification... 105

10.3.1. Overview of Fast Application Notification ... 105

10.3.2. Application High Availability with Services and FAN ... 106

10.3.3. Managing Unplanned Outages ... 106

10.3.4. Managing Planned Outages ... 107

10.3.5. Using Fast Application Notification Callouts ... 107

10.4. Load balancing advisory ... 108

10.4.1. Overview of the Load Balancing Advisory ... 108

10.4.2. Administering the Load Balancing Advisory ... 108

10.4.3. Load Balancing Advisory FAN Events ... 109

10.5. Oracle Clients Integration with FAN. ... 109

10.5.1. Enabling Java Database Connectivity to Receive FAN Events ... 110

10.5.2. Enabling ODP.NET to Receive FAN High Availability Events ... 112

10.6. Services and Distributed Transaction Processing in RAC ... 112

10.7. Administering services ... 113

10.7.1. Administering Services with Enterprise Manager ... 114

10.7.2. Administering Services with the Database Configuration Assistant (DBCA, in 11g it is de-supported) .... 116

10.7.3. Administering Services with the PL/SQL DBMS_SERVICE Package ... 116

10.7.4. Administering Services with SRVCTL (recommended) ... 117

10.8. Measuring Performance by Service Using the AWR ... 120

10.8.1. Service Thresholds and Alerts... 120

10.9. 11gR2 Clusterware New Components ... 123

10.9.1. Grid Plug and Play (GPnP) ... 123

10.9.2. Grid Naming Service (GNS) ... 123

10.9.3. Single Client Access Name (SCAN) ... 124

10.9.4. Server Pools ... 126

11. Administrative Options Available for RAC ... 129

11.1. Enterprise Manager Tasks for RAC ... 129

11.1.1. Enterprise Manager Pages for RAC ... 129

11.2. RAC Administration Procedures for Enterprise Manager ... 131

11.3. Additional information about SQL*Plus in RAC... 132

11.4. Quiescing RAC Database ... 133

11.5. Administering System and Network Interfaces with OIFCFG ... 133

11.6. Changing VIP Addresses ... 134

12. Adding and Deleting Nodes and Instances ... 136

12.1. Cloning Oracle Grid Infrastructure (Clusterware + ASM) and RAC S/W in Grid Environments ... 136

12.2. Quick-Start Node and Instance Addition and Deletion Procedures ... 136

12.2.1. Adding an Oracle Grid Infrastructure Home to a New Node ... 136

12.2.4. Adding an Oracle Database Home with RAC to a New Node ... 136

12.2.5. Deleting an Oracle Database Home with RAC from an Existing Node ... 137

12.2.6. Deleting an Oracle Grid Infrastructure Home from an Existing Node ... 137

12.3. Node and Database Instance Addition and Deletion Procedures ... 138

12.3.1. Overview of Node Addition Procedures ... 138

12.3.2. Overview of Node Deletion Procedures ... 143

13. Design and Deployment Techniques ... 147

13.1. Service Configuration Recommendations for High Availability ... 147

13.2. How Oracle Clusterware Manages Service Relocation ... 148

13.3. General Database Deployment for RAC ... 148

14. Monitoring Performance ... 149

14.1. Monitoring RAC Databases ... 149

14.2. Verifying the Interconnect Settings for RAC ... 149

14.3. Performance Views in RAC ... 149

14.4. RAC Performance Statistics ... 149

14.5. Automatic Workload Repository (AWR) in RAC ... 149

14.6. Monitoring RAC Statistics and Events ... 150

14.6.1. RAC Statistics and Events in AWR and Statspack Reports ... 150

14.6.2. RAC Wait Events ... 150

14.6.3. Monitoring Performance by Analyzing GCS and GES Statistics ... 150

14.7. Monitoring Performance with Oracle Enterprise Manager ... 153

14.7.1. Collection-Based Monitoring... 153

14.8. Real-Time Performance Monitoring ... 154

15. Making Applications Highly Available Using Oracle Clusterware ... 162

15.1. Managing Custom Applications with Oracle Clusterware Commands ... 162

15.2. Creating Application Profiles ... 163

15.3. Oracle Clusterware Action Program Guidelines ... 168

15.4. Oracle Clusterware Commands for Application Management ... 169

15.4.1. Registering Application Resources ... 170

15.4.2. Starting Application Resources ... 170

15.4.3. Relocating Applications and Application Resources ... 170

15.4.4. Stopping Applications and Application Resources ... 171

15.4.5. Unregistering Applications and Application Resources ... 172

15.4.6. Clusterware Application and Application Resource Status Information ... 172

16. Troubleshooting of RAC ... 174

16.1. Overview of Troubleshooting RAC ... 174

16.1.1. Diagnosing Oracle Clusterware High Availability Components ... 174

16.1.1.2. Component Level Debugging ... 174

16.1.1.3. Oracle Clusterware Shutdown and Startup ... 175

16.1.1.4. Enabling and Disabling Oracle Clusterware Daemons ... 175

16.1.1.5. Diagnostics Collection Script ... 175

16.1.1.6. The Oracle Clusterware Alerts ... 175

16.1.1.7. Resource Debugging ... 175

16.1.1.8. Checking the Health of the Clusterware ... 175

16.1.1.9. Clusterware Log Files and the Unified Log Directory Structure ... 176

16.1.2. Troubleshooting the Oracle Cluster Registry ... 177

16.1.3. Enabling Additional Tracing for RAC High Availability ... 179

16.1.3.1. Generating Additional Trace Information for a Running Resource ... 179

16.1.3.2. Verifying Event Manager Daemon Communications ... 179

16.1.3.3. Enabling Additional Debugging Information for Cluster Ready Services ... 179

16.1.4. Diagnosing Oracle RAC Components ... 180

16.1.5. Using the Cluster Verification Utility ... 181

16.1.5.1. Cluster Verification Utility Requirements ... 181

16.1.5.2. Understanding CVU Commands, Help, Output, and Nodelist ... 182

16.1.5.3. Using CVU Help ... 182

16.1.5.4. Verbose Mode and UNKNOWN Output ... 182

16.1.5.5. Cluster Verification Utility Nodelist Shortcuts ... 182

Oracle 10g RAC – Concepts Page 9 of 1830

1. Real Application Clusters Concepts

Oracle Real Application Clusters (RAC) allows Oracle Database to run any packaged or custom application, unchanged, across a set of clustered servers. This provides the highest levels of availability and the most flexible scalability. If a clustered server fails, Oracle continues running on the remaining servers. When you need more processing power, you simply add another server without taking users offline. To keep costs low, even the highest-end systems can be built out of standardized, commodity parts.

Oracle Real Application Clusters provides a foundation for the Oracle Enterprise Grid computing architecture. Oracle RAC technology enables a low-cost hardware platform to deliver a quality of service that rivals and exceeds the levels of availability and scalability achieved by the most expensive mainframe SMP computers. By dramatically reducing administration costs and providing new levels of administration flexibility, Oracle is enabling the enterprise grid environment.

1.1. Introduction to Clusters

A cluster is a group of independent servers that cooperate as a single system. Clusters provide improved fault resilience and modular, incremental system growth over single symmetric multiprocessor (SMP) systems. In the event of a system failure, clustering ensures high availability to users, and access to mission-critical data is not lost. Redundant hardware components, such as additional nodes, interconnects, and disks, allow the cluster to provide high availability. Such redundant hardware architectures avoid single points of failure and provide exceptional fault resilience.

[Figure: cluster layout. Labels: Clients, Public Network, Rac1, Rac2, Private (network), SCSI cables, Shared Storage.]
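The effect of redundancy on availability can be illustrated with a little arithmetic. The sketch below assumes that nodes fail independently and that the service stays up as long as at least one node is up. Both are simplifying assumptions, and the 99% per-node figure used in the example is purely illustrative, not a measured value.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up.

    Assumption (not from the text): node failures are independent and a
    single surviving node is enough to keep the service available.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # The service is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Illustrative per-node availability of 99%.
    for n in (1, 2, 3):
        print(n, "node(s):", cluster_availability(0.99, n))
```

Under these assumptions, each added node multiplies the probability of a total outage by the per-node failure probability, which is why even a two-node cluster removes most of the downtime of a single server.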

1.1.1. Types of Clusters

There are many types of clusters. Generally, clusters are classified based on their functionality. The types of clusters are:

1.1.1.1. High-Availability (HA) Clusters

High-availability clusters are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant nodes, which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to manage the redundancy inherent in a cluster to eliminate single points of failure.

1.1.1.2. Fail-Over Clusters

A failover cluster is typically built with one to four machines, physically configured to share disk storage. Like the other cluster types, all servers within the cluster work together to form one virtual server that provides an application or service to clients. The main purpose of the failover cluster is to provide uninterrupted service in the event of a failure within the cluster. Even though it is possible to configure a failover cluster to provide a small performance boost by changing certain settings, a failover cluster is not as scalable as other types of clusters. A failover cluster is a good fit for databases, file storage, and applications that have dynamic content or data that change often.

[Figure: failover cluster. Labels: End Users, Node 1, Node 2, Shared Storage.]
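The failover behavior described above can be sketched as a toy model. The class and method names here (`FailoverCluster`, `start`, `fail`) are hypothetical and do not correspond to any real cluster product. The sketch assumes that a service runs on exactly one node at a time and that data lives on shared storage, so restarting the service on a surviving node is enough.

```python
from typing import Dict, List, Optional


class FailoverCluster:
    """Toy failover model: each service runs on one healthy node at a time."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.healthy: Dict[str, bool] = {n: True for n in self.nodes}
        self.placement: Dict[str, str] = {}  # service name -> hosting node

    def _pick_node(self) -> Optional[str]:
        # First healthy node in configuration order, or None if all are down.
        return next((n for n in self.nodes if self.healthy[n]), None)

    def start(self, service: str) -> Optional[str]:
        """Start a service on a healthy node and return that node's name."""
        node = self._pick_node()
        if node is not None:
            self.placement[service] = node
        return node

    def fail(self, node: str) -> None:
        """Mark a node as failed and move its services to a survivor."""
        self.healthy[node] = False
        for service, host in list(self.placement.items()):
            if host != node:
                continue
            survivor = self._pick_node()
            if survivor is None:
                del self.placement[service]  # no node left: total outage
            else:
                self.placement[service] = survivor


if __name__ == "__main__":
    cluster = FailoverCluster(["node1", "node2"])
    cluster.start("database")
    cluster.fail("node1")
    print(cluster.placement)
```

The model deliberately leaves out failure detection (heartbeats, timeouts) and any arbitration between nodes; it only shows the relocation step that keeps the service reachable.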

1.1.1.3. Load-Balancing Clusters

Load balancing is similar to shared processing, but there is no need for communication between the nodes. With load balancing, each node processes the requests it has been given by the cluster manager. The cluster manager will distribute the requests in some manner that attempts to distribute the workload evenly among all the systems.
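One simple way a cluster manager might spread work evenly is to hand each request to the node that has been assigned the least work so far. The sketch below illustrates that idea only; the `ClusterManager` class, the `dispatch` method, and the notion of a per-request `cost` are assumptions made for the example, not part of any product described here.

```python
import heapq
from typing import Dict, List, Tuple


class ClusterManager:
    """Assigns each request to the node with the least assigned work."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # Min-heap of (assigned_cost, node_name); ties break on node name.
        self._heap: List[Tuple[int, str]] = [(0, n) for n in nodes]
        heapq.heapify(self._heap)

    def dispatch(self, cost: int = 1) -> str:
        """Pick the least-loaded node, charge it `cost`, and return its name."""
        load, node = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + cost, node))
        return node

    def loads(self) -> Dict[str, int]:
        """Current assigned work per node."""
        return {node: load for load, node in self._heap}


if __name__ == "__main__":
    manager = ClusterManager(["a", "b", "c"])
    print([manager.dispatch() for _ in range(6)])
    print(manager.loads())
```

With equal-cost requests this behaves like round-robin; when costs differ, heavier requests push a node further back in the queue. Note that the nodes never talk to each other, which matches the description above: only the manager tracks the load.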

1.1.1.4. High-Performance (HPC) Clusters

High-performance clusters are implemented primarily to provide increased performance by splitting a computational task across many different nodes in the cluster, and are most commonly used in scientific computing. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on HPC clusters. Many such programs use libraries such as MPI which are specially designed for writing scientific applications for HPC computers.

HPC clusters are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node's calculations will affect future calculations on other nodes.
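The idea of splitting one computation across nodes and then combining the partial results can be sketched on a single machine. In the example below, worker threads stand in for cluster nodes, which is an assumption made to keep the sketch self-contained; a real HPC program would typically use a message-passing library such as MPI, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Scatter step: cut the input into `parts` nearly equal chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk: Sequence[int]) -> int:
    """Work done independently by one 'node' on its own chunk."""
    return sum(x * x for x in chunk)


def sum_of_squares(data: Sequence[int], nodes: int = 4) -> int:
    """Split the task, compute the parts concurrently, then combine them."""
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Gather/reduce step: combine the per-node partial results.
    return sum(partials)


if __name__ == "__main__":
    print(sum_of_squares(range(1000), nodes=4))
```

This example needs communication only at the start and the end. The workloads described above, where intermediate results on one node affect later calculations on others, would exchange data between the workers during the computation as well.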

1.1.2. Cluster Products

1.1.2.1. SUN Clusters

Sun Cluster (sometimes written SunCluster) is a high-availability cluster software package for Solaris operating systems, created by Sun Microsystems. It is used to improve the availability of the services that the cluster provides, such as databases, file sharing on a network, electronic commerce websites, or other software applications. Sun Cluster operates by having redundant computers, or nodes, which are then used to provide service if active nodes fail.

High-availability clusters are distinct from parallel computing clusters (designed to operate with many systems processing the same task) in that HA clusters are used only to provide more reliable services, which are restarted on another node if the node they are running on crashes or is unable to keep running them.

Sun Cluster implementations attempt to build redundancy into a cluster to eliminate single points of failure, including multiple network connections and data storage that is multiply connected via storage area networks.

1.1.2.2. HP Serviceguard

HP Serviceguard for Linux is a high availability (HA) clustering solution that leverages the strength of HP's experience in the HA business, bringing the best-in-class mission-critical HP-UX technologies to the Linux environment and to ProLiant and Integrity servers. The cluster kit includes high availability software that gives critical applications the availability that enterprise customers require for 24×7 business operations. It is designed to protect applications from a wide variety of software and hardware failures, monitoring the health of each server (node) and quickly responding to failures of system processes, system memory, LAN media and adapters, and application processes. Serviceguard for Linux enables customers to cluster HP ProLiant and Integrity server families with shared storage, from HP Modular Smart Array 500 to HP StorageWorks XP disk arrays, in 2 to 4-node SCSI or 2 to 16-node Fiber Channel configurations.

1.1.2.3. IBM's HACMP

For more than a decade, IBM High Availability Cluster Multiprocessing (HACMP) has provided solutions that ensure high availability, while they simplify the IT environment and give organizations the flexibility to respond to changing business needs. The latest incarnations of HACMP are the most effective yet – simple to manage, powerful, and reliable in environments with up to 32 HACMP servers per cluster. IBM HACMP:

• Significantly reduces planned and unplanned outages, allowing for cluster upgrades and system maintenance without interrupting operations.
• Offers multiple data replication and recovery methods to meet disaster management needs.
• Monitors, detects and reacts to software problems that are not severe enough to interrupt system operation, allowing the system to stay available during random, unexpected events.

IBM HACMP is simple to configure and flexible to maintain. It offers fast event processing to speed application recovery, and provides availability across multiple sites and all at an affordable cost. The result is business continuity for companies of any size, and an ideal solution for organizations with multiple sites, regional operations, or any other need for decentralization of data.

1.1.2.4. MC Service Guard

HP Serviceguard is specialized software for protecting mission-critical applications from a wide variety of hardware and software failures. With Serviceguard, multiple servers (nodes) and/or server partitions are organized into an enterprise cluster that delivers highly available application services to LAN-attached clients. HP Serviceguard monitors the health of each node and rapidly responds to failures in a way that minimizes or eliminates application downtime.

• Rapid automatic detection and recovery
• Ability to survive multiple node failures
• Fast fail back
• Rolling upgrades
• Integration with the HP Virtual Server Environment
• Multiple cluster configurations: active-active, active-standby, and rotating standby
• Mixed Serviceguard cluster support of HP 9000 and HP Integrity servers
• Support for multiple operating systems (HP-UX and Linux)
• Maintains application availability during hardware and software updates
• Ensures that service-level objectives (SLOs) are maintained during planned and unplanned downtime
• Enables quick and easy deployment of applications, including database toolkits (such as Oracle10g) and Internet toolkits (such as HP Apache and HP CIFS/9000)
• Supports heterogeneous, mission-critical environments

1.1.2.5. Red Hat Cluster Suite

Red Hat Cluster Suite was designed specifically for Red Hat Enterprise Linux systems. Companies requiring applications to be highly available, or wishing to improve the performance and availability of their network infrastructure, should consider using a Red Hat Cluster Suite configuration.

Red Hat Cluster Suite is supported for use with Red Hat Enterprise Linux AS and Red Hat Enterprise Linux ES on all platforms except mainframe, p-series and i-series, and it is delivered on an annual subscription basis with one year of support services.

For applications that require maximum uptime, a Red Hat Enterprise Linux cluster with Red Hat Cluster Suite is the answer. Specifically designed for Red Hat Enterprise Linux, Red Hat Cluster Suite provides two distinct types of clustering:

• Application/Service Failover: create n-node server clusters for failover of key applications and services.
• IP Load Balancing: load balance incoming IP network requests across a farm of servers.

With Red Hat Cluster Suite, applications can be deployed in high availability configurations so that they are always operational, bringing "scale-out" capabilities to enterprise Linux deployments.

• Support for up to 16 nodes: allows high availability to be provided for multiple applications simultaneously.
• NFS/CIFS failover: supports highly available file serving in UNIX and Windows environments.
• SCSI and Fiber Channel support: configurations can be deployed using the latest SCSI and Fiber Channel technology. Multi-terabyte configurations can readily be made highly available.
• Service failover: Red Hat Cluster Suite not only ensures that hardware shutdowns or failures are detected and recovered from automatically, but also monitors applications to ensure they are running correctly and restarts them automatically if they fail.

Red Hat Cluster Suite is supported for use with Red Hat Enterprise Linux AS and Red Hat Enterprise Linux ES on x86, AMD64/EM64T and Itanium.

1.1.2.6. Veritas Cluster Server (VCS)

VERITAS Cluster Server, the industry's leading open systems clustering solution, is ideal for reducing planned and unplanned downtime, facilitating server consolidation, and effectively managing a wide range of applications in heterogeneous environments. With support for clusters of up to 32 nodes in a SAN, VERITAS Cluster Server features the power and flexibility to protect everything from a single critical database instance to the largest, globally dispersed, multi-application clusters. Increasing automation, providing features to test production disaster recovery plans without disruption, and offering intelligent workload management allow cluster administrators to maximize resources by moving beyond reactive recovery to proactive management of application availability.

Veritas Cluster Server (also known as VCS) is high-availability cluster software for UNIX, Linux and Microsoft Windows computer systems, created by Veritas Software (now part of Symantec). It provides application cluster capabilities to systems running databases, file sharing on a network, electronic commerce websites or other applications.

High-availability clusters (HAC) improve the availability of applications by failing them over or switching them over within a group of systems, as opposed to high-performance clusters, which improve the performance of applications by allowing them to run on multiple systems simultaneously.

Most Veritas Cluster Server implementations attempt to build availability into a cluster, eliminating single points of failure by making use of redundant components such as multiple network cards and storage area networks, in addition to the use of VCS.

VCS is user-level clustering software; all VCS processes are normal system processes on the systems it operates on, and they have no special access to the operating system or kernel functions of the host systems.

1.1.2.7. Oracle Real Application Clusters (RAC)

Real Application Clusters is one of the revolutionary features in Oracle 9i Database. Real Application Clusters is a breakthrough technology that provides many advantages for Online Transaction Processing (OLTP), Decision Support System (DSS), Online Analytic Processing (OLAP) and hybrid system types. With the increased functionality of Real Application Clusters, the above-mentioned systems can take effective advantage of cluster environments.

Other benefits of using Real Application Clusters are high performance, increased throughput, and high availability of the database for 24x7x365 environments. But before using Real Application Clusters, we must first understand how Real Application Clusters works, what resources it requires, and how to deploy it effectively.

Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all your business applications. Oracle RAC provides the foundation for enterprise grid computing.

1.1.3. Components of a Cluster

There are four primary components of any cluster: cluster nodes, a cluster manager, shared data storage, and networks.

1.1.3.1. Cluster Nodes

The cluster nodes are the systems that provide the processing resource. Cluster nodes do the actual work of the cluster. Generally, they must be configured to take part in the cluster. They must also run the application software that is to be clustered. Depending upon the type of cluster, this application software may either be specially created to run on a cluster, or it may be standard software designed for a stand-alone system.

1.1.3.2. Cluster Manager

The cluster manager divides the work among all the nodes. In most clusters, there is only one cluster manager. Some clusters are completely symmetric and do not have any cluster manager, but these are more rare today. They require complex arbitration algorithms and are more difficult to set up.

Note that a cluster manager may also work as a cluster node. Just because a system is dividing the work does not mean that it cannot do any of the work itself. However, larger clusters tend to dedicate one or more machines to the role of cluster manager, because the task of dividing the work may take more computational power. It also makes it a bit easier to manage the cluster if the two roles are isolated.

1.1.3.3. Shared Data Storage

1.1.3.3.1. Storage Area Networks (SAN)

A Storage Area Network (SAN) is a highly fault tolerant, distributed network in itself dedicated to the purpose of providing absolutely reliable data serving operations. Conceptually, a SAN is a layer which sits between application servers and the physical storage devices, which themselves may be NAS devices, database servers, traditional file servers, or near-line and archival storage devices. The software associated with the SAN makes all this back-end storage transparently available and provides centralized administration for it.

The main distinguishing feature of a SAN is that it runs as an entirely separate network, usually employing a proprietary or storage-based networking technology. Most SANs these days are moving towards the use of fiber-channel. It should be clear that implementing a SAN is a non-trivial undertaking. Administering a SAN will likely require dedicated support personnel. Therefore SANs will most likely only be found in large enterprise environments.

1.1.3.3.2. Network Attached Storage (NAS)

A NAS device is basically an old fashioned file server turned into a closed system. Every last clock cycle in a NAS device is dedicated to pumping data back and forth from disk to network. This can be very useful in freeing up application servers (such as mail servers, web servers, or database servers) from the overhead associated with file operations.

Another way to think of a NAS device is as a hard drive with an Ethernet card and some file serving software thrown on. The advantage of a NAS box over a file server is that the NAS device is self-contained and needs less administration. Another key aspect is that a NAS box should be platform independent. As an all-purpose storage device, a NAS box should be able to transparently serve Windows and UNIX clients alike.

1.1.3.4. Networks

The nodes of a cluster must be connected by two or more local area networks (LANs); at least two networks are required to prevent a single point of failure. A server cluster whose nodes are connected by only one network is not a supported configuration. The adapters, cables, hubs, and switches for each network must fail independently. This usually implies that the components of any two networks must be physically independent.

Before you install the Cluster service, you must configure both nodes to use the TCP/IP protocol over all interconnects. Each network adapter must have an assigned IP address that is on the same network as the corresponding network adapter on the other node. Therefore, there can be no routers between two cluster nodes. However, routers can be placed between the cluster and its clients. If all interconnects must run through a hub, use separate hubs to isolate each interconnect.

• Private network. The private network is used only for node-to-node cluster interconnect communication. It carries the heartbeat signals and the data transfers between cluster nodes.

• Public network. A public network provides client systems with access to cluster application services. IP Address resources are created on networks that provide clients with access to cluster services.
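
The addressing rules above (matching adapters on the same network across nodes, with the private and public networks kept apart) can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module. The node names, the addresses, and the helper names same_subnet and check_networks are illustrative assumptions, not values taken from any real cluster.

```python
import ipaddress

def same_subnet(addr_a, addr_b):
    """Return True if two interface addresses (CIDR form) are on one network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b

def check_networks(layout):
    """Return a list of problems found in a {node: {role: address}} layout."""
    problems = []
    names = list(layout)
    reference = layout[names[0]]
    # Each adapter must share a network with the matching adapter on other nodes.
    for name in names[1:]:
        for role in ("public", "private"):
            if not same_subnet(reference[role], layout[name][role]):
                problems.append(
                    f"{role} adapters of {names[0]} and {name} are on different networks"
                )
    # The public and private networks must not be the same network.
    for name, adapters in layout.items():
        if same_subnet(adapters["public"], adapters["private"]):
            problems.append(f"{name}: public and private adapters share a network")
    return problems

# Hypothetical two-node layout: one public and one private adapter per node.
nodes = {
    "node1": {"public": "192.168.1.11/24", "private": "10.0.0.1/24"},
    "node2": {"public": "192.168.1.12/24", "private": "10.0.0.2/24"},
}
print(check_networks(nodes) or "layout looks consistent")
```

A check like this only validates addressing. It says nothing about whether the adapters, cables, and switches of the two networks are physically independent, which still has to be verified by hand.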

1.2. Benefits of Real Application Clusters

1.2.1. High Availability

Oracle Real Application Clusters 10g and later versions provide the infrastructure for a data center high availability architecture and underpin the best practices for a highly available data management solution. Oracle Real Application Clusters delivers the main characteristics of a high availability solution: reliability, recoverability, error detection, and continuous operations.


1.2.2. Reliability

Oracle Database is known for its reliability. Real Application Clusters takes this a step further by removing the database server as a single point of failure. If an instance fails, the remaining instances in the cluster are open and active.

1.2.3. Recoverability

Oracle Database includes many features that make it easy to recover from all types of failures. If an instance fails in a RAC database, it is recognized by another instance in the cluster and recovery automatically takes place. Fast Application Notification, Fast Connection Failover and Transparent Application Failover make it easy for applications to mask component failures from the user.

1.2.4. Error Detection

Oracle Clusterware automatically monitors RAC databases and provides fast detection of problems in the environment. It also recovers from failures automatically, often before anyone has noticed that a failure has occurred. Fast Application Notification provides the ability for applications to receive immediate notification of cluster component failures and mask the failure from the user by resubmitting the transaction to a surviving node in the cluster.

1.2.5. Continuous Operations

Real Application Clusters provides continuous service for both planned and unplanned outages. If a node (or instance) fails, the database remains open and the application is able to access data. Most database maintenance operations can be completed without downtime and are transparent to the user. Many other maintenance tasks can be done in a rolling fashion so that application downtime is minimized or removed. Fast Application Notification and Fast Connection Failover assist applications in meeting service levels and masking component failures in the cluster.

1.2.6. Scalability

Oracle Real Application Clusters provides unique technology for scaling applications. Traditionally, when the database server ran out of capacity, it was replaced with a new, larger server. As servers grow in capacity, they become more expensive. For databases using RAC, there are alternatives for increasing the capacity. Applications that have traditionally run on large SMP servers can be migrated to run on clusters of small servers. Alternatively, you can maintain the investment in the current hardware and add a new server to the cluster (or create a cluster) to increase the capacity. Adding servers to a cluster with Oracle Clusterware and RAC does not require an outage, and as soon as the new instance is started, the application can take advantage of the extra capacity. All servers in the cluster must run the same operating system and the same version of Oracle, but they do not have to have exactly the same capacity. Customers today run clusters that fit their needs, ranging from clusters of 2-CPU commodity servers to clusters in which each server has 32 or 64 CPUs.

Oracle Real Application Clusters architecture automatically accommodates rapidly changing business requirements and the resulting workload changes. Application users, or mid tier application server clients, connect to the database by way of a service name. Oracle automatically balances the user load among the multiple nodes in the cluster. The Real Application Clusters database instances on the different nodes subscribe to all or some subset of database services. This provides DBAs the flexibility of choosing whether specific application clients that connect to a particular database service can connect to some or all of the database nodes. Administrators can painlessly add processing capacity as application requirements grow. The Cache Fusion architecture of RAC immediately utilizes the CPU and memory resources of the new node. DBAs do not need to manually re-partition data.

Another way of distributing workload in an Oracle database is through the Oracle Database’s parallel execution feature. Parallel execution (i.e. parallel query or parallel DML) divides the work of executing a SQL statement across multiple processes. In an Oracle Real Application Clusters environment, these processes can be balanced across multiple instances. Oracle’s cost-based optimizer incorporates parallel execution considerations as a fundamental component in arriving at optimal execution plans. In a Real Application Clusters environment, intelligent decisions are made with regard to intra-node and inter-node parallelism. For example, if a particular query requires six query processes to complete the work and six CPUs are idle on the local node (the node that the user connected to), then the query is processed using only local resources. This demonstrates efficient intra-node parallelism and eliminates the query coordination overhead across multiple nodes. However, if only two CPUs are available on the local node, then those two CPUs and four CPUs of another node are used to process the query. In this manner, both inter-node and intra-node parallelism are used to speed up query operations.
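
The local-first placement in the six-process example above can be sketched in a few lines. This is only an illustration of the idea, assuming a simple "use idle local CPUs first, then spill to other nodes" rule. The function name allocate_query_processes and the node names are hypothetical, and the real optimizer weighs many more factors.

```python
def allocate_query_processes(required, idle_cpus, local_node):
    """Place `required` query processes on idle CPUs, local node first.

    idle_cpus maps node name -> number of idle CPUs.
    Returns a dict of node name -> processes placed there. If the cluster
    has too few idle CPUs, fewer than `required` processes are placed.
    """
    allocation = {}
    remaining = required
    # Visit the local node first, then the remaining nodes in name order.
    order = [local_node] + sorted(n for n in idle_cpus if n != local_node)
    for node in order:
        if remaining == 0:
            break
        take = min(idle_cpus.get(node, 0), remaining)
        if take > 0:
            allocation[node] = take
            remaining -= take
    return allocation

# Six idle CPUs locally: the query stays on the local node.
print(allocate_query_processes(6, {"node1": 6, "node2": 8}, "node1"))
# Only two idle CPUs locally: four processes spill over to another node.
print(allocate_query_processes(6, {"node1": 2, "node2": 8}, "node1"))
```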


2. RAC Architecture

2.1. RAC Architecture

A RAC database is a clustered database. A cluster is a group of independent servers that cooperate as a single system. Clusters provide improved fault resilience and modular incremental system growth over single symmetric multiprocessor (SMP) systems. In the event of a system failure, clustering ensures high availability to users, and access to mission-critical data is not lost. Redundant hardware components, such as additional nodes, interconnects, and disks, allow the cluster to provide high availability. Such redundant hardware architectures avoid single points of failure and provide exceptional fault resilience.

With RAC, we de-couple the Oracle instance (the processes and memory structures running on a server to allow access to the data) from the Oracle database (the physical structures residing on storage which actually hold the data, commonly known as datafiles). A clustered database is a single database that can be accessed by multiple instances. Each instance runs on a separate server in the cluster. When additional resources are required, additional nodes and instances can easily be added to the cluster with no downtime. Once the new instances are started, applications using services can immediately take advantage of them with no changes to the application or the application server.

A RAC database is a logically or physically shared everything database. All data files, control files, parameter files, and redo log files in RAC environments must reside on cluster-aware shared disks so that all of the cluster database instances can access them. All of the instances must also share the same interconnect. In addition, RAC databases can share the same interconnect that the Oracle Clusterware uses. Because a RAC database uses a shared everything architecture, RAC requires cluster-aware storage for all database files. It is your choice as to how to configure your disk, but you must use a supported cluster-aware storage solution. Oracle Database 10g and higher versions provide Automatic Storage Management (ASM), which is the recommended solution to manage your disk. Alternatively, you may use a cluster-aware volume manager or a cluster file system, although neither is required. In RAC, the Oracle Database software manages disk access, and the Oracle software is certified for use on a variety of storage architectures. A RAC database can have up to 100 instances.

Depending on your platform, you can use the following file storage options for RAC:

• ASM, which Oracle recommends

• Oracle Cluster File System (OCFS), which is available for Linux and Windows platforms, or a third-party cluster file system that is certified for RAC

• A network file system

• Raw devices

RAC databases differ architecturally from single-instance Oracle databases in that each RAC database instance also has:

• At least one additional thread of redo for each instance

• An instance-specific undo tablespace
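
Because every instance needs its own redo thread and undo tablespace, the per-instance settings follow a regular pattern. The sketch below generates parameter-file style entries in the sid.parameter=value form. The parameter names instance_number, thread, and undo_tablespace are standard Oracle initialization parameters, but the database name RACDB, the UNDOTBSn tablespace names, and the helper name instance_parameters are assumptions made for illustration.

```python
def instance_parameters(db_name, instance_count):
    """Build per-instance parameter entries for a hypothetical RAC database.

    Each instance gets its own instance number, redo thread, and undo
    tablespace. Naming (SID = db_name + n, UNDOTBSn) is an assumed convention.
    """
    lines = []
    for n in range(1, instance_count + 1):
        sid = f"{db_name}{n}"
        lines.append(f"{sid}.instance_number={n}")
        lines.append(f"{sid}.thread={n}")
        lines.append(f"{sid}.undo_tablespace='UNDOTBS{n}'")
    return lines

for line in instance_parameters("RACDB", 2):
    print(line)
```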

All nodes in a RAC environment must connect to a Local Area Network (LAN) to enable users and applications to access the database. Applications should use the Oracle Database services feature to connect to an Oracle database. Services enable you to define rules and characteristics to control how users and applications connect to database instances. These characteristics include a unique name, workload balancing and failover options, and high availability characteristics. Oracle Net Services enables the load balancing of application connections across all of the instances in a RAC database.

RAC databases have two or more database instances that each contains memory structures and background processes. A RAC database has the same processes and memory structures as a single-instance Oracle database as well as additional process and memory structures that are specific to RAC. Any one instance’s database view is nearly identical to any other instance’s view within the same RAC database; the view is a single system image of the environment.

Each instance has a buffer cache in its System Global Area (SGA). Using Cache Fusion, RAC environments logically combine each instance’s buffer cache to enable the instances to process data as if the data resided on a logically combined, single cache.

To ensure that each RAC database instance obtains the block that it needs to satisfy a query or transaction, RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the status of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances, which effectively increases the size of the System Global Area for a RAC instance.

After one instance caches data, any other instance within the same cluster database can acquire a block image from another instance in the same database faster than by reading the block from disk. Therefore, Cache Fusion moves current blocks between instances rather than re-reading the blocks from disk. When a consistent block is needed or a changed block is required on another instance, Cache Fusion transfers the block image directly between the affected instances. RAC uses the private interconnect for inter-instance communication and block transfers. The Global Enqueue Service Monitor and the Instance Enqueue Process manage access to Cache Fusion resources as well as enqueue recovery processing.
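
The idea behind Cache Fusion, serving a block from another instance's cache instead of re-reading it from disk, can be illustrated with a toy model. This is a deliberately simplified sketch and not how Oracle implements the GCS or GRD: the class ToyCluster, its single-holder directory, and the instance names are all assumptions, and locking, block modes, and cache eviction are ignored.

```python
class ToyCluster:
    """Toy model: per-instance buffer caches plus a directory of block holders."""

    def __init__(self, instances, disk):
        self.caches = {name: {} for name in instances}
        self.disk = disk                  # block id -> contents on shared storage
        self.directory = {}               # block id -> an instance caching it
        self.disk_reads = 0
        self.interconnect_transfers = 0

    def read_block(self, instance, block_id):
        cache = self.caches[instance]
        if block_id in cache:
            return cache[block_id]        # already in the local buffer cache
        holder = self.directory.get(block_id)
        if holder is not None:
            # Another instance caches the block: ship it over the interconnect.
            data = self.caches[holder][block_id]
            self.interconnect_transfers += 1
        else:
            # Nobody caches it yet: fall back to a read from shared disk.
            data = self.disk[block_id]
            self.disk_reads += 1
        cache[block_id] = data
        self.directory[block_id] = instance
        return data

cluster = ToyCluster(["rac1", "rac2"], disk={42: "row data"})
cluster.read_block("rac1", 42)   # first access: read from disk
cluster.read_block("rac2", 42)   # second instance: served from rac1's cache
print(cluster.disk_reads, cluster.interconnect_transfers)
```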

2.1.1. Background processes in RAC

These RAC-specific processes and the GRD collaborate to enable Cache Fusion. The RAC-specific processes and their identifiers are as follows:

a. LMSn - Global Cache Service Process

b. LMD - Global Enqueue Service Daemon

c. LMON - Global Enqueue Service Monitor

d. LCKn - Instance Enqueue Process

e. DIAG - Diagnosability Daemon

Where n is between 1 and 36.

A RAC implementation comprises two or more nodes (instances) accessing a common shared database (i.e., one database is mounted and opened by multiple instances concurrently). In this case, each instance has all the background processes used in a stand-alone configuration, plus the additional background processes required specifically for RAC. Each instance has its own SGA, as well as several background processes, and runs on a separate node having its own CPU and physical memory. Keeping the configurations of all the nodes identical makes maintenance easier.

The RAC background processes are described below.

LMS

Global Cache Service processes (LMSn) are processes that, when invoked by Oracle, copy blocks directly from the holding instance's buffer cache and send a read-consistent copy of the block to the foreground process on the requesting instance. LMS also performs a rollback on any uncommitted transactions for any blocks that are being requested for consistent read by the remote instance.

The number of LMS processes running is driven by the parameter GCS_SERVER_PROCESSES. Oracle supports up to 36 LMS processes (0–9 and a–z). If the parameter is not defined, Oracle will start two LMS processes, which is the default value of GCS_SERVER_PROCESSES.
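
The 36-process limit follows directly from the suffix alphabet: ten digits plus twenty-six lowercase letters. As a small illustration, the sketch below enumerates the suffixes for a given process count. The helper name lms_process_names and the "LMS" + suffix label format are assumptions for display purposes and may not match the exact process names shown by the operating system.

```python
import string

# Suffix alphabet described in the text: 0-9 followed by a-z (36 in total).
LMS_SUFFIXES = string.digits + string.ascii_lowercase

def lms_process_names(count):
    """Return illustrative labels for `count` LMS processes (1 to 36)."""
    if not 1 <= count <= len(LMS_SUFFIXES):
        raise ValueError(f"count must be between 1 and {len(LMS_SUFFIXES)}")
    return [f"LMS{suffix}" for suffix in LMS_SUFFIXES[:count]]

print(lms_process_names(2))
print(len(LMS_SUFFIXES))
```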


LMON

The Global Enqueue Service Monitor (LMON) is a background process that monitors the entire cluster to manage global resources. By constantly probing the other instances, it checks and manages instance deaths and the associated recovery for Global Cache Services (GCS). When a node joins or leaves the cluster, it handles the reconfiguration of locks and resources. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as cluster group services (CGS).

LMD

The Global Enqueue Service Daemon (LMD) is a background agent process that manages requests for resources and controls access to blocks and global enqueues.

The LMD process also handles global deadlock detection and remote resource requests (remote resource requests are requests originating from another instance).

LCK

The Lock process (LCK) manages noncache fusion resource requests such as library, row cache, and lock requests that are local to the server. LCK manages instance resource requests and cross-instance call operations for shared resources. It builds a list of invalid lock elements and validates lock elements during recovery. Because the LMS process handles the primary function of lock management, only a single LCK process exists in each instance.

DIAG

The Diagnosability Daemon (DIAG) background process monitors the health of the instance and captures diagnostic data regarding process failures within instances. The operation of this daemon is automated, and it updates the alert log file to record the activity it performs.

2.1.2. Oracle Clusterware

Starting with Oracle Database 10g, Oracle provides Oracle Clusterware, a portable clusterware solution that is integrated with and designed specifically for Oracle Database. You no longer have to purchase third-party clusterware in order to have a RAC database. Oracle Clusterware is integrated with the Oracle Universal Installer, with which the Oracle DBA is already familiar. Support is made easier because there is one support organization to deal with for both the clusterware and the cluster database. You can choose to run Oracle RAC with selected third-party clusterware, and Oracle works with certified third-party clusterware; however, Oracle Clusterware must still manage all RAC databases.

Oracle clusterware monitors and manages Real Application Cluster databases. When a node in the cluster is started, all instances, listeners and services are automatically started. If an instance fails, the clusterware will automatically restart the instance so the service is often restored before the administrator notices it was down.

With Oracle Database 10g Release 2, Oracle provides a High Availability API so that non-Oracle processes can be put under the control of the high availability framework within Oracle Clusterware. When registering the process with Oracle Clusterware, you provide information on how to start, stop, and monitor the process. You can also specify whether the process should be relocated to another node in the cluster when the node it is executing on fails.
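
The registration model described above (start, stop, and monitor actions, plus a relocation policy) can be sketched generically. The code below is not the Oracle High Availability API. ManagedResource, monitor_once, and the node names are hypothetical, and the sketch only illustrates the restart-or-relocate decision a high availability framework has to make.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ManagedResource:
    """A process registered with a (hypothetical) high availability framework."""
    name: str
    start: Callable[[str], None]     # start the process on the given node
    stop: Callable[[str], None]      # stop the process on the given node
    check: Callable[[str], bool]     # True if the process is healthy on the node
    relocate_on_node_failure: bool = True
    node: str = ""

def monitor_once(resource, live_nodes):
    """Run one monitoring pass and return the action that was taken."""
    if resource.node not in live_nodes:
        # The hosting node is gone: relocate if the policy allows it.
        if resource.relocate_on_node_failure and live_nodes:
            resource.node = sorted(live_nodes)[0]
            resource.start(resource.node)
            return "relocated"
        return "offline"
    if not resource.check(resource.node):
        # The node is alive but the process is not: restart it in place.
        resource.start(resource.node)
        return "restarted"
    return "ok"

# Usage with an in-memory stand-in for real start/stop/check scripts.
running = {}
app = ManagedResource(
    name="app",
    start=lambda node: running.update({node: True}),
    stop=lambda node: running.pop(node, None),
    check=lambda node: running.get(node, False),
    node="node1",
)
print(monitor_once(app, {"node1", "node2"}))  # process not yet running
print(monitor_once(app, {"node2"}))           # node1 has failed
```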

Prior to Oracle 11.2, the order of installation was therefore Oracle Clusterware, ASM (if required), and finally, the RDBMS software. In Oracle 11.2, Oracle Clusterware and ASM have been merged together into a single Oracle home known as the Grid Infrastructure home. The RDBMS software is still installed into a separate Oracle home. In Oracle 11.2, the order of installation is therefore Oracle Grid Infrastructure (including Oracle Clusterware and ASM), followed by the RDBMS software. It is no longer possible to install Oracle Clusterware and ASM separately. This change has particular relevance in single-instance environments, which now require a special single-node Grid Infrastructure installation that includes a cut-down version of Oracle Clusterware in addition to ASM.

2.1.3. Hardware Architecture

Oracle RAC is a shared everything architecture. All servers in the cluster must share all storage used for a RAC database. The type of disk storage used can be network attached storage (NAS), Storage area network (SAN), or SCSI disk. Your storage choice is dictated by the server hardware choice and what your hardware vendor supports. The key to choosing your storage is choosing a storage system that will provide scalable I/O for your application, an I/O system that will scale as additional servers are added to the cluster.

In addition to the Local Area Network (LAN) that a database server is attached to for application connections, a cluster requires a second, private network, commonly known as the interconnect. Oracle recommends that you use two network interfaces for this network for high availability purposes. Network interface bonding external to Oracle should be used to provide failover and load balancing. The interconnect is used by the cluster for inter-node messaging. It is also used by RAC to implement the Cache Fusion technology. Oracle recommends the use of UDP over GigE for the cluster interconnect. The use of crossover cables as the interconnect is not supported for a production RAC database.

The cluster is made up of one or more servers, each of which has a LAN connection and an interconnect connection and must be connected to the shared storage. With Oracle Database 10g Release 2, Oracle Clusterware and RAC support up to 100 nodes in the cluster, whereas Oracle Database 11g Release 1 onwards supports more than 100 nodes per cluster. The servers in the cluster do not have to be exactly the same, but they must run the same operating system and the same version of Oracle. All servers must support the same architecture, e.g. all 32-bit or all 64-bit.

2.1.4. File Systems

Since RAC is a shared everything architecture, the volume management and file system used must be cluster-aware. Oracle recommends the use of Automatic Storage Management (ASM), a feature included with Oracle Database 10g to automate the management of storage for the database. ASM provides the performance of async I/O with the easy management of a file system. ASM distributes the I/O load across all available resources to optimize performance while removing the need for manual I/O tuning.

Alternatively, Oracle supports the use of raw devices and some cluster file systems, such as Oracle Cluster File System (OCFS), which is available on Windows and Linux, and Oracle ASM Cluster File System (ACFS), a POSIX-compliant general-purpose file system supported from 11gR2 onwards.

2.1.5. Virtual internet Protocol Address (VIP)

Up to Oracle 11g Release 1, Real Application Clusters requires one virtual IP address for each server in the cluster. From 11gR2 onwards, three more VIPs are recommended for the SCAN (Single Client Access Name), which is covered in depth later. The virtual IP address is an unused IP address on the same subnet as the LAN. Applications use this address to connect to the RAC database. If a node fails, its virtual IP fails over to another node in the cluster to provide an immediate node-down response to connection requests. This increases availability for applications, because they no longer have to wait for network timeouts before a connection request fails over to another instance in the cluster.

Users can access a RAC database using a client-server configuration or through one or more middle tiers, with or without connection pooling. Users can be DBAs, developers, application users, power users (such as data miners who create their own searches), and so on. Most public networks use TCP/IP, but you can use any supported hardware and software combination. RAC database instances can be accessed through a database’s defined, default IP address and through VIP addresses. In addition to the node’s host name and IP address, we must also assign a virtual host name and an IP address to each node. The virtual host name or VIP should be used to connect to the database instance. For example, you might enter the virtual host name rac1-vip in the address list of the tnsnames.ora file.

A virtual IP address is an alternate public address that client connections use instead of the standard public IP address. To configure VIP addresses, you need to reserve a spare IP address for each node, plus the three recommended SCAN VIPs per cluster, all on the same subnet as the public network. If a node fails, then the node’s VIP, along with the corresponding SCAN VIP, fails over to another node, on which the VIP cannot accept connections. Clients that attempt to connect to the VIP therefore receive a rapid "connection refused" error instead of waiting for TCP connect timeout messages.
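
The benefit of a fast "connection refused" response is easiest to see from the client side: the client can move to the next address in its list at once instead of waiting for a timeout. The sketch below shows that loop in plain Python sockets. It is not Oracle Net code. The helper name connect_first_available, the host name rac2-vip, and the port 1521 are assumptions, and the connect function is injectable so the logic can be exercised without a network.

```python
import socket

def connect_first_available(addresses, connect=socket.create_connection, timeout=5.0):
    """Try each (host, port) in order and return (address, connection).

    A refused connection raises immediately, so the loop moves on to the
    next address without waiting for `timeout` to expire.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address, timeout)
        except OSError as exc:   # includes ConnectionRefusedError and timeouts
            errors.append((address, exc))
    raise ConnectionError(f"no address accepted the connection: {errors}")

# Hypothetical address list, in the spirit of a tnsnames.ora address list.
vip_addresses = [("rac1-vip", 1521), ("rac2-vip", 1521)]

def fake_connect(address, timeout):
    """Stand-in for a real connect: rac1-vip has failed over and refuses."""
    if address[0] == "rac1-vip":
        raise ConnectionRefusedError("connection refused")
    return f"connection to {address[0]}"

print(connect_first_available(vip_addresses, connect=fake_connect))
```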

2.2. 10g New Features for RAC Administration

High Availability, Workload Management, and Services

Oracle RAC introduces integrated clusterware known as Cluster Ready Services (CRS). You install CRS on all platforms on which you can run Oracle RAC software. CRS manages cluster database functions including node membership, group services, global resource management, and high availability.

In Oracle RAC, you can use services to define application workloads by creating a service for each application or for major components within complex applications. You can then define where and when the service runs and thus use services to control your workload.

In cluster and non-cluster environments, the Automatic Workload Repository (AWR) tracks performance metrics using services. You can also set thresholds on performance metrics to automatically generate alerts if these thresholds are exceeded.


2.2.1. Enhanced Cluster Manager Implementation

In earlier releases of the Oracle Database, cluster manager implementations on some platforms were referred to as "Cluster Manager". In Oracle Database 10g, Cluster Ready Services (CRS) serves as the clusterware software, and Cluster Synchronization Services (CSS) is the cluster manager software for all platforms. The Oracle Cluster Synchronization Service Daemon (OCSSD) performs some of the clusterware functions on UNIX-based systems. On Windows-based systems, OracleCSService and OracleCRService replace the Oracle Database OracleCMService9i.

2.2.2. Enterprise Manager Enhancements for RAC

This release includes the new Web-based Enterprise Manager Database Control with which you can manage a RAC database and Enterprise Manager Grid Control for administering multiple RAC databases. Administration of RAC databases is greatly simplified because of more simplified drill-down tasks and because Enterprise Manager displays cluster-wide performance information. This is available for both single-instance Oracle and RAC databases.

Enterprise Manager has several summary pages that show cluster database performance information at a glance; you no longer have to log in to each cluster database or display instance-specific pages to obtain a global view of cluster database performance.

• Enhancements for Flash Recovery Area and Automatic Disk-Based Backup and Recovery. A flash recovery area is an Automatic Storage Management (ASM) disk group, a file system, or a directory that serves as a default storage area for recovery files. RAC supports the Automatic Disk-Based Backup and Recovery feature that simplifies managing disk space and backup and recovery files.

• Database Configuration Assistant (DBCA) Enhancements. Use the DBCA to perform instance addition and deletion as well as database deletion.

• Database Upgrade Assistant (DBUA) Enhancements. Use the DBUA to upgrade from an earlier RAC version to Oracle Database 10g or higher with RAC. When you upgrade from a Primary/Secondary environment, the DBUA creates one service and assigns it to one instance as a preferred instance, and to the other instance as its available instance.

2.2.3. Server Control (srvctl) Enhancements

Enhancements to SRVCTL support the management of services and Automatic Storage Management (ASM) instances within RAC.

• Enhanced Recovery Parallelism on Multiple CPU Systems. The default for instance, crash, and media recovery is to operate in parallel mode on multiple-CPU systems.

• Revised Error Messages for High Availability and Management Tools in RAC. The high availability error messages have been enhanced for this release.

• Oracle Cluster Registry (OCR) Enhancements. The OCR contains configuration details for the cluster database and for high availability resources such as services, Virtual Internet Protocol (VIP) addresses, and so on.

• GCS_SERVER_PROCESSES Parameter. There is a new, static parameter to specify the number of server processes for an instance's Global Cache Service (GCS) for routing inter-instance traffic among RAC instances. The default number of GCS server processes is calculated based on system resources with a minimum of 2. You can set this parameter to different values on different instances.

2.2.4. Oracle Clusterware (10.2.0.1.0)

Oracle Clusterware, formerly known as Cluster Ready Services (CRS) is an integrated cluster management solution that enables you to link multiple servers so that they function as a single system or cluster. This simplifies the infrastructure required for a RAC database by providing cluster software that is integrated with the Oracle Database. In addition, while continuing to be required for RAC databases, the Oracle Clusterware is also available for use with single-instance databases and applications that you deploy on clusters.

2.2.5. Cluster Verification Utility (10.2.0.1.0)

The Cluster Verification Utility (CVU) verifies a wide range of cluster and RAC-specific components such as shared storage devices, networking configurations, system requirements, the Oracle Clusterware, groups, and users. You can use CVU for pre-installation checks as well as for post-installation checks of your cluster environment. You can also use CVU to verify your environment when performing administrative operations such as installation, storage management, node addition, and troubleshooting. The OUI runs CVU immediately after you successfully install the Oracle Clusterware.

2.2.6. Oracle Load Balancing Advisory (10.2.0.1.0)

Applications using a RAC database need to balance the workload across the cluster. To assist in the balancing of application workloads across designated resources, Oracle Database 10g Release 2 provides the load balancing advisory. This advisory monitors the current workload activity across the cluster. For each instance where a service is active, it provides a percentage value to indicate how much of the total workload should be sent to this instance, as well as a service quality flag. The feedback is provided as an entry in the Automatic Workload Repository, and a Fast Application Notification (FAN) event is published. To take advantage of the load balancing advisory, an application can use integrated clients, use clients that support the Runtime Connection Load Balancing feature, or subscribe directly to the FAN event.
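
One way a client could act on the advisory's percentage values is to treat them as weights when choosing an instance for the next request. The sketch below assumes exactly that and nothing more. The function name pick_instance, the instance names, and the percentages are invented for illustration, and the service quality flag is ignored.

```python
import random

def pick_instance(advice, rng=random):
    """Choose an instance using advisory percentages as selection weights.

    advice maps instance name -> percentage of the workload the advisory
    suggests sending there. Instances with a weight of 0 are never chosen.
    """
    candidates = [(name, pct) for name, pct in advice.items() if pct > 0]
    if not candidates:
        raise ValueError("no instance is advertised as able to take work")
    names, weights = zip(*candidates)
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical advisory: rac1 should receive about three quarters of the work.
advice = {"rac1": 75, "rac2": 25}
rng = random.Random(0)   # seeded only to make the demonstration repeatable
picks = [pick_instance(advice, rng) for _ in range(10000)]
print(picks.count("rac1") / len(picks))
```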

2.2.7. Oracle RAC Runtime Connection Load Balancing using JDBC and ODP.NET

Oracle supports Runtime Connection Load Balancing to balance work requests across all instances of a RAC database using service level information to select connections from a connection pool. The Oracle Database 10g client enables you to use Runtime Connection Load Balancing when using Java Database Connectivity (JDBC) or ODP.NET connection pools. Runtime Connection Load Balancing balances work requests across instances based on a service’s real-time information. The connection cache manager uses RAC workload metrics and the load balancing policies to select the optimal instance to process a connection request. This results in efficient database resource usage with a balanced and dynamic distribution of the workload among RAC instances based on workload metrics and distribution policy.
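
The selection step described above can be pictured as a weighted choice over instances. The following is only a minimal sketch in Python, not Oracle's connection cache manager: the instance names, the percentages in ADVISORY, and the functions pick_instance and distribute are all hypothetical, and the real feature also applies the configured load balancing policy.

```python
import random
from collections import Counter

# Hypothetical advisory feedback: the percentage of new work that each
# instance should receive (the values are made up for illustration).
ADVISORY = {"rac1": 60, "rac2": 30, "rac3": 10}


def pick_instance(advisory, rng):
    """Choose an instance with probability proportional to its percentage."""
    instances = list(advisory)
    weights = [advisory[name] for name in instances]
    return rng.choices(instances, weights=weights, k=1)[0]


def distribute(advisory, requests, seed=0):
    """Route `requests` work requests and count how many each instance gets."""
    rng = random.Random(seed)  # seeded so that repeated runs are comparable
    return Counter(pick_instance(advisory, rng) for _ in range(requests))


if __name__ == "__main__":
    print(distribute(ADVISORY, 10_000))
```

Over many requests the counts approach the advisory percentages, which is the balanced distribution the paragraph refers to. An instance with a percentage of zero receives no new work in this sketch.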

2.2.8. Oracle Fast Connection Failover (FCF) (10.2.0.1.0)

You can use FCF with JDBC, OCI, and ODP.NET to recover sessions when UP or DOWN events are published to clients. In the case of a DOWN event, Oracle cleans up any sessions in the connection pool that go to the instance that stopped. For UP events, Oracle creates new connections to the recently started instance. Clients can use any of the three connection protocols to accept the published event information, re-create sessions, and initiate failover. In addition, your chosen connection protocol (JDBC, OCI, or ODP.NET) reacts to the throughput information that Runtime Connection Load Balancing publishes in order to choose the most appropriate connection.
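
The UP and DOWN handling described above can be sketched with a toy connection pool. This is an illustration under stated assumptions, not the JDBC, OCI, or ODP.NET implementation: the classes Connection and ConnectionPool, the on_event method, and the per_instance setting are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    instance: str  # name of the RAC instance this connection points at


@dataclass
class ConnectionPool:
    """Toy pool; names and sizes are illustrative, not Oracle's API."""
    per_instance: int = 2
    connections: list = field(default_factory=list)

    def on_event(self, status, instance):
        """React to a FAN-style event whose status is "UP" or "DOWN"."""
        if status == "DOWN":
            # Discard every pooled connection to the stopped instance.
            self.connections = [
                c for c in self.connections if c.instance != instance
            ]
        elif status == "UP":
            # Open fresh connections to the newly started instance.
            self.connections += [
                Connection(instance) for _ in range(self.per_instance)
            ]
        else:
            raise ValueError(f"unknown event status: {status!r}")

    def instances(self):
        """Sorted names of the instances the pool currently reaches."""
        return sorted({c.instance for c in self.connections})


if __name__ == "__main__":
    pool = ConnectionPool()
    pool.on_event("UP", "rac1")
    pool.on_event("UP", "rac2")
    pool.on_event("DOWN", "rac1")
    print(pool.instances())
```

After the DOWN event for rac1, only connections to rac2 remain, so no request is handed a connection to an instance that is no longer running.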

2.2.9. Transparent Data Encryption and RAC (10.2.0.1.0)

Transparent Data Encryption protects data that is stored in Oracle datafiles by preventing access to the data using means other than the normal database access mechanisms. This feature also provides secure storage and management of the encryption keys using a module that is external to the database. Thus, you can encrypt database columns and, at the same time, manage access to the encryption keys more effectively. Using Transparent Data Encryption in a RAC environment requires that all of the database instances have access to the same encryption keys. For this release, the only key storage mechanism that is supported is the Oracle Wallet. All of the RAC nodes must be able to access the wallet either through a shared disk or by way of a local copy. All other Transparent Data Encryption administration and usage requirements are the same as those for single-instance Oracle database deployments.

2.2.10. RAC Configuration Assistant Enhancements (10.2.0.1.0)

The Database Configuration Assistant (DBCA) and the Database Upgrade Assistant (DBUA) have been enhanced for this release as follows:

– DBCA Enhancements for Standalone ASM Configuration: When you create a RAC database that uses ASM, the DBCA creates the database in the same Oracle home that the ASM instance uses. If you create the database using a different home than the Oracle home that has ASM and if the ASM version is 10.2, then the DBCA automatically extends ASM from whichever Oracle home ASM is running in. However, if the ASM version is 10.1 and if ASM instances do not yet exist on all of the selected nodes, then the DBCA displays an error, prompting the user to either run the add node script or to upgrade ASM using the Database Upgrade Assistant (DBUA).

2.2.11. ASM Storage Consolidation

One ASM instance on a node can support both single-instance Oracle database instances and RAC instances running on that node.

2.2.12. Dynamic RMAN Channel Allocation for RAC Environments

In previous releases, to use RMAN's parallelism in RAC, you had to manually allocate an RMAN channel for each instance. You can now use the syntax CONFIGURE DEVICE TYPE device PARALLELISM n in RAC in the same way as in single-instance Oracle database environments. Dynamic channel allocation is only applicable where each node can access all of the datafiles, archived logs, and so on, in a RAC environment.

2.2.13. Failover Improvements for Distributed Transaction Processing (DTP) in RAC

Oracle DTP transaction environments should now use services to simplify management in a RAC environment. This feature automates the implementation of workarounds for using distributed transactions in RAC. This feature leverages the Oracle services framework so that failure detection, failover, and fail back are transparent to DBAs. In this release, DTP services automate the steps that are required to configure a RAC database to support distributed transactions in DTP environments. A DTP service will only be active on one instance in the cluster at a time. By creating multiple DTP services, with one or more DTP services enabled on each RAC instance, all tightly coupled branches of a global distributed transaction go to the same instance. In this way, you can leverage all of the instances of a RAC database to balance the distributed transaction load and thereby maximize application throughput.

For current and future client implementations, such as those for JDBC, you do not need to invoke the SYS.DBMS_SYSTEM.DIST_TXN_SYNC procedure because the OPS_FAILOVER flag is deprecated. Instead, the server manages the synchronization of in-doubt transaction information across the RAC instances for transaction recovery.
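
The DTP service rule (each service active on one instance at a time, with all branches of a global transaction using the same service) can be sketched as a simple mapping. The service names, instance names, and the functions route_branch and fail_over below are hypothetical; the sketch only illustrates the routing idea and does not reflect how Oracle Clusterware relocates services.

```python
# Hypothetical placement: each DTP service is active on exactly one instance.
SERVICE_LOCATION = {"xa_01": "rac1", "xa_02": "rac2", "xa_03": "rac3"}


def route_branch(service_location, service):
    """Every branch that connects through `service` reaches one instance."""
    return service_location[service]


def fail_over(service_location, failed_instance, survivor):
    """Return a new placement with the failed instance's services moved."""
    return {
        svc: (survivor if inst == failed_instance else inst)
        for svc, inst in service_location.items()
    }


if __name__ == "__main__":
    # Three branches of one global transaction, all using service xa_01.
    print({route_branch(SERVICE_LOCATION, "xa_01") for _ in range(3)})
    # After rac1 fails, xa_01 is still active on exactly one instance.
    print(fail_over(SERVICE_LOCATION, "rac1", "rac2"))
```

Because a service maps to a single instance, tightly coupled branches never split across instances, while distinct services spread the overall transaction load over the cluster.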

2.2.14. Multiple Oracle Clusterware Files

When you install the Oracle Clusterware, you can select the option of using multiple voting disks that reside on independent shared physical disks. This removes the requirement that the voting disk use redundant storage; now Oracle provides the redundancy and you do not need to use third party storage solutions to duplicate the voting disk. You can also select the option of mirroring your Oracle Cluster Registry (OCR). In addition, you can replace, repair, or remove an OCR if it fails, and you can perform this operation while the OCR is online. If you do not select the OCR mirroring option during the Oracle Clusterware installation, then you can mirror the OCR later.

2.2.15. Fast-Start Failover and Data Guard Environments

Fast-start failover, which is provided with the Oracle Data Guard broker, enables failovers to occur automatically when a RAC primary database becomes unavailable. This occurs without DBA intervention and with no loss of data. When fast-start failover is enabled, the broker determines if a failover is necessary and automatically initiates the failover to a pre-specified target RAC standby database instance. Fast-start failover will not occur in a RAC environment until all instances comprising a RAC primary database have failed. Moreover, after a failover completes, the broker can automatically reinstate the former primary database as a standby database in the new configuration.
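
The RAC-specific condition above (no failover until every primary instance has failed) can be written as a small predicate. This is a simplified sketch: the function name, its arguments, and the instance names are hypothetical, and the real broker evaluates further conditions that are not modeled here.

```python
def should_initiate_failover(instance_up, fast_start_enabled=True):
    """Return True only if fast-start failover is enabled and every
    instance of the RAC primary database is down.

    instance_up maps an instance name to True (running) or False (failed).
    """
    if not fast_start_enabled or not instance_up:
        return False
    return not any(instance_up.values())


if __name__ == "__main__":
    # One surviving instance keeps the primary available.
    print(should_initiate_failover({"rac1": False, "rac2": True}))
    # All instances have failed.
    print(should_initiate_failover({"rac1": False, "rac2": False}))
```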

2.3. 11gR2 New Features for RAC Administration

2.3.1. Oracle Real Application Clusters One Node (Oracle RAC One Node)

Oracle Database 11g release 2 (11.2) introduces a new option, Oracle Real Application Clusters One Node (Oracle RAC One Node). Oracle RAC One Node is a single instance of Oracle Real Application Clusters (Oracle RAC) that runs on one node in a cluster. This option adds to the flexibility that Oracle offers for database consolidation. You can consolidate many databases into one cluster with minimal overhead while also providing the high availability benefits
