Maximizing Oracle RAC Uptime
Ian Cookson, Markus Michalewicz
Oracle Real Application Clusters (RAC)
Product Management / Development
September 29, 2014
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
The System Lifecycle
Implementation
Operation
Monitoring
Diagnosis
Installation
Installation
Installation – System assumed for this presentation
• Server OS:
  – OL 6.5 UEK (other kernels are supported)
  – HUBs: 4GB+ memory recommended
    • One HUB at a time will host the GIMR database.
    • Only HUBs will host (Flex) ASM instances.
    • Leaf nodes can have less memory, depending on the use case.
    • The installer enforces the HUB minimum memory requirement.
[Diagram: five-node cluster. HUB nodes (Oracle GI): brazil, argentina, germany, with Oracle RAC labeled on two HUB nodes. Leaf nodes (Oracle GI): spain, italy.]
• Installation is an infrequent task
• It should be standardized
  – Follow: http://www.slideshare.net/MarkusMichalewicz/oracle-rac-12c-collaborate-best-practices-ioug-2014-version
  – and come to the Oracle RAC demo booth (3787)
• Tools to use:
  1. Linux: pre-install package
  2. Cluster Verification Utility (CVU)
  3. Oracle Universal Installer (OUI)
Installation
[root@germany ~]# uname -a
3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
# Get the pre-install package
[root@germany Desktop]# yum list oracle-*
oracle-rdbms-server-11gR2-preinstall.x86_64    1.0-7.el6    ol6_latest
oracle-rdbms-server-12cR1-preinstall.x86_64    1.0-8.el6    ol6_latest
• OUI provides a simple GUI for:
  • Installation and Configuration
  • Upgrades
• OUI calls cluvfy for:
  • Verification checks
  • Generating ‘fixup’ scripts
Implementation
• Implementation is a recurring task
  – Initial implementation
  – Change implementation(s) as required
• Implementation tasks are system-specific
• Tools to use:
  1. CVU
  2. OraChk
CVU
Cluster Verification Utility (CVU) – Introduction
• Purpose:
  – Verification of pre-install & post-install cluster setup
  – Run manually (command: cluvfy) or as part of the OUI
  – Available from OTN and included in Oracle Grid Infrastructure
  – Supports the Oracle RAC stack since version 10g Rel. 1
• What does it do?
  – Runs specified verification tests and optionally generates a ‘fixup’ script (to be run as root)
What does CVU Check?
• System requirements
  – Are the installation requirements met for Clusterware or RAC?
• Network and connectivity
• Cluster Time Synchronization (CTSS or NTP)
• Existence of required OS users and permissions
• Prerequisites for adding nodes
CVU for Pre-Implementation Checks
• Purpose:
  – Verification of configuration after installation, prior to implementation (is the system ready?)
• Which checks should be made?
  – Use ‘post’ checks to verify that the system is indeed ready, and
  – Confirm that post-installation changes made to the system will not cause problems
• Examples:
CVU for Pre-Implementation Checks - Example
$ cluvfy stage -post hwos -n germany,argentina -verbose
Performing post-checks for hardware and operating system setup
Checking node reachability...
Check: Node reachability from node "germany"
  Destination Node    Reachable?
  ---                 ---
  germany             yes
  argentina           yes
Result: Node reachability check passed from node "germany"
Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name           Status
  ---                 ---
  argentina           passed
  germany             passed
Result: User equivalence check passed for user "grid"
…
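When cluvfy runs are scripted, the "Result:" lines are the quickest thing to scan. The following is a minimal sketch, not part of CVU itself, that extracts them from captured output. The sample text is copied from the example above; the function names are hypothetical, and the "failed" wording is an assumption about how non-passing results are phrased.

```python
import re

# Sample lines copied from the cluvfy example above (ASCII quotes).
SAMPLE_OUTPUT = '''\
Performing post-checks for hardware and operating system setup
Checking node reachability...
Result: Node reachability check passed from node "germany"
Checking user equivalence...
Result: User equivalence check passed for user "grid"
'''

# Assumption: result lines read "Result: <name> check passed|failed ...".
RESULT_LINE = re.compile(r"^Result:\s+(?P<check>.+?) check (?P<outcome>passed|failed)\b")


def summarize_results(text):
    """Return one (check name, outcome) tuple per 'Result:' line."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results.append((match.group("check"), match.group("outcome")))
    return results


def all_passed(text):
    """True only if at least one result was found and none failed."""
    results = summarize_results(text)
    return bool(results) and all(outcome == "passed" for _, outcome in results)


if __name__ == "__main__":
    for check, outcome in summarize_results(SAMPLE_OUTPUT):
        print(f"{check}: {outcome}")
```

A wrapper like this could gate a change: run the ‘post’ check, and proceed only when all_passed() is true.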
OraChk
• OraChk
  – Formerly RACchk or RACcheck
  – aka ExaChk
• RAC Configuration Audit Tool
  – For details see MOS note ID 1268927.1
• Checks Oracle Stack:
  – Standalone Database
  – Grid Infrastructure & RAC
  – Maximum Availability Architecture (MAA) Validation
  – Oracle Hardware
Engineered Systems require less initial testing
OraChk – Installation and Configuration
• Installation:
  – Download the latest version of orachk (90 day reminder…)
  – Unzip in a local directory as the oracle user
  – Check that permissions are 755 on orachk
• Configuration:
  – Run manually or in silent mode (via daemon)
OraChk – Usage
• Usage: ./orachk [-abvhpfmsuSo:c]
    -a         all checks
    -b         best practices only
    -p         patch recommendations only
    -f         offline (reports from existing data only)
    -u         pre-upgrade checks
    -S or -s   for silent installs, with or without SUDO capabilities
    -c         check individual components (e.g. orachk -a -c ASM)
    -o         invoke optional functionality (e.g. display only non-passing audit checks, verbose format, etc.)
    -m         exclude MAA checks
OraChk – Example
Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)
Database Server
Check Id | Status | Type | Message | Status On | Details
E960DB20CA5A634FE04312C0E50A62E0 | FAIL | SQL Check | Table containing SecureFiles LOB storage belongs to a tablespace with extent allocation type that is not SYSTEM managed (not AUTOALLOCATE) | All Databases | View
6580DCAAE8A28F5BE0401490CACF6186 | WARNING | OS Check | The number of async IO descriptors is too low (/proc/sys/fs/aio-max-nr) | All Database Servers | View
5ADD88EC8E0AFF2EE0401490CACF0C10 | WARNING | OS Check | net.core.wmem_max Is NOT Configured According to Recommendation | All Database Servers | View
84BE4DE1F00AD833E040E50A1EC07771 | INFO | OS Check | Kernel Parameter fs.file-max Is Lower Than The Recommended Value | All Database Servers | View
66E70B43167837ABE040E50A1EC02FEA | INFO | OS Check | ORA-00600 errors found in alert log | All Database Servers | View
• OraChk report in html format
• Summary with links to content
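To triage a report like the one above, it can help to count findings per status before reading the details. The sketch below tallies the five rows shown on this slide; the tuple layout and function name are illustrative assumptions, and it makes no claim about how orachk derives its System Health Score.

```python
from collections import Counter

# (status, check type, scope) for the five findings shown in the example report.
FINDINGS = [
    ("FAIL", "SQL Check", "All Databases"),
    ("WARNING", "OS Check", "All Database Servers"),
    ("WARNING", "OS Check", "All Database Servers"),
    ("INFO", "OS Check", "All Database Servers"),
    ("INFO", "OS Check", "All Database Servers"),
]

# Most severe first, so the summary lists what needs attention at the top.
SEVERITY_ORDER = ["FAIL", "WARNING", "INFO"]


def count_by_status(findings):
    """Return (status, count) pairs ordered by severity, skipping empty statuses."""
    counts = Counter(status for status, _check_type, _scope in findings)
    return [(status, counts[status]) for status in SEVERITY_ORDER if counts[status]]


if __name__ == "__main__":
    for status, count in count_by_status(FINDINGS):
        print(f"{status}: {count}")
```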
OraChk – Example
MAA Scorecard
Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)
DATA CORRUPTION PREVENTION BEST PRACTICES
Status | Type | Message | Status On | Details
FAIL | OS Check | Active Data Guard is not configured | All Database Servers | View
FAIL | SQL Parameter Check | Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value | All Instances | View
• OraChk highlights failures
Operation
• Operation is an ongoing task
  – Oracle Grid Infrastructure provides all necessary tools for normal operation.
• Operation should not create extra tasks
  – Automation is the key
• Tools to use:
  1. CVU (periodic runs)
  2. OraChk (interval runs via daemon)
  3. Cluster Health Monitor (CHM/OS)
Operation
CVU
Operations – Periodic CVU Checks are the Default
[GRID]> crsctl status res -t
---
Name           Target  State        Server          State details
---
Local Resources
---
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       argentina       STABLE
               ONLINE  ONLINE       brazil          STABLE
               ONLINE  ONLINE       germany         STABLE
...
ora.cvu
      1        ONLINE  ONLINE       brazil          STABLE
ora.germany.vip
      1        ONLINE  ONLINE       germany
...
[GRID]> crsctl status res ora.cvu -p
NAME=ora.cvu
TYPE=ora.cvu.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTIONS=
ACTION_SCRIPT=
ACTION_TIMEOUT=60
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=60
CHECK_RESULTS=PRVF-4090 : Node connectivity failed for interface "*",
  PRVF-4090 : Node connectivity failed for interface "*",
  PRVF-4090 : Node connectivity failed for interface "*",
  PRVF-4090 : Node connectivity failed for interface "*",
  PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,
  PRVF-4657 : Name resolution setup check for "cupscan.cupgnsdom.localdomain" (IP address: 10.1.1.55) failed,
  PRVF-4090 : Node connectivity failed for interface "*",
  PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",
  PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",
  PRVF-7530 : Sufficient physical memory is not available on node "germany" [Required physical memory = 4GB (4194304.0KB)],
  PRVF-4354 : Proper hard limit for resource "maximum open file descriptors" not found on node "germany" [Expected = "65536" ; Found = "4096"…
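The CHECK_RESULTS attribute packs many messages into one comma-separated value, and some messages contain commas themselves (the node list, for example). A minimal sketch of one way to split it, assuming each entry starts with a code of the form PRVx-nnnn followed by " : ", as in the output above. The sample value is abridged from that output and the function names are hypothetical.

```python
import re
from collections import Counter

# Abridged from the CHECK_RESULTS value shown above.
CHECK_RESULTS = (
    'PRVF-4090 : Node connectivity failed for interface "*",'
    'PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,'
    'PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" '
    'on nodes "argentina,brazil,germany",'
    'PRVF-4090 : Node connectivity failed for interface "*"'
)

# Split only at commas that are directly followed by a new message code,
# so commas inside a message (such as the node list) are left alone.
ENTRY_BOUNDARY = re.compile(r",(?=PRV[A-Z]-\d+ : )")


def parse_check_results(value):
    """Return a list of (code, message) tuples in their original order."""
    entries = []
    for chunk in ENTRY_BOUNDARY.split(value):
        code, _, message = chunk.partition(" : ")
        entries.append((code.strip(), message.strip()))
    return entries


def count_codes(value):
    """Count how often each message code occurs."""
    return Counter(code for code, _message in parse_check_results(value))


if __name__ == "__main__":
    for code, count in count_codes(CHECK_RESULTS).most_common():
        print(code, count)
```

Grouping by code makes repeated findings, such as the recurring PRVF-4090 connectivity message, stand out at a glance.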
Operations – Setup Periodic OraChk System Checks
<<< Configure & start orachk daemon for scheduled interval runs >>>
$ ./orachk -id DBA -set \
> "[email protected];\
> AUTORUN_SCHEDULE = 4,8,12,16,20 * * *;\
> AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30"
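The -set argument is a semicolon-separated list of KEY=VALUE pairs. The sketch below parses such a string and reads the schedule. It assumes the four AUTORUN_SCHEDULE fields are hour, day of month, month and day of week, in that order; that is my reading of the format and should be verified against the orachk documentation. The function names are hypothetical, and only the three settings legible on the slide are used.

```python
# The three settings legible in the command above.
SETTINGS = (
    "AUTORUN_SCHEDULE = 4,8,12,16,20 * * *; "
    "AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30"
)


def parse_settings(text):
    """Split 'KEY=VALUE; KEY=VALUE' into a dict, trimming whitespace."""
    settings = {}
    for item in text.split(";"):
        if not item.strip():
            continue
        # Partition on the first '=' only, so values may contain '='.
        key, _, value = item.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def scheduled_hours(schedule):
    """Return the hours of the day at which a run is scheduled.

    Assumption: four fields, the first of which lists hours (0-23).
    """
    fields = schedule.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 schedule fields, got {len(fields)}")
    hour_field = fields[0]
    if hour_field == "*":
        return list(range(24))
    return [int(hour) for hour in hour_field.split(",")]


if __name__ == "__main__":
    parsed = parse_settings(SETTINGS)
    print(parsed)
    print(scheduled_hours(parsed["AUTORUN_SCHEDULE"]))
```

Under that assumption, the schedule on the slide runs the checks every four hours starting at 04:00, every day.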
Cluster Health Monitor (CHM/OS)
• Service integrated with the Oracle Clusterware stack
• Introduced in 11.2.0.2 (Linux, Solaris, Windows), 11.2.0.3 (AIX)
• Gathers OS level metrics to monitor resource degradation and failure
• Stores data in a central repository (GIMR)
• Runs in real time with locked-down memory for last-gasp analysis
• Integration with QoS (Memory Guard) and CRS (server pool categorization)
• Integrated into EM Cloud Control
[Diagram: nodes germany, argentina, italy and brazil, each running Oracle GI with an osysmond process; a single OLOGGERD serves the cluster.]
Cluster Health Monitor – Daemons / Processes
osysmond
• Function:
  – Collect OS metrics
  – Process raw data for subset of processes
  – Compress and send data to ologgerd
  – Store/forward in case of network failures
• Managed by: ohasd
• Instances and location: Every node of the cluster (including leaf nodes)
ologgerd
• Function:
  – Consume data from all active osysmonds
  – Store data in the repository
  – Service requests from clients
• Managed by: osysmond
• Instances and location: One per cluster (Replica for 11.2.x)
oclumon
• Function:
  – Display OS level metrics in historic / real time mode
  – Perform repository management operations
• Managed by: Command line utility
• Instances and location: Can be invoked from any hub node in the cluster
Cluster Health Monitor – command line reporting
• Command line reporting of current and historic OS metrics (oclumon)
  – from any hub node in the cluster
• Example:
Monitoring
• Monitoring is an ongoing task
  – Optional monitoring is available for an Oracle RAC cluster via QoS and Oracle EM
  – Quality of Service Management (QoS) comes with a monitoring-only feature
• Monitoring is a proactive task.
• Tools to use:
  1. Oracle Enterprise Manager 12c CC
  2. Oracle Quality of Service Management (Memory Guard)
Quality of Service Management – Memory Guard
• QoS feature externalized for general use
• Memory Guard protects resources
  – Receives a stream of OS memory metrics from CHM/OS
• Issues an alert should any server be at risk
• Protects existing work and applications by automatically closing the server to new connections (i.e. stops service on the at-risk node)
• Automatically re-opens the server to connections once the memory pressure subsides
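The close-then-reopen behaviour described above can be illustrated with a small state machine. This is a conceptual sketch only: the slides do not describe Memory Guard's actual algorithm, and the two thresholds, the class name and the metric (fraction of memory in use) are all hypothetical. Using a lower threshold for reopening than for closing avoids flapping when memory usage hovers around a single limit.

```python
from dataclasses import dataclass


@dataclass
class MemoryGuardSketch:
    """Toy model of closing and reopening a server based on memory pressure."""

    close_above: float = 0.90   # hypothetical: close at or above 90% memory used
    reopen_below: float = 0.75  # hypothetical: reopen at or below 75% memory used
    accepting_connections: bool = True

    def observe(self, memory_used_fraction):
        """Process one metric sample; return 'closed', 'reopened' or None."""
        if self.accepting_connections and memory_used_fraction >= self.close_above:
            self.accepting_connections = False
            return "closed"
        if not self.accepting_connections and memory_used_fraction <= self.reopen_below:
            self.accepting_connections = True
            return "reopened"
        return None


def replay(samples):
    """Feed a sequence of samples to a fresh guard and collect the events."""
    guard = MemoryGuardSketch()
    return [guard.observe(sample) for sample in samples]


if __name__ == "__main__":
    print(replay([0.50, 0.92, 0.80, 0.70, 0.60]))
```

In the sample run the server closes at 0.92, stays closed at 0.80 because that is still above the reopen threshold, and reopens at 0.70.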
Enabling Autonomous Computing
[Diagram: Autonomous Computing (Self-Optimizing, Self-Protecting, Self-Configuring, Self-Healing), with the components QoS, CHM, CHA, HngMgr and Policy]
Cluster Health Monitor (CHM)/OS & QoS 11.2+
  CHM/OS: OLOGGERD, osysmond
• QoS Support for Measure-only with Performance Objectives and Alerts
• QoS Support for Measuring and Monitoring Admin-Managed Databases
Further QoS & CHM Enhancements in 12.1.0.2
Cluster Health Advisor: Coming soon…
Diagnosis
• Diagnosis is a recurring task
  – Ideally, there will be no incidents on the system.
  – Realistically, there will be more than one.
• Diagnosis is a reactive task.
  – It should be performed as efficiently as possible.
• Tools to use:
  1. Trace File Analyzer (TFA)
• Trace File Analyzer
  – Improved comprehensive first failure diagnostics collection
  – Efficient collection, packaging and transfer of data
  – Collect for all relevant components (OS, Grid Infra., ASM, RDBMS), including Exadata cell nodes
  – One command to collect all information, from all nodes (or single-instance, single-node)
• More information: MOS note ID 1513912.1
Trace File Analyzer (TFA) – intelligent log collection
$ ./tfactl diagcollect
Sending diagcollect request to host : argentina
Getting list of files satisfying time range [Tue Sep 03 14:17:43 PDT 2014, Tue Sep 03 18:17:43 PDT 2014]
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswiostat/germany_iostat_14.09.03.1500.dat.gz
germany: Zipping File: /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log
Trimming file : /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log with original file size : 109kB
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswtop/germany_top_14.09.03.1500.dat.gz
germany: Zipping File: /opt/oracle/oak/log/germany/oak/oakd.log
Trimming file : /opt/oracle/oak/log/germany/oak/oakd.log with original file size : 9.2MB
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/gipcd/gipcd.log
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log
Trimming file : /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log with original filesize 4.3MB
germany: Zipping File: /var/log/messages
…
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswslabinfo/germany_slabinfo_14.09.03.1800.dat
Collecting ADR incident files...
Total Number of Files checked : 10543
Total Size of all Files Checked : 3.9GB
Number of files containing required range : 68
Total Size of Files containing required range : 129MB
Number of files trimmed : 10
Total Size of data prior to zip : 144MB
Saved 63MB by trimming files
Zip file size : 8.6MB
Total time taken : 47s.
Logs are collected to:
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/germany.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/argentina.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip
Callouts:
• One simple command
• OS Watcher files
• ADR Incident files
• Relevant files only
• Pruning: 144MB pruned and compressed down to 8.6MB
• 47 seconds! – 1 command, 2 nodes, 4 databases, ASM, Clusterware, OS
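The efficiency claim can be checked from the summary lines of the collection output. The short sketch below parses the two sizes reported there and computes the compression ratio and percentage reduction; the function names are hypothetical and the parsing assumes sizes are reported in MB, as they are in this example.

```python
import re

# Summary lines copied from the tfactl output above.
SUMMARY = """\
Total Size of data prior to zip : 144MB
Saved 63MB by trimming files
Zip file size : 8.6MB
"""


def size_mb(label, text):
    """Return the size in MB reported on the line that starts with 'label'."""
    match = re.search(re.escape(label) + r"\s*:\s*([\d.]+)MB", text)
    if match is None:
        raise ValueError(f"no MB value found for {label!r}")
    return float(match.group(1))


def compression_summary(text):
    """Compute how much the zip step shrank the collected data."""
    before = size_mb("Total Size of data prior to zip", text)
    after = size_mb("Zip file size", text)
    return {
        "ratio": round(before / after, 1),
        "reduction_percent": round(100 * (1 - after / before), 1),
    }


if __name__ == "__main__":
    print(compression_summary(SUMMARY))
```

For the figures on this slide, 144MB down to 8.6MB is a reduction of roughly 94 percent, which is on top of the 63MB already saved by trimming files to the requested time range.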
Trace File Analyzer (TFA) – Efficiency from A-Z
[Diagram: HUB nodes germany and brazil (Oracle GI, Oracle RAC) send LOGs to a Utility Cluster, which hosts the Enterprise Management (EM) Server, the Grid Home Server (Rapid Home Provisioning), and a Storage Server (Node1, Node2, +1; Oracle ASM, Oracle Clusterware; Flex ASM Storage with IOsrv).]