(1)
(2)

Maximizing Oracle RAC Uptime

Ian Cookson, Markus Michalewicz

Oracle Real Application Clusters (RAC)

Product Management / Development

September 29, 2014

(3)

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

(4)

The System Lifecycle

Installation, Implementation, Operation, Monitoring, Diagnosis

(5)


Installation

(6)

Installation – System assumed for this presentation

Server OS: OL 6.5 UEK (other kernels are supported)

HUBs: 4GB+ memory recommended
One HUB at a time will host the GIMR database.
Only HUBs will host (Flex) ASM instances.
Leafs can have less memory, dependent on the use case.
Installer enforces HUB minimum memory requirement.

[Diagram: five servers running Oracle GI. HUB nodes: brazil, argentina, germany (Oracle RAC shown on two of them). Leaf nodes: spain, italy.]
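To illustrate the hub memory recommendation, here is a minimal sketch that checks a node's physical memory against a 4 GB threshold. It is not how the installer performs its check; the names `HUB_MIN_KB`, `mem_total_kb` and `meets_hub_minimum` are made up for this example, and the threshold is simply 4 GB expressed in kB.

```python
import re

# Assumption for illustration: 4 GB expressed in kB (4 * 1024 * 1024 = 4194304).
HUB_MIN_KB = 4 * 1024 * 1024


def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def meets_hub_minimum(meminfo_text, minimum_kb=HUB_MIN_KB):
    """True when the reported physical memory reaches the assumed hub minimum."""
    return mem_total_kb(meminfo_text) >= minimum_kb
```

On a Linux node the text can be read from /proc/meminfo. Note that MemTotal excludes memory reserved by the kernel, so a server with nominally 4 GB installed reports slightly less than 4194304 kB.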

(7)

Installation

Installation is an infrequent task
It should be standardized
Follow: http://www.slideshare.net/MarkusMichalewicz/oracle-rac-12c-collaborate-best-practices-ioug-2014-version
and come to the Oracle RAC demo booth (3787)

Tools to use:
1. Linux: pre-install package
2. Cluster Verification Utility (CVU)
3. Oracle Universal Installer (OUI)

[root@germany ~]# uname -a
3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013 x86_64 x86_64 x86_64 GNU/Linux

# Get the pre-install package
[root@germany Desktop]# yum list oracle-*
oracle-rdbms-server-11gR2-preinstall.x86_64    1.0-7.el6    ol6_latest
oracle-rdbms-server-12cR1-preinstall.x86_64    1.0-8.el6    ol6_latest

(8)

OUI provides a simple GUI for:

Installation and Configuration

Upgrades

OUI calls cluvfy for:

Verification checks

Generating ‘fixup’ scripts

(9)


Implementation

(10)

Implementation

Implementation is a recurring task

Initial implementation

Change implementation(s) as required

Implementation tasks are system-specific

Tools to use:
1. CVU
2. OraChk

CVU

(11)

Cluster Verification Utility (CVU) – Introduction

Purpose:

Verification of pre-install & post-install cluster setup

Run manually (command: cluvfy) or as part of the OUI

Available from OTN and included in Oracle Grid Infrastructure

Supports the Oracle RAC stack since version 10g Rel. 1

What does it do?:

Runs specified verification tests and optionally generates a ‘fixup’ script (run under root)

(12)

What does CVU Check?

System requirements

Are the installation requirements met for Clusterware, or RAC?

Network and connectivity

Cluster Time Synchronization (CTSS or NTP)

Existence of required OS users and permissions

Prerequisites for adding nodes

(13)

CVU for Pre-Implementation Checks

Purpose:

Verification of configuration after installation, prior to implementation (is the system ready?)

Which checks should be made?

Use ‘post’ checks to verify that the system is indeed ready, and

Confirm that post-installation changes made to the system will not cause problems

Examples:

(14)

CVU for Pre-Implementation Checks - Example

$ cluvfy stage -post hwos -n germany,argentina -verbose

Performing post-checks for hardware and operating system setup

Checking node reachability...
Check: Node reachability from node "germany"
  Destination Node    Reachable?
  ----------------    ----------
  germany             yes
  argentina           yes
Result: Node reachability check passed from node "germany"

Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name           Status
  ----------------    ----------
  argentina           passed
  germany             passed
Result: User equivalence check passed for user "grid"
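When CVU runs are scripted, the "Result:" lines are the natural thing to scan. The following is a minimal sketch of that idea, not part of CVU: `summarize_cluvfy` is a hypothetical helper, and it assumes that a failing check is reported with the same wording as above but with "failed" in place of "passed".

```python
import re

# Matches lines such as:
#   Result: Node reachability check passed from node "germany"
# Assumption: failures use the same shape with the word "failed".
RESULT_LINE = re.compile(
    r"^Result:\s+(?P<check>.+?)\s+check\s+(?P<status>passed|failed)\b"
)


def summarize_cluvfy(output):
    """Map each check named on a 'Result:' line to True (passed) or False (failed)."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group("check")] = match.group("status") == "passed"
    return results
```

A wrapper script could then exit non-zero whenever any value in the returned dictionary is False.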

(15)

OraChk

Formerly RACchk or RACcheck
aka ExaChk
RAC Configuration Audit Tool
For details see MOS note ID 1268927.1

Checks Oracle Stack:
Standalone Database
Grid Infrastructure & RAC
Maximum Availability Architecture (MAA) Validation
Oracle Hardware

Engineered Systems require less initial testing

(16)

OraChk – Installation and Configuration

Installation:
Download the latest version of orachk (90 day reminder…)
Unzip in a local directory under the oracle user
Check that permissions are 755 on orachk

Configuration:
Run manually or in silent mode (via daemon)
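The permission check in the installation steps can be automated. Below is a small sketch using only the Python standard library; `has_mode` is a hypothetical helper name, and the path passed to it is whatever location orachk was unzipped to.

```python
import os
import stat


def has_mode(path, expected=0o755):
    """Return True when the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

For example, `has_mode("./orachk")` returns True only if the file is rwxr-xr-x; the relative path is an assumption about where the script is run from.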

(17)

OraChk – Usage

Usage : ./orachk [-abvhpfmsuSo:c]

-a          all checks
-b          best practices only
-p          patch recommendations only
-f          offline (reports from existing data only)
-u          pre-upgrade checks
-S or -s    for silent installs, with or without SUDO capabilities
-c          check individual components (e.g. orachk -a -c ASM)
-o          to invoke optional functionality (e.g. to display only non-passing audit checks, verbose format, etc.)
-m          exclude MAA checks
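For scripted runs it can help to build the argument list in one place. This is only a sketch: `orachk_args` and `MODE_FLAGS` are hypothetical names, only flags listed in the usage above are used, and whether a given combination (for example -a together with -m) is accepted should be verified against the installed orachk version.

```python
# Flags taken from the usage text above; the mode names are made up for this sketch.
MODE_FLAGS = {"all": "-a", "best_practices": "-b", "patches": "-p"}


def orachk_args(mode="all", component=None, exclude_maa=False):
    """Build an argument list for orachk without running it."""
    args = ["./orachk", MODE_FLAGS[mode]]
    if component:
        args += ["-c", component]   # check an individual component, e.g. ASM
    if exclude_maa:
        args.append("-m")           # exclude MAA checks
    return args
```

The list can be handed to `subprocess.run` by a caller that has orachk in its working directory.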

(18)

OraChk – Example

Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)

Database Server

Check Id | Status | Type | Message | Status On | Details
E960DB20CA5A634FE04312C0E50A62E0 | FAIL | SQL Check | Table containing SecureFiles LOB storage belongs to a tablespace with extent allocation type that is not SYSTEM managed (not AUTOALLOCATE) | All Databases | View
6580DCAAE8A28F5BE0401490CACF6186 | WARNING | OS Check | The number of async IO descriptors is too low (/proc/sys/fs/aio-max-nr) | All Database Servers | View
5ADD88EC8E0AFF2EE0401490CACF0C10 | WARNING | OS Check | net.core.wmem_max Is NOT Configured According to Recommendation | All Database Servers | View
84BE4DE1F00AD833E040E50A1EC07771 | INFO | OS Check | Kernel Parameter fs.file-max Is Lower Than The Recommended Value | All Database Servers | View
66E70B43167837ABE040E50A1EC02FEA | INFO | OS Check | ORA-00600 errors found in alert log | All Database Servers | View

OraChk report in html format
Summary with links to content

(19)

OraChk – Example

MAA Scorecard

Oracle orachk Assessment Report
System Health Score is 75 out of 100 (detail)

DATA CORRUPTION PREVENTION BEST PRACTICES

FAIL | OS Check | Active Data Guard is not configured | All Database Servers | View
FAIL | SQL Parameter Check | Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value | All Instances | View

OraChk highlights failures

(20)


Operation

(21)

Operation

Operation is an ongoing task
Oracle Grid Infrastructure provides all necessary tools for normal operation.
Operation should not create extra tasks
Automation is the key

Tools to use:
1. CVU (periodic runs)
2. OraChk (interval runs via daemon)
3. Cluster Health Monitor (CHM/OS)

CVU

(22)

Operations – Periodic CVU Checks are the Default

[GRID]> crsctl status res -t
---
Name           Target  State   Server     State details
---
Local Resources
---
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE  argentina  STABLE
               ONLINE  ONLINE  brazil     STABLE
               ONLINE  ONLINE  germany    STABLE
...
ora.cvu
      1        ONLINE  ONLINE  brazil     STABLE
ora.germany.vip
      1        ONLINE  ONLINE  germany
...

[GRID]> crsctl status res ora.cvu -p
NAME=ora.cvu
TYPE=ora.cvu.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTIONS=
ACTION_SCRIPT=
ACTION_TIMEOUT=60
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=60
CHECK_RESULTS=PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVF-4090 : Node connectivity failed for interface "*",PRVG-1101 : SCAN name "cupscan.cupgnsdom.localdomain" failed to resolve,PRVF-4657 : Name resolution setup check for "cupscan.cupgnsdom.localdomain" (IP address: 10.1.1.55) failed,PRVF-4090 : Node connectivity failed for interface "*",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVG-11050 : No matching interfaces "*" for subnet "172.149.0.0" on nodes "argentina,brazil,germany",PRVF-7530 : Sufficient physical memory is not available on node "germany" [Required physical memory = 4GB (4194304.0KB)],PRVF-4354 : Proper hard limit for resource "maximum open file descriptors" not found on node "germany" [Expected = "65536" ; Found = "4096"…
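The CHECK_RESULTS attribute packs many messages into one comma-separated string, and the messages themselves contain commas (for example the node list "argentina,brazil,germany"). A minimal sketch of how such a value could be split safely is shown below; `parse_check_results` and `count_by_code` are hypothetical helpers, and the assumption is that every message starts with a code of the form PRVx-nnnn followed by " : ", as in the output above.

```python
import re
from collections import Counter

# Split only at commas that are directly followed by a new message code,
# because the message text itself may contain commas.
_SPLIT = re.compile(r",\s*(?=PRV[A-Z]-\d+ : )")
_ENTRY = re.compile(r"^(?P<code>PRV[A-Z]-\d+) : (?P<text>.*)$", re.DOTALL)


def parse_check_results(value):
    """Return a list of (code, text) tuples from a CHECK_RESULTS value."""
    entries = []
    for chunk in _SPLIT.split(value.strip()):
        match = _ENTRY.match(chunk.strip())
        if match:
            # Collapse whitespace introduced by line wrapping.
            text = " ".join(match.group("text").split())
            entries.append((match.group("code"), text))
    return entries


def count_by_code(value):
    """Count how often each message code occurs."""
    return Counter(code for code, _ in parse_check_results(value))
```

Counting by code makes repeated findings, such as the recurring PRVF-4090 connectivity message, easy to spot.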

(23)

Operations – Setup Periodic OraChk System Checks

<<< Configure & start orachk daemon for scheduled interval runs >>>

$ ./orachk -id DBA -set \
> "[email protected];\
> AUTORUN_SCHEDULE = 4,8,12,16,20 * * *;\
> AUTORUN_FLAGS=-profile dba; COLLECTION_RETENTION=30"
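The value passed to -set is a semicolon-separated list of KEY=VALUE pairs. To make that structure explicit, here is a small sketch that parses such a string; `parse_set_string` and `scheduled_hours` are hypothetical helpers, and reading the first AUTORUN_SCHEDULE field as a list of hours is an assumption that should be checked against the orachk documentation.

```python
def parse_set_string(value):
    """Split 'KEY=VALUE;KEY=VALUE' into a dict, trimming surrounding whitespace."""
    settings = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # skip fragments that are not assignments
        key, _, val = part.partition("=")  # split on the first '=' only
        settings[key.strip()] = val.strip()
    return settings


def scheduled_hours(schedule):
    """Expand the first schedule field, assumed here to hold the hours of the day."""
    first_field = schedule.split()[0]
    if first_field == "*":
        return list(range(24))
    return [int(hour) for hour in first_field.split(",")]
```

Under that assumption, the schedule shown above would trigger a run every four hours from 04:00 to 20:00.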

(24)

Cluster Health Monitor (CHM/OS)

Service integrated with the Oracle Clusterware stack
Introduced in 11.2.0.2 (Linux, Solaris, Windows), 11.2.0.3 (AIX)
Gathers OS level metrics to monitor resource degradation and failure
Stores data in a central repository (GIMR)
Runs real time with locked down memory for last gasp analysis
Integration with QoS (Memory Guard) and CRS (server pool categorization)
Integrated into EM Cloud Control

[Diagram: germany, argentina, italy and brazil each run Oracle GI with an osysmond; a single OLOGGERD is shown.]

(25)

Cluster Health Monitor – Daemons / Processes

osysmond
  Function: Collect OS metrics; process raw data for subset of processes; compress and send data to ologgerd; store/forward in case of network failures
  Managed by: ohasd
  Instances and location: Every node of the cluster (including leaf nodes)

ologgerd
  Function: Consume data from all active osysmonds; store data in the repository; service requests from clients
  Managed by: osysmond
  Instances and location: One per cluster (Replica for 11.2.x)

oclumon
  Function: Display OS level metrics in historic/real time mode; perform repository management operations
  Managed by: Command line utility
  Instances and location: Can be invoked from any hub node in the cluster

(26)
(27)
(28)

Cluster Health Monitor – command line reporting

Command line reporting of current and historic OS metrics (oclumon)

from any hub node in the cluster

Example:

(29)


Monitoring

(30)

Monitoring

Monitoring is an ongoing task
There is optional monitoring available for an Oracle RAC cluster via QoS and Oracle EM
Quality of Service Management (QoS) comes with a monitoring only feature
Monitoring is a proactive task.

Tools to use:
1. Oracle Enterprise Manager 12c CC
2. Oracle Quality of Service Management (Memory Guard)

(31)
(32)

Quality of Service Management – Memory Guard

QoS Feature externalized for general use

Memory Guard protects resources

Receives a stream of OS Memory metrics from CHM/OS

Issues alert should any server be at risk

Protects existing work and applications by automatically closing the server to new connections (i.e. stops service on at-risk node)

Automatically re-opens server to connections once the memory pressure
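The close and re-open behaviour described above can be pictured as a simple state machine driven by memory samples. The sketch below is purely illustrative and is not Oracle's implementation: the `Server` class, the `apply_memory_guard` function, and both thresholds (close at 90% used, re-open at 75% used) are assumptions chosen for the example. Two different thresholds are used so that a server hovering around one value does not flap between open and closed.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    accepting_connections: bool = True


def apply_memory_guard(server, used_fraction, close_at=0.90, reopen_at=0.75):
    """Update the server state from one memory sample; return a message or None.

    Thresholds are hypothetical values for illustration only.
    """
    if server.accepting_connections and used_fraction >= close_at:
        server.accepting_connections = False
        return f"ALERT: {server.name} at risk, closed to new connections"
    if not server.accepting_connections and used_fraction <= reopen_at:
        server.accepting_connections = True
        return f"INFO: {server.name} re-opened to new connections"
    return None
```

Existing sessions are untouched in this model; only the flag that admits new connections changes.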

(33)

Autonomous Computing

[Diagram: QoS, CHM, CHA, HngMgr and Policy components, mapped to four properties: Self-Optimizing, Self-Protecting, Self-Configuring, Self-Healing]

(34)

Enabling Autonomous Computing

Cluster Health Monitor (CHM)/OS & QoS 11.2+
QoS Support for Measure only with Performance Objectives and Alerts
QoS Support for Measuring and Monitoring Admin-Managed Databases
Further QoS & CHM Enhancements in 12.1.0.2
Cluster Health Advisor: Coming soon…

[Diagram labels: LOGGERD, sysmond, CHM/OS]

(35)


Diagnosis

(36)

Diagnosis is a recurring task
Ideally, there will be no incidents on the system.
Realistically, there will be more than one.
Diagnosis is a reactive task.
It should be performed as efficiently as possible.

Tools to use:
1. Trace File Analyzer (TFA)

(37)

Trace File Analyzer

Improved comprehensive first failure diagnostics collection
Efficient collection, packaging and transfer of data
Collect for all relevant components (OS, Grid Infra., ASM, RDBMS), including Exadata cell nodes
One command to collect all information, from all nodes (or single-instance, single-node)
More information: MOS note ID 1513912.1

(38)

Trace File Analyzer (TFA) – intelligent log collection

$ ./tfactl diagcollect

Sending diagcollect request to host : argentina
Getting list of files satisfying time range [Tue Sep 03 14:17:43 PDT 2014, Tue Sep 03 18:17:43 PDT 2014]
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswiostat/germany_iostat_14.09.03.1500.dat.gz
germany: Zipping File: /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log
Trimming file : /u01/app/oracle/diag/rdbms/bill/bill1/trace/alert_bill1.log with original file size : 109kB
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswtop/germany_top_14.09.03.1500.dat.gz
germany: Zipping File: /opt/oracle/oak/log/germany/oak/oakd.log
Trimming file : /opt/oracle/oak/log/germany/oak/oakd.log with original file size : 9.2MB
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/gipcd/gipcd.log
germany: Zipping File: /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log
Trimming file : /u01/app/12.1.0.2/grid/log/germany/agent/ohasd/oraagent_grid/oraagent_grid.log with original filesize 4.3MB
germany: Zipping File: /var/log/messages
germany: Zipping File: /opt/oracle/oak/oswbb/archive/oswslabinfo/germany_slabinfo_14.09.03.1800.dat
Collecting ADR incident files...
Total Number of Files checked : 10543
Total Size of all Files Checked : 3.9GB
Number of files containing required range : 68
Total Size of Files containing required range : 129MB
Number of files trimmed : 10
Total Size of data prior to zip : 144MB
Saved 63MB by trimming files
Zip file size : 8.6MB
Total time taken : 47s.

Logs are collected to:
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/germany.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip
/opt/oracle/tfa/tfa_home/repository/collection_Tue_Sep_3_18_17_24_PDT_2014_node_all/argentina.tfa_Tue_Sep_3_18_17_24_PDT_2014.zip

Callouts:
One simple command
OS Watcher files
Pruning
ADR Incident files
Relevant files only
47 seconds: 1 command, 2 nodes, 4 databases, ASM, Clusterware, OS
144MB pruned and compressed down to 8.6MB
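The size figures in the collection summary can be extracted and compared programmatically. The following is a minimal sketch, not part of TFA: `sizes_in_mb` is a hypothetical helper that assumes each figure appears as "label : <number><unit>" with a unit of kB, MB or GB, as in the output above.

```python
import re

# Conversion factors to MB for the units that appear in the summary.
_UNITS = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}
_SIZE = re.compile(
    r"(?P<label>[A-Za-z ]+?)\s*:\s*(?P<num>\d+(?:\.\d+)?)(?P<unit>kB|MB|GB)"
)


def sizes_in_mb(summary):
    """Return {label: size in MB} for every 'label : <number><unit>' pair found."""
    return {
        m.group("label").strip(): float(m.group("num")) * _UNITS[m.group("unit")]
        for m in _SIZE.finditer(summary)
    }
```

Dividing the "Total Size of data prior to zip" value by the "Zip file size" value for the run shown gives 144 / 8.6, a reduction of roughly 16.7 to 1.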

(39)

Trace File Analyzer (TFA) – Efficiency from A-Z

[Diagram: HUB nodes germany and brazil, each running Oracle GI and Oracle RAC, each with its own LOGs]

(40)

Utility Cluster

Centralize and standardize storage, deployment, management and diagnostics

Architecture:
An Oracle Grid Infrastructure based cluster

[Diagram: Utility Cluster (Node 1, Node 2) hosting an Enterprise Management (EM) Server, a Grid Home Server (Rapid Home Provisioning) and a Storage Server (Oracle ASM, Oracle Clusterware, Flex ASM Storage, IOsrv), alongside Database Domains and Application Domains]

(41)

The System Lifecycle: Installation, Implementation, Operation, Monitoring, Diagnosis

(42)
