• No results found

Capacity Management for Oracle Database Machine Exadata v2

N/A
N/A
Protected

Academic year: 2021

Share "Capacity Management for Oracle Database Machine Exadata v2"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Capacity Management for Oracle

Database Machine Exadata v2

Dr. Boris Zibitsker, BEZ Systems

NOCOUG 2010

(2)

© Boris Zibitsker Predictive Analytics for IT 3

About Author

Dr. Boris Zibitsker, Chairman, CTO, BEZ Systems.

Boris and his colleagues developed modeling technology supporting

multi-tier distributed systems based on Oracle RAC, Teradata, DB2, and SQL

Server. Boris consults, and speaks frequently on this topic at many

(3)

Outline

Overview of Oracle Database Machine V2 architecture

Capacity management challenges for DB Machine

Role of predictive analytics

How to justify strategic capacity planning decisions

How to justify tactical performance management actions

How to justify operational workload management actions

(4)

© Boris Zibitsker Predictive Analytics for IT 5

Node 8

Node 7

Node 6

Exadata Cell 2

Exadata Cell 1

Oracle DB Machine Exadata v2 Architecture

Massively Parallel Grid

Node 1

InfiniBand Switch/Network 40 Gb/sec

Flash Cache

Flash Cache

Node 2

Exadata Cell 14

Flash Cache

Node 3

Node 4

Node 5

RAC DB Server Grid

Exadata Storage Server Grid

Exadata Cell 3

Flash Cache

Exadata Cell Disk Storage Capacity:

• 12 x 600 GB SAS disks (7.2 TB/cell @ 100 TB total)

• 12 x 2TB SATA disks (24 TB/cell @ 336 TB total)

• 4 x 96 GB Sun Flash Cards (384GB/cell @ total 5TB)

Smart Scans

Hybrid Columnar Compression

Storage Indexes

(5)

Capacity Management Challenges for

DB Machine

Data collection

Workload characterization

How to set realistic SLO

Performance prediction

How to justify DB Machine

How to justify server consolidation

What will be the impact of new

application implementation

How to justify

– strategic capacity planning decisions

– tactical performance management actions

– operational workload management actions

(6)

© Boris Zibitsker Predictive Analytics for IT 7 RAC CPU Service

RAC Delay RAC CPU Wait

Response Time, Throughput and Cost Affect

Capacity Management Decisions

Exadata CPU Wait InfiniBand CPU Service

Exadata CPU Service Exadata Disk Wait Exadata Flash Cash RT

Exadata Disk Service

How do Smart Scan, Columnar

Compression, Flash Cache affect the

response time of each workload?

What should be done proactively to

support acceptable response time and

throughput for each workload?

(7)

Response Time Depends on Many Factors,

including Arrival Rate of Requests and Server

Utilization

Utilization of the server depends on

arrival rate of requests and time

required to serve an average request

Response time of the average

request depends on the service time

and server utilization

When arrival rate and server

utilization are low then time waiting

for service is low and the response

time is equal to service time

When workload activity is growing it

increases contention for the server

and significantly increases the

response time

How can you manage if you do not

know what will be an impact of

Arrival rate Res ponse T im e

Constant service time Variable queueing time

(8)

© Boris Zibitsker Predictive Analytics for IT 9

How to Decide When and What Should be

changed to Support SLO for each Workload?

Gut feelings

Rules of thumb

Benchmarks

Regression analysis

Analytical models

Arrival rate Res ponse T im e

SLO

?

(9)

Simplified Analytical Model of the Database Machine

1 2 n CPU Memory 1 2 n CPU Disk Flash Active Sessions Active Sessions Rejected Requests Rejected Requests

Exadata Storage Server Grid

Users Arriving Requests

Client

75 60 15 50 25 25

RAC DB Server Grid

Disk CPU CPU Max? Max? CPU Infiniband Switch Network Multiple Workloads With Different Profiles & SLOs • Up to 8 servers • 2 Intel quad-core • Xeons each • 14 storage servers • 100 TB raw SAS or

• 336 TB raw SATA disk

• 4 x 96GB PCI Express Flash

Cards per Exadata Server

(10)

©Boris Zibitsker Predictive Analytics for IT 11

Input and Output of Modeling and Optimization

Workloads

Hardware

Software

SLAs

Plan

Optimization

Options

Modeling

What if?

What is the

best decision?

OS

Oracle RAC

Exadata

(11)

Case Study

How to justify

capacity planning,

performance management

and

workload

management

decisions to support workloads’ SLOs effectively?

1. What will be the impact of workload growth,

and when will the current system be out of

capacity?

2.

What will be the impact of implementing

Oracle DB Machine v2?

3.

What will be the impact of hardware

upgrade?

4.

What will be the impact of performance

tuning?

5.

What will be the impact of changing

workload concurrency?

6.

What will be the impact of changing

workload priority?

7.

What will be the impact of new application?

8.

What will be the impact of limiting CPU

utilization?

AS2 DW

MKT

HR

AS1 OLTP

Sales

JVM1 JVM2 JVM3 JVM4 Sales HR MKT DBMS1 DBMS2 ARK ETL

(12)

© Boris Zibitsker Predictive Analytics for IT 13

How Will Workload and Database Size Growth

Affect Performance of Each Workload in Current

Multi-tier Distributed Environment

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 1 2 3 4 5 6 7 8 9 10 11 12

Predicted Relative Response Time

Sales Marketing HR Archive_2 Archive_1 ETL 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 1 2 3 4 5 6 7 8 9 10 11 12

Predicted Relative Throughput

Sales Marketing HR Archive_2 Archive_1 ETL

SLO

SLO

(13)

Exadata Cell 14

Exadata Cell 1

How Will Server Consolidation on Oracle DB

Machine Affect Each Workload’s Performance?

Node 1

Node 8

InfiniBand Switch/Network

MKT

HR

AS1 OLTP

Sales

JVM1 JVM2 JVM3 JVM4

Current Multi-tier Distributed

System with OLTP &

DataWarehouse workloads

Flash Cache

Flash Cache

AS2 DW

MKT

HR

AS1 OLTP

Sales

JVM1 JVM2 JVM3 JVM4 Sales HR MKT DBMS1 DBMS2 ARK ETL ARK ETL

(14)

© Boris Zibitsker Predictive Analytics for IT 15

Exadata Smart Scan Reduces Volume of

Data Returned to Server

(15)

How will Smart Scan Reduce Volume of Data

Returned to Each Server by each Workload?

Smart scan will have different

impact on volume of data send to

RAC server

We will use models to predict

how smart scan will affect the

response time and throughput for

each workload

Database size M b/S QL Re ques t

Traditional Scan Processing

Exadata Smart Scan Processing

OLTP (1 – 1.5)

DW (2 – 10 )

(16)

© Boris Zibitsker Predictive Analytics for IT 17

How will Exadata Hybrid Columnar

Compression Affect Performance of Each

Workload?

Data is stored by column and

then compressed

Factors affecting a

compression ratio:

– Table size

– Data cardinality

– Read/write ratio

(17)

How Exadata Hybrid Columnar Compression

Affects the # of I/Os, and CPU Utilization for

Each Workload

Compression ratios depends on

workload’s profile:

– OLTP (1 - - 6)

– DSS (5 -- 30)

– Archive (10 -- 50)

Regression analysis shows the

impact of Compression on

#IO/SQL Request and CPU

utilization

We will use models to predict

how compression will it affect the

response time and throughput of

each workload

Database size # IO s /S QL Reque s t Before Compression After Compression Ut il iza tion by Reque s t Before Compression After Compression

(18)

© Boris Zibitsker Predictive Analytics for IT 19

How Will Smart Flash Cache Size Will Affect

Performance of Each Workload?

Disk utilization as well as Flash Cache utilization

depends on IO Rate and I/O Service time :

Udisk = IO Rate * I/O Service Time

I/O Response time depends on I/O Service Time and

Disk or Flash Cache Utilization” :

I/O Service Time

I/O Response Time =

( 1 – Udisk )

A lot of value for OLTP workloads

Cost is a limitation

Disk IO Service Time is about 3

ms and it can handle up to 300

IOPS

Flash Cache Service Time is

small allowing 1mln iops and

scan 50gb/sec

Disk

Flash Cache min

Flash Cache max 300 IOPS #IOPS I/ O Res ponse T im e

DBA control:

Alter Table Customer Storage

(19)

Exadata I/O Resource Management in

Multi-Database Environment

How to optimize allocation of I/O resources / bandwidth to

different Databases and Workloads

• Database Sales 40% I/O resources

• OLTP 70% of I/O recourses

• ETL : 30% of I/O resources

Database Marketing: 35% of I/O resources

Interactive: 45% of I/O resources

Batch: 55% of I/O resources

(20)

© Boris Zibitsker Predictive Analytics for IT 21

What will be an Impact on Cost/Performance as

a Result of Moving Data between ASM Disk

Groups: Super Hot, Hot and Cold

ASM Disk Group Super Hot

based on Logical Flash Disks

ASM Disk Group Hot

(21)

Predicted Server Consolidation Impact on DB

Machine

0 0.1 0.2 0.3 0.4 0.5 1 2 3 4 5 6 7 8 9 10 11 12

Sales Response Time Components

0 1 2 3 4 1 2 3 4 5 6 7 8 9 10 11 12

Marketing Response Time Components

2 3 4 5

HR Response Time Components

0 0.1 0.2 0.3 0.4 0.5 0.6 1 2 3 4 5 6 7 8 9 10 11 12

Sales Response Time Components

0 0.5 1 1.5 2 2.5 3 1 2 3 4 5 6 7 8 9 10 11 12

Marketing Response Time Components

2 3 4 5

HR Response Time Components

!

(22)

© Boris Zibitsker Predictive Analytics for IT 23

What Will be the Impact of Increasing

Number of Exadata Cells?

0 5 10 15 20 25 30 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_2 Archive_1 ETL 0 0.5 1 1.5 2 2.5 3 3.5 1 2 3 4 5 6 7 8 9 10 11 12

Relative Throughput

Sales Marketing HR Archive_2 Archive_1 ETL

(23)

Predicted Impact of Adding Exadata Cells on

Sales Response Time

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 1 2 3 4 5 6 7

Sales DB Machine Response Time

RAC Node Exadata Cell

(24)

© Boris Zibitsker Predictive Analytics for IT 25

How to Predict New Application Implementation

Impact

Production DB Machine

Stress Testing

Predict how new application will perform in production

environment Predict how new

application will affect performance of existing applications

New Application

Build Model of the

Test System Production SystemBuild Model of the

(25)

Predicted Impact of Database Tuning

0 10 20 30 40 50 60 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_1 ETL Archive_2 2 3 4 5 6 7

Relative Throughput

Sales Marketing HR Archive_1 ETL

!

!

New Index?

Materialized view?

Data Compression?

(26)

© Boris Zibitsker Predictive Analytics for IT 27

Predicted Impact of Reducing ETL Concurrency

From 110 to 10 Starting P3

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_1 ETL 0 0.5 1 1.5 2 2.5 1 2 3 4 5 6 7 8 9 10 11 12

Relative Throughput

Sales Marketing HR Archive_1 ETL Archive_2 0 20 40 60 80 100 120 1 2 3 4 5 6 7 8 9 10 11 12

Oracle RAC CPU Utilization%

Marketing Archive_2 ETL Archive_1 Sales System_D1 0 10 20 30 40 1 2 3 4 5 6 7 8 9 10 11 12

Disk Utilization %

Marketing ETL Archive_1 Sales System_D1 HR

(27)

Limit Concurrency Reduces Contention but

Increase # of Requests Waiting for the Thread

# Req u e sts Threshold

Time

# Requests being processed = MPL # Requests waiting for Thread

# Req u e sts Threshold # Requests being processed = MPL # Requests waiting for Thread

# Req u e sts Unconstrained

Time

# Requests being processed = MPL Limit # Requests x = MPL

0

f

( dx

x

)

1

Approximation of the Multiprogramming (MPL) Distribution

0

)

(

rnal

AvgMPLInte

dx

x

f

P rob a b ili ty f (x )

(28)

© Boris Zibitsker Predictive Analytics for IT 29

Predicted Impact of Changing Workloads’

Priority

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_1 ETL 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_1 ETL

Reducing ETL Priority from 0.9

to 0.1 Starting P3

Increasing Marketing and Sales

Priority from 2 to 11 Starting P3

(29)

Predicted Impact of Limiting CPU Consumption

by ETL Workload to 50%?

0 10 20 30 40 50 1 2 3 4 5 6 7 8 9 10 11 12

Relative Response Time

Sales Marketing HR Archive_1 ETL Archive_2 0.5 1 1.5 2 2.5

Relative Throughput

Sales Marketing HR Archive_1 ETL

!

(30)

© Boris Zibitsker Predictive Analytics for IT 31

Workloads

SLOs, SLAs

Control

Decisions

Role of Modeling and Optimization

Modeling Optimization

System

Justify Capacity Planning,

Performance Management

and Operational Workload

Management Actions

(31)

Organizing Continuous Proactive

Performance Management Process

Actual

Expected

(32)

© Boris Zibitsker Predictive Analytics for IT 33

Conclusion

1.

Oracle Database Machine smart

scan, columnar data compression

and flash memory affect scalability

of OLTP and DW workloads

differently

2.

Capacity management decisions

based on gut feelings can be

misleading

3.

Rules of thumb do not take into

consideration specifics of your

environment

4.

Benchmarks produce the most

accurate results, but benchmarks

are expensive, time consuming and

not flexible

5.

We demonstrated a value of

predictive analytics in evaluating

options and justification of capacity

planning, performance management

and workload management

decisions

6.

Collaborative efforts between

business people and IT in workload

forecasting and evaluation of results

helps to understand how complex

system works

7.

Prediction results set realistic

expectations and enable

comparison of the actual results

with expected

8

. It reduces an uncertainty and

(33)

Thank you!

Contact Information

boris@bez.com

www.bez.com

(34)

© Boris Zibitsker Predictive Analytics for IT 35

References

• B. Zibitsker, A. Lupersolsky , IOUG 2009, “Modeling and Optimization in Virtualized Multi-tier Distributed Environment”

• B. Zibitsker, IOUG 2008. “Reducing Risk of Surprises in Changing Multi-tier Distributed Oracle RAC Environment”

• B. Zibitsker, DAMA 2007, “Enterprise Data Management and Optimization”

• B. Zibitsker, CMG 2008, 2009 “Hands on Workshop on Performance Prediction for Virtualized Multi-tier Distributed Environments”

• J. Buzen, B. Zibitsker, CMG 2006, “Challenges of Performance Prediction in Multi-tier Parallel Processing Environments”

• B, Zibitsker, G. Sigalov, A. Lupersolsky “Modeling and Proactive Performance Management of Multi-tier Distributed Environments”, International conference “Mathematical methods for analysis and

optimization of information and telecommunication networks"

• B. Zibitsker, C. Garry, CMG 2009, "Capacity Management Challenges for the Oracle Database Machine: Exadata v2“

• Oracle Enterprise Manager Grid Control documentation library at:

References

Related documents

Because of the extreme performance, high storage capacity, and unique compression capabilities delivered by the Exadata Database Machine, workloads that would require very

Exadata flash can be used directly as flash disks, but it is almost always configured as a flash cache in front of disk since caching provides flash level performance for much

Exadata Database Machine X2-8 Full Rack with High Capacity SAS Disks Up to 14 GB/second of uncompressed raw disk bandwidth. Up to 64 GB/second of uncompressed Flash data bandwidth

 Oracle Exadata Database Machine X4-2 (Oracle data sheet).  The Teradata Data

The Disaster Recovery Solution for the Oracle Exalogic Machine and Oracle Exadata Database Machine builds upon these well-established disaster protection solutions for Oracle

Note that while the Exadata Database Machine is specifically targeted at Oracle-only workloads running on the integrated Oracle servers (the Exadata Database Machine meets

As a proof-of-concept to evaluate the benefits of moving the Oracle E-Business Suite (EBS) database tier to the Oracle Exadata Database Machine, we ported a real world

• Smart Flash Log feature transparently uses Flash as a parallel write cache to disk controller cache. • Whichever write completes first wins (disk