• No results found

Oracle Data Warehousing Masterclass

N/A
N/A
Protected

Academic year: 2021

Share "Oracle Data Warehousing Masterclass"

Copied!
61
0
0

Loading.... (view fulltext now)

Full text

(1)

Oracle Data Warehousing Masterclass

Mark Rittman, Director, Rittman Mead Consulting

(2)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Who Am I?

• Oracle BI&W Architecture and Development Specialist

• Co-Founder of Rittman Mead Consulting

‣ Oracle BI&W Project Delivery Specialists

• 10+ years with Discoverer, OWB etc

• Oracle ACE Director, ACE of the Year 2005

• Writer for OTN and Oracle Magazine

• Longest-running Oracle blog

‣ http://www.rittmanmead.com/blog

• Chair of UKOUG BIRT SIG

• Co-Chair of ODTUG BI&DW SIG

• Speaker at IOUG and BIWA events

(3)

Rittman Mead Consulting

• Oracle BI&DW Project Specialists providing consulting, training and support

• Clients in the UK, USA, Europe, Middle-East

• Voted UKOUG BI Partner of the Year 2008

• Exhibitors at UKOUG 2008, Stand 90

(4)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

So What Is A Data Warehouse?

• Store of historical information optimized for running queries

• Follows common design patterns: Inmon or Kimball

• Subject orientated, integrated, non-volatile and time-variant

‣ but there can be exceptions!

• High volumes of data

• Tuned and optimized for queries

(5)

Database Key Features for Data Warehousing

• Partitioning

• Parallel Query & DML

• Segment Compression

• Bitmap Indexes

• Materialized Views

• Star Transformations

• OLAP Option

• Data Mining Option

• Oracle Warehouse Builder

• Oracle Warehouse Builder

(6)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

New Features in Oracle Database 11g and 10gR2

• More partitioning options

• Advanced Compression

• Embedding of Oracle Warehouse Builder

• Cube Organized Materialized Views

• ADDM, AWR, ASH and the OEM Performance Views and Advisors

Just Announced at Oracle Open World 2008 : Exadata Storage Server

(7)

Data Warehouses Are Often Very Large

• Data Warehouse databases are often very large

‣ Usually >100Gb, often >10Tb, now >1 Petabyte is not unknown

• Large databases require special handling

‣ Because they take a long time to query

‣ Because they take a long time to backup

‣ Because they take up a lot of disk space

‣ Because they take a long time to load with data

(8)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Oracle’s Support for Very Large Databases (VLDB)

• The Partitioning Option

• Segment Compression

• The Advanced Compression Option

• Parallel DML

• Automatic Storage Management

(9)

Oracle Partitioning Option

• Splits tables into lots of smaller tables

• Partitioned table still behaves as one table

• Partitions can be backed-up, made read-only, moved on to archive disks

• Queries that only require a subset of data will

only consider the relevant table partitions (“partition pruning”)

• Queries that only require a subset of data will only

join the relevant partitions (“partition-wise joins”)

• New data can be prepared offline and swapped in to

an empty partitiion (“partition exchange”)

(10)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Partitioning Methods Up to Oracle 10g

• Range Partitioning

‣ Rolling window operations

• Hash Partitioning

‣ Increased parallelism

• List Partitioning

‣ Ease of maintenance operations

• Composite

‣ Range-Hash

‣ Range-List

(11)

New Partitioning Features in Oracle Database 11g

• Interval Partitioning

‣ Automatic creation of range-based partitions

• REF Partitioning

‣ Partition detail table based on the master-table key

• Virtual-Column Based Partitioning

(12)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Interval Partitioning

• Interval Partitions

‣ Automatically creates partitions as they are needed

‣ Saves scripting the creation of new partitions each day/week/month

(13)

Interval Partitioning Details

• Specified by using the INTERVAL clause in the CREATE TABLE statement

• Technically an extension to range partitioning, requires at least one range partition,

range key value determines the range high point (“transition point”)

• Database automatically creates interval partitions beyond the transition point

• Local indexes can also be created along with new partitions

• Partitioning key can only be a single column, and either DATE or NUMBER datatype

STORE IN clause lets you specify which tablespaces are used (round-robin)

• Partition names are system generated

(14)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Interval Partitioning Example

• Partition P0 is the transition point, all interval partitions will be created from this date

onwards (> 1st Feb 2006)

CREATE TABLE sales

( prod_id NUMBER(6) , cust_id NUMBER , time_id DATE , channel_id CHAR(1) , promo_id NUMBER(6) , quantity_sold NUMBER(3) , amount_sold NUMBER(10,2) )

PARTITION BY RANGE (time_id)

INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))

( PARTITION p0 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );

(15)

Interval Partitioning Segment Creation

CREATE TABLE sales

( prod_id NUMBER(6) , cust_id NUMBER , time_id DATE , channel_id CHAR(1) , promo_id NUMBER(6) , quantity_sold NUMBER(3) , amount_sold NUMBER(10,2) )

PARTITION BY RANGE (time_id)

INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))

(16)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Interval Partitioning Segment Creation

INSERT INTO sales (time_id, prod_id, quantity_sold, .... VALUES TO_DATE(‘01-03-2006’), 146, 17, ....); 16

(17)

Demonstration

Interval Partitioning

(18)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

REF Partitioning

• Partition a table based on a column in another table

• Partition an detail table by the master table primary key

• Useful, but actually not relevant for star schemas

‣ The wrong way around, we want to partition the dimensions by the fact keys

(19)

REF Partitioning Example

CREATE TABLE orders

( order_id NUMBER(12), order_date DATE,

order_mode VARCHAR2(8), order_total NUMBER(8,2),

CONSTRAINT orders_pk PRIMARY KEY(order_id) )

PARTITION BY RANGE(order_date)

( PARTITION Q1_2005 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY')), PARTITION Q2_2005 VALUES LESS THAN (TO_DATE('01-JUL-2005','DD-MON-YYYY')), PARTITION Q3_2005 VALUES LESS THAN (TO_DATE('01-OCT-2005','DD-MON-YYYY')), PARTITION Q4_2005 VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) );

CREATE TABLE order_items

( order_id NUMBER(12) NOT NULL, line_item_id NUMBER(3) NOT NULL, product_id NUMBER(6) NOT NULL, quantity NUMBER(8),

CONSTRAINT order_items_fk

FOREIGN KEY(order_id) REFERENCES orders(order_id) )

(20)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Virtual Column-Based Partitioning

• A new feature in Oracle Database 11g is “Virtual Columns”

• Allows you to create additional table columns that are based on expressions

‣ Purely virtual, metadata only

‣ Can have statistics gathered for them

• This column can then be used as the partition key for a partitioning clause

CREATE TABLE accounts

( acc_no NUMBER(12), acc_name VARCHAR2(20),

acc_branch NUMBER(2) GENERATED ALWAYS AS TO_NUMBER (SUBSTR(TO_CHAR(ACC_NO),1,2))

)

PARTITION BY LIST (acc_branch)

(21)

Segment Compression

• Segments (tables) can be compressed

• Compression can be specified at the partition level

• Typical compression ratios are 3:1 to 5:1

‣ Particularly suited to DW schemas as lots of redundant/repeated data

• Has two main benefits:

‣ Reduces the amount of disk space required for the data warehouse

‣ Packs more rows of data into a typical block - good for full table scans

• Historically, only certain operations load data in a compressed form

Compressed tables are mainly used for read-only operations

‣ Update, inserts and deletes are supported, but rows will be permanently decompressed

INSERT */+APPEND */

CREATE TABLE ... AS SELECT ALTER TABLE ... MOVE

(22)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

New In Oracle Database 11g : Advanced Compression

• Oracle Database 11g introduces the Advanced Compression Option

• Provides compression for the following data types

‣ OLTP relational data (supports INSERT, UPDATE, DELETE and retains compression)

‣ Unstructured data through SecureFiles (replacement for LOBs)

‣ Backups

• OLTP performance is unaffected as the blocks

can be read in their compressed form

‣ Performance is actually likely to improve as more data per block (reduced I/O)

(23)

Enabling OLTP Compression in Tables

• Oracle 11g extends the COMPRESS clause to enable OLTP compression

• DW compression is enabled with COMPRESS FOR DIRECT_LOAD OPERATIONS

• OLTP compression can now be enabled using COMPRESS FOR ALL OPERATIONS

CREATE TABLE emp ( emp_id NUMBER

, first_name VARCHAR2(128) , last_name VARCHAR2(128)

) COMPRESS FOR ALL OPERATIONS; CREATE TABLE emp (

emp_id NUMBER

, first_name VARCHAR2(128) , last_name VARCHAR2(128)

(24)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

How OLTP Compression Works

• Table is initially uncompressed when loaded

• When PCTFREE level reached, compression is triggered

• New inserts are added uncompressed

• When PCTFREE level reached again, compression is triggered again

• DML still causes block uncompression

but data is recompressed when

PCTFREE is reached again

Overhead Free Space Uncompressed Compressed Inserts are uncompressed

Block usage reaches PCTFREE – triggers Compression

Inserts are again

uncompressed

Block usage reaches PCTFREE – triggers Compression

(25)

Demonstration

(26)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

ETL Enhancements in Oracle 10g and 11g

• Incorporation of Oracle Warehouse Builder into 11g Installer

• DML Error Logging (10gR2+)

• Previous releases introduced SQL ETL functions

‣ Merge

‣ Multi-Table Insert

‣ Table Functions

‣ etc

(27)

DML Error Logging

• Prior to Oracle Database 10gR2, loading data containing errors was handled two ways

‣ Process the data prior to loading strip out errors, then load

‣ Load the data using PL/SQL and the SAVE EXCEPTIONS clause

• DML Error Logging (Oracle 10gR2+) provides a third option

‣ An Error Logging table is created, data is INSERTed using the LOG ERRORS clause

‣ Rows causing errors are then loaded into the error log

DBMS_ERRLOG.CREATE_ERROR_LOG (

dml_table_name IN VARCHAR2,

err_log_table_name IN VARCHAR2 := NULL, err_log_table_owner IN VARCHAR2 := NULL, err_log_table_space IN VARCHAR2 := NULL, skip_unsupported IN BOOLEAN := FALSE); LOG ERRORS [INTO [schema.]table]

[ (simple_expression) ]

(28)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

DML Error Logging

• DML Error Logging will catch the following load errors:

‣ Column values that are too large

‣ Constraint violations (NOT NULL, unique, referential, and check constraints), except in certain circumstances detailed below

‣ Errors raised during trigger execution

‣ Errors resulting from type conversion between a column in a subquery and the corresponding column of the table

‣ Partition mapping errors

• DML Error Logging will NOT catch the following errors:

‣ Violated deferred constraints

‣ Out-of-space errors

‣ Any direct-path INSERT operation (INSERT or MERGE) that raises a unique constraint or index violation

‣ Any UPDATE operation (UPDATE or MERGE) that raises a unique constraint or index violation

(29)

DML Error Logging Example

SQL> BEGIN 2 DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_TARGET'); 3 END; 4 / SQL> INSERT /*+ APPEND */ 2 INTO sales_target 3 SELECT * 4 FROM sales_src 5 LOG ERRORS

6 REJECT LIMIT UNLIMITED 7 ; 918834 rows created. SQL> SELECT count(*) 2 FROM err$_sales_target 3 ; COUNT(*) --- 9

(30)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

DML Error Logging Example

SQL> COLUMN ora_err_mesg$ FORMAT A50 SQL> SELECT ora_err_number$ 2 , ora_err_mesg$ 3 FROM err$_sales_target 4 ; ORA_ERR_NUMBER$ ORA_ERR_MESG$ ---

---2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated

1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST"."SALES_TARGET"."PROMO_ID") Elapsed: 00:00:00.06

(31)

Demonstration

DML Error Logging

(32)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Incorporation of OWB Into 11g Database Install

• Oracle’s Data Warehouse design & build tool

• Now at version 11gR1, 11gR2 due 2008/2009

• From 11gR1, OWB is automatically

installed into RDBMS database home

• Repository schema automatically created

• Repository now divided into “workspaces”

• Other functionality much the same

• 11gR2 release is next major revision

‣ Support for ODI code templates

‣ Heterogeneous support

‣ OBIEE support

‣ CDC support

(33)

Oracle Database Machine and Oracle Exadata Storage Server

• New hardware initiative in association with HP

• Dedicated storage server, filters data at disk level

• Balanced disk and network capacity

• New enhancements delivered through

11.1.0.7 patchset

• Exadata Storage Server available

standalone, or as part of

HP Oracle Database Machine appliance

(34)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Exadata “Smart Scan” Processing

• Exadata Storage Server “Cells” offload

the predicate evaluation from the DB server

‣ Only returns relevant rows

‣ Only returns relevant columns

‣ Join filtering

‣ Incremental backup filtering

• Data reduction to around 10%

• Reduction in CPU load

• Filters are pushed to the disk unit

(35)

Oracle’s In-Database OLAP Engine : The OLAP Option

• An option to Oracle Database Enterprise Edition, now at version 11g

• A alternative summary management solution for SQL-based

business intelligence applications

‣ An alternative to table-based materialized views, offering improved query performance and

fast, incremental updates

• A fully-featured multidimensional OLAP server

‣ Excellent query performance for ad-hoc and unpredictable queries

‣ Enhances the analytic content of business intelligence applications through new calculation capabilities

(36)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Oracle OLAP Storage Characteristics

• Unlike relational storage, OLAP servers store data in multi-dimensional arrays

‣ Structured around business rules and business concepts

‣ Dimensions, facts, hierarchies, aggregation rules etc

• Can be faster to access “cells” of data for ad-hoc analysis

• Summary data stored in the cube, along with detail

• Offset addressing, no need for joins

• OLAP engines provide advanced analytics

‣ Time-series analysis

‣ Non-additive calculations

‣ Financial calculations

‣ Statistical forcasting, Models

(37)

Traditional Way of Accessing a Data Warehouse

• Query tools access star schema

stored in an Oracle data warehouse

• Most queries at a summary level

• Summary queries against

detail-level data can be expensive to process

‣ Aggregation

‣ Calculations

• Hence the need for summarizations,

(38)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Materialized Views and Automatic Query Rewrite

• Most DW customers use

materialized views to improve summary performance

• Define appropriate summaries

based on query patterns

• Each summary is typically

defined at a particular grain

‣ Sales by month and category

‣ Profits by city, quarter and channel

• The CBO automatically rewrites queries

to use materialized views as appropriate

(39)

The Materialized View Challenge

• Creating materialized views to support

ad-hoc query patterns is challenging

‣ Managing multiple MV objects

‣ Ensuring they rewrite properly

‣ Ensuring they refresh properly

• Users expect excellent query response

time across any summary

• Potentially many MVs to manage

• Practical limitations on the number of

MVs you can manage, the CBO can consider and you can built in time

(40)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

New in 11g : Cube Organized Materialized Views

• Summaries can now be stored

in a single Oracle OLAP multidimensional cube

• A single cube can contain the

equivalent of thousands of summarycombinations

• The Oracle 11g CBO treats

OLAP cubes as MVs and rewrites queries to access cubes

transparently

• Cubes refreshed using standard

MV procedures

(41)

Positioned as the Oracle Data Warehouse Enhancer

• Transparent integration with SQL

• Swap-in replacement for MVs

• OLAP dimension, fact data also accessible directly via SQL views if rewrite is

not appropriate

‣ Calculations in cube

not present in relational source data

‣ CBO is aware of

nature of SQL OLAP views, joins data

(42)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Demonstration

Cube Organized Materialized Views

(43)

Optimizing and Tuning the Data Warehouse

• Prior to Oracle 10g, data warehouse tuning was largely a manual process

‣ 10046 trace files for session-level tuning, statspack for system-level tuning

• Oracle Database 10g and 11g come with a number of “Advisors”

• Provide insights into database performance, bottlenecks, potential resolutions

• Automatic recording of diagnostic data

• Require the Tuning and Diagnostics Pack

• Several “in-process” Advisors

‣ Automatic Workload Repository

‣ Automatic Session History

‣ Automatic Database Diagnostic Monitor

• A number of pages on Oracle DB Console

‣ Performance page

‣ Advisor Central Intelligent Infrastructure Application & SQL Management System Resource Management Space Management Backup & Recovery

Management Storage Management

Database Management

(44)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Automatic Workload Repository

• Successor to Statpack (though not free)

• Collects performance-related statistics on a regular (1-hour default) basis

• Monitors activity at the system level

• Periodically flush coarser-grain information to disk - in a self-managed tablespace,

readily available & real-time accessible when needed

• On by default - flush to disk every 30 min, keep for 7 days

• New Oracle Background – MMON, SYSAUX is the home tablespace

• Scripts:

‣ Awrrpt.sql – Default AWR Report

‣ Ashrpt.sql – Report on ASH Sampling

‣ Awrsqrpt.sql – Detailed report on SQL

(45)
(46)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Active Session History

• Circular buffer that records statistics on active sessions

• Used for analyzing wait events for individual sessions

• Alternative to 10046 Extended SQL Tracing

• V$SESSION_WAIT with a bit of historical data

(47)

Automatic Database Diagnostic Monitor (ADDM)

• Performance expert in a box; based around “response-time” approach

• Provides database-wide performance diagnostic, including RAC

• Automatically runs at every capture of workload repository snapshot

• Provides impact and benefit analysis

• Provides Information

vs. raw data

• Integrates all components

using the Time Model

• Uses AWR snapshots

• Uses MMON to generate

reports every 30 minutes

• Report format is very similar

to STATSPACK, only better

Intelligent Infrastructure Application & SQL Management System Resource Management Space Management Backup & Recovery

Management Storage Management

Database Management

(48)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Oracle DB Console Performance Features

Performance tab provides an

overview of server and database usage

• Host activity (CPU usage on Windows,

Run Queue on Unix)

• Average Active Sessions (from ASH)

• Throughput, I/O, Parallel Execution

and Services

• Links to other advisors, tasks

(49)

Average Active Sessions

• Provides detail on the waits within the selected wait class

(50)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

View SQL and Execution Plan

• View SQL, Execution Plan for highlighted SQL statements

• View as diagram or as text

(51)

Create and View ADDM Reports

• Create and view ADDM reports

(52)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

View Throughput, I/O, Parallel Query, Services

• View instance throughput, correlate with server and session load

• View disk I/O, degree of parallelism for queries

(53)

Tracing Individual Sessions

• The DB Console reports shown previously are system-wide, equivalent to statspack

• Sometimes you would want to “trace” an individual session, using ASH

• Accomplished through setting a CLIENT_ID for the individual session

• Session can then be spotted in the

“Top Consumers” report

• ASH Report can be generated

for this CLIENT_ID

exec dbms_session.set_identifier('SQL_TEST') select sum(sales)

from customers, sales s where ....

(54)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Generating Session Level ASH Report

• The Performance page has a Generate ASH Report button next to the

Average Active Sessions graph

• ASH Report can then be filtered by the CLIENT_ID

• Review session-level waits, top SQL etc to identify issues

(55)

The SQL Access Advisor

• Part of the 11g Tuning Pack (add-on to Enterprise Edition)

• Recommends use of indexes, materialized views and partitions

(56)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

The SQL Performance Analyzer

• Provides an environment for replaying a set of SQL statements in initial and

changed environments, reports on impact of changes

• Useful for testing changes to SQL statements, DB parameters, indexes etc

(57)

SQL Tuning Sets

• Captured workloads that can be used as the input for tuning exercises

• Can grab SQL from the SQL cache, or listen over a defined period for SQL

• Stored in the database as a database object

(58)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

Demonstration

Database Control Advisors and Performance Features

(59)

Over The Past Two Hours, We Have Covered...

• Introduction to Oracle 11g Data Warehousing

• New Extraction, Transformation & Load features in 10g and 11g

• New Partitioning and Compression features

• Summarization and Oracle OLAP

(60)

T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com

And Finally...

• Thank you all for attending

• I hope the two hours have been of interest to you

• Please remember to complete the course evaluations

• Leave your contact details with me to forward on the slides

• Have a safe journey home

• Goodbye!

(61)

Oracle Data Warehousing Masterclass

Mark Rittman, Director, Rittman Mead Consulting

References

Related documents