Oracle Data Warehousing Masterclass
Mark Rittman, Director, Rittman Mead Consulting
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Who Am I?
• Oracle BI&W Architecture and Development Specialist
• Co-Founder of Rittman Mead Consulting
‣ Oracle BI&W Project Delivery Specialists
• 10+ years with Discoverer, OWB etc
• Oracle ACE Director, ACE of the Year 2005
• Writer for OTN and Oracle Magazine
• Longest-running Oracle blog
‣ http://www.rittmanmead.com/blog
• Chair of UKOUG BIRT SIG
• Co-Chair of ODTUG BI&DW SIG
• Speaker at IOUG and BIWA events
Rittman Mead Consulting
• Oracle BI&DW Project Specialists providing consulting, training and support
• Clients in the UK, USA, Europe, Middle-East
• Voted UKOUG BI Partner of the Year 2008
• Exhibitors at UKOUG 2008, Stand 90
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
So What Is A Data Warehouse?
• Store of historical information optimized for running queries
• Follows common design patterns: Inmon or Kimball
• Subject orientated, integrated, non-volatile and time-variant
‣ but there can be exceptions!
• High volumes of data
• Tuned and optimized for queries
Database Key Features for Data Warehousing
• Partitioning
• Parallel Query & DML
• Segment Compression
• Bitmap Indexes
• Materialized Views
• Star Transformations
• OLAP Option
• Data Mining Option
• Oracle Warehouse Builder
• Oracle Warehouse Builder
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
New Features in Oracle Database 11g and 10gR2
• More partitioning options
• Advanced Compression
• Embedding of Oracle Warehouse Builder
• Cube Organized Materialized Views
• ADDM, AWR, ASH and the OEM Performance Views and Advisors
• Just Announced at Oracle Open World 2008 : Exadata Storage Server
Data Warehouses Are Often Very Large
• Data Warehouse databases are often very large
‣ Usually >100Gb, often >10Tb, now >1 Petabyte is not unknown
• Large databases require special handling
‣ Because they take a long time to query
‣ Because they take a long time to backup
‣ Because they take up a lot of disk space
‣ Because they take a long time to load with data
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Oracle’s Support for Very Large Databases (VLDB)
• The Partitioning Option
• Segment Compression
• The Advanced Compression Option
• Parallel DML
• Automatic Storage Management
Oracle Partitioning Option
• Splits tables into lots of smaller tables
• Partitioned table still behaves as one table
• Partitions can be backed-up, made read-only, moved on to archive disks
• Queries that only require a subset of data will
only consider the relevant table partitions (“partition pruning”)
• Queries that only require a subset of data will only
join the relevant partitions (“partition-wise joins”)
• New data can be prepared offline and swapped in to
an empty partitiion (“partition exchange”)
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Partitioning Methods Up to Oracle 10g
• Range Partitioning
‣ Rolling window operations
• Hash Partitioning
‣ Increased parallelism
• List Partitioning
‣ Ease of maintenance operations
• Composite
‣ Range-Hash
‣ Range-List
New Partitioning Features in Oracle Database 11g
• Interval Partitioning
‣ Automatic creation of range-based partitions
• REF Partitioning
‣ Partition detail table based on the master-table key
• Virtual-Column Based Partitioning
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Interval Partitioning
• Interval Partitions
‣ Automatically creates partitions as they are needed
‣ Saves scripting the creation of new partitions each day/week/month
Interval Partitioning Details
• Specified by using the INTERVAL clause in the CREATE TABLE statement
• Technically an extension to range partitioning, requires at least one range partition,
range key value determines the range high point (“transition point”)
• Database automatically creates interval partitions beyond the transition point
• Local indexes can also be created along with new partitions
• Partitioning key can only be a single column, and either DATE or NUMBER datatype
• STORE IN clause lets you specify which tablespaces are used (round-robin)
• Partition names are system generated
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Interval Partitioning Example
• Partition P0 is the transition point, all interval partitions will be created from this date
onwards (> 1st Feb 2006)
CREATE TABLE sales
( prod_id NUMBER(6) , cust_id NUMBER , time_id DATE , channel_id CHAR(1) , promo_id NUMBER(6) , quantity_sold NUMBER(3) , amount_sold NUMBER(10,2) )
PARTITION BY RANGE (time_id)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );
Interval Partitioning Segment Creation
CREATE TABLE sales
( prod_id NUMBER(6) , cust_id NUMBER , time_id DATE , channel_id CHAR(1) , promo_id NUMBER(6) , quantity_sold NUMBER(3) , amount_sold NUMBER(10,2) )
PARTITION BY RANGE (time_id)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Interval Partitioning Segment Creation
INSERT INTO sales (time_id, prod_id, quantity_sold, .... VALUES TO_DATE(‘01-03-2006’), 146, 17, ....); 16
Demonstration
Interval Partitioning
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
REF Partitioning
• Partition a table based on a column in another table
• Partition an detail table by the master table primary key
• Useful, but actually not relevant for star schemas
‣ The wrong way around, we want to partition the dimensions by the fact keys
REF Partitioning Example
CREATE TABLE orders
( order_id NUMBER(12), order_date DATE,
order_mode VARCHAR2(8), order_total NUMBER(8,2),
CONSTRAINT orders_pk PRIMARY KEY(order_id) )
PARTITION BY RANGE(order_date)
( PARTITION Q1_2005 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY')), PARTITION Q2_2005 VALUES LESS THAN (TO_DATE('01-JUL-2005','DD-MON-YYYY')), PARTITION Q3_2005 VALUES LESS THAN (TO_DATE('01-OCT-2005','DD-MON-YYYY')), PARTITION Q4_2005 VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) );
CREATE TABLE order_items
( order_id NUMBER(12) NOT NULL, line_item_id NUMBER(3) NOT NULL, product_id NUMBER(6) NOT NULL, quantity NUMBER(8),
CONSTRAINT order_items_fk
FOREIGN KEY(order_id) REFERENCES orders(order_id) )
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Virtual Column-Based Partitioning
• A new feature in Oracle Database 11g is “Virtual Columns”
• Allows you to create additional table columns that are based on expressions
‣ Purely virtual, metadata only
‣ Can have statistics gathered for them
• This column can then be used as the partition key for a partitioning clause
CREATE TABLE accounts
( acc_no NUMBER(12), acc_name VARCHAR2(20),
acc_branch NUMBER(2) GENERATED ALWAYS AS TO_NUMBER (SUBSTR(TO_CHAR(ACC_NO),1,2))
)
PARTITION BY LIST (acc_branch)
Segment Compression
• Segments (tables) can be compressed
• Compression can be specified at the partition level
• Typical compression ratios are 3:1 to 5:1
‣ Particularly suited to DW schemas as lots of redundant/repeated data
• Has two main benefits:
‣ Reduces the amount of disk space required for the data warehouse
‣ Packs more rows of data into a typical block - good for full table scans
• Historically, only certain operations load data in a compressed form
Compressed tables are mainly used for read-only operations
‣ Update, inserts and deletes are supported, but rows will be permanently decompressed
INSERT */+APPEND */
CREATE TABLE ... AS SELECT ALTER TABLE ... MOVE
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
New In Oracle Database 11g : Advanced Compression
• Oracle Database 11g introduces the Advanced Compression Option
• Provides compression for the following data types
‣ OLTP relational data (supports INSERT, UPDATE, DELETE and retains compression)
‣ Unstructured data through SecureFiles (replacement for LOBs)
‣ Backups
• OLTP performance is unaffected as the blocks
can be read in their compressed form
‣ Performance is actually likely to improve as more data per block (reduced I/O)
Enabling OLTP Compression in Tables
• Oracle 11g extends the COMPRESS clause to enable OLTP compression
• DW compression is enabled with COMPRESS FOR DIRECT_LOAD OPERATIONS
• OLTP compression can now be enabled using COMPRESS FOR ALL OPERATIONS
CREATE TABLE emp ( emp_id NUMBER
, first_name VARCHAR2(128) , last_name VARCHAR2(128)
) COMPRESS FOR ALL OPERATIONS; CREATE TABLE emp (
emp_id NUMBER
, first_name VARCHAR2(128) , last_name VARCHAR2(128)
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
How OLTP Compression Works
• Table is initially uncompressed when loaded
• When PCTFREE level reached, compression is triggered
• New inserts are added uncompressed
• When PCTFREE level reached again, compression is triggered again
• DML still causes block uncompression
but data is recompressed when
PCTFREE is reached again
Overhead Free Space Uncompressed Compressed Inserts are uncompressed
Block usage reaches PCTFREE – triggers Compression
Inserts are again
uncompressed
Block usage reaches PCTFREE – triggers Compression
Demonstration
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
ETL Enhancements in Oracle 10g and 11g
• Incorporation of Oracle Warehouse Builder into 11g Installer
• DML Error Logging (10gR2+)
• Previous releases introduced SQL ETL functions
‣ Merge
‣ Multi-Table Insert
‣ Table Functions
‣ etc
DML Error Logging
• Prior to Oracle Database 10gR2, loading data containing errors was handled two ways
‣ Process the data prior to loading strip out errors, then load
‣ Load the data using PL/SQL and the SAVE EXCEPTIONS clause
• DML Error Logging (Oracle 10gR2+) provides a third option
‣ An Error Logging table is created, data is INSERTed using the LOG ERRORS clause
‣ Rows causing errors are then loaded into the error log
DBMS_ERRLOG.CREATE_ERROR_LOG (
dml_table_name IN VARCHAR2,
err_log_table_name IN VARCHAR2 := NULL, err_log_table_owner IN VARCHAR2 := NULL, err_log_table_space IN VARCHAR2 := NULL, skip_unsupported IN BOOLEAN := FALSE); LOG ERRORS [INTO [schema.]table]
[ (simple_expression) ]
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
DML Error Logging
• DML Error Logging will catch the following load errors:
‣ Column values that are too large
‣ Constraint violations (NOT NULL, unique, referential, and check constraints), except in certain circumstances detailed below
‣ Errors raised during trigger execution
‣ Errors resulting from type conversion between a column in a subquery and the corresponding column of the table
‣ Partition mapping errors
• DML Error Logging will NOT catch the following errors:
‣ Violated deferred constraints
‣ Out-of-space errors
‣ Any direct-path INSERT operation (INSERT or MERGE) that raises a unique constraint or index violation
‣ Any UPDATE operation (UPDATE or MERGE) that raises a unique constraint or index violation
DML Error Logging Example
SQL> BEGIN 2 DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_TARGET'); 3 END; 4 / SQL> INSERT /*+ APPEND */ 2 INTO sales_target 3 SELECT * 4 FROM sales_src 5 LOG ERRORS6 REJECT LIMIT UNLIMITED 7 ; 918834 rows created. SQL> SELECT count(*) 2 FROM err$_sales_target 3 ; COUNT(*) --- 9
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
DML Error Logging Example
SQL> COLUMN ora_err_mesg$ FORMAT A50 SQL> SELECT ora_err_number$ 2 , ora_err_mesg$ 3 FROM err$_sales_target 4 ; ORA_ERR_NUMBER$ ORA_ERR_MESG$ ---
---2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST"."SALES_TARGET"."PROMO_ID") Elapsed: 00:00:00.06
Demonstration
DML Error Logging
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Incorporation of OWB Into 11g Database Install
• Oracle’s Data Warehouse design & build tool
• Now at version 11gR1, 11gR2 due 2008/2009
• From 11gR1, OWB is automatically
installed into RDBMS database home
• Repository schema automatically created
• Repository now divided into “workspaces”
• Other functionality much the same
• 11gR2 release is next major revision
‣ Support for ODI code templates
‣ Heterogeneous support
‣ OBIEE support
‣ CDC support
Oracle Database Machine and Oracle Exadata Storage Server
• New hardware initiative in association with HP
• Dedicated storage server, filters data at disk level
• Balanced disk and network capacity
• New enhancements delivered through
11.1.0.7 patchset
• Exadata Storage Server available
standalone, or as part of
HP Oracle Database Machine appliance
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Exadata “Smart Scan” Processing
• Exadata Storage Server “Cells” offload
the predicate evaluation from the DB server
‣ Only returns relevant rows
‣ Only returns relevant columns
‣ Join filtering
‣ Incremental backup filtering
• Data reduction to around 10%
• Reduction in CPU load
• Filters are pushed to the disk unit
Oracle’s In-Database OLAP Engine : The OLAP Option
• An option to Oracle Database Enterprise Edition, now at version 11g
• A alternative summary management solution for SQL-based
business intelligence applications
‣ An alternative to table-based materialized views, offering improved query performance and
fast, incremental updates
• A fully-featured multidimensional OLAP server
‣ Excellent query performance for ad-hoc and unpredictable queries
‣ Enhances the analytic content of business intelligence applications through new calculation capabilities
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Oracle OLAP Storage Characteristics
• Unlike relational storage, OLAP servers store data in multi-dimensional arrays
‣ Structured around business rules and business concepts
‣ Dimensions, facts, hierarchies, aggregation rules etc
• Can be faster to access “cells” of data for ad-hoc analysis
• Summary data stored in the cube, along with detail
• Offset addressing, no need for joins
• OLAP engines provide advanced analytics
‣ Time-series analysis
‣ Non-additive calculations
‣ Financial calculations
‣ Statistical forcasting, Models
Traditional Way of Accessing a Data Warehouse
• Query tools access star schema
stored in an Oracle data warehouse
• Most queries at a summary level
• Summary queries against
detail-level data can be expensive to process
‣ Aggregation
‣ Calculations
• Hence the need for summarizations,
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Materialized Views and Automatic Query Rewrite
• Most DW customers use
materialized views to improve summary performance
• Define appropriate summaries
based on query patterns
• Each summary is typically
defined at a particular grain
‣ Sales by month and category
‣ Profits by city, quarter and channel
• The CBO automatically rewrites queries
to use materialized views as appropriate
The Materialized View Challenge
• Creating materialized views to support
ad-hoc query patterns is challenging
‣ Managing multiple MV objects
‣ Ensuring they rewrite properly
‣ Ensuring they refresh properly
• Users expect excellent query response
time across any summary
• Potentially many MVs to manage
• Practical limitations on the number of
MVs you can manage, the CBO can consider and you can built in time
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
New in 11g : Cube Organized Materialized Views
• Summaries can now be stored
in a single Oracle OLAP multidimensional cube
• A single cube can contain the
equivalent of thousands of summarycombinations
• The Oracle 11g CBO treats
OLAP cubes as MVs and rewrites queries to access cubes
transparently
• Cubes refreshed using standard
MV procedures
Positioned as the Oracle Data Warehouse Enhancer
• Transparent integration with SQL
• Swap-in replacement for MVs
• OLAP dimension, fact data also accessible directly via SQL views if rewrite is
not appropriate
‣ Calculations in cube
not present in relational source data
‣ CBO is aware of
nature of SQL OLAP views, joins data
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Demonstration
Cube Organized Materialized Views
Optimizing and Tuning the Data Warehouse
• Prior to Oracle 10g, data warehouse tuning was largely a manual process
‣ 10046 trace files for session-level tuning, statspack for system-level tuning
• Oracle Database 10g and 11g come with a number of “Advisors”
• Provide insights into database performance, bottlenecks, potential resolutions
• Automatic recording of diagnostic data
• Require the Tuning and Diagnostics Pack
• Several “in-process” Advisors
‣ Automatic Workload Repository
‣ Automatic Session History
‣ Automatic Database Diagnostic Monitor
• A number of pages on Oracle DB Console
‣ Performance page
‣ Advisor Central Intelligent Infrastructure Application & SQL Management System Resource Management Space Management Backup & Recovery
Management Storage Management
Database Management
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Automatic Workload Repository
• Successor to Statpack (though not free)
• Collects performance-related statistics on a regular (1-hour default) basis
• Monitors activity at the system level
• Periodically flush coarser-grain information to disk - in a self-managed tablespace,
readily available & real-time accessible when needed
• On by default - flush to disk every 30 min, keep for 7 days
• New Oracle Background – MMON, SYSAUX is the home tablespace
• Scripts:
‣ Awrrpt.sql – Default AWR Report
‣ Ashrpt.sql – Report on ASH Sampling
‣ Awrsqrpt.sql – Detailed report on SQL
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Active Session History
• Circular buffer that records statistics on active sessions
• Used for analyzing wait events for individual sessions
• Alternative to 10046 Extended SQL Tracing
• V$SESSION_WAIT with a bit of historical data
Automatic Database Diagnostic Monitor (ADDM)
• Performance expert in a box; based around “response-time” approach
• Provides database-wide performance diagnostic, including RAC
• Automatically runs at every capture of workload repository snapshot
• Provides impact and benefit analysis
• Provides Information
vs. raw data
• Integrates all components
using the Time Model
• Uses AWR snapshots
• Uses MMON to generate
reports every 30 minutes
• Report format is very similar
to STATSPACK, only better
Intelligent Infrastructure Application & SQL Management System Resource Management Space Management Backup & Recovery
Management Storage Management
Database Management
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Oracle DB Console Performance Features
• Performance tab provides an
overview of server and database usage
• Host activity (CPU usage on Windows,
Run Queue on Unix)
• Average Active Sessions (from ASH)
• Throughput, I/O, Parallel Execution
and Services
• Links to other advisors, tasks
Average Active Sessions
• Provides detail on the waits within the selected wait class
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
View SQL and Execution Plan
• View SQL, Execution Plan for highlighted SQL statements
• View as diagram or as text
Create and View ADDM Reports
• Create and view ADDM reports
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
View Throughput, I/O, Parallel Query, Services
• View instance throughput, correlate with server and session load
• View disk I/O, degree of parallelism for queries
Tracing Individual Sessions
• The DB Console reports shown previously are system-wide, equivalent to statspack
• Sometimes you would want to “trace” an individual session, using ASH
• Accomplished through setting a CLIENT_ID for the individual session
• Session can then be spotted in the
“Top Consumers” report
• ASH Report can be generated
for this CLIENT_ID
exec dbms_session.set_identifier('SQL_TEST') select sum(sales)
from customers, sales s where ....
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Generating Session Level ASH Report
• The Performance page has a Generate ASH Report button next to the
Average Active Sessions graph
• ASH Report can then be filtered by the CLIENT_ID
• Review session-level waits, top SQL etc to identify issues
The SQL Access Advisor
• Part of the 11g Tuning Pack (add-on to Enterprise Edition)
• Recommends use of indexes, materialized views and partitions
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
The SQL Performance Analyzer
• Provides an environment for replaying a set of SQL statements in initial and
changed environments, reports on impact of changes
• Useful for testing changes to SQL statements, DB parameters, indexes etc
SQL Tuning Sets
• Captured workloads that can be used as the input for tuning exercises
• Can grab SQL from the SQL cache, or listen over a defined period for SQL
• Stored in the database as a database object
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Demonstration
Database Control Advisors and Performance Features
Over The Past Two Hours, We Have Covered...
• Introduction to Oracle 11g Data Warehousing
• New Extraction, Transformation & Load features in 10g and 11g
• New Partitioning and Compression features
• Summarization and Oracle OLAP
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
And Finally...
• Thank you all for attending
• I hope the two hours have been of interest to you
• Please remember to complete the course evaluations
• Leave your contact details with me to forward on the slides
• Have a safe journey home
• Goodbye!