South Florida Oracle User Group
HP Oracle Database Platform / Exadata Appliance –
Extreme Data Warehousing
March 26, 2009
Shyam Varan Nath
President, Oracle BIWA SIG &
Founder Exadata SIG
Agenda
The Problem – Storage Bottleneck for Large Databases
Introduction to Data Warehouse Appliances
Market Landscape
The Solution - Oracle Database Platform and Exadata Storage
Technical Details
Summary
About Myself….
A Certified DBA (OCP) on 4 different Database versions – since 1998
Former member of Oracle Corporation - BI Consulting Practice
Experience in Oracle Data Warehousing, Business Intelligence
(OBIEE) and Data Mining
Founder and President of Oracle BIWA SIG (http://OracleBIWA.org), Exadata SIG
Received IOUG Oracle Contribution Award in 2007
Frequent speaker at Oracle OpenWorld (2003, 06, 07, 08), NYOUG (June 06, Sep 06, Sep 08, Mar 09), IOUG/Collaborate (2005, 06, 08), NOUG (2006), SFOUG (2007), ODTUG (2008) on topics ranging from Database to BI.
Bachelors from Indian Institute of Technology (IIT), MBA and MS from
Florida Atlantic University
Based in South FL since 1995
The Problem – Storage Bottleneck for Large Databases
Today most databases run on computers with one or many powerful CPUs
Most large databases are I/O bound rather than CPU bound
The large storage systems are not able to feed data to the database server at a fast enough rate
How can we make the storage more intelligent?
Database Engine or Storage or the Interconnect?
Business imperative
Exadata Storage:
The next step in VLDW Technology
Over the past 12+ years, Oracle has steadily introduced major architectural advances for large database support
Data warehouses have grown exponentially with these new technologies
[Timeline figure, 1995 to 2008]
Years: 1995, 1997, 1999, 2001, 2003, 2005, 2008
Releases: Oracle Release 7.3, Oracle8, Oracle8i, Oracle9i, Oracle9iR2, Oracle10g, Oracle11g
Features: Parallel Execution, Range Partitioning, Composite Partitioning, Real Application Clusters, Compression, Automatic Storage Management, Exadata
Milestones: First 1TB Database built in lab; First 1TB customer: Acxiom; First 10TB customer: Amazon.com; First 30TB customer: France Telecom; First 100TB customer: Yahoo!; Over 100 Terabyte customers
How Big is the Data Warehouse Storage Problem?
ABC Inc.’s Data Warehouse is approaching 12 terabytes in size and growing by 100% every year! Storage and backup of data alone is costing 24% of the IT budget.
How much are we spending in Storage?
What are the other impacts of huge storage needs?
Today
Tomorrow
Total IT budget is $5m and cost is expected to double next year at the given rate
Annual storage cost $1.2 m
Not only is the Data Warehouse growing unmanageable in size, but queries are also slowing down, leading to lost orders
Information Retrieval is slow
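The ABC Inc. figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and reading "cost is expected to double" as storage cost growing in proportion to data volume is an assumption, not something the slide states.

```python
# Figures quoted on the ABC Inc. slide
it_budget_usd = 5_000_000
storage_share_pct = 24        # storage and backup as a percentage of the IT budget
warehouse_tb = 12
annual_growth_pct = 100

# 24% of $5m gives the annual storage cost
storage_cost_usd = it_budget_usd * storage_share_pct // 100


def project(value, growth_pct, years):
    """Compound a value by growth_pct per year for the given number of years."""
    for _ in range(years):
        value = value * (100 + growth_pct) // 100
    return value


size_next_year_tb = project(warehouse_tb, annual_growth_pct, 1)
# Assumption: storage cost scales linearly with data volume
cost_next_year_usd = project(storage_cost_usd, annual_growth_pct, 1)

print(f"Storage cost today: ${storage_cost_usd:,}")
print(f"Warehouse size next year: {size_next_year_tb} TB")
print(f"Storage cost next year: ${cost_next_year_usd:,}")
```

Under the same assumption, `project(warehouse_tb, annual_growth_pct, 3)` shows the warehouse reaching 96 TB after three years of doubling.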
What is causing the explosion of data in most enterprises?
Regulatory Compliance: Government regulations like SOX and HIPAA mandate storing historical data for a certain number of years
Multi media content: Bandwidth has become cheap, and increasing amounts of multimedia content are being generated and stored
Web 2.0: A new kind of data source, such as social networks and blogs, is leading to various forms of semi-structured and unstructured data. Some of this data is being stored in the database, some in ECM
Migration of Legacy Applications: As legacy applications from mainframes and other file-based databases are migrated to RDBMS, increasing volumes of data are being stored inside the database
Click-stream: Click-stream and personalization data continues to explode for online sites
Some Large Databases in use Today
•Yahoo's data needs are substantial.
•According to Hasan, VP of Data, the travel industry's Sabre system handles 50m events / day, credit card company Visa handles 120m events / day, and the New York Stock Exchange has handled over 225m events / day.
•Yahoo, he said, handles 24 billion events / day, fully two orders of magnitude more than these non-Internet companies.
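The "two orders of magnitude" claim can be checked against the quoted figures. A small sketch, with illustrative names; the NYSE figure is taken as exactly 225m even though the slide says "over 225m":

```python
import math

# Events per day as quoted on the slide
events_per_day = {
    "Sabre": 50_000_000,
    "Visa": 120_000_000,
    "NYSE": 225_000_000,  # slide says "over 225m"; treated as 225m here
}
yahoo_events = 24_000_000_000

# Ratio of Yahoo's daily events to each non-Internet system
ratios = {name: yahoo_events / n for name, n in events_per_day.items()}

# Order of magnitude of each ratio: floor(log10(ratio))
orders = {name: math.floor(math.log10(r)) for name, r in ratios.items()}

for name in events_per_day:
    print(f"Yahoo vs {name}: {ratios[name]:.0f}x (order of magnitude 10^{orders[name]})")
```

Each ratio falls between 100x and 1000x, which is consistent with the wording on the slide.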
Building on Oracle’s Leading Position
Number 1 in Data Warehousing!
Oracle 39.3%
IBM 21.7%
Microsoft 14.8%
Teradata 11.7%
Other 12.5%
Market Size is $6.7 Billion with 14.6% Growth YoY
Source: IDC, Aug 2008 – “Worldwide Data Warehouse Management Tools 2007 Vendor Shares”
Market Landscape
What does the market landscape of Data Warehouse appliances look like?
Business imperative
TERADATA, NETEZZA, ORACLE DATABASE PLATFORM, EXADATA STORAGE
DW Appliance: Data Storage, Data Process & Organization, Cost Benefit, User Experience, Competitive Advantage
Use of Data Compression reduces storage need by up to 5 times, reducing storage cost by up to 60%
The cost of additional license for Data Compression is $1 million. Total expected cost benefit is about $2 million per year
The users are able to retrieve information faster because query response time improves by up to 3 times
Ability to get results 3 times faster from the Data Warehouse will enhance Decision Support process and result in 20% more customer orders, adding $4 million to annual revenue
HP Oracle Database Machine:
The next step in DW Hardware Solutions
Custom
• Complete Flexibility
• Any OS, any platform
• Easy fit into a company’s IT standards
Reference Configurations
• Documented best-practice configurations for data warehousing
Optimized Warehouse
• Scalable systems installed and pre-configured: ready to run out-of-the-box
HP Oracle Database Machine
• Highest performance
• Pre-installed and pre-configured
• Sold by Oracle
Quote from TDWI
In any BI application, it’s always disk I/O that slows performance.
•Data Warehouses are mainly I/O bound rather than CPU bound
•Other VLDB techniques work with Exadata – such as partitioning and compression
Three Pronged Approach to Solve the Problem
•Faster Pipe – Infiniband
•More Pipes
•More Efficient use of the Data Pipe by Division of Work between the DB Grid and the Exadata Storage Server
HP Oracle Database Machine:
Extreme Performance
10-100X faster than conventional DW systems
High bandwidth: 14GB/sec of raw I/O throughput
>50GB/sec of raw business data can be processed with compression
High-bandwidth Infiniband network between Database Servers and Storage Servers
Efficient block access in Storage Servers
“Smart scan” processing
Data-intensive processing in the storage server
Compute-intensive processing in the database server
Less data transfer over the network
HP Oracle Database Machine:
Key Components
Database Server Grid
8 Servers, each consisting of:
• One HP DL 360-G5 with
• 2 Intel Quad-core processors
• 32 GB RAM
• 4 146GB SAS disks
• Dual-port Infiniband Host Channel Adapter (HCA)
• Oracle Enterprise Linux
• Oracle Database 11g Enterprise Edition with Real Application Clusters and Partitioning
Exadata Storage Server Grid
14 Servers, each consisting of:
• One HP DL180-G5 with
• 2 Intel Quad-core processors
• 8GB RAM
• 12 450GB SAS or 1TB SATA disks
• Dual-port Infiniband Host Channel Adapter (HCA)
• Oracle Enterprise Linux
• Oracle Exadata Storage Server Software
4 Infiniband Switches
Each with 24 ports
Division of Work
Exadata Storage Server
Implements data intensive processing directly in storage
– Scans tables and indexes filtering out data that is not relevant to a query
Compute intensive data processing remains in database servers
Joins, aggregation, statistics, data conversions, etc.
Smart Scans
Exadata cells implement smart scans to greatly reduce the data that needs to be processed by database
Only return relevant rows and columns to database
Offload predicate evaluation
Data reduction is usually very large
Column and row reduction often decrease data to
Traditional Scan Processing
Smart Scan Example: Telco wants to identify customers that spend more than $200 on a single phone call
With traditional storage, all database intelligence resides in the database hosts
Most data returned from storage is discarded by database
Discarded data consumes valuable resources, and impacts the performance of other workloads
SELECT customer_id FROM calls WHERE amount > 200;
Table Extents Identified
I/Os Issued
IOs Executed: 1 terabyte of data returned to hosts
DB Host reduces terabyte of data to 1000 customer names that are returned to client
Rows Returned
Exadata Smart Scan Processing
Only the relevant columns (customer_id) and required rows (where amount>200) are returned to database
CPU consumed by predicate evaluation is offloaded
Moving scan processing off the database frees CPU cycles and eliminates lots of unproductive messaging
Returns the needle, not the entire hay stack
SELECT customer_id FROM calls WHERE amount > 200;
Smart Scan Constructed And Sent To Cells
Smart Scan identifies rows and columns within terabyte table that
2MB of data returned to server
Consolidated Result Set Built From All Cells
Rows Returned
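The difference between the two scan paths can be illustrated with a toy model. This is only a sketch of the idea of predicate and column offload, not Exadata's implementation: the `Call` record, its `duration_sec` column, and both function names are hypothetical, and the 1 terabyte and 2MB figures are read in decimal units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    customer_id: int
    amount: float
    duration_sec: int  # hypothetical extra column that the query does not need


def traditional_scan(stored_rows):
    """Storage ships every row and column; the database host applies the filter."""
    shipped = list(stored_rows)  # everything crosses the network
    result = [row.customer_id for row in shipped if row.amount > 200]
    return result, len(shipped)


def smart_scan(stored_rows):
    """Storage applies the predicate and projection, then ships only matches."""
    shipped = [row.customer_id for row in stored_rows if row.amount > 200]
    return shipped, len(shipped)


calls = [
    Call(1, 12.5, 300),
    Call(2, 250.0, 7200),
    Call(3, 3.0, 60),
    Call(4, 410.0, 9000),
]

trad_result, trad_shipped = traditional_scan(calls)
smart_result, smart_shipped = smart_scan(calls)

# Slide figures: 1 terabyte shipped traditionally versus 2MB with a smart scan
reduction_factor = (1 * 10**12) // (2 * 10**6)

print(trad_result, trad_shipped)
print(smart_result, smart_shipped)
print(reduction_factor)
```

Both paths return the same customer IDs; only the amount of data shipped from storage differs, which is the point of the slide.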
Smart Scan Transparency
Smart Scans correctly handle complex cases including
Uncommitted data and locked rows
Chained rows
Compressed tables
National Language Processing
Date arithmetic
Regular expression searches
Partitioned tables
Smart scans are transparent to the application
No application or SQL changes required
Returned data is fully consistent and transactional
If a cell dies during a smart scan, the uncompleted portions
Data Flow Concepts
Concept of Data flow and producer – consumer relationships
Three kinds of data exchanges take place
– Exchange 1
– Exchange 2
– Exchange 3
Exchange 1 is flow of data within an Exadata Cell using iDB protocol, throughput is 60-80MB/sec per disk
Exchange 2 is between a single cell and Database grid (1Gb/sec)
Exchange 3 is between the Database grid and the Storage Grid (1.6 GB/sec)
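As a rough cross-check of Exchange 1, the per-disk rate can be multiplied by the 12 disks listed for each Exadata Storage Server on the Key Components slide. This sketch assumes all disks stream in parallel with no controller or bus limit, which is a simplification; the names are illustrative.

```python
# Assumptions: 12 disks per cell (Key Components slide),
# 60-80 MB/sec per disk (Exchange 1), perfectly parallel reads
DISKS_PER_CELL = 12
PER_DISK_MB_S = (60, 80)


def cell_throughput_mb_s(disks=DISKS_PER_CELL, per_disk=PER_DISK_MB_S):
    """Return the (low, high) aggregate disk throughput of one cell in MB/sec."""
    low, high = per_disk
    return disks * low, disks * high


low, high = cell_throughput_mb_s()
print(f"Aggregate disk throughput per cell: {low}-{high} MB/sec")
```

Under these assumptions, the upper end of the range comes close to 1 GB/sec per cell.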
Targeted Messages: DW Managers / Architects vs. DBAs / System Admins
Key Messages for DW Managers / Architects
10x – 100x performance gains for end-user queries
Zero changes to existing BIDW tools and applications
Supports large numbers of Decision Support users and applications
Fast deployment: no configuration needed
Key Messages for DBAs / Sys Admins
Built on Oracle Database 11g (11.1.0.6 and higher), consistent with corporate standards
Based on standard hardware components from HP – no proprietary hardware
Oracle provides a single point of purchase and support
HP Oracle Database Machine
Hardware                                    Data Bandwidth   User Data   Raw Storage
HP Oracle Database Machine Hardware SAS     14 GB/s          30 TB       97 TB
HP Oracle Database Machine Hardware SATA    10.5 GB/s        46 TB       168 TB
HP Exadata Storage Server Hardware SAS      1 GB/s           1.5 TB      5.4 TB
HP Exadata Storage Server Hardware SATA     0.75 GB/s        3.3 TB      12 TB
Raw Storage: Total raw disk capacity, computed as (# disks x disk capacity)
User Data: Space for end-user data, computed after mirroring and after allowing space for database structures such as temp, logs, undo, and indexes. User data capacity is uncompressed; with compression, 2x to 4x more data can often be stored. Actual user data capacity varies by application
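The machine-level bandwidth figures follow from the per-server figures if bandwidth scales linearly across the 14 Exadata Storage Servers. A minimal sketch under that assumption; the function names are illustrative, and the scan time is an idealized lower bound that ignores every other bottleneck.

```python
STORAGE_SERVERS = 14  # Exadata Storage Servers per Database Machine

# Per-server data bandwidth in GB/s, from the table
PER_SERVER_GB_S = {"SAS": 1.0, "SATA": 0.75}


def machine_bandwidth_gb_s(disk_type, servers=STORAGE_SERVERS):
    """Aggregate bandwidth, assuming linear scaling with server count."""
    return servers * PER_SERVER_GB_S[disk_type]


def scan_seconds(data_gb, bandwidth_gb_s):
    """Idealized time to read data_gb at full bandwidth."""
    return data_gb / bandwidth_gb_s


sas = machine_bandwidth_gb_s("SAS")
sata = machine_bandwidth_gb_s("SATA")

print(f"SAS machine:  {sas} GB/s")
print(f"SATA machine: {sata} GB/s")
print(f"Full scan of 1 TB (1000 GB) on SAS: {scan_seconds(1000, sas):.1f} s")
```

Calling `machine_bandwidth_gb_s("SAS", servers=28)` shows what linear scaling would imply for a doubled storage grid; real scaling depends on the interconnect and the workload.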
HP Oracle Database Machine:
High Availability
Problem                    Database Machine Solution
Storage Server failure     Oracle Exadata Storage Servers
Database Server failure    Oracle Real Application Clusters
Disk failure               Oracle Automatic Storage Management: all disks are mirrored
Switch failure             Redundant switches; dual-port HCAs in all servers
Power failure              Redundant power supplies for all servers
HP Oracle Database Machine:
Installation
Goal: Deliver to the customer a completely functioning database system
All servers properly configured and networked
All software configured (CRS, RAC, DB, Exadata)
Default database created
Performance and functionality validated
Installation is included in the price of HP Oracle Database Machine
Onsite HP Installation Services
Onsite Oracle ACS Services
HP Oracle Database Machine:
Support
Single point of contact for support (Oracle) for entire HP Oracle Database Machine
Hardware
Software
– Oracle Enterprise Linux
– Database
– Exadata Storage Software
Software issues resolved by Oracle support
Hardware support
Hardware issues are passed to HP
HP contacts the customer to resolve the issues
HP Support is available 24x7
– For on-site support HP has to respond (not repair) within defined times
Customer can buy additional support (HP Care packs)
DB Machine Technology Comparison
                   Teradata 2550        Netezza 10100         HP Oracle Database Machine
Memory             128 GB               108 GB                368 GB
Interconnect       1 Gb/sec BYNET       1Gb/sec Ethernet      20Gb/sec Infiniband
Disks              144 x 300GB disks    108 x 400GB disks     168 x 450GB disks
Database cores     32 DB Cores          4 DB Cores (?)        64 DB Cores
Storage cores      0 Storage Cores      108 Storage Cores*    112 Storage Cores
Total cores        32 Cores             112 Cores*            176 Cores
User data          12.6 TB              12.5 TB               21 TB
HW Architecture    Proprietary**        Proprietary           Open
Footprint          1 rack               1 rack                1 rack
* Netezza 10100 uses PowerPC CPUs (less powerful than Intel Xeon cores)
** Teradata BYNET Interconnect is proprietary
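The HP Oracle Database Machine column can be derived from the per-server specifications on the Key Components slide. A minimal sketch; the `ServerGrid` class and its field names are illustrative, and the disk count follows the comparison table in counting storage-grid disks only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerGrid:
    servers: int
    processors_per_server: int
    cores_per_processor: int
    ram_gb_per_server: int
    disks_per_server: int

    @property
    def cores(self):
        return self.servers * self.processors_per_server * self.cores_per_processor

    @property
    def ram_gb(self):
        return self.servers * self.ram_gb_per_server

    @property
    def disks(self):
        return self.servers * self.disks_per_server


# Per-server figures from the Key Components slide (quad-core = 4 cores)
database_grid = ServerGrid(servers=8, processors_per_server=2,
                           cores_per_processor=4, ram_gb_per_server=32,
                           disks_per_server=4)
storage_grid = ServerGrid(servers=14, processors_per_server=2,
                          cores_per_processor=4, ram_gb_per_server=8,
                          disks_per_server=12)

total_cores = database_grid.cores + storage_grid.cores
total_ram_gb = database_grid.ram_gb + storage_grid.ram_gb

print(f"DB cores: {database_grid.cores}, storage cores: {storage_grid.cores}")
print(f"Total cores: {total_cores}, memory: {total_ram_gb} GB")
print(f"Storage disks: {storage_grid.disks}")
```

The results match the 64 DB cores, 112 storage cores, 176 total cores, 368 GB of memory and 168 disks shown in the table.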
Retailer Exadata Speedup – 3x to 50x
[Bar chart: speedup factor (x SPEEDUP) per query, axis from 0 to 50; Average 16x]
Queries shown: Recall Query; Gift Card Activations; Sales and Customer Counts; Prompt04 Clone for ACL audit; Date to Date Movement Comparison - 53 weeks; Materialized Views Rebuild; Merchandising Level 1 Detail by Week; Supply Chain Vendor - Year - Item Movement; Merchandising Level 1 Detail: Current - 52 weeks; Merchandising Level 1 Detail: Period Ago
Exadata’s Value Proposition
Oracle HP Database Machine: Scalable DB; Reference Customers; Pre-built BI Accelerators; Single Point of Contact; Industry Vertical Solutions; BI/DW Technical Infrastructure; Ready Configuration; Existing DB features compatibility (Partitioning); Scalable Storage
Ability to stay on Oracle Database for Extreme BIDW Performance
Compatibility with DB features like Partitioning, DB Compression etc
Horizontal Scalability for Database Grid and Storage Grid
Pre-built solutions from Oracle for BIDW like BI-Apps using OBIEE, Industry extensions like Oracle Data Warehouse for Retail (Accelerators)
Single point of support – Hardware and Software
Exadata Benefits
Extreme Performance
10X to 100X speedup for data warehousing
Database Aware Storage
Smart Scans
Massively Parallel Architecture
Dynamically Scalable to hundreds of cells
Linear Scaling of Data Bandwidth
Transaction/Job level Quality of Service
Mission Critical Availability and Protection
What can Oracle Exadata Platform do for you?
Let us look at why Oracle Exadata needs to be in the BIDW roadmap of companies, to address common issues
Explosion of Data Volumes -> High Performance even with exponential growth of data
Cost of licensing new H/W and S/W -> Total cost of ownership is reduced in the long run
Reduced Query Performance due to large database size -> Tremendous Business Productivity boost
Fear of adoption and learning curve of data compression -> No impact to app developers/end-users, minimal impact for DBAs
Compatibility with other 11g features like compression or Partitioning -> Compression/Partitioning can be used with Exadata storage
DB is on Exadata, what about backup? -> Standby DB does not have to be Exadata
Questions
Reminder: join the IOUG Exadata SIG for more info
Contact Info:
ShyamVaran@Gmail.com
(954) 609 – 2402 cell
http://OracleExadata.org
Other Resources
http://OracleExadata.org
http://www.oracle.com/exadata
www.oracle.com/technology/products/bi
www.oracle.com/solutions/business_intelligence
OTN:
http://www.oracle.com/technology/products/bi/db/dbmachine
http://www.oracle.com/technology/products/bi/db/exadata
Forums:
http://structureddata.org/
http://kevinclosson.wordpress.com/
http://techspectator.blogspot.com/
Subject: Oracle Exadata Setup/Configuration Best Practices
Doc ID: 757553.1 Type: BULLETIN Modified Date: 18-MAR-2009 Status: PUBLISHED