Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Big Data Management System
Solution Overview
Pascal GUY
Pre Sales Architect
Business Unit Systems
Oracle France
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
Oracle positioning
Technical proposition
Storage Treatment
Data Management
Visualization
Active domains
Oracle Confidential – Internal/Restricted/Highly Restricted
Enterprise Big Data Analytics Architecture
Enabling you to Create Value from Data
CREATE VALUE FROM DATA
BIG DATA INTEGRATION: Streaming + Batch
BIG DATA MANAGEMENT: Data Reservoir + Data Warehouse
BIG DATA ANALYTICS: Discovery + Business Analytics
BIG DATA APPLICATIONS: Mobile + Web + On-device
Getting Started with Big Data
Transform
Key Business Initiatives
Build Foundation
ETL Offload
ISV Platform Integration
Enrich
Enhance Existing Data Warehouse and BI
Driving Business Value from Technology Innovation
Use the Right Tool for the Job and benefit from the Power of “AND”
Run the Business
Integrate existing systems
Support mission-critical tasks
Protect existing expenditures
Ensure skills relevance
Relational
Hadoop
Change the Business
Disrupt competitors
Disintermediate supply chains
Leverage new paradigms
Exploit new analyses
NoSQL
Scale the Business
Serve data faster
Meet mobile challenges
Oracle Big Data Management System
SOURCES
DATA RESERVOIR
DATA WAREHOUSE
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Apache Flume
Oracle GoldenGate
Oracle Event Processing
Cloudera Hadoop
Oracle NoSQL
Oracle R Advanced Analytics for Hadoop
Oracle R Distribution
Oracle Database In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Exadata
Oracle GoldenGate
Oracle Event Processing
Oracle Data Integrator
Oracle Big Data Connectors
Oracle Data Integrator
Recap: Big Data Appliance Overview
Big Data Appliance X4-2
Sun Oracle X4-2L servers, each with:
• 2 × 8-core Intel Xeon E5 processors
• 64 GB memory
• 48 TB disk space
Integrated Software:
• Oracle Linux, Oracle Java VM
• Oracle Big Data SQL*
• Cloudera Distribution of Apache Hadoop – EDH Edition
• Cloudera Manager
• Oracle R Distribution
• Oracle NoSQL Database
Recap: Standard and Modular
Starter Rack is fully cabled and configured for growth, with 6 servers
In-Rack Expansion delivers a modular 6-server expansion block
Full Rack delivers an optimal blend of capacity and expansion options
Grow by adding racks – up to 18 racks
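As a rough sizing aid, the 48 TB per-server disk figure can be multiplied out per rack configuration. The sketch below rests on two assumptions that are not stated on the slides: that a full rack is the starter rack plus two 6-server expansion blocks (18 servers), and that HDFS keeps its default of three replicas. Real usable capacity would be lower once operating system, temporary and non-HDFS space are accounted for.

```python
# Raw and replicated HDFS capacity per rack configuration (sizing sketch only).
DISK_TB_PER_SERVER = 48   # per-server disk space from the appliance overview
HDFS_REPLICATION = 3      # assumption: HDFS default replication factor

# Assumption: a full rack is the starter rack plus two 6-server expansions.
CONFIGS = {"starter rack": 6, "starter + 1 expansion": 12, "full rack": 18}


def capacity_tb(servers: int) -> tuple[int, float]:
    """Return (raw TB, TB of distinct data after replication)."""
    raw = servers * DISK_TB_PER_SERVER
    return raw, raw / HDFS_REPLICATION


for name, servers in CONFIGS.items():
    raw, replicated = capacity_tb(servers)
    print(f"{name}: {servers} servers, {raw} TB raw, "
          f"~{replicated:.0f} TB after {HDFS_REPLICATION}x replication")
```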
Oracle Big Data SQL – A New Architecture
• Powerful, high-performance SQL on Hadoop
– Full Oracle SQL capabilities on Hadoop
– SQL query processing local to Hadoop nodes
• Simple data integration of Hadoop and Oracle Database
– Single SQL point-of-entry to access all data
– Scalable joins between Hadoop and RDBMS data
• Optimized hardware
– High-speed InfiniBand network between Hadoop and Exadata
Smart Scan for Fast Query Processing
Intelligent Query Optimization
One Query Spanning Oracle Database, Hadoop & NoSQL
Query Data in RDBMS, Hadoop & NoSQL
Oracle SQL
Diagram: Oracle NoSQL DB and HDFS Data Nodes (each paired with a BDS Server) alongside Oracle Database Storage Servers
Fast
Massive Parallelism
Filtered Locally
Data Stored in Hadoop
Hadoop/NoSQL Ecosystem
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
Example: 1TB File
Block B1
Block B2
Block B3
• 1 block = 256 MB
• Example File = 4096 blocks
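The block arithmetic for the example file can be checked directly. This is a small sketch assuming binary units (1 TB = 2^40 bytes, 256 MB = 2^28 bytes); the function name is made up for illustration. Under those assumptions a 1 TB file occupies 4096 blocks.

```python
def hdfs_block_count(file_bytes: int, block_bytes: int) -> int:
    """Number of HDFS blocks needed for a file; the last block may be partial."""
    return -(-file_bytes // block_bytes)  # ceiling division on integers


ONE_TB = 2 ** 40          # assumption: binary terabyte
BLOCK_256_MB = 2 ** 28    # assumption: binary megabytes

print(hdfs_block_count(ONE_TB, BLOCK_256_MB))
```

Each block is an independent unit that can be read in parallel, which is what the later slides on input splits and granules build on.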
Enhance Oracle External Table Performance
• Previously external tables were “file-centric”
– 1 file == 1 unit of parallelism
• Enhanced external tables understand parallelism
– Automatically map external units of parallelism to Oracle “Granules”
– 1 Input Split == 1 Oracle “Granule”
CREATE TABLE movieapp_log_json
  (click VARCHAR2(4000))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_HIVE
   DEFAULT DIRECTORY DEFAULT_DIR
  )
PARALLEL 20
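To make the difference between file-centric and split-centric parallelism concrete, the sketch below spreads granules over parallel query servers. It is an illustration only: the round-robin assignment and the function name are assumptions, not a description of how Oracle actually schedules granules, and the 4096 figure assumes one input split per 256 MB block of a 1 TB file.

```python
from collections import defaultdict


def assign_granules(num_granules: int, degree_of_parallelism: int) -> dict[int, list[int]]:
    """Assign each granule (one per input split) to a parallel server, round-robin."""
    assignment = defaultdict(list)
    for granule_id in range(num_granules):
        assignment[granule_id % degree_of_parallelism].append(granule_id)
    return dict(assignment)


# File-centric: one file is one unit of work, so only one server is busy.
file_centric = assign_granules(num_granules=1, degree_of_parallelism=20)
# Split-centric: every input split is a granule, so all 20 servers get work.
split_centric = assign_granules(num_granules=4096, degree_of_parallelism=20)

print(len(file_centric), "busy server(s) with one granule")
print(len(split_centric), "busy servers with",
      min(len(g) for g in split_centric.values()), "to",
      max(len(g) for g in split_centric.values()), "granules each")
```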
Query Execution on Hadoop
select last_name, state, movie, genre
from movielog m, customer c
where genre = 'comedy'
and c.custid = m.custid
1. Query compilation determines:
• Data locations
• Data structure
• Parallelism
2. Parallel reads using Big Data SQL Server:
• Parallel unit: PQ Slaves & InputSplits
• Filter rows and project columns
3. Process filtered result:
• Move relevant data to database
• Join with database tables
• Apply database security policies
Diagram components: Hive Metastore, HDFS NameNode, HDFS Data Nodes with BDS Servers
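The three steps above can be sketched as a toy pipeline. Everything in it is hypothetical sample data and naming chosen for illustration: two in-memory lists stand in for input splits, a dictionary stands in for the customer table, and the splits are scanned one after another rather than in parallel.

```python
import json

# Hypothetical stand-ins for two HDFS input splits and a database table.
SPLITS = [
    ['{"custid": 1, "movie": "Airplane!", "genre": "comedy"}',
     '{"custid": 2, "movie": "Alien", "genre": "horror"}'],
    ['{"custid": 3, "movie": "Clue", "genre": "comedy"}'],
]
CUSTOMERS = {1: ("Smith", "CA"), 3: ("Jones", "NY")}  # custid -> (last_name, state)


def scan_split(lines, predicate, columns):
    """Step 2: runs next to the data; filters rows and projects columns."""
    for line in lines:
        row = json.loads(line)
        if predicate(row):
            yield {c: row[c] for c in columns}


def run_query():
    # Step 1: "compilation" has fixed the data locations (SPLITS) and structure.
    # Step 2: each split is scanned independently (sequentially in this sketch).
    filtered = [row for split in SPLITS
                for row in scan_split(split,
                                      lambda r: r["genre"] == "comedy",
                                      ["custid", "movie", "genre"])]
    # Step 3: only the filtered rows reach the database, where they are joined.
    return [(*CUSTOMERS[r["custid"]], r["movie"], r["genre"])
            for r in filtered if r["custid"] in CUSTOMERS]


print(run_query())
```

The point of the design is that the horror row never leaves its split: only rows that pass the filter, and only the projected columns, are moved to the join.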
Big Data SQL Server Minimizes Data Movement
Data Node
Disk
Big Data SQL Server
External Table Services
1. Read using Hadoop Classes
2. Convert to Oracle Data Stream
Hadoop Smart Scan
1. Apply filter predicates
2. Apply column projections
3. Apply row-level functions
   • JSON Parsing
• Work close to the data
– Scans and serializations from Hadoop classes
– Transformation into Oracle data stream
• Smart Scan: Emit only relevant data
– Apply filter predicates
   • Include complex predicates, e.g. JSON_EXISTS
   • Bloom filters for faster joins
   • Score Data Mining models
– Project columns
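The "Bloom filters for faster joins" item can be illustrated with a minimal Bloom filter: the database side summarizes its join keys in a small bit array, and the storage side uses it to drop rows that cannot possibly join. This is a generic textbook sketch, not Oracle's implementation; the class, the sizes and the hash construction are all assumptions.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def prefilter(rows, join_filter, key="custId"):
    """Storage-side step: keep only rows whose join key may exist in the database."""
    return [r for r in rows if join_filter.might_contain(r[key])]


# Database side: summarize the join keys of the (hypothetical) customer table.
customer_ids = {1185972, 1354924}
join_filter = BloomFilter()
for cust_id in customer_ids:
    join_filter.add(cust_id)

# Storage side: rows read from Hadoop, filtered before they are shipped.
rows = [{"custId": 1185972, "activity": 8},
        {"custId": 1354924, "activity": 7},
        {"custId": 1083711, "activity": 9}]
kept = prefilter(rows, join_filter)
print(len(kept), "of", len(rows), "rows shipped to the database")
```

Because a Bloom filter can report false positives, the database still performs the real join; the filter only reduces how many rows are moved.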
Big Data SQL
Rich, comprehensive SQL access to all enterprise data
Feedback Loop
Data Management
Big Data Platform (Hadoop/NoSQL)
Relational Data Warehouse (OCDM)
Analytic Apps
Customer Experience
Operations
Monetization
Adapters
ETL/ELT
Adapters
Real-Time
Adapters
Third
Party
Data
Sources
Oracle Comms Apps (BSS/OSS)
Oracle Comms Ntwk Products (Tekelec & Acme)
Other Oracle Apps (CRM, ERP, etc.)
Third Party Sources
Oracle Communications Data Model
Reference Architecture
To Other Apps
Hive
• Provides a SQL-like interface to data stored in HDFS
• Allows applications to process data stored in any format
• Tables capture the metadata required to locate and parse data
• A SQL query generates a MapReduce job to process the data
• Big Data SQL uses Hive metadata to simplify administration, but Hive metadata is not required.
Schema on Read: MapReduce and Hive
Simple Case: Single Column
> select * from movieapp_log_json
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07",…
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22",…
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26",…
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32",…
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42",…
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43",…
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50",…
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03",…
HiveQL:
CREATE EXTERNAL TABLE movieapp_log_json
(
click STRING
)
ROW FORMAT
DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/oracle/applog';
Schema on Read: MapReduce and Hive
Same Source with Columns Derived Using SerDe
> select * from movielog_cols
HiveQL:
CREATE EXTERNAL TABLE movielog_cols (
custid int,
movieid int,
activity int, …)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/user/oracle/applog_json';
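Schema on read means the same raw lines can be exposed through different table definitions without rewriting the files. The sketch below mimics the two Hive definitions in plain Python: one reader returns each line as a single `click` string, the other derives columns by parsing the JSON, roughly what a JSON SerDe does at query time. The function names are hypothetical, and lower-casing the keys is an assumption made to mirror Hive's lower-case column names; how a given SerDe treats key case is not covered here.

```python
import json

RAW_LINES = [
    '{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}',
    '{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}',
]


def read_single_column(lines):
    """Like movieapp_log_json: every line is one string column named click."""
    return [{"click": line} for line in lines]


def read_with_columns(lines, columns=("custid", "movieid", "activity")):
    """Like movielog_cols: columns are derived from the JSON at read time."""
    rows = []
    for line in lines:
        record = {key.lower(): value for key, value in json.loads(line).items()}
        rows.append({column: record.get(column) for column in columns})
    return rows


print(read_single_column(RAW_LINES)[0]["click"][:40], "...")
print(read_with_columns(RAW_LINES))
```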
Hive Metastore
SQL Execution Engines Share Metadata
Hive Metastore
Hive
Impala
Shark
Oracle Big Data SQL
…
Table Definitions:
movieapp_log_json
movielog
movieapp_log_avro
Unify Metadata: Publish Hive Metadata to Oracle Catalog
CREATE TABLE movieapp_log_json
(click VARCHAR2(4000))
ORGANIZATION EXTERNAL
(TYPE
ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
)
REJECT LIMIT UNLIMITED;
Big Data Appliance + Hadoop/NoSQL
Exadata + Oracle Database
Oracle Catalog
External Table
Hive metadata
External Table
Hive Metastore
Automation: Oracle Data Modeler
Import Hive definitions into model
Automatically generate Oracle DDL for imported tables
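The idea of generating Oracle DDL from imported Hive definitions can be sketched as a simple template over column metadata. This is not how Oracle Data Modeler works internally; the function name and the Hive-to-Oracle type mapping below are simplified assumptions for illustration, and the emitted statement follows the ORACLE_HIVE external table form shown earlier.

```python
# Assumption: a deliberately small Hive-to-Oracle type mapping.
TYPE_MAP = {
    "int": "NUMBER",
    "bigint": "NUMBER",
    "double": "NUMBER",
    "string": "VARCHAR2(4000)",
}


def oracle_external_table_ddl(table, hive_columns, directory="DEFAULT_DIR"):
    """Build an ORACLE_HIVE external table statement from (name, hive_type) pairs."""
    column_list = ",\n".join(
        f"  {name} {TYPE_MAP[hive_type.lower()]}" for name, hive_type in hive_columns
    )
    return (
        f"CREATE TABLE {table} (\n{column_list}\n)\n"
        "ORGANIZATION EXTERNAL (\n"
        "  TYPE ORACLE_HIVE\n"
        f"  DEFAULT DIRECTORY {directory}\n"
        ")\n"
        "REJECT LIMIT UNLIMITED;"
    )


ddl = oracle_external_table_ddl(
    "movielog_cols", [("custid", "int"), ("movieid", "int"), ("activity", "int")]
)
print(ddl)
```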
New Data Sources for Oracle External Tables
CREATE TABLE movielog
  (click VARCHAR2(4000))
ORGANIZATION EXTERNAL
  ( TYPE ORACLE_HIVE
    DEFAULT DIRECTORY Dir1
    ACCESS PARAMETERS
    (
      com.oracle.bigdata.tablename logs
      com.oracle.bigdata.cluster mycluster
    )
  )
REJECT LIMIT UNLIMITED
• New set of properties
– ORACLE_HIVE and ORACLE_HDFS access drivers
– Identify a Hadoop cluster, data source, column mapping, error handling, overflow handling, logging
• New table metadata passed from Oracle DDL to Hadoop readers at query execution
• Architected for extensibility
– StorageHandler capability enables future support for other data sources
Use Rich Oracle SQL Dialect Over All Data
Snapshot of Oracle SQL Analytic Functions
• Ranking functions
– rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving and cumulative)
– Avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions
– Direct inter-row reference using offsets
• Reporting Aggregate functions
– Sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical Aggregates
– Correlation, linear regression family, covariance
• Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number pairs.
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
• Descriptive Statistics
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations
– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).
• Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing
– Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
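As a concrete illustration of the ranking functions listed above, the sketch below reproduces the difference between `rank` and `dense_rank` over a descending ordering: after a tie, `rank` skips ahead to the row position while `dense_rank` increases by exactly one. The Python function is a hypothetical stand-in for what the SQL analytic functions compute, not Oracle code.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples for values ordered descending."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # RANK jumps to the row position after a tie
            dense += 1        # DENSE_RANK advances by one per distinct value
            previous = value
        result.append((value, rank, dense))
    return result


for row in rank_and_dense_rank([100, 90, 90, 80]):
    print(row)
```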
        next = lineNext.getQuantity();
    }
    if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
        state = "S";
        return state;
    }
    if (gt(q, prev) && gt(q, next)) {
        state = "T";
        return state;
    }
    if (lt(q, prev) && lt(q, next)) {
        state = "B";
        return state;
    }
    if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
        state = "E";
        return state;
    }
    if (q.isEmpty() || eq(q, prev)) {
        state = "F";
        return state;
    }
    return state;
}
private boolean eq(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) { return false; }
    return a.equals(b);
}
private boolean gt(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) { return false; }
    return Double.parseDouble(a) > Double.parseDouble(b);
}
private boolean lt(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) { return false; }
    return Double.parseDouble(a) < Double.parseDouble(b);
}
public String getState() { return this.state; }
}
BagFactory bagFactory = BagFactory.getInstance();
@Override
public Tuple exec(Tuple input) throws IOException {
    long c = 0;
    String line = "";
    String pbkey = "";
    V0Line nextLine;
    V0Line thisLine;
    V0Line processLine;
    V0Line evalLine = null;
    V0Line prevLine;
    boolean noMoreValues = false;
    String matchList = "";
    ArrayList<V0Line> lineFifo = new ArrayList<V0Line>();
    boolean finished = false;
    DataBag output = bagFactory.newDefaultBag();
    if (input == null) { return null; }
    if (input.size() == 0) { return null; }
    Object o = input.get(0);
    if (o == null) { return null; }
    //Object o = input.get(0);
    if (!(o instanceof DataBag)) {
        int errCode = 2114;
        String msg = "Expected input to be DataBag, but"
Pattern Matching With Oracle SQL
Snapshot of Oracle SQL Analytic Functions
Simplified, sophisticated, standards based syntax
SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY name ORDER BY time
  MEASURES FIRST(x.time) AS first_x,
           LAST(z.time) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)),
         W AS (price < PREV(price)),
         Z AS (price > PREV(price) AND
               z.time - FIRST(x.time) <= 7 ))
250+ Lines of Java UDF
12 Lines of SQL
20x less code
Finding Patterns in Stock Market Data - Double Bottom (W)
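The double-bottom pattern `X+ Y+ W+ Z+` is essentially a regular expression over price moves, which the following sketch makes explicit. It is a simplification with assumptions: it handles a single ticker (no partitioning), ignores the seven-unit time window on `Z`, and the function name is made up. Each price is encoded as down, up or flat relative to its predecessor, and the pattern is matched with Python's `re` module.

```python
import re


def find_double_bottoms(prices):
    """Return (first, last) price indexes of each down-up-down-up run."""
    # Encode each move relative to the previous price: D = down, U = up, F = flat.
    moves = "".join(
        "D" if cur < prev else "U" if cur > prev else "F"
        for prev, cur in zip(prices, prices[1:])
    )
    # moves[i] describes prices[i + 1], so shift the match start by one;
    # the exclusive match end then equals the index of the last matched price.
    return [(m.start() + 1, m.end()) for m in re.finditer(r"D+U+D+U+", moves)]


prices = [10, 9, 8, 9, 10, 9, 8, 9, 10, 11]
print(find_double_bottoms(prices))
```

As in the SQL version, the first price of a series can never start a match because it has no previous price to compare against.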
Govern All Data
Oracle Database 12c
Store business-critical data in Oracle
Oracle Big Data Appliance
Store JSON data unconverted in Hadoop
SQL
Data analyzed via SQL
DBMS_REDACT.ADD_POLICY(
  object_schema => 'hr',
  object_name   => 'employee',
  column_name   => 'social_sec_num',
  policy_name   => 'redact_ssn',
  function_type => DBMS_REDACT.FULL,
  expression    => '1=1' );
Apply advanced security on Hadoop
− Masking/Redaction
− Virtual Private Database
− Fine-grained Access Control
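The effect of a full-redaction policy with an always-true expression can be pictured as a filter applied whenever rows are read, whatever system the rows are stored in. The sketch below is a conceptual stand-in, not the DBMS_REDACT implementation: the function names are hypothetical, and the replacement values (0 for numbers, a single blank for text) are stated here as an assumption about full redaction defaults.

```python
def redact_full(value):
    """Assumed full-redaction defaults: numbers become 0, text becomes a blank."""
    if isinstance(value, (int, float)):
        return 0
    return " "


def make_redacted_view(rows, column, expression=lambda session: True):
    """Return a reader that redacts `column` whenever `expression(session)` holds."""
    def read(session):
        if not expression(session):
            return [dict(row) for row in rows]
        return [{**row, column: redact_full(row[column])} for row in rows]
    return read


employees = [{"name": "Ann", "social_sec_num": "123-45-6789"}]
# The default expression plays the role of '1=1': redact for every session.
read_employees = make_redacted_view(employees, "social_sec_num")
print(read_employees({"user": "analyst"}))
```

The stored rows are left untouched; only the values returned to the reader are changed, which is why the same policy can cover data that physically lives in Hadoop.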
Enterprise Analytics and the Unified Reservoir
Gather Once, Don’t Wait, Analyze Many Times
Sales, Finance, Supply Chain, HR, Marketing
Oracle Big Data Discovery
The Visual Face of Hadoop
Explore
Transform
Discover
Big Data Management System
Advanced Query & Analysis
Full Power of SQL and Advanced Analytics
Leverages All Your Data
Relational, Hadoop and NoSQL
Secure
Unified Governance on All Data
Fastest Performance
Utilize SQL Processing Across the Platform
Transparent to Applications
Why Is Big Data Important?
Value Creation
“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.”
McKinsey Global Institute
HEALTH CARE: Reduce Prescription Fraud
MANUFACTURING: Accelerate Test Cycles to Reduce Backlog
COMMUNICATIONS: Offering New Services based on Location Data
RETAIL