Seamless Access from Oracle Database to Your Big Data

Full text

(1)

Seamless Access from

Oracle Database to Your Big Data

Brian Macdonald

Big Data and Analytics Specialist

Oracle Enterprise Architect

(2)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Agenda

Hadoop and SQL access methods

What is Oracle Big Data SQL

Big Data SQL Architecture

Big Data SQL Configuration

Roadmap

Customer Story

Q&A

(3)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

First Lets Define Big Data

Structured & Unstructured Data

(4)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

SQL on Hadoop is Obvious

Although Implementations Vary

Hive

Impala, HAWQ, IBM Big SQL

Oracle SQL Connector for Hadoop (OSCH)

Oracle Big Data SQL

A million more (Tez, Presto, Hadapt, Stinger, Polybase, Drill, Lots of

start ups)

(5)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

SQL Analytics Challenge

Separate silos of information to analyze

(6)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

SQL Analytics Challenge

No comprehensive SQL interface

(7)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data SQL

Hadoop + NoSQL + Relational…

(8)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data SQL – A New Architecture

Powerful, high-performance SQL on Hadoop

Full Oracle SQL capabilities on Hadoop

SQL query processing local to Hadoop nodes

Simple data integration of Hadoop and Oracle Database

Single SQL point-of-entry to access all data

Scalable joins between Hadoop and RDBMS data

Oracle Security

Govern all Data through a Single Set of Security Policies

Redaction, VPD, etc.

Tool Access

(9)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Use Rich Oracle SQL Dialect Over All

Data

Snapshot of Oracle SQL Analytic Functions

• Ranking functions

– rank, dense_rank, cume_dist, percent_rank, ntile

• Window Aggregate functions (moving and cumulative)

– Avg, sum, min, max, count, variance, stddev, first_value, last_value

• LAG/LEAD functions

– Direct inter-row reference using offsets

• Reporting Aggregate functions

– Sum, avg, min, max, variance, stddev, count, ratio_to_report

• Statistical Aggregates

– Correlation, linear regression family, covariance

• Linear regression

– Fitting of an ordinary-least-squares regression line to a set of number

pairs.

– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR

functions

• Descriptive Statistics

– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count,

min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values

• Correlations

– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).

• Cross Tabs

– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency

coefficient, Cohen's kappa

• Hypothesis Testing

– Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann

Whitney test, Kolmogorov-Smirnov test, One-way ANOVA

• Distribution Fitting

– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal,

(10)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data SQL Architecture

Two components of Oracle Big Data SQL

External Table extension

(11)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data SQL Architecture

Two components of Oracle Big Data SQL

External Table extension

(12)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 12

A Smarter Oracle External Table

Oracle Table

HDFS Data

You define:

You get:

• Table name

• Automatic discovery of Hive table metadata

• Oracle types

• Automatic translation from Hadoop types

• Automatic conversion from any InputFormat

• Any Degree of Parallelism

(13)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Unify Metadata: Publish Hive Metadata to Oracle Catalog

13

CREATE TABLE movieapp_log_json

(click VARCHAR2(4000))

ORGANIZATION EXTERNAL

(TYPE

ORACLE_HIVE

DEFAULT DIRECTORY DEFAULT_DIR

)

REJECT LIMIT UNLIMITED;

Big Data Appliance

+

Hadoop/NoSQL

Exadata

+

Oracle

Database

Oracle Catalog

External Table

Hive metadata

External Table

Hive Metastore

(14)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Accessible through Oracle Data Dictionary

Immediately – So the DBA doesn’t need to go to Hadoop

ALL_HIVE_DATABASES

ALL_HIVE_TABLES

ALL_HIVE_COLUMNS

DBA_HIVE_DATABASES

DBA_HIVE_TABLES

DBA_HIVE_COLUMNS

USER_HIVE_DATABASES

USER_HIVE_TABLES

USER_HIVE_COLUMNS

(15)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 15

Extend Oracle External Tables

CREATE TABLE movielog (

click VARCHAR2(4000))

ORGANIZATION EXTERNAL (

TYPE ORACLE_HIVE

DEFAULT DIRECTORY DEFAULT_DIR

ACCESS PARAMETERS

(

com.oracle.bigdata.tablename logs

com.oracle.bigdata.cluster mycluster

))

REJECT LIMIT UNLIMITED;

New types of external tables

ORACLE_HIVE (inherit metadata)

ORACLE_HDFS (specify metadata)

Access parameters for Big Data

Hadoop cluster

Remote Hive database/table

DBMS_HADOOP Package for automatic

import

(16)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

(17)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

How Data is Stored in Hadoop

As files. Pretty Simple

17 {"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8} {"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7} {"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9} {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7} {"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6} {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8} {"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9} {"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7} {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9} {"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7} {"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7} {"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7} {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}} {"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}

Example: 1TB File

CREATE TABLE ORDER (custid VARCHAR2(10),

recommended VARCHAR2(20),

activity (NUMBER 8,2))

ORGANIZATION EXTERNAL (TYPE

oracle_hdfs

)

LOCATION ("hdfs:/usr/cust/summary/*");

Assumes Default Values

Table Options

Fields

Column Maps

Delimiters

Fileformats

json, textfile, sequencefile,…

Serdes

i.e regex

(18)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Creating an External Table against Hive

DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE (

cluster_id IN VARCHAR2,

db_name IN VARCHAR2 := NULL,

hive_table_name IN VARCHAR2,

hive_partition IN BOOLEAN,

table_name IN VARCHAR2 := NULL,

perform_ddl IN BOOLEAN DEFAULT FALSE,

text_of_ddl OUT VARCHAR2

);

set serveroutput on

DECLARE

DDLout VARCHAR2(4000);

BEGIN

dbms_hadoop.create_extddl_for_hive(

CLUSTER_ID=>

'bigdatalite

',

DB_NAME=>

'brian

',

HIVE_TABLE_NAME=>

'movie

',

HIVE_PARTITION=>FALSE,

TABLE_NAME=>

'movie

',

PERFORM_DDL=>FALSE,

TEXT_OF_DDL=>DDLout);

dbms_output.put_line(DDLout);

END;

(19)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 19

Oracle External Tables – Flexibility for Varied File Structures

Transparent schema-for-read

Use fast C-based readers when possible

Use native Hadoop classes otherwise

Engineered to understand parallelism

Map external units of parallelism to Oracle

Architected for extensibility

StorageHandler capability enables support for other

data sources

Examples: MongoDB, HBase, Oracle NoSQL DB

CREATE TABLE ORDER (

cust_num VARCHAR2(10),

order_num VARCHAR2(20),

order_total NUMBER(8,2))

ORGANIZATION EXTERNAL (

TYPE

ORACLE_HIVE

DEFAULT DIRECTORY DEFAULT_DIR

)

PARALLEL 20

(20)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

StorageHandlers: Extensibility Beyond HDFS

Hive Metastore

Oracle Big Data SQL

(21)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data SQL Architecture

Two components of Oracle Big Data SQL

External Table extension

(22)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

What gives Exadata

extreme

performance?

22

Oracle Database 12c

SQL

Offload Query to

Exadata Storage Servers

Small data subset

(23)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Introducing Oracle Big Data SQL

23

Massively Parallel SQL Query across Oracle, Hadoop and NoSQL

Oracle Database 12c

Hadoop & NoSQL

(24)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Big Data Appliance X5-2

Sun Oracle X5-2L Servers with per server:

2 * 18 Core Intel Xeon E5 Processors

128 GB Memory

96TB Disk space

Integrated Software (4.2):

Oracle Linux 6.6

Oracle Big Data SQL 1.1*

Cloudera Distribution of Apache Hadoop 5.4 – EDH Edition

Cloudera Manager 5.4

Oracle R Distribution

Oracle NoSQL Database CE

24

(25)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Introducing Oracle Big Data SQL

25

Massively Parallel SQL Query across Oracle, Hadoop and NoSQL

Oracle Database 12c

Offload Query to

Exadata Storage Servers

Small data subset

quickly returned

Hadoop & NoSQL

Offload Query to

Data Nodes

SQL

data

subset

SQL

(26)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Storage Layer

26

Big Data SQL Server: A New Hadoop Processing Engine

Filesystem (HDFS)

(Oracle NoSQL DB, Hbase)

NoSQL Databases

Resource Management (YARN, cgroups)

Processing Layer

MapReduce

(27)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

How do we query Hadoop?

Big Data SQL Query Execution

HDFSData Node BDSQL

HDFS Data Node BDSQL

Query compilation determines:

• Data locations

• Data structure

• Parallelism

1

Fast reads using Big Data SQL Server

• Schema-for-read using Hadoop classes

• Smart Scan selects only relevant data

2

Process filtered result

• Move relevant data to database

• Join with database tables

• Apply database security policies

3

Hive Metastore HDFS

NameNode

1

(28)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Apply Advanced Security on Hadoop & NoSQL

28

Same security policies across all data

Oracle Database 12c

Hadoop

Redacted

data

subset

SQL

Raw JSON

data in

Hadoop

JSON

Customer

data

in Oracle

DBMS_REDACT.ADD_POLICY(

object_schema => 'sales',

object_name => 'customer_detail',

column_name => 'last_name',

policy_name => 'customer_privacy',

function_type => DBMS_REDACT.FULL,

expression => '1=1'

);

Redaction

Virtual Private Database

Fine-grain Access Control

(29)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Configuration

Install Oracle Big Data SQL on the BDA using Mammoth

Run the Big Data SQL-Exadata installation script on

each

Oracle Exadata

database node

Sets up connectivity from Exadata to the Big Data SQL Servers on the BDA.

Installs a Hadoop client

Configure directories and files

Big Data SQL Agent

Oracle directory objects

Others

(30)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Directories

Two Types of directories are created

Common Directory – must be on cluster wide shared files system

Subdirectories for jar files

bigdata.properties (paths,etc.)

Cluster Directory(s)

Configuration details for each BDA Cluster

Sub directory of Common directory

Oracle Directories that point to these Dirs

ORACLE_BIGDATA_CONFIG – Common Directory

(31)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Big Data SQL Agents

Created by Install Script

This multi-threaded agent bridges the metadata between Oracle Database

and Hadoop.

It launches a single JVM - instead of one for every process (which can be

quite slow).

create public database link BDSQL$_XXXX using

'extproc_connection_data'; (XXXX is the name of each BDA cluster from

Cluster Directories

create public database link BDSQL$_DEFAULT_CLUSTER using

(32)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

If Kerberos is used on BDA

Must create ticket (kinit) for BDS user

BDS runs as Oracle User

Need to renew tickets

cron

(33)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Requirements - For Now

Exadata

Oracle 12.1.0.2.1+

Storage Servers 2.1.1.1 or 12.1.1.0

Exadata configured on the same InfiniBand subnet as BDA

Exadata and BDA connected by InfiniBand

(34)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Roadmap

(35)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Today

Hadoop DoP linked to RDBMS DoP

Lead to many idle PQ processes

Required explicit declaration

Next

Unlink Hadoop and RDBMS DoP

Automatic max Hadoop parallelism

Even on serial tables

An average of 40% faster

Even at equivalent DoP

(36)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Today

All blocks in a query must be read

from disk

Large (256MB) disk I/O for each block

Next

Automatically create Storage

Indexes in Big Data SQL Agents

Check index before reading blocks

Skip unnecessary I/Os

An average of 65% faster

Up to 100x faster for highly selective

queries

Storage Indexing

(37)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Customer Examples

(38)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Building Customer Loyalty

Deliver personalized multi-channel content to every

customer (example: Kroger’s ‘MyMagazine’)

Expand to a wide variety of interaction data to build

customer profiles

Challenges

Customer loyalty marketing and programs for

major retailers and consumer brands

Company Overview

2x improvements in campaign performance

Large-scale concurrent processing of complex SQL

70% of analysis is done in SQL, uses R as well

Benefits

Solution Overview

Oracle Exadata X3-8

Oracle Database with Advanced Analytics

Oracle ZFS Backup Appliance

Big Data Appliance

Next: Big Data SQL

SQL Analysis R-based Analysis Machine Learning

BDA

ZFS

X3-8

X3-8

(39)

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Thank You

&

Figure

Updating...

References

Updating...

Related subjects :