Big Data for the Enterprise DAMA 12/15/2011. Bruce Nelson

(1)

Big Data for the Enterprise

DAMA 12/15/2011

(2)

The following is intended to outline our general

product direction. It is intended for information

purposes only, and may not be incorporated into any

contract. It is not a commitment to deliver any

material, code, or functionality, and should not be

relied upon in making purchasing decisions.

(3)

Agenda

Big Data Overview

Oracle Big Data solutions

Oracle NoSQL Database

Architecture

Technical Overview

(4)

Big Data

What is Big Data?

Big data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety (not transformed), such that it is not possible to process the data using traditional software tools and analytic

(5)

What is Big Data ?

GEODATA

TEXT

VOLUME VELOCITY VARIETY VALUE

(6)

What is Driving Big Data ?

• The cost per TB of data is now under $100. Over the last 30 years, the amount of space per unit cost has doubled roughly every 14 months (increasing by an order of magnitude every 48 months).

• The cost of CPU cores and memory has dropped massively. Core counts and computing power have increased several fold for commodity hardware, but the cost per server has decreased.

• The software licensing costs for manipulating terabytes, petabytes or exabytes of data have gone from millions of dollars (Teradata, for example) to $0 for open source Hadoop and NoSQL systems. These new open source systems are capable of handling the demands of extremely large datasets and massive distribution.

• Development and operational costs have remained about the same but are insignificant compared to cost of licensing in traditional

(7)

What is Driving Big Data ? (Cont’d)

• Big Data projects are being initiated with the business, marketing and BI segments of companies because there is a demand for deeper and more comprehensive analytics.

• The impetus to use Hadoop and NoSQL is the chance to mine years of “throw away” raw data to gain a competitive edge. These new systems are providing a way for the business people to do this.

• Hadoop and NoSQL get around the restrictions imposed by operations on traditional RDBMS systems as far as performance, space and license costs are concerned.

• Even though these systems are still in their infancy, they are providing a means to mine almost any kind of data. Text, binary data, XML,

(8)

Why is Big Data important?

• US health care: increase industry value per year by $300 B

• US retail: increase net margin by 60+%

• Manufacturing: decrease dev., assembly costs by 50%

• Global personal location data: increase service provider revenue by $100 B

• Europe public sector admin: increase industry value per year by €250 B

“In a big data world, a competitor that fails to sufficiently

develop its capabilities will be left behind.”

(9)

Big Data Is About*

Tapping into new data sets

Finding and monetizing previously unknown relationships

(10)

Deep Analytics

Low, predictable Latency

High Transaction Count

Big Data in Action

Acquire

Organize

Analyze

Agile Development

Massive Scalability

Real Time Results

High Throughput

In-Place Preparation

All Data Sources/Structures

Flexible Data Structures

→ Provide Value Added Services to your Customers

(11)

Early Adopters

A Divided Solution Spectrum

MapReduce Solutions Distributed File Systems Transaction (Key-Value) Stores

NoSQL

Flexible Specialized Developer Centric Schema-less Unstructured Data Variety

Acquire Organize Analyze

(12)

Early Adopter Dilemma

• Time to Build?

• Expertise?

(13)

Paradigm Shift!

• Big data is a hot topic for 2011 and beyond, driven by the user community, not the vendor community.

• Processing now goes to the data instead of data being transported to the process (traditional RDBMS).

• The competition for the big data marketplace is not between vendors but between the inspired do-it-yourself team or individual and the hardware and software vendors.

• Until recently, big data projects were being

(14)

Agenda

Big Data Overview

Oracle Big Data solutions

Oracle NoSQL Database

Architecture

Technical Overview

(15)

Hadoop to Oracle

Bridging the Gap

Hadoop MapReduce HDFS

Cassandra

Oracle Loader for

Schema-less Unstructured

Data Variety

Acquire Organize Analyze

RDBMS (OLTP)

RDBMS

(DW) Advanced Analytics

ETL

Oracle Loader for Hadoop

(16)

Oracle Integrated Software Solution

Schema-less Unstructured Data Variety Hadoop HDFS Oracle NoSQL DB Oracle Analytics: Data Mining R Oracle Oracle Loader for Hadoop

Acquire Organize Analyze

(17)

Oracle Engineered Solutions

Schema-less Unstructured Data Variety In-DB Analytics “R” Mining Text Graph Oracle NoSQL DB HDFS Hadoop Oracle Data Integrator Oracle Loader for Hadoop

Big Data Appliance

• Hadoop

• NoSQL Database

• Oracle Loader for Hadoop

• Oracle Data Integrator

Oracle Exadata

• OLTP & DW

• Data Mining & Oracle R

• Semantics

• Spatial

Exalytics

• Speed of Thought

Acquire Organize Analyze

Diagram labels: Schema, Oracle Database (DW), Oracle Database (OLTP), Graph, Spatial, Oracle BI EE, Data Integrator

(18)

Oracle’s Big Data Solution

Engineered Systems

Oracle

Big Data Appliance

Oracle Exadata

Oracle Exalytics

InfiniBand

Acquire Organize Analyze & Visualize Stream

(19)

Oracle Big Data Appliance Hardware

• 18 Sun X4270 M2 Servers

– 48 GB memory per node = 864 GB memory

– 12 Intel cores per node = 216 cores

– 24 TB storage per node = 432 TB storage

• 40 Gb/sec InfiniBand
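The rack totals on this slide are per-node figures multiplied by the node count. A minimal sketch of that arithmetic follows; the function and dictionary names are illustrative only and are not part of any Oracle tooling.

```python
# Per-node figures as listed on the slide for one 18-node rack.
NODES = 18
PER_NODE = {"memory_gb": 48, "cores": 12, "storage_tb": 24}


def rack_totals(nodes: int, per_node: dict) -> dict:
    """Multiply each per-node resource by the number of nodes."""
    return {name: nodes * amount for name, amount in per_node.items()}


if __name__ == "__main__":
    # Reproduces the slide's totals: 864 GB memory, 216 cores, 432 TB storage.
    print(rack_totals(NODES, PER_NODE))
```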

(20)

Oracle Big Data Appliance Software

• Oracle Linux 5.6

• Java Hotspot VM

• Apache Hadoop Distribution v0.20.x

• R Distribution

• Oracle NoSQL Database Enterprise Edition

• Oracle Data Integrator Application Adapter for Hadoop

(21)

Agenda

Big Data Overview

Oracle Big Data solutions

Oracle NoSQL Database

Architecture

Technical Overview

(22)

The Challenge – NoSQL

• New, rapidly emerging database technology

• Simple data storage, typically non-SQL or Not-only-SQL

• Distributed (Cloud) storage

• Large amounts of data (Terabyte – Petabyte range)

• Solution categories

– Storage for Web Services → Our focus is here

– ETL Processing (MR & Hadoop) → … and we integrate here

• Common data models

– Key-Value → Our focus is here

– Document

– Columnar

(23)

Target Use Cases

• Large semi- or un-structured data repositories

• Simple data structure, simple relationships

• High volume random reads and/or data capture

• Data capture

• Sensor data capture (e.g. IA, SmartGrid, Earth Sc., BioMedical Sc.)

• Statistics & network capture (QOS Network Mgmt)

• Web applications (click-through capture)

• Backup services for mobile devices

• Data services

• NoSQL data sharing (Earth Sci, BioMedical)

• Scalable authentication

• Real-time communication (MMS, SMS, routing)

(24)

Technical Requirements

• Based on talking to many prospects

• Requirements

• Terabytes to petabytes of unstructured or semi-structured data

• No single point of failure

• Cost effective, distributed storage

• Elastic scalability on commodity hardware

• Fast, bounded response to simple queries

• Fast, reliable transactions

• Simple administration, enterprise support

• Commercial-grade NoSQL solution

– Real 24x7 support

– Real database expertise

(25)

Oracle NoSQL DB Overview

A Distributed, Scalable Key-Value Database

• Simple Data Model

– Key-value pair with major+minor-key paradigm

– Read/insert/update/delete

• Scalability

– Dynamic data partitioning and distribution

– Optimized data access via intelligent driver

• High availability

– One or more replicas

– Resilient to partition master failures

– No single point of failure

– Disaster recovery through location of replicas

• Transparent load balancing

– Reads from master or replicas

– Driver is network topology & latency aware

• Elastic (Planned for Release 2)

– Online addition/removal of storage nodes and automatic data redistribution

Diagram labels: Application, NoSQL DB Driver, Storage Nodes, Data Center A

(26)

Oracle NoSQL Database

Enterprise Topology

• Replicated Application Servers

• Driver is linked into each Application

• Data Nodes are kept current (underlying BDB JE HA technology)

• Storage Nodes can be created across Data Centers

• Automatic Storage Node failure handling

– Graceful degradation until node is fixed, or … node is replaced

– Automatic recovery

(27)

Oracle NoSQL Database Driver

Resolving a Request

• Request: Operation + Key[M,m] + Value + Transaction Policy

• Hash Major Key to determine Partition id

• Use Partition Map to map Partition id to a Rep Group

• Use State Table to determine eligible Storage Node(s) within Rep Group

• Use Load Balancer to select best eligible Rep Node

• Response: Operation result, New Partition Map, RepNodeStorageTable information
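The resolution steps on this slide can be sketched in a few lines of Python. This is only an illustration of the flow, not the driver's actual code: the `RepNode` fields, the use of MD5 as the hash, the lowest-latency rule for the load balancer, and the rule that writes go only to the master are all assumptions made for the sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RepNode:
    name: str
    is_master: bool
    online: bool
    latency_ms: float  # hypothetical measured round-trip time


def partition_id(major_key: str, n_partitions: int) -> int:
    """Hash the major key to a partition id (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def resolve(major_key, is_write, n_partitions, partition_map, state_table):
    """Return the rep node that should serve one request."""
    pid = partition_id(major_key, n_partitions)        # hash major key
    group = partition_map[pid]                         # partition id -> rep group
    nodes = [n for n in state_table[group] if n.online]  # eligible nodes
    if is_write:
        # Assumption: only the group's master accepts writes.
        nodes = [n for n in nodes if n.is_master]
    if not nodes:
        raise RuntimeError(f"no eligible node in rep group {group}")
    # Stand-in for the load balancer: pick the lowest-latency eligible node.
    return min(nodes, key=lambda n: n.latency_ms)
```

Because only the major key is hashed, every record that shares a major key resolves to the same partition and therefore the same rep group.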

(28)

Oracle NoSQL Database Key Features

Simple Data Model

• Simple data model – key-value pair (major+minor-key paradigm)

• Simple operations – read/insert/update/delete, RMW support

• Scope of transaction – records within a major key, single API call

• Unordered scan of all data (non-transactional)
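To make the major+minor-key model concrete, here is a toy in-memory store in Python. It is a sketch under stated assumptions, not the Oracle NoSQL Database API: the class name `TinyKVStore` and its methods are hypothetical, keys are plain strings, and the all-or-nothing behaviour is emulated by working on a copy.

```python
from collections import defaultdict


class TinyKVStore:
    """Toy store addressed by (major key, minor key)."""

    def __init__(self):
        self._data = defaultdict(dict)  # major key -> {minor key: value}

    def put(self, major, minor, value):
        self._data[major][minor] = value

    def get(self, major, minor):
        return self._data.get(major, {}).get(minor)

    def delete(self, major, minor):
        records = self._data.get(major, {})
        if minor in records:
            del records[minor]
            return True
        return False

    def multi_get(self, major):
        """Return every record stored under one major key."""
        return dict(self._data.get(major, {}))

    def execute(self, major, operations):
        """Apply several operations under ONE major key, all or nothing."""
        working = dict(self._data.get(major, {}))  # work on a copy
        for op, minor, value in operations:
            if op == "put":
                working[minor] = value
            elif op == "delete":
                if minor not in working:
                    raise KeyError(minor)  # abort; the stored data is untouched
                del working[minor]
            else:
                raise ValueError(f"unknown operation: {op}")
        self._data[major] = working  # commit only after every step succeeded
```

Restricting `execute` to a single major key mirrors the slide's transaction scope: all records under one major key live together, so a multi-record change never has to span partitions.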

(29)

Oracle NoSQL Database Key Features

ACID Transactions – Write Durability

Specified on per-operation basis, application can set

defaults

Durability consists of ...

a) Sync policy (on Master and Replica)

• Sync – force to disk

• Write No Sync – force to OS buffer

• No Sync – write to local log buffer, flush when convenient

b) Replica Ack Policy

• All

• Simple Majority
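One way to picture a per-operation durability setting is as a small value object that combines a sync policy for the master, a sync policy for the replicas, and a replica acknowledgement policy. The Python below is a sketch only: the class and function names are hypothetical, only the policies named on this slide are modelled, and counting the master toward the simple majority is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class SyncPolicy(Enum):
    SYNC = "force to disk"
    WRITE_NO_SYNC = "force to OS buffer"
    NO_SYNC = "write to local log buffer, flush when convenient"


class ReplicaAckPolicy(Enum):
    ALL = "all"
    SIMPLE_MAJORITY = "simple majority"


@dataclass(frozen=True)
class Durability:
    master_sync: SyncPolicy
    replica_sync: SyncPolicy
    replica_ack: ReplicaAckPolicy


def replica_acks_needed(policy: ReplicaAckPolicy, replication_factor: int) -> int:
    """Number of replica acknowledgements to wait for before a write returns."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if policy is ReplicaAckPolicy.ALL:
        return replication_factor - 1  # every replica, i.e. all nodes but the master
    # Assumption: the master counts toward the majority, so with N nodes
    # a majority is N // 2 + 1 nodes, of which N // 2 are replicas.
    return replication_factor // 2


# A hypothetical application default: cheap on the master, majority ack.
DEFAULT = Durability(SyncPolicy.WRITE_NO_SYNC, SyncPolicy.NO_SYNC,
                     ReplicaAckPolicy.SIMPLE_MAJORITY)
```

With a replication factor of 3, `ALL` waits for both replicas while `SIMPLE_MAJORITY` waits for one, which is the usual trade between durability and write latency.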

(30)

Oracle NoSQL Database Key Features

ACID Transactions – Read Consistency

Specified on per-operation basis, default can be

changed

Consistency

Absolute (read from Master)

Time-based

Version
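The three consistency levels differ in which nodes are allowed to answer a read. The sketch below expresses that as a simple eligibility check; the `Replica` fields, the string names for the levels, and the parameters `max_lag_seconds` and `min_version` are assumptions chosen for illustration, not the product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    is_master: bool
    lag_seconds: float  # how far this node trails the master
    version: int        # highest version applied on this node


def eligible_for_read(node: Replica, consistency: str,
                      max_lag_seconds: Optional[float] = None,
                      min_version: Optional[int] = None) -> bool:
    """Decide whether a node may serve a read at the requested consistency."""
    if consistency == "absolute":
        return node.is_master  # read from the master only
    if consistency == "time":
        if max_lag_seconds is None:
            raise ValueError("time-based consistency needs max_lag_seconds")
        return node.is_master or node.lag_seconds <= max_lag_seconds
    if consistency == "version":
        if min_version is None:
            raise ValueError("version consistency needs min_version")
        return node.version >= min_version  # node has caught up to that version
    raise ValueError(f"unknown consistency level: {consistency}")
```

A typical use of the version form is read-your-own-write: the application keeps the version returned by its write and passes it as `min_version` on the next read.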

(31)

High Availability/Replication

Failure and Recovery

• Storage Node Failure

– System continues to function using remaining nodes in the replication group

– Failed nodes can be removed/replaced via the Administrative API

– Rejoining nodes automatically synchronize with the master

– Isolated nodes can still service reads as long as consistency guarantees are met

• Master Failover

– Nodes detect failure via missing heartbeat or connection failure

– Automatic election of new master, via a distributed two-phase election algorithm (Paxos)

– Master election based on highest LSN (log sequence number)

• Replication group failure

– System continues to function using remaining replication groups

– Read/Write requests to unavailable subset of keys time out
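The failover bullets mention two simple rules: a node is suspected when its heartbeat goes missing, and the new master is the node with the highest LSN. The sketch below shows only those two rules. It is not a Paxos implementation, and the timeout parameter and the tie-break by node name are assumptions.

```python
from typing import Dict, List


def suspected_nodes(last_heartbeat: Dict[str, float], now: float,
                    timeout: float) -> List[str]:
    """Nodes whose last heartbeat is older than the timeout (hypothetical rule)."""
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > timeout)


def pick_new_master(last_lsn: Dict[str, int]) -> str:
    """Choose the reachable node with the highest log sequence number.

    Ties are broken by node name, which is an assumption of this sketch.
    """
    if not last_lsn:
        raise ValueError("no candidates available")
    # max() keeps the first maximum it sees, so iterating names in sorted
    # order makes the tie-break deterministic.
    return max(sorted(last_lsn), key=lambda name: last_lsn[name])
```

Preferring the highest LSN means the most up-to-date replica takes over, so the fewest acknowledged writes need to be reconciled after the switch.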

(32)

Oracle NoSQL Database

Simple Administration

• Web-based console and CLI commands

• Manages and Monitors

– Topology

– Configuration changes

– Load: Number of operations, data size

– Performance: Latency, throughput. Min, max, average, trailing, …

– Events: Failover, recovery, load distribution

(33)

Oracle NoSQL Database Differentiation

Commercial Grade Software and Support

• General Purpose

• Reliable – Based on proven Berkeley DB JE HA

• Easy to Install & Configure

Scalable throughput and bounded Latency

• Intelligent Oracle NoSQL DB Driver

– Evenly distributes Data

– Sends operation to fastest node

– Bounded network hops

Simple Programming and Operation Model

• Simple Major + Minor key and Value data structure

• ACID transactions

• Configurable consistency and durability

Easy Management

• Web-based Console and CLI commands

• Manages and Monitors: Topology, Load, Performance, Events, Alerts

Integrates seamlessly with Oracle Stack (ODI, CEP, OLH)

(34)



Easy to use, easy to manage



Scalable, Available, Bounded Latency



A NoSQL Database from a vendor you trust

Oracle NoSQL Database

(35)

Community vs. Enterprise Edition

• Two versions

– Oracle NoSQL Database Community Edition. Open Source. GPLv2 or AGPL license.

– Oracle NoSQL Database Enterprise Edition. Closed Source. Standard Oracle License.

• Community Edition has all of the basic functionality and APIs. Gets you started. Competes with other OS NoSQL solutions.

• Enterprise Edition for large, production, multi-data center,

(36)
(37)
(38)

Oracle NoSQL Database



Easy to use, easy to manage



Scalable, Available, Bounded Latency



A NoSQL Database from a vendor you trust

(39)
(40)
(41)

Key-Value Store Workloads

Data Capture

• Write data as fast as you can

• Minimal indexing

• No referential integrity

• Relaxed durability guarantees (lower value data)

• Scale write throughput via data distribution

• Optimize write throughput per storage node (master,

append-only log file)

• Asynchronous replication

• Bulk operation support is useful for some applications

• Workload can be steady and/or bursty

(42)

Key-Value Store Workloads

Data Services

• Simple data reads, minimize I/O

• Primary key lookup(s)

• Relaxed consistency requirements (recent is good enough)

• Scale read throughput via load balancing

• Optimize read throughput per storage node – single I/O

• Node failure/partition tolerance (shift in CAP focus)

• Minimize number of operations to get data

• Workload tends to be highly random

• Data caching doesn’t help much

• Data clustering pays off

(43)

Oracle NoSQL Database

Storage Terminology

• Key space hashes into multiple hash buckets (partitions)

• A set of partitions maps to a replication group (aka “shard” – a logical container for a subset of data)

• A set of replication nodes for each replication group provides HA and read scalability for each replication group

• A Storage node (physical or virtual machine) runs each replication node

Figure label: “full” key space

(44)

Oracle NoSQL Database

Topology Example

• 100M keys, 9 Storage Nodes might be configured as

– 1000 Partitions

– 3 Rep Groups

– Replication Factor 3

– 9 Rep/Storage Nodes

Diagram: partitions (P1, P2, P3, ... P333, ...) spread over storage nodes SN1 to SN9
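The numbers in this topology example fit together as simple arithmetic. The sketch below works them through, assuming keys hash uniformly across partitions and partitions are divided as evenly as possible among rep groups; both are assumptions of the sketch, and the function name is illustrative.

```python
def topology_summary(keys: int, partitions: int, rep_groups: int,
                     replication_factor: int) -> dict:
    """Derive per-partition and per-group figures for a planned topology."""
    base, extra = divmod(partitions, rep_groups)
    # The first `extra` groups take one additional partition each.
    per_group = [base + (1 if i < extra else 0) for i in range(rep_groups)]
    keys_per_partition = keys // partitions  # assumes uniform hashing
    return {
        "rep_nodes": rep_groups * replication_factor,
        "keys_per_partition": keys_per_partition,
        "partitions_per_group": per_group,
        "keys_per_group": [p * keys_per_partition for p in per_group],
    }


if __name__ == "__main__":
    # The slide's example: 100M keys, 1000 partitions, 3 groups, factor 3.
    print(topology_summary(100_000_000, 1000, 3, 3))
```

For the slide's figures this gives 9 rep nodes, about 100,000 keys per partition, and roughly 333 partitions (about 33 million keys) per rep group, each group stored three times over.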

(45)

Oracle NoSQL Database Benefits

Commercial Grade Software and Support

Simple installation, configuration, setup

• Intelligent NoSQL DB Driver “knows” about the topology

Reliability - no single point of failure

• NoSQL DB Driver installed on all clients

• Replication groups for HA, DR

• Storage node based on proven Berkeley DB JE HA

• Automated node failover and recovery

General-purpose

(46)

Backup and Restore

• Backup

– Backups transactionally consistent per storage node, not system-wide

– Uses fast snapshots, creating hard link to database log files

– User can optionally copy files to remote storage

• Restore

– Supports system-wide recovery from remote storage

– Requires identical storage node topology

– Copy snapshot backup files to at least one replica in each replication

group and bring up nodes. Remaining nodes will synch up.

– Individual storage nodes recover via HA/replication

• Load from Backup

– Supports loading system into a new NoSQL Database with different topology
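The backup bullets describe snapshots that hard-link the database log files rather than copying them. A minimal sketch of that idea in Python follows. The directory layout, the function name, and the `.jdb` suffix for log files are assumptions for illustration; the product's own snapshot command should be used in practice.

```python
import os
from pathlib import Path
from typing import List


def snapshot(env_dir: Path, snapshot_dir: Path, suffix: str = ".jdb") -> List[Path]:
    """Hard-link every log file in env_dir into a new snapshot directory.

    A hard link adds a second name for the same data on disk, so the
    snapshot is fast and takes no extra space at the time it is made.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=False)  # never reuse a snapshot name
    linked = []
    for log_file in sorted(env_dir.glob(f"*{suffix}")):
        target = snapshot_dir / log_file.name
        os.link(log_file, target)  # requires both paths on the same filesystem
        linked.append(target)
    return linked
```

Because hard links cannot cross filesystems, shipping a backup to remote storage is a separate copy step, which matches the slide's note that copying files off the node is optional.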

(47)

Oracle NoSQL Database Benefits

Bounded Latency At Scale

Intelligent Oracle NoSQL DB Driver

• Data evenly distributed across nodes

• Operations sent to fastest appropriate node

• Bounded network hops for all operations

Optimized node configuration

• Highly-tuned memory management

Read/write scale-out with bounded latency

(48)

Oracle NoSQL Database Benefits

Simple Programming and Operational Model

Simple Major + Minor key and Value data structure

Simple, consistent transaction model

• Conflict resolution not required

• ACID transactions

Configurable consistency & durability, where needed

• Read consistency

(49)

Oracle NoSQL Database Benefits

Easy Management

Web-based console, API accessible

Manages and Monitors

• Topology

• Load

– Number of operations, data size

• Performance

– Latency, throughput. Min, max, average, trailing, …

• Events

– Failover, recovery, load distribution

• Alerts
