Oracle Big Data, In-Memory, and Exadata - One Database Engine to Rule Them All

Dr.-Ing. Holger Friedrich


(1)
(2)

Agenda

• Introduction

• Old Times

• Exadata

• Big Data

• Oracle In-Memory

• Headquarters

(3)

sumIT AG

• Consulting and implementation services in Switzerland

• Experts for

– Data Warehousing,

– Business Intelligence,

– and Big Data solutions

• Focussed on Oracle technology

• ‘BI Foundation specialized’ partner

• ‘Data Warehousing specialized’ partner

• Our motto: Get Value From Data

• Visit our web site: www.sumit.ch


(4)

Holger Friedrich

• Computer Science diploma from Karlsruhe Institute of Technology (KIT)

• Ph.D. in Robotics and Machine Learning

• More than 16 years of experience with Oracle technology

• Expert for

– Data Integration,

– Data Warehousing,

– Data Mining, and

– Business Intelligence

• Technical Director of sumIT AG

(5)


(6)

DB Architecture - Old Times

• Old times = 1977 - 2008

• SGA - System Global Area

- Shared Pools (Library Cache etc.)

- Redo Log Buffer

- Buffer Cache

• Persistent Storage

- Disk & Tape

- serve database blocks

• PGA - Program Global Area

- Query specific processing 


(7)

Query Processing - Old Times

Server Process

(8)


(9)
(10)

Exadata - Architecture

• Databases and applications deployed and configured without any adaptations

• Fast network via InfiniBand

• Regular compute servers

• Dedicated storage servers

- organised in cells

- discs & flash attached

- run Exadata Storage


(11)

Exadata - The Secret Sauce

Three reasons for outstanding Exadata performance

• Hardware engineering

• Local query processing functionality in storage layer

• Database engine ‘aware’ of ‘intelligent’ storage layer

- extended optimizer costing model and transformations

- extended SW to use Exadata Storage APIs

Divide and conquer for query processing

• not just with slave processes (PARALLEL)

• not just between compute nodes (RAC)

(12)

Exadata - Storage Software Evolution

• Smart Scanning

- execute sub-query in storage cells

- project results in storage already

• Keep hot data in Flash Cache

• Storage Indexes

- collect min/max column values

- reduce disc access

• Smart scanning directly on HCC data - no decompression required

• Offload mining tasks like scoring

• Additional data caching in
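As a rough illustration of the smart-scan idea above (not Oracle's implementation), the following Python sketch lets a hypothetical storage cell evaluate the filter and the column projection itself, so that only matching rows and requested columns travel to the database server. The names `StorageCell`, `full_read`, and `smart_scan` are assumptions made for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class StorageCell:
    """Hypothetical storage cell holding rows locally (illustration only)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def full_read(self) -> List[Row]:
        # "Old times": ship every row with every column to the database server.
        return list(self.rows)

    def smart_scan(self, predicate: Callable[[Row], bool],
                   columns: List[str]) -> List[Row]:
        # Filter and project inside the cell; only the reduced result is shipped.
        return [{c: row[c] for c in columns}
                for row in self.rows if predicate(row)]


cell = StorageCell([
    {"order_num": 1, "cust_num": "A", "order_total": 10.0, "description": "x"},
    {"order_num": 2, "cust_num": "B", "order_total": 250.0, "description": "y"},
    {"order_num": 3, "cust_num": "A", "order_total": 900.0, "description": "z"},
])

# Only two rows with two columns each leave the cell.
shipped = cell.smart_scan(lambda r: r["order_total"] > 100,
                          ["order_num", "order_total"])
print(shipped)
```

The point of the design is the reduced data volume between storage and compute nodes: the database server receives a pre-filtered, pre-projected result instead of raw blocks.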

(13)


(14)

Information Mgmt Reference Architecture

(15)
(16)
(17)

Big Data - Challenges

• Dynamic ecosphere

- Pre-packaged distributions

- Oracle Big Data Appliance

• Analytics

- Tools of Hadoop ecosphere

- Oracle Big Data Analytics

• Data Integration

- Ever changing Hadoop tool set

- Oracle Data Integrator

(18)

Big Data Appliance - The Secret Sauce

Three reasons for outstanding BDA performance

• Hardware engineering

• Local query processing functionality in storage layer

- Big Data SQL = Exadata Storage Software on HADOOP

- Added as process engine to the HADOOP process layer

- BDS agents run independently on HADOOP nodes

• Database engine ‘aware’ of ‘intelligent’ big data layer

- extended and enhanced External Table API

- extended optimizer costing model and transformations

• Exadata success and performance on Big Data

(19)

Big Data SQL - Smart Scan

1. Read data from HDFS data node

- Direct-path reads

- C-based readers when possible

- native HADOOP classes otherwise

2. Translate bytes to Oracle format

3. Smart scan on Oracle format

- apply storage indexes (BDS2.0)

- filtering

- column projection

- parsing JSON/XML
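The three steps above can be pictured as a small pipeline. The Python sketch below is a simplified model, not the Big Data SQL code: a byte string stands in for an HDFS block, JSON documents stand in for the stored records, and the function names `read_block`, `translate`, and `smart_scan` are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_block(raw: bytes) -> Iterator[bytes]:
    # Step 1: read raw records from a (simulated) HDFS block, one per line.
    for line in raw.splitlines():
        if line:
            yield line


def translate(records: Iterable[bytes]) -> Iterator[Row]:
    # Step 2: turn bytes into typed rows; here every record is a JSON document.
    for record in records:
        yield json.loads(record.decode("utf-8"))


def smart_scan(rows: Iterable[Row], predicate: Callable[[Row], bool],
               columns: List[str]) -> List[Row]:
    # Step 3: filter and project on the node before anything is sent back.
    return [{c: row.get(c) for c in columns} for row in rows if predicate(row)]


block = (b'{"cust": "A", "total": 10}\n'
         b'{"cust": "B", "total": 250}\n'
         b'{"cust": "A", "total": 900}\n')

result = smart_scan(translate(read_block(block)),
                    lambda r: r["total"] > 100, ["cust"])
print(result)
```

Because the stages are generators, records stream through the pipeline one at a time and only the filtered, projected rows are materialized.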

(20)

Big Data SQL 2.0 - Storage Indexes

• New feature of Big Data SQL 2.0

• Avoid unnecessary disc access on HADOOP nodes

• Index built during first full scan

• Granularity in HDFS blocks (256MB)

• Index application

- receive filter predicate

- check storage index for blocks where the predicate value lies between min and max
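A minimal sketch of the min/max idea, assuming one integer column and non-empty blocks. The class `StorageIndex` and its methods are hypothetical names for illustration and do not reflect the actual Big Data SQL data structures.

```python
from typing import Dict, List, Tuple


class StorageIndex:
    """Min/max per block for one column (hypothetical illustration)."""

    def __init__(self) -> None:
        self.min_max: Dict[int, Tuple[int, int]] = {}

    def build(self, blocks: List[List[int]]) -> None:
        # Populated as a side effect of the first full scan.
        # Assumes every block holds at least one value.
        for block_id, values in enumerate(blocks):
            self.min_max[block_id] = (min(values), max(values))

    def candidate_blocks(self, value: int) -> List[int]:
        # A block can only contain `value` if min <= value <= max.
        return [block_id for block_id, (lo, hi) in self.min_max.items()
                if lo <= value <= hi]


blocks = [[1, 5, 9], [20, 25, 31], [8, 12, 15]]
index = StorageIndex()
index.build(blocks)

# Block 1 is skipped entirely; block 2 is a false positive that still gets read.
to_read = index.candidate_blocks(9)
print(to_read)
```

The index never produces false negatives, so skipping a block is always safe; it can only fail to skip a block whose range happens to cover the value.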

(21)
(22)

Extended External Tables - HIVE

CREATE TABLE order (cust_num    VARCHAR2(10),
                    order_num   VARCHAR2(20),
                    order_date  DATE,
                    item_cnt    NUMBER,
                    description VARCHAR2(100),
                    order_total NUMBER(8,2))
  ORGANIZATION EXTERNAL
  (TYPE oracle_hive
   ACCESS PARAMETERS (
     com.oracle.bigdata.tablename: order_db.order_summary
     com.oracle.bigdata.colmap:    {"col":"ITEM_CNT", \
                                    "field":"order_line_item_count"}
     com.oracle.bigdata.overflow:  {"action":"TRUNCATE", \
                                    "col":"DESCRIPTION"}
     com.oracle.bigdata.erroropt:  [{"action":"replace", \
                                     "value":"INVALID_NUM", \
                                     "col":["CUST_NUM","ORDER_NUM"]}, \
                                    {"action":"reject", \
                                     "col":"ORDER_TOTAL"}]
   ));

new type: TYPE oracle_hive

optional settings: the ACCESS PARAMETERS entries


(23)

Extended External Tables - HDFS

CREATE TABLE order (cust_num    VARCHAR2(10),
                    order_num   VARCHAR2(20),
                    order_date  DATE,
                    item_cnt    NUMBER,
                    description VARCHAR2(100),
                    order_total NUMBER(8,2))
  ORGANIZATION EXTERNAL
  (TYPE oracle_hdfs
   ACCESS PARAMETERS (
     com.oracle.bigdata.rowformat: \
       SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
     com.oracle.bigdata.fileformat: \
       INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' \
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     com.oracle.bigdata.colmap:   {"col":"item_cnt", \
                                   "field":"order_line_item_count"}
     com.oracle.bigdata.overflow: {"action":"TRUNCATE", \
                                   "col":"DESCRIPTION"}
   )
   LOCATION ("hdfs:/usr/cust/summary/*"));

new type: TYPE oracle_hdfs

optional settings: the ACCESS PARAMETERS entries


(24)


(25)

Columnar Stores - Oracle’s Flavour

• transparent column store managed next to the row store

• not either/or

• persistent storage row-based as before

• column store DML-synched in real-time

• the entire Oracle DB-ecosphere remains unchanged

- security

- backup

- disaster recovery

- RAC
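To make the "not either/or" point concrete, here is a toy Python model of a table that keeps a row store and a column store side by side and updates both on every insert. It is only a sketch under simplifying assumptions: the class `DualFormatTable` is hypothetical, and the slides do not describe how Oracle actually synchronizes the two formats.

```python
from typing import Dict, List

Row = Dict[str, float]


class DualFormatTable:
    """Toy table with a row store and a column store kept in sync."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.row_store: List[Row] = []  # stands in for the row-based storage
        self.column_store: Dict[str, List[float]] = {c: [] for c in columns}

    def insert(self, row: Row) -> None:
        # A single DML call updates both representations.
        self.row_store.append(dict(row))
        for c in self.columns:
            self.column_store[c].append(row[c])

    def lookup(self, position: int) -> Row:
        # OLTP-style access: fetch one complete row from the row store.
        return self.row_store[position]

    def column_sum(self, column: str) -> float:
        # Analytic access: scan a single column from the column store.
        return sum(self.column_store[column])


orders = DualFormatTable(["order_num", "order_total"])
orders.insert({"order_num": 1, "order_total": 10.0})
orders.insert({"order_num": 2, "order_total": 250.0})

print(orders.lookup(1))
print(orders.column_sum("order_total"))
```

Callers never choose a format explicitly; each access path simply reads from the representation that suits it, which mirrors the transparency claimed on the slide.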

(26)

Advantages

• Best for queries that

- scan large quantities of data

- on a rather small set of columns

- compute aggregates on the results

• High compression benefits on most columns (except ones containing distinct values)
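Why columns with few distinct values compress well, while columns of all-distinct values do not, can be seen with plain dictionary encoding. The sketch below uses a deliberately crude size model (characters in the dictionary plus one unit per code), which is an assumption made for illustration. It is not Oracle's compression format, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    # Map each distinct value to a small integer code.
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def raw_cost(values: List[str]) -> int:
    # Crude size model: one unit per stored character.
    return sum(len(v) for v in values)


def encoded_cost(values: List[str]) -> int:
    # Crude size model: dictionary characters plus one unit per code.
    dictionary, codes = dictionary_encode(values)
    return sum(len(v) for v in dictionary) + len(codes)


# Low-cardinality column: two distinct values repeated many times.
countries = ["Switzerland", "Germany", "Switzerland", "Switzerland", "Germany"] * 200
# High-cardinality column: every value is distinct.
order_ids = [f"ORD-{i:06d}" for i in range(1000)]

print(raw_cost(countries), encoded_cost(countries))
print(raw_cost(order_ids), encoded_cost(order_ids))
```

Under this model the country column shrinks considerably, whereas the order-id column grows, because its dictionary is as large as the data itself.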

(27)

Technology Gems

1. In-memory storage index

2. Filtering on binary compressed data

3. Columnar storage of selected columns

4. Transparent querying across storage hierarchy

5. Real-time background actualization of columnar store

6. Parallel query execution on the columnar store

7. SIMD vector processing

8. In-memory fault tolerance on RAC
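Gem 2, filtering on compressed data, can be sketched with an order-preserving dictionary: a range predicate on the original values becomes an integer comparison on the codes, so the column never has to be decompressed. This is a simplified illustration with hypothetical function names, not a description of Oracle's internal format.

```python
import bisect
from typing import Dict, List, Tuple


def encode(values: List[str]) -> Tuple[List[str], List[int]]:
    # Sorted dictionary, so code order equals value order.
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def positions_less_than(dictionary: List[str], codes: List[int],
                        literal: str) -> List[int]:
    # Translate the literal into a code threshold once ...
    threshold = bisect.bisect_left(dictionary, literal)
    # ... then compare small integers instead of strings.
    return [pos for pos, code in enumerate(codes) if code < threshold]


cities = ["Bern", "Zurich", "Basel", "Geneva", "Zurich", "Bern"]
dictionary, codes = encode(cities)

# Equivalent to: WHERE city < 'H'
matches = positions_less_than(dictionary, codes, "H")
print(matches)
```

A loop over small integer codes of this kind is also the sort of work that gem 7, SIMD vector processing, is meant to accelerate.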

(28)

Example - In-Memory Aggregation

• New optimizer transformation Vector Group By

• Resembles well-known star transformation

• Two phase, 6 step process

• Phase 1 - preparation

1. Scan dimensions

2. Build key vectors

3. Prepare accumulator

4. Build temporary tables for the selected dimension attributes

• Phase 2 - computation

5. Scan facts w.r.t.
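The two phases can be sketched in a few lines of Python. This is a conceptual model only: the function `vector_group_by`, the use of a dict as key vector, and the final combination of accumulator and temporary table are assumptions for illustration, since the slide's step list is incomplete and Oracle's actual data structures are not shown.

```python
from typing import Callable, Dict, List, Tuple


def vector_group_by(dim_rows: List[Tuple[int, str]],
                    fact_rows: List[Tuple[int, float]],
                    dim_filter: Callable[[str], bool]) -> Dict[str, float]:
    # Phase 1: scan the dimension, build the key vector and the temp table.
    key_vector: Dict[int, int] = {}  # dimension key -> dense group id
    temp_table: List[str] = []       # group id -> selected dimension attribute
    group_of: Dict[str, int] = {}
    for dim_key, attribute in dim_rows:
        if not dim_filter(attribute):
            continue                 # filtered keys get no key vector entry
        if attribute not in group_of:
            group_of[attribute] = len(temp_table)
            temp_table.append(attribute)
        key_vector[dim_key] = group_of[attribute]

    # Prepare the accumulator: one slot per group.
    accumulator = [0.0] * len(temp_table)

    # Phase 2: scan the facts once, aggregating through the key vector.
    for dim_key, amount in fact_rows:
        group_id = key_vector.get(dim_key)
        if group_id is not None:
            accumulator[group_id] += amount

    # Assumed final step: combine accumulator and temp table into the result.
    return dict(zip(temp_table, accumulator))


dims = [(1, "CH"), (2, "DE"), (3, "CH"), (4, "FR")]
facts = [(1, 10.0), (2, 5.0), (3, 2.5), (4, 99.0), (1, 1.0)]

totals = vector_group_by(dims, facts, lambda country: country != "FR")
print(totals)
```

The fact table is scanned only once, and each fact row costs a single lookup plus an addition, which is why the approach suits large scans with small dimension tables.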


(29)

In-Memory - The Secret Sauce

Many reasons for outstanding In-Memory performance

• Conceptual advantage of columnar format

• Speed of processing in DRAM

• Sum of technology gems (see earlier)

• Database engine ‘aware’ of the columnar store’s capabilities

- extended optimizer costing model and transformations

- extended SW to use the columnar store’s APIs

• Unprecedented performance for analytics

(30)


(31)

Headquarters

Wikipedia: "Headquarters (HQ) denotes the location where most, if not all, of the important functions of an organization are coordinated."

[Figure: the query process in the DB as HQ, connected to Columnar Store, Exadata Storage, Big Data Storage, Block Buffer, and Disks]

(32)

The Database Kernel Rules Them All

Query Franchising in action

• optimizer generates execution plan

• partial queries are sent out to other engines

- Big Data (SQL)

- Columnar in-memory store

- Exadata storage

• partial results are received & further processed

• security policies are applied

• final results are delivered
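The franchising flow above can be mimicked with a tiny coordinator. In this Python sketch the three "engines" are plain in-memory lists and the names `make_engine` and `franchise_query` are hypothetical. It only illustrates the order of the steps and says nothing about how the Oracle kernel implements them.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
Engine = Callable[[Predicate], List[Row]]


def make_engine(rows: List[Row]) -> Engine:
    # Each engine evaluates the partial query on its own data.
    def run(predicate: Predicate) -> List[Row]:
        return [row for row in rows if predicate(row)]
    return run


def franchise_query(engines: Dict[str, Engine], predicate: Predicate,
                    policy: Predicate) -> List[Row]:
    partial_results: List[Row] = []
    for engine in engines.values():
        # Partial queries are sent out, partial results are received.
        partial_results.extend(engine(predicate))
    # Security policies are applied centrally, in the kernel.
    visible = [row for row in partial_results if policy(row)]
    # Further processing (here: a sort) before final delivery.
    return sorted(visible, key=lambda row: row["id"])


engines = {
    "big_data_sql": make_engine([{"id": 3, "region": "CH", "total": 5},
                                 {"id": 4, "region": "DE", "total": 7}]),
    "in_memory": make_engine([{"id": 1, "region": "CH", "total": 9}]),
    "exadata": make_engine([{"id": 2, "region": "CH", "total": 1},
                            {"id": 5, "region": "FR", "total": 8}]),
}

final = franchise_query(engines,
                        predicate=lambda r: r["total"] > 4,
                        policy=lambda r: r["region"] != "DE")
print([row["id"] for row in final])
```

Keeping the security policy in the coordinator rather than in each engine reflects the slide's point that one kernel stays in charge of the whole query.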

(33)

The Key Lies in The Kernel

Database optimizer and execution engine make it happen

• Transformer:

- new transformations

• Estimator:

- new cost estimation models

• Execution engine:

- extended calls and APIs

Only possible because Oracle owns all implementations

(34)

Crucial Part - The Dictionary

• The optimizer’s estimates rely on

- the data dictionary

- statistics

• Data Dictionary knows all objects

- Exadata: regular db objects

- In-memory: regular db objects

- Big Data: defined through External Table declaration

Estimating statistics about Big Data objects


(35)


(36)

Conclusions

• Exadata - boosts execution for traditional applications and analytics

• Big Data - provides affordable data management for large volumes of data, including unstructured data

• In-Memory - serves mighty fast scans, joins, and aggregations for analytics

• With other vendors these technologies are either

- not available in the desired quality

- or not tightly integrated, if at all

• Data silos & isolated solutions are being built again

• But: Oracle provides top solutions for each

• In fact: Oracle provides the only portfolio with

- all three technologies tightly integrated
