• No results found

SQL - The Goto Language for Big Data Analy&cs!

N/A
N/A
Protected

Academic year: 2021

Share "SQL - The Goto Language for Big Data Analy&cs!"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

SQL Made

Great

Analy&cal SQL

SQL - The Goto Language for Big Data

Analy&cs!

Hermann Bär, [email protected]

Product Management

Data Warehousing

(2)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Who Am I Not? .. But Who SHOULD be Here ..

Keith Laker

[email protected]

19 years with Oracle

Worked in consulSng, global support, onsite support and product mgnt.

Part of Data Warehouse product management team

Product Manager for analyScal SQL

Based in Manchester, UK

(3)

Safe Harbor Statement

The following is intended to outline our

general product direc&on

. It is intended for

informa&on purposes only

, and may not be incorporated into any contract. It is

not a

commitment to deliver

any material, code, or funcSonality, and should

not be relied

upon

in making purchasing decisions. The development, release, and Sming of any

features or funcSonality described for Oracle’s products remains at the sole discreSon of

Oracle.

(4)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Dear Reader –

Due to legal constraints we are not allowed to provide any material

about Oracle’s exciSng upcoming new technology. We sincerely

apologize for this.

However, there is no reason to wait and not to use the current

available version, Oracle Database 12c Release 12.1 up to its full

potenSal.

We hope you will find this material useful. Please visit us also on OTN

for further informaSon.

(5)

Agenda

SQL and Big Data

CalculaSng approximate answers

SQL made for analysis

Summary

1

2

3

4

All new

features

marked

as….

Some things you will be able to

do today and..

some things you will be able to

do tomorrow . . .

(6)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

(7)

Conceptual View of Big Data Architecture

AcSonable

Events

Event Engine

Data

Reservoir

Data Factory

InformaSon Store

Enterprise

ReporSng

Discovery Lab

AcSonable

InformaSon

AcSonable

Insights

Data

Streams

ExecuSon

InnovaSon

Discovery

Output

Events

& Data

Structured

Enterprise

Data

Other

Data

SQL

(8)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Approximate result-sets

(9)

Exploring Today’s Big Data Lakes

Key

business challenges

Many queries rely on counts and/or staSsScal calculaSons

NDVs, Pareto’s 80:20 rule, idenSfying outliers etc.

Exact processing of large data sets is resource intensive

Exploratory queries don’t require completely accurate result

Trending analysis, social analysis, sessionizaSon analyScs

Oracle’s solu&ons

Provide “approximate result” capabiliSes in SQL

Key objec&ves

Return approximate results faster, minimal deviaSon from actual

(10)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Answer “

How many

…” type quesSons

How many unique sessions today

How many unique customers logged on

How many unique events occurred

COUNT (DISTINCT . . .)

Returns the exact number of rows that

contain disSnct values of specified

expression

Can be resource intensive because

requires sorSng

… significantly faster solu&on is

APPROX_COUNT_DISTINCT

(expr)

Processes large amounts of data

significantly faster

Uses HyperLogLog algorithm

Negligible deviaSon from exact result

Ignores rows containing null values

Supports any scalar data type

Does not support BFILE, BLOB, CLOB, LONG,

LONG RAW, or NCLOB

Genng Approximate Uniqueness Counts

(11)

Results for accuracy

Real world customer workload

Accuracy that is typically

97%

with

95% confidence

Performance Results

Real world customer workload

5-50x

improvement

Notes:

This approach does not use sampling, it uses a

hash-based approach

Ignores rows that contain a null value for specified

expression

Supports any scalar data type other than BFILE,

BLOB, CLOB, LONG, LONG RAW, or NCLOB

(12)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

(13)

COUNT(DISTINCT…) processing:

SORT

operaSons

8GB

of memory

(PGA)

164GB

of

temp

(14)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

(15)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 15

With approx query processing:

No sort

Only 540MB PGA

Zero temp

1.

50x Faster

2.

15X Less Memory

3.

No temp

(16)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Approximate Count DisSnct

Independent exemplary performance and accuracy analysis

Data courtesy of ChrisSan Antognini

High accuracy with superior performance, using bounded memory

Comparison of COUNT(DISTINCT) with APPROX_COUNT_DISTINCT

(17)

SQL is made for

Analysis

(18)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

The On-Going EvoluSon of AnalyScal SQL

Introduction of

“window” functions

8i

Enhanced window functions

(percentile, etc)

Rollup, grouping sets, cube

9i

SQL Pivot

Recursive WITH

ListAgg

Nth value window

11g

Pattern matching

Top N clause

Approx. count distinct

JSON support

12c

Slide - 18

Statistical functions

SQL model clause

Partition Outer Join

Data mining

(19)

Making Code Simpler: Pattern Matching with SQL

Java vs. SQL: Searching for ‘W’ Patterns in Stock Trade Data

250+ Lines of Java and PIG

package pigstuff; import java.io.IOException; import java.util.ArrayList; import java.util.Iterator; import org.apache.pig.EvalFunc; import org.apache.pig.PigException; import org.apache.pig.backend.executionengine.ExecException; import org.apache.pig.data.BagFactory; import org.apache.pig.data.DataBag; import org.apache.pig.data.DataType; import org.apache.pig.data.Tuple; import org.apache.pig.data.TupleFactory; import org.apache.pig.impl.logicalLayer.FrontendException; import org.apache.pig.impl.logicalLayer.schema.Schema; /** * * @author nbayliss */

private class V0Line { String state = null; String[] attributes; String prev = "”; String next = ””;

public V0Line(String[] atts) { attributes = atts;

}

public String[] getAttributes() { return attributes;

}

public void setState(String state) { this.state = state;

}

public String setState(V0Line linePrev, V0Line lineNext) { private boolean eq(String a, String b) {

private boolean gt(String a, String b) {

public Tuple exec(Tuple input) throws IOException { @Override

public Schema outputSchema(Schema input) { Schema.FieldSchema linenumber = new

Schema.FieldSchema("linenumber", DataType.CHARARRAY); Schema.FieldSchema pbykey = new

Schema.FieldSchema("pbykey", DataType.CHARARRAY);

Schema.FieldSchema count = new Schema.FieldSchema("count", DataType.LONG);

Schema tupleSchema = new Schema(); tupleSchema.add(linenumber); tupleSchema.add(pbykey); tupleSchema.add(count);

return new Schema(tupleSchema); }

}

Find a W-shape

paXern in a Scker

stream:

Output the

beginning

and

ending

date of the paXern

Calculate

average price

each the W-shape

Find only paXerns that

(20)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Making Code Simpler: Pattern Matching with SQL

Java vs. SQL: Searching for ‘W’ Patterns in Stock Trade Data

SELECT first_x, last_z

FROM ticker

MATCH_RECOGNIZE (

PARTITION BY name ORDER BY time

MEASURES FIRST(x.time) AS first_x,

LAST(z.time) AS last_z

ONE ROW PER MATCH

PATTERN (X+ Y+ W+ Z+)

DEFINE X AS (price < PREV(price)),

Y AS (price > PREV(price)),

W AS (price < PREV(price)),

Z AS (price > PREV(price) AND

z.time - FIRST(x.time) <= 7 ))

12 Lines of SQL

250+ Lines of Java and PIG

package pigstuff; import java.io.IOException; import java.util.ArrayList; import java.util.Iterator; import org.apache.pig.EvalFunc; import org.apache.pig.PigException; import org.apache.pig.backend.executionengine.ExecException; import org.apache.pig.data.BagFactory; import org.apache.pig.data.DataBag; import org.apache.pig.data.DataType; import org.apache.pig.data.Tuple; import org.apache.pig.data.TupleFactory; import org.apache.pig.impl.logicalLayer.FrontendException; import org.apache.pig.impl.logicalLayer.schema.Schema; /** * * @author nbayliss */

private class V0Line { String state = null; String[] attributes; String prev = "”; String next = ””;

public V0Line(String[] atts) { attributes = atts;

}

public String[] getAttributes() { return attributes;

}

public void setState(String state) { this.state = state;

}

public String setState(V0Line linePrev, V0Line lineNext) { private boolean eq(String a, String b) {

private boolean gt(String a, String b) {

public Tuple exec(Tuple input) throws IOException { @Override

public Schema outputSchema(Schema input) { Schema.FieldSchema linenumber = new

Schema.FieldSchema("linenumber", DataType.CHARARRAY); Schema.FieldSchema pbykey = new

Schema.FieldSchema("pbykey", DataType.CHARARRAY);

Schema.FieldSchema count = new Schema.FieldSchema("count", DataType.LONG);

Schema tupleSchema = new Schema(); tupleSchema.add(linenumber); tupleSchema.add(pbykey); tupleSchema.add(count);

return new Schema(tupleSchema); }

}

20

(21)

SQL

analy&cs

for all

your data

delivers faster processing,

richer analyScs over all your

data – relaSonal, Hadoop,

Hive, NoSQL, event streams

etc

Faster and Smarter SQL

Faster and resource

efficient and processing for

counSng and staSsScal

driven queries

Richer SQL

New richer funcSons for

deeper analysis of big data,

IoT data sets

MATCH_RECOGNIZE

SQL is Made for AnalyScs

AcSonable Events Event Engine Data Reservoir Data Factory

Enterprise InformaSon Store

ReporSng Discovery Lab AcSonable InformaSon AcSonable Insights Data Streams ExecuSon InnovaSon Structured Enterprise Data Other Data

(22)

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The preceding is intended to outline our general product direcSon. It is intended for

informaSon purposes only, and may not be incorporated into any contract. It is not a

commitment to deliver any material, code, or funcSonality, and should not be relied upon

in making purchasing decisions. The development, release, and Sming of any features or

funcSonality described for Oracle’s products remains at the sole discreSon of Oracle.

(23)

References

Related documents

Our hypothesis is that physical infrastructure as well as certain institutional characteristics, or what we call complementary infrastructure—the existence of repo markets,

Draw- ing on this research, we created a set of recommendations to im- prove the organization and created a set of marketing materials: a logo and promotional video for the

(b) According to section 79, in the case of a company in which public are not substantially interested, the unabsorbed business loss relating to any assessment year can be carried

Bringing Customers In According to Glenn, Transmission Magicians doesn’t do a lot of marketing to bring customers in the door. Most of their work comes in through referrals

individual practitioners, involving forces that push and pull them to conform to the community’s expectations. This thesis is an examination of how Ashtanga practitioners engage

Recurring EBITDA: EBITDA before non-recurring (exceptional) items. *Deconsolidation of CMore in November 2008. Diversification figures for 2008 excl.. Figures 2009 preliminary

In conclusion, for the studied Taiwanese population of diabetic patients undergoing hemodialysis, increased mortality rates are associated with higher average FPG levels at 1 and

The main wall of the living room has been designated as a &#34;Model Wall&#34; of Delta Gamma girls -- ELLE smiles at us from a Hawaiian Tropic ad and a Miss June USC