• No results found

Vertica Live Aggregate Projections

N/A
N/A
Protected

Academic year: 2021

Share "Vertica Live Aggregate Projections"

Copied!
71
0
0

Loading.... (view fulltext now)

Full text

(1)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Vertica Live Aggregate Projections

Modern Materialized Views for Big Data

Nga Tran - HPE Vertica - [email protected]

(2)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Outline

• What is Big Data?

• How Vertica provides Big Data Solutions?

• What are Materialized Views (MVs)?

• What are the Roles of MVs in Big Data?

• How does Vertica implement their MVs?

Why Vertica names their MVs Live Aggregare Projections (LAPs)?

What are the differences between LAPS & MVs?

(3)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 3

(4)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 4

What is Big Data?

(5)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 5

What is Big Data?

You have

a lot of

data.

(6)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 6

What is Big Data?

You have

a lot of

data.

You want to mine your data for “

treasures”

(7)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 7

What is Big Data?

You have

a lot of

data.

You want to mine your data for “

treasures”

You want

sub-second

mining processes. Why?

(8)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 8

What is Big Data?

You have

a lot of

data.

You want to mine your data for “

treasures”

You want

sub-second

mining processes. Why?

“Treasures” no more

after

that time.

(9)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 9

What is Big Data?

You have

a lot of

data.

You want to mine your data for “

treasures”

You want

sub-second

mining processes. Why?

“Treasures” no more

after

that time.

“Treasures” help with

urgent

decisions.

(10)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 10

What is Big Data?

(11)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 11

What is Big Data?

HAVE:

a lot of

data.

WANT: dig them for

treasures

in

sub-second

(12)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 12

What is Big Data?

(13)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 13

What is Big Data?

Places to store and access your data

safely & efficiently.

(14)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 14

What is Big Data?

Places to store and access your data

safely & efficiently.

Tools to analyze them quickly.

(15)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 15

What is Big Data?

Places to store and access your data

safely & efficiently.

Tools to analyze them quickly.

How you make it?

[Columnar] Database

Analytic Tools

(16)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 16

What is Big Data?

Places to store and access your data

safely & efficiently.

Tools to analyze them quickly.

How you make it?

[Columnar] Database

Analytic Tools

(17)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Outline

• What is Big Data?

• How Vertica provides Big Data Solutions?

What are Materialized Views (MVs)?

What are the Roles of MVs in Big Data?

How does Vertica implement their MVs?

Why Vertica names their MVs Live Aggregare Projections (LAPs)?

What are the differences between LAPS & MVs?

(18)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 18

(19)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 19

How Vertica provides Big Data Solutions?

(20)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 20

What is Vertica Analytic Database?

(21)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 21

Column Store

Columnar and MPP RDBMS and More

Conventional databases store a

“heap file” with all records in row

order

id name game1_score game2_score profile_txt

1 Michael 1120 42 <long string> 2 Christopher 9001 17 <long string> 3 Matthew 4041 17 <long string> 4 Jessica 7218 42 <long string> 5 Ashley 3364 42 <long string>

Column stores store each column separately,

so you only read what you need

SELECT id, game2_score FROM scores;

id 1 2 3 4 5 name Michael Christopher Matthew Jessica Ashley game1_score 1120 9001 4041 7218 3364 game2_score 42 17 17 42 42 profile_txt

<big long string> <big long string> <big long string> <big long string> <big long string>

(22)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 22

Sorted

Column Store

Columnar and MPP RDBMS and More

Want the highest values? Trivial if it’s sorted.

Joining two tables? If they’re sorted on the join key, just line up the rows.

id

1

4

5

2

3

name

Michael

Jessica

Ashley

Christopher

Matthew

game1_score

1120

7218

3364

9001

4041

game2_score

42

42

42

17

17

profile_txt

<big long string>

<big long string>

<big long string>

<big long string>

<big long string>

SELECT id, game2_score

FROM scores

ORDER BY game2_score DESC

LIMIT 3;

name

Michael

Christopher

Matthew

Jessica

Ashley

id

1

2

3

4

5

user_id

1

1

1

2

2

last_login_time

7:32 PM

5:18 PM

4:59 PM

3:02 PM

1:03 PM

SELECT scores.name,

MAX(sessions.last_login_time)

FROM scores, sessions

WHERE scores.id = sessions.user_id

GROUP BY scores.id, scores.name;

(23)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 23

Compressed

Sorted

Column Store

Columnar and MPP RDBMS and More

Sorted fields often have lots of duplicate values. Let’s de-duplicate!

Saves lots of disk space, disk IO

Values don’t need to be identical

They will be similar; many compression schemes work well

Can compute directly on compressed data for more performance

COUNT(column) secretly rewritten as SUM(run_count)

SUM(column) secretly rewritten as SUM(run_count * value)

user_id

1

1

1

2

2

user_id

1 x3

2 x2

-

run_count of 3

-

run_count of 2

(24)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 24

Distributed

Compressed

Sorted

Column Store

Columnar and MPP RDBMS and More

We’re already lining up join keys

Let’s segment the data across machines

Enforce that matching values are always on the same node

Means we can JOIN in parallel

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

Node_01

Node_02

(25)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 25

ACID-Compliant

Distributed

Compressed

Sorted Column Store

Columnar and MPP RDBMS and More

The world is moving away from straight Map/Reduce; back towards transactional systems

Google (“F1: A Distributed SQL Database That Scales”; VLDB 2013)

Yahoo (“Peta-Scale Data Warehousing at Yahoo!”; Sigmod 2009)

Consistency is really hard !

If you leave it to each application developer, rather than guaranteeing it always, you get bugs.

?

?

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

(26)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 26

ACID-Compliant

Distributed

Compressed

Sorted Column Store

Columnar and MPP RDBMS and More

The problem: Data being loaded and queried on all nodes by many users all at once

Must provide a consistent view of the data

Either you see the whole change (on all machines) or none of it

Must scale linearly (more nodes + more data == same performance)

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

Node_01

Node_02

(27)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 27

ACID-Compliant

Distributed

Compressed

Sorted Column Store

Columnar and MPP RDBMS and More

The solution: Distributed Agreement

“Spread” (open-source library)

Node_01

Node_02

Send carefully-designed control messages around a

ring of nodes

Ring enforces order

Arbitrary/unordered processing between messages

UPDATE

SELECT

INSERT

Is the above diagram really accurate? No. If you want to know more, come interview with us!

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

(28)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 28

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

Node_01

Node_02

UPDATE

SELECT

INSERT

?

a = true

WHEN b > 10

DBD

Query Optimizer

TM

Columnar and MPP RDBMS and More

(29)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 29

Query Optimizer

Columnar and MPP RDBMS and More

Query

Optimizer

Query

Optimizer

SQL Query

Statistics & Histogram

Query Plan

(30)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 30

Database Designer

Columnar and MPP RDBMS and More

DBD

DBD

Tables & Constraints

Data Set

Queries

Projection Set

Scenario:

Workload + Space Budget

Query Performance

Storage Footprint

Fault Tolerance

Recovery

(31)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 31

Tuple Mover

Columnar and MPP RDBMS and More

TM

TM

Storage

Containers

Fewer but larger

Storage Containers

(32)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 32

Analytic Queries and Functions

Columnar and MPP RDBMS and More

•Time Series:

evaluate the values of a given set of variables over time and group those values

into a window

SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid

FROM Tickstore

WHERE symbol IN ('MSFT', 'IBM')

TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts)

•Event Series Pattern Matching

SELECT uid, sid, ts, refurl, pageurl, action, event_name(), pattern_id(), match_id()

FROM clickstream_log

MATCH (PARTITION BY uid, sid ORDER BY ts

DEFINE Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',

Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',

Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'

PATTERN P AS (Entry Onsite* Purchase)

ROWS MATCH FIRST EVENT);

(33)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 33

Analytic Queries and Functions

Columnar and MPP RDBMS and More

Vertica can run R , C++ and Java extensions

internally.

We’ve even enabled parallel/distributed execution

within R itself

• “distributedR”

• Extensive collaboration with HP Labs and various

universities

• “Presto: Distributed Machine Learning and Graph

Processing with Sparse Matrices” (Eurosys 2013)

• Can use Vertica as a data store and preprocessor

(34)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 34

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

Node_01

Node_02

UPDATE

SELECT

INSERT

Web App

BI Tools

?

a = true

WHEN b > 10

HDFS

R

DBD

Query Optimizer

Flex

(Unstructured)

TM

Text Search

LAPs

UDx

(35)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 35

name

Michael

id

1

user_id

1 x3

last_login_time

7:32 PM

5:18 PM

4:59 PM

name

Christopher

id

2

user_id

2 x2

last_login_time

3:02 PM

1:03 PM

Node_01

Node_02

UPDATE

SELECT

INSERT

Web App

BI Tools

?

a = true

WHEN b > 10

HDFS

R

DBD

Query Optimizer

Flex

(Unstructured)

TM

Text Search

LAPs

UDx

(36)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 36

What are LAPs?

LAPs = Live Aggregate Projections

~ Materialized Views in other Databases

(37)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Outline

• What is Big Data?

How Vertica provides Big Data Solutions?

What are Materialized Views (MVs)?

What are the Roles of MVs in Big Data?

How does Vertica implement their MVs?

Why Vertica names their MVs Live Aggregare Projections (LAPs)?

What are the differences between LAPS & MVs?

(38)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 38

What are Materialized Views (MVs)?

(39)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 39

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT Emp, DateOfSale, Amount, Location

FROM SalesData, Employee

WHERE Emp = EmpID;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Employee

SalesData

(40)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 40

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT Emp, DateOfSale, Amount, Location

FROM SalesData, Employee

WHERE Emp = EmpID;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Employee

SalesData

(41)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 41

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT Emp, DateOfSale, Amount, Location

FROM SalesData, Employee

WHERE Emp = EmpID;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Employee

SalesData

Normal Way: Run above query and get the results.

If many joins and a lot of data, the query can be

slow

(42)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 42

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT Emp, DateOfSale, Amount, Location

FROM SalesData, Employee

WHERE Emp = EmpID;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Employee

SalesData

Normal Way: Run above query and get the results.

If many joins and a lot of data, the query can be

slow

–> For better performance,

(43)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 43

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT * from Sales;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Emp

DateOfSale

Amount

Location

Nga

2013-12-12

15K

Chicago

Natalya

2013-12-30

7K

New York

Jaimin

2014-01-10

3K

Boston

Nga

2014-01-20

11K

Chicago

Jaimin

2014-01-30

10K

Boston

Employee

SalesData

Sales

The query before becomes simpler and run faster

(44)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 44

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 1: Avoid Expensive Joins

SELECT * from Sales;

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

EmpID

Location

...

Jaimin

Boston

...

Natalya

New York

Nga

Chicago

...

Emp

DateOfSale

Amount

Location

Nga

2013-12-12

15K

Chicago

Natalya

2013-12-30

7K

New York

Jaimin

2014-01-10

3K

Boston

Nga

2014-01-20

11K

Chicago

Jaimin

2014-01-30

10K

Boston

Employee

SalesData

Sales

The query before becomes simpler and run faster

This simplified query is generated implicitly

(45)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 45

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 2: Pre-aggregate Data

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

Jaimin

2014-02-15

18K

Nga

2014-02-16

5K

Nga

2014-02-25

9K

SalesData

SELECT year(dateOfSale) year, emp,

sum(amount) sum, max(amount) max

FROM SalesData

GROUP BY year, emp

ORDER BY year, emp;

Year

Emp

Sum

Max

2013

Natalya

7K

7K

2013

Nga

15K

15K

2014

Jaimin

31K

18K

2014

Nga

25K

11K

(46)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 46

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 2: Pre-aggregate Data

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

Jaimin

2014-02-15

18K

Nga

2014-02-16

5K

Nga

2014-02-25

9K

SalesData

Year

Emp

Sum

Max

2013

Natalya

7K

7K

2013

Nga

15K

15K

2014

Jaimin

31K

18K

2014

Nga

25K

11K

For better performance, data is pre-aggregated and inserted to

SalesSummary

SELECT * from SalesSummary;

The query before becomes simpler and run faster

(47)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 47

What are Materialized Views (MVs)?

Redundant Data Storages for Performance Purposes

Example 2: Pre-aggregate Data

Emp

DateOfSale

Amount

Nga

2013-12-12

15K

Natalya

2013-12-30

7K

Jaimin

2014-01-10

3K

Nga

2014-01-20

11K

Jaimin

2014-01-30

10K

Jaimin

2014-02-15

18K

Nga

2014-02-16

5K

Nga

2014-02-25

9K

SalesData

Year

Emp

Sum

Max

2013

Natalya

7K

7K

2013

Nga

15K

15K

2014

Jaimin

31K

18K

2014

Nga

25K

11K

For better performance, data is pre-aggregated and inserted to

SalesSummary

SELECT * from SalesSummary;

The query before becomes simpler and run faster

This simplified query is generated implicitly

(48)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Outline

• What is Big Data?

How Vertica provides Big Data Solutions?

What are Materialized Views (MVs)?

• What are the Roles of MVs in Big Data?

How does Vertica implement their MVs?

Why Vertica names their MVs Live Aggregare Projections (LAPs)?

What are the differences between LAPS & MVs?

(49)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 49

(50)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 50

What are the Roles of MVs in Big Data?

HAVE:

a lot of

data.

WANT: dig them for

treasures

in

sub-second

(51)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 51

What are the Roles of MVs in Big Data?

HAVE:

a lot of

data.

WANT: dig them for

treasures

in

sub-second

MVs helps bring

a lot of

data to a

small

set of data

→ Dig that

small

set of data can happen in

sub-second

(52)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 52

What are the Roles of MVs in Big Data?

HAVE:

a lot of

data.

WANT: dig them for

treasures

in

sub-second

MVs helps bring

a lot of

data to a

small

set of data

→ Dig that

small

set of data can happen in

sub-second

Vertica is already fast. MVs will bring it to an even better level

(53)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Outline

• What is Big Data?

How Vertica provides Big Data Solutions?

What are Materialized Views (MVs)?

What are the Roles of MVs in Big Data?

• How does Vertica implement their MVs?

Why Vertica names their MVs Live Aggregare Projections (LAPs)?

What are the differences between LAPS & MVs?

(54)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 54

How Vertica Implements their MVs?

Answer these 2 questions will answer the above question

1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?

2. What are the differences between LAPs and Mvs?

(55)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 55

How Vertica Implements their MVs?

Answer these 2 questions will answer the above question

1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?

2. What are the differences between LAPs and MVs?

The answers of question #2 will light up the answer of question 1.

(56)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 56

How Vertica Implements their MVs?

Answer these 2 questions will answer the above question

1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?

2. What are the differences between LAPs and MVs?

The answers of question 2 will light up the answer of question 1.

So What are the differences between LAPs and MVs?

(57)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 57

(58)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 58

MVs

Fully aggregated OR out-of-date

Differences between LAPs and MVs?

LAPs

Partially aggregated per load on each partition.

Always up-to-date.

Partitioned by MONTH(DateOfSale)

a

Year

Emp

Sum

Max

2013-12

2013

Natalya

7K

7K

2013

Nga

15K

15K

a

Year

Emp

Sum

Max

2014-01

2014

Jaimin

13K

10K

2014

Nga

11K

11K

a

Year

Emp

Sum

Max

2014-02

2014

Jaimin

18K

18K

a

2014

Nga

14K

9K

Year

Emp

Sum

Max

2013

Natalya

7K

7K

2013

Nga

15K

15K

2014

Jaimin

31K

18K

2014

Nga

25K

11K

(59)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 59

MVs

Compute/Recompute MVs data after INSERT

LAPs

Compute LAP's during INSERT

GroupBy

LAPs

Base

Table

Data

Recompute/

Maintenance

MVs

Base

Table

Data

Differences between LAPs and MVs?

(60)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 60

MVs

Compute Delta

Recompute affected data for all related MVs to

bring them up-to-date before they are used

LAPs

Do nothing

Differences between LAPs and MVs?

(61)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 61

MVs

Locking needed during MV maintenance

→ Heavy duty system

LAPs

Do nothing since data of new load will be in

their own container

Differences between LAPs and MVs?

(62)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 62

MVs

Locking needed during MV maintenance

→ Heavy duty system

LAPs

Do nothing since data of new load will be in

their own container

Differences between LAPs and MVs?

Fully vs Partially Aggregated

→ Different Currency Control

GroupBy

LAPs

Base

Table

Data

Recompute/

Maintenance

MVs

Base

Table

Data

The recompute/Maintenance step is not cheap and

BaseTable has to be locked during that time

(63)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 63

MVs

Read MV

LAPs

Read and Combine partially aggregated data

GroupBy

LAPs

MVs

Scan

Data

Data

Scan

Differences between LAPs and MVs?

(64)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 64

MVs

Possible if MVs are not used right after

INSERT

LAPs

Use Tuple Mover (TM) to combine partially

aggregated data from different loads

GroupBy

LAPs

Scan

Differences between LAPs and MVs?

(65)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 65

MVs

Recompute MVs for affected groups

LAPs

No delete single row but drop a whole partition

→ Do nothing

Differences between LAPs and MVs?

(66)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 66

MVs

Fully aggregated but not always up-to-date

Expensive maintenance process

Heavy-duty Concurrency Control

Select data from MVs is fast

Background Maintenance not work since MVs

must be up-to-date

LAPs

Partially aggregated but always up-to-date

No maintenance needed

No Concurrency Control needed on LAPs

Select data from LAPs also fast but need to

combine data from different partitions &

loads.

Background Maintenance by TM

Differences between LAPs and MVs?

(67)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 67

MVs

Fully aggregated but not always up-to-date

Expensive maintenance process

Heavy-duty Concurrency Control

Select data from MVs is fast

Background Maintenance not work since MVs

must be up-to-date

LAPs

Partially aggregated but always up-to-date

No maintenance needed

No Concurrency Control needed on LAPs

Select data from LAPs also fast but need to

combine data from different partitions &

loads.

Background Maintenance by TM

So why name Live Aggregate Projections?

Differences between LAPs and MVs?

(68)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 68

MVs

Fully aggregated but not always up-to-date

Expensive maintenance process

Heavy-duty Concurrency Control

Select data from MVs is fast

Background Maintenance not work since MVs

must be up-to-date

LAPs

Partially aggregated but always

up-to-date

No maintenance needed

No Concurrency Control needed on LAPs

Select data from LAPs also fast but need to

combine data from different partitions &

loads.

Background Maintenance by TM

So why name Live Aggregate Projections?

Differences between LAPs and MVs?

(69)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 69

MVs

Fully aggregated but not always up-to-date

Expensive maintenance process

Heavy-duty Concurrency Control

Select data from MVs is fast

Background Maintenance not work since MVs

must be up-to-date

LAPs

Partially aggregated but always

up-to-date

No maintenance needed

No Concurrency Control needed on LAPs

Select data from LAPs also fast but need to

combine data from different partitions &

loads.

Background Maintenance by TM

Vertica's physical storages are

Projections

So why name Live Aggregate Projections?

Differences between LAPs and MVs?

(70)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 70

MVs

Fully aggregated but not always up-to-date

Expensive maintenance process

Heavy-duty Concurrency Control

Select data from MVs is fast

Background Maintenance not work since MVs

must be up-to-date

LAPs

Partially aggregated but always

up-to-date

No maintenance needed

No Concurrency Control needed on LAPs

Select data from LAPs also fast but need to

combine data from different partitions &

loads.

Background Maintenance by TM

Vertica's physical storages are

Projections

Data is

pre-aggregated

So why name Live Aggregate Projections?

Differences between LAPs and MVs?

(71)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 71

References

Related documents

An agency under the Department of Justice that is charged with custody and rehabilitation of national offenders, that is, those sentenced to serve a term of imprisonment of more

During the first part of her internship, the author worked on the SCCS’s software application layer by assisting multiple ground subsystems teams including Launch Accessories

When Cisco VoIP extensions will dial any extensions in 6XX series then for landing of those numbers to particular extension in Eternity GE table 001 will be used.. If external

Read, which is a program to promote reading among reluctant male readers.

This paper gives an overview of the task- parameterized Gaussian mixture model (TP-GMM) presented in previous publications, and introduces a number of extensions and ongoing

(2005) point out, ethnic networks may affect bilateral trade not only through their effect on trade costs (information and contracting costs), but also through pref- erences: members

In each case, panel A graphs the smoothed median, panel B plots the associated contour lines, panel C shows median weekly wages by age for various years, panel D gives wages by year

This chapter presents a descriptive review of the literature about non-pharmacological multicomponent interventions to prevent delirium among older adults receiving acute medical,