© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Vertica Live Aggregate Projections
Modern Materialized Views for Big Data
Nga Tran - HPE Vertica - [email protected]
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Outline
• What is Big Data?
• How Vertica provides Big Data Solutions?
• What are Materialized Views (MVs)?
• What are the Roles of MVs in Big Data?
• How does Vertica implement their MVs?
●
Why Vertica names their MVs Live Aggregare Projections (LAPs)?
●
What are the differences between LAPS & MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 3
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 4
What is Big Data?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 5
What is Big Data?
●
You have
a lot of
data.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 6
What is Big Data?
●
You have
a lot of
data.
●
You want to mine your data for “
treasures”
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 7
What is Big Data?
●
You have
a lot of
data.
●
You want to mine your data for “
treasures”
●You want
sub-second
mining processes. Why?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 8
What is Big Data?
●
You have
a lot of
data.
●
You want to mine your data for “
treasures”
●You want
sub-second
mining processes. Why?
●
“Treasures” no more
after
that time.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 9
What is Big Data?
●
You have
a lot of
data.
●
You want to mine your data for “
treasures”
●You want
sub-second
mining processes. Why?
●
“Treasures” no more
after
that time.
●“Treasures” help with
urgent
decisions.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 10
What is Big Data?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 11
What is Big Data?
●
HAVE:
a lot of
data.
●
WANT: dig them for
treasures
in
sub-second
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 12
What is Big Data?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 13
What is Big Data?
●
Places to store and access your data
safely & efficiently.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 14
What is Big Data?
●
Places to store and access your data
safely & efficiently.
●
Tools to analyze them quickly.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 15
What is Big Data?
●
Places to store and access your data
safely & efficiently.
●
Tools to analyze them quickly.
How you make it?
●
[Columnar] Database
●Analytic Tools
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 16
What is Big Data?
●
Places to store and access your data
safely & efficiently.
●
Tools to analyze them quickly.
How you make it?
●
[Columnar] Database
●Analytic Tools
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Outline
• What is Big Data?
• How Vertica provides Big Data Solutions?
•
What are Materialized Views (MVs)?
•
What are the Roles of MVs in Big Data?
•
How does Vertica implement their MVs?
●
Why Vertica names their MVs Live Aggregare Projections (LAPs)?
●
What are the differences between LAPS & MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 18
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 19
How Vertica provides Big Data Solutions?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 20
What is Vertica Analytic Database?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 21
Column Store
Columnar and MPP RDBMS and More
Conventional databases store a
“heap file” with all records in row
order
id name game1_score game2_score profile_txt
1 Michael 1120 42 <long string> 2 Christopher 9001 17 <long string> 3 Matthew 4041 17 <long string> 4 Jessica 7218 42 <long string> 5 Ashley 3364 42 <long string>
Column stores store each column separately,
so you only read what you need
SELECT id, game2_score FROM scores;
id 1 2 3 4 5 name Michael Christopher Matthew Jessica Ashley game1_score 1120 9001 4041 7218 3364 game2_score 42 17 17 42 42 profile_txt
<big long string> <big long string> <big long string> <big long string> <big long string>
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 22
Sorted
Column Store
Columnar and MPP RDBMS and More
Want the highest values? Trivial if it’s sorted.
Joining two tables? If they’re sorted on the join key, just line up the rows.
id
1
4
5
2
3
name
Michael
Jessica
Ashley
Christopher
Matthew
game1_score
1120
7218
3364
9001
4041
game2_score
42
42
42
17
17
profile_txt
<big long string>
<big long string>
<big long string>
<big long string>
<big long string>
SELECT id, game2_score
FROM scores
ORDER BY game2_score DESC
LIMIT 3;
name
Michael
Christopher
Matthew
Jessica
Ashley
id
1
2
3
4
5
user_id
1
1
1
2
2
last_login_time
7:32 PM
5:18 PM
4:59 PM
3:02 PM
1:03 PM
SELECT scores.name,
MAX(sessions.last_login_time)
FROM scores, sessions
WHERE scores.id = sessions.user_id
GROUP BY scores.id, scores.name;
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 23
Compressed
Sorted
Column Store
Columnar and MPP RDBMS and More
Sorted fields often have lots of duplicate values. Let’s de-duplicate!
•
•
Saves lots of disk space, disk IO
•
Values don’t need to be identical
•
They will be similar; many compression schemes work well
•
Can compute directly on compressed data for more performance
•
COUNT(column) secretly rewritten as SUM(run_count)
•
SUM(column) secretly rewritten as SUM(run_count * value)
user_id
1
1
1
2
2
user_id
1 x3
2 x2
-
run_count of 3
-
run_count of 2
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 24
Distributed
Compressed
Sorted
Column StoreColumnar and MPP RDBMS and More
•
We’re already lining up join keys
•
Let’s segment the data across machines
•
Enforce that matching values are always on the same node
•
Means we can JOIN in parallel
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
Node_01
Node_02
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 25
ACID-Compliant
Distributed
Compressed
Sorted Column StoreColumnar and MPP RDBMS and More
The world is moving away from straight Map/Reduce; back towards transactional systems
•
Google (“F1: A Distributed SQL Database That Scales”; VLDB 2013)
•
Yahoo (“Peta-Scale Data Warehousing at Yahoo!”; Sigmod 2009)
Consistency is really hard !
If you leave it to each application developer, rather than guaranteeing it always, you get bugs.
?
?
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 26
ACID-Compliant
Distributed
Compressed
Sorted Column StoreColumnar and MPP RDBMS and More
The problem: Data being loaded and queried on all nodes by many users all at once
•
Must provide a consistent view of the data
•
Either you see the whole change (on all machines) or none of it
•
Must scale linearly (more nodes + more data == same performance)
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
Node_01
Node_02
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 27
ACID-Compliant
Distributed
Compressed
Sorted Column StoreColumnar and MPP RDBMS and More
The solution: Distributed Agreement
•
“Spread” (open-source library)
Node_01
Node_02
•
Send carefully-designed control messages around a
ring of nodes
•
Ring enforces order
•
Arbitrary/unordered processing between messages
UPDATE
SELECT
INSERT
Is the above diagram really accurate? No. If you want to know more, come interview with us!
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 28
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
Node_01
Node_02
UPDATE
SELECT
INSERT
?
a = true
WHEN b > 10
DBD
Query Optimizer
TM
Columnar and MPP RDBMS and More
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 29
Query Optimizer
Columnar and MPP RDBMS and More
Query
Optimizer
Query
Optimizer
SQL Query
Statistics & Histogram
Query Plan
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 30
Database Designer
Columnar and MPP RDBMS and More
DBD
DBD
Tables & Constraints
Data Set
Queries
Projection Set
Scenario:
Workload + Space Budget
Query Performance
Storage Footprint
Fault Tolerance
Recovery
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 31
Tuple Mover
Columnar and MPP RDBMS and More
TM
TM
Storage
Containers
Fewer but larger
Storage Containers
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 32
Analytic Queries and Functions
Columnar and MPP RDBMS and More
•Time Series:
evaluate the values of a given set of variables over time and group those values
into a window
SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid
FROM Tickstore
WHERE symbol IN ('MSFT', 'IBM')
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts)
•Event Series Pattern Matching
SELECT uid, sid, ts, refurl, pageurl, action, event_name(), pattern_id(), match_id()
FROM clickstream_log
MATCH (PARTITION BY uid, sid ORDER BY ts
DEFINE Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN P AS (Entry Onsite* Purchase)
ROWS MATCH FIRST EVENT);
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 33
Analytic Queries and Functions
Columnar and MPP RDBMS and More
Vertica can run R , C++ and Java extensions
internally.
We’ve even enabled parallel/distributed execution
within R itself
• “distributedR”
• Extensive collaboration with HP Labs and various
universities
• “Presto: Distributed Machine Learning and Graph
Processing with Sparse Matrices” (Eurosys 2013)
• Can use Vertica as a data store and preprocessor
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 34
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
Node_01
Node_02
UPDATE
SELECT
INSERT
Web App
BI Tools
?
a = true
WHEN b > 10
HDFS
R
DBD
Query Optimizer
Flex
(Unstructured)
TM
Text Search
LAPs
UDx
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 35
name
Michael
id
1
user_id
1 x3
last_login_time
7:32 PM
5:18 PM
4:59 PM
name
Christopher
id
2
user_id
2 x2
last_login_time
3:02 PM
1:03 PM
Node_01
Node_02
UPDATE
SELECT
INSERT
Web App
BI Tools
?
a = true
WHEN b > 10
HDFS
R
DBD
Query Optimizer
Flex
(Unstructured)
TM
Text Search
LAPs
UDx
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 36
What are LAPs?
●
LAPs = Live Aggregate Projections
●
~ Materialized Views in other Databases
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Outline
• What is Big Data?
•
How Vertica provides Big Data Solutions?
•
What are Materialized Views (MVs)?
•
What are the Roles of MVs in Big Data?
•
How does Vertica implement their MVs?
●
Why Vertica names their MVs Live Aggregare Projections (LAPs)?
●
What are the differences between LAPS & MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 38
What are Materialized Views (MVs)?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 39
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT Emp, DateOfSale, Amount, Location
FROM SalesData, Employee
WHERE Emp = EmpID;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Employee
SalesData
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 40
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT Emp, DateOfSale, Amount, Location
FROM SalesData, Employee
WHERE Emp = EmpID;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Employee
SalesData
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 41
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT Emp, DateOfSale, Amount, Location
FROM SalesData, Employee
WHERE Emp = EmpID;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Employee
SalesData
Normal Way: Run above query and get the results.
If many joins and a lot of data, the query can be
slow
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 42
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT Emp, DateOfSale, Amount, Location
FROM SalesData, Employee
WHERE Emp = EmpID;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Employee
SalesData
Normal Way: Run above query and get the results.
If many joins and a lot of data, the query can be
slow
–> For better performance,
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 43
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT * from Sales;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Emp
DateOfSale
Amount
Location
Nga
2013-12-12
15K
Chicago
Natalya
2013-12-30
7K
New York
Jaimin
2014-01-10
3K
Boston
Nga
2014-01-20
11K
Chicago
Jaimin
2014-01-30
10K
Boston
Employee
SalesData
Sales
The query before becomes simpler and run faster
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 44
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 1: Avoid Expensive Joins
SELECT * from Sales;
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
EmpID
Location
...
Jaimin
Boston
...
Natalya
New York
…
Nga
Chicago
...
Emp
DateOfSale
Amount
Location
Nga
2013-12-12
15K
Chicago
Natalya
2013-12-30
7K
New York
Jaimin
2014-01-10
3K
Boston
Nga
2014-01-20
11K
Chicago
Jaimin
2014-01-30
10K
Boston
Employee
SalesData
Sales
The query before becomes simpler and run faster
This simplified query is generated implicitly
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 45
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 2: Pre-aggregate Data
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
Jaimin
2014-02-15
18K
Nga
2014-02-16
5K
Nga
2014-02-25
9K
SalesData
SELECT year(dateOfSale) year, emp,
sum(amount) sum, max(amount) max
FROM SalesData
GROUP BY year, emp
ORDER BY year, emp;
Year
Emp
Sum
Max
2013
Natalya
7K
7K
2013
Nga
15K
15K
2014
Jaimin
31K
18K
2014
Nga
25K
11K
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 46
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 2: Pre-aggregate Data
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
Jaimin
2014-02-15
18K
Nga
2014-02-16
5K
Nga
2014-02-25
9K
SalesData
Year
Emp
Sum
Max
2013
Natalya
7K
7K
2013
Nga
15K
15K
2014
Jaimin
31K
18K
2014
Nga
25K
11K
For better performance, data is pre-aggregated and inserted to
SalesSummary
SELECT * from SalesSummary;
The query before becomes simpler and run faster
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 47
What are Materialized Views (MVs)?
Redundant Data Storages for Performance Purposes
Example 2: Pre-aggregate Data
Emp
DateOfSale
Amount
Nga
2013-12-12
15K
Natalya
2013-12-30
7K
Jaimin
2014-01-10
3K
Nga
2014-01-20
11K
Jaimin
2014-01-30
10K
Jaimin
2014-02-15
18K
Nga
2014-02-16
5K
Nga
2014-02-25
9K
SalesData
Year
Emp
Sum
Max
2013
Natalya
7K
7K
2013
Nga
15K
15K
2014
Jaimin
31K
18K
2014
Nga
25K
11K
For better performance, data is pre-aggregated and inserted to
SalesSummary
SELECT * from SalesSummary;
The query before becomes simpler and run faster
This simplified query is generated implicitly
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Outline
• What is Big Data?
•
How Vertica provides Big Data Solutions?
•
What are Materialized Views (MVs)?
• What are the Roles of MVs in Big Data?
•
How does Vertica implement their MVs?
●
Why Vertica names their MVs Live Aggregare Projections (LAPs)?
●
What are the differences between LAPS & MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 49
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 50
What are the Roles of MVs in Big Data?
●
HAVE:
a lot of
data.
●
WANT: dig them for
treasures
in
sub-second
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 51
What are the Roles of MVs in Big Data?
●
HAVE:
a lot of
data.
●
WANT: dig them for
treasures
in
sub-second
●
MVs helps bring
a lot of
data to a
small
set of data
→ Dig that
small
set of data can happen in
sub-second
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 52
What are the Roles of MVs in Big Data?
●
HAVE:
a lot of
data.
●
WANT: dig them for
treasures
in
sub-second
●
MVs helps bring
a lot of
data to a
small
set of data
→ Dig that
small
set of data can happen in
sub-second
●
Vertica is already fast. MVs will bring it to an even better level
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Outline
• What is Big Data?
•
How Vertica provides Big Data Solutions?
•
What are Materialized Views (MVs)?
•
What are the Roles of MVs in Big Data?
• How does Vertica implement their MVs?
●
Why Vertica names their MVs Live Aggregare Projections (LAPs)?
●
What are the differences between LAPS & MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 54
How Vertica Implements their MVs?
●
Answer these 2 questions will answer the above question
1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?
2. What are the differences between LAPs and Mvs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 55
How Vertica Implements their MVs?
●
Answer these 2 questions will answer the above question
1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?
2. What are the differences between LAPs and MVs?
●
The answers of question #2 will light up the answer of question 1.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 56
How Vertica Implements their MVs?
●
Answer these 2 questions will answer the above question
1. Why Vertica names their MVs Live Aggregate Projections (LAPs)?
2. What are the differences between LAPs and MVs?
●
The answers of question 2 will light up the answer of question 1.
So What are the differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 57
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 58
MVs
•
Fully aggregated OR out-of-date
Differences between LAPs and MVs?
LAPs
•
Partially aggregated per load on each partition.
•Always up-to-date.
Partitioned by MONTH(DateOfSale)
a
Year
Emp
Sum
Max
2013-12
2013
Natalya
7K
7K
2013
Nga
15K
15K
a
Year
Emp
Sum
Max
2014-01
2014
Jaimin
13K
10K
2014
Nga
11K
11K
a
Year
Emp
Sum
Max
2014-02
2014
Jaimin
18K
18K
a
2014
Nga
14K
9K
Year
Emp
Sum
Max
2013
Natalya
7K
7K
2013
Nga
15K
15K
2014
Jaimin
31K
18K
2014
Nga
25K
11K
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 59
MVs
Compute/Recompute MVs data after INSERT
LAPs
Compute LAP's during INSERT
GroupBy
LAPs
Base
Table
Data
Recompute/
Maintenance
MVs
Base
Table
Data
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 60
MVs
●
Compute Delta
●
Recompute affected data for all related MVs to
bring them up-to-date before they are used
LAPs
Do nothing
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 61
MVs
●
Locking needed during MV maintenance
→ Heavy duty system
LAPs
Do nothing since data of new load will be in
their own container
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 62
MVs
●
Locking needed during MV maintenance
→ Heavy duty system
LAPs
Do nothing since data of new load will be in
their own container
Differences between LAPs and MVs?
Fully vs Partially Aggregated
→ Different Currency Control
GroupBy
LAPs
Base
Table
Data
Recompute/
Maintenance
MVs
Base
Table
Data
The recompute/Maintenance step is not cheap and
BaseTable has to be locked during that time
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 63
MVs
Read MV
LAPs
Read and Combine partially aggregated data
GroupBy
LAPs
MVs
Scan
Data
Data
Scan
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 64
MVs
Possible if MVs are not used right after
INSERT
LAPs
Use Tuple Mover (TM) to combine partially
aggregated data from different loads
GroupBy
LAPs
Scan
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 65
MVs
●
Recompute MVs for affected groups
LAPs
No delete single row but drop a whole partition
→ Do nothing
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 66
MVs
●
Fully aggregated but not always up-to-date
●Expensive maintenance process
●
Heavy-duty Concurrency Control
●Select data from MVs is fast
●
●
●
Background Maintenance not work since MVs
must be up-to-date
LAPs
●
Partially aggregated but always up-to-date
●No maintenance needed
●
No Concurrency Control needed on LAPs
●Select data from LAPs also fast but need to
combine data from different partitions &
loads.
●
Background Maintenance by TM
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 67
MVs
●
Fully aggregated but not always up-to-date
●Expensive maintenance process
●
Heavy-duty Concurrency Control
●Select data from MVs is fast
●
●
●
Background Maintenance not work since MVs
must be up-to-date
LAPs
●
Partially aggregated but always up-to-date
●No maintenance needed
●
No Concurrency Control needed on LAPs
●Select data from LAPs also fast but need to
combine data from different partitions &
loads.
●
Background Maintenance by TM
So why name Live Aggregate Projections?
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 68
MVs
●
Fully aggregated but not always up-to-date
●Expensive maintenance process
●
Heavy-duty Concurrency Control
●Select data from MVs is fast
●
●
●
Background Maintenance not work since MVs
must be up-to-date
LAPs
●
Partially aggregated but always
up-to-date
●No maintenance needed
●
No Concurrency Control needed on LAPs
●Select data from LAPs also fast but need to
combine data from different partitions &
loads.
●
Background Maintenance by TM
So why name Live Aggregate Projections?
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 69
MVs
●
Fully aggregated but not always up-to-date
●Expensive maintenance process
●
Heavy-duty Concurrency Control
●Select data from MVs is fast
●
●
●
Background Maintenance not work since MVs
must be up-to-date
LAPs
●
Partially aggregated but always
up-to-date
●No maintenance needed
●
No Concurrency Control needed on LAPs
●Select data from LAPs also fast but need to
combine data from different partitions &
loads.
●
Background Maintenance by TM
●
Vertica's physical storages are
Projections
So why name Live Aggregate Projections?
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 70
MVs
●
Fully aggregated but not always up-to-date
●Expensive maintenance process
●
Heavy-duty Concurrency Control
●Select data from MVs is fast
●
●
●
Background Maintenance not work since MVs
must be up-to-date
LAPs
●
Partially aggregated but always
up-to-date
●No maintenance needed
●
No Concurrency Control needed on LAPs
●Select data from LAPs also fast but need to
combine data from different partitions &
loads.
●
Background Maintenance by TM
●
Vertica's physical storages are
Projections
●Data is
pre-aggregated
So why name Live Aggregate Projections?
Differences between LAPs and MVs?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 71