HP Vertica and MicroStrategy 10: a functional overview including
recommendations for performance optimization
MicroStrategy Data Access Workflows
There are numerous ways for MicroStrategy to interact with HP Vertica
•
Ad-hoc Schema
•
Live Connect
•
In-Memory Cube
MicroStrategy Is Most Commonly Used To Send Analytical Queries to HP Vertica
Analytical Queries have specific characteristics that differentiate them from operational queries
Analytical queries involve processing of massive
quantities of data
•
Accessing large data volumes
•
Processing massive data volumes
Typical Challenge
•
Achieve interactive response time
Vertica offers the following key features:
•
Columnar Orientation
•
Compression
•
Projections
Schema Design Is Essential for Analytical Query Performance
All key features are implemented as part of schema design
MicroStrategy Unique Optimizations for HP Vertica
Vertica-specific SQL syntax
•
Analytical functions (OLAP functions)
•
CASE expressions
•
Full outer joins
•
Set operators
•
Subqueries
Multi-pass SQL for Analytical
Sophistication
•
Use of temporary tables
•
Use Read Optimized Storage
•
Analyze statistics on temporary tables
•
Middle-tier computation of calculations not available in Vertica
Support for key Vertica features
•
Massively Parallel Processing
•
Label queries for simplified analysis
•
High Availability and Load Balancing
•
Secure connectivity
Extensions to Vertica functionality
•
Aggregate awareness with physical summary tables
•
Middle-tier caching via In-Memory Cubes
•
Report caching
MicroStrategy Generates Vertica-Specific SQL Syntax
MicroStrategy integrates with HP Vertica’s broad list of database functions and SQL functionality to
improve analytical performance
MicroStrategy Unique Optimizations for HP Vertica
Vertica-specific SQL syntax
•
Analytical functions (OLAP functions)
•
CASE expressions
•
Full outer joins
•
Set operators
•
Subqueries
Multi-pass SQL for Analytical
Sophistication
•
Use of temporary tables
•
Use Read Optimized Storage
•
Analyze statistics on temporary tables
•
M iddle-tier computation of calculations not available in
Vertica
Support for key Vertica features
•
Massively Parallel Processing
•
Label queries for simplified analysis
•
High Availability and Load Balancing
•
Secure connectivity
Extensions to Vertica functionality
•
Aggregate awareness with physical summary tables
•
Middle-tier caching via In-Memory Cubes
•
Report caching
MicroStrategy Generates Multi-Pass SQL Queries For Analytical Richness
By default MicroStrategy creates temporary tables to hold intermediate result sets
create
local
temporary table
ZZSP00
on commit preserve rows
as
select a13.YEAR_ID YEAR_ID,
a12.SUBCAT_ID SUBCAT_ID,
sum(a11.TOT_UNIT_SALES) WJXBFS1
from ITEM_MNTH_SLS a11,
LU_ITEM a12,
LU_MONTH a13
where a11.ITEM_ID = a12.ITEM_ID and
a11.MONTH_ID = a13.MONTH_ID
group by a13.YEAR_ID,
a12.SUBCAT_ID
unsegmented all nodes
user-session visibility
session-scoped data
Large Intermediate Result Sets Can Bypass Write Store for Better Performance
Query hint forces storage to Read Optimized Storage (ROS)
Example:
create local temporary table ZZMD00 on
commit preserve rows as
select
/*+ DIRECT */
a11.YEAR_ID YEAR_ID,
sum(a11.TOT_UNIT_SALES) WJXBFS1
from ITEM_MNTH_SLS a11
Data stored in the Read Optimized
Storage(ROS)
•
Highly organized by compression
Analyzing Large Intermediate Result Sets Improves Query Execution
MicroStrategy can instruct Vertica to generate statistics on temporary tables
Vertica Query Optimizer
101110010101
Analyze_statistics for temp table
MicroStrategy Provides Middle-tier Computations for Analytical Sophistication
Combining multiple insert statements, removes the overhead of parsing structurally identical statements
repeatedly
Row-by-Row Inserts are Slow
Bulk-Inserts are Fast
Requires time-consuming locking/unlocking of table
Uses Parameterized Statements to insert blocks of data
Enabling Parameterized Inserts in MicroStrategy Significantly Improves Response Time
Parameterized inserts are enabled a DB Instance level in MicroStrategy
•
Navigate to DB instance DB connection
•
Click Modify to edit the Database Connection.
MicroStrategy Avoids Unnecessary Workload on Vertica
Enabling SQL Global Optimization reduces the number of SQL passes improving query performance
SELECT Category
FROM ……
WHERE ……
HAVING Sum(Sales)>50000
SELECT Units Sold
Units Received FROM SQL Pass #N-2 SQL Pass #N-1 WHERE …… SELECT Category FROM …… WHERE …… HAVING Sum(Sales)>50000 SELECT Category FROM …… WHERE …… HAVING Sum(Sales)>50000
SELECT Units Sold
Units Received FROM SQL Pass #N-3 SQL Pass #N-2 WHERE
……
SQ L P as s #1 SQ L P as s #2 SQ L P as s #N -1 SQ L Pa ss #1 S Q L P a ss # N -1Before Global Optimization
Redundant SQL Pass
After Global Optimization Level 1
Redundant SQL Pass automatically
removed
SELECT Sum(Revenue) FROM ITEM_MTH_SLS WHERE …… GROUPBY ….. SELECT Sum(Revenue) Count(Item) FROM SQL Pass #1 SQL Pass #2 WHERE …… SQ L P as s #1 SQ L P as s #2 SQ L P as s #3 SELECT Count(item) FROM ITEM_MTH_SLS WHERE …… GROUPBY ….. SELECT Sum(Revenue) Count(Item) FROM ITEM_MTH_SLS ……. WHERE ……Si
n
gl
e
S
Q
L
P
as
s
Before Global Optimization
Metric definitions force different
SQL passes
After Global Optimization Level 2
SQL Engine automatically combines
MicroStrategy Pushes Smart SQL to Vertica
SQL Global Optimization is enabled by default for Vertica
SELECT …… FROM ……
WHERE SLS.REGION
=LU_REGION.REGION
AND REGION = Central
SELECT Northeast Revenue, Central Revenue , Southeast Revenue FROM SQL Pass #1 SQL Pass #2 SQL Pass #3 WHERE …. SELECT …… FROM …… WHERE SLS.REGION =LU_REGION.REGION
AND REGION = Northeast
SELECT Sum(SLS.Revenue(IF LU_REGION.REGION = Northeast,0), Sum(SLS.Revenue(IF LU_REGION.REGION = Central), Sum(SLS.Revenue(IF LU_REGION.REGION = Southeast), FROM …… WHERE SLS.REGION =LU_REGION.REGION SQ L P as s #1 SQ L P as s #2 SQ L P as s #4
Before Global Optimization
Filter Conditions Force Four
Separate SQL Passes
After Global Optimization Level 3
Resolve Filter Conditions into a Single SQL
Pass
CREATE TABLE ZZMD01 AS
SELECT Category
……
FROM ……
WHERE Region = Northeast
AND …… S Q L P a ss # 1 SQ L P as s #1 CREATE TABLE ZZMD02 AS SELECT Category ……. FROM ……
WHERE Region = Central
AND …… CREATE TABLE ZZMD01 AS SELECT Category Sum(Revenue(IF.REGION = Northeast,0)), Sum(Revenue(IF REGION = Central,0)) …… FROM …… WHERE ……