How to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud
#mstrworld
Use Cases for Big Data in the Cloud
Traditional sources moving online
SOURCE: Company, Government, Financial sector, Business and consumer studies, Surveys, Polls
VALUE: All business performance drivers – Operational efficiency, Revenue management, Strategic planning
Digital exhaust from interactions
SOURCE: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
VALUE: New revenue sources, Consumer promotions, Risk management, Fraud detection
Web 2.0 phenomenon
SOURCE: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
VALUE: Customer engagement, Customer service, Brand management, Viral marketing
Internet of things
SOURCE: Machine generated sensor data and machine to machine communication
VALUE: Operational efficiency, Cost control, Risk avoidance
Traditional sources moving online
How to take advantage of new technologies
Traditional relational data sources in the cloud
• RDBMS installed in the cloud (e.g. HP Vertica on Amazon EC2)
• Managed RDBMS in the cloud (e.g. Amazon RDS)
Relational database technology built for the cloud, e.g.
• Amazon AWS (EMR, Redshift, Aurora)
• Google BigQuery
• RDBMS vendor cloud services (e.g. Microsoft, Oracle, Teradata, HP, IBM, SAP, …)
Cloud services simplify and automate many aspects of data management,
Some Database Features Require Conscious Design Choices
Query time often dominated by data access with significant performance impact
Data organization
• Columnar vs. row based
Minimize data access
• Partitioning key selection
• Data sorting
• (Index selection/strategy)
• Compression (on/off; algorithm)
• Approximate calculation (e.g. HyperLogLog)
Access and process data in parallel
• Data distribution in MPP databases to minimize data movement
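The approximate calculation bullet names HyperLogLog. As a rough illustration of why it reduces data access cost, the following minimal Python sketch estimates a distinct count from a small fixed array of registers instead of keeping every value. The function name `hll_estimate` and the parameter `p` are illustrative assumptions, not any database's API; cloud databases expose this technique through their own SQL functions.

```python
import hashlib
import math

def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p small registers."""
    m = 1 << p                       # number of registers
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        index = h >> (64 - p)                          # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)               # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)                   # bias correction for large m
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw

# Duplicates do not change the estimate; only distinct values matter.
visitors = [f"user-{i}" for i in range(100_000)]
estimate = hll_estimate(visitors + visitors)
```

With `p=12` the sketch keeps 4,096 registers regardless of input size, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%.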
Existing best practices for developing MicroStrategy applications apply
Make sure to take advantage of db features designed for analytical workloads
Look for best practices to take advantage of data source strengths in
Identifying Value in Data Requires Utmost Flexibility
Static data models get in the way of analysis at the speed of thought
Digital exhaust from interactions
SOURCE: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
VALUE: New revenue sources, Consumer promotions, Risk management, Fraud detection
Technical Characteristics:
• Unknown data sources are analyzed for potential new business value.
• Analysis necessary to support the development of new business models
• Data models don’t exist (yet).
MicroStrategy Supports All Analytic Needs
(Figure: user groups arranged by analytical complexity and user scale, from back office to front line)
Data Scientists
• Trained in modeling and coding
• Use a variety of tools
• Want their favorite tools
• Look for the truth
Business Analysts
• Analytical amateurs
• Power users of BI tools
• Want to use the right tool
• Look for the business edge
Business Users
• Make the daily decisions
• Some may be power users
• Most need simple tools
• Look for actionable information
MicroStrategy Provides Flexible Data Modeling Options
Choose how to access and analyze data
(Figure: ID scans, Online click-stream, Application logs, and Call/service records feed Report, Dashboard, and Visual Insight through either a Direct or a Modeled path)
Modeled: Unified MicroStrategy Metadata
• Reusable Data
• Reusable Objects
• Reusable Design
Direct: Flexible data access
• Schema on read
• Supports quick iterations
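To illustrate what schema on read means in practice, here is a minimal Python sketch, assuming raw click-stream records are stored as JSON lines. The sample records, field names, and the helper `read_with_schema` are hypothetical and are not part of any MicroStrategy API; the point is only that structure is applied when the data is read, so a second iteration can ask for different fields without reloading anything.

```python
import json

# Hypothetical raw click-stream records, stored exactly as they arrived.
RAW_LINES = [
    '{"ts": "2015-01-05T10:00:00", "user": "u1", "page": "/home", "ms": "120"}',
    '{"ts": "2015-01-05T10:00:03", "user": "u2", "page": "/cart"}',
    '{"ts": "2015-01-05T10:00:09", "user": "u1", "page": "/pay", "ms": "340", "ref": "ad"}',
]

def read_with_schema(lines, schema):
    """Apply a schema while reading: select and cast only the requested fields."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }

# First iteration: which pages did each user visit?
pages = list(read_with_schema(RAW_LINES, {"user": str, "page": str}))

# Second iteration: add response time, with no change to the stored data.
latency = list(read_with_schema(RAW_LINES, {"page": str, "ms": int}))
```

Fields missing from a record come back as `None`, and fields the schema does not mention (such as `ref`) are simply ignored until an analysis needs them.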
The Web 2.0 Phenomenon Introduces Specific Challenges
Data access, data structure, and data meshing
Web 2.0 phenomenon
SOURCE: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
VALUE: Customer engagement, Customer service, Brand management, Viral marketing
Access data where it exists
• Web 2.0 data stored in relational data sources
• Online services that also provide data services
  • E.g. Salesforce.com
• Online services that provide data
  • Social
  • Government
  • Weather
MicroStrategy offers three ways to access Web 2.0 data
Data often requires structuring or flattening for analysis
For optimal value, data from multiple sources needs to be put in context
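As a small illustration of the flattening step, the following Python sketch turns one nested social media record into flat rows that a reporting tool could treat like table rows. The record layout and the helper names `flatten` and `explode` are assumptions made for this example, not a MicroStrategy function.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

def explode(record, list_key):
    """Emit one flat row per element of record[list_key]; other fields repeat."""
    base = {k: v for k, v in record.items() if k != list_key}
    return [flatten({**base, list_key: item}) for item in record.get(list_key, [])]

# Hypothetical post with a nested author and a list of tags.
post = {"id": 7, "user": {"name": "ann", "followers": 120}, "tags": ["bi", "cloud"]}
rows = explode(post, "tags")
# rows[0] == {"id": 7, "user.name": "ann", "user.followers": 120, "tags": "bi"}
```

Repeating the parent fields on every row is what makes the result joinable with other sources, at the cost of duplicated values such as `user.followers`, which then must not be summed naively.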
No Data Left Behind
Bring All Relevant Data to Decision Makers
(Figure: data source categories) User / Departmental Data, Data Warehouse Appliances, Big Data & NoSQL, Relational Databases, Multidimensional Databases, Columnar Databases, SaaS-Based App Data
(Figure: examples shown) HANA, BigInsights, Parallel Data Warehouse, Elastic Map Reduce, Analysis Services, Redshift, Distribution
DATA PROCESSING, ANALYTICS & DELIVERY
Dashboards, Self-Service Analytics, Reports and Statements, OLAP Analysis
MicroStrategy Analytics Platform
1. Direct connection to source
• Parse structure with lightweight “Schema-on-read” functions
• Import data or create a modeled environment
2. Using Web Services
• Requires data to be exposed as a Web Service
• Data will need to be structured prior to access
3. Offline “Process and Store”
• Data is processed with specialty analytics (text, streaming, image processing) and stored as structured data
• Text Analytics Module
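The third path, offline “Process and Store”, can be pictured with a minimal Python sketch: unstructured text is reduced to keyword counts and written to a relational table that a BI tool can then query. The sample posts, the table name `post_keyword`, and the simple word-length filter are assumptions for illustration; this is a stand-in for real text analytics, not the Text Analytics Module itself.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical unstructured input: (post id, free text).
POSTS = [
    (1, "Love the new dashboard, love the speed"),
    (2, "Dashboard is slow today"),
]

def keyword_rows(posts, min_len=4):
    """Process step: turn free text into (post_id, keyword, occurrences) rows."""
    for post_id, text in posts:
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if len(w) >= min_len)
        for word, n in sorted(counts.items()):
            yield post_id, word, n

# Store step: persist the structured result (in-memory database for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_keyword (post_id INTEGER, keyword TEXT, occurrences INTEGER)"
)
conn.executemany("INSERT INTO post_keyword VALUES (?, ?, ?)", keyword_rows(POSTS))

# Once stored, the data is ordinary relational data and can be queried with SQL.
top = conn.execute(
    "SELECT keyword, SUM(occurrences) AS n FROM post_keyword "
    "GROUP BY keyword ORDER BY n DESC, keyword LIMIT 1"
).fetchall()
```

The design choice is that the expensive, specialized processing happens once, offline, so interactive queries only touch the structured result.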
Semi-Structured Data
Unstructured Data
DATA STORAGE
Web Logs, Social media posts, Surveys, Server Logs, Geo-spatial, E-mail, Image, Audio, Video, Sensor + Machine Data, Documents
MicroStrategy Offers Several Paths to Mesh Data For Analysis
Integrating Modeled BI and Self-Service BI
Structured BI Content Consumption
• Structured Data: Architect
• Structured Join: Multi-Source Model
• Multi-Source Pushdown Joins
• Corporate Data Sources
• Dashboards and MicroApps
• Cubes from Model
Self Service BI Content Creation
• Self Service Data: Data Import
• Self Service Join: Document Data
• Ad Hoc / Visual Insight
• Join Datasets in Documents
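Whichever path is used, meshing comes down to aligning datasets on a shared attribute. The Python sketch below shows the idea with a hypothetical modeled dataset (`sales`) and a hypothetical imported dataset (`mentions`) joined on `region`. The names and the helper `left_join_aggregated` are assumptions; the sketch does not reproduce how MicroStrategy performs multi-source or document-level joins.

```python
from collections import defaultdict

# Hypothetical modeled data from a corporate source.
sales = [
    {"region": "East", "revenue": 100.0},
    {"region": "West", "revenue": 80.0},
    {"region": "North", "revenue": 50.0},
]

# Hypothetical self-service import, at a finer grain than region.
mentions = [
    {"region": "East", "mentions": 12},
    {"region": "West", "mentions": 30},
    {"region": "East", "mentions": 3},
]

def left_join_aggregated(left, right, key, measure):
    """Aggregate `right` to the join key first, then attach it to each left row."""
    totals = defaultdict(int)
    for row in right:
        totals[row[key]] += row[measure]
    return [{**row, measure: totals.get(row[key], 0)} for row in left]

meshed = left_join_aggregated(sales, mentions, "region", "mentions")
```

Aggregating the imported data to the common level before joining keeps each sales row from being duplicated, which would otherwise inflate revenue totals.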
Internet of things
SOURCE: Machine generated sensor data and machine to machine communication
VALUE: Operational efficiency, Cost control, Risk avoidance
Find Insights in Vast Amounts of Machine Generated Data
Machine generated data often does not lend itself to traditional OLAP analysis
Apply the methods of predictive analytics and data mining to machine generated data
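One simple example of such a method is flagging sensor readings that deviate strongly from their recent history. The Python sketch below uses a trailing window with mean and standard deviation; the readings, the window size, and the threshold are illustrative assumptions, and real deployments would tune or replace this rule.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings far outside the preceding window's spread."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Flag values more than `threshold` standard deviations from the mean.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical temperature readings with one spike at index 6.
temps = [20.1, 20.3, 19.9, 20.2, 20.0, 20.1, 27.5, 20.2, 20.0, 20.1]
spikes = flag_anomalies(temps)
```

Note that a spike inflates the standard deviation of the windows that follow it, so readings right after an anomaly are judged more leniently until the spike leaves the window.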
Primary Work Horses of Data Mining
“Which Techniques Do You Use Most”
(Chart legend: MicroStrategy Native, via PMML, via R)
Source: 2013 Rexer Data Miner Surveys
www.RexerAnalytics.com
Over 1,250 Data Miners from 75 Countries
MicroStrategy Support for Predictive Analytics
Predictive Analytics Are Part of MicroStrategy Function Library
Reporting: Average, Mean, Count, Sum, Maximum, Minimum, Median, Mode, Product, Rank, Percentile, “N”-Tile, N-tile by Step, N-tile by Value, N-tile by Step and Value
Date and Time: Add Days, Add Months, Current Date, Current Date & Time, Current Time, Day of Month, Day of Week, Day of Year, Days Between, Month Start Date, Month End Date, Months Between, Year Start Date, Year End Date
Statistical Aggregate: Standard Deviation, Standard Deviation of a Population, Variance, Variance of a Population, Geometric Mean, Average Deviation, Kurtosis, Skew
OLAP Functions: Running Total, Running Std Deviation, Running Std Deviation of Population, Running Minimum, Running Maximum, Running Count, Moving Difference, Moving Maximum, Moving Minimum, Moving Average, Moving Sum, Moving Count, Moving Std Deviation, Moving Std Deviation of Population, First or Last Value in Range, Exponential Weight Moving Avg, Exponential Weight Running Avg
Statistical: Beta Distribution, Beta Inverse, Binomial Distribution Probability, Chi Distribution, Chi Inverse, Confidence, Correlation Coefficient, Covariance, Critical Binomial Distribution, Chi Test (Independence), Cumulative Binomial Distribution, Exponent Distribution, F-Probability Distribution, F-Test, Fisher Transformation, Gamma Distribution, Gamma Inverse, Gamma Logarithm, Homoscedastic Ttest, Heteroscedastic Ttest, Hypergeometric Distribution, Intercept Point, Inverse of Lognormal Cumulative Distribution, Inverse of F Probability Distribution, Inverse of Fisher, Inverse of the Std Normal Cumulative Distribution, Inverse of the T-Distribution, Lognormal Cumulative Distribution, Mean T-Test, Negative Binomial Distribution, Normal Cumulative Distribution, Normal Distribution Inverse, Number of Permutations for a Given Object, Paired T-test, Poisson Distribution (Predict Number of Events), Pearson Product Moment Correlation Coefficient, RSQ (Square of Pearson), Slope of Linear Regression, STEYX (Standard Error of Predicted “y” Value), Standardize, Standard Normal Cumulative Distribution, T-Distribution, Variance Test, Weibull Distribution (Reliability Analysis)
Financial: Accrued Interest, Accrued Interest Maturity, Amount Received at Maturity, Bond-equivalent Yield for T-BILL, Convert Dollar Price from Fraction to Decimal, Convert Dollar Price from Decimal to Fraction, Cumulative Interest Paid on Loan, Cumulative Principal Paid on Loan, Depreciation for each Accounting Period, Days In Coupon Period to Settlement Date, Days In Coupon Period with Settlement Date, Days from Settlement Date to Next Coupon, Double-Declining Balance Method, Interest Rates, Interest Rate, Interest Payment, Internal Rate of Return, Interest Rate per Annuity, Macauley Duration, Modified Duration, Modified Internal Rate of Return, Next Coupon Date After Settlement Date, No of Coupons Settlement and Maturity Date, Nominal Annual Interest Rate, No of Investment Periods, Net Present Value, Payment on Principal, Price, Price Discount, Price at Maturity, Present Value, Prorated Depreciation for each Period, Straight Line Depreciation, Sum-Of-Years' Digits Depreciation, T-BILL Price, T-BILL Yield, Variable Declining Balance, Yield, Yield for Discounted Security
Math Functions: Absolute, Integer, A-cosine, Ln, Hyp A-cos, Log, A-sine, Log10, Hyp A-sine, Mod, A-tan, Power, A-tan2, Quotient, Hyp A-tan, Radians, Ceiling, Randbetween, Combine, Round, Cosine, Sine, Hyp Cosine, Hyp Sine, Degrees, Square Root
Association Rules, Clustering, General Regression, Mining, Neural Network, Regression, Rule Set, Support Vector Machine, Time Series, Train Association, Train Clustering, Train Decision Tree, Train Regression, Train Time Series, Tree Model Variants
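To make a few of the listed functions concrete, here is a minimal Python sketch of what Moving Average, Slope of Linear Regression, Intercept Point, and RSQ compute, using the standard textbook definitions. The function names `moving_average` and `linear_fit` are illustrative; MicroStrategy's own functions have their own names, parameters, and handling of partial windows, which this sketch does not attempt to reproduce.

```python
def moving_average(values, window):
    """Average of each full trailing window of `window` values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def linear_fit(xs, ys):
    """Least-squares slope, intercept, and R squared for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rsq = (sxy * sxy) / (sxx * syy)      # square of the Pearson correlation
    return slope, intercept, rsq

smoothed = moving_average([1, 2, 3, 4, 5], window=3)      # [2.0, 3.0, 4.0]
slope, intercept, rsq = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

In the second call the points lie exactly on y = 2x + 1, so the fit recovers slope 2, intercept 1, and an R squared of 1.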
ƒ Apply(X)
Industry’s most powerful SQL Engine and 300+ native analytical functions
Analytical Maturity
• Optimization: What do we want to happen?
• Predictions: What is likely to happen based on past history?
• Relationship Analysis: What factors influence activity or behavior?
• Benchmarking: How are we doing versus comparables?
• Trend Analysis: What direction are we headed in?
• Data Summarization: What is happening in the aggregate?
As a MicroStrategy metric, use models and functions in any report or dashboard
• MicroStrategy R Integration Pack: Deploy Any of 5000+ Open Source R Analytics. World’s most popular advanced analytics tool. Free, open source.
• MicroStrategy Custom Function Plug-in: Create Your Own Custom Functions
• PMML Model: Import Predictive Models from Popular Packages. More Specialty Tools
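To show what importing a predictive model via PMML amounts to, the following Python sketch reads a heavily trimmed PMML regression fragment and scores one row with it. The model, its field names (`temperature`, `load`), and the `score` helper are hypothetical; the fragment omits elements a complete PMML document would carry (such as the data dictionary and mining schema), and the sketch is not how MicroStrategy consumes PMML.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical PMML fragment: a linear regression with two predictors.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="temperature" coefficient="2.0"/>
      <NumericPredictor name="load" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}

def score(pmml_text, row):
    """Evaluate intercept + sum(coefficient * value ** exponent) for one row."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        exponent = float(predictor.get("exponent", "1"))   # defaults to 1
        value = row[predictor.get("name")]
        total += float(predictor.get("coefficient")) * value ** exponent
    return total

prediction = score(PMML_DOC, {"temperature": 3.0, "load": 4.0})   # 1.5 + 6.0 - 2.0
```

Because the model is described as data rather than code, the tool that trained it and the tool that scores it do not need to be the same.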