A GACP and GTMCP company
FREE Webinar by Tatvic
June 19th, 2013
How to perform predictive analysis on your web analytics tool data
Q&A
Before we start...
Our speakers
Carolina Araripe
Inbound Marketing Strategist @Tatvic
http://linkd.in/YazvVn
Amar Gondaliya
Data Model Engineer @Tatvic
http://linkd.in/16cpDQI
Kushan Shah
Web Analyst @Tatvic
http://linkd.in/18rfFfV
Talking about Analytics…
• Descriptive: What has happened?
• Prescriptive: What should happen?
• Predictive: Predicts the outcome or future
In other words…
“Technology that learns from experience (data) to
predict the future behavior of individuals in order
to drive better decisions.”
Source: Siegel, E. (2013) “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.”
Outline of this webinar
Predictive Analytics
• Tool: R
• Data: Google Analytics
• Model: Logistic Regression
Introduction to R
What
• Open-source statistical computing language, widely used by organizations to solve business problems.
Why
• Easy to integrate
• Data frames
• Pre-developed packages
How to get started
• Download and install R
• Choose and download a user-friendly GUI, such as RStudio
Applications
• Data Analysis
• Data Visualization
• Statistical Tests
• Predictive Models
• Forecasting
R Packages
Categories of packages
• Data Extraction
• Time Series
• Machine Learning
• Data Visualization
For this webinar
• RGoogleAnalytics
Usage: To extract Google Analytics data into R
Contributors: Michael Pearmain, Nick Mihailovski, Amar Gondaliya and Vignesh Prajapati
• ggplot2
Usage: To build plots and charts
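Google Analytics data
Extracting your GA data into R
Participants: the user performing data extraction, the Google OAuth2 Authorization Server, and the Google Analytics API
• Access Token Request (user → Google OAuth2 Authorization Server)
• Access Token Response (Google OAuth2 Authorization Server → user)
• Call API for list of profiles (user → Google Analytics API)
• Call API for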
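Assuming the standard OAuth 2.0 authorization-code flow, the user first grants consent in a browser, and the resulting code is exchanged for the access token shown above before any Analytics API call. The base R sketch below only builds the consent URL for that first step and makes no network call. The client ID and redirect URI are hypothetical placeholders. `analytics.readonly` is Google's read-only Analytics scope, but the endpoint your client library actually uses may differ.

```r
# Build a Google OAuth 2.0 consent URL (authorization-code flow).
# client_id and redirect_uri are hypothetical placeholders, not real credentials.
build_auth_url <- function(client_id, redirect_uri,
                           scope = "https://www.googleapis.com/auth/analytics.readonly") {
  params <- c(
    client_id     = client_id,
    redirect_uri  = redirect_uri,
    response_type = "code",  # ask for an authorization code ...
    scope         = scope    # ... limited to read-only Analytics access
  )
  # Percent-encode each value; reserved = TRUE also encodes ':' and '/'
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("https://accounts.google.com/o/oauth2/v2/auth?", query)
}

url <- build_auth_url("my-client-id.apps.googleusercontent.com",
                      "http://localhost:1410/")
print(url)
```

Opening `url` in a browser would show Google's consent screen. The returned code then goes into the Access Token Request.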
Business Problem
Projected Growth of Retail eCommerce in US
US Retail eCommerce Sales 2011-2016 (in billion $)
Year | Sales
2011 | $194.70
2012 | $225.50
2013 | $258.90
2014 | $296.70
2015 | $338.90
2016 | $384.90
Business Problem
Product Returns
Product Return Impact (per day)
Metric | Scenario 1 | Scenario 2
Average Return Rate | 9% | 7%
Average Order Value | $100 | $100
Orders Per Day | 500 | 500
Total Income | $50,000 | $50,000
Loss due to returns | $4,500 | $3,500
Revenue post loss | $45,500 | $46,500
Increase in Revenue/day | --- | $1,000
“Returns are on the rise, up 19% from 2007. For every US$1 spent on merchandise, 9¢ are returned.”
“Average return rate for ecommerce retailers varies from 3-12%.”
Source: Time Magazine, Sept. 4, 2012
Increase in revenue with recovered returns in the long run: $1,000 per day × 30 days = $30,000 per month
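The per-day figures in the table follow from simple arithmetic. This base R sketch reproduces them from three inputs: orders per day, average order value, and return rate. It then scales the daily gain to a 30-day month.

```r
# Daily impact of product returns, using the figures from the table above
orders_per_day  <- 500
avg_order_value <- 100                                # US$
return_rates    <- c(scenario1 = 0.09, scenario2 = 0.07)

total_income <- orders_per_day * avg_order_value      # 500 * $100 = $50,000
loss         <- total_income * return_rates           # $4,500 and $3,500
revenue      <- total_income - loss                   # $45,500 and $46,500
daily_gain   <- revenue[["scenario2"]] - revenue[["scenario1"]]  # $1,000 per day
monthly_gain <- daily_gain * 30                       # $30,000 per month

print(c(daily_gain = daily_gain, monthly_gain = monthly_gain))
```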
Transactional Data
• Pre-Purchase Data: browsing behavior up to the shopping cart
• In-Purchase Data: purchase behavior from the shopping cart to the thank-you page
Loading Input Data
Introducing Model Variables
Model Creation
Model Performance
Applying Model to Test Data
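The first step, Loading Input Data, can be sketched as follows. The sketch assumes the transactions have already been exported from Google Analytics to CSV. The column names (`transaction_id`, `purchase_date`, `gift`, `returned`) and the four rows are hypothetical. `read.csv(text = ...)` keeps the example self-contained; in practice you would pass a file path instead.

```r
# Hypothetical export: one row per transaction (column names are assumptions)
csv_text <- "transaction_id,purchase_date,gift,returned
T1001,2012-11-23,no,1
T1002,2012-06-14,yes,0
T1003,2012-12-02,no,1
T1004,2012-03-09,no,0"

transactions <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Store dates as Date objects so month-based variables can be derived later
transactions$purchase_date <- as.Date(transactions$purchase_date)
str(transactions)
```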
Supervised Learning
Learns, from labeled data, a function that maps inputs to desired outputs (e.g.: Spam Detection)
• Training Data (variables + labels) → Machine Learning Algorithm → Predictive Model
• Test Data (variables) → Predictive Model → Predicted Outcome (labels)
• Labels are the right answers from historical data
e.g.: Spam Detection. Input Data: contains emails marked Spam/No Spam
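In the product-return problem, the label is whether an order was returned (1) or not (0). The sketch below prepares training and test data from a hypothetical labeled data frame with one predictor, `order_value`. The test rows are held out from labeled history so that predictions can later be compared with the true labels.

```r
# Hypothetical labeled data: 100 transactions, one predictor and a 0/1 label
set.seed(1)
labeled <- data.frame(
  order_value = round(runif(100, 20, 300), 2),
  returned    = rbinom(100, 1, 0.1)
)

# Hold out 30% of rows as test data; the model is trained on the rest
test_idx <- sample(nrow(labeled), size = round(0.3 * nrow(labeled)))
train    <- labeled[-test_idx, ]
test     <- labeled[test_idx, ]

# Keep the true test labels aside: the model only sees the variables
test_labels   <- test$returned
test$returned <- NULL
```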
Going beyond algorithms: using domain knowledge to add new variables to the model
• E.g.: Products purchased as gifts are less likely to be returned
• Create a new variable with binary values: 1 – product purchased as a gift, 0 – otherwise
• Products purchased in the holiday season are more likely to be returned
• Based on the purchase date, create a new variable with binary values: 1 – product purchased in Nov–Dec, 0 – otherwise
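Both domain-knowledge variables can be derived with base R. This sketch assumes hypothetical columns `gift` ("yes"/"no") and `purchase_date` (a `Date`). The new column names `is_gift` and `holiday_season` are illustrative.

```r
# Hypothetical transactions; 'gift' and 'purchase_date' are assumed column names
transactions <- data.frame(
  purchase_date = as.Date(c("2012-11-23", "2012-06-14", "2012-12-02", "2012-03-09")),
  gift          = c("no", "yes", "no", "yes")
)

# 1 = product purchased as a gift, 0 = otherwise
transactions$is_gift <- ifelse(transactions$gift == "yes", 1L, 0L)

# 1 = purchased in November or December (holiday season), 0 = otherwise
purchase_month <- as.integer(format(transactions$purchase_date, "%m"))
transactions$holiday_season <- ifelse(purchase_month %in% c(11, 12), 1L, 0L)

print(transactions)
```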
Predictor/Response Variables
[Figure: scatter plot of Price of House ($), the response variable (y-axis, $0–800,000), against Size of House (sq ft), the predictor variable (x-axis, 0–5,000 sq ft)]
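In R's formula notation, the response goes on the left of `~` and the predictor on the right. The sketch below uses `lm()` on made-up house data that lie exactly on a line, so the fitted slope and the prediction are easy to check. The webinar's own model is a logistic regression, built with `glm()`.

```r
# Toy, made-up data on an exact line: price = 50,000 + 150 * size
houses <- data.frame(size = c(1000, 1500, 2000, 2500, 3000))
houses$price <- 50000 + 150 * houses$size

# Response (price) on the left of '~', predictor (size) on the right
fit <- lm(price ~ size, data = houses)
coef(fit)

# Predicted price for a hypothetical 1,800 sq ft house
pred <- predict(fit, newdata = data.frame(size = 1800))
pred
```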
glm(formula, family, data)
Formula: Response ~ Predictors. This argument specifies which variables are the independent (predictor) variables and which variable(s) are the dependent (response) variable(s).
Family: binomial. Since the output variable (product return) is defined as a binary value, 0 or 1, we use the binomial family.
Data: Training data set. This data set contains the values of all 18 variables (i.e. the values of both the dependent and the independent variables are given). It is also called labeled data.
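Here is a minimal sketch of the `glm(formula, family, data)` call on simulated data. The two predictors are the binary variables from the domain-knowledge slide, hypothetically named `is_gift` and `holiday_season`. The real training set has 18 variables. The coefficients used to simulate the labels are assumptions, chosen only so the example has a clear signal.

```r
set.seed(42)
n <- 1000

# Simulated (not real) training data with two binary predictors
train <- data.frame(
  is_gift        = rbinom(n, 1, 0.3),
  holiday_season = rbinom(n, 1, 0.2)
)

# Assumed true relationship: holiday purchases raise, gifts lower, the odds of return
true_logit     <- -1.5 + 1.5 * train$holiday_season - 1.5 * train$is_gift
train$returned <- rbinom(n, 1, plogis(true_logit))

# Binary response (returned = 0/1), so family = binomial (logistic regression)
model <- glm(returned ~ is_gift + holiday_season,
             family = binomial, data = train)
summary(model)
```

The estimated coefficients are on the log-odds scale. A positive coefficient for `holiday_season` and a negative one for `is_gift` match the domain assumptions.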
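Call customer before shipping
[Figure: number of transactions by predicted probability of product return, split at 60%]
Transactions are split into two groups by predicted probability of product return: > 60% and ≤ 60%. Customers whose transactions fall in the > 60% group are called before shipping.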
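Applying the model to test data means predicting a return probability for each new transaction and comparing it with the 60% cut-off. The sketch below uses simulated data; the variable names and coefficients are hypothetical. `predict(..., type = "response")` returns probabilities rather than log-odds. The final cross-tabulation against actual returns is one simple way to check model performance.

```r
set.seed(7)

# Hypothetical data generator (simulated, not real transactions)
make_data <- function(n) {
  d <- data.frame(is_gift        = rbinom(n, 1, 0.3),
                  holiday_season = rbinom(n, 1, 0.2))
  d$returned <- rbinom(n, 1, plogis(-1.5 + 2.5 * d$holiday_season - 1.5 * d$is_gift))
  d
}
train <- make_data(1000)
test  <- make_data(200)

model <- glm(returned ~ is_gift + holiday_season, family = binomial, data = train)

# Predicted probability of return for each test transaction
test$p_return <- predict(model, newdata = test, type = "response")

# Business rule from the slide: call the customer if the probability exceeds 60%
test$call_before_shipping <- test$p_return > 0.6
table(test$call_before_shipping)

# Compare the rule with what actually happened
table(predicted_call = test$call_before_shipping, actually_returned = test$returned)
```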
ggplot2
• Geometric Shapes
• Scales and Coordinate Systems
• Plot Annotations
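In ggplot2, a chart is built by adding layers. The three components above map onto `geom_*()` functions, `scale_*()`/`coord_*()` functions, and annotation helpers such as `labs()`, `geom_vline()` and `annotate()`. The sketch below assumes ggplot2 is installed. It draws a histogram of hypothetical predicted return probabilities with the 60% cut-off marked.

```r
library(ggplot2)

# Hypothetical predicted probabilities for a handful of transactions
preds <- data.frame(p_return = c(0.05, 0.12, 0.35, 0.48, 0.62, 0.71, 0.88))

p <- ggplot(preds, aes(x = p_return)) +
  # Geometric shape: histogram of transactions by predicted probability
  geom_histogram(binwidth = 0.1, boundary = 0) +
  # Scales and coordinate system: x axis shown as percentages over 0-100%
  scale_x_continuous(labels = function(x) paste0(x * 100, "%")) +
  coord_cartesian(xlim = c(0, 1)) +
  # Plot annotations: dashed 60% cut-off, a text label, and axis titles
  geom_vline(xintercept = 0.6, linetype = "dashed") +
  annotate("text", x = 0.8, y = 2, label = "Call before shipping") +
  labs(x = "Probability of product return", y = "Number of transactions")

print(p)
```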
Thank you!
Carolina Araripe
carolina@tatvic.com +91 7600-515-354