Enriching Customer Data With New Customer
Insights Using Big Data And Analytics
Mike Ferguson Managing Director
Intelligent Business Strategies Swiss BI Day
Geneva, October 2015
About Mike Ferguson
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specialises in business intelligence, data management and enterprise business integration. With over 34 years of IT experience, Mike has consulted for dozens of companies, spoken at events all over the world and written numerous articles. Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of DataBase Associates.
www.intelligentbusiness.biz
Twitter: @mikeferguson1 Tel/Fax (+44)1625 520700
Topics
The increasing power of the customer
New Requirement - The Customer Intelligent Omni-Channel Front
Office
Why is new data needed for business survival?
Using Big Data and analytics to enrich customer insight for
competitive advantage
• Assessing new data sources to determine business value
• Text analytics and social media data
• Graph analytics
• Clickstream analytics
• Re-analysing enriched customer data for data-driven growth
• Integrating enriched customer data into the omni-channel front office
Business Survival: Today Customers Are Increasingly Well
Informed BEFORE They Buy
Important new data sources for analysis
• Search data
• Clickstream data from web logs (including tracker data)
New competitors More choice
On the web the customer is king
On the move
Easy to find Easy to compare
Voice of the customer
Easy to find ratings
Primarily B2C but B2B is increasingly following the same
Customer Power – Comparison Web Sites Are Having A
Major Influence On B2C And Some B2B Buying Behaviour
MySupermarket (retailer prices) Pricegrabber (CPG prices) GoCompare (car insurance) Google flights (flight prices) USwitch (broadband prices) CompareTheMarket (energy prices)
With Customers So Well Informed, Quality Of Product And
Service Plus Smooth Operations Become Very Important
Eliminate process errors
Customer sentiment will
Improve marketing, sales and service via on-demand access to insights and recommendation services relevant to each customer
E-commerce application Customer service apps Sales Force automation apps Customer facing outlet applications Front-Office Operations Personalised customer Insight Personalised customer recommendations Marketing applications
Customers
Common transaction services OMNI-CHANNEL Improve customer engagement Prescriptive analytics Predictive analytics
Requirement Is Consistent Customer Treatment Across All
Channels – The Smart Omni-Channel Front-Office
For Most Organisations A Customer Master Data
Management System Holds What Details About Customers
C R U D customer
Customer master data
MDM sales distribution finance ops OLTP systems Transaction data EDW EDW mart DW & marts Customer and other insights
Key Questions?
Is ‘traditional’ customer identity data in your master data
management system enough?
Do you have all the attributes in your Customer MDM
system that could be of value to your business?
Do you know about all the relationships that your
customers have in your Customer MDM system that
could be of value to your business?
Do you have all the insights about your customers
either in your DW or your MDM system that could be of
value to your business?
Why New Data?
– Huge Demand To Enrich Customer Master Data
Improving Customer Experience Via Time Series Analysis
Of All Customer Interactions
OMNI channel analysis – analyse all customer interactions across all channels
identity data behavioural data social data Customer “DNA”
New Data - Do You Collect Data From All Inbound And
Outbound Customer Interaction Points?
• Direct mail
• In-store POS
• Kiosks
• Websites
• Search
• Online advertising sites
• Mobile devices
• Email
• SMS/MMS (inbound and outbound)
• Social Media
• Customer service
• Call centres
• Client centres
Systems of Engagement Do you know how your
Social Networks Are Getting Significant Business Interest –
Primarily From Marketing
Profiles (e.g. LinkedIn)
Ratings / Likes / Dislikes
Social Graph
Comments (e.g. Twitter)
Image source: www.flipthemedia.com
The Business Value Of Social Networks And Social
Network Analytics
Sentiment?
Circle of trusted friends?
Influencers?
How valuable?
What are the dominant relationships? Who are the influencers? How valuable is the network?
Popular Big Web Data Analytic Applications That Can Help
Enrich Master Data
Clickstream analytics
• Site navigation behaviour (session) analysis
– Paths to buy, paths to abandonment, what else they looked at
– Improve customer experience and conversion
– Associate clicks with customers & prospects
Social network influencer analysis
• Graph analytics for influencer behavioural impact analysis
• ‘Target the influencer’ marketing campaign effectiveness
Today Both Structured And Multi-Structured Data Are
Needed For Deeper Insight
Multi-structured data (often un-modelled and may not be well understood)
• Click stream web log data
• Customer interaction data
• Social interaction data
• Sensor data
• Rich media data (video, audio)
• External content
• Documents
• Internal web content
• Seismic data (oil & gas)
Structured data (often a schema is defined and data is well understood)
• OLTP system data
• Data warehouse data
• Personal data stores, e.g. Excel, Access
Using Big Data And Hadoop To Enrich Customer
Knowledge In MDM And DW Systems
Parse & Prepare Data in Hadoop (Spark or MapReduce) Transform & Cleanse Data in Hadoop (Spark or MapReduce)
Discover data in Hadoop
ELT workflow sandbox other data sandbox sandbox Data Reservoir (raw data)
Load data into Hadoop Data Refinery New high value Insights (pub/sub) contains clean,
high value data C
R U D Prod Asset Cust MDM EDW EDW mart DW & marts
The Problem With Enriching Customer Master Data
There could be potentially hundreds of possible data sources
to choose from
• Which ones would add the highest value?
• How do you assess the candidate data sources?
• Who decides which ones to choose?
• How long does it currently take you to get business agreement on adding new attributes to a customer MDM system?
• How long does the data from a candidate data source retain its business value?
Can you use big data technology to help you decide which
data sources are important?
• Yes!!
• Load into HDFS and run search over it to quickly explore it
Data Deluge - Search Offers A Way To Quickly Explore
Multi-Structured Data Sources To Assess Their Value
BI Systems DWs & Data Marts Sales by Region Search and BI ECMS ECMS WWW content search indexes
Free-form ad hoc analysis of multi-structured data
Lucene search engine technology is part of the Hadoop software stack
Search On Hadoop - Data Scientists Can Quickly Explore
Newly Loaded Multi-Structured Data
e.g. Social Data Platforms HDFS files BI Tools, Applications, Mashups index index Index partition
Big Data Analysis - Exploratory Analysis of Multi-Structured
Data In Hadoop Via Search, e.g. Lucene Or IBM BigIndex
CMS Image server Collab tools File servers Web feeds email Web sites LOAD BI Tools, Applications, Mashups
Use massively parallel Map Reduce to build a partitioned search index
index index
Index partition
index partitions
Useful for analysing un-modelled semi-structured content that is not well understood
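The idea of building a partitioned search index with a map step and a reduce step can be sketched in a few lines of plain Python. This is only an illustration of the technique on one machine, not Lucene or any Hadoop API: the function names, the sample documents and the CRC-based partitioning rule are all assumptions made for the example.

```python
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Lower-case a document and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(tokenize(text))]


def partition_for(term, num_partitions):
    """Route a term to an index partition using a stable hash (assumed rule)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def build_partitioned_index(docs, num_partitions=2):
    """Reduce step: group the emitted postings by term inside each partition."""
    partitions = [defaultdict(set) for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        for term, emitted_doc in map_phase(doc_id, text):
            partitions[partition_for(term, num_partitions)][term].add(emitted_doc)
    return partitions


def search(partitions, term):
    """Look a term up in the single partition that owns it."""
    term = term.lower()
    owner = partitions[partition_for(term, len(partitions))]
    return sorted(owner.get(term, set()))


# Hypothetical multi-structured content loaded from different sources.
docs = {
    "tweet-1": "Loving my new broadband deal",
    "email-7": "Broadband keeps dropping, very unhappy",
    "review-3": "Great service and fast delivery",
}
index = build_partitioned_index(docs)
print(search(index, "broadband"))
```

On a real cluster each partition would be built by a separate reducer and queried in parallel; here the partitions are simply entries in a list.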
Hadoop Search Based Analytics
Assessing New Sources To Enrich Customer Data Is A
Collaborative Process – You Need Business In The Loop
IT Developer IT Data Architect Business data expert Business data expert Business analyst Data Steward Data Scientist Business data expert IT Data Architect
We need all relevant people to help determine high value data sources
We need to capture discussions, share exploratory results, rate data, prioritise projects
Goal: Enrich CUSTOMER data for better marketing
sandbox sandbox
Once You Have Assessed The Value You Can Start Data
Science Project(s) To Acquire New Data
For example additional data about customers could come from:
Social media data
• Professional life
• Lifestyle
• Relationships
• Likes/dislikes
• Sentiment - positive or negative opinion
• Intent - wants to buy, travel, etc.
• Ownership - products owned (could be from competitors)
• Interests - could be short-lived
In-bound customer email
Going Beyond Basic Identity Master Data
– E.g. Extending / Enriching Customer MDM
Customer interaction data Customer attitude data
Customer behaviour data Customer descriptive data
Email Chat / transcripts Call centre notes
Click stream Person-to-person dialogue
Opinions Preferences Needs and desires
Orders Payments Transaction history Usage history Attributes Characteristics Relationships Demographics Source: MDM
The objective is to create the best Customer dimension possible using additional internal and external data sources
MDM system with master data
services C R U D enriched customer customer
Enriching Customer Data – Which Data Sources Potentially
Require Big Data Analytics To Derive Insight?
Customer interaction data Customer attitude data
Customer behaviour data Customer descriptive data
Email Chat / transcripts Call centre notes
Click stream Person-to-person dialogue
Opinions Preferences Needs and desires
Orders Payments Transaction history Usage history Attributes Characteristics Relationships Demographics
The objective is to create the best Customer dimension possible using additional internal and external data sources
MDM system with master data
services C R U D enriched customer Potential big data sources sensor data, web logs CRM, web logs CRM, social media data, review web sites social media data, SEC filings
Enriching Customer Data – Need To Consider Volume,
Variety And Velocity Of Valuable New Data Sources
Customer interaction data Customer attitude data
Customer behaviour data Customer descriptive data
Email Chat / transcripts Call centre notes
Click stream Person-to-person dialogue
Opinions Preferences Needs and desires
Orders Payments Transaction history Usage history Attributes Characteristics Relationships Demographics Source: MDM High volume undiscovered structured data
The objective is to create the best Customer dimension possible using additional internal and external data sources
MDM system with master data
services C R U D enriched customer sensor data, web logs CRM, web logs CRM, social media data, review web sites social media data Potential big data sources unstructured data
High velocity, high volume semi-structured data semi-structured data semi-structured data unstructured data
Enriching Customer Data – Different Platforms Optimised
For Different Analytical Workloads Are Needed
Big Data workloads result in multiple platforms now being needed for analytical processing
Streaming data Hadoop data store Data Warehouse RDBMS NoSQL DBMS EDW EDW DW & marts NoSQL DB e.g. graph DB NoSQL DB e.g. graph DB Advanced Analytic (multi-structured data) mart DW Appliance Advanced Analytics (structured data) Analytical RDBMS Traditional query, reporting & Data mining, model development Streaming analytics Real-time stream processing & decision management Investigative analysis, Data refinery Graph analysis
Key Point ! – Several Different Types Of Big Data Analytic
Workloads Can Be Used To Enrich Customer Data
Text analytics to get new structured data attributes from millions of documents – e.g. SEC filings, tweets, reviews
• Sentiment analytics for customer opinion
Graph analytics for discovery of new customer relationships
Clickstream analytics for customer interaction behaviour
You can also combine these to find new data
• E.g. Text analytics to extract new data feeding graph analytics to find relationships in extracted data
New Data Sources - What Are We Looking To Extract From
Social Media Data Sources?
Social Data Platforms HDFS files C R U customer D MDM System enrich
Requires several techniques:
1. JSON schema extraction
2. Text analytics for entity extraction
3. Clickstream analysis
4. Graph analytics for relationship discovery analysis
Additional Person data e.g. hobbies, Interests, desires Additional Organisation data Unknown Relationships Intent Sentiment Product ownership data Professional data e.g. employers EDW EDW mart DW & marts
Social Media Data Challenges
– A Person Could Have Multiple Social Personas
Enriching Customer MDM - Extracting LinkedIn Social
Profile Data Via Their REST API
Most social media sites have APIs to access information
LinkedIn returns data in JSON or XML formats
Additional Person data e.g. education, interests Professional data e.g. employers, skills
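As a rough illustration of turning a JSON profile into candidate MDM attributes, the sketch below parses a sample payload with the Python standard library. The payload and its field names (`firstName`, `positions`, `skills`, `educations`) are invented for the example and are not taken from LinkedIn's actual API schema; the output attribute names are likewise hypothetical.

```python
import json

# Hypothetical payload: field names are illustrative, NOT LinkedIn's real schema.
SAMPLE_PROFILE = """
{
  "firstName": "Jane",
  "lastName": "Doe",
  "positions": [{"company": "Acme Ltd", "title": "Head of Procurement"}],
  "skills": ["negotiation", "supply chain"],
  "educations": [{"school": "University of Geneva"}]
}
"""


def to_mdm_attributes(raw_json):
    """Flatten a nested profile document into candidate customer attributes."""
    profile = json.loads(raw_json)
    first = profile.get("firstName", "")
    last = profile.get("lastName", "")
    return {
        "full_name": f"{first} {last}".strip(),
        "employers": [p["company"] for p in profile.get("positions", []) if "company" in p],
        "skills": profile.get("skills", []),
        "education": [e["school"] for e in profile.get("educations", []) if "school" in e],
    }


attrs = to_mdm_attributes(SAMPLE_PROFILE)
print(attrs)
```

The flattened attributes would still need matching against the existing customer record before anything is written to the MDM system.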
Enriching Customer MDM - Text Analysis Can Help Extract
Structure From Unstructured Data
Case management
Fault management and field
service optimisation
“Voice of the customer”
Sentiment analytics
Competitor analysis
Media coverage analysis
Improve pharma drug trials
Unstructured content is hard to analyse
How much is TEXT worth to your business?
Using Text Analytics To Extract Additional Data From
Unstructured Content
Requirement is automatic recognition of people, organisations, addresses.
This can be a computationally intensive process involving complex character-level operations such as pattern matching.
The Text Analytics Process – Key Tasks
Extract raw text (html, pdf, ps, gif)
Tokenize
Detect term boundaries
Detect sentence boundaries
Tag parts of speech – nouns & verbs
Tag named entities (person, place, organization, gene, chemical)
Parse
Determine co-reference
Extract knowledge
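A few of the tasks above (extract raw text, detect sentence boundaries, tokenize, tag named entities) can be sketched with the Python standard library alone. This is a deliberately naive illustration: the gazetteer, the regular expressions and the sample HTML are assumptions for the example, and a real text analytics engine would use trained models rather than a fixed lookup list.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_raw_text(html):
    """Extract raw text from HTML and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def split_sentences(text):
    """Detect sentence boundaries: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Detect term boundaries with a simple word pattern."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


# Hypothetical gazetteer: known entity names and their types.
GAZETTEER = {"Acme Ltd": "organisation", "Jane Doe": "person", "Geneva": "place"}


def tag_entities(sentence):
    """Tag named entities by pattern matching against the gazetteer."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]


html = "<p>Jane Doe joined Acme Ltd in Geneva. She leads procurement!</p>"
sentences = split_sentences(extract_raw_text(html))
for sentence in sentences:
    print(tokenize(sentence), tag_entities(sentence))
```

Part-of-speech tagging, parsing and co-reference resolution (for instance linking "She" back to "Jane Doe") are left out because they need linguistic models beyond simple pattern matching.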
Text Analytics Applications
- What Is Sentiment Analysis?
Definition
• The process of determining a sentiment score from text
Why do it?
• Responding to negative sentiment quickly is important to improving customer satisfaction and loyalty and protecting brand
Data sources
• Contact centre customer interactions, e.g. email, SMS …
• Twitter, Facebook, review web sites
Basic sentiment analysis
• Classifies the polarity of a document, sentence or other text
• Positive, Negative, Neutral
Advanced sentiment analysis
• “Beyond polarity” sentiment classification looks at emotional states
Additional Person data e.g. hobbies, Interests, desires
Intent Sentiment
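Basic polarity classification can be illustrated with a small lexicon-based scorer. The word lists, the one-word negation rule and the function names below are assumptions chosen for the example; production sentiment engines typically use much larger lexicons or trained classifiers.

```python
import re

# Tiny illustrative lexicons (assumed, not from any standard resource).
POSITIVE = {"great", "love", "fast", "helpful", "excellent"}
NEGATIVE = {"slow", "rude", "broken", "terrible", "unhappy"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum +1 / -1 per opinion word, flipping the sign after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def polarity_label(text):
    """Map the numeric score onto Positive, Negative or Neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


for message in [
    "Great service and fast delivery",
    "The agent was rude and the router is broken",
    "Not great",
    "I called yesterday",
]:
    print(polarity_label(message), "-", message)
```

A scorer this simple cannot handle sarcasm, slang or emoticons, which is why those appear among the challenges of analysing social media text.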
Sentiment Analysis – Text Analytics Entity Extraction Is
Needed To Derive Structure From Unstructured Content
(source: Crunchbase)
Note: Not everyone is on Twitter!!
Some people have > 1 Twitter account
Challenges
Emoticons ( :-) :-< :0) ) Twitter hashtags #bigdata “Yoda” speak
Slang / vernacular / abbreviations Sarcasm
Ambiguity Spam
Multiple languages
Sentiment Analytics – The Process Of Associating Terms
With Sentiment Ratings
Source: Mining Text to Pinpoint Customer
Drill down
For customers who rated product x low, how many of them mentioned “smell”?
Sentiment Analysis Visualisation Example
– Sentiment Histograms
Source: Pardee Center Research Report: Connecting the Dots: Information Visualization and Text Analysis of the Searchlight Project Newsletter, Feb 2012
The Social Profile And Sentiment Analytics Can Be
Matched To Master Data In Hadoop Using Fuzzy Matching
Social Data Platforms Text Analysis Customer Engagement Management
Social Media Aggregators
Analyse / Index / Deliver Twitter Firehose MySpace Klout Amazon Facebook reddit Flickr Youtube bit.ly MapReduce or Spark sentiment scoring application HDFS files Scored sentiment and Social profile data Hive tables Probabilistic (‘fuzzy’) matching C R U customer critical fields enrich C R U enriched customer D MDM System
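Probabilistic ('fuzzy') matching of a social profile against master data can be sketched with `difflib.SequenceMatcher` from the Python standard library. The critical fields, weights, threshold and sample records are assumptions for illustration only; a real matching engine would use calibrated probabilistic weights and run distributed over Hadoop.

```python
from difflib import SequenceMatcher


def normalise(value):
    """Lower-case, drop full stops and collapse whitespace before comparing."""
    return " ".join(value.lower().replace(".", " ").split())


def similarity(a, b):
    """String similarity in [0, 1]; a missing value never counts as a match."""
    a, b = normalise(a), normalise(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(social, master, weights=None):
    """Weighted similarity over the critical fields (weights are assumed)."""
    weights = weights or {"name": 0.6, "city": 0.2, "employer": 0.2}
    return sum(
        weight * similarity(social.get(field, ""), master.get(field, ""))
        for field, weight in weights.items()
    )


def best_match(social, masters, threshold=0.8):
    """Return the best-scoring master record, or None if below the threshold."""
    scored = [(match_score(social, m), m) for m in masters]
    score, record = max(scored, key=lambda pair: pair[0])
    return (record, score) if score >= threshold else (None, score)


# Hypothetical master records and a hypothetical social profile.
masters = [
    {"id": "C001", "name": "Jonathan Smith", "city": "Geneva", "employer": "Acme Ltd"},
    {"id": "C002", "name": "Joanna Smythe", "city": "Zurich", "employer": "Globex"},
]
social = {"name": "Jon Smith", "city": "Geneva", "employer": "Acme Ltd."}

record, score = best_match(social, masters)
print(record["id"] if record else None, round(score, 3))
```

Only profiles that clear the threshold would be used to enrich the customer record; borderline scores are usually routed to a data steward for review.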
Sentiment Analysis Could Be Done On The Cloud While
Matching Could Be Done In-House
Social Data Platforms Text Analysis Customer Engagement Management
Social Media Aggregators
Analyse / Index / Deliver Twitter Firehose MySpace Klout Amazon Facebook reddit Flickr Youtube bit.ly CRM applications MapReduce or Spark sentiment scoring application HDFS files Hive tables Scored sentiment and Social profile data C R U customer D MDM System critical fields enrich C R U enriched customer D MDM System On-premises On-the-cloud Probabilistic (‘fuzzy’) matching
Running A Master Data Matching Engine On Hadoop As A
MapReduce Job Matching People With Social Interactions
Product Example: IBM InfoSphere MDM BigMatch PME
Where Are We? - Enriching Customer Master Data With
New Relationships Using Graph Analysis
Customer interaction data Customer attitude data
Customer behaviour data Customer descriptive data
Chat / transcripts Call centre notes Clickstream
Person-to-person dialogue
Opinions Preferences Needs and desires
Orders Payments
Transaction history Usage history
Click stream navigation
Attributes Characteristics Relationships Demographics
Source: MDM
The objective is to create the best Customer dimension possible using additional internal and external data sources
MDM system with master data services C R U D Enriched customer customer
Graph Analytics – Use Cases
Financial crimes
• Anti-money laundering, fraud
Government benefits fraud
Insurance fraud
Crime prevention and counter terrorism
Social network influencer analysis
Route optimisation
• Airlines, supply/distribution chain, logistics…
Life sciences (Bioinformatics)
• Medical research, Disease pathologies
Graph Analytics Example
- Social Network Relationships Analysis
Image source: Mashable.com
As graphs get more complex, you know less about the relationships in advance and are less likely to partition the data successfully
Graph Analysis – Vertices And Edges
- What Can Be Vertices?
Vertex (can have properties)
Edge (can have direction)
Graph Analysis
– Edges Are Often More Valuable Than Vertices
Source: Teradata
There Are A Range Of Graph Analytics Algorithms
- E.g. Teradata Aster Prepackaged Graph Algorithms
Graph Analysis Algorithm Example
- Eigen Centrality Could Highlight Important Influencers
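Eigenvector centrality scores a vertex highly when it is connected to other highly scored vertices, which is why it can surface influencers. The sketch below computes it by power iteration on a small undirected graph. It is not the Teradata Aster implementation; the function name, the sample network and the use of the shifted matrix A + I (to keep the iteration stable) are choices made for this example.

```python
def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as (vertex, vertex) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each vertex keeps its own score and adds its neighbours' scores.
        # Iterating with A + I has the same leading eigenvector as A but
        # avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in nodes}
        top = max(new.values())
        new = {n: value / top for n, value in new.items()}  # scale so max is 1
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical social network: vertices are people, edges are "knows".
edges = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("dan", "eve"),
]
scores = eigenvector_centrality(edges)
for person in sorted(scores, key=scores.get, reverse=True):
    print(person, round(scores[person], 3))
```

In this toy network "ann" ranks first, and "bob" outranks "eve" because bob's connections are themselves better connected.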
Exploratory Graph Analysis
Using Text And Graph Analytics To Enrich Customer Data
- Entity Flow From SEC Filings
Extract Extract Integrate Integrate Millions of documents 2005 2013 Filing timeline SEC/FDIC Filings of Financial Companies Entity-centric view employment, director, officer insider, 5% owner, 10% owner Event Company Person Security Loan subsidiaries, insider, 5%, 10% owner, banking subsidiaries borrower, lender
Source: IBM
Using Text AND Graph Big Data Analytics To Enrich
Customer Data - Detailed Entity Flow Overview
Post-Crawl Text Analytics
U.S. S.E.C Securities and Exchange Commission Crawl Entity Integration Load Single
machine Product Example: IBM Big Match and BigInsights
• Per document, incremental • Parse and Extract using AQL
• Over all documents (non-incremental)
Nutch
segments JSON JSON
JSON (Nested Entities) • Nutch crawl for SEC. • Manual download for FDIC filings.
Part 1
Part 2
RESTful API Query Layer Hadoop Graph Store Hadoop Graph Store
Information Extracted From SEC filings
The information from the following SEC documents can be
extracted and consolidated into entities
Forms 3/4/5 (XML to JSON): No extractor run. Convert from XML to JSON. We get people and companies from here and the transactions between them.
Forms SC (Extract): 5% or more Beneficial Ownership reports.
Forms 13F (Extract): Institutional Investment Manager Reports. Holdings.
Forms 8 / 10 / DEF (Extract): Core Financial Information: Biographies, Loan Agreements, Merger & Acquisitions, Appointments & Resignations, Committees, Board Positions, etc.
Source: IBM
employment, director, officer insider, 5% owner, 10% owner
Event Company Person Security Loan subsidiaries, insider, 5%, 10% owner, banking subsidiaries borrower, lender Forms 8-K
Forms 10-K, DEF 14A, 8-K, 3/4/5 Forms 10-K, DEF
14A, 8-K, 3/4/5, 13F, SC 13D, SC 13G, FDIC Call Report
Reference SEC table
Forms 13F, Forms 3/4/5 Forms 3/4/5, SC 13D, SC 13G, 10-K,
FDIC Call Report
Forms 3/4/5, SC 13D, SC 13G Forms 10-K, 10-Q, 8-K
5% beneficial ownership
• owner
• issuer
• % owned
• date
Shareholders
• related institutional managers
• holdings in different securities
Subsidiaries
• list subsidiaries of a company
Current Events
• merger and acquisition
• bankruptcy
• change of officers and directors
• material definitive agreements
Loan Agreements
• loan summary details
• counterparties (borrower, lender, other agents)
• commitments
Insider filings
• transactions
• holdings
• insider relationship
Officers & Directors
• mention
• bio range, age, current position, past position
• signed by
• committee membership
Enriching Customer Master Data – Do You Attach Insights
To The Master Data Entity, Relationships Or Both
Image source: http://www.computerweekly.com/feature/Whiteboard-it-the-power-of-graph-databases, by Andy Hogg
enrich
enrich
new relationship
Where Are We? - Enriching Customer Master Data With
Clickstream Interaction Behaviour Insight
Customer interaction data Customer attitude data
Customer behaviour data Customer descriptive data
Chat / transcripts Call centre notes Clickstream
Person-to-person dialogue
Opinions Preferences Needs and desires
Orders Payments
Transaction history Usage history
Click stream navigation
Attributes Characteristics Relationships Demographics MDM system
with master data services C R U D Enriched customer customer
What do logged-in customers do and look at on-line?
A Common Way To Capture Weblog Data To Bring Into
Hadoop HDFS Is Using Apache Flume
Flume Sinks include
• HDFS sink - supports writing Avro files with arbitrary schemas
• Solr sink with ETL capabilities
• HBase
Source: Cloudera
Flume Master is a separate service with knowledge of all the physical and logical nodes in a Flume installation
Exploratory Analysis Of Clickstream Data In Hadoop
– E.g. Weblog Data In Hortonworks
Putting Structure On Clickstream Data
- Creating A Hive View Over The Weblog Data
ClickStream Data With A Hive Schema Allows The Data To Be
Queried & Joined With Other Data, e.g. CRM And Product Data
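The effect of putting a schema on raw weblog lines and joining them with CRM data can be imitated outside Hive in a few lines of Python. This is a sketch only: it assumes a Common Log Format style line in which the authenticated user field carries the customer ID, and the CRM records, URLs and function names are invented for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: ip, identity, user, [timestamp], "method url protocol", status, bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Apply the schema to one raw log line; return None if it does not fit."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    row = match.groupdict()
    row["ts"] = datetime.strptime(row["ts"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    return row


def paths_by_customer(lines, crm):
    """Join parsed clicks to CRM records and order each customer's path by time."""
    visits = defaultdict(list)
    for line in lines:
        row = parse_line(line)
        if row is None or row["user"] not in crm:
            continue  # skip malformed lines and anonymous visitors
        visits[row["user"]].append((row["ts"], row["url"]))
    return {
        user: {"segment": crm[user]["segment"], "path": [url for _, url in sorted(clicks)]}
        for user, clicks in visits.items()
    }


# Hypothetical CRM extract and weblog lines.
crm = {"C001": {"name": "Jane Doe", "segment": "gold"}}
log_lines = [
    '10.0.0.1 - C001 [12/Oct/2015:09:01:30 +0000] "GET /basket HTTP/1.1" 200 204',
    '10.0.0.1 - C001 [12/Oct/2015:09:00:01 +0000] "GET /products/router HTTP/1.1" 200 512',
    '10.0.0.1 - C001 [12/Oct/2015:09:02:10 +0000] "POST /checkout HTTP/1.1" 200 98',
    '10.0.0.9 - - [12/Oct/2015:09:03:00 +0000] "GET /products/tv HTTP/1.1" 200 640',
    'garbage line',
]
print(paths_by_customer(log_lines, crm))
```

In Hive the same idea would be expressed as a view over the log files joined to a CRM table; the Python version just makes the parse-then-join logic explicit.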
Teradata Aster Discovery Portfolio
– Clickstream Visualisation Examples
Source: Teradata
Analysing Enriched Customer Data Can Improve Accuracy
Of Next Best Action To Be Taken
C R U D Enriched customer Enriched MDM System Additional Person data e.g. hobbies, Interests, desires Additional Organisation data Unknown Relationships Intent score Sentiment score Product ownership data Professional data e.g. employers
Life events Behaviour
analyse Next best action enrich Option 1 Option 2 EDW EDW mart DW & marts Additional Person data e.g. hobbies, Interests, desires Additional Organisation data Unknown Relationships Intent score Sentiment score Product ownership data Professional data e.g. employers
Life events Behaviour
analyse Next
best action enrich
Distributed Execution Of Analytics In A Data Refinery
Process – E.g. RapidMiner
Use Analytics On Enriched Master Data To Top Up
‘Today’s Calls’ Into Salesforce.com, e.g. RapidMiner
Improve marketing, sales and service via on-demand access to smart
master data with insight on each and every
customer available through all channels
E-commerce application Customer service app Sales Force automation app Customer facing outlet applications Front-Office Operations Marketing application
Enterprise Service Bus
Smart master data & master data services
C R U D Enriched customer Analytical services
Achieving Consistent Customer Treatment Via On-Demand
Access To Common Smart Analytical Master Data Services
Customers
OMNI-CHANNEL
Improve customer engagement
Conclusions
B2C and B2B customers are becoming very powerful
because they are getting informed before they buy
This means loyalty is becoming cheap and so organisations
have to try harder to keep customers
To understand customers better, companies need more data
Big data analytics can be very effective in providing new
insights to enrich customer data in DW and MDM systems
Integrating analytical and decision services that analyse
enriched customer data into OLTP applications can deliver
significant competitive advantage
www.intelligentbusiness.biz Twitter: @mikeferguson1 Tel/Fax (+44)1625 520700