
(1)

Big Data Analytics

Analysis of high-volume and unstructured Data

Stefan Weingaertner, DYMATRIX CONSULTING GROUP

KNIME Meetup Italia, 10th October 2013

(2)

Agenda

1 Company Introduction

2 Big Data - an Introduction

3 Big Data Analytics on high-volume Data

4 Big Data Analytics on unstructured Data

5 Livedemo: Advanced Email Classification

(3)
(4)

DYMATRIX – The analytical CRM Company

» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics

» Consulting, development and implementation know-how, based upon more than 900 projects with mid- and large-cap companies across industries

» Goal- and client-oriented project execution based upon award-winning, established solutions

(5)

Our Consulting Competence Centers

Business Intelligence

» Conception of (big) data warehouse and business intelligence architectures
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
» Planning & Forecasting
» Balanced Scorecard

Advanced Analytics

» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
» Big Data Analytics

Campaign Management

» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes

E-commerce insight

» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics

Analysis of client oriented processes

Initial situation – Analysis – Conception of processes for customer retention and its optimization – customer reactivation and new customer activation – benchmarking against industry leaders

(6)

Solution Portfolio – The Customer Insight Suite

DynaCampaign

» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered realtime campaigning

DynaMine

» End2end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models

DynaCision

» Realtime decision management platform
» Design & execution of complex embedded decision processes

DynaSocial

» Social CRM platform to listen, track, identify and quantify customer needs and sentiments

(7)

Our KNIME Solution Nodes & KNIME Consulting Services

PMML2SQL / PMML2SAS Converter

» Convert PMML to executable SQL Code for In-Database-Scoring
» Convert PMML to executable SAS Code for Model Scoring within SAS

Big Data Integration

» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines

Uplift Modeling

» Predictive Modeling Nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
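The "incremental response" targeted by uplift modeling can be illustrated with a minimal sketch: the difference in response rate between customers who received a marketing action and a control group who did not, computed per segment. This is not the KNIME node's implementation; the record layout, segment names and sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Return response-rate difference (treated minus control) per segment.

    Each record is (segment, treated, responded). This layout is a
    hypothetical one chosen for the illustration, not a DYMATRIX format.
    """
    # Per segment: {treated_flag: [group size, number of responders]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        bucket = counts[segment][treated]
        bucket[0] += 1
        bucket[1] += int(responded)

    result = {}
    for segment, groups in counts.items():
        n_treated, r_treated = groups[True]
        n_control, r_control = groups[False]
        if n_treated == 0 or n_control == 0:
            continue  # uplift is undefined without both groups
        result[segment] = r_treated / n_treated - r_control / n_control
    return result


# Made-up sample data: (segment, treated, responded)
records = [
    ("young", True, True), ("young", True, True),
    ("young", False, True), ("young", False, False),
    ("senior", True, False), ("senior", True, True),
    ("senior", False, True), ("senior", False, True),
]
uplift = uplift_by_segment(records)
```

A positive value suggests the action adds responses in that segment; a negative value suggests contacting that segment may be counterproductive.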

Interactive Scorecard Builder

» Interactive Scorecard Building Nodes for Design of Credit or Marketing Scorecards

+ Business Consulting

+ Analytical Consulting

+ Technical Consulting

+ Trainings

(8)

References

(9)

References

Media

Banks, Insurances

Utilities, Industries, Public

Schwäbisch Hall

(10)
(11)

A Characterization of Big Data

[Figure: Big Data characterized along three axes. Volume: Terabyte to Zettabyte. Structure: Structured to Structured & Unstructured. Processing: Batch to Streaming.]

(12)

Challenge: Big Data Collection & Integration

[Figure: customer journey stages: Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember.]

(13)

Big Data Analytics: Learn, Target & Influence!

[Figure: customer journey stages: Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember.]

(14)

Big Data Analytics on high-volume Data

[Figure: Big Data axes repeated. Volume: Terabyte to Zettabyte. Structure: Structured to Structured & Unstructured. Processing: Batch to Streaming.]

(15)

Big Data Access

[Figure: Hadoop stack, bottom to top. Big Data Sources. Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce. Hadoop Extensions: Hive, HBase. Analytic Applications: Mahout, MapReduce Routines.]
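The MapReduce model named in the stack can be sketched in plain Python as three steps: map each input to key/value pairs, group the pairs by key, then reduce each group. This runs on a single machine and only illustrates the programming model; it does not use Hadoop, and the function names and sample documents are assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit one (word, 1) pair per word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values of one key into a single count."""
    return key, sum(values)


# Made-up input documents
documents = ["big data analytics", "big data access", "data mining"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster the map and reduce calls would run in parallel on different machines, with HDFS holding the inputs and outputs.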

(16)

Big Data Analytics

[Figure: Hadoop stack, bottom to top. Big Data Sources. Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce. Hadoop Extensions: Hive, HBase. Analytic Applications: Mahout, MapReduce Routines.]

PMML2SQL Converter
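The idea behind converting a model to SQL for in-database scoring can be sketched as follows: a small decision tree is rendered as a nested SQL CASE expression, so the database evaluates the model without exporting the data. This is not the PMML2SQL converter itself. The nested-dict tree is a simplified stand-in for a PMML tree model, and the table and column names are hypothetical.

```python
def tree_to_sql(node):
    """Render a nested-dict decision tree as a SQL CASE expression.

    Illustration only: no escaping or validation of field names and values.
    """
    if "score" in node:  # leaf node: emit the predicted class as a string literal
        return "'{}'".format(node["score"])
    condition = "{} {} {}".format(node["field"], node["op"], node["value"])
    return "CASE WHEN {} THEN {} ELSE {} END".format(
        condition, tree_to_sql(node["yes"]), tree_to_sql(node["no"])
    )


# Hypothetical churn tree; fields and thresholds are made up.
tree = {
    "field": "monthly_fee", "op": ">", "value": 30,
    "yes": {
        "field": "complaints", "op": ">=", "value": 2,
        "yes": {"score": "churn"},
        "no": {"score": "stay"},
    },
    "no": {"score": "stay"},
}

sql = "SELECT customer_id, {} AS prediction FROM customers".format(tree_to_sql(tree))
```

Because the generated statement is plain SQL, scoring scales with the database rather than with the analytics client.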

(17)

Big Data Analytics on unstructured Data

[Figure: Big Data axes repeated. Volume: Terabyte to Zettabyte. Structure: Structured to Structured & Unstructured. Processing: Batch to Streaming.]

(18)

Big Data is not just about structured data…

» 80% of the world’s data is unstructured.

» Unstructured data is growing at 15 times the rate of structured data.

Source: Google Trends April 6, 2012

(19)

Imagine…

» …to classify all customer related text messages by Source / Origin, Sentiment, Product or Service, Business Transaction, Context etc.

» …to identify unknown trends

» …to identify cause and effect relations

» …to react on that information, e.g. Technical Problems, Needs, Usability, Competition etc.

The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!

(20)

Deutsche Telekom: Social Earthquake

Facebook Posts & Comments March & April 2013

[Figure: number of Facebook posts and comments over time (y-axis 0 to 1000; x-axis 1 March to 26 April 2013), split into negative, neutral and positive sentiment.]

» First Rumours: Limitation of Bandwidth (21.3. – 23.3.)

» „DSL-Drossel“ (DSL throttling): Official Pressrelease on Limitation of Bandwidth leads to a Social Earthquake. (22.4. – 27.4.)

(21)
(22)

DYMATRIX Text Mining Process (KNIME Text Processing)

Text Datasources

» Datasources: Facebook, Twitter, Emails, Data Provider like GNIP, Datasift etc., Crawled Data etc.
» For Machine Learning: Provide Training Data for Classification (e.g. Sentiment)

Text Enrichment

» Language Detection: English, German, many more…
» Language-specific NLP POS Tagging: Penn Treebank Tagger, STTS Tagger
» Text Cleansing: Stop Words, Punctuation, Stemming
» Sentiment Amplifier: Matching of Sentiment- & Emoticon-Dictionaries

Subject Matching

» Text Tagging with any Subjects: Products, Brands, Business Transactions, Service, Complaints, Requests etc.
» Fuzzy Matching with Dictionary Tagger: Matching of Subject-Dictionaries

Sentiment Classification

» Text Vectorization: Creation of text predictors to predict sentiments
» Machine Learning: Classification with Predictive Analytics (e.g. Decision Tree)
» Retraining Interface: Adjustment of misclassified messages for permanent optimization of classification

Information Delivery

» Text Data Mart: Make information available in central Text Data Mart for visualization, alerting etc.
» Fields of Application: Email-Routing, Event triggered Campaign Management etc.

(23)

DYMATRIX Text Mining Process: Datasources

Access any Text Datasource to start the Text Mining Process

» Facebook
» Twitter
» Emails
» Crawler
» Data Provider like GNIP, Datasift etc.

[Figure: exemplified contribution on a Facebook Fanpage.]

(24)

DYMATRIX Text Mining Process: Text Enrichment

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Penn Treebank POS Tagger (English Messages)

Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]

Removal of Stop Words & Punctuations

sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]

Sentiment Amplifier

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
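The text cleansing step shown on this slide (removal of stop words and punctuation) can be sketched in a few lines of Python. This is not the KNIME Text Processing implementation: the stop word list is a small hypothetical one, so the result differs slightly from the slide's output, and no POS tagging or stemming is performed.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"why", "not", "your", "out", "of", "new", "but", "yet", "it"}


def cleanse(message):
    """Drop punctuation and stop words, keeping the remaining tokens in order."""
    tokens = re.findall(r"[A-Za-z0-9]+", message)  # punctuation is discarded here
    return [t for t in tokens if t.lower() not in STOP_WORDS]


message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
tokens = cleanse(message)
```

Stop word matching is case-insensitive, while the surviving tokens keep their original case so that later steps can still use capitalization (for example "FULL") as a signal.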

(25)

DYMATRIX Text Mining Process: Subject Matching

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Subject Matching (Fuzzy Matching)

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].

Matched subjects

» BUSINESS TRANSACTION: Complaint
» NETWORK: No Signal
» PRODUCT: Nokia Lumia 925
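Fuzzy matching against a subject dictionary, as described on this slide, can be sketched with Python's standard `difflib` module, which tolerates small misspellings. This is an assumption-laden illustration rather than the KNIME Dictionary Tagger: the dictionary entries, subject labels and the similarity cutoff of 0.8 are all hypothetical.

```python
import difflib
import re

# Hypothetical subject dictionary: term -> subject label.
SUBJECT_DICTIONARY = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "contract": "CONTRACT",
    "lumia": "PRODUCT",
}


def match_subjects(message, cutoff=0.8):
    """Return the set of subjects whose dictionary terms approximately occur in the message."""
    terms = list(SUBJECT_DICTIONARY)
    subjects = set()
    for token in re.findall(r"[a-z0-9]+", message.lower()):
        # Best dictionary term at or above the similarity cutoff, if any.
        close = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        if close:
            subjects.add(SUBJECT_DICTIONARY[close[0]])
    return subjects


# Misspelled input still matches the dictionary terms "signal" and "contract".
subjects = match_subjects("Wk 3 of crap signl but paying full contrct")
```

Lowering the cutoff catches more misspellings at the cost of more false matches, so the threshold would need tuning against real messages.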

(26)

DYMATRIX Text Mining Process: Sentiment Classification

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Output from Text Enrichment

Text Vectorization (Transformation)

Predictors relevant for Text Classification, e.g.

» Emoticons positive/negative
» Fragments positive/negative
» Words positive/negative
» Author-related Inputs
» Length of message
» Likes
» Comments
» Other linguistic Inputs

Text Classification with Decision Tree
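Text vectorization as described here turns a message into a fixed set of numeric predictors that a classifier such as a decision tree can consume. The following minimal sketch covers a few of the predictors listed above. The word and emoticon dictionaries are tiny hypothetical examples, and the feature names are assumptions rather than the names used in the DYMATRIX workflow.

```python
import re

# Hypothetical sentiment dictionaries, for illustration only.
POSITIVE_WORDS = {"great", "thanks", "love"}
NEGATIVE_WORDS = {"crap", "issues", "problem"}
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def vectorize(message, likes=0, comments=0):
    """Turn one message plus its engagement counts into a dict of numeric predictors."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return {
        "length": len(message),
        "positive_words": sum(t in POSITIVE_WORDS for t in tokens),
        "negative_words": sum(t in NEGATIVE_WORDS for t in tokens),
        "positive_emoticons": sum(message.count(e) for e in POSITIVE_EMOTICONS),
        "negative_emoticons": sum(message.count(e) for e in NEGATIVE_EMOTICONS),
        "likes": likes,
        "comments": comments,
    }


example = "Wk 3 of crap signal, still issues :("
features = vectorize(example, likes=4, comments=2)
```

Each message yields one such row; a table of rows with known sentiment labels would then serve as training data for the classifier.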

(27)

DYMATRIX Text Mining Process: Information Delivery

» Make information available in central Text Data Mart

» Visualization in DynaSocial

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

[Figure: DynaSocial view of the message with the columns Sentiment, Business Transaction, Product, Relevance and Network.]

Other Fields of Application

» Subject-oriented Email-Classification & Email-Routing

(28)
(29)
(30)

KNIME Server: Develop once, deploy everywhere!

Benefits

» Text Enrichment & Classification Workflows can be used for classification of any electronic text message (e.g. Social Content, Blogs, Emails).

» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a webservice and called easily from any other application.

» Uniform Sentiment- and Classification-Handling for all customer-related messages.

(31)

Application Integration I: DynaSocial

(32)

DynaSocial – Social Media Excellence Architecture

[Figure: architecture layers.]

» Data Sources: Facebook, Twitter, Emails, Social Service Platforms, Social Media Data Provider, Client individual Sources
» Social Media Analytics Content Extractor
» Social Media Analytics Data Management: Generic Big Data Model
» Sentiments & Classifications: Advanced Social Media Analytics (Text Mining & Network Mining) with Text Enrichment & Classification and Network Insights
» Reports & Dashboard: Social Media Analytics Dashboard
» Social Engagement: Integrated Social Inbox including all Social Touchpoints

(33)

DynaSocial Management Dashboard

» Activities
» Sentiment Ratio
» Key Influencer
» Platform Distribution
» Trends compared to competition (Share of Voice)
» Geographic Distribution
» Overall Sentiments
» Top Keywords
» Flexible Selection of Time Windows

(34)
(35)

Application Integration II: Advanced Email-Classification

(36)

Email Classification: MS Exchange Connector

Components: Microsoft Outlook, Microsoft Exchange, .NET Batch, Webservice, KNIME Server

1 Incoming Email.

2 Call .NET Procedure and transfer email contents to KNIME Server via Webservice Call.

3 Call KNIME Text Enrichment & Classification Workflows and return classification results.

4 Classification results are returned to Exchange Server and are saved persistently with object categories.

5 Any clients having access to Exchange Server get the same classification.
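The round trip in steps 2 to 4 can be sketched as follows: package the email contents into a webservice request, then turn the returned classification into category labels that could be stored on the message. The slides use a .NET procedure for this; Python is used here only as a stand-in. The endpoint URL, the JSON payload fields and the response format are all assumptions, since the slides do not specify the KNIME Server interface. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real KNIME Server URL is not given in the slides.
ENDPOINT = "https://knime.example.com/classify"


def build_request(subject, body):
    """Package the email contents as a JSON POST request (assumed payload schema)."""
    payload = json.dumps({"subject": subject, "body": body}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def categories_from_response(raw):
    """Convert an assumed JSON classification result into sorted category labels."""
    result = json.loads(raw)
    return ["{}: {}".format(key, value) for key, value in sorted(result.items())]


request = build_request("No signal", "Wk 3 of crap signal")
# Example response text, made up for illustration.
categories = categories_from_response('{"sentiment": "negative", "subject": "NETWORK"}')
```

Storing the labels centrally on the mail server, as step 4 describes, is what lets every client see the same classification without calling the service again.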

(37)

Livedemo: Realtime Email-Classification

(38)
(39)

Thank you for your attention.

We are happy to answer any of your questions!

DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart

Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88 - 12
Fax: +49.711.22.007.88 - 88
E-Mail: [email protected]
Web: www.dymatrix.de
