10 BEST PRACTICES FOR SOLUTION ARCHITECTURES THAT WOULD TAME BIG DATA!!!


Academic year: 2021


BIG DATA BEST PRACTICE-1

(4)

THE IDEA IN BRIEF

 What are the questions at the heart of the problem?

 Formulate the hypotheses/questions at the heart of the issue! Distill them into a clear set of hypotheses to be tested.

Remember: Hadoop and associated technology components are a means

 Isolate the $-denting analytical use case

(5)

REAL LIFE EXAMPLE: CURATING USE CASE IN TELECOM SECURITY INTELLIGENCE

Business Context

What new signals should we listen to in order to prevent adverse events from happening?

4 Data Pools

 Netsweeper logs
 RADIUS logs
 Switch CDR
 MMS logs

2 Use cases

 Watch list analysis
 Network link analysis

(6)

 Have an intensive half-day cross-functional workshop with the business to boil down the game-changing use case

 Is it a “nice to have” use case or a “$ impacting” use case?

 Who is the consumer of the use case?

 How does it help them optimize cost, reduce risk, or increase revenue?

(7)

BIG DATA BEST PRACTICE-2

IMPACT AHA MOMENT IN 60-90 DAYS.

(8)

THE IDEA IN BRIEF

DELIVER FIRST BIG DATA AHA MOMENT IN 60-90 DAYS

Skeletal MVP: end-to-end implementation that links all architectural components together

Could be the answer to a previously unanswered question

Propels momentum

(9)

A REAL LIFE EXAMPLE

 Industry = OTA

 Context: important to improve the look-to-book ratio

 Is there a correlation between the response time of a web page and the look-to-book ratio?

 Hadoop cluster + Infobright + Hive jobs ready in 3 weeks

 Scaled data and improved dashboard experience for another 3 weeks

 Business readout in 6 weeks
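The question in this example (does page response time move with the look-to-book ratio?) can be tested with a plain correlation coefficient once both measures are aggregated per day. The following is only a minimal sketch: the daily figures are invented for illustration, and the look-to-book ratio is assumed to mean searches per booking.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily aggregates; none of these numbers come from the case study.
avg_response_ms = [200, 400, 600, 800, 1000]
searches = [1000, 1000, 1000, 1000, 1000]
bookings = [50, 40, 25, 20, 10]

# Assumed definition: look-to-book ratio = searches per booking (lower is better).
look_to_book = [s / b for s, b in zip(searches, bookings)]

r = pearson(avg_response_ms, look_to_book)
print(f"correlation between response time and look-to-book: {r:.2f}")
```

A coefficient near +1 on real data would suggest that slower pages go with more looks per booking. It would not by itself show that the slowdown causes the lost bookings.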

(10)

THEREFORE

Break it into 3 chunks

 30 day milestones
 60 day milestones
 90 day milestones

 In 30 days plan to cover functional breadth
   Hadoop infrastructure + cluster
   Integrate disparate components: data pipeline, columnar database, machine learning process, Hadoop cluster
   Have a small file go from start to end through the process chain

 In 60 days plan to cover scalability
   Scale for at least 12 months of data
   Tableau / Pentaho

 In 90 days plan to cover bells and whistles
   Configurators
   Alerters
   Additional abstraction

(11)

BIG DATA BEST PRACTICE-3

ACTIONS NOT INSIGHTS

DATA

INSIGHTS

(12)

BEST PRACTICE-3: ACTIONS NOT INSIGHTS

 Actions are executed in the frontline
   Call centre
   Mobile
   Store channel
   Digital channel

 Actions could be
   Behaviour based discounts
   Help close a digital transaction
   Serve customized webpage
   Take proactive actions

 Insights are nice to know
 Actions impact $

(13)

THEREFORE

WHAT ACTIONS ARE DRIVEN AS A RESULT OF THESE INSIGHTS?

HOW ARE WE DISSEMINATING INSIGHTS TO FRONT LINE CHANNELS?

ASK “SO WHAT” 5 TIMES!!!

(14)

BIG DATA BEST PRACTICE-4 :

(15)

REAL LIFE EXAMPLE

Keyword frequency

 “Leaks”, “Leakage”
 “Noise”, “Sound”
 “Vibrations”

Noise / leakage frequency is a better predictor of repeat sales than any other indicator, including marketing spend!!!
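Counting how often the listed keywords occur in free text is the first step behind a finding like this. The sketch below is illustrative only: the keyword set is taken from the slide, while the sample notes and the function name `keyword_frequency` are hypothetical.

```python
import re
from collections import Counter

# Keywords from the slide, lower-cased for matching.
KEYWORDS = {"leaks", "leakage", "noise", "sound", "vibrations"}


def keyword_frequency(texts, keywords=KEYWORDS):
    """Count occurrences of each keyword across a collection of free-text notes."""
    counts = Counter()
    for text in texts:
        # Split on anything that is not a letter, after lower-casing.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in keywords:
                counts[token] += 1
    return counts


# Hypothetical service notes, invented for illustration.
sample_notes = [
    "Customer reports noise and vibrations from the unit",
    "Water leakage near the door; leaks again after repair",
    "No issues reported",
]

freq = keyword_frequency(sample_notes)
for word, count in sorted(freq.items()):
    print(word, count)
```

Exact-token matching like this misses variants such as "leaking" or "noisy". A real pipeline would probably add stemming before counting.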

(16)

A REAL LIFE EXAMPLE

XYZ Online Buzz analysis

Raw data
• www.yelp.com

Business Question
• How can we create a strategy to respond to what we are hearing about XYZ's buzz online?

Statistical Technique
• Text mining
• Visual data exploration
• Hypothesis testing
• Affinity analysis

Insights derived
• Sentiment trends: +/-
• Sentiment benchmark with McDonalds
• Top keywords for XYZ
• Top keywords for McDonalds
• Keyword affinities

Business Action
• Theme specific campaigns
• NPD process
• Instore experience
• Reverse impact of negative buzz

(17)

WHERE DO CUSTOMERS EXPRESS THEMSELVES?

Universe of XYZ sentiment data = 5 sources, 5556 posts, 3 years of data

Our phase-1 analysis = www.yelp.com, 136 posts, 2 years of data

 Yelp.com: 136 posts
 Epinions.com: 552 posts
 planetfeedback.com: 2854 posts
 Twitter.com: 1500 posts
 Facebook.com: 500 posts

(18)

SOURCE = TWITTER.COM

(19)

SOURCE = YELP.COM

(20)

SOURCE = FACEBOOK.COM

(21)

STEP BY STEP SENTIMENT TEXT MINING PROCESS

Input
• Blogs
• Customer review sites
• Online consumer forums
• Customer/vendor emails
• Unstructured data from applications

Process

Output
• Inferences
• Customers’ sentiments

(22)

OVERALL SENTIMENTS DASHBOARD

(23)

THEREFORE

R text mining algorithm

RHadoop

(24)

BIG DATA BEST PRACTICE-5 :

COLUMNAR & IN-MEMORY ARCHITECTURES TO SPEED UP CHAIN OF THOUGHT

Which devices are infected by a malicious attack?

(25)

HOW TO HANDLE “NEEDLE IN A HAYSTACK” WORKLOADS?

What happened on firewall-3 between 3:17 and 3:21 am?

How many payment gateway drops happened between 9:47 am and 9:52 am on 15-Nov-2012?

Data forensic queries supporting chain of thoughts
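A forensic query of this kind is a narrow filter on source, event type, and a short time window over a very large event set. The sketch below shows the shape of the payment gateway question on a handful of in-memory records. The record fields (`ts`, `source`, `kind`) and the sample events are assumptions made for illustration, not a schema from the deck.

```python
from datetime import datetime


def count_events(events, source, kind, start, end):
    """Count events from `source` of type `kind` with start <= timestamp <= end."""
    return sum(
        1
        for e in events
        if e["source"] == source and e["kind"] == kind and start <= e["ts"] <= end
    )


# Hypothetical event log, invented for illustration.
events = [
    {"ts": datetime(2012, 11, 15, 9, 46, 30), "source": "payment-gateway", "kind": "drop"},
    {"ts": datetime(2012, 11, 15, 9, 48, 5), "source": "payment-gateway", "kind": "drop"},
    {"ts": datetime(2012, 11, 15, 9, 50, 40), "source": "payment-gateway", "kind": "ok"},
    {"ts": datetime(2012, 11, 15, 9, 51, 59), "source": "payment-gateway", "kind": "drop"},
    {"ts": datetime(2012, 11, 15, 9, 49, 0), "source": "firewall-3", "kind": "drop"},
]

drops = count_events(
    events,
    source="payment-gateway",
    kind="drop",
    start=datetime(2012, 11, 15, 9, 47),
    end=datetime(2012, 11, 15, 9, 52),
)
print("payment gateway drops in window:", drops)
```

A linear scan like this is fine for five records. At billions of events the same predicate has to run against storage that can skip irrelevant data, which is the motivation for the columnar and in-memory options in this best practice.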

(26)

Id  Name     Designation         Tenure
S1  Prem     Founder             8
S2  Simon    Security Architect  5
S3  Bhavana  Sales Head          6
S4  Ram      CEO                 3
S5  Shyam    Developer           1

Row-oriented storage:
S1PremFounder8 S2SimonSecurityArchitect5 S3BhavanaSalesHead6 S4RamCEO3 S5ShyamDeveloper1

Column-oriented storage:
S1S2S3S4S5 PremSimonBhavanaRamShyam FounderSecurityArchitectSalesHeadCEODeveloper 85631
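The difference between the two layouts can be shown with the same staff table held first as rows and then as columns. This is a toy sketch under the assumption that the table fits in memory. A real columnar database adds compression and on-disk organisation that are not modelled here.

```python
# The staff table from the slide, one tuple per row (row-oriented layout).
rows = [
    ("S1", "Prem", "Founder", 8),
    ("S2", "Simon", "Security Architect", 5),
    ("S3", "Bhavana", "Sales Head", 6),
    ("S4", "Ram", "CEO", 3),
    ("S5", "Shyam", "Developer", 1),
]


def column_layout(rows):
    """Transpose row tuples into one list per column (column-oriented layout)."""
    return [list(col) for col in zip(*rows)]


columns = column_layout(rows)
ids, names, designations, tenure = columns

# An aggregate over one attribute touches only that column's values.
average_tenure = sum(tenure) / len(tenure)
print("tenure column:", tenure)
print("average tenure:", average_tenure)

# The same aggregate over the row layout has to walk every full record.
average_tenure_from_rows = sum(r[3] for r in rows) / len(rows)
```

Both computations give the same answer. The point of the column layout is that a query such as "average tenure" reads one contiguous list instead of every field of every record.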

(27)

Interactive or real-time querying of large datasets is key to analyst productivity because it supports chain of thought analysis.

Chain of thought analysis = exploring the data torrent by quickly running a series of iterative queries, each informed by the last.

Most solutions aren’t fast enough, and analytical effectiveness drops when the user's chain of thought is interrupted.

In-memory DB tools

 Dremel at Google
 Druid at Metamarkets
 Sting at Netflix
 Cloudera’s Impala
 UC Berkeley AMPLab’s Spark
 SAP HANA
 Platfora

(28)

THEREFORE

Examine columnar databases and in-memory databases to speed up important query workloads

Download evaluation versions of Actian and Infobright and do a

(29)

BEST PRACTICE-6: HOW TO PLAN FOR 100X SCALABILITY?

BIG DATA BEST PRACTICE-6: THINK 100X SCALABILITY!!!

(30)

REAL LIFE EXAMPLE

Industry = Telecom

Business context: National content filtering solution

Events generated per day: 1 billion events

New URLs classified per day: 1 million
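Turning the daily volumes into per-second rates makes the 100x target concrete. The arithmetic below is a back-of-the-envelope sketch that assumes load is spread evenly across the day. Real traffic peaks will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 1_000_000_000  # from the example: 1 billion events per day
new_urls_per_day = 1_000_000    # from the example: 1 million new URLs per day
scale_factor = 100              # the "think 100x" planning target


def per_second(per_day):
    """Average rate per second, assuming a uniform spread over 24 hours."""
    return per_day / SECONDS_PER_DAY


avg_events_per_sec = per_second(events_per_day)
avg_urls_per_sec = per_second(new_urls_per_day)
target_events_per_sec = avg_events_per_sec * scale_factor

print(f"events/sec today:     {avg_events_per_sec:,.0f}")
print(f"new URLs/sec today:   {avg_urls_per_sec:,.1f}")
print(f"events/sec at {scale_factor}x:    {target_events_per_sec:,.0f}")
```

Under these assumptions the ingest path averages roughly 11,600 events per second today and would need to sustain over a million per second at 100x.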

(31)

BIG DATA BEST PRACTICE-7: DETECT DATA PATTERNS IN REAL TIME!!!

The data torrent

 Price sensitive search
 Store search
 Ratings based ordering
 Comparator events
 Basket add events
 Payment gateway events

The Organisation

(32)

THE CONTEXT

Velocity is high

Decision-making window is short

(33)

REAL TIME EXAMPLE

Decision window = 8 mins

If a high value customer (decile = 1 on last 36 months revenue)
and intra book interval > threshold
and recency of search < 70
then route to call center channel
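The rule above translates directly into a small decision function that a streaming component could evaluate per customer event. This is a sketch only: the parameter names, the `"default"` fallback channel, and the units of the interval and recency values are assumptions, since the slide does not specify them.

```python
def route(revenue_decile, intra_book_interval, search_recency,
          interval_threshold, recency_limit=70):
    """Return the channel for a customer according to the slide's rule.

    revenue_decile:      1 means top decile on last 36 months revenue.
    intra_book_interval: time since the previous booking (unit assumed).
    search_recency:      time since the last search (unit assumed).
    interval_threshold:  business-defined threshold; not given on the slide.
    """
    high_value = revenue_decile == 1
    overdue = intra_book_interval > interval_threshold
    recently_searched = search_recency < recency_limit
    if high_value and overdue and recently_searched:
        return "call_center"
    return "default"  # hypothetical fallback; the slide names no other channel


# Hypothetical customers, invented for illustration.
print(route(1, intra_book_interval=120, search_recency=30, interval_threshold=90))
print(route(2, intra_book_interval=120, search_recency=30, interval_threshold=90))
```

With an 8 minute decision window, the rule itself is cheap. The hard part is having the decile, interval, and recency values available within that window.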

(34)

THEREFORE

Include S4 and other real time analytics in your big data reference architecture

(35)

BIG DATA BEST PRACTICE-8

(36)

THE BASICS

Captology = Persuasion through technology

DESIGN FOR BEHAVIOURAL CHANGE

Persuasion examples

 Users to change channel behaviour (move from desktop to mobile channel)

(37)

CAPTOLOGY IN ACTION

Captology in Insurance

Reduce rates each time a person reports his or her exercise behaviour to a group of peers online

(38)

THERE ARE TOO MANY GOOD PRODUCTS HIDDEN BEHIND BAD USER INTERFACES

PRODUCT = INTERFACE

(39)

BIG DATA BEST PRACTICE-9

(40)

BEST PRACTICE-9: INTERSECTIONS OF MOVING PARTS ARE THE WEAK LINKS

 Big data moving parts
   Columnar databases
   Hadoop clusters
   Advanced visualisation layer
   Real time components
   Data pipelines
   APIs and scrapers to syndicate info
   Bridge to existing DW

 The intersections can give way as data / user volumes increase

 A real life big data architecture
   Event loggers
   HBase/Cassandra for high velocity event absorption
   Sqoop/Flume for data ingestion
   Hadoop cluster for massive data crunching
   R for extracting patterns
   Columnar database for 10x faster retrieval
   Tableau for advanced visualisation
   S4 for real time analytics
   Channel integration components

Architecture diagram labels: Hadoop cluster, R predictor ranking, Infobright columnar DB

(41)

THEREFORE … WATCH THE FOLLOWING 4 WEAK LINKS

1. Link between operational event streams and Hadoop cluster

2. Link between Hadoop cluster and columnar database

3. Link between columnar database and the visualisation tool

4. Time it takes for the machine learning algorithm to run
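One way to watch these four links is to time each hand-off separately, so that the slowest one is visible as volumes grow. The sketch below is a generic timing harness, not part of any tool named in the deck. The link names and the placeholder workloads are hypothetical stand-ins for real transfer and training jobs.

```python
import time
from contextlib import contextmanager

timings = {}  # link name -> elapsed seconds for the most recent run


@contextmanager
def timed(link_name):
    """Record how long the wrapped block takes under `link_name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[link_name] = time.perf_counter() - start


def slowest_link(measured):
    """Name of the link with the largest elapsed time."""
    return max(measured, key=measured.get)


# Placeholder workloads standing in for the four links on the slide.
with timed("events_to_hadoop"):
    sum(range(10_000))
with timed("hadoop_to_columnar"):
    sum(range(20_000))
with timed("columnar_to_visualisation"):
    sum(range(5_000))
with timed("ml_algorithm_run"):
    sorted(range(50_000), reverse=True)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
print("slowest link:", slowest_link(timings))
```

Tracking these four numbers run over run, rather than only the end-to-end time, shows which intersection is starting to give way.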

(42)

HIGH VELOCITY DATA PIPELINE

(43)

BIG DATA BEST PRACTICE-10

(44)

BEST PRACTICE-10: 7 CORE MACHINE LEARNING BUILDING BLOCKS FOR ORCHESTRATING ANALYTICAL PROCESSES

Collaborative filtering

Apriori

Text mining

A/B testing

Clustering

Scoring models

Optimization

(45)

TO SUMMARIZE

(46)

THANK YOU!

QUESTIONS? COMMENTS? THOUGHTS?
