
Machine Learning for Understanding User Behaviours. Semi-Supervised Learning Applied to Click Streams


Academic year: 2021


(1)

Machine Learning for Understanding User Behaviours

(2)

Goals

Motivation for semi-supervised learning and log analytics

Overall Methodology

Leveraging Hadoop / Java to create datasets

(3)

Online Retail

Upselling / Cross-Selling

Personalized Offers

Newsletter Targeting

Competition Optimized Price

Optimized Display Campaigns

Funnel Optimization

(4)

Publishing & Media

Understand and monetize audience

Acquire and sell customer data

Personalize User Experience

(5)

Product Manager Questions

Different Customer Behaviours

Subscribers

Newcomers

How do behaviours evolve over time?

Do customers search for and find relevant content?

(6)
(7)

What kind of session?

Search for a specific Topic

Home Page Wanderer

Foreigner Discovering the Web Site

Fan that loves to rate and comment

Newcomer from Google News

(8)

Product Manager Available Data

Click Streams

Topics / Content Referential

Competitors / Outside Data

Customer Feedback

(9)

Source             Structured?  Number of events / pages  Number of users
Click Stream       +            100 M                     10 M
Content Hierarchy  ++           10 K                      -
Web / RSS          +-           100 M
User Ratings       +            ~ 100 K                   ~ 10 K
Server Logs        +-           100 M                     10 M

(10)

Methodology Overview

Stages: Raw Data, User Sessions, Manual Session Tagging, Define Session Model, Session Analytics

Steps: Hadoop Aggregation, Sampling / Clustering / Viz, Supervised Learning, Reports

Output: Detailed Session Stats

(11)

Add Signals to the Sessions

Input: Click Stream, built into Sessions

Signals added: + Channel, + Page View, + Customer history, + Generated Revenue, + Main topic, + Geo/ISP/Device

Data sources: Session History, Marketing Campaigns, Page Level Analytics, CRM, Articles Topics, GeoIP

Sessions To Analyze = ~ 50 dimensions

(12)

ClickStream

Click Stream

UserID    TimeStamp            Page  Referer  …
94390940  2013/11/11-12:32:43
43653729  2013/11/11-12:32:43
30418239  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
99666658  2013/11/11-12:32:43
43653729  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
99666658  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
43653729  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43
94390940  2013/11/11-12:32:43

(13)

Build Sessions

Click Stream

UserID    TimeStamp            Page  Referer  …  SessionID
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1
94390940  2013/11/11-12:32:43                    94390940-1

Sessions
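
The slide shows the result of session building (a SessionID of the form UserID-n) but not the rule used to split sessions. A minimal sketch, assuming a new session starts after 30 minutes of inactivity; the gap, the function name `build_sessions` and all timestamps except the first are made up for illustration:

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity closes a session (not stated on the slide).
SESSION_GAP = timedelta(minutes=30)
TS_FORMAT = "%Y/%m/%d-%H:%M:%S"  # timestamp layout used in the click stream table


def build_sessions(events, gap=SESSION_GAP):
    """Attach a SessionID '<UserID>-<n>' to each (user_id, timestamp) event."""
    parsed = sorted((user, datetime.strptime(ts, TS_FORMAT)) for user, ts in events)
    last_seen = {}  # user_id -> timestamp of that user's previous event
    counters = {}   # user_id -> index of that user's current session
    out = []
    for user, ts in parsed:
        # First event of a user, or a long pause: open a new session.
        if user not in last_seen or ts - last_seen[user] > gap:
            counters[user] = counters.get(user, 0) + 1
        last_seen[user] = ts
        out.append((user, ts.strftime(TS_FORMAT), f"{user}-{counters[user]}"))
    return out


# Illustrative events only.
clicks = [
    ("94390940", "2013/11/11-12:32:43"),
    ("94390940", "2013/11/11-12:40:10"),
    ("43653729", "2013/11/11-12:33:05"),
    ("94390940", "2013/11/11-14:05:00"),
]
sessions = build_sessions(clicks)
for row in sessions:
    print(*row)
```

The output is sorted by user and time, so the last event of user 94390940 (more than 30 minutes after the previous one) opens a second session.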

(14)

Add Signals to the Sessions

Click Stream

UserID    TimeStamp            Page  Referer  …  SessionID
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1
94390940  2013/11/11-12:32:43                    1

Sessions

(15)

Add Signals to the Sessions

Click Stream

UserID    TimeStamp            Page  User Agent       IP         …  SessionID  City   ISP       Device  OS   Local Hour
94390940  2013/11/11-12:32:43        Mozilla 4.1 (…)  193.4.1.3     1          PARIS  FREE SAS  Mobile  iOS  14

Sessions

GeoIP / ISP

Database

+ Geo/ISP/Device/Time

(16)

Channel Signal

Click Stream

UserID    TimeStamp            Page  User Agent       IP         …  SessionID  City   ISP       Device  OS   Marketing Source  Camp.
94390940  2013/11/11-12:32:43        Mozilla 4.1 (…)  193.4.1.3     1          PARIS  FREE SAS  Mobile  iOS  AdWords           ad-213

Further Marketing Source / Camp. rows: E-Mailing e-2013; Retarg. crteo-2; Display ..; None ..; SEO ..

Sessions

GeoIP / ISP

Database

+ Geo/ISP/Device

+ Channel

Marketing

Campaigns

(17)


(18)

Channel Signal

Click Stream

UserID    TimeStamp            Page  User Agent       IP         Marketing Source  Camp.   …  First Contact N days ago  Last Contact N days ago  N Contacts Last 30 Days
94390940  2013/11/11-12:32:43        Mozilla 4.1 (…)  193.4.1.3  AdWords           ad-213     231                       2                        19

Further Marketing Source / Camp. rows: E-Mailing e-2013; Retarg. crteo-2; Display ..; None ..; SEO ..

Sessions

GeoIP / ISP

Database

+ Geo/ISP/Device

+ Channel

Marketing

Campaigns

+ Customer history
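
The three customer history columns (first contact N days ago, last contact N days ago, N contacts in the last 30 days) can be derived from the dates of a user's earlier sessions. A minimal sketch; the function name `customer_history`, the dictionary keys and the contact dates are assumptions for illustration:

```python
from datetime import date, timedelta


def customer_history(past_contacts, today):
    """Summarise a user's earlier contact dates relative to `today`."""
    if not past_contacts:
        return {
            "first_contact_days_ago": None,
            "last_contact_days_ago": None,
            "n_contacts_last_30_days": 0,
        }
    window_start = today - timedelta(days=30)
    return {
        "first_contact_days_ago": (today - min(past_contacts)).days,
        "last_contact_days_ago": (today - max(past_contacts)).days,
        "n_contacts_last_30_days": sum(1 for d in past_contacts if d >= window_start),
    }


today = date(2013, 11, 11)
# Made-up contact dates: 231, 40 and 2 days before the current session.
contacts = [today - timedelta(days=n) for n in (231, 40, 2)]
features = customer_history(contacts, today)
print(features)
```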

(19)

Channel Signal

Click Stream

UserID    TimeStamp            Page  User Agent       IP         N Contacts Last 30 Days  …  N PageViews  N Page < 5 sec  …  N Search Page  …
94390940  2013/11/11-12:32:43        Mozilla 4.1 (…)  193.4.1.3  19

Sessions

+ Geo/ISP/Device

+ Channel

+ Customer history

+ PageView

Sessions To Analyze = ~ 50 dimensions

+ ….
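
The page view columns (N PageViews, N Page < 5 sec, N Search Page) are per-session counts. A minimal sketch, assuming time on a page is the gap to the next page view and that search pages can be recognised by a URL prefix; the prefix `/search`, the function name and the sample pages are hypothetical:

```python
from datetime import datetime

TS_FORMAT = "%Y/%m/%d-%H:%M:%S"


def page_view_features(page_views, short_view_seconds=5, search_prefix="/search"):
    """page_views: list of (timestamp, page) tuples belonging to one session."""
    views = sorted((datetime.strptime(ts, TS_FORMAT), page) for ts, page in page_views)
    n_short = 0
    # Time on a page = gap to the next page view; unknown for the last page.
    for (ts, _), (next_ts, _) in zip(views, views[1:]):
        if (next_ts - ts).total_seconds() < short_view_seconds:
            n_short += 1
    return {
        "n_page_views": len(views),
        "n_short_views": n_short,
        "n_search_pages": sum(1 for _, page in views if page.startswith(search_prefix)),
    }


# Made-up session.
session = [
    ("2013/11/11-12:32:43", "/home"),
    ("2013/11/11-12:32:46", "/search?q=hadoop"),
    ("2013/11/11-12:33:30", "/article/42"),
    ("2013/11/11-12:33:32", "/search?q=spark"),
]
stats = page_view_features(session)
print(stats)
```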

(20)

How to do these aggregations?

Hive Job

Pig Job

Spark

Hadoop Streaming + Python

Scalding
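
As one illustration of the "Hadoop Streaming + Python" option: a streaming reducer reads tab-separated lines, already sorted by key, from standard input and writes results to standard output. A minimal sketch of a reducer that counts page views per session; the record layout (session ID, tab, page) and the function names are assumptions:

```python
import io
from itertools import groupby


def parse(line):
    """Split a 'session_id<TAB>page' line into its two fields."""
    session_id, page = line.rstrip("\n").split("\t", 1)
    return session_id, page


def reduce_sessions(lines):
    """Yield 'session_id<TAB>n_page_views'; input must be sorted by session_id."""
    for session_id, group in groupby(map(parse, lines), key=lambda kv: kv[0]):
        yield f"{session_id}\t{sum(1 for _ in group)}"


def run(stream_in, stream_out):
    """In a streaming job this would be called as run(sys.stdin, sys.stdout)."""
    for row in reduce_sessions(stream_in):
        stream_out.write(row + "\n")


# Local dry run on made-up records instead of a real job.
fake_input = io.StringIO("94390940-1\t/home\n94390940-1\t/article/42\n94390940-2\t/home\n")
fake_output = io.StringIO()
run(fake_input, fake_output)
result = fake_output.getvalue()
print(result, end="")
```

Because the reducer only relies on sorted input, the same script can be tested locally by piping a sorted file through it.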

(21)


(22)
(23)

Clustering & Cluster Sampling
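
The slide does not name a clustering algorithm. A minimal sketch, assuming plain k-means on numeric session features followed by a random draw of a few sessions per cluster for manual tagging; the feature values and function names are made up, and real features would need scaling first:

```python
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on equal-length numeric tuples; returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def sample_per_cluster(assignment, n_per_cluster, seed=0):
    """Pick up to n_per_cluster session indices from every cluster."""
    rng = random.Random(seed)
    by_cluster = {}
    for idx, c in enumerate(assignment):
        by_cluster.setdefault(c, []).append(idx)
    return {c: rng.sample(idxs, min(n_per_cluster, len(idxs)))
            for c, idxs in by_cluster.items()}


# Made-up sessions described by (page views, revenue).
session_features = [(1, 0.1), (2, 0.2), (1, 0.0), (40, 5.0), (42, 4.5), (41, 5.5)]
clusters = kmeans(session_features, k=2)
to_tag = sample_per_cluster(clusters, n_per_cluster=2)
print(clusters, to_tag)
```

Sampling per cluster rather than uniformly gives rare behaviours a chance to appear among the sessions that get tagged by hand.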

(24)

Tag ~ 1000 Sessions

Dark Bot (From A Competitor?)

Foreigner Discovering the Web Site

Fan that loves to rate and comment

Search for a specific Topic

Newcomer from Google News

Home Page Wanderer

(25)


(26)

Supervised Learning

Mahout (Logistic Regression?)

Python Scikit

MLI + Spark (Logistic Regression?)
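
The slide lists logistic regression as a candidate model. To show the idea without depending on any of the listed libraries, here is a minimal sketch of a binary logistic regression trained by stochastic gradient descent on the manually tagged sessions and then applied to untagged ones. The two features, the "bot" label and all values are made up:

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability of the tagged class is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Made-up tagged sessions: (scaled pages per minute, share of views under 5 s).
tagged_X = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
tagged_y = [1, 1, 0, 0]  # 1 = tagged as bot, 0 = anything else
w, b = train_logistic(tagged_X, tagged_y)

untagged = [(0.85, 0.95), (0.15, 0.05)]
labels = [predict(w, b, x) for x in untagged]
print(labels)
```

With six session types, one such classifier per type (one-vs-rest) would be one way to extend this sketch.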

(27)


(28)

Reporting Part

Hive + Desktop Reporting Tool

Python (Pandas)
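
A minimal sketch of the kind of per-segment report this step produces, written with the standard library only. It assumes each classified session carries a revenue and an acquisition cost; the field names and figures are made up:

```python
from collections import defaultdict


def segment_report(sessions):
    """sessions: iterable of (segment, revenue, acquisition_cost), one per session."""
    totals = defaultdict(lambda: {"sessions": 0, "revenue": 0.0, "cost": 0.0})
    for segment, revenue, cost in sessions:
        t = totals[segment]
        t["sessions"] += 1
        t["revenue"] += revenue
        t["cost"] += cost
    return {
        seg: {
            "sessions": t["sessions"],
            "revenue_per_session": t["revenue"] / t["sessions"],
            "acquisition_cost_per_session": t["cost"] / t["sessions"],
        }
        for seg, t in totals.items()
    }


# Made-up classified sessions.
classified = [
    ("Home Page Wanderer", 0.5, 0.25),
    ("Home Page Wanderer", 0.0, 0.25),
    ("Dark Bot", 0.0, 0.0),
]
report = segment_report(classified)
for segment, row in report.items():
    print(segment, row)
```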

(29)

Sessions Statistics

Dark Bot (From A Competitor?)

Foreigner Discovering the Web Site

Fan that loves to rate and comment

Search for a specific Topic

Newcomer from Google News

Home Page Wanderer

0.3 per session, 0.23 acquisition costs

13k sessions

1.3 per session, 0.23 acquisition costs

938k sessions

938k sessions

0.3 per session, 0.23 acquisition costs

738k sessions

0.83 per session, 0.73 acquisition costs

68k sessions

0.3 per session, 1.23 acquisition costs

1k sessions

0 per session, 0 acquisition costs

(30)

Next: Follow Transitions

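
Following transitions means counting how often a user's next session falls into a different (or the same) session type. A minimal sketch, assuming each classified session is available as (user, session start, session type); the function name and the sample data are made up:

```python
from collections import Counter


def transition_counts(tagged_sessions):
    """Count (previous type, next type) pairs over each user's consecutive sessions."""
    by_user = {}
    for user, start, segment in tagged_sessions:
        by_user.setdefault(user, []).append((start, segment))
    counts = Counter()
    for history in by_user.values():
        # Zero-padded timestamps in this layout sort chronologically as strings.
        history.sort()
        for (_, prev), (_, nxt) in zip(history, history[1:]):
            counts[(prev, nxt)] += 1
    return counts


# Made-up classified sessions.
tagged = [
    ("94390940", "2013/11/11-12:32:43", "Newcomer from Google News"),
    ("94390940", "2013/11/12-09:10:00", "Home Page Wanderer"),
    ("94390940", "2013/11/14-20:45:12", "Fan that loves to rate and comment"),
    ("43653729", "2013/11/11-12:33:05", "Newcomer from Google News"),
    ("43653729", "2013/11/13-08:00:00", "Home Page Wanderer"),
]
transitions = transition_counts(tagged)
for pair, n in transitions.most_common():
    print(pair, n)
```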

(31)

Thank you!

Tweet #ParisJUG @fdouetteau

Semi-supervised learning for user sessions

We’re hiring
