
Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA

Challenges and Opportunities in Data Mining:

Big Data, Predictive User Modeling, and Personalization

Bamshad Mobasher

Center for Web Intelligence

School of Computing

DePaul University, Chicago, Illinois, USA

April 20, 2012

(2)

Google Trends: Data Mining vs. Analytics


(3)

The Big Question?

• Will data mining remain relevant? If so, how?

• Quick survey: Do you think the amount of data available in the digital world
  - will decrease in the future?
  - will become less complex?

Where is the Life we have lost in living?

Where is the wisdom we have lost in knowledge?

Where is the knowledge we have lost in information?

-- T.S. Eliot, “The Rock”

(4)

How much data?

• Google: ~20-30 PB a day

• Wayback Machine: ~4 PB + 100-200 TB/month

• Facebook: ~3 PB of user data + 25 TB/day

• eBay: ~7 PB of user data + 50 TB/day

• CERN’s Large Hadron Collider generates 15 PB a year

• In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB

640K ought to be enough for anybody.


(5)

The Data Tsunami

McKinsey Global Institute Report:

“Big Data: the next frontier for innovation, competition and productivity”

(6)

Big Data Value

McKinsey Global Institute Report:

“Big Data: the next frontier for innovation, competition and productivity”


(7)
(8)


(9)

What’s Seen the Most Growth in 2008-2011

Types of Data
• Location / Geo / Mobile Data
• Music / Audio
• Social Media / Social Networks
• Time Series
• Images / Video
• User Profile data
• Text feeds / Micro-blog data

Types of Activities/Areas
• Search / Web content mining
• Text mining / opinion analysis
• Personalization / recommendation
• Social network / Social media analysis
• Topic modeling / micro-blog analysis
• Health informatics

• Much of this growth is driven by end-user mobile or Web-based applications
  - users are inundated with a huge volume of complex information
  - need for more personalized, intelligent applications

(10)

Personalization

• The Problem
  - Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests

• Why do we need it?
  - Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, …)
  - For businesses: need to grow customer loyalty / increase sales
  - Industry research: successful online retailers are generating as much as 35% of their business from recommendations


(11)

Data Mining and Personalization

• A “killer app” for data mining?

• Tangible successes both in research and in industrial applications
  - recommender systems
  - personalized Web agents
  - user-adaptive systems
  - Web marketing and eCRM
  - personalized search

• Sophisticated modeling approaches based on both predictive and unsupervised DM techniques

(12)

Personalization

• Common Approaches
  - Collaborative Filtering
    - Give recommendations to a user based on preferences of “similar” users
  - Content-Based Filtering
    - Give recommendations to a user based on items with “similar” content in the user’s profile
  - Rule-Based (Knowledge-Based) Filtering
    - Provide recommendations to users based on predefined (or learned) rules
    - age(x, 25-35) and income(x, 70-100K) and children(x, >=3) → recommend(x, Minivan)
  - Combined or Hybrid Approaches
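The minivan rule above can be encoded as a predicate over a customer record. A minimal Python sketch, assuming a hypothetical `Customer` record, inclusive range bounds, and income measured in dollars; a real knowledge-based recommender would load or learn its rules rather than hard-code them:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical attributes mirroring age(x), income(x), children(x) in the rule
    age: int
    income: int  # assumed: annual income in dollars
    children: int


def minivan_rule(c: Customer) -> bool:
    """age(x, 25-35) and income(x, 70-100K) and children(x, >=3); bounds assumed inclusive."""
    return 25 <= c.age <= 35 and 70_000 <= c.income <= 100_000 and c.children >= 3


# Rule base: (condition, recommended item) pairs
RULES = [(minivan_rule, "Minivan")]


def recommend(c: Customer) -> list[str]:
    # Fire every rule whose condition holds and collect the recommended items
    return [item for condition, item in RULES if condition(c)]


print(recommend(Customer(age=30, income=85_000, children=3)))  # ['Minivan']
print(recommend(Customer(age=40, income=85_000, children=3)))  # []
```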


(13)

The Recommendation Task

• Basic formulation as a prediction problem
  - Given a profile $P_u$ for a user $u$, and a target item $i_t$, predict the preference score of user $u$ on item $i_t$

• Typically, the profile $P_u$ contains preference scores by $u$ on some other items, $\{i_1, \dots, i_k\}$, different from $i_t$
  - preference scores on $i_1, \dots, i_k$ may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)

(14)

The Recommendation Task

• Content-Based Recommendation
  - Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile

• Collaborative Recommendation
  - Predictions for unseen (target) items are computed based on the interest scores of other users with similar interests on the items in user u’s profile
    - i.e., users with similar tastes (aka “nearest neighbors”)
    - requires computing correlations between user u and other users according to interest scores or ratings
    - k-nearest-neighbor (kNN) strategy
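The collaborative (kNN) prediction described above can be sketched in Python as follows. This is a minimal, illustrative user-based variant: it assumes Pearson correlation over co-rated items as the similarity measure and a mean-centered, similarity-weighted average of the neighbors' ratings (one common formulation, not the only one). The names `ratings`, `pearson`, and `predict_knn` are hypothetical.

```python
import math


def pearson(ru: dict[str, float], rv: dict[str, float]) -> float:
    """Pearson correlation between two users over their co-rated items."""
    common = ru.keys() & rv.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((rv[i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def mean(r: dict[str, float]) -> float:
    return sum(r.values()) / len(r)


def predict_knn(ratings: dict[str, dict[str, float]], u: str, target: str, k: int = 2) -> float:
    """Predict u's score on target from the k most similar users who rated it."""
    ru = ratings[u]
    # Candidate neighbors: other users who rated the target, most similar first
    neighbors = sorted(
        ((pearson(ru, rv), v) for v, rv in ratings.items() if v != u and target in rv),
        reverse=True,
    )
    top = [(s, v) for s, v in neighbors if s > 0][:k]  # keep positive correlations only
    if not top:
        return mean(ru)  # fall back to u's own mean rating
    num = sum(s * (ratings[v][target] - mean(ratings[v])) for s, v in top)
    den = sum(s for s, _ in top)
    return mean(ru) + num / den


ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 4, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}
print(predict_knn(ratings, "alice", "i4"))  # about 4.5: bob is the only positive neighbor
```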


(15)

Content-Based Recommender Systems
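One simple way to compute a content-based prediction, sketched in Python: items are represented as sparse feature vectors (e.g., genres or keywords), and the score for a target item is the similarity-weighted average of the user's scores on profile items. Cosine similarity and the toy feature sets are assumptions for illustration; real systems often use TF-IDF term weights.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors (feature -> weight)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_content_based(
    profile: dict[str, float], items: dict[str, dict[str, float]], target: str
) -> float:
    """Similarity-weighted average of the user's scores on items in the profile."""
    num = den = 0.0
    for item, score in profile.items():
        sim = cosine(items[item], items[target])
        if sim > 0:
            num += sim * score
            den += sim
    return num / den if den else 0.0


# Hypothetical item features and user profile (item -> preference score)
items = {
    "m1": {"action": 1.0, "scifi": 1.0},
    "m2": {"romance": 1.0, "drama": 1.0},
    "m3": {"action": 1.0, "scifi": 1.0, "thriller": 1.0},
}
profile = {"m1": 5.0, "m2": 2.0}
# Only m1 shares content with m3, so the prediction is m1's score (up to float rounding)
print(predict_content_based(profile, items, "m3"))
```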

(16)

Content-Based Recommenders: Personalized Search

• How can the search engine determine the “user’s intent”?
  - Query: “Madonna and Child”

• Need to “learn” the user profile:
  - Is the user an art historian?
  - Is the user a pop music fan?
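A content-based profile can disambiguate the “Madonna and Child” query by re-ranking results. The Python sketch is illustrative only: it assumes each result carries a query-relevance score and a term vector, and blends the two with a hypothetical weight `alpha`; it does not describe how any particular search engine works.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(results, profile, alpha=0.5):
    """results: list of (doc_id, relevance, term_vector); returns doc_ids, best first."""
    scored = [
        ((1 - alpha) * rel + alpha * cosine(profile, terms), doc)
        for doc, rel, terms in results
    ]
    return [doc for _, doc in sorted(scored, reverse=True)]


# Hypothetical results for the query "Madonna and Child"
results = [
    ("pop-album", 0.9, {"madonna": 1.0, "pop": 1.0, "album": 1.0}),
    ("renaissance-painting", 0.8,
     {"madonna": 1.0, "child": 1.0, "painting": 1.0, "renaissance": 1.0}),
]
art_historian = {"painting": 1.0, "renaissance": 1.0, "fresco": 1.0}
pop_fan = {"pop": 1.0, "album": 1.0, "concert": 1.0}

print(rerank(results, art_historian))  # ['renaissance-painting', 'pop-album']
print(rerank(results, pop_fan))        # ['pop-album', 'renaissance-painting']
```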


(17)

Content-Based Recommenders: More Examples

• Music recommendations

• Playlist generation

(18)

Collaborative Recommender Systems


(19)

Collaborative Recommender Systems

(20)

Collaborative Recommender Systems


(21)

Personalization Based on User Behavior Data: Data Mining Approach

Typically an Offline Process

• Inputs: implicit or explicit user preference data (clickthroughs, ratings, purchases, reviews), content & structure, and domain knowledge

• Data Preparation / Modeling Phase
  - data cleaning, data integration, data preprocessing, data transformation
  - sessionization, event model generation
  - output: user transaction / preference data

• Pattern Discovery Phase (data mining)
  - user segmentation, item clustering / similarity
  - user/item classification, correlation analysis, association rule mining
  - output: patterns

• Pattern Analysis
  - pattern filtering, aggregation, characterization
  - output: aggregate user models
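Sessionization, one of the preprocessing steps listed above, splits each user's clickstream into sessions. A minimal Python sketch of a time-out heuristic, assuming records of the form `(user, timestamp, item)` and a 30-minute inactivity threshold (a commonly used value, but an assumption here):

```python
from datetime import datetime, timedelta


def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, item) records into sessions per user.

    A new session starts when the gap since the user's previous click exceeds `timeout`.
    """
    sessions = []   # list of (user, [items]) in order of creation
    last_seen = {}  # user -> timestamp of the previous click
    current = {}    # user -> item list of the user's open session
    for user, ts, item in sorted(clicks, key=lambda c: (c[0], c[1])):
        if user not in last_seen or ts - last_seen[user] > timeout:
            current[user] = []
            sessions.append((user, current[user]))
        current[user].append(item)
        last_seen[user] = ts
    return sessions


t0 = datetime(2012, 4, 20, 9, 0)
clicks = [
    ("u1", t0, "home"),
    ("u1", t0 + timedelta(minutes=5), "products"),
    ("u1", t0 + timedelta(minutes=50), "cart"),  # 45-minute gap: new session
    ("u2", t0, "home"),
]
print(sessionize(clicks))
# [('u1', ['home', 'products']), ('u1', ['cart']), ('u2', ['home'])]
```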

(22)

Personalization Based on User Behavior Data: Data Mining Approach

Online Process

• Inputs: the active session <user, item1, item2, …> from the Web server or client application, the stored user profile, the aggregate user models, and domain knowledge

• Recommendation engine: combines these inputs

• Output: integrated recommendations / predictions
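At run time, the recommendation engine has to compare the active session with the aggregate user models. A minimal Python sketch, assuming the aggregate models are weighted item sets (for example, centroids of session clusters discovered offline) and using cosine similarity between the active session and each profile; `match`, `recommend`, and the toy profiles are hypothetical:

```python
import math


def match(session: set[str], profile: dict[str, float]) -> float:
    """Cosine similarity between a binary session vector and a weighted profile."""
    dot = sum(w for item, w in profile.items() if item in session)
    norm = math.sqrt(len(session)) * math.sqrt(sum(w * w for w in profile.values()))
    return dot / norm if norm else 0.0


def recommend(session: set[str], profiles: list[dict[str, float]], n: int = 2) -> list[str]:
    """Score unseen items by (profile match) x (item weight); keep the best score per item."""
    scores: dict[str, float] = {}
    for profile in profiles:
        sim = match(session, profile)
        for item, w in profile.items():
            if item not in session:
                scores[item] = max(scores.get(item, 0.0), sim * w)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, s in ranked[:n] if s > 0]


# Hypothetical aggregate usage profiles (item -> weight)
profiles = [
    {"laptops": 1.0, "laptop-bags": 0.8, "mice": 0.6},
    {"novels": 1.0, "bookmarks": 0.5},
]
print(recommend({"laptops"}, profiles))  # ['laptop-bags', 'mice']
```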


(23)

New Challenges

• Context-Awareness
  - Can systems understand the user’s context, situation, and current intentions?
  - Need to understand the “task” being performed; the user’s environment; domain knowledge/characteristics; short-term and long-term preferences

• Integrating Domain Knowledge
  - Most current modeling approaches focus on the discovery of “shallow” patterns
  - DM + Domain Knowledge (DM + AI) → intelligent apps that can reason about / explain patterns

(24)

New Challenges

• Security / Trust / Reputation
  - Many user-adaptive systems are vulnerable to malicious manipulation (e.g., “shilling”)
  - Need more robust algorithms and ways to detect malicious profiles
  - In social systems, the notion of “reputation” becomes critical

• Serendipity
  - The most predictive models are not necessarily the best
  - Need the ability to “surprise” or provide novelty

• Big Data Challenges
  - Questions of scale require new frameworks and algorithms
  - Wide variation in user behaviors requires more sophisticated models (e.g., matrix factorization, hybrid / ensemble models)
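Matrix factorization learns a latent factor vector for each user and each item so that their dot product approximates the observed ratings. A minimal Python sketch trained with stochastic gradient descent and L2 regularization; the number of factors, learning rate, regularization weight, epoch count, and toy ratings are illustrative assumptions, and production systems typically add user/item bias terms and rely on optimized libraries.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_mf(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """ratings: list of (user, item, rating). Returns latent factors P (users), Q (items)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initialization breaks the symmetry between factors
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


ratings = [
    ("alice", "i1", 5), ("alice", "i2", 3), ("alice", "i3", 4),
    ("bob", "i1", 4), ("bob", "i2", 2), ("bob", "i4", 4),
    ("carol", "i1", 1), ("carol", "i3", 2), ("carol", "i4", 1),
    ("dave", "i2", 5), ("dave", "i3", 1), ("dave", "i4", 2),
]
P0, Q0 = train_mf(ratings, epochs=0)  # untrained factors, for comparison
P, Q = train_mf(ratings)
print(rmse(P0, Q0, ratings), rmse(P, Q, ratings))
print(predict(P, Q, "alice", "i4"))  # score for a pair alice has not rated
```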


(25)

Challenges: Problems of Scale

(26)

New Opportunities: Social Annotation Systems


(27)

Amazon Example: Tags Describe the Resource

• Tags can describe
  - the resource (genre, actors, etc.)
  - organizational information (toRead)
  - subjective opinions (awesome)
  - ownership (abc)

(28)

Tag Recommendation
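Tag recommendation can draw on both the tags other users have applied to a resource and a user's own tagging habits. A minimal Python sketch of a simple popularity-based baseline, assuming tag histories are available per resource and per user and blending them with a hypothetical weight `alpha`; this is a generic baseline, not necessarily the method pictured on this slide:

```python
from collections import Counter


def recommend_tags(resource_tags, user_tags, n=3, alpha=0.5):
    """Rank tags by a blend of their frequency on the resource and in the user's history."""
    res, usr = Counter(resource_tags), Counter(user_tags)
    res_total = sum(res.values()) or 1  # avoid division by zero for empty histories
    usr_total = sum(usr.values()) or 1
    scores = {
        t: alpha * res[t] / res_total + (1 - alpha) * usr[t] / usr_total
        for t in res.keys() | usr.keys()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Hypothetical histories: tags on one book by all users, and one user's past tags
resource_tags = ["scifi", "scifi", "scifi", "classic", "toRead"]
user_tags = ["toRead", "toRead", "awesome", "scifi"]
print(recommend_tags(resource_tags, user_tags))  # ['scifi', 'toRead', 'awesome']
```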


(29)

Example: Tags Describe the User

• These systems are “collaborative”
  - Recommendation / analytics based on the “wisdom of crowds”

Rai Aren's profile (co-author of “Secret of the Sands”)

(30)

New Opportunities: Social Recommendation

• A form of collaborative filtering using social network data
  - User profiles represented as sets of links to other nodes (users or items) in the network
  - Prediction problem: infer a currently non-existent link in the network
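Inferring a currently non-existent link is the classic link prediction problem. A minimal Python sketch, assuming an undirected graph whose nodes are users and items and scoring candidate links with the Adamic-Adar index (one standard neighborhood-based measure, used here for illustration); the node names are hypothetical:

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    g = defaultdict(set)
    for a, b in edges:
        g[a].add(b)
        g[b].add(a)
    return g


def adamic_adar(g, x, y):
    """Sum over common neighbors z of 1 / log(degree(z)); rare shared neighbors count more."""
    return sum(1 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)


def recommend_links(g, node, n=2):
    """Top-n nodes not yet linked to `node`, ranked by Adamic-Adar score."""
    known = g[node]
    candidates = [y for y in g if y != node and y not in known]
    scored = [(adamic_adar(g, node, y), y) for y in candidates]
    ranked = sorted((t for t in scored if t[0] > 0), key=lambda t: (-t[0], t[1]))
    return [y for _, y in ranked[:n]]


edges = [
    ("ann", "bob"), ("ann", "carl"), ("bob", "dina"),
    ("carl", "dina"), ("carl", "eve"),
    ("ann", "item:camera"), ("dina", "item:camera"),
]
g = build_graph(edges)
print(recommend_links(g, "ann"))  # ['dina', 'eve']
```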


(31)

Conclusions

• Personalization and Recommendation Technologies
  - The killer app for predictive data analytics
  - Will drive the next generation of Web applications

• Lots of new (and old) challenges
  - New: social media and social networks provide new challenges and opportunities; big data challenges the scalability and effectiveness of old algorithms
  - Old: scalability, sparsity, scrutability, serendipity
  - Promising new work:
    - new approaches to hybridization
    - social media analytics
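Hybridization can be as simple as a weighted combination of component recommenders. A minimal Python sketch of a weighted hybrid, assuming each component returns a predicted score or `None` when it cannot predict (e.g., a collaborative component facing a new item); the component functions and weights are hypothetical:

```python
from typing import Callable, Optional

Predictor = Callable[[str, str], Optional[float]]


def weighted_hybrid(
    predictors: list[Predictor], weights: list[float], user: str, item: str
) -> Optional[float]:
    """Weighted average of component predictions, renormalized over components that answered."""
    total = wsum = 0.0
    for predict, w in zip(predictors, weights):
        p = predict(user, item)
        if p is not None:
            total += w * p
            wsum += w
    return total / wsum if wsum else None


# Hypothetical stand-ins for real component recommenders
def collaborative(user: str, item: str) -> Optional[float]:
    return 4.0


def content_based(user: str, item: str) -> Optional[float]:
    return 3.0


def cold_start(user: str, item: str) -> Optional[float]:
    return None  # e.g., no ratings yet for this item


print(weighted_hybrid([collaborative, content_based], [0.7, 0.3], "alice", "m3"))  # about 3.7
print(weighted_hybrid([collaborative, cold_start], [0.7, 0.3], "alice", "m3"))     # about 4.0
```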
