Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA
Challenges and Opportunities in Data Mining:
Big Data, Predictive User Modeling, and
Personalization
Bamshad Mobasher
Center for Web Intelligence
School of Computing
DePaul University, Chicago, Illinois, USA
April 20, 2012
Google Trends: Data Mining vs. Analytics
The Big Question?
z Will data mining remain relevant? If so, how?
z Quick survey: Do you think the amount of data available in the digital world
| will decrease in the future?
| will become less complex?
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- T.S. Eliot, “The Rock”
How much data?
z Google: ~20-30 PB a day
z Wayback Machine has ~4 PB + 100-200 TB/month
z Facebook: ~3 PB of user data + 25 TB/day
z eBay: ~7 PB of user data + 50 TB/day
z CERN’s Large Hadron Collider generates 15 PB a year
z In 2010, enterprises stored 7 Exabytes = 7,000,000,000 GB
640K ought to be enough for anybody.
The Data Tsunami
McKinsey Global Institute Report:
“Big Data: the next frontier for innovation, competition and productivity”
Big Data Value
McKinsey Global Institute Report:
“Big Data: the next frontier for innovation, competition and productivity”
What’s Seen the Most Growth in 2008-2011
Types of Data
• Location / Geo / Mobile Data
• Music / Audio
• Social Media / Social Networks
• Time Series
• Images / Video
• User Profile data
• Text feeds / Micro-blog data
Types of Activities/Areas
• Search / Web content mining
• Text mining / opinion analysis
• Personalization / recommendation
• Social network / Social media analysis
• Topic modeling / micro-blog analysis
• Health informatics
z Much of this growth is driven by end user mobile or Web-based applications
| users are inundated with huge volume of complex information
| need for more personalized intelligent applications
Personalization
z The Problem
| Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
z Why do we need it?
| Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, ….)
| For businesses: need to grow customer loyalty / increase sales
| Industry Research: successful online retailers are generating as much as 35% of their business from recommendations
Data Mining and Personalization
z “Killer App” for data mining?
z Tangible successes both in research and in industrial applications
| recommender systems
| personalized Web agents
| user adaptive systems
| Web marketing and eCRM
| personalized search
z Sophisticated modeling approaches based on both predictive and unsupervised DM techniques
Personalization
z Common Approaches
| Collaborative Filtering
z Give recommendations to a user based on preferences of “similar” users
| Content-Based Filtering
z Give recommendations to a user based on items with “similar” content in the user’s profile
| Rule-Based (Knowledge-Based) Filtering
z Provide recommendations to users based on predefined (or learned) rules
z age(x, 25-35) and income(x, 70-100K) and children(x, >=3) → recommend(x, Minivan)
| Combined or Hybrid Approaches
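The rule-based approach above can be sketched as a predicate over a user record. This is a minimal illustration, not a description of any particular system: the `Rule` class, the field names (`age`, `income`, `children`), and the sample user are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

UserRecord = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: if condition(user) holds, recommend `item`."""
    condition: Callable[[UserRecord], bool]
    item: str


# Encodes the slide's example rule:
# age in 25-35, income in 70-100K, at least 3 children -> recommend Minivan
minivan_rule = Rule(
    condition=lambda u: 25 <= u["age"] <= 35
    and 70_000 <= u["income"] <= 100_000
    and u["children"] >= 3,
    item="Minivan",
)


def rule_based_recommend(user: UserRecord, rules: List[Rule]) -> List[str]:
    """Return the item of every rule whose condition the user satisfies."""
    return [rule.item for rule in rules if rule.condition(user)]


# Illustrative user record (made-up values).
sample_user = {"age": 31, "income": 85_000, "children": 3}
print(rule_based_recommend(sample_user, [minivan_rule]))
```

In a learned-rule setting the `condition` functions would be produced by a rule-mining step rather than written by hand; the matching logic stays the same.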
The Recommendation Task
z Basic formulation as a prediction problem
Given a profile P_u for a user u, and a target item i_t, predict the preference score of user u on item i_t
z Typically, the profile P_u contains preference scores by u on some other items, {i_1, …, i_k} different from i_t
| preference scores on i_1, …, i_k may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)
The Recommendation Task
z Content-Based Recommendation
| Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile
z Collaborative Recommendation
| Predictions for unseen (target) items are computed based on other users with similar interest scores on items in user u’s profile
z i.e. users with similar tastes (aka “nearest neighbors”)
z requires computing correlations between user u and other users according to interest scores or ratings
z k-nearest-neighbor (knn) strategy
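The k-nearest-neighbor strategy for collaborative recommendation can be sketched as follows. This is one common variant (Pearson correlation over co-rated items, then a similarity-weighted average of mean-centered neighbor ratings), chosen here for illustration. The function names and the toy ratings are assumptions, not taken from the slides.

```python
from math import sqrt
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over the items both users have rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def mean_rating(profile: Dict[str, float]) -> float:
    return sum(profile.values()) / len(profile)


def predict_rating(ratings: Ratings, user: str, target: str, k: int = 2) -> float:
    """Predict `user`'s score on `target` from the k most similar users who rated it."""
    profile = ratings[user]
    candidates: List[Tuple[float, str]] = sorted(
        (
            (pearson(profile, other), name)
            for name, other in ratings.items()
            if name != user and target in other
        ),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbors.
    neighbors = [(sim, name) for sim, name in candidates if sim > 0]
    if not neighbors:
        return mean_rating(profile)  # fall back to the user's own average
    offset = sum(
        sim * (ratings[name][target] - mean_rating(ratings[name]))
        for sim, name in neighbors
    )
    return mean_rating(profile) + offset / sum(sim for sim, _ in neighbors)


# Toy data (made up): bob agrees with alice, carol disagrees.
toy_ratings: Ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}
print(predict_rating(toy_ratings, "alice", "D"))
```

In the toy data, bob is alice's only positively correlated neighbor, so the prediction is alice's mean rating (4.0) shifted by bob's offset on item D relative to his own mean (0.75).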
Content-Based Recommender Systems
Content-Based Recommenders:
Personalized Search
z How can the search engine determine the “user’s intent”?
Query: “Madonna and Child”
z Need to “learn” the user profile:
| User is an art historian?
| User is a pop music fan?
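One way to make the “Madonna and Child” example concrete is to re-rank results by their content similarity to a learned user profile. The sketch below uses raw term counts and cosine similarity, which is a deliberate simplification (no TF-IDF weighting, stemming, or stop-word removal). The function names, the result snippets, and the two profiles are made up for illustration.

```python
from collections import Counter
from math import sqrt
from typing import List


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(profile_docs: List[str], results: List[str]) -> List[str]:
    """Order search results by similarity to the documents in the user's profile."""
    profile: Counter = Counter()
    for doc in profile_docs:
        profile.update(term_vector(doc))
    return sorted(results, key=lambda r: cosine(profile, term_vector(r)), reverse=True)


# Hypothetical results for the query "Madonna and Child".
results = [
    "madonna pop album tour dates and child photos",
    "madonna and child renaissance painting by raphael",
]
# Hypothetical profiles built from content each user viewed earlier.
art_historian = ["renaissance painting italian art", "raphael fresco painting museum"]
pop_fan = ["pop album charts", "concert tour dates pop music"]

print(rerank(art_historian, results)[0])
print(rerank(pop_fan, results)[0])
```

The same query yields a different top result for each profile, which is the point of learning the profile rather than relying on the query text alone.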
Content-Based Recommenders:: more examples
z Music recommendations
z Play list generation
Collaborative Recommender Systems
Personalization Based on User Behavior Data: Data Mining Approach
Typically an Offline Process
[Figure: offline pipeline]
z Input: implicit or explicit user preference data (clickthroughs, ratings, purchases, reviews), content & structure, domain knowledge
z Data Preparation / Modeling Phase: data cleaning, data integration, data transformation, sessionization, event model generation; produces user transaction / preference data
z Pattern Discovery Phase (data mining): user segmentation, item clustering / similarity, user/item classification, correlation analysis, association rule mining
z Pattern Analysis: pattern filtering, aggregation, characterization; produces aggregate user models
Personalization Based on User Behavior Data: Data Mining Approach
Online Process
[Figure: online pipeline]
z The Recommendation Engine combines the Aggregate User Models, Domain Knowledge, the Stored User Profile, and the active session's User Profile (<user, item1, item2, …>)
z Integrated recommendations and predictions are returned through the Web Server to the Client Application (Active Session)
New Challenges
z Context-Awareness
| Can systems understand user’s context, situation, current intentions?
| Need to understand “task” being performed; user’s environment, domain knowledge/characteristics; short-term and long-term preferences
z Integrating Domain Knowledge
| Most current modeling approaches focus on the discovery of “shallow” patterns
| DM + Domain Knowledge (DM + AI) → intelligent apps that can reason about / explain patterns
New Challenges
z Security / Trust / Reputation
| Many user adaptive systems are vulnerable to malicious manipulation (e.g., “shilling”)
| Need more robust algorithms and ways to detect malicious profiles
| In social systems the notion of “reputation” becomes critical
z Serendipity
| The most predictive models are not necessarily the best
| Need the ability to “surprise” or provide novelty
z Big Data Challenges
| Questions of scale require new frameworks and algorithms
| Wide variation in user behaviors requires more sophisticated models (e.g., matrix factorization, hybrid / ensemble models)
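Matrix factorization, named above as one of the more sophisticated models, can be sketched in a few lines. The version below learns user and item factor vectors by stochastic gradient descent on the observed ratings only. The function names, hyperparameters (`n_factors`, `lr`, `reg`, `epochs`), and the tiny ratings list are assumptions for illustration; a large-scale system would use a distributed or vectorized implementation.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, observed score)
Matrix = List[List[float]]


def predict_score(P: Matrix, Q: Matrix, u: int, i: int) -> float:
    """Predicted score: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings: List[Rating], P: Matrix, Q: Matrix) -> float:
    """Root-mean-square error over the observed ratings."""
    total = sum((r - predict_score(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def factorize(
    ratings: List[Rating],
    n_users: int,
    n_items: int,
    n_factors: int = 2,
    lr: float = 0.01,
    reg: float = 0.02,
    epochs: int = 500,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    """Learn factor matrices P (users) and Q (items) with regularized SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict_score(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Made-up observations: 3 users, 3 items, 6 known ratings.
observed: List[Rating] = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
    (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0),
]
P0, Q0 = factorize(observed, 3, 3, epochs=0)  # initial random factors
P, Q = factorize(observed, 3, 3)              # trained factors
print("RMSE before:", rmse(observed, P0, Q0))
print("RMSE after: ", rmse(observed, P, Q))
```

Once trained, `predict_score(P, Q, u, i)` also gives a score for user/item pairs that were never observed, which is what makes the model usable for recommendation.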
Challenges:: Problems of Scale
New Opportunities:: Social Annotation Systems
Amazon Example: Tags describe the Resource
• Tags can describe
• The resource (genre, actors, etc.)
• Organizational (toRead)
• Subjective (awesome)
• Ownership (abc)
Tag Recommendation
Example: Tags describe the user
z These systems are “collaborative.”
| Recommendation / Analytics based on the “wisdom of crowds.”
Rai Aren's profile: co-author, “Secret of the Sands”
New Opportunities:: Social Recommendation
z A form of collaborative filtering using social network data
| User profiles are represented as sets of links to other nodes (users or items) in the network
| Prediction problem: infer a currently non-existent link in the network
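The link-prediction formulation can be illustrated with a simple common-neighbors score: a node that is not yet linked to the user is ranked by how many of the user's existing neighbors link to it. This is only one of many link-prediction heuristics, picked here for brevity. The graph contents and the function names are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of linked nodes (users or items)


def add_link(graph: Graph, a: str, b: str) -> None:
    """Add an undirected link between two nodes."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def predict_links(graph: Graph, node: str, top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank currently unlinked nodes by their number of common neighbors with `node`."""
    scores: Dict[str, int] = {}
    for neighbor in graph[node]:
        for candidate in graph[neighbor]:
            if candidate != node and candidate not in graph[node]:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up network mixing user-user and user-item links.
network: Graph = {}
for a, b in [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "item:jazz"),
    ("cat", "item:jazz"),
    ("bob", "dan"),
]:
    add_link(network, a, b)

print(predict_links(network, "ann"))
```

Here "item:jazz" is reachable through two of ann's neighbors and "dan" through one, so the item link is ranked first.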
Conclusions
z Personalization and Recommendation Technologies
| The killer app for predictive data analytics
| Will drive the next generation of Web applications
z Lots of new (and old) challenges
| New: Social media and social networks provide new challenges and opportunities; big data challenges scalability and effectiveness of old algorithms
| Old: scalability, sparsity, scrutability, serendipity
| Promising new work:
z