Music Recommendation

(1)

Music Recommendation

Listen to the music you like

Information Retrieval, 1st semester 2011/2012, Ricardo Dias, no. 55444

(2)

Bibliography

• Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play. Òscar Celma, Springer, 2010, Ch. 1-3, 5

• Recommender Systems. Prem Melville, Vikas Sindhwani, Encyclopedia of Machine Learning, 2010

• Handbook of Multimedia for Digital Entertainment and Arts

(3)

MOTIVATION & CONTEXT

(11)

Music Consumption Change

Time

(12)

Music Recommendation

Digital Era - Portability

Up to ~20 tracks

Up to ~40,000 tracks

(13)

Music Recommendation

Digital Era – Online Services

(14)

Music Recommendation

Digital Era – Online Services

Amazon

(15)

Music Recommendation Problem

• Overwhelming number of choices of which music to listen to

– Users feel:

• Paralyzed

• Doubtful

• Need to provide personalized filters and recommendations to ease users’ decisions

(16)

Music Recommendation

Before Digital Era

• Cannot rely only on recommendations from:

– Radios

– Friends

– Local record dealers

– DJs and music experts

– Etc.

(17)

Music Recommendation

Digital Era

(18)

Music Characteristics

• Different from other types of media

– Tracking users’ preferences can be implicit

– Items can be consumed several times (even repeatedly and continuously)

– Instant feedback

– Music consumption depends on context (morning, work, afternoon, etc.)

(19)

Music Recommendation Specificities

• Current music recommendation algorithms try to accurately predict what people will want to listen to

– They make accurate predictions about what a user could listen to, or buy, next, independently of how useful the provided recommendations are to the user

(20)

THE RECOMMENDATION PROBLEM

(21)

Formalization

• Recommendation Problem

• Prediction problem – estimation of the items’ likeliness for a given user

• Recommend a list of N items – assuming that the system can predict likeliness for yet unrated items

(22)

Prediction Problem

• $U = \{u_1, u_2, \ldots, u_m\}$: the set of users

• $I = \{i_1, i_2, \ldots, i_n\}$: the set of items that can be recommended

• $I_{u_j}$: the list of items for which user $u_j$ has expressed interest

• Function $P_{u_a, i_j}$: the predicted likeliness of item $i_j$ for the active user $u_a$, where $i_j \notin I_{u_a}$

(23)

Recommendation Problem

• Find a list of N items $I_r \subset I$ that the user will like the most

• The ones with the highest $P_{u_a, i_j}$

• The resulting list should not contain items from the user’s interests, i.e., $I_r \cap I_{u_a} = \emptyset$
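The selection step above can be sketched in a few lines of Python. This is only an illustration: the function name `recommend_top_n` and the toy scores are hypothetical, and the predicted values are assumed to be already available from some prediction method.

```python
def recommend_top_n(predicted, known_items, n):
    """Return the n items with the highest predicted likeliness,
    skipping items the user has already expressed interest in."""
    candidates = {item: score for item, score in predicted.items()
                  if item not in known_items}
    ranked = sorted(candidates, key=lambda item: candidates[item], reverse=True)
    return ranked[:n]

# Hypothetical predicted values P(u_a, i_j) for one active user.
predicted = {"song_a": 0.91, "song_b": 0.40, "song_c": 0.75, "song_d": 0.88}
known_items = {"song_a"}  # I_{u_a}: items already in the user's interests
top2 = recommend_top_n(predicted, known_items, n=2)
print(top2)
```

Filtering before sorting keeps the returned list disjoint from the user's own items.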

(24)

Use Cases

• Common usages of a recommender system:

1. Find good items

2. Find all good items

3. Recommend sequence (e.g. playlist generation)

4. Just browsing

5. Find credible recommender

6. Express self

(25)

General Model

• Users and Items

• Two types of recommendations:

• Top-N predicted items

• Top-N predicted

(26)

User Profile Generation

• Two key elements:

• Generation and Maintenance

• Exploitation of the profile using a recommendation system

(27)

User Profile Creation

• Empty Profile – The simplest, but…

• Manual – Direct feedback to the system, but…

• Data import – Create the profile from an external representation

• Training set – Provide feedback on concrete items, marking them as relevant or irrelevant to the user’s interests, but…

• Stereotyping – Assign a user to a cluster of similar users that are represented by their

(28)

User Profile Maintenance

• Explicit Feedback

• Ratings (Problems?)

• Comments and Opinions

• Implicit Feedback

• Monitoring user’s actions (e.g., tracking play, pause, skip and stop buttons in the media player, etc.)

• Problem?

(29)

User Profile Adaptation

• Adapt the system to users’ profile changes:

• Manually

• Adding new information while keeping the old

• Gradually forgetting old interests and promoting

(30)

Recommendation Methods

• Standard classification of recommender systems:

1. Demographic Filtering

2. Collaborative Filtering

3. Content-based Filtering

4. Context-based Filtering

5. Hybrid Approaches

(31)

Demographic Filtering

• Used to identify the kind of users that like a certain item

• Classifies user profiles in clusters based on:

• Personal data (age, gender, marital status, etc.)

• Geographic data (city, country)

(32)

Advantages/Limitations

• The simplest recommendation method

• But…

• Recommendations are too general

• Requires effort from the user to generate the profile

(33)

Collaborative Filtering

• Predict user preferences for items by learning past user-item relationships

• CF methods work by building a matrix M, with n items and m users, that contains the interactions (e.g. ratings, plays, etc.) of the users with the items.

(34)

Collaborative Filtering

• The value $M_{u_a, i_j}$ represents the “rating” of the user $u_a$ for the item $i_j$

(35)

Collaborative Filtering Approaches

• Item-Based Neighborhood

• User-Based Neighborhood

• Matrix Factorization

(36)

Item-Based Neighborhood

• Only users that rated both items, $i_j$ and $i_k$, are taken into account when computing the similarity between the two items

(38)

Item-Based Neighborhood

1. Compute the similarity between two items, i and j

• Example: adjusted cosine similarity

2. Predict, for the target user u, a value for the active item i

• $S^k(i; u)$: the set of k neighbors of item i that the user u has rated
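The two steps can be sketched as follows. The formulas used here are the textbook adjusted cosine similarity and a similarity-weighted average over the k most similar items the user has rated; the slide's own formulas are not available, so this is an assumed standard form. The function names, the toy ratings, and the fallback to the user's mean rating are all illustrative choices.

```python
from math import sqrt

def user_means(ratings):
    """Average rating of each user over the items they rated."""
    return {u: sum(r.values()) / len(r) for u, r in ratings.items()}

def adjusted_cosine(ratings, means, i, j):
    """Similarity of items i and j, using only users who rated both
    and subtracting each user's mean rating."""
    num = den_i = den_j = 0.0
    for u, r in ratings.items():
        if i in r and j in r:
            di, dj = r[i] - means[u], r[j] - means[u]
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))

def predict_item_based(ratings, means, u, i, k):
    """Weighted average of u's ratings on the k items most similar to i."""
    sims = [(adjusted_cosine(ratings, means, i, j), j)
            for j in ratings[u] if j != i]
    # S^k(i; u): keep only positively similar items, best k first.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return means[u]  # assumption: fall back to the user's mean rating
    return sum(s * ratings[u][j] for s, j in neighbors) / norm

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 4, "b": 5, "c": 2},
    "cat": {"a": 1, "b": 2, "c": 5},
    "dan": {"b": 5, "c": 1},
}
means = user_means(ratings)
pred = predict_item_based(ratings, means, "dan", "a", k=2)
```

In this toy data item "a" is positively similar to "b" and negatively similar to "c", so the prediction for "dan" is driven by his rating of "b".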

(39)

User-Based Neighborhood

• Compute the predicted rating value of item i for the active user u, taking into account those users that are similar to u

• $\bar{r}_u$: average rating for user u
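A minimal sketch of a user-based prediction, assuming the common mean-centered form: the active user's average rating plus the similarity-weighted deviations of the k nearest neighbors from their own averages. Pearson correlation is used as the user similarity here, which is an assumption; the function names and data are hypothetical.

```python
from math import sqrt

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ru, rv):
    """Pearson correlation of two users over their co-rated items."""
    common = set(ru) & set(rv)
    if len(common) < 2:
        return 0.0
    mu, mv = mean(ru[i] for i in common), mean(rv[i] for i in common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in common)
               * sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict_user_based(ratings, u, i, k):
    """r_bar_u plus the weighted rating deviations of u's k nearest neighbors."""
    r_bar_u = mean(ratings[u].values())
    sims = [(pearson(ratings[u], rv), v)
            for v, rv in ratings.items() if v != u and i in rv]
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return r_bar_u  # no usable neighbors: fall back to the user's mean
    offset = sum(s * (ratings[v][i] - mean(ratings[v].values()))
                 for s, v in neighbors)
    return r_bar_u + offset / norm

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 4, "b": 5, "c": 2},
    "cat": {"a": 1, "b": 2, "c": 5},
    "dan": {"b": 5, "c": 1},
}
pred = predict_user_based(ratings, "dan", "a", k=2)
```

Subtracting each neighbor's own average compensates for users who rate systematically high or low.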

(40)

Matrix Factorization

• Useful when the user-item matrix M is sparse

• Reduce the dimensionality of the original matrix, generating matrices U and V that approximate the original one

• Example: SVD – Singular Value Decomposition

• Computes matrices $U$ ($n \times k$) and $V$ ($m \times k$), for a given number k, such that:

(41)

Matrix Factorization

• After matrix reduction we can calculate the

predicted rating value for item i for a user u
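As a rough illustration, a truncated SVD can be computed with NumPy and the rank-k reconstruction read as predicted values. Several simplifications are assumed here: rows are users and columns are items, missing ratings are stored as 0 (real systems usually handle missing entries more carefully), and the toy matrix is made up.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); 0 marks "no rating".
M = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Rank-k approximation of the original matrix.
M_hat = U_k @ S_k @ Vt_k

def predict(u, i):
    """Predicted value for user index u and item index i."""
    return float(M_hat[u, i])

print(np.round(M_hat, 2))
```

With k = 2 the reconstruction keeps the two dominant taste groups in the toy data and smooths out the rest.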

(42)

Limitations

• Data sparsity and high dimensionality

• Gray sheep problem*

• Cold-start problem (early-rater problem)

• Does not take into account items’ descriptions

• Popularity Bias

(43)

Content-based Filtering

• Uses information describing the items

• Process of characterizing item data set can be:

• Manual (annotations by domain experts)

• Automatic (extracting features by analyzing the content)

(44)

Content-based Filtering

• Similarity Functions

1. Euclidean

2. Manhattan

3. Chebyshev

4. Mahalanobis
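The four functions listed above are distances between item feature vectors (a smaller distance means more similar items). A short NumPy sketch, with made-up feature vectors and covariance matrices:

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    return float(np.sum(np.abs(x - y)))

def chebyshev(x, y):
    return float(np.max(np.abs(x - y)))

def mahalanobis(x, y, cov):
    """Distance that rescales the difference by the feature covariance."""
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Hypothetical feature vectors describing two items.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
identity = np.eye(3)               # with the identity, Mahalanobis equals Euclidean
scaled = np.diag([9.0, 16.0, 1.0])  # assumed per-feature variances
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean one; with other covariances, high-variance features count for less.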

(45)

Limitations

• Cold-start problem (only to user)

• Gray-sheep problem

• Novelty (?)

• Limitation of extracted automatic features

• Subjective information (personal opinions) is not taken into account

(46)

Context-based Filtering

• Uses context information to describe and

characterize the items

• Context Information – any information that can be used to characterize a situation or an entity

• Context != Content

• Two main techniques:

• Web mining

• Social Tagging

(47)

Web Mining

• 3 different web mining categories:

• Web content mining

• text, hypertext, markup, and multimedia mining

• Web structure mining

• focuses on link analysis (in- and out- links)

• Web usage mining

• uses the information available in session logs. This information can be used to derive user habits and preferences, link prediction, or item similarity based on co-occurrences in the session log

(48)

Social Tagging

• Aims at annotating web content using tags

• Tags are freely chosen keywords, not

constrained to a predefined vocabulary

• Recommender systems can derive social

(49)

Social Tagging

• When users tag items, we get tuples of:

<user, item, tag>

• These triples form a 3-order tensor

(50)

Social Tagging

• Two approaches to compute item (and user)

similarity:

1. Unfold the 3-order tensor into three two-dimensional matrices (user-tag, item-tag and user-item matrices)

(51)

Unfolding the 3-order tensor

• User-Tag (U matrix): $U_{i,j}$ contains the number of times user i applied the tag j

• Item-Tag (I matrix): $I_{i,j}$ contains the number of times an item i has been tagged with tag j

• User-Item (R binary matrix): $R_{i,j}$ denotes whether user i has tagged item j

(52)

Unfolding the 3-order tensor

• Item similarity (using I) or user similarity (using U or I) can be computed using:

• Cosine-based distance

• Dimensionality reduction techniques (SVD, NMF)

• Then recommendations can be made by using:

• R user-item matrix or,
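A small sketch of the unfolding, using sparse dictionaries instead of dense matrices. The triples are invented, and the binary user-item relation is taken here to mean "the user tagged the item at least once", which is an assumption about R. Item similarity is then the cosine of two rows of the item-tag matrix.

```python
from collections import Counter
from math import sqrt

# Hypothetical <user, item, tag> annotations.
triples = [
    ("ann", "song_a", "rock"), ("ann", "song_b", "rock"),
    ("bob", "song_a", "rock"), ("bob", "song_a", "guitar"),
    ("bob", "song_c", "jazz"), ("cat", "song_c", "jazz"),
    ("cat", "song_b", "guitar"),
]

# Unfold the 3-order data into three two-dimensional (sparse) matrices.
user_tag = Counter((u, t) for u, _, t in triples)   # U: times user applied tag
item_tag = Counter((i, t) for _, i, t in triples)   # I: times item got tag
user_item = {(u, i) for u, i, _ in triples}         # R: binary user-item relation

def row(matrix, key):
    """Extract one row of a sparse matrix as {column: count}."""
    return {col: n for (r, col), n in matrix.items() if r == key}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sim_ab = cosine(row(item_tag, "song_a"), row(item_tag, "song_b"))
sim_ac = cosine(row(item_tag, "song_a"), row(item_tag, "song_c"))
```

Items that share tags ("song_a" and "song_b") end up close, while items with disjoint tags ("song_a" and "song_c") have zero similarity.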

(53)

Using the 3-order tensor

• The available techniques are (high-order)

extensions of SVD and NMF

• HOSVD is a higher order generalization of matrix SVD for tensors,

• Non-negative Tensor Factorization (NTF) is a generalization of NMF

(54)

Limitations

• Coverage

• Problems with tags:

• Polysemy

• Synonymy

• Usefulness of personal tags

• Sparsity

(55)

Hybrid Approaches

• Goal

• Achieve better recommendations by combining some of the previous approaches

• Methods:

• Weighted

• Switching

• Mixed

(56)

Factors Affecting Recommendation

• Novelty and Serendipity

• Explainability (transparency)

• Cold Start Problem

• Data Sparsity and High Dimensionality

• Coverage

(57)

Factors Affecting Recommendation

• Trust

• Attacks

• Temporal Effects

(58)

MUSIC RECOMMENDATION

(59)

Use Cases

• Main task of a music recommendation

system:

• Propose interesting music, consisting of a mix of known and unknown artists, as well as the

(60)

Use Cases

• Artist Recommendation

• Playlist Generation

• Shuffle, Random Playlists

• Personalized Playlists

(62)

User Profile Representation

• Extend user profile with music related information

• Has not been largely investigated

• Useful to:

• Improve music recommendation

• Share with others your preferences

(63)

Type of Listeners

• Each type of listener needs a different type of recommendations

(64)

User Profile Representation

Proposals

• Most relevant proposals are:

• User modeling for Information Retrieval (UMIRL)

• MPEG-7 standard

(65)

User Modeling for Information

Retrieval

• Allows one to describe perceptual and

(66)

MPEG-7 User Preferences

• User preferences in MPEG-7 includes:

• Content filtering

• Searching and browsing preferences

(67)

FOAF: User Profiling in the

Semantic Web

• Provides conventions and a language “to tell”

a machine the type of things a user says about

herself in her homepage

(68)

Item Profile Representation

• Music items:

• Artists

• Songs

(70)

Music Information Plane

• Music knowledge management categories:

• Editorial Metadata

• Cultural Metadata

• Acoustic Metadata

(74)

Music Description Facets

• Low-level Timbre Descriptors

• Spectral Centroid/Flatness/Skewness, MFCCs, etc.

• Instrumentation

• Rhythm

• Harmony

• Structure

• Intensity

• Genre

• Mood

(75)

Recommendation Methods

(examples and specificities)

• Collaborative Filtering (CF)

• Explicit/Implicit Feedback

• Content-Based Filtering

• Context-Based Filtering

• Hybrid Methods

(76)

Collaborative Filtering

• CF makes use of the editorial and cultural information

• Explicit feedback – based on ratings about songs / artists

(77)

CF with Explicit Feedback

• Examples:

• Ringo – 1st music recommender based on CF and explicit feedback

• Racofi – based on CF and a set of logic rules based on Horn clauses

• Indiscover

(78)

CF with Implicit Feedback

• Main Drawbacks:

• The value that a user assigns to an item is not always in a predefined range (e.g. from 1 to 5, or like it/hate it)

• Cannot gather negative feedback

• Recommendations are usually performed at artist level, but listening habits are recorded at song level, so aggregation is needed
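One possible way to deal with the first and third drawbacks is sketched below: song-level play counts are summed per artist, then mapped onto a 1-5 scale relative to each user's most played artist. The log format, the function names, and the scaling rule are assumptions chosen for illustration, not a method taken from the slides.

```python
from collections import defaultdict
from math import ceil

# Hypothetical song-level listening log: (user, artist, song, plays).
log = [
    ("ann", "Artist X", "song_1", 30), ("ann", "Artist X", "song_2", 10),
    ("ann", "Artist Y", "song_3", 4),  ("ann", "Artist Z", "song_4", 1),
    ("bob", "Artist Y", "song_3", 2),  ("bob", "Artist Z", "song_4", 2),
]

def aggregate_by_artist(log):
    """Sum song-level play counts into per-user, per-artist totals."""
    plays = defaultdict(lambda: defaultdict(int))
    for user, artist, _song, count in log:
        plays[user][artist] += count
    return plays

def to_scale(artist_plays, levels=5):
    """Map one user's play counts to 1..levels, relative to their top artist."""
    top = max(artist_plays.values())
    return {a: ceil(levels * c / top) for a, c in artist_plays.items()}

artist_plays = aggregate_by_artist(log)
ann_scores = to_scale(artist_plays["ann"])
bob_scores = to_scale(artist_plays["bob"])
```

Scaling per user keeps heavy and light listeners comparable, but it still cannot express negative feedback: a low score only means "played less".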

(79)

Content-Based Filtering

• Uses content extracted from music to provide

recommendations

• Compute similarity among songs, in order to recommend music to the user

• Two ways to describe audio content:

• Manually

(80)

Manual Audio Content Description

• Very time consuming

• Scalability problems

• But…

• Annotations can be more accurate than automatic ones

• Example: Pandora

• Analysts annotate 400 parameters per song, using a ten-point scale per attribute

(81)

Automatic Audio Content

Description

• Early work on audio similarity is based on low-level descriptors, such as Mel Frequency Cepstral Coefficients (MFCCs)

• Foote proposed a music indexing system based on MFCC histograms

• Audio features are usually aggregated using their mean and variance, or modeled as a Gaussian Mixture Model (GMM)
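The mean-and-variance aggregation can be illustrated with NumPy. The per-frame matrices below are random stand-ins for MFCCs (no audio is analyzed), and 13 coefficients per frame is just an assumed, commonly used size.

```python
import numpy as np

def summarize(frames):
    """Collapse a (n_frames, n_coeffs) matrix into one song-level vector
    made of the per-coefficient means followed by the variances."""
    return np.concatenate([frames.mean(axis=0), frames.var(axis=0)])

rng = np.random.default_rng(0)
# Stand-ins for the per-frame MFCCs of two songs (random data, not real audio).
song_a = rng.normal(0.0, 1.0, size=(200, 13))
song_b = rng.normal(0.5, 2.0, size=(300, 13))

vec_a, vec_b = summarize(song_a), summarize(song_b)
# Songs of different lengths now have vectors of the same size,
# so a plain distance can be computed between them.
distance = float(np.linalg.norm(vec_a - vec_b))
```

The summary discards the temporal order of the frames, which is one reason richer models such as GMMs are also used.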

(82)

Automatic Audio Content

Description

• Analyses the audio signal and automatically extracts a set of features:

• Tzanetakis extracted a set of features representing the spectrum, rhythm and harmony (chord structure); these were then merged into a single vector and used to determine song similarity

• Cataltepe et al. presented a music recommendation system based on audio similarity, where the user’s listening history is taken into account

(83)

Context-Based Filtering Techniques

• Uses cultural information to compute artist or song similarity

• Mainly based on web mining techniques, or mining data from collaborative tagging

(84)

Context-Based Filtering Techniques

• Example:

• M3 (Music for My Mood) uses context (season, month, day of the week, weather, temperature) and Case-based Reasoning to recommend music

(85)

Hybrid Methods

• Allow a system to minimize the issues that a single method can have

• How the cascade approach works:

• One technique is applied first, obtaining a ranked list of items. Then, a second technique refines or re-ranks the results obtained in the first step
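The cascade idea can be sketched as two sorting passes. The scores below are invented, and the choice of collaborative filtering as the first technique and content-based scores as the second is only an example.

```python
def cascade(first_scores, second_scores, shortlist_size, n):
    """Rank with the first technique, then re-rank its shortlist with the second."""
    shortlist = sorted(first_scores, key=first_scores.get,
                       reverse=True)[:shortlist_size]
    reranked = sorted(shortlist,
                      key=lambda item: second_scores.get(item, 0.0),
                      reverse=True)
    return reranked[:n]

# Hypothetical scores from two techniques for the same candidate items.
cf_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
content_scores = {"a": 0.2, "b": 0.6, "c": 0.9, "d": 1.0}
result = cascade(cf_scores, content_scores, shortlist_size=3, n=2)
print(result)
```

Item "d" has the best content score but never reaches the second step, because the first technique already filtered it out.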

(86)

Hybrid Method

• Example:

• Tiemann et al. investigate ensemble learning methods for hybrid music recommender algorithms. Their approach combines social and content-based methods, where each one produces a weak learner. Then, using a combination rule, it unifies the output of the weak learners.

(87)

EVALUATION

(88)

Evaluation

• Three different strategies

• System-centric

• Network-centric

• User-centric

(89)

System-centric Evaluation

• Evaluation measures how accurately the system can predict the actual values that users have

(90)

System-centric Evaluation

• Most approaches are based on the leave-n-out method

• Similar to the classic n-fold cross-validation

• Dataset divided into two (usually disjoint) sets:

• Training and Test

• Accuracy evaluation based only on a user’s dataset
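A leave-n-out split for a single user might look like the sketch below: n of the user's ratings are hidden as the test set and the remainder is kept for training. The function name, the fixed seed, and the toy ratings are assumptions for the example.

```python
import random

def leave_n_out(user_ratings, n, seed=0):
    """Hide n of one user's ratings as the test set; keep the rest for training."""
    items = sorted(user_ratings)
    held_out = set(random.Random(seed).sample(items, n))
    train = {i: r for i, r in user_ratings.items() if i not in held_out}
    test = {i: r for i, r in user_ratings.items() if i in held_out}
    return train, test

# Hypothetical ratings of one user.
user_ratings = {"a": 5, "b": 3, "c": 4, "d": 1, "e": 2}
train, test = leave_n_out(user_ratings, n=2)
```

The recommender is then trained on `train`, and its predictions for the hidden items are compared with the values in `test`.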

(91)

System-centric Evaluation

• Metrics:

• Predictive accuracy

• Mean Absolute Error, Root Mean Square Error

• Decision based

• Mean Average Precision, Recall, F-measure, Accuracy, ROC

• Rank based

• Spearman’s ρ, Kendall’s τ, Half-life Utility, Discounted Cumulative Gain
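A few of these metrics are simple enough to sketch directly. The functions below use the usual definitions of MAE, RMSE, precision and recall; the sample ratings and item lists are made up.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error; penalizes large errors more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def precision_recall(recommended, relevant):
    """Fraction of recommended items that are relevant, and of relevant items found."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

# Hypothetical held-out ratings and the system's predictions for them.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

MAE and RMSE need rating predictions, while precision and recall only need the recommended list and the set of items the user actually found relevant.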

(92)

System-centric Evaluation

Limitations

• Cannot evaluate recommendations concerning:

1. Coverage

2. Novelty

3. Transparency (explainability)

4. Trustworthiness (confidence)

5. Perceived Quality

(93)

Network-centric Evaluation

• Evaluation aims at measuring the topology of

the item (or user) similarity network

(94)

Network-centric Evaluation

• The similarity network is the basis to provide

the recommendations

• Important to analyze and understand the

underlying topology of the similarity network

• Measures:

• Coverage

(95)

Network-centric Evaluation

• In terms of:

• Navigation

• Average Shortest Path, Giant Component

• Connectivity

• Degree Distribution, Degree-Degree Correlation, Mixing Patterns

• Clustering

• Local/Global Clustering Coefficient

• Centrality
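Some of the navigation and clustering measures can be computed with plain breadth-first search. The sketch below works on a small, invented, undirected item-similarity network stored as adjacency sets; a real evaluation would more likely rely on a graph library.

```python
from collections import deque

# Hypothetical undirected item-similarity network (adjacency sets).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"}, "e": {"f"}, "f": {"e"},
}

def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def giant_component(graph):
    """Largest set of mutually reachable nodes."""
    seen, best = set(), set()
    for node in graph:
        if node not in seen:
            comp = set(bfs_distances(graph, node))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best

def average_shortest_path(graph, nodes):
    """Mean shortest-path length over all connected pairs starting in nodes."""
    total = pairs = 0
    for s in nodes:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def local_clustering(graph, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbs = list(graph[node])
    k = len(nbs)
    if k < 2:
        return 0.0
    links = sum(1 for x in range(k) for y in range(x + 1, k)
                if nbs[y] in graph[nbs[x]])
    return 2 * links / (k * (k - 1))

giant = giant_component(graph)
avg_path = average_shortest_path(graph, giant)
```

Items outside the giant component ("e" and "f" here) can never be reached by navigating from the main part of the network.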

(96)

Network-centric Evaluation

• Limitations:

• Accuracy of the recommendations cannot be measured

• Transparency (explainability) and trustworthiness (confidence) of the recommendations cannot be measured

• The perceived quality (i.e. usefulness and effectiveness) of the recommendations cannot be measured

(97)

User-centric Evaluation

• Evaluation focuses on the user’s perceived quality and usefulness of the recommendations

(98)

User-centric Evaluation

• Copes with the limitations of both:

• System- and Network-centric approaches

• Evaluates:

• Novelty

(99)

User-centric Evaluation

• Gathering Feedback (Explicit, Implicit)

• Perceived Quality • Novelty

(100)

Perceived Quality

• Easiest way to measure?

• Explicitly ask the users

• User needs information about:

• Item (e.g. metadata, preview, etc.)

• Reasons why the item was recommended

• Then can rate the quality of each recommended item (or the whole list)

(101)

Novelty

• Ask users if they recognize the predicted items

or not

• Combining novelty and perceived quality we

can infer if:

• User likes to receive and discover unknown items

• Prefers more conservative and familiar

(102)

A/B Testing

• Present two different versions of an algorithm

(or two algorithms)

• Evaluate which one performs the best

• Performance measured by the impact the

new algorithm has on the visitors’ behavior,

(103)

User-centric Evaluation

• Limitations:

• Need of user intervention in the evaluation process

• Gathering feedback from the user can be tedious for some users

(104)

Evaluation summary

• Combining the three methods we can cover

(105)

Evaluation summary

• System-centric

• Evaluates performance accuracy of the algorithm

• Network-centric

• Analyses the structure of the similarity network

• User-centric

• Measure satisfaction about recommendations they receive

(106)

DATASETS FOR EVALUATION

(107)

Last.fm Dataset – 1K users

• Contains <user, timestamp, artist, song> tuples

• Represents the listening habits of ~1,000 users

• Collected from the Last.fm API

• User.getRecentTracks()

• Statistics:

• ~108,000 artists with MusicBrainz ID

• ~70,000 artists without MusicBrainz ID

(108)

Last.fm Dataset – 360K users

• Contains <user, artist, plays> tuples from 360,000 users

• Collected from the Last.fm API

• User.getTopArtists()

• Statistics:

• ~190,000 artists with MusicBrainz ID

(109)

The Million Song Dataset

• One Million Songs!!!

• 280 GB of data

• ~45,000 unique artists

• ~8,000 unique terms

• > 2 million asymmetric similarity relationships

• Acoustic features

• Pitch, Timbre, Loudness, etc.

• Links to other sources to obtain more information

(110)

NEXTONE PLAYER

NEXTONE PLAYER: A Music Recommendation System Based on User Behavior

Yajie Hu and Mitsunori Ogihara

(111)

Session-based CF for Music Recommendation

Session-based Collaborative Filtering for Predicting the Next Song

Sung Eun Park, Sangkeun Lee, Sang-goo Lee
