Neighbourhood Methods - Recommender systems with Bayesian aspect models and the effect of appro

The first collaborative filtering recommendation systems were based on heuristic methods that do not specify a model. Instead, recommendations are generated using all the observations in the database, which is why these methods are commonly known as memory-based methods. The intuition behind them is simple: when recommending items to a user, find similar users and recommend items that those similar users have rated highly. The approach is intuitive and achieves reasonably good results for forced prediction, which has kept it popular even against more recent approaches. Neighbourhood methods generate recommendations via rating prediction.

The simplest neighbourhood technique generates a predicted rating $\hat{r}_{u,i}$ for user $u$ and item $i$ by first selecting a neighbourhood of users $V$ and then predicting the rating as the mean rating given by the users in the neighbourhood:

$$\hat{r}_{u,i} = \frac{1}{|V|} \sum_{v=1}^{V} r_{v,i} \tag{3.8}$$

The neighbourhood $V$ can be as simple as all users that have rated item $i$, in which case the neighbourhood method simply recommends based on item means. A better approach is to use a similarity measure between users to build the neighbourhood.
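A minimal Python sketch of equation (3.8), with the neighbourhood taken to be every user who has rated the item. The nested-dictionary layout and the names `predict_mean` and `ratings` are illustrative assumptions, not part of the source.

```python
def predict_mean(ratings, item):
    """Predict a rating for `item` as the mean over the neighbourhood V.

    `ratings` maps user id -> {item id: rating} (assumed layout).
    V is every user who has rated `item`, so this is the item mean.
    """
    neighbourhood = [r[item] for r in ratings.values() if item in r]
    if not neighbourhood:
        return None  # nobody has rated the item, so no prediction is possible
    return sum(neighbourhood) / len(neighbourhood)


ratings = {
    "ann": {"film_a": 4, "film_b": 2},
    "bob": {"film_a": 5},
    "cat": {"film_b": 3},
}
print(predict_mean(ratings, "film_a"))  # 4.5
```

With this choice of neighbourhood the prediction is the same for every target user, which is why a similarity-based neighbourhood is usually preferred.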

The similarity measure is a function, denoted $\mathrm{sim}(r_u, r_v)$, that takes two user rating vectors and returns a value in the range $[-1, 1]$, where $-1$ indicates that the users have exactly opposite histories and $+1$ indicates that their histories are exactly the same.

We can use the similarity measure to predict the rating as a weighted average over the users in $V$, where each item rating is weighted by the similarity between the selected user and the user from the neighbourhood. This way the ratings of more similar users contribute more than those of dissimilar users.

$$\hat{r}_{u,i} = \frac{1}{\sum_{v=1}^{V} \mathrm{sim}(r_u, r_v)} \sum_{v=1}^{V} \mathrm{sim}(r_u, r_v)\, r_{v,i} \tag{3.9}$$
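The weighted average in equation (3.9) can be sketched as follows. The function name `predict_weighted` and the list-of-pairs input are assumptions made for illustration; the similarities are taken as already computed.

```python
def predict_weighted(neighbours):
    """Similarity-weighted average of neighbour ratings, as in (3.9).

    `neighbours` is a list of (similarity, rating) pairs, one per user in V.
    """
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        return None  # weights cancel out, so the average is undefined
    return sum(sim * rating for sim, rating in neighbours) / total_sim


# The more similar neighbour (0.75) pulls the prediction towards its rating.
neighbours = [(0.75, 5.0), (0.25, 1.0)]
print(predict_weighted(neighbours))  # 4.0, versus an unweighted mean of 3.0
```

Because similarities can be negative, the denominator can be zero or negative; the sketch only guards against the zero case, and a practical system would need to decide how to treat negatively correlated neighbours.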

The neighbourhood can also be filtered down further from all the users that have rated item $i$ by keeping only the $K$ users most similar to the selected user $u$ in $U'$; these top-$K$ neighbourhoods let one control the size of the neighbourhood that is used.
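Top-K selection can be sketched with the standard library's `heapq.nlargest`. The name `top_k_neighbours` and the dictionary of precomputed similarities are hypothetical; the candidates are assumed to be users who have already rated the target item.

```python
import heapq


def top_k_neighbours(similarities, k):
    """Keep the k candidates with the highest similarity to the target user.

    `similarities` maps candidate user id -> similarity with the target user.
    Returns a list of (user id, similarity) pairs, most similar first.
    """
    return heapq.nlargest(k, similarities.items(), key=lambda pair: pair[1])


sims = {"bob": 0.8, "cat": -0.2, "dan": 0.5, "eve": 0.1}
print(top_k_neighbours(sims, 2))  # [('bob', 0.8), ('dan', 0.5)]
```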

3.3.1 Normalisation

An obstacle to this approach is that not all users use the rating scale in the same way. To remedy this we can normalise the ratings to keep the preference-to-rating mapping consistent between users (Ekstrand et al., 2011). For a user represented as a rating vector $r_u$ we can normalise around the user mean $\mu_u$ by subtracting the mean from each rating, so that the user feedback is represented as deviation from the mean:

$$r^{\text{new}}_{u,i} = r_{u,i} - \mu_u. \tag{3.10}$$

To further normalise the ratings to account for the differences in how users spread their ratings over the rating scale, the user's ratings can be centred around the user mean and then scaled relative to the user's standard deviation $\sigma_u$:

$$r^{\text{new}}_{u,i} = \frac{r_{u,i} - \mu_u}{\sigma_u}. \tag{3.11}$$

When predicting ratings, the predicted ratings must be mapped back to the original rating scale by inverting the normalisation that was applied.
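A small sketch of the normalisation in equation (3.11) and its inverse. The source does not say whether the population or sample standard deviation is meant, so the use of `statistics.pstdev` is an assumption, as are the function names and the fallback for users whose ratings are all identical.

```python
from statistics import mean, pstdev


def z_normalise(user_ratings):
    """Return (normalised ratings, mu, sigma) for one user's ratings."""
    values = list(user_ratings.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        sigma = 1.0  # assumption: constant raters are only mean-centred
    normalised = {i: (r - mu) / sigma for i, r in user_ratings.items()}
    return normalised, mu, sigma


def denormalise(value, mu, sigma):
    """Map a value on the normalised scale back to the user's rating scale."""
    return value * sigma + mu


normalised, mu, sigma = z_normalise({"film_a": 5, "film_b": 3, "film_c": 1})
restored = denormalise(normalised["film_a"], mu, sigma)
print(mu, round(restored, 6))
```

A prediction computed from normalised neighbour ratings would be passed through `denormalise` with the target user's own mean and standard deviation.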

3.3.2 Similarity Measures

The two most common and effective similarity measures are cosine similarity and Pearson correlation.

Cosine similarity is a vector similarity measure given by the cosine of the angle $\theta$ between two vectors:

$$\mathrm{sim}(r_u, r_v) = \cos(\theta) = \frac{\sum_{i=1}^{I} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i=1}^{I} r_{u,i}^2}\, \sqrt{\sum_{i=1}^{I} r_{v,i}^2}} \tag{3.12}$$
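Equation (3.12) can be sketched in plain Python as below. The function name, the dense-list representation, and the convention of returning 0 for an all-zero vector are assumptions for illustration only.

```python
import math


def cosine_similarity(r_u, r_v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(r_u, r_v))
    norm_u = math.sqrt(sum(a * a for a in r_u))
    norm_v = math.sqrt(sum(b * b for b in r_v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector has no defined angle
    return dot / (norm_u * norm_v)


print(cosine_similarity([3, 4], [3, 4]))  # 1.0, identical histories
print(cosine_similarity([1, 0], [0, 1]))  # 0.0, no items in common
```

Note that with raw, non-negative ratings the result lies in [0, 1]; negative values only appear once the ratings have been mean-centred.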

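The Pearson correlation coefficient is a linear correlation measure computed over the items rated by both users, $I' = I_u \cap I_v$:

$$\mathrm{sim}(r_u, r_v) = \rho_{r_u, r_v} = \frac{\sum_{i \in I'} (r_{u,i} - \mu_u)(r_{v,i} - \mu_v)}{\sqrt{\sum_{i \in I'} (r_{u,i} - \mu_u)^2}\, \sqrt{\sum_{i \in I'} (r_{v,i} - \mu_v)^2}}. \tag{3.13}$$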

Here $\mu_u$ is the feedback average for user $u$ over $I_u \cap I_v$.
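A sketch of Pearson correlation restricted to the co-rated items $I_u \cap I_v$, with the means taken over that intersection as described above. The dictionary representation, the function name, and the choice to return 0 when the correlation is undefined are assumptions, not part of the source.

```python
import math


def pearson_similarity(r_u, r_v):
    """Pearson correlation over the items rated by both users.

    `r_u` and `r_v` map item id -> rating (assumed layout).
    """
    common = sorted(set(r_u) & set(r_v))
    if len(common) < 2:
        return 0.0  # assumption: undefined with fewer than two shared items
    mu_u = sum(r_u[i] for i in common) / len(common)
    mu_v = sum(r_v[i] for i in common) / len(common)
    num = sum((r_u[i] - mu_u) * (r_v[i] - mu_v) for i in common)
    den_u = math.sqrt(sum((r_u[i] - mu_u) ** 2 for i in common))
    den_v = math.sqrt(sum((r_v[i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: a user with constant ratings has no spread
    return num / (den_u * den_v)


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}  # item "d" is ignored: u has not rated it
print(pearson_similarity(u, v))  # close to 1: same ordering, different spread
```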

Both of these similarity measures jump to the extremes of their range when user histories are small. This causes users with small histories to seem very similar to other users even though their histories overlap on only a few items. As a result, neighbourhood models are generally unsuitable for cold start scenarios: either specific strategies for new users are needed or a hybrid system is required (Lam et al., 2008).
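To see why the correlation saturates, consider a worked case (an illustration, not taken from the source) in which users $u$ and $v$ share exactly two items, with ratings $(a, b)$ and $(c, d)$ respectively, where $a \neq b$ and $c \neq d$:

```latex
\begin{align*}
\mu_u &= \tfrac{a+b}{2}, \qquad \mu_v = \tfrac{c+d}{2}, \\
\sum_{i \in I_u \cap I_v} (r_{u,i}-\mu_u)(r_{v,i}-\mu_v)
  &= 2 \cdot \tfrac{a-b}{2} \cdot \tfrac{c-d}{2}
   = \tfrac{(a-b)(c-d)}{2}, \\
\sqrt{\sum_{i} (r_{u,i}-\mu_u)^2}\,\sqrt{\sum_{i} (r_{v,i}-\mu_v)^2}
  &= \tfrac{|a-b|}{\sqrt{2}} \cdot \tfrac{|c-d|}{\sqrt{2}}
   = \tfrac{|a-b|\,|c-d|}{2}, \\
\rho_{r_u, r_v} &= \frac{(a-b)(c-d)}{|a-b|\,|c-d|} = \pm 1 .
\end{align*}
```

With only two co-rated items the Pearson correlation is therefore always exactly $+1$ or $-1$, regardless of how large the rating differences are.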

3.3.3 Usage and Drawbacks

Neighbourhood models are conceptually simple and intuitive, and are commonly recommended as a starting point for recommender systems. However, they are less accurate than matrix factorisation for rating prediction (Koren et al., 2009) and perform very poorly for top-N item prediction (Cremonesi et al., 2010).

The recommendations generated are explainable as ‘users similar to you enjoyed:’, but because no explicit model is learned, the system cannot gain any insight into user behaviour or the relationship between items. Neighbourhood models are also difficult to scale, and much of the desired simplicity of the approach is lost with complex extensions to enable scaling. Despite these drawbacks there are approaches that successfully employ neighbourhood techniques. Amazon.com, for example, used what they called item-to-item collaborative filtering (Linden et al., 2003). This combines many approaches, including the traditional collaborative filtering approach, into a hybrid, scalable recommender.
