CHAPTER 4 : AUTOMATED COLLABORATIVE FILTERING
4.4 PRINCIPLES OF ACF
The basic assumption of ACF is that, rather than relying upon content descriptions as a basis for filtering, it may be easier to harness the decisions made by other like-minded users in accepting or rejecting the system assets. Unlike a ‘content-based’ filter, a collaborative filter requires access to the usage data of other users of the system. It does not require that the system assets be modelled in any way, which is an enormous advantage where the representation task is onerous. The basic idea behind ACF is illustrated in Figure 4.2, where all three users have expressed an interest in assets A, B and C. (For instance, they may have listened to songs A, B and C.) The high degree of overlap indicates that these users may have a shared taste in music. Further, it seems safe to recommend assets D and E to User 1 because they have been ‘endorsed’ by Users 2 and 3. However, as we will discuss, the task of assembling a group of neighbours to create a filter that approximates the target user’s interests requires a lot of usage data, since the degree of intersection between individual users varies considerably.
Figure 4.2: ACF takes advantage of the overlap in interests within communities of users
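The overlap idea behind Figure 4.2 can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm developed later in the chapter: the usage sets, the `recommend` function and the `min_overlap` threshold are all assumptions introduced here, and the sets are only a guess at what the figure depicts.

```python
# Hypothetical usage data loosely mirroring Figure 4.2: each user maps to the
# set of assets they have used. The exact sets are assumed for illustration.
usage = {
    "User 1": {"A", "B", "C"},
    "User 2": {"A", "B", "C", "D", "E"},
    "User 3": {"A", "B", "C", "D", "E"},
}


def recommend(target, usage, min_overlap=3):
    """Return unseen assets endorsed by overlapping users, most endorsed first."""
    seen = usage[target]
    endorsements = {}
    for user, assets in usage.items():
        if user == target:
            continue
        # Treat a user as a neighbour only if enough assets are shared.
        if len(seen & assets) < min_overlap:
            continue
        # Every asset the neighbour used that the target has not counts as
        # one endorsement.
        for asset in assets - seen:
            endorsements[asset] = endorsements.get(asset, 0) + 1
    # Sort by endorsement count (descending), then by name for determinism.
    return sorted(endorsements, key=lambda a: (-endorsements[a], a))


print(recommend("User 1", usage))  # ['D', 'E']
```

With only three users the neighbourhood is trivial to assemble. The difficulty noted above arises when overlaps between real users are small and uneven, so that few candidates pass any reasonable threshold.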
ACF takes advantage of the overlap in interests within communities of users to develop a filter for each user based on a neighbourhood of ‘like-minded’ users. The key assumption on which any ACF algorithm rests is that human preferences can be correlated, and that informed predictions can be made based on these correlations. Until recently, ACF lacked formal roots in any discipline, in contrast to CBR, which has its roots in cognitive science. Despite its increasing application within on-line commercial systems, it has often been viewed as ‘the orphan child’ of applied AI, whose parentage is somewhat questionable (Billsus and Pazzani 1998). However, Pennock et al. have demonstrated that ACF techniques rely upon properties of determining and aggregating preferences similar to those that social choice theorists have been formally analysing for decades (Pennock et al. 2000). Using the conditions laid down by such theorists, they show that the weighted nearest-neighbour implementation of the ACF algorithm is the only possible form of prediction function. In Section 4.5.2 we will discuss this implementation, as well as other techniques established outside the formalism of social choice theory.
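For orientation, one widely used weighted nearest-neighbour prediction function is the mean-offset form associated with the GroupLens work. It is given here only as a sketch, and Section 4.5.2 may define the function differently. The symbols are assumptions introduced for this illustration: $p_{a,j}$ is the predicted rating of the active user $a$ for asset $j$, $r_{i,j}$ is the rating given by neighbour $i$, $\bar{r}_a$ and $\bar{r}_i$ are the users' mean ratings, $w(a,i)$ is the similarity weight between users $a$ and $i$, and $N$ is the neighbourhood of $a$.

```latex
p_{a,j} = \bar{r}_a + \frac{\sum_{i \in N} w(a,i)\,\bigl(r_{i,j} - \bar{r}_i\bigr)}{\sum_{i \in N} \lvert w(a,i) \rvert}
```

The prediction is the active user's own mean rating, adjusted by a weighted average of how far each neighbour's rating of asset $j$ departs from that neighbour's mean.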
4.4.2 Data Representation
Relevance feedback is a commonly accepted technique of improving retrieval quality in information retrieval (IR) systems (Rocchio 1971, Salton & Buckley 1990). Using relevance feedback, a user can indicate to an IR system which portions of a retrieved document set are useful to him or her. The IR system can reformulate the original query based on the feedback and return an improved retrieval set. This type of feedback is sometimes referred to as “querying by instance”. Unlike IR systems which rely upon content representation, collaborative filtering systems use relevance feedback as the only means of filtering or recommending.
The amount of data required depends to some extent on the type of data available. In this context, there are two distinct approaches to the ACF idea, which can be termed explicit and implicit (Nichols 1997, Oard & Kim 1998, Claypool et al. 2001, Rafter et al. 2000, Hayes et al. 2001, O’Sullivan et al. 2002). With the explicit approach, the user is asked to rate assets. Such an approach was taken by the GroupLens project, where users are asked to rate Usenet articles (Konstan et al. 1997). However, asking users to rate articles imposes a cognitive load that users may be unwilling to accept. Grudin has observed that users may not participate in explicit grading schemes if they are not aware of the benefits to them (Grudin 1994). Thus explicit datasets may be sparse because of the work required of users (Oard & Kim 1998). Consequently, there are several strands of research on gathering implicit data without imposing a feedback obligation on users (Morita & Shinoda 1994, Konstan et al. 1997, Lieberman 1997, Rafter et al. 2000). For instance, implicit data points might represent individual page impressions at a website. Implicit data contains less information and can be noisy, in the sense that users might not like some of the items they have used. This can be seen in Table 4.5, which is an implicit version of Table 4.4. The information that User 2 dislikes asset D is lost in the implicit approach. Because of this noise and loss of information, more data is needed to produce good recommendations with a purely implicit approach. However, implicit data can be used to supplement a sparse dataset based on explicit ratings (Konstan et al. 1997, Rafter et al. 2000).
Research has been done on deriving more precise implicit values based on a qualitative measurement of specific user actions. For instance, Morita & Shinoda found that the time people spend reading a Usenet article is correlated with their interest in it, but that there was no correlation between message length and reading time (Morita & Shinoda 1994). In the CASPER job finder system, Rafter & Smyth use reading duration and the number of revisits to a job description as implicit metrics (Rafter & Smyth 2000). Several similar implicit techniques have been applied by researchers (Lieberman 1997, Konstan et al. 1997, Mobasher et al. 2000a). In Chapter 6 we will discuss how data is captured and represented in the Smart Radio system.
Table 4.4: Data for use in ACF where users have explicitly rated assets
          A     B     C     D     E     F     G
User 1    0.6   0.6   0.8   ?     ?     0.8   0.5
User 2    ?     0.8   0.8   0.3   0.7   ?     ?
User 3    0.6   0.6   0.3   0.5   ?     0.7   0.5
User 4    ?     ?     ?     ?     0.7   0.8   0.7
User 5    0.6   0.6   0.8   ?     ?     0.7   ?
User 6    ?     0.8   0.8   0.7   0.7   ?     ?
User 7    0.7   0.5   ?     ?     0.7   ?     ?
User 8    ?     ?     ?     ?     0.7   0.7   0.8
Table 4.5: ACF data where users have not explicitly rated assets
          A     B     C     D     E     F     G
User 1    1     1     1     ?     ?     1     1
User 2    ?     1     1     1     1     ?     ?
User 3    1     1     1     1     ?     1     1
User 4    ?     ?     ?     ?     1     1     1
User 5    1     1     1     ?     ?     1     ?
User 6    ?     1     1     1     1     ?     ?
User 7    1     1     ?     ?     1     ?     ?
User 8    ?     ?     ?     ?     1     1     1
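The relationship between Table 4.4 and Table 4.5 can be made concrete with a short Python sketch. The dictionary layout, the use of `None` for the ‘?’ entries, and the function name `to_implicit` are assumptions made for illustration. They are not a description of how any of the cited systems store their data.

```python
ASSETS = ["A", "B", "C", "D", "E", "F", "G"]

# Explicit ratings from Table 4.4; None stands for the '?' entries.
explicit = {
    "User 1": [0.6, 0.6, 0.8, None, None, 0.8, 0.5],
    "User 2": [None, 0.8, 0.8, 0.3, 0.7, None, None],
    "User 3": [0.6, 0.6, 0.3, 0.5, None, 0.7, 0.5],
    "User 4": [None, None, None, None, 0.7, 0.8, 0.7],
    "User 5": [0.6, 0.6, 0.8, None, None, 0.7, None],
    "User 6": [None, 0.8, 0.8, 0.7, 0.7, None, None],
    "User 7": [0.7, 0.5, None, None, 0.7, None, None],
    "User 8": [None, None, None, None, 0.7, 0.7, 0.8],
}


def to_implicit(ratings):
    """Collapse every known rating to 1, keeping unknown entries as None."""
    return {
        user: [None if r is None else 1 for r in row]
        for user, row in ratings.items()
    }


implicit = to_implicit(explicit)

# User 2's low rating of asset D (0.3) becomes an undifferentiated 1.
d = ASSETS.index("D")
print(explicit["User 2"][d], "->", implicit["User 2"][d])  # 0.3 -> 1
```

The conversion is one-way: the implicit table cannot be turned back into the explicit one, which is the loss of information described above.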
Irrespective of how it is collected, ACF data is usually sparse in that a user typically rates only a small portion of the items in the system (Sarwar et al. 1998). Equation 4.1 defines a metric for sparsity which is termed sparsity level.
Equation 4.1
$$\text{sparsity level} = 1 - \frac{\text{non-zero entries}}{\text{total entries}}$$
The degree of data sparsity is domain dependent, but for sites with a large product range and many customers, the sparsity level can be in the range of 90–97%. For instance, the sparsity level of the MovieLens ACF dataset is 0.9369 (Sarwar et al. 2000b).
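As a worked example, the sparsity level of Equation 4.1 can be computed for the small matrix of Table 4.4. The sketch below assumes that a ‘?’ entry (represented as `None`) is what the equation counts as a zero entry. The function name `sparsity_level` is introduced here for illustration.

```python
def sparsity_level(matrix):
    """Sparsity level per Equation 4.1: 1 - (non-zero entries / total entries).

    Assumes missing ratings are stored as None and count as zero entries.
    """
    total = sum(len(row) for row in matrix)
    filled = sum(1 for row in matrix for value in row if value is not None)
    return 1 - filled / total


# Rows are Users 1-8 and columns are assets A-G, as in Table 4.4.
table_4_4 = [
    [0.6, 0.6, 0.8, None, None, 0.8, 0.5],
    [None, 0.8, 0.8, 0.3, 0.7, None, None],
    [0.6, 0.6, 0.3, 0.5, None, 0.7, 0.5],
    [None, None, None, None, 0.7, 0.8, 0.7],
    [0.6, 0.6, 0.8, None, None, 0.7, None],
    [None, 0.8, 0.8, 0.7, 0.7, None, None],
    [0.7, 0.5, None, None, 0.7, None, None],
    [None, None, None, None, 0.7, 0.7, 0.8],
]

# 32 of the 56 cells are rated, so the level is 1 - 32/56.
print(round(sparsity_level(table_4_4), 4))  # 0.4286
```

The toy table is therefore far denser than a production dataset such as MovieLens, which is worth keeping in mind when reading examples built on it.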
4.4.3 ACF as a Classification Task
In the user–item tables above, missing values are indicated by a ‘?’. The prediction task has been viewed as an elaboration, or filling out, of the sparse user–item matrix (Billsus and Pazzani 1999), but within the constraints of a real-time system. We can view this prediction task as a classification or regression problem. In a situation where users have rated assets, the recommendation problem may be cast as the prediction of these ratings, which is a regression problem. Alternatively, it may be viewed as a classification problem, the classification being whether an asset will be liked or disliked. The measure of accuracy may be 0/1 error for classification; mean absolute error (MAE) or root-mean-square (RMS) error for regression; or a measure of the correlation of predicted ratings with actual ratings (e.g. Pearson's correlation coefficient). This allows for the type of evaluation that is common in machine learning, where the data is partitioned into training and test data and the training data is used to produce predictions for the test data. These estimates may be improved by using k-fold cross-validation or a leave-one-out evaluation, in which each rating is predicted using all the data except the rating itself (Moore & Lee 1994).
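The error measures and the leave-one-out procedure described above can be sketched as follows. This is an illustration under stated assumptions. The predictor is a deliberately naive placeholder (the mean of the other known ratings for the asset), not an ACF algorithm. The 0.5 like/dislike threshold and all function names are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rms_error(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def zero_one_error(predicted, actual, threshold=0.5):
    """Fraction of like/dislike misclassifications (threshold is assumed)."""
    wrong = sum((p >= threshold) != (a >= threshold)
                for p, a in zip(predicted, actual))
    return wrong / len(actual)


def item_mean(matrix, user, item):
    """Placeholder predictor: mean of the known ratings for the asset."""
    known = [row[item] for row in matrix if row[item] is not None]
    return sum(known) / len(known) if known else None


def leave_one_out(matrix, predictor):
    """Predict each known rating from all the data except that rating."""
    predicted, actual = [], []
    for u, row in enumerate(matrix):
        for j, rating in enumerate(row):
            if rating is None:
                continue
            held_out = [list(r) for r in matrix]
            held_out[u][j] = None  # hide the rating being predicted
            p = predictor(held_out, u, j)
            if p is not None:
                predicted.append(p)
                actual.append(rating)
    return predicted, actual


# Rows are Users 1-8 and columns are assets A-G, as in Table 4.4.
table_4_4 = [
    [0.6, 0.6, 0.8, None, None, 0.8, 0.5],
    [None, 0.8, 0.8, 0.3, 0.7, None, None],
    [0.6, 0.6, 0.3, 0.5, None, 0.7, 0.5],
    [None, None, None, None, 0.7, 0.8, 0.7],
    [0.6, 0.6, 0.8, None, None, 0.7, None],
    [None, 0.8, 0.8, 0.7, 0.7, None, None],
    [0.7, 0.5, None, None, 0.7, None, None],
    [None, None, None, None, 0.7, 0.7, 0.8],
]

predicted, actual = leave_one_out(table_4_4, item_mean)
print("MAE:", mae(predicted, actual))
print("RMS error:", rms_error(predicted, actual))
print("0/1 error:", zero_one_error(predicted, actual))
```

Any predictor with the same signature can be passed to `leave_one_out` in place of `item_mean`, so the same harness could be used to compare a neighbourhood-based method against this baseline.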
In Chapter 7 we will return to the issue of evaluation and suggest that the recommendation task is not as well understood as a classification/regression task, and that the goal of filling in the missing values in the user–item matrix does not take into account the active user’s current ‘use-context’.