

CHAPTER 4 : AUTOMATED COLLABORATIVE FILTERING

4.10 COMBINING COLLABORATIVE AND CONTENT-BASED TECHNIQUES

In the previous chapter we discussed CBR, an AI technique which can be used as a content-based recommendation strategy. One of the strengths of a content-based strategy is that it will consistently find items that match a given user profile. Thus, new items can be recommended to the user, irrespective of whether other users have endorsed them. This addresses the latency problem in ACF systems whereby new items cannot be recommended until they have been rated by a number of users. Content-based systems have the drawback that they will only tend to recommend the types of items described in the user profile, thus limiting the user to the same type of items he/she has used in the past. ACF, on the other hand, taps into the collective experience of a neighbourhood of users and can make much more diverse recommendations. Of course, it could be argued that a technique like CBR could make diverse recommendations if a sufficiently elaborate domain ontology were developed. However, the knowledge engineering required to model the relations between items in a changing domain, as well as the utility measures to reflect each user’s preferences, would be enormous. Instead, ACF captures this knowledge implicitly.

Several research applications have demonstrated that the weaknesses of content-based systems and of collaborative systems can each be alleviated by the strengths of the other technique (Balabanovic & Shoham 1997, Smyth & Cotter 1999a,b, Pazzani 1999). These strengths and weaknesses are summarised in Table 4.7 to illustrate the mutually beneficial effect of each technique.

Table 4.7: The strengths (+) and weaknesses (–) of collaborative filters and content-based filters

| Content-based (CB) | Comment | Collaborative filter (CF) |
|---|---|---|
| + Consistently finds items similar in type to those liked in the past. – Cannot infer whether the items are good or bad quality. | Content filters can recommend new items irrespective of their rating history. However, the content representation cannot capture subjective measures such as aesthetic quality. | – Suffers from new item latency. – ‘Grey sheep’ problem. + Recommends items using endorsements from like-minded users. |
| – Cannot infer user’s preference for new types of content. | ACF can suggest cross ‘genre’ recommendations which a content filter would never find. | + Finds content type other than that used by the user in the past. |
| – Suffers from new user bootstrap problem, but can start retrieving ‘similar’ content quickly. + Independent of the ratings of other users in the system. | Both systems require historical data to begin recommending/filtering content. However, content-based techniques can ‘query by instance’, i.e. find new items ‘similar’ to a single instance chosen by the user. ACF systems require rating data from the active user and other like-minded users. | – New user bootstrap problem. – Requires large amount of rating data from other users. |

4.10.1 Combination Methods

Burke has surveyed the various techniques for combining ACF with other recommendation strategies (Burke 2002). He defines seven combination techniques described in the literature. We will outline these techniques, as they are relevant to our discussion in the next chapter of the hybrid approach used in Smart Radio.

Weighted:

With this technique a prediction on an item is made using a weighted average of the predictions of each of the recommender techniques in the system. The P-Tango system, a personalised newspaper system, learns the appropriate weights for the ACF filter and the content filter by periodically evaluating the success of each filter on the use-data collected by the system for each user. Thus each user is assigned different weights for the ACF filter and content filter according to the success of each technique operating on its own (Claypool et al. 1999). Claypool suggests that this technique is appropriate for dealing with grey sheep, in which case the content filter will predominate until the ACF filter has accumulated enough data to make good predictions.
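The weighted combination can be illustrated with a minimal sketch. It assumes both filters predict on the same rating scale and that per-user weights are set in proportion to the inverse of each filter's past mean absolute error. The function names and the inverse-error rule are assumptions made for illustration; they are not taken from P-Tango.

```python
def learn_weights(acf_errors, content_errors, eps=1e-6):
    """Return (w_acf, w_content) for one user.

    Assumption: each argument is a non-empty list of past absolute
    prediction errors, and a filter's weight is proportional to the
    inverse of its mean absolute error.
    """
    acf_mae = sum(acf_errors) / len(acf_errors)
    content_mae = sum(content_errors) / len(content_errors)
    inv_acf = 1.0 / (acf_mae + eps)
    inv_content = 1.0 / (content_mae + eps)
    total = inv_acf + inv_content
    return inv_acf / total, inv_content / total


def weighted_prediction(acf_pred, content_pred, w_acf, w_content):
    """Weighted average of the two filters' predictions for one item."""
    return w_acf * acf_pred + w_content * content_pred
```

With these assumptions, a user for whom the ACF filter has been inaccurate receives a larger content weight, so the content filter predominates until the ACF errors shrink.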

Switched:

Switched systems use a measure of quality to decide which recommender technique supplies the recommendation. The DailyLearner system uses a content-based strategy as the primary recommendation strategy. If the recommendation cannot be made with enough confidence, it falls back on a collaborative strategy (Billsus & Pazzani 1999).
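A switched combination of this kind might be sketched as follows. The sketch assumes each filter is a callable returning a (score, confidence) pair; the threshold value and all names are hypothetical and do not describe DailyLearner's actual implementation.

```python
def switched_recommendation(item, content_filter, collaborative_filter,
                            min_confidence=0.6):
    """Use the content filter unless its confidence is too low.

    Assumption: each filter maps an item to (score, confidence), with
    confidence in [0, 1]. min_confidence is an illustrative threshold.
    """
    score, confidence = content_filter(item)
    if confidence >= min_confidence:
        return score, "content"
    # Fall back on the collaborative strategy.
    score, _ = collaborative_filter(item)
    return score, "collaborative"
```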

Mixed:

Mixed combinations occur where several recommendations are presented together. For instance, the PTV system employs a mixed strategy to compile a personalised TV guide for each of its users. PTV arbitrates between possible conflicts by allowing the content-based strategy to take precedence over the collaborative strategy (Burke 2002).
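As a rough illustration of a mixed combination, the sketch below merges two sets of recommendations into one guide. It assumes, purely for illustration, that a conflict means both strategies recommend a programme for the same time slot; the data layout and names are hypothetical and are not PTV's.

```python
def mixed_guide(content_recs, collaborative_recs):
    """Merge two {time_slot: programme} dicts into a single guide.

    Returns {time_slot: (programme, source)}. When both strategies fill
    the same slot, the content-based recommendation takes precedence.
    """
    guide = {slot: (programme, "collaborative")
             for slot, programme in collaborative_recs.items()}
    for slot, programme in content_recs.items():
        guide[slot] = (programme, "content")  # overrides any conflict
    return guide
```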

Feature combination:

Feature combination systems are essentially content-based techniques in which the collaborative data is treated as an additional set of features. Basu et al. (1998) apply an inductive rule learner to such a merged dataset and report significant improvements in precision over purely collaborative techniques. However, in order to achieve this, content features were manually selected.
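The idea of treating collaborative data as extra features can be sketched as below. This is not the rule learner used by Basu et al.; it only shows how a merged feature set might be built, with a simple Jaccard overlap standing in for the content-based comparison. The feature naming scheme (`liked_by=...`) is an assumption.

```python
def merged_features(item, content_features, likes):
    """Combine an item's content features with collaborative features.

    content_features: {item: set of content descriptors}
    likes: {user: set of items that user liked}
    Each user who liked the item contributes one 'liked_by=<user>' feature.
    """
    features = set(content_features[item])
    features |= {f"liked_by={user}"
                 for user, liked in likes.items() if item in liked}
    return features


def jaccard(a, b):
    """Overlap between two feature sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Two items with different content descriptors can still score as similar under this representation if the same users liked both.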

Cascade:

Cascading combinations employ one recommendation strategy to retrieve a candidate set of recommendations that is then refined by a second recommendation strategy. Burke’s revised Entrée system, a restaurant recommender, employs a cascaded combination. The content-based recommender produces an ordered list of recommendations, while the collaborative recommender is used to decide the ordering of tied recommendations (Burke 2000).
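The tie-breaking form of the cascade can be sketched in a few lines. The sketch assumes the first recommender assigns coarse scores, so that ties are possible, and that the collaborative score is consulted only to order items within a tie. The names and score dictionaries are hypothetical, not Entrée's data structures.

```python
def cascade_rank(candidates, content_score, collab_score):
    """Order candidates by content score, breaking ties collaboratively.

    content_score: {item: score} from the first (primary) recommender.
    collab_score: {item: score} from the second recommender; items it
    cannot score default to 0.0 (an assumption of this sketch).
    """
    return sorted(
        candidates,
        key=lambda item: (-content_score[item], -collab_score.get(item, 0.0)),
    )
```

Because the collaborative score is only the secondary sort key, it can never promote an item above one that the primary recommender scored higher.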

Feature augmentation:

Feature augmentation refers to a process where the output of one recommendation strategy contributes additional features to the input of a second recommender. The GroupLens research group has employed content-based filter agents specialised in particular topics as pseudo-users to rate new content. These ratings alleviate the sparsity problem and the latency problem for the collaborative filter recommender (Sarwar et al. 1998).
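The pseudo-user idea can be sketched as follows, assuming ratings are held as a nested dictionary and the content-based agent is any function from item features to a rating. The names and data layout are illustrative assumptions rather than the GroupLens implementation.

```python
def add_filterbot_ratings(ratings, items, bot_name, rate_item):
    """Return a copy of the ratings with a content-based pseudo-user added.

    ratings: {user: {item: rating}} from real users.
    items: {item: features} for every item, including unrated new ones.
    rate_item: content-based agent mapping features to a rating.
    """
    augmented = {user: dict(user_ratings)
                 for user, user_ratings in ratings.items()}
    # The agent rates every item, so new items gain at least one rating.
    augmented[bot_name] = {item: rate_item(features)
                           for item, features in items.items()}
    return augmented
```

The collaborative filter then runs unchanged on the augmented ratings, treating the agent like any other user.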

Meta-level:

In this case the model generated by one recommender is used as the input for another. An example of this technique is the Fab system described earlier (Balabanovic & Shoham 1997).
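One way to picture a meta-level combination is sketched below. It assumes the model produced by the content-based recommender is a term-weight profile per user, and that the collaborative step selects neighbours by comparing these profiles rather than raw ratings. This is an illustrative simplification with hypothetical names, not Fab's implementation.

```python
import math


def cosine(p, q):
    """Cosine similarity between two {term: weight} profiles."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def nearest_neighbours(active, profiles, k=2):
    """Return the k users whose content profiles best match the active user's.

    profiles: {user: {term: weight}} learned by the content-based recommender.
    Ties are broken alphabetically by user name.
    """
    scored = [(cosine(profiles[active], profile), user)
              for user, profile in profiles.items() if user != active]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]
```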

Finally, it should be noted that building a hybrid model supposes that good content descriptors are readily available, which might not be the case for certain domains. The quality of either type of data (content or collaborative) will be the primary factor in deciding a combination strategy.