Leveraging User Profiles - Using Content-Based Models for Collaborative Filtering

4.6 Using Content-Based Models for Collaborative Filtering

4.6.1 Leveraging User Profiles

Another case in which collaborative filtering-like models can be created with content attributes is when user profiles are available in the form of specified keywords. For example, users may choose to specify their particular interests in the form of keywords. In such cases, instead of creating a local classification model for each user, one can create a global classification model over all users by using the user features. For each user-item combination, a content-centric representation can be created by using the Kronecker product of the attribute vectors of the corresponding user and item [50]. A classification or regression model is constructed on this representation to map user-item combinations to ratings. Such an approach is described in detail in Section 8.5.3 of Chapter 8.
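The Kronecker-product representation can be illustrated with a minimal sketch. The keyword vocabularies, the example vectors, and the function name `kronecker_features` are assumptions made for this illustration and do not come from [50]. Each entry of the resulting vector corresponds to one (user keyword, item keyword) pair, and a single global classifier or regressor would then be trained on such vectors paired with the observed ratings (the training step is not shown).

```python
from typing import List, Sequence


def kronecker_features(user: Sequence[float], item: Sequence[float]) -> List[float]:
    """Kronecker product of two attribute vectors as one flat feature vector.

    The entry for pair (i, j) is user[i] * item[j], so the result has
    len(user) * len(item) entries, one per (user keyword, item keyword) pair.
    """
    return [u * v for u in user for v in item]


# Hypothetical keyword vocabularies (assumptions for this illustration).
user_keywords = ["sci-fi", "comedy"]            # interests a user may declare
item_keywords = ["space", "robots", "romance"]  # words in an item description

alice = [1, 0]     # Alice declared an interest in sci-fi only
movie = [1, 1, 0]  # the description mentions "space" and "robots"

x = kronecker_features(alice, movie)
print(x)  # [1, 1, 0, 0, 0, 0]
```

Only the pairs (sci-fi, space) and (sci-fi, robots) are active in this example, which is what allows one model shared by all users to learn that particular user interests go well with particular item keywords.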

4.7 Summary

This chapter introduces the methodology of content-based recommender systems in which user-specific training models are created for the recommendation process. The content attributes in item descriptions are combined with user ratings to create user profiles. Classification models are created on the basis of these profiles. These models are then used to classify item descriptions that have not yet been rated by the user. Numerous classification and regression models are used by such systems, such as nearest-neighbor classifiers, rule-based methods, the Bayes method, and linear models. The Bayes method has been used with great success in a variety of scenarios because of its ability to handle various types of content. Content-based systems have the advantage that they can handle cold-start problems with respect to new items, although they cannot handle cold-start problems with respect to new users. The serendipity of content-based systems is relatively low because content-based recommendations are based on the content of the items previously rated by the user.

164 CHAPTER 4. CONTENT-BASED RECOMMENDER SYSTEMS

4.8 Bibliographic Notes

The earliest content-based systems are attributed to the work on Fab [60] and the Syskill & Webert [82, 476–478] systems. Fab, however, uses a partial hybridization design in which the peer group is determined using content-based methods, but the ratings of other users are leveraged in the recommendation process. The works in [5, 376, 477] provide excellent overview articles on content-based recommender systems. The latter work was designed for finding interesting Websites, and therefore numerous text classifiers were tested for their effectiveness. In particular, the work in [82] provides a number of useful pointers about the relative performance of various content-based systems. Probabilistic methods for user modeling are discussed in [83]. The work in [163, 164] is notable for its use of rule-based systems in e-mail classification. Rocchio’s relevance feedback [511] was also used during the early years, although the method lacks theoretical underpinnings and can perform poorly in many scenarios. Numerous text classification methods, which can be used for content-based recommendations, are discussed in [21, 22, 400]. A discussion of the notion of serendipity in the context of information retrieval is provided in [599]. Some content-based systems explicitly filter out very similar items in order to improve serendipity [85]. The work in [418] discusses how one can go beyond accuracy metrics to measure the quality of a recommender system.

Methods for feature extraction, cleaning, and feature selection in text classification are discussed in [21, 364, 400]. The extraction of the main content block from a Web page containing multiple blocks, achieved with the help of a tree-matching algorithm, is described in [364, 662]. The use of visual representations for extracting content structure from Web pages is described in [126]. A detailed discussion of feature selection measures for classification may be found in [18]. A recent text classification survey [21] discusses feature selection algorithms for the specific case of text data.

Numerous real-world systems have been designed with the use of content-based systems. Some of the earliest are Fab [60] and Syskill & Webert [477]. An early system, referred to as Personal WebWatcher [438, 439], makes recommendations by learning the interests of users from the Web pages that they visit. In addition, the Web pages that are linked to by the visited page are used in the recommendation process. The Letizia system [356] uses a Web-browser extension to track the user’s browsing behavior, and uses it to make recommendations. A system known as Dynamic-Profiler uses a pre-defined taxonomy of categories to make news recommendations to users in real time [636]. In this case, user Web logs are used to learn the preferences and make personalized recommendations. The IfWeb system [55] represents the user interests in the form of a semantic network. The WebMate system [150] learns user profiles in the form of keyword vectors. This system is designed for keeping track of positive user interests rather than negative ones. The general principles in Web recommendations are not very different from those of news filtering. Methods for performing news recommendations are discussed in [41,84,85,392,543,561]. Some of these methods use enhanced representations, such as WordNet, to improve the modeling process. Web recommender systems are generally more challenging than news recommender systems because the underlying text is often of lower quality. The Citeseer system [91] is able to discover interesting publications in a bibliographic database by identifying the common citations among the papers. Thus, it explicitly uses citations as a content mechanism for determination of similarity.

Content-based systems have also been used in other domains such as books, music, and movies. Content-based methods for book recommendations are discussed in [448]. The main challenge in music recommendations is the semantic gap between easily available features and the likelihood of a user appreciating the music. This is a common characteristic between the music and the image domains. Some progress in bridging the semantic gap has been made in [138, 139]. Pandora [693] uses the features extracted in the Music Genome Project to make recommendations. The ITR system discusses how one might use text descriptions [178] of items (e.g., book descriptions or movie plots) to make recommendations. Further work [179] shows how one might integrate tags in a content-based recommender. The approach uses linguistic tools such as WordNet to extract knowledge for the recommendation process. A movie recommendation system that uses text categorization is the INTIMATE system [391]. A method that combines content-based and collaborative recommender systems is discussed in [520]. A broader overview of hybrid recommender systems is provided in [117]. A potential direction of work, mentioned in [376], is to enhance content-based recommender systems with encyclopedic knowledge [174, 210, 211], such as that gained from Wikipedia. A few methods have been designed that use Wikipedia for movie recommendation [341]. Interestingly, this approach does not improve the accuracy of the recommender system. The application of advanced semantic knowledge in content-based recommendations has been mentioned as a direction of future work in [376].

4.9 Exercises

1. Consider a scenario in which a user provides like/dislike ratings of a set of 20 items, in which she rates 9 items as a “like” and the remaining as a “dislike.” Suppose that 7 item descriptions contain the word “thriller,” and the user dislikes 5 of these items. Compute the Gini index with respect to the original data distribution, and with respect to the subset of items containing the word “thriller.” Should feature selection algorithms retain this word in the item descriptions?

2. Implement a rule-based classifier with the use of association pattern mining.

3. Consider a movie recommender system in which movies belong to one or more of the genres illustrated in the table, and a particular user provides the following set of ratings to each of the movies.

| Movie-Id | Comedy | Drama | Romance | Thriller | Action | Horror | Like or Dislike |
|----------|--------|-------|---------|----------|--------|--------|-----------------|
| 1        | 1      | 0     | 1       | 0        | 0      | 0      | Dislike         |
| 2        | 1      | 1     | 1       | 0        | 1      | 0      | Dislike         |
| 3        | 1      | 1     | 0       | 0        | 0      | 0      | Dislike         |
| 4        | 0      | 0     | 0       | 1        | 1      | 0      | Like            |
| 5        | 0      | 1     | 0       | 1        | 1      | 1      | Like            |
| 6        | 0      | 0     | 0       | 0        | 1      | 1      | Like            |
| Test-1   | 0      | 0     | 0       | 1        | 0      | 1      | ?               |
| Test-2   | 0      | 1     | 1       | 0        | 0      | 0      | ?               |

Mine all the rules with at least 33% support and 75% confidence. Based on these rules, would you recommend the item Test-1 or Test-2 to the user?

4. Implement a Bayes classifier with Laplacian smoothing.

5. Repeat Exercise 3 with the use of a Bayes classifier. Do not use Laplacian smoothing.

6. Repeat Exercise 3 with the use of a 1-nearest neighbor classifier.

7. For a training data matrix $D$, regularized least-squares regression requires the inversion of the matrix $(D^T D + \lambda I)$, where $\lambda > 0$. Show that this matrix is always invertible.

8. The $\chi^2$ distribution is defined by the following formula, as discussed in the chapter:

$$\chi^2 = \sum_{i=1}^{p} \frac{(O_i - E_i)^2}{E_i}$$

Show that for a $2 \times 2$ contingency table, the aforementioned formula can be rewritten as follows:

$$\chi^2 = \frac{(O_1 + O_2 + O_3 + O_4) \cdot (O_1 O_4 - O_2 O_3)^2}{(O_1 + O_2) \cdot (O_3 + O_4) \cdot (O_1 + O_3) \cdot (O_2 + O_4)}$$

Chapter 5 Knowledge-Based Recommender Systems

“Knowledge is knowing that a tomato is a fruit. Wisdom is knowing not to put it in a fruit salad.”–Brian O’Driscoll

5.1 Introduction

Both content-based and collaborative systems require a significant amount of data about past buying and rating experiences. For example, collaborative systems require a reasonably well populated ratings matrix to make future recommendations. In cases where the amount of available data is limited, the recommendations are either poor, or they lack full coverage over the entire spectrum of user-item combinations. This problem is also referred to as the cold-start problem. Different systems have varying levels of susceptibility to this problem. For example, collaborative systems are the most susceptible, and they cannot handle new items or new users very well. Content-based recommender systems are somewhat better at handling new items, but they still cannot provide recommendations to new users.

Furthermore, these methods are generally not well suited to domains in which the product is highly customized. Examples include items such as real estate, automobiles, tourism requests, financial services, or expensive luxury goods. Such items are bought rarely, and sufficient ratings are often not available. In many cases, the item domain may be complex, and there may be few instances of a specific item with a particular set of properties. For example, one might want to buy a house with a specific number of bedrooms, lawn, locality, and so on. Because of the complexity in describing the item, it may be difficult to obtain a reasonable set of ratings reflecting the past history of a user on a similar item. Similarly, an old rating on a car with a specific set of options may not even be relevant in the present context.

DOI 10.1007/978-3-319-29659-3_5


How can one handle such customization and paucity of ratings? Knowledge-based recommender systems rely on explicitly soliciting user requirements for such items. However, in such complex domains, it is often difficult for users to fully enunciate or even understand how their requirements match the product availability. For example, a user may not even be aware that a car with a certain combination of fuel efficiency and horsepower is available. Therefore, such systems use interactive feedback, which allows the user to explore the inherently complex product space and learn about the trade-offs available between various options. The retrieval and exploration process is facilitated by knowledge bases describing the utilities and/or trade-offs of various features in the product domain. The use of knowledge bases is so important to an effective retrieval and exploration process that such systems are referred to as knowledge-based recommender systems.

Knowledge-based recommender systems are well suited to the recommendation of items that are not bought on a regular basis. Furthermore, in such item domains, users are generally more active in being explicit about their requirements. A user may often be willing to accept a movie recommendation without much input, but she would be unwilling to accept recommendations about a house or a car without having detailed information about the specific features of the item. Therefore, knowledge-based recommender systems are suited to types of item domains different from those of collaborative and content-based systems. In general, knowledge-based recommender systems are appropriate in the following situations:

1. Customers want to explicitly specify their requirements. Therefore, interactivity is a crucial component of such systems. Note that collaborative and content-based systems do not allow this type of detailed feedback.

2. It is difficult to obtain ratings for a specific type of item because of the greater complexity of the product domain in terms of the types of items and options available.

3. In some domains, such as computers, the ratings may be time-sensitive. The ratings on an old car or computer are not very useful for recommendations because they evolve with changing product availability and corresponding user requirements.

A crucial part of knowledge-based systems is the greater control that the user has in guiding the recommendation process. This greater control is a direct result of the need to be able to specify detailed requirements in an inherently complex problem domain. At a basic level, the conceptual differences in the three categories of recommendations are described in Table 5.1. Note that there are also significant differences in the input data used by various systems. The recommendations of content-based and collaborative systems are primarily based on historical data, whereas knowledge-based systems are based on the direct specifications by users of what they want. An important distinguishing characteristic of knowledge-based systems is a high level of customization to the specific domain. This customization is achieved through the use of a knowledge base that encodes relevant domain knowledge in the form of either constraints or similarity metrics. Some knowledge-based systems might also use user attributes (e.g., demographic attributes) in addition to item attributes, which are specified at query time. In such cases, the domain knowledge might also encode relationships between user attributes and item attributes. The use of such attributes is, however, not universal to knowledge-based systems, in which the greater focus is on user requirements.

Table 5.1: The conceptual goals of various recommender systems

| Approach        | Conceptual Goal | Input |
|-----------------|-----------------|-------|
| Collaborative   | Give me recommendations based on a collaborative approach that leverages the ratings and actions of my peers/myself. | User ratings + community ratings |
| Content-based   | Give me recommendations based on the content (attributes) I have favored in my past ratings and actions. | User ratings + item attributes |
| Knowledge-based | Give me recommendations based on my explicit specification of the kind of content (attributes) I want. | User specification + item attributes + domain knowledge |

Knowledge-based recommender systems can be categorized on the basis of user interactive methodology and the corresponding knowledge bases used to facilitate the interaction. There are two primary types of knowledge-based recommender systems:

1. Constraint-based recommender systems: In constraint-based systems [196, 197], users typically specify requirements or constraints (e.g., lower or upper limits) on the item attributes. Furthermore, domain-specific rules are used to match the user requirements or attributes to item attributes. These rules represent the domain-specific knowledge used by the system. Such rules could take the form of domain-specific constraints on the item attributes (e.g., “Cars before year 1970 do not have cruise control.”). Furthermore, constraint-based systems often create rules relating user attributes to item attributes (e.g., “Older investors do not invest in ultrahigh-risk products.”). In such cases, user attributes may also be specified in the search process. Depending on the number and type of returned results, the user might have an opportunity to modify their original requirements. For example, a user might relax some constraints when too few results are returned, or add more constraints when too many results are returned. This search process is interactively repeated until the user arrives at her desired results.

2. Case-based recommender systems: In case-based recommender systems [102,116,377, 558], specific cases are specified by the user as targets or anchor points. Similarity metrics are defined on the item attributes to retrieve similar items to these targets. The similarity metrics are often carefully defined in a domain-specific way. Therefore, the similarity metrics form the domain knowledge that is used in such systems. The returned results are often used as new target cases with some interactive modifications by the user. For example, when a user sees a returned result that is almost similar to what she wants, she might re-issue a query with that target, but with some of the attributes changed to her liking. Alternatively, a directional critique may be specified to prune items with specific attribute values greater (or less) than that of a specific item of interest. This interactive process is used to guide the user towards the final recommendation.
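The constraint-based interaction described in the first item above can be sketched as follows. This is a minimal illustration under stated assumptions: the catalog, the attribute names, and the helper functions `build_constraints` and `search` are all hypothetical. The only piece of domain knowledge encoded is the rule quoted in the text (cars before 1970 do not have cruise control), applied here as a necessary condition on the model year.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]
Constraint = Callable[[Item], bool]

# Hypothetical catalog of cars (made-up data for this illustration).
CATALOG: List[Item] = [
    {"id": 1, "price": 18000, "mpg": 35, "year": 2012},
    {"id": 2, "price": 26000, "mpg": 28, "year": 2015},
    {"id": 3, "price": 32000, "mpg": 40, "year": 2016},
    {"id": 4, "price": 15000, "mpg": 22, "year": 1968},
]


def build_constraints(req: Dict[str, Any]) -> List[Constraint]:
    """Translate user requirements into predicates on item attributes."""
    constraints: List[Constraint] = []
    if "max_price" in req:
        constraints.append(lambda it, v=req["max_price"]: it["price"] <= v)
    if "min_mpg" in req:
        constraints.append(lambda it, v=req["min_mpg"]: it["mpg"] >= v)
    if req.get("cruise_control"):
        # Domain rule from the text: cars before 1970 have no cruise control,
        # so requiring it rules out those model years.
        constraints.append(lambda it: it["year"] >= 1970)
    return constraints


def search(catalog: List[Item], req: Dict[str, Any]) -> List[int]:
    """Return the ids of all items that satisfy every constraint."""
    constraints = build_constraints(req)
    return [it["id"] for it in catalog if all(c(it) for c in constraints)]


# First query: too strict, nothing is returned.
first = search(CATALOG, {"max_price": 20000, "min_mpg": 38, "cruise_control": True})
# The user relaxes the fuel-efficiency limit and repeats the search.
relaxed = search(CATALOG, {"max_price": 20000, "min_mpg": 20, "cruise_control": True})
print(first, relaxed)  # [] [1]
```

Dropping the cruise-control requirement from the relaxed query would also return car 4, which shows how the domain rule prunes results that the user's numeric limits alone would admit.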

Note that in both cases, the system provides an opportunity for the user to change her specified requirements. However, the way in which this is done is different in the two cases. In case-based systems, examples (or cases) are used as anchor points to guide the search in conjunction with similarity metrics, whereas in constraint-based systems, specific criteria/rules (or constraints) are used to guide the search. In both cases, the presented results are used to modify the criteria for finding further recommendations. Knowledge-based systems derive their name from the fact that they encode various types of domain knowledge in the form of constraints, rules, similarity metrics, and utility functions during the search process. For example, the design of a similarity metric or a specific constraint requires domain-specific knowledge, which is crucial to the effective functioning of the recommender system. In general, knowledge-based systems draw on highly heterogeneous, domain-specific sources of knowledge, compared to content-based and collaborative systems, which work with somewhat similar types of input data across various domains. As a result, knowledge-based systems are highly customized, and they are not easily generalizable across various domains. However, the broader principles with which this customization is done are invariant across domains. The goal of this chapter is to discuss these principles.
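The case-based style of interaction, in which a target case is matched with a similarity metric and then refined with a directional critique, can be sketched in the same spirit. Everything specific here is an assumption made for illustration: the house data, the attribute weights and ranges (which stand in for the domain knowledge), and the function names `similarity`, `retrieve`, and `critique_cheaper`.

```python
from typing import Dict, List

House = Dict[str, float]

# Hypothetical listings (made-up data for this illustration).
HOUSES: List[House] = [
    {"id": 1, "price": 300000, "bedrooms": 3},
    {"id": 2, "price": 320000, "bedrooms": 4},
    {"id": 3, "price": 450000, "bedrooms": 4},
    {"id": 4, "price": 280000, "bedrooms": 2},
]

# Assumed domain knowledge: attribute weights and normalizing ranges.
WEIGHTS = {"price": 0.6, "bedrooms": 0.4}
RANGES = {"price": 200000.0, "bedrooms": 4.0}


def similarity(target: House, item: House) -> float:
    """Weighted similarity in [0, 1]; 1 means identical on all weighted attributes."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        diff = abs(target[attr] - item[attr]) / RANGES[attr]
        score += weight * (1.0 - min(diff, 1.0))
    return score


def retrieve(target: House, items: List[House], k: int = 2) -> List[int]:
    """Return the ids of the k items most similar to the target case."""
    ranked = sorted(items, key=lambda it: similarity(target, it), reverse=True)
    return [int(it["id"]) for it in ranked[:k]]


def critique_cheaper(reference: House, items: List[House]) -> List[House]:
    """Directional critique: keep only items cheaper than the reference item."""
    return [it for it in items if it["price"] < reference["price"]]


target = {"price": 310000, "bedrooms": 4}  # the user's initial anchor point
top = retrieve(target, HOUSES)             # [2, 1]

# The user likes house 2 but asks for something cheaper.
cheaper = critique_cheaper(HOUSES[1], HOUSES)
refined = retrieve(HOUSES[1], cheaper)     # [1, 4]
print(top, refined)
```

The returned house 2 becomes the new anchor after the critique, which mirrors the way returned results are reused as target cases in the description above.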

The interaction between user and recommender may take the form of conversational systems, search-based systems, or navigational systems. Such different forms of guidance may be present either in isolation, or in combination, and they are defined as follows:

1. Conversational systems: In this case, the user preferences are determined in the context of a feedback loop. The main reason for this is that the item domain is complex, and the user preferences can be determined only in the context of an iterative conversational system.

2. Search-based systems: In search-based systems, user preferences are elicited by using a preset sequence of questions such as the following: “Do you prefer a house in a suburban area or within the city?”

3. Navigation-based recommendation: In navigation-based recommendation, the user specifies a number of change requests to the item being currently recommended. Through an iterative set of change requests, it is possible to arrive at a desirable item. An example of a change request specified by the user, when a specific house is
