Exercises - Recommender Systems the Textbook

1. Explain why unary ratings are significantly different from other types of ratings in the design of recommender systems.

2. Discuss cases in which content-based recommendations will not perform as well as ratings-based collaborative filtering.

3. Suppose you set up a system in which a guided visual interface is used to determine the product of interest to a customer. What category of recommender system does this case fall into?

4. Discuss a scenario in which location plays an important role in the recommendation process.

5. The chapter mentions the fact that collaborative filtering can be viewed as a generalization of the classification problem. Discuss a simple method to generalize classification algorithms to collaborative filtering. Explain why it is difficult to use such methods in the context of sparse ratings matrices.

6. Suppose that you had a recommender system that could predict raw ratings. How would you use it to design a top-k recommender system? Discuss the computational complexity of such a system in terms of the number of applications of the base prediction algorithm. Under what circumstances would such an approach become impractical?

Chapter 2 Neighborhood-Based Collaborative Filtering

“When one neighbor helps another, we strengthen our communities.” – Jennifer Pahlka

2.1 Introduction

Neighborhood-based collaborative filtering algorithms, also referred to as memory-based algorithms, were among the earliest algorithms developed for collaborative filtering. These algorithms are based on the fact that similar users display similar patterns of rating behavior and similar items receive similar ratings. There are two primary types of neighborhood-based algorithms:

1. User-based collaborative filtering: In this case, the ratings provided by similar users to a target user A are used to make recommendations for A. The predicted ratings of A are computed as the weighted average values of these “peer group” ratings for each item.

2. Item-based collaborative filtering: In order to make recommendations for target item B, the first step is to determine a set S of items, which are most similar to item B. Then, in order to predict the rating of any particular user A for item B, the ratings in set S, which are specified by A, are determined. The weighted average of these ratings is used to compute the predicted rating of user A for item B.
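The user-based variant described in the list above can be sketched in a few lines. This is a minimal illustration, not the chapter's definitive method: the toy ratings, the helper names `cosine_on_common` and `predict_user_based`, the choice of cosine similarity over commonly rated items, and the peer-group size `k` are all assumptions made for the example.

```python
import math

# Toy ratings matrix (made-up values): rows are users, columns are items.
# None marks a rating that has not been observed.
R = [
    [5, 4, None, 1],   # user 0, the target user A
    [5, 5, 4, 1],      # user 1
    [1, 2, 1, 5],      # user 2
    [4, None, 5, 2],   # user 3
]

def cosine_on_common(x, y):
    """Cosine similarity restricted to positions observed in both vectors."""
    common = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_x = math.sqrt(sum(a * a for a, _ in common))
    norm_y = math.sqrt(sum(b * b for _, b in common))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def predict_user_based(R, u, j, k=2):
    """Predict R[u][j] as the similarity-weighted average of the ratings
    given to item j by the k users most similar to u (the peer group)."""
    candidates = [
        (cosine_on_common(R[u], R[v]), R[v][j])
        for v in range(len(R))
        if v != u and R[v][j] is not None
    ]
    peers = sorted(candidates, key=lambda p: p[0], reverse=True)[:k]
    total = sum(sim for sim, _ in peers)
    if total == 0:
        return None  # no usable peers for this item
    return sum(sim * rating for sim, rating in peers) / total

prediction = predict_user_based(R, u=0, j=2)
```

With these toy values, users 1 and 3 form the peer group of user 0, so the prediction for item 2 falls between their ratings of 4 and 5.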

DOI 10.1007/978-3-319-29659-3_2

An important distinction between user-based collaborative filtering and item-based collaborative filtering algorithms is that the ratings in the former case are predicted using the ratings of neighboring users, whereas the ratings in the latter case are predicted using the user’s own ratings on neighboring (i.e., closely related) items. In the former case, neighborhoods are defined by similarities among users (rows of ratings matrix), whereas in the latter case, neighborhoods are defined by similarities among items (columns of ratings matrix). Thus, the two methods share a complementary relationship. Nevertheless, there are considerable differences in the types of recommendations that are achieved using these two methods.
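The column-wise view used by the item-based method can be sketched as follows. As before, this is only an illustrative sketch: the toy ratings, the helper names `column`, `cosine_on_common`, and `predict_item_based`, the use of cosine similarity on commonly rated entries, and the neighborhood size `k` are assumptions, not something the text prescribes.

```python
import math

# Toy ratings matrix (made-up values): rows are users, columns are items.
# None marks a rating that has not been observed.
R = [
    [5, 4, None, 1],   # user 0, the target user A
    [5, 5, 4, 1],      # user 1
    [1, 2, 1, 5],      # user 2
    [4, None, 5, 2],   # user 3
]

def column(R, j):
    """Return the ratings of item j by all users (one column of R)."""
    return [row[j] for row in R]

def cosine_on_common(x, y):
    """Cosine similarity restricted to positions observed in both vectors."""
    common = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_x = math.sqrt(sum(a * a for a, _ in common))
    norm_y = math.sqrt(sum(b * b for _, b in common))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def predict_item_based(R, u, j, k=2):
    """Predict R[u][j] from user u's own ratings on the k items whose
    columns are most similar to the column of the target item j."""
    target = column(R, j)
    candidates = [
        (cosine_on_common(target, column(R, i)), R[u][i])
        for i in range(len(R[0]))
        if i != j and R[u][i] is not None
    ]
    neighbors = sorted(candidates, key=lambda p: p[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # user u has rated no usable neighboring items
    return sum(sim * rating for sim, rating in neighbors) / total

prediction = predict_item_based(R, u=0, j=2)
```

Note that only row `R[u]` supplies the ratings being averaged; the other users contribute solely through the column similarities. A user-based predictor would reverse these roles.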

For the purpose of subsequent discussion, we assume that the user-item ratings matrix is an incomplete $m \times n$ matrix $R = [r_{uj}]$ containing $m$ users and $n$ items. It is assumed that only a small subset of the ratings matrix is specified or observed. Like all other collaborative filtering algorithms, neighborhood-based collaborative filtering algorithms can be formulated in one of two ways:

1. Predicting the rating value of a user-item combination: This is the simplest and most primitive formulation of a recommender system. In this case, the missing rating $r_{uj}$ of the user $u$ for item $j$ is predicted.

2. Determining the top-k items or top-k users: In most practical settings, the merchant is not necessarily looking for specific ratings values of user-item combinations. Rather, it is more interesting to learn the top-k most relevant items for a particular user, or the top-k most relevant users for a particular item. The problem of determining the top-k items is more common than that of finding the top-k users. This is because the former formulation is used to present lists of recommended items to users in Web-centric scenarios. In traditional recommender algorithms, the “top-k problem” almost always refers to the process of finding the top-k items, rather than the top-k users. However, the latter formulation is also useful to the merchant because it can be used to determine the best users to target with marketing efforts.

The two aforementioned problems are closely related. For example, in order to determine the top-k items for a particular user, one can predict the ratings of each item for that user. The top-k items can be selected on the basis of the predicted rating. In order to improve efficiency, neighborhood-based methods pre-compute some of the data needed for prediction in an offline phase. This pre-computed data can be used in order to perform the ranking in a more efficient way.
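The reduction from top-k recommendation to rating prediction can be sketched as below. The function name `top_k_items`, the `predict(user, item)` callable, and the lookup table of predicted ratings are hypothetical stand-ins for whatever base predictor is available; the sketch does not include the offline pre-computation mentioned above.

```python
import heapq

def top_k_items(predict, user, ratings_row, k):
    """Return the k unrated items with the highest predicted rating.

    predict     -- hypothetical base predictor, called as predict(user, item)
    ratings_row -- the user's row of the ratings matrix, None where unobserved
    """
    unrated = [j for j, r in enumerate(ratings_row) if r is None]
    # One call to the base predictor per unrated item.
    scored = [(predict(user, j), j) for j in unrated]
    return [j for _, j in heapq.nlargest(k, scored)]

# Stand-in predictor backed by a made-up table of predicted ratings.
PREDICTED = {(0, 1): 3.5, (0, 3): 4.8, (0, 4): 2.1}

def lookup(user, item):
    return PREDICTED[(user, item)]

row_of_user_0 = [5, None, 4, None, None]
recommended = top_k_items(lookup, 0, row_of_user_0, k=2)  # [3, 1]
```

The base predictor is applied once for every item the user has not rated, so in this naive form the cost of serving a single user grows with the number of items.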

This chapter will discuss various neighborhood-based methods. We will study the impact of some properties of ratings matrices on collaborative filtering algorithms. In addition, we will study the impact of the ratings matrix on recommendation effectiveness and efficiency. We will discuss the use of clustering and graph-based representations for implementing neighborhood-based methods. We will also discuss the connections between neighborhood methods and regression modeling techniques. Regression methods provide an optimization framework for neighborhood-based methods. In particular, the neighborhood-based method can be shown to be a heuristic approximation of a least-squares regression model [72]. This approximate equivalence will be shown in Section 2.6. Such an optimization framework also paves the way for the integration of neighborhood methods with other optimization models, such as latent factor models. The integrated approach is discussed in detail in Section 3.7 of Chapter 3.

This chapter is organized as follows. Section 2.2 discusses a number of key properties of ratings matrices. Section 2.3 discusses the key neighborhood-based collaborative filtering algorithms. Section 2.4 discusses how neighborhood-based algorithms can be made faster with the use of clustering methods. Section 2.5 discusses the use of dimensionality reduction methods for enhancing neighborhood-based collaborative filtering algorithms.

