Using RecDB - Database management system support for collaborative filtering recommender system

Since RecDB is implemented inside a database management system, it accepts relational tables as input. The recommender input data is mainly a user/item ratings table that contains a set of users U, a set of items I, and a set of ratings. Ratings represent users expressing their opinions over items. Opinions can be a numeric rating (e.g., one to five stars) or unary (e.g., Facebook “check-ins”). Ratings may also represent purchasing behavior (e.g., Amazon). Figure 3.2 gives an example of movie recommendation data.

RecDB lets system users decide freely which attributes and which recommendation algorithm are used to build a recommender. To this end, users declare a new recommender with a SQL-like clause that specifies the recommender's input data source and recommendation algorithm. This section focuses on how users interact with the system. In particular, Section 3.2.1 explains the SQL clause for creating a new recommender, while Section 3.2.3 explains the SQL for querying a recommender. The internals of RecDB that enable this interface, i.e., indexing, maintenance, query processing, and optimization, are described in later sections.

3.2.1 Creating a Recommender

To allow creating a new recommender, RecDB employs a new SQL statement, called CREATE RECOMMENDER, as follows:

CREATE RECOMMENDER <Recommender Name> ON <Ratings Table>
USERS FROM <Users ID Column>
ITEMS FROM <Items ID Column>
RATINGS FROM <Ratings Value Column>
USING <Recommendation algorithm>

The recommender creation SQL, presented above, has the following parameters: (1) Recommender name is a unique name assigned to the created recommender. (2) Ratings Table is the table that contains the input user/item ratings data (e.g., see Figure 3.2). (3) Users ID Column, Items ID Column, and Ratings Value Column are the columns containing the users, items, and ratings data in the ratings table. (4) Recommendation algorithm is the algorithm used to build the recommender. Currently, RecDB supports three main recommendation algorithms (with their variants): (a) Item-Item Collaborative Filtering with Cosine (abbr. ItemCosCF) or Pearson Correlation (abbr. ItemPearCF) similarity functions, (b) User-User Collaborative Filtering (abbr. UserCosCF / UserPearCF), and (c) Regularized Gradient Descent Singular Value Decomposition (abbr. SVD). If no recommendation algorithm is specified, RecDB employs by default the ItemCosCF algorithm.

UserID    uVector
Alice     {⟨‘Spartacus’, 1.5⟩}
Bob       {⟨‘Inception’, 3.5⟩; ⟨‘Spartacus’, 4.5⟩; ⟨‘The Matrix’, 2⟩}
Carol     {⟨‘Inception’, 1⟩; ⟨‘Spartacus’, 2⟩}
Eve       {⟨‘Inception’, 1⟩; ⟨‘The Matrix’, 2.5⟩}

(d) Item-Users Vector Table

Item            iVector
‘Spartacus’     {⟨Alice, 1.5⟩; ⟨Bob, 4.5⟩; ⟨Carol, 1⟩}
‘Inception’     {⟨Bob, 3.5⟩; ⟨Carol, 1⟩; ⟨Eve, 1⟩}
‘The Matrix’    {⟨Bob, 2⟩; ⟨Eve, 2.5⟩}

Fig. 3.3: Rating Table Storage Representation
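As a concrete illustration of the CREATE RECOMMENDER parameters, the following minimal Python sketch renders a statement from its arguments and makes the ItemCosCF default explicit. The helper create_recommender_sql and the names MovieRec, MovieRatings, uid, iid, and ratingval are hypothetical and chosen only for this example; the sketch builds a string and does not talk to RecDB.

```python
# Algorithm abbreviations as listed in the text.
SUPPORTED_ALGORITHMS = {"ItemCosCF", "ItemPearCF", "UserCosCF", "UserPearCF", "SVD"}


def create_recommender_sql(name, ratings_table, users_col, items_col,
                           ratings_col, algorithm=None):
    """Render a CREATE RECOMMENDER statement (hypothetical helper)."""
    # Per the text, ItemCosCF is used when no algorithm is specified.
    algorithm = algorithm or "ItemCosCF"
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return "\n".join([
        f"CREATE RECOMMENDER {name} ON {ratings_table}",
        f"USERS FROM {users_col}",
        f"ITEMS FROM {items_col}",
        f"RATINGS FROM {ratings_col}",
        f"USING {algorithm}",
    ])


# Table and column names below are assumptions for illustration only.
statement = create_recommender_sql("MovieRec", "MovieRatings",
                                   "uid", "iid", "ratingval")
print(statement)
```

Passing an explicit algorithm such as "ItemPearCF" as the last argument changes only the USING line.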

Recommender Initialization. The initialization process consists of two steps: (I) User/Item Rating Re-Arrangement: RecDB first re-arranges the user/item rating matrix data on disk and stores it in a vector representation format. That format represents the user/item ratings matrix as a table, namely the User Vector Table. The user vector table consists of two columns: UserID, a unique user identifier, and uVector, a set of key-value pairs ⟨iid, rating⟩ that contains every item iid rated by the user and the respective rating value. The user vector table is indexed by a primary key index created on the UserID field. Figure 3.3 gives the User Vector Table for the ratings matrix given in Figure 3.2. To efficiently access item vectors instead of user vectors, we also store the transpose of the user/item ratings matrix, called the Item Vector Table, which also consists of two columns: ItemID and iVector (see Figure 3.3).
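The re-arrangement step can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions, not RecDB's on-disk format: the function name build_vector_tables is hypothetical, dictionaries keyed by UserID and ItemID stand in for the indexed tables, and the sample rows are a subset of the ratings shown in the figure.

```python
from collections import defaultdict


def build_vector_tables(ratings):
    """Turn (user_id, item_id, rating) rows into user and item vector tables."""
    user_vectors = defaultdict(dict)  # UserID -> {iid: rating}  (uVector)
    item_vectors = defaultdict(dict)  # ItemID -> {uid: rating}  (iVector)
    for uid, iid, rating in ratings:
        user_vectors[uid][iid] = rating
        # The item table is the transpose of the same matrix.
        item_vectors[iid][uid] = rating
    return dict(user_vectors), dict(item_vectors)


ratings = [
    ("Alice", "Spartacus", 1.5),
    ("Bob", "Inception", 3.5),
    ("Bob", "Spartacus", 4.5),
    ("Bob", "The Matrix", 2.0),
    ("Eve", "Inception", 1.0),
    ("Eve", "The Matrix", 2.5),
]
user_vectors, item_vectors = build_vector_tables(ratings)
print(user_vectors["Bob"])
print(item_vectors["The Matrix"])
```

Keeping both tables doubles the storage but lets an item-based algorithm read one item's ratings with a single keyed lookup instead of scanning every user vector.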

(II) Model Building: In this step, RecDB employs a set of user-defined functions that train a recommendation model, RecModel, using the input data. The format of the model depends on the underlying recommendation algorithm. For example, a recommendation model for the item-item collaborative filtering (cosine similarity measure) model (ItemCosCF) [6] represents a similarity list of the tuples ⟨ip, iq, SimScore⟩, where
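To make the shape of such a model concrete, here is a minimal Python sketch that builds a list of (ip, iq, SimScore) tuples from item vectors. It assumes plain cosine similarity over the full item vectors, with missing ratings treated as zero; RecDB's actual user-defined functions may compute the score differently, and the name item_cosine_model is hypothetical.

```python
import math
from itertools import combinations


def item_cosine_model(item_vectors):
    """Return a similarity list of (ip, iq, SimScore) tuples.

    item_vectors maps each item ID to a {uid: rating} dictionary.
    """
    model = []
    for ip, iq in combinations(sorted(item_vectors), 2):
        vp, vq = item_vectors[ip], item_vectors[iq]
        # Only users who rated both items contribute to the dot product.
        dot = sum(vp[u] * vq[u] for u in vp.keys() & vq.keys())
        if dot == 0:
            continue  # e.g., no common raters: no entry in the list
        norm_p = math.sqrt(sum(r * r for r in vp.values()))
        norm_q = math.sqrt(sum(r * r for r in vq.values()))
        model.append((ip, iq, dot / (norm_p * norm_q)))
    return model


example_vectors = {
    "A": {"u1": 1.0, "u2": 2.0},
    "B": {"u1": 2.0, "u2": 4.0},
    "C": {"u3": 5.0},
}
model = item_cosine_model(example_vectors)
print(model)
```

Each unordered item pair appears at most once, so a lookup for a given item has to check both tuple positions.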

3.2.2 Updating a Recommender

To get the most accurate results, RecModel should be updated whenever a user u inserts a new rating for an item i. However, doing so is infeasible, as collaborative recommendation algorithms employ complex computational techniques that are very costly to update. The maintenance procedure differs based on the underlying recommendation algorithm specified in the CREATE RECOMMENDER statement, yet most of the algorithms may call for a complete model rebuild to incorporate any new update. To avoid this prohibitive cost, we update RecModel only when the number of new updates reaches a certain percentage N% (a system parameter) of the number of entries used to build the current model. We do so because an appealing quality of most supported recommendation algorithms is that as RecModel matures (i.e., more data is used to build it), more updates are needed to significantly change the recommendations produced from it.
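The threshold rule can be sketched as a small counter. The class below is a hypothetical illustration, not RecDB code: it assumes every new rating counts as one update and that, after a rebuild, the new model size is the old size plus the pending updates.

```python
class RebuildPolicy:
    """Signal a model rebuild once pending updates reach N% of the model size."""

    def __init__(self, model_size, threshold_pct):
        self.model_size = model_size        # entries used to build the current model
        self.threshold_pct = threshold_pct  # the system parameter N
        self.pending = 0                    # updates seen since the last rebuild

    def record_update(self):
        """Count one new rating and report whether a rebuild is now due."""
        self.pending += 1
        # Integer form of: pending / model_size >= N / 100
        return self.pending * 100 >= self.threshold_pct * self.model_size

    def rebuilt(self):
        """Call after the model is rebuilt; pending ratings join the model."""
        self.model_size += self.pending
        self.pending = 0


policy = RebuildPolicy(model_size=1000, threshold_pct=10)
first_round = [policy.record_update() for _ in range(100)]
policy.rebuilt()
second_round = [policy.record_update() for _ in range(110)]
```

With N = 10, the first rebuild is triggered by the 100th update, and the next one only by the 110th, which reflects the observation that a more mature model needs more updates before it is rebuilt.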

3.2.3 Querying a Recommender

Once a recommender is created and initialized using the CREATE RECOMMENDER statement, users can issue SQL queries that harness the created recommender to produce recommendations for end-users, as follows:

SELECT <Select Clause>
FROM <Rating Table>
RECOMMEND <ItemID> TO <UserID> ON <RatingVal> USING <Recommendation Algorithm>
WHERE <Where Clause>

Query Syntax. The SELECT and WHERE clauses are the same as in any SQL query. The FROM clause may directly accept a ratings table with the same schema as the one passed to the CREATE RECOMMENDER statement. The RECOMMEND clause is responsible for predicting how much users would like items they have not yet seen. The application developer also needs to specify the item ID column (<ItemID>), the user ID column (TO <UserID>), and the rating value column (ON <RatingVal>).

Query Semantics. The RECOMMEND clause returns a set of tuples S such that each tuple s ∈ S, s = ⟨uid, iid, ratingval⟩, represents a predicted rating score (ratingval) that a user (uid) would give to an unseen item (iid) based on the recommendation algorithm specified in the USING clause.
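These semantics can be illustrated with a short Python sketch that enumerates (uid, iid, ratingval) tuples for unseen items only. The function recommend and its predict argument are hypothetical: predict stands in for whatever model the USING clause selects, and the constant predictor used here is a placeholder, not a real algorithm.

```python
def recommend(user_vectors, item_ids, predict):
    """Return (uid, iid, ratingval) for every item a user has not rated.

    predict(uid, iid) is an assumed callable that returns a predicted rating.
    """
    results = []
    for uid, rated in user_vectors.items():
        for iid in item_ids:
            if iid not in rated:  # items the user already rated are skipped
                results.append((uid, iid, predict(uid, iid)))
    return results


user_vectors = {
    "Alice": {"Spartacus": 1.5},
    "Bob": {"Inception": 3.5, "Spartacus": 4.5},
}
item_ids = ["Inception", "Spartacus", "The Matrix"]
tuples = recommend(user_vectors, item_ids, predict=lambda uid, iid: 3.0)
alice_only = [t for t in tuples if t[0] == "Alice"]
```

Filtering the result by user, as in the last line, plays the role that a WHERE predicate on the user ID column plays in the query.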
