Methods - Ensemble methods for top-N recommendation

We developed a model ensemble over local sparse linear models for top-N recommendation. Following the idea from Lee [29], a set of anchor pairs is first selected. With respect to each of the anchor pairs, a local SLIM model is trained. The local models are then ensembled via combining their results or combining the models directly.

4.3.1 Anchor Pair Selection

We first randomly select a user u∗ out of m users as the anchor user. Then from u∗, we randomly select an item i∗ that u∗ has purchased. This item will be the anchor item. The user-item pair (u∗, i∗) will be the anchor pair with respect to which each local model will be built.

4.3.2 Training Data Selection for Local Models

Training data for each local model with respect to each anchor pair (u∗, i∗) are selected according to: 1). user and item similarities, and 2). user and item popularities.

Similarity-based Training Data Selection

In this method, we use a radial basis function (RBF) kernel to measure the similarity between a user-item pair (u0, i0) and the anchor pair (u∗, i∗), and apply two

different similarity-based schemes to select training data. The RBF kernel over the two pairs is defined as follows [29]:

Kui((u0, i0), (u∗, i∗)) = Ku(u0, u∗) × Ki(i0, i∗) (4.5)

where both Ku and Ki are RBF kernels:

Ku(u0, u∗) = exp(−γkR(u0, ·) − R(u∗, ·)k2), (4.6)

Ki(i0, i∗) = exp(−γkR(·, i0) − R(·, i∗)k2). (4.7)

In Equation 4.6 and Equation 4.7, R(u, ·) (R(·, i)) is purchase profile of by user u (of item i). Since both Ku and Ki are valid kernels, Kui is also a valid kernel.

The definition in Equation 4.6 and Equation 4.7 follow the idea in user-based and item-based Collaborative Filtering (CF), respectively, that calculates user and item similarities directly from user-item matrix R. This is different from the idea in Lee et. al. [29], where the user and item similarities are calculated from their latent factors that are obtained via Matrix Factorization (MF) over R. By doing the CF-based user and item similarities, we can avoid unnecessary complications related to the non-convex properties of typical MF approaches and their high computational costs. Given the similarities, all the user-item pairs are weighted by their similarities with the anchor pair (u∗, i∗) as follows, Each local model is trained using the selected data RKui

u∗_,i∗(u0, i0) defined as follows,

RKui

u∗_,i∗(u0, i0) = K_ui((u0, i0), (u∗, i∗))R(u0, i0), (4.8)

where RKui

u∗_,i∗ has values in [0, 1] (i.e., any floating values between 0 and 1), and will

Popularity-based Training Data Selection

In this method, we select the user-item pairs such that the selected users/items have similar popularities as the anchor user/item. The user popularity is defined as the number of items that the user has purchased, and the item popularity is defined as the number of users who have purchased the item. For each anchor pair (u∗, i∗), we first select α% of all the users who have the closest but lower or higher popularity than u∗, respectively. From the selected users, we select α% of all the items that have the closest but lower or higher popularity than i∗, respectively. The interactions between the selected users and items will be used as training data. This training data selection method is referred to as SPui.

4.3.3 Model Combination and Recommendation Generation

After training a SLIM model on each of K selected training datasets with respect to K anchor pairs, we ensemble the model results or the models themselves in order to produce recommendations.

Model Result Aggregation

Each local model first produces a recommendation list. Each of the items that has ever appeared in any of the K recommendation lists is then scored. These items are re-ranked using the scores into a new ranked list and the top-N items in the new list will be recommended. This result-aggregation based method is referred to as MRA. To score the items, we use the following two approaches. The first one is the Borda [33] approach, which scores each item using the sum of their ranking positions from all the recommendation lists. The Borda scoring approach is denoted as Cr.

The second scoring approach is to use the weighted sum of recommendation scores from all the recommendation lists. In specific, the score of a user u on an item i,

denoted as ˜r˜u,i, is calculated using Equation 4.4 as in LLORMA. This scoring approach

is denoted as Cs.

Linear SLIM Model Ensemble

Instead of combining recommendation results from local models, we can also com- bine models directly. In the case of SLIM, the coefficient matrices from local models are linearly combined as follows:

W_i,je = 1 |{k|Wk i,j 6= 0, k = 1, · · · , K}| K X k=1 W_i,jk, (4.9)

where Wk is the k-th model (coefficient matrix) with respect to the k-th anchor pair, {Wi,j|Wi,jk 6= 0, k = 1, · · · , K} is the set of coefficients in which Wi,jk 6= 0, | · | is the

cardinality of a set , and We is the ensembled model (coefficient matrix). We use We to produce recommendations as in Equation 4.1.

Note that only a certain portion of Wk that corresponds to the selected training items can have non-zero values. Thus, the linear combination of multiple Wk_{’s resem-}

bles using multiple small plates to approximate a manifold. Thus, it may represent non-linear relations among items. This method is referred to as LSME.

4.4 Materials 4.4.1 Datasets

We evaluated different methods on a benchmark dataset ML100K1, and its sparsified versions. From the original dataset (referred to as ML100K-1), we generated three sparsified datasets (referred to as ML100K-2, ML100K-3 and ML100K-4, respectively). The first sparsified dataset MK100K-2 is generated by randomly selecting 50% of purchases from ML100K-1, The second/third sparsified dataset ML100K- 1_{https://grouplens.org/datasets/movielens/}

3/ML100K-4 is generated by randomly selecting 50% of purchases from ML100K- 2/ML100K-3. Table 4.1 represents the dataset summaries.

Table 4.1.: Datasets Used in Evaluation

dataset #users #items #ratings rsize csize density ML100K-1 943 1,682 100,000 106.05 59.45 6.30% ML100K-2 943 1,682 49,760 52.77 29.58 3.14% ML100K-3 943 1,682 24,647 26.14 14.65 1.55% ML100K-4 943 1,682 12,086 12.82 7.19 0.76%

Columns of “#users”, “#items” and “#ratings” represent the number of users, items and ratings in the datasets, respectively. Columns of “rsize” and “csize” represent the average number of ratings for each user and each item, respectively. Column of “density” represents the density of each dataset (i.e., density = #ratings/(#users × #items)).

4.4.2 Evaluation Methodology and Metrics

We applied 5-time Leave-One-Out cross validation (LOOCV) to evaluate the performance of different methods. In each run, one of the purchases of each user is randomly selected into the testing set, and the remaining purchases are used in the training set. For SLIM, we search the parameters in the following ranges. For ML100K-1, the range of λ is {0.00001,0.0001,0.001,0.01,0.1,1,2,5}, the range of β is {0.1,1,2,5,10,15}. For ML100K-2, the range of λ is {0.001,0.01,0.1,1,2,5}, the range of β is {5,10,15,20,25}. For ML100K-3, the range of λ is {0.001,0.01,0.1,1,2,5}, the range of β is {5,10,15,20,25}. For Local SLIM with combined Models(LSM) based on popularity, we find the best performance in the following parameters. For ML100K-1, the range of λ is {2,5,10}, the range of β is {0.01,0.1,1}, the range of n (number of local models) is {10,20,50,60}, the range of bandwidth is {0.3,0.4}. For ML100K-2, the range of λ is {0.0001,0.001,0.01,0.1,1,2,5}, the range of β is {5,10,15,20}, the range of n (number of local models) is {50,60,70}, the range of bandwidth is {0.3,0.4}. ‘For ML100K-3, the range of λ is {0.0001,0.001,0.01,0.1,1}, the range of β is {2,5,10,15}, the range of n (number of local models) is {50,60,70}, the range of bandwidth is {0.3,0.4}. The performance of LSM based on popularity and LSIM on the above datasets is shown in Table 4.2. For each user, a size-N (N = 10 by default) ranked

list of items is recommended from the ensemble model trained using the training set. The evaluation is performed by comparing the recommendations for each user and the left-out item of that user in the testing set. We use Hit Rate (HR) and the Average Reciprocal Hit-Rank (ARHR) [11] as the evaluation metrics. HR is defined as the rate of correctly recommended items, that is,

HR = #hits

#users, (4.10)

where #users is the total number of users in the testing set, and #hits is the number of users who have their testing items correctly recommended (i.e., hit). ARHR is a weighted version of HR defined as follows:

ARHR = 1 #users #hits X i=1 1 pi (4.11)

where if an item of a user is hit, pi is the position of the item in the ranked recommen-

dation list. Thus, ARHR measures how strongly an item is recommended, in which the weight is the reciprocal of the hit position in the recommendation list. Higher HR and ARHR values indicate better performance.

4.5 Experimental Results

In document Ensemble methods for top-N recommendation (Page 53-58)