Collaborative Filtering Cross-Domain Recommenders

2.2 Single and Cross-Domain Recommender Systems

2.2.2 Collaborative Filtering Cross-Domain Recommenders

Cross-domain collaborative filtering aims to transfer user’s rating pattern from source (auxiliary) domains to a target domain for the purpose of alleviating the sparsity problem and providing better target recommendations. Most of the work on cross-domain collaborative filtering has been either on manually picked, naturally close domains (e.g. movie and music) or on one domain that is randomly split into datasets considered as distinct domains.

As an example of recent work, Tirushi and Kuflik presented initial results of a work in progress that ranked and mapped between pairs of domains based on the ability to create recommendations in domain one using ratings of items from the other domain [70]. They collected 2, 148 Facebook profiles, which contained items (likes) in four domains: Music, Movies, TV shows, and Books. Their initial results, with cross-domain collaborative filtering on a joint space of domains, showed that there are differences between the source domains with respect to the quality of the recommendations.

Zhang et al. proposed MCF and MCF-LF methods that exploit the relationships between domains and perform multiple collaborative filtering tasks simultaneously [82]. They used a probabilistic framework which uses probabilistic matrix factorization to model the rating problem in each domain and allows the knowledge to be adaptively transferred across different domains by automatically learning a link function between domains. Their experiments were performed on MovieLens and Book Crossing datasets separately, each of which are divided randomly into five simulated domains. This approach does not need shared users or items between the domains.

In [26] constrained collective matrix factorization (CCMF) was proposed as an extension of collective matrix factorization ([66]) to iteratively factorize the rating matrices in source

and target domain. The authors added a constraint on the user feature matrices for target domain and auxiliary domain. This approach assumes sharing users in the datasets and the experiments are on a simulated dataset sampled from the Netflix dataset and a real dataset crawled from Douban. Klami et al. also provided a method based on collective matrix factorization (CMF) [30]. This method allows each of the matrices to have a separate low- rank structure independent of the other matrices, as well as structures that are shared only by a subset of them. They tested the method on MovieLens and Flickr data.

Lu et al. proposed Selective Transfer Learning that transfers the data using a criterion based on empirical prediction error and its variance [45]. It extends Gaussian Probabilistic Latent Semantic Analysis (GPLSA) to Transferred Gaussian Probabilistic Latent Semantic Analysis (TGPLSA) model, then applies TGPLSA as base model over weighted instances for Selective Transfer Learning for Collaborative Filtering (STLCF). In this case, the approach needs either shared users or shared items.

Moreno et al. proposed a transfer learning technique (TALMUD) that extracts knowledge from multiple domains containing rich data (e.g., movies and music) and generates recommendations for a sparse target domain (e.g., games) [50]. The approach learns the degree of relatedness between different source domains and the target domain, without re- quiring overlapping users between domains. They tested their approach on Netflix, Jester, Music Loads, and Games Loads data.

Zhao et al. proposed a framework to construct entity correspondence between domains with limited shared user or items [83]. They used active learning to facilitate knowledge transfer across recommender systems based on Maximum-Margin Matrix Factorization. Their setting of source and target domains is as following: Netflix → Netflix, DoubanMovie → DoubanBook and Netflix → DoubanMovie.

Hu et al. proposed a generalized Cross Domain Triadic Factorization (CDTF) model over the triadic relation user-item-domain based on CP tensor decomposition [25]. They leveraged user explicit and implicit feedback respectively, along with a genetic algorithm based weight parameters tuning algorithm to trade off influence among domains optimally. They experimented on Amazon data (music CDs, DVDs and VHS video tape domains) and social network dataset provided by KDD Cup 2012 with 4 anonymous item domains.

Twin-Bridge Transfer Learning (TBT) proposed in [65] reduces the sparsity in target data by transferring knowledge from dense auxiliary data with either shared user or item sets and the similarity graphs of users and items constructed from the learned latent factors. The authors tested their approach on MovieLens10M and Epinions datasets separately with simulated domains created by random separation of datasets.

Wu et al. proposed a fusion multi-domain semantic topics and syntax classes model based on hidden Markov model with latent Dirichlet allocation (HMM-LDA) [77]. In every sub-domain, the model uses HMM-LDA to extract sub-domain topic and class features. Then, the fusion model combines the multiple sub-domain models to extract the whole domain features. They used MovieLens and Book-Crossing dataset (book and movie as source, movie as target) for their experiments. This approach does not require shared users or items.

Xin et al. proposed a nonlinear transfer learning model, and used the radial basis function (RBF) kernel to map user features of multiple sites [78]. This approach consists of two steps: first, the initial feature vectors for users/items in source and target domains are learned separately using probabilistic matrix factorization; then, a group of regression functions (using support vector machine) are used to map the user latent feature in the auxiliary domain to the user latent feature in the target domain. The kernel trick is used in this second step. In this approach the users should be shared in the domains. Douban (movies) and DianPing (restaurants) are the datasets the authors experimented on.

Loni et al. used factorization machines on Amazon data (books, music CDs, DVDs and video tapes) for cross-domain collaborative filtering [42].

Gao et al. [21] proposed a cluster-level based latent factor model for cross-domain recommendations. They based their optimization problems on a joint non-negative matrix tri-factorization. The assumption behind this factorization is that there is a common latent rating pattern across the two domains (in addition to domain-specific latent rating pat- terns) that drives the useful shared information. They tested their method on MovieLens, EachMovie, and Book-Crossing datasets.

Iwata and Takeuchi proposed a method based on matrix factorization, assuming that latent vectors in different domains are generated from a common Gaussian distribution with

a full covariance matrix [27]. Neither users nor items were shared across domains. They tried their method on Movielens, EachMovie, Netflix, and Amazon review rating (Book, DVD, Electronics, Kitchen, Music and Video).

Liu et al. proposed the notion of Hyper-Structure Transfer (HST) and its model called the Minimal Orthogonal Tensor Approximation with Residuals (MOTAR) that transfers non- linearly correlated knowledge between domains [41]. This approach works on the domains with shared users. Movielens and DBLP (each citation is a rating, each category of MS research is a domain) are the datasets they have tested their approach on.

Mirbakhsh and Ling proposed cross-domain clustering-based matrix factorization on Amazon dataset (DVD, music, video, electronics, kitchen and housewares, and toys and games) and Epinions dataset (10 categories with the most observed ratings) in [46].

In document Canonical Correlation Analysis in Cross-Domain Recommendation (Page 38-41)