As mentioned, this family of techniques combines node-based and topology-based methods, because the node features it uses include both similarity features derived from topology-based metrics and social-network-specific features derived from node-based metrics, such as textual information, domain knowledge and other attributes. Many learning-based link prediction methods have been proposed in recent years; they build on the features provided by the relatively classic methods introduced above, covering both internal attributes and external information [41, 79]. Most learning-based methods treat link prediction as a typical feature-based classification problem [84, 116, 161].

2.5.1 Feature Based Classification

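Let $x, y \in V$ be nodes in a graph $G(V, E)$ and let $l(x, y)$ be the label of the node-pair instance $(x, y)$. We then have:

$$
l(x, y) =
\begin{cases}
+1 & \text{if } (x, y) \in E \\
-1 & \text{if } (x, y) \notin E
\end{cases}
$$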

Here a pair of nodes is labelled positive if a link connects the two nodes, and negative otherwise. This is a typical binary classification problem, and many supervised learning models can be used to solve it, for example decision trees, support vector machines [23] and Bayesian classifiers. When using this approach, it is important to define, select and collect appropriate features from the social network. Features provided by node-based, topology-based and social-theory-based metrics can all be used. However, the extracted features are very sparse in terms of semantic description [85, 150].

Providing a unified way to describe the abundant and diverse academic information has been a debated and challenging issue worldwide. Newman [105, 106] first raised the idea of analysing the structure of scientific collaboration networks by calculating statistics on co-authorship relationships. FRBR (Functional Requirements for Bibliographic Records) [76] is an entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA); it is an application model for describing bibliographic records in the area of academic publication. Furthermore, the Scholarly Works Application Profile (SWAP), built on top of FRBR as its semantic implementation, introduced a way of describing electronic publications such as peer-reviewed journal articles, working papers and theses. FOAF (Friend of a Friend) [21, 155], on the other hand, is a metadata standard focused on describing people and the relationships among them, which have become the basic elements of a virtual community. Another widely accepted metadata standard is Dublin Core [142], a set of predefined properties for describing documents across disciplines. Finally, the MarcOnt ontology proposed by Dabrowski [40] is a unified bibliographic ontology built on an analysis of a wide range of existing bibliographic standards, including MARC21, ISBN, BibTeX and FRBR, that address the semantic description of academic literature.
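Returning to the binary labelling rule defined at the start of this subsection, the following is a minimal sketch of how node pairs could be labelled, turned into feature vectors and passed to a classifier. Everything in it is an illustrative assumption rather than a method from the cited works: the toy graph, the two topology-based features (common-neighbour count and Jaccard coefficient), and the small logistic-regression trainer standing in for any of the supervised models named above.

```python
import math
from itertools import combinations

# Hypothetical toy graph G(V, E): undirected edges stored as frozensets.
EDGES = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]}
NODES = sorted({v for e in EDGES for v in e})


def neighbours(node, edges):
    """Return the set of nodes adjacent to `node`."""
    return {v for e in edges if node in e for v in e if v != node}


def label(x, y, edges):
    """Return +1 if (x, y) is in E and -1 otherwise."""
    return 1 if frozenset((x, y)) in edges else -1


def features(x, y, edges):
    """Common-neighbour count and Jaccard coefficient, ignoring x and y themselves."""
    nx = neighbours(x, edges) - {y}
    ny = neighbours(y, edges) - {x}
    common, union = len(nx & ny), len(nx | ny)
    return [float(common), common / union if union else 0.0]


def predict(w, b, f):
    """Logistic model: estimated probability that the pair is linked."""
    z = b + sum(wj * fj for wj, fj in zip(w, f))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for f, lab in zip(X, y):
            err = predict(w, b, f) - (1.0 if lab == 1 else 0.0)
            w = [wj - lr * err * fj for wj, fj in zip(w, f)]
            b -= lr * err
    return w, b


# One instance per unordered node pair: feature vector plus +1/-1 label.
PAIRS = list(combinations(NODES, 2))
X = [features(x, y, EDGES) for x, y in PAIRS]
Y = [label(x, y, EDGES) for x, y in PAIRS]
W, B = train(X, Y)
```

Calling `predict(W, B, features("a", "d", EDGES))` then gives an estimated link probability for an unlinked pair. For simplicity the sketch computes features and labels on the same graph; a real evaluation would more plausibly take features from an earlier snapshot of the network and labels from a later one.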

2.5.2 Matrix Factorization

Link prediction can also be regarded as a matrix completion problem, which matrix factorization can be used to solve [47, 100]. The graph $G$ considered in link prediction can be factorised as $G \approx L(U \Lambda U^{T})$ for $U \in \mathbb{R}^{n \times k}$, $\Lambda \in \mathbb{R}^{k \times k}$ and link function $L(\cdot)$, where $n$ is the number of nodes and $k$ is the number of latent features. Each node $i$ has a corresponding latent vector $u_i \in \mathbb{R}^{k}$, and the predicted score of this model for a node pair $(i, j)$ is $L(u_i^{T} \Lambda u_j)$.
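The pairwise score described above can be sketched in a few lines. This is only an illustration under stated assumptions: the link function is taken to be the logistic sigmoid, and the latent vectors in `U` and the matrix `LAMBDA` are made-up toy values rather than learned parameters.

```python
import math


def sigmoid(z):
    """Logistic link function, assumed here as one possible choice of L."""
    return 1.0 / (1.0 + math.exp(-z))


def score(u_i, lam, u_j, link=sigmoid):
    """Predicted score L(u_i^T Lambda u_j) for the node pair (i, j)."""
    k = len(u_i)
    bilinear = sum(u_i[r] * lam[r][c] * u_j[c] for r in range(k) for c in range(k))
    return link(bilinear)


# Hypothetical latent vectors for n = 3 nodes with k = 2 latent features.
U = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
# Hypothetical k x k interaction matrix; the identity reduces the score to L(u_i . u_j).
LAMBDA = [[1.0, 0.0], [0.0, 1.0]]
```

With these values, nodes 0 and 1 have similar latent vectors and so receive a higher score than nodes 0 and 2, whose vectors are orthogonal. In an actual model, `U` and `LAMBDA` would be fitted to the observed adjacency matrix.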

2.5.3 Recommender System

Recommender systems help users find suitable items by analysing their preferences. At their core are recommendation algorithms, which are mainly classified into content-based (CB) methods [109, 128] and collaborative filtering (CF) methods [15, 64, 124, 146]. Content-based methods extract characteristic features of users and items from their profiles and then recommend items that are similar in content to items the user has liked in the past, or that match the user's attributes [15]. These methods closely resemble the common-neighbour measure used in node-based similarity calculation. In contrast, collaborative filtering methods recommend items based on the preferences of other similar users or items, and have been used extensively in well-known commercial systems [64], such as ebay.com. There are two main families of collaborative filtering: neighbourhood methods and model-based methods [124]. The former predict a user's rating of a target item from the ratings of other highly correlated users or items, whereas the latter use the user-item matrix, in whole or in part, to train a prediction model.

Collaborative Filtering

As the major approach to recommendation, collaborative filtering has been widely applied in different fields because it relies on the user-item interaction history [10, 53, 64]. There are two types of collaborative filtering approach: neighbourhood-based and model-based. They differ in how they use the user-item ratings: the former uses the stored ratings directly in the prediction, whereas the latter uses these ratings to learn a predictive model.
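To make the neighbourhood-based variant concrete, the sketch below predicts a missing rating directly from stored ratings. It is a minimal illustration under assumptions of our own: user-based neighbourhoods, cosine similarity over co-rated items, a similarity-weighted average as the predictor, and a hypothetical three-user rating table.

```python
import math

# Hypothetical stored ratings: user -> {item: rating}.
RATINGS = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i1": 4, "i2": 2, "i3": 4},
    "carol": {"i1": 1, "i2": 5, "i3": 2},
}


def cosine(ra, rb):
    """Cosine similarity between two users, restricted to items both have rated."""
    shared = set(ra) & set(rb)
    if not shared:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in shared)
    norm_a = math.sqrt(sum(ra[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(rb[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of other users' ratings of `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    # None signals that no other user has rated the item.
    return num / den if den else None
```

Here `predict_rating("alice", "i3", RATINGS)` falls between carol's rating of 2 and bob's rating of 4, and closer to bob's, because bob's ratings on the shared items are more similar to alice's. No model is trained, which is the distinguishing feature of the neighbourhood-based family.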

Currently, matrix factorization (MF) is one of the most popular model-based CF methods. MF techniques, including principal component analysis (PCA) [100], singular value decomposition (SVD) [134], regularized matrix factorization (RMF) and latent Dirichlet allocation (LDA) [18], have been applied particularly successfully in recommender systems. However, these methods also suffer from the data sparsity problem. To learn the characteristics of users and items, traditional MF techniques map users and items into two low-rank matrices, one user-specific and one item-specific [52]. A large number of variants have since been proposed. For instance, Koren [78] proposed a method named SVD++, which combines SVD with neighbourhood information. Ma et al. [96] extended RMF by integrating two social regularisation terms that constrain the matrix factorization objective function, under the assumption that friends with similar or dissimilar tastes are treated differently in the social network. Yelong Shen et al. [126] proposed a joint Personal and Social Latent Factor (PSLF) model for social recommendation. Santosh et al. [72] proposed an item-based method for generating top-N recommendations that learns the item-item similarity matrix as the product of two low-dimensional latent factor matrices. Chu-Xu Zhang et al. [159] considered the neighbours' impact on the interest of each user in the same LFS and proposed a recommendation model based on clustering of users (UCMF). Szwabe et al. [134] combined the random indexing (RI) technique with SVD to describe the content features of items.
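As a concrete illustration of the basic model that these variants extend, the following sketch fits a regularized matrix factorization to a handful of observed ratings. It is not the method of any cited work: the squared-error objective with L2 penalties, the stochastic gradient descent updates, the hyperparameter values and the toy rating triples are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]


def dot(a, b):
    """Inner product of two latent vectors."""
    return sum(x * y for x, y in zip(a, b))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return math.sqrt(sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings))


def factorise(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    initial_error = rmse(P, Q, ratings)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            p_old = P[u][:]  # keep the pre-update user vector for the item update
            P[u] = [p + lr * (err * q - reg * p) for p, q in zip(P[u], Q[i])]
            Q[i] = [q + lr * (err * p - reg * q) for p, q in zip(p_old, Q[i])]
    return P, Q, initial_error
```

After `P, Q, initial_error = factorise(RATINGS, 3, 3)`, the value `dot(P[1], Q[1])` is the model's estimate for a user-item pair that was never observed. The `reg` term is what makes the factorization "regularized": it penalises large latent vectors, which matters when, as noted above, the rating matrix is sparse.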