Performance Comparison - Embedding Approaches for Relational Data

4.5 Experiments

4.5.2 Performance Comparison

Performance of the proposed and competing methods are reported in Table 4.4. The proposed TransPES provides in general better performance than the competing ones, particularly for the larger and more complex dataset FB15k containing1,345

relation types. Although TransR provides good performance for the smaller dataset WN18, it performs less wells for the larger dataset FB15K. In terms of optimisation complexity, TransR requires to optimise much more variables than TransE and TransPES.

We also demonstrate how the TransPES performance changes against different settings of the embedding dimensionality (k), regularisation parameter (λ2) and mar-

gin parameter (γ) using the FB15K dataset. In each implementation, two parameters are fixed as the ones in the optimal configuration, different settings of the third parameter within the searching range are examined, for which the raw and filtered mean ranks, also the filtered hits@10 performance for both the validation and test

4.5 Experiments 106 Table 4.4: Performance comparison for WN18 and FB15k datasets. The best performance is highlighted in bold, and second best underlined.

Dataset WN18 FB15k

Metric Mean Rank Hits@10(%) Mean Rank Hits@10(%) Raw Filter Raw Filtered Raw Filtered Raw Filtered Unstructured [24] 315 304 35.3 38.2 1,074 979 4.5 6.3 RESCAL [67] 1,180 1,163 37.2 52.8 828 683 28.4 44.1 SE [23] 1,011 985 68.5 80.5 273 162 28.8 39.8 SME(linear) [24] 545 533 65.1 74.1 274 154 30.7 40.8 SME(bilinear) [24] 526 509 54.7 61.3 284 158 31.3 41.3 LFM [68] 469 456 71.4 81.6 283 164 26.0 33.1 TransE [128] 263 251 75.4 89.2 243 125 34.9 47.1 TransH [130] 318 303 75.4 86.7 211 84 42.5 58.5 TransR [131] 232 219 78.3 91.7 226 78 43.8 65.5 CTransR [131] 243 230 78.9 92.3 233 82 44 66.3 TransPES 223 212 71.6 81.3 198 66 48.05 67.3

sets are reported in Figures 4.2 and 4.2 (Cont.) . It can be seen that TransPES is less sensitive to the regularisation parameterλ2than to the embedding dimensionkand

margin parameterγ.

We further analyse the performance of the large dataset FB15K in detail using the detailed evaluation protocol suggested in [128], which classifies the hits@10 results according to four categories of relationship including 1-to-1 (1-1), 1-to-many (1-M), many-to-1 (M-1) and many-to-many (M-M).The corresponding results are shown in Table 4.5. It can be seen from the table that the proposed algorithm consistently outperforms most the competing ones, provides similarly good performance as the cluster-based TransR (CTransR). As expected, TransPES provides satisfactory performance to predict head entity in the 1-to-1, 1-to-many relationships and predict tail entity in the 1-to-1 and many-to-1 relationships.

We conduct deeper analysis for the FB15k dataset using the proposed evaluation scheme as explained in Section 4.4.4, based on which 4,336 (7.3%) reciprocal triplets and 41,481 (70.2%) reverse triplets are identified, and the remaining triplets correspond to “the others" type. Most test triplets can find their reciprocal or repetitive

4.5 Experiments 107 20 50 100 k 0 50 100 150 200 250 mean rank valid test

(a) raw mean rank (k)

20 50 100 k 0 20 40 60 80 100 120 mean rank valid test

(b) filtered mean rank (k)

20 50 100 k 0 10 20 30 40 50 60 70 hit@10 (%) valid test (c) filtered hits@10 (k) 0.001 0.01 0.1 λ 2 0 50 100 150 200 250 mean rank valid test

(d) raw mean rank (λ2)

0.001 0.01 0.1 λ 2 0 10 20 30 40 50 60 70 80 mean rank valid test

(e) filtered mean rank (λ2)

0.001 0.01 0.1 λ 2 0 10 20 30 40 50 60 70 hit@10 (%) valid test (f) filtered hits@10 (λ2)

Figure 4.2: Illustration of the performance change of TransPES against each of its three algorithm parameters (k, λ2, γ) in terms of the raw and filtered mean rank, also

the filtered hits@10 measurements, evaluated using validation and test sets marked as “valid" and “test" respectively in each plot. (cont.)

4.5 Experiments 108 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 γ 60 80 100 120 140 160 180 200 220 240 mean rank valid test

(g) raw mean rank (γ)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 γ 50 60 70 80 90 100 110 mean rank valid test

(h) filtered mean rank (γ)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 γ 52 54 56 58 60 62 64 66 68 70 hit@10 (%) valid test

(i) filtered hits@10 (γ)

Figure 4.2 (Cont.): Illustration of the performance change of TransPES against each of its three algorithm parameters (k, λ2, γ) in terms of the raw and filtered mean

rank, also the filtered hits@10 measurements, evaluated using validation and test sets marked as “valid" and “test" respectively in each plot.

4.5 Experiments 109

Table 4.5: Detailed evaluation on FB15k. Best performance is highlighted in bold, and second best underlined.

Tasks Predicting Head (Hits@10) Predicting Tail (Hits@10)

Relation Category 1-1 1-M M-1 M-M 1-1 1-M M-1 M-M Unstructured [24] 34.5 2.5 6.1 6.6 34.3 4.2 1.9 6.6 SE [23] 35.6 62.6 17.2 37.5 34.9 14.6 68.3 41.3 SME(linear) [24] 35.1 53.7 19.0 40.3 32.7 14.9 61.6 43.3 SME(bilinear) [24] 30.9 69.6 19.9 38.6 28.2 13.1 76.0 41.8 TransE [128] 43.7 65.7 18.2 47.2 43.7 19.7 66.7 50.0 TransH [130] 66.7 81.7 30.2 57.4 63.7 30.1 83.2 60.8 TransR [131] 76.9 77.9 38.1 66.9 76.2 38.4 76.2 69.1 CTransR [131] 78.6 77.8 36.4 68.0 77.4 37.8 78.0 70.3 TranPES 78.0 88.6 38.9 67.3 78.9 42.1 84.2 69.8

Table 4.6: Link prediction comparison between TransE and TransPES over the reciprocal, reverse and other triplets in the test set of FB15k data.

Methods TransE TranPES

Metrics MAR Hits@10(%) MAR Hits@10 (%)

Reciprocal 46 58.8 10 82.1

Reverse 75 56.9 28 72.4

4.6 Conclusion 110 forms in the training set to support the inference. In Table 4.6, we compare the TransE and TransPES performance by examining how well they infer the reciprocal and reverse type of triples in the test set in Table 4.6. It can be seen from the table that the proposed algorithm achieves much better results (58.8% vs. 82.1% on the reciprocal triplets and 56.9% vs. 72.4% on the reverse ones). On the other hand, for the more challenging triplets of “the other" type, both algorithms experience a very large decrease in the performance.

4.6 Conclusion

We have presented a new translation-based relational learning algorithm to encode relation triplets in KGs using link and entity embeddings, under the constraint of employing simple operations, such as vector addition and projection to encode interlinkages in KGs and maintain very low computational cost and better model interpretability. Facing the challenge of accurately modelling complex relation logic via simple operations, the key is to unfold the relation logic by determining appropri- ate subspaces to work on. The proposed TransPES allows multiple representations for a single relation type to model its multimodality behaviour when interacting with different entity pairs, and employs fixed embedding representation for entities to permit smooth propagation of information across the graph. Interactions between links and entities are formulated in different spaces spanned by different entity pairs to offer bespoke link presentation for a targeted entity pair. Performance comparison with state-of-the-art methods and in-depth analysis of the algorithm behaviour based on different test data partitions demonstrate the superiority of the proposed algorithm.

In Table 4.6, we have conducted deeper analysis for the FB15k dataset using the proposed evaluation scheme, and find that the TransE and TransPES algorithms perform much better on the simplereciprocalandrepetitivetypes of testing triples. From here we conclude that the existing relational learning algorithms could make better use of the triplets’ reciprocal and repetitive forms in the training set to support inference than other dependencies. And we foresee that a better encoding and interpretability of the more complex dependency structures within the KGs is highly

4.6 Conclusion 111 demanded for the future research in this field.

As we have discussed in Chapter 3, we expect that the inclusion of the relation type information can help to mitigate the difficulties of modelling the multi-modality linkage structures. However, it turns out that these sources of information themselves are noise and ambiguous by nature, and thus they can even introduce more issues. Unravelling the multi-modality properties of data will continually be an inevitable challenge for the relational learning algorithms. We will see in the next chapter, instead of labelling the relationships with type information, the linkage network com- bines the node description/content information to find ways to benefit the relational data learning tasks (e.g., link prediction, semantic representations).

Chapter 5 Link Prediction in Document

Networks

5.1 Introduction

In the former chapters, we have developed novel embedding approaches for analysing and manipulating different types of relational structures. While traditional learning algorithms have a long history of modelling attribute/propositional data that is characterised by its high dimensional feature vectors of attributes. It is possible to combine the relational structure with the rich information of data attribute representations to improve the relational learning tasks, i.e, suggesting new and conceivable connections, clustering data into functional related sets. In this chapter, we consider joining these two types of data information into a complex network, where nodes correspond to objects with attribute representations and edges to relationships between objects. In its most simple form, we study the document network data introduced in Section 1.1.1, where it requires the attributes to be fixed and defined in the same homogeneous set, e.g., attributes might correspond to occurrence counts of terms in a fixed vocabulary when representing text or raw intensity value of each pixel when representing images. In this chapter, we discuss the document network in the context of text, that is, the nodes correspond to documents, and the edges are the citations/hyperlinks between documents.

5.1 Introduction 113 Document network data is ubiquitous, existing throughout human generated text contents, such as citation/co-author networks in the scientific publications, friendship networks in online websites like Facebook, Flickr, and Twitter, and hyperlinked networks of webpages. Apparently, the context of the nodes along with the additional links between them should be useful for exploring various aspects of the intrinsic nature of the data. To this end, various tasks have been explored for the document network data in recent years, including improving the discovery of hidden "topic" factors in a corpus [33,135–138], enhancing the detection of document clusters [139], inferring the topical influences over linkages [140] and inspecting the topic changes over time [141]. All these approaches model the observed linkage structure as ground truth information that has been encoded into the model (e.g., interdependencies networks between documents, linkage-based attributes) and cannot generalise to new linkages outside of the training data.

Link prediction is one fundamental task in relational learning that can potentially benefit from the fusion of both the linkage structure and the content data, which form the document network data. In the traditional network analysis, it only makes use of the observed linkage structure to help to infer new likely links. This is unfavourable since most linkage structures are constructed by human agents, which are laborious and thus far from being incomplete. Otherwise, the linkage structures are extracted by some automatic data mining techniques, causing the data to be very noisy. To support link predictions, conventional models also require a node to have a moderate number of observed links for providing a predictive distribution. But this is not likely to be the case since most linkage networks are very sparse, e.g., the average number of links for the documents in Cora citation dataset is only four. As such, one can hardly build a reliable model to generalise well based on the linkage structure alone. Differently, in the real world application, the acquisition of the content data is normally inexpensive and they can be highly useful for link prediction. For example, the text content of scientific articles which describes the themes of the papers are strongly related to the authors’ selection of cited papers, the hyperlinks between webpages often signify their relevance in talking topics and items. Moreover, if no linkages are provided for a single node, we can still take advantage of this node’s

5.1 Introduction 114 feature representation to give information about its linkage distribution. Thus, it is important to develop numerical models for improving the link prediction power over traditional network model by jointly utilising the linkage structure and the content data.

In this field, only a few studies are directly designed for addressing the link prediction problem. Most of them are probabilistic generative models [33, 139, 142– 146]. These models assume the existence of some latent membership factors that account for both the random generation of text at each node and the links between them. Usually, the words generation in these models is as the same as that in conventional Topic models [27, 147], where each document comprises of different proportions of membership factors and each membership factor is accompanied with a distribution over the set of distinct vocabularies. But the underlying linkage structures are modelled in different ways through either a random generation process or a network regulariser. Other types of approaches have also been proposed for modelling the document networks, such as methods relying on learning a Mahalanobis distance metric [40, 148], and those built on deep learning methods [149, 150]. Some linear embedding approaches [65, 151] can also be extended to handle this link prediction task.

As we have seen in previous chapters, embedding-driven methods have been successfully applied to the scenarios for capturing important data patterns through the attribute representations as well as modelling the node’s interdependencies in the linkage structure. It is then straightforward to associate each node with a latent vector representation to connect both types of modalities. Specifically, we have developed a nonlinear embedding model for simultaneously processing the linkage structure and the node attributes in the document network data. It converts the assumed-to-be exist embedding points into conditional probabilities for explaining the likelihood of pairwise linkages, and iteratively modifies the locations of these embedding points by matching these conditional probabilities with our knowledge of the link structure and the node attributes. Two objectives are used, one is a ranking based criterion for modelling the linkage structure, and the other is rest on the Kullback-Leibler divergence for capturing the patterns in the content data.

5.2 Related Work 115

In document Embedding Approaches for Relational Data (Page 120-130)