
This chapter introduces the first work that generalizes knowledge graph embeddings to the multilingual scenario. Our model MTransE characterizes monolingual relations and compares three different techniques to learn cross-lingual alignment for entities and relations. Extensive experiments on the tasks of cross-lingual entity matching and triple alignment verification show that the linear-transformation technique is the best among the three. Moreover, MTransE preserves the key properties of monolingual knowledge graph embeddings on monolingual tasks. In addition, we propose a semi-supervised learning approach to co-train multilingual KG embeddings and the embeddings of entity descriptions for cross-lingual knowledge alignment. Our approach KDCoE effectively leverages KG embeddings for learning cross-lingual inferences on large, weakly-aligned KGs, and significantly outperforms previous models on the entity alignment task. The zero-shot alignment task also shows the effectiveness of KDCoE for improving the cross-lingual matching of entity descriptions through co-training. Meanwhile, we observe that KDCoE is able to enhance the traditional methods of KG completion by leveraging the information from another language.

CHAPTER 4

Transfer Embeddings with Complex Alignment Information

In this chapter, we extend the vanilla learning framework of the previous chapter to capture knowledge transfer with more complex alignment information. We consider embedding learning for two-view knowledge bases and for biological knowledge graphs with fuzzy alignment.

4.1 Introduction

4.1.1 Ontology-level Concepts and Instance-level Entities

Several knowledge bases, such as DBpedia (Lehmann et al., 2015), YAGO (Mahdisoltani et al., 2015) and ConceptNet (Speer et al., 2017), have incorporated knowledge graphs that can be categorized as two views: (i) the instance-view knowledge graphs that contain relations between specific entities in triples (for example, “Barack Obama”, “isPoliticianOf”, “United States”) and (ii) the ontology-view knowledge graphs that constitute semantic meta-relations of abstract concepts (such as “politician”, “is leader of”, “city”). In addition, knowledge bases also provide cross-view links that connect ontological concepts and instances, denoting whether an instance is an instantiation of a specific concept.
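As an illustration only, the two views and their cross-view links can be held in a small container such as the one sketched below. The class name `TwoViewKB`, its field names, and the type link from “Barack Obama” to “politician” are hypothetical choices made for this example; they are not taken from DBpedia, YAGO, or ConceptNet.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


@dataclass
class TwoViewKB:
    """Hypothetical container for a two-view knowledge base."""
    instance_triples: List[Triple] = field(default_factory=list)   # entity-level facts
    ontology_triples: List[Triple] = field(default_factory=list)   # concept-level meta-relations
    type_links: List[Tuple[str, str]] = field(default_factory=list)  # (entity, concept) pairs

    def concepts_of(self, entity: str) -> Set[str]:
        """Concepts that the entity is linked to by a cross-view link."""
        return {c for e, c in self.type_links if e == entity}

    def untyped_entities(self) -> Set[str]:
        """Entities that appear in instance triples but have no cross-view link."""
        entities = {x for h, _, t in self.instance_triples for x in (h, t)}
        typed = {e for e, _ in self.type_links}
        return entities - typed


kb = TwoViewKB(
    instance_triples=[("Barack Obama", "isPoliticianOf", "United States")],
    ontology_triples=[("politician", "is leader of", "city")],
    type_links=[("Barack Obama", "politician")],  # assumed link, for illustration
)
print(kb.concepts_of("Barack Obama"))
print(kb.untyped_entities())
```

The `untyped_entities` helper makes visible the entities that have no cross-view link at all, which is the kind of gap an entity typing task would need to fill.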

Existing embedding models, however, are limited to a single view, either the instance-view graph (Bordes et al., 2013; Nickel et al., 2016; Yang et al., 2015b) or the ontology-view graph (Chen et al., 2018d; Ristoski et al., 2018). Learning to represent a knowledge base from both views should provide more comprehensive insights. On one hand, instance embeddings provide detailed and rich information for their corresponding ontological concepts. For example, by observing many individual musicians, the embedding of their corresponding concept “Musician” can be largely determined. On the other hand, a concept embedding provides a high-level summary of its instances, which is extremely helpful when very few relations are observed for an instance. For example, for a musician who has very few relational facts in the instance-view graph, we can still tell his or her rough position in the instance embedding space, because he or she should not be far away from other musicians.

In this chapter, we first propose JOIE (Hao et al., 2019) to jointly embed the instance-view graph and the ontology-view graph by leveraging (i) triples in both graphs and (ii) type links that connect the two graphs. Effectively combining representation learning techniques on both views of a knowledge base is a non-trivial task that faces the following challenges: (i) the vocabularies of entities and concepts, as well as of relations and meta-relations, are disjoint but semantically related in the two views, and the semantic mappings from entities to concepts and from relations to meta-relations are complicated and difficult for current embedding models to capture precisely; (ii) the known type links often fail to cover a vast number of entities, which leaves insufficient information to align the two views of the knowledge base and entails discovering new type links; and (iii) the scales and topological structures of the two views are also largely inconsistent. Specifically, the ontology views are often sparser, provide fewer types of relations, and often form hierarchical substructures. In contrast, the instance view is much larger and heterogeneous in relation types.

[Figure 4.1 diagram: View I = Cells (Cell 1 to Cell 6, with inferred cell clusters); View II = Genes (Gene 1 to Gene 6, forming a gene KG derived from PPIs); the two views are connected by fuzzy alignment from observed expression.]

Figure 4.1: The scRNA-seq fuzzy alignment between genes and cells.

To address the above issues, we propose a novel knowledge graph embedding model named JOIE, which jointly encodes both the ontology and instance views of a knowledge base. JOIE extends MTransE to support this representation learning. First, an alignment model associates each instance embedding with its corresponding concept embedding. Second, the knowledge model characterizes the relational facts of the ontology and instance views in two separate embedding spaces, for which we also investigate several triple encoding techniques, as well as hierarchy-aware encoding techniques for the ontology view. For the alignment model, we explore two techniques to capture the type links. The cross-view grouping technique assumes that the two views can be forced into the same embedding space, while the cross-view transformation technique enables non-linear transformations from the instance embedding space to the ontology embedding space. As for the knowledge embedding model, we use three state-of-the-art translational or similarity-based relational embedding techniques to capture the multi-relational structures of each view. Additionally, for knowledge bases where ontologies constitute hierarchical substructures, we deploy a hierarchy-aware embedding technique based on non-linear transformations. This technique seeks to preserve the hierarchical property of such ontologies. Accordingly, we investigate nine variants of JOIE and evaluate these models on two tasks: the triple completion task and the entity typing task. Experimental results on the triple completion task confirm the effectiveness of JOIE for populating knowledge in both the ontology-view and instance-view knowledge graphs, where it significantly outperforms various baseline models. The results on the entity typing task show that our model is competent in discovering type links to align the ontology-view and the instance-view knowledge graphs.
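The difference between the two alignment techniques can be sketched as two loss terms on a single type link. This is a minimal sketch under stated assumptions, not the JOIE implementation: the L2 distance, the `tanh` non-linearity, the affine map `W`, `b`, and the function names are all chosen here for illustration, and the source does not specify them.

```python
import numpy as np


def grouping_loss(entity_vec, concept_vec):
    """Cross-view grouping (sketch): both views share one space,
    so a linked entity and concept are simply pulled together."""
    return float(np.linalg.norm(concept_vec - entity_vec))


def transformation_loss(entity_vec, concept_vec, W, b):
    """Cross-view transformation (sketch): map the entity into the
    ontology space with an assumed non-linear map, then compare."""
    projected = np.tanh(W @ entity_vec + b)
    return float(np.linalg.norm(concept_vec - projected))


rng = np.random.default_rng(0)
d_inst, d_onto = 8, 4                          # illustrative dimensionalities

e = rng.normal(size=d_inst)                    # instance embedding
c_same = rng.normal(size=d_inst)               # grouping needs equal dimensions
c_small = rng.normal(size=d_onto)              # transformation allows a smaller ontology space
W = 0.1 * rng.normal(size=(d_onto, d_inst))    # assumed trainable parameters
b = np.zeros(d_onto)

print(grouping_loss(e, c_same))
print(transformation_loss(e, c_small, W, b))
```

One consequence visible in the sketch is that grouping requires both views to use the same dimensionality, whereas the transformation variant lets the ontology space be smaller than the instance space.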

4.1.2 Single-cell RNA-sequence Data As Fuzzy Alignment

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in individual cells, and supports cell type identification and clustering (Gong et al., 2018; Li and Li, 2018; Talwar et al., 2018). The relations of genes (RNA) can be derived from protein-protein interaction knowledge graphs (Szklarczyk et al., 2016). Hence, we further extend MTransE to handle single-cell RNA-sequencing data. The proposed KG-Transfer model seeks to transfer gene-level knowledge to the cell view, which potentially helps the inference of cell information. Figure 4.1 shows an overview of the knowledge that KG-Transfer seeks to represent.

Since the vocabularies of genes and cells differ greatly in size, KG-Transfer distributes genes and cells in embedding spaces with different dimensionalities. The knowledge model encodes the relations of genes based on protein-protein interaction data. The cells do not have multi-relational data; our objective for the cell view is instead to infer the cell clustering.

The additional challenge in this case lies mainly in the fuzzy alignment between genes and cells. scRNA-seq data measure gene expression in single cells across organisms, tissues, and disease states, based on wet-lab transcripts. For each cell, the observed gene-cell associations therefore typically differ in the number of observations and in their evidential confidence (Talwar et al., 2018). Hence, the alignment model needs to be adapted to capture such fuzzy alignment information. Moreover, due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells (Wang et al., 2009). These zero counts, or dropout events, complicate the transfer learning by causing missing alignment information. To address this issue, we develop the alignment model of KG-Transfer as a fuzzy alignment model between the gene and cell views based on Semi-Nonnegative Matrix Tri-factorization (Semi-NMTF) (Ding et al., 2010). This alignment technique seeks to capture the associations between genes and cells based on fuzzy alignment and to impute the missing values of the scRNA-seq matrix. We show that by transferring the gene-level knowledge, KG-Transfer is able to significantly improve cell clustering, especially when the scRNA-seq data has extreme dropout rates and is highly sparse.
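To make the imputation idea concrete, the following is a minimal sketch of a masked tri-factorization X ≈ F S Gᵀ on a toy gene-by-cell matrix. It is not the KG-Transfer alignment model: the squared-error objective, the projected gradient descent updates, the choice to constrain only the cell factor G to be nonnegative, the function name `masked_tri_factorization`, and all hyperparameters are assumptions made for illustration. The sketch also assumes the dropout mask is known, which holds here only because the toy data is synthetic.

```python
import numpy as np


def masked_tri_factorization(X, mask, k_gene=2, k_cell=2, lr=0.01, steps=3000, seed=0):
    """Fit X ~ F @ S @ G.T using only entries where mask == 1.

    F (genes x k_gene) and S (k_gene x k_cell) are unconstrained;
    G (cells x k_cell) is kept nonnegative by projection.
    Returns the factors and the loss recorded at every step.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    F = rng.normal(scale=0.5, size=(n_genes, k_gene))
    S = rng.normal(scale=0.5, size=(k_gene, k_cell))
    G = rng.uniform(0.0, 1.0, size=(n_cells, k_cell))
    losses = []
    for _ in range(steps):
        R = mask * (F @ S @ G.T - X)           # residual on observed entries only
        losses.append(0.5 * float(np.sum(R ** 2)))
        grad_F = R @ G @ S.T
        grad_S = F.T @ R @ G
        grad_G = R.T @ F @ S
        F = F - lr * grad_F
        S = S - lr * grad_S
        G = np.maximum(G - lr * grad_G, 0.0)   # projection keeps G nonnegative
    return F, S, G, losses


# Synthetic low-rank "expression" matrix: rows are genes, columns are cells.
rng = np.random.default_rng(1)
n_genes, n_cells = 6, 5
X_full = (rng.uniform(size=(n_genes, 2))
          @ rng.uniform(size=(2, 2))
          @ rng.uniform(size=(n_cells, 2)).T)

# Simulated dropout: roughly 30% of entries are zeroed and marked as missing.
mask = (rng.random(X_full.shape) > 0.3).astype(float)
X_obs = X_full * mask

F, S, G, losses = masked_tri_factorization(X_obs, mask)

# Keep observed values and fill the masked entries from the factorization.
imputed = np.where(mask == 1.0, X_obs, F @ S @ G.T)
print(losses[0], losses[-1])
```

In real scRNA-seq data a zero count cannot be labeled as a dropout or a true absence of expression with certainty, so the known mask used here is the main simplification of this sketch.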