This section provides an introduction to, and a categorization of, multi-relational data (or knowledge graphs).
2.2.1 Monolingual and Multilingual Knowledge
Much of the knowledge studied in the literature is monolingual. In current knowledge bases, such as Wikipedia (Wikipedia, 2016), WordNet (Bond and Foster, 2013), and ConceptNet (Speer and Havasi, 2013), vast amounts of multilingual knowledge are being created across the multiple language-specific versions of the knowledge base. Such multilingual knowledge, including inter-lingual links (ILLs) and triple-wise alignment (TWA), is very useful for aligning and synchronizing different language-specific versions of a knowledge base that evolve independently, which is needed to further improve applications built on multilingual knowledge bases. However, such cross-lingual knowledge is far from complete, and extending it is challenging because existing corpora can rarely provide such expert knowledge directly. Existing approaches either involve extensive human effort or require training comprehensive models on information that is external to knowledge graphs.
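As an illustration only, the following minimal sketch shows one way ILLs and TWA could be represented. The entity identifiers, the French relation name, the `relation_links` mapping, and the function `triple_wise_alignment` are hypothetical and are not taken from any of the cited knowledge bases.

```python
# Hypothetical toy data: two language-specific versions of one knowledge base.
en_triples = {
    ("en:Barack_Obama", "was_born_in", "en:Honolulu"),
    ("en:Barack_Obama", "has_spouse", "en:Michelle_Obama"),
}
fr_triples = {
    ("fr:Barack_Obama", "lieu_de_naissance", "fr:Honolulu"),
}

# Inter-lingual links (ILLs): entity-level matches across the two versions.
ills = {
    "en:Barack_Obama": "fr:Barack_Obama",
    "en:Honolulu": "fr:Honolulu",
}

# Assumed relation-level matches (hypothetical, needed only for this sketch).
relation_links = {"was_born_in": "lieu_de_naissance"}


def triple_wise_alignment(src_triples, tgt_triples, entity_links, rel_links):
    """Pair each source triple with its target counterpart, if one exists."""
    aligned = []
    for head, rel, tail in sorted(src_triples):
        # Unlinked entities or relations map to None and never match a target triple.
        candidate = (entity_links.get(head), rel_links.get(rel), entity_links.get(tail))
        if candidate in tgt_triples:
            aligned.append(((head, rel, tail), candidate))
    return aligned


twa = triple_wise_alignment(en_triples, fr_triples, ills, relation_links)
print(twa)
```

In this toy setting the `has_spouse` triple has no counterpart in the other language version, which mirrors the incompleteness of cross-lingual knowledge discussed above.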
2.2.2 Ontology and Instance-level Knowledge Graphs
From a different perspective, knowledge bases can also be classified into instance-level knowledge graphs and ontology-level knowledge graphs (Ni et al., 2016). Some large knowledge bases, such as DBpedia (Lehmann et al., 2015), YAGO (Mahdisoltani et al., 2015) and ConceptNet (Speer et al., 2017), simultaneously manage both categories of knowledge graphs as two views. These two views of knowledge graphs are described as follows: (i) the instance-level knowledge graphs, which contain relations between specific entities in triples (for example, “Barack Obama”, “isPoliticianOf”, “United States”), and (ii) the ontology-level knowledge graphs, which constitute semantic meta-relations of abstract concepts (such as “politician”, “is leader of”, “city”). In addition, these knowledge bases also provide cross-view links that connect ontological concepts and instances, denoting whether an instance is an instantiation of a specific concept. Figure 2.1 shows a snapshot of such a knowledge base.
[Figure 2.1 image. Panels: Ontology-view Knowledge Graph (concepts such as Person, Politician, City, University, Institution) and Instance-view Knowledge Graph (entities such as Barack Obama, Honolulu, Columbia University). Legend: Concept, Entity, Relation, Meta-Relation, Type Links.]
Figure 2.1: An example of a two-view KB. In the ontology view, regular meta-relations and those conforming to the hierarchical property are denoted by black and orange dashed lines, respectively.
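The two-view organization can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class `TwoViewKB` and its method names are hypothetical, and the toy facts are only loosely based on labels that appear in Figure 2.1, not on the exact edges drawn there.

```python
from dataclasses import dataclass, field


@dataclass
class TwoViewKB:
    """Hypothetical container for a two-view knowledge base."""
    instance_triples: set = field(default_factory=set)  # (entity, relation, entity)
    ontology_triples: set = field(default_factory=set)  # (concept, meta_relation, concept)
    type_links: set = field(default_factory=set)        # (entity, concept) cross-view links

    def concepts_of(self, entity):
        """Concepts that the given entity instantiates."""
        return {c for e, c in self.type_links if e == entity}

    def instances_of(self, concept):
        """Entities that are linked to the given concept."""
        return {e for e, c in self.type_links if c == concept}


# Toy facts (illustrative assumptions, not an excerpt of a real knowledge base).
kb = TwoViewKB(
    instance_triples={
        ("Barack Obama", "was_born_in", "Honolulu"),
        ("Barack Obama", "graduated_from", "Columbia University"),
    },
    ontology_triples={
        ("Politician", "is_a", "Person"),
        ("University", "is_a", "Institution"),
    },
    type_links={
        ("Barack Obama", "Politician"),
        ("Honolulu", "City"),
        ("Columbia University", "University"),
    },
)

print(kb.concepts_of("Barack Obama"))
print(kb.instances_of("City"))
```

Keeping the cross-view links in a separate set, rather than mixing them into either triple set, keeps the two views independent while still allowing lookups in both directions.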
2.2.3 Comprehensive Properties of Relation Facts
Aside from general knowledge graphs that model relation facts as simple triples, a handful of commonsense knowledge graphs (Mitchell et al., 2018; Speer et al., 2017) and biological ontologies (Moal and Fernández-Recio, 2012; Szklarczyk et al., 2016) feature comprehensive properties in their relation facts. This subsection describes such properties from two perspectives, i.e., relational properties and uncertainty.
Table 2.1: The number of triples of each relation type in Yago3 Ontology.

Relation             Number     Trans. / Sym. / Hier. marks
happenedIn           2810
hasChild             41938      X
hasAcademicAdvisor   4          X
livesIn              1600
isCitizenOf          1197
isLocatedIn          1549685    X X
wasBornIn            11672
isMarriedTo          8593       X
isLeaderOf           1071       X
isPoliticianOf       4833
hasNeighbor          450        X
hasCapital           5280       X
isConnectedTo        26966      X X
dealsWith            821        X
influences           170
hasCurrency          4          X
diedIn               7195
hasGender            34811      X
Total num/portion    1699100    92.8% / 2.2% / 95.8%

2.2.3.1 Relational Properties
In some knowledge graphs, especially ontology graphs, the majority of relation facts are subject to specific relational properties (e.g., transitivity, symmetry) or form hierarchies (e.g., taxonomy relations and spatial topological relations (Chen et al., 2016a)). For example, more than 20% of the relations in Freebase are transitive or symmetric (Bollacker et al., 2008); in ConceptNet (Speer and Havasi, 2013), 70% of the relations are transitive or symmetric, and at least 26% are hierarchical; Yago3 Ontology (Mahdisoltani et al., 2015) contains only 17 types of relations (whose statistics are listed in Table 2.1), yet more than 92% of its relations are transitive or symmetric, and more than 95% are hierarchical. Moreover, hierarchical relations can be further divided into refinement and coercion relations (Camossi et al., 2006): the former divides each coarser concept or entity into more refined ones, and the latter does the opposite.
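To make the two relational properties concrete, the following minimal sketch computes the facts implied by symmetry and transitivity on toy data. The function names and the example pairs are illustrative assumptions; the sketch is not a description of how any of the cited knowledge bases enforces these properties.

```python
def symmetric_closure(pairs):
    """If r is symmetric, (h, t) implies (t, h)."""
    return set(pairs) | {(t, h) for h, t in pairs}


def transitive_closure(pairs):
    """If r is transitive, (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    while True:
        # Join every pair ending in b with every pair starting in b.
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


# Toy (head, tail) pairs for one transitive and one symmetric relation.
is_located_in = {
    ("Columbia University", "New York City"),
    ("New York City", "New York State"),
    ("New York State", "United States"),
}
is_married_to = {("Barack Obama", "Michelle Obama")}

located_closure = transitive_closure(is_located_in)
married_closure = symmetric_closure(is_married_to)
print(sorted(located_closure))
print(sorted(married_closure))
```

The fixed-point loop is quadratic in the number of pairs per iteration, which is acceptable for a toy example but would need a graph traversal for relations as large as isLocatedIn in Table 2.1.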
2.2.3.2 Uncertain Knowledge Graphs
In contrast to the aforementioned deterministic knowledge graphs, uncertain knowledge graphs provide a confidence score along with every relation fact. The development of relation extraction and crowdsourcing in recent years has enabled the construction of large-scale uncertain knowledge bases. ConceptNet (Speer et al., 2017) is a multilingual uncertain knowledge graph for commonsense knowledge that is collected via crowdsourcing. The confidence scores in ConceptNet mainly come from the co-occurrence frequency of the labels in crowdsourced task results. Probase (Wu et al., 2012) consists of a universal probabilistic taxonomy that is built by relation extraction. Every relation fact in Probase is associated with a joint probability $P_{isA}(x, y)$. NELL (Mitchell et al., 2018) collects relation facts by reading web pages, and learns their confidence scores through semi-supervised learning with the Expectation-Maximization (EM) algorithm. On the other hand, in biological knowledge graphs, the confidence score often serves as a quantification of certain biochemical interactions, or expresses the belief in the interactions based on experimental verification. Such cases include the estimated binding affinities attached to the protein-protein interactions in SKEMPI (Moal and Fernández-Recio, 2012), as well as the evidential confidence of typed protein-protein interactions in STRING (Szklarczyk et al., 2016).
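An uncertain relation fact can be sketched as a triple extended with a confidence score. The type `UncertainTriple`, the helper `filter_by_confidence`, and all scores below are hypothetical and chosen for illustration; the sketch also assumes scores normalized to [0, 1], which real knowledge bases do not necessarily guarantee.

```python
from typing import NamedTuple


class UncertainTriple(NamedTuple):
    """A relation fact with an attached confidence score (assumed in [0, 1])."""
    head: str
    relation: str
    tail: str
    confidence: float


# Made-up facts and scores, not taken from any real knowledge base.
facts = [
    UncertainTriple("fork", "used_for", "eating", 0.95),
    UncertainTriple("fork", "at_location", "kitchen", 0.70),
    UncertainTriple("fork", "is_a", "musical instrument", 0.05),
]


def filter_by_confidence(triples, threshold):
    """Keep only the facts whose confidence reaches the threshold."""
    return [t for t in triples if t.confidence >= threshold]


confident = filter_by_confidence(facts, 0.5)
print(confident)
```

Thresholding collapses an uncertain knowledge graph into a deterministic one and discards the score, so it is shown here only to illustrate the data layout.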
2.2.4 Sequence Data As Side Information
Besides the multi-relational structures, some knowledge bases also provide side information about entities (concepts) as sequence data. Such sequence data serve as alternative views for representing entities or concepts in the embedding space, and are captured by the neural sequence models introduced in the next sections. Such data include natural language descriptions of entities in multilingual knowledge bases (Bollacker et al., 2008; Lehmann et al., 2015), which are leveraged to support co-training of cross-lingual knowledge transfer in Section 3. Other forms include definitions of words in lexicographic knowledge bases (online dictionaries) (Meyer and Gurevych, 2012), and amino acid sequences of proteins in protein knowledge bases (Szklarczyk et al., 2016), which we use directly in experiments on sequence-based relational learning.
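As a rough illustration of how such sequence data might be prepared for a neural sequence model, the sketch below converts an amino acid sequence into a fixed-length list of integer indices. The index scheme (0 for padding, 21 for unknown residues), the function `encode_sequence`, and the example sequences are assumptions made for this sketch and do not describe the preprocessing used in the later sections.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0                             # assumed padding index
UNKNOWN = len(AMINO_ACIDS) + 1      # assumed index for non-standard residues
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len):
    """Map residues to integer indices, then truncate or pad to max_len."""
    ids = [AA_TO_INDEX.get(ch, UNKNOWN) for ch in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical protein identifiers and toy sequence fragments.
side_information = {"protein_1": "MKV", "protein_2": "MXKLLA"}
encoded = {pid: encode_sequence(seq, 5) for pid, seq in side_information.items()}
print(encoded)
```

Integer indices are a common input format for embedding layers; a one-hot or learned representation would be built on top of them by the sequence model.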