Knowledge Graph Representation

From Recent Models towards a Theoretical Understanding

Ivana Balažević & Carl Allen

January 27, 2021

What are Knowledge Graphs?

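[Figure: example knowledge graph over entities A, B, C and D, with edges labelled "father of", "married to", "mother of", "sibling" and "uncle of"; some edges are marked "?".]

Entities $E = \{A, B, C, D\}$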

Representing Entities and Relations

Subject and object entities $e_s, e_o$ are represented by vectors $e_s, e_o \in \mathbb{R}^d$ (embeddings).

Relations $r$ are represented by transformations $f_r, g_r : \mathbb{R}^d \to \mathbb{R}^{d'}$ that transform the entity embeddings.

A proximity measure, e.g. Euclidean distance or dot product, compares the transformed subject and object entities.

[Figure: $f_r$ maps $e_s$ to $e_s^{(r)}$ and $g_r$ maps $e_o$ to $e_o^{(r)}$.]

Score Function

A score function $\phi : E \times R \times E \to \mathbb{R}$ brings together the entity and relation representations and the proximity measure to assign a score $\phi(e_s, r, e_o)$ to each triple, used to predict whether the triple is true or false.

Representation parameters are optimised to improve prediction accuracy.

Score functions can be broadly categorised by:

▸ relation representation type (additive, multiplicative or both); and
▸ proximity measure (e.g. dot product, Euclidean distance).

Rel. Repr. Type   Example $\phi(e_s, r, e_o)$                                Model
Multiplicative    $e_s^\top W_r e_o = \langle e_s^{(r)}, e_o \rangle$        DistMult (Yang et al., 2015); TuckER (Balažević et al., 2019b)
Additive          $-\|e_s + r - e_o\|_2^2$                                   TransE (Bordes et al., 2013)
Both              $-\|e_s^\top W_s + r - e_o^\top W_o\|_2^2 + b$             MuRE (Balažević et al., 2019a)
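To make the multiplicative and additive rows of the table concrete, here is a minimal Python sketch. It assumes the DistMult form in which $W_r$ is diagonal (stored as a vector `w_r`) and the squared Euclidean TransE form shown above. The function names and the toy vectors are illustrative only and do not come from any library.

```python
from typing import List

Vector = List[float]


def distmult_score(e_s: Vector, w_r: Vector, e_o: Vector) -> float:
    """Multiplicative score e_s^T diag(w_r) e_o (W_r restricted to a diagonal)."""
    return sum(s * w * o for s, w, o in zip(e_s, w_r, e_o))


def transe_score(e_s: Vector, r: Vector, e_o: Vector) -> float:
    """Additive score -||e_s + r - e_o||_2^2 (higher means closer)."""
    return -sum((s + t - o) ** 2 for s, t, o in zip(e_s, r, e_o))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, chosen by hand for illustration.
    subject, obj = [1.0, 0.0], [1.0, 1.0]
    print(distmult_score(subject, [2.0, 3.0], obj))
    print(transe_score(subject, [0.0, 0.5], obj))
```

In both cases a higher score indicates a more plausible triple; the only difference is whether the relation acts by scaling or by translation.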

TuckER: Tensor Factorization for Knowledge Graph Completion

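[Figure 1: Visualization of the TuckER architecture. The core tensor $\mathcal{W}$ (dimensions $d_e$, $d_e$, $d_r$) combined with $e_s$, $e_o$ and $w_r$ equals the relation matrix $W_r$ (dimensions $d_e$, $d_e$) combined with $e_s$ and $e_o$.]

$$\phi_{\text{TuckER}}(e_s, r, e_o) = ((\mathcal{W} \times_1 w_r) \times_2 e_s) \times_3 e_o = e_s^\top W_r e_o$$

Multi-task learning: rather than learning distinct relation matrices $W_r$, the core tensor $\mathcal{W}$ contains a shared pool of "prototype" relation matrices that are linearly combined using the parameters of the relation embedding $w_r$.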
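The "shared pool of prototypes" reading can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the core tensor is stored as a list `core` of $d_r$ prototype matrices (relation mode first, matching $\times_1 w_r$ above), and the names `relation_matrix` and `tucker_score` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relation_matrix(core: List[Matrix], w_r: Vector) -> Matrix:
    """W_r = sum_k w_r[k] * core[k]: a linear combination of prototype matrices."""
    d_e = len(core[0])
    return [
        [sum(w_r[k] * core[k][i][j] for k in range(len(w_r))) for j in range(d_e)]
        for i in range(d_e)
    ]


def tucker_score(core: List[Matrix], w_r: Vector, e_s: Vector, e_o: Vector) -> float:
    """Bilinear score e_s^T W_r e_o with W_r built from the shared core."""
    w = relation_matrix(core, w_r)
    return sum(
        e_s[i] * w[i][j] * e_o[j] for i in range(len(e_s)) for j in range(len(e_o))
    )


if __name__ == "__main__":
    # Two toy prototypes: the identity and a coordinate swap.
    core = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    print(relation_matrix(core, [1.0, 0.5]))
    print(tucker_score(core, [1.0, 0.5], [1.0, 2.0], [3.0, 4.0]))
```

Every relation reuses the same `core`; only the short vector `w_r` is relation-specific, which is where the parameter sharing comes from.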

MuRE: Multi-relational Euclidean Graph Embeddings

[Figure 2: MuRE spheres of influence; axes x, y, z.]

Recap

▸ KGs store facts: binary relations between entities, $(e_s, r, e_o)$.
▸ We want to enable computational reasoning over KGs, e.g. question answering and inferring new facts (link prediction).
▸ This requires a representation, typically:
  • each entity by a vector embedding $e \in \mathbb{R}^d$,
  • each relation by a transformation from subject entity to object entity.

[Figure: relation $r$ transforms $e_s$ into $e_s^{(r)}$, which is compared with $e_o$.]

▸ Many, many models with gradually increasing success, but no principled rationale for why they work, or how to improve them (e.g. more accurate prediction, incorporating logic, etc.).

Simplify: consider Word Embeddings

▸ Word embeddings, e.g.
  • Word2Vec (W2V, Mikolov et al., 2013)
  • GloVe (Pennington et al., 2014)

[Figure: two-layer network with target words $w_1, w_2, w_3, \dots, w_n$ (E), context words $c_1, c_2, c_3, \dots, c_n$ (E) and embedding matrices $W$ and $C$.]

▸ Observation: semantic relations between words ⟹ geometric relationships between embeddings
  • similar words ⇒ close embeddings
  • analogies (often) ⇒ $w_{woman} + w_{king} - w_{man} \approx w_{queen}$
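The analogy relationship can be illustrated with a small nearest-neighbour search. The sketch below uses hand-made 2-dimensional toy vectors (not trained embeddings) and a hypothetical helper `solve_analogy`; it only shows the arithmetic, and real embeddings satisfy the relationship approximately at best.

```python
import math
from typing import Dict, List

Vector = List[float]


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Return the word closest to emb[c] + emb[b] - emb[a], excluding a, b and c."""
    target = [z + y - x for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(emb[w], target))


# Toy vectors chosen so that the "royal" offset is the same for both pairs.
TOY_EMBEDDINGS = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}

if __name__ == "__main__":
    # "man is to king as woman is to ?"
    print(solve_analogy(TOY_EMBEDDINGS, "man", "king", "woman"))
```

Excluding the three query words from the candidates is a common convention in analogy evaluation; without it the nearest neighbour is often one of the inputs.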

Understanding word embeddings: the W2V Loss Function

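$$-\ell_{W2V} = \sum_{i,j} \#(w_i, c_j)\,\log\sigma(w_i^\top c_j) + \frac{k\,\#(w_i)\,\#(c_j)}{D}\,\log\sigma(-w_i^\top c_j)$$

$$\nabla_{w_i}\ell_{W2V} \propto \sum_j \underbrace{\big(p(w_i,c_j) + k\,p(w_i)\,p(c_j)\big)}_{d^{(i)}_j}\;\underbrace{\big(\sigma(S_{i,j}) - \sigma(w_i^\top c_j)\big)}_{e^{(i)}_j}\;c_j = C\,\mathrm{diag}(d^{(i)})\,e^{(i)}$$

• $\ell_{W2V}$ is minimised when:
  low-rank case: $w_i^\top c_j = \underbrace{\log\frac{p(c_j|w_i)}{p(c_j)}}_{\mathrm{PMI}(w_i,c_j)} - \log k \doteq S_{i,j}$ (Levy and Goldberg, 2014)
  general case: error vectors $\mathrm{diag}(d^{(i)})\,e^{(i)}$ are orthogonal to the rows of $C$

⇒ Embedding $w_i$ is a (non-linear) projection of row $i$ of the PMI matrix, a PMI vector $p_i$.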
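As a worked step, the shifted-PMI value can be recovered by setting the derivative of a single summand of the loss to zero. The derivation below assumes each inner product $x = w_i^\top c_j$ can be varied independently; the shorthand $a$, $b$ is introduced here for readability and is not in the slides.

```latex
\begin{align*}
x &= w_i^\top c_j, \qquad a = \#(w_i,c_j), \qquad b = \frac{k\,\#(w_i)\,\#(c_j)}{D} \\
\frac{\partial}{\partial x}\Big[a\log\sigma(x) + b\log\sigma(-x)\Big]
  &= a\,\big(1-\sigma(x)\big) - b\,\sigma(x) = 0 \\
\Rightarrow\quad \sigma(x) &= \frac{a}{a+b}
  \quad\Rightarrow\quad e^{x} = \frac{a}{b} \\
\Rightarrow\quad x &= \log\frac{\#(w_i,c_j)\,D}{\#(w_i)\,\#(c_j)} - \log k
  = \mathrm{PMI}(w_i,c_j) - \log k = S_{i,j}
\end{align*}
```

The same algebra gives $\sigma(S_{i,j}) = \frac{p(w_i,c_j)}{p(w_i,c_j) + k\,p(w_i)\,p(c_j)}$, which is why the error term $e^{(i)}_j$ vanishes exactly when $w_i^\top c_j = S_{i,j}$.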

PMI Vectors

$$p_i = \left\{ \log\frac{p(c_j|w_i)}{p(c_j)} \right\}_{c_j \in E} = \log\frac{p(E|w_i)}{p(E)}$$
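A PMI vector can be computed directly from co-occurrence counts. The following sketch assumes a nested dictionary `counts[word][context]` of strictly positive counts (zero counts would need smoothing, which is not covered here); the function name `pmi_vector` and the toy counts are illustrative.

```python
import math
from typing import Dict

Counts = Dict[str, Dict[str, int]]


def pmi_vector(counts: Counts, word: str) -> Dict[str, float]:
    """Return {c_j: log p(c_j | w_i) / p(c_j)} for the given word w_i."""
    total = sum(n for row in counts.values() for n in row.values())
    word_total = sum(counts[word].values())
    vector = {}
    for context, n in counts[word].items():
        context_total = sum(row.get(context, 0) for row in counts.values())
        p_c_given_w = n / word_total      # p(c_j | w_i)
        p_c = context_total / total       # p(c_j)
        vector[context] = math.log(p_c_given_w / p_c)
    return vector


# Hand-made toy counts, for illustration only.
TOY_COUNTS = {
    "dog": {"bark": 8, "pet": 2},
    "cat": {"bark": 2, "pet": 8},
}

if __name__ == "__main__":
    print(pmi_vector(TOY_COUNTS, "dog"))
    print(pmi_vector(TOY_COUNTS, "cat"))
```

Positive entries mark context words that co-occur with the word more often than chance; negative entries mark those that co-occur less often.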

PMI Vector Interactions = Semantics (Similarity)

Similarity: similar words, e.g. synonyms, induce similar distributions, p(E|w), over context words.

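Identified by subtraction of PMI vectors: $p_i - p_j = \log\frac{p(E|w_i)}{p(E|w_j)} = \rho^{i,j}$

[Figure: similar distributions $p(E|\text{hound})$ and $p(E|\text{dog})$ over context words $w_1, \dots, w_n$ ⟹ nearby PMI vectors $p_{dog}$ and $p_{hound}$.]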

PMI Vector Interactions = Semantics (Paraphrase)

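Paraphrases: word sets with similar aggregate semantic meaning, e.g. {man, royal} ≈ king.

Identified by addition of PMI vectors:

$$p_i + p_j = \log\frac{p(E|w_i)}{p(E)} + \log\frac{p(E|w_j)}{p(E)} = p_k + \underbrace{\log\frac{p(E|w_i,w_j)}{p(E|w_k)}}_{\rho^{\{i,j\},k}\ \text{(paraphrase error)}}\ \underbrace{-\,\underbrace{\log\frac{p(w_i,w_j|E)}{p(w_i|E)\,p(w_j|E)}}_{\sigma^{i,j}} + \underbrace{\log\frac{p(w_i,w_j)}{p(w_i)\,p(w_j)}}_{\tau^{i,j}}}_{\text{independence error}}$$

[Figure: similar distributions $p(E|\text{king})$ and $p(E|\{\text{man, royal}\})$ over context words $w_1, \dots, w_n$ ⟹ PMI vectors $p_{man}$, $p_{royal}$ and $p_{\{man,\,royal\}}$.]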
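The paraphrase decomposition is an exact identity, which can be checked numerically on a toy distribution. The sketch below builds a hypothetical joint distribution $p(a, b, c)$, where $a$ and $b$ flag the occurrence of $w_i$ and $w_j$ and $c$ indexes a context word, plus an arbitrary $p(c|w_k)$; all numbers and names are illustrative assumptions.

```python
import math
from itertools import product

CONTEXTS = range(3)  # indices of three hypothetical context words

# Hypothetical joint p(a, b, c); any strictly positive table would do.
_weights = {
    (a, b, c): 1.0 + a + 2 * b + c + a * b * c
    for a, b, c in product((0, 1), (0, 1), CONTEXTS)
}
_total = sum(_weights.values())
JOINT = {key: w / _total for key, w in _weights.items()}

# Hypothetical p(c | w_k) for a candidate paraphrase word w_k.
P_C_GIVEN_K = [0.2, 0.3, 0.5]


def prob(a=None, b=None, c=None):
    """Marginal probability; variables left as None are summed out."""
    return sum(
        p
        for (x, y, z), p in JOINT.items()
        if (a is None or x == a) and (b is None or y == b) and (c is None or z == c)
    )


def paraphrase_sides(c):
    """Return (p_i + p_j, p_k + rho - sigma + tau) for context word c."""
    p_c = prob(c=c)
    p_i = math.log(prob(a=1, c=c) / prob(a=1) / p_c)
    p_j = math.log(prob(b=1, c=c) / prob(b=1) / p_c)
    p_k = math.log(P_C_GIVEN_K[c] / p_c)
    rho = math.log(prob(a=1, b=1, c=c) / prob(a=1, b=1) / P_C_GIVEN_K[c])
    sigma = math.log(
        (prob(a=1, b=1, c=c) / p_c)
        / ((prob(a=1, c=c) / p_c) * (prob(b=1, c=c) / p_c))
    )
    tau = math.log(prob(a=1, b=1) / (prob(a=1) * prob(b=1)))
    return p_i + p_j, p_k + rho - sigma + tau


if __name__ == "__main__":
    for c in CONTEXTS:
        lhs, rhs = paraphrase_sides(c)
        print(c, round(lhs, 6), round(rhs, 6))
```

Both sides agree for every context word regardless of the choice of $w_k$; the choice only changes how much of the total falls into the paraphrase error $\rho$.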

PMI Vector Interactions = Semantics (Analogy)

Analogies: word pairs that share a similar semantic difference, e.g. {man, king} and {woman, queen}.

Identified by a linear combination of PMI vectors: $p_{king} - p_{man} \approx p_{queen} - p_{woman}$

[Figure: PMI vectors $p_{king}$, $p_{man}$, $p_{woman}$ and $p_{queen}$.]

From Analogies to Relations

[Figure: Analogy ⇔ Relation. The analogy among $p_{king}$, $p_{man}$, $p_{woman}$ and $p_{queen}$ corresponds to a relation that maps man to king and woman to queen by adding approximately the same vector.]

▸ Analogies contain common binary word relations, similar to KGs.
▸ For certain analogies ("specialisations"), the associated "vector offset" gives a transformation that represents the relation.
▸ Not all relations fit this semantic pattern, but this gives insight into the geometric aspects (relation conditions) of other relation types.

Categorising Relations: semantics → relation requirements

[Figure: Relationships between PMI vectors for different relation types. Panels: Similarity, Relatedness, Specialisation, Context-shift, Gen. context-shift. Blue/green = strong word association (PMI > 0); red = relatedness; black = context sets.]

Categorisation of WN18RR relations (R = relatedness, S = specialisation, C = context-shift).

Type  Relation                     Examples (subject entity, object entity)
R     verb group                   (trim down VB 1, cut VB 35), (hatch VB 1, incubate VB 2)
R     derivationally related form  (lodge VB 4, accommodation NN 4), (question NN 1, inquire VB 1)
R     also see                     (clean JJ 1, tidy JJ 1), (ram VB 2, screw VB 3)
S     hypernym                     (land reform NN 1, reform NN 1), (prickle-weed NN 1, herbaceous plant NN 1)
S     instance hypernym            (yellowstone river NN 1, river NN 1), (leipzig NN 1, urban center NN 1)
C     member of domain usage       (colloquialism NN 1, figure VB 5), (plural form NN 1, authority NN 2)
C     member of domain region      (rome NN 1, gladiator NN 1), (usa NN 1, multiple voting NN 1)
C     member meronym               (south NN 2, sunshine state NN 1), (genus carya NN 1, pecan tree NN 1)
C     has part                     (aircraft NN 1, cabin NN 3), (morocco NN 1, atlas mountains NN 1)

Categorical completeness: are all relations covered?

▸ View PMI vectors as sets of word features and relation types as set operations:
  • similarity ⇒ set equality
  • relatedness ⇒ subset equality (relation-specific)
  • context-shift ⇒ set difference (relation-specific)
▸ For any relation, each feature is either
  • necessarily unchanged (relatedness),
  • necessarily/potentially changed (context shift), or
  • irrelevant.
▸ Conjecture: the relation types identified partition the set of semantic relations.
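The set-operation view above can be mimicked with Python sets. This is only an analogy under loud assumptions: the feature names are invented, real PMI vectors are continuous rather than binary, and the helper names are hypothetical.

```python
from typing import Set, Tuple

Features = Set[str]


def is_similar(subject: Features, obj: Features) -> bool:
    """Similarity: the two feature sets are equal."""
    return subject == obj


def is_related(subject: Features, obj: Features, relevant: Features) -> bool:
    """Relatedness: the sets agree on a relation-specific subset of features."""
    return subject & relevant == obj & relevant


def context_shift(subject: Features, obj: Features) -> Tuple[Features, Features]:
    """Context shift: features gained and features lost from subject to object."""
    return obj - subject, subject - obj


# Invented binary features, for illustration only.
MAN = {"person", "male"}
KING = {"person", "male", "royal"}

if __name__ == "__main__":
    print(is_similar(MAN, KING))
    print(is_related(MAN, KING, {"person", "male"}))
    print(context_shift(MAN, KING))
```

In this toy picture a specialisation is a context shift that only adds features, and features outside both `relevant` and the shifted sets play the role of the "irrelevant" ones.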

Relations as mappings between embeddings

R: S-relatedness requires both entity embeddings $e_s, e_o$ to share a common subspace component $V_S$:
▸ project onto $V_S$ (multiply by matrix $P_r \in \mathbb{R}^{d \times d}$) and compare.
▸ Dot product: $(P_r e_s)^\top (P_r e_o) = e_s^\top P_r^\top P_r e_o = e_s^\top M_r e_o$
▸ Euclidean distance: $\|P_r e_s - P_r e_o\|^2 = \|P_r e_s\|^2 - 2\,e_s^\top M_r e_o + \|P_r e_o\|^2$

S/C: requires S-relatedness and relation-specific component(s) ($v_r^s$, $v_r^o$):
▸ project onto a subspace (by $P_r \in \mathbb{R}^{d \times d}$) corresponding to $S$, $v_r^s$ and $v_r^o$ (i.e. test S-relatedness while preserving relation-specific components);
▸ add relation-specific $r = v_r^o - v_r^s \in \mathbb{R}^d$ to transformed embeddings.
▸ Dot product: $(P_r e_s + r)^\top P_r e_o$
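The two identities for the relatedness case, and the additive variant, can be checked on a small example. The sketch below uses a hand-picked projection onto the first coordinate axis as $P_r$ and toy embeddings; the helper names are hypothetical and $M_r$ is computed as $P_r^\top P_r$, as in the text.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gram(p: Matrix) -> Matrix:
    """M_r = P_r^T P_r."""
    d = len(p[0])
    return [[sum(row[i] * row[j] for row in p) for j in range(d)] for i in range(d)]


def relatedness_dot(p: Matrix, e_s: Vector, e_o: Vector) -> float:
    """(P_r e_s)^T (P_r e_o)."""
    return dot(matvec(p, e_s), matvec(p, e_o))


def relatedness_sq_dist(p: Matrix, e_s: Vector, e_o: Vector) -> float:
    """||P_r e_s - P_r e_o||^2."""
    diff = [a - b for a, b in zip(matvec(p, e_s), matvec(p, e_o))]
    return dot(diff, diff)


def shifted_dot(p: Matrix, r: Vector, e_s: Vector, e_o: Vector) -> float:
    """(P_r e_s + r)^T P_r e_o, the S/C dot-product form."""
    shifted = [a + b for a, b in zip(matvec(p, e_s), r)]
    return dot(shifted, matvec(p, e_o))


if __name__ == "__main__":
    P_r = [[1.0, 0.0], [0.0, 0.0]]   # toy projection onto the first axis
    e_s, e_o = [2.0, 5.0], [3.0, -1.0]
    M_r = gram(P_r)
    print(relatedness_dot(P_r, e_s, e_o), dot(e_s, matvec(M_r, e_o)))
    print(relatedness_sq_dist(P_r, e_s, e_o))
    print(shifted_dot(P_r, [1.0, 0.0], e_s, e_o))
```

The projection discards the second coordinate, so the two embeddings are compared only on the component they are required to share.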

Summary

▸ Theoretic: a derivation of geometric components of relation representations from word co-occurrence statistics.
▸ Interpretability: associates geometric model components with semantic aspects of relations.
▸ Empirically supported: justifies relative link-prediction performance of a range of models on real datasets:

$$\underbrace{\text{additive \& multiplicative}}_{\text{MuRE* (Balažević et al., 2019a)}} \;>\; \underbrace{\text{multiplicative}}_{\text{TuckER (Balažević et al., 2019b), DistMult (Yang et al., 2015)}} \;\text{or}\; \underbrace{\text{additive}}_{\text{TransE (Bordes et al., 2013)}}.$$

*Note: MuRE was inspired by the vector offset of analogies.

Thanks!

References

Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding Word Embeddings. In ICML, 2019.

Carl Allen, Ivana Balažević, and Timothy Hospedales. What the Vec? Towards Probabilistically Grounded Embeddings. In NeurIPS, 2019.

Carl Allen, Ivana Balažević, and Timothy Hospedales. Interpreting Knowledge Graph Relation Representation from Word Embeddings. In ICLR, 2021.

Ivana Balažević, Carl Allen, and Timothy M Hospedales. Multi-relational Poincaré Graph Embeddings. In NeurIPS, 2019a.

Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor Factorization for Knowledge Graph Completion. In EMNLP, 2019b.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS, 2013.

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NeurIPS, 2014.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop, 2013.

Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global Vectors for Word Representation. In EMNLP, 2014.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In ICLR, 2015.