(1)

Knowledge Graph Representation

From Recent Models towards a Theoretical Understanding

Ivana Balažević & Carl Allen

January 27, 2021

(2)

What are Knowledge Graphs?

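[Figure: example knowledge graph over entities A, B, C and D, with edges labelled "father of", "married to", "mother of", "sibling" and "uncle of"; two edges are marked "?" as facts to be inferred.]

Entities $\mathcal{E} = \{A, B, C, D\}$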
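
A knowledge graph of this kind can be stored as a set of (subject, relation, object) triples. The sketch below is illustrative only: the slide text does not say which entities each edge connects, so the facts listed are hypothetical, and `candidate_triples` is a helper name introduced here.

```python
# Hypothetical facts: the figure's edge endpoints are not recoverable,
# so these triples are assumptions made purely for illustration.
known_facts = {
    ("A", "father_of", "C"),
    ("A", "married_to", "B"),
    ("B", "mother_of", "C"),
}

entities = {"A", "B", "C", "D"}


def candidate_triples(facts, entities, relation):
    """Enumerate every (s, relation, o) with s != o that is not a stored fact.

    Link prediction would assign each candidate a score; this only lists them.
    """
    return sorted(
        (s, relation, o)
        for s in entities
        for o in entities
        if s != o and (s, relation, o) not in facts
    )


queries = candidate_triples(known_facts, entities, "sibling")
print(len(queries))  # 4 entities give 4 * 3 = 12 ordered pairs, none stored yet
```

Each candidate is a possible answer to a "?" edge; the models in the rest of the talk differ in how they score such candidates.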
(3)

Representing Entities and Relations

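Subject and object entities $e_s, e_o$ are represented by vectors $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^d$ (embeddings).

Relations $r$ are represented by transformations $f_r, g_r : \mathbb{R}^d \to \mathbb{R}^{d'}$ that transform the entity embeddings.

A proximity measure, e.g. Euclidean distance or dot product, compares the transformed subject and object entities.

[Figure: $f_r$ maps $\mathbf{e}_s$ to $\mathbf{e}_s^{(r)}$ and $g_r$ maps $\mathbf{e}_o$ to $\mathbf{e}_o^{(r)}$.]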

(10)

Score Function

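A score function $\phi : \mathcal{E} \times \mathcal{R} \times \mathcal{E} \to \mathbb{R}$ brings together entity and relation representations and a proximity measure to assign a score $\phi(e_s, r, e_o)$ to each triple, used to predict whether the triple is true or false.

Representation parameters are optimised to improve prediction accuracy. Score functions can be broadly categorised by:

▸ relation representation type (additive, multiplicative or both); and
▸ proximity measure (e.g. dot product, Euclidean distance).

| Rel. Repr. Type | Example $\phi(e_s, r, e_o)$ | Model |
|---|---|---|
| Multiplicative | $\mathbf{e}_s^\top W_r \mathbf{e}_o = \langle \mathbf{e}_s^{(r)}, \mathbf{e}_o \rangle$ | DistMult (Yang et al., 2015), TuckER (Balažević et al., 2019b) |
| Additive | $-\lVert \mathbf{e}_s + \mathbf{r} - \mathbf{e}_o \rVert^2$ | TransE (Bordes et al., 2013) |
| Both | $-\lVert \mathbf{e}_s^\top W_r^{s} + \mathbf{r} - \mathbf{e}_o^\top W_r^{o} \rVert^2 + b$ | MuRE (Balažević et al., 2019a) |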
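
The three families of score function in the table can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random embeddings and the argument names `W_s` and `W_o` (standing in for the two relation matrices of the "Both" row) are introduced here.

```python
import numpy as np


def distmult_score(e_s, w_r, e_o):
    """Multiplicative: e_s^T diag(w_r) e_o (DistMult uses a diagonal W_r)."""
    return float(np.sum(e_s * w_r * e_o))


def transe_score(e_s, r, e_o):
    """Additive: -||e_s + r - e_o||^2."""
    diff = e_s + r - e_o
    return float(-(diff @ diff))


def both_score(e_s, W_s, r, e_o, W_o, b=0.0):
    """Additive and multiplicative: -||e_s^T W_s + r - e_o^T W_o||^2 + b."""
    diff = e_s @ W_s + r - e_o @ W_o
    return float(-(diff @ diff) + b)


# Random stand-ins for learned parameters (assumption: d = 4).
rng = np.random.default_rng(0)
d = 4
e_s, e_o, r, w_r = (rng.normal(size=d) for _ in range(4))
identity = np.eye(d)

print(distmult_score(e_s, w_r, e_o))
print(transe_score(e_s, r, e_s + r))                # zero distance: the best possible score
print(both_score(e_s, identity, r, e_o, identity))  # identity matrices reduce it to TransE
```

With identity matrices and `b = 0` the "Both" form collapses to the additive one, which makes the relationship between the rows of the table concrete.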

(11)

TuckER: Tensor Factorization for Knowledge Graph Completion

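[Figure 1: Visualization of the TuckER architecture. The core tensor $\mathcal{W}$ is contracted with the relation embedding $\mathbf{w}_r \in \mathbb{R}^{d_r}$ to give a relation matrix $W_r \in \mathbb{R}^{d_e \times d_e}$, which is applied to the entity embeddings $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^{d_e}$.]

$\phi_{\text{TuckER}}(e_s, r, e_o) = ((\mathcal{W} \times_1 \mathbf{w}_r) \times_2 \mathbf{e}_s) \times_3 \mathbf{e}_o = \mathbf{e}_s^\top W_r \mathbf{e}_o$

Multi-task learning: Rather than learning distinct relation matrices $W_r$, the core tensor $\mathcal{W}$ contains a shared pool of "prototype" relation matrices that are linearly combined using parameters of the relation embedding $\mathbf{w}_r$.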
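
The equality between the tensor contraction and the bilinear form can be checked numerically. The sketch below assumes the mode ordering of the formula on this slide (mode 1 pairs with $\mathbf{w}_r$, modes 2 and 3 with $\mathbf{e}_s$ and $\mathbf{e}_o$); the dimensions and random values are arbitrary stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_e, d_r = 5, 3  # assumed entity and relation embedding sizes

# Core tensor with modes ordered (relation, subject, object), as in the slide's formula.
W = rng.normal(size=(d_r, d_e, d_e))
w_r = rng.normal(size=d_r)
e_s = rng.normal(size=d_e)
e_o = rng.normal(size=d_e)

# W x_1 w_r: a linear combination of the d_r "prototype" matrices W[k].
W_r = np.einsum("k,kij->ij", w_r, W)

score_bilinear = e_s @ W_r @ e_o
score_tensor = np.einsum("kij,k,i,j->", W, w_r, e_s, e_o)

print(np.isclose(score_bilinear, score_tensor))
```

Each slice `W[k]` plays the role of one prototype relation matrix, and `w_r` holds the mixing weights for relation $r$.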

(13)

MuRE: Multi-relational Euclidean Graph Embeddings

Figure 2: MuRE spheres of influence (3D plot with axes x, y, z).

(19)

Recap

▸ KGs store facts: binary relations between entities, $(e_s, r, e_o)$.

▸ Enable computational reasoning over KGs, e.g. question answering and inferring new facts (link prediction).

▸ Requires representation, typically:
  • each entity by a vector embedding $\mathbf{e} \in \mathbb{R}^d$,
  • each relation by a transformation from subject entity to object entity.

[Figure: relation $r$ maps $\mathbf{e}_s$ to $\mathbf{e}_s^{(r)}$, which is compared with $\mathbf{e}_o$.]

▸ Many, many models with gradually increasing success, but no principled rationale for why they work, or how to improve them (e.g. more accurate prediction, incorporating logic, etc.).

(20)

Simplify: consider Word Embeddings

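▸ Word embeddings, e.g.
  • Word2Vec (W2V, Mikolov et al., 2013)
  • GloVe (Pennington et al., 2014)

[Figure: two-layer network mapping target words $w_1, \dots, w_n$ to context words $c_1, \dots, c_n$ (both over the dictionary $\mathcal{E}$) through embedding matrices $W$ and $C$.]

▸ Observation: semantic relations between words $\Longrightarrow$ geometric relationships between embeddings
  • similar words $\Rightarrow$ close embeddings
  • analogies (often) $\Rightarrow$ $\mathbf{w}_{woman} + \mathbf{w}_{king} - \mathbf{w}_{man} \approx \mathbf{w}_{queen}$

[Figure: parallelogram formed by $\mathbf{w}_{king}$, $\mathbf{w}_{man}$, $\mathbf{w}_{woman}$ and $\mathbf{w}_{queen}$.]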
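
The analogy observation is usually tested by a nearest-neighbour search around $\mathbf{w}_{woman} + \mathbf{w}_{king} - \mathbf{w}_{man}$. The sketch below uses hand-made two-dimensional vectors, which are purely hypothetical (real embeddings are learned and the relationship only holds approximately); `solve_analogy` is a name introduced here.

```python
import numpy as np

# Hypothetical toy embeddings: axis 0 loosely "royalty", axis 1 loosely "gender".
emb = {
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}


def solve_analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' by finding the word nearest to w_c + w_b - w_a.

    The three query words are excluded from the candidates, as is common practice.
    """
    target = emb[c] + emb[b] - emb[a]
    candidates = [w for w in emb if w not in (a, b, c)]
    return min(candidates, key=lambda w: np.linalg.norm(emb[w] - target))


answer = solve_analogy("man", "king", "woman", emb)
print(answer)
```

Here the offset from man to king is exactly the offset from woman to queen by construction, so the search returns "queen".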

(25)

Understanding word embeddings: the W2V Loss Function

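$\ell_{W2V} = \sum_{i,j} \#(w_i, c_j) \log \sigma(\mathbf{w}_i^\top \mathbf{c}_j) + \frac{k\,\#(w_i)\,\#(c_j)}{D} \log \sigma(-\mathbf{w}_i^\top \mathbf{c}_j)$

$\nabla_{\mathbf{w}_i} \ell_{W2V} \propto \sum_j \underbrace{\big(p(w_i, c_j) + k\,p(w_i)\,p(c_j)\big)}_{d^{(i)}_j} \underbrace{\big(\sigma(S_{i,j}) - \sigma(\mathbf{w}_i^\top \mathbf{c}_j)\big)}_{e^{(i)}_j} \mathbf{c}_j = C\,\mathrm{diag}(\mathbf{d}^{(i)})\,\mathbf{e}^{(i)}$

• $\ell_{W2V}$ is optimised when:
  low-rank case: $\mathbf{w}_i^\top \mathbf{c}_j = \underbrace{\log \frac{p(c_j \mid w_i)}{p(c_j)}}_{\mathrm{PMI}(w_i, c_j)} - \log k \doteq S_{i,j}$ (Levy and Goldberg, 2014)
  general case: error vectors $\mathrm{diag}(\mathbf{d}^{(i)})\,\mathbf{e}^{(i)}$ orthogonal to rows of $C$

$\Rightarrow$ Embedding $\mathbf{w}_i$ is a (non-linear) projection of row $i$ of the PMI matrix*, a PMI vector $\mathbf{p}_i$.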
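
The quantities on this slide can be computed directly from a co-occurrence table. The sketch below is a minimal illustration with a small hypothetical count matrix and an assumed $k = 5$; it builds the PMI matrix and the shifted matrix $S$, and checks the identity $\sigma(S_{i,j}) = p(w_i, c_j) / (p(w_i, c_j) + k\,p(w_i)\,p(c_j))$ that underlies the gradient expression.

```python
import numpy as np

# Hypothetical co-occurrence counts #(w_i, c_j) for 3 target and 3 context words.
counts = np.array([
    [10.0, 2.0, 1.0],
    [3.0, 8.0, 2.0],
    [1.0, 2.0, 9.0],
])
k = 5  # assumed number of negative samples

D = counts.sum()
p_wc = counts / D                      # joint p(w_i, c_j)
p_w = p_wc.sum(axis=1, keepdims=True)  # marginal p(w_i), column vector
p_c = p_wc.sum(axis=0, keepdims=True)  # marginal p(c_j), row vector

pmi = np.log(p_wc / (p_w * p_c))       # PMI(w_i, c_j) = log p(c_j | w_i) / p(c_j)
S = pmi - np.log(k)                    # shifted PMI, the target for w_i^T c_j


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# sigma(S_ij) equals the share of positive samples among all samples of (w_i, c_j).
print(np.allclose(sigmoid(S), p_wc / (p_wc + k * p_w * p_c)))

p_0 = pmi[0]  # the PMI vector of the first word: one row of the PMI matrix
```

Row `pmi[i]` is the PMI vector $\mathbf{p}_i$ that the embedding $\mathbf{w}_i$ is said to project.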

(30)

PMI Vectors

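$\mathbf{p}_i = \Big\{ \log \frac{p(c_j \mid w_i)}{p(c_j)} \Big\}_{c_j \in \mathcal{E}} = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E})}$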

(31)

PMI Vector Interactions = Semantics (Similarity)

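Similarity: similar words, e.g. synonyms, induce similar distributions, $p(\mathcal{E} \mid w)$, over context words.

Identified by subtraction of PMI vectors: $\mathbf{p}_i - \mathbf{p}_j = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E} \mid w_j)} = \boldsymbol{\rho}^{i,j}$

[Figure: similar context distributions $p(\mathcal{E} \mid \text{hound})$ and $p(\mathcal{E} \mid \text{dog})$ over $w_1, \dots, w_n$ give nearby PMI vectors, $\mathbf{p}_{dog} \approx \mathbf{p}_{hound}$.]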

(34)

PMI Vector Interactions = Semantics (Paraphrase)

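Paraphrases: word sets with similar aggregate semantic meaning, e.g. {man, royal} ≈ king.

Identified by addition of PMI vectors:

$\mathbf{p}_i + \mathbf{p}_j = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E})} + \log \frac{p(\mathcal{E} \mid w_j)}{p(\mathcal{E})}$

$\qquad = \mathbf{p}_k + \underbrace{\log \frac{p(\mathcal{E} \mid w_i, w_j)}{p(\mathcal{E} \mid w_k)}}_{\boldsymbol{\rho}^{\{i,j\},k}\ \text{(paraphrase error)}} \underbrace{- \underbrace{\log \frac{p(w_i, w_j \mid \mathcal{E})}{p(w_i \mid \mathcal{E})\,p(w_j \mid \mathcal{E})}}_{\boldsymbol{\sigma}^{i,j}} + \underbrace{\log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}}_{\boldsymbol{\tau}^{i,j}}}_{\text{independence error}}$

[Figure: similar context distributions $p(\mathcal{E} \mid \text{king})$ and $p(\mathcal{E} \mid \{\text{man, royal}\})$ over $w_1, \dots, w_n$ imply that $\mathbf{p}_{man} + \mathbf{p}_{royal}$, labelled $\mathbf{p}_{\{man,\ royal\}}$, lies close to $\mathbf{p}_{king}$.]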

(37)

PMI Vector Interactions = Semantics (Analogy)

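Analogies: word pairs that share a similar semantic difference, e.g. {man, king} and {woman, queen}.

Identified by a linear combination of PMI vectors: $\mathbf{p}_{king} - \mathbf{p}_{man} \approx \mathbf{p}_{queen} - \mathbf{p}_{woman}$

[Figure: parallelogram with vertices $\mathbf{p}_{king}$, $\mathbf{p}_{man}$, $\mathbf{p}_{woman}$ and $\mathbf{p}_{queen}$.]
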
(40)

From Analogies to Relations

[Figure: left (Analogy), the parallelogram over $\mathbf{p}_{king}$, $\mathbf{p}_{man}$, $\mathbf{p}_{woman}$ and $\mathbf{p}_{queen}$; right (Relation), the same offset added to man to give king and to woman to give queen.]

▸ Analogies contain common binary word relations, similar to KGs.

▸ For certain analogies ("specialisations"), the associated "vector offset" gives a transformation that represents the relation.

▸ Not all relations fit this semantic pattern, but it gives insight into the geometric aspects (relation conditions) of other relation types.

(44)

Categorising Relations: semantics

[Figure: relation requirements for each relation type (Similarity, Relatedness, Specialisation, Context-shift, Generalised context-shift). Caption: Relationships between PMI vectors for different relation types. Legend: blue/green = strong word association (PMI > 0); red = relatedness; black = context sets.]

Categorisation of WN18RR relations (R = relatedness, S = specialisation, C = context-shift).

| Type | Relation | Examples (subject entity, object entity) |
|---|---|---|
| R | verb group | (trim down VB 1, cut VB 35), (hatch VB 1, incubate VB 2) |
| R | derivationally related form | (lodge VB 4, accommodation NN 4), (question NN 1, inquire VB 1) |
| R | also see | (clean JJ 1, tidy JJ 1), (ram VB 2, screw VB 3) |
| S | hypernym | (land reform NN 1, reform NN 1), (prickle-weed NN 1, herbaceous plant NN 1) |
| S | instance hypernym | (yellowstone river NN 1, river NN 1), (leipzig NN 1, urban center NN 1) |
| C | member of domain usage | (colloquialism NN 1, figure VB 5), (plural form NN 1, authority NN 2) |
| C | member of domain region | (rome NN 1, gladiator NN 1), (usa NN 1, multiple voting NN 1) |
| C | member meronym | (south NN 2, sunshine state NN 1), (genus carya NN 1, pecan tree NN 1) |
| C | has part | (aircraft NN 1, cabin NN 3), (morocco NN 1, atlas mountains NN 1) |

(46)

Categorical completeness: are all relations covered?

▸ View PMI vectors as sets of word features and relation types as set operations:
  • similarity $\Rightarrow$ set equality
  • relatedness $\Rightarrow$ subset equality (relation-specific)
  • context-shift $\Rightarrow$ set difference (relation-specific)

▸ For any relation, each feature is either
  • necessarily unchanged (relatedness),
  • necessarily/potentially changed (context shift), or
  • irrelevant.

▸ Conjecture: the relation types identified partition the set of semantic relations.

(49)

Relations as mappings between embeddings

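R: S-relatedness requires both entity embeddings $\mathbf{e}_s, \mathbf{e}_o$ to share a common subspace component $V_S$
▸ project onto $V_S$ (multiply by matrix $P_r \in \mathbb{R}^{d \times d}$) and compare.
▸ Dot product: $(P_r \mathbf{e}_s)^\top (P_r \mathbf{e}_o) = \mathbf{e}_s^\top P_r^\top P_r \mathbf{e}_o = \mathbf{e}_s^\top M_r \mathbf{e}_o$
▸ Euclidean distance: $\lVert P_r \mathbf{e}_s - P_r \mathbf{e}_o \rVert^2 = \lVert P_r \mathbf{e}_s \rVert^2 - 2\,\mathbf{e}_s^\top M_r \mathbf{e}_o + \lVert P_r \mathbf{e}_o \rVert^2$

S/C: requires S-relatedness and relation-specific component(s) $(\mathbf{v}_r^s, \mathbf{v}_r^o)$.
▸ project onto a subspace (by $P_r \in \mathbb{R}^{d \times d}$) corresponding to $S$, $\mathbf{v}_r^s$ and $\mathbf{v}_r^o$ (i.e. test S-relatedness while preserving relation-specific components);
▸ add relation-specific $\mathbf{r} = \mathbf{v}_r^o - \mathbf{v}_r^s \in \mathbb{R}^d$ to transformed embeddings.
▸ Dot product: $(P_r \mathbf{e}_s + \mathbf{r})^\top P_r \mathbf{e}_o$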
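
The two identities for the R case, and the form of the S/C dot product, can be verified numerically. The sketch below is illustrative only: `P_r` is a random matrix rather than a learned projection (the identities hold for any square matrix), and `v_s`, `v_o` are random stand-ins for the relation-specific components.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # assumed embedding dimension

P_r = rng.normal(size=(d, d))  # stand-in for the relation's projection matrix
e_s, e_o = rng.normal(size=d), rng.normal(size=d)
M_r = P_r.T @ P_r

# R case, dot product: (P_r e_s)^T (P_r e_o) = e_s^T M_r e_o
dot_projected = (P_r @ e_s) @ (P_r @ e_o)
dot_bilinear = e_s @ M_r @ e_o

# R case, Euclidean distance: expand the squared norm of the difference.
dist_sq = np.sum((P_r @ e_s - P_r @ e_o) ** 2)
dist_expanded = np.sum((P_r @ e_s) ** 2) - 2 * dot_bilinear + np.sum((P_r @ e_o) ** 2)

# S/C case: add the relation-specific offset r = v_o - v_s before comparing.
v_s, v_o = rng.normal(size=d), rng.normal(size=d)
r = v_o - v_s
score_sc = (P_r @ e_s + r) @ (P_r @ e_o)

print(np.isclose(dot_projected, dot_bilinear), np.isclose(dist_sq, dist_expanded))
```

The S/C score splits into the multiplicative term `e_s @ M_r @ e_o` plus an additive term `r @ P_r @ e_o`, which is the sense in which such relations need both kinds of representation.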

(50)

Summary

▸ Theoretic: a derivation of geometric components of relation representations from word co-occurrence statistics.

▸ Interpretability: associates geometric model components with semantic aspects of relations.

▸ Empirically supported: justifies relative link-prediction performance of a range of models on real datasets:

  additive & multiplicative [MuRE* (Balažević et al., 2019a)]
  > multiplicative [TuckER (Balažević et al., 2019b), DistMult (Yang et al., 2015)]
  or additive [TransE (Bordes et al., 2013)].

*Note: MuRE was inspired by the vector offset of analogies.

(54)

Thanks!

(55)

References i

Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding Word Embeddings. In ICML, 2019.

Carl Allen, Ivana Balažević, and Timothy Hospedales. What the Vec? Towards Probabilistically Grounded Embeddings. In NeurIPS, 2019.

Carl Allen, Ivana Balažević, and Timothy Hospedales. Interpreting Knowledge Graph Relation Representation from Word Embeddings. In ICLR, 2021.

Ivana Balažević, Carl Allen, and Timothy M Hospedales. Multi-relational Poincaré Graph Embeddings. In NeurIPS, 2019a.

Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor Factorization for Knowledge Graph Completion. In EMNLP, 2019b.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS, 2013.

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NeurIPS, 2014.

(56)

References ii

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop, 2013.

Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global Vectors for Word Representation. In EMNLP, 2014.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In ICLR, 2015.
