Knowledge Graph Representation
From Recent Models towards a Theoretical Understanding
Ivana Balažević & Carl Allen
January 27, 2021
What are Knowledge Graphs?
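[Figure: example knowledge graph over nodes A, B, C, D with edge labels "father of", "married to", "mother of", "sibling" and "uncle of"; two edges are marked "?".]
Entities $\mathcal{E} = \{A, B, C, D\}$
Representing Entities and Relations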
Subject and object entities $e_s, e_o$ are represented by vectors $e_s, e_o \in \mathbb{R}^d$ (embeddings).
Relations $r$ are represented by transformations $f_r, g_r : \mathbb{R}^d \to \mathbb{R}^{d'}$ that transform the entity embeddings.
A proximity measure, e.g. Euclidean distance or dot product, compares the transformed subject and object entities.
[Figure: $e_s$ and $e_o$ are mapped by $f_r$ and $g_r$ to $e_s^{(r)}$ and $e_o^{(r)}$.]
Score Function
A score function $\phi : \mathcal{E} \times \mathcal{R} \times \mathcal{E} \to \mathbb{R}$ brings together entity and relation representations and a proximity measure to assign a score $\phi(e_s, r, e_o)$ to each triple, used to predict whether the triple is true or false.
Representation parameters are optimised to improve prediction accuracy.
Score functions can be broadly categorised by:
▸ relation representation type (additive, multiplicative or both); and
▸ proximity measure (e.g. dot product, Euclidean distance).

Rel. Repr. Type | Example $\phi(e_s, r, e_o)$ | Model
Multiplicative | $e_s^\top W_r e_o = \langle e_s^{(r)}, e_o \rangle$ | DistMult (Yang et al., 2015), TuckER (Balažević et al., 2019b)
Additive | $-\lVert e_s + r - e_o \rVert^2$ | TransE (Bordes et al., 2013)
Both | $-\lVert e_s^\top W_s + r - e_o^\top W_o \rVert^2 + b$ | MuRE (Balažević et al., 2019a)
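As a rough illustration of the three score-function families in the table, the NumPy sketch below evaluates a multiplicative, an additive and a combined score on random vectors. The dimension, the random parameters and the function names are assumptions made for illustration, not the authors' implementations; `distmult_score` uses a diagonal relation matrix, as in DistMult.

```python
import numpy as np

def distmult_score(e_s, w_r, e_o):
    """Multiplicative: e_s^T diag(w_r) e_o (diagonal relation matrix)."""
    return float(np.sum(e_s * w_r * e_o))

def transe_score(e_s, r, e_o):
    """Additive: negative squared Euclidean distance -||e_s + r - e_o||^2."""
    return float(-np.sum((e_s + r - e_o) ** 2))

def combined_score(e_s, W_s, r, e_o, W_o, b=0.0):
    """Both: -||e_s^T W_s + r - e_o^T W_o||^2 + b."""
    return float(-np.sum((e_s @ W_s + r - e_o @ W_o) ** 2) + b)

rng = np.random.default_rng(0)
d = 4                                        # embedding dimension (illustrative)
e_s, e_o, r, w_r = rng.normal(size=(4, d))   # random stand-ins for learned parameters
W_s, W_o = rng.normal(size=(2, d, d))

# An object embedding sitting exactly at e_s + r attains the largest additive score.
print(transe_score(e_s, r, e_s + r))
print(transe_score(e_s, r, e_o))
print(distmult_score(e_s, w_r, e_o))
print(combined_score(e_s, W_s, r, e_o, W_o))
```

With identity matrices for `W_s` and `W_o` and `b = 0`, the combined score reduces to the additive one, which is one way to see that it contains both components.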
TuckER: Tensor Factorization for Knowledge Graph Completion
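[Figure 1: Visualization of the TuckER architecture: the core tensor $\mathcal{W}$ (modes of size $d_e$, $d_e$, $d_r$) is combined with $w_r$ to give the $d_e \times d_e$ matrix $W_r$, which acts on $e_s$ and $e_o$.]
$$\phi_{\text{TuckER}}(e_s, r, e_o) = ((\mathcal{W} \times_1 w_r) \times_2 e_s) \times_3 e_o = e_s^\top W_r e_o$$
Multi-task learning: Rather than learning distinct relation matrices $W_r$, the core tensor $\mathcal{W}$ contains a shared pool of “prototype” relation matrices that are linearly combined using parameters of the relation embedding $w_r$.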
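The equivalence between the tensor product and the bilinear form $e_s^\top W_r e_o$ can be checked numerically. The sketch below is a minimal NumPy illustration with random values; the sizes, the axis layout of `W` (relation mode first, following the order of the formula) and the function name are assumptions, not the TuckER reference code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_e, d_r = 5, 3                        # entity / relation embedding sizes (illustrative)

W = rng.normal(size=(d_r, d_e, d_e))   # core tensor; first mode indexes the relation embedding
w_r = rng.normal(size=d_r)
e_s = rng.normal(size=d_e)
e_o = rng.normal(size=d_e)

def tucker_score(W, w_r, e_s, e_o):
    """((W x_1 w_r) x_2 e_s) x_3 e_o written as one tensor contraction."""
    return float(np.einsum("rso,r,s,o->", W, w_r, e_s, e_o))

# Relation matrix W_r: a linear combination of the d_r "prototype" matrices W[k].
W_r = np.tensordot(w_r, W, axes=1)     # shape (d_e, d_e)

print(tucker_score(W, w_r, e_s, e_o))
print(float(e_s @ W_r @ e_o))          # same value up to floating-point error
```

Because every relation shares the slices `W[k]`, only the `d_r` mixing weights in `w_r` are relation-specific, which is the multi-task aspect described above.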
MuRE: Multi-relational Euclidean Graph Embeddings
[Figure 2: MuRE spheres of influence.]
Recap
▸ KGs store facts: binary relations between entities $(e_s, r, e_o)$.
▸ Enable computational reasoning over KGs, e.g. question answering and inferring new facts (link prediction).
▸ Requires representation, typically:
  • each entity by a vector embedding $e \in \mathbb{R}^d$,
  • each relation by a transformation from subject entity to object entity.
[Figure: relation $r$ maps $e_s$ to $e_s^{(r)}$, which is compared with $e_o$.]
▸ Many, many models with gradually increasing success, but no principled rationale for why they work, or how to improve them (e.g. more accurate prediction, incorporating logic, etc.).
Simplify: consider Word Embeddings
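▸ Word embeddings, e.g.
  • Word2Vec (W2V, Mikolov et al., 2013)
  • GloVe (Pennington et al., 2014)
[Figure: Word2Vec architecture with target words $w_1, \dots, w_n$ ($\mathcal{E}$), context words $c_1, \dots, c_n$ ($\mathcal{E}$) and embedding matrices $W$, $C$.]
▸ Observation: semantic relations between words ⟹ geometric relationships between embeddings
  • similar words ⇒ close embeddings
  • analogies (often) ⇒ $w_{woman} + w_{king} - w_{man} \approx w_{queen}$
[Figure: embeddings $w_{king}$, $w_{man}$, $w_{woman}$ and the resulting $w_{queen}$.]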
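The analogy observation is usually tested by a nearest-neighbour search around $w_{woman} + w_{king} - w_{man}$. The following sketch shows that procedure on hand-made three-dimensional vectors; the vectors, the tiny vocabulary and the helper names are hypothetical toy values chosen so the arithmetic is easy to follow, not trained Word2Vec or GloVe embeddings.

```python
import numpy as np

# Hand-made vectors (axes loosely: royalty, gender, fruit). Toy values, not trained embeddings.
vocab = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Word closest to w_c + w_b - w_a, excluding the three query words."""
    target = vocab[c] + vocab[b] - vocab[a]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("man", "king", "woman"))  # prints: queen
```

Excluding the query words from the candidates follows common practice in analogy evaluation; with real embeddings the relation holds only approximately, hence the "often" in the slide.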
Understanding word embeddings: the W2V Loss Function
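$$-\ell_{W2V} = \sum_{i,j} \#(w_i, c_j) \log \sigma(w_i^\top c_j) + \frac{k\,\#(w_i)\,\#(c_j)}{D} \log \sigma(-w_i^\top c_j)$$
$$\nabla_{w_i} \ell_{W2V} \propto \sum_j \underbrace{\big(p(w_i, c_j) + k\,p(w_i)\,p(c_j)\big)}_{d_j^{(i)}} \underbrace{\big(\sigma(S_{i,j}) - \sigma(w_i^\top c_j)\big)}_{e_j^{(i)}} c_j = C\,\mathrm{diag}(d^{(i)})\,e^{(i)}$$
• $\ell_{W2V}$ minimised when:
  low-rank case: $w_i^\top c_j = \underbrace{\log \frac{p(c_j \mid w_i)}{p(c_j)}}_{\mathrm{PMI}(w_i, c_j)} - \log k \doteq S_{i,j}$ (Levy and Goldberg, 2014)
  general case: error vectors $\mathrm{diag}(d^{(i)})\,e^{(i)}$ orthogonal to rows of $C$
⇒ Embedding $w_i$ is a (non-linear) projection of row $i$ of the PMI matrix*, a PMI vector $p_i$.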
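The stationarity condition can be checked on a small example. The sketch below treats the loss as a function of the dot products $w_i^\top c_j$ directly and confirms that every error term vanishes when they equal the shifted PMI values $S_{i,j}$. The count matrix, the value of $k$ and all function names are made up for illustration; the proportionality in the gradient is taken up to sign and scale.

```python
import numpy as np

# Hypothetical co-occurrence counts #(w_i, c_j) for 3 target and 3 context words.
counts = np.array([[10.0, 2.0, 1.0],
                   [3.0, 8.0, 2.0],
                   [1.0, 4.0, 9.0]])
k = 5  # number of negative samples per positive pair (assumed value)

D = counts.sum()
p_wc = counts / D
p_w = p_wc.sum(axis=1, keepdims=True)   # p(w_i), column vector
p_c = p_wc.sum(axis=0, keepdims=True)   # p(c_j), row vector

S = np.log(p_wc / (p_w * p_c)) - np.log(k)   # shifted PMI, S_ij = PMI(w_i, c_j) - log k
d = p_wc + k * p_w * p_c                     # weights d_j^(i)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_loss(X):
    """-l_W2V as a function of the dot products X[i, j] = w_i^T c_j."""
    neg_counts = k * (D * p_w) * (D * p_c) / D   # k #(w_i) #(c_j) / D
    return float(np.sum(counts * np.log(sigmoid(X)) + neg_counts * np.log(sigmoid(-X))))

def grad_neg_loss(X):
    """Derivative of -l_W2V w.r.t. each dot product: D * d * (sigma(S) - sigma(X))."""
    return D * d * (sigmoid(S) - sigmoid(X))

print(np.allclose(sigmoid(S), p_wc / d))   # sigma(S_ij) = p(w,c) / (p(w,c) + k p(w) p(c))
print(np.abs(grad_neg_loss(S)).max())      # error terms e_j^(i) at X = S
print(neg_loss(S) > neg_loss(S + 0.1))     # moving away from S lowers -l_W2V
```

Working with the dot products rather than the embeddings sidesteps the rank constraint, so this only illustrates the case in which $S$ can be matched exactly.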
PMI Vectors
$$p_i = \left\{ \log \frac{p(c_j \mid w_i)}{p(c_j)} \right\}_{c_j \in \mathcal{E}} = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E})}$$
PMI Vector Interactions = Semantics (Similarity)
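Similarity: similar words, e.g. synonyms, induce similar distributions, $p(\mathcal{E} \mid w)$, over context words.
Identified by subtraction of PMI vectors:
$$p_i - p_j = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E} \mid w_j)} = \rho^{i,j}$$
[Figure: distributions $p(\mathcal{E} \mid \text{hound})$ and $p(\mathcal{E} \mid \text{dog})$ over context words $w_1, \dots, w_n$ ⟹ PMI vectors $p_{dog}$, $p_{hound}$.]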
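A small numerical sketch of the subtraction rule: the marginal $p(\mathcal{E})$ cancels, so the difference of two PMI vectors depends only on the two conditional distributions, and it is small when those distributions are similar. All probabilities below are invented toy numbers and the helper names are hypothetical.

```python
import numpy as np

p_E = np.array([0.4, 0.3, 0.2, 0.1])       # hypothetical p(E) over 4 context words
cond = {                                    # hypothetical p(E | w) for three words
    "dog":   np.array([0.50, 0.30, 0.15, 0.05]),
    "hound": np.array([0.48, 0.32, 0.14, 0.06]),
    "car":   np.array([0.10, 0.20, 0.30, 0.40]),
}

def pmi_vector(word):
    """PMI vector p_w = log p(E | w) / p(E), computed elementwise."""
    return np.log(cond[word] / p_E)

def rho(w_i, w_j):
    """rho^{i,j} = p_i - p_j, which equals log p(E | w_i) / p(E | w_j)."""
    return pmi_vector(w_i) - pmi_vector(w_j)

print(np.linalg.norm(rho("dog", "hound")))  # near-synonyms: small difference
print(np.linalg.norm(rho("dog", "car")))    # unrelated words: larger difference
```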
PMI Vector Interactions = Semantics (Paraphrase)
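Paraphrases: word sets with similar aggregate semantic meaning, e.g. {man, royal} ≈ king.
Identified by addition of PMI vectors:
$$p_i + p_j = \log \frac{p(\mathcal{E} \mid w_i)}{p(\mathcal{E})} + \log \frac{p(\mathcal{E} \mid w_j)}{p(\mathcal{E})} = p_k + \underbrace{\log \frac{p(\mathcal{E} \mid w_i, w_j)}{p(\mathcal{E} \mid w_k)}}_{\rho^{\{i,j\},k}\ \text{(paraphrase error)}} \underbrace{- \underbrace{\log \frac{p(w_i, w_j \mid \mathcal{E})}{p(w_i \mid \mathcal{E})\,p(w_j \mid \mathcal{E})}}_{\sigma^{i,j}} + \underbrace{\log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}}_{\tau^{i,j}}}_{\text{independence error}}$$
[Figure: distributions $p(\mathcal{E} \mid \text{king})$ and $p(\mathcal{E} \mid \{\text{man, royal}\})$ over context words $w_1, \dots, w_n$ ⟹ PMI vectors $p_{man}$, $p_{royal}$, $p_{\{man, royal\}}$.]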
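The decomposition is an exact identity, which can be confirmed numerically on any joint distribution. The sketch below draws a random joint over "does $w_i$ occur", "does $w_j$ occur" and a context word, plus an arbitrary $p(\mathcal{E} \mid w_k)$, and compares both sides. The array layout and variable names are assumptions made for this check only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx = 6  # number of context words (illustrative)

# Hypothetical joint P[a, b, c]: a, b in {0, 1} flag whether w_i, w_j occur; c indexes context words.
P = rng.random((2, 2, n_ctx))
P /= P.sum()

p_c = P.sum(axis=(0, 1))                  # p(E)
p_i = P[1].sum()                          # p(w_i)
p_j = P[:, 1].sum()                       # p(w_j)
p_ij = P[1, 1].sum()                      # p(w_i, w_j)

p_c_given_i = P[1].sum(axis=0) / p_i      # p(E | w_i)
p_c_given_j = P[:, 1].sum(axis=0) / p_j   # p(E | w_j)
p_c_given_ij = P[1, 1] / p_ij             # p(E | w_i, w_j)
p_i_given_c = P[1].sum(axis=0) / p_c      # p(w_i | E)
p_j_given_c = P[:, 1].sum(axis=0) / p_c   # p(w_j | E)
p_ij_given_c = P[1, 1] / p_c              # p(w_i, w_j | E)

p_c_given_k = rng.dirichlet(np.ones(n_ctx))   # arbitrary p(E | w_k) for a candidate paraphrase w_k

pmi_i = np.log(p_c_given_i / p_c)
pmi_j = np.log(p_c_given_j / p_c)
pmi_k = np.log(p_c_given_k / p_c)

rho = np.log(p_c_given_ij / p_c_given_k)                     # paraphrase error
sigma = np.log(p_ij_given_c / (p_i_given_c * p_j_given_c))   # conditional independence error
tau = np.log(p_ij / (p_i * p_j))                             # marginal independence error (scalar)

print(np.allclose(pmi_i + pmi_j, pmi_k + rho - sigma + tau))  # True
```

The identity holds for every choice of $w_k$; a good paraphrase is one for which the error terms are small, so that $p_i + p_j \approx p_k$.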
PMI Vector Interactions = Semantics (Analogy)
Analogies: word pairs that share a similar semantic difference, e.g. {man, king} and {woman, queen}.
Identified by a linear combination of PMI vectors:
$$p_{king} - p_{man} \approx p_{queen} - p_{woman}$$
[Figure: PMI vectors $p_{king}$, $p_{man}$, $p_{woman}$, $p_{queen}$.]
From Analogies to Relations
[Figure: an analogy between the PMI vectors $p_{king}$, $p_{man}$, $p_{woman}$, $p_{queen}$ corresponds (⇔) to a relation: man + offset ≈ king, woman + offset ≈ queen.]
▸ Analogies contain common binary word relations, similar to KGs.
▸ For certain analogies (“specialisations”), the associated “vector offset” gives a transformation that represents the relation.
▸ Not all relations fit this semantic pattern, but it gives insight into the geometric aspects (relation conditions) of other relation types.
Categorising Relations: semantics → relation requirements
[Figure: Relationships between PMI vectors for different relation types (Similarity, Relatedness, Specialisation, Context-shift, Gen. context-shift). Blue/green = strong word association (PMI > 0); red = relatedness; black = context sets.]
Categorisation of WN18RR relations (R = relatedness, S = specialisation, C = context-shift).

Type | Relation | Examples (subject entity, object entity)
R | verb group | (trim_down_VB_1, cut_VB_35), (hatch_VB_1, incubate_VB_2)
R | derivationally related form | (lodge_VB_4, accommodation_NN_4), (question_NN_1, inquire_VB_1)
R | also see | (clean_JJ_1, tidy_JJ_1), (ram_VB_2, screw_VB_3)
S | hypernym | (land_reform_NN_1, reform_NN_1), (prickle-weed_NN_1, herbaceous_plant_NN_1)
S | instance hypernym | (yellowstone_river_NN_1, river_NN_1), (leipzig_NN_1, urban_center_NN_1)
C | member of domain usage | (colloquialism_NN_1, figure_VB_5), (plural_form_NN_1, authority_NN_2)
C | member of domain region | (rome_NN_1, gladiator_NN_1), (usa_NN_1, multiple_voting_NN_1)
C | member meronym | (south_NN_2, sunshine_state_NN_1), (genus_carya_NN_1, pecan_tree_NN_1)
C | has part | (aircraft_NN_1, cabin_NN_3), (morocco_NN_1, atlas_mountains_NN_1)
Categorical completeness: are all relations covered?
▸ View PMI vectors as sets of word features and relation types as set operations:
  • similarity ⇒ set equality
  • relatedness ⇒ subset equality (relation-specific)
  • context-shift ⇒ set difference (relation-specific)
▸ For any relation, each feature is either
  • necessarily unchanged (relatedness),
  • necessarily/potentially changed (context shift), or
  • irrelevant.
▸ Conjecture: the relation types identified partition the set of semantic relations.
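The set view above can be made concrete with ordinary Python sets. This is only an informal sketch of the analogy: the feature labels, the word list and the three helper functions are hypothetical and stand in for the PMI-vector conditions, which are real-valued rather than binary.

```python
# Hypothetical feature sets; the labels are illustrative only.
features = {
    "dog":   {"animal", "canine", "pet"},
    "hound": {"animal", "canine", "pet"},
    "man":   {"human", "adult", "male"},
    "king":  {"human", "adult", "male", "royal"},
    "woman": {"human", "adult", "female"},
}

def similar(a, b):
    """Similarity: the two feature sets are equal."""
    return features[a] == features[b]

def related(a, b, shared):
    """Relatedness: equality restricted to a relation-specific subset of features."""
    return features[a] & shared == features[b] & shared

def context_shift(a, b):
    """Context shift: features dropped and features added when going from a to b."""
    return features[a] - features[b], features[b] - features[a]

print(similar("dog", "hound"))                       # True
print(related("man", "woman", {"human", "adult"}))   # True
print(context_shift("man", "king"))                  # (set(), {'royal'})
```

In this picture a specialisation such as man → king is a context shift that only adds features, while every feature outside the relation-specific subset is irrelevant to relatedness.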
Relations as mappings between embeddings
R: S-relatedness requires both entity embeddings $e_s, e_o$ to share a common subspace component $V_S$
  ▸ project onto $V_S$ (multiply by matrix $P_r \in \mathbb{R}^{d \times d}$) and compare.
  ▸ Dot product: $(P_r e_s)^\top (P_r e_o) = e_s^\top P_r^\top P_r e_o = e_s^\top M_r e_o$
  ▸ Euclidean distance: $\lVert P_r e_s - P_r e_o \rVert^2 = \lVert P_r e_s \rVert^2 - 2\, e_s^\top M_r e_o + \lVert P_r e_o \rVert^2$
S/C: requires S-relatedness and relation-specific component(s) $(v_r^s, v_r^o)$.
  ▸ project onto a subspace (by $P_r \in \mathbb{R}^{d \times d}$) corresponding to $S$, $v_r^s$ and $v_r^o$ (i.e. test S-relatedness while preserving relation-specific components);
  ▸ add relation-specific $r = v_r^o - v_r^s \in \mathbb{R}^d$ to transformed embeddings.
  ▸ Dot product: $(P_r e_s + r)^\top P_r e_o$
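The two identities for the relatedness case, and the extra additive term in the S/C case, can be verified with a few lines of NumPy. The sketch assumes an orthogonal projector onto a random subspace for $P_r$; the slides only require some matrix $P_r \in \mathbb{R}^{d \times d}$, so this choice, the dimensions and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3   # embedding dimension and subspace dimension (illustrative)

# Orthogonal projector onto a random k-dimensional subspace, standing in for V_S (assumed form of P_r).
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
P_r = Q @ Q.T
M_r = P_r.T @ P_r

e_s, e_o, r = rng.normal(size=(3, d))

# R: compare the projected embeddings.
dot = (P_r @ e_s) @ (P_r @ e_o)
dist2 = np.sum((P_r @ e_s - P_r @ e_o) ** 2)
expanded = np.sum((P_r @ e_s) ** 2) - 2 * (e_s @ M_r @ e_o) + np.sum((P_r @ e_o) ** 2)

# S/C: add the relation-specific vector r after projecting the subject.
sc_dot = (P_r @ e_s + r) @ (P_r @ e_o)

print(np.isclose(dot, e_s @ M_r @ e_o))          # multiplicative form e_s^T M_r e_o
print(np.isclose(dist2, expanded))               # squared distance expands around the same bilinear term
print(np.isclose(sc_dot, dot + r @ P_r @ e_o))   # additive component contributes r^T P_r e_o
```

The shared term $e_s^\top M_r e_o$ is why a multiplicative relation matrix appears under both proximity measures, with the additive vector entering only for the S/C types.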
Summary
▸ Theoretic: a derivation of geometric components of relation representations from word co-occurrence statistics.
▸ Interpretability: associates geometric model components with semantic aspects of relations.
▸ Empirically supported: justifies relative link-prediction performance of a range of models on real datasets:
  additive & multiplicative (MuRE*, Balažević et al., 2019a) > multiplicative (TuckER, Balažević et al., 2019b; DistMult, Yang et al., 2015) or additive (TransE, Bordes et al., 2013).
*Note: MuRE was inspired by the vector offset of analogies.
Thanks!
References
Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding Word Embeddings. In ICML, 2019.
Carl Allen, Ivana Balažević, and Timothy Hospedales. What the Vec? Towards Probabilistically Grounded Embeddings. In NeurIPS, 2019.
Carl Allen, Ivana Balažević, and Timothy Hospedales. Interpreting Knowledge Graph Relation Representation from Word Embeddings. In ICLR, 2021.
Ivana Balažević, Carl Allen, and Timothy M Hospedales. Multi-relational Poincaré Graph Embeddings. In NeurIPS, 2019a.
Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor Factorization for Knowledge Graph Completion. In EMNLP, 2019b.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS, 2013.
Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NeurIPS, 2014.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop, 2013.
Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. In EMNLP, 2014.
Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In ICLR, 2015.