Mark Steyvers
Department of Cognitive Sciences, University of California, Irvine
Network structure of word associations
Decentralized search in information networks
Analogy between Google and word retrieval
Word Association: 5,000+ words
Roget's Thesaurus: 1,000 categories; 29,000+ word forms
WordNet: 99,000+ word senses; 122,000+ word forms
1. Short Path Lengths
                          Association   Roget     WordNet
n = number of nodes       5,000+        30,000+   200,000+
D = diameter              5             10        27
L = average path length   3.04          5.6       10.6
2. Local Clustering
C = P( neighbors of a node are each other's neighbors ):
                          Association   Roget     WordNet
C                         .186          .875      .029
[Figure: example neighborhoods illustrating the extremes C = 0 and C = 1]
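These statistics are straightforward to compute with standard tooling. A minimal sketch in Python using networkx (the Watts-Strogatz graph below is a stand-in for an actual lexical network, which the slides do not provide):

```python
import networkx as nx

# Stand-in small-world graph; substitute a real lexical network if available.
G = nx.watts_strogatz_graph(n=1000, k=10, p=0.1, seed=0)

n = G.number_of_nodes()                  # number of nodes
D = nx.diameter(G)                       # longest shortest path
L = nx.average_shortest_path_length(G)   # average path length
C = nx.average_clustering(G)             # P(neighbors of a node are each other's neighbors)
print(f"n = {n}, D = {D}, L = {L:.2f}, C = {C:.3f}")
```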
3. Power-Law degree distributions
[Figure: exponential degree distributions (e.g., random graphs) vs. scale-free networks (power-law degree distributions, with hubs)]
[Figure: log-log degree distributions, P(k) vs. k, for Word Association, Roget's Thesaurus, and WordNet; power-law fits with γ = 3.01, γ = 3.19, and γ = 3.11, respectively]
[Figure: log-log plot of number of meanings vs. word frequency rank]
Slope in rank plot: a = .466
Adamic (2000): γ = 1 + 1/a
Slope in distribution plot: γ = 3.15
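A quick sketch of the Adamic (2000) conversion in Python: fit the slope a of the log-log rank plot, then γ = 1 + 1/a (the synthetic data below is illustrative, not the actual word-meaning counts):

```python
import numpy as np

def gamma_from_rank_plot(values):
    """Fit log(value) vs. log(rank); if the rank-plot slope is -a,
    Adamic (2000) gives the distribution exponent gamma = 1 + 1/a."""
    values = np.sort(np.asarray(values, dtype=float))[::-1]
    ranks = np.arange(1, len(values) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(values), 1)
    return 1 + 1 / (-slope)

# Synthetic Zipf-like data with rank-plot slope a = .466:
values = np.arange(1, 5001, dtype=float) ** -0.466
print(gamma_from_rank_plot(values))  # ~= 3.15, matching the distribution plot
```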
Small-world properties put qualitative constraints on theories of semantic representations (Griffiths, Steyvers, & Tenenbaum, 2007; Steyvers & Tenenbaum, 2005)
“Why is network anatomy so important to characterize? Because structure always affects function.” Strogatz (2001), Nature, 410, p. 268
Small-worlds allow efficient search processes
Random person in Nebraska → target person in Boston
Median of 6 steps (“6 degrees of separation”)
Key insight from Milgram (1967):
Short paths exist but can also be found using simple decentralized search algorithms.
No global knowledge of the network is required, only a notion of proximity to the target.
Search time O( log n ) with small-world structures.
Relevance for finding information on peer-to-peer networks without using a global index.
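A minimal sketch of such a decentralized (greedy) search in Python: each step uses only the current node's neighbors plus a local notion of proximity to the target (the toy graph and proximity function below are illustrative, not Milgram's data):

```python
def greedy_search(graph, proximity, start, target, max_steps=100):
    """Decentralized search: at each step, move to the neighbor closest to
    the target, using only local knowledge of the network (no global index).

    graph: dict mapping node -> list of neighboring nodes
    proximity: function(node, target) -> number (higher = closer)
    """
    path, current = [start], start
    for _ in range(max_steps):
        if current == target:
            return path
        # Inspect only the current node's neighbors.
        current = max(graph[current], key=lambda v: proximity(v, target))
        path.append(current)
    return path  # greedy routing may stall without reaching the target

# Illustrative toy network with a rough geographic proximity.
graph = {
    "Nebraska": ["Iowa", "Kansas"],
    "Iowa": ["Nebraska", "Chicago"],
    "Kansas": ["Nebraska"],
    "Chicago": ["Iowa", "Boston"],
    "Boston": ["Chicago"],
}
order = ["Nebraska", "Kansas", "Iowa", "Chicago", "Boston"]
proximity = lambda v, t: -abs(order.index(v) - order.index(t))
print(greedy_search(graph, proximity, "Nebraska", "Boston"))
```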
Global memory models (e.g., Minerva, REM, etc.) vs. proposed model
[Figure: in global models, a cue is matched against every memory trace; in the proposed model, the cue reaches the target memory through links between traces]
Proposed model:
- Local connectivity
- No global index into memory
- Notion of proximity to target
- Decentralized search of target information
(Griffiths, Steyvers, & Firl, 2007, Psychological Science)
Associative semantic networks vs. World Wide Web
[Figure: a semantic network of words (Cat, Dog, Pet, Bone) alongside a web graph of linked pages]
Google PageRank uses network structure to rank documents.
Can we use PageRank to explain how humans retrieve words using semantic network structure?
How can Google “know” that we are interested in this page out of 377,000?
Measure for the relative importance of a webpage based on link structure.
Key idea: the relationship between importance and linking is recursive: a highly important webpage is a webpage that receives many links from other highly important webpages.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30, 107-117.
Adjacency matrix L: L_ij = 1 if there is a link from document j to document i
Normalize matrix L to create a Markov chain
PageRank values correspond to the principal eigenvector of L
[Figure: example network of four pages A, B, C, D with computed PageRank values P = .36, .18, .34, .12]
Note: PageRank also assumes an additional process: at each time step, there is a 0.15 probability of a “teleport” jump to any page in the network.
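A minimal sketch of the computation in Python: column-normalize L, then power-iterate with the 0.15 teleport probability until the vector converges to the principal eigenvector (the four-page link structure below is hypothetical; the slide does not list the example network's links):

```python
import numpy as np

def pagerank(L, teleport=0.15, tol=1e-10, max_iter=1000):
    """Compute PageRank by power iteration on adjacency matrix L,
    where L[i, j] = 1 if there is a link from node j to node i."""
    n = L.shape[0]
    # Column-normalize L so each column sums to 1 (a Markov chain);
    # dangling nodes (no outgoing links) jump uniformly at random.
    out_degree = L.sum(axis=0)
    M = L / np.where(out_degree == 0, 1, out_degree)
    M[:, out_degree == 0] = 1.0 / n
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # With prob. 1 - teleport follow a link; with prob. teleport jump anywhere.
        p_next = (1 - teleport) * (M @ p) + teleport / n
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

# Hypothetical four-page network: A -> B, A -> C, B -> C, C -> A, D -> C.
names = ["A", "B", "C", "D"]
L = np.zeros((4, 4))
for j, i in [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]:  # (source, destination)
    L[i, j] = 1
print(dict(zip(names, pagerank(L).round(2))))
```

The returned vector is the stationary distribution of the random surfer, i.e., the principal eigenvector of the teleport-adjusted transition matrix.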
PageRank as the relative importance of a word
Does this explain word production in humans?
Is it better than word frequency or other measures of availability?
[Figure: network of words]
Letter fluency task: cue participant with a letter of the alphabet, e.g., “D _______”
Participants give the first word that comes to mind, e.g., DOG, DAD, DOOR
50 participants, 21 letters of the alphabet
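One simple way to connect a predictor to this task (a sketch only; the scores below are made up, and the slides do not specify the exact scoring rule behind the percentages reported later):

```python
def predict_response(letter, scores):
    """Predict the fluency response to a letter cue: the word starting with
    that letter that ranks highest under a predictor such as PageRank,
    in-degree, or word frequency."""
    candidates = {w: s for w, s in scores.items() if w.startswith(letter)}
    return max(candidates, key=candidates.get)

# Hypothetical predictor scores (e.g., PageRank values):
scores = {"dog": 0.31, "dad": 0.22, "door": 0.12, "cat": 0.35}
print(predict_response("d", scores))  # -> dog
```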
Nelson et al. (1998): 5018 × 5018 matrix of word associations
Applied PageRank on the unweighted semantic association matrix
[Figure: semantic network fragment with Cat, Dog, Pet, Bone]
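A sketch of how this might look in code, reusing the pagerank() function above; the toy matrix stands in for the 5018 × 5018 Nelson et al. (1998) norms, and its values are illustrative:

```python
import numpy as np

# Toy cue-by-response association matrix: W[c, r] = strength with which
# cue word c evokes response word r (illustrative values).
words = ["cat", "dog", "pet", "bone"]
W = np.array([
    [0.0, 0.3, 0.5, 0.0],   # cat  -> dog, pet
    [0.2, 0.0, 0.4, 0.3],   # dog  -> cat, pet, bone
    [0.4, 0.4, 0.0, 0.0],   # pet  -> cat, dog
    [0.0, 0.6, 0.0, 0.0],   # bone -> dog
])

# Unweighted variant: keep only whether an association exists, transposed
# into the L_ij = "link from j to i" convention used by pagerank() above.
pr = pagerank((W > 0).astype(float).T)
for word, score in sorted(zip(words, pr), key=lambda x: -x[1]):
    print(f"{word}: {score:.3f}")
```

The weighted variant in the table below would skip the binarization and pass the raw strengths (pagerank(W.T) with this implementation, which column-normalizes whatever it is given).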
Predictor                 All words   Nouns only   Concrete nouns
PageRank                  8.3%        8.2%         13.3%
In-degree                 10.0%       14.8%        17.5%
Word frequency (TASA)     19.0%       22.5%        21.6%
Weighted PageRank         7.1%        8.6%         13.3%
Weighted in-degree        8.2%        13.0%        16.7%
PageRank on word association might approximate psychological mechanisms
Random walk on a semantic network: “random mental surfing”
[Figure: random walk over a semantic network with Cat, Dog, Pet, Bone]
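Read literally, “random mental surfing” can be simulated: follow a random association from the current word, occasionally restarting at a random word; long-run visit frequencies approximate the PageRank values (a sketch with an illustrative toy network):

```python
import random

def mental_surfing(graph, steps=100_000, teleport=0.15, seed=0):
    """Simulate 'random mental surfing': follow a random association,
    occasionally jumping to a random word. Visit frequencies approximate
    the PageRank values of the network."""
    rng = random.Random(seed)
    nodes = list(graph)
    counts = {w: 0 for w in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        counts[current] += 1
        if rng.random() < teleport or not graph[current]:
            current = rng.choice(nodes)           # restart anywhere
        else:
            current = rng.choice(graph[current])  # follow an association
    return {w: c / steps for w, c in counts.items()}

# Toy semantic network from the figure (links are illustrative).
graph = {
    "cat": ["pet", "dog"],
    "dog": ["pet", "bone", "cat"],
    "pet": ["cat", "dog"],
    "bone": ["dog"],
}
print(mental_surfing(graph))
```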
Neighborhood structure of words is useful for prediction:
- Local structure, e.g., number of incoming and outgoing connections (fan-in, fan-out)
- Global structure, e.g., Google PageRank
Similar computational demands: both retrieve the most relevant items from a large information repository in response to external cues or queries.
Useful analogies / interdisciplinary approaches