
Department of Cognitive Sciences, University of California, Irvine


Academic year: 2021


(1)

Mark Steyvers

Department of Cognitive Sciences University of California, Irvine


(2)

• Network structure of word associations

• Decentralized search in information networks

• Analogy between Google and word retrieval

(3)

n words = 5,000+

(4)

Categories: 1,000

Word forms: 29,000+

(5)

Word senses: 99,000+

Word forms: 122,000+

(6)

                                   Word Association   Roget      WordNet

1. Short Path Lengths

n = number of nodes                5,000+             30,000+    200,000+
D = diameter                       5                  10         27
L = average path length            3.04               5.6        10.6

2. Local Clustering

C = P( neighbors of a node are     .186               .875       .029
       each other’s neighbors )

[Figure: example neighborhoods with C=0 and C=1]

3. Power-Law degree distributions
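The first two properties listed above (short path lengths and local clustering) can be computed directly from an adjacency list. The following is a minimal sketch, assuming an undirected, connected graph; the four-word toy graph and the function names `path_stats` and `clustering` are illustrative and not taken from the slides or the underlying data sets.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (L, D): average shortest path length and diameter (connected graph assumed)."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

def clustering(adj):
    """Average over nodes of P(two neighbors of the node are themselves linked)."""
    coeffs = []
    for u, nbrs in adj.items():
        if len(nbrs) < 2:
            continue  # C is undefined for nodes with fewer than two neighbors
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(linked / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(coeffs) / len(coeffs)

# Toy undirected graph (hypothetical words, not taken from the association norms).
edges = [("cat", "dog"), ("cat", "pet"), ("dog", "pet"), ("dog", "bone")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

L, D = path_stats(adj)
C = clustering(adj)
print(f"L = {L:.2f}, D = {D}, C = {C:.2f}")
```

On networks of the size reported in the table, all-pairs breadth-first search is slow in pure Python, so a graph library or sampling of source nodes would likely be preferable.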

(7)

Exponential degree distributions: e.g., random graphs

Scale-free (power-law degree distributions): HUBS

(8)

[Figure: log-log plots of the degree distribution P( k ) against degree k for the word association, WordNet, and Roget's Thesaurus networks; fitted exponents shown: γ=3.01, γ=3.19, γ=3.11]

(9)

[Figure: number of meanings (# meanings) plotted against word frequency rank]

Slope in rank plot: a = .466

Adamic (2000): γ = 1 + 1/a

Slope in distribution plot: γ = 3.15
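The conversion from the rank-plot slope to the distribution exponent can be checked with a few lines of arithmetic. This is only a sketch of the relation γ = 1 + 1/a quoted above; the function name `distribution_exponent` is hypothetical.

```python
def distribution_exponent(rank_slope):
    """Convert a rank-plot slope a into a distribution exponent gamma = 1 + 1/a."""
    return 1.0 + 1.0 / rank_slope

gamma = distribution_exponent(0.466)
print(round(gamma, 2))  # 3.15
```

With a = .466 the relation gives γ ≈ 3.146, which rounds to the 3.15 reported for the distribution plot.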

(10)

• Small-world properties put qualitative constraints on theories of semantic representations

(Griffiths, Steyvers, Tenenbaum, 2007; Steyvers & Tenenbaum, 2005)

(11)

• “Why is network anatomy so important to characterize? Because structure always affects function.” Strogatz (2001). Nature, 410, p. 268

• Small-worlds allow efficient search processes

(12)


(13)

Random person in Nebraska

Median of 6 steps (“6 degrees of separation”)

Target person in Boston

(14)

• Key insight from Milgram 1967:

  • Short paths not only exist but can also be found using simple decentralized search algorithms

  • No global knowledge of the network is required, only a notion of proximity to the target

  • Search time O( log n ) with small-world structures

• Relevance for finding information on peer-to-peer networks without using a global index
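The decentralized search idea above can be sketched as greedy routing: each node forwards the message to whichever of its own neighbors is closest to the target. The network model here is an assumption for illustration only (a ring lattice with one extra long-range link per node, drawn with probability proportional to 1/distance, in the style of Kleinberg's models); the slides do not specify a model, and the function names are hypothetical.

```python
import random

def ring_distance(a, b, n):
    """Distance between positions a and b on a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)

def build_small_world(n, rng):
    """Ring lattice plus one long-range link per node, with P(d) proportional to 1/d (assumed)."""
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    offsets = list(range(2, n // 2 + 1))
    weights = [1.0 / d for d in offsets]
    for i in range(n):
        d = rng.choices(offsets, weights=weights)[0]
        j = (i + rng.choice((-1, 1)) * d) % n
        adj[i].add(j)
        adj[j].add(i)
    return adj

def greedy_search(adj, source, target, n):
    """Forward the message to the neighbor closest to the target.

    Each node uses only its own links and the ring distance to the target,
    never a global map of the network.
    """
    path = [source]
    current = source
    while current != target:
        current = min(adj[current], key=lambda v: ring_distance(v, target, n))
        path.append(current)
    return path

rng = random.Random(0)
n = 1000
adj = build_small_world(n, rng)
path = greedy_search(adj, 0, n // 2, n)
steps = len(path) - 1
print(f"reached target in {steps} steps; ring distance was {ring_distance(0, n // 2, n)}")
```

Because every node keeps its two ring neighbors, each greedy step moves strictly closer to the target, so the search always terminates. The long-range links are what can shorten the route below the ring distance.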

(15)

Global memory models (e.g. Minerva, REM, etc)

[Figure: a cue and a set of memory traces]

Proposed Model

[Figure: a cue, memory traces, and the target memory]

• Local connectivity

• No global index into memory

• Notion of proximity to target

• Decentralized search of target information

(16)

(Griffiths, Steyvers, Firl, 2007; Psych Science)

(17)

Associative semantic networks

[Figure: network of the words Cat, Pet, Dog, Bone]

World Wide Web

[Figure: network of linked pages]

Google PageRank uses network structure to rank documents

Can we use PageRank to explain how humans retrieve words using semantic network structure?

(18)


(19)

How can Google “know” that we are interested in this page out of 377,000?

(20)

• Measure for the relative importance of a webpage based on link structure

• Key idea:

  • the relationship between importance and linking is recursive: a highly important webpage is a webpage that receives many links from other highly important webpages

Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30, 107-117.

(21)

• Adjacency matrix L: L_ij = 1 if there is a link from document j to i

• Normalize matrix L to create a Markov chain

• PageRank values correspond to the principal eigenvector of the normalized matrix L
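The three steps above can be sketched with power iteration, which repeatedly applies the normalized link matrix until the rank vector stops changing. This is a minimal illustration, not Google's implementation: the four-page link pattern is hypothetical, the 0.85 damping value corresponds to the commonly used 0.15 teleport probability, and spreading the rank of pages without outgoing links evenly is one common convention assumed here.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Teleport term: every page receives an equal share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Normalization step: a page splits its rank over its out-links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly (assumed convention).
                for q in pages:
                    new[q] += damping * rank[p] / n
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank

# Hypothetical four-page web; the link pattern is illustrative only.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page, value in sorted(rank.items()):
    print(page, round(value, 3))
```

In this toy graph page C collects links from three pages and ends up with the highest rank, while D, which nothing links to, keeps only its teleport share.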

(22)

[Figure: example network of four pages A, B, C, D]

Note: PageRank also assumes an additional process. At each time step, there is a 0.15 probability of a “teleport” jump to any page in the network.

(23)

[Figure: the example network labeled with PageRank values P=.36, P=.18, P=.34, P=.12]

(24)

• PageRank as the relative importance of a word

• Does this explain word production in humans?

• Is it better than word frequency or other measures of availability?

[Figure: network of linked word nodes]

(25)

• Cue participant with letter of alphabet.

  • D _______

• Participants give first word that comes to mind:

  • E.g. DOG, DAD, DOOR

• 50 participants, 21 letters of alphabet

(26)

A B C D P


(27)

• Nelson et al. (1998)

• 5018 x 5018 matrix of associations

• Applied PageRank on unweighted semantic association matrix

[Figure: example associations among Cat, Pet, Dog, Bone]

(28)


(29)

Predictor                 All words

PageRank                  8.3%
In-degree                 10.0%
Word frequency (TASA)     19.0%

(30)

Predictor                 All words

PageRank                  8.3%
In-degree                 10.0%
Word frequency (TASA)     19.0%
Weighted PageRank         7.1%
Weighted in-degree        8.2%

(31)

Predictor                 All words    Nouns Only    Concrete Nouns

PageRank                  8.3%         8.2%          13.3%
In-degree                 10.0%        14.8%         17.5%
Word frequency (TASA)     19.0%        22.5%         21.6%
Weighted PageRank         7.1%         8.6%          13.3%
Weighted in-degree        8.2%         13.0%         16.7%

(32)

• PageRank on word association might approximate psychological mechanisms

• Random walk on a semantic network

• “Random mental surfing”

[Figure: network of the words Cat, Bone, Pet, Dog]
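The random walk reading of PageRank can be illustrated by simulation: a walker follows associations from word to word, occasionally jumping to a random word, and the fraction of time spent on each word approximates that word's PageRank. The sketch below makes several assumptions that are not in the slides: the directed associations among the four words are invented, the 0.15 jump probability mirrors the teleport value used for web PageRank, and `random_surfer` is a hypothetical name.

```python
import random

def random_surfer(links, steps, teleport=0.15, seed=0):
    """Estimate visit frequencies of a random walk with teleport jumps."""
    rng = random.Random(seed)
    nodes = sorted(links)
    visits = {w: 0 for w in nodes}
    current = nodes[0]
    for _ in range(steps):
        outs = links[current]
        if not outs or rng.random() < teleport:
            current = rng.choice(nodes)   # jump to a random word
        else:
            current = rng.choice(outs)    # follow an association
        visits[current] += 1
    return {w: c / steps for w, c in visits.items()}

# Hypothetical directed associations (cue -> responses), for illustration only.
links = {
    "cat": ["dog", "pet"],
    "dog": ["cat", "bone", "pet"],
    "bone": ["dog"],
    "pet": ["cat", "dog"],
}
freq = random_surfer(links, steps=100_000)
for word in sorted(freq, key=freq.get, reverse=True):
    print(word, round(freq[word], 3))
```

In this toy network "dog" receives associations from every other word and is visited most often, while "bone", reached only from "dog", is visited least.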

(33)

• Neighborhood structure of words is useful for prediction

  • Local structure

    • e.g., number of incoming and outgoing connections (fan-in, fan-out)

  • Global structure

    • e.g. Google PageRank

(34)

• Similar computational demands:

  • Both retrieve the most relevant items from a large information repository in response to external cues or queries.

• Useful analogies / interdisciplinary approaches
