
DanNet: From Dictionary to Wordnet

Jörg Asmussen

Society for Danish Language and Literature, DSL, Copenhagen

Bolette Sandford Pedersen

Centre for Language Technology, CST, University of Copenhagen

Lars Trap-Jensen

Outline

1. Introduction (LTJ, 2 min.)

2. Characteristics of the DDO (LTJ, 5 min.)

3. Building DanNet (BSP, 8 min.)

4. Extraction of differentia info (JA, 7 min.)

5. Conclusions

DanNet

Lexical-semantic wordnet for Danish

Joint project

Society for Danish Language and Literature

Centre for Language Technology, University of Copenhagen

Limited resources

Adapt an existing wordnet? or

Reuse other lexical-semantic resources:

SIMPLE-DK

Den Danske Ordbog

Published by DSL 2003–5

Corpus-based, DDOC

60,000 entries

Spelling, morphology, pronunciation, meaning, collocations, fixed phrases, syntax, usage, word formation, etymology

Words edited in related groups

Machine readable

Fine-grained microstructure

Systematic domain info

concerns relation

Sense definition

relevant info „manually“ extracted

Hyperonym

Sense relations, i.e. synonyms

Collocational information

Authentic example

Definitions in the DDO

Definition scheme:

Genus proximum – closest hyperonym:

apparat ‚technical device‘

Differentia specifica – distinctive feature: remaining part of the definition

Building DanNet

Extract definitions and genus specifications

Include them in the DanNet tool

Use it for domain-wise development of data:

1. Homonymy and polysemy

2. Establishing synsets

Homonymy & polysemy

celle ‚cell‘ is genus proximum of

gærcelle ‚yeast cell‘

fængselscelle ‚prison cell‘

Convert lexical expressions into concepts:

celle-1 ‚part of living organism‘

Establishing synsets

lære ‚studies‘, fag ‚subject‘, videnskab ‚science‘

informatik ‚informatics‘, bromatologi ‚nutrition science‘, samfundsfag ‚social studies‘, datalogi ‚computer science‘

One synset

Building the hierarchy

Hyponymy is generally defined as

X is a Y

Taxonymy is a subtype of this:

X is a kind/type of Y

Example: Hyponymy?

træ ‚tree‘

kirsebærtræ ‚cherry tree‘, birketræ ‚birch‘

vejtræ ‚roadside tree‘: „orthogonal“ hyponymy

Building the hierarchy

stol ‚chair‘

siddemøbel ‚sitting furniture‘

møbel ‚furniture‘

genstand ‚object‘ TOP
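
The chain above can be read as a lookup from each word to its genus proximum. The following is a minimal sketch of that idea, not the DanNet tool itself: the `GENUS` table and the function name `hyperonym_path` are illustrative assumptions, and only the four words from the slide are included.

```python
# Word -> genus proximum, taken from the slide; None marks TOP.
# The table layout and names are assumptions made for illustration.
GENUS = {
    "stol": "siddemøbel",
    "siddemøbel": "møbel",
    "møbel": "genstand",
    "genstand": None,
}


def hyperonym_path(word, genus=GENUS):
    """Follow genus links from `word` upwards until TOP is reached."""
    path = [word]
    seen = {word}
    while genus.get(path[-1]) is not None:
        parent = genus[path[-1]]
        if parent in seen:
            # A dictionary may define words circularly; stop instead of looping.
            raise ValueError(f"circular genus chain at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path


print(" -> ".join(hyperonym_path("stol")))
```

The cycle check is there because genus terms extracted from a dictionary are not guaranteed to form a tree.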

Definition composition

Genus selection – a conscious process

Differentia:

No editorial specifications, i.e. no fixed definition vocabulary nor syntax

Consequences for DanNet:

Complicates computational exploitation

Coding relations

What is done manually:

No semantic info other than that of DDO

Reduction of semantic info

What is done automatically:

Extraction of telic role

genus expression

fjernsyn ‚tv set‘

‚box-shaped device that can receive tv signals and transform them into animated pictures on a screen and accompanying sound in the speakers of the device‘

Telic role:

Hypothesis

VPs in a relative clause which are headed by kan ‚can‘ specify the telic role (i.e. the for_purpose_of relation) of the definiendum

Corpus query

Find all definitions with genus apparat followed by der or som followed by kan
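
A query of this shape could be approximated with a regular expression over the definition texts. The sketch below is an assumption about how such a query might look, not the query actually run against the DDO: the names `PATTERN`, `telic_candidates` and `SAMPLE` are hypothetical, and the four definitions are invented toy strings, not quotations from the dictionary.

```python
import re

# Genus "apparat", immediately followed by the relativizer "der" or "som",
# immediately followed by "kan"; the rest of the definition is the candidate VP.
PATTERN = re.compile(r"\bapparat\s+(?:der|som)\s+kan\s+(?P<vp>.+)", re.IGNORECASE)


def telic_candidates(definitions):
    """Return {headword: VP text} for definitions matching the pattern."""
    hits = {}
    for headword, definition in definitions.items():
        match = PATTERN.search(definition)
        if match:
            hits[headword] = match.group("vp").strip()
    return hits


# Invented toy definitions with placeholder headwords (NOT taken from the DDO).
SAMPLE = {
    "ord_a": "apparat der kan måle tryk",
    "ord_b": "apparat som kan afspille lyd",
    "ord_c": "lille apparat til at måle tryk",                 # different structure
    "ord_d": "apparat der ved hjælp af lys kan måle afstand",  # interposed material
}

for headword, vp in telic_candidates(SAMPLE).items():
    print(headword, "->", vp)
```

Only the first two toy definitions match; the strict adjacency of genus, relativizer and kan leaves the other two out.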

Results of corpus query

Query: VP heads denoting telic role

Only 26 occurrences of this pattern, but 203 dictionary entries

Why this bad coverage?

1. Definitions where the pattern contains interposed material are not captured

2. Structural patterns other than the one given in our hypothesis also indicate a for_purpose_of relation

Further patterns

1. GE that can VP-inf

2. GE that is used for to VP-inf with

3. GE for to VP-inf with/on/in

4. GE that VP-fin

5. GE for NP

6. GE that is specially designed for to VP-inf

head for_purpose_of

These patterns capture 70% of the apparat

A statistical approach

Frequency list of types in definitions with genus apparat

compared with

frequency list of types in all definitions

using a statistical test (e.g. log likelihood)

Salient types are listed for investigation and may give hints on semantic relations
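
The comparison described above can be sketched with the log-likelihood (G2) statistic in its common two-corpus form. This is an illustration under stated assumptions, not the procedure used in the project: the function names are hypothetical, the token lists are invented, and the reference list here stands for definitions outside the apparat group.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a type occurring a times in c target tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def salient_types(target_tokens, reference_tokens, top=5):
    """Rank types that are over-represented in the target by their G2 score."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep only types relatively more frequent in the target
            scored.append((log_likelihood(a, b, c, d), word))
    return [word for _, word in sorted(scored, reverse=True)[:top]]


# Invented toy token lists (NOT taken from the DDO).
target = "apparat som kan måle apparat til at måle".split()
reference = "møbel til at sidde på person som kan tale træ som kan vokse".split()
print(salient_types(target, reference, top=2))
```

Function words such as som and kan drop out because they are no more frequent in the target than in the reference, which is the effect the comparison relies on.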

Some salient types

afspille ‚to play back‘

afspilning ‚play back‘

måle ‚measure‘

måler ‚measuring tool‘

måling ‚gauging‘

målinger ‚measurements‘

grammofon, cd-afspiller, afspiller, sequencer, diktafon, kassettespiller, hjemmevideo, kassettebåndoptager, båndoptager

stroboskop, måler, timer, løgnedetektor, ekkolod, gasmåler, speedometer, omdrejningstæller, benzinmåler, fotofælde, elmåler, trykmåler, luxmeter, spirometer, gyrometer, alkometer, newtonmeter, magnetometer, instrument, kalorimeter, måleinstrument

Automatic extraction?

Basically NO...

Developing reliable methods is too expensive!

Structural and lexical properties of definitions differ considerably

Difficult to automatically extract semantic relations from definitions

Concordances and lists of salient definition types may help the editor

But the DanNet editor still has to do the core job of analysing dictionary definitions

Conclusion

Reusing the DDO

Semi-automatic exploitation of the dictionary structure

hyponymy structure

synonym/antonym info

Automatic exploitation of definitions proper to find other semantic relations

Cheap

The DanNet approach

Translation/expansion of existing WNs?

Better coherence with other WNs

Linguistic bias

Reusing/merging language resources?

More loyal to the specific language

Expensive, unless based on an existing resource, i.e. a dictionary

Expensive Cheap
