
Computational Tools for Linguistic Data - March 15, 2002

Utrecht Linguistic Database

Maaike Schoorlemmer Lennart Herlaar

Harmen van der Iest

Martin Everaert Alexis Dimitriadis Peter Ackema

Computational Tools for Linguistic Data March 15, 2002


Introduction

• Project overview

• Some examples

• Rapid Application Development

• Current status

• Concluding remarks


Project overview (1)

• Project goal

• Research tool

• Functional requirements

• Storage of linguistic material of any size and type

• Annotation of any type at any level

• Search facilities


Project overview (2)

• Project management

• Linguistic feature design

• Technical infrastructure design

• ER modeling

• Database implementation

• Rapid Application Development (RAD)


Linguistic features

• Data

• Annotations

• Judgments

• Paradigms

• Languages

• Typology

• Spellings

• Scripts

• Sources

• Questionnaires

• People

• History


Technical infrastructure

• Components

• MS-SQL Server database on Windows 2000 platform for data storage and management

• Fully Unicode compliant

• Keyboard mappings

• Delphi 6 client on Windows 9x/NT/2000/XP platform for data maintenance

• WWW-based client for data querying

[Diagram: ULD client, RDBMS (MS-SQL), Database, WWW server, WWW client]


Entity-Relationship model


Database implementation


Samples and examples

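Example: Gisteren vond ik een boek op straat ("Yesterday I found a book on the street")

Example: Ik heb het maar meegenomen ("I just took it with me")

Example: Ik heb haar maar meegenomen ("I just took her with me")

[Diagram: two Sample boxes grouping the Examples above]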


Relevant partial ER model


Bracketing and trees

[Tree diagram with nodes CP, AdvP (gisteren), C0 (vond), IP, DP (ik), I0, VP, DP, D (een), NP, N (boek), PP, P (op), DP (straat), and traces tV, tDP]

[CP Gisteren vond [IP ik tV [VP tDP [V’ [DP een [NP boek]] [V’ [PP op [DP straat]] tV]]]]]
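The labeled bracketing above can be turned back into a tree mechanically. The following is a minimal sketch, not the ULD implementation: the `Node` class and `parse_bracketing` function are hypothetical names, and the input is assumed to have a space after each category label and a plain ASCII apostrophe in `V'`.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A constituent: a category label plus child nodes or words."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words and traces in left-to-right order."""
        out: List[str] = []
        for child in self.children:
            if isinstance(child, Node):
                out.extend(child.leaves())
            else:
                out.append(child)
        return out


def parse_bracketing(text: str) -> Node:
    """Parse '[LABEL item item ...]' where an item is a word or a bracket."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in "[]":
            raise ValueError("expected a category label")
        node = Node(label=tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ']'")
        pos += 1  # consume the closing bracket
        return node

    root = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the closing bracket")
    return root


SENTENCE = ("[CP Gisteren vond [IP ik tV [VP tDP [V' [DP een [NP boek]] "
            "[V' [PP op [DP straat]] tV]]]]]")
tree = parse_bracketing(SENTENCE)
print(tree.label, tree.leaves())
```

Keeping the traces (`tV`, `tDP`) as ordinary leaves is a simplification; a real system would probably link each trace to the constituent that moved.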


Relevant partial ER model


Annotations

Gisteren vond ik een boek op straat

Ik heb het maar meegenomen

Annotation (Label: Subject inversion; Type: Word order)

Annotation (Label: Coreferent; Type: Coreference)


Relevant partial ER model


Judgments

Gisteren vond ik een boek op straat

Ik heb haar maar meegenomen

Annotation (Label: Coreferent; Type: Coreference)

Judgment (Type: *; Analyst: 2273)


Relevant partial ER model


Annotations and judgments

Ik nam het maar mee

Gisteren vond ik een boek op straat

Annotation (Label: Closure; Type: Narrative structure)

Judgment (Type: *; Analyst: 2273)


Morphological analysis

[vond-Ø]

find

Annotation (Label: Past; Type: Morphological gloss)

Annotation (Label: Sg; Type: Morphological gloss)

Annotation (Label: Stem; Type: Morphological gloss)

Annotation (Label: Umlaut; Type: Morphological gloss)


Relevant partial ER model


Groupings

[vond-en]

find

[vond-Ø]

find

Annotation (Label: Sg; Type: Morph. gloss)

Annotation (Label: Pl; Type: Morph. gloss)

Annotation (Label: Past; Type: Morph. gloss)

Annotation (Label: Stem; Type: Morph. gloss)

Annotation (Label: Umlaut; Type: Morph. gloss)

Grouping (Type: Paradigm; Name: Past tense “vinden”)

vond Sg

vonden Pl


Relevant partial ER model


Groupings

Gisteren vond ik een boek op straat

Ik heb het maar meegenomen

Ik heb haar maar meegenomen

Annotation (Label: Coreferent; Type: Coreference)

Judgment (Type: OK; Analyst: 2273)

Annotation (Label: Coreferent; Type: Coreference)

Judgment (Type: *; Analyst: 2273)

Grouping (Type: Contrast; Name: Pronominal reference)
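The relation between samples, annotations, judgments and groupings on this slide can be sketched in code. This is an illustrative model only: the class and field names are hypothetical and do not reproduce the actual ULD schema, and the pairing of the "OK" judgment with "het" and the "*" judgment with "haar" is an assumption based on the Judgments slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Judgment:
    type: str      # e.g. "OK" or "*"
    analyst: int   # analyst identifier as shown on the slide


@dataclass
class Annotation:
    label: str
    type: str
    judgment: Optional[Judgment] = None


@dataclass
class Sample:
    examples: List[str]
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Grouping:
    type: str
    name: str
    members: List[Sample] = field(default_factory=list)

    def samples_judged(self, judgment_type: str) -> List[Sample]:
        """Return member samples with an annotation judged as judgment_type."""
        return [
            s for s in self.members
            if any(a.judgment is not None and a.judgment.type == judgment_type
                   for a in s.annotations)
        ]


FIRST = "Gisteren vond ik een boek op straat"

# Assumption: coreference with "het" is judged OK, with "haar" it is starred.
accepted = Sample(
    [FIRST, "Ik heb het maar meegenomen"],
    [Annotation("Coreferent", "Coreference", Judgment("OK", 2273))],
)
rejected = Sample(
    [FIRST, "Ik heb haar maar meegenomen"],
    [Annotation("Coreferent", "Coreference", Judgment("*", 2273))],
)
contrast = Grouping("Contrast", "Pronominal reference", [accepted, rejected])

starred = contrast.samples_judged("*")
print([s.examples[-1] for s in starred])
```

Attaching the judgment to the annotation rather than to the sample means an analyst's verdict applies to one specific reading, which appears to be what the slide depicts.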


Typology

Language (Name: Standard Dutch)

Language typology (Typological property: V2)

Condition for use: Main clause

Typical example: Gisteren vond ik een boek op straat


Relevant partial ER model


Conditions for use (1)

Language (Name: Standard Dutch)

Language typology (Typological property: V2)

Condition for use: Main clause

If: main clause

unless: topic drop, yes-no question, non-finite, or introduced by conjunction (unless: “want”, “dus”)


Conditions for use (2)

main clause AND NO (topic drop OR yes-no question OR non-finite OR (introduced by conjunction AND NO (“want” OR “dus”)))

[Expression tree of the condition above: AND (main clause, NO (OR (topic drop, yes-no question, non-finite, AND (introduced by conjunction, NO (OR (“want”, “dus”))))))]
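The condition for use is a Boolean expression tree, so it can be evaluated against the set of properties that hold for a given clause. The following is a minimal sketch under stated assumptions: the classes `Prop`, `Not`, `And`, `Or` and the function `holds` are hypothetical names, the slide's "NO" operator is read as negation, and a clause is represented simply as a set of property names.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Prop:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Cond"


@dataclass(frozen=True)
class And:
    left: "Cond"
    right: "Cond"


@dataclass(frozen=True)
class Or:
    args: Tuple["Cond", ...]


Cond = Union[Prop, Not, And, Or]


def holds(cond: Cond, facts: Set[str]) -> bool:
    """Evaluate the expression tree against the properties of a clause."""
    if isinstance(cond, Prop):
        return cond.name in facts
    if isinstance(cond, Not):
        return not holds(cond.arg, facts)
    if isinstance(cond, And):
        return holds(cond.left, facts) and holds(cond.right, facts)
    if isinstance(cond, Or):
        return any(holds(a, facts) for a in cond.args)
    raise TypeError(f"unknown condition node: {cond!r}")


# The V2 condition for Standard Dutch, transcribed from the slide.
V2_CONDITION = And(
    Prop("main clause"),
    Not(Or((
        Prop("topic drop"),
        Prop("yes-no question"),
        Prop("non-finite"),
        And(Prop("introduced by conjunction"),
            Not(Or((Prop("want"), Prop("dus"))))),
    ))),
)

print(holds(V2_CONDITION, {"main clause"}))
print(holds(V2_CONDITION, {"main clause", "introduced by conjunction", "want"}))
```

Storing the condition as a tree rather than as a string keeps it queryable, at the cost of extra tables if the tree is persisted in a relational database.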


Relevant partial ER model


Rapid App. Development (1)


Rapid App. Development (2)


Rapid App. Development (3)


Rapid App. Development (4)


Rapid App. Development (5)


Rapid App. Development (6)


Rapid App. Development (7)


Current status

• Design is ready

• Database is ready

• Data maintenance client is in beta

• Data must be entered

• WWW client must be produced


Concluding remarks

• Flexible linguistic database system with required features is feasible

• High flexibility implies high complexity

• 70 database tables

• Separation of design and implementation is crucial
