Computational Tools for Linguistic Data - March 15, 2002 1
Utrecht Linguistic Database
Maaike Schoorlemmer Lennart Herlaar
Harmen van der Iest
Martin Everaert Alexis Dimitriadis Peter Ackema
Computational Tools for Linguistic Data March 15, 2002
Introduction
• Project overview
• Some examples
• Rapid Application Development
• Current status
• Concluding remarks
Project overview (1)
• Project goal
• Research tool
• Functional requirements
• Storage of linguistic material of any size and type
• Annotation of any type at any level
• Search facilities
Project overview (2)
• Project management
• Linguistic feature design
• Technical infrastructure design
• ER modeling
• Database implementation
• Rapid Application Development (RAD)
Linguistic features
• Data
• Annotations
• Judgments
• Paradigms
• Languages
• Typology
• Spellings
• Scripts
• Sources
• Questionnaires
• People
• History
Technical infrastructure
• Components
• MS-SQL Server database on Windows 2000 platform for data storage and management
• Fully Unicode compliant
• Keyboard mappings
• Delphi 6 client on Windows 9x/NT/2000/XP platform for data maintenance
• WWW-based client for data querying
[Diagram: ULD client, RDBMS (MS-SQL), database, WWW server, WWW client]
Entity-Relationship model
Database implementation
Samples and examples
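Example: Gisteren vond ik een boek op straat ('Yesterday I found a book in the street')
Example: Ik heb het maar meegenomen ('I just took it with me')
Example: Ik heb haar maar meegenomen ('I just took her with me')
[Diagram: the three items labeled Example, together with two items labeled Sample]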
Relevant partial ER model
Bracketing and trees
[Tree diagram of the sentence, with nodes CP, AdvP, C0, IP, DP, I0, VP, NP, PP, D, N, P and traces tV and tDP]
[CP Gisteren vond [IP ik tV [VP tDP [V’ [DP een [NP boek]] [V’ [PP op [DP straat]] tV]]]]]
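A labeled bracketing like the one on this slide can be turned into a tree mechanically. The following is a minimal sketch, not the ULD implementation: the `Node` class and the `tokenize`, `parse`, and `leaves` helpers are hypothetical names, and it assumes each category label is separated from the following word by whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A labeled constituent; children are sub-constituents or words/traces."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Surround brackets with spaces so that split() isolates them.
    return text.replace("[", " [ ").replace("]", " ] ").split()


def parse(tokens: List[str]) -> Node:
    """Parse tokens of the form [LABEL child child ...] into a Node tree."""
    pos = 0

    def node() -> Node:
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        result = Node(tokens[pos])  # first token after '[' is the label
        pos += 1
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                result.children.append(node())
            else:
                result.children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ']'
        return result

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the outermost bracket")
    return tree


def leaves(n: Node) -> List[str]:
    """Return the words and traces of a tree in left-to-right order."""
    out: List[str] = []
    for child in n.children:
        out.extend(leaves(child) if isinstance(child, Node) else [child])
    return out


bracketing = ("[CP Gisteren vond [IP ik tV [VP tDP [V’ [DP een [NP boek]] "
              "[V’ [PP op [DP straat]] tV]]]]]")
tree = parse(tokenize(bracketing))
print(tree.label, leaves(tree))
```

Unbalanced input is not handled gracefully here; a missing closing bracket surfaces as an IndexError.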
Relevant partial ER model
Annotations
Gisteren vond ik een boek op straat
Ik heb het maar meegenomen
Annotation (Label: Subject inversion, Type: Word order)
Annotation (Label: Coreferent, Type: Coreference)
Relevant partial ER model
Judgments
Gisteren vond ik een boek op straat
Ik heb haar maar meegenomen
Annotation (Label: Coreferent, Type: Coreference)
Judgment (Type: *, Analyst: 2273)
Relevant partial ER model
Annotations and judgments
Ik nam het maar mee
Gisteren vond ik een boek op straat
Annotation (Label: Closure, Type: Narrative structure)
Judgment (Type: *, Analyst: 2273)
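The annotation and judgment boxes on the preceding slides could be represented roughly as follows. This is a sketch under assumptions, not the ULD schema: the class and field names are hypothetical, and it assumes that a judgment attaches to an annotation and that an annotation can span several examples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgment:
    type: str      # e.g. "*" or "OK"
    analyst: int   # analyst identifier, as on the slides


@dataclass
class Annotation:
    label: str
    type: str
    examples: List[str]                  # example sentences the annotation spans
    judgment: Optional[Judgment] = None  # assumed: at most one judgment each


found = "Gisteren vond ik een boek op straat"
het = "Ik heb het maar meegenomen"
haar = "Ik heb haar maar meegenomen"

# Which examples each annotation spans is an assumption made for illustration.
annotations = [
    Annotation("Subject inversion", "Word order", [found]),
    Annotation("Coreferent", "Coreference", [found, het]),
    Annotation("Coreferent", "Coreference", [found, haar], Judgment("*", 2273)),
]


def starred(items: List[Annotation]) -> List[Annotation]:
    """Return the annotations that carry a "*" judgment."""
    return [a for a in items if a.judgment is not None and a.judgment.type == "*"]


for a in starred(annotations):
    print(a.label, a.examples)
```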
Morphological analysis
[vond-Ø]
find
Annotation (Label: Past, Type: Morphological gloss)
Annotation (Label: Sg, Type: Morphological gloss)
Annotation (Label: Stem, Type: Morphological gloss)
Annotation (Label: Umlaut, Type: Morphological gloss)
Relevant partial ER model
Groupings
[vond-Ø]
find
[vond-en]
find
Annotation (Label: Sg, Type: Morph. gloss)
Annotation (Label: Pl, Type: Morph. gloss)
Annotation (Label: Past, Type: Morph. gloss)
Annotation (Label: Stem, Type: Morph. gloss)
Annotation (Label: Umlaut, Type: Morph. gloss)
Grouping (Type: Paradigm, Name: Past tense “vinden”)
vond: Sg
vonden: Pl
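A paradigm grouping of this kind can be sketched as a named collection of glossed forms from which the paradigm cells are derived. The names `Form`, `Grouping`, and `cells` are hypothetical, and the assignment of gloss labels to each form is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Form:
    segmentation: str   # morpheme-segmented form, e.g. "vond-Ø"
    gloss: str          # lexical gloss, e.g. "find"
    labels: List[str]   # morphological gloss labels (assumed assignment)


@dataclass
class Grouping:
    type: str
    name: str
    members: List[Form] = field(default_factory=list)

    def cells(self, dimension: Sequence[str]) -> Dict[str, str]:
        """Map each label in `dimension` to the surface form that carries it."""
        table: Dict[str, str] = {}
        for form in self.members:
            # Drop the zero morpheme and the morpheme boundaries.
            surface = form.segmentation.replace("-Ø", "").replace("-", "")
            for label in form.labels:
                if label in dimension:
                    table[label] = surface
        return table


paradigm = Grouping(
    type="Paradigm",
    name="Past tense “vinden”",
    members=[
        Form("vond-Ø", "find", ["Past", "Sg"]),
        Form("vond-en", "find", ["Past", "Pl"]),
    ],
)
print(paradigm.cells(["Sg", "Pl"]))
```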
Relevant partial ER model
Groupings
Gisteren vond ik een boek op straat
Ik heb het maar meegenomen
Ik heb haar maar meegenomen
Annotation (Label: Coreferent, Type: Coreference)
Judgment (Type: OK, Analyst: 2273)
Annotation (Label: Coreferent, Type: Coreference)
Judgment (Type: *, Analyst: 2273)
Grouping (Type: Contrast, Name: Pronominal reference)
Typology
Typical example: Gisteren vond ik een boek op straat
Language (Name: Standard Dutch)
Language typology (Typological property: V2, Condition for use: Main clause)
Relevant partial ER model
Conditions for use (1)
Language (Name: Standard Dutch)
Language typology (Typological property: V2, Condition for use: Main clause)
If: main clause
unless: topic drop; yes-no question; non-finite; introduced by conjunction
    unless: “want”; “dus”
Conditions for use (2)
main clause AND NO (topic drop OR yes-no question OR non-finite OR (introduced by conjunction AND NO (“want” OR “dus”)))
[Expression tree of the condition above, with AND, NO, and OR nodes over the same operands]
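The condition can be evaluated by walking the expression tree. The sketch below encodes the tree as nested tuples and reads NO as logical negation; that reading, the tuple encoding, and the names `evaluate` and `V2_CONDITION` are assumptions made for illustration, not the ULD representation.

```python
def evaluate(expr, facts):
    """Evaluate a condition tree against the set of properties that hold.

    A string leaf is true when it is in `facts`; an inner node is a tuple
    whose first element is the operator: "AND", "OR", or "NO".
    """
    if isinstance(expr, str):
        return expr in facts
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, facts) for a in args)
    if op == "OR":
        return any(evaluate(a, facts) for a in args)
    if op == "NO":
        (arg,) = args  # NO takes exactly one operand
        return not evaluate(arg, facts)
    raise ValueError(f"unknown operator: {op!r}")


# main clause AND NO (topic drop OR yes-no question OR non-finite
#   OR (introduced by conjunction AND NO ("want" OR "dus")))
V2_CONDITION = (
    "AND",
    "main clause",
    ("NO", ("OR",
            "topic drop",
            "yes-no question",
            "non-finite",
            ("AND",
             "introduced by conjunction",
             ("NO", ("OR", "want", "dus"))))),
)

print(evaluate(V2_CONDITION, {"main clause"}))
print(evaluate(V2_CONDITION, {"main clause", "yes-no question"}))
print(evaluate(V2_CONDITION, {"main clause", "introduced by conjunction", "want"}))
```

Under this reading, a main clause introduced by “want” or “dus” still satisfies the V2 condition, while one introduced by any other conjunction does not.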
Relevant partial ER model
Rapid App. Development (1)
Rapid App. Development (2)
Rapid App. Development (3)
Rapid App. Development (4)
Rapid App. Development (5)
Rapid App. Development (6)
Rapid App. Development (7)
Current status
• Design is ready
• Database is ready
• Data maintenance client is in beta
• Data must be entered
• WWW client must be produced