Universität Stuttgart ⓒ Achim Stein 2013

Diachronic syntax based on constituency and dependency annotated corpora: theoretical and methodological issues

Achim Stein (ILR, Universität Stuttgart)

This talk is based on collaborative research with Sophie Prévost (CNRS LaTTiCe) and the members of the ANR/DFG project SRCMF.


Principal investigators: Sophie Prévost, Achim Stein

Funding: 2009 – 2012

Agence Nationale de la Recherche ANR (France)

Deutsche Forschungsgemeinschaft DFG (Germany)

Special Session on Romance Parsed Corpora

43rd Linguistic Symposium on Romance Languages

Syntactic Reference Corpus of Medieval French

Institutions and staff:

Paris: UMR 8094-LaTTiCe (CNRS/ENS Paris): Sophie Prévost, Julie Glikman

Lyon: ENS de Lyon: Céline Guillot, Serge Heiden, Alexei Lavrentiev, Tom Rainsford

Stuttgart: Institut für Linguistik/Romanistik (ILR): Achim Stein, Beatrice Bischof, Nicolas Mazziotta


The annotation workflow

Corpora: Base de français médiéval (BFM); Nouveau Corpus d'Amsterdam (NCA)

manual annotation with the Notabene tool (Mazziotta 2010): syntactic structures (RDF graphs)

dependency model, annotation principles

Forum: discussion of grammar and annotation principles

correction 1: compare parallel annotations

correction 2: review of compared versions

queries with TigerSearch (local) or TXM (web); CoNLL based query tools

training of dependency parsers

XML, CoNLL

Phases: preparation, work, use
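The step "correction 1: compare parallel annotations" can be pictured with a minimal sketch. It assumes that each annotation maps a token id to a (head id, relation label) pair; the function name, the data layout and the toy values are hypothetical and do not reproduce the project's own tooling.

```python
def disagreements(annotation_a, annotation_b):
    """Return (token_id, a, b) for every token whose (head, label) pair differs."""
    if annotation_a.keys() != annotation_b.keys():
        raise ValueError("annotations cover different tokens")
    return [
        (tok, annotation_a[tok], annotation_b[tok])
        for tok in sorted(annotation_a)
        if annotation_a[tok] != annotation_b[tok]
    ]


# Hypothetical parallel annotations: token id -> (head id, relation label)
annotator_a = {1: (2, "SjPer"), 2: (0, "Snt"), 3: (2, "Cmpl")}
annotator_b = {1: (2, "SjPer"), 2: (0, "Snt"), 3: (2, "Circ")}

diff = disagreements(annotator_a, annotator_b)
print(diff)
```

Only the tokens returned by such a comparison would need to go into the review step (correction 2).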


Penn style constituent structure

Tresqu'en la mer cunquist la tere altaigne. (Chanson de Roland)

Until the sea he conquered the high land.

The noun phrase (NP) consists of "la" and "mer".


Dependency structure

Tresqu'en la mer cunquist la tere altaigne.

Until the sea he conquered the high land.

(SRCMF, Prévost/Stein 2013)

"Tresqu", "en", and "la" depend on "mer".

"mer" depends on "cunquist" (compare with constituency).
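The dependencies named on this slide can be written down as a small head table. The sketch below uses a simplified three-column layout (id, form, head id), only loosely modeled on CoNLL-style rows. The attachments of tokens 1 to 5 follow the slide; the attachments of the second "la", "tere" and "altaigne" are assumptions, and relation labels are left out because the slide text does not give them.

```python
ROWS = """
1 Tresqu'  4
2 en       4
3 la       4
4 mer      5
5 cunquist 0
6 la       7
7 tere     5
8 altaigne 7
"""  # rows 6-8 are assumed attachments, not taken from the slide


def read_rows(text):
    """Parse 'id form head' lines into a list of (id, form, head) tuples."""
    tokens = []
    for line in text.strip().splitlines():
        tok_id, form, head = line.split()
        tokens.append((int(tok_id), form, int(head)))
    return tokens


def dependents(tokens, head_id):
    """Forms of all tokens directly governed by the token with id head_id."""
    return [form for _, form, head in tokens if head == head_id]


def root(tokens):
    """Form of the token whose head id is 0."""
    return next(form for _, form, head in tokens if head == 0)


tokens = read_rows(ROWS)
print(dependents(tokens, 4))  # dependents of "mer"
print(root(tokens))
```

Token ids rather than forms identify the head, since "la" occurs twice in the sentence.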


The class hierarchy of SRCMF categories (syntactic entities)

[Figure: class hierarchy. Recoverable node labels, with SRCMF tags in brackets:]

class (classe); syntactic unit (unité syntaxique); function (fonction)

structure; node (noeud); group (groupe); maximal structure (structure maximale); non-maximal structure (structure non-maximale)

satellite; parenthesis (parenthese); modifier (modifieur); relator (relateur)

verbal node (noeud verbal): finite [VFin], infinitival [VInf], participial [VPar]; non-verbal node (noeud non-verbal) [nV]

coordinated element (coordonné) [Coo]; coordination [GpCoo]; sentence (phrase) [Snt]; non-sentence (non-phrase) [nSnt]

actant; adjunct (circonstant) [Circ]; negation (négation) [Ng]; negative particle (forclusif) [NgPrt]; apostrophe [Apst]; interjection [Int]; insertion [Insrt]

subject (sujet): personal [SjPer], impersonal [SjImp]

predicative complement (attribut): of the subject [AtSj], of the object [AtObj], of the reflexive [AtRfc]

governed complement (régime) [Regim]: object (objet) [Obj], complement (complément) [Cmpl]

auxiliated form (auxilié) [Aux]: active [AuxA], passive [AuxP]

modifier: attached (modifieur attaché) [ModA], detached (modifieur détaché) [ModD]

reflexive (réfléchi) [Rfc]; reinforced reflexive (réflexif renforcé) [Rfx]

relator: coordinating [RelC], non-coordinating [RelNC]


SRCMF grammar: heads and "functional categories"

Turin University Treebank (TUT, Bosco 2004)

Functional categories = heads

e.g. prepositional phrase: in > quei > giorni (in > these > days); preposition > noun

SRCMF

Lexical categories = heads

e.g. prepositional phrase: mer > outre (sea > over); noun > preposition, verb > conjunction
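The contrast between the two head conventions can be made concrete with a small sketch. It assumes that a phrase is given as a list of words in surface order, with the lexical word last; both function names are hypothetical and neither treebank is claimed to provide such a conversion.

```python
def functional_head_edges(words):
    """(head, dependent) pairs when each word governs the next: in > quei > giorni."""
    return list(zip(words, words[1:]))


def lexical_head_edges(words):
    """(head, dependent) pairs when the last, lexical word governs all words before it."""
    head = words[-1]
    return [(head, word) for word in words[:-1]]


tut_style = functional_head_edges(["in", "quei", "giorni"])
srcmf_style = lexical_head_edges(["outre", "mer"])

print(tut_style)    # the preposition is the top of the chain
print(srcmf_style)  # the noun governs the preposition
```

The same words yield different trees under the two conventions, which matters when query results from differently annotated treebanks are compared.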


SRCMF grammar: duplicates

A duplicate is a double reference to a node (not two forms).

Duplicates allow for the assignment of a second relation to the node.

Duplicates are used in relative clauses and contracted forms.

Examples:

(1) Souffrance si est semblable a esmeraude qui toz jorz est vert.

Sufferance such is like an emerald which all day is green.

In (1) the relative pronoun qui is a non-coordinating relator (RelNC). Its duplicate is a subject (SjPer).

(2) sovent dit / Qu'or veut morir s'il nes ocit.

often says / that now wants die if he not+them kills

In (2) the contracted form nes (= ne + les) is a negation (Ng). Its duplicate is an object (Obj).
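One way to picture a duplicate is a single node that carries its primary relation plus the relation assigned through the duplicate. This is only an illustrative sketch: the class and field names are hypothetical, and the project itself stores its structures as RDF graphs, not as Python objects.

```python
from dataclasses import dataclass


@dataclass
class Node:
    form: str                         # one surface form, never two
    relation: str                     # relation of the node itself
    duplicate_relations: tuple = ()   # relations assigned via duplicates

    def relations(self):
        """All relations borne by this single form."""
        return [self.relation, *self.duplicate_relations]


# Examples (1) and (2) from the slide
qui = Node("qui", "RelNC", ("SjPer",))
nes = Node("nes", "Ng", ("Obj",))

print(qui.relations())
print(nes.relations())
```

Keeping the duplicate on the same node mirrors the definition above: one form, two relations.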


SRCMF: a first parsing experiment

Parser: mate tools (Bohnet 2010; Björkelund, Bohnet et al. 2010)

training on three different texts (90% of 6508 sentences)

evaluation on 10% (650 sentences)

Encouraging results, considering that

the SRCMF grammar is linguistically motivated

no compromise was made to facilitate automatic parsing

Difficulties in guessing the right label: the price for a very explicit annotation model?

Main error: Cmpl-Circ (complément vs. circonstant)

Too few exact matches: a small number of

Results


See the SRCMF homepage: http://srcmf.org

Publication is ongoing: 15 Old French texts, > 250,000 words, > 23,000 sentences.

online access (via TXM web, ENS Lyon)

download formats for local queries

documentation

Re-usable tools

Notabene annotation tool: http://sourceforge.net/projects/notabene/


References

Bechhofer, Sean; van Harmelen, Frank; Hendler, Jim; Horrocks, Ian; McGuinness, Deborah L.; Patel-Schneider, Peter F.; Stein, Lynn Andrea (2004): OWL Web Ontology Language Reference. W3C Recommendation 10 February 2004.

Björkelund, Anders; Bohnet, Bernd; Hafdell, Love; Nugues, Pierre (2010): A high-performance syntactic and semantic dependency parser. Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations, Stroudsburg, PA, USA: Association for Computational Linguistics, 33-36.

Bohnet, Bernd (2010): Top Accuracy and Fast Dependency Parsing is not a Contradiction. Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China: Coling 2010 Organizing Committee, 89-97.

Bosco, Cristina (2004): A Grammatical Relation System for Treebank Annotation. PhD thesis, Università degli Studi di Torino.

Guillot, Céline; Marchello-Nizia, Christiane; Lavrentiev, Alexei (2007): La Base de Français Médiéval (BFM) : états et perspectives. In: Kunstmann, Pierre; Stein, Achim (ed.): Le Nouveau Corpus d'Amsterdam. Actes de l'atelier de Lauterbad, 23-26 février 2006, Stuttgart: Steiner.

Martineau, France (2009): Le corpus MCVF. Modéliser le changement: les voies du français. Ottawa: Université d'Ottawa.

Mazziotta, Nicolas (2010): Logiciel NotaBene pour l'annotation linguistique. Annotations et conceptualisations multiples. Recherches qualitatives. Hors-série 'Les actes', 9, 83-94.

Stein, Achim et al. (2006): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen. Stuttgart: Institut für Linguistik/Romanistik.

Stein, Achim; Prévost, Sophie (2013): Syntactic annotation of medieval texts: the Syntactic Reference Corpus of Medieval French (SRCMF). In: Bennett, Paul; Durrell, Martin; Scheible, Silke; Whitt, Richard (ed.): New Methods in Historical Corpus Linguistics, Tübingen: Narr.
