• No results found

The Italian Particle “ne”: Corpus Construction and Analysis

N/A
N/A
Protected

Academic year: 2020

Share "The Italian Particle “ne”: Corpus Construction and Analysis"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)
(2)
(3)
(4)

Figure 5: Characteristics of the antecedents ofnein the development data

11.2 tokens but only 1.6 nouns.3

Building on these observations, as well as on experience on other kinds of non-standard anaphora, we developed a recency-based baseline which for each ne selects as an-tecedent the preceding closest noun. Obviously, this simple approach would never get the correct antecedent whenever this is a VP or an S, by definition. Therefore, the upper-bound on the development data is 94%, which corresponds to the proportion of NP antecedents. On the development data, this baseline finds 37 correct antecedents, yielding an accuracy of 29.8%.

If the antecedent type were to be known, and the baseline could assign, where appropriate, the closest verb or the pre-vious sentence (based on the pre-existing sentence bound-ary markup in the corpus) rather than closest noun, the over-all accuracy on the development data would be 0.347, with the breakdown per type reported in Table 3.

Table 3: Accuracy of the recency-based baseline on the de-velopment data if the antecedent type was known

antecedent type #cases # correct accuracy

NP 116 37 0.319

VP 4 4 1.000

S 4 2 0.500

all 124 43 0.347

When run on the test data, which includes 79 valid samples, the recency baseline correctly identified 20 antecedents, achieving an overall accuracy of 29%.

5.2. Determination ofnetype

Although there seem to be some indicators of (non)partitiveness, the determination of the ne type is not that simple. In spite of the amount of attention that partitive uses have received in the literature, our data shows that non-partitive occurrences are far more frequent.

3We could not count full NPs since our data is not

chun-ked, but we counted as intervening nouns between antecedent and anaphor only NOUN-tagged tokens that were not included in the antecedent NP.

Indeed, a most-frequent-use baseline, which would assign a non-partitive interpretation to all occurrences of ne, would achieve an accuracy of 75% on the development data and 65% on the test data.

A preliminary analysis of the development data has shown that two important indicators of a partitive use are (i) the parallelism between thene’s predicate and the antecedent’s predicate and (ii) a plural antecedent. However, for such features to be exploited, we would need not only to know the antecedent already (see discussion above), but also to parse the data so as to find the relevant predicates. Depen-dency parsing for Italian has recently witnessed new inter-est thanks to the organisation of a shared evaluation task (Bosco et al., 2007). The best parser yielded an accuracy of around 87% (Lesmo, 2007), suggesting that we might be able to use reasonably precise syntactic information, even when this is acquired automatically. Future work will ex-plore these avenues.

6.

Outlook

We aim at extending this work in two directions. On the one hand, we would like to increase the size of our corpus so as to have larger figures (even for rarer phenomena) that would allow extensive experimenting with statistical mod-elling. We also plan to extend the annotation to clitic uses of ne, which cover approximately one fourth of the total occurrences in the “la Repubblica Corpus” and have been left out in the present study, but might show an interest-ing, and possibly different, behaviour. On the other hand, once assessed the difficulty of the resolution tasks by means of the two simple baselines that we have described in this paper, we are moving forward to developing more linguis-tically motivated resolution algorithms, both in a statistical and symbolic fashion.

7.

References

(5)

Corpus of Newspaper Italian. In Proceedings of LREC 2004.

A. Belletti and L. Rizzi. 1981. The syntax of ne: some implications. The Linguistic Review, 1:117–154. Cristina Bosco, Alessandro Mazzei, and Vincenzo

Lom-bardo. 2007. Evalita Parsing Task: an analysis of the first parsing system contest for Italian. Intelligenza Artificiale, IV(2):30–33.

L. Burzio. 1986. Italian Syntax: A Government-Binding Approach. Reidel, Dordrecht.

H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics.

Daniel Hardt. 1997. An Empirical Approach to VP Ellip-sis. Computational Linguistics, 23(4):525–541.

Leonardo Lesmo. 2007. The rule-based parser of the nlp group of the university of torino. Intelligenza Artificiale, IV(2):46–47.

Natalia N. Modjeska. 2002. Lexical and grammatical role constraints in resolving other-anaphora. In Proc. of DAARC 2002, pages 129–134, Lisbon, Portugal. Tadashi Nomoto and Yoshihiko Nitta. 1993. Resolving

zero anaphora in japanese. In Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics, pages 315–321, Morristown, NJ, USA. Association for Computational Linguistics. L. Renzi, G. Salvi, and A. Cardinaletti, editors. 2001.

Grande Grammatica di Consultazione dell’Italiano – Voll. I-III. Il Mulino.

A. Sorace. 2000. Gradients in auxiliary selection with in-transitive verbs. Language, 76:859–890.

Figure

Table 1: Inter-annotator agreement (f-score and kappa) forthe three annotation classes.classfeaturef-scorekappa
Table 2: Overview of test set#occurrences
Figure 5: Characteristics of the antecedents of ne in the development data

References

Related documents

Analysis of SCADA data for early fault detection, with application to the maintenance management of wind turbines. Ageing assessment of a wind turbine over time by interpreting

ON BASIN OF ZERO-SOLUTIONS TO A SEMILINEAR PARABOLIC EQUATION WITH ORNSTEIN-UHLENBECK OPERATOR YASUHIRO FUJITA.. Received 27 April 2005; Accepted 10

's Commissioner v. If the thief was apprehended, the property was returned to the victim. Here the court compared the embezzler to a borrower and the employer to

The ripple carry adders which were required for add- ing the partial products were constructed using HNG gates. This design has high speed, smaller area and less power consumption

Guided by the work of Eric Sharpe on the history of the comparative study of religion, the article locates Schleiermacher in the context of the state of knowledge about the

God’s own Logos (Wisdom and Word) was made flesh in Jesus the Christ in such a comprehensive manner that God, by assuming the particular life-story of Jesus the Jew from Nazareth,

Present project is assigned us to design and perform structural analysis of the Small rocket motor Mounting Flange along with its components which are subjected to

We evaluate the performance of the HAVS-Cloud framework by a prototype implementation. In the cloud, we deploy our server application based on Java, including one