
Sublanguage Dependent Evaluation: Toward Predicting NLP performances


Academic year: 2020


6. Conclusion and future works

To conclude, we have shown that it is possible to partition the data in such a way that evaluation results are more homogeneous (reduced standard deviation) inside each subset than on the whole data set. Furthermore, this partition can be based either on an existing classification or on an induced one that uses easy-to-extract features. This opens new perspectives for reusing evaluation results to help choose which approach is best suited to handling a specific language subset.
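The comparison described above can be illustrated with a minimal sketch. It assumes one evaluation score per document and one subset label per document, and it uses the population standard deviation as the dispersion measure. The function name `dispersion_by_subset`, the genre labels, and the scores are hypothetical values chosen for illustration; they are not results from this work.

```python
from collections import defaultdict
from statistics import pstdev


def dispersion_by_subset(scores, labels):
    """Return (global_std, {label: std}) for per-document scores.

    scores: one evaluation score per document (e.g. tagging accuracy).
    labels: the subset each document belongs to (existing or induced class).
    """
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    global_std = pstdev(scores)
    subset_std = {label: pstdev(vals) for label, vals in groups.items()}
    return global_std, subset_std


# Hypothetical per-document accuracies and genre labels (illustrative only).
scores = [0.96, 0.97, 0.95, 0.90, 0.91, 0.89]
labels = ["news", "news", "news", "novel", "novel", "novel"]

global_std, subset_std = dispersion_by_subset(scores, labels)
# A partition is useful in the sense above when every subset is more
# homogeneous than the data taken as a whole.
more_homogeneous = all(std < global_std for std in subset_std.values())
```

With a partition that separates documents on which a system behaves differently, `more_homogeneous` is true; a partition unrelated to performance would leave the per-subset deviations close to the global one.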

From these preliminary experiments, it seems possible to run evaluation campaigns that, at no extra cost, give an idea of how robust the participating systems are with respect to data heterogeneity.

Future work will involve extensive tests on larger data, using different feature sets to induce the most appropriate classification, and the application of our methodology to control tasks other than POS tagging.


Figure

Table 1: Performance Variations according to text genre
Table 2: Error Analysis according to a dumb baseline
