In House: An Ensemble of Pre Existing Off the Shelf Parsers

(1)

In-House: An Ensemble of Pre-Existing Off-the-Shelf Parsers

Yusuke Miyao♣_{, Stephan Oepen}♠♥_{, and Daniel Zeman}♦

♣_{National Institute of Informatics, Tokyo} ♠_{University of Oslo, Department of Informatics} ♥_{Potsdam University, Department of Linguistics}

♦_{Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics}

yusuke@nii.ac.jp,oe@ifi.uio.no,zeman@ufal.mff.cuni.cz

Abstract

This submission to the open track of Task 8 at SemEval 2014 seeks to connect the Task to pre-existing, ‘in-house’ pars-ing systems for the same types of target semantic dependency graphs.

1 Background and Motivation

The three target representations for Task 8 at SemEval 2014,Broad-Coverage Semantic Depen-dency Parsing (SDP; Oepen et al., 2014), are rooted in language engineering efforts that have been under continuous development for at least the past decade. The gold-standard semantic de-pendency graphs used for training and testing in the Task result from largely manual annotation, in part re-purposing and adapting resources like the Penn Treebank (PTB; Marcus et al., 1993), Prop-Bank (Palmer et al., 2005), and others. But the groups who prepared the SDP target data have also worked in parallel on automated parsing systems for these representations.

Thus, for each of the target representations, there is a pre-existing parser, often developed in parallel to the creation of the target dependency graphs, viz. (a) for the DM representation, the parser of the hand-engineered LinGO English Re-source Grammar (ERG; Flickinger, 2000); (b) for PAS, the Enju parsing system (Miyao, 2006), with its probabilistic HPSG acquired through linguis-tic projection of the PTB; and (c) for PCEDT, the scenario for English analysis within the Treex framework (Popel and Žabokrtský, 2010), com-bining data-driven dependency parsing with hand-engineered tectogrammatical conversion. At least

This work is licenced under a Creative Commons At-tribution 4.0 International License; page numbers and the proceedings footer are added by the organizers. http:// creativecommons.org/licenses/by/4.0/

for DM and PAS, these parsers have been exten-sively engineered and applied successfully in a variety of applications, hence represent relevant points of comparison. Through this ‘in-house’ submission (of our ‘own’ parsers to our ‘own’ task), we hope to facilitate the comparison of dif-ferent approaches submitted to the Task with this pre-existing line of parser engineering.

2 DM: The English Resource Grammar

Semantic dependency graphs in the DM target rep-resentation, DELPH-IN MRS-Derived Bi-Lexical Dependencies, stem from a two-step ‘reduc-tion’ (simplification) of the underspecified logical-form meaning representations output natively by the ERG parser, which implements the linguis-tic framework of Head-Driven Phrase Structure Grammar (HPSG; Pollard and Sag, 1994). Gold-standard DM training and test data for the Task were derived from the manually annotated Deep-Bank Treebank (Flickinger et al., 2012), which pairs Sections 00–21 of the venerable PTB Wall Street Journal (WSJ) Corpus with complete ERG-compatible HPSG syntactico-semantic analyses. DeepBank as well as the ERG rely on Minimal Re-cursion Semantics (MRS; Copestake et al., 2005) for meaning representation, such that the exact same post-processing steps could be applied to the parser outputs as were used in originally reducing the gold-standard MRSs from DeepBank into the SDP bi-lexical semantic dependency graphs.

Parsing Setup The ERG parsing system is a hy-brid, combining (a) the hand-built, broad-coverage ERG with (b) an efficient chart parser for uni-fication grammars and (c) a conditional proba-bility distribution over candidate analyses. The parser most commonly used with the ERG, called PET (Callmeier, 2002),1 _{constructs a complete,} 1_{The SDP test data was parsed using the 1212 release} of the ERG, using PET and converter versions from what

(2)

subsumption-based parse forest of partial HPSG derivations (Oepen and Carroll, 2000), and then extracts from the forest n-best lists (in globally correct rank order) of complete analyses according to a discriminative parse ranking model (Zhang et al., 2007). For our experiments, we trained the parse ranker on Sections 00–20 of DeepBank and otherwise used the default, non-pruning develop-ment configuration, which is optimized for accu-racy. In this setup, ERG parsing on average takes close to ten seconds per sentence.

Post-Parsing Conversion After parsing, MRSs are reduced to DM bi-lexical semantic dependen-cies in two steps. First, Oepen and Lønning (2006) define a conversion to variable-free Ele-mentary Dependency Structures(EDS), which (a) maps each predication in the MRS logical-form meaning representation to a node in a dependency graph and (b) transforms argument relations rep-resented by shared logical variables into directed dependency links between graph nodes. This first step of the conversion is ‘mildly’ lossy, in that some scope-related information is discarded; the EDS graph, however, will contain the same num-ber of nodes and the same set of argument de-pendencies as there are predications and semantic role assignments in the original MRS. In particu-lar, the EDS may still reflect non-lexical semantic predications introduced by grammatical construc-tions like covert quantifiers, nominalization, com-pounding, or implicit conjunction.2

Second, in another conversion step that is not information-preserving, the EDS graphs are fur-ther reduced into strictly bi-lexical form, i.e. a set of directed, binary dependency relations holding exclusively between lexical units. This conversion is defined by Ivanova et al. (2012) and seeks to (a) project some aspects of construction seman-tics onto word-to-word dependencies (for example introducing specific dependency types for com-pounding or implicit conjunction) and (b) relate the linguistically informed ERG-internal tokeniza-tion to the conventokeniza-tions of the PTB.3_{Seeing as both} is called the LOGON SVN trunk as of January 2014; see

http://moin.delph-in.net/LogonTopfor detail. 2_{Conversely, semantically vacuous parts of the original} input (e.g. infinitival particles, complementizers, relative pro-nouns, argument-marking prepositions, auxiliaries, and most punctuation marks) were not represented in the MRS in the first place, hence have no bearing on the conversion.

3_{Adaptations of tokenization encompass splitting} ‘multi-word’ ERG tokens (likesuch asorad hoc), as well as ‘hiding’ ERG token boundaries at hyphens or slashes (e.g.

77-year-conversion steps are by design lossy, DM seman-tic dependency graphs present a true subset of the information encoded in the full, original MRS.

3 PAS: The Enju Parsing System

Enju Predicate–Argument Structures (PAS) are derived from the automatic HPSG-style annota-tion of the PTB, which was primarily used for the development of the Enju parsing system4_(Miyao, 2006). A notable feature of this parser is that the grammar is not developed by hand; instead, the Enju HPSG-style treebank is first developed, and the grammar (or, more precisely, the vast major-ity of lexical entries) is automatically extracted from the treebank (Miyao et al., 2004). In this ‘projection’ step, PTB annotations such as empty categories and coindexation are used for deriv-ing the semantic representations that correspond to HPSG derivations. Its probabilistic model for disambiguation is also trained using this treebank (Miyao and Tsujii, 2008).5

The PAS data set is an extraction of predicate– argument structures from the Enju HPSG tree-bank. The Enju parser outputs results in ‘ready-to-use’ formats like phrase structure trees and predicate–argument structures, as full HPSG anal-yses are not friendly to users who are not famil-iar with the HPSG theory. The gold-standard PAS target data in the Task was developed using this function; the conversion program from full HPSG analyses to predicate–argument structures was ap-plied to the Enju Treebank.

Predicate–argument structures (PAS) represent word-to-word semantic dependencies, such as se-mantic subject and object. Each dependency type is represented with two elements: the type of the predicate, such as verb and adjective, and the ar-gument label, such asARG1andARG2.6

old), which the PTB does not split.

4_See_{http://kmcs.nii.ac.jp/enju/}_.

5_{Abstractly similar to the ERG, the annotations of the} Enju treebank instantiate the linguistic theory of HPSG. However, the two resources have been developed indepen-dently and implementation details are quite different. The most significant difference is that the Enju HPSG treebank is developed by linguistic projection of PTB annotations, and the Enju parser derived from the treebank; conversely, the ERG was predominantly manually crafted, and it was later applied in the DeepBank re-annotation of the WSJ Corpus.

(3)

Parsing Setup Basically we used the publicly available package of the Enju parser ‘as is’ (see the above web site). We did not change default pars-ing parameters (beam width, etc.) and features. However, the release version of the Enju parser is trained with the HPSG treebank corresponding to the Penn Treebank WSJ Sections 2–21, which in-cludes the test set of the Task (Section 21). There-fore, we re-trained the Enju parser using Sections 0–20, and used this re-trained parser in preparing the PAS semantic dependency graphs in this en-semble submission.

Post-Parsing Conversion The dependency for-mat of the Enju parser is almost equivalent to what is provided as the PAS data set in this shared task. Therefore, the post-parsing conversion for the PAS data involves only formatting, viz. (a) format con-version into the tabular file format of the Task; and (b) insertion of dummy relations for punctuation tokens ignored in the output of Enju.7

4 PCEDT: The Treex Parsing Scenario

ThePrague Czech-English Dependency Treebank

(PCEDT; Hajiˇc et al., 2012)8_{is a set of parallel} de-pendency trees over the same WSJ texts from the Penn Treebank, and their Czech translations. Sim-ilarly to other treebanks in the Prague family, there are two layers of syntactic annotation: analytical

(a-trees) and tectogrammatical (t-trees). Unlike for the other two representations used in the Task, for PCEDT there is no pre-existing parsing system designed to deliver the full scale of annotations of the SDP gold-standard data. The closest avail-able match is a parsing scenario implemented in the Treex natural language processing framework.

Parsing Setup Treex9 _{(Popel and Žabokrtský,} 2010) is a modular, open-source framework origi-nally developed for transfer-based machine trans-lation. It can accomplish any NLP-related task by sequentially applying to the same piece of data variousblocksof code. Blocks operate on a com-mon data structure and are chained inscenarios.

Some early experiments with scenarios for tec-togrammatical analysis of English were described by Klimeš (2007). It is of interest that they report

7_{The Enju parser ignores tokens tagged as ‘}_._{’, while} the PAS representation includes them with dummy relations; thus, missing periods are inserted in post-processing by com-parison to the original PTB token sequence.

8_See_{http://ufal.mff.cuni.cz/pcedt2.0/}_. 9_See_{http://ufal.mff.cuni.cz/treex/}_.

U.S. should regulate X more stringently than Y

CPR PAT PRED

ACT

PAT MANN

CPR

PAT

PRED ACT

PAT

[image:3.595.320.513.63.171.2]

MANN CPR

Figure 1: PCEDT asserts two copies of the token

regulate(shown here as ‘regulate’ and ‘’, under-lined). Projecting t-nodes onto the original tokens, required by the SDP data format, means that the node will be merged with regulate. The edges going to and fromwill now lead to and from reg-ulate(see the dotted arcs), which results in a cycle. To get rid of the cycle, we skipand connect di-rectly its children, as shown in the final SDP graph below the sentence.

an F1 score of assigningfunctors(dependency la-bels in PCEDT terminology) of 70.3%; however, their results are not directly comparable to ours.

Due to the modular nature of Treex, there are various conceivable scenarios to get the t-tree of a sentence. We use the default scenario that con-sists of 48 blocks: two initial blocks (reading the input), one final block (writing the output), two A2N blocks (named entity recognition), twelve W2Ablocks (dependency parsing at the analytical layer) and 31A2T and T2T blocks (creating the t-tree based on the a-tree).

Most blocks are highly specialized in one par-ticular subtask (e.g. there is a block just to make sure that quotation marks are attached to the root of the quoted subtree). A few blocks are respon-sible for the bulk of the work. The a-tree is con-structed by a block that contains the MST Parser (McDonald et al., 2005), trained on the CoNLL 2007 English data (Nivre et al., 2007), i.e. Sec-tions 2–11 of the PTB, converted to dependencies. The annotation style of CoNLL 2007 differs from PCEDT 2.0, and thus the unlabeled attachment score of the analytical parser is only 66%.

(4)

bet-John brought and ate ripe apples and pears

ACT

CONJ CONJ

PRED.m PRED.m

RSTR

PAT.m PAT.m

PAT

TOP TOP

PAT PAT

ACT

CONJ.m CONJ.m

RSTR RSTR

PAT

[image:4.595.310.523.60.147.2]

CONJ.m CONJ.m

Figure 2: Coordination in PCEDT t-tree (above) and in the corresponding SDP graph (below).

ter informed than the other systems participating in the Task.

Functor assignment is done heuristically, based on POS tags and function words. The primary focus of the scenario was on functors that could help machine translation, thus it only generated 25 different labels (of the total set of 65 labels in the SDP gold-standard data)10_{and left about 12%} of all nodes without functors. Precision peaks at 78% for ACT(or) relations, while the most fre-quent error type (besides labelless dependencies) is a falsely proposedRSTR(iction) relation. Both

ACTandRSTRare among the most frequent de-pendency types in PCEDT.

Post-Parsing Conversion Once the t-tree has been constructed, it is converted to the PCEDT target representation of the Task, using the same conversion code that was used to prepare the gold-standard SDP data.11

SDP graphs are defined over surface tokens but the set of nodes of a t-tree need not correspond one-to-one to the set of tokens. For example, there are no t-nodes for punctuation and function words (except in coordination); these tokens are rendered as semantically vacuous in SDP, i.e. they do not participate in edges. On the other hand, t-trees can contain generated nodes, which represent elided words and do not correspond to any surface

to-10_{The system was able to output the following functors} (or-dered in the descending order of their frequency in the sys-tem output):RSTR,PAT,ACT,CONJ.member,APP,MANN, LOC,TWHEN,DISJ.member,BEN,RHEM,PREC,ACMP, MEANS,ADVS.member,CPR, EXT,DIR3, CAUS, COND, TSIN,REG,DIR2,CNCS, andTTILL.

11_{In the SDP context, the target representation derived} from the PCEDT is called by the same name as the origi-nal treebank; but note that the PCEDT semantic dependency graphs only encode a subset of the information annotated at the tectogrammatical layer of the full treebank.

DM PAS PCEDT

LF LM LF LM LF LM

Priberam .8916 .2685 .9176 .3783 .7790 .1068 In-House .9246 .4807 .9206 .4384 .4315 .0030

UF UM UF UM UF UM

Priberam .9032 .2990 .9281 .3924 .8903 .3071 In-House .9349 .5230 .9317 .4429 .6919 .0148

Table 1: End-to-end ‘in-house’ parsing results.

ken. Most generated nodes are leaves and, thus, can simply be omitted from the SDP graphs. Other generated nodes are copies of normal nodes and they are linked to the same token to which the source node is mapped. As a result, one token can appear at several different positions in the tree; if we project these occurrences into one node, the graph will contain cycles. We decided to remove all generated nodes causing cycles. Their chil-dren are attached to their parents and inherit the functor of the generated node (Figure 1). The con-version procedure also removes cycles caused by more fine-grained tokenization of the t-layer.

Furthermore, t-trees use technical edges to cap-ture paratactic constructions where the relations are not ‘true’ dependencies. The conversion pro-cedure extracts true dependency relations: Each conjunct is linked to the parent or to a shared child of the coordination. In addition, there are also links from the conjunction to the conjuncts and they are labeledCONJ.m(ember). These links pre-serve the paratactic structure (which can even be nested) and the type of coordination. See Figure 2 for an example.

5 Results and Reflections

Seeing as our ‘in-house’ parsers are not directly trained on the semantic dependency graphs pro-vided for the Task, but rather are built from ad-ditional linguistic resources, we submitted results from the parsing pipelines sketched in Sections 2 to 4 above to the open SDP track. Table 1 summarizes parser performance in terms of la-beled and unlala-beled F1 (LF and UF)12 and full-sentence exact match (LM and UM), comparing to the best-performing submission (dubbed Prib-eram; Martins and Almeida, 2014) to this track. Judging by the official SDP evaluation metric, av-erage labeled F1 over the three representations, our ensemble ranked last among six participating

[image:4.595.79.286.63.206.2]

(5)

teams; in terms of unlabeled average F1, the ‘in-house’ submission achieved the fourth rank.

As explained in the task description (Oepen et al., 2014), parts of the WSJ Corpus were excluded from the SDP training and testing data because of gaps in the DeepBank and Enju treebanks, and to exclude cyclic dependency graphs, which can sometimes arise in the DM and PCEDT conver-sions. For these reasons, one has to allow for the possibility that the testing data is positively bi-ased towards our ensemble members.13 _{But even} with this caveat, it seems fair to observe that the ERG and Enju parsers both are very competitive for the DM and PAS target representations, respec-tively, specifically so when judged in exact match scores. A possible explanation for these results lies in the depth of grammatical information avail-able to these parsers, where DM or PAS seman-tic dependency graphs are merely a simpliefied view on the complete underlying HPSG analyses. These parsers have performed well in earlier con-trastive evaluation too (Miyao et al., 2007; Bender et al., 2011; Ivanova et al., 2013; inter alios).

Results for the Treex English parsing scenario, on the other hand, show that this ensemble mem-ber is not fine-tuned for the PCEDT target rep-resentation; due to the reasons mentioned above, its performance even falls behind the shared task baseline. As is evident from the comparison of labeled vs. unlabeled F1 scores, (a) the PCEDT parser is comparatively stronger at recovering se-mantic dependencystructurethan at assigning la-bels, and (b) about the same appears to be the case for the best-performing Priberam system (on this target representation).

Acknowledgements

Data preparation and large-scale parsing in the DM target representation was supported through access to the ABEL high-performance computing facilities at the University of Oslo, and we ac-knowledge the Scientific Computing staff at UiO, the Norwegian Metacenter for Computational Sci-ence, and the Norwegian tax payers. This project has been supported by the infrastructural funding

13_{There is no specific evidence that the WSJ sentences} ex-cluded in the Task for technical issues in either of the under-lying treebanks or conversion procedures would be compara-tively much easier to parse for other submissions than for the members of our ‘in-house’ ensemble, but unlike other sys-tems these parsers ‘had a vote’ in the selection of the data, particularly so for the DM and PAS target representations.

by the Ministry of Education, Youth and Sports of the Czech Republic (CEP ID LM2010013).

References

Bender, E. M., Flickinger, D., Oepen, S., and Zhang, Y. (2011). Parser evaluation over local and non-local deep dependencies in a large cor-pus. InProceedings of the 2011 Conference on Empirical Methods in Natural Language Pro-cessing (p. 397 – 408). Edinburgh, Scotland, UK.

Callmeier, U. (2002). Preprocessing and encoding techniques in PET. In S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit (Eds.), Collabora-tive language engineering. A case study in effi-cient grammar-based processing(p. 127 – 140). Stanford, CA: CSLI Publications.

Copestake, A., Flickinger, D., Pollard, C., and Sag, I. A. (2005). Minimal Recursion Seman-tics. An introduction. Research on Language and Computation,3(4), 281 – 332.

Flickinger, D. (2000). On building a more ef-ficient grammar by exploiting types. Natural Language Engineering,6 (1), 15 – 28.

Flickinger, D., Zhang, Y., and Kordoni, V. (2012). DeepBank. A dynamically annotated treebank of the Wall Street Journal. InProceedings of the 11th International Workshop on Treebanks and Linguistic Theories(p. 85 – 96). Lisbon, Portu-gal: Edições Colibri.

Hajiˇc, J., Hajiˇcová, E., Panevová, J., Sgall, P., Bojar, O., Cinková, S., . . . Žabokrtský, Z. (2012). Announcing Prague Czech-English De-pendency Treebank 2.0. InProceedings of the 8th International Conference on Language Re-sources and Evaluation(p. 3153 – 3160). Istan-bul, Turkey.

Ivanova, A., Oepen, S., Dridan, R., Flickinger, D., and Øvrelid, L. (2013). On different approaches to syntactic analysis into bi-lexical dependen-cies. An empirical comparison of direct, PCFG-based, and HPSG-based parsers. InProceedings of the 13th International Conference on Parsing Technologies(p. 63 – 72). Nara, Japan.

(6)

A contrastive study of syntacto-semantic depen-dencies. InProceedings of the Sixth Linguistic Annotation Workshop(p. 2 – 11). Jeju, Republic of Korea.

Klimeš, V. (2007). Transformation-based tec-togrammatical dependency analysis of English. In V. Matoušek and P. Mautner (Eds.), Text, speech and dialogue 2007, LNAI 4629(p. 15 – 22). Berlin / Heidelberg, Germany: Springer.

Marcus, M., Santorini, B., and Marcinkiewicz, M. A. (1993). Building a large annotated cor-pora of English: The Penn Treebank. Computa-tional Linguistics,19, 313 – 330.

Martins, A. F. T., and Almeida, M. S. C. (2014). Priberam. A turbo semantic parser with second order features. In Proceedings of the 8th In-ternational Workshop on Semantic Evaluation.

Dublin, Ireland.

McDonald, R., Pereira, F., Ribarov, K., and Hajiˇc, J. (2005). Non-projective dependency parsing using spanning tree algorithms. InProceedings of the Human Language Technology Conference and Conference on Empirical Methods in Nat-ural Language Processing(p. 523 – 530). Van-couver, British Columbia, Canada.

Miyao, Y. (2006). From linguistic theory to syntactic analysis. Corpus-oriented grammar development and feature forest model. Doc-toral Dissertation, University of Tokyo, Tokyo, Japan.

Miyao, Y., Ninomiya, T., and Tsujii, J. (2004). Corpus-oriented grammar development for ac-quiring a Head-Driven Phrase Structure Gram-mar from the Penn Treebank. InProceedings of the 1st International Joint Conference on Natu-ral Language Processing(p. 684 – 693).

Miyao, Y., Sagae, K., and Tsujii, J. (2007). Towards framework-independent evaluation of deep linguistic parsers. In Proceedings of the 2007 Workshop on Grammar Engineering across Frameworks (p. 238 – 258). Palo Alto, California.

Miyao, Y., and Tsujii, J. (2008). Feature forest models for probabilistic HPSG parsing. Com-putational Linguistics,34(1), 35 – 80.

Nivre, J., Hall, J., Kübler, S., McDonald, R., Nils-son, J., Riedel, S., and Yuret, D. (2007). The CoNLL 2007 shared task on dependency pars-ing. In Proceedings of the 2007 Joint Con-ference on Empirical Methods in Natural Lan-guage Processing and Conference on Natural Language Learning (p. 915 – 932). Prague, Czech Republic.

Oepen, S., and Carroll, J. (2000). Ambiguity packing in constraint-based parsing. Practical results. InProceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics(p. 162 – 169). Seat-tle, WA, USA.

Oepen, S., Kuhlmann, M., Miyao, Y., Zeman, D., Flickinger, D., Hajiˇc, J., . . . Zhang, Y. (2014). SemEval 2014 Task 8. Broad-coverage seman-tic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evalu-ation. Dublin, Ireland.

Oepen, S., and Lønning, J. T. (2006). Discriminant-based MRS banking. In Proceed-ings of the 5th International Conference on Language Resources and Evaluation(p. 1250 – 1255). Genoa, Italy.

Palmer, M., Gildea, D., and Kingsbury, P. (2005). The Proposition Bank. A corpus annotated with semantic roles. Computational Linguistics,

31(1), 71 – 106.

Pollard, C., and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. Chicago, USA: The University of Chicago Press.

Popel, M., and Žabokrtský, Z. (2010). TectoMT. Modular NLP framework. Advances in Natural Language Processing, 293 – 304.