5.6 Comparing and Combining Grammar-free and Grammar-based Pars-
5.6.1 Grammar-based Dependency Parsing
5.6.4.3 Parser Ensemble via Bagging
The dependency parsing problem can be viewed as a word prediction problem, i.e., finding the head of each word. It is therefore convenient to cast dependency parser ensemble as a word voting problem, to which the Bagging method is easy to apply. In the training phase, given a training set D of size n, our model generates m new training sets Di of size 61.8% × n by sampling examples from D without replacement. Each Di is separately used to train the Berkeley parser and the graph-based dependency parser. Using this strategy, we obtain 2m weak parsers. In the parsing phase, the 2m models output parsing results (the constituency parses are automatically converted to dependency parses) for each given sentence. For every sentence, the final parsing result is a combination of its corresponding 2m structures. We implement two strategies for the combination.
Word-by-word voting These 2m dependency trees can be combined in a simple word-by-word voting scheme, where each parser votes for the head of each word in the given sentence, and the head with the most votes is assigned to each word. This very simple scheme guarantees that the final set of dependencies has as many votes as possible, but it does not guarantee that the final voted set of dependencies is a well-formed dependency tree.
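The sampling and voting steps can be illustrated with a minimal sketch. The function names (`subsample`, `vote_heads`, `is_well_formed`), the head-list encoding (heads[i] is the head of word i+1, with 0 denoting the root), and the tie-breaking rule (the first head encountered wins) are assumptions made for illustration; they are not taken from the implementation evaluated here.

```python
import random
from collections import Counter

def subsample(train_set, ratio=0.618, rng=None):
    """Draw one bagging set D_i: round(ratio * n) examples, without replacement."""
    rng = rng or random.Random()
    return rng.sample(train_set, round(ratio * len(train_set)))

def vote_heads(parses):
    """Word-by-word voting over head lists (one list per parser).

    Ties are broken in favour of the head seen first (an assumption).
    """
    n_words = len(parses[0])
    voted = []
    for i in range(n_words):
        counts = Counter(parse[i] for parse in parses)
        voted.append(counts.most_common(1)[0][0])
    return voted

def is_well_formed(heads):
    """True if every word reaches the root (0) without entering a cycle.

    Several words may attach to the root; a single-root constraint is not checked.
    """
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# Four well-formed parses of a three-word sentence (hypothetical data).
parses = [[2, 0, 2], [2, 3, 0], [0, 1, 1], [3, 1, 0]]
voted = vote_heads(parses)            # [2, 1, 0]: words 1 and 2 head each other
print(voted, is_well_formed(voted))
```

In this toy case every input parse is a tree, yet the voted structure contains a cycle between words 1 and 2, which is the failure mode that motivates re-parsing.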
Re-parsing To guarantee that the resulting dependency tree is well-formed, we re-parse each sentence with the dynamic programming algorithm of [Eisner, 1996].
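A minimal sketch of re-parsing follows, assuming that each candidate arc is scored by the number of parsers proposing it and that a first-order, projective variant of the Eisner algorithm searches for the highest-scoring tree. The scoring scheme, the function names (`arc_votes`, `eisner`), and the decision to allow several words to attach to the root are illustrative assumptions rather than details confirmed by the text.

```python
def arc_votes(parses):
    """scores[h][d] = number of parsers proposing head h for word d (0 = root)."""
    n = len(parses[0]) + 1
    scores = [[0.0] * n for _ in range(n)]
    for heads in parses:
        for d, h in enumerate(heads, start=1):
            scores[h][d] += 1.0
    return scores

def eisner(scores):
    """First-order projective decoding; returns the head of each word 1..n-1."""
    n = len(scores)
    NEG = float("-inf")
    # C[s][t][d]: best complete span over s..t; I[s][t][d]: best incomplete span.
    # d = 1: the head is the left end s; d = 0: the head is the right end t.
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bC = [[[0, 0] for _ in range(n)] for _ in range(n)]
    bI = [[0] * n for _ in range(n)]
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # Join two facing complete spans, then add an arc between s and t.
            r = max(range(s, t), key=lambda r: C[s][r][1] + C[r + 1][t][0])
            base = C[s][r][1] + C[r + 1][t][0]
            bI[s][t] = r
            if s > 0:  # the root (index 0) is never a dependent
                I[s][t][0] = base + scores[t][s]
            I[s][t][1] = base + scores[s][t]
            # Extend an incomplete span into a complete one.
            r = max(range(s, t), key=lambda r: C[s][r][0] + I[r][t][0])
            C[s][t][0], bC[s][t][0] = C[s][r][0] + I[r][t][0], r
            r = max(range(s + 1, t + 1), key=lambda r: I[s][r][1] + C[r][t][1])
            C[s][t][1], bC[s][t][1] = I[s][r][1] + C[r][t][1], r
    heads = [0] * n

    def back(s, t, d, complete):
        """Follow back-pointers and record the arcs of the best tree."""
        if s == t:
            return
        if complete:
            r = bC[s][t][d]
            if d == 0:
                back(s, r, 0, True)
                back(r, t, 0, False)
            else:
                back(s, r, 1, False)
                back(r, t, 1, True)
        else:
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            r = bI[s][t]
            back(s, r, 1, True)
            back(r + 1, t, 0, True)

    back(0, n - 1, 1, True)
    return heads[1:]

# Hypothetical parses whose word-by-word vote would contain a cycle.
parses = [[2, 0, 2], [2, 3, 0], [0, 1, 1], [3, 1, 0]]
print(eisner(arc_votes(parses)))
```

Because the search is restricted to projective trees, the output is always a well-formed tree, at the cost of possibly discarding some majority arcs.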
5.6.4.4 Evaluation
Table 5.11 shows the parsing performance of the stacking model on the development data. Compared to the baseline results (see Tables 5.9 and 5.10), the stacking model is effective in improving parsing accuracy, with regard to both first-order and second-order dependencies.
Devel. UASdep Fsib Fgrd
Stacking 85.41% 71.36 82.94
Table 5.11: Performance of the stacking model.
Table 5.12 shows the re-parsing performance on the development data. When only the two baseline parsers provide dependency candidates, the re-parsing method does not work well: the parsing accuracy is slightly lower than that of even the weaker baseline. When the stacking parser above is also employed, re-parsing performs slightly better than the best of the three, but the improvement is modest.
Devel. UASdep
Mate/Berkeley+conversion 83.72%
Mate/Berkeley+conversion/stacking 85.60%
Table 5.12: Performance of the re-parsing model.
We evaluate our Bagging model on the same data set. In the following experiments, we use the standard discriminative POS tagger to provide inputs for the dependency parser. Because each new data set Di in the Bagging algorithm is generated by a random procedure, the performance varies from one Bagging run to another. To give a more stable evaluation, we repeat each experiment three times for each m and report the averaged accuracy. Figure 5.7 shows the influence of m in the Bagging algorithm. The Bagging model that takes both discriminative dependency models and generative constituency models as basic systems outperforms the baseline systems as well as the Bagging models that take either type of model in isolation. The Bagging method can also improve individual parsing models, and the grammar-based model benefits more.
[Figure 5.7, two panels; x-axis: m = 1 to 10; y-axis: averaged UAS, 81 to 87]
Bagging-voting
m 1 2 3 4 5 6 7 8 9 10
Ber 81.46 81.35 83.17 83.5 83.95 83.99 84.38 84.35 84.55 84.63
Mate 83.14 83.07 83.91 84.18 84.27 84.26 84.4 84.47 84.42 84.52
Ber+Mate 84.92 85.51 85.93 86.1 86.28 86.37 86.37 86.45 86.49 [nine values in the source]
Bagging-reparsing
m 1 2 3 4 5 6 7 8 9 10
Ber 81.71 81.66 83.29 83.42 84.2 84.12 84.43 84.49 84.42 84.63
Mate 83.08 83.33 84 83.89 84.23 84.2 84.31 84.4 84.45 84.45
Ber+Mate 82.48 85.02 85.69 85.84 86.2 86.31 86.32 86.41 86.46 86.51
Figure 5.7: Dependency UAS of Bagging models with different numbers of sampling data sets.
Bagging a single-view parser Figure 5.7 indicates that (1) the Bagging method can also improve individual single-view parsers, especially the grammar-based parser, and (2) applying Bagging to either the grammar-free or the grammar-based parser yields comparable overall accuracy. We analyze the outputs generated by the two single-view Bagging models and report the three aforementioned evaluation metrics in Table 5.13. The parsing models enhanced by ensemble learning still have complementary strengths, and the combination of both is therefore beneficial.
Devel. Complete Fsib Fgrd
Berkeley+conversion 30.08% 71.06 82.95
Mate 31.50% 69.55 80.48
Berkeley+conversion/Mate 34.39% 72.67 82.92
Table 5.13: Performance of different Bagging models. m=10, Inference=re-parsing.
5.6.4.5 Final Results
Table 5.14 summarizes the final results of different models on the test data set. Parser ensemble is clearly important for advancing the state of the art in Chinese dependency parsing. Hatori et al. [2011] study several enhancement techniques for transition-based dependency parsing, including joint learning, dynamic programming and deep feature engineering. Evaluations on an out-of-date version of CTB show that their system achieves significantly better performance than previously reported systems. We re-train their system on the CoNLL data and report its UAS in the first row. The beam width for decoding is set to 32, and the number of training iterations (29) is tuned on the development data. Although several second-order graph-based parsers have been implemented and evaluated for Chinese dependency parsing, they pay little attention to POS tagging. In our experiments, the Mate parser, built on a much stronger POS tagger, easily outperforms many other systems and obtains a state-of-the-art result.
Test UAS
State-of-the-art [Hatori et al., 2011] 84.27%
Mate parser 84.38%
Berkeley parser+conversion 83.49%
Stacking 85.80%
Bagging(m = 20)(voting) 86.85%
Bagging(m = 20)(re-parsing) 86.79%
Table 5.14: Accuracies of different models on the test data.
System ensemble can significantly enhance state-of-the-art parsers for Chinese. Compared to the previously introduced stacking model, our Bagging model is more effective at integrating grammar-free and grammar-based parsers. Based on automatic POS tagging, our Bagging model achieves a UAS of 86.85%, which corresponds to relative error reductions of 16% and 20%, respectively, over the two strong baselines. The remarkable results of parser ensemble also demonstrate the diversity of constituency and dependency parsing.
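The relative error reductions can be checked with a short calculation. The sketch below assumes that the two baselines are the Mate parser and Berkeley parser+conversion rows of Table 5.14, and that the reduction is measured on the attachment error rate, 100 − UAS; the helper name is hypothetical.

```python
def relative_error_reduction(baseline_uas, new_uas):
    """Fraction of the baseline's attachment errors removed by the new system."""
    return (new_uas - baseline_uas) / (100.0 - baseline_uas)

bagging_uas = 86.85  # Bagging (voting), Table 5.14
baselines = [("Mate parser", 84.38), ("Berkeley parser+conversion", 83.49)]
for name, uas in baselines:
    reduction = 100 * relative_error_reduction(uas, bagging_uas)
    print(f"{name}: {reduction:.1f}% relative error reduction")
```

Under these assumptions the two values round to 16% and 20%, matching the figures quoted above.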