A Case Study on Example-Based Parsing


Oliver Streiter and Hsueh Pei-Yun

Academia Sinica, Institute of Information Science

Nankang, Taipei, Taiwan 115

E-mail:

[email protected], [email protected]

Abstract

The paper describes an example-based parser for Chinese: tree structures whose size is equal to or smaller than that of the sentence to be parsed are retrieved from a treebank and aligned with the sentence. Subsequent structural adaptations handle unknown words, type shifting and metaphorical extensions of words. Derivational adaptations re-analyze awkward subtrees in order to auto-correct badly matched trees and to insert unmatched, previously deleted words. This strategy is crucial for parsing long sentences. We illustrate how the parser works through the analysis of one example sentence in different parsing models. The main concern is to find an optimal strategy for the retrieval of appropriate tree structures and to demonstrate the advantages of fuzzy matches.

1. Introduction

The Academia Sinica possesses rich linguistic resources of modern Mandarin Chinese, among them a manually tagged corpus of about 10 million characters [1] and a lexicon of about 80,000 words which describes their semantic and syntactic properties [2]. With the help of a rule-based parser, a treebank of manually corrected sentences has been created, containing at the time of writing about 40,000 trees [3]. The treebank annotation scheme specifies almost 200 lexical labels, 45 syntactic phrasal labels and 46 semantic role labels.

An example-based parser tries to reshape the information contained in the treebank into a parser. In [4] we described the basic generalization and indexing mechanism for this parser. Since then this basic parser has been developed through different models, which have been evaluated quantitatively in [5]. This paper reviews the development of the parser in a qualitative way and thus complements and deepens the description of this parsing approach: we look at one sentence and see how it is analyzed by some of the parsing models we subsequently developed. All models were trained with 16,000 trees of the treebank before the example sentence was input.

2. Related Research

Given the amount of information contained in the treebank, the conversion of this data into a parser can only be achieved by an automatic learning approach. Daelemans et al. [6] divide automatic learning approaches into lazy learners, which keep all learned data available for reasoning, and eager learners, which extract knowledge structures or statistical information from the training data and reason on the basis of these abstractions rather than the original data. They claim that lazy learners outperform eager learners for NLP tasks, as the latter ignore what they consider 'noise', i.e. the irregular and infrequent, yet valuable, items.

Lazy learning algorithms descend from the k-nearest neighbor classifier (henceforth k-NN), according to which a new item is classified by retrieving the k most similar items and their classifications from a database. The adaptation of the old classification to the new item is considered an efficient and qualified strategy, which has led to a number of paradigms such as Case-based Reasoning and Example-based Machine Translation. The conceptual advantage that no empirical data are lost through a transformation into an abstract model is accompanied by the computational disadvantage that the whole database has to be searched in order to find similar items. In addition, the similarity measure is not derived from a model but is left to the ingenuity of the researchers.
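The k-NN principle described above can be illustrated with a minimal Python sketch. The feature tuples, the labels and the position-wise overlap measure are illustrative assumptions; they are not the similarity measure used by the parser discussed in this paper.

```python
from collections import Counter


def overlap(a, b):
    """Toy similarity (an assumption): number of positions with equal values."""
    return sum(1 for x, y in zip(a, b) if x == y)


def knn_classify(item, database, k=3):
    """Return the majority label among the k stored items most similar to item."""
    ranked = sorted(database, key=lambda entry: overlap(item, entry[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical database of (features, classification) pairs.
database = [
    (("the", "DET", "car"), "NP"),
    (("a", "DET", "dog"), "NP"),
    (("to", "PREP", "town"), "PP"),
    (("in", "PREP", "town"), "PP"),
]

label = knn_classify(("the", "DET", "dog"), database, k=2)
```

Note that every query scans the whole database, which is the computational disadvantage mentioned above.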

Figure 1. The segmented example sentence, pin-yin transliteration and a word-based gloss

Figure 2. The reference parse of the example sentence

Eager learners of treebanks are the so-called Treebank Grammars. Treebank Grammars derive stochastic phrase structure rules for use in traditional parsers. This approach tends to obtain better results than parsers driven by hand-made grammars, cf. [7, 8, 9]. Problems of this approach, as mentioned by the authors themselves, are due to the low degree of lexicalization: the terminals of phrase structure rules only refer to category labels, although lexical information and the syntactic irregularities triggered by it are crucial for the quality of the parse, cf. [10, 11]. Phrase structure rules representing irregular phenomena cannot be distinguished from noise when the lexemes are not included in the representation, and are therefore removed from the set of rules. One possible reason why lexemes are removed is that parsers generally cannot handle the number of rules and intermediate results that would be necessary if syntactic categories were used in combination with lexemes. For example, the single English phrase the car could be represented by four rules, resulting in four equivalent parses (NP->the car; NP->the n; NP->det car; NP->det n). Thus, if a mixed encoding of lexemes and categories is required in an advanced parser, we need to find a parsing approach different from traditional ones.
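The growth in the number of rules under a mixed encoding can be made concrete with a short sketch. The function name `mixed_rules` and the token representation are hypothetical; the sketch only enumerates, for each token, the choice between its lexeme and its category.

```python
from itertools import product


def mixed_rules(lhs, tokens):
    """Enumerate all rules that use, per token, either the lexeme or its category.

    tokens is a list of (lexeme, category) pairs; this representation is an
    assumption made for illustration.
    """
    return [f"{lhs}->{' '.join(rhs)}" for rhs in product(*tokens)]


rules = mixed_rules("NP", [("the", "det"), ("car", "n")])
for rule in rules:
    print(rule)
```

A phrase of n tokens yields 2^n such rules, so a ten-word pattern already corresponds to 1024 equivalent rules.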

Parsing approaches based on Explanation-based learning, which were developed later, retain this paradigm, as they are principally concerned with the automatic extraction of a specialized grammar out of a general-purpose grammar on the basis of parsing examples [12]. Even Explanation-based learning approaches to parsing which are quite similar to the approach we propose [13] eagerly abstract away from lexical entries and morphological features.

Lazy learners have been proposed for a number of NLP tasks, including machine translation [14, 15, 16], part-of-speech tagging [11] and shallow parsing [6]. Due to the recursive and lexical (i.e. non-regular) nature of language, however, only tasks which require a narrow window on the language stream have been treated successfully, even if a mixed encoding of categories and lexemes is used: in an EBMT task, for example, [16] matches an average of 3.5 words with a training corpus of 5 million words. In a parsing task, [4] matches an average of 3 words with a training corpus of 100,000 words.

When applying the k-NN approach to the task of parsing Chinese, we consider the surface string of a sentence the item to be classified. The database contains example items (Chinese sentences) and their classifications (the corresponding tree structures). Parsing principally consists of searching for the most closely matched sentence, i.e. the nearest neighbor (NN), and adapting this NN to the new sentence. This approach seems attractive for parsing Chinese, as the final parse is not derived from the analytical combination of hopelessly ambiguous sentence parts, but from the recognition and combination of large sentence patterns.

3. Example Sentence

In this paper, we follow the processing of one sentence in different models in order to illustrate the progress of the models. One sentence may not be representative enough to judge the performance of the parser on large corpora. However, the case study contains abundant information to show the qualitative advance of the models. The example sentence is reproduced in Fig. 1, the correct reference parse in Fig. 2.

In Fig. 2, the pipe | is used to separate nodes at the same level of the tree. The colon : separates the pieces of information related to a node. Phrasal nodes consist of Semantic Role:Phrasal Category, while lexical nodes consist of Semantic Role:Lexical Category:Word. The Lexical Category may be subdivided by > into the syntactic-semantic category and an additional syntactic-semantic feature for further cross-classification. The category for proper nouns Nba, for example, may occur with the additional features mankind, organization, super-nature, country, region, etc. These features may also occur on other categories, e.g. on Nhb in Fig. 2. While the semantic role labels are self-explanatory, lexical categories are far more complex. At this point it may suffice to say that categories starting with N refer to nouns, those starting with V to verbs and those starting with P to prepositions. Additional characters yield a finer classification, e.g. VK1 is a subset of VK, which is a subset of V.
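The node notation can be read mechanically, as the following sketch suggests. The function names, the role names used in the examples and the placeholder word `WORD` are hypothetical. The generalization by dropping trailing characters follows the VK1, VK, V example above, but whether every intermediate prefix is a valid category of the annotation scheme is an assumption.

```python
def parse_node(label):
    """Split 'Role:Phrase' or 'Role:Category[>feature]:Word' into its fields."""
    parts = label.split(":", 2)
    if len(parts) == 2:  # phrasal node
        return {"role": parts[0], "phrase": parts[1]}
    category, _, feature = parts[1].partition(">")
    return {
        "role": parts[0],
        "category": category,
        "feature": feature or None,
        "word": parts[2],
    }


def generalizations(category):
    """List a category and its coarser classes, most specific first.

    Assumption: a coarser class is obtained by dropping the feature and
    then trailing characters, e.g. VK1 -> VK -> V.
    """
    base, _, feature = category.partition(">")
    chain = [category] if feature else []
    return chain + [base[:i] for i in range(len(base), 0, -1)]


phrasal = parse_node("agent:NP")
lexical = parse_node("Head:Nba>mankind:WORD")
```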

4. Model I: Exact Matches

Figure 3. Lattice of partial parses if exact matches are required (chunks 0-1, 1-9, 1-6, 1-5, 2-8, 2-6, 3-8, 4-10, 4-9, 5-9, 6-11, 7-9)

For the retrieval of stored tree structures we use a variant of what [17] calls an Inverted Index. Indices are surface strings, categories and generalized categories, e.g. the surface string of a word together with Ndabd>temp, Ndabd and N. For every index and its position in the sentence we store the tree number and the size of the sentence. In the analysis we calculate the intersection of all tree numbers obtained from all words of a sentence and all their possible (generalized) categories, thereby obtaining the tree number corresponding to the sentence to be parsed. Calculating the intersection of the indices of all words corresponds to an exact match, i.e. every word matches a slot in the tree via its lexeme or category.
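The retrieval by intersection can be sketched as follows. This is a minimal illustration under stated assumptions, not the indexing mechanism of [4]: the class `InvertedIndex`, the key layout and the `generalize` helper are hypothetical, and the toy words and categories are invented for the example.

```python
from collections import defaultdict


def generalize(category):
    """Assumed generalization: drop the feature, then trailing characters."""
    base, _, feature = category.partition(">")
    keys = [category] if feature else []
    return keys + [base[:i] for i in range(len(base), 0, -1)]


def keys_for(word, category):
    """Index keys of one word: its surface string and all category levels."""
    return [("w", word)] + [("c", c) for c in generalize(category)]


class InvertedIndex:
    def __init__(self):
        # (key, position) -> set of (tree number, sentence size)
        self.postings = defaultdict(set)

    def add(self, tree_id, leaves):
        """Index a stored tree given its leaves as (word, category) pairs."""
        for pos, (word, cat) in enumerate(leaves):
            for key in keys_for(word, cat):
                self.postings[(key, pos)].add((tree_id, len(leaves)))

    def exact_matches(self, sentence):
        """sentence: list of (word, possible categories). Intersect per position."""
        result = None
        for pos, (word, cats) in enumerate(sentence):
            hits = set()
            for cat in cats:
                for key in keys_for(word, cat):
                    hits |= {t for t, size in self.postings.get((key, pos), ())
                             if size == len(sentence)}
            result = hits if result is None else result & hits
        return result or set()


index = InvertedIndex()
index.add(1, [("he", "Nh"), ("sleeps", "VA")])
index.add(2, [("he", "Nh"), ("in", "P")])
index.add(3, [("he", "Nh"), ("sleeps", "VA"), ("now", "Nd")])
matches = index.exact_matches([("she", ["Nh"]), ("runs", ["VA", "Na"])])
```

In this toy setting every word of the query has to hit the same tree at its own position, so a single mismatching position empties the result.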

If no complete match can be found for a sentence, we parse all possible sub-chunks of the sentence. The resulting subtrees are written into a lattice and evaluated so as to obtain the best combination of partial trees. The lattice containing parsed sub-chunks is reproduced in Fig. 3, where chunks are described according to the segmentation in Fig. 1: (0-1) describes the first word, (1-6) the 2nd to 6th word and (1-5) the 2nd to 5th word. For reasons of readability and space, most partial trees which are not referred to later have been removed.
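How a best combination could be selected from such a lattice is sketched below. The paper does not state the evaluation criterion, so the one used here (cover as many words as possible, then prefer fewer chunks) is purely an assumption, as is the function name `best_cover`. The chunk spans are those listed for Fig. 3, read as half-open word ranges.

```python
def best_cover(spans, n):
    """Pick non-overlapping chunks over n words by dynamic programming.

    Assumed criterion: maximize covered words, then minimize chunk count.
    best[i] holds (covered words, -chunks, chosen spans) for words[0:i].
    """
    by_end = {}
    for start, end in spans:
        by_end.setdefault(end, []).append(start)
    best = [(0, 0, [])]
    for i in range(1, n + 1):
        candidate = best[i - 1]  # leave word i-1 uncovered
        for start in by_end.get(i, []):
            covered, neg_chunks, chosen = best[start]
            option = (covered + i - start, neg_chunks - 1, chosen + [(start, i)])
            if option[:2] > candidate[:2]:
                candidate = option
        best.append(candidate)
    return best[n][2]


FIG3_SPANS = [(0, 1), (1, 9), (1, 6), (1, 5), (2, 8), (2, 6),
              (3, 8), (4, 10), (4, 9), (5, 9), (6, 11), (7, 9)]
cover = best_cover(FIG3_SPANS, 11)
```

A cover of this kind only concatenates partial trees; it does not build an overall parse.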

As can be seen in Fig. 4, the combination of partial matches does not result in good parses. No overall parse is constructed. One reason for this is that unknown words remain unanalyzed. Moreover, due to the requirement of an exact match, even those sub-sentences which contain no unknown words are not completely covered.

We observe incorrect groupings (crossing brackets). For example, should be analyzed as quantifying measure word (quantifier:DM) of the following nominal group, and not as frequency related to the verb. As may be noticed, the correct analysis of can be found in (1-9). As these chunks cannot be completed to cover the whole sentence, this analysis does not occur in the final output. Although the coordination around the coordinator has been correctly identified, e.g. in (5-9), as well as the inclusion of the coordination into the attributive -construction, e.g. in (4-10), no partial parse contains both elements correctly. In addition, we observe minor mistakes like the incorrect recognition of the negation marker as adverbial, though a correct analysis also appears in (1-5). An interesting example of the ambiguity of partial chunks in Chinese is the analysis of as noun (1-9), as verb (3-8) and as preposition (7-9), as well as the analysis of the aspect marker as VC1 (1-9), as VH11 (2-6) and as preposition (4-9). These ambiguities and the impossibility of integrating partially correct analyses into a correct one keep this first model from being practical.

5. Model II: Fuzzy Match with Substitutions

Figure 4. The final parse of Model I: A combination of partial parses

Figure 5. The nearest neighbor with a fuzzy retrieval (substitutions)

In order to overcome the limitations of exact matches, we investigate the potential of fuzzy matches. A fuzzy match with substitutions is what happens when your spelling checker finds financial although you wrote finamcial, thus substituting m by n. If we no longer calculate the intersection of all indices, but simply take the sentence number which has most often been referred to when the input is compared to the stored trees, substitutions may occur in every position of the sentence. However, such substitutions only make sense if they can lead to a correct analysis. The advantage we expect from this strategy is that longer matching chunks represent a gain in information compared to short exact matches, even if some information is lost through the fuzzy match. Especially if lost information can be recovered, the total balance should be positive.
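Replacing the intersection by a count of references can be sketched as follows. The postings table, the key names and the function `fuzzy_nearest` are hypothetical; the sketch only shows that a tree can be retrieved although one position does not match.

```python
from collections import Counter


def fuzzy_nearest(postings, sentence_keys, k=1):
    """Rank trees by the number of sentence positions that refer to them.

    postings maps (key, position) to a set of tree numbers; sentence_keys
    lists, per position, the keys of the word found there. Both layouts
    are assumptions made for illustration.
    """
    votes = Counter()
    for pos, keys in enumerate(sentence_keys):
        referred = set()
        for key in keys:
            referred |= postings.get((key, pos), set())
        votes.update(referred)  # a tree counts at most once per position
    return votes.most_common(k)


# Hypothetical index over three stored trees.
postings = {
    ("Nh", 0): {1, 2},
    ("VA", 1): {1},
    ("P", 1): {2},
    ("Na", 2): {1, 2},
    ("Nd", 2): {3},
}
nearest = fuzzy_nearest(postings, [["Nh"], ["VA"], ["Nd"]])
```

Here tree 1 is returned with two matching positions although its third slot expects another category; an exact match by intersection would have found nothing.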

If, for example, an adverb substitutes a subject pronoun slot, frequencies of role-category and word-category combinations collected during the compilation of the treebank allow us to replace the role and category related to the pronoun by the adverb's most probable lexical category and semantic role. This approach also gives us an advantage in the treatment of unknown words (which are assigned the category found in the tree structure), metonymical and metaphorical usages of words (the syntactic category of the word is added to the semantic category and role of the slot), as well as the treatment of all types of type coercion in which the syntactic category assigned in the lexicon is changed into a syntactically required category (in most cases verb to action-noun coercion). The NN, i.e. the best fuzzy match before adaptation is performed, is reproduced in Fig. 5. In this structure, the unknown proper noun has received the role and category found in the tree.
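The frequency-based repair of a substituted slot can be sketched as follows. The class `AdaptationStats`, its method names and the role names in the example are hypothetical. Only the two simplest cases from the paragraph above are covered, namely unknown words and known words in an unobserved category; metonymy and type coercion are left out.

```python
from collections import Counter, defaultdict


class AdaptationStats:
    """Counts of word-category and category-role pairs seen in the treebank."""

    def __init__(self):
        self.word_cat = defaultdict(Counter)  # word -> categories
        self.cat_role = defaultdict(Counter)  # category -> roles

    def observe(self, role, category, word):
        self.word_cat[word][category] += 1
        self.cat_role[category][role] += 1

    def adapt(self, slot, word):
        """slot is the (role, category) of the retrieved tree slot."""
        role, category = slot
        if word not in self.word_cat:
            return role, category, word  # unknown word: keep the slot labels
        if category in self.word_cat[word]:
            return role, category, word  # observed pairing: nothing to repair
        new_cat = self.word_cat[word].most_common(1)[0][0]
        new_role = self.cat_role[new_cat].most_common(1)[0][0]
        return new_role, new_cat, word


stats = AdaptationStats()
stats.observe("time", "Dd", "often")
stats.observe("time", "Dd", "often")
stats.observe("agent", "Nh", "he")
adapted = stats.adapt(("agent", "Nh"), "often")
```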

The consecutive structural adaptations, the results of which can be seen in Fig. 6, assign a correct role and category to the mismatched . However, a -construction which has been matched erroneously cannot be repaired. cannot be further adapted as no information about this word is available. is incorrectly adapted. The adaptations performed on , , again improve the lexical coding of the word, but cannot correct the wrong structure. Finally and receive their correct category through the adaptation. Although the parse is far from perfect, at least the lexical encoding of words is very close to the reference tree in Fig. 2.

6. Model III: Derivational Adaptations

Such adaptations of retrieved examples constitute a fundamental part of k-NN derived approaches such as Case-Based Reasoning and Example-Based Machine Translation. Structural adaptations are performed via specific rules which are applied to the retrieved example (e.g. our treatment of unknown words, type switching and incorrect matches described above). Derivational adaptations refer to a recursive application of the learning algorithm in order to improve the result. In the above example, the structural adaptations improve only the lexical level; syntactically incorrect structures, however, could not be repaired (e.g. the wrongly matched -construction). We found that the re-analysis of obviously odd phrases (triggered by the same simple statistics referred to above) may produce small but noticeable improvements of the parsing results: a phrase is re-parsed if there is a not-yet-observed relation between a word and its category. Thus, if such a relation occurs neither in the learned corpus nor in the lexicon, the phrase is re-parsed. In the examples of re-parses shown in Fig. 7, a parse may have been triggered by one of the incorrect relations DE: , Ncb: , Nac: and Nad: . The result would have been better if no re-parse had taken place due to the unknown relations Nac: and Nad: . Here structural adaptations would have been preferable. Unfortunately, if the first match (Fig. 5) chunks the text wrongly at the highest level (e.g. as one phrase), a re-parse can bring only limited improvements.
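The trigger for a re-parse reduces to a membership test, as the following sketch suggests. The function name and the set-based representation of corpus and lexicon are assumptions; the example words are invented.

```python
def needs_reparse(leaves, corpus_pairs, lexicon_pairs):
    """True if some (category, word) relation was observed in neither source.

    leaves lists the (category, word) pairs of one phrase; corpus_pairs and
    lexicon_pairs are sets of such pairs (a hypothetical representation).
    """
    return any(
        pair not in corpus_pairs and pair not in lexicon_pairs
        for pair in leaves
    )


corpus_pairs = {("Nh", "he"), ("VA", "sleeps")}
lexicon_pairs = {("Na", "sleep")}
```

Note that an unknown word also produces an unobserved relation under this test, which corresponds to the case described above where a structural adaptation would have been preferable to a re-parse.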

Due to this limitation, no improvement can be achieved for this example sentence, as can be seen when comparing Fig. 8 to Fig. 6. Some structures are worsened by re-parsing, e.g. the formerly correct attributive -construction was replaced by a wrong nominalization -construction. Although not visible in this sentence, the re-parse produced statistical improvements for a whole text [5].


Figure 6. The final parse of Model II

Figure 7. Recursively re-parsed sub-trees

7. Model IV: Deletions

As can be observed in Fig. 6 to Fig. 8, still only small sentences can be parsed satisfactorily. In order to handle longer sentences, we attempt to extend the fuzzy match by deletions. Deletions constitute an additional operation to match two different units in a fuzzy match. If you misspelled the word parallel as paralllel, your spelling checker is likely to find the intended word not by any substitution, but by deleting one character. In the same way, when matching an input sentence onto tree structures, we delete words of the sentence in order to match the sentence onto a smaller, but possibly more appropriate, tree structure.

If words have been deleted, the input sentence and the pattern have to be aligned again in order to find out which word has to be mapped onto which tree slot. As there is no guarantee that the best tree extracted is also the best tree for the alignment, we extract a set of 10 nearest neighbors and align the sentence with them. The tree which can be aligned best is called the best neighbor (BN).

The best neighbor is identified via a general dynamic programming alignment algorithm, which has been applied and discussed previously for chunk alignment [18] and bitext alignment [19]. The alignment algorithm assigns scores as follows: a full match for a syntactic category (3 points), a full match for a word (3 additional points), a partial category match, e.g. Nv1 and Nv2 (0.8 points per character), a substitution or a deletion (0 points).
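A dynamic programming alignment with these scores can be sketched as follows. This is not the algorithm of [18] or [19]. It assumes that a partial category match is scored on the shared leading characters of the two categories, that every tree slot must be aligned with exactly one sentence word, and that only sentence words may be deleted. All names and the example words are hypothetical.

```python
def pair_score(word, slot):
    """Score one sentence word against one tree slot, both (string, category)."""
    (w, cat), (slot_w, slot_cat) = word, slot
    if cat == slot_cat:
        return 3.0 + (3.0 if w == slot_w else 0.0)
    shared = 0
    for a, b in zip(cat, slot_cat):  # assumed: count the common prefix
        if a != b:
            break
        shared += 1
    return 0.8 * shared  # 0 points for a plain substitution


def align(sentence, slots):
    """Return (score, indices of the sentence words kept for the slots)."""
    n, m = len(sentence), len(slots)
    if n < m:
        raise ValueError("the tree must not be larger than the sentence")
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    keep = [[False] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][0] = 0.0  # deleting leading words costs nothing
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            skip = best[i - 1][j]  # delete sentence word i-1
            use = best[i - 1][j - 1] + pair_score(sentence[i - 1], slots[j - 1])
            if use >= skip:
                best[i][j], keep[i][j] = use, True
            else:
                best[i][j] = skip
    pattern, i, j = [], n, m
    while j > 0:  # trace back which words were kept
        if keep[i][j]:
            pattern.append(i - 1)
            j -= 1
        i -= 1
    return best[n][m], pattern[::-1]


sentence = [("he", "Nh"), ("not", "Dc"), ("very", "Dfa"), ("likes", "VK1"), ("tea", "Na")]
slots = [("she", "Nh"), ("likes", "VK2"), ("tea", "Na")]
score, pattern = align(sentence, slots)
```

In the toy example the second and third word are deleted and the score consists of one category match (3), one partial category match on two characters (1.6) and one category plus word match (6).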

Fig. 9 lists the 10 nearest neighbors, their BN (best neighbor) scores and the alignment patterns. The alignment pattern 0 1 2 3 4 5 6 10, for example, says that the words numbered 7, 8 and 9 of the sentence (counting from 0) have to be deleted for the alignment.
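Reading the alignment patterns can be made explicit with a few lines of code. The scores and patterns below are the ten rows of Fig. 9; the sentence length of 11 words is inferred from the highest index 10, and the names `FIGURE_9` and `deleted_words` are hypothetical.

```python
# (BN score, alignment pattern) for the ten nearest neighbors of Fig. 9.
FIGURE_9 = [
    (33.0, [0, 1, 2, 3, 4, 5, 6, 10]),
    (29.4, [0, 3, 4, 5, 6, 7, 8, 10]),
    (39.8, [0, 1, 3, 4, 6, 7, 8, 9, 10]),
    (42.0, [0, 1, 3, 4, 5, 6, 7, 8, 10]),
    (30.0, [0, 3, 6, 7, 8, 9, 10]),
    (39.6, [0, 1, 2, 3, 4, 6, 7, 9, 10]),
    (29.4, [0, 1, 3, 4, 5, 6, 7, 10]),
    (38.2, [2, 3, 4, 5, 6, 7, 8, 9, 10]),
    (32.4, [0, 3, 4, 5, 6, 7, 8, 9, 10]),
    (30.6, [0, 1, 2, 3, 4, 6, 8, 9, 10]),
]


def deleted_words(pattern, sentence_length):
    """Indices of the sentence words that do not occur in the pattern."""
    kept = set(pattern)
    return [i for i in range(sentence_length) if i not in kept]


best_score, best_pattern = max(FIGURE_9, key=lambda row: row[0])
print(best_score, deleted_words(best_pattern, 11))
```

For the best neighbor with the score 42, the words numbered 2 and 9 are the ones left unaligned.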

After the best neighbor (BN) has been identified with the score 42, the words identified in the alignment pattern are copied to replace the original words of the best neighbor (Fig. 10). Some matches are nice, e.g. evaluation:Dbb: ==>evaluation:Dbb: , aspect:Di: ==>aspect:Di: , Head:Caa: ==>Head:Caa: , some are not harmful, e.g. Head:Nad: ==>Head:Nad: , and some are odd, e.g. Head:Nad: ==>Head:Nad: .

Before this sentence is further adapted, however, the unaligned words have to be inserted (by default in front of the next matching word). Deleted words obtain the provisional mark ROLE:CAT in their category fields, which can be easily identified in the following adaptation steps (Fig. 11).
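The insertion step can be sketched on a flat list of leaves. This is a simplification: in a real tree the deleted word would be attached inside the phrase of the next matching word, whereas here the sentence order alone places it in front of that word. The function name, the role and category labels and the placeholder words are hypothetical.

```python
def insert_deleted(words, pattern, leaves):
    """Rebuild the leaf sequence of the sentence after an alignment.

    words: the input sentence; pattern: indices of the aligned words;
    leaves: (role, category) of the best neighbor's leaves, in order.
    Unaligned words receive the provisional labels ROLE and CAT.
    """
    labels = dict(zip(pattern, leaves))
    return [labels.get(i, ("ROLE", "CAT")) + (word,) for i, word in enumerate(words)]


words = ["w0", "w1", "w2", "w3"]
leaves = [("agent", "Nh"), ("Head", "VC"), ("goal", "Na")]
restored = insert_deleted(words, [0, 1, 3], leaves)
```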

Deleted and odd words at the top-level are corrected via structural adaptations as before. The outcome after the structural adaptation can be seen in Fig. 12. Deletions and odd words at deeper levels are not structurally adapted, as we want to adapt the tree by re-parsing.

Odd subtrees or subtrees containing inserted words are re-parsed. This re-parsing is a recursive application of the parsing function, following the same steps, i.e. NN-retrieval, BN-estimation, insertion, structural adaptation and, if necessary, re-parsing. Re-parsing may continue until single words have to be adapted, in which case structural adaptations apply and re-parsing stops. For reasons of space, we reproduce this process in an abbreviated form (Fig. 13).

The final parse of our example sentence is very close to the reference tree. We may observe two major differences between these trees. First, the unknown word is not recognized as proper noun, but labeled as a temporal adverb. As a consequence, the sentence has no subject and is labeled as VP. The second difference lies in the attachment of , which is associated with the top node of the coordination and not with the first coordinated noun. If the measure word were not , however, this analysis would also have been correct.


Figure 8. The final parse of Model III

BN      Alignment
33      0 1 2 3 4 5 6 10
29.4    0 3 4 5 6 7 8 10
39.8    0 1 3 4 6 7 8 9 10
42      0 1 3 4 5 6 7 8 10
30      0 3 6 7 8 9 10
39.6    0 1 2 3 4 6 7 9 10
29.4    0 1 3 4 5 6 7 10
38.2    2 3 4 5 6 7 8 9 10
32.4    0 3 4 5 6 7 8 9 10
30.6    0 1 2 3 4 6 8 9 10

Figure 9. The 10 NNs, their BN-scores and alignment patterns


Figure 11. Deleted words are inserted

Figure 12. Structural Adaptations at the top level

8. Conclusions

In this paper we have gone through the parsing steps of different models of an example-based parser for one sentence. The last model, a fuzzy match with substitutions and deletions, produces the qualitatively best parses which, though not perfect, describe most parts and relations of the sentence correctly.

The aim of the paper is to help understand how a treebank can be compiled into an efficient parser. The k-NN classifier could be applied successfully to the task of parsing Chinese, as this approach allows us to use both lexemes and categories not only for the retrieval, which has been an obstacle for Treebank Grammars, but also for fuzzy matches. As a consequence, general patterns as well as syntactic particularities can be handled efficiently at the same time. The k-NN approach can equally be combined with fuzzy matches, which extend the size of the matched chunks and add vagueness to the parser. Since the absence of vagueness is often held responsible for the failure to build robust symbolic parsers, this new parser makes significant progress in symbolic parsing. To sum up, the model we propose has a number of strong points which may shift parsing paradigms towards similar example-based approaches.

1) Parsing is extremely robust through fuzzy matches.

2) Parsing is extremely reliable through the string encoding. This property has been subject to further investigations in [20].

3) Parsing is extremely rapid as whole patterns are searched for, and the analysis is not a combination of ambiguous sub-trees.

4) The parser can be improved by simply adding new trees to the treebank. Although this may have a limited effect for the parsing of free texts, in a limited domain the parser may parse quickly with almost 100% accuracy as long as the learned trees remain consistent.

5) The parser is completely symbolic and may cooperate at any phase with other knowledge resources in order to support, for example, the treatment of unknown words.

6) As the parser may combine, if necessary, sub-trees from different trees, this parser can be developed from a treebank as small as 5,000 example trees for simple applications.

References

[1] Chu-Ren Huang and Keh-Jiann Chen. A Chinese corpus for linguistics research. In COLING'92, 1992.

[2] Chu-Ren Huang, Keh-Jiann Chen, Li-ping Chang, and Hui-li Hsu. Zhōng yāng yán jiū yuàn pīng héng yǔ liào kù jiǎn jiè. In Proceedings of ROCLING VIII, 1995.

[3] Keh-Jiann Chen, Chi-Ching Luo, Zhao-Ming Gao, Ming-Chung Chang, Feng-Yi Chen, and Chao-Jan Chen. The CKIP Chinese Treebank. In Journées ATALA sur les Corpus annotés pour la syntaxe. Talana, Paris VII, 1999.

[4] Oliver Streiter. Parsing Chinese with randomly generalized examples. In NLPRS'99 Workshop on Multi-lingual Information Processing and Asian Language Processing, Beijing, November 1999. http://rockey.iis.sinica.edu.tw/oliver/.

[5] Oliver Streiter and Keh-Jiann Chen. Experiments in example-based parsing. In Proceedings of the Dialogue 2000 International Seminar in Computational Linguistics and Applications, Tarusa, Russia, 2000. (with Russian summary), http://rockey.iis.sinica.edu.tw/oliver/.


Figure 13. The recursive application of the parsing as derivational adaptation

Figure 14. The final parse of Model IV

[6] Walter Daelemans, Sabine Buchholz, and Jorn Veenstra. Memory-based shallow parsing. In Proceedings of CoNLL-99, Bergen, Norway, June 12 1999. http://ilk.kub.nl/papers.html.

[7] Rens Bod. Data oriented parsing (DOP). In COLING, 1992.

[8] Eugene Charniak. Tree-bank grammars. In 13th National Conference on Artificial Intelligence, AAAI-96, 1996.

[9] Satoshi Sekine and Ralph Grishman. A corpus-based probabilistic grammar with only two non-terminals. In Fourth International Workshop on Parsing Technology, Prague, September 1995. //www.cs.nyu.edu/cs/projects/proteus/sekine.

[10] Rens Bod. Extracting stochastic grammars from treebanks. In Journées ATALA sur les Corpus annotés pour la syntaxe. Talana, Paris VII, 1999.

[11] Walter Daelemans, Antal Van den Bosch, and Jakub Zavrel. Forgetting exceptions is harmful in language learning. Machine Learning, special issue on natural language learning, 34:11–43, 1999. http://ilk.kub.nl/papers.html.

[12] Manny Rayner and Christer Samuelsson. Corpus-based grammar specification for fast analysis. In Agnas et al., editor, Spoken Language Translator: First Year Report, SRI Technical Report CRC-043, pp. 41-54. 1994. URL http://www.cam.sri.com.

[13] B. Srinivas and Aravind K. Joshi. Some novel applications of explanation-based learning to parsing lexicalized tree-adjoining grammars. cmp-lg archive 9505023, 1995.

[14] Zeres GmbH. ZERESTRANS Benutzerhandbuch. Version: 19.6.97, Bochum, 1997.

[15] Michael Carl. Inducing translation templates for example-based machine translation. In Proceedings of the MT-Summit'99, Singapore, 1999.

[16] Ralf D. Brown. Adding linguistic knowledge to a lexical example-based translation system. In TMI-99, 1999.

[17] Ralf D. Brown. Generalized example-based machine translation. URL: http//www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/ebmt.html, April 1999.

[18] Robert Frederking and Sergei Nirenburg. Three heads are better than one. In Proceedings of ANLP-94, Stuttgart, Germany, 1994.

[19] Oliver Streiter, Leonid L. Iomdin, Munpyo Hong, and Ute Hauck. Learning, forgetting and remembering: Statistical support for rule-based MT. In TMI'99, 1999. http://proling.iitp.ru/bibitems/.

[20] Oliver Streiter. Reliability in example-based parsing. In TAG+5, International Workshop on Tree Adjoining Grammars and Related Formalisms, Paris, France, 2000. http://rockey.iis.sinica.edu.tw/oliver/.