4.1.3 Extensions to the Basic Model

Collins (2003) describes a number of additions to the model, which add back some of the information lost through the independence assumptions made in Section 4.1.1. The first is a distance measure, which aims to predispose the model towards right-branching structure and to favour modification by the most recent verb. Equations 4.9 and 4.10 are altered so that each modifier is also conditioned on a distance function, distance_l for left modifiers and distance_r for right modifiers, which looks at the surface string of the previous modifiers. The two pieces of information carried by distance_l and distance_r are whether or not the string is empty, and whether or not it contains a verb. These allow the model to learn the biases mentioned above. This is Collins’ Model 1.
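The two distance features can be sketched in Python as follows. This is a minimal illustration rather than Collins’ implementation: the function name `distance_features`, the representation of the surface string as a list of (word, tag) pairs, and the test for verbs by a `VB` tag prefix are all assumptions made for the example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); this representation is an assumption


def distance_features(intervening: List[Token]) -> Tuple[bool, bool]:
    """Return (is_empty, contains_verb) for the surface string
    covered by the previously generated modifiers."""
    is_empty = len(intervening) == 0
    # Assumption: any Penn Treebank tag starting with "VB" counts as a verb.
    contains_verb = any(tag.startswith("VB") for _, tag in intervening)
    return is_empty, contains_verb
```

For the first modifier generated on either side of the head, the intervening string is empty, so the features are `(True, False)`.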

Collins’ Model 2 makes use of subcategorisation frames, which help the model differentiate between adjuncts and complements. Collins identifies which is which using various heuristics based on the constituent labels and the Treebank II semantic tags: DIR, LOC, TMP, etc. This information is included in the model as an extra step between generating the head and the modifiers. Left and right subcategorisation frames, LC and RC, specify the complements required and are generated with the following probabilities:

P_lc(LC | Parent, h, H) (4.16)

P_rc(RC | Parent, h, H) (4.17)

These values are then included in the conditioning context when the left and right modifiers are generated. This is the most widely used of Collins’ models and the one that we experiment with.
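One way a subcategorisation frame could constrain modifier generation is sketched below. It assumes the frame is treated as a multiset of required complements that shrinks as complements are generated, and that generation cannot stop while requirements remain. The names `frame_allows` and `consume`, the `STOP` symbol, and the `NP-C` complement label are hypothetical choices for this illustration.

```python
from collections import Counter

STOP = "STOP"  # hypothetical symbol marking the end of modifier generation


def frame_allows(frame: Counter, modifier: str, is_complement: bool) -> bool:
    """Check whether a modifier may be generated given the remaining frame."""
    if modifier == STOP:
        # Assumption: stopping is only possible once every complement is filled.
        return sum(frame.values()) == 0
    if is_complement:
        # A complement must still be required by the frame.
        return frame[modifier] > 0
    # Adjuncts are not restricted by the frame.
    return True


def consume(frame: Counter, modifier: str, is_complement: bool) -> Counter:
    """Return the frame that remains after generating the modifier."""
    remaining = Counter(frame)
    if is_complement and modifier != STOP:
        remaining[modifier] -= 1
    return +remaining  # unary plus drops entries whose count is zero
```

With a frame requiring a single `NP-C`, stopping is blocked until that complement has been generated, after which only adjuncts or `STOP` are permitted.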

Collins’ Model 3 adds support for traces and Wh-movement, using a mechanism similar to that used in Generalized Phrase Structure Grammar (GPSG) (Gazdar et al., 1985). As with the subcategorisation frames, this model introduces a new parameter which is generated (conditioned on the head) and then used in the subsequent generation of the modifiers. This gap parameter serves to propagate the trace through the tree. When a gap has been probabilistically determined, it is added to the parameter. A trace will then be generated further down in the tree.

56 Chapter 4: Parsing with Collins’ Models

None of these extensions are particularly relevant for parsing NPs. The distance features, for example, are made entirely ineffectual by treating the previous modifier as the head. This means that they will always have the same values: the surface string is always empty and never contains a verb. Because of this, we have not described the additional terms in great detail. However, it is important to note the measures that must be taken in order to add new information sources to the model.

Coordination

Collins’ models have difficulties with coordination, as the head-finding rules and probability estimates do not handle multiple heads. The independence assumptions that are necessary to allow lexicalisation also remove the information that is required to model coordinate structures well. As a result, constituents with unbalanced conjunctions, such as NP CC and NP CC NP NP, are given too much probability mass.

Collins (1999) introduces a solution to this problem, generating the conjunct and the following constituent together. For each constituent, a binary flag is generated. If the flag is true, then an additional step is taken which creates the conjunct node. An extra term is added to the product of rule probabilities, alongside those in Equation 4.11. This parameter P_α is conditioned on the words being coordinated, their constituent labels and the resulting constituent label. For example:

P_α(CC, and | Bill, Ted, NP, NP, NP) (4.20)
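How the extra term enters the rule probability can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `rule_probability` and its arguments are hypothetical, the probabilities would in practice come from the trained model, and the head and flag probabilities are left out for brevity.

```python
from math import prod
from typing import Sequence


def rule_probability(
    modifier_probs: Sequence[float],
    coord_flags: Sequence[bool],
    coord_probs: Sequence[float],
) -> float:
    """Multiply the modifier probabilities, adding a P_alpha term for
    every constituent whose coordination flag is true."""
    total = prod(modifier_probs)
    for flag, p_alpha in zip(coord_flags, coord_probs):
        if flag:
            # Extra term for the conjunct generated with this constituent.
            total *= p_alpha
    return total
```

A constituent whose flag is false contributes only its ordinary modifier probability, so rules without coordination are scored exactly as before.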

Punctuation

One deficiency of the original Collins (1997) model is that it did not generate punctuation at all, ignoring the information that it can provide. Collins (1999) takes some measures towards including punctuation, generating commas and colons, although not in the same way as other constituents are. Other punctuation marks are still ignored, as are any punctuation marks that begin or end a sentence. The remaining commas and colons are raised as high in the tree as possible, which means they always occur between constituents. Figure 4.2 shows an example of the process taken from Collins (2003, page 604). Once this transformation is performed, punctuation is generated in the same way as coordination. A flag is generated with every constituent, and when it is true, an additional term is included in the calculation of the rule probability. The punctuation flag is conditioned on the same variables as the coordination flag. An example is shown below:

P_p(comma, comma | Vinken, old, NP, NPB, ADJP) (4.21)
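The filtering part of this preprocessing (not the raising of commas and colons within the tree) can be sketched on a flat token sequence. The function name `filter_punctuation`, the (word, tag) representation and the particular set of punctuation tags are assumptions for this example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); this representation is an assumption

KEPT_TAGS = {",", ":"}  # commas and colons are the only punctuation generated
# Assumption: an illustrative set of Penn Treebank punctuation tags.
PUNCT_TAGS = {",", ":", ".", "``", "''", "-LRB-", "-RRB-"}


def filter_punctuation(tokens: List[Token]) -> List[Token]:
    """Drop punctuation other than commas and colons, then drop any
    commas or colons left at the start or end of the sentence."""
    kept = [(w, t) for w, t in tokens if t not in PUNCT_TAGS or t in KEPT_TAGS]
    while kept and kept[0][1] in KEPT_TAGS:
        kept.pop(0)
    while kept and kept[-1][1] in KEPT_TAGS:
        kept.pop()
    return kept
```

Only commas and colons that fall strictly inside the sentence survive this step.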

Figure 4.2: (a) Before punctuation preprocessing (b) After punctuation transformation

The standard evaluation used by Collins, and by most other researchers since, actually ignores punctuation entirely. This may be because speech marks in particular were given the lowest priority in the Treebank and are thus inconsistent. The bracketing guidelines (Bies et al., 1995, page 54) say:

. . . [speech marks] just get yanked around by whatever is inside them.

They are at the very bottom of the pecking order.

Thus, punctuation is added to Collins’ model not so that the punctuation itself is recovered better, but to increase the model’s ability to recover other structures using the information that punctuation provides.

Bikel’s treatment of coordination and punctuation

The two previous sections have described the treatment of coordination and punctuation in Collins’ models. Bikel (2004, §3.5.4) points out that this approach causes the model to be inconsistent, as the number of intervening conjunctive items is not taken into account. This means that there are an infinite number of sentence/tree pairs (with different amounts of coordination and punctuation) all of which must be assigned some probability mass. This causes the sum of the probability of all trees to diverge.

To solve this problem, Bikel implements a different solution that generates conjunctions and punctuation in the same manner as other constituents. A mapping function is used:

δ(M_i) =

where M_i is a left or right modifier. This function is then included in the conditioning context, rather than as a separate parameter class as in Collins’ models. That is, Equations 4.9 and 4.10 are altered as shown below:

∏_{i=1...n+1} P_m(M_i(m_i) | Parent, H, h, δ(M_{i−1}), side) (4.23)

where side indicates whether the modifier is to the left or right of the head. This approach properly estimates the joint probability for coordination and punctuation, without causing the model to be inconsistent.
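A mapping of this kind might look like the sketch below. The four classes, the symbols `+START+`, `+PUNC+` and `+OTHER+`, and the treatment of commas and colons as the punctuation class are assumptions for this example, as are the function names `delta` and `conditioning_context`.

```python
from typing import Optional, Tuple


def delta(prev_modifier: Optional[str]) -> str:
    """Map the previously generated modifier label to a coarse class."""
    if prev_modifier is None:
        return "+START+"  # no modifier has been generated yet
    if prev_modifier == "CC":
        return "CC"  # previous modifier was a conjunction
    if prev_modifier in {",", ":"}:
        return "+PUNC+"  # previous modifier was a comma or colon
    return "+OTHER+"  # any other constituent


def conditioning_context(
    parent: str, head_label: str, head_word: str,
    prev_modifier: Optional[str], side: str,
) -> Tuple[str, str, str, str, str]:
    """Build the tuple on which the next modifier is conditioned."""
    return (parent, head_label, head_word, delta(prev_modifier), side)
```

Because the class of the previous modifier sits in the conditioning context, a conjunction or punctuation mark is generated by the same parameter class as any other modifier, and what follows it can depend on its presence.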

In document Statistical parsing of noun phrase structure (Page 71-74)