Anchor Chunk Alignment - Qualitative analysis

5.5 Qualitative analysis

5.5.2 Anchor Chunk Alignment

To assess the quality of the aligned anchor chunks, we extracted all aligned chunks that are compatible with the manual word-to-word reference alignments.

Four different types of aligned chunks were distinguished:

• One-to-one chunks

| of asylum seekers | - | van asielzoekers |

| can be set up | - | kunnen worden aangemaakt |

• One-to-many chunks

| Mr President | - | Mijnheer | de Voorzitter |

| on state aids | - | op steunmaatregelen | van overheidswege |

| | 1:1 | 1:n | n:1 | n:m |
|---|---|---|---|---|
| Proceedings EP | .68 | .11 | .04 | .17 |
| Press releases | .80 | .10 | .04 | .06 |
| User manuals | .75 | .12 | .05 | .08 |
| Total | .76 | .11 | .04 | .09 |

Table 5.5: Percentage of one-to-one, one-to-many, many-to-one and many-to-many chunk alignments in the manual reference alignment per text type.

• Many-to-one chunks

| every individual | in the audience | - | elke toeschouwer |

| the heads | of government | - | de regeringshoofden |

• Many-to-many chunks

| could be | quite drastic | - | kan | ... | immers | escaleren |

Table 5.5 contains the percentage of one-to-one, one-to-many, many-to-one and many-to-many aligned chunks. As expected, the text type that contained the freest translations – the Europarl data set – contains the highest proportion of many-to-many chunk alignments, whereas the text types that contain the most literal translations – the press releases and user manuals – contain the highest proportion of one-to-one aligned chunks.
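The four categories can be illustrated with a small Python sketch. It assumes that an aligned unit is represented as a pair of chunk lists (source chunks, target chunks); the function names `alignment_type` and `type_distribution` are hypothetical and are not part of the system described here.

```python
from collections import Counter

LABELS = ("1:1", "1:n", "n:1", "n:m")


def alignment_type(n_source_chunks, n_target_chunks):
    """Label an aligned unit by the number of chunks on each side."""
    if n_source_chunks < 1 or n_target_chunks < 1:
        raise ValueError("each side needs at least one chunk")
    if n_source_chunks == 1:
        return "1:1" if n_target_chunks == 1 else "1:n"
    return "n:1" if n_target_chunks == 1 else "n:m"


def type_distribution(aligned_units):
    """Return the share of each alignment type in a list of aligned units."""
    counts = Counter(alignment_type(len(src), len(tgt)) for src, tgt in aligned_units)
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}


# Examples taken from the lists in this section (chunk boundaries as shown there).
examples = [
    (["of asylum seekers"], ["van asielzoekers"]),
    (["Mr President"], ["Mijnheer", "de Voorzitter"]),
    (["every individual", "in the audience"], ["elke toeschouwer"]),
    (["could be", "quite drastic"], ["kan", "...", "immers", "escaleren"]),
]
distribution = type_distribution(examples)
```

Applied to all units of the manual reference alignment of one text type, such a count would give one row of Table 5.5.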

To assess the quality of the aligned anchor chunks, we calculated precision and recall at the level of aligned chunks. Overall, 55% of the chunks could be linked with a precision of 80%. A more detailed overview is given in table 5.6. More than 95% of the correctly aligned chunks were one-to-one chunks. As explained in chapter 4, F-measure at chunk level is very strict and does not account for partially correct chunk alignments. This explains why the scores calculated at chunk level are much lower than the scores calculated at word level.
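Why chunk-level scoring is strict can be seen in the following Python sketch. It assumes that a chunk alignment only counts as correct if it matches a reference chunk alignment exactly; the precise metric is defined in chapter 4, so the representation of a link as a pair of token-index tuples and the function names are illustrative assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def chunk_scores(system_links, reference_links):
    """Exact-match precision, recall and F-measure over sets of chunk links.

    Each link is a pair (source_token_indices, target_token_indices), both
    given as tuples, so a partially overlapping chunk earns no credit.
    """
    correct = len(system_links & reference_links)
    precision = correct / len(system_links) if system_links else 0.0
    recall = correct / len(reference_links) if reference_links else 0.0
    return precision, recall, f_measure(precision, recall)


reference = {((0, 1), (0, 1)), ((2, 3, 4), (2, 3))}
# The second system chunk misses one source token and is therefore counted as wrong.
system = {((0, 1), (0, 1)), ((2, 3), (2, 3))}
precision, recall, f1 = chunk_scores(system, reference)
```

In this toy case the second system link covers most of the reference chunk but is still scored as an error, which is the effect described above.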

5.6 Summary

In this chapter, we have described the global architecture of our sub-sentential alignment system. We conceive our sub-sentential aligner as a cascaded model with two phases.

| | F-measure | Precision | Recall |
|---|---|---|---|
| Proceedings EP | .57 | .67 | .49 |
| Press releases | .74 | .85 | .65 |
| User manuals | .61 | .85 | .48 |
| Total | .65 | .80 | .55 |

Table 5.6: F-measure, precision and recall calculated at chunk level per text type. The system uses the probabilistic dictionary trained on the 9.3M word corpus as lexical resource.

The objective of the first phase was to link anchor chunks, i.e. chunks that can be linked with a very high precision. In our baseline system, on average 45-60% of the words can be linked with a precision ranging from 90% to 98%.

If we calculate the stricter F-measure at chunk level, 55% of the chunks could be linked with a precision of 80%. The obtained precision scores seem high enough to use the aligned chunks as anchors in the second phase of the alignment process.

We experimented with two different types of bilingual dictionaries to generate the lexical correspondences: a handcrafted bilingual dictionary and probabilistic bilingual dictionaries. We demonstrated that although the handcrafted dictionary is twice the size of the probabilistic dictionary, the obtained recall scores are lower. No difference in precision is observed for the retrieved anchor chunks.

We demonstrated that lemmatizing the training corpus prior to dictionary extraction can increase recall for small training corpora. As expected, increasing the size of the training corpora has a positive impact on the overall recall scores.

In the next chapter, we focus on the alignment of more complex translational correspondences and present a chunk-driven bootstrapping approach. In chapter 6, we no longer rely on bilingual dictionaries to retrieve the lexical correspondences, but start from the intersected IBM Model 4 word alignments.

CHAPTER 6 A chunk-driven bootstrapping approach

6.1 Introduction

In the previous chapter, we presented the global architecture of our sub-sentential alignment system. We explained that our alignment system is conceived as a cascaded model consisting of two phases. We described in detail the first phase of the alignment process and introduced the notion of anchor chunks – chunks that can be linked with a very high precision on the basis of lexical correspondences and syntactic similarity. In this chapter, the focus is on the second phase of the process, and we will explain how the anchor chunks of the first phase are used to retrieve more complex translational correspondences.

In the previous chapter, two types of bilingual dictionaries were used as lexical resource: a manually created bilingual dictionary and probabilistic bilingual dictionaries derived from the IBM Model One word alignments. In this chapter, the system does not start from a bilingual dictionary but builds further on the output of GIZA++ (Och and Ney 2003), a state-of-the-art tool for statistical word alignment, which implements the IBM models 1–5 and is able to generate better word alignments than IBM Model One¹.

¹ Part of the work described here is published in Macken and Daelemans (2010).

In the context of statistical machine translation, GIZA++ is one of the most widely used word alignment toolkits. GIZA++ implements the IBM models 1–5² (Brown et al. 1993) and is used in Moses (Koehn et al. 2007) – an open-source statistical machine translation system – to generate the initial source-to-target and target-to-source word alignments, after which a symmetrization heuristic combines the alignments of both translation directions. Intersecting the two alignments results in an overall alignment with a higher precision, while taking the union of the alignments results in an overall alignment with a higher recall. The default symmetrization heuristic applied in Moses (grow-diag-final) starts from the intersection points and gradually adds alignment points of the union to link unaligned words that neighbour established alignment points.

The main problem with the union and the grow-diag-final heuristics is that the gain in recall causes a substantial loss in precision, which poses a problem for applications intended for human users.
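The trade-off between the symmetrization options can be sketched in Python. The sketch assumes that each directional alignment is a set of (source index, target index) pairs. The `grow` function is a deliberately simplified illustration of the growing idea and not the exact grow-diag-final procedure implemented in Moses; all names are hypothetical.

```python
# Horizontal, vertical and diagonal neighbours of an alignment point.
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def symmetrize(src2tgt, tgt2src):
    """Return (intersection, union) of two directional word alignments."""
    return src2tgt & tgt2src, src2tgt | tgt2src


def grow(intersection, union):
    """Start from the intersection and add neighbouring union points.

    A union point is added only if it neighbours an accepted point and at
    least one of its two words is still unaligned (simplified heuristic).
    """
    aligned = set(intersection)
    added = True
    while added:
        added = False
        for i, j in sorted(aligned):  # sorted() iterates over a snapshot
            for di, dj in NEIGHBOURS:
                cand = (i + di, j + dj)
                if cand not in union or cand in aligned:
                    continue
                src_free = all(a != cand[0] for a, _ in aligned)
                tgt_free = all(b != cand[1] for _, b in aligned)
                if src_free or tgt_free:
                    aligned.add(cand)
                    added = True
    return aligned


src2tgt = {(0, 0), (1, 1), (2, 1), (4, 0)}
tgt2src = {(0, 0), (1, 1), (2, 2)}
intersection, union = symmetrize(src2tgt, tgt2src)
grown = grow(intersection, union)
```

In this toy example the grown alignment lies between the intersection and the union: points adjacent to established links are added, while the isolated point (4, 0) stays out.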

As can be seen in table 6.1, the intersected GIZA++ alignment points are very precise and have a much higher recall³ than the alignments resulting from the dictionary lookup in the dictionary derived from the IBM Model One word alignments.

Therefore, in this chapter, we will build our anchor chunks on the basis of the intersected GIZA++ word alignments. The anchor chunks and the intersected word alignments are then used to bootstrap the extraction of more complex translational correspondences (e.g. deletion of a determiner in a noun phrase, change from premodification to postmodification in a noun phrase).

| | WF1 | WPrec | WRec |
|---|---|---|---|
| Model One Dict. | 44.7 | 85.2 | 30.3 |
| Intersected GIZA++ | 74.1 | 97.2 | 59.9 |

Table 6.1: Weighted F-measure, precision and recall on all word-to-word links of a system using a Model One dictionary versus the intersected GIZA++ alignments, calculated on all test files presented in table 5.1.

² IBM Model One is a purely lexical model: it only takes into account the word frequencies of the source and target sentences. The higher-numbered IBM Models are more complex: they take word order into account (distortion) and model the probability that one source word aligns to more than one target word (fertility).

³ In contrast with the anchor chunk system described in chapter 5, GIZA++ also aligns function words.

Two other changes have been made to the system described in chapter 5: prepositions are considered as separate chunks and a stricter policy was applied for the identification of anchor chunks:

1. Prepositional phrases are used for a wide range of syntactic and semantic functions, most commonly modification and complementation. In the case of a complement, the preposition and the word it complements often function as one unit. For example, the verb "to dispose" and the preposition "of" form one unit with the meaning "to discard". As our system offers the flexibility to merge smaller chunks, but cannot split chunks, we consider each preposition as a separate chunk that can be grouped either with the verb phrase, noun phrase or adjectival phrase it complements or with the noun phrase it introduces.

2. In the system described in chapter 5, a similarity test was performed on all candidate anchor chunks (see section 5.3.2). In the case of contiguous chunks 80% of the words had to be linked, while in the case of non-contiguous chunks all words had to be linked. In this chapter we apply a stricter policy: only the chunks where all words are linked are considered to be anchor chunks. The reason for this change is that we want to extract the translation-specific patterns (e.g. insertion of a determiner in a noun phrase as in education ∼ het onderwijs [En: the education]) automatically in the bootstrapping process.
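The difference between the two policies can be illustrated with a minimal Python sketch. It assumes that the test is applied to the token positions of one candidate chunk and that "80%" means at least 80% of the words; the full similarity test is specified in section 5.3.2, so the function name `is_anchor_chunk` and its parameters are hypothetical.

```python
def is_anchor_chunk(chunk_positions, linked_positions, contiguous=True, strict=False):
    """Check whether enough words of a candidate chunk are lexically linked.

    chunk_positions:  token indices that make up the candidate chunk
    linked_positions: set of token indices that have a lexical link
    contiguous:       False for non-contiguous chunks (all words must be linked)
    strict:           True for the policy of this chapter (all words must be linked)
    """
    if not chunk_positions:
        return False
    linked = sum(1 for pos in chunk_positions if pos in linked_positions)
    required_percent = 100 if (strict or not contiguous) else 80
    # Integer comparison avoids floating-point rounding at the threshold.
    return linked * 100 >= required_percent * len(chunk_positions)


# A five-word contiguous chunk in which four words are linked.
chunk = [0, 1, 2, 3, 4]
links = {0, 1, 2, 4}
chapter5_policy = is_anchor_chunk(chunk, links)
chapter6_policy = is_anchor_chunk(chunk, links, strict=True)

# "het onderwijs" with only "onderwijs" linked: one of two words.
het_onderwijs = is_anchor_chunk([0, 1], {1}, strict=True)
```

Under the stricter policy, a pair such as education ∼ het onderwijs is no longer accepted as an anchor, which leaves the inserted determiner to be handled in the bootstrapping process.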

In order to assess the impact of those two changes, we compared the results after anchor chunk alignment on the basis of the intersected GIZA++ alignments in two different settings. In the first version, a similarity test threshold of 80% was used in the case of contiguous chunks and prepositions were not considered as separate phrases. In the second version, the similarity test threshold was set to 100% and prepositions were considered as separate phrases. The results are presented in table 6.2. As expected, the results of the second version are a bit more precise, but have a lower recall.

| | WF1 | WPrec | WRec |
|---|---|---|---|
| Similarity threshold 80% | 75.2 | 96.8 | 61.5 |
| Similarity threshold 100% + prepositions as separate phrases | 74.7 | 97.2 | 60.7 |

Table 6.2: Weighted F-measure, precision and recall on all word-to-word links after anchor chunk alignment on the basis of the intersected GIZA++ alignments calculated on all test files presented in table 5.1.
