2.2 Statistical Machine Translation
2.2.3 Moses Toolkit
The PBSMT tool we use in this work is the Moses Toolkit (Koehn et al., 2007). This tool takes a set of parallel sentences and a language model as input and trains an SMT system. It computes the models explained in Section 2.2.2 and produces a file, moses.ini, that contains the features as shown in Listing 2.1.
The first part of the moses.ini file contains the paths to the components explained in Section 2.2.2 (translation model, lexical reordering and language model) and other features: the word and phrase penalties (so that translations are neither too long nor too short), the unknown word penalty, and distortion (Brown et al., 1993).

    UnknownWordPenalty
    WordPenalty
    PhrasePenalty
    PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/path/to/phrase-table.gz input-factor=0 output-factor=0
    LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/path/to/reordering-table.wbe-msd-bidirectional-fe.gz
    Distortion
    KENLM lazyken=0 name=LM0 factor=0 path=/path/to/LM order=8

    LexicalReordering0= 0.0899023 0.0589253 0.0456796 0.0879397 0.000122106 0.135896
    Distortion0= 0.0294993
    LM0= 6.15097e-05
    WordPenalty0= -0.0164713
    PhrasePenalty0= -0.29107
    TranslationModel0= 0.000119481 0.0207173 0.222799 -0.000797186
    UnknownWordPenalty0= 1

Listing 2.1: Excerpt of the moses.ini file using the default configuration of Moses

    wenn wir ||| when we ||| 0.2006 0.1772 0.110 0.1551 ||| 0-0 1-1 ||| 648 1177 130 ||| |||
    wenn wir ||| when ||| 0.0006015 0.0007428 0.0051 0.1995 ||| 0-0 ||| 9974 1177 6 ||| |||
    wenn wir ||| whenever we ||| 0.1818 0.1851 0.003398 0.003376 ||| 0-0 1-1 ||| 22 1177 4 ||| |||
    wenn wir ||| where we ||| 0.003875 0.01148 0.0008496 0.005927 ||| 0-0 1-1 ||| 258 1177 1 ||| |||
    wenn wir ||| while we ||| 0.01492 0.009151 0.00085 0.002926 ||| 0-0 1-1 ||| 67 1177 1 ||| |||

Listing 2.2: Excerpt of the phrase table file
The second part of the file contains the weights of the features (the λ_i values in Equation (2.11)). Listing 2.1 shows the values after tuning.
In the first part of the moses.ini file (Listing 2.1), the entries for the translation model (PhraseDictionaryMemory), the reordering model (LexicalReordering) and the language model (KENLM) each carry a path option that points to the file where the model is stored. Note that the translation model and reordering model files are created by Moses, whereas the language model is created separately and then provided to Moses at training time.
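To make the two parts of the file concrete, the following is a minimal sketch that reads the lines of Listing 2.1, collects the model paths, and checks that each feature with a num-features option has that many weights. It is not the parser Moses itself uses; the helper names (parse_feature, parse_weight) are hypothetical, and the sketch assumes every line has exactly the shape shown in Listing 2.1.

```python
# Lines copied from Listing 2.1; the paths are the placeholders used there.
FEATURE_LINES = [
    "UnknownWordPenalty",
    "WordPenalty",
    "PhrasePenalty",
    "PhraseDictionaryMemory name=TranslationModel0 num-features=4 "
    "path=/path/to/phrase-table.gz input-factor=0 output-factor=0",
    "LexicalReordering name=LexicalReordering0 num-features=6 "
    "type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 "
    "path=/path/to/reordering-table.wbe-msd-bidirectional-fe.gz",
    "Distortion",
    "KENLM lazyken=0 name=LM0 factor=0 path=/path/to/LM order=8",
]
WEIGHT_LINES = [
    "LexicalReordering0= 0.0899023 0.0589253 0.0456796 0.0879397 0.000122106 0.135896",
    "Distortion0= 0.0294993",
    "LM0= 6.15097e-05",
    "WordPenalty0= -0.0164713",
    "PhrasePenalty0= -0.29107",
    "TranslationModel0= 0.000119481 0.0207173 0.222799 -0.000797186",
    "UnknownWordPenalty0= 1",
]


def parse_feature(line):
    """Hypothetical helper: split a feature line into its type and its key=value options."""
    kind, *options = line.split()
    return kind, dict(option.split("=", 1) for option in options)


def parse_weight(line):
    """Hypothetical helper: split a weight line into the feature name and its list of weights."""
    name, values = line.split("=", 1)
    return name.strip(), [float(value) for value in values.split()]


features = dict(parse_feature(line) for line in FEATURE_LINES)
weights = dict(parse_weight(line) for line in WEIGHT_LINES)

# Only the three model components carry a path option.
model_files = {kind: opts["path"] for kind, opts in features.items() if "path" in opts}

# Each feature that declares num-features should have that many tuned weights.
for kind, opts in features.items():
    if "num-features" in opts:
        assert int(opts["num-features"]) == len(weights[opts["name"]])

for kind, path in model_files.items():
    print(f"{kind}: {path}")
```

In this sketch the translation model contributes four weights and the reordering model six, matching the four translation probabilities and six orientation probabilities stored in their respective files.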
The translation model is stored in a file called the phrase table, of which Listing 2.2 shows an excerpt. This file contains five columns, separated by “|||”:
1. Phrase in the source side.
2. Phrase in the target side: the phrase in the target side language that is paired with the source side phrase.
3. Translation model features: The four probabilities explained in Section 2.2.2 in this order: inverse phrase translation probability, inverse lexical weighting, direct phrase translation probability, and direct lexical weighting. We describe the first row of Listing 2.2 as an example of how they are computed:
• Inverse phrase translation probability (φ(f|e)): this is computed as in Equation (2.13). The counts of occurrences of the phrases are shown in column 5 (“when we” occurs 648 times; “wenn wir” and “when we” occur together 130 times). Therefore the inverse phrase translation probability is 130/648 ≈ 0.2006.
• Inverse lexical weighting (lex(f|e)): this is computed as in Equation (2.14). The individual lexical weights are stored in a file called lex.e2f, created by Moses. In this file we find the lexical weights for the words in the phrases, in the rows “wenn when 0.2658521” and “wir we 0.6666557”. Therefore the inverse lexical weighting is 0.2658521 · 0.6666557 ≈ 0.1772.
• Direct phrase translation probability (φ(e|f)): this is computed as in Equation (2.13). “wenn wir” occurs 1177 times; “wenn wir” and “when we” occur together 130 times. Therefore the direct phrase translation probability is 130/1177 ≈ 0.110.
• Direct lexical weighting (lex(e|f)): this is computed as in Equation (2.14). The individual lexical weights are stored in a file called lex.f2e, which contains the rows “when wenn 0.1995174” and “we wir 0.7773823”. Therefore the direct lexical weighting is 0.1995174 · 0.7773823 ≈ 0.1551.
4. Alignments: how the individual words of the source and target sides are aligned. For example, for the pair ⟨“wenn wir”, “when we”⟩ in the first row, “0-0” indicates that the 0-th source-side word (“wenn”) is aligned to the 0-th target-side word (“when”).
    wenn wir ||| when ||| 0.200000 0.066667 0.733333 0.333333 0.066667 0.600000
    wenn wir ||| whenever we ||| 0.272727 0.090909 0.636364 0.090909 0.090909 0.818182
    wenn wir ||| where we ||| 0.200000 0.200000 0.600000 0.200000 0.200000 0.600000
    wenn wir ||| while we ||| 0.200000 0.200000 0.600000 0.200000 0.200000 0.600000
Listing 2.3: Reordering table
5. Counts: the target phrase count, the source phrase count, and the count of source and target phrase occurring together. For example, in the first row, ⟨“wenn wir”, “when we”⟩, there are 1177 occurrences of “wenn wir” and 648 occurrences of “when we”, and the two phrases occur together 130 times.
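The arithmetic for the first row of Listing 2.2 can be retraced with a short script. This is only a sketch under stated assumptions: parse_phrase_table_row is a hypothetical helper, the two dictionaries hold just the four lex.e2f and lex.f2e rows quoted above, and the lexical weighting is taken as a plain product over alignment links. That shortcut suffices here because every word has exactly one link; it does not handle words with several links or none, for which Equation (2.14) should be consulted.

```python
# First row of Listing 2.2.
ROW = "wenn wir ||| when we ||| 0.2006 0.1772 0.110 0.1551 ||| 0-0 1-1 ||| 648 1177 130 ||| |||"

# Word-level entries quoted in the text (lex.e2f and lex.f2e respectively).
LEX_E2F = {("wenn", "when"): 0.2658521, ("wir", "we"): 0.6666557}
LEX_F2E = {("when", "wenn"): 0.1995174, ("we", "wir"): 0.7773823}


def parse_phrase_table_row(row):
    """Hypothetical helper: split a phrase-table row into its five populated columns."""
    fields = [field.strip() for field in row.split("|||")]
    source, target, scores, alignment, counts = fields[:5]
    return {
        "source": source.split(),
        "target": target.split(),
        "scores": [float(score) for score in scores.split()],
        "alignment": [tuple(int(i) for i in link.split("-")) for link in alignment.split()],
        "counts": [int(count) for count in counts.split()],
    }


entry = parse_phrase_table_row(ROW)
target_count, source_count, joint_count = entry["counts"]

# Phrase translation probabilities from the counts in column 5.
inverse_phrase_prob = joint_count / target_count  # phi(f|e) = 130 / 648
direct_phrase_prob = joint_count / source_count   # phi(e|f) = 130 / 1177

# Lexical weightings as a product over alignment links
# (assumption: exactly one link per word, as in this row).
inverse_lex = 1.0
direct_lex = 1.0
for source_index, target_index in entry["alignment"]:
    source_word = entry["source"][source_index]
    target_word = entry["target"][target_index]
    inverse_lex *= LEX_E2F[(source_word, target_word)]
    direct_lex *= LEX_F2E[(target_word, source_word)]

computed = [inverse_phrase_prob, inverse_lex, direct_phrase_prob, direct_lex]
for stored, value in zip(entry["scores"], computed):
    print(f"stored {stored}  recomputed {value:.4f}")
```

Up to rounding, the recomputed values agree with the four scores stored in column 3 of the row.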
The reordering model is also stored in a separate file, of which Listing 2.3 shows an excerpt. This file contains three columns (the separator is again “|||”):
1. Phrase in the source side.
2. Phrase in the target side. The phrase in the target side language that is paired with the source side phrase.
3. Orientation probabilities: six probabilities in two sets of three, giving the probability of each orientation (monotone, swap and discontinuous) in both directions (left-to-right and right-to-left). The probabilities in each set sum to 1.
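As a quick check of this layout, the sketch below splits the first row of Listing 2.3 into its two sets and verifies that each sums to 1. The helper parse_reordering_row is hypothetical, and the order of the orientations within a set (monotone, swap, discontinuous) is assumed from the order in which they are listed above. The sets are labelled only as first and second, since which direction each corresponds to is not stated here.

```python
# First row of Listing 2.3.
ROW = "wenn wir ||| when ||| 0.200000 0.066667 0.733333 0.333333 0.066667 0.600000"

# Assumed order of the orientations within each set of three probabilities.
ORIENTATIONS = ("monotone", "swap", "discontinuous")


def parse_reordering_row(row):
    """Hypothetical helper: split a reordering-table row into phrases and two orientation sets."""
    source, target, scores = [field.strip() for field in row.split("|||")]
    probabilities = [float(score) for score in scores.split()]
    first_set = dict(zip(ORIENTATIONS, probabilities[:3]))
    second_set = dict(zip(ORIENTATIONS, probabilities[3:]))
    return source, target, first_set, second_set


source, target, first_set, second_set = parse_reordering_row(ROW)

# Each set is a probability distribution over the three orientations.
for label, orientation_set in (("first", first_set), ("second", second_set)):
    total = sum(orientation_set.values())
    assert abs(total - 1.0) < 1e-5
    print(f"{source} -> {target}, {label} set: {orientation_set}")
```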