Adaptation - Preference Learning for Machine Translation

In this section, we describe how the graphical user interface can be exploited to adapt the translation model, describe the language model adaptation, and finally show how to use Dtrain for online learning in this setting.

5.3.1 Translation Model Adaptation from User Edits

Allowing the user to directly edit the translation derivations of the SMT system enables the extraction of very fine-grained corrections. This can be implemented straightforwardly by computing the difference between the initial derivation and the final user-generated derivation. We always receive a full phrase-based alignment, since it is enforced by the user interface.

To this end, after post-editing, the current phrase alignment (unaligned phrases are prohibited) is mapped to a set of hierarchical translation rules, which are then compared to the rules used in the original derivation; every rule that has changed is saved. The result of this procedure is a set of unweighted translation rules, which we refer to as Edit.
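The Edit extraction amounts to a set difference between the rules of the post-edited derivation and the rules of the original one. The following is a minimal sketch under simplifying assumptions: rules are represented by a hypothetical `Rule` tuple (left-hand side, source side, target side), and each aligned phrase pair of the post-edit is mapped to a single terminal-only rule, whereas the actual mapping to hierarchical rules may yield richer rules. The names `Rule`, `phrase_alignment_to_rules` and `extract_edit_rules` are illustrative.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical synchronous rule: left-hand side, source side, target side."""
    lhs: str
    src: tuple[str, ...]
    tgt: tuple[str, ...]


def phrase_alignment_to_rules(alignment):
    """Map each aligned phrase pair (source words, target words) to a terminal-only rule."""
    return {Rule("X", tuple(s), tuple(t)) for s, t in alignment}


def extract_edit_rules(original_rules, post_edit_alignment):
    """Return the (unweighted) rules of the post-edited derivation that the original derivation did not use."""
    edited = phrase_alignment_to_rules(post_edit_alignment)
    return edited - set(original_rules)
```

For instance, if the original derivation used `das Haus -> the house` and `ist klein -> is small`, and the user changes `small` to `tiny`, only `ist klein -> is tiny` ends up in Edit.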

In addition to the translation rules extracted from the edits, the source is checked, prior to translation, for any tokens unknown to the MT system. We ask the user to provide suitable translations for all detected non-translatables. This check is done by comparing the input to the rules available in the corresponding per-sentence grammar. This results in a set of rules which we refer to as Unk.
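A minimal sketch of the non-translatable check, assuming rules are plain `(lhs, source, target)` tuples and that a token counts as known if it appears on the source side of any rule in the per-sentence grammar. The function names and the `{source token: translation}` format of the user's answers are hypothetical.

```python
def find_unknown_tokens(source_tokens, grammar_rules):
    """Return source tokens (deduplicated, in order) that no rule of the per-sentence grammar covers."""
    covered = {tok for _, src, _ in grammar_rules for tok in src}
    return list(dict.fromkeys(tok for tok in source_tokens if tok not in covered))


def make_unk_rules(user_translations):
    """Turn user-provided translations {source token: target string} into terminal rules (the Unk set)."""
    return {("X", (src,), tuple(tgt.split())) for src, tgt in user_translations.items()}
```

The tokens returned by `find_unknown_tokens` are what the client would present to the user; the answers are then turned into Unk rules.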

To maximize the impact of the user edits, we further exploit the user-generated phrase alignment to produce another set of rules, Auto. We adapt the rule extraction algorithm of Chiang [2007] for this purpose: from the phrase alignment, we extract rules in the same way as in the original, word-based approach. Rules with at most two non-terminals and a maximum source length of eight words are extracted. We further forbid adjacent non-terminals on the source side, and the maximum seed size is three phrases. This results in a large set of additional rules. (Note that this process could be carried out during the original phrase-extraction step.)
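The sketch below shows one way to carry Chiang-style extraction over to a phrase alignment: contiguous runs of up to three source phrases whose target sides form one contiguous block serve as seeds (initial phrase pairs), and smaller such runs inside a seed are replaced by indexed non-terminals. It assumes the phrase alignment is a list of half-open (source span, target span) pairs in source order, and it interprets the eight-word limit as applying to the rule's source side including non-terminals. Both assumptions and all names are illustrative, not the thesis implementation.

```python
from itertools import combinations

MAX_SEED_PHRASES = 3  # maximum number of phrases in a seed
MAX_SRC_LEN = 8       # maximum source-side length (assumed: terminals + non-terminals)
MAX_NTS = 2           # maximum number of non-terminals per rule


def target_span(phrases, i, j):
    """Target span covered by phrases i..j-1, or None if it is not contiguous."""
    spans = [phrases[k][1] for k in range(i, j)]
    lo, hi = min(s for s, _ in spans), max(e for _, e in spans)
    # Target spans are disjoint, so they are contiguous iff their lengths add up.
    return (lo, hi) if sum(e - s for s, e in spans) == hi - lo else None


def build_rule(src, tgt, phrases, i, j, holes):
    """Build a rule from seed i..j-1, replacing each hole (a run of phrases) by [X,n] on both sides."""
    starts = {a: (n, b) for n, (a, b) in enumerate(holes, 1)}
    src_side, k = [], i
    while k < j:
        if k in starts:
            n, b = starts[k]
            src_side.append(f"[X,{n}]")
            k = b
        else:
            s, e = phrases[k][0]
            src_side.extend(src[s:e])
            k += 1
    tgt_starts = {}
    for n, (a, b) in enumerate(holes, 1):
        lo, hi = target_span(phrases, a, b)
        tgt_starts[lo] = (n, hi)
    tlo, thi = target_span(phrases, i, j)
    tgt_side, t = [], tlo
    while t < thi:
        if t in tgt_starts:
            n, hi = tgt_starts[t]
            tgt_side.append(f"[X,{n}]")
            t = hi
        else:
            tgt_side.append(tgt[t])
            t += 1
    return ("X", tuple(src_side), tuple(tgt_side))


def extract_rules(src, tgt, phrases):
    """Hiero-style extraction with phrases, rather than words, as alignment units."""
    rules = set()
    for i in range(len(phrases)):
        for j in range(i + 1, min(i + MAX_SEED_PHRASES, len(phrases)) + 1):
            if target_span(phrases, i, j) is None:
                continue  # not a consistent seed
            # Candidate holes: proper sub-runs of the seed that are themselves consistent.
            candidates = [(a, b) for a in range(i, j) for b in range(a + 1, j + 1)
                          if (a, b) != (i, j) and target_span(phrases, a, b) is not None]
            for n in range(MAX_NTS + 1):
                for holes in combinations(candidates, n):
                    # Holes are sorted by start; reject overlapping or source-adjacent holes.
                    if any(b1 >= a2 for (_, b1), (a2, _) in zip(holes, holes[1:])):
                        continue
                    rule = build_rule(src, tgt, phrases, i, j, list(holes))
                    if len(rule[1]) <= MAX_SRC_LEN:
                        rules.add(rule)
    return rules
```

For `er sieht das Haus` / `he sees the house` with the phrases `er|he`, `sieht|sees`, `das Haus|the house`, the result includes `[X,1] sieht [X,2] -> [X,1] sees [X,2]`, while a rule such as `[X,1] [X,2] ...` is never produced.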

During this process, rules may be extracted that are already known, i.e. contained in the per-sentence grammars; these rules are annotated with an additional feature marking them as seen in the adaptation process. Entirely new rules have only a single feature, marking them as new. Rules that fix non-translatables are also annotated with a separate feature. Note that the sets Edit, Unk and Auto are mutually disjoint.

Prior to the translation of a segment, the union of Auto, Unk and Edit is appended to the respective grammar.
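A minimal sketch of the annotation and grammar extension, assuming a per-sentence grammar stored as a dictionary from rules to sparse feature dictionaries. The feature names `AdaptSeen`, `AdaptNew` and `AdaptUnk` are hypothetical, and giving Unk rules only the Unk indicator is our assumption.

```python
def annotate_and_extend(grammar, edit_rules, unk_rules, auto_rules):
    """Append the union of Edit, Unk and Auto to a per-sentence grammar with indicator features.

    grammar maps rule -> {feature name: value}; it is copied, not modified in place.
    Feature names are hypothetical.
    """
    extended = {rule: dict(feats) for rule, feats in grammar.items()}
    for rule in unk_rules:
        # Rules fixing non-translatables get their own indicator.
        extended.setdefault(rule, {})["AdaptUnk"] = 1.0
    for rule in set(edit_rules) | set(auto_rules):
        if rule in extended:
            extended[rule]["AdaptSeen"] = 1.0   # already known: keep its features, flag as seen
        else:
            extended[rule] = {"AdaptNew": 1.0}  # entirely new: a single indicator feature
    return extended
```

Copying the grammar keeps the original per-sentence grammar intact, so the same base grammar can be re-extended when the rule sets change.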

5.3.2 Parameter Adaptation

Similar to Denkowski et al. [2014a], we perform online updates of the SMT system's linear model after receiving each new training example. Instead of MIRA, we use our online pairwise ranking algorithm Dtrain for this purpose. We employ the Sparse feature set with our tuning algorithm, potentially adding millions of sparse features derived from translation rules, to enable fine-grained model updates. Note that this also applies to the extended rule sets described above, since the additional rules also fire the respective features.
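Dtrain's exact pair sampling, loss and learning-rate schedule are not restated here, so the following is only a generic pairwise-ranking perceptron update over sparse features, meant to illustrate how rule-derived sparse features receive fine-grained updates from a single training example. Feature names such as `R:das->the` and the parameters `eta` and `margin` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def pairwise_update(weights, kbest, eta=0.0001, margin=0.0):
    """One online pairwise-ranking (perceptron-style) update from a k-best list.

    kbest: list of (sparse feature dict, per-sentence metric score), e.g. scored
    against the post-edit. weights is updated in place and returned.
    """
    for (f1, m1), (f2, m2) in combinations(kbest, 2):
        if m1 == m2:
            continue  # no preference between the two hypotheses
        better, worse = (f1, f2) if m1 > m2 else (f2, f1)
        diff = defaultdict(float)
        for name, value in better.items():
            diff[name] += value
        for name, value in worse.items():
            diff[name] -= value
        # Update only if the model does not rank the better hypothesis higher by the margin.
        if dot(weights, diff) <= margin:
            for name, value in diff.items():
                weights[name] = weights.get(name, 0.0) + eta * value
    return weights
```

Because only features present in the compared hypotheses change, a single post-edit touches a small number of rule features, which is what makes such updates fine-grained.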

The general process is in principle identical to that of the regular online updates. However, we need to make sure that combining the translation model adaptation with the online updates does not lead to degenerate behavior: since all reference translations (the post-edits) are effectively reachable due to the rule extraction, the weights of rule-based features derived from the reference may be overestimated, especially for rules covering large parts of the reference, since fewer rule applications generally lead to higher model scores. We therefore add only the specifically extracted rules in Edit to the grammar prior to an update, but not the larger set Auto. The rules in Unk are added in any case, since they were used for the initial hypotheses presented for post-editing.

5.3.3 Language Model Adaptation

Language model adaptation is carried out as proposed by Denkowski et al. [2014a], by adding the post-edits to a hierarchical Bayesian trigram language model [Teh, 2006]. We update the language model before performing a parameter update.
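To illustrate the kind of incremental update involved, here is a minimal hierarchical Pitman-Yor trigram language model in the spirit of Teh [2006]. It is a simplified sketch, not the implementation used by Denkowski et al. [2014a]: all context lengths share one discount and one strength parameter, the base distribution is uniform over an assumed vocabulary size, and seating arrangements are only added to, never resampled.

```python
import random
from collections import defaultdict


class HPYLM:
    """Minimal hierarchical Pitman-Yor n-gram LM with incremental (online) updates."""

    def __init__(self, order=3, vocab_size=10000, discount=0.8, strength=1.0, seed=0):
        self.order, self.V = order, vocab_size
        self.d, self.theta = discount, strength
        # context tuple -> word -> list of table sizes (Chinese restaurant seating)
        self.tables = defaultdict(lambda: defaultdict(list))
        self.rng = random.Random(seed)

    def _counts(self, ctx):
        rest = self.tables.get(ctx, {})
        c_u = sum(sum(t) for t in rest.values())  # customers in this restaurant
        t_u = sum(len(t) for t in rest.values())  # tables in this restaurant
        return rest, c_u, t_u

    def prob(self, word, ctx=()):
        """Predictive probability p(word | ctx), backing off to shorter contexts."""
        parent = self.prob(word, ctx[1:]) if ctx else 1.0 / self.V
        rest, c_u, t_u = self._counts(ctx)
        if c_u == 0:
            return parent
        tw = rest.get(word, [])
        return ((sum(tw) - self.d * len(tw)) + (self.theta + self.d * t_u) * parent) / (self.theta + c_u)

    def _add(self, word, ctx):
        """Seat one customer for word in restaurant ctx; a new table sends a customer to the parent."""
        parent = self.prob(word, ctx[1:]) if ctx else 1.0 / self.V
        tw = self.tables[ctx][word]
        t_u = sum(len(t) for t in self.tables[ctx].values())
        weights = [max(size - self.d, 0.0) for size in tw] + [(self.theta + self.d * t_u) * parent]
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k < len(tw):
            tw[k] += 1  # join an existing table
        else:
            tw.append(1)  # open a new table and back off
            if ctx:
                self._add(word, ctx[1:])

    def add_sentence(self, tokens):
        """Add all n-grams of a (post-edited) sentence, padded with <s> and </s>."""
        padded = ["<s>"] * (self.order - 1) + list(tokens) + ["</s>"]
        for i in range(self.order - 1, len(padded)):
            self._add(padded[i], tuple(padded[i - self.order + 1:i]))
```

Adding a post-edit seats its n-grams in the restaurant hierarchy, so their probabilities rise immediately; with a toy vocabulary of five words, p(c | a b) moves from the uniform 0.2 to about 0.358 after adding `a b c`.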

5.3.4 Adaptation Scheme

The full adaptation scheme (see Figure 5.2) can be described as follows: after the client service has requested a translation of a source segment, a remote service checks the source for non-translatables. If non-translatables are detected, translations for each non-translatable word are requested from the user via the local client. After this step, the remote service annotates the grammar as described, and the segment is translated and returned to the client. When post-editing of the segment is finished, the server process receives the post-edited translation, along with a phrase alignment and a set of rules that reflect the alterations made by the user (the Edit set). The current grammar is then extended with the edited rules, and the adaptive language model is updated with the post-edit, followed by a parameter update, which includes another decoding step to generate a new k-best list containing the new rules. Finally, and in any case prior to the translation of the next segment, the rule extraction process for generating the Auto rules is carried out on the current training example.
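The control flow of one iteration can be summarized in code. All interfaces (`backend`, `ask_user`, `post_edit` and their methods) are hypothetical stand-ins for the remote service and the client, rules are plain set elements, and feature annotation is omitted. We read the restriction in Section 5.3.2 as excluding the Auto rules of the current example from the update grammar, which the ordering below guarantees since they are extracted last; Auto rules from earlier segments stay in the grammar. This reading is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptationState:
    """Rule sets accumulated over the session (illustrative names)."""
    edit: set = field(default_factory=set)
    auto: set = field(default_factory=set)


def adapt_on_segment(source, grammar, state, backend, ask_user, post_edit):
    """Process one segment: translate, collect the post-edit, adapt. All interfaces are hypothetical."""
    # Remote service: detect non-translatables and ask the user (via the client) for translations.
    unknown = backend.find_unknown(source, grammar)
    unk = backend.unk_rules(ask_user(unknown)) if unknown else set()
    # Translate with the grammar extended by the accumulated Edit and Auto rules plus Unk.
    translation_grammar = grammar | state.edit | state.auto | unk
    hypothesis, used_rules = backend.decode(source, translation_grammar)
    # Client: the user post-edits; the server receives the post-edit and its phrase alignment.
    reference, alignment = post_edit(hypothesis)
    new_edit = backend.edit_rules(used_rules, alignment)
    state.edit |= new_edit
    # Update the adaptive LM first, then re-decode a k-best list that includes the new
    # Edit rules (Auto rules of this example do not exist yet) and update the weights.
    backend.lm_add(reference)
    kbest = backend.kbest(source, translation_grammar | new_edit)
    backend.update(kbest, reference)
    # Finally, before the next segment, extract Auto rules from the current example.
    state.auto |= backend.auto_rules(source, reference, alignment)
    return hypothesis, reference
```

Passing the services in as parameters keeps the ordering of the steps explicit and lets each step be replaced, for example by a stub when checking the control flow.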

In document Preference Learning for Machine Translation (Page 191-193)