Top 20: Minimum Risk Training for Neural Machine Translation

Minimum Risk Training for Neural Machine Translation

... Although NMT models have achieved results on par with or better than conventional SMT, they still suffer from a major drawback: the models are optimized to maximize the likelihood of training data instead of ...

10

Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST CMU at WAT2016

... Japanese-English translation track of the 2016 Workshop on Asian Translation was based on attentional neural machine translation (NMT) ... discrete translation lexicons to improve ...

7

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

... Josep Maria Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang, Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, Raoum ...

7

Concept Equalization to Guide Correct Training of Neural Machine Translation

... Neural machine translation decoders are usually conditional language models to sequentially generate words for target sen- ... a translation experiment from English to French, the concept ...

6

The AFRL WMT17 Neural Machine Translation Training Task Submission

... the training sets seen in Table 1. Graphical training histories are shown in Figure 1, summarized in Table ... “Student” training data set both trains the fastest and leads to the highest-scoring ...

5

From Bilingual to Multilingual Neural Machine Translation by Incremental Training

... On the other hand, architectures that share parameters between all languages (Johnson et al., 2017) by using a single encoder and decoder trained to be able to translate from and to any of the languages of the system. ...

7

Bridging the Gap between Training and Inference for Neural Machine Translation

... at training time as opposed to the previous words generated by the model as context at ...tween training and inference, when predicting one word, we feed as context either the ground truth word or the ...

10

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation

... Josep Maria Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang, Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, ...

9

Neural Machine Translation by Minimising the Bayes risk with Respect to Syntactic Translation Lattices

... Lattice minimum Bayes-risk (LMBR) decoding has been applied successfully to translation lattices in traditional SMT to improve translation performance of a single system (Kumar and Byrne, ...

7

Three phase training to address data sparsity in Neural Machine Translation

... over training during various stages - leading to better translation quality for Indian ... increase translation quality between resource scarce language pairs by incorporating the weights learnt ...

10

Multimodal Neural Machine Translation for Low resource Language Pairs using Synthetic Data

... Sennrich et al. (2016) incorporated monolingual data on the target side to investigate two methods of filling the source side of the monolingual data. In the first method, they used a dummy source sentence for every ...

10

NICT Self Training Approach to Neural Machine Translation at NMT 2018

... to-seq model), respectively. These tables consist of three information groups. The first group shows training results; the number of training epochs and perplexity of the development set. The second and ...

6

Low Resource Corpus Filtering Using Multilingual Sentence Embeddings

... (probabilistic translation dictionaries and language models) were trained on the provided clean data (excluding the dictionar- ... lexical translation and language model scores, and several shallow ...

6

Fluency Constraints for Minimum Bayes Risk Decoding of Statistical Machine Translation Lattices

... the translation model P(F | E). Given the weakness of current translation models this is a severe ... reference translation Ē of F (see the discussion in Section ...

9

Minimum Error Rate Training in Statistical Machine Translation

... the minimum Bayes risk approach, in which an optimal decision rule with respect to an application specific risk/loss function is used, which will normally differ from ... the training criterion ...

8

Transductive Minimum Error Rate Training for Statistical Machine Translation

... We can also review the roles that the development and test datasets play in the procedure of avoiding over-training. The reason for that we transductively generate translations as pseudo references for test ...

8

Minimum Bayes Risk Decoding for Statistical Machine Translation

... We will show that MBR decoding can be applied to machine translation in two scenarios. Given an automatic MT metric, we design a loss function based on the metric and use MBR decoding to tune MT ...

8

Residual Stacking of RNNs for Neural Machine Translation

... enhance Neural Machine Translation models, several obvious ways such as enlarging the hidden size of recurrent layers and stacking multiple layers of RNN can be ... the training and leads to ...

7

Efficient Minimum Error Rate Training and Minimum Bayes Risk Decoding for Translation Hypergraphs and Lattices

... candidate translation that is shorter than the average length over its reference translations, using a penalty term which is linear in the difference between either ...

9

Minimum Imputed Risk: Unsupervised Discriminative Training for Machine Translation

... The translation models are built using the corpus for the IWSLT 2005 Chinese to English translation task (Eck and Hori, 2005), which comprises 40,000 pairs of transcribed utterances in the travel ...

10