Top 20 Effective Selection of Translation Model Training Data

Effective Selection of Translation Model Training Data

... Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of ...current data ...

Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation

... of training examples in various related set- ...the training data corrupted in various ways, including random labelings of the original images, and random transformations of the input ...machine ...

Data point selection for self-training

... sparse data problems for statistical ...self-training is a cheap and effective method for improving parsing accuracy for morphologically rich ...

Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity

... Table 1 shows the number of n-grams on LMs built on the English side of News Commentary v8 (hereafter NC) for each of the models. Regarding 1-grams, compared to f, the substitution of named entities by their categories ...
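
The snippet above compares n-gram counts before and after named entities are replaced by their categories. The minimal Python sketch below illustrates why such a substitution shrinks the n-gram inventory. It is not the paper's pipeline: whitespace tokenization and the `entity_categories` lookup table are hypothetical stand-ins for a real named-entity recognizer, and only single-token entities are handled.

```python
def distinct_ngrams(sentences, n):
    """Count distinct n-grams (token tuples) across whitespace-tokenized sentences."""
    grams = set()
    for s in sentences:
        toks = s.split()
        grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(grams)


def substitute_entities(sentence, entity_categories):
    """Replace each token found in the hypothetical lookup table by its category label."""
    return " ".join(entity_categories.get(tok, tok) for tok in sentence.split())


corpus = ["Obama visited Paris", "Merkel visited Berlin"]
# Hypothetical gazetteer standing in for a named-entity tagger.
categories = {"Obama": "PERSON", "Merkel": "PERSON", "Paris": "LOC", "Berlin": "LOC"}
substituted = [substitute_entities(s, categories) for s in corpus]

for n in (1, 2):
    # Prints "1 5 3" then "2 4 2": distinct n-grams before vs. after substitution.
    print(n, distinct_ngrams(corpus, n), distinct_ngrams(substituted, n))
```

Collapsing many rare entity tokens into a few category labels merges n-grams that differ only in the entity, which is what reduces the count.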

Data Augmentation for Low Resource Neural Machine Translation

... Given a source and target sentence pair (S,T), we want to alter it in a way that preserves the semantic equivalence between S and T while diversifying as much as possible the training examples. A number of ways to ...

Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation

... test data dependent model parameter se- ...parameter selection to development data se- ...log-linear model parameters to different test data and achieves consistent good trans- ...

Exploring Transfer Learning and Domain Data Selection for the Biomedical Translation

... selective data training gives any benefit over simple training using whole NC corpus; we built ...selective data training in above mentioned experiments (M2 and ...ing model ...

Method of Selecting Training Data to Build a Compact and Efficient Translation Model

... rate training (MERT) (Och, 2003), which is the optimization of feature weights by maximizing the BLEU score on the development set, can improve the performance of a ...
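
For reference, the log-linear model whose weights MERT tunes is conventionally written as below, where $h_m$ are feature functions, $\lambda_m$ their weights, $f_s$ the development-set source sentences and $r_s$ their references. This is the standard formulation from Och (2003), not a detail recovered from the paper listed above.

```latex
\hat{e}(f;\lambda) = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

\hat{\lambda} = \arg\max_{\lambda} \; \mathrm{BLEU}\Big(\{\hat{e}(f_s;\lambda)\}_{s=1}^{S},\; \{r_s\}_{s=1}^{S}\Big)
```

MERT approximates the second argmax with line searches along one direction at a time in weight space, exploiting the fact that corpus BLEU is piecewise constant along each such line.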

Dynamic Data Selection for Neural Machine Translation

... Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine ...

Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models

... ‘in-domain’ model would improve translation ...new translation model needs to be retrained and its performance evaluated in terms of evaluation ...for selection only when its inclusion ...

Drift Detection-Based Model Selection Framework for Real-Time Anomaly Detection in IoT

... 1 model is constructed as a bagging model with a voting combiner ...bagging model is composed of three ...learner training phase and the combiner ...the data into multiple overlapping ...
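
The snippet describes a bagging model with a voting combiner built from three learners trained on overlapping subsets of the data. A minimal Python sketch of that idea follows, under loud assumptions: the readings are one-dimensional, the base learner is a hypothetical z-score detector (`ZScoreDetector`), and the overlapping subsets are drawn as bootstrap samples, which is the textbook bagging recipe and not necessarily the paper's partitioning scheme.

```python
import random
import statistics
from collections import Counter


class ZScoreDetector:
    """Hypothetical base learner: flags x as anomalous (1) when it lies more
    than k standard deviations from the mean of its training sample."""

    def __init__(self, k=3.0, min_std=1.0):
        self.k, self.min_std = k, min_std

    def fit(self, xs):
        self.mean = statistics.mean(xs)
        # Floor the std so a degenerate bootstrap sample (all values equal)
        # does not make this detector flag every new reading.
        self.std = max(statistics.pstdev(xs), self.min_std)
        return self

    def predict(self, x):
        return int(abs(x - self.mean) > self.k * self.std)


class BaggingVoter:
    """Trains n_learners detectors on bootstrap samples, combines by majority vote."""

    def __init__(self, n_learners=3, seed=0):
        self.n_learners = n_learners
        self.rng = random.Random(seed)

    def fit(self, xs):
        self.learners = []
        for _ in range(self.n_learners):
            # Sampling with replacement yields overlapping training subsets.
            sample = [self.rng.choice(xs) for _ in xs]
            self.learners.append(ZScoreDetector().fit(sample))
        return self

    def predict(self, x):
        votes = Counter(learner.predict(x) for learner in self.learners)
        return votes.most_common(1)[0][0]


model = BaggingVoter(n_learners=3, seed=42).fit([9, 10, 11, 10, 9, 11, 10, 10])
print(model.predict(10), model.predict(100))
```

Using an odd number of learners means a binary majority vote can never tie.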

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation

... of training time is an important issue in many tasks like patent translation involving neural ...networks. Data parallelism and model parallelism are two common approaches for reducing ...

Improving Word Alignment Using Linguistic Code-Switching Data

... LCS data typically contains no sentence-level alignments, but it still has some advantages for training word alignment models and machine translation (MT) systems which are worth ...LCS data ...

Leave-One-Out Phrase Model Training for Large-Scale Deployment

... Italian-English training data. Training is performed with and without insertion/deletion phrases and both with (FaTrain) and without (FaPrune) re-training of the forward and backward phrase ...

A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation

... We evaluated the accuracy ratio of LSA and PLSA comparatively and classified the experiments with criteria of the values of k and the grammatical relations. We acquired up to 20% accuracy improvement, compared to ...

Intelligent Selection of Language Model Training Data

... Gigaword data corresponding to 8 cutoff points in the cross-entropy difference scores, and trained 4-gram models (again using absolute discounting with a discount of ...
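
The snippet refers to cutoff points in cross-entropy difference scores, the selection criterion of Moore and Lewis: each general-domain sentence s is scored by H_in(s) - H_out(s), and lower scores mark text that looks more in-domain. The Python sketch below simplifies under stated assumptions: it uses add-one-smoothed unigram models instead of the 4-gram absolute-discounting models mentioned in the snippet, whitespace tokenization, and a single hypothetical `threshold` instead of several cutoffs. It also trains the out-of-domain model on the whole general pool, whereas Moore and Lewis train it on a random sample comparable in size to the in-domain data.

```python
import math
from collections import Counter


def unigram_lm(corpus, vocab_size):
    """Add-one-smoothed unigram LM over a shared vocabulary (simplifying assumption)."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    return lambda tok: (counts[tok] + 1) / (total + vocab_size)


def cross_entropy(sentence, lm):
    """Per-token cross-entropy, in bits, of a non-empty sentence under lm."""
    toks = sentence.split()
    return -sum(math.log2(lm(t)) for t in toks) / len(toks)


def select(in_domain, general, threshold):
    """Return general-domain sentences with H_in(s) - H_out(s) below threshold,
    ordered from most to least in-domain-like."""
    vocab = {t for s in in_domain + general for t in s.split()}
    vocab_size = len(vocab) + 1  # one extra slot reserves mass for unseen tokens
    lm_in = unigram_lm(in_domain, vocab_size)
    lm_out = unigram_lm(general, vocab_size)
    scored = sorted(
        (cross_entropy(s, lm_in) - cross_entropy(s, lm_out), s)
        for s in general
        if s.split()  # skip empty lines
    )
    return [s for score, s in scored if score < threshold]


in_domain = ["the patient received a dose", "the dose was increased"]
general = [
    "the patient received a dose of medicine",
    "stock prices fell sharply today",
    "the dose was reduced",
]
print(select(in_domain, general, threshold=0.0))
```

Sweeping several cutoffs, as the snippet describes, would amount to calling `select` with different threshold values and training one model per selected subset.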

Extracting In-domain Training Corpora for Neural Machine Translation Using Data Selection Methods

... all data from the previous ...the model trained with all data has access to both the generic and domain vocabulary, the fine-tuned models are built on top of the generic vocabulary ...

Improving Statistical Machine Translation Performance by Training Data Selection and Optimization

... In training process, we use GIZA++ toolkit for word alignment in both translation directions, and apply “grow-diag-final” method to refine it (Koehn et ...log-linear model training, we take ...

Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019

... Ja→En training data so that their sizes match the largest Ru→En data for each ...during training. We incorporated early-stopping by stopping training if BLEU score for the devel- ...
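
The snippet's stopping condition is cut off, so the exact rule used is unknown. A common variant, assumed here, stops once development-set BLEU has not improved for a fixed number of checkpoints. The `patience` parameter and the precomputed score list are illustrative; in real training each score would be computed at its checkpoint as training proceeds.

```python
def early_stop(dev_bleu_scores, patience):
    """Scan per-checkpoint dev BLEU scores and stop once `patience` checkpoints
    pass without a new best. Returns (best_checkpoint, stop_checkpoint), 0-indexed."""
    if not dev_bleu_scores:
        raise ValueError("need at least one checkpoint score")
    best_idx, best = 0, float("-inf")
    for i, score in enumerate(dev_bleu_scores):
        if score > best:
            best_idx, best = i, score
        elif i - best_idx >= patience:
            return best_idx, i
    # Scores ran out before patience was exhausted.
    return best_idx, len(dev_bleu_scores) - 1


print(early_stop([10.0, 12.5, 13.1, 13.0, 12.8, 13.05, 12.9], patience=3))  # (2, 5)
```

Under this assumption, training would typically keep the parameters saved at the best checkpoint rather than those from the checkpoint where it stopped.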

Towards Effective Use of Training Data in Statistical Machine Translation

... Still, even with that much RAM it is not possible to train a language model with SRILM (Stolcke, 2002) in one pass. Hence, we broke up the training corpus by source (New York Times, Washington Post, ...) and ...
