Tilde’s Parallel Corpus Filtering Methods for WMT 2018
... can also be found in the target sentence (and vice versa). Although this filter removes all sentence pairs where numbers that are written in digits have been translated into numbers written in words, it is effective ...
Alibaba Submission to the WMT18 Parallel Corpus Filtering Task
... The parallel corpus is an essential resource for machine translation and multilingual natural language ...of parallel corpus is also very important in MT system training (Koehn and Knowles, ...
The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
... pruning methods with the support of n-grams of varying size and the estimation of word likelihoods from text segmented into subword ...noisy parallel training data, and store five values as potential ...
UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
... We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary ...
Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
... our methods in the WMT 2018 parallel corpus filtering task (Koehn et ...web-crawled corpus of 104M parallel lines (ParaCrawl ...raw corpus with the heuristics of ...
NRC Parallel Corpus Filtering System for WMT 2019
... on parallel corpus filtering was essentially the same as last year’s edition (Koehn et ...noisy corpus crawled from the web using ParaCrawl (Koehn et ...of parallel data, covering ...
MAJE Submission to the WMT2018 Shared Task on Parallel Corpus Filtering
... Participants in the shared task have to submit a file with quality scores, one per line, corresponding to the sentence pairs on the 1 billion word German-English Paracrawl corpus. Scores do not have to be ...
Noisy Parallel Corpus Filtering through Projected Word Embeddings
... the WMT 2019 parallel corpus filtering shared task is to select the 5 million words of parallel sentences producing the highest-quality machine translation system, given a set of ...
Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low Resource Conditions
... Specifically, we provided a very noisy 50-60 million word (English token count) Nepali–English and Sinhala–English corpora crawled from the web using the Paracrawl processing pipeline (see Section 4.4 for details). We ...
STACC, OOV Density and N gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering
... from parallel corpora has been tackled by various researchers over the ...of corpus creation from web data, to filter dubious sentence ...tered corpus. In (Cui et al., 2013), the approach to data ...
NICT’s Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task
... NMT has shown large gains in quality over Statistical machine translation (SMT) and set several new benchmarks (Bojar et al., 2017). However, NMT is much more sensitive to domain (Wang et al., 2017) and noise ...
Parallel Corpus Filtering Based on Fuzzy String Matching
... to WMT 2019 shared task on parallel corpus ...develop methods for scoring each parallel sentence from a given noisy parallel ...high-quality parallel sentences sub-sampled from the ...
An Unsupervised System for Parallel Corpus Filtering
... the WMT 2018 Parallel Corpus Filtering shared task which addresses the problem of cleaning noisy parallel ...cleaning parallel sentences is important for improving the ...
The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task
... As neural network-based translation model we use the transformer architecture (Vaswani et al., 2017) implemented in the Sockeye toolkit (Hieber et al., 2017) which is built on top of MXNet (Chen et al., 2015). Encoder ...
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
... SMT For statistical machine translation, we used Moses (Koehn et al., 2007) with fairly basic settings, such as Good-Turing smoothing of phrase table probabilities, maximum phrase length of 5, maximum sentence length ...
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task
... The WMT 2018 parallel corpus filtering shared task partially shares its objectives with the First Automatic Translation Memory Cleaning Shared Task (Barbu et ...of parallel ...
The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
... There is a growing literature on using web-acquired data for constructing various types of language resources, including monolingual and parallel corpora. As shown in, among others, Pecina et al. (2014) and ...
Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task
... The preceding processes and metrics were designed to remove many sources of error mentioned in the introduction of this paper. However, we have not yet dealt with the case of having both English and German lines ...
A SURVEY REPORT ON THE EXISTING METHODS OF BUILDING A PARALLEL CORPUS
... English-Chinese parallel corpus that is aligned at document and sentence ...The parallel data is taken from bilingual Websites containing good quality content in the two languages from multiple ...
Webinterpret Submission to the WMT2019 Shared Task on Parallel Corpus Filtering
... First, we filtered out some of the pairs (x, y) in the raw corpus according to several heuristic rules (Section 2.1). Then, for the remaining pairs, we computed a ranking value r(x, y) for each of them. This ...