Top 20 results for "An Unsupervised System for Parallel Corpus Filtering"
10000 documents matching "An Unsupervised System for Parallel Corpus Filtering" were found on our website. The top 20 are listed below.
An Unsupervised System for Parallel Corpus Filtering
... sentence pairs for training both statistical and neural MT systems (Koehn et al., 2018). A lot of previous work has studied the problem of parallel data cleaning. Esplà-Gomis and Forcada (2010) proposed BiTextor ...
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
... translation system quality is computationally intractable due to the high cost of training these systems to evaluate different weight ...quality parallel corpora, while bad sentence pairs are either ...
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task
... systems were built with Moses and tuned with Batch MIRA (Cherry and Foster, 2012). A 5-gram language model was estimated from the TL side of the training corpus. NMT systems followed the Transformer architecture ...
SYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus Filtering
... Similar to (Legrand et al., 2016) our model extracts context information from source and target sentences and then computes simple dot-products to estimate word alignments. The objective function is computed at the ...
MAJE Submission to the WMT2018 Shared Task on Parallel Corpus Filtering
... Participants in the shared task have to submit a file with quality scores, one per line, corresponding to the sentence pairs on the 1 billion word German-English Paracrawl corpus. Scores do not have to be ...
Noisy Parallel Corpus Filtering through Projected Word Embeddings
... existing parallel corpora to learn word alignments and identify parallel sentences on the assumption that non-parallel sentences have few or none word ...translation system to produce ...
The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
... rescoring system for the WMT 2019 Shared Task on Parallel Corpus ...Our system is based on contrastive scoring models using features extracted from different kinds of data-driven and heuristic ...
Building a Web Based Parallel Corpus and Filtering Out Machine Translated Text
... Our method of filtering out statistical machine translation is based on the similarity of algorithms of building phrase tables in the existing SMT systems. Those systems also have restrictions on reordering of ...
Webinterpret Submission to the WMT2019 Shared Task on Parallel Corpus Filtering
... The filtering rules we implemented for our submission are not language specific, and moreover, they only place very mild assumption on what constitutes a “good” sentence ...translation system is most ...
Compiling and Filtering ParIce: An English Icelandic Parallel Corpus
... systems, parallel data quality is important and may weaken performance if inadequate, especially for NMT (see ...good parallel corpora is thus to assess how accurate the alignments ...
Tilde’s Parallel Corpus Filtering Methods for WMT 2018
... describes parallel corpus filtering methods that allow reducing noise of noisy “parallel” corpora from a level where the corpora are not usable for neural machine translation training ...
Discriminative Joint Modeling of Lexical Variation and Acoustic Confusion for Automated Narrative Retelling Assessment
... an unsupervised manner from a parallel corpus of manual or ASR transcripts of retellings and the original source narrative, much as in machine translation ...
Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task
... NMT system using OpenNMT (Klein et ...task parallel German–English data, excluding the Paracrawl ...This system was a 4-layer bidirectional RNN, with 600-dimensional word embeddings and an RNN ...
Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low Resource Conditions
... Use of embeddings. While the participant’s methods were dominated by non-neural components, sometimes using neural machine translation outputs and scores, some participants used word and sentence embeddings. Given ...
Unsupervised Adaptation for Statistical Machine Translation
... for filtering using automatic translation of the test data (HYP), a pseudo in-domain set (NC) and the references (REF) of the test sets (keeping one blind test ...based filtering is not able to perform good ...
Improving Low Resource Neural Machine Translation with Filtered Pseudo Parallel Corpus
... Data filtering is often used in domain adaptation (Moore and Lewis, 2010; Axelrod et ...high-quality parallel sentence pairs and achieve better translation performance and reduce time-complexity with a ...
UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
... In this contribution, we presented the UTFPR systems submitted to the WMT 2018 parallel corpus filtering task. Our supervised systems discern between good and bad translations using classic bi- ...
The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
... By comparing the results of the two alternative ranking schemes, we conclude that their performances are similar for the 100M corpora. This is explained by the fact that their intersection is extremely high: 5.2M ...
NRC Parallel Corpus Filtering System for WMT 2019
... model tasks. The task can be defined as follows: given a sequence of words in which certain words have been masked, predict the masked words based on the observable ones. The sequence of words can be one or more ...
Measuring sentence parallelism using Mahalanobis distances: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task
... target corpus; if the corpus has a large number of short numerical sentences (and it appears to), the measurement will come to prefer those, whether or not they are useful for the downstream ...