[PDF] Top 20 Aligning Sentences in Parallel Corpora
10000 results for "Aligning Sentences in Parallel Corpora" were found on our website. Below are the top 20 most common results.
Aligning Sentences in Parallel Corpora
... Finally, for an eff-bead, we assume that the length of the English sentence is drawn from the distribution Pr g, and that the log of the ratio of the sum of the lengths of the French sentences ...
8
Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria
... to aligning sentences in bilingual parallel corpora based on punctuation, especially for English and ... clean parallel corpora written in two Western languages, such as ...
28
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
... Text simplification can improve accessibility of texts for both human readers and automatic text processing. Although simplification (Wubben et al., 2012) could benefit from data-driven machine translation, ...
7
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling
... counter-intuitively, was slower despite reduced network traffic, due to skew in the distribution of similar document pairs. In our experiments, half of the source collection was not linked to any target document, ...
5
Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora
... of parallel resources for many language pairs and do- ... utilizing parallel portions found within comparable corpora (Utiyama and Isahara, 2003; Munteanu et ... monolingual corpora in which ...
8
Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia
... of parallel or closely parallel document ... extracting parallel fragments. For one thing, we know that extracting parallel sentences from a parallel corpus is something we do ...
9
STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering
... ble corpora, with lower volumes of noise, notably allowing OOV words to contribute to the score if they are capitalised words in truecased sentences or ... for parallel sentence identification in ...
7
A Program for Aligning Sentences in Bilingual Corpora
... The program makes use of the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated ...
8
Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora
... Paraphrase patterns are useful in paraphrase recognition and generation. In this paper, we present a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby the English ...
9
Parallel Corpus Filtering Based on Fuzzy String Matching
... of sentences. These parallel corpora are compiled from the ... two parallel corpora, some other publicly available data are provided for development pur- ... available parallel data. ...
5
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
... for parallel sentence ex- ... for parallel sentence extraction on comparable corpora, and describes our approach, which finds a global sentence alignment between two ...
9
IMaT: Unsupervised Text Attribute Transfer via Iterative Matching and Translation
... rewrite sentences such that they possess certain linguistic attributes, while simultaneously preserving their semantic ... pseudo-parallel corpus by aligning a subset of semantically similar ...
13
Proceedings of the 10th Workshop on Building and Using Comparable Corpora
... extracting parallel sentences from comparable monolingual corpora, so as to give an overview on the state of the art and to identify the best performing ...
10
Aligning parallel texts with InterText
... InterText is a flexible manager and editor for alignment of parallel texts aimed both at individual and collaborative creation of parallel corpora of any size or translational memories. It is ...
5
Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval
... We propose a weakly supervised neural model for Ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low resource languages often lack relevance annotations for CLIR, and when available the ...
6
Microblogs as Parallel Corpora
... 1 parallel tweet can be found for every 200 tweets we process using our targeted ... had parallel data in the ZH-EN pair, if we extrapolate for the whole 868k filtered tweets, we expect that we can find ...
11
Reliable Measures for Aligning Japanese-English News Articles and Sentences
... and sentences to make a large parallel ... English sentences in these ... in sentences aligned by DP matching and that for sentence alignment uses similarities in articles aligned by ...
8
Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora
... Broadly speaking, Bracketing ITGs are useful when we wish to make use of the structural properties of ITGs discussed above, without requiring any additional linguistic information as constraints. Since they lack ...
12
A Geometric Approach to Mapping Bitext Correspondence
... Chen, "Aligning Sentences in Bilingual Corpora Using Lexical Information," Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, OH, 19 ...
12
zNLP: Identifying Parallel Sentences in Chinese-English Comparable Corpora
... Previous work (Smith et al., 2010; Munteanu and Marcu, 2005) on parallel sentence extraction from comparable corpora has used external clues for this purpose. Smith et al. (2010) bootstrapped the ...
5