Top 20 documents related to "An Iterative Approach for Mining Parallel Sentences in a Comparable Corpus"
An Iterative Approach for Mining Parallel Sentences in a Comparable Corpus
... GigaWord corpus (which English version contains roughly 26 gigabytes of texts) and more modest corpora such as the TDT3 corpus used in (Fung and Cheung, 2004), and which English part contains 290 000 ...
Multi level Bootstrapping For Extracting Parallel Sentences From a Quasi Comparable Corpus
... for mining parallel sentences from quasi-comparable bilingual texts which have very different sizes, and which include both in-topic and off-topic ... better parallel sentence ...
Task Alternation in Parallel Sentence Retrieval for Twitter Translation
... by mining parallel sentences from comparable data, for example by using cross-lingual information retrieval (CLIR) techniques to retrieve a target language sentence for a source language ...
Mining a Comparable Text Corpus for a Vietnamese French Statistical Machine Translation System
... “longer sentences in one language tend to be translated into longer sentences in the other language, and shorter sentences tend to be translated into shorter sen- ...
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
... a comparable corpus, and describe novel methods for parallel sentence ex- ... for parallel sentence extraction on comparable corpora, and describes our approach, which finds a ...
IMaT: Unsupervised Text Attribute Transfer via Iterative Matching and Translation
... rewrite sentences such that they possess certain linguistic attributes, while simultaneously preserving their semantic ... simpler approach, Iterative Matching and Translation (IMaT), which: (1) ...
Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia
... harvesting comparable news collections in order to extract parallel ... in-domain parallel training corpus of United Nation texts, improved significantly an Arabic-to-English SMT system ...
Mining Very Non Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and E
... extracting parallel sentences from far more disparate “very-non-parallel corpora” than previous “comparable corpora” methods, by exploiting bootstrapping on top of IBM Model 4 ... a ...
Hierarchical Document Encoder for Parallel Corpus Mining
... The results show document embeddings are able to achieve strong performance on parallel document mining. On a test set mined from the web, all models achieve strong retrieval performance, the best being ...
Extracting an English Persian Parallel Corpus from Comparable Corpora
... extract parallel sentences from Wikipedia documents was ... extracted sentences were added to the ... the corpus extracted by bidirectional method performs better than the corpus ...
PEXACC: A Parallel Sentence Mining Algorithm from Comparable Corpora
... of sentences, the first in the source language and the last in the target language, the job of the translation similarity measure is to assess “how parallel” the two sentences ... “not ...
Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi Comparable Corpora
... Our approach is motivated by a number of desirable characteristics of ITGs, which historically were developed for translation and alignment purposes, rather than mining applications of the kind discussed in ...
The Role of Sketch Engine in Multiple Types of Corpora
... In the beginning, only monolingual corpora was decided to be supported by the system. However, as time passed by, the need for new features arose. Thus, the features accommodating multilingual and bilingual corpus ...
A Data Mining Approach to Learn Reorder Rules for SMT
... Work has to be done in terms of prioritization of the rules, for example first priority should be given to more specific rules (the one with constraints) then to the general rules. More constraints with respect to ...
Scientific registers and disciplinary diversification: a comparable corpus approach
... other corpus-based studies on register, our goal is not to uncover dimensions of variation or to discover text classes (as ... our corpus are taken from 38 journals from nine disciplines (for details see ...
PaCCSS IT: A Parallel Corpus of Complex Simple Sentences for Automatic Text Simplification
... addition, sentences classified as complex have higher parse trees [13], longer dependency links [14] and longer embedded complement chains modifying a noun [15], all features correlated with syntactic complexity ...
Building a Web Based Parallel Corpus and Filtering Out Machine Translated Text
... as parallel with very low precision, from 20% to ... actually parallel or not. For example, if we need to get 100 000 really parallel documents we should check from 500 thousand to 100 million ... Our ...
Noisy Parallel Corpus Filtering through Projected Word Embeddings
... the corpus is normalized through punctuation removal and ... all parallel data, both the (supposedly) clean and the noisy evaluation sets, using a set of heuristics based heavily on the work of Pinnis ...
Weighted Set Theoretic Alignment of Comparable Sentences
... An interesting additional result, not shown in the tables, is the weak impact of the hyperparameter α: between 100 and 500, the scores were marginally different; only values markedly outside this range gave worse ...
DOMCAT: A Bilingual Concordancer for Domain Specific Computer Assisted Translation
... A bilingual concordancer is a tool that can retrieve aligned sentence pairs in a parallel corpus whose source sentences contain the query and the translation equivalents of the query are identified ...