Top 20 Results: Feature-Based Method for Document Alignment in Comparable News Corpora
Below are the top 20 documents related to "Feature-Based Method for Document Alignment in Comparable News Corpora".
Feature-Based Method for Document Alignment in Comparable News Corpora
... any document would imply fewer possible document alignment pairs for the ... each document, we use the term extraction model from Vu et ... per document are 556/37, 410/28 and 384/28 for ...
Set-Theoretic Alignment for Comparable Corpora
... Sophisticated feature-based approaches have been developed in recent years in order to provide a method that may apply to larger sets of language pairs and ... a feature-based sentence ...
Extracting bilingual terminologies from comparable corpora
... Our method works by first pairing each term extracted from a source language document S with each term extracted from a target language document T aligned with S in the comparable cor... term ...
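The pairing step in this excerpt (every term of S combined with every term of the aligned document T) is a Cartesian product taken over each aligned document pair. The sketch below assumes documents arrive as already-aligned `(source, target)` string pairs; `extract_terms` is a hypothetical stand-in for a term extractor, and how the candidate pairs are then scored is not covered by the excerpt.

```python
from itertools import product
from typing import Callable, Iterable


def candidate_term_pairs(
    aligned_docs: Iterable[tuple[str, str]],
    extract_terms: Callable[[str], list[str]],
) -> set[tuple[str, str]]:
    """Pair each term of a source document S with each term of the target
    document T aligned with S, pooling the pairs over all aligned documents.

    `extract_terms` is a hypothetical placeholder for a real term extractor.
    """
    pairs: set[tuple[str, str]] = set()
    for source_doc, target_doc in aligned_docs:
        # Cartesian product: |terms(S)| x |terms(T)| candidate pairs per document pair
        pairs.update(product(extract_terms(source_doc), extract_terms(target_doc)))
    return pairs


# Usage with whitespace splitting as a toy "term extractor"
pairs = candidate_term_pairs([("neural network", "réseau neuronal")], str.split)
```

With two terms on each side, the toy call yields 2 × 2 = 4 candidate pairs; the quadratic growth is why term extraction quality matters before pairing.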
Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora
... are based on corpus-independent features, ... context-based method was affected by corpus comparability ... hybrid method that combines compositional and contextual similarity scores as features ...
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
... automatic document alignment in news and web corpora has been explored by a number of researchers, including Resnik and Smith (2003), Munteanu and Marcu (2005), Tillmann and Xu (2009), and ...
ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
... requires comparable corpora aligned at the document level as ... cognate-based method fails, therefore, allowing increasing the recall of the ...
Mining Large-scale Comparable Corpora from Chinese-English News Collections
... Existing approaches to keyword extraction fall into two main categories: supervised and unsupervised methods. Supervised machine learning algorithms were widely used in keyword extraction, such as Naïve ...
Sentence Alignment for Monolingual Comparable Corpora
... sentence alignment. Our method emphasizes the search for an overall alignment, while relying on a simple local similarity ... cal alignment within mapping fragments to find sentence ... the ...
Building Comparable Corpora Based on Bilingual LDA Model
... Table 2: Existing Methods Comparison. The table shows that CS outperforms the other algorithms, which indicates that bilingual LDA is a valid way to construct comparable corpora. Thuy et al. (2009) matches similar ...
A Portable Method for Parallel and Comparable Document Alignment
... fast method for parallel document alignment based on hapax legomena, ... module based on hapaxes and numerical entities; a classifier that includes three features based on ...
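The excerpt names hapax legomena (words occurring once in a document) and numerical entities as alignment cues, since rare names and numbers tend to survive translation unchanged. It does not say how they are combined, so the sketch below assumes a Jaccard overlap of each document's hapax/number set and a greedy best-match per source document. The function names and both of those choices are hypothetical; this is not the paper's classifier.

```python
import re
from collections import Counter


def hapax_and_numbers(text: str) -> set[str]:
    """Words occurring exactly once in the document, plus numeric tokens."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    hapaxes = {tok for tok, c in counts.items() if c == 1}
    numbers = {tok for tok in tokens if tok.isdigit()}
    return hapaxes | numbers


def hapax_similarity(doc_a: str, doc_b: str) -> float:
    """Jaccard overlap of the two hapax/number sets (an assumed measure)."""
    a, b = hapax_and_numbers(doc_a), hapax_and_numbers(doc_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_documents(sources: list[str], targets: list[str]) -> dict[int, int]:
    """Greedily map each source document index to its most similar target index."""
    if not targets:
        return {}
    return {
        i: max(range(len(targets)), key=lambda j: hapax_similarity(src, targets[j]))
        for i, src in enumerate(sources)
    }


sources = ["Obama visited Berlin on 12 May 2018", "The storm hit Florida in 2017"]
targets = ["Huracán golpeó Florida en 2017", "Obama visitó Berlin el 12 de mayo de 2018"]
alignment = align_documents(sources, targets)
```

Here the shared names and numbers (`obama`, `berlin`, `12`, `2018`; `florida`, `2017`) drive the match even though the surrounding words differ by language.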
Accurate Parallel Fragment Extraction from Quasi-Comparable Corpora using Alignment Model and Translation Lexicon
... We then apply an averaging filter to the initial scores to obtain filtered scores in both directions. The averaging filter sets the score of one word to the average score of several words around it. We think the words ...
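The averaging filter described in this excerpt is a moving average over per-word scores. The sketch below assumes a symmetric window of `window` words on each side, truncated at the sentence edges; the window size and the function name are hypothetical, since the excerpt only says "several words around it". Applying it once to the source-side scores and once to the target-side scores gives filtered scores "in both directions".

```python
def averaging_filter(scores: list[float], window: int = 2) -> list[float]:
    """Replace each word's score with the mean score of the words within
    `window` positions on either side (inclusive), truncating at the edges.

    `window` is a hypothetical parameter, not a value from the paper.
    """
    n = len(scores)
    filtered = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighborhood = scores[lo:hi]
        filtered.append(sum(neighborhood) / len(neighborhood))
    return filtered


# One call per direction, e.g. source->target and target->source word scores
src_filtered = averaging_filter([1.0, 0.0, 1.0, 1.0], window=1)
```

Smoothing lets an isolated low score inside a run of well-aligned words be pulled up (and vice versa), which favors contiguous parallel fragments over scattered word matches.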
STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering
... We based our approach on STACC, an efficient and portable method for parallel sentence identification in comparable ... core method was expanded with a penalty based on the amount of ...
Phrase-based Parallel Fragments Extraction from Comparable Corpora
... phrase-based method to extract parallel fragments from the comparable ... decoder based on the hierarchical phrase-based (HPB) translation model to detect the alignments in ...
Automatic Building and Using Parallel Resources for SMT from Comparable Corpora
... corpus-based machine translation, especially Statistical Machine Translation (SMT), from comparable corpora has recently received wide attention in the field of Machine Translation ... from ...
Bootstrapping Entity Translation on Weakly Comparable Corpora
... this feature also depends on the comparability of entity occurrences in time-stamped corpora, which may not hold as shown in Figure ... our method can find and compare articles, on different dates, ...
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross Language Text Categorization
... Using only comparable corpora. Figure 2 reports the performance without any use of bilingual dictionaries. Each graph shows the learning curves respectively using a BoW kernel (that is considered here as ...
Parallel and comparable corpora: What are they up to?
... and comparable corpora are of use to translation, it is difficult to generate ‘possible hypotheses as to translations’ with such data (Aston, ... Parallel corpora, in contrast, provide ‘[g]reater ...
Identifying Comparable Corpora Using LDA
... applied based on the proportion of source language document’s NEs found in the target document (we do not expect all the NEs to be present in the target language: NEs could be mis-translated, and not all ...
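The filter in this excerpt scores a document pair by the proportion of the source document's named entities (NEs) that also appear in the target document. The sketch assumes NEs were already extracted and mapped into a comparable form (translated or transliterated), a step not shown here. The case-folded exact matching, the function names, and the `threshold` value are hypothetical; a threshold below 1.0 reflects the excerpt's point that not every NE is expected to survive translation.

```python
def ne_overlap_ratio(source_nes: list[str], target_nes: list[str]) -> float:
    """Fraction of distinct source-document NEs that also occur in the target
    document. Assumes NEs are already in a shared, comparable form."""
    src = {ne.lower() for ne in source_nes}
    if not src:
        return 0.0
    tgt = {ne.lower() for ne in target_nes}
    return len(src & tgt) / len(src)


def passes_ne_filter(source_nes: list[str], target_nes: list[str],
                     threshold: float = 0.5) -> bool:
    """Keep the pair if enough source NEs are found (threshold is hypothetical)."""
    return ne_overlap_ratio(source_nes, target_nes) >= threshold


ratio = ne_overlap_ratio(["Paris", "Obama", "UN"], ["paris", "UN", "Berlin"])
```

Normalizing by the source side only (rather than the union) keeps extra target-side entities from penalizing a pair whose target article simply covers more ground.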
Named Entity Transliteration with Comparable Corpora
... works as follows: We pool all documents in a single day to form a large pseudo-document. Then, for each transliteration candidate (both Chinese and English), we compute its frequency in each of those ...
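The pooling step in this excerpt turns a time-stamped corpus into one pseudo-document per day and then records each transliteration candidate's frequency per day. The excerpt is cut off before it says how the two languages' frequency series are compared; the sketch assumes Pearson correlation purely for illustration. Documents are assumed to be pre-tokenized `(day, tokens)` pairs, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable


def daily_frequencies(docs: Iterable[tuple[str, list[str]]],
                      candidate: str, days: list[str]) -> list[int]:
    """Pool all documents of the same day into one pseudo-document and return
    the candidate's count in each day's pseudo-document, in `days` order."""
    pseudo_docs: dict[str, Counter] = defaultdict(Counter)
    for day, tokens in docs:
        pseudo_docs[day].update(tokens)
    return [pseudo_docs[day][candidate] for day in days]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (assumed comparison)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


days = ["d1", "d2", "d3"]
en_docs = [("d1", ["obama", "said"]), ("d1", ["obama"]), ("d2", ["paris"]), ("d3", ["obama"])]
zh_docs = [("d1", ["奥巴马", "奥巴马"]), ("d3", ["奥巴马"])]
en_series = daily_frequencies(en_docs, "obama", days)
zh_series = daily_frequencies(zh_docs, "奥巴马", days)
score = pearson(en_series, zh_series)
```

A name and its transliteration tend to spike on the same news days, so similar daily series are evidence for a match even without any shared characters.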
A Factory of Comparable Corpora from Wikipedia
... Given a pair of articles related by an interlanguage link, we estimate the similarity between all their pairs of cross-language sentences with different text similarity measures. We repeat the process for all the ...
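The procedure in this excerpt scores every cross-language sentence pair of two linked articles with several similarity measures. The sketch below takes the measures as a name-to-function mapping; the two measures shown (character 3-gram cosine and a length ratio) are hypothetical examples chosen for illustration, not necessarily the ones used in the paper.

```python
import math
from collections import Counter
from typing import Callable

Measure = Callable[[str, str], float]


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of lowercase character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_all_sentence_pairs(src_sents: list[str], tgt_sents: list[str],
                             measures: dict[str, Measure]) -> dict:
    """Score every (source, target) sentence pair with every measure."""
    return {
        (i, j): {name: f(s, t) for name, f in measures.items()}
        for i, s in enumerate(src_sents)
        for j, t in enumerate(tgt_sents)
    }


# Hypothetical example measures
MEASURES: dict[str, Measure] = {
    "char3_cosine": lambda s, t: cosine(char_ngrams(s), char_ngrams(t)),
    "length_ratio": lambda s, t: min(len(s), len(t)) / max(len(s), len(t), 1),
}

scores = score_all_sentence_pairs(["Hello world", "Bye"], ["hello world", "x", "y"], MEASURES)
```

Scoring is quadratic in the number of sentences per article pair, which stays manageable because it is done one linked article pair at a time rather than across the whole corpus.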