Top 20 Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora

Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora

... Transliteration feature is used to characterize this ...of Chinese characters. It had two levels of transliteration, Chinese character to pinyin syllable and pinyin syllable to English letter ...

8
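
The two-level transliteration described in the snippet above (Chinese character to pinyin syllable, then pinyin syllable to English letter) can be illustrated with a minimal sketch. The table `CHAR_TO_PINYIN`, the function names, and the use of `difflib.SequenceMatcher` as the letter-level comparison are all assumptions made for illustration; the snippet does not say how the paper scores the second level.

```python
from difflib import SequenceMatcher

# Toy character-to-pinyin table (tones dropped). Hypothetical: a real system
# would use a full pronunciation lexicon.
CHAR_TO_PINYIN = {"伦": "lun", "敦": "dun", "巴": "ba", "黎": "li"}

def to_pinyin(chinese_name):
    """Level 1: map each Chinese character to a pinyin syllable."""
    return [CHAR_TO_PINYIN[ch] for ch in chinese_name]

def transliteration_score(chinese_name, english_name):
    """Level 2 (stand-in): compare pinyin letters with English letters.

    Uses a generic string-similarity ratio in place of a trained
    syllable-to-letter model.
    """
    pinyin_letters = "".join(to_pinyin(chinese_name))
    return SequenceMatcher(None, pinyin_letters, english_name.lower()).ratio()

score_match = transliteration_score("伦敦", "London")
score_other = transliteration_score("伦敦", "Paris")
print(score_match > score_other)
```

In this toy setup the pinyin string for 伦敦 shares more letters with "London" than with "Paris", so the matching candidate scores higher.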

Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora

... Named Entity recognition (NER) is an important part of many natural language processing ...of Named Entities (NEs) in a resource free language, given a bilingual corpora in which it is weakly ...

8

Kernel based Reranking for Named Entity Extraction

... Our main contribution is to show that (a) tree kernels can be used to define general features (not merely syntactic) and (b) using appropriate algorithms and features, reranking can be very effective for ...

9

Mining Large scale Comparable Corpora from Chinese English News Collections

... keyword extraction could be distinguished into two main categories: supervised or unsupervised ...keyword extraction such as Naïve Bayes (Frank et ...keyphrase extraction from single ...a ...

9

Named Entity Transliteration and Discovery from Multilingual Comparable Corpora

... Named Entity recognition (NER) is an important part of many natural language processing ...cover Named Entities (NEs) in a resource free language, given a bilingual corpora in which it is ...

7

Named Entity Transliteration with Comparable Corpora

... gual named entity identification, we are interested in the problem of name transliteration across languages that use different ...of named entities in “comparable” texts in multiple languages, ...

8

Evaluating Ensemble Based Pre annotation on Named Entity Corpus Construction in English and Chinese

... Annotated corpora are crucial language resources, and pre-annotation is a usual way to reduce the cost of corpus ...Ensemble based pre-annotation approach combines multiple existing named ...

5

A New Approach for English Chinese Named Entity Alignment

... extract Named Entity translingual equivalences based on the minimization of a linearly combined multi-feature ...require Named Entity Recognition on both the source side ...

8

ACCURAT Toolkit for Multi Level Alignment and Information Extraction from Comparable Corpora

... parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated ...exploit comparable corpora (non-parallel bi- or ...

6

EM based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora

... t(f|e) based on document alignment score (D), lexical similarity (L), named entity similarity (N), context similarity (C), temporal similarity (T), and related term similarity ...directly from ...

8

Applying Neural Networks to English Chinese Named Entity Transliteration

... For English to Chinese, the boundaries of transliteration units are required at the decoding ...The English source names in the test set need to be segmented before being passed to the neural net- ...

5

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

... French/English comparable corpus within the sub-domain of breast cancer in the medical ...the comparable corpus to discover parallel sentences and induced bilingual lexicon, the method could be ...

8

A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

... meanings. From this it can be inferred that if two words co-occur more frequently than expected in a corpus of one language, then their translations into another language will also co-occur more frequently ...

10
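
The phrase "co-occur more frequently than expected" in the snippet above can be made concrete with pointwise mutual information (PMI) over sentence-level co-occurrence counts. This is a minimal sketch under assumptions: PMI is one common association measure, not necessarily the one the paper uses, and the function name `cooccurrence_pmi` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_pmi(sentences):
    """Return PMI for each unordered word pair that co-occurs in a sentence.

    PMI > 0 means the pair co-occurs more often than expected if the two
    words were independent; PMI < 0 means less often.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence)  # count each word once per sentence
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(sentences)
    pmi = {}
    for pair, joint in pair_counts.items():
        a, b = tuple(pair)
        # Expected joint count under independence of the two words.
        expected = word_counts[a] * word_counts[b] / n
        pmi[pair] = math.log(joint / expected)
    return pmi

# Toy corpus (illustrative only).
corpus = [
    ["bank", "money"],
    ["bank", "money"],
    ["river", "water"],
    ["bank", "river"],
]
pmi = cooccurrence_pmi(corpus)
print(pmi[frozenset({"bank", "money"})] > 0)
```

Under the hypothesis in the snippet, word pairs with high association in one language would be expected to have translations that are also highly associated in the other language's corpus, so the same function could be run on each side of a comparable corpus and the results compared.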

Clinical Natural Language Processing in languages other than English: opportunities and challenges

... in English has shown international community efforts were a useful and efficient channel to benchmark and improve the state-of-the-art ...tion from Japanese clinical narratives to extract disease names ...

13

Bilingual lexicon extraction from comparable corpora using in domain terms

... In this paper, we present a method that is able to improve the accuracy significantly without requiring a large initial bilingual dictionary. Our approach is based on utilising highly associated terms in the ...

9

Quantitative analysis of translation revision: contrastive corpus research on native English and Chinese translationese

... linguistics such as word frequency lists, concordances and collocations and are explored in the special issue of the Meta journal (Laviosa, 1998) and in particular the innovative research of Munday (1998). Olohan (2004: ...

10

Bilingual Lexicon Extraction from Comparable Corpora as Metasearch

... been proposed (Déjean et al., 2002; Daille and Morin, 2005). This approach can be seen as a query reformulation process in IR for which similar words are substituted for the word to be translated. These similar words ...

9

Bootstrapping Entity Translation on Weakly Comparable Corpora

... translating named entities (NEs), such as persons, locations, or organizations, is a non-trivial ...are based on transliterations, as shown in Table 1—Some translations, especially the names of most ...

10

Building English Vietnamese Named Entity Corpus with Aligned Bilingual News Articles

... bilingual named entity recognition ...the English-Vietnamese pair, this task still presents a significant challenge in a number of important respects ...

9

Paraphrase Fragment Extraction from Monolingual Comparable Corpora

... After collecting the document pairs, we asked annotators, “Are these two documents about the same topic?”, and allowing them to answer “Yes”, “No”, and “Not sure”. Each set of six document pairs contained, four to be ...

9