Top 20 Mining New Word Translations from Comparable Corpora

Mining New Word Translations from Comparable Corpora

... parallel corpora to learn bilingual lexicons (Melamed, 1997; Moore, ...parallel corpora are scarce resources, especially for uncommon language ...pairs. Comparable corpora refer to texts ...

7

Robust Transliteration Mining from Comparable Corpora with Bilingual Topic Models

... We found that our model was more successful at finding the correct transliteration of longer words, as smaller words tend to have more spelling variations and are orthographically more similar to other words. By ...

9

Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora

... We found that sentence pairs with high alignment scores are not necessarily more similar than others. This might be due to the fact that EM estimation at each intermediate step is not reliable, since we only have a small ...

12

Learning the Optimal Use of Dependency-parsing Information for Finding Translations with Comparable Corpora

... a word can be described by the sentence in which it occurs (Laroche and Langlais, 2010) or a surrounding word-window (Rapp, 1999; Haghighi et ...cessors from the dependency-parse tree, instead of a ...

9

Using WordNet and Semantic Similarity for Bilingual Terminology Mining from Comparable Corpora

... all translations in the bilingual dictionary are relevant for the target context vector ...prominent translations of polysemous ...word’s translations senses. We hypothesize that a word is ...

8

PEXACC: A Parallel Sentence Mining Algorithm from Comparable Corpora

... obtained from 𝑠 by translation if 𝑠 and 𝑡 were to be presented together as a pair to a human ...𝑡 translations in the dictionary and/or the translations probabilities are small, the resulting (low) ...

8

Identifying Word Translations from Comparable Documents Without a Seed Lexicon

... apart from word segmentation and lemmatization (which improves results but is not essential) it does not require any linguistic proces- ...the new languages, as described in section ...

7

Unsupervised Word Sense Disambiguation Using Bilingual Comparable Corpora

... We evaluated our method through an experiment using corpora of English and Japanese newspaper articles. The first language was English and the second language was Japanese. A Wall Street Journal corpus (July, ...

7

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

... extraction from comparable corpora relies on the assumption that words which have the same meaning in different languages tend to appear in the same lexical contexts (Fung, 1998; Rapp, ...each ...

8

Extracting bilingual terminologies from comparable corpora

... one word on both source and target ...were word-word term pairs. 462 of these (i.e. 17% of the word-word term pairs or 9% of the overall set of aligned term pairs) were already in ...

10

Extracting Multiword Translations from Aligned Comparable Documents

... using comparable corpora relied on dictionaries of single ...constructed from the translations of its ...aligned comparable documents, thereby not presupposing any given ...multi- ...

9

Bilingual Terminology Mining - Using Brain, not brawn comparable corpora

... alignment from comparable corpora, good results on single words can be obtained from large corpora (several million words); the accuracy of proposed translation is about 80% for the ...

8

Word Co-occurrence Counts Prediction for Bilingual Terminology Extraction from Comparable Corpora

... ranging from text categories to linguistic structures to novel ...parable corpora are not available, one way to approach this problem and to make co-occurrence counts more reliable, is to use prediction ...

9

Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment

... a new method for unsupervised joint discovery of MWEs and their ...of word alignment for this task, but we are interested in seeing how the automatic discovery of MWEs can be performed without relying on ...

6

Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

... news corpora in any given ...news corpora in the world’s languages, points to the promise of mining transliteration pairs endlessly, provided an effective identification of such NEs in specific ...

8

Mining Large-scale Comparable Corpora from Chinese-English News Collections

... two standards are established: (a) the number of high relevant pairs created, which is the count of document pairs in Level 1 and 2; (b) the quality of the whole alignments, that is to say the percentage of alignment ...

9

Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge

... tifying word translations across comparable ...per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic ...probable word translations across languages in a ...

11

Identifying Word Translations from Comparable Corpora Using Latent Topic Models

... As our training corpus, we use the English-Italian Wikipedia corpus of 18,898 document pairs, where each aligned pair discusses the same subject. In order to reduce data sparsity, we keep only lemmatized noun forms ...

6

MINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora

... many translations were available for a source word, we considered only the top-4 trans- ...single word NETEs, in each pair of ...culled from Wikipedia interwiki links and were cleaned ...

9

Mining Multi-word Named Entity Equivalents from Comparable Corpora

... each word in the source document as a person, location, organization or ...a word can be constructed by concatenating strings from the acronym model, it is treated as an acronym, and the acronym ...

8