Using Features from a Bilingual Alignment Model in Transliteration Mining

Academic year: 2020


dtk0706@mail4.doshisha.ac.jp

andrew.finch@nict.go.jp

seyamamo@mail.doshisha.ac.jp


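\[
G \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0)
\]

\[
(s_k, t_k) \mid G \sim G
\]

Here $G_0$ is the base measure and $\alpha > 0$ is the concentration parameter. The probability of the $k$-th bilingual sequence pair $(s_k, t_k)$ given the other pairs $(s_{-k}, t_{-k})$ is

\[
p\big((s_k, t_k) \mid (s_{-k}, t_{-k})\big) = \frac{N((s_k, t_k)) + \alpha\, G_0((s_k, t_k))}{N + \alpha}
\]

where $N$ is the total number of pairs generated so far and $N((s_k, t_k))$ is the number of times the pair $(s_k, t_k)$ occurs among them.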
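As a minimal sketch of the Dirichlet process predictive probability, (N(pair) + α·G0(pair)) / (N + α), the following Python computes the score of a segment pair from a table of counts. The function names, the uniform base measure, and the example counts are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def predictive_prob(pair, counts, alpha, base_prob):
    """Probability of `pair` given the pairs counted in `counts`.

    counts    -- Counter mapping (source_seg, target_seg) to its count
    alpha     -- concentration parameter, must be > 0
    base_prob -- callable returning G0(pair); hypothetical here
    """
    n_total = sum(counts.values())
    # Counter returns 0 for unseen pairs, so new pairs fall back on G0.
    return (counts[pair] + alpha * base_prob(pair)) / (n_total + alpha)


def uniform_base(pair, num_possible_pairs=1000):
    """Assumed base measure: uniform over a fixed number of pairs."""
    return 1.0 / num_possible_pairs


# Illustrative counts only.
counts = Counter({("マイ", "mi"): 3, ("ケ", "cha"): 1})
p_seen = predictive_prob(("マイ", "mi"), counts, 1.0, uniform_base)
p_new = predictive_prob(("ル", "el"), counts, 1.0, uniform_base)
```

A pair that has already been generated often receives most of its mass from its count, while an unseen pair is scored only through the base measure, scaled by α.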


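[Figure: example co-segmentation of a Japanese character sequence and an English character sequence, with a model score for each segment pair (0.034, 0.012, 10e-12). Four features f1 to f4 are derived from the segmentation:

f1 = logprob / numsegs
f2 = |t| / |s|
f3 = (|sbad| + |tbad|) / (|s| + |t|)
f4 = minprob]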
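The quantities named in the figure (logprob, numsegs, |s|, |t|, |sbad| + |tbad|, minprob) can be combined into four features along the following lines. This is only a sketch under several assumptions: the pairing of quantities into ratios, the use of natural logarithms, the rule that a segment counts as "bad" when its probability falls below a hypothetical `bad_threshold`, and the example segments themselves are all illustrative and not confirmed by the source.

```python
import math


def segment_features(segments, bad_threshold=1e-6):
    """Compute four features from a co-segmented pair.

    segments      -- list of (source_seg, target_seg, prob) tuples
    bad_threshold -- hypothetical cutoff below which a segment is "bad"
    """
    s_len = sum(len(s) for s, _, _ in segments)
    t_len = sum(len(t) for _, t, _ in segments)
    log_probs = [math.log(p) for _, _, p in segments]
    # Total characters (both sides) that lie in low-probability segments.
    bad_len = sum(len(s) + len(t) for s, t, p in segments if p < bad_threshold)
    return {
        "f1": sum(log_probs) / len(segments),  # average segment log probability
        "f2": t_len / s_len,                   # target/source length ratio
        "f3": bad_len / (s_len + t_len),       # fraction of "bad" characters
        "f4": min(log_probs),                  # least likely segment
    }


# Hypothetical segmentation; the scores reuse the values shown in the figure.
example = [("アン", "an", 0.034), ("ド", "d", 0.012), ("リュー", "roid", 10e-12)]
features = segment_features(example)
```

In this example the third segment is far less likely than the others, so it dominates f4 and contributes all of the "bad" characters counted in f3.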


[Figure: system overview. Japanese and English Wikipedia titles are paired through interlanguage links (e.g. マイケル ジャクソン / Michael Jackson). Bilingual co-segmentation produces a segment file (e.g. マイ|mi ケ|cha ル|el -4.6 -7.3 - -5.1), from which features are extracted. An SVM is trained on seed pairs (positive examples) and negative examples, and a threshold on its output separates good pairs from bad pairs. The data is divided into train and test sets.]

Test pairs are a randomly sampled


[Figure: scatter plot; axes: log probability of the least likely segment, average log probability of the segments.]

[Figure: precision as a function of the SVM classification threshold (score); threshold range -1 to 1, precision range 0.8 to 1.]
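The relationship between the classification threshold and precision can be illustrated with a small sweep. The sketch below assumes each candidate pair has a real-valued classifier score and a gold label, and that a pair is accepted when its score is at or above the threshold; the scores and labels are made up for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when pairs with score >= threshold are accepted."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention assumed here: precision is 1.0 when nothing is accepted.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores and gold labels (True = transliteration pair).
scores = [0.9, 0.4, 0.1, -0.3, -0.8]
labels = [True, True, False, True, False]

loose = precision_recall_at(scores, labels, 0.0)
strict = precision_recall_at(scores, labels, 0.3)
```

Raising the threshold from 0.0 to 0.3 rejects the one false positive in this toy data, so precision rises while recall stays the same.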


[Figure: precision-recall curves for five language pairs (En-Ar, En-Ch, En-Hi, En-Ru, En-Ta). Each panel plots precision against recall for the proposed method, an LCSR baseline (lcsr40 or lcsr50, depending on the language pair), and a random baseline.]
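The `lcsr` baselines in the plots presumably refer to the longest common subsequence ratio, commonly defined as the length of the longest common subsequence divided by the length of the longer string. A minimal sketch follows. It assumes both strings are already in the same script (for example, after romanization), and reading `lcsr40` as an acceptance threshold of 0.40 is a guess rather than something the source states.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical romanized pair; 0.40 is an assumed reading of "lcsr40".
ratio = lcsr("jackson", "jakuson")
accepted = ratio >= 0.40
```

A baseline of this kind needs no training data, which is why it is a natural point of comparison for a learned classifier.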
