to Select the Best among Outputs from Multiple MT systems
Yasuhiro Akiba, Taro Watanabe and Eiichiro Sumita
ATR Spoken Language Translation ResearchLaboratories
2-2-2 Hikaridai,Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
fyasuhiro.akiba,taro.watanabe,[email protected]
Abstract
This paper addresses the problem of
automat-ically selecting the best among outputs from
multiple machine translation (MT) systems.
Existing approaches select the output assigned
thehighestscoreaccordingtoatargetlanguage
model. In some cases, the existing approaches
do not work well. This paper proposes two
methods to improve performance. The rst
method is based on a multiple comparison test
and checks whether a score from language and
translation models is signicantly higher than
the others. The second method is based on
probabilitythat a translation is notinferior to
the others, which is predicted from the above
scores. Experimentalresultsshowthatthe
pro-posedmethodsachievean improvementof 2 to
6 %inperformance.
1 Introduction
This paper addresses the challenging problem
of automatically selecting the best among
out-puts from multiple machine translation (MT)
systems (Figure 1). In combinations of
multi-ple MTsystems, somecomponent MTsystems
cantranslateasourcesentencewellwhileothers
cannot well. In sucha case,correct selectionof
thebestcanobviously boostperformance.
ATR has been developing such multiple MT
systems, including three Japanese-to-English
(J-E) MT systems: TDMT (Furuse and Iida,
Outputa
Outputb
Outputc
The Best Output
Selection System
MTa
MTb
MTc
Input
Figure 1: Theselection system.
1996),D3(Sumita,2001),andSMT(Watanabe
etal.,2002),andthree English-to-Japanese
(E-J)MTsystems: TDMT(FuruseandIida,1996),
HPAT (Imamura, 2002), and SMT (Watanabe
etal.,2002). InordertoevaluateeachMT
sys-tem, the MT outputs were manually assigned
one of four ranks 1
, A, B, C, and D, by
na-tive speakersof the target language. The ideal
selection for J-E MT systems is the
highest-rankedoutputsfromthethreeJ-EMTsystems:
TDMT, D3, and SMT. The ideal selection for
E-J MT systems is the highest-ranked outputs
fromthethreeE-JMTsystems: TDMT,HPAT,
andSMT.
Figure 2 shows the individual performances
of the three J-E MT systems and the ideal
se-lection system derived from their combination.
Figure 3 shows the individual performances of
the three E-J MT systems and the ideal
se-lection system derived from their combination.
The left-hand group of bars indicates the
ra-1
The fourranksare dened as follows: (A) Perfect:
noproblemineitherinformationor grammar;(B)Fair:
easy to understand, with either some unimportant
in-formationmissingor awed grammar; (C) Acceptable:
broken,but understandable with eort; (D) Nonsense:
importantinformationhasbeentranslatedincorrectly.
0
10
20
30
40
50
60
70
80
90
100
A
A+B
A+B+C
Performance (%)
Ideal Selection System
TDMT
D3
SMT
sys-thetotalnumberofsentencestranslatedbyeach
MT system (hereafter, performance for Rank
A).Themiddlegroupofbarsindicatestheratio
ofthenumberofsentencesrankedasAorBto
thetotalnumberofsentencestranslatedbyeach
MT system (hereafter, performance for Rank
A+B). The right-hand group of bars indicates
the ratio of the numb er of sentences ranked as
A, B, or C to the total numb er of sentences
translated by each MT system (hereafter,
per-formance for Rank A+B+C). The black bars
indicate the performance of the ideal selection
system. As Figures 2 and 3 show, the
perfor-manceoftheidealJ-EandE-Jselectionsystem
ismuchbetterthanthatofeachcomponentMT
system.
Conventional approaches to the selection
problem include methods (Callison-Burch and
Flournoy, 2001; Kaki et al., 1999) that
auto-maticallyselecttheoutputassignedthehighest
probability P(t) (hereafter, LM-score),
accord-ing to a language model (LM) for the
trans-lation target language. As a preliminary
ex-periment, theauthors appliedthis LM-score to
selecting the best among theoutputs from the
threeJ-EMTsystems. Inordertomakea
com-parison, theauthors also useda scorebasedon
a translation model (TM)called IBM4 (Brown
et al., 1993) (hereafter, TM-score) and a score
based on the product of theTM-score and the
LM-score (hereafter, TM3LM-score) to select
the best output. Table 1 shows the results of
thispreliminaryexperiment. Theoating
num-ber indicates the dierence between the
per-formance for Rank A of each selection system
and that ofD3 (the best MTsystem, i.e., with
thehighestperformanceforRankA). The
LM-score and TM-score based selections did not
0
10
20
30
40
50
60
70
80
90
100
A
A+B
A+B+C
Performance (%)
Ideal Selection System
TDMT
HPAT
SMT
Figure3: Performanceoftheidealselection
sys-Table 1: Dierence in performance forRank A
betweeneach selectionsystemand D3.
Scoringmethod TM3LM TM LM
Dierence
inperformance 4.1 -1.5 -0.5
boost/improve the performance for Rank A,
whereas the TM3LM-score did. The
prelimi-nary experiment appears to indicate that the
TM3LM-score works better than the LM-score
inselectingthebestoutput.
Ascanbeeasilyguessed, thescoresfrom
lan-guagemodel,translationmodel,orbothmodels
combined hastwo problems. The rst problem
isthatTM3LM-score,TM-score,and LM-score
are statistical variables. Even in the case that
TMandLMaretrainedonacorpusofthesame
size, changing the training corpus also changes
theTM-score, theLM-score, and the
TM3LM-score. Figure 4 shows this phenomenon. Inthe
gure, TRN1, TRN2, and TRN3 correspond,
respectively, to (2), (3), and (4) below, which
aretranslationsof(1) bydierent J-EMT
sys-tems.
(1)
I-subj
Konputa-no
computer-of
shisutemu
system
enjinia
engineer desu
am
\I'ma computerengineer."
(2) I'macomputer systemsengineer. (TRN1)
(3) I'macomputer salesman. (TRN2)
(4) It'scomputer. (TRN3)
LMand TMwere trainedintenwayson
train-ing sets, which were subsets of ATR
broad-coveragebilingualbasicexpression(BE) corpus
(Takezawa et al., 2002), according to ten-fold
crossvalidation(Mitchell,1997). Foreachi(i=
-35
-30
-25
-20
-15
-10
-5
0
1
2
3
4
5
6
7
8
9
10
Pairs of LMs and TMs
Logarithmic scaled score
TRN1
TRN2
TRN3
Each matrix shows agreement and disagreement between the ideal selection and the score-based
selectionusing theTM3LM-score. 1 and 0 indicates\selected"and \notselected", respectively.
TM3LM-Score-BasedSelection
Ideal TDMT D3 SMT
Selection 1 0 1 0 1 0
1 22.7 44.5 37.3 23.7 31.4 5.5
0 1.0 31.8 7.1 32.0 36.7 26.5
1,2,3),TRNiisscoredintenwaysby
TM3LM-score. Some TM3LM-scoresplaceTRNi(i=1,
2,3)inadierentorder. Evenifa hugecorpus
is prepared to train a good TM and LM, this
phenomenonremains.
Inordertosolvethisrstproblem,thispaper
propose a statistical-test-based selection
sys-tem. Here, the statistical test used is a
multi-plecomparisontestbasedontheKruskal-Wallis
test (Hochb erg and Tamhane, 1983). The
pro-posedmethodcheckswhether thehighestscore
issignicantlydierent from theothers.
The second problem is that the translations
with the highest TM3LM-score tend to dier
from those ranked highest by human
evalua-tors. Table 2 shows this phenomenon. The
Table consists of three 222 confusion
matri-cesforthreeJ-E MTsystems: J-ETDMT,D3,
and J-E SMT. Each matrix shows agreement
and disagreement between the ideal selection
bya human evaluator and the selection by the
TM3LM-score. The(1,1)-elementandthe
(0,0)-element indicate the percentage of agreement,
and the(1,0)-elementand the(0,1)-element
in-dicate the percentage of disagreement. In the
confusion matrix for J-E SMT, the number in
the(1,0)-elementislargerthanthatinthe
(0,1)-element. This means that the TM3LM-score
tends to give the highest score to the
transla-tion from J-E SMT when it is not the
transla-tionassignedthebestrank. Ontheotherhand,
in the confusion matrices for J-E TDMT and
D3, the number in the (0,1)-element is larger
thanthatinthe(1,0)-element. Thismeansthat
theTM3LM-scoretendsnottogivethehighest
score to the translation from the MT systems,
except forJ-E SMT, even ifthat translation is
assigned thebest rank.
Tosolvethissecondproblem,this paper
pro-poses a selection system based on the
condi-tional probability that a translation is not
in-feriorto theother translationswhen the
trans-lation encoded by using the TM3LM-score,
conditions. For each MT system, the
condi-tionalprobabilityislearnedasaregression tree
(Chambers and Hastie, 1992; Breiman et al.,
1984) from the vector-encoding of the
transla-tionslabeledas \notinferior"or\inferior".
The next section presents our two proposed
methods. Experimental results are shown and
discussedinSection 3. Finally, ourconclusions
arepresentedinSection 4.
2 Proposed Method
2.1 Proposed Method (1)
TosolvetherstproblemdescribedinSection1,
this Subsection proposes amethodthat selects
thetranslationaccording towhether thescores
of outputs from each MT system signicantly
dierfrom eachother.
In order to detect a signicant dierence,
the proposed method rst prepares multiple
subsets of the full parallel corpus according
to k-fold cross validation (Mitchell, 1997) and
trains both TM and LM on each subset. For
example, the full parallel corpus C is
di-vided into ten subsets V
i
(i = 0; 1; 111; 9).
For each i(i = 0; 1; 111; 9), the proposed
method trains a translation model TM
i on
C
i
(= C 0 V
i
) anda languagemodelLM
i on
thetarget-languagepart ofC
i
(Figure5).
Hereafter,letL
i
(t)denoteTri-gramstatistics
ofatranslationtbyusingLM
i
. Also,letT
i (s;t)
denote P
a2S
P(s;ajt), where s is a translation
source sentence, t is a translation target
sen-tence, and S is the alignment set 2
(Brown et
2
Notethat the denitionofS is changeddepending
C
1
C
9
C
10-fold Cross Validation
…..
TM
1
LM
1
…..
LM
9
TM
9
C
0
TM
0
LM
0
Parallel corpus
Figure5: Themethodtotrainmultiplepairsof
al.,1993) that includes thebestalignment, the
neighboring alignments, and the pegged
align-ments.
The proposed method scores each output
in k ways. In the example, as shown in
Figure 6, the proposed method scores
trans-lation t
a
from translation system MT
a
translations are scored by using a
lan-guage model of the translation target
lan-guage (Figure 6). On the other hand,
the proposed method scores translation t
a
when the translations are scored by
us-ing a translation model. Whereas, the
proposed method scores translation t
a
),whenthetranslationsare
scored byusing the products of the scores of a
languagemodel and atranslation model.
Then the proposed method compares
the means of the scores. In the example
(Figure 6),
)=10 are compared when
the translations are scored by using a
language model of the translation target.
P
)=10 are compared when the
translations are scored by using a
trans-lation model.
)=10 are compared when
the translations are scored by using the
prod-ucts of the scores of a language model and a
translation model.
The proposed method checks whether the
highest mean is signicantlydierent from the
ontheTM-trainingalgorithmsused.
(s,t
a
)
LM
0
Q:Which pair of averages of scores significantly different?
A:Apply Multiple Comparison Test based on Kruskal-Wallis Test (#)
Figure 6: Preparation formultiple comparison
othersbyusingamultiplecomparisontest with
the Kruskal-Wallis test 4
, which is known as a
Tukey-Kramer-type modication of the Dunn
test (Hochberg and Tamhane, 1983). If the
highest mean is signicantlydierent, the
pro-posed method selects the translation with the
highest score. If not, the proposed method
se-lectsfrom among thetranslations whosescores
are not signicantly dierent from the highest
score the translation from the MT whose
per-formanceis thebest.
2.2 Proposed Method (2)
To solve the second problem described in
Sec-tion1,thisSubsection proposes aselection
sys-tem based on the conditional probability that
a translation is not inferior to other
transla-tionswhenthetranslationencodedbyusingthe
TM3LM-score, the TM-score, or the LM-score
satisessomeconditions. ForeachMT system,
the conditional probability is learned as a
re-gression tree from the vector-encoding of the
translations labeled as \not inferior" or
\infe-rior",whichisthe criterionvariable.
In order to learn the conditional probability
mentioned above, translations from the
com-ponent translation systems, MT
a
, are ranked by human evaluators in
ad-vance (Figure 7). Let r
a
denote the rank
as-signedtotranslation t
a
from MTsystem MT
a .
Also,letr
best
denotethebestrankamongr
a ,r
b ,
3
It iswellknownthatrepeatingasimple t-test
mul-tipletimesincreases thechanceofincorrectlynding a
signicant dierence. Multiple comparison is designed
toavoidsuchaphenomenon.
4
TheKruskal-Wallistestisanon-parametricone-way
Analysisof Variance (ANOVA).This test doesnot
as-sumethedatadistribution.
(s,t
a
,r
a
)
to t
a
by human evaluator
(T(s,t
a
)* L(t
a
), T(s,t
a
), L(t
a
), R
a
(t
a
))
RT Learner
with pruning or shrinking
RT Learner
with pruning or shrinking
RT Learner
with pruning or shrinking
C
TM
LM
Figure7: Preparationforregression tree
c i i i i
is equal to 1 if r
i = r
best
; otherwise R
i (t
i ) is
equalto0. ThereforewhenR
a (t
a
)isequalto1,
the transaction t
a
is superior to or as good as
theother translations.
The proposed method trains a translation
model TM on the full parallel corpus C and a
language model LM on the
translation-target-languagepart ofC (Figure7).
Hereafter, let L(t) denoteTri-gramstatistics
of atranslation tbyusing LM.Also, letT(s;t)
denote P
a2S
P(s;ajt), where s is a translation
source sentence, t is a translation target
sen-tence,andS is thealignmentset(Brownetal.,
1993).
Next, the proposed method encodes three
vectors (s; t
a
(Figure 7) into three score-vectors with
non-inferiority or inferiority, respectively:
(T(s;t
For each MT
i
(i=a; b; or c),
the proposed method learns from
f(T(s;t
the conditional probability, which is expressed
by the regression tree (Chamb ers and Hastie,
1992; Breiman et al., 1984) RT
i
(Figure 7).
Regressiontree(RT)learnerisknownas
recur-sive binary partitioning. In growinga tree, an
RT learner recursively splits the training data
in each node so as to reduce variance within
partitions asmuchaspossible.
Ingeneral, thelearnedRTover-tsthe
train-ing data. As post-processing, the learned RT
is simplied by using two procedures:
prun-ingand shrinking(Chambers andHastie,1992;
Breiman et al., 1984). Pruning successively
snips o the least important splits. The
Im-portance of a rooted subtree is determined by
thecost-complexitymeasure,D
k
) denotesthedeviance
ofthesubtreeT 0
,size(T 0
)isthenumb erof
ter-minal nodes of T 0
, and k is a cost-complexity
parameter. Shrinking reduces the
num-(s,t
a
)
Q: Which translation is selected?
A: The translation with the maximal score (#)
Figure8: Themethodtoselectatranslationby
value of each node towards its parent node.
Shrunkentted values,forashrinking
parame-terk,arecomputedaccording totherecursion,
^
y(node) = k3 y(node)~ +(10k) 3y(parent);^
wherey(node)~ denotestheusualttedvaluefor
a node, y(parent)^ is the shrunken tted value
forthe node's parent, and k is a shrinking
pa-rameter such that 0 < k < 1. The
param-eter k in each of the two procedures is xed
so as to minimize cross-validation estimates of
the deviance. Therefore, after growing each
R T
i
(i=a; b; or c), theproposedmethod
per-formsoneof thesimplied procedures of
prun-ingand shrinking.
In the selection phase, the proposed method
encodes three pair of a source sentence and its
translation,(s;t
a ), (s;t
b
), and (s;t
c
) into three
vectors (T(s;t
The proposed method predicts the
condi-tional probability that each t
i
(i=a, b,orc) is
notinferior tothe others by using RT
i
and
se-lects 5
the translation with the highest
condi-tionalprobability.
3 Experimental Comparison
3.1 ExperimentalMethod
Theauthorsevaluatedtheproposedmethodsin
order to answer the following question: Which
selectionsystemimprovesperformancebest in
comparisonwiththatofthebestMTsystemi.e.
the MT systems with the highest performance
asshowninFigures2 and 3?
Inordertoanswertheabovequestion,the
au-thorsusedasetofthreeJ-EcomponentMT
sys-tems(TDMT,D3,andSMT)and asetofthree
E-J component MT systems (TDMT, HPAT,
andSMT).BilingualEnglishandJapanesedata
were from ATR broad-coverage bilingual basic
expression(BE)corpus(Takezawaetal.,2002),
which is split into three parts: a training set
of 125,537 sentence pairs, a verication set of
9,872pairs, and atest set of10,023 pairs.
Thefull corpusC intraining translation
tar-getlanguagemodelandtranslationmodelisthe
trainingset. Tensubsetsofthefullcorpus were
5
PreparingmultipleRTsforeachcomponentMT
sys-temenablesthesecondmethodtobeextendedsoas to
selectthebestoutputaccordingtothemultiple
lationmodelandlanguagemodelarelearnedby
using GIZA++ (Och and Ney, 2000) and the
CMU-CambridgeTo olkit(Clarksonand
Rosen-feld, 1997),respectively. Thetranslationmodel
islearnedfromIBM 1to4,includingtheHMM
model, as suggested by Och and Ney (2000),
and its training loop was terminated when the
perplexity for the validation set indicated the
lowest scores. The word classes used in TM
learningarethepart-of-speech(POS)classesin
TDMT.TheP-valueusedforthemultiple
com-parisontest is 0.05.
Four setsof aboutve hundred pairs of
En-glishandJapanesesentenceswererandomly
se-lectedfrom the test set. The Englishsentences
inthefoursetsweretranslatedbytheE-J
com-ponent MT systems and ranked by a native
speakerofJapanese; likewise theJapanese
sen-tencesinthefoursetsweretranslatedbytheJ-E
componentMTsystemsandrankedbyanative
speakerof English. Each performancewas
cal-culated asthe average of the performance over
thefour sets. In particular,theperformanceof
the second proposed method is calculated
ac-cording to four-fold cross validation (Mitchell,
1997).
3.2 Experimental Results
In order to evaluate the point mentioned at
the beginning of Section 3.1, the authors
com-paredtheperformanceof eachselection system
with thatof thebest MTsystem. Asshown in
Figure 2, among the J-E component MT
sys-tems, D3 had the best performance for Rank
A, and TDMT had the best performance for
bothRankA+B(equaltoorbetterthanB)and
Rank A+B+C (equal to orbetterthan C). As
shown in Figure 3, among the E-J component
MTsystems, TDMThad thebestperformance
for Rank A, Rank A+B, and Rank A+B+C.
Figures 9, 10, and 11 show the results of the
comparisons. The vertical axis in each gure
shows thedierencein theperformances.
Each bar corresponds to a selection system.
The rst three bars in left-to-right order
cor-respond toTM3LM-score-basedselection,
TM-score-based selection, and LM-score-based
se-lection, which were usedin thepreliminary
ex-periment described in Section 1. The next
three bars correspond to the rst proposed
methodbasedonTM3LM-score,TM-score,and
LM-score. The next three bars correspond to
the second proposed method in which
predic-bothTM3LM-scoreandTM-score,toallscores,
LM3TM-score, TM-score, and TM3LM-score.
Inthese selection methods,theregression trees
aresimpliedbyusingtheshrinkingprocedure.
The lastthree bars also correspond to the
sec-ond proposed method, but in these selection
methods, the regression trees are simplied by
using the pruning procedure. Accuracy means
thepercentageofcorrectly selectingtheoutput
assignedthehighest rank inalltrials.
Figure9showsthattherstproposedsystem
based on TM3LM-score achieved the greatest
improvementofalittleunder6%,inthe
perfor-manceforRankA.Ontheotherhand,the
exist-ingselectionsystemsimply using theLM-score
(languagemodelofthetranslationtarget)could
notimproveandevendegradedperformancefor
RankA.
Figure10showsthatthesecondproposed
sys-temwith thepruning procedure based on both
TM3LM-score and TM-score (marked
RT12-PRN in the graph) achieved the greatest
im-provement of about 5%, for Rank A+B (equal
toor higher than B). On the other hand,
Fig-ure10 shows that theexisting selectionsystem
-16
-14
-12
-10
-8
-6
-4
-2
0
2
4
6
8
10
12
14
16
18
Difference in performance (%)
LM*TM
TM
LM
LM*TM+KWMC
TM+KWMC
LM+KWMC
RT1+SHR
RT12+SHR
RT123+SHR
RT1+PRN
RT12+PRN
RT123+PRN
A
A+B
A+B+C
Accuracy
Figure 9: Dierence in performance between
eachselection systemand D3.
-16
-14
-12
-10
-8
-6
-4
-2
0
2
4
6
8
10
12
14
16
18
Difference in performance (%)
LM*TM
TM
LM
LM*TM+KWMC
TM+KWMC
LM+KWMC
RT1+SHR
RT12+SHR
RT123+SHR
RT1+PRN
RT12+PRN
RT123+PRN
A
A+B
A+B+C
Accuracy
formanceforRankA+B,witha degradationof
6%.
Figure11showsthatthesecondproposed
sys-tems, with eitherthe pruning procedure or the
shrinking procedure, based on only
TM3LM-score, onbothTM3LM-scoreand TM-score, or
onall scoresachieved animprovementofabout
2% for rank A in all cases. Figure 11 also
showsthatthesecondproposedsystemwiththe
shrinkingprocedurebasedonallscores(marked
RT123-SHRinthegraph)achievedan
improve-mentof a littlemore than2%forRank A+B.
4 Conclusions
Thispaperaddressedthechallengingproblemof
automaticallyselectingthebestamongoutputs
from multiple MT systems to improve
transla-tionquality. Thispaperproposedtwomethods.
The rst method is based on a multiple
com-parison test based on the Kruskal-Wallis test
and checks whether the highest score from the
languagemodel, thetranslation model, orboth
models combined is signicantly dierent from
theothers. Thesecondmethodisbasedon
con-ditionalprobabilitythatatranslation isnot
in-feriortotheotherswhenthetranslationsatises
someconditions. The conditionalprobabilityis
predicted bya regression tree learned from the
abovescores. The proposedmethodswere
eval-uatedusing anATRtravelcorpus.
Experimen-tal results showed that the performance of the
proposed methods is much better than that of
theexistingmethodsandachievedthe
improve-mentof 2 to6%in performance.
Acknowledgment
This research was supported in part by the
Telecommunications Advancement
Organiza-tion ofJapan.
-16
-14
-12
-10
-8
-6
-4
-2
0
2
4
6
8
10
12
14
16
18
Difference in performance (%)
LM*TM
TM
LM
LM*TM+KWMC
TM+KWMC
LM+KWMC
RT1+SHR
RT12+SHR
RT123+SHR
RT1+PRN
RT12+PRN
RT123+PRN
A
A+B
A+B+C
Accuracy
Figure 11: Dierence in performance between
LeoBreiman,Jerome H.Friedman,RichardA. Olshen,
and Charles J. Stone. 1984. Classication and
Re-gressionTrees. ChapmanandHall,California, USA.
PeterF.Brown,StephenDellaPietra, VincentJ.Della
Pietra,andRobertL. Mercer. 1993. The
mathemat-icsofstatisticalmachinetranslation: Parameter
esti-mation. ComputationalLinguistics,19(2):263{311.
ChrisCallison-Burchand Raymond S.Flournoy. 2001.
Aprogramforautomaticallyselectingthebestoutput
from multiple machine translation engines. In
Pro-ceedingsofMTSummitVIII,pages63{66.
John M. Chamb ers and Trevor J. Hastie. 1992.
Sta-tisticalModelsinS. Wadsworth&Brooks/Cole
Ad-vancedBooks&Software,ADivisionofWadsworth,
Inc.,California, USA.
PhilipClarksonandRonaldRosenfeld. 1997. Statistical
languagemodelingusingtheCMU-Cambridgetoolkit.
InProceedingsoftheEuropeanConferenceonSpeech
CommunicationandTechnology: EUROSPEECH-97,
pages2707{2710.
Osamu Furuse and Hitoshi Iida. 1996. Incremental
translation utilizing constituent boundary patterns.
InProceedingsofthe16thInternationalConferenceon
ComputationalLinguistics: COLING-96,pages412{
417.
Y. Hochb erg andA.C.Tamhane. 1983. Multiple
Com-parisonProcedures.Wiley,NewYork,USA.
KenjiImamura. 2002. Applicationoftranslation
knowl-edge acquired by hierarchical phrase alignment for
pattern-based mt. InProceedingsof the9th
Confer-enceonTheoreticalandMethodologicalIssuesin
Ma-chineTranslation,pages74{84.
Satoshi Kaki, Setsuo Yamada, and Eiichiro Sumita.
1999. Scoring multiple translations using
charac-ter n-gram. In Proceedings of the 5th Natural
Lan-guage Processing Pacic Rim Symposium:
NLPRS-99,pages298{302.
Tom M. Mitchell. 1997. Machine Learning. The
McGraw-HillCompaniesInc.,NewYork,USA.
FranzJosefOchandHermannNey. 2000. Improved
sta-tisticalalignmentmodels. InProceedingsof the38th
AnnualMeetingoftheAssociationforComputational
Linguistics: ACL00,pages440{447.
EiichiroSumita. 2001. Example-based machine
trans-lation using DP-matching between work sequences.
InProceedingsof the ACL2001 Workshop on
Data-Driven Methods in Machine Translation:
DDMT-2001,pages1{8.
ToshiyukiTakezawa,EiichiroSumita,FumiakiSugaya,
Hirofumi Yamamoto, and Seiichi Yamamoto. 2002.
Towardabroad-coveragebilingualcorpus forspeech
translation of travel conversations inthe real world.
InProceedings of Third International Conference on
Language Resources and Evaluation: LREC2002,
pages147{152.
Taro Watanab e, Kenji Imamura, and Eiichiro Sumita.
2002. Statistical machine translation system based
on hierarchical phrase alignment. In Proceedings of
the9thConferenceonTheoreticalandMethodological