Sonja Nießen and Hermann Ney
Lehrstuhl für Informatik VI
Computer Science Department
RWTH – University of Technology Aachen
D-52056 Aachen, Germany
Email: [email protected] chen .de
Abstract
In the framework of statistical machine translation (SMT),
correspondences between the words in the source and the target
language are learned from bilingual corpora on the basis of so-called
alignment models. Many of the statistical systems use little or no
linguistic knowledge to structure the underlying models. In this
paper we argue that training data is typically not large enough to
sufficiently represent the range of different phenomena in natural
languages and that SMT can take advantage of the explicit
introduction of some knowledge about the languages under
consideration. The improvement of the translation results is
demonstrated on two different German-English corpora.
1 Introduction
In this paper, we address the question of how morphological and
syntactic analysis can help statistical machine translation (SMT). In
our approach, we introduce several transformations to the source
string (in our experiments the source language is German) to
demonstrate how linguistic knowledge can improve translation results,
especially in cases where the token-type ratio (number of training
words versus number of vocabulary entries) is unfavorable.
After reviewing the statistical approach to machine translation, we
first explain our motivation for examining additional knowledge
sources. We then present our approach in detail. Experimental results
on two bilingual German-English tasks are reported, namely the
Verbmobil and the EuTrans task. Finally, we give an outlook on our
future work.
2 Statistical Machine Translation
The goal of the translation process in statistical machine
translation can be formulated as follows: A source language string
$f_1^J = f_1 \ldots f_J$ is to be translated into a target language
string $e_1^I = e_1 \ldots e_I$. In the experiments reported in this
paper, the source language is German and the target language is
English. Every English string is considered as a possible translation
for the input. If we assign a probability $\Pr(e_1^I \mid f_1^J)$ to
each pair of strings $(e_1^I, f_1^J)$, then according to Bayes'
decision rule, we have to choose the English string that maximizes
the product of the English language model $\Pr(e_1^I)$ and the string
translation model $\Pr(f_1^J \mid e_1^I)$.
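Written out, with $f_1^J$ denoting the source string and $e_1^I$ a candidate target string, the decision rule takes the following form. The hat on the chosen string is notation introduced here for illustration only.

```latex
\hat{e}_1^I
  = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)
  = \operatorname*{argmax}_{e_1^I} \bigl\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \bigr\}
```

The second equality follows from Bayes' theorem, because the denominator $\Pr(f_1^J)$ does not depend on the candidate $e_1^I$ and can be dropped from the maximization.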
Many existing systems for SMT (Wang and Waibel, 1997; Nießen et al.,
1998; Och and Weber, 1998) make use of a special way of structuring
the string translation model (Brown et al., 1993): The correspondence
between the words in the source and the target string is described by
alignments that assign one target word position to each source word
position. The probability of a certain English word to occur in the
target string is assumed to depend basically only on the source word
aligned to it. It is clear that this assumption is not always valid
for the translation of natural languages. It turns out that even
those approaches that relax the word-by-word assumption, like (Och et
al., 1999), have problems with many phenomena typical of natural
languages in general and German in particular, like
- idiomatic expressions;
- compound words that have to be translated by more than one word;
- long range dependencies like prefixes of verbs placed at the end
  of the sentence;
[...] sources mentioned above are trained on bilingual corpora.
Bearing in mind that more than 40% of the word forms have only been
seen once in training (see Tables 1 and 4), it is obvious that the
phenomena listed above can hardly be learned adequately from the data
and that the explicit introduction of linguistic knowledge is
expected to improve translation quality.
The overall architecture of the statistical translation approach is
depicted in Figure 1. In this figure we already anticipate the fact
that we will transform the source strings in a certain manner. If
necessary we can also apply the inverse of these transformations on
the produced output strings. In Section 3 we explain in detail which
kinds of transformations we apply.
[Figure 1 (diagram): Source Language Text -> Transformation -> Global
Search: maximize Pr(e_1^I) * Pr(f_1^J | e_1^I) over e_1^I, with the
Lexicon Model and the Alignment Model supplying Pr(f_1^J | e_1^I) and
the Language Model supplying Pr(e_1^I) -> Transformation -> Target
Language Text]
Figure 1: Architecture of the translation approach based on Bayes'
decision rule.
3 Analysis and Transformation of
the Input
As already pointed out, we used the method of transforming the input
string in our experiments. The advantage of this approach is that
existing training and search procedures did not have to be adapted to
new models incorporating the information under consideration. On the
other hand, it would be more elegant to leave the decision between
different readings, for instance, [...] adequate for the preliminary
identification of those phenomena relevant for improving the
translation results.
3.1 Analysis
We used GERTWOL, a German Morphological Analyser (Haapalainen and
Majorin, 1995), and the Constraint Grammar Parser for German, GERCG,
for lexical analysis and morphological and syntactic disambiguation.
For a description of the Constraint Grammar approach we refer the
reader to (Karlsson, 1990). Some preprocessing was necessary to meet
the input format requirements of the tools. In the cases where the
tools returned more than one reading, either simple heuristics based
on domain-specific preference rules were applied or a more general,
non-ambiguous analysis was used.
In the following subsections we list some transformations we have
tested.
3.2 Separated German Verb Prefixes
Some verbs in German consist of a main part and a detachable prefix
which can be shifted to the end of the clause, e.g. "losfahren" ("to
leave") in the sentence "Ich fahre morgen los.". We extracted all
word forms of separable verbs from the training corpus. The resulting
list contains entries of the form prefix|main. The entry "los|fahre"
indicates, for example, that the prefix "los" can be detached from
the word form "fahre". In all clauses containing a word matching a
main part and a word matching the corresponding prefix part occurring
at the end of the clause, the prefix is prepended to the beginning of
the main part, as in "Ich losfahre morgen."
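The prefix transformation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the token-list input, the punctuation set, and the handling of capitalization are all hypothetical, and clause segmentation is assumed to have been done beforehand.

```python
# Hypothetical sketch: re-attach a clause-final separated verb prefix.
PUNCTUATION = {".", ",", "!", "?", ";", ":"}  # assumed clause-final marks


def load_separable(entries):
    """Parse list entries of the form 'prefix|main' into (prefix, main) pairs."""
    return {tuple(entry.split("|", 1)) for entry in entries}


def prepend_separated_prefix(tokens, separable):
    """Return the clause with a clause-final prefix prepended to its verb form."""
    tokens = list(tokens)
    # Skip trailing punctuation to find the last real word of the clause.
    end = len(tokens)
    while end > 0 and tokens[end - 1] in PUNCTUATION:
        end -= 1
    if end < 2:
        return tokens
    prefix = tokens[end - 1].lower()
    for i in range(end - 1):
        main = tokens[i]
        if (prefix, main.lower()) in separable:
            merged = prefix + main.lower()
            if main[0].isupper():
                merged = merged.capitalize()
            # Replace the main part and drop the detached prefix.
            return tokens[:i] + [merged] + tokens[i + 1:end - 1] + tokens[end:]
    return tokens


separable = load_separable(["los|fahre"])
print(prepend_separated_prefix("Ich fahre morgen los .".split(), separable))
```

For the example from the text, the call turns the tokens of "Ich fahre morgen los ." into those of "Ich losfahre morgen .". Clauses without a matching prefix at the end are returned unchanged.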
3.3 German Compound Words
German compound words pose special problems to the robustness of a
translation method, because the word itself must be represented in
the training data: the occurrence of each of the components is not
enough. The word "Früchtetee", for example, cannot be translated
although its components "Früchte" and "Tee" appear in the training
set of EuTrans. Besides, even if the compound occurs in training, the
training algorithm may not be capable of translating it properly as
two words (in the mentioned case the align- [...] components.
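To make the idea of compound splitting concrete, the following Python sketch splits an unknown word into two parts that both occur in the known vocabulary. This is an assumption made purely for illustration: the surviving text does not describe the splitting procedure, and the analysis in the experiments comes from the morphological tools of Section 3.1. The function name, the minimum part length, and the capitalization of the parts (which presumes noun compounds) are hypothetical.

```python
# Hypothetical sketch: vocabulary-based two-way compound splitting.
def split_compound(word, vocabulary, min_len=3):
    """Return [head, tail] if word is a concatenation of two known words.

    Known words and words without a valid split are returned as [word].
    """
    known = {entry.lower() for entry in vocabulary}
    lower = word.lower()
    if lower in known:
        return [word]
    for cut in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:cut], lower[cut:]
        if head in known and tail in known:
            # Assumption: both parts are nouns and therefore capitalized.
            return [head.capitalize(), tail.capitalize()]
    return [word]


print(split_compound("Früchtetee", {"Früchte", "Tee"}))
```

With the vocabulary {"Früchte", "Tee"}, the unseen word "Früchtetee" is split into its two components, each of which can then be translated on its own.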
3.4 Annotation with POS Tags
One way of helping the disambiguation of ambiguous words is to
annotate them with their part of speech (POS) information. We chose
the following very frequent short words that often caused errors in
translation for Verbmobil:
- "aber" can be adverb or conjunction.
- "zu" can be adverb, preposition, separated verb prefix or
  infinitive marker.
- "der", "die" and "das" can be definite articles or pronouns.
The difficulties due to these ambiguities are illustrated by the
following examples: The sentence "Das würde mir sehr gut passen." is
often translated by "The would suit me very well." instead of "That
would suit me very well." and "Das war zu schnell." is translated by
"That was to fast." instead of "That was too fast.".
We appended the POS tag to these words in the training and test
corpus for the Verbmobil task (see 4.1).
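A selective annotation of this kind might look as follows in Python. The sketch assumes that a disambiguated (word, tag) sequence is already available from an external tagger. The tag names, the underscore separator, and the function name are hypothetical and are not taken from the tools used in the experiments.

```python
# Hypothetical sketch: append POS tags to a small set of ambiguous words.
AMBIGUOUS = {"aber", "zu", "der", "die", "das"}  # the words listed in the text


def annotate_with_pos(tagged_tokens, ambiguous=AMBIGUOUS, sep="_"):
    """Append the POS tag to ambiguous words and leave all others unchanged."""
    annotated = []
    for word, tag in tagged_tokens:
        if word.lower() in ambiguous:
            annotated.append(f"{word}{sep}{tag}")
        else:
            annotated.append(word)
    return annotated


sentence = [("Das", "PRON"), ("war", "V"), ("zu", "ADV"),
            ("schnell", "ADJ"), (".", "PUNCT")]
print(annotate_with_pos(sentence))
```

Because "zu_ADV" and, say, "zu_PREP" become distinct vocabulary entries, the lexicon model can learn separate translations for the different readings.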
3.5 Merging Phrases
Some multi-word phrases as a whole represent a distinct syntactic
role in the sentence. The phrase "irgend etwas" ("anything"), for
example, may form either an indefinite determiner or an indefinite
pronoun. Like 21 other multi-word phrases, "irgend-etwas" is merged
in order to form one single vocabulary entry.
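Phrase merging amounts to a longest-match replacement over the token sequence. The Python sketch below illustrates this under assumptions: the phrase list is passed in by the caller (the full list of merged phrases is not reproduced in the text), and the function name, the case-insensitive matching, and the hyphen joiner are hypothetical choices.

```python
# Hypothetical sketch: merge listed multi-word phrases into single tokens.
def merge_phrases(tokens, phrases, joiner="-"):
    """Greedily merge known multi-word phrases, trying longer matches first."""
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                merged.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts at this position: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_phrases("Haben Sie irgend etwas frei ?".split(), ["irgend etwas"]))
```

In the example, "irgend etwas" becomes the single entry "irgend-etwas" while all other tokens are left untouched.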
3.6 Treatment of Unseen Words
For statistical machine translation it is difficult to handle words
not seen in training. For unknown proper names, it is normally
correct to place the word unchanged into the translation. We have
been working on the treatment of unknown words of other types. As
already mentioned in Section 3.3, the splitting of compound words can
reduce the number of unknown German words.
In addition, we have examined methods of replacing a word full form
by a more abstract word form and checking whether this form is known
and can be translated. The translation of the simplified word form is
generally not the precise translation of the original one, but
sometimes [...]
- [...] form and can be transformed to the less specific form "kalt"
  ("cold").
- "Jahre" ("years") can be replaced by the singular form "Jahr".
- "beneidest" ("to envy" in second person singular): if the
  infinitive form "beneiden" is not known, it might help just to
  remove the leading particle "be".
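The back-off to more general word forms can be sketched as a chain of generalization steps that stops at the first form found in the vocabulary. Everything in this Python example is illustrative: the lemma table stands in for the output of a morphological analyser, and the function names and the order of the steps are assumptions rather than the procedure used in the experiments.

```python
# Hypothetical sketch: back off from an unseen word to a more general form.
LEMMAS = {"Jahre": "Jahr", "beneidest": "beneiden"}  # stand-in analyser output


def lemma(word):
    """Return the base form of the word if the toy analyser knows one."""
    return LEMMAS.get(word)


def lemma_without_be(word):
    """Strip the leading particle 'be' from the base form, if present."""
    base = LEMMAS.get(word, word)
    if base.startswith("be") and len(base) > 4:
        return base[2:]
    return None


def back_off(word, vocabulary, generalizers):
    """Return the word itself or its first known generalization, else None."""
    if word in vocabulary:
        return word
    for generalize in generalizers:
        candidate = generalize(word)
        if candidate is not None and candidate in vocabulary:
            return candidate
    return None


vocabulary = {"Jahr", "neiden"}
print(back_off("Jahre", vocabulary, [lemma, lemma_without_be]))
print(back_off("beneidest", vocabulary, [lemma, lemma_without_be]))
```

A return value of None signals that no known form was found, in which case the word could be passed through unchanged, as suggested above for proper names.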
4 Translation Results
We use the SSER (subjective sentence error rate) (Nießen et al.,
2000) as evaluation criterion: Each translated sentence is judged by
a human examiner according to an error scale from 0.0 (semantically
and syntactically correct) to 1.0 (completely wrong).
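The text gives the per-sentence scale but not the aggregation into a corpus-level figure. A natural reading, assumed here and not confirmed by the text, is the arithmetic mean of the per-sentence judgments expressed as a percentage. The function name and the input checks are hypothetical.

```python
# Hypothetical sketch: SSER as the mean per-sentence judgment in percent.
def sser(scores):
    """Average human error judgments in [0.0, 1.0] and return a percentage."""
    if not scores:
        raise ValueError("no judged sentences")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0.0, 1.0]")
    return 100.0 * sum(scores) / len(scores)


print(sser([0.0, 0.5, 1.0]))
```

Under this assumption, a test set in which every sentence is judged correct yields an SSER of 0%, and one in which every sentence is completely wrong yields 100%.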
4.1 Translation Results for Verbmobil
The Verbmobil corpus consists of spontaneously spoken dialogs in the
appointment scheduling domain (Wahlster, 1993). German sentences are
translated into English. The output of the speech recognizer (for
example the single-best hypothesis) is used as input to the
translation modules. For research purposes the original text spoken
by the users can be presented to the translation system to evaluate
the MT component separately from the recognizer.
The training set consists of 45680 sentence pairs. Testing was
carried out on a separate set of 147 sentences that do not contain
any unseen words. In Table 1 the characteristics of the training sets
are summarized for the original corpus and after the application of
the described transformations on the German part of the corpus. The
table shows that on this corpus the splitting of compounds improves
the token-type ratio from 59.7 to 65.2, but the number of singletons
(words seen only once in training) does not go down by more than
2.8%. The other transformations (prepending separated verb prefixes
"pref"; annotation with POS tags "pos"; merging of phrases "merge")
do not affect these corpus statistics much.
The translation performance results are given in Table 2 for
translation of text and in Table 3 for translation of the single-best
hypothesis given by a speech recognizer (accuracy 69%).
Table 1: [...] training ("baseline" = no preprocessing).
preprocessing       no. of tokens   no. of types   singletons
English                    465143           4382        37.6%
German
  baseline                 437968           7335        44.8%
  verb prefixes            435686           7370        44.3%
  split compounds          442938           6794        42.0%
  pos                      437972           7344        44.8%
  pos+merge                437330           7363        44.7%
  pos+merge+pref           435055           7397        44.2%
[...] not improve translation quality, but it is not harmful either.
The treatment of separable prefixes helps, as does annotating some
words with part of speech information. Merging of phrases does not
improve the quality much further. The best translations were achieved
with the combination of POS annotation, phrase merging and prepending
separated verb prefixes. This holds for both translation of text and
of speech input.
Table 2: Results on Verbmobil text input.
preprocessing       SSER [%]
baseline                20.3
verb prefixes           19.4
split compounds         20.3
pos                     19.7
pos+merge               19.5
pos+merge+pref          18.0
The fact that these hard-coded transformations are not only helpful
on text input, but also on speech input is quite encouraging. As an
example makes clear, this cannot be taken for granted: The test
sentence "Dann fahren wir dann los." is recognized as "Dann fahren
wir dann uns." and the fact that separable verbs do not occur in
their separated form in the training data is unfavorable in this
case. The figures show that in general the speech recognizer output
contains enough information for helpful [...]
preprocessing       SSER [%]
baseline                43.4
verb prefixes           41.8
split compounds         43.1
split+pref              42.3
pos+merge+pref          41.1
4.2 Translation Results for EuTrans
The EuTrans corpus consists of different types of German–English
texts belonging to the tourism domain: web pages of hotels, touristic
brochures and business correspondence. The string translation and
language model parameters were trained on 27028 sentence pairs. The
200 test sentences contain 150 words never seen in training.
Table 4 summarizes the corpus statistics of the training set for the
original corpus, after splitting of compound words and after
additional prepending of separated verb prefixes ("split+prefixes").
The splitting of compounds improves the token-type ratio from 8.6 to
12.3, and the number of words seen only once in training is reduced
by 8.9%.
Table 4: Corpus statistics: EuTrans.
preprocessing       no. of tokens   no. of types   singletons
English                    562264          33823        47.1%
German
  baseline                 499217          58317        58.9%
  split compounds          535505          43405        50.0%
  split+prefixes           534676          43407        49.8%
The number of words in the test sentences never seen in training
reduces from 150 to 81 by compound splitting and can further be
reduced to 69 by replacing the unknown word forms by more general
forms. 80 unknown words are encountered when verb prefixes are
treated in addition to compound splitting.
Experiments for POS-annotation have not [...]
Compared to the Verbmobil task, this corpus is less homogeneous.
Merging of phrases did not help much on Verbmobil and is therefore
not tested here.
Table 5 shows that the splitting of compound words yields an
improvement in the subjective sentence error rate of 4.5% and the
treatment of unknown words ("unk") improves the translation quality
by an additional 1%. Treating separable verb prefixes in addition to
splitting compounds gives the best result so far with an improvement
of 7.1% absolute compared to the baseline.
Table 5: Results on EuTrans.
preprocessing       SSER [%]
baseline                57.4
split compounds         52.9
split+unk               51.8
split+prefixes          50.3
5 Conclusion and Future Work
In this paper, we have presented some methods of providing
morphological and syntactic information for improving the performance
of statistical machine translation. First experiments prove their
general applicability to realistic and complex tasks such as
spontaneously spoken dialogs.
We are planning to integrate the approach into the search process.
We are also working on language models and translation models that
use morphological categories for smoothing in the case of unseen
events.
Acknowledgement. This work was partly supported by the German
Federal Ministry of Education, Science, Research and Technology under
the Contract Number 01 IV 701 T4 (Verbmobil) and as part of the
EuTrans project by the European Community (ESPRIT project number
30268).
The authors would like to thank Gregor Leusch for his support in
implementation.
References
P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer.
  1993. The Mathematics of Statistical Machine Translation: Parameter
  Estimation. Computational Linguistics, 19(2):263–311.
Mariikka Haapalainen and Ari Majorin. 1995. GERTWOL und
  Morphologische Disambiguierung für das Deutsche. URL:
  www.lingsoft.fi/doc/gercg/NODALIDA-poster.html.
Fred Karlsson. 1990. Constraint Grammar as a Framework for Parsing
  Running Text. In Proceedings of the 13th International Conference
  on Computational Linguistics, volume 3, pages 168–173, Helsinki,
  Finland.
Sonja Nießen, Stephan Vogel, Hermann Ney, and Christoph Tillmann.
  1998. A DP based Search Algorithm for Statistical Machine
  Translation. In Proceedings of the 36th Annual Conference of the
  Association for Computational Linguistics and the 17th
  International Conference on Computational Linguistics, pages
  960–967, Montreal, P.Q., Canada, August.
Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney. 2000.
  An Evaluation Tool for Machine Translation: Fast Evaluation for MT
  Research. In Proceedings of the 2nd International Conference on
  Language Resources and Evaluation, pages 39–45, Athens, Greece,
  May.
Franz Josef Och and Hans Weber. 1998. Improving Statistical Natural
  Language Translation with Categories and Rules. In Proceedings of
  the 36th Annual Conference of the Association for Computational
  Linguistics and the 17th International Conference on Computational
  Linguistics, pages 985–989, Montreal, P.Q., Canada, August.
Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved
  Alignment Models for Statistical Machine Translation. In
  Proceedings of the Conference on Empirical Methods in Natural
  Language Processing and Very Large Corpora, pages 20–28, University
  of Maryland, College Park, Maryland, June.
Wolfgang Wahlster. 1993. Verbmobil: Translation of Face-to-Face
  Dialogs. In Proceedings of the MT Summit IV, pages 127–135, Kobe,
  Japan.
Ye-Yi Wang and Alex Waibel. 1997. Decoding Algorithm in Statistical
  Translation. In