Improving SMT quality with morpho-syntactic analysis


Sonja Nießen and Hermann Ney

Lehrstuhl für Informatik VI

Computer Science Department

RWTH – University of Technology Aachen

D-52056 Aachen, Germany

Email: [email protected] chen .de

Abstract

In the framework of statistical machine translation (SMT), correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models. Many of the statistical systems use little or no linguistic knowledge to structure the underlying models. In this paper we argue that training data is typically not large enough to sufficiently represent the range of different phenomena in natural languages and that SMT can take advantage of the explicit introduction of some knowledge about the languages under consideration. The improvement of the translation results is demonstrated on two different German-English corpora.

1 Introduction

In this paper, we address the question of how morphological and syntactic analysis can help statistical machine translation (SMT). In our approach, we introduce several transformations to the source string (in our experiments the source language is German) to demonstrate how linguistic knowledge can improve translation results, especially in the cases where the token-type ratio (number of training words versus number of vocabulary entries) is unfavorable.

After reviewing the statistical approach to machine translation, we first explain our motivation for examining additional knowledge sources. We then present our approach in detail. Experimental results on two bilingual German-English tasks are reported, namely the Verbmobil and the EuTrans task. Finally, we give an outlook on our future work.

2 Statistical Machine Translation

The goal of the translation process in statistical machine translation can be formulated as follows: A source language string $f_1^J = f_1 \ldots f_J$ is to be translated into a target language string $e_1^I = e_1 \ldots e_I$. In the experiments reported in this paper, the source language is German and the target language is English. Every English string is considered as a possible translation for the input. If we assign a probability $Pr(e_1^I \mid f_1^J)$ to each pair of strings $(e_1^I, f_1^J)$, then according to Bayes' decision rule, we have to choose the English string that maximizes the product of the English language model $Pr(e_1^I)$ and the string translation model $Pr(f_1^J \mid e_1^I)$.
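Written out as an equation, the decision rule described above takes the following form. This is the standard source-channel formulation; the hat notation $\hat{e}_1^I$ for the chosen translation is an assumption of this sketch and does not appear in the text. The denominator $Pr(f_1^J)$ can be dropped in the last step because it does not depend on the candidate string $e_1^I$.

```latex
\begin{align*}
\hat{e}_1^I &= \operatorname*{argmax}_{e_1^I} \; Pr(e_1^I \mid f_1^J) \\
            &= \operatorname*{argmax}_{e_1^I} \; \frac{Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)}{Pr(f_1^J)} \\
            &= \operatorname*{argmax}_{e_1^I} \; Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)
\end{align*}
```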

Many existing systems for SMT (Wang and Waibel, 1997; Nießen et al., 1998; Och and Weber, 1998) make use of a special way of structuring the string translation model (Brown et al., 1993): The correspondence between the words in the source and the target string is described by alignments that assign one target word position to each source word position. The probability of a certain English word to occur in the target string is assumed to depend basically only on the source word aligned to it. It is clear that this assumption is not always valid for the translation of natural languages. It turns out that even those approaches that relax the word-by-word assumption like (Och et al., 1999) have problems with many phenomena typical of natural languages in general and German in particular like

- idiomatic expressions;

- compound words that have to be translated by more than one word;

- long range dependencies like prefixes of verbs placed at the end of the sentence;


sources mentioned above are trained on bilingual corpora. Bearing in mind that more than 40% of the word forms have only been seen once in training (see Tables 1 and 4), it is obvious that the phenomena listed above can hardly be learned adequately from the data and that the explicit introduction of linguistic knowledge is expected to improve translation quality.

The overall architecture of the statistical translation approach is depicted in Figure 1. In this figure we already anticipate the fact that we will transform the source strings in a certain manner. If necessary we can also apply the inverse of these transformations on the produced output strings. In Section 3 we explain in detail which kinds of transformations we apply.

[Figure 1 (diagram): Source Language Text; Transformation; Global Search: maximize $Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)$ over $e_1^I$; Transformation; Target Language Text. Knowledge sources: Lexicon Model and Alignment Model for $Pr(f_1^J \mid e_1^I)$, Language Model for $Pr(e_1^I)$.]

Figure 1: Architecture of the translation approach based on Bayes' decision rule.

3 Analysis and Transformation of the Input

As already pointed out, we used the method of transforming the input string in our experiments. The advantage of this approach is that existing training and search procedures did not have to be adapted to new models incorporating the information under consideration. On the other hand, it would be more elegant to leave the decision between different readings, for in-

equate for the preliminary identification of those phenomena relevant for improving the translation results.

3.1 Analysis

We used GERTWOL, a German Morphological Analyser (Haapalainen and Majorin, 1995) and the Constraint Grammar Parser for German GERCG for lexical analysis and morphological and syntactic disambiguation. For a description of the Constraint Grammar approach we refer the reader to (Karlsson, 1990). Some preprocessing was necessary to meet the input format requirements of the tools. In the cases where the tools returned more than one reading, either simple heuristics based on domain specific preference rules were applied or a more general, non-ambiguous analysis was used.

In the following subsections we list some transformations we have tested.

3.2 Separated German Verb Prefixes

Some verbs in German consist of a main part and a detachable prefix which can be shifted to the end of the clause, e.g. "losfahren" ("to leave") in the sentence "Ich fahre morgen los.". We extracted all word forms of separable verbs from the training corpus. The resulting list contains entries of the form prefix|main. The entry "los|fahre" indicates, for example, that the prefix "los" can be detached from the word form "fahre". In all clauses containing a word matching a main part and a word matching the corresponding prefix part occurring at the end of the clause, the prefix is prepended to the beginning of the main part, as in "Ich losfahre morgen."
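The transformation just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation used in the experiments: the clause is assumed to be already tokenized with its final punctuation removed, and the function name `prepend_separated_prefix` and the set `separable_forms` of (prefix, main) pairs are hypothetical.

```python
def prepend_separated_prefix(clause, separable_forms):
    """Re-attach a clause-final verb prefix to its main part.

    clause: list of word tokens of one clause, without final punctuation.
    separable_forms: set of (prefix, main) pairs, e.g. {("los", "fahre")},
        standing in for the prefix|main list extracted from training data.
    """
    if len(clause) < 2:
        return list(clause)
    prefix = clause[-1]
    for i, word in enumerate(clause[:-1]):
        if (prefix, word) in separable_forms:
            # Prepend the prefix to the main part and drop it from the end.
            return clause[:i] + [prefix + word] + clause[i + 1:-1]
    # No matching main part: leave the clause unchanged.
    return list(clause)


forms = {("los", "fahre")}
print(prepend_separated_prefix(["Ich", "fahre", "morgen", "los"], forms))
```

Only the last token of the clause is tried as a prefix, which mirrors the condition that the prefix occurs at the end of the clause.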

3.3 German Compound Words

German compound words pose special problems to the robustness of a translation method, because the word itself must be represented in the training data: the occurrence of each of the components is not enough. The word "Früchtetee" for example can not be translated although its components "Früchte" and "Tee" appear in the training set of EuTrans. Besides, even if the compound occurs in training, the training algorithm may not be capable of translating it properly as two words (in the mentioned case the


align-components.
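As a rough illustration of how an unseen compound can be covered by components that do occur in training, the following sketch assumes a plain vocabulary lookup with a single split point. It is a simplification and does not reproduce the analysis tools of Section 3.1; the names `split_compound`, `vocabulary` and `min_len` are hypothetical.

```python
def split_compound(word, vocabulary, min_len=3):
    """Split an unknown word into two known components, if possible.

    vocabulary: set of word forms seen in training.
    min_len: minimum length of each component (an arbitrary choice here).
    """
    if word in vocabulary:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        # German nouns are capitalized, so re-capitalize the second part.
        tail_cap = tail[0].upper() + tail[1:]
        if head in vocabulary and tail_cap in vocabulary:
            return [head, tail_cap]
    # No split into known components found: keep the word as it is.
    return [word]


print(split_compound("Früchtetee", {"Früchte", "Tee"}))
```

A lookup of this kind ignores linking elements and compounds with more than two parts, which is one reason a morphological analyser is the more robust choice.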

3.4 Annotation with POS Tags

One way of helping the disambiguation of ambiguous words is to annotate them with their part of speech (POS) information. We chose the following very frequent short words that often caused errors in translation for Verbmobil:

- "aber" can be adverb or conjunction.

- "zu" can be adverb, preposition, separated verb prefix or infinitive marker.

- "der", "die" and "das" can be definite articles or pronouns.

The difficulties due to these ambiguities are illustrated by the following examples: The sentence "Das würde mir sehr gut passen." is often translated by "The would suit me very well." instead of "That would suit me very well." and "Das war zu schnell." is translated by "That was to fast." instead of "That was too fast.".

We appended the POS tag in training and test corpus for the Verbmobil task (see 4.1).
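Appending the tag to only the selected words can be sketched as follows. The sketch assumes the sentence has already been tagged by some external tool; the tag names, the separator "_" and the function name `annotate_pos` are hypothetical and not taken from the experiments.

```python
AMBIGUOUS = {"aber", "zu", "der", "die", "das"}


def annotate_pos(tagged_sentence, ambiguous=AMBIGUOUS, sep="_"):
    """Append the POS tag to selected ambiguous words only.

    tagged_sentence: list of (word, tag) pairs produced by a tagger.
    All other words are passed through without their tag, so the
    vocabulary grows only for the words in `ambiguous`.
    """
    out = []
    for word, tag in tagged_sentence:
        if word.lower() in ambiguous:
            out.append(f"{word}{sep}{tag}")
        else:
            out.append(word)
    return out


sentence = [("Das", "PRON"), ("war", "VERB"), ("zu", "ADV"), ("schnell", "ADJ")]
print(" ".join(annotate_pos(sentence)))
```

After this step "zu_ADV" and, say, "zu_PREP" are distinct vocabulary entries, so the translation model can learn separate translations for them.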

3.5 Merging Phrases

Some multi-word phrases as a whole represent a distinct syntactic role in the sentence. The phrase "irgend etwas" ("anything") for example may form either an indefinite determiner or an indefinite pronoun. Like 21 other multi-word phrases "irgend-etwas" is merged in order to form one single vocabulary entry.
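A minimal sketch of such a merging step is given below, assuming the phrases are supplied as a fixed list and matched case-insensitively from left to right. The function name `merge_phrases`, the phrase set and the example sentence are hypothetical; only the phrase "irgend etwas" comes from the text.

```python
def merge_phrases(tokens, phrases, joiner="-"):
    """Greedily merge known multi-word phrases into single tokens.

    phrases: set of tuples of lower-cased words, e.g. {("irgend", "etwas")}.
    Longer phrases are tried first at each position.
    """
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrases:
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts at this position: copy the token.
            out.append(tokens[i])
            i += 1
    return out


print(merge_phrases("Hast du irgend etwas gesehen".split(), {("irgend", "etwas")}))
```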

3.6 Treatment of Unseen Words

For statistical machine translation it is difficult to handle words not seen in training. For unknown proper names, it is normally correct to place the word unchanged into the translation. We have been working on the treatment of unknown words of other types. As already mentioned in Section 3.3, the splitting of compound words can reduce the number of unknown German words.

In addition, we have examined methods of replacing a word full form by a more abstract word form and check whether this form is known and can be translated. The translation of the simplified word form is generally not the precise translation of the original one, but sometimes

form and can be transformed to the less specific form "kalt" ("cold").

- "Jahre" ("years") can be replaced by the singular form "Jahr".

- "beneidest" ("to envy" in second person singular): if the infinitive form "beneiden" is not known, it might help just to remove the leading particle "be".
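The fallback idea behind these examples can be sketched as a lookup that tries progressively more abstract forms. The sketch assumes a word-level lexicon and hand-written simplification functions; `translate_word`, `drop_final_e` and the toy lexicon are hypothetical, and a real system would derive the abstract forms from morphological analysis rather than from string rules.

```python
def translate_word(word, lexicon, simplify_rules):
    """Look up a word; fall back to more abstract forms if it is unknown.

    lexicon: dict mapping known source words to a translation.
    simplify_rules: functions mapping a word form to a more general form.
    Words that stay unknown are passed through unchanged, which is the
    usual treatment of proper names.
    """
    if word in lexicon:
        return lexicon[word]
    for rule in simplify_rules:
        candidate = rule(word)
        if candidate != word and candidate in lexicon:
            return lexicon[candidate]
    return word


def drop_final_e(word):
    """Toy rule mapping e.g. 'Jahre' to 'Jahr'."""
    return word[:-1] if word.endswith("e") else word


lexicon = {"Jahr": "year", "kalt": "cold"}
print(translate_word("Jahre", lexicon, [drop_final_e]))   # found via the simplified form
print(translate_word("Aachen", lexicon, [drop_final_e]))  # unknown, passed through
```

As the text notes, the result for the simplified form is only an approximation of the original word's translation.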

4 Translation Results

We use the SSER (subjective sentence error rate) (Nießen et al., 2000) as evaluation criterion: Each translated sentence is judged by a human examiner according to an error scale from 0.0 (semantically and syntactically correct) to 1.0 (completely wrong).
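A corpus-level figure can be obtained from the per-sentence judgments as sketched below. The sketch assumes that the reported percentage is the arithmetic mean of the sentence scores, which the text above does not state explicitly; the function name `sser_percent` is hypothetical.

```python
def sser_percent(scores):
    """Average per-sentence error scores in [0.0, 1.0], reported in percent."""
    if not scores:
        raise ValueError("at least one judged sentence is required")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie between 0.0 and 1.0")
    return 100.0 * sum(scores) / len(scores)


# Four judged sentences: one correct, two half wrong, one completely wrong.
print(sser_percent([0.0, 0.5, 1.0, 0.5]))
```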

4.1 Translation Results for Verbmobil

The Verbmobil corpus consists of spontaneously spoken dialogs in the appointment scheduling domain (Wahlster, 1993). German sentences are translated into English. The output of the speech recognizer (for example the single-best hypothesis) is used as input to the translation modules. For research purposes the original text spoken by the users can be presented to the translation system to evaluate the MT component separately from the recognizer.

The training set consists of 45680 sentence pairs. Testing was carried out on a separate set of 147 sentences that do not contain any unseen words. In Table 1 the characteristics of the training sets are summarized for the original corpus and after the application of the described transformations on the German part of the corpus. The table shows that on this corpus the splitting of compounds improves the token-type ratio from 59.7 to 65.2, but the number of singletons (words seen only once in training) does not go down by more than 2.8%. The other transformations (prepending separated verb prefixes "pref"; annotation with POS tags "pos"; merging of phrases "merge") do not affect these corpus statistics much.
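The corpus statistics discussed here can be computed from a tokenized corpus as in the following sketch. It assumes that the singleton percentage is taken relative to the number of types (word forms), in line with the remark that more than 40% of the word forms were seen only once; the function name `corpus_statistics` is hypothetical.

```python
from collections import Counter


def corpus_statistics(tokens):
    """Return token count, type count, token-type ratio and singleton share."""
    counts = Counter(tokens)
    if not counts:
        raise ValueError("corpus must contain at least one token")
    n_tokens = sum(counts.values())
    n_types = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "token_type_ratio": n_tokens / n_types,
        # Share of vocabulary entries that occur exactly once.
        "singletons": n_singletons / n_types,
    }


print(corpus_statistics("das ist das Haus".split()))
```

Applied to the counts in Table 1, the ratio is 437968 / 7335 ≈ 59.7 for the baseline and 442938 / 6794 ≈ 65.2 after compound splitting, which matches the figures quoted above.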

The translation performance results are given in Table 2 for translation of text and in Table 3 for translation of the single-best hypothesis given by a speech recognizer (accuracy 69%).


ing ("baseline" = no preprocessing).

preprocessing      no. of tokens   no. of types   singletons
English                   465143           4382        37.6%
German
  baseline                437968           7335        44.8%
  verb prefixes           435686           7370        44.3%
  split compounds         442938           6794        42.0%
  pos                     437972           7344        44.8%
  pos+merge               437330           7363        44.7%
  pos+merge+pref          435055           7397        44.2%

not improve translation quality, but it is not harmful either. The treatment of separable prefixes helps as does annotating some words with part of speech information. Merging of phrases does not improve the quality much further. The best translations were achieved with the combination of POS-annotation, phrase merging and prepending separated verb prefixes. This holds for both translation of text and of speech input.

Table 2: Results on Verbmobil text input.

preprocessing      SSER [%]
baseline               20.3
verb prefixes          19.4
split compounds        20.3
pos                    19.7
pos+merge              19.5
pos+merge+pref         18.0

The fact that these hard-coded transformations are not only helpful on text input, but also on speech input is quite encouraging. As an example makes clear this cannot be taken for granted: The test sentence "Dann fahren wir dann los." is recognized as "Dann fahren wir dann uns." and the fact that separable verbs do not occur in their separated form in the training data is unfavorable in this case. The figures show that in general the speech recognizer output contains enough information for helpful

baseline               43.4
verb prefixes          41.8
split compounds        43.1
split+pref             42.3
pos+merge+pref         41.1

4.2 Translation Results for EuTrans

The EuTrans corpus consists of different types of German-English texts belonging to the tourism domain: web pages of hotels, touristic brochures and business correspondence. The string translation and language model parameters were trained on 27028 sentence pairs. The 200 test sentences contain 150 words never seen in training.

Table 4 summarizes the corpus statistics of the training set for the original corpus, after splitting of compound words and after additional prepending of separated verb prefixes ("split+prefixes"). The splitting of compounds improves the token-type ratio from 8.6 to 12.3 and the number of words seen only once in training reduces by 8.9%.

Table 4: Corpus statistics: EuTrans.

preprocessing      no. of tokens   no. of types   singletons
English                   562264          33823        47.1%
German
  baseline                499217          58317        58.9%
  split compounds         535505          43405        50.0%
  split+prefixes          534676          43407        49.8%

The number of words in the test sentences never seen in training reduces from 150 to 81 by compound splitting and can further be reduced to 69 by replacing the unknown word forms by more general forms. 80 unknown words are encountered when verb prefixes are treated in addition to compound splitting.

Experiments for POS-annotation have not


Compared to the Verbmobil task, this corpus is less homogeneous. Merging of phrases did not help much on Verbmobil and is therefore not tested here.

Table 5 shows that the splitting of compound words yields an improvement in the subjective sentence error rate of 4.5% and the treatment of unknown words ("unk") improves the translation quality by an additional 1%. Treating separable verb prefixes in addition to splitting compounds gives the best result so far with an improvement of 7.1% absolute compared to the baseline.

Table 5: Results on EuTrans.

preprocessing      SSER [%]
baseline               57.4
split compounds        52.9
split+unk              51.8
split+prefixes         50.3

5 Conclusion and Future Work

In this paper, we have presented some methods of providing morphological and syntactic information for improving the performance of statistical machine translation. First experiments prove their general applicability to realistic and complex tasks such as spontaneously spoken dialogs.

We are planning to integrate the approach into the search process. We are also working on language models and translation models that use morphological categories for smoothing in the case of unseen events.

Acknowledgement. This work was partly supported by the German Federal Ministry of Education, Science, Research and Technology under the Contract Number 01 IV 701 T4 (Verbmobil) and as part of the EuTrans project by the European Community (ESPRIT project number 30268).

The authors would like to thank Gregor Leusch for his support in implementation.

References

P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263–311.

Mariikka Haapalainen and Ari Majorin. 1995. GERTWOL und Morphologische Disambiguierung für das Deutsche. URL: www.lingsoft.fi/doc/gercg/NODALIDA-poster.html.

Fred Karlsson. 1990. Constraint Grammar as a Framework for Parsing Running Text. In Proceedings of the 13th International Conference on Computational Linguistics, volume 3, pages 168–173, Helsinki, Finland.

Sonja Nießen, Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1998. A DP based Search Algorithm for Statistical Machine Translation. In Proceedings of the 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, pages 960–967, Montreal, P.Q., Canada, August.

Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, pages 39–45, Athens, Greece, May.

Franz Josef Och and Hans Weber. 1998. Improving Statistical Natural Language Translation with Categories and Rules. In Proceedings of the 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, pages 985–989, Montreal, P.Q., Canada, August.

Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved Alignment Models for Statistical Machine Translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20–28, University of Maryland, College Park, Maryland, June.

Wolfgang Wahlster. 1993. Verbmobil: Translation of Face-to-Face Dialogs. In Proceedings of the MT Summit IV, pages 127–135, Kobe, Japan.

Ye-Yi Wang and Alex Waibel. 1997. Decoding Algorithm in Statistical Translation. In