
4.2 Experiment and Evaluation

4.2.2 Evaluation Experiment

First of all, I will give a brief review of how MT systems are evaluated. The evaluation of MT systems is a complex task, not only because many different factors are involved, but also because measuring translation performance is itself difficult. In evaluating MT systems one should also take into account that system performance will normally improve considerably during the first few months after installation, as the system is tuned to the source materials. It follows that performance on an initial trial, with a sample of the sort of material to be translated, can only be broadly indicative of the translation quality that might ultimately be achieved after several months or years of work.

A traditional way of assessing the quality of translation is to assign scores to output sentences. A common aspect to score for is Intelligibility, where the intelligibility of a translated sentence is affected by grammatical errors, mistranslations and untranslated words. Some studies also take style into account, even though it does not really affect the intelligibility of a sentence. Scoring scales give top marks to sentences that look like perfect target language sentences and bottom marks to those that are so badly degraded that the average translator/evaluator cannot guess what a reasonable sentence might be in the context. In between these two extremes, output sentences are assigned higher or lower scores depending on their degree of awfulness. For example, slightly fluffed word order (... in an interview referred Major to the economic situation...) will probably get a better score than a case where mistranslation of words has rendered a sentence almost uninterpretable (... the peace contract should take off the peace agreement...). Scoring for intelligibility thus directly reflects the quality judgment of the user: the less she understands, the lower the intelligibility score. It might therefore seem a useful measure of translation quality.

Measuring intelligibility alone gives only a partial view of translation quality. A highly intelligible output sentence need not be a correct translation of the source sentence, so it is important to check whether the meaning of the source language sentence is preserved in the translation. This property is called Accuracy or Fidelity. Scoring for accuracy is normally done in combination with (but after) scoring for intelligibility. As with intelligibility, some sort of scoring scheme for accuracy must be devised. Whilst it might initially seem tempting to have simple 'Accurate' and 'Inaccurate' labels, this would be somewhat unfair to an MT system which routinely produces translations that are only slightly deviant in meaning, since such a system would be labelled just as inaccurate as one whose output bears no relation to the source. Moreover, if an output sentence is unintelligible, it is impossible to assign it a meaning, and so the question of whether the translation means the same as the original cannot really be answered.

The evaluation procedure is fairly similar to the one used for scoring intelligibility. However, the scorers have to be able to refer to the source language text (or to a high quality translation of it, in case they cannot speak the source language), so that they can compare the meaning of input and output sentences. As it happens, in the sort of evaluation considered here, accuracy scores are much less interesting than intelligibility scores. This is because accuracy scores are often closely related to intelligibility scores: high intelligibility normally means high accuracy. Most of the time, most systems do not exhibit surreal or Monty Python properties. For some purposes it might be worth dispensing with accuracy scoring altogether and simply counting cases where the output looks silly (leading one to suppose something has gone wrong).

It should be apparent from the above that devising and assigning quality scores for MT output - what is sometimes called 'Static' or 'Declarative Evaluation' (footnote 2) - is not straightforward. Interpreting the resultant scores is also problematic.

Footnote 2: 'Declarative' here is to be contrasted with 'procedural'. A declarative specification of a program states what the program should do, without considering the order in which it must be done. A procedural specification would specify both what is to be done and when. Properties like Accuracy and Intelligibility are properties of a system which are independent of the dynamics of the system, or of the way the system operates at all - hence 'non-procedural' or 'declarative'.

Next is Error Analysis. The technique of error analysis tries to establish how seriously errors affect the translation output. The method is this. To start off, write down a list of all the types of errors you think the MT system might make. During the evaluation, all the errors in the translated text are counted. Because some errors are considered more serious than others, the count for each type of error is multiplied by a weighting factor assigned to that type. The score for each individual sentence, or for the whole text, is then the sum of all the weighted errors.

Although this method gives more direct information on the usefulness of an MT system, there are immediate problems with using detailed error analysis. The first is practical: it will usually require considerable time and effort to train scorers to identify instances of particular errors, and they will also need to spend more time analysing each output sentence. Second, is there any good basis for choosing a particular weighting scheme? Not obviously. In some cases the weighting is related to the consequences an error has for post-editing, that is, how much time it will take to correct that particular mistake. The third problem is perhaps the most serious one: for some MT systems, many output sentences are so corrupted with respect to natural language correlates that detailed analysis of errors is not meaningful. Error types are not independent of each other: failure to supply any number inflection for a main verb will often mean that the subject and verb do not agree in number as required. It will be difficult to specify where one error starts and another ends, and thus there is the risk of ending up with a general error scale of the form one, two, ... lots. The assignment of a weighting to such complex errors is thus a tricky business.
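The weighted error score described above can be sketched in a few lines of Python. This is only an illustration: the error type names and the weights are hypothetical, chosen for the example, and are not taken from any particular evaluation scheme.

```python
from collections import Counter

# Hypothetical error types and weights, chosen only for illustration.
WEIGHTS = {
    "untranslated_word": 3.0,
    "mistranslation": 2.0,
    "word_order": 1.0,
}

def weighted_error_score(errors, weights=WEIGHTS):
    """Sum of (count of each error type) x (weight of that type)."""
    counts = Counter(errors)
    return sum(weights[error_type] * n for error_type, n in counts.items())

# Errors a scorer might have marked in one output sentence.
sentence_errors = ["word_order", "mistranslation", "mistranslation"]
print(weighted_error_score(sentence_errors))  # 1.0*1 + 2.0*2 = 5.0
```

The sketch assumes that every error can be assigned to exactly one type. As noted above, this assumption often fails in practice, because error types are not independent of each other.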

To evaluate our system, we carried out an evaluation experiment. We collected 250 sentences from a different source [28], which includes many different types of business letters. A translation counts as a success when the translated sentence is identical to the expected sentence. Because nouns may have synonyms, the translation of nouns is not included as an evaluation condition. The result of the experiment is shown in Table 4.2.

Sentences    Success      SF is in agreement

250          183 (73%)    204 (81%)

Table 4.2: Result of the Evaluation Experiment
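As a quick check, the percentages in Table 4.2 can be recomputed from the raw counts. The variable names in this sketch are my own. The exact rate for SF agreement is 81.6%, so the 81% in the table appears to be truncated rather than rounded.

```python
# Counts as reported in Table 4.2.
total = 250
successful = 183   # translation identical to the expected sentence
sf_matched = 204   # an SF matching the sentence was found in the SF base

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

# Sentences whose SF matched but whose translation still differed.
matched_but_different = sf_matched - successful

print(f"success: {percent(successful, total):.1f}%")                    # 73.2%
print(f"SF matched: {percent(sf_matched, total):.1f}%")                 # 81.6%
print(f"matched, different: {matched_but_different} "
      f"({percent(matched_but_different, total):.1f}%)")                # 21 (8.4%)
```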

Of the 250 evaluation sentences, 183 (73%) were translated successfully, and for 204 (81%) an SF matching one in the SF base was found. Among the latter, 21 sentences (8%) matched an SF in the SF base but received a translation different from the expected sentence. The reason is that, when matching against the SF base, even if the Japanese SF part and the kinds of nouns agree with an entry in the SF base, the Chinese SF part can be different. Here are some examples:

Japanese: Keiyaku jyouken wa tsugi no tooritoshi, saisyuu daketsu wo hakaritai.

Chinese: Hetong tiaojian ding wei ruxia jitiao, yiqi zuizhong neng dadao qianyue de mudi.

English: Contract terms are shown as below, hope can come to an agreement.

The SF of this example is:

< J > X1 X2 wa tsugi no tooritoshi, X3 X4 wo hakaritai.

< C > X1 X2 wei ruxia suoding, yiqi neng dadao X3 X4 de mudi.

Translation: Hetong tiaojian wei ruxia suoding, yiqi neng dadao zuizhong qianyue de mudi.

Japanese: Koutei wa, kisha sukejyu-ru doori to suru.

Chinese: Gongcheng jindu anzhao guigongsi richengbiao shishi.

English: The work will be done according to the schedule of your company.

The SF of this example is:

< J > X1 wa, X2 X3 doori to suru.

< C > X1 anzhao X2 X3.

Translation: Gongcheng anzhao guigongsi richengbiao.

Applying the SFs of these examples yields the translations shown above. Because the Chinese SF parts differ, the translated sentences differ from the original Chinese sentences even though a corresponding SF is found in the SF base.
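The way an SF produces a translation can be illustrated with a minimal sketch, using the second example above. It is not the SFBMT implementation. It assumes that the romanized sentence is already split into tokens by spaces, that each variable X1, X2, ... binds exactly one noun token, and that a noun dictionary is available. The function names and the three-entry dictionary are hypothetical, and the SF parts are written with spaces between variables and words.

```python
import re

VARIABLE = re.compile(r"X\d+")

def compile_sf(j_template):
    """Turn the Japanese SF part into a regex; each Xn matches one noun token."""
    parts = re.split(r"(X\d+)", j_template)
    pattern = "".join(
        f"(?P<{p}>\\S+)" if VARIABLE.fullmatch(p) else re.escape(p)
        for p in parts
    )
    return re.compile(pattern)

def translate(sentence, j_template, c_template, noun_dict):
    """Return the filled Chinese SF part, or None if the SF does not match."""
    match = compile_sf(j_template).fullmatch(sentence)
    if match is None:
        return None
    # Translate the noun bound to each variable and substitute it
    # into the Chinese SF part.
    return VARIABLE.sub(lambda m: noun_dict[match.group(m.group(0))], c_template)

# Hypothetical noun dictionary covering only this example.
NOUNS = {"Koutei": "Gongcheng", "kisha": "guigongsi", "sukejyu-ru": "richengbiao"}

result = translate(
    "Koutei wa, kisha sukejyu-ru doori to suru.",
    "X1 wa, X2 X3 doori to suru.",
    "X1 anzhao X2 X3.",
    NOUNS,
)
print(result)  # Gongcheng anzhao guigongsi richengbiao.
```

The output reproduces the translation given for this example. It also makes the source of the mismatch visible: the words "jindu" and "shishi" in the original Chinese sentence are not part of the Chinese SF part, so no choice of nouns can recover them.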

Japanese: Omotoshite kishin shiteki no jikou ni tsuite cyousa wo okonatta.

Chinese: Zhuyao jiu guihan suo zhichu de xiangmu jinxing le diaocha.

English: The matter pointed out in your letter was chiefly investigated.

The SF of this example is:

< J > Omotoshite X1 shiteki no X2 ni tsuite, X3 wo okonatta.

< C > Zhuyao jiu X1 suo zhichu de X2 jinxing le X3.

Japanese: Konkai no shisutemu dounyuu ni yori kisha no gyoumu gourika ga issou shinten suru koto wo oinori moushi agemasu.

Chinese: ... de jinzhan.

English: I pray that the business rationalization of your company progress further by introducing this system.

The SF of this example is:

< J > X1 no X2 X3 ni yori X4 no X5 X6 ga issou shinten suru koto wo oinori moushi agemasu.

< C > Xiwang tongguo X1 X2 X3, X4 zai X5 X6 fangmian qude gengda de jinzhan.

The two examples above were translated successfully. The following examples failed:

Japanese: Jyouki ni yori aratamete keiyakusyo wo sakusei no koto toshitainode, go tehai negaimasu.

Chinese: Women xiwang yizhao shangmian suolie shixiang chongxin zhiding hetongshu, qing yuyi anpai.

English: We want to remake the contract by the above mentioned items, please make an arrangement.

The SF of this example is:

< J > X1 ni yori aratamete X2 wo X3 no koto toshitainode, go X4 negaimasu.

< C > Xiwang yizhao X1 chongxin X3 X2, qing yuyi X4.

Translation: Xiwang yizhao shangji chongxin zhiding hetongshu, qing yuyi anpai.

Japanese: Kisha kaihatsu no shisutemu dounyuu ga chienshi, gyoumu ikou ni sukunaku nakarazu sisyou wo kitashite orimasu.

Chinese: Youyu guigongsi xitong kaifa de yanwu, yi yanzhong yingxiang le bengongsi de yewu jinzhan.

English: Because the introduction of the system which was developed by your company is delayed, it has interfered with the business progress of our company.

The SF of this example is:

< J > X1 kaihatsu no X2 ga chienshi, X3 X4 ni sukunaku nakarazu X5 wo kitashite orimasu.

< C > Youyu X1 X2 kaifa de yanwu, yi dui X3 X4 dailai le yanzhong de X5.

Translation: Youyu guigongsi xitong kaifa de yanwu, yi dui yewu jinzhan dailai le yanzhong de

As mentioned earlier, when an SF is extracted from the corpus and the nouns in the Japanese and Chinese sentence pair do not correspond, the needed noun should be added to the SF part in order to get the correct translation. Because the expressions in business letters are so varied, and the corpus for this research is still not large enough, it is difficult to handle all sentence pairs of this kind.

Here is another failed example:

Japanese: Kongo tomo yoroshiku onegaiita (Hiragana) shimasu.

Chinese: Jinhou hai qing nin duoduo guanzhao.

English: We look forward to a continued working relationship with you.

The SF of this example is:

< J > X1 tomo yoroshiku onegaiita (Kanji) shimasu.

< C > X1 hai qing nin duoduo guanzhao.

Because of the difference between the Hiragana and the Kanji characters, we could not get the expected translation. Analyzing the sentences for which no corresponding SF was found in the SF base reveals another main cause: differences in word endings. For example:

Japanese: Saraisyuu atari gotsugou wa ikagadesyou ka ?

Chinese: Daxiazhou zuoyou shifou fangbian ne ?

English: How about the convenience around the week after next?

The SF of this example is:

< J > X1 atari go X2 wa ikagadesyou ka.

< C > X1 zuoyou shifou X2 ne ?

Japanese: Ano syouhin wa hontou ni jyouhi da.

Chinese: Nage shangpin zhende shi hen jingzhi.

English: That product is really elegant.

The SF of this example is:

< J > Ano X1 wa hontou ni X2 desu.

< C > Nage X1 zhende shi hen X2.

Because of the difference between [∼.] and [∼?], or between [∼da.] and [∼desu.], no corresponding SF could be found in the SF base.
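The effect of a differing ending can be reproduced with a small sketch of strict matching. It assumes, purely for illustration, that the Japanese SF part must match the whole romanized sentence exactly, with each variable binding one token. The helper name is hypothetical, and this is not the RSF mechanism.

```python
import re

def sf_pattern(j_template):
    """Compile a Japanese SF part into a regex; each Xn matches one token."""
    parts = re.split(r"(X\d+)", j_template)
    return re.compile("".join(
        r"(\S+)" if re.fullmatch(r"X\d+", p) else re.escape(p)
        for p in parts
    ))

sf = sf_pattern("Ano X1 wa hontou ni X2 desu.")

plain = "Ano syouhin wa hontou ni jyouhi da."     # sentence from the test set
polite = "Ano syouhin wa hontou ni jyouhi desu."  # same sentence with the SF's ending

print(sf.fullmatch(plain) is not None)   # False: "da." does not match "desu."
print(sf.fullmatch(polite) is not None)  # True
```

Under strict matching, the nouns and the rest of the pattern agree, yet the lookup fails solely because of the sentence ending.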

To solve the problem of the difference between the Hiragana and the Kanji characters, we can output the pronunciation from the result of the morphological analysis and use it to get the correct translation. The RSF can help us deal with problems caused by different word endings.

4.3 Summary

In this chapter, I first gave a detailed description of the user requirements and a review of business letters. Based on the special functionalities and properties of business letters, our system treats only nouns as the variable parts of a sentence and uses the SF-based approach for translation. To evaluate the system, we carried out an evaluation experiment. Of the 250 evaluation sentences, 183 (73%) were translated successfully and 204 (81%) matched an SF in the SF base. Among the latter, 21 sentences (8%) matched an SF in the SF base but received a translation different from the expected sentence.

Chapter 5

Translating Compound Nouns in SFBMT

In this chapter, I will discuss the challenges of automatically translating Japanese compound nouns into Chinese in the SFBMT system. My interest in compound nouns stems from the realisation that they are highly frequent and highly productive in Japanese. In our studies on SFBMT we have found that compound nouns are difficult to translate correctly. To enhance the performance of the system and the quality of translation, we propose a shallow solution for translating compound nouns in SFBMT. More specifically, I will be concerned with the translation of productively formed noun + noun compounds.

5.1 Overview

Compound nouns are used very frequently in languages such as Japanese, Chinese and English, and are often important words that determine the semantic content of a document. When we read or write a document in a foreign language, we need more knowledge than an ordinary dictionary provides, such as terminology and words relevant to current affairs. Such expressions can be made up of multiple words, and there are almost infinitely many possible variations. These compound words are too numerous to be contained in a manually created dictionary, so automatic acquisition of their translations is highly desirable.

The translation of compound nouns has become a major issue in machine translation because of their frequency of occurrence and high productivity. Compound words pose well-known problems for linguistic description in general, and some additional ones for natural language processing, such as identification, segmentation, disambiguation and interpretation. All of these problems make compounds particularly difficult to handle in a system performing automatic translation, such as a machine translation system or a system for cross-language information retrieval. First, the relation between the parts of a compound is implicit, and thus its interpretation is never wholly compositional. Whereas the interpretation of clauses is guided by syntactic clues, such as word order and morphemic markers, the meaning of a compound cannot be fully recovered from the surface structure. This is a particular problem when translating from a language that makes frequent use of compounds into a language that generally prefers syntactic constructions, since an overt syntactic marker (usually a preposition) then has to be generated. Secondly, even languages that make frequent use of compounds differ as to when a compound is preferred over some other construction type; that is, when to compound seems in part to be an arbitrary decision of each language.

In document SUPER-FUNCTION BASED MACHINE TRANSLATION SYSTEM FOR BUSINESS USER. Xin Zhao (Page 94-102)