Many different approaches to MT have been proposed and developed. In this section, I give an overview of some related work and discuss the differences between those methods and ours.
3.4.1 Pattern-Based Machine Translation
The basic idea of Pattern-Based Machine Translation (PBMT) is to translate from a source language into a target language using Translation Patterns (TP), with all the knowledge necessary for the translation written in the patterns. A translation pattern is a pair consisting of a source Context-Free Grammar (CFG)-rule and its corresponding target CFG-rule [67, 74]. Table 3.3 gives examples of translation patterns.
(P1) take:VERB:1 a look at NP:2 ⇒ VP:1    VP:1 ⇐ NP:2 wo(dobj) miru(see):VERB:1
(P2) NP:1 VP:2 ⇒ S:2    S:2 ⇐ NP:1 wa VP:2
(P3) PRON:1 ⇒ NP:1    NP:1 ⇐ PRON:1
Table 3.3: Examples of Translation Pattern
(P1) is a translation pattern for the English colloquial phrase "take a look at," and (P2) and (P3) are general syntactic translation patterns. In the above patterns, the left half of a pattern (like "A B C ⇒ D") is a source CFG-rule and the right half (like "A ⇐ B C D") is a target CFG-rule. An index number represents the correspondence between terms on the source and target sides and also indicates the head term (the term having the same index as the left-hand side of a CFG-rule). Further, features can be attached to each term as matching conditions. The pattern-based MT engine parses an input sentence with a chart-type CFG parser, using the source sides of the translation patterns. The target structure is constructed by synchronous derivation, which combines the target sides of the translation patterns that were used to build the parse. Figure 3.4 shows the architecture of a PBMT system.
[Figure: components shown include Bilingual Texts, Source Sentences, Morphological Analyzer, Parser/Generator, Morphological Synthesizer, Post Generator, Translation Engine, Failure Recovery, Translation Sentences, and Post-edited Sentences, together with the Sentence, Pattern, General, System, and User dictionaries]
Figure 3.4: Architecture of PBMT
Translation of an input string S to a target string T essentially consists of the following three steps:
1. Parsing S with the source CFG skeletons to build a source CFG derivation sequence.
2. Propagating link constraints from source to target CFG skeletons to build a target CFG derivation sequence.
3. Generating T from the target CFG derivation sequence.
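The last two steps can be illustrated with a minimal Python sketch. It assumes the source parse has already been produced and is given as a nested derivation; it is not the chart parser of an actual PBMT engine. The target sides follow patterns (P1) to (P3) of Table 3.3, while the `LEXICON` entries, the `generate` function, and the sample derivation are hypothetical names introduced only for illustration.

```python
# Target sides of (P1)-(P3). An int is the index of a linked source term;
# a str is a literal target word introduced by the pattern itself.
TARGET_SIDES = {
    "P1": (2, "wo", "miru"),  # VP:1 <= NP:2 wo(dobj) miru(see):VERB:1
    "P2": (1, "wa", 2),       # S:2  <= NP:1 wa VP:2
    "P3": (1,),               # NP:1 <= PRON:1
}

# Hypothetical lexicon for leaf words (an assumption, not from the source).
LEXICON = {"she": "kanojo", "it": "sore"}


def generate(node):
    """Build the target word list from a source derivation node.

    A node is either a leaf word (str) or a pair
    (pattern_name, {index: child_node}) recording which pattern was
    used in the parse and which sub-derivation fills each indexed term.
    """
    if isinstance(node, str):
        return [LEXICON[node]]
    name, children = node
    words = []
    for term in TARGET_SIDES[name]:
        if isinstance(term, int):
            # Follow the link from the target term to the source child.
            words.extend(generate(children[term]))
        else:
            words.append(term)
    return words


# Assumed source derivation for "she take a look at it":
# S -> NP VP (P2), NP -> PRON (P3), VP -> take a look at NP (P1).
derivation = ("P2", {
    1: ("P3", {1: "she"}),
    2: ("P1", {2: ("P3", {1: "it"})}),
})

print(" ".join(generate(derivation)))  # kanojo wa sore wo miru
```

Because each target side refers to its children only by index, reordering (here, moving the object before the verb) falls out of the pattern itself and needs no separate transfer rule.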
PBMT was proposed to satisfy three requirements of the market: efficiency, scalability, and ease of use. To achieve the best possible average runtime and accuracy, PBMT should be combined with more powerful grammar formalisms. The difficult problems in PBMT are how to translate light-verb phrases and how to find the pairs of CFG-rules. Furthermore, writing new patterns is difficult because of the lack of flexibility in describing constraints on the features associated with a non-terminal: for instance, a new non-terminal is required for each semantic condition, so a deep understanding of the internals of the system is necessary in order to add new patterns.
The SF of SFBMT is similar in structure to the TP of PBMT. However, because the nodes of an SF have no semantic attributes and no detailed syntactic or semantic analysis is needed, SFBMT does not have the above-mentioned problems of PBMT.
3.4.2 Glossary-Based Machine Translation
A Glossary-Based Machine Translation (GBMT) system was first developed as part of the Pangloss project [50, 9, 51, 13].
The GBMT engine uses a bilingual glossary and a bilingual dictionary to produce a translation of phrases in a source text. The input to the engine is a flat tree structure in which the root represents the entire text, the intermediary nodes are sentence nodes, and the leaves are analyzed lexical tokens that also contain the translation(s) of each token. The GBMT engine is parameterized by a bilingual glossary, which is essentially a phrasal dictionary: a glossary entry contains a source phrase pattern, a set of corresponding target phrase patterns, and correspondences between variables in the source and target patterns. A GBMT system produces a phrase-by-phrase translation of the source text, falling back on word-by-word translation when no phrase from the glossary matches the input. Thus, the size of the glossary and the flexibility of the pattern language are crucial for good translations.
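The phrase-by-phrase behavior with word-by-word fallback can be sketched as follows. This is a simplified illustration, not the Pangloss engine: it works on a flat token list rather than a tree, allows one target pattern per entry, and lets a variable match exactly one token. The glossary and dictionary entries, the `$X` variable notation, and all function names are assumptions made for the example.

```python
# Hypothetical glossary: (source phrase pattern, target phrase pattern).
# A token starting with "$" is a variable shared by both patterns.
GLOSSARY = [
    (("a", "pesar", "de", "$X"), ("in", "spite", "of", "$X")),
    (("por", "favor"), ("please",)),
]

# Hypothetical bilingual dictionary used for the word-by-word fallback.
DICTIONARY = {"todo": "everything", "gracias": "thanks"}


def translate_word(word):
    """Dictionary lookup; unknown words are passed through unchanged."""
    return DICTIONARY.get(word, word)


def match_at(tokens, start, source_pattern):
    """Return variable bindings if the pattern matches at `start`, else None."""
    if start + len(source_pattern) > len(tokens):
        return None
    bindings = {}
    for pat, tok in zip(source_pattern, tokens[start:start + len(source_pattern)]):
        if pat.startswith("$"):
            bindings[pat] = tok
        elif pat != tok:
            return None
    return bindings


def translate(tokens):
    """Translate left to right, preferring the longest glossary match."""
    entries = sorted(GLOSSARY, key=lambda entry: -len(entry[0]))
    output, i = [], 0
    while i < len(tokens):
        for source_pattern, target_pattern in entries:
            bindings = match_at(tokens, i, source_pattern)
            if bindings is not None:
                # Instantiate the target pattern; bound variables are
                # translated with the dictionary.
                output.extend(
                    translate_word(bindings[t]) if t in bindings else t
                    for t in target_pattern
                )
                i += len(source_pattern)
                break
        else:
            # No glossary phrase matched here: fall back to a single word.
            output.append(translate_word(tokens[i]))
            i += 1
    return output


print(" ".join(translate("gracias a pesar de todo por favor".split())))
# thanks in spite of everything please
```

The sketch makes the dependence on glossary coverage concrete: every token that no pattern covers is handled by the fallback branch, which carries no information about word order or idiom.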
The GBMT engine processes source tree structures in four steps (Figure 3.5):
1. Glossary phrases are matched within the sentence sub-tree.
2. Target phrase patterns are added to the tree for each source phrase match.
3. Morphological information is transferred from source tokens to target tokens.
4. Agreement binding information is generated for each source phrase.
The tree structure manipulated by the GBMT engine contains both the source tree and the target tree, which are simply source and target projections of the same data structure. Each lexical token of the target tree is sent to the morphological generator, which produces the inflected surface form of the token. Finally, the resulting fully instantiated tree structure is processed to produce a target Tipster document, which contains alternative translations, tagging and morphological information, and constituent information stored as Tipster annotations.
Translation based on phrase pattern matching is fast and accurate with respect to the content of the document, and browsed documents can be translated almost in real time. A GBMT system for a language pair is also extremely simple, cheap, and fast to develop. Moreover, all language resources used by the system are entirely under the control of the user. However, the GBMT system falls short in translation accuracy and readability. In SFBMT, introducing the SF variables will reduce the size of the glossary, and the SF will improve translation quality in aspects such as word order and conventional expressions.
[Figure: tree structures (deep source, flat source, partial target, full target) connected by the processing steps Source Phrase Matching, Creation of Target Phrases, Morphological Transfer, and Generation of Morphological Agreement]
Figure 3.5: Process of GBMT
3.4.3 Example-Based Machine Translation
Example-Based MT (EBMT) is essentially translation by analogy. The basic philosophy of EBMT is: "Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference." (Nagao [45]). The EBMT approach uses raw, unanalysed, unannotated bilingual data and a set of SL and TL lexical equivalences, mainly expressed as word pairs (with SL and TL verb equivalences expressed as case frames), as the linguistic backbone of the translation process. EBMT systems are attractive in that they require a minimum of prior knowledge and are therefore quickly adaptable to many language pairs. In EBMT, instead of using explicit mapping rules for translating sentences from one language to another, the translation process is mainly a matching process that aims to locate the best match, in terms of semantic similarity, between the input sentence and the available examples in the database. The main advantage of the EBMT approach is that it is capable of non-literal translations. Another advantage is that EBMT does not require an extraordinary amount of work to port to other languages. The general EBMT architecture is shown in Figure 3.6.
[Figure: Source Text passes through three steps (Find most analogous examples, Retrieve corresponding target language examples, Combine examples) to give Target Text; the steps draw on Source Language Examples and Target Language Examples linked by a Correspondence]
Figure 3.6: Architecture of EBMT
The system begins with the input, referred to as the source text. The most similar and analogous examples are retrieved from the source language database. The next step is to retrieve the corresponding translations of the analogous examples. The final step is to recombine the examples into the final translation.
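These three steps can be sketched in Python under strong simplifying assumptions. The example database is hypothetical, the input is assumed to be already decomposed into fragments, recombination is plain concatenation, and surface word overlap (via the standard library's `difflib.SequenceMatcher`) stands in for the semantic similarity measure described above.

```python
from difflib import SequenceMatcher

# Hypothetical bilingual example database: (source example, target example).
EXAMPLES = [
    ("good morning", "ohayou gozaimasu"),
    ("thank you very much", "doumo arigatou gozaimasu"),
    ("see you tomorrow", "mata ashita"),
]


def similarity(a, b):
    """Word-level surface similarity in [0, 1]; a stand-in for a
    semantic similarity measure, which this sketch does not implement."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def most_analogous(fragment):
    """Step 1: find the stored source example closest to the fragment."""
    return max(EXAMPLES, key=lambda example: similarity(fragment, example[0]))


def translate(fragments):
    """Steps 2 and 3: retrieve each matched example's target side and
    recombine the pieces (here, by simple concatenation)."""
    return " ".join(most_analogous(fragment)[1] for fragment in fragments)


print(most_analogous("thank you so much")[0])            # thank you very much
print(translate(["good morning", "see you tomorrow"]))   # ohayou gozaimasu mata ashita
```

Note that `max` always returns some example, even when nothing in the database resembles the fragment; a practical system would need a similarity threshold and a fallback for unmatched input.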
A basic premise in EBMT is that, if a previously translated sentence occurs again, the same translation is likely to be correct. In many cases an EBMT system does not provide a translation by itself but suggests similar sentences in the target language, thus helping to ensure consistency of style and terminology. EBMT uses a bilingual corpus and a bilingual dictionary, where the latter can be constructed from the corpus, e.g. by statistical methods. Because translations are matched from the corpus, EBMT can reproduce the translation style of the corpus. Over the years, so-called light or shallow versions of EBMT have been proposed [47, 72, 4, 5], in which EBMT can function using nothing more than sentence-aligned plain text and a bilingual dictionary. In the translation process, such a system looks up all matching phrases in the source language and performs a word-level alignment on the entries containing matches to determine a translation. Portions of the input for which there are no matches in the corpus do not generate a translation from the examples but can be translated using a bilingual dictionary and a phrasal glossary.
Shallow EBMT performs no deep linguistic analysis but compares surface strings (possibly with some morphological analysis); in this respect it is very similar to SFBMT. However, a shallow EBMT engine may be unable to deal properly with translations that do not involve one-to-one correspondences between source and target words. In SFBMT, extracting the constant parts (SF) of a sentence and matching them against the SF base for translation may overcome these problems.