FAIL-SOFT ("EMERGENCY") MEASURES IN A PRODUCTION-ORIENTED MT SYSTEM
Eva Hajičová and Zdeněk Kirschner
Faculty of Mathematics and Physics, Charles University
Malostranské n. 25, 118 00 Praha 1, Czechoslovakia
ABSTRACT
A system of fail-soft (emergency) measures for a production-oriented MT system is discussed, stating first the specific purposes of such a system, and showing then how these measures are being used in the system of English-to-Czech machine translation as prepared by the group of mathematical linguistics at Charles University in Prague.
1. In view of a production-oriented system of machine translation, under the present-day conditions, one should keep in mind that the end-user expects to have at his disposal a complete text rather than an alternating sequence of (sentence) segments and blanks. On the other hand, everyone who has ever taken even a perfunctory look at the problems involved in MT would agree that there is no such thing as a "complete" MT system, neither in the dictionary nor in the grammar part of it. Also, as is commonly accepted nowadays, any system with texts written in natural language at the input should provide measures for some kind of treatment of ill-formed input. Thus it is inevitable to consider, first, what type and quality of translation can meet the demands of a prospective user and what kind of translation is realizable under the given conditions, and, second, to decide what is to be sacrificed from, and what added to, the system to make it work in a production environment.
In the present paper we would like to outline one aspect of the approach taken up at the start of our experiment APAC3-2 - the English-to-Czech machine translation system for translating INSPEC abstracts from the field of microelectronics (for a description of the history of the MT efforts of our team, see Hajičová, 1986; the APAC series is described in full detail in Kirschner, 1982; 1984; in press). We should emphasize that in the conditions within our reach, we aim at a satisfactorily accurate rendering in the target language of the contents of a relatively simple text in the source language, which would suffice for such a system to be applicable in information acquisition and would be capable of meeting the main requirements set up by average users.
2.1 The specific purposes of the system of fail-soft ("emergency") measures to overcome anomalous input phenomena and partial failures of the MT system can be stated as follows:
The drawback is that in these circumstances the danger of a dilemmatic situation resulting in a cul-de-sac impasse increases. A special device has been introduced to overcome such an abnormal end: instead of eliminating such a defective string, the application of the phase in question is suspended - the program processing this string skips the phase and continues in the next one. The rules that compensate for the lacuna can be either the rules that in the framework of the "preferential" approach take up the role of their more strict predecessors, or rules added particularly for this purpose - to deal with the most undesirable consequences of such an omission.
- In the analysis, the emergency rules interpret unrecognized elements and integrate them into more complex structures.
- In the synthesis, they help to produce an output that makes sense, corresponds to the source language, and is easier to post-edit.
- Whenever it is possible, they attempt to form target language equivalents for the unidentified elements, either by adapting international words or by "czechizing" English dictionary forms by enduing them with qualities and forms proper to their presumptive Czech counterparts - e.g., gender, suffixes, etc.
- With some classes of words, they serve as general dictionary rules, provided the sets of semantic features, frame information and other necessary outfit of individual members of these classes correspond to the standard apparatus assigned to their representation in the framework of a general device, and their orthography ensures forming correct equivalents in Czech.
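The phase-skipping device mentioned in this section (a phase that runs into a dead end is suspended for the string in question, and processing continues in the next phase) can be illustrated by a minimal Python sketch. It is not the APAC3-2 implementation: the names `PhaseFailure`, `run_phases`, `strict_phase` and `liberal_phase` are hypothetical, and the phases are toy string functions.

```python
from typing import Callable, List, Tuple


class PhaseFailure(Exception):
    """Hypothetical signal: a phase reached a dead end on this string."""


Phase = Tuple[str, Callable[[str], str]]


def run_phases(text: str, phases: List[Phase]) -> Tuple[str, List[str]]:
    """Apply the phases in order; a failing phase is skipped, not fatal."""
    skipped: List[str] = []
    for name, phase in phases:
        try:
            text = phase(text)
        except PhaseFailure:
            # Fail-soft: keep the string unchanged and go on to the next phase.
            skipped.append(name)
    return text, skipped


def strict_phase(s: str) -> str:
    # Toy stand-in for a phase whose strict rules do not apply.
    raise PhaseFailure("no rule applies")


def liberal_phase(s: str) -> str:
    # Toy stand-in for a later, more liberal phase that always succeeds.
    return s.upper()


result, skipped = run_phases(
    "getter", [("strict", strict_phase), ("liberal", liberal_phase)]
)
print(result, skipped)
```

Recording the names of the skipped phases keeps the failure visible for later diagnosis while the string still reaches the output.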
2.2 The fail-soft measures can be characterized as consisting of three main parts: the first two concern elements not found in the basic dictionaries and the third concerns failures to arrive at an accomplished parse. (We leave aside a discussion of the unification of orthography - such as American and British usage, different ways of spelling, use of hyphens etc. - which comes before the first device described here.) In a sense, there is a set of other rules of "emergency" character: general rules (which can be called "sweeping rules") designed to operate after all more specific rules have failed to apply - e.g., in the formation of compounds or in nominal syntax in general, etc.; however, this being a constitutional component of what we call the "preferential" approach, we shall confine ourselves to describing only the former three sets.
To avoid a possible misunderstanding, we should make clear that when we call our approach "preferential", it is only the name that it has in common with Wilks' "preferential semantics". In our system, we apply a rather trivial and simple principle with the aid of which the different probability of interpretation(s) of some parts of a string is taken into account and exploited. The most probable solutions are covered by the rules first and with as detailed an accuracy as possible; the next probable solution is offered in some of the subsequent phases, etc., under more liberal conditions. The "sweeping rules" come last. That is also the reason why we write "preferential" with quotation marks.
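The ordering principle just described (most probable and most strictly conditioned rules first, more liberal ones later, "sweeping rules" last) amounts to an ordered rule cascade. The following Python sketch shows only that control flow; the three rules and their conditions are invented for illustration and do not come from the APAC3-2 grammar.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, Callable[[str], Optional[str]]]


def specific_rule(s: str) -> Optional[str]:
    # Hypothetical strict rule: only hyphenated items count as compounds.
    return "COMPOUND" if "-" in s else None


def liberal_rule(s: str) -> Optional[str]:
    # Hypothetical, more liberal rule tried in a later phase.
    return "NAME" if s[:1].isupper() else None


def sweeping_rule(s: str) -> Optional[str]:
    # Sweeping rule: applies to whatever the earlier rules left unresolved.
    return "NOUN"


# Order encodes preference: the most probable reading is tried first.
CASCADE: List[Rule] = [
    ("specific", specific_rule),
    ("liberal", liberal_rule),
    ("sweeping", sweeping_rule),
]


def interpret(s: str, cascade: List[Rule] = CASCADE) -> Tuple[str, str]:
    """Return (rule name, reading) for the first rule that applies."""
    for name, rule in cascade:
        reading = rule(s)
        if reading is not None:
            return name, reading
    raise ValueError("cascade has no sweeping rule")


print(interpret("fail-soft"))
print(interpret("Prague"))
print(interpret("abend"))
```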
2.21 The first device aimed at intercepting and interpreting words that failed to be found in the basic dictionaries is the so-called transducing dictionary (TD). Its task is to interpret the still unrecognized words according to their typical and (mostly) productive suffixes (the inflectional endings being detached and dictionary forms reconstructed by morphemic analysis in the preceding steps), and to assign to them part-of-speech and semantic information. Thus, e.g., words ending in -ER, -OR, -GRAPH, -ODE and some others are interpreted as nouns, concrete, instruments, capable of being substituted for a human actor; words ending in -CS, -CY, -ESS, -TUDE are supposed to be nouns, abstract, properties and, as distinct from those ending in -ITY, -ICS, -SM, -SHIP, -HOOD, -THM, which otherwise have the same semantic characteristics, they form adjectives in a regular manner in Czech; the endings -FY, -ATE, -ISE (-IZE), -DUCE indicate verbs that can be both transitive and intransitive, of causative and (semi)terminological character, yet not allowed to form adjectives of the purposive type. A number of adjectival suffixes is contained, too, viz. -ARY, -AL, -RSE, -IVE, -OUS, -IC, -BLE, -LESS, -ANAR, -LEAR, -NEAR, -OLAR, -ULAR. In all, about 50 classes of nouns, 13 of adjectives and 4 of verbs are covered by the TD device.
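The suffix-driven interpretation performed by the TD can be sketched as a table lookup. The Python example below uses only the suffixes listed above; the feature labels, the longest-suffix-first matching (which lets -ICS take precedence over -CS) and the name `classify` are assumptions made for the illustration, not details of the actual device.

```python
from typing import List, Optional, Tuple

# (suffixes, (part of speech, semantic features)); labels paraphrase the text.
SUFFIX_CLASSES = [
    (("ER", "OR", "GRAPH", "ODE"), ("noun", "concrete, instrument")),
    (("CS", "CY", "ESS", "TUDE"),
     ("noun", "abstract, property, regular adjective")),
    (("ITY", "ICS", "SM", "SHIP", "HOOD", "THM"),
     ("noun", "abstract, property")),
    (("FY", "ATE", "ISE", "IZE", "DUCE"), ("verb", "causative")),
    (("ARY", "AL", "RSE", "IVE", "OUS", "IC", "BLE", "LESS",
      "ANAR", "LEAR", "NEAR", "OLAR", "ULAR"), ("adjective", "")),
]

# Flatten the table and try longer suffixes first (an assumed strategy).
TABLE: List[Tuple[str, Tuple[str, str]]] = sorted(
    ((suffix, info) for suffixes, info in SUFFIX_CLASSES for suffix in suffixes),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def classify(word: str) -> Optional[Tuple[str, str]]:
    """Guess part of speech and features of an unrecognized dictionary form."""
    w = word.upper()
    for suffix, info in TABLE:
        # Require a non-empty base in front of the suffix.
        if w.endswith(suffix) and len(w) > len(suffix):
            return info
    return None  # no characteristic suffix: left to the next device


for w in ("OSCILLOGRAPH", "ELECTRONICS", "MAGNETIZE", "BIPOLAR", "ABEND"):
    print(w, classify(w))
```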
Two further pieces of information should be added, the first being probably superfluous: 1) All words having such suffixes but different properties as regards their part-of-speech category, semantic features, etc., are supposed to be contained in the basic dictionaries. 2) Most of the classes of words treated by the TD are international words of Latin or Greek origin; they can easily be "transduced" to Czech by relatively simple procedures; some of these procedures precede the TD operation as a part of a special morphemic analysis, but most of them operate in the synthesis, as an accessory to the English-Czech dictionary.
A set of recursively applied rules (in several cycles) takes over the words identified by the TD, disintegrates them, replaces the English suffixes by the corresponding Czech ones, and scans the bases for spelling configurations to be transformed or adapted to Czech orthography (replacing, e.g., PH by F, TH by T, C preceding A, L, O, R, T, U by K; S preceded by A, E, I, N, O, R, Y and followed by A, E, I, O is replaced by Z, etc.). Thus, e.g., PHOTOLITHOGRAPHIC changes into FOTOLITOGRAFICKE2, CYCLOTRON gives CYKLOTRON, ISOSMOTIC is transcribed as IZOSMOTICKE2.
To give an example of solving similar problems, let us consider the word ISOSEISMIC: to preclude the second S, situated at a morphemic juncture, from becoming a Z would require either a special entry in the main dictionary - as one word or as a combination of the prefixal ISO + SEISMIC, in which case the adjective must be contained in the dictionary - or some similar preliminary treatment in the special morphemic analysis preceding the TD; the latter way of treatment would probably represent the best solution, which may be generalized for all or most of the typical terminological prefixes involving problems analogous to IZOSEISMICKE2 - e.g., A-, INFRA-, PRE-, PERI-, SEMI-, SYN-, MESO-, MONO-, HYPER-, POLY- etc. (needless to add that this time it would be such words as ISOSMOTIC that would require a specific treatment, e.g. to process only SMOTIC - from ISO + SMOTIC - in the dictionary). It should be remarked that, in principle, this part of the transducing device - orthographical changes - need not be separated from the front part operating in the analysis.
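The orthographic part of the transduction can be approximated with a few regular-expression substitutions. The Python sketch below encodes only the four spelling rules and the one suffix pair (-IC giving -ICKE2) attested in the examples above; it applies them in a single pass rather than in several recursive cycles, and the function names and the optional `prefix` argument (modelling the preliminary prefix detachment discussed for ISOSEISMIC) are assumptions.

```python
import re

# Spelling rules quoted in the text, applied in this order.
SPELLING_RULES = [
    (re.compile(r"PH"), "F"),
    (re.compile(r"TH"), "T"),
    (re.compile(r"C(?=[ALORTU])"), "K"),               # C before A, L, O, R, T, U
    (re.compile(r"(?<=[AEINORY])S(?=[AEIO])"), "Z"),   # S between listed letters
]

# English suffix -> Czech suffix; only the pair attested in the examples.
SUFFIX_MAP = {"IC": "ICKE2"}


def respell(part: str) -> str:
    """Apply the spelling rules to one base (or one detached morpheme)."""
    for pattern, replacement in SPELLING_RULES:
        part = pattern.sub(replacement, part)
    return part


def transduce(word: str, prefix: str = "") -> str:
    """'Czechize' a dictionary form; `prefix` is respelled separately."""
    base, czech_suffix = word.upper(), ""
    for english, czech in SUFFIX_MAP.items():
        if base.endswith(english):
            base, czech_suffix = base[: -len(english)], czech
            break
    if prefix and base.startswith(prefix):
        # Respelling prefix and stem apart keeps the S rule from firing
        # across the morphemic juncture.
        return respell(prefix) + respell(base[len(prefix):]) + czech_suffix
    return respell(base) + czech_suffix


print(transduce("PHOTOLITHOGRAPHIC"))         # FOTOLITOGRAFICKE2
print(transduce("CYCLOTRON"))                 # CYKLOTRON
print(transduce("ISOSMOTIC"))                 # IZOSMOTICKE2
print(transduce("ISOSEISMIC"))                # IZOZEISMICKE2 (unwanted Z)
print(transduce("ISOSEISMIC", prefix="ISO"))  # IZOSEISMICKE2
```

The last two calls reproduce the ISOSEISMIC problem: without prefix detachment the S at the morphemic juncture is turned into Z, while treating ISO- separately yields the intended form.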
rest are first treated as proper names and if the subsequent analysis fails to confirm this conjecture - i.e., they are not integrated into wider nominal complexes, e.g., as an apposition - they become nouns (which, by the way, happens to the tentative adverbs, too). The words identified in this tentative manner are "czechized", which in some cases might result in quite acceptable formations - e.g., if the original words can be taken as "international" or technically and terminologically univocal terms: GETTERING --> GETEROVA2NI2, ABEND --> ABENDOVAT - in other cases in more or less comical "macaronic" creations.
In conclusion, it should be added that the original, more ambitious idea of assigning to each unrecognized word (that does not carry any characteristic clue making the guess easier) three parallel tentative interpretations to let the syntactic analysis decide - noun, verb, adverb - had to be abandoned for reasons similar to those that led to the resignation in the case of hypersentential context. Too many possibilities, often combined with other parallel solutions, led to a combinatorial explosion that (though often not assuming the character of an infinite loop) expanded the structures to such an extent that sooner or later an overflow became inevitable. So far, there is no remedy for overflow in our system.
2.23 The last, relatively simple, measure concerns cases where a single parse (or more parallel single parses) - i.e., trees covering individual input strings - failed to be formed in the last phase of the analysis; usually two or more partial trees are formed instead, which may be caused by an anomalous structure of the input string, by some partial failure in analyzing one or more substrings (e.g., when some element(s) or structure(s) were misinterpreted), or by some subjective shortcomings in the program - omission, error, etc. The synthesis program is able to process even such partial and fragmentary results and attempt to compile an acceptable output; only a special character ( ~ or } ) is placed in front of such output strings to signal that they have been formed on the basis of defective results of the analysis. If necessary, a set of rules of a more or less ad-hoc character deprives "underdone" (sub)trees of all auxiliary structures (category labels, parentheses, separators, features, etc.), leaving only lexical values, and thus performs the finishing touches to bring the substitute output as close to readable and acceptable results as possible.
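The stripping of "underdone" (sub)trees down to their lexical values, together with the warning character placed in front of the output, can be sketched as follows. This Python example is only an illustration under simplifying assumptions: trees are nested tuples of the form (label, child, ...), leaves are plain strings, the marker "~" is a placeholder, and a complete parse is flattened the same way instead of going through a real synthesis.

```python
from typing import List, Union

# A tree is a leaf word or a tuple (category label, child, child, ...).
Tree = Union[str, tuple]


def lexical_values(tree: Tree) -> List[str]:
    """Drop category labels and brackets, keeping only the leaf words."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    words: List[str] = []
    for child in children:
        words.extend(lexical_values(child))
    return words


def substitute_output(trees: List[Tree], marker: str = "~") -> str:
    """Join the words of all trees; flag the output if the parse was partial."""
    text = " ".join(word for tree in trees for word in lexical_values(tree))
    # More than one tree means the analysis did not reach a single parse.
    return text if len(trees) == 1 else f"{marker} {text}"


partial = [("NP", ("ADJ", "thin"), ("N", "film")), ("VP", ("V", "grows"))]
complete = [("S", ("NP", ("N", "film")), ("VP", ("V", "grows")))]
print(substitute_output(partial))
print(substitute_output(complete))
```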
3. The outputs of individual phases can be obtained in the listing. Some of these phases, esp. the last-but-one phase fixing the state of things before the syntactic measures have been applied, usually preserve enough information to recognize and examine the unretouched results and to divulge the diagnosis of errors or shortcomings necessary for further progress. This is to say that most of the "emergency" devices operate at moments and in a manner which permit examination of the previous state of things, so that their action does not obscure the regular course of the processing and allows normal control of it. It should be added that a part of the emergency devices has a temporary character, dealing with omissions and bugs proper to the system under development. We are sure that at least some of them will become superfluous.
REFERENCES
Hajičová, E. (1986) Machine Translation Research in Czechoslovakia, Proceedings of the Int. Conference on Translation Mechanization, August 20-22, 1986, Copenhagen
Kirschner, Z. (1982) A Dependency-Based Analysis of English for the Purpose of Machine Translation, Explizite Beschreibung der Sprache und automatische Textbearbeitung IX, Prague
Kirschner, Z. (1984) On a Dependency Analysis of English for Automatic Translation. In: Contributions to Functional Syntax, Semantics and Language Comprehension (ed. by P. Sgall), Prague, 335-358
Kirschner, Z. (in press), APAC3-2: An English-to-Czech Machine Translation System. Explizite Beschreibung der Sprache und automatische Textbearbeitung XIII,