
Using a Natural-Artificial Hybrid Language for Database Access

Teruaki AIZAWA and Nobuko HATADA

NHK Technical Research Laboratories, 1-10-11, Kinuta, Setagaya, Tokyo 157, Japan

In this paper we propose a natural-artificial hybrid language for database access. The global construction of a sentence in this language is highly schematic, but allows expressions in the chosen language such as Japanese or English. Its artificial language part, SML, is closely related to our newly introduced data model, called scaled lattice. Adopting Japanese as its natural language part, we implemented a Japanese-SML hybrid language processing system for our compact database system SCLAMS, whose database consists of scaled lattices. The main features of this implementation are (1) a small lexicon and limited grammar, and (2) an almost free form in writing Kana Japanese.

1. Introduction

Various query languages for database access have been developed, among which unambiguous artificial ones are better adapted to computers. For man, on the other hand, it would be more convenient to communicate with computers in a natural language. The possibility of man-machine communication in a natural language has been one of the main concerns in the field of artificial intelligence, and considerable results have been obtained specifically in research into natural language access to a database. [1-5] These results, however, seem to be too complex and inflexible for practical application to general-purpose database systems.

We will propose in this paper a "natural-artificial hybrid" language for database access. The global construction of a sentence in this language is highly schematic but allows expressions in the chosen language such as Japanese or English. A Japanese version of this language has been implemented for our compact database system SCLAMS [6] (SCaled LAttice Manipulation System). The main features of this implementation are:

(1) Use of only a small lexicon and limited grammar, so that they are quite easy to implement, and

(2) Allowance of almost free form in writing Kana Japanese.

Feature (1), which will be achieved also when using other languages like English, French, and so on, is one of the most noticeable merits obtained by using such a natural-artificial hybrid language for database access.

We begin with an explanation of our basic logical unit of data, Scaled Lattice, or S.L. for short, since the proposed language is closely related to this unit.

2. SML: Scaled lattice manipulation language

2.1 Scaled lattice as a data model

What the normalization theory in the relational data model tells us can be stated very loosely as "one fact in one place". [8] The concept of Scaled Lattice, or S.L. for short, also goes along this direction.

Roughly speaking, an S.L. is a multi-dimensional table, and is defined as a collection of data of one species arranged at multi-dimensional lattice points corresponding to the combinations of attribute values. Fig. 1 shows a graphical image of an S.L. which represents population data by year, prefecture, and sex.

[Fig. 1: three-dimensional lattice of population data. Axes: Year (1950 to 1980), Prefecture, Sex. Annotated lattice points: male population of Tokyo in 1980; female population of Tokyo in 1980. All of the male population data are arranged along one axis.]

This is an example of a three-dimensional S.L., which can furthermore be regarded as a mapping, or a function of three variables in the mathematical sense. Let $S_1$, $S_2$, and $S_3$ be finite sets such as

$$S_1 = \{1950, 1951, \ldots, 1980\},$$

$$S_2 = \{\text{Tokyo}, \text{Osaka}, \text{Nagoya}, \ldots\},$$

and

$$S_3 = \{\text{male}, \text{female}\}.$$

Also let $A$ be an appropriate set having enough elements to represent values of population. Then the above S.L. can be naturally regarded as a mapping

$$F : S_1 \times S_2 \times S_3 \to A, \qquad (1)$$

which associates any triple $(x, y, z)$ of attribute values in $S_1 \times S_2 \times S_3$ with the corresponding population value $F(x, y, z)$. Thus, for example,

$$F(1980, \text{Tokyo}, \text{male})$$

denotes the male population of Tokyo in 1980.

Generally an S.L. is a mapping $F$ of the direct product of finite sets $S_1, \ldots, S_n$ into an appropriate set $A$, denoted by

$$F : S_1 \times \cdots \times S_n \to A. \qquad (2)$$

These sets $S_1, \ldots, S_n$ and their elements will sometimes be called root words and leaf words respectively.
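
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `scales`, `F`, and `dim` are hypothetical, and the population figures are dummy values, not SCLAMS data. Scales are finite lists of leaf words keyed by root word, and the S.L. is a mapping from their direct product into a set of values.

```python
from itertools import product

# Root words (scale names) mapped to their leaf words (scale elements).
# The contents are illustrative only.
scales = {
    "year": [1970, 1975, 1980],
    "prefecture": ["Tokyo", "Osaka", "Nagoya"],
    "sex": ["male", "female"],
}

# An S.L. is a mapping F : S1 x S2 x S3 -> A. Here A is the integers,
# and every lattice point receives a dummy value.
F = {
    point: 1000 * (i + 1)
    for i, point in enumerate(product(*scales.values()))
}

def dim(sl_scales):
    """Dimension of an S.L. = number of scales it is defined on."""
    return len(sl_scales)

print(dim(scales))                 # number of scales
print(len(F))                      # number of lattice points
print(F[(1980, "Tokyo", "male")])  # value stored at one lattice point
```

A dictionary keyed by tuples keeps the "one value per combination of attribute values" reading of the mapping explicit; a dense array indexed by leaf position would serve equally well.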

The following are the advantages of this data model:

(1) Data contained in an S.L. can be displayed exactly in the two-dimensional table form, which is visually very understandable.

(2) In order to display data in table form, it is necessary to cut out an appropriate two-dimensional cross section from the S.L., or more precisely to select two appropriate scales on which the table is constructed and, at the same time, to fix the remaining scales at some attribute values. This is nothing but a retrieval operation. Cutting out such a section is very easy, which means that certain retrieval operations are also easy.

(3) Since an S.L. is regarded as a mapping, precise and powerful notations concerning "sets and mappings" are directly applicable for manipulation of the S.L. data.

2.2 Brief outline of SCLAMS

We have implemented a compact database system SCLAMS (Scaled lattice manipulation system), whose database consists of S.L.'s. [6, 7] SCLAMS has the following three major modes:

(1) Storage mode: Storage of data as a set of S.L.'s, edited from any file into the database.

(2) Retrieval mode: Selection of one or more suitable S.L.'s from the database.

(3) Manipulation mode: Data extraction from the above S.L.'s and some operation on the data.

Thus, a retrieval operation according to a user's query is divided into two modes: Retrieval and Manipulation. Retrieval mode is similar to a document retrieval system, and Manipulation mode to a database system in a narrow sense, regarding each S.L. as a small file. The main concern of our design of SCLAMS was to combine these two modes effectively, in other words, to integrate the function of document retrieval systems and that of database systems.

2.3 Manipulation of scaled lattices by SML

In this paper we will focus our attention exclusively on Manipulation mode of SCLAMS. The major function of this mode is to manipulate S.L.'s in a variety of ways, such as extraction of data satisfying specified conditions, join of the data of two or more S.L.'s, elementary calculations on extracted data, etc. These operations are done through a query language for end users, named SML (Scaled lattice Manipulation Language).

We now show a few examples to illustrate some aspects of SML. Let F1 and F2 be two S.L.'s, i.e. two mappings such as

$$F_1 : S_1 \times S_2 \times S_3 \to A_1, \qquad (3)$$

$$F_2 : S_1 \times S_2 \to A_2, \qquad (4)$$

where

$$S_1 = \text{Year scale} = \{1950, 1951, \ldots, 1980\}, \qquad (5)$$

$$S_2 = \text{Prefecture scale} = \{\text{Tokyo}, \text{Osaka}, \text{Nagoya}, \ldots\}, \qquad (6)$$

$$S_3 = \text{Sex scale} = \{\text{male}, \text{female}\}, \qquad (7)$$

$A_1$ = set of population values, and $A_2$ = set of numbers of TV subscribers.

These S.L.'s may be considered as an output of Retrieval mode.

Each example below consists of an informal query and the corresponding formal one expressed in SML. Notice that the SML expressions contain the mathematical notations to describe sets and mappings.

Example 1. List the male population of Tokyo in 1980.

    LIST A;
    A = F1(1980, Tokyo, male);

Example 2. List names and the number of prefectures in which the male population in 1980 is greater than one million.

    LIST B, C;
    B = <X : F1(1980, X, male) > 1,000,000>;
    C = COUNT(B);

In this example B is defined as the set of prefectures X with the population value F1(1980, X, male) > 1,000,000, and C as the COUNT of B, where COUNT is one of the aggregate functions prepared in SCLAMS.

Example 3. List numbers of TV subscribers in 1980 of prefectures in which the female population in 1975 is less than one million.

    LIST NUM;
    NUM = F2(1980, P);
    P = <X : F1(1975, X, female) < 1,000,000>;

In this example the two S.L.'s F1 and F2 are related by the common scale S2.
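
To make the set notation concrete, Examples 1 to 3 can be mimicked in Python with comprehensions. This is a sketch only: the dictionaries `F1` and `F2`, the list `prefectures`, and every figure in them are invented for illustration and do not come from SCLAMS or from real statistics.

```python
# Toy S.L.'s with invented values: F1 is population, F2 is TV subscribers.
F1 = {
    (1975, "Tokyo", "female"): 5_800_000,
    (1975, "Osaka", "female"): 4_100_000,
    (1975, "Tottori", "female"): 300_000,
    (1980, "Tokyo", "male"): 5_900_000,
    (1980, "Osaka", "male"): 4_200_000,
    (1980, "Tottori", "male"): 290_000,
}
F2 = {
    (1980, "Tokyo"): 3_500_000,
    (1980, "Osaka"): 2_400_000,
    (1980, "Tottori"): 150_000,
}
prefectures = ["Tokyo", "Osaka", "Tottori"]  # the Prefecture scale S2

# Example 1: A = F1(1980, Tokyo, male)
A = F1[(1980, "Tokyo", "male")]

# Example 2: B = <X : F1(1980, X, male) > 1,000,000>;  C = COUNT(B)
B = [x for x in prefectures if F1[(1980, x, "male")] > 1_000_000]
C = len(B)

# Example 3: P = <X : F1(1975, X, female) < 1,000,000>;  NUM = F2(1980, P)
# F2 is applied to every element of the set P, joining F1 and F2 through S2.
P = [x for x in prefectures if F1[(1975, x, "female")] < 1_000_000]
NUM = {x: F2[(1980, x)] for x in P}

print(A, B, C, P, NUM)
```

The implicit set definition `<X : condition>` corresponds directly to a comprehension over the scale that X ranges over.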

The general format of a query, or a sentence, in SML is shown in Fig. 2.

    LIST a1, a2, ..., am;
    b1 = expression 1;
    b2 = expression 2;
    ...
    bn = expression n;

Fig. 2 General format of a query by SML

In this format each of the variables a1, ..., am is equal to one of b1, ..., bn, and the order of b1, ..., bn is arbitrary. The types of expressions can be classified into the following six categories:

1) Numeral or literal constants; e.g. 1980, Tokyo, male, etc.

2) Aggregate function values; e.g. COUNT(x), SUM(y), etc.

3) S.L.'s values; e.g. F(x1, ..., xn), etc.

4) Set operation formulas; e.g. x & y, x | y, x - y, etc.

5) Set definition formulas; e.g. <3, 5, 7, 11>, <Tokyo, Nagoya, Osaka>, <xi : F(x1, ..., xi, ..., xn) < y>, etc.

6) Abbreviated notations for elements of a scale, i.e. leaf words; e.g. S.1, S.11-20, etc. The latter, for example, represents the 11th to 20th elements of a scale S.

It is easily seen from the above explanation that a query in SML is expressed basically as a set of "non-procedural" local queries, and thus the query as a whole is also of a non-procedural nature.
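
Because the definitions may be written in any order, a processor has to work out for itself which variable to evaluate first. The paper does not say how SCLAMS does this; the following Python sketch shows one possible approach, a topological sort over the "uses" relation, with the hypothetical helper `evaluation_order` and a deliberately crude identifier scan.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def evaluation_order(definitions):
    """Return the variable names in an order in which each one is
    defined before it is used, whatever order the query lists them in."""
    names = set(definitions)
    graph = {
        name: {
            token
            for token in re.findall(r"[A-Za-z_]\w*", expr)
            if token in names and token != name
        }
        for name, expr in definitions.items()
    }
    # graph maps each variable to the variables it depends on.
    return list(TopologicalSorter(graph).static_order())

# A query in which C is used by A but defined last.
query = {
    "A": "<X : F1(1980, X, male) < C>",
    "B": "COUNT(A)",
    "C": "F1(1970, Tokyo, female)",
}
print(evaluation_order(query))
```

The scan treats any identifier that matches a defined variable name as a dependency, so a bound variable such as X must not share its name with a defined variable in this sketch.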

3. Hybridization of SML with a natural language

3.1 An illustrative example

We have confirmed that our query language SML is sufficiently flexible and has strong expressive power, with mathematical notations concerning sets and mappings. However, we can also say that SML is less convenient than a natural language, which seems to be best suited for casual users. We therefore tried to hybridize SML with a natural language like English, Japanese, etc., believing that such a natural-artificial hybrid language should be one of the milestones toward the realization of database systems wholly accessible via unrestricted natural languages.

The next example, closely related to Example 2 in the last section, will show us how to hybridize SML with a natural language, say English.

Example 4. List names and the number of prefectures in which the male population in 1980 is less than the female population of Tokyo in 1970.

Now we consider the following two types of expressions for this query.

Type I (Original formal expression by SML)

    LIST A, B;
    A = <X : F1(1980, X, male) < C>;
    B = COUNT(A);
    C = F1(1970, Tokyo, female);

Type II (Extended new expression)

    LIST A, B;
    A = Names of prefectures in which the male population in 1980 is less than C;
    B = Number of elements of A;
    C = Value of the female population of Tokyo in 1970;

The features of Type II expressions are:

(1) The global construction is quite similar to that of the Type I expression, but it allows us to write phrases in the chosen natural language for definitions of variables such as A, B, and C. (If necessary, some of the variables may retain the original formal definitions.)

(2) Notice that variable symbols such as A and C can be embedded in ordinary English phrases, so that the original query expressed as a complex sentence is divided into some simple queries. This contributes to readability of queries both for man and computer.

3.2 Features of a Japanese-SML version

We have implemented a "Japanese-SML" hybrid language processing system as an extension of SCLAMS. The major design goal was to be practical rather than just ambitious. The processing system, which will be called Translator, is essentially a translator of a Japanese phrase into the corresponding SML expression or, in the above terminology, of a Type II expression into its Type I equivalent. The main process of Translator is shown in Fig. 3.

[Fig. 3: flow diagram. A Type II expression passes through Syntax Analysis and then Conversion to give a Type I expression; both steps refer to the Japanese Grammar Rules.]

Fig. 3 Process of Translator

Some considerations in achieving practicability of the implemented system are:

(1) In our implementation a Japanese sentence or phrase can be written as a string of only Kana characters, in which case it is desirable, for convenience, to guarantee freedom from segmentation as much as possible. Our system indeed allows the free writing of a Kana sentence, as long as the leaf words (the elements of scales) cause no confusion with the reserved words in the lexicon.

(2) It is desirable to keep the grammar as compact as possible to save storage space and processing time. This was done by restricting the forms of possible Type II expressions.

4. Translation of Japanese into SML

4.1 Micro-grammar for Japanese

As stated in Section 2.3, the expressions of SML are classified into six categories 1) to 6). The possible Type II expressions which our Translator can accept are restricted to those corresponding to categories 2), 3), and a part of 5), i.e. the so-called implicit set definitions. It should be noticed that expressions belonging to the other categories are expressed more neatly in Type I form.

We now show the lexicon and the grammatical rules prescribing these Type II expressions.

Lexical items and their categories. There are 12 categories of lexical items.

1) Num: Numbers, e.g. 12, 165.3, -0.137, etc.

2) Naux: Auxiliary numbers, e.g. hyaku, byaku, pyaku, sen, man (hundred, thousand, ten thousand), etc.

3) Agg: Names of aggregate functions, e.g. kosu, souwa, saidai, heikin (count, sum, maximum, average), etc.

4) eq: Equality words or copulas, e.g. no, dearu, deatte, nihitoshii, nihitoshiku (is equal to), etc.

5) comp1: Words for comparison, e.g. ijo, ika, miman, igo (more, less, later), etc.

6) comp2: Particle for comparison, i.e. yori, yorimo (than).

7) adj: Adjectives, e.g. ookii, hayai, shouno, daino (large, early, small, wide), etc.

8)* Root: Root words, i.e. names of scales, e.g. nen, ken (year, prefecture), etc.

9)* Leaf: Leaf words, i.e. elements of scales, e.g. 1980, Tokyo, otoko (male), etc.

10)* Unit: Words for data units, e.g. en, nin, km (Yen, person, kilometer), etc.

11)* SL: Names of S.L.'s representing the sort of the S.L. data, usually given at Storage mode, e.g. jinko, TV keiyakusha (population, TV subscriber), etc.

12)** Var: Variable names such as A, B, KEN, etc.

The items in the categories marked by one asterisk are automatically added to the lexicon at the beginning of Manipulation mode, in order to cover those S.L.'s which are passed from Retrieval mode, and are deleted after use. They are thus highly application oriented.

The lexicon would become very large if it included the items in the Leaf category. We tried to exclude them from our lexicon by contriving a method of recognizing them from the context, so that the lexicon contains only about 100 application-independent items plus application-oriented ones.

The Var category, marked by two asterisks, was also excluded from our lexicon, since the formation rules of this category are well defined and easily programmed.

Grammatical rules. It was sufficient to prepare merely a dozen grammatical rules, expressed as context-free-like productions with conditions of application.

1) Initial production

    S → R | D | V

2) Range-of-S.L. phrase

    R → Var | Mod Mod ... Mod SL    (n modifiers)

3) Root modifier

    R~ → Mod Mod ... Mod SL    (n modifiers)

    Condition: n = dim(SL) - 1.

4) Modifier

    Mod → {(Root ga) Leaf | D} eq

5) Domain-of-S.L. phrase

    D → Var | (R~ ga cond) Root

6) Numeric value

    V → Var | Num (Naux) (Unit) | D nitaisuru Agg

7) Condition

    cond → V {(comp1) eq | comp2 adj}

An example of a parsing tree by this grammar is given in Fig. 4. We assume that the 'jinko' S.L. is of dimension three.

    D
      R~
        Mod
          Leaf   1980
          eq     no
        Mod
          Leaf   otoko
          eq     no
        SL       jinko
      ga
      cond
        Var      C
        comp1    ijo
        eq       no
      Root       ken

    1980 no otoko no jinko ga C ijo no ken
    (Prefectures in which the male population in 1980 is greater than C.)

Fig. 4 Example of a parsing tree
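
The phrase of Fig. 4 can be recognized by a very small hand-written parser. The Python sketch below is not the authors' PL/I Translator: it assumes romanized, already segmented tokens, covers only the single pattern `Leaf eq ... Leaf eq SL ga Var comp1 eq Root`, and uses a hypothetical mini-lexicon (`EQ`, `COMP1`, `ROOT`, `SL_DIM`) and the label `R~` for the root-modifier phrase. It does show the condition of application n = dim(SL) - 1 being checked.

```python
# Tiny romanized lexicon; a real lexicon would hold Kana strings.
EQ = {"no", "dearu"}              # equality words / copulas
COMP1 = {"ijo", "ika", "miman"}   # words for comparison
ROOT = {"nen", "ken"}             # names of scales
SL_DIM = {"jinko": 3}             # S.L. name -> its dimension

def parse_domain(tokens):
    """Recognize  Leaf eq ... Leaf eq SL ga Var comp1 eq Root
    and return the parse tree as nested tuples."""
    pos, mods = 0, []
    # Modifiers: (Leaf, eq) pairs up to the S.L. name. Any token that is
    # not an S.L. name is taken to be a leaf word in this sketch.
    while pos < len(tokens) and tokens[pos] not in SL_DIM:
        if pos + 1 >= len(tokens) or tokens[pos + 1] not in EQ:
            raise SyntaxError(f"expected an eq word after {tokens[pos]!r}")
        mods.append(("Mod", ("Leaf", tokens[pos]), ("eq", tokens[pos + 1])))
        pos += 2
    if pos == len(tokens):
        raise SyntaxError("no S.L. name found")
    sl = tokens[pos]
    # Condition of application: n = dim(SL) - 1 modifiers.
    if len(mods) != SL_DIM[sl] - 1:
        raise SyntaxError(f"{sl!r} needs {SL_DIM[sl] - 1} modifiers")
    rest = tokens[pos + 1:]
    if len(rest) != 5 or rest[0] != "ga":
        raise SyntaxError("expected: ga Var comp1 eq Root")
    _, var, comp, eq, root = rest
    if not (var.isupper() and comp in COMP1 and eq in EQ and root in ROOT):
        raise SyntaxError("malformed condition or root word")
    cond = ("cond", ("Var", var), ("comp1", comp), ("eq", eq))
    return ("D", ("R~", *mods, ("SL", sl)), "ga", cond, ("Root", root))

tree = parse_domain("1980 no otoko no jinko ga C ijo no ken".split())
print(tree)
```

Treating every unknown token before the S.L. name as a leaf word mirrors the idea of recognizing leaf words from context rather than storing them in the lexicon, although the actual recognition method is not described in enough detail to reproduce.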

4.2 Translation into SML

Translation from Type II expressions in Japanese into Type I expressions in 'pure' SML is performed by using two fundamental tools: a word-for-word conversion table and a conversion procedure.

Word-for-word conversion table. This is prepared for the following five categories of lexical items:

    Agg, comp1, adj, Root*, SL*.

For the asterisked categories the table is made up whenever Manipulation mode is invoked. A portion of the conversion table is shown in Table 1.

Table 1 Word-for-word conversion table (a part)

    Category   Source word   Target
    Agg        kosu          COUNT
               souwa         SUM
               saidai        MAX
    comp1      ijo           >=
               miman         <
    adj        ookii         >
               hayai         <
               daino         >
    Root       nen           S1
               ken           S2
    SL         jinko         F1
               menseki       F2
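
The conversion table lends itself to a nested dictionary lookup. The following Python sketch holds the entries of Table 1 and uses them to assemble an implicit set definition. The function names `convert` and `set_definition`, and the convention of passing `None` for the free scale, are assumptions made for illustration; the paper does not describe the internal form used by the Translator.

```python
# The part of the word-for-word conversion table shown in Table 1.
CONVERSION = {
    "Agg": {"kosu": "COUNT", "souwa": "SUM", "saidai": "MAX"},
    "comp1": {"ijo": ">=", "miman": "<"},
    "adj": {"ookii": ">", "hayai": "<", "daino": ">"},
    "Root": {"nen": "S1", "ken": "S2"},
    "SL": {"jinko": "F1", "menseki": "F2"},
}

def convert(category, word):
    """Look up the SML symbol for a Japanese word of a given category."""
    return CONVERSION[category][word]

def set_definition(sl, args, comp1, bound):
    """Build an implicit set definition '<X : F(...) op bound>'.

    `args` lists the S.L. arguments in scale order, with None marking
    the scale over which the set ranges (hypothetical convention).
    """
    arg_text = ", ".join("X" if a is None else a for a in args)
    return (
        f"<X : {convert('SL', sl)}({arg_text}) "
        f"{convert('comp1', comp1)} {bound}>"
    )

print(set_definition("jinko", ["1980", None, "male"], "ijo", "C"))
print(convert("Agg", "kosu"))
```

Entries for the Root and SL categories would be rebuilt each time Manipulation mode is invoked, which in this sketch amounts to replacing two of the inner dictionaries.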

Conversion procedure. Since the proposed grammar is so compact, we considered that the conversion procedure, including syntax analysis, would be best realized through a general-purpose programming language, say PL/I, rather than a comprehensive grammar-writing system like ATN. [9] This will also contribute to the portability of the system.

The programming considerations were:

(1) To ensure free writing of a Japanese Kana phrase, we adopted left-to-right parsing, predicting the succeeding category. However, since the lexicon does not include the leaf words, we had to impose the restriction that any leaf word should be delimited by spaces or apostrophes.

... example given below.)

(3) Two important steps in a parsing flow are the decisions:

a) Which of the initial productions can be applied: S → R, S → D, or S → V?

b) Which phrase actually appears, R or R~?

4.3 An example

We now return to Example 4 in Section 3.1. That query will be written in Type II form in Japanese as follows. (We adopt here the real notation of our system, using Kana characters.)

Example 5. (A Japanese translation of Example 4.)

    LIST A, B;
    A = [Kana phrase for "prefectures in which the male population in 1980 is less than C"];
    B = [Kana phrase for "number of elements of A"];
    C = [Kana phrase for "value of the female population of Tokyo in 1970"];

This Type II expression will be translated into the following Type I equivalent.

    LIST A, B;
    SYS01 = '1980'; SYS02 = '[Kana for "male"]';
    A = <X : F1(SYS01, X, SYS02) < C>;
    B = COUNT(A);
    SYS03 = '1970'; SYS04 = '[Kana for "Tokyo"]';
    SYS05 = '[Kana for "female"]';
    C = F1(SYS03, SYS04, SYS05);
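
The translated query suggests that each leaf word is bound to a system-generated variable SYSnn, which then stands in for the leaf word as an argument of the S.L. The Python sketch below reproduces that output shape. The numbering scheme is inferred from this single example, the class `SysVars` and function `translate_example` are hypothetical, and English words stand in for the Kana leaf words.

```python
class SysVars:
    """Hands out SYS01, SYS02, ... and emits their definitions."""

    def __init__(self):
        self.count = 0

    def bind(self, leaf, out):
        """Append 'SYSnn = leaf;' to out and return the new name."""
        self.count += 1
        name = f"SYS{self.count:02d}"
        out.append(f"{name} = '{leaf}';")
        return name

def translate_example():
    """Emit Type I statements shaped like the translation of Example 5."""
    sys_vars = SysVars()
    out = ["LIST A, B;"]
    # A: leaf words for year and sex; the prefecture scale stays free (X).
    year = sys_vars.bind("1980", out)
    sex = sys_vars.bind("male", out)
    out.append(f"A = <X : F1({year}, X, {sex}) < C>;")
    out.append("B = COUNT(A);")
    # C: all three scales are fixed by leaf words.
    args = [sys_vars.bind(leaf, out) for leaf in ("1970", "Tokyo", "female")]
    out.append(f"C = F1({', '.join(args)});")
    return out

for statement in translate_example():
    print(statement)
```

Binding leaf words to variables keeps arbitrary user-written strings out of the S.L. argument lists, so the generated expression contains only identifiers and operators.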

5. Conclusions

Our compact database system SCLAMS, with a translator from Japanese into SML, has been implemented for the IBM 370/138. The translator is a PL/I program consisting of about 500 statements, including the lexicon and the grammatical rules themselves. The overall performance of the translator seems to be sufficient for practical use. In fact, the translation time of each Type II expression is about 1 second.

We believe, from our experience, that a natural-artificial hybrid language like ours will be a practical step toward better languages for database access, specifically for casual users.

Acknowledgement. The authors wish to express their gratitude to Y. Suzuki, the former Deputy-Director of NHK Technical Research Laboratories, and M. Machida, Head of the Information Processing Research Group of the Laboratories, for encouragement and guidance. They are also grateful to J. Kutsuzawa, Senior Research Engineer of our group, for his valuable comments concerning the implementation of the system.

References

1. W.A. Woods et al.: The lunar sciences natural language information system. BBN Rep. 2378, Bolt Beranek and Newman, Cambridge, Mass., 1972.

2. E.F. Codd: Seven steps to rendezvous with the casual user. In "Data base management", J.W. Klimbie et al., eds., North-Holland, Amsterdam, 1974, pp. 179-200.

3. L.R. Harris: User oriented data base query with the ROBOT natural language query system. Proc. 3rd VLDB, Tokyo, Oct. 1977.

4. G.G. Hendrix et al.: Developing a natural language interface to complex data. ACM Trans. on Database Systems, Vol. 3, No. 2, June 1978, pp. 105-147.

5. M. Sibuya et al.: Noun-phrase model and natural query language. IBM J. Res. Develop., Vol. 22, No. 5, Sep. 1978, pp. 533-540.

6. T. Aizawa et al.: SCLAMS - a data processing system (in Japanese). Preprint of WGDBMS of IPSJ, Tokyo, July 1979.

7. T. Aizawa (ed.): SCLAMS - a user's manual. NHK Res. Lab., Tokyo, Apr. 1980.

8. C.J. Date: An introduction to database systems, 2nd ed. Addison-Wesley, 1977.
