AN AUTOMATICALLY PRODUCED THESAURUS
GERHARD SCHWANHAUSSER, Thesaurus Hochschulforschung Hochschulbau, Institut für Hochschulbau, Sonderforschungsbereich 63, Stuttgart, Federal Republic of Germany
A procedure for the computer generation of a thesaurus from a set of descriptors, manually assigned to the documents in a library, is described. Only a quasi-associative relationship among descriptors is recognised. The specific advantages of the thesaurus are pointed out: it is open-ended, its keywords are derived from actual documents and are therefore the most helpful in retrieving those documents, and it serves as an aid in searching the collection for information. The computational procedures for generating the thesaurus are given: keyword statistics, matrix inversion, calculation of the similarity matrix using the Tanimoto coefficient, automatic cluster analysis using the minimal-tree procedure, and compilation of groups and main groups of descriptors. An algorithm for the graphic display of the main groups of descriptors has been formulated. The main disadvantage of the procedure is that only a limited number of keywords can be processed within a reasonable computer CPU time. The procedure can be applied to a library of 10,000 to 20,000 documents with a keyword base of 1,000 words using about 3 hours of computing time.
SEMINAR ON THESAURUS (1975). Paper BA
1 GENERAL CONSIDERATIONS
The basic prerequisite for the establishment of an automatically constructed thesaurus is the existence of an indexed corpus of documents. In the case of the library of the Zentralarchiv für Hochschulbau, Stuttgart, a corpus of 5,900 manually indexed documents was available (1). We suppose that, even if the indexing has been done carefully, it is still rather subjective work, and that we should use the minimum of further assumptions to produce a thesaurus, which we call, on account of its interindexer inconsistency, a pragmatic thesaurus (3). If a thesaurus is defined as an interrelated list of the keywords of the area of knowledge under consideration, the question arises how to limit the area of knowledge and how to decide which keywords belong to it, since new branches of science develop their vocabularies fairly rapidly and even changes of meaning occur as time goes on. A pragmatic thesaurus such as ours implies some basic assumptions. Firstly, we assume that every document entered into the library is relevant to this area of knowledge. Secondly, the manually assigned keywords for every document are taken to be the most relevant ones. Thirdly, there is only one type of interrelation between the keywords, namely a quasi-associative relation of every keyword of a document to all other keywords of that document.
Having applied these assumptions to a special library, one need only carry out some machine calculations and a thesaurus is almost ready. A procedure such as this facilitates the updating of a thesaurus whenever sufficient additional documents have been entered into the library. A detailed specification of the area of interest is not necessary, since the indexed document collection can be understood as a description of it. In addition, the list of keywords need not - should not - be fixed in advance. It should rather be an open-ended list, to which new keywords are added whenever it is felt necessary. As to the meaning of the keywords, it can be taken that every keyword is defined by the content of the document or documents to which it has been assigned in the indexing process.
The main advantage of a pragmatic thesaurus derived from actual documents is that it will be a principal retrieval aid for those documents, and it will also help the user to find an appropriate combination of keywords that gives the retrieval results expected from the system.
Further, a pragmatic thesaurus can be a very good basis for discussion at any conference interested in establishing a thesaurus for a similar area of research. There will be no need to argue by means of arbitrary examples made up off-hand by the participants of the conference, because a pragmatic thesaurus will contain an adequate number of different examples along with the frequency counts, from which one may draw conclusions more easily than from fictitious examples. The main restriction of the approach described here is the limited number of keywords which can be processed within reasonable computer CPU time. We estimate that the procedures can be applied to a library of some 10,000 to 20,000 documents with a keyword repertoire of 1,000 words using not more than 3 hours of calculation time.
2 COMPUTATIONAL PROCEDURES
21 Key Word Statistics and Matrix Inversion
The library of the Zentralarchiv für Hochschulbau keeps all its documents under a numerus currens (running number) for identification (5, 7). For every keyword there is a peek-a-boo card carrying the numbers of the documents to which this keyword has been assigned during the manual indexing process. The information on the peek-a-boo cards has been keypunched, with each keyword coded as a number. The statistical evaluation yielded two tables, one for the keyword frequency and one for the indexing depth. The graphic display shows that most keywords have been used between 20 and 400 times. The indexing depth shows that on average a document has been indexed with 4 to 12 descriptors. Because the cards list all document numbers per keyword, which can be regarded as a matrix, it was necessary to make a matrix inversion (that is, to invert the file by exchanging rows and columns) to get all keywords per document.
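The inversion and the two statistics described above can be illustrated with a minimal sketch. The card contents, keyword codes and function names below are hypothetical and serve only as an example; the original program is not documented in this paper.

```python
from collections import defaultdict

# Hypothetical peek-a-boo cards: keyword code -> document numbers.
cards = {
    1: [101, 102, 105],
    2: [101, 105],
    3: [102],
}


def invert(cards):
    """Turn 'documents per keyword' into 'keywords per document'."""
    per_document = defaultdict(list)
    for keyword, documents in sorted(cards.items()):
        for doc in documents:
            per_document[doc].append(keyword)
    return dict(per_document)


def keyword_frequency(cards):
    """Number of distinct documents each keyword was assigned to."""
    return {keyword: len(set(docs)) for keyword, docs in cards.items()}


def indexing_depth(per_document):
    """Number of keywords assigned to each document."""
    return {doc: len(keywords) for doc, keywords in per_document.items()}


per_document = invert(cards)
print(per_document)
print(keyword_frequency(cards))
print(indexing_depth(per_document))
```

The inverted structure is what the later steps work from when they count how often two descriptors occur in the same document.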
22 Associative Relationship
It is assumed that there is an associative relation between any two descriptors (= keywords) whenever they have been assigned together to one or more documents. The reduction of all possible types of relations to a single one, not distinguishing between them at all, is the reason why we call our thesaurus a pragmatic one. Although true semantic analysis would be very useful, no computerized systems are known so far which would do this for a real library. Even intellectual thesaurus construction is a very hard and time-consuming task, so that many authors have not specified the type of relation between keywords in manually compiled thesauri (8). We checked our procedure against the recommendations for the establishment of monolingual thesauri (3) and found it to be within the rules. The associative relation between two descriptors is understood to be independent of their sequence and of the context of possible other descriptors. All pairs of descriptors have been checked, even if the combination of some descriptors does not make sense.
23 Calculation of Similarity Matrix
The similarity matrix has been calculated on the basis of the Tanimoto coefficient f, which is a function of the frequency of descriptor A, the frequency of descriptor B, and the frequency C with which the two descriptors occur together.
TANIMOTO COEFFICIENT
$$f = \frac{C}{A + B - C}$$
A = FREQUENCY OF THE DESCRIPTOR A
B = FREQUENCY OF THE DESCRIPTOR B
C = FREQUENCY OF THE DESCRIPTORS A AND B TOGETHER
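As a hedged illustration of the coefficient f = C / (A + B - C), the following sketch computes it for every pair of descriptors that share at least one document. The descriptor names, the sample data and the function names are assumptions made for the example only.

```python
def tanimoto(freq_a, freq_b, freq_ab):
    """f = C / (A + B - C), with A, B the single and C the joint frequency."""
    union = freq_a + freq_b - freq_ab
    if union == 0:
        return 0.0
    return freq_ab / union


def similarity_matrix(cards):
    """Coefficients for all ordered pairs with a non-zero joint frequency.

    Both (a, b) and (b, a) are stored, so the symmetry of the matrix is
    deliberately not exploited in this sketch.
    """
    doc_sets = {keyword: set(docs) for keyword, docs in cards.items()}
    sim = {}
    for a in doc_sets:
        for b in doc_sets:
            if a == b:
                continue
            common = len(doc_sets[a] & doc_sets[b])
            if common:
                sim[(a, b)] = tanimoto(len(doc_sets[a]), len(doc_sets[b]), common)
    return sim


# Hypothetical data: descriptor -> numbers of the documents it was assigned to.
cards = {"A": [1, 2, 3, 4], "B": [3, 4, 5], "C": [6]}
sim = similarity_matrix(cards)
print(sim[("A", "B")])  # A = 4, B = 3, C = 2, so f = 2 / 5
```

Pairs that never occur together are simply absent from the result, which mirrors the check for a joint frequency different from zero.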
The calculation of the similarity matrix was the most time-consuming step, because 450,000 combinations had to be checked for a common frequency C different from zero, and some 45,000 Tanimoto coefficients had to be calculated. Since the necessary computer time of one and a half hours was not available for a single job, the symmetry of the similarity matrix has not been used; this makes a restart of the programme easier, without passing through all the results calculated up to the breaking point. The input data, namely the inverted file, could be kept within the core memory by always putting three integer numbers into one word using a MASK/SHIFT subroutine. The complete similarity matrix has been printed out, with the coded keywords written in plain text along with their absolute and relative frequency counts. According to the suggestions of Ivanova (4), only those descriptors should be assumed to be related whose Tanimoto coefficients sum to half of the total of all coefficients in the same line of the matrix. We used an approximation to Ivanova's proposal, putting into one group all those descriptors which have a Tanimoto coefficient greater than the mean value of that line. The result of this procedure was a list of as many overlapping groups as there were descriptors.
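The packing of three integers into one machine word by masking and shifting can be sketched as follows. The field width of 16 bits is an assumption chosen for the example; the paper does not state the word size of the computer used, and the function names are hypothetical.

```python
BITS = 16                # assumed width of each field, not given in the paper
MASK = (1 << BITS) - 1   # bit mask selecting one field


def pack3(a, b, c):
    """Store three non-negative integers side by side in one word."""
    for value in (a, b, c):
        if not 0 <= value <= MASK:
            raise ValueError(f"{value} does not fit into {BITS} bits")
    return (a << (2 * BITS)) | (b << BITS) | c


def unpack3(word):
    """Recover the three integers by shifting and masking."""
    return (word >> (2 * BITS)) & MASK, (word >> BITS) & MASK, word & MASK


word = pack3(5900, 12, 710)
print(unpack3(word))
```

Packing of this kind cuts the storage needed for the inverted file to roughly a third, at the price of a few extra operations per access.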
An adjustment was made concerning the very rare and the very frequent descriptors. All thirty-eight descriptors appearing only once and the six descriptors appearing more than 500 times have been deleted from the analysis, so that 710 descriptors were left. A similar recommendation has been made by Salton for the SMART system. The list of the resulting 710 descriptor groups corresponds to the alphabetical lists in traditional thesauri, except that this list does not distinguish different types of relations.
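The frequency filter and the grouping by the mean value of each line can be sketched as below. Two points are assumptions of this sketch rather than statements of the paper: the mean is taken over the non-zero coefficients of a line only, and the sample similarities are invented for illustration.

```python
def filter_descriptors(frequency, min_count=2, max_count=500):
    """Drop descriptors used only once or more than max_count times."""
    return {d for d, n in frequency.items() if min_count <= n <= max_count}


def groups_by_row_mean(sim, descriptors):
    """For each descriptor, keep the partners whose coefficient exceeds
    the mean of that descriptor's line of the similarity matrix."""
    groups = {}
    for a in sorted(descriptors):
        row = {b: sim[(a, b)] for b in descriptors if (a, b) in sim}
        if not row:
            groups[a] = []
            continue
        # Assumption: the mean is taken over the non-zero entries only.
        mean = sum(row.values()) / len(row)
        groups[a] = sorted(b for b, f in row.items() if f > mean)
    return groups


# Hypothetical symmetric similarities between four descriptors.
pairs = {("A", "B"): 0.6, ("A", "C"): 0.2, ("A", "D"): 0.1, ("B", "C"): 0.5}
sim = {}
for (a, b), f in pairs.items():
    sim[(a, b)] = f
    sim[(b, a)] = f

kept = filter_descriptors({"A": 40, "B": 25, "C": 500, "D": 2, "E": 1, "F": 501})
groups = groups_by_row_mean(sim, kept)
print(groups)
```

Each descriptor receives its own group, so the groups overlap, as in the list described above.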
24 Automatic Cluster Analysis
In general, there is a systematic part and an alphabetical part in every thesaurus. The systematic part describes the area under consideration by means of some subsets of the keywords. The automatic procedures for cluster analysis can be divided into those which produce separated clusters and those which produce overlapping clusters. Since in our case the alphabetical list of descriptor groups is already overlapping, we decided to use the minimal-tree procedure, which divides all keywords into strictly separated groups. The name of the minimal-tree procedure derives from the fact that a matrix of distances was used in the original work (6), the minimal distance between the words being chosen. In our case we had the similarity matrix as a basis, so it is obviously the maximum of the similarity which corresponds to the same procedure. We have kept the name in spite of using the opposite measure.
The minimal-tree procedure is started with an arbitrary descriptor. A second descriptor is picked out of the group of descriptors of the first one, such that the second one has maximum similarity to the first. The third descriptor has to have the maximum value of similarity to one of its predecessors, and is linked correspondingly. The procedure is continued until all descriptors have entered the tree exactly once. The tree obtained by this procedure is independent of the starting point as far as the sum of all similarities entered into the tree is concerned.
To cut the minimal tree into pieces, it is recommended to use an arbitrary level of similarity which a link must surpass if the branch of the tree is not to be cut off (2). A series of trials showed that an appropriate level of similarity is hard to define, since too low a level does not break the tree into a sufficient number of fractions, some being very large, others being very small. The number of fractions increases as the level of similarity at which the branches withstand the cutting trials increases, but unfortunately the size of the fractions decreases more rapidly, so that a number of single descriptors were found to form clusters of their own. To finish a first version of our thesaurus we decided to cut the minimal tree manually, using the mean value of all similarities as a decision aid. I will return to this point at the end.
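The tree construction and the cutting by a similarity level can be sketched as follows. The growth step corresponds to what is now usually called Prim's algorithm, applied to the maximum instead of the minimum; that identification, the function names and the sample similarities are ours and not taken from the paper. Unlike the procedure described above, the sketch searches all remaining descriptors rather than only the group of each descriptor.

```python
def maximal_similarity_tree(sim, descriptors, start):
    """Grow a tree by always linking the outside descriptor that has the
    highest similarity to any descriptor already in the tree."""
    in_tree = [start]
    remaining = set(descriptors) - {start}
    edges = []
    while remaining:
        best = None
        for a in in_tree:
            for b in sorted(remaining):  # sorted for a reproducible result
                f = sim.get((a, b), 0.0)
                if best is None or f > best[2]:
                    best = (a, b, f)
        edges.append(best)
        in_tree.append(best[1])
        remaining.remove(best[1])
    return edges


def cut_tree(edges, descriptors, level):
    """Cut every link that does not surpass `level`; return the clusters."""
    parent = {d: d for d in descriptors}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b, f in edges:
        if f > level:
            parent[find(a)] = find(b)
    clusters = {}
    for d in descriptors:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical symmetric similarities between five descriptors.
pairs = {("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.5,
         ("C", "D"): 0.2, ("D", "E"): 0.9}
sim = {}
for (a, b), f in pairs.items():
    sim[(a, b)] = f
    sim[(b, a)] = f

descriptors = ["A", "B", "C", "D", "E"]
edges_a = maximal_similarity_tree(sim, descriptors, "A")
edges_e = maximal_similarity_tree(sim, descriptors, "E")
print(cut_tree(edges_a, descriptors, 0.5))
```

In this example both starting points give the same sum of similarities, and cutting at a level of 0.5 separates the weak link between C and D.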
The arbitrary division of the minimal tree could be made such that all fractions are of almost the same size. We have 23 so-called main groups, each containing some 30 descriptors, which is a size that can be scanned quickly. The definition of our groups and main groups is analogous to the facet and semantic classes in traditional thesaurus work.
3 COMPILATION OF GROUPS AND MAIN GROUPS
This part of the total procedure is mainly for output purposes. It produces lists of all groups of descriptors, indicating the number of the main group within which each descriptor can be found. This scheme of reference among the descriptors will be of great use to anyone who is trying to retrieve documents on the basis of keywords but has not found the proper keyword combination to get the relevant documents. Using the references, the user can change the keyword combination step by step, always checking the statistical neighbourhood of any keyword. The list of main groups has been printed along with two selected keywords which may serve as intuitive representatives of the main groups. This method too has to be replaced by some better, objective method when a revision of the thesaurus is made. To make the clusters of the main groups easier to read, a graphic representation has been produced by the program described in the next section.
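A list of this kind can be sketched in a few lines. The layout of the printed line, the function name and the sample data are assumptions for illustration; the actual output format appears only in the figures of the appendix.

```python
def reference_list(groups, main_group_of):
    """One line per descriptor: its own main group number, followed by the
    descriptors of its group with their main group numbers."""
    lines = []
    for descriptor in sorted(groups):
        refs = ", ".join(
            f"{other} ({main_group_of[other]})" for other in groups[descriptor]
        )
        lines.append(f"{descriptor} ({main_group_of[descriptor]}): {refs}")
    return lines


# Hypothetical groups and main group numbers.
groups = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
main_group_of = {"A": 1, "B": 1, "C": 2}
for line in reference_list(groups, main_group_of):
    print(line)
```

A searcher who starts from descriptor A can see at a glance that C lies in a different main group and may be worth trying as an alternative keyword.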
4 GRAPHIC DISPLAY OF THE MAIN GROUPS
The graphic display was not a principal aim of this thesaurus project, but the practical value of such a display is well recognised. Therefore, the program needed for it has been designed to be as simple as possible, without striving to save computer and plotter time. The algorithm is based on an equidistant grid which is limited in the direction of one axis. The difficulty was to find a free place for every point to be entered into the tree, such that a line could be drawn without creating ambiguity with respect to previously drawn lines. This has been solved by the selection of 32 points which could never produce confusion. Whenever there was a big cluster, an auxiliary routine was used to escape from the part of the area already filled up with points. This escape was made by checking the vertical lines at left and at right for the nearest free point. Obviously this drawing procedure could not represent the similarity by the geometric distance. But similarity stronger than the mean value has been made visible by double lines between the descriptors. The main groups have been drawn in separate pictures as well as all together, representing the whole minimal tree.
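The escape routine is described only briefly, so the following is one possible reading rather than a reconstruction of the program: the vertical grid lines to the left and right of the current column are checked, nearest lines first, and the free point closest to the current row is taken. The bounded width, the function name and the sample grid are assumptions of the sketch; the rule of the 32 admissible points is not modelled.

```python
def nearest_free_point(occupied, x, y, width, height):
    """Return the free grid point on the nearest vertical line to the left
    or right of column x, preferring rows close to y; None if none is free."""
    for offset in range(1, width):
        candidates = []
        for column in (x - offset, x + offset):
            if 0 <= column < width:
                for row in range(height):
                    if (column, row) not in occupied:
                        candidates.append((abs(row - y), column, row))
        if candidates:
            _, column, row = min(candidates)
            return column, row
    return None


# Hypothetical 5 x 3 grid whose three middle columns are completely filled.
width, height = 5, 3
occupied = {(column, row) for column in (1, 2, 3) for row in range(height)}
print(nearest_free_point(occupied, 2, 1, width, height))
```

With the middle columns full, the search has to move two lines away before it finds a free point in the same row.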
5 CONCLUSIONS
The procedures described here have enabled the automatic construction of a pragmatic thesaurus for a given medium-sized special library. This thesaurus will be a valuable retrieval aid for the same document collection from which it has been derived. Since the collection has been manually indexed, this thesaurus cannot be an indexing instrument, except for detecting the erroneous use of keywords.
The procedures for the construction allow for some manual steering of parameters and cut-off values. Besides, they are also objective procedures which can be repeated whenever necessary, producing an exact mapping of all the meanings of all the documents within the library. The basic assumption that manual indexing can be a good starting point is supported by the fact that it is easier for an expert to say that this is a valuable book for that library and that it should be assigned these descriptors, which can be read in the title, the table of contents or the headings, than to ask an expert or a conference of experts "Which are the keywords of the area of knowledge in which you are experts?". The automatic indexing of documents can be expected in the future, but there is still a considerable amount of work to be done on both sides, in linguistic and in computer research. We therefore hope that automatic thesaurus construction is an aid for people looking into libraries to find the information they need.
6 BIBLIOGRAPHICAL REFERENCES
1 Sec 1 BOUCHE (Reinhard), KIND (Friedbert) and SCHWANHAUSSER (Gerhard). Gesamtbibliographie, Zentralarchiv für Hochschulbau. Teil 1 und 2. Hrsg.: Zentralarchiv für Hochschulbau. Stuttgart: Zentralarchiv für Hochschulbau 1973. 436, 891 S.
2 Sec 24 DEICHSEL (Guntram). Verfahren der automatischen Klassifikation durch Clusteranalyse und ihre Anwendung bei morphologischen Untersuchungen an Amöben. Zulassungsarbeit zum Staatsexamen für das Lehramt an Gymnasien. Stuttgart: Universität, Institut für Informatik 1972. Getr. Pag.
3 Sec 1, 22 DIN 1463 Vornorm. Richtlinien für die Erstellung und Weiterentwicklung deutschsprachiger Thesauri. Berlin, Köln: Beuth Vertr. 1972. 11 S.
4 Sec 23 IVANOVA (N S). Probleme der automatisierten Thesaurusbildung. In: Informatik, Berlin 17 (1970) H. 1, S. 25-28.
5 Sec 21 KIND (Friedbert) and SCHWANHAUSSER (Gerhard). Thesaurus Hochschulplanung. Stuttgart: Zentralarchiv für Hochschulbau 1973. S. 33-38. (Information. 6/25)
6 Sec 24 KNODEL (Walter). Graphentheoretische Methoden und ihre Anwendungen. Berlin, Heidelberg, New York: Springer 1969. VIII, 111 S. (Ökonometrie und Unternehmensforschung. 13)
7 Sec 21 RISCHKOWSKY (Franziska). Thesaurus Hochschulplanung. Ein aufgabenbezogener Thesaurus für die Literaturdokumentation der Hochschul-Informations-System GmbH (HIS) Hannover. Hrsg.: HIS Hannover. München-Pullach: Verlag Dokumentation 1973. IX, 205 S.
8 Sec 22 THESAURUS BILDUNGSFORSCHUNG. Verzeichnis der Deskriptoren und Nichtdeskriptoren in der Literaturdokumentation des Max-Planck-Instituts für Bildungsforschung. Hrsg.: Rolf Neuhaus. Mitarb.: E. Guhde, B. Hegelheimer, M. Rick, E. Voswinkel. München-Pullach: Verlag Dokumentation 1972. XVI, 471 S.
APPENDIX
Fig 1 Document Titles with Keywords
Fig 2 Register of Keywords in Context
Fig 3 Indexing Depth
Fig 4 Descriptor Frequency
Fig 5 Example of List of Groups
Fig 6 List of Keywords
Fig 7 Example of Main Group
Fig 8 Part of the Total Minimal Tree
Fig 9 Selection Scheme for Drawing of Main Groups
Fig 3 Indexing Depth
07.09.73 SEITE 2
ANALYSE
ANATOMIE
ANFAENGER
ANFORDERUNG (NICHT ALS NACHFRAGE)
* ANFORDERUNG
NACHFRAGE
ANGEBOT
* ANGESTELLTER
PERSONAL
* ANLEITUNG
EINFUEHRUNG
ANORDNUNG
ANORGANISCHE CHEMIE
ANPASSUNG
ANSICHT
ANTEIL
ANWENDUNG
APOTHEKE
ARBEIT
* ARBEITSBEREICH
TAETIGKEIT BEMESSUNG
ARBEITSKRAFT
* ARBEITSKREIS
ARBEIT GRUPPE
ARBEITSMARKT
ARBEITSMITTEL
ARBEITSPLATZ
Fig 6 List of Keywords
AUSSCHREIBUNG
BEGRUENDUNG
BELASTUNG
BIBLIOGRAPHIE
DICHTE
ENTWICKLUNGSPLAN
ERGEBNIS
FACHHOCHSCHULE
FORM
KRANKENPFLEGE
KUNST
LEHRKOERPER
MOBILITAET
MUSTER
PHILOLOGIE
PHILOSOPHIE
PRINZIP
PRIVATRECHTSWISSENSCHAFT
RICHTUNG
SOZIALWISSENSCHAFT
SPRACHWISSENSCHAFT
STADTPLAN
THEOLOGIE
VERMINDERUNG
VERORDNUNG
VORKLINIKUM
WETTBEWERB
WIRTSCHAFTSWISSENSCHAFT