AN AUTOMATICALLY PRODUCED THESAURUS

GERHARD SCHWANHAUSSER, Thesaurus Hochschulforschung Hochschulbau, Institut für Hochschulbau, Sonderforschungsbereich 63, Stuttgart, Federal Republic of Germany

A procedure for the computer generation of a thesaurus from a set of descriptors, manually assigned to the documents in a library, is described. It recognises only a quasi-associative relationship among descriptors. The specific advantages of the thesaurus are pointed out: it is open-ended, its keywords are derived from actual documents and are therefore the most helpful in retrieving those documents, and it serves as an aid in searching the collection for information. The computational procedures for generating the thesaurus are given: keyword statistics, matrix inversion, calculation of the similarity matrix using the Tanimoto coefficient, automatic cluster analysis using the minimal-tree procedure, and compilation of groups and main groups of descriptors. An algorithm for the graphic display of the main groups of descriptors has been formulated. The main disadvantage of the procedure is that only a limited number of keywords can be processed within a reasonable computer CPU time. The procedure can be applied to a library of 10,000 to 20,000 documents with a keyword base of 1,000 words using about 3 hours of computing time.

SEMINAR ON THESAURUS (1975). Paper BA

1 GENERAL CONSIDERATIONS

The basic prerequisite for the establishment of an automatically constructed thesaurus is the existence of an indexed corpus of documents. In the case of the library of the Zentralarchiv für Hochschulbau, Stuttgart, a corpus of 5,900 manually indexed documents has been available (1). We suppose that, even if the indexing has been done carefully, it is still a rather subjective work, and we should use the minimum of further assumptions to produce a thesaurus, which we call, on account of its inter-indexer inconsistency, a pragmatic thesaurus (3). Defining a thesaurus as an interrelated list of key words of the area of knowledge under consideration, the question can be raised how to limit the area of knowledge and how to decide which key words belong to this area, since new branches of science develop their vocabularies fairly rapidly and even changes of meaning occur as time goes on. A pragmatic thesaurus such as ours implies some basic assumptions. Firstly, we assume that every document entered into the library is relevant for this area of knowledge. Secondly, the manually assigned key words for every document are taken to be the most relevant ones. Thirdly, there is only one type of interrelation between the key words, namely a quasi-associative relation of every key word of a document to all other key words of that document.

Having adopted these assumptions for a special library, you need only do some machine calculations and a thesaurus is almost ready. A procedure such as this facilitates the updating of a thesaurus whenever sufficient additional documents have been entered into the library. A detailed specification of the area of interest is not necessary, since the indexed document collection can be understood as a description of it. In addition, the list of key words need not, and should not, be fixed in advance. It should rather be an open-ended list, to which new key words are added whenever it is felt necessary. As to the meaning of the key words, it can be taken that every key word is defined by the content of the document or documents to which it has been assigned in the indexing process.

The main advantage of a pragmatic thesaurus derived from actual documents is that it will be a principal retrieval aid for those documents, and it will also help the user to find an appropriate combination of key words that gives the retrieval results he expects from the system.

Further, a pragmatic thesaurus can be a very good basis for discussion at any conference interested in establishing a thesaurus for a similar area of research. There will be no need to argue by means of arbitrary examples made offhand by the participants of the conference, because a pragmatic thesaurus will contain an adequate number of different examples along with the frequency counts, from which one may draw conclusions more easily than from fictitious examples. The main restriction of the approach described here is the limited number of key words which can be processed within reasonable computer CPU time. We estimate that the procedures can be applied to a library of some 10,000 to 20,000 documents with a key word repertoire of 1,000 words using not more than 3 hours of calculation time.

2 COMPUTATIONAL PROCEDURES

21 Key Word Statistics and Matrix Inversion

The library of the Zentralarchiv für Hochschulbau keeps all its documents with a numerus currens for identification (5, 7). For every key word, there is a peek-a-boo card carrying the numbers of the documents to which this key word has been assigned during the manual indexing process. The information on the peek-a-boo cards has been keypunched, coding the key words with a number. The statistical evaluation came up with two tables concerning the key word frequency and the indexing depth. The graphic display shows that most key words have been used between 20 and 400 times. The indexing depth shows that on average a document has been indexed with 4 to 12 descriptors. Because the representation of all document numbers per key word is a matrix, it was necessary to make a matrix inversion (in the sense of inverting the file) to get all key words per document.

22 Associative Relationship

It is assumed that there is an associative relation between every two descriptors (= key words) whenever they have been assigned together to one or more documents. The reduction of all possible types of relations to a single one, not distinguishing between them at all, is the reason why we call our thesaurus a pragmatic one. Although true semantic analysis would be very useful, no computerized systems are known so far which would do this for a real library. Even intellectual thesaurus construction is a very hard and time-consuming task, so that many authors have not specified the type of relation between key words in manually compiled thesauri (8). We checked our procedure against the recommendations for the establishment of monolingual thesauri (3) and found it to be within the rules. The associative relation between two descriptors is understood to be independent of the sequence and of the context of possible other descriptors. All pairs of descriptors have been checked, even if the combination of some descriptors does not make sense.

23 Calculation of Similarity Matrix

The similarity matrix has been calculated on the basis of the Tanimoto coefficient f, which is a function of the frequency A of descriptor A, the frequency B of descriptor B, and the frequency C of the two descriptors taken together.

TANIMOTO COEFFICIENT

$$ f = \frac{C}{A + B - C} $$

A = frequency of the descriptor A
B = frequency of the descriptor B
C = frequency of the descriptors A and B together
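
The two steps described so far, inverting the file and computing the coefficient, can be illustrated with a minimal Python sketch. It is not the original programme: the function names and the toy postings are hypothetical, and unlike the run reported here it exploits the symmetry of the matrix by storing each pair only once.

```python
from collections import defaultdict
from itertools import combinations


def invert(postings):
    """Turn {key word: set of document numbers} into {document number: set of key words}."""
    by_doc = defaultdict(set)
    for keyword, docs in postings.items():
        for doc in docs:
            by_doc[doc].add(keyword)
    return dict(by_doc)


def tanimoto_matrix(postings):
    """Return {(a, b): f} for every descriptor pair a < b whose common frequency C is non-zero."""
    freq = {kw: len(docs) for kw, docs in postings.items()}   # A and B
    common = defaultdict(int)                                  # C
    for keywords in invert(postings).values():
        for a, b in combinations(sorted(keywords), 2):
            common[(a, b)] += 1
    return {(a, b): c / (freq[a] + freq[b] - c) for (a, b), c in common.items()}


# Hypothetical toy data: three key words assigned to five documents.
postings = {
    "ARBEIT": {1, 2, 3},
    "PERSONAL": {2, 3, 4},
    "KUNST": {5},
}
sim = tanimoto_matrix(postings)
print(sim)
```

Counting co-occurrences document by document is the reason the key words per document are needed: only pairs that actually share a document are ever touched, so pairs with C = 0 never enter the result.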

The calculation of the similarity matrix was the most time-consuming step, because 450,000 combinations had to be checked for a common frequency C different from zero, and some 45,000 Tanimoto coefficients had to be calculated. Since the necessary computer time of one and a half hours was not available for a single job, the symmetry of the similarity matrix has not been used, in order to make a restart of the programme easier without passing through all the results calculated up to the breaking point. The input data, namely the inverted file, could be kept within the core memory by always packing three integer numbers into one word using a MASK/SHIFT subroutine. The complete similarity matrix has been printed out, the coded key words being written in plain text along with their absolute and relative frequency counts.

According to the suggestions of Ivanova (4), only those descriptors should be assumed to be related whose Tanimoto coefficients sum up to half of all coefficients of the same line. We used an approximation to Ivanova's proposal, putting into one group all those descriptors which have a Tanimoto coefficient greater than the mean value of that line. The result of this procedure was a list of as many overlapping groups as there were descriptors.
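
The MASK/SHIFT packing of three integers into one machine word, mentioned above for keeping the inverted file in core memory, can be sketched as follows. The field width of 16 bits is an assumption made only for illustration; the paper does not state the word length or the field layout, and the names `pack3` and `unpack3` are hypothetical.

```python
BITS = 16                  # assumed field width; not stated in the paper
MASK = (1 << BITS) - 1     # bit mask selecting one field


def pack3(a, b, c):
    """Pack three non-negative integers into one word by shifting."""
    for value in (a, b, c):
        if not 0 <= value <= MASK:
            raise ValueError(f"{value} does not fit into {BITS} bits")
    return (a << 2 * BITS) | (b << BITS) | c


def unpack3(word):
    """Recover the three integers by shifting and masking."""
    return (word >> 2 * BITS) & MASK, (word >> BITS) & MASK, word & MASK


word = pack3(5900, 710, 3)
print(unpack3(word))
```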
A manipulation was made concerning the very rare and the very frequent descriptors. All thirty-eight descriptors appearing once and six descriptors appearing more than 500 times have been deleted from the analysis, so that 710 descriptors were left. A similar recommendation has been made by Salton for the SMART system. The list of the resulting 710 descriptor groups corresponds to the alphabetical lists in traditional thesauri, except that this list does not distinguish different types of relations.
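
A short Python sketch of the frequency filter and of the grouping by the mean value of each line is given here under stated assumptions. The names are hypothetical, and the mean of a line is taken over its non-zero coefficients only, which the paper does not specify.

```python
def filter_by_frequency(freq, low=1, high=500):
    """Keep descriptors used more than `low` and at most `high` times."""
    return {kw for kw, n in freq.items() if low < n <= high}


def row_mean_groups(sim, keep):
    """Group each descriptor with those whose coefficient exceeds the mean of its line.

    sim  : {(a, b): f}, each pair stored once
    keep : descriptors that survived the frequency filter
    """
    rows = {kw: {} for kw in keep}
    for (a, b), f in sim.items():
        if a in keep and b in keep:
            rows[a][b] = f
            rows[b][a] = f
    groups = {}
    for kw, row in rows.items():
        if not row:
            groups[kw] = set()
            continue
        mean = sum(row.values()) / len(row)   # assumption: zeros are not counted
        groups[kw] = {other for other, f in row.items() if f > mean}
    return groups


# Hypothetical frequencies and coefficients.
freq = {"A": 10, "B": 20, "C": 30, "D": 1, "E": 600}
sim = {("A", "B"): 0.5, ("A", "C"): 0.1, ("B", "C"): 0.3,
       ("A", "D"): 0.9, ("A", "E"): 0.2}
keep = filter_by_frequency(freq)
groups = row_mean_groups(sim, keep)
print(groups)
```

Every surviving descriptor gets a group of its own, so the groups overlap and there are as many of them as there are descriptors.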


24 Automatic Cluster Analysis

In general, there is a systematic part and an alphabetical part in every thesaurus. The systematic part describes the area under consideration by means of some subsets of the key words. The automatic procedures for cluster analysis can be divided into those which produce separated clusters and those which produce overlapping clusters. Since in our case the alphabetical list of descriptor groups is already overlapping, we decided to use the minimal-tree procedure, which divides all key words into exactly separated groups. The name of the minimal-tree procedure derives from the fact that a matrix of distances has been used in the original work (6), choosing the minimal distance between the words. In our case we had the similarity matrix as a basis, so it is obviously the maximum of the similarity which corresponds to the same procedure. We have kept the name in spite of using the opposite measure.

The minimal-tree procedure is started with an arbitrary descriptor. A second descriptor is picked out of the group of descriptors of the first one, so that the second one has maximum similarity to the first. The third descriptor has to have the maximum value of similarity to one of its predecessors, and is linked correspondingly. The procedure has to be continued until all descriptors have entered the tree exactly once. The tree obtained by this procedure is independent of the starting point as to the sum of all similarities which have been entered into the tree.

To cut the minimal tree into pieces, it is recommended to use an arbitrary level of similarity which has to be surpassed if a branch of the tree is not to be cut off (2). A series of trials showed that an appropriate level of similarity is hard to define, since too low a level does not break the tree into a sufficient number of fractions, some being very large, others being very small. The number of fractions increases as the level of similarity which the branches must reach to withstand the cutting trials increases, but unfortunately the size of the fractions decreases more rapidly, so that a number of single descriptors were found to form clusters of their own. To finish a first version of our thesaurus we decided to cut the minimal tree manually, using the mean value of all similarities as a decision aid. I will return to this point at the end.
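
The tree-growing step reads like Prim's algorithm applied to maximum similarity instead of minimum distance. The sketch below makes that assumption explicit. It is a plain, unoptimised illustration with hypothetical names and data, and it reaches only those descriptors connected to the starting one by non-zero similarities.

```python
def minimal_tree(sim, start):
    """Grow a tree by repeatedly linking the outside descriptor most similar to any member.

    sim   : {(a, b): similarity}, each pair stored once
    start : arbitrary starting descriptor
    Returns a list of (predecessor, new descriptor, similarity) links.
    """
    neighbours = {}
    for (a, b), f in sim.items():
        neighbours.setdefault(a, {})[b] = f
        neighbours.setdefault(b, {})[a] = f
    in_tree = {start}
    edges = []
    while True:
        best = None
        for u in in_tree:
            for v, f in neighbours.get(u, {}).items():
                if v not in in_tree and (best is None or f > best[2]):
                    best = (u, v, f)
        if best is None:          # nothing left that can be linked
            break
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical similarities between four descriptors.
sim = {("A", "B"): 0.5, ("A", "C"): 0.1, ("B", "C"): 0.3, ("C", "D"): 0.4}
tree_from_a = minimal_tree(sim, "A")
tree_from_d = minimal_tree(sim, "D")
print(tree_from_a)
print(tree_from_d)
```

Starting from "A" or from "D" gives the same three links in a different order, so the sum of the similarities entered into the tree is the same, as stated above.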
The arbitrary division of the minimal tree could be made such that all fractions are almost of the same size. We have 23 so-called main groups, each containing some 30 descriptors, which is a quickly scannable size. The definition of our groups and main groups is analogous to the facets and semantic classes in traditional thesaurus work.
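
Cutting the tree at a fixed level of similarity, the approach that was tried before the manual division, can be sketched as below. The function name, the union-find bookkeeping and the toy links are assumptions for illustration. The manual, size-balanced cut actually used for the 23 main groups is not reproduced.

```python
def cut_tree(edges, nodes, level):
    """Remove every link whose similarity does not surpass `level`; return the fractions."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v, f in edges:
        if f > level:
            parent[find(u)] = find(v)

    fractions = {}
    for n in nodes:
        fractions.setdefault(find(n), set()).add(n)
    # Largest fraction first, ties broken alphabetically.
    return sorted(fractions.values(), key=lambda c: (-len(c), sorted(c)))


# Hypothetical tree links (predecessor, descriptor, similarity).
edges = [("A", "B", 0.5), ("B", "C", 0.3), ("C", "D", 0.4)]
nodes = ["A", "B", "C", "D"]
for level in (0.0, 0.35, 0.45):
    print(level, cut_tree(edges, nodes, level))
```

On this toy tree the number of fractions grows from one to two to three as the level rises, and single descriptors start to form clusters of their own, which is the effect reported in the trials.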

3 COMPILATION OF GROUPS AND MAIN GROUPS

This part of the total procedure is mainly for output purposes. It produces lists of all groups of descriptors, indicating the number of the main group within which a descriptor can be found. This scheme of reference among the descriptors will be of great use to someone who is trying to retrieve documents on the basis of key words but has not found the proper key word combination to get the relevant documents. Using the references he can change his key word combination step by step, always checking the statistical neighbourhood of any key word. The list of main groups has been printed along with two selected key words which may be some sort of intuitive representatives for the main groups. This method too has to be replaced by some better, objective method when a revision of the thesaurus is made. To make the clusters of the main groups easier to read, a graphic representation has been produced by the program described in the next section.
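
The reference scheme can be pictured with a small sketch that prints each descriptor, its group, and the main group number of every entry. The layout of the printed lines, the function name and the main group numbers are invented for illustration; the actual listing format is shown only in the figures.

```python
def reference_list(groups, main_group_of):
    """Return printable lines: each descriptor, then its group, with main group numbers."""
    lines = []
    for descriptor in sorted(groups):
        lines.append(f"{descriptor} ({main_group_of[descriptor]})")
        for related in sorted(groups[descriptor]):
            lines.append(f"    {related} ({main_group_of[related]})")
    return lines


# Hypothetical groups and main group numbers.
groups = {"ARBEIT": {"PERSONAL"}, "PERSONAL": {"ARBEIT", "KUNST"}, "KUNST": set()}
main_group_of = {"ARBEIT": 3, "PERSONAL": 3, "KUNST": 17}
listing = reference_list(groups, main_group_of)
print("\n".join(listing))
```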

4 GRAPHIC DISPLAY OF THE MAIN GROUPS

The graphic display was not a principal aim of this thesaurus project, but the practical value of such a thing is well recognised. Therefore, the program necessary to do this has been designed to be as simple as possible, not striving to save computer and plotter time. The algorithm is based on an equidistant grid with limitation in the direction of one axis. The difficulty was to find a free place for every point which has to be entered into the tree such that a line could be drawn without creating ambiguity concerning previously drawn lines. This has been solved by the selection of 32 points which could never produce confusion. Whenever there was a big cluster, an auxiliary routine was used to escape from that part of the area all filled up with points. This escape could be made by checking the vertical lines at left and at right for the nearest free point. Obviously this drawing procedure could not represent the similarity by the geometric distance. But a measure of similarity stronger than the mean value has been made visible by double lines between the descriptors. The main groups have been drawn in separate pictures as well as all together, representing the whole minimal tree.

5 CONCLUSIONS

The procedures described here have enabled the automatic construction of a pragmatic thesaurus for a given medium-sized special library. This thesaurus will be a valuable retrieval aid for the same document collection from which it has been derived. Since the collection has been manually indexed, this thesaurus cannot be an indexing instrument, except as a check against the erroneous use of key words.

The procedures for the construction allow for some manual steering of parameters and cut-off values. Besides, they are also objective procedures which can be repeated whenever necessary, producing an exact mapping of all the meanings of all the documents within the library. The basic assumption that manual indexing can be a good starting point is justified by the fact that it is easier for an expert to say, "This is a valuable book for that library and it should be assigned these descriptors, which can be read in the title or the table of contents or the headings", than to ask an expert or expert conference, "Which are the key words of the area of knowledge in which you are experts?" Automatic indexing of documents can be expected in the future, but there is still a considerable amount of work to be done on both sides, linguistic and computer research. We therefore hope that automatic thesaurus construction is an aid for people looking into libraries to find the information they need.

6 BIBLIOGRAPHICAL REFERENCES

1  Sec 1  BOUCHE (Reinhard), KIND (Friedbert) and SCHWANHAUSSER (Gerhard). Gesamtbibliographie, Zentralarchiv für Hochschulbau. Teil 1 und 2. Hrsg.: Zentralarchiv für Hochschulbau. Stuttgart: Zentralarchiv für Hochschulbau 1973. 436, 891 S.

2  Sec 24  DEICHSEL (Guntram). Verfahren der Automatischen Klassifikation durch Clusteranalyse und ihre Anwendung bei morphologischen Untersuchungen an Amöben. Zulassungsarbeit zum Staatsexamen für das Lehramt an Gymnasien. Stuttgart: Universität, Institut für Informatik 1972. Getr. Pag.

3  Sec 1, 22  DIN 1463 Vornorm. Richtlinien für die Erstellung und Weiterentwicklung deutschsprachiger Thesauri. Berlin, Köln: Beuth Vertr. 1972. 11 S.

4  Sec 23  IVANOVA (N S). Probleme der automatisierten Thesaurusbildung. In: Informatik, Berlin 17 (1970) H. 1, S. 25-28.

5  Sec 21  KIND (Friedbert) and SCHWANHAUSSER (Gerhard). Thesaurus Hochschulplanung. Stuttgart: Zentralarchiv für Hochschulbau 1973. S. 33-38. (Information. 6/25)

6  Sec 24  KNODEL (Walter). Graphentheoretische Methoden und ihre Anwendungen. Berlin, Heidelberg, New York: Springer 1969. VIII, 111 S. (Ökonometrie und Unternehmensforschung. 13)

7  Sec 21  RISCHKOWSKY (Franziska). Thesaurus Hochschulplanung. Ein aufgabenbezogener Thesaurus für die Literaturdokumentation der Hochschul-Informations-System GmbH (HIS) Hannover. Hrsg.: HIS Hannover. München-Pullach: Verlag Dokumentation 1973. IX, 205 S.

8  Sec 22  THESAURUS BILDUNGSFORSCHUNG. Verzeichnis der Deskriptoren und Nichtdeskriptoren in der Literaturdokumentation des Max-Planck-Instituts für Bildungsforschung. Hrsg.: Rolf Neuhaus. Mitarb.: E. Guhde, B. Hegelheimer, M. Rick, E. Voswinkel. München-Pullach: Verlag Dokumentation 1972. XVI, 471 S.

APPENDIX

Fig 1 Document Titles with Keywords
Fig 2 Register of Keywords in Context
Fig 3 Indexing Depth
Fig 4 Descriptor Frequency
Fig 5 Example of List of Groups
Fig 6 List of Keywords
Fig 7 Example of Main Group
Fig 8 Part of the Total Minimal Tree
Fig 9 Selection Scheme for Drawing up of Main Groups


F i g 3 Indexing Depth


07.09.73 SEITE 2

ANALYSE
ANATOMIE
ANFAENGER
ANFORDERUNG (NICHT ALS NACHFRAGE)
* ANFORDERUNG
    NACHFRAGE
ANGEBOT
* ANGESTELLTER
    PERSONAL
* ANLEITUNG
    EINFUEHRUNG
ANORDNUNG
ANORGANISCHE CHEMIE
ANPASSUNG
ANSICHT
ANTEIL
ANWENDUNG
APOTHEKE
ARBEIT
* ARBEITSBEREICH
    TAETIGKEIT BEMESSUNG
ARBEITSKRAFT
* ARBEITSKREIS
    ARBEIT GRUPPE
ARBEITSMARKT
ARBEITSMITTEL
ARBEITSPLATZ

Fig 6 List of Keywords


AUSSCHREIBUNG
BEGRUENDUNG
BELASTUNG
BIBLIOGRAPHIE
DICHTE
ENTWICKLUNGSPLAN
ERGEBNIS
FACHHOCHSCHULE
FORM
KRANKENPFLEGE
KUNST
LOHPKOERPER
MOBILITAET
MUSTER
PHILOLOGIE
PHILOSOPHIE
PRINZIP
PRIVAT
RECHTSWISSENSCHAFT
RICHTUNG
SOZIALWISSENSCHAFT
SPRACHWISSENSCHAFT
STADTPLAN
THEOLOGIE
VERMINDERUNG
VERORDNUNG
VORKLINIKUM
WETTBEWERB
WIRTSCHAFTSWISSENSCHAFT
ZONE


Fig 9 Selection Scheme for Drawing up of Main Groups

Fig 10 Interrelation among Main Groups
