COMPUTATIONAL ANALYSIS OF INTERFERENCE PHENOMENA
ON
THE LEXICAL LEVEL (*)
W. Skalmowski and M. Van Overbeke
Summary
This contribution presents the results of a comparison of Dutch texts written by bilinguals 1) (speaking French and Dutch) with Dutch texts regarded as STANDARD WRITTEN DUTCH. Attention was focused on French loan-words appearing in both types of texts and on the differences in their use. Certain generalizations as to the mechanisms of interference are suggested.
1. Materials
The materials used for the present contribution belong to two groups:
- Group A: texts written by francophones with ca. 6 years of Dutch training. These texts represent what we call Francophone Written Dutch (below FWD).
- Group B: texts from recent contemporary Dutch literature by both Dutch and Flemish authors. They will here represent Standard Written Dutch (SWD).
(*) We are greatly indebted to our colleagues Mr. L. DEBUSSCHERE, who prepared all the computer programs needed in this investigation, and Mr. R. EECKHOUT, who helped us with many suggestions as to the possibilities of information processing techniques and with critical remarks concerning the linguistic aspects of our problem, and, last but not least, to the Direction of the MATHEMATICAL CENTRE of the University of Louvain, who put the IBM-360 computer at our disposal.
The texts of group A were written by 400 francophone 18-year-old pupils in the highest classes of the 61 private secondary schools in Brussels and its suburbs. This sample represents one fifth of the total population. From every pupil we obtained two Dutch compositions: a piece of homework written in November 1967 and an examination composition from December of the same year. The reasons for this choice are evident: the pupils can call on their parents' and their dictionaries' assistance in the first situation but not in the second.
From every composition the first 125 words were put on punch-cards together with coded information as to their source. In this way a corpus of ca. 100,000 words was compiled. In order to allow for the comparison of relative parameters such as word-spread, vocabulary-growth etc., it was later divided into two parts, each containing ca. 50,000 words (parts 1 and 2 below). The texts of group B, i.e. the SWD, were obtained by putting together extracts from literary works by 10 contemporary authors. This anthology gave us a corpus of some 10,000 words.
The first part of group A reflects ca. 50 different subject-matters, whereas the SWD-anthology reflects only 10 subject-matters or "themes". The disproportion in corpus size is thus outweighed by a themes/tokens ratio which is ca. 1/1,000 in both corpora.
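The corpus sizes and the themes/tokens balance can be checked with simple arithmetic. The following is a minimal sketch that uses only the figures stated in the text: 400 pupils, 2 compositions each, and the first 125 words of each composition, together with ca. 50 themes in ca. 50,000 tokens for part 1 of group A and 10 themes in ca. 10,000 tokens for SWD. The variable names are our own.

```python
# Corpus arithmetic using only figures stated in the text; names are illustrative.
PUPILS = 400
COMPOSITIONS_PER_PUPIL = 2      # homework (Nov. 1967) + examination (Dec. 1967)
WORDS_PER_COMPOSITION = 125     # only the first 125 words were punched

fwd_tokens = PUPILS * COMPOSITIONS_PER_PUPIL * WORDS_PER_COMPOSITION
part_size = fwd_tokens // 2     # FWD corpus split into parts 1 and 2

# Themes per token (approximate "ca." figures from the text)
fwd_part1_ratio = 50 / part_size    # ca. 50 themes in part 1 of group A
swd_ratio = 10 / 10_000             # 10 themes in ca. 10,000 SWD tokens

print(fwd_tokens, part_size)        # 100000 50000
print(fwd_part1_ratio, swd_ratio)   # 0.001 0.001
```

Both corpora come out at roughly one theme per 1,000 tokens. Because the token counts are "ca." values, this should be read as an order-of-magnitude check and not as an exact figure.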
[Fig. 1. Comparison of vocabulary growth: 10 themes vs. 1 theme.]
2. Lexical interference
The main purpose of this contribution is to test and verify certain non-computational insights about language interference in general. Dutch presents a particularly striking example of this phenomenon, since its vocabulary contains a very large number of French loan- and foreign words and there is still an "open door" allowing the intrusion of lexical gallicisms in practically unlimited quantity. Thus the Dutch vocabulary holds many parallel lexemes of both origins, e.g. analyse ~ ontleding.
This situation strongly resembles that of English with its Anglo-Saxon and Romance words, although the semantic differentiation of such word-pairs seems to have progressed much further there. Whereas the native Dutch speaker plays both keys with unbiased ease, for the Belgian francophone this ambiguous situation produces certain constraints and difficulties. These have visible effects on word-choice, on the growth rate of foreign words and on vocabulary size in general.
For reasons of simplicity, our investigation did not adopt the usual distinction between loan-words and foreign words. That distinction is based on the different degrees of integration of foreign lexemes, measured by differences in pronunciation, by social acceptability within the speech community and by certain prescriptive arrangements such as their inclusion in vocabularies and dictionaries whose authority is generally accepted. As the aim of our investigation was to find ways of providing numerical values for interference phenomena, we proceeded in a purely descriptive way, using only etymological criteria to distinguish between native and foreign lexical elements. Thus we considered units containing French lexical or morphological elements, or both, as loan-words. So bonjouren was entered because of its French lexical element, but also trotseren because of its French word-formational part. Composita containing only one foreign element (e.g. avondtoilet) were treated as loan-words unless this element had already been entered as an autonomous word. No distinction was made between foreign words included in the Standard Dutch Vocabulary of van Dale 2) (e.g. assaut) and those which are not mentioned there (e.g. auberge). … pressionist etc. are counted as different items. Also for reasons of simplicity, all non-French foreign words are relegated here to the category of pure Dutch items.
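The etymological criteria above can be read as a small classification procedure. The sketch below is a minimal version that assumes each word has already been split by hand into morphemes tagged for French origin. The `Morpheme` type, the `is_compound` flag and our reading of the "unless already entered as an autonomous word" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str
    french: bool  # etymological criterion: True if of French origin


def is_loan_word(morphemes, is_compound, entered_autonomous):
    """Classify a word as a French loan-word (hypothetical reading of the criteria).

    - Any French lexical or word-formational morpheme makes the word a loan-word.
    - A compound whose single French element was already entered as an
      autonomous word is not counted as a loan-word again.
    - Non-French foreign elements are treated as Dutch (french=False).
    """
    french_parts = [m for m in morphemes if m.french]
    if not french_parts:
        return False
    if (is_compound and len(french_parts) == 1
            and french_parts[0].form in entered_autonomous):
        return False
    return True


bonjouren = [Morpheme("bonjour", True), Morpheme("en", False)]    # French lexical element
trotseren = [Morpheme("trots", False), Morpheme("eren", True)]    # French formational suffix
avondtoilet = [Morpheme("avond", False), Morpheme("toilet", True)]

print(is_loan_word(bonjouren, False, set()))         # True
print(is_loan_word(trotseren, False, set()))         # True
print(is_loan_word(avondtoilet, True, set()))        # True
print(is_loan_word(avondtoilet, True, {"toilet"}))   # False
```

Derivations such as bonjouren are kept apart from compounds here, so that an earlier entry of bonjour does not suppress the derived verb. This is also an interpretive assumption.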
3. Lexical interference and word-length
As a first approximation test, the percentage of foreign words in the vocabulary of both FWD- and SWD-texts was established. The results are as follows:
       TOKENS   TYPES   log TYPES   FOREIGN TYPES   log F.T.   % F. TYPES
FWD    47,307   5,653   0.8375      648             0.4954     11.85
SWD    10,358   2,616   0.8807      141             0.4285      5.38
The difference in the ratio of foreign vocabulary between the two groups results in distributional differences among words of different length (number of letters). The overall word-length of tokens is nearly identical in both groups (4.51 for SWD and 4.61 for FWD). Nevertheless, an application of the chi-square test showed the divergences in the distribution of words over the word-length classes to be highly significant. The average word-length of types (M) differs between the two groups:
        M      σ
FWD   7.85   2.97
SWD   7.03   2.72
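The two statistics used in this section, the mean word-length of types with its dispersion and a chi-square test on word-length distributions, can be sketched as follows. This is a minimal version that assumes σ is the population standard deviation and that the test is run on a corpora × length-class contingency table. The text states neither choice, and the data below are toy values, not the paper's counts.

```python
from statistics import mean, pstdev


def length_stats(types):
    """Mean word length (M) and standard deviation over a list of word types.

    Assumption: sigma is the population standard deviation.
    """
    lengths = [len(t) for t in types]
    return mean(lengths), pstdev(lengths)


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table (rows = corpora, columns = word-length classes). Empty columns must
    be dropped beforehand, since their expected counts would be zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical toy data, not the paper's counts
m, sd = length_stats(["huis", "fenomeen", "analyse"])
chi2, dof = chi_square([[10, 20], [20, 10]])
print(round(m, 2), round(chi2, 3), dof)
```

The statistic would then be compared with the critical chi-square value for `dof` degrees of freedom at the chosen significance level.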
… is strikingly evident for word-length 10 (fig. 2).
The fact that FWD-authors "switch in" this Dutch word-formational device in cases where the native Dutch speaker does not shows that francophones are "over-aware" of this means of translating the French genitive construction by a Dutch compositum (e.g. pot de fleurs > bloempot).
This fact strengthens the assumption made in this paper that the lexical level of language is very closely connected with higher (syntactic) levels, so that statistically statable facts may be explained only in connection with certain more general models of speech production.
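The translation device described above turns a French "X de Y" phrase into a Dutch head-final compound. The sketch below illustrates it under simple assumptions. The function name and the hand-supplied French-to-Dutch lexicon are hypothetical, and real Dutch compounding (linking elements such as -en- or -s-, plural and article forms like "des") is deliberately not handled.

```python
def genitive_to_compound(phrase: str, lexicon: dict) -> str:
    """Render a French 'X de Y' phrase as a Dutch head-final compound Y + X.

    `lexicon` maps French words to Dutch stems (hypothetical, supplied by hand);
    linking elements such as -en- or -s- are not handled.
    """
    head, sep, modifier = phrase.partition(" de ")
    if not sep:
        raise ValueError(f"not an 'X de Y' construction: {phrase!r}")
    return lexicon[modifier] + lexicon[head]  # modifier first, head last


print(genitive_to_compound("pot de fleurs", {"pot": "pot", "fleurs": "bloem"}))  # bloempot
```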
[Fig. 2. Word length of types and composita: distribution over number of letters for types SWD, types FWD and composita.]
4. An interference model
The interference model presented here consists of two parts. The first, syntactic part, which also contains the word-formational devices, may be thought of as a generative device of the kind described by N. CHOMSKY and other generativists. The second part, called the lexical morpheme store, is thought of as consisting of entries "written down" in terms of conceptual symbols and provided with actual linguistic interpretations. These "interpretations", which in a very simplified manner may be identified with words tout court, are picked out of the store and "fitted" into previously constructed sentence forms. In other words, we assume that the sentences are formed according to semantic requirements before the actual words have been chosen. This last routine goes on in a semi-automatic way. It may be visualized as picking the required lexemes, according to the entries in terms of conceptual symbols, out of a magnetic tape gliding under a reading device of some sort. For a bilingual speaker, we can imagine the procedure as a tape with three different tracks: the middle one contains the "entries", the other two the respective actual morphemes, in casu Dutch and French (D and F in fig. 3).
Speaking in one of the two languages demands a switch-over to one of the external tracks. It may be assumed that, in the case of a monolingual Dutch speaker, the cells contain the parallel French-derived and native Dutch words in an unordered manner. With a francophone, by contrast, a bias exists towards the French loan-word (e.g. column 1 in fig. 3: phénomène > fenomeen, not verschijnsel).
This explains the predilection for blank-filling, which may be conceived of as an automatic switch-over to the French side wherever the Dutch track is blank or whenever the bilingual's competence fails to furnish a good Dutch word or synonym. In this process the French lexeme is placed in the cell on the Dutch side (cf. column 3, where ∅ marks the lacking word).
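The three-track tape and the switch-over rules just described can be sketched as a small data structure. The following is a minimal illustration, not the authors' implementation. The `Cell` fields, the `known` set standing for the speaker's Dutch competence and the deterministic preference for the loan-word are all assumptions. The word examples are taken from fig. 3.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cell:
    """One column of the hypothetical three-track tape (cf. fig. 3)."""
    entry: str                                        # conceptual symbol (middle track)
    french: str                                       # F track
    native: List[str] = field(default_factory=list)   # native Dutch words on the D track
    loan: Optional[str] = None                        # French-derived Dutch parallel, if any


def pick_dutch(cell: Cell, known: Set[str], francophone: bool,
               rng: random.Random) -> str:
    """Choose the word written for `cell` when speaking Dutch."""
    dutch_words = cell.native + ([cell.loan] if cell.loan else [])
    candidates = [w for w in dutch_words if w in known]
    if not candidates:
        # Blank Dutch track, or the speaker's competence fails:
        # automatic switch-over to the French side ("blank-filling").
        return cell.french
    if francophone and cell.loan in candidates:
        return cell.loan           # bias towards the French loan-word
    return rng.choice(candidates)  # unordered choice, as for the native speaker


phenomenon = Cell("PHENOMENON", "phénomène", native=["verschijnsel"], loan="fenomeen")
sun_burned = Cell("SUN-BURNED", "basané")  # nothing on this speaker's D track
known = {"verschijnsel", "fenomeen"}
rng = random.Random(1)

print(pick_dutch(phenomenon, known, francophone=True, rng=rng))   # fenomeen
print(pick_dutch(sun_burned, known, francophone=True, rng=rng))   # basané
print(pick_dutch(phenomenon, known, francophone=False, rng=rng))  # either Dutch word
```

A stochastic bias, for example a weighted choice, would be closer to the idea of a statistical tendency. The fixed preference is used here only to keep the example easy to follow.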
[Fig. 3. The three-track tape of the bilingual lexical store, columns 1 to 4: the middle track holds the entries (PHENOMENON, ANALYSIS, SUN-BURNED, BRITISH), track D the Dutch words (e.g. verschijnsel, fenomeen; Engels, Brits) and track F the French words (PHÉNOMÈNE, ANALYSE, BASANÉ, BRITANNIQUE).]
We assume that the word-formational rules belong to the syntactic part. Thus the reshaping of new French borrowings is done in the grammatical part of our model. An example is the loan-adjective gebasaneerd, composed of the French basané, whose counterpart is lacking in the Dutch track, and of the two Dutch affixes ge- and -d. As a matter of fact, this assumption is a heuristic over-simplification, because certain grammatical morphemes are in fact borrowed, cf. the endings -eren, -atie, -age etc. To explain this phenomenon, one could point to the fact that in many cases whole word-items are introduced into the lexical store and activate the analogy mechanism. This problem, however, would lead us beyond the scope of the present investigation.
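The reshaping of basané into gebasaneerd can be illustrated as a single word-formational rule. This is a hypothetical sketch. It assumes that a French form in -é loses its -é, receives the borrowed verbal suffix -eer- (cf. -eren), and is then wrapped in the Dutch participle affixes ge- and -d. The text does not spell out such a rule, and it is not a general account of Dutch loan morphology.

```python
import unicodedata


def dutchify_participle(french_form: str) -> str:
    """Hypothetical rule: French form in -é -> Dutch ge- + stem + -eer- + -d."""
    form = unicodedata.normalize("NFC", french_form)  # treat é as one code point
    if not form.endswith("é"):
        raise ValueError(f"this sketch only handles French forms in -é: {french_form!r}")
    stem = form[:-1]                   # basané -> basan
    return "ge" + stem + "eer" + "d"   # Dutch ge-...-d around stem + borrowed -eer-


print(dutchify_participle("basané"))  # gebasaneerd
```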
5. A code-switching theory
There has been much speculation about the possible principle of lexeme order in the store, some ordering being a necessary condition of efficient re-coding. Much discussion, too, has been devoted to the so-called ZIPF-law 3). The most convincing explanation was that suggested by HERDAN 4), namely that an ordering of items by decreasing frequency would diminish the number of operations necessary to identify a given item: "Let us ... assume that the arrangement of the entries is systematic according to frequency of occurrence in descending order of frequency, so that the most frequent word has rank 1, the second most frequent word rank 2, and so on. If in such a dictionary, that is one in which words are arranged in order of decreasing frequency and increasing order of rank, the look-up procedure is one of successive comparison, the word of rank r will require r look-up operations, and since this word occurs, the Zipf-law assumed, C/r times, the total number of look-up operations required to locate a word is C (the constant in the Zipf-law, formulated as $r \cdot f_r = C$). Thus for n words contained in the dictionary, nC look-up operations will be required. On the other hand, we know that for the Zipf-law the total number of occurrences (the text length in terms of word number), and thus the total number of words to be searched, is given by

$$\int_1^n \frac{C}{r}\,dr = C \log_e n = N$$

It follows that the average number of look-up operations per word is

$$A_n = \frac{nC}{C \log_e n} = \frac{n}{\log_e n}$$

(...) This compares favourably with the n/2 look-up operations which would be needed under the scheme described above, which makes no use of the frequency element."
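HERDAN's argument can be checked numerically. The sketch below assumes an exact Zipf law $f_r = C/r$ over n ranked entries and successive comparison. The discrete sums give $n/H_n$, where $H_n$ is the n-th harmonic number, and not the continuous approximation $n/\log_e n$. Since $\log_e n < H_n \le \log_e n + 1$, the discrete value lies slightly below HERDAN's figure. For the unordered list we use the exact mean $(n+1)/2$ in place of HERDAN's rounded n/2. The function names are our own.

```python
import math


def lookups_frequency_ordered(n: int, C: float = 1.0) -> float:
    """Average look-ups per word occurrence for n entries ranked by decreasing
    frequency, assuming the Zipf law f_r = C / r and successive comparison."""
    total_operations = sum(r * (C / r) for r in range(1, n + 1))  # = n*C
    total_occurrences = sum(C / r for r in range(1, n + 1))       # = C*H_n
    return total_operations / total_occurrences


def lookups_unordered(n: int) -> float:
    """Average look-ups for successive comparison in an unordered list."""
    return (n + 1) / 2


for n in (100, 1_000, 10_000):
    exact = lookups_frequency_ordered(n)
    approx = n / math.log(n)  # HERDAN's continuous approximation n / log_e n
    print(f"n={n}: ordered {exact:.1f}, n/ln n {approx:.1f}, "
          f"unordered {lookups_unordered(n):.1f}")
```

For n = 1,000, frequency ordering needs on the order of 130 to 145 comparisons per occurrence, against about 500 for an unordered list.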
Within the framework of our model, this would mean that the winding and unwinding of the tape takes considerably less time than in the case of a wholly random distribution. The question remains of what principle underlies the differentiation of item possibility. Here too, the concept of "pigeon-holing" of semantic …
In other words, the "conceptual symbols" do not represent separate pieces of the univers de discours taken at random, but are probably ordered by some classificational system resembling the biological classification.
6. Word content and entropy
To test this hypothesis, we divided the FWD material into three frequency classes (group I: absolute frequency 1; group II: frequency 2 and 3; group III: frequency above 3). We then examined the samples of these groups according to their distribution within the classificational system applied by L. BROUWERS in his Dutch thesaurus HET JUISTE WOORD 5). The supposition was twofold. First, if there is ordering of some kind, the distribution of items among the "content classes" of the thesaurus (expressed as entropy and redundancy) should differ between the frequency groups. Second, if the "pigeon-holing" suggested by HERDAN takes place, the redundancy should increase for groups of items with higher frequencies. Such an increase was in fact observed, as the following table shows:
      FREQUENCY 1   FREQUENCY 2-3   FREQUENCY > 3
H     5.099         4.892           4.854
R     0.15          0.18            0.19
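The entropy H and redundancy R in the table can be computed from the distribution of a frequency group's items over the thesaurus content classes. This is a minimal sketch that assumes Shannon entropy in bits and the usual relative redundancy $R = 1 - H/H_{max}$. The text states neither the log base nor $H_{max}$. One observation is our own inference and is not stated in the text: the tabulated R values are consistent with $H_{max} \approx 6$ in the table's units, since $1 - 5.099/6 \approx 0.15$. The class labels below are toy data.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (bits) of the distribution of items over content classes."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(h: float, h_max: float) -> float:
    """Relative redundancy R = 1 - H / H_max."""
    return 1 - h / h_max


# Toy example: items of one frequency group mapped to hypothetical class labels
print(entropy(["A", "A", "B", "B"]))  # 1.0
print([round(redundancy(h, 6.0), 2) for h in (5.099, 4.892, 4.854)])  # [0.15, 0.18, 0.19]
```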
… rule out other devices allowing quick interconnections between words belonging to the same content-group but differing in frequency (cf. the so-called association of related concepts suggested by P.A. KOLERS 6)). However, the basic principle of order seems to be of a statistical kind. This is indicated by the very close fit of the rank-frequency distribution with the theoretical distribution according to the ZIPF-MANDELBROT formulation (cf. fig. 4): the correlation coefficient between the observed and the theoretical distribution is 0.993.
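A correlation between observed and theoretical rank-frequency values of the kind quoted above can be computed as sketched below. The sketch uses the Zipf-Mandelbrot form $f(r) = P\,(r + \rho)^{-B}$ with B = 1.03347 from conclusion c). P, ρ and the observed frequencies are hypothetical. Correlating on a log scale is also an assumption, since the text does not say how the coefficient was obtained.

```python
import math


def zipf_mandelbrot(ranks, P, rho, B):
    """Theoretical rank-frequency values f(r) = P * (r + rho) ** (-B)."""
    return [P * (r + rho) ** (-B) for r in ranks]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed frequencies by rank (not the paper's data)
observed = [1200, 610, 400, 310, 240, 205, 170, 150, 135, 120]
ranks = range(1, len(observed) + 1)
# B taken from conclusion c); P and rho are illustrative values
theoretical = zipf_mandelbrot(ranks, P=1250.0, rho=0.05, B=1.03347)

# Correlate on a log scale (an assumption about how the fit was assessed)
r = pearson([math.log(v) for v in observed], [math.log(v) for v in theoretical])
print(round(r, 3))
```

In practice P and ρ would first be estimated from the data, for example by least squares on the log-log plot, before the correlation is computed.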
7. Consequences
The assumed model has the following consequences, which have been tested empirically:
1. The assumed model, and especially the process of blank-filling of the Dutch track with French morphemes, implies that in general the FWD-writers will use a greater number of foreign words than the SWD allows. This is already apparent from the overall percentage of foreign elements in FWD (cf. fig. 5). In particular, the foreign words should appear more frequently as the text grows longer. The investigation of vocabulary growth rate has in fact shown that this is the case: the ratio of new foreign words to the total vocabulary remains stable (ca. 1/10) until a vocabulary of 3,000 items is reached, and thereafter increases considerably. The sample described as Part 2 (fig. 5), containing ca. 50,000 words, has not been pre-edited. That is, orthographic mistakes and omissions have not been eliminated, as was done manually in Part 1, so all orthographic idiosyncrasies have been counted as new types by the computer. We assume that the difference in the size of the so-called basic vocabulary (3,000 vs. 3,500 items) is mainly due to this fact.
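The vocabulary-growth measurement described in point 1 can be sketched as a single pass over the token stream. This is a minimal version under stated assumptions. The text does not make clear whether "the ratio of new foreign words to the total vocabulary" is cumulative or per interval, so both are recorded. The `is_foreign` predicate stands for the etymological classification, and the optional `normalize` hook mimics the manual pre-editing of Part 1 by mapping spelling variants to one form before counting. All names are hypothetical.

```python
def vocabulary_growth(tokens, is_foreign, every=1000, normalize=lambda t: t):
    """Track foreign types as the vocabulary grows.

    Each time the vocabulary size V reaches a multiple of `every`, record
    (V, cumulative foreign types / V, foreign share among the last `every` new types).
    """
    seen = set()
    foreign_total = 0
    foreign_in_interval = 0
    curve = []
    for token in tokens:
        word = normalize(token)
        if word in seen:
            continue                  # only new types change the curve
        seen.add(word)
        if is_foreign(word):
            foreign_total += 1
            foreign_in_interval += 1
        if len(seen) % every == 0:
            curve.append((len(seen), foreign_total / len(seen),
                          foreign_in_interval / every))
            foreign_in_interval = 0
    return curve


toy = ["de", "huis", "de", "fenomeen", "en", "analyse"]
print(vocabulary_growth(toy, lambda w: w in {"fenomeen", "analyse"}, every=2))
# [(2, 0.0, 0.0), (4, 0.25, 0.5)]
```

Without a `normalize` step, every misspelling counts as a new type. This is the effect the authors suspect inflates the basic vocabulary of Part 2.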
2. As the choice of lexemes from the store takes place in terms of "conceptual symbols", the lexical diversity should not be substantially diminished on account of the limited vocabulary. Blank-filling with French lexemes should allow the francophones to keep the overall diversity at a normal level, i.e. at that of the SWD-writers. In other words, we suppose that the greater number of foreign elements in FWD-texts is the consequence of the endeavour to "keep pace" with the normal rate of language diversity.
CONCLUSIONS
a) The francophone bilinguals use more than twice as many foreign words as the monolingual native speakers of Dutch.
b) This fact is connected with the tendency to keep the overall variety of vocabulary at a certain "normal" level of speech production. This variety is slightly smaller than in the case of native speakers (cf. the logarithmic type-token ratio: 0.837 for FWD, 0.880 for SWD).
c) It can nevertheless be described as "normal", since the value of parameter B in MANDELBROT's formulation of the ZIPF-law is 1.03347.
d) The foreign lexemes are not equidistributed in the assumed word store: their number increases with growing text length, and this increase is quite evident above the first 3,000 words. This allows one to think of these first 3,000 words as a "basic vocabulary" covering various subjects (two different multi-subject samples gave nearly identical values of the basic vocabulary).
e) The existence of the basic vocabulary and the good fit of the empirical data with the theoretical distribution known as the ZIPF-law strengthen the assumption that the word-units in the store are ordered.
f) One of the ordering principles is the pigeon-holing of information according to some classificational system which takes into account the informational content of words.
REFERENCES
1. The terms "bilingual" and "bilingualism" are understood here in the meaning used by E. HAUGEN, Bilingualism in the Americas, Alabama 1958, p. 9: "Bilinguals (...) is a cover term for people with a number of different language skills, having in common only that they are not monolinguals". Cf. also the same author, The Norwegian Language in America, Philadelphia 1953, p. 7: "Bilingualism is understood here to begin at the point where the speaker of one language can produce complete meaningful utterances in the other language".
2. VAN DALE, Groot Woordenboek der Nederlandse Taal, ed. Dr. C. Kruyskamp, M. Nijhoff, Den Haag 1961 (8th ed.).
3. Cf. B. Mandelbrot, Structure formelle des textes et communication, Word 10 (1954), pp. 1-42, and G. Herdan, The Calculus of Linguistic Observations, Mouton & Co, The Hague 1962, pp. 59-84.
4. G. Herdan, Type-Token Mathematics, Mouton, The Hague 1960, p. 205.
5. L. Brouwers S.J., Het juiste woord. Betekeniswoordenboek der Nederlandse taal, Brepols, Brussel-Turnhout 1965.
6. P.A. Kolers, Bilingualism and Information Processing, Scientific American, vol. 218, no. 3, 1968.
Institute of Applied Linguistics, University of Louvain
Vesaliusstraat 2, Louvain (Belgium)