Jörgen Pind
Computers, Typesetting, and
Lexicography
Abstract
As part of the general strategy of computerizing the lexicographic work process at the Institute of Lexicography, we have adopted Donald E. Knuth's typesetting program TeX as our typesetting engine. The main characteristics of the program will be briefly described, followed by a discussion of its advantages for lexicographic work.
TeX has already been used for the typesetting of a 1300 page etymological dictionary of Icelandic. A number of other projects are under way. Special notice will be paid to the problem of coding as it relates to the making of dictionaries. The advantages of a generic, or logical, coding over typographic coding will be emphasized. However, doubts will be raised about the possibility of providing a set of tags which are completely neutral with respect to typographic considerations.
1 Introduction
In this paper I want to discuss one particular aspect of computational lexicography, namely the typesetting of dictionaries. This is perhaps not an issue which is central to computational lexicography, yet it is a subject which deserves study, especially now that the arts of typesetting have been moving onto the desktop. I will show you the approach we have adopted at the Institute of Lexicography, and remark on how it fits into our overall strategy for computational lexicography.
Let me begin, in all modesty, by quoting myself. In 1986 I was invited to give a talk at the NordData Conference in Stockholm. At that time we were just embarking on widespread use of computers at the Institute, and I attempted to draw up a schematic diagram of a ‘Lexicographer's workbench’ (see figure 1), commenting that a number of features had not been implemented. “This holds especially for the ‘manuscript writer’. Our work has not yet reached the stage where this is in great demand, but we envisage the possibility of using the database to turn out manuscripts for a typesetting program like TeX” (Pind 1986:87).
Well, this was written before we even had a version of TeX running at the Institute! As a matter of fact, though we expected that typesetting would be something that we would deal with much later, a lot of work over the past couple of years has been devoted to the typesetting side of lexicography.
Figure 1: The lexicographer's workbench, 1986 vintage.
There are two major reasons for this. The first is that the editor wants to be able to print proofs which are as closely related to the final form of the dictionary as possible. Thus a ‘manuscript writer’ has in fact been implemented as a feature of the ‘workbench’ we are currently working on. The relationship of this ‘manuscript writer’ to the work on the verbal dictionary has already been touched on in the paper by Björn Þór Svavarsson and Jörgen Pind in this volume.
The second reason is the fact that we have been engaged in producing Icelandic dictionaries from manuscripts, rather than from a database. Foremost among these is an Icelandic etymological dictionary by the late Ásgeir Blöndal Magnússon, former editor at the Institute, which will appear later this year. We have also embarked upon a series of reprints of older Icelandic lexicographic works. These works have been coded in the TeX typesetting language.
2 Named Categories and Visual Formatting
In recent years a revolution has been taking place in the typesetting industry where numerous ‘desktop publishing’ programs have gradually been replacing the traditional tools of the printer. How far has this revolution affected the dictionary publisher and maker? I want to argue that such systems are not suited for the making of dictionaries.
Computational Linguistics — Reykjavik 1989
Traditionally, dictionaries have been produced from collections of slips which have been used to ease the task of keeping the dictionary entries in alphabetical order and to allow them to expand as needed, without unduly affecting entries which follow alphabetically. The slips have then often been used, with minimal markup, as the manuscript for the printer. In the past few years attempts have, however, been made to use database systems to ease the arduous task of handling the collections of slips, with some success (cf. the paper by Björn Þór Svavarsson and Jörgen Pind in this volume). If we consider for a moment the nature of the database system, it is obvious that one of its major strengths is the fact that it allows the user to assign names or tags to the individual fields in the database. Thus we can easily imagine a database system for lexicographic work which knows about categories such as headword, pronunciation, grammatical code, semantic field, usage notes, and so on.
One of the typographical requirements for a dictionary is that some of these categories should be reflected in the typesetting itself. This shows, for example, in the use of different fonts in dictionaries, typically used to distinguish some of the categories. Note that only some of the categories will be thus reflected, since a typical dictionary contains many more categories than would be distinguished by typographic means. Some distinctions which are kept in the database system will thus be lost in the printed dictionary.
Ideally, the lexicographer would like to use the database to automatically generate ‘scripts’ for typesetting, simply by instructing the database to print relevant typographic codes around some of the fields and not others. An even better approach would be to tag all the categories in the typesetting script and then instruct the typesetting system as to which ones should affect the typesetting process and which ones should not be reflected typographically. This latter approach is easy enough to accomplish if the typesetting system allows ‘generic’ or abstract coding of the input.
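To make the idea of generic coding concrete, here is a minimal sketch in plain TeX. The tag names (\hword, \pron, \gram, \semfield, \usage), the sample entry and the chosen typography are all hypothetical and serve only as illustration; they are not the coding actually used at the Institute. Every category is tagged in the script, but the macro definitions decide which categories show up in print.

```tex
% Generic (logical) tags for a dictionary entry.
% All macro names and the sample entry are hypothetical.
\def\hword#1{{\bf #1}}            % headword: set in boldface
\def\pron#1{[#1]}                 % pronunciation: set within brackets
\def\gram#1{{\it #1\/}}           % grammatical code: set in italics
\def\semfield#1{\ignorespaces}    % semantic field: tagged, but not printed
\def\usage#1{({\it #1\/})}        % usage note: italics within parentheses

% A script as a database might generate it:
\hword{horse} \pron{...} \gram{n.} \semfield{animals} \usage{common}
a large four-legged animal.
\bye
```

Changing a single definition, say making \semfield print its argument in small type, would change every entry in the dictionary without touching the script itself.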
The desktop publishing systems mentioned at the beginning of this section do not allow such abstract coding (indeed very few of them are able to deal with traditional typesetting codes), since they are almost universally based on the idea of ‘direct manipulation’ or ‘visual formatting’. The user manipulates a pointing device, such as a mouse, to mark parts of the text for, say, a font change. The notion of abstract coding plays no part at all in the formatting, and thus it is impossible in such a system to form a link between the categories of the database system and the typesetting. However, this is, of course, of the utmost importance for the lexicographer. A priori, I would have thought that this limitation of the desktop publishing systems would rule them out as being suitable for lexicographic work, and I was thus rather surprised when I came across the following description of the approach taken at the Dictionary of Old English in Toronto.
[...]tions and cross-references which are distinguished by type — emphasizes the importance of interactive formatting. Because the working copy of the entry on the screen depicts the final appearance of a page, we hope to improve consistency ...
... For example to put a keyword in bold in a citation, an editor can activate the area to be formatted by ranging over it with the mouse, and then use the mouse to select and apply the property bold from the Character Looks Menu (Healy 1985:248).
The system being described is a Xerox workstation, running publishing software similar to programs running on the Macintosh computer.
As mentioned earlier, this approach is severely handicapped by the fact that there is no easy way in a visual formatting system to form links to the categories of the database system being used. I would therefore like to argue that the requirements which need to be made of a typesetting system for lexicographic work are twofold.
• The typography should be of the highest order.
• The system must be able to work with generic or logical markup.
These requirements are met by a number of systems. We have chosen to work with TeX. In the following pages I will describe the way we have used TeX. While some of you are undoubtedly familiar with TeX, I will presume that not everyone is, and ask those knowledgeable to bear with me while I give a short tutorial introduction to TeX.
3 What is TeX? A Tutorial Introduction
TeX is a typesetting system ‘intended for the creation of beautiful books’, to quote TeX's author, Professor Donald E. Knuth of Stanford University. Those who have read his TeXbook will also know that the previous quote continues with ‘and especially books that contain a lot of mathematics’. TeX is indeed the premier system for typesetting mathematics available in the world today, so it is perhaps somewhat surprising to find it used for the making of dictionaries, indeed dictionaries which contain no mathematics at all!
I will attempt to describe why we have found TeX to be eminently suitable for the typesetting of our dictionaries.
3.1 The Beginnings of TeX
It is perhaps rather surprising that we should be able to use TeX at all, considering that it was created for one express purpose, viz. to allow Don Knuth to typeset his own magisterial treatise on the Art of Computer Programming in what he felt would be an acceptable manner. These books started out being typeset in lead in [...]
... when I received galley proofs they looked awful—because printing technology had changed drastically since the first edition had been published. The books were now done with phototypesetting instead of hot lead Monotype machines; and (alas!) they were being done with the help of computers instead of by hand (Knuth 1986f:96).
This was in 1977. It led Knuth to temporarily abandon the project of writing the Art of Computer Programming while he made up his own system for the typesetting, a task which he estimated would take about one year. In fact it took nine years of concentrated work to finish TeX and its companion program METAFONT, which is a system for the generation of letterforms.
The source code for the system has graciously been put in the public domain by Knuth. The programs are written in WEB, which is a special system for ‘literate programming’ (Knuth 1984b). A WEB program is processed by two programs. TANGLE makes a Pascal program from the WEB source which can then be compiled by a Pascal compiler, while WEAVE makes a TeX script from the same source, containing the source code with comments and detailed indices. Running this script through TeX produces a typeset version of the program. Knuth has thoroughly documented the TeX and METAFONT programs in his five volume work Computers and Typesetting (Knuth 1986a-e).
3.2 The Nature of TeX
TeX can be described as a document compiler or a typesetting language. Both terms require some clarification.
In the history of computer science, many computer languages have evolved. Some of these have been general purpose languages like Pascal or C, others have been specifically crafted for some particular task. TeX is an example of a special purpose language, and so is METAFONT. TeX as a language has primitive constructs which relate to the traditional art of printing.
The objects which TeX handles are ‘boxes’ and ‘glue’, to use Knuth's terminology (see figure 2). The smallest boxes which TeX manipulates are those surrounding the individual letters. Larger boxes can be built out of the undecomposable boxes surrounding the letters. Thus a line of type is also considered a box from TeX's point of view. Glue is the stuff which gets put between words and other boxes (though not between the boxes making up individual words). Leading, the distance between consecutive lines of type, is implemented in TeX through interline glue. This ‘boxes and glue’ model turns out to be surprisingly powerful and enables TeX to perform extraordinary feats of typesetting, for example in the typesetting of mathematics.
Some of TeX's algorithms are quite well known. This is especially true for the paragraph setting algorithm (Plass and Knuth 1982), as well as the hyphenation algorithm devised by Frank Liang (Liang 1983).
Figure 2: TeX's boxes-and-glue model (labels in the figure: topline glue, interword glue, interline glue, line-final glue).
[...] by noting the extent to which the inter-word glue has to stretch or shrink. TeX sets the paragraph by minimizing these demerits. The interesting thing to note is that this means that the paragraph as a whole is typeset in one go and a word coming late in a paragraph can influence the setting of lines coming earlier in the paragraph.
Liang's algorithm for word hyphenation is pattern-based, but departs from older versions by using both variable length patterns and patterns which both allow and inhibit hyphenation points. I will not discuss this any further here, but simply note that his method gives excellent results in a number of languages besides English. In particular the Icelandic hyphenation table does a very creditable job of hyphenating.
TeX has numerous primitives (around 300) for dealing with typesetting and also a very powerful macro programming language. It is this latter which gives TeX its status as a programming language.
Here are a few examples of the primitive operations which TeX operates with. Note that primitives and TeX macros are expressed with ‘control sequences’. These usually start with a special ‘escape character’ which is typically \, the backslash.
• \kern. This command is followed by a dimension specification (e.g. in printers' points) and moves two boxes relative to each other. Note that boxes come in horizontal and vertical versions and \kern can be used to position boxes both vertically and horizontally, depending [...] (e.g., letter pairs like ‘V’ and ‘A’ which, because of kerning, are printed as ‘VA’ rather than ‘V A’).
• \looseness. Changes to looseness mean that TeX will attempt to set a particular paragraph in more or fewer lines than the optimal setting calls for. By setting \looseness=1 an attempt is made to open the paragraph and set it one line longer than would be the case if no \looseness is specified.
• \fontdimen. This command enables one to query the ‘current font’ for font parameters like the x-height, normal spacing, etc.
• \penalty. Controls the desirability of breaking at a particular point. Penalties can be both positive (making a break less likely) and negative (indicating desirable break points). Infinite penalties (having an absolute value of 10000 or more) either force (\penalty-10000) or prohibit (\penalty10000) a break at a particular point.
• \sfcode. The ‘space factor code’ is used to control the stretching of spaces after individual characters. Using the \sfcode makes it possible to, say, stretch spaces after periods more than after ordinary characters.
• \hyphenchar. Very few things are hard-wired into TeX. Even the hyphen#ation character can be changed. By setting \hyphenchar\tenrm=`\#, TeX will use the hash-mark as the hyphenation character for the 10pt Roman font (witness the first line of this paragraph).
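The primitives listed above can be tried out in a few lines of plain TeX. The following is only a sketch: the box numbers, the size of the kern and the sample text are arbitrary choices made for illustration.

```tex
% Plain TeX; all values are illustrative only.
\setbox0=\hbox{VA}            % natural setting (the font's own kerning applies)
\setbox2=\hbox{V\kern-2pt A}  % explicit kern: pull the A 2pt closer to the V
\message{Widths: \the\wd0\space and \the\wd2}

\sfcode`\.=3000               % stretch spaces after periods more than usual
\hyphenchar\tenrm=`\#         % hyphenate with # in the 10pt roman font

A short paragraph which should be set one line longer than the optimal
setting, if \TeX\ can manage this within the current tolerance.\looseness=1 \par

\penalty-10000                % force a page break at this point
\bye
```

The \message line writes the two box widths to the terminal and the log file, which is a convenient way of inspecting the effect of a kern without printing anything.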
3.3 TeX as a Programming Language
TeX has a very powerful macro language which can be used to write macros at almost any level of abstraction. The execution of these macros takes place through a process of macro expansion, where the macros are gradually reduced to primitives of the language. Since macros can call other macros, it is possible to structure the code in a systematic way by gradually moving from primitive constructs to more abstract ones.
TeX observes a block structure, like most other programming languages. The block structure is achieved by using the symbols for ‘open’ and ‘close group’ which are usually the curly braces { and }. Using grouping, it is a simple matter to structure code such that the likelihood of naming conflicts is lessened.
The macro language, like most other macro languages, uses registers for the different ‘data types’ which are available. These registers come in five varieties:
Count registers are used for keeping integer values (32 bit). TeX has primitive operations for integer arithmetic only, but this is usually not a problem. The following piece of code declares a count register named \figno which is initialized to 0:
\newcount\figno
The code for the figure macro would then take care of placing the figure and assigning a number to it which would be incremented for each figure. This last operation is achieved by:
\advance\figno by 1
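As a rough sketch of how such a counter might be used, the following plain TeX fragment numbers figure captions automatically. The macro name \figcaption and the layout of the caption are assumptions made for this example; the figure macro referred to in the text is not shown in this paper.

```tex
\newcount\figno                     % allocated by plain TeX; starts at 0
\def\figcaption#1{%                 % hypothetical macro name
  \global\advance\figno by 1
  \medskip
  \centerline{Figure \number\figno: #1}%
  \medskip}

\figcaption{The lexicographer's workbench}
\figcaption{The boxes-and-glue model}
\bye
```

The \global prefix is a design choice: it keeps the numbering intact even when a caption is placed inside a group, where a local \advance would be undone at the closing brace.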
Dimension registers are used for printers' dimensions, points, picas, millimeters, etc. The following piece of code declares a dimension register and then initializes it.
\newdimen\pagewidth
\pagewidth=170mm
Next come the glue registers or ‘skip’ registers. These contain glue specifications. The following example illustrates the definition of the \smallskip macro which makes use of the \smallskipamount glue register:
\newskip\smallskipamount
\smallskipamount=3pt plus 1pt minus 1pt
\def\smallskip{\vskip\smallskipamount}
The \smallskipamount register is set to 3pt plus 1pt minus 1pt. The macro \smallskip is defined as a vertical skip (\vskip) of \smallskipamount.
Finally, we come to the box registers which are used for holding the boxes gradually accumulated for each page. Boxes have three dimensions, as mentioned before. These can be queried or set, using the primitives \wd, \ht, and \dp for the width, height, and depth, respectively.
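A small plain TeX sketch may show how a box register is filled, queried and changed. The register name \hwbox and the dimensions are hypothetical and chosen only for this illustration.

```tex
\newbox\hwbox                        % hypothetical register name
\setbox\hwbox=\hbox{\bf headword}    % typeset into the box without printing it
\message{width=\the\wd\hwbox, height=\the\ht\hwbox, depth=\the\dp\hwbox}
\wd\hwbox=3cm                        % the dimensions can also be set
\noindent\box\hwbox gloss text       % the box now takes up 3cm on the line
\bye
```

Setting the width after the box has been built does not re-typeset its contents; it only changes how much room TeX reserves for the box, so the gloss here starts 3cm from the left margin.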
3.4 Defining Macros
We have already seen one example of how macros are defined. This is done with the \def primitive. Macros can take arguments; it is even possible to have macros which check for optional arguments, a highly useful feature. A typical macro with arguments is the following simple macro for setting headwords in boldface. (The percent sign % is usually a comment character in TeX. Anything coming after the % on a line is ignored by TeX.)
\def\hword#1{% macro for the headword
  {\bf#1\mark{#1}}}
This sets the headword in boldface (\bf) and defines a ‘mark’. This mark can, for instance, be used to establish the range of entries on a particular page of a dictionary. The parameters are denoted by # and they are numbered consecutively, starting with #1.
[...] is then a straightforward matter to check whether the list of headwords thus generated is in correct alphabetical order. This can be accomplished in the following manner:
\newwrite\hwordfile    % first a file is defined
\newif\ifproofmode     % a conditional is declared
\proofmodetrue         % are we printing proofs? Yes we are.
\ifproofmode \message{**** Printing proofs *****}
  \immediate\openout\hwordfile=\jobname.hwrd
  \def\hword#1{% macro for the headword
    {\bf#1\mark{#1}}%
    \immediate\write\hwordfile{#1}}% also write the headword to the file
\else \message{**** Final run *****}
  \def\hword#1{% macro for the headword
    {\bf#1\mark{#1}}}
\fi
The conditional construction
\if
\else
\fi
thus makes it easy to print different versions of the same manuscript according to need.
This has only been the briefest of introductions to TeX as a programming language, but it should, I hope, reveal to the reader something of the flavour of the TeX language.
3.5 TeX in Iceland
The Institute has been responsible for introducing TeX into Iceland. I have earlier described the steps undertaken to make TeX work with Icelandic (Jörgen Pind 1988). In particular:
• It was necessary to make a set of patterns for TeX to achieve correct (or nearly correct) hyphenation. The patterns were generated by Frank Liang's program PATGEN, using as input a 210,000 word dictionary made by the Institute for IBM in Iceland to use in IBM spelling checkers.
• The Computer Modern Fonts had to be adapted to Icelandic by adding a few characters (e.g., ‘ð’ (eth) and ‘þ’ (thorn)).
• Changes had to be made to the standard macro collections to allow for new fonts and some differences in character definitions.
With these changes, TeX has been found to work admirably for Icelandic and has already been used to typeset a number of books. I guess Iceland must be unique in having brought out a number of TeXed books and yet no mathematics book has been typeset with the Icelandic version of TeX as yet!
4 Typography and Dictionaries
4.1 Some General Observations
The typesetting of dictionaries usually presents few problems. Dictionaries are usually set in two or three columns which are rather narrow. This can often lead to difficulties with line-breaking, since the narrow columns leave relatively little latitude for the paragraph-breaking algorithm. For this reason, it is advantageous to choose a font with a narrow set width, and, secondly, it is necessary to allow the typesetting program more flexibility in stretching and compressing interword spaces than is normal in books which are set to the full width of the page. In TeX this flexibility is controlled with the primitive \tolerance.
When the columns are set in register, as is usually the case, widow lines are bound to occur because the leading (interline glue in TeX) is not allowed to vary. These can be got rid of by stretching or shrinking the paragraph (or paragraphs on the previous page or pages). In TeX this is controlled by the \looseness primitive. If one is prepared to accept full widow lines (as we occasionally did in the etymological dictionary), it is possible to achieve this in TeX by setting the glue register \parfillskip equal to 0pt, thus drawing the last line of a paragraph out to the full width of the column.
If the columns are not set in register (as is, for example, the case in the Oxford English Dictionary where the quotations are set in smaller type, thus forcing variable leading), it is much easier to control for widow lines since the space between paragraphs can easily be varied (this is done in TeX with the \parskip primitive).
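The parameters discussed in this section might be combined as in the following plain TeX sketch. All values and the helper names \tighten and \loosen are assumptions made for illustration; they are not the settings used for the etymological dictionary.

```tex
% Settings for a narrow dictionary column; all values are illustrative.
\hsize=5.5cm               % a narrow column
\tolerance=2000            % accept looser lines than plain TeX's default of 200
\parfillskip=0pt           % draw the last line of a paragraph out to full width
\parskip=0pt plus 2pt      % only for columns which are not set in register

\def\tighten{\looseness=-1 }   % hypothetical helper: try to save one line
\def\loosen{\looseness=1 }     % hypothetical helper: try to gain one line

\loosen An entry whose paragraph should, if possible, run one line longer
than the optimal setting, so that a widow line is avoided.\par
\bye
```

Since TeX resets \looseness at the end of every paragraph, the helpers affect only the paragraph they precede, which makes them suitable for page-by-page adjustments in proof.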
It is customary in dictionaries to print words at the top of the page, showing the range of the entries on that page. This process can very easily be automated in TeX, using the \mark. By \marking all headword entries and defining [...]
5 Work Finished and in Progress
The major performance test of TeX for lexicographic work was the typesetting of the etymological dictionary by Ásgeir Blöndal Magnússon. This book runs to 1231 two-column pages with forty pages of introductory material. TeX took care of the typesetting of all the pages except for two pages which contain illustrations demonstrating the use of the dictionary. These two pages were designed with a drawing program.
Originally, it was never intended that the etymological dictionary would be typeset with TeX. When keyboarding of the manuscript began in 1985, we did not have TeX, and the coding of the manuscript was such that it would be easy to transfer it to a printer for typesetting with traditional printers' typesetting codes. However, in January 1989, when we were ready to turn the manuscript over to the printer, it turned out that they did not have all the characters needed for the typesetting, and would also have difficulties with all the diverse floating accents which the book contains. At that point I decided to make some trial runs with TeX using PostScript fonts (Adobe Times Roman). It turned out that no problems were encountered which could not rather easily be solved. Even the fact that PostScript has a fairly limited character repertoire could be remedied by drawing the missing characters with Fontographer, a font generating program running on the Macintosh (Altsys Corporation 1989).
Figure 3 shows a sample page from the dictionary.
Our major project in the future will, of course, be the dictionary of verbs outlined in the paper by Jón Hilmar Jónsson in this volume. The editing will take place in a database system, and the output of that system, a TeX script, will be generically coded.
Additionally, we have just embarked on a project to reprint some older Icelandic lexicographic works. Work is now in progress on four older dictionaries. These are all coded in the TeX language, and the intention is to bring these out in new editions. These are dealt with as textual objects, though the generic coding would, of course, considerably ease the task of putting them online, if that should be decided at a later stage (cf. Alshawi et al. 1989).
6
Issues of Coding
In recent years, more and more attempts have been made to use database systems for the creation of dictionaries. When a database is used for a dictionary, it becomes possible to name the fields which are being entered. The database
Figure 3: A sample page from the etymological dictionary
The traditional way of making a dictionary has been to proceed in a somewhat different manner, writing the dictionary entries on slips of paper.
While the comparison between slips of paper, a file cabinet, and a database system is often made, this comparison is somewhat misleading since categories on the written sheets or slips are usually not named. In the case of dictionaries this is most clearly the case. An example will show this. Figure 4 shows a slip from the collection which was used in the making of the first standard dictionary of modern Icelandic, Sigfús Blöndal's Icelandic-Danish dictionary (Blöndal 1923). It is quite obvious that no categories as such are marked on the slip. They can, however, be inferred from the slip by the use of markings which indicate different fonts. The slip implicitly marks categories by the use of underlining and other typographical marks.
Figure 4: A sample entry from the dictionary by Sigfús Blöndal and one of the dictionary slips on which it is based
This approach is quite natural, considering that dictionary editors, such as Sigfús Blöndal, were working with the sole aim of producing a printed dictionary. They thought of their work as that of producing a text, and their approach was quite plainly a 'typographical' one where the only things they needed to keep distinct in the manuscripts were changes which would show up on the printed page, like font changes.
This approach has no doubt been almost universally followed, at least until quite recently. Some published dictionaries have been made available to researchers. These are generally typographically coded and bringing them online has often proved to be a formidable task (Alshawi et al. 1989).
This discrepancy between the database representation of a dictionary and the printed, typographical, representation is quite unfortunate and various steps have been taken to close the gap. This is currently not too difficult a task and I want to discuss here briefly how one could achieve this aim with TeX.
A programming language such as TeX makes it possible to code the manuscript at any level of abstraction which one finds most convenient. The primitives which TeX deals with are for the most part typographical ones, as already discussed. However, it is by no means necessary to use these primitives directly. Let me illustrate this by taking the entry from the Icelandic-Danish dictionary shown in figure 4 as an example. This can be typographically coded as follows in TeX (the phonetic transcription has been left out):
\bold{letur (-urs,} pl. ds.) [...] n. 1. a. Skrift, Typer:
\ital{gotneskt, latneskt l.; færa e-ð í letur}, optegne n-t,
føre i Pennen; \ital{sett l.}, en slags Halvfraktur,
nærmende sig til Schwabachertypen. --- \bold{*b.}
\ital{leturs land}, Papir (BóluHj. 255);
\ital{letra rolla} (egl. Typefaar) (BóluHj. 217)
= \ital{prentsmiðja}.
This example should be mostly self-explanatory. The instructions \bold and \ital change respectively to the bold and italic fonts. This representation is fairly close to the one given on the slips themselves, as depicted in figure 4. Note incidentally the somewhat strange use of fonts in the first line where parentheses do not balance correctly with respect to fonts. This use is probably quite natural for the printer (who has, after all, been taught that a delimiter character, for example, should belong to the same font as the preceding text). To someone accustomed to the notions of 'blocking' and 'environments' from computer science this manner of font change does seem illogical.
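The instructions \bold and \ital are not part of plain TeX. A minimal sketch of how they might be defined is given here; these definitions are an assumption made for illustration and are not taken from the macros actually used at the Institute.

```tex
% Hypothetical definitions of the two visual tags.
% Each macro sets its argument in the given font inside a group,
% so that the font change ends with the argument.
\def\bold#1{{\bf #1}}
\def\ital#1{{\it #1\/}}   % \/ adds an italic correction at the end

% The opening parenthesis is bold and the closing one is not,
% exactly as in the typographically coded entry.
\bold{letur (-urs,} pl. ds.) n. \ital{gotneskt, latneskt l.}
\bye
```

Since TeX only balances braces and attaches no meaning to parentheses, the unbalanced font change in the first line causes no technical difficulty.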
If we care to analyze the example from a functional perspective, we can easily see that it contains a number of different categories. There is the headword, which is printed in bold type, and so is the grammatical ending signifying the genitive. Here we have an example, ever so common in dictionaries, of one font
A different way of coding would be to code the categories directly without any reference whatsoever to their typographical implementation. This approach, which has quite a short history, has been variously named 'logical' or 'generic' coding, and can thus be distinguished from the visual coding shown above. Generic coding has recently received increased attention through the standardization of SGML (Standard Generalized Markup Language) (ISO 1986, Barron 1989, Bryan 1988). Similar concepts have been expressed in other languages and formatters, though SGML carries the idea to its logical conclusion: SGML is simply a manner of coding a manuscript, and has really nothing to do with typesetting or database manipulation. It does, however, embody a manner of representing the structures which are to be found in a particular document.
In particular, as regards TeX, Leslie Lamport's macro package LaTeX is very much geared towards logical coding (Lamport 1986; see also Lamport 1988). LaTeX is a macro package used for general document processing. It uses the concept of separate 'style files' to capture the different formatting needs of reports, articles, books, etc. Furthermore, it defines categories such as 'titles', 'sections', 'chapters', 'footnotes', and so forth to express the different logical categories of documents.
The TeX macro language is such that one can easily implement macros to any degree of abstraction required. Using such an approach, it would be easy enough to code the above example from Sigfús Blöndal's dictionary in the following manner (I have formatted it here for easier readability):
\hword{letur} (\decl{-urs}, \ii{pl. ds.}) \phon{[...]} \pos{n.}
\sense{1.}
\subsense{a.} \trans{Skrift, Typer}:
\exampl{\ic{gotneskt, latneskt l.; færa e-ð í letur}, \da{optegne n-t, føre i Pennen}};
\exampl{\ic{sett l.},
\da{en slags Halvfraktur, nærmende sig til Schwabachertypen}}.
---\subsense{*b.}
\exampl{\ic{leturs land}, \da{Papir} \source{BóluHj. 255}};
\exampl{\ic{letra rolla} \da{(egl. Typefaar)}
\source{BóluHj. 217}} = \xrf{prentsmiðja}.
\sense{2.}
This, I hasten to add, is just a demonstration of the manner by which it would be possible to proceed. In particular, in no way is this coding based upon a study of the entries in this dictionary, a study which it would be necessary to undertake if it were desired to code the dictionary in this manner.
The categories mentioned above should be easy enough to understand since they have been given names which are fairly self-explanatory (the categories \ic
necessary to give detailed explanations for each of them. It is, of course, immediately apparent that the manuscript gets considerably more complicated when such a system of coding is employed. After all, a lot of categories are delimited which will not find any particular realization in the printed text. By working from such a manuscript it is much easier to set up a one-to-one relationship with a database representation, which of course is considerably more difficult when dealing only with a visually coded manuscript.
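To make the idea concrete, here is a hedged sketch of a small 'style' for the generic tags, written in plain TeX. All the definitions are assumptions made for this illustration; in particular, reading \ic as Icelandic text and \da as Danish text is a guess, as is the choice of fonts.

```tex
% Hypothetical style definitions: each generic tag is given a
% typographical realization in one place only.
\def\hword#1{{\bf #1}}      % headword
\def\decl#1{{\bf #1}}       % declensional ending
\def\ii#1{#1}               % further inflectional information
\def\phon#1{#1}             % phonetic transcription
\def\pos#1{#1}              % part of speech
\def\sense#1{#1}            % sense number
\def\subsense#1{{\bf #1}}   % subsense letter
\def\trans#1{#1}            % translation
\def\exampl#1{#1}           % example: delimited, but no visible realization
\def\ic#1{{\it #1\/}}       % Icelandic text (assumed meaning of the tag)
\def\da#1{#1}               % Danish text (assumed meaning of the tag)
\def\source#1{(#1)}         % source reference, parentheses supplied here
\def\xrf#1{{\it #1\/}}      % cross-reference

\hword{letur} (\decl{-urs}, \ii{pl. ds.}) \pos{n.}
\sense{1.} \subsense{a.} \trans{Skrift, Typer}:
\exampl{\ic{sett l.}, \da{en slags Halvfraktur}}.
\bye
```

Several tags, such as \exampl and \trans, produce nothing visible of their own, which is the point made above: they delimit categories that matter for a database but not for the printed page. Changing the look of, say, all cross-references then means changing a single definition.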
The astute reader will probably object to the choice of terms for the entries labelled \sense and \subsense in the above extract, since these only refer to numbers and letters and cannot strictly be said to denote the sense. This is, of course, true. In this case it would have been better to label the whole passage belonging to the particular sense, leaving out the numbers and letters and letting TeX assign these automatically. The point here is simply that it is possible to approach the task of coding in different ways, and it is difficult to specify once and for all a finite set of categories that will take care of all the entities one could conceivably want to code.
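Automatic numbering of this kind could be sketched in plain TeX as follows. The macro and counter names are hypothetical, and the sketch handles at most six subsenses per sense.

```tex
% Hypothetical counters for senses and subsenses.
\newcount\sensecount
\newcount\subsensecount

% A headword starts a new entry and resets the sense counter.
\def\hword#1{\par\noindent{\bf #1}\sensecount=0 \ }

% \sense takes the whole passage belonging to a sense as its argument
% and supplies the number itself.
\def\sense#1{\advance\sensecount by 1 \subsensecount=0
  {\bf\number\sensecount.}\ #1 }

% \subsense supplies the letters a, b, c, ... in the same way.
\def\subsense#1{\advance\subsensecount by 1
  {\bf\ifcase\subsensecount\or a\or b\or c\or d\or e\or f\fi.}\ #1 }

\hword{letur}
\sense{\subsense{Skrift, Typer \dots}\subsense{Papir \dots}}
\sense{Indskrift \dots}
\bye
```

The manuscript now contains no numbers or letters at all, so senses can be reordered or inserted without renumbering by hand.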
One example will illustrate this. The etymological dictionary, like all of its kind, contains many different accents which have to be coded for. In TeX, accents are expressed with special macros which make use of an \accent primitive. Thus one would write \=a to get 'ā', where the \= signifies a floating bar accent, or \'a to get 'á' etc. But this command will not always give the correct result. Thus if one attempts to put an acute accent on top of a 'k' by writing \'{k}, the accent comes out incorrectly placed; the correct version should look like 'ḱ'. This reflects a limitation of the \accent primitive in TeX which can be circumvented by writing special-purpose macros for letters like 'k'.
To obtain this effect it is necessary to write a special purpose macro in TeX.
However, in that case, it is of course necessary to know about the fonts being used for typesetting. One of the major premises of generic markup is that such knowledge is not necessary, indeed it is not necessary to know how the text will eventually be used, say, whether it will be printed or put into a database.
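A special-purpose accent macro of the kind mentioned above might look roughly like the following plain TeX sketch. It is not the macro used for the etymological dictionary. The name \kacute, the horizontal offset of 0.12em, and the use of character 19 (the acute accent in the Computer Modern text fonts) are all assumptions, and they show precisely the dependence on the font that is at issue here.

```tex
\newdimen\accraise

% Hypothetical macro: place an acute accent over `k' by hand
% instead of relying on the \accent primitive.
\def\kacute{\leavevmode
  \setbox0=\hbox{k}%                    measure the base letter
  \accraise=\ht0 \advance\accraise by -1ex
  % The accent glyph is designed to sit above letters of x-height,
  % so it is raised by the amount by which `k' exceeds the x-height.
  % It is printed in a box of zero width, shifted right by a
  % hand-tuned, font-dependent amount.
  \rlap{\kern 0.12em \raise\accraise\hbox{\char19}}%
  \box0 }

A test word: la\kacute{}a
\bye
```

With a different font, both the character position of the accent and the offsets would have to be checked again.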
6.1
Visual Coding and Direct Manipulation
The approach to coding which has been described here is language-based and thus contrasts very much with the 'direct manipulation' approach which has in recent years been popularized especially on the Macintosh computer. As regards typography, the direct manipulation approach entails that the user points to or 'clicks' on words or letters on the screen and then typically chooses the relevant font from a menu. This was the pattern of usage which was embodied in MacWrite, the archetypical Macintosh word-processing program. The effects of the font changes could be immediately seen on the screen, in a WYSIWYG ('What you see is what you get') representation. The user interface was immediately hailed as a breakthrough, which of course it was, and yet, as time has shown, it has its problems. This can be seen in the evolution of word-processing programs for the Macintosh, which has tended to move them closer to a language-based representation. Thus the notion of 'style sheets', an idea borrowed from Brian Reid's program Scribe, has now been carried over into almost every word-processing program for the Macintosh (Reid and Walker 1980). Using style sheets, it becomes possible to mark sections in a semi-generic or logical manner. Unfortunately the notion of style sheets only applies to paragraphs, and is thus useless for the making of dictionaries where one is mainly interested in categories at a much finer granularity (i.e. sub-paragraph categories).
As demonstrated in this paper, a language-based formatter like TeX can easily be accommodated to a manuscript generated from a database and thus it can deal with categories at any level. I can state without hesitation that our experience using TeX has shown that it is eminently suited for lexicographic work.
References
Alshawi, Hiyan, Bran Boguraev, and David Carter. 1989. Placing the Dictionary On-Line. Bran Boguraev and Ted Briscoe [Eds.]. Computational Lexicography for Natural Language Processing:41-63. Longman, London.
Altsys Corporation. 1989. Fontographer, User's Guide. Plano, Texas.
Barron, David. 1989. Why use SGML? Electronic Publishing, 2(1):3-24.
Blöndal, Sigfús. 1923. Íslensk-dönsk orðabók. Reykjavík.
Bryan, Martin. 1988. SGML: An Author's Guide to the Standard Generalized Markup Language. Wokingham, Addison-Wesley.
The DANLEX Group. 1987. Descriptive Tools for the Electronic Processing of Dictionary Data. Lexicographica, Series Maior, 20. Max Niemeyer Verlag, Tübingen.
Healy, A. diPaolo. 1985. The Dictionary of Old English and the Final Design of its Computer System. Computers and the Humanities, 19:245-249.
ISO. 1986. International Standard 8879: Standard Generalized Markup Language (SGML), s.l.
Knuth, Donald E. 1984a. Literate Programming. Computer Journal, 27(2):97-111.
Knuth, Donald E. 1984b. The TeXbook. Addison-Wesley, Reading, Massachusetts.
Knuth, Donald E. 1986a. TeX: The Program. Computers and Typesetting, vol B. Addison-Wesley, Reading, Massachusetts.
Knuth, Donald E. 1986b. The METAFONTbook. Computers and Typesetting, vol C. Addison-Wesley, Reading, Massachusetts.
Knuth, Donald E. 1986c. METAFONT: The Program. Computers and Typesetting, vol D. Addison-Wesley, Reading, Massachusetts.
Knuth, Donald E. 1986d. Computer Modern Typefaces. Computers and Typesetting, vol E. Addison-Wesley, Reading, Massachusetts.
Knuth, Donald E. 1986e. Remarks to Celebrate the Publication of Computers and Typesetting. TUGboat 7:95-98.
Lamport, Leslie. 1988. Document Production: Visual or Logical. TUGboat 9:8-10.
Liang, Franklin M. 1983. Word Hy-phe-na-tion by Computer. Report STAN-CS-83-977. Stanford University, Department of Computer Science.
Pind, Jörgen. 1986. The Computer Meets the Historical Dictionary. Nordisk DATAnytt 16(10):41-43.
Pind, Jörgen. 1988. Umbrotsforritið TeX: Íslenskun þess og gildi við orðabókargerð. Orð og tunga, 1:175-219.
Plass, Michael, and Donald E. Knuth. 1982. Choosing Better Line Breaks. Jurg Nievergelt, Giovanni Coray, Jean-Daniel Nicoud, and Alan C. Shaw [Eds.]. Document Preparation Systems: A Collection of Survey Articles:221-242. North-Holland, Amsterdam.
Reid, Brian K., and Janet H. Walker. 1980. Scribe: Introductory User's Manual. [3. ed.] Unilogic, Pittsburgh.