
What Should be Included in a Commercial Word Data Base, and Why?


Ivar Utne

The original title, which should be considered a synonym, was:

User considerations and market strategy as basis for profiling content in computational word bases, with special regard to the Norwegian Term Bank's data base NOT and a multilingual Norwegian word list

Abstract

In this article I present definitions of the concepts of quality and quality assurance, which form the basis for proposing a general definition of the quality of word bases. The quality of a word base is the set of specifications that customer and producer agree upon, and it includes linguistic and other user-relevant considerations. This is exemplified by the revision of a term record format and by work on a word base of everyday language.

1 Introduction

In this paper I will propose guidelines that could be worth aiming at in order to compile word bases that users want, need, and will pay for. The guidelines will be based on a presentation of the concept of quality assurance, which is becoming very important in industry and the service sector. They will be exemplified with some word bases, i.e. computer-based word lists and terminological databases. I emphasize that the proposals presented are not a complete solution, but I hope they will offer some ideas for further work.

The ideas are based on experience with terminological databases at the Norwegian Term Bank (NT, Norsk termbank), and with work on word lists based on everyday language at the Department of Scandinavian Languages and Literature (Nordisk institutt) in cooperation with the Norwegian Computing Centre for the Humanities (NAVFs edb-senter for humanistisk forskning). I have been working in or in close contact with these institutions for some years, and base the descriptions of the current term record format on documents produced by personnel at NT. Many of these activities, especially the terminological work, have been based solely on support from industry and users outside the University.

Note that in this presentation the term producer will always mean producer of products and services, and the term product will always mean product and service.

2 Word Base Quality: Quality Assurance

In order to design word bases and to win language service contracts that include the compilation of lists, we need a strategy for succeeding in the market.

The main points of such a market strategy could be described as the transfer of products and services to a market with regard to:

(a1) Users’ wishes and needs

(a2) Identification of new paying user groups and their wishes and needs

(a3) Comparison of user wishes and needs with the needs of the researchers, to gain cooperation and/or coordination in order to give more resources to research and to improve the product

(a4) Prices, which will not be developed further here

Accordingly, I will concentrate on the quality of the products, which is the most important premise for an effective market strategy.

According to the International Organization for Standardization, the quality of products (and services) is defined as:

The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. (ISO 8402-1986:2)

More briefly, it could be stated as:

Accordance with the specifications that the customer and producer have agreed upon.

Another related concept is quality assurance (QA). According to the same standard, QA is defined as:


While ISO stresses the relation between producer and customer, the Norwegian Petroleum Directorate (Oljedirektoratet) stresses safety for personnel (and society), and the Norwegian Society for Quality (NFK, Norsk Forening for Kvalitet) in some publications also emphasizes the importance of profit as a result of effective production routines.

The Norwegian Petroleum Directorate states aspects of QA in Regulations concerning the licensee's internal control in petroleum activities on the Norwegian Continental Shelf with comments (Oljedirektoratet 1985a:3):

The regulations will be applicable to worker protection and the worker environment within the scope of the Act concerning worker protection and worker environment (the Working Environment Act).

The regulations will also be applicable to protection against pollution in petroleum activities within the scope of the Act concerning pollution and refuse (the Pollution Act).

The QA requirements related to the worker environment and public considerations are not yet precisely formulated in legislation and regulations. An instructive presentation of QA and the worker environment is Hellesøy 1988 (Norwegian text).

The Norwegian Society for Quality states its views on profit, for instance, in:

- the title of a booklet called Profit by Quality (the Norwegian original title is Lønnsomhet ved kvalitet), NFK 1987a.

- the heading “Correctly performed quality assurance increases productivity” (the Norwegian original text is “Riktig utført kvalitetssikring øker produktiviteten”) in another booklet (the Norwegian original title is Kvalitet og Kvalitetsstyring, which means Quality and Quality Control), NFK 1987b.

This means that QA implies the existence of systematically controlled routines to ensure that:

(b1) For the producer-customer relation: A product will be completed according to what has been agreed beforehand, which implies routines to secure that the order is unambiguous, and routines during production to ensure that production moves towards the agreed goal.

(b2) For the employers: The producer has routines that are as effective as possible, to ensure low costs for both parties.

(b3) For the employees: The work the personnel are involved in takes place safely and in accordance with working environment requirements.


In order to apply this to work with word bases, we have to decide what the central aspects of word base quality are, and how customer and producer can agree on which alternative expressions or style, and which other non-linguistic specifications, are to be defined as goals.

The specifications for word base quality should consider at least language, user-relevant information and user interface. The considerations may be given different weights, e.g. cultural considerations and rhythmic language may be of low or no importance. In more detail:

(c1) Good and communicative language

Good style/performance (aesthetic and cultural considerations with possible consequences for economics)

- Rhythmic or harmonious language

- Cultural considerations, tradition of “good” written language

- Subcultural considerations, e.g. firm or other local standards

Effective information transfer (economic and administrative considerations) means:

- Requirements for the selection and formation of terms according to ISO 704-1987(E):12-13 are that the terms should be:

* linguistically correct

“the term should conform to the norms of the language in question” (op.cit.), e.g. letters (ks instead of x) and inflection paradigms

* accurate, or motivated

“the term should reflect, as far as possible, the characteristics of the concept which are given in the definition” (op.cit.), e.g. magnetic tape, which is defined as “carrier of a magnetic recording, having the form of a tape” (based on op.cit.)

* concise

i.e. precision

* permit, if possible, the formation of derivatives

“alcohol: alcoholic, alcoholism, alcoholize” (op.cit.)

* monosemous, if the terms are considered for standardization

there should be only one standardized term for a concept, and only one concept for one term

- Native (e.g. Norwegian) versus foreign (e.g. English) expressions

* a native expression may be preferable because of accuracy and because it is linguistically correct (in the native language)

* a foreign expression may be preferable in international communication

- “noise free” or neutral expressions

* not stigmatizing in the relevant subculture

* related to relevant tradition

(c2) User-relevant information

Reference to standard/authority documents/publications, i.e. controlling/prescriptive documents (publications from language councils or academies, standards)

New use of symbols and other expressions, which are usually not included in dictionaries

Additional consensus with subject/user-defined groups

(c3) User interface in a database system

Transportable program and data, including copy protection

Stratified selection of information related to the purpose

Simple and logical user dialogue

Frequent updates, especially when language services interact in multipurpose projects

For the interest groups implied in the collection of QA definitions listed above, this means:

(d1) Customers:

To decide what the goals will be, we have to clarify the needs and wishes of the customers and coordinate them with the word base knowledge and expertise. Central interfaces may be listed like this:

- What the users want and need

- What the users want, but don't need, e.g.:

* alternative synonyms in the target language

- What the users need, but don't ask for, e.g.:

* consistent terminology without the use of synonyms in the same language

* consistent use of substandards (norms), such as British English without US forms

* defined subsets of Norwegian

* accommodation to existing standards and regulations

In order to decide the quality requirements of the product, the customer and producer must be in dialogue:

- To choose and design expressions according to the linguistic requirements above, cf. (c1):


* In word lists: the selection of languages, references, definitions etc.

* In thesauri: any registration of deleted subject words/concepts after revision of an old thesaurus

- To design the information according to needs and subject knowledge

* In terminological databases: should the additional synonyms, references, or context excerpts be left out?

* In word lists: e.g. the selection of variants (style) within Norwegian-Nynorsk

(d2) Employers:

Backup routines

Effective tools

Reference information

Housekeeping of information for other projects and for research

(d3) Employees:

Effective and user-friendly tools for automating, to get rid of boring work and to have overview and control

Proper procedure descriptions

Easy access to relevant information

Unambiguous references

(d4) Society:

Consideration of cultural values, according to the prevalent values

Public considerations, e.g. proper language for effective communication, which may be a precondition for safety and good health

In the next section I will apply these definitions and principles to the word base work at my institution. For a discussion of quality assurance for language work in general, see Utne 1987 (Norwegian text).

3 Record Format for a Terminological Data Base

The terminological work has usually been part of multipurpose projects aimed at cost-effective document production. Language services have on most occasions been regarded as a whole in this context. It is very unusual for sponsors to give grants for the development of dictionaries. For the bilingual dictionaries, the exceptions have been projects for oil companies, mostly in the period 1984-86. Updating of the data is partly financed by subscriptions to continuously updated copies. NT is maintenance contractor for a thesaurus developed by NT in 1985-86.

A radical restructuring of the database format for the terminological database was carried out in 1988. A presentation of this restructuring while in progress is given in Ebeling and Utne 1988.

In the following I will exemplify the application of the quality assurance concept to the restructuring of this database. This application is based on my own point of view, which is partly that of an outsider. In practice the process was not planned and performed according to the principles presented below, but it has in fact followed most of them. My presentation will be an application connected to a possible and realistic strategy.

Many considerations about the quality of linguistic expressions do not depend on this revision of the format and are therefore not listed below. The general leading principles for the revision have been to gain better quality of:

(e1) Language, by:

Error-free data to a larger extent

(e2) User relevance, by:

Unambiguous classification of a greater part of the data

More flexible introduction of new categories/classifications

More stress on standardization

(e3) User interface, by:

More flexible presentation, excerption and introduction of different and more fine-grained data types

Improved distribution

This implied some more specific leading principles (without explaining the links to the general principles in detail):

(f1) More strictly constructed hierarchical structure

Goal: To make more flexible excerpts of subsets of data possible, i.e. to extract different combinations of fields and parts of fields, and to introduce more unambiguous links between a term and its abbreviation, reference or context


- Abbreviations, contractions, symbols and formulas are unambiguously bound to their full forms and not only to their concept

- References are unambiguously bound to their term (or abbreviation etc.), context excerpts, definitions etc.

- Contexts are unambiguously bound to their terms etc.

- Comments are unambiguously bound to all relevant kinds of fields
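The binding principle in (f1) can be illustrated with a small data structure. The sketch below is a hypothetical Python model, not the actual NT implementation: abbreviations, contexts, references and comments are attached to the individual term they belong to rather than to the concept record as a whole, so an excerpt can always tell which full form an abbreviation expands. The class names, the status values and the example record are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Abbreviation:
    form: str
    references: list[str] = field(default_factory=list)  # bound to this abbreviation only


@dataclass
class Term:
    form: str
    status: str  # hypothetical values: "main", "synonym", "deprecated"
    abbreviations: list[Abbreviation] = field(default_factory=list)
    contexts: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


@dataclass
class ConceptRecord:
    concept_id: str
    terms: dict[str, list[Term]] = field(default_factory=dict)  # language -> terms


def abbreviation_pairs(record: ConceptRecord, language: str) -> list[tuple[str, str]]:
    """Return (abbreviation, full form) pairs; each pair comes from one specific term."""
    return [
        (abbr.form, term.form)
        for term in record.terms.get(language, [])
        for abbr in term.abbreviations
    ]


# Hypothetical example: the abbreviation hangs off the term it abbreviates,
# not merely off the concept, so it cannot be confused with the synonym.
record = ConceptRecord(
    "c1",
    {
        "en": [
            Term("liquefied natural gas", "main", abbreviations=[Abbreviation("LNG")]),
            Term("liquid natural gas", "synonym"),
        ]
    },
)
print(abbreviation_pairs(record, "en"))  # [('LNG', 'liquefied natural gas')]
```

Had the abbreviation been stored only at concept level, a program extracting the synonym would have had no way of knowing whether LNG belongs to it.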

(f2) More formal notation language

Goal: Be more consistent in the registration of all kinds of information, i.e. use more formal and restricted formats for the information types, to make it easier to extract expressions from the same page or pages in a specified source, or to extract terms from a specified subject area

Solution: Defined vocabulary (e.g. titles) and syntax for references

Program tools according to the goal
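A defined syntax for references is what makes the extraction described in (f2) mechanical. As an illustration only, the sketch below assumes that references are written as source:pages, the same form this paper uses in citations such as ISO 704-1987(E):12-13. The regular expression, the function names and the sample entries are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed syntax: everything up to the last colon is the source, then a page or page range.
REF_PATTERN = re.compile(r"^(?P<source>.+):(?P<first>\d+)(?:-(?P<last>\d+))?$")


class Ref(NamedTuple):
    source: str
    first: int
    last: int


def parse_ref(text: str) -> Ref:
    """Parse a reference such as 'ISO 704-1987(E):12-13' into source and page range."""
    match = REF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"reference does not follow the source:pages syntax: {text!r}")
    first = int(match["first"])
    last = int(match["last"]) if match["last"] else first
    return Ref(match["source"].strip(), first, last)


def terms_citing(entries: list[tuple[str, str]], source: str, page: int) -> list[str]:
    """Return the terms whose reference points to the given page of the given source."""
    hits = []
    for term, ref_text in entries:
        ref = parse_ref(ref_text)
        if ref.source == source and ref.first <= page <= ref.last:
            hits.append(term)
    return hits


entries = [  # hypothetical sample data: (term, reference)
    ("quality", "ISO 8402-1986:2"),
    ("quality assurance", "ISO 8402-1986:2"),
    ("magnetic tape", "ISO 704-1987(E):12-13"),
]
print(parse_ref("ISO 704-1987(E):12-13"))  # Ref(source='ISO 704-1987(E)', first=12, last=13)
print(terms_citing(entries, "ISO 8402-1986", 2))  # ['quality', 'quality assurance']
```

A reference that does not follow the agreed syntax raises an error instead of silently dropping out of the excerpt, which is where the consistency goal pays off.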

(f3) A productive notation system to include user-relevant information

Goal: Be more flexible in introducing new information classes, i.e. make it possible to introduce new information classes according to simple and defined routines

Solution: Supplementary information not traditionally included in dictionaries is introduced because of its usefulness for the target groups. This means, for instance, that a distinction is introduced between general abbreviations, project-specific abbreviations, symbols, classification codes and formulas. Some of the categories are:

- Internationally standardized symbol; shown, referred to an illustration, and/or described

- Chemical/mathematical formulas, e.g. $\mathrm{NH_4HSO_3}$ for ammonium hydrogen sulfite

- International classification codes, e.g. according to the UN (United Nations), EC (European Community), CAS (Chemical Abstracts Service) and widely used national standards

- Trade names for products of the concept

- References to official documents, e.g. standards, registers, laws and regulations

- Hazard classification, e.g. fire, poison

- References to figures which illustrate the concepts

- Area of application (not unique to this database), for housekeeping, collecting subsets and indicating meaning.

(f4) Existence of programs and other routines for error checks

Solution: Program tools according to the goal

(f5) Standardization

Goal: As has been the goal for years, one concept is one record.

Solution: Synonyms in the same language are collected in the same record.

Goal: As has been the goal for years, there should be only one preferred full-form term for each language.

Solution: This is called the main term, and the others are called synonyms or deprecated terms.

Goal: The format should also include registration of alternative standards, such as a former main term, and main terms in other standards.

Solution: Introduction of fields for the approval date and scope of a standard. This also includes out-of-date standards.
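The standardization goals in (f5), together with the monosemy requirement from ISO 704 quoted under (c1), lend themselves to the kind of automatic error check called for in (f4). The sketch below is a hypothetical check, not a description of NT's actual program tools. It assumes that each concept record is a list of (language, term, status) tuples, where the status values "main", "synonym" and "deprecated" loosely mirror the hvd, syn and fra fields of the record format.

```python
from __future__ import annotations

from collections import defaultdict


def check_standardization(records: dict[str, list[tuple[str, str, str]]]) -> list[str]:
    """Report violations of 'one main term per language per concept' and of monosemy."""
    errors = []
    main_owners = defaultdict(set)  # (language, main term) -> concept ids using it
    for concept_id, terms in records.items():
        mains = defaultdict(list)
        for language, term, status in terms:
            if status == "main":
                mains[language].append(term)
                main_owners[(language, term)].add(concept_id)
        # Every language present in the record must have exactly one main term.
        for language in sorted({lang for lang, _, _ in terms}):
            count = len(mains[language])
            if count != 1:
                errors.append(f"{concept_id}: {count} main terms in {language}")
    # Monosemy: a main term must not be the main term of more than one concept.
    for (language, term), owners in sorted(main_owners.items()):
        if len(owners) > 1:
            errors.append(f"{language} main term {term!r} used for concepts {sorted(owners)}")
    return errors


records = {  # hypothetical sample records
    "c1": [("en", "magnetic tape", "main"), ("en", "tape", "synonym"),
           ("no", "magnetbånd", "main")],
    "c2": [("en", "reel", "synonym")],
    "c3": [("en", "magnetic tape", "main")],
}
for error in check_standardization(records):
    print(error)
```

With this sample, the check reports that c2 has no English main term and that "magnetic tape" is the main term of both c1 and c3.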

(f6) Transportable program and data

Goal: Program and data files produced for different machines and media

Goal: Routines for copy protection

Solution: Program tools for both these goals.

4 Word Lists

The development of word lists of everyday language was initiated by researchers at the University about 20 years ago. The project is a cooperation between NT and the Norwegian Computing Centre for the Humanities. Further development is financed by sales. Customers are mostly software houses, institutions and firms (the graphics industry and newspapers) which are able to include the lists in existing software or to develop stand-alone programs.

The quality of these lists is partly based on general needs for spelling lists, on special and general needs for lists with hyphenation marks (with special reference to different levels), and on our estimate of user needs for other lists with grammatical information and lists of word parts. The development of spelling lists has been based on our own assumption that a frequency-based, general correspondence vocabulary would suit the users' needs best. The development of lists with grammatical information and word composita is based on point (a3) of the market strategy above, i.e. a combination of researchers' and customers' needs. The development of lists with substandards also meets needs and wishes for adapting to substandards and to more personal ways of writing.


• Spelling word lists for Norwegian-Bokmål and for Norwegian-Nynorsk, including a collection of high-frequency word forms without any further coding.

• Word lists with hyphenation marks, based on special needs but also with general application. The marks express different levels, so that the different types of marks indicate boundaries between:

- Compounds, like data+base (which is written as one word in Norwegian)

- Prefixes and suffixes and the rest of the word, like ex=plos=ion (Norwegian: eks=plo=sjon)

- Inflection morphemes, like the definite plural in bil//ene (= the car//s)
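A program can turn such level-marked entries into hyphenation points for a chosen degree of strictness. The sketch below is hypothetical: it assumes that + marks compound boundaries, = marks prefix, suffix and syllable boundaries, and // marks inflection boundaries, and it assumes a numbering in which a lower level is a more preferred break. The real lists may use other marks or another ordering.

```python
from __future__ import annotations

# Assumed marks and levels (lower level = preferred break point).
# "//" is listed first so it is matched before any single-character mark.
MARKS = (("//", 3), ("+", 1), ("=", 2))


def parse_marked(marked: str) -> tuple[str, list[tuple[int, int]]]:
    """Strip the marks; return the plain word and (position, level) break points."""
    letters: list[str] = []
    breaks: list[tuple[int, int]] = []
    i = 0
    while i < len(marked):
        for mark, level in MARKS:
            if marked.startswith(mark, i):
                breaks.append((len(letters), level))  # break before the next letter
                i += len(mark)
                break
        else:
            letters.append(marked[i])
            i += 1
    return "".join(letters), breaks


def hyphenate(marked: str, max_level: int) -> str:
    """Insert '-' at every break point whose level is at most max_level."""
    word, breaks = parse_marked(marked)
    allowed = {pos for pos, level in breaks if level <= max_level}
    return "".join(("-" if k in allowed else "") + ch for k, ch in enumerate(word))


for entry in ("data+base", "eks=plo=sjon", "bil//ene"):
    print(entry, hyphenate(entry, 1), hyphenate(entry, 2), hyphenate(entry, 3))
```

Under these assumptions the strictest setting (level 1) only allows data-base, level 2 also allows eks-plo-sjon, and only level 3 permits the inflectional break bil-ene.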

The further development includes partly further refinement of the lists above, and partly the development of new word list types. These are based on combining what is asked for in new products for text processing, or in database tools with natural-language dialogues, with what is needed in work on machine-aided translation, cf. the combination of customer needs and wishes with those of the researchers as a market strategy. That means lists containing:

• Word composita, e.g. parts of Latin and Greek loanwords.

- The machine-aided system may use lists like this to translate loanwords which have identical parts but related inflectional paradigms.

- In text processing systems, this may be used as part of hyphenation programs and, combined with supplementary rules, partly also as part of spell checkers.

• Word lists including grammatical information, like part of speech and inflectional paradigms.

- This is the most important part of a machine-aided translation system.

- In text processing systems, and especially in dialogue-based database tools, this may be used in programs that are based on simpler syntax analysis or on calculation of the possible parts of speech for words in a text string.
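A word list with grammatical information can be expanded into a full-form index and used to look up the possible parts of speech of each word in a text string, as described above. The sketch below is a minimal illustration under assumed data: the paradigm codes (n1, n2, v1) and their ending lists are hypothetical and cover only three Bokmål lemmas; a real list would encode many more paradigms as well as irregular forms.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical paradigm codes: part of speech plus the endings added to a stem.
PARADIGMS = {
    "n1": ("noun", ["", "en", "er", "ene"]),  # bil, bilen, biler, bilene
    "n2": ("noun", ["", "en", "e", "ne"]),    # kjører, kjøreren, kjørere, kjørerne
    "v1": ("verb", ["e", "er", "te", "t"]),   # kjøre, kjører, kjørte, kjørt
}

# Hypothetical lexicon entries: (lemma, stem, paradigm code).
LEXICON = [
    ("bil", "bil", "n1"),        # car
    ("kjører", "kjører", "n2"),  # driver
    ("kjøre", "kjør", "v1"),     # to drive
]


def build_index(lexicon):
    """Expand every lemma into its inflected forms: form -> {(lemma, part of speech)}."""
    index = defaultdict(set)
    for lemma, stem, code in lexicon:
        pos, endings = PARADIGMS[code]
        for ending in endings:
            index[stem + ending].add((lemma, pos))
    return index


def pos_candidates(text, index):
    """List the possible parts of speech for each whitespace-separated word."""
    return [
        (word, sorted({pos for _, pos in index.get(word, set())}))
        for word in text.lower().split()
    ]


index = build_index(LEXICON)
print(pos_candidates("Kjøreren kjører bilene", index))
# [('kjøreren', ['noun']), ('kjører', ['noun', 'verb']), ('bilene', ['noun'])]
```

The ambiguous form kjører (driver, or drives) shows why such a list only yields candidate parts of speech; choosing between them is left to the simple syntax analysis mentioned above.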

And, in combination with work to systematize substandards in the written Norwegian languages:


a substandard. This implies a great variety of possible written languages, which computer systems should be able to handle. The exact definitions of each such language are not objectively stated, but are to some extent a matter of personal or user-group decisions. Tables 1 and 2 in Utne 1989 (paper at Nordiske datalingvistikdage 1989) present some examples of this diversity. Some of the examples are repeated in Table 1 below. A further explanation of the language situation is given in Utne 1989.

Language              porridge  line   problems   boys
Mod. Norw.-Bokmål     grøt      linje  problemer  gutter
Rad. Norw.-Bokmål     graut     linje  problem    gutter
Rad. Norw.-Nynorsk    graut     linje  problem    gutar
Mod. Norw.-Nynorsk    graut     line   problem    gutar

Table 1. Spelling and inflection in Norwegian (Mod. = Moderate, Rad. = Radical)

To some extent this substandard works as if there were slightly different written sublanguages inside each of the two official Norwegian languages. There is also some tradition of unofficial written standards, e.g. one more moderate than Norwegian-Nynorsk called Conservative Norwegian-Nynorsk, another more moderate than Norwegian-Bokmål called Conservative Norwegian-Bokmål (Norwegian: Riksmål), and a third between the two official languages which is sometimes called Pan-Norwegian (Norwegian: Samnorsk). Of these three, Conservative Norwegian-Bokmål has the widest use: it is used in at least one of the most widespread newspapers and in many books and publications every year.

The list containing a diversity of form alternatives within each of the languages, and to some extent also in other unofficial language standards, can be considered a multilingual dictionary. This total word base concept, which at present contains between 20 000 and 30 000 entries (which can be inflected according to different alternatives), is the basis of such a multilingual Norwegian word base. This base will be developed both for research, including machine-aided translation, and for commercial applications.
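One way to let a single word base serve several written standards, as described above, is to store the alternative forms side by side and derive each standard's word list from them. The sketch below uses only the four glosses of Table 1; the data layout, the function names and the simple spell check are assumptions for illustration, not the structure of the actual Norwegian word base.

```python
from __future__ import annotations

# Forms from Table 1, one column per written standard.
STANDARDS = ("Mod. Norw.-Bokmål", "Rad. Norw.-Bokmål", "Rad. Norw.-Nynorsk", "Mod. Norw.-Nynorsk")
FORMS = {
    "porridge": ("grøt", "graut", "graut", "graut"),
    "line": ("linje", "linje", "linje", "line"),
    "problems": ("problemer", "problem", "problem", "problem"),
    "boys": ("gutter", "gutter", "gutar", "gutar"),
}


def word_list(standard: str) -> dict[str, str]:
    """Return the form used for each gloss in the chosen standard."""
    column = STANDARDS.index(standard)
    return {gloss: forms[column] for gloss, forms in FORMS.items()}


def standards_accepting(form: str) -> list[str]:
    """Return every standard in which the form occurs."""
    return [
        standard
        for column, standard in enumerate(STANDARDS)
        if any(forms[column] == form for forms in FORMS.values())
    ]


def unknown_words(words: list[str], standard: str) -> list[str]:
    """A minimal spell check: the words that are not in the chosen standard's list."""
    accepted = set(word_list(standard).values())
    return [word for word in words if word not in accepted]


print(word_list("Rad. Norw.-Nynorsk")["boys"])  # gutar
print(standards_accepting("graut"))
print(unknown_words(["grøt", "gutter"], "Mod. Norw.-Nynorsk"))  # ['grøt', 'gutter']
```

Because every standard is a column over the same entries, adding a user-defined substandard means adding one column rather than building a separate word list.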

Other possibilities not yet worked out in detail are, for instance:

Synonyms and words with related meanings

Deprecated words and expressions, and their substitutes


Utne, Ivar. 1988: Terminologi, arbeidsinstrukser og lagerstyring - om kodeuttrykk i fagspråk. Nordiske Datalingvistikdage og Symposium for datamatstøttet leksikografi og terminologi 1987. Proceedings:273-285. Institut for Datalingvistik, Handelshøjskolen i København. Copenhagen.

Utne, Ivar. 1989: Machine aided translation between the two Norwegian languages Norwegian-Bokmål and Norwegian-Nynorsk. Nordiske datalingvistikdage 1989. Reykjavik. In press.

Strømgaten 53, N-5007 BERGEN, Norway

Appendix

From “NOT” User's Guide:2, ver. 1989-05-23

The Record Format

The terms in this database are organized in concept records, which consist of one Norwegian section, one English section and one section that is common to both languages.

The Norwegian section consists of the following fields:

Nhvd = Norwegian main term (recommended for use)

Nsyn = Norwegian synonym to the same concept (to be avoided)

Nfra = Norwegian synonym not recommended for use (must not be used)

Nkrt = Norwegian contraction (used for reasons of space, as for instance in screen pictures, drawings, signs etc.)

Nfrk = Norwegian abbreviation

Ndef = Norwegian definition


The English section contains the following fields:

Ehvd = English main term

Esyn = English synonym to the same concept (to be avoided)

Efra = English synonym not recommended for use (must not be used)

Ekrt = English contraction (used for reasons of space, as for instance in screen pictures, drawings, signs etc.)

Efrk = English abbreviation

Edef = English definition

kon = context of term

kom = comment to term

geo = geographical distribution of term

ref = reference of term

fig = reference to figure

The common section consists of information about the concept as a whole:

brk = area of application

sbl = internationally standardized symbol

fml = chemical/mathematical formula

nr = national and international numbers

nvn = trade name

kom = comment to the concept as a whole

kon = context of symbol, formula etc.

kom = comment to symbol, formula etc.

ref = reference of symbol, formula etc.

fig = reference to figure of symbol, formula etc.
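To show how the field codes above might be read by a program, the sketch below parses a record written as one code = value pair per line. Both that layout and the rule that the subordinate fields (kon, kom, geo, ref, fig) attach to the nearest preceding term, symbol or formula field are assumptions inspired by the binding principle in (f1); the User's Guide excerpt does not specify an input syntax. Unknown codes are reported as errors, in the spirit of (f4).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Field codes listed in the appendix.
HEAD_FIELDS = {
    "Nhvd", "Nsyn", "Nfra", "Nkrt", "Nfrk", "Ndef",
    "Ehvd", "Esyn", "Efra", "Ekrt", "Efrk", "Edef",
    "brk", "sbl", "fml", "nr", "nvn",
}
SUB_FIELDS = {"kon", "kom", "geo", "ref", "fig"}


@dataclass
class Field:
    code: str
    value: str
    attached: list[tuple[str, str]] = field(default_factory=list)


def parse_record(text: str) -> list[Field]:
    """Parse 'code = value' lines; sub-fields attach to the preceding head field."""
    fields: list[Field] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        code, sep, value = line.partition("=")
        code, value = code.strip(), value.strip()
        if not sep:
            raise ValueError(f"line {line_no}: expected 'code = value'")
        if code in HEAD_FIELDS:
            fields.append(Field(code, value))
        elif code in SUB_FIELDS:
            if not fields:
                # Assumption: a sub-field before any head field belongs to the concept.
                fields.append(Field("concept", ""))
            fields[-1].attached.append((code, value))
        else:
            raise ValueError(f"line {line_no}: unknown field code {code!r}")
    return fields


sample = """
Nhvd = magnetbånd
Ehvd = magnetic tape
ref = ISO 704-1987(E):12-13
Edef = carrier of a magnetic recording, having the form of a tape
"""
parsed = parse_record(sample)
for item in parsed:
    print(item.code, item.value, item.attached)
```

In this sample the reference ends up attached to the English main term rather than floating at record level, which is the unambiguous binding the revised format aims for.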
