
Continuity for Metric Validation

As shown in Section 4.7, the domain metric derivation process is guided by domain-specific knowledge. In this process, some decisions and compromises have to be made. Nonetheless, no matter how the domain metric is derived, either through the process just described or other approaches, we need to validate it.

The validation is based on the mathematical notion of continuity. The continuity of a data function depends on the neighborhood structures of both its domain space and value space (Definition 3.3.14). Since the neighborhood structure on the value space is usually known in advance, the goal of domain metric derivation is to derive a pq-metric such that the neighborhood structure on the domain space induced by the pq-metric would make the data function continuous. Thus, a pq-metric can be validated if it preserves the continuity of the data function. In [34], we define continuity in several different mathematical settings and describe its semantics in the context of data analysis. We demonstrate the relationship between data continuity and metric validity in the following example.

Example 4.8.1 As in the prime numbers example of Section 4.6, let

$$X = \{9, 11, 12, 16, 20, 25, 30, 49\}$$

be the domain space of interest. We define a data function $f : X \to C$, where $C = \{a, b, c\}$, as follows:

$$f(9) = a, \quad f(11) = a, \quad f(12) = b, \quad f(16) = a,$$

$$f(20) = b, \quad f(25) = a, \quad f(30) = c, \quad f(49) = a.$$

The intuition of $f$ is that $f(p)$ is an indication of the number of different observable properties $p$ possesses, so $P_2 = \{12, 16, 20, 30\}$, $P_3 = \{9, 12, 30\}$, $P_5 = \{20, 25, 30\}$, and a fourth property set $\{11, 49\}$. Thus, $a$, $b$, and $c$ correspond to one, two, and three observable properties respectively. In order to evaluate the continuity of $f$, we have to know the geometries of domain space $X$ and value space $C$. Let us specify the value geometry by the topology

$$\Omega_C = \{\emptyset, \{c\}, \{b, c\}, \{a, b, c\}\}.$$

The idea of $\Omega_C$ is that the opens $\{a, b, c\}$, $\{b, c\}$, and $\{c\}$ are the smallest neighborhoods of $a$, $b$, and $c$. Letting $N_p$, $\forall p \in C$, represent the smallest neighborhood of $p$, we have $N_a = \{a, b, c\}$, $N_b = \{b, c\}$, and $N_c = \{c\}$. A pq-metric $d_1$ on domain $X$ can be defined as $d_1(p, q) = |K_p - K_q|$, such that $\forall x \in X$, $x$ is the product of $K_x$ prime numbers (for instance, $K_9 = 2$, $K_{11} = 1$, $K_{12} = 3$, ...). Note that $\{c\}$ is a neighborhood of $f(30) = c$. However, since $K_{12} = K_{20} = K_{30} = 3$, we have $d_1(30, 12) = d_1(30, 20) = 0$, so for every neighborhood $M$ of 30 induced by $d_1$, $\{12, 20, 30\} \subseteq M$ and $\{b, c\} \subseteq f[M]$. Thus, $f$ is not continuous with respect to $d_1$ and $d_1$ is not valid for $f$.

On the other hand, if we use the pq-metric $d_2$ in Section 4.6 as the domain metric, it is easy to verify that $f$ is continuous with respect to $d_2$ and $d_2$ is valid for $f$. ∎
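The check in Example 4.8.1 can be reproduced mechanically. The following Python sketch is illustrative only and is not part of the original development. It assumes that the neighborhoods a pq-metric induces on a finite domain are the open balls around each point, so that the smallest neighborhood of a point is the set of points at distance zero from it. The helper names `count_prime_factors`, `d1`, `smallest_domain_neighborhood`, and `is_continuous_at` are hypothetical and introduced here only for illustration.

```python
# Domain, data function, and smallest value-space neighborhoods from Example 4.8.1.
X = [9, 11, 12, 16, 20, 25, 30, 49]
f = {9: "a", 11: "a", 12: "b", 16: "a", 20: "b", 25: "a", 30: "c", 49: "a"}
N = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"}}


def count_prime_factors(n):
    """Number of prime factors of n, counted with multiplicity (K_n in the example)."""
    count = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:
        count += 1
    return count


def d1(p, q):
    """Candidate domain metric: difference in the number of prime factors."""
    return abs(count_prime_factors(p) - count_prime_factors(q))


def smallest_domain_neighborhood(x, d, domain):
    # Assumption: neighborhoods are open balls, so on a finite domain the
    # smallest one around x is the set of points at distance zero from x.
    return {y for y in domain if d(x, y) == 0}


def is_continuous_at(x, d, domain):
    # f is continuous at x if the image of the smallest neighborhood of x
    # lies inside the smallest neighborhood of f(x).
    image = {f[y] for y in smallest_domain_neighborhood(x, d, domain)}
    return image <= N[f[x]]


discontinuities = [x for x in X if not is_continuous_at(x, d1, X)]
```

Under these assumptions, `discontinuities` contains only 30, which agrees with the conclusion of the example that the metric based on the number of prime factors is not valid for the data function.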

In order to validate a metric derived for CAM in Example 4.7.1, all we have to do is to check whether the data function is continuous or not. Unfortunately, this is not possible without further domain-specific knowledge on the value space. Recall that the values in the value space of CAM depend on the domain metric and the choice of geometric center. We really do not have the "true" values (i.e., values independent of the domain metric) in the value space for us to evaluate the continuity. Thus, the validation can only be done if we have an empirical data set which provides the "true" values in the value space. Alternatively, we can form a panel of consultants and psychologists to give us their expert estimation of the "true" values in the value space. Based on their opinions, we can conclude whether a given domain metric is valid or not.

Metric validation is part of the "trial-and-error" process involved in knowledge discovery. Given a data function, there are times when a mathematically valid domain metric is extremely difficult (or computationally expensive) to derive. In such cases, we might want to accept a partially valid domain metric such that the data function is continuous at a large subset of the domain.
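One way to make partial validity concrete is to compute the subset of the domain at which the data function is continuous and compare its relative size against a threshold. The Python sketch below does this for the data of Example 4.8.1. It is a sketch under stated assumptions: the threshold parameter `min_fraction` and the function names are hypothetical and do not come from the text, the factor counts other than those for 9, 11, and 12 were obtained by factoring the numbers, and the smallest neighborhood of a point is assumed to be the set of points at distance zero from it.

```python
def continuity_subset(domain, f, d, value_nbhd):
    """Return the set of points of a finite domain at which f is continuous."""
    continuous_points = set()
    for x in domain:
        # Assumed smallest neighborhood of x: points at distance zero from x.
        ball = {y for y in domain if d(x, y) == 0}
        # Continuity at x: the image of the ball stays inside the smallest
        # neighborhood of f(x) in the value space.
        if all(f[y] in value_nbhd[f[x]] for y in ball):
            continuous_points.add(x)
    return continuous_points


def is_partially_valid(domain, f, d, value_nbhd, min_fraction=0.8):
    """Accept d if f is continuous on at least min_fraction of the domain.

    min_fraction is a hypothetical acceptance threshold, not from the text.
    """
    covered = len(continuity_subset(domain, f, d, value_nbhd))
    return covered / len(domain) >= min_fraction


# Data of Example 4.8.1; K[x] is the number of prime factors of x.
X = [9, 11, 12, 16, 20, 25, 30, 49]
f = {9: "a", 11: "a", 12: "b", 16: "a", 20: "b", 25: "a", 30: "c", 49: "a"}
N = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"}}
K = {9: 2, 11: 1, 12: 3, 16: 4, 20: 3, 25: 2, 30: 3, 49: 2}


def d1(p, q):
    return abs(K[p] - K[q])


continuous_at = continuity_subset(X, f, d1, N)
fraction = len(continuous_at) / len(X)
```

With this data, the function is continuous everywhere except at 30, so the metric would be accepted under a threshold of 0.8 and rejected under a stricter threshold of 0.9. What counts as a large enough subset remains a domain-specific judgment.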

4.9 Summary

Scientific data is characterized by rich and complex interrelationships, especially inter-instance relationships. Based on data-as-functions and pseudo-quasimetrics, this chapter presents a formal mathematical model suitable for modeling complex interrelationships in scientific data. Compared to the spatial data models and existing scientific data models arising in computational fluid dynamics and scientific visualization, the proposed model offers more flexibility and generality.

In addition to the mathematical foundation, we present a detailed approach for metric derivation. The approach itself is also useful as a paradigm for knowledge discovery from the metric perspective. Since our model is based on formal mathematical semantics, the results can be formally validated. The notion of continuity is used as a precise mathematical tool for validating the results of the metric derivation process, either for data modeling or knowledge discovery purposes.

From the knowledge discovery perspective, we believe the metric-based data model has tremendous potential as the foundation for developing various data mining mechanisms. In particular, the process for metric derivation based on observable properties can be very valuable for data mining in categorical data, which are pervasive in social sciences.


Chapter 5