A Method for Extracting Rules from Incomplete Information System

Yidong Lan 1,2

Department of Mechanics and Engineering Science
Peking University
Beijing 100871, China

Lin Zhang, Liancheng Liu 3

CIMS Engineering Research Center
Tsinghua University
Beijing 100084, China

Abstract

In this paper we propose a method for extracting rules from an Incomplete Information System based on an irregular decision table. Without estimating any missing attribute value, we obtain a rule set that contains no missing attribute values. Experiments show that the accuracy of the obtained rule set is almost the same as the highest accuracy among the methods that estimate missing attribute values.

Key words: Rough set, Incomplete Information System, irregular decision table.

1 Introduction

One of the main tools of data analysis is rule induction from raw data represented by a database. Among the many different data analysis theories and methods, rough set [9] based methods are new and promising, and have been successfully applied to various areas because of their theoretical simplicity and practicability. Real-life data often contain erroneous, incomplete, uncertain and vague entries. In this paper we discuss one form of data incompleteness: missing attribute values. For the convenience of data analysis, data is usually represented by a two-dimensional table. This table is called a Knowledge Representation System, or an Information System. When it contains missing attribute values, it is called an Incomplete Information System.

1 Supported by the National Science Foundation of China (No. 60047015) and the National High-Technology Development Program of China (No. 863-511-930-004).

2 Email: [email protected]

3 Email: [email protected]

Open access under CC BY-NC-ND license.

There are several approaches to extracting rules from an Incomplete Information System. The simplest consists in removing the examples with unknown values. Another simple one is to replace an unknown value with the most common value [1]. Other, more complex approaches were presented in [4], [6], [8], [10], [11]. Except for the method of removing examples with missing values, all the above-mentioned methods replace the missing values according to the distribution of attribute values of the other examples in the decision table. In this paper, we propose a method for dealing with missing attribute values based on an irregular decision table: this method can use the information of the examples with missing attribute values themselves to reduce the uncertainty of the rules.
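As a point of reference, the two simplest strategies named above (discarding incomplete examples, and filling a gap with the attribute's most common value) might look as follows in Python. This is only an illustrative sketch: the function names, the use of `None` as the missing-value marker and the toy examples are assumptions of the sketch, not part of the cited methods.

```python
from collections import Counter

MISSING = None  # assumption: a missing attribute value is stored as None


def drop_incomplete(examples):
    """Keep only the examples that have no missing attribute value."""
    return [ex for ex in examples if MISSING not in ex.values()]


def fill_most_common(examples):
    """Replace each missing value with the attribute's most common known value."""
    filled = [dict(ex) for ex in examples]  # work on copies
    attributes = {a for ex in examples for a in ex}
    for a in attributes:
        known = [ex[a] for ex in examples if ex.get(a) is not MISSING]
        if not known:
            continue  # nothing to estimate from; leave the gaps as they are
        most_common = Counter(known).most_common(1)[0][0]
        for ex in filled:
            if ex.get(a) is MISSING:
                ex[a] = most_common
    return filled


# Hypothetical toy data, for illustration only.
examples = [
    {"Price": "High", "Size": "Full"},
    {"Price": "High", "Size": None},
    {"Price": None, "Size": "Full"},
    {"Price": "Low", "Size": "Compact"},
]
complete_only = drop_incomplete(examples)
imputed = fill_most_common(examples)
```

On this toy data, `complete_only` keeps the two complete examples, while `imputed` fills the missing Size with "Full" and the missing Price with "High".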

2 Preliminary

2.1 Rough Set

The information system can be formulated as a quadruple $S = (U, A, V_a, f_a)_{a \in A}$, where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty finite set of attributes. For every $a \in A$, $V_a$ is the set of values of $a$, and $f_a : U \to V_a$ is the information function from $U$ to $V_a$.

For any $\emptyset \neq B \subseteq A$, the binary relation $I_B = \{(x, y) \in U^2 : \forall a \in B,\ f_a(x) = f_a(y)\}$ is called an indiscernibility relation. It is reflexive, symmetric and transitive, so it is also an equivalence relation. $I_B$ is written $I$ when $B$ is understood.

For $x \in U$ and an indiscernibility relation $I$ on $U$, $I(x) = \{y \in U : x I y\}$ is called an indiscernibility class. $[x]_I$ denotes the indiscernibility class of $I$ that contains the element $x \in U$.

The indiscernibility classes $C_1, C_2, \ldots, C_s$ induced by $I$ on $U$ satisfy: $C_i \neq \emptyset$, $i = 1, 2, \ldots, s$; $C_i \cap C_j = \emptyset$, $i \neq j$, $1 \le i, j \le s$; $\bigcup_{i=1}^{s} C_i = U$.

Thus $I$ forms a partition of $U$, denoted $P(I)$. If $X \subseteq U$ and $I$ is an indiscernibility relation on $U$, then the lower approximation of $X$ by $I$ is defined as $\underline{X}_I = \bigcup \{I(x) : I(x) \subseteq X\}$, and the upper approximation as $\overline{X}_I = \bigcup \{I(x) : I(x) \cap X \neq \emptyset\}$. If $\underline{X}_I \neq \overline{X}_I$, the pair $\langle \underline{X}_I, \overline{X}_I \rangle$ is called a rough set. $\underline{X}_I$ is called the positive region of $X$, $U \setminus \overline{X}_I$ the negative region, and $\overline{X}_I \setminus \underline{X}_I$ the boundary.
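The definitions of this subsection can be made concrete with a small Python sketch. It is an illustration under assumptions: the function names and the four-object toy table are hypothetical, and an information system is simply stored as a dictionary from objects to attribute-value dictionaries.

```python
from collections import defaultdict


def partition(universe, table, attributes):
    """Group the objects that agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attributes)
        classes[key].add(x)
    return list(classes.values())


def lower_approximation(classes, target):
    """Union of the indiscernibility classes fully contained in `target`."""
    return set().union(*(c for c in classes if c <= target))


def upper_approximation(classes, target):
    """Union of the indiscernibility classes that intersect `target`."""
    return set().union(*(c for c in classes if c & target))


# Hypothetical toy information system, for illustration only.
table = {
    1: {"Size": "Full", "Speed": "Low"},
    2: {"Size": "Full", "Speed": "Low"},
    3: {"Size": "Compact", "Speed": "Low"},
    4: {"Size": "Full", "Speed": "High"},
}
universe = set(table)
classes = partition(universe, table, ["Size", "Speed"])
X = {1, 3}
lower = lower_approximation(classes, X)
upper = upper_approximation(classes, X)
boundary = upper - lower
```

Objects 1 and 2 are indiscernible here, so the set X = {1, 3} cannot be described exactly: its lower approximation is {3}, its upper approximation is {1, 2, 3}, and the boundary is {1, 2}.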

Let $K = (U, A)$ be an information system and let $C, D \subseteq A$ be two subsets of attributes, called condition and decision attributes respectively. Such an information system is called a decision table and is denoted $T = (U, C, D, V_a)_{a \in C \cup D}$. With every $x \in U$ we associate a function $d_x : A \to V_a$ such that $d_x(a) = a(x)$ for every $a \in C \cup D$. The function $d_x$ is called a decision rule (in $T$), and $x$ is referred to as the label of the decision rule $d_x$.

If $d_x$ is a decision rule, then the restriction of $d_x$ to $C$, denoted $d_x|C$, and the restriction of $d_x$ to $D$, denoted $d_x|D$, are called the conditions and the decision of $d_x$ respectively. A decision rule often has the form $\phi \to \psi$, where $\phi$ and $\psi$ are the conditions and the decision, respectively. For a decision rule $\phi \to \psi$, its accuracy is defined as $\frac{|\phi \cap \psi|}{|\phi|}$, where $|\phi \cap \psi|$ denotes the number of data objects that match both $\phi$ and $\psi$, and $|\phi|$ denotes the number of data objects that match $\phi$. When its accuracy is 1, the rule is consistent; otherwise it is inconsistent. In this paper we refer to a decision rule simply as a rule.
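The accuracy of a rule can be computed directly from its definition. The following Python sketch is illustrative only: `matches`, `rule_accuracy` and the three examples are hypothetical, and conditions and decisions are represented as attribute-value dictionaries.

```python
def matches(example, conditions):
    """True if the example has the required value for every listed attribute."""
    return all(example.get(a) == v for a, v in conditions.items())


def rule_accuracy(examples, conditions, decision):
    """Number of examples matching conditions and decision, over those matching the conditions."""
    covered = [ex for ex in examples if matches(ex, conditions)]
    if not covered:
        raise ValueError("the rule's conditions match no example")
    correct = [ex for ex in covered if matches(ex, decision)]
    return len(correct) / len(covered)


# Hypothetical examples, for illustration only.
examples = [
    {"Size": "Full", "Max-speed": "High", "Quality": "Good"},
    {"Size": "Full", "Max-speed": "High", "Quality": "Excellent"},
    {"Size": "Full", "Max-speed": "Low", "Quality": "Good"},
]
acc_inconsistent = rule_accuracy(
    examples, {"Size": "Full", "Max-speed": "High"}, {"Quality": "Good"}
)
acc_consistent = rule_accuracy(
    examples, {"Size": "Full", "Max-speed": "Low"}, {"Quality": "Good"}
)
```

The first rule covers two examples but only one of them has the decision Good, so its accuracy is 0.5 and it is inconsistent; the second rule has accuracy 1 and is consistent.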

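2.2 Variable Precision Rough Set

The variable precision rough set [11] is an extension of the original rough set. By setting a threshold value, it loosens the strict definition of the approximation boundary. For a given decision table $T = (U, C, D, V_a)_{a \in C \cup D}$, a threshold $0.5 < \beta \le 1$ and an indiscernibility relation $I$ on $U$, we define $\underline{X}_I^{\beta} = \bigcup \{I(x) : \frac{|I(x) \cap X|}{|I(x)|} \ge \beta\}$ as the lower approximation set and $\overline{X}_I^{\beta} = \bigcup \{I(x) : \frac{|I(x) \cap X|}{|I(x)|} > 1 - \beta\}$ as the upper approximation set.

Similarly, the rough rule set can be defined as $Q \to d \subseteq P(I_Q) \times P(I_d)$, $Q \subseteq C$, where $\langle X, Y \rangle \in Q \to d \Leftrightarrow X \subseteq \overline{Y}_{I_Q}^{\beta}$. The set $Q \xrightarrow{\det} d = \{\langle X, Y \rangle \in Q \to d : X \subseteq \underline{Y}_{I_Q}^{\beta}\}$ contains the consistent rules, and the positive region of $Q \to d$ is $V_1 = \bigcup \{X \in P(I_Q) : \langle X, Y \rangle \in Q \xrightarrow{\det} d\}$.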
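The effect of the threshold can be seen in a short Python sketch of the two variable precision approximations. The function names, the partition and the target set are hypothetical, and the inequalities follow the usual variable precision reading (inclusion ratio at least $\beta$ for the lower set, greater than $1 - \beta$ for the upper set).

```python
def vp_lower(classes, target, beta):
    """Union of classes whose overlap ratio with `target` is at least beta."""
    return set().union(
        *(c for c in classes if len(c & target) / len(c) >= beta)
    )


def vp_upper(classes, target, beta):
    """Union of classes whose overlap ratio with `target` exceeds 1 - beta."""
    return set().union(
        *(c for c in classes if len(c & target) / len(c) > 1 - beta)
    )


# Hypothetical partition of U = {1, ..., 12} and target set X.
classes = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}, {10, 11}, {12}]
X = {1, 2, 3, 5, 6, 12}

strict_lower = vp_lower(classes, X, beta=1.0)    # classical lower approximation
relaxed_lower = vp_lower(classes, X, beta=0.75)  # tolerates 25% exceptions
relaxed_upper = vp_upper(classes, X, beta=0.75)
```

With $\beta = 1$ only the class {12} lies in the lower approximation. With $\beta = 0.75$ the class {1, 2, 3, 4}, three quarters of which belongs to X, is admitted as well, so rules built on it count as consistent at this threshold.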

3 Data Analysis Method based on Rough Set

One aim of data analysis methods based on rough sets is to induce decision rules from raw data. For given raw data, selecting different indiscernibility classes yields different decision rule sets. A mining algorithm for rough decision rules selects suitable indiscernibility classes according to some criterion, finally obtaining the optimal decision rule set. There are many criteria for evaluating the obtained decision rule set, such as quality of classification [6] and entropy [2]. In this paper we discuss the latter.

3.1 Entropy of Information

In information theory, entropy can be used to measure the uncertainty of an information system. If the probability distribution of a stochastic variable $X$ is $\{p_1, p_2, p_3, \ldots, p_s\}$ with $\sum_{i=1}^{s} p_i = 1$, the entropy of $X$ is

$H(X) = \sum_{i=1}^{s} p_i \log_2 \frac{1}{p_i}$.

For an information system $L = \langle U, \Omega, V_q, f_q \rangle_{q \in \Omega}$, each indiscernibility relation $I$ on $U$ can be seen as a stochastic variable whose values are the indiscernibility classes $X_1, X_2, \ldots, X_s$ of the partition $P(I)$ of $U$ by $I$. Its probability distribution can be estimated as $p(\hat{X} = X_i) = \frac{|X_i|}{|U|}$, $i = 1, \ldots, s$. The entropy of the partition $P(I)$ is defined similarly, as follows:

Definition 3.2 For each indiscernibility relation $I$ on $U$ whose partition $P(I)$ of $U$ consists of the indiscernibility classes $X_1, X_2, \ldots, X_s$, the entropy of the partition $P(I)$ is $H(I) = \sum_{i=1}^{s} \frac{|X_i|}{|U|} \log_2 \frac{|U|}{|X_i|}$. For a data object in $U$ that is partitioned by an indiscernibility relation, $H(I)$ measures the least number of steps needed on average to determine which indiscernibility class the object belongs to.
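Definition 3.2 translates into a few lines of Python. The sketch below is illustrative; the function name and the three partitions of an eight-object universe are hypothetical.

```python
from math import log2


def partition_entropy(classes):
    """H(I): sum over the classes of (|X_i|/|U|) * log2(|U|/|X_i|)."""
    n = sum(len(c) for c in classes)  # |U|, since the classes partition U
    return sum(len(c) / n * log2(n / len(c)) for c in classes)


# Hypothetical partitions of an 8-object universe.
even_split = [{1, 2, 3, 4}, {5, 6, 7, 8}]
uneven_split = [{1, 2, 3, 4}, {5, 6}, {7}, {8}]
single_class = [{1, 2, 3, 4, 5, 6, 7, 8}]

h_even = partition_entropy(even_split)
h_uneven = partition_entropy(uneven_split)
h_single = partition_entropy(single_class)
```

Two equal classes give an entropy of 1 bit, the finer uneven partition gives 1.75 bits, and a partition with a single class carries no uncertainty (entropy 0).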

In [2], Ivo Düntsch and Günther Gediga use entropy to measure the uncertainty of an information system.

Consider an information system $L = \langle U, \Omega, V_q, f_q \rangle_{q \in \Omega}$, where $U$ is a finite set of objects, $\Omega$ is a finite set of attributes, and for each $q \in \Omega$, $V_q$ is the set of values of $q$ and $f_q$ is an information function $U \to V_q$. Let $Q \subseteq \Omega$ and let $d \in \Omega$ be the decision attribute. $P(I_d)$ is a partition of $U$ with classes $Y_i$, $i \le k$, each having cardinality $r_i$. The positive region of $Q \to d$ is $V = X_1 \cup X_2 \cup \ldots \cup X_c$, where $X_i \in P(I_Q)$, $i = 1, 2, \ldots, c$. Then

$H^{\det}(Q \to d) = \sum_{i \le c} \hat{\pi}_i \log_2 \left( \frac{1}{\hat{\pi}_i} \right) + |U \setminus V| \cdot \frac{1}{n} \log_2(n)$,

where $\hat{\pi}_i$ is the estimated probability of an element of $X_i$ being in the class $X_i \cap Y_i$, and $n$ is the cardinality of $U$. The normalized entropy measure function is

$S^{\det}(Q \to d) \overset{\mathrm{def}}{=} 1 - \frac{H^{\det}(Q \to d) - H(I_d)}{\log_2(n) - H(I_d)}$,

where $H(I_d) = \sum_{i=0}^{k} \frac{r_i}{n} \log_2 \left( \frac{n}{r_i} \right)$. In this method, the entropy of every consistent rule in the decision table is $\frac{1}{n} \log_2 \left( \frac{n}{m} \right)$, where $m$ is the number of rules equivalent to this rule and $n$ is the number of all rules; the entropy of every inconsistent rule is $\frac{1}{n} \log_2(n)$.

Usually, in an Incomplete Information System, the rules are induced after replacing the missing attribute values and performing attribute reduction. As mentioned above, the missing attribute value of an example is filled in according to the distribution of this attribute's values in the other examples. From the point of view of attribute reduction, not all attribute values are necessary for the decision. If the attribute or attribute value is redundant, we need not replace the missing attribute value. From the point of view of example classification, if we can use the known attribute values to classify the example, the missing attribute value does not affect the classification ability of the system, and replacing the missing value is unnecessary, too. So extracting rules directly from the Incomplete Information System on the one hand [...] rule extraction, and on the other hand can reduce the uncertainty of the rule set.

In an Incomplete Information System, the method of measuring uncertainty in [2] cannot be used directly, so we modify it: for the rules that have no missing attribute values, we compute the entropy as in [2]; otherwise the rule's entropy is set to the maximum, $\frac{1}{n} \log_2(n)$. Because both the inconsistent rules and the rules with missing attribute values are rules we want to remove, their uncertainties should all be set to the maximum. Using this method to compute the entropy, a rule mining algorithm (such as a genetic algorithm) yields a rule set with as few missing attribute values as possible. One rule set of this kind has the form of Table 1.

Table 1
Incomplete decision table

     Condition attributes                      Decision attribute
U    Price    Mileage    Size       Max-speed  Quality
1    High     Low        Full       Low        Good
2    Low      *          Full       Low        Good
3    *        *          Compact    Low        Poor
4    High     *          Full       High       Good
5    *        *          Full       High       Excellent
6    Low      High       Full       *          Good
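The modified entropy described before Table 1 can be sketched in Python as follows. This is one possible reading, stated as an assumption: two rules are taken to be equivalent when they share the same condition values, a group of equivalent rules is consistent when it has a single decision, and `rule_set_entropy` is a hypothetical name. The second rule set is invented for illustration.

```python
from collections import defaultdict
from math import log2

MISSING = "*"  # Table 1 marks a missing attribute value with "*"


def rule_set_entropy(rules, condition_attrs, decision_attr):
    """Sum of the per-rule entropies under the modified measure."""
    n = len(rules)
    groups = defaultdict(list)  # rules sharing the same condition values
    for r in rules:
        groups[tuple(r[a] for a in condition_attrs)].append(r)
    total = 0.0
    for key, members in groups.items():
        decisions = {r[decision_attr] for r in members}
        if MISSING in key or len(decisions) > 1:
            # rule with a missing value, or inconsistent rule: maximum entropy
            total += len(members) * log2(n) / n
        else:
            # consistent rule without missing values: (1/n) * log2(n/m) each
            m = len(members)
            total += m * log2(n / m) / n
    return total


ATTRS = ["Price", "Mileage", "Size", "Max-speed"]
table1_rows = [
    ("High", "Low", "Full", "Low", "Good"),
    ("Low", "*", "Full", "Low", "Good"),
    ("*", "*", "Compact", "Low", "Poor"),
    ("High", "*", "Full", "High", "Good"),
    ("*", "*", "Full", "High", "Excellent"),
    ("Low", "High", "Full", "*", "Good"),
]
# Hypothetical rule set with two equivalent complete rules and one gap.
small_rows = [
    ("High", "Low", "Full", "Low", "Good"),
    ("High", "Low", "Full", "Low", "Good"),
    ("Low", "High", "Full", "Low", "Good"),
    ("Low", "*", "Full", "Low", "Good"),
]
to_rules = lambda rows: [dict(zip(ATTRS + ["Quality"], r)) for r in rows]
h_table1 = rule_set_entropy(to_rules(table1_rows), ATTRS, "Quality")
h_small = rule_set_entropy(to_rules(small_rows), ATTRS, "Quality")
```

For the six rules of Table 1 every rule contributes the maximum $\frac{1}{6} \log_2(6)$, because five rules contain a missing value and the only complete rule has no equivalent ($m = 1$). In the small rule set the two equivalent complete rules lower the total to 1.5, which is what lets a search procedure prefer rule sets with fewer missing values.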

As to the rules that have missing values, some of them do not lead to inconsistent rules. This means the missing attribute value is a redundant attribute value, and it can be replaced by a symbol representing a value we need not know. The rules with missing attribute values that do lead to inconsistent rules can be seen as the elements of an indiscernibility class of the optimal attribute set. In this indiscernibility class, we discern the elements by an attribute in the condition attribute set outside the optimal attribute set. In this indiscernibility class, we define a consistent rule as follows:

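Definition 3.3 In a decision table $T = (U, C, D, V_a)_{a \in C \cup D}$, let $Q \subseteq C$ be the optimal attribute set, $x \in U$, $z \in [x]_{I_Q}$, threshold $0.5 < \beta \le 1$, $b \in C \setminus Q$, $b_i \in V_b$, where $V_b$ is the value set of $b$, let $I_b$ be the indiscernibility relation on $[x]_{I_Q}$, and $Y \in P(I_d)$. Then $b \xrightarrow{\det} d = \{\langle z, Y \rangle \in b \to d : [z]_{I_b} \subseteq \underline{Y}_{I_b}^{\beta}\}$ is the set of consistent rules in $[x]_{I_Q}$.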

In [7], Kryszkiewicz proposed a method for reducing decisions in an Incomplete Information System, but this method can lead to inconsistent rules. We now illustrate the method proposed in this paper by an example.

From an Incomplete Information System, by some rule mining algorithm, we can obtain an incomplete decision table of the form of Table 1 after attribute reduction and removal of redundant rules. In this decision table, for the rules that contain missing attribute values, if the missing attribute value does not lead to inconsistent rules, we replace it in the table by the symbol for a value we need not know; this means the missing attribute value is a redundant attribute value, as in rules 1, 2, 3 and 6. When the missing attribute values do lead to inconsistent rules, as in rules 4 and 5, we should find a suitable attribute among the attributes removed in attribute reduction whose value lets us discern these inconsistent rules. In this way, without estimating the missing attribute values, we obtain a decision table that contains no missing attribute values. In Table 2, attribute A is one of the removed attributes and is selected to discern rules 4 and 5. Table 2 differs from the usual decision table: not all the decision rules have the same number of condition attributes. It is therefore called an irregular decision table.

Table 2
Irregular decision table

     Condition attributes                             Decision attribute
U    A      Price    Mileage    Size       Max-speed  Quality
1           High     Low        Full       Low        Good
2           Low                 Full       Low        Good
3                               Compact    Low        Poor
4    A_1    High                Full       High       Good
5    A_2                        Full       High       Excellent
6           Low      High       Full                  Good
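Because the rules of an irregular decision table test different numbers of attributes, a natural in-memory form is a list of (conditions, decision) pairs in which each rule lists only the attributes it uses. The Python sketch below encodes Table 2 this way; the encoding, the names `IRREGULAR_TABLE` and `classify`, the strings "A1" and "A2" standing for $A_1$ and $A_2$, and the two test cars are assumptions made for illustration.

```python
# Table 2 as (conditions, decision) pairs; blank cells are simply left out.
IRREGULAR_TABLE = [
    ({"Price": "High", "Mileage": "Low", "Size": "Full", "Max-speed": "Low"}, "Good"),
    ({"Price": "Low", "Size": "Full", "Max-speed": "Low"}, "Good"),
    ({"Size": "Compact", "Max-speed": "Low"}, "Poor"),
    ({"A": "A1", "Price": "High", "Size": "Full", "Max-speed": "High"}, "Good"),
    ({"A": "A2", "Size": "Full", "Max-speed": "High"}, "Excellent"),
    ({"Price": "Low", "Mileage": "High", "Size": "Full"}, "Good"),
]


def classify(example, table=IRREGULAR_TABLE):
    """Return the decisions of all rules whose conditions the example meets."""
    return [
        decision
        for conditions, decision in table
        if all(example.get(a) == v for a, v in conditions.items())
    ]


# Hypothetical cars to classify.
car_a2 = {"A": "A2", "Price": "High", "Mileage": "Low",
          "Size": "Full", "Max-speed": "High"}
car_low = {"Price": "Low", "Mileage": "High", "Size": "Full", "Max-speed": "Low"}

decisions_a2 = classify(car_a2)
decisions_low = classify(car_low)
```

The first car is matched only by rule 5, which the added attribute A separates from rule 4, so it is classified as Excellent. The second car is matched by rules 2 and 6, which agree on Good.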

The algorithm for mining rules in an Incomplete Information System that contain as few missing values as possible is chosen to be the general-purpose genetic algorithm program GENESIS [3]. With this algorithm, we obtain the rule set with missing values and the optimal attribute set. The algorithm for obtaining the irregular decision table is described as follows:

Input:
(i) original training data set, $A$;
(ii) the set of $n$ rules extracted from the training data set, $B$;
(iii) rule array $I = [1, 2, 3, \ldots, n]$;
(iv) optimal attribute set, $D$;
(v) the set of attributes removed in attribute reduction, $C$;
(vi) irregular decision table, $E = \emptyset$.

Output: irregular decision table $E$.
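One possible reading of this procedure, covering the inputs just listed and the loop that follows, is sketched below in Python. Several details are assumptions of the sketch rather than statements of the paper: a missing value in an example is taken to match any condition value, two rules are taken to be inconsistent with each other when they agree wherever both values are known but differ in their decision, the "accuracy of classification" of an attribute is the share of examples that receive the majority decision of their attribute value, and the extracted rules keep only those attributes of $D$ on which the grouped examples agree. All function names are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = "*"  # missing attribute value, as in Table 1


def known(rule, attrs):
    """The conditions of `rule` on `attrs` whose values are not missing."""
    return {a: rule[a] for a in attrs if rule[a] != MISSING}


def matches(example, conditions):
    """Assumed matching: a missing value in the example matches anything."""
    return all(example[a] in (v, MISSING) for a, v in conditions.items())


def accuracy(rule, examples, attrs, dec):
    """Share of the examples covered by `rule` that carry its decision."""
    covered = [ex for ex in examples if matches(ex, known(rule, attrs))]
    if not covered:
        return 1.0  # assumption: an uncovered rule is treated as consistent
    return sum(ex[dec] == rule[dec] for ex in covered) / len(covered)


def split_accuracy(examples, attr, dec):
    """Share of examples that get the majority decision of their `attr` value."""
    by_value = defaultdict(Counter)
    for ex in examples:
        by_value[ex[attr]][ex[dec]] += 1
    return sum(max(c.values()) for c in by_value.values()) / len(examples)


def irregular_table(A, B, D, C, dec):
    """Build the irregular decision table E from examples A and rules B."""
    E = []
    pending = list(range(len(B)))  # plays the role of the rule array I
    while pending:
        c = pending.pop(0)
        rule = B[c]
        if len(known(rule, D)) == len(D) or accuracy(rule, A, D, dec) == 1:
            E.append((known(rule, D), rule[dec]))
            continue
        # Pending rules that clash with rule c: equal wherever both values
        # are known, but with a different decision.
        clash = [
            j for j in pending
            if B[j][dec] != rule[dec]
            and all(MISSING in (rule[a], B[j][a]) or rule[a] == B[j][a] for a in D)
        ]
        group = [rule] + [B[j] for j in clash]
        pool = [ex for ex in A if any(matches(ex, known(r, D)) for r in group)]
        added = max(C, key=lambda b: split_accuracy(pool, b, dec))
        by_value = defaultdict(list)
        for ex in pool:
            by_value[ex[added]].append(ex)
        for value, exs in by_value.items():
            conditions = {added: value}
            for a in D:  # keep the attributes on which these examples agree
                values = {ex[a] for ex in exs}
                if len(values) == 1 and MISSING not in values:
                    conditions[a] = values.pop()
            decision = Counter(ex[dec] for ex in exs).most_common(1)[0][0]
            E.append((conditions, decision))
        pending = [j for j in pending if j not in clash]
    return E


D = ["Price", "Mileage", "Size", "Max-speed"]
rows = [  # rows 1 to 5 of Table 1, plus a hypothetical value for attribute A
    ("A1", "High", "Low", "Full", "Low", "Good"),
    ("A1", "Low", "*", "Full", "Low", "Good"),
    ("A2", "*", "*", "Compact", "Low", "Poor"),
    ("A1", "High", "*", "Full", "High", "Good"),
    ("A2", "*", "*", "Full", "High", "Excellent"),
]
examples = [dict(zip(["A"] + D + ["Quality"], r)) for r in rows]
E = irregular_table(examples, examples, D, ["A"], "Quality")
```

In this usage example the same five rows serve as both training data and rule set, which is a simplification. Under the stated assumptions the result coincides with rows 1 to 5 of Table 2: the first three rules are kept with their known conditions only, and rules 4 and 5 are replaced by two rules that additionally test attribute A. Row 6 of Table 1 is left out because, with the wildcard matching assumed here, it would also overlap with row 5.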

While ($I \neq [0, 0, 0, \ldots, 0]$) Do
    Let $c = I(p)$, where $p$ is the position of the first nonzero element in $I$.
    If the $c$-th rule contains no missing attribute value then
        put it into $E$. Let $I(p) = 0$.
    Else If the accuracy of the $c$-th rule is 1 then
        put it into $E$. Let $I(p) = 0$.
    Else
        Select all the rules in $B$ that are inconsistent with the $c$-th rule: $c_1, c_2, \ldots, c_i$, $0 < i < n$, $c_j \in I$, $j = 1, 2, \ldots, i$.
        Select all the examples in $A$ that match any one of the rules $c_j$.
        Classify these examples on every attribute in $C$. The attribute with the highest classification accuracy is selected as the attribute added to $E$.
        From these examples extract rules on the condition attributes in $D$ and the added attribute. Put the extracted rules into $E$.
        Let $I(p) = 0$, $I(c_1) = 0$, $I(c_2) = 0$, $\ldots$, $I(c_i) = 0$.
    End If
End While
End

4 Experiment

To test the utility of our method, we experiment on data tables from the U.C. Irvine Repository of Machine Learning. Our experiments are conducted in Java under Windows 2000, using JDK 1.3 and Oracle 8.1.7. Three data sets are selected: Breast-cancer, Echocardiogram and Hepatitis. On every data set the experiment is run 10 times and the results are averaged. In each run, the data set is divided into two groups: 50% of the data are taken as the training set and the remainder as the testing set. With entropy as the fitness function, a genetic algorithm is used as the rule mining algorithm. After obtaining the optimal condition attribute set, we use the proposed algorithm to remove the missing values from the rules. Table 3 compares the accuracy of the rule set obtained by the irregular decision table method (IDT) with the highest accuracy among the estimating-missing-attribute-value methods (EMAV). The two rule sets have almost the same accuracy: the difference ranges from +0.2% to -4.0%.

Table 3
Comparison of accuracy of different rule sets

Data set          Highest accuracy of    Accuracy of    Accuracy
                  the EMAV methods       IDT method     difference
Breast-cancer     71.5% [5]              71.7%          +0.2%
Hepatitis         86.4% [5]              82.4%          -4.0%
Echocardiogram    93.4% [5]              91.6%          -1.8%


5 Summary

In this paper we propose a method based on an irregular decision table for extracting rules from an Incomplete Information System. Without estimating the missing attribute values, we obtain a rule set with no missing attribute values. Experiments on three databases support the validity of our method: the obtained rule set has almost the same accuracy as the highest one among the methods that estimate missing attribute values.

References

[1] Clark, P., and T. Niblett, The CN2 induction algorithm, Machine Learning 3 (1989), 261-283.

[2] Düntsch, I., and G. Gediga, Uncertainty measures of rough set prediction, Artificial Intelligence 106 (1998), 109-137.

[3] Grefenstette, J.J., Technical Report CS-82-11, Computer Science Department, Vanderbilt University, 1984.

[4] Grzymala-Busse, J.W., On the unknown attribute values in learning from examples, in: Z.W. Ras and M. Zemankova, eds., Proc. of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems (Charlotte, NC, USA, 1991), Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin Heidelberg New York, 542 (1991), 368-377.

[5] Grzymala-Busse, J.W., and H. Ming, A comparison of several approaches to missing attribute values in data mining, in: Ziarko, W., and Y. Yao, eds., RSCTC 2000, LNAI 2005 (2001), 378-385.

[6] Jelonek, J., K. Krawiec and R. Slowinski, Rough set reduction of attributes and their domains for neural networks, Computational Intelligence 11 (1995), 339-347.

[7] Kryszkiewicz, M., Rough set approach to incomplete information systems, Information Sciences 112 (1998), 39-49.

[8] Kononenko, I., I. Bratko and E. Roskar, Experiments in automatic learning of medical diagnostic rules, Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia, 1984.

[9] Pawlak, Z., "Rough Sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, 1991.

[10] Wong, K.C., and K.Y. Chiu, Synthesizing statistical knowledge for incomplete mixed-mode data, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (1987), 796-805.

[11] Ziarko, W., Variable precision rough set model, Journal of Computer and System