A Method for Extracting Rules from
Incomplete Information System
Yidong Lan 1,2
Department of Mechanics and Engineering Science
Peking University
Beijing 100871, China
Lin Zhang, Liancheng Liu 3
CIMS Engineering Research Center
Tsinghua University
Beijing 100084, China
Abstract
In this paper we propose a method for extracting rules from an Incomplete Information System, based on an irregular decision table. Without estimating missing attribute values, we obtain a rule set that contains no missing attribute values. Experiments show that the accuracy of the obtained rule set is almost the same as the highest accuracy among the methods that estimate missing attribute values.
Key words: Rough set, Incomplete Information System, irregular decision table.
1 Introduction
One of the main tools of data analysis is rule induction from raw data represented by a database. Among the many different data analysis theories and methods, those based on rough sets [9] are new and promising, and have been successfully applied to various areas because of their theoretical simplicity and practicability. Real-life data often contain erroneous, incomplete, uncertain and vague items. In this paper we discuss one form of data incompleteness: missing attribute values. For the convenience of data analysis, data are usually represented by a two-dimensional table. Such a table is called a Knowledge Representation System, or an Information System. When it contains missing attribute values, it is called an Incomplete Information System.
There are several approaches to extracting rules from an Incomplete Information System. The simplest consists in removing the examples with unknown values. Another simple one is to replace an unknown value with the most common value [1]. More complex approaches were presented in [4], [6], [8], [10], [11]. Except for the method of removing examples with missing values, all the above-mentioned methods replace the missing values according to the distribution of attribute values of the other examples in the decision table. In this paper, we propose a method for dealing with missing attribute values based on an irregular decision table: this method can use the information carried by the examples with missing attribute values themselves to reduce the uncertainty of the rules.
1 Supported by National Science Foundation of China (No. 60047015) and National High-Technology Development Program of China (No. 863-511-930-004).
2 Email: [email protected]
3 Email: [email protected]
Open access under CC BY-NC-ND license.
2 Preliminary
2.1 Rough Set
An information system can be formulated as a quadruple $S = (U, A, V_a, f_a)_{a \in A}$, where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty finite set of attributes. For every $a \in A$, $V_a$ is the set of values of $a$, and $f_a : U \to V_a$ is the function between $U$ and $V_a$.
For any $\emptyset \neq B \subseteq A$, the binary relation
$$I_B = \{(x, y) \in U^2 : \forall a \in B,\ f_a(x) = f_a(y)\}$$
is called an indiscernibility relation. It is reflexive, symmetric and transitive, so it is also an equivalence relation. $I_B$ is denoted by $I$ when $B$ is understood.
If $I$ is an indiscernibility relation on $U$, then for every $x \in U$ the set $I(x) = \{y \in U : x I y\}$ is called an indiscernibility class. $[x]_I$ denotes the indiscernibility class of $I$ that contains the element $x \in U$.
The indiscernibility classes $C_1, C_2, \ldots, C_s$ induced by $I$ on $U$ satisfy: $C_i \neq \emptyset$, $i = 1, 2, \ldots, s$; $C_i \cap C_j = \emptyset$, $i \neq j$, $1 \le i, j \le s$; and $\bigcup_{i=1}^{s} C_i = U$. Hence $I$ forms a partition of $U$, denoted $P(I)$. If $X \subseteq U$ and $I$ is an indiscernibility relation on $U$, then the lower approximation of $X$ by $I$ is defined as $\underline{X}_I = \bigcup \{I(x) : I(x) \subseteq X\}$, and the upper approximation as $\overline{X}_I = \bigcup \{I(x) : I(x) \cap X \neq \emptyset\}$. If $\underline{X}_I \neq \overline{X}_I$, the pair $\langle \underline{X}_I, \overline{X}_I \rangle$ is called a rough set. $\underline{X}_I$ is called the positive region of $X$, $U \setminus \overline{X}_I$ the negative region, and $\overline{X}_I \setminus \underline{X}_I$ the boundary.
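The lower and upper approximations defined above can be illustrated with a minimal Python sketch. The toy table, the attribute names and the helper names (`partition`, `lower_approx`, `upper_approx`) are illustrative assumptions, not part of the method itself.

```python
from collections import defaultdict

def partition(universe, table, attrs):
    """Group objects that agree on every attribute in attrs (classes of I_B)."""
    classes = defaultdict(set)
    for x in universe:
        classes[tuple(table[x][a] for a in attrs)].add(x)
    return list(classes.values())

def lower_approx(classes, target):
    """Union of the classes that are fully contained in target."""
    return set().union(*[c for c in classes if c <= target])

def upper_approx(classes, target):
    """Union of the classes that intersect target."""
    return set().union(*[c for c in classes if c & target])

# Hypothetical toy data, not taken from the paper.
table = {
    1: {"Size": "Full", "Speed": "Low"},
    2: {"Size": "Full", "Speed": "Low"},
    3: {"Size": "Compact", "Speed": "Low"},
    4: {"Size": "Full", "Speed": "High"},
}
U = set(table)
classes = partition(U, table, ["Size", "Speed"])
X = {1, 3}

lower = lower_approx(classes, X)   # positive region of X
upper = upper_approx(classes, X)
boundary = upper - lower
negative = U - upper
```

Here objects 1 and 2 are indiscernible, so object 1 can only be placed in the boundary of X, while object 3 forms a class of its own and belongs to the positive region.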
Let $K = (U, A)$ be an information system and let $C, D \subseteq A$ be two subsets of attributes, called condition and decision attributes respectively. Such a system is called a decision table and is denoted $T = (U, C, D, V_a)_{a \in C \cup D}$. With every $x \in U$ we associate a function $d_x : A \to V_a$ such that $d_x(a) = a(x)$ for every $a \in C \cup D$. The function $d_x$ is called a decision rule (in $T$), and $x$ is referred to as a label of the decision rule $d_x$.
If $d_x$ is a decision rule, then the restriction of $d_x$ to $C$, denoted $d_x|C$, and the restriction of $d_x$ to $D$, denoted $d_x|D$, are called the conditions and the decision of $d_x$ respectively. A decision rule often has the form $\psi \to \varphi$, where $\psi$ and $\varphi$ are the conditions and the decision, respectively. For a decision rule $\psi \to \varphi$, its accuracy is defined as
$$\frac{|\psi \cap \varphi|}{|\psi|},$$
where $|\psi \cap \varphi|$ denotes the number of data that match both $\psi$ and $\varphi$, and $|\psi|$ denotes the number of data that match $\psi$. When its accuracy is 1, the rule is consistent; otherwise it is inconsistent. In this paper we refer to a decision rule simply as a rule.
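As a small illustration of the accuracy of a rule, the following Python sketch counts the examples that match the conditions and, among them, those that also match the decision. The function name `accuracy` and the example data are assumptions made for illustration only.

```python
def accuracy(examples, condition, decision):
    """Number of examples matching conditions and decision, divided by the
    number of examples matching the conditions."""
    matched = [e for e in examples
               if all(e.get(a) == v for a, v in condition.items())]
    if not matched:
        return None  # the rule matches nothing, so its accuracy is undefined
    hits = [e for e in matched
            if all(e.get(a) == v for a, v in decision.items())]
    return len(hits) / len(matched)

# Hypothetical examples, not taken from the paper.
examples = [
    {"Size": "Full", "Speed": "High", "Quality": "Good"},
    {"Size": "Full", "Speed": "High", "Quality": "Excellent"},
    {"Size": "Full", "Speed": "Low", "Quality": "Good"},
]

# Inconsistent rule: two examples match the conditions, one has the decision.
acc_a = accuracy(examples, {"Size": "Full", "Speed": "High"}, {"Quality": "Good"})
# Consistent rule: the only matching example has the decision.
acc_b = accuracy(examples, {"Speed": "Low"}, {"Quality": "Good"})
```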
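2.2 Variable Precision Rough Set
The variable precision rough set model [11] is an extension of the original rough set model. By setting a threshold value, it loosens the strict definition of the approximation boundary. For a given decision table $T = (U, C, D, V_a)_{a \in C \cup D}$, a threshold $0.5 < \beta \le 1$ and an indiscernibility relation $I$ on $U$, we define
$$\underline{X}_I = \bigcup \left\{ I(x) : \frac{|I(x) \cap X|}{|I(x)|} \ge \beta \right\}$$
as the lower approximation set and
$$\overline{X}_I = \bigcup \left\{ I(x) : \frac{|I(x) \cap X|}{|I(x)|} > 1 - \beta \right\}$$
as the upper approximation set.
Similarly, a rough rule set can be defined as a relation $Q \to d \subseteq P(I_Q) \times P(I_d)$, $Q \subseteq C$, with $\langle X, Y \rangle \in Q \to d \Leftrightarrow X \subseteq \overline{Y}_{I_Q}$. The set $Q \stackrel{det}{\to} d = \{\langle X, Y \rangle \in Q \to d : X \subseteq \underline{Y}_{I_Q}\}$ contains the consistent rules, and the positive region of $Q \to d$ is $V_1 = \bigcup \{X \in P(I_Q) : \langle X, Y \rangle \in Q \stackrel{det}{\to} d\}$.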
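The effect of the threshold can be seen in a short Python sketch of the two variable precision approximations. It assumes the threshold lies in (0.5, 1]; the function names and the example partition are hypothetical.

```python
def beta_lower(classes, target, beta):
    """Union of the classes whose share of elements in target is at least beta."""
    return set().union(*[c for c in classes if len(c & target) / len(c) >= beta])

def beta_upper(classes, target, beta):
    """Union of the classes whose share of elements in target exceeds 1 - beta."""
    return set().union(*[c for c in classes if len(c & target) / len(c) > 1 - beta])

# Hypothetical partition of U = {1, ..., 10} and target set X.
classes = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9, 10}]
X = {1, 2, 3, 5, 7}   # overlap ratios of the three classes: 0.75, 0.5, 0.25

low_075 = beta_lower(classes, X, 0.75)   # tolerates 25% exceptions
up_075 = beta_upper(classes, X, 0.75)
low_1 = beta_lower(classes, X, 1.0)      # threshold 1 gives the classical case
up_1 = beta_upper(classes, X, 1.0)
```

With the threshold set to 1 no class is fully contained in X, so the lower approximation is empty; lowering the threshold to 0.75 admits the first class, three quarters of which lies in X.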
3 Data Analysis Method based on Rough Set
One aim of data analysis methods based on rough sets is to induce decision rules from raw data. For given raw data, selecting different indiscernibility classes yields different decision rule sets. A mining algorithm for rough decision rules selects suitable indiscernibility classes according to some criterion and finally obtains the optimal decision rule set. There are many criteria for evaluating the obtained decision rule set, such as quality of classification [6] and entropy [2]. In this paper we discuss the latter.
3.1 Entropy of Information
In information theory, entropy can be used to measure the uncertainty of an information system. If the probability distribution of a random variable $X$ is $\{p_1, p_2, p_3, \ldots, p_s\}$, $\sum_{i=1}^{s} p_i = 1$, then the entropy of $X$ is
$$H(X) = \sum_{i=1}^{s} p_i \log_2 \frac{1}{p_i}.$$
For an information system $L = \langle U, \Omega, V_q, f_q \rangle_{q \in \Omega}$, each indiscernibility relation $I$ on $U$ can be seen as a stochastic variable whose values are the indiscernibility classes $X_1, X_2, \ldots, X_s$ of the partition $P(I)$ of $U$ induced by $I$. Its probability distribution can be estimated as $p(\hat{X} = X_i) = \frac{|X_i|}{|U|}$, $i = 1, \ldots, s$. Similarly, the entropy of the partition $P(I)$ is defined as follows:
Definition 3.2 For each indiscernibility relation $I$ on $U$, let $X_1, X_2, \ldots, X_s$ be the indiscernibility classes of the partition $P(I)$ of $U$. Then the entropy of the partition $P(I)$ is
$$H(I) = \sum_{i=1}^{s} \frac{|X_i|}{|U|} \log_2 \frac{|U|}{|X_i|}.$$
For a data object in $U$, which is partitioned by an indiscernibility relation, $H(I)$ evaluates the least number of steps needed, on average, to determine which indiscernibility class the object belongs to.
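The entropy of a partition in Definition 3.2 can be computed directly from the class sizes. The following Python sketch is a minimal illustration; the function name and the example partitions are assumptions.

```python
import math

def partition_entropy(classes):
    """H(I): sum over classes of |X_i|/|U| * log2(|U|/|X_i|)."""
    n = sum(len(c) for c in classes)
    return sum(len(c) / n * math.log2(n / len(c)) for c in classes)

# Four equally sized classes: two binary decisions identify the class.
h_uniform = partition_entropy([{1, 2}, {3, 4}, {5, 6}, {7, 8}])
# A single class: nothing has to be decided.
h_single = partition_entropy([{1, 2, 3, 4}])
```

The first value is 2 bits and the second is 0, which agrees with the reading of $H(I)$ as an average number of decisions.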
In [2], Ivo Düntsch and Günther Gediga use entropy to measure the uncertainty of an information system.
For an information system $L = \langle U, \Omega, V_q, f_q \rangle_{q \in \Omega}$, $U$ is a finite set of objects and $\Omega$ is a finite set of attributes; for each $q \in \Omega$, $V_q$ is the set of values of $q$ and $f_q$ is an information function $U \to V_q$. Let $Q \subseteq \Omega$, let $d \in \Omega$ be the decision attribute, and let $P(I_d)$ be a partition of $U$ with classes $Y_i$, $i \le k$, each having cardinality $r_i$. The positive region of $Q \to d$ is $V = X_1 \cup X_2 \cup \ldots \cup X_c$, where $X_i \in P(I_Q)$, $i = 1, 2, \ldots, c$. Then
$$H^{det}(Q \to d) = \sum_{i \le c} \pi_i \log_2\left(\frac{1}{\pi_i}\right) + |U \setminus V| \cdot \frac{1}{n} \log_2(n),$$
where $\pi_i$ is the estimated probability of an element of $X_i$ being in the class $X_i \cap Y_i$ and $n$ is the cardinality of $U$. The normalized entropy measure function is
$$S^{det}(Q \to d) \stackrel{def}{=} 1 - \frac{H^{det}(Q \to d) - H(I_d)}{\log_2(n) - H(I_d)},$$
where $H(I_d) = \sum_{i=0}^{k} \frac{r_i}{n} \log_2\left(\frac{n}{r_i}\right)$. In this method, the entropy of every consistent rule in the decision table is $\frac{1}{n} \log_2\left(\frac{n}{m}\right)$, where $m$ is the number of rules equivalent to this rule and $n$ is the number of all rules; the entropy of every inconsistent rule is $\frac{1}{n} \log_2(n)$.
Usually, in an Incomplete Information System the rules are induced after the missing attribute values have been replaced and the attributes have been reduced. As mentioned above, the missing attribute value of an example is filled in according to the distribution of this attribute's values in the other examples. From the point of view of attribute reduction, not all attribute values are necessary for the decision: if an attribute or attribute value is redundant, we need not replace the missing value. From the point of view of example classification, if the known attribute values suffice to classify the example, the missing attribute value does not affect the classification ability of the system, and replacing it is unnecessary as well. So extracting rules directly, without first replacing the missing values, can also reduce the uncertainty of the rule set.
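The normalized entropy measure of [2] recalled above can be sketched in Python from class sizes alone. The sketch assumes that the estimated probability of a deterministic class is its relative size $|X_i|/n$, which agrees with the per-rule entropy $\frac{1}{n}\log_2(\frac{n}{m})$ stated above; the function names are hypothetical, and the measure is undefined when $\log_2(n) = H(I_d)$.

```python
import math

def h_det(det_class_sizes, n):
    """Deterministic classes contribute (m/n) * log2(n/m); every object
    outside the positive region contributes (1/n) * log2(n)."""
    covered = sum(det_class_sizes)
    h = sum(m / n * math.log2(n / m) for m in det_class_sizes)
    return h + (n - covered) * math.log2(n) / n

def h_decision(decision_class_sizes):
    """Entropy H(I_d) of the partition induced by the decision attribute."""
    n = sum(decision_class_sizes)
    return sum(r / n * math.log2(n / r) for r in decision_class_sizes)

def s_det(det_class_sizes, decision_class_sizes):
    """Normalized measure: 1 for a perfect rule set, 0 for no deterministic rule."""
    n = sum(decision_class_sizes)
    h_d = h_decision(decision_class_sizes)
    # Assumes log2(n) != H(I_d), i.e. not every decision class is a singleton.
    return 1 - (h_det(det_class_sizes, n) - h_d) / (math.log2(n) - h_d)

# Hypothetical universe of 8 objects with two decision classes of size 4.
s_perfect = s_det([4, 4], [4, 4])   # both decision classes predicted exactly
s_half = s_det([4], [4, 4])         # one class predicted, 4 objects uncovered
s_none = s_det([], [4, 4])          # no deterministic rule at all
```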
In an Incomplete Information System, the uncertainty measure of [2] cannot be used directly, so we modify it as follows: for a rule that has no missing attribute values, we compute its entropy as in [2]; otherwise the rule's entropy is set to the maximum, $\frac{1}{n} \log_2(n)$. Both the inconsistent rules and the rules with missing attribute values are rules we want to remove, so their uncertainties are all set to the maximum. Using this method to compute entropy, a rule mining algorithm (such as a genetic algorithm) yields a rule set with as few missing attribute values as possible. One rule set of this kind has the form of Table 1.
Table 1
Incomplete decision table
     Condition attribute                      Decision attribute
U    Price   Mileage   Size      Max-speed    Quality
1    High    Low       Full      Low          Good
2    Low     *         Full      Low          Good
3    *       *         Compact   Low          Poor
4    High    *         Full      High         Good
5    *       *         Full      High         Excellent
6    Low     High      Full      *            Good
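The modified entropy described before Table 1 can be sketched as follows. Rules are written as pairs of a condition tuple and a decision, `"*"` marks a missing value as in Table 1, and the function names and the second, smaller rule list are assumptions for illustration.

```python
import math

MISSING = "*"  # marker for an unknown value, as in Table 1

def rule_entropy(rules, k):
    """Entropy of the k-th rule; rules are (condition_tuple, decision) pairs."""
    n = len(rules)
    cond, dec = rules[k]
    max_entropy = math.log2(n) / n
    if MISSING in cond:
        return max_entropy                    # rule with a missing value
    same = [d for c, d in rules if c == cond]
    if any(d != dec for d in same):
        return max_entropy                    # inconsistent rule
    return math.log2(n / len(same)) / n       # consistent rule, m = len(same)

def rule_set_entropy(rules):
    return sum(rule_entropy(rules, k) for k in range(len(rules)))

table1 = [
    (("High", "Low", "Full", "Low"), "Good"),
    (("Low", "*", "Full", "Low"), "Good"),
    (("*", "*", "Compact", "Low"), "Poor"),
    (("High", "*", "Full", "High"), "Good"),
    (("*", "*", "Full", "High"), "Excellent"),
    (("Low", "High", "Full", "*"), "Good"),
]
# Hypothetical rule list with two equivalent complete rules.
toy = [
    (("Low", "Full"), "Good"),
    (("Low", "Full"), "Good"),
    (("High", "*"), "Good"),
    (("High", "Compact"), "Poor"),
]

h_table1 = rule_set_entropy(table1)
h_toy = rule_set_entropy(toy)
```

Under this reading, a complete rule that is equivalent to no other rule (m = 1) receives the same value as a rule with a missing value, so the measure only rewards complete rules that cover several examples; this is why every rule of Table 1 contributes the maximum here.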
Some of the rules that have missing values do not lead to inconsistent rules. This means that the missing attribute value is a redundant attribute value, and it can be replaced by a special symbol that represents a value we need not know. The rules with missing attribute values that do lead to inconsistent rules can be seen as the elements of an indiscernibility class of the optimal attribute set. Within this indiscernibility class, we discern the elements by an attribute of the condition attribute set that lies outside the optimal attribute set. In this indiscernibility class, we define a consistent rule as follows:
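Definition 3.3 In a decision table $T = (U, C, D, V_a)_{a \in C \cup D}$, let $Q \subseteq C$ be the optimal attribute set, $x \in U$, $z \in [x]_{I_Q}$, threshold $0.5 < \beta \le 1$, $b \in C \setminus Q$, $b_i \in V_b$, where $V_b$ is the value set of $b$, let $I_b$ be the indiscernibility relation on $[x]_{I_Q}$, and $Y \in P(I_d)$. Then $b \stackrel{det}{\to} d = \{\langle z, Y \rangle \in b \to d : [z]_{I_b} \subseteq \underline{Y}_{I_b}\}$ is the set of consistent rules in $[x]_{I_Q}$.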
In [7], Kryszkiewicz proposed a method for reducing the decisions of an Incomplete Information System, but this method can result in inconsistent rules. We now illustrate the method proposed in this paper with an example.
From an Incomplete Information System, some rule mining algorithm yields, after attribute reduction and removal of redundant rules, an incomplete decision table of the form of Table 1. In this decision table, for a rule that contains a missing attribute value, if the missing value does not lead to inconsistent rules, we replace it in the table with a symbol meaning that the value need not be known; the missing attribute value is then a redundant attribute value, as in rules 1, 2, 3 and 6. When the missing attribute value does lead to inconsistent rules, as in rules 4 and 5, we look for a suitable attribute among the attributes removed by attribute reduction whose values can discern these inconsistent rules. In this way, without estimating the missing attribute values, we obtain a decision table that contains no missing attribute values. In Table 2, attribute A is one of the reduced attributes and is selected to discern rules 4 and 5. Table 2 differs from the usual decision table in that not all decision rules have the same number of condition attributes; it is called an irregular decision table.
Table 2
Irregular decision table
     Condition attribute                             Decision attribute
U    A      Price   Mileage   Size      Max-speed    Quality
1           High    Low       Full      Low          Good
2           Low               Full      Low          Good
3                             Compact   Low          Poor
4    A_1    High              Full      High         Good
5    A_2                      Full      High         Excellent
6           Low     High      Full                   Good
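One way to hold an irregular decision table such as Table 2 in a program is to store, for each rule, only the conditions it actually uses. The Python sketch below follows our reading of the rows of Table 2; the strings "A1" and "A2" stand for the two values of attribute A, and the function `classify` and the example object are hypothetical.

```python
# Each rule keeps only the condition attributes it needs, so rules may
# differ in their number of conditions.
irregular_table = [
    ({"Price": "High", "Mileage": "Low", "Size": "Full", "Max-speed": "Low"}, "Good"),
    ({"Price": "Low", "Size": "Full", "Max-speed": "Low"}, "Good"),
    ({"Size": "Compact", "Max-speed": "Low"}, "Poor"),
    ({"A": "A1", "Price": "High", "Size": "Full", "Max-speed": "High"}, "Good"),
    ({"A": "A2", "Size": "Full", "Max-speed": "High"}, "Excellent"),
    ({"Price": "Low", "Mileage": "High", "Size": "Full"}, "Good"),
]

def classify(example, rules):
    """Return the decisions of all rules whose conditions the example satisfies."""
    return [dec for cond, dec in rules
            if all(example.get(a) == v for a, v in cond.items())]

# Hypothetical object to classify; only rule 5 applies to it.
car = {"A": "A2", "Price": "High", "Mileage": "Low",
       "Size": "Full", "Max-speed": "High"}
decisions = classify(car, irregular_table)
```

Because a rule lists no entry for an attribute it does not need, no special handling of missing values is required when rules are matched.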
To mine rules with as few missing values as possible from an Incomplete Information System, we use the general-purpose genetic algorithm program GENESIS [3]. This algorithm yields the rule set with missing values and the optimal attribute set. The algorithm for obtaining the irregular decision table is described as follows:
Input:
(i) original training data set, A;
(ii) the set of n rules extracted from the training data set, B;
(iii) rule array $I = [1, 2, 3, \ldots, n]$;
(iv) optimal attribute set, D;
(v) the set of attributes removed in attribute reduction, C;
(vi) irregular decision table, $E = \emptyset$.
Output: irregular decision table E.
While ($I \neq [0, 0, 0, \ldots, 0]$) Do
  Let $c = I(p)$, where p is the position of the first nonzero element in I.
  If the c-th rule contains no missing attribute value then put it into E. Let $I(p) = 0$.
  Else If the accuracy of the c-th rule is 1 then put it into E. Let $I(p) = 0$.
  Else Select all the rules in B that are inconsistent with the c-th rule, $c_1, c_2, \ldots, c_i$, $0 < i < n$, $c_j \in I$, $j = 1, 2, \ldots, i$. Select all the examples in A which match any one of the rules $c_j$. Classify these examples on every attribute in C. The attribute with the highest accuracy of classification is selected to be the attribute added in E. From these examples extract rules on the condition attributes in D and the added attribute. Put the extracted rules into E. Let $I(p) = 0$, $I(c_1) = 0$, $I(c_2) = 0$, ..., $I(c_i) = 0$.
  End If
End While
End
4 Experiment
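The loop listed at the end of Section 3, which the experiments in this section apply to the mined rule sets, can be sketched in Python as follows. This is one possible reading, not a definitive implementation. It assumes that missing values are stored as `None`, that two rules conflict when their decisions differ and no shared known condition separates them, that the accuracy of classification on an attribute is the share of examples predicted correctly by the majority decision per attribute value, and that the optimal attribute set D is given implicitly by the condition keys of the rules in B. The training examples and the attribute "Doors" are invented.

```python
from collections import Counter, defaultdict

def known(cond):
    """Keep only the condition values that are not missing (None)."""
    return {a: v for a, v in cond.items() if v is not None}

def matches(example, cond):
    """An example matches a rule if it agrees on every known condition value."""
    return all(example.get(a) == v for a, v in known(cond).items())

def accuracy(examples, cond, dec, dec_attr):
    """Share of the matching examples that carry the rule's decision."""
    hit = [e for e in examples if matches(e, cond)]
    return sum(e[dec_attr] == dec for e in hit) / len(hit) if hit else 0.0

def conflicts(rule_a, rule_b):
    """Assumed test: different decisions, no shared known value that differs."""
    ka, kb = known(rule_a[0]), known(rule_b[0])
    return rule_a[1] != rule_b[1] and all(ka[a] == kb[a] for a in ka.keys() & kb.keys())

def split_accuracy(examples, attr, dec_attr):
    """Accuracy of predicting the majority decision for each value of attr."""
    if not examples:
        return 0.0
    by_value = defaultdict(Counter)
    for e in examples:
        by_value[e.get(attr)][e[dec_attr]] += 1
    return sum(max(c.values()) for c in by_value.values()) / len(examples)

def build_irregular_table(A, B, C, dec_attr):
    E, todo = [], list(range(len(B)))
    while todo:
        c = todo.pop(0)
        cond, dec = B[c]
        if None not in cond.values() or accuracy(A, cond, dec, dec_attr) == 1:
            E.append((known(cond), dec))       # complete or redundant-missing rule
            continue
        group = [c] + [j for j in todo if conflicts(B[c], B[j])]
        todo = [j for j in todo if j not in group]
        pool = [e for e in A if any(matches(e, B[j][0]) for j in group)]
        added = max(C, key=lambda a: split_accuracy(pool, a, dec_attr))
        for j in group:                        # re-extract rules with the added attribute
            cj, dj = B[j]
            values = {e.get(added) for e in pool if matches(e, cj) and e[dec_attr] == dj}
            for v in sorted(values - {None}):
                E.append(({added: v, **known(cj)}, dj))
    return E

ATTRS = ("Price", "Mileage", "Size", "Max-speed")
COLS = ("A", "Doors") + ATTRS + ("Quality",)

def rule(values, dec):
    return dict(zip(ATTRS, values)), dec

B = [rule(("High", "Low", "Full", "Low"), "Good"),        # rules of Table 1
     rule(("Low", None, "Full", "Low"), "Good"),
     rule((None, None, "Compact", "Low"), "Poor"),
     rule(("High", None, "Full", "High"), "Good"),
     rule((None, None, "Full", "High"), "Excellent"),
     rule(("Low", "High", "Full", None), "Good")]
A = [dict(zip(COLS, row)) for row in [                     # invented training data
    ("A1", 4, "High", "Low", "Full", "Low", "Good"),
    ("A1", 2, "Low", "Low", "Full", "Low", "Good"),
    ("A2", 2, "Low", "Low", "Compact", "Low", "Poor"),
    ("A1", 4, "High", "Low", "Full", "High", "Good"),
    ("A2", 4, "High", "High", "Full", "High", "Excellent"),
    ("A1", 2, "Low", "High", "Full", "Low", "Good")]]
E = build_irregular_table(A, B, C=["A", "Doors"], dec_attr="Quality")
```

On this invented data the sketch keeps rules 1, 2, 3 and 6 with their missing entries dropped, and separates rules 4 and 5 by attribute A, which gives six rules of the same shape as an irregular decision table.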
To test the utility of our method, we experiment on data tables from the U.C. Irvine Repository of Machine Learning. Our experiments are conducted in Java under Windows 2000, using JDK 1.3 and Oracle 8.1.7. Three data sets are selected: Breast-cancer, Echocardiogram and Hepatitis. For every data set the experiment is run 10 times and the mean of the results is taken. In each run, the data set is divided into two groups: 50% of the data are taken as the training set and the rest as the testing set. A genetic algorithm with entropy as the fitness function is used as the rule mining algorithm. After obtaining the optimal condition attribute set, we use the proposed algorithm to remove the missing values from the rules. Table 3 compares the accuracy of the rule set obtained by the irregular decision table method (IDT) with the highest accuracy among the methods that estimate missing attribute values (EMAV). The two rule sets have almost the same accuracy: the difference is at most 4.0 percentage points.
Table 3
Comparison of accuracy of different rule sets
                  The highest accuracy    The accuracy of    Accuracy
                  of the EMAV methods     IDT method         difference
Breast-cancer     71.5% [5]               71.7%              +0.2%
Hepatitis         86.4% [5]               82.4%              -4.0%
Echocardiogram    93.4% [5]               91.6%              -1.8%
5 Summary
In this paper we propose a method, based on an irregular decision table, for extracting rules from an Incomplete Information System. Without estimating the missing attribute values, we obtain a rule set with no missing attribute values. Experiments on three data sets support the validity of our method: the obtained rule set has almost the same accuracy as the best of the methods that estimate missing attribute values.
References
[1] Clark, P., and T. Niblett, The CN2 Induction algorithm, Machine Learning 3 (1989), 261-283.
[2] Düntsch, I., and G. Gediga, Uncertainty measures of rough set prediction, Artificial Intelligence 106 (1998), 109-137.
[3] Grefenstette, J. J., Technical Report CS-82-11, Computer Science Department, Vanderbilt University, 1984.
[4] Grzymala-Busse, J. W., On the unknown attribute values in learning from examples, in: Z. W. Ras and M. Zemankova, eds., Proc. of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems (Charlotte, NC, USA, 1991), Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin Heidelberg New York, 542 (1991), 368-377.
[5] Grzymala-Busse, J. W., and H. Ming, A comparison of several approaches to missing attribute values in data mining, in: Ziarko, W., and Yao Y., eds., RSCTC 2000, LNAI 2005 (2001), 378-385.
[6] Jelonek, J., K. Krawiec and R. Slowinski, Rough set reduction of attributes and their domains for neural networks, Computational Intelligence 11 (1995), 339-347.
[7] Kryszkiewicz, M., Rough set approach to incomplete information systems, Information Sciences 112 (1998), 39-49.
[8] Kononenko, I., I. Bratko and E. Roskar, Experiments in automatic learning of medical diagnostic rules, Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia, 1984.
[9] Pawlak, Z., "Rough sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, 1991.
[10] Wong, K. C., and K. Y. Chiu, Synthesizing statistical knowledge for incomplete mixed-mode data, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (1987), 796-805.
[11] Ziarko, W., Variable precision rough set model, Journal of Computer and System