Jerzy W. Grzymala-Busse
Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
and
Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, Poland
http://lightning.eecs.ku.edu/index.html
Summary. A new approach to missing attribute values, based on the idea of an attribute-concept value, is studied in this paper. This approach, together with two other approaches to missing attribute values, based on “do not care” conditions and on lost values, is discussed using rough set methodology, including attribute-value pair blocks, characteristic sets, and characteristic relations. Characteristic sets are generalizations of elementary sets, while characteristic relations are generalizations of the indiscernibility relation. Additionally, three definitions of lower and upper approximations are discussed and used for the induction of certain and possible rules.
1 Introduction
In this chapter, data sets are presented in the form of decision tables, where columns are labeled by variables and rows by case (or example) names. Variables are categorized into independent variables, also called attributes, and dependent variables, also called decisions. Usually a decision table has only one decision. The set of all cases that correspond to the same decision value is called a concept (or a class).
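To make this terminology concrete, the following minimal Python sketch represents a small decision table as a dictionary from case names to values and groups the cases into concepts by their decision value. The table, the attribute names, and the function `concepts` are hypothetical and serve only to illustrate the definitions above.

```python
from collections import defaultdict

# Hypothetical decision table: case name -> values of the attributes and the decision.
TABLE = {
    1: {"Temperature": "high", "Headache": "yes", "Flu": "yes"},
    2: {"Temperature": "very_high", "Headache": "yes", "Flu": "yes"},
    3: {"Temperature": "normal", "Headache": "no", "Flu": "no"},
    4: {"Temperature": "high", "Headache": "no", "Flu": "yes"},
    5: {"Temperature": "normal", "Headache": "yes", "Flu": "no"},
}
DECISION = "Flu"  # the single decision; the other columns are attributes


def concepts(table, decision):
    """Group cases by decision value; each group is one concept (class)."""
    groups = defaultdict(set)
    for case, row in table.items():
        groups[row[decision]].add(case)
    return dict(groups)


print(concepts(TABLE, DECISION))  # {'yes': {1, 2, 4}, 'no': {3, 5}}
```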
In most papers on rough set theory it is assumed that values, for all variables and all cases, are specified. For such tables the indiscernibility relation, one of the most fundamental ideas of rough set theory, describes which cases cannot be distinguished from each other.
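For a complete table, two cases are indiscernible with respect to a set B of attributes when they have identical values for every attribute in B; the equivalence classes of this relation are the elementary sets. The sketch below, with a hypothetical table and hypothetical helper names, computes them by grouping cases on their tuple of B-values.

```python
from collections import defaultdict

# Hypothetical complete table (the decision is omitted; only attributes matter here).
TABLE = {
    1: {"Temperature": "high", "Headache": "yes"},
    2: {"Temperature": "high", "Headache": "yes"},
    3: {"Temperature": "normal", "Headache": "no"},
    4: {"Temperature": "high", "Headache": "no"},
    5: {"Temperature": "normal", "Headache": "no"},
}


def indiscernible(table, attributes, x, y):
    """(x, y) is in IND(B) iff x and y agree on every attribute in B."""
    return all(table[x][a] == table[y][a] for a in attributes)


def elementary_sets(table, attributes):
    """Equivalence classes of IND(B), found by grouping on the tuple of B-values."""
    classes = defaultdict(set)
    for case, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(case)
    return list(classes.values())


print(elementary_sets(TABLE, ["Temperature", "Headache"]))  # [{1, 2}, {3, 5}, {4}]
```

Grouping on value tuples avoids comparing every pair of cases, while `indiscernible` states the pairwise definition directly.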
However, in many real-life applications data sets have missing attribute values, or, in other words, the corresponding decision tables are incompletely specified. For simplicity, incompletely specified decision tables will be called incomplete decision tables.
In data mining, two main strategies are used to deal with missing attribute values. The former strategy is based on converting incomplete data sets (i.e., data sets with missing attribute values) into complete data sets and then acquiring knowledge, e.g., by rule induction or tree generation, from the complete data sets. In this strategy, the conversion of incomplete data sets into complete data sets is a preprocessing step that precedes the main process of data mining. In the latter strategy, knowledge is acquired from incomplete data sets, taking into account that some attribute values are missing; the original data sets are not converted into complete data sets.
J.W. Grzymala-Busse: Three Approaches to Missing Attribute Values: A Rough Set Perspective, Studies in Computational Intelligence (SCI) 118, 139–152 (2008)
Typical examples of the former strategy include [4, 11]:
• Replacing missing attribute values by the most common (most frequent) value of the attribute.
• Replacing missing attribute values restricted to the concept. For each concept, missing attribute values are replaced by the most common attribute value restricted to that concept.
• For numerical attributes, a missing attribute value may be replaced by the average value of the attribute.
• For numerical attributes, a missing attribute value may be replaced by the average value of the attribute restricted to the concept.
• Assigning all possible values of the attribute. A case with a missing attribute value is replaced by a set of new cases, in which the missing attribute value is replaced by all possible values of the attribute.
• Assigning all possible values of the attribute restricted to the concept.
• Ignoring cases with missing attribute values. An original data set, with missing attribute values, is replaced by a new data set from which all cases containing missing attribute values are deleted.
• Considering missing attribute values as special values.
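Several of the preprocessing strategies listed above can be sketched in a few lines. The code below is a hypothetical illustration for symbolic attributes only (the numerical averaging variants are omitted), with `None` marking a missing value. Ties for the most common value are broken by first occurrence, which is a choice made for this sketch rather than part of the listed methods.

```python
from collections import Counter
from itertools import product

MISSING = None  # marker for a missing attribute value (an assumption of this sketch)

# Hypothetical incomplete decision table: (attribute values, decision) per case.
ROWS = [
    ({"Temperature": "high", "Headache": MISSING}, "yes"),
    ({"Temperature": "high", "Headache": "yes"}, "yes"),
    ({"Temperature": MISSING, "Headache": "no"}, "no"),
    ({"Temperature": "normal", "Headache": "no"}, "no"),
    ({"Temperature": "high", "Headache": "yes"}, "yes"),
]


def known(rows, attr, concept=None):
    """Known values of attr, optionally restricted to cases of one concept."""
    return [values[attr] for values, d in rows
            if values[attr] is not MISSING and (concept is None or d == concept)]


def most_common(rows, attr, concept=None):
    """Most frequent known value of attr (first seen wins ties); MISSING if none."""
    counts = Counter(known(rows, attr, concept))
    return counts.most_common(1)[0][0] if counts else MISSING


def impute_most_common(rows, by_concept=False):
    """Replace each missing value by the most common value (per concept if requested)."""
    return [({a: v if v is not MISSING else most_common(rows, a, d if by_concept else None)
              for a, v in values.items()}, d)
            for values, d in rows]


def expand_all_values(rows, by_concept=False):
    """Replace a case by one new case per combination of possible values."""
    out = []
    for values, d in rows:
        choices = [[v] if v is not MISSING
                   else sorted(set(known(rows, a, d if by_concept else None)))
                   for a, v in values.items()]
        for combo in product(*choices):
            out.append((dict(zip(values, combo)), d))
    return out


def drop_incomplete(rows):
    """Ignore cases that contain any missing attribute value."""
    return [(values, d) for values, d in rows if MISSING not in values.values()]
```

In this toy table, the third case gets Temperature `high` under plain most-common imputation but `normal` when imputation is restricted to its concept, which shows why the concept-restricted variants can differ.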
The latter strategy is exemplified by the C4.5 approach to missing attribute values [21] or by a modified LEM2 algorithm [9, 13]. In both algorithms, original data sets with missing attribute values are not preprocessed, i.e., they are not first converted into complete data sets.
Note that, from the viewpoint of rough set theory, in the former strategy the conventional indiscernibility relation may be applied to describe the process of data mining since, after preprocessing, the data set is complete (has no missing attribute values). Furthermore, lower and upper approximations, other basic ideas of rough set theory, are also conventional.
In this chapter we will concentrate on the latter strategy used for rule induction, i.e., we will assume that rule sets are induced from the original data sets, with missing attribute values, not preprocessed as in the former strategy.
We will assume that there are three reasons for decision tables to be incomplete. The first reason is that an attribute value, for a specific case, is lost. For example, the attribute value was originally known but, for a variety of reasons, is currently not available; perhaps it was recorded but later erased. The second possibility is that an attribute value was not relevant: the case was decided to be a member of some concept, i.e., was classified or diagnosed, in spite of the fact that some attribute values were not known. For example, it was feasible to diagnose a patient even though some test results were not taken (here attributes correspond to tests, so attribute values are test results). Since such missing attribute values do not matter for the final outcome, we will call them “do not care” conditions. The third possibility is a partial “do not care” condition: we assume that the missing attribute value belongs to the set of typical attribute values for all cases from the same concept. Such a missing attribute value will be called an attribute-concept value. Calling it a concept “do not care” condition would perhaps be better, but this name is too long.
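The three interpretations can be contrasted by asking which known values of an attribute a missing entry is treated as matching. In the sketch below, the markers `?`, `*` and `-` for lost values, “do not care” conditions, and attribute-concept values are assumptions of this illustration, as is the reading of “typical values for the concept” as the known values that actually occur among cases of the same concept. The table and function names are hypothetical.

```python
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"  # markers chosen for this sketch
MARKERS = {LOST, DONT_CARE, ATTR_CONCEPT}

# Hypothetical incomplete decision table with decision "Flu".
TABLE = {
    1: {"Temperature": "high", "Flu": "yes"},
    2: {"Temperature": LOST, "Flu": "yes"},
    3: {"Temperature": DONT_CARE, "Flu": "no"},
    4: {"Temperature": ATTR_CONCEPT, "Flu": "yes"},
    5: {"Temperature": "very_high", "Flu": "yes"},
    6: {"Temperature": "normal", "Flu": "no"},
}


def known_values(table, attr, cases):
    """Specified (non-missing) values of attr among the given cases."""
    return {table[c][attr] for c in cases if table[c][attr] not in MARKERS}


def matching_values(table, decision, case, attr):
    """Known values of attr that the entry of `case` is treated as matching."""
    v = table[case][attr]
    if v == LOST:
        return set()  # lost: the original value is unknown, so nothing is assumed
    if v == DONT_CARE:
        return known_values(table, attr, table)  # any value of the attribute
    if v == ATTR_CONCEPT:
        same_concept = [c for c in table if table[c][decision] == table[case][decision]]
        return known_values(table, attr, same_concept)  # values typical for the concept
    return {v}  # a specified value matches only itself
```

Here case 3 (“do not care”) matches every temperature in the table, while case 4 (attribute-concept) matches only `high` and `very_high`, the temperatures seen in its concept.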
The main objective of this chapter is to study incomplete decision tables, assuming that in the same decision table some attribute values may be lost, some may be “do not care” conditions, and some may be attribute-concept values. Decision tables with lost values and “do not care” conditions were studied in [7–9, 12].
For such incomplete decision tables there are three special cases: in the first case all missing attribute values are lost, in the second case all missing attribute values are “do not care” conditions, and in the third case all missing attribute values are attribute-concept values. Incomplete decision tables in which all missing attribute values are lost were studied, from the viewpoint of rough set theory, for the first time in [13], where two algorithms for rule induction, modified to handle lost attribute values, were presented. This approach was studied later in [23–25], where the indiscernibility relation was generalized to describe such incomplete decision tables.
On the other hand, incomplete decision tables in which all missing attribute values are “do not care” conditions were studied, again from the viewpoint of rough set theory, for the first time in [4], where a method for rule induction was introduced in which each missing attribute value was replaced by all values from the domain of the attribute. Originally, such values were replaced by all values from the entire domain of the attribute, later by attribute values restricted to the same concept to which a case with a missing attribute value belongs. Such incomplete decision tables, with all missing attribute values being “do not care” conditions, were extensively studied in [14, 15], including extending the idea of the indiscernibility relation to describe such incomplete decision tables.
Rough set methodology for incomplete decision tables with missing attribute values of the type attribute-concept values is presented in this chapter for the first time, though it was briefly mentioned in [9].
In general, incomplete decision tables are described by characteristic relations, in a similar way as complete decision tables are described by indiscernibility relations [7].
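Characteristic sets can be sketched on top of attribute-value pair blocks. This introduction does not spell out the definitions, so the sketch below assumes the following block-based construction. A case with a lost value of attribute a belongs to no block of a. A “do not care” case belongs to every block of a. An attribute-concept case belongs to the blocks of the values occurring in its concept. The characteristic set K_B(x) is then the intersection, over a in B, of the block [(a, v)] for a known value v, of the whole universe U for a lost or “do not care” value, and of the union of the matching blocks for an attribute-concept value. The characteristic relation pairs x with every y in K_B(x). The markers, table, and function names are hypothetical.

```python
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"  # markers assumed for this sketch
MARKERS = {LOST, DONT_CARE, ATTR_CONCEPT}

# Hypothetical incomplete decision table with decision "Flu".
TABLE = {
    1: {"Temperature": "high", "Headache": "yes", "Flu": "yes"},
    2: {"Temperature": LOST, "Headache": "yes", "Flu": "yes"},
    3: {"Temperature": "high", "Headache": DONT_CARE, "Flu": "no"},
    4: {"Temperature": ATTR_CONCEPT, "Headache": "no", "Flu": "yes"},
    5: {"Temperature": "normal", "Headache": "no", "Flu": "no"},
}
DECISION = "Flu"
ATTRS = ["Temperature", "Headache"]


def concept_values(table, decision, case, attr):
    """V(x, a): known values of attr among cases in the concept of `case`."""
    d = table[case][decision]
    return {row[attr] for row in table.values()
            if row[decision] == d and row[attr] not in MARKERS}


def attribute_value_blocks(table, decision, attributes):
    """Map each pair (a, v) to its block [(a, v)] of matching cases."""
    blocks = {(a, row[a]): set() for row in table.values() for a in attributes
              if row[a] not in MARKERS}
    for case, row in table.items():
        for a in attributes:
            v = row[a]
            if v == LOST:
                continue  # a lost value belongs to no block of a
            if v == DONT_CARE:
                values = {val for (att, val) in blocks if att == a}  # every block of a
            elif v == ATTR_CONCEPT:
                values = concept_values(table, decision, case, a)
            else:
                values = {v}
            for val in values:
                blocks[(a, val)].add(case)
    return blocks


def characteristic_set(table, decision, attributes, blocks, case):
    """K_B(x): intersection over a in B of K(x, a)."""
    result = set(table)  # start from the universe U
    for a in attributes:
        v = table[case][a]
        if v in (LOST, DONT_CARE):
            continue  # K(x, a) = U leaves the intersection unchanged
        if v == ATTR_CONCEPT:
            k = set().union(*(blocks[(a, val)]
                              for val in concept_values(table, decision, case, a)))
        else:
            k = blocks[(a, v)]
        result &= k
    return result


def characteristic_relation(table, decision, attributes):
    """R(B) = {(x, y) : y in K_B(x)}; in general it is not symmetric."""
    blocks = attribute_value_blocks(table, decision, attributes)
    return {(x, y) for x in table
            for y in characteristic_set(table, decision, attributes, blocks, x)}
```

For this table, case 2 (lost Temperature) gets K_B(2) = {1, 2, 3}, while K_B(1) = {1, 3}, so (2, 1) belongs to the relation but (1, 2) does not, unlike an indiscernibility relation.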
For complete decision tables, once the indiscernibility relation is fixed and the concept (a set of cases) is given, the lower and upper approximations are unique.
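In the standard rough set definitions, the lower approximation of a concept X is the union of the elementary sets contained in X, and the upper approximation is the union of the elementary sets that intersect X. A minimal sketch with a hypothetical table and hypothetical function names:

```python
from collections import defaultdict

# Hypothetical complete table; only the attributes are needed for the approximations.
TABLE = {
    1: {"Temperature": "high", "Headache": "yes"},
    2: {"Temperature": "high", "Headache": "yes"},
    3: {"Temperature": "normal", "Headache": "no"},
    4: {"Temperature": "high", "Headache": "no"},
    5: {"Temperature": "normal", "Headache": "no"},
}


def elementary_sets(table, attributes):
    """Equivalence classes of the indiscernibility relation IND(B)."""
    classes = defaultdict(set)
    for case, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(case)
    return list(classes.values())


def approximations(table, attributes, concept):
    """Return (lower, upper) approximations of a concept (a set of cases)."""
    lower, upper = set(), set()
    for e in elementary_sets(table, attributes):
        if e <= concept:  # elementary set entirely inside the concept
            lower |= e
        if e & concept:  # elementary set overlapping the concept
            upper |= e
    return lower, upper


print(approximations(TABLE, ["Temperature", "Headache"], {1, 2, 3}))  # ({1, 2}, {1, 2, 3, 5})
```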
For incomplete decision tables, for a given characteristic relation and concept, there are three different possible ways to define lower and upper approximations, called singleton, subset, and concept approximations [7]. The singleton lower and upper approximations were studied in [14, 15, 23–25]. Similar ideas were studied in [2, 22, 26–28]. In this chapter we further discuss applications to data mining of all three kinds of approximations: singleton, subset, and concept. As observed in [7], singleton lower and upper approximations are not applicable in data mining.
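The three kinds of approximations can be compared on a hypothetical family of characteristic sets K(x). The sketch assumes definitions of the following form, for a concept X:
- singleton: the cases x with K(x) ⊆ X, and those with K(x) ∩ X ≠ ∅;
- subset: the unions of those K(x) over all x;
- concept: the same unions restricted to x ∈ X.

The characteristic sets below are invented for illustration rather than computed from a real table.

```python
# Hypothetical characteristic sets K(x) over the universe U = {1, ..., 6}.
K = {1: {1, 2}, 2: {2, 4}, 3: {3}, 4: {4, 5}, 5: {2, 5}, 6: {6}}


def union(sets):
    """Union of an iterable of sets (empty set if there are none)."""
    out = set()
    for s in sets:
        out |= s
    return out


def singleton(K, X):
    """Sets of cases x, selected by how K(x) relates to X."""
    return {x for x in K if K[x] <= X}, {x for x in K if K[x] & X}


def subset(K, X):
    """Unions of the characteristic sets K(x), taken over all cases x."""
    return union(K[x] for x in K if K[x] <= X), union(K[x] for x in K if K[x] & X)


def concept_approx(K, X):
    """Like subset, but only characteristic sets of cases x in X are used."""
    return union(K[x] for x in X if K[x] <= X), union(K[x] for x in X if K[x] & X)
```

For X = {1, 2, 3} the singleton lower approximation is {1, 3}, which is not a union of characteristic sets. This hints at why singleton approximations are ill-suited to rule induction. The subset and concept lower approximations are both {1, 2, 3}, while the concept upper approximation {1, 2, 3, 4} is smaller than the subset one, {1, 2, 3, 4, 5}.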
The next topic of this chapter is demonstrating how certain and possible rules may be computed from incomplete decision tables. An extension of the well-known LEM2 (Learning from Examples Module, version 2) rule induction algorithm [1, 5], called MLEM2, was introduced in [6]. LEM2 is a component of the LERS (Learning from Examples based on Rough Sets) data mining system. Originally, MLEM2 induced certain rules from incomplete decision tables with numerical attributes and with missing attribute values interpreted as lost. Using the idea of lower and upper approximations for incomplete decision tables, MLEM2 was further extended to induce both certain and possible rules from a decision table with some numerical attributes and with some attribute values being lost and some attribute values being “do not care” conditions.
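The idea of inducing rules from approximations can be illustrated with a simplified, hypothetical LEM2-style greedy covering; this is not the LERS implementation. Given attribute-value pair blocks and a target set (a lower approximation for certain rules, an upper approximation for possible rules), it repeatedly picks the pair whose block covers the most still-uncovered target cases. Ties go to the smaller block, then to sorted order, a tie-break assumed for determinism. Picking stops once the block of the conjunction lies inside the target, after which redundant conditions and rules are dropped. Numerical attributes and the other MLEM2 extensions are not handled, and the blocks are invented for the example.

```python
# Hypothetical attribute-value blocks [(a, v)] over the universe U = {1, ..., 5}.
U = {1, 2, 3, 4, 5}
BLOCKS = {
    ("Temperature", "high"): {1, 3, 4},
    ("Temperature", "normal"): {5},
    ("Headache", "yes"): {1, 2, 3},
    ("Headache", "no"): {3, 4, 5},
}


def covered(conditions, blocks, universe):
    """[T]: cases that satisfy every attribute-value pair in T."""
    result = set(universe)
    for t in conditions:
        result &= blocks[t]
    return result


def lem2(blocks, universe, target):
    """Greedy LEM2-style local covering of `target`; returns a list of rule conditions."""
    target = set(target)
    covering = []
    goal = set(target)
    while goal:
        conditions, g = [], set(goal)
        candidates = {t for t, b in blocks.items() if b & g}
        while not conditions or not covered(conditions, blocks, universe) <= target:
            if not candidates:
                raise ValueError("target cannot be covered by the given blocks")
            # Most cases of g covered first; ties -> smaller block -> sorted order.
            best = max(sorted(candidates),
                       key=lambda t: (len(blocks[t] & g), -len(blocks[t])))
            conditions.append(best)
            g &= blocks[best]
            candidates = {t for t, b in blocks.items() if b & g} - set(conditions)
        # Drop conditions that are not needed to keep [T] inside the target.
        for t in list(conditions):
            rest = [c for c in conditions if c != t]
            if rest and covered(rest, blocks, universe) <= target:
                conditions = rest
        covering.append(conditions)
        goal = target - set().union(*(covered(c, blocks, universe) for c in covering))
    # Drop rules whose cases are already covered by the remaining rules.
    for rule in list(covering):
        others = [r for r in covering if r is not rule]
        if others and set().union(*(covered(r, blocks, universe) for r in others)) >= target:
            covering = others
    return covering
```

If, for example, {5} were the lower approximation of the concept (Flu, no), `lem2(BLOCKS, U, {5})` would yield the single certain rule (Temperature, normal) → (Flu, no). With an upper approximation such as {1, 2, 3, 4} as the target, each returned condition list would instead be read as a possible rule.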
A preliminary version of this chapter was presented at the Workshop on Foundation of Data Mining, associated with the Fourth IEEE International Conference on Data Mining, Brighton, UK, November 1–4, 2004 [10].