Learning In-between Concept Descriptions Using Iterative Induction
George Potamias1 and Vassilis Moustakis1,2
1 Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH), Science and Technology Park of Crete, P.O. Box 1385, 71110 Heraklion, Crete, Greece
{potamias,moustaki}@ics.forth.gr
2 Department of Production and Management Engineering, Technical University of Crete, University Campus, Kounoupidiana, 73100 Chania, Crete, Greece
Abstract. The perception of a concept may differ before and after learning. Inductive learning systems learn according to the concepts provided and fail to identify concepts that are hidden in, or implied by, the training data sequences. A training instance known to belong to concept ‘A’ either participates in the formation of a rule about concept ‘A’ or turns out to be a problematic instance. A test instance known to belong to concept ‘A’ is either classified correctly or misclassified. Yet an instance (either training or test) may point to a blurred description of concept ‘A’ and thus may lie in between two (or more) concepts. This paper presents a synergistic iterative process model, SIR, which supports the resolution of conflicting or multi-class assignment of instances during inductive learning. The methodology iterates over two steps: (a) induction and (b) formation of new concepts. Experiments on real-world domains from medicine, genomics and finance are presented and discussed.
1 Introduction
Equivocal association of a training example with a rule during inductive learning signals vagueness about the concept the example manifests. The rule points to a class, yet it covers examples that also belong to other classes. A majority metric is often used to tag the rule with a single concept (or class). Majority usually refers to the number of examples (or cases) covered by the rule. Thus a rule that covers 10 cases known to belong to class A and one case known to belong to class B would be tagged as a rule associated with class A. Although equivocal rule(s) – case(s) association may arise for a variety of reasons, it may also point to the existence of concepts that lie in between the concepts steering learning in the first place. Equivocal rule learning may also be attributed to data inconclusiveness (that is, some essential features are missing from the concept and case representation), to the tuning of the generalization heuristics used in learning, or to noisy training cases. Attempts to rectify multi-class assignment include the addition or deletion of attributes, attribute values and training cases [2], [14].
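The majority metric described above can be illustrated with a minimal sketch. It is not part of SIR; the function name `tag_rule` and the list-of-labels representation of the cases covered by a rule are assumptions made only for this illustration. The sketch tags a rule with its majority class and flags the rule as equivocal whenever it covers cases from more than one class.

```python
from collections import Counter


def tag_rule(covered_labels):
    """Tag a rule by majority vote over the class labels of the cases it covers.

    Hypothetical helper for illustration only.
    Returns (majority_class, is_equivocal, minority_counts).
    """
    if not covered_labels:
        raise ValueError("the rule covers no cases")
    counts = Counter(covered_labels)
    # most_common(1) yields the class with the highest count;
    # ties are broken by first occurrence in covered_labels.
    majority_class, _ = counts.most_common(1)[0]
    # Cases from any other class make the rule-case association equivocal.
    minority_counts = {c: n for c, n in counts.items() if c != majority_class}
    return majority_class, bool(minority_counts), minority_counts


# The example from the text: 10 cases of class A and one case of class B.
covered = ["A"] * 10 + ["B"]
print(tag_rule(covered))
```

For the example in the text, the rule is tagged with class A and flagged as equivocal, with the single class B case reported as the minority. Such minority cases are the candidates that may point to an in-between concept.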
Work reported herein was partially supported via the INTERCARE Health Telematics project, funded by the European Commission (HC 4011). Responsibility for the results reported lies with the authors; the results do not represent official INTERCARE views.
G.A. Vouros and T. Panayiotopoulos (Eds.): SETN 2004, LNAI 3025, pp. 164–173, 2004. © Springer-Verlag Berlin Heidelberg 2004
In-between concepts may reflect a tacit property of the domain over which learning is directed. For example, in a medical domain in-between concepts may reflect uncertainty about the status of the patient at some point of clinical decision making, or may arise from a wealth of data (such as gene-expression data) that point to in-between concepts for molecular-based disease characterization [5], [9]. In financial decision making an in-between concept may point to a firm that is neither excellent nor very good, but lies in between the two.
The literature has focused mainly on accuracy and rule comprehensibility and has not addressed in-between class resolution. Borderline concepts are discussed in [7], yet no formal procedure has been established to support their identification and modeling. The use of a dummy feature to resolve borderline concept conflict in medical decision making is suggested in [8]; this approach sheds light on the cause of learned rule ambivalence, but does not support identification and modeling of the intrinsic features of in-between concepts and cases.
Fig. 1. The diagram presents an in-between or borderline concept, which lies between concepts A and B. The shaded area corresponds to the new concept description. SIR supports learning of the “in-between” concept description.
In the present article we elaborate on an iterative learning process that copes with equivocal (or multi-class) rule(s) – case(s) association. The objectives are twofold: (a) to present and demonstrate a methodology for inventing hidden in-between classes that could explain and model multi-class assignment; and (b) to identify representative and borderline cases. We support our approach by coupling the learning process with multi-class resolution heuristics that reflect the respective domain-dependent background knowledge. The work reported herein links conceptually with earlier research by [17] and focuses in practice on the identification of in-between concept descriptions along the lines suggested in Figure 1.
Section 2 overviews the principles of the Synergistic Iterative Re-assignment (SIR) learning process and the SIR heuristics. Section 3 summarizes the implementation of SIR in two learning frameworks: rule induction and similarity-based learning. Section 4 presents results from extensive experimentation using medical and financial decision-making domains. We conclude the paper in Section 5 by discussing the importance of our work for vague concept modeling and decision support, and by suggesting areas for future work.