
2.5 Adaptive Supervised Learning: Related Work

2.5.1 Rule-based learning

Expert knowledge systems often take the form of a set of rules that describe the behavior of an operational system in terms of a reaction (output) to an action (input). The tremendous growth in available data and the changing nature of data-generating processes have created a need for decision support systems that can autonomously learn, evolve and adapt such rules from the available data. In the following, we present a list of adaptive rule-based systems in chronological order; unless otherwise stated, these systems address classification problems.

STAGGER [142] is the first approach that addresses concept drift via incremental concept learning. It operates by finding a symbolic representation of the hidden concept (learning the concept by inspecting the instances with the positive class label). The concept is represented by a set of rules with conjunctive and disjunctive operators between their literals. The search resembles the search in the version space [120], except that (i) the starting point is the set of single literals, (ii) generalization is accomplished by adding more disjunctive conditions and (iii) specialization is achieved by adding more conjunctive conditions. Pruning and backtracking during the search ensure that any concept drift is reflected in the found representations.
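The effect of the two search operators can be illustrated with a minimal Python sketch. It is not the STAGGER algorithm of [142]: the helper names `literal`, `generalize`, `specialize` and `coverage`, as well as the toy examples, are hypothetical and only show that adding a disjunct widens a concept while adding a conjunct narrows it.

```python
def literal(attribute, value):
    """Single-literal concept: true when the example has the given attribute value."""
    return lambda example: example.get(attribute) == value


def generalize(concept, other):
    """Add a disjunctive condition: the result covers at least as many examples."""
    return lambda example: concept(example) or other(example)


def specialize(concept, other):
    """Add a conjunctive condition: the result covers at most as many examples."""
    return lambda example: concept(example) and other(example)


def coverage(concept, examples):
    """Number of examples matched by the concept."""
    return sum(1 for example in examples if concept(example))


# Toy data (assumed for illustration only).
examples = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
]
red = literal("color", "red")
small = literal("size", "small")
red_or_small = generalize(red, small)
red_and_small = specialize(red, small)
```

On this toy data, `coverage(red_or_small, examples)` is larger than `coverage(red, examples)`, which in turn is larger than `coverage(red_and_small, examples)`.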

Floating rough approximation 4 (FLORA4) [179] is an approach from the family of rule-based algorithms that keeps a concept description in terms of three types of propositional predicates: (i) predicates that cover only positive examples, (ii) predicates that cover only negative examples and (iii) predicates that cover both types of examples. Predicates of each type are accompanied by their support, i.e., the number of examples covered by each predicate. A predicate is moved from one set to another depending on the change in its purity. FLORA2, an earlier member of this family, applies a window adjustment heuristic (WAH) in order to cope with concept changes in the setting of incremental concept learning from a stream of objects. This heuristic calls for shrinking the window of covered examples whenever a drop in performance is detected.
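The window-shrinking idea can be sketched in a few lines of Python. This is a simplified illustration and not the WAH of [179]: the class name `ShrinkingWindow` and the parameters `max_size`, `drop_tolerance` and `shrink_factor` are assumptions made for this example.

```python
from collections import deque


class ShrinkingWindow:
    """Toy sliding window that shrinks when accuracy drops (hypothetical, not the WAH of [179])."""

    def __init__(self, max_size=100, drop_tolerance=0.2, shrink_factor=0.5):
        self.max_size = max_size              # upper limit on stored examples
        self.drop_tolerance = drop_tolerance  # accuracy loss that counts as a drop
        self.shrink_factor = shrink_factor    # fraction of examples kept after a drop
        self.examples = deque()
        self.best_accuracy = 0.0

    def add(self, example, recent_accuracy):
        """Store an example; forget the oldest examples if accuracy has dropped."""
        self.examples.append(example)
        if len(self.examples) > self.max_size:
            self.examples.popleft()
        if recent_accuracy < self.best_accuracy - self.drop_tolerance:
            # Suspected concept change: keep only the most recent examples.
            keep = max(1, int(len(self.examples) * self.shrink_factor))
            while len(self.examples) > keep:
                self.examples.popleft()
            self.best_accuracy = recent_accuracy
        else:
            self.best_accuracy = max(self.best_accuracy, recent_accuracy)
        return len(self.examples)
```

The accuracy passed to `add` is assumed to be measured by the caller on recent examples; how performance is estimated is left open here.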

Fast and adaptive classifier by incremental learning (FACIL) [63, 64] introduces an adaptive method for incrementally learning a rule-based system from a data stream. Learned rules are handled based on their purity, i.e., the ratio of the number of covered instances belonging to the majority class to the total number of covered examples. On the arrival of a new sample, a decision is made based on the following ordered criteria:

1. If a consistent rule that covers this sample is found, the purity of this rule is increased.

2. If no covering consistent rule is found, the consistent rule with the minimum generalization cost is chosen and generalized, as long as this cost does not exceed a given threshold.

3. Otherwise, the purity of the inconsistent rules covering this example is decreased.

4. If none of the preceding criteria is satisfied, a new rule consistent with this sample is created. Rules whose purity falls below a predefined threshold are removed and replaced by less general consistent rules.
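The four ordered criteria above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FACIL implementation of [63, 64]: rules are taken to be axis-parallel intervals over numeric attributes, a rule counts as consistent with a sample when it carries the same class label, the generalization cost is the total interval enlargement, and the names `Rule`, `update` and `max_cost` are hypothetical. The removal of low-purity rules is omitted.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    lower: list
    upper: list
    label: str
    hits: int = 1    # covered examples of the rule's own class
    misses: int = 0  # covered examples of other classes

    def covers(self, x):
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, x, self.upper))

    def purity(self):
        return self.hits / (self.hits + self.misses)

    def growth_cost(self, x):
        """Total interval enlargement needed to cover x (an assumed cost measure)."""
        return sum(max(lo - v, 0.0) + max(v - hi, 0.0)
                   for lo, v, hi in zip(self.lower, x, self.upper))

    def generalize(self, x):
        self.lower = [min(lo, v) for lo, v in zip(self.lower, x)]
        self.upper = [max(hi, v) for hi, v in zip(self.upper, x)]
        self.hits += 1


def update(rules, x, y, max_cost=1.0):
    """Apply the four ordered criteria to one labelled sample; returns the step taken."""
    # 1. A covering rule of the same class exists: reinforce it.
    for r in rules:
        if r.label == y and r.covers(x):
            r.hits += 1
            return 1
    # 2. Generalize the cheapest same-class rule if the cost is acceptable.
    same = [r for r in rules if r.label == y]
    if same:
        best = min(same, key=lambda r: r.growth_cost(x))
        if best.growth_cost(x) <= max_cost:
            best.generalize(x)
            return 2
    # 3. Penalize covering rules of a different class.
    wrong = [r for r in rules if r.label != y and r.covers(x)]
    if wrong:
        for r in wrong:
            r.misses += 1
        return 3
    # 4. Nothing applied: start a new point rule at x.
    rules.append(Rule(list(x), list(x), y))
    return 4
```

Returning the number of the applied criterion makes it easy to trace which branch a given sample triggered.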

RILL [50] is an adaptive rule-based algorithm that maintains a set of rules and instances. On the arrival of a new instance that is not yet covered by any of the learned rules, the nearest rule is retrieved and generalized until it covers this instance. The generalized rule is accepted only if the generalization does not reduce the purity of the original rule; otherwise, the generalization is retracted and the new instance is simply added to the set of rules.
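The accept-or-retract step can be illustrated with a short Python sketch. It is not the RILL implementation of [50]: rules are assumed to be axis-parallel boxes stored as dictionaries, the nearest rule is searched only among rules with the same label as the new instance, and the helper names `covers`, `purity`, `distance` and `learn` are hypothetical.

```python
def covers(rule, x):
    return all(lo <= v <= hi for (lo, hi), v in zip(rule["box"], x))


def purity(rule, instances):
    """Fraction of stored instances covered by the rule that share its label."""
    covered = [y for x, y in instances if covers(rule, x)]
    return covered.count(rule["label"]) / len(covered) if covered else 1.0


def distance(rule, x):
    """Distance from x to the rule's box (zero inside it)."""
    return sum(max(lo - v, 0.0, v - hi) for (lo, hi), v in zip(rule["box"], x))


def learn(rules, instances, x, y):
    """Process one labelled instance; returns 'covered', 'generalized' or 'added'."""
    instances.append((x, y))
    if any(covers(r, x) for r in rules):
        return "covered"
    same = [r for r in rules if r["label"] == y]
    if same:
        nearest = min(same, key=lambda r: distance(r, x))
        before = purity(nearest, instances)
        old_box = nearest["box"]
        # Stretch the box just enough to include x.
        nearest["box"] = [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(old_box, x)]
        if purity(nearest, instances) >= before:
            return "generalized"
        nearest["box"] = old_box  # retract the generalization
    # Fall back to storing the instance as a maximally specific rule.
    rules.append({"box": [(v, v) for v in x], "label": y})
    return "added"
```

Purity is recomputed from the stored instances here, which is only practical for small instance sets; it is a simplification chosen for readability.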

The field of soft computing has also developed its own incremental data-driven fuzzy rule-based approaches for regression problems, such as FLEXFIS [110] and eTS+ [7]. These two methods learn the so-called Takagi-Sugeno-Kang (TSK) fuzzy system [165], which consists of TSK rules, each of which has a linear function in its consequent part. The rules are learned in an online manner, after the application of incremental clustering. Despite the similarity of the learned models, FLEXFIS and eTS+ differ technically in the way they learn and update the rules' antecedents and consequents.
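To make the structure of a TSK model concrete, the following Python sketch computes the prediction of a first-order TSK system as the firing-strength-weighted average of the rules' linear consequents. Gaussian membership functions and the rule layout `(centers, widths, coeffs, bias)` are assumptions made for this example; the sketch covers inference only and does not show how FLEXFIS or eTS+ learn or update the rules.

```python
import math


def tsk_predict(x, rules):
    """Weighted average of the rules' linear consequents (first-order TSK inference)."""
    weights, outputs = [], []
    for centers, widths, coeffs, bias in rules:
        # Firing strength: product of Gaussian memberships, one per input attribute.
        w = math.prod(math.exp(-0.5 * ((v - c) / s) ** 2)
                      for v, c, s in zip(x, centers, widths))
        weights.append(w)
        # Linear consequent of the rule evaluated at x.
        outputs.append(sum(a * v for a, v in zip(coeffs, x)) + bias)
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total


# Two hypothetical rules over a single input attribute.
rules = [
    ([0.0], [1.0], [1.0], 0.0),  # near x = 0: y = x
    ([2.0], [1.0], [0.0], 5.0),  # near x = 2: y = 5
]
```

At `x = [1.0]`, both rules fire equally, so the prediction is the plain average of the two consequents.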

In recent years, adaptive rule learning has witnessed a leap in the complexity of the learned rules. AMRules, for example, is a rule induction method for regression on data streams [4]. Each rule is specified by a conjunction of literals on the input attributes in the premise part, and a (linear) function minimizing the root mean squared error in the consequent. Rules are incrementally added on the basis of Hoeffding's bound [81], and their performance is monitored by the Page-Hinkley (PH) test [121], such that a rule is pruned as soon as its performance drops due to a concept change. AMRules can be seen as an extension of the adaptive very fast decision rules (AVFDR) classifier [99] that solves regression problems with model rules. Very fast decision rules (VFDR) [71] incrementally induces a compact set of decision rules from a data stream; it is extended by AVFDR to detect and react to changing data by applying SPC, see Subsection 2.4.
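The two statistical tools mentioned above can be sketched in Python. The function `hoeffding_bound` evaluates the standard bound sqrt(R^2 ln(1/delta) / (2n)), and the class `PageHinkley` implements a common form of the PH test for detecting an increase in the mean of a stream of error values. The parameter names and default values (`alpha`, `threshold`) are assumptions for illustration; how AMRules applies these tools to rule expansion and pruning is not reproduced here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation that the mean of n observations exceeds with probability at most delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream of errors."""

    def __init__(self, alpha=0.005, threshold=5.0):
        self.alpha = alpha          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often denoted lambda)
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, error):
        """Feed one error value; returns True when a change is signalled."""
        self.n += 1
        self.mean += (error - self.mean) / self.n          # running mean
        self.cumulative += error - self.mean - self.alpha  # cumulative deviation
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold
```

As a usage example, feeding a long run of small errors followed by a run of large errors makes `update` return `True` a few observations after the change, while the bound returned by `hoeffding_bound` shrinks as `n` grows.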

In this work, we choose to compare our proposed methods with AMRules and FLEXFIS. This choice is based on the following reasons: (i) they are considered state-of-the-art rule-based evolving methods, and (ii) their implementations are available. A comprehensive explanation of AMRules and FLEXFIS is given in Appendix A.2 and Appendix A.4, respectively.

In document Novel methods for mining and learning from data streams (Page 49-51)