Future Work - A context aware approach for handling concept drift in classification

In what follows, we list a number of future directions that we believe are interesting extensions and improvements to our work.

7.3.1 Context Identification

The first interesting extension is to study the incremental identification of context variables. That is, we plan to explore how the importance of context variables can be identified over time in the absence of pre-existing training data, i.e. when training data becomes available only gradually. The proposed information-theoretic measures make such incremental learning possible. Moreover, it would also be interesting to explore the case where the importance of the context variables may change over time. In that case, the weights of the context variables should not remain static once identified, but should be updated continuously as new data examples become available. For this purpose, forgetting approaches (windowing or weighting) can be applied to the available past observations, so that context identification becomes a continuous learning process.
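As a rough illustration of the forgetting idea, the following Python sketch maintains exponentially decayed co-occurrence counts and derives a plug-in mutual information estimate between a discrete context variable and the class label. It is only a stand-in under stated assumptions: it handles discrete values and unconditional mutual information rather than the conditional measure used in this work, and the class name DecayedMutualInformation and its decay parameter are hypothetical.

```python
import math
from collections import defaultdict


class DecayedMutualInformation:
    """Plug-in estimate of I(C; Y) from exponentially decayed counts.

    Hypothetical helper for illustration; not the estimator of this work.
    """

    def __init__(self, decay=0.99):
        # decay in (0, 1]; decay = 1.0 means that nothing is forgotten.
        self.decay = decay
        self.joint = defaultdict(float)  # decayed counts of (c, y) pairs
        self.total = 0.0                 # decayed number of observations

    def update(self, c, y):
        # Fade all past evidence, then add the new example with weight 1.
        for key in self.joint:
            self.joint[key] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.joint[(c, y)] += 1.0

    def value(self):
        # Mutual information (in bits) of the decayed joint distribution.
        if self.total == 0.0:
            return 0.0
        pc = defaultdict(float)
        py = defaultdict(float)
        for (c, y), n in self.joint.items():
            pc[c] += n / self.total
            py[y] += n / self.total
        mi = 0.0
        for (c, y), n in self.joint.items():
            p = n / self.total
            if p > 0.0:
                mi += p * math.log2(p / (pc[c] * py[y]))
        return mi
```

Calling update(c, y) for each arriving example and reading value() whenever a weight is needed keeps the estimate current without revisiting past data; a smaller decay forgets old evidence faster.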

7.3.2 Example Weighting

To account for resource boundedness, it would be interesting to explore the possibility of an online context-based weighting approach to adjust example weights without the need to reiterate over the past data.

Moreover, the weighting process assumes continuous drift, without attempting to identify when a drift actually occurs, and thus re-weights the training examples every time the classifier needs to be updated with new data. Therefore, it would also be interesting to explore a trigger-based re-weighting approach, which combines example weighting with drift detection. That is, re-weighting of the training examples would be triggered only if the recent contextual circumstances change. This would eliminate unnecessary reiteration over past examples to update their weights in the absence of drift.
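One possible shape of such a trigger is sketched below in Python. It is an assumption-laden illustration rather than a proposed design: the context is taken to be a single discrete value per example, change is measured by the total variation distance between a reference histogram and a sliding window, and the names ContextChangeTrigger, window and threshold are hypothetical.

```python
from collections import Counter, deque


def total_variation(p, q):
    """Total variation distance between two Counters treated as histograms."""
    n_p, n_q = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)


class ContextChangeTrigger:
    """Signals when the recent context distribution departs from a reference."""

    def __init__(self, window=50, threshold=0.25):
        self.window = window            # number of recent context values kept
        self.threshold = threshold      # distance above which we trigger
        self.reference = Counter()      # histogram of the reference context
        self.recent = deque(maxlen=window)

    def observe(self, context_value):
        """Record one context value; return True if re-weighting is due."""
        self.recent.append(context_value)
        if len(self.recent) < self.window:
            return False
        current = Counter(self.recent)
        if not self.reference:
            # The first full window becomes the reference context.
            self.reference = current
            return False
        if total_variation(self.reference, current) > self.threshold:
            # Adopt the new context as reference and request re-weighting.
            self.reference = current
            return True
        return False
```

Under this sketch, the example weights would be recomputed only when observe returns True, so a stream whose context stays stable never incurs a pass over the past examples.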

Finally, learning the equivalence among context values incrementally, in the absence of pre-existing training knowledge, is also an interesting extension to our work.

7.3.3 Drift Detection

The context-aware adaptive multi-model learners proposed for drift detection do not account for the case of gradual changes. Therefore, an interesting research direction is to design appropriate extensions to the proposed drift detection and classifier selection mechanisms (for example, introducing a weighting function into the model selection process), so that the model can also operate in gradually changing environments.

Testing the behaviour of the proposed hybrid model learner with other change detection methods would also be interesting. It is also worth noting that the proposed context-based drift detection approach may be beneficial for the case of unlabelled streaming data (i.e. where the class labels of the examples are not readily available). In contrast, existing change detection approaches mainly depend on monitoring the error rate of the classification model, which makes them inapplicable to unlabelled data.

7.3.4 Other Research Directions

In addition to the extensions mentioned above, it would also be interesting to explore the following research directions. The first is to test the behaviour of the proposed algorithms with other change types and scenarios (such as local concept changes and virtual changes), the possibility of new class emergence, and the presence of complex concept drift (i.e. when more than one type of drift is present at the same time). The second important direction is to address the constraints of data stream learning (e.g. infinite size, limited memory and computation resources, one-shot treatment etc.), which would extend the applicability of the proposed algorithms to other application domains.

Finally, the usefulness of the proposed algorithms can be further tested with other real-world domains that exhibit concept drift and contextual characteristics, such as consumer credit scoring and stock market price prediction. Both these domains are subject to concept drift that is usually linked to changes in different macroeconomic variables, which represent contextual characteristics for these domains.

Appendix A Evaluation of k-Nearest Neighbour Approach

In the following, we test the accuracy of the conditional mutual information (CMI) estimator given in Equation 4.20 and analyse the influence of various factors that could affect this estimate, using bias and relative bias as performance criteria.

A.1 Experimental Setup and Results

In order to evaluate the performance of the k-nearest neighbour estimate, we use multivariate normally distributed variables with zero mean and unit variance, because in this case the estimates can be compared with analytic results. Specifically, the theoretical value of the conditional mutual information for normally distributed variables can be derived in terms of Equation 4.31, with the entropies being calculated according to the definition of entropy for a multivariate normal distribution, given as follows [24]:

$$H(X_1, X_2, \ldots, X_d) = \frac{1}{2} \log_2\left[(2\pi e)^d \, |\Sigma|\right] \qquad \text{(A.1)}$$

where $|\Sigma|$ is the determinant of the covariance matrix $\Sigma$.
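The entropy of a multivariate normal distribution in Equation A.1 can be evaluated numerically as in the following Python sketch. The function names are illustrative only, and the Cholesky-based log-determinant is just one convenient choice that assumes a symmetric positive-definite covariance matrix given as a list of rows.

```python
import math


def log2_det_spd(cov):
    """log2 of the determinant of a symmetric positive-definite matrix.

    Uses a Cholesky factorisation cov = L L^T, so det = prod(L[i][i] ** 2).
    """
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    log2_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
                log2_det += 2.0 * math.log2(L[i][j])
            else:
                L[i][j] = s / L[j][j]
    return log2_det


def gaussian_entropy_bits(cov):
    """Entropy in bits of a d-dimensional normal variable with covariance cov."""
    d = len(cov)
    return 0.5 * (d * math.log2(2.0 * math.pi * math.e) + log2_det_spd(cov))
```

For a single variable with unit variance, gaussian_entropy_bits([[1.0]]) reduces to (1/2) log2(2πe), which is roughly 2.05 bits.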

The bias of the estimates with respect to the theoretical values will be computed in terms of the absolute bias, $|\hat{I} - I|$, and the relative bias, $\left|\frac{\hat{I} - I}{I}\right|$, where $\hat{I}$ is the average of the obtained estimates and $I$ is the theoretical value.

Let us assume that $Z = (C, Y, X)$ is a normally distributed random variable of dimension $d_z$ (the variables $C$, $Y$, $X$ can be either scalars or multidimensional) with zero mean and covariance matrix $M$, where $M_{i,j} = 1$ for $i = j$ and $M_{i,j} = r$ for $i \neq j$, with $i, j = 1, \ldots, d_z$. Here, we assume that $C$ and $Y$ are scalars, while $X$ is an element of a higher-dimensional space with dimension $d_x = 3$. Each variable is made to be correlated with the other variables with correlation $r$, ranging from 0.9 to 0.6 with a step size of 0.1.
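For this equicorrelated setup, the theoretical reference value can be sketched in Python as follows. The sketch assumes that Equation 4.31 is the usual entropy decomposition I(C; Y | X) = H(C, X) + H(Y, X) - H(X) - H(C, Y, X), and it relies on the closed-form determinant of a d x d matrix with unit diagonal and off-diagonal entries r, namely (1 - r)^(d-1) (1 + (d - 1) r). All function names are hypothetical.

```python
import math


def equicorr_det(d, r):
    """Determinant of a d x d matrix with ones on the diagonal and r elsewhere."""
    return (1.0 - r) ** (d - 1) * (1.0 + (d - 1) * r)


def gaussian_entropy_bits(d, det):
    """Entropy in bits of a d-dimensional normal variable with |Sigma| = det."""
    return 0.5 * math.log2((2.0 * math.pi * math.e) ** d * det)


def theoretical_cmi_bits(r, dx=3):
    """I(C; Y | X) in bits for scalar C, Y and dx-dimensional X (assumed form)."""
    h_cx = gaussian_entropy_bits(dx + 1, equicorr_det(dx + 1, r))
    h_yx = gaussian_entropy_bits(dx + 1, equicorr_det(dx + 1, r))
    h_x = gaussian_entropy_bits(dx, equicorr_det(dx, r))
    h_cyx = gaussian_entropy_bits(dx + 2, equicorr_det(dx + 2, r))
    return h_cx + h_yx - h_x - h_cyx


# Reference values for the correlations considered in the experiments.
for r in (0.9, 0.8, 0.7, 0.6):
    print(f"r = {r:.1f}: I(C; Y | X) = {theoretical_cmi_bits(r):.4f} bits")
```

Because C and Y are scalars, the same quantity can be cross-checked against the partial correlation of C and Y given X, which for this covariance structure equals r / (1 + d_x r); the value is then -(1/2) log2(1 - ρ²) with ρ denoting that partial correlation.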

Here, with reference to a predictive model, $X$ corresponds to a vector of the primary variables, while $C$ and $Y$ correspond to the candidate context variable and the target variable, respectively. Three factors will be considered in evaluating the accuracy of the CMI estimator: the value of the parameter $k$, the dimension of the dataset, and the number of available data samples. The related experiments are detailed next.
