Concept Change over Time - Novel methods for mining and learning from data streams

In the conventional supervised learning setting, the data-generating process (the probability measure P on Z = X × Y) is assumed to be stationary; it is also assumed that examples are sampled independently according to P.

Under the assumptions of stationarity and independence, each new observation z_t is generated at random according to P, i.e., the probability of observing a specific z ∈ Z is given by¹

P(z) = P(x, y) = P(x) · P(y | x)    (2.6)
               = P(y) · P(x | y).   (2.7)

P(y) represents the probability distribution in the output space, or the so-called prior. The conditional probability P(y | x) is the posterior probability, which is the probability of observing y after observing x.
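To make the factorizations (2.6) and (2.7) concrete, the following minimal sketch checks them numerically on a small discrete joint distribution. The table `joint` and all helper names are illustrative assumptions and do not come from the original text.

```python
# Hypothetical joint distribution P(x, y) over x in {0, 1} and y in {"a", "b"}.
joint = {(0, "a"): 0.30, (0, "b"): 0.10, (1, "a"): 0.15, (1, "b"): 0.45}


def marginal_x(joint, x):
    """P(x): sum the joint over all outputs y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(joint, y):
    """P(y), the prior: sum the joint over all inputs x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def posterior(joint, y, x):
    """P(y | x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(joint, x)


def likelihood(joint, x, y):
    """P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(joint, y)


# Largest deviation between P(x, y) and each of the two factorizations.
max_error = 0.0
for (x, y), p in joint.items():
    via_26 = marginal_x(joint, x) * posterior(joint, y, x)   # Eq. (2.6)
    via_27 = marginal_y(joint, y) * likelihood(joint, x, y)  # Eq. (2.7)
    max_error = max(max_error, abs(p - via_26), abs(p - via_27))
```

Both products reproduce every entry of the joint table up to floating-point rounding, so `max_error` stays negligibly small.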

Once the assumption of stationarity is given up (while keeping that of independence), the probability measure P generating the next observation may change over time. Formally, we are no longer dealing with a single measure P, but with a sequence of measures (P_1, P_2, P_3, ...), where z_t is generated by P_t. One speaks of a concept change if these measures are not all equal [94]. In their survey paper, Gama et al. [76] present a coherent taxonomy of the different types of concept change and map them to changes in the underlying distributions. Thus, we distinguish three types of concept change:

Real concept change is defined by a change in the posterior P(y | x).²

¹ We slightly abuse notation by using the same symbol for the joint probability and its marginals.
² This type of change is known as concept shift in [139], despite the fact that recent works preserve

Virtual concept change is a change in the data's probability P(x) in (2.6), i.e., in the distribution of the inputs [178]. This change may or may not cause a change in the concept, i.e., in the conditional distribution P(y | x) [167, 178]. The different definitions of virtual change found in the literature are listed in [76] as follows:

– The term “virtual drift” was initially defined in [178] as a phenomenon caused by insufficient knowledge about the data distribution rather than by a change.
– A virtual concept change makes a revision of the induced model necessary due to the change in the data distribution, as proposed by [167].
– A drift is referred to as virtual if its target concept remains unaffected [51].
– A virtual drift is called a sampling shift in [139], a temporary drift in [104], and a feature change in [77].

Global and local concept change [167] are properties characterizing the scale at which the change occurs, independent of its nature (real or virtual). Unlike global drifts, local drifts occur only in a subspace or a partition of the input space X.
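The distinction between real and virtual change can be illustrated with a small sketch that compares two discrete measures, each given by its input distribution P(x) and its posterior P(y | x). The function `classify_change`, the tolerance `tol`, and the toy distributions are hypothetical and serve only as an illustration of the definitions above.

```python
def classify_change(p_x_old, post_old, p_x_new, post_new, tol=1e-9):
    """Label the change between two discrete measures.

    p_x_*  : dict mapping x -> P(x)
    post_* : dict mapping x -> dict mapping y -> P(y | x)
    Returns a set containing "virtual" if P(x) changed and
    "real" if P(y | x) changed for at least one x.
    """
    def differs(a, b):
        keys = set(a) | set(b)
        return any(abs(a.get(k, 0.0) - b.get(k, 0.0)) > tol for k in keys)

    labels = set()
    if differs(p_x_old, p_x_new):
        labels.add("virtual")
    xs = set(post_old) | set(post_new)
    if any(differs(post_old.get(x, {}), post_new.get(x, {})) for x in xs):
        labels.add("real")
    return labels


# Toy measure (assumed values): two inputs, two class labels.
p_x = {0: 0.5, 1: 0.5}
post = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}

# Only the input distribution moves: virtual change.
p_x_shifted = {0: 0.8, 1: 0.2}
# Only the posterior at x = 1 moves: real change, local to x = 1.
post_flipped = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.8, "b": 0.2}}

virtual_only = classify_change(p_x, post, p_x_shifted, post)
real_only = classify_change(p_x, post, p_x, post_flipped)
```

In this toy setting the second change is also local in the sense described above, since the posterior differs only on the part of the input space where x = 1.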

The problem of concept change over time has a second important criterion, namely the rate of change at which the new concept appears and replaces the previously observed concept. This thesis relies on the terminology defined in [171], which classifies the different types of concept change into categories based on the pattern in which the new concept replaces the old one, as presented in Figure 2.2:

Concept shift refers to an abrupt (sudden) change in the generating process, that is, the probability measure P_t is very different from P_{t−1}. Hence, the new concept has to be learned, and any previously learned concept becomes out of date.

Gradual drift refers to a progressive change of the data-generating process, such as the change from P_1 to P_2. A gradual drift starts at time t_1 and ends at time t_2 when, at each time t ∈ [t_1, t_2], the measures P_1 and P_2 are sampled with probabilities λ(t) and 1 − λ(t), respectively. The function λ(t) is a monotonically decreasing sampling probability, with λ(t) = 1 for any t ≤ t_1 and λ(t) = 0 for any t ≥ t_2.

[Figure 2.2: patterns of concept change over time: incremental drift, gradual drift, concept shift, recurring concept.]

applying the first measure P_1. As a result, aged examples may remain partially consistent with the current measure. The gradual change from P_1 to P_2 occurs by sampling from the first measure P_1 with a probability close to 1 at the beginning of the drift; this probability decreases monotonically until it vanishes at the end of the drift, leaving P_2 as the dominant measure.
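The sampling scheme behind a gradual drift can be sketched as follows. The linear form of λ(t), the two Gaussian samplers standing in for P_1 and P_2, and the names `lam` and `gradual_stream` are assumptions made for illustration; the definition only requires λ(t) to be monotonically decreasing from 1 to 0.

```python
import random


def lam(t, t1, t2):
    """Linear, monotonically decreasing lambda(t): 1 up to t1, 0 from t2 on."""
    if t <= t1:
        return 1.0
    if t >= t2:
        return 0.0
    return (t2 - t) / (t2 - t1)


def gradual_stream(n, t1, t2, sample_p1, sample_p2, rng):
    """Yield (t, source, z_t); z_t comes from P1 with probability lambda(t),
    otherwise from P2."""
    for t in range(n):
        # rng.random() lies in [0, 1), so lambda = 1 always picks P1
        # and lambda = 0 always picks P2.
        if rng.random() < lam(t, t1, t2):
            yield t, "P1", sample_p1(rng)
        else:
            yield t, "P2", sample_p2(rng)


rng = random.Random(0)
stream = list(gradual_stream(
    n=300, t1=100, t2=200,
    sample_p1=lambda r: r.gauss(0.0, 1.0),  # stand-in for P1 (assumed)
    sample_p2=lambda r: r.gauss(3.0, 1.0),  # stand-in for P2 (assumed)
    rng=rng,
))
```

Before t_1 every observation stems from P_1, after t_2 every observation stems from P_2, and in between both measures are interleaved, which is why older examples can remain partially consistent with the current measure.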

Incremental drift refers to a smooth transition between two probability measures, e.g., a change from P_{t_1} to P_{t_2}. An incremental drift starts at time t_1 and ends at time t_2 when the intermediate measures P_{t_1+1}, P_{t_1+2}, ..., P_{t_2−2}, P_{t_2−1} are sampled at the times t_1+1, t_1+2, ..., t_2−2, t_2−1, such that consecutive distributions P_t and P_{t−1} are statistically nearly indistinguishable. As an example, one could imagine the measures to be the Gaussian distributions P_{t_1} ∼ N(µ_1, σ_1^2) and P_{t_2} ∼ N(µ_2, σ_2^2). The incremental change occurs when the data are generated from intermediate distributions whose mean shifts slightly from µ_1 to µ_2 and whose variance shifts from σ_1^2 to σ_2^2.
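The Gaussian example can be sketched in a few lines. Linear interpolation of the mean and the variance is only one possible choice of intermediate measures and is an assumption here, as are the names `incremental_params` and `incremental_stream` and the parameter values.

```python
import random


def incremental_params(t, t1, t2, mu1, var1, mu2, var2):
    """Mean and variance of P_t, linearly interpolated between t1 and t2."""
    if t <= t1:
        return mu1, var1
    if t >= t2:
        return mu2, var2
    w = (t - t1) / (t2 - t1)  # fraction of the drift already completed
    return mu1 + w * (mu2 - mu1), var1 + w * (var2 - var1)


def incremental_stream(n, t1, t2, mu1, var1, mu2, var2, rng):
    """Yield (t, z_t) with z_t drawn from the Gaussian P_t."""
    for t in range(n):
        mu, var = incremental_params(t, t1, t2, mu1, var1, mu2, var2)
        # random.gauss expects the standard deviation, not the variance.
        yield t, rng.gauss(mu, var ** 0.5)


rng = random.Random(0)
samples = list(incremental_stream(
    n=300, t1=100, t2=200,
    mu1=0.0, var1=1.0, mu2=4.0, var2=9.0,  # assumed example parameters
    rng=rng,
))
```

With these values the mean moves by only 0.04 per time step, so two consecutive measures P_{t−1} and P_t are hard to tell apart, whereas P_{t_1} and P_{t_2} differ clearly.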

A recurring concept is a concept that reappears at least once after its disappearance from the data.
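As a small illustration of this definition, the sketch below scans a sequence of active concept labels and reports those that return after having been replaced. The label sequence `schedule` and the function name `recurring_concepts` are hypothetical; in practice the active concept is not observed directly.

```python
def recurring_concepts(active):
    """Return the concepts that reappear after having been replaced.

    `active` is a sequence giving the label of the concept that
    generates the data at each time step.
    """
    seen, recurring, previous = set(), set(), None
    for concept in active:
        # A concept recurs if it becomes active again after a different one.
        if concept != previous and concept in seen:
            recurring.add(concept)
        seen.add(concept)
        previous = concept
    return recurring


# Concept A is replaced by B and later returns; C appears only once.
schedule = ["A"] * 3 + ["B"] * 2 + ["A"] * 2 + ["C"]
result = recurring_concepts(schedule)
```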

Finally, it is worth mentioning that Webb et al. [176] made the first attempt to characterize the different types of concept change in a formal framework.
