Concept Drift in Process Mining

2.4 Process Discovery

2.4.3 Concept Drift in Process Mining

Concept drift is a problem in the domain of online machine learning that describes changes in a hidden context which potentially cause more or less radical changes in the target concept [284]. That is, changes in the underlying data often make the model built on the "old" data inconsistent; this can only be addressed with regular model updates that reflect the new circumstances. The problem of concept drift complicates the task of learning a model and requires a deviation from common techniques which treat every instance (old and new) with equal importance [228].

The same applies in the domain of process discovery: models of business processes may change over time (see Business Process Flexibility, Section 2.3), i.e. an event log may contain information not of one BP only but of multiple versions or variants of a process. If we take the example log L1 (see Equation 2.1 on page 35) and assume that all traces in the first line occurred before all traces of the second line, then it is fair to assume that the execution of two (completely) different versions of the process was recorded in one log. If L1 is split into two separate logs

L∗₁ = {[b,a]^4, [a,b,d,e]^5, [b,a,e,d]^4, [b,a,c,a,b,c,b,a,d,e,e,d]^6} and

L∗₂ = {[g,g]^2, [f,h]^3, [f,f,h,f,g,h,g,f,h]^8, [g,h,f]^2}

their individual analyses result in the discovery of two different and independent BP models (see Figure 2.18). Both of these models, BP1 and BP2, are more precise (see the precision quality criterion in Section 2.4.1) with regard to their respective source logs L∗₁ and L∗₂ than the overall BP model (see Figure 2.12). Additionally, instead of discovering only one overall representative process, as in the traditional process discovery problem (see Figure 2.12), the process's evolution is discovered, which is arguably a more accurate reflection of the reality recorded in L1. Traditional process discovery algorithms as discussed in Section 2.4.2 work on the assumption that the log describes the behaviour of one single, unchanging business process and thus treat every trace/instance within a log with equal importance irrespective of when it occurred. The result of these algorithms is an "averaged" model that tries to represent all behaviour captured in the event log even if this behaviour is contradictory. Note that this is not the case in the concept drift example from Figure 2.18: here the behaviour is exclusive and can easily be merged.
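The split of L1 into L∗₁ and L∗₂ can be sketched in a few lines of Python. This is only a minimal illustration: it assumes that traces are stored as tuples in their order of occurrence and that the change point is already known. The helper name `split_log` is hypothetical and not taken from any of the cited works.

```python
from collections import Counter

# L1 as a time-ordered list of traces. By assumption of the example, all
# traces of the first version occur before all traces of the second version.
first = (
    [("b", "a")] * 4
    + [("a", "b", "d", "e")] * 5
    + [("b", "a", "e", "d")] * 4
    + [tuple("bacabcbadeed")] * 6
)
second = (
    [("g", "g")] * 2
    + [("f", "h")] * 3
    + [tuple("ffhfghgfh")] * 8
    + [("g", "h", "f")] * 2
)
log = first + second


def split_log(traces, change_index):
    """Split a time-ordered log at a known change point into two multisets."""
    return Counter(traces[:change_index]), Counter(traces[change_index:])


l1_star, l2_star = split_log(log, change_index=len(first))
```

In practice the change index is not known in advance; finding it is the change point detection challenge discussed later in this section.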

[Figure 2.18: timeline of the log L1. BP1 (activities a, b, c, d, e) influences the log up to the moment of change and corresponds to sub-log L∗₁; BP2 (activities f, g, h) influences the log after it and corresponds to sub-log L∗₂.]

Fig. 2.18 Example Concept Drift in the Original Log L1

Because only few real-life processes are in a steady state (as assumed by the traditional process discovery algorithms), detecting, understanding, and dealing with concept drift in the domain of process discovery is of "prime importance for the management of processes" [251], i.e. the problem of concept drift is an important challenge to process discovery algorithms [233, 251]. Since concept drift describes the change of an underlying concept over time, it can be translated into a one-dimensional clustering problem based on time, i.e. grouping instances by their timestamps into a specified number of clusters, which can then be interpreted as well-fitting and well-structured BP model versions. The difficulty is to find the right balance between fine-grained and coarse-grained clustering. An extreme example of too fine-grained clustering is when each trace is considered its own cluster, i.e. represents a BP model of its own (similar to the paths in the flower model in Figure 2.13). An example of too coarse-grained clustering is traditional process discovery, for which all instances in the log belong to one single cluster. The challenge of concept drift has gained more attention recently, driven by particularly dynamic real-life scenarios, e.g. in the domain of health care, where processes are weakly structured (many exceptions) and inductive methods such as the abstraction-based algorithms have difficulty describing the behaviour with one single standardised BP model [248]. Here, the discovery of "temporal patterns" produces simpler models and helps to understand time-separated parts of a log [248].
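The granularity trade-off of such a time-based clustering can be illustrated with a small sketch. It assumes that every trace is reduced to a single numeric timestamp and that clusters are formed by cutting the sorted timestamps at the largest gaps. This gap heuristic and the function name `cluster_by_time` are assumptions made for illustration only; they are not a drift detection method from the cited literature.

```python
def cluster_by_time(timestamps, k):
    """Group timestamps into k contiguous clusters by cutting at the k-1 largest gaps."""
    order = sorted(timestamps)
    if not 1 <= k <= len(order):
        raise ValueError("k must be between 1 and the number of timestamps")
    # Candidate cut positions, ranked by the gap to the preceding timestamp.
    by_gap = sorted(
        range(1, len(order)),
        key=lambda i: order[i] - order[i - 1],
        reverse=True,
    )
    cuts = sorted(by_gap[: k - 1])
    clusters, start = [], 0
    for cut in cuts:
        clusters.append(order[start:cut])
        start = cut
    clusters.append(order[start:])
    return clusters


stamps = [1, 2, 3, 10, 11, 12]
coarse = cluster_by_time(stamps, 1)   # one cluster: traditional discovery
balanced = cluster_by_time(stamps, 2)
fine = cluster_by_time(stamps, 6)     # every trace is its own cluster
```

With `k = 1` the sketch reproduces the coarse-grained extreme and with `k = len(stamps)` the fine-grained extreme; useful values lie in between.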

An important and comprehensive step towards a definition of the problem of concept drift in process mining and its accompanying challenges is provided by Bose et al. in [20, 21, 23]. In this work, three main challenges of concept drift are identified [20, 21, 23]:

• Change Point Detection:

The challenge of detecting when, i.e. where in the log, a process change has taken place.

• Change Localisation and Characterisation:

If a change has been identified, it should be specified further, i.e. what exactly has changed (localisation) and how it has changed (characterisation). With regard to localisation, Bose et al. differentiate between three perspectives: control-flow, data, and resource (similar to what was defined in Section 2.1, page 16). Furthermore, the exact location or region within these perspectives should be identified. Another characterisation of a BP concept drift is its "nature" [23]: (1) sudden drift, i.e. an immediate substitution of a complete BP by another at one single point in time, (2) gradual drift, i.e. both BPs coexist for a limited time during the substitution of one BP by another, (3) recurring drift, i.e. a set of different BPs is substituted back and forth (alternating BPs, e.g. to adapt to seasonally recurring circumstances), and (4) incremental drift, i.e. a drift from one BP to another via smaller incremental intermediate BPs. Note that this classification is not mutually exclusive, e.g. recurring and incremental drifts can be either sudden or gradual. While Section 2.3 on BP flexibility discussed change from an enactment point of view, the concept drift characterisation views change from a diagnosis point of view. For instance, modification policies like "Migrate" or "Flush" respectively represent sudden or gradual drifts. The characterisation from Bose et al. is essentially a different view derived from the BP flexibility domain. It is, in comparison, a simplified view because certain aspects of an enacted change either cannot be extracted from the log, e.g. swiftness or anticipation, or are of no relevance to the concept drift problem, e.g. a momentary change does not constitute a concept drift but should be considered noise.

• Change Process Discovery:

The challenge of discovering a change process that describes the second-order dynamics [23], i.e. of discovering possibly existing higher-level patterns that specify the order and other specifics of recurring changes. For instance, due to seasonal circumstances concept drift patterns could be modelled as a usually non-concurrent process.

Furthermore, Bose et al. distinguish two classes of concept drift analysis on an event log: offline analysis, a scenario where the concept drift discovery on an event log can be performed without any real-time constraints, and online analysis, a scenario where a concept drift has to be discovered in (near) real time. Some solutions for the latter type are also discussed in the following Online Process Discovery Section 2.4.4.

With regard to the first type, offline concept drift discovery, Bose et al. propose a solution in [21] that addresses the first and partly the second of the identified challenges. Though restricted to the control-flow perspective, the proposed approach is able to detect points of change in time and to localise these changes in the control-flow. In this work the log is split up into a number of different parts. For each of them four measures are calculated, upon which a change discovery can be achieved [21]. For each of the activities the relation type count triple is calculated by determining how many activities always, sometimes, or never eventually (not directly) follow the specified activity. Considering L∗₁, the triple is (0, 2, 3) for activity d because no activity follows d in every instance (→0), d is followed by the activities d and e in some of the instances (→2), and d is never followed by the three remaining activities a, b, and c (→3). This measure can also be calculated for the "precedes" relation, i.e. the number of activities that always, sometimes, or never precede (not directly) the specified activity. A second measure is the relation type entropy, i.e. the entropy over the relation type count vector. In contrast to relation type count and entropy, which are global measures over each instance in the entire (sub-)log, Bose et al. also suggest two local features [23]: the window count, which is the number of times a specified activity is followed by another one within a certain window, and the J-measure, the probability of a particular activity being followed by another within a certain window, calculated from the window count values.
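The two global measures can be reproduced for the example log L∗₁ with the following sketch. It rests on two assumptions that the text above does not fix: an activity counts as following another in a trace if it occurs anywhere after the first occurrence of that activity, and "always" is judged over the traces that contain the specified activity. The entropy is taken as the Shannon entropy of the normalised triple. The function names are hypothetical.

```python
from math import log2


def follows_sets(trace):
    """Map each activity to the set of activities that eventually follow it."""
    result = {}
    for i, act in enumerate(trace):
        result.setdefault(act, set()).update(trace[i + 1:])
    return result


def relation_count(log, activity):
    """Return the (always, sometimes, never) triple for the 'follows' relation."""
    alphabet = {a for trace in log for a in trace}
    # Only traces containing the activity are considered (assumption).
    relevant = [follows_sets(t)[activity] for t in log if activity in t]
    always = sometimes = never = 0
    for other in alphabet:
        hits = sum(other in followers for followers in relevant)
        if relevant and hits == len(relevant):
            always += 1
        elif hits == 0:
            never += 1
        else:
            sometimes += 1
    return always, sometimes, never


def relation_entropy(triple):
    """Shannon entropy (base 2) of the normalised relation type count triple."""
    total = sum(triple)
    return -sum((c / total) * log2(c / total) for c in triple if c)


# Distinct traces of the example log; multiplicities do not affect the triple.
l1_star = [("b", "a"), ("a", "b", "d", "e"), ("b", "a", "e", "d"),
           tuple("bacabcbadeed")]
triple = relation_count(l1_star, "d")   # (0, 2, 3), as in the text
entropy = relation_entropy(triple)
```

For activity d the sketch yields the triple (0, 2, 3) stated above and an entropy of roughly 0.97 bits.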

These four "feature" values can then be used to identify a concept drift. This is achieved by splitting the (time-ordered) log into sub-logs of the same length, i.e. containing the same number of traces, and computing feature sets for these sub-logs. If significant changes in the feature values of neighbouring sub-logs are identified, a concept drift can be assumed. In order to find the most probable position for such a drift a hypothesis test is executed [23], which can generally be described as sliding the sub-log ranges over the log and finding the point at which the feature values differ the most. This method has been evaluated using the methodology of rediscovery, i.e. the positions/times of artificially imposed changes recorded in a log were successfully detected [23], but also in a real-life setting by detecting drifts in three different process event logs of a Dutch municipality [22]. However, the method has a few shortcomings: it is only effective if a suitable sub-log length is chosen; changes in more complex constructs like loops cannot be sufficiently detected since it is not captured "how many times" an activity eventually follows another activity in one trace; and gradual or incremental drifts are hard to detect with a fixed sub-log length, e.g. the specified length can at the same time be both too small and too large to detect some of the incremental intermediate BP changes.
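The sliding comparison of neighbouring sub-logs can be sketched as follows. The sketch assumes one numeric feature value per trace in time order and replaces the hypothesis test of [23] with a plain difference of window means; this is a deliberately simplified stand-in, not the statistical test used by Bose et al. The function name `most_likely_change_point` is hypothetical.

```python
def most_likely_change_point(values, window):
    """Return (index, gap) where the means of the two adjacent windows differ most."""
    best_index, best_gap = None, -1.0
    # Slide a pair of adjacent, equally sized windows over the sequence.
    for i in range(window, len(values) - window + 1):
        left = values[i - window:i]
        right = values[i:i + window]
        gap = abs(sum(left) / window - sum(right) / window)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, best_gap


# Hypothetical feature values with an artificial sudden change at index 10.
feature = [0.0] * 10 + [1.0] * 10
index, gap = most_likely_change_point(feature, window=4)
```

The choice of `window` corresponds to the sub-log length discussed above: a window that is too large blurs short-lived intermediate versions, while one that is too small makes the comparison sensitive to noise.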

The problem of concept drift does not only present a challenge for process discovery but can also help to solve it. One such approach is explained in the work by Weber et al. [275, 276]: here the α-algorithm [249], as introduced previously in Section 2.4.2, is used to create Petri Nets, which are then transformed into probabilistic deterministic finite automata (PDFA) [260, 261]. Similar to the approach of Bose et al., a hypothesis test is carried out which uses a sliding window to extract the individual sub-logs. These are examined for concept drifts after transformation to PDFAs, i.e. the distribution generated by the discovered PDFA is compared with the "ground truth". This is essentially carried out for every new trace, and it is argued that the approach complies with real-time constraints because it guarantees a result within a pre-specified amount of time (regardless of how long that time is). To achieve this upper bound on the run-time of each iteration, the approach uses an algorithm presented in [274] to determine the trace length necessary to correctly (re-)discover the structure of a Petri Net model with a certain probability. The detection of concept drift in this approach is mainly focussed on probabilities and not on the structure of the Petri Net [275]. Also, due to the possibility of infinite states, it is restricted to acyclic models.

Another approach, by Carmona et al. in [34], extends the initial process discovery approach introduced in [33] to detect concept drift by utilising the theory of abstract interpretation [41]. First, an abstract representation in the form of a polyhedron is built for some initial traces in a log. Subsequent traces are then examined as to whether or not they lie within the initial polyhedron. If so, the trace is considered to stem from the same process. If a trace lies outside of it, a changed process is indicated and the polyhedron is updated. However, this technique is again a factual analysis that is not able to handle noisy behaviour, i.e. to "ignore" infrequent behaviour.
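The principle of checking new traces against an abstract representation can be sketched with a much coarser abstraction than a polyhedron. The following example assumes that each trace is reduced to its activity counts and that the abstraction is an interval (bounding box) per activity. Both choices, and the class name `IntervalAbstraction`, are simplifications made for illustration and are not the construction used in [34].

```python
from collections import Counter


class IntervalAbstraction:
    """Bounding box over activity counts; a coarse stand-in for a polyhedron."""

    def __init__(self, traces):
        self.bounds = {}   # activity -> (lowest count, highest count) seen
        self.size = 0      # number of traces absorbed so far
        for trace in traces:
            self.update(trace)

    def _interval(self, activity):
        # Activities never seen so far implicitly have the interval [0, 0].
        return self.bounds.get(activity, (0, 0))

    def contains(self, trace):
        """Check whether the trace's activity counts lie inside the box."""
        counts = Counter(trace)
        return all(
            self._interval(a)[0] <= counts[a] <= self._interval(a)[1]
            for a in set(self.bounds) | set(counts)
        )

    def update(self, trace):
        """Widen the box just enough to include the trace."""
        counts = Counter(trace)
        for a in set(self.bounds) | set(counts):
            if self.size == 0:
                low = high = counts[a]
            else:
                low, high = self._interval(a)
            self.bounds[a] = (min(low, counts[a]), max(high, counts[a]))
        self.size += 1


box = IntervalAbstraction([("a", "b", "d", "e"), ("b", "a", "e", "d")])
same_process = box.contains(("b", "a", "d", "e"))
drift_indicated = not box.contains(("g", "h", "f"))
box.update(("g", "h", "f"))
```

As in the approach described above, a single deviating trace is enough to indicate a change and to widen the abstraction, which is why infrequent (noisy) behaviour cannot be ignored.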

Generally, all of these approaches can be seen as a type of pre-processing with which the log is split up into different parts that are then individually analysed to discover the different versions of a BP as models. The change point information identified may also help to address the third challenge of concept drift: to discover the change process. In [88] it is examined how better decision support can be achieved for flexible processes by using information about the processes' changes. For this, two different process mining algorithms, the multiphase miner [255] and an adapted method by Cortadella et al. based on the theory of regions [40], have been used in order to identify a change process. The process is discovered from a so-called change event log that is provided by the adaptive workflow management system ADEPTflex [191] (see Section 2.3.2) and stores detailed context information about each change that was applied to the workflow. These change logs, however, are in most cases not available, and thus detailed information about the changes/concept drifts is not known at the time of analysis. The technique of Bose et al. can be a substitute for analysing these changes, but since it is an a-posteriori analysis, the information about the change characteristics is less detailed than information logged at the time the change was enacted.

In document Descriptive business process models at run time (Page 58-62)