
Process Discovery at Run-time

Through process discovery, a causal connection from the system to the discovered BP model can be established. Related work in process discovery has been discussed in Section 2.4 and forms the basis of this gap analysis.

3.3.1 Gap Analysis: Process Discovery Algorithms

The general problem is that, due to uncontrolled deviations from the planned BP during implementation and enactment, the real BP needs to be discovered directly from execution event logs (see Section 2.2.2). Process discovery and its challenges (see Section 2.4.1) are well researched, with a very large number of publications in this area. The most relevant techniques (see Section 2.4.2), however, each have shortcomings with regard to the functional requirements necessary to establish a causal connection from the BPMS to the discovered BP model:

1. No Support for Infrequent and Incomplete Logs: Infrequent (noisy) and incomplete logs are a reality in real-life scenarios. To establish a strong causal connection, an algorithm must be able to interpret any given real-life scenario. However, many algorithms cannot deal with incomplete and noisy logs, e.g. abstraction-based and other factual algorithms that do not employ a heuristics-based approach [45,249], or region-theory-based algorithms [132,243].

2. Representation not on BP Abstraction Level: There is a noticeable difference between business process specifications at design-time and the representations of business processes discovered from logs. Whereas prominent standards for business process models, e.g. BPMN and EPC, are BP-domain-specific, the results of process discovery algorithms conform to general-purpose representations like Petri Nets, e.g. [243,249], or other abstract languages such as Causal Nets, e.g. [278], or fuzzy models, e.g. [89]. For a business analyst, comprehending these general-purpose languages for decision making is a difficult task, because (1) they are of a different representation and abstraction level than what she is familiar with, and (2) the mapping between the process modelled at design-time and the process model discovered at run-time can be difficult to establish, since the two conform to different languages. This can be mitigated by language transformations, which, however, are not guaranteed to produce a sound BP-domain model, since the languages reside on different levels of abstraction. Also, techniques in the area of process discovery focus almost exclusively on the discovery of the control-flow perspective and neglect other perspectives such as resources and performance (see Section 2.1). Techniques that mine information about these perspectives are mostly associated with the model enhancement discipline (introduced in Section 2.2.3) and are discussed in more detail in Section 2.5.3. Furthermore, representations such as Petri Nets and Causal Nets cannot easily be used for further analysis and decision support, since they have difficulty expressing perspectives other than the control-flow, e.g. resources and performance. This limitation of expressiveness is fittingly summarised by Aalst in [230], who concludes that "The world is not a Petri Net" and motivates the need for a more representative, BP-domain-specific solution.

3. Non-Deterministic or Non-Automatic: Generally, non-deterministic techniques, e.g. genetic algorithms [26,147] or neural nets [39], as well as techniques that require manual effort, e.g. [243], are not suited for the discovery of causally connected run-time BP models. Although non-deterministic analyses such as genetic algorithms do not suffer from any of the above-mentioned shortcomings, the simple fact that no correct or stable output can be guaranteed for the same input makes them unsuitable for maintaining a causal connection in a real-time setting. The same applies to approaches requiring manual effort.
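To make the contrast drawn in item 1 concrete, the following minimal Python sketch computes the dependency measure of the HeuristicsMiner (Weijters and van der Aalst), a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1), where |a > b| counts how often a is directly followed by b. It is an illustration under stated assumptions, not one of the cited implementations: the toy log, the helper names and the threshold of 0.9 are hypothetical. A single noisy trace A, C, B would lead an algorithm that takes every observed ordering at face value to treat B and C as concurrent, whereas the frequency-based measure keeps B → C.

```python
from collections import Counter


def directly_follows_counts(log):
    """Count |a > b|: how often activity a is directly followed by b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """HeuristicsMiner dependency measure a => b for a != b, in (-1, 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical log: 40 regular traces and a single noisy one.
log = [["A", "B", "C"]] * 40 + [["A", "C", "B"]]
counts = directly_follows_counts(log)
THRESHOLD = 0.9  # hypothetical dependency threshold

activities = sorted({act for trace in log for act in trace})
edges = [(a, b) for a in activities for b in activities
         if a != b and dependency(counts, a, b) >= THRESHOLD]
print(edges)  # prints: [('A', 'B'), ('B', 'C')]
```

With this hypothetical threshold, only A → B and B → C are kept; the rare ordering C → B and the derived A → C (dependency 0.5) fall below it.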

Other limitations of process discovery algorithms exist, but they depend more on other aspects of the use case than on establishing a causal connection at run-time. One such example is the balance between over-fitting and under-fitting: the quality of a solution often depends on what matters most for a specific use case. For example, under-fitting might be preferred to discover human-readable, non-spaghetti nets, whereas on other occasions the discovery of extremely accurate, over-fitting models might be desired.

Another such use-case-dependent limitation is that of footprint abstraction: most process discovery approaches are mainly based on footprints representing direct activity successions, i.e. some form of a local directly-follows relation, e.g. [126,249,278]. Global relations, e.g. activity x1 is eventually followed by activity x2, are either not taken into account or are only used to identify and reflect special behaviour, e.g. non-free-choice constructs [279]. Especially with regard to the discovery of parallel constructs, the eventually-follows relation has a clear benefit compared to the directly-follows relation: all "follows" combinations of the elements of two parallel paths are easier to observe when they are required to follow "eventually" rather than "directly". This will be discussed in more detail in Section 5.2. The author argues that results can be improved and computation costs reduced by basing the discovery on global relations (see Section 5.8.1).
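The benefit for parallel constructs can be illustrated with a small, hypothetical log in which the two parallel paths B1 → B2 and C1 → C2 are each executed as a block, once in each order. The minimal Python sketch below (helper names are assumptions; it is not the discovery approach of Section 5.2) collects the cross-path pairs that are observed in both orders under each relation: with only two traces, no such pair is found under the directly-follows relation, while under the eventually-follows relation all four combinations are.

```python
from itertools import combinations


def directly_follows(log):
    """Pairs (a, b) where b immediately succeeds a in at least one trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def eventually_follows(log):
    """Pairs (a, b) where b occurs somewhere after a in at least one trace."""
    return {(t[i], t[j]) for t in log for i, j in combinations(range(len(t)), 2)}


def observed_both_ways(relation, path_x, path_y):
    """Cross-path pairs seen in both orders, i.e. evidence for parallelism."""
    return {(x, y) for x in path_x for y in path_y
            if (x, y) in relation and (y, x) in relation}


# Hypothetical log: A, then paths B1->B2 and C1->C2 in parallel, then D.
log = [
    ["A", "B1", "B2", "C1", "C2", "D"],
    ["A", "C1", "C2", "B1", "B2", "D"],
]
path_b, path_c = ["B1", "B2"], ["C1", "C2"]

df_par = observed_both_ways(directly_follows(log), path_b, path_c)
ef_par = observed_both_ways(eventually_follows(log), path_b, path_c)
print(len(df_par), len(ef_par))  # prints: 0 4
```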

In the context of this gap analysis, the Inductive Miner (IM) [126], together with its extensions [127,128] for dealing with incomplete and noisy logs, deserves special mention: to the best of the author's knowledge, it is the only algorithm that does not suffer from any of the three main limitations that apply when PD algorithms are used to establish a causal connection at run-time. However, two limitations remain: (1) the IM is based on local rather than global relations, and (2) it has a potentially exponential run-time on incomplete logs, because the involved constraint solving is an NP-hard problem for which an SMT solver is used [128].

3.3.2 Gap Analysis: Online Process Discovery

Business processes can stretch out over months, have a very high instantiation frequency, or both. For a slow, long-running BP, the traditional process discovery approach might temporarily create causality from the system to the discovered model; this is not the case for BPs with very frequent state changes. Here, the causality from system to model is already "outdated" by the time the discovery analysis has finished. An additional challenge in the BPM domain that is not addressed by traditional process discovery algorithms is that of uncontrolled deviations, as elaborated earlier in Section 3.1. These can manifest themselves in many different forms, e.g. (1) a predicted condition never applies in reality, (2) exceptional or unforeseen conditions occur that demand ad-hoc adjustments in the execution flow, or (3) a gradual or abrupt deviation from the planned BP towards a more applicable or optimal (in real life) execution flow takes place. The first two deviation examples are addressed by the more sophisticated process discovery algorithms of recent years, which can deal with noisy and incomplete logs (see discussion above). However, in recent years, and due to the advancement of Big Data, frequently changing environments, and fast machine-guided processes, the third deviation example, continuous and uncontrolled BP evolution, has gained more relevance. These observations form the foundation of (initial) research on concept drift in the business process domain, which is reviewed in Section 2.4.3. In reality, only a "...few processes are in steady-state and due to changing circumstances processes evolve" [138]. This view contrasts with that of traditional PD approaches, in which BPs are assumed to be in steady-state. Section 2.4.3 discusses different solutions to address concept drift, e.g. [21,23,34,274–276], including their individual limitations. The biggest of these with regard to descriptive BP run-time models is that their purpose is essentially different: they focus on the detection and localisation of concept drift and are rather pre-processing methods that identify where to split logs so that the resulting sub-logs can be analysed individually by a traditional PD algorithm. The mere detection and localisation of such drifts in static event logs is not sufficient for maintaining a causal connection from system to model at run-time.

Approaches of Online Process Discovery, as discussed in Section 2.4.4, go a step further: they work on an online stream of events rather than on an offline event log. The conceptual idea is to carry out process discovery with Complex Event Processing techniques (see Section 2.2.3). In the context of models at run-time, this means that state changes (events) are processed immediately, as they occur, into information at a higher abstraction level (BP models). The motivation is to have a run-time reflection of the employed processes based on up-to-date rather than historical information. As discussed in Section 2.4.4, online process discovery algorithms have to deal with two additional challenges compared to traditional process discovery algorithms: (1) anticipation of concept drift, i.e. reflecting new behaviour as well as forgetting old behaviour, and (2) potentially processing an infinite number of events at high frequency, i.e. scalability with regard to the number of events occurring. The state-of-the-art review in Section 2.4.4 has shown that incremental discovery approaches [37,119–121,219] address the challenges of online process discovery, albeit only partly:

• Incremental process mining [37]: Newly observed behaviour is added, but behaviour, once observed, stays valid indefinitely even after it has become outdated, i.e. there is no support for real concept drift, since revolutionary changes (see Section 2.3) are not supported.

• Incremental workflow mining [119–121,219]: In addition to the same shortcoming as incremental process mining, it also requires human interaction, i.e. it is not applicable in real-time environments.
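The difference between the incremental approaches and the requirement to forget old behaviour can be sketched in a few lines of Python. The sketch assumes a hypothetical stream of (case id, activity) events and derives directly-follows pairs per case: `IncrementalDF` only ever adds observations, in the spirit of the incremental approaches above, while `SlidingWindowDF` also forgets pairs that leave a fixed-size window, one of the forgetting techniques used in streaming process discovery. Class names, the window size of 20 and the simulated abrupt drift from A, B, C to A, C, B are illustrative assumptions.

```python
from collections import Counter, deque


class IncrementalDF:
    """Directly-follows counts that only grow: observed behaviour is never forgotten."""

    def __init__(self):
        self.last = {}           # case id -> last activity seen for that case
        self.counts = Counter()  # (a, b) -> number of observations

    def observe(self, case, activity):
        prev = self.last.get(case)
        if prev is not None:
            self._add((prev, activity))
        self.last[case] = activity

    def _add(self, pair):
        self.counts[pair] += 1


class SlidingWindowDF(IncrementalDF):
    """Directly-follows counts over the `size` most recent observations only."""

    def __init__(self, size):
        super().__init__()
        self.size = size
        self.window = deque()

    def _add(self, pair):
        self.window.append(pair)
        self.counts[pair] += 1
        if len(self.window) > self.size:
            old = self.window.popleft()  # forget the oldest observation
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]


def simulate(n_cases, trace, first_case):
    """Hypothetical stream: each case's events arrive in order, case after case."""
    for case in range(first_case, first_case + n_cases):
        for activity in trace:
            yield case, activity


# Abrupt drift: 50 cases follow A, B, C, then 50 cases follow A, C, B.
events = list(simulate(50, ["A", "B", "C"], 0)) + list(simulate(50, ["A", "C", "B"], 50))
inc, win = IncrementalDF(), SlidingWindowDF(size=20)
for case, activity in events:
    inc.observe(case, activity)
    win.observe(case, activity)
print(inc.counts[("A", "B")], win.counts[("A", "B")])  # prints: 50 0
```

After the drift, the incremental counts still contain the outdated pair (A, B) 50 times, whereas the window only reflects the new behaviour. For simplicity, the per-case `last` dictionary is never pruned; a streaming miner working on an unbounded stream would also have to bound this state.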

In contrast to the incremental approaches stands Streaming Process Discovery, as discussed in the work of Burattin et al. [27]. It discusses different techniques for using event streams for process discovery (e.g. sliding window, FIFO queue, ageing, lossy counting) and generally provides a solution that addresses the challenges of online process discovery. However, since it is based on the HeuristicsMiner, the HeuristicsMiner's shortcomings apply to this online approach as well, namely the BP-independent representation, the sole focus on the control-flow, and the reliance on only local relations between activities. To the best of the author's knowledge, the only other streaming process discovery approach is that of Maggi et al. [138], but since it discovers declarative process models (using sliding windows and lossy counting), it is outside the scope of this thesis.
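Lossy counting, one of the stream techniques mentioned above, can be sketched as follows. The Python code is a minimal sketch of the Lossy Counting algorithm of Manku and Motwani applied to directly-follows pairs, assuming these pairs have already been derived per case from the event stream; it is not the implementation of Burattin et al. [27], and the error bound ε = 0.01, the support of 0.1 and the synthetic stream are hypothetical. Stored counts underestimate the true counts by at most ε·N, and rare pairs, such as the one-off noise pairs here, are pruned at bucket boundaries so that memory stays bounded.

```python
import math


class LossyCounter:
    """Lossy Counting (Manku & Motwani): approximate item frequencies over a
    stream; each stored count underestimates the true count by at most
    epsilon * N, where N is the number of items seen."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0                           # number of items seen so far (N)
        self.entries = {}                    # item -> [count f, max error delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # Its true count before this occurrence is at most bucket - 1.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries whose upper bound f + delta <= bucket.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, support):
        """All items with true frequency >= support * N (no false negatives)."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


def pair_stream():
    """Hypothetical stream of directly-follows pairs with rare one-off noise."""
    for i in range(1000):
        yield ("A", "B")
        yield ("B", "C")
        if i % 100 == 0:
            yield ("B", f"X{i}")  # one-off (noisy) pair


counter = LossyCounter(epsilon=0.01)  # hypothetical error bound
for pair in pair_stream():
    counter.add(pair)
print(counter.n, len(counter.entries), counter.frequent(support=0.1))
```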

Summary

Table 3.2 summarises the findings of this section. The abbreviations used and the content of the table are explained in the following paragraphs.

Approaches: The types of approaches compared with regard to the challenges of BP model discovery at run-time are: (1) Traditional Process Discovery (TPD), representing the static, offline process discovery approaches based on logs (see Section 3.3.1); (2) Traditional Process Discovery with Concept Drift Detection (TPD+CDD), the combination of both fields, i.e. using a concept drift algorithm to detect where to split the log into sub-logs, which are then analysed individually by a traditional process discovery algorithm; (3) Incremental Process Discovery (IPD) as proposed in [37]; (4) the approach of Maggi et al. (MEA) [138]; (5) Incremental Workflow Discovery (IWD) as proposed in [119–121,219]; and (6) the approach of Burattin et al. (BEA) [27].

BP Representation: The BP representation of an approach determines what information can be retrieved and what further analysis can be carried out. The domain of process discovery focuses solely on the control-flow perspective, which, as discussed in Section 3.2, is not sufficient. Furthermore, the type of BP representation is important: while traditional approaches (TPD) have also advanced towards discovering BP domain models, this is not the case for online process discovery approaches, which are based either on declarative models (IPD and MEA) or on non-domain-specific models (IWD and BEA) such as Petri Nets and Heuristic Nets.

Supported Input: While the traditional discovery approaches (TPD, TPD+CDD) work on complete (offline) logs, the online approaches (IPD, MEA, IWD, BEA) can operate on a stream of events. However, among the online approaches only BEA can handle noisy or incomplete logs, i.e. it is the only one able to handle real-world BP scenarios.

Concept Drift: Supporting concept drift is an important requirement for maintaining the causal connection from system to BP model. While traditional approaches (TPD) assume that they observe an unchangeable process, i.e. they do not support concept drift (−), the incremental approaches (IPD and IWD) support concept drift to some degree (◦), and the remaining approaches (TPD+CDD, MEA, BEA) support it fully (+).

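Table 3.2 Overview of the gap analysis: process discovery at run-time

Approach | BP Representation: Type | BP Representation: View | Supported Input: Type | Supported Input: Inc./Noise | Concept Drift | Level of Automation
TPD | diverse | control-flow perspective only | Log | varies | − | varies
TPD+CDD | diverse | control-flow perspective only | Log | varies | + | ◦
IPD | Declarative | control-flow perspective only | Stream | − | ◦ | ◦
MEA | Declarative | control-flow perspective only | Stream | − | + | +
IWD | Petri Net | control-flow perspective only | Stream | − | ◦ | ◦
BEA | Heuristic Net | control-flow perspective only | Stream | + | + | +

Legend: + fully supported / fully automatic, ◦ partially supported / semi-automatic, − not supported.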

Level of Automation: Similarly, the level of automation is important for maintaining the causal connection in real-time systems: all approaches apart from TPD, whose level of automation varies, are at least semi-automatic (◦), i.e. they need only some manual steps, and some of them (MEA and BEA) are even able to operate fully automatically (+), i.e. without any manual steps.