Process Mining Tools, Discovery Algorithms and Techniques

Many software tools are available to support PM, including ProM, Disco, Celonis, Interstage Business Process Manager, RapidMiner, and ProMiner. According to Mans et al. (2013), ProM, an open-source solution, has become the de facto standard for PM in research and is used in all the PM dental research literature. ProM offers a wide variety of PM techniques and algorithms, and its open framework allows researchers to develop plug-ins; even so, a brief functional analysis of the available products would have been useful in the literature. Disco, a commercial product, has a more intuitive interface and would be more appropriate in some scenarios, e.g. where the user has limited PM experience.

PM algorithms are specialised data analysis techniques designed to examine the EL and to produce a process model representative of its contents. They are often classified into three groups: deterministic, heuristic and genetic algorithms (Gehrke & Werner, 2013). Commonly used PM algorithms include the Alpha Miner, Heuristic Miner, Fuzzy Miner, Inductive visual Miner, Genetic Process Mining and region-based process mining. Deterministic algorithms produce defined and reproducible results and are based on the ordering relationships between events; the Alpha Miner and its variants belong to this group. Heuristic algorithms incorporate the frequency of occurrence of events and can discover short sequences of events. The resulting process models reflect how often traces occur and can therefore eliminate ‘noise’ and rarely occurring events and traces if required. The Heuristic Miner is an example of this type. Genetic algorithms are much more resource intensive, generating large numbers of candidate process models before settling on the optimum. Typically, they follow four steps (initialisation, selection, reproduction and termination), iteratively improving the model over several generations. The AGNEs Miner (Goedertier, et al., 2009) is another algorithm, which facilitates the inclusion of negative events. A brief description of these follows.

The Alpha Miner

The Alpha algorithm produces a Petri net (place-transition net) from sequences of events by examining causal relationships between tasks. It takes an event log (or workflow log) W and a set of possible events T as inputs, and assumes that the log is complete with respect to all binary sequences and contains no noise (De Weerdt, et al., 2012). It builds a model based on local relations between activities. In its basic form it has several limitations: it cannot deal with noise, and it cannot discover silent steps, duplicate steps, local (short) loops or non-local, non-free-choice constructs. It has been enhanced as the Alpha+, Alpha++ and Alpha# variants: Alpha+ deals with short loops and Alpha++ can detect non-free-choice constructs. Its strength is that it is a simple algorithm that contains the basic PM ideas and concepts and can be formalised compactly. It is, however, mainly of theoretical interest: it is not robust and is too simple to apply to real-world event logs.
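The local ordering relations on which the Alpha algorithm relies can be illustrated with a minimal sketch. It covers only the first step (deriving the relations between pairs of activities), not the construction of the Petri net itself. The function name `footprint`, the relation symbols and the three-trace example log are assumptions made for illustration and do not come from the cited literature.

```python
from itertools import product

def footprint(log):
    """Derive pairwise ordering relations from a list of traces (illustrative)."""
    activities = sorted({a for trace in log for a in trace})
    # (a, b) is in `follows` if a is directly followed by b in at least one trace
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            relations[(a, b)] = "->"   # a causally precedes b
        elif ba and not ab:
            relations[(a, b)] = "<-"   # b causally precedes a
        elif ab and ba:
            relations[(a, b)] = "||"   # seen in both orders: treated as parallel
        else:
            relations[(a, b)] = "#"    # never adjacent: unrelated (choice)
    return relations

# Hypothetical event log: each tuple is one trace (case) of activity names
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
fp = footprint(log)
```

Because every relation is decided by whether a pair was observed at least once, a single noisy trace can turn a causal relation into a parallel one, which is consistent with the lack of robustness described above.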

Heuristic Miner

The Heuristic Miner was developed to address many of the problems of the Alpha Miner; it can deal with noise and exceptions and is especially suited to a real-life setting (De Weerdt, et al., 2012). It outputs a heuristic net, which can be converted to a Petri net that can in turn be formally analysed using the process-quality metrics. It is generally useful with real-life data containing ‘not too many’ different events. It extends the Alpha algorithm and can discover short loops and non-local dependencies, and its noise threshold parameter makes it suitable for real-world logs. It applies frequency information to three types of relationship between activities in an event log: direct dependency, concurrency and not-directly-connectedness. It derives XOR and AND connectors from dependency relations and can exclude exceptional behaviour and noise by leaving out edges. It cannot detect duplicate activities. As with the Alpha and Alpha++ algorithms, it builds a model based on local relations between activities.
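How frequency information lets edges be left out can be sketched with the dependency measure commonly given for the Heuristic Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the number of times a is directly followed by b. The sketch below is a simplified illustration rather than the ProM implementation; the function names, the threshold value of 0.6 and the example log with its activity names are assumptions.

```python
from collections import Counter

def directly_follows_counts(log):
    """Count how often each activity is directly followed by another."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency(counts, a, b):
    """Dependency measure in (-1, 1); values near 1 suggest that a leads to b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:                      # length-one loop
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)

def dependency_graph(log, threshold):
    """Keep only the edges whose dependency measure reaches the threshold."""
    counts = directly_follows_counts(log)
    return {(a, b): round(dependency(counts, a, b), 3)
            for (a, b) in list(counts)
            if dependency(counts, a, b) >= threshold}

# Hypothetical log: nine regular traces and one noisy trace with swapped steps
log = [("register", "examine", "treat")] * 9 + [("register", "treat", "examine")]
graph = dependency_graph(log, threshold=0.6)
# graph == {("register", "examine"): 0.9, ("examine", "treat"): 0.727}
```

In this example the two edges introduced by the single noisy trace score 0.5 and about -0.727, so both fall below the assumed threshold and are left out of the resulting graph.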

Fuzzy Miner

This technique addresses some of the problems posed by large numbers of activities and highly unstructured behaviour. It employs adaptive simplification and visualisation techniques and outputs a fuzzy model. Using significance and correlation metrics, it can simplify the process model to a desired level of abstraction. It can hide less important activities in clusters, and it builds its model using a global approach that looks at the whole event log. The tool aims to emphasise graphically the most relevant behaviour by calculating the relevance of activities and their relations. Two metrics are used for this. First, ‘significance’ measures the frequency of occurrence of events in the log. Second, ‘correlation’ determines how closely related two events that follow each other are, so that highly related events can be represented as such in the model. It has limited ability to express choice and parallelism between events.
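The role of the two metrics in simplifying a model can be illustrated with a deliberately reduced sketch. It is not the Fuzzy Miner as implemented in ProM or Disco, which combines several significance and correlation measures. Here significance is assumed to be relative activity frequency, and correlation is replaced by an assumed stand-in: the share of an activity's occurrences that are directly followed by the other activity. The function names, cut-off values and example log are likewise hypothetical.

```python
from collections import Counter

def fuzzy_metrics(log):
    """Compute simplified significance and correlation values (assumed definitions)."""
    node_freq = Counter(a for trace in log for a in trace)
    edge_freq = Counter((a, b) for trace in log for a, b in zip(trace, trace[1:]))
    top = max(node_freq.values())
    # Significance: activity frequency relative to the most frequent activity
    significance = {a: n / top for a, n in node_freq.items()}
    # Correlation stand-in: fraction of a's occurrences directly followed by b
    correlation = {(a, b): n / node_freq[a] for (a, b), n in edge_freq.items()}
    return significance, correlation

def simplify(significance, correlation, node_cutoff, edge_cutoff):
    """Split activities into kept and hidden sets and filter the edges."""
    kept = {a for a, s in significance.items() if s >= node_cutoff}
    hidden = set(significance) - kept          # candidates for a cluster node
    edges = {(a, b) for (a, b), c in correlation.items()
             if c >= edge_cutoff and a in kept and b in kept}
    return kept, hidden, edges

# Hypothetical log: four regular visits and one visit with an extra step
log = [("check-in", "exam", "checkout")] * 4 + [("check-in", "x-ray", "exam", "checkout")]
significance, correlation = fuzzy_metrics(log)
kept, hidden, edges = simplify(significance, correlation, node_cutoff=0.5, edge_cutoff=0.5)
```

With these assumed cut-offs the rarely occurring "x-ray" step is hidden and only the two edges of the main path remain; lowering either cut-off brings more detail back into the model.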

In both ProM and Disco, the Fuzzy Miner is presented with an interface in which the settings can be configured and their effect on the model seen immediately. The width of an edge between nodes is proportional to its importance (i.e. absolute frequency), and darker edges indicate a higher level of correlation between the nodes, i.e. their tendency to follow one another (Mans, 2011). The Fuzzy Miner can also animate and replay the log on the model, which gives a rapid, intuitive understanding of the process and quickly shows heavily executed paths and bottlenecks. A shortcoming is that the generated model lacks clear semantics and cannot be converted to other model types. As a result, the formal metrics commonly used to evaluate process models, i.e. fitness, precision, simplicity and generalisability, cannot be applied to it.