Workflow mining: Discovering process models from event logs

19, 20, 23] also allows for concurrency. It uses stochastic task graphs as an intermediate representation and generates a workflow model described in the ADONIS modeling language. In the induction step, task nodes are merged and split in order to discover the underlying process. A notable difference from other approaches is that the same task can appear multiple times in the workflow model, i.e., the approach allows for duplicate tasks. The graph generation technique is similar to the approach of [7, 33]. The nature of splits and joins (i.e., AND or OR) is discovered in the transformation step, where the stochastic task graph is transformed into an ADONIS workflow model with block-structured splits and joins. In contrast to the previous papers, our work [31, 32, 48–50] is characterized by a focus on workflow processes with concurrent behavior (rather than adding ad-hoc mechanisms to capture parallelism). In [48–50], a heuristic approach using rather simple metrics is used to construct so-called “dependency/frequency tables” and “dependency/frequency graphs”. In [31], another variant of this technique is presented using examples from the health-care domain. The preliminary results presented in [31, 48–50] only provide heuristics and focus on issues such as noise. The approach described in this paper differs from these approaches in that, for the α-algorithm, it is proven that for certain subclasses it is possible to find the right workflow model. In [4], the EMiT tool is presented, which uses an extended version of the α-algorithm to incorporate timing information; note, however, that [4] contains neither a detailed description of the α-algorithm nor a proof of its correctness.
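
To make the flavor of these heuristics concrete, here is a minimal sketch of a dependency/frequency table in the spirit of [48–50]. The metric below is one common frequency-based formulation, not a quotation from those papers, and the example log is invented.

```python
from collections import Counter

def dependency_table(traces):
    """Count direct successions a>b and derive a dependency measure.

    One common formulation; the papers above use similar
    frequency-based heuristics whose details differ.
    """
    direct = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            direct[(a, b)] += 1
    deps = {}
    for (a, b), n_ab in direct.items():
        n_ba = direct.get((b, a), 0)
        # Close to 1: a genuine dependency a -> b is likely.
        # Close to 0: a and b may be concurrent, or the pair is noise.
        deps[(a, b)] = (n_ab - n_ba) / (n_ab + n_ba + 1)
    return direct, deps

traces = [list("abcd"), list("acbd"), list("abcd"), list("aed")]
freq, dep = dependency_table(traces)
print(round(dep[("a", "b")], 2))  # 0.67: a is reliably followed by b
print(round(dep[("b", "c")], 2))  # 0.25: b and c occur in both orders
```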

Discovering Structured Event Logs from Unstructured Audit Trails for Workflow Mining

Abstract. Workflow mining aims to find graph-based process models based on activities, emails, and various event logs recorded in computer systems. Current workflow mining techniques mainly deal with well-structured and well-symbolized event logs. In most real applications, where workflow management software tools are not installed, such structured, symbolized logs are not available; instead, the artifacts of daily computer operations may be readily available. In this paper, we propose a method to map these artifacts and content-based logs to structured logs, so as to bridge the gap between the unstructured logs of real-life situations and the status quo of workflow mining techniques. Our method consists of two tasks: discovering workflow instances and discovering activity types. We use a clustering method to tackle the first task and a classification method to tackle the second, and we propose a way to combine the two tasks so that each improves the performance of the other. Experimental results on simulated data show the effectiveness of our method.
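
As an illustration of the two tasks, the sketch below pairs an off-the-shelf clustering step (grouping records into workflow instances) with a classification step (assigning activity types). The features, models, and toy records are assumptions for illustration, not the method the paper evaluates.

```python
# Illustrative sketch, not the authors' exact method: cluster artifact
# records into workflow instances, then classify each record's activity
# type from a small labeled sample.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

records = [
    "order form received from customer",
    "invoice draft created for order",
    "invoice sent to customer",
    "order form received from supplier",
]
X = TfidfVectorizer().fit_transform(records)

# Task 1: group records into candidate workflow instances.
instances = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Task 2: classify activity types, given a few labeled examples.
labels = ["receive_order", "create_invoice", "send_invoice", "receive_order"]
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(instances, clf.predict(X))
```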

Activity Mining for Discovering Software Process Models

2 Related Work
Research in the area of software process mining started with new approaches to the grammar inference problem for event logs [CW98]. Other work from the software domain lies in the area of mining software repositories [MSR05]; like our approach, it uses SCM systems, especially CVS, as a source of input information, but for measuring project activity, detecting and predicting changes in code, advising newcomers to an open-source project, and detecting social dependencies between developers. The first application of “process mining” to the workflow domain was presented by Agrawal in 1998 [AGL98]. This approach models business processes as annotated activity graphs and is restricted to sequential patterns. The approach of Herbst and Karagiannis [HK99] uses machine learning techniques for the acquisition and adaptation of workflow models. The foundational approach to workflow mining was presented by van der Aalst et al. [WvdA01]; it introduces formal causality relations between events in logs and the α-mining algorithm for discovering workflow models, together with its improvements. In addition to the software process and business process domains, research on discovering sequential patterns in the area of data mining is important here [AS95]. In comparison to the classical approaches, we do not have logs of activities and so must discover the activities first. We make use of our document-oriented view of activities, i.e., the process is derived from the inputs and outputs of the activities. We suggest coming up with the model early and refining it when additional information becomes available.

A Cloud Theory-based Simulated Annealing for Discovering Process Model from Event Logs

applying the genetic algorithm. Their approach was based on the discovery of Petri nets, one formalism for representing process models [8]. Bratosin et al. improved this approach in 2010 and reduced the time taken in the model evaluation stage by sampling the event log [9]. In the same year, in another article, they reduced the running time of the algorithm by using a distributed approach [9]. Tsai et al. in 2010 added exploration of the time perspective to the genetic algorithm by using the data on event times available in the event log, and incorporated
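
For context, a generic simulated annealing loop of the kind such search-based approaches adapt is sketched below. The `neighbor` and `fitness` functions stand in for model mutation and log-replay evaluation, which are the expensive, domain-specific parts; the toy objective is purely illustrative.

```python
import math
import random

def simulated_annealing(initial, neighbor, fitness,
                        temp=1.0, cooling=0.95, steps=1000):
    """Generic annealing loop; `neighbor` mutates a candidate model and
    `fitness` scores it (e.g., by replaying the event log)."""
    current, best = initial, initial
    for _ in range(steps):
        candidate = neighbor(current)
        delta = fitness(candidate) - fitness(current)
        # Always accept improvements; accept worse candidates with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if fitness(current) > fitness(best):
            best = current
        temp *= cooling
    return best

# Toy usage: maximize a 1-D function standing in for model fitness.
result = simulated_annealing(
    initial=0.0,
    neighbor=lambda x: x + random.uniform(-0.5, 0.5),
    fitness=lambda x: -(x - 3.0) ** 2,
)
print(round(result, 2))  # should approach 3.0
```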

A Data Warehouse for Workflow Logs

Compared to these approaches, we emphasize application areas where the workflow process model is known. Nevertheless, the proposed warehouse model can also store log data of ad-hoc workflows and may thus serve as a basis for the process mining techniques mentioned above. The focus of our work is to exploit the workflow log by building a data warehouse that yields aggregated information, e.g., to detect critical process situations or quality degradations under different circumstances, rather than to re-engineer workflow specifications from the log. However, these process mining techniques can deliver important data for discovering typical execution scenarios, dependencies between decisions, and probabilities of workflow instance types. Business process re-engineering and workflow improvement will benefit from a combination of the approaches.
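
A minimal sketch of the kind of aggregate analysis such a warehouse supports follows; the schema and threshold are illustrative assumptions, not the paper's warehouse model.

```python
import pandas as pd

# Hypothetical flattened log-warehouse rows (column names assumed).
log = pd.DataFrame({
    "workflow_type": ["claim", "claim", "order", "order"],
    "activity":      ["check", "check", "ship",  "ship"],
    "duration_h":    [2.0, 9.0, 1.0, 1.5],
})

# Aggregate durations per workflow type and activity to spot quality
# degradations, e.g. activities whose worst case drifts far from the mean.
stats = (log.groupby(["workflow_type", "activity"])["duration_h"]
            .agg(["mean", "max", "count"]))
print(stats[stats["max"] > 1.5 * stats["mean"]])  # flag critical situations
```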

Efficient Discovery of Understandable Declarative Process Models from Event Logs

– Second, of the millions of potential constraints, many may be trivially true. For example, the response constraint in Fig. 2 holds for any event log that does not contain events relating to activity a. Moreover, one constraint may dominate another constraint: if the stronger constraint holds (e.g., □(a → ♦b)), then the weaker constraint (e.g., ♦a → ♦b) automatically holds as well. Showing all constraints that hold typically results in unreadable models. This paper addresses these two problems using a two-phase approach. In the first phase, we generate the list of candidate constraints using an Apriori algorithm inspired by the seminal Apriori algorithm developed by Agrawal and Srikant for mining association rules [7]. The Apriori algorithm uses the monotonicity property that all subsets of a frequent item-set are also frequent. In the context of this paper, this means that sets of activities can only be frequent if all of their subsets are frequent. This observation can be used to dramatically reduce the number of interesting candidate constraints. In the second phase, we further prune the list of candidate constraints by keeping only those that are relevant (based on the event log) according to (combinations of) simple metrics, such as Confidence and Support, and more sophisticated metrics, such as Interest Factor (IF) and Conditional-Probability Increment Ratio (CPIR), as explained in Section 4. Moreover, discovered constraints with high CPIR values are emphasized like highways on a roadmap, whereas constraints with low CPIR values are greyed out. This further improves the readability of the discovered Declare models.
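
A small sketch of both phases under simplifying assumptions: Apriori-style candidate generation over activity sets, followed by a confidence computation for a response constraint. The metric definitions here are the standard ones, not necessarily those of Section 4.

```python
from itertools import combinations

def frequent_pairs(traces, min_support):
    """Apriori-style pruning: a pair {a, b} can only be frequent if both
    singletons are frequent. A sketch of the candidate-generation phase."""
    n = len(traces)
    singles = {}
    for t in traces:
        for a in set(t):
            singles[a] = singles.get(a, 0) + 1
    frequent = {a for a, c in singles.items() if c / n >= min_support}
    pairs = {}
    for t in traces:
        for a, b in combinations(sorted(set(t) & frequent), 2):
            pairs[(a, b)] = pairs.get((a, b), 0) + 1
    return {p: c / n for p, c in pairs.items() if c / n >= min_support}

def response_confidence(traces, a, b):
    """Pruning phase, for a candidate response constraint "a is eventually
    followed by b": fraction of traces containing a in which every
    occurrence of a is eventually followed by b."""
    holds = with_a = 0
    for t in traces:
        if a in t:
            with_a += 1
            if all(b in t[i + 1:] for i, x in enumerate(t) if x == a):
                holds += 1
    return holds / with_a if with_a else None

traces = [list("acb"), list("ab"), list("cb"), list("acab")]
print(frequent_pairs(traces, min_support=0.5))
print(response_confidence(traces, "a", "b"))  # 1.0: every a leads to b
```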

Business Process Management and Process Mining within a Real Business Environment: An Empirical Analysis of Event Logs Data in a Consulting Project

Therefore, YAWL introduces new constructs, such as OR-joins, removal of tokens, and multiple-instance activities, that make the language both easier to use and more expressive. In particular, the OR is one of the most problematic patterns, and other notations very often struggle with its semantics (Rozinat, 2010). Unlike other languages, YAWL designs the OR split/merge to guarantee the desired synchronization. On the one hand, the OR-split triggers some, but not necessarily all, of the outgoing flows; it is appropriate in situations where it is unknown until runtime which concurrent work will result from the completion of activities. On the other hand, the OR-join makes an activity wait until all incoming flows have finished, but only if there is actually something to wait for. Moreover, the formalism offers several new syntactical elements that intuitively describe other workflow patterns. For instance, the notation enables the description of simple choice (graphically depicted via an XOR split) and simple merge (indicated as an XOR join). Clearly, common situations such as parallel activities can still be modeled with the present notation (via an AND split). Furthermore, in YAWL transitions are assumed to be non-atomic: they do not fire immediately, and performing the task may take some time. For this reason, one transition here corresponds to two transitions in a Petri net, with one place between them.
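
The last point can be made concrete with a tiny sketch of the described mapping, expanding a non-atomic task into start/complete transitions with a place between them; the tuple-based net encoding is an assumption for illustration.

```python
def expand_task(task):
    """Expand a non-atomic task into a Petri-net fragment: a start
    transition, a 'busy' place, and a complete transition."""
    transitions = [f"{task}_start", f"{task}_complete"]
    places = [f"{task}_busy"]
    arcs = [(f"{task}_start", f"{task}_busy"),
            (f"{task}_busy", f"{task}_complete")]
    return transitions, places, arcs

print(expand_task("approve"))
```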

Discovering and Utilising Expert Knowledge from Security Event Logs

M) is greater than 1, 5 event types were removed, as their frequencies were higher than the SD. After that, the application created the object frequency distribution (OFD) of the remaining entries and conducted a Two-Sample Kolmogorov–Smirnov normality test. The OFD was determined to be normally distributed, and the support range (SR) was calculated as 5%–33%. Using the SR and a 100% confidence value, the Apriori algorithm generated 64,931 object-based association rules. These rules produced 9 chains of events, which resulted in 10 temporal-association rules using a 50% threshold on the temporal-association-accuracy value. The entire set of these rules formed a Directed Acyclic Graph (DAG), as no cycles, conflicts, or redundancies were found. After that, a causal rank was calculated and assigned to each rule in the DAG using the Fast Causal Inference algorithm. This produced the final set of TAC rules, which was stored in the MySQL database and also represented as a PDDL domain action model. Every step of this application is performed in a fully automated manner, and the application is also capable of error management and exception handling. It is worth mentioning here that this application processes event logs in batches: it can process a batch of event logs, specified by a directory path, and generate an individual domain action model for each dataset. Every time an event log dataset is processed, the resulting TAC rules, alongside other relevant information, are stored in the database. Figure 5.3 presents the second part of the same application. It shows the process of generating a PDDL domain action model file, named ‘database-to-domain.pddl’, directly from the database, which contains 347 TAC rules. After creating a DAG of the rules, only 134 of them remained and were encoded into a domain action model; the rest were eliminated by the application due to cycles, conflicts, and redundancies.
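
The early filtering and normality-testing steps can be sketched as follows. The data and thresholds are invented, and a one-sample KS test against a fitted normal stands in for the normality test described above; the thesis's exact procedure may differ.

```python
from collections import Counter
from statistics import mean, stdev

from scipy import stats

# Hypothetical security event log (event types only).
events = ["login"] * 40 + ["read"] * 12 + ["write"] * 10 + ["sudo"] * 9
freq = Counter(events)
sd = stdev(freq.values())

# First filtering step from the text: drop event types whose
# frequency exceeds the standard deviation of the frequencies.
kept = {e: c for e, c in freq.items() if c <= sd}

# Normality test on the remaining frequency distribution.
values = list(kept.values())
result = stats.kstest(values, "norm", args=(mean(values), stdev(values)))
print(kept, round(result.pvalue, 3))
```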

Discovering Queues from Event Logs with Varying Levels of Information

1 Eindhoven University of Technology, the Netherlands; 2 Technion, Haifa, Israel

Abstract. Detecting and measuring resource queues is central to business process optimization. Queue mining techniques allow for the identification of bottlenecks and other process inefficiencies based on event data. This work focuses on the discovery of resource queues. In particular, we investigate the impact of the information available in an event log on the ability to accurately discover queue lengths, i.e., the number of cases waiting for an activity. Full queueing information, i.e., timestamps of enqueueing and exiting the queue, makes queue discovery trivial. However, often we see only the completions of activities. Therefore, we focus our analysis on logs with partial information, such as missing enqueueing times or missing both enqueueing and service start times. The proposed discovery algorithms handle concurrency and make use of statistical methods for discovering queues under this uncertainty. We evaluate the techniques using real-life event logs. A thorough analysis of the empirical results provides insights into the influence of information levels in the log on the accuracy of the measurements.
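
The "trivial" full-information case mentioned in the abstract reduces to a simple count over enqueue/exit intervals, as this sketch shows; the partial-information settings the paper targets require estimating these timestamps first.

```python
def queue_length(intervals, t):
    """Queue length at time t, given (enqueue_time, exit_time) per case."""
    return sum(1 for enq, exit_ in intervals if enq <= t < exit_)

waits = [(0, 5), (1, 3), (2, 8), (6, 9)]
print([queue_length(waits, t) for t in range(10)])
# [1, 2, 3, 2, 2, 1, 2, 2, 1, 0]
```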

Process Mining Event Logs from FLOSS Data: State of the Art and Perspectives

6 Conclusion
FLOSS repositories store a vast volume of data about participants’ activities. A number of these repositories have been mined using some of the techniques and tools we have discussed in this paper. However, to date, there has not been any concrete investigation into how logs from FLOSS repositories can be process mined for analysis. This may be attributed partly to two apparent factors: firstly, researchers interested in mining software repositories have not come across process mining, and thus its value remains unexploited; secondly, the format of the data recorded in FLOSS repositories poses a challenge in constructing event logs. Nevertheless, after reviewing existing mining techniques and the analysis they provide on the data, one can infer the type of input data and the expected output, and thus construct logs that can be used for analysis with any recognised process mining tool, such as the ProM framework or Disco. The example presented previously was carried out using Disco as the visualisation tool. This approach can bring additional flair and extensively enrich data analysis and visualization in the realm of FLOSS data. In our future work, we plan to produce tangible examples of process models reconstructed from FLOSS members’ daily activities. These logs can be built from mailing archives, CVS data, as well as bug reports. With a clearly defined objective and the type of data needed, process mining promises to be a powerful technique for providing empirical evidence in software repositories.

Sequence Partitioning for Process Mining with Unlabeled Event Logs

Another type of behavior that frequently occurs in process models is parallelism [31]. For example, in a process with two parallel branches ab and cd, their concurrent execution may lead to (sub-)sequences such as abcd, acbd, cabd, acdb, cadb, etc. However, activities often have time constraints or other ordering restrictions, meaning that only a small fraction of all possible interleavings is actually observed in practice. In any case, the interleaving of parallel branches fits well into the nature of our problem and can be handled without adaptation, provided that, in a process with parallel branches, the branches are captured as separate patterns. As a consequence, the presence of parallel behavior may increase the number of patterns required to find a minimal solution.
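
The interleavings in the example can be enumerated directly; this sketch keeps the internal order of each branch and reproduces the (sub-)sequences listed above.

```python
from itertools import combinations

def interleavings(branch1, branch2):
    """Enumerate all interleavings of two parallel branches, preserving
    the internal order of each branch."""
    n, m = len(branch1), len(branch2)
    results = []
    for positions in combinations(range(n + m), n):
        merged, i, j = [], 0, 0
        for k in range(n + m):
            if k in positions:
                merged.append(branch1[i]); i += 1
            else:
                merged.append(branch2[j]); j += 1
        results.append("".join(merged))
    return results

print(interleavings("ab", "cd"))
# ['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab'] -- C(4, 2) = 6 in total
```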

Discovering User Communities in Large Event Logs

Abstract. The organizational perspective of process mining supports the discovery of social networks within organizations by analyzing event logs recorded during process execution. However, applying these social network mining techniques to real data generates very complex models that are hard to analyze and understand. In this work we present an approach to overcome these difficulties by focusing on the discovery of communities from such event logs. The clustering of users into communities allows the analysis and visualization of the social network at different levels of abstraction. The proposed approach also makes use of the concept of modularity, which provides an indication of the best division of the social network into community clusters. The approach was implemented in the ProM framework and was successfully applied in the analysis of the emergency service of a medium-sized hospital.

Key words: Process Mining, Social Network Analysis, Hierarchical Clustering, Community Structure, Modularity
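
As an illustration of the underlying idea, the sketch below builds a working-together network from (case, user) events and clusters users by modularity with networkx; the toy log and the graph construction are assumptions, not the paper's ProM implementation.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# (case_id, user) events: users who share cases become connected,
# with edge weights counting how often they worked together.
events = [(1, "ann"), (1, "bob"), (2, "ann"), (2, "bob"),
          (3, "eve"), (3, "dan"), (4, "eve"), (4, "dan"),
          (5, "bob"), (5, "eve")]

G = nx.Graph()
cases = {}
for case, user in events:
    for other in cases.setdefault(case, []):
        w = G.get_edge_data(other, user, {"weight": 0})["weight"]
        G.add_edge(other, user, weight=w + 1)
    cases[case].append(user)

# Modularity-based clustering indicates the best community division.
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])  # e.g. [['ann', 'bob'], ['dan', 'eve']]
```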

Aligning Data-Aware Declarative Process Models and Event Logs

Abstract. Conformance checking, a branch of process mining, allows analysts to determine whether the execution of a business process matches the modeled behavior. Process models can be procedural or declarative: procedural models dictate the exact behavior allowed when executing a specific process, whilst declarative models implicitly specify the allowed behavior through rules that must be followed during execution. The execution of a business process is represented by event logs. Conformance checking approaches check various perspectives of a process execution, including control-flow, data, and resources. Approaches that check not only the control-flow perspective but also data and resources are called multi-perspective or data-aware approaches; they provide more deviation information than purely control-flow-based techniques. Alignment-based conformance checking techniques have proved advantageous in both control-flow-based and data-aware approaches. While several data-aware approaches for procedural process models are based on the principle of finding alignments, so far there is none for declarative process models.
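
For orientation, a minimal control-flow alignment can be computed with edit-distance-style dynamic programming, as sketched below; data-aware alignment extends this with costs for violated data guards and resource assignments, which the sketch omits.

```python
def align(trace, model_run, move_cost=1):
    """Cheapest alignment cost between a trace and one model run:
    synchronous moves are free, log-only and model-only moves cost 1."""
    n, m = len(trace), len(model_run)
    # d[i][j] = cheapest alignment of trace[:i] with model_run[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * move_cost          # log moves only
    for j in range(1, m + 1):
        d[0][j] = j * move_cost          # model moves only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = d[i-1][j-1] if trace[i-1] == model_run[j-1] else float("inf")
            d[i][j] = min(sync,                    # synchronous move
                          d[i-1][j] + move_cost,   # move on log only
                          d[i][j-1] + move_cost)   # move on model only
    return d[n][m]

print(align(list("acd"), list("abcd")))  # 1: one model move for 'b'
```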

Mining conditional partial order graphs from event logs

5.2 Concurrency-aware CPOG mining
This section presents an algorithm for extracting concurrency from a given event log and using this information to simplify the result of CPOG mining. Classic process mining techniques based on Petri nets generally rely on the α-algorithm for concurrency extraction [1]. We introduce a new concurrency extraction algorithm, which differs from the classic α-algorithm in two aspects. On the one hand, it is more conservative when declaring two given events concurrent, which may lead to the discovery of more precise process models. On the other hand, it considers not only adjacent events in a trace as candidates for the concurrency relation but all event pairs, and can therefore find concurrent events even when the distance between them in traces is always greater than one, as we demonstrate below by an example. This method works particularly well in combination with CPOGs due to their compactness; however, we believe it can also be useful in combination with other formalisms.
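
The all-pairs idea can be illustrated as follows; the actual conditions in the paper are more conservative, and this sketch only shows why looking beyond adjacent events matters.

```python
from itertools import combinations

def orderings(traces):
    """Record (x, y) whenever x occurs before y in some trace, at any
    distance, not just adjacently."""
    before = set()
    for t in traces:
        for i, j in combinations(range(len(t)), 2):
            before.add((t[i], t[j]))
    return before

def concurrent(traces):
    """Candidate concurrent pairs: both orders observed in the log."""
    before = orderings(traces)
    return {(a, b) for (a, b) in before if (b, a) in before and a < b}

traces = [list("abcd"), list("adcb")]
print(sorted(concurrent(traces)))
# [('b', 'c'), ('b', 'd'), ('c', 'd')] -- adjacency-based extraction
# would miss ('b', 'd'), which never occurs at distance one
```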

A Generic Import Framework For Process Event Logs

2 Process Mining and ProM
Process-aware information systems, such as WfMS, ERP, CRM, and B2B systems, need to be configured based on process models specifying the order in which process steps are to be executed [1]. Creating such models is a complex and time-consuming task for which different approaches exist. The most traditional approach is to analyze and design the processes explicitly, making use of a business process modeling tool. However, this approach has often resulted in discrepancies between the actual business processes and the ones perceived by designers [3]; therefore, very often, the initial design of a process model is incomplete, subjective, and at too high a level. Instead of starting with an explicit process design, process mining aims at extracting process knowledge from “process execution logs”.

Discovering Stochastic Petri Nets with Arbitrary Delay Distributions From Event Logs

Attempts at eliciting non-Markovian stochastic Petri nets also exist. Leclercq et al. investigate how to extract models of normally distributed data in [4]. Their work is based on an expectation-maximization algorithm that they run until convergence. In comparison to our approach, they are not able to deal with missing data and do not consider different execution policies. Reconstructing model parameters for stochastic systems has also been investigated by Buchholz et al. in [16]. They address the problem of finding fixed model parameters of a partially observable underlying stochastic process. In contrast to our work, the underlying process’s transition distributions need to be specified beforehand, while our aim is to also infer the transition distributions of a GDT_SPN model. In a similar setting, i.e., with incomplete information, Wombacher and Iacob estimate distributions of activities and missing starting times of processes in [17].
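
At its core, the shared task is inferring a transition's delay distribution from observed timestamps. The toy sketch below simply fits a normal distribution to observed delays; it ignores missing data and execution policies, which are exactly what the approaches above handle, and the timestamps are invented.

```python
from scipy import stats

# Hypothetical fully observed data: when a transition became enabled
# and when it fired.
enabled = [0.0, 10.0, 20.0, 30.0]
fired   = [2.1, 11.8, 22.3, 31.9]
delays = [f - e for e, f in zip(enabled, fired)]

# Maximum-likelihood fit of a normal delay distribution.
mu, sigma = stats.norm.fit(delays)
print(round(mu, 2), round(sigma, 2))
```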

Aligning Event Logs and Declarative Process Models for Conformance Checking

1 Introduction
Traditional Workflow Management Systems (WFMSs) are based on the idea that processes are described by procedural languages where the completion of a task may enable the execution of other tasks. While such a high degree of support and guidance is certainly an advantage when processes are repeatedly executed in the same way, in dynamic settings (e.g., healthcare) a WFMS is considered to be too restrictive. Users often need to react to exceptional situations and execute the process in the most appropriate manner. Therefore, in these environments systems tend to provide more freedom and do not restrict users in their actions. Comparing such dynamic process executions with procedural models may reveal many deviations that are, however, not harmful. In fact, people may exploit the flexibility offered to better handle cases. In such situations we advocate the use of declarative models. Instead of providing a procedural model that enumerates all process behaviors that are allowed, a declarative model simply lists the constraints that specify the forbidden behavior, i.e., “everything is allowed unless explicitly forbidden”.

Privacy-Preserving Process Mining - Differential Privacy for Event Logs

In contrast, Fig. 9 shows that the error caused by Query 2 on the Road Traffic Fine log is small. It is noteworthy that, using Query 2 with an ε value of 0.1, we often obtain exactly the same result as when using the unprotected event log. In this case, the F1 score is consistently 1.0, indicating that our approach can be used to protect the privacy of individual participants while still discovering the correct main process behavior for very structured processes with a small number of variants. When lowering ε further to 0.01, as shown in Fig. 9, differences appear due to the noise added by our protection approach. In particular, some of the less frequent activities connected to the appeals part of the Road Traffic Fines process, for example Notify Result Appeal to Offender and Receive Result Appeal From Prefecture, appear in the discovered process model. Some of the noise added by our privacy protection method can no longer be distinguished from the regular process behavior. Still, other parts of the frequent process behavior are left intact. For example, the process model starts with Create Fine and may end with either Payment or Send for Credit Collection, as in the model discovered on the unprotected log.
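
The kind of protection evaluated here can be sketched with the standard Laplace mechanism on trace-variant counts; the paper's actual queries and post-processing differ, and the variants and counts below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_variant_counts(counts, epsilon, sensitivity=1.0):
    """Add Laplace noise calibrated to sensitivity/epsilon to each
    variant count -- the standard mechanism for counting queries."""
    noisy = {}
    for variant, count in counts.items():
        noise = rng.laplace(0.0, sensitivity / epsilon)
        # Round and clip: published counts stay non-negative integers.
        noisy[variant] = max(0, round(count + noise))
    return noisy

counts = {"Create Fine,Payment": 120,
          "Create Fine,Send for Credit Collection": 45,
          "Create Fine,Appeal,Payment": 3}
# Smaller epsilon means stronger privacy but more distortion; at 0.1
# the rare appeal variant may vanish or be inflated, as discussed above.
print(private_variant_counts(counts, epsilon=0.1))
```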

Discovering Petri Nets From Event Logs

More from a theoretical point of view, the process discovery problem is related to the work discussed in [12, 37, 38, 49]. In these papers, the limits of inductive inference are explored. For example, in [38] it is shown that the computational problem of finding a minimum finite-state acceptor compatible with given data is NP-hard. Several of the more generic concepts discussed in these papers can be translated to the domain of process mining. It is possible to interpret the problem described in this article as an inductive inference problem specified in terms of rules, a hypothesis space, examples, and criteria for successful inference. The comparison with literature in this domain raises interesting questions for process mining, e.g., how to deal with negative examples (i.e., suppose that besides log L there is a log L′ of traces that are not possible, e.g., added by a domain expert). However, despite the relations with the work described in [12, 37, 38, 49], there are also many differences: e.g., we mine at the net level rather than at sequential or lower-level representations (e.g., Markov chains, finite state machines, or regular expressions), we tackle concurrency, and we do not assume negative examples or complete logs.