
3.3 Phases of Application

3.3.2 Data Collection

3.3.2.2 Event Log Structure

Building event logs from the data available within the organisation is one of the most important steps in process mining, primarily because everything that follows in the methodology is based on the event log constructed in this step. It is thus crucial to understand how event logs must be structured to be practically useful. Figure 3.4 gives a conceptual view of how an event log is structured and how its data points relate to one another. The conceptual view also includes the hierarchy of information. Viewed from the highest level, the process definition is identified first. The definition specifies which activities belong to the process and the structure of execution. As the definition is unique to a specific process but numerous processes of this type can be executed, a process instance, or case, is created for each execution. As an instance is executed, its trace records the actual path that is followed, whether or not that path is as planned. These traces are the focal point of process mining because they convey reality. The traces in turn consist of events, which have attributes associated with them. A visual representation of the process definition concept is given in Figure 3.5.
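The hierarchy described above (process, cases, events, attributes) can be sketched as a small data structure. This is a minimal illustration only: the class names, field names, timestamps, and resources are assumptions made for the example and are not taken from Fliegner (2014) or Van der Aalst (2011).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    """One recorded execution of an activity, with its attributes."""
    activity: str
    timestamp: datetime
    resource: Optional[str] = None
    cost: Optional[float] = None


@dataclass
class Case:
    """A process instance; its time-ordered events form the trace."""
    case_id: str
    events: List[Event] = field(default_factory=list)

    def trace(self) -> List[str]:
        # The trace is the sequence of activities in the order they happened.
        ordered = sorted(self.events, key=lambda e: e.timestamp)
        return [e.activity for e in ordered]


@dataclass
class EventLog:
    """All recorded cases belonging to one process."""
    process_name: str
    cases: List[Case] = field(default_factory=list)


# Hypothetical data: two events logged out of order for a single case.
log = EventLog("Notice for Maintenance")
case = Case("case-1")
case.events.append(Event("B", datetime(2020, 1, 1, 10, 0), resource="clerk"))
case.events.append(Event("A", datetime(2020, 1, 1, 9, 0), resource="planner"))
log.cases.append(case)

print(case.trace())  # ['A', 'B']
```

Keeping the trace as a derived view of the events, rather than storing it separately, mirrors the idea that the trace is simply the path actually followed by the case.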

While this conceptualisation is that of an ideal case, information systems in general do not capture information in this manner, and data manipulation is required to format it appropriately. This is primarily because these systems are not designed to monitor processes specifically. They do, however, monitor the transactions (often financial) that take place and the state changes required to issue new work orders, as mentioned by Fliegner (2014). As most ERP systems store tables with the previously mentioned data, Structured Query Language (SQL) is used to extract the required information. The extracted data is stored in a temporary database, which serves as the data origin from which formatting and filtering can be done. Once the resulting table has been formatted and filtered to the desired extent, it can be loaded into the process mining tool-set.

[Figure: process definition, process instance, traces, events, and activity tasks, with event attributes timestamp, event type, cost, resource, etc.]

Figure 3.4: Conceptual event log structure.

Adapted from Fliegner (2014)

Fliegner (2014) goes on to identify common challenges concerning the extracted data. Of the challenges mentioned, only a few have not yet been covered. A challenge referred to as snapshots concerns the lifetime of the cases involved in the extracted processes. A recorded process may already have been in progress before the recording started. The same holds at the end of the recording, where the extraction of the event log interrupts a process that is still in progress, so that the process appears incomplete when analysed. These kinds of processes should be addressed in the filtering stage and assumed to be unwanted (noise), as they give an altered view of reality.

[Figure: a process with cases 1, 2, ..., n, each containing events 1, 2, ..., n; every event records Activity, Time, Resource, and Cost.]

Figure 3.5: Event log structure.

Adapted from Van der Aalst (2011)

Fliegner (2014) also refers to proper event selection, that is, the level of detail contained in the event databases. As some processes have sub-processes within events, it should be decided whether these sub-processes form part of the intended scope and detail level. Preferably, the level of detail should be uniform, in the sense that the events in the event log are of the same detail and of the same perspective. Process mining algorithms treat all events in the log as equal, so mixing levels of detail distorts the resulting traces.
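The snapshot challenge described above can be handled with a simple filter that keeps only cases whose trace begins and ends with an accepted activity. The sketch below is a minimal illustration under stated assumptions: the function name `filter_complete_cases`, the activity names, and the case identifiers are all hypothetical, and a real log may need several valid start and end activities.

```python
def filter_complete_cases(traces, start_activities, end_activities):
    """Keep only cases whose trace starts and ends with an accepted activity."""
    complete = {}
    for case_id, trace in traces.items():
        if not trace:
            continue  # a case without events cannot be judged complete
        if trace[0] in start_activities and trace[-1] in end_activities:
            complete[case_id] = trace
    return complete


# Hypothetical traces keyed by case identifier.
traces = {
    "c1": ["Create", "Approve", "Close"],  # complete case
    "c2": ["Approve", "Close"],            # already running when recording began
    "c3": ["Create", "Approve"],           # still running at extraction time
}

kept = filter_complete_cases(traces, {"Create"}, {"Close"})
print(sorted(kept))  # ['c1']
```

Cases such as `c2` and `c3` are discarded as noise because, taken at face value, they would suggest paths through the process that never actually occur.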

The next two problems are encountered when the event data is distributed over multiple databases. First, event correlation needs to be ensured, meaning that every event must belong to a case. This is an easy task when all events are grouped in one database, but when the events are distributed, case references are often missing. The same concept applies to timestamps. When events are grouped within one database, the ordering of the events already gives enough information to do process mining. However, when events are distributed across multiple databases, timestamps become a necessity in that they are often the only measure by which these events can be ordered.
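The two requirements above, correlation and ordering, can be sketched together for event records drawn from several sources. This is an illustrative sketch only: the record layout (dictionaries with `case_id`, `activity`, and `timestamp` keys), the function name `build_traces`, and the sample data are assumptions, not a description of any particular ERP system.

```python
from collections import defaultdict
from datetime import datetime


def build_traces(*sources):
    """Merge event records from several sources into time-ordered traces."""
    by_case = defaultdict(list)
    uncorrelated = []
    for source in sources:
        for record in source:
            case_id = record.get("case_id")
            if case_id is None:
                # Event correlation failed: the record names no case.
                uncorrelated.append(record)
            else:
                by_case[case_id].append(record)

    traces = {}
    for case_id, records in by_case.items():
        # Across sources, the timestamp is the only available ordering.
        records.sort(key=lambda r: r["timestamp"])
        traces[case_id] = [r["activity"] for r in records]
    return traces, uncorrelated


# Hypothetical records from two separate databases.
orders = [
    {"case_id": "c1", "activity": "Create", "timestamp": datetime(2020, 1, 1, 9)},
    {"case_id": "c1", "activity": "Close", "timestamp": datetime(2020, 1, 3, 9)},
]
finance = [
    {"case_id": "c1", "activity": "Pay", "timestamp": datetime(2020, 1, 2, 9)},
    {"case_id": None, "activity": "Refund", "timestamp": datetime(2020, 1, 2, 12)},
]

traces, uncorrelated = build_traces(orders, finance)
print(traces)             # {'c1': ['Create', 'Pay', 'Close']}
print(len(uncorrelated))  # 1
```

Returning the uncorrelated records separately, instead of silently dropping them, makes the extent of the correlation problem visible before any mining is done.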

[Figure: life cycle transitions schedule, assign, reassign, start, suspend, resume, abort case, abort activity, withdraw, complete, manualskip, and autoskip, ending in successful or unsuccessful termination.]

Figure 3.6: Transactional process life cycle model.

Adapted from Van der Aalst (2011)

Figure 3.6 shows a standard model for the life cycle of a task as presented by Van der Aalst (2011). It includes not only the preferred route for the completion of a task but also the variations thereof. Variations include the reassignment of a resource and the termination of a task, each of which can only occur when the task is in a specific state of its progress.
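The idea that a transition is only permitted in a specific state can be sketched as a small state machine. The transition names come from Figure 3.6, but the state names and the exact set of allowed transitions below are simplifying assumptions made for illustration; they cover only part of the model and should be checked against Van der Aalst (2011) before use.

```python
# Assumed, partial transition table: (current state, transition) -> next state.
# The state names are hypothetical labels chosen for this sketch.
ALLOWED = {
    ("new", "schedule"): "scheduled",
    ("scheduled", "assign"): "assigned",
    ("assigned", "reassign"): "assigned",
    ("assigned", "start"): "in_progress",
    ("in_progress", "suspend"): "suspended",
    ("suspended", "resume"): "in_progress",
    ("in_progress", "complete"): "completed",
}


def replay(transitions, state="new"):
    """Apply life cycle transitions in order; fail on one that is not allowed."""
    for transition in transitions:
        key = (state, transition)
        if key not in ALLOWED:
            raise ValueError(f"'{transition}' is not allowed in state '{state}'")
        state = ALLOWED[key]
    return state


print(replay(["schedule", "assign", "reassign", "start", "complete"]))  # completed
```

Under these assumptions, replaying `["schedule", "reassign"]` raises a `ValueError`, because a task that has not yet been assigned cannot be reassigned.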

Up until now, the event log has had the standard form shown above, in which tasks are logged as they occur: by a resource, at a time, and belonging to a process instance. There is, however, another way to represent the same event log, namely in terms of traces. Traces, as presented by Ingvaldsen (2011), are a collection of all the paths followed by the cases (process instances) belonging to a process. This is useful where a summary is needed, whether to understand the path followed by an individual case or the distribution of paths followed by numerous cases. An example of this is shown in Table 3.1.

Table 3.1: Event log traces

Notice for Maintenance Process

Number of Instances    Log Traces
5010                   ABDEA
460                    ACDEHFA
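A trace summary of this kind can be derived by counting how many cases follow each distinct path. The sketch below is illustrative: the function name `trace_variants` and the three sample cases are hypothetical, and only the activity letters are borrowed from Table 3.1.

```python
from collections import Counter


def trace_variants(traces):
    """Count how many cases follow each distinct path."""
    # Tuples are hashable, so each distinct path can serve as a counter key.
    return Counter(tuple(trace) for trace in traces.values())


# Hypothetical cases using the activity labels from Table 3.1.
traces = {
    "c1": ["A", "B", "D", "E", "A"],
    "c2": ["A", "C", "D", "E", "H", "F", "A"],
    "c3": ["A", "B", "D", "E", "A"],
}

for path, count in trace_variants(traces).most_common():
    print(count, "".join(path))
# 2 ABDEA
# 1 ACDEHFA
```

Sorting by frequency places the dominant path first, which is how the distribution of paths across numerous cases is usually read.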

It should be noted that the event logs, constructed from whichever records are available, are the basis for process mining. It should therefore be recognised that the availability of accurate and reliable event logs is crucial for a successful process mining application. This is especially true when the results obtained from a process mining analysis will later form part of the BPM life cycle, in that they will be used in redesigning or evaluating the current business processes, as shown in Section 2.2.1.