Alignments - Applying Process Mining Algorithms in the Context of Data Collection Scenarios

The algorithm described in Section 6.2 works reliably in the context of QuestionSys, because the event log is always complete and the engine does not allow any deviations [2, 37]. In practice, these conditions do not always hold, which exposes some drawbacks of the Token Replay Algorithm. One drawback is that it is specific to Petri nets. If the mining algorithm used during process discovery does not produce a Petri net, or the existing process model is not modeled as a Petri net, the model must be converted before the Token Replay Algorithm can be applied. Additionally, if the process model and the event log deviate heavily, the process model is flooded with tokens. A Petri net with many tokens allows any behavior possible in the model, so correct and incorrect behavior can no longer be distinguished.

Conformance checking with the Token Replay Algorithm is no longer supported in newer versions of ProM because more sophisticated and robust algorithms such as Alignments have been developed.

To create an alignment between the process model and the event log, the actions in the event log must be related to the activities represented by the process model. This sounds easy, but it becomes difficult once the event log and the model start to deviate [82].

When dealing with Alignments, three types of moves need to be taken into account: synchronous moves, moves on model only, and moves on log only. They represent the different kinds of behavior that may occur and were introduced in [91].

1. Synchronous move

A synchronous move means that a step was taken in both the process model and the event log. In other words, the two fit together.

2. Move on model only (Model moves)

A move on model only means that the process model required an activity to be executed, but its execution is not documented in the event log.


3. Move on log only (Log moves)

A move on log only can be seen as the opposite of a move on model only: an activity is executed according to the event log, but there is no corresponding activity in the process model.
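To make the three move types concrete, the following Python sketch represents an alignment as a sequence of pairs (log activity, model activity), where `None` stands for "no move" on that side (often written as ≫ in the literature). The class `Move`, the helper functions, and the example activities are hypothetical and only illustrate the classification; they are not taken from ProM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Move:
    """One step of an alignment; None means 'no move' on that side (hypothetical representation)."""
    log: Optional[str]    # activity recorded in the event log, or None
    model: Optional[str]  # activity executed in the process model, or None

    def kind(self) -> str:
        if self.log is not None and self.model is not None:
            if self.log != self.model:
                raise ValueError("a synchronous move must refer to the same activity")
            return "synchronous"
        if self.model is not None:
            return "model"  # move on model only: required by the model, missing in the log
        if self.log is not None:
            return "log"    # move on log only: recorded in the log, not matched by the model
        raise ValueError("a move must involve the log, the model, or both")


def log_projection(alignment: List[Move]) -> List[str]:
    """Activities on the log side; they must equal the original trace."""
    return [m.log for m in alignment if m.log is not None]


def model_projection(alignment: List[Move]) -> List[str]:
    """Activities on the model side; they must form a valid run of the process model."""
    return [m.model for m in alignment if m.model is not None]


# Hypothetical example: the model prescribes Q1, Q2, Q3, Q4; the log contains Q1, X, Q2, Q4.
trace = ["Q1", "X", "Q2", "Q4"]
model_run = ["Q1", "Q2", "Q3", "Q4"]
alignment = [
    Move("Q1", "Q1"),
    Move("X", None),   # X was recorded but is not part of the model
    Move("Q2", "Q2"),
    Move(None, "Q3"),  # Q3 was required by the model but is missing in the log
    Move("Q4", "Q4"),
]
kinds = [m.kind() for m in alignment]
```

Projecting an alignment onto the log side must yield the original trace, and projecting it onto the model side must yield a run of the model; this is what separates a valid alignment from an arbitrary list of pairs.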

Model moves and log moves represent deviations and should therefore reduce the fitness. By assigning costs to log moves and model moves, deviations can be expressed as a lower fitness. A possible fitness function is defined in [91]. Different costs can also be assumed for each activity and each type of move: a model move on activity X may cost 5, while a log move on activity Z costs 1. This allows an even more precise representation and, more importantly, a weighting of deviations not only on the level of move types but also on the level of individual activities.
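The cost-based reasoning above can be sketched in Python. The sketch assigns cost 0 to synchronous moves and per-activity costs (default 1) to model moves and log moves, reusing the example of activity X (model move cost 5) and activity Z (log move cost 1). It normalizes the alignment cost by a worst case in which every event is a log move and the model is executed entirely through model moves, which resembles the fitness function in [91]. For simplicity it assumes that the model has exactly one run. All function names and the activities A and B are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (log activity, model activity); None means "no move" on that side
Move = Tuple[Optional[str], Optional[str]]
DEFAULT_COST = 1.0  # assumed cost for activities without a custom cost


def move_cost(move: Move, model_costs: Dict[str, float], log_costs: Dict[str, float]) -> float:
    log_act, model_act = move
    if log_act is not None and model_act is not None:
        return 0.0                                           # synchronous move: no deviation
    if model_act is not None:
        return model_costs.get(model_act, DEFAULT_COST)      # move on model only
    if log_act is not None:
        return log_costs.get(log_act, DEFAULT_COST)          # move on log only
    raise ValueError("empty move")


def alignment_cost(alignment: List[Move], model_costs: Dict[str, float],
                   log_costs: Dict[str, float]) -> float:
    return sum(move_cost(m, model_costs, log_costs) for m in alignment)


def fitness(alignment: List[Move], trace: List[str], model_run: List[str],
            model_costs: Dict[str, float], log_costs: Dict[str, float]) -> float:
    # Worst case: every event is a log move and the (single) model run consists only of model moves.
    worst = (sum(log_costs.get(a, DEFAULT_COST) for a in trace)
             + sum(model_costs.get(a, DEFAULT_COST) for a in model_run))
    return 1.0 - alignment_cost(alignment, model_costs, log_costs) / worst


model_costs = {"X": 5.0}  # skipping X is treated as an expensive deviation
log_costs = {"Z": 1.0}    # an unexpected Z is treated as a cheap deviation
trace = ["A", "Z", "B"]
model_run = ["A", "X", "B"]
alignment = [("A", "A"), (None, "X"), ("Z", None), ("B", "B")]
print(alignment_cost(alignment, model_costs, log_costs))  # 6.0
```

Changing `model_costs` or `log_costs` changes how strongly each deviation weighs, both per move type and per activity.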

Alignments allow detailed diagnostics on the instance level, which may be aggregated into diagnostics for the whole process. For example, Alignments can indicate which activities are often skipped (represented by model moves) or which activities are frequently executed at times when, according to the process model, they are not supposed to be (represented by log moves). This relates the behavior in the event log to the process model more precisely [14].
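The aggregation from instance-level to process-level diagnostics can be illustrated by counting, per activity, how often it appears in synchronous moves, model moves (skipped), and log moves (executed unexpectedly) across all aligned instances. The Python sketch below assumes the alignments are already available as lists of (log activity, model activity) pairs with `None` for "no move"; the function name and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Move = Tuple[Optional[str], Optional[str]]  # (log activity, model activity); None = no move


def aggregate_diagnostics(alignments: List[List[Move]]):
    synchronous, skipped, unexpected = Counter(), Counter(), Counter()
    for alignment in alignments:              # one alignment per process instance
        for log_act, model_act in alignment:
            if log_act is not None and model_act is not None:
                synchronous[model_act] += 1   # behavior fits the model
            elif model_act is not None:
                skipped[model_act] += 1       # model move: activity skipped in the log
            elif log_act is not None:
                unexpected[log_act] += 1      # log move: activity executed unexpectedly
    return synchronous, skipped, unexpected


alignments = [
    [("Q1", "Q1"), ("Q2", "Q2"), ("Q3", "Q3")],
    [("Q1", "Q1"), (None, "Q2"), ("Q3", "Q3")],
    [("Q1", "Q1"), (None, "Q2"), ("Q3", "Q3"), ("Q3", None)],  # Q3 answered a second time
]
synchronous, skipped, unexpected = aggregate_diagnostics(alignments)
print(skipped.most_common(1))  # [('Q2', 2)]
```

The most frequently skipped activity (here Q2) and the most frequent unexpected activity (here Q3) are exactly the kind of process-level findings described above.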

Calculating Alignments is not trivial, because a single instance may have multiple Alignments, and the goal is to find the best-fitting one. An alignment is not optimal if another alignment of the same instance has a lower cost, and which alignment is optimal depends on the cost function, especially when the costs are customized. The implementation in ProM guarantees to return an optimal alignment [91]. For the formal definition of Alignments, please refer to [91, 92].
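For intuition about what "optimal" means, consider the special case of a process model with exactly one run, such as a strictly sequential questionnaire without branches. In that case an optimal alignment can be computed with dynamic programming, similar to an edit distance that only allows model moves (insertions) and log moves (deletions). This is a simplified sketch under that assumption, not the algorithm implemented in ProM; general Petri nets with choices, loops, or concurrency require a search over the model's state space instead (for example with A*). The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Move = Tuple[Optional[str], Optional[str]]  # (log activity, model activity); None = no move


def optimal_alignment(trace: List[str], model_run: List[str],
                      log_costs: Optional[Dict[str, float]] = None,
                      model_costs: Optional[Dict[str, float]] = None) -> Tuple[float, List[Move]]:
    log_costs = log_costs or {}
    model_costs = model_costs or {}
    n, m = len(trace), len(model_run)
    # cost[i][j] = cost of the cheapest alignment of trace[:i] with model_run[:j]
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # log move on trace[i-1]
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + log_costs.get(trace[i - 1], 1.0))
            if j > 0:  # model move on model_run[j-1]
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + model_costs.get(model_run[j - 1], 1.0))
            if i > 0 and j > 0 and trace[i - 1] == model_run[j - 1]:  # synchronous move, cost 0
                cost[i][j] = min(cost[i][j], cost[i - 1][j - 1])
    # Walk back from the end to recover one alignment that achieves the optimal cost.
    moves: List[Move] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and trace[i - 1] == model_run[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            moves.append((trace[i - 1], model_run[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and cost[i][j] == cost[i][j - 1] + model_costs.get(model_run[j - 1], 1.0):
            moves.append((None, model_run[j - 1]))
            j -= 1
        else:
            moves.append((trace[i - 1], None))
            i -= 1
    moves.reverse()
    return cost[n][m], moves


# A participant answers Q1 to Q3 of a five-question sequential questionnaire.
cost, moves = optimal_alignment(["Q1", "Q2", "Q3"], ["Q1", "Q2", "Q3", "Q4", "Q5"])
# Two questions answered in swapped order: several alignments share the optimal cost.
cost2, moves2 = optimal_alignment(["Q2", "Q1"], ["Q1", "Q2"])
```

The second call illustrates that optimal alignments need not be unique: for the trace Q2, Q1, both "log move Q2, synchronous Q1, model move Q2" and "model move Q1, synchronous Q2, log move Q1" cost 2, and the sketch returns only one of them.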

6.3.1 Alignments in the Context of Data Collection Scenarios

When using a version of ProM newer than version 5.2, the conformance checking tool automatically uses Alignments instead of the Token Replay Algorithm. With the same event log and process model as in Section 6.2, the result differs this time. Figure 6.5 displays the result of checking conformance using Alignments instead of the Token Replay Algorithm.

The result presented in Figure 6.5 conveys several pieces of information. It shows the frequency of the different questions through the color of each question in the questionnaire model: a darker color represents a higher frequency. Some of the questions in Figure 6.5 have a red border, indicating a model move. The green and purple bar at the bottom represents the ratio between model moves and synchronous moves. According to this result, there was a model move at the fourth question, caused by one participant dropping out of the questionnaire. Every subsequent question also shows a model move, because Alignments do not recognize the abortion of a process instance. Instead, the instance is completed via model moves, which reduces the fitness. The final questions of the questionnaire show 50% model moves (2 out of 4), caused by the two dropouts added to the event log. The fitness can be calculated as well, and in this scenario it is significantly lower than with the Token Replay Algorithm: 77.88% with Alignments compared to 93.55% with the Token Replay Algorithm [92].
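The effect of dropouts on the model-move ratio can be illustrated with a small Python sketch. It assumes a hypothetical, strictly sequential questionnaire with six questions and four instances, two of which drop out (after the third and after the fourth question); these numbers are illustrative and are not the event log behind Figure 6.5. Because an aborted instance is not recognized as such, every unanswered question becomes a model move, and the sketch reports the share of instances with a model move per question. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Move = Tuple[Optional[str], Optional[str]]  # (log activity, model activity); None = no move


def align_dropout(trace: List[str], questions: List[str]) -> List[Move]:
    """Align a trace that answers a prefix of a sequential questionnaire:
    synchronous moves for answered questions, model moves for all remaining ones."""
    if trace != questions[:len(trace)]:
        raise ValueError("trace is not a prefix of the questionnaire")
    return [(q, q) for q in trace] + [(None, q) for q in questions[len(trace):]]


def model_move_ratio(traces: List[List[str]], questions: List[str]) -> Dict[str, float]:
    alignments = [align_dropout(t, questions) for t in traces]
    # Share of instances whose alignment contains a model move on question q
    return {q: sum((None, q) in a for a in alignments) / len(traces) for q in questions}


questions = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
traces = [questions, questions, questions[:3], questions[:4]]  # two completions, two dropouts
ratios = model_move_ratio(traces, questions)
print(ratios["Q4"], ratios["Q6"])  # 0.25 0.5
```

The first dropout produces a model move from Q4 onward, and once both dropouts have occurred, the remaining questions show 50% model moves, matching the pattern described for Figure 6.5.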

Figure 6.5 was created with the ProM plug-in Replay a Log on Petri Nets for Conformance Analysis.

In document Applying Process Mining Algorithms in the Context of Data Collection Scenarios (Page 90-93)