Model selection validation for case study 1

6.9 Model Evaluation

6.9.1 Model selection validation for case study 1

In this section we check the validity of the model selection for this case study against the issues that were the key motivations for our optimisation. The same validation metrics are also applied to the models selected by the Bayesian Information Criterion (BIC), in order to compare the two selection methods.


package. The new visualization function is not used here because the goal is to investigate validation aspects, not to visualize the process models. Hence, the old visualization function is sufficient for this purpose.

Model validation is an important step that follows model selection. As discussed in Chapter 4, three main issues may produce an undesirable abstraction model: highly connected states, which indicate the potential for a higher level of abstraction; similar states; and unimportant states.

We validate model selection against these issues. The ways of quantifying them are given in Chapter 4; a brief reminder follows:

1- The issue of strongly connected states is identified using the graph theory technique for detecting strongly connected components.

2- The issue of multiple similar states is detected using the state type similarity measure, where two states are similar if they have the same state type (for instance simple, composite or complex) and the same event types account for 80% of both states. A list of all state types for this case study and for the case studies discussed in the following chapter is provided in Appendix C. The presence of similar states is scored by counting how many same-type similar states a model has.

3- The issue of unimportant states corresponds to state coverage, which is the percentage of cases that are involved in a state.

The following presents the validation of model selection and explains in detail the methods used to compute the validation metrics. Both proposed optimisation methods, strict and soft, selected fewer states than BIC. The best model under the strict optimisation is the 4-state model and under the soft optimisation the 9-state model, whereas BIC selected a 12-state model as the best model. Figure 6.17 shows the best models: for our optimisation the best model is the one with the maximum optimisation score, whereas for BIC it is the one with the minimum value.

Figure 6.17: Best models of different metrics in case study 1

1- Connected components:

Highly connected states are abundant in the 12-state model selected by BIC, as shown in Figure 6.18(a). Four candidate higher-level abstractions can be detected in this model, where each cluster must contain at least two states.

In contrast, Figure 6.18(b) and (c) show few connected states: the models selected by the soft and strict optimisation contain 2 and 1 clusters of states, respectively.
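The detection step can be illustrated with a minimal sketch. It assumes the model is available as a list of directed state transitions and uses Kosaraju's algorithm as one standard way of finding strongly connected components; the function names and the toy transition list are hypothetical and are not taken from the thesis implementation.

```python
from collections import defaultdict


def strongly_connected_components(transitions):
    """Return the strongly connected components of a directed graph.

    `transitions` is an iterable of (source_state, target_state) pairs.
    Uses Kosaraju's two-pass algorithm with iterative depth-first search.
    """
    graph = defaultdict(list)
    reverse = defaultdict(list)
    nodes = set()
    for src, dst in transitions:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record the order in which the DFS finishes each node.
    visited = set()
    order = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing finish order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        todo = [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(sorted(component))
    return components


def abstraction_candidates(transitions):
    """Keep only clusters with at least two states, as required in the text."""
    return sorted(c for c in strongly_connected_components(transitions)
                  if len(c) >= 2)


# Hypothetical toy model: states 1-2 and 3-4-5 form cycles, state 6 does not.
toy_transitions = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 5), (5, 3), (5, 6)]
candidates = abstraction_candidates(toy_transitions)
```

On this toy input, `candidates` holds two clusters, so the "count" reported for the connected components issue would be 2 under these assumptions.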


(a) model selected by BIC

(b) model selected by soft optimisation

Figure 6.18: Connected components detection in case study 1

2- Similar states:

The model in Figure 6.19(a) has three groups of same-type similar states. Detecting multiple similar states for each of these types gives:

1- Production states (Discharge): states 1, 2, 3 and 10.

2- Simple states (Admission-Elective): states 5, 9 and 11.

3- Composite states (Chemotherapy cycles): states 6 and 7.

The model presented in Figure 6.19(b) also has three groups of same-type similar states:

1- Production states (Discharge): states 4 and 6.

2- Simple states (Admission-Elective): states 3, 5 and 8.

3- Composite states (Chemotherapy cycles): states 7 and 9.

The model presented in Figure 6.19(c) has no similar states: all of its states are constructed from different event types.
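The similarity check can be sketched as follows. This is only one reading of the 80% rule, assuming that each state is described by its state type and by the share of each event type within the state, and that two states are similar when the event types they have in common account for at least 80% of each state. The data layout, the function names, the event types "Lab" and "Imaging" and all shares below are hypothetical illustration values, not results from the case study.

```python
from itertools import combinations


def are_similar(state_a, state_b, threshold=0.8):
    """Same state type, and shared event types cover >= threshold of both states."""
    if state_a["type"] != state_b["type"]:
        return False
    shared = set(state_a["events"]) & set(state_b["events"])
    share_a = sum(state_a["events"][e] for e in shared)
    share_b = sum(state_b["events"][e] for e in shared)
    return share_a >= threshold and share_b >= threshold


def similar_groups(states, threshold=0.8):
    """Group state ids that are linked by the similarity relation.

    Groups are the connected components of the pairwise similarity relation,
    built with a small union-find; only groups of two or more states are kept.
    """
    parent = {sid: sid for sid in states}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(states, 2):
        if are_similar(states[a], states[b], threshold):
            parent[find(a)] = find(b)

    groups = {}
    for sid in states:
        groups.setdefault(find(sid), []).append(sid)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


# Hypothetical toy states: event shares are fractions of each state's events.
toy_states = {
    1: {"type": "simple", "events": {"Admission-Elective": 0.9, "Lab": 0.1}},
    2: {"type": "simple", "events": {"Admission-Elective": 0.85, "Imaging": 0.15}},
    3: {"type": "composite", "events": {"Chemotherapy": 0.6, "Lab": 0.4}},
    4: {"type": "composite", "events": {"Chemotherapy": 0.5, "Lab": 0.5}},
    5: {"type": "simple", "events": {"Discharge": 1.0}},
}
groups = similar_groups(toy_states)
```

Under these assumptions the score for the similar states issue would be the number of groups returned, here one simple group and one composite group.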

(a) similar states model selected by BIC

(b) similar states of soft optimisation model

Figure 6.19: Similar states detection in case study 1 (state numbering starts from left to right)

3- Unimportant states:

Based on the state importance percentages reported in Table 6.4, we use a threshold of 50% to decide whether a state is important. The threshold can be adjusted to user preference. As expected, all states in the model selected by the strict optimisation were significant, whereas the soft model contained 4 non-significant states. It should be noted that, as discussed in Chapter 5, the issue of unimportant states might be related to bad model initialization, which cannot be addressed by the optimisation.
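The coverage check can be sketched as below, assuming each case is available as the sequence of states it passes through and that a state counts once per case regardless of how often the case visits it. The function names and the toy case paths are hypothetical; the 50% default mirrors the threshold used in this section.

```python
def state_coverage(case_paths):
    """Return, per state, the percentage of cases that visit it at least once."""
    total = len(case_paths)
    counts = {}
    for path in case_paths:
        # set() ensures a state is counted once per case.
        for state in set(path):
            counts[state] = counts.get(state, 0) + 1
    return {state: 100.0 * n / total for state, n in counts.items()}


def unimportant_states(case_paths, threshold=50.0):
    """States whose coverage falls strictly below the threshold (in percent)."""
    coverage = state_coverage(case_paths)
    return sorted(s for s, pct in coverage.items() if pct < threshold)


# Hypothetical toy data: four cases, each a sequence of visited state ids.
toy_paths = [[1, 2, 3], [1, 3], [1, 2, 2, 3], [1, 4, 3]]
coverage = state_coverage(toy_paths)
flagged = unimportant_states(toy_paths)
```

In this toy example state 4 appears in one of four cases and is flagged, while state 2 sits exactly at 50% and is not, because the comparison is strict, matching the "< 50%" label in Table 6.6.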

A summary of the model selection validation metrics is presented in Table 6.6. The three issues are clearly more likely to be found in the model selected by BIC, whereas they are hardly observed in the model selected by the strict optimisation. Although the soft model and the BIC model have the same number of similar-state groups, the number of states per similar type in the BIC model is equal to or higher than in the soft model. For example, the BIC model has 4 states of production type (Discharge), whereas the soft model has only 2. Also, the soft model scores better than the BIC model on all the proposed criteria.

Table 6.6: Validation metrics of case study 1

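| Issue | Strict optimisation: found | Strict optimisation: count | Soft optimisation: found | Soft optimisation: count | BIC: found | BIC: count |
|---|---|---|---|---|---|---|
| Strongly connected components | yes | 1 | yes | 2 | yes | 4 |
| Similar states | no | - | yes | 3 | yes | 3 |
| Unimportant states (< 50%) | no | - | yes | 4 | yes | 3 |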

In document Unsupervised Abstraction for Reducing the Complexity of Healthcare Process Models (Page 150-155)