3.6 Different approaches for event log pre-processing
3.6.1 Aggregation approaches for event log pre-processing
This method aims to improve the sequential order of the events. It comprises two steps: solving batch events and mapping fine-grained events into a main activity.
(a) Event log manipulation: solve batch events
There are some data quality issues in the MIMIC-III data, such as a lack of accurate timestamps. This issue may result from batched events. Batch processing is the execution of several events at once and recording them with the same time, for example a group of laboratory results received at the same time. Batch processing also produces a huge number of fine-grained events that increase process model complexity. In our data model, the tables Chart-events and Lab-events contain a large number of batch events, which should be addressed as a preliminary step for mining patient pathways.
Each patient in the ICU is checked on a regular basis at varying intervals. The different measurements taken in each check are recorded with the same time. For process mining purposes, we focus on the process of charted observations regardless of which items are checked; therefore, all items are consolidated into a single charted event. Our hypothesis is that handling batched events as a single event simplifies the process model and improves process mining quality.
This problem is addressed in the extraction stage. Batched events are re-extracted with the same event label. The extraction covers tables that contain batched events, such as chart-event and lab-event. More precisely, the different chart measurements in the chart-event table, such as Calcium, Glucose and Platelet count, are all extracted under the name Chart event. An SQL example for re-extraction of batched events is shown in Figure 3.7.
Figure 3.7: An SQL example for avoiding batch events
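As an illustration of the consolidation described above, the following is a minimal sketch and not the query used in this work. It assumes a simplified table named chartevents with the hypothetical columns hadm_id, item and charttime, and uses an in-memory SQLite database with made-up rows. Selecting a constant label together with DISTINCT collapses all measurements charted at the same time into a single Chart event.

```python
import sqlite3


def consolidate_chart_events(rows):
    """Collapse measurements charted at the same time into one 'Chart event'.

    rows: iterable of (hadm_id, item, charttime) tuples (assumed layout).
    Returns a list of (case_id, activity, timestamp) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chartevents (hadm_id INTEGER, item TEXT, charttime TEXT)"
    )
    conn.executemany("INSERT INTO chartevents VALUES (?, ?, ?)", rows)
    # The item name is dropped on purpose: every row gets the same label,
    # so DISTINCT keeps one event per case and timestamp.
    query = """
        SELECT DISTINCT hadm_id       AS case_id,
                        'Chart event' AS activity,
                        charttime     AS timestamp
        FROM chartevents
        ORDER BY case_id, timestamp
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up example: three measurements share one charting time.
rows = [
    (1, "Calcium", "2101-10-20 08:00:00"),
    (1, "Glucose", "2101-10-20 08:00:00"),
    (1, "Platelet count", "2101-10-20 08:00:00"),
    (1, "Glucose", "2101-10-20 12:00:00"),
]
events = consolidate_chart_events(rows)
```

In this example, four recorded measurements become two Chart events, one per charting time.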
This method significantly reduced the number of events, which in turn reduced model complexity. It should be noted that reducing the number of events in this way does not lead to significant information loss for the purpose of this research. We believe that, from a process mining perspective, the exact name of each measurement in the ICU is less important when the aim is to mine the general abstracted process model. We are still able to capture the events that occurred in the chart-event and lab-event tables.
Although this method reduces the number of activities and events, the variation in patient pathways is still extremely high and the event log needs further manipulation.
(b) Event log manipulation: mapping fine-grained events into main activity
Another data quality issue in MIMIC-III is the differing levels of granularity of recorded events. The relation between these events can be represented as ontological events which have a semantic
relation with a main activity. For example, an admission activity can have a number of events where the patient may have been admitted into different wards such as the Medical Intensive Care Unit (MICU) or the Coronary Care Unit (CCU). Our hypothesis is that mapping fine-grained events into a main activity will simplify the patient pathway model and reduce the number of events, which helps in finding interesting patterns.
Using our data model, the categories of fine-grained events are relatively limited for some tables. Ontological events are located in the Admissions and Transfer tables. Mapping the fine-grained events into a main activity was done using the Add Mapping of Activity Names log enhancement filter in ProM. The events are mapped into main activities as illustrated in Table 3.3.
Table 3.3: Mapping ontological events
Ontological events Mapped activity
admit CCU Admission
admit CSRU Admission
admit MICU Admission
admit SICU Admission
admit TSICU Admission
transfer CCU Transfer
transfer CSRU Transfer
transfer MICU Transfer
transfer SICU Transfer
transfer TSICU Transfer
The results of this experiment show that the number of different types of activities was reduced by nearly half compared with the previous processing step. The number of events was also reduced, and consequently the mean number of events per case was reduced as well.
On the other hand, the number of process variations remained high and was not affected by mapping fine-grained events.
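The mapping of ontological events onto main activities can be sketched in a few lines of Python. This is only an analogue of the relabelling step, not the ProM filter itself. The function names, the event tuple layout (case, activity, timestamp) and the sample log are assumptions made for illustration; the mapping entries follow Table 3.3.

```python
# Mapping built from the ward names in Table 3.3.
WARDS = ("CCU", "CSRU", "MICU", "SICU", "TSICU")
ACTIVITY_MAP = {}
for ward in WARDS:
    ACTIVITY_MAP[f"admit {ward}"] = "Admission"
    ACTIVITY_MAP[f"transfer {ward}"] = "Transfer"


def map_activities(log):
    """Relabel fine-grained events; labels not in the map pass through unchanged."""
    return [
        (case_id, ACTIVITY_MAP.get(activity, activity), timestamp)
        for case_id, activity, timestamp in log
    ]


def distinct_activities(log):
    """Return the set of activity labels that appear in the log."""
    return {activity for _, activity, _ in log}


# Made-up sample log with (case_id, activity, timestamp) tuples.
log = [
    (1, "admit MICU", "2101-10-20 09:15:00"),
    (1, "Chart event", "2101-10-20 10:00:00"),
    (1, "transfer CCU", "2101-10-21 14:30:00"),
    (2, "admit SICU", "2101-11-02 22:05:00"),
]
mapped = map_activities(log)
```

Comparing distinct_activities(log) with distinct_activities(mapped) shows the reduction in activity types, while the number of cases and their ordering are left untouched.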
3.6.2 Temporal approaches for event log pre-processing
Outlier events can be defined as events that prevent clear patterns from being captured; such events affect the quality of process mining efforts. Repeated events, which are also known as duplicate tasks, occur when the same event type has been executed multiple times in the same case. In critical care, for example, the incidence of repeated events is high because events include periodic monitoring (known as charting) of heart rate, blood pressure and other vital signs.
This method aims to improve the temporal aspect of the events. Healthcare events have three temporal aspects: the recorded time resolution, the event duration and the event interval. The following subsections discuss these aspects and how to tackle them in detail.
(a) Recorded time resolution
An event can be stored with different time resolutions. It could be stored with a timestamp that shows both date and time, or the date only. Timestamps can also be stored with hours, minutes and seconds. This is a data quality issue that has a strong impact on the quality of process models, as mentioned by [95].
For instance, mining the process of a group of events with inconsistent temporal resolution can produce a misleading process model. This is because inconsistent temporal resolution may change the actual order of the events. In MIMIC-III, for example, the Prescription event is stored with the date only, while other event types are stored at a finer resolution that includes hours, minutes and seconds. Therefore, the process model will place the Prescription event at an inaccurate position in the sequence.
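The effect of mixed resolutions on event order can be illustrated with a small sketch. It assumes that a date-only value is interpreted as midnight of that day, which is a common default but not necessarily how every tool treats it. The function names and the sample events are hypothetical.

```python
from datetime import datetime


def parse_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' or a date-only 'YYYY-MM-DD'.

    A date-only value is read as midnight (00:00:00) of that day.
    """
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {text!r}")


def order_events(events):
    """Return activity names sorted by their parsed timestamps."""
    ordered = sorted(events, key=lambda event: parse_timestamp(event[1]))
    return [name for name, _ in ordered]


# Made-up events from one day; the prescription has no time of day.
events = [
    ("Admission", "2101-10-20 09:15:00"),
    ("Chart event", "2101-10-20 10:00:00"),
    ("Prescription", "2101-10-20"),
]
ordering = order_events(events)
```

Under this assumption the Prescription event is sorted before the Admission event of the same day, regardless of when the prescription was actually made.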
Recovering the accurate time for Prescription events from the MIMIC-III database is not possible, because the storage schema of the MIMIC-III database defines the field for a Prescription event in a date-only format.
(b) Duration of care activity
The duration of an activity can be defined as the time elapsed from the start of the activity to its end. It is a feature of an activity. Some papers refer to it as execution time [97].
In MIMIC-III there is another category of fine-grained events besides ontological events: transactional events. A transactional event is an event that provides information about the duration of an activity, that is, when it starts, is updated, is commented on and finishes.
This type of event is very common in healthcare processes. For example, the process of transferring a patient inside a hospital starts when a nurse creates a call for transfer; the call might be updated or cancelled, then the call should be acknowledged and the outcome should be recorded.
Transactional events are located in the Call and Input tables. Mapping these fine-grained events into the main activity was done using the Add Mapping of Activity Names log enhancement filter in ProM, as presented in Table 3.4.
Table 3.4: Mapping transactional events
Transactional events Mapped activity
call create Call
call update Call
call acknowledge Call
call outcome Call
call first reservation Call
call current reservation Call
input start Input
input store Input
input comment Input
input end Input
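Once transactional events are grouped under a main activity, the duration of that activity can be derived from them. The following is a minimal sketch under the assumption that the earliest and latest transactional events of a group mark the start and end of the activity. The function name, the event layout and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def activity_duration(events, main_activity):
    """Return (start, end, duration) for one main activity in a single case.

    events: list of (label, timestamp) tuples, where transactional labels
    look like 'call create' or 'input end' (lower-case activity prefix).
    Returns None when no event belongs to the requested activity.
    """
    prefix = main_activity.lower() + " "
    times = [
        datetime.strptime(timestamp, TIME_FORMAT)
        for label, timestamp in events
        if label.startswith(prefix)
    ]
    if not times:
        return None
    start, end = min(times), max(times)
    return start, end, end - start


# Made-up transactional events for one patient transfer call.
events = [
    ("call create", "2101-10-20 10:00:00"),
    ("call update", "2101-10-20 10:05:00"),
    ("call acknowledge", "2101-10-20 10:12:00"),
    ("call outcome", "2101-10-20 10:30:00"),
]
start, end, duration = activity_duration(events, "Call")
```

Here the four call events collapse into one Call activity that lasts 30 minutes, so the duration is kept as an attribute instead of being spread over several events.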
(c) Interval of care event
The interval of a care event can be defined as the time gap between two events. Interval time is a feature of an event, and there are different ways to leverage it. In the following section, we investigate the potential of using the interval between events of the same type to reduce model complexity.
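The interval between events of the same type can be computed as sketched below. This is an illustrative example only; the function name, the (case, activity, timestamp) layout and the sample log are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def same_type_intervals(log):
    """Map (case_id, activity) to the gaps between consecutive occurrences.

    log: iterable of (case_id, activity, timestamp) tuples.
    An activity that occurs only once in a case gets an empty list.
    """
    grouped = defaultdict(list)
    for case_id, activity, timestamp in log:
        grouped[(case_id, activity)].append(
            datetime.strptime(timestamp, TIME_FORMAT)
        )
    intervals = {}
    for key, times in grouped.items():
        times.sort()
        # Pair each occurrence with the next one and take the difference.
        intervals[key] = [later - earlier for earlier, later in zip(times, times[1:])]
    return intervals


# Made-up log: repeated charting within one case.
log = [
    (1, "Admission", "2101-10-20 07:30:00"),
    (1, "Chart event", "2101-10-20 08:00:00"),
    (1, "Chart event", "2101-10-20 09:00:00"),
    (1, "Chart event", "2101-10-20 11:00:00"),
]
gaps = same_type_intervals(log)
```

For the sample log, the three Chart events yield gaps of one hour and two hours, while the single Admission event has no interval.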