Performance Analysis - Realizing a Process Cube Allowing for the Comparison of Event Data

In this section the performance of the PROCUBE system with respect to loading and unloading operations is analysed. Loading time affects the productivity of the system only once, when the event log data is loaded into the databases, whereas the unloading operation may be performed many times, i.e., whenever a process mining technique is applied to the events in the cube (possibly a subcube). The time required by these operations has to be small enough to guarantee adequate user interaction with the tool. In what follows, the PROCUBE tool is subjected to three tests.

Test 1. For the first test, subsets of the WABO1 event log are loaded into and unloaded from the database. These subsets contain 160, 338, 687, 1368, 2732, 5505, 11061, and 22130 events; the last sublog is the entire WABO1 event log. The loading and unloading speed is assessed for each sublog in four distinct configurations of the in-memory database: 2D, with dimensions TRACE parts and EVENT timestamp; 3D, which adds EVENT orgEXT resources to the 2D dimensions; 4D, which adds EVENT created to the 3D dimensions; and 5D, which adds TRACE termName to the 4D dimensions. This test illustrates how the loading and unloading times depend on the number of dimensions for a typical selection of dimensions.

Test 2. The second test illustrates the effects of sparse dimensions on the loading and unloading performance. This test is performed on two 2D configurations and follows the methodology of Test 1. The dimensions of these two cubes are summarized in Table 6.1.

Cube            Dimension                Nr. of members
Low sparsity    TRACE termName           12
Low sparsity    EVENT orgEXT resources   20
High sparsity   EVENT taskDescription    73
High sparsity   EVENT conceptEXT name    692

Table 6.1: Summary of dimensions for the 2D cubes in Test 2.

Test 3. For the last test, the WABO1 event log is split into several non-overlapping sublogs and the total unloading time of these sublogs is compared to the unloading time of the entire WABO1 event log. This test illustrates that filtering and the extraction of sublogs do not incur any additional penalty on the unloading time.

Figure 6.7: Loading times for Test 1 (Time (s) against Nr. of events, both axes logarithmic; series: 2D load, 3D load, 4D load, 5D load).

Test 1

Let us begin by showing the loading times for this test setup in Figure 6.7. Although both axes of the figure are logarithmic, it is easy to see that the loading time increases linearly with the number of events in the log. Moreover, the loading time is practically independent of the number of cube dimensions. The latter remark suggests that the loading time per dimension is about the same for the relational database and the in-memory database, i.e., if one of the dimensions is moved from the relational database to the cube, the loading time does not change. In addition, loading involves just one constant set of operations per event and is therefore independent of the number of dimensions in the created cube. Of course, the amount of memory used for the cube increases with the number of dimensions.
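Reading linear growth off a log-log plot amounts to checking that the curve has a slope of about 1. The following minimal sketch illustrates this check. The event counts are the sublog sizes from Test 1, but the timing values are synthetic placeholders generated from assumed formulas, not measurements from the PROCUBE tool, and `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Sublog sizes used in Test 1.
events = [160, 338, 687, 1368, 2732, 5505, 11061, 22130]

# Synthetic timings (assumed formulas, NOT measured values).
linear_times = [0.004 * n for n in events]        # t proportional to n
quadratic_times = [1e-6 * n ** 2 for n in events]  # t proportional to n^2

print(round(loglog_slope(events, linear_times), 3))     # slope close to 1
print(round(loglog_slope(events, quadratic_times), 3))  # slope close to 2
```

A slope close to 1 corresponds to linear growth, while a slope clearly above 1 indicates super-linear growth.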

Figure 6.8: Unloading times for Test 1 (Time (s) against Nr. of events, both axes logarithmic; series: 2D, 3D, 4D, 5D).

The situation during unloading is completely different. The unloading time for the same databases is shown in Figure 6.8. The time spent unloading the event log from the database increases considerably with the number of cube dimensions. Unloading time depends heavily on the number of cube cells that contain no events. These empty cells do not affect the loading time into the database, although they consume memory. During unloading the opposite holds: each cell has to be checked, so time is spent on empty cells even though they contribute no information to the resulting log. Generally, the sparsity of a cube grows with the number of dimensions, and so does the number of empty cells. In this case study, unloading an event log with 11061 events takes 27 s for a 2D cube and 688 s for a 5D cube, which illustrates a super-linear increase in the unloading time. A similar tendency can be observed with respect to the number of events in the log: the sparsity of the cube appears to increase at a super-linear rate with the number of events as well.

These observations can be explained intuitively by two facts. First, all dependencies in hyper-cubic structures are multiplicative rather than additive, hence the number of cells, and with it the sparsity, is expected to rise exponentially with the number of dimensions. Second, event logs contain attributes that characterize events very precisely, e.g., the timestamp or the name of a resource. Finding two events that happen at exactly the same time is unlikely, and hardly any resource is engaged in all activities. Hence, owing to this precision of event logs, sparsity is unavoidable when a process cube is constructed, and in typical situations the unloading time rises exponentially with the number of dimensions and super-linearly with the number of events.
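The multiplicative growth of the cell count can be made concrete with a small sketch. The code below is a simplified illustration under stated assumptions, not the PROCUBE implementation: the cube is modelled as a dictionary from coordinates to events, the toy log and its attribute names (`case`, `resource`, `day`, `task`) are synthetic, and `build_cube` and `unload` are hypothetical helpers. Unloading visits every coordinate in the Cartesian product of the dimension members and counts how many cells are visited and how many actually hold events.

```python
from collections import defaultdict
from itertools import product

def build_cube(events, dims):
    """Map each cell (tuple of dimension values) to the events it holds."""
    cells = defaultdict(list)
    for ev in events:
        cells[tuple(ev[d] for d in dims)].append(ev)
    return cells

def unload(events, dims):
    """Visit every cell of the cube; return (events, visited, non_empty)."""
    cells = build_cube(events, dims)
    members = [sorted({ev[d] for ev in events}) for d in dims]
    out, visited, non_empty = [], 0, 0
    for coord in product(*members):   # one step per cell, empty or not
        visited += 1
        bucket = cells.get(coord)
        if not bucket:                # empty cell: checked, nothing collected
            continue
        non_empty += 1
        out.extend(bucket)
    return out, visited, non_empty

# Synthetic toy log (hypothetical attributes, not WABO1 data).
log = [
    {"case": f"c{i % 5}", "resource": f"r{i % 4}", "day": i % 7, "task": f"t{i % 3}"}
    for i in range(20)
]

for dims in (["case", "resource"],
             ["case", "resource", "day"],
             ["case", "resource", "day", "task"]):
    out, visited, non_empty = unload(log, dims)
    sparsity = 1 - non_empty / visited
    print(len(dims), visited, non_empty, round(sparsity, 3))
```

In this toy log the number of events stays at 20 while the number of visited cells grows from 20 to 140 to 420 as dimensions are added, so an increasing share of the traversal is spent on cells that contribute nothing to the result.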

Test 2

As mentioned previously, this test compares the loading and unloading times of cube configurations with different levels of sparsity.

Figure 6.9: Loading times for Test 2 (Time (s) against Nr. of events, logarithmic axes; series: non-sparse, sparse).

It can be seen in Figure 6.9 that the loading time does not vary much between the two cubes; the sparser cube takes only slightly longer to load. This behavior is expected and was explained in the discussion of Test 1. The results of Test 1 show that unloading time depends heavily on the number of in-memory dimensions and the number of events. However, the unloading time also depends on the sparsity of the cube. The unloading times for the two cube configurations, which have the same number of events and dimensions but different sparsity, are shown in Figure 6.10. Observe that, for the entire WABO1 event log, the unloading times of the higher and lower sparsity cubes differ by more than a factor of 10.

Figure 6.10: Unloading times for Test 2 (Time (s) against Nr. of events, logarithmic axes; series: non-sparse, sparse).

One might expect a larger difference, as the ratio between the number of cells in the two cubes is about 191, i.e., the 73 × 629 cells of the sparse cube divided by the 12 × 20 cells of the non-sparse cube, where 73, 629, 12 and 20 are the numbers of members of the cubes' dimensions. Although all cells have to be visited while unloading the event log, the hybrid nature of the database prevents a huge increase in the required time. The processing time for an empty cell is considerably lower than for a cell with events: if an empty cell is detected, no query is issued to the relational database and the algorithm moves on to the next cell. Hence, although the number of cells increases 191-fold, the overall computational load increases only about 10-fold.

Test 3

For the purpose of this test, the WABO1 event log with 22,130 events was loaded with the following two dimensions: EVENT timestamp and TRACE caseStatus. Furthermore, the drill-down operation is applied along the timestamp dimension.
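As a rough illustration of how a drill-down along the timestamp dimension yields non-overlapping sublogs, the sketch below groups events into year-level cells matching the cell names reported in Table 6.2. It is a simplified model, not the PROCUBE implementation: the toy events are synthetic, the helper names `year_cell` and `drill_down_by_year` are hypothetical, and treating NO VALUE as the cell for events without a timestamp is an assumption.

```python
from collections import defaultdict
from datetime import datetime

def year_cell(event):
    """Cell name for an event: its year, or 'NO VALUE' (assumed meaning:
    the event carries no timestamp)."""
    ts = event.get("timestamp")
    return "NO VALUE" if ts is None else str(ts.year)

def drill_down_by_year(events):
    """Partition events into year-level cells of the timestamp dimension."""
    cells = defaultdict(list)
    for ev in events:
        cells[year_cell(ev)].append(ev)
    return dict(cells)

# Synthetic toy events (not taken from the WABO1 log).
log = [
    {"id": 1, "timestamp": datetime(2010, 11, 3, 9, 30)},
    {"id": 2, "timestamp": datetime(2011, 2, 14, 16, 5)},
    {"id": 3, "timestamp": datetime(2011, 7, 1, 8, 0)},
    {"id": 4, "timestamp": datetime(2012, 1, 20, 12, 45)},
    {"id": 5, "timestamp": None},
]

cells = drill_down_by_year(log)
for name in sorted(cells):
    print(name, [ev["id"] for ev in cells[name]])

# Every event lands in exactly one cell, so the sublogs do not overlap
# and together cover the whole log.
assert sum(len(v) for v in cells.values()) == len(log)
```

Because each event is assigned to exactly one cell, unloading the cells one by one touches the same events as unloading the whole log, which is what makes the timing comparison in this test meaningful.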

Cell name         All EVENTS   NO VALUE   2010   2011   2012   SUM
Unload time (s)   61.9         0.001      4.4    32.5   26.3   63.2

Table 6.2: Summary of the unload times for Test 3.

Table 6.2 gives the unloading time for each cell in the visualization table. The column SUM is the sum of all columns except All EVENTS. Observe that the time to unload the entire WABO1 event log from the database is only marginally lower than the cumulative time required for its separate components. This result shows that the filtering operation does not incur any significant performance penalty on the developed database structure. Applying the same operation to event data stored in the relational database would require complex queries and, as such, would slow down the process. Fast filtering along the process cube dimensions is thus demonstrated, and it represents a benefit of multidimensional database technologies.
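The comparison can be checked with a few lines of arithmetic on the values in Table 6.2. The variable names below are illustrative only.

```python
# Unload times in seconds, as reported in Table 6.2.
unload_time_s = {"NO VALUE": 0.001, "2010": 4.4, "2011": 32.5, "2012": 26.3}
all_events_s = 61.9

total = sum(unload_time_s.values())   # cumulative time of the separate cells
overhead = total - all_events_s       # extra time compared with one full unload

print(round(total, 3))                          # 63.201, reported as 63.2
print(round(overhead, 3))                       # 1.301
print(round(overhead / all_events_s * 100, 1))  # 2.1 (percent)
```

Unloading the cells separately thus costs about 1.3 s, or roughly 2%, more than unloading the entire log at once.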
