In Section 2.2.1, the XES meta-model was presented. Of all the elements of the XES structure, attributes are the most relevant when a multidimensional structure is used for analysis. Case and event attributes, together with their corresponding members, are used to create the dimensions of a hypercube. Therefore, they have to be loaded into the Palo in-memory database so that they can be accessed easily when the process cube is created. As discussed in the previous section, due to sparsity issues, the user is asked to choose a smaller set of attributes to be used as dimensions in the process cube. The remaining attributes are stored in relational databases (RDB), as explained in Section 5.2.
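To make the sparsity argument concrete, the following sketch (not part of the thesis implementation) measures the fraction of cube cells that would actually hold events for a candidate set of dimensions. It assumes that events are plain Python dicts keyed by attribute name and that the members of a dimension are the values observed in the log. The function name `cube_density` is hypothetical.

```python
from math import prod


def cube_density(events, dimensions):
    """Return the fraction of cube cells that contain at least one event.

    events: list of dicts, attribute name -> value (illustrative representation)
    dimensions: attribute names chosen as cube dimensions
    """
    # Number of cells = product of the number of distinct members per dimension.
    total_cells = prod(len({e[d] for e in events}) for d in dimensions)
    # A cell is non-empty if some event has exactly that combination of members.
    non_empty = len({tuple(e[d] for d in dimensions) for e in events})
    return non_empty / total_cells


events = [
    {"case": "c1", "activity": "A", "resource": "X"},
    {"case": "c2", "activity": "B", "resource": "Y"},
    {"case": "c3", "activity": "A", "resource": "Z"},
]
print(cube_density(events, ["activity", "resource"]))          # 3 of 6 cells -> 0.5
print(cube_density(events, ["case", "activity", "resource"]))  # 3 of 18 cells
```

Every added dimension multiplies the number of cells while the number of events stays the same. This is why only a subset of the attributes is turned into cube dimensions.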
Besides traces, events and their attributes, the log also keeps information about the classifiers, the extensions and the global attributes. Although these elements are unnecessary for OLAP operations, they are indispensable for reconstructing the event log. Therefore, they are stored separately in RDB tables and used later for unloading.
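A minimal sketch of keeping these reconstruction-only elements apart from the cube, keyed by a per-log id, is shown here. It assumes SQLite as a stand-in for the relational database and uses a hypothetical single-table schema (`log_metadata` with one row per classifier, extension or global). The function names and the example values are illustrative.

```python
import sqlite3
import uuid


def store_log_metadata(conn, classifiers, extensions, global_attributes):
    """Store reconstruction-only log elements under a fresh log id (sketch).

    Hypothetical single-table schema: one row per (log_id, kind, name, value).
    """
    log_id = str(uuid.uuid4())
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_metadata "
        "(log_id TEXT, kind TEXT, name TEXT, value TEXT)"
    )
    rows = [(log_id, "classifier", n, v) for n, v in classifiers.items()]
    rows += [(log_id, "extension", n, v) for n, v in extensions.items()]
    rows += [(log_id, "global", n, v) for n, v in global_attributes.items()]
    conn.executemany("INSERT INTO log_metadata VALUES (?, ?, ?, ?)", rows)
    return log_id


def load_log_metadata(conn, log_id):
    """Read back the classifiers, extensions and globals of one log."""
    result = {"classifier": {}, "extension": {}, "global": {}}
    cur = conn.execute(
        "SELECT kind, name, value FROM log_metadata WHERE log_id = ?", (log_id,)
    )
    for kind, name, value in cur:
        result[kind][name] = value
    return result


conn = sqlite3.connect(":memory:")
log_a = store_log_metadata(
    conn,
    {"Activity": "concept:name"},
    {"Concept": "http://www.xes-standard.org/concept.xesext"},
    {"concept:name": "UNKNOWN"},
)
log_b = store_log_metadata(conn, {"Resource": "org:resource"}, {}, {})
print(load_log_metadata(conn, log_a)["classifier"])  # {'Activity': 'concept:name'}
```

Two logs share the same table. The log id alone decides which rows belong to which log.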
Loading an event log into the databases consists of two steps. First, a special tree structure is created from the event data to facilitate the construction of the process cube. Second, this structure is used to build the process cube and to store parts of the event data in RDB in an easy-to-access manner. We use pseudocode to present both steps.
Algorithm Parsing(log)
1. ▷ log is the event log read from the file
2. Create a log id that uniquely identifies the log
3. Create tables in the RDB with the attributes of the log, the classifiers, the extensions and the globals
4. ▷ rootNode is the root node of a tree structure
5. ▷ eventCoordinates is a list of attribute values for all events in the log
6. Determine the number of traces in the log (nt)
7. for i ← 1 to nt
8.   do traces[i] ← log.getTraces()[i];
9.      rootNode.addNodes(traces[i].getAttributes());
10.     Determine the number of events in traces[i] (ne)
11.     for j ← 1 to ne
12.       do eventCoordinates ← NULL;
13.          events[j] ← traces[i].getEvents()[j];
14.          rootNode.addNodes(events[j].getAttributes());
15.          eventCoordinates.setEvent(log id, traces[i].getAttributes(),
16.                                    events[j].getAttributes());
17. return rootNode, eventCoordinates
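The following Python sketch mirrors the Parsing algorithm under simplifying assumptions. The log is taken to be an in-memory dict (this is not the OpenXES API), and the tree is reduced to its first two levels, attribute name and values, so time hierarchies are left out. Each event appends one coordinate record. The names `parse` and `add_nodes` and the `trace:`/`event:` prefixes are hypothetical.

```python
import itertools


def add_nodes(root, attributes):
    """Two-level tree: attribute name (first level) -> set of values (second level)."""
    for name, value in attributes.items():
        root.setdefault(name, set()).add(value)


def parse(log, log_id):
    """Build the attribute tree and the per-event coordinates of a log (sketch).

    Assumed, illustrative log shape (not the OpenXES API):
        {"traces": [{"attributes": {...}, "events": [{...}, ...]}, ...]}
    Attribute names are prefixed with "trace:" or "event:" so that a trace
    attribute and an event attribute with the same key stay distinct.
    """
    root = {}
    event_coordinates = []
    next_event_id = itertools.count(1)  # global event id, also records log order
    for trace_id, trace in enumerate(log["traces"], start=1):
        trace_attrs = {"trace:" + k: v for k, v in trace["attributes"].items()}
        add_nodes(root, trace_attrs)
        for event in trace["events"]:
            event_attrs = {"event:" + k: v for k, v in event.items()}
            add_nodes(root, event_attrs)
            # Coordinates: log id, trace id + trace attributes, event id + event attributes.
            event_coordinates.append(
                {"log_id": log_id, "trace_id": trace_id,
                 "event_id": next(next_event_id), **trace_attrs, **event_attrs}
            )
    return root, event_coordinates


log = {"traces": [
    {"attributes": {"concept:name": "case1"},
     "events": [{"concept:name": "A"}, {"concept:name": "B"}]},
    {"attributes": {"concept:name": "case2"},
     "events": [{"concept:name": "A"}]},
]}
root, coords = parse(log, log_id="log-1")
print(sorted(root["event:concept:name"]))  # ['A', 'B']
print(len(coords))                          # 3
```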
In the first step, the classifiers, the extensions and the global attributes are extracted from the XES log structure and loaded into RDB tables. To this end, a log id is assigned to the log. This id distinguishes the classifiers, extensions and global attributes of this log from those of logs that already exist or will be created later. Traces and events, with their attributes, are added to a tree structure whose root element is the rootNode, from which all other nodes of the tree can be reached. Nodes are added to the tree as follows: the first hierarchical level of the tree holds the properties of cases and events, and the next level holds the values of these properties. Further hierarchical levels are also possible. In this project, we implemented hierarchies for time attributes: for a time attribute, years, months and days of the week form the levels of the tree.
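A sketch of the time hierarchy described above inserts each timestamp into nested levels: year, then month, then day of the week. The nested dict/set layout and the function name `add_time_value` are assumptions made for illustration. Fixed English name tuples are used so the output does not depend on the system locale.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday")


def add_time_value(tree, timestamp):
    """Insert a timestamp into a year -> month -> day-of-week hierarchy."""
    months = tree.setdefault(timestamp.year, {})
    days = months.setdefault(MONTHS[timestamp.month - 1], set())
    days.add(WEEKDAYS[timestamp.weekday()])  # weekday(): Monday == 0
    return tree


tree = {}
for ts in (datetime(2013, 3, 11), datetime(2013, 3, 15), datetime(2014, 1, 1)):
    add_time_value(tree, ts)
print(tree[2014])  # {'January': {'Wednesday'}}
```

Because months are nested under years, "March" of 2013 and "March" of another year are separate nodes. This lets the cube roll up from days of the week to months and years.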
Besides the rootNode, a set of event coordinates is determined for each event in lines 15-16 of the Parsing algorithm. Event coordinates contain all the information needed to place an event back in an event log. Since an event is part of a trace and a trace belongs to a log, trace and log information is also included in the event coordinates. Consequently, event coordinates are composed of the log id, the trace id with the corresponding trace attributes, and the event id with the event attributes.
Algorithm Loading(rootNode, eventCoordinates)
1. ▷ Create the process cube PC
2. Determine the number of dimensions nd in the rootNode
3. Allow the user to select a subset Md of all available dimensions
4. for each i ∈ Md
5.   do Di ← rootNode.getChildren(i).getLeafs();
6.      if rootNode.getChildren(i) is a time attribute
7.        then Hi ← createHierarchy(rootNode.getAttribute(i));
8. Create PC with the dimensions Di, i ∈ Md, with unique cell values
9. Determine the total number of events in the log (nte)
10. for i ← 1 to nte
11.   do k ← 0;
12.      columnValues ← NULL;
13.      for j ← 1 to nd
14.        do if j ∈ Md
15.             then k ← k + 1;
16.                  mk ← eventCoordinates.getEvent(i).getAttribute(j);
17.             else columnValues.addAttribute(eventCoordinates.getEvent(i).getAttribute(j));
18.      columnValues.addAttribute(getCell(m1, . . . , mk));
19.      RDB.addRow(columnValues);
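The per-event split performed by the Loading algorithm can be sketched in Python as follows. The dict-based structures, the name `load`, and the use of a Python dict in place of Palo's cell addressing are all assumptions. Here the `cells` mapping plays the role of getCell(m1, . . . , mk).

```python
def load(root, event_coordinates, selected):
    """Split every event into a cube cell and an RDB row (sketch of Loading).

    root: attribute name -> set of values (the attribute tree)
    event_coordinates: one dict per event, attribute name -> value
    selected: attribute names the user chose as cube dimensions (Md)
    Returns (dimensions, cells, rows):
        dimensions: dimension name -> sorted members
        cells: member tuple (m1, ..., mk) -> cell id, only for non-empty cells
        rows: RDB rows holding the remaining attributes plus the cell id
    """
    dims = [name for name in root if name in selected]  # fixed dimension order
    dimensions = {name: sorted(root[name], key=str) for name in dims}
    cells = {}
    rows = []
    for event in event_coordinates:
        members = tuple(event.get(name) for name in dims)
        # Stand-in for getCell(m1, ..., mk): reuse the id of a known cell, else mint one.
        cell_id = cells.setdefault(members, len(cells) + 1)
        row = {k: v for k, v in event.items() if k not in selected}
        row["cell_id"] = cell_id
        rows.append(row)
    return dimensions, cells, rows


root = {"event:concept:name": {"A", "B"}, "event:org:resource": {"Pete", "Sue"}}
coords = [
    {"event_id": 1, "event:concept:name": "A", "event:org:resource": "Pete"},
    {"event_id": 2, "event:concept:name": "B", "event:org:resource": "Sue"},
    {"event_id": 3, "event:concept:name": "A", "event:org:resource": "Sue"},
]
dimensions, cells, rows = load(root, coords, {"event:concept:name"})
print(cells)    # {('A',): 1, ('B',): 2}
print(rows[2])  # {'event_id': 3, 'event:org:resource': 'Sue', 'cell_id': 1}
```

This sketch assigns ids lazily, only to cells that receive at least one event. It does not assume how the actual implementation numbers the cells of PC.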
Once the rootNode and the eventCoordinates are created, they can be used to build the process cube PC. All the trace and event attributes accessible from the rootNode are potential dimensions of the process cube. Due to sparsity issues, the user is allowed to select a subset of them as the actual dimensions of the cube; selecting all of them is also possible. For each chosen dimension, its member elements and, for time attributes, its hierarchy are added in lines 5-7 of the Loading algorithm. After the dimensions are populated with elements, the process cube PC is created from these dimensions. At this point, PC has dimensions and elements but no values in its cells. The eventCoordinates structure provides both the coordinates of each cell and the set of events that belong to it. As explained in Section 5.2, event data cannot be stored directly in a cell, due to cell limitations. Instead, each cell is given a cell id, and the event data not already captured by PC is stored in RDB tables with the cell id as a column. To this end, the members of the PC dimensions are identified in eventCoordinates (line 16) and passed as parameters to the getCell(m1, . . . , mk) function, which identifies the cell (line 18). The attribute values that are not members of PC dimensions are added to the RDB together with the cell id (line 19).
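Storing the non-dimension part of each event with the cell id as a column means that the events behind a cell can be recovered with a single query on that id. The sketch below uses SQLite as a stand-in for the RDB. It assumes an entity-attribute-value table (`event_data`), since the remaining attributes differ from event to event. The table layout and all function names are hypothetical.

```python
import sqlite3


def create_event_table(conn):
    # Hypothetical entity-attribute-value layout, since non-dimension
    # attributes differ from event to event.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_data "
        "(event_id INTEGER, name TEXT, value TEXT, cell_id INTEGER)"
    )


def store_event(conn, event_id, attributes, cell_id):
    """Store the attributes that are not cube dimensions, tagged with the cell id."""
    conn.executemany(
        "INSERT INTO event_data VALUES (?, ?, ?, ?)",
        [(event_id, name, str(value), cell_id) for name, value in attributes.items()],
    )


def events_in_cell(conn, cell_id):
    """Return event id -> {attribute: value} for all events stored under one cell."""
    events = {}
    cur = conn.execute(
        "SELECT event_id, name, value FROM event_data "
        "WHERE cell_id = ? ORDER BY event_id",
        (cell_id,),
    )
    for event_id, name, value in cur:
        events.setdefault(event_id, {})[name] = value
    return events


conn = sqlite3.connect(":memory:")
create_event_table(conn)
store_event(conn, 1, {"org:resource": "Pete"}, cell_id=1)
store_event(conn, 2, {"org:resource": "Sue"}, cell_id=2)
store_event(conn, 3, {"org:resource": "Sue", "cost": 50}, cell_id=1)
print(events_in_cell(conn, 1))
```

Values are stored as text in this sketch. Preserving XES attribute types would need an extra type column.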
Algorithm Unloading(PC)
1. ▷ log is the event log to be created after unloading
2. ▷ trace is a trace of the event log
3. ▷ event is an event of the event log
4. log ← NULL;
5. Add all the classifiers, extensions and globals to the log, from the RDB tables
6. ▷ eventList is a list with the corresponding coordinates of all the events
7. ▷ attributeList is a list with all the attributes corresponding to an event
8. Create the eventList from both PC dimensions and RDB columns
9. Determine the number of events in the eventList (ne)
10. for i ← 1 to ne
11.   do attributeList ← eventList.getEvent(i).getAttributes();
12.      trace ← NULL;
14.      Determine the number of attributes in attributeList (na)
15.      for j ← 1 to na
16.        do attribute ← attributeList.getAttribute(j);
17.           if attribute is a log attribute
18.             then logAttributes.add(attribute);
19.           else if attribute is a trace attribute
20.             then traceAttributes.add(attribute);
21.           else eventAttributes.add(attribute);
22.      event.addAttributes(eventAttributes);
23.      if logAttributes are in log
24.        then if there is a trace with the traceAttributes in log
25.               ▷ k is the position of the trace in log
26.               then log.getTrace(k).add(event);
27.               else trace.addAttributes(traceAttributes);
28.                    trace.add(event);
29.                    log.add(trace);
30.        else trace.addAttributes(traceAttributes);
31.             trace.add(event);
32.             log.addAttributes(logAttributes);
33.             log.add(trace);
34. return log;
Figure 5.1, presented earlier, shows the basic flow of event data in the system. Event data from the event log is loaded into both the Palo and the MySQL databases. At unloading, it can be retrieved from them and used to recreate the initial event log. Although this round trip does not yet add analytical value, it can be used to test whether event data is loaded into, and unloaded from, the relational and OLAP structures correctly. In what follows, we describe the unloading procedure to complete the scenario.
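A round-trip test of this kind needs a notion of "same log". The sketch below is one possible choice, not the thesis's test procedure. It assumes the dict-based log layout `{"attributes": ..., "traces": [{"attributes": ..., "events": [...]}]}` and hashable attribute values. Traces are compared as a multiset, since trace order is not assumed to be preserved, while event order within each trace must match. The names `canonical` and `round_trip_ok` are hypothetical.

```python
from collections import Counter


def canonical(log):
    """Multiset of traces; each trace keeps its attributes and its ordered events."""
    return Counter(
        (frozenset(t["attributes"].items()),
         tuple(frozenset(e.items()) for e in t["events"]))
        for t in log["traces"]
    )


def round_trip_ok(original, rebuilt):
    """True if both logs have the same log attributes, traces and event sequences."""
    return (original.get("attributes", {}) == rebuilt.get("attributes", {})
            and canonical(original) == canonical(rebuilt))


original = {"attributes": {"concept:name": "demo"}, "traces": [
    {"attributes": {"concept:name": "case1"},
     "events": [{"concept:name": "A"}, {"concept:name": "B"}]},
    {"attributes": {"concept:name": "case2"},
     "events": [{"concept:name": "A"}]},
]}
rebuilt = {"attributes": {"concept:name": "demo"},
           "traces": list(reversed(original["traces"]))}
swapped = {"attributes": {"concept:name": "demo"}, "traces": [
    {"attributes": {"concept:name": "case1"},
     "events": [{"concept:name": "B"}, {"concept:name": "A"}]},
    original["traces"][1],
]}
print(round_trip_ok(original, rebuilt))  # True
print(round_trip_ok(original, swapped))  # False
```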
For the Unloading algorithm presented in this thesis, we consider the complete list of events from the initially loaded event log. Nevertheless, this list can be filtered so that only a subset of all events is considered at unloading. In that case, the pseudocode does not change, except that in line 8 the eventList is created from the filtering results.
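As an illustration of the filtered variant, the event list can be restricted to events whose values lie in chosen sets of members, in the spirit of a dice on the cube. The sketch assumes the event list is a list of flat dicts. The function name `filter_event_list` and the `selection` format are hypothetical. The rest of the Unloading algorithm would then run unchanged on the result.

```python
def filter_event_list(event_list, selection):
    """Keep the events whose value for each filtered attribute is among the chosen members.

    selection: attribute name -> set of accepted members (a dice on the cube).
    """
    return [
        event for event in event_list
        if all(event.get(name) in members for name, members in selection.items())
    ]


event_list = [
    {"event_id": 1, "event:concept:name": "A", "event:org:resource": "Pete"},
    {"event_id": 2, "event:concept:name": "B", "event:org:resource": "Sue"},
    {"event_id": 3, "event:concept:name": "A", "event:org:resource": "Sue"},
]
subset = filter_event_list(event_list, {"event:org:resource": {"Sue"}})
print([e["event_id"] for e in subset])  # [2, 3]
```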
First, the initially NULL log is populated with the classifiers, extensions and global attributes from the RDB tables. Then, event data is extracted from both the RDB and the Palo OLAP cube and used to create an eventList structure. The eventList structure is similar to the eventCoordinates structure created in the Parsing algorithm, in the sense that it contains enough information to place events back in an event log. For instance, the event id gives the order of the event in the log. Note that information such as the log id, the trace id and the event id is discarded when the event log is constructed, as it was created at loading and was not initially part of the log.
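Building the eventList amounts to joining the cube side and the relational side on the cell id, then ordering by event id. The sketch assumes that cube cells are available as a mapping from cell id to dimension members and that every RDB row carries `event_id` and `cell_id`. It also shows dropping the identifiers created at loading. All names are hypothetical.

```python
def build_event_list(cells, rows):
    """Merge cube members and RDB columns into one record per event (sketch).

    cells: cell id -> {dimension name: member}, read from the cube
    rows:  RDB rows, each a dict that includes "event_id" and "cell_id"
    """
    event_list = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != "cell_id"}
        record.update(cells[row["cell_id"]])  # attributes that live in the cube
        event_list.append(record)
    event_list.sort(key=lambda r: r["event_id"])  # event id restores the log order
    return event_list


def strip_internal_ids(record, internal=("log_id", "trace_id", "event_id")):
    """Drop identifiers created at loading; they were not part of the original log."""
    return {k: v for k, v in record.items() if k not in internal}


cells = {1: {"event:concept:name": "A"}, 2: {"event:concept:name": "B"}}
rows = [
    {"event_id": 3, "trace_id": 2, "event:org:resource": "Sue", "cell_id": 1},
    {"event_id": 1, "trace_id": 1, "event:org:resource": "Pete", "cell_id": 1},
    {"event_id": 2, "trace_id": 1, "event:org:resource": "Sue", "cell_id": 2},
]
events = build_event_list(cells, rows)
print([e["event_id"] for e in events])  # [1, 2, 3]
print(strip_internal_ids(events[0]))    # {'event:org:resource': 'Pete', 'event:concept:name': 'A'}
```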
The eventList contains three types of attributes: log attributes, trace attributes and event attributes. The event attributes, for instance, are used to create an event, as in line 22. The trace attributes are used to create a trace. However, since a trace may contain multiple events, we check in line 24 whether a trace with the same attributes already exists. The created event is then added either to the existing trace or to a newly created one. A similar check (line 23) is required before adding the log attributes to the log, to avoid repeating data in the new event log.
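The grouping logic of the Unloading algorithm can be sketched as follows. The sketch assumes flat event records whose attribute names carry a `log:`, `trace:` or `event:` prefix and that arrive already ordered by event id. It also assumes hashable values. As in line 24, a trace is identified by its attribute values, and log attributes are written only once. The names `split_by_scope` and `unload` are hypothetical.

```python
INTERNAL_IDS = {"log_id", "trace_id", "event_id"}


def split_by_scope(record):
    """Split a flat record into log, trace and event attributes by name prefix."""
    log_attrs, trace_attrs, event_attrs = {}, {}, {}
    for name, value in record.items():
        if name in INTERNAL_IDS:
            continue  # created at loading, not part of the original log
        scope, _, key = name.partition(":")
        if scope == "log":
            log_attrs[key] = value
        elif scope == "trace":
            trace_attrs[key] = value
        elif scope == "event":
            event_attrs[key] = value
        else:
            event_attrs[name] = value  # unprefixed names default to event level
    return log_attrs, trace_attrs, event_attrs


def unload(event_list):
    """Rebuild traces from an event list ordered by event id (sketch)."""
    log = {"attributes": {}, "traces": []}
    position = {}  # trace attributes -> position k of that trace in the log
    for record in event_list:
        log_attrs, trace_attrs, event_attrs = split_by_scope(record)
        log["attributes"].update(log_attrs)  # each log attribute is stored once
        key = frozenset(trace_attrs.items())
        if key not in position:  # no trace with these attributes yet: create one
            position[key] = len(log["traces"])
            log["traces"].append({"attributes": trace_attrs, "events": []})
        log["traces"][position[key]]["events"].append(event_attrs)
    return log


event_list = [
    {"event_id": 1, "log:concept:name": "demo", "trace:concept:name": "case1", "event:concept:name": "A"},
    {"event_id": 2, "log:concept:name": "demo", "trace:concept:name": "case1", "event:concept:name": "B"},
    {"event_id": 3, "log:concept:name": "demo", "trace:concept:name": "case2", "event:concept:name": "A"},
]
log = unload(event_list)
print(len(log["traces"]))          # 2
print(log["traces"][0]["events"])  # [{'concept:name': 'A'}, {'concept:name': 'B'}]
```

Identifying traces by their attribute values, as in line 24, means that two distinct traces with identical attributes would be merged. Grouping on the trace id before discarding it would avoid this.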