4.2 EA mining conversion method
5.1.1 Business Process Discovery Algorithm
5.1.1.1 Read log file algorithm
Algorithm 1 Read log file
1: procedure ReadLogFile(logPath)
2:   parser = TextParser(logPath)
3:   while parser is not at the end of the file do
4:     arrFields = read current fields from parser
5:     data = new row of logData
6:     data.Id = arrFields[0]
7:     data.Activity = arrFields[2]
8:     insert data into logData
9:   return logData
The first step in implementing the business process discovery algorithm is to read the data from
a log file. The ReadLogFile algorithm (algorithm.1) takes logPath, a Microsoft Windows
path that points to the location of the log file, and produces logData, a table
structured according to the metamodel of the log file (Fig.4.2). The table consists of TraceId,
which clusters events by case; EventId, the unique identifier of an event; Activity, the name of
an activity; Resource, the person who executes an activity; Timestamp, the time at which an activity
is executed; AppService, the application service that accommodates an activity; and AppName, the
name of the application that facilitates an activity. The algorithm first creates a text parser to read the log file. Because the file is structured as a CSV file, the parser must know the delimiter of the file; this implementation uses "," (comma). The
parser then loops over each line of the file and puts its fields into arrFields, an array of strings.
Next, the algorithm inserts the fields as a new row of the table
logData. Only TraceId and Activity are needed by algorithm.2, so only these two fields are stored. Finally, the
ReadLogFile algorithm returns logData.
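The reading step can be sketched in Python as follows. This is a minimal illustration, not the implementation used here: it assumes the column order TraceId, EventId, Activity implied by the indices in algorithm.1, assumes the file has no header row, and takes an already opened file-like object instead of logPath so that the example is self-contained. The function name read_log_file and the sample rows are hypothetical.

```python
import csv
import io


def read_log_file(log_file):
    """Return a list of {'Id', 'Activity'} rows read from a comma-delimited log."""
    log_data = []
    for fields in csv.reader(log_file, delimiter=","):
        if not fields:  # skip blank lines
            continue
        # Assumed column order: TraceId at index 0, Activity at index 2.
        log_data.append({"Id": fields[0], "Activity": fields[2]})
    return log_data


# Hypothetical three-event log: TraceId, EventId, Activity, Resource
sample = io.StringIO(
    "1,e1,Register,Ann\n"
    "1,e2,Approve,Bob\n"
    "2,e3,Register,Ann\n"
)
log_data = read_log_file(sample)
```

To read from disk, the same function could be called as read_log_file(open(log_path, newline="")), where log_path plays the role of logPath.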
5.1.1.2 Business Process Discovery Algorithm
The business process discovery algorithm (algorithm.2) first reads
the log file using the ReadLogFile algorithm (algorithm.1), which produces the logData table; at this point
the table contains only TraceId and Activity. Next, the algorithm builds the sequences
of processes and counts the occurrences of each sequence. It reads each row of logData
and identifies the cluster of each trace (case) using TraceId. In step.10 and step.11 the algorithm
records the id of a new cluster and the first activity of that cluster (actA). If
𝑖𝑑 = 𝑟𝑜𝑤.𝑖𝑑, the current row is still in the same cluster, so the algorithm takes its activity as the
next activity (actB) and creates a key of the form "𝑎𝑐𝑡𝐴 → 𝑎𝑐𝑡𝐵" to identify the sequence of processes (step.15).
Then, in step.16, the algorithm checks whether the key already exists in a dictionary. If it exists, its occurrence count is incremented; if not, the algorithm creates the dictionary (if it has not been initialised before) and adds a new entry with the key and an initial occurrence of 1. The loop continues until all the records in logData are exhausted.
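The counting loop described above can be sketched as follows. It is an illustration under two assumptions that the text does not state explicitly: the rows of logData are grouped by TraceId, and within a trace they appear in execution order. The function name count_sequences and the sample rows are hypothetical.

```python
def count_sequences(log_data):
    """Count how often one activity is directly followed by another within a trace."""
    counts = {}
    current_id = None
    act_a = None
    for row in log_data:
        if row["Id"] != current_id:
            # A new trace starts: remember its id and its first activity.
            current_id = row["Id"]
            act_a = row["Activity"]
        else:
            # Same trace: record the pair "actA -> actB" and move on.
            act_b = row["Activity"]
            key = f"{act_a} -> {act_b}"
            counts[key] = counts.get(key, 0) + 1
            act_a = act_b
    return counts


rows = [
    {"Id": "1", "Activity": "A"},
    {"Id": "1", "Activity": "B"},
    {"Id": "1", "Activity": "C"},
    {"Id": "2", "Activity": "A"},
    {"Id": "2", "Activity": "B"},
]
counts = count_sequences(rows)
# counts == {"A -> B": 2, "B -> C": 1}
```

Note that the trace id is kept outside the loop body, so that it survives from one row to the next; otherwise every row would look like the start of a new cluster.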
In the next block of code, the algorithm creates a frequency table by looping over all entries of the dictionary created before. For each entry, the algorithm saves
the current dictionary entry into dictItem (step.25) and creates a new row for the
result table. A result row consists of source, the source activity; target, the target
activity; frequency, the number of occurrences of the sequence of processes; and dependency, the dependency
factor of the sequence. In each dictionary entry the key is "𝑎𝑐𝑡𝐴 → 𝑎𝑐𝑡𝐵" and the value is the total number of occurrences of that sequence. Thus, the key
is split into rowResult.source (step.27) and rowResult.target (step.28). Lastly, frequency
is populated with the value, the total number of occurrences of the sequence (step.29).
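A minimal sketch of the frequency-table step is shown below. It assumes the dictionary keys use the separator " -> " (with surrounding spaces) and that activity names never contain that separator; the function name build_frequency_table and the sample counts are hypothetical.

```python
def build_frequency_table(counts):
    """Turn {'A -> B': n} entries into rows with source, target and frequency."""
    table = []
    for key, value in counts.items():
        # Split the key back into its two activities.
        source, target = key.split(" -> ")
        table.append({"source": source, "target": target, "frequency": value})
    return table


table = build_frequency_table({"A -> B": 2, "B -> A": 1, "B -> C": 1})
```

Storing the pair as a tuple key such as ("A", "B") would avoid the split altogether; the string key is kept here only to mirror the description above.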
Algorithm 2 Business Process Discovery
1: procedure BusinessProcessDiscovery(logPath)
2:   logData = ReadLogFile(logPath)
3:
4:   // get sequences of processes and their occurrences
5:   id = null
6:   while this is not the end of logData do
7:     row = current row of logData
8:     if id != row.id then
9:       // get the TraceId of the new cluster
10:      id = row.id
11:      actA = row.Activity
12:      actB = null
13:    else
14:      actB = row.Activity
15:      key = actA + ' -> ' + actB
16:      if dict.exists(key) then dict[key] += 1
17:      else dict.add(key, 1)
18:      id = row.id
19:      actA = actB
20:    go to the next row
21:
22:  // create frequency table
23:  resultTable = create new result table
24:  while this is not the end of the dictionary do
25:    dictItem = current dictionary entry
26:    rowResult = new result table row
27:    rowResult.source = dictItem.key.split(' -> ')[0]
28:    rowResult.target = dictItem.key.split(' -> ')[1]
29:    rowResult.frequency = dictItem.value
30:    add rowResult into resultTable
31:    go to the next dictionary entry
32:
33:  // calculate dependency
34:  while this is not the end of the result table do
35:    rowResult = current row
36:    invertRow = get the row from the result table where row.source = rowResult.target and row.target = rowResult.source (treat its frequency as 0 if no such row exists)
37:    dependency = (rowResult.frequency - invertRow.frequency) / (rowResult.frequency + invertRow.frequency + 1)
38:    rowResult.dependency = dependency
39:    go to the next row
40:
41:  return resultTable
In the last block of code, the algorithm calculates the dependency between sequences of
processes. Using the formula given in definition.11, $a \Rightarrow b = \frac{|a > b| - |b > a|}{|a > b| + |b > a| + 1}$, where $|a > b|$ is the number of times $a$ is directly followed by $b$, we
can calculate the dependency factor of a sequence of processes. First, the algorithm
loops over the result table and puts the current row into rowResult. Next, the algorithm
looks up invertRow, the row with source and target swapped: if rowResult is 𝑎 > 𝑏, then invertRow is 𝑏 > 𝑎. In step.37 we obtain the dependency factor of the sequence of processes using the formula above. Lastly, we update the result table with the new dependency factor (step.38). In the end, the algorithm produces the result table described in definition.16.
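The dependency step can be sketched as follows. The sketch assumes that definition.11 is the usual heuristics-miner dependency measure, (|a > b| - |b > a|) / (|a > b| + |b > a| + 1), and that a missing inverse row counts as frequency 0; both are assumptions of this example, as are the function name add_dependency and the sample frequencies.

```python
def add_dependency(table):
    """Add a 'dependency' value to every row of the frequency table."""
    # Index the frequencies by (source, target) so the inverse row is a lookup.
    freq = {(r["source"], r["target"]): r["frequency"] for r in table}
    for r in table:
        forward = r["frequency"]
        # Frequency of the inverse sequence, 0 when it never occurs.
        backward = freq.get((r["target"], r["source"]), 0)
        r["dependency"] = (forward - backward) / (forward + backward + 1)
    return table


result = add_dependency([
    {"source": "A", "target": "B", "frequency": 5},
    {"source": "B", "target": "A", "frequency": 1},
    {"source": "B", "target": "C", "frequency": 3},
])
# A -> B: (5 - 1) / (5 + 1 + 1) = 4/7
# B -> A: (1 - 5) / (1 + 5 + 1) = -4/7
# B -> C: (3 - 0) / (3 + 0 + 1) = 0.75
```

With this form the value always lies strictly between -1 and 1, and the "+ 1" in the denominator keeps the factor below 1 when a sequence has only a few observations.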