
Discovering Workflow Transactional Behavior from Event-Based Log


Walid Gaaloul, Sami Bhiri, and Claude Godart LORIA - INRIA - CNRS - UMR 7503 BP 239, F-54506 Vandœuvre-lès-Nancy Cedex, France

{gaaloul,bhiri,godart}@loria.fr

Abstract. Previous workflow mining works have concentrated their efforts on process behavioral aspects. Although powerful, these proposals are found lacking in functionality and performance when used to discover transactional workflows, whose features cannot be seen at the level of the behavioral aspects of workflow. Their limitations mainly come from their inability to discover the transactional dependencies between process activities, or the activities' transactional properties. In this paper, we describe mining techniques which are able to discover a workflow model and to improve its transactional behavior from event logs. We propose an algorithm to discover workflow patterns and workflow termination states (WTS). Then, based on the discovered control flow and set of termination states, we use a set of rules to mine the workflow transactional behavior.

Keywords: Business intelligence, Workflow mining, Transactional workflows, Enterprise knowledge discovery, Knowledge modelling.

1 Introduction

Current workflow management systems (WFMS), which are driven by explicit process models, offer little aid for the acquisition of workflow models and their adaptation to changing requirements [1]. It is difficult for process engineers to validate a formal process model using only a visual representation of the process. They often agree to a visual representation, but when they are confronted with the WFMS implementing the process, it often turns out that the system has a different interpretation of the process model than they had expected, and the process model as it was modelled is rejected. Modelling errors are commonly not detected until the workflow model is executed. That is why the workflow mining approach proposes techniques to acquire workflow models from observations of enacted workflow instances (i.e., the workflow log). All workflow activities are traced, and the logs are passed to a workflow mining component which (re)discovers the workflow model.

Previous works on workflow mining [2,3,4,5] have restricted themselves to structural considerations with limited checks of transactional behavior. In particular, they have neglected transactional workflow properties [6], such as transactional dependencies between workflow activities or activity transactional properties. We are convinced that the capability of workflow mining to discover workflow transactional features provides a significant improvement to WFMS understanding and design.

R. Meersman, Z. Tari (Eds.): CoopIS/DOA/ODBASE 2004, LNCS 3290, pp. 3–18, 2004.


In this paper, we describe mining techniques which are able to discover a workflow model and mine its transactional behavior from its event logs. We propose an algorithm to discover workflow patterns and the workflow set of termination states (WTS). Then we extract the process transactional behavior from them using a set of rules.

The remainder of this paper is organized as follows. Section 2 presents a motivating example which shows the interest of mining workflow transactional behavior. Section 3 introduces distinctive concepts and some needed prerequisites. Section 4 gives an overview of our approach, which we detail in the next four sections. Section 9 concludes and presents some future work.

2 Motivating Example

In this section we present a motivating example showing the need for discovering transactional behavior to detect design gaps and thereafter improve the workflow supporting the application. Let us suppose an application for the online purchase of personal computers (PC). This application is carried out by the workflow illustrated in Figure 1. The activities of the online PC purchase are described below.

Customer Requirements Specification (CRS): The first activity in the workflow is to receive a customer order. This activity acquires the customer requirements and then creates a new instance of the workflow.

Customer Identity Check (CIC): The application checks the identity of the customer.

Payment (P): This activity ensures the payment process by credit card, cheque, etc.

Command Items (CI): If the online merchant does not have all the computer components, he orders them.

Computer Assembly (CA): After receiving the required items, this activity ensures the computer assembly.

Send Item (SI): After payment and assembly, the computer is sent to the customer.

The application enhances its classical control flow by specifying an additional workflow transactional behavior to ensure failure handling. It specifies that (i) CRS, P, CI and CA are sure to complete, (ii) the work of CRS, CA and P can be semantically undone, and (iii) the work of P and CA (respectively of CRS) will be semantically undone when SI (respectively CIC) fails.

Let us now suppose that in reality (by observation of sufficiently many execution cases) SI never fails and P is not sure to complete. This means there is no need for P to be compensatable, and CA has to be compensated when P fails.

Classical workflow mining is not able to detect such an anomaly and thereafter to improve the workflow model. To overcome this limitation, it is necessary to extend workflow mining so that it can discover workflow transactional behavior, as a feedback loop to improve the transactional behavior of the workflow model.


Fig. 1. An example of workflow involving transactional behavior.

3 Transactional Workflow

A transactional workflow is a workflow that emphasizes transactional behavior for failure handling and recovery. Within transactional workflows, we distinguish between the control flow and the transactional behavior.

3.1 Control Flow

A workflow process definition is composed of workflow activities. Activities are linked together to form a control flow via transitions, which can be guarded by a control flow operator. The control flow dimension is concerned with the partial ordering of activities. The activities that need to be executed are identified and the routing of cases along these activities is determined. Conditional, sequential, parallel and iterative routing are typical structures specified in the control flow dimension. We use workflow patterns [7] to express and implement the control flow dimension requirements and functionalities.

3.2 Transactional Behavior

Workflow transactional behavior specifies mechanisms for failure handling. It defines the activities' transactional properties and the transactional flow (interactions).

Activities transactional properties: Within a transactional workflow, activities emphasize transactional properties for their characterization and correct usage. The main transactional properties that we consider are retriable, compensatable and pivot [8]. An activity a is said to be retriable (a^r) iff it is sure to complete. a is said to be compensatable (a^cp) iff its work can be semantically undone. Finally, a is said to be pivot (a^p) iff its effect cannot be compensated. Back to our example, we note that all activities except CIC and SI are specified as retriable, and activities CRS, P and CA are specified as compensatable.


Transactional flow: A transactional flow defines a set of interactions to ensure failure handling. Transactional workflows take advantage of activity transactional properties to specialize their transactional interactions. For instance, in our example, we take advantage of the transactional properties of the activities to specify that CA and P will be compensated when SI fails and CRS will be compensated when CIC fails.

3.3 Workflow Set of Termination States

The state, at a specific time, of a workflow composed of n activities is the tuple (x1, x2, ..., xn), where xi is the state of the activity ai at this time. The activity states that we consider are the classical ones: initial, aborted, activated, failed, terminated and compensated. A workflow can have a set of termination states. For instance, the set of termination states of the workflow given in Section 2 is {(CRS.terminated, CIC.terminated, P.terminated, CI.terminated, CA.terminated, SI.terminated); (CRS.compensated, CIC.failed, P.aborted, CI.aborted, CA.aborted, SI.aborted); (CRS.terminated, CIC.terminated, P.compensated, CI.terminated, CA.compensated, SI.failed)}.

4 Overview of Our Approach

Mining a transactional workflow amounts to discovering its control flow and its transactional behavior. As illustrated in Figure 2, we mainly proceed in two steps. The first one consists in mining the control flow (Section 6) and the set of termination states (Section 7) from the workflow log. Then, based on the discovered control flow and set of termination states, we use a set of rules to mine the workflow transactional behavior (Section 8). We illustrate the applicability of each of these mining steps through the example given in Section 2, and show how this mining allows us to improve the transactional workflow carrying out this application.


5 Workflow Event Log

Workflows are case-based, i.e., every piece of work is executed for a specific case. Many cases can be handled by following the same workflow process definition. Routing elements are used to describe sequential, conditional, parallel, and iterative routing, thus specifying the appropriate route of a case [9,10]. To be complete, the workflow log should cover all the possible cases (i.e., if a specific routing element can appear in the mined workflow model, the log should contain an example of this behavior in at least one case). Thus, the completeness of the mined workflow model depends on how well the log covers all possible dependencies between its activities.

A workflow log is considered as a set of events streams. Each events stream represents a workflow execution. Each event is described by an activity identifier, an execution time and an activity state (see Definition 1).

Definition 1 (Event log). An event log is related to an activity. Thus, an event is seen as a triplet event(activityId, occurTime, activityState), where:

– (activityId : int) is the ID of the activity concerned with the event,
– (occurTime : int) is the execution time,
– (activityState : symbol) is the activity state (initial, aborted, active, failed, terminated and compensated).

A workflow log may contain information about thousands of events streams. Since there are no causal dependencies between events corresponding to different events streams, we can project the workflow log onto separate events streams without losing any information (see Definition 2).

Definition 2 (Workflow Event log). A workflow log is considered as a set of events streams. Each events stream represents the execution of one case. More formally, an events stream is defined as a quadruplet stream(sequenceLog, wOccurence, beginTime, endTime), where:

– (sequenceLog : {event}) is an ordered Event log belonging to the execution of a workflow case,
– (wOccurence : int) is the workflow execution instance number,
– (beginTime : time) is the moment of log beginning,
– (endTime : time) is the moment of log end.

So, a Workflow Event log is a set $workflowLog = \{wStream_i : stream;\ 0 \le i \le \text{number of workflow instantiations}\}$ where, $\forall\, wStream_i \in workflowLog$, $wStream_i.wOccurence$ references the same workflow.

We define $WL = \{workflowLog\}$ as the set of all workflow logs.

An example of an events stream extracted from our workflow example is given below:

L = stream(5, 16, [event(CRS,5,initial), event(CIC,5,initial), event(P,5,initial), event(CI,5,initial), event(CA,5,initial), event(SI,5,initial), event(CRS,8,terminated), event(CIC,10,terminated), event(CI,13,terminated), event(P,15,terminated), event(CA,16,failed), event(CA,18,terminated), event(SI,20,terminated)])


We are interested in extracting workflow patterns that describe the control flow within workflow processes. The statistical calculus used to discover these patterns (see Section 6) extracts control flow dependencies between workflow activities that are executed without "exceptions" (i.e., they successfully reached their terminated state). Because the initial workflow log contains data relating to the whole life cycle of workflow activities (i.e., including all activity states), we need to filter the workflow log and keep only the events whose state is terminated (see Definition 3). Note that this is the minimal information we assume to be present at this point. Any information system using transactional systems such as ERP, CRM, or workflow management systems offers this information in some form [5].

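Definition 3 (Log projection). The state projection of a workflow log is defined as:

$WorkflowLog_{state} : WL \to WL$

$wl = \{(sL, wO, beginTime, endTime)\} \mapsto wl' = \{(sL', wO, beginTime, endTime)\}$, where $sL' = \{e \in sL \mid e.activityState = state\} \subseteq sL$.

In this paper we use the projection with state = terminated, denoted WorkflowLog_terminated.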
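The event model of Definitions 1 and 2 and the projection of Definition 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `Event`, `Stream` and `project` are hypothetical, activity identifiers are strings (as in the example stream) rather than ints, and the `w_occurrence`, `begin_time` and `end_time` values given to the example stream are chosen for illustration only.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    # Definition 1: event(activityId, occurTime, activityState)
    activity_id: str  # the paper types this as int; the example uses names
    occur_time: int
    activity_state: str


@dataclass(frozen=True)
class Stream:
    # Definition 2: stream(sequenceLog, wOccurence, beginTime, endTime)
    sequence_log: Tuple[Event, ...]  # ordered by occur_time
    w_occurrence: int
    begin_time: int
    end_time: int


def project(log: List[Stream], state: str = "terminated") -> List[Stream]:
    """Definition 3: keep, in every stream, only the events in `state`."""
    return [
        replace(s, sequence_log=tuple(
            e for e in s.sequence_log if e.activity_state == state))
        for s in log
    ]


# The example events stream of Section 5; w_occurrence = 1 is arbitrary.
raw = [("CRS", 5, "initial"), ("CIC", 5, "initial"), ("P", 5, "initial"),
       ("CI", 5, "initial"), ("CA", 5, "initial"), ("SI", 5, "initial"),
       ("CRS", 8, "terminated"), ("CIC", 10, "terminated"),
       ("CI", 13, "terminated"), ("P", 15, "terminated"),
       ("CA", 16, "failed"), ("CA", 18, "terminated"),
       ("SI", 20, "terminated")]
stream = Stream(tuple(Event(*r) for r in raw), w_occurrence=1,
                begin_time=5, end_time=20)

terminated_log = project([stream])
print([e.activity_id for e in terminated_log[0].sequence_log])
# ['CRS', 'CIC', 'CI', 'P', 'CA', 'SI']
```

The failed CA event at time 16 is dropped, while the later terminated CA event is kept, which is exactly the restriction to executions without "exceptions" used by the statistical calculus.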

6 Control Flow Mining

In this work, we are exclusively interested in discovering "elementary" workflow patterns: Sequence, Parallel split, Synchronization, Exclusive choice, Multi-choice, Simple merge and M-out-of-N Join [7].

Discovering workflow patterns from an event-based log basically involves determining the logical dependencies among its activities. An activity dependence is defined as the occurrence of one activity directly depending on another activity. We define three types of activity dependence:

1. Sequential dependence captures the sequencing of activities (Sequence pattern), where one activity directly follows another.

2. Conditional dependence captures selection, or a choice of one activity from a set of activities potentially following (e.g. exclusive choice pattern) or preceding (e.g. Simple Merge pattern) a given activity.

3. Concurrent dependence captures concurrency in terms of "fork" (e.g. Parallel Split pattern) and "join" (e.g. synchronization pattern).

The main challenge we cope with is the discovery of the sequential or concurrent nature of the joins and splits of these patterns. To reach our goal, we proceed in three steps: (i) the construction of a statistical dependency table, (ii) the discovery of frequent episodes in the log, and (iii) the mining of workflow patterns through a set of rules.

6.1 Construction of the Statistical Dependency Table

Some numerical representations of the event log are needed to support the analysis performed for discovering workflow patterns. The statistical dependency table, based on the notion of a frequency table [11], expresses activity dependencies. The size of this table is N×N, where N is the number of activities in the mined workflow. The (m,n) table entry (notation P(m/n)) is the frequency of activity n preceding activity m. For example, let A and B be two activities in the mined workflow; P(B/A) = 0.45 expresses that, when B occurs, 45% of the time A is a previous activity. Note that the statistical dependency table is constructed from the terminated event log projection (this projection restricts the log to executions without "exceptions").

If we assume that the events stream is exactly correct (i.e., contains no noise) and derives from a sequential workflow, then the zero entries in the statistical dependency table can be interpreted as signifying independence, and the non-zero entries directly represent probabilistic dependence relations, and hence causal dependencies. However, due to concurrent dependence, as seen in workflow patterns like the Synchronization, Parallel split and Multi-choice patterns, the events streams represent interleaved event sequences from all the concurrent threads. As a consequence, an activity might not depend on its immediate predecessor, but on another, "indirectly" preceding activity.

Thus, some entries in the statistical dependency table indicate spurious or false dependencies. To unmask and correct these erroneous frequencies, we calculate the frequency using a concurrent window, i.e., we consider not only the events that occurred immediately before but also the events covered by the concurrent window. Formally, a concurrent window defines a log slide over an events stream (see Definition 4).

Definition 4 (Log window). A log window defines a set of event logs over an events stream S : stream(sLog, wOccurence, bStream, eStream).

Formally, we define a log window as a triplet window(wLog, bWin, eWin), where:

– (bWin : time) is the moment of the window beginning (with $bStream \le bWin$),
– (eWin : time) is the moment of the window end (with $eWin \le eStream$),
– $wLog \subseteq sLog$ and $\forall e \in sLog:\ bWin \le e.occurTime \le eWin \Rightarrow e \in wLog$.

The time span eWin-bWin is called the width of the window, and it is denoted width(window).

The width of the concurrent window is the maximal duration that a concurrent execution can take. It depends on the studied workflow and is estimated by the user. Based on this width, we construct an events stream partition (see Definition 5). This partition is formed by a set of overlapping windows. Each window is built by adding the next event log not included in the previous window, and then removing the event logs that no longer fall within the concurrent window. Thus, the width of these windows cannot exceed the fixed concurrent duration.

Definition 5 (K-partition). K-partition builds a set of partially overlapping windows partitioning an events stream.

$K\text{-}partition : workflowLog \to (\{window\})^*$

$S : stream(sLog, wOccurence, bStream, eStream) \mapsto \{w_i : window;\ 1 \le i \le n\}$, where:

– $w_1.bWin = bStream$ and $w_n.eWin = eStream$,
– $\forall i,\ 1 \le i < n:\ w_{i+1}.wLog \setminus \{\text{last event of } w_{i+1}.wLog\} \subseteq w_i.wLog$ and $w_{i+1}.wLog \neq w_i.wLog$.
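A possible reading of Definitions 4 and 5 is sketched below in Python. It assumes that a stream has already been projected to its terminated events, that each event is reduced to a hypothetical `(activity_id, occur_time)` pair sorted by time, and that the concurrent window width K is given by the user; the helper names `log_window` and `k_partition` are illustrative, and other constructions satisfying Definition 5 are possible.

```python
from typing import List, Tuple

# Hypothetical simplified event: (activity_id, occur_time); the stream holds
# terminated events only, sorted by occur_time.
Event = Tuple[str, int]


def log_window(stream: List[Event], b_win: int, e_win: int) -> List[Event]:
    """Definition 4: the events whose occur_time lies in [b_win, e_win]."""
    return [e for e in stream if b_win <= e[1] <= e_win]


def k_partition(stream: List[Event], k: int) -> List[List[Event]]:
    """Definition 5 (one reading): the first window covers [bStream, bStream + k];
    each later window adds the next event and drops events older than k."""
    if not stream:
        return []
    b_stream = stream[0][1]
    first = log_window(stream, b_stream, b_stream + k)
    windows = [first]
    for j in range(len(first), len(stream)):
        newest_time = stream[j][1]
        # The window ends at the newly added event; its width is at most k.
        windows.append(log_window(stream[: j + 1], newest_time - k, newest_time))
    return windows


terminated = [("CRS", 8), ("CIC", 10), ("CI", 13), ("P", 15), ("CA", 18), ("SI", 20)]
windows = k_partition(terminated, k=5)
print([[a for a, _ in w] for w in windows])
# [['CRS', 'CIC', 'CI'], ['CIC', 'CI', 'P'], ['CI', 'P', 'CA'], ['P', 'CA', 'SI']]
```

With K = 5 (an arbitrary value) on the terminated projection of the example stream of Section 5, every window after the first ends at one newly added event, and removing that event leaves a subset of the previous window, as Definition 5 requires.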

Based on this definition, we are now able to describe our mining algorithm. Algorithm 1 computes the activity frequencies and Algorithm 2 the activity dependencies.

As a starting point, we need to calculate, for each activity A in the mined workflow, its statistical frequency (noted #A) from WorkflowLog_terminated. It is then used to calculate the dependency frequencies and to discover workflow patterns (see Section 6.3). Algorithm 1 shows how it is computed from workflowLog: each stream in workflowLog is read event by event, and the frequency of the corresponding activity is updated. Note that indentation is used in the algorithms below to specify the extent of loops and conditional statements.

Algorithm 1 : Statistic activity frequency algorithm
Input: Wlog : WorkflowLog_terminated(workflowLog), K : width(concurrent window)
Output: AFT : #[]
var
    t_id : int
begin
    for all S : stream in Wlog
        for all e : event in S.sequenceLog
            t_id = e.activityId
            AFT[t_id]++
        endFor
    endFor
end

Algorithm 2 computes the statistical activity dependencies. It scans the K-partition windows over workflowLog window by window, and for each window it computes, for the last activity, the frequencies of its preceding activities, and updates the corresponding table accordingly. The first window needs a particular treatment. The statistical activity dependency is then found by dividing each row entry of this table by the frequency of the activity computed in Algorithm 1.

Algorithm 2 : Statistic activity dependency algorithm
Input: Wlog : WorkflowLog_terminated(workflowLog), K : width(concurrent window)
Output: SFD : Statistic activity Dependency Table
var
    t_reference : int; t_preceded : int; win, fWin : window; depFreq : int[][]; freq : int
begin
    for all win : window in K-partition(Wlog)
        t_reference = last_activity(win)   /* returns the activityId of the last event in win.wLog */
        win = preceded_Events(win)         /* returns win without its last event */
        for all e : event in win.wLog
            t_preceded = e.activityId
            depFreq[t_reference][t_preceded]++
        endFor
    endFor
    /* particular case: first window */
    fWin = firstwindow(K-partition(Wlog))   /* returns the first window */
    fWin = preceded_Events(fWin)
    while (fWin.wLog <> null)
        t_reference = last_activity(fWin)
        for all e : event in (fWin.wLog − {last event of fWin})
            t_preceded = e.activityId
            depFreq[t_reference][t_preceded]++
        endFor
        fWin = preceded_Events(fWin)
    endWhile
    /* final step: construction of the statistical dependency table */
    for all freq = depFreq[t_reference][t_preceded] in depFreq
        P(t_reference/t_preceded) = freq / #t_reference
    endFor
end
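Algorithms 1 and 2 can be sketched together in Python under simplifying assumptions of our own: events are hypothetical `(activity_id, occur_time)` pairs, each stream holds terminated events only and is sorted by time, and the K-partition is recomputed per stream using one possible reading of Definition 5. The treatment of the first window follows the pseudocode: the main loop handles the last event of every window, and the extra loop handles the remaining events of the first window. Function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical simplified event: (activity_id, occur_time); each stream holds
# terminated events only (WorkflowLog_terminated), sorted by occur_time.
Event = Tuple[str, int]


def k_partition(stream: List[Event], k: int) -> List[List[Event]]:
    """One reading of Definition 5: first window [bStream, bStream + k], then one
    window per further event, ending at that event and at most k wide."""
    if not stream:
        return []
    b = stream[0][1]
    first = [e for e in stream if e[1] <= b + k]
    windows = [first]
    for j in range(len(first), len(stream)):
        windows.append([e for e in stream[: j + 1] if e[1] >= stream[j][1] - k])
    return windows


def activity_frequency(log: List[List[Event]]) -> Counter:
    """Algorithm 1: #A, the number of terminated events of each activity A."""
    freq = Counter()
    for stream in log:
        for activity, _ in stream:
            freq[activity] += 1
    return freq


def dependency_table(log: List[List[Event]], k: int) -> Dict[str, Dict[str, float]]:
    """Algorithm 2: P(m/n) = depFreq[m][n] / #m (missing entries mean 0)."""
    freq = activity_frequency(log)
    dep_freq: Dict[str, Counter] = defaultdict(Counter)
    for stream in log:
        windows = k_partition(stream, k)
        # Main loop: the last event of each window is the reference activity.
        for win in windows:
            reference = win[-1][0]
            for preceded, _ in win[:-1]:
                dep_freq[reference][preceded] += 1
        # Particular case: the remaining events of the first window.
        if windows:
            first = windows[0]
            for j in range(len(first) - 2, 0, -1):
                for preceded, _ in first[:j]:
                    dep_freq[first[j][0]][preceded] += 1
    return {m: {n: c / freq[m] for n, c in row.items()}
            for m, row in dep_freq.items()}


log = [[("A", 1), ("B", 2), ("C", 3)]]
print(dependency_table(log, k=1))   # {'B': {'A': 1.0}, 'C': {'B': 1.0}}
```

Entries that are never incremented are absent from the result and read as 0. On this three-event stream, K = 1 separates A from C, so only direct predecessors are counted; a larger K (for instance 10) would also count A as a predecessor of C.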

6.2 Discovering Episodes in Logs

The statistical dependency table alone is not sufficient: some of its non-zero entries may not correspond to dependencies. For example, the events stream given in Section 5 suggests a sequential dependency between the CI and P activities, which is incorrect. To deal with this issue, we use episodes to eliminate this noise and to correctly identify workflow patterns.

Through the discovery of specific episodes in events streams, we can eliminate the confusion caused by concurrency, which produces spurious non-zero entries in the statistical dependency table. For this reason, we are interested in finding recurrent combinations of events, which we call frequent episodes. Our definition of frequent episode is a variation of the one from [12]. Formally, an episode is a partially ordered collection of events occurring together. In our workflow mining technique, we need to discover and identify K-Parallel and K-serial episodes in the WorkflowLog_terminated events stream projection. The computation of K-Parallel and K-serial episodes depends on the width of the concurrent window (see Definitions 6 and 7). We have adapted an algorithm proposed in [12] to find this class of episodes.

Definition 6 (K-Parallel episodes). Π(t1, t2) denotes the K-Parallel relation on activities t1 and t2, and can be seen as a relation over workflow activities belonging to the same window.

Π(t1, t2) iff t1 and t2 (i) have no time ordering constraints on their respective terminated event logs and (ii) if t1 and t2 have event logs in an events stream, then these event logs belong to the same window W with K = width(W). Note that there can be other events occurring between t1 and t2.

Definition 7 (K-serial episodes). Γ(t1, t2) denotes the K-serial relation on activities t1 and t2, and can be seen as a relation over workflow activities belonging to the same window.

Γ(t1, t2) iff (i) the respective terminated event logs of t1 and t2 in the workflow log occur in this order and (ii) if they have event logs in an events stream, then these event logs belong to the same window W with K = width(W). Note that there can be other events occurring between t1 and t2.

The K-Parallel and K-serial relations are easy to interpret and they can be discovered efficiently from log events stream [12]. Moreover, any complex partially ordered episode could be seen as a recursive combination of parallel and serial episodes.
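As a rough illustration of Definitions 6 and 7, the sketch below checks the two relations pair by pair over a small log. It is not the frequent-episode algorithm of [12] that the authors adapted: it applies no frequency threshold, assumes each activity has at most one terminated event per stream (no loops), approximates "same window" by "at most K time units apart", and reads "no time ordering constraints" as "both orders are observed in the log". The pair encoding and the function name `episodes` are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical simplified event: (activity_id, occur_time), terminated only.
Event = Tuple[str, int]


def episodes(log: List[List[Event]], k: int):
    """Naive pairwise check of Definitions 6 and 7 (not the algorithm of [12]).
    Returns (serial, parallel): serial holds pairs (t1, t2) such that
    Gamma(t1, t2); parallel holds frozensets {t1, t2} such that Pi(t1, t2)."""
    ordered = set()  # (a, b): a's terminated event observed no later than b's
    too_far = set()  # {a, b}: observed more than k apart, never in one window
    for stream in log:
        first_time = {}
        for activity, t in stream:
            first_time.setdefault(activity, t)  # assumption: no loops
        for (a, ta), (b, tb) in permutations(first_time.items(), 2):
            if abs(ta - tb) > k:
                too_far.add(frozenset((a, b)))
            elif ta <= tb:
                ordered.add((a, b))
    serial = {(a, b) for (a, b) in ordered
              if (b, a) not in ordered and frozenset((a, b)) not in too_far}
    parallel = {frozenset((a, b)) for (a, b) in ordered
                if (b, a) in ordered and frozenset((a, b)) not in too_far}
    return serial, parallel


log = [[("A", 1), ("B", 2), ("C", 3)],
       [("A", 1), ("C", 2), ("B", 3)]]
serial, parallel = episodes(log, k=5)
print(sorted(serial))  # [('A', 'B'), ('A', 'C')]
```

Here B and C appear in both orders within 5 time units, so `parallel` holds the single class {B, C}, while A always precedes them and forms serial episodes with both.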

6.3 Mining of Workflow Patterns

After computing the statistical dependency table and discovering the episodes, the last step is the identification of workflow patterns through a set of rules. In fact, each pattern is identified by a particular set of episodes and statistical tests. Each pattern has its own features, which represent its unique identifier. If the execution log is complete, our algorithm allows the discovery of all the workflow patterns included in the mined workflow.

We divide the workflow patterns into three categories: sequence, fork and join patterns. In the following, we present rules to discover the most interesting workflow patterns belonging to these three categories.

Sequence pattern: In this category we find only the sequence pattern (cf. Table 1). In this pattern, the enactment of activity B depends only on the completion of activity A. So, besides the discovery of the Γ(A, B) episode, we need statistical tests (P(B/A) = 1 ∧ #B = #A) that ensure the exclusive dependency linking B to A.

Fork patterns: This category (cf. Table 2) has a "fork" point where a single thread of control splits into multiple threads of control which can, according to the pattern used, be executed or not. In the following, we denote by p(B1, B2, ..., Bn) the equivalence class of Π containing {Bi; 1 ≤ i ≤ n}.

Table 1. Rules of sequence workflow patterns mining

The causality between the activity A before the "fork" point and the activities Bi after it is shared by Exclusive Choice, Parallel Split and Multi-choice, the three patterns of this category. This causality is ensured by the statistical tests (∀ 1 ≤ i ≤ n; P(Bi/A) = 1). The Exclusive choice pattern, where one of several branches is chosen after the "fork" point, has an episode different from the Parallel Split and Multi-choice patterns, which share the same episode. The non-parallelism between the Bi in the Exclusive choice pattern is ensured by (∀ 1 ≤ i, j ≤ n; P(Bi/Bj) = 0). The Parallel Split and Multi-choice patterns differ in the frequency relation between activity A and the activities Bi: only some of the Bi activities are executed after the "fork" point in the Multi-choice pattern, while all the Bi activities are executed in the Parallel Split pattern.
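The sequence and fork rules above can be phrased as checks on the statistical dependency table and the activity frequencies. The sketch below assumes a hypothetical dictionary encoding (`table[m][n]` stands for P(m/n), `count[x]` for #x) and takes the episode information as a boolean computed beforehand; it encodes only the statistical tests stated in the text, not the complete rules of Tables 1 and 2. Exact equality is used for readability; a noisy log would call for a tolerance.

```python
from typing import Dict, Iterable, Optional

# table[m][n] stands for P(m/n) from the statistical dependency table;
# missing entries are read as 0. `count` maps each activity to #activity.
Table = Dict[str, Dict[str, float]]


def p(table: Table, m: str, n: str) -> float:
    return table.get(m, {}).get(n, 0.0)


def is_sequence(table: Table, count: Dict[str, int], a: str, b: str,
                serial_ab: bool) -> bool:
    """Sequence: episode Gamma(A, B) plus P(B/A) = 1 and #B = #A."""
    return serial_ab and p(table, b, a) == 1 and count[b] == count[a]


def fork_pattern(table: Table, count: Dict[str, int], a: str,
                 bs: Iterable[str], bs_parallel: bool) -> Optional[str]:
    """Fork after A; bs_parallel tells whether the B_i form one K-parallel
    class p(B_1, ..., B_n) in the discovered episodes."""
    bs = list(bs)
    if not all(p(table, b, a) == 1 for b in bs):
        return None  # causality test P(B_i/A) = 1 fails
    if not bs_parallel:
        if all(p(table, bi, bj) == 0 for bi in bs for bj in bs if bi != bj):
            return "exclusive choice"
        return None
    # Parallel Split runs every B_i as often as A; Multi-choice only some.
    if all(count[b] == count[a] for b in bs):
        return "parallel split"
    return "multi-choice"


table = {"B": {"A": 1.0}, "C": {"A": 1.0}}
print(fork_pattern(table, {"A": 10, "B": 10, "C": 10}, "A", ["B", "C"], True))
# parallel split
print(fork_pattern(table, {"A": 10, "B": 10, "C": 4}, "A", ["B", "C"], True))
# multi-choice
```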

Join patterns: This category (cf. Table 3) has a "join" point where multiple threads of control merge into a single thread of control. The number of branches necessary for the activation of activity B after the "join" point depends on the pattern used. In the following, we denote by p(A1, A2, ..., An) the equivalence class of Π containing {Ai; 1 ≤ i ≤ n}.

The enactment of activity B after the "join" point in the Synchronization pattern requires the execution of all the Ai activities (∀ 1 ≤ i ≤ n; P(B/Ai) = 1). This is not the case for the Simple Merge and M-out-of-N-Join patterns, which share the same episodes (different from those of the Synchronization pattern); the parallelism between the Ai activities can only be seen in the M-out-of-N-Join pattern (∃ 1 ≤ i, j ≤ n; P(Ai/Aj) ≠ 0).
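The join rules can be sketched the same way, with the same hypothetical table encoding (`table[m][n]` stands for P(m/n)) and a boolean `sync_episode` standing for the discovered episode class. The M-out-of-N test below treats some P(Ai/Aj) ≠ 0 as the sign of parallelism between the Ai, which is our reading of the condition that distinguishes it from Simple Merge. Applied to the values of Table 4 with the discovered episode Γ(p(CA, P), SI), it reports the synchronization pattern of Section 6.4.

```python
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, float]]  # table[m][n] = P(m/n); missing entries are 0


def p(table: Table, m: str, n: str) -> float:
    return table.get(m, {}).get(n, 0.0)


def join_pattern(table: Table, b: str, a_list: List[str],
                 sync_episode: bool) -> Optional[str]:
    """Join before B; sync_episode tells whether the discovered episode is the
    Synchronization one, Gamma(p(A_1, ..., A_n), B)."""
    if sync_episode:
        # Synchronization: B needs every A_i, i.e. P(B/A_i) = 1 for all i.
        if all(p(table, b, a) == 1 for a in a_list):
            return "synchronization"
        return None
    # Simple Merge and M-out-of-N Join share their episodes; only the latter
    # shows parallelism between the A_i, i.e. some P(A_i/A_j) != 0.
    if any(p(table, ai, aj) != 0 for ai in a_list for aj in a_list if ai != aj):
        return "M-out-of-N join"
    return "simple merge"


# Table 4 values (row m, column n holds P(m/n); zero entries omitted).
table4 = {"CI": {"P": 0.36}, "CA": {"CI": 1.0, "P": 0.41},
          "P": {"CI": 0.43, "CA": 0.29}, "SI": {"CI": 1.0, "CA": 1.0, "P": 1.0}}
print(join_pattern(table4, "SI", ["CA", "P"], sync_episode=True))
# synchronization
```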

6.4 Example

As a working example, let us consider the workflow model of Section 2. We focus on the discovery of the synchronization pattern formed by the CA, P and SI activities. The width of the concurrent window leads to the inclusion of the activity CI when computing the statistical dependency table and discovering the episodes. This inclusion allows us to remove any confusion or erroneous deductions. Table 4 presents a fraction of the statistical dependency table.

The episodes discovered in the log are:

Γ(p(CA, P), SI)

The statistical dependency values of Table 4 and the discovered episode above indicate that the mined workflow contains a synchronization pattern formed by the CA, P and SI activities. Note that the frequency P(CA/CI) suggests a sequence pattern, which gives an indication of the class of episodes we must find in order to identify this pattern.

Table 2. Rules of fork workflow patterns mining

7 Mining the Set of Termination States

In this section, we describe how to mine the set of termination states of a workflow from its log. First, we give a formal definition of a workflow set of termination states, denoted WTS (Definition 8). In this definition, we also specify the WTS format used in our mining approach. Then we present the algorithm used to mine the WTS from a given event log (Algorithm 3).

Definition 8 (Workflow Termination State WTS). In a workflow execution case, each activity has its termination state, described by the activity identifier and the activity state. Thus, an Activity Terminated State, denoted ATS, is seen as a couple ATS = (activityId, state), where:

– (activityId : int) is the ID of the activity,
– (state : symbol) is the last activity state.

A Case Terminated State, denoted CTS, is a set of ATS corresponding to a workflow execution case: CTS = {ATS}. The set of workflow termination states, denoted WTS, contains all possible CTS without redundancy: WTS = {CTS}.

Table 3. Rules of join workflow patterns mining

The algorithm builds the WTS as follows: each stream in the log is scanned and, for each event, the ATS of its corresponding activity is updated by keeping only the last state. In this way, the algorithm builds the CTS of each stream. Since many streams can share the same CTS, the algorithm builds the WTS as the set of all CTSs without redundancy.

Table 4. Fraction of the statistical dependency table (the entry in row m and column n is P(m/n))

P(m/n)            CI     CA     P      SI
CI (#CI = 100)    0      0      0.36   0
CA (#CA = 100)    1      0      0.41   0
P  (#P = 100)     0.43   0.29   0      0
SI (#SI = 100)    1      1      1      0

Algorithm 3 : Mining Terminated States Set
Input: Wlog : workflowLog
Output: WTS : workflow set of termination states
var
    courantA : ATS; courantC : CTS
begin
    WTS = ∅
    for all S : stream in Wlog
        courantC = ∅
        for all e : event in S.sequenceLog
            courantA.activityId = e.activityId
            courantA.state = e.activityState
            UpdateCTS(courantC, courantA)   /* replaces the ATS of courantA.activityId in courantC by courantA */
        endFor
        WTS = WTS ∪ {courantC}
    endFor
end
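Algorithm 3 translates almost directly into Python. In this sketch, under assumptions of our own, events are hypothetical `(activity_id, occur_time, activity_state)` triples, each stream is already ordered by time, and a CTS is a frozen set of `(activityId, state)` couples so that the WTS can be an ordinary set without duplicates. The two abridged streams are invented for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical simplified event: (activity_id, occur_time, activity_state);
# each stream is assumed ordered by occur_time, as in Definition 2.
Event = Tuple[str, int, str]
CTS = FrozenSet[Tuple[str, str]]  # a set of ATS couples (activityId, state)


def mine_wts(log: Iterable[List[Event]]) -> Set[CTS]:
    """Algorithm 3: one CTS per stream, deduplicated into the WTS."""
    wts: Set[CTS] = set()
    for stream in log:
        last_state = {}
        for activity, _, state in stream:
            last_state[activity] = state  # UpdateCTS: keep only the last state
        wts.add(frozenset(last_state.items()))  # the set drops duplicate CTSs
    return wts


ok = [("CRS", 8, "terminated"), ("CIC", 10, "terminated"), ("P", 15, "terminated"),
      ("CA", 16, "failed"), ("CA", 18, "terminated")]
cic_fails = [("CRS", 8, "terminated"), ("CIC", 10, "failed"), ("P", 11, "aborted"),
             ("CA", 11, "aborted"), ("CRS", 12, "compensated")]
wts = mine_wts([ok, cic_fails, ok])
print(len(wts))  # 2
```

The first stream appears twice in the log but contributes a single CTS, and CA, which fails and is then terminated, keeps only its last state.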

8 Mining Transactional Behavior

At this level, we define a set of rules [13] allowing us to mine the workflow transactional behavior. These rules allow us to tailor the activities' transactional properties and the transactional flow according to the discovered control flow and set of termination states.

To illustrate the applicability of our rules, we go back to the example of online PC purchase. The control flow mining allows us to discover the activity sequence order illustrated in Figure 1. We suppose that the mining of the set of termination states allows us to deduce the following WTS:

{[(CRS, terminated), (CIC, terminated), (P, terminated), (CI, terminated), (CA, terminated), (SI, terminated)]; [(CRS, terminated), (CIC, terminated), (P, failed), (CI, terminated), (CA, terminated), (SI, initial)]; [(CRS, compensated), (CIC, failed), (P, aborted), (CI, aborted), (CA, aborted), (SI, aborted)]}.

Let a be an activity that can be compensated (i.e., $\exists\, ATS \in WTS \mid ATS.activityId = a \wedge ATS.state = compensated$). We extract from the discovered control flow and the WTS the compensation condition of a, denoted cpCond(a). We can write cpCond(a) in disjunctive normal form: $cpCond(a) = \bigvee_i cpCond_i(a)$. Then each $cpCond_i(a)$ is one (and not necessarily the only) compensation condition of a. For instance, in our example, the only activity that can be compensated is CRS, and we have cpCond(CRS) = CIC.failed. Below, we introduce our rules to mine the workflow transactional behavior.

For every activity a:

1. $\nexists\, ATS \in WTS \mid ATS.activityId = a \wedge ATS.state = failed \implies$ a is retriable.
2. $\exists\, ATS \in WTS \mid ATS.activityId = a \wedge ATS.state = failed \implies$ a is not retriable.
3. $\nexists\, ATS \in WTS \mid ATS.activityId = a \wedge ATS.state = compensated \implies$ a should not be compensatable, and if it is, it will never be compensated.
4. $\exists\, ATS \in WTS \mid ATS.activityId = a \wedge ATS.state = compensated \implies$ a is compensatable $\wedge$ a has to be compensated when one of its compensation conditions occurs.

The first (respectively the second) rule says that if a never fails (respectively can fail), then a is (respectively is not) retriable. The third and fourth rules allow us to deduce when an activity a is compensatable and when it will be compensated.
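Rules 1 to 4 can be applied mechanically to a WTS, as in the Python sketch below. The retriable and compensatable sets follow the rules directly. The compensation condition is harder: the paper derives it from both the control flow and the WTS, whereas this sketch, as a stated simplification, approximates each disjunct cpCond_i(a) by the activities that failed in one CTS where a is compensated. The WTS used is the one given above for the PC purchase example; the function name `mine_transactional_behavior` is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

CTS = FrozenSet[Tuple[str, str]]  # a set of ATS couples (activityId, state)


def mine_transactional_behavior(wts: Set[CTS]):
    """Rules 1-4 of Section 8 applied to a WTS."""
    activities = {a for cts in wts for a, _ in cts}

    def ever(a: str, state: str) -> bool:
        return any((a, state) in cts for cts in wts)

    retriable = {a for a in activities if not ever(a, "failed")}  # rules 1, 2
    compensatable = {a for a in activities if ever(a, "compensated")}  # rules 3, 4
    # Assumption: each disjunct cpCond_i(a) is approximated by the activities
    # that failed in one CTS where a is compensated (read as x.failed for all x).
    cp_cond: Dict[str, Set[FrozenSet[str]]] = {
        a: {frozenset(x for x, s in cts if s == "failed")
            for cts in wts if (a, "compensated") in cts}
        for a in compensatable
    }
    return retriable, compensatable, cp_cond


names = ["CRS", "CIC", "P", "CI", "CA", "SI"]
wts = {
    frozenset(zip(names, ["terminated"] * 6)),
    frozenset(zip(names, ["terminated", "terminated", "failed",
                          "terminated", "terminated", "initial"])),
    frozenset(zip(names, ["compensated", "failed", "aborted",
                          "aborted", "aborted", "aborted"])),
}
retriable, compensatable, cp_cond = mine_transactional_behavior(wts)
print(sorted(retriable))  # ['CA', 'CI', 'CRS', 'SI']
print(cp_cond)            # {'CRS': {frozenset({'CIC'})}}
```

The output agrees with the deductions listed below: CIC and P are not retriable, and CRS is the only compensatable activity, compensated when CIC fails.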

Back to our example, we can deduce by applying the above rules the following transactional behavior:

– by applying rule 1 to all activities except CIC and P, we obtain: CRS, CI, CA and SI are retriable;
– by applying rule 2 to CIC and P, we obtain: CIC and P are not retriable;
– by applying rule 3 to all activities except CRS, we obtain: CIC, P, CI, CA and SI should not be compensatable and, if they are, they will never be compensated;
– by applying rule 4 to CRS, we obtain: CRS is compensatable ∧ CRS has to be compensated when CIC fails.

Thanks to this transactional behavior mining, we are able to detect that, contrary to what is specified, SI never fails and P can fail. These two pieces of information allow us to improve the workflow by:

1. omitting the two compensation interactions triggered when SI fails,
2. specifying that there is no need for P to be compensatable, and
3. adding an interaction ensuring the compensation of CA when P fails.
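The comparison behind these improvements can be sketched as two set differences between the failures a model plans for and the failures actually mined from the log. The Python below is illustrative only: the function name `failure_anomalies` is hypothetical, and the specified set {SI, CIC} is an assumption consistent with the text (the model expects SI to fail and does not expect P to fail), while the mined set {CIC, P} comes from the running example.

```python
from typing import Dict, Set


def failure_anomalies(specified_can_fail: Set[str],
                      mined_can_fail: Set[str]) -> Dict[str, Set[str]]:
    """Compare the failure behaviour assumed by the model with the mined one."""
    return {
        # Failure handling is specified but no failure was ever logged:
        # candidate interactions to omit (SI in the example).
        "never_observed_failing": specified_can_fail - mined_can_fail,
        # Failures were logged but the model does not expect them:
        # they need new handling (e.g. compensating CA when P fails).
        "unexpected_failures": mined_can_fail - specified_can_fail,
    }


# Hypothetical specification: the model plans for SI and CIC failing, not P.
specified = {"SI", "CIC"}
# Mined from the log in the running example: only CIC and P fail.
mined = {"CIC", "P"}

report = failure_anomalies(specified, mined)
print(report["never_observed_failing"])  # {'SI'}
print(report["unexpected_failures"])     # {'P'}
```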

9

Conclusion and Future Work

In this paper we have introduced a new workflow mining approach that discovers workflow transactional behavior from event-based logs. Previous works [2,3,4,5] focused only on discovering control flows. We proceed in two steps.

1. The first step mines workflow patterns and the set of termination states. Mining workflow patterns resembles mining control flows, but our approach differs from other proposed techniques in the following ways:

– It relies on a new approach, not previously proposed, characterized by a partial discovery of the workflow in its initial phase. Therefore, we can still recover mined workflow patterns even if the log is incomplete;

– It discovers more complex features, with a better specification of "fork" points (Exclusive choice, Parallel split and Multi choice patterns) and "join" points (Synchronization, Simple merge and M-out-of-N Join patterns);


– It appears computationally simpler, and this simplicity does not compromise its efficiency in handling the concurrent aspects of workflows.

2. In the second step, based on the discovered control flow and set of termination states, we use a set of rules to mine the workflow transactional behavior.

Thus, our approach makes it possible to detect transactional modelling anomalies and then to improve the workflow model accordingly, which contributes to a better understanding and design of workflows in WFMS. However, the work described in this paper represents an initial investigation. In future work, we plan to discover more complex patterns by using additional metrics (e.g., entropy, periodicity) and by enriching the workflow log. We are also interested in modelling and discovering more complex transactional characteristics of cooperative workflows (e.g., workflow composition, compensating activities, roll-back).

References

1. Joachim Herbst. Inducing workflow models from workflow instances. In the Concurrent Engineering Europe Conference. Society for Computer Simulation (SCS), 1999.

2. Rakesh Agrawal, Dimitrios Gunopulos, and Frank Leymann. Mining process models from workflow logs. Lecture Notes in Computer Science, 1377:469–498, 1998.

3. Jonathan E. Cook and Alexander L. Wolf. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology (TOSEM), 7(3):215–249, 1998.

4. Joachim Herbst. A machine learning approach to workflow management. In Machine Learning: ECML 2000, 11th European Conference on Machine Learning, Barcelona, Catalonia, Spain, volume 1810, pages 183–194. Springer, Berlin, May 2000.

5. W.M.P. van der Aalst and L. Maruster. Workflow mining: Discovering process models from event logs. QUT Technical Report FIT-TR-2003-03, Queensland University of Technology, Brisbane, 2003.

6. Marek Rusinkiewicz and Amit Sheth. Specification and execution of transactional workflows. pages 592–620, 1995.

7. W. M. P. Van Der Aalst, A. H. M. Ter Hofstede, B. Kiepuszewski, and A. P. Barros. Workflow patterns. Distrib. Parallel Databases, 14(1):5–51, 2003.

8. A. Elmagarmid, Y. Leu, W. Litwin, and Marek Rusinkiewicz. A multidatabase transaction model for interbase. In Proceedings of the sixteenth international conference on Very large databases, pages 507–518. Morgan Kaufmann Publishers Inc., 1990.

9. S. Jablonski and C. Bussler. Workflow Management: Modeling Concepts, Architecture, and Implementation. International Thomson Computer Press, 1996.

10. Peter Lawrence. Workflow handbook 1997. John Wiley & Sons, Inc., 1997.

11. Jonathan E. Cook and Alexander L. Wolf. Automating process discovery through event-data analysis. In Proceedings of the 17th international conference on Software engineering, pages 73–82. ACM Press, 1995.

12. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289, 1997.

13. Sami Bhiri, Claude Godart, and Olivier Perrin. A transactional-oriented framework for composing transactional web services. To appear in IEEE International Conference on Services Computing (SCC 2004). IEEE Computer Society, Shanghai, September 2004.
