
Process Modelling and Workflow Environments

2. Scientific Data Analysis, Data Mining and Data Analysis Environments

2.2. Distributed Computing and Data Mining Systems

2.2.3. Process Modelling and Workflow Environments

Workflow environments, including workflow designers and enactment systems, are a popular technology in business and e-science alike for flexibly defining and enacting complex data processing tasks (see also Section 2.1.3). Driven by specific applications, a large collection of scientific workflow systems has been prototyped in the past, e.g. Taverna [72], Triana [128], Kepler [86] or Galaxy [61]. There also exist distributed workflow management systems that are integrated with grid computing environments, such as SWIMS [41]. The next generation of workflow systems is marked by workflow repositories such as myExperiment [58] or by workflow sharing functionality, which tackle the problem of organizing workflows by offering the research community the possibility to publish, exchange and discuss individual workflows. In the area of business process management (BPM), there also exist many environments for managing business processes and workflows, e.g. commercial and open source systems such as Activiti [6], Aris [7], Intalio BPMS [76], jBPM [79], YAWL [129], and many more.

A conceptual basis for process and workflow technology is provided by the concept of workflow patterns [117]. Such workflow patterns represent simple patterns for modelling certain structures in processes and workflows. According to [15], process patterns have many advantages: processes in BPM systems serve as both the specification and the source code. The modelled processes become the solutions deployed and provide a simple communication tool between end-users, business analysts, developers and the management.


In the context of the process and workflow systems mentioned above, several languages and notations for describing, modelling and visualizing processes emerged, e.g. XPDL [33], BPEL [4], YAWL [129], BPMN [153], and many others.

In the following, we will describe success factors for process modelling, the modelling notation BPMN, which will be used for visualising processes in this thesis, and the Triana system as a prototypical workflow environment.

Success Factors for Process Modelling

Modelling processes and designing executable workflows is usually done to replace manual, semi-automatic or inefficient processes that have been used before. According to Hammer and Champy [66], process reengineering is “the fundamental rethinking and radical redesign of (business) processes to achieve dramatic improvements in critical, contemporary measures of performance”. There exist several metrics from the area of business process reengineering (BPR) [17], which aim at measuring the efficiency and effectiveness of an existing business. In the literature, BPR is often evaluated according to a set of dimensions in the effects of redesign measures, e.g. time, cost, quality and flexibility [104]; cost, quality, service and speed [66]; or cycle time, cost, quality, asset utilization and revenue generated [91]. Ideally, a redesign or modification of a process decreases the time required to handle incidents, decreases the cost of executing the process, improves the quality of the service that is delivered and improves the ability of the process to react flexibly to variation [17]. However, such an evaluation typically makes trade-off effects visible: in general, improving one dimension may have a weakening effect on another (see Figure 2.6).

Figure 2.6.: Dimensions in the effects of process (re)design (based on [104]).
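The trade-off effect can be illustrated with a minimal sketch. The four dimension names follow [104]; the function name compare_designs and all metric values are hypothetical and serve only as an illustration, not as measurements from any real process.

```python
# Hypothetical illustration: direction of improvement per dimension.
# Time and cost improve when they decrease; quality and flexibility
# improve when they increase.
LOWER_IS_BETTER = {
    "time": True,
    "cost": True,
    "quality": False,
    "flexibility": False,
}


def compare_designs(before, after):
    """Classify the effect of a redesign on each dimension."""
    effects = {}
    for dimension, lower_is_better in LOWER_IS_BETTER.items():
        delta = after[dimension] - before[dimension]
        if delta == 0:
            effects[dimension] = "unchanged"
        elif (delta < 0) == lower_is_better:
            effects[dimension] = "improved"
        else:
            effects[dimension] = "worsened"
    return effects


# Made-up values for an existing process and a redesigned variant.
before = {"time": 10.0, "cost": 100.0, "quality": 0.8, "flexibility": 0.5}
after = {"time": 6.0, "cost": 120.0, "quality": 0.8, "flexibility": 0.5}

for dimension, effect in compare_designs(before, after).items():
    print(f"{dimension}: {effect}")
```

In this made-up case the redesign shortens the handling time but raises the execution cost, which is the kind of trade-off the evaluation is meant to expose.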

BPMN

Business Process Model and Notation (BPMN), previously known as Business Process Modeling Notation, is a graphical representation for specifying business processes in a business process model [153]. BPMN is a standard for business process modelling maintained by the Object Management Group (OMG). The graphical notation for specifying business processes is based on a flowcharting technique. The data mining patterns, which will be described in Chapter 5, are visualized using BPMN.

Figure 2.7.: BPMN overview.

In detail, the modelling in BPMN is based on a small set of graphical elements from the following categories [154]:

• Flow Objects are the main describing elements that represent Events, Activities and Gateways. An Event is something that happens in the course of the process and affects the flow of the process. There are three types of Events, based on when they affect the process flow: Start, Intermediate, and End. Events are visualized as circles with open centres to allow internal markers to differentiate different triggers or results. An Activity represents work that is performed in the process and can be atomic or non-atomic. Activities are visualized by a rounded-corner rectangle. A Gateway is used to control the divergence and convergence of the process flow and determines decisions, forking, merging, and joining of paths. Gateways are represented by the diamond shape with internal markers to indicate the type of behaviour control.

• Connecting Objects are used for connecting the Flow Objects. Connecting Objects represent Sequence Flow, Message Flow and Associations. A Sequence Flow is used to show the order (the sequence) of the activities in a process. Sequence Flows are visualized by a solid line with an arrowhead. A Message Flow is used to show the flow of messages between two separate process participants that send and receive them. Message Flows are visualized by a dashed line with an arrowhead. An Association is used to associate data, text, and other Artifacts with flow objects and to show the inputs and outputs of activities. Associations are visualized by a dotted line with an arrowhead.


• Swim lanes are used as a visual mechanism for organising and categorising activities. A Pool consists of different Lanes; a Lane holds the Flow Objects, Connecting Objects and Artifacts. Two separate Pools represent two different process participants.

• Artifacts represent Data Objects, Groups and Annotations. Data Objects are used to show how data is required or produced by activities and are connected to activities through Associations. Groups are visualized by a rounded-corner rectangle drawn with a dashed line. Grouping can be used for documentation or analysis purposes, but does not affect the Sequence Flow. Annotations are a mechanism for the creator of a diagram to provide additional text information for the reader.

Figure 2.7 gives an overview of the core BPMN elements. Figure 2.8 shows an example of a process modelled in BPMN. The start of the process is modelled by the Start Event. After the process starts, two tasks are performed for receiving and checking an order, which are modelled by two Activities. After that, a Gateway visualizes that there is a decision on whether the order is valid. The X in the Gateway further specifies that it is an XOR-gateway, which means that the sequence flow can follow only one of the outgoing paths. If the order is valid, it is processed and closed. If not, the order is rejected. Finally, the process ends, which is modelled by the End Event.

Figure 2.8.: Example of a Process modelled in BPMN.
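The semantics of the order process can be made concrete with a minimal sketch that represents Flow Objects as named nodes and Sequence Flows as directed edges. The class Process, the node labels and the context key "valid" are assumptions made for this illustration; they are not part of the BPMN standard or of any BPMN engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A Sequence Flow may carry a condition; None means "unconditional".
Condition = Optional[Callable[[dict], bool]]


@dataclass
class Process:
    """A toy process: Flow Objects connected by Sequence Flows."""
    kinds: Dict[str, str] = field(default_factory=dict)
    flows: Dict[str, List[Tuple[Condition, str]]] = field(default_factory=dict)

    def add(self, name: str, kind: str) -> None:
        self.kinds[name] = kind

    def connect(self, source: str, target: str,
                condition: Condition = None) -> None:
        self.flows.setdefault(source, []).append((condition, target))

    def run(self, start: str, context: dict) -> List[str]:
        """Follow the Sequence Flows and return the visited Flow Objects."""
        trace = []
        current = start
        while True:
            trace.append(current)
            kind = self.kinds[current]
            if kind == "end_event":
                return trace
            outgoing = self.flows[current]
            if kind == "xor_gateway":
                # Exclusive choice: exactly one outgoing path is taken.
                taken = [target for cond, target in outgoing
                         if cond is not None and cond(context)]
                if len(taken) != 1:
                    raise ValueError("XOR gateway needs exactly one true path")
                current = taken[0]
            else:
                # Events and Activities have one outgoing flow in this sketch.
                current = outgoing[0][1]


def build_order_process() -> Process:
    p = Process()
    p.add("Start", "start_event")
    p.add("Receive order", "activity")
    p.add("Check order", "activity")
    p.add("Order valid?", "xor_gateway")
    p.add("Process order", "activity")
    p.add("Close order", "activity")
    p.add("Reject order", "activity")
    p.add("End", "end_event")
    p.connect("Start", "Receive order")
    p.connect("Receive order", "Check order")
    p.connect("Check order", "Order valid?")
    p.connect("Order valid?", "Process order", lambda ctx: ctx["valid"])
    p.connect("Order valid?", "Reject order", lambda ctx: not ctx["valid"])
    p.connect("Process order", "Close order")
    p.connect("Close order", "End")
    p.connect("Reject order", "End")
    return p


order_process = build_order_process()
print(order_process.run("Start", {"valid": True}))
print(order_process.run("Start", {"valid": False}))
```

The two conditions on the gateway are mutually exclusive, so every run follows exactly one branch, which mirrors the XOR behaviour described above.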

Triana

Triana [128], a Java-based application distributed under the Cardiff Triana Project Software License (based on the Apache Software License Version 1.1), has been developed at Cardiff University as part of the GridLab [1] and GridOneD [2] projects. Triana consists of two distinct components: a graphical workflow editor for visual composition of workflows and a workflow manager (also known as ’engine’) for executing workflows.

In a Triana workflow, any atomic operation is represented by a separate workflow unit. A workflow unit is supposed to be a light-weight component that is concerned with the correct progression of the workflow, but it does not implement the operation itself. The actual operation may take the form of a web service or a WSRF-compliant service. The Web Services Resource Framework (WSRF) is a family of OASIS-published specifications for web services [3]. The unit, however, implements the required logic for setting the necessary properties for execution, and for passing input data to the service. It may contain classes for visualization of the results of the conducted operation and may pass the output data to the next unit in the workflow. The properties of each unit can be modified by using a control dialogue box. The input and output data of a unit are represented by simple or complex data types.

In the GUI, workflow units can be packed into folders in a tree-like structure. A workflow is created by selecting and dragging units from the folders and dropping them onto a workspace. The workflow developer can connect units, which means that the output of one operation can be used as input to the next operation in the workflow. By doing so, it is possible to compose workflows of arbitrary complexity.
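The relationship between units, services and connections can be sketched as follows. This is not Triana's API (Triana is written in Java); the class Unit, its methods and the two stand-in services are hypothetical and only illustrate the idea that a unit holds properties, delegates the operation to a service and passes the output on to the connected units.

```python
class Unit:
    """Light-weight workflow unit that delegates its operation to a service."""

    def __init__(self, name, service, **properties):
        self.name = name
        self.service = service        # callable standing in for a remote service
        self.properties = properties  # settings a user would edit in a dialogue
        self.successors = []

    def connect(self, successor):
        """Use this unit's output as the input of the successor unit."""
        self.successors.append(successor)
        return successor

    def fire(self, data, results):
        """Invoke the service, record the output and trigger the successors."""
        output = self.service(data, **self.properties)
        results[self.name] = output
        for successor in self.successors:
            successor.fire(output, results)


# Stand-ins for the actual operations, which would live in external services.
def scale_service(values, factor=1.0):
    return [value * factor for value in values]


def sum_service(values):
    return sum(values)


scale = Unit("scale", scale_service, factor=2.0)
total = Unit("total", sum_service)
scale.connect(total)

results = {}
scale.fire([1, 2, 3], results)
print(results)
```

Keeping the operation outside the unit means the same unit logic can front different services, which matches the separation described above.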

Triana is capable of discovering and binding to web services and WSRF-compatible services. When binding to a WSRF-service, each of its public methods is displayed as a single unit. Triana provides two units, WSTypeGen and WSTypeViewer, which process the WSDL file of the service, dynamically render the respective input and output fields and create request (i.e. input) and response (i.e. output) data types.

Workflows are executed by the workflow manager residing either on the user’s client machine or on a dedicated manager machine in a grid environment. For this purpose the Triana workflow editor produces a Java object representing the visual workflow, which is then executed by the manager. The manager can be launched on any machine where an appropriate Java Virtual Machine (JVM) is installed. Although the execution is performed on a single machine, the tasks can be distributed, while using the execution machine as central synchronization manager. Triana’s workflow manager is completely independent of the Triana workflow editor; it is self-contained and needs no additional software in order to execute pre-defined workflows.
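The idea of a single manager that distributes tasks and acts as the central synchronization point can be sketched as follows. The thread pool is only a stand-in for remote grid resources, and the function names are hypothetical; the sketch does not reflect how Triana's manager is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Task that would run on a remote resource in a grid setting."""
    return len(chunk.split())


def run_on_manager(chunks, max_workers=2):
    """Distribute independent tasks, then synchronize and combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() blocks until every task has finished, so the manager acts as
        # the synchronization point before the combining step.
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    print(run_on_manager(["a b", "c d e", "f"]))
```

The combining step runs only after all distributed tasks have returned, which is the role the text assigns to the execution machine.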

Figure 2.9 shows a screenshot of the Triana environment. The DataMiningGrid Application Description Schema, which will be described in Chapter 3, is developed in a workflow environment which is based on Triana.