Dataflow Computation Model - Design of the Orchestra Language

3.2 Design of the Orchestra Language

3.2.4 Dataflow Computation Model

Orchestra’s design supports a computation model that differs from the classical von Neumann computation model, in which data passively resides in a specialised store whilst instructions are executed one at a time in a sequence controlled by a program counter. Orchestra’s computation model permits a workflow to be represented as a Directed Acyclic Graph (DAG) in which the vertices represent workflow tasks (e.g. service operations) to be executed, and the directed edges between them represent data dependencies. These dependencies indicate the data movement to and from the services. For example, input data may be passed to a particular service by invoking a functionality (e.g. an operation) that it provides. This service may produce a result, which may be passed to another service that provides a different functionality. The workflow tasks in this computation model may be ready to execute simultaneously, as they represent asynchronous concurrent events. Each workflow task is executed as soon as all the input parameters required for its execution become available. This execution model must not be confused with other models that provide similar features, such as the actor-oriented computation models described by Dennis [256], [257], Petri nets proposed by Peterson [190], or process networks described by Kahn [258]. This thesis does not consider the differences between these models, which are discussed in detail by Johnston et al. [259]. The remainder of this section discusses the following:

1. The structure of the dataflow graph.

2. Dataflow dependencies.

3. Dataflow patterns.
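The firing rule described above, in which a task executes as soon as all of its input parameters are available, can be illustrated with a minimal Python sketch. This is not Orchestra syntax and not the thesis's implementation: the `run_dataflow` function, the task table layout, and the example tasks are all hypothetical names chosen for illustration, and tasks are modelled as local functions rather than remote service operations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task table: task name -> (names of required inputs, function).
Task = Tuple[List[str], Callable[..., object]]


def run_dataflow(tasks: Dict[str, Task], initial: Dict[str, object]):
    """Fire, round by round, every task whose inputs are all available."""
    values = dict(initial)   # data items available so far
    pending = dict(tasks)    # tasks that have not fired yet
    rounds = []              # names of the tasks fired together in each round
    while pending:
        ready = sorted(name for name, (inputs, _) in pending.items()
                       if all(i in values for i in inputs))
        if not ready:
            raise ValueError("unsatisfiable inputs: " + ", ".join(sorted(pending)))
        # Tasks in the same round do not depend on each other, so a real
        # engine could run them concurrently; here they run one by one.
        results = {name: pending[name][1](*(values[i] for i in pending[name][0]))
                   for name in ready}
        for name in ready:
            del pending[name]
        values.update(results)   # each output is stored under the task's name
        rounds.append(ready)
    return values, rounds


tasks = {
    "double": (["x"], lambda x: 2 * x),
    "square": (["x"], lambda x: x * x),
    "total": (["double", "square"], lambda a, b: a + b),
}
values, rounds = run_dataflow(tasks, {"x": 3})
print(rounds)            # [['double', 'square'], ['total']]
print(values["total"])   # 15
```

In this example `double` and `square` become ready in the same round because both need only `x`, whereas `total` must wait for both of their outputs.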

Dataflow Graph Structure

Orchestra’s dataflow graph can be represented as a DAG G that consists of vertices V_G and edges E_G, each having the form (x→y), where x, y ∈ V_G. Hence, a path in this graph can be expressed as a sequence of edges that share adjacent endpoints as from vertex v₁

114 TheWeb Service OrchestraLanguage

introduction of loops and control structures in the workflow specification. For instance, the vertices v₁ and v_n cannot be identical in the preceding path. The edges that flow toward a particular vertex are input dependencies to that vertex, while those that flow away are its output dependencies. Based on this representation, a service-oriented workflow can be expressed as a graph W = (S, D), where the vertices are denoted by S, which can be expressed as a finite set of services S = {s₀, s₁, ..., s_n}, and the edges between them are denoted by D, where D ⊂ S × S. Any pair of vertices s_x and s_y are considered neighbours if (s_x, s_y) ∈ D. Each service can be represented as a tuple s = (E, O), where E denotes the service endpoint and O denotes the operation provided by the service.
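As an illustration of the definitions above, the following Python sketch models a workflow W = (S, D) with services as (E, O) pairs. The class names, method names, and the example.org endpoints are assumptions made for this sketch only. The acyclicity check is a standard source-removal procedure, not a method taken from the thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Service:
    endpoint: str    # E: the service endpoint
    operation: str   # O: the operation provided by the service


@dataclass(frozen=True)
class Workflow:
    services: FrozenSet[Service]                       # S
    dependencies: FrozenSet[Tuple[Service, Service]]   # D, a subset of S x S

    def are_neighbours(self, sx: Service, sy: Service) -> bool:
        """s_x and s_y are neighbours if (s_x, s_y) is in D."""
        return (sx, sy) in self.dependencies

    def is_acyclic(self) -> bool:
        """Repeatedly remove vertices that have no incoming edge."""
        remaining = set(self.services)
        edges = set(self.dependencies)
        while remaining:
            sources = {s for s in remaining
                       if not any(dst == s for _, dst in edges)}
            if not sources:
                return False   # every remaining vertex lies on or behind a cycle
            remaining -= sources
            edges = {(a, b) for a, b in edges if a not in sources}
        return True


# Hypothetical endpoints and operation names.
s0 = Service("http://example.org/s0", "fetch")
s1 = Service("http://example.org/s1", "filter")
s2 = Service("http://example.org/s2", "store")

w = Workflow(frozenset({s0, s1, s2}), frozenset({(s0, s1), (s1, s2)}))
print(w.are_neighbours(s0, s1))   # True
print(w.is_acyclic())             # True

cyclic = Workflow(frozenset({s0, s1}), frozenset({(s0, s1), (s1, s0)}))
print(cyclic.is_acyclic())        # False
```

Frozen dataclasses are used so that services are hashable and can be stored in sets, mirroring the set-based definition of S and D.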

Dataflow Dependencies

Dataflow graphs are commonly generated for compiler optimisation purposes, and have been used in the past to capture the relationship between data entities and operators for a particular program [260], [261], [262]. Orchestra’s dataflow graph permits services to communicate with each other using messages that contain a typed data set. For instance, an input message may contain one or more parameters to invoke a particular service operation, which in turn may produce an output message that contains a result. These messages are required to compose service operations together, where the output of a particular service is passed as an input to another. Service composition is only possible when the output type of one service is identical to the input type of the service that consumes it. Orchestra’s dataflow dependencies include the following:

• Service input dependency: a service invocation typically requires one or more inputs for it to be executed.

• Service output dependency: a service invocation produces a particular output upon its execution, which may represent intermediate or final data in the workflow.

• Service composition dependency: a service invocation produces an output that is passed in the workflow as an input parameter to execute other service invocations.
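The type condition on service composition can be sketched in Python as follows. The `Operation` class, the `can_compose` helper, and the type names used in the example are hypothetical and serve only to illustrate the rule that a producer's output type must be identical to the type of the consumer's input parameter. How Orchestra itself represents types is not assumed here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Operation:
    name: str
    inputs: Dict[str, str]   # parameter name -> type name (input dependency)
    output: str              # type name of the result (output dependency)


def can_compose(producer: Operation, consumer: Operation, parameter: str) -> bool:
    """True if the producer's output type is identical to the type of the
    consumer's named input parameter (a composition dependency is possible)."""
    return consumer.inputs.get(parameter) == producer.output


# Hypothetical operations and type names.
parse = Operation("parse", {"text": "string"}, "record")
store = Operation("store", {"item": "record"}, "boolean")

print(can_compose(parse, store, "item"))   # True: record -> record
print(can_compose(store, parse, "text"))   # False: boolean is not string
```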


These dependencies can be detected automatically by the language compiler, after which they may be analysed to determine opportunities for parallelism. For example, the analysis outcome can be used to decompose a workflow into smaller parts for execution on distributed machines, and the dependencies can be used to determine the order in which these parts can be executed efficiently. Some advanced notations based on ontological concepts can also be used to capture such dependencies, as proposed by Cardoso and Sheth [263] and Paolucci et al. [264]. However, the discussion of these notations is beyond the scope of this thesis.
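One way such a dependency analysis could expose parallelism is to group tasks into stages, where every task in a stage depends only on tasks in earlier stages. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). It is an illustration under that assumption, not the analysis performed by the Orchestra compiler, and the `parallel_stages` function and task names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_stages(predecessors):
    """Group tasks into stages; tasks in the same stage have no dependency
    on each other and could be dispatched to different machines."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()                  # raises graphlib.CycleError on a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # all tasks whose inputs are done
        stages.append(ready)
        sorter.done(*ready)                  # mark the whole stage as finished
    return stages


# Hypothetical workflow: each task maps to the set of tasks it depends on.
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
stages = parallel_stages(deps)
print(stages)   # [['A'], ['B', 'C'], ['D']]
```

The stage boundaries give one valid execution order; a real scheduler may start a task earlier, as soon as its own predecessors have finished.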

Dataflow Patterns

Dataflow patterns were discussed in detail in Section 2.3, and they are used as basic building blocks to compose workflow tasks for scientific applications, as described by Juve et al. [265]. Similarly, Orchestra uses dataflow patterns to compose different services, where each service is responsible for computing a particular workflow task. These patterns are therefore re-defined as follows:

• Process pattern: This pattern represents a simple service invocation, where the service takes one or more inputs and produces a single output. It is shown in Figure 3.1.

• Pipeline pattern: This pattern is used for chaining several services, where the output of a particular service operation is passed as an input to another service operation in a sequential manner.

• Distribution pattern: This pattern represents the passing of the output of one service operation as an input parameter to multiple service operations.

• Aggregation pattern: This pattern represents the collection of multiple outputs from different service operations, all of which are used as input parameters to a single service operation.
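The four patterns above can be mimicked with ordinary function composition. The Python sketch below is only an analogy: service operations are modelled as local functions, and the helper names `process`, `pipeline`, `distribute`, and `aggregate` are hypothetical and do not correspond to Orchestra keywords.

```python
def process(op, *inputs):
    """Process pattern: one invocation, one or more inputs, a single output."""
    return op(*inputs)


def pipeline(value, *ops):
    """Pipeline pattern: each operation's output is the next one's input."""
    for op in ops:
        value = op(value)
    return value


def distribute(value, *ops):
    """Distribution pattern: one output is passed to several operations."""
    return [op(value) for op in ops]


def aggregate(op, *producers):
    """Aggregation pattern: outputs of several operations feed one operation."""
    return op(*(produce() for produce in producers))


# Stand-ins for service operations.
def add(a, b):
    return a + b


def double(x):
    return 2 * x


def inc(x):
    return x + 1


print(process(add, 2, 3))                     # 5
print(pipeline(3, double, inc))               # 7
print(distribute(4, double, inc))             # [8, 5]
print(aggregate(add, lambda: 2, lambda: 3))   # 5
```

In the distribution and aggregation cases the individual operations are independent of each other, so an execution engine would be free to invoke them concurrently.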


In document On the construction of decentralised service oriented orchestration systems (Page 140-143)