3.2 Design of the Orchestra Language
3.2.4 Dataflow Computation Model
Orchestra’s design supports a computation model that differs from the classical von Neumann computation model, in which data resides passively in a specialised store whilst instructions are executed in a sequence controlled by a program counter. Orchestra’s computation model permits a workflow to be represented as a Directed Acyclic Graph (DAG) in which the vertices represent workflow tasks (e.g. service operations) to be executed, and the directed edges between them represent data dependencies. These dependencies indicate the data movement to and from the services. For example, input data may be passed to a particular service by invoking a functionality (e.g. an operation) that it provides. This service may produce a result, which may in turn be passed to another service that provides a different functionality. The workflow tasks in this computation model may be ready to execute simultaneously, as they represent asynchronous concurrent events. Each workflow task is executed as soon as all the input parameters required for its execution become available. This execution model must not be confused with other models that provide similar features, such as the actor-oriented computation models described by Dennis [256], [257], Petri nets proposed by Peterson [190], or process networks described by Kahn [258]. This thesis does not consider the differences between these models, which are discussed in detail by Johnston et al. [259]. The remainder of this section discusses the following:
1. The structure of the dataflow graph.
2. Dataflow dependencies.
3. Dataflow patterns.
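The firing rule described above, in which a task executes as soon as all of its input parameters are available, can be illustrated with a minimal Python sketch. It is not Orchestra's implementation: the names `run_dataflow`, `tasks`, `deps` and `inputs` are hypothetical, the tasks are plain functions rather than service operations, and the dependency graph is assumed to be acyclic (a cycle would make the sketch wait forever).

```python
import asyncio

async def run_dataflow(tasks, deps, inputs):
    """Run every task once all of its input dependencies have produced a value.

    tasks:  task name -> plain callable (stands in for a service operation)
    deps:   task name -> list of upstream task names (in argument order)
    inputs: task name -> list of externally supplied arguments
    """
    running = {}

    async def fire(name):
        # Block until each upstream task has delivered its output.
        upstream = [await running[d] for d in deps.get(name, [])]
        return tasks[name](*inputs.get(name, []), *upstream)

    # Schedule all tasks up front; none is ordered by a program counter.
    for name in tasks:
        running[name] = asyncio.create_task(fire(name))
    values = await asyncio.gather(*running.values())
    return dict(zip(running, values))

# Hypothetical workflow: "c" needs the outputs of both "a" and "b".
example_tasks = {
    "c": lambda p, q: p + q,   # listed first, yet it fires last
    "a": lambda x: x + 1,
    "b": lambda x: x * 2,
}
example_deps = {"c": ["a", "b"]}
example_inputs = {"a": [1], "b": [3]}
```

Calling `asyncio.run(run_dataflow(example_tasks, example_deps, example_inputs))` evaluates "a" and "b" independently of one another, and "c" only once both results exist, regardless of the order in which the tasks were declared.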
Dataflow Graph Structure
Orchestra’s dataflow graph can be represented as a DAG G that consists of vertices VG and edges EG, each edge having the form (x → y), where x, y ∈ VG. Hence, a path in this graph can be expressed as a sequence of edges that share adjacent endpoints as from vertex v1 [...]
introduction of loops and control structures in the workflow specification. For instance, the vertices v1 and vn cannot be identical in the preceding path. The edges that flow toward a particular vertex are input dependencies to that vertex, while those that flow away are its output dependencies. Based on this representation, a service-oriented workflow can be expressed as a graph W = (S, D), where the vertices are denoted by S, which can be expressed as a finite set of services S = {s0, s1, ..., sn}, and the edges between them are denoted by D, where D ⊂ S × S. Any pair of vertices such as sx and sy are considered neighbours if (sx, sy) ∈ D. Each service can be represented as a tuple s = (E, O), where E denotes the service endpoint, and O denotes the operation provided by the service.
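The representation W = (S, D) can be written down directly as a data structure. The following Python sketch is only an illustration under stated assumptions: the class `Service`, the helper names and the example endpoints are hypothetical, and the acyclicity test uses Kahn's algorithm, which this section does not prescribe.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    endpoint: str    # E: the service endpoint
    operation: str   # O: the operation provided by the service

def neighbours(D, sx, sy):
    """sx and sy are neighbours if (sx, sy) is an edge in D."""
    return (sx, sy) in D

def input_dependencies(D, s):
    """Edges that flow toward s."""
    return {(a, b) for (a, b) in D if b == s}

def output_dependencies(D, s):
    """Edges that flow away from s."""
    return {(a, b) for (a, b) in D if a == s}

def is_acyclic(S, D):
    """Kahn's algorithm: repeatedly remove vertices with no incoming edge."""
    indegree = {s: 0 for s in S}
    for _, b in D:
        indegree[b] += 1
    ready = [s for s in S if indegree[s] == 0]
    removed = 0
    while ready:
        s = ready.pop()
        removed += 1
        for a, b in D:
            if a == s:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(S)

# Hypothetical services and edges (placeholder endpoints).
s0 = Service("http://example.org/a", "fetch")
s1 = Service("http://example.org/b", "filter")
s2 = Service("http://example.org/c", "render")
S = {s0, s1, s2}
D = {(s0, s1), (s1, s2)}
```

With these definitions, `is_acyclic(S, D)` holds for the chain above and fails once an edge such as `(s2, s0)` closes a loop, which corresponds to the requirement that the first and last vertices of a path cannot be identical.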
Dataflow Dependencies
Dataflow graphs are commonly generated for compiler optimisation purposes, and have been used in the past to capture the relationship between data entities and operators for
a particular program [260], [261], [262]. Orchestra’s dataflow graph permits services to
communicate with each other using messages that contain a typed data set. For instance, an input message may contain one or more parameters to invoke a particular service operation which in turn may produce an output message that contains a result. These messages are required to compose service operations together, where the output of a particular service is passed as an input to another. Service composition is only possible when the input and
output types are identical. Orchestra’s dataflow dependencies include the following:
• Service input dependency: A service invocation typically requires one or more inputs for it to be executed.
• Service output dependency: A service invocation produces a particular output upon its execution, which may represent intermediate or final data in the workflow.
• Service composition dependency: A service invocation produces an output that is passed in the workflow as an input parameter to execute other service invocations.
These dependencies can be detected automatically by the language compiler, after which they may be analysed to determine opportunities for parallelism. For example, the analysis outcome can be used to decompose a workflow into smaller parts for execution on distributed machines, and the dependencies can be used to determine the order in which these parts can be executed efficiently. Some advanced notations based on ontological concepts can also be used to capture such dependencies, such as those of Cardoso and Sheth [263], and Paolucci et al. [264]. However, the discussion of these notations is beyond the scope of this thesis.
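The kind of analysis described above can be sketched in Python as follows. This is not the Orchestra compiler's algorithm: it assumes a simplified, hypothetical representation in which each invocation lists the variable names it reads and the single variable name it writes, and it groups invocations into stages whose members do not depend on one another.

```python
def detect_dependencies(invocations):
    """Derive composition dependencies from variable names.

    invocations: name -> (list of input variable names, output variable name)
    Returns a set of edges (producer, consumer).
    """
    producer_of = {out: name for name, (_, out) in invocations.items()}
    return {
        (producer_of[var], name)
        for name, (ins, _) in invocations.items()
        for var in ins
        if var in producer_of   # other variables are external workflow inputs
    }

def parallel_stages(invocations):
    """Group invocations into stages; members of a stage may run in parallel."""
    edges = detect_dependencies(invocations)
    remaining = set(invocations)
    stages = []
    while remaining:
        # An invocation is ready when none of its producers is still waiting.
        stage = sorted(
            n for n in remaining
            if all(a not in remaining for (a, b) in edges if b == n)
        )
        if not stage:
            raise ValueError("cyclic dependency between invocations")
        stages.append(stage)
        remaining -= set(stage)
    return stages

# Hypothetical workflow: "url" is an external input, "raw" feeds two services.
workflow = {
    "fetch":  (["url"], "raw"),
    "clean":  (["raw"], "tidy"),
    "stats":  (["raw"], "summary"),
    "report": (["tidy", "summary"], "pdf"),
}
```

For this workflow, `parallel_stages(workflow)` yields `[["fetch"], ["clean", "stats"], ["report"]]`: the two middle invocations share no dependency and could be placed on different machines, while the stage order respects the detected dependencies.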
Dataflow Patterns
Dataflow patterns were discussed in detail in Section 2.3, and they are used as basic building blocks to compose workflow tasks for scientific applications, as described by Juve et al. [265]. Similarly, Orchestra uses dataflow patterns to compose different services, where each service is responsible for computing a particular workflow task. Therefore, these patterns are redefined as follows:
• Process pattern: This pattern represents a simple service invocation, where the service takes one or more inputs and produces a single output. It is shown in Figure 3.1.
• Pipeline pattern: This pattern is used for chaining several services, where the output of a particular service operation is passed as an input to another service operation in a sequential manner.
• Distribution pattern: This pattern represents the passing of a service operation output to multiple service operations as an input parameter to each of these operations.
• Aggregation pattern: This pattern represents the collection of multiple outputs from different service operations, which are all used as input parameters to a single service operation.
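The four patterns above can be illustrated with ordinary Python functions standing in for service operations. This is a sketch only: the function names and signatures are hypothetical and do not reflect Orchestra's syntax, and the calls run sequentially here even where the pattern itself would allow concurrent invocation.

```python
def process(service, *inputs):
    """Process: one invocation, one or more inputs, a single output."""
    return service(*inputs)

def pipeline(services, value):
    """Pipeline: each output becomes the input of the next service."""
    for service in services:
        value = service(value)
    return value

def distribute(producer, consumers, *inputs):
    """Distribution: one output is passed to several services."""
    output = producer(*inputs)
    return [consumer(output) for consumer in consumers]

def aggregate(producers, consumer):
    """Aggregation: several outputs become the inputs of one service.

    Each producer is a zero-argument callable in this simplified sketch.
    """
    return consumer(*[producer() for producer in producers])

# Stand-in "service operations" used only for illustration.
def add(a, b):
    return a + b

def increment(x):
    return x + 1

def double(x):
    return x * 2

def negate(x):
    return -x
```

For example, `pipeline([increment, double], 3)` chains two operations, whereas `distribute(increment, [double, negate], 3)` passes the single output of `increment` to both `double` and `negate`.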