Workflow Desktop - The ‘thin’ framework layer

7 Overview – iPipeline

7.2 The ‘thin’ framework layer

7.2.1 Workflow Desktop

The user interacts with the application via the workflow desktop component. It is here that the user creates their workflows using the domain specific processes provided by the problem domain developer. Figure 7-2 illustrates the desktop with its key features identified.

Figure 7-2: iPipeline’s workflow desktop component

The desktop can expand infinitely in any direction to accommodate a workflow of any size or any number of workflows. The single most important part of the workflow desktop is the toolbox. It is through the toolbox that the user is provided with access to the process nodes from which they will build a workflow. The contents of the toolbox will change depending on the specific problem domain being studied. It is through the toolbox that iPipeline’s ‘thin’ framework layer connects to the ‘thick’ framework layer. The content of the toolbox is defined in the ‘thick’ framework layer.

The toolbox is divided into three sections mirroring the three phases of the information processing cycle (see Chapter 5). The toolbox divisions, their mapping to the information processing cycle and a description of each division are given in Table 7-2.


Toolbox division      Information processing cycle phase   Description
--------------------  -----------------------------------  ---------------------------------------------------------------------------
Data Source           Input Phase                          Processes encapsulating algorithms that provide data for processing.
Data Manipulation     Processing Phase                     Processes encapsulating algorithms that manipulate the data in the pipeline.
Data Visualisation    Output Phase                         Processes encapsulating algorithms that visualise the data in the pipeline.

Table 7-2: Divisions of the toolbox and their relationship to the information processing cycle
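The mapping in Table 7-2 can be written down as a small lookup structure. The following is only a sketch in Python; the names `Phase`, `ToolboxDivision` and `DIVISION_TO_PHASE` are illustrative assumptions and are not taken from iPipeline's source code.

```python
from enum import Enum


class Phase(Enum):
    """Phases of the information processing cycle (see Table 7-2)."""
    INPUT = "Input Phase"
    PROCESSING = "Processing Phase"
    OUTPUT = "Output Phase"


class ToolboxDivision(Enum):
    """Toolbox divisions; the identifier names are hypothetical."""
    DATA_SOURCE = "Data Source"
    DATA_MANIPULATION = "Data Manipulation"
    DATA_VISUALISATION = "Data Visualisation"


# Each toolbox division corresponds to exactly one phase, as in Table 7-2.
DIVISION_TO_PHASE = {
    ToolboxDivision.DATA_SOURCE: Phase.INPUT,
    ToolboxDivision.DATA_MANIPULATION: Phase.PROCESSING,
    ToolboxDivision.DATA_VISUALISATION: Phase.OUTPUT,
}
```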

To construct a workflow the user places nodes from the toolbox onto the desktop surface and connects them together. At a minimum a workflow must be composed of one data source process and one data visualisation process. Such a workflow will simply create a visualisation that displays the data loaded into the pipeline. More commonly one (or more) data manipulation processes will be inserted between the data source and data visualisation processes. The definition of a valid workflow can therefore be given as:

1. The workflow contains a route through its directed graph that starts at a data source process and ends at a data visualisation process.

2. The route may optionally contain any number of data manipulation processes.

3. All processes have correctly configured settings panels.
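The three conditions can be checked mechanically. The sketch below is one possible reading of the definition in Python, not iPipeline's implementation: the `Node` class, its `configured` flag (standing in for a correctly configured settings panel) and the function `is_valid_workflow` are all assumed names. It uses a breadth-first search from every data source and only passes through data manipulation processes on the way to a data visualisation process.

```python
from collections import deque
from dataclasses import dataclass, field

SOURCE, MANIPULATION, VISUALISATION = "source", "manipulation", "visualisation"


@dataclass
class Node:
    name: str
    kind: str                 # one of SOURCE, MANIPULATION, VISUALISATION
    configured: bool = True   # stands in for a correctly configured settings panel
    outputs: list = field(default_factory=list)  # names of downstream nodes


def is_valid_workflow(nodes):
    """Return True if the workflow satisfies the three conditions."""
    by_name = {n.name: n for n in nodes}

    # Condition 3: every process must be correctly configured.
    if not all(n.configured for n in nodes):
        return False

    # Conditions 1 and 2: search from each data source for a data
    # visualisation process, passing only through manipulation processes.
    queue = deque(n for n in nodes if n.kind == SOURCE)
    seen = {n.name for n in queue}
    while queue:
        node = queue.popleft()
        if node.kind == VISUALISATION:
            return True
        for name in node.outputs:
            nxt = by_name[name]
            if name not in seen and nxt.kind != SOURCE:
                seen.add(name)
                queue.append(nxt)
    return False
```

For example, a workflow made of a file source connected directly to a visualisation passes the check, while the same two nodes left unconnected do not.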

A valid workflow must have at least one (1) route through its directed graph that meets the conditions above but may have more. Figure 7-2 has two (2) such routes: the minimal case (route 1), where a data source process and a data visualisation process are directly connected, and the optional case (route 2), where a data manipulation process has been inserted. Each route can be executed to produce a set of data that is delivered to the data visualisation process at the end of the route. Since Figure 7-2 has two routes through its directed graph it will deliver two sets of data to the iRaster visualisation at the end of the workflow. These will be:

1. The raw data read from a data file by the input file process at the start of the workflow (route 1 in Figure 7-2).

2. A sorted set of data created by the optional data manipulation process (route 2 in Figure 7-2).

Figure 7-3: Workflow to merge five data files into a single data source.

Chapter 7: Design of iPipeline
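The two routes through the Figure 7-2 workflow can be listed by walking its directed graph. The following Python sketch assumes an acyclic graph stored as a dictionary; the function `find_routes` and the node names "input file", "sort" and "iRaster" are illustrative labels for the processes in the figure, not identifiers from iPipeline.

```python
def find_routes(graph, kinds):
    """Yield every route from a data source to a data visualisation process.

    graph maps a process name to the names of the processes it feeds.
    kinds maps a process name to "source", "manipulation" or "visualisation".
    The graph is assumed to be acyclic.
    """
    def walk(path):
        node = path[-1]
        if kinds[node] == "visualisation":
            yield list(path)
            return
        for nxt in graph.get(node, []):
            if kinds[nxt] != "source":  # routes only pass through manipulations
                yield from walk(path + [nxt])

    for name, kind in kinds.items():
        if kind == "source":
            yield from walk([name])


# Hypothetical encoding of the workflow in Figure 7-2.
graph = {"input file": ["iRaster", "sort"], "sort": ["iRaster"]}
kinds = {"input file": "source", "sort": "manipulation", "iRaster": "visualisation"}

routes = list(find_routes(graph, kinds))
# routes[0] is the direct route; routes[1] passes through the sort process.
```

Each route found corresponds to one set of data delivered to the visualisation.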

The power of visual programming becomes apparent when the workflow becomes more complex. Figure 7-2 dealt with a simple case where a single file of data is loaded into the pipeline. Figure 7-3 extends the workflow to load five (5) files of data. This situation might arise where a researcher wishes to combine simultaneous recordings from five different sensors into a single set of data. The problem domain developer has provided a data manipulation process to merge multiple datasets into a single dataset. Modifying the workflow only takes moments. A classical computer program would require changes to the source code and recompilation to achieve the same result. Visual programming can therefore realise considerable time savings, as operations can be inserted into (or removed from) the workflow without having to re-code or recompile the application. The processes encapsulate computer code that is “written once; re-used anywhere”.
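The idea that a workflow can be changed without touching the processes themselves can be illustrated with a minimal Python sketch. It assumes each process is a plain function and a workflow is simply an ordered list of processes; `merge`, `sort_data`, `run` and the sample recordings are made up for illustration and do not reflect iPipeline's actual process classes.

```python
def merge(datasets):
    """Hypothetical merge process: concatenate several datasets into one."""
    merged = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


def sort_data(dataset):
    """Hypothetical sort process."""
    return sorted(dataset)


def run(steps, data):
    """Apply each process in order; the workflow is just a list of processes."""
    for step in steps:
        data = step(data)
    return data


# Five made-up sensor recordings standing in for five data files.
recordings = [[3, 1], [9], [4, 7], [2], [8, 5]]

merged_only = run([merge], recordings)
# Inserting an operation is one extra list entry; the processes are unchanged.
merged_sorted = run([merge, sort_data], recordings)
```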

Figure 7-4 further extends the Figure 7-3 example by introducing a new process, the export to file process. The workflow in Figure 7-3 loads, merges and sorts five sets of data. This processing takes time and it might be useful to create and permanently store a single file with this processing already completed. Figure 7-4 shows such a workflow that produces two files. The first file stores the result of combining five sets of raw data. The second file stores the same data after it has been sorted.

Figure 7-4: Workflow to merge five datasets. Two new files are created, the first of raw data and the second after sorting the raw data.

The benefits to the researcher can be summarised as:

1. The single file of raw data is easier to use and distribute than five files of raw data.

2. The result of merging the raw data files and sorting the result is preserved, allowing these steps to be eliminated in future.

3. The workflow is naturally parallelisable. Writing the merged raw data file to disk can occur at the same time as sorting the raw data, as they appear on different routes through the workflow.
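The independence of the two branches can be sketched with Python's standard `concurrent.futures` module. This is an assumption-laden illustration rather than iPipeline's scheduler: `export_to_file`, `sort_data`, the JSON file format and the file names are all hypothetical. The point is only that the export of the merged raw data and the sort share a read-only input and so may overlap; in CPython a thread pool will not necessarily speed up a CPU-bound sort.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def export_to_file(data, path):
    """Hypothetical export process: write the dataset to disk as JSON."""
    Path(path).write_text(json.dumps(data))
    return path


def sort_data(data):
    """Hypothetical sort process; returns a new list, leaving the input intact."""
    return sorted(data)


merged = [3, 1, 9, 4, 7, 2, 8, 5]   # made-up output of the merge process
out_dir = Path(tempfile.mkdtemp())

with ThreadPoolExecutor() as pool:
    # The two routes only read the merged data, so they can run side by side.
    raw_future = pool.submit(export_to_file, merged, out_dir / "raw.json")
    sorted_future = pool.submit(sort_data, merged)
    sorted_data = sorted_future.result()
    raw_path = raw_future.result()

# The second file can only be written once the sort has finished.
sorted_path = export_to_file(sorted_data, out_dir / "sorted.json")
```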

The workflow desktop provides a flexible visualisation of the “program” that the user is creating. At the same time it provides the ability to rapidly reconfigure that program and preserve the results.

7.2.2 Process base classes

In document Visualisation Studio for the analysis of massive datasets (Page 97-100)