Process base classes - The ‘thin’ framework layer

7 Overview – iPipeline

7.2 The ‘thin’ framework layer

7.2.2 Process base classes

The ‘thin’ framework layer must interact with the “domain specific visual programming language” built by the problem domain developer (the ‘thick’ layer). However, the ‘thin’ framework has no knowledge of either the problem domain or the specific algorithms implemented by the problem domain developer. Nevertheless, this gap must be bridged if any “visual program” is to be created and executed. The process base classes (and the IProcess interface) provide the means to bridge the gap (see Figure 7-5).

Figure 7-5: Thin layer base process class implementation structure.

Chapter 7: Design of iPipeline

The bridge is built on the idea that whatever workflow the user creates, it must take the form of a directed graph. This guarantees that the workflow is a collection of ‘nodes’ linked together by parent / child relationships. The ‘thin’ framework layer is therefore aware only of ‘nodes’ and ‘links’:

• Nodes represent ‘something that is done to data’.

• Links define the order in which the nodes ‘do something to the data’ (a parent node must finish ‘doing its thing to the data’ before a child node starts to ‘do its thing’).
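The node and link view above can be illustrated with a minimal Java sketch. It is not taken from the iPipeline code base: the class `WorkflowGraph` and its methods are hypothetical names, nodes are reduced to plain strings, and the graph is assumed to be acyclic. The sketch derives one execution order in which every parent node comes before its children (Kahn's algorithm).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names are illustrative, not iPipeline's own.
final class WorkflowGraph {
    // Maps each node name to the names of its child nodes.
    private final Map<String, List<String>> children = new LinkedHashMap<>();

    void addNode(String node) {
        children.putIfAbsent(node, new ArrayList<>());
    }

    // A link states that 'parent' must finish before 'child' starts.
    void addLink(String parent, String child) {
        addNode(parent);
        addNode(child);
        children.get(parent).add(child);
    }

    // Returns one order in which every parent precedes its children.
    List<String> executionOrder() {
        // Count, for every node, how many parents have not yet finished.
        Map<String, Integer> pendingParents = new LinkedHashMap<>();
        for (String node : children.keySet()) {
            pendingParents.put(node, 0);
        }
        for (List<String> kids : children.values()) {
            for (String kid : kids) {
                pendingParents.merge(kid, 1, Integer::sum);
            }
        }

        // Nodes without parents are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingParents.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            // Finishing a node releases each child whose parents are all done.
            for (String kid : children.get(node)) {
                if (pendingParents.merge(kid, -1, Integer::sum) == 0) {
                    ready.add(kid);
                }
            }
        }
        if (order.size() != children.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }
}
```

A parallel engine could start every node in the ready queue at the same time; the sketch processes them one by one only to keep the ordering rule visible.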

Both statements above contain an abstract ‘something’. That ‘something’ is an algorithm that manipulates data. The ‘BaseProcess’ class can therefore be considered an abstraction that represents an algorithm created by the problem domain developer. Hence any ‘BaseProcess’ may be a ‘node’ in the workflow’s directed graph. To make any algorithm interoperate with iPipeline, it must first be ‘wrapped’ in a ‘BaseProcess’ class. Wrapper classes are a form of software design pattern formally identified by the ‘gang of four’ in their seminal book on software design patterns (Gamma et al., 1994). The wrapper class exposes a common interface that iPipeline uses to interact with the encapsulated algorithm. Within the iPipeline code base this interface is called ‘IProcess’. The UML diagram in Figure 7-5 shows the IProcess interface and its implementing classes.

The IProcess interface is itself derived from the Java language’s Runnable interface. This means that each process is itself ‘runnable’ as an independent entity, essentially a mini-program. The Runnable interface has, however, been extended to add the functionality needed by a ‘node’ in a directed graph. The most basic function is to encapsulate the problem domain developer’s algorithm (whatever it might be). The process(Object[]) : boolean method provides this facility. The problem domain developer extends the base class (which implements IProcess) and overrides this method to implement their algorithm. The method is invoked by the parallel execution engine during execution of the workflow.
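As a rough illustration of the wrapping idea, the following Java sketch shows how an algorithm might be placed behind process(Object[]). Only IProcess, BaseProcess, Runnable and the process(Object[]) : boolean signature come from the text. The helpers `setInputs` and `succeeded`, the body of `run()` and the `SumProcess` class are assumptions made for the example and may differ from the real iPipeline classes.

```java
// Hypothetical sketch; only the names IProcess, BaseProcess and
// process(Object[]) are taken from the text.
interface IProcess extends Runnable {
    // Wraps the problem domain developer's algorithm.
    boolean process(Object[] inputs);
}

abstract class BaseProcess implements IProcess {
    private Object[] inputs = new Object[0];
    private boolean succeeded;

    // Assumed helper: lets an execution engine hand over the inputs.
    void setInputs(Object[] inputs) {
        this.inputs = inputs;
    }

    // Assumed helper: reports the outcome of the last run.
    boolean succeeded() {
        return succeeded;
    }

    // Runnable entry point: delegates to the wrapped algorithm.
    @Override
    public void run() {
        succeeded = process(inputs);
    }
}

// Example problem domain algorithm: sums its numeric inputs.
final class SumProcess extends BaseProcess {
    double total;

    @Override
    public boolean process(Object[] inputs) {
        total = 0;
        for (Object o : inputs) {
            if (!(o instanceof Number)) {
                return false; // signal failure on unsupported input
            }
            total += ((Number) o).doubleValue();
        }
        return true;
    }
}
```

Because the wrapper is a Runnable, an instance such as `new SumProcess()` could be passed to a `Thread` or an executor without the engine knowing anything about the summation it performs.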

Every node in the workflow’s directed graph must accept the result from previous parent nodes and produce a result of its own. The IProcess interface provides three methods to manage the production of results:

• getResult() : IProcessingResult

• setProcessingResult(IProcessingResult) : boolean

• clearResult() : void

These methods are used by the parallel execution engine to control the flow of processing results through the workflow. The IProcessingResult interface is the abstraction representing the structure produced by iPipeline’s structure-based dataflow model (Davis & Keller, 1982; Dennis & Robinet, 1974; Keller & Yen, 1981) (see Chapter 5). This abstraction is discussed further in the next section.
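The three result methods can be sketched in Java as follows. The method names and return types follow the list above; everything else is assumed for illustration. This includes the `getData` accessor, the `TextResult` and `ResultHolder` classes, and the `EngineStep.forward` helper, which guesses at how an engine might move a result from a parent node to a child. The interface here is cut down to the result methods only.

```java
// Hypothetical sketch; bodies and helper classes are assumptions.
interface IProcessingResult {
    Object getData(); // assumed accessor, not named in the text
}

final class TextResult implements IProcessingResult {
    private final String text;

    TextResult(String text) {
        this.text = text;
    }

    @Override
    public Object getData() {
        return text;
    }
}

// Reduced to the three result methods for this example.
interface IProcess {
    IProcessingResult getResult();

    boolean setProcessingResult(IProcessingResult result);

    void clearResult();
}

class ResultHolder implements IProcess {
    private IProcessingResult result;

    @Override
    public IProcessingResult getResult() {
        return result;
    }

    // Returns false when no result is supplied, true once it is stored.
    @Override
    public boolean setProcessingResult(IProcessingResult r) {
        if (r == null) {
            return false;
        }
        result = r;
        return true;
    }

    @Override
    public void clearResult() {
        result = null;
    }
}

final class EngineStep {
    // Assumed engine behaviour: hand the parent's result to the child,
    // then clear the parent so the result is not delivered twice.
    static boolean forward(IProcess parent, IProcess child) {
        boolean accepted = child.setProcessingResult(parent.getResult());
        if (accepted) {
            parent.clearResult();
        }
        return accepted;
    }
}
```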

The final three methods of the IProcess interface reflect the information processing cycle. As discussed in Chapter 5, this cycle is composed of three primary steps:

1. Input

2. Processing

3. Output

A ‘base’ class is required for each of these steps. These three base classes form the link between the ‘thin’ framework layer, which manages the workflow, and the ‘thick’ framework layer, which defines the algorithms to be executed. The BaseProcess class seen in Figure 7-5 provides an implementation of the IProcess interface sufficient to represent a ‘general processing node’ (step 2 of the information processing cycle). Steps 1 and 3 are essentially more specialised versions of the general processing node, and specialised subclasses are provided to implement them. Step 1 is represented by the BaseInitialProcess class, while step 3 is represented by the BaseVisualisationProcess class.

Taken together the three ‘base’ classes provide the problem domain developer with a means to wrap their algorithms into a ‘thin’ framework class that:

1. Can be represented as a node in a workflow’s directed graph, and

2. Can be executed by the thin framework’s parallel execution engine.

In addition to the software implementation of a process, each process also needs to be represented visually on the workflow desktop. This representation is known as the process glyph. The user connects these process glyphs together to create the workflow. Figure 7-6 shows the visual elements of a process glyph.

Figure 7-6: The visual representation (glyph) of a process

The process glyph has two primary tasks to perform: firstly, to identify the algorithm encapsulated by the process and, secondly, to define which other processes are allowed to connect to this process. The majority of the process glyph is given over to describing the encapsulated algorithm. The BaseProcess class and its child classes allow both a textual description and an image to be associated with the encapsulated algorithm. At the heart of the glyph, a button prominently displays the image with the textual description above it. The button provides access to the process settings panel created by the problem domain developer.

The remainder of the process glyph consists of an input and an output connection point. As described in Chapter 5, iPipeline uses a structure-based dataflow model. In this model each process receives input and generates output by receiving and transmitting ‘tokens’ that encapsulate data. The input / output connection points define which processes will deliver dataflow tokens for processing and where the token created by the process will be sent. The BaseProcess class has both an input and an output connection point. Hence the class both accepts dataflow tokens as input and produces them as output. Its child classes BaseInitialProcess and BaseVisualisationProcess (see Figure 7-5) do not have both connection points, as they represent the start or end of a workflow. The BaseInitialProcess has no input connection point because it represents the start of the workflow, and therefore there can be no earlier process that provides input dataflow tokens. Equally, the BaseVisualisationProcess has no output connection point since it represents the end of a dataflow.
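The presence or absence of connection points on the three base classes can be sketched as below. The class names come from the text. The methods `hasInput` and `hasOutput` and the `Connections.canLink` helper are hypothetical names introduced for this example, and the real classes may express the rule differently.

```java
// Hypothetical sketch; hasInput(), hasOutput() and canLink() are assumed names.
class BaseProcess {
    // A general processing node accepts tokens and produces tokens.
    boolean hasInput() {
        return true;
    }

    boolean hasOutput() {
        return true;
    }
}

class BaseInitialProcess extends BaseProcess {
    // Start of a workflow: no earlier process can supply tokens.
    @Override
    boolean hasInput() {
        return false;
    }
}

class BaseVisualisationProcess extends BaseProcess {
    // End of a dataflow: no tokens are passed on.
    @Override
    boolean hasOutput() {
        return false;
    }
}

final class Connections {
    // A link needs an output point on the parent and an input point on the child.
    static boolean canLink(BaseProcess parent, BaseProcess child) {
        return parent.hasOutput() && child.hasInput();
    }
}
```

Under these assumptions an initial process can feed a general or visualisation process, but nothing can feed an initial process and a visualisation process can feed nothing.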

In addition to controlling process connectivity through their presence or absence, each connection point carries a series of three glyphs that define the types of process that can connect to that point. These glyphs are inspired by the UK road traffic sign system, reflect the information processing cycle, and are presented in Table 7-3 below.

Process category               Visual encoding used   UK road traffic sign   Visual road traffic encoding
Input process                  Circle / disk          Order / command        [image]
Data manipulation process      Triangle               Warning / danger       [image]
Output visualisation process   Rectangle              Information            [image]

Table 7-3: Visual encoding of process type and its similarity to UK road signs

As Table 7-3 shows, each stage of the information processing cycle is associated with a geometric shape. Input processes (stage one) are related to a circle / disk glyph. Circular signs on the road system encode commands and usually instruct a driver to start doing something. Input processes are associated with this symbol as they represent the start of a workflow. Very often such processes will draw data from a hard disk. The second stage of the information processing cycle (data manipulation / processing) is represented by a triangle. The road sign analogue is a warning / danger sign. The intent is to prompt users to ensure that they are selecting the correct data manipulation process and algorithm to generate the data visualisation they need. The final stage (output) is represented by a rectangle intended to represent a computer monitor or sheet of paper. The road sign analogue is a traffic sign that presents information to the driver. This reflects that the production of information is the end goal of the information processing cycle and that the visualisation processes are the end of an iPipeline workflow. To indicate the types of process that may connect to a connection point, each glyph is either filled with a solid colour (connection allowed) or unfilled (connection disallowed).
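The filled and unfilled glyph rule can be illustrated with a small Java sketch. The three categories and their shapes follow Table 7-3. The `ProcessCategory` enum, the `ConnectionPoint` class and the textual rendering are hypothetical and are not iPipeline's own types. Which categories a given connection point actually allows is also an assumption of the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; the enum and class names are illustrative.
enum ProcessCategory {
    INPUT("circle"),
    DATA_MANIPULATION("triangle"),
    OUTPUT_VISUALISATION("rectangle");

    final String shape;

    ProcessCategory(String shape) {
        this.shape = shape;
    }
}

final class ConnectionPoint {
    private final Set<ProcessCategory> allowed =
            EnumSet.noneOf(ProcessCategory.class);

    ConnectionPoint(Set<ProcessCategory> allowed) {
        this.allowed.addAll(allowed);
    }

    // True when a process of this category may connect here.
    boolean accepts(ProcessCategory category) {
        return allowed.contains(category);
    }

    // One glyph per category: filled if allowed, unfilled otherwise.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (ProcessCategory c : ProcessCategory.values()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.shape)
              .append(allowed.contains(c) ? ":filled" : ":unfilled");
        }
        return sb.toString();
    }
}
```

For example, a connection point assumed to accept input and data manipulation processes would render as `circle:filled triangle:filled rectangle:unfilled`.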

In document Visualisation Studio for the analysis of massive datasets (Page 100-103)