The Spiral Development Methodology - Visualisation Studio for the analysis of massive datasets


6.4 The Spiral Development Methodology

Boehm defines the primary characteristics of the spiral development method as follows (Boehm, 2000):

• a risk-driven process model generator

• a cyclic approach for incrementally growing a system

• a set of anchor point milestones for ensuring stakeholder commitment

6.4.1 Risk driven process model generator

Boehm defines risk as “situations or possible events that can cause a project to fail to meet its goals”. This project had several major risk factors arising from its multi-disciplinary nature. Examples include:

• Several of the desired project goals were in mutual conflict, such as implementing visually responsive applications (Visual Analytics) vs cross-platform compatibility (Software Engineering).

• As initially conceived, the project would have used the CARMEN Virtual Laboratory (VL) for neurophysiology (Gibson et al., 2008; Weeks, 2010) to provide the required cluster computing hardware, but this was not under the researcher’s control.

• It was expected that the implemented features would evolve over time, as each development cycle would provide greater clarity on what the hardware could physically accomplish.

It was therefore considered vital that a software development methodology should be employed that allowed risk to be evaluated and addressed at each stage of the development. The spiral model directly links risk to the generation of its “process model” and is, therefore, particularly well suited to the development of projects with a large number of (initially) unknown risk factors. A process model should answer two main questions:

1. What should be done next?

2. For how long should it continue?

In the spiral model, the answers to these questions are driven by risk considerations. Ultimately, the answers to these questions will define the work and the amount of time dedicated to the next cycle of development.
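As an illustration only, the two questions can be sketched in Python. The sketch assumes risk is quantified as risk exposure (probability of an unsatisfactory outcome multiplied by the loss it would cause), a measure Boehm uses in his risk management work. The names `Risk` and `plan_next_cycle`, and the rule that allots time in proportion to exposure, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # estimated chance the risk materialises (0 to 1)
    loss: float         # estimated cost if it does, in arbitrary units

    @property
    def exposure(self) -> float:
        # Risk exposure = probability of loss x size of loss.
        return self.probability * self.loss


def plan_next_cycle(risks, weeks_available):
    """Answer both process-model questions from risk alone.

    What next?  Work on the risk with the highest exposure.
    How long?   Assumed rule: allot time in proportion to that
                risk's share of the total exposure.
    """
    ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
    top = ranked[0]
    total = sum(r.exposure for r in ranked)
    weeks = round(weeks_available * top.exposure / total, 1) if total else 0.0
    return top.name, weeks
```

With made-up estimates, `plan_next_cycle([Risk("external cluster", 0.5, 10), Risk("conflicting goals", 0.9, 2)], 8)` selects the external cluster risk for the next cycle, because its exposure (5.0) is larger than that of the conflicting goals (1.8).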

6.4.2 Cyclic approach for incrementally growing a system

In software development it is common to time box or time limit each development cycle. It is also common to require that each cycle has a distinct identifiable output. Software development methodologies that employ this approach are generally described as “incremental” methodologies. The spiral model’s development cycles produce incremental product prototypes. Each cycle builds on the prototype from the previous cycle with the completed software product “evolving” with each cycle (see Figure 6-3).

In this project, the development of functional prototypes proved critical in identifying areas where further refinement was needed. The initial research plan called for the following prototypes:

| Prototype | Description | Chapter |
|---|---|---|
| Visual programming language (core i-Pipeline / VPL) implementation | Develops the VPL directed graph, base classes for different types of graph nodes and the i-Pipeline virtual desktop | See: Chapter 5 for Visual Programming Languages (VPLs). See: Chapter 7 for i-Pipeline’s design. |
| Development of neuroscience data model | Develops an efficient data storage system with support for rapid filtering and sorting operations | See: Chapter 8 for the design and development of the neuroscience data model. |
| Development of initial neuroscience processing algorithms | Development of the burst sort and inter spike interval sorting algorithms for i-Raster | See: Chapters 9, 10 and 11 for processing algorithms and their use. |
| Integration of data model and neuroscience algorithms with i-Pipeline | Algorithms are “wrapped” into graph nodes and the i-Pipeline Toolbox implemented | See: Chapters 7, 8 and Appendix 1 |
| Development of the parallel execution engine | A data-availability-driven approach to the VPL’s execution is implemented on top of a structure based dataflow model | See: Chapter 5 (Section 5.2.2) – Dataflow execution models |
| Development of the i-Raster visualisation | Implementation of a revised i-Raster visualisation incorporating the new overview and filtering features | See: Chapter 9 |
| Development of a pairwise cross correlation and clustering algorithm to run over the CARMEN compute cluster | Developed and deployed a CARMEN service for pairwise cross correlation and clustering | See: Chapter 10 (Section 10.1.1) – Computational challenge of Pairwise Cross Correlation |
| Development of i-Grid visualisation | Implementation of the i-Grid visualisation with a new dendrogram overview using the clustering algorithm | See: Chapter 10 |
| Development of i-Animate | Implementation of i-Animate to visualise recording MEA array with heat map overlays | See: Chapter 11 |

Table 6-1: Research prototypes, work undertaken to produce them and associated thesis chapter(s).

The primary risk with the prototype plan set out in Table 6-1 was its reliance on an external third party, the CARMEN Virtual Laboratory (Gibson et al., 2008), to provide the compute cluster for the most computationally intensive data analysis. Ultimately this risk was realised: the hardware provided and the access restrictions imposed prevented the CARMEN compute cluster from delivering the computing power needed. The fallback plan of using the Plymouth University high performance computing cluster was adopted, and the pairwise cross correlation and clustering algorithms were implemented on that hardware.

6.4.3 Key research outputs

In this project a framework for data processing and analysis, modelled on the VPL concept, has been developed. This framework allows a developer to provide a library of pre-coded algorithms, which are represented as graphical nodes. Each node can be deployed to a desktop environment and linked with other nodes to form a visual data processing pipeline. The pipeline follows the dataflow execution model, with each algorithm executing concurrently. The final output is delivered to a visualisation module that acts as the terminus for the pipeline. The visualisation module is an independently executing program which applies the principles of information visualisation and allows an interactive exploration of the processed data. The visualisation may also present options for further processing.
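The dataflow idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not iPipeline's implementation: each node is assumed to be a plain function running in its own thread, nodes are linked in series by FIFO queues, and a list stands in for the visualisation module at the terminus. The names `run_node`, `run_pipeline` and `_DONE` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data stream


def run_node(func, inbox, outbox):
    """Fire `func` whenever an input item is available, then forward the result."""
    while True:
        item = inbox.get()          # blocks until upstream data arrives
        if item is _DONE:
            outbox.put(_DONE)       # pass the end-of-stream marker downstream
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Link one node per function in series and run all nodes concurrently."""
    channels = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=run_node, args=(f, channels[i], channels[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    for item in items:
        channels[0].put(item)
    channels[0].put(_DONE)

    results = []  # stands in for the visualisation module at the terminus
    while True:
        out = channels[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])` passes each item through both nodes in order. A node starts work as soon as its first input arrives, so downstream nodes can process early items while upstream nodes are still handling later ones. Error handling is omitted: a node function that raises would stall this sketch.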

The created VPL, named iPipeline, is a general framework into which domain experts may place a library of analysis algorithms. By exchanging libraries, the VPL can be customised for data analysis in any problem domain. For development purposes, a library for the analysis of neuroscience data was created. The neuroscience problem addressed was the discovery of a neural network’s architecture from the analysis of simultaneously recorded neural spike train data. This problem was selected because it is a computationally challenging “big data” problem that can be solved efficiently by combining the disciplines of neuroscience, software engineering and information visualisation.
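The notion of an exchangeable algorithm library can be illustrated with a short Python sketch. It assumes a library is simply a mapping from node names to functions, so that swapping the mapping retargets the same framework code to another domain. The two toy operations, the name `NEUROSCIENCE_LIBRARY` and the function `build_pipeline` are hypothetical and are not the algorithms or API developed in this project.

```python
from typing import Callable, Dict, List

SpikeTrains = List[List[float]]          # one list of spike times per neuron
Library = Dict[str, Callable[[SpikeTrains], SpikeTrains]]


def drop_silent(trains: SpikeTrains) -> SpikeTrains:
    """Toy node: remove neurons that never fired."""
    return [t for t in trains if t]


def sort_by_spike_count(trains: SpikeTrains) -> SpikeTrains:
    """Toy node: order neurons from most to least active."""
    return sorted(trains, key=len, reverse=True)


# Hypothetical domain library; another domain would supply a different mapping.
NEUROSCIENCE_LIBRARY: Library = {
    "drop_silent": drop_silent,
    "sort_by_spike_count": sort_by_spike_count,
}


def build_pipeline(library: Library, node_names: List[str]):
    """Resolve node names against a library and chain the nodes in series."""
    missing = [name for name in node_names if name not in library]
    if missing:
        raise KeyError(f"nodes not in library: {missing}")
    steps = [library[name] for name in node_names]

    def pipeline(data):
        for step in steps:
            data = step(data)
        return data

    return pipeline
```

A pipeline built with `build_pipeline(NEUROSCIENCE_LIBRARY, ["drop_silent", "sort_by_spike_count"])` depends only on the names it is given, so the framework code stays unchanged when a different library is supplied.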

Summary

This chapter reviews the design of the iPipeline data processing environment including both the technical design and the visualisation choices made.

Chapter 7 Designing iPipeline & the neuroscience visual programming language

“pipeline noun. In computing, a pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one.”

Chapter 7: Design of iPipeline

