The Strategy - Analysis, Representation and Mapping of Neural Networks onto Parallel Hardware

In this thesis, the neural network execution strategy has been to achieve high-performance execution for a wide range of models by exploiting general-purpose parallel hardware platforms. Generality, flexibility and scalability have been further considerations. The main disadvantage of the general-purpose execution strategy is that general-purpose devices cannot match the level of performance provided by special-purpose devices [51,70,85,131]. Special-purpose neurocomputers, however, are often application- and algorithm-specific devices and are usually too expensive. The level of performance provided by general-purpose devices can be acceptable for most applications, and can be enhanced through parallelism and cascading. The general-purpose execution strategy shifts the complexity to the software, as it requires flexible representations and efficient mapping strategies which are capable of exploiting general-purpose hardware.

The Virtual Machine (VM) concept lies at the centre of the general-purpose execution strategy. This idea is not new; similar ideas have been put forward in the past [23,97]. The TRW Mark III, presented in chapter 2, was an early example of a general-purpose, parallel, scalable neurocomputer which pioneered this philosophy in neural computing. The Galatea VM typically consists of a communications unit and an execution unit, each specialised for separate tasks. The communications unit consists of a local memory unit and a CPU, and it is responsible for interfacing with the external environment, controlling the co-processor board and carrying out other calculations which would be too expensive to execute on the board. Execution units are compact, general-purpose neurocomputer, accelerator or co-processor boards. A number of VMs can be connected to a host machine, producing a general-purpose neural computer (GPNC). The VMs, or general-purpose neurocomputer units, are currently under development at Siemens and Philips. After their completion, an assessment of the Galatea GPNC will be necessary. The criteria for this assessment would be based on the following requirements, which are also the research objectives of this thesis: high performance, generality, parallelism, flexibility, scalability and modularity.

The Siemens and Philips VMs are general-purpose neurocomputers which are expected to yield high performance, targeting large real-world applications. Typical applications include computationally demanding vision tasks, and image recognition and processing. The Siemens-based VM, SYNAPSE-1, which will be a commercial product, can

at 4MBytes, and can be increased by upgrading the local RAM for the communications unit.

Two levels of parallelism are possible with the VM approach. The execution unit of each VM is a medium- to fine-grained parallel processor array. In addition, many VMs can be connected in parallel, providing coarse-grained parallelism. This second level of parallelism is the mapping domain on which this thesis has focused as part of the development of a general-purpose neural computer.

Two radically different philosophies are practised for the mapping and execution of neural networks. The first exploits the parallel distributed structure of networks: neural-oriented features such as layers, clusters, neurons and synapses are mapped and executed on parallel distributed hardware. The three neural network case studies in chapter 4 show that these networks favour different types of structural mapping due to the differences in their topological and computational properties. The structural mapping approach is not general or flexible, yet it is simple to understand and can deliver high performance on massively parallel hardware platforms. Strictly speaking, structural parallelism is data parallelism.

The second mapping approach is based on the high-performance execution of the computations involved in neural network simulations. This approach has been chosen in this thesis, as it is more general, flexible and cost-effective. The mapping strategy based on this computational mapping approach is to develop the mapper as an optimiser. The mapper's main task is then to optimise the use of hardware resources for efficient execution. The mapper-as-optimiser strategy is upgradable: any optimiser, including genetic algorithms and neural networks, can be used to optimise the mapping process. In fact, there have already been attempts to use neural networks as an optimiser in the mapping problem [134].

Central to the optimisation is the costing of the computational load, which has two aspects: the processing costs and the communications costs. Most of the parallel mapping effort has focused on the computational costing of the sequential and the potentially parallel executions. To demonstrate the approach, a linear computational model and a homogeneous processor architecture have been assumed for simulations on the SUN LAN. Heterogeneous hardware with nonlinear computational models would be more challenging, although the same principles apply.
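The two cost components described above can be illustrated with a minimal sketch. It is not the thesis's actual cost model: the class name `LinearCostModel`, its parameters (`time_per_op`, `time_per_word`, `startup`) and the even division of work among processors are all assumptions made for illustration. Under a linear model, both processing and communications costs grow in direct proportion to the number of operations and the amount of data transferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostModel:
    """Hypothetical linear cost model for identical (homogeneous) processors."""
    time_per_op: float      # assumed constant cost of one operation
    time_per_word: float    # assumed constant cost of sending one word
    startup: float = 0.0    # assumed fixed overhead per message

    def processing(self, ops: int) -> float:
        # Linear in the number of operations.
        return ops * self.time_per_op

    def communication(self, words: int, messages: int = 1) -> float:
        # Linear in the data volume, plus a fixed overhead per message.
        return messages * self.startup + words * self.time_per_word


def sequential_cost(model: LinearCostModel, ops: int) -> float:
    """Cost of running all operations on a single processor."""
    return model.processing(ops)


def parallel_cost(model: LinearCostModel, ops: int,
                  n_procs: int, words_per_proc: int) -> float:
    """Estimated cost when the work is split evenly over n_procs processors.

    Each processor executes its share (rounded up) and then exchanges
    words_per_proc words in one message.
    """
    share = -(-ops // n_procs)  # ceiling division
    return model.processing(share) + model.communication(words_per_proc)


model = LinearCostModel(time_per_op=1.0, time_per_word=2.0)
seq = sequential_cost(model, 100)
par = parallel_cost(model, 100, n_procs=4, words_per_proc=5)
print(seq, par)
```

Comparing `seq` with `par` for different values of `n_procs` is the kind of sequential-versus-parallel costing on which a mapper could base its decision. A nonlinear model or heterogeneous processors would require per-processor parameters in place of the single set assumed here.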

The mapping strategy of computational optimisation is generic, and it can also be applied to the structural mapping of networks. In that case, it would involve evaluating the processing and communications costs for all the objects of the system. Developing such a computational-optimiser object mapper would be facilitated by object-oriented languages. In fact, the computational look-up tables could be set up as parts of the object classes which are distributed by the mapper. The challenging task for the mapper would be to decide on the level of granularity for partitioning neural network object representations. To achieve this, the potential computational costs of a number of partitionings can be simulated and the partitioning with the minimum computational cost chosen. An additional degree of complexity in the mapping can be foreseen on heterogeneous architectures. The computational optimisation can then take increasingly long in proportion to the granularity of the system. Neural network or genetic algorithm based optimisers can then replace straightforward computational cost calculations.
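The selection step described above, simulating the cost of several candidate partitionings and keeping the cheapest, can be sketched as follows. This is an illustrative assumption rather than the mapper developed in the thesis: the function names, the representation of a partitioning as a tuple of processor indices, and the cost estimate (the load of the busiest processor plus the cost of every link that crosses processors) are all hypothetical.

```python
def estimate_cost(assignment, proc_cost, comm_cost):
    """Estimate the cost of one candidate partitioning.

    assignment[i] is the processor index given to object i.
    proc_cost[i]  is the processing cost of object i.
    comm_cost     maps a pair of object indices (a, b) to the cost of their
                  link, paid only when a and b sit on different processors.
    """
    loads = {}
    for obj, proc in enumerate(assignment):
        loads[proc] = loads.get(proc, 0.0) + proc_cost[obj]
    crossing = sum(cost for (a, b), cost in comm_cost.items()
                   if assignment[a] != assignment[b])
    # Processors run in parallel, so the busiest one bounds the processing time.
    return max(loads.values()) + crossing


def choose_partitioning(candidates, proc_cost, comm_cost):
    """Return the candidate with the minimum estimated cost."""
    return min(candidates, key=lambda a: estimate_cost(a, proc_cost, comm_cost))


# Three objects (for example, layers) with assumed costs.
proc_cost = [4.0, 4.0, 2.0]
comm_cost = {(0, 1): 1.0, (1, 2): 5.0}
candidates = [
    (0, 0, 0),  # coarsest: everything on one processor
    (0, 1, 1),  # objects 1 and 2 share a processor
    (0, 1, 2),  # finest: one object per processor
]
best = choose_partitioning(candidates, proc_cost, comm_cost)
print(best, estimate_cost(best, proc_cost, comm_cost))
```

In this example the intermediate granularity is selected, because the finest partitioning pays for the expensive link between objects 1 and 2. Enumerating candidates in this way becomes impractical as their number grows, which is where a genetic algorithm or neural network optimiser could take over the search.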
