gives some indication as to the power, in both speed and network capacity, that a variety of implementations can achieve. This diagram also includes conventional computers and some low-intelligence animals, so that comparisons can be made[2]. The values given in this diagram are for single devices only; the performance of many of these implementations can be increased by utilising arrays of these devices.

The rest of this chapter looks at the different architectures that have been proposed for neural co-processors and neural ASICs. After presenting these architectures, some analysis is carried out, with particular reference to the criteria laid down in Chapter 3. From this analysis it is possible to see that no single architecture, from those currently available, meets all these requirements.

Richard Palmer PhD Thesis

[Diagram 4.1: scatter plot of Speed (Connections/Sec) against Storage (Connections), both on logarithmic axes. Key: • Biological Brain, + Workstation, o Co-Processor, * Neural ASIC. Legible point labels include ETANN, Intelligent Memory, Hitachi Wafer, Bee, Fly, DNNA, DELTA, NETSIM, Transputer, Worm, SUN 3, PC/AT and Leech.]

Diagram 4.1 The Diversity of Neural Implementations

4.1.1 Neural Co-processors

Neural co-processors are a fairly mature technology with several products now being commercially available. Co-processors can be obtained for several hosts - PCs, SUNs and other UNIX workstations. They tend to be board based, and consist of memory, input/output buffers and fast computational units. The technology used in these co-processors ranges from digital signal processors to bit-slice technology, and even custom built devices. The parallelism incorporated in most co-processor boards is low: the speed is obtained by using extremely fast multipliers and pipelined architectures.


4.1.1.1 NETSIM - Texas Instruments

NETSIM[3] is a board-based co-processor built around two custom designed chips. These are a 'Solution Engine' and a communications chip.

The Solution Engine has just three instructions, each specially designed for neural operations. These are:

Repeat-Multiply-Sum

Read-Write

Repeat-Multiply-Sum-Write (Update Synaptic Value)

The speed obtainable from the Solution Engine is 4 x 10^6 interconnections per second or 1.3x10s learning interconnections per second.

NETSIM boards can be connected into arrays of either two or three dimensions. The communications chip performs all the necessary data transfers, at a rate of 10 Mbit per second. The theoretical maximum number of NETSIM boards that can be connected in one array is 27000, with up to 256 neural nodes per board.

A major feature of this design is the path it offers from development to eventual expansion into arrays of NETSIM boards. For development purposes a replica NETSIM board is inserted into a PC host. High-level languages can be used to develop and debug applications using this replica NETSIM board. Upon completion the software can be downloaded onto an array, with the host PC providing input and output as well as control over the array.

NETSIM provides a versatile implementation for neural networks. The speed achieved is adequate for many applications, but this may require complex and bulky arrays of NETSIM boards to be built.

4.1.1.2 DELTA Co-Processor - SAIC Corp.

The DELTA[4] co-processor provides a solution to a wide range of applications. It allows the full development and eventual production of systems built around DELTA processor boards and the DELTA operating system.

The development tools are standard across the range of products, and use the DELTA operating system and ANSim development tools. This facility enables the development of software to be carried out with a consistent interface. The range of processor boards available allows a variety of systems to be built: they can be used in conjunction with a host processor, or in a stand-alone environment. The stand-alone processors include A/D and D/A converters so real-time control systems can be built with minimal support hardware.

The DELTA processor architecture consists of a 4-stage pipelined Harvard architecture, providing maximum parallelism for the operations required:

Operand Fetch (4 Address Modes)

Multiply

Accumulate

Write Result
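The four stages listed above can be illustrated with a small software model. This is only a sketch under stated assumptions: the source does not give the DELTA's timing, so the model assumes each stage takes one clock cycle and that one weight/input pair enters the pipeline per cycle. The function name `pipelined_weighted_sum` and its latch variables are hypothetical.

```python
def pipelined_weighted_sum(weights, inputs):
    """Model a fetch -> multiply -> accumulate -> write pipeline.

    Assumes one cycle per stage (not stated in the source).
    Returns (weighted_sum, cycles_taken).
    """
    if len(weights) != len(inputs) or not weights:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(weights)
    fetched = None        # latch between operand fetch and multiply
    product = None        # latch between multiply and accumulate
    pending_write = None  # latch between accumulate and write result
    acc = 0.0             # accumulator register
    accumulated = 0       # number of products summed so far
    next_index = 0        # next operand pair to fetch
    result = None
    cycles = 0

    while result is None:
        cycles += 1
        # Stages are evaluated last-to-first so that each stage consumes
        # the value latched by its predecessor in the previous cycle.
        if pending_write is not None:          # stage 4: write result
            result = pending_write
            pending_write = None
        if product is not None:                # stage 3: accumulate
            acc += product
            accumulated += 1
            product = None
            if accumulated == n:
                pending_write = acc
        if fetched is not None:                # stage 2: multiply
            w, x = fetched
            product = w * x
            fetched = None
        if next_index < n:                     # stage 1: operand fetch
            fetched = (weights[next_index], inputs[next_index])
            next_index += 1

    return result, cycles


total, cycles = pipelined_weighted_sum([1, 2, 3], [4, 5, 6])
print(total, cycles)  # 32.0 6
```

In this model a sum over n connections takes n + 3 cycles rather than the 4n cycles an unpipelined unit would need, which is the sense in which pipelining, rather than wide parallelism, provides the speed.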

The multiplication and addition are performed by BIT's ECL floating point multiplier and ALU chips. The performance achieved is 10^7 interconnections per second with a capacity of 10^6 connections.
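To put the quoted speed and capacity figures in perspective, the following sketch works out how often a network that fills the board could be evaluated. It assumes, purely for illustration, that every stored connection is evaluated exactly once per pass and that the quoted rate is sustained; the function names are hypothetical.

```python
# Figures quoted in the text for a single DELTA board.
DELTA_RATE = 10**7      # interconnections per second
DELTA_CAPACITY = 10**6  # connections that can be stored


def passes_per_second(rate, connections):
    """Complete network evaluations per second at a given connection rate."""
    return rate / connections


def seconds_per_pass(rate, connections):
    """Time for one complete evaluation of all connections."""
    return connections / rate


print(passes_per_second(DELTA_RATE, DELTA_CAPACITY))  # 10.0
print(seconds_per_pass(DELTA_RATE, DELTA_CAPACITY))   # 0.1
```

Under these assumptions a full-capacity network is evaluated about ten times per second; smaller networks scale in inverse proportion to their connection count.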

Input and output is performed by FIFO Buffer links capable of 40MB/sec, allowing up to 32 devices to be connected using a shared bus.

The DELTA processor and ANSim development tools provide a commercial means of incorporating neural networks into many applications. By providing a range of compatible hardware and software products, a variety of solutions can be achieved from this system.

4.1.2 Neurocomputers and ASIC Neural Devices

The next generation of neural implementations is composed of neurocomputers using neural ASIC devices. These implementations can offer a 10^3-fold increase in performance over neural co-processors by utilising the latest technologies and by building specialised devices. This area of neural device is still undergoing considerable research, with very few commercial products on the market yet: most products now available are neural ASICs out of which neurocomputers can be made. Hitachi is one of the first companies to produce a prototype neurocomputer, but this will not be marketed for the next couple of years.

Below several neural ASICs are discussed. The key points of each design are mentioned, giving a quick review of each implementation, and not necessarily a full description of its operation. The examples chosen indicate the variety of possible solutions, and the different goals and objectives used by the respective research groups. After presenting these examples, assessments are made of the technologies used, and possible drawbacks to these architectures are highlighted.

4.1.2.1 ETANN - Intel Corporation.

Through the development of 'Floating Gate Non-Volatile Analogue Memories', Intel have been able to produce a true analogue neural device, the Electrically Trainable Artificial Neural Network (ETANN) [5]. The architecture consists of 64 neurons and two synaptic matrices of 4096 synapses each. The first matrix provides an input synapse array, and the second provides a feedback synapse array. This arrangement allows both Hopfield and two-layer feedforward networks to be implemented. Sample/hold units are provided to enable the analogue signals to be clocked through the circuitry.
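The way one input array and one feedback array can serve both network types can be sketched in software. This is an illustration only, not a description of the ETANN circuitry: it assumes each 4096-synapse matrix is laid out as 64 x 64, uses tanh as a stand-in for the neuron transfer function, and the names `step` and `two_layer_forward` are hypothetical.

```python
import math

N = 64  # neurons, per the text; 64 x 64 = 4096 synapses per matrix (assumed layout)


def step(w_in, w_fb, x, y_prev):
    """One clocked update: each neuron sums its input and feedback synapses."""
    out = []
    for j in range(N):
        s = sum(w_in[j][i] * x[i] for i in range(N))
        s += sum(w_fb[j][i] * y_prev[i] for i in range(N))
        out.append(math.tanh(s))  # tanh is an assumed transfer function
    return out


def two_layer_forward(w_in, w_fb, x):
    """Two-layer feedforward pass, assuming layer 1 uses the input array
    and layer 2 reuses the same neurons through the feedback array."""
    zeros = [0.0] * N
    hidden = step(w_in, w_fb, x, zeros)      # feedback contributes nothing
    return step(w_in, w_fb, zeros, hidden)   # external input held at zero


identity = [[1.0 if i == j else 0.0 for i in range(N)] for j in range(N)]
y = two_layer_forward(identity, identity, [0.5] * N)
print(len(y))  # 64
```

A Hopfield-style network would instead call `step` repeatedly, passing each output back in as `y_prev` until the outputs stop changing.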

Each synapse is composed of two EEPROM memory cells, an excitatory and an inhibitory cell. The difference between the floating gate voltages (Vfg) of the two EEPROM cells represents the weight's value. Diagram 4.2 shows a synaptic cell.

The operation of each synapse is as below:

Weight = (Vfg+ - Vfg-)

Input = (Vin+ - Vin-)

Output = (Vin+ - Vin-) * (Vfg+ - Vfg-) = (Iout+ - Iout-)
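The synapse relations above amount to a multiplication of two differential quantities, which a few lines of code make concrete. This is a numerical sketch only: the `gain` parameter, standing in for whatever volts-to-current scaling the real cell has, is hypothetical and not given in the source.

```python
def synapse_output(vin_pos, vin_neg, vfg_pos, vfg_neg, gain=1.0):
    """Differential output (Iout+ - Iout-) of one synapse.

    gain is a hypothetical scale factor; the source gives no units.
    """
    weight = vfg_pos - vfg_neg   # excitatory minus inhibitory floating gate
    signal = vin_pos - vin_neg   # differential input
    return gain * signal * weight


# Excitatory gate higher than inhibitory: positive weight.
print(synapse_output(1.0, 0.5, 3.0, 2.0))  # 0.5
# Inhibitory gate higher: same input, sign of the output flips.
print(synapse_output(1.0, 0.5, 2.0, 3.0))  # -0.5
```

Because both the input and the weight are differences, either can be positive or negative, so a single cell pair covers excitatory and inhibitory connections.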
