
Chapter 5 — Simulation of Neural Networks Hardware

5.5. Hardware Simulation

5.5.3. Simulation Results

This section reports the most important results obtained through exhaustive simulations of a Back Propagation neural network. Further details can be found in [130] and in Appendix C. The classical character recognition application is used to test the hardware performance based on the assumptions discussed throughout this chapter. These simulations also serve to demonstrate the effectiveness of the modifications and extensions made in the Pygmalion environment and in the nC language. In particular, the flexibility provided through the independent specification of hardware constraints allowed the individual effect of each constraint to be observed. The results are then compared with the theoretical analyses discussed in section 5.4.1.


Figure 5.5 shows the character recognition application used as a benchmark for the simulations. The Back Propagation neural network is configured with three layers of 96x24x96 artificial neurons, and is trained to recognise 10 numeric characters (0-9) defined over a matrix of 96 pixels (12x8). This problem is a simplified version of the OCR (Optical Character Recognition) application, in which the network is trained to read corrupted characters (or hand-written versions) and complete them to produce the correct pattern.

[Diagram labels: Back Propagation Network, Perfect Output]

Figure 5.5 — The Character Recognition Application

Recall Phase

It has been frequently mentioned in the literature that fewer bits are required to correctly execute the recall phase of a neural network. This suggests that applications not requiring learning can be implemented in hardware using lower precision, and therefore using less VLSI area. This option is very important, and the neural networks’ hardware simulator offers the user the flexibility required to determine and tune the best configuration of hardware parameters.

It has been observed (see details in Appendix C) that at least 8 bits of data (4 in the integer part and 4 in the decimal part) are required to correctly retrieve the ten patterns without employing any saturation or overflow/underflow mechanism. The minimum of 4 bits in the integer part guarantees that overflows and underflows do not occur, while 4 bits in the decimal part is the minimum precision needed to correctly retrieve all ten patterns. Reducing the number of bits in the integer part of the data makes overflows and underflows more frequent. When 2 bits are used, the network is no longer able to recognise all patterns. Appendix C gives details of the results obtained.
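The 4+4 bit format discussed above can be illustrated with a minimal sketch. It assumes a signed two's complement representation in which the sign bit is counted among the integer bits; the source does not state the exact encoding, and the function name `quantize` is hypothetical.

```python
def quantize(x, int_bits=4, frac_bits=4):
    """Round x to a signed fixed-point grid and report whether it fits.

    Assumption: two's complement, sign bit counted among int_bits.
    No saturation is applied; an out-of-range value is only flagged.
    """
    scale = 1 << frac_bits                        # steps per unit (16 for 4 bits)
    total_bits = int_bits + frac_bits
    lo = -(1 << (total_bits - 1))                 # smallest representable code
    hi = (1 << (total_bits - 1)) - 1              # largest representable code
    code = round(x * scale)                       # nearest grid point
    in_range = lo <= code <= hi
    return code / scale, in_range


value, ok = quantize(0.3)                         # 0.3125, True
_, fits_4 = quantize(3.0, int_bits=4)             # fits: range is [-8, 8)
_, fits_2 = quantize(3.0, int_bits=2)             # overflow: range is [-2, 2)
```

Under these assumptions, 4 bits in the decimal part give a resolution of 1/16, and shrinking the integer part narrows the range so that the same value no longer fits.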

The lookup table has less influence than the precision of the data. It has been found during these simulations that a lookup table of only 32 entries is enough to run the network free of any error, while an 8-entry table produced several overflows and underflows but was still able to correctly recognise all ten patterns.
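A table-based activation function of the kind described above might look like the following sketch. The input range of [-8, 8], the even spacing of entries and the nearest-entry indexing are all assumptions made for illustration; the source does not specify the indexing mechanism of the simulator.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_sigmoid_table(entries=32, x_min=-8.0, x_max=8.0):
    """Sample the logistic sigmoid at evenly spaced points (assumed layout)."""
    step = (x_max - x_min) / (entries - 1)
    return [sigmoid(x_min + i * step) for i in range(entries)]


def lookup_sigmoid(x, table, x_min=-8.0, x_max=8.0):
    """Nearest-entry lookup; inputs outside the range use the end entries."""
    x = min(max(x, x_min), x_max)
    idx = round((x - x_min) / (x_max - x_min) * (len(table) - 1))
    return table[idx]


def max_error(entries, samples=1601):
    """Largest deviation from the ideal sigmoid over a grid on [-8, 8]."""
    table = build_sigmoid_table(entries)
    xs = [-8.0 + 16.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lookup_sigmoid(x, table) - sigmoid(x)) for x in xs)


err_8, err_32 = max_error(8), max_error(32)
```

In this sketch the 32-entry table approximates the sigmoid considerably better than the 8-entry one, which is in line with the coarser table being the one that causes trouble.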


Learning Phase

The learning phase in a Back Propagation neural network involves the recall phase followed by the processing of Equations (2) and (3). Since this phase updates the weights as a function of the errors given in (3), very small values can be obtained when convergence is close. Therefore, low precision is expected to hinder the correct trajectory towards convergence.
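The way low precision can stall learning is easy to see in a small sketch. The weight, the update size and the rounding rule below are hypothetical values chosen for illustration and are not taken from the simulations.

```python
def fixed_update(weight, delta, frac_bits):
    """Add delta to weight and round the result to a grid of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round((weight + delta) * scale) / scale


w = 0.5
small_step = 0.001                           # hypothetical update near convergence

coarse = fixed_update(w, small_step, 7)      # resolution 1/128: update is lost
fine = fixed_update(w, small_step, 13)       # resolution 1/8192: update survives
```

With 7 bits in the decimal part the update is smaller than half a quantisation step and the weight does not move, whereas with 13 bits it does.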

The influence of the lookup table is shown in Figure 5.6. It can be seen that increasing the table size also increases the number of cycles required to reach convergence, since the trajectory now takes smaller steps towards the solution. A table with 256 entries has proved to be enough for this particular application. In addition, Figure 5.6 indicates that there is a minimum size for the lookup table beyond which no further improvement in the learning speed is achieved. It has also been observed that for an ideal sigmoid function (without using a lookup table) convergence is reached in 269 cycles.

[Plot: Number of Cycles against Lookup Table Size, with the table size ranging from 256 to 8448 entries]

Figure 5.6 — The Effect of the Lookup Table

The reason why the convergence is slower when using high precision calculation of the threshold table is that it enforces a learning trajectory that uses very fine steps through the solution, which generally requires more iterations.

The effect of varying the decimal and integer parts of the data is shown in Figure 5.7. By keeping the integer part of the representation fixed (6 bits in Figure 5.7a) and varying the decimal part, the simulations show the influence of e and η. These results are congruent with those obtained in the theoretical analysis (section 5.4.1) and summarised in Table 5.1. Note that a minimum of 7 bits is required for the decimal part of weights when e = 0.2 and η = 0.5, while a minimum of 10 bits is required when e = 0.1 and η = 0.1.

Similarly, Figure 5.7b shows the relationship between the learning phase and the integer part of weights (the decimal part is kept fixed at 13 bits, while the integer part is varied from 3 to 6 bits). A minimum of 3 bits is required to represent the integer part. However, when few bits (such as 3 and 4) are used, overflows and underflows occur frequently, thus activating the saturation mechanism. Nevertheless, the network is still able to learn the patterns successfully, although it requires more cycles to converge. From 5 bits onwards, no saturation mechanism is required, and performance is improved (see details of these results in Appendix C). Again, these results are very close to the ones obtained in section 5.4.1.

[Plot: panels (a) and (b), with curves labelled 0.1/0.1, 0.1/0.5, 0.2/0.1 and 0.2/0.5]

Figure 5.7 — Impact of Hardware Constraints During the Learning Phase
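The saturation mechanism mentioned for the 3-bit and 4-bit integer parts can be sketched as a clamped addition on integer codes. The two's complement layout, with the sign bit counted among the integer bits, and the helper names are assumptions; the operands are hypothetical.

```python
FRAC_BITS = 13                                   # decimal part used in Figure 5.7b


def to_code(x, frac_bits=FRAC_BITS):
    """Convert a real value to its integer fixed-point code."""
    return round(x * (1 << frac_bits))


def saturating_add(a, b, int_bits, frac_bits=FRAC_BITS):
    """Add two codes, clamping the sum to the representable range."""
    total_bits = int_bits + frac_bits
    hi = (1 << (total_bits - 1)) - 1
    lo = -(1 << (total_bits - 1))
    s = a + b
    saturated = s > hi or s < lo
    return min(max(s, lo), hi), saturated


a, b = to_code(3.0), to_code(2.5)                # hypothetical operands, sum 5.5
narrow = saturating_add(a, b, int_bits=3)        # range [-4, 4): clamps to the maximum
wide = saturating_add(a, b, int_bits=5)          # range [-16, 16): exact result
```

Under these assumptions the 3-bit integer part forces the sum to the largest representable value, while the 5-bit integer part holds it exactly, so no saturation is triggered.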

Some discrepancies between simulation results and the theoretical analysis discussed earlier are due to hardware constraints, which are not considered in the mathematical approach. The simulation shows that the implementation of the activation function through a lookup table plays an important role in the learning phase of the neural computation. Therefore, the table size and its indexing mechanism must be carefully designed.

Finally, it is interesting to compare hardware simulation with software simulation results for the same application, employing floating-point representation with no hardware constraints (Table 5.2). This comparison highlights the relevance of hardware constraints and shows the feasibility of using fixed-point arithmetic for the learning phase. In contrast to floating-point representation, experimental results show that with an appropriate choice of the number of bits, fixed-point notation reduces the required number of learning cycles. Furthermore, regarding the hardware implementation, fixed-point arithmetic leads to faster and smaller realisations.


e         0.1     0.1      0.2     0.2
η         0.1     0.5      0.1     0.5
Cycles    963     >2000    512     >2000

Table 5.2 — 32-bit Floating-Point Simulation Results

5.6. Summary

The importance of the hardware simulator, integrated in the Pygmalion environment, is twofold:

• Hardware experimentation: it provides a tool for testing different hardware constraints for each particular application and algorithm, so that its final performance can be better predicted; and

• Hardware parameter tuning: it permits the adjustment of the ASNNCs, so that the user can find an optimal hardware configuration in terms of area and speed. As an example, a design with on-chip learning should use a higher number of bits for weights and states than a design that is limited to the recall phase only. In the latter case, a higher level of integration is obtained, affording a larger number of PEs per integrated circuit. The NSC therefore uses this information to yield the neural chip in an optimal way.

Although theoretical studies have shown that a relation between the algorithm (Back Propagation) and the required precision can be expressed, the results are still very limited and dependent on the algorithm parameters. As an example, the introduction of a momentum term in the Back Propagation algorithm [161] invalidates the results obtained by Equations (6) and (7). Conversely, through simulations, the user has the flexibility to experiment with any application or algorithm and find out which hardware parameters are suitable for solving the particular problem.

An exhaustive simulation of the OCR application under several hardware constraints has provided encouraging results in terms of hardware performance for the Back Propagation neural network. The results obtained are consistent with theoretical studies [17] as well as practical experiments [22]. The success verified through simulations is particularly important to the Generic Neuron architecture, because all issues investigated in this chapter are in fact implemented by this architecture.

The next chapter introduces the NSC framework and gives a detailed description of the Generic Neuron architecture.

In document High Level Synthesis of Neural Network Chips (Page 95-100)