Number of Units Required - The Second Hidden Layer

4 Issues in Topology Determination

4.3 The Second Hidden Layer

4.3.1 Number of Units Required

Section 4.3.1.1 considers the number of units required to perform a separation for one output unit only. Only one output unit has been considered so far, but using many output units introduces extra complexities, which are considered briefly in section 4.3.1.2.

4.3.1.1 One Output Unit

This section illustrates an approach to realising the targets of the vertices of the 2HLI hypercube, which leads to a procedure for recommending the number of hidden units to be used in the second hidden layer.

The approach is based on the N-bit parity problem. To solve the parity problem, for each vertex of the hypercube, the output of the network must be 1 if there is an odd number of 1s in the co-ordinate of the vertex, and 0 otherwise. This problem is recognised by Rumelhart et al as being hard for neural networks, because "the most similar patterns (those which differ by a single bit) require different answers."33 Hertz et al show that the N-bit parity problem is realisable with N hidden units with threshold activation functions in a single layer.34 The positions of the hyperplanes for N = 3 are shown in figure 4.14. The first hyperplane to be placed goes through the midpoints of those edges of the hypercube which extend from the origin. Subsequent hyperplanes are placed parallel to the first until all vertices of the hypercube with different targets lie in different regions of input space. (Nilsson has the general result that any set of patterns can be realised by placing a number of parallel hyperplanes.)35

Figure 4.14 — Realisation of the 3 bit parity problem with 3 hyperplanes.
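The parallel-hyperplane realisation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact network given by Hertz et al: the unit weights, the thresholds k - 0.5, the alternating output weights and the names `step` and `parity_network` are all choices made here for the example.

```python
from itertools import product


def step(value, threshold):
    """Threshold activation: 1 if value exceeds threshold, else 0."""
    return 1 if value > threshold else 0


def parity_network(bits):
    """N-bit parity using N hidden threshold units in a single layer."""
    n = len(bits)
    total = sum(bits)  # every hidden unit uses weight 1 on every input
    # Hidden unit k (k = 1..N) fires when at least k inputs are 1. Its
    # hyperplane is sum(x) = k - 0.5, so all N hyperplanes are parallel and
    # the first one cuts the edges leaving the origin at their midpoints.
    hidden = [step(total, k - 0.5) for k in range(1, n + 1)]
    # Output weights alternate +1, -1, +1, ... so the weighted sum is 1 when
    # an odd number of hidden units fire and 0 when an even number fire.
    weighted = sum(h if k % 2 == 0 else -h for k, h in enumerate(hidden))
    return step(weighted, 0.5)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, parity_network(bits))
```

For N = 3 the three hidden units correspond to the three parallel hyperplanes of figure 4.14, each separating the vertices with k or more 1s from the rest.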

The parity problem is described as a "worst function" by Minsky and Papert, though they do not say that it is the worst case.36 For the purposes of developing this procedure, however, the N-bit parity problem will be assumed to be the hardest problem for a hypercube of dimensionality N. "Hardest" here is taken to mean that the problem requires the most hyperplanes to solve, and that the solution of all other problems on the hypercube requires at least 1 fewer hyperplane. This is shown empirically for the 3D case in figure 4.15.

The logic behind all other problems requiring at least 1 fewer hyperplane is that if, for example, the vertex at the origin in figure 4.14 were to change its target, the hyperplane used to distinguish it from the three adjacent vertices would no longer be required. Since the realisation of the 3 bit parity problem shown in figure 4.14 does not have to start at the origin, but can start from any vertex, 2 hyperplanes are sufficient for realisation of the targets if any vertex changes target.

33 Rumelhart et al, 1986, p. 334
34 Hertz et al, 1991, p. 131
35 Nilsson, 1965, p. 109

Figure 4.15 — All the possible different problems on the vertices of a cube. Equivalent problems can be achieved by rotating the hypercube or swapping the black and white targets. (a) 1 black, 7 white. (b) 2 black, 6 white. (c) 3 black, 5 white. (d) 4 black, 4 white. Note that only the 3-bit parity problem in (d)(v) requires 3 hyperplanes for realisation.

Let us assume that the N-bit parity problem is the hardest problem, with N hyperplanes required for solution, and that at most N - 1 hyperplanes are required to solve any other problem. The procedure for calculating the number of hidden units in the second layer is then simply to use the number of units required to realise the largest parity problem it is possible to get with the number of vertices that are used in the 1HLO hypercube. Thus, for the example in figure 4.12, seven vertices of the 1HLO hypercube are used, since there are seven regions in input space. The largest parity problem it is possible to generate on seven vertices is the 2-bit parity problem, which needs four vertices (the 3-bit problem would need eight), and hence 2 hidden units in the second hidden layer should suffice for any desired final output. This gives the following formula for the number of units in the second hidden layer, M, in terms of the number of regions R the hyperplanes realise:

$M = \lfloor \log_2 R \rfloor$ [4.10]

where $\lfloor x \rfloor$ is the largest integer not greater than x. R may be reduced by only considering those regions which lie in a given bounded region of input space.

With a low ratio of input units to units in the first hidden layer, the number of units in the second hidden layer is likely to be significantly lower than the number of units in the first hidden layer. For example, with 3 input units, and 100 hidden units in the first hidden layer, the maximum possible number of regions from [4.4] is 166 751. From [4.10] the maximum number of hidden units required in the second hidden layer is 17, since 2^17 = 131 072 ≤ 166 751 < 2^18 = 262 144.
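The calculation can be reproduced with a short Python sketch. Equation [4.4] is not shown in this section, so `max_regions` assumes it is the standard count of regions formed by hyperplanes in general position, which agrees with the region counts quoted in this section (166 751 and 794). The function names are illustrative only.

```python
from math import comb


def max_regions(num_hyperplanes, num_inputs):
    """Assumed form of [4.4]: sum of C(H, i) for i = 0..num_inputs."""
    return sum(comb(num_hyperplanes, i) for i in range(num_inputs + 1))


def second_layer_units(num_regions):
    """[4.10]: M = floor(log2 R), computed exactly with integer arithmetic."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    return num_regions.bit_length() - 1


if __name__ == "__main__":
    print(second_layer_units(7))  # figure 4.12 example: 7 regions -> 2
    regions = max_regions(100, 3)  # 3 inputs, 100 first-layer units
    print(regions, second_layer_units(regions))
```

Using `int.bit_length` avoids the floating-point rounding that `math.log2` could introduce when R is close to a power of two.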

4.3.1.2 Many Output Units

When there are many output units, each output unit may give its own classification to various regions of input space, which have been partitioned by the first hidden layer. This is an extra complication from the one output unit case. Here, each output unit will place its own, unique demands on the second hidden layer. Consider the case of a topology with two output units, two input units and three hidden units. An example of a possible partitioning of input space, and targets for the output units is shown in figure 4.16.

To cope with any possible set of targets for an output, it may be necessary for the second hidden layer to make each region available to each output unit, so that the output unit can give the desired classification to those regions. The simplest way to do this is to have a unit in the second hidden layer dedicated to each region. Each vertex of the 1HLO hypercube that is used is distinguished from the rest of the hypercube by a unit in the second hidden layer. This is shown in figure 4.17, for the example in figure 4.16. Using this method, all of the possible outputs are realisable for any number of output units.

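Legend for figure 4.16: a distinct shading is used for each of the four target combinations (Output 1, Output 2) = (0, 0), (0, 1), (1, 0) and (1, 1).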

Figure 4.16 — A possible partitioning of input space, indicated by the thin, black lines, and the targets for the two output units, indicated by the shading of the regions. The 1HLO hypercube is represented by the thick, shaded lines, with the vertices indicated by the white circles.
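The idea of dedicating one second-layer unit to each region can be sketched as follows. This is an illustrative construction under stated assumptions, not necessarily the weights used in the thesis: each detector unit uses weight +1 where its vertex has a 1 and -1 where it has a 0, and each output unit simply sums the detectors of the regions whose target is 1. The seven vertices and the targets below are hypothetical, since the actual values in figure 4.16 are not recoverable from the text.

```python
def step(value, threshold):
    """Threshold activation: 1 if value exceeds threshold, else 0."""
    return 1 if value > threshold else 0


def vertex_detector(vertex):
    """Weights and threshold of a unit that fires only at this binary vertex."""
    weights = [1 if bit == 1 else -1 for bit in vertex]
    # The weighted sum equals sum(vertex) at the vertex itself and is at
    # least 1 lower at every other vertex of the hypercube.
    return weights, sum(vertex) - 0.5


def build_network(used_vertices, targets):
    """targets[i] is the tuple of output targets for used_vertices[i]."""
    detectors = [vertex_detector(v) for v in used_vertices]
    num_outputs = len(targets[0])

    def network(first_layer_output):
        second_layer = [
            step(sum(w * h for w, h in zip(weights, first_layer_output)), threshold)
            for weights, threshold in detectors
        ]
        # Each output unit ORs together the detectors of its target-1 regions.
        return tuple(
            step(sum(unit * targets[i][o] for i, unit in enumerate(second_layer)), 0.5)
            for o in range(num_outputs)
        )

    return network


if __name__ == "__main__":
    # Hypothetical data: 7 used vertices of a 3-D 1HLO hypercube, 2 outputs.
    used = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0),
            (1, 0, 1), (1, 1, 0), (1, 1, 1)]
    targets = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1), (1, 0), (0, 0)]
    net = build_network(used, targets)
    for vertex, target in zip(used, targets):
        print(vertex, net(vertex), target)
```

Because exactly one detector fires for each used vertex, any assignment of targets to regions can be realised by choosing the output weights, whatever the number of output units.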

The disadvantage with this method is that the number of regions may be prohibitively large for relatively small numbers of units in the input and first hidden layer. For example, with 4 input units, and 12 hidden units in the first hidden layer, there are 794 possible regions, using equation [4.4]. This is rather a large number of units to use in the second hidden layer, and it is unlikely that all of the $2^{794}$ possible outputs would be needed! Another possibility is to apply the procedure in 4.3.1.1 once for each output unit. Thus, if there are O output units, the number of units in the second hidden layer should be $O \cdot M$, where M is calculated as per [4.10]. Rather than the 794 units in the second hidden layer indicated for the above example by the simple method, this method gives a very much smaller number: $\lfloor \log_2 794 \rfloor = 9$ units in the second layer for each output unit, giving a total of 36 units in the second hidden layer for four output units. Eighty-nine output units would be needed before this method exceeded the number recommended by the simple method.
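The comparison between the two methods can be checked with a small Python sketch. The function names are illustrative, and the figure of four output units is an assumption inferred from the quoted total of 36 units (4 × 9).

```python
def units_simple_method(num_regions):
    """One second-layer unit per region of input space."""
    return num_regions


def units_per_output_method(num_regions, num_outputs):
    """O * M, with M = floor(log2 R) as in [4.10]."""
    m = num_regions.bit_length() - 1
    return num_outputs * m


def break_even_outputs(num_regions):
    """Smallest number of outputs for which O * M exceeds R (needs R >= 2)."""
    m = num_regions.bit_length() - 1
    if m < 1:
        raise ValueError("num_regions must be at least 2")
    return num_regions // m + 1


if __name__ == "__main__":
    regions = 794  # 4 input units, 12 first-layer units
    print(units_simple_method(regions))         # 794
    print(units_per_output_method(regions, 4))  # 36, assuming four outputs
    print(break_even_outputs(regions))          # 89
```

With 88 output units the per-output method needs 792 units, still below 794; with 89 it needs 801, which is where it overtakes the simple method.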


Figure 4.17 — Using 7 units in the second hidden layer to represent each region in input space, from figure 4.16.

In document Guaranteeing generalisation in neural networks (Page 161-166)