4 Issues in Topology Determination
4.2 The First Hidden Layer
4.2.1 Regions and the Requirement for a Second Hidden Layer
Each unit in the first hidden layer bisects input space into two halves: one where the unit is on (i.e. has output 1), and the other where the unit is off (i.e. has output 0). The linear boundary in input space between the two halves is termed the hyperplane of the unit. For all points on the hyperplane, the excitation of the unit is zero.
The position of the hyperplane, and the side of the hyperplane on which the unit is active, are determined by the ratios and the signs of the weights. For example, in 2D input space (with axes x and y), a desired line of division of input space might be y = -2x + 3. If w_x is the weight of the unit to input x, w_y is the weight of the unit to input y, and w_b is the bias weight to the unit, the line of zero excitation is given by w_x x + w_y y + w_b = 0, which rearranges to y = -(w_x/w_y) x - w_b/w_y. There is therefore a family of weights that implements the desired line of division, in which w_x = 2w_y and w_b = -3w_y. If the unit is to be active when y > -2x + 3, then w_y should have a positive value, and for y < -2x + 3, w_y should have a negative value.
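As an illustration of the family of weights described above, the following Python sketch builds the weights for a line y = mx + c and checks on which side the unit is on. The function names (`unit_weights`, `excitation`, `is_on`) and the `scale` parameter are illustrative assumptions, not taken from the thesis, and the unit is assumed to be on when its excitation is strictly positive.

```python
def unit_weights(slope, intercept, active_above=True, scale=1.0):
    """Return (w_x, w_y, w_b) for a unit whose hyperplane is y = slope*x + intercept.

    `scale` (assumed positive) picks one member of the family of weights;
    the sign of w_y decides on which side of the line the unit is on.
    """
    w_y = scale if active_above else -scale
    w_x = -slope * w_y       # for y = -2x + 3 this gives w_x = 2*w_y
    w_b = -intercept * w_y   # for y = -2x + 3 this gives w_b = -3*w_y
    return w_x, w_y, w_b


def excitation(weights, x, y):
    """Weighted sum of the inputs plus the bias weight."""
    w_x, w_y, w_b = weights
    return w_x * x + w_y * y + w_b


def is_on(weights, x, y):
    """Assumed threshold convention: on when the excitation is strictly positive."""
    return excitation(weights, x, y) > 0


above = unit_weights(-2.0, 3.0, active_above=True)
below = unit_weights(-2.0, 3.0, active_above=False)
print(above)                    # weights with w_x = 2*w_y and w_b = -3*w_y
print(is_on(above, 2.0, 2.0))   # point with y > -2x + 3
print(is_on(below, 0.0, 0.0))   # point with y < -2x + 3
```

Any positive `scale` gives the same hyperplane and the same active side; only the ratios and signs of the weights matter.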
When there are many units, each of which gives a different bisection of input space, there are many regions with linear edges. Each region is bounded by some or all of the hyperplanes, and is uniquely identified by the activity of the units in the first hidden layer. This is illustrated for a simple example in figure 4.1.
From figure 4.1, notice that there are seven regions formed. With three units, however, there are eight possible patterns of activity from the units in the first hidden layer. Clearly, not all of these eight possibilities are realisable. The unrealisable patterns of activity are termed virtual cells by Makhoul et al.14 Mirchandani and Cao15 and Makhoul et al16 both give the formula for the maximum number of regions, R_max, that can be formed by N hyperplanes in an input space of d dimensions:
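\[
R_{\max} =
\begin{cases}
2^{N}, & N \le d \\
\sum_{j=0}^{d} \binom{N}{j}, & N > d
\end{cases}
\tag{4.4}
\]
where
\[
\binom{n}{r} = \frac{n!}{r!\,(n-r)!}
\]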
This means that, given a certain desired number of regions, R, a lower bound for the number of hyperplanes required to implement R regions can be obtained from [4.4] by taking the smallest N for which R_max ≥ R. This is only a lower bound: the given number of hyperplanes need not realise the maximum number of regions. Figure 4.3 gives an example of this.
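The use of [4.4] as a lower bound can be sketched in Python as follows. The function names `max_regions` and `min_hyperplanes` are illustrative assumptions; the sketch evaluates the formula for the maximum number of regions and searches for the smallest number of hyperplanes whose maximum reaches the desired number of regions.

```python
from math import comb


def max_regions(n_hyperplanes, dim):
    """Maximum number of regions formed by N hyperplanes in d dimensions, as in [4.4]."""
    if n_hyperplanes <= dim:
        return 2 ** n_hyperplanes
    return sum(comb(n_hyperplanes, j) for j in range(dim + 1))


def min_hyperplanes(n_regions, dim):
    """Smallest N whose maximum number of regions is at least n_regions.

    This is only a lower bound: N hyperplanes need not realise the maximum.
    """
    if dim < 1:
        raise ValueError("dim must be at least 1")
    n = 0
    while max_regions(n, dim) < n_regions:
        n += 1
    return n


print(max_regions(3, 2))      # three lines in the plane
print(min_hyperplanes(8, 2))  # lines needed before eight regions are possible
```

For three lines in 2D the formula gives seven regions, which agrees with the count of regions noted for three units, so eight regions in 2D need at least four lines.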
Figure 4.3 — (a) Three hyperplanes arranged to realise the maximum 7
regions, (b) Three hyperplanes which realise only 6 regions.
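The difference between the two arrangements can be checked numerically by counting the distinct patterns of activity found over a grid of input points. The sketch below is an illustration only: the particular lines (x = 0, y = 0, x + y = 1 for the general arrangement, and x = 0, x = 1, y = 0 for an arrangement with two parallel lines) are assumed examples rather than the ones drawn in figure 4.3, points lying on a hyperplane are treated as off, and a finite grid can miss regions smaller than its spacing.

```python
from itertools import product


def activity_pattern(units, x, y):
    """Tuple of 0/1 outputs of the units, each given as (w_x, w_y, w_b), at point (x, y)."""
    return tuple(int(w_x * x + w_y * y + w_b > 0) for (w_x, w_y, w_b) in units)


def realised_patterns(units, lo=-4.0, hi=4.0, steps=33):
    """Set of activity patterns observed on a steps-by-steps grid over [lo, hi]^2."""
    spacing = (hi - lo) / (steps - 1)
    grid = [lo + i * spacing for i in range(steps)]
    return {activity_pattern(units, x, y) for x, y in product(grid, grid)}


# Assumed example arrangements, one unit per line, as (w_x, w_y, w_b).
general = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, -1.0)]   # x = 0, y = 0, x + y = 1
parallel = [(1.0, 0.0, 0.0), (1.0, 0.0, -1.0), (0.0, 1.0, 0.0)]  # x = 0, x = 1, y = 0

all_patterns = set(product((0, 1), repeat=3))
print(len(realised_patterns(general)))            # regions found for the general arrangement
print(len(realised_patterns(parallel)))           # regions found with two parallel lines
print(all_patterns - realised_patterns(general))  # patterns with no region (virtual cells)
```

With these assumed lines, the only pattern never observed for the general arrangement is the one requiring x < 0, y < 0 and x + y > 1 simultaneously, which no input point can satisfy.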
The requirement for a second hidden layer, for a given number of first hidden units, is indicated when the mapping from the outputs of the first hidden layer to the targets of the regions they delineate is linearly inseparable. The simplest example of this for 2D input is the "four-quadrant dichotomy in 2D"17, or the "real-valued XOR"18, which will be called the chequer-board problem19 in this thesis, and is illustrated in figure 4.4. It is an infinite training problem, in which input space is divided into four portions, and no two adjacent portions may have equal output. The four regions may be formed using two hidden units in a single hidden layer, but the targets of these regions for the chequer-board problem require the solution of XOR from the outputs of the hidden layer. The XOR problem is not linearly separable (by an output unit), and hence a second hidden layer is required for an exact realisation.

14 Makhoul et al, 1989, pp. 458-459
15 Mirchandani & Cao, 1989, p. 661
16 Makhoul et al, 1989, p. 456
Figure 4.4 — (a) The chequer-board problem. The problem is, in fact, unbounded, despite the square drawn round it. (b) A topology for attempting to solve the problem in (a), (c) Placing the hyperplanes on the boundaries of the regions, (d) The output unit cannot meet the requirements for the targets of the regions, since XOR must be solved, and hence this topology cannot realise the chequer-board problem.
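The linear inseparability of XOR at the output unit can be made explicit with a short derivation. The symbols v_1, v_2 and v_b (the weights of the output unit to the two hidden units, and its bias weight) are introduced here for illustration, as is the assumption that the output unit is on when its excitation is strictly positive and that the regions with exactly one hidden unit on have target 1.

```latex
% Output unit: on when v_1 h_1 + v_2 h_2 + v_b > 0 (v_1, v_2, v_b are assumed names).
\begin{align*}
(h_1,h_2)=(0,0),\ \text{target } 0 &: \quad v_b \le 0 \\
(h_1,h_2)=(1,1),\ \text{target } 0 &: \quad v_1 + v_2 + v_b \le 0 \\
(h_1,h_2)=(0,1),\ \text{target } 1 &: \quad v_2 + v_b > 0 \\
(h_1,h_2)=(1,0),\ \text{target } 1 &: \quad v_1 + v_b > 0
\end{align*}
% Adding the first two conditions: v_1 + v_2 + 2 v_b \le 0.
% Adding the last two conditions:  v_1 + v_2 + 2 v_b > 0.
```

The two sums contradict each other, so no choice of output weights meets all four targets. The same argument applies, with the inequalities reversed, if the targets of the regions are exchanged.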
The case of the chequer-board might seem to be counter to the theory of authors such as Hornik et al, that only one hidden layer is necessary to realise a given problem. There is a deterministic relation between input and output, and one might think that there are enough hidden units to realise the targets of the regions.
[Figure 4.4, panels (c) and (d): panel (d) tabulates the outputs of Unit 1 and Unit 2 against the Target for each region.]

17 Cosnard et al, 1993, p. 2293
18 Makhoul et al, 1989, p. 459
19 Weir, 1993
This is an example of what may be termed partitioning, contrasted with separation. Here, partitioning is taken to mean that the hyperplanes have been placed along all borders between regions of opposite class. Separation implies that the output unit assigns the correct class to each region. This nomenclature applies henceforth. Two hidden units in a single hidden layer are sufficient to partition the chequer-board, but are not sufficient to separate it, because of the XOR problem at the first hidden layer. A second hidden layer is required for separation. In general, a single hidden layer may be sufficient for separation, though not always.
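The distinction between partitioning and separation can be illustrated with a small threshold network in Python. This is a minimal sketch under stated assumptions, not the topology of the thesis: the hyperplanes are assumed to lie on the axes x = 0 and y = 0, the quadrants in which exactly one first-layer unit is on are assumed to have target 1, and the second hidden layer is assumed to hold one OR-like and one AND-like unit with hand-picked weights.

```python
def step(excitation):
    """Threshold unit: output 1 when the excitation is strictly positive, else 0."""
    return 1 if excitation > 0 else 0


def first_hidden(x, y):
    """Two units whose hyperplanes partition the plane into four quadrants."""
    return step(x), step(y)


def second_hidden(h1, h2):
    """Assumed second hidden layer: one unit acts as OR, the other as AND."""
    either = step(h1 + h2 - 0.5)
    both = step(h1 + h2 - 1.5)
    return either, both


def output_unit(either, both):
    """On when at least one, but not both, of the first-layer units is on."""
    return step(either - both - 0.5)


def chequer_board(x, y):
    """Class assigned to input (x, y) by the two-hidden-layer network."""
    return output_unit(*second_hidden(*first_hidden(x, y)))


for point in [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]:
    print(point, first_hidden(*point), chequer_board(*point))
```

The first hidden layer alone gives each quadrant its own pattern of activity, which is partitioning; the second hidden layer is what allows the output unit to assign alternating classes to adjacent quadrants, which is separation.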
However, the chequer-board can be approximated, to an arbitrary degree of accuracy, without using a further hidden layer. This will be illustrated in section 4.2.2.2. It remains to be shown that no single hidden layer topology can exactly realise the chequer-board.