Nets with Multiple Output Units - Tangent Hyperplanes as Linear Approximations

3. Tangent Hyperplanes

3.3. Tangent Hyperplanes as Linear Approximations

3.3.2. Nets with Multiple Output Units

In the case of a net with r output units the solution manifold for a pattern is defined by the system of equations

$$
\begin{aligned}
N_1(W, I_i) &= t_{i1} \\
N_2(W, I_i) &= t_{i2} \\
&\;\;\vdots \\
N_r(W, I_i) &= t_{ir}
\end{aligned}
\tag{3.3.14}
$$

where $N_j$ is the net's I-O function for output unit j, $I_i$ is the input vector of pattern i, W is a weight state and $t_{ij}$ is the target for pattern i and output unit j.

Each $N_j$ defines the output of an output unit based on a weight state and an input pattern. As mentioned before for the single output unit case, the input pattern is not considered to be a variable since it is fixed during training. Each $N_j$ is therefore a function of the weights only. If there are k weights in the net, then the set of weight states that satisfies the corresponding equation in (3.3.14) is a surface with (k - 1) dimensions.

Each of these surfaces can be seen as a solution manifold for a particular output unit in the net. To be considered a solution, a weight state has to produce correct outputs for all output units. Hence, the global solution manifold for a particular pattern is the intersection of the solution manifolds of all the output units; therefore, in the general case, it has (k - r) dimensions.
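The dimension count above can be checked numerically in the linearised setting. The following is only a sketch under stated assumptions: each output unit's manifold is replaced by a hyperplane, the normals are random rather than taken from a real net, and `intersection_dimension` is a hypothetical helper name. It shows that r hyperplanes with linearly independent normals in a k-dimensional weight space intersect in a (k - r)-dimensional set, and that the count changes when the normals are dependent, which is why the text restricts the claim to the general case.

```python
import numpy as np


def intersection_dimension(normals: np.ndarray) -> int:
    """Dimension of the intersection of the hyperplanes whose normals are the
    rows of `normals`, assuming the hyperplanes do intersect at all."""
    k = normals.shape[1]
    return k - int(np.linalg.matrix_rank(normals))


rng = np.random.default_rng(0)
k, r = 8, 2  # assumed sizes: 8 weights, 2 output units

# General case: r independent normals give a (k - r)-dimensional intersection.
general = rng.normal(size=(r, k))
print(intersection_dimension(general))  # 6

# Degenerate case: parallel normals remove only one dimension.
degenerate = np.vstack([general[0], 2.0 * general[0]])
print(intersection_dimension(degenerate))  # 7
```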

The approach taken here to compute a linear approximation to the global solution manifold of a pattern, as described in (3.3.14), is to compute a linear approximation to each of the output units' solution manifolds. The system of linear approximations obtained for a pattern is an approximation to the system in (3.3.14).

For each pattern such a linear system can be computed. The set of systems of linear approximations obtained is an approximation to the set of systems of equations found for the set of patterns.


figure 3.6 - the concept of a subnet

[Figure: a) a net with an input layer, a hidden layer and an output layer with output units Out 1 and Out 2; b) the subnet for output 1, in which the neuron of Out 2 and the weights connecting to it are ignored.]

In figure 3.6 the subnet concept is introduced. A subnet in the scope of this work is a part of a net in which only one output unit is present. For any particular net there are r subnets, where r is the number of output units. Figure 3.6.b) shows a subnet of the net in 3.6.a) where all output units but output unit 1 are ignored. The equations in (3.3.14) each relate to a subnet; in particular, the solution manifolds for each output unit correspond to the solution manifolds of the subnets. This is because $N_i$ defines the output of the subnet which contains output unit i.

For each subnet a linear approximation to a pattern's solution manifold can be computed. To find the solution manifold of a subnet for a pattern, the pattern's steepest gradient direction is computed for that subnet, i.e. all output units in the net that do not belong to the subnet have no influence on the gradient's computation. The direction found is used in conjunction with line search to find the subnet solution manifold, and a tangent hyperplane to it can then be computed as described in §3.3.1.
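The procedure for one subnet and one pattern can be sketched as follows. This is an illustration under several assumptions that are not in the original text: a 2-2-2 net without biases and with sigmoid units (chosen so that it has 8 weights, 6 of them in each subnet), a finite-difference gradient, and a bracketing-plus-bisection line search; §3.3.1 may use a different search and a different way of building the hyperplane. All function names are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def net_output(w, x, j):
    """Output of unit j (0-based) of a hypothetical 2-2-2 net without biases.

    Assumed weight layout: w[0:4] input-to-hidden, w[4:6] hidden-to-output 1,
    w[6:8] hidden-to-output 2.
    """
    hidden = sigmoid(w[0:4].reshape(2, 2) @ x)
    return sigmoid(w[4 + 2 * j : 6 + 2 * j] @ hidden)


def subnet_gradient(w, x, j, eps=1e-6):
    """Central-difference gradient of output j with respect to all weights.

    Weights outside subnet j do not affect output j, so their entries are zero.
    """
    grad = np.zeros_like(w)
    for n in range(w.size):
        step = np.zeros_like(w)
        step[n] = eps
        grad[n] = (net_output(w + step, x, j) - net_output(w - step, x, j)) / (2 * eps)
    return grad


def point_on_subnet_manifold(w, x, j, target, max_doublings=40, bisections=80):
    """Line search from w along the subnet's gradient direction until output j
    equals the target."""
    start = net_output(w, x, j) - target
    if start == 0.0:
        return w.copy()
    grad = subnet_gradient(w, x, j)
    # Move uphill if the output is below the target, downhill otherwise.
    direction = -np.sign(start) * grad / np.linalg.norm(grad)
    residual = lambda a: net_output(w + a * direction, x, j) - target
    lo, hi = 0.0, 1.0
    for _ in range(max_doublings):  # grow the step until the target is crossed
        if np.sign(residual(hi)) != np.sign(start):
            break
        hi *= 2.0
    else:
        raise RuntimeError("no crossing of the target found along this direction")
    for _ in range(bisections):  # shrink the bracket around the crossing
        mid = 0.5 * (lo + hi)
        if np.sign(residual(mid)) == np.sign(start):
            lo = mid
        else:
            hi = mid
    return w + 0.5 * (lo + hi) * direction


def tangent_hyperplane(w_star, x, j):
    """Unit normal and offset of the hyperplane {W : normal . W = offset}
    tangent to subnet j's solution manifold at w_star."""
    grad = subnet_gradient(w_star, x, j)
    normal = grad / np.linalg.norm(grad)
    return normal, float(normal @ w_star)


w0 = np.array([0.1, -0.2, 0.3, 0.2, 0.2, 0.1, -0.3, 0.4])  # arbitrary start
x = np.array([1.0, 0.5])                                    # arbitrary pattern
w_star = point_on_subnet_manifold(w0, x, j=0, target=0.9)
normal, offset = tangent_hyperplane(w_star, x, j=0)
print(normal)  # the last two entries (weights 7 and 8) are zero
```

Because the search direction has zero components for the weights outside the subnet, those weights are left unchanged by the line search, which is one way of reading the statement that the other output units have no influence.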

For each pattern, then, there is one tangent hyperplane per output unit. The system of r hyperplanes, r being the number of output units, is the linear approximation to the net's global solution manifold described in (3.3.14) for the pattern.

In matrix NA in equation (3.3.12) each row represents a hyperplane. Hence r * p rows (r stands for the number of output units and p for the number of patterns) will be needed in the general case of a feed-forward net with multiple output units. The same principle applies to vector NB in (3.3.12).

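In figure 3.6 a net with two output units is shown. If, for example, the training set has only two patterns, then matrix NA is as defined in equation (3.3.15).

$$
\mathit{NA} =
\begin{bmatrix}
\mathit{NA}_{11} \\
\mathit{NA}_{12} \\
\mathit{NA}_{21} \\
\mathit{NA}_{22}
\end{bmatrix}
\tag{3.3.15}
$$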

where $\mathit{NA}_{ij}$ stands for the normalised gradient vector computed for pattern i and output unit j assuming that all other output units have zero error, i.e. the coefficients of the gradient vector corresponding to the weights connecting to those output units are zero. For instance, $\mathit{NA}_{11}$ is defined as

$$
\begin{bmatrix}
na_{111} & na_{112} & na_{113} & na_{114} & na_{115} & na_{116} & 0.0 & 0.0
\end{bmatrix}
\tag{3.3.16}
$$

where $na_{11j}$ stands for the component, for weight j in figure 3.6, of the gradient vector computed for pattern 1 in the subnet containing output unit 1. The zeros correspond to the weights that are not part of the subnet; in this case weights 7 and 8 are not part of the subnet in figure 3.6.b).
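The row layout of NA for two patterns and two output units can be illustrated with a short sketch. The gradients here are random placeholders rather than values from a trained net, the subnet masks assume that weights 5 and 6 feed output unit 1 and weights 7 and 8 feed output unit 2, and `build_na` is a hypothetical helper; only the stacking order and the zero-padding follow the text.

```python
import numpy as np


def build_na(gradients, subnet_masks):
    """Stack one normalised, zero-padded gradient row per (pattern, output unit).

    gradients[i][j]: length-k gradient for pattern i and output unit j.
    subnet_masks[j]: boolean length-k array, True for weights in subnet j.
    Rows are ordered NA_11, NA_12, NA_21, NA_22, ... (pattern-major).
    """
    rows = []
    for pattern_grads in gradients:
        for j, grad in enumerate(pattern_grads):
            row = np.where(subnet_masks[j], grad, 0.0)  # zero weights outside subnet j
            rows.append(row / np.linalg.norm(row))      # normalise the row
    return np.vstack(rows)


k, r, p = 8, 2, 2  # assumed: 8 weights, 2 output units, 2 patterns
subnet_masks = [
    np.array([1, 1, 1, 1, 1, 1, 0, 0], dtype=bool),  # subnet of output unit 1
    np.array([1, 1, 1, 1, 0, 0, 1, 1], dtype=bool),  # subnet of output unit 2
]
rng = np.random.default_rng(1)
gradients = rng.normal(size=(p, r, k))  # placeholder gradient values

NA = build_na(gradients, subnet_masks)
print(NA.shape)  # (4, 8), i.e. r * p rows and k columns
```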

In document Robustness and generalisation : tangent hyperplanes and classification trees (Page 96-99)