5.4.1 Lower-Order Computational Signatures

In the previous chapter, we introduced two measures for analyzing the computational strategies evolved through neural diversity. In this section, we classify and describe these signatures, along with other variants of lower-order signatures.

5.4.1.1 Transfer Function Likelihood

The transfer function likelihood (or likelihood of occurrence, as described in the previous chapter) captures the likelihood of each transfer function appearing in the best neural network model, either before or after training. A high likelihood of a given transfer function in the best model can be regarded as an indicator that this transfer function provides access to a region of the search space likely to contain the most appropriate hypothesis. In other words, such transfer functions may be better suited to the given problem.

This likelihood was defined as the number of times a transfer function appears in the best model, normalized by the total number of possible transfer functions, as in Eq. 4.1 on page 62. This gives a relative likelihood of the transfer function appearing in the best models, $E[f_a(g_b(\cdot))] = l_{a,b}$. Since transfer functions are the components responsible for forming decision boundaries, we expect the likelihoods of the transfer functions for a given dataset D to give some hints about the nature of the problem. They also hint at the underlying computational strategy ($h_i(x)$) adopted by the best neural network model, which could then be used, among other things, to refine the choice of transfer functions.

We consider two ways of capturing this information. The first approach obtains the likelihood of the activation function, or alternatively of the output function, irrespective of the combination in which it appears. For example, the likelihood of an activation function g(.) such as the Euclidean distance is captured from the neural network model without regard to the output function f(.) it is combined with. We refer to this as disjoint signature extraction, and denote it the first-order problem signature of the transfer function likelihood, because the measure captures statistics of the decoupled components of the transfer function.
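The disjoint (first-order) extraction can be sketched as follows. This is a minimal sketch, and it makes several assumptions. Each best model is represented as a hypothetical list of (activation, output) name pairs, one per hidden unit. The function name `first_order_likelihood` is illustrative. The counts are normalized to relative frequencies, and this may differ from the exact normalization of Eq. 4.1.

```python
from collections import Counter


def first_order_likelihood(best_models):
    """Marginal (disjoint) likelihoods of activation and output functions.

    best_models: list of models; each model is a list of (activation, output)
    name pairs, one per hidden unit (a hypothetical data layout).
    Counts are normalized to relative frequencies, which is an assumption.
    """
    act_counts = Counter()
    out_counts = Counter()
    for model in best_models:
        for activation, output in model:
            # Each component is counted independently of its partner.
            act_counts[activation] += 1
            out_counts[output] += 1
    total = sum(act_counts.values())
    act_lik = {g: c / total for g, c in act_counts.items()}
    out_lik = {f: c / total for f, c in out_counts.items()}
    return act_lik, out_lik
```

As an example, consider two best models with hidden units `[("euclidean", "gaussian"), ("dot", "sigmoid")]` and `[("euclidean", "sigmoid")]`. The Euclidean activation receives a likelihood of 2/3 whatever output function it was paired with.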

The second approach captures the likelihood of transfer functions as a unit, that is, the likelihood of every possible combination of activation and output functions. We refer to this as joint signature extraction, and classify it as the second-order problem signature of the transfer function likelihood. It is recorded in a matrix whose x-axis represents the output functions and whose y-axis represents the activation functions. The likelihood of the transfer function that combines activation function g(.) with output function f(.) is therefore the matrix entry at index (i(g(.)), i(f(.))), where i(.) denotes the position of a function along its axis (see Fig. 5.1).

Figure 5.1: The transfer function likelihood visualization for the Iris problem, showing the relative expectations of the transfer functions to be used in elite models. The intensity represents the degree of usage, with dark and light signifying heavy and light usage, respectively.
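The joint (second-order) matrix can be built in the same spirit. This is a minimal sketch under the same assumed data layout and relative-frequency normalization. Rows index activation functions and columns index output functions, matching the y-axis/x-axis layout described above. The name `second_order_likelihood` and the function-name lists are illustrative.

```python
import numpy as np


def second_order_likelihood(best_models, activations, outputs):
    """Joint likelihood matrix L[i(g), i(f)] over (activation, output) pairs.

    Rows index activation functions g(.), columns index output functions f(.).
    Normalizing to relative frequencies is an assumption.
    """
    a_idx = {g: i for i, g in enumerate(activations)}  # i(g(.))
    o_idx = {f: j for j, f in enumerate(outputs)}      # i(f(.))
    counts = np.zeros((len(activations), len(outputs)))
    for model in best_models:
        for g, f in model:
            counts[a_idx[g], o_idx[f]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Under this normalization, summing each row of the matrix recovers the first-order activation likelihoods, and summing each column recovers the output likelihoods. The joint signature therefore carries strictly more information than the disjoint one.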

5.4.1.2 Associated Error

While the transfer function likelihood captures how often transfer functions are used, the associated error captures a different aspect: the error associated with a transfer function. It is obtained by back-propagating the error of the output unit on a pattern $x_i$ to each hidden unit. The associated error $e'$ for each hidden unit is then calculated as in Eq. 4.2 on page 63 and attributed to the transfer function of that hidden unit. A network with multiple hidden layers would require a different approach, since the errors of neurons in deeper layers must be obtained by differentiating and propagating the error backward through each layer. In our case, however, the neural network had a single hidden layer, i.e. two layers of weights.

The associated error is averaged over several sampling runs to obtain an estimate of the expected error for each transfer function.

Like the transfer function likelihood, the associated error has both a disjoint first-order and a joint second-order problem signature.
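Both signatures of the associated error can be obtained from the same per-unit records by grouping them differently. This is a minimal sketch under assumptions. It uses a hypothetical record layout of (activation, output, error) triples collected across patterns and sampling runs, and the name `average_associated_error` is illustrative. The sketch averages the raw signed values; averaging magnitudes instead would avoid positive and negative errors cancelling, and which choice matches the original method is not stated here.

```python
from collections import defaultdict


def average_associated_error(records):
    """Mean associated error per transfer-function component and per pair.

    records: iterable of (activation, output, error) triples, one per hidden
    unit, pattern and sampling run (a hypothetical data layout).
    Returns (first_order_activation, first_order_output, second_order) dicts.
    """
    sums = {k: defaultdict(float) for k in ("act", "out", "joint")}
    counts = {k: defaultdict(int) for k in ("act", "out", "joint")}
    for g, f, err in records:
        # Disjoint keys (g or f alone) and the joint key (g, f) share one pass.
        for kind, key in (("act", g), ("out", f), ("joint", (g, f))):
            sums[kind][key] += err
            counts[kind][key] += 1
    means = {
        kind: {key: sums[kind][key] / counts[kind][key] for key in sums[kind]}
        for kind in sums
    }
    return means["act"], means["out"], means["joint"]
```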

5.4.1.3 Connection Density

Connection density is one of the measures used in neuroscience for studying biological neural networks [96, 91, 21], and was adopted here for studying neural diversity machines. It is calculated for each neuron in the hidden and output layers as the number of active connections to the neuron (i.e. connections that are switched on and take part in computation), normalized by the total number of its connections, both active and inactive. This can be expressed as in Eq. 5.1.

$$d_i = \frac{p_i}{q_i} \qquad (5.1)$$

where $q_i$ and $p_i$ are the total number of connections and the number of active connections of neuron i, respectively. These counts are accumulated over the sampling runs used for signature extraction. Figure 5.2 illustrates the connection density for the Iris dataset.

Figure 5.2: An illustration of the connection density for the Iris dataset.
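Eq. 5.1 can be computed from per-run connectivity masks. This is a minimal sketch that makes two assumptions. First, each layer's connections are stored as a hypothetical boolean mask of shape (receiving neurons, sending units), where True marks an active connection. Second, "accumulated over the sampling runs" is read as summing $p_i$ and $q_i$ across runs before dividing. The name `connection_density` is illustrative, and the function should be called separately for the hidden and output layers, since they have different fan-in.

```python
import numpy as np


def connection_density(masks):
    """Connection density d_i = p_i / q_i for each receiving neuron (Eq. 5.1).

    masks: list of equally shaped boolean matrices, one per sampling run, of
    shape (n_receiving, n_sending); entry [i, j] is True when the connection
    from unit j into neuron i is active.
    """
    # p_i: active connections into neuron i, summed over runs.
    p = sum(np.asarray(m, dtype=bool).sum(axis=1) for m in masks)
    # q_i: all connections into neuron i (active and inactive), summed over runs.
    q = sum(np.full(np.shape(m)[0], np.shape(m)[1]) for m in masks)
    return p / q
```

For two runs over a layer of two neurons with three inputs each, the masks `[[1, 0, 1], [0, 0, 1]]` and `[[1, 1, 1], [0, 0, 0]]` give densities of 5/6 and 1/6.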

In document An empirical study towards efficient learning in artificial neural networks by neuronal diversity (Page 99-101)