Neural Architectures Codification Methods

The most commonly used codification for representing the neural network architecture is the direct codification method (Bornholdt & Graudenz, 1992), which explicitly represents each connection by means of a binary digit. Given an architecture with N neurons, the matrix that represents the set of connections in the network is termed matrix W, of dimensions N×N:

$$
W = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1N} \\
w_{21} & w_{22} & \cdots & w_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
w_{N1} & w_{N2} & \cdots & w_{NN}
\end{pmatrix}
$$

where w_ij ∈ {0, 1}, ∀ i, j = 1, 2, ..., N. The values w_ij represent the presence or absence of connections. If w_ij = 1, then there is a connection from the ith neuron to the jth neuron of the network. On the other hand, if w_ij = 0, then there is no connection from the ith neuron to the jth neuron. If the neurons are numbered from the input layer towards the output layer and all the elements on and below the main diagonal are zero, the neural architecture of the represented network is feedforward. If, on the other hand, there are nonzero elements on or below the main diagonal, it is a feedback neural architecture. Figure 3 illustrates a feedforward neural network, its connections matrix and the binary string that represents its architecture. This codification method employs N² bits to codify feedback architectures with N neurons, while the number of bits required to encode feedforward architectures can be reduced to N·(N-1)/2, because only the elements above the main diagonal need to be stored.
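Because the direct codification is just a flattening of W, a sketch of it is short. The Python below is illustrative only: the function names and the 4-neuron matrix W are hypothetical, neurons are indexed from 0 rather than 1, and the row-by-row flattening order is an assumption (any fixed order would do).

```python
import math
from typing import List

Matrix = List[List[int]]  # w[i][j] == 1 means a connection from neuron i to neuron j


def encode_direct(w: Matrix) -> str:
    """Flatten the N x N connections matrix row by row into N*N bits."""
    return "".join(str(bit) for row in w for bit in row)


def decode_direct(bits: str) -> Matrix:
    """Rebuild the N x N matrix; the string length must be a square number."""
    n = math.isqrt(len(bits))
    if n * n != len(bits):
        raise ValueError("string length is not a square number")
    return [[int(bits[i * n + j]) for j in range(n)] for i in range(n)]


def is_feedforward(w: Matrix) -> bool:
    """True if every element on and below the main diagonal is zero."""
    n = len(w)
    return all(w[i][j] == 0 for i in range(n) for j in range(i + 1))


def encode_feedforward(w: Matrix) -> str:
    """Keep only the N*(N-1)/2 elements above the main diagonal."""
    if not is_feedforward(w):
        raise ValueError("not a feedforward matrix")
    n = len(w)
    return "".join(str(w[i][j]) for i in range(n) for j in range(i + 1, n))


def decode_feedforward(bits: str, n: int) -> Matrix:
    """Inverse of encode_feedforward for a network with n neurons."""
    if len(bits) != n * (n - 1) // 2:
        raise ValueError("expected N*(N-1)/2 bits")
    w = [[0] * n for _ in range(n)]
    pos = 0
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = int(bits[pos])
            pos += 1
    return w


# Hypothetical example: inputs 0 and 1 feed hidden neuron 2, which feeds output neuron 3.
W = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
```

For this W, encode_direct(W) returns the 16-bit string 0010001000010000, whereas encode_feedforward(W) needs only 6 = 4·3/2 bits and returns 010101.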

This approach provides a very fast and easy-to-implement process for encoding neural networks as binary strings and decoding them back. However, its drawback is that the number of bits required for the codification grows with the square of the number of neurons. Therefore, if the neural architecture to be encoded is sizeable, the connection matrices are very large, and the genetic algorithm takes a very long time to process them.

Other codifications have been designed to reduce the size of the individuals that direct codification methods generate for the population; among them, the indirect codification methods are very important (Kitano, 1994). They are based on the idea of using an intermediate representation between the connections matrix and the string that encodes the architecture.

A prominent member of the indirect codification methods of neural architectures is the grammar codification method (Kitano, 1994), which involves describing the neural architecture by means of a set of derivation rules belonging to a grammar and then encoding these rules by means of binary strings. In this case, the intermediate representation between the neural architectures and the strings that encode them is the set of grammar rules. This reduces the length of the strings that form the population compared with the direct codification method. The grammar codification method is composed of two parts: (1) the construction of the grammar whose words are the set of neural architectures that are to be used, that is, the search space; and (2) the codification of the derivation rules of the grammar by means of binary strings.
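One common form of grammar codification, associated with Kitano's work, rewrites each symbol into a 2×2 block of symbols until only 0s and 1s remain, so that the final grid is the connections matrix. The toy sketch below assumes that form. The rule set RULES and the function names are hypothetical, every non-terminal is assumed to expand to the same depth, and part (2), encoding the rules themselves as binary strings, is omitted.

```python
from typing import Dict, List

Grid = List[List[str]]

# Hypothetical rules: upper-case symbols are non-terminals, "0" and "1" are terminals.
# The non-terminal A is reused twice, which is where a grammar can save space.
RULES: Dict[str, Grid] = {
    "S": [["A", "B"], ["A", "C"]],
    "A": [["0", "0"], ["0", "0"]],
    "B": [["1", "0"], ["1", "0"]],
    "C": [["0", "1"], ["0", "0"]],
}


def rewrite(grid: Grid, rules: Dict[str, Grid]) -> Grid:
    """One derivation step: replace every symbol by its 2x2 right-hand side."""
    n = len(grid)
    out = [[""] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            block = rules[grid[i][j]]
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] = block[di][dj]
    return out


def develop(rules: Dict[str, Grid], start: str = "S", max_steps: int = 10) -> List[List[int]]:
    """Derive from the start symbol until only terminals remain; return the matrix."""
    grid: Grid = [[start]]
    steps = 0
    while not all(cell in ("0", "1") for row in grid for cell in row):
        if steps == max_steps:
            raise ValueError("derivation did not terminate")
        grid = rewrite(grid, rules)
        steps += 1
    return [[int(cell) for cell in row] for row in grid]
```

Two derivation steps turn S into a 4×4 connections matrix in which two inputs feed one hidden neuron that feeds one output neuron.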

For feedforward neural networks with two input neurons, five hidden neurons and one output neuron, the grammar codification method needs 40 bits, whereas the direct codification method takes 64 bits (8² for the N = 8 neurons). The drawback of this approach is that it is very tedious to define a new grammar each time a new neural architecture has to be built.

The basic architectures codification method (Ríos, Barrios, Carrascal, & Manrique, 2001) is another important approach that can be classified as an indirect codification method. It is based on an algebraic formalisation of the set of feedforward neural architectures with one hidden layer. This algebraic description yields a binary codification in which, for networks such as the 2-5-1 example above, any neural architecture is represented by a bit string shorter than those produced by the two encoding methods above.
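The string lengths for the 2-5-1 example can be checked with simple arithmetic. This sketch anticipates the length I·O·(H+1) of the basic architectures codification, which is derived in the following paragraphs, and takes N = I + H + O neurons for the direct method; the function names are hypothetical, and the 40-bit grammar figure is left out because it depends on a hand-built grammar rather than on a formula.

```python
def direct_bits(n: int) -> int:
    """Direct codification: one bit per element of the N x N matrix."""
    return n * n


def direct_feedforward_bits(n: int) -> int:
    """Direct codification keeping only the elements above the main diagonal."""
    return n * (n - 1) // 2


def basic_bits(n_in: int, n_hidden: int, n_out: int) -> int:
    """Basic architectures codification: one bit per basic network."""
    return n_in * n_out * (n_hidden + 1)


I, H, O = 2, 5, 1
N = I + H + O
print(direct_bits(N), direct_feedforward_bits(N), basic_bits(I, H, O))  # 64 28 12
```

Note that N² grows with the square of the total number of neurons, whereas I·O·(H+1) grows with the product of the layer sizes, so how much shorter the basic codification is depends on I, H and O.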

Figure 3. Neural architecture, connections matrix, and string (individual) that codifies the architecture

This codification method denotes by R_I,H,O the set of all generalised feedforward neural networks with a maximum of I input neurons, H hidden units and O output units. Of all the neural networks in R_I,H,O, we are only interested in the subset V_I,H,O ⊆ R_I,H,O of all valid neural networks, as V_I,H,O only contains points that can solve a problem and does not include illegal neural networks. Given two valid neural networks v and v′, the superimposition operation between them, denoted by v⊕v′, is the union of the connections from v and v′. Figure 4 illustrates the superimposition operation.
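If a network is identified with its set of connections, superimposition is plain set union. The sketch below assumes hypothetical neuron labels such as "i1" (input), "h1" (hidden) and "o1" (output); it illustrates the definition and is not taken from the original method's implementation.

```python
from typing import FrozenSet, Tuple

Connection = Tuple[str, str]      # (source neuron, target neuron), e.g. ("i1", "h1")
Network = FrozenSet[Connection]   # a network is identified with its set of connections


def superimpose(v: Network, v_prime: Network) -> Network:
    """v ⊕ v': the union of the connections of v and v'."""
    return v | v_prime


v = frozenset({("i1", "h1"), ("h1", "o1")})
v_prime = frozenset({("i2", "h1"), ("h1", "o1")})
```

Here superimpose(v, v_prime) has three connections, because the shared connection h1→o1 appears only once; the operation is also commutative and idempotent, as set union is.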

A valid neural network b∈V_I,H,O is called basic, written b∈B_I,H,O, if and only if either b has a single connection, which goes directly from an input neuron to an output neuron, or b has exactly two connections: one from an input neuron to a hidden unit and another from this unit to an output neuron. The subset B_I,H,O ⊆ V_I,H,O of all basic neural networks has important features because any valid generalised feedforward neural architecture can be built by superimposing these basic structures. We now calculate the cardinality of the set B_I,H,O. There are I·O basic networks with no hidden units, one for each input-output pair, plus I·H·O networks in which one input is connected to one output through one hidden neuron. Thus, the cardinality of B_I,H,O is I·O + I·H·O = I·O·(H+1).
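The counting argument can be checked by enumerating B_I,H,O directly. In this sketch the neuron labels ("i1", "h1", "o1") and the function name are hypothetical, and a network is again represented as a frozen set of (source, target) connections.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

Connection = Tuple[str, str]
Network = FrozenSet[Connection]


def basic_networks(n_in: int, n_hidden: int, n_out: int) -> List[Network]:
    """Enumerate B_{I,H,O}: I*O direct nets, then I*H*O nets through one hidden unit."""
    direct = [
        frozenset({(f"i{i}", f"o{o}")})
        for i, o in product(range(1, n_in + 1), range(1, n_out + 1))
    ]
    through_hidden = [
        frozenset({(f"i{i}", f"h{h}"), (f"h{h}", f"o{o}")})
        for i, h, o in product(range(1, n_in + 1), range(1, n_hidden + 1), range(1, n_out + 1))
    ]
    return direct + through_hidden
```

For the 2-5-1 case, len(basic_networks(2, 5, 1)) is 2·1 + 2·5·1 = 12 = 2·1·(5+1), and all the enumerated networks are distinct.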

The binary codification of neural architectures using this approach is as follows. There are I·O(H+1) basic neural networks that can be superimposed to build more complex structures. Thus, the set of basic neural networks {b₁, …, b_I·O(H+1)} can be combined in 2^(I·O(H+1)) different ways to build valid neural networks. So, there exists a one-to-one correspondence between the set of all possible superimpositions of basic neural networks and the set V^b_I,H,O of all binary strings that encode each of them. With all these premises, the codification of all the points in V_I,H,O will be based on the codification of the set of basic neural architectures B_I,H,O = {b₁, b₂, …, b_i, …, b_I·O(H+1)} with binary strings of length I·O(H+1) bits, as shown in Table 1.


Figure 4. Superimposition operation

b₁ → 1, 0, …, 0, …, 0
b₂ → 0, 1, …, 0, …, 0
…
b_i → 0, 0, …, 1, …, 0 (1 is set in the ith position)
…
b_I·O(H+1) → 0, 0, …, 0, …, 1
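The correspondence in Table 1 can be generated mechanically: each basic network gets the string with a single 1 in its own position. This sketch assumes one particular order (direct networks first, then networks through a hidden unit) and hypothetical labels such as "i1->h1->o1"; that order need not match the one used in Figure 5.

```python
from itertools import product
from typing import List, Tuple


def basic_labels(n_in: int, n_hidden: int, n_out: int) -> List[str]:
    """Labels for b_1 ... b_{I*O*(H+1)} in one fixed order (direct nets first)."""
    labels = [f"i{i}->o{o}" for i, o in product(range(1, n_in + 1), range(1, n_out + 1))]
    labels += [
        f"i{i}->h{h}->o{o}"
        for i, h, o in product(range(1, n_in + 1), range(1, n_hidden + 1), range(1, n_out + 1))
    ]
    return labels


def codification_table(n_in: int, n_hidden: int, n_out: int) -> List[Tuple[str, str]]:
    """Pair each basic net b_i with the string that has a single 1 in position i."""
    labels = basic_labels(n_in, n_hidden, n_out)
    length = len(labels)
    return [
        (label, "".join("1" if k == idx else "0" for k in range(length)))
        for idx, label in enumerate(labels)
    ]
```

For V_2,1,1 this gives four 4-bit codes: i1->o1 is 1000, i2->o1 is 0100, i1->h1->o1 is 0010 and i2->h1->o1 is 0001; the null net is 0000.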

The basic networks b₁, …, b_I·O(H+1) can be ordered in any way, but once an order has been chosen it must be preserved. The null net, a neural network without any connections, is always encoded as a string of I·O(H+1) zeros. Figure 5 shows an example of this table of correspondences between basic neural networks and their codification for the set V_2,1,1. Once all basic neural networks have been encoded, any valid neural network included in V_I,H,O can be built by applying the binary-OR operator (∨) to the encoded basic nets. This yields all the binary strings included in the set V^b_I,H,O.

It is easy to encode any neural architecture v∈V_I,H,O: find the basic neural networks that have to be superimposed to obtain v and, starting from a string of I·O(H+1) zeros, switch the ith bit to 1 if the ith basic neural net is needed in the superimposition that gives v.

When this codification is used, the set V^b_I,H,O has two important features. First, the search space defined with the proposed codification contains only feasible solutions to the problem; that is, there are no illegal neural networks. Second, several binary strings can codify the same valid ANN, because the set V^b_I,H,O encodes the set of all decompositions of all valid neural networks, every valid neural architecture has at least one decomposition, and many have more than one. This is a very positive feature, as genetic algorithms will find the best artificial neural network that solves a problem faster, because several codifications of the best neural architecture are generally spread over the search space.
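Decoding, encoding and the redundancy of the codification can be sketched together. This is a minimal illustration under stated assumptions: the basic networks are ordered as direct nets first, the labels are hypothetical, and encode switches on every basic net contained in v, which is one valid decomposition (the largest) but not necessarily the one a genetic algorithm would produce.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

Connection = Tuple[str, str]      # (source neuron, target neuron)
Network = FrozenSet[Connection]


def basic_networks(n_in: int, n_hidden: int, n_out: int) -> List[Network]:
    """b_1 ... b_{I*O*(H+1)} in a fixed order: direct nets, then nets through a hidden unit."""
    direct = [
        frozenset({(f"i{i}", f"o{o}")})
        for i, o in product(range(1, n_in + 1), range(1, n_out + 1))
    ]
    through_hidden = [
        frozenset({(f"i{i}", f"h{h}"), (f"h{h}", f"o{o}")})
        for i, h, o in product(range(1, n_in + 1), range(1, n_hidden + 1), range(1, n_out + 1))
    ]
    return direct + through_hidden


def bitwise_or(a: str, b: str) -> str:
    """The binary-OR operator applied position by position to two codes."""
    return "".join("1" if "1" in (x, y) else "0" for x, y in zip(a, b))


def decode(bits: str, basics: List[Network]) -> Network:
    """Superimpose every basic net whose bit is 1; all zeros gives the null net."""
    net: Set[Connection] = set()
    for bit, b in zip(bits, basics):
        if bit == "1":
            net |= b
    return frozenset(net)


def encode(v: Network, basics: List[Network]) -> str:
    """Switch on bit i for every basic net b_i contained in v."""
    bits = "".join("1" if b <= v else "0" for b in basics)
    # If the contained basic nets do not rebuild v, v is not a valid network.
    if decode(bits, basics) != v:
        raise ValueError("v is not a superimposition of basic networks")
    return bits


BASICS = basic_networks(2, 1, 2)  # V_{2,1,2}: 2*2*(1+1) = 8 bits
s1 = "00001001"  # b(i1,h1,o1) ⊕ b(i2,h1,o2)
s2 = "00000110"  # b(i1,h1,o2) ⊕ b(i2,h1,o1)
```

Both s1 and s2 decode to the same four-connection network, and encoding that network yields a third string, 00001111, which is also bitwise_or(s1, s2): three codes for one architecture. An invalid net such as the single connection i1→h1 (a hidden unit with no output) makes encode raise ValueError.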

Source: Artificial Neural Networks in Real-Life Applications, Juan R. Rabuñal (pp. 116-119)