Randomized Expansion - Information Expansion

2.4 Role of a Feature Extractor in the Generalized Machine Learning

2.4.6 Information Expansion

2.4.6.1 Randomized Expansion

Wouter F. Schmidt et al [123] firstly proposed a feed forward neural network by randomly choosing the weights of the hidden layer, where the output layer is trained by a single layer learning rule or a pseudo-inverse strategy. The ran- domization of the hidden layer of the feedforward neural network calculates the weighted sum by using a squashing function F (x) which maps the values of the random weights between 0 and 1. The squashing function is the sigmoid function given as:

F (x) = 1

1 + e−x (2.38)

To formulate this idea, suppose M = [m1, m2, ...., mn, mn+1]T and X =

[x1, x2, ...., xn+1]T are the weight and input vectors respectively. The output at

hidden layer is described as

Ohidden = F (

In mathematical formulation the output layer perform the following function:

Ooutput =

wixi = WtX (2.40)

The total function computed by the network can be formulated as:

Onet(X) = hidden

wiF (MTi X) + whidden+1 (2.41)

The weight vector W∗ _{for which this equation is minimal is calculated as}

follows:

W∗_{= R}−1_P _(2.42)

where R and P are the input correlation matrix and input-target correlation vector respectively of the output unit.

RVFL

Random Vector Functional Link (RVFL) technique [124] also uses the randomized hidden nodes without training and training at the output. In this technique, the inputs are enhanced to xn with elements (xn1, xn2, ..., xnj, ..., xnJ). The target

outputs are tnk and the weights are wkj. In functional-link networks, the outputs

tkcan be treated independently of each other. The weights βj is initially assigned

with random values. For each input pattern the change in the weights is taken as:

The weights are updated once the changes are calculated for all the patterns in the training set as:

β_j(k + 1) = β_j(k) +X

∆β_nj (2.44)

The learning rate η may be increased as (tp− op) decreases.

The network only adjust the weights βj to minimize the system error

E =X j En= X n (netn− penaltyn)2 (2.45)

where symbol penalty_nis the target output and netn is the actual output of

the functional net.

Extreme Learning Machine

Recently a similar kind of new learning algorithm for single layer feedforward neural network[128] was proposed by Huang et al [125, 126]. The only difference between ELM and other algorithms is its high efficiency for auto encoder (Deep Belief Networks) [129]. Another important characteristic of ELM is its parameters of the hidden nodes. These parameters are randomly generated independently from the training samples and they are independent from each other in a wide type of neural networks and mathematical series/ expansion as well as in biological learning mechanism [130].

The reason for the evolution of extreme learning machine [131] is that it has the capacity of extremely fast learning. Since ELM is easy to implement, tends to achieve the smallest training error and good generalization performance. It is

true that the learning speed of other feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications [132].

The model of ELM is similar to other standard multiple layer feed forward neural network models consisting mainly of three layers 1) input layer, 2) hidden layer and 3) output layer. The basic ELM neural architecture is shown in Figure

2.15.

Figure 2.15: A Schematic Overview of an ELM

For N arbitrary distinct sample (xi, ti), where xi = [xi1, xi2, ..., xin] ǫ Rn and

˜ N X i=1 βig(xj) = ˜ N X i=1 βig(wi.xj + bi) = oj, j = 1, 2, ...., N, (2.46)

where wi = [wi1, wi2, ...., win] is the weight vector connecting the ith hidden

node and the input node, β_i = [βi1, βi2, ...., βin]T is the weight vector connecting

the ith hidden node and the output nodes, and bi is the threshold of the ith

hidden node wi.xj denoting the inner product of wi and xj.

The single-hidden layer feed forward neural networks (SLFNs) of extreme learning machine (ELM) is further enhanced to the generalized single hidden layer feedforward neural network [133].

The output function of ELM for generalized SLFNs is

fL(x) = L

i=1

βihi(x) = h(x)β, (2.47)

where β = [β1, β2, ..., βL]T is the vector of output weights from hidden to

output nodes, and h(x) = [h1(x), h2(x), ..., hL(x)] is the output (row) vector of

the hidden layer with respect to the input x. The output function hi(x) of hidden

nodes may not be unique. In particular, in real applications, hi(x) can be

hi(x) = G(ai, bi, x), ai ǫ Rd, bi ǫ R, (2.48)

where G(ai, bi, x) is a non-linear piecewise continuous function satisfying ELM

universal approximation capability theorem [133,134].

As the hidden neurons of ELM are totally independent from each other, dif- ferent output activation functions can be used to compute output of each hidden

neuron including sigmoid function, fourier function, hardlimit function, Gaussian function and multiquadrics function.

The equation (2.46) can be written compactly as

Hβ = T (2.49) where H(w1, ...., w_N˜, b1, ..., b_N˜, x1, ..., xN) =         g(w1.x1+ b1) ... g(w_N˜.x1 + b_N˜) ... g(w1.xN + b1) ... g(w_N˜.xN + b_N˜)         N x ˜N (2.50) β =             βT 1 . . βT N             and T =             tT 1 . . tT N             N xm (2.51)

In order to satisfy the learning criteria at the output stage, the standard ELM implementation uses the minimal norm least square method instead of the standard optimization method in the solution.

where H† _{is the Moore-Penrose generalized inverse of matrix H [}₁₃₅_].

Echo State Network

Echo State Network is another most popular type of Recurrent Neural Network (RNN) proposed by H. Jaeger [127] which is driven by a (single or multidimen- sional) time signal whose activations are generated using tanh (tan hyperbolic) function. These activations are used to perform linear classification/ regression.

Echo state networks consist of three layers of ‘neurons’: an input layer which is connected with random and fixed weights to the next layer which forms the reservoir. The neurons of the reservoir are connected to other neurons in the reservoir with a fixed, random, sparse matrix of weights. Typically only about 10 % of the weights in the reservoir are non-zero. The reservoir is connected to the output neurons using weights which are trained using error descent. In echo state networks only weights connected to the output neurons needs training making this network an easy and efficiently trained network. Echo State Networks are specialized in converting non-linear information into information of temporal nature.

The idea of reservoir can be stated as: Win denotes the weights from the Nu,

inputs u, to the Nx reservoir units x, W denotes the Nx × Nx reservoir weight

matrix, and Wout denotes the (Nx+ 1) × Ny weight matrix linking the reservoir

units to the output units, denoted by y as shown diagrammatically in Figure

The network dynamics are governed by

x(t) = f (Winu(t) + Wx(t − 1)) (2.53)

where typically f (.) = tanh(.) and t is the time index. The feed forward stage is given by

y = Woutx (2.54)

This is followed by a supervised learning of the output weights, Wout. If we are

using online learning, a simple least mean square rule gives

Wout= Wout+ η(ytarget− y)xT (2.55)

where η is a learning rate (step size) and ytarget is the target output corresponding

to the current input.

In document Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications (Page 84-92)