Weighted Integration Architecture Network

5.4 Integration Architecture Network

5.4.2 Weighted Integration Architecture Network

The concatenate operation signiﬁcantly increases the width of the integration layer, which means a larger number of parameters are stored in this layer. How- ever, a large amount of parameters results in high storage requirements, and also it makes the network susceptible to over-ﬁtting, especially if the amount of

FRQY FRQYB FRQY FRQYB FRQY FRQYB ,QWHJUDWLRQOD\HU )XOO\FRQQHFWHGOD\HU SRRO ZHLJKWPDWUL[ SRRO ZHLJKWPDWUL[ SRRO ZHLJKWPDWUL[ ZHLJKWPDWUL[ 6RIWPD[ DFODVVLILHU

Figure 5.4: The architecture of the proposed Weighted Integration Architecture Network (WIAN). A weight matrix is assigned to each integration layer.

labeled data in the training set is small. Additionally, because of the existing redundant information between two adjacent layers, we propose to integrate the previous convolutional layers (reshaped to the same dimensions by a convolution operation and subsequently normalized) by applying element-wise max, average or sum operations, as depicted in Figure 5.3. The resulting network architectures are named the Max Integration Architecture Network (MIAN), the Average In- tegration Architecture Network (AIAN) and the Sum Integration Architecture Network (SIAN), respectively. Furthermore, we propose an adaptive method to integrate the previous convolutional layers, which assigns to each previous convolutional layer a weight matrix, respectively, and then combine them by element- wise summing (as shown in Figure 5.4). Two weight schemes are explored in this section, one relies on the responses or entropy of the convolutional layer and the other one is based on an adaptive weight learning method.

Responses based weight scheme: as shown in Figure 5.4, for the given N

convolutional layers in the network, we denote the feature maps from layer Cn

as Fn_{, n} _{= 1}_{, ..., N}_{. These feature maps can be represented as a vector with}

dimension wn × hn × cn, where wn and hn are the width and height of each

individual feature map, and cn denotes the number of feature maps of layer Cn.

We further associate each unit of a feature map with a spatial coordinate (x, y)

and the activation of this unit bya(x, y). The response value of each feature map

is calculated as rc ₌wn

x=1 hn

y=1a(x, y), c = 1, ..., c

n_{. The response value of each}

layer is computed as Rn ₌ cn

c=1r

5.4 Integration Architecture Network

each layer is deﬁned as:

weightn_R= R

n=1Rn

(5.3)

Entropy based weight scheme: we further employ entropy information [172]

on each convolutional layer to deﬁne a weight value. The activation of each

unit a(x, y) in the feature map can be treated as a state pi, and the entropy of

each feature map is computed by ec ₌wn

x=1 hn

y=1(pi×logpi), c= 1, ..., c

n_{. The}

entropy of each layer is computed as En ₌ ck

c=1e

c_{. The weight value in the}

weight matrix of each layer is then deﬁned as:

weightn_E = E

n=1En

(5.4)

For the responses or entropy based weight scheme, each unit in the weight matrix is assigned the same value of response or entropy calculated from each layer, thus, the weight matrix of responses or entropy based weight scheme can be reduced to one single weight value.

Finally, the activation value of each unit an+1₍_{x, y}₎ _{in the integration layer is}

calculated using the following formula:

an+1(x, y) =

N n=1

weightn×an(x, y) (5.5)

Note that, in the speciﬁc case that the weight value from each layer is equal to

1/N, the scheme becomes equal to the scheme of AIAN. If the weight from each

layer is equal to 1, the scheme becomes equal to the scheme of SIAN.

Adaptive weight learning scheme: we further investigate an adaptive weight

learning scheme in this chapter. Diﬀerent from the responses or entropy based scheme where each unit in the weight matrix share the same weight value, the adaptive weight learning scheme assigns to each of the integrated layer a weight

matrix. The initial values in each weight matrix are set to 1/k, where k is the

number of integrated layers. Then each unit in the weight matrix is automatically

updated during each iteration of the CNN training. Let weightn

of each unit in the weight matrix of the nth _{layer, then the integrated value of}

each unitan+1(x, y) in the integration layer is calculated as:

an+1(x, y) = N n=1 weightn₍_x,y₎×an(x, y) (5.6)

5.5 Experimental Results

The proposed WIAN is implemented using Caﬀe [153]. The experimental envi- ronment is consisted of a computer with an i7 processor, 32GB RAM, and an NVIDIA TITANX. The network is trained using mini-batches of size 100 with- out data augmentation. The training process starts from the initial weights and learning rates, and it continues until the accuracy on the training set stops im- proving. Then the learning rates are lowered by a factor of 10 according to an epoch schedule determined on the validation set. The source code of WIAN is available at: http://press.liacs.nl/researchdownloads/.

5.5.1 Datasets

We evaluate the performance of WIAN on four benchmark datasets: CIFAR-10 [164], CIFAR-100 [164], MNIST [8] and SVHN [173].

CIFAR-10: the CIFAR-10 dataset is constructed for object recognition. It is

composed of 10 object classes, with 6000 images per class. 50000 images are selected for training, and the remaining 10000 images are used for testing. Each

image is given in the RGB-format with size 32×32pixels.

CIFAR-100: the CIFAR-100 dataset is similar to the CIFAR-10 dataset (both

use the same image size and format), except that the CIFAR-100 contains 100 classes with 600 images per class. CIFAR-100 also uses 50000 images for training and the remaining 10000 images for testing.

MNIST:the MNIST dataset consists of images of hand written digits which are

5.5 Experimental Results

in total. For this experiment, all images of the dataset have been resized to a

ﬁxed resolution of32×32pixels.

SVHN:the Street View House Numbers (SVHN) dataset is a collection of house

numbers in the Google Street View images. It is composed of over 600000 color

images with a ﬁxed resolution of32×32pixels.

In document Large scale visual search (Page 106-110)