Increasing the Classification Accuracy in Deep Learning using Generative Adversarial Networks

Kalpana Devi Bai. Mudavathu*¹, Dr M V P Chandra Sekhara Rao²

¹ Research Scholar, Department of CSE, Acharya Nagarjuna University, Guntur, AP, India.

² Professor, Department of CSE, RVR & JC College of Engineering, Guntur, AP, India.

Abstract

In recent years, Generative Adversarial Networks (GANs) have become a research focus of deep learning. Several applications and models have been developed using GANs to make the training process more optimised and accurate. The consistency of the training dataset is one of the core requirements of machine learning algorithms, and several models fail to achieve higher accuracies because their datasets are inconsistent. In this work, we discuss how the performance of a machine learning model on images varies with the data, and we introduce a new approach for generating training datasets of images using generative models. GANs are a particular kind of unsupervised neural network that uses differentiable randomness to generate photo-realistic images. The developed GAN architecture automatically adds randomness to its outputs, making the images as realistic as possible. All our modifications to the GAN architecture are driven by the required optimisation and run in near-real time. The new datasets generalise across different network architectures without any prior additions, and the generated images are considered to be of high quality.

Keywords: Convolutional Neural Networks, Dropout, GANs, Generative Models, Hyper-parameters, Improving Model, Max Pooling, Neural Networks, Vanilla Neural Network.

1. Introduction

Image recognition is one of the most challenging tasks for computers. The main reason is that every image is unique, and its features vary from one image to the next.

Different lighting, colours, angles, shapes and (when identifying humans) facial expressions make it more difficult for computers to learn and identify what an image contains [1], [2].

Developing a classifier that gives the best results is therefore a challenging problem in the field of machine learning. Several computer vision algorithms address it by finding geometry, such as the Cascade Classifier [3] for detecting objects. Because these methods did not perform well at advanced feature detection, neural networks came into use. Several neural network models have been proposed and trained to date for image feature extraction, classification and optimisation.

As these problems are very difficult for traditional programs to solve, machine learning came into use, and for image recognition specifically, deep learning. Deep learning is a subset of machine learning that imitates the human brain to solve a variety of problems [4]. It draws inspiration from the neurons of the human brain and is used to identify patterns in data. Deep learning uses examples (data) to train the algorithm, so one of the most significant problems is acquiring enough data for training. The artificial neurons are arranged in deep layers with different patterns and architectures, and they are connected to one another through their corresponding weights. As data is passed through these layers, the model (network) learns about the data; when a new input is later given to the trained network, it produces an output (prediction) based on its learned parameters.

Some neural network architectures, such as Convolutional Neural Networks (ConvNets), perform exceptionally well in identifying the features of an image. However, these models can perform poorly on some images, particularly when the training data are inconsistent or insufficient. This work addresses the problem of insufficient data for image recognition and feature identification using a particular kind of neural network, Generative Adversarial Networks (GANs). GANs are also advancing in several medical fields such as radiology, where they identify the geometry of medical images as opposed to generic natural photos. They help identify abnormalities and produce smoother, clearer versions of the input images for ongoing diagnosis. The same data generation technique can therefore also be applied to increase the efficiency of a model.

2. Related Work

Ian Goodfellow et al. proposed the Generative Adversarial Network, which was initially used for data generation [1]. GANs were later widely used in many applications, such as neural style transfer and increasing the resolution of images, also called "super-resolution." A GAN consists of two parts. The first part is the generative model; generative models have been used for parameter and hyper-parameter estimation.

More recently, generative models have been developed into data generation techniques for evaluating models.

They use several differentiable techniques to add randomness to the real data (samples) given as input; this part of a GAN is also described as the machine understanding the data. The other part, the discriminator, is a classification algorithm: an ordinary neural network architecture that learns its parameters over several iterations of input data. In most research, the discriminator is a convolutional neural network.

A CNN discriminator has given the most accurate results when classifying samples as real (from the training data) or fake (produced by the generator).

Alireza Makhzani proposed a new way of extending GANs in his work "Adversarial Autoencoders." An adversarial autoencoder uses adversarial training to match the distribution of an autoencoder's learned codes to a chosen prior distribution, which offers a new way of understanding deep models. GANs can also create artwork, in a manner related to neural style transfer, by learning established styles and deliberately deviating from them [6].

3. Deep Convolutional Neural Networks

Convolutional Neural Networks, also called ConvNets, are a particular type of neural network architecture that is widely used for image classification. ConvNets have three kinds of layers: convolutional layers, pooling layers and fully connected layers. Apart from the convolutional and pooling layers, ConvNets are very similar to ordinary artificial neural networks [5]. A filter (kernel) is used to find features in the image: the filter is slid across the image matrix (the image converted to a pixel matrix), and at each position the element-wise products of the filter and the underlying pixels are summed. This operation is called convolution.
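The convolution described above can be illustrated with a minimal pure-Python sketch. It assumes a single-channel image, stride 1 and no padding, and, like most deep learning libraries, it does not flip the kernel (strictly, this is cross-correlation). The function name `convolve2d` and the toy edge-detecting kernel are illustrative choices, not part of the original work.

```python
def convolve2d(image, kernel):
    """'Valid' (no padding), stride-1 2-D convolution as used in ConvNets.

    The kernel is slid over the image without flipping; at each position the
    element-wise products of kernel and image patch are summed.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A vertical-edge kernel responds where intensity changes from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The output is smaller than the input (3x4 becomes 2x3) because the kernel is only placed where it fits entirely inside the image.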

One difference from other networks is that a convolutional architecture explicitly assumes its inputs are raw images encoded as tensors. This assumption lets properties of images be built into the structure of the network, which reduces the number of parameters to learn. As a result, training is much faster when a CNN is applied to images than when it is applied to text or other kinds of data.

Figure 1. Convolution Neural Network Diagrammatic Architecture

3.1. Convolution Layers

Convolutional layers are the core building block of ConvNets. Each layer computes the outputs of neurons that are connected to local regions of the input, with each neuron computing a dot product between its weights and the small input region it is connected to.

The parameters of a convolutional layer consist of a set of learnable filters [8]. Each filter is spatially small. A typical filter has a size of 5x5x3, where the first 5 is the filter's height, the second 5 is its width, and 3 is its depth. The depth is 3 because an image is represented by three colour channels (red, green and blue), and a filter always extends through the full depth of its input. During the forward pass, each filter is convolved across the width and height of the input image, computing the dot product between the entries of the filter and the input at every location.
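As a hedged sketch of the forward pass, the code below computes one filter response as a dot product over height, width and depth, together with the usual formula for the spatial output size, (n - f + 2p)/s + 1. The channel-first layout and the helper names are assumptions made for illustration.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of convolving an n x n input with an f x f filter."""
    return (n - f + 2 * padding) // stride + 1


def filter_response(volume, filt, row, col):
    """Dot product of an f x f x d filter with the input volume at (row, col).

    Both volume[c][y][x] and filt[c][y][x] are indexed channel-first.
    """
    total = 0.0
    for c in range(len(filt)):                # depth, e.g. the R, G, B channels
        for y in range(len(filt[c])):         # filter height
            for x in range(len(filt[c][y])):  # filter width
                total += filt[c][y][x] * volume[c][row + y][col + x]
    return total


# A 5x5x3 filter of ones over a 7x7x3 image of ones: 5 * 5 * 3 = 75 products.
volume = [[[1.0] * 7 for _ in range(7)] for _ in range(3)]
filt = [[[1.0] * 5 for _ in range(5)] for _ in range(3)]
print(filter_response(volume, filt, 0, 0))  # 75.0
print(conv_output_size(7, 5))               # 3
print(conv_output_size(28, 5))              # 24
```

The second size matches the 24x24 feature maps that a 5x5 filter produces on a 28x28 MNIST digit.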

3.2. Pooling Layers

It is common to periodically insert a pooling layer between successive convolutional layers in a Convolutional Neural Network. Its purpose is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.

An advantage of using pooling layers is that they help reduce overfitting [7]. A pooling layer operates independently on every depth slice of the input and resizes it spatially, using either the MAX operation (returning the highest value in each region) or the AVG operation (returning the average of the input features). The most common form is a pooling layer with 2x2 filters applied with a stride of 2 (the stride controls how far the filter moves across the input at each step), which downsamples every depth slice of the input by a factor of two along both width and height, discarding 75% of the activations.

In this case, every MAX operation takes the maximum over four numbers (a small 2x2 region in some depth slice). The depth dimension remains unchanged.
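The 2x2, stride-2 max pooling described above can be sketched on a single depth slice as follows; the helper name `max_pool2d` is hypothetical. Four of the sixteen input values survive, which is the 75% reduction mentioned in the text.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling over one depth slice given as a list of rows."""
    out = []
    for i in range(0, len(x) - size + 1, stride):
        row = []
        for j in range(0, len(x[0]) - size + 1, stride):
            # Maximum over the size x size window whose top-left corner is (i, j).
            row.append(max(x[i + di][j + dj] for di in range(size) for dj in range(size)))
        out.append(row)
    return out


depth_slice = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
pooled = max_pool2d(depth_slice)
print(pooled)  # [[6, 5], [7, 9]]
```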

Figure 2. Filter with size 2 x 2

3.3. Activation Layers

An activation function determines the output of a neuron given its inputs and their respective weights. In an artificial neural network, each neuron computes the sum of the products of its inputs (X) and their corresponding weights (W), applies an activation function f to this sum to get the output of that layer, and feeds the result as the input to the next layer [9].

The Rectified Linear Unit (ReLU) is one of the most widely used activation functions. It is applied element-wise, and any input value below zero results in zero.

Mathematically, the ReLU activation is given by

$$ f(x) = \max(0, x) \qquad (1) $$

The sigmoid function returns values that can be read as probabilities for binary classification. It takes a real-valued number and “squashes” it into the range between 0 and 1. The sigmoid nonlinearity has the mathematical form

$$ \varphi(v) = \frac{1}{1 + \exp(-a v)} \qquad (2) $$

where a is the slope parameter of the sigmoid activation, and v is the weighted-sum output of the neuron to which the sigmoid function is applied.
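The two activation functions, applied to a neuron's weighted sum of inputs, can be sketched as follows. The sigmoid takes the slope parameter a of form (2) as an argument (defaulting to 1, an assumption) and is written in a numerically stable way so that large negative inputs do not overflow. As in the description above, the neuron has no bias term; the function names are illustrative.

```python
import math


def relu(x):
    """ReLU, equation (1): negative inputs become zero."""
    return max(0.0, x)


def sigmoid(v, a=1.0):
    """Sigmoid with slope a, equation (2), computed without overflow."""
    z = a * v
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def neuron_output(inputs, weights, activation):
    """Sum of products of inputs and weights, passed through the activation."""
    v = sum(x * w for x, w in zip(inputs, weights))
    return activation(v)


print(neuron_output([1.0, 2.0], [0.5, -1.0], relu))  # 0.0 (weighted sum is -1.5)
print(sigmoid(0.0))                                  # 0.5
```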

4. State of the Art ConvNets

Several neural network architectures are used to classify images in deep learning, and the most widely used of them is the ConvNet. ConvNets depend heavily on the data fed to the training algorithm: the more precise the input data, the higher the accuracy. A basic ConvNet does not always lead to the best outcome; the convolutional architecture and its parameters need to be adapted to the data.

4.1. Datasets

The datasets chosen for training were the MNIST handwritten digit dataset and CelebA. To compare deep learning algorithms for handwritten digit recognition fairly, a shared dataset to train and test against is needed. MNIST is demanding, and classifying it with high accuracy is challenging. The data are divided into training and test sets in a 70:30 ratio: the training set is used to train the model, and the test set is used to evaluate it. The GAN model is trained on over 200,000 images from the CelebA dataset to increase the accuracy of the convolutional model.
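A minimal sketch of a 70:30 train/test split, assuming the samples are shuffled with a fixed seed before splitting (the paper does not state how the split was drawn). Integer indices stand in for images, and the function name is illustrative.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle sample positions reproducibly, then cut them at train_fraction."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


samples = list(range(1000))  # stand-ins for 1,000 images
train, test = train_test_split(samples)
print(len(train), len(test))  # 700 300
```

Shuffling first matters because datasets are often stored sorted by class; cutting an unshuffled list could leave whole classes out of the test set.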

4.2. Image Classification Techniques

Most image classification algorithms are first tested on the MNIST handwritten digit dataset. Either logistic regression (a binary classifier, extended here to the ten digit classes) or a convolutional neural network can be used to classify handwritten digits from 0 to 9. Figure 3 and Table I show the training loss and accuracy of logistic regression on digit classification, and Figure 4 and Table II show the same for a ConvNet; the ConvNet gives higher accuracy and lower loss.
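Because plain logistic regression is a binary classifier, classifying ten digits needs a multi-class extension. The sketch below assumes the multinomial (softmax) form with a cross-entropy loss; the paper does not say which extension was used, and the names and toy weights are illustrative.

```python
import math


def softmax(scores):
    """Convert class scores to probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, biases)]
    return softmax(scores)


def cross_entropy(probs, label):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[label])


# Three classes, two features; class 1 has the largest weight on feature 0.
weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([1.0, 0.0], weights, biases)
print(max(range(3), key=probs.__getitem__))  # 1
```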

Figure 3. Logistic Regression Loss

Table I. Logistic Regression on MNIST

| Iteration | Loss | Accuracy (%) |
|---|---|---|
| 500 | 0.5933407545 | 82 |
| 1000 | 0.6860304475 | 84 |
| 2000 | 0.2904675007 | 85 |
| 3000 | 0.3925182223 | 86 |
| 4000 | 0.3139611483 | 86 |
| 5000 | 0.2556229234 | 87 |

Figure 4. ConvNets Loss

Table II. CNN on MNIST

| Iteration | Loss | Accuracy (%) |
|---|---|---|
| 500 | 0.203466 | 95 |
| 1000 | 0.181349 | 95 |
| 2000 | 0.094402 | 95 |
| 3000 | 0.047231 | 96 |
| 4000 | 0.045122 | 96 |
| 5000 | 0.248005 | 96 |

The performance of logistic regression and the ConvNet is tabulated in Tables I and II.

In comparison, the ConvNet outperforms logistic regression: its loss is generally much lower, and its accuracy is higher (96% versus 87% after 5,000 iterations). Although both are classification algorithms, they are not directly comparable, and no general comparison between logistic regression and CNN architectures can be drawn, because both were evaluated only on the MNIST dataset.

4.3. Data Generation

Deep learning algorithms cannot perform well when data is limited. Therefore, while keeping the same deep learning algorithm, we adopted a new way of generating data using adversarial networks. We append the generated (fake) data to the dataset, which improves the network's performance metrics. Commonly used data generation techniques include PCA and data augmentation. In data augmentation, we apply operations such as changing colours, flipping and rotating images, whereas PCA reduces the dimensionality of the dataset and constructs a new dataset from its principal components.
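The flipping and rotating operations of data augmentation can be sketched on images stored as nested lists. This is an illustration with hypothetical helper names, not the augmentation pipeline used in the paper.

```python
def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Reverse the row order, then transpose: new[r][c] = old[n - 1 - c][r]."""
    return [list(row) for row in zip(*image[::-1])]


def augment(dataset):
    """Return each (image, label) pair plus a flipped and a rotated copy."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((rotate_90_clockwise(image), label))
    return augmented


image = [[1, 2], [3, 4]]
print(flip_horizontal(image))      # [[2, 1], [4, 3]]
print(rotate_90_clockwise(image))  # [[3, 1], [4, 2]]
```

For digits, flips and rotations can change the meaning of an image (a rotated 6 resembles a 9), so such transformations need care on MNIST.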

5. Generative Adversarial Networks

GANs are one of the most powerful ways of generating data using neural networks.

A GAN combines a generative model with a discriminative model: the generator adds noise (randomness) to the original data to produce fake data. A GAN is built from two networks, the discriminator and the generator, which compete against each other in a zero-sum game. The discriminator's job is to classify images, and an ordinary convolutional neural network is typically used for it [10, 11, 12]. The generator's job is to add random noise to its input image and send the result back to the discriminator. Once training is complete, the GAN can generate large numbers of images similar to the training data.
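The zero-sum game can be made concrete with the value function of the standard formulation by Goodfellow et al. [1], V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which the discriminator maximises and the generator minimises. The sketch below evaluates V for small discrete distributions, where p_g is the distribution of generated samples, using the known optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)). It assumes both distributions are strictly positive on a shared support; the function names are illustrative.

```python
import math


def value_function(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))].

    p_data, p_g: dicts mapping each outcome to its probability (same keys).
    d: dict mapping each outcome to the discriminator output D(x) in (0, 1).
    """
    return sum(p_data[x] * math.log(d[x]) + p_g[x] * math.log(1.0 - d[x]) for x in p_data)


def optimal_discriminator(p_data, p_g):
    """Best response of D to a fixed generator distribution p_g."""
    return {x: p_data[x] / (p_data[x] + p_g[x]) for x in p_data}


p_data = {0: 0.5, 1: 0.5}
p_match = {0: 0.5, 1: 0.5}     # generator reproduces the data exactly
p_mismatch = {0: 0.9, 1: 0.1}  # generator is still poor

v_match = value_function(p_data, p_match, optimal_discriminator(p_data, p_match))
v_mismatch = value_function(p_data, p_mismatch, optimal_discriminator(p_data, p_mismatch))
print(round(v_match, 4), round(v_mismatch, 4))
```

Against its best opponent, the generator does best (lowest V, equal to -log 4) exactly when its distribution equals the data distribution; any mismatch raises V, which is the sense in which one network's gain is the other's loss.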

Images created by generative adversarial networks often look realistic to human observers, which leads us to train a convolutional neural network (the discriminator) to classify the generated images. No quantitative parameters or metrics were used to check how realistic the generated images are.

Figure 5. GAN Architecture

5.1. Proposed Solution

Data generation can help achieve higher accuracies for image classification using convolutional neural networks. This data addition contributes significant innovations, an improved model structure, a logical extension of existing methods, and novel applications. First, the complete training data is sent to the algorithm, where the discriminator network classifies the images. Once an image is classified as original, it is sent to the generator. In the generator, any differentiable function can be used to add randomness to the original image. Let D and G represent the discriminator and the generator respectively, let x be the input data (image tensors), and let z be a random variable. D outputs one if the image is real (from the training samples) and zero if it does not match the classes of the training set; this output is written D(x). The role of D is to classify data from the training samples correctly. G(z) generates fake images and sends them back to the discriminator; when D(G(z)) labels a generated image as one, that image is added to the training dataset. This process is repeated until the image classifier achieves its highest accuracy.
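The accept-and-append loop of the proposed solution might look like the following sketch. The generator and discriminator are hypothetical stand-ins passed in as functions, z is drawn from a standard normal distribution, and a discriminator output of at least 0.5 is treated as a label of one; all three choices are assumptions rather than details given in the paper.

```python
import random


def grow_dataset(dataset, generator, discriminator, num_candidates, threshold=0.5, seed=0):
    """Append generated samples that the discriminator labels as real.

    generator(z) plays the role of G(z); discriminator(x) returns D(x) in [0, 1].
    Both are hypothetical callables supplied by the caller.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_candidates):
        z = rng.gauss(0.0, 1.0)               # random variable z
        fake = generator(z)                   # G(z)
        if discriminator(fake) >= threshold:  # D(G(z)) treated as a label of one
            accepted.append(fake)
    return dataset + accepted, len(accepted)


# Toy stand-ins: G passes z through; D accepts values inside the real data's range.
real_data = [0.1, -0.2, 0.3]
grown, added = grow_dataset(
    real_data,
    generator=lambda z: z,
    discriminator=lambda x: 1.0 if abs(x) < 1.0 else 0.0,
    num_candidates=100,
)
print(len(grown), added)
```

In the full method this loop would alternate with retraining the classifier, stopping when its accuracy no longer improves.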

Figure 6. Proposed GAN Architecture

We rely on manual examination of the output to optimise the generative model, which can make it hard to tune parameters and to detect small differences in model performance. However, evaluating the performance of generative networks is still an open research question, so manual review is the approach used. The first part of the solution is to train generative adversarial networks to generate fake data for each class in the dataset. In our case, this means one generator/discriminator pair for each class in the dataset.
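Training one GAN pair per class starts by partitioning the training set by label. The sketch below assumes a caller-supplied `train_gan` function (hypothetical); here the built-in `len` stands in for it so that the example runs and simply reports how many images each class's GAN would see.

```python
from collections import defaultdict


def train_per_class(samples, labels, train_gan):
    """Group samples by class label and train one GAN on each group."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return {label: train_gan(xs) for label, xs in sorted(by_class.items())}


models = train_per_class(["img_a", "img_b", "img_c"], [0, 1, 0], train_gan=len)
print(models)  # {0: 2, 1: 1}
```

Because every generator only ever sees one class, any image it produces can be added to the training set with that class's label.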

5.2. Learning Model

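Our network consists of a discriminator D and a generator G. We first describe the optimisation objective, which is equivalent to training a standard binary classifier with a sigmoid output. While the discriminator is being trained, it minimises the cross-entropy. Mathematically, the cost function is defined as

$$ J^{(D)} = -\frac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] - \frac{1}{2}\,\mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] \qquad (3) $$

Over n samples (for example, one mini-batch), the loss is calculated as

$$ J^{(D)} \approx -\frac{1}{2n}\sum_{i=1}^{n}\left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right] \qquad (4) $$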
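The mini-batch cross-entropy cost of the discriminator can be computed directly from the discriminator's outputs. This sketch assumes the standard GAN discriminator cost, with weight 1/2 on the real and generated terms; `d_real` and `d_fake` are hypothetical names for D(x) and D(G(z)) over one batch. A discriminator that cannot tell the two apart (all outputs 0.5) has a cost of log 2.

```python
import math


def discriminator_cost(d_real, d_fake):
    """Mini-batch estimate of the discriminator's cross-entropy cost.

    J(D) ~= -1/(2n) * sum_i [ log D(x_i) + log(1 - D(G(z_i))) ]
    d_real: D(x_i) for n real images; d_fake: D(G(z_i)) for n generated images.
    """
    n = len(d_real)
    if n == 0 or n != len(d_fake):
        raise ValueError("need equally sized, non-empty batches")
    total = sum(math.log(dr) + math.log(1.0 - df) for dr, df in zip(d_real, d_fake))
    return -total / (2 * n)


print(discriminator_cost([0.5] * 4, [0.5] * 4))  # log 2, about 0.693
print(discriminator_cost([0.9, 0.9], [0.1, 0.1]))
```

A confident, correct discriminator (0.9 on real, 0.1 on fake) drives the cost down to -log 0.9, about 0.105.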

One notable drawback of adversarial networks is that the discriminator is assumed to have unlimited modelling capability for separating real images from generated samples, regardless of their complexity, which readily causes overfitting. To limit the modelling ability of the discriminator, we use the Loss-Sensitive GAN (LS-GAN), which requires its loss function to satisfy a Lipschitz constraint. LS-GAN continues to give consistent results in situations where the gradient would otherwise vanish. Adding LS-GAN does not change the GAN architecture or the model; it adjusts only the parameter learning and the optimisation method.
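The LS-GAN objectives are not spelled out in the text; the following is a hedged sketch of their mini-batch form as given in Guo-Jun Qi's Loss-Sensitive GAN. A learned loss L is trained so that each real sample has lower loss than a generated one by at least a margin Δ(x, G(z)), through the hinge (Δ + L(x) - L(G(z)))₊, while the generator minimises L(G(z)). The L1 margin, the weight λ = 1 and the function names are assumptions, and the Lipschitz constraint itself (in practice enforced by regularising the parameters of L) is not implemented here.

```python
def margin(x, y):
    """Delta(x, G(z)): mean absolute (L1) difference between two samples."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def ls_gan_losses(loss_real, loss_fake, margins, lam=1.0):
    """Mini-batch LS-GAN objectives.

    loss_real[i] = L(x_i), loss_fake[i] = L(G(z_i)), margins[i] = Delta(x_i, G(z_i)).
    Returns (objective for the loss function L, objective for the generator).
    """
    n = len(loss_real)
    hinge = sum(max(0.0, m + lr - lf) for m, lr, lf in zip(margins, loss_real, loss_fake)) / n
    critic_loss = sum(loss_real) / n + lam * hinge
    generator_loss = sum(loss_fake) / n
    return critic_loss, generator_loss


critic, gen = ls_gan_losses([0.1, 0.2], [1.0, 0.1], [0.5, 0.5])
print(round(critic, 4), round(gen, 4))  # 0.45 0.55
```

Only the second pair contributes to the hinge: its generated sample does not yet have a loss at least 0.5 above the real one.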

5.3. GAN Implementation

The discriminator network consists of two convolutional layers, each followed by batch normalisation (BN) (except for the output layer) and a Leaky ReLU (Leaky Rectified Linear Unit) activation. Other activations, such as sigmoid or tanh, can replace this activation function; in our use case, Leaky ReLU lets gradients flow more easily through the architecture.
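A sketch of the per-layer operations named above, batch normalisation followed by Leaky ReLU, on a one-dimensional batch of activations. The negative slope of 0.2 and ε = 1e-5 are assumed values (common defaults), not settings reported in the paper; the learnable scale and shift are fixed at 1 and 0.

```python
import math


def leaky_relu(x, alpha=0.2):
    """Pass positive inputs through; scale negative inputs by alpha instead of zeroing them."""
    return x if x > 0 else alpha * x


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a batch to zero mean and unit variance, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [leaky_relu(v) for v in batch_norm([1.0, 2.0, 3.0, 4.0])]
print([round(a, 3) for a in activations])
```

Unlike plain ReLU, the small negative slope keeps a non-zero gradient for negative inputs, which is the gradient-flow property the text refers to.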

Since the primary aim of the discriminator is to label samples as real or fake, two loss functions are implemented, one for each kind of sample [13, 14, 15]. The total loss for identifying the labels is the sum of these partial losses. Since the datasets are massive, we used mini-batch gradient descent, training the network in mini-batches of size n = 64.
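Mini-batch training with n = 64 can be sketched as shuffling the sample indices and slicing them into consecutive batches. The seed and function name are illustrative. For the 60,000 MNIST training images, this gives 937 full batches and one final batch of 32.

```python
import random


def minibatches(num_samples, batch_size=64, seed=0):
    """Yield lists of sample indices, one mini-batch at a time, in shuffled order."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


batches = list(minibatches(60000))
print(len(batches), len(batches[-1]))  # 938 32
```

Each gradient step then uses only one batch, so the cost of a step is independent of the dataset size.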

The generator network consists of two convolutional layers, each followed by BN (except for the output layer) and a ReLU activation. The input images are converted into normally distributed vectors z. These vectors are reshaped into four-dimensional tensors and fed into the generator through a series of up-sampling layers. A transposed convolution with a stride of 2 is applied in every up-sampling layer, so the output size depends on the stride. In the working model, padding of length two was used to keep the same image resolution. Once a fake image is classified as real, it is added to the training dataset.
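The effect of a stride-2 transposed convolution on resolution can be shown in one dimension. The sketch assumes a kernel size of 5 and an output padding of 1, neither of which the paper states; with the stated stride of 2 and padding of 2 these exactly double the length, so a 7-sample input grows to 14 and then 28. The size formula is the one used by common frameworks such as PyTorch's transposed-convolution layers; the function names are illustrative.

```python
def conv_transpose_1d(x, w, stride=2, padding=0, output_padding=0):
    """1-D transposed ("up-sampling") convolution.

    Each input value scatters a scaled copy of the kernel into the output,
    with consecutive copies placed `stride` positions apart; `padding`
    entries are then cropped from both ends.
    """
    k = len(w)
    full = (len(x) - 1) * stride + k + output_padding
    out = [0.0] * full
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out[padding:full - padding]


def transposed_output_size(n, kernel, stride, padding, output_padding=0):
    """Output length: (n - 1) * stride - 2 * padding + kernel + output_padding."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


print(conv_transpose_1d([1.0, 1.0], [1.0, 1.0, 1.0]))  # [1.0, 1.0, 2.0, 1.0, 1.0]
print(transposed_output_size(7, 5, 2, 2, 1))           # 14
print(transposed_output_size(14, 5, 2, 2, 1))          # 28
```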

5.4. Training the GAN

The GAN model is trained on the MNIST images, with the image resolution normalised to 128×128. The sample contains 60,000 images in total. To create the noise, the strategy used is to select one possible digit and draw its noise from a uniform distribution. For the other dataset, CelebA, which consists of over 200,000 celebrity face images, we change the distribution by adding randomness to features such as the eyes, face and skin, and generate new samples [16, 17]. The label probabilities are set individually, with a probability of 0.30 for the input neurons, and the input dimension is 100.
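Drawing the 100-dimensional noise vectors mentioned above from a uniform distribution can be sketched as follows. The range [-1, 1], the seed and the function name are assumptions; the batch size of 64 matches the hyper-parameters reported for MNIST.

```python
import random


def sample_noise(batch_size, dim=100, low=-1.0, high=1.0, seed=0):
    """Return batch_size noise vectors z, each with dim uniform entries in [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(dim)] for _ in range(batch_size)]


z = sample_noise(batch_size=64)
print(len(z), len(z[0]))  # 64 100
```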

5.5. Hyper Parameters

To evaluate our model, we trained the GAN on two datasets. First, on MNIST, the architecture described above is trained with a batch size of 64 and a learning rate of 0.00025, for two epochs with a thousand steps. The input dimension of the neural network is 100.
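The MNIST hyper-parameters above can be collected in one immutable configuration object. Reading "two epochs with a thousand steps" as 1,000 steps per epoch is an assumption, as are the class and field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MnistGanConfig:
    batch_size: int = 64
    learning_rate: float = 0.00025
    epochs: int = 2
    steps_per_epoch: int = 1000  # assumed interpretation of "a thousand steps"
    noise_dim: int = 100

    @property
    def images_seen(self) -> int:
        """Total number of samples processed during training."""
        return self.epochs * self.steps_per_epoch * self.batch_size


config = MnistGanConfig()
print(config.images_seen)  # 128000
```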

6. Results

The MNIST generator was trained successfully, and after 10,000 epochs the network could generate digits that the discriminator recognised as "real numbers". The generated images after every 2,000 epochs are shown below. The training process took approximately 45 seconds on a standard CPU and 90 seconds on a GPU with 6 GB of memory.

Figure 7. Input

Figure 8. Output after 2000 Epochs

Figure 12. Final Output after 10,000 Epochs

The GAN architecture was modified by adding a few layers, since the amount of data to be processed in the CelebA dataset is enormous [15]. We trained on over two hundred thousand images of the CelebA dataset for 10,000 epochs with the largest possible batch size, 128. The images generated after every 2,000 iterations are presented below.

Figure 13. Input

Figure 18. Final Output after 10,000 Epochs

After the dataset is modified, the images are again fed to the convolutional architecture described above [19, 20, 21]. Trained on the original MNIST dataset, the model gave 96.0% accuracy (Figure 20). After the fake data generated by the GAN architecture was added, the classifier achieved 99.23% without any change to the model (Figure 21).

Adding dropout as the amount of data increases may further improve performance. Over 3,000 new images were synthesised using the developed GAN model; retraining on the enlarged data in the same way increased the accuracy by about 2.7%, and a further increase of 1% was seen after adding a dropout layer.
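A minimal sketch of inverted dropout, one common way to implement the dropout layer mentioned above: during training each activation is zeroed with probability `rate` and the survivors are scaled by 1/(1 - rate), so nothing needs rescaling at test time. The rate of 0.5 is an assumed default; the paper does not state the value used, and the function name is illustrative.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout over a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # identity at test time
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


out = dropout([1.0] * 10, rate=0.5, rng=random.Random(1))
print(out)  # each entry is either 0.0 or 2.0
```

Scaling the kept activations keeps their expected value equal to the input, so the same network can be evaluated without dropout.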

Table III. CNN Architecture

| Layer (type) | Output Shape | Params |
|---|---|---|
| conv2d_1 (Conv2D) | (None, 32, 24, 24) | 832 |
| activation_1 (Activation) | (None, 32, 24, 24) | 0 |
| max_pooling2d_1 (MaxPooling2D) | (None, 32, 12, 12) | 0 |
| flatten_1 (Flatten) | (None, 4608) | 0 |
| dense_1 (Dense) | (None, 10) | 46090 |
| activation_2 (Activation) | (None, 10) | 0 |
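The output shapes and parameter counts in Table III can be re-derived by hand. The sketch assumes a 28x28 single-channel MNIST input, 32 filters of size 5x5 and no padding, which reproduces the 832 and 46,090 parameters listed; the helper names are illustrative.

```python
def conv2d_params(in_channels, filters, kernel):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units


side = 28 - 5 + 1              # 24: 5x5 convolution without padding on a 28x28 image
pooled = side // 2             # 12: 2x2 max pooling with stride 2
flat = 32 * pooled * pooled    # 4608 features after Flatten

print(conv2d_params(1, 32, 5))  # 832
print(flat)                     # 4608
print(dense_params(flat, 10))   # 46090
```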

One drawback of adding generated images to the training data is that the training time also increased proportionally.

Figure 19. Model of CNN

Figure 20. Real Dataset Performance

Figure 20 plots the model accuracy against the number of training epochs, together with the loss computed by the cost function, on the MNIST dataset for the image classifier architecture above.

Figure 21. Modified Dataset Performance

7. Conclusion

We proposed a new solution for improving convolutional neural networks using adversarial networks. The approach was applied to two datasets, CelebA and MNIST, and consistently increased the accuracy of the image classifiers. This toolbox can help several neural architectures tune their hyper-parameters and improve their training process. We also implemented several improved image generation techniques using GANs with a minimal vanilla neural network architecture, adding the generated images to the training datasets of image classifiers.

References

1. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014.

2. Kunfeng Wang, Chao Gou, Yanjie Duan, et al. Generative adversarial networks: introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4): 588–598, 2017.

3. X. Zhuang, W. Kang, and Q. Wu. Real-time vehicle detection with foreground-based cascade classifier. IET Image Processing, 10(4): 289–296, April 2016. doi: 10.1049/iet-ipr.2015.0333.

4. A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

5. A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25, pp. 1090–1098.

6. Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, and Marian Mazzone. CAN: Creative adversarial networks, generating "art" by learning about styles and deviating from style norms. arXiv:1706.07068v1 [cs.AI], 21 June 2017.

7. R. C. Holte. Very simple classification rules perform well on most datasets. Technical Report TR-91-16, Department of Computer Science, University of Ottawa, Ottawa, Canada, 1991.

8. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net.

9. Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions. arXiv:1710.05941 [cs.NE].

10. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. arXiv:1606.03498 [cs.LG].

11. Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. arXiv:1704.00028 [cs.LG].

12. Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. It takes (only) two: Adversarial generator-encoder networks. arXiv:1704.02304 [cs.CV].

13. Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabás Póczos. MMD GAN: Towards a deeper understanding of moment matching network.

14. Mina Rezaei, Konstantin Harmuth, Willi Gierke, Thomas Kellermeier, Martin Fischer, Haojin Yang, and Christoph Meinel. Conditional adversarial network for semantic segmentation of brain tumor. arXiv:1708.05227.

15. Runze Xu, Zhiming Zhou, Weinan Zhang, and Yong Yu. Face transfer with generative adversarial network. arXiv:1710.06090 [cs.CV].

16. Zhigang Li and Yupin Luo. Generate identity-preserving faces by generative adversarial networks. arXiv:1706.03227.

17. Huang Bin, Chen Weihai, Wu Xingming, and Lin Chun-Liang. High-quality face image SR using conditional generative adversarial networks. arXiv:1707.00737.

18. Hyungjoo Cho, Sungbin Lim, Gunho Choi, and Hyunseok Min. Neural stain-style transfer learning using GAN for histopathological images.

19. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.

20. Andrej Karpathy and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306v2.

21. Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. arXiv:1311.2901v3.

Authors

Kalpana is a research scholar in the Department of Computer Science and Engineering at Acharya Nagarjuna University, Guntur, and is presently working as an Assistant Professor at PSCMRCET, Vijayawada, affiliated to JNTUK, Kakinada. She has over 9 years of teaching experience and has worked in the fields of Image Processing, Data Mining and Artificial Intelligence.

M.V.P. Chandra Sekhara Rao is a Professor in the Department of Computer Science and Engineering at R.V.R. & J.C. College of Engineering, Chowdavaram, Guntur. He has over 22 years of teaching experience. He completed his B.E. and M.Tech. in Computer Science & Engineering and received his Ph.D. from JNTU, Hyderabad; his research area is Data Mining. He has published 5 papers in international journals and presented a paper at an international conference.