Application of genetic algorithm and deep learning for optimization of hyper-parameters
Anam Saiyeda
School of Engineering Sciences & Technology (SEST), Jamia Hamdard, New Delhi,110062, India
Moin Uddin
School of Engineering Sciences & Technology (SEST), Jamia Hamdard, New Delhi,110062, India
Abstract:
Deep learning is one of the most prevalent techniques in today's AI and machine learning revolution. With the growing power of social media and the availability of image data, the analysis of images has become an important part of machine learning. A strength of deep learning is that it works very well for image classification, and the convolutional neural network is one of the techniques used for this task. The hyper-parameters decide the architecture of the network, so their selection is an important task. This paper proposes a new technique for optimization and selection of the hyper-parameters using a genetic algorithm (GA). The GA selects the optimal set of hyper-parameters using the cross-entropy error as the fitness function.
Keywords: convolution, hyper-parameters, cross-entropy
I. Introduction
Deep learning, a branch of machine learning, is inspired by the multi-layered neural networks of the human brain. It is based on a set of algorithms that attempt to model high-level abstractions in data. It has been applied to various AI problems in the areas of computer vision, speech recognition and natural language processing. [1, 2, 3]
Soft computing is used to find inexact solutions to computationally hard tasks, for which there is no known algorithm that can compute an exact solution in polynomial time. It is a set of techniques that attempt to exploit tolerance for imprecision, uncertainty and partial truth in order to achieve robustness, tractability and low solution cost. The combination of these two techniques, deep learning and soft computing, can yield better results for several tasks. [4]
A convolutional neural network (CNN) is a kind of feed-forward artificial neural network. It is used for supervised learning tasks. Hyper-parameters are variables which specify the structure of the network. Choosing hyper-parameter values is equivalent to model selection. A general network that works for all datasets cannot be specified. [5] Thus selecting an appropriate set of hyper-parameters is necessary. CNN hyper-parameters include the number of layers, the number of hidden units per layer, the activation function for a layer, the kernel size for a layer, the arrangement of these layers within the network, etc. [17]
The rest of the paper is organized as follows. The second section covers the literature review. The third section introduces the concepts of deep learning and hyper-parameters, followed by a description of genetic algorithms in the fourth section. The proposed work is described in the fifth section, followed by the results and the conclusion.
II. Literature review
Manual search, grid search and random search are the most widely used methods for hyper-parameter selection in deep learning. [19] In manual search the hyper-parameters are selected by hand. Grid search is quick to implement and performs an exhaustive search of a manually specified subset of the hyper-parameter space. Random search chooses trials at random [6] but suffers from being non-adaptive. Bayesian optimization techniques use a probabilistic model. [20]
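The difference between grid search and random search can be sketched as follows. This is a minimal illustration only: the search space `SPACE`, the stand-in `toy_loss` function and all other names are assumptions made for the example and are not taken from the cited works. In practice the loss would come from training and validating a network.

```python
import itertools
import random

# Hypothetical search space (illustrative values only).
SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [32, 64, 128],
}

def toy_loss(cfg):
    """Stand-in for a validation loss; a real run would train a network."""
    return (cfg["learning_rate"] - 0.01) ** 2 + ((cfg["hidden_units"] - 64) / 64) ** 2

def grid_search(space, loss):
    """Evaluate every combination in the manually specified grid."""
    keys = list(space)
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*space.values())]
    return min(candidates, key=loss)

def random_search(space, loss, trials, seed=0):
    """Evaluate independently sampled configurations (non-adaptive)."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()}
                  for _ in range(trials)]
    return min(candidates, key=loss)

best_grid = grid_search(SPACE, toy_loss)
best_random = random_search(SPACE, toy_loss, trials=5)
```

Grid search evaluates all nine combinations here, while random search evaluates only the five sampled ones and may therefore miss the best grid point.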
David et al. [7] proposed the use of a genetic algorithm to improve the training of deep autoencoders. A chromosome in the GA population is a set of weights for the autoencoder. For each chromosome the root mean squared error (RMSE) is used as the fitness score to select the optimal solution.
Shinozaki et al. [8] use an evolutionary algorithm as the optimization strategy for the structure and parameters of a deep neural net. Young et al. [9] perform optimization of hyper-parameters using a genetic algorithm; the fitness function used is the error on the test set for the dataset. Dheeba and Selvi [10] proposed a Modified Genetic Algorithm (MGA) tuned artificial neural network (ANN), in which the parameters of the ANN are optimized using MGA. It is applied to determine whether given input data is suspicious for tumor or not.
A hybrid scheme using genetic algorithms (GA) and deep restricted Boltzmann machines (RBM) was proposed by Levy et al. [11]. It extracted features from a painting using generic image processing (IP) functions and unsupervised deep learning (RBMs). The weighted nearest neighbor (WNN) method, using GA, outperformed an SVM classifier and a nearest neighbor classifier, with over 90% classification accuracy.
III. Deep learning and hyper-parameters
Deep learning has applications in many learning tasks, such as detection, forecasting, prediction, etc. [21] It has been used extensively in several health-related problems such as the diagnosis of liver cancer, diabetes and heart failure, tumor segmentation and transplant acceptance. Deep learning methods have been utilized in medical imaging tasks where the limited number of labeled datasets leads to difficulties. [12] In the field of medical image analysis deep learning has found various applications such as vessel detection in ultrasound images [13], Parkinson's disease [14], lung cancer [15] and tuberculosis [16].
The tasks performed using deep learning are classification and finding patterns. Unsupervised learning tasks use autoencoders and restricted Boltzmann machines (RBM). For supervised learning tasks such as image recognition, deep belief networks (DBN) and convolutional nets are used; for object recognition, CNN and RNTN are used; and speech recognition utilizes recurrent nets. For general classification, MLP and deep belief nets are used. Time series analysis uses recurrent nets. [22]
3.1 Convolutional neural network(CNN)
CNNs are feed-forward neural networks that work well for the task of image classification, which assigns an input image a label from a fixed set of categories. In a CNN the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. It is a multilayer neural network in which every layer transforms one volume of activations to another through a differentiable function.
The architecture of a CNN consists of several layers. The types of layers are categorized as Convolutional Layer, Pooling Layer, and Fully-Connected Layer. [23]
For an image classification task, a simple architecture has the layers [INPUT - CONV - RELU - POOL - FC]:
INPUT stores the raw pixel values of the image.
CONV layer computes the output of neurons that are connected to local regions in the input; each neuron computes a dot product between its weights and the small region it is connected to in the input volume.
RELU layer applies an element-wise activation function, such as the max(0, x) thresholding at zero.
POOL layer performs a downsampling operation along the spatial dimensions (width and height).
FC (fully-connected) layer computes the class scores; each neuron in this layer is connected to all the numbers in the previous volume.
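The [INPUT - CONV - RELU - POOL - FC] pipeline can be illustrated with a small NumPy sketch. It assumes a single-channel 6×6 input, one 3×3 filter, 2×2 max pooling and three hypothetical classes; these sizes and the random weights are illustrative choices, not the network used in this paper.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of one channel with one filter."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product between the filter weights and a local input region.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Every output is connected to all values of the previous volume."""
    return weights @ x.ravel() + bias

rng = np.random.default_rng(0)
image = rng.random((6, 6))                         # INPUT
kernel = rng.standard_normal((3, 3))
features = max_pool(relu(conv2d(image, kernel)))   # CONV -> RELU -> POOL
weights = rng.standard_normal((3, features.size))  # 3 hypothetical classes
scores = fully_connected(features, weights, np.zeros(3))  # FC class scores
```

The convolution reduces the 6×6 input to a 4×4 map, pooling reduces it to 2×2, and the fully-connected layer maps those four values to one score per class.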
3.2 Hyper-parameters
Hyper-parameters are the parameters that express higher-level properties of the model, such as its complexity or speed of learning. They are usually fixed before the actual training process begins. The number of hidden layers in a deep neural network, the number of neurons, the learning rate, the learning rate decay schedule and the regularization strength are some examples of hyper-parameters.
Mathematically, the hyper-parameter optimization problem can be described as follows.
Consider a learning algorithm $A$ with hyper-parameters $\alpha_1, \alpha_2, \ldots, \alpha_n$ with respective domains $\lambda_1, \lambda_2, \ldots, \lambda_n$. The hyper-parameter space is $\lambda = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$. For each hyper-parameter setting $\alpha \in \lambda$ the learning algorithm is denoted $A_\alpha$, and the validation loss that $A_\alpha$ achieves on data $D_{valid}$ when trained on $D_{train}$ is given by
$l(\alpha) = L(A_\alpha, D_{train}, D_{valid})$.
The hyper-parameter optimization problem is then to find the $\alpha \in \lambda$ which minimizes $l(\alpha)$.
Model parameters are optimized according to some loss function, while hyper-parameters are searched for by exploring various settings to see which values provide the highest level of accuracy. Thus hyper-parameter selection is a search procedure to find the optimal values in the given set.
IV. Genetic Algorithm
Genetic algorithms are a subset of evolutionary algorithms, which are computer-based problem-solving algorithms that use computational models of evolutionary processes as key elements of their design and implementation. These search algorithms are inspired by the evolutionary ideas of natural selection and genetics. Using natural selection and the genetics-inspired operators of crossover, mutation and inversion, they move from one population of chromosomes to a new population. Chromosomes are the basic solution sets, and they compete to survive and be included in the final population, similar to the idea propagated by Charles Darwin's theory of evolution. [24]
The procedure begins with an initial population, which can be randomly generated. Depending on the problem, the population can be binary or non-binary. Chromosomes are the members of the population, and each chromosome is composed of genes. A fitness value is assigned to each chromosome, indicating how good or bad a solution it is. New chromosomes are created from the older ones by applying operators such as crossover and mutation. The process stops once a solution is found; otherwise the new chromosomes form the next generation. [25]
The crossover operator creates a new chromosome from two chromosomes by selecting some parts of the older ones. The various types of crossovers are one point, two point, multi point and uniform.
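The one-point, two-point and uniform crossover operators can be sketched as below. The list-based chromosome representation and the function names are assumptions made for illustration; chromosomes are assumed to have at least three genes so that two distinct cut points exist.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Swap the tails of two parents after a single random cut point."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, rng):
    """Swap the segment between two distinct random cut points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform_crossover(p1, p2, rng):
    """Decide independently for each gene which parent it comes from."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2

rng = random.Random(0)
parent_a, parent_b = [0] * 6, [1] * 6
child_1, child_2 = one_point_crossover(parent_a, parent_b, rng)
```

With an all-zeros and an all-ones parent, each child of the one-point operator is a run of one value followed by a run of the other, which makes the cut point easy to see.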
The mutation operator is used to create genetic diversity and to escape local optima. Its simplest form selects a chromosome and changes one of its bits, forming a new chromosome. After crossover and mutation, the fitness of each chromosome is evaluated again. [26]
The fitness function evaluates the new chromosomes and is the basis for selection. Candidates with low fitness values are pruned, forming an improved generation. Fitness functions are problem specific and are associated with either maximizing or minimizing the fitness value.
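A minimal sketch of bit-flip mutation on a binary chromosome is given below. The per-gene mutation `rate` is an assumed parameter introduced for the example; the text describes only the simplest case of changing a single bit.

```python
import random

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

rng = random.Random(0)
original = [0, 1, 1, 0, 1, 0, 0, 1]
mutated = bit_flip_mutation(original, rate=0.1, rng=rng)
```

A small rate keeps most of the parent intact while still introducing the occasional change that crossover alone cannot produce.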
V. Proposed work
A technique for optimizing the hyper-parameter selection of a deep network is presented. The genetic algorithm is the evolutionary algorithm selected. Each hyper-parameter to be optimized is encoded as a gene, and a range is defined for each gene to avoid searching areas of the hyper-parameter space that are not of interest. The initial population is generated randomly, and the fitness of each individual is evaluated. The cross-entropy error is used as the fitness function, so a lower error indicates a fitter individual. The new generation is formed using selection, crossover and mutation, based on the fittest individuals from the previous generation. Selection is performed by removing from the population any individuals whose fitness is worse than the average fitness for their generation. The remaining individuals are then used to create the new generation by crossover and mutation.
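The gene ranges and the average-based selection step can be sketched as follows. The specific ranges in `GENE_RANGES` and the function names are hypothetical, and the sketch assumes that fitness is an error to be minimized, so individuals with an error above the generation average are the ones removed.

```python
import random

# Hypothetical (low, high) range for each gene, both ends inclusive.
GENE_RANGES = {
    "conv_filters": (8, 64),
    "fc_neurons": (32, 256),
}

def random_individual(rng):
    """Draw every gene uniformly from its allowed range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in GENE_RANGES.items()}

def select_survivors(population, errors):
    """Keep individuals whose error is no worse than the generation average."""
    mean_error = sum(errors) / len(errors)
    return [ind for ind, err in zip(population, errors) if err <= mean_error]

rng = random.Random(0)
population = [random_individual(rng) for _ in range(4)]
errors = [0.2, 0.9, 0.4, 1.5]  # made-up error values for illustration
survivors = select_survivors(population, errors)
```

With these made-up errors the average is 0.75, so the first and third individuals survive and become parents for the next generation.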
The fitness function is the cross-entropy error. During training of a neural network based deep learning classifier, either the mean squared error or the average cross-entropy error can be used. The average cross-entropy error is considered slightly better here because the CNN is being used to classify data, not for regression; thus cross-entropy is used rather than mean squared error. Assuming that the desired outputs are all either 0 or 1, as is the case for classification problems, if the neuron's actual output is close to the desired output for all training inputs x, then the cross-entropy will be close to zero. The cross-entropy is positive and tends towards zero as the neuron gets better at computing the desired output for the training inputs. Unlike the quadratic cost, the cross-entropy cost function avoids the problem of learning slowing down.
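The learning slowdown can be illustrated for a single sigmoid neuron with one input. For the quadratic cost $(y - a)^2 / 2$ the gradient with respect to the weight is $(a - y)\,\sigma'(z)\,x$, whereas for the cross-entropy cost it is $(a - y)\,x$, without the $\sigma'(z)$ factor that becomes very small when the neuron saturates. The starting values below are arbitrary choices for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_w_quadratic(w, b, x, y):
    """d/dw of (y - a)^2 / 2 with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1.0 - a) * x  # sigma'(z) = a * (1 - a)

def grad_w_cross_entropy(w, b, x, y):
    """d/dw of -[y ln a + (1 - y) ln(1 - a)] with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * x

# A badly wrong, saturated neuron: output close to 1 while the target is 0.
w, b, x, y = 2.0, 2.0, 1.0, 0.0
g_quadratic = grad_w_quadratic(w, b, x, y)
g_cross_entropy = grad_w_cross_entropy(w, b, x, y)
```

In this saturated case the cross-entropy gradient is more than ten times larger in magnitude than the quadratic one, so gradient descent corrects the error much faster.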
5.1 Proposed technique
1. Create the initial population by initializing N networks with a random set of hyper-parameters. A chromosome is the set of hyper-parameter values.
2. Perform the crossover operation over the individuals.
3. Perform mutation.
4. Calculate the fitness value according to the fitness function given in equation 1.
5. Select the best solution based on fitness.
6. Repeat the steps until an improved population is generated with a higher survival rate.
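The steps above can be sketched as a short loop. This is an illustrative sketch under several assumptions and not the implementation used in this paper: `toy_error` is a placeholder for the cross-entropy error of a trained network, the gene ranges are made up, a fixed number of generations is used as the stopping rule, and selection simply keeps the best individuals among parents and children.

```python
import random

# Hypothetical inclusive (low, high) ranges, one per gene.
GENE_RANGES = [(8, 128), (8, 128), (2, 10)]

def random_chromosome(rng):
    return [rng.randint(lo, hi) for lo, hi in GENE_RANGES]

def toy_error(chromosome):
    """Placeholder for a network's cross-entropy error (lower is better)."""
    target = [64, 32, 5]
    return sum(abs(g - t) / (hi - lo)
               for g, t, (lo, hi) in zip(chromosome, target, GENE_RANGES))

def crossover(p1, p2, rng):
    """One-point crossover producing a single child."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def mutate(chromosome, rate, rng):
    """Resample each gene within its range with probability `rate`."""
    return [rng.randint(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(chromosome, GENE_RANGES)]

def run_ga(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population of random hyper-parameter sets.
    population = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # Step 6: repeat.
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)
            child = crossover(p1, p2, rng)                # Step 2: crossover.
            child = mutate(child, mutation_rate, rng)     # Step 3: mutation.
            children.append(child)
        # Step 4: evaluate fitness; Step 5: keep the best individuals.
        candidates = sorted(population + children, key=toy_error)
        population = candidates[:pop_size]
        history.append(toy_error(population[0]))
    return population[0], history

best, history = run_ga()
```

Because parents compete with their children for survival, the best error in `history` never gets worse from one generation to the next.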
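The fitness function is as follows:
$$C = -\frac{1}{n} \sum_{x} \left[ y \ln a + (1 - y) \ln(1 - a) \right] \qquad (1)$$
where n is the total number of items of training data, the sum runs over all training inputs x, y is the corresponding desired output and a is the output of the neuron.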
The output from a neuron is $a = \sigma(z)$, where $z = \sum_j w_j x_j + b$ is the weighted sum of the inputs. A neuron with several input variables $x_1, x_2, \ldots$, corresponding weights $w_1, w_2, \ldots$ and a bias $b$ is depicted in figure 1.
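As a worked illustration, the sketch below evaluates the cross-entropy for a single sigmoid neuron. It assumes the standard form for binary targets, $C = -\frac{1}{n}\sum_x [y \ln a + (1 - y)\ln(1 - a)]$ with $a = \sigma(z)$, which matches the definitions of n, x and y given above. The weights and the two-example dataset are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, bias, inputs):
    """a = sigma(z) with z = sum_j w_j * x_j + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def cross_entropy(weights, bias, data):
    """Average cross-entropy over (inputs, desired output) pairs."""
    total = 0.0
    for inputs, y in data:
        a = neuron_output(weights, bias, inputs)
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / len(data)

# Two made-up training examples with binary desired outputs.
data = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
cost_untrained = cross_entropy([0.0, 0.0], 0.0, data)  # every output is 0.5
cost_good = cross_entropy([5.0, 5.0], -5.0, data)      # outputs near targets
```

An uninformative neuron that always outputs 0.5 has a cost of ln 2, while weights that push the outputs towards the desired values bring the cost close to zero.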
Figure 2: Flowchart depicting the algorithm
VI. Results
The expected advantage of the proposed technique is an improvement in running time. The proposed fitness function is also expected to be an improvement, as cross-entropy is better suited to classification tasks than the root mean squared error function.
The dataset chosen was the image set from Caltech 101. [28] Weights and the number of neurons in each layer are the hyper-parameters chosen to be optimized. A five-layer convolutional neural network is used, with an input layer, a convolutional layer, a pooling layer, a fully-connected layer and an output layer. The task is the classification of images into categories, i.e. assigning a label to each image.
A brute-force approach requires a lot of time, running into hours, whereas the genetic algorithm reduces the search time. Moreover, using the cross-entropy error improves the classification, because cross-entropy is a better measure than mean squared error for classification.
VII. Conclusion
Deep learning is the phrase du jour in machine learning and works very well for image classification. It performs automatic feature extraction to classify data, as opposed to traditional algorithms, which require intensive computation time and effort on the part of data scientists. Because of the feature hierarchy in the deep net, it extracts features that are more complex, more flexible and less brittle than those of hand-crafted methods. A new technique for improving the selection of hyper-parameters for convolutional neural networks has been proposed. The technique uses the cross-entropy error as the fitness function to remove the less fit chromosomes. Cross-entropy is preferred over the generally used mean squared error function or the classification error, as the task being performed is classification. GA is used as the optimization technique as it avoids local maxima and minima and gives a good solution.
VIII. References
[1] M. M. Najafabadi, Deep learning applications and challenges in big data analytics, Journal of Big Data, vol. 2, no. 1, pp. 1–21, 2015.
[2] A. Dacal-Nieto, E. Vazquez-Fernandez, A. Formella, F. Martin, S. Torres-Guijarro, and H. Gonzalez-Jorge, A genetic algorithm approach for feature selection in potatoes classification by computer vision, 2009 35th Annual Conference of IEEE Industrial Electronics, 2009.
[3] Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, no. 6: 82-97, 2012.
[4] Zhang, Jian, Chongyuan Tao, and Pan Wang. A Review of Soft Computing Based on Deep Learning. Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII) International Conference on. IEEE, 2016.
[5] Claesen, Marc, and Bart De Moor. Hyperparameter Search in Machine Learning. arXiv preprint arXiv:1502.02127, 2015.
[6] Bergstra, James, and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 281-305, 2012.
[8] Shinozaki, Takahiro, and Shinji Watanabe. Structure discovery of deep neural network based on evolutionary algorithms. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015.
[9] Young, Steven R., et al. Optimizing deep learning hyper-parameters through an evolutionary algorithm. Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments. ACM, 2015.
[10] Dheeba, J., and S. Selvi. A CAD system for breast cancer diagnosis using modified genetic algorithm optimized artificial neural network. Swarm, Evolutionary, and Memetic Computing, 349-357, 2011
[11] Levy, Erez, Omid E. David, and Nathan S. Netanyahu. Genetic algorithms and deep learning for automatic painter classification. Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, ACM, 2014.
[12] Akselrod-Ballin, Ayelet, et al. A Region Based Convolutional Network for Tumor Detection and Classification in Breast Mammography. International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer International Publishing, 2016.
[13] Smistad, Erik, and Lasse Løvstakken. Vessel Detection in Ultrasound Images Using Deep Convolutional Neural Networks. International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer International Publishing, 2016.
[14] Eskofier, Bjoern M., et al. Recent machine learning advancements in sensor-based mobility analysis: Deep learning for Parkinson's disease assessment. Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the. IEEE, 2016.
[15] Shimizu, Ryota, et al. Deep learning application trial to lung cancer diagnosis for medical sensor systems. SoC Design Conference (ISOCC), 2016 International. IEEE, 2016.
[16] Cao, Yu, et al. Improving Tuberculosis Diagnostics using Deep Learning and Mobile Health Technologies among Resource-poor and Marginalized Communities. Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016 IEEE First International Conference on. IEEE, 2016.
[17] Blog.rescale.com. (2017). Deep Neural Network Hyper-Parameter Optimization | Rescale. [online] Available at: https://blog.rescale.com/deep-neural-network-hyper-parameter-optimization/ [Accessed 2 Apr. 2017].
[18] Man, Kim-Fung, TANG, Kit Sang, Kwong, Sam, Genetic Algorithms Concepts and Designs, Advanced Textbooks in Control and Signal Processing, Springer-Verlag London, 1st edition, 978-1-4471-0577-0, 1999
[19] Erhan, Dumitru, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research 11, no. Feb: 625-660, 2010.
[20] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951-2959, 2012.
[21] Deng, Li, and Dong Yu. Deep learning: methods and applications. Foundations and Trends in Signal Processing 7, no. 3–4 (2014): 197-387.
[22] Sunila Gollapudi, Practical Machine Learning, Packt Publishing Ltd, 1784394017, 9781784394011, 2016
[23] Leszek Rutkowski, Marcin Korytkowski, Rafał Scherer , Artificial Intelligence and Soft Computing, 15th International Conference, ICAISC 2016, Zakopane, Poland, June 12-16, 2016, Proceedings, Part II, Series Volume 9693, Springer International Publishing Switzerland, 2016
[24] David E. Goldberg, Genetic Algorithms, Pearson Education India, 2009
[25] Lance D. Chambers, The Practical Handbook of Genetic Algorithms: New Frontiers, Volume 2, CRC Press, 1995
[26] Melanie Mitchell, An Introduction to Genetic Algorithms, 0262631857, 9780262631853, MIT Press, 1998
[27] Information Resources Management Association (IRMA), Nature-Inspired Computing: Concepts, Methodologies, Tools, and Applications, 1522507892, 9781522507895, IGI Global, 2016