2018 International Conference on Physics, Computing and Mathematical Modeling (PCMM 2018) ISBN: 978-1-60595-549-0
The Impact of Imbalanced Training Datasets on CNN Performance in
Typical Remote Scenes Classification
Tong SHI
*, Jie WANG, Peng-fei WANG, Qi-hang CAI and Yao-chang HAN
Air and Missile Defense College, Air Force Engineering University, Xi’an 710051, China
*Corresponding author
Keywords: CNN, Remote scenes, Classification, Imbalanced training datasets.
Abstract. Convolution neural network (CNN) has a prominent performance in image classification. This thesis empirically studies how imbalanced training datasets impact on CNN performance in typical remote scenes classification. I adopted different strategy based on the complexity of scenes to set up imbalanced and balanced training datasets. Then I compared the results after training and testing experiments. Experimental results show that appropriate imbalanced strategy could improve the CNN performance in typical remote scenes classification. In face of specific classification tasks, the empirically method is a good choice to improve CNN performance as a whole.
Introduction
Aerospace remote sensing refers to the integrated detection and recognition technology of qualitative, quantitative, timing, locating and communicating through various sensing systems in aerospace vehicles such as airplanes, balloons, satellites and spaceships[1]. The classification of typical scenes, such as missile base, airport, bridge, port, overpass, oil depot and expressway, is the focus of aerospace remote sensing field, which has far-reaching theoretical significance and great application value both in civilian and military areas.
In the civilian areas, the research of classifying typical scenes can play a very good role in the planning of urban traffic and the construction of intelligent city. In the military areas, the technology of classifying typical scenes is a key problem in the field of military technology research all over the world today. Especially in modern info warfare, such scenes are the main object of long-range reconnaissance and directional strike. The technology will play a decisive role in grasping military opportunity, analyzing military situation and making military tactics[2,3].
Remote sensing image data has been presented with typical big-data "4V" features -- Volume, Velocity, Variety and Value. However, in the face of remote sensing image big-data[4], traditional image classification technology has become inadequate. In recent years, deep learning, reinforcement learning, transfer learning and other new intelligent algorithms have come to the fore and made remarkable achievements in the field of image recognition, speech recognition, big-data analysis, robot control and other fields[5].
As a typical algorithm of deep learning, convolution neural network (CNN) is particularly pronounced in the direction of image recognition and classification. With the optimization of the algorithm, complexity of CNN is increasing, and deep convolution neural network (DCNN) arises[6]. Many scholars begin to apply it to the remote sensing image classification, as does this thesis.
DCNN Models
sharing, which is mainly used for handwritten character recognition. LeNet-5 is the most classical CNN model[8], and its network structure is shown in Figure 1.
Figure 1. LeNet-5 network structure.
With the continuous improvement of computer performance, especially the development of graphical processing unit (GPU), tensor processing unit (TPU), and the support of big-data, the wave of deep learning has been raised. The development of DCNN is more and more rapid. In recent years, many excellent DCNN models have been proposed in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)[9,10,11]. The accuracy record has been constantly refreshed. Table 1 compares several network models.
Table 1. Several network models.
Name AlexNet VGG GoogleNet ResNet
Year 2012 2014 2014 2015
Number of layers 8 19 22 152
Number of
convolution layers 5 16 21 151
Convolution kernels
size 11,5,3 3 7,1,3,5 7,1,3,5
Number of full
connection layers 3 3 1 1
Full connection layers
size 4096,4096,1000 4096,4096,1000 1000 1000 TOP-5 error rate 16.4% 7.3% 6.7% 3.57%
Inception No No Yes No
Through comparison, it is found that changes in the structure of DCNN is mainly reflected in the following two aspects: (1) the number of layers is increasing; (2) the structure is more complex. These DCNN models have achieved unprecedented success in the ImageNet natural image dataset, not only because the network model is being optimized, but also because millions of labeled images provide strong data support for network training.
Imbalanced or Balanced Dataset
[image:2.595.79.519.374.570.2]Moreover, the research objects of this thesis are six typical remote scenes -- expressway, bridge, oil depot, overpass, port, airport. The complexity of the above six scenes is gradually increasing. According to the learning mechanism of the human brain, complex problems need to be allocated more learning energy[14]. Similarly, complex images need to be seen more than once to understand. So, for the specific classification task of this thesis, we need to raise the number of complex scenes samples to distinguish the complex scene better.
The following part will explain how imbalanced training dataset impacts CNN performance in typical remote scenes classification.
Establishment of Datasets
In order to set up the imbalanced dataset and the balanced dataset of this thesis, it is necessary to collect enough remote sensing images of six typical scenes. Many research teams have published their own remote sensing image datasets, such as, UC-Merced, WHU-RS19, NWPU-RESISC45, AID and so on, which have been widely used by many scholars.
Each dataset contains a number of scene categories, including the types of images we needed, but some of these categories are not meaningful to the study of this article. Therefore in order to build training dataset and testing dataset, we need integrate the contents of several public remote sensing image datasets, from which the typical remote scenes images is selected, and then properly collect some supplementary samples and finally perform necessary image preprocessing.
[image:3.595.55.542.412.707.2]In addition, different strategies can be adopted to establish multiple imbalanced datasets to make experiments more general. For example, imbalanced training dataset I and imbalanced training dataset II samples grow with the increase of complexity degree linearly. Merely, former slope is bigger. Imbalanced training dataset III adopts approximate regression strategy.
Table 2. Balanced dataset and imbalanced dataset.
Category
expressway bridge oil depot overpass port airport
Complexity
degree 0.5 0.6 0.7 0.8 0.9 1.0
Balanced training dataset
samples
500 500 500 500 500 500
Imbalanced training dataset I
samples
250 400 550 700 850 1000
Imbalanced training dataset
II samples
500 600 700 800 900 1000
Imbalanced training dataset
III samples
550 700 800 900 950 1000
Balanced testing
dataset samples 500 500 500 500 500 500
Experimentation Analysis
The performance is defined by the rate of accuracy the CNN produced during testing. A correct answer means the network classified an image as the class it belonged to. The performance was measured for each individual category and a mean result for the entire dataset. Table 3 and Figure 2 shows the testing results of four cases.
Table 3. The rate of accuracy.
Cases Dataset
Category
express
way bridge oil depot overpass port airport total
Case. 1 Balanced training
dataset 0.7 0.71 0.73 0.68 0.69 0.66 0.7
Case. 2 Imbalanced training
dataset I 0.55 0.6 0.62 0.65 0.69 0.69 0.63
Case. 3 Imbalanced training
dataset II 0.71 0.74 0.75 0.73 0.76 0.75 0.74
Case. 4 Imbalanced training
dataset III 0.75 0.77 0.75 0.746 0.8 0.77 0.76
Figure 2. The rate of accuracy.
Form the total accuracy rate, Case 2 <Case 1 <Case 3 <Case 4.
In Case 2, imbalanced training dataset I samples grow with the increase of complexity degree linearly. But the bigger slope makes the simple scenes samples too few, so that the network training is insufficient.
In the three other cases, the mean correct rate of balanced dataset is roughly the lowest, which illustrates that the reasonable allocation strategy can improve the comprehensive performance of the network, so that the network can accomplish the specific classification task better.
Comparing Case 3 and Case 4, the latter is roughly higher than the former. The results show that approximate regression strategy adopted by imbalanced training dataset II could solve the problem of learning energy distribution in image recognition better than simple linear strategy.
Summary
However, this is indeed an empirical conclusion, but it gives us some inspiration and help in our usual training of the CNN model. In combination with the actual situation, whether there is a better allocation strategy remains to be continued to study.
References
[1] Schanda E. Remote Sensing for Environmental Sciences[M]. Springer Berlin Heidelberg, 1976.
[2] Qu J, Sun X, Gao X. Remote sensing image target recognition based on CNN[J]. Foreign Electronic Measurement Technology, 2016.
[3] Maggiori E, Tarabalka Y, Charpiat G, et al. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification[J]. IEEE Transactions on Geoscience & Remote Sensing, 2016, 55(2):645-657.
[4] Ma Y, Wu H, Wang L, et al. Remote sensing big data computing: Challenges and opportunities[J]. Future Generation Computer Systems, 2015, 51:47-60.
[5] Wu Q, Qiu J, Ding G. Machine Learning Methods for Big Spectrum Data Processing[J]. Journal of Data Acquisition & Processing, 2015.
[6] Schmidhuber Jorgen. Deep learning in neural networks[M]. Elsevier Science Ltd. 2015.
[7] Hubel D H, Wiesel T N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex[J]. Journal of Physiology, 1962, 160(1):106.
[8] Ketkar N. Convolutional Neural Networks[M]// Deep Learning with Python. Apress, 2017.
[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc. 2012:1097-1105.
[10] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 39(4):640.
[11] He K, Zhang X, Ren S, et al. Identity Mappings in Deep Residual Networks[J]. 2016:630-645.
[12] Buda M, Maki A, Mazurowski M A. A systematic study of the class imbalance problem in convolutional neural networks[J]. 2017.
[13] Japkowicz N. Learning from Imbalanced Data Sets.[J]. Ai Magazine, 2000.
[14] Russell S J, Norvig P. Artificial intelligence: a modern approach[J]. Applied Mechanics & Materials, 2010, 263(5):2829-2833.