3. RESULTS AND DISCUSSION
3.3 Analysis of the Results
3.3.2 On the images with a modified resolution
The more we try to map a large area with the aid of a UAV or a satellite, the larger the scale of the image becomes. Hence, the volume of information in one pixel becomes disproportionally smaller. Therefore, I conducted several studies when applying the downsampling and upsampling the images. For this purpose, I reduced all the resolution of the train and validation datasets to 224 x 224, the standard spatial dimension of each network. Then, I modified the resolution of the resulted validation dataset, specified in the following table.
26
Train crop size Test crop size Top-1 Acc % (Test) ResNet-10 224 x 224 28 x 28 67.27% ResNet-10 224 x 224 56 x 56 80.49% ResNet-10 224 x 224 112 x 112 81.45% ResNet-10 224 x 224 224 x 224 87.88% ResNet-10 224 x 224 320 x 320 88.13% ResNet-10 224 x 224 480 x 480 90.45% ResNet-14 224 x 224 28 x 28 64.01% ResNet-14 224 x 224 56 x 56 80.35% ResNet-14 224 x 224 112 x 112 79.18% ResNet-14 224 x 224 224 x 224 84.74% ResNet-14 224 x 224 320 x 320 88.85% ResNet-14 224 x 224 480 x 480 91.14% ResNet-34 224 x 224 224 x 224 95.13%
Table 5. Top-1 Accuracy of models based on ResNet-10 and ResNet-14, with different test cropping size
27
The study shows the ResNet-derived networks managed to achieve a comparable result even if the input dimension is reduced. This shows that the generalization capabilities of the ResNet framework would not be impaired by the reduced number of residual blocks.
However, if the spatial dimension of the inputs is more than 50% smaller than the CNN’s input dimension, it shows that the classification performance is dropped by 15%. If the resolution of the input is equal to one-tenth of the input dimension, the accuracy is sharply decreased to 64%. Table 6 shows a similar result to support our previous claim.
It is interesting that, with lesser number of convolutional layers, our VGG-derived networks, VGG-MR0 and VGG-MR2, produce marginally better Top-1 Accuracy results (5 – 13%) than our ResNet-derived networks, ResNet-10 and ResNet-14. However, one must take also into consideration that the ResNet-derived networks have significantly lower numbers of parameter than the numbers of VGG-derived networks; ResNet-derived networks have less than the one-tenth number of parameters than the VGG-derived networks.
28 Train crop size Test crop size Top-1 Acc % (Test) VGG-MR0 224 x 224 28 x 28 77.75% VGG-MR0 224 x 224 56 x 56 81.96% VGG-MR0 224 x 224 112 x 112 77.75% VGG-MR0 224 x 224 224 x 224 86.6% VGG-MR0 224 x 224 320 x 320 88.63% VGG-MR0 224 x 224 480 x 480 88.48% VGG-MR2 224 x 224 28 x 28 72.4% VGG-MR2 224 x 224 56 x 56 82.34% VGG-MR2 224 x 224 112 x 112 81.11% VGG-MR2 224 x 224 224 x 224 87.94% VGG-MR2 224 x 224 320 x 320 88.32% VGG-MR2 224 x 224 480 x 480 88.54% VGG-16 224 x 224 224 x 224 88.14%
Table 6. Top-1 Accuracy of models based on VGG based networks, with different test cropping size
29
Using regularization techniques (e.g., Dropout layer) could yield a better classification accuracy to our derived networks, but that could be left for our future study. However, the objective of this study is to observe if the CNN-based classifier with fewer layers can maintain the comparable classification accuracy. Table 6 shows that there is a marginal decrease in classification accuracy if the overall resolution of the test dataset decreases from the original resolution to its downsampled resolution. The Top-1 Accuracy for both of our derived networks (VGG-MR0 and VGG-MR2) decreases by 6 ~ 9% (5%) if our input image’s resolution is reduced to 112 x 112 (56 x 56) pixels. If the input is further reduced to 28 x 28 pixels (12.5 % of the original resolution), the accuracy plummets by 10 ~ 15%.
4. CONCLUSIONS
Striking an appropriate balance between the size of a neural network and the
characteristics of a dataset is critical. Using our dataset, I have shown that a simpler artificial neural network derived from the current state-of-the-art CNN framework could achieve a satisfiable classification result with marginally reduced accuracy. If one can afford a specific loss of accuracy for the sake of the feasibility, I believe that one could produce a vegetation map with a larger area, using a similar approach. Primarily, I find it somewhat surprised that a CNN could handle the downsampled images this well (6-10% performance drop), based just on the original input size. (224 × 224 pixels) The
performance of derived networks remains to be seen, i.e., verifying a similar performance can be achievable with different datasets. Still, in that scenario, the experimental results imply that this could be applied for on-site inspection of farmland. Although it would
30
require a computer equipped with a decent GPU to train such a model, I hope that our modified models could fit any embedded device thanks to the reduced model size.
For the future work, I will apply our findings from this study to imageries containing the much larger area. I will also work with various CNN frameworks to validate empirically that our approach is feasible with most of the frameworks currently available.
31
LITERATURE CITED
[1] Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural computation, 29(9), 2352-2449. [2] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[3] Wang, J., & Perez, L. (2017). The effectiveness of data augmentation in image classification using deep learning (No. 300). Technical report.
[4] [5] [6] [7] [8] [9]
Torralba, A., & Efros, A. A. (2011, June). Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1521-1528). IEEE.
Zou, W. W., & Yuen, P. C. (2012). Very low-resolution face recognition problem. IEEE Transactions on Image Processing, 21(1), 327-340.
He, K., Zhang, X., Ren, S., & Sun, J. (2016, October). Identity mappings in deep residual networks. In European Conference on Computer Vision (pp. 630-645). Springer, Cham.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint
arXiv:1405.3531.
Zhu, X., Vondrick, C., Ramanan, D., & Fowlkes, C. C. (2012, September). Do I Need More Training Data or Better Models for Object Detection? In BMVC (Vol. 3, p. 5).
32 [10] [11] [12] [13] [14] [15] [16] [17] [18]
Taylor, L., & Nitschke, G. (2017). Improving Deep Learning using Generic Data Augmentation. arXiv preprint arXiv:1708.06020.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
Chen, X. W., & Lin, X. (2014). Big data deep learning: challenges and perspectives. IEEE access, 2, 514-525.
Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016).
Deepfruits: A fruit detection system using deep neural networks. Sensors, 16(8), 1222.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Shen, L. (2017). End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design. arXiv preprint arXiv:1708.09427.
Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
Masko, D., & Hensman, P. (2015). The impact of imbalanced training data for convolutional neural networks.
Bongiovanni, R., & LoInberg-DeBoer, J. (2004). Precision agriculture and sustainability. Precision agriculture, 5(4), 359-387.
Suh, S. H., Kim, D. Y., Jhang, J. E., Byamukama, E., Hatfield, G., & Shin, S. Y. (2017, September). Identification of the White-Mold affected Soybean fields by
33 [19] [20] [21] [22] [23] [24] [25] [26]
using Multispectral Imageries, Spatial Autocorrelation and Support Vector
Machine. In Proceedings of the International Conference on Research in Adaptive and Convergent Systems (pp. 104-109). ACM.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. A Wiley-Interscience Publication, New York: Wiley, 1973.
Goodfellow, I., Bengio, Y., Courville, A., & Bengio, Y. (2016). Deep learning (Vol. 1). Cambridge: MIT Press.
Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621.
Lin, M., Chen, Q., & Yan, S. (2013). Network in network. arXiv preprint arXiv:1312.4400.
He, K., & Sun, J. (2015). Convolutional neural networks at constrained time cost. In Proceedings of the IEEE conference on computer vision and pattern