Pedestrian Detection for Android Mobile Devices


2017 International Conference on Computer Science and Application Engineering (CSAE 2017) ISBN: 978-1-60595-505-6




Jing Li, Fang Qu and Yingdong Ma*

Inner Mongolia University, No. 235, Daxue West Road, Saihan District, Hohhot, Inner Mongolia Autonomous Region, China

ABSTRACT

With the popularity of Android mobile devices and the rapid improvement of their hardware, more computer vision algorithms, such as object detection and face recognition, can be implemented on them. This paper focuses on implementing pedestrian detection on Android mobile devices. Commonly used pedestrian detection algorithms are based on a sliding window strategy that uses single or multiple features, such as HOG and channel features. Because the resources of mobile devices are limited, we use a joint HOG and LBP feature for pedestrian detection. To improve the efficiency of the algorithm, we apply spatial pooling to the HOG-LBP joint feature. Experiments on an Android mobile phone show that this method not only has competitive accuracy but also improves pedestrian detection efficiency on both the INRIA dataset and mobile phone images.

INTRODUCTION

Pedestrian detection determines whether an image contains pedestrians and, if so, gives their locations. It is the first step in driver assistance[1], intelligent video surveillance, and the analysis of human behavior[2]. In recent years, pedestrian detection has also been used in emerging areas such as aerial imagery and victim rescue. Pedestrians have both rigid and flexible characteristics, and their appearance is easily affected by clothing, occlusion, posture, viewpoint, and other factors. These issues make pedestrian detection both a focus of research and a challenging problem in computer vision.


INRIA dataset[5] and take pictures by using an Android mobile phone as test pictures to evaluate system performance. The experiments show that the proposed spatially pooled joint feature method is not only more accurate than most multi-feature pedestrian detection methods, but also achieves fast detection on Android mobile devices.

RELATED WORK

Since the work of Dalal and Triggs[6] in 2005, pedestrian detection has developed rapidly. Pedestrian detection methods can be divided into two categories: those based on background modeling[7] and those based on statistical learning. At present, most state-of-the-art methods combine multi-feature fusion[8] with a classifier such as cascade AdaBoost[9].

Common features include Haar-like[10], HOG, LBP, edgelet, CSS, covariance, and integral channel features[11]. The HOG feature captures the appearance and contour of the target object and is therefore widely used in pedestrian detection[12, 13]. However, HOG places strict requirements on the background and is sensitive to texture noise. The LBP feature is another commonly used feature for pedestrian detection because it describes local texture well. Therefore, this paper combines these two features and then applies the cascade AdaBoost algorithm to train the classifier.

In recent years, a series of applications based on the Android mobile system have been studied. Some computer vision technologies, such as face recognition and pedestrian detection for autonomous automobiles, have been successfully applied on various mobile platforms. Because hardware resources are limited, the choice of detection strategy is critical to the results on mobile devices. We present a sliding-window method that uses a spatially pooled HOG-LBP joint feature to implement efficient pedestrian detection. In a second step, we use JNI and the NDK to port the trained classifiers to the Android platform. The rest of this paper is organized as follows. Section 3 introduces the spatial pooling HOG-LBP joint feature. Section 4 presents how the cascade classifier is trained. Section 5 describes how the classifier is migrated to the Android client. Finally, the experimental results are discussed in Section 6, and conclusions are given in Section 7.

SPATIAL POOLING JOINT FEATURE


information and moreover, it compensates for the HOG feature, which is sensitive to texture noise. In this paper, the HOG-LBP joint feature is adopted as the pedestrian detection feature. Specifically, the size of each training image is 64*128 pixels. The LBP and HOG features are extracted separately, and the resulting feature vectors are concatenated to obtain the HOG-LBP joint feature. The process of computing HOG-LBP feature vectors is shown in Figure 1.

[Figure 1 flowchart: input a 64*128 sample image. HOG branch: extract the HOG feature and generate a 3780-dimensional feature vector. LBP branch: divide the image into 16*16 blocks, extract the LBP feature of each block, and generate a 1888-dimensional feature vector. The two vectors are concatenated into a 5668-dimensional feature vector.]

Figure 1. LBP and HOG joint process.
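The feature lengths in Figure 1 can be reproduced with a short calculation. The sketch below is only a plausibility check under assumptions that the paper does not state: the HOG layout is taken to be the standard one of Dalal and Triggs (8*8-pixel cells, blocks of 2*2 cells, a block stride of 8 pixels, 9 orientation bins), and the LBP histogram of each non-overlapping 16*16 block is taken to be a 59-bin uniform-pattern histogram. All function names are illustrative.

```python
# Worked arithmetic for the feature lengths in Figure 1.
# Assumed (not stated in the paper): standard Dalal-Triggs HOG layout and
# a 59-bin uniform LBP histogram per non-overlapping 16*16 block.

WIN_W, WIN_H = 64, 128  # training window size given in the text


def hog_length(cell=8, block_cells=2, stride=8, bins=9):
    """Length of a HOG descriptor for the window (assumed parameters)."""
    block = cell * block_cells                    # 16-pixel block side
    blocks_x = (WIN_W - block) // stride + 1      # block positions across
    blocks_y = (WIN_H - block) // stride + 1      # block positions down
    return blocks_x * blocks_y * block_cells ** 2 * bins


def uniform_lbp_bins(bits=8):
    """Count the patterns with at most two circular 0/1 transitions and
    add one shared bin for all remaining (non-uniform) patterns."""
    uniform = 0
    for code in range(2 ** bits):
        rotated = (code >> 1) | ((code & 1) << (bits - 1))  # circular shift
        transitions = bin(code ^ rotated).count("1")
        if transitions <= 2:
            uniform += 1
    return uniform + 1


def lbp_length(block=16):
    """Length of the LBP descriptor: one histogram per block."""
    return (WIN_W // block) * (WIN_H // block) * uniform_lbp_bins()


hog_dim = hog_length()
lbp_dim = lbp_length()
joint_dim = hog_dim + lbp_dim
print(hog_dim, lbp_dim, joint_dim)  # prints: 3780 1888 5668
```

Under these assumptions the numbers agree with Figure 1: 7 * 15 blocks of 36 values give 3780, 32 blocks of 59 bins give 1888, and the concatenation has 5668 dimensions.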

Spatial pooling has been shown to be invariant to various image transformations and to be more robust to noise [14]. Several empirical results indicate that a pooling operation can greatly improve recognition performance. The pooled feature representation preserves visual information over a local neighborhood while discarding irrelevant details and noise.

We divide the image window into small patches and extract the LBP and the gradient over the pixels within each patch. The gradient and LBP histograms are calculated separately. For better invariance to translation, we perform max pooling over a pooling region and use the result to represent the HOG and LBP in that region. For max-pooled LBP and max-pooled HOG, the patch spacing stride, the pooling region size, and the pooling spacing stride are set to 1 pixel, 8 × 8 pixels, and 4 pixels, respectively. These two features are then combined for later detection.
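The max pooling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it pools a single 2-D response map (for example, one histogram bin evaluated at every patch position), uses the 8 × 8 pooling region and 4-pixel pooling stride given above, and the toy input maps and the helper `peak_map` are made up for the demonstration.

```python
def max_pool_2d(values, region=8, stride=4):
    """Max pooling over a 2-D list: each output is the maximum of a
    region*region neighbourhood, and neighbourhoods are `stride` apart."""
    rows, cols = len(values), len(values[0])
    pooled = []
    for top in range(0, rows - region + 1, stride):
        out_row = []
        for left in range(0, cols - region + 1, stride):
            out_row.append(max(
                values[r][c]
                for r in range(top, top + region)
                for c in range(left, left + region)
            ))
        pooled.append(out_row)
    return pooled


def peak_map(row, col, size=12):
    """Hypothetical size*size map: 0 everywhere except a 1 at (row, col)."""
    return [[1 if (r, c) == (row, col) else 0 for c in range(size)]
            for r in range(size)]


pooled_a = max_pool_2d(peak_map(5, 5))
pooled_b = max_pool_2d(peak_map(6, 5))  # same peak shifted down by one pixel
print(pooled_a == pooled_b)  # prints: True
```

Because both peak positions fall inside the same overlapping pooling regions, the pooled output is unchanged by the one-pixel shift, which is the translation invariance that motivates the pooling step.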

CASCADE CLASSIFIER TRAINING

Recent work shows that most top-performing detectors use an AdaBoost-based algorithm as the classifier to achieve accurate pedestrian detection [15,16]. Compared with a support vector machine, AdaBoost is more efficient, especially with a large feature pool that contains multiple features, such as the 10 channel features in [16]. Therefore, the cascade AdaBoost algorithm is adopted to train the pedestrian classifier in this work.


TABLE I. FALSE POSITIVES FOR EACH STRONG CLASSIFIER.

Series   No. of weak classifiers   False alarm rate
1        15                        0.462
2        20                        0.464
3        29                        0.379
4        42                        0.390
5        60                        0.393
6        70                        0.381
7        85                        0.377
8        109                       0.392
9        128                       0.355
10       160                       0.349
11       174                       0.340
12       195                       0.315
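The per-stage values in Table I can be combined into a rough estimate for the whole cascade. The sketch below assumes, as is usual for cascade classifiers but not stated in the paper, that each stage's false alarm rate is measured on the negative windows that passed all earlier stages, so that the overall rate is approximately the product of the stage rates. The variable names are illustrative.

```python
from math import prod

# (series, number of weak classifiers, false alarm rate) copied from Table I
STAGES = [
    (1, 15, 0.462), (2, 20, 0.464), (3, 29, 0.379), (4, 42, 0.390),
    (5, 60, 0.393), (6, 70, 0.381), (7, 85, 0.377), (8, 109, 0.392),
    (9, 128, 0.355), (10, 160, 0.349), (11, 174, 0.340), (12, 195, 0.315),
]

# Assumption: a negative window is reported only if every stage accepts it,
# so the stage rates multiply.
overall_false_alarm = prod(rate for _, _, rate in STAGES)
total_weak = sum(count for _, count, _ in STAGES)

print(total_weak)
print(f"{overall_false_alarm:.2e}")
```

Under this assumption the 12 stages together use 1087 weak classifiers and the estimated overall false alarm rate per window is on the order of 1e-5, even though every individual stage passes more than 30% of the negatives it sees.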

SYSTEM BUILD

The development environment of this project is based on the Android operating system. The main pedestrian detection code, including classifier training, is written in C++ and uses the image processing functions of the OpenCV library. To let Java code and C++ functions call each other, we use JNI and the NDK, which allows us to port the OpenCV library, written in C++, to the Java-based Android platform. Building the system involves two main tasks: setting up the Android platform environment and porting the program.

Environment

To ensure that C/C++ code can be used on the Android platform, we implement communication between Java and C/C++ using several APIs provided by JNI. The NDK compiles the native code into dynamic link library files so that Java code can call the native functions.

OpenCV is a cross-platform computer vision library that consists of a series of C functions and a number of C++ classes. Because this work is implemented on Android, we configure the OpenCV Android SDK to allow OpenCV to be used on Android devices.

Program Transplantation


EXPERIMENT

Experimental Results on Personal Computer


The work is implemented on a notebook with a 2.0 GHz CPU and 2 GB of RAM. We also use an Android mobile phone with 2 GB of RAM and 8 GB of storage to test the proposed pedestrian detection system. The INRIA pedestrian detection dataset is adopted to test the proposed method on the PC side.

Figure 2. Comparison of experimental results.

Figure 2 compares the results of this experiment with those of other pedestrian detection methods. It can be seen from Figure 2 that the proposed pedestrian detection system achieves a 24% miss rate on the INRIA dataset, outperforming some typical pedestrian detection methods.

TABLE II. DETECTION TIME COMPARISON.

Method    Detection time (ms)
VJ        48
HOG       91
HikSvm    121
PosInv    46
ChnFtrs   18
Ours      93


the mobile phone. TABLE III shows the detection time for various image sizes on the mobile phone. As this experiment shows, the running time is sensitive to the size of the test image. For example, the mobile phone takes about 45 seconds to process a 1280*910 pixel image, whereas the running time is about 0.5 seconds for an image smaller than 100 kb.

TABLE III. RUNNING TIME OF IMAGES WITH DIFFERENT SIZES ON MOBILE PHONE.

Image size (kb)   Width*height   Detection time (s)   Image size (kb)   Width*height   Detection time (s)
666               739*627        18                   432               480*640        12
1045              868*720        23                   523               641*496        17
482               640*480        15                   1475              1055*879       29
313               333*531        7                    776               1060*605       24
180               447*358        5                    1547              1280*910       45
478               640*480        11                   464               480*640        12
582               640*480        11                   487               640*480        12
1260              960*664        25                   434               640*480        10
22.46             96*160         0.5                  21.24             70*134         0.4
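One way to read Table III is to normalize the detection time by the number of pixels. The sketch below does this for a few rows copied from the table. It is an observation on the published numbers rather than an analysis from the paper, and the helper name `seconds_per_megapixel` is illustrative.

```python
# (width, height, detection time in seconds) for selected rows of Table III
ROWS = [
    (1280, 910, 45),
    (1055, 879, 29),
    (868, 720, 23),
    (640, 480, 15),
    (96, 160, 0.5),
    (70, 134, 0.4),
]


def seconds_per_megapixel(width, height, seconds):
    """Detection time divided by the image area in millions of pixels."""
    return seconds / (width * height / 1e6)


rates = [seconds_per_megapixel(w, h, t) for w, h, t in ROWS]
for (w, h, _), rate in zip(ROWS, rates):
    print(f"{w}*{h}: {rate:.1f} s per megapixel")
```

For these rows the normalized time stays between roughly 30 and 50 seconds per megapixel although the image area varies by more than a factor of 100. This is consistent with a sliding-window detector whose cost grows approximately in proportion to the number of window positions, and it suggests that downscaling large photos before detection would shorten the running time.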

Experimental Results on Mobile Devices

The pedestrian detection experiment on mobile devices is carried out on a Huawei mobile terminal running Android 4.4.2. In this experiment, we use the INRIA dataset and images taken with the mobile phone. The pedestrian detection results are shown below, including both successful and false cases.


Figure 4. Example of false positive cases.

Figure 3 shows the correct detection examples of single and multiple pedestrians. Figure 4 shows examples of false positive cases. From these examples we can see that most false positives are due to vertical structures in images, such as pillars and traffic signs. This phenomenon indicates that we should add more negative examples that have vertical structures to improve system performance. Finally, Figure 5 shows person detection examples in images that we took by using a mobile phone on campus.

Figure 5. Person detection in images taken by mobile phone.

CONCLUSIONS


ACKNOWLEDGEMENT

This research was supported by the National Natural Science Foundation of China under Grant 61461039.

REFERENCES

1. McCall, J. C and Trivedi, M. M. 2006. "Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation," IEEE transactions on intelligent transportation systems, 7(1): 20-37.

2. Casarrubea, M., et al. 2015. "T-pattern analysis for the study of temporal structure of animal and human behavior: a comprehensive review," Journal of Neuroscience Methods, 239: 34-46.

3. Lee, Yann-Hang, Preetham Chandrian and Bo Li. 2011. "Efficient java native interface for android based mobile devices," Security and Privacy in Computing and Communications (TrustCom): 1202-1209.

4. Son, Ki-Cheol and Jong-Yeol Lee. 2011. "The method of android application speed up by using NDK," Awareness Science and Technology (iCAST): 382-385.

5. Sermanet, Pierre, et al. 2013. "Pedestrian detection with unsupervised multi-stage feature learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.

6. Dalal, Navneet, and Bill Triggs. 2005. "Histograms of oriented gradients for human detection," Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, 1: 886-893.

7. Bouwmans, Thierry. 2011. "Recent advanced statistical background modeling for foreground detection-a systematic survey," Recent Patents on Computer Science, 2011: 147-176.

8. Ma, Yingdong, Deng Liang, Chen Xiankai, and Guo Ning. 2013. "Integrating orientation cue with EOH-OLBP-based multilevel features for human detection," IEEE Transactions on circuits and systems for video technology, 2013, 23(10): 1755-1766.

9. Liu, Hui, et al. 2015. "Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions," Energy Conversion and Management, 92: 67-81.

10. Shan, Caifeng. 2012. "Learning local binary patterns for gender classification on real-world face images," Pattern Recognition Letters, 33(4): 431-437.

11. Zhang, Shanshan, Rodrigo Benenson, and Bernt Schiele. 2015. "Filtered channel features for pedestrian detection," Computer Vision and Pattern Recognition (CVPR), 2015: 1751-1760.

12. Su, Xiaoqiong, et al. 2013. "A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification," Circuits and Systems (ISCAS), 2013 IEEE International Symposium on. IEEE, 2013.

13. Dalal, Navneet, Bill Triggs, and Cordelia Schmid. 2006. "Human detection using oriented

14. Paisitkriangkrai, Sakrapee, Chunhua Shen and Anton van den Hengel. 2016. "Pedestrian detection with spatially pooled features and structured ensemble learning," IEEE transactions on pattern analysis and machine intelligence, 2016: 1243-1257.

15. Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona. 2012. "Pedestrian detection: An evaluation of the state of the art," IEEE transactions on pattern analysis and machine intelligence, 34(4): 743-761.

16. Benenson R, Mathias M, Tuytelaars T, et al. 2013. "Seeking the Strongest Rigid Detector," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2013.