Multi Object Detection and Classification using Superpixel based Saliency Detection and Multi Support Vector Machine for Visually Impaired Person

ISSN: 2005-4238 IJAST 19

¹Saju A and ²Dr. H. N. Suresh

¹Research Scholar (VTU),

Department of Electronics & Instrumentation Engineering, Bangalore Institute of Technology,

Bangalore, Karnataka

²Professor,

Department of Electronics & Instrumentation Engineering, Bangalore Institute of Technology,

Bangalore, Karnataka

Abstract

In recent decades, object detection systems have gained significant research interest owing to progress in computer vision technology. Although many object recognition systems have been developed for assisting visually impaired persons, there is still a constant demand for better ones. In this paper, a new system is proposed for object recognition and classification. Initially, the input images were collected from the Caltech database. Then, a superpixel based saliency object detection method was used for segmenting the objects, because it gives more saliency information about an image by exploiting color contrast. Next, feature extraction was accomplished by utilizing Local Ternary Pattern (LTP), colour moments, and Histogram of Oriented Gradient (HOG). After feature extraction, the reliefF algorithm was used for rejecting irrelevant features and retaining the optimal ones. At last, the optimal feature values were given as the input to a Multi-Support Vector Machine (MSVM) classifier for classifying the individual objects. The experimental outcome showed that the proposed work improved the classification accuracy by 1.5-2% compared with the existing work.

Key-words: Colour moments, histogram of oriented gradient, local ternary pattern, multi support vector machine, reliefF feature selection, superpixel based saliency object detection.

1. Introduction

Over the past years, automatic object recognition and classification from an image has attracted a significant amount of interest from academic to industrial communities [1].

Several applications rely on automatic object recognition and classification, such as surveillance, home care systems, intelligent transportation systems, human action recognition, and mobile applications for visually impaired persons [2-3]. However, object recognition is a difficult process, as the image quality varies from image to image [4]. For example, most images suffer from noise, illumination issues, orientation changes, and so on. The object recognition system must be able to recognize the objects present in the image irrespective of all these shortcomings [5]. An image comprises several objects, and the object recognition system must be capable of differentiating between them [6]. Currently, numerous object recognition techniques are available, such as high order multiple shape models [7-8] and convolutional networks [9-10], yet there is still a constant demand for reliable and efficient object recognition systems. Most of the existing methodologies are costly in terms of computation, memory, and time. Considering these issues, a new automated system is proposed for object recognition and classification based on local and high level features.

In this paper, a new automated system was developed for improving the performance of object recognition and classification. At first, the input data were collected from the Caltech database, which comprises images of categories such as aero-plane, bottle, building, ground, horse, human, road, sheep, sky, and water. After data collection, segmentation was carried out on the individual images for separating the objects from the images. Then, feature extraction was carried out by using HOG, LTP, and colour moments for extracting the feature values from the segmented images. The extracted hybrid feature values were fused by using a feature level fusion technique. After obtaining the hybrid feature values, the reliefF algorithm was used to reduce the dimensions of the extracted features. In the reliefF algorithm, the Manhattan distance was utilized to identify the nearest miss and nearest hit instances. Here, the Manhattan distance uses only a limited number of features for representing the data, which effectively reduces the “curse of dimensionality” problem. Then, the output of feature selection was given as the input to the MSVM classifier for classifying the individual objects. At last, the performance of the proposed work was compared with the existing works by means of accuracy, specificity, sensitivity, False Acceptance Rate (FAR), error rate, and False Rejection Rate (FRR).

This research paper is organized as follows. In section 2, several research papers on object detection and classification are reviewed. A detailed explanation of the proposed work is given in section 3. Section 4 presents the quantitative analysis and comparative analysis of the proposed work. The conclusion is drawn in section 5.

2. Literature survey

Many methods have been developed by researchers for object detection and classification. In this section, a brief review of some important contributions to the existing literature is presented.

R. Bhuvaneswari and R. Subban [11] developed a new object recognition system on the basis of feature extraction and points of interest. Initially, the points of interest of the image were selected using a derivative Kadir-Brady detector, and the neighbourhood pixels within a particular window size were selected for further processing. In this work, Gabor and curvelet features were extracted from the area of interest and classified by utilizing a Support Vector Machine (SVM) classifier. The performance of the developed object recognition system was evaluated in light of accuracy, precision, recall, and f-measure. In the experiments, the developed approach outperformed the existing approaches with satisfactory results.

P. Sengottuvelan and R. Arulmurugan [12] developed a new method for object classification by utilizing a substance based neural network. Initially, the wavelet transform was utilized to extract the feature vectors from the collected image in order to obtain more accurate information. Then, the misclassification between the foreground and background images was reduced by using the neural network classifier model. Moreover, the substance based neural network involves the removal of the regions in the surrounding image to increase the accuracy of object classification. The performance evaluation revealed that the developed system reduces the occurrence of misclassification and reflects the exact shape of the object accurately.

H. Ren and Z.N. Li [13] developed a new descriptor, Boosted Local Binary (BLB), for object recognition. The developed descriptor encodes variable local neighbour regions at different locations and scales. Each region pair of the developed descriptor was selected by the RealAdaBoost algorithm with a penalty term on the structural diversity. In addition, the encoding scheme was applied in the gradient and intensity domains, which was complementary to standard binary descriptors. The developed methodology was tested using three benchmark datasets: Caltech, the FDDB face dataset, and the PASCAL VOC 2007 dataset. The experimental results demonstrated that the detection accuracy of the developed methodology outperforms that of traditional binary descriptors.

X.S. Tang et al. [14] utilized a multi-stream system on the basis of different geometric feature spaces for object recognition. In order to assess the robustness and smoothness of the proposed representation, four representative geometric feature sets were examined. To further verify the effectiveness of the proposed system, the geometric feature sets were applied to four challenging datasets. The developed multi-stream method achieved comparable or better results compared to the existing methods.

W. Tao et al. [15] presented a new algorithm for object localization and classification on the basis of Spatial Adjacent Bag of Features (SABOF), Superpixel Adjacent Histogram (SAH), and multiple segmentation cues. The SABOF model integrates spatial information into the traditional BOF model for a more powerful object discriminating capability. The fusion of multiple segmentations into the SAH framework makes it possible to handle objects with large size variation more effectively. The object recognition and classification experiments were carried out on the Graz dataset and the PASCAL VOC 2007 segmentation competition dataset to demonstrate the effectiveness of the developed algorithm.

3. Proposed system

The aim of this research work is to propose an efficient and reliable object recognition and classification system that demands a sensible computational overhead. To keep the proposed work modular, the whole work is divided into five stages: image collection, segmentation, feature extraction, feature selection, and classification. Each stage contributes to achieving better object recognition. Figure 1 indicates the working procedure of the proposed work, and a detailed explanation of each stage is given below.

Figure 1. Working process of proposed work

3.1 Data collection

At first, the input data are collected from the Caltech database, which contains images of categories such as aero-plane, bottle, building, ground, horse, human, road, sheep, sky, and water. These images vary in location, size, and orientation, and some sample images of the Caltech database are shown in figure 2. In this research, eighty percent of the images are utilized for training and twenty percent of the images are used for testing. For every image category, the performance metrics are used to demonstrate the effectiveness of the proposed work.
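The 80/20 division described above can be sketched as follows. This is only an illustration: the function name `split_train_test`, the shuffling step, and the fixed seed are assumptions, since the text does not state how the images were assigned to the two sets.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and cut them into a training and a testing part.

    ASSUMPTION: a seeded random shuffle is used; the paper only states the
    80% / 20% proportions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Example with 100 hypothetical image identifiers.
train, test = split_train_test(range(100))
print(len(train), len(test))
```

With 100 samples this yields 80 training and 20 testing items, which matches the counts reported in the cross-validation table.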

Figure 2. Sample images of Caltech dataset

3.2 Segmentation using superpixel based saliency object detection

After data collection, superpixel based saliency detection is carried out for segmenting the objects from the images. The proposed saliency detection model comprises three stages: superpixel segmentation with adaptive color histogram, inter superpixel similarity, and superpixel saliency. A detailed explanation of the proposed saliency detection model is given below.

3.2.1 Image simplification

In this stage, the collected image is converted into the Lab color space and superpixel segmentation is carried out. The superpixel segmentation model works on the basis of linear iterative clustering, subdividing the image into a number of superpixels that generally have a compact and regular shape with good boundary adherence. The size of each superpixel is set to $N/200$, where $N$ is the number of image pixels, so that about 200 superpixels are generated, which is adequate to preserve the distinct boundaries. In adaptive color quantization, each color channel is quantized into $q$ bins for generating the image histogram $H_0$. Then, the quantized color $qc_k$ of each bin is determined as the mean color of the image pixels that fall into the $k^{th}$ bin, and the $m$ most probable bins, which together cover $\alpha \cdot N$ pixels, are selected as the representative colors. Finally, each of the remaining bins is merged into one of the representative bins, and the quantized colors of all bins are updated for generating a color quantization table $Q$ with $m$ entries.
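The adaptive color quantization step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the defaults `q=4` and `alpha=0.95`, the scaling of every channel to the range [0, 256), and the rule that a leftover bin joins the representative bin with the nearest mean color are all assumptions.

```python
from collections import defaultdict


def mean_color(group):
    """Channel-wise mean of a list of color tuples."""
    return tuple(sum(channel) / len(group) for channel in zip(*group))


def squared_distance(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def quantization_table(pixels, q=4, alpha=0.95, channel_range=256.0):
    """Return the representative colors (table Q) for a list of pixels."""
    step = channel_range / q
    bins = defaultdict(list)
    for pixel in pixels:
        # Quantize each channel into q bins.
        key = tuple(min(int(value // step), q - 1) for value in pixel)
        bins[key].append(pixel)
    # Most probable bins first.
    ordered = sorted(bins.values(), key=len, reverse=True)
    selected, covered = [], 0
    for group in ordered:
        if covered >= alpha * len(pixels):
            break
        selected.append(list(group))
        covered += len(group)
    # ASSUMPTION: a leftover bin is merged into the nearest representative bin.
    for group in ordered[len(selected):]:
        target = min(
            selected,
            key=lambda s: squared_distance(mean_color(s), mean_color(group)),
        )
        target.extend(group)
    # Updated quantized colors, one entry per representative bin.
    return [mean_color(group) for group in selected]
```

The length of the returned list plays the role of $m$; it depends on the image content rather than being fixed in advance.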

3.2.2 Inter superpixel similarity

The superpixel level histogram $H_x$ is determined and simplified based on $Q$ by using all pixels in every superpixel $SP_x$ $(x = 1, \ldots, n)$, and the normalization of the superpixel level histogram is mathematically denoted in equation (1).

$$\sum_{k=1}^{m} H_x(k) = 1 \qquad (1)$$

Meanwhile, the inter superpixel similarity between every pair of superpixels $SP_x$ and $SP_y$ is mathematically defined in equations (2), (3), and (4).

$$Sim(x, y) = Sim_{spatial}(x, y) \times Sim_{color}(x, y) \qquad (2)$$

Where,

$$Sim_{color}(x, y) = \sum_{k=1}^{m} \min\{H_x(k), H_y(k)\} \qquad (3)$$

$$Sim_{spatial}(x, y) = 1 - \frac{\lVert \mu_x - \mu_y \rVert}{d} \qquad (4)$$

Where, $\mu_x$ is the spatial center position of $SP_x$, and $d$ is the diagonal length of the image.
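Equations (1) to (4) can be illustrated with a short Python sketch. The function names are hypothetical, and the inputs (bin counts per superpixel, center coordinates, image diagonal) are assumed to be available from the previous stage.

```python
import math


def normalize_histogram(counts):
    """Scale bin counts so that they sum to 1, as required by equation (1)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def color_similarity(h_x, h_y):
    """Histogram intersection of two normalized histograms, equation (3)."""
    return sum(min(a, b) for a, b in zip(h_x, h_y))


def spatial_similarity(mu_x, mu_y, diagonal):
    """One minus the center distance over the image diagonal, equation (4)."""
    return 1.0 - math.dist(mu_x, mu_y) / diagonal


def similarity(h_x, h_y, mu_x, mu_y, diagonal):
    """Product of the spatial and color terms, equation (2)."""
    return spatial_similarity(mu_x, mu_y, diagonal) * color_similarity(h_x, h_y)


# Two superpixels with two-bin histograms and centers 5 pixels apart.
h_a = normalize_histogram([2, 2])
h_b = normalize_histogram([1, 3])
print(similarity(h_a, h_b, (0, 0), (3, 4), diagonal=10))
```

In this example the color term is 0.25 + 0.5 = 0.75 and the spatial term is 1 - 5/10 = 0.5, so the similarity is 0.375.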

3.2.3 Superpixel saliency

In most multimedia images, the salient object superpixels show color contrast with the background superpixels, and the salient object superpixels are spatially more compact than the background superpixels, which usually scatter over the whole image. On the basis of these observations, the Spatial Sparsity (SS) and Global Contrast (GC) of superpixels are calculated for measuring the superpixel saliency. The GC of $SP_x$ is determined by using the weighted color differences to all other superpixels, as given in equation (5).

$$GC(x) = \sum_{y=1}^{n} W(x, y) \times \lVert mc_x - mc_y \rVert \qquad (5)$$

Where, $mc_x$ is the mean color of $SP_x$. The weight $W(x, y)$ takes into account the spatial similarity and the superpixel area, as stated in equation (6).

$$W(x, y) = |SP_y| \times Sim_{spatial}(x, y) \qquad (6)$$

Where, $|SP_y|$ is the number of pixels in the superpixel $SP_y$. In addition, the normalized GC measure of $SP_x$ is determined by using equation (7).

$$Normalized\,GC(x) = \frac{GC(x) - GC_{min}}{GC_{max} - GC_{min}} \qquad (7)$$

Where, $GC_{min}$ and $GC_{max}$ are the minimum and maximum values among the GC measures of all superpixels. For every superpixel $SP_x$, the spatial spread of its color distribution is represented in equation (8).

$$SS(x) = \frac{\sum_{y=1}^{n} Sim(x, y) \times D(y)}{\sum_{y=1}^{n} Sim(x, y)} \qquad (8)$$

Where, $D(y)$ denotes the Euclidean spatial distance measured from the center position of $SP_y$. Besides, an inverse normalization operation is carried out on the spatial spread measures for obtaining the normalized spatial spread measure of each superpixel, as represented in equation (9).

$$Normalized\,SS(x) = \frac{SS(x) - SS_{max}}{SS_{min} - SS_{max}} \qquad (9)$$

Where, $SS_{min}$ and $SS_{max}$ are the minimum and maximum spatial spread measures of all superpixels. Then, the inter superpixel similarity measures are exploited for refining the SS and GC measures. The refined SS and GC measures are stated in equations (10) and (11).

$$RSS(x) = \frac{\sum_{y=1}^{n} Sim(x, y) \times Normalized\,SS(y)}{\sum_{y=1}^{n} Sim(x, y)} \qquad (10)$$

$$RGC(x) = \frac{\sum_{y=1}^{n} Sim(x, y) \times Normalized\,GC(y)}{\sum_{y=1}^{n} Sim(x, y)} \qquad (11)$$

Where, $RSS$ is the refined spatial sparsity and $RGC$ is the refined global contrast. At last, the saliency measure of each superpixel is calculated by performing a superpixel wise multiplication, as represented in equation (12). The saliency detected images are shown in figure 3.

$$Sal(x) = RGC(x) \times RSS(x) \qquad (12)$$
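The chain from global contrast to the final saliency value (equations (5) to (12)) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the similarity matrices are assumed to be precomputed with a diagonal of 1, and the reference point of $D(y)$ is assumed here to be the similarity-weighted spatial center for each superpixel, because the text does not specify it.

```python
import math


def normalize(values, invert=False):
    """Min-max normalization, eq. (7); inverse form, eq. (9), if invert."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    if invert:
        return [(v - hi) / (lo - hi) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def refine(sim, measure):
    """Similarity-weighted average of a measure, eqs. (10) and (11)."""
    n = len(measure)
    return [
        sum(sim[x][y] * measure[y] for y in range(n)) / sum(sim[x])
        for x in range(n)
    ]


def superpixel_saliency(sim, sim_spatial, mean_colors, areas, centers):
    """Return Sal(x) for every superpixel, eq. (12)."""
    n = len(areas)
    # Global contrast, eqs. (5) and (6).
    gc = [
        sum(
            areas[y] * sim_spatial[x][y]
            * math.dist(mean_colors[x], mean_colors[y])
            for y in range(n)
        )
        for x in range(n)
    ]
    # Spatial spread, eq. (8).
    ss = []
    for x in range(n):
        total = sum(sim[x])
        # ASSUMPTION: D(y) is measured to the similarity-weighted center.
        ref = [
            sum(sim[x][y] * centers[y][d] for y in range(n)) / total
            for d in range(2)
        ]
        ss.append(
            sum(sim[x][y] * math.dist(centers[y], ref) for y in range(n)) / total
        )
    rgc = refine(sim, normalize(gc))
    rss = refine(sim, normalize(ss, invert=True))
    return [g * s for g, s in zip(rgc, rss)]
```

Because both refined measures are weighted averages of values in [0, 1], every saliency value also lies in [0, 1]; thresholding these values would give a segmentation mask, although the threshold used by the authors is not stated.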

Figure 3. a) Superpixel applied image, b) saliency detected image, c) Segmented image

3.3 Feature extraction

After segmenting the objects from the images, feature extraction is carried out to extract the feature values. Hybrid feature descriptors are applied to extract the feature information from the segmented data. The hybrid feature descriptors include HOG, LTP, and colour moments. A brief description of each descriptor is given below.

3.3.1 Histogram of oriented gradients

Initially, the HOG descriptor divides the image into small spatial regions, and for each region it creates a 1D gradient orientation histogram from the gradient directions and magnitudes. A key characteristic of the HOG feature is that it captures the local appearance of objects and provides invariance to object transformations and illumination conditions. The edge information is obtained from the gradients. First, a gradient operator $N$ is applied to calculate the gradient values. The gradient at the image point $(x, y)$ is given in equation (13).

$$G_x = N \times I(x, y) \quad \text{and} \quad G_y = N^T \times I(x, y) \qquad (13)$$

Where, $G_x$ is the gradient in the horizontal direction, and $G_y$ is the gradient in the vertical direction. Generally, the image detection windows are partitioned into several spatial regions, which are named cells. The gradient magnitude of each image pixel is then determined along with the edge orientation. Equations (14) and (15) give the gradient magnitude $G(x, y)$ and the edge orientation $\theta(x, y)$ of the image.

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \qquad (14)$$

$$\theta(x, y) = \tan^{-1}\frac{G_y(x, y)}{G_x(x, y)} \qquad (15)$$

Where, $\theta(x, y)$ is the edge orientation of the image, $I$ is the input image, $N$ is the gradient operator, and $N^T$ is the transpose of the gradient operator.

After calculating the histogram values, normalization is carried out to reduce the effect of illumination conditions and noise in the collected images. Normalization is an essential phase of the HOG feature descriptor that helps to maintain the discriminative characteristics and to perform consistently even under factors like background-foreground contrast and local illumination variations in the collected images. The HOG feature descriptor includes four different normalization schemes: L2-norm, L2-Hys, L1-sqrt, and L1-norm. Among these, L2-norm gives better performance in object detection and classification, and it is denoted in equation (16).

$$L_{2\text{-}norm}: \; f = \frac{x}{\sqrt{\lVert x \rVert_2^2 + e^2}} \qquad (16)$$

Where, $e$ is a small positive value, $f$ is the normalized feature vector, $x$ is the non-normalized vector of a histogram block, and $\lVert x \rVert_2$ is the 2-norm of $x$.
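A compact sketch of the HOG steps in equations (13) to (16) is given below for a single cell. It is an illustration under assumptions: the gradient operator is taken to be the central difference mask [-1, 0, 1], orientations are unsigned (0 to 180 degrees) and binned into 9 bins without interpolation, and the function names are hypothetical. None of these choices are stated in the text.

```python
import math


def gradients(image):
    """Central differences with N = [-1, 0, 1]; border pixels stay at zero."""
    h, w = len(image), len(image[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if 0 < c < w - 1:
                gx[r][c] = image[r][c + 1] - image[r][c - 1]
            if 0 < r < h - 1:
                gy[r][c] = image[r + 1][c] - image[r - 1][c]
    return gx, gy


def cell_histogram(gx, gy, bins=9):
    """Magnitude-weighted orientation histogram of one cell."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)                      # eq. (14)
            angle = math.degrees(math.atan2(dy, dx)) % 180.0    # eq. (15)
            hist[min(int(angle // width), bins - 1)] += magnitude
    return hist


def l2_normalize(vector, e=1e-6):
    """L2-norm block normalization, eq. (16)."""
    norm = math.sqrt(sum(v * v for v in vector) + e * e)
    return [v / norm for v in vector]


# A 4x4 horizontal ramp: all gradient energy falls into the first bin.
ramp = [[0, 1, 2, 3] for _ in range(4)]
gx, gy = gradients(ramp)
print(l2_normalize(cell_histogram(gx, gy)))
```

A full descriptor would repeat this over all cells, group neighbouring cells into blocks, normalize each block, and concatenate the results.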

3.3.2 Colour moments

Colour moments are used to extract features that describe the colour distribution of the image, namely the mean, standard deviation, and skewness. In this research, these three colour moments are applied, because the low order moments provide the most useful colour information. The three colour moments, mean $e_x$, standard deviation $\sigma_x$, and skewness $s_x$, are denoted in equations (17), (18), and (19).

$$e_x = \frac{1}{n} \sum_{y=1}^{n} p_{xy} \qquad (17)$$

$$\sigma_x = \sqrt{\frac{1}{n} \sum_{y=1}^{n} (p_{xy} - e_x)^2} \qquad (18)$$

$$s_x = \sqrt[3]{\frac{1}{n} \sum_{y=1}^{n} (p_{xy} - e_x)^3} \qquad (19)$$

Where, $p_{xy}$ is the colour value of the $y^{th}$ image pixel in the $x^{th}$ channel, $n$ is the number of pixels in the image, $e_x$ is the average value of the $x^{th}$ channel, $\sigma_x$ is the standard deviation of the $x^{th}$ channel, and $s_x$ is the skewness of the $x^{th}$ channel.
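The three colour moments can be computed per channel as in the sketch below. The function name is hypothetical, the standard deviation and skewness use the square root and cube root of the second and third central moments (the usual colour moment definitions, assumed here), and a signed cube root is used so that negatively skewed channels are handled.

```python
import math


def colour_moments(channel):
    """Return (mean, standard deviation, skewness) of one colour channel."""
    n = len(channel)
    mean = sum(channel) / n                                      # eq. (17)
    std = math.sqrt(sum((p - mean) ** 2 for p in channel) / n)   # eq. (18)
    third = sum((p - mean) ** 3 for p in channel) / n
    # Signed cube root keeps the sign of the third central moment, eq. (19).
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew


# Three channels of a tiny hypothetical image give a 9-value feature vector.
channels = [[0, 0, 3], [1, 2, 3], [5, 5, 5]]
features = [value for ch in channels for value in colour_moments(ch)]
print(features)
```

Applying the function to each of the three colour channels gives nine values per image.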

3.3.3 Local ternary pattern

LTP is an extension of Local Binary Pattern (LBP) that utilizes a thresholding constant for thresholding the pixels into three values. Let $k$ be the thresholding constant, $p$ the value of a neighboring pixel, and $c$ the value of the center pixel. The result of the thresholding is given in equation (20).

$$\begin{cases} 1 & \text{if } p \geq c + k \\ 0 & \text{if } c - k < p < c + k \\ -1 & \text{if } p \leq c - k \end{cases} \qquad (20)$$

In this scenario, every thresholded pixel takes one of the three values, and the neighboring pixels are combined after thresholding into a ternary pattern. Calculating the histogram of the ternary values directly results in a large range, so the ternary pattern is split into two binary patterns. The histograms of the two binary patterns are concatenated for generating a descriptor (double the size of LBP), which is very successful in object detection applications. The basic idea of LTP is to transform the intensity space into an order space, where the order of the neighboring pixels is utilized for creating a code that is invariant to monotonic illumination changes for every image.
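The split of the ternary pattern into two binary patterns can be sketched as follows. This is an illustrative version for 3x3 neighbourhoods only; the function names, the clockwise bit order, and the treatment of values exactly at c + k or c - k (assigned to the +1 and -1 states here) are assumptions.

```python
def ltp_codes(patch, k):
    """Return the (upper, lower) 8-bit codes of a 3x3 patch.

    The upper code marks neighbours thresholded to +1 and the lower code
    marks neighbours thresholded to -1; neighbours thresholded to 0 set no bit.
    """
    c = patch[1][1]
    # Clockwise from the top-left neighbour (ASSUMPTION: any fixed order works).
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    upper = lower = 0
    for bit, p in enumerate(neighbours):
        if p >= c + k:
            upper |= 1 << bit
        elif p <= c - k:
            lower |= 1 << bit
    return upper, lower


def ltp_descriptor(image, k):
    """Concatenate the upper and lower code histograms (2 x 256 bins)."""
    upper_hist = [0] * 256
    lower_hist = [0] * 256
    for r in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            patch = [row[col - 1:col + 2] for row in image[r - 1:r + 2]]
            upper, lower = ltp_codes(patch, k)
            upper_hist[upper] += 1
            lower_hist[lower] += 1
    return upper_hist + lower_hist


patch = [[10, 20, 30], [5, 15, 25], [15, 14, 16]]
print(ltp_codes(patch, k=5))
```

For this patch with c = 15 and k = 5, the neighbours 20, 30, and 25 map to +1 (upper code 14) and the neighbours 10 and 5 map to -1 (lower code 129).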

3.4 Feature selection using reliefF algorithm

The reliefF algorithm is used to select the optimal features for better classification, and it is robust when dealing with real-time and noisy data. At first, the reliefF algorithm randomly chooses an instance $r_i$ and then searches for its $k$ nearest neighbors in the same class, called nearest hits $H_j$, and its $k$ nearest neighbors in each of the different classes, called nearest misses $M_j$. The Manhattan distance measure is used to identify the nearest miss $M_j$ and nearest hit $H_j$ instances. The major advantage of the Manhattan distance is that it needs only limited time for computing the distances between the instances, and it utilizes only a minimum number of features for representing the data, which is enough to attain accurate neighbourhood selection and better object detection. Then, the reliefF algorithm updates the quality estimation $W[A]$ for all attributes $A$, depending on the values of $r_i$, $H_j$, and $M_j$.

If the instances $r_i$ and $H_j$ have different values of the attribute $A$, then $A$ separates two instances of the same class, which is undesirable, so the quality estimation $W[A]$ is decreased. In contrast, if the instances $r_i$ and $M_j$ have different values of the attribute $A$, then $A$ separates two instances of different classes, which is desirable, so the quality estimation $W[A]$ is increased. The entire procedure is repeated $m$ times, where $m$ is a user-defined parameter. In this work, $m$ is fixed at twenty. In the reliefF algorithm, the quality estimation $W[A]$ is updated by using equations (21), (22), and (23).

$$W[A] = W[A] + \frac{H + M}{20} \qquad (21)$$

Where,

$$H = -\sum_{j=1}^{k} \frac{D(A, r_i, H_j)}{k} \qquad (22)$$

$$M = \sum_{C \neq cl(r_i)} \left[ \frac{P(C)}{1 - P(cl(r_i))} \sum_{j=1}^{k} D(A, r_i, M_j(C)) \right] / k \qquad (23)$$

Where, $W[A]$ is the quality estimation of the attribute $A$, $r_i$ is the selected instance, $H_j$ and $M_j(C)$ are the nearest hits and the nearest misses in class $C$, $P(C)$ is the prior probability of class $C$, $D(A, \cdot, \cdot)$ is the difference in the value of attribute $A$ between two instances, $C$ ranges over the classes other than $cl(r_i)$, and $cl(r_i)$ is the class of the $i^{th}$ sample.
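The update rule of equations (21) to (23) can be sketched in plain Python as below. This is a minimal illustration rather than the authors' MATLAB code: the function names, the default `k=3`, the seeded sampling, and the range-normalized attribute difference used for $D$ are assumptions, while the Manhattan distance for the neighbour search and the twenty iterations follow the text.

```python
import random


def manhattan(a, b):
    """Manhattan distance between two feature vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))


def relieff(X, y, m=20, k=3, seed=0):
    """Return one quality estimate W[A] per attribute."""
    rng = random.Random(seed)
    n_features = len(X[0])
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    # ASSUMPTION: D(A, ., .) is the absolute difference divided by the range.
    spans = [
        (max(row[a] for row in X) - min(row[a] for row in X)) or 1.0
        for a in range(n_features)
    ]
    weights = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(len(X))
        for c in classes:
            candidates = [j for j in range(len(X)) if y[j] == c and j != i]
            nearest = sorted(candidates, key=lambda j: manhattan(X[i], X[j]))[:k]
            if c == y[i]:
                factor = -1.0                             # hits, eq. (22)
            else:
                factor = prior[c] / (1.0 - prior[y[i]])   # misses, eq. (23)
            for a in range(n_features):
                diff = sum(abs(X[i][a] - X[j][a]) / spans[a] for j in nearest)
                weights[a] += factor * diff / (k * m)     # eq. (21)
    return weights


# Feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.5], [0.1, 0.1], [0.2, 0.9], [1.0, 0.4], [0.9, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
print(relieff(X, y, m=20, k=2))
```

In this toy data the first attribute receives a positive weight and the second a negative one, so a selection step that keeps the highest-weighted attributes would retain the first. How many attributes the authors keep is not stated.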

3.5 Classification using multi support vector machine

After obtaining the optimal feature values, classification is carried out by using the MSVM classifier. A regular SVM is a two-class classification methodology. Hence, the multi-class problem must be decomposed into multiple binary classification problems in order to extend the normal SVM classifier to a multi-class SVM classifier. In the conventional SVM classification approach, an $n$-class problem is converted into $n$ two-class problems, where in the $i^{th}$ two-class problem class $i$ is separated from the remaining classes. The two important methods in SVM classification are One-Against-All (1-a-a) and One-Against-One (1-a-1). The 1-a-a methodology creates a binary classifier for every class, which separates the objects of that class from the objects of all other classes. For $n$ classes, the 1-a-a approach generates $n$ binary classifiers, and the $i^{th}$ classifier is trained with the data samples of the $i^{th}$ class as positive labels and the remaining data samples as negative labels. The predicted class in the 1-a-a approach is the one whose classifier yields the highest output value. In addition, the 1-a-1 approach results from previous research on two-class classifiers.

In the 1-a-1 approach, the MSVM classifier generates all possible two-class classifiers from the training sets of the $n$ classes, each trained on only two out of the $n$ classes, which results in $n \times (n - 1)/2$ classifiers. In MSVM, a natural way to handle the multi-class problem is to construct a decision function that considers all the $n$ classes at once. The MSVM classifier is an extension of SVM that is mathematically represented in equations (24), (25), and (26).

$$\min \; \Phi(w, \xi) = \frac{1}{2} \sum_{m=1}^{k} (w_m \cdot w_m) + c \sum_{i=1}^{l} \sum_{m \neq y_i} \xi_i^m \qquad (24)$$

Subject to,

$$(w_{y_i} \cdot x_i) + b_{y_i} \geq (w_m \cdot x_i) + b_m + 2 - \xi_i^m \qquad (25)$$

$$\xi_i^m \geq 0, \quad i = 1, 2, 3, \ldots, l, \quad m, y_i \in \{1, 2, 3, \ldots, k\}, \quad m \neq y_i \qquad (26)$$

At last, the decision function is represented in equation (27).

$$f(x) = \arg\max_{i} \left[ (w_i \cdot x) + b_i \right], \quad i = 1, 2, 3, \ldots, k \qquad (27)$$

Where, $\xi_i^m$ are the slack variables, $l$ is the number of training data points, $c$ is a positive constant chosen by the user, $y_i$ is the class of the training data vector $x_i$, and $k$ is the number of classes.
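The decision function (27), the slack implied by constraints (25) and (26), and the number of 1-a-1 classifiers can be illustrated with the short sketch below. The weight vectors and biases are assumed to be already trained; training itself (solving problem (24)) is not shown, the function names are hypothetical, and classes are indexed from 0 rather than 1.

```python
from itertools import combinations


def score(weights, biases, x, c):
    """Output (w_c . x) + b_c of the linear function for class c."""
    return sum(w * v for w, v in zip(weights[c], x)) + biases[c]


def decide(weights, biases, x):
    """Class with the highest output value, eq. (27)."""
    return max(range(len(weights)), key=lambda c: score(weights, biases, x, c))


def slack(weights, biases, x, label, m):
    """Smallest slack satisfying constraints (25) and (26) for class m."""
    margin = score(weights, biases, x, m) + 2.0 - score(weights, biases, x, label)
    return max(0.0, margin)


def one_against_one_pairs(n):
    """All class pairs used by the 1-a-1 approach: n * (n - 1) / 2 of them."""
    return list(combinations(range(n), 2))


# Hypothetical parameters for three classes and two features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
print(decide(W, b, [0.2, 0.9]))          # scores are 0.2, 0.9, -1.1
print(len(one_against_one_pairs(10)))    # ten object classes
```

For the ten object classes of this work, the 1-a-1 approach would require 45 pairwise classifiers, while the 1-a-a approach and the single decision function above need only ten linear functions.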

4. Experimental result and discussion

In the experimental phase, the proposed work was simulated by utilizing MATLAB (version 2018a) on a 3.0 GHz Intel i3 processor with a 2 TB hard disc and 4 GB RAM. The performance of the proposed work was compared with an existing work [11] on the Caltech database for estimating its effectiveness and efficiency. The performance of the proposed work was validated in light of accuracy, FRR, FAR, error rate, sensitivity, and specificity.

4.1 Performance metric

A performance metric is used for collecting, reporting, and analysing information about the performance of a group or individual. The mathematical definitions of FRR, accuracy, FAR, sensitivity, error rate, and specificity are given in equations (28), (29), (30), (31), (32), and (33).

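$$FRR = \frac{FN}{FN + TP} \times 100 \qquad (28)$$

$$Accuracy = \frac{TP + TN}{FN + TP + TN + FP} \times 100 \qquad (29)$$

$$FAR = \frac{FP}{FP + TN} \times 100 \qquad (30)$$

$$Sensitivity = \frac{TP}{FN + TP} \times 100 \qquad (31)$$

$$Error\;rate = \left(1 - \frac{TP + TN}{FN + TP + TN + FP}\right) \times 100 \qquad (32)$$

$$Specificity = \frac{TN}{FP + TN} \times 100 \qquad (33)$$

Where, $TP$ is the number of true positives, $FP$ is the number of false positives, $FN$ is the number of false negatives, and $TN$ is the number of true negatives.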
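Equations (28) to (33) translate directly into code. The sketch below uses a hypothetical function name and made-up confusion counts purely for illustration; they are not results from this work.

```python
def performance_metrics(tp, tn, fp, fn):
    """Return the six metrics of equations (28) to (33) in percent."""
    total = tp + tn + fp + fn
    return {
        "FRR": fn / (fn + tp) * 100,                    # eq. (28)
        "accuracy": (tp + tn) / total * 100,            # eq. (29)
        "FAR": fp / (fp + tn) * 100,                    # eq. (30)
        "sensitivity": tp / (fn + tp) * 100,            # eq. (31)
        "error_rate": (1 - (tp + tn) / total) * 100,    # eq. (32)
        "specificity": tn / (fp + tn) * 100,            # eq. (33)
    }


# Illustrative counts only.
for name, value in performance_metrics(tp=45, tn=40, fp=10, fn=5).items():
    print(f"{name}: {value:.2f}")
```

By construction, FRR and sensitivity sum to 100, FAR and specificity sum to 100, and accuracy and error rate sum to 100.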

4.2 Quantitative study on Caltech dataset

In this section, the Caltech dataset is used for evaluating the performance of the proposed and existing works. In table 1, the performance of the proposed work is validated by means of specificity, sensitivity, and accuracy. The performance evaluation is carried out on 100 samples, with 80% of the data used for training and 20% for testing. The validation result shows that the MSVM classifier outperformed the existing classification methodologies k-Nearest Neighbour (KNN), random forest, and Deep Neural Network (DNN). The sensitivity of the MSVM classifier is 91.57%, whereas the comparative classification methodologies KNN, random forest, and DNN deliver 67.15%, 68.27%, and 63.45% of sensitivity. Correspondingly, the specificity of the MSVM classifier is 94.12%, and the comparative classification methods deliver 68.37%, 71.34%, and 69.12% of specificity. Additionally, the accuracy of the MSVM classifier is 95.34%, and the comparative classification methods KNN, random forest, and DNN achieve 60%, 55%, and 40% of accuracy. Graphical representation of the proposed work in light of sensitivity, specificity, and accuracy is indicated in figure 4.

Table 1. Performance evaluation of proposed work in light of sensitivity, specificity, and accuracy

Classifier       Sensitivity (%)   Specificity (%)   Accuracy (%)
KNN              67.15             68.37             60
Random forest    68.27             71.34             55
DNN              63.45             69.12             40
MSVM             91.57             94.12             95.34

Figure 4. Graphical representation of proposed work in light of sensitivity, specificity, and accuracy

In table 2, the performance of the proposed work is validated with different classifiers in light of FRR, FAR, and error rate. The FRR of the MSVM classifier is 6.72%, whereas the comparative classification methodologies KNN, random forest, and DNN deliver 28%, 10%, and 30.11% of FRR. Similarly, the FAR of the MSVM classification approach is 5.14%, and the comparative classification methods deliver 7.77%, 5.55%, and 16.78% of FAR. Additionally, the error rate of the MSVM classifier is 4.66%, and the comparative classification methods KNN, random forest, and DNN attain 40%, 45%, and 60% of error rate. Tables 1 and 2 clearly show that the MSVM classifier performs effectively compared to the other existing classification methods on the Caltech dataset. Graphical representation of the proposed work in light of FRR, FAR, and error rate is indicated in figure 5.

Table 2. Performance evaluation of proposed work in light of FRR, FAR, and error rate

Classifier       FRR (%)   FAR (%)   Error rate (%)
KNN              28        7.77      40
Random forest    10        5.55      45
DNN              30.11     16.78     60
MSVM             6.72      5.14      4.66

Figure 5. Graphical representation of proposed work in light of FRR, FAR, and error rate

Table 3 presents the performance of the proposed work with different optimization techniques: Principal Component Analysis (PCA), Probabilistic PCA (PPCA), and infinite feature selection. With the reliefF algorithm, the MSVM classifier improved the accuracy in object detection and classification by up to 60% compared to the other algorithms. In this research, the adopted feature extraction and selection methods capture the non-linear and linear features of objects, and also preserve the quantitative relationships between the low level and high level features. The performance measures confirm that the proposed work performs well in object detection and classification compared to the existing works.

Table 3. Performance evaluation of proposed work with dissimilar optimization techniques

Optimizer        Sensitivity (%)   Specificity (%)   Accuracy (%)   FRR (%)   FAR (%)   Error rate (%)
PCA-MSVM         93.33             30                31             15        10.67     69
PPCA-MSVM        83.56             41.24             45.54          12.36     9.45      54.46
Infinite-MSVM    89.14             91.12             93.56          8.14      7.12      6.44
ReliefF-MSVM     91.57             94.12             95.34          6.72      5.14      4.66

Table 4 presents the cross-validation of the proposed work, which is evaluated on the Caltech dataset. In this scenario, the MSVM classifier delivered an average sensitivity of 91.57%, specificity of 94.12%, and accuracy of 95.34%. Similarly, the FRR, FAR, and error rate of the proposed work in object detection and classification are 6.72%, 5.14%, and 4.66%, respectively. The outcome of the MSVM classifier showed that the proposed work performed well and delivered a significant contribution to object detection and classification.

Table 4. Cross validation of proposed work

Parameters                 Validation
Database                   Caltech dataset
Segmentation               Superpixel based saliency detection
Feature method             Hybrid features
Feature selection          ReliefF
Classifier                 MSVM
Total number of samples    100 samples (10 classes)
Classes                    Aero-plane, bottle, building, ground, horse, human, road, sheep, sky, and water
Training samples           80
Testing samples            20
Sensitivity (%)            91.57
Specificity (%)            94.12
Accuracy (%)               95.34
FRR (%)                    6.72
FAR (%)                    5.14
Error rate (%)             4.66

4.3 Comparative study

The comparative study of the proposed and existing work is presented in table 5. R. Bhuvaneswari and R. Subban [11] developed a new object recognition system on the basis of feature extraction and points of interest. In that work, Gabor and curvelet features were extracted from the area of interest and classified by utilizing an SVM classifier. The Caltech dataset was used for verifying the efficiency of the developed method. In the experimental outcome, the existing work achieved 95.4% of accuracy for the aero-plane category and 94.4% of accuracy for the human face category. In comparison, the proposed work attained 95.90% of accuracy for the aero-plane category and 96.89% of accuracy for the human face category, which is superior to the existing work.

Table 5. Comparative study of proposed and existing work

Methodology                                   Objects       Accuracy (%)
Gabor and curvelet features with SVM [11]     Aero-plane    95.4
                                              Human face    94.4
                                              Mean          94.9
Proposed work                                 Aero-plane    95.90
                                              Human face    96.89
                                              Mean          96.39

5. Conclusion

In this paper, a new system is proposed for object detection and classification, especially for assisting visually impaired persons. The main aim of this experimental study is to develop suitable feature extraction and feature selection methods to classify individual objects such as aero-plane, bottle, building, ground, horse, human, road, sheep, sky, and water. In this research, the reliefF feature selection algorithm is used to select the optimal features. By selecting the optimal features from the extracted features, a set of the most dominant discriminative features is obtained. These optimal features are classified by using the MSVM classifier. Compared to the existing work, the proposed work delivered an effective performance in both the quantitative analysis and the comparative analysis. In the experimental investigation, the proposed work attained an average accuracy of 95.34%, whereas the existing work obtained a lower accuracy on the Caltech database. In future work, a deep learning concept will be included in the proposed work for further improving the performance of object detection and classification.

References

[1] C.R. Viau, P. Payeur, and A.M. Cretu, “Multispectral image analysis for object recognition and classification”, In Automatic Target Recognition XXVI, International Society for Optics and Photonics, vol. 9844, pp. 98440N, 2016.

[2] X. Wei, S.L. Phung, and A. Bouzerdoum, “Object segmentation and classification using 3-D range camera”, Journal of Visual Communication and Image Representation, vol. 25, no. 1, pp. 74-85, 2014.

[3] X. Zeng, F. Chen, and M. Wang, “Shape group Boltzmann machine for simultaneous object segmentation and action classification”, Pattern Recognition Letters, vol. 111, pp. 43-50, 2018.

[4] L. Zhao, Z. He, W. Cao, and D. Zhao, “Real-time moving object segmentation and classification from HEVC compressed surveillance video”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 6, pp. 1346-1357, 2016.

[5] C.W. Liang, and C.F. Juang, “Moving object classification using a combination of static appearance features and spatial and temporal entropy values of optical flows”, IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3453-3464, 2015.

[6] F. Zhu, M. Bosch, N. Khanna, C.J. Boushey, and E.J. Delp, “Multiple hypotheses image segmentation and classification with application to dietary assessment”, IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 1, pp. 377-388, 2014.

[7] F. Lecumberry, Á. Pardo, and G. Sapiro, “Simultaneous object classification and segmentation with high-order multiple shape models”, IEEE Transactions on Image Processing, vol. 19, no. 3, pp. 625-635, 2009.

[8] F. Chen, H. Yu, and R. Hu, “Shape sparse representation for joint object classification and segmentation”, IEEE Transactions on Image Processing, vol. 22, no. 3, pp. 992-1004, 2012.

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142-158, 2015.

[10] S.R. Kheradpisheh, M. Ganjtabesh, S.J. Thorpe, and T. Masquelier, “STDP-based spiking deep convolutional neural networks for object recognition”, Neural Networks, vol. 99, pp. 56-67, 2018.

[11] R. Bhuvaneswari, and R. Subban, “Novel object detection and recognition system based on points of interest selection and SVM classification”, Cognitive Systems Research, vol. 52, pp. 985-994, 2018.

[12] P. Sengottuvelan, and R. Arulmurugan, “Object classification using substance based neural network”, Mathematical Problems in Engineering, 2014.

[13] H. Ren, and Z.N. Li, “Object detection using boosted local binaries”, Pattern Recognition, vol. 60, pp. 793-801, 2016.

[14] X.S. Tang, K. Hao, H. Wei, and Y. Ding, “Using line segments to train multi-stream stacked autoencoders for image classification”, Pattern Recognition Letters, vol. 94, pp. 55-61, 2017.

[15] W. Tao, Y. Zhou, L. Liu, K. Li, K. Sun, and Z. Zhang, “Spatial adjacent bag of features with multiple superpixels for object segmentation and classification”, Information Sciences, vol. 281, pp. 373-385, 2014.

[16] http://www.vision.caltech.edu/Image_Datasets/Caltech101/