Neural Network Based Object Detection by Utilizing GMM with Histogram Features

(1)



Abstract— Moving object identification and following are the

more vital task in video reconnaissance as well as PC vision

applications. object recognization is the system of finding the

non-stationary substances in the picture successions.

Recognition is the initial move towards following the moving

item in the video. object action is the following essential stride

to track. Here GMM (Gaussian Mixture Model) was used for

detection the first step of object detection. While Object

representation or action performance is done by training the

error back propagation neural network where this trained

neural network identify and classify the action of the object as

well. Real dataset was used in experiment and comparison was

done on different evaluation parameters. It was obtained that

proposed work is better as compare to other existing methods.

Keywords—Artificial Neural Network, Histogram Feature,

Human Action detection, Digital Image processing.

I. INTRODUCTION

Video examination is overall the most difficult job in today‟s

condition. Following the moving item has pulled in numerous

scientists in the field of PC vision and picture preparing.

Video exploration is the way toward watching the conduct,

occasions and other vital proof, ordinarily of the general

population for supervision and securing them. Exploration is

principally utilized by governments for social event data, for

examining and keeping the wrongdoing. Video observation

movement can be separated into three sorts specifically.

Manual video exploration, Semi-independent observation and

Fully autonomous observation. In manual exploration human

is in charge of investigating the substance of the video.

Semiautonomous exploration framework includes video

handling by framework with the communication of the human

when essential. Fully autonomous exploration framework, the

framework will play out each undertaking, for example,

discovery of movement, following and so forth., without the

need of human connection. Mechanized checking strategy

otherwise called the Intelligence Visual Surveillance (IVS)

includes the examination and translation of articles exercises,

notwithstanding object location and following to perceive the

visual activities of the scene. The primary assignment of IVS

comprises of wide zone exploration control and scene

understanding. Object following must manage a few

brightening changes and surely understood difficulties.

Basically video examination is classified into three

fundamental stages: moving substance recognizing, finding

the direction of object starting with one casing then onto the

next edge and investigation of element tracks to distinguish

their execution. Following items in a static situation is

substantially less demanding than following articles in

powerful condition. In like manner, in a dynamic biological

framework together foundation and substance fluctuates [1].

On a basic level, to determine this regular unconstrained issue

is inflexible. One can put a gathering of impulsion to make

this issue liable. The more the impulsions, the issue is casual

to settle.

Neural Network Based Object Detection by

Utilizing GMM with Histogram Features

(2)

Class acknowledgment and location are two sections of object

acknowledgment. The goal of classification acknowledgment

is to order a object into one of a few predefined classes. The

objective of recognition is to recognize objects from the

foundation. There are different protest acknowledgment

challenges. Normally, objects must be identified against

jumbled, uproarious foundations and different objects under

various enlightenment and differentiation situations.

Appropriate element portrayal is a significant stride in a

protest acknowledgment framework as it enhances execution

by separating the object from the foundation or different

protests in various lightings and situations. Protest

acknowledgment highlights are sorted into two gatherings -

meager and thick portrayals. For meager component

portrayals, intrigue point locators are utilized to recognize

structures, for example, corners and blobs on the object. A

component Object discovery is a testing field in PC perception

and example investigation explore zone.

III. RELATED WORK

In [1] numerous recordings catch and take after remarkable

protests in a scene. Identifying such remarkable articles is

accordingly of awesome interests to video investigation and

hunt. In any case, the revelation of remarkable objects in an

unsupervised way is a testing issue as there is no earlier

information of the notable items gave. Not the same as

existing striking article discovery strategies, researcher

propose to recognize and track notable protest by finding a

spatio-worldly way which has the biggest amassed saliency

thickness in the video. Propelled by the perception that notable

video protests generally show up in continuous edges,

researcher use the movement intelligibility of recordings into

the way disclosure and make the notable object recognition

more powerful. With no earlier learning of the notable objects,

our technique can distinguish striking objects of different

shapes and sizes, and can deal with loud saliency maps and

moving cameras.

In [2] analyst embrace static webcam for catching live scene

and additionally put away database of portioned recordings.

Following requires area and state of object in each divided

edge. A programmed relinquished protest recognition

framework commonly utilizes a mix of foundation subtraction

and object following to search for certain predefined examples

of action that happen when a protest is deserted by its

proprietor.

In [3] calculation for ongoing recognition and following of

object is proposed. The framework comprises of camera and

Raspberry pi and the PC. Camera ceaselessly catches the

video. It is send to Raspberry pi for additionally preparing.

Blob location investigation is performed in which objects gets

distinguished in view of their shapes. Bouncing box

calculation is connected which ceaselessly tracks the object in

view of data from the limit pixels by utilizing flat and vertical

output from left to right and best to the base to find the limit

focuses. These limit indicates are utilized track the object

through limit box

In IEEE 2014 ,paper titled Moving Object Detection Based on

Temporal Information [3] gives the blueprint on Makes usage

of transient information for period of development saliency

which is then trailed by most prominent entropy and fluffy

creating technique to perceive moving target having positive

point No prior learning of the foundation sham is required and

Robust to delicate establishment developments and camera

butterflies, No customer relationship for parameter tuning is

required and Efficiently deals with the disturbs of the

establishment yet having negative viewpoint as Shadow is

settled nearby moving thing which may be misclassified as

object itself.

V. PROPOSED WORK

PreProcessing

As video is the collection of frames which is called as frame.

Here frames are display in fix rate. This rate should be greater

then 16 frame per second. As human cannot judge one by one

display of frames if rate is more then 16 frame per second. As

(3)

in object position is new information of the frame 9, 10]. So

reading of video means conversion of video in sequence of

frames of RGB format.

Fig.1. Block diagram of proposed model.

Histogram Feature

Original Video: As shown in block diagram first block will

read original video. So collection of frames in sequence of

matrix is term as reading of video. Now all frames are in form

of two dimensional matrix and complete video is of three

dimensional matrix. Here matrix is of three dimension first is

for row, other for column, and third dimension for the frame

sequence.

The histogram is alludes as discrete function where Histogram

identifies with the recurrence of occurring of each dark level

in that picture, that shows it informed us how the estimations

of individual pixel in a picture are evacuated. Histogram is

given as:

h(rk)=nk/N

Where and are intensity level and number of pixels in image

with intensity respectively.

Gaussian Mixture Model (Object Detection)

Staufer and Grimson [5] have proposed a versatile parametric

GMM to decrease the impact of little tedious movements like

trees, edges and brightening variety. A pixel I at position x and

time t is demonstrated as a blend of K Gaussian conveyances.

The present pixel esteem takes after the likelihood conveyance

given by



   



k

i

i x t i x t x t i x t x

t

w

I

P

1

2 ) , , 1 ( ) , , 1 ( , ) , , 1 (

,

)

*

(

,

)

(







where



is the Gaussian probability density function.



















2 2

2

)

(

exp

2

1

)

,

(











I

While w,



,



2 are the weight, mean

value esteem and change of the ith Gaussian in the blend at

time t - 1. For keeping up the Gaussian blend display, the

parameters w,



,



2should be refreshed in view of the new

pixel It,x. A pixel is said to be coordinated, if It,x lies within σ

standard deviations of a Gaussian. In our case σ2 lies between

1 and 5. On the off chance that one of the K Gaussian is

coordinated, the coordinated Gaussian is refreshed as

following: Video Dataset

Pre-Processing

Training Feature Vector

Trained Neural Network Histogram Feature

GMM Based Object Detection

(4)

)

(

)

1

(

( 1, ,) , )

, ,

(txi





t xi



I

tx







_



)

,

(

)

,

(

)

1

(

₍2 ₁_, _,₎ ₍_, ₎ _,_, ₍_, ₎ _,_,

2 ) , ,

( txi tx txi

T x t i x t i x

t







I



I











where

)

,

|

(

I

_t_,_x _t_₁_,_x_,_i _t_₁_,_x_,_i











is a learning rate that controls how fast µand σ2 converges to

new observations.

The weight of the K Gaussian is adjusted as follows:

)

(

)

1

(

₁_, _, _,

,

,xi t xi ti

t

w

M

w







_





Where Mt,i 1is set for the matched Gaussian and Mt,i = 0

for the others. The learning rate _ is used to update the weight

and its value ranges between 0 and 1.

If none of the K Gaussian component matches the current

pixel value, the least weighted component is replaced by a

distribution with the current value as its mean, a high variance,

and a low value of weight parameter is chosen. Thereafter, the

weights are normalized.

The K circulations are sorted in dropping request by w/σ This

requesting moves the most plausible foundation with high

weight and low fluctuation at the top. The main B Gaussian

dispersion which surpass certain limit T are held for the

foundation conveyances. In the event that a little estimation of

T is picked, the foundation model is uni-modular and is

multi-modular, if higher estimation of T is picked. On the off chance

that a pixel It,x does not matches with any of the foundation

part, then the pixel is set apart as forefront.

Training of Error Back Propagation Neural Network

(EBPNN):

In this step after background separation pixel position of the

foreground are store in feature vector where both x, y

co-ordinates are store. This can be understands as the pixel

position of the foreground from each frame are store in a fix

size vector which act as the shape of the object.

● Let us assume a four layer neural network.

● Now consider i as the input layer of the network. While j is consider as the hidden layer of the

network. Finally k is consider as the output layer of

the network.

● If wij represents a weight of the between nodes of

different consecutive layers.

● So the output of the neural network is depend on the

below equation:

where, Xj =  xi . wij - j , 1 i  n; n is the number of inputs

to node j, and j is threshold for node j

● The error of output neuron k after the activation of

the network on the n-th training example (x(n), d(n))

is:

ek(n) = dk(n) – yk(n)

● The network error is the sum of the squared errors of

the output neurons:

● The total mean squared error is the average of the

network errors of the training examples.

● The Back propagation weight update rule is based on

the gradient descent method:

− It takes a step in the direction yielding the

maximum decrease of the network error E. − This direction is the opposite of the gradient

of E.

● Iteration of the Backprop algorithm is usually

terminated when the sum of squares of errors of the

output values for all training data in an epoch is less

than some threshold such as 0.01

ij ij ij

w







ij ij

w

-w









E

(n)

e

E(n)

2 k









N 1 n N 1

AV

E

(n)

(5)

VI. EXPERIMENT AND RESULTS

This area presents the exploratory assessment of the proposed

work of object identification in video. All calculations and

utility measures were executed utilizing the MATLAB device.

The tests were performed on a 2.2 GHz Intel Core i3 machine,

outfitted with 4 GB of RAM, and running under Windows 7

Professional. Investigation done on the video that are of

various condition.

Dataset

Action Dataset were made over homogeneous foundations

with a static camera with 25fps casing rate. The successions

were down examined to the spatial determination of 160x120

pixels and have a length of four seconds in normal

.

Evaluation Parameters

Execution time : This is the time taken by the calculation to

identify the activity in the video or it can likewise be said as

far as the aggregate time taken by the framework, which

incorporates the video perusing time and execution time

totally.

Detection Rate: As the object in a frame is identified by the

pixels. So this parameter is the ratio of number of identified

correct pixels to the total pixels which are correct or incorrect.

DR = True_Positive / (True_Positive + True_Negative)

So if DR is high then method is good.

False Alarm Rate: As selection of pixel as the object or

background is done in object detection but method which

identified false set of pixel in incorrect category is not good.

So this parameter is the ratio of number of pixels which comes

in FP category to the total pixels which are correctly identified

as well as incorrectly identified.

[image:5.612.313.567.85.499.2]

FAR = FP / (FP+TP)

Table 1 Represent different video with actions.

Results:

Actions Execution time in seconds

Previous Work [12] Proposed work

Boxing 21.0225 7.9314

Jogging 24.7468 5.2733

Hand waving 23.2536 6.602

Table 1. Comparison of proposed and previous work on

[image:5.612.305.569.569.670.2]

(6)

Table 1 shows that proposed work has highly decrease the

execution time. Here due to use of neural network for testing

detection of action is quit feasible. While for training different

type of video having various actions can be used.

Actions Detection Rate

Boxing 0.985106 0.993987

Jogging 0.971159 0.993021

Hand

waving

[image:6.612.41.294.137.251.2]

0.990189 0.0419014

Table 2. Comparison of proposed and previous work on action

localization pixels.

Table 2 shows that proposed work has highly increases the

detection rate. It has been gotten that proposed work keep

running on different condition video however evaluation

parameters are higher then past strategy in [12].

Actions False Alarm Rate

Boxing 0.0419014 0.0409304

Jogging 0.0137312 0.00973196

Hand

waving

0.0729493 0.0637819

Table 2 shows that proposed work has highly decrease the

false alarm rate. It has been gotten that proposed work keep

running on different condition video however evaluation

parameters are higher then past strategy in [12].

Actions Human Action

Boxing Standing Boxing

Jogging Walking Jogging

Hand

waving

Standing Waves

Table 3 demonstrates that proposed work has precisely

distinguish the activities what sort of changes obtained in

video. It has been gotten that proposed work keep running on

different condition video however evaluation parameters are

higher then past strategy in [12].

VII. CONCLUSIONS

Video object activity discovery was done in this work. The

key thought is to isolate the background and foreground pixels

in the edge by utilizing histogram include with Gaussian

model. Yield of the Gaussian model go about as the input

vector of the neural system. Qualities acquired from various

assessment parameters demonstrates that proposed work was

better as contrast with past work of SVM. Results

demonstrates that numerous activities are recognize from the

same prepared neural system for various activity of different

condition. Future work will include the spatial data while

elements of the video arrangement was considered.

REFERENCES

1. Ye Luo, Junsong Yuan, Jianwei Lu. ―Finding

spatio-temporal salient paths for video objects discovery‖.

1047-3203/! 2016 Elsevier

2. Dhanashree Vijayrao Madhekar1 , Prof. Mrinal

Rahul Bachute, ―Real Time Object Detection and Tracking using Raspberry‖. Volume 7 Issue No.6

International Journal of Engineering Science and

Computing, June 2017

3. Zhihu Wang, Kai Liao, Jiulongxiong, And Qi Zhang, ―Moving Object Detection Based On Temporal

Information‖, Ieee Signal Processing Letters, Vol. 21,

No. 11, Pp. 1404-1407, November 2014.

4. Wang, H., Kl¨Aser, A., Schmid, C., And Liu, C.-L.

(2013). Dense Trajectories And Motion Boundary

Descriptors For Action Recognition. International

(7)

5. C. Stauffer And W. E. L. Grimson, ―Adaptive

Background Mixture Models For Real-Time Tracking,‖ In Computer Vision And Pattern

Recognition. IEEE Computer Society, 1999, Pp.

2246–2252.

6. Viswanath Gopalakrishnan, Deepu Rajan, And Yiqun

Hu. A Linear Dynamical System Framework For

Salient Motion Detection Ieee Transactions On

Circuits And Systems For Video Technology, Vol.

22, NO. 5, MAY 2012 683.

7. W.T. Lee And H. T. Chen, ‖Histogram-Based Interest Point Detectors,‖ In Proceedings Of The Ieee

Conference On Computer Vision And Pattern

Recognition, 2009, Pp. 1590-1596.

8. Rupesh Kumar Rout A Survey On Object Detectio-

Nand Tracking Algorithms Department Of Computer

Sci- Ence And Engineering National Institute Of

Technology Rourkela Rourkela 769 008, India.

9. Jiuyuehao, Chao Li, Zuwhan Kim, And Zhang

Xiong. Spatio-Temporal Traffic Scene Modeling For

Object Motion Detection, Ieee, Intelligent

Transportation Systems, 2012. 9.

10. Liu Gangl , Ningshangkun ,You Yugan ,Wen

Guanglei And Zhengsiguo, An Improved Moving

Objects Detection Algorithm, In Proceedings Of The

2013 Ieee International Conference On Wavelet

Analysis And Pattern Recognition, Pp. 96-102, 14-17

July, 2013.

11. Idrees, H., Warner, N., And Shah, M. (2014).

Tracking In Dense Crowds Using Prominence And

Neighborhood Motion Concurrence. Image And

Vision Computing, 32(1):14–26.

12. Zhong Zhou, Member, IEEE, Feng Shi, And Wei Wu. ―Learning spatial And Temporal Extents Of human Actions For Action Detection―.DOI

10.1109/TMM.2015.2404779, IEEE Transactions On