The proposed human activity recognition system is illustrated in Figure 1. In our approach, the dataset, containing activities such as running, walking, jogging, waving, clapping, and boxing, is divided into two sections: a training set and a testing set. The proposed system likewise operates in two phases: training and testing. In the training stage, we extract histogram of oriented gradients (HOG) features from each video frame. The HOG feature vectors of n consecutive video frames are combined to generate an action pattern. The generated action patterns are used to train a probabilistic neural network (PNN) classifier over all activities. In the testing stage, we generate the action pattern for each human activity and feed it into the PNN classifier for action recognition.
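The pattern-generation step above can be sketched as follows. This is a minimal illustration, assuming the action pattern is a simple concatenation of per-frame HOG vectors; the authors' exact aggregation and the PNN training step are not shown, and the 3780-dimensional HOG size is a hypothetical example value.

```python
import numpy as np

def action_pattern(hog_vectors, n=5):
    """Concatenate the HOG vectors of n consecutive frames into one
    action-pattern vector (a sketch of the pattern-generation step;
    the paper's exact aggregation may differ)."""
    assert len(hog_vectors) >= n, "need at least n frames"
    return np.concatenate(hog_vectors[:n])

# Hypothetical example: 5 frames, each with a 3780-dim HOG vector
frames = [np.random.rand(3780) for _ in range(5)]
pattern = action_pattern(frames, n=5)
print(pattern.shape)  # (18900,)
```

One such pattern would be produced per sliding window of n frames and used as a single training sample for the classifier.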
Video is a sequence of images, and an action is a sequence of small movements. Human detection is the first task of our algorithm. Navneet Dalal and Bill Triggs  introduced normalized descriptor blocks known as Histogram of Oriented Gradients (HOG) descriptors; they tiled the detection window with a dense overlapping grid of HOG descriptors and used an SVM for human detection. Wei-Lwun Lu and James J. Little  proposed a HOG descriptor constructed by converting the tracking region into grids of HOG descriptors, then used Principal Component Analysis (PCA) to project the descriptor onto a linear subspace. For recognition, Maximum Likelihood Estimation (MLE) was performed on the observations using a Hidden Markov Model classifier. William Brendel and Sinisa Todorovic  presented an exemplar-based approach for identifying human postures, in which HOG features were computed and a dictionary of discriminative features was learned. The videos were represented as temporal sequences of the learned code-words, and activities in a query video were detected by aligning the query and exemplar time series. Huang and Hsieh  recommended the histogram of motion history image (MHI-HOG) for action recognition. It condenses an action sequence into a motion history image by accumulating frame differences over the sequence; the motion history image is then converted into a HOG descriptor, and an SVM classifier is used to categorize the actions.
The choice of fusion technique for an image or algorithm varies depending on the conditions or on the information required during feature extraction. The study in  highlights that the selection of a fusion method is problem-dependent. Hence, improvements to the fusion method should be made to reduce the loss of required information. The Histogram of Oriented Gradients (HOG) is widely used in human detection, especially in pedestrian video, but its detection accuracy has not yet reached an acceptable rate for discriminating between data sets recorded in different scenarios, such as the standard case versus cases with additional features (e.g., carrying a bag or wearing a coat). Nevertheless, because HOG performs well in pedestrian detection, the method has been applied as an image representation for individual gait recognition . Recognizing a person's gait by analyzing and identifying the unique features of their way of walking is a challenging task. In this case, an improved feature extraction method is needed to reduce computational complexity and increase recognition accuracy.
This video forgery detection technique is mainly useful for detecting spatial and temporal copy-paste tampering. Detecting this type of tampering in videos is challenging because the forged patch may vary in size, compression rate, and frame type (I, B, or P), or undergo other changes such as scaling and filtering. The algorithm in  is based on Histogram of Oriented Gradients (HOG) feature matching and video compression properties. The advantage of HOG features is that they are robust against various signal-processing manipulations. An image or frame can be represented by a set of local histograms . These histograms count the occurrences of gradient orientations in a local spatial region of the image known as a cell; typical cell sizes are 4x4, 6x6, or 8x8 pixels. To extract the HOG features, the gradients of the image are computed first, followed by building a histogram of orientations in each cell. Finally, the histograms obtained from the cells in a block, which may comprise 2x2 or 3x3 cells, are normalized, giving the HOG descriptor of that block.
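The three extraction steps just described (gradients, per-cell orientation histograms, per-block normalization) can be sketched with a minimal numpy implementation. This is an illustrative simplification assuming 8x8 cells, 9 unsigned bins, non-overlapping 2x2-cell blocks, and no interpolation between bins, so it is coarser than a full Dalal-Triggs HOG.

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, eps=1e-6):
    """Minimal unsigned-gradient HOG sketch: gradients -> per-cell
    orientation histograms -> L2-normalised 2x2-cell blocks."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned [0, 180)
    bin_idx = (ang // (180.0 / bins)).astype(int) % bins
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            # magnitude-weighted vote of each pixel into its orientation bin
            hist[i, j] = np.bincount(b.ravel(), weights=m.ravel(),
                                     minlength=bins)
    blocks = []
    for i in range(ch - 1):                              # overlapping 2x2 blocks
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            blocks.append(v / np.sqrt(np.sum(v**2) + eps**2))
    return np.concatenate(blocks)

img = np.random.rand(64, 64)
d = hog_descriptor(img)
print(d.shape)  # 7*7 blocks x 4 cells x 9 bins = (1764,)
```

For a 64x64 frame this yields an 8x8 grid of cells and 49 overlapping blocks, each contributing a 36-dimensional normalized vector.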
This paper presents an effective nighttime vehicle detection system that combines a novel bioinspired image enhancement approach with a weighted feature fusion technique. Inspired by the retinal mechanism in natural visual processing, we develop a nighttime image enhancement method by modeling the adaptive feedback from horizontal cells and the center-surround antagonistic receptive fields of bipolar cells. Furthermore, we extract features based on the convolutional neural network, histogram of oriented gradient, and local binary pattern to train the classifiers with support vector machine. These features are fused by combining the score vectors of each feature with the learnt weights. During detection, we generate accurate regions of interest by combining vehicle taillight detection with object proposals. Experimental results demonstrate that the proposed bioinspired image enhancement method contributes well to vehicle detection.
Abstract— Object detection and feature extraction are the main initial steps in digital image processing for surveillance systems. A single object can be easily detected in an image; multiple objects in an image can be detected by using different object detectors simultaneously. Object detection methods have a wide range of applications in areas including robotics, medical image analysis, surveillance, military operations, and security. Feature extraction is the process of extracting useful information from the input image. Here, an object is detected and its features are then extracted using the Histogram of Oriented Gradients (HOG) algorithm. This paper explains an algorithm that takes a video as input and extracts frames from it. Through the feature extraction algorithm, we detect objects and extract their features; the detected objects are then classified according to shape-based criteria.
Recognizing whether a person is male or female is an easy task for a human but very difficult for a machine or robot. Gender identification from a person's voice is comparatively easier than from facial images. This binary classification is useful in many applications such as targeted advertising, surveillance systems, human-machine interaction, content-based indexing and searching, demographic collection, and biometrics. In the present scenario, face identification, gesture recognition, and gender classification play an important role in delivering secure, reliable, and individualized services. In the past, gender recognition was studied mainly in the fields of cognition and psychology, but researchers have since begun to approach the problem more technically, and gender recognition is now receiving more and more attention. Gender classification research started in the 1990s: Golomb et al. and Cottrell and Metcalfe first used face images manually and applied a neural network classifier to classify gender. Features can be broadly classified into two categories: geometric-based and appearance-based, also known as local and global features, respectively. Appearance-based methods operate on the pixels of an image, while geometric-based methods relate to properties of the face such as the eyes, nose, chin, and eyebrows. Many feature extraction methods have been used for gender classification. The global feature method we present in this paper has the potential to identify gender. In this paper, histogram equalization is first applied to equalize illumination changes in color images, and HOG feature methods are then applied for facial feature extraction.
In the last decade, much effort has been devoted to learning effective descriptors via deep learning. Different types of deep neural networks have been designed to learn rich discriminative features, and strong performance has been achieved in AED. Hasan et al.  proposed a convolutional autoencoder framework for reconstructing a scene, where reconstruction costs were computed to identify abnormalities. Sabokrou et al.  proposed a deep network cascade for AED: in the first stage, most normal patches were rejected by a small stack of autoencoders, and a deep convolutional neural network (CNN) was then applied to extract discriminative features for the final decision. Hu et al.  proposed a deep incremental slow feature analysis (D-IncSFA) network to learn the slow features in a scene. Feng et al.  proposed a deep Gaussian mixture model (D-GMM) network to model normal events. Zhou et al.  proposed a spatio-temporal CNN to learn joint features of both appearance and motion. Although a deep neural network can automatically learn useful descriptors, handcrafted features still play a dominant role and are widely used in both the image and video domains, because they benefit from human ingenuity and prior knowledge, and enjoy flexibility and computational efficiency without relying on large sets of training samples.
In the testing phase, two input sample images are given to the pre-processing step. This step removes unwanted noise, resizes the input images, and converts them from RGB to greyscale; the input images are then fused by decomposition using robust principal component analysis and the quad-tree technique. The fused image is subjected to level-set segmentation, producing a segmented image, which is passed to feature extraction. Feature extraction, performed with the complete local binary pattern combined with the pyramid histogram of oriented gradients, extracts the features of the image. The extracted features are then passed to classification, where an ART classifier produces the final result. The block diagram of the proposed system is shown in Figure 1.
Optical flow represents the absolute motion between two frames, which contains motion from many sources, i.e., foreground object motion and background camera motion. If camera motion is mistaken for action motion, it may corrupt the action classification. Various types of camera motion can be observed in realistic videos, e.g., zooming, tilting, and rotation. In many cases, camera motion is translational and varies smoothly across the image plane. The authors of  proposed the motion boundary histograms (MBH) descriptor for human detection, computed by taking derivatives separately for the horizontal and vertical components of the optical flow. The descriptor encodes the relative motion between pixels. Since MBH represents the gradient of the optical flow, locally constant camera motion is removed and information about changes in the flow field (i.e., motion boundaries) is kept. MBH is therefore more robust to camera motion than optical flow and more discriminative for action recognition. The MBH descriptor separates the optical flow ω = (u, v) into its horizontal and vertical components; spatial derivatives are computed for each, orientation information is quantized into histograms, and the magnitude is used for weighting, yielding an 8-bin histogram for each component. Compared to video stabilization [31, 32] and motion compensation , this is a simpler way to discount camera motion. The MBH descriptor significantly outperforms the HOF descriptor in our experiments. For both HOF and MBH descriptor computation, we reuse the dense optical flow already computed to extract dense trajectories , which makes our feature computation process more efficient. Here, the size of the MBH feature is 2050.
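The core MBH idea (differentiate each flow component, then build magnitude-weighted orientation histograms) can be sketched as below. This is an illustrative global version, assuming a precomputed flow field; the descriptor in the text is actually accumulated over spatio-temporal cells along dense trajectories. Note how a spatially constant flow, such as pure camera translation, produces zero derivatives and thus contributes nothing.

```python
import numpy as np

def mbh(flow_u, flow_v, bins=8):
    """Motion Boundary Histogram sketch: spatial derivatives of each
    optical-flow component, quantised into magnitude-weighted
    orientation histograms (one bins-length histogram per component)."""
    hists = []
    for comp in (flow_u, flow_v):
        gy, gx = np.gradient(comp.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        idx = (ang / (2 * np.pi / bins)).astype(int) % bins
        hists.append(np.bincount(idx.ravel(), weights=mag.ravel(),
                                 minlength=bins))
    return np.concatenate(hists)

u, v = np.random.rand(32, 32), np.random.rand(32, 32)
h = mbh(u, v)
print(h.shape)  # (16,)

# Constant (camera-translation-like) flow yields an all-zero histogram,
# which is exactly why MBH suppresses locally constant camera motion.
print(mbh(np.ones((8, 8)), np.ones((8, 8))).sum())  # 0.0
```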
One thing to note is that the orientation computation uses a radian-to-degree conversion that returns values between -180° and 180°. Since unsigned orientations are desired in this implementation, 180° is added to orientations below 0°. The next step is to compute the cell histograms. Each histogram divides the gradient angle range into a predefined number of bins. In this paper, each cell, as shown in Figure 1(c), is 8x8 pixels in size and has 9 bins covering the [0°, 180°] orientation interval. For each pixel's orientation, the corresponding orientation bin is found and the orientation's magnitude |m(x, y)| is voted into that bin. Contrast normalization is applied to the local responses to obtain better invariance to illumination, shading, etc. To normalize the cells' orientation histograms, they are grouped into blocks (3x3 cells): a measure of the local histogram values is accumulated over each block, and the result is used to normalize the cells in the block. Although Dalal and Triggs  suggest four different methods for block normalization, L2-norm normalization is implemented here, using equation (5).
Image processing is a method of performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Image analysis techniques are widely used in agricultural product and food engineering for identifying cereals, pulses, and plant diseases, and for counting flowers, fruits, and grains; they are also used for detecting cracks, dark spots, etc. Image processing techniques can enhance agricultural practices by improving the accuracy and consistency of processes while reducing farmers' manual monitoring.
Crop diseases may lead to severe losses in agricultural yield, so classifying and identifying crop diseases is essential for improving yield. Various methods have been proposed to identify crop diseases, but accuracy has remained an issue across the research performed so far. In the proposed system, the image is acquired and pre-processed. The pre-processed image is subjected to K-means clustering to isolate the infected part of the leaf, and the infected part is subjected to morphological processing to expand the infected area. The histogram of oriented gradients (HOG) algorithm is then applied to the infected part to extract features, and an SVM classifier identifies and classifies the diseases based on the extracted features.
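The K-means step above can be sketched with a tiny numpy implementation that clusters pixel colours to separate an infected region from healthy tissue. This is an illustrative toy (deterministic initialization, synthetic two-tone "image"), not the authors' exact implementation, and the cluster count k=2 is an assumption.

```python
import numpy as np

def kmeans_segment(pixels, k=2, iters=10):
    """Minimal K-means over (N, 3) colour pixels: returns a cluster
    label per pixel. Centres are initialised by spreading indices
    across the array so the demo is deterministic."""
    idx = np.linspace(0, len(pixels) - 1, k).astype(int)
    centers = pixels[idx].astype(float)
    for _ in range(iters):
        # squared distance of every pixel to every centre
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    return labels, centers

# Synthetic leaf: 50 dark ("infected") and 50 bright ("healthy") pixels
img = np.vstack([np.full((50, 3), 0.1), np.full((50, 3), 0.9)])
labels, _ = kmeans_segment(img, k=2)
print(sorted(set(labels.tolist())))  # [0, 1]
```

The pixels assigned to the darker cluster would then be passed on to the morphological and HOG stages.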
H. N. Patel et al.  worked on the efficient location of fruit on the tree, one of the major requirements for a fruit harvesting system, and implemented fruit detection using shape analysis. The algorithm was composed of edge detection, region labeling, and circle-fitting-based detection. Edge detection combined with a circle-fitting algorithm was applied for the automatic segmentation of fruit in the image. The results showed that the method can accurately segment occluded fruits with an efficiency of 98%, and the average yield measurement error was found to be 31.4%. It was designed to solve the problems of varying illumination and fruit occlusion through segmentation and shape-based detection.
In the proposed system, two different types of features, HOG and LPQ, are fused to train a classifier for face recognition. From many existing systems, we observe that the abilities of LPQ and HOG to describe an object differ, so we need to determine the descriptive strength of HOG and LPQ on the objects, or on the regions of interest (ROIs) of the objects. HOG and LPQ are both based on histogram statistics: HOG features are divided into a cell structure and merged into a block structure for normalization. To match this with LPQ, we compute the LPQ feature statistics over the same cell structure and, for the same block structure, use a fuzzy-logic-based support vector machine (Fuz-SVM) to learn the two features and compute the confidence of HOG and LPQ. Finally, we calculate the fused HOG-LPQ features, which are well suited to cases where the substructures of the object are not similar. In a face, because of the different substructures such as the eyes, nose, lips, and chin, the descriptive abilities of HOG and LPQ differ across partial faces. The proposed method can be used for face liveness detection, and the system exploits the advantages of both features.
Plants play a vital role in the cycle of nature. Plants are the only organisms which produce food by converting light energy from the sun. They also help in maintaining the oxygen balance on earth by emitting oxygen and absorbing carbon dioxide, and they have plenty of uses in medicine and industry. But plant species are vast in number, and identifying the large number of existing plant species in the world is a tedious and time-consuming task for a human. Hence, an automatic plant identification tool is very useful, even for experienced botanists. In this paper, we propose a technique to identify plant leaf images. For training and testing, we used a publicly available dataset called the Flavia leaf dataset. Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) are used to extract features, and a multiclass Support Vector Machine (SVM) is applied to classify the leaf images. We observed that the accuracy of HOG+SVM with cell sizes of 2x2, 4x4, and 8x8 is 77.5%, 81.25%, and 85.31%, respectively. The accuracy of LBP+SVM is 40.6%, and the combination of HOG and LBP based features with SVM achieved 91.25% accuracy. The experimental results indicate the effectiveness of HOG+LBP with SVM over the HOG+SVM and LBP+SVM techniques.
After summarizing the limitations of existing object detection algorithms, this paper presents object detection based on deep learning from small samples. The proposed algorithm contains the following modules: preprocessing, feature extraction, and a support vector machine, which together realize object detection in a scene. Experimental results show that the proposed method is significantly better than existing techniques in both subjective and objective terms. In future work, we will combine object detection with attitude estimation to make the object detector better suited for use in service robotics.
4) Histogram Computation: Each detection window is divided into cells of 8x8 pixels, and for each cell we compute the histogram of gradients by accumulating votes into bins for each orientation, where votes are weighted by the gradient magnitude. We use the integral-image representation to compute the HOG of each cell efficiently. The HOG descriptor is then the vector of components of the normalized cell histograms from all of the block regions. 5) Object Detection: The final step in object recognition using Histogram of Oriented Gradient descriptors is to feed the descriptors into a recognition system. The final PHOG descriptor for an image is the concatenation of all HOG vectors at each pyramid resolution.
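The integral-image trick mentioned above can be sketched as follows: one integral image is built per orientation bin, after which the bin count of any rectangular cell is recovered in O(1) with four lookups. Below is a minimal sketch of the summed-area table itself, using a plain value array for clarity; in the HOG setting the array would hold the magnitude votes of one orientation bin.

```python
import numpy as np

def integral_image(a):
    """Summed-area table: ii[r, c] = sum of a[:r+1, :c+1]."""
    return a.cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, r0, c0, r1, c1):
    """Inclusive sum of a[r0:r1+1, c0:c1+1] via four table lookups."""
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

a = np.arange(16).reshape(4, 4)
ii = integral_image(a)
print(region_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30
```

With one table per bin, every cell histogram in the detection window costs a fixed number of lookups regardless of cell size, which is what makes dense multi-scale scanning affordable.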
In the context of improved navigation for micro aerial vehicles, a new scene recognition visual descriptor, called the spatial color gist wavelet descriptor (SCGWD), is proposed. SCGWD was developed by combining the proposed Ohta color-GIST wavelet descriptors with census transform histogram (CENTRIST) spatial pyramid representation descriptors for categorizing indoor versus outdoor scenes. A binary and a multiclass support vector machine (SVM) classifier with linear and non-linear kernels were used to classify indoor versus outdoor scenes and indoor scenes, respectively. In this paper, we also discuss the feature extraction methodology of several state-of-the-art visual descriptors and of four proposed visual descriptors (Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, enhanced Ohta color histogram descriptors, and SCGWDs) from an experimental perspective. The proposed enhanced Ohta color histogram descriptors, Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, SCGWD, and state-of-the-art visual descriptors were evaluated using the Indian Institute of Technology Madras Scene Classification Image Database two (IITM SCID2), an Indoor-Outdoor Dataset, and the Massachusetts Institute of Technology indoor scene classification dataset (MIT-67). Experimental results showed that the indoor versus outdoor scene recognition algorithm employing SVM with SCGWDs produced the highest classification rates (CRs): 95.48% and 99.82% using the radial basis function (RBF) kernel, and 95.29% and 99.45% using the linear kernel, for the IITM SCID2 and Indoor-Outdoor datasets, respectively. The lowest CRs, 2.08% and 4.92%, respectively, were obtained when the RBF and linear kernels were used with the MIT-67 dataset. In addition, higher CRs, precision, recall, and area under the receiver operating characteristic curve values were obtained for the proposed SCGWDs in comparison with state-of-the-art visual descriptors.
As the segmented hand image still contains noise, various image morphological operations such as erosion and dilation  are performed on it. Smaller connected components of fewer than 300 pixels in the background of the image are eliminated, and only the biggest connected component, i.e., the hand, is retained in the image using labeling. The labeled image is then filtered for noise removal. In image processing, noise reduction is a typical pre-processing step, most often performed to enhance the result of further processing such as edge detection. The median filter is a non-linear filter used to remove noise; a 3x3 median filter is applied to smooth the hand image, which is used for edge detection in the next stage.
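The 3x3 median filtering step can be sketched with a small numpy implementation. This is a minimal sketch using edge padding; production code would typically use a library routine such as scipy.ndimage's median filter instead of explicit loops.

```python
import numpy as np

def median_filter_3x3(img):
    """Replace every pixel by the median of its 3x3 neighbourhood
    (edge-padded), removing isolated impulse (salt-and-pepper) noise."""
    padded = np.pad(img, 1, mode='edge')
    out = np.empty(img.shape, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out

# A single salt-noise pixel in a flat image is completely removed
img = np.zeros((5, 5))
img[2, 2] = 255.0
print(median_filter_3x3(img)[2, 2])  # 0.0
```

Unlike a mean filter, the median suppresses the outlier without blurring it into the neighbouring pixels, which is why it is preferred before edge detection.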