Human detection is one of the most active research subjects in computer vision, owing to its wide range of real-world applications. It can be defined as the process of locating all human objects in images and video sequences by identifying the visual characteristics that humans share. Robust detection requires computer vision approaches capable of extracting the features common to different people from images and video, so that humans can be localised and separated from the background. The task is challenging for researchers because different people vary widely in appearance and posture. Human detection thus amounts to determining the regions of an image or video sequence that contain human objects: the system first detects candidate objects and then classifies each detected object as human or non-human based on human features, according to the system's goals.
ABSTRACT: A new classification approach is proposed for pollen grains. Along with image statistics and shape descriptors, histogram (H) coefficient features were used as input to the classifier. Whereas earlier reported approaches were found to be tedious, time-consuming, and less accurate, the present approach classifies pollen grains precisely using SEM images. Improved classifiers based on the Generalized Feed Forward (GFF) Neural Network, Modular Neural Network (MNN), Principal Component Analysis (PCA) Neural Network, and Support Vector Machine (SVM) are explored, with their respective parameters optimized to reduce both time and space complexity. To reduce space complexity, sensitivity analysis is performed to eliminate insignificant parameters from the dataset. Comparing the performance of all these neural networks with respect to MSE, NMSE, and Average Classification Accuracy (ACA), the GFF NN comprising two hidden layers is found to be superior (95% ACA on the CV dataset) to all other classifiers. The new improved classifier algorithm with histogram coefficients provides higher accuracy than the earlier algorithms, which used Discrete Cosine Transform features and Walsh-Hadamard Transform coefficients. The robustness of the classifier to noise is verified on the cross-validation dataset by introducing controlled Gaussian and uniform noise in both input and output. The proposed approach is inexpensive, reliable, and accurate, and can be used without the help of experts in palynology.
R.S. ANU GOWSALYA and S. MIRUNA JOE AMALI explain that traffic classification is of fundamental importance to numerous other network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning. A Naive Bayes estimator is applied to categorize traffic by application. Uniquely, this work capitalizes on hand-classified network data, using it as input to a supervised Naive Bayes estimator. A novel traffic classification scheme is used to improve classification performance when few training data are available. In the proposed scheme, traffic flows are described using discretized statistical features, and flow correlation information is modeled by bag-of-flow (BoF). A novel parametric approach to traffic classification improves classification performance effectively by incorporating correlated information into the classification process. The new classification approach and its performance benefit are then analyzed from both theoretical and empirical perspectives. Finally, a large number of experiments are carried out on large-scale real-world traffic datasets to evaluate the proposed scheme. The experimental results show that the proposed scheme achieves much better classification performance than existing state-of-the-art traffic classification methods.
The idea of using association rule mining in classification rule mining was first introduced in 1997, and was named class association rule mining or associative classification. The first classifier based on association rules was CBA, given by Liu et al. in 1998. Later, improved classifiers were given by Li et al. (CMAR, 2001), Yin et al. (CPAR, 2003), and Fadi et al. (MCAR, 2005). Research continues on designing further improved classifiers.
depending on problem-specific variation in the importance of the alternative classifications). Thus whenever a choice has to be made between competing classifiers, either the success rate or the error rate is the criterion on which the decision is based. Within a Bayesian approach to classification, the problem is generally turned into one of model choice, and the optimal model can then be chosen on the basis of a criterion such as BIC (Schwarz, 1978). A good example is provided by Lee (2001), who uses this criterion for developing a procedure for model choice in neural network classification. But these statistics carry no information regarding the confidence with which the various classifications have been made. We have argued above for the use of SURE CORRECT, SURE INCORRECT and UNSURE as measures of confidence in classifications, so a better comparison between classifiers should be based on simultaneous use of all these measures. To see how this can be implemented, we draw on the work that has been done on classifier acceptance-reject rates (see, e.g., Giacinto, Roli and Bruzzone, 2000, for a summary). In particular, Battiti and Colla (1994) have shown that to compare the performance of different classifiers we need to compare their accuracies over a range of different rejection rates (i.e. different threshold values t), and this can be done by plotting these values in the accuracy-rejection (A-R) plane. In our case the UNSURE proportions at different t values correspond to the rejection rates, while "accuracy" is reflected by either of the SURE categories. We prefer to minimise SURE INCORRECT rather than maximise SURE CORRECT, so to compare different classifiers on a data set we compare the curves each produces when SURE INCORRECT is plotted against UNSURE for a range of values of t. The classifier corresponding to the lowest curve on such a plot is the one to be chosen.
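The comparison just described can be sketched in code. The following minimal illustration uses made-up data; the rule that a prediction is "sure" when its posterior probability for the predicted class exceeds the threshold t is an assumption consistent with the discussion above:

```python
def confidence_counts(probs, predicted, actual, t):
    """Return (SURE CORRECT, SURE INCORRECT, UNSURE) proportions at threshold t."""
    sure_correct = sure_incorrect = unsure = 0
    for p, yhat, y in zip(probs, predicted, actual):
        if p < t:
            unsure += 1                     # confidence too low: reject
        elif yhat == y:
            sure_correct += 1
        else:
            sure_incorrect += 1
    n = len(probs)
    return sure_correct / n, sure_incorrect / n, unsure / n

# Sweeping t traces the curve (SURE INCORRECT vs. UNSURE) used to compare
# classifiers: the classifier with the lowest curve is preferred.
probs     = [0.95, 0.60, 0.85, 0.55, 0.99]   # posterior of the predicted class
predicted = [1, 0, 1, 1, 0]
actual    = [1, 0, 0, 1, 0]
for t in (0.5, 0.7, 0.9):
    print(t, confidence_counts(probs, predicted, actual, t))
```

Raising t moves predictions from the SURE categories into UNSURE, which is exactly the trade-off the A-R plane visualizes.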
Background: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main inputs to these class predictors are high-dimensional data with many variables and few observations. Dimensionality reduction of this feature set significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics, and several prediction tools are available based on these techniques. Results: Studies show that a well-tuned kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least-squares cross-validation for kernel density estimation. We propose a new prediction model with a well-tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model on 9 case studies. Then, we compare its performance (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM), and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies.
Currently, new landscape classification approaches, which relate to landscape ecology and landscape pattern analysis, play an important role in solving integrated problems of natural resource exploitation and environmental protection. Landscape ecology considers a territory as a system that consists of both natural elements, namely geology, topography, soil, climate, and vegetation, and human components, such as residential and land use patterns. Under these approaches, each territory is characterized by analyzing its structure, function, and dynamics, the main characteristics of landscape ecology, which provide an essential, solid, and reliable scientific basis for sustainable development planning. Thus, identifying landscape units and establishing landscape types with diagnostic criteria is usually the first essential step in such studies. This crucial step was conducted at the continental level for different European landscape maps [1-4]. Besides delineating landscape units, landscape pattern analysis has been successfully conducted in different countries, for example by Kim and Pauleit, Swanwick, Ongsomwang and Ruamkaew, Ongsomwang and Sutthivanich, Tudor, Ongsomwang, Van Eetvelde and Antrop, Bosun et al., Käyhkö et al., Blasi et al., Brabyn, Otahel, Lioubimtseva and Defourny, Nogué et al., Perko et al., Romportl et al., and Divíšek et al. In Vietnam, studies of landscape classification are mainly based on the theoretical background of Soviet scientists, using natural geographic zoning. Among those studies, the multi-level landscape classification system of Lap, the first landscape classification system in Vietnam, was applied to classify landscapes in Northern Vietnam. Since then, landscape ecologists and researchers have applied his theoretical concept in different studies to meet practical requirements.
Most of those studies were conducted at regional and national levels at small scales, such as the landscape map of Southern Vietnam and the landscape map of Vietnam at the scale of 1:1,000,000 [24, 25]. These studies provide information on the structure, locations, and other properties of the landscapes of Vietnam, but most of them were produced manually. Therefore, a new landscape classification approach needs to be examined for the Vietnamese territory, particularly in areas of high landscape diversity. Hence, Bac Kan province, which represents such an area, is chosen as the study area.
Crossover determines how two different genomes reproduce to create a new genome of the next generation. In nature, the fittest specimens reproduce the most, and the same goes for genetic algorithms. If less fit models were allowed to reproduce freely, the model could stagnate and never reach an optimal solution. To avoid this, the fittest members, as determined by the fitness function, should reproduce to propagate healthy genes to the next population. How this is determined can, once again, be done in numerous ways. In the case of neural networks, the fitter parent can be more likely to pass on its node and connection layout to its children, while the less fit parent has a smaller chance, but a chance nonetheless. This is because genetic diversity is usually beneficial in promoting the survival of a species, which is a core idea in genetic algorithms as well.
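As a concrete illustration of the fitness-biased crossover described above, the sketch below (not taken from any cited work; the flat-list genome encoding and the probability rule are assumptions) inherits each gene from the fitter parent with probability proportional to its fitness, so the less fit parent still contributes genes:

```python
import random

def crossover(parent_a, parent_b, fit_a, fit_b, rng=random):
    """Produce a child genome; each gene comes from the fitter parent more often."""
    p_a = fit_a / (fit_a + fit_b)   # probability of inheriting a gene from parent_a
    return [ga if rng.random() < p_a else gb
            for ga, gb in zip(parent_a, parent_b)]

random.seed(42)
# Parent A is three times as fit, so roughly 75% of genes come from it,
# while the less fit parent still contributes, preserving diversity.
child = crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], fit_a=3.0, fit_b=1.0)
print(child)
```

The same idea extends to neural network genomes, where the inherited "genes" would be node and connection descriptions rather than scalars.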
The second statement (Table 1, line C-I-3-c-y) suggests that the correct formulation of diagnostic terms according to the 2008 EGSc cannot always help the therapeutic choice. After arriving at the end of the thinking pathway imposed by this classification, and after spending enough time to place a new case in the group of ACG or OAG, one may expect to receive help in therapeutic decision making. Instead, one discovers that the same mechanism may lead both to OAG and ACG. Knowing that the treatment must attack the pathogenic chain, one may ask: what was the use of the whole previous effort? To prevent abandonment, a good classification must have practical finality and must help the doctor to conduct the treatment. As the same gonioscopic form may be produced by different pathogenic mechanisms, and as the specific treatment is addressed to the mechanism, the best solution is to conduct the treatment not after a clinical sign (the gonioscopic aspect) but after the pathogenic mechanism that has produced that particular sign. It is easier to apply pathogenic thinking when guided by a pathogenic classification than by any of the existing classifications. After framing the case in one category, these classifications leave the doctors in the middle of nowhere and force them to a new level of analysis, this time a pathogenic one, before being
The remainder of the paper is structured as follows. The mathematical model is introduced in Section 2, where the problem is shown to be NP-hard. In Section 3, two Integer Programming formulations are proposed and theoretically compared. Numerical results are given in Section 4. These experiments show that, when the optimization problems are solved exactly (with a standard MIP solver), the behavior of the classification rule is promising, but the preprocessing times are enormous. For this reason, a heuristic procedure is also proposed, and its quality and speed are explored. In particular, the rules obtained with this heuristic procedure behave on testing samples similarly to the optimal ones. Some concluding remarks and possible extensions are given in Section 5.
The proposed system can be used to provide safety information for driving assistance, and it can also be used in driverless vehicles. The inverse of circularity is used in the derived formulae for detecting the different shapes of traffic sign boards. The area-based analysis built on circularity provides a fast detection rate with high accuracy, which aids the recognition stage while keeping computation time low. The area-based approach is efficient for unsupervised shape recognition when the detected shape is passed to the classifier. First, the database is trained with images of traffic sign boards in different classes. An SVM classifier is designed to classify objects into these classes based on the features. The experimental results obtained with the area-based algorithm show an excellent detection rate of up to 98%, under the assumption of appropriate climatic and lighting conditions. False detections were also observed; such detections are assigned to an unknown class. The SVM can classify a detected object into the six output classes, but the classification response time is slow in Matlab. The experimental results show that the system achieves a fast response and low computation time with the proposed area-based algorithms when implemented in OpenCV for real-time detection and recognition.
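The circularity measure underlying this kind of shape discrimination can be sketched as follows. The thresholds below are illustrative assumptions, not the values used by the cited system; circularity, defined as 4*pi*Area / Perimeter^2, equals 1 for a perfect circle and decreases for less compact shapes:

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P**2: equals 1.0 for a perfect circle, smaller for other shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def classify_shape(area, perimeter):
    c = circularity(area, perimeter)
    if c > 0.90:                    # close to 1.0: circular sign
        return "circle"
    elif c > 0.70:                  # a square has c = pi/4, about 0.785
        return "rectangle/square"
    else:                           # an equilateral triangle has c of about 0.605
        return "triangle"

# Unit circle: area pi, perimeter 2*pi, so circularity is exactly 1.0.
print(classify_shape(math.pi, 2 * math.pi))   # circle
# Unit square: area 1, perimeter 4.
print(classify_shape(1.0, 4.0))               # rectangle/square
```

In a detection pipeline, area and perimeter would come from contour analysis of the segmented sign region, and the resulting shape label would be passed on to the SVM recognition stage.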
In computational linguistics, one body of work uses social media data to classify ISQs and NISQs, training models on a limited set of manually annotated data (Harper et al., 2009; Li et al., 2011; Zhao and Mei, 2013; Ranganath et al., 2016). Paul et al. (2011) use crowdsourcing techniques to collect human classifications for a large amount of Twitter questions. While social media data has its own set of problems (e.g., length of the turn, ungrammaticality of sentences, spelling mistakes), the data is enriched with information like usernames, hashtags and urls, which helps in identifying the type of the question. Bhattasali et al. (2015) develop a machine-learning mechanism to identify rhetorical questions in the Switchboard Dialogue Act Corpus; Zymla (2014) uses a rule-based approach to heuristically identify rhetorical questions in German Twitter data. The challenges for this type of work are manifold. First, distinguishing ISQs from NISQs based on syntactic properties is difficult because they are mostly structurally indistinguishable. Instead, context and intonation play a much bigger role (Bhatt, 1998; Zymla, 2014). Secondly, only some languages have special lexical markers that might indicate the type of question, e.g. German tends to use discourse particles in NISQs (Maibauer, 1986). Thirdly, expressions such as give a damn, lift a finger or even, which have been identified as generally conveying NISQs (Bhatt, 1998), are not frequent enough in real texts for computational purposes.
pSenti is an established aspect-focused hybrid sentiment classification system, which integrates both lexicon-based and learning-based approaches to opinion mining. The aspects considered are nouns and noun phrases. pSenti is claimed to detect and measure sentiment at the concept level and to provide structured and readable aspect-oriented outputs thanks to its built-in sentiment lexicon and linguistic rules. To tackle domain dependency, it excludes domain-specific aspect words from the machine learning step. In our approach, the sentiment classification is centred around the aspect words, and to reduce domain dependency we do not utilize any external domain-specific sources. We believe that ours is a better technique: aspect words change from one domain to another, and given that sentiment-bearing terms are domain specific, it is better to develop a framework that each dataset can be fit into, rather than attempt cross-domain sentiment classification, especially aspect-focused classification. Their hybrid system performed slightly worse than their purely learning-based system, with the accuracy falling from 86.85% to 82.3%.
The high-dimensional nature of much data in bioinformatics has given rise to a wealth of feature subset selection techniques. Feature selection aims to identify a subset of the most useful features that yields the same results as the original set of features. Feature subset selection is an effective method for removing irrelevant features, improving learning accuracy, and improving classification accuracy. Many methods have been studied for different applications. They are generally classified into three categories: Wrapper, Filter, and Embedded methods.
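As an example of the Filter category, the following sketch scores each feature with a classifier-independent statistic and discards those below a threshold. The choice of variance as the statistic and the threshold value are illustrative assumptions, not prescribed by the text:

```python
def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def filter_features(rows, threshold):
    """Return indices of features whose variance exceeds the threshold."""
    columns = list(zip(*rows))      # transpose: rows of samples -> feature columns
    return [i for i, col in enumerate(columns) if variance(col) > threshold]

data = [
    [1.0, 5.0, 0.0],
    [1.0, 3.0, 0.1],
    [1.0, 9.0, 0.0],
]
# Feature 0 is constant and feature 2 is nearly constant, so only feature 1 survives.
print(filter_features(data, threshold=0.01))   # [1]
```

Unlike Wrapper methods, this filter never consults a classifier, which is what makes it cheap on high-dimensional bioinformatics data; Embedded methods instead fold the selection into the learner's own training objective.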
For the purpose of evaluating the performance of the CMA in classification and subject indexing of documents using Wikipedia concepts and library controlled vocabularies, we have used a dataset called wiki-20 (Medelyan et al., 2008; Medelyan, 2009). The wiki-20 collection consists of 20 computer science (CS) related scientific documents, each manually annotated by fifteen different human teams independently. Each team consisted of two senior undergraduate and/or graduate CS students. The teams were instructed to assign about five key Wikipedia concepts to each document from a set of over two million concepts in English Wikipedia at the time the dataset was compiled. The detailed evaluation results of our key Wikipedia concept detection and ranking method (described in Sections 2.1 and 2.2) on this dataset are reported in (Joorabchi and Mahdi, 2013). As shown in Table 1, the performance of our concept detection and ranking method, measured in terms of consistency with human annotators using Rolling's inter-indexer consistency formula (Rolling, 1981), is on a par with that achieved by humans and outperforms most rival methods such as KEA++ (KEA-5.0) (Medelyan and Witten, 2008), (Grineva et al., 2009), Maui (Medelyan, 2009), and CKE (Mahdi and Joorabchi, 2010).
The vitreous is normally transparent. In the past, this transparency and the associated difficulty of detailed clinical examination resulted in poorly defined concepts of vitreous pathology. With ultrasonography, terms such as "traction", "contraction", "organization", and "opacity" of the vitreous have become standardized nomenclature with respect to both nature and location, and surgical techniques have been applied as necessary. Until now there has been no classification of vitreous opacity in the medical literature. The author's three-degree classification of vitreous opacity by ultrasonography is introduced with a detailed description, and some results obtained with this classification are reported herein.
State-of-the-art text classification algorithms are good at categorizing Web documents into a few categories. However, such classification does not give the user very detailed topic-related class information, because the first two levels of large-scale text hierarchies are often too coarse. In this paper, we propose a method named DNB which, in our experiments, improves classification performance effectively.
Sampling methods for imbalanced data sets. Recently, several attempts have been made in the machine learning community to overcome the two-class imbalanced data set problem, primarily by sampling over the training examples. This follows the analysis of Weiss and Provost, who concluded that the natural class distribution is often not the best distribution for learning a classifier. These sampling methods involve either (i) under-sampling, reducing the negative class by randomly removing negative examples from the training set, or (ii) over-sampling, increasing the positive class by replicating positive examples. Several studies [4, 10] observed that over-sampling with replication does not always improve prediction of the minority (positive) class, because the classifier becomes very specific in the minority class decision region and overfits the examples. Drummond and Holte have shown that the under-sampling approach performs better than the over-sampling method. Under-sampling forces the learning algorithm to focus on different degrees of the class distribution while increasing the presence of the minority class in the training examples, which can produce a more robust classifier. Although these sampling approaches appear appealing for solving imbalanced data problems, at the moment most of these techniques have mainly been tested on two-class problems [4, 10] and on artificial/synthetic data. Removing or increasing training examples is not suitable in this research domain because of the multi-class nature of the training examples and the limited amount of real protein data. Furthermore, in the protein fold classification problem, we would like to learn sequence-fold relationships from sequence features using non-redundant protein examples with low sequence similarities.
Hence, we would like to preserve all the original training examples and propose a method that is capable of performing learning over these multi-class imbalanced data.
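For reference, the random under-sampling discussed above can be sketched as follows for a binary problem (an illustrative sketch with made-up data; as argued above, this strategy is not adopted here precisely because of the multi-class, data-scarce protein fold setting):

```python
import random
from collections import Counter

def undersample(examples, labels, negative_label, seed=0):
    """Randomly drop negatives until the two classes are balanced."""
    rng = random.Random(seed)
    positives = [(x, y) for x, y in zip(examples, labels) if y != negative_label]
    negatives = [(x, y) for x, y in zip(examples, labels) if y == negative_label]
    kept = rng.sample(negatives, len(positives))   # discard surplus negatives
    balanced = positives + kept
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]

X = list(range(10))
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]        # 2 positives, 8 negatives
Xb, yb = undersample(X, y, negative_label=0)
print(Counter(yb))                          # two examples of each class remain
```

The sketch makes the drawback concrete: six of the ten original examples are thrown away, which is exactly the loss the proposed method avoids by preserving all training examples.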