classification power [12]. The viability of machine learning approaches such as artificial neural networks (ANN) has also been explored as a useful technology to assist statistical speech recognition [13-17], owing to their discriminative and adaptive learning properties. The capability of ANNs has been demonstrated in many roles, such as isolated-word recognition, phoneme classification, and probability estimation for speech recognizers [13, 14]. While ANNs offer strong discriminative training, especially for short-time speech signals such as isolated words, they struggle to adequately model the temporal variations of long-time speech signals. Hybrid solutions that draw on the strengths of both the ANN and HMM frameworks have also been demonstrated in speech recognition technology. In the work of Trentin et al. [20], a hybrid ANN/HMM architecture achieved speech recognition rates of 54.9-89.27% at SNRs of 5-20 dB for isolated utterances. Reynolds et al. have also employed the statistical Gaussian mixture model (GMM) framework, a variant of HMM, to achieve speech recognition rates of 80.8-96.8% [21, 22] in a speaker-independent system using isolated utterances. Though there is considerable research activity in continuous speech recognition, most of it concentrates on the correct detection or recognition of words or their positions in utterances, for example detecting when a speaker has used a word that is not in the vocabulary of the continuous speech. The goal of this work, however, is to identify speakers from the waveform distribution of their continuous-speech utterances. This may be particularly useful for application systems that require detection of speakers in natural conversation environments, such as forensic and security activities.
Qiguang et al. (1996) pioneered the idea of hybrid VQ/GMM in the speaker recognition field. Conventional GMM generates a Gaussian mixture model for each enrolled speaker, with the model statistics estimated using acoustic features covering the entire acoustic space. They argued that these statistics can be better estimated by first clustering (vector-quantizing) the acoustic space into several subspaces. Each subspace is then represented by a number of Gaussian mixture models whose parameters are determined using only the relevant acoustic features belonging to that subspace. They therefore proposed vector-quantization based Gaussian mixture models (VQGMM), and the system was used in the 1996 NIST Speaker Identification Evaluation. According to the official evaluation results, the system generally produced top scores among all participating sites, and for some test subsets (short utterances) the VQGMM system yielded the best scores.
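As a rough illustration of the VQGMM idea (not the authors' exact system), the following Python sketch vector-quantizes acoustic frames with k-means and fits a small Gaussian mixture inside each subspace; the random features, component counts, and scoring choices are placeholders.

```python
# VQGMM sketch: cluster the acoustic space into subspaces, then fit a
# subspace-local Gaussian mixture from only the frames assigned to it.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_vqgmm(features, n_subspaces=4, n_components=2, seed=0):
    """features: (n_frames, n_dims) acoustic vectors for one speaker."""
    vq = KMeans(n_clusters=n_subspaces, random_state=seed).fit(features)
    gmms = [GaussianMixture(n_components=n_components, covariance_type="diag",
                            random_state=seed).fit(features[vq.labels_ == k])
            for k in range(n_subspaces)]
    return vq, gmms

def score_vqgmm(vq, gmms, features):
    """Average per-frame log-likelihood under the subspace-local mixtures."""
    labels = vq.predict(features)          # route each frame to its subspace
    ll = np.empty(len(features))
    for k, gmm in enumerate(gmms):
        idx = labels == k
        if idx.any():
            ll[idx] = gmm.score_samples(features[idx])
    return ll.mean()

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 12))         # stand-in for MFCC frames
test = rng.normal(size=(100, 12))
vq, gmms = train_vqgmm(train)
print("avg log-likelihood:", score_vqgmm(vq, gmms, test))
```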
Better acquisition protocols and analysis techniques are making it possible to use fMRI to obtain highly detailed visualizations of brain processes. In particular, we focus on the reconstruction of natural images from BOLD responses in visual cortex. We expand our linear Gaussian framework for percept decoding with Gaussian mixture models to better represent the prior distribution of natural images. Reconstruction of such images then boils down to probabilistic inference in a hybrid Bayesian network. In our set-up, different mixture components correspond to different character categories. Our framework can automatically infer higher-order semantic categories from lower-level brain areas. Furthermore, the framework can gate semantic information from higher-order brain areas to enforce the correct category during reconstruction. When categorical information is not available, we show that automatically learned clusters in the data give a similar improvement in reconstruction. The hybrid Bayesian network leads to highly accurate reconstructions in both supervised and unsupervised settings.
This paper fills in this important gap by considering a more flexible model for imputation. To achieve robustness against model misspecification, we develop an imputation procedure based on Gaussian mixture models. The Gaussian mixture model is a flexible model that can handle outliers, heterogeneity and skewness. McLachlan and Peel (2004) and Bacharoglou (2010) argued that any continuous distribution can be approximated by a finite Gaussian mixture distribution. The proposed method thus makes a nice compromise between efficiency and robustness. It is semiparametric in the sense that the number of mixture components is chosen automatically from the data. Parameter estimation in the proposed method is based on the EM algorithm, and its implementation is relatively simple and efficient.
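A hedged sketch of this style of imputation, assuming scipy and sklearn: fit a mixture on the complete cases, then fill each missing entry with its conditional mean under the mixture. This simplifies the paper's procedure (no EM over incomplete rows, and the component count is fixed by hand rather than selected from the data).

```python
# GMM-based imputation sketch: posterior-weighted conditional means.
# Assumes each row has at least one observed value.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmm_impute(X, n_components=3, seed=0):
    complete = X[~np.isnan(X).any(axis=1)]
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(complete)
    X_imp = X.copy()
    for i, row in enumerate(X):
        m = np.isnan(row)
        if not m.any():
            continue
        o = ~m
        # Posterior responsibility of each component given the observed dims.
        logp = np.array([np.log(w) + multivariate_normal.logpdf(
                             row[o], mu[o], cov[np.ix_(o, o)])
                         for w, mu, cov in zip(gmm.weights_, gmm.means_,
                                               gmm.covariances_)])
        resp = np.exp(logp - logp.max())
        resp /= resp.sum()
        # Mixture of per-component conditional means for the missing dims.
        cond = np.zeros(m.sum())
        for r, mu, cov in zip(resp, gmm.means_, gmm.covariances_):
            reg = cov[np.ix_(m, o)] @ np.linalg.solve(cov[np.ix_(o, o)],
                                                      row[o] - mu[o])
            cond += r * (mu[m] + reg)
        X_imp[i, m] = cond
    return X_imp

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X[rng.random(X.shape) < 0.1] = np.nan   # 10% of values missing at random
print(np.isnan(gmm_impute(X)).sum())    # 0: all gaps filled
```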
Abstract: Clustering is one of the most important tasks in data mining, and intelligent clustering is also part of machine learning. Over the past decade, many clustering algorithms have been introduced to improve results; these algorithms extract patterns via unsupervised decision trees. In this work, a binary cuckoo search based decision tree is combined with Expectation-Maximization (EM) clustering through Gaussian mixture models (GMM) to improve clustering performance. Numerical, mushroom, and MIST datasets are used to extract patterns via clustering. Performance is evaluated in terms of measures such as sensitivity, specificity, and accuracy.
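For concreteness, a minimal example of the EM/GMM clustering stage alone (the cuckoo search decision-tree component is omitted, and the iris data stands in for the paper's datasets):

```python
# EM clustering with a Gaussian mixture: fit_predict runs EM and assigns
# each point to its most responsible component.
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)          # stand-in numerical dataset
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
print(labels[:10])
```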
Abstract Microalgae are unicellular organisms with different shapes, sizes and structures. Classifying microalgae manually can be an expensive task, because thousands of microalgae can be found in even a small sample of water. This paper presents an approach to automatic/semi-automatic classification of microalgae based on semi-supervised and active learning algorithms, using Gaussian mixture models. The results show that the approach has an excellent cost-benefit relation, classifying more than 90% of microalgae in a well-distributed way and outperforming the supervised SVM algorithm.
The appearance of a face is severely altered by illumination conditions, which makes automatic face recognition a challenging task. In this paper we propose a Gaussian mixture model (GMM)-based human face identification technique built in the Fourier (frequency) domain that is robust to illumination changes and, unlike many existing methods, does not require "illumination normalization" (removal of illumination effects) prior to application. The importance of the Fourier-domain phase in human face identification is a well-established fact in signal processing. A maximum a posteriori (MAP) estimate based on the posterior likelihood is used to perform identification, achieving misclassification error rates as low as 2% on a database that contains images of 65 individuals under 21 different illumination conditions. Furthermore, a misclassification rate of 3.5% is observed on the Yale database with 10 people and 64 different illumination conditions. Both sets of results are significantly better than those obtained from traditional PCA and LDA classifiers. Statistical analysis pertaining to model selection is also presented.
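The MAP decision rule itself is compact; the sketch below classifies with class-conditional GMMs and priors, with random vectors standing in for the Fourier-domain phase features:

```python
# MAP identification with per-person GMMs: argmax_c log p(x|c) + log p(c).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(features_by_person, n_components=2, seed=0):
    return {p: GaussianMixture(n_components=n_components,
                               random_state=seed).fit(F)
            for p, F in features_by_person.items()}

def map_identify(gmms, priors, x):
    scores = {p: g.score_samples(x[None, :])[0] + np.log(priors[p])
              for p, g in gmms.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
people = {p: rng.normal(loc=i, size=(200, 16))   # stand-in phase features
          for i, p in enumerate(["anna", "ben", "carl"])}
gmms = fit_class_gmms(people)
priors = {p: 1 / 3 for p in people}              # uniform class priors
print(map_identify(gmms, priors, people["ben"][0]))
```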
AI implementation in traffic lights requires a process capable of recognizing and calculating traffic density, as studied by Indrabayu et al., who used the Viola-Jones method to detect and count vehicles [2]. That study applied image processing techniques to track and count vehicle objects. Although it managed to recognize and count vehicle objects, it did not reach the best accuracy and could only detect one type of vehicle; development has therefore continued with optimal identification in a Region of Interest (ROI) using Gaussian mixture models (GMM) under heavy traffic conditions [3]. "Heavy traffic" is a condition in which vehicles move slowly due to crowded traffic along a road section, or a complex traffic junction that can be described as chaotic traffic [4].
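A hedged sketch of GMM-based foreground detection in an ROI, using OpenCV's MOG2 background subtractor (itself a per-pixel Gaussian mixture); the video path, ROI coordinates, and area threshold are placeholders, not values from the study:

```python
# GMM background subtraction restricted to an ROI; moving vehicles appear
# as foreground blobs, which are counted via contours.
import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")            # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
x, y, w, h = 100, 200, 400, 150                  # example ROI over one lane

while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[y:y + h, x:x + w]                # restrict work to the ROI
    mask = subtractor.apply(roi)                 # foreground = moving objects
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            np.ones((3, 3), np.uint8))  # suppress noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 500]
    print("vehicles in ROI:", len(vehicles))
cap.release()
```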
This paper proposes a voice-morphing system for people suffering from laryngectomy, the surgical removal of all or part of the larynx (voice box), particularly performed in cases of laryngeal cancer. A primitive method of achieving voice morphing is to extract the source's vocal coefficients and then convert them into the target speaker's vocal parameters. In this paper, we deploy Gaussian mixture models (GMM) for mapping the coefficients from source to destination. However, the traditional GMM-based mapping approach suffers from over-smoothing of the converted voice. We therefore propose a method to perform efficient voice morphing and conversion based on GMM that overcomes the over-smoothing effects of the traditional method. It uses glottal waveform separation and prediction of excitations; the results show that not only is over-smoothing eliminated, but the transformed vocal-tract parameters also match the target. Moreover, the synthesized speech thus obtained is found to be of sufficiently high quality. The proposed GMM-based voice-morphing approach is critically evaluated on various subjective and objective evaluation parameters, and an application of voice morphing for laryngectomees deploying this approach is recommended.
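For reference, a compact sketch of the conventional joint-density GMM mapping that this paper improves on: fit a mixture on stacked [source; target] frames, then convert each source frame with the mixture's conditional expectation. Dimensions and data are placeholders.

```python
# Joint-density GMM mapping (conventional baseline): condition the joint
# mixture on the source half to predict the target half.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

d = 13                                           # e.g. 13 cepstral coefficients
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(400, d)), rng.normal(size=(400, d))
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.hstack([src, tgt]))

def convert(x):
    # Responsibilities from the source marginal of each component.
    logp = np.array([np.log(w) + multivariate_normal.logpdf(x, m[:d], C[:d, :d])
                     for w, m, C in zip(gmm.weights_, gmm.means_,
                                        gmm.covariances_)])
    r = np.exp(logp - logp.max())
    r /= r.sum()
    # Responsibility-weighted conditional means of the target half.
    out = np.zeros(d)
    for ri, m, C in zip(r, gmm.means_, gmm.covariances_):
        out += ri * (m[d:] + C[d:, :d] @ np.linalg.solve(C[:d, :d], x - m[:d]))
    return out

print(convert(src[0])[:4])
```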
Speech has several inherent features, such as naturalness and efficiency, which make it an attractive interface medium, and it allows emotions and attitudes to be expressed. Emotion recognition from the speech signal has been a prevailing research topic in human-machine interface applications, as it directly affects human-machine interaction. Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. Gaussian mixture models (GMMs) combined with the minimum error rate classifier (i.e., the Bayes optimal classifier) are established and effective tools for speech emotion recognition. Typically, GMMs are used to model the class-conditional distributions of acoustic features, and their parameters are estimated by the expectation-maximization (EM) algorithm on a training data set. Classification is then performed to minimize the classification error with respect to the estimated class-conditional distributions. This method is called the EM-GMM algorithm. In this paper, we discuss a boosting algorithm for reliably and accurately estimating the class-conditional GMMs; the resulting algorithm is named the Boosted-GMM algorithm. Our speech emotion recognition experiments show better results than existing algorithms, owing to the fact that boosting leads to more accurate estimates of the class-conditional GMMs, namely the class-conditional distributions of acoustic features.
Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this family of models, namely model selection and convergence criteria, are discussed. These central issues also have implications for other model-based clustering techniques and for the implementation of techniques like the EM algorithm in general. The Bayesian information criterion (BIC) is used for model selection and Aitken's acceleration, which is shown to outperform the lack-of-progress criterion, is used to determine convergence. A brief introduction to parallel computing is then given before the implementation of this algorithm in parallel is facilitated within the master-slave paradigm. A simulation study is then carried out to confirm the effectiveness of this parallelization. The resulting software is applied to two data sets to demonstrate its effectiveness when compared to existing software.
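As an illustration of BIC-driven model selection over a mixture family (sklearn's EM standing in for the AECM algorithm of the paper):

```python
# Select the mixture model (component count and covariance structure)
# that minimizes BIC across a small family of candidates.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(4, 1, (200, 4))])

best = min(
    (GaussianMixture(n_components=g, covariance_type=cov,
                     random_state=0).fit(X)
     for g in range(1, 6) for cov in ("spherical", "diag", "full")),
    key=lambda m: m.bic(X))                      # smaller BIC = better model
print(best.n_components, best.covariance_type)
```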
We emphasize that in this paper we used an explicit MMSE formula to estimate the speech and noise PSDs independently, which were then used in the construction of the Wiener filter. We also transformed this explicit MMSE formula into an explicit formula for the MAP estimates of the speech and noise PSDs, thereby obtaining a solution that achieves almost the same results at a much reduced processing time. Our future work includes, among other things, comparing our MAP solution with one based on optimization algorithms, albeit at a much higher computational cost. This comparison is interesting in that it should shed some light on the relevance of the orthogonality assumption made here on the covariance matrices of the Gaussian mixture models.
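For context, the Wiener gain built from separately estimated speech and noise PSDs takes the following form per frequency bin; the PSD values below are illustrative stand-ins for the MMSE or MAP estimates of the paper:

```python
# Wiener gain per frequency bin: G = S_speech / (S_speech + S_noise).
import numpy as np

speech_psd = np.array([4.0, 2.5, 1.0, 0.3])   # placeholder PSD estimates
noise_psd = np.array([0.5, 0.5, 0.5, 0.5])
wiener_gain = speech_psd / (speech_psd + noise_psd)
print(wiener_gain)                             # applied bin-wise to the spectrum
```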
Despite a large diversity of methods, feature selection algorithms usually do not scale well with the number of pixels to be processed [27]. The training computational load is too large to be offset by the reduced prediction computational load; hence, feature selection is not widely used in operational situations. However, methods based on Gaussian mixture models (GMM) have several interesting properties that make them suitable for feature selection in the context of large amounts of data. By taking advantage of their intrinsic properties, it is possible to increase computational efficiency with respect to a standard implementation.
Recent work has shown substantial performance improvements of discriminative probabilistic models over their generative counterparts. However, since discriminative models do not capture the input distribution of the data, their use in missing data scenarios is limited. To utilize the advantages of both paradigms, we present an approach to train Gaussian mixture models (GMMs) in a hybrid generative-discriminative way. This is accomplished by optimizing an objective that trades off between a generative likelihood term and either a discriminative conditional likelihood term or a large margin term, using stochastic optimization. Our model substantially improves the performance of classical maximum likelihood optimized GMMs while at the same time allowing for both a consistent treatment of missing features by marginalization, and the use of additional unlabeled data in a semi-supervised setting. For the covariance matrices, we employ a diagonal plus low-rank matrix structure to model important correlations while keeping the number of parameters small. We show that a non-diagonal matrix structure is crucial to achieve good performance and that the proposed structure can be utilized to considerably reduce classification time in case of missing features. The capabilities of our model are demonstrated in extensive experiments on real-world data.
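A minimal sketch of the hybrid objective only (not the stochastic optimizer or the diagonal-plus-low-rank covariances): a convex combination of a generative joint-likelihood term and a discriminative conditional-likelihood term, with the trade-off weight `lam` a placeholder name:

```python
# Hybrid generative-discriminative objective for class-conditional GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

def hybrid_objective(gmms, priors, X, y, lam=0.5):
    # Per-class joint log-likelihood: log p(x, c) = log p(x|c) + log p(c).
    joint = np.stack([g.score_samples(X) + np.log(priors[c])
                      for c, g in enumerate(gmms)], axis=1)
    gen = joint[np.arange(len(y)), y].sum()               # sum log p(x_i, y_i)
    cond = (joint[np.arange(len(y)), y]
            - np.logaddexp.reduce(joint, axis=1)).sum()   # sum log p(y_i | x_i)
    return lam * gen + (1 - lam) * cond

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y = np.repeat([0, 1], 100)
gmms = [GaussianMixture(2, random_state=0).fit(X[y == c]) for c in (0, 1)]
print(hybrid_objective(gmms, [0.5, 0.5], X, y, lam=0.7))
```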
We have shown how to efficiently construct coresets for estimating parameters of Gaussian mixture models by exploiting a connection between statistical estimation and clustering problems in computational geometry. We prove the existence of coresets of size independent of the original data set size. To our knowledge, our results provide the first rigorous guarantees for obtaining compressed ε-approximations of the log-likelihood of mixture models for large data sets. The coreset construction algorithm is based on a simple importance sampling scheme and has linear running time in n. We demonstrate that, by exploiting certain closure properties of coresets, it is possible to construct them in parallel, or in a single pass through a stream of data, using only poly(d, k, λ^-1, ε^-1, log n, log(1/δ)) space and update time. Critically, our coresets provide guarantees for any given (possibly unstructured) data, without assumptions on the distribution or model that generated it. In an empirical evaluation on several real-world data sets we observe a reduction in computational time of up to two orders of magnitude, while achieving a hold-out set likelihood competitive with models trained on the full data set.
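A heavily simplified sketch of the importance-sampling idea (the actual construction uses tighter sensitivity bounds and carries the stated guarantees): sample points with probability proportional to a crude sensitivity proxy and weight them inversely, so weighted sums approximate full-data sums.

```python
# Toy coreset via importance sampling: distances to a rough clustering
# solution serve as a crude stand-in for the paper's sensitivity bounds.
import numpy as np
from sklearn.cluster import KMeans

def simple_coreset(X, m, k=5, seed=0):
    km = KMeans(n_clusters=k, random_state=seed).fit(X)
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1) ** 2
    s = d + d.mean()                              # crude sensitivity proxy
    p = s / s.sum()
    idx = np.random.default_rng(seed).choice(len(X), size=m, p=p)
    return X[idx], 1.0 / (m * p[idx])             # points and their weights

X = np.random.default_rng(1).normal(size=(10000, 3))
C, w = simple_coreset(X, m=200)
print(C.shape, w.sum())   # weight mass ~ 10000, matching the full data size
```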
the transcriptions are available. Wang and Woodland (2007) used the self-training method to augment the training set for discriminative training. Huang and Hasegawa-Johnson (2008) investigated another use of discriminative information from labeled data by replacing the likelihood of labeled data with the class posterior probability of labeled data in the semi-supervised training objective for Gaussian mixture models (GMM), resulting in a hybrid discriminative/generative objective function. Their experimental results in binary phonetic classification showed significant improvement in classification accuracy when labeled data are scarce. A similar strategy called "multi-conditional learning" was presented in (Druck et al., 2007), applied to Markov Random Field models for text classification tasks, with the difference that the likelihood of labeled data is also included in the objective. The hybrid discriminative/generative objective function can be interpreted as having an extra regularization term, the likelihood of unlabeled data, in the discriminative training criterion for labeled data. However, the methods in (Huang and Hasegawa-Johnson, 2008) and (Druck et al., 2007) both encountered the same issue of determining the weights for the labeled and unlabeled parts of the objective function, and chose to use a development set to select the optimal weight. This paper provides an experimental analysis of the effect of this weight.
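A sketch of that semi-supervised hybrid objective with per-class GMMs: class posteriors on labeled data plus an alpha-weighted marginal likelihood on unlabeled data, where `alpha` is the weight whose effect is analyzed; data and model sizes are toy choices.

```python
# Hybrid discriminative/generative objective with an unlabeled-data
# regularizer weighted by alpha.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
Xl = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
yl = np.repeat([0, 1], 50)
Xu = rng.normal(1.5, 2, (200, 2))                # unlabeled pool

gmms = [GaussianMixture(2, random_state=0).fit(Xl[yl == c]) for c in (0, 1)]

def objective(alpha):
    joint = np.stack([g.score_samples(np.vstack([Xl, Xu])) + np.log(0.5)
                      for g in gmms], axis=1)
    marg = np.logaddexp.reduce(joint, axis=1)            # log p(x)
    post = joint[np.arange(len(yl)), yl] - marg[:len(yl)]  # labeled log p(y|x)
    return post.sum() + alpha * marg[len(yl):].sum()     # + weighted unlabeled

for a in (0.0, 0.1, 1.0):
    print(a, round(objective(a), 1))
```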
The aim of this paper is to report the accuracy and timing results of a text-independent automatic speaker recognition (ASR) system, based on Mel-Frequency Cepstrum Coefficients (MFCC) and Gaussian mixture models (GMM), developed for a security access-control gate. 450 speakers were randomly extracted from the Voxforge.org audio database; their utterances were enhanced using spectral subtraction, then MFCC were extracted and statistically analyzed by GMM in order to build each profile. For each speaker two different speech files were used: the first to build the profile database, the second to test system performance. The accuracy achieved by the proposed approach is greater than 96%, and the time spent for a single test run, implemented in Matlab, is about 2 seconds on a common PC.
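A hedged sketch of this enrollment/test pipeline, assuming librosa for MFCC extraction; the file names are placeholders and the spectral-subtraction step is omitted:

```python
# MFCC + GMM speaker recognition: one GMM profile per enrolled speaker,
# identification by highest average per-frame log-likelihood.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

# Enrollment: build each speaker's profile from the first recording.
profiles = {spk: GaussianMixture(n_components=16, covariance_type="diag",
                                 random_state=0).fit(mfcc_features(f))
            for spk, f in [("alice", "alice_enroll.wav"),
                           ("bob", "bob_enroll.wav")]}

# Test: score the second recording against every profile.
test = mfcc_features("unknown_test.wav")
print(max(profiles, key=lambda s: profiles[s].score(test)))
```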
Abstract—Handwriting biometrics is the science of identifying the behavioural aspect of an individual's writing style and exploiting it to develop automated writer identification and verification systems. This paper presents an efficient handwriting identification system which combines Scale Invariant Feature Transform (SIFT) and RootSIFT descriptors in a set of Gaussian mixture models (GMM). In particular, a new concept of similarity and dissimilarity Gaussian mixture models (SGMM and DGMM) is introduced. While an SGMM is constructed for every writer to describe the intra-class similarity exhibited between the handwritten texts of the same writer, a DGMM represents the contrast or dissimilarity between the writer's style on the one hand and other handwriting styles on the other. Furthermore, because the handwritten text is described by a number of keypoint descriptors, each of which generates an SGMM/DGMM score, a new weighted histogram method is proposed to derive the intermediate prediction score for each writer's GMM. The weighted histogram exploits the fact that handwritings from the same writer should exhibit more similar textual patterns than dissimilar ones; hence, by penalizing the bad scores with a cost function, the identification rate can be significantly enhanced. The proposed system has been extensively assessed on six public datasets (three English, two Arabic and one hybrid language), and the results show its superiority over state-of-the-art techniques.
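One plausible reading of the SGMM/DGMM scoring, reduced to a likelihood-ratio sketch (the weighted-histogram aggregation with its cost function is simplified here to a plain mean, and random vectors stand in for SIFT/RootSIFT descriptors):

```python
# Per-descriptor likelihood ratio between a writer's similarity mixture
# and a contrast (dissimilarity) mixture, averaged over a query text.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
same = rng.normal(0, 1, (300, 8))                # stand-in descriptors, one writer
other = rng.normal(1, 1, (300, 8))               # stand-in descriptors, others
sgmm = GaussianMixture(4, random_state=0).fit(same)    # intra-writer model
dgmm = GaussianMixture(4, random_state=0).fit(other)   # contrast model

query = rng.normal(0, 1, (50, 8))                # descriptors of a query text
scores = sgmm.score_samples(query) - dgmm.score_samples(query)
print("writer score:", scores.mean())            # higher = more likely same writer
```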
However, for some parametric models used in retrieval, the integral involved in computing the Kullback-Leibler divergence is not analytically tractable; this is the case for Gaussian mixture models (GMMs). Nevertheless, the GMM is a popular statistical model due to its flexibility, so one must resort to approximations of the Kullback-Leibler divergence between two GMMs. A number of approximation methods have been proposed in the literature, but their approximation performance is not well understood. Thus, in this letter, we compare seven methods for approximating the Kullback-Leibler divergence between two GMMs in satellite image retrieval. We first extract local features from an image and then estimate a parametric GMM for the feature space. The learned model is considered a statistical representation of the image content. The Kullback-Leibler divergence between GMMs is then approximated by these methods. Two experiments using two public datasets have been performed, and the comparison is carried out in terms of retrieval accuracy and computational time.
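One standard baseline among such approximations is the Monte Carlo estimate, which converges to the true divergence as the sample count grows; a minimal sketch:

```python
# Monte Carlo approximation of KL(f || g) between two GMMs:
# E_{x~f}[log f(x) - log g(x)], estimated from samples drawn from f.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
f = GaussianMixture(3, random_state=0).fit(rng.normal(0, 1, (500, 4)))
g = GaussianMixture(3, random_state=0).fit(rng.normal(1, 2, (500, 4)))

def mc_kl(f, g, n=10000):
    X, _ = f.sample(n)                           # draw from f
    return np.mean(f.score_samples(X) - g.score_samples(X))

print("KL(f||g) ~", mc_kl(f, g))
```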
The performance was assessed by averaging the results obtained from a fivefold cross-validation scheme [10,12]. Table 1 shows the confusion matrix, accuracy (%) including 95% confidence intervals (CIs), specificity (%), sensitivity (%), and area under the curve (AUC) according to the number of Gaussian mixtures. Specificity and sensitivity denote the test's ability to identify negative and positive results, respectively; accuracy is the proportion of true results (both true positives and true negatives) in the population. The GMM models were trained using 8, 16, and 32 mixtures. The average performance was 92.00% when the number of Gaussian
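For clarity, the reported quantities computed from a binary confusion matrix (the counts below are illustrative, not the paper's):

```python
# Sensitivity, specificity, accuracy, and a normal-approximation 95% CI
# from binary confusion-matrix counts.
import numpy as np

tp, fn, fp, tn = 46, 4, 4, 46                    # placeholder counts
sensitivity = tp / (tp + fn)                     # true-positive rate
specificity = tn / (tn + fp)                     # true-negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)       # proportion of true results

n = tp + tn + fp + fn
se = np.sqrt(accuracy * (1 - accuracy) / n)      # standard error of accuracy
print(f"acc={accuracy:.2%} +/- {1.96 * se:.2%}, "
      f"sens={sensitivity:.2%}, spec={specificity:.2%}")
```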