latent variable models

Top PDF latent variable models:

Implicit Deep Latent Variable Models for Text Generation

Deep latent variable models (LVMs) such as the variational auto-encoder (VAE) have recently played an important role in text generation. One key factor is the exploitation of smooth latent structures to guide the generation. However, the representation power of VAEs is limited for two reasons: (1) a Gaussian assumption is often made on the variational posteriors; and (2) a notorious "posterior collapse" issue occurs. In this paper, we advocate sample-based representations of variational distributions for natural language, leading to implicit latent features, which provide more flexible representation power than Gaussian-based posteriors. We further develop an LVM that directly matches the aggregated posterior to the prior. It can be viewed as a natural extension of VAEs with a regularization that maximizes mutual information, mitigating the "posterior collapse" issue. We demonstrate the effectiveness and versatility of our models in various text generation scenarios, including language modeling, unaligned style transfer, and dialog response generation. The source code to reproduce our experimental results is available on GitHub.
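One way to see why matching the aggregated posterior to the prior relates to mutual information is the standard decomposition of the averaged KL term in the ELBO (a textbook identity, not necessarily the exact objective used in the paper):

$$
\mathbb{E}_{x \sim p_{d}(x)}\big[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\big] \;=\; I_{q}(x; z) \;+\; \mathrm{KL}\big(q(z)\,\|\,p(z)\big),
\qquad q(z) = \mathbb{E}_{x \sim p_{d}(x)}\big[q(z \mid x)\big],
$$

where $p_{d}$ is the data distribution and $q(z)$ is the aggregated posterior. Penalizing only $\mathrm{KL}(q(z)\,\|\,p(z))$ rather than the per-example KL leaves the mutual information $I_{q}(x;z)$ free to grow, which is one reason such objectives are described as mitigating posterior collapse.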

Dynamical Binary Latent Variable Models for 3D Human Pose Tracking

Perhaps the most prominent current latent variable models are derived from the Gaussian Process Latent Variable Model [8, 23] and the Gaussian Process Dynamical Model [24]. Such models can serve as effective priors for tracking [23, 24] and can be learned from small training corpora [23]. However, larger corpora are problematic since learning and inference are O(N³) and O(N²), where N is the number of training exemplars. While sparse approximations to GPs exist [10], sparsification is not always straightforward and effective. Recent additions to the GP family include the topologically-constrained GPLVM [25], the Multifactor GPLVM [26], and the Hierarchical GPLVM [9]. Such models permit stylistic diversity and multiple motions (unlike the GPLVM and GPDM), but to date these models have not been used for tracking, and complexity remains an issue. Most generative priors do not address the issue of explicit inference over activity labels. While latent variable models can be constructed from data that contains multiple activities [20], knowledge about the activities and the transitions between them is typically only implicit in the training data. As a result, training prior models to capture transitions, especially when they do not occur in the training data, is challenging and often requires that one constrain the model explicitly (e.g. [25]). In [12] a coordinated mixture of factor analyzers was used to facilitate model selection, but to our knowledge this model has not been used for tracking multiple activities and transitions. Another way to handle transitions is to build a discriminative classifier for activities, and then use corresponding activity-specific priors to bootstrap the pose inference [1]. The proposed imCRBM model bridges the gap between pose and activity inference within a single coherent and efficient generative framework.
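For context, the cubic cost comes from factorizing the N x N kernel matrix when evaluating the GP log marginal likelihood. Below is a minimal NumPy sketch of that evaluation (illustrative only; function names and kernel choice are mine, not the GPLVM/GPDM code from the paper):

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on latent points X (N x d)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_log_marginal_likelihood(X, y, noise=1e-2):
    """log N(y | 0, K + noise*I); the Cholesky factorization is the O(N^3) step."""
    N = X.shape[0]
    K = rbf_kernel(X) + noise * np.eye(N)
    L = np.linalg.cholesky(K)                              # O(N^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # O(N^2) given L
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * N * np.log(2 * np.pi))

# toy usage: 200 latent points, one output dimension
X = np.random.randn(200, 2)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(200)
print(gp_log_marginal_likelihood(X, y))
```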

Overcoming the Memory Bottleneck in Distributed Training of Latent Variable Models of Text

Large unsupervised latent variable models (LVMs) of text, such as Latent Dirichlet Allocation models or Hidden Markov Models (HMMs), are constructed using parallel training algorithms on computational clusters. The memory required to hold LVM parameters forms a bottleneck in training more powerful models. In this paper, we show how the memory required for parallel LVM training can be reduced by partitioning the training corpus to minimize the number of unique words on any computational node. We present a greedy document partitioning technique for the task. For large corpora, our approach reduces memory consumption by over 50%, and trains the same models up to three times faster, when compared with existing approaches for parallel LVM training.
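As a rough illustration of the idea (not the authors' algorithm; the function and the exact greedy criterion are hypothetical): assign each document to the node whose current vocabulary already covers most of its words, subject to a simple balance constraint, so that each node ends up holding few unique words.

```python
def greedy_partition(docs, num_nodes):
    """docs: list of sets of word ids. Returns a node assignment per document.

    Greedy objective: minimize the number of *new* unique words each
    assignment adds to a node's vocabulary, while keeping document
    counts roughly balanced across nodes.
    """
    vocab = [set() for _ in range(num_nodes)]
    load = [0] * num_nodes
    max_load = len(docs) // num_nodes + 1
    assignment = []
    for words in docs:
        best, best_cost = None, None
        for node in range(num_nodes):
            if load[node] >= max_load:
                continue
            cost = len(words - vocab[node])   # new unique words this node would gain
            if best_cost is None or cost < best_cost:
                best, best_cost = node, cost
        vocab[best] |= words
        load[best] += 1
        assignment.append(best)
    return assignment

# toy usage: two word-disjoint document groups end up on separate nodes
docs = [{1, 2, 3}, {2, 3, 4}, {10, 11}, {11, 12}, {1, 4}, {10, 12}]
print(greedy_partition(docs, num_nodes=2))
```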

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling

We study parameter inference in large-scale latent variable models. We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods. We then propose a novel inference method for the frequentist estimation of parameters, which adapts MCMC methods to online inference in latent variable models through the proper use of local Gibbs sampling. Then, for latent Dirichlet allocation, we provide an extensive set of experiments and comparisons with existing work, where our new approach outperforms all previously proposed methods. In particular, using Gibbs sampling for latent variable inference is superior to variational inference in terms of test log-likelihoods. Moreover, Bayesian inference through variational methods performs poorly, sometimes leading to worse fits with latent variables of higher dimensionality.
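For reference, the "local Gibbs" step for LDA resamples each token's topic from the standard collapsed conditional. Below is a minimal sketch for a single document with the global topic-word counts held fixed; this is the textbook collapsed-Gibbs update, not the authors' full online algorithm.

```python
import numpy as np

def local_gibbs_sweep(doc_words, z, topic_word, topic_totals, alpha, beta):
    """One Gibbs sweep over the topic assignments z of a single document.

    doc_words:    token word ids for this document
    z:            current topic assignment per token (int array, modified in place)
    topic_word:   K x V global counts of word w in topic k (held fixed here)
    topic_totals: length-K global counts per topic
    """
    K, V = topic_word.shape
    doc_topic = np.bincount(z, minlength=K).astype(float)
    for i, w in enumerate(doc_words):
        doc_topic[z[i]] -= 1                     # remove the current assignment
        # standard collapsed conditional p(z_i = k | rest)
        p = (doc_topic + alpha) * (topic_word[:, w] + beta) / (topic_totals + V * beta)
        p /= p.sum()
        z[i] = np.random.choice(K, p=p)
        doc_topic[z[i]] += 1
    return z
```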

Probabilistic Distributional Semantics with Latent Variable Models

3.4.3 Choosing |Z|. In the "parametric" latent variable models used here the number of topics or semantic classes, |Z|, must be fixed in advance. This brings significant efficiency advantages but also the problem of choosing an appropriate value for |Z|. The more classes a model has, the greater its capacity to capture fine distinctions between entities. However, this finer granularity inevitably comes at a cost of reduced generalization. One approach is to choose a value that works well on training or development data before evaluating held-out test items. Results in lexical semantics are often reported over the entirety of a data set, meaning that if we wish to compare with those results we cannot hold out any portion. If the method is relatively insensitive to the parameter it may be sufficient to choose a default value. Rooth et al. (1999) suggest cross-validating on the training data likelihood (and not on the ultimate evaluation measure). An alternative solution is to average the predictions of models trained with different choices of |Z|; this avoids the need to pick a default and can give better results than any one value, as it integrates contributions at different levels of granularity. As mentioned in Section 3.4.2, we must take care when averaging predictions to compute with quantities that do not rely on topic identity: for example, estimates of P(a | p) can safely be combined whereas estimates of P(z1 | p) cannot.
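A minimal sketch of the averaging strategy described above, assuming each trained model exposes a topic-independent estimator of P(a | p); the interface and function names here are hypothetical, not taken from the paper.

```python
import numpy as np

def averaged_prediction(models, a, p):
    """Average P(a | p) over models trained with different |Z|.

    Each model is assumed to expose prob_a_given_p(a, p) = sum_z P(a | z) P(z | p),
    a quantity that does not depend on topic identity and can therefore be
    combined across models with different numbers of classes.
    """
    return np.mean([m.prob_a_given_p(a, p) for m in models])
```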

Latent variable models for binary response data

Louis (1982) developed a technique for computing the observed information matrix when the E-M algorithm is used to find the maximum likelihood estimates in incomplete data problems. It requires computation of the complete-data gradient and second derivative matrix, which can be implemented quite simply in the E-M iterations. This procedure can be applied to obtain the asymptotic variance-covariance matrix in latent variable models, since they involve observable (manifest) variables and not directly observable (latent) variables, which corresponds to a case of incomplete data as defined by Dempster, Laird and Rubin (1977).
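For reference, Louis' identity expresses the observed information in terms of complete-data quantities, evaluated at the MLE where the conditional expected score vanishes (this is the standard textbook statement; the notation is generic rather than the paper's):

$$
I_{Y}(\hat{\theta}) \;=\; \mathbb{E}\!\left[-\,\frac{\partial^{2} \ell_{c}(\theta)}{\partial \theta\, \partial \theta^{\top}} \,\middle|\, Y; \hat{\theta}\right] \;-\; \operatorname{Var}\!\left[\frac{\partial \ell_{c}(\theta)}{\partial \theta} \,\middle|\, Y; \hat{\theta}\right],
$$

where $\ell_{c}$ is the complete-data log-likelihood. Both conditional moments can be accumulated during the E-step, and the inverse of $I_{Y}(\hat{\theta})$ gives the asymptotic variance-covariance matrix of the estimates.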

Particle methods for maximum likelihood estimation in latent variable models

Abstract. Standard methods for maximum likelihood parameter estimation in latent variable models rely on the Expectation-Maximization algorithm and its Monte Carlo variants. Our approach is different and motivated by similar considerations to simulated annealing; that is, we build a sequence of artificial distributions whose support concentrates itself on the set of maximum likelihood estimates. We sample from these distributions using a sequential Monte Carlo approach. We demonstrate state-of-the-art performance for several applications of the proposed approach.
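The usual way to build such a sequence of artificial distributions is to raise the likelihood to an increasing power (a common simulated-annealing-style construction; the paper's exact schedule and instrumental prior may differ):

$$
\pi_{\gamma}(\theta) \;\propto\; p(y \mid \theta)^{\gamma}\, \mu(\theta), \qquad \gamma_{1} < \gamma_{2} < \cdots \to \infty,
$$

so that, under mild conditions, $\pi_{\gamma}$ concentrates on the set of maximum likelihood estimates as $\gamma$ grows; a sequential Monte Carlo sampler then propagates a particle population through $\pi_{\gamma_{1}}, \pi_{\gamma_{2}}, \dots$ with reweighting, resampling, and MCMC moves.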


Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses

The coupled model (C-LTM) has several advantages. First, the marker-specific LTMs account for marker trajectory shapes using components at the population, subpopulation, individual long-term, and individual short-term levels, which simultaneously allows for heterogeneity across and within individuals, and enables statistical strength to be shared across observations at different "resolutions" of the data. Within an individual marker model, the population and subpopulation components are learned offline, while estimates of the individual-specific parameters are refined over the course of the disease as data accrue for that individual. Second, our coupling model allows us to condition on both the target and auxiliary marker histories to make predictions about the future target marker trajectory. We therefore use the marker-specific latent variable models to neatly summarize and extract information from the irregularly sampled and sparse data, while simultaneously sidestepping the issue of jointly modeling both the target and auxiliary markers. The conditional formulation is less sensitive to misspecified dependencies between different marker types and can also be easily scaled to diseases with a large number of auxiliary markers. Finally, our model aligns with clinical practice; predictions are dynamically updated in continuous time as new marker observations are measured. We also note that our description of the method and the experimental results focus on predicting the trajectory of a single clinical marker, but multiple latent factor regression models can easily be fit so that many markers can be simultaneously predicted. Using this extension, we only need to maintain different CRF parameters; the latent variable models are shared since they are fit independently as a precursor to learning the CRF.

Priors for Diversity in Generative Latent Variable Models

Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in a topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.
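A minimal sketch of how such a diversity prior scores a set of component parameters: the (unnormalized) DPP log-density is the log-determinant of a kernel matrix over the components, so near-duplicate components are heavily penalized. This illustration uses a Gaussian kernel between component means rather than the paper's kernel between probability distributions.

```python
import numpy as np

def dpp_log_prior(means, lengthscale=1.0):
    """Unnormalized DPP log-density over mixture component means (K x d).

    L[i, j] = exp(-||mu_i - mu_j||^2 / (2 * lengthscale^2)); the determinant
    shrinks toward zero when two components are nearly identical, so the
    prior prefers diverse, non-overlapping components.
    """
    sq = np.sum(means**2, 1)[:, None] + np.sum(means**2, 1)[None, :] - 2 * means @ means.T
    L = np.exp(-0.5 * sq / lengthscale**2)
    _, logdet = np.linalg.slogdet(L)
    return logdet

# two well-separated components score much higher than two nearly identical ones
print(dpp_log_prior(np.array([[0.0, 0.0], [3.0, 3.0]])))
print(dpp_log_prior(np.array([[0.0, 0.0], [0.01, 0.0]])))
```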

Building Blocks for Variational Bayesian Learning of Latent Variable Models

We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including nonlinear and variance models, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present experimental results on several structures, including a new hierarchical nonlinear model for variances and means. The test results demonstrate the good performance and usefulness of the introduced method.

Learning latent variable models: efficient algorithms and applications

interpretable description of the data – via the structure of the LVM – and an effective generator for synthetic data. It is interesting to observe that non-interpretable models able to generate plausible data in challenging contexts exist, and have been the object of increasing interest in the deep learning community during the last few years. These models, called implicit generative models (Mohamed and Lakshminarayanan, 2016), are probabilistic parametric models whose likelihood function is not explicitly accessible but from which it is possible to generate synthetic samples. Unlike the LVMs we have seen up to now, they are not interpretable, and they are typically represented via a deep neural network with millions of parameters, which propagates the outcomes of a prior to generate synthetic, structured data. These networks are trained with the explicit objective of generating data indistinguishable from the training data: they use objective functions aimed, for example, at minimizing the MMD (Li et al., 2015), at matching the moments between synthetic and training data (Ravuri et al., 2018), or at making a discriminator unable to distinguish between synthetic and real data (Goodfellow et al., 2014). Implicit generative models have been successfully used to generate synthetic data in a variety of scenarios, including images (Goodfellow et al., 2014), text (Yu et al., 2017), and medical data (Esteban et al., 2017; Choi et al., 2017; Yoon et al., 2018) and, while they seem to be able to simulate arbitrarily complex models, they do not provide any interpretation of the process generating the data; additionally, these models typically require longer training times in comparison with the simpler latent variable models studied in this thesis (see Aviñó et al., 2018). Understanding in which contexts and in which modeling tasks implicit generative models could be substituted by more interpretable LVMs is an interesting area of future research.

Online Tensor Methods for Learning Latent Variable Models

This paper builds on the recent works of Anandkumar et al. (Anandkumar et al., 2012, 2013b), which establish the correctness of tensor-based approaches for learning MMSB models (Airoldi et al., 2008) and other latent variable models. While the earlier works provided a theoretical analysis of the method, the current paper considers a careful implementation of the method. Moreover, there are a number of algorithmic improvements in this paper. For instance, while (Anandkumar et al., 2012, 2013b) consider tensor power iterations based on batch data, with deflations performed serially, here we adopt a stochastic gradient descent approach for tensor decomposition, which provides the flexibility to trade off sub-sampling with accuracy. Moreover, we use randomized methods for dimensionality reduction in the preprocessing stage of our method, which enables us to scale our method to graphs with millions of nodes.
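A minimal sketch of the kind of randomized dimensionality reduction used in such preprocessing stages: project the data onto the range of a random Gaussian matrix and orthonormalize (the classic randomized range finder; the paper's exact preprocessing may differ).

```python
import numpy as np

def randomized_range_finder(A, k, oversample=10, seed=0):
    """Approximate an orthonormal basis Q for the range of A (n x m), target rank k."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + oversample))
    Y = A @ omega                      # sketch the column space
    Q, _ = np.linalg.qr(Y)             # orthonormal basis of the sketch
    return Q[:, :k]

# reduce a 1000 x 1000 adjacency-like matrix to a rank-20 representation
A = np.random.rand(1000, 1000)
Q = randomized_range_finder(A, k=20)
B = Q.T @ A                            # 20 x 1000 reduced matrix
print(Q.shape, B.shape)
```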

Robustness in Latent Variable Models

We proposed methods for diagnosing estimator robustness to the distributional specification of latent variable models, in structural measurement error models and joint models. We defined analytic robustness conditions in Chapter 2. In Chapters 4 and 5 we demonstrated the implementation and performance of the remeasurement method and its improved version, along with several test statistics. The models we studied cover a wide range of latent variable models that are suitable for many applications. In principle, these methods can be applied to any latent variable model with some sort of error model whose effects on the observed data can be simulated, so that remeasured data can be generated. Therefore, these methods provide data analysts with a systematic approach that is useful in practice.

Latent Variable Models of Selectional Preference

Latent variable models that use EM for inference can be very sensitive to the number of latent variables chosen. For example, the performance of ROOTH-EM worsens quickly if the number of clusters is overestimated; for the Keller and Lapata datasets, settings above 50 classes lead to clear overfitting and a precipitous drop in Pearson correlation scores. On the other hand, Wallach et al. (2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation. Figure 1 plots the effect of Z on Spearman correlation for the LDA model. In general, Wallach et al.'s finding for document modelling transfers to selectional preference models; within the range Z = 50–200 performance remains at a roughly similar level. In fact, we do not find that performance becomes significantly less robust when hyperparameter reestimation is deactivated; correlation scores simply drop by a small amount (1–2 points), irrespective of the Z chosen. ROOTH-LDA (not graphed) seems slightly more sensitive to Z; this may be because the α parameters in this model operate on the relation level rather than the document level and thus fewer "ob-

Bayesian latent variable models for collaborative item rating prediction

The model possesses a number of advantageous attributes; it is fully generative, meaning that it is easy to make inferences on new documents, and its Bayesian nature helps to overcome the over-fitting problem present in models such as Probabilistic Latent Semantic Indexing (pLSI) [9]. Also, since in LDA each document is a mixture of topics, it is far more flexible than models that assume each document is drawn from only a single topic. These advantages also extend to models based on LDA, with both the added flexibility and the ability to easily incorporate new data being very important for our own models, as we shall discuss later. Many useful models have used LDA as a base to work from, for example the Author-Topic model [18], which models academic publications and citations. It has even been used to develop models to annotate images based on their visual features [3].

Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

In order to justify the usefulness of EP for Ising models we therefore need an alternative argument. Our argument is entirely restricted to Gaussian EP for our extended definition of GLVMs and does not extend to approximations with other exponential families. In the following, we will discuss these assumptions in inference approximations that preceded the formulation of EP, in order to provide a possibly more relevant justification of the method. Although this justification is not strictly necessary for practically using EP or for corrections to EP, it nevertheless provides a good starting point for understanding both.

Semi-supervised latent variable models for sentence-level sentiment analysis

We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines.

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally, our non-linear algorithm can be further kernelised, leading to 'twin kernel PCA' in which a mapping between feature spaces occurs.
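For reference, the GPLVM optimizes the latent positions $\mathbf{X}$ by maximizing a product of independent GP marginal likelihoods, one per output dimension (this is the standard formulation; notation may differ slightly from the paper):

$$
p(\mathbf{Y} \mid \mathbf{X}) \;=\; \prod_{d=1}^{D} \mathcal{N}\!\big(\mathbf{y}_{:,d} \,\big|\, \mathbf{0},\; \mathbf{K}_{\mathbf{X}} + \beta^{-1}\mathbf{I}\big),
$$

where $\mathbf{K}_{\mathbf{X}}$ is the kernel matrix evaluated at the latent points. With a linear kernel $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}'$ the maximum of this likelihood recovers PCA, while a nonlinear kernel such as the RBF yields a nonlinear latent variable model.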

Spectral Methods for Learning Latent Variable Models: Unsupervised and Supervised Settings

“A Tensor Spectral Approach to Learning Mixed Membership Community Models” by A.


Analysis of the readiness to buy cultural tourism online by means of latent variable models

The growing importance that the Internet is beginning to acquire as a tool for distribution in the world of travel is conditioning the marketing and management strategies of tourism providers. These have an ever more urgent need to understand consumers' attitude towards online purchasing. This need is most patent for the most vulnerable agents – small firms and emerging tourist destinations – especially in today's global and changing environment. Segmenting the market into homogeneous groups enables policymakers to better match their offer to consumer demands, facilitating their decision-making process, and above all ensuring its greater efficiency. To perform this segmentation, in the present case related to the readiness to purchase tourism services and products online as expressed by tourists visiting medium-sized towns in Andalusia (Spain), we have used an innovative methodological approach based on latent class analysis.
