Deep latent variable models (LVM) such as the variational auto-encoder (VAE) have recently played an important role in text generation. One key factor is the exploitation of smooth latent structures to guide the generation. However, the representation power of VAEs is limited for two reasons: (1) the Gaussian assumption is often made on the variational posteriors; and meanwhile (2) a notorious "posterior collapse" issue occurs. In this paper, we advocate sample-based representations of variational distributions for natural language, leading to implicit latent features, which can provide flexible representation power compared with Gaussian-based posteriors. We further develop an LVM to directly match the aggregated posterior to the prior. It can be viewed as a natural extension of VAEs with a regularization of maximizing mutual information, mitigating the "posterior collapse" issue. We demonstrate the effectiveness and versatility of our models in various text generation scenarios, including language modeling, unaligned style transfer, and dialog response generation. The source code to reproduce our experimental results is available on GitHub.
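As a rough illustration of the mutual-information view mentioned above, the standard "ELBO surgery" decomposition (a known identity, not necessarily the exact objective of this paper) shows that matching the aggregated posterior q(z) to the prior, rather than each per-example posterior, differs from the usual VAE KL term by exactly a mutual-information term:

```latex
% Standard VAE objective (ELBO):
\mathcal{L}_{\mathrm{ELBO}}
  = \mathbb{E}_{p_d(x)}\Big[\mathbb{E}_{q(z\mid x)}[\log p(x\mid z)]
  - \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)\Big].

% Decomposition of the averaged KL term, with aggregated posterior
% q(z) = \mathbb{E}_{p_d(x)}[q(z\mid x)]:
\mathbb{E}_{p_d(x)}\big[\mathrm{KL}(q(z\mid x)\,\|\,p(z))\big]
  = I_q(x;z) + \mathrm{KL}\big(q(z)\,\|\,p(z)\big).

% Hence penalizing only KL(q(z) || p(z)) leaves the mutual information
% I_q(x;z) unpenalized, which counteracts posterior collapse.
```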


Perhaps the most prominent current latent variable models are derived from the Gaussian Process Latent Variable Model [8, 23] and the Gaussian Process Dynamical Model [24]. Such models can serve as effective priors for tracking [23, 24] and can be learned with small training corpora [23]. However, larger corpora are problematic since learning and inference are O(N^3) and O(N^2), where N is the number of training exemplars. While sparse approximations to GPs exist [10], sparsification is not always straightforward and effective. Recent additions to the GP family include the topologically-constrained GPLVM [25], Multifactor GPLVM [26], and Hierarchical GPLVM [9]. Such models permit stylistic diversity and multiple motions (unlike the GPLVM and GPDM), but to date these models have not been used for tracking, and complexity remains an issue. Most generative priors do not address the issue of explicit inference over activity labels. While latent variable models can be constructed from data that contains multiple activities [20], knowledge about the activities and transitions between them is typically only implicit in the training data. As a result, training prior models to capture transitions, especially when they do not occur in the training data, is challenging and often requires that one constrain the model explicitly (e.g. [25]). In [12] a coordinated mixture of factor analyzers was used to facilitate model selection, but to our knowledge this model has not been used for tracking multiple activities and transitions. Another way to handle transitions is to build a discriminative classifier for activities, and then use corresponding activity-specific priors to bootstrap the pose inference [1]. The proposed imCRBM model bridges the gap between pose and activity inference within a single coherent and efficient generative framework.

Large unsupervised latent variable models (LVMs) of text, such as Latent Dirichlet Allocation models or Hidden Markov Models (HMMs), are constructed using parallel training algorithms on computational clusters. The memory required to hold LVM parameters forms a bottleneck in training more powerful models. In this paper, we show how the memory required for parallel LVM training can be reduced by partitioning the training corpus to minimize the number of unique words on any computational node. We present a greedy document partitioning technique for the task. For large corpora, our approach reduces memory consumption by over 50%, and trains the same models up to three times faster, when compared with existing approaches for parallel LVM training.
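A minimal sketch of how such a greedy partitioning could work, assuming documents are represented as sets of word ids; the function name, tie-breaking rule, and load limit below are illustrative and not taken from the paper:

```python
def greedy_partition(docs, num_nodes, max_docs_per_node):
    """Greedily assign documents to nodes so each node holds few unique words.

    docs: list of sets of word ids (one set per document).
    Heuristic: place each document on the node whose current vocabulary
    already covers most of it (fewest new unique words added), breaking ties
    by current load.  Assumes num_nodes * max_docs_per_node >= len(docs).
    This is a sketch, not the paper's exact algorithm.
    """
    node_vocab = [set() for _ in range(num_nodes)]
    node_docs = [[] for _ in range(num_nodes)]

    # Processing larger documents first tends to seed node vocabularies well.
    order = sorted(range(len(docs)), key=lambda i: -len(docs[i]))
    for i in order:
        words = docs[i]
        best, best_cost = None, None
        for n in range(num_nodes):
            if len(node_docs[n]) >= max_docs_per_node:
                continue  # respect the load limit
            new_words = len(words - node_vocab[n])
            cost = (new_words, len(node_docs[n]))
            if best_cost is None or cost < best_cost:
                best, best_cost = n, cost
        node_vocab[best].update(words)
        node_docs[best].append(i)
    return node_docs, node_vocab
```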

We study parameter inference in large-scale latent variable models. We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods. We then propose a novel inference method for the frequentist estimation of parameters that adapts MCMC methods to online inference of latent variable models with the proper use of local Gibbs sampling. Then, for latent Dirichlet allocation, we provide an extensive set of experiments and comparisons with existing work, where our new approach outperforms all previously proposed methods. In particular, using Gibbs sampling for latent variable inference is superior to variational inference in terms of test log-likelihoods. Moreover, Bayesian inference through variational methods performs poorly, sometimes leading to worse fits with latent variables of higher dimensionality.
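For reference, the local Gibbs step for latent Dirichlet allocation is typically the standard collapsed conditional over topic assignments (shown here in its usual batch form; the paper embeds such local sampling in an online scheme):

```latex
% Collapsed Gibbs update for the topic z_{di} of word w_{di} in document d,
% with K topics, vocabulary size V, symmetric priors \alpha, \beta, and
% counts n^{-di} that exclude the current token:
p(z_{di}=k \mid z_{-di}, w) \;\propto\;
  \big(n_{dk}^{-di} + \alpha\big)\,
  \frac{n_{k,w_{di}}^{-di} + \beta}{n_{k,\cdot}^{-di} + V\beta}.
```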


3.4.3 Choosing |Z|. In the "parametric" latent variable models used here the number of topics or semantic classes, |Z|, must be fixed in advance. This brings significant efficiency advantages but also the problem of choosing an appropriate value for |Z|. The more classes a model has, the greater its capacity to capture fine distinctions between entities. However, this finer granularity inevitably comes at a cost of reduced generalization. One approach is to choose a value that works well on training or development data before evaluating held-out test items. Results in lexical semantics are often reported over the entirety of a data set, meaning that if we wish to compare with those results we cannot hold out any portion. If the method is relatively insensitive to the parameter it may be sufficient to choose a default value. Rooth et al. (1999) suggest cross-validating on the training data likelihood (and not on the ultimate evaluation measure). An alternative solution is to average the predictions of models trained with different choices of |Z|; this avoids the need to pick a default and can give better results than any one value as it integrates contributions at different levels of granularity. As mentioned in Section 3.4.2, we must take care when averaging predictions to compute with quantities that do not rely on topic identity: for example, estimates of P(a | p) can safely be combined whereas estimates of P(z_1 | p) cannot.
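As a small illustrative sketch (names and data layout are assumed, not taken from the text), averaging topic-marginalized quantities such as P(a | p) across models with different |Z| is safe because the topic index is summed out before averaging:

```python
import numpy as np

def averaged_attachment_probs(models, p):
    """Average P(a | p) over models trained with different |Z|.

    models: list of (P_a_given_z, P_z_given_p) pairs, where P_a_given_z has
    shape (|Z|, |A|) and P_z_given_p[p] is a length-|Z| distribution; |Z| may
    differ between models.  Because z is marginalized out inside the loop,
    the resulting length-|A| vectors are comparable and can be averaged.
    Per-topic quantities such as P(z_1 | p) cannot be averaged this way,
    since topic identities are not aligned across models.
    """
    per_model = [P_z_given_p[p] @ P_a_given_z          # P(a|p) = sum_z P(z|p) P(a|z)
                 for P_a_given_z, P_z_given_p in models]
    return np.mean(per_model, axis=0)
```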


Louis (1982) developed a technique for computing the observed information matrix when the E-M algorithm is used to find the maximum likelihood estimates in incomplete data problems. It requires computation of the complete-data gradient and second derivative matrix, which can be implemented quite simply in the E-M iterations. This procedure can be applied to obtain the asymptotic variance-covariance matrix in latent variable models, since they involve observable (manifest) variables and not directly observable (latent) variables, which corresponds to a case of incomplete data as defined by Dempster, Laird and Rubin (1977).
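The key identity (stated here in its standard form, evaluated at the MLE) expresses the observed information as the conditional expectation of the complete-data information minus the conditional variance of the complete-data score, both of which can be accumulated during the E-step:

```latex
% Louis (1982): observed information from complete-data quantities,
% with complete-data log-likelihood \ell_c(\theta; x) and observed data y.
I_{\mathrm{obs}}(\hat\theta)
  = \mathbb{E}\!\left[-\frac{\partial^2 \ell_c(\theta;x)}{\partial\theta\,\partial\theta^{\top}}
      \,\middle|\, y;\hat\theta\right]
  - \mathrm{Var}\!\left[\frac{\partial \ell_c(\theta;x)}{\partial\theta}
      \,\middle|\, y;\hat\theta\right],
\qquad \theta=\hat\theta .
% "Observed information = complete information - missing information."
```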


Abstract. Standard methods for maximum likelihood parameter estimation in latent variable models rely on the Expectation-Maximization algorithm and its Monte Carlo variants. Our approach is different and motivated by considerations similar to simulated annealing; that is, we build a sequence of artificial distributions whose support concentrates on the set of maximum likelihood estimates. We sample from these distributions using a sequential Monte Carlo approach. We demonstrate state-of-the-art performance for several applications of the proposed approach.
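One common construction of such a sequence (shown as a representative example of this family, not as the paper's exact choice) tempers the likelihood and, for latent variable models, augments the target with replicated latent variables so the marginal over the parameters concentrates on the MLE set:

```latex
% Tempered target over parameters, with instrumental prior \mu(\theta):
\pi_{\gamma}(\theta) \;\propto\; \mu(\theta)\,\big[p(y\mid\theta)\big]^{\gamma},
\qquad \gamma \to \infty \ \Rightarrow\ \pi_{\gamma}
\text{ concentrates on } \arg\max_{\theta} p(y\mid\theta).

% Augmented target for latent variable models (integer \gamma), whose
% \theta-marginal is the tempered target above:
\pi_{\gamma}(\theta, z_{1:\gamma}) \;\propto\;
  \mu(\theta)\,\prod_{j=1}^{\gamma} p(y, z_j \mid \theta).
```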


The coupled model (C-LTM) has several advantages. First, the marker-specific LTMs account for marker trajectory shapes using components at the population, subpopulation, individual long-term, and individual short-term levels, which simultaneously allows for heterogeneity across and within individuals, and enables statistical strength to be shared across observations at different "resolutions" of the data. Within an individual marker model, the population and subpopulation components are learned offline, while estimates of the individual-specific parameters are refined over the course of the disease as data accrues for that individual. Second, our coupling model allows us to condition on both the target and auxiliary marker histories to make predictions about the future target marker trajectory. We therefore use the marker-specific latent variable models to neatly summarize and extract information from the irregularly sampled and sparse data, while simultaneously sidestepping the issue of jointly modeling both the target and auxiliary markers. The conditional formulation is less sensitive to misspecified dependencies between different marker types and can also be easily scaled to diseases with a large number of auxiliary markers. Finally, our model aligns with clinical practice; predictions are dynamically updated in continuous time as new marker observations are measured. We also note that our description of the method and the experimental results focus on predicting the trajectory of a single clinical marker, but multiple latent factor regression models can be easily fit so that many markers can be simultaneously predicted. Using this extension, we only need to maintain different CRF parameters; the latent variable models are shared since they are fit independently as a precursor to learning the CRF.


Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in a topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.
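For context, a determinantal point process over a ground set of candidate components assigns higher probability to diverse subsets via a kernel determinant; here the kernel would be evaluated between the components' probability distributions (the specific kernel shown below is an assumed example, not necessarily the paper's choice):

```latex
% Determinantal point process over a ground set of candidate components:
% L is a positive semi-definite kernel matrix and L_A its submatrix indexed
% by the subset A.  Determinants of near-orthogonal (diverse) subsets are
% larger, so diversity is preferred.
P(A) \;\propto\; \det(L_A), \qquad L_{ij} = k(\theta_i, \theta_j).

% One possible kernel between component distributions (an assumption):
% the probability product kernel
k(p, q) = \int p(x)^{\rho}\, q(x)^{\rho}\, dx .
```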


We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including nonlinear and variance models, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present experimental results on several structures, including a new hierarchical nonlinear model for variances and means. The test results demonstrate the good performance and usefulness of the introduced method.


interpretable description of the data (via the structure of the LVM) and an effective generator for synthetic data. It is interesting to observe that non-interpretable models able to generate plausible data in challenging contexts exist, and have been the object of increasing interest in the deep learning community during the last few years. These models, called implicit generative models (Mohamed and Lakshminarayanan, 2016), are probabilistic parametric models whose likelihood function is not explicitly accessible but from which it is possible to generate synthetic samples. Unlike the LVMs we have seen up to now, they are not interpretable, and they are typically represented via a deep neural network with millions of parameters, which propagates the outcomes of a prior to generate synthetic, structured data. These networks are trained with the explicit objective of generating data indistinguishable from the training data: they use objective functions aimed, for example, at minimizing the MMD (Li et al., 2015), at matching the moments between synthetic and training data (Ravuri et al., 2018), or at making a discriminator unable to distinguish between synthetic and real data (Goodfellow et al., 2014). Implicit generative models have been successfully used to generate synthetic data in a variety of scenarios, including images (Goodfellow et al., 2014), texts (Yu et al., 2017) and medical data (Esteban et al., 2017; Choi et al., 2017; Yoon et al., 2018) and, while they seem to be able to simulate arbitrarily complex models, they do not provide any interpretation of the process generating the data; additionally, these models typically require longer training times in comparison with the simpler latent variable models studied in this thesis (see Aviñó et al., 2018). Understanding in which contexts and in which modeling tasks implicit generative models could be substituted by more interpretable LVMs is an interesting area of future research.
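For the MMD-based objective mentioned above, the quantity minimized between the generator distribution q and the data distribution p takes its standard kernel form:

```latex
% Squared maximum mean discrepancy between data distribution p and
% generator distribution q, for a characteristic kernel k:
\mathrm{MMD}^2(p, q)
  = \mathbb{E}_{x,x'\sim p}[k(x,x')]
  - 2\,\mathbb{E}_{x\sim p,\,y\sim q}[k(x,y)]
  + \mathbb{E}_{y,y'\sim q}[k(y,y')].
% In practice an unbiased empirical estimate over minibatches of real and
% generated samples is used as the training loss.
```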


This paper builds on the recent works of Anandkumar et al. (2012, 2013b), which establish the correctness of tensor-based approaches for learning MMSB (Airoldi et al., 2008) models and other latent variable models. While the earlier works provided a theoretical analysis of the method, the current paper considers a careful implementation of the method. Moreover, there are a number of algorithmic improvements in this paper. For instance, while Anandkumar et al. (2012, 2013b) consider tensor power iterations based on batch data with deflations performed serially, here we adopt a stochastic gradient descent approach for tensor decomposition, which provides the flexibility to trade off sub-sampling with accuracy. Moreover, we use randomized methods for dimensionality reduction in the preprocessing stage of our method, which enables us to scale our method to graphs with millions of nodes.
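For reference, the batch "power iteration plus deflation" baseline that the excerpt contrasts with its stochastic approach looks roughly like the sketch below (a simplified illustration for a symmetric third-order tensor; the authors' actual implementation, whitening, and random restarts are not shown):

```python
import numpy as np

def tensor_power_method(T, k, n_iter=100, seed=0):
    """Batch tensor power iteration with deflation for a symmetric 3rd-order tensor.

    T: ndarray of shape (d, d, d), assumed (approximately) symmetric.
    Returns k (eigenvalue, eigenvector) pairs.  This is a baseline sketch,
    not the implementation described in the paper.
    """
    rng = np.random.default_rng(seed)
    T = T.copy()
    d = T.shape[0]
    pairs = []
    for _ in range(k):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            # T(I, v, v): contract the last two modes with v.
            u = np.einsum('ijk,j,k->i', T, v, v)
            v = u / np.linalg.norm(u)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue T(v, v, v)
        pairs.append((lam, v))
        # Deflation: subtract the recovered rank-one component.
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
    return pairs
```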


We proposed methods for diagnosing estimator robustness to the distributional specification of latent variable models in structural measurement error models and joint models. We defined analytic robustness conditions in Chapter 2. In Chapters 4 and 5 we demonstrated the implementation and performance of the remeasurement method and its improved version, along with several test statistics. The models we studied cover a wide range of latent-variable models that are suitable in many applications. In principle, these methods can be applied to any latent variable model with some sort of error model whose effects on the observed data can be simulated, so that remeasured data can be generated. Therefore, these methods provide data analysts with a systematic approach that is useful in practice.



The model possesses a number of advantageous attributes; it is fully generative, meaning that it is easy to make inferences on new documents, and its Bayesian nature helps to overcome the over-fitting problem present in models such as Probabilistic Latent Semantic Indexing (pLSI) [9]. Also, since in LDA each document is a mixture of topics, it is far more flexible than models that assume each document is drawn from only a single topic. These advantages also extend to models based on LDA, with both the added flexibility and the ability to easily incorporate new data being very important for our own models, as we shall discuss later. Many useful models have used LDA as a base to work from, for example the Author-Topic model [18], which models academic publications and citations. It has even been used to develop models to annotate images based on their visual features [3].


In order to justify the usefulness of EP for Ising models we therefore need an alternative argument. Our argument is entirely restricted to Gaussian EP for our extended definition of GLVMs and does not extend to approximations with other exponential families. In the following, we will discuss these assumptions in inference approximations that preceded the formulation of EP, in order to provide a possibly more relevant justification of the method. Although this justification is not strictly necessary for practically using EP or corrections to EP, it nevertheless provides a good starting point for understanding both.


We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines.

In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear, the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high-dimensional data for three different data-sets. Additionally, our non-linear algorithm can be further kernelised, leading to 'twin kernel PCA' in which a mapping between feature spaces occurs.
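The construction can be summarized by the marginal likelihood of the data Y (N points, D dimensions) given latent positions X; with a linear covariance this reduces to PCA, and nonlinear covariances give the GPLVM (a standard statement of the model, lightly paraphrased):

```latex
% Each data dimension is an independent GP over the latent positions X:
p(Y \mid X) = \prod_{d=1}^{D} \mathcal{N}\big(y_{:,d} \mid 0,\; K\big),
\qquad K_{nm} = k(x_n, x_m) + \sigma^2 \delta_{nm}.

% Linear covariance k(x_n, x_m) = x_n^{\top} x_m: maximizing over X
% recovers principal component analysis (up to rotation and scaling).
% Nonlinear covariances (e.g. RBF) yield the nonlinear GPLVM.
```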


The growing importance that the Internet is beginning to acquire as a tool for distribution in the world of travel is conditioning the marketing and management strategies of tourism providers. These have an ever more urgent need to understand consumers' attitude towards online purchasing. This need is most patent for the most vulnerable agents – small firms and emerging tourist destinations – especially in today's global and changing environment. Segmenting the market into homogeneous groups enables policymakers to better match their offer to consumer demands, facilitating their decision‐making process, and above all ensuring its greater efficiency. To perform this segmentation, in the present case related to the readiness to purchase tourism services and products online as expressed by tourists visiting medium‐sized towns in Andalusia (Spain), we have used an innovative methodological approach based on latent class analysis.
