Latent Dirichlet Allocation (LDA)

Class specific Gaussian multinomial latent Dirichlet allocation for image annotation

Considering the problem of intra-tag diversity, a straightforward approach is to set up class-specific techniques [10,11] by treating annotation tags as class labels and learning the visual contents within each class. Although capable of identifying sets of visual contents discriminative for the classes of interest, these straightforward methods do not explicitly model the interclass and intraclass structures of visual distributions, owing to their lack of hierarchical content groupings. To facilitate the discovery of these structures, various hierarchical generative methods have recently been ported from the text to the vision literature. Among these methods, topic models such as latent Dirichlet allocation (LDA) [12] and probabilistic latent semantic analysis (pLSA) [13], which use probabilistic latent variable models for hierarchical learning, have attracted extensive interest. However, an analysis of previous supervised topic models [14] shows that the topics discovered by these models are driven by general image regularities rather than the semantic regularities needed for image annotation. For example, it has been noted in [14] that, given a collection of movie reviews, LDA might discover topics corresponding to movie properties, such as genres, which are not central to the annotation task. Therefore, incorporating a class label variable into a generative model can address the intra-tag diversity problem. Such extensions have been successfully applied to classification tasks, e.g., class LDA (cLDA) [14], supervised LDA (sLDA) [15], and class-specific-simplex LDA (css-LDA) [16].

Flock The Similar Users Of Twitter By Using  Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is an unsupervised method for detecting the topics in a given body of data. LDA was first presented as a graphical model for topic discovery by David Blei et al. in 2003 [8][21]. The posts generated by users of online social networks (OSNs) contain unstructured data, so an accurate model for analyzing and finding the hidden topics is needed for an efficient mining process. LDA is well suited to detecting hidden topics and uses a generative model that mimics the human writing process. This generative model is absent from other topic detection techniques such as Latent Semantic Analysis (LSA).
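To make that generative story concrete, here is a minimal NumPy sketch (the topic count, vocabulary size, and topic-word matrix are all invented for illustration) that "writes" a document the way LDA assumes one is written:

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size, doc_len = 3, 8, 20
alpha = np.full(n_topics, 0.1)             # Dirichlet prior over a document's topic mixture
# Hypothetical topic-word distributions (each row sums to 1); in real LDA these are learned.
beta = rng.dirichlet(np.full(vocab_size, 0.01), size=n_topics)

theta = rng.dirichlet(alpha)               # 1) draw the document's topic mixture
words = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)      # 2) draw a topic for this word slot
    w = rng.choice(vocab_size, p=beta[z])  # 3) draw a word from that topic
    words.append(w)

print("topic mixture:", np.round(theta, 3))
print("word ids:", words)
```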

Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model

Latent Dirichlet Allocation (LDA) is a well-known topic model that is often used to make inferences regarding the properties of collections of text documents. LDA is a hierarchical Bayesian model and involves a prior distribution on a set of latent topic variables. The prior is indexed by certain hyperparameters, and even though these have a large impact on inference, they are usually chosen either in an ad hoc manner or by applying an algorithm whose theoretical basis has not been firmly established. We present a method, based on a combination of Markov chain Monte Carlo and importance sampling, for computing the maximum likelihood estimate of the hyperparameters. The method may be viewed as a computational scheme for implementing an empirical Bayes analysis. It comes with theoretical guarantees, and a key feature of our approach is that we provide theoretically valid error margins for our estimates. Experiments on both synthetic and real data show good performance of our methodology.
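The MCMC-plus-importance-sampling estimator the abstract describes is not reproduced here; as a much simpler stand-in for principled hyperparameter choice, one can compare candidate Dirichlet hyperparameters by approximate held-out log-likelihood, e.g. with scikit-learn (the candidate grid, corpus, and split below are placeholders):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:500]
X = CountVectorizer(max_features=2000, stop_words="english").fit_transform(docs)
X_train, X_test = X[:400], X[400:]

best = None
for alpha in (0.01, 0.1, 1.0):          # doc_topic_prior candidates
    for eta in (0.01, 0.1, 1.0):        # topic_word_prior candidates
        lda = LatentDirichletAllocation(n_components=20,
                                        doc_topic_prior=alpha,
                                        topic_word_prior=eta,
                                        random_state=0).fit(X_train)
        ll = lda.score(X_test)          # approximate held-out log-likelihood
        if best is None or ll > best[0]:
            best = (ll, alpha, eta)
print("best (log-likelihood, alpha, eta):", best)
```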

Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) and its variants have been widely used to discover latent topics in textual documents. However, some of the topics generated by LDA may be noisy, with irrelevant words scattered across them. We call such words topic-indiscriminate words; they tend to make topics more ambiguous and less interpretable by humans. In our work, we propose a new topic model named TWLDA, which assigns low weights to words with low topic-discriminating power. Our experimental results show that the proposed approach, which effectively reduces the number of topic-indiscriminate words in discovered topics, improves the effectiveness of LDA.
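TWLDA's actual weighting scheme is not shown here, but the underlying notion of topic-discriminating power can be illustrated with an ordinary LDA fit: words whose (approximate) distribution over topics is near-uniform, i.e. has high entropy, are the kind of topic-indiscriminate words the abstract describes. A toy sketch:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks fell as markets closed", "the market rallied on earnings"]
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# components_ is proportional to p(word | topic); column-normalizing gives an
# approximate p(topic | word), assuming equal topic weights.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
p_topic_given_word = topic_word / topic_word.sum(axis=0, keepdims=True)
entropy = -(p_topic_given_word * np.log(p_topic_given_word + 1e-12)).sum(axis=0)

vocab = vec.get_feature_names_out()
for w, h in sorted(zip(vocab, entropy), key=lambda t: -t[1])[:5]:
    print(f"{w}: entropy {h:.3f}  (higher = less topic-discriminating)")
```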

Arabic Text Classification Framework Based on Latent Dirichlet Allocation

Recently, there has been progress in models of document description; this progress is based on techniques that embed more and more semantics. These models are known for their generative aspect: they provide a method for achieving a correct syntactic and semantic description of texts. Among these models we cite LDA (Latent Dirichlet Allocation); the basic idea is that a document is a probabilistic mixture of latent (hidden) themes, or topics. Each topic is in turn characterized by a probability distribution over the words associated with it. We thus see that the key element is the notion of theme, i.e., semantics is prioritized over the notion of term or word.

LogisticLDA: Regularizing Latent Dirichlet Allocation by Logistic Regression

As an unsupervised method, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) seeks to simultaneously find a set of bases (i.e., topics) and embed documents into the latent space spanned by these bases. Owing to its inherent capability of producing interpretable and semantically coherent topics, LDA has been widely used in text analysis and has shown promising performance in tasks such as topic mining, browsing, and assessing document similarity. In contrast, when LDA is applied to text classification tasks, it is often used only as a dimension reduction step to extract features for subsequent discriminative models (e.g., an SVM). Because the objective of LDA (like that of other unsupervised topic models) is to infer the set of latent topics that best explains the document collection rather than to separate different classes, the training of LDA is independent of supervisory information. This paradigm greatly limits its applications in classification tasks.
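The "dimension reduction" paradigm the abstract contrasts itself with looks roughly like this in scikit-learn (toy corpus and labels): LDA embeds documents into topic space without ever seeing the labels, and a discriminative model such as a linear SVM is then trained on those features.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

docs = ["good movie great plot", "terrible film boring plot",
        "stock prices rose today", "markets fell on bad earnings"]
labels = ["cinema", "cinema", "finance", "finance"]

clf = Pipeline([
    ("counts", CountVectorizer()),
    # Unsupervised step: LDA's fit ignores the labels entirely.
    ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
    ("svm", LinearSVC()),   # discriminative model trained on topic features
])
clf.fit(docs, labels)
print(clf.predict(["boring movie with a bad plot"]))
```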

Categorizing Research Papers By Topics Using Latent Dirichlet Allocation Model

Topic modeling refers to the task of identifying the topics that best describe a set of documents. The topics are called latent because they emerge only during the modeling process. One popular topic modeling technique is Latent Dirichlet Allocation (LDA), which automatically discovers topics in text documents. LDA treats a document as a mix of various topics, with each word belonging to one of the document's topics. The algorithm was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael Jordan in 2003. LDA posits a fixed set of topics, each represented as a set of words, and its goal is to map all the documents to the topics in such a way that the words in each document are mostly captured by those imaginary topics.

When classifying newspaper articles [5], Story A may contain a topic with the words "catch," "goal," "referee," and "won"; it would be reasonable to assume that Story A is about sports. Story B may return a topic with the words "forecast," "economy," "shares," and "profits"; Story B is clearly about business. LDA processes the text by calculating the probability that each word belongs to a topic. For instance, in a story about entertainment, the word "movie" would have a higher probability than the word "rated," which makes intuitive sense because "movie" is more closely related to the topic of entertainment. LDA is useful when there is a set of documents and the goal is to discover patterns within them without prior knowledge of the documents. It is used for generating the topics of a document, recommendation systems, document classification, data exploration, and document summarization. Further, LDA is useful for training linear regression models on the topics and their occurrences.
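A minimal sketch of that workflow on an invented four-document corpus: fit LDA and print each topic's most probable words, mirroring the "catch/goal/referee" versus "forecast/economy/shares" intuition above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the referee signalled a goal and the crowd cheered",
        "a late goal won the match for the home team",
        "shares and profits rose as the economy improved",
        "the forecast predicts growth in company profits"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]     # indices of the 4 most probable words
    print(f"topic {k}:", ", ".join(vocab[i] for i in top))
```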

HarpLDA+: Optimizing Latent Dirichlet Allocation for Parallel Efficiency

Latent Dirichlet Allocation (LDA) [1] is a widely used machine learning technique in topic modeling and data analysis. LDA training is an iterative algorithm: it starts from a randomly initialized model (the parameters to learn) and iteratively computes and updates the model until it converges. It is an irregular computation whose model can be huge and changes as the iterations proceed toward convergence. Meanwhile, parallel workers need to synchronize the model. State-of-the-art LDA trainers are implemented to handle billions of documents, hundreds of billions of tokens, millions of topics, and millions of unique tokens. However, the pros and cons of different approaches in the existing tools are often hard to explain, because many trade-offs between the effectiveness and efficiency of model updates, along with implementation details, impact the performance of LDA training systems. One popular trade-off is to decrease the time complexity of the computation by introducing approximations; another widely used one is to reduce the synchronization overhead with an asynchronous design that works on a stale model. In this paper, we propose a different approach to designing a high-performance trainer. Our main contributions can be summarized as follows:
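For the classic collapsed Gibbs sampler, the iterate-and-update loop described above resamples a topic for every token against global count matrices; a toy, single-threaded sketch is below (HarpLDA+ itself optimizes a far more elaborate parallel version, with none of the synchronization machinery shown here):

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of word-id lists."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))      # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))     # topic-word counts (the "model")
    nk = np.zeros(n_topics)                    # per-topic totals
    z = [[int(rng.integers(n_topics)) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):             # random initialization
        for i, w in enumerate(doc):
            k = z[d][i]; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):                     # iterate toward convergence
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                    # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + beta * vocab_size)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                    # write the update back into the model
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

docs = [[0, 1, 2, 1], [1, 2, 0], [3, 4, 5, 4], [4, 5, 3]]
ndk, nkw = gibbs_lda(docs, n_topics=2, vocab_size=6)
print(ndk)
```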

HarpLDA+: Optimizing Latent Dirichlet Allocation for Parallel Efficiency

Abstract—Latent Dirichlet Allocation (LDA) is a widely used machine learning technique in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and is a major challenge for both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system, HarpLDA+, based on Hadoop and Java. It demonstrates impressive performance when compared to three other MPI/C++-based state-of-the-art systems: LightLDA, F+NomadLDA, and WarpLDA. HarpLDA+ uses optimized collective communication with timer control for load balance, leading to stable scalability in both shared-memory and distributed systems. We demonstrate in the experiments that HarpLDA+ is effective in reducing synchronization and communication overhead and outperforms the other three LDA training systems.

Latent Dirichlet Allocation for Internet Price War

If we are able to reveal these kinds of missing information, we can find the best strategy for playing such a game and also obtain a better understanding of the price war. Latent Dirichlet Allocation (LDA) is a powerful tool for learning latent variables and has been applied in many fields, such as text processing (Blei, Ng, and Jordan 2003), causal inference (Lauritzen 2001), and image classification (Chong, Blei, and Li 2009). We therefore consider the LDA model for this scenario as well. It characterizes the interactions by treating the observable information about consumption in one's own company as a variable dependent on customers' preferences, which in turn depend on both its own strategy and its competitors' strategies for offering price reductions. Aided by LDA, we can infer the latent variables to approximately characterize the environment and then seek better strategies through other decision-making algorithms such as Deep Reinforcement Learning (DRL). The combined method forms a complete framework for dealing with imperfect-information scenarios: first inferring latent variables through LDA, and then finding better strategies in the resulting, approximately perfect-information environment.

Online Advertising In Website through Related Latent Topic Models Using Latent Dirichlet Allocation Algorithm

In order to scale SEM with an increasing number of product offerings while at the same time optimizing for conversions, we propose a framework called Topic Machine. In Topic Machine, we learn the latent topics hidden in the available search-terms reports. Our hypothesis is that these topics correspond to the set of information needs that best match the client with its users. For this purpose, we use a Latent Dirichlet Allocation (LDA) based topic model [6]. Since information needs may change over time or drift in concept, we learn dynamic topic models (DTM) by sequentially chaining model parameters in a Gaussian process across well-defined epochs, e.g., weekly, bi-weekly, or monthly. To assess the quality of the learned models, we show the predictive power of the framework by measuring how well conversions per epoch can be predicted.
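The DTM machinery itself is not sketched here; as a rough stand-in for the epoch idea, scikit-learn's online LDA can be updated with each epoch's documents via partial_fit, letting topics drift with the data (the weekly batches below are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

weekly_batches = [                      # hypothetical search-term reports per epoch
    ["cheap flights to rome", "rome hotel deals"],
    ["rome flight discount", "last minute rome flights"],
    ["ski holiday packages", "cheap ski passes"],
]
vec = CountVectorizer()
vec.fit([doc for batch in weekly_batches for doc in batch])  # fix the vocabulary up front

lda = LatentDirichletAllocation(n_components=2, learning_method="online",
                                random_state=0)
vocab = vec.get_feature_names_out()
for week, batch in enumerate(weekly_batches):
    lda.partial_fit(vec.transform(batch))        # update topics with this epoch's data
    top = lda.components_[0].argsort()[::-1][:3]
    print(f"week {week}, topic 0:", ", ".join(vocab[i] for i in top))
```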

Authorship Attribution with Latent Dirichlet Allocation

The problem of authorship attribution – attributing texts to their original authors – has been an active research area since the end of the 19th century, attracting increased interest in the last decade. Most of the work on authorship attribution focuses on scenarios with only a few candidate authors, but recently considered cases with tens to thousands of candidate authors were found to be much more challenging. In this paper, we propose ways of employing Latent Dirichlet Allocation in authorship attribution. We show that our approach yields state-of-the-art performance for both a few and many candidate authors, in cases where these authors wrote enough texts to be modelled effectively.
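The paper's own models and classifiers are not reproduced here; one simple way to employ LDA for attribution, sketched with invented data, is to represent each author by the mean topic distribution of their known texts and attribute a new text to the nearest author centroid:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train_texts = ["the whale surfaced near the ship", "the harpoon missed the whale",
               "the detective examined the cigar ash", "the detective solved the case"]
train_authors = ["melville", "melville", "doyle", "doyle"]

vec = CountVectorizer()
X = vec.fit_transform(train_texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
theta = lda.transform(X)                          # per-text topic distributions

# Author centroid = mean topic distribution of that author's texts.
centroids = {a: theta[[i for i, t in enumerate(train_authors) if t == a]].mean(axis=0)
             for a in set(train_authors)}

new = lda.transform(vec.transform(["the detective studied the ash and solved the case"]))
guess = min(centroids, key=lambda a: np.linalg.norm(centroids[a] - new[0]))
print("attributed to:", guess)
```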

Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language's data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics, and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment.

Latent Dirichlet Allocation with Topic in Set Knowledge

Latent Dirichlet Allocation is an unsupervised graphical model which can discover latent topics in unlabeled data. We propose a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling. This type of supervision can be used to encourage the recovery of topics which are more relevant to user modeling goals than the topics which would be recovered otherwise. Preliminary experiments on text datasets are presented to demonstrate the potential effectiveness of this method.
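The paper's topic-in-set mechanism constrains which topics individual words may be assigned to; a loose analogue (not the authors' method) is to seed topics through an asymmetric topic-word prior, which gensim's LdaModel accepts as a full eta matrix. The seed words and weights below are hypothetical:

```python
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

texts = [["goal", "referee", "match"], ["match", "goal", "team"],
         ["shares", "profit", "market"], ["market", "economy", "shares"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

num_topics, V = 2, len(dictionary)
eta = np.full((num_topics, V), 0.01)    # flat, weak prior everywhere else
# Encourage "goal" toward topic 0 and "shares" toward topic 1 (hypothetical seeds).
eta[0, dictionary.token2id["goal"]] = 5.0
eta[1, dictionary.token2id["shares"]] = 5.0

lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
               eta=eta, random_state=0, passes=50)
for k in range(num_topics):
    print(f"topic {k}:", lda.print_topic(k, topn=3))
```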

Unsupervised Concept Annotation using Latent Dirichlet Allocation and Segmental Methods

Training efficient statistical approaches for natural language understanding generally requires data with segmental semantic annotations. Unfortunately, building such resources is costly. In this paper, we propose an approach that produces annotations in an unsupervised way. The first step is an implementation of latent Dirichlet allocation that produces a set of topics with probabilities for each topic to be associated with a word in a sentence. This knowledge is then used as a bootstrap to infer a segmentation of a word sentence into topics, using either integer linear optimisation or stochastic word alignment models (IBM models), to produce the final semantic annotation. The relation between automatically derived topics and task-dependent concepts is evaluated on a spoken dialogue task with an available reference annotation.

Following Topics over Time using Epoch Latent Dirichlet Allocation.

However, unlike the word clusters that LDA (section 2.2) finds, the latent feature vectors do not generally map well to concepts. Two factors contribute to the lack of interpretability of the latent features. First, though the word-document matrix has only non-negative entries, the factors in its singular value decomposition can have negative entries; that is, the effect of one feature vector can be partly cancelled out by that of another. In addition, there is no penalty for using many features to approximate a document. Consequently, LSA often relies upon complex cancellations among many features to represent groups of frequently co-occurring word types, making the individual features difficult to interpret.
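The point about signs is easy to verify directly: factoring a small non-negative count matrix with truncated SVD yields factors with mixed signs, while LDA's topic-word parameters stay non-negative. A small illustrative check (not from the paper):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 30))     # non-negative, word-document-style counts

svd = TruncatedSVD(n_components=5, random_state=0).fit(X)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)

print("SVD factors contain negatives:", (svd.components_ < 0).any())   # True
print("LDA factors contain negatives:", (lda.components_ < 0).any())   # False
```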

Augmenting word2vec with latent Dirichlet allocation within a clinical application

Word embedding projects words into a lower-dimensional latent space that captures semantic and morphological information. Separately but related, the task of topic modelling also discovers latent semantic structures or topics in a corpus. Latent Dirichlet allocation (LDA) uses bag-of-words statistics to infer topics in an unsupervised manner. LDA considers each document to be a probability distribution over hidden topics, and each topic is a probability distribution over all words in the vocabulary, both with Dirichlet priors.
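A common way to combine the two signals, assuming gensim 4's API and a toy corpus, is to average the word2vec vectors of a document's tokens and concatenate its LDA topic distribution onto them:

```python
import numpy as np
from gensim import corpora
from gensim.models import LdaModel, Word2Vec

texts = [["patient", "reports", "chest", "pain"], ["chest", "xray", "was", "normal"],
         ["patient", "denies", "pain"], ["normal", "heart", "sounds"]]

w2v = Word2Vec(texts, vector_size=16, min_count=1, seed=0)   # gensim >= 4 API
dictionary = corpora.Dictionary(texts)
lda = LdaModel([dictionary.doc2bow(t) for t in texts],
               num_topics=2, id2word=dictionary, random_state=0)

def doc_features(tokens):
    """Mean word2vec embedding concatenated with the LDA topic mixture."""
    emb = np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)
    theta = np.zeros(2)
    bow = dictionary.doc2bow(tokens)
    for k, p in lda.get_document_topics(bow, minimum_probability=0.0):
        theta[k] = p
    return np.concatenate([emb, theta])

print(doc_features(["patient", "chest", "pain"]).shape)      # (18,)
```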

Term Weighting Schemes for Latent Dirichlet Allocation

Many implementations of Latent Dirichlet Allocation (LDA), including those described in Blei et al. (2003), rely at some point on the removal of stopwords, words which are assumed to contribute little to the meaning of the text. This step is considered necessary because otherwise high-frequency words tend to end up scattered across many of the latent topics without much rhyme or reason. We show, however, that the 'problem' of high-frequency words can be dealt with more elegantly, and in a way that to our knowledge has not been considered in LDA, through the use of appropriate weighting schemes comparable to those sometimes used in Latent Semantic Indexing (LSI). Our proposed weighting methods not only make theoretical sense, but can also be shown to improve precision significantly on a non-trivial cross-language retrieval task.
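The paper's specific weighting schemes are not reproduced here, but the general move it describes, feeding weighted values rather than raw counts into inference, can be sketched: scikit-learn's LDA accepts any non-negative matrix, so an LSI-style tf-idf weighting can stand in for illustration (with the caveat that LDA's probabilistic semantics formally assume counts):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "the dog chased the cat",
        "markets rose and the index gained", "the index fell as markets slid"]

# Sublinear tf and idf damp high-frequency words instead of removing stopwords.
vec = TfidfVectorizer(sublinear_tf=True)
X = vec.fit_transform(docs)                       # non-negative weighted matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]
    print(f"topic {k}:", ", ".join(vocab[i] for i in top))
```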

Storyline detection and tracking using Dynamic Latent Dirichlet Allocation

Words define a vocabulary, and topics are represented by a probabilistic distribution of words from this vocabulary; each word may be part of several topic representations. LDA assumes that each document in a collection of documents is generated from a probabilistic distribution of topics. Bayes' theorem, in combination with a Dirichlet distribution as the prior, is used to approximate the true posterior distribution. The probability space defined by the probabilities of the words and topics is multidimensional and is represented by a multinomial distribution; for the prior, its conjugate distribution is needed, which in this case is the Dirichlet distribution. Information gain is used as the measure of difference between two iterated probability distributions and thereby acts as the convergence criterion.
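In symbols, the conjugacy the passage appeals to is the standard Dirichlet-multinomial update: a Dirichlet prior combined with multinomially distributed counts yields a Dirichlet posterior whose parameters are the prior parameters plus the observed counts.

```latex
% Prior over a topic (or word) distribution:  \theta \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)
% Multinomial observations with counts n_1, \dots, n_K then give the posterior
\[
  p(\theta \mid n)
  \;\propto\;
  \underbrace{\prod_{k=1}^{K} \theta_k^{\,n_k}}_{\text{multinomial likelihood}}
  \;
  \underbrace{\prod_{k=1}^{K} \theta_k^{\,\alpha_k - 1}}_{\text{Dirichlet prior}}
  \;=\;
  \prod_{k=1}^{K} \theta_k^{\,\alpha_k + n_k - 1},
  \qquad\text{i.e.}\quad
  \theta \mid n \;\sim\; \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K).
\]
```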

Particle Filter Rejuvenation and Latent Dirichlet Allocation

Previous research has established several methods of online learning for latent Dirichlet allocation (LDA). However, streaming learning for LDA, allowing only one pass over the data and constant storage complexity, is not as well explored. We use reservoir sampling to reduce the storage complexity of a previously studied online algorithm, namely the particle filter, to constant. We then show that a simpler particle filter implementation performs just as well, and that the quality of the initialization dominates other factors of performance.
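The particle-filter machinery is not shown here, but the reservoir-sampling idea that gives constant storage is compact; a standard Algorithm R sketch:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # replace with decreasing probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10_000), k=5))
```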
