2.2 Sentiment Classification of Documents, Sentences, and Tweets
2.2.4 Multi-Domain Sentiment Classification
The relationship between text and sentiment, as enunciated in Chapter 1, can vary from one domain to another. One of the first works studying this phenomenon was the thesis of Engström (2004). In that work, multiple sentiment classifiers were trained and tested on data from different domains (e.g., movie reviews, automobiles). Results showed that classifiers trained on certain domains were unlikely to perform as well when tested on different domains.
A possible approach to tackle this problem is to include labelled data from all the domains to which the model will be applied in the training dataset. The problem with this approach is that the data distribution may vary from one domain to another, and hence, it is very difficult to design a robust multi-domain classifier (Glorot, Bordes and Bengio, 2011). Another simple solution is to train a domain-specific classifier for each domain. However, we know from the discussion of the label sparsity problem that obtaining labelled data from several domains is a costly process. Additionally, the sentiment patterns of different domains are not equally closely related to each other. For example, book and movie reviews are more closely related to each other than either is to restaurant reviews. Hence, the naive approach of training individual classifiers for each domain may be inefficient because the knowledge provided by labelled instances from similar domains is not exploited.
According to (Read, 2005), emoticons are, unlike opinion words, potentially domain-independent sentiment indicators. Thus, they could address the domain-dependency problem when used to label training data for supervised learning.
In (Wu and Huang, 2015), the multi-domain sentiment classification problem is addressed by jointly training a global sentiment model with multiple domain-specific ones, in which each model corresponds to a linear classifier. A convex loss function is optimised that considers different sources of information: 1) labelled documents from all target domains, 2) a global sentiment lexicon, 3) domain-specific sentiment lexicons for each domain, and 4) inter-domain similarities. The domain-specific lexicons are calculated from labelled examples of each domain using PMI semantic orientation and expanded to words occurring in domain-specific unlabelled data. The expansion is done by propagating the existing labels using a graph of associations between words. The inter-domain similarities are calculated in two ways: 1) using unigram-based textual similarities, and 2) by relying on the cosine similarity between domain-specific lexicons. The loss function is regularised using a combination of L1 and L2 norms and optimised using an accelerated algorithm. The novelty of this approach is that it explicitly exploits the fact that some pairs of domains share more sentiment information than others. The experimental results show that the proposed model achieves higher classification accuracy than several existing multi-domain approaches.
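The two lexicon-based ingredients mentioned above, PMI semantic orientation and cosine similarity between domain-specific lexicons, can be illustrated with a minimal sketch. It is not the formulation of Wu and Huang (2015): the function names `pmi_lexicon` and `cosine_similarity`, the add-one smoothing, and the toy reviews are all assumptions made for illustration. The score of a word is taken as PMI(w, pos) - PMI(w, neg), which reduces to the log-ratio of the smoothed class-conditional word probabilities.

```python
import math
from collections import Counter

def pmi_lexicon(docs, smoothing=1.0):
    """Build a {word: score} lexicon from (tokens, label) pairs.

    label is "pos" or "neg". The score is PMI(w, pos) - PMI(w, neg),
    i.e. log2 of P(w | pos) / P(w | neg) with additive smoothing
    (the smoothing scheme is an assumption of this sketch).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        counts[label].update(tokens)
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {c: sum(counts[c].values()) for c in counts}
    lexicon = {}
    for w in vocab:
        p_pos = (counts["pos"][w] + smoothing) / (totals["pos"] + smoothing * len(vocab))
        p_neg = (counts["neg"][w] + smoothing) / (totals["neg"] + smoothing * len(vocab))
        lexicon[w] = math.log2(p_pos / p_neg)
    return lexicon

def cosine_similarity(lex_a, lex_b):
    """Cosine similarity between two lexicons viewed as sparse vectors."""
    shared = set(lex_a) & set(lex_b)
    dot = sum(lex_a[w] * lex_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in lex_a.values()))
    norm_b = math.sqrt(sum(v * v for v in lex_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy reviews for three domains.
books = [("great plot".split(), "pos"), ("boring plot".split(), "neg")]
movies = [("great acting".split(), "pos"), ("boring acting".split(), "neg")]
restaurants = [("great food".split(), "pos"), ("cold food".split(), "neg")]

lex_books = pmi_lexicon(books)
lex_movies = pmi_lexicon(movies)
lex_restaurants = pmi_lexicon(restaurants)

sim_books_movies = cosine_similarity(lex_books, lex_movies)
sim_books_restaurants = cosine_similarity(lex_books, lex_restaurants)
```

In this toy setting the book and movie lexicons share both of their polar words, so their similarity is higher than that between the book and restaurant lexicons, which share only one.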
A number of methods have been proposed to adapt sentiment classifiers from a source to a target domain. This strategy is suitable when the availability of labelled data is much higher in the source domain than in the target domain. In this direction, four model-transfer approaches were compared in (Aue and Gamon, 2005). The dataset consisted of a mixture of four different domains: movie reviews, book reviews, product support services web survey data, and knowledge base web survey data. The first three approaches apply SVM classifiers with the following features: unigrams, bigrams, and trigrams. In the first approach, which is used as the baseline method, one single classifier is trained from all the domains. The second one follows a similar idea to the first, but the features are limited to the ones observed in the target domain. The third approach uses ensembles of classifiers from the domains with available labelled data. Finally, the fourth approach combines small amounts of labelled data with large amounts of unlabelled data in the target domain following an expectation maximisation (EM) learning strategy based on naive Bayes. Experiments were carried out using different numbers of labelled examples across the different domains, and results showed that the EM approach tends to achieve better accuracy than the others. The authors argue that this occurred because the EM method is the only one that takes advantage of unlabelled examples in the target domain.
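The idea behind the fourth approach, EM learning based on naive Bayes, can be sketched as follows. This is a generic semi-supervised multinomial naive Bayes written for illustration, not the exact procedure of Aue and Gamon (2005): the function names, the Laplace smoothing constant `alpha`, the fixed number of iterations, and the toy count matrices are all assumptions. Labelled documents keep their hard labels, while unlabelled documents receive soft labels in the E-step that are then used to re-estimate the model in the M-step.

```python
import numpy as np

def train_nb(X, resp, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (soft) class memberships.

    X: (n_docs, n_words) word-count matrix.
    resp: (n_docs, n_classes) membership weights, each row summing to 1.
    """
    class_mass = resp.sum(axis=0) + alpha
    log_prior = np.log(class_mass / class_mass.sum())
    word_mass = resp.T @ X + alpha  # (n_classes, n_words)
    log_word = np.log(word_mass / word_mass.sum(axis=1, keepdims=True))
    return log_prior, log_word

def predict_proba(X, log_prior, log_word):
    """E-step: posterior class probabilities for each document."""
    log_joint = X @ log_word.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(log_joint)
    return probs / probs.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes=2, n_iter=10):
    """Combine labelled and unlabelled documents with EM."""
    resp_lab = np.eye(n_classes)[y_lab]  # hard labels as one-hot rows
    log_prior, log_word = train_nb(X_lab, resp_lab)
    X_all = np.vstack([X_lab, X_unlab])
    for _ in range(n_iter):
        resp_unlab = predict_proba(X_unlab, log_prior, log_word)
        resp_all = np.vstack([resp_lab, resp_unlab])
        log_prior, log_word = train_nb(X_all, resp_all)
    return log_prior, log_word

# Hypothetical vocabulary: [good, bad, tasty, bland]; class 1 = positive.
X_lab = np.array([[2, 0, 0, 0], [0, 2, 0, 0]])
y_lab = np.array([1, 0])
X_unlab = np.array([[1, 0, 1, 0], [1, 0, 2, 0], [0, 1, 0, 1], [0, 1, 0, 2]])

log_prior, log_word = em_naive_bayes(X_lab, y_lab, X_unlab)
X_test = np.array([[0, 0, 1, 0], [0, 0, 0, 1]])  # "tasty", "bland"
test_probs = predict_proba(X_test, log_prior, log_word)
```

In the toy data, the words "tasty" and "bland" never occur in labelled documents. A model trained on the labelled documents alone has no evidence about them, whereas the EM iterations let them inherit polarity from their co-occurrence with "good" and "bad" in the unlabelled documents.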
Glorot et al. also exploited unlabelled data for domain adaptation in (Glorot et al., 2011), using a deep learning procedure. High-level representations are learnt in an unsupervised fashion from unlabelled data drawn from multiple domains. This is carried out using Stacked Denoising Auto-Encoders with a sparse rectifier unit (Vincent, Larochelle, Bengio and Manzagol, 2008). Then, a linear SVM is trained on the transformed labelled data of the source domain and used to classify the testing data from the target domain. The hypothesis of the approach is that higher-level features are intermediate abstractions which are shared across different domains. Experimental results show that this approach can successfully perform domain adaptation on a dataset of 22 domains.
A different, but somewhat related problem is to simultaneously extract topics and opinions from a corpus of opinionated data on multiple topics. Probabilistic generative models were proposed in (Mei, Ling, Wondra, Su and Zhai, 2007) and (Lin and He, 2009) to model the generative process of words regarding both topics and sentiment polarities. These models extend the topic modelling approach (Blei, Ng and Jordan, 2003), in which it is assumed that words in a corpus of documents are generated by a mixture of topics. The extension is based on the assumption that words within topics are also generated by a sentiment model that defines the polarity of the word.
The model proposed in (Mei et al., 2007), called the topic-sentiment mixture model (TSM), relies on a mixture of four multinomial distributions to describe the stochastic process by which the words in a corpus of opinionated documents about multiple topics are generated. The distributions and the generative process are described as follows:
1. The background topic model θB captures common English words such as “the”, “a”, and “of”.
2. The k topic models Θ = {θ1, . . . , θk} capture the neutral words related to the different topics in the collection.
3. The positive sentiment model θP models positive opinions.
4. The negative sentiment model θN captures negative opinions.
The generation of a document proceeds as follows in this model. First, it is randomly decided whether the current word is a common English word or not. If so, the word is drawn from θB. Otherwise, it is decided from which of the k topics the word will be sampled. Then, it is decided whether the word will describe the topic with a neutral, positive, or negative orientation. According to this decision, the word is drawn from either θi (i being the selected topic), θP, or θN, respectively. This process is repeated until all the words from the document are generated.
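The generative process just described can be sketched as a small sampling routine. This is only an illustration of the word-by-word decisions, not the implementation of Mei et al. (2007): the function and parameter names (`generate_document`, `p_background`, `topic_weights`, `orientation_weights`), the use of a single set of orientation weights for all topics, and the toy distributions are assumptions introduced here.

```python
import random

def draw(dist, rng):
    """Sample one word from a multinomial given as {word: probability}."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

def generate_document(n_words, theta_B, topics, theta_P, theta_N,
                      p_background, topic_weights, orientation_weights, rng):
    """Generate n_words words following the decisions described in the text.

    p_background: probability that a word is a common English word.
    topic_weights: mixing weights over the k topic models.
    orientation_weights: weights for (neutral, positive, negative).
    """
    doc = []
    for _ in range(n_words):
        # Decide whether the word is a common English word.
        if rng.random() < p_background:
            doc.append(draw(theta_B, rng))
            continue
        # Otherwise choose one of the k topics ...
        i = rng.choices(range(len(topics)), weights=topic_weights, k=1)[0]
        # ... and then the orientation with which the topic is described.
        orientation = rng.choices(("neutral", "positive", "negative"),
                                  weights=orientation_weights, k=1)[0]
        if orientation == "neutral":
            doc.append(draw(topics[i], rng))
        elif orientation == "positive":
            doc.append(draw(theta_P, rng))
        else:
            doc.append(draw(theta_N, rng))
    return doc

# Hypothetical toy distributions.
theta_B = {"the": 0.5, "a": 0.3, "of": 0.2}
topics = [{"battery": 0.6, "screen": 0.4}, {"plot": 0.7, "actor": 0.3}]
theta_P = {"great": 0.6, "love": 0.4}
theta_N = {"awful": 0.5, "hate": 0.5}

doc = generate_document(20, theta_B, topics, theta_P, theta_N,
                        p_background=0.4, topic_weights=[0.5, 0.5],
                        orientation_weights=[0.6, 0.2, 0.2],
                        rng=random.Random(0))
```

Note that `theta_P` and `theta_N` do not depend on the selected topic `i`: the same sentiment words are available whichever topic is being described.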
The parameters of the model are estimated using a maximum a posteriori estimation procedure. The prior distributions of the sentiment models are learnt first. Then, they are combined with the data likelihood to obtain the maximum a posteriori estimates of the parameters. An important limitation of this model is that the sentiment models are the same for all the different topics. Therefore, this model is not able to capture opinion words which are specific to particular domains.
Another sentiment topic model, called the joint topic sentiment model (JST), was proposed in (Lin and He, 2009). This model is unsupervised in the sense that it does not depend on documents labelled by sentiment. The words are drawn from a distribution jointly defined by the topics and the sentiment label. The model incorporates opinion words as prior information, and in contrast to TSM, it acknowledges that opinion words can be topic dependent.