Chapter 2 Background
2.2 Sentiment Analysis
2.2.2 Cross-domain Sentiment Classification
Most existing domain adaptation approaches fall into two categories: feature-based adaptation and instance-based adaptation. The former seeks to construct new adaptive feature representations that reduce the difference between domains, while the latter aims to sample and re-weight source domain training data for use in classification within the target domain.
With respect to feature-based adaptation, [79] applied the structural correspondence learning (SCL) algorithm to cross-domain sentiment classification. SCL chooses a set of pivot features that occur frequently in both domains and have the highest mutual information with the domain labels, and uses these pivot features to align other features by training N linear predictors. Finally, it computes a singular value decomposition (SVD) to construct low-dimensional features that improve classification performance. A small amount of labelled target domain data is used to learn to deal with features misaligned by SCL. [80] found that SCL did not work well for cross-domain adaptation of sentiment on Twitter due to the lack of mutual information across the Twitter domains, and used subjective proportions as a backoff adaptation approach. [81] proposed constructing a bipartite graph from a co-occurrence matrix between domain-independent and domain-specific features to reduce the gap between domains, and using spectral clustering for feature alignment. The resulting clusters are used to represent data examples and train sentiment classifiers. They used mutual information between features and domains to separate domain-independent from domain-specific features, but in practice this also introduces misclassification errors. [82] describes a cross-domain sentiment classification approach using an automatically created sentiment-sensitive thesaurus. Such a thesaurus is constructed by computing the point-wise mutual information between a lexical element u and a feature, which can be either a sentiment feature or another lexical element that co-occurs with u in the training data, as well as the relatedness between two lexical elements. Common domain-independent words are thus used as pivots that transfer information from one domain to another. The problem with these feature adaptation approaches is that they try to connect domain-dependent features to known or common features under the assumption that parallel sentiment words exist in different domains, which does not necessarily hold for the varied topics in tweets [83].
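The point-wise mutual information computation that underlies such a thesaurus can be illustrated with a minimal sketch. It is not the implementation of [82]: it assumes document-level co-occurrence, unsmoothed maximum-likelihood probabilities and the natural logarithm, and the function name `pmi_table` is hypothetical. A sentiment feature could be handled in the same way by appending the document label to each document as a pseudo-token.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return {(u, w): PMI(u, w)} for every pair of distinct tokens that
    co-occur in at least one document.

    Assumption (not from the source): probabilities are estimated as
    document frequencies divided by the number of documents.
    """
    n_docs = len(documents)
    unigram = Counter()   # number of documents containing each token
    pair = Counter()      # number of documents containing both tokens
    for doc in documents:
        tokens = sorted(set(doc))          # sorted so each pair has one key
        unigram.update(tokens)
        pair.update(combinations(tokens, 2))

    table = {}
    for (u, w), count_uw in pair.items():
        p_uw = count_uw / n_docs
        p_u = unigram[u] / n_docs
        p_w = unigram[w] / n_docs
        table[(u, w)] = math.log(p_uw / (p_u * p_w))
    return table


# Toy documents from two domains; "good" plays the role of a shared word.
docs = [
    ["good", "plot"],
    ["good", "battery"],
    ["bad", "plot"],
    ["good", "plot"],
]
scores = pmi_table(docs)
```

Pairs are keyed in alphabetical order, so the score for "good" and "battery" is stored under `("battery", "good")`. Pairs that never co-occur are simply absent from the table rather than assigned negative infinity.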
When it comes to instance adaptation, [84] proposes an instance weighting framework that prunes “misleading” instances and approximates the distribution of instances in the target domain. Their experiments show that adding some labelled target domain instances and assigning them higher weights performs better than either removing “misleading” source domain instances using a small amount of labelled target domain data or bootstrapping unlabelled target instances. [85] adapts the source domain training data to the target domain based on a logistic approximation. [31] learns different classifiers on different sets of features and combines them in an ensemble model. This ensemble model is then applied to part of the target domain test data to create new training data (i.e. documents for which the different classifiers made the same prediction). We include this ensemble method as one of our baseline approaches for evaluation and comparison.
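The agreement step of such an ensemble approach can be sketched as follows. This is only an illustration under simplifying assumptions, not the method of [31]: each classifier is modelled as a plain function from a document to a label, the two toy classifiers are invented for the example, and the name `select_agreed` is hypothetical.

```python
def select_agreed(classifiers, target_docs):
    """Return (document, label) pairs on which every classifier agrees.

    The returned pairs can serve as new target-domain training data.
    """
    pseudo_labelled = []
    for doc in target_docs:
        predictions = {clf(doc) for clf in classifiers}
        if len(predictions) == 1:          # unanimous prediction
            pseudo_labelled.append((doc, predictions.pop()))
    return pseudo_labelled


# Two toy classifiers, each relying on a different feature set (assumed).
def word_classifier(doc):
    return "pos" if "love" in doc else "neg"


def emoticon_classifier(doc):
    return "pos" if ":)" in doc else "neg"


target_docs = ["love it :)", "love it :(", "hate it :("]
new_training_data = select_agreed(
    [word_classifier, emoticon_classifier], target_docs
)
```

The second document is discarded because the two classifiers disagree on it; only unanimously labelled documents are carried forward.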
Except for [31] and [80], none of the above studies carry out cross-domain sentiment classification for Twitter data, which has proven more challenging. [30] and [86] studied cross-medium sentiment classification, which transfers sentiment classifiers trained on blogs or reviews to tweets. [87] examined whether the observation that domain-dependent models improve sentiment classification of reviews also applies to tweets. They found that such models achieve significantly better performance than domain-independent models for some topics. [83] implements a multi-class semi-supervised Support Vector Machine (S3VM) model that performs co-training on both textual and non-textual features (e.g. temporal features) for sentiment classification on tweets. To make their model adaptive to different topics, confident unlabelled target-domain data are selected and topic-adaptive sentiment words are used as additional lexicon features. Ruder et al. [88] review different strategies for selecting training data from multiple sources for domain adaptation in sentiment analysis, based on feature representation, similarity metrics, and the level of the selection. They find that selecting the most similar domain and selecting subsets both outperform instance-level selection. A Bayesian Optimisation based data selection approach is also proposed by the same author [89].
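Selecting the most similar source domain can be illustrated with a small sketch. The choices here are assumptions made for the example and are not taken from [88]: domains are represented by unsmoothed term distributions, similarity is measured with the Jensen-Shannon divergence (base 2), and all function names are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs, vocab):
    """Relative term frequencies of `docs` over a fixed, ordered vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab)
    return [counts[t] / total for t in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def most_similar_domain(sources, target_docs):
    """Name of the source domain whose term distribution is closest to the target."""
    all_docs = [doc for docs in sources.values() for doc in docs] + target_docs
    vocab = sorted({tok for doc in all_docs for tok in doc})
    target = term_distribution(target_docs, vocab)
    return min(
        sources,
        key=lambda name: js_divergence(
            term_distribution(sources[name], vocab), target
        ),
    )


sources = {
    "books": [["plot", "novel", "good"]],
    "electronics": [["battery", "screen", "good"]],
}
target = [["screen", "battery", "bad"]]
best = most_similar_domain(sources, target)
```

The same divergence could in principle be computed per subset or per instance, which corresponds to the other selection levels mentioned above.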
More recently, several studies have developed deep learning models for domain adaptation. [90] is the first to propose learning a unified feature representation for different domains, under the intuition that deep learning algorithms learn intermediate concepts (between raw input and target) and that these intermediate concepts could yield better transfer across domains. [91] uses two parameter-sharing memory networks with attention to automatically capture important sentiment words that are shared by both domains (i.e. pivots), where one network performs sentiment classification and the other performs domain classification. The two networks are trained jointly. By augmenting the skip-gram objective with a regularisation term, [92] learns cross-domain word embeddings that are shown to achieve good performance in cross-domain sentiment classification. However, both source and target domains are reviews from different sites. [93] uses emoji tweets for pretraining a model that can be used in a new task with fine-tuning. Their proposed transfer learning approach sequentially unfreezes and fine-tunes each layer, and lastly trains the entire model with all layers. The authors evaluated on 3 tasks including emotion analysis; however, only ‘Fear’, ‘Joy’ and ‘Sadness’ are evaluated, as the remaining emotions rarely occurred in the observations.
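The layer-wise fine-tuning procedure described for [93] can be sketched as a training schedule. This is one reading of the description above rather than the authors' code: it assumes that each layer is fine-tuned on its own while the others stay frozen, followed by a final stage with every layer trainable. The layer names and the `train_stage` callback are hypothetical placeholders for a real framework's training loop.

```python
def unfreezing_schedule(layer_names):
    """One stage per layer (only that layer trainable), then a final
    stage in which all layers are trainable."""
    stages = [[name] for name in layer_names]
    stages.append(list(layer_names))
    return stages


def fine_tune(layer_names, train_stage):
    """Run `train_stage` once per stage with the trainable and frozen layers."""
    for trainable in unfreezing_schedule(layer_names):
        frozen = [name for name in layer_names if name not in trainable]
        train_stage(trainable=trainable, frozen=frozen)


# Record the schedule instead of training a real model.
log = []
fine_tune(
    ["embedding", "lstm", "softmax"],
    lambda trainable, frozen: log.append((trainable, frozen)),
)
```

In a real setting, `train_stage` would set the trainable flag on each layer accordingly and train until validation performance stops improving before moving to the next stage.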
In contrast with most cross-domain sentiment classification work, we use an SVM-based approach proposed in [94], which directly adapts existing classifiers trained on general-domain corpora. We believe this is more efficient and flexible [95] for our task. We evaluate on a set of manually annotated tweets about cultural experiences in museums and conduct a finer-grained classification of the emotions conveyed (i.e. anger, disgust, happiness, surprise and sadness).