

2.3 Approaches

2.3.3 Semi-supervised and unsupervised approaches

Due to the problem of limited data availability, supervised methods cannot always be used. Semi-supervised and unsupervised methods attempt to overcome this problem by exploiting the large amounts of unlabelled data that are usually available. In this section, several important semi-supervised and unsupervised studies are described.

Dasgupta and Ng (2009) assumed that some texts are easier to classify by sentiment than others and proposed combining two techniques to acquire and exploit both easy-to-classify and hard-to-classify data. First, spectral clustering (Ng et al., 2001) was applied to identify unambiguous, easy-to-classify reviews. These documents were then used in active learning (Cohn et al., 1994) to select the most ambiguous documents for manual annotation. Finally, an ensemble of classifiers, each trained on the same set of ambiguous reviews but on a different set of unambiguous reviews, was constructed and compared against several baselines. The evaluation confirmed the contribution of each of the proposed steps.
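The active learning step, in which the most ambiguous documents are selected for manual annotation, can be illustrated with a minimal uncertainty-sampling sketch. It assumes a linear bag-of-words scorer whose weights (hypothetical here) come from a classifier trained on the unambiguous reviews, and treats the documents with scores closest to zero, i.e. closest to the decision boundary, as the most ambiguous. The names `score` and `most_ambiguous` are illustrative and are not taken from Dasgupta and Ng (2009).

```python
from typing import Dict, List


def score(weights: Dict[str, float], doc: List[str]) -> float:
    """Linear sentiment score: sum of the weights of the document's tokens."""
    return sum(weights.get(tok, 0.0) for tok in doc)


def most_ambiguous(weights: Dict[str, float], docs: List[List[str]], k: int) -> List[int]:
    """Indices of the k documents whose scores are closest to zero,
    i.e. closest to the decision boundary of the linear classifier."""
    ranked = sorted(range(len(docs)), key=lambda i: abs(score(weights, docs[i])))
    return ranked[:k]


# Hypothetical weights, e.g. learned from the unambiguous (clustered) reviews.
weights = {"great": 2.0, "love": 1.5, "awful": -2.0, "boring": -1.0, "but": 0.0}
docs = [
    ["great", "love"],           # clearly positive (score 3.5)
    ["awful", "boring"],         # clearly negative (score -3.0)
    ["great", "but", "boring"],  # mixed (score 1.0)
    ["love", "but", "awful"],    # mixed (score -0.5)
]
print(most_ambiguous(weights, docs, 2))  # [3, 2]
```

The two mixed reviews are the ones that would be passed to a human annotator.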

Li et al. (2010a) applied a co-training approach (Blum and Mitchell, 1998) to the semi-supervised classification of product reviews. They considered the data from two views: personal and impersonal. The personal view is conveyed by personal sentences, whose subject is a person, for example, “I am happy with the product”. In turn, the impersonal view is represented by impersonal sentences, whose subject is not a person, for example, “The product is really good”. First, a simple heuristic was applied to classify sentences as personal or impersonal, and three datasets were built from the result: reviews with personal sentences, reviews with impersonal sentences and full texts. Then, three maximum entropy (ME) classifiers trained on these datasets were combined in the co-training procedure. The evaluation showed the importance of each of the proposed steps. First, a random division of sentences into two views yielded a substantial decrease in accuracy compared to the personal/impersonal division. Second, an ensemble of the three classifiers performed much worse than the co-training approach. Finally, co-training outperformed self-training on the individual classifiers.
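The core co-training loop over two views can be sketched as follows. This is a simplified illustration under stated assumptions: it uses only two views (personal and impersonal) instead of the three classifiers of Li et al. (2010a), and it swaps their ME classifiers for a naive Bayes log-likelihood-ratio scorer (class priors ignored) to stay dependency-free. The document format (a dict from view name to tokens), the function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Doc = Dict[str, List[str]]  # view name -> tokens of that view


def train_view(docs: Sequence[Doc], labels: Sequence[int], view: str) -> Dict[str, float]:
    """Naive Bayes log-likelihood ratio per word for one view (labels are +1/-1)."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y > 0 else neg).update(doc[view])
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {w: math.log((pos[w] + 1) / (n_pos + v)) - math.log((neg[w] + 1) / (n_neg + v))
            for w in vocab}


def view_score(weights: Dict[str, float], tokens: List[str]) -> float:
    """Positive score -> positive class; the magnitude serves as confidence."""
    return sum(weights.get(t, 0.0) for t in tokens)


def co_train(labelled, labels, unlabelled, views=("personal", "impersonal"), rounds=2):
    """Each view's classifier in turn labels the unlabelled document it is most
    confident about; that document joins the shared training set and therefore
    also trains the classifier of the other view."""
    docs, labels, pool = list(labelled), list(labels), list(unlabelled)
    for _ in range(rounds):
        for view in views:
            if not pool:
                return docs, labels
            weights = train_view(docs, labels, view)
            scores = [view_score(weights, d[view]) for d in pool]
            best = max(range(len(pool)), key=lambda i: abs(scores[i]))
            docs.append(pool.pop(best))
            labels.append(1 if scores[best] > 0 else -1)
    return docs, labels


labelled = [
    {"personal": ["i", "love", "it"], "impersonal": ["screen", "is", "great"]},
    {"personal": ["i", "hate", "it"], "impersonal": ["battery", "is", "poor"]},
]
unlabelled = [
    {"personal": ["i", "love", "this"], "impersonal": ["sound", "is", "clear"]},
    {"personal": ["i", "am", "unsure"], "impersonal": ["battery", "is", "poor"]},
]
docs, labels = co_train(labelled, [1, -1], unlabelled)
print(labels)  # [1, -1, 1, -1]
```

In this toy run the personal-view classifier cannot decide on the second unlabelled review ("i am unsure"), but the impersonal-view classifier, which has meanwhile been retrained with the document labelled by the personal view, labels it negative.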

Haimovitch et al. (2012) argued that increasing the amount of unlabelled data can reduce the error rate of semi-supervised approaches. They conducted large-scale experiments with up to 15 million unlabelled Amazon product reviews, employing a bootstrapping approach based on AROW (Adaptive Regularisation of Weight vectors) (Crammer et al., 2009). The results demonstrated that the size of the unlabelled data affects the effectiveness of their approach: for example, increasing the unlabelled data from 50K to 1.6M examples reduced the error rate for book reviews by ≈2%. Another interesting outcome of their study concerns the amount of labelled data needed for high performance. While increasing the labelled data from 100 to 1000 examples yielded a significant decrease in the error rate (from 15.2% to 8.4% for book reviews), further growth of the labelled data did not improve the results much.
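To make the combination of AROW and bootstrapping concrete, the following sketch implements the AROW update of Crammer et al. (2009) with a diagonal covariance approximation and wraps it in a simple self-training loop. The confidence threshold, the number of rounds and epochs, the binary bag-of-words features and the helper names (`train`, `self_train`, `bow`) are assumptions for illustration; they do not reproduce the exact procedure of Haimovitch et al. (2012).

```python
from typing import Dict, Sequence, Tuple

Features = Dict[str, float]


class DiagonalAROW:
    """Binary AROW classifier (Crammer et al., 2009) with a diagonal covariance."""

    def __init__(self, r: float = 1.0) -> None:
        self.r = r                          # regularisation parameter
        self.mu: Dict[str, float] = {}      # mean weight vector (missing entries = 0)
        self.sigma: Dict[str, float] = {}   # diagonal covariance (missing entries = 1)

    def margin(self, x: Features) -> float:
        return sum(self.mu.get(f, 0.0) * v for f, v in x.items())

    def update(self, x: Features, y: int) -> None:
        m = y * self.margin(x)
        if m >= 1.0:
            return  # hinge loss is zero: no update
        conf = sum(self.sigma.get(f, 1.0) * v * v for f, v in x.items())  # x^T Sigma x
        beta = 1.0 / (conf + self.r)
        alpha = (1.0 - m) * beta
        for f, v in x.items():
            s = self.sigma.get(f, 1.0)
            self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s * v
            self.sigma[f] = s - beta * (s * v) ** 2  # diagonal of Sigma - beta Sigma x x^T Sigma


def train(data: Sequence[Tuple[Features, int]], epochs: int = 5) -> DiagonalAROW:
    model = DiagonalAROW()
    for _ in range(epochs):
        for x, y in data:
            model.update(x, y)
    return model


def self_train(labelled, unlabelled, threshold: float = 0.5, rounds: int = 3):
    """Add unlabelled examples whose |margin| >= threshold with their predicted
    labels, retrain, and stop when no remaining example is confident enough."""
    data, pool = list(labelled), list(unlabelled)
    model = train(data)
    for _ in range(rounds):
        confident = [x for x in pool if abs(model.margin(x)) >= threshold]
        if not confident:
            break
        data += [(x, 1 if model.margin(x) > 0 else -1) for x in confident]
        pool = [x for x in pool if abs(model.margin(x)) < threshold]
        model = train(data)
    return model, data


def bow(text: str) -> Features:
    """Binary bag-of-words features."""
    return {w: 1.0 for w in text.split()}


labelled = [(bow("great book"), 1), (bow("awful book"), -1)]
unlabelled = [bow("great plot"), bow("awful plot"), bow("long plot")]
model, data = self_train(labelled, unlabelled)
print(len(data), model.margin(bow("great plot")) > 0)
```

The per-feature variance lets AROW make large updates for rarely seen features and small ones for well-established features, which is one reason it is a reasonable base learner when many noisy pseudo-labelled examples are added.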

Zhou et al. (2013) proposed using deep learning to train semi-supervised sentiment classification models. They introduced a novel approach, called active deep networks (ADN), which combines deep belief networks (Hinton and Salakhutdinov, 2006) with active learning. First, a deep architecture is constructed using all the unlabelled data and a few initial labelled examples. Then, an active learner identifies the most uncertain unlabelled examples, which are labelled and used to train the network. To improve the active learning stage by taking into account not only the uncertainty of an example but also the density of the region in which it lies, a modified version of ADN, called information ADN (IADN), was proposed; this helps to select the most representative examples. The experimental results demonstrated the effectiveness of ADN and IADN compared to previous semi-supervised methods, such as transductive SVMs and the method of Dasgupta and Ng (2009) described above.
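The intuition behind combining uncertainty with density can be shown with a small query-selection sketch. The scoring rule below (uncertainty 1 / (1 + |margin|) multiplied by the mean cosine similarity to the other unlabelled examples) is an assumption chosen for illustration and is not the exact criterion of IADN; the margins stand in for the outputs of the deep network, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_queries(vectors: List[List[float]], margins: List[float], k: int) -> List[int]:
    """Rank unlabelled examples by uncertainty * density and return the top k.
    uncertainty: 1 / (1 + |margin|), highest on the decision boundary.
    density: mean cosine similarity to the other unlabelled examples."""
    n = len(vectors)
    scores = []
    for i in range(n):
        uncertainty = 1.0 / (1.0 + abs(margins[i]))
        others = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        density = sum(others) / len(others) if others else 0.0
        scores.append(uncertainty * density)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]  # example 3 is an outlier
margins = [0.2, 0.5, 1.0, 0.0]                             # example 3 is the most uncertain
print(select_queries(vectors, margins, 2))  # [0, 1]
```

Pure uncertainty sampling would query the outlier (index 3) first; weighting by density instead favours uncertain examples that lie in a well-populated region and are therefore more representative.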

An important group of semi-supervised learning algorithms is graph-based algorithms (Zhu et al., 2003a), which model data as a weighted graph whose nodes are instances and whose edges correspond to the similarity between instances. For document-level sentiment classification, instances are documents and the similarity function reflects how close two documents are in sentiment. The first attempt to apply graph-based learning to sentiment classification was by Goldberg and Zhu (2006), who proposed a modified label propagation algorithm (Zhu and Ghahramani, 2002) for multiclass classification of movie reviews. To estimate the sentiment similarity between a pair of documents, several similarity measures were tested. The best-performing measure was based on the percentage of positive sentences (PSP) in a document, previously introduced by Pang and Lee (2005). To compute PSP scores, review sentences were classified as either positive or negative using a binary classifier trained on an external “snippet” dataset. The snippet dataset comprised 10 662 short texts taken from movie reviews on rottentomatoes.com, where the rating of each snippet was assigned on the basis of the rating of its original review. Using the PSP scores, each document was represented as a vector (PSP, 1 − PSP), and the similarity between two documents was measured as the cosine similarity between the corresponding vectors. For relatively small amounts of labelled data (fewer than 200 documents), the graph-based method considerably improved on the accuracy of supervised SVM regression. In contrast, for larger amounts of labelled data, supervised SVM regression generally performed better. The method of Goldberg and Zhu (2006) is implemented as part of our graph-based sentiment analysis system; a more detailed description of the algorithm is therefore given in Chapter 4.
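The PSP-based similarity and the propagation of ratings over the resulting graph can be sketched as below. The sketch uses plain iterative label propagation on a fully connected graph, where each unlabelled node repeatedly takes the similarity-weighted average of its neighbours' values while labelled nodes stay clamped; Goldberg and Zhu's modified algorithm differs in how the graph is built and weighted. The PSP values, ratings and function names are hypothetical.

```python
import math
from typing import Dict, List


def psp_similarity(p: float, q: float) -> float:
    """Cosine similarity between the vectors (p, 1 - p) and (q, 1 - q)."""
    dot = p * q + (1.0 - p) * (1.0 - q)
    return dot / (math.hypot(p, 1.0 - p) * math.hypot(q, 1.0 - q))


def propagate(psp: List[float], known: Dict[int, float], iterations: int = 100) -> List[float]:
    """Propagate known ratings to unlabelled documents over a PSP-similarity graph."""
    n = len(psp)
    w = [[psp_similarity(psp[i], psp[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    start = sum(known.values()) / len(known)  # initial guess for unlabelled nodes
    f = [known.get(i, start) for i in range(n)]
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if i in known or total == 0.0:
                new_f.append(f[i])  # labelled (or isolated) nodes keep their value
            else:
                new_f.append(sum(w[i][j] * f[j] for j in range(n)) / total)
        f = new_f
    return f


psp = [0.9, 0.1, 0.8, 0.2]    # fraction of positive sentences per review
known = {0: 4.0, 1: 1.0}      # ratings of the two labelled reviews
ratings = propagate(psp, known)
print([round(r, 2) for r in ratings])
```

The unlabelled review with PSP 0.8 ends up with a higher predicted rating than the one with PSP 0.2, because it is more similar to the positively rated labelled review.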

Turney (2002) was the first to tackle the sentiment classification problem in an unsupervised manner. He assumed that the lexical association of two words is related to their semantic similarity, i.e., words with similar orientation tend to co-occur. Instead of considering isolated words, he extracted two-word phrases that contain adjectives or adverbs and satisfy a set of specific part-of-speech patterns, for example, JJ NN or RB JJ. Phrases were preferred to isolated words because they introduce some context, which can help to disambiguate domain-dependent and context-dependent sentiment words, for example, “unpredictable plot” versus “unpredictable steering”. The semantic orientation of phrases was measured using the SO-PMI method as explained in Section 2.3.1. The sentiment of a document was computed by averaging the semantic orientations of its phrases. This approach gave reasonable results, considering that it is completely unsupervised.
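The pipeline of phrase extraction, SO-PMI scoring and averaging can be sketched as follows. Only two of Turney's part-of-speech patterns are implemented (JJ followed by NN/NNS, and RB/RBR/RBS followed by JJ when the next word is not a noun), and the search-engine hit counts, which Turney obtained with the AltaVista NEAR operator, are replaced by a hypothetical lookup table. The score is log2 of [hits(phrase NEAR "excellent") · hits("poor")] / [hits(phrase NEAR "poor") · hits("excellent")], with 0.01 added to the phrase counts to avoid division by zero.

```python
import math
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag)


def extract_phrases(tagged: Tagged) -> List[str]:
    """Two-word phrases matching JJ + NN/NNS, or RB/RBR/RBS + JJ not followed by a noun."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if t1 == "JJ" and t2 in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
        elif t1 in ("RB", "RBR", "RBS") and t2 == "JJ" and t3 not in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
    return phrases


def semantic_orientation(phrase: str, hits: Callable[[str], int]) -> float:
    """SO-PMI of a phrase from co-occurrence counts with "excellent" and "poor"."""
    num = (hits(f"{phrase} NEAR excellent") + 0.01) * hits("poor")
    den = (hits(f"{phrase} NEAR poor") + 0.01) * hits("excellent")
    return math.log2(num / den)


def document_orientation(tagged: Tagged, hits: Callable[[str], int]) -> float:
    """Average SO of the extracted phrases; 0.0 if no phrase matches."""
    phrases = extract_phrases(tagged)
    if not phrases:
        return 0.0
    return sum(semantic_orientation(p, hits) for p in phrases) / len(phrases)


# Hypothetical hit counts standing in for search-engine queries.
counts = {
    "excellent": 1000, "poor": 1000,
    "unpredictable plot NEAR excellent": 30, "unpredictable plot NEAR poor": 10,
    "good acting NEAR excellent": 40, "good acting NEAR poor": 5,
}
hits = lambda query: counts.get(query, 0)

tagged = [("an", "DT"), ("unpredictable", "JJ"), ("plot", "NN"), ("and", "CC"),
          ("very", "RB"), ("good", "JJ"), ("acting", "NN")]
print(extract_phrases(tagged))  # ['unpredictable plot', 'good acting']
print(document_orientation(tagged, hits) > 0)  # True
```

Note that "very good" is not extracted because it is followed by a noun; the pattern instead picks up "good acting".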

Read and Carroll (2009) extended the SO-PMI method by exploring three types of similarity measures: lexical association measures as in Turney and Littman (2003), and two second-order similarity measures, namely semantic spaces and distributional similarity. The overall sentiment of a document was computed from the sentiments of its features, each of which was in turn represented as a sum of similarity scores between the feature and a set of predefined prototypical words. Seven positive and seven negative words were selected as polarity prototypes. The evaluation showed that the performance of the proposed word similarity method is independent of domains, topics and time periods. In addition, a comparison with supervised techniques suggested that the word similarity method can be more beneficial than supervised techniques when the task involves multi-domain datasets.
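The prototype-based scoring can be illustrated with a distributional-similarity sketch. Here each word is represented by a hypothetical sparse context-count vector, a feature's score is its summed cosine similarity to the positive prototypes minus its summed similarity to the negative prototypes, and a document's score is the sum over its features. The subtraction of the negative sum, the two-plus-two prototype lists and the vectors are assumptions for illustration; Read and Carroll used seven prototypes per polarity and several similarity measures.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse context-count vector


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def feature_score(feature: str, vectors: Dict[str, Vector],
                  positive: List[str], negative: List[str]) -> float:
    """Summed similarity to positive prototypes minus summed similarity to negative ones."""
    fv = vectors.get(feature)
    if fv is None:
        return 0.0  # unknown words contribute nothing
    pos = sum(cosine(fv, vectors[p]) for p in positive if p in vectors)
    neg = sum(cosine(fv, vectors[n]) for n in negative if n in vectors)
    return pos - neg


def document_score(tokens: List[str], vectors: Dict[str, Vector],
                   positive: List[str], negative: List[str]) -> float:
    return sum(feature_score(t, vectors, positive, negative) for t in tokens)


POSITIVE = ["good", "excellent"]  # hypothetical prototypes
NEGATIVE = ["bad", "poor"]
vectors = {
    "good": {"film": 2, "enjoy": 3, "recommend": 2},
    "excellent": {"film": 1, "enjoy": 2, "recommend": 3},
    "bad": {"film": 2, "waste": 3, "boring": 2},
    "poor": {"film": 1, "waste": 2, "boring": 3},
    "superb": {"enjoy": 2, "recommend": 2, "film": 1},
    "dull": {"boring": 3, "waste": 1, "film": 1},
}
print(document_score(["superb"], vectors, POSITIVE, NEGATIVE) > 0)  # True
print(document_score(["dull"], vectors, POSITIVE, NEGATIVE) < 0)    # True
```

Because the scores depend only on corpus co-occurrence statistics and a handful of prototype words, no labelled documents are needed, which is what allows the approach to carry across domains.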

manner which exploits the idea that emoticons and their contexts convey similar sentiments. The author collected data from Usenet newsgroups and extracted pieces of text close to emoticons using different context windows. The results were modest: the best classifier, trained on 20 000 articles, achieved only 70.1% accuracy. The author attributed the low accuracy to the high level of noise in the automatically acquired labelled data. This unsupervised approach to acquiring labelled data was adopted in several subsequent studies on sentiment analysis on Twitter (Go et al., 2009; Pak and Paroubek, 2010).
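The idea of harvesting noisy labelled examples from emoticon contexts can be sketched as follows. The emoticon sets, the whitespace tokenisation and the choice of taking the `window` tokens preceding each emoticon are assumptions for illustration, not the exact settings of the study described above. Emoticons are removed from the extracted text so that a classifier trained on it cannot simply learn the emoticons themselves.

```python
from typing import List, Tuple

POSITIVE = {":)", ":-)", ":D"}  # hypothetical emoticon sets
NEGATIVE = {":(", ":-("}
EMOTICONS = POSITIVE | NEGATIVE


def emoticon_examples(text: str, window: int = 5) -> List[Tuple[str, int]]:
    """Label the `window` tokens preceding each emoticon with that emoticon's polarity."""
    tokens = text.split()
    examples = []
    for i, tok in enumerate(tokens):
        if tok not in EMOTICONS:
            continue
        label = 1 if tok in POSITIVE else -1
        # context window before the emoticon, with any emoticons stripped out
        context = [t for t in tokens[max(0, i - window):i] if t not in EMOTICONS]
        if context:
            examples.append((" ".join(context), label))
    return examples


post = "finally got the new album :) but the delivery took ages :("
print(emoticon_examples(post, window=4))
# [('got the new album', 1), ('the delivery took ages', -1)]
```

With a larger window the two contexts would overlap and the second example would inherit words from the positive clause, which illustrates one source of the label noise mentioned above.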

Another unsupervised method developed by Zagibalov (2010) exploits a small number of sentiment-bearing seed words and a bootstrapping strategy. First, all documents are classified according to the seed words and all lexical units are weighted on the basis of their frequency in positive and negative documents. Then, the procedure is repeated, which leads to new document labels and updated weights of lexical units. This process iterates until convergence is achieved.
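The bootstrapping loop can be sketched as follows. This is a simplified illustration, assuming that documents are labelled by the sign of their summed lexical weights and that each lexical unit is re-weighted as (positive count − negative count) / (positive count + negative count); Zagibalov's actual weighting scheme is more elaborate. The seed words and documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def bootstrap(docs: List[List[str]], seeds: Dict[str, float],
              max_iter: int = 20) -> Tuple[List[int], Dict[str, float]]:
    weights = dict(seeds)  # seed weights are only used in the first pass
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # 1. label each document by the sign of its summed lexical weights (0 = undecided)
        new_labels = []
        for doc in docs:
            s = sum(weights.get(t, 0.0) for t in doc)
            new_labels.append(1 if s > 0 else -1 if s < 0 else 0)
        if new_labels == labels:
            break  # labels unchanged: converged
        labels = new_labels
        # 2. re-weight every lexical unit by its frequency in positive vs negative documents
        pos, neg = Counter(), Counter()
        for doc, y in zip(docs, labels):
            if y > 0:
                pos.update(doc)
            elif y < 0:
                neg.update(doc)
        weights = {t: (pos[t] - neg[t]) / (pos[t] + neg[t]) for t in set(pos) | set(neg)}
    return labels, weights


docs = [["good", "cheap", "phone"], ["bad", "slow", "phone"],
        ["cheap", "fast"], ["slow", "noisy"]]
labels, weights = bootstrap(docs, {"good": 1.0, "bad": -1.0})
print(labels)  # [1, -1, 1, -1]
```

The last two documents contain no seed words, so they are undecided after the first pass; they receive labels only once "cheap" and "slow" have acquired weights from the seed-labelled documents.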

Lin and He (2009) proposed a joint sentiment-topic (JST) model as another unsupervised approach to sentiment classification. Their model extends Latent Dirichlet Allocation (LDA) (Blei et al., 2003) by adding a sentiment layer, which allows a mixture of topics to be extracted and their sentiments to be detected simultaneously. To refine the model, prior knowledge about sentiment-bearing and opinionated words was incorporated, which in effect added some supervision to the method. The authors used the MPQA subjectivity lexicon and removed objective sentences in a supervised manner similar to Pang and Lee (2004). The results obtained on the movie review dataset were lower than those of fully supervised approaches but are still surprisingly high given the almost unsupervised nature of the method. There are also a number of similar studies in which topic models are exploited to extract aspects of reviews (Mei et al., 2007; Titov and McDonald, 2008; Brody and Elhadad, 2010; Zhao et al., 2010), although their main objective was to produce summaries rather than to detect the sentiment of a document.
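In our reading of Lin and He (2009), the generative story of JST can be summarised as below, using the usual LDA-style notation: $\pi_d$ is the sentiment distribution of document $d$, $\theta_{d,l}$ the topic distribution of $d$ under sentiment label $l$, $\varphi_{l,z}$ the word distribution of the sentiment-topic pair $(l, z)$, and $\alpha$, $\beta$, $\gamma$ are Dirichlet hyperparameters. This is a summary rather than a verbatim reproduction of the model; as we understand it, lexicon knowledge enters through the prior $\beta$, and the sentiment of a document is read off $P(l \mid d)$, estimated from $\pi_d$.

```latex
\begin{align*}
&\text{for each sentiment label } l \text{ and topic } z: && \varphi_{l,z} \sim \mathrm{Dir}(\beta)\\
&\text{for each document } d: && \pi_d \sim \mathrm{Dir}(\gamma), \quad \theta_{d,l} \sim \mathrm{Dir}(\alpha) \text{ for each } l\\
&\text{for each word position } i \text{ in } d: && l_i \sim \mathrm{Mult}(\pi_d), \quad z_i \sim \mathrm{Mult}(\theta_{d,l_i}), \quad w_i \sim \mathrm{Mult}(\varphi_{l_i,z_i})
\end{align*}
```

Compared with LDA, the only structural change is the extra draw of $l_i$ before $z_i$, which makes topics sentiment-specific.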
