
2.3 Approaches

2.3.2 Supervised classification

Supervised classification requires a corpus of texts labelled with their polarity or sentiment strength. Numerous studies report that, when enough labelled data is available, this learning approach normally yields the best performance (Pang and Lee, 2008). A pioneering study in supervised sentiment classification (Pang et al., 2002) compared three techniques, naive Bayes (NB), maximum entropy (ME) and support vector machines (SVMs), and found a moderate advantage for SVMs over the other two. In another study, SVMs and ME produced comparable results when classifying heterogeneous information from the Web (Boiy and Moens, 2009).

Research on supervised methods for sentiment classification has investigated several directions for improving classification results. One popular direction searches for the feature set that yields the best results (Gamon, 2004; Mullen and Collier, 2004; Whitelaw et al., 2005; Kennedy and Inkpen, 2006; Abbasi et al., 2008; Paltoglou and Thelwall, 2010). The main studies in this group were discussed in Section 2.1. Another direction comprises methods that combine several classifiers to reduce the errors made by individual learners (Kennedy and Inkpen, 2006; Prabowo and Thelwall, 2009). For example, Kennedy and Inkpen (2006) combined a lexical approach and SVMs using weighted voting. Their lexical approach exploited different sentiment lexicons (including GI and a list of positive and negative adjectives (Taboada and Grieve, 2004)), enriched by marking the presence of negations and intensifiers. Though the lexical approach alone performed poorly, its combination with SVMs slightly outperformed SVMs on their own.
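The idea of combining a lexical score with a classifier score by weighted voting can be illustrated with a minimal sketch. The tiny word lists, the intensifier weights and the voting weights below are all made-up assumptions for illustration; they are not the lexicons or the weights used by Kennedy and Inkpen (2006), and the SVM score is simply passed in as a number.

```python
# Toy lexicons; hypothetical stand-ins for GI and the adjective list.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "dull", "boring"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 2.0}


def lexical_score(tokens):
    """Return a polarity score in [-1, 1] from term counting.

    A sentiment word is flipped when directly preceded by a negation
    (optionally with an intensifier in between) and weighted up when
    directly preceded by an intensifier.
    """
    score, total = 0.0, 0.0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1.0
        elif tok in NEGATIVE:
            polarity = -1.0
        else:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        weight = INTENSIFIERS.get(prev, 1.0)
        if prev in NEGATIONS or (prev in INTENSIFIERS and prev2 in NEGATIONS):
            polarity = -polarity
        score += polarity * weight
        total += weight
    return score / total if total else 0.0


def weighted_vote(lex_score, svm_score, w_lexical=0.3, w_svm=0.7):
    """Combine two signed scores (positive = positive polarity)."""
    combined = w_lexical * lex_score + w_svm * svm_score
    return "positive" if combined >= 0 else "negative"


tokens = "the film was not very good".split()
print(weighted_vote(lexical_score(tokens), svm_score=0.2))
```

With these assumed weights the lexical vote overrides a weakly positive classifier score but not a strongly positive one, which is the kind of trade-off the voting weights control.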

Like Kennedy and Inkpen (2006), Prabowo and Thelwall (2009) hypothesised that using multiple classifiers in a hybrid manner can improve sentiment classification. However, the classifiers were combined in a sequence rather than in an ensemble. The authors proposed three rule-based methods. The first method used General Inquirer, while the second constructed rules from proper nouns, which were assumed to convey the same sentiment as the whole document. The last method, called the statistics-based classifier, established a set of rules using sentiment-bearing words that were rated automatically in a manner similar to the SO-PMI method (Turney and Littman, 2003). The hybrid approach, which combines the three rule-based classifiers with SVMs, showed a significant advantage over SVMs alone for small datasets. The comparison was not carried out for larger datasets because of the high computational cost of the statistics-based classifier.
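For reference, the SO-PMI score of Turney and Littman (2003) mentioned above is usually written as below. Here P and N denote sets of positive and negative seed words, and the probabilities are estimated from co-occurrence counts in a large corpus. How exactly Prabowo and Thelwall (2009) adapted this score is not specified in this section, so the formula shows only the original definition.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
\qquad
\text{SO-PMI}(w) = \sum_{p \in P} \mathrm{PMI}(w, p) \;-\; \sum_{n \in N} \mathrm{PMI}(w, n)
```

A word with a positive SO-PMI co-occurs more strongly with the positive seeds than with the negative ones, and vice versa.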

A third direction unites papers that represent documents by more fine-grained elements, for example sentences, and address both coarse- and fine-grained classification problems (Pang and Lee, 2004; McDonald et al., 2007; Zaidan et al., 2007; Li et al., 2010b; Yessenalina et al., 2010; Carrillo de Albornoz et al., 2011). Pang and Lee (2004), working with movie reviews, hypothesised that objective sentences degrade the sentiment classification of full texts. To filter out objective sentences, they applied a graph min-cut algorithm that takes into account both the proximity between sentences in a text and the classification results given by the SVM and NB classifiers. The SVM and NB polarity classifiers trained on filtered texts were then compared with those trained on the full documents. The extraction of subjective sentences was shown to be beneficial only for NB, while SVMs performed marginally better with full documents.
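The min-cut formulation can be sketched as follows. Each sentence is linked to a source node (subjective) with a capacity equal to its individual subjectivity score and to a sink node (objective) with the complementary score; association edges between sentences make it costly to separate neighbours. The function name, the use of 1 - p as the objectivity score and the toy numbers are assumptions for illustration, and the max-flow routine is a plain Edmonds-Karp implementation rather than whatever solver the original authors used.

```python
from collections import defaultdict, deque


def min_cut_subjective(p_subj, assoc):
    """Return indices of sentences placed on the subjective side.

    p_subj[i] is an individual subjectivity score in [0, 1];
    assoc maps (i, j) pairs to a non-negative association weight.
    """
    n = len(p_subj)
    source, sink = n, n + 1
    cap = defaultdict(float)   # residual capacities
    adj = defaultdict(set)     # adjacency, including reverse edges

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    for i, p in enumerate(p_subj):
        add_edge(source, i, p)        # cut if sentence i is labelled objective
        add_edge(i, sink, 1.0 - p)    # cut if sentence i is labelled subjective
    for (i, j), w in assoc.items():
        add_edge(i, j, w)             # cut if i and j get different labels
        add_edge(j, i, w)

    def reachable(stop_at_sink):
        """BFS over edges with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    if stop_at_sink and v == sink:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths.
    while True:
        parent = reachable(stop_at_sink=True)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[edge] for edge in path)
        for u, v in path:
            cap[(u, v)] -= flow
            cap[(v, u)] += flow

    # Nodes still reachable from the source form the subjective side.
    side = reachable(stop_at_sink=False)
    return sorted(i for i in range(n) if i in side)


scores = [0.9, 0.45, 0.1]
print(min_cut_subjective(scores, {}))             # no proximity information
print(min_cut_subjective(scores, {(0, 1): 0.5}))  # sentences 0 and 1 tied together
```

In the second call the borderline sentence 1 is pulled to the subjective side by its strong association with sentence 0, which is the effect the proximity term is meant to have.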

McDonald et al. (2007) assumed that the joint classification of documents and sentences can improve the accuracy of sentiment classification at both levels. The authors suggested an undirected graphical model in which the label of each sentence depends on its neighbouring sentences and on the label of the document. In general, inference in undirected graphical models is intractable, but once the document label is fixed the model reduces to a chain and the problem can be solved with the Viterbi algorithm. The method slightly outperformed two cascaded classifiers: one classifies sentences using only a sentence-structured model and then passes the resulting labels to a document classifier, while the other works in the reverse order.
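The inference trick described above (fix the document label, decode the sentence chain with Viterbi, then keep the best document label) can be sketched as follows. The scoring function and its numbers are invented for illustration; McDonald et al. (2007) learn their scores from data, and nothing here reproduces their features or training procedure.

```python
def viterbi(n, labels, score):
    """Best label sequence for a chain of n items.

    score(i, prev, cur) returns the score of labelling item i as cur
    given the previous label prev (prev is None for i == 0).
    Returns (best_score, best_path).
    """
    best = {y: (score(0, None, y), [y]) for y in labels}
    for i in range(1, n):
        new_best = {}
        for cur in labels:
            candidates = [
                (best[prev][0] + score(i, prev, cur), best[prev][1])
                for prev in labels
            ]
            s, path = max(candidates, key=lambda c: c[0])
            new_best[cur] = (s, path + [cur])
        best = new_best
    return max(best.values(), key=lambda c: c[0])


def joint_decode(n, labels, joint_score):
    """Try each document label, run Viterbi on the chain, keep the best."""
    results = []
    for doc in labels:
        s, path = viterbi(
            n, labels, lambda i, prev, cur, d=doc: joint_score(i, prev, cur, d)
        )
        results.append((s, doc, path))
    return max(results, key=lambda r: r[0])


# Hypothetical per-sentence scores for a three-sentence document.
EMISSIONS = [
    {"pos": 2.0, "neg": 0.0},
    {"pos": 0.4, "neg": 0.6},
    {"pos": 1.5, "neg": 0.0},
]


def toy_score(i, prev, cur, doc):
    s = EMISSIONS[i][cur]
    if prev is not None and prev == cur:
        s += 0.5   # neighbouring sentences tend to agree
    if cur == doc:
        s += 0.3   # sentences tend to agree with the document
    return s


print(joint_decode(3, ["pos", "neg"], toy_score))
```

Taken alone, the middle sentence leans negative, but the joint decoding labels it positive because of its neighbours and the document label, which illustrates how the two levels inform each other.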

Zaidan et al. (2007) argued that document annotations enriched with “annotator rationales” can be more effective for sentiment classification than providing a classifier with additional labelled examples. By “annotator rationales”, the authors mean the most important words and phrases of a document that indicate its polarity. For each original document, a set of contrastive examples was constructed by removing one or more annotator rationales. The contrastive examples were used to place additional constraints on the SVM classifier, ensuring that the contrastive documents were classified less confidently than the original documents. The results demonstrated a substantial improvement over the baseline SVMs trained only on original documents. Interestingly, training SVMs on annotator rationales alone yielded an accuracy significantly lower than the baseline. Following the idea of Zaidan et al. (2007), Yessenalina et al. (2010) proposed an unsupervised method for extracting annotator rationales using either OpinionFinder or manually constructed polarity lexicons. Their evaluation showed that automatically constructed rationales are as effective as manually produced rationales.
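A minimal sketch of how contrastive examples might be built and checked is given below. It assumes rationales are given as token spans, removes one rationale per contrastive example, and uses a bag-of-words linear scorer with hand-picked weights; these are simplifying assumptions for illustration and do not reproduce the modified SVM objective of Zaidan et al. (2007).

```python
from collections import Counter


def contrast_examples(tokens, rationales):
    """Build one contrastive document per rationale span.

    rationales is a list of (start, end) token spans, end exclusive.
    """
    return [tokens[:start] + tokens[end:] for start, end in rationales]


def decision(weights, tokens):
    """Linear bag-of-words decision value; unknown words weigh 0."""
    counts = Counter(tokens)
    return sum(weights.get(tok, 0.0) * c for tok, c in counts.items())


def contrast_margin(weights, original, contrast, label):
    """How much more confidently the original is classified.

    label is +1 or -1; a positive margin means the constraint
    'original more confident than contrast' is satisfied.
    """
    return label * (decision(weights, original) - decision(weights, contrast))


doc = "a truly wonderful film with dull music".split()
contrasts = contrast_examples(doc, [(1, 3)])   # rationale: "truly wonderful"
weights = {"truly": 0.3, "wonderful": 1.2, "dull": -0.8}  # hypothetical
print(contrasts[0])
print(contrast_margin(weights, doc, contrasts[0], label=+1))
```

During training, a constraint of this kind would require the margin to exceed some threshold for every contrastive example, in addition to the usual classification constraints.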

The last group of approaches presented here attempts to improve sentiment classification by going beyond a simple bag-of-words (BOW) representation (Li et al., 2010b; Bai, 2011). Bai (2011) employed Bayesian networks, which are able to model dependencies among words, and proposed an algorithm for learning the Markov blanket of a sentiment variable. The sentiment variable can take multiple values corresponding to the sentiment expressed in a document. In the first stage, the algorithm establishes a parsimonious vocabulary of words that are expressive enough to capture the overall sentiment of a document. In the second stage, a dependency structure between the words in the vocabulary and the sentiment variable is learnt. The experiments showed that only several dozen highly predictive words are enough to obtain accurate classification results that are comparable or superior to those of state-of-the-art classifiers trained on BOW representations. Interestingly, the words the algorithm found important for predicting sentiment are not always sentiment-bearing, for example, “also”, “again”, “but” and “as”, among others. According to the authors, such “results suggest that words that occur often, along with their conditional dependencies and a few strong adjectives, constitute most of the vocabulary needed to express sentiments and perform reasonable predictions” (Bai, 2011, page 741).

Li et al. (2010b) argued that a simple BOW representation is unable to model complex linguistic phenomena such as negation structures, contrast transitions, modals and presuppositional structures, which can substantially shift or even invert sentence polarity (Polanyi and Zaenen, 2006). Therefore, the first stage of their algorithm consisted of the automatic detection of sentences with polarity-shifting structures (polarity-shifted sentences). In the second stage, three classifiers were trained: one on polarity-shifted sentences, one on polarity-unshifted sentences and one on all sentences. Experiments confirmed the importance of polarity-shifted sentences for correct sentiment classification. Moreover, the final classifier, which combines the three learning models by stacking (Dzeroski and Zenko, 2004), significantly outperformed each of the individual classifiers.
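Stacking, as used in the final step, trains a meta-classifier on the outputs of the base classifiers. The sketch below is only a generic illustration: the three input columns stand for hypothetical positive-class probabilities from the shifted, unshifted and all-sentence classifiers, the numbers are made up, and a small logistic regression trained by gradient descent is used as a stand-in meta-learner, which is not necessarily the one used by Li et al. (2010b).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta(features, labels, epochs=200, lr=0.5):
    """Fit logistic regression on base-classifier outputs.

    features: list of tuples, one probability per base classifier,
    ideally produced on data the base classifiers were not trained on.
    labels: list of 0/1 gold labels.
    """
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stacked_predict(w, b, x):
    """Return 1 (positive) or 0 (negative) for one row of base outputs."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Columns: (shifted, unshifted, all-sentences) classifier outputs; toy data.
level1 = [(0.9, 0.6, 0.8), (0.8, 0.4, 0.7), (0.2, 0.5, 0.3), (0.1, 0.6, 0.2)]
gold = [1, 1, 0, 0]
w, b = train_meta(level1, gold)
print(stacked_predict(w, b, (0.85, 0.5, 0.75)))
```

In this toy data the second column carries almost no signal, so the meta-learner ends up relying mainly on the other two; learning such weights from data is the point of stacking as opposed to fixed voting.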

In document Graph-based approaches for semi-supervised and cross-domain sentiment analysis (Page 51-56)