We have described our research on supervised opinion classification of blogs. We have investigated the difference between global classification of documents from mixed topics and local classification of documents from the same topic. Our experiment on the TREC Blog collections has shown that local classification is significantly more accurate than global classification. This might be because documents from the same topic tend to share a similar set of sentiment words. Our research will then concentrate on developing topic-specific opinion classification models; in particular, it is anticipated that annotating the intensity of opinion words can further improve such models.
Chapter 5
Improving Sentiment Classification of Blog Posts by Reducing Topic Drift
Chapter 4 shows that sentiment classification of blog posts is more accurate when the classifier is trained on posts from the target domain (that is, posts relevant to the topic specified by the user). In this chapter we present another study on improving sentiment classification in the context of topics. Here we focus on a phenomenon prominent in blog posts: topic drift.
5.1 Introduction
Unlike news stories and product reviews, which usually have a strong focus on a single topic, blog posts on a topic often contain significant portions of drift text. This is shown in the example below, extracted from a post about the documentary March of the Penguins.
“This is a must-see documentary about the mating ritual of Antarctic penguins... I sniffled and nodded I know what that feels like... We received news yesterday that the donor's ultrasound exam was normal, so we are good to go. Am feeling detached and ambivalent again...”
The author comments on the documentary first, but then drifts to her personal experience. The polarity of the opinion expressed in the drift portion of the text is not necessarily consistent with the author's opinion on the topic, which can mislead the classifier.
Sentiment classification of blog posts has been done as a two-phase process [Ounis et al., 2006]: posts are first classified by their relevance to a topic expressed as a user query and, if relevant, are then classified by their sentiment polarity¹. Topic drift within relevant documents is not addressed. Recently, much work has been done on classification of extracts. Yet these studies are either conducted on documents from a single domain and do not address topic drift, e.g. movie reviews [Pang and Lee, 2004], or are tested only with supervised classifiers [Pang and Lee, 2004; Zhou et al., 2012], and so cannot draw a definitive conclusion on the effectiveness of classification on extracts. Most existing work that considers both opinion and topic focuses on retrieving opinionated text without classifying it [Jo and Oh, 2011; Lin and He, 2009; Zhang and Ye, 2008].
We surmise that using topic-focused extracts (the portion of text that expresses opinions on the given query) may improve the classification of opinions expressed towards a topic. The research questions we address are:
1. Can topic-focused opinion extraction methods (multi-facet extraction) lead to better sentiment classification than single-facet opinion extraction methods?
2. How does the classification of blog posts with extracts compare to that on full text?
¹ In later discussions we use the terms topic and query interchangeably. We also use the terms sentiment and opinion interchangeably in this section.
To answer these research questions, we develop multi-facet extracts and compare classification performance across different types of extracts as well as against the full text. In the pilot study, the extracts are generated by human judges, so as to validate the hypothesis that reducing topic drift can lead to more accurate sentiment classification; after that, we propose automatic extraction methods.
Automatic extraction techniques have been used for traditional topic classification, sentiment classification and evaluative sentence classification. Extraction methods include supervised graph-based extraction methods [Pang and Lee, 2004], unsupervised graph-based approaches [Zhai et al., 2011], and ensemble methods [Shen et al., 2007], which combine an array of heterogeneous extraction techniques. We hypothesize that using automatically generated topic-focused extracts (the portion of text that expresses opinions on the given query) can also improve the classification of opinions expressed towards a topic.
Aiming for a comprehensive study of classification performance using extracts, we test our automatic extraction methods on the TREC Blog collection, and benchmark our classifiers and extracts on the Movie Review Dataset [Pang and Lee, 2004]. The former is a collection of blog posts covering a wide range of domains, and the latter is a widely used dataset for sentiment classification.
Different types of extracts are classified with a number of classification models on a wide range of topics. To examine how extract length affects classification performance, we focus on rank-based methods; to ensure the readability of the extracts, we extract at the sentence level.
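To make the idea of rank-based, sentence-level extraction concrete, the following is a minimal sketch rather than the extraction method developed in this chapter. The scoring function (query-term hits plus opinion-term hits), the naive sentence splitter, and the names `split_sentences`, `score_sentence` and `extract` are all assumptions introduced for illustration. The `ratio` parameter controls the length of the extract as a fraction of the sentences in the post.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase word tokens; hyphenated words are split into their parts."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentence(sentence, query_terms, opinion_terms):
    """Hypothetical score: number of query-term hits plus opinion-term hits."""
    tokens = tokenize(sentence)
    topic_hits = sum(1 for t in tokens if t in query_terms)
    opinion_hits = sum(1 for t in tokens if t in opinion_terms)
    return topic_hits + opinion_hits


def extract(text, query_terms, opinion_terms, ratio):
    """Keep the top `ratio` fraction of sentences by score, in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = max(1, round(ratio * len(sentences)))
    # Rank by descending score; ties are broken by position in the post.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-score_sentence(sentences[i], query_terms, opinion_terms), i),
    )
    # Restore the original order so the extract stays readable.
    return [sentences[i] for i in sorted(ranked[:k])]


post = (
    "This is a must-see documentary about penguins. "
    "We received news yesterday about the exam. "
    "I loved the penguins film. "
    "Am feeling detached again."
)
extract_50 = extract(post, {"penguins", "documentary"}, {"must", "loved", "detached"}, 0.5)
```

With these illustrative term sets, the two sentences that mention the query are kept and the two drift sentences are dropped. Keeping the selected sentences in document order is a design choice made here for readability; it is not prescribed by the text above.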
Several typical classifiers are employed in this chapter: a supervised Naive Bayes classifier [Pang and Lee, 2004] trained on unigram word features, a logistic regression model with token counts as inputs, and a state-of-the-art unsupervised lexicon-based classifier, SOCAL [Taboada et al., 2011], which leverages linguistic rules. The Naive Bayes model and the logistic regression model are used in the pilot study. The same Naive Bayes classifier is then applied to the automatic extracts, together with the lexicon-based classifier.
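As a rough illustration of a Naive Bayes classifier over unigram features, the sketch below implements a multinomial variant with add-one smoothing. The choice of multinomial counts and add-one smoothing, the class name `UnigramNaiveBayes`, and the four toy training texts are assumptions made for this example; they are not the configuration or data used in the experiments.

```python
import math
import re
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase word tokens used as unigram features."""
    return re.findall(r"[a-z']+", text.lower())


class UnigramNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Unnormalised log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for w in unigrams(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data, invented purely to exercise the code.
train_texts = [
    "a moving and wonderful documentary",
    "wonderful film , loved it",
    "boring and dull film",
    "dull , terrible pacing",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = UnigramNaiveBayes().fit(train_texts, train_labels)
```

The same `fit` and `predict` calls can be run on either full posts or extracts, which is the comparison this chapter is concerned with.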
We have found that multi-facet extraction methods produce better extracts for classification with both the supervised Naive Bayes classifier and the lexicon-based SOCAL classifier. Sentiment classification on extracts containing 5% of the sentences can achieve performance comparable to that on full text, and classification on extracts containing 70% of the sentences can be more accurate than on full text. An investigation of 20 domains shows that our approach works on most of them. Next we introduce the pilot study.