We use the Support Vector Machine (SVM) [Joachims, 1998] as our classification model. In topic categorization, SVMs typically use all words except stop words as features. In this study, however, we use a generic sentiment lexicon as the feature list so that local and global classification of blog posts can be compared fairly. We next introduce the lexicon and the SVM classifier.
CHAPTER 4. TOPIC DEPENDENCY IN SENTIMENT CLASSIFICATION
4.2.1 A General Sentiment Lexicon
Identifying sentiment words is fundamental to sentiment analysis and classification. There are two broad methods for identifying sentiment words and building a sentiment lexicon. The first is manual construction, in which annotators either annotate a list of words or phrases [Subasic and Huettner, 2001] or find and annotate sentiment words in a given corpus [Pang et al., 2002; Wiebe et al., 2005a]. The second starts from a small number of seed words with predetermined sentiment polarity and then expands the seed list through learning or through other lexical relationships. For example, Hatzivassiloglou and McKeown [1997] expanded a seed list by adding words linked to seed words through conjunctions such as and, or, but, either-or, and neither-nor, while Kim and Hovy [2004] used WordNet to expand the seed words through synonym and antonym relationships. In our study, we use the sentiment lexicon developed by Wiebe et al. [2005b]. This lexicon contains 8221 annotated words, which resulted from the manual annotation of a 10,000-sentence corpus of news articles on various topics. The following is an example of such an annotation:
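The seed-expansion idea can be illustrated with a small sketch. It assumes a hypothetical toy dictionary of synonym and antonym links (in practice these would come from a resource such as WordNet) and propagates a +1/-1 polarity outward from the seeds by breadth-first search: synonyms inherit the polarity and antonyms receive the opposite one. This is a simplified illustration of the general technique, not a reproduction of Kim and Hovy's procedure; the function name `expand_seeds` and the `max_hops` cut-off are our own assumptions.

```python
from collections import deque
from typing import Dict, List


def expand_seeds(seeds: Dict[str, int],
                 synonyms: Dict[str, List[str]],
                 antonyms: Dict[str, List[str]],
                 max_hops: int = 2) -> Dict[str, int]:
    """Propagate +1/-1 polarity from seed words over synonym/antonym links (BFS).

    Hypothetical helper for illustration only.
    """
    polarity = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand further than max_hops links from a seed
        # Synonyms inherit the polarity; antonyms receive the opposite polarity.
        links = [(s, polarity[word]) for s in synonyms.get(word, [])]
        links += [(a, -polarity[word]) for a in antonyms.get(word, [])]
        for neighbour, pol in links:
            if neighbour not in polarity:  # first assignment (smallest hop count) wins
                polarity[neighbour] = pol
                queue.append((neighbour, hops + 1))
    return polarity


# Hypothetical toy relations, not taken from WordNet.
syn = {"good": ["fine", "decent"], "fine": ["excellent"], "bad": ["awful"]}
ant = {"good": ["bad"]}
print(expand_seeds({"good": 1}, syn, ant))
```

Limiting the number of hops is one simple way to stop polarity from drifting along long chains of loosely related words.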
Annotation example:
type=strongsubj len=1 word=admire pos=verb stemmed=y priorpolarity=positive
The property priorpolarity indicates the attitude expressed by the word admire and takes one of three values: positive, negative, or neutral. The neutral tag is used for subjective expressions that have neither positive nor negative polarity. The property type indicates the intensity of the expression and has two values: strong (strongsubj) or weak (weaksubj). Because annotation was done within the context of a sentence, the grammatical function of each word is also annotated; for example, admire here is a verb. A word may therefore occur two or more times in the list, once for each grammatical function it plays in the annotated text; for example, the word "cooperation" is annotated both as an adjective and as none. The list also includes several morphological forms of the same root, for example, cooperate, cooperation, cooperative, and cooperatively.
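An entry such as the admire example above is a whitespace-separated list of key=value fields, so it can be read with a few lines of code. The sketch below assumes the field names shown in the example (type, word, pos, priorpolarity, and so on); the released lexicon file may use slightly different field names, so they should be checked before reuse. The helpers `parse_lexicon_entry` and `strength_of` are hypothetical.

```python
from typing import Dict


def parse_lexicon_entry(line: str) -> Dict[str, str]:
    """Split a whitespace-separated key=value lexicon line into a dict (hypothetical helper)."""
    entry: Dict[str, str] = {}
    for field in line.split():
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {field!r}")
        entry[key] = value
    return entry


def strength_of(entry: Dict[str, str]) -> str:
    # Map the binary type property to a strength label: strongsubj -> strong, weaksubj -> weak.
    return {"strongsubj": "strong", "weaksubj": "weak"}[entry["type"]]


e = parse_lexicon_entry(
    "type=strongsubj len=1 word=admire pos=verb stemmed=y priorpolarity=positive")
print(e["word"], e["pos"], strength_of(e), e["priorpolarity"])
```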
4.2.2 Support Vector Machine
Support Vector Machines (SVMs) have been widely used in text categorisation with reported success [Joachims, 1998]. In an SVM model, objects are represented as vectors. When learning a model to separate two classes, the basic idea of SVM is to find a hyperplane, represented by a weight vector, that separates the objects of one class from those of the other class with the maximal margin. With a linear kernel, SVM learns a linear threshold function; with polynomial and radial basis function (RBF) kernels, it can also learn polynomial and RBF classifiers. SVMmulticlass¹ is an implementation of the multi-class SVM and is based on Structural SVMs [Tsochantaridis et al., 2004]. Unlike regular SVMs, structural SVMs can predict complex objects such as trees, sequences, or sets. SVMstruct, the general structural SVM package on which SVMmulticlass is built, can be used for linear-time training of binary and multi-class SVMs under the linear kernel. Features extracted jointly from the input and the output are used to find an optimal separating hyperplane.
¹ The package can be downloaded from http://www.cs.cornell.edu/people/tj/svm_light/svm_multiclass.
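To make the multi-class idea concrete, the sketch below keeps one weight vector (hyperplane) per class and predicts the class with the highest score. It minimises the multi-class hinge loss of Crammer and Singer, which SVMmulticlass is documented to optimise, but it swaps in plain stochastic subgradient descent in place of the package's cutting-plane algorithm, so it is an illustration of the objective rather than of SVMmulticlass itself. The five-word opinion vocabulary, the toy count vectors, and the values of `lam`, `lr`, and `epochs` are all hypothetical.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_multiclass_svm(X: List[List[float]], y: List[str], classes: List[str],
                         lam: float = 0.01, lr: float = 0.1, epochs: int = 50) -> Weights:
    """Linear multi-class SVM (Crammer-Singer hinge loss) trained by stochastic subgradient descent."""
    dim = len(X[0])
    W: Weights = {c: [0.0] * dim for c in classes}
    for _ in range(epochs):
        for x, yi in zip(X, y):
            # Loss-augmented scores: each wrong class must trail the true class by a margin of 1.
            aug = {c: dot(W[c], x) + (0.0 if c == yi else 1.0) for c in classes}
            y_hat = max(classes, key=aug.__getitem__)
            # Gradient step for the L2 regulariser (lam/2) * sum_c ||w_c||^2 shrinks all weights.
            for c in classes:
                W[c] = [(1.0 - lr * lam) * w for w in W[c]]
            # Hinge-loss subgradient, applied when a wrong class attains the maximum
            # loss-augmented score, i.e. the margin constraint is not strictly satisfied.
            if y_hat != yi:
                W[yi] = [w + lr * xj for w, xj in zip(W[yi], x)]
                W[y_hat] = [w - lr * xj for w, xj in zip(W[y_hat], x)]
    return W


def predict(W: Weights, x: Sequence[float]) -> str:
    # One hyperplane per class; the predicted class has the highest score.
    return max(W, key=lambda c: dot(W[c], x))


# Hypothetical features: counts of [good, great, bad, awful, okay] in each toy post.
X = [[2, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 1, 2, 0],
     [0, 0, 2, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 2]]
y = ["pos", "pos", "neg", "neg", "neu", "neu"]
W = train_multiclass_svm(X, y, ["pos", "neg", "neu"])
print([predict(W, x) for x in X])
```

In this toy setting each class has its own opinion words, so the learned hyperplanes separate the training posts; real opinion word vectors overlap across classes, which is where the margin and the regulariser matter.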
4.2.3 Opinion Word Extraction
To apply a classification model effectively, a key issue is feature selection, i.e., deciding what input will be given to the classification model. Feature selection is application dependent: it depends on how we want to classify a set of documents and on which prominent features separate those documents from each other. For the sentiment classification task, it is intuitive to use the opinion words found in a set of documents as classification features. In this study, we simply treated opinion words as tokens and did not apply natural language processing methods such as part-of-speech tagging to analyse their grammatical function. We applied the Porter stemmer to the lexicon to group different forms of the same word, which left 4919 distinct words. A closer look at the stemmed opinion words reveals some interesting facts. There are 103 words with contradictory polarities; after removing them, we had 4816 words with a unique sentiment polarity. However, some words also have mixed levels of strength. To account for these, we created a new strength level, which we call contextual strength; 194 words fall into this category. The distribution of opinion words in terms of polarity and strength is summarised in Table 4.1.
Table 4.1: Distribution of opinion words by polarity and strength.

Strength     Positive  Negative  Neutral  Total
Strong          954      2061      107    3192
Contextual       81        98       14     194
Weak            544       783      163    1490
Total          1579      2942      284    4816
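The grouping procedure described above (stem the lexicon, drop stems whose forms disagree in polarity, and mark stems with mixed strengths as contextual) can be sketched as follows. Two assumptions are ours: `crude_stem` is a hypothetical suffix stripper standing in for the Porter stemmer (if NLTK is installed, `PorterStemmer().stem` from `nltk.stem` could be passed as `stem` instead), and any stem whose forms carry more than one prior polarity is treated as contradictory, since the text does not say whether, for example, positive versus neutral counts. The toy entries and their annotations are hypothetical, not taken from the lexicon.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Entry = Tuple[str, str, str]  # (word, strength, polarity)


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for the Porter stemmer: strips one common suffix."""
    for suffix in ("atively", "ative", "ation", "ate", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_lexicon(entries: Iterable[Entry],
                       stem: Callable[[str], str] = crude_stem) -> Dict[str, Tuple[str, str]]:
    """Group entries by stem; drop polarity conflicts; mark mixed strengths as contextual."""
    polarities: Dict[str, Set[str]] = defaultdict(set)
    strengths: Dict[str, Set[str]] = defaultdict(set)
    for word, strength, polarity in entries:
        s = stem(word.lower())
        polarities[s].add(polarity)
        strengths[s].add(strength)
    lexicon: Dict[str, Tuple[str, str]] = {}
    for s, pols in polarities.items():
        if len(pols) > 1:
            continue  # contradictory polarities across forms: drop the stem
        strength = next(iter(strengths[s])) if len(strengths[s]) == 1 else "contextual"
        lexicon[s] = (strength, next(iter(pols)))
    return lexicon


entries = [("cooperate", "weak", "positive"), ("cooperation", "weak", "positive"),
           ("cooperative", "strong", "positive"), ("cooperatively", "weak", "positive"),
           ("admire", "strong", "positive"),
           ("fine", "weak", "positive"), ("fines", "weak", "negative")]
print(build_stem_lexicon(entries))
```

Here the four cooperate forms collapse to one stem with mixed strengths (hence contextual), while the two hypothetical fine entries disagree in polarity and are dropped.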
4.2.4 Opinion Word Vectors
In information retrieval, each document is represented as a vector over all the tokens (terms) in the collection. However, for the purpose of opinion classification, we represent a document as a vector of opinion word tokens and ignore words that do not express any sentiment. As in retrieval models, we weight each feature (an opinion word) of the document vector. The tf×idf weight of an opinion word f in a document d is:
$$w_{f,d} = tf_{f,d} \times \log \frac{|D|}{|D_f|}$$
where $tf_{f,d}$ is the frequency of word f in document d, and $\log \frac{|D|}{|D_f|}$ is the inverse document frequency of f, in which $|D|$ is the number of documents in the collection and $|D_f|$ is the number of documents containing f. We expect this model to be general enough to be applied to opinion classification.
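A minimal sketch of the opinion word vectors follows, computing $w_{f,d}$ exactly as in the formula above but only for tokens that appear in the opinion lexicon. It assumes documents are already tokenised (and, in practice, lower-cased and stemmed with the same stemmer as the lexicon), and it uses the natural logarithm because the text does not fix the base. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def opinion_tfidf_vectors(docs: List[List[str]], lexicon: Set[str]) -> List[Dict[str, float]]:
    """Sparse tf x idf vectors restricted to opinion words (hypothetical helper)."""
    # Keep only opinion-word tokens; other words are ignored entirely.
    filtered = [[t for t in doc if t in lexicon] for doc in docs]
    n_docs = len(docs)  # |D| counts every document, even those with no opinion words
    df: Counter = Counter()  # |D_f|: number of documents containing f
    for toks in filtered:
        df.update(set(toks))
    vectors = []
    for toks in filtered:
        tf = Counter(toks)  # tf_{f,d}
        vectors.append({f: count * math.log(n_docs / df[f]) for f, count in tf.items()})
    return vectors


docs = [["the", "movie", "was", "good", "good"],
        ["bad", "plot", "good"],
        ["a", "plain", "story"]]
for vec in opinion_tfidf_vectors(docs, {"good", "bad"}):
    print(vec)
```

Note that an opinion word occurring in every document receives weight 0, since $\log(|D|/|D_f|) = \log 1 = 0$.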