
Investigating the contextual polarity of sentiment expressions in the MPQA Corpus requires new annotations. Although the polarity of a subset of expressions is captured with the attitude-type attribute, this attribute has not been comprehensively annotated (see Chapter 3). However, subjective expressions in the corpus are comprehensively annotated. Subjective expressions in the MPQA Corpus are a subset of the private state annotations. They include all expressive subjective element frames and those direct subjective frames with an expression intensity greater than neutral. Because sentiment is a type of private state, sentiment expressions will be a subset of the subjective expressions already marked in the corpus. Thus, the subjective expression annotations in the MPQA Corpus give a starting point for the sentiment and contextual polarity annotations.

6.2.1 Annotation Scheme

When deciding how to annotate contextual polarity, there were two main issues that needed to be addressed. First, which of the subjective expressions are sentiment expressions? Second, what annotation scheme should be used for marking contextual polarity?

For this research, sentiments are defined as positive and negative emotions, evaluations and stances. Examples of positive sentiments are on the left in Table 6.1, and examples of negative sentiments are on the right. Any subjective expression that is expressing one of these types of private states is considered a sentiment expression.

Table 6.1: Examples of positive and negative sentiments

             Positive sentiments     Negative sentiments
Emotion      I’m happy               I’m sad
Evaluation   Great idea!             Bad idea!
Stance       She supports the bill   She’s against the bill

The second issue to address is what the actual annotation scheme should be for marking contextual polarity. The scheme that was developed has four tags: positive, negative, both, and neutral. The positive tag is used to mark positive sentiments. The negative tag is used to mark negative sentiments. The both tag is applied to sentiment expressions where both a positive and negative sentiment are being expressed (e.g., a bittersweet memory). The neutral tag is used for all other subjective expressions.

Below are examples of contextual polarity annotations from the corpus. Each tag is given in parentheses immediately after the subjective expression it marks.

(6.4) Thousands of coup supporters celebrated (positive) overnight, waving flags, blowing whistles . . .

(6.5) The criteria set by Rice are the following: the three countries in question are repressive (negative) and grave human rights violators (negative) . . .

(6.6) Besides, politicians refer to good and evil (both) only for purposes of intimidation and exaggeration.

(6.7) Jerome says the hospital feels (neutral) no different than a hospital in the states.

As a final note on the annotation scheme, the annotators were asked to judge the contextual polarity of the sentiment that was ultimately being conveyed by the subjective expression, that is, once the sentence had been fully interpreted. Thus, the subjective expression, “they have not succeeded, and will never succeed,” was marked as positive in the following sentence:

(6.8) They have not succeeded, and will never succeed (positive), in breaking the will of this valiant people.

The reasoning is that breaking the will of a valiant people is negative, so to not succeed in breaking their will is positive.

Table 6.2: Contingency table for contextual polarity agreement

           Neutral   Positive   Negative   Both   Total
Neutral        123         14         24      0     161
Positive        16         73          5      2      96
Negative        14          2        167      1     184
Both             0          3          0      3       6
Total          153         92        196      6     447

6.2.2 Agreement Study

Paul Hoffmann conducted an agreement study to measure the reliability of the polarity annotation scheme. For the study, two annotators independently annotated 10 documents from the MPQA Corpus containing 447 subjective expressions. Table 6.2 shows the contingency table for the two annotators’ judgments. Overall agreement is 82%, with a Kappa (κ) value of 0.72.
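The agreement figures can be recomputed from the cell counts in Table 6.2. The sketch below assumes that the reported Kappa is Cohen's kappa for two annotators, with chance agreement estimated from the row and column totals; the function and variable names are illustrative and not part of the study. Under that assumption the results round to the reported 82% and 0.72.

```python
# Cell counts copied from Table 6.2. Rows are one annotator's tags and
# columns the other's, both in the order neutral, positive, negative, both.
TABLE_6_2 = [
    [123, 14, 24, 0],
    [16, 73, 5, 2],
    [14, 2, 167, 1],
    [0, 3, 0, 3],
]


def percent_agreement(table):
    """Fraction of items on which both annotators chose the same tag."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return agree / total


def cohen_kappa(table):
    """Cohen's kappa (assumed here to be the statistic reported in the text)."""
    total = sum(sum(row) for row in table)
    p_o = percent_agreement(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Chance agreement: product of the two annotators' marginal proportions,
    # summed over the four tags.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(f"agreement = {percent_agreement(TABLE_6_2):.2f}")
    print(f"kappa     = {cohen_kappa(TABLE_6_2):.2f}")
```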

As part of the annotation scheme, annotators were asked to judge how certain they were in their polarity tags. For 18% of the subjective expressions, at least one annotator used the uncertain tag when marking polarity. If these cases are considered borderline and excluded from the study, percent agreement increases to 90% and Kappa rises to 0.84. Table 6.3 shows the revised contingency table with the uncertain cases removed. This shows that annotator agreement is especially high when both annotators are certain, and that annotators are certain for over 80% of their tags.
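As a check on these figures, the arithmetic can be worked through from the totals of Tables 6.2 and 6.3. The derivation assumes Cohen's kappa, with observed agreement p_o and chance agreement p_e computed from the marginal totals of Table 6.3; the symbols p_o and p_e are introduced here for illustration only.

```latex
\begin{align*}
\text{borderline share} &= \frac{447 - 367}{447} = \frac{80}{447} \approx 0.18 \\
p_o &= \frac{113 + 59 + 156 + 2}{367} = \frac{330}{367} \approx 0.899 \\
p_e &= \frac{128 \cdot 127 + 71 \cdot 70 + 164 \cdot 167 + 4 \cdot 3}{367^2}
     = \frac{48626}{134689} \approx 0.361 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} \approx \frac{0.899 - 0.361}{1 - 0.361} \approx 0.84
\end{align*}
```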

Note that all annotations are included in the experiments.

Table 6.3: Contingency table for contextual polarity agreement with borderline cases removed

           Neutral   Positive   Negative   Both   Total
Neutral        113          7          8      0     128
Positive         9         59          3      0      71
Negative         5          2        156      1     164
Both             0          2          0      2       4
Total          127         70        167      3     367

Table 6.4: Distribution of contextual polarity tags

Neutral   Positive   Negative   Both    Total
  9,057      3,311      7,294    299   19,961
  45.4%      16.6%      36.5%   1.5%     100%

6.2.3 MPQA Corpus version 1.2

In total, all 19,962 subjective expressions in the 535 documents (11,112 sentences) of the MPQA Corpus have been annotated with their contextual polarity as described above. Table 6.4 gives the distribution of the tags. This table shows that a small majority of subjective expressions (54.6%) are expressing a positive, negative, or both (positive and negative) sentiment. I refer to these expressions as polar in context. Close to half of the subjective expressions are neutral: they express some type of subjectivity other than sentiment. This suggests that, although sentiment is a major type of subjectivity, there are other prominent types of subjectivity that may be important to distinguish for applications seeking to exploit subjectivity analysis.
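The percentages in Table 6.4 and the share of expressions that are polar in context follow directly from the tag counts. The sketch below recomputes them against the Table 6.4 total of 19,961; the names TAG_COUNTS, POLAR_TAGS, tag_shares, and polar_share are illustrative and do not come from the corpus distribution.

```python
from collections import Counter

# Tag counts copied from Table 6.4.
TAG_COUNTS = Counter(neutral=9057, positive=3311, negative=7294, both=299)

# Tags counted as "polar in context" in the text: positive, negative, or both.
POLAR_TAGS = {"positive", "negative", "both"}


def tag_shares(counts):
    """Share of each tag among all annotated subjective expressions."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def polar_share(counts):
    """Share of subjective expressions that are polar in context."""
    total = sum(counts.values())
    polar = sum(n for tag, n in counts.items() if tag in POLAR_TAGS)
    return polar / total


if __name__ == "__main__":
    for tag, share in tag_shares(TAG_COUNTS).items():
        print(f"{tag:<9}{share:6.1%}")
    print(f"polar    {polar_share(TAG_COUNTS):6.1%}")
```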

As many NLP applications operate at the sentence level, one important issue to consider is the distribution of sentences with respect to the subjective expressions they contain. In the 11,112 sentences in the MPQA Corpus, 28% contain no subjective expressions, 24% contain only one, and 48% contain two or more. Of the 5,304 sentences containing two or more subjective expressions, 17% contain mixtures of positive and negative expressions, and 61% contain mixtures of polar (positive/negative/both) and neutral subjective expressions.
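The sentence-level breakdown above could be produced by a categorization along the following lines. This is a minimal sketch under stated assumptions, not the procedure used for the reported figures: each sentence is represented simply as a list of the polarity tags of its subjective expressions, a "mixture of positive and negative" is read as at least one positive-tagged and one negative-tagged expression (how a both tag should count is not specified in the text), and the function name and toy input are hypothetical.

```python
from collections import Counter

# Tags counted as polar in the text: positive, negative, or both.
POLAR_TAGS = {"positive", "negative", "both"}


def sentence_profile(tags):
    """Summarize one sentence from the tags of its subjective expressions."""
    n = len(tags)
    tag_set = set(tags)
    return {
        # Number of subjective expressions, bucketed as in the text.
        "count_bucket": "none" if n == 0 else "one" if n == 1 else "two_or_more",
        # Assumption: requires one positive and one negative tag; a lone
        # "both" tag is not treated as a mixture here.
        "mixed_pos_neg": {"positive", "negative"} <= tag_set,
        # At least one neutral and at least one polar expression.
        "mixed_polar_neutral": "neutral" in tag_set and bool(tag_set & POLAR_TAGS),
    }


# Hypothetical toy input (not corpus data): one list of tags per sentence.
sentences = [
    [],
    ["negative"],
    ["positive", "negative"],
    ["neutral", "negative", "negative"],
]

buckets = Counter(sentence_profile(s)["count_bucket"] for s in sentences)

if __name__ == "__main__":
    print(dict(buckets))
```

Percentages such as those quoted above would then be obtained by dividing each bucket count by the number of sentences, and the two mixture counts by the number of sentences in the two_or_more bucket.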