Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Fernando Batista, Ricardo Ribeiro

Laboratório de Sistemas de Língua Falada, INESC-ID Lisboa, R. Alves Redol, 9, 1000-029 Lisboa, Portugal

ISCTE - Instituto Universitário de Lisboa, Av. Forças Armadas, 1649-026 Lisboa, Portugal

{Fernando.Batista, Ricardo.Ribeiro}@inesc-id.pt

Introduction

By providing revolutionary means for people to communicate and interact, social networks are now part of the lives of millions of people all over the world. Each social network targets different audiences, offering a range of unique services that people find useful in the course of their lives. Twitter, in particular, provides a simple way of writing short messages, which people can use to express almost anything.

Twitter can be accessed in numerous ways, ranging from computers to mobile phones and other mobile devices. This is particularly important because accessing and producing content becomes a trivial task, and Twitter therefore assumes an important part in people's lives. One relevant aspect that differentiates Twitter from other means of communication is its ability to rapidly propagate such content and make it available to specific communities, selected based on their interests. Twitter data is a powerful source of information for assessing and predicting large-scale facts. For example, [11] capture large-scale trends on consumer confidence and political opinion in tweets, strengthening the potential of such data as a supplement for traditional polling. Concerning stock markets, [5] found that Twitter data can be used to significantly improve the accuracy of stock market predictions.

The huge amount of data constantly being produced makes it impracticable to process such content manually. For that reason, it becomes urgent to apply automatic processing strategies that can handle, and take advantage of, such an amount of data. However, processing Twitter data is anything but an easy task, not only because of specific phenomena that can be found in the data, but also because it may require processing a continuous stream of data, and possibly storing some of the data so that it can be accessed in the future. This paper addresses two well-known Natural Language Processing (NLP) tasks, sentiment analysis and topic classification, which have been performed over Spanish Twitter data, in the context of the "TASS – workshop on Sentiment Analysis" [18], a satellite event of the SEPLN 2012 conference. The remainder of this paper presents an overview of the challenge proposed within the workshop, reports on our experience and presents some conclusions about the achieved results.

Related work

Text Classification consists of assigning a class from a set of possible predefined classes to a given text. Sentiment Analysis and Topic Detection are two Text Classification tasks, commonly applied to both written and speech corpora, that may be tackled using similar strategies, despite each having its own specificities. Sentiment Analysis is often referred to by other names (e.g. sentiment mining, opinion mining) and assigns an opinion, sentiment or emotion to a given portion of text. Topic Detection assigns topics from a set of possible predefined topics to a given portion of text.

Sentiment analysis can be performed at different complexity levels. The most basic one consists simply of deciding whether a portion of text conveys a positive or a negative sentiment. More complex levels include ranking the attitude into a set of more than two classes or, even further, determining different complex attitude types, as well as finding the source and the target of such attitudes.

Dealing with the huge amounts of data available on Twitter demands clever strategies. One interesting idea, explored by [1], consists of using emoticons, abundantly available in tweets, to automatically label the data and then using such data to train machine learning algorithms. The paper shows that machine learning algorithms trained with this approach achieve above 80% accuracy when classifying messages as positive or negative. A similar idea was previously explored by [3] for movie reviews, using star ratings as polarity signals in their training data. This latter paper analyses the performance of different classifiers on movie reviews, and presents a number of techniques that were used by many authors and served as a baseline for subsequent studies. As an example, they adapted a technique, introduced by [14], for modeling the contextual effect of negation, adding the prefix NOT_ to every word between a “negation word” and the first punctuation mark following the negation word.
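The negation technique described above can be sketched in a few lines of Python. This is only an illustration: the list of Spanish negation words, the tokenization rule, and the function name `mark_negation` are assumptions made for the example and are not taken from the cited papers.

```python
import re

# Assumed, illustrative list of Spanish negation words (not from the cited work).
NEGATION_WORDS = {"no", "nunca", "jamás", "tampoco", "ni"}
PUNCTUATION = set(".,;:!?")

def mark_negation(text, negation_words=NEGATION_WORDS):
    """Prefix NOT_ to every word between a negation word and the next punctuation mark."""
    # Tokens are runs of word characters or single punctuation marks.
    tokens = re.findall(r"\w+|[.,;:!?]", text.lower())
    marked = []
    in_scope = False
    for token in tokens:
        if token in PUNCTUATION:
            in_scope = False              # punctuation closes the negation scope
            marked.append(token)
        elif token in negation_words:
            in_scope = True               # a negation word opens (or keeps) the scope
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked
```

For instance, `mark_negation("no me gusta este plan, pero la idea es buena")` marks `me`, `gusta`, `este` and `plan`, and leaves the words after the comma untouched.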

Common approaches to sentiment analysis involve the use of sentiment lexicons of positive and negative words or expressions. The General Inquirer [13] was one of the first sentiment lexicons freely available for research, and includes several categories of words, such as positive vs. negative and strong vs. weak. Two other examples include [10], an opinion lexicon containing about 7000 words, and the MPQA Subjectivity Cues Lexicon [16], where words are annotated not only as positive vs. negative, but also with intensity.
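A minimal sketch of how such a lexicon may be applied is given below. The four-word lexicon, the signed intensity values, and the function names are toy assumptions for illustration; they do not reproduce the content or format of any of the lexicons mentioned above.

```python
# Toy lexicon for illustration only: word -> signed intensity (assumed values).
TOY_LEXICON = {"bueno": 1, "excelente": 2, "malo": -1, "terrible": -2}

def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Sum the signed intensities of the tokens found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)

def lexicon_polarity(tokens, lexicon=TOY_LEXICON):
    """Return a coarse polarity label based on the sign of the lexicon score."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Summing intensities is the simplest possible aggregation. In practice, lexicon hits are often used as features for a trained classifier rather than as a decision rule on their own.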

Learning polarity lexicons is another research approach that can be especially useful for dealing with large corpora. The process starts with a seed set of words, and the idea is to incrementally find words or phrases with similar polarity, in a semi-supervised fashion [12]. The final lexicon contains many more words, possibly capturing domain-specific information, and is therefore likely to be more robust.
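The general idea of growing a lexicon from seed words can be sketched as follows. This is a deliberately crude co-occurrence bootstrapping, not the method of [12]: the function name, the `min_margin` threshold, and the number of rounds are all assumptions chosen for the example.

```python
from collections import Counter

def expand_lexicon(sentences, pos_seeds, neg_seeds, min_margin=2, rounds=3):
    """Grow positive/negative word sets from seeds using sentence co-occurrence.

    sentences: iterable of token lists. A new word joins a set when it co-occurs
    with that set at least `min_margin` times more often than with the other set.
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        pos_hits, neg_hits = Counter(), Counter()
        for tokens in sentences:
            words = set(tokens)
            n_pos, n_neg = len(words & pos), len(words & neg)
            for word in words - pos - neg:      # only score unlabeled words
                pos_hits[word] += n_pos
                neg_hits[word] += n_neg
        new_pos = {w for w in pos_hits if pos_hits[w] - neg_hits[w] >= min_margin}
        new_neg = {w for w in neg_hits if neg_hits[w] - pos_hits[w] >= min_margin}
        if not new_pos and not new_neg:
            break                               # nothing left to learn
        pos |= new_pos
        neg |= new_neg
    return pos, neg
```

A realistic implementation would also filter stop words and use an association measure that accounts for word frequency, since raw counts favour frequent words.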

Work on Topic Detection has its origins in 1996 with the Topic Detection and Tracking (TDT) initiative sponsored by the US government. The main motivation for this initiative was the processing of the large amounts of information coming from newswire and broadcast news. The main goal was to organize the information in terms of events and the stories that discussed them. The concept of topic was defined as the set of stories about a particular event. Five tasks were defined: story segmentation, first story detection, cluster detection, tracking, and story link detection. The current impact of social media and the amount of information it generates have led to a state of affairs similar to the one that fostered the pioneering work on TDT. Social media is now the context for research tasks like topic (cluster) detection [9, 6] or emerging topic (first story) detection [15].

In that sense, closer to our work are the approaches described by [4] and [9], where tweets are classified into previously defined sets of generic topics. In the former, a conventional bag-of-words (BOW) strategy is compared to a specific set of features (authorship and the presence of several types of Twitter-related phenomena) using a Naive Bayes (NB) classifier to classify tweets into the following generic categories: News, Events, Opinions, Deals, and Private Messages. Findings show that authorship is a quite important feature. In the latter, two strategies, BOW and network-based classification, are explored to classify clusters of tweets into 18 general categories, like Sports, Politics, or Technology. In the BOW approach, the clusters of tweets are represented by TF-IDF vectors, and NB, NB Multinomial, and Support Vector Machines (SVM) classifiers are used to perform classification. The network-based classification approach is based on the links between users, and C5.0 decision tree, k-Nearest Neighbor, SVM, and Logistic Regression classifiers were used. Network-based classification was shown to achieve better performance but, being link-based, it cannot be used in all situations.
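To make the bag-of-words strategy concrete, the sketch below implements a small multinomial Naive Bayes classifier over token counts. It is an illustration under simplifying assumptions: it uses raw counts instead of TF-IDF weights, add-one smoothing by default, and a hypothetical class name (`MultinomialNB`); it is not the implementation used in [4] or [9].

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over bags of words, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant (assumed default)

    def fit(self, documents, labels):
        """documents: list of token lists; labels: one class label per document."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior for the given tokens."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)        # log prior
            for token in tokens:
                if token in self.vocabulary:                # ignore unseen words
                    score += math.log((counts[token] + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.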

Data

The data provided in the context of the TASS contest [18] consists of tweets written in Spanish by 200 well-known personalities. The training data is an XML file containing about 7200 tweets, each one labeled with sentiment polarity and the corresponding topics. The goal is to provide automatic sentiment and topic classification for the test data, which is also available in XML and consists of about 60800 unlabeled tweets.

Each tweet in the labeled data is annotated in terms of polarity, using one of six possible values: NONE, N (negative), N+ (very negative), NEU (both negative and positive), P (positive), P+ (very positive). Moreover, each annotation is also marked as AGREEMENT or DISAGREEMENT, indicating whether all the annotators performed the annotation coherently. For topic detection, each tweet was annotated with one or more topics, from a list of 10 possible topics: política (politics), otros (others), entretenimiento (entertainment), economía (economics), música (music), fútbol (football), cine (movies), tecnología (technology), deportes (sports), and literatura (literature).
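Because a tweet may carry more than one topic, one common way to prepare such annotations for learning is to build a separate yes/no target per topic. The sketch below illustrates that transformation only. The function name and the input layout (one set of topics per tweet) are assumptions, and nothing here reflects the actual XML format of the TASS files.

```python
# The 10 topic labels listed in the annotation scheme above.
TOPICS = ["política", "otros", "entretenimiento", "economía", "música",
          "fútbol", "cine", "tecnología", "deportes", "literatura"]

def binary_targets(tweet_topics, topics=TOPICS):
    """Map each topic to a list of 0/1 targets, one entry per tweet.

    tweet_topics: list with one collection of topic labels per tweet.
    """
    return {topic: [int(topic in assigned) for assigned in tweet_topics]
            for topic in topics}
```

Each resulting list can then be used to train an independent binary classifier for the corresponding topic.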

It is also important to mention that, besides the tweets, an extra XML file is also available, containing information about each one of the users that authored at least one of the tweets in the data. In particular, the information includes the type of user, assuming one of three possible values: periodista (journalist), famoso (famous person), and politico (politician). This may provide valuable information for these tasks.

Approaches and Results

Eight participants took part in the sentiment analysis task, but only six of them also performed the topic detection task. The most relevant strategies described by the participants cover: treatment of emoticons, processing of negation, spelling error correction, part-of-speech tagging, use of named entities and adjectives, syntactic information, word sense disambiguation, polarity lexicons (for sentiment analysis), Twitter-LDA, based on Latent Dirichlet Allocation (for topic detection), and information retrieval motivated strategies based on language divergence. Concerning machine learning, the participants reported the use of Logistic Regression, Multinomial Naive Bayes, and Support Vector Machines (SVM).

Rank   Sentiment Analysis   Topic Detection
1      65.3%                65.4%
2      63.4%                60.1%
3      57.0%                45.3%

Table 1 – Top 3 results (accuracy)

Table 1 presents the best results achieved for each one of the tasks. Our team achieved the best result for Topic Detection and the second place for Sentiment Analysis. The approach of the team that achieved the best results for Sentiment Analysis involved treating emoticons, negation, and spelling errors, as well as POS-tagging, syntactic analysis, and the use of sentiment lexicons. Our approach to this task used not only text features, but also the tweet's author name and the user type, both available in the distributed data. Apart from the provided data, our experiments also used Spanish Sentiment Lexicons (http://lit.csci.unt.edu/), a resource created at the University of North Texas [17]. From this resource, only the most robust part was used, known as fullStrengthLexicon, which contains 1346 words automatically labeled with sentiment polarity. For Topic Detection, our approach was the most successful and involved the use of text features (after removing punctuation marks) and the tweet's author name.
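The combination of text features with author information can be illustrated with the following sketch. The exact feature encoding is not described in this paper, so the feature naming scheme, the function name `extract_features`, and the lowercasing step are assumptions made for the example.

```python
import string

# Characters to strip: ASCII punctuation plus the Spanish inverted marks.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation + "¿¡")

def extract_features(text, author, user_type=None):
    """Build a sparse binary feature dict for one tweet.

    text: tweet text; author: author name; user_type: optional user category
    (e.g. periodista, famoso, politico).
    """
    tokens = text.lower().translate(_PUNCTUATION_TABLE).split()
    features = {"word=" + token: 1 for token in tokens}
    features["author=" + author] = 1
    if user_type is not None:
        features["user_type=" + user_type] = 1
    return features
```

Note that stripping all punctuation also removes the `@` and `#` prefixes of mentions and hashtags and breaks URLs apart. Whether that is desirable depends on the task, and a different tokenization may be preferable.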

Bibliography

1. A. Go, R. Bhayani, and L. Huang, “Twitter Sentiment Classification using Distant Supervision,” tech. rep., Stanford University, 2009.

2. A. L. Berger, S. A. D. Pietra, and V. J. D. Pietra, “A maximum entropy approach to natural language processing,” Computational Linguistics, vol. 22, no. 1, pp. 39–71, 1996.

3. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” in Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.

4. B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, “Short Text Classification in Twitter to Improve Information Filtering,” in SIGIR’10: Proceedings of the 33rd international ACM SIGIR conference on Research and Development in Information Retrieval, pp. 841–842, ACM, 2010.

5. J. Bollen, H. Mao, and X.-J. Zeng, “Twitter mood predicts the stock market,” CoRR, abs/1010.3003, 2010.

6. C. Lin, Y. He, R. Everson, and S. Rüger, “Weakly Supervised Joint Sentiment-Topic Detection from Text,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

7. F. Batista and R. Ribeiro, “Sentiment Analysis and Topic Classification based on Binary Maximum Entropy Classifiers,” Procesamiento de Lenguaje Natural, vol. 50, pp. 77–84, ISSN: 1989-7553, 2013.

8. H. Daumé III, “Notes on CG and LM-BFGS optimization of logistic regression,” http://hal3.name/megam/, 2004.

9. K. Lee, D. Palsetia, R. Narayanan, M. Patwary, A. Agrawal, and A. Choudhary, “Twitter trending topic classification,” in International Conference on Data Mining Workshops (ICDMW), pp. 251–258, IEEE, 2011.

10. M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pp. 168–177, ACM, 2004.

11. B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, “From tweets to polls: Linking text sentiment to public opinion time series,” in Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (ICWSM’10), 2010.

12. P. D. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424, ACL, 2002.

13. P. Stone, D. Dunphy, M. Smith, and D. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis. MIT Press, 1966.

14. S. Das and M. Chen, “Yahoo! for Amazon: Extracting market sentiment from stock message boards,” in Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), 2001.

15. S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani, “Emerging Topic Detection using Dictionary Learning,” in CIKM’11: Proceedings of the 20th ACM international conference on Information and Knowledge Management, pp. 745–754, ACM, 2011.

16. T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 347–354, ACL, 2005.

17. V. Perez-Rosas, C. Banea, and R. Mihalcea, “Learning sentiment lexicons in Spanish,” in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May 2012.

18. J. Villena-Román, J. García-Morera, C. Moreno Garcia, L. Ferrer-Ureña, S. Lana-Serrano, J. C. González-Cristobal, A. Westerski, E. Martínez-Cámara, M. A. García-Cumbreras, M. T. Martín-Valdivia, and L. A. Ureña-López, “TASS - Workshop on Sentiment Analysis at SEPLN,” Procesamiento de Lenguaje Natural, vol. 50, 2012.