• No results found

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

N/A
N/A
Protected

Academic year: 2021

Share "Sentiment Analysis and Topic Classification: Case study over Spanish tweets"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Sentiment  Analysis  and  Topic  Classification:  

Case  study  over  Spanish  tweets  

Fernando  Batista,  Ricardo  Ribeiro  

Laboratório  de  Sistemas  de  Língua  Falada,  INESC-­‐ID  Lisboa   R.  Alves  Redol,  9,  1000-­‐029  Lisboa,  Portugal  

ISCTE  -­‐-­‐  Instituto  Universitário  de  Lisboa   Av.  Forças  Armadas,  1649-­‐026  Lisboa,  Portugal  

{Fernando.Batista, Ricardo.Ribeiro}@inesc-id.pt

Introduction  

By   providing   revolutionary   means   for   people   to   communicate   and   interact,   Social  Networks  take  now  part  in  the  life  of  a  large  of  millions  of  people  all  over   the   world.   Each   social   network   targets   different   audiences,   offering   a   range   of   unique   services   that   people   find   useful   in   the   course   of   their   lives.   In   what   concerns   Twitter,   it   provides   a   simple   way   of   writing   small   messages,   which   people  can  use  to  express  everything.  

Twitter   can   be   accessed   in   numerous   ways,   ranging   from   computers   to   mobile   phones  or  other  mobile  devices.  That  is  particularly  important  because  accessing   and  producing  content  becomes  a  trivial  task,  therefore  assuming  an  important   part  of  people's  lives.  One  relevant  aspect  that  differentiates  Twitter  from  other   communication  means  is  its  ability  to  rapidly  propagate  such  content  and  make   it   available   to   specific   communities,   selected   based   on   their   interests.   Twitter   data  is  a  powerful  source  of  information  for  assessing  and  predicting  large-­‐scale   facts.  For  example,  [11]  capture  large-­‐scale  trends  on  consumer  confidence  and   political   opinion   in   tweets,   strengthening   the   potential   of   such   data   as   a   supplement   for   traditional   polling.   In   what   concerns   stock   markets,   [5]   found   that  Twitter  data  can  be  used  to  significantly  improve  stock  market  predictions   accuracy.    

The  huge  amount  of  data,  constantly  being  produced,  makes  it  impracticable  to   manually   process   such   content.   For   that   reason,   it   becomes   urgent   to   apply   automatic   processing   strategies   that   can   handle,   and   take   advantage,   of   such   amount   of   data.   However,   processing   Twitter   is   all   but   an   easy   task,   not   only   because  of  specific  phenomena  that  can  be  found  in  the  data,  but  also  because  it   may  require  to  process  a  continuous  stream  of  data,  and  possibly  to  store  some   of  the  data  in  a  way  that  it  can  be  accessed  in  the  future.    This  paper  addresses   two   well-­‐known   Natural   Language   Processing   (NLP)   tasks,   sentiment   analysis   and  topic  classification,  which  have  been  performed  over  Spanish  Twitter  data,   in  the  context  of  the  "TASS  –  workshop  on  Sentiment  Analysis"  [18],  a  satellite   event   of   the   SEPLN   2012   conference.   The   remainder   of   this   paper   presents   an   overview   of   the   challenge   proposed   within   the   workshop,   reports   on   our   experience  and  presents  some  conclusions  about  the  achieved  results.  

(2)

Related  work  

Text  Classification  consists  of  assigning  a  class  from  a  set  of  possible  predefined   classes   to   a   given   text.   Sentiment   Analysis   and   Topic   Detection   are   two   Text   Classification  tasks,  commonly  applied  both  to  written  and  speech  corpora,  that   may   be   tackled   using   similar   strategies,   despite   being   characterized   by   their   specificities.  Sentiment  Analysis  is  often  referred  by  other  names  (e.g.  sentiment   mining,  opinion  mining,  etc.)  and  assigns  an  opinion,  sentiment  or  emotion  to  a   given   portion   of   text.   Topic   detection   assigns   topics   from   a   set   of   possible   predefined  topics  to  a  given  portion  of  text.  

Sentiment   analysis   can   be   performed   at   different   complexity   levels,   where   the   most   basic   one   consists   just   on   deciding   whether   a   portion   of   text   contains   a   positive  or  a  negative  sentiment.  However,  it  can  be  performed  at  more  complex   levels,   like   ranking   the   attitude   into   a   set   of   more   than   two   classes   or,   even   further,  it  can  be  performed  in  a  way  that  different  complex  attitude  types  can  be   determined,  as  well  as  finding  the  source  and  the  target  of  such  attitudes.  

Dealing   with   the   huge   amounts   of   data   available   on   Twitter   demand   clever   strategies.   One   interesting   idea,   explored   by   [1]   consists   of   using   emoticons,   abundantly  available  on  tweets,  to  automatically  label  the  data  and  then  use  such   data   to   train   machine   learning   algorithms.   The   paper   shows   that   machine   learning   algorithms   trained   with   such   approach   achieve   above   80%   accuracy,   when  classifying  messages  as  positive  or  negative.  A  similar  idea  was  previously   explored   by   [3]   for   movie   reviews,   by   using   star   ratings   as   polarity   signals   in   their   training   data.   This   latter   paper   analyses   the   performance   of   different   classifiers   on   movie   reviews,   and   presents   a   number   of   techniques   that   were   used   by   many   authors   and   served   as   baseline   for   posterior   studies.   As   an   example,   they   have   adapted   a   technique,   introduced   by   [14],   for   modeling   the   contextual   effect   of   negation,   adding   the   prefix   NOT_   to   every   word   between   a  

“negation  word”  and  the  first  punctuation  mark  following  the  negation  word.  

Common  approaches  to  sentiment  analysis  involve  the  use  of  sentiment  lexicons   of   positive   and   negative   words   or   expressions.   The   General   Inquirer   [13]   was   one  of  the  first  available  sentiment  lexicons  freely  available  for  research,  which   includes   several   categories   of   words,   such   as:   positive   vs.   negative,   strong   vs.  

week.   Two   other   examples   include   [10],   an   opinion   lexicon   containing   about   7000   words,   and   the   MPQA   Subjectivity   Cues   Lexicon   [16],   where   words   are   annotated  not  only  as  positive  vs.  negative,  but  also  with  intensity.  

Learning   polarity   lexicons   is   another   research   approach   that   can   be   especially   useful  for  dealing  with  large  corpora.  The  process  starts  with  a  seed  set  of  words   and   the   idea   is   to   increasingly   find   words   or   phrases   with   similar   polarity,   in   semi-­‐supervised   fashion   [12].   The   final   lexicon   contains   much   more   words,   possibly  learning  domain-­‐specific  information,  and  therefore  is  more  prone  to  be   robust.    

Work   on   Topic   Detection   has   its   origins   in   1996   with   the   Topic   Detection   and   Tracking  (TDT)  initiative  sponsored  by  the  US  government.  The  main  motivation   for  this  initiative  was  the  processing  of  the  large  amounts  of  information  coming  

(3)

from   newswire   and   broadcast   news.   The   main   goal   was   to   organize   the   information  in  terms  of  events  and  stories  that  discussed  them.  The  concept  of   topic  was  defined  as  the  set  of  stories  about  a  particular  event.  Five  tasks  were   defined:  story  segmentation,  first  story  detection,  cluster  detection,  tracking,  and   story  link  detection.  The  current  impact  and  the  amount  information  generated   by   social   media   led   to   a   state   of   affairs   similar   to   the   one   that   fostered   the   pioneer   work   on   TDT.   Social   media   is   now   the   context   for   research   tasks   like   topic  (cluster)  detection  [9,  6]  or  emerging  topic  (first  story)  detection  [15].    

In   that   sense,   closer   to   our   work   are   the   approaches   described   by   [4]   and   [9],   where  tweets  are  classified  into  previously  defined  sets  of  generic  topics.  In  the   former,   a   conventional   bag-­‐of-­‐words   (BOW)   strategy   is   compared   to   a   specific   set  of  features  (authorship  and  the  presence  of  several  types  of  twitter-­‐related   phenomena)   using   a   Naive   Bayes   (NB)   classifier   to   classify   tweets   into   the   following   generic   categories:   News,   Events,   Opinions,   Deals,   and   Private   Messages.   Findings   show   that   authorship   is   a   quite   important   feature.   In   the   latter,   two   strategies,   BOW   and   network-­‐based   classification,   are   explored   to   classify   clusters   of   tweets   into   18   general   categories,   like   Sports,   Politics,   or   Technology.  In  the  BOW  approach,  the  clusters  of  tweets  are  represented  by  TF-­‐

IDF   vectors   and   NB,   NB   Multinomial,   and   Support   Vector   Machines   (SVM)   classifiers   are   used   to   perform   classification.   The   network-­‐based   classification   approach  is  based  on  the  links  between  users  and  C5.0  decision  tree,  k-­‐Nearest   Neighbor,   SVM,   and   Logistic   Regression   classifiers   were   used.   Network-­‐based   classification  was  shown  to  achieve  a  better  performance,  but  being  link-­‐based,  it   cannot  be  used  for  all  situations.  

Data  

The   data   provided   in   the   context   of   the   TASS   contest   [18]   consists   of   Spanish   tweets   written   in   Spanish   by   200   well-­‐known   personalities   and.   The   training   data   is   an   XML   file   containing   about   7200   tweets,   each   one   labeled   with   sentiment  polarity  and  the  corresponding  topics.  The  goal  consists  in  providing   automatic   sentiment   and   topic   classification   for   the   test   data,   which   is   also   available  in  XML  and  consists  of  about  60800  unlabeled  tweets.    

Each  tweet  in  the  labeled  data  is  annotated  in  terms  of  polarity,  using  one  of  six   possible  values:  NONE,  N  (negative),  N+  (very  negative),  NEU  (both  negative  and   positive),   P   (positive),   P+   (very   positive).   Moreover,   each   annotation   is   also   marked   as   AGREEMENT   or   DISAGREEMENT,   indicating   whether   all   the   annotators   performed   the   annotation   coherently.   In   what   concerns   topic   detection,  each  tweet  was  annotated  with  one  or  more  topics,  from  a  list  of  10   possible   topics:   política   (politics),   otros   (others),   entretenimiento   (entertainment),   economía   (economics),   música   (music),   fútbol   (football),   cine   (movies),  tecnología  (technology),  deportes  (sports),  and  literatura  (literature).    

It  is  also  important  to  mention  that,  besides  the  tweets,  an  extra  XML  file  is  also   available,   containing   information   about   each   one   of   the   users   that   authored   at   least   one   of   the   tweets   in   the   data.   In   particular,   the   information   includes   the   type   of   user,   assuming   one   of   three   possible   values   –   periodista   (journalist),  

(4)

famoso  (famous  person),  and  politico  (politician)  –  which  may  provide  valuable   information  for  these  tasks.  

 Approaches  and  Results  

The   number   of   participants   was   eight   for   the   sentiment   analysis   task,   but   only   six  of  them  also  performed  the  topic  detection  task.  The  most  relevant  strategies   described  by  the  participants  cover:  treatment  of  emoticons,  processing  negation,   spelling   error   correction,   part-­‐of-­‐speech   tagging,   use   of   named   entities   and   adjectives,   syntactic   information,   word   sense   disambiguation,   polarity   lexicons   (for   sentiment   analysis),   Twitter-­‐LDA   -­‐   Latent   Dirichelet   Allocation   (for   topic   detection),   information   retrieval   motivated   strategies   based   on   language   divergence.  In  what  concerns  machine  learning,  the  participants  have  reported   the   use   of   Logistic   Regression,   Multinomial   Naive   Bayes,   and   SVM   (Support   Vector  Machines).  

Sentiment  Analysis   Topic  Detection   65.3%   65.4%  

63.4%   60.1%  

57.0%   45.3%  

Table  1  –  Top  3  results  (accuracy)  

Table   1   presents   the   best   results   achieved   for   each   one   of   the   tasks.   Our   team   achieved  the  best  result  for  Topic  Detection  and  the  second  place  for  Sentiment   Analysis.   The   team   which   achieved   the   best   results   for   Sentiment   Analysis,   involved   treating   emoticons,   negation,   spelling   errors,   POS-­‐tagging,   syntactic   analysis,   and   the   used   of   sentiment   lexicons.   Our   results   for   this   task   involved   not  only  text  features,  but  also  the  tweet's  author  name  and  the  user  type,  also   available  in  the  distributed  data.  Apart  from  the  provided  data,  our  experiments   also   used   Spanish   Sentiment   Lexicons   (http://lit.csci.unt.edu/),   a   resource   created  at  the  University  of  North  Texas  [17].  From  this  resource,  only  the  most   robust  part  was  used,  known  as  fullStrengthLexicon,  and  containing  1346  words   automatically   labeled   with   sentiment   polarity.     For   Topic   Detection,   our   approach   was   the   most   successful   and   involved   the   use   of   text   features   (after   removing  punctuation  marks)  and  the  tweet's  author  name.  

Bibliography  

1. A.   Go,   R.   Bhayani,   and   L.   Huang,   “Twitter   Sentiment   Classification   using   Distant  Supervision,”  tech.  rep.,  Stanford  University,  2009.  

2. A.  L.  Berger,  S.  A.  D.  Pietra,  and  V.  J.  D.  Pietra.  A  maximum  entropy  approach   to  natural  language  processing.  Computational  Linguistics,  22(1):39–71,  1996.  

3. B.   Pang,   L.   Lee,   and   S.   Vaithyanathan,   “Thumbs   up?   sentiment   classification   using  machine  learning  techniques,”  in  Proceedings  of  Empirical  Methods  in   Natural  Language  Processing  (EMNLP),  pp.  79–86,  2002.  

4. B.  Sriram,  D.  Fuhry,  E.  Demir,  H.  Ferhatosmanoglu,  and  M.  Demirbas,  “Short   Text  Classification  in  Twitter  to  Improve  Information  Filtering,”  in  SIGIR’10:  

(5)

Proceedings  of  the  33rd  international  ACM  SIGIR  conference  on  Research  and   Development  in  Information  Retrieval,  pp.  841–842,  ACM,  2010.  

5. Bollen,  Johan,  Huina  Mao,  and  Xiao-­‐Jun  Zeng.  2010.  Twitter  mood  predicts  the   stock  market.  CoRR,  abs/1010.3003.  

6. C.  Lin,  Y.  He,  R.  Everson,  and  S.  Rüger,  “Weakly  Supervised  Joint  Sentiment-­‐

Topic   Detection   from   Text,”   IEEE   Transactions   On   Knowledge   And   Data   Engineering,  vol.  24,  no.  6,  pp.  1134–1145,  2012.  

7. F.   Batista,   R.   Ribeiro,   Sentiment   Analysis   and   Topic   Classification   based   on   Binary  Maximum  Entropy  Classifiers,  Procesamiento  de  Lenguaje  Natural,  vol.  

50,  pp.  77–84.  ISSN:  1989-­‐7553,  2013.  

8. H.  Daumé  III.  Notes  on  CG  and  LM-­‐BFGS  optimization  of  logistic  regression.  

http://hal3.name/megam/,  2004.  

9. K.  Lee,  D.  Palsetia,  R.  Narayanan,  M.  Patwary,  A.  Agrawal,  and  A.  Choudhary,  

“Twitter   trending   topic   classification,”   in   International   Conference   on   Data   Mining  Workshops  (ICDMW),  pp.  251–258,  IEEE,  2011.  

10. M.   Hu   and   B.   Liu,   “Mining   and   summarizing   customer   reviews,”   in   Proceedings   of   the   10th   ACM   SIGKDD   International   Conference   on   Knowledge  Discovery  and  Data  Mining,  KDD  ’04,  pp.  168–177,  ACM,  2004.  

11. O’Connor,   B.,   R.   Balasubramanyan,   B.   R.   Routledge,   and   N.   A.   Smith.   2010.  

From  tweets  to  polls:  Linking  text  sentiment  to  public  opinion  time  series.  In   Proceedings   of   the   Fourth   International   AAAI   Conference   on   Weblogs   and   Social  Media  (ICWSM’10).  

12. P.  D.  Turney,  “Thumbs  Up  or  Thumbs  Down?  Semantic  Orientation  Applied  to   Unsupervised  Clas-­‐  sification  of  Reviews,”  in  Proceedings  of  the  40th  Annual   Meeting   on   Association   for   Computational   Linguistics,   pp.   417–424,   ACL,   2002.  

13. P.   Stone,   D.   Dunphy,   M.   Smith,   and   D.   Ogilvie,   The   General   Inquirer:   A   Computer  Approach  to  Content  Analysis.  MIT  Press,  1966.  

14. S.  Das  and  M.  Chen,  “Yahoo!  for  Amazon:  Extracting  market  sentiment  from   stock  message  boards,”  in  Proceedings  of  the  Asia  Pacific  Finance  Association   Annual  Conference  (APFA),  2001.  

15. S.  P.  Kasiviswanathan,  P.  Melville,  A.  Banerjee,  and  V.  Sindhwani,  “Emerging   Topic   Detection   using   Dictionary   Learning,”   in   CIKM’11:   Proceedings   of   the   20th   ACM   international   conference   on   Information   and   Knowledge   Management,  pp.  745–754,  ACM,  2011.  

16. T.   Wilson,   J.   Wiebe,   and   P.   Hoffmann,   “Recognizing   contextual   polarity   in   phrase-­‐level   sentiment   anal-­‐   ysis,”   in   Proceedings   of   Human   Language   Technology   Conference   and   Conference   on   Empirical   Methods   in   Natural   Language  Processing  (HLT/EMNLP),  pp.  347–354,  ACL,  2005.  

17. V.   Perez-­‐Rosas,   C.   Banea,   and   R.   Mihalcea.   Learning   sentiment   lexicons   in   spanish.  Proc.  of  the  8th  International  Conference  on  Language  Resources  and   Evaluation  (LREC’12),  Istanbul,  Turkey,  May  2012.  

18. Villena-­‐Román,   J.,   J.   García-­‐Morera,   C.   Moreno   Garcia,   L.   Ferrer-­‐Ureña,   S.  

Lana-­‐Serrano,  J.  C.  González-­‐Cristobal,  A.  Westerski,  E.  Martínez-­‐Cámara,  M.  A.  

García-­‐Cumbreras,  M.  T.  Martín-­‐Valdivia,  and  Ureña-­‐López  L.  A.  2012.  TASS-­‐  

Workshop   on   Sentiment   Analysis   at   SEPLN.   Procesamiento   de   Lenguaje   Natural,  50.  

 

References

Related documents

For estimating the export demand function in equation 1, the variables used are as follows: XV is Jordan’s index of volume of exports, YW is world income proxied by the

By using historical approach (intellectual history), the relationship between religion, state, and nationalism based on the thinking of the two Muslim leaders can

H 4 : five price categories in the tick size policy have a significantly positive effect on trading volume Aside of making an effort to increase liquidity, the purpose of tick

According to the PIDX Standards Master, version 1.0 (PIDX1), the purpose of the standards is “…to promote standard processes for non-catalog, configurable products and services

This ratio is relatively high compared to what the Treasury was actually considering for the TARP auctions (i.e. no more than 50% of the total value of the securities included in

Given a tree model constraint network and the pseudo tree T identical to its primal graph, the explicit AND/OR search graph of the tree model relative to T is obtained from S T

Topic #5: Network security meets secure network traffic (2) • Recommendations:. • Network-based intrusion detection and prevention systems must incorporate content-blind rules

Vice  President  at  Southern  California  Edison..  Martin  Berlin