
SSIIM - Seminários de Sistemas Inteligentes, Interacção e Multimédia, MIEIC

Social  Network  Mining  

Eduarda  Mendes  Rodrigues  

Assistant  Professor  

DEI-FEUP, Universidade do Porto

   

http://www.fe.up.pt/~eduarda   [email protected]

(2)
(3)

Social  Media  Landscape  

People  

the  individual  is  at  the  center  of  the  social  web  

Social  media  networks  

explicit and implicit social ties

interaction among millions of people

User-generated content

rich source of collective knowledge

diffusion of information and opinions

(4)

Information Retrieval and Social Media

•  Properties of social media
–  Scale: millions of active users, millions of posts per day
–  Real-time: breaking news, information novelty
–  Duplicates: information diffusion (re-tweets, cross-posts, etc.)
–  Content quality: spelling, grammar, punctuation, emoticons, etc.
–  Social fabric: information credibility, opinion leaders, topic experts

•  Some challenges: relevance and ranking
–  Social vs. non-social content
–  Novelty detection

(5)

Information Credibility

•  Several newspapers picked up the fake photos
•  Wrongly indexed by search engines based on the news stories

(6)

Social  Media  Mining  

…and patterns are left behind!

(7)

Social  Media  Mining  

Can social network analysis enrich the content analysis?

Can the content analysis help explain the social network structure and dynamics?

Social Network Analysis
§  user activity statistics
§  interaction patterns
§  social network metrics
§  community detection
§  visualization

Content Analysis
§  text features
§  topic analysis
§  clustering and classification

(8)

Current  Research  

•  Data mining and IR in social media
–  social network mining
–  text classification, opinion mining
–  micro-blog search

•  Network visualization
–  layout and clustering algorithms
–  design of interactive tools

•  Data journalism
–  information extraction from news
–  real-time social media analytics

 

(9)

Social  Media  Networks  

•  Explicit social ties
–  Friends on Facebook
–  Followers on Twitter
–  Professional contacts on LinkedIn
–  ...

•  Implicit social ties
–  Like, favorite, repin
–  Reply, retweet, share
–  Comment, review
–  Tag, rate, vote

(10)

Implicit  Networks  for  Social  Media  Mining  

•  Discussion groups (Usenet newsgroups)
–  Can we identify posts with answers in Q&A groups?
–  Can we predict agreement and disagreement in debate groups?

•  Community Q&A
–  What types of questions are posted?

(11)

Discussion Group Communities

•  Discussion groups are extremely valuable sources of information
•  Identifying the polarity of people’s opinions about certain topics is useful for business intelligence
•  People seeking information through newsgroup search may want to be pointed at answers to their questions

(12)

Implicit  Networks  in  Discussion  Groups  

[Figure: a discussion thread, its thread structure, and the resulting social network graph, with one link labeled w=2]

(13)

Mining Patterns of Social Interaction

Author Networks
•  Reply-to Network: connects authors who reply to other authors
•  Thread Participation Network: connects authors who co-participate in threads
•  Text Similarity Network: connects authors of similar content

Thread Networks
•  Common Authors Network: connects threads that have common authors
•  Text Similarity Network: connects threads of similar content

Feature Sets

Supervised Learning (Linear SVM)

Message Categories
§  Agreement, Disagreement, Insult
§  Question, Answer

B. Fortuna, E. Mendes Rodrigues, N. Milic-Frayling. Improving the Classification of Newsgroup Messages through Social Network analysis. ACM 16th Intl. Conf. on Information and Knowledge Management, CIKM 2007 (PDF).
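As an illustration of how a reply-to network can be derived from thread structure, here is a minimal Python sketch. The message tuples, author names and the `reply_to_network` helper are hypothetical and are not taken from the cited paper; link weights simply count how many times one author replied to another.

```python
from collections import Counter

# Hypothetical toy thread: (message_id, author, parent_id or None).
messages = [
    ("m1", "ana", None),
    ("m2", "bruno", "m1"),
    ("m3", "ana", "m2"),
    ("m4", "bruno", "m3"),
    ("m5", "carla", "m1"),
]

def reply_to_network(messages):
    """Return a Counter mapping (replier, replied_to) -> number of replies."""
    author_of = {mid: author for mid, author, _ in messages}
    edges = Counter()
    for _, author, parent in messages:
        if parent is None:
            continue  # thread starters reply to nobody
        target = author_of[parent]
        if target != author:  # ignore self-replies
            edges[(author, target)] += 1
    return edges

edges = reply_to_network(messages)
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst} (w={weight})")
```

In this toy thread, bruno replies to ana twice, so that link gets weight 2. A thread participation network could be built the same way by linking every pair of authors who appear in the same thread.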

(14)

Mining Patterns of Social Interaction

Topic Experts

Reply-to network at distance 2 for the most prolific authors of the talk.politics.guns (LEFT) and microsoft.public.internetexplorer.general (RIGHT) newsgroups.

(15)

Analysis of CQA Communities

•  CQA services aim to build a large knowledge base of questions and answers, on any topic, and make it available through search

Challenge: content quality!

[Figure: example question with answers; year labels 2002, 2003, 2005, 2006, 2010]

(16)

Is  the  community  sharing  knowledge?    

User Intent & Question Types

Mendes Rodrigues, E., Milic-Frayling, N., Sharing Knowledge or socializing? Characterizing User Intent in Community Question Answering, Proceedings of the 2009 ACM International Conference on Information and Knowledge Management, CIKM ’09.

(17)

Mining Question Types

•  Automatic classification problem
–  Social vs. non-social questions

•  Feature sets
–  Question features
   Content (tf.idf scores for single terms and n-grams), message length
–  Thread features
   Responsiveness, user participation, presence of URLs in answers
–  Tags and topic features
   Aggregate information about specificity of tag or topic
–  Social network features for users involved in the thread
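To make the content features concrete, the following Python sketch computes tf.idf scores for single terms and bigrams. The whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the example questions are assumptions chosen for illustration; the slides do not specify the exact weighting scheme that was used.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased single terms and n-grams up to length n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tf_idf(docs, n_max=2):
    """Return one {term: tf * log(N / df)} dictionary per document."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        for c in counts
    ]

# Hypothetical questions, for illustration only.
questions = [
    "how do i install python",
    "how are you today",
    "how do i fix python error",
]
vectors = tf_idf(questions)
```

A term that appears in every question (here "how") gets a score of zero, while terms specific to one question get the highest weights. Vectors like these could then be fed to a classifier alongside the thread, tag and network features.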

(18)

Social Network Structure

•  Community ecosystem evolved in such a way that encouraged interactions of a social nature
–  84.5% of questions are non-social and 6.5% are social
–  Over time, the percentage of social questions and respective answers and comments increased significantly

•  How social are individual users?

•  Social score:
–  S(u) = |social| / |non-social|
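The social score is a simple ratio, as in this small Python sketch. Which contributions are counted (questions only, or also answers and comments) and how a user with no non-social contributions is handled are not stated in the slides, so both are assumptions here.

```python
def social_score(n_social, n_non_social):
    """S(u) = |social| / |non-social| for one user's contributions."""
    if n_non_social == 0:
        # Assumption: the ratio is unbounded for purely social users.
        return float("inf") if n_social > 0 else 0.0
    return n_social / n_non_social

# Hypothetical user with 3 social and 12 non-social contributions.
print(social_score(3, 12))
```

A score above 1 means the user contributes more to social threads than to non-social ones.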

(19)

Social Network Structure

•  Users with high degree post a large percentage of social questions
•  Users who answer and comment on social threads have dense in-neighborhoods

(20)

Social Network Analysis

Mapping and measurement of relationships and flows between entities that include people

Views social relationships in terms of network theory consisting of nodes and links
–  node: “actor” on which relationships act
–  link: relationship connecting nodes

(21)

Social Network Analysis

Social network graphs can be analysed using a number of metrics including:

•  cohesion of the network or sub-network
   measures the ease with which connections can be made
•  density of the network or sub-network
   measures the robustness of the connections
•  centrality of the nodes
   gives a rough indication of the social power of a node in the network
   - degree
   - betweenness
   - closeness

(22)

Degree Centrality

Count of the number of links to other nodes in the network

Higher degree of a node might indicate that the node is a hub in the network

Most connected does not mean most powerful!
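A minimal Python sketch of degree centrality on an undirected graph stored as an adjacency dictionary. The toy graph and the optional normalization by n - 1 are assumptions added for illustration.

```python
# Hypothetical undirected graph: node -> set of neighbours.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

def degree_centrality(graph, normalized=False):
    """Number of links per node, optionally divided by (n - 1)."""
    n = len(graph)
    scale = (n - 1) if normalized and n > 1 else 1
    return {node: len(nbrs) / scale for node, nbrs in graph.items()}

degrees = degree_centrality(graph)
hub = max(degrees, key=degrees.get)  # the most connected node
```

Here node "a" has the highest degree, yet removing "d" is what would cut "e" off from the rest of the graph, which is why degree alone does not capture a node's position.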

(23)

Betweenness Centrality

Number of shortest paths between each node pair that a node is on

Boundary spanners that bridge between groups have high betweenness

High betweenness generally indicates a powerful position in the network!
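One common way to compute this measure is Brandes' algorithm, sketched below in Python for an unweighted, undirected graph. The toy graph (two triangles joined by a single link) is an assumption; when a pair of nodes has several shortest paths, each path contributes a fractional share.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected adjacency dict."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical graph: triangles {a, b, c} and {d, e, f} bridged by c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
bc = betweenness_centrality(graph)
```

The two bridge nodes "c" and "d" lie on every shortest path between the two triangles, so they are the boundary spanners of this graph, while all other nodes score zero.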

(24)

Closeness Centrality

Mean shortest path between a node and all other nodes in the network reachable from it

Reflects the ability of a node to access information through the network

Low closeness generally indicates high visibility of what’s going on in the network!

© Will Ockenden
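Following the definition on this slide, closeness can be computed with a breadth-first search, as in this Python sketch. The path-shaped toy graph is an assumption. Note that some libraries (NetworkX, for example) report the reciprocal of this value, so that higher means more central.

```python
from collections import deque

def mean_shortest_path(graph, source):
    """Mean hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")

# Hypothetical path graph a - b - c - d - e.
graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
closeness = {node: mean_shortest_path(graph, node) for node in graph}
```

The middle node "c" has the lowest mean distance (1.5) and the end nodes the highest (2.5), matching the reading that a low value means the node is close to everything else.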

(25)

Centrality Measures and Node Roles

Social network graph

Peripheral – below average centrality (C)

Central connector – above average centrality (D)

(26)

Visual Signatures of Social Roles

Answerer
•  Outward links to local isolates
•  Relative absence of triangles
•  Few intense links

Connector
•  Links from local isolates often inward only
•  Dense, many triangles
•  Numerous intense links

Originator
•  Links from local isolates often inward only
•  Sparse, few triangles
•  Few intense links

Welser, H., Smith, M., Gleave, E. and Fisher, D. Visualizing the Signatures of Social Roles in Online Discussion Groups. Journal of Social Structure, vol. 8, 2007.

(27)

Network Visualization

Visualization should support knowledge discovery and communication

(28)

How good is a network visualization?

Ideally…

Every node is visible

The degree of every node can be counted

It is possible to follow every link from source to destination

Clusters and outliers are identifiable

NetViz Nirvana!!!

C. Dunne and B. Shneiderman, “Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts,” University of Maryland, HCIL Tech Report HCIL-2009-13, May 2009.

(29)

How good is a network visualization?

Challenge: real networks are often very complex structures.

Interpretation of the network structure often requires visualizing additional information about the nodes and links.

Standard layout algorithms don’t help much when the size of the network is above a few hundred nodes and the network is relatively dense in the number of links.

(30)

Some Visualization Approaches

Overview of the network

Zoom and details on demand

Dynamically filter nodes and links

Integrate metrics and visualization

(31)

Network Analysis and Visualization Process Model

Main steps: Define Analysis Goals, Collect Network Data, Interpret Data

Interpreting the data: Apply data filters, Choose network layout, Adjust visual properties

The process is iterative:
•  Refining / adjusting goals after the first look at the data
•  Analysis may require additional data
•  Discovery may trigger further analyses

D. L. Hansen, D. Rotman, E. M. Bonsignore, N. Milic-Frayling, E. Mendes Rodrigues, M. Smith, and B. Shneiderman, “Do you know the way to SNA?: A process model for analyzing and visualizing social media data.” in University of Maryland Tech Report: HCIL-2009-17.

(34)
(35)

Flickr  Related  Tags  Network  –  “Mouse”  

[Figure: related tags network for “mouse”, with groups labeled “Computer” and “Mickey”]

(36)
(37)
(38)
(39)
(40)
(41)
(42)
(43)

NodeXL Project

Open source project at: http://nodexl.codeplex.com

Social Network Analysis add-in for MS Excel makes graph theory as easy as a bar chart, integrated analysis of social media sources.

TEAM

Connected Action: Marc Smith
Microsoft Research: Natasa Milic-Frayling, Tony Capone
University of Porto: Eduarda Mendes Rodrigues
University of Maryland: Ben Shneiderman, Cody Dunne
Stanford University: Jure Leskovec
University of Washington: Eric Gleave
Cornell University: Vladimir Barash

(44)

REACTION Project

http://dmir.inesc-id.pt/project/Reaction

•  Computational journalism
   Intensive use of software tools for news research, production and presentation

•  What is the impact on the routines of newsrooms?

•  What effect will these tools have on the quality of news and the productivity of journalists?

Retrieval, Extraction and Aggregation Computing Technology for Integrating

(45)

Data Journalism – Implicit News Networks

•  Information extraction from thousands of online news articles

•  SAPO Labs developed NLP technology for Named Entity Recognition in news (Verbetes service)

•  Relationship extraction based on co-occurrence

Pedro Passos Coelho,407,128
Silvio Berlusconi,271,106
Aníbal Cavaco Silva,234,98
…

'Paulo Bento' and 'Cristiano Ronaldo' co-occurred in 72 news articles
'Paulo Bento' and 'Bruno Alves' co-occurred in 39 news articles
'Paulo Bento' and 'Raul Meireles' co-occurred in 37 news articles
…
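Co-occurrence based relationship extraction can be sketched in a few lines of Python. The input format (one set of already recognized entity names per article) and the toy articles are assumptions; this is not the interface of the Verbetes service.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the named entities recognized in each news article.
articles = [
    {"Paulo Bento", "Cristiano Ronaldo"},
    {"Paulo Bento", "Cristiano Ronaldo", "Bruno Alves"},
    {"Paulo Bento", "Bruno Alves"},
]

def cooccurrences(articles):
    """Count, for each entity pair, the articles mentioning both."""
    pairs = Counter()
    for entities in articles:
        # Sorting gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(entities), 2):
            pairs[(a, b)] += 1
    return pairs

pairs = cooccurrences(articles)
for (a, b), n in pairs.most_common():
    print(f"'{a}' and '{b}' co-occurred in {n} news articles")
```

The resulting counts can serve directly as link weights in an implicit network of news entities.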

(46)

Data Journalism – Implicit News Networks

•  News social networks
–  Named entity extraction
–  Entity co-occurrences
–  Interactive visualization

•  Applications
–  Investigative journalism
–  Review of the week
–  User engagement

(47)

Data  Journalism  –  Opinion  Mining  

(48)

Data Journalism – Opinion Mining

[Diagram: TwitterEcho Crawler; TwitterEcho; Query; Opinion Mining Module (Dictionary of names, Sentiment lexicon, Rule-based classifier); Stats]
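To give a feel for how a dictionary of names, a sentiment lexicon and a rule-based classifier can fit together, here is a deliberately small Python sketch. The word lists, the single negation rule and the `classify` function are all hypothetical; the actual resources and rules of the opinion mining module are not described in the slides.

```python
# Hypothetical resources, for illustration only.
NAMES = {"ronaldo", "bento"}
LEXICON = {"great": 1, "brilliant": 1, "awful": -1, "terrible": -1}
NEGATIONS = {"not", "never", "no"}

def classify(tweet):
    """Return (mentioned names, polarity label), or None if no name matches."""
    tokens = [t.strip(".,!?#@").lower() for t in tweet.split()]
    targets = sorted({t for t in tokens if t in NAMES})
    if not targets:
        return None  # the tweet does not mention a tracked name
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        # Rule: a negation word right before a sentiment word flips it.
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return targets, label

print(classify("Ronaldo was brilliant tonight!"))
print(classify("Bento is not great"))
```

Counting the labels per name over a stream of tweets would give the kind of aggregate statistics the diagram points to.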

(49)

Data Journalism - Twitteuro

•  Real-time social media monitoring
–  Big data crawling and analytics
–  Entity extraction
–  Interactive visualization

•  Journalism applications
–  Event reporting (#Euro 2012)

(50)
(51)

Project Themes

Survey paper
–  on mining social media data for business intelligence (e.g. brand management; targeted advertising; new product development)
–  on opinion mining techniques for social media content and applications
–  on community detection techniques for implicit social networks

Social media visualization widgets
–  visualization for tracking the propagation of Twitter memes
–  spatio-temporal visualization of tweets with named entities

(52)

Thank you! Questions?

http://www.fe.up.pt/~eduarda   [email protected]   @eduardamr
