The Italian Hate Map:
semantic content analytics for social good
Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
I-CiTies 2015
2015 CINI Annual Workshop on ICT for Smart Cities and Communities
Palermo (Italy) - October 29-30, 2015
The Italian Hate Map
Inspired by the Hate Map built by Humboldt University:
http://users.humboldt.edu/mstephens/hate/hate_map.html
Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights.
The Italian Hate Map
Insight: aggregate rough people-based data in order to analyze complex phenomena.
The Italian Hate Map
(Not a new idea)
Map of cholera in London, 1854: red = cholera cases, blue = water
The Italian Hate Map
Research Question: Is it possible to extract and process social media content to detect intolerant posts on social networks and identify the most at-risk areas of Italy?
CrowdPulse
A framework for real-time Semantic Analysis of Social Streams
CrowdPulse features
• Social Data Extraction
• Semantic Tagging
• Sentiment Analysis
• Processing & Visualization
CrowdPulse workflow
CrowdPulse Step 1: Social Data Extraction
Extraction: source and heuristics
Heuristics: Content, User, Geo, Content+Geo, Page, Group
Examples: #icities2015, #democrats, #traffic, @barack_obama, @comunepalermo, #earthquake
CrowdPulse Step 1: Social Data Extraction
We only extract public content.
Use Case: The Italian Hate Map (CrowdPulse Settings)
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-semitism
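The content heuristic above can be sketched as a seed-term match, assuming tweets are plain strings. This is only an illustrative sketch, not the CrowdPulse implementation: the lexicon, the dimension assigned to "nano", and the function names are all hypothetical.

```python
import re
from typing import Dict, List, Set

# ASSUMPTION: placeholder lexicon. The real 76 seed terms are not listed in the
# slides; only "nano" appears, and its dimension here is a guess for illustration.
SEED_TERMS: Dict[str, Set[str]] = {
    "disability": {"nano"},
    "racism": {"seed_term_a", "seed_term_b"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_dimensions(text: str, seeds: Dict[str, Set[str]] = SEED_TERMS) -> Set[str]:
    """Return every intolerance dimension whose seed terms occur in the text."""
    tokens = set(tokenize(text))
    return {dim for dim, terms in seeds.items() if tokens & terms}


def extract(tweets: List[str]) -> List[str]:
    """Keep only the tweets that contain at least one seed term."""
    return [t for t in tweets if match_dimensions(t)]
```

A pure keyword match like this keeps every tweet that mentions a seed term, whatever the intended meaning.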
Use Case: The Italian Hate Map (CrowdPulse Settings)
Extracted content (seed term: nano/midget)
- Tweet about an Italian minister
- Tweet about iPod nano
- Tweet about an Italian football player
Use Case: The Italian Hate Map (CrowdPulse Settings)
Many non-intolerant Tweets are extracted!
Use Case (CrowdPulse Settings)
Sentiment Analysis and Semantic Tagging of the content
The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy) 30.10.2015
Semantic Tagging: Motivations (The Italian Hate Map)
Keyword-based representation introduces a lot of noise in the analysis.
Does "nano" mean midget or iPod nano?
Semantic Tagging: Motivations
“È inutile, il mio nano non segnerà mai” (English: "It's no use, my nano will never score")
Intolerant or not intolerant?
CrowdPulse Step 2: Semantic Tagging
Solution: semantic processing of extracted content
• Entity Linking Algorithms (1) (2)
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text
(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
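To make the idea of disambiguation concrete, here is a toy sketch that resolves an ambiguous seed term by counting context cue words. It is a stand-in for a real entity linker and does not call the services referenced above; the sense inventory, the cue words, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Set

# ASSUMPTION: a tiny hand-made sense inventory. Each harmless entity is described
# by a few context cue words; a real entity linker uses a knowledge base instead.
HARMLESS_SENSES: Dict[str, Dict[str, Set[str]]] = {
    "nano": {
        "iPod_Nano": {"ipod", "apple", "gb", "musica"},
    },
}


def link_entity(term: str, text: str) -> Optional[str]:
    """Return the harmless entity that best matches the context, or None."""
    tokens = set(re.findall(r"\w+", text.lower()))
    best, best_overlap = None, 0
    for entity, cues in HARMLESS_SENSES.get(term, {}).items():
        overlap = len(tokens & cues)
        if overlap > best_overlap:
            best, best_overlap = entity, overlap
    return best


def keep_tweet(term: str, text: str) -> bool:
    """Keep the tweet unless the seed term resolves to a harmless entity."""
    return link_entity(term, text) is None
```

With this sketch, a tweet is discarded only when the context clearly points to the harmless sense; everything else moves on to the next step.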
Use Case: The Italian Hate Map (CrowdPulse Settings)
Non-intolerant Tweets are detected and filtered out.
CrowdPulse Step 3: Sentiment Analysis
Sentiment Analysis: Motivations
Is this content conveying any opinion?
This is a crucial issue if people-based findings have to be generated.
Sentiment Analysis: Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
We concentrated on the polarity detection task.
Use Case: The Italian Hate Map (CrowdPulse Settings)
Tweets with positive or neutral sentiment are detected and filtered out.
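The polarity filter can be illustrated with a minimal lexicon-based sketch. The slides do not say which polarity detection algorithm is used, so the word lists and the scoring rule below are assumptions made only for illustration.

```python
import re
from typing import List

# ASSUMPTION: tiny hypothetical polarity lexicons (Italian words).
POSITIVE = {"amo", "bello", "grande"}
NEGATIVE = {"odio", "schifo", "brutto"}


def polarity(text: str) -> str:
    """Classify a text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keep_negative(tweets: List[str]) -> List[str]:
    """Filter out tweets with positive or neutral polarity."""
    return [t for t in tweets if polarity(t) == "negative"]
```

Only tweets classified as negative survive this step; positive and neutral ones are dropped, as stated on the slide.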
CrowdPulse Step 4: Processing
Use Case: The Italian Hate Map (CrowdPulse Settings)
We have to build a map, so we only need geotagged content.
Use Case: The Italian Hate Map (CrowdPulse Settings)
Definition of heuristics to increase the number of geotagged Tweets
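The slides do not describe these heuristics. One plausible example, shown here purely as an assumption, is to fall back to the city named in the user's profile when a tweet carries no coordinates. The field names, the gazetteer, and its approximate coordinates are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]

# ASSUMPTION: a tiny gazetteer with approximate city coordinates (lat, lon).
GAZETTEER: Dict[str, LatLon] = {
    "palermo": (38.12, 13.36),
    "bari": (41.12, 16.87),
    "roma": (41.90, 12.50),
}


def resolve_location(tweet: dict) -> Optional[LatLon]:
    """Use the tweet's own coordinates, else a city found in the profile location."""
    coords = tweet.get("coordinates")  # hypothetical field name
    if coords is not None:
        return tuple(coords)
    profile = (tweet.get("user_location") or "").lower()  # hypothetical field name
    # Match whole words so that, e.g., "romania" does not match "roma".
    for token in re.findall(r"\w+", profile):
        if token in GAZETTEER:
            return GAZETTEER[token]
    return None
```

A fallback of this kind trades precision for coverage: the profile city may differ from the place where the tweet was written.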
Use Case: The Italian Hate Map
Dimension        #Tweets      #Geo      %Geo
Homophobia       110,774      8,501     7.66%
Racism           154,170      1,940     1.24%
Violence         1,102,494    28,886    2.62%
Disability       479,654      3,410     0.75%
Anti-Semitism    6,000        1,150     18.03%
CrowdPulse Step 4: Data Visualization
Use Case: The Italian Hate Map (CrowdPulse Output)
[Maps: Violence against women; Disability. Based on OpenStreetMap]
Use Case: The Italian Hate Map (CrowdPulse Output)
[Maps: Racism; Homophobia. Based on OpenStreetMap]
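A map of this kind needs the geotagged Tweets aggregated by area. The sketch below bins coordinates into a regular latitude/longitude grid and counts Tweets per cell. The grid approach and the cell size are assumptions; the slides do not state how the maps were computed.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def cell_of(lat: float, lon: float, size: float = 0.5) -> Cell:
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / size), math.floor(lon / size))


def density(points: Iterable[Tuple[float, float]], size: float = 0.5) -> Counter:
    """Count how many points fall into each grid cell."""
    return Counter(cell_of(lat, lon, size) for lat, lon in points)
```

The per-cell counts could then be rendered as a colored overlay on a base map, with higher counts shown in warmer colors.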
Conclusions: The Italian Hate Map
Crowdsourcing-based approach
1. Social content containing the seed terms is extracted and processed in real time.
2. Semantic Processing is exploited to discard non-intolerant Tweets.
3. Sentiment Analysis is used to filter out Tweets with irony.
4. The Analytics Console is used to build real-time hate maps.
Almost 2,000,000 pieces of social content were extracted and analyzed.
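The steps summarized above form a chain of filters. A minimal sketch of such a chain, with hypothetical stage names and predicates standing in for the real components, could look like this:

```python
from typing import Callable, Iterable, List, Tuple

# A stage is a (name, keep-predicate) pair; the predicates are placeholders.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(tweets: Iterable[str], stages: List[Stage]):
    """Apply each filter in order and record how many tweets survive each stage."""
    remaining = list(tweets)
    report = []
    for name, keep in stages:
        remaining = [t for t in remaining if keep(t)]
        report.append((name, len(remaining)))
    return remaining, report
```

The report makes it easy to see how much content each step removes, which is useful when tuning the seed terms or the filters.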
Lessons Learned: The Italian Hate Map
Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined guidelines to tackle and prevent intolerant behaviors.
These guidelines were freely distributed to public administrations in early 2015.
Lessons Learned
Definition of a framework for real-time semantic content analysis
Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization
Use Case: The Italian Hate Map
Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.