The Italian Hate Map: semantic content analytics for social good

Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

I-CiTies 2015
2015 CINI Annual Workshop on ICT for Smart Cities and Communities
Palermo (Italy) - October 29-30, 2015


The Italian HateMap

Inspired by the Hate Map built by Humboldt University (http://users.humboldt.edu/mstephens/hate/hate_map.html).

Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights.

Insight: aggregate raw, people-generated data in order to analyze complex phenomena.

(Not a new idea) Map of cholera in London, 1854: red = cholera cases; blue = water.

Research Question:

Is it possible to extract and process social media content to detect intolerant messages posted on social networks and identify the most at-risk areas of Italy?

CrowdPulse: a framework for real-time semantic analysis of social streams

CrowdPulse features:
- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization

[Figure: CrowdPulse workflow]
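The four features listed above run as a chain: each step receives the posts kept by the previous one. The following is a minimal Python sketch of that chaining, assuming a hypothetical `Post` record and hypothetical stage functions (`keep_negative`, `keep_geotagged`); it illustrates the workflow idea and is not CrowdPulse's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Post:
    """Hypothetical record for one piece of extracted social content."""
    text: str
    coords: Optional[Tuple[float, float]] = None  # (lat, lon) when geotagged
    entities: List[str] = field(default_factory=list)  # filled by semantic tagging
    polarity: Optional[str] = None  # filled by sentiment analysis


# A stage takes a stream of posts and yields the posts it keeps (possibly enriched).
Stage = Callable[[Iterable[Post]], Iterator[Post]]


def run_pipeline(posts: Iterable[Post], stages: List[Stage]) -> List[Post]:
    """Chain the stages lazily, in order, and materialise the final stream."""
    stream: Iterable[Post] = posts
    for stage in stages:
        stream = stage(stream)
    return list(stream)


# Two toy stages (hypothetical) standing in for sentiment filtering and geo processing.
def keep_negative(posts: Iterable[Post]) -> Iterator[Post]:
    return (p for p in posts if p.polarity == "negative")


def keep_geotagged(posts: Iterable[Post]) -> Iterator[Post]:
    return (p for p in posts if p.coords is not None)


sample = [
    Post("a", (41.9, 12.5), polarity="negative"),
    Post("b", None, polarity="negative"),
    Post("c", (45.4, 9.2), polarity="positive"),
]
result = run_pipeline(sample, [keep_negative, keep_geotagged])
print([p.text for p in result])  # ['a']
```

Passing stages as plain functions keeps each step replaceable (for instance, swapping one sentiment classifier for another) without touching the rest of the chain.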

CrowdPulse, Step 1: Social Data Extraction


Step 1: Social Data Extraction

- Extraction: Source, Heuristics
- Heuristics: Content, User, Geo, Content+Geo, Page, Group
- Examples: #icities2015 #democrats #traffic @barack_obama @comunepalermo #earthquake

We only extract public content.
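The extraction heuristics named above (Content, User, Geo, Content+Geo) can be read as predicates over incoming Tweets. A minimal sketch follows, assuming a hypothetical `Tweet` record with text, author and optional coordinates; the bounding box for Italy is a rough approximation chosen for illustration, and none of the names are CrowdPulse's own.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Tweet:
    """Hypothetical Tweet record used only for this sketch."""
    text: str
    author: str
    coords: Optional[Tuple[float, float]]  # (lat, lon), None if not geotagged


@dataclass
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Rough, assumed bounding box around Italy (degrees).
ITALY = BoundingBox(south=35.5, west=6.6, north=47.1, east=18.5)


def content_match(t: Tweet, terms: Set[str]) -> bool:
    # Content heuristic: a tracked term, hashtag or mention appears as a token.
    tokens = {tok.lower().strip(".,!?") for tok in t.text.split()}
    return bool(tokens & {term.lower() for term in terms})


def user_match(t: Tweet, users: Set[str]) -> bool:
    # User heuristic: the Tweet was posted by a tracked account.
    return t.author.lower() in {u.lower() for u in users}


def geo_match(t: Tweet, box: BoundingBox) -> bool:
    # Geo heuristic: the Tweet is geotagged inside the area of interest.
    return t.coords is not None and box.contains(*t.coords)


def content_geo_match(t: Tweet, terms: Set[str], box: BoundingBox) -> bool:
    # Content+Geo heuristic: both conditions must hold.
    return content_match(t, terms) and geo_match(t, box)


t1 = Tweet("Traffic jam downtown #traffic", "someone", (38.12, 13.36))  # Palermo
t2 = Tweet("#traffic in Paris", "someone", (48.86, 2.35))
t3 = Tweet("Notice to citizens", "ComunePalermo", None)
print(content_geo_match(t1, {"#traffic"}, ITALY), content_geo_match(t2, {"#traffic"}, ITALY))
```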

Use Case (CrowdPulse settings)

Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism
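With the settings above, a Tweet is assigned to a dimension whenever it contains one of that dimension's seed terms. The sketch below shows this plain keyword matching in Python; the `SEED_TERMS` excerpt and the dimension assigned to "nano" are hypothetical, since the actual 76 terms are not listed in the slides. Both sample Tweets match, which is the kind of noise the following use-case slides point out.

```python
from typing import Dict, Set

# Hypothetical one-entry excerpt; the real list has 76 seed terms chosen by the psychologists.
SEED_TERMS: Dict[str, str] = {
    "nano": "disability",  # dimension assumed for illustration only
}


def match_dimensions(text: str, seed_terms: Dict[str, str]) -> Set[str]:
    """Return the dimensions whose seed terms occur as tokens in the text (plain keyword matching)."""
    tokens = {tok.lower().strip(".,!?;:\"'") for tok in text.split()}
    return {dim for term, dim in seed_terms.items() if term in tokens}


tweets = [
    "Just bought a new iPod nano",               # about the iPod nano
    "E' inutile, il mio nano non segnerà mai",   # about a football player ("my nano will never score")
]
for t in tweets:
    print(match_dimensions(t, SEED_TERMS))  # both Tweets are tagged, regardless of meaning
```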

Use Case (CrowdPulse settings)

Extracted content (seed term: nano/midget):
- Tweet about an Italian minister
- Tweet about the iPod nano
- Tweet about an Italian football player

Many non-intolerant Tweets are extracted!

Use Case (CrowdPulse settings)

Sentiment Analysis and Semantic Tagging of the content.

The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy) 30.10.2015


Semantic Tagging: Motivations

A keyword-based representation introduces a lot of noise into the analysis: does "nano" refer to a midget or to the iPod nano?

Semantic Tagging: Motivations

“E’ inutile, il mio nano non segnerà mai” (“It’s no use, my nano [midget] will never score”): INTOLERANT or NOT INTOLERANT?

CrowdPulse, Step 2: Semantic Tagging

Solution: semantic processing of the extracted content through Entity Linking algorithms.
- Input: textual content
- Output: identification and disambiguation of the entities mentioned in the text

(1) TagMe: http://tagme.di.unipi.it
(2) DBpedia Spotlight: http://spotlight.dbpedia.org

Use Case (CrowdPulse settings)

Non-intolerant Tweets are detected and filtered out.
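Once an entity linker such as TagMe or DBpedia Spotlight has disambiguated the mentions, filtering can rely on the linked entity instead of the raw keyword. The sketch below assumes a simplified, hypothetical annotation shape (`spot`, `entity`, `score`), a hypothetical `INTOLERANT_SENSES` table and an arbitrary `min_score` threshold; it does not reproduce the response format of either service.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Annotation:
    """Hypothetical entity-linking result: surface form, linked entity, confidence."""
    spot: str
    entity: str
    score: float


# Hypothetical: for each seed term, the entities that correspond to its intolerant sense.
INTOLERANT_SENSES: Dict[str, Set[str]] = {
    "nano": {"Dwarfism"},
}


def is_intolerant(annotations: List[Annotation], seed_term: str, min_score: float = 0.1) -> bool:
    """Keep a Tweet only if the seed term itself was linked to one of its intolerant senses."""
    senses = INTOLERANT_SENSES.get(seed_term, set())
    return any(
        a.spot.lower() == seed_term and a.entity in senses and a.score >= min_score
        for a in annotations
    )


ipod_tweet = [Annotation("iPod nano", "IPod Nano", 0.6)]  # the spot is the product name
insult_tweet = [Annotation("nano", "Dwarfism", 0.3)]
print(is_intolerant(ipod_tweet, "nano"), is_intolerant(insult_tweet, "nano"))  # False True
```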

CrowdPulse, Step 3: Sentiment Analysis

Sentiment Analysis: Motivations

Is this content conveying any opinion? This is a crucial issue when people-based findings have to be generated.

Sentiment Analysis: Definition

“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)

(*) Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval, 2008.

We focused on the polarity detection task.

Use Case (CrowdPulse settings)

Tweets with positive or neutral sentiment are detected and filtered out.
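The slides do not describe which polarity classifier CrowdPulse uses. As a stand-in, the sketch below uses a simple lexicon-based scorer (sum of word scores, then a threshold) and applies the use-case rule of keeping only negative Tweets; the `LEXICON` contents and the `threshold` parameter are hypothetical, and a production system would more likely rely on a curated lexicon or a trained model.

```python
from typing import Dict, Iterable, List

# Hypothetical polarity lexicon (word -> score).
LEXICON: Dict[str, float] = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "hate": -1.0, "disgusting": -1.0,
}


def polarity(text: str, lexicon: Dict[str, float] = LEXICON, threshold: float = 0.0) -> str:
    """Label text 'positive', 'negative' or 'neutral' from the sum of its word scores."""
    tokens = [tok.lower().strip(".,!?;:") for tok in text.split()]
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def keep_negative(texts: Iterable[str]) -> List[str]:
    # Use-case rule: Tweets with positive or neutral sentiment are filtered out.
    return [t for t in texts if polarity(t) == "negative"]


sample = ["I love this great city", "I hate them, disgusting", "The match is tonight"]
print([polarity(t) for t in sample])  # ['positive', 'negative', 'neutral']
print(keep_negative(sample))          # ['I hate them, disgusting']
```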


CrowdPulse, Step 4: Processing

Use Case (CrowdPulse settings)

Since we have to build a map, we only need geotagged content.

Use Case (CrowdPulse settings)

Definition of heuristics to increase the number of geotagged Tweets.
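The heuristics used to increase the number of geotagged Tweets are not detailed here. One common approach, shown below purely as an assumption, is to fall back on the free-text location in the user's profile when a Tweet carries no native coordinates, resolving it against a gazetteer; `GAZETTEER` is a hypothetical four-city excerpt with approximate coordinates.

```python
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical gazetteer excerpt (Italian city names, approximate city-centre coordinates).
GAZETTEER: Dict[str, LatLon] = {
    "roma": (41.90, 12.50),
    "milano": (45.46, 9.19),
    "palermo": (38.12, 13.36),
    "bari": (41.12, 16.87),
}


def resolve_location(coords: Optional[LatLon], profile_location: Optional[str]) -> Optional[LatLon]:
    """Native geotag first; otherwise try each word of the profile location in the gazetteer."""
    if coords is not None:
        return coords
    if profile_location:
        # e.g. "Palermo, Sicilia" -> ["palermo", "sicilia"]
        for part in profile_location.lower().replace(",", " ").split():
            if part in GAZETTEER:
                return GAZETTEER[part]
    return None


print(resolve_location((45.07, 7.69), "Torino"))   # (45.07, 7.69): native geotag wins
print(resolve_location(None, "Palermo, Sicilia"))  # (38.12, 13.36)
print(resolve_location(None, "somewhere"))         # None
```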

Use Case

Dimension | #Tweets | #Geotagged | %Geotagged
Homophobia | 110,774 | 8,501 | 7.66%
Racism | 154,170 | 1,940 | 1.24%
Violence | 1,102,494 | 28,886 | 2.62%
Disability | 479,654 | 3,410 | 0.75%
Anti-Semitism | 6,000 | 1,150 | 18.03%
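Summing the per-dimension counts in the table gives a quick check of the "almost 2,000,000" figure reported in the conclusions. The sketch assumes each Tweet is counted in only one dimension, which the slides do not state explicitly.

```python
# Per-dimension counts copied from the table: (#Tweets, #Geotagged).
COUNTS = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}

total_tweets = sum(n for n, _ in COUNTS.values())
total_geo = sum(g for _, g in COUNTS.values())
print(total_tweets, total_geo)                    # 1853092 43887
print(f"{100 * total_geo / total_tweets:.2f}%")   # 2.37%
```

Under that assumption about 1.85 million Tweets were collected, of which roughly 2.4% could be placed on a map.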

CrowdPulse, Step 4: Data Visualization

Use Case (CrowdPulse output)

[Figure: hate maps for violence against women and for disability, based on OpenStreetMap]

Use Case (CrowdPulse output)

[Figure: hate maps for racism and for homophobia, based on OpenStreetMap]
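The maps are rendered on OpenStreetMap, but the slides do not say how the geotagged Tweets are aggregated. A simple option, sketched here as an assumption, is to bin the points into a regular latitude/longitude grid and shade each cell by its count; `cell_deg` is a hypothetical parameter controlling the cell size in degrees.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

LatLon = Tuple[float, float]


def grid_counts(points: Iterable[LatLon], cell_deg: float = 0.5) -> Counter:
    """Count points per cell_deg x cell_deg cell; each key is the cell's south-west corner."""
    counts: Counter = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts


# Hypothetical geotagged intolerant Tweets: two near Rome, one in Milan.
points = [(41.90, 12.50), (41.80, 12.60), (45.46, 9.19)]
counts = grid_counts(points)
for cell, n in counts.most_common():
    print(cell, n)  # darker shading on the map for higher counts
```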

Conclusions

1. Crowdsourcing-based approach: social content containing the seed terms is extracted and processed in real time.
2. Semantic processing is exploited to delete non-intolerant Tweets.
3. Sentiment analysis is used to filter out Tweets with irony.
4. The Analytics Console is used to build real-time hate maps.

Almost 2,000,000 pieces of social content were extracted and analyzed.

Lessons Learned

Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined a set of guidelines to tackle and prevent intolerant behavior.

These guidelines were freely distributed to public administrations in early 2015.
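Term co-occurrence, one of the linguistic analyses mentioned above, can be computed by counting which pairs of terms appear together in the same Tweet. A minimal sketch follows; the vocabulary and sample texts are hypothetical placeholders, and per-Tweet co-occurrence is only one of several possible definitions (sliding windows are another).

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrences(texts: Iterable[str], vocabulary: Set[str]) -> Counter:
    """Count how often each (alphabetically ordered) pair of vocabulary terms shares a text."""
    pairs: Counter = Counter()
    for text in texts:
        tokens = {tok.lower().strip(".,!?;:#@") for tok in text.split()}
        present = sorted(tokens & vocabulary)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical placeholder vocabulary and Tweets.
vocabulary = {"stadium", "referee", "match"}
texts = ["Referee at the stadium again", "The match and the referee", "stadium match referee"]
pairs = cooccurrences(texts, vocabulary)
for pair, n in pairs.most_common():
    print(pair, n)
```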

Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization.

Use Case: The Italian Hate Map

Definition of a framework for real-time semantic content analysis.

Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.

Questions?

Cataldo Musto, PhD
@cataldomusto
http://www.di.uniba.it/~swap
