
Towards a Set Theoretical Approach to Big Data Analytics

IEEE BigData 2014, Alaska, USA

July 01, 2014

Raghava Rao Mukkamala

Postdoc, Software and Systems Section

IT University of Copenhagen, Denmark


Joint work with

Ravi Vatrapu

Professor, Department of IT Management

Abid Hussain

PhD student, Department of IT Management


Raghava Rao Mukkamala, http://www.itu.dk/~rao


Road Map

Part 1: Introduction and Motivation

Part 2: Social Data Model

Part 3: Social Data Analytics Example: H & M Company


Social Media as Business Platform

Most used platforms: Facebook, Twitter, YouTube, LinkedIn

Content published worldwide by millions of social media users

Social data sets contain valuable information

Can provide meaningful facts and actionable insights

Vatrapu, R. (2013). Understanding social business. In Emerging Dimensions of Technology Management (pp. 147-158). Springer India.


Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.

http://60secondmarketer.com/blog/


Facebook: Social Media Platform

A Facebook wall offers¹

a new, more informal, but public means of communication for users

provision to enter into discussions and debates with other users

engagement that can range from interactions under normal conditions to activities during extraordinary events such as the Arab Spring

Facebook gives you friends, while Twitter gives you followers!

Structure and data availability²

Twitter: simple and public

Facebook: complicated, with restricted privacy settings

Research-wise: mostly on Twitter, very little on Facebook

1) Robertson, S., Vatrapu, R., & Medina, R. (2010). Online Video “Friends” Social Networking: Overlapping Online Public Spheres in the 2008 U.S. Presidential Election. Journal of Information Technology & Politics, 7, 182-201.


Social Data Research Approaches

Ethnographical approaches¹

social meaning inferred from content

usually qualitative techniques based on triangulation methods

Statistical approaches¹

computationally supported statistical approaches (correlation, regression, etc.)

e.g. study of Twitter usage in natural disasters²

Computational approaches

Computational Social Science: an interdisciplinary approach bringing together social scientists, computer scientists and mathematicians³

building models, methods and concepts for the analysis of large volumes of data

1) Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.

2) Earle, P. (2010). Earthquake Twitter. Nature Geoscience, 3(4), 221–222.


Our Research Approach

Formal methods to develop advanced data analysis techniques for social data

Formal methods: techniques to model complex phenomena as mathematical entities

abstract, precise and complete

Current techniques are limited to Social Network Analysis based on a graph-theoretical approach (Relational Sociology)

Our approach: based on set theory and fuzzy logic (Associational Sociology)


Research Methodology

Using an integrated modeling approach

Conceptual model of social data

Formal model based on set theory

Social Data Analytics Tool


Social Data Conceptual Model

Social Graph Analysis: structure of relationships emerging from social media use

Which actors are involved?

What actions do they perform?

What activities do they undertake?

What artifacts do they create and interact with?

A. Hussain, R. Vatrapu, D. Hardt, and Z. Jaffari, “Social data analytics tool: A demonstrative case study of methodology and software,” in Analysing Social Media Data and Web Networks, M. C. Rachel Gibson and S. Ward, Eds. Palgrave Macmillan, 2014 (in press)
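A minimal sketch of how the questions on this slide could be posed against a set-based representation. The (actor, action, artifact) triple layout, the sample identifiers and the function names are assumptions made for illustration; they are not taken from the slides or from the Social Data Analytics Tool, and activities are left out.

```python
# Hypothetical interaction records: (actor, action, artifact).
INTERACTIONS = {
    ("u1", "post", "r1"),
    ("u2", "comment", "r1"),
    ("u2", "like", "r2"),
    ("u3", "like", "r1"),
    ("u3", "post", "r2"),
}


def actors(interactions):
    """Which actors are involved?"""
    return {actor for actor, _, _ in interactions}


def actions_of(interactions, actor):
    """What actions does the given actor perform?"""
    return {action for a, action, _ in interactions if a == actor}


def artifacts_touched_by(interactions, actor):
    """What artifacts does the given actor create or interact with?"""
    return {artifact for a, _, artifact in interactions if a == actor}


def shared_audience(interactions, artifact_a, artifact_b):
    """Actors associated with both artifacts, as a set intersection."""
    on_a = {a for a, _, art in interactions if art == artifact_a}
    on_b = {a for a, _, art in interactions if art == artifact_b}
    return on_a & on_b
```

Keeping the records as plain sets means that questions about overlap between groups of actors reduce to intersections and differences, without building a graph first.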


Social Data Conceptual Model

Social Text Analysis: substantive nature of the interactions

How are the topics discussed?

Which keywords appear?

Which pronouns are used?

How far are positive/negative sentiments expressed?

A. Hussain, R. Vatrapu, D. Hardt, and Z. Jaffari, “Social data analytics tool: A demonstrative case study of methodology and software,” in Analysing Social Media Data and Web Networks, M. C. Rachel Gibson and S. Ward, Eds. Palgrave Macmillan, 2014 (in press)


H&M Facebook Data Set

H&M: Swedish fast-fashion retail clothing company

Period: 2009/01/01 to 2013/12/31

Total entries: 12.60 million

9.95 million likes

112,000 posts

300,000 comments

Albums + comments & likes on albums: 2.26 million

Share of entries (pie chart): likes 79%, albums+ 18%, comments 2%, posts 1%
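The pie-chart percentages follow from the counts on this slide. The short check below uses only the reported figures; the dictionary layout and the function name are illustrative assumptions. The four counts add up to slightly more than the stated total (about 12.62 million), which is presumably due to rounding on the slide.

```python
TOTAL_ENTRIES = 12_600_000  # total entries as reported on the slide

COUNTS = {
    "likes": 9_950_000,
    "albums+": 2_260_000,  # albums plus comments and likes on albums
    "comments": 300_000,
    "posts": 112_000,
}


def shares(counts, total):
    """Return each category's share of the total, rounded to whole percent."""
    return {name: round(100 * n / total) for name, n in counts.items()}


print(shares(COUNTS, TOTAL_ENTRIES))
# {'likes': 79, 'albums+': 18, 'comments': 2, 'posts': 1}
```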


Sentiment Analysis

Artifact (text) can be analysed by machine learning tools such as Google Prediction API¹

default sentiment label: positive (+) / neutral (0) / negative (-)

a sentiment score such as {(+): 82, (0): 15, (-): 03}

Only posts and comments can have sentiments

likes and shares carry forward their parent artifact’s sentiment

[Figure: example thread on the H&M wall. H&M (u0) posts "Featuring new summer collection cloths …." (r1); users u1 to u8 interact through like, comment and share actions. Comments shown: "Yes, I love summer" (r2), "I love H &M" (r3), "Finally my high heel sandals arrived, and they broke!!! :-(" (r6). Sentiment scores shown in the figure: {(+): 58, (0): 35, (-): 07}, {(+): 35, (0): 55, (-): 10}, {(+): 82, (0): 15, (-): 03}, {(+): 12, (0): 21, (-): 67}.]
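The labelling rule on this slide can be sketched as follows. The `Item` class, its field names and the choice of the highest-scoring class as the label are assumptions for illustration; the slides do not say how the label is derived from the score, and this sketch does not call the Google Prediction API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Item:
    """A wall entry; `kind` is one of post, comment, like, share."""
    item_id: str
    kind: str
    scores: Optional[Dict[str, int]] = None  # only posts and comments have scores
    parent: Optional["Item"] = None          # the artifact a like or share refers to


def sentiment_label(item: Item) -> Optional[str]:
    """Return "+", "0" or "-" for an item, or None if it cannot be determined."""
    if item.kind in ("post", "comment"):
        if not item.scores:
            return None
        # Assumption: the label is the class with the highest score.
        return max(item.scores, key=item.scores.get)
    # Likes and shares carry forward the sentiment of their parent artifact.
    if item.parent is None:
        return None
    return sentiment_label(item.parent)


post = Item("r1", "post", scores={"+": 82, "0": 15, "-": 3})
like = Item("l1", "like", parent=post)
print(sentiment_label(post), sentiment_label(like))
```

With the example score {(+): 82, (0): 15, (-): 03}, both the post and the like on it are labelled positive.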


Artifact Sentiment Distribution

[Figure: temporal distribution of artifact sentiments (quarterly). Horizontal axis: quarters 2009-01 to 2013-04; vertical axis: 0 to 1,400,000; series: Positive, Negative, Neutral.]

[Figure: Venn diagram of post artifacts by sentiment. Positive (+): 33% (36,784); Neutral (0): 31% (33,910); Negative (-): 18% (20,521); (+) ∩ (0): 6%; (-) ∩ (0): 4%; (+) ∩ (-): 2%; (+) ∩ (-) ∩ (0): 6%.]

Distribution of post artifacts based on the


Actor Sentiments - Actor Profiling

Actors don't carry any direct sentiment

Actors perform Actions on Artifacts

Actor sentiment can be derived from their actions on artifacts

Total: 3.8 million unique users

[Figure: Venn diagram of actors by sentiment. Positive (+): 17.9% (667,307); Neutral (0): 38.9% (1,451,635); Negative (-): 10.3% (384,906); (+) ∩ (0): 14.6%; (-) ∩ (0): 3.9%; (+) ∩ (-): 3.3%; (+) ∩ (-) ∩ (0): 11.1%.]

Distribution of actors’ sentiments, derived from their actions on artifacts and those artifacts’ sentiments

[Figure: quarterly chart. Horizontal axis: quarters 2009-01 to 2013-04; vertical axis: 0 to 800,000.]
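A minimal sketch of how the Venn regions above could be computed with plain set operations. It assumes that each observation pairs an actor with the sentiment label of an artifact they acted on, and that the figure's regions are mutually exclusive (the percentages sum to 100%). The function names and the ASCII region keys are hypothetical.

```python
def sentiment_regions(observations):
    """Partition actors into the seven regions of a three-set Venn diagram.

    `observations` is an iterable of (actor, label) pairs, where label is
    "+", "0" or "-" and comes from the artifact the actor acted on.
    Region keys: "+" means positive only, "+0" means (+) ∩ (0) only, etc.
    """
    pos, neu, neg = set(), set(), set()
    by_label = {"+": pos, "0": neu, "-": neg}
    for actor, label in observations:
        by_label[label].add(actor)

    return {
        "+": pos - neu - neg,
        "0": neu - pos - neg,
        "-": neg - pos - neu,
        "+0": (pos & neu) - neg,
        "-0": (neg & neu) - pos,
        "+-": (pos & neg) - neu,
        "+0-": pos & neu & neg,
    }


def region_shares(regions):
    """Percentage of all actors that falls into each region."""
    total = sum(len(members) for members in regions.values())
    if total == 0:
        return {name: 0.0 for name in regions}
    return {name: 100.0 * len(members) / total for name, members in regions.items()}
```

Because the seven regions are disjoint and cover every observed actor, their shares always add up to 100%.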


Actor Sentiments - Actor Profiling - II

Temporal distribution of -ve/+ve actor sentiment ratio (weekly basis)

[Figure: weekly ratio of -ve/+ve actor sentiment, weeks 2011-01 to 2013-52; vertical axis: 0% to 5000%; the average -ve/+ve ratio (154%) is shown for reference.]

Some of the peaks correspond to real-world events such as factory collapses in Bangladesh

Rana Plaza incident (weeks 2013-17 to 2013-20), where 1129 people died
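The weekly ratio and a simple peak check could be computed along these lines. The input layout, the `factor` threshold and the sample counts are assumptions for illustration; the slides do not state how peaks were identified, and the numbers below are not from the H&M data set.

```python
def weekly_ratio(counts):
    """Map week -> negative/positive actor ratio in percent.

    `counts` maps a week label to (negative_actors, positive_actors).
    Weeks without positive actors are skipped to avoid division by zero.
    """
    return {
        week: 100.0 * neg / pos
        for week, (neg, pos) in counts.items()
        if pos > 0
    }


def peaks(ratios, factor=2.0):
    """Weeks whose ratio exceeds `factor` times the mean weekly ratio."""
    if not ratios:
        return []
    mean = sum(ratios.values()) / len(ratios)
    return sorted(week for week, r in ratios.items() if r > factor * mean)


# Hypothetical counts, not taken from the H&M data set.
SAMPLE = {
    "2013-15": (120, 100),
    "2013-16": (150, 100),
    "2013-17": (900, 100),
    "2013-18": (130, 100),
}

print(peaks(weekly_ratio(SAMPLE)))
```

In this sample the mean ratio is 325%, so only week 2013-17 (900%) exceeds twice the mean.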


H&M Sales - Artifact Sentiments

Strong correlation between sales and +ve comments on non-H&M posts

Strong correlation between sales and -ve posts by non-H&M

Strong correlation between sales and -ve comments on non-H&M posts

Strong correlation between sales and neutral posts by non-H&M
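The slides do not say which correlation measure was used. As one possible reading, the sketch below computes the Pearson coefficient between a sales series and a sentiment count series; the function name and both sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical quarterly figures, not taken from the H&M data set.
sales = [10.0, 12.0, 15.0, 11.0]
positive_comments = [200, 260, 330, 230]

print(round(pearson(sales, positive_comments), 3))
```

A value close to +1 or -1 would indicate a strong linear relationship between the two series; it says nothing about causation.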


Future Work

The formal model can be abstracted further to model data from other social media channels such as Twitter

Use of fuzzy sets and fuzzy logic to develop advanced analysis techniques

Modeling of networks of groups and friends of users in an online social media platform

More case studies to study consumer behaviour in case of crisis events

http://www.itu.dk/~rao

1) Google Prediction API: developers.google.com/prediction/
