Towards a Set Theoretical Approach to
Big Data Analytics
IEEE BigData 2014, Alaska, USA
July 01, 2014
Raghava Rao Mukkamala
Postdoc, Software and Systems Section
IT University of Copenhagen, Denmark
Joint work with
Ravi Vatrapu
Professor, Department of IT Management
Abid Hussain
PhD student, Department of IT Management
Raghava Rao Mukkamala, http://www.itu.dk/~rao
July 01, 2014
Road Map
• Part 1: Introduction and Motivation
• Part 2: Social Data Model
• Part 3: Social Data Analytics Example: H&M Company
Social Media as Business Platform
• Most used platforms: Facebook, Twitter, YouTube, LinkedIn
• Content published worldwide by millions of social media users
• Social data sets contain valuable information
• Can provide meaningful facts and actionable insights
Vatrapu, R. (2013). Understanding social business. In Emerging Dimensions of Technology Management (pp. 147-158). Springer India.
Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.
http://60secondmarketer.com/blog/
Facebook - Social Media Platform
• A Facebook wall offers (1)
  • a new, more informal, yet public means of communication for users
  • provision to enter into discussions and debates with other users
  • engagement that can range from interactions under normal conditions to activities during extraordinary events such as the Arab Spring
• Facebook gives you friends, while Twitter gives you followers!
• Structure and data availability (2)
  • Twitter: simple and public
  • Facebook: complicated and restricted by privacy settings
• Research-wise: mostly on Twitter, very little on Facebook
1) Robertson, S., Vatrapu, R., & Medina, R. (2010). Online Video “Friends” Social Networking: Overlapping Online Public Spheres in the 2008 U.S. Presidential Election. Journal of Information Technology & Politics, 7, 182-201.
Social Data Research Approaches
• Ethnographical approaches (1)
  • social meaning inferred from content
  • usually qualitative techniques based on triangulation methods
• Statistical approaches (1)
  • computationally supported statistical approaches (correlation, regression, etc.)
  • e.g. study of Twitter usage in natural disasters (2)
• Computational approaches
  • Computational Social Science: interdisciplinary approach, social scientists + computer scientists + mathematicians (3)
  • building models, methods and concepts for analysis of large volumes of data
1) Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.
2) Earle, P. (2010). Earthquake Twitter. Nature Geoscience, 3(4), 221–222.
Our Research Approach
• Formal methods to develop advanced data analysis techniques for social data
• Formal methods: techniques to model complex phenomena as mathematical entities
  • abstract, precise and complete
• Current techniques are limited to Social Network Analysis based on a graph-theoretical approach (Relational Sociology)
• Our approach: based on set theory and fuzzy logic (Associational Sociology)
Research Methodology
• Using an integrated modeling approach
• Conceptual model of social data
• Formal model based on set theory
• Social Data Analytics Tool
Social Data Conceptual Model
• Social Graph Analysis: structure of relationships emerging from social media use
  • Which actors are involved?
  • What actions do they perform?
  • What activities do they undertake?
  • What artifacts do they create and interact with?
A. Hussain, R. Vatrapu, D. Hardt, and Z. Jaffari, “Social data analytics tool: A demonstrative case study of methodology and
software.” in Analysing Social Media Data and Web Networks, M. C. Rachel Gibson and S. Ward, Eds. Palgrave Macmillan, 2014 (in press)
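One way to make the four questions of Social Graph Analysis concrete is to record each interaction as an (actor, action, artifact) triple and answer the questions with set operations. The sketch below is only an illustration under that assumption: the `Action` record, its field names, and the sample wall activity are hypothetical and are not taken from the Social Data Analytics Tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One interaction on the wall (hypothetical record layout)."""
    actor: str                     # who acts, e.g. "u3"
    kind: str                      # "post", "comment", "like" or "share"
    artifact: str                  # artifact created or acted on, e.g. "r2"
    parent: Optional[str] = None   # artifact a comment replies to, if any


# Hypothetical wall activity, for illustration only.
actions = [
    Action("u0", "post", "r1"),
    Action("u3", "comment", "r2", parent="r1"),
    Action("u6", "like", "r1"),
]

# Which actors are involved?
actors = {a.actor for a in actions}

# What actions do they perform, and how often?
actions_per_kind = Counter(a.kind for a in actions)

# What artifacts do they create and interact with?
artifacts = {a.artifact for a in actions}

print(sorted(actors), dict(actions_per_kind), sorted(artifacts))
```

Keeping actors and artifacts as plain sets means later questions (for example, which actors touched which group of artifacts) reduce to unions and intersections.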
Social Data Conceptual Model
• Social Text Analysis: substantive nature of the interactions
  • How are the topics discussed?
  • Which keywords appear?
  • Which pronouns are used?
  • How far are positive/negative sentiments expressed?
A. Hussain, R. Vatrapu, D. Hardt, and Z. Jaffari, “Social data analytics tool: A demonstrative case study of methodology and
software.” in Analysing Social Media Data and Web Networks, M. C. Rachel Gibson and S. Ward, Eds. Palgrave Macmillan, 2014 (in press)
H&M Facebook Data Set
• H&M: Swedish fast-fashion clothing retail company
• Period: 2009/01/01 to 2013/12/31
• Total entries: 12.60 million
  • 9.95 million likes
  • 112,000 posts
  • 300,000 comments
  • Albums + comments and likes on albums: 2.26 million
Figure (pie chart): share of entries by type: Likes 79%, Albums+ 18%, Comments 2%, Posts 1%
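The percentage shares follow directly from the entry counts listed above. A minimal check, assuming the four counts are disjoint and together make up the total:

```python
# Entry counts in millions, as listed on the slide.
counts = {
    "Likes": 9.95,
    "Albums+": 2.26,    # albums plus comments and likes on albums
    "Comments": 0.30,
    "Posts": 0.112,
}

total = sum(counts.values())  # about 12.6 million entries

# Share of each category, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```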
Sentiment Analysis
• Artifacts (text) can be analysed by machine learning tools such as the Google Prediction API (1)
  • default sentiment label: positive (+) / neutral (0) / negative (-)
  • a sentiment score such as {(+): 82, (0): 15, (-): 03}
• Only posts and comments can have sentiments
  • likes and shares carry forward their parent artifact’s sentiment
Figure: example H&M Facebook wall with sentiment scores
• H&M (u0) posts: “Featuring new summer collection cloths ….” (r1)
• Other text artifacts: “Yes, I love summer” (r2), “I love H &M” (r3), “Finally my high heel sandals arrived, and they broke!!! :-(” (r6)
• Actors u1 to u8 act on these artifacts through comment, like and share actions
• Sentiment scores shown: {(+): 58, (0): 35, (-): 07}, {(+): 35, (0): 55, (-): 10}, {(+): 82, (0): 15, (-): 03}, {(+): 12, (0): 21, (-): 67}
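The two rules on this slide (a label chosen from a score, and likes and shares inheriting their parent's sentiment) can be sketched as follows. This is an illustration only: picking the label with the highest score is an assumption about how the default label is chosen, the pairing of scores with r3 and r6 is assumed, the actor-to-artifact links are hypothetical, and no call to the Google Prediction API is shown.

```python
def sentiment_label(score):
    """Return the label with the highest score (assumed labelling rule)."""
    return max(score, key=score.get)


# Scored text artifacts. The pairing of scores with r3 and r6 is assumed.
scores = {
    "r3": {"+": 82, "0": 15, "-": 3},
    "r6": {"+": 12, "0": 21, "-": 67},
}

# Likes and shares have no text of their own, so they point at a parent
# artifact and inherit its label. These links are hypothetical.
actions = [
    {"actor": "u1", "kind": "like", "parent": "r3"},
    {"actor": "u8", "kind": "like", "parent": "r6"},
]

labels = {artifact: sentiment_label(s) for artifact, s in scores.items()}
inherited = [(a["actor"], a["kind"], labels[a["parent"]]) for a in actions]

print(labels)
print(inherited)
```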
Artifact Sentiment Distribution
Figure: Temporal distribution of artifact sentiments (quarterly, 2009-01 to 2013-04); series: Positive, Negative, Neutral
Figure: Distribution of post artifacts by sentiment
• Positive (+): 33% (36,784)
• Neutral (0): 31% (33,910)
• Negative (-): 18% (20,521)
• (+) ∩ (0): 6%
• (-) ∩ (0): 4%
• (+) ∩ (-): 2%
• (+) ∩ (-) ∩ (0): 6%
Actor Sentiments - Actor Profiling
• Actors don't carry any direct sentiment
• Actors perform Actions on Artifacts
• Actor sentiment can be derived from their actions on artifacts
• Total: 3.8 million unique users
Figure: Distribution of actors’ sentiments based on the actions they perform on artifacts
• Neutral (0): 38.9% (1,451,635)
• Positive (+): 17.9% (667,307)
• Negative (-): 10.3% (384,906)
• (+) ∩ (0): 14.6%
• (-) ∩ (0): 3.9%
• (+) ∩ (-): 3.3%
• (+) ∩ (-) ∩ (0): 11.1%
Figure: quarterly chart, 2009-01 to 2013-04 (x-axis: Quarters)
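Deriving actor sentiment from actions amounts to building one set of actors per sentiment and then splitting those sets into disjoint regions. The percentages on this slide sum to 100%, which suggests each region is exclusive (for example, "(+) ∩ (-)" excludes actors who also acted on neutral artifacts). The sketch below follows that reading; the action records are hypothetical.

```python
from collections import defaultdict

# (actor, sentiment of the artifact acted on); hypothetical records.
actions = [
    ("u1", "+"), ("u1", "+"),
    ("u2", "+"), ("u2", "-"),
    ("u3", "0"),
    ("u4", "+"), ("u4", "-"), ("u4", "0"),
]

# One set of actors per sentiment label.
actors_by_sentiment = defaultdict(set)
for actor, sentiment in actions:
    actors_by_sentiment[sentiment].add(actor)

pos = actors_by_sentiment["+"]
neg = actors_by_sentiment["-"]
neu = actors_by_sentiment["0"]

# Disjoint regions: each actor falls in exactly one of them.
only_positive = pos - neg - neu
only_neutral = neu - pos - neg
pos_and_neg_only = (pos & neg) - neu
all_three = pos & neg & neu

print(only_positive, only_neutral, pos_and_neg_only, all_three)
```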
Actor Sentiments - Actor Profiling - II
Figure: Temporal distribution of -ve/+ve actor sentiment ratio (weekly basis, 2011-01 to 2013-52; y-axis 0% to 5000%); series: ratio of -ve/+ve sentiment, and average -ve/+ve ratio (154%)
• Some of the peaks correspond to real-world events such as factory collapses in Bangladesh
  • Rana Plaza incident (weeks 2013-17 to 2013-20), where 1129 people died
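The weekly ratio can be read as the number of actors with negative sentiment divided by the number with positive sentiment, expressed as a percentage. A minimal sketch under that assumption, with hypothetical weekly counts and an illustrative peak rule (more than twice the average) that does not come from the slides:

```python
# week -> (actors with negative sentiment, actors with positive sentiment)
# Hypothetical counts, for illustration only.
weekly = {
    "w1": (120, 100),
    "w2": (900, 60),
    "w3": (450, 90),
}

# -ve/+ve ratio in percent, skipping weeks with no positive actors.
ratios = {
    week: 100 * neg / pos
    for week, (neg, pos) in weekly.items()
    if pos > 0
}

average = sum(ratios.values()) / len(ratios)

# Illustrative peak rule: ratio above twice the average (an assumption).
peaks = [week for week, ratio in ratios.items() if ratio > 2 * average]

print(ratios, round(average, 1), peaks)
```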
H&M Sales - Artifact Sentiments
• Strong correlation between sales and +ve comments on non-H&M posts
• Strong correlation between sales and -ve posts by non-H&M
• Strong correlation between sales and -ve comments on non-H&M posts
• Strong correlation between sales and neutral posts by non-H&M
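The slides do not state which correlation measure was used. Assuming a Pearson coefficient between a sales series and a sentiment count series over the same periods, the computation could look like the sketch below; both series are made-up placeholder values, not H&M figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder series over the same four periods (not real data).
sales = [10.0, 12.0, 14.0, 16.0]
positive_comments = [100, 120, 140, 160]

r = pearson(sales, positive_comments)
print(f"r = {r:.2f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship; it says nothing about which series drives the other.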
Future Work
• The formal model can be abstracted further to model data from other social media channels such as Twitter
• Use of fuzzy sets and fuzzy logic to develop advanced analysis techniques
• Modeling of networks of groups and friends of users in an online social media platform
• More case studies to study consumer behaviour in case of crisis events