A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories.
GE-International Journal of Management Research (GE-IJMR) ISSN: (2321-1709)
GE-International Journal of Management Research
Vol. 4, Issue 7, July 2016 IF- 4.88 ISSN: (2321-1709)
© Associated Asia Research Foundation (AARF)
Website: www.aarf.asia Email : [email protected] , [email protected]
USING SOCIAL MEDIA DATA SET AS A KEY INPUT TO ECONOMIC
INDICATORS
Mr. Harish Kamath, B.Sc., MBA (Ph.D.)
Research Scholar
ISBR Research Centre
(Recognized by the University of Mysore)
No. 107, Electronics City – Phase I
Near Infosys, Behind BSNL Telephone Exchange
Bangalore 560100
Dr. Noor Firdoos Jahan, Ph.D
Professor
R. V. Institute of Management
CA 17, 26th Main, 36th Cross
4th T Block, Jayanagar
Bangalore 560041
ABSTRACT
Economic indicators and reports are published primarily by government agencies from
time to time. These statistics are vital for building the right strategy for economic growth.
Economic forecasting is hard without the right statistics on key economic indicators.
The accuracy of the data is key to decision making. It is always a challenge for
statisticians to get the right data in the right volume. A small volume of data can project
a different index than a large dataset would. Economic indicators will be more accurate
with the right volume of data and the right contextual data. Social media holds a huge
collection of real-time data. We can identify patterns in certain trends, and these can be good
pointers for building key economic indicators. This paper focuses on using the social media
dataset as one of the key inputs for deriving key economic indicators.
KEYWORDS-JEL: A13, D81, E51, G21, G32, C81, C82, E24, J60
1. Introduction
This paper focuses on understanding the current method of deriving one of the key
economic indicators, taken as a case study. It also suggests a conceptual model for
building the infrastructure to integrate with a larger dataset. The scope of this
paper is to examine the validity of using the social media dataset as a key input to improve
the accuracy of key economic indicators, and to build a conceptual model for one sample
economic indicator to show how this objective can be achieved. Listing all possible
key indicators is outside the scope of the paper.
2. Method
We will look at one key economic indicator, the unemployment rate, and
explain what the social media dataset is and how it can help. We take the research done on
the topic “Using Social Media to Measure Labor Market Flows” (Dolan, 2014) as a baseline
for this paper and extend it.
3. Social Media dataset
Social media is a term used to describe a group of internet-based applications.
Internet-based technologies enable communication channels dedicated to user-based input,
interaction, content sharing and collaboration. Social media includes popular networking
websites such as Facebook, Instagram, Twitter, Pinterest, Google+, LinkedIn and YouTube. The
usage of such applications depends on the user’s choice. The dataset will contain the conversation
behavior. The extrapolation of data into the subject of interest is critical to the accuracy of the
outcome. The volume, the real-time and contextual nature of the data, and the involvement of a
majority of the population make the dataset more relevant. According to Global Web Index (GWI),
people spend 28% of their online time on social media activities and about 13% on
microblogging. The following image summarizes the involvement of the population (Chaffey, 2016).
Image 1. User Base of Social Media. Source: SmartInsights.com
There is an apprehension that using this data can compromise privacy. The
dataset is utilized without the personal context. To illustrate, if person A tweets “I
lost job”, then the content of the tweet is used without knowing who posted it. Of course, the
location and time information is important, but it is detached from the
individual’s context.
4. Limitations
The social media dataset can be used to analyze various aspects of human
behavior, but the truthfulness of the data cannot be verified. That is the downside of the
dataset. There have also been several commercial uses of the social media dataset.
Together, these may add a certain percentage of error, which needs to be accommodated in the
model.
5. Economic Indexes
An economic indicator is any statistic related to macro or micro economics, such as
the unemployment rate or GDP, that indicates how well the economy is doing and how well
the economy is going to do in the future. This
information is critical for economists, businesses, and lawmakers to make decisions and
build a winning strategy. (Moffatt, 2016)
There are 3 dimensions of an economic indicator: economic index trend vs
economy trend, frequency of data, and timing.
Economic Index trend vs Economy trend. Primarily there are 2 types.
1. Pro-Cyclic: economic indexes which move in tandem with the economy. As an
example, if GDP is moving upwards, it indicates the economy is moving in the same direction.
2. Counter-Cyclic: economic indexes which move in the opposite direction to the
economy. The best example is the unemployment rate. If this index rises, the
economy is moving in the opposite direction.
Frequency of data. Economic indexes can be classified into three categories based on
the timescale.
Leading: an economic indicator published ahead of time, before the event
occurs. These are primarily predictions, for example expected GDP growth, stock
indexes etc. They are future predictions from a model based on certain historical data points.
Various parameters determine the projection. A change in one of the parameters can
skew the projection away from the actual value.
Lagged: an economic indicator based on historical data, usually derived from recent
history. The accuracy here is higher, as the events have already happened
and the model provides the key statistics. The unemployment rate is a lagged
indicator.
Coincident: an economic indicator that simply moves at the same time the
economy does. It is mostly real-time data. Gross Domestic Product is a coincident
indicator.
Time based. In most countries GDP figures are released quarterly and the unemployment
rate is released monthly. Some economic indicators, such as the SENSEX and Nifty, are made
available in real time. Sometimes it brings a lot of value if a lagged economic indicator
can be made leading. As an
example, if the unemployment rate can be estimated ahead of time, State agencies can be
better prepared for unemployment insurance. Though this is not the focus,
it is a natural outcome of this model.
6. Case study of Unemployment rate
The context here is the U.S. Department of Labor's way of calculating the unemployment rate
(Bureau of Labor Statistics, 2015). About 60,000 eligible households are
considered for the sample dataset for this survey, which amounts to about 110,000 individuals.
The sample is selected in such a way that it represents the entire population of the US. About
2,000 geographically separate areas are used as sampling units.
Every month, 15,000 of the households in the sample are changed so that no household
is interviewed for more than 4 consecutive months. Census Bureau employees interview the
60,000 eligible sample households with job-related questions, and the answers feed in as a
dataset for deriving the unemployment rate using a defined model.
This is a lagged method, which means the data is historical rather than real time, because of
the time it takes to process the entire dataset.
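The "defined model" is not spelled out here. As a minimal sketch, assuming the standard headline definition (unemployed persons as a percentage of the labor force, where the labor force is employed plus unemployed persons), the calculation from weighted survey responses could look like the following. The record layout, the function name and the sample figures are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """One survey respondent (hypothetical record layout)."""
    status: str    # "employed", "unemployed" or "not_in_labor_force"
    weight: float  # number of people this respondent is assumed to represent


def unemployment_rate(respondents):
    """Return unemployed persons as a percentage of the labor force."""
    employed = sum(r.weight for r in respondents if r.status == "employed")
    unemployed = sum(r.weight for r in respondents if r.status == "unemployed")
    labor_force = employed + unemployed  # people outside the labor force are excluded
    if labor_force == 0:
        raise ValueError("labor force is empty")
    return 100.0 * unemployed / labor_force


# Made-up sample for illustration only, not survey data.
sample = [
    Respondent("employed", 1500.0),
    Respondent("employed", 1400.0),
    Respondent("unemployed", 100.0),
    Respondent("not_in_labor_force", 900.0),
]
rate = unemployment_rate(sample)
```

People who are neither working nor looking for work do not enter the denominator, which is why the last record does not affect the result.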
Now let’s take a look at how the social media dataset can add value. As the Twitter-based
dataset has already been explored in the research done by Antenucci et al., we will
look at using Facebook, RSS news feeds, microblogging and community blogs.
7. Model
Sentiment analysis is one of the best methods followed in the industry for
social media data. The conceptual model for extracting the index is explained in the
following steps.
Step 1. Collect the filtered data from all the related channels like Facebook, Twitter,
RSS feeds, microblogging sites etc. In the case of unemployment rate calculations,
keywords such as “lost job”, “Looking for new job”, “Jobless” and “Unemployment” could be
used for filtering the dataset. Once we have the
filtered dataset, the next step will be sentiment analysis. Please note that there is no
specific need for regional sampling; the filter will be only on the larger level i.e. the
Step 2. Sentiment analysis will focus on classifying each row of the dataset into
“positive sentiment”, “negative sentiment” and “neutral sentiment”. Positive sentiments
are those keywords and contexts which add to the count in the direction of the
analysis. In this case, any keyword that suggests or confirms joblessness will be a positive
sentiment. Filtered data may carry positive, negative or neutral sentiment. We need to
understand the context of the keywords and then perform the sentiment analysis. To
illustrate, suppose we have a dataset which contains 3 rows of data, filtered
on the keyword “job”.
1. “I lost my job and looking for new one quickly” Twitter message from person Y
2. “I got a new job. Feeling excited” Person X wrote in his Facebook
3. “It’s observed that job market will be stable” From News feeds
Among these, the first statement suggests a job loss, which is what we are
interested in counting. So it is termed a positive sentiment. The second one is a
negative count, which means we have to deduct a count from the unemployment count.
Hence it is a negative sentiment. The third statement has the word “job” in it but carries no
definite sentiment. It is neither positive nor negative; hence it is a neutral sentiment.
Step 3. Track this number separately, while the current method of deriving the
unemployment rate continues.
Step 4. The delta between the actual unemployment rate, the traditional model and the one with
the social media dataset will be compared constantly, and the delta is examined for any
relation using machine learning algorithms.
Step 5. Use the right mix of data from the traditional model and social media if
it is more accurate. Continue to use social media as a standalone index if it produces
better accuracy.
Step 6. Use a feedback mechanism to observe and correct the delta by introducing an error
correction index. The error correction index will be derived from the delta between the actual
data and the derived index. Please see the diagram below illustrating the same concept. The error
index will be conceptualized based on the weighted average model to ensure each source
multiple sources. This model is at a conceptual stage. This will be a continuous process to
improve the indexes.
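Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phrase lists and function names are hypothetical, and simple phrase matching stands in for a real sentiment analysis model, which would need to understand context far better. The sample rows are the three statements from Step 2 plus one unrelated post.

```python
FILTER_KEYWORD = "job"

# Hypothetical phrase lists; a real system would use a trained classifier.
JOB_LOSS_PHRASES = ("lost my job", "lost job", "jobless",
                    "looking for new job", "unemployed")
JOB_GAIN_PHRASES = ("got a new job", "got a job")


def filter_posts(posts, keyword=FILTER_KEYWORD):
    """Step 1: keep only the posts that mention the keyword."""
    return [p for p in posts if keyword in p.lower()]


def classify(post):
    """Step 2: label a post relative to the direction of the analysis."""
    text = post.lower()
    if any(phrase in text for phrase in JOB_LOSS_PHRASES):
        return "positive"   # adds to the unemployment count
    if any(phrase in text for phrase in JOB_GAIN_PHRASES):
        return "negative"   # deducts from the unemployment count
    return "neutral"        # mentions the keyword without a clear direction


def net_count(posts):
    """Positive labels minus negative labels over the filtered posts."""
    labels = [classify(p) for p in filter_posts(posts)]
    return labels.count("positive") - labels.count("negative")


posts = [
    "I lost my job and looking for new one quickly",
    "I got a new job. Feeling excited",
    "It's observed that job market will be stable",
    "Great weather in Bangalore today",
]
labels = [classify(p) for p in filter_posts(posts)]
```

The unrelated post is dropped by the filter, and the remaining three rows receive the positive, negative and neutral labels described in Step 2, so the net count for this tiny sample is zero.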
[Figure: block diagram with boxes labelled Social Media data, Sentiment Analysis, Weighted Scoring Model, Real-time System, and Feedback]
Image 2. Build feedback system
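The feedback idea in Steps 4 to 6 can be illustrated with a small sketch. The paper leaves the model at a conceptual stage, so everything here is an assumption: the class name, the fixed weight given to the social media estimate, and the rule that adds a fraction of each observed delta to a running error correction index. The rates are hypothetical percentages.

```python
class FeedbackIndex:
    """Blend two estimates and correct the blend with a running error term."""

    def __init__(self, social_weight=0.5, learning_rate=0.5):
        self.social_weight = social_weight  # share given to the social media estimate
        self.learning_rate = learning_rate  # fraction of each delta that is absorbed
        self.error_correction = 0.0         # running error correction index

    def estimate(self, traditional, social):
        """Weighted average of both sources plus the current correction."""
        blended = (self.social_weight * social
                   + (1.0 - self.social_weight) * traditional)
        return blended + self.error_correction

    def feedback(self, derived, actual):
        """Add a fraction of the delta (actual minus derived) to the correction."""
        delta = actual - derived
        self.error_correction += self.learning_rate * delta
        return delta


index = FeedbackIndex(social_weight=0.5, learning_rate=0.5)
first = index.estimate(traditional=5.0, social=6.0)   # hypothetical rates in percent
delta = index.feedback(derived=first, actual=5.0)     # actual rate becomes known later
second = index.estimate(traditional=5.0, social=6.0)  # same inputs, corrected output
```

With the same inputs, the second estimate lies closer to the actual value than the first, which is the behavior the feedback loop is meant to produce. A fuller model would also need to adjust the source weights themselves.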
8. Conclusion
Following the same step-by-step process, other economic indicators like the
Level of New Business Startups, Consumer Confidence, Consumer Satisfaction Index etc.
can also be derived using the social media dataset. It is important to understand that
this approach is better suited to cases where human behavior, reactions, or such patterns
directly contribute to the economic indexes.
References
1. Bureau of Labor Statistics. (2015, October 15). Labor Force Statistics from the Current
Population Survey. Retrieved from Bureau of Labor Statistics, available at:
http://www.bls.gov/cps/cps_htgm.htm
2. Chaffey, D. (2016, April 21). Global social media research summary 2016. Retrieved
from Smartinsights, available at:
http://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-social-media-research/
3. Dolan, A. (2014). Using Social Media to Measure Labor Market Flows.
4. Moffatt, M. (2016, May 12). Beginner's Guide to Economic Indicators. Retrieved from
Economics.about.com, available at:
http://economics.about.com/cs/businesscycles/a/economic_ind.htm
5. Wikipedia. (2016, May 12). Sentiment analysis. Retrieved from Wikipedia, available at: