Three Essays on Big Data Consumer Analytics in E-Commerce


University of Pennsylvania

ScholarlyCommons

Publicly Accessible Penn Dissertations

1-1-2015


Dokyun Lee

University of Pennsylvania


Part of the Advertising and Promotion Management Commons, Databases and Information Systems Commons, and the Marketing Commons

This paper is posted at ScholarlyCommons. http://repository.upenn.edu/edissertations/1830

Recommended Citation

Lee, Dokyun, "Three Essays on Big Data Consumer Analytics in E-Commerce" (2015). Publicly Accessible Penn Dissertations. 1830.


Three Essays on Big Data Consumer Analytics in E-Commerce

Abstract

Consumers are increasingly spending more time and money online. Business-to-consumer e-commerce is growing at an average of 20 percent each year and reached 1.5 trillion dollars globally in 2014. Given the scale and growth of consumer online purchase and usage data, firms' ability to understand and utilize this data is becoming an essential competitive strategy. However, large-scale data analytics in e-commerce is still in its nascent stage and there is much to be learned in all aspects of e-commerce. Successful analytics on big data often requires a combination of both data mining and econometrics: data mining to reduce or structure large-scale data (from unstructured data such as text, photo, and video), and econometric analyses to truly understand and assign causality to interesting patterns. In my dissertation, I study how firms can better utilize big data analytics and specific applications of machine learning techniques for improved e-commerce using theory-driven econometric and experimental studies. I show that e-commerce managers can now formulate data-driven strategies for many aspects of business, from cross-selling via recommenders on sales sites to increasing brand awareness and leads via content-engineered social media marketing. These results are readily actionable with far-reaching economic consequences.

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Operations & Information Management

First Advisor


Keywords

big data, data mining, e-commerce, managerial implications, recommender system, social media marketing

Subject Categories


THREE ESSAYS ON BIG DATA CONSUMER ANALYTICS IN E-COMMERCE

Dokyun Lee

A DISSERTATION

in

Operations and Information Management

For the Graduate Group in Managerial Science and Applied Economics

Presented to the Faculties of the University of Pennsylvania

in

Partial Fulfillment of the Requirements for the

Degree of Doctor of Philosophy

2015

Supervisor of Dissertation

Kartik Hosanagar, Professor of OPIM

Graduate Group Chairperson

Eric Bradlow, Professor of Marketing, Statistics, and Education

Dissertation Committee

Kartik Hosanagar, Professor of Operations and Information Management


2015

Dokyun Lee

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License

To view a copy of this license, visit


Dedicated to my parents, Hongme and Sangwook,


ACKNOWLEDGEMENT

The five years I spent here have been the best time of my life thus far. I am grateful for the best advisors and friendly colleagues who have helped and stimulated me throughout.

I would like to first thank my advisor, Kartik Hosanagar, who goes above and beyond for his students with style and grace. I would like to thank Lorin Hitt and Harikesh Nair for being on my committee and for giving me valuable advice and feedback on both research and career. I am blessed to have world-leading researchers and educators as my mentors, who excel in advising and guiding as much as they do in researching and teaching. I learned many things beyond research from them and I hope to learn even more in the decades to come. I am grateful for insightful discussions and general support from Vibhanshu Abhishek, Christophe Van den Bulte, Raghuram Iyengar, David Bell, Jonah Berger, Dylan Small, Dean Foster, Paul Shaman, Lynn Wu, Xuanming Xu, Noah Gans, Eric Clemons, Jing Peng, Fujie Jin, Eric Bradlow, Karl Ulrich, Jeff Cai, and Arun Gopalakrishnan.

I would like to thank the Baker Retailing Center, the William and Phyllis Mack Institute for Innovation Management, the Fishman-Davidson Center for Service and Operations Management, and the Wharton Risk Management and Decision Processes Center for their generous and instrumental financial support. I am grateful to Sargent Shriver, Andrea Nurse, Kim Watford, Patricia James, and Tara Mullins for all the administrative help and to Stan Liu and Jamie Walter for IT support.

Lastly, I would like to thank my parents for providing me with the best opportunities

possible for following any goals and dreams I have. They have been my role models all my

life and I am forever grateful for their steadfast support and unconditional sacrifice. Finally,


ABSTRACT

Dokyun Lee

Kartik Hosanagar

Consumers are increasingly spending more time and money online. Business-to-consumer e-commerce is growing at an average of 20 percent each year and reached 1.5 trillion dollars globally in 2014. Given the scale and growth of consumer online purchase and usage data, firms’ ability to understand and utilize this data is becoming an essential competitive strategy. However, large-scale data analytics in e-commerce is still in its nascent stage and there is much to be learned in all aspects of e-commerce. Successful analytics on big data often requires a combination of both data mining and econometrics: data mining to reduce or structure large-scale data (from unstructured data such as text, photo, and video), and econometric analyses to truly understand and assign causality to interesting patterns. In my dissertation, I study how firms can better utilize big data analytics and specific applications of machine learning techniques for improved e-commerce using theory-driven econometric and experimental studies. I show that e-commerce managers can now formulate data-driven strategies for many aspects of business, from cross-selling via recommenders on sales sites to increasing brand awareness and leads via content-engineered social media marketing.


TABLE OF CONTENTS

ACKNOWLEDGEMENT . . . iv

ABSTRACT . . . v

LIST OF TABLES . . . ix

LIST OF ILLUSTRATIONS . . . xi

CHAPTER 1 : Introduction . . . 1

CHAPTER 2 : The Effect of Social Media Marketing Content on Consumer Engagement: Evidence from Facebook . . . 5

2.1 Introduction . . . 5

2.2 Data . . . 12

2.3 Empirical Strategy . . . 29

2.4 Results . . . 37

2.5 Discussion and Managerial Implications . . . 50

2.6 Conclusions . . . 54

CHAPTER 3 : People Who Liked This Study Also Liked: The Impact of Recommender Systems on Sales Volume and Diversity . . . 56

3.2 Prior Work . . . 59

3.3 Problem Formulation . . . 64

3.4 Data . . . 69

3.5 Results . . . 72


CHAPTER 4 : When do Recommender Systems Work the Best? The Moderating Effects of Product Attributes and Consumer Reviews on Recommender Performance . . . 88

4.2 Data . . . 91

4.3 Product Attributes & Hypotheses . . . 95

4.4 Model & Results . . . 106

4.5 Conclusion and Discussion . . . 111

APPENDIX . . . 123


LIST OF TABLES

TABLE 1 : Variable Descriptions and Summary for Content-coded Data . . . . 18

TABLE 2 : Examples of Messages and Their Content Tags . . . 19

TABLE 3 : Performance of Text Mining Algorithm on 5000 Messages Using 10-fold Cross-validation . . . 28

TABLE 4 : User-level Setup Notation. . . 31

TABLE 5 : EdgeRank Model Estimates . . . 39

TABLE 6 : Persuasive vs Informative . . . 41

TABLE 7 : Aggregate Logistic Regression Results For Comments and Likes . . 45

TABLE 8 : Predicted versus Actual Engagement Ranking for Three Illustrative messages . . . 53

TABLE 9 : Literature on Impact of Recommender Systems and Claims . . . . 63

TABLE 10 : Movie Genres Viewed and Purchased . . . 71

TABLE 11 : Data Summary Statistics . . . 73

TABLE 12 : Individual Item Views Comparison . . . 76

TABLE 13 : Individual Item Purchases Comparison . . . 76

TABLE 14 : Individual Wallet-Size Comparison . . . 77

TABLE 15 : Aggregate View Diversity . . . 78

TABLE 16 : Aggregate Sales Diversity . . . 78

TABLE 17 : Individual View Diversity . . . 79

TABLE 18 : Individual Purchase Diversity . . . 80

TABLE 19 : Hypotheses Tested . . . 82

TABLE 20 : Permutation Test Results for Co-purchase Network Comparisons . . . 83

TABLE 21 : Hypotheses under Robustness Checks . . . 86


TABLE 23 : Variable Descriptions and Summary for Content-coded Data. . . . 94

TABLE 24 : Utilitarian VS. Hedonic Product Cluster Means . . . 99

TABLE 25 : Search VS. Experience Product Cluster Means . . . 103

TABLE 26 : Logistic Regression Results Table . . . 107

TABLE 27 : Multiple Specifications for Review Related Variables . . . 110

TABLE 28 : Hypotheses and Results . . . 110

TABLE 29 : Other Takeaways . . . 111

TABLE 30 : A Few Examples of Message Attributes Used in Natural Language Processing Algorithm . . . 120

TABLE 31 : Performance of Text Mining Algorithm on 5000 Messages Using 10-fold Cross-validation . . . 121


LIST OF ILLUSTRATIONS

FIGURE 1 : (Left) Example of a firm’s Facebook Page (Walmart). (Right) Example of a firm’s message and subsequent user engagement with that message (Tennis Warehouse). Example is not necessarily from our data. . . . 13

FIGURE 2 : Co-occurrence of Attribute Characteristics Across messages . . . . 20

FIGURE 3 : Bubble Chart of Broader Industry Category vs Message Content . . . 21

FIGURE 4 : Box Plots of Log(engagement+1) vs Time since message Release . . . 22

FIGURE 5 : Average Likes and Comments by Message Type . . . 23

FIGURE 6 : Average Likes and Comments by Message Type by Industry . . . 23

FIGURE 7 : Average Likes and Comments by Message Content . . . 24

FIGURE 8 : Cronbach’s Alphas for 5,000 Messages . . . 25

FIGURE 9 : Impression-Engagement Funnel . . . 31

FIGURE 10 : Page-level Fixed effect Estimates from Generalized Additive Model Across 14 Demographic Bins . . . 40

FIGURE 11 : Time Since message Release (τ) Coefficients Box plot Across Demographics . . . 40

FIGURE 12 : Message Characteristic Coefficients for Comments and Likes . . . 46

FIGURE 13 : Logistic Regression by Industry . . . 47

FIGURE 14 : Proportion of Content Posted Split into Hour-bin . . . 49

FIGURE 15 : Message Characteristic Coefficients for Shares and Click-throughs . . . 52

FIGURE 16 : Lorenz Curve . . . 65

FIGURE 17 : Recommender Example . . . 70

FIGURE 18 : Average Individual Statistics . . . 74


FIGURE 20 : Co-Purchase Network Graphs of Genre Purchases under Control and Purchase-Based Collaborative Filtering . . . 83

FIGURE 21 : Genre Purchase Share Comparison on Purchase-based CF vs. Control . . . 84

FIGURE 22 : Recommendation Panel . . . 93

FIGURE 23 : Survey Form Used in Amazon Mechanical Turk . . . 113


CHAPTER 1 : Introduction

Consumers are increasingly spending more time and money online. A 2013 study (eMarketer, 2013b) reports that, for the first time, the average adult in the US will spend more time online than watching TV, at just above five hours per day. The spread of mobile devices like smartphones and tablet PCs is also fueling this dramatic increase in consumer online activity. Consequently, e-commerce is growing faster than ever. Business-to-consumer e-commerce is growing at an average of 20% each year and reached $1.5 trillion globally in 2014 (eMarketer, 2014).

This growth in online activity has given rise to a new phenomenon called “big data”. “Big data” is a catch-phrase used to describe massive data recorded online (e.g., e-commerce, search engine) and offline by a myriad of sensors (e.g., surveillance, traffic monitor). The term is used to describe four different challenges arising from exploding data: 1) Volume, which refers to the sheer amount of data recorded¹; 2) Variety, arising from unstructured data like text and photos that come from numerous sources such as social media sites; 3) Velocity, which refers to the speed at which data gets recorded; and 4) Veracity, which refers to uncertainty and missing data. The “big data” phenomenon has created problems and challenges for virtually everyone, including marketers, business managers, academics, and policy makers: How can big data be utilized for improved marketing, business management, and policy making?

Given the scale and growth of consumer online purchase and usage data, firms’ ability to understand and utilize this big data is becoming an essential competitive strategy. Several academic and industry reports (Kiron et al., 2011; Rogers and Sexton, 2012; Monetate, 2014a,b) show that while 63% of organizations see big data analytics as a competitive advantage, 80% of marketers say they don’t know how to translate data into action and 95% of data within organizations remains unused. Even more perplexing, one survey (Allen et al., 2005) shows that while 80% of CEOs believe they deliver superior customer experience, only 8% of customers agree. Big data analytics in e-commerce is still in its nascent stage and there is much to be learned in all aspects of e-commerce. Particularly lacking are the areas of social media marketing (specifically, content engineering for better engagement) and the impact of recommender systems, in which there are either few or no large-scale analyses or much disagreement on what strategies actually work.

Big data analytics is challenging because successful analytics on big data often requires a combination of both data mining and econometrics: data mining to reduce or structure large-scale data (from unstructured data such as text, photo, and video), and econometric analyses to truly understand and assign causality to interesting patterns. In my dissertation, I study how firms can better utilize big data analytics and specific applications of machine learning techniques for improved e-commerce using theory-driven econometric and experimental studies. Specifically, in the first essay, I investigate how firms can actively content-engineer their social media page postings (e.g., Facebook Pages and Twitter) to better engage connected consumers. In the second essay, I investigate how different recommender algorithms on e-commerce sites (e.g., Amazon.com’s “Consumers who purchased this also purchased”) influence sales volume and diversity. In the third essay, I plan to study how product attributes and reviews moderate the performance of recommender systems. Based on completed results, big data analytics that combine data mining and econometric studies can provide readily actionable strategies to improve many aspects of e-commerce, with far-reaching economic consequences. A detailed description of the three essays is given below.

Essay 1 - The Effect of Social Media Marketing Content on Consumer Engagement: Evidence from Facebook

We investigate the effect of social media content on customer engagement using a large-scale field study on Facebook. We content-code more than 100,000 unique messages across 800 companies engaging with users on Facebook using a combination of Amazon Mechanical Turk and Natural Language Processing algorithms. We use this large-scale database of advertising attributes to test the effect of ad content on subsequent user engagement, defined as Likes and comments, with the messages. We develop methods to account for potential selection biases that arise from Facebook’s filtering algorithm, EdgeRank, which assigns posts non-randomly to users. We find that inclusion of persuasive content, like emotional and philanthropic content, increases engagement with a message. We find that informative content, like mentions of prices, availability and product features, reduces engagement when included in messages in isolation, but increases engagement when provided in combination with persuasive attributes. Persuasive content thus seems to be the key to effective engagement. Our results inform advertising design in social media, and the methodology we develop to content-code large-scale textual data provides a framework for future studies on unstructured natural language data such as advertising content or product reviews.

Essay 2 - “People Who Liked This Study Also Liked”: An Empirical Investigation of the Impact of Recommender Systems on Sales Volume and Diversity

We investigate the impact of collaborative filtering recommender algorithms (e.g., Amazon.com’s “Customers who bought this item also bought”), commonly used in e-commerce, on sales volume and diversity. We use data from a randomized field experiment on movie sales run by a top retailer in North America. For sales volume, we show that different algorithms have differential impacts. Purchase-based collaborative filtering (“Customers who bought this item also bought”) causes a 25% lift in views and a 35% lift in the number of items purchased over the control group (no recommender). In contrast, view-based collaborative filtering (“Customers who viewed this item also viewed”) shows only a 3% lift in views and a 9% lift in the number of items purchased, albeit not statistically significant. For sales diversity, we find that collaborative filtering algorithms cause individuals to discover and purchase a greater variety of products but push users to the same set of titles, leading to concentration bias at the aggregate level. We show that this differential impact on individual versus aggregate diversity is caused by users exploring into only a few ‘pathway’


few popular genres rather than uniformly aiding discovery in all genres. For managers, our

results inform personalization and recommender strategy in e-commerce. From an academic

standpoint, we provide the first empirical evidence from a randomized field experiment to

help reconcile opposing views on the impact of recommenders on sales diversity.
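The two quantities this essay summary relies on, a percentage lift over the control group and concentration of sales across titles, can be sketched in a few lines. This is a minimal illustration, not the essay's own estimation code: the function names are hypothetical, the input numbers are made up, and the Gini coefficient is used here only as one common concentration measure derived from a Lorenz curve, on the assumption that a measure of this kind is what the sales diversity analysis summarizes.

```python
from typing import Sequence


def relative_lift(treatment_mean: float, control_mean: float) -> float:
    """Relative lift of the treatment group over the control group."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return (treatment_mean - control_mean) / control_mean


def gini(sales: Sequence[float]) -> float:
    """Gini coefficient of non-negative sales counts (0 means perfectly even)."""
    xs = sorted(sales)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one item with positive sales")
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-user averages, chosen only so that the lift is 35%.
print(relative_lift(treatment_mean=1.35, control_mean=1.00))

# Hypothetical unit sales per title: even demand versus concentrated demand.
print(gini([10, 10, 10, 10]), gini([1, 1, 1, 37]))
```

A higher Gini value on the treated group's sales than on the control group's would indicate more concentrated aggregate sales, which is the direction of the concentration bias described above.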

Essay 3 - When do Recommender Systems Work the Best? The Moderating Effects of Product Attributes and Consumer Reviews on Recommender Performance

We investigate the moderating effect of product attributes and consumer reviews on the efficacy of a collaborative filtering recommender system on an e-commerce site. We run a randomized field experiment on a top North American retailer’s website with 184,375 users split into a recommender-treated group and a control group, with 37,215 unique products in the dataset. By augmenting the dataset with Amazon Mechanical Turk-tagged product attributes and consumer reviews from the website, we study their moderating influence on recommenders in generating conversion.

We first confirm that the use of recommenders increases the baseline conversion rate by 5.9%. We find that recommenders act as substitutes for high average review ratings and review volumes, with the effect of using recommenders increasing conversion by about as much as two additional average star ratings. Additionally, we find that the positive impact on conversion from recommenders is greater for hedonic products than for utilitarian products, while search-experience quality did not have any impact. Lastly, we find that the higher the price, the lower the positive impact of recommenders, while providing more product descriptions increased recommender effectiveness.
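A moderating effect of this kind is commonly expressed as an interaction term in a conversion model. The following is a minimal sketch, assuming a logistic specification; the symbols $R_i$, $X_j$, and $\beta$ are illustrative and are not taken from the essay's own notation.

```latex
\[
\Pr(\text{conversion}_{ij} = 1)
  = \Lambda\bigl(\beta_0 + \beta_1 R_i + \beta_2 X_j + \beta_3 (R_i \times X_j)\bigr),
\qquad
\Lambda(z) = \frac{1}{1 + e^{-z}}
\]
```

Here $R_i$ indicates that user $i$ is in the recommender-treated group and $X_j$ is an attribute of product $j$ (for example, average review rating, price, or a hedonic-product indicator). Under this sketch, a negative $\beta_3$ for review rating would correspond to recommenders substituting for high ratings, and a positive $\beta_3$ for a hedonic indicator would correspond to a larger recommender effect for hedonic products.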

For managers, we 1) identify the products with which to use recommenders and 2) show

how other product information sources on e-commerce sites interact with recommenders.

From an academic standpoint, we provide insight into the underlying mechanism behind


CHAPTER 2 : The Effect of Social Media Marketing Content on Consumer Engagement: Evidence from Facebook

2.1. Introduction

Social networks are increasingly taking up a greater share of consumers’ time spent online. As a result, social media (which includes advertising on social networks and/or marketing communication with social characteristics) is becoming a larger component of firms’ marketing budgets. Surveying 4,943 marketing decision makers at U.S. companies, the 2013 Chief Marketing Officer survey (www.cmosurvey.org) reports that expected spending on social media marketing will grow from 8.4% of firms’ total marketing budgets in 2013 to about 22% in the next five years. As firms increase their social media activity, the role of content engineering has become increasingly important. Content engineering seeks to develop content that better engages targeted users and drives the desired goals of the marketer from the campaigns they implement. This raises the question: what content works best? The most important body of academic work on this topic is the applied psychology and consumer behavior literature, which has discussed ways in which the content of marketing communication engages consumers and captures attention. However, most of this work has tested and refined theories about content primarily in laboratory settings. Surprisingly, relatively little has been explored systematically about the empirical consequences of advertising and promotional content in real-world field settings outside the laboratory. Despite its obvious relevance to practice, marketing and advertising content is also relatively underemphasized in economic theory. The canonical economic model of advertising as a signal (cf. Nelson (1970); Kihlstrom and Riordan (1984); Milgrom and Roberts (1986)) does not postulate any direct role for ad content because advertising intensity conveys all relevant information about product quality in equilibrium to market participants. Models of informative advertising (cf. Butters (1977); Grossman and Shapiro (1984)) allow for advertising to inform agents only about price and product existence; yet, casual observation and


advertisements contain much more information and content beyond prices. In this paper, we explore the role of content in driving consumer engagement in social media in a large-scale field setting. We document the kinds of content used by firms in practice. We show that a variety of emotional, philanthropic, and informative advertising content attributes affect engagement and that the role of content varies significantly across firms and industries. The richness of our engagement data and the ability to content-code social media messages in a cost-efficient manner enable us to study the problem at a larger scale than much of the previous literature on the topic.

Our analysis is of direct relevance to industry in better understanding and improving firms’ social media marketing strategies. Many industry surveys (Ascend2, 2013; Gerber, 2014; eMarketer, 2013a; SmartBrief, 2010; Ragan and Solutions, 2012) report that achieving engagement on large audience platforms like Facebook is among the most important social media marketing goals for consumer-facing firms.¹ Social media marketing agencies’ financial arrangements are increasingly contracted on the basis of the engagement these agencies promise to drive for their clients. In the early days of the industry, it was thought that engagement was primarily driven by the volume of users socially connected to the brand by increasing the reach of posts released by the firms. Accordingly, firms aggressively acquired fans and followers on platforms like Facebook by investing heavily in ads on the network. However, early audits of the data (e.g., Creamer 2012) suggested that only about 1% of an average firm’s Facebook fans show any engagement with the brand by Liking, sharing, or commenting on messages by the brand on the platform. As a result, industry attention shifted from acquisition of social media followers per se to the design of content that achieves both better reach and engagement amongst social media followers, especially since the design of websites like Facebook also uses current engagement level to determine firms’ future reach. In a widely reported example that reflects this trend (WSJ, 2012), General Motors curtailed its annual spending of $10M on Facebook’s paid ads (a vehicle for acquiring new fans for the brand), choosing instead to focus on creating content for its branded Facebook Page, on which it spent $30M. While attention in industry has shifted towards content in this manner, industry still struggles with understanding what kinds of content work better for which firms and in what ways. For example, are messages seeking to inform consumers about product or price attributes more effective than persuasive messages with humor or emotion? Do messages explicitly soliciting user response (e.g., “Like this post if . . . ”) draw more engagement or in fact turn users away? Does the same strategy apply across different industries? Our paper systematically explores these kinds of questions and contributes to the formulation of better content engineering policies in practice.²

Our empirical investigation is implemented on Facebook, which is the largest social media platform in the world. As alluded to above, many top brands now maintain a Facebook page from which they serve posts and messages to connected users. This is a form of free social media marketing that has increasingly become a popular and important channel for marketing. Our data comprises information on about 100,000 such messages posted by a panel of about 800 firms over an 11-month period between September 2011 and July 2012. For each message, our data also contains time-series information on two kinds of engagement measures, Likes and comments, observed on Facebook. In addition, we have cross-sectional data on shares and click-throughs. We supplement these engagement data with message attribute information that we collect using a large-scale survey we implement on Amazon Mechanical Turk (henceforth “AMT”), combined with a Natural Language Processing algorithm (henceforth “NLP”) we build to tag messages. We incorporate new methods and procedures to improve the accuracy of content tagging on AMT and our NLP algorithm. As a result, our algorithm achieves high accuracy, recall, and precision under 10-fold cross-validation for almost all tagged content profiles.³ We believe the methods we develop will be useful in future studies analyzing other kinds of advertising content and product reviews.

² As of December 2013, industry-leading social media analytics firms such as Wildfire (now part of Google) do not offer detailed content engineering analytics connecting a wide variety of social media content with real engagement data. Rather, to the best of our knowledge, they provide simpler analytics such as optimizing the time-of-the-day or day-of-the-week to post and whether to include pictures or videos.
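The evaluation protocol named above, accuracy, precision, and recall under 10-fold cross-validation, can be sketched as follows. This is a minimal illustration under stated assumptions and not the tagging algorithm built in this chapter: a small naive Bayes classifier stands in for the NLP algorithm, the toy messages and the "price or deal" tag are hypothetical, and all function names are invented for the example.

```python
import math
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, int]  # (message text, 1 if the content tag applies else 0)


def train_nb(data: List[Example]):
    """Fit a two-class multinomial naive Bayes model from word counts."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in data:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text: str) -> int:
    """Return the label with the higher add-one-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def cross_validate(data: List[Example], k: int = 10) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) pooled over k held-out folds."""
    tp = fp = fn = tn = 0
    for fold in range(k):
        held_out = data[fold::k]  # every k-th example, starting at `fold`
        training = [ex for i, ex in enumerate(data) if i % k != fold]
        model = train_nb(training)
        for text, label in held_out:
            guess = predict_nb(model, text)
            if guess == 1 and label == 1:
                tp += 1
            elif guess == 1 and label == 0:
                fp += 1
            elif guess == 0 and label == 1:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / len(data)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy corpus: label 1 marks messages that mention a price or deal.
ITEMS = "shoes hats bags coats socks belts scarves gloves jeans shirts".split()
TOY_DATA: List[Example] = []
for item in ITEMS:
    TOY_DATA.append((f"sale price on {item} today", 1))
    TOY_DATA.append((f"we love our fans and {item}", 0))

accuracy, precision, recall = cross_validate(TOY_DATA, k=10)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Each message is scored only by a model that never saw it during training, which is what makes the pooled metrics an out-of-sample estimate. The toy corpus is deliberately easy to separate; real message text would give less clean numbers.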

Our data has several advantages that facilitate a detailed study of content. First, Facebook messages have rich content attributes (unlike, say, Twitter tweets, which are restricted to 140 characters) and rich data on user engagement. Second, Facebook requires real names and, therefore, data on user activity on Facebook is often more reliable compared to other social media sites. Third, engagement is measured on a daily basis (panel data) by actual message-level engagement such as Likes and comments that are precisely tracked within a closed system. These aspects make Facebook an almost ideal setting to study the role of content for this type of marketing communication.

Our strategy for coding content is motivated by the psychology, marketing and economic literatures on advertising (see Cialdini (2001); Bagwell (2007); Berger (2012); Chandy et al. (2001); Vakratsas and Ambler (1999) for some representative overviews). In the economics literature, it is common to classify advertising as informative (shifting beliefs about product existence or prices) or persuasive (shifting preferences directly). The basis of informative content is limited to prices and/or existence, and persuasive content is usually treated as a “catch-all” without finer classification. Rather than this coarse distinction, our classification follows the seminal classification work of Resnik and Stern (1977), who operationalize informative advertising based on the number of characteristics of informational cues (see Abernethy and Franke, 1996 for an overview of studies in this stream). Some criteria for classifying content as informative include details about products, promotions, availability, price, and product-related aspects that could be used in optimizing the purchase decision. Following this stream, any product-oriented facts, and brand and product mentions, are categorized as informative content. Following suggestions in the persuasion literature (Cialdini, 2001; Nan and Faber, 2004; Armstrong, 2010; Berger, 2012), we classify as “persuasive” any content that broadly seeks to influence by appealing to ethos, pathos, and logos strategies. For instance, the use of a celebrity to endorse a product or attempts to gain trust or goodwill (e.g., via small talk, banter) can be construed as the use of ethos (appeals through credibility or character) and a form of persuasive advertising. Messages with philanthropic content that induce empathy can be thought of as an attempt at persuasion via pathos (an appeal to a person’s emotions). Lastly, messages with unusual or remarkable facts that influence consumers to adopt a product or capture their attention can be categorized as persuasion via logos (an appeal through logic). We categorize content that attempts to persuade and promote relationship building in this manner as persuasive content. Though we believe we consider a larger range of content attributes than the existing literature, it is practically impossible to detail the full range of possible content profiles produced on a domain as large as Facebook (or in a dataset as large as ours). We choose content profiles that reflect issues flagged in the existing academic literature and those that are widely used by companies on Facebook. We discuss this in more detail in Section 2.2.

Estimation of the effect of content on subsequent engagement is complicated by the non-random allocation of messages to users implemented by Facebook via its EdgeRank algorithm. EdgeRank tends to serve users messages that are newer and are expected to appeal better to their tastes. We account for the selection induced by EdgeRank by developing a semi-parametric correction for the filtering it induces. One caveat to the correction is that it is built on prior (but imperfect) knowledge of how EdgeRank is implemented. In the absence of additional experimental/exogenous variation, we are unable to address all possible issues with potential nonrandom assignment perfectly. We view our work as a large-scale and relatively exhaustive exploratory study of content variables in social media that could be the basis of further rigorous testing and causal assessment, albeit at a more limited scale. A fully randomized large-scale experiment that provides a cross-firm and cross-industry assessment like the one provided here may be impossible or cost-prohibitive to implement, and hence, we think a large-scale cross-industry study based on field data of this sort is valuable.


engagement significantly. Additionally, informative content tends to drive engagement

pos-itively only when combined with such content. Persuasive content thus seem to be the key

to effective content engineering in this setting. This finding is of substantive interest

be-cause most firms post messages with one content type or other, rather than in combination.

Our results suggest therefore that there may be substantial gains to content engineering

by combining characteristics. The empirical results also unpack the persuasive effect into

component attribute effects and also estimate the heterogeneity in these effects across firms

and industries, enabling fine-tuning of these strategies across firms and industries.

Our paper adds to a growing literature on social media. Studies have examined the

diffusion of user-generated content (Susarla et al., 2012) and its impact on firm performance

(Rui et al., 2013; Dellarocas, 2006). A few recent papers have also examined the social

media strategies of firms, focusing primarily on online blogs and forums. These include studies

of the impacts of negative blog messages by employees on blog readership (Aggarwal et al.,

2012), blog sentiment and quality on readership (Singh et al., 2014), social product features

on consumer willingness to pay (Oestreicher-Singer and Zalmanson, 2013), and the role of

active contributors on forum participation (Jabr et al., 2014). We add to this literature by

examining the impact of firms’ content strategies on user engagement.

An emerging theoretical literature in advertising has started to investigate the effects of

content. This includes new models that allow ad content to matter in equilibrium by

augmenting the canonical signaling model in a variety of ways (e.g., Anand and Shachar

(2009) by allowing ads to be noisy and targeted; Anderson and Renault (2006) by allowing

ad content to resolve consumers’ uncertainty about their match-value with a product; and

Mayzlin and Shin (2011) and Gardete (2013) by allowing ad content to induce consumers

to search for more information about a product). Our paper is most closely related to a

small empirical literature that has investigated the effects of ad content in field settings.

These include Bertrand et al. (2010) (effect of direct-mail ad content on loan demand);


and online sales); Tucker (2012b) (effect of ad persuasion on YouTube video sharing) and

Tucker (2012a) (effect of “social” Facebook ads on philanthropic participation). Also related

are recent studies exploring the effect of content more generally (and not specifically ad

content) including Berger and Milkman (2012) (effect of emotional content in New York

Times articles on article sharing) and Gentzkow and Shapiro (2010) (effect of newspaper’s

political content on readership). Relative to these literatures, our study makes two main

contributions. First, from a managerial standpoint, we show that while persuasive ad

content — especially emotional and philanthropic content — positively impacts consumer

engagement in social media, informative content has a negative effect unless it is combined

with persuasive content attributes. This can help drive content engineering policies in firms.

We also show how the effects differ by industry type. Second, none of the prior studies on

ad content have been conducted at the scale of this study, which spans a large number of

industries. The rigorous content-tagging methodology we develop, which combines surveys

implemented on AMT with NLP-based algorithms, provides a framework to conduct

large-scale studies that analyze the content of marketing communication.

Finally, the reader should note we do not address the separate but important question of how

engagement affects product demand and firm’s profits so as to complete the link between

ad-attributes and those outcome measures. First, the data required for the analysis of this

question at a scale comparable to this study are still not widely available to researchers.

Second, as mentioned, firms and advertisers care about engagement per se and are willing to

invest in advertising for generating engagement, rather than caring only about sales. This

is consistent with our view that advertising is a dynamic problem and a dominant role of

advertising is to build long-term brand-capital for the firm. Even though the current period

effects of advertising on demand may be small, the long-run effect of advertising may be

large, generated by intermediary activities like increased consumer engagement, increased

awareness and inclusion in the consumer consideration set. Thus, studying the formation

and evolution of these intermediary activities — like engagement — is worthwhile in order


settings. We note other papers such as Kumar et al. (2013); Goh et al. (2013); Rishika et al.

(2013); Li and Wu (2013); Miller and Tucker (2013); Sunghun et al. (2014); Luo and Zhang

(2013); Luo et al. (2013) as well as industry reports (comScore, 2013;

Chadwick-Martin-Bailey, 2010; 90octane, 2012; HubSpot, 2013) have linked the social media engagement

measures we consider to customer acquisition, sales, and profitability metrics.

2.2. Data

Our dataset is derived from the “pages” feature offered by Facebook. The feature was

introduced on Facebook in November 2007. Facebook Pages enable companies to create

profile pages and to post status updates, advertise new promotions, ask questions and push

content directly to consumers. The left panel of Figure 1 shows an example of Walmart’s

Facebook Page, which is typical of the type of pages large companies host on the social

network. In what follows, we use the terms pages, brands, and firms interchangeably. Our

data comprises posts served from firms’ pages onto the Facebook profiles of the users that

are linked to the firm on the platform. To fix ideas, consider a typical message (see the

right panel of Figure 1): “Pretty cool seeing Andy giving Monfils some love. . . Check out

what the pros are wearing here: http://bit.ly/nyiPeW.” In this status update, a tennis

equipment retailer starts with small talk, shares details about celebrities (Andy Murray

and Gael Monfils) and ends with a link to a product page. Each such message is a unit of

analysis in our data.

2.2.1. Data Description

Raw Data and Selection Criteria

To collect the data, we partnered with an anonymous firm, henceforth referred to as

Company X, that provides analytics services to Facebook Page owners by leveraging data from

Facebook’s Insights. Insights is a tool provided by Facebook that allows page owners to

monitor the performance of their Facebook messages. Company X augments data from


Figure 1: (Left) Example of a firm’s Facebook Page (Walmart). (Right) Example of a firm’s message and subsequent user engagement with that message (Tennis Warehouse). Example is not necessarily from our data.

Facebook Insights across a large number of client firms with additional records of daily

message characteristics, to produce a raw dataset comprising a message-day-level panel of

messages posted by companies via their Facebook pages. The data also includes two

consumer engagement metrics: the number of Likes and comments for each message each day.

These metrics are commonly used in industry as measures of engagement. They are also

more granular than other metrics used in extant research such as the number of fans who

have Liked the page. Also available in the data is the number of impressions of each

message per day (i.e., the total number of users the message is exposed to; we have both the

unique user impressions and the total impressions). In addition, page-day level information

such as the aggregate demographics of users (fans) who Liked the page on Facebook or have

ever seen messages by the page is collected by Company X on a daily level. This comprises

the population of users a message from a firm can potentially be served to. We leverage this

information in the methodology we develop later for accounting for non-random assignment

of messages to users by Facebook. Once a firm serves a message, the message’s

126 days). The raw data contains about a million unique messages by about 2,600 unique

companies.

The reader should note that as of this writing, our data is the most complete observational

data available outside of Facebook — the data includes details such as demographics of page

fans and engaged fans, which are essential for correcting for EdgeRank but cannot be scraped

by outsiders; they are available only to page owners via Facebook’s Application

Programming Interface. Our data also includes daily snapshots of message-level engagement

that Facebook does not provide to page owners (page owners must take snapshots themselves

if they want this data). These daily snapshots generate the within-message variation that

enables the panel analysis in our paper. Finally, page-owners do not have access to data on

performance of any messages by other pages, unlike our dataset which spans a large number

of companies across sectors.

We clean the data to reflect the following criteria:

• Only pages located in the US, and,

• Only messages written in English, and,

• Only messages with complete demographics data.
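
The three selection criteria above amount to a simple conjunctive filter. The following is a minimal sketch of that filter, not the procedure actually used to build the dataset; the `Message` record and its field names (`page_country`, `language`, `fan_demographics`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical fields; the real data schema is not described in the text.
    page_country: str
    language: str
    fan_demographics: Optional[dict]  # None when demographics are missing


def keep_message(msg: Message) -> bool:
    """Return True only if the message meets all three selection criteria."""
    return (
        msg.page_country == "US"
        and msg.language == "en"
        and msg.fan_demographics is not None
    )


def clean(messages):
    """Keep only the messages that satisfy every criterion."""
    return [m for m in messages if keep_message(m)]
```

Because the criteria are joined by "and", a message failing any single one is dropped.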

After cleaning, the data span 106,316 unique messages posted by 782 companies (including

many large brands) between September 2011 and July 2012. This results in about 1.3 million

rows of message-level daily snapshots recording about 450 million page fans’ responses.

Removing periods after which no significant activity is observed for a message reduces this

to 665,916 rows of message-level snapshots (where activity is defined as either impressions,

Likes, or comments). The companies in our dataset are categorized into 6 broader industry

categories following Facebook’s page classification criteria: Celebrities & Public Figure (e.g.,

Roger Federer), Entertainment (e.g., Star Trek), Consumer Products & Brands (e.g., Tesla


Motors), Organizations & Company (e.g., WHO), Websites (e.g., TED), Local Places &

Businesses (e.g., MoMA).

Content-coded Data

We use a two-step method to label content. First, we contract with workers through AMT

and tag 5,000 messages for a variety of content profiles. Subsequently, we build an NLP

algorithm by combining several statistical classifiers and rule-based algorithms to extend

the content-coding to the full set of roughly 100,000 messages. This algorithm uses the 5,000

AMT-tagged messages as the training data-set. We describe these methods in more detail later

in the paper.

The content in Facebook messages can be categorized as informative, persuasive, or both.

Some messages inform consumers about deals and discounts about products, while other

messages seek to connect with consumers on a personal level to promote brand personality,

form relationships and are social in nature. We call the first type informative content,

and the second persuasive content. Some messages do both at the same time by including

casual banter and product information simultaneously (e.g., “Are you a tea person or a coffee

person? Get your favorite beverage from our website: http://www.specific-link-here.

com”).

Table 1 outlines the finer classification of the attributes we code up, including precise

definitions, summary statistics, and the source for coding the attribute. In Table 1, the 8 variables

BRANDMENTION, DEAL, PRICECOMPARE, PRICE, TARGET, PRODAVAIL,

PRODLOCATION, and PRODMENTION are informative. These variables enable us to assess

the effect of search attributes, brand, price, and product availability information on

engagement. The 8 variables REMFACT, EMOTION, EMOTICON, HOLIDAYMENTION,

HUMOR, PHILANTHROPIC, FRIENDLIKELY, and SMALLTALK are classified as

persuasive. These definitions include emotional content, humor, banter, and philanthropic


Informative content variables are identified using the seminal work by Resnik and Stern

(1977), which provides an operational guideline to identify informative content. Their work

provides fourteen evaluative content criteria to identify informative content that includes

content such as product price and availability. Our persuasive content variables are identified mostly

from existing consumer behavior research. For example, emotional and humorous content

have been identified as drivers of word-of-mouth and of viral marketing tactics (Porter and

Golan, 2006; Berger, 2012, 2011; Berger and Milkman, 2012). Philanthropic content has

been studied in the context of advertising effectiveness (Tucker, 2012a). Similarly, Berger

and Schwartz (2011) documented that the interestingness of content such as mentions of

remarkable facts is effective in generating word-of-mouth. For a survey of papers motivating

our choice of persuasive variables, see Berger (2012). While not fully exhaustive, we have

attempted to cover most variables that are 1) highlighted by prior academic research to be

relevant, 2) commonly discussed and used in the industry.

Besides these main variables of interest, controls and content-related patterns noted as

important in industry reports are profiled. We include these content categories to

investigate more formally considerations laid out in industry white papers, trade-press articles,

and blogs about the efficacy of message attributes in social media engagement. It includes

content that explicitly solicits readers to comment or includes blanks for users to fill out

(thus providing an explicit option to facilitate engagement). Additionally, characteristics

like whether the message contained photos, website links, and the types of the page-owner

(e.g., business organization versus celebrity) are also coded. Other message-specific

characteristics and controls include metrics such as message length in characters and SMOG

(“Simple Measure of Gobbledygook”), an automatically computed reading complexity index

that is used widely. Higher values of SMOG imply a message is harder to read. Table 2

shows sample messages taken from Walmart’s page in December 2012 and shows how we

would have tagged them. The reader should note that some elements of content tagging

and classification are necessarily subjective and based on human judgement. We discuss our


All things considered, we believe this is one of the most comprehensive attempts at tagging

marketing communication related content in the empirical literature.
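
To make the SMOG control concrete, the sketch below computes the index with the standard published formula, grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291. It is an illustration under stated assumptions rather than the code used for this study: the syllable counter is a rough vowel-group heuristic, and the function names `count_syllables` and `smog_index` are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_index(text: str) -> float:
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # A polysyllable is a word with three or more (estimated) syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    n_sentences = max(1, len(sentences))
    return 1.0430 * math.sqrt(polysyllables * 30 / n_sentences) + 3.1291
```

A message with no polysyllabic words scores the formula's floor of about 3.13, and the score rises with the density of long words per sentence.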

Data Descriptive Graphics

This section presents descriptive statistics of the main stylized patterns in the data. We

first report what kinds of content firms use (this may be useful, for instance, to a

researcher interested in studying the role of specific content profiles on Facebook who

would like to know which content variables firms use frequently).

Table 1 reports on the mean proportion of messages that have each content characteristic.

One can see that messages with videos, product or holiday mentions or emoticons are

relatively uncommon, while those with smalltalk and with information about where to

obtain the product (location/distribution attributes) are very common. Figure 2 reports

on the co-occurrence of the various attributes across messages. The patterns are intuitive.

For instance, emotional and philanthropic content co-occur often, as do emotional and

friend-like content, as well as content that describes product deals and availability. To

better describe the correlation matrix graphically and to cluster highly correlated variables

together, we ran cluster analysis to determine the optimal number of clusters using the

average silhouette width (Rousseeuw, 1987), which suggested that there are two clusters in

the data. Figure 2 shows via a solid line how content types are clustered across messages. We

see that persuasive content types and informative content types are split into two separate

clusters, suggesting that firms typically tend to use one or the other in their messages. Later

in the paper, we show evidence suggesting that this strategy may not be optimal. Figure 3

shows the percentage of messages featuring a content attribute split by industry category.

We represent the relative percentages in each cell by the size of the bubbles in the chart.

The largest bubble is SMALLTALK for the celebrities category (60.4%) while the smallest

is PRICECOMPARE for the celebrities category (0%). This means that 6 in 10 messages by

celebrity pages in the data have some sort of small talk (banter) and/or content that does


Variable | Description | Source | Mean | SD | Min | Max
TAU (τ) | Time since the post release (Day) | Facebook | 6.253 | 3.657 | 1 | 16
LIKES | Number of “Likes” post has obtained | Facebook | 48.373 | 1017 | 0 | 324543
COMMENTS | Number of “Comments” post has obtained | Facebook | 4.465 | 78.19 | 0 | 22522
IMPRESSIONS | Number of times message was shown to users (unique) | Facebook | 9969.2 | 129874 | 1 | 4.5×10^7
SMOG | SMOG readability index (higher means harder to read) | Computed | 7.362 | 2.991 | 3 | 25.5
MSGLEN | Message length in characters | Computed | 157.41 | 134.54 | 1 | 6510
HTTP | Message contains a link | Computed | 0.353 | 0.478 | 0 | 1
QUESTION | Message contains questions | Computed | 0.358 | 0.479 | 0 | 1
BLANK | Message contains blanks (e.g. “My favorite artist is ”) | Computed | 0.010 | 0.099 | 0 | 1
ASKLIKE | Explicit solicitation for “Likes” (e.g. “Like if . . . ”) | Computed | 0.006 | 0.080 | 0 | 1
ASKCOMMENT | Explicit solicitation for “Comments” | Computed | 0.001 | 0.029 | 0 | 1
Persuasive
REMFACT | Remarkable fact mentioned | AMT | 0.527 | 0.499 | 0 | 1
EMOTION | Any type of emotion present | AMT | 0.524 | 0.499 | 0 | 1
EMOTICON | Contains emoticon or net slang (approximately 1000 scraped from web emoticon dictionary e.g. :D, LOL) | Computed | 0.012 | 0.108 | 0 | 1
HOLIDAYMENTION | Mentions US Holidays | Computed | 0.006 | 0.076 | 0 | 1
HUMOR | Humor used | AMT | 0.375 | 0.484 | 0 | 1
PHILANTHROPIC | Philanthropic or activist message | AMT | 0.498 | 0.500 | 0 | 1
FRIENDLIKELY | Answer to question: “Are your friends on social media likely to post message such as the shown”? | AMT | 0.533 | 0.499 | 0 | 1
SMALLTALK | Contains small talk or banter (defined to be content other than about a product or company business) | AMT | 0.852 | 0.355 | 0 | 1
Informative
BRANDMENTION | Mentions a specific brand or organization name | AMT+Comp | 0.264 | 0.441 | 0 | 1
DEAL | Contains deals: any type of discounts and freebies | AMT | 0.620 | 0.485 | 0 | 1
PRICECOMPARE | Compares price or makes price match guarantee | AMT | 0.442 | 0.497 | 0 | 1
PRICE | Contains product price | AMT+Comp | 0.051 | 0.220 | 0 | 1
TARGET | Message is targeted towards an audience segment (e.g. demographics, certain qualifications such as “Moms”) | AMT | 0.530 | 0.499 | 0 | 1
PRODAVAIL | Contains information on product availability (e.g. stock and release dates) | AMT | 0.557 | 0.497 | 0 | 1
PRODLOCATION | Contains information on where to obtain product (e.g. link or physical location) | AMT | 0.690 | 0.463 | 0 | 1
PRODMENTION | Specific product has been mentioned | AMT+Comp | 0.146 | 0.353 | 0 | 1
MSGTYPE | Categorical message type assigned by the Facebook | Facebook | | | |
– App | application related messages | Facebook | 0.099 | 0.299 | 0 | 1
– Link | link | Facebook | 0.389 | 0.487 | 0 | 1
– Photo | photo | Facebook | 0.366 | 0.481 | 0 | 1
– Status Update | regular status update | Facebook | 0.140 | 0.347 | 0 | 1
– Video | video | Facebook | 0.005 | 0.070 | 0 | 1
PAGECATEGORY | Page category closely following Facebook’s categorization | Facebook | | | |
– Celebrity | Singers, Actors, Athletes etc | Facebook | 0.056 | 0.230 | 0 | 1
– ConsumerProduct | consumer electronics, packaged goods etc | Facebook | 0.296 | 0.456 | 0 | 1
– Entertainment | Tv shows, movies etc | Facebook | 0.278 | 0.447 | 0 | 1
– Organization | non-profit organization, government, school organization | Facebook | 0.211 | 0.407 | 0 | 1
– PlaceBusiness | local places and businesses | Facebook | 0.071 | 0.257 | 0 | 1
– Website | page about a website | Facebook | 0.088 | 0.283 | 0 | 1

Sample Messages | Content Tags
Cheers! Let Welch’s help ring in the New Year. | BRANDMENTION, SMALLTALK, HOLIDAYMENTION, EMOTION
Maria’s mission is helping veterans and their families find employment. Like this and watch Maria’s story. http://walmarturl.com/VzWFlh | PHILANTHROPIC, SMALLTALK, ASKLIKE, HTTP
On a scale from 1-10 how great was your Christmas? | SMALLTALK, QUESTION, HOLIDAYMENTION
Score an iPad 3 for an iPad2 price! Now at your local store, $50 off the iPad 3. Plus, get a $30 iTunes Gift Card. Offer good through 12/31 or while supplies last. | PRODMENTION, DEAL, PRODLOCATION, PRODAVAIL, PRICE
They’re baaaaaack! Now get to snacking again. Find Pringles Stix in your local Walmart. | EMOTION, PRODMENTION, BRANDMENTION, PRODLOCATION

Table 2: Examples of Messages and Their Content Tags: The messages are taken from December 2012 messages on Walmart’s Facebook page.

not relate to products or brands; and that there are no messages by celebrity owned pages

that feature price comparisons. “Remarkable facts” (our definition) are posted more by

firms in the entertainment category and less by local places and businesses. Consistent with

intuition, consumer product pages and local places/businesses post the most about products

(PRODMENTION), product availability (PRODAVAIL), product location (PRODLOCATION),

and deals (DEAL). Emotional (EMOTION) and philanthropic (PHILANTHROPIC) content have

high representation in pages classified as celebrity, organization, and websites. Similarly,

the AMT workers identify a larger portion of messages posted by celebrity, organization

and website-based pages to be similar to messages by friends.

We now discuss the engagement data. Figure 4 shows box plots of the log of impressions,

Likes, and comments versus the time (in days) since a message is released (τ). Both

comments and Likes taper off to zero after two and six days respectively. The rate of decay

of impressions is slower. Virtually all engagements and impressions (more than 99.9%) are

Figure 2: Co-occurrence of Attribute Characteristics Across Messages. Shades in the upper triangle represent correlations. Numbers in the lower triangle represent the same correlations in numerical form, multiplied by 100 (range −100 to +100). For example, the correlation in occurrence of smalltalk and humor across messages is 0.26 (cell [3,2]). The dark line shows the separation into 2 clusters. Persuasive content and informative content attributes tend to form two separate clusters.

Figure 5 shows the average number of Likes and comments by message type (photo, link,

etc.) over the lifetime of a message. Messages with photos have the highest average Likes

(94.7) and comments (7.0) over their lifetime. Status updates obtain more comments (5.5)

on average than videos (4.6) but obtain fewer Likes than videos. Links obtain the lowest Likes

on average (19.8) as well as the lowest comments (2.2). Figure 6 shows the same bar plots

split across 6 industry categories. A consistent pattern is that messages with photos always

[Figure 3 image omitted: bubble chart titled “Industry Category VS Message Content Appearance Percentage”. Rows are the industry categories (Celebrity, ConsumerProduct, Entertainment, Organization, PlacesBusiness, Websites); columns are the content attributes (remfact, emotion, emoticon, holiday, humor, philan, friendlikely, smalltalk, brandmention, deal, pricecompare, price, target, prodavail, prodloc, prodmention). The labels on the bubbles are the percentages.]

Figure 3: Bubble Chart of Broader Industry Category vs Message Content: Each bubble represents the percentage of messages within a row-industry that has the column-attribute. Computed for the 5000 tagged messages. Larger and lighter bubbles imply higher percentage of messages in that cell. Percentages do not add up to 100 along rows or columns as any given message can have multiple attributes included in it. The largest bubble (60.4%) corresponds to SMALLTALK for the celebrity page category and the smallest bubble (0%) corresponds to PRICECOMPARE for the celebrity category.

in engagement response across industries. The patterns in these plots echo those described

in reports by many market research companies such as Wildfire and comScore.

Figure 7 presents the average number of Likes and comments by content attribute.

Emotional messages obtain the highest number of Likes, followed by messages identified as “likely

to be posted by friends” (variable: FRIENDLIKELY). Emotional content also obtains the

highest number of comments on average followed by SMALLTALK and FRIENDLIKELY.

The reader should note these graphs do not account for the market-size (i.e., the number of

impressions a message reached). Later, we present an econometric model that incorporates

market-size as well as selection by Facebook’s filtering algorithm to assess user engagement

[Figure 4 image omitted: box plots of Log(Imp+1), Log(Comment+1), and Log(Like+1) versus Tau (time since post release, days 1 to 16).]

Figure 5: Average Likes and Comments by Message Type: This figure shows the average number of Likes and comments obtained by messages over their lifetime on Facebook, split by message type.

[Figure 6 image omitted: bar plots of average Likes and comments by message type, one panel per industry category (Celebrity, ConsumerProduct, Entertainment, Organization, PlacesBusiness, Websites).]

Figure 7: Average Likes and Comments by Message Content: This figure shows the average number of Likes and comments obtained by messages over their lifetime split by message content.

2.2.2. Amazon Mechanical Turk (AMT)

We now describe our methodology for content-coding messages using AMT. AMT is a crowd

sourcing marketplace for simple tasks such as data collection, surveys, and text analysis. It

has now been successfully leveraged in several academic papers for online data collection and

classification. To content-code our messages, we create a survey instrument comprising a

set of binary yes/no questions we pose to workers (or “Turkers”) on AMT. To ensure high

quality responses from the Turkers, we follow several best practices identified in literature

(e.g., we obtain tags from at least 9 different Turkers choosing only those who are from

the U.S., have more than 100 completed tasks, and have an approval rate above 97%. We

also include an attention-verification question.) Please see the appendix for the final survey

instrument and the complete list of strategies implemented to ensure output quality.

Figure 8 presents the histogram of Cronbach’s Alphas, a commonly used inter-rater

reliability measure, obtained for the 5,000 messages. The average Cronbach’s Alpha for our 5,000

tagged messages is 0.82 (median 0.84), well above typically acceptable thresholds of 0.7.

About 87.5% of the messages obtained an alpha higher than 0.7, and 95.4% higher than

0.6. For robustness, we replicated the study with only those messages with alphas above

0.7 (4,378 messages) and found that our results are qualitatively similar.
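
As an illustration of the reliability measure reported above, the sketch below computes Cronbach’s Alpha for a single message from a raters-by-questions matrix of binary answers, using the standard formula α = k/(k−1) · (1 − Σ rater variances / variance of summed scores). Treating raters as the “items” and survey questions as the observations is an assumption made here for illustration; the function name `cronbach_alpha` and the input layout are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for one message.

    ratings[r][q] is rater r's 0/1 answer to question q.
    Raters play the role of 'items'; questions are the observations.
    """
    k = len(ratings)  # number of raters
    if k < 2:
        raise ValueError("need at least two raters")
    # Sample variance of each rater's answers across questions.
    rater_vars = [variance(r) for r in ratings]
    # Summed score per question across all raters.
    totals = [sum(col) for col in zip(*ratings)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when summed scores do not vary")
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)
```

When every rater gives identical answers the function returns 1; disagreement among raters pulls the value down.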

At the end of the AMT step, approximately 2,500 distinct Turkers contributed to

content-coding 5,000 messages. This constitutes the training dataset for the NLP algorithm used

in the next step.

Figure 8: Cronbach’s Alphas for 5,000 Messages: This bar graph shows the inter-rater reliability measure of Cronbach’s Alpha among at least 9 distinct Turkers’ inputs for each of the 5,000 messages. The mean is 0.82 and the median is 0.84. We replicated the study with only those above 0.7 and found the result to be robust.

2.2.3. Natural Language Processing (NLP) for Attribute Tagging

We use NLP techniques to label message content from Facebook messages using the

AMT-labeled messages as the training data. Typical steps for such labeling tasks include: 1)

breaking the sentence into understandable building blocks (e.g., words or lemmas) and


set of training sentences with labels tagged from a trusted source identifying whether the

sentences do or do not have a given content profile (in our case, this source comprises the

5000 AMT-tagged messages); 3) using statistical tools to infer which sentence-attributes are

correlated with content outcomes, thereby learning to identify content in sentences. When

presented with a new set of sentences, the algorithm breaks the sentence down to building

blocks, identifies sentence-level attributes, and assigns labels using the statistical models

that were fine-tuned in the training process. We summarize our method here briefly. A

detailed description of the algorithms employed is presented in the Appendix.

The use of NLP techniques has been gaining traction in business research due to readily

available text data online (e.g., Netzer et al. (2012); Ghose et al. (2012); Geva and Zahavi

(2013)), and there are many different techniques. Our NLP methods closely mirror cutting

edge multi-step methods used in the financial services industry to automatically extract

financial information from textual sources (e.g., Hassan et al. (2011)) and are similar in

flavor to winning algorithms from the recent Netflix Prize competition.8 The method we use

combines five statistical classifiers with rule-based methods via heterogeneous “ensemble

learning”. Statistical classifiers are binary classification machine learning models that take

attributes as input and output predicted classification probabilities. Rule-based methods

usually use large data sources (a.k.a. dictionaries) or use specific if-then rules inputted by

human experts, to scan through particular words or occurrences of linguistic entities in

the messages to generate a classification. For example, in identifying brand and product

mentions, we augment our AMT-tagged answers with several large lists of brands and

products from online sources and a company list database from Thomson Reuters. Further,

to increase the range of our brand name and product database, we also ran a separate

8 See http://www.netflixprize.com.

AMT study with 20,000 messages in which we asked AMT Turkers to identify any brand

or product name included in the message. We added all the brand and product names we

harvested this way to our look-up database. We then utilize rule-based methods to identify

brand and product mentions by looking up these lists. Similarly, in identifying emoticons

in the messages, we use large dictionaries of text-based emoticons freely available on the

internet.
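
The dictionary look-ups described above can be sketched as follows. This is a toy illustration, not the rule set used in the paper: the two small sets `EMOTICONS` and `BRANDS` stand in for the much larger emoticon dictionaries and brand and product lists, and the function names are hypothetical.

```python
import re

# Tiny illustrative dictionaries; the lists described in the text are far larger.
EMOTICONS = {":D", ":)", ":(", "LOL"}
BRANDS = {"walmart", "pringles"}


def has_emoticon(message: str) -> bool:
    """True if any whitespace-delimited token matches the emoticon dictionary."""
    for token in message.split():
        # Also try the token with trailing sentence punctuation removed.
        if token in EMOTICONS or token.strip(".,!?") in EMOTICONS:
            return True
    return False


def mentions_brand(message: str) -> bool:
    """True if any lower-cased word in the message appears in the brand list."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return any(w in BRANDS for w in words)
```

Exact-match look-ups of this kind trade coverage for precision: they rarely fire by mistake, but only find names that are already in the list, which is why enlarging the look-up database matters.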

Finally, we utilize ensemble learning methods that combine classifications from the many

classifiers and rule-based algorithms we use. Combining classifiers is very powerful in the

NLP domain since a single statistical classifier cannot successfully overcome the classic

precision-recall tradeoff inherent in the classification problem. The final combined classifier

has higher precision and recall than any of the constituent classifiers.
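
One simple way to combine several binary classifiers is majority voting, sketched below. The combination rule actually used is described in the Appendix and may differ; majority voting is swapped in here only to illustrate the idea, and the names `majority_vote` and `ensemble_labels` are hypothetical.

```python
def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers for one message.

    Ties are resolved toward 0 (attribute absent); this is an arbitrary choice.
    """
    votes = sum(predictions)
    return 1 if votes * 2 > len(predictions) else 0


def ensemble_labels(per_classifier_labels):
    """per_classifier_labels[c][i] is classifier c's label for message i."""
    # zip(*...) regroups the labels so each tuple holds all votes for one message.
    return [majority_vote(votes) for votes in zip(*per_classifier_labels)]
```

With three classifiers, a message is labeled positive when at least two of them agree, so an error made by one classifier can be outvoted by the other two.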

Assessment We assess the performance of the overall NLP algorithm on three measures,

viz., accuracy, precision, and recall (as defined in Footnote 3) using 10-fold cross-validation.

10-fold cross-validation is computationally intensive and makes it harder to achieve higher

accuracy, precision and recall, but we find using the criterion critical to obtaining the

external validity required for large scale classification. Table 3 shows these metrics for

different content profiles. The performance is extremely good and comparable to performance

achieved by the leading financial information text mining systems (Hassan et al., 2011). We

also report the improvement of the final ensemble learning method relative to using only a

support vector machine classifier. As shown, the gains from combining classifiers are very

substantial. We obtain similar results for negative class labels.
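
The three measures and the fold construction can be sketched as below. The sketch uses the standard definitions of accuracy, precision, and recall, which are assumed to match those in Footnote 3. The interleaved fold assignment and the function names are illustrative choices, not the procedure used in the study.

```python
def accuracy_precision_recall(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; fold i tests items i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

In each of the k rounds the classifier is trained on the `train` indices and scored on the held-out `test` indices, so every message is evaluated exactly once by a model that never saw it during training.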

As a final point of assessment, note that several papers in the management sciences using

NLP methods implement unsupervised learning, which does not require human-tagged data.

These techniques use existing databases such as WordNet (a lexical database for English) or

tagged text corpora (e.g., the tagged Brown Corpus) to learn content by patterns and correlations.

Supervised NLP instead utilizes human taggers to obtain a robust set of data that can be

                    With Ensemble Learning            Without Ensemble Learning
                    (The Best Performing Algorithm)   (Support Vector Machine version 1 + Rule-based)
                    Accuracy  Precision  Recall       Accuracy  Precision  Recall
REMFACT             0.94      0.99       0.68         0.88      0.99       0.33
EMOTION             0.97      0.99       0.87         0.94      0.98       0.65
HUMOR               0.98      1.00       0.90         0.97      1.00       0.14
PHILANTHROPIC       0.97      0.99       0.85         0.93      0.99       0.62
FRIENDLIKELY        0.94      0.99       0.68         0.90      0.99       0.41
SMALLTALK           0.85      0.88       0.80         0.78      0.34       0.28
DEAL                0.94      0.99       0.65         0.90      1.00       0.43
PRICECOMPARE        0.99      0.99       1.00         0.99      1.00       0.85
TARGETING           0.98      0.99       0.89         0.95      0.99       0.71
PRODAVAILABILITY    0.96      0.99       0.76         0.91      1.00       0.10
PRODLOCATION        0.97      0.99       0.90         0.87      1.00       0.11

Table 3: Performance of Text Mining Algorithm on 5000 Messages Using 10-fold Cross-validation: This table presents performance metrics for the classification algorithms used. The left 3 columns show the metrics for the final algorithm, which combines classifiers via ensemble learning methods, while the right 3 columns show the metrics for a support vector machine algorithm. Notice that the support vector machine classifier tends to have low recall and high precision. Naive Bayes tends to have high recall but low precision. Classifiers on their own cannot successfully overcome the standard precision-recall tradeoff (if one is higher, the other is lower). But combining many different classifiers with ensemble learning can increase both precision and recall. We obtain similar results for negative class labels.

performance is significantly poorer than that of supervised NLP algorithms like the

ones implemented here. Finally, to the best of our knowledge, the NLP method used in

this paper, which uses ensemble learning to combine several statistical classifiers and

rule-based methods, has not been used in business research journals.10 Further, several current

implementations of NLP do not meet the strict bar of the 10-fold cross-validation

criterion. We believe one of the contributions of this paper is to demonstrate how to utilize

AMT in combination with ensemble learning techniques to implement supervised NLP in

business research to produce robust and cost-efficient NLP algorithms that perform well at

the scale required for empirical work. We believe the method will be useful in future studies

on unstructured natural language data such as advertising content or product reviews. For

10