Feature and Sentiment Extraction through Text Mining

Chapter 4 The Impact of Daily Deal Promotions on Retailers’ Online Reputation

4.6 Mediation and Moderation Analysis

4.6.1 Feature and Sentiment Extraction through Text Mining

Text mining techniques have been used to extract important information from the text of online reviews in marketing and other business areas (Lee and Bradlow 2011; Bai 2011; Netzer et al. 2012). In this study, the text mining analysis involves two components: 1) identifying the features most frequently mentioned by reviewers (from which we derive consumers' perceptions of quality); and 2) identifying the sentiment of the sentences containing those features.

4.6.1.1 Extracting Important Features

We adopted one of the well-established techniques (see Hu and Liu 2004) for feature identification to extract nouns and noun phrases as candidates, because the most frequently described features have been shown to be nouns and noun phrases.

The feature extraction steps were as follows:

1) applied part-of-speech (POS) tagging and obtained nouns and noun phrases for each review, which yielded more than 9,000 nouns and noun phrases from Yelp reviews;

2) manually pre-defined five categories of features and built our own dictionary (see Table 18) to reduce the dimensionality of features;

3) classified each sentence into one of the five categories according to the dictionary; and,

4) selected food quality and service quality as the two key potential mediators because they were the most frequently mentioned features, as detailed in Table 19.

Table 18. Feature Categories, Definitions and Example Words in the Dictionary

| Feature Category | Definition | Example Words |
|---|---|---|
| Food quality | Perceived quality related to food, including various types of food and drink. | ‘salad’, ‘chicken’, ‘ice tea’, ‘wine’ |
| Service quality | Perceived quality related to service in a restaurant from various servers, such as waiter and waitress. | ‘waiter’, ‘waitress’, ‘manager’ |
| Ambiance | Perceived quality related to the dining environment, such as atmosphere. | ‘ambiance’, ‘atmosphere’, ‘music’ |
| Social setting | A feature indicating whom the reviewers dine with. | ‘husband’, ‘wife’, ‘boyfriend’, ‘girlfriend’ |
| Others | Any other feature which does not belong to any category above. | The rest of the noun phrases |
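The dictionary-based classification in step 3 can be sketched as follows. This is a minimal illustration, not the study's implementation: the dictionary contains only the example words from Table 18, the names `FEATURE_DICTIONARY` and `categorize_sentence` are hypothetical, and the sketch matches words directly in the raw sentence rather than in POS-tagged noun phrases. It also returns every matching category, on the assumption that a sentence may mention more than one feature.

```python
import re

# Illustrative dictionary built only from the example words in Table 18;
# the full dictionary used in the study is not reproduced here.
FEATURE_DICTIONARY = {
    "food quality": {"salad", "chicken", "ice tea", "wine"},
    "service quality": {"waiter", "waitress", "manager"},
    "ambiance": {"ambiance", "atmosphere", "music"},
    "social setting": {"husband", "wife", "boyfriend", "girlfriend"},
}


def categorize_sentence(sentence):
    """Return the sorted categories whose dictionary words occur in the sentence.

    Sentences that match no dictionary word fall into the 'others' category.
    """
    text = sentence.lower()
    matched = set()
    for category, words in FEATURE_DICTIONARY.items():
        for word in words:
            # Whole-word match, so 'wine' does not fire on 'twine'.
            if re.search(r"\b" + re.escape(word) + r"\b", text):
                matched.add(category)
                break
    return sorted(matched) or ["others"]


if __name__ == "__main__":
    print(categorize_sentence("The waiter brought my wife a lovely salad."))
    print(categorize_sentence("Parking was easy."))
```

Note that exact whole-word matching misses inflected forms such as plurals; a fuller version would lemmatize the text or store variants in the dictionary.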

Table 19. Frequency Distribution of Features

Feature | Number of Reviews | Frequency

4.6.1.2 Extracting Sentiments

Focusing on the two types of perceived quality detailed above, we identified the sentiment of each sentence that mentioned food quality and/or service quality. We applied a corpus-based machine learning method, which enabled us to capture contextual structure and domain-related knowledge (see Pang and Lee 2002).

The steps of sentiment identification included:

1) randomly selected 100 reviews and split them into 946 sentences;

2) manually assigned the sentiment (positive, negative or neutral) to each of the 946 sentences;

3) assigned 80% (757) of sentences as the training set and the remaining 20% (189) of sentences as the test set;

4) trained various machine learning models (including Naïve Bayes, Support Vector Machine, and Logistic Regression) using the training set;

5) applied the trained models to the test set and selected the model with the highest prediction rate, which was Logistic Regression; and,

6) applied the trained Logistic Regression model to obtain the sentiment of each sentence mentioning food quality and/or service quality.
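Steps 3 to 5 can be sketched in plain Python as below. This is only a sketch under stated assumptions: the helper names (`split_train_test`, `accuracy`, `select_best`) are hypothetical, "prediction rate" is interpreted as the share of test sentences labeled correctly, and each candidate model is represented by an already trained prediction function rather than by any particular library's classifier.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into a training set and a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predict, labeled_sentences):
    """Share of (sentence, label) pairs for which predict(sentence) == label."""
    correct = sum(1 for text, label in labeled_sentences if predict(text) == label)
    return correct / len(labeled_sentences)


def select_best(models, test_set):
    """Return the name of the trained model with the highest test accuracy.

    `models` maps a model name to its prediction function.
    """
    return max(models, key=lambda name: accuracy(models[name], test_set))


if __name__ == "__main__":
    # With 946 labeled sentences, an 80/20 split gives 757 and 189 sentences.
    train, test = split_train_test(range(946))
    print(len(train), len(test))
```

Fixing the random seed keeps the split reproducible, so that every candidate model is evaluated on the same held-out sentences.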

To quantify the sentiments, we coded negative sentiment as -1, neutral as 0, and positive as 1. We then calculated the average sentiment of food quality (service quality) over all sentences mentioning food quality (service quality) for the matched Groupon Restaurants and Non-Groupon Restaurants.
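As a worked illustration of this coding scheme, the sketch below averages the numeric codes over a list of sentence labels. The function name `average_sentiment` and the sample labels are hypothetical and are not taken from the study's data.

```python
# Numeric codes for the three sentiment labels, as described in the text.
SENTIMENT_SCORE = {"negative": -1, "neutral": 0, "positive": 1}


def average_sentiment(labels):
    """Mean sentiment score over sentence labels; None if there are no sentences."""
    if not labels:
        return None
    return sum(SENTIMENT_SCORE[label] for label in labels) / len(labels)


if __name__ == "__main__":
    # Four hypothetical food-quality sentences for one restaurant:
    # (1 + 1 + 0 - 1) / 4 = 0.25
    print(average_sentiment(["positive", "positive", "neutral", "negative"]))
```

An average near 1 indicates mostly positive sentences about the feature, while an average below 0 indicates that negative sentences outnumber positive ones.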

4.6.1.3 Preliminary Evidence of Mediation and Moderation Effects – Food Quality and Service Quality

Table 20 below summarizes the comparisons of average sentiment of food quality and Change in Yelp Ratings for the matched pairs. Please note that we excluded any matched pair in which the reviews of one or both restaurants did not mention food quality. Hence, we have 751 matched pairs for analysis.

The average sentiment of food quality for Groupon Restaurants is statistically significantly lower than that of Non-Groupon Restaurants, which potentially explains the overall declining trend of Yelp ratings. Hence, food quality is a potential mediator of the main effect.

We then divided the matched pairs into three groups according to pre-promotion ratings: 1) low rating, [1, 2.5); 2) medium rating, [2.5, 3.5); and 3) high rating, [3.5, 5]. In all three cases, the average sentiment of food quality for Groupon Restaurants is statistically significantly lower than that of Non-Groupon Restaurants. However, regarding the Change in Yelp Ratings, the decline in Yelp ratings for Groupon Restaurants is statistically significantly larger than that of Non-Groupon Restaurants only for high-rated restaurants. There is no statistically significant difference in Change in Yelp Ratings between Groupon Restaurants and Non-Groupon Restaurants for medium- and low-rated restaurants.

These results provide some direct evidence for the potential moderating role of pre-promotion ratings in the effect of Groupon promotions on Yelp ratings.
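The grouping by pre-promotion rating can be expressed as a small function. This is a minimal sketch; the function name `rating_group` is hypothetical, and the interval boundaries follow the half-open ranges stated above.

```python
def rating_group(pre_promotion_rating):
    """Map a pre-promotion Yelp rating to 'low', 'medium', or 'high'.

    Groups: low [1, 2.5), medium [2.5, 3.5), high [3.5, 5].
    """
    if not 1 <= pre_promotion_rating <= 5:
        raise ValueError("rating must lie between 1 and 5")
    if pre_promotion_rating < 2.5:
        return "low"
    if pre_promotion_rating < 3.5:
        return "medium"
    return "high"


if __name__ == "__main__":
    for rating in (1.0, 2.5, 3.0, 3.5, 5.0):
        print(rating, rating_group(rating))
```

Because the intervals are closed on the left, a restaurant rated exactly 2.5 falls in the medium group and one rated exactly 3.5 falls in the high group.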

Table 20. Comparisons of Average Sentiment of Food Quality and Change in Yelp Ratings

| Pre-promotion Rating | Number of Matched Pairs | Average Sentiment of Food Quality: Groupon Restaurants | Average Sentiment of Food Quality: Non-Groupon Restaurants | Change in Yelp Ratings: Groupon Restaurants | Change in Yelp Ratings: Non-Groupon Restaurants |
|---|---|---|---|---|---|
| Total | 751 | 0.52 | 0.58*** | -0.07 | -0.01*** |
| Low | 15 | 0.40 | 0.66** | 0.30 | 0.39 |
| Medium | 262 | 0.48 | 0.53*** | -0.0004 | 0.002 |
| High | 474 | 0.55 | 0.60*** | -0.12 | -0.03*** |

Note: Asterisks denote the significance level of Wilcoxon tests of larger means between Groupon and Non-Groupon Restaurants: * <0.1; ** <0.05; *** <0.01.

Similarly, Table 21 summarizes the comparisons of average sentiment of service quality and Change in Yelp Ratings for the matched pairs. We excluded any matched pair in which the reviews of one or both restaurants did not mention service quality. Because service quality was mentioned less frequently than food quality, we obtained fewer matched pairs (527) than for food quality.

The average sentiment of service quality for Groupon Restaurants is statistically significantly lower than that of Non-Groupon Restaurants, which suggests that service quality is also a potential mediator of the overall decline in Yelp ratings after Groupon promotions.

We divided the matched pairs into the same three groups based on pre-promotion ratings. As detailed in Table 21, the average sentiment of service quality for Groupon Restaurants is lower than that of Non-Groupon Restaurants for all three groups, although the difference for low-rated restaurants is not statistically significant.

Regarding the Change in Yelp Ratings, for the medium- and high-rated restaurants, the decline in Yelp ratings for Groupon Restaurants is statistically significantly larger than that of Non-Groupon Restaurants. However, the difference in the Change in Yelp Ratings between Groupon Restaurants and Non-Groupon Restaurants is 0.027 for the medium-rated restaurants, compared to 0.08 for the high-rated restaurants. For the low-rated restaurants, Yelp ratings for Groupon Restaurants increased more than those for Non-Groupon Restaurants. Because we have only 7 matched pairs in this group, we do not consider the statistical significance of the comparison test.

The results above also suggest a potential moderating role of pre-promotion ratings in the effect of Groupon promotions on Yelp ratings.

Table 21. Comparisons of Average Sentiment of Service Quality and Change in Yelp Ratings

| Pre-promotion Rating | Number of Matched Pairs | Average Sentiment of Service Quality: Groupon Restaurants | Average Sentiment of Service Quality: Non-Groupon Restaurants | Change in Yelp Ratings: Groupon Restaurants | Change in Yelp Ratings: Non-Groupon Restaurants |
|---|---|---|---|---|---|
| Total | 527 | 0.51 | 0.56*** | -0.07 | -0.02*** |
| Low | 7 | -0.18 | 0.14 | 0.37 | 0.34 |
| Medium | 184 | 0.17 | 0.28* | -0.03 | -0.003** |
| High | 336 | 0.25 | 0.34*** | -0.11 | -0.03*** |

Note: Asterisks denote the significance level of Wilcoxon tests of larger means between Groupon and Non-Groupon Restaurants: * <0.1; ** <0.05; *** <0.01.
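The differences in Change in Yelp Ratings discussed for the service quality sample can be recomputed from the values reported in Table 21. The sketch below is illustrative only; the names `CHANGE_IN_RATING` and `rating_change_gap` are hypothetical, and the gap is defined here as the Groupon value minus the Non-Groupon value, so a negative gap means a larger decline for Groupon Restaurants.

```python
# Change in Yelp Ratings from Table 21: (Groupon, Non-Groupon) per group.
CHANGE_IN_RATING = {
    "low": (0.37, 0.34),
    "medium": (-0.03, -0.003),
    "high": (-0.11, -0.03),
}


def rating_change_gap(group):
    """Groupon minus Non-Groupon Change in Yelp Ratings, rounded to 3 decimals."""
    groupon, non_groupon = CHANGE_IN_RATING[group]
    return round(groupon - non_groupon, 3)


if __name__ == "__main__":
    for group in ("low", "medium", "high"):
        print(group, rating_change_gap(group))
```

The gap is about three times larger in magnitude for the high group (0.08) than for the medium group (0.027), which is the pattern the moderation argument relies on.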

In the next sections, we test the formal models of the mediation effect and moderation effect that we proposed in Section 4.3. In subsections 4.6.2 and 4.6.3, we test the models for food quality and service quality separately. In subsection 4.6.4, we combine food quality and service quality in the analysis. Please note that, because we do not have enough matched pairs for low-rated restaurants, the analyses are limited to the matched pairs of restaurants with medium (relatively lower) and high (relatively higher) pre-promotion ratings.
