Website vs. Traditional Survey Ratings

DO THEY TELL THE SAME STORY?

Move over, marketing researchers, and make room for social media. Today, a plethora of general and industry-specific websites make it possible for consumers to read what other consumers have to say about everything from hotels to hospitals, restaurants to repair services and caterers to car dealerships. These same sites also make it possible for consumers to share their own experiences, often in the form of ratings and open-ended comments very similar to those captured via traditional service quality or customer satisfaction surveys.

Some argue that, as such websites proliferate, traditional survey research will be less necessary because the type of data typically furnished by surveys will be public and essentially “there for the taking.” Others respond by saying that, while website data are cheap and accessible, they also are not as representative or projectable as traditional surveys and therefore should be treated as a supplement to, rather than a substitute for, such surveys.

Consider an article that appeared on March 21, 2011 on Advertising Age’s website (“Will Social Media Replace Surveys as a Research Tool?”), which proclaims “the top research executive of likely the world’s biggest research buyer expects surveys to dramatically decline in importance by 2020 and sees the rise of social media as a big reason why.” Jack Neff, the author, was apparently quite taken by remarks made by Joan Lewis, global consumer and market knowledge officer at Procter & Gamble, in a speech delivered at a 2011 Advertising Research Foundation conference in New York. Lewis was quoted as saying that the industry should get away from “believing that a [single] method, particularly survey research, will be the solution to anything,” and that, “the more people see two-way engagement and being able to interact with people all over the world, I think the less they will want to be involved in structured research.”

Not surprisingly, reactions to the article were mixed. Readers who agreed with Lewis offered comments such as, “monitoring social media absolutely is a path to powerful insights” and, “social media listening is a more natural way of capturing the voice of the customer in a non-contrived way.” Other readers took an opposing view, making statements such as, “survey research can never be fully replaced by social media research, just as focus groups can never be replaced by surveys … every tool has a specific strength that covers a specific weakness of another tool,” and, “relying on social media to be the primary source of research is like letting the squeaky wheel get all the grease.”

While they may be provocative, Lewis’ comments—and the online commentary that ensued—can only take this burgeoning debate so far. There is little doubt that social media channels are going to be with us for a long time, and that they can furnish a wealth of opportunities to listen, learn and respond to customers who share their experiences online. But, can or will social media replace surveys as a research tool? Can managers forgo investments in traditional surveys and, instead, rely on website ratings and comments that are relatively cheap and readily available?

To answer these questions, we need research aimed at establishing the degree to which social media and traditional surveys yield similar findings, where and how they are different, and under what circumstances (if any) one should be preferred.

Unfortunately, only a few relevant studies have been conducted, and the findings do not tell a clear or consistent story. Some studies have focused on the extent to which social media and survey data are correlated. For example, a study by Jamie Baker-Prewitt showed that social media “buzz” volume was positively related to several measures of customer loyalty and brand equity across 10 different brands. (“Listening Versus Asking: How Social Media Measures Relate to Consumer Attitudes and Purchase Behaviors.” Paper presented at the CASRO Online Research Conference, March 3-4, 2011.)

Other researchers have focused on the comparability of social media and survey data. For instance, Christian Bourque, Rick Hobbs and Danielle St. Hilaire compared comments obtained from 1,200 completed surveys with more than 10,000 social media entries. They found major differences both in the relative incidence of topics, as well as the proportion of positive, negative and neutral comments, appearing in the two data sources. Their study also revealed that product owners and users were represented in the social media data to a much greater degree than in the survey data, and that the two data sources were not comparable with respect to the time frame and context of data collection. These findings led the authors to conclude that, “data provided by these two approaches are sufficiently different from each other that both are likely necessary.” (See “Apples and Oranges,” Marketing Research, Fall 2011, 9-13.)

Gina Pingitore conducted two studies, one in the electric utility industry and the other in financial services, in which survey data from traditional online panelists were compared to those obtained from real time data collection (RTD) or “rivering” respondents. Results revealed that both incidence and completion rates were higher among traditional panelists, but that the two sample sources were comparable demographically. Pingitore also found that, while RTD and traditional panel sources produced comparable key driver analysis results, performance ratings and overall satisfaction indices generally were lower among RTD respondents. (See “Results from Real Time Data Collection versus Data from Traditional Panelists: Is It Valid to Combine Data from These Two Sources?” Paper presented at the CASRO Online Research Conference, March 3-4, 2011.)

In a study published at www.hotelmanagement.net on July 8, 2011 (“Study Shows TripAdvisor Is a Reliable Review Source”), Robert Honeycutt and Jonathan Barsky compared ratings from TripAdvisor to those from a hotel customer satisfaction panel. They found that these two data sources provided similar results for some hotel properties, but conflicting results for others. In addition, there was much more variability (i.e., extremely favorable or unfavorable ratings) in the TripAdvisor data.

From these studies it seems clear that: (a) no solid conclusions can be drawn regarding the comparability of social media and traditional survey data and (b) additional research is needed. With this in mind, we developed a study aimed at addressing three questions:

• Do website ratings tell the same story as ratings captured through traditional survey methods?

• What proportion of consumers who have an opportunity to share experiences via website ratings actually do so?

• How similar are “posters” to “non-posters,” and do ratings from posters tell the same story as ratings from non-posters?

The Research

Addressing our first research question requires capturing ratings on a common set of items using a common rating scale. Also, it requires collecting these ratings from two sources: (a) a website sample of customers who have taken the initiative to share their experiences online; and (b) a sample of customers who may or may not have chosen to take their stories to a website, but who are willing to share evaluations of their experiences when solicited via a traditional survey approach.

In this study, we obtained Web-based and traditional survey-based ratings from guests who had stayed at one of 38 properties of a leading upscale hotel brand during August 2011. To build the Web-based sample, all ratings posted on TripAdvisor for the 38 properties in question were “scraped” and assembled for analysis. This yielded a total of 369 completed TripAdvisor ratings. The smallest number of completed ratings for a single property was two, and the largest was 25.

To build the traditional sample, we were able to obtain a list of all guests who had stayed at one of the 38 properties during August 2011. Valid e-mail addresses were available for most of these guests and were used to make initial contact and invite them to visit a website where they would be asked to provide ratings on a small set of items, followed by a few additional questions. At an 18% response rate, this yielded a total of 1,586 completed ratings. The smallest number of completed ratings for a single property was 19, and the largest was 62.

Guests in both the TripAdvisor and traditional samples provided the five core ratings typically captured in a TripAdvisor review: (1) overall hotel rating, (2) service, (3) value, (4) sleep quality and (5) cleanliness. A five-point scale (excellent, very good, average, poor and terrible) was used to capture all of the above ratings. In addition to (and after) furnishing these ratings, customers in the traditional sample also were asked about:

• The purpose of the August hotel stay (business or leisure);

• The annual number of business and leisure trips;

• If they posted about the August stay and, if so, on which website;

• If they posted about any hotel stay during the past 12 months and, if so, on which website;

• If they used a website to plan their August stay and, if so, which website;

• If they used a website to plan any stay during the past 12 months and, if so, which website;

• Gender; and

• Age.

These additional items enabled us to address the remaining research questions: What proportion of “qualified” guests actually take the time to share their experiences in the form of a website review? How similar are posters to non-posters? Do ratings from posters tell the same story as ratings from non-posters?

The Results

For most of the 38 hotel properties studied, the number of TripAdvisor and/or traditional survey ratings available for analysis at the individual property level was inadequate. Therefore, our results are based upon analysis of data that were aggregated across the 38 properties.

Website-based ratings tell a different story than traditional survey-based ratings. Two methods were used to compare TripAdvisor with traditional ratings. Method 1 consisted of drawing a proportionate random sample of 369 of the 1,586 guests who completed the traditional survey. This was done to create a sample of traditional ratings that had the same total number of observations as the TripAdvisor sample with a comparable number of guests per property. Ratings from these guests were compared with those of the 369 guests who posted ratings on TripAdvisor. Without exception, both mean and top-box (i.e., percent “excellent”) scores from TripAdvisor were lower than the same scores obtained from the traditional sample, and the differences were statistically significant at the 95% confidence level for all comparisons except the top-box scores on the overall hotel rating.
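The sketch below illustrates the general logic of Method 1 in Python. It is not the authors’ code; the data frame layout, the column names (a property_id field plus the five rating items) and the use of a two-sample t-test as the significance test are assumptions made purely for illustration.

```python
# Minimal sketch of a Method 1-style comparison (assumed column names;
# ratings coded 1 = terrible ... 5 = excellent).
import pandas as pd
from scipy import stats

ITEMS = ["overall", "service", "value", "sleep_quality", "cleanliness"]

def proportionate_sample(trad: pd.DataFrame, trip: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Randomly draw traditional-survey guests so that each property contributes
    roughly the same number of ratings as it has in the TripAdvisor sample."""
    counts = trip["property_id"].value_counts()
    parts = []
    for prop, n in counts.items():
        pool = trad[trad["property_id"] == prop]
        parts.append(pool.sample(n=min(n, len(pool)), random_state=seed))
    return pd.concat(parts, ignore_index=True)

def compare_sources(trip: pd.DataFrame, trad_sub: pd.DataFrame) -> None:
    """Compare mean and top-box (percent 'excellent') scores item by item."""
    for item in ITEMS:
        t_mean, s_mean = trip[item].mean(), trad_sub[item].mean()
        t_top, s_top = (trip[item] == 5).mean(), (trad_sub[item] == 5).mean()
        # Welch t-test on the means; the article does not name the exact test used.
        _, p_val = stats.ttest_ind(trip[item], trad_sub[item], equal_var=False)
        print(f"{item}: TripAdvisor mean {t_mean:.2f} vs. survey {s_mean:.2f} "
              f"(p = {p_val:.3f}); top-box {t_top:.0%} vs. {s_top:.0%}")
```

Under this setup, `compare_sources(trip, proportionate_sample(trad, trip))` reproduces the style of comparison described above, although a single random draw is exactly the “luck of the draw” concern that the bootstrap in Method 2 addresses.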

Recognizing that results from the preceding approach could reflect the luck of the draw, Method 2 involved bootstrapping both TripAdvisor and traditional samples. One thousand “subsamples” were drawn randomly (with replacement) in order to estimate means, top-box scores and 95% confidence intervals for both TripAdvisor and traditional ratings. Method 2 produced essentially the same results as Method 1: On all five items, both mean and top-box scores from TripAdvisor were lower than the same scores obtained from the traditional sample, and the differences were statistically significant for all comparisons except the top-box scores on the overall hotel rating.
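A bootstrap of this kind is straightforward to sketch. The version below assumes percentile confidence intervals, which the article does not specify; it is an illustration of the technique rather than the authors’ implementation.

```python
# Sketch of Method 2: bootstrap means, top-box scores and 95% intervals
# for one vector of 1-5 ratings (e.g., the "value" item from one source).
import numpy as np

def bootstrap_stats(ratings: np.ndarray, n_boot: int = 1000, seed: int = 0) -> dict:
    """Return point estimates and 95% percentile intervals for the mean and
    the top-box proportion, based on n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(ratings)
    idx = rng.integers(0, n, size=(n_boot, n))   # resample indices with replacement
    samples = ratings[idx]                       # shape (n_boot, n)
    means = samples.mean(axis=1)
    top_box = (samples == 5).mean(axis=1)
    ci = lambda x: tuple(np.percentile(x, [2.5, 97.5]))
    return {
        "mean": (ratings.mean(), ci(means)),
        "top_box": ((ratings == 5).mean(), ci(top_box)),
    }
```

Running `bootstrap_stats` on the TripAdvisor and traditional ratings for each item and checking whether the resulting intervals overlap mirrors the comparison reported above.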

We actually employed a third approach in which the data from all 1,586 guests in the traditional survey sample were compared to the data from the 369 TripAdvisor raters. A weighting procedure was applied to the traditional survey sample in order to achieve desired proportionality across properties. This produced essentially the same results obtained from the proportionate random sampling and bootstrapping methods. Incidentally, and in contrast to results reported by Honeycutt and Barsky (2011), the range and variance in TripAdvisor and traditional survey-based performance ratings were comparable.
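The article does not spell out the weighting target, but a natural reading (consistent with Method 1’s property-matched subsample) is that each property’s weighted share of the traditional sample was aligned with its share of the TripAdvisor sample. A minimal sketch under that assumption, again with hypothetical column names:

```python
# Sketch of a per-property weighting step. Assumption: the target mix is the
# TripAdvisor sample's property distribution; the article says only that
# weights achieved "desired proportionality across properties."
import pandas as pd

def property_weights(trad: pd.DataFrame, trip: pd.DataFrame) -> pd.Series:
    """Weight each traditional-survey respondent so that each property's
    weighted share matches that property's share of the TripAdvisor sample."""
    trad_share = trad["property_id"].value_counts(normalize=True)
    trip_share = trip["property_id"].value_counts(normalize=True)
    ratio = trip_share / trad_share                      # weight per property
    w = ratio.reindex(trad["property_id"]).to_numpy()    # map onto respondents
    return pd.Series(w, index=trad.index, name="weight")

# Weighted mean for one item, e.g. "value":
# w = property_weights(trad, trip)
# weighted_mean = (trad["value"] * w).sum() / w.sum()
```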

Results from both methods of comparison suggest that, as they pertain to performance, Web-based ratings may paint a less favorable picture of the customer experience than ratings obtained via a more traditional survey approach. Can the same be said with regard to the importance of different customer experience categories or elements? To address this question, we performed a regression analysis using the overall hotel rating as the criterion variable and the remaining four ratings (value, service, cleanliness and sleep quality) as predictors.

FIGURE 1: Difference in Conclusions Drawn from TripAdvisor® versus a Traditional Survey Sample of Ratings

[Figure 1 charts importance and performance percentages for Service, Value, Cleanliness and Sleep Quality, separately for the TripAdvisor® and Traditional samples. TripAdvisor® results indicate that (a) “Value” has the highest relative importance of any customer experience category, and (b) the hotel receives its least favorable ratings on this category. Traditional survey results indicate that (a) “Service” has the highest relative importance of any customer experience category, and (b) the hotel receives its most favorable ratings on this category.]

NOTE: In the case of Importance, percentages represent the proportion of total variance in the overall rating explained by each customer experience category. In the case of Performance, percentages represent the proportion of guests giving the hotel an “excellent” rating on each category.

An “averaging-over-orderings” regression method was used to perform key driver analysis. Essentially, this method involves performing multiple stepwise regressions in which all possible orders of entering the predictor variables are examined. The relative importance of any given predictor is calculated as the mean of its incremental contribution to explained variance across all possible orders of entry. For a detailed description of this method, see Kruskal, W. (1987), “Relative Importance by Averaging Over Orderings,” The American Statistician, 41, 6-10; Theil, H. and C. Chung (1988), “Information-Theoretic Measures of Fit for Univariate and Multivariate Linear Regressions,” The American Statistician, 42, 249-252; and Soofi, E. S., J.J. Retzer and M. Yasai-Ardekani (2000), “A Framework for Measuring the Importance of Variables with Applications to Management Research and Decision Models,” Decision Sciences Journal, 31, 596-625.
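With only four predictors there are just 4! = 24 orderings, so the averaging can be done exhaustively. The sketch below shows the idea in Python; it assumes ordinary least squares and incremental R-squared as the contribution measure, and is an illustration rather than the exact procedure used in the study.

```python
# Sketch of averaging-over-orderings (LMG-style) relative importance:
# for every ordering of the predictors, record each predictor's incremental
# contribution to R-squared at the step it enters, then average over orderings.
from itertools import permutations
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray, cols: list) -> float:
    """R-squared of an OLS fit of y on the given subset of predictor columns."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def averaged_importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Mean incremental R-squared of each predictor over all orders of entry."""
    k = X.shape[1]
    totals = np.zeros(k)
    orders = list(permutations(range(k)))
    for order in orders:
        entered, r2_prev = [], 0.0
        for j in order:
            entered.append(j)
            r2_new = r_squared(X, y, entered)
            totals[j] += r2_new - r2_prev
            r2_prev = r2_new
    return totals / len(orders)   # importances sum to the full-model R-squared

# Usage with hypothetical column names:
# X = df[["service", "value", "cleanliness", "sleep_quality"]].to_numpy(float)
# importance = averaged_importance(X, df["overall"].to_numpy(float))
```

Because each averaged contribution is a share of the total variance in the overall rating, these values correspond to the way importance is reported in Figure 1.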

A bootstrapping approach, similar to Method 2, was used to develop importance estimates and their 95% confidence intervals for both TripAdvisor and traditional samples. Results revealed that TripAdvisor and traditional sample data differ with respect to the order of importance of the four experience categories. However, as the 95% confidence intervals overlap, the differences are not statistically significant.

Still, if only one or the other source is used, the TripAdvisor and traditional survey data do not tell the same story. As Figure 1 illustrates, importance and performance scores from the TripAdvisor data suggest that “value” is the most important of the four customer experience categories, and the one on which the hotel brand performs poorest across the 38 properties examined. In contrast, importance and performance scores from the traditional sample data suggest that “service” is the most important category, and the one on which the brand receives its most favorable performance ratings.

Clearly, relying on only one of these two data sources would lead managers to draw very different conclusions regarding customer experience category importance and performance. The question is: which of the two sources is more accurate? To answer this question, we have to understand a bit more about who is represented in website versus traditional survey samples.

Posters comprise a very small percentage of all guests. Of the 1,586 guests who comprised the traditional sample in this study, only 3% reported having posted a review about their August hotel stay on TripAdvisor or a similar website. Furthermore, when asked if they had posted a review regarding any hotel stay during the past 12 months, only 12% said “yes.” Clearly, the bulk of guests in the traditional sample have not made use of social media for the purpose of sharing their experiences with others, and therefore are not represented in data obtained from such social media sources.

Posters are behaviorally and demographically different from non-posters. In some respects, posters and non-posters are quite similar. For example, 52% of posters and 53% of non-posters said the purpose of their August 2011 trip was business. Also, when asked to estimate how many business versus leisure trips they make annually, responses of posters and non-posters were nearly identical:

• Mean annual business trips were 17.8 among posters, compared to 17.1 among non-posters.

• Mean annual leisure trips were 5.5 among posters, compared to 5.2 among non-posters.

However, posters differed significantly from non-posters in at least three ways:

1. Posters are more likely than non-posters to have researched website reviews/ratings while planning a trip during the past 12 months, and, as a rule, they do so more frequently:

• 97% of posters reported having checked a website while planning a trip during the past 12 months, compared to just 47% of non-posters.

• 64% of posters reported having checked a website five or more times while planning a trip during the past 12 months, compared to just 37% of non-posters.

2. Posters are more likely to be male:

• 67% of posters are male compared to 52% among non-posters.

3. Posters are more likely to be Gen-Xers than non-posters:

• 48% of posters were born between 1964 and 1984, compared to 34% of non-posters.

Generally speaking, whether sharing their own experiences or reading those of others, posters are more engaged with social media than non-posters. Also, they are demographically different from non-posters in at least two ways.

Ratings obtained from posters tell a different story than ratings obtained from non-posters. The two methods used to compare TripAdvisor with traditional ratings were also used to compare posters with non-posters. The results produced essentially the same findings: Without exception, both mean and top-box scores from posters were lower than the same scores obtained from non-posters, and the differences were statistically significant at the 95% confidence level for all top-box and most mean scores. To evaluate the relative importance of the four customer experience categories, we used the same basic analytical approach used to compare TripAdvisor with traditional data. Results revealed that posters and non-posters do differ with respect to the order of importance of the four experience categories. However, as was the case with the TripAdvisor and traditional sample comparison, the differences are not statistically significant, with the exception of “service,” which is significantly more important among non-posters.

Executive Summary

Do website ratings tell the same story as ratings captured through traditional survey methods? What proportion of consumers who have an opportunity to share experiences via website ratings actually do so? How similar are “posters” to “non-posters,” and do ratings from posters tell the same story as ratings from non-posters? This article addresses these questions and offers some recommendations for getting the most out of both Web-based and more traditional surveys.

Keep Your Options Open

Website ratings do not appear to tell the same story as traditional survey ratings. Web-based ratings are consistently less favorable than traditional survey ratings, and, when used in conjunction with estimates of the relative importance of alternative customer experience categories, lead to different conclusions.

A very small minority of hotel guests post on websites like TripAdvisor. They are similar to non-posters with respect to purpose and frequency of travel, but different with respect to age and gender, and in terms of their overall level of engagement with social media. Ratings from posters do not appear to tell the same story as ratings obtained from non-posters. Ratings from posters are consistently less favorable than ratings from non-posters and often lead to different conclusions.

Admittedly, these findings are based on a single study in one industry. Furthermore, only one website among the many available in the hotel industry was examined. Thus, only tentative conclusions can be drawn regarding the comparability of Web-based and traditional survey-based customer feedback. That said, the results of our research reveal clear differences between the two data sources. Furthermore, the results suggest posters are not representative of most customers, and that their feedback differs from that of non-posters.

In light of these findings, one would be hard-pressed to defend a recommendation that Web-based ratings may be used as a replacement for more traditional survey ratings. Also, while not directly examined in the research reported here, additional limitations of Web-based data make it difficult to defend such a recommendation.

• Sample sizes, particularly at the individual property or unit level, are frequently too small to support statistical analysis and inference. This is an especially critical consideration when incentives and/or compensation of managers and employees at the unit level are in play.

• Unlike traditional surveys, observations obtained from Web-based ratings may not be independent. To the extent that a customer reads and is influenced by ratings of others, his or her own ratings may change, distorting the experiential story he or she might otherwise have shared and, thus, introducing a source of bias into Web-based data.

• Relying on website ratings that are captured and controlled by a third party means that managers are forced to live with the categories and rating scales that are in place. Categories such as “value” and “service” lack the granularity typically needed by managers to make ratings actionable. Also, there is little opportunity for customization or exploration of critical ad hoc issues.

This is not to say that traditional surveys don’t have limitations. For example, declining response rates have been a concern for marketing researchers for many years now.

In our own research, only an 18% response rate was achieved using a traditional survey approach. Even though this produced a sample more than four times the size of the TripAdvisor sample, the traditional survey approach still failed to capture feedback from 82% of guests who were “qualified” to share their experiences. This begs the question, “What are we leaving on the table when we fail to hear from these guests?”

Simply put, no method of capturing the voice of the customer (VoC) is perfect. Each has its relative strengths and limitations. At least one consequence is that alternative methods and data sources may tell different—sometimes conflicting—stories. Managers who rely on any single source do so at their own peril.

So, what should marketing researchers and managers do?

• Rather than relying on traditional surveys or social media, capture and leverage data from both sources, along with other VoC listening posts.

• Select and apply specific VoC methods and data sources based on the appropriateness of each for addressing specific managerial questions and information needs.

• Develop and implement a process of VoC integration that incorporates a uniform set of customer experience categories, along with analytical techniques that make multiple data sources work together.

• Draw conclusions and support for decisions and actions based upon convergence of findings from multiple data sources.

Capturing and integrating VoC via multiple methods is increasingly being touted as a best practice. Results of the research described in this article support such an approach and, at the very least, should discourage managers from relying exclusively on any single VoC data source.

Perhaps the world in 2020 will have changed enough to force us to reconsider this conclusion—but for now, using social media to complement, rather than replace, traditional surveys seems the prudent way to go. MR

Note: Special thanks go to Jenny Anderson, Mary Barnidge, Laura Nicklin, Liz Reyer, William Ryback, Kurt Salmela and Jianping Yuan, without whose efforts this research would not have been possible.

D. RANDALL BRANDT is senior vice president, customer experience management at Maritz Research. He may be reached at Randy.Brandt@maritz.com.

