CHAPTER 4: CONTENT ANALYSIS RESULTS
4.4 Statistical Analysis of the Relationship between Credibility and Credibility Factors
4.4.2 Regression Analysis
The Yahoo! Answers dataset contained 15 credibility factors, and a full model was fitted with these factors as independent variables (see Table 17). All independent variables were rescaled by z-transformation to obtain standardized regression coefficients; when independent variables are on different scales, the coefficients favor variables with larger numeric ranges and cannot be compared directly. Together, the 15 independent variables explained 80.12% of the variance in credibility (F(15, 474) = 130, p < 2.2E-16). The full model was therefore statistically significant in predicting the credibility of answers in Yahoo! Answers. The adjusted R² (Wherry 2) was 0.795.
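The effect of the z-transformation can be illustrated with a minimal sketch. The data below are hypothetical (a binary-coded factor and a rating on a 1 to 7 scale) and are not taken from the dataset; the function names are illustrative only. The sketch assumes the sample standard deviation (n - 1 denominator), as R's scale function uses by default.

```python
import math

def z_scores(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data, for illustration only.
factor = [0, 1, 1, 0, 1, 0, 1, 1]   # binary-coded credibility factor
rating = [2, 6, 5, 3, 7, 2, 4, 6]   # credibility rating on a 1-7 scale

raw_slope = ols_slope(factor, rating)                      # depends on the rating scale
std_slope = ols_slope(z_scores(factor), z_scores(rating))  # unit-free, between -1 and 1

print(f"raw slope: {raw_slope:.3f}, standardized slope: {std_slope:.3f}")
```

The raw slope changes if the rating scale changes, whereas the standardized slope does not, which is why standardized coefficients can be compared across factors.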
The t-statistic for each variable in the regression analysis compares two models that differ only in that variable: the variable under consideration is either dropped from the “larger” model or added to the “smaller” model. The t-statistics in Table 17 indicated that five credibility factors (honesty, presentation, first-hand experience, currency, and credence) did not contribute above and beyond the other variables; that is, their coefficients were not statistically significant at the α = 0.05 level. The other ten independent variables made statistically significant contributions to the model. The factor with the largest coefficient was plausibility, followed by comprehensiveness. Four factors were unique to Yahoo! Answers, and 11 factors applied to both types of social media. Of the four unique factors, three had a statistically significant effect. Of the 11 common factors, seven had a statistically significant effect.
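The link between a coefficient, its standard error, and its t-value can be sketched as follows, using the rounded sympathy and first-hand experience entries from Table 17. The p-value here relies on a standard normal approximation, which is an assumption of this sketch; with 474 residual degrees of freedom it is close to, but not identical to, the t distribution behind the reported values.

```python
import math

def t_value(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error

def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumed here)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Rounded values transcribed from Table 17.
t_sympathy = t_value(0.079, 2.26e-02)      # table reports 3.509
t_first_hand = t_value(0.039, 2.60e-02)    # table reports 1.506

for name, t in [("Sympathy", t_sympathy), ("First-hand experience", t_first_hand)]:
    p = two_sided_p_normal(t)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t = {t:.3f}, approx. p = {p:.4f} ({verdict} at 0.05)")
```

Small differences from the tabled t-values arise because the table prints rounded coefficients.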
Table 17. Linear Regression with the Full Model (Yahoo! Answers)

Social media type              Variable                Raw coefficient   Std. error   t-value   p-value
                               Constant                1.19E-15
Unique (Yahoo! Answers)        Plausibility            0.428             2.77E-02     15.438    < 2E-16***
                               Relevance               0.156             2.48E-02     6.307     6.41E-10***
                               Sympathy                0.079             2.26E-02     3.509     4.93E-04***
                               First-hand experience   0.039             2.60E-02     1.506     0.133
Both (Yahoo! Answers and Yelp) Comprehensiveness       0.285             2.42E-02     11.741    < 2E-16***
                               Objectiveness           0.186             2.51E-02     7.419     5.34E-13***
                               Negativeness            -0.077            2.17E-02     -3.554    4.17E-04***
                               Promotion               -0.07             2.23E-02     -3.137    1.81E-03**
                               Positiveness            0.06              2.14E-02     2.809     5.17E-03*
                               Reference               0.05              2.07E-02     2.415     1.61E-02*
                               Expertise               0.049             2.23E-02     2.217     2.71E-02*
                               Presentation            0.04              2.11E-02     1.893     5.90E-02
                               Honesty                 0.024             2.71E-02     0.882     0.378
                               Currency                0.017             2.13E-02     0.775     0.439
                               Credence                0.016             2.09E-02     0.779     0.437

* significant at p < 0.05; ** significant at p < 0.005; *** significant at p < 0.001

The results of the regression analysis were not fully consistent with those of the correlation analysis: in the correlation analysis, only reference had a statistically insignificant correlation with credibility, whereas several factors were not statistically significant in the regression model. There are several possible reasons for this discrepancy. First, the two analyses examine different relationships. Correlation analysis examines the bi-directional association between two variables, whereas regression analysis examines the uni-directional relationship from the independent variables to the dependent variable with the other factors in the model held constant, so the influence attributed to each factor is inevitably reduced. Second, multicollinearity among the credibility factors could be a reason, because correlation analysis examines the one-to-one relationship between each independent variable and the dependent variable, whereas regression analysis examines the one-to-many relationship.
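The way a factor's influence can shrink when it moves from a one-to-one correlation to a model with other factors can be sketched for the simplest case of two standardized predictors, where the standardized coefficients follow directly from the three pairwise correlations. The correlation values below are hypothetical and chosen only for illustration; the function name is not from the source.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for a regression of y on two predictors.

    r_y1, r_y2: correlations of each predictor with the dependent variable.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical correlations, for illustration only.
beta1, beta2 = standardized_betas(r_y1=0.30, r_y2=0.60, r_12=0.45)
print(f"beta1 = {beta1:.3f} (marginal correlation 0.30)")
print(f"beta2 = {beta2:.3f} (marginal correlation 0.60)")

# With uncorrelated predictors the coefficients equal the correlations.
print(standardized_betas(0.30, 0.60, 0.0))
```

In this hypothetical case the first predictor has a moderate correlation with the dependent variable but a coefficient near zero once the second predictor is in the model, even though the overlap between the predictors is modest.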
The variance inflation factor (VIF) was measured to examine how much the variance of each coefficient was inflated by correlation among the factors. The vif function of the R car package was used for the calculation. As a general rule of thumb, VIF values above 4 require further investigation, and values above 10 indicate severe multicollinearity that needs correction (O’Brien, 2007). In the Yahoo! Answers dataset, the VIFs ranged from 1.041 to 1.867 (Table 18), so multicollinearity did not appear to be present. Thus, multicollinearity between credibility factors was not investigated further.
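A minimal sketch of how VIFs can be computed is given below. It is not the car implementation; it uses the standard identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The three binary predictors are hypothetical, and the helper names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(predictors):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(predictors)
    corr = [[pearson(predictors[i], predictors[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]

# Hypothetical binary-coded factors, for illustration only.
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
c = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1]

vif_values = vifs([a, b, c])
for name, v in zip("abc", vif_values):
    note = "investigate" if v > 4 else "no concern"   # rule of thumb from the text
    print(f"VIF({name}) = {v:.3f} ({note})")
```

A VIF of 1 means a factor is uncorrelated with the others; the values reported in Table 18 are all below 2, well under the threshold of 4.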
Table 18. Variance Inflation Factors (Yahoo! Answers)

Credibility Factor      Variance Inflation Factor   Credibility Factor   Variance Inflation Factor
Plausibility            1.867                       Expertise            1.208
Honesty                 1.783                       Negativeness         1.145
First-hand experience   1.646                       Positiveness         1.118
Objectiveness           1.530                       Currency             1.100
Relevance               1.497                       Presentation         1.083
Comprehensiveness       1.430                       Credence             1.060
Sympathy                1.242                       Reference            1.041
Promotion               1.211
In regression analysis, it is essential to check whether the credibility factors are normally distributed, but this is hard to check here because they were all coded as binary: the data points lie only at the upper and lower ends, with none in the center. Instead, bootstrapping was applied to the regression model to check whether the normality assumption could be problematic. The Boot function of the R car package was used with 100,000 iterations. In Table 19, the second column gives the original coefficients from the linear regression with the full model, the third column gives the bootstrapped coefficients, and the fourth and fifth columns give the lower and upper bounds of the 95% confidence interval (CI). The table is sorted in descending order of bootstrapped coefficients. All the bootstrapped coefficients were similar to the original coefficients and fell within the 95% CIs. Thus, the regression results did not appear to be affected by non-normality in the Yahoo! Answers dataset.
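The bootstrapping idea can be sketched in a few lines. This is not the procedure used in the study: it is a case-resampling bootstrap with a percentile interval for a single-predictor regression, written with the Python standard library. The data are hypothetical, the function names are illustrative, and 2,000 resamples are used instead of 100,000 only to keep the sketch fast.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slope(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Return (mean bootstrapped slope, lower bound, upper bound) of a percentile CI."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:                         # predictor constant: slope undefined
            continue
        ys = [y[i] for i in idx]
        estimates.append(slope(xs, ys))
    estimates.sort()
    low = estimates[round((alpha / 2) * n_boot)]
    high = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return sum(estimates) / n_boot, low, high

# Hypothetical data: a binary-coded factor and a credibility rating.
x = [0, 1] * 15
y = [3, 5, 2, 6, 4, 5, 3, 7, 2, 6, 4, 6, 3, 5, 2,
     7, 4, 6, 3, 5, 3, 6, 2, 5, 4, 7, 3, 6, 2, 5]

original = slope(x, y)
boot_mean, ci_low, ci_high = bootstrap_slope(x, y)
print(f"original = {original:.3f}, bootstrapped = {boot_mean:.3f}, "
      f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

If the bootstrapped coefficient is close to the original and the interval is stable, the inference does not hinge on the normality assumption, which is the reasoning applied to Table 19.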
Table 19. Bootstrapped Linear Regression with the Full Model (Yahoo! Answers)

Variable                Original coefficient   Bootstrapped coefficient   CI 2.5%     CI 97.5%
Constant                1.19E-15               3.41E-04                   -4.00E-02   3.94E-02
Plausibility            0.428                  0.428                      0.358       0.499
Comprehensiveness       0.285                  0.284                      0.242       0.326
Objectiveness           0.186                  0.186                      0.134       0.240
Relevance               0.156                  0.157                      0.099       0.215
Sympathy                7.93E-02               7.99E-02                   3.94E-02    0.120
Positiveness            6.02E-02               5.96E-02                   2.18E-02    9.86E-02
Expertise               4.94E-02               4.91E-02                   1.54E-02    8.36E-02
Presentation            3.99E-02               4.67E-02                   6.83E-03    7.68E-02
Reference               4.99E-02               4.96E-02                   1.58E-02    8.31E-02
First-hand experience   3.92E-02               3.92E-02                   -1.34E-02   9.12E-02
Honesty                 2.39E-02               2.43E-02                   -2.78E-02   7.58E-02
Credence                1.63E-02               1.68E-02                   -1.31E-02   4.58E-02
Currency                1.65E-02               1.61E-02                   -1.55E-02   4.93E-02
Promotion               -7.00E-02              -6.98E-02                  -0.121      -2.01E-02
Negativeness            -7.71E-02              -7.72E-02                  -0.133      -2.30E-02
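The two readings of Table 19 used in the text (whether each bootstrapped coefficient lies inside its interval, and whether the interval excludes zero) can be checked mechanically. The sketch below transcribes only five rows of the table as an example; the helper names are illustrative.

```python
# (variable, original coefficient, bootstrapped coefficient, CI 2.5%, CI 97.5%)
# Five rows transcribed from Table 19.
rows = [
    ("Plausibility",          0.428,     0.428,     0.358,     0.499),
    ("Sympathy",              7.93e-02,  7.99e-02,  3.94e-02,  0.120),
    ("First-hand experience", 3.92e-02,  3.92e-02, -1.34e-02,  9.12e-02),
    ("Honesty",               2.39e-02,  2.43e-02, -2.78e-02,  7.58e-02),
    ("Negativeness",         -7.71e-02, -7.72e-02, -0.133,    -2.30e-02),
]

def within_interval(value, low, high):
    """True if the value lies inside the closed interval [low, high]."""
    return low <= value <= high

def excludes_zero(low, high):
    """True if the interval lies entirely above or entirely below zero."""
    return low > 0 or high < 0

all_within = all(within_interval(boot, low, high) for _, _, boot, low, high in rows)
nonzero = [name for name, _, _, low, high in rows if excludes_zero(low, high)]

print("all bootstrapped coefficients within their CIs:", all_within)
print("intervals excluding zero:", nonzero)
```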
Yelp
The Yelp dataset contained 14 credibility factors, and a full model was fitted with these independent variables (see Table 20). All independent variables were rescaled by z-transformation. Together, the 14 independent variables explained 41.68% of the variance in credibility (F(14, 485) = 24.75, p < 2.2E-16). The full model was statistically significant in predicting the credibility of Yelp reviews. The adjusted R² (Wherry 2) was 0.4.
The t-statistics indicated that eight credibility factors (expertise, credence, reference, excessive negativeness, positiveness, presentation, currency, and promotion) did not contribute above and beyond the other variables; their coefficients were not statistically significant at the α = 0.05 level. The other six independent variables made statistically significant contributions to the model. The factor with the largest coefficient was objectiveness, followed by specificity. Three factors were unique to Yelp, and 11 factors applied to both types of social media. Of the three unique factors, two had a statistically significant effect. Of the 11 common factors, four had a statistically significant effect. More details are summarized in Table 20.
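The overall fit statistics reported for the Yelp model are linked by standard formulas, sketched below. Two assumptions are made here: the sample size n = 500 is inferred from the reported degrees of freedom (14 and 485), and the adjustment shown is the common form of adjusted R²; the source does not spell out the exact variant it labels "Wherry 2", although this form reproduces the rounded figure.

```python
def adjusted_r_squared(r2, n, k):
    """Common adjusted R-squared for n cases and k predictors (assumed variant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic on (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Reported for Yelp: R-squared = 0.4168 with 14 predictors and 485 residual df.
k = 14
n = 485 + k + 1          # inferred sample size: 500
r2 = 0.4168

adj_r2 = adjusted_r_squared(r2, n, k)
f_value = f_statistic(r2, n, k)
print(f"adjusted R-squared = {adj_r2:.3f}")        # reported as 0.4
print(f"F({k}, {n - k - 1}) = {f_value:.2f}")      # reported as 24.75
```

The small gap between the computed and reported F values is consistent with R² having been rounded to four decimal places.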
Table 20. Linear Regression with the Full Model (Yelp)

Social media type              Variable                 Raw coefficient   Std. error   t-value   p-value
                               Constant                 3.92E-16
Unique (Yelp)                  Specificity              0.291             4.03E-02     7.212     2.13E-12***
                               Excessive Positiveness   0.115             4.18E-02     2.748     6.22E-03*
                               Excessive Negativeness   -3.88E-02         4.58E-02     -0.846    0.398
Both (Yahoo! Answers and Yelp) Objectiveness            0.373             4.05E-02     9.207     < 2E-16***
                               Honesty                  0.176             3.73E-02     4.722     3.06E-06***
                               Comprehensiveness        0.117             3.75E-02     3.111     1.98E-03**
                               Credence                 3.00E-02          3.56E-02     0.844     0.399
                               Positiveness             2.95E-02          3.96E-02     0.734     0.464
                               Reference                1.92E-02          3.48E-02     0.551     0.582
                               Currency                 4.88E-03          3.61E-02     0.135     0.892
                               Presentation             -5.24E-03         3.52E-02     -0.149    0.882
                               Expertise                -2.56E-02         3.50E-02     -0.729    0.466
                               Promotion                -6.54E-02         3.51E-02     -1.863    6.31E-02
                               Negativeness             -9.02E-02         4.52E-02     -1.995    4.66E-03**

* significant at p < 0.05; ** significant at p < 0.005; *** significant at p < 0.001
The differences between the results of the correlation analysis and the regression analysis were smaller in the Yelp dataset than in the Yahoo! Answers dataset. Currency was statistically significant in the correlation analysis but not in the regression analysis. Negativeness and excessive positiveness were statistically significant in the regression analysis but not in the correlation analysis. Promotion was the only factor negatively correlated with credibility in the correlation analysis, whereas four more factors (presentation, negativeness, excessive negativeness, and expertise) had negative coefficients in the regression analysis. Reviews that are well presented and written by experts would be expected to have a positive relationship with credibility, but these factors had no statistically significant effects.
VIF was measured to examine the degree of variance inflation caused by the credibility factors, although there was no obvious indication of significant multicollinearity. In the Yelp dataset, the VIFs ranged from 1.008 to 1.745, so multicollinearity did not appear to be present. Thus, multicollinearity between credibility factors was not investigated further. Further details about the VIFs are given in Table 21.
Table 21. Variance Inflation Factors (Yelp)

Credibility Factor       Variance Inflation Factor   Credibility Factor       Variance Inflation Factor
Objectiveness            1.867                       Promotion                1.211
Credence                 1.783                       Currency                 1.208
Positiveness             1.646                       Expertise                1.145
Specificity              1.530                       Presentation             1.118
Comprehensiveness        1.497                       Negativeness             1.100
Honesty                  1.430                       Excessive Negativeness   1.060
Excessive Positiveness   1.242                       Reference                1.041
Bootstrapping was applied to the regression model to check for a potential violation of the normality assumption, again with 100,000 iterations. In Table 22, the second column gives the original coefficients from the linear regression with the full model, the third column gives the bootstrapped coefficients, and the fourth and fifth columns give the lower and upper bounds of the 95% confidence interval (CI). The table is sorted in descending order of bootstrapped coefficients. All the bootstrapped coefficients were similar to the original coefficients and fell within the 95% CIs. Thus, the regression results did not appear to be affected by non-normality in the Yelp dataset either.
Table 22. Bootstrapped Linear Regression with the Full Model (Yelp)

Variable                 Original coefficient   Bootstrapped coefficient   CI 2.5%     CI 97.5%
Constant                 3.92E-16               9.21E-04                   -6.72E-02   6.91E-02
Objectiveness            0.373                  0.395                      0.291       0.446
Specificity              0.291                  0.297                      0.22        0.376
Honesty                  0.176                  0.174                      0.103       0.244
Excessive Positiveness   0.115                  0.125                      3.67E-02    0.218
Comprehensiveness        0.117                  0.116                      4.54E-02    0.186
Credence                 3.00E-02               5.59E-02                   -4.02E-02   0.103
Positiveness             2.95E-02               5.24E-02                   -1.65E-02   0.198
Reference                1.92E-02               1.92E-02                   -6.96E-02   8.37E-02
Presentation             -5.24E-03              4.85E-03                   -5.61E-02   5.32E-02
Currency                 4.88E-03               3.22E-03                   -6.50E-02   7.11E-02
Expertise                -2.56E-02              -2.60E-02                  -6.16E-02   1.01E-02
Excessive Negativeness   -3.88E-02              -3.02E-02                  -0.125      6.92E-02
Promotion                -6.54E-02              -6.56E-02                  -1.867      5.77E-02
Negativeness             -9.02E-02              -8.30E-02                  -0.174      7.57E-02