
4.5 ESTIMATION MODEL

4.5.1 Model diagnostics

Following the estimation of the binary logistic regression model parameters using the maximum likelihood estimator, it was essential to evaluate the significance of social media use, the interaction term and the control variables in predicting an individual’s adoption of mobile money services. As in Harrell (2001), a number of statistics were used for this evaluation: the odds ratio, pseudo R² equivalents, the log-likelihood ratio, the omnibus test of model coefficients, the Hosmer-Lemeshow goodness of fit test, the classification table and the Wald test.

4.5.1.1 Odds ratio

The binary logistic model estimation measures the link between the dichotomous dependent variable (mobile money adoption) and the predictors (social media and control variables) using the odds ratio. Hosmer and Lemeshow (1989) explain that the odds ratio is a measure of association between the binary outcome and an independent variable that gives a clear indication of how the risk of the outcome being present changes with the variable in question. The odds themselves are the probability that an event will occur (an individual’s adoption of mobile money technology) divided by the probability that it will not (non-adoption), and the odds ratio is the ratio of two such odds. O’Connell (2011) observes that odds ratios are bounded below by 0 but have no upper bound; thus, the odds ratio can range from 0 to infinity. The odds ratio formula, which indicates whether the chances of a success case are equal to those of failure, is given by:

$$\text{Odds Ratio} = \frac{\text{Odds of Case}}{\text{Odds of Non-Case}} \qquad (9)$$

Strong associations between independent variables and the outcome are typically represented by odds ratios further from 1 in either direction. A value less than 1 indicates that a unit increase in an independent variable, holding other variables constant, makes the outcome less likely to occur; a value greater than 1 indicates that a unit increase in the independent variable, holding other variables constant, makes the outcome more likely to occur (Muchabaiwa, 2013). The statistical significance of an odds ratio is typically analysed by testing whether the regression coefficient, β, is statistically different from zero through any one of the Wald, score, or likelihood ratio tests.
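The interpretation above can be illustrated with a minimal Python sketch. It is not part of the study's own analysis: the adoption probabilities and the coefficient value are hypothetical, and the function names are chosen only for this illustration.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def odds_ratio(p_case, p_non_case):
    """Ratio of the odds in one group to the odds in another group."""
    return odds(p_case) / odds(p_non_case)

def odds_ratio_from_beta(beta):
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(beta)

# Hypothetical adoption probabilities for social media users and non-users.
p_users = 0.80
p_non_users = 0.50

# Odds of 4 (users) against odds of 1 (non-users): a ratio well above 1.
ratio = odds_ratio(p_users, p_non_users)

# A coefficient of zero corresponds to an odds ratio of 1 (no association).
no_effect = odds_ratio_from_beta(0.0)

print(round(ratio, 3), no_effect)
```

An odds ratio above 1 here would be read as social media users having higher odds of adoption than non-users, which is the interpretation described in the text.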

4.5.1.2 R² Equivalents for logistic regression

One way of evaluating the effectiveness of a regression model is to calculate a statistic that measures the strength of the relationship between the explanatory variable(s) and the outcome (Kleinbaum and Klein, 2010). In linear regression analysis this statistic is the R² measure. However, Greene (2008) and Harrell (2001) note that in modelling binary or other discrete choices there is no direct counterpart to the R² goodness of fit statistic used in linear regression to assess the predictive power of a model. Instead, a pseudo R², interpreted in a similar way to the R² in multiple regression, is estimated. The pseudo R² in binary logistic regression lies between 0 and 1, with a value of 1 indicating that the fitted model accounts for 100% of the variance in the dependent variable (outcome), while a value of 0 means that it explains none of the variance (ibid). In IBM SPSS, the R² measure for binary logistic regression is estimated by the Cox and Snell (1989) R². Hosmer and Lemeshow (2000) observe that the value of the Cox and Snell pseudo R² cannot reach 1. Nagelkerke (1991), however, rescaled it so that it can reach 1: a value of 1 indicates a perfect fit whilst a value of 0 indicates that there is no relationship. Thus, the higher the R² value, the better the fit of the model.
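The relationship between the two pseudo R² measures can be sketched in Python using their standard log-likelihood formulas. This is only an illustration: the log-likelihood values and the sample size below are hypothetical, not results from this study, and the function names are assumptions made for the example.

```python
import math

def cox_snell_r2(ll_null, ll_full, n):
    """Cox and Snell pseudo R2: 1 - (L_null / L_full) ** (2 / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)

def nagelkerke_r2(ll_null, ll_full, n):
    """Cox and Snell R2 divided by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_full, n) / max_cox_snell

# Hypothetical log-likelihoods for a sample of 200 respondents.
n = 200
ll_null = -138.63   # constant-only model
ll_full = -100.00   # model with predictors

r2_cs = cox_snell_r2(ll_null, ll_full, n)
r2_n = nagelkerke_r2(ll_null, ll_full, n)
print(round(r2_cs, 3), round(r2_n, 3))
```

Because the Cox and Snell value is divided by a maximum that is below 1, the Nagelkerke value is always at least as large, and it equals 1 when the full model fits perfectly (a log-likelihood of zero).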

4.5.1.3 Log-likelihood Ratio

The log-likelihood ratio is a statistical measure used to compare the goodness of fit of two estimated models, that is, the null model with just the constant (β₀) and the full model obtained after adding the independent variables. Muchabaiwa (2013) argues that a decline in the log-likelihood statistic (reported as −2 log-likelihood) from the null to the full model is an indicator of improved goodness of fit of the model.

4.5.1.4 Omnibus Test of model coefficients

The omnibus test statistic is a measure of the overall model fit. Lawrence, Gamst, and Guarino (2006) and Muchabaiwa (2013) note that the omnibus test statistic is comparable to the F-test in linear regression. The null hypothesis, that the coefficients of all the predictors are jointly equal to zero, is rejected if the p-value of the omnibus test of model coefficients is below 0.05 (significance level). A significant test statistic suggests that the binary logistic regression is an adequate fit, and can therefore be used to model the observed data (ibid).
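The comparison of the null and full models can be sketched in Python as a likelihood ratio chi-square test, which is the form the omnibus test takes when computed from the two log-likelihoods. The log-likelihood values and the number of predictors are hypothetical, and the chi-square tail probability is computed with a small helper written for this example rather than a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def likelihood_ratio_test(ll_null, ll_full, n_predictors):
    """Chi-square statistic -2 * (LL_null - LL_full) and its p-value."""
    stat = -2.0 * (ll_null - ll_full)
    return stat, chi2_sf(stat, n_predictors)

# Hypothetical log-likelihoods and a model with five predictors.
lr_stat, lr_p = likelihood_ratio_test(-138.63, -100.00, 5)
print(round(lr_stat, 2), lr_p < 0.05)
```

The degrees of freedom equal the number of coefficients added to the null model, so a p-value below 0.05 leads to rejecting the hypothesis that all of them are zero.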

4.5.1.5 Hosmer-Lemeshow goodness of fit

An alternative method for assessing model fit is the Hosmer-Lemeshow goodness of fit test. This statistic compares the predicted values against the actual values of the dependent variable. The Hosmer-Lemeshow goodness of fit test is comparable to the chi-square test and forms several groups, referred to as deciles of risk, based on the estimated probabilities for the sample (O’Connell, 2011). A well-fitting model will have a small Hosmer-Lemeshow test statistic and a p-value that is greater than the 0.05 significance level (Hosmer and Lemeshow, 1989; 2000).
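The grouping into deciles of risk can be illustrated with a minimal Python sketch of the Hosmer-Lemeshow statistic. It assumes equal-sized groups formed by sorting on the predicted probability, at least one observation per group, and an even number of groups so that a simple closed form for the chi-square tail applies; statistical packages may form the groups differently. The data are hypothetical and constructed only to show a well-calibrated and a badly calibrated case.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even df >= 2")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df, p_value) with df = groups - 2."""
    pairs = sorted(zip(p, y))          # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in group
        expected = sum(pi for pi, _ in chunk)   # expected events in group
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical data: 10 groups of 11 cases; group g has probability (g + 1) / 11.
p_hat, y_good, y_bad = [], [], []
for g in range(10):
    events = g + 1
    p_hat += [events / 11.0] * 11
    y_good += [1] * events + [0] * (11 - events)   # matches predictions
    y_bad += [0] * events + [1] * (11 - events)    # contradicts predictions

stat_good, hl_df, p_good = hosmer_lemeshow(y_good, p_hat)
stat_bad, _, p_bad = hosmer_lemeshow(y_bad, p_hat)
print(p_good > 0.05, p_bad > 0.05)
```

In the well-calibrated case the observed and expected counts agree in every group, so the statistic is close to zero and the p-value is well above 0.05, which is the pattern the text describes for a good fit.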

4.5.1.6 Classification table

A classification table also measures the predictive accuracy of a binary logistic regression model (Muchabaiwa, 2013). This method involves cross-classifying the dependent variable y with a dichotomous variable derived from the fitted logistic probabilities (ŷ). The percentage of successes that have been correctly classified as such is referred to as the sensitivity of the model, while the percentage of failures that have been correctly classified is the specificity of the model (ibid). The failures that are incorrectly classified as successes are referred to as false positives, and the successes that are incorrectly classified as failures are referred to as false negatives (Sharma, 1996). Table 4.4 below shows a typical classification table.

Table 4.4: Classification Table

| Mobile Money Adoption Decision (observed) | Predicted: Yes (success) | Predicted: No Adoption (failure) | Percentage Correct |
|---|---|---|---|
| Yes (success) | a | b | a / (a + b) × 100 |
| No Adoption (failure) | c | d | d / (c + d) × 100 |
| Overall Percentage | | | (a + d) / (a + b + c + d) × 100 |

In Table 4.4 above, the ratio a / (a + b) × 100 is the sensitivity of the model, and d / (c + d) × 100 is the specificity of the model. High values for sensitivity and specificity are an indication of a good fit of the model (Muchabaiwa, 2013). Kutner et al. (2005) argue that if the model-fitting sample produces the same prediction error rate as the validation sample, then the fitted model will be reliable.
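The cells of Table 4.4 and the percentages derived from them can be sketched in Python as follows. The observed outcomes and fitted probabilities are hypothetical, and the 0.5 classification cut-off is an assumption made for this illustration.

```python
def classification_counts(y_true, p_hat, cutoff=0.5):
    """Return the cells (a, b, c, d) of the classification table."""
    a = b = c = d = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            a += 1      # success correctly classified
        elif y == 1:
            b += 1      # success classified as failure (false negative)
        elif predicted == 1:
            c += 1      # failure classified as success (false positive)
        else:
            d += 1      # failure correctly classified
    return a, b, c, d

def percentages(a, b, c, d):
    """Sensitivity, specificity and overall percentage correct."""
    sensitivity = 100.0 * a / (a + b)
    specificity = 100.0 * d / (c + d)
    overall = 100.0 * (a + d) / (a + b + c + d)
    return sensitivity, specificity, overall

# Hypothetical observed adoption decisions and fitted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
p_hat = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]

cells = classification_counts(y_true, p_hat)
sensitivity, specificity, overall = percentages(*cells)
print(cells, sensitivity, specificity, overall)
```

Changing the cut-off trades sensitivity against specificity, so the percentages should always be read together with the cut-off that produced them.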

4.5.1.7 Wald test

The Wald statistic is employed to evaluate the significance of individual logistic regression coefficients, specifically whether the explanatory variable’s coefficient is significantly different from zero. For the Wald test, the parameter estimate for the effect of each independent variable in the binary logistic model is divided by its standard error, and the result is squared to give a value that follows the chi-square distribution with one degree of freedom under the null hypothesis of no effect (O’Connell, 2011). IBM SPSS reports the Wald chi-square statistic for each variable in the fitted model. The null hypothesis is rejected if the p-value of the Wald test is below 0.05 (significance level); a coefficient with a p-value of less than 0.05 implies that the variable is significant in the model (Muchabaiwa, 2013).
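The calculation described above can be illustrated with a short Python sketch. The coefficient and standard error are hypothetical values, not estimates from this study; the p-value uses the fact that a chi-square variate with one degree of freedom is the square of a standard normal variate.

```python
import math

def wald_test(beta, standard_error):
    """Wald chi-square statistic (1 df) and its p-value for one coefficient."""
    wald = (beta / standard_error) ** 2
    # Upper tail of chi-square(1) equals the two-sided standard normal tail.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient and standard error for one predictor.
wald, p_value = wald_test(0.98, 0.50)
print(round(wald, 4), round(p_value, 3))
```

With these hypothetical values the ratio of the coefficient to its standard error is 1.96, so the p-value sits almost exactly at the 0.05 threshold; a larger ratio would give a smaller p-value.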

In document Social media and mobile money adoption: comparative evidence from South Africa and Zimbabwe (Page 98-102)