4.3 Research process
4.3.7 Analysis of data
4.3.7.2 Inferential statistical analyses
Inferential statistics are used to infer a finding about the population from which the sample was drawn, based on the characteristics of the sample (Van Zyl, 2014:177). Inferential analyses were done on the data gathered on all three logistical supply chain drivers to determine whether the industry in which the respondents operate, influences their responses. When conducting inferential statistical analyses, the corresponding p-value for each test should be considered. The p-value reports the extent to which the test statistics disagree with the null hypotheses (Williams et al., 2006:359). Cooper and Schindler (2011:462) explain that “the p-value is the probability of observing a sample value as extreme as, or more extreme than, the value actually observed, given that the null hypotheses is true.” The calculated p-value is compared to the level of significance and based on this comparison, the null hypotheses is accepted or rejected. The significance level of a test is defined as the probability of making a decision to reject the null hypothesis, when the null hypothesis is actually true (a decision known as a Type 1 error).
In this study the chosen level of significance was 0.05. Therefore, for a statistical test to be deemed statistically significant, the p-value had to be lower or equal to 0.05 (p ≤ 0.05). In this study three inferential statistical tests were conducted namely, the Kruskal-Wallis test, the Pearson Chi-Square test and binary logistic regression. Each test is explained in more detail below.
a) The Kruskal-Wallis Test
The Kruskal-Wallis test is a one-way analysis of variances by rank test. The Kruskal-Wallis test is a nonparametric test, which is used to compare the medians of three or more independent variables (Cooper and Schindler, 2014:617). The Kruskal-Wallis test was used to determine whether there is a statistical significant difference between the six industries with regard to the responses relating to the questions on the three logistical supply chain drivers. The nonparametric test was used because the data being measured was on an ordinal scale. If the p-value of the
Kruskal-Wallis test was higher than 0.05 (p ˃ 0.05), no statistically significant difference existes at the 5% level, between the different industry groups (refer to table 4.6) with regard to the specific question; therefore, the industry group in which the respondents operate do not influence their responses. On the other hand, if the p-value was lower or equal to 0.05 (p ≤ 0.05), a statistically significant difference between the six industry groups, with regard to the responses relating to the specific question, exists. To further investigate the differences between the six industry groups with regard to the specific question (where a statistical significant difference exists), the mean ranks generated by the Kruskal-Wallis test were considered. The mean ranks are described as the sum of the ranks, assigned in ascending order to all observations, for each specific subgroup, divided by the number of observations for each subgroup/category. A mean rank does not indicate a fixed point on a scale, but only shows a tendency to the left or right anchor points of a scale.
Based on the relative position of the mean rank, it was construed that the industry with the highest mean rank shows a higher frequency or level of importance (on the scale of either never, sometimes and always or on the scale of unimportant to very important) compared to other industry groups.
b) The Pearson Chi-Square Test
The Pearson Chi-Square test checks for a significant difference between the observed distribution of data in the categories and the expected distribution, based on the null hypotheses (Cooper and Schindler, 2011:469). The Pearson Chi-Square test was used to determine whether there is an association between the six industry groups (see table 4.6) and the responses relating to the questions regarding the three logistical supply chain drivers. The nonparametric test was used as the data being measured was on a nominal scale. As with the Kruskal-Wallis test, if the p-value of the Pearson Chi-Square test was higher than 0.05 (p ˃ 0.05), no statistically significant association exists at the 5% level among the different industry groups and the specific question; therefore, the industry group (refer to table 4.6) in which the respondent operates do not influence their responses. Then again, if the p-value was lower or equal to 0.05 (p ≤ 0.05), a statistically significant association do exist between the industry groups and the specific question.
c) Binary logistic regression model
Logistic regression is a statistical method used to determine whether one or more independent variables can be used to predict a dichotomous dependent variable. Before the binary logistic regression model could be developed, preparatory principal component analyses were employed to reduce the data on the three logistical supply chain drivers.
Principal component analysis is a technique that is used to examine patterns of relationships between selected variables, in order to establish if there is an underlying combination of the original variables that can summarise the set of variables. Basically, a principal component analysis is performed in order to create a more manageable number of variables from a larger set (Cooper and Schindler, 2014:657). The principal component analysis in this study resulted in valid and reliable factors, as well as independent items, which were then used to develop the two binary logistic regression models (see chapter 6). Principal component extraction and varimax rotation were used to determine the factor structure of the questions; where unidimensionality measured the extent to which the questions (referred to as items – see section 6.2.2) related to a specific aspect. When conducting principal component analyses, some general guidelines are pertinent - of which the following relate to this study:
To ensure sampling adequacy, two measures were used to determine whether a principle component analysis could be conducted on the questions (items), namely the Kaiser-Meyer-Olkin test and the Bartlett’s test of sphericity. If the Kaiser-Kaiser-Meyer-Olkin value was higher or equal to 0.5 (p ≥ 0.05), and the Bartlett’s test of sphericity was significant (p=0.000), a principle component analysis could be conducted on the question.
The number of factors identified by the principal component analysis was identified through the Eigenvalue criterion of Eigenvalues greater than one.
Once a factor(s) had been identified by the principle component analysis, the internal consistency of the factor(s) was considered by examining the Cronbach Alpha value of each factor(s). The Cronbach Alpha value measured how closely related the different questions (items) within a group were to each other. A value above the exploratory threshold of 0.6 was deemed as satisfactory.
After the data on the three logistical supply chain drivers were reduced, two binary logistic regression models were developed. Using regression terminology, the variable that is being predicted is referred to as the dependent variables and the variables used for the prediction are referred to as independent variables (Williams et al., 2006:561). When developing a binary logistic regression model the independent variables can only predict the probability of a binomial outcome (one of two possible outcomes) for the dependent variable. The following should be noted when developing a binary logistic regression model:
To determine adequacy of fit, the Hosmer and Lemeshow test is used. The Hosmer and Lemeshow test uses a null hypothesis to determine whether the fitted model is adequate (goodness of fit) by providing a p-value between 0 and 1. The hypothesis for the Hosmer and Lemeshow test can be stated as follows:
H0: There is a goodness of fit of the logistic regression model H1: There is no goodness of fit of the logistic regression model
If the p-value is higher than 0.05, H0 cannot be rejected and the logistic regression can be assumed to be adequate. The higher the p-value of the Hosmer and Lemeshow test, the better the fit of the logistic regression model (Allison, 2014:1).
The overall correct prediction classification of the binary logistic regression model indicates the accuracy of the prediction regarding the dependent variable of the model. The overall correct prediction classification is calculated for both possible outcomes of the dependent variables.
If the Hosmer and Lemeshow test indicates a goodness of fit (H0 is valid), the odds ratios of the regression model are considered. Independent variables with a p-value lower or equal to 0.05 (p≤0.05) are statistically significant predictors of the dependent variable of the regression model and only the odds ratios of these independent variables are considered.
An odds ratio indicates whether the independent variable influences the dependent variable positively by having an odds ration larger than 1, or negatively by having an odds ratio lower than 1. The decimals of the odds ratio indicate the percentage of the positive or negative influence of the independent variable on the dependent variable. To calculate the
percentage positive influence of the independent variable on the dependent variable, the following calculation should be conducted:
Odds ratio – 1 x 100 = % positive influence on the dependent variable
To calculate the percentage negative influence of the independent variable on the dependent variable, the following calculation should be conducted:
1
odds ratio = % negative influence on the dependent variable
When data is statistically analysed, three requirements need to be considered, namely validity, reliability and practicality.