
6.2 Analysis of variables

6.2.3 Multiple regression analysis

A stepwise regression analysis was carried out to determine which of the variables and factors were the strongest predictors of perceived budgeting accuracy. At each step the stepwise method enters the predictor with the highest t-statistic into the model, until no remaining predictor has a significance of <0.05. The requirements are that the residuals are normally distributed and that there is no significant correlation between the independent variables (i.e. no multicollinearity).

Table 6.4 Regression model output for the dependent variable of perceived budgeting accuracy

| Step | Variable | Unstandardized β | Std. Error | Standardized Beta | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.884 | .662 | | 2.848 | .006 | | |
| 1 | SACC Accuracy of student number estimates | .473 | .107 | .472 | 4.416 | .000 | 1.000 | 1.000 |
| 2 | (Constant) | 2.342 | .626 | | 3.739 | .000 | | |
| 2 | SACC Accuracy of student number estimates | .465 | .099 | .464 | 4.690 | .000 | 1.000 | 1.000 |
| 2 | DFWD Difficulty caused by allowing carry forward | -.265 | .076 | -.347 | -3.506 | .001 | 1.000 | 1.000 |
| 3 | (Constant) | 1.660 | .691 | | 2.401 | .019 | | |
| 3 | SACC Accuracy of student number estimates | .375 | .106 | .374 | 3.542 | .001 | .835 | 1.197 |
| 3 | DFWD Difficulty caused by allowing carry forward | -.230 | .076 | -.301 | -3.042 | .003 | .951 | 1.052 |
| 3 | FACC Forecasting accuracy | .265 | .126 | .228 | 2.108 | .039 | .799 | 1.252 |
| 4 | (Constant) | 2.305 | .733 | | 3.142 | .003 | | |
| 4 | SACC Accuracy of student number estimates | .372 | .103 | .371 | 3.614 | .001 | .835 | 1.197 |
| 4 | DFWD Difficulty caused by allowing carry forward | -.225 | .073 | -.295 | -3.068 | .003 | .950 | 1.052 |
| 4 | FACC Forecasting accuracy | .283 | .123 | .243 | 2.311 | .024 | .795 | 1.258 |
| 4 | TIMP Time spent preparing the budget | -.048 | .022 | -.207 | -2.196 | .032 | .995 | 1.005 |

Tolerance and VIF are the collinearity statistics.

Note: R² = .223 & F = 19.5 for model 1; R² = .343 & F = 17.5 for model 2; R² = .385 & F = 13.8 for model 3; R² = .429 & F = 12.2 for model 4.


(Note: The value for R is the Pearson correlation between the actual and predicted values. R² is the proportion of the variance in the outcome accounted for by those predicted values, and so shows how well the model fits the data. The F value is the ratio of the improvement in prediction that results from fitting each iterative model to the inaccuracy that remains in it, and is reported together with its significance level. A good model should have a large F ratio, i.e. at least greater than 1.)
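The relationship between R² and F described in the note can be made concrete with the standard overall F formula for a regression with k predictors and n cases, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below is illustrative only: the function name is hypothetical, and the sample size n = 70 is an assumption (it is not stated in this section), chosen so that the output can be set beside the F values reported under Table 6.4.

```python
def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall F statistic for a regression with k predictors and n cases."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    df_model = k              # degrees of freedom for the model
    df_resid = n - k - 1      # degrees of freedom for the residual
    if df_resid <= 0:
        raise ValueError("n must exceed k + 1")
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


# R-squared values reported under Table 6.4; model k has k predictors.
reported_r2 = {1: 0.223, 2: 0.343, 3: 0.385, 4: 0.429}

ASSUMED_N = 70  # hypothetical sample size, NOT given in the text

for k, r2 in reported_r2.items():
    print(f"model {k}: F = {f_from_r2(r2, k, ASSUMED_N):.1f}")
```

Under the assumed n the computed values come out close to the reported F ratios; a different sample size would give different values, so this is a consistency check on the formula rather than a reconstruction of the SPSS output.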

The model parameters in Table 6.4 show the unstandardised β, which indicates the individual contribution of each predictor to the model. The standardised β gives the number of standard deviations by which the outcome will change as a result of a one standard deviation change in the predictor, and so gives an insight into the importance of the predictor. As the predictors in the research data were measured on different scales, their relative strengths are compared by standardising them (the beta coefficients).
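The two kinds of coefficient are linked by the standard identity Beta = β × (s_x / s_y), where s_x and s_y are the standard deviations of the predictor and the outcome. A minimal sketch follows. The function names are illustrative, and because the standard deviations are not reported in this section, the example only back-calculates the ratio s_x / s_y implied by the step 4 rows of Table 6.4.

```python
def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Convert an unstandardised coefficient to a standardised one."""
    if sd_y <= 0 or sd_x <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y


def implied_sd_ratio(b: float, beta: float) -> float:
    """Ratio s_x / s_y implied by a reported (b, beta) pair."""
    if b == 0:
        raise ValueError("ratio is undefined when b is zero")
    return beta / b


# (unstandardised, standardised) pairs from step 4 of Table 6.4
step4 = {
    "SACC": (0.372, 0.371),
    "DFWD": (-0.225, -0.295),
    "FACC": (0.283, 0.243),
    "TIMP": (-0.048, -0.207),
}

for name, (b, beta) in step4.items():
    print(f"{name}: implied s_x / s_y = {implied_sd_ratio(b, beta):.3f}")
```

The implied ratios differ considerably between predictors, which is one way of seeing why coefficients measured on different scales are standardised before being compared.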

For the dependent variable of perceived budgeting accuracy the specific model would be:

Budgeting accuracy = 0.372 SACC – 0.225 DFWD + 0.283 FACC – 0.048 TIMP + E (error)

(The constant from the unstandardised scores of 2.305 is excluded as the mean of standardised scores is zero.)

VIF and tolerance statistics are used to assess whether there is collinearity. Here these statistics are within the required parameters of VIF <10 and tolerance >0.2, as suggested by Field (2013). The F ratio is also >1.
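Tolerance and VIF carry the same information: for predictor j, tolerance = 1 − R_j² (where R_j² comes from regressing that predictor on the others) and VIF = 1 / tolerance. The following sketch applies the thresholds quoted above to the step 4 tolerances from Table 6.4; the function and constant names are illustrative rather than part of the original analysis.

```python
VIF_MAX = 10.0        # threshold quoted in the text (Field, 2013)
TOLERANCE_MIN = 0.2   # threshold quoted in the text (Field, 2013)


def vif_from_tolerance(tolerance: float) -> float:
    """VIF is the reciprocal of tolerance."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1.0 / tolerance


def collinearity_ok(tolerance: float) -> bool:
    """True when both thresholds quoted in the text are satisfied."""
    vif = vif_from_tolerance(tolerance)
    return vif < VIF_MAX and tolerance > TOLERANCE_MIN


# Step 4 tolerances as reported in Table 6.4
step4_tolerance = {"SACC": 0.835, "DFWD": 0.950, "FACC": 0.795, "TIMP": 0.995}

for name, tol in step4_tolerance.items():
    print(f"{name}: VIF = {vif_from_tolerance(tol):.3f}, ok = {collinearity_ok(tol)}")
```

Because the tolerances in the table are rounded to three decimals, the recomputed VIF values may differ from the reported ones in the last digit.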

The predictor formula derived above demonstrates that a relatively small number of variables are related to perceived budgeting accuracy. Specifically, the results showed that the perceived accuracy of student number estimates and of forecasting were positive predictors of perceived budgeting accuracy and represented the largest elements at +0.372 and +0.283. The difficulty caused by allowing unspent balances to be carried forward, together with the time spent preparing the budget, had a negative effect of -0.225 and -0.048 respectively, with the size of the latter indicating that it had little significance. The former may have a negative effect because of the unpredictability caused by assessing how much and when such balances would be spent during the year.

Some responses to Question E2 indicated that there may have been instances of forecasting being confused with budgeting (see section 5.6). This was not picked up during the pilot testing as most respondents were led through the questionnaire by the researcher and none indicated a misinterpretation of the requirement of the question. Assuming that misinterpretation might have also arisen with responses to Question E10, which considered the key variable of the perception of forecasting accuracy, the model was rerun with this variable removed to assess how this influenced the multiple regression output. The result is shown below:

Budgeting accuracy = 0.414 SACC – 0.264 DFWD + E (error)

The revised model now has only two predictors, with greater emphasis being placed on the accuracy of student number estimates and the difficulty caused by allowing unspent balances to be carried forward. The variable for the time spent preparing the budget (TIMP), which exerted only limited influence, is no longer included as a predictor.

In cases where multicollinearity is too high, Hair et al. (2016) suggest removing one or more of the highly correlated variables from the regression model, as such variables can affect the statistical significance of the individual regression coefficients. They offer a rule of thumb of a correlation of +0.60 as evidence of potential problems. Removing the ten variables with correlations above +0.60 (INCO, EXPO, FCOU, TFEC, RESG, OTHI, ENIC, STUS, SPAC and STAF) did not alter the coefficients in Table 6.4.
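A screening of this kind can be sketched as a pairwise Pearson correlation check against the 0.60 rule of thumb. The code below is a minimal illustration, not the procedure used in the study: the variable names and data are hypothetical, and comparing the absolute value of r with the threshold (so that strong negative correlations are also flagged) is an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length samples (neither may be constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns: dict[str, list[float]], threshold: float = 0.60):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:  # assumption: sign of r is ignored
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical scores for three made-up variables, for illustration only
example = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 5, 4, 5],
    "X3": [5, 3, 4, 1, 2],
}

for pair in highly_correlated_pairs(example):
    print(pair)
```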

As multiple regression analysis is based on an assumption of normal distribution, a Kolmogorov-Smirnov test was undertaken (appropriate for sample sizes over 50), which showed that most factors and variables had a significance of <0.05, indicating that much of the data deviated significantly from a normal distribution. However, a visual inspection of the histograms shows that many appear to have a normal distribution curve, with evidence of skewness from Q-Q (quantile-quantile) plots [which are similar to P-P (probability-probability) plots but look at quantiles (where the data is split into equal portions) rather than every individual piece of data]. An analysis of the concentration of data in the centre, the upper and the lower ends (tails) and the shoulders (between the centre and the tails) of the distribution of variables and valid factors indicates that 27% have a value of +3.29 when the kurtosis value is divided by its standard error, which raises a concern about the normality of the distribution (Field, 2013). However, Field (2013, p.187) advises against over-reliance on tests such as Kolmogorov-Smirnov and Shapiro-Wilk because “small and unimportant deviations from normality might be deemed significant” and suggests a review of the evidence provided by plotting the data, as do Hair et al. (2010). Indeed, Ghasemi and Zahediasl (2012) note that the Kolmogorov-Smirnov test has low power and should not be seriously considered for assessing normality.
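The kurtosis check described above amounts to dividing the kurtosis statistic by its standard error and comparing the resulting z-score with 3.29. A small sketch follows; the function names and the input values are hypothetical, and treating 3.29 as a cut-off on the absolute z-score is an assumption about how the criterion was applied.

```python
Z_CRITICAL = 3.29  # value cited in the text (Field, 2013)


def kurtosis_z(kurtosis: float, std_error: float) -> float:
    """z-score of a kurtosis statistic: the statistic over its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return kurtosis / std_error


def flags_non_normal(kurtosis: float, std_error: float,
                     critical: float = Z_CRITICAL) -> bool:
    """True when the absolute z-score exceeds the critical value (assumed rule)."""
    return abs(kurtosis_z(kurtosis, std_error)) > critical


# Hypothetical (kurtosis, standard error) pairs, for illustration only
examples = {"variable_a": (1.2, 0.57), "variable_b": (2.4, 0.57)}

for name, (kurt, se) in examples.items():
    print(f"{name}: z = {kurtosis_z(kurt, se):.2f}, flagged = {flags_non_normal(kurt, se)}")
```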

Field (2013) explains that the central limit theorem “states that when samples are large (above about 30) the sampling distribution will take the shape of a normal distribution regardless of the shape of the population from which the sample was drawn” (p.871) and that where the “sample is fairly large, outliers are a more pressing concern than normality” (p.172). This view is also supported by Ghasemi and Zahediasl (2012) who refer to sample sizes greater than 30 or 40 as resulting in normal distribution. They also explain that boxplots which are symmetric with a median line which is approximately at the centre of the box and with symmetric whiskers slightly longer than the intersections of the centre box suggest normal distribution. Outliers can bias estimates of parameters and have a significant effect on the sum of squares on which most statistics are based.

Outliers were identified using boxplot diagrams in this study and the multiple regression output (excluding cases listwise) was re-tested by trimming these outliers from the key variables. Whilst this did not result in a significant change to the multiple regression model, the dangers of simply deleting the outliers from the population sampled are recognised as there is no reason to believe that they are not a valid element of the population. An alternative approach of winsorizing was therefore employed which replaces the outliers with a score of 3.29 standard deviations from the mean (Field, 2013). For the dependent variable of budgeting accuracy the model would then be:


In addition to testing for normality and outliers, linearity and homogeneity were investigated using scatterplot graphs. These did not reveal serious violation of linear assumptions. For example, the histogram, Q-Q plot, boxplot and scatterplot for each of the key accuracy variables are reproduced below.

The histograms show that the data is approximately normally distributed, with a peak towards the middle and a fairly symmetrical shape. The histograms for the accuracy of budgeting (BACC) and the factor for accuracy (COMA) do, however, demonstrate some skewing of the data. In the alternative graphical method of a Q-Q plot, the points lie close to the line without any obvious pattern of movement away from it, which indicates a normal distribution.

The boxplots show the median at the centre, with the top and bottom of the box enclosing the middle 50% of observations and the whiskers covering approximately the top and bottom 25%. Any score greater than the upper quartile plus 1.5 times the inter-quartile range is shown as an outlier. There are two such instances for the accuracy of student number estimates (SACC) and one for the accuracy of forecasting (FACC). In the case of student number estimates (SACC) and accuracy of forecasting (FACC) the whiskers are the same length, indicating a symmetrical distribution. For the accuracy of budgeting (BACC) and the factor for accuracy (COMA) the upper whisker is longer than the lower, indicating signs of a skew.
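The outlier rule used in the boxplots can be sketched with the Python standard library. This is an illustration under stated assumptions: the data are hypothetical, the lower fence (lower quartile minus 1.5 times the inter-quartile range) is added by symmetry with the upper fence described in the text, and `statistics.quantiles` may compute quartiles slightly differently from SPSS, so borderline cases could be classified differently.

```python
from statistics import quantiles


def tukey_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences: quartiles extended by k times the IQR."""
    q1, _median, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values lying outside the fences, in their original order."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical scores, for illustration only
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

lower, upper = tukey_fences(scores)
print(f"fences: ({lower}, {upper})")
print("outliers:", boxplot_outliers(scores))
```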

The scatterplots indicate the relationship between one variable (COMA) and another (BACC, FACC and SACC). The shape of the distribution in each case reveals a positive linear relationship.

1. Accuracy of budgeting (BACC)

2. Accuracy of student number estimates (SACC)

3. Accuracy of forecasting (FACC)

4. Factor for accuracy (COMA)

As a final test, problems with normality and linearity can be addressed by transforming the data, and various methods exist to do this (Field, 2013). Logarithm transformation (log(Xi)) was selected as it is useful for transforming data that is skewed. The key variables of BACC, FACC, SACC, DFWD and TIMP were converted into logs using functionality within SPSS and the regression analysis was re-run. The revised model (shown below) places greater emphasis on the accuracy of student numbers and the time spent preparing the budget, but less on the difficulty caused by allowing unspent balances to be carried forward. The variable for the accuracy of forecasting is removed. The revised model also introduces a new variable of institutional surplus, but at a low negative predictor value, indicating little influence on budgeting accuracy despite being included in the model.


In document Budgeting and Financial Planning in UK Universities: Accuracy, Caution and Control in an Era of Financialisation (Page 170-178)