OLS Regression assumptions - Multivariate regression analysis


4.3 Multivariate regression analysis

4.3.1 OLS Regression assumptions

Multivariate techniques are all based on a fundamental set of assumptions representing the requirements of the underlying statistical theory. According to Hair et al. (2009), the assumptions to be met are (1) the linearity of the phenomenon measured, (2) the normality of the error term distribution, (3) the lack of perfect multicollinearity and (4) the constant variance of the error terms.

The first assumption refers to the relationship between the independent and dependent variables and represents the degree to which the change in the dependent variable is associated with the independent variable. The assumption of linearity is checked by plotting each independent variable against the dependent variables in a scatterplot. The visual examination reveals no apparent relationship between FirmAge and FirmICC. This absence of a linear relationship persists when the natural logarithm of FirmAge is used instead. Therefore, the variable FirmAge is not taken into account in the regression analyses. The relationships between the other independent variables and the dependent variables are linear. For the sake of brevity, the scatterplots are not reported.

The normality of the error term distribution, the second assumption, refers to the restriction that the residuals of the regression should follow a normal distribution. According to Hair et al. (2009), this assumption can be tested either by plotting a histogram of the residuals or by applying a normal probability (P-P) plot. Appendix D contains both the histogram and the normal P-P plot of the error term distribution in the multivariate regression in which WAL1, CoVCFV and all control variables are taken into account. The histogram shows a distribution that has the desired bell shape, and the normal P-P plot shows a distribution that conforms to the normality line indicated in the plot. Therefore, it can be concluded that the assumption of normality is met. The histogram and the P-P plot remain similar after substituting WAL1 and CoVCFV with any other asset liquidity or cash flow volatility measure. For brevity, only the first histogram and P-P plot are presented in appendix D.
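
To make the construction of a normal P-P plot concrete, the following is a minimal sketch and not the procedure used for appendix D. It pairs the observed cumulative probability of each standardized residual with the cumulative probability expected under a normal distribution. The function names, the plotting position (i - 0.5)/n and the simulated samples are assumptions chosen for illustration.

```python
import numpy as np
from math import erf, sqrt


def normal_pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    r = np.asarray(residuals, dtype=float)
    # Standardize and sort the residuals.
    z = np.sort((r - r.mean()) / r.std(ddof=1))
    n = z.size
    # Observed cumulative probability (assumed plotting position (i - 0.5) / n).
    observed = (np.arange(1, n + 1) - 0.5) / n
    # Expected cumulative probability under the standard normal distribution.
    expected = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    return observed, expected


def max_pp_deviation(residuals):
    """Largest vertical distance between the P-P points and the 45-degree line."""
    observed, expected = normal_pp_points(residuals)
    return float(np.max(np.abs(observed - expected)))


# Illustrative samples only (simulated, not the thesis data).
rng = np.random.default_rng(0)
normal_like = rng.normal(size=500)
skewed = rng.exponential(size=500)
dev_normal = max_pp_deviation(normal_like)
dev_skewed = max_pp_deviation(skewed)
```

Points that lie close to the 45-degree line, that is a small deviation, correspond to the pattern described above as conforming to the normality line. A skewed sample bends away from that line.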

The third assumption, the lack of perfect multicollinearity, refers to the correlation among the independent variables as a key issue in interpreting the regression variate. Multicollinearity reduces any single independent variable’s predictive power by the extent to which it is associated with the other independent variables. Hair et al. (2009) describe several ways to identify multicollinearity. First, the most obvious way is to examine the correlation matrix for the independent variables. The presence of high correlations (generally .90 and higher) is the first indication of collinearity. From table 6 it becomes clear that, apart from the correlation between FA_to_TA and WAL2 (r=0.890**), no signs of multicollinearity are present. Second, a measure of multiple variable collinearity can be computed and compared with a common cutoff threshold. The variance inflation factor (VIF) is such a measure and is calculated as the inverse of the tolerance value, which is the amount of variability of the selected independent variable not explained by the other independent variables. Hair et al. (2009) mention a common cutoff threshold of 10, indicating that a VIF value higher than 10 is considered to be problematic. Appendix D reports the VIF values attached to the independent variables in all regressions applied, showing that the highest VIF score is just above 5 for the variables FA_to_TA and WAL2. Therefore, multicollinearity is not considered to be a problem. Nevertheless, these elevated values raise the question of whether to retain asset tangibility (FA_to_TA) as a regressor, since WAL2 partly measures the same construct. The goal of this study is to value the liquidity of a firm’s assets as an explanatory measure of the implied cost of equity capital. To do this as accurately as possible, it is desirable to separate the effect of asset liquidity from that of asset structure. Therefore, although the higher VIF values leave this choice open to debate, FA_to_TA is kept in the analysis.
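
As a worked illustration of the VIF definition, the relation between tolerance, VIF and the cutoff values can be written out as follows. The symbols $R_j^2$ (the share of variance of regressor $j$ explained by the other independent variables) and $\mathrm{TOL}_j$ are notation introduced here and do not appear in the thesis.

```latex
\begin{align*}
\mathrm{TOL}_j &= 1 - R_j^2, &
\mathrm{VIF}_j &= \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2} \\[4pt]
\mathrm{VIF}_j = 10 &\iff R_j^2 = 0.90, &
\mathrm{VIF}_j = 5 &\iff R_j^2 = 0.80 \\[4pt]
r = 0.890 &\;\Rightarrow\; \mathrm{VIF}_j \ge \frac{1}{1 - 0.890^2} = \frac{1}{0.2079} \approx 4.81
\end{align*}
```

The last line uses the fact that $R_j^2$ is at least as large as the squared pairwise correlation with any single other regressor. In a regression that includes both FA_to_TA and WAL2, their correlation of 0.890 alone therefore implies a VIF of at least about 4.81, which is consistent with the reported values of just above 5.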

Next, the fourth assumption refers to homoscedasticity, which is the assumption that the variance of the residuals is homogeneous across levels of the predicted values. According to Hair et al. (2009), this assumption can be checked by plotting the predicted values and residuals on a scatterplot. Any systematic pattern in the scatterplot indicates heteroscedasticity, which is the circumstance of unequal variability of residuals across predicted values. Heteroscedasticity is thus the absence of homoscedasticity and constitutes a violation of this OLS assumption. Appendix D contains a scatterplot of the predicted values against the residuals of the regression of FirmICC on WAL1, CoVCFV and the control variables. Substituting the independent variables with other measures of asset liquidity or cash flow volatility yields similar results. As can be seen from the graph, there is a pattern in the relation between the predicted values and the residuals. Therefore, it can be concluded that the fourth assumption is violated.
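
The visual check described above can be complemented by a rough numeric summary. The sketch below is an illustration under stated assumptions and not part of the thesis procedure: it fits an OLS regression, sorts the residuals by predicted value and reports the residual standard deviation in equally sized groups. The function name, the number of groups and the simulated data are hypothetical.

```python
import numpy as np


def residual_spread_by_fitted(X, y, n_bins=4):
    """Residual standard deviation within groups ordered by predicted value.

    X is assumed to already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # OLS fit via least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    # Order residuals from the lowest to the highest predicted value.
    order = np.argsort(fitted)
    groups = np.array_split(resid[order], n_bins)
    return np.array([g.std(ddof=1) for g in groups])


# Simulated data whose error spread grows with x (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=400)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones_like(x), x])
spread = residual_spread_by_fitted(X, y)
```

Roughly equal values across the groups would be in line with homoscedasticity, whereas a clear trend in the spread mirrors the kind of pattern that a scatterplot of residuals against predicted values reveals.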

According to Hayes and Cai (2007), White (1980) and Wooldridge (2012), violations of the homoscedasticity assumption can invalidate statistical inferences. The extent of the problem produced by heteroscedasticity depends on both its form and its severity. Heteroscedastic errors leave the ordinary least squares coefficient estimates unbiased and consistent, but they cause the conventional estimator of the standard errors of those coefficients to be biased and inconsistent (Hayes & Cai, 2007). The net result is type I error inflation or reduced statistical power for tests of hypotheses involving the regression coefficients, and inaccurate confidence intervals. Heteroscedasticity can take on different forms of patterns of errors and can be the result of a misspecified model or the modeling of bounded or limited outcome variables (Downs & Rocke, 1979; Perry, 1986). Lewis and Linzer (2005) elaborate on these findings and state that quantities estimated from auxiliary data sets and used as dependent variables, denoted as estimated dependent variables (EDV), often induce heteroscedasticity. The authors show that the most common approach to this problem, weighted least squares, will usually lead to inefficient estimates and underestimated standard errors. Moreover, they show that ordinary least squares regressions with heteroscedasticity-consistent standard errors yield better results.

Reducing the effects of heteroscedasticity on inference by employing a heteroscedasticity-consistent standard error (HCSE) estimator of OLS parameter estimates is, according to Hayes and Cai (2007), a ‘highly appealing’ method. With this approach, the regression model is estimated using ordinary least squares, but an alternative method of estimating the standard errors is employed that does not assume homoscedasticity. The advantage of this method is that it requires neither knowledge about, nor a model of, the functional form of the heteroscedasticity (Hayes & Cai, 2007). Wooldridge (2012) even states that the use of an HCSE estimator in the OLS regression means that ‘we can report new statistics that work regardless of the kind of heteroskedasticity present in the population’ (p. 269). Hayes and Cai (2007) elaborate on the several HCSE estimators, HC0 to HC4, that build on the earlier work of White (1980). Using both the OLS estimator and the four HC methods, Cribari-Neto, Ferrari and Oliveira (2005) and Long and Ervin (2000) evaluate the empirical power functions of the t-tests of the regression coefficients and suggest the superiority of HC3 over its predecessors because ‘it can keep the test size at the nominal level regardless of the presence or absence of heteroskedasticity’, while there is only a negligible loss of power associated with HC3 when the errors are indeed homoscedastic (Hayes & Cai, 2007).
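
The HC3 idea discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the software used in this study: coefficients are estimated by OLS, and the covariance matrix is then computed as (X'X)^-1 X' diag[e_i^2 / (1 - h_ii)^2] X (X'X)^-1, where e_i are the residuals and h_ii the leverage values. The function name and the simulated data are hypothetical.

```python
import numpy as np


def ols_with_hc3(X, y):
    """OLS coefficients with conventional and HC3 standard errors.

    X is assumed to already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i' (X'X)^-1 x_i for each observation.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # Conventional standard errors, which assume homoscedasticity.
    sigma2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))
    # HC3: squared residuals are inflated by 1 / (1 - h_ii)^2.
    omega = (resid / (1.0 - leverage)) ** 2
    meat = (X * omega[:, None]).T @ X
    cov_hc3 = xtx_inv @ meat @ xtx_inv
    se_hc3 = np.sqrt(np.diag(cov_hc3))
    return beta, se_ols, se_hc3


# Simulated heteroscedastic data (illustrative only, not the thesis data).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=500)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones_like(x), x])
beta, se_ols, se_hc3 = ols_with_hc3(X, y)
```

The coefficient estimates are identical under both approaches. Only the standard errors, and therefore the t-tests and confidence intervals, change when HC3 is used.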

Within this study, the heteroscedasticity is considered to be an impediment to valid inference. The heteroscedasticity is potentially caused by the fact that the dependent variable is partly estimated on the basis of industry median returns, making the model a multilevel model with an estimated dependent variable that generally exhibits heteroscedasticity, as is elaborated by Lewis and Linzer (2005). To overcome the heteroscedasticity, the HC3 heteroscedasticity-consistent standard error estimator, as described by Hayes and Cai (2007), is utilized.

In document Identifying the value of liquidity : cash flow volatility, real asset liquidity and the implied cost of equity capital (Page 51-53)