
92 Amortization expense(#65)

3.4. Diagnostic Tests

Consistent with empirical accounting studies, this thesis uses annual pooled data, which combine time-series and cross-sectional observations, and applies ordinary least squares (OLS) regression. The usual assumptions underlying OLS regression also apply to analysis using accounting data: linearity of the model, normality of the prediction error, homogeneity of variance across individual observations and groups of observations (homoscedasticity), no high correlation between predictor variables (no multicollinearity), and no errors in variables. The research described in this thesis must consider these conditions and check that the data meet the assumptions of the regression analysis. Accordingly, the current study employs the following diagnostic tests.

3.4.1. Heteroscedasticity Tests

As pooled data may be subject to time-series error terms, cross-section error terms, or both, homogeneity of variance is an important assumption of OLS regression that must be tested. When the variance of the residuals is not equal across observations, the resulting heteroscedasticity should be rectified. To mitigate this problem, it is usual to deflate variables by a size factor such as the number of outstanding shares, the market value of equity or total assets, measured either at the end of the fiscal year or as an average over the year.

White's (1980) standard error test and the Breusch-Pagan test may be used to detect heteroscedasticity. In a recent investigation of the methods used in accounting studies to calculate standard errors, Gow et al (2010) suggest that allowing for standard errors clustered by firm and time is likely to be robust to both time-series and cross-section dependence.
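As an illustration of how a Breusch-Pagan style check works, the sketch below computes the studentized (Koenker) form of the LM statistic, n times the R² from regressing the squared OLS residuals on the regressor, for a model with a single predictor. It is a minimal standard-library sketch on made-up data, not the procedure used in this thesis; the function names (`fit_line`, `r_squared`, `breusch_pagan_lm`) and both data series are hypothetical. With one regressor the statistic is compared with the 5% critical value of a chi-square distribution with one degree of freedom (3.841).

```python
def fit_line(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary across observations")
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b


def r_squared(x, y):
    """R-squared of the simple regression of y on x (with intercept)."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Studentized Breusch-Pagan LM statistic: n * R^2 of e^2 regressed on x."""
    a, b = fit_line(x, y)
    squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)


# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_5PCT = 3.841

# Hypothetical data: 20 observations with a deterministic alternating "error".
x = [float(i) for i in range(1, 21)]
signs = [(-1.0) ** i for i in range(1, 21)]
y_hetero = [2.0 * xi + 0.5 * xi * s for xi, s in zip(x, signs)]  # error grows with x
y_homo = [2.0 * xi + 0.5 * s for xi, s in zip(x, signs)]         # constant error size

lm_hetero = breusch_pagan_lm(x, y_hetero)
lm_homo = breusch_pagan_lm(x, y_homo)

print("heteroscedastic series rejects homoscedasticity:", lm_hetero > CHI2_1DF_5PCT)
print("homoscedastic series rejects homoscedasticity:", lm_homo > CHI2_1DF_5PCT)
```

In applied work with several regressors the auxiliary regression would include all of them and the degrees of freedom would equal their number; a statistical package would normally be used for that.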

To mitigate heteroscedasticity, the current study, following Barth et al (2001), Lev et al (2009) and Brochet et al (2009), scales all variables by the average value of total assets across firms in the dataset.

3.4.2. Multicollinearity Test

Multicollinearity is high correlation among two or more independent variables. It leads to unstable coefficient estimates and inflated standard errors for the coefficients. Several methods to detect multicollinearity are well documented in the econometrics literature, although no method is failsafe under all conditions.

To detect multicollinearity, the variance inflation factor (VIF) should be calculated after running a regression. A high VIF indicates a high level of multicollinearity, and a VIF above the cut-off point of 10 indicates a need for further investigation. Another rule suggested in the econometrics literature is to estimate the pair-wise correlations between the independent variables employed in each research model, treating a correlation coefficient above 0.80 as an indicator of a serious problem (Gujarati, 2004, p.359).

With regard to multicollinearity, as Gujarati (2004) notes, if the aim of the regression analysis is prediction, multicollinearity is often not treated as a serious issue, and the highest R² is interpreted as indicating the best estimation per se. Nevertheless, to tackle the issue of multicollinearity, the current study takes into consideration the variance inflation factor (VIF) in addition to estimating pair-wise correlations between independent variables.
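The two screening rules can be illustrated with a small standard-library sketch. It relies on the fact that, when a model has exactly two predictors, the VIF of each equals 1 / (1 - r²), where r is their pair-wise correlation; with more predictors the VIF requires an auxiliary regression of each predictor on all the others, which is not shown here. The function names and the three data series are hypothetical and are not taken from the thesis dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor series (illustrative values only).
cash_flow = [1.0, 2.0, 3.0, 4.0, 5.0]
accruals = [2.0, 1.0, 4.0, 3.0, 5.0]      # moderately correlated with cash_flow
earnings = [1.1, 2.0, 2.9, 4.2, 5.0]      # almost collinear with cash_flow

for name, series in [("accruals", accruals), ("earnings", earnings)]:
    r = pearson(cash_flow, series)
    vif = vif_two_predictors(cash_flow, series)
    # Apply the two rules of thumb described in the text.
    flagged = abs(r) > 0.80 or vif > 10
    print(name, "r =", round(r, 3), "VIF =", round(vif, 2), "flagged:", flagged)
```

In this made-up example the first pair sits at the 0.80 threshold without exceeding it, while the second pair is flagged by both rules.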

3.4.3. Autocorrelation Test

Autocorrelation arises in time-series data when the error terms are not independent. The Durbin-Watson (D-W) test is used to detect first-order serial correlation; the D-W statistic takes values between 0 and 4. A value of 2 indicates that there is no first-order autocorrelation; a value below 2 indicates positive serial correlation; a value above 2 indicates negative serial correlation. The Arellano-Bond (1991) test for autocorrelation is designed for panel data with a cross-section and time-series structure, as employed in this thesis, and is also reported here in the context of OLS.¹¹⁰ This study uses the Arellano-Bond test to detect autocorrelation.
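For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for a single ordered residual series using only the standard library; the function name and the two residual series are hypothetical. For pooled data the residuals would first have to be ordered by time within each firm, which this sketch does not handle.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered sequence of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residual series (illustrative values only).
slowly_drifting = [1.0, 0.9, 0.8, 0.7, -0.7, -0.8, -0.9, -1.0]  # neighbours are similar
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]                 # neighbours flip sign

dw_positive = durbin_watson(slowly_drifting)  # below 2: positive serial correlation
dw_negative = durbin_watson(alternating)      # above 2: negative serial correlation

print("slowly drifting residuals:", round(dw_positive, 3))
print("alternating residuals:", round(dw_negative, 3))
```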

3.5. Summary

In this chapter, prediction horizons and the number of lags of predictor variables are specified. Then, the chapter describes the development of four research models: a cash flow model; an aggregate earnings model; a disaggregated earnings model (cash flow with aggregate accruals) and a full disaggregation model (cash flow with accruals components) which are tested in Chapter 5.

In addition, the chapter outlines the criteria for comparison of research models including in-sample estimations and out-of-sample predictions.

Furthermore, the chapter describes diagnostic tests that are performed in Chapter 5 to test the equality of the variance of the residuals, identify correlation between independent variables and detect serial correlation.

Chapter 4 will focus on the data and the sample, beginning with the key features of the data and the variable definitions, then discussing sampling issues in accounting research and presenting the sampling process. Chapter 5 will present the preliminary empirical results of the research: descriptive statistics and model estimation, including in-sample estimations and out-of-sample accuracy tests.

110 Roodman, D. 2006. How to do xtabond2: An introduction to “Difference” and “System” GMM in

Chapter 4

Data and Sample

4.1. Introduction

This chapter is structured as follows. Section 4.2 outlines the key features of the data used in this study. Section 4.3 presents the definitions of variables and describes the two approaches to cash flow analysis used in the thesis: calculating accruals from Cash Flow Statement information or, alternatively, from Balance Sheet changes. Section 4.4 provides a more detailed discussion of (a) the validation of data, including differences in firm coverage, (b) the nature of values that are unrecorded, missing or zero, (c) the effect of influential observations on the estimations, and (d) the impact of changes in fiscal year length, plus a comparison between the information in commercial databases and the source information in published financial statements. Section 4.5 specifies the sampling process and the sample specifications. Section 4.6 concludes with a summary.