5.7 Statistical Analysis of Data
5.7.3 Inferential Statistics
5.7.3.2 Multiple Regression Analysis
A pooled cross-sectional regression is estimated to account for differences in distributions across time periods. The estimation process is much like that of a standard cross-sectional regression. Pooled regression estimation differs only in the inclusion of dummy variables for all time periods except one, which is omitted to avoid perfect collinearity (Brooks, 2008). Examples of studies that used pooled cross-sectional regressions in the field of earnings management include Warfield et al. (1995), Guay et al. (1996), Becker et al. (1998), Kasznik (1999), Young (1999), Yeo et al. (2002), Hribar and Nichols (2007), Osma (2008), Cohen and Zarowin (2010) and Dechow et al. (2010).
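The period-dummy construction described above can be illustrated with a minimal sketch. It assumes NumPy is available; the function names (`period_dummies`, `pooled_ols`) and the toy data are hypothetical and are not taken from the thesis sample or from the software used in this research.

```python
import numpy as np

def period_dummies(periods):
    """Return (dummy matrix, kept period labels), dropping the first period as the base."""
    periods = np.asarray(periods)
    labels = np.unique(periods)      # sorted unique period labels
    kept = labels[1:]                # omit one period to avoid perfect collinearity
    dummies = (periods[:, None] == kept[None, :]).astype(float)
    return dummies, kept

def pooled_ols(y, X, periods):
    """Estimate y = a + X b + period dummies by OLS and return the coefficients."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    dummies, kept = period_dummies(periods)
    design = np.column_stack([np.ones(len(y)), X, dummies])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, kept

# Illustrative data only: three years, four firm observations per year.
years = np.repeat([2005, 2006, 2007], 4)
x = np.array([0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.8, 0.6, 0.7, 0.2, 0.4, 0.1])
shift = np.where(years == 2006, 1.0, np.where(years == 2007, 2.0, 0.0))
y = 0.5 + 2.0 * x + shift            # exact linear relation, no noise
beta, kept = pooled_ols(y, x, years)
```

In this sketch the coefficients on the retained dummies measure each period's shift in the intercept relative to the omitted base period.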
To test the research hypotheses, the Ordinary Least Squares (OLS) method is used to estimate the empirical models. However, certain assumptions must be met in order to make valid statistical inferences: normally distributed errors (i.e. normality), linearity, homoscedasticity, no autocorrelation, and no multicollinearity (Gujarati, 2003).
Before the diagnostic tests are conducted, however, initial multiple regressions are run to identify outliers. According to Gujarati (2003, p.390), an outlier “is an observation that is much different (either very small or very large) in relation to the observations in the sample”. By definition, an outlier has a large residual in comparison with other residuals. As such, outliers can bias the model because they affect the values of the estimated regression coefficients. Moreover, outliers make it difficult to satisfy the assumptions of normality, linearity and homoscedasticity. To detect outliers, Field (2009) suggests that a case with an absolute standardised or studentised residual greater than 2 is a cause for concern; such cases are therefore excluded from the data.
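The residual screen described above can be sketched as follows. This is an illustration only, assuming NumPy; the function name `residual_screen` and the toy data are hypothetical, and the studentised residuals shown are the internally studentised form, which may differ from the variant reported by a given statistical package.

```python
import numpy as np

def residual_screen(y, X, cutoff=2.0):
    """Return standardised residuals, studentised residuals and an outlier flag."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    s = np.sqrt(resid @ resid / (n - k))        # residual standard error
    standardised = resid / s
    hat = design @ np.linalg.pinv(design)       # hat (projection) matrix
    leverage = np.diag(hat)
    studentised = resid / (s * np.sqrt(1.0 - leverage))
    flagged = np.abs(standardised) > cutoff     # rule of thumb: |residual| > 2
    return standardised, studentised, flagged

# Illustrative data only: an exact line with one contaminated observation.
x = np.arange(20.0)
y = 1.0 + 2.0 * x
y[10] += 10.0
standardised, studentised, flagged = residual_screen(y, x)
clean_x, clean_y = x[~flagged], y[~flagged]     # cases retained after exclusion
```

The mask `flagged` marks the cases that would be excluded before the models are re-estimated.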
After the exclusion of outliers, the empirical models are re-estimated and checked against the assumptions underlying the method of ordinary least squares. Below is a discussion of each assumption, how a violation of it is detected, and what remedies, if any, are applicable.
1- Normally distributed errors (Normality): the residuals in the model are random and normally distributed with a zero mean. Put simply, the differences between the model and the observed data are mostly zero or very close to zero (Field, 2009). To check this assumption, the Kolmogorov-Smirnov (K-S) test of normality is performed. The assumption is satisfied if the null hypothesis of a normal distribution is not rejected.
2- Linearity: this assumption requires that the regression model is correctly specified. The relationship should be linear, with no specification bias or specification error (Field, 2009; Gujarati and Porter, 2010). As mentioned in Chapter Three, tests for earnings management can be framed in a linear framework around partitioning variable(s) (Dechow et al., 1995). To detect model misspecification, such as the omission of a relevant variable or an inappropriate linear functional form, the Regression Specification Error Test (RESET) is used. The assumption is satisfied if the null hypothesis of linearity is not rejected (Brooks, 2008).
3- Homoscedasticity: this assumption states that the variance of the residuals should be constant across observations. Otherwise, there is what is called heteroscedasticity, or unequal variance (Brooks, 2008). Gujarati and Porter (2010, p.281) state “In the presence of heteroscedasticity, the usual hypothesis-testing routine is not reliable, raising the possibility of drawing misleading conclusions”. This is because heteroscedasticity biases the estimated variances of the OLS estimators and, consequently, the estimators are no longer efficient. White’s General Heteroscedasticity test is used to test the null hypothesis of homoscedasticity. Should it be rejected, White’s heteroscedasticity-consistent estimator will be used to correct for heteroscedasticity.
4- No Autocorrelation: the residual terms should be uncorrelated (i.e. independent) for any two observations. This simply means that no systematic relationship should exist among the residuals; otherwise, the dependent variable may depend not only on the predictors but also on other residual terms, such as lagged residuals in time series analysis. The Durbin-Watson test can be used to test for autocorrelation (i.e. serial correlation). Gujarati (2003) and Field (2009) suggest a value of 2 as an indicator of the absence of first-order autocorrelation. Field (2009) adds that values less than 1 or greater than 3 are a cause for concern.
5- No Multicollinearity: there are no perfect linear relationships between explanatory variables. Put differently, explanatory variables should not correlate too highly because, if they do, the estimated parameters become untrustworthy and it becomes difficult to assess the individual importance of each predictor (Field, 2009). Therefore, it is important to identify whether high collinearity exists among predictors. One way to do so is to scan a correlation matrix of all explanatory variables for pairs that correlate highly. As a rule of thumb, Brooks (2008) and Field (2009) state that a correlation above 0.8 is a cause for concern. To that end, correlation matrices are constructed on the basis of both Pearson and Kendall’s tau correlation coefficients. Another way to diagnose multicollinearity is the Variance Inflation Factor (VIF). Statisticians suggest that a VIF value greater than 10 signifies the existence of multicollinearity in the model (Myers, 1990; Field, 2009).
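The two multicollinearity checks in the last item can be sketched as follows. The sketch assumes NumPy and computes only the Pearson correlation matrix (Kendall's tau is not shown); the function name `vif` and the toy predictors are hypothetical. The VIF of predictor j is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the remaining ones.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        tss = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / tss    # R^2 of the auxiliary regression
        out[j] = 1.0 / (1.0 - r2)
    return out

# Illustrative data only: x2 is almost a copy of x1, x3 is unrelated to both.
x1 = np.arange(8.0)
x2 = x1 + 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
x3 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)         # Pearson correlation matrix
vifs = vif(X)
high_corr = np.abs(corr) > 0.8              # rule of thumb: correlation above 0.8
high_vif = vifs > 10                        # rule of thumb: VIF above 10
```

In this toy example both rules of thumb single out the pair `x1` and `x2`, while `x3` is not flagged.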
Finally, SPSS is used to perform the Kolmogorov-Smirnov (K-S) test of normality for standardised residuals and the Durbin-Watson test of autocorrelation, and to produce the correlation matrices and variance inflation factors. Because the remaining tests are not available in SPSS, the EViews 6.0 statistical software package is used to perform the RESET test of linearity and White’s test of heteroscedasticity; where heteroscedasticity is detected, White’s heteroscedasticity-consistent estimator is applied.
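For readers without access to these packages, two of the quantities named above can be sketched with NumPy alone: the Durbin-Watson statistic and White's heteroscedasticity-consistent standard errors. The function names and toy data are hypothetical. The sketch uses the basic (HC0) form of White's estimator and makes no assumption about which finite-sample adjustment, if any, SPSS or EViews applies.

```python
import numpy as np

def ols(y, X):
    """Fit y on a constant and X; return the design matrix, coefficients, residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design, beta, y - design @ beta

def durbin_watson(resid):
    """Sum of squared successive residual differences over the residual sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def white_hc0_se(design, resid):
    """White (HC0) standard errors: sqrt of diag of (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    bread = np.linalg.inv(design.T @ design)
    meat = design.T @ (design * (resid ** 2)[:, None])
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))

# Illustrative data only: four observations with alternating-sign errors.
x = np.array([-1.0, -1.0, 1.0, 1.0])
y = 2.0 + 3.0 * x + np.array([1.0, -1.0, 1.0, -1.0])
design, beta, resid = ols(y, x)
dw = durbin_watson(resid)          # values near 2 suggest no first-order autocorrelation
robust_se = white_hc0_se(design, resid)
```

The Durbin-Watson statistic depends on the ordering of the residuals, so it is meaningful only when the observations are arranged in their time order.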
5.8. Summary:
The methodology and research design tend to be highly structured within the post-positivist paradigm (Patton, 2000). Consistent with this view, this chapter sets out the process of variable measurement, hypothesis development, empirical model construction, population selection and preparation for the statistical analysis of data.
In brief, this research employs data on manufacturing firms listed on the Amman Stock Exchange to investigate the relationship between earnings management and corporate governance mechanisms. This chapter describes the measurement of four proxies for earnings management (i.e. abnormal accruals, abnormal cash flow from operating activities, abnormal production costs and abnormal discretionary expenses). Afterwards, hypotheses are developed based on the predicted relationship between each type of earnings manipulation and five corporate governance deterrence mechanisms (i.e. ownership concentration, managerial ownership, institutional ownership, foreign ownership and external audit). Accordingly, four empirical models are developed to examine these relationships and the appropriate statistical analysis techniques are introduced.
Chapter Six
Data Analysis and Results
6.1. Introduction:
The purpose of this chapter is to test the research hypotheses concerning the effect of ownership structure and external audit, as corporate governance mechanisms, on earnings management by performing statistical tests on a population of manufacturing firms listed on the Amman Stock Exchange (ASE). Because managers may use several methods to manipulate earnings, four proxies for earnings management are measured separately, following previous research, so that each proxy serves as a dependent variable. As a result, the statistical analysis in this chapter comprises four empirical models, one for each dependent variable: the abnormal accruals model, the abnormal cash flow from operating activities model, the abnormal production costs model, and the abnormal discretionary expenses model.
Two analyses are conducted, which differ in how the earnings management proxies are measured. In the main analysis the proxies are considered in absolute terms, whereas in the second analysis they are considered with their actual signs. Within each analysis, descriptive statistics are presented and univariate analyses are conducted and discussed. Afterwards, multiple regression analyses are conducted to test the research hypotheses. The results are then presented and interpreted. Finally, theoretical and practical implications of the association between earnings management practices and ownership structure and external audit mechanisms are discussed.