Parametric tests & Regression

4.3 Research method

4.3.2 Parametric tests & Regression

Univariate Analysis

Parametric independent t-tests and analysis of variance (ANOVA) tests were conducted to test the association between the dependent variable and the categorical independent variables. ANOVA tests for significant differences between two or more means. When there are only two groups, the independent t-test can also be used to compare the group means; in that case the independent t-test and the ANOVA F test produce the same results.²¹ Both the ANOVA F and the independent t-test were used in this study to test the null hypothesis that there is no difference between the group means, against the alternative hypothesis that one or more of the differences between means is significant. Where ANOVA indicated a significant difference among more than two means, pairwise comparisons (post hoc tests: Hochberg's GT2 test where population variances were equal and the Games-Howell procedure where population variances were different) were also used to identify where the differences between the groups lay. (Hunyadi and Vita, 1991; Field, 2009)
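The equivalence of the two tests in the two-group case can be illustrated numerically: with a pooled (equal) variance estimate, the ANOVA F statistic equals the square of the independent t statistic. The following is a minimal sketch using only the Python standard library; the function names and the two small data sets are invented for illustration and are not taken from the study.

```python
from statistics import mean


def pooled_t(a, b):
    """Independent-samples t statistic, equal variances assumed."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def anova_f(*groups):
    """One-way ANOVA F statistic (between-group MS / within-group MS)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data for two groups (illustration only).
group_a = [4.1, 5.0, 5.6, 6.2, 4.8]
group_b = [6.9, 7.4, 6.1, 8.0, 7.2, 6.6]

t_stat = pooled_t(group_a, group_b)
f_stat = anova_f(group_a, group_b)
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The identity F = t^2 holds only for the pooled-variance t-test, which is why the equivalence is stated under the equal-variances assumption.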

²⁰ In larger contingency tables at least 80% of the expected frequencies should be over 5.
²¹ Equal variances assumed.

Pearson product-moment correlation coefficients (bivariate correlation) were calculated to measure the strength of the association between two continuous variables. The Pearson correlation coefficient ranges from -1 to +1. A correlation of +1 (-1) indicates that there is a perfect positive (negative) relationship between variables. A value of 0 indicates no linear relationship. (Hunyadi and Vita, 1991; Field, 2009)
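The behaviour of the coefficient at its limits can be checked with a short calculation. The sketch below computes Pearson's r from its definition (covariance divided by the product of the standard deviations); the function name and the data are hypothetical. The third case shows that a value of 0 rules out only a linear relationship: the variables are perfectly related by a curve, yet r is zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2 * v + 1 for v in x])   # perfect positive line
r_negative = pearson_r(x, [-v for v in x])          # perfect negative line
r_curved = pearson_r(x, [(v - 3) ** 2 for v in x])  # symmetric curve, no linear trend

print(r_positive, r_negative, r_curved)
```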

The assumptions of parametric tests (normally distributed data; homogeneity of variances; the dependent variable measured at least at the interval level; independent observations; a linear relationship between the variables in the case of Pearson’s r) were checked. (Hunyadi and Vita, 1991; Field, 2009) The dependent variables are continuous, and the observations collected for the different FTSE350 companies are highly likely to be independent of one another.

There are different options for deciding whether the data come from a normal distribution (e.g. histogram with normal curve, skewness and kurtosis, formal tests of normality). In this study the Kolmogorov-Smirnov and Shapiro-Wilk tests were used. However, if the sample size per group is sufficiently large, the sampling distribution of the mean will be approximately normal regardless of the distribution of the variable in the population (Central Limit Theorem). The practical question is how large the sample should be. A sample size larger than 30 is suggested (Hunyadi and Vita, 1991; Field, 2009) as sufficiently large for symmetric or near-symmetric (light-tailed) distributions. However, a larger sample size is required for heavy-tailed distributions. (Hunyadi and Vita, 1991; Wilcox, 2012)
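The effect of the Central Limit Theorem can be seen in a small simulation. The sketch below draws from a strongly right-skewed exponential distribution and compares the skewness of the raw values with the skewness of the means of samples of size 30. The choice of distribution, the seed, the number of replications and the helper name are all assumptions made for illustration; none of them comes from the study.

```python
import random
from statistics import mean


def skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the run is reproducible

# Raw draws from an exponential distribution (theoretical skewness 2).
raw_draws = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 5000 independent samples, each of size 30.
sample_means = [mean(rng.expovariate(1.0) for _ in range(30))
                for _ in range(5000)]

skew_raw = skewness(raw_draws)
skew_means = skewness(sample_means)
print(f"skewness of raw values:   {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The sample means are much closer to symmetric than the raw values, but some skewness remains at n = 30, which is consistent with the remark that heavier-tailed or more skewed populations need larger samples.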

Levene’s test was carried out to test the homogeneity of variances. A significant Levene’s test result indicates that the homogeneity of variances assumption is violated. In these cases the more robust Brown-Forsythe F and Welch’s F were calculated instead of the ANOVA F.
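The logic of Levene's test is that it runs a one-way ANOVA on the absolute deviations of each observation from its group mean, so groups with different spreads produce a large statistic. The following sketch computes only the test statistic W (mean-centred version) with the standard library; in practice a statistics package would also supply the p-value from the F distribution with k - 1 and N - k degrees of freedom. The function name and the data are hypothetical.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic based on absolute deviations from group means."""
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_group_means = [mean(g) for g in z]
    z_grand_mean = mean([v for g in z for v in g])
    between = sum(len(g) * (zm - z_grand_mean) ** 2
                  for g, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for g, zm in zip(z, z_group_means) for v in g)
    return ((n_total - k) / (k - 1)) * between / within


# Hypothetical groups (illustration only).
narrow_1 = [10, 12, 11, 13, 9]
narrow_2 = [20, 22, 21, 23, 19]  # same spread as narrow_1, shifted mean
wide = [1, 30, 15, 45, 9]        # much larger spread

w_equal = levene_w(narrow_1, narrow_2)
w_unequal = levene_w(narrow_1, wide)
print(f"W (equal spreads):   {w_equal:.3f}")
print(f"W (unequal spreads): {w_unequal:.3f}")
```

A difference in group means alone does not inflate W, because each deviation is measured from the observation's own group mean.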

Univariate analysis is used in Chapter 5, Chapter 6 and Chapter 7.

Multivariate Analysis

Standard multiple regression models (forced entry) were used in this study. Most of the included independent variables were based upon past theoretical and empirical research.

The multivariate linear regression model has several underlying assumptions which need to be assessed. These assumptions are the following:

1) the independent variables are categorical or quantitative and the dependent variable is “quantitative, continuous and unbounded” (Field, 2009, p220);

2) the model is specified properly (all relevant variables included, irrelevant excluded);

3) the relationship between the independent variables and the dependent variable is linear (linearity);

4) the residuals in the model are normally distributed (normality);

5) the variance of the residuals is constant at each level of the independent variables (homoscedasticity); and

6) the residuals are independent from one another (no autocorrelation).

Additionally, other issues such as outliers (influential cases) and highly correlated independent variables (multicollinearity) can cause problems in estimating the regression model. Thus, they are of great concern to regression analysis. (Tabachnick and Fidell, 2007; Field, 2009)

Outliers (univariate and multivariate) and influential cases were tested (graphical methods: histogram and normal curve, scatterplot; large residuals; Cook’s distance) because these observations can have a large influence on the overall model and on the estimated parameters. Incorrect data entry could be one of the reasons for the presence of an outlier. Therefore, data entry for outliers was double checked. The detected outliers (univariate: e.g. Tesco, BP, Shell; multivariate: e.g. W H Smith, Babcock International Group etc.) are important members of the sample, therefore deleting them was not an option. However, variable transformation (see later in Section 4.3.3) was considered to address their impact.
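Cook's distance combines the size of a case's residual with its leverage, so it flags cases that would change the fitted model noticeably if removed. The sketch below computes it for a simple regression with one predictor, where leverage has a closed form; a multiple regression needs the full hat matrix, which a statistics package would normally provide. The data and the function name are invented for illustration, and the threshold of 1 mentioned in the comment is a commonly cited rule of thumb, not a cut-off stated in this section.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor OLS regression."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x, mean_y = mean(x), mean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each case in simple regression.
    leverage = [1 / n + (xi - mean_x) ** 2 / s_xx for xi in x]
    return [(e * e / (p * mse)) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]


# Hypothetical data: the last case lies far from the others on x
# and does not follow the trend of the first seven cases.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 15.0]

distances = cooks_distances(x, y)
most_influential = max(range(len(distances)), key=distances.__getitem__)
for i, d in enumerate(distances):
    flag = "  <- above the rule-of-thumb value of 1" if d > 1 else ""
    print(f"case {i}: D = {d:.3f}{flag}")
```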

Multicollinearity is present when two or more explanatory variables in a multiple regression model are highly (but not necessarily perfectly) correlated. There is perfect multicollinearity when the correlation coefficient between two independent variables is equal to 1 or -1. However, in practice perfect multicollinearity is rare. If the independent variables are highly correlated it is difficult to distinguish the individual effects of the variables. In the best regression models the independent variables correlate highly with the dependent variable but only minimally with each other. Of the several warning signals that can help to detect multicollinearity, the correlation matrix and the collinearity statistics (tolerances and Variance Inflation Factors, VIFs) were analysed. A VIF greater than 10 (tolerance²² smaller than 0.1) indicates that multicollinearity effects are present. When multicollinearity is detected its consequences need to be considered. The greater the multicollinearity, the greater the standard errors are (wider confidence intervals for the coefficients, smaller t-statistics). However, multicollinearity does not violate the OLS (Ordinary Least Squares) assumptions and does not bias the coefficient estimates. (Hunyadi and Vita, 1991; Tabachnick and Fidell, 2007; Field, 2009)
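The relationship between tolerance and VIF can be made concrete. For predictor j, tolerance is 1 - R_j^2, where R_j^2 comes from regressing that predictor on all the other predictors, and VIF is the reciprocal of tolerance. The sketch below covers only the simplified case of two predictors, in which R_j^2 reduces to the squared Pearson correlation between them; the function names and the data are hypothetical.

```python
from math import sqrt, inf


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r * r
    vif = inf if tolerance == 0 else 1 / tolerance
    return tolerance, vif


# Hypothetical predictors (illustration only).
x1 = [1, 2, 3, 4, 5, 6]
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]  # almost a copy of x1
x2_unrelated = [3, 1, 4, 1, 5, 2]              # only weakly related to x1

tol_high, vif_high = tolerance_and_vif(x1, x2_collinear)
tol_low, vif_low = tolerance_and_vif(x1, x2_unrelated)
print(f"near-collinear pair: tolerance = {tol_high:.4f}, VIF = {vif_high:.1f}")
print(f"weakly related pair: tolerance = {tol_low:.4f}, VIF = {vif_low:.3f}")
```

Because VIF is the reciprocal of tolerance, the two cut-offs quoted above (VIF above 10, tolerance below 0.1) express the same condition.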

Table 4.1 Methods applied in the study to check the outliers, the multicollinearity and the assumptions of multivariate regression

The assumptions of linearity, normality, homoscedasticity and the independence of residuals can be checked by graphical methods and by different statistics. Most of these concentrate on the examination of the residuals. Table 4.1 is a summary of the methods applied in this study to check the outliers, the multicollinearity and the assumptions of the multivariate regression model.
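As an example of a residual-based statistic, the Durbin-Watson statistic compares each residual with the one before it: it is the sum of squared successive differences divided by the sum of squared residuals. It lies between 0 and 4, with values near 2 suggesting no first-order autocorrelation. The sketch below uses made-up residual series to show the two extremes; the function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered series of residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series (illustration only).
trending = [-3, -2, -1, 1, 2, 3]     # neighbours are similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]  # neighbours flip sign: negative autocorrelation

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(f"trending residuals:    DW = {dw_trending:.3f}")
print(f"alternating residuals: DW = {dw_alternating:.3f}")
```

The statistic depends on the order of the cases, so it is meaningful only when the observations have a natural ordering.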

Multivariate analysis is used in Chapter 5.

Columns: Method | Assumptions (Linearity, Normality, Homoscedasticity, No autocorrelation) | Outliers | Multicollinearity

Graphical methods

Standardised residuals against the standardised predicted values √ √

Histogram of the standardised residuals (with normal curve) √

Normal probability plot (observed cumulated probability against the expected cumulated probability) √

Partial plots (scatterplots of the residuals and each of the predictors) √ √

Bivariate scatterplots between pairs of variables √ √ √

Statistics

Skewness √

Kurtosis √

Kolmogorov-Smirnov & Shapiro-Wilk tests √

Cook's distance √

Tolerance & VIF √

Durbin-Watson statistics √

Correlation matrix √

Source: based on Tabachnick and Fidell, 2007, Chapters 4 & 5 and Field, 2009, Chapter 7

In document The application of International Financial Reporting Standard 8 Operating Segments: evidence from UK companies (Page 82-86)
