Design for Testing Hypotheses using Quantitative Data


6.13 Design for Testing Hypotheses using Quantitative Data

According to Black (2010), the data available to the researcher dictate how the statistical analysis will be performed. The data obtained from the questionnaire survey in this research were ordinal and categorical in nature (Moore, 2010). The values of the data were partly ‘labels’ as well as categories (Moore, McCabe, Alwan, Craig & Duckworth, 2011); hence the analysis centred on descriptive statistics, presented using bar charts and graphs, as well as non-parametric statistical testing (Sounderpandian, 2009). The main reason for adopting non-parametric statistical testing was the ordinal nature of the data, which allowed the data to be ranked using categories (Heiman, 2011). This implies that stringent assumptions about the population (N) are not warranted (Sounderpandian, 2009); hence the sample (n) would provide the outcomes needed to conclude the research. Sounderpandian (2009) therefore explains that the commonly held assumption of a ‘normal distribution’ is not necessary in this regard. Before implementing the non-parametric statistical analysis, it was imperative to examine the relationships between the responses more closely using correlation (Black, 2010; Heiman, 2011).

6.13.1 Examining Data Relationships Using Correlation and Multiple Regression

According to Pallant (2005), it is possible to explore the strength of the relationship between variables from different sets of questions in order to gain an indication of the general direction of the relationship, should there be one. The tools chosen for exploring the relationships between the data variables were correlation (Black, 2010; Moore, 2010) and linear or multiple regression analysis (Pallant, 2005).

6.13.1.1 Correlation coefficient

Heiman (2011, p.136) states that a correlation coefficient is “the descriptive statistic that, in a single number, summarizes and describes the important characteristics of a relationship”. The outcome, called the correlation coefficient value, “quantifies the pattern in a relationship, examining all X–Y pairs at once” (Heiman, 2011, p.136).

No other single statistic explains the complex relationships between variables as simply as the correlation coefficient (Black, 2010; Heiman, 2011). This means that a preliminary interpretation of the relationships between variables from different questions can be made using a single statistic. According to Moore et al. (2011), the correlation r between variables x and y is determined using the mathematical model shown in Equation 6-1.

Equation 6-1: The Correlation Mathematical Model (Moore et al., 2011, p.93)

$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Here n is the number of paired observations, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations of x and y.

According to Moore (2010), there are four key points of interest regarding the correlation statistic:

(i) The correlation coefficient statistic (r) does not distinguish between independent and dependent variables, meaning that it does not matter which variable is called X and which is called Y; the outcome is the same;

(ii) The correlation coefficient statistic (r) uses standardised values of the observations, meaning that r does not change when the units of measure allocated to the variables change. Variables X and Y are therefore neutral with respect to measurement units: they can take any unit of measure, such as kilogrammes or metres, and the statistic r would be the same;

(iii) When the statistic r is positive, it indicates a positive relationship between the variables, and when the statistic r is negative, it indicates a negative relationship between the variables;

(iv) The correlation coefficient statistic (r) is always a number between minus one (−1) and positive one (1). A value of r near 0 indicates a very weak linear relationship (Moore, 2010). Conversely, “the strength of the linear relationship increases as r moves away from 0 toward either −1 or 1” (Moore 2010, p.107). When the data are plotted on a scatterplot, values of r close to −1 or 1 indicate that the points lie close to a straight line. “The extreme values r = −1 and r = 1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line” (Moore 2010, p.107).

In addition, Cohen (1988) gives the following rules of thumb for describing the size of the correlation coefficient (r) as a measure of the size of an effect:

Small effect = .10, Medium effect = .30, Large effect = .50
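The correlation model of Equation 6-1 and the properties listed above can be illustrated with a short Python sketch. This is an illustration only, not the procedure used in the research (which relied on SPSS). The function names `pearson_r` and `cohen_label`, the sample data, and the label "negligible" for values below .10 are assumptions introduced here.

```python
import math

def pearson_r(x, y):
    """Correlation r as the sum of products of standardised values over n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations s_x and s_y (n - 1 in the denominator).
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    products = (((xi - mean_x) / s_x) * ((yi - mean_y) / s_y)
                for xi, yi in zip(x, y))
    return sum(products) / (n - 1)

def cohen_label(r):
    """Rule-of-thumb label for the size of r, after Cohen (1988)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; the thresholds in the text start at .10

# Hypothetical data, for illustration only.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 58.0, 69.0, 70.0, 85.0]

r = pearson_r(heights_m, weights_kg)
r_swapped = pearson_r(weights_kg, heights_m)                # property (i)
r_cm = pearson_r([h * 100 for h in heights_m], weights_kg)  # property (ii)
print(round(r, 3), round(r_swapped, 3), round(r_cm, 3), cohen_label(r))
```

Swapping the variables, or converting metres to centimetres, leaves r unchanged up to floating-point rounding, which mirrors properties (i) and (ii).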


6.13.1.2 Gamma statistic as a precursor to multiple regression

Gamma statistics are based on a comparison of pairs of respondents to see whether the tested variables are concordant or discordant (Blaikie, 2003). Tabachnick and Fidell (2013) suggest that when a survey adopts categorical data, responses should be on at least a seven-point scale. However, at the time of designing the survey, five-point scales were deemed acceptable. To address this potential weakness in the data capture method, gamma statistics were computed and reported as a precursor to multiple regression. The gamma statistic is particularly useful for testing the association between two variables when there are likely to be a large number of tied ranks (Siegel & Castellan, 1988), as is the case here, where the Likert scale has only five points. One limitation of gamma statistics is that they do not establish whether the relationships between variables are independent of one another. They are therefore best treated as a precursor to the multiple regression analyses, in place of the Pearson's r correlation coefficients that would usually be presented (Moore et al., 2011). For this research, gamma statistical analysis was undertaken using SPSS, and the results are reported in Section 7.10.1. Gamma ranges from −1 to +1 (Blaikie, 2003).
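The comparison of concordant and discordant pairs described above can be sketched in Python. The sketch assumes the Goodman and Kruskal form of gamma, (C − D) / (C + D), where C and D are the numbers of concordant and discordant pairs and tied pairs are ignored. The function name and the five-point responses are hypothetical, and the research itself computed gamma in SPSS.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs of respondents; ties are ignored."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0: the pair is tied on at least one variable and is skipped
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical five-point Likert responses to two questions.
question_a = [1, 2, 2, 3, 4, 4, 5, 5]
question_b = [1, 1, 3, 3, 3, 5, 4, 5]

gamma = goodman_kruskal_gamma(question_a, question_b)
print(round(gamma, 3))
```

Because tied pairs are dropped from both the numerator and the denominator, the statistic remains usable when a short scale produces many ties.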

6.13.1.3 Multiple linear regression analysis

Pallant (2005, pp.95-6) stated that multiple regression is “a more sophisticated extension of correlation”, which can be useful when there is a need to “explore the predictive ability of a set of independent variables on one continuous dependent measure”. The rationale for choosing multiple regression was therefore to make it possible to compare the “predictive ability of particular independent variables and to find the best set of variables to predict a dependent variable” (Pallant, 2005, pp.95-6). Moore et al. (2011) explained that correlational analyses are usually presented as a precursor to multiple regression analyses because they play the role of descriptive statistics, which help to provide a context for considering regression coefficients. Therefore, although Tabachnick and Fidell (2013) imply that the use of multiple regression analysis may be problematic when scales have fewer than seven points, it was thought useful to conduct multiple regression analyses to test whether the different independent variables mentioned in each hypothesis were independently predictive of the different dependent variables. Multiple regression analysis was considered ideal because it would reduce the number of analyses performed and, more importantly, test whether the relationships between the independent variables and the dependent variable in any given analysis were independent.

To ensure that multiple regression statistics are not biased, it is usually considered important that the residuals in an analysis are normally distributed (this is fairly easy to assess using SPSS). Where residuals are not normally distributed, one of two courses of action is usually recommended: (a) transforming the variables; or (b) bootstrapping (Pallant, 2005).
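The fitting and bootstrapping steps mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the research: coefficients are obtained by solving the normal equations, bootstrapping is done by resampling cases with replacement, and all function names and data are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    coef = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (m[row][n] - tail) / m[row][row]
    return coef

def fit_ols(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def residuals(predictors, outcome, coef):
    """Observed minus fitted values; their distribution is what gets checked."""
    return [y - (coef[0] + sum(b * x for b, x in zip(coef[1:], row)))
            for row, y in zip(predictors, outcome)]

def bootstrap_coefficients(predictors, outcome, n_resamples=200, seed=0):
    """Refit the model on cases resampled with replacement."""
    rng = random.Random(seed)
    n = len(outcome)
    draws = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            draws.append(fit_ols([predictors[i] for i in idx],
                                 [outcome[i] for i in idx]))
        except ValueError:
            continue  # skip resamples too degenerate to fit
    return draws

# Hypothetical five-point responses: two independent variables, one dependent.
x_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [2, 3], [4, 5], [3, 1]]
y_vals = [2, 2, 4, 3, 5, 3, 5, 2]

coefficients = fit_ols(x_rows, y_vals)
resid = residuals(x_rows, y_vals, coefficients)
boot = bootstrap_coefficients(x_rows, y_vals)
print([round(c, 3) for c in coefficients], len(boot))
```

The spread of each coefficient across the bootstrap refits gives an indication of its sampling variability without relying on normally distributed residuals.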

6.13.2 Chi-Square Test of Association

According to Sounderpandian (2009), non-parametric statistics are essential in situations where stringent assumptions about the population are not needed, so that the issue of normal distribution does not arise. Non-parametric statistical tests are therefore useful for handling categorical (or nominal) data (Sounderpandian, 2009), especially where the level of association or dependency between the categorical responses to two sets of questions is to be assessed (Black, 2010). The Chi-square test of association or independence is one such non-parametric statistical test; it gives the probability value for the level of association or independence between variables (Black, 2010). Moore et al. (2011) stated that the level of significance, alpha (α), could be said to mean the area in the tail of the Chi-square probability distribution curve. In most cases, a significance level of 0.05 is the standard benchmark for concluding with confidence that there is some form of association or independence. This means that the area in the tail is 0.05, and this is also called the ‘rejection region’ (Moore et al., 2011). If the final value of the test statistic falls in the rejection region, it is possible to reject the null hypothesis. The critical point, which should be obtained from the Chi-square table, separates the tail (rejection region) from the rest of the curve. This critical value is a Chi-square value, since a Chi-square test is being used, and is found in the 0.05 area (column) of the Chi-square table. It is therefore possible to read off the critical value associated with the degrees of freedom from the corresponding row of the table (Black, 2010;
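The decision rule described in this section can be sketched in Python. The sketch assumes the standard Pearson Chi-square statistic for a contingency table, with degrees of freedom equal to (rows − 1) × (columns − 1), and uses the tabulated critical values for the 0.05 column. The function name and the observed counts are hypothetical.

```python
# Critical values from the 0.05 column of the Chi-square table, by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical cross-tabulation of the responses to two categorical questions.
observed_counts = [[30, 10],
                   [15, 25]]

statistic, df = chi_square_statistic(observed_counts)
reject_null = statistic > CRITICAL_0_05[df]  # True when the statistic is in the rejection region
print(round(statistic, 3), df, reject_null)
```

With these counts the statistic exceeds the critical value for one degree of freedom, so the null hypothesis of independence would be rejected at the 0.05 level.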

In document A framework for talent management to support the 2030 knowledge based economic vision for Qatar (Page 173-178)