RESEARCH STRATEGY 6.1 Introduction
6.7 Data Analysis
6.7.7 Two-Way factorial analysis of variance
| Contract | Statistic | 1 | 2 | 3 | 4 | 5 | |
|---|---|---|---|---|---|---|---|
| Long-term contract | Frequency | f11 | f12 | f13 | f14 | f15 | Row freq1 |
| | Row percentage | %R11 | %R12 | %R13 | %R14 | %R15 | Row% |
| | Column percentage | %C11 | %C12 | %C13 | %C14 | %C15 | Column freq2 |
| | Percentage of sample | Tots11 | Tots12 | Tots13 | Tots14 | Tots15 | Total%1 |
| Short-term contract | Frequency | f21 | f22 | f23 | f24 | f25 | Row Total2 |
| | Row percentage | %R21 | %R22 | %R23 | %R24 | %R25 | |
| | Column percentage | %C21 | %C22 | %C23 | %C24 | %C25 | Column freq2 |
| | Percentage of sample | Tots21 | Tots22 | Tots23 | Tots24 | Tots25 | Total%2 |
| | | Row f1 | Row f2 | Row f3 | Row f4 | Row f5 | Grand Total |

The main computations in the prospective study would be derived from analysis of variance. The appropriate model is technically called a two-way between-groups (independent groups) factorial analysis of variance (Kirkpatrick & Feeney, 2007, pp. 49-57). This advanced statistical method is powerful because it assesses main effects and interaction effects and also accounts for error factors. One issue that researchers have to contend with is that results on computer printouts and textbook explanations vary somewhat, although more experienced researchers would notice the similarities. The discussion is a general overview of the method that introduces willing learners to the computations and concepts associated with three-way classification analysis of variance (Ferguson & Takane, 1989, pp. 297-320). Calculation procedures are done in six steps that are illustrated and explained by means of one univariate frequency table, several cross-tabulations, and an array of formulas that are presented and discussed sequentially as the statistical model evolves. The intact data set, irrespective of being ordered or unordered, is used for calculations (see Table 6.2).
Table 6.2 Listing of Individual Scores on the First Dependent Variable
Step 1 requires the intact data set. A key statistic in multivariate analysis of variance is the grand mean that is calculated across all N observations (N being the number of subjects involved in the study). The raw scores of the 128 subjects (X1 to X128) are added up and the sum is divided by N. The formula for the grand mean is:

$$\bar{X}_{\ldots} = \frac{\sum_{i=1}^{N} X_i}{N} \qquad \text{(Formula 12)}$$
In step 2 the researcher uses the grand mean to calculate the total sum of squares that is denoted as SS Total. The formula for this statistic reads:
$$SS_{\text{Total}} = \sum_{i=1}^{N} (X_i - \bar{X}_{\ldots})^2 \qquad \text{(Formula 13)}$$

where $\bar{X}_{\ldots}$ is the grand mean. The grand mean is subtracted from the raw score of every subject (X1 to X128). The 128 difference scores are squared and added up. The SS Total can be partitioned into its additive parts. The degrees of freedom of SS Total are equal to N – 1.
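Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration only: the five scores are hypothetical values chosen for easy arithmetic, and the function names `grand_mean` and `ss_total` are assumptions made for this sketch, not names taken from any statistical package.

```python
def grand_mean(scores):
    """Step 1: add up all raw scores and divide by N."""
    return sum(scores) / len(scores)


def ss_total(scores):
    """Step 2: sum of squared differences between each raw score and the grand mean."""
    mean = grand_mean(scores)
    return sum((x - mean) ** 2 for x in scores)


# Hypothetical raw scores (not data from the study).
scores = [4, 6, 5, 9, 6]

mean_of_scores = grand_mean(scores)   # 30 / 5 = 6.0
total_ss = ss_total(scores)           # 4 + 0 + 1 + 9 + 0 = 14.0
df_total = len(scores) - 1            # N - 1 = 4

print(mean_of_scores, total_ss, df_total)
```

With the 128 subjects of the study, the same functions would be applied to the full list X1 to X128, giving 127 degrees of freedom for SS Total.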
From this point onward, the intact (original) data set is split up and rearranged in terms of the three independent variables: firstly, according to the two contract subgroups, then in terms of the two gender subgroups, and finally according to the three age cohort groups. This split-up is illustrated in Table 6.3.
Total Sample: X1, ..., XN; N = 128; Grand Mean

Note that the intact data set was rearranged into three subsets, in accordance with the categories of the three independent variables. This data transformation would assist the researcher in computing statistical outputs that could answer three critical questions with regard to the effects of the independent variables on the dependent variable:
Did the test scores of executives with long-term contracts differ significantly from the test scores of their fellow-executives with short-term contracts?
Did the test scores of male executives differ significantly from the test scores of female executives?
Did the test scores of executives in the 21-40 year, 41-50 year, and ≥ 51 year age cohort groups differ significantly?
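Table 6.3. Listing of Individual Scores on the First Dependent Variable, According to the Categories of the Independent Variables Contract Term, Gender, and Age Cohort Group

Main Effects: Independent Variables Contract Term and Gender

| | Contract Term: Long-Term | Contract Term: Short-Term | Gender: Male | Gender: Female |
|---|---|---|---|---|
| Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| Subgroup size | Nj1 = 76 | Nj2 = 52 | Nj1 = 86 | Nj2 = 42 |
| Subgroup mean | Mean L-T | Mean S-T | Mean M | Mean F |
| | Mean T.1.1 | Mean T.1.2 | Mean T.2.1 | Mean T.2.2 |

Main Effects: Independent Variable Age Cohort Group

| | 21-40 Years | 41-50 Years | ≥ 51 Years |
|---|---|---|---|
| Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| Subgroup size | Nj1 = 54 | Nj2 = 42 | Nj3 = 32 |
| Subgroup mean | Mean 21-40 | Mean 41-50 | Mean ≥ 51 |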
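The categories of the three independent variables represent the main effects of the study. Since statistical manipulation of one independent variable did not involve either of the two remaining independent variables, testing for statistical significance was restricted to examination of differences among the groups within each variable. In step 3, three formulas, one for each independent variable, are used:

$$SS_{\text{Contract}} = n_{C} \sum (\bar{X}_{\text{Contract}} - \bar{X}_{\ldots})^2 \qquad \text{(Formula 14)}$$

$$SS_{\text{Gender}} = n_{G} \sum (\bar{X}_{\text{Gender}} - \bar{X}_{\ldots})^2 \qquad \text{(Formula 15)}$$

$$SS_{\text{Age Cohort Group}} = n_{ACG} \sum (\bar{X}_{\text{Age Cohort Group}} - \bar{X}_{\ldots})^2 \qquad \text{(Formula 16)}$$

where $\bar{X}_{\ldots}$ is the grand mean and each summation runs over the category means of the variable concerned. The three subgroup means for the above data split are (mean L-T + mean S-T)/2, (mean M + mean F)/2, and (mean 21-40 + mean 41-50 + mean ≥ 51)/3. The three SSIV are designated as SS Contract, SS Gender, and SS Age Cohort Group.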
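The logic of a main-effect sum of squares can be sketched as follows. The sketch weights each squared deviation by the size of its own group, which is the form generally used when groups are unequal in size; this weighting is an assumption of the sketch and is not taken from Formulas 14 to 16, to which it reduces when all groups are of equal size. The scores and the function name `ss_between` are hypothetical.

```python
def ss_between(groups):
    """Between-groups sum of squares for one independent variable.

    groups maps a category label to the list of scores in that category.
    Each squared deviation (group mean - grand mean) is weighted by the
    group size, an assumption made for unequal group sizes.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand = sum(all_scores) / len(all_scores)
    return sum(
        len(scores) * (sum(scores) / len(scores) - grand) ** 2
        for scores in groups.values()
    )


# Hypothetical scores split by contract term (not data from the study).
contract = {"long_term": [6, 8, 10], "short_term": [1, 5]}

# Grand mean = 30 / 5 = 6; group means are 8 and 3.
# SS = 3 * (8 - 6)^2 + 2 * (3 - 6)^2 = 12 + 18 = 30
ss_contract = ss_between(contract)
df_contract = len(contract) - 1  # number of categories minus 1

print(ss_contract, df_contract)
```

The same function would be called once for each independent variable, with the data set regrouped by Contract Term, by Gender, and by Age Cohort Group in turn.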
The pending study examined three biographical variables: the primary variable Contract Term (Long-Term or Short-Term), the secondary variable Gender (Male or Female), and the tertiary variable Age Cohort Group (categories 21-40 Years, 41-50 Years, and ≥ 51 Years). The main effects determine whether differences between the two or more categories of each biographical variable are statistically significant (that is, testing for differences among the groups within each variable). The data split for the three main effects is illustrated in Table 6.3.
The calculations in step 4 are interim procedures that are related to analyses of the sets of interactions among the independent variables examined in the current study. The following formulas are used for this purpose:
The calculation procedures for the interim statistics are similar, although the number of categories per variable might differ. The grand mean is subtracted from each of the two or three group means, each difference is squared, and the squared differences are summed and then multiplied by the N subjects and g categories of the specific independent variable.
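A second set of interim statistics is calculated in step 5 to yield the sum of squares for the cells. The formula reads as follows:

$$SS_{\text{Cells}} = n \sum (\bar{X}_{j} - \bar{X}_{\ldots})^2 \qquad \text{(Formula 17)}$$

where $\bar{X}_{j}$ is each mean that enters the summation and $\bar{X}_{\ldots}$ is the grand mean. The SS Cells are derived from the above formula. The grand mean is subtracted from the seven category means, namely means L-T, S-T, M, F, 21-40, 41-50 and ≥ 51; the differences are then squared, summed, and multiplied by n.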
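In step 6 the interaction effects are calculated by the following formulas:

$$SS_{\text{Contract} \times \text{Gender}} = SS_{\text{Cells}} - SS_{\text{Contract}} - SS_{\text{Gender}} \qquad \text{(Formula 18)}$$

$$SS_{\text{Contract} \times \text{Age Cohort Group}} = SS_{\text{Cells}} - SS_{\text{Contract}} - SS_{\text{Age Cohort Group}} \qquad \text{(Formula 19)}$$

$$SS_{\text{Gender} \times \text{Age Cohort Group}} = SS_{\text{Cells}} - SS_{\text{Gender}} - SS_{\text{Age Cohort Group}} \qquad \text{(Formula 20)}$$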
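The subtraction logic of an interaction sum of squares can be illustrated with a small two-factor example. The sketch below assumes a balanced design (the same number of scores in every cell) and computes SS Cells from the cell means of the two-way table; the scores, the factor labels, and the function names are hypothetical and serve only to show how the interaction term is obtained by subtraction.

```python
def mean(values):
    return sum(values) / len(values)


def pooled(cells, position, level):
    """All scores whose cell key has `level` at `position` (0 = factor A, 1 = factor B)."""
    return [x for key, scores in cells.items() if key[position] == level for x in scores]


def two_way_sums_of_squares(cells, n_per_cell):
    """Sums of squares for a balanced two-factor layout.

    cells maps (level_a, level_b) to the list of scores in that cell.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    grand = mean([x for scores in cells.values() for x in scores])

    ss_cells = n_per_cell * sum((mean(s) - grand) ** 2 for s in cells.values())
    ss_a = n_per_cell * len(levels_b) * sum(
        (mean(pooled(cells, 0, a)) - grand) ** 2 for a in levels_a
    )
    ss_b = n_per_cell * len(levels_a) * sum(
        (mean(pooled(cells, 1, b)) - grand) ** 2 for b in levels_b
    )
    # Interaction = SS Cells minus the two main-effect sums of squares.
    return {"cells": ss_cells, "a": ss_a, "b": ss_b, "a_x_b": ss_cells - ss_a - ss_b}


# Hypothetical balanced data: 2 contract terms x 2 genders, 2 scores per cell.
cells = {
    ("long", "male"): [1, 3],
    ("long", "female"): [5, 7],
    ("short", "male"): [4, 6],
    ("short", "female"): [6, 8],
}
result = two_way_sums_of_squares(cells, n_per_cell=2)
print(result)  # cells 28, a 8, b 18, a_x_b 2
```

In this example SS Cells (28) plus the within-cell sum of squares (8) equals SS Total (36), which shows the additive partition of the total sum of squares.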
Table 6.4 demonstrates analysis of variance of a three-way classification.
Table 6.4. Listing of Individual Scores on the Interactions Contract Term x Gender, Contract Term x Age Cohort Group, and Gender x Age Cohort Group

Interaction Effects: Contract Term x Gender and Contract Term x Age Cohort Group

| Contract Term | | Gender: Male | Gender: Female | Age Cohort Group: 21-40 Years | Age Cohort Group: 41-50 Years | Age Cohort Group: ≥ 51 Years |
|---|---|---|---|---|---|---|
| Long-Term | Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| | N | N L-T x M = | N L-T x F = | N L-T x 21-40 = | N L-T x 41-50 = | N L-T x ≥ 51 = |
| | Mean | Mean L-T x M | Mean L-T x F | Mean L-T x 21-40 | Mean L-T x 41-50 | Mean L-T x ≥ 51 |
| Short-Term | Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| | N | N S-T x M = | N S-T x F = | N S-T x 21-40 = | N S-T x 41-50 = | N S-T x ≥ 51 = |
| | Mean | Mean S-T x M | Mean S-T x F | Mean S-T x 21-40 | Mean S-T x 41-50 | Mean S-T x ≥ 51 |

Interaction Effects: Gender x Age Cohort Group

| Gender | | 21-40 Years | 41-50 Years | ≥ 51 Years |
|---|---|---|---|---|
| Male | Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| | N | N M x 21-40 = | N M x 41-50 = | N M x ≥ 51 = |
| | Mean | Mean M x 21-40 | Mean M x 41-50 | Mean M x ≥ 51 |
| Female | Scores | X1, X2, ..., XN | X1, X2, ..., XN | X1, X2, ..., XN |
| | N | N F x 21-40 = | N F x 41-50 = | N F x ≥ 51 = |
| | Mean | Mean F x 21-40 | Mean F x 41-50 | Mean F x ≥ 51 |

The degrees of freedom for an interaction effect are the number of categories of the first independent variable minus 1, multiplied by the number of categories of the second independent variable minus 1.
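The degrees of freedom for every effect in the design can be checked with a short calculation. The sketch below takes the product of (number of categories minus 1) across the variables involved in an effect, which reproduces the degrees of freedom listed in Table 6.5 for two contract terms, two genders, three age cohort groups and N = 128. The dictionary keys and the function name `df_effect` are assumptions made for illustration.

```python
from math import prod


def df_effect(*category_counts):
    """Degrees of freedom: product of (categories - 1) over the variables in the effect."""
    return prod(k - 1 for k in category_counts)


levels = {"contract": 2, "gender": 2, "age": 3}
n_subjects = 128

df = {
    "contract": df_effect(levels["contract"]),
    "gender": df_effect(levels["gender"]),
    "age": df_effect(levels["age"]),
    "contract_x_gender": df_effect(levels["contract"], levels["gender"]),
    "contract_x_age": df_effect(levels["contract"], levels["age"]),
    "gender_x_age": df_effect(levels["gender"], levels["age"]),
    "contract_x_gender_x_age": df_effect(*levels.values()),
}

n_cells = prod(levels.values())   # 2 * 2 * 3 = 12 cells
df_model = n_cells - 1            # 11, the sum of all effect df
df_error = n_subjects - n_cells   # 128 - 12 = 116

print(df, df_model, df_error)
```

The seven effect values add up to the 11 degrees of freedom of the corrected model, and the remaining 116 belong to the error term.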
Analysis of variance of three-way classification is complex, as the effects of three independent variables on a single dependent variable are examined. The effect of interaction occurs when two or more independent variables produce a joint effect over and above their main effects (Whitley, Jr., 2002, pp. 204-207). Use of this statistical method assumes subsamples of equal size. Whenever this requirement is violated, the statistical method applies general linear modelling to estimate group and subgroup means.
The tables above illustrate the transformation that the original data set has to undergo in order to compute statistics that demonstrate the influence of three main effect variables on a dependent variable. The higher level of complexity is clear.

Table 6.5. Example of Printout with Results of Multivariate Factorial Analysis of Variance

| Source | Type III Sum of Squares | Degrees of Freedom | Mean Square | F Ratio | Level of Significance |
|---|---|---|---|---|---|
| Corrected Model | 7873.418 | 11 | 414.390 | 1.313 | 0.191 |
| Intercept | 123824.607 | 1 | 123824.607 | 392.248 | 0.000 |
| Contract Term | 36.420 | 1 | 36.420 | 0.115 | 0.735 |
| Gender | 19.204 | 1 | 19.204 | 0.061 | 0.806 |
| Age Cohort Group | 3181.374 | 2 | 636.275 | 2.016 | 0.082 |
| Contract * Gender | 324.465 | 1 | 324.465 | 1.028 | 0.313 |
| Contract * Age | 1873.043 | 2 | 468.261 | 1.483 | 0.212 |
| Gender * Age | 1465.703 | 2 | 366.426 | 1.161 | 0.332 |
| Contract * Gender * Age Cohort Group | 1425.347 | 2 | 475.116 | 1.505 | 0.217 |
| Error Factor | 34093.361 | 116 | 315.679 | | |
| Total Sum of Squares | 322638.264 | 128 | | | |
| Corrected Sum of Squares | | | | | |
The primary calculations in the current study were done by means of analysis of variance of three-way classification (refer to Subsections 9.4.1 to 9.4.20 of Chapter 9). An example of a computer printout of an analysis of variance of a three-way classification is presented in Table 6.5.
The researcher examined the main, two-way and three-way interaction effects of Length of Contract Term x Gender x Age Cohort Group on the 20 dependent variables that were selected for the purposes of the study.
Technically, the prospective study was a factorial analysis of variance of three-way classification (Ferguson & Takane, 1989, pp. 297-320; Howell, 2004, pp. 399-423). The computation procedures are relatively similar to those that are required for a two-way between-groups factorial analysis of variance design with independent group comparisons (Kirkpatrick & Feeney, 2007, pp. 49-57). The chosen version of a multiple analysis of variance is a powerful statistical method that analyses main effects, that is, effects that are attributed to a specific independent variable. The main effect of a specific independent variable is not influenced by effects of any of the remaining independent variables in the data set. Interaction occurs when two or more independent variables combine to produce a joint effect over and above their main effects (Whitley, Jr., 2002, pp. 204-207). Use of this statistical method assumes subsamples of equal size. Whenever this requirement is violated, the statistical method applies general linear modelling to estimate group and subgroup means.

The statistics that are required for understanding and interpreting tables of two-way and three-way analysis of variance are the computed F ratio and its associated level of significance, both of which appear on the right-hand side of the ANOVA table. If the computed F ratio is equal to or greater than the F critical value, the contrast is judged as being statistically significant and appropriate for further analysis. In such a case, the significance level would be equal to or less than 0.05. If the computed F ratio is less than the F critical value, the numeric value of the level of significance would exceed p = 0.05. In this case, the contrast would be interpreted as not being significant.

The critical F value for contrasts that involve Length of Contract Term and Gender is 3.91 with 1 and 127 degrees of freedom, provided that the level of significance is preset at 0.05 and one-tailed hypothesis testing, or directional testing, is done. In the case of the main effect Age Cohort Group, the F critical value is 3.06, with 2 and 127 degrees of freedom, when hypothesis testing is one-tailed and the level of significance is preset at the 0.05 level (or 5% level).
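The decision rule can be illustrated with the mean squares reported in Table 6.5. The sketch below assumes that each F ratio is the mean square of the effect divided by the mean square of the error factor, which is consistent with the F ratios shown in the printout; the critical values 3.91 and 3.06 are those quoted in the text, and the function names are hypothetical.

```python
def f_ratio(ms_effect, ms_error):
    """Computed F ratio: mean square of the effect over mean square of the error factor."""
    return ms_effect / ms_error


def is_significant(f_computed, f_critical):
    """A contrast is judged significant when the computed F reaches the critical F."""
    return f_computed >= f_critical


MS_ERROR = 315.679  # mean square of the error factor in Table 6.5

# Mean squares from Table 6.5, paired with the critical values quoted in the text.
effects = {
    "Contract Term": (36.420, 3.91),
    "Gender": (19.204, 3.91),
    "Age Cohort Group": (636.275, 3.06),
}

results = {
    name: (round(f_ratio(ms, MS_ERROR), 3), is_significant(f_ratio(ms, MS_ERROR), crit))
    for name, (ms, crit) in effects.items()
}

for name, (f_value, significant) in results.items():
    print(f"{name}: F = {f_value}, significant = {significant}")
```

None of the three main effects in the example printout reaches its critical value, which agrees with the significance levels above 0.05 shown in Table 6.5.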