Chapter 9
Analysis of Nonorthogonal Data
Orthogonal data:
The concept of orthogonality of data is associated with two-way or higher-way classified data. Consider the setup of two-way classified data.
Let A and B be two factors at p and q levels respectively.
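Let $n_{ij}$ be the number of observations in the $(i, j)$th cell, and let $y_{ijk}$ be the $k$th observation in the $(i, j)$th cell, $i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$; $k = 1, 2, \ldots, n_{ij}$. Define

$$T_{ij} = \sum_k y_{ijk}: \text{ cell total},$$
$$A_i = \sum_j \sum_k y_{ijk} = \sum_j T_{ij}: \text{ marginal total corresponding to the } i\text{th level of } A,$$
$$B_j = \sum_i \sum_k y_{ijk} = \sum_i T_{ij}: \text{ marginal total corresponding to the } j\text{th level of } B,$$
$$G = \sum_i A_i = \sum_j B_j: \text{ grand total},$$
$$n_{io} = \sum_j n_{ij}, \qquad n_{oj} = \sum_i n_{ij}, \qquad n_{oo} = \sum_i n_{io} = \sum_j n_{oj} = \sum_i \sum_j n_{ij},$$
$$\hat{t}_i = \frac{A_i}{n_{io}}: \text{ marginal mean of } A_i, \qquad \hat{b}_j = \frac{B_j}{n_{oj}}: \text{ marginal mean of } B_j.$$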
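If any contrast $\sum_i l_i \hat{t}_i$ $\left(\sum_i l_i = 0\right)$ of the marginal means of $A$ is orthogonal to any contrast $\sum_j m_j \hat{b}_j$ $\left(\sum_j m_j = 0\right)$ of the marginal means of $B$, then the data are called orthogonal; otherwise they are called non-orthogonal. Two contrasts are orthogonal when the sum of products of the coefficients of identical observations in them is zero. The definition extends to higher-way classifications if we consider the marginal means of every pair of factors of classification.

If each cell has a constant number of observations, then the data are orthogonal, as shown below. Note that in this case $n_{ij} = n$ for all $i, j$, so that

$$n_{io} = \sum_j n_{ij} = qn, \qquad n_{oj} = \sum_i n_{ij} = pn.$$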
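Then

$$L = \sum_i l_i \hat{t}_i = \frac{1}{qn}\sum_i l_i A_i = \frac{1}{qn}\sum_i l_i\left(T_{i1} + T_{i2} + \cdots + T_{iq}\right).$$

Similarly

$$M = \sum_j m_j \hat{b}_j = \frac{1}{pn}\sum_j m_j B_j = \frac{1}{pn}\sum_j m_j\left(T_{1j} + T_{2j} + \cdots + T_{pj}\right).$$

Each of the $n$ observations in cell $(i, j)$ has coefficient $l_i/(qn)$ in $L$ and $m_j/(pn)$ in $M$. So the sum of products of the coefficients of identical observations in the two contrasts is

$$\sum_i \sum_j n \cdot \frac{l_i}{qn} \cdot \frac{m_j}{pn} = \frac{1}{pqn}\left(\sum_i l_i\right)\left(\sum_j m_j\right) = 0.$$

So the data are orthogonal, as the two contrasts are orthogonal.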
When the cell frequencies are proportional, i.e.,

$$\frac{n_{ij}}{n_{ij'}} = \frac{C_j}{C_{j'}} \qquad (j, j' = 1, 2, \ldots, q;\ i = 1, 2, \ldots, p),$$

the data are also orthogonal. The same is true for higher-order classifications if the number of observations in the ultimate cell is constant.
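Whether a given layout is orthogonal can be checked from its cell counts alone. The sketch below is an illustration, not part of the original notes; the helper name `is_orthogonal` and the double-centring test are assumptions. An observation in cell $(i, j)$ carries coefficient $l_i/n_{io}$ in $L$ and $m_j/n_{oj}$ in $M$, so the sum of products is $\sum_i\sum_j l_i m_j\, n_{ij}/(n_{io}n_{oj})$. This vanishes for every pair of contrasts exactly when the matrix $W_{ij} = n_{ij}/(n_{io}n_{oj})$ becomes zero once its row and column means are removed.

```python
import numpy as np

def is_orthogonal(n, tol=1e-12):
    """Hypothetical helper: True if every contrast of A-marginal means is
    orthogonal to every contrast of B-marginal means, given cell counts n (p x q)."""
    n = np.asarray(n, dtype=float)
    n_io = n.sum(axis=1)                 # counts for each level of A
    n_oj = n.sum(axis=0)                 # counts for each level of B
    # Sum of products of coefficients = sum_ij l_i m_j W_ij with this W:
    W = n / np.outer(n_io, n_oj)
    # It is zero for all contrasts l, m iff W is zero after double centring.
    Wc = W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True) + W.mean()
    return bool(np.all(np.abs(Wc) < tol))

print(is_orthogonal([[2, 2, 2], [2, 2, 2]]))   # equal cell counts -> True
print(is_orthogonal([[1, 2, 3], [2, 4, 6]]))   # proportional counts -> True
print(is_orthogonal([[1, 2], [2, 1]]))         # neither -> False
```

Double centring is used because any contrast vector is already orthogonal to the vector of ones. Removing row and column means therefore discards exactly the part of $W$ that no pair of contrasts can detect.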
Analysis of non-orthogonal two-way data:
When the data are non-orthogonal, the analysis is no longer simple, because a straightforward solution of the normal equations is not available. Consider the usual two-way model

$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \qquad i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, n_{ij},$$

where $n_{oo} = \sum_i \sum_j n_{ij}$ is the total number of observations. Assume that the $\varepsilon_{ijk}$'s are independently and identically distributed as $N(0, \sigma^2)$. Using the least squares principle, we minimize the sum of squares due to error,
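$$E = \sum_i \sum_j \sum_k \left(y_{ijk} - \mu - \alpha_i - \beta_j\right)^2.$$

Setting the partial derivatives equal to zero gives the normal equations

$$\frac{\partial E}{\partial \mu} = 0 \;\Rightarrow\; n_{oo}\hat\mu + \sum_i n_{io}\hat\alpha_i + \sum_j n_{oj}\hat\beta_j = \sum_i\sum_j\sum_k y_{ijk} = G \qquad (1)$$

$$\frac{\partial E}{\partial \alpha_i} = 0 \;\Rightarrow\; n_{io}\hat\mu + n_{io}\hat\alpha_i + \sum_j n_{ij}\hat\beta_j = \sum_j\sum_k y_{ijk} = A_i \qquad (i = 1, 2, \ldots, I) \qquad (2)$$

$$\frac{\partial E}{\partial \beta_j} = 0 \;\Rightarrow\; n_{oj}\hat\mu + \sum_i n_{ij}\hat\alpha_i + n_{oj}\hat\beta_j = \sum_i\sum_k y_{ijk} = B_j \qquad (j = 1, 2, \ldots, J) \qquad (3)$$

From (3), we obtain (we use $m$ instead of $i$ as the summation index to avoid confusion)

$$\hat\mu + \hat\beta_j = \frac{1}{n_{oj}}\left(B_j - \sum_m n_{mj}\hat\alpha_m\right).$$

Putting this in (2) and using $\sum_j n_{ij} = n_{io}$,

$$n_{io}\hat\mu + n_{io}\hat\alpha_i + \sum_j n_{ij}\left[\frac{1}{n_{oj}}\left(B_j - \sum_m n_{mj}\hat\alpha_m\right) - \hat\mu\right] = A_i$$

or

$$A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}} = n_{io}\hat\alpha_i - \sum_j \frac{n_{ij}}{n_{oj}}\sum_m n_{mj}\hat\alpha_m$$

or, separating the term $m = i$,

$$A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}} = \left(n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}}\right)\hat\alpha_i - \sum_{m \neq i}\left(\sum_j \frac{n_{ij}n_{mj}}{n_{oj}}\right)\hat\alpha_m$$

or

$$Q_i = C_{ii}\hat\alpha_i + \sum_{m \neq i} C_{im}\hat\alpha_m \qquad (i = 1, 2, \ldots, I). \qquad (4)$$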
These are referred to as the reduced normal equations.
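Here

$$Q_i = A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}}$$

is called the adjusted treatment total of the $i$th level of $A$, and

$$C_{ii} = n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}}, \qquad C_{im} = -\sum_j \frac{n_{ij}n_{mj}}{n_{oj}} \quad (m \neq i), \qquad C_{im} = C_{mi}.$$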
These $I$ equations in (4) are not independent. Write them as

$$Q_1 = C_{11}\hat\alpha_1 + \sum_{m \neq 1} C_{1m}\hat\alpha_m$$
$$Q_2 = C_{22}\hat\alpha_2 + \sum_{m \neq 2} C_{2m}\hat\alpha_m$$
$$\vdots$$
$$Q_I = C_{II}\hat\alpha_I + \sum_{m \neq I} C_{Im}\hat\alpha_m$$

and sum them on both sides of the equality sign. On the left-hand side, using $\sum_i A_i = G = \sum_j B_j$ and $\sum_i n_{ij} = n_{oj}$,

$$\sum_{i=1}^{I} Q_i = \sum_{i=1}^{I} A_i - \sum_j \frac{B_j}{n_{oj}}\sum_{i=1}^{I} n_{ij} = G - \sum_j B_j = 0.$$

On the right-hand side of (4), collecting the coefficient of each $\hat\alpha_m$,

$$\sum_i\left(C_{ii}\hat\alpha_i + \sum_{m \neq i} C_{im}\hat\alpha_m\right) = \sum_m \hat\alpha_m \sum_i C_{im} = 0,$$

since

$$\sum_i C_{im} = n_{mo} - \sum_j \frac{n_{mj}}{n_{oj}}\sum_i n_{ij} = n_{mo} - \sum_j n_{mj} = 0.$$

Thus the $I$ equations in (4) are not independent, and no unique solution exists: if $\hat\alpha_i$ $(i = 1, 2, \ldots, I)$ is a set of solutions, then $\hat\alpha_i + c$ $(i = 1, 2, \ldots, I)$ is also a set of solutions, where $c$ is a constant. To get a unique solution, impose the condition

$$\sum_i \hat\alpha_i = 0,$$

so that the $\hat\alpha_i$'s are estimated as deviations from their mean.
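As a matter of fact, the restriction need not always be $\sum_i \hat\alpha_i = 0$; it can be any linear function of the $\hat\alpha_i$'s other than a contrast. Changing the restriction shifts every $\hat\alpha_i$ by the same constant, which is absorbed by $\hat\mu$, and leaves the estimates of contrasts among the $\alpha_i$'s unchanged. After obtaining a set of solutions for the $\hat\alpha_i$'s from (4), we can obtain the solutions for the $\hat\beta_j$'s from (3), if so required.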
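The construction of $C$ and $Q$ and the restricted solution can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the notes' own procedure. The helper names `reduced_normal_equations` and `solve_sum_zero` and the data values are hypothetical, factor levels are coded from 0, every level of $B$ must be observed, and the design is assumed connected (so $C$ has rank $I - 1$). The restriction $\sum_i \hat\alpha_i = 0$ is imposed by appending a row of ones to $C$.

```python
import numpy as np

def reduced_normal_equations(a, b, y, I, J):
    """Hypothetical helper: C (I x I) and Q (length I) of equations (4).
    a, b: 0-based levels of A and B for each observation; y: observations."""
    a, b = np.asarray(a), np.asarray(b)
    y = np.asarray(y, dtype=float)
    n = np.zeros((I, J))
    np.add.at(n, (a, b), 1.0)             # cell counts n_ij
    T = np.zeros((I, J))
    np.add.at(T, (a, b), y)               # cell totals T_ij
    A, B = T.sum(axis=1), T.sum(axis=0)   # marginal totals A_i, B_j
    n_io, n_oj = n.sum(axis=1), n.sum(axis=0)
    P = n / n_oj                          # P[i, j] = n_ij / n_oj (needs n_oj > 0)
    C = np.diag(n_io) - P @ n.T           # C_ii and C_im as defined above
    Q = A - P @ B                         # Q_i = A_i - sum_j n_ij B_j / n_oj
    return C, Q

def solve_sum_zero(C, rhs):
    """Solve C x = rhs subject to sum(x) = 0 (assumes a connected design)."""
    M = np.vstack([C, np.ones(C.shape[0])])
    return np.linalg.lstsq(M, np.append(rhs, 0.0), rcond=None)[0]

# Hypothetical unbalanced layout: I = 3 levels of A, J = 2 levels of B.
a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [0, 0, 1, 0, 1, 0, 1, 1, 1]
y = [5.1, 4.8, 6.0, 7.2, 8.1, 3.9, 5.5, 5.0, 5.8]

C, Q = reduced_normal_equations(a, b, y, I=3, J=2)
alpha_hat = solve_sum_zero(C, Q)
print(np.allclose(C.sum(axis=0), 0), np.isclose(Q.sum(), 0))          # True True
print(np.isclose(alpha_hat.sum(), 0), np.allclose(C @ alpha_hat, Q))  # True True
print(np.allclose(C @ (alpha_hat + 1.0), Q))   # shifted solution also solves (4): True
```

The last line illustrates the non-uniqueness discussed above: adding a constant to every $\hat\alpha_i$ still satisfies (4).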
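Further, the error sum of squares is

$$E = \sum_i\sum_j\sum_k \left(y_{ijk} - \hat\mu - \hat\alpha_i - \hat\beta_j\right)^2$$
$$= \sum_i\sum_j\sum_k y_{ijk}^2 - \hat\mu G - \sum_i \hat\alpha_i A_i - \sum_j \hat\beta_j B_j \quad \text{(using the normal equations)}$$
$$= \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \left(\hat\mu + \hat\beta_j\right)B_j - \sum_i \hat\alpha_i A_i$$
$$= \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j}{n_{oj}}\left(B_j - \sum_m n_{mj}\hat\alpha_m\right) - \sum_i \hat\alpha_i A_i$$
$$= \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat\alpha_i Q_i. \qquad (5)$$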
Here, we eliminated $\hat\beta_j$ and obtained the error sum of squares by obtaining the $\hat\alpha_i$'s.
Now consider the other way round, i.e., eliminate $\hat\alpha_i$, obtain $\hat\beta_j$, and then obtain the sum of squares due to error. Doing so, the error sum of squares is

$$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat\beta_j R_j, \qquad (6)$$

where $R_j$ is the adjusted total of the $j$th level of $B$, given by

$$R_j = B_j - \sum_i \frac{n_{ij}A_i}{n_{io}}.$$
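Both error sums of squares (5) and (6) are the same, so

$$\sum_i \frac{A_i^2}{n_{io}} + \sum_j \hat\beta_j R_j = \sum_j \frac{B_j^2}{n_{oj}} + \sum_i \hat\alpha_i Q_i$$

or

$$\sum_i \hat\alpha_i Q_i - \sum_i \frac{A_i^2}{n_{io}} = \sum_j \hat\beta_j R_j - \sum_j \frac{B_j^2}{n_{oj}}. \qquad (7)$$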
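As a numerical check of (5), (6) and (7), the sketch below computes the error sum of squares both ways and compares it with a direct least squares fit. It is illustrative only. The helper `reduced_solution` and the data are hypothetical, the design is assumed connected, and the direct fit uses reference coding (first level of each factor dropped), which spans the same column space as the overparameterized model.

```python
import numpy as np

def reduced_solution(n, T):
    """Hypothetical helper: eliminate the factor indexing the columns of n and T,
    and return (adjusted totals, sum-zero effect estimates) for the row factor.
    Pass (n, T) for Q and alpha_hat, and (n.T, T.T) for R and beta_hat."""
    n_row, n_col = n.sum(axis=1), n.sum(axis=0)
    P = n / n_col
    C = np.diag(n_row) - P @ n.T
    adj = T.sum(axis=1) - P @ T.sum(axis=0)
    M = np.vstack([C, np.ones(len(adj))])          # restriction: effects sum to 0
    est = np.linalg.lstsq(M, np.append(adj, 0.0), rcond=None)[0]
    return adj, est

# Hypothetical unbalanced layout: I = 3 levels of A, J = 2 levels of B.
a = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
b = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1])
y = np.array([5.1, 4.8, 6.0, 7.2, 8.1, 3.9, 5.5, 5.0, 5.8])
I, J = 3, 2

n = np.zeros((I, J))
np.add.at(n, (a, b), 1.0)
T = np.zeros((I, J))
np.add.at(T, (a, b), y)
A, B = T.sum(axis=1), T.sum(axis=0)
n_io, n_oj = n.sum(axis=1), n.sum(axis=0)

Q, alpha_hat = reduced_solution(n, T)       # eliminate beta, solve for alpha
R, beta_hat = reduced_solution(n.T, T.T)    # eliminate alpha, solve for beta

E5 = y @ y - np.sum(B**2 / n_oj) - alpha_hat @ Q   # equation (5)
E6 = y @ y - np.sum(A**2 / n_io) - beta_hat @ R    # equation (6)

# Direct least squares with a full-rank reference-coded design.
X = np.column_stack([np.ones(len(y)), np.eye(I)[a][:, 1:], np.eye(J)[b][:, 1:]])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
E_direct = np.sum((y - X @ coef) ** 2)

print(np.isclose(E5, E6), np.isclose(E5, E_direct))   # True True
```

Since (5) and (6) agree, their difference rearranges to (7).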
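Now under $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$, we get the model

$$y_{ijk} = \mu + \beta_j + \varepsilon_{ijk}.$$

Now minimize the sum of squares due to error

$$E^* = \sum_i\sum_j\sum_k \left(y_{ijk} - \mu - \beta_j\right)^2:$$

$$\frac{\partial E^*}{\partial \mu} = 0 \;\Rightarrow\; n_{oo}\hat\mu + \sum_j n_{oj}\hat\beta_j = G,$$

$$\frac{\partial E^*}{\partial \beta_j} = 0 \;\Rightarrow\; n_{oj}\hat\mu + n_{oj}\hat\beta_j = B_j,$$

or

$$\hat\mu + \hat\beta_j = \frac{B_j}{n_{oj}}.$$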
The error sum of squares is

$$E_1 = \sum_i\sum_j\sum_k \left(y_{ijk} - \hat\mu - \hat\beta_j\right)^2 = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}}. \qquad (8)$$

The sum of squares due to $A$, obtained by subtracting (5) from (8),

$$E_1 - E = \sum_i \hat\alpha_i Q_i,$$

is called the adjusted sum of squares due to $A$, whereas

$$\text{unadjusted sum of squares due to } A = \sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}}.$$

Once the adjusted sum of squares due to $A$ is obtained, the adjusted sum of squares due to $B$ can be obtained from (7). The analysis of variance table is given as follows.
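$$
\begin{array}{lllll}
\text{Source} & \text{Degrees of freedom} & \text{Sum of squares} & \text{Mean square} & F \\
\hline
A \text{ (adjusted)} & I - 1 & SSA = \sum_i \hat\alpha_i Q_i & MSA = \dfrac{SSA}{I - 1} & \dfrac{MSA}{MSE} \\
B \text{ (unadjusted)} & J - 1 & SSB = \sum_j \dfrac{B_j^2}{n_{oj}} - \dfrac{G^2}{n_{oo}} & MSB = \dfrac{SSB}{J - 1} & \\
\text{Error} & n_{oo} - I - J + 1 & SS_{\text{error}} \text{ (by subtraction)} & MSE = \dfrac{SS_{\text{error}}}{n_{oo} - I - J + 1} & \\
\hline
\text{Total} & n_{oo} - 1 & \sum_i\sum_j\sum_k y_{ijk}^2 - \dfrac{G^2}{n_{oo}} & &
\end{array}
$$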
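The table can be assembled in a few lines. This is a hedged sketch, not the notes' own code. The function `anova_A_adjusted`, its dictionary keys and the example data are hypothetical, a connected design is assumed, and no p-value is computed, to avoid extra dependencies. With balanced data, the adjusted and unadjusted sums of squares for $A$ should coincide, and the usage line checks this.

```python
import numpy as np

def anova_A_adjusted(a, b, y, I, J):
    """Hypothetical helper: ANOVA with A adjusted and B unadjusted (additive model)."""
    a, b = np.asarray(a), np.asarray(b)
    y = np.asarray(y, dtype=float)
    n = np.zeros((I, J))
    np.add.at(n, (a, b), 1.0)
    T = np.zeros((I, J))
    np.add.at(T, (a, b), y)
    A, B, G = T.sum(axis=1), T.sum(axis=0), y.sum()
    n_io, n_oj, n_oo = n.sum(axis=1), n.sum(axis=0), n.sum()
    P = n / n_oj
    C = np.diag(n_io) - P @ n.T
    Q = A - P @ B
    M = np.vstack([C, np.ones(I)])                   # restriction sum(alpha_hat) = 0
    alpha_hat = np.linalg.lstsq(M, np.append(Q, 0.0), rcond=None)[0]

    ss_total = y @ y - G**2 / n_oo
    ss_A_adj = alpha_hat @ Q
    ss_B_unadj = np.sum(B**2 / n_oj) - G**2 / n_oo
    ss_error = ss_total - ss_A_adj - ss_B_unadj      # by subtraction
    df_A, df_B, df_error = I - 1, J - 1, int(n_oo) - I - J + 1
    F_A = (ss_A_adj / df_A) / (ss_error / df_error)
    return {"ss_A_adj": ss_A_adj,
            "ss_A_unadj": np.sum(A**2 / n_io) - G**2 / n_oo,
            "ss_B_unadj": ss_B_unadj, "ss_error": ss_error, "ss_total": ss_total,
            "df": (df_A, df_B, df_error, int(n_oo) - 1), "F_A": F_A}

# Hypothetical balanced data (2 observations per cell).
a = [0] * 6 + [1] * 6
b = [0, 0, 1, 1, 2, 2] * 2
y = [4.0, 4.4, 5.1, 5.3, 6.2, 5.8, 5.0, 5.6, 6.1, 6.5, 7.0, 7.4]
bal = anova_A_adjusted(a, b, y, I=2, J=3)
print(np.isclose(bal["ss_A_adj"], bal["ss_A_unadj"]), bal["df"])  # True (1, 2, 8, 11)
```

The error degrees of freedom use the total number of observations $n_{oo}$, not the number of cells, because cells may hold unequal numbers of observations.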
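The sum of squares due to error can be written as

$$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat\alpha_i Q_i$$
$$= \left(\sum_i\sum_j\sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}}\right) - \left(\sum_j \frac{B_j^2}{n_{oj}} - \frac{G^2}{n_{oo}}\right) - \sum_i \hat\alpha_i Q_i$$
$$= \text{Total S.S.} - \text{S.S. block (unadjusted)} - \text{S.S. treat (adjusted)} \qquad (9)$$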
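and also

$$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat\beta_j R_j$$
$$= \left(\sum_i\sum_j\sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}}\right) - \left(\sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}}\right) - \sum_j \hat\beta_j R_j$$
$$= \text{Total S.S.} - \text{S.S. treat (unadjusted)} - \text{S.S. block (adjusted)} \qquad (10)$$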
For the Fisher-Cochran theorem to be used here, we need

Total S.S. = S.S.(treat) + S.S.(block) + S.S. Error,

which is possible only under (9) or (10). In general,

Total S.S. $\neq$ S.S. treatment (adjusted) + S.S. block (adjusted) + S.S. Error.
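When interaction effect is present
When an interaction effect is present, the model is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \qquad i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, J.$$

The normal equations for the $\gamma_{ij}$'s are

$$n_{ij}\left(\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_{ij}\right) = T_{ij}.$$

The error sum of squares is

$$E_2 = \sum_i\sum_j\sum_k \left(y_{ijk} - \hat\mu - \hat\alpha_i - \hat\beta_j - \hat\gamma_{ij}\right)^2 = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i\sum_j \frac{T_{ij}^2}{n_{ij}},$$

where the last sum runs over the cells containing observations. Under the null hypothesis that all the $\gamma_{ij}$'s are zero, we have already derived the error sum of squares as

$$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat\alpha_i Q_i.$$

Hence, the sum of squares due to interaction is

$$E - E_2 = \sum_i\sum_j \frac{T_{ij}^2}{n_{ij}} - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat\alpha_i Q_i.$$

It has $(I - 1)(J - 1)$ degrees of freedom if there is at least one observation in each cell; otherwise, it is $(I - 1)(J - 1)$ reduced by the number of cells having no observation.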
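The interaction sum of squares $E - E_2$ can be sketched as below. It is an illustration under assumptions: the helper `interaction_ss` and the data are hypothetical, every cell is assumed non-empty (so the degrees of freedom are $(I-1)(J-1)$), and the design is assumed connected.

```python
import numpy as np

def interaction_ss(a, b, y, I, J):
    """Hypothetical helper: E - E_2, the additive-model error SS (5) minus the
    error SS of the model with interaction. Assumes no empty cells."""
    a, b = np.asarray(a), np.asarray(b)
    y = np.asarray(y, dtype=float)
    n = np.zeros((I, J))
    np.add.at(n, (a, b), 1.0)
    T = np.zeros((I, J))
    np.add.at(T, (a, b), y)
    A, B = T.sum(axis=1), T.sum(axis=0)
    n_io, n_oj = n.sum(axis=1), n.sum(axis=0)
    P = n / n_oj
    C = np.diag(n_io) - P @ n.T
    Q = A - P @ B
    M = np.vstack([C, np.ones(I)])                   # restriction sum(alpha_hat) = 0
    alpha_hat = np.linalg.lstsq(M, np.append(Q, 0.0), rcond=None)[0]
    E = y @ y - np.sum(B**2 / n_oj) - alpha_hat @ Q  # additive model, equation (5)
    E_2 = y @ y - np.sum(T**2 / n)                   # model with interaction
    df = (I - 1) * (J - 1)                           # valid only with no empty cells
    return E - E_2, df

# Hypothetical unbalanced layout with every cell filled.
a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [0, 0, 1, 0, 1, 0, 1, 1, 1]
y = [5.1, 4.8, 6.0, 7.2, 8.1, 3.9, 5.5, 5.0, 5.8]
ss_AB, df_AB = interaction_ss(a, b, y, I=3, J=2)
print(ss_AB, df_AB)
```

If some cells were empty, `T**2 / n` would divide by zero. The sum would then have to be restricted to non-empty cells, and the degrees of freedom reduced accordingly.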
Estimate of treatment contrast and its variance
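Let

$$L = \sum_i l_i \alpha_i \qquad \left(\sum_i l_i = 0\right)$$

be a treatment contrast. The estimate of each $\alpha_i$ is a linear function of the $Q_i$'s, hence $L$ is estimated by another linear function of the $Q_i$'s. Let

$$\hat L = \sum_i l_i \hat\alpha_i = \sum_i q_i Q_i.$$

If the $\hat\alpha_i$'s are available as linear functions of the $Q_i$'s, then the $q_i$'s can be obtained easily. In general, the $q_i$'s can be obtained as follows. Since

$$Q_i = C_{ii}\hat\alpha_i + \sum_{m \neq i} C_{im}\hat\alpha_m, \qquad i = 1, 2, \ldots, I,$$

we have

$$\sum_i l_i \hat\alpha_i = \sum_i q_i\left(C_{i1}\hat\alpha_1 + C_{i2}\hat\alpha_2 + \cdots + C_{iI}\hat\alpha_I\right).$$

Equating the coefficients of $\hat\alpha_i$ in this identity, we get

$$l_i = \sum_m q_m C_{mi} \qquad (i = 1, 2, \ldots, I).$$

This is the same as the normal equation

$$C_{ii}\hat\alpha_i + \sum_{m \neq i} C_{im}\hat\alpha_m = Q_i,$$

except that the $Q_i$'s are replaced by the $l_i$'s and the unknown $\hat\alpha_i$'s are written as $q_i$'s. Hence the $q_i$'s can be obtained from the solution of the same normal equations.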
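Now, since $\operatorname{Var}(A_i) = n_{io}\sigma^2$, $\operatorname{Var}(B_j) = n_{oj}\sigma^2$, $\operatorname{Cov}(A_i, B_j) = n_{ij}\sigma^2$, and totals over disjoint sets of observations are uncorrelated,

$$\operatorname{Var}(Q_i) = \operatorname{Var}\left(A_i - \sum_j \frac{n_{ij}}{n_{oj}}B_j\right) = C_{ii}\sigma^2,$$

$$\operatorname{Cov}(Q_i, Q_m) = \operatorname{Cov}\left(A_i - \sum_j \frac{n_{ij}}{n_{oj}}B_j,\ A_m - \sum_j \frac{n_{mj}}{n_{oj}}B_j\right) = C_{im}\sigma^2 \qquad (i \neq m).$$

So

$$\operatorname{Var}\left(\sum_i q_i Q_i\right) = \sigma^2\left(\sum_i q_i^2 C_{ii} + \sum_i \sum_{m \neq i} q_i q_m C_{im}\right) = \sigma^2 \sum_i q_i \sum_m q_m C_{mi} = \sigma^2 \sum_i l_i q_i.$$

In particular,

$$\operatorname{Var}(\hat\alpha_i - \hat\alpha_m) = (q_i - q_m)\sigma^2,$$

where $q_i$ and $q_m$ are the coefficients of $Q_i$ and $Q_m$, respectively, in the expression giving the estimate of $(\alpha_i - \alpha_m)$.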
The sum of squares of a contrast is the square of the contrast divided by the coefficient of $\sigma^2$ in its variance. Hence, the sum of squares of the estimate of $\sum_i l_i \alpha_i$ is

$$\frac{\left(\sum_i q_i Q_i\right)^2}{\sum_i l_i q_i}.$$
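The recipe above (solve the reduced equations with $Q$ replaced by $l$, then form $\sum_i q_i Q_i$ and $\sum_i l_i q_i$) can be sketched as follows. This is a minimal illustration, not the notes' own code. The helpers `reduced_system` and `contrast` and the data are hypothetical, and a connected design is assumed. Since the $q_i$'s are not unique, the restriction $\sum_i q_i = 0$ is imposed only to pick one solution; $\hat L$ and $\sum_i l_i q_i$ do not depend on this choice because $\sum_i Q_i = 0$ and $\sum_i l_i = 0$.

```python
import numpy as np

def reduced_system(a, b, y, I, J):
    """Hypothetical helper: C and Q of the reduced normal equations (4)."""
    a, b = np.asarray(a), np.asarray(b)
    y = np.asarray(y, dtype=float)
    n = np.zeros((I, J))
    np.add.at(n, (a, b), 1.0)
    T = np.zeros((I, J))
    np.add.at(T, (a, b), y)
    P = n / n.sum(axis=0)
    C = np.diag(n.sum(axis=1)) - P @ n.T
    Q = T.sum(axis=1) - P @ T.sum(axis=0)
    return C, Q

def contrast(C, Q, l):
    """Return (L_hat, var_coef, ss) for L = sum_i l_i alpha_i with sum_i l_i = 0."""
    l = np.asarray(l, dtype=float)
    M = np.vstack([C, np.ones(len(l))])
    q = np.linalg.lstsq(M, np.append(l, 0.0), rcond=None)[0]   # solves C q = l
    L_hat = q @ Q              # estimate: sum_i q_i Q_i
    var_coef = q @ l           # Var(L_hat) = sigma^2 * sum_i l_i q_i
    ss = L_hat**2 / var_coef   # sum of squares of the contrast
    return L_hat, var_coef, ss

a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [0, 0, 1, 0, 1, 0, 1, 1, 1]
y = [5.1, 4.8, 6.0, 7.2, 8.1, 3.9, 5.5, 5.0, 5.8]
C, Q = reduced_system(a, b, y, I=3, J=2)
L_hat, var_coef, ss = contrast(C, Q, [1, -1, 0])   # first level of A minus second
print(L_hat, var_coef, ss)
```

For the contrast between two levels, `var_coef` equals $q_i - q_m$, matching $\operatorname{Var}(\hat\alpha_i - \hat\alpha_m) = (q_i - q_m)\sigma^2$.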
When the cell frequencies are proportional:
When the cell frequencies are the same within each column (row), though they may vary from column to column, the cell frequencies are unequal but proportional. Suppose the frequency in each row of the $i$th column is $n_i$. Then

$$C_{ii} = rn_i - \frac{rn_i^2}{N}, \qquad C_{im} = -\frac{rn_in_m}{N} \quad (m \neq i),$$

where $N = \sum_i n_i$ and $r$ is the number of rows.
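The reduced normal equations become

$$\left(rn_i - \frac{rn_i^2}{N}\right)\hat\alpha_i - \sum_{m \neq i}\frac{rn_in_m}{N}\hat\alpha_m = Q_i$$

or

$$\hat\alpha_i - \frac{1}{N}\sum_m n_m\hat\alpha_m = \frac{Q_i}{rn_i}.$$

Imposing the restriction $\sum_m n_m\hat\alpha_m = 0$, the solution is obtained as

$$\hat\alpha_i = \frac{Q_i}{rn_i}.$$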
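In this case, the adjusted sum of squares due to $A$ can be obtained from

$$\sum_i \hat\alpha_i Q_i = \sum_i \frac{A_i^2}{rn_i} - \frac{G^2}{rN},$$

which is the same as the unadjusted sum of squares due to $A$, since $n_{io} = rn_i$ and $n_{oo} = rN$. The unadjusted sum of squares due to $B$ is

$$\sum_j \frac{B_j^2}{N} - \frac{G^2}{rN}.$$

Thus

$$\operatorname{Var}(\hat\alpha_i - \hat\alpha_m) = \frac{\sigma^2}{r}\left(\frac{1}{n_i} + \frac{1}{n_m}\right).$$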
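The closed-form results for this case can be checked numerically. The sketch is illustrative only: the counts $n_i$, the number of rows $r$ and the simulated responses are hypothetical. It verifies that $\hat\alpha_i = Q_i/(rn_i)$ solves the reduced equations under $\sum_m n_m\hat\alpha_m = 0$, that the adjusted sum of squares reduces to the unadjusted form, and that the variance coefficient of $\hat\alpha_1 - \hat\alpha_2$ is $(1/n_1 + 1/n_2)/r$ (levels indexed from 0 in the code).

```python
import numpy as np

# Hypothetical layout: r = 4 rows (levels of B); treatment i has n_i observations in
# every row, so n_io = r * n_i and every row total is n_oj = N = sum(n_i).
n_i = np.array([1, 2, 3])
r, I = 4, len(n_i)
N = n_i.sum()
a = np.array([i for j in range(r) for i in range(I) for _ in range(n_i[i])])
b = np.array([j for j in range(r) for i in range(I) for _ in range(n_i[i])])
y = np.random.default_rng(1).normal(10.0, 2.0, size=len(a))   # simulated responses

n = np.zeros((I, r))
np.add.at(n, (a, b), 1.0)
T = np.zeros((I, r))
np.add.at(T, (a, b), y)
A, B, G = T.sum(axis=1), T.sum(axis=0), y.sum()
P = n / n.sum(axis=0)
C = np.diag(n.sum(axis=1)) - P @ n.T
Q = A - P @ B

alpha_hat = Q / (r * n_i)                 # closed form under sum_m n_m alpha_m = 0
print(np.allclose(C, r * np.diag(n_i) - r * np.outer(n_i, n_i) / N))        # True
print(np.allclose(C @ alpha_hat, Q), np.isclose(n_i @ alpha_hat, 0))        # True True
print(np.isclose(alpha_hat @ Q, np.sum(A**2 / (r * n_i)) - G**2 / (r * N)))  # True

# Coefficient of sigma^2 in Var(alpha_0 - alpha_1), via q solving C q = l.
l = np.array([1.0, -1.0, 0.0])
q = np.linalg.lstsq(np.vstack([C, np.ones(I)]), np.append(l, 0.0), rcond=None)[0]
print(np.isclose(q @ l, (1 / n_i[0] + 1 / n_i[1]) / r))                     # True
```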