Chapter 9 Analysis of Nonorthogonal Data


Academic year: 2022


Orthogonal data:

The concept of orthogonality of data is associated with two or higher way classified data. Consider the set up of two way classified data.

Let A and B be two factors at p and q levels respectively.

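Let $n_{ij}$ be the number of observations in the $(i, j)$th cell, and let $y_{ijk}$ be the $k$th observation in the $(i, j)$th cell, $i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$; $k = 1, 2, \ldots, n_{ij}$.

Define the totals

$$T_{ij} = \sum_k y_{ijk} \quad \text{(cell total)},$$

$$A_i = \sum_j \sum_k y_{ijk} = \sum_j T_{ij} \quad \text{(marginal total corresponding to the $i$th level of $A$)},$$

$$B_j = \sum_i \sum_k y_{ijk} = \sum_i T_{ij} \quad \text{(marginal total corresponding to the $j$th level of $B$)},$$

$$G = \sum_i A_i = \sum_j B_j \quad \text{(grand total)}.$$

Let

$$n_{io} = \sum_j n_{ij}, \qquad n_{oj} = \sum_i n_{ij}, \qquad n_{oo} = \sum_i n_{io} = \sum_j n_{oj},$$

$$\hat{t}_i = \frac{A_i}{n_{io}} \quad \text{(marginal mean of $A_i$)}, \qquad \hat{b}_j = \frac{B_j}{n_{oj}} \quad \text{(marginal mean of $B_j$)}.$$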

If any contrast $\sum_i l_i \hat{t}_i$ (with $\sum_i l_i = 0$) of the marginal means of $A$ is orthogonal to any contrast $\sum_j m_j \hat{b}_j$ (with $\sum_j m_j = 0$) of the other marginal means, then the data are called orthogonal; otherwise they are non-orthogonal.

The definition extends to higher classification if we treat the marginal means of every pair of factors of classification.

If each cell has a constant number of observations, say $n_{ij} = n$ for all $i$ and $j$, then the data are orthogonal, as shown below. Note that in this case

$$n_{io} = \sum_j n_{ij} = qn, \qquad n_{oj} = \sum_i n_{ij} = pn.$$

A contrast of the marginal means of $A$ is

$$L = \sum_i l_i \hat{t}_i = \frac{1}{qn} \sum_i l_i A_i = \frac{1}{qn} \sum_i l_i \,(T_{i1} + T_{i2} + \ldots + T_{iq}).$$

Similarly

$$M = \sum_j m_j \hat{b}_j = \frac{1}{pn} \sum_j m_j B_j = \frac{1}{pn} \sum_j m_j \,(T_{1j} + T_{2j} + \ldots + T_{pj}).$$

The sum of products of the coefficients of identical observations in the two contrasts is

$$\sum_i \sum_j \frac{l_i m_j\, n}{pq\,n^2} = \frac{1}{pqn} \Big(\sum_i l_i\Big)\Big(\sum_j m_j\Big) = 0.$$

So the data are orthogonal as the two contrasts are orthogonal.

When the cell frequencies are proportional, i.e.,

$$\frac{n_{ij}}{n_{ij'}} = \frac{C_j}{C_{j'}} \qquad (j, j' = 1, 2, \ldots, q;\; i = 1, \ldots, p),$$

so that the ratio does not depend on $i$, then also the data are orthogonal.

The same is true for higher order classification if the number of observations in the ultimate cell is constant.
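The orthogonality condition can be checked numerically from the cell frequencies alone. The following is a minimal Python sketch and not part of the original notes: the function name `contrast_inner_product` and the three frequency tables are illustrative assumptions. Each of the $n_{ij}$ observations in cell $(i, j)$ carries the coefficient $l_i / n_{io}$ in $\sum_i l_i \hat{t}_i$ and $m_j / n_{oj}$ in $\sum_j m_j \hat{b}_j$, so the function returns $\sum_i \sum_j n_{ij}\, l_i m_j / (n_{io} n_{oj})$. A zero value for one pair of contrasts is only a necessary check; orthogonality requires it for every pair.

```python
from fractions import Fraction


def contrast_inner_product(n, l, m):
    """Sum, over all observations, of the product of their coefficients in
    L = sum_i l[i] * t_i and M = sum_j m[j] * b_j, given cell counts n[i][j]."""
    assert sum(l) == 0 and sum(m) == 0, "l and m must be contrasts"
    n_io = [sum(row) for row in n]          # row totals of the frequencies
    n_oj = [sum(col) for col in zip(*n)]    # column totals of the frequencies
    total = Fraction(0)
    for i, row in enumerate(n):
        for j, n_ij in enumerate(row):
            # every observation in cell (i, j) has coefficient l_i / n_io in L
            # and m_j / n_oj in M
            total += n_ij * Fraction(l[i], n_io[i]) * Fraction(m[j], n_oj[j])
    return total


equal = [[2, 2, 2], [2, 2, 2]]             # constant cell frequency
proportional = [[1, 2], [2, 4], [3, 6]]    # n_ij / n_ij' does not depend on i
unbalanced = [[2, 1], [1, 3], [3, 2]]      # neither equal nor proportional

print(contrast_inner_product(equal, [1, -1], [1, 0, -1]))
print(contrast_inner_product(proportional, [1, -2, 1], [1, -1]))
print(contrast_inner_product(unbalanced, [1, -1, 0], [1, -1]))
```

In this sketch the first two tables give zero, while the third gives a nonzero value, so those two contrasts are not orthogonal for the unbalanced layout.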

Analysis of non-orthogonal two way data:

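When the data are non-orthogonal, the analysis is no longer simple because a direct solution of the normal equations is not available. Consider the usual two-way model

$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J;\; k = 1, \ldots, n_{ij}.$$

Assume that the $\varepsilon_{ijk}$'s are identically and independently distributed as $N(0, \sigma^2)$. Using the least squares principle, we minimize the sum of squares due to error

$$E = \sum_i \sum_j \sum_k \varepsilon_{ijk}^2 = \sum_i \sum_j \sum_k (y_{ijk} - \mu - \alpha_i - \beta_j)^2.$$

The normal equations are

$$\frac{\partial E}{\partial \mu} = 0 \;\Rightarrow\; n_{oo}\hat{\mu} + \sum_i n_{io}\hat{\alpha}_i + \sum_j n_{oj}\hat{\beta}_j = G \tag{1}$$

$$\frac{\partial E}{\partial \alpha_i} = 0 \;\Rightarrow\; n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij}\hat{\beta}_j = A_i \qquad (i = 1, 2, \ldots, I) \tag{2}$$

$$\frac{\partial E}{\partial \beta_j} = 0 \;\Rightarrow\; n_{oj}\hat{\mu} + \sum_i n_{ij}\hat{\alpha}_i + n_{oj}\hat{\beta}_j = B_j \qquad (j = 1, 2, \ldots, J) \tag{3}$$

From (3), we obtain (we use $m$ instead of $i$ as the summation index to avoid confusion)

$$\hat{\beta}_j = \frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}} \sum_m n_{mj}\hat{\alpha}_m.$$

Substituting this in (2),

$$n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij} \Big[ \frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}} \sum_m n_{mj}\hat{\alpha}_m \Big] = A_i$$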

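or

$$
\begin{aligned}
A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}}
&= n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i - \sum_j n_{ij}\hat{\mu} - \sum_j \frac{n_{ij}}{n_{oj}} \sum_m n_{mj}\hat{\alpha}_m \\
&= n_{io}\hat{\alpha}_i - \sum_j \frac{n_{ij}}{n_{oj}} \sum_m n_{mj}\hat{\alpha}_m \qquad \Big(\text{since } \sum_j n_{ij} = n_{io}\Big) \\
&= \hat{\alpha}_i \Big[ n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}} \Big] - \sum_j \frac{n_{ij}}{n_{oj}} \sum_{m \neq i} n_{mj}\hat{\alpha}_m
\end{aligned}
$$

or

$$A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}} = \hat{\alpha}_i \Big[ n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}} \Big] - \sum_{m \neq i} \hat{\alpha}_m \Big[ \sum_j \frac{n_{ij} n_{mj}}{n_{oj}} \Big].$$

Writing $Q_i$ for the left-hand side, $C_{ii}$ for the coefficient of $\hat{\alpha}_i$ and $C_{im}$ for the coefficient of $\hat{\alpha}_m$ $(m \neq i)$, this becomes

$$Q_i = C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m \qquad (i = 1, 2, \ldots, I). \tag{4}$$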

These are referred to as the reduced normal equations.

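Here

$$Q_i = A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}}$$

is called the adjusted treatment total of the $i$th level of $A$, and

$$C_{ii} = n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}}, \qquad C_{im} = -\sum_j \frac{n_{ij} n_{mj}}{n_{oj}} \;\; (m \neq i), \qquad C_{im} = C_{mi}.$$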
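To make the quantities in the reduced normal equations concrete, here is a minimal Python sketch, not part of the original notes, that builds the adjusted totals $Q_i$ and the coefficients $C_{im}$ from raw cell data. The function name `reduced_normal_equations`, the nested-list layout `cells[i][j]` and the numbers in the example are all illustrative assumptions; the sketch also assumes every level of $B$ has at least one observation, so that $n_{oj} > 0$.

```python
from fractions import Fraction


def reduced_normal_equations(cells):
    """cells[i][j] is the list of observations at level i of A and level j of B.

    Returns (Q, C): the adjusted totals Q_i and the coefficients C_im of (4).
    """
    I, J = len(cells), len(cells[0])
    n = [[len(cells[i][j]) for j in range(J)] for i in range(I)]      # n_ij
    n_io = [sum(row) for row in n]
    n_oj = [sum(col) for col in zip(*n)]
    A = [sum(sum(cell) for cell in cells[i]) for i in range(I)]        # A_i
    B = [sum(sum(cells[i][j]) for i in range(I)) for j in range(J)]    # B_j

    # Q_i = A_i - sum_j n_ij B_j / n_oj
    Q = [A[i] - sum(Fraction(n[i][j], n_oj[j]) * B[j] for j in range(J))
         for i in range(I)]
    # C_ii = n_io - sum_j n_ij^2 / n_oj,  C_im = -sum_j n_ij n_mj / n_oj
    C = [[(n_io[i] if i == m else 0)
          - sum(Fraction(n[i][j] * n[m][j], n_oj[j]) for j in range(J))
          for m in range(I)]
         for i in range(I)]
    return Q, C


# Illustrative unbalanced 3 x 2 layout (made-up observations).
cells = [
    [[5, 7], [6]],
    [[8], [9, 10, 11]],
    [[4, 6, 5], [7, 8]],
]
Q, C = reduced_normal_equations(cells)
print(Q)
print(sum(Q), [sum(row) for row in C])   # the Q_i and every row of C sum to zero
```

Exact rational arithmetic is used so that the zero sums, which reflect the dependence among the $I$ equations, are visible without rounding error.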

These $I$ equations in (4) are not independent. Write

$$
\begin{aligned}
Q_1 &= C_{11}\hat{\alpha}_1 + \sum_{m \neq 1} C_{1m}\hat{\alpha}_m \\
Q_2 &= C_{22}\hat{\alpha}_2 + \sum_{m \neq 2} C_{2m}\hat{\alpha}_m \\
&\;\;\vdots \\
Q_I &= C_{II}\hat{\alpha}_I + \sum_{m \neq I} C_{Im}\hat{\alpha}_m .
\end{aligned}
$$

Sum the left-hand and right-hand sides over all the equations. On the left-hand side,

$$\sum_{i=1}^{I} Q_i = \sum_{i=1}^{I} \Big[ A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}} \Big] = \sum_i A_i - \sum_i \sum_j \frac{n_{ij} B_j}{n_{oj}}.$$

Using the normal equations (2) and (3),

$$
\begin{aligned}
\sum_{i=1}^{I} Q_i
&= \sum_i \Big[ n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij}\hat{\beta}_j \Big] - \sum_i \sum_j \frac{n_{ij}}{n_{oj}} \Big[ n_{oj}\hat{\mu} + \sum_m n_{mj}\hat{\alpha}_m + n_{oj}\hat{\beta}_j \Big] \\
&= \Big[ n_{oo}\hat{\mu} + \sum_i n_{io}\hat{\alpha}_i + \sum_i \sum_j n_{ij}\hat{\beta}_j \Big] - \Big[ n_{oo}\hat{\mu} + \sum_m n_{mo}\hat{\alpha}_m + \sum_i \sum_j n_{ij}\hat{\beta}_j \Big] \\
&= 0.
\end{aligned}
$$

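Now consider the right-hand side of (4) and sum over $i$:

$$\sum_i \Big[ C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m \Big] = \sum_m \hat{\alpha}_m \sum_i C_{im} = 0,$$

because the coefficients sum to zero over $i$: $\sum_i C_{im} = n_{mo} - \sum_j n_{mj} \big(\sum_i n_{ij}\big) / n_{oj} = n_{mo} - \sum_j n_{mj} = 0$.

$\Rightarrow$ The $I$ equations in (4) are not independent, so no unique solution exists.

$\Rightarrow$ If $\hat{\alpha}_i$ $(i = 1, \ldots, I)$ is a set of solutions, then $\hat{\alpha}_i + \delta$ $(i = 1, 2, \ldots, I)$ is also a set of solutions, where $\delta$ is any constant.

To get a unique solution, impose the condition $\sum_i \hat{\alpha}_i = 0$.

$\Rightarrow$ The $\hat{\alpha}_i$'s are estimated as deviates from their mean.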

As a matter of fact, the restriction need not always be $\sum_i \hat{\alpha}_i = 0$; it can be any linear function of the $\hat{\alpha}_i$'s other than a contrast. Such a restriction changes $\hat{\mu}$ only.
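The role of the side condition can be illustrated with a short Python sketch, which is not part of the original notes. The helper names `solve` and `solve_reduced` and the numerical values of $C$ and $Q$ are illustrative assumptions (the values are chosen so that every row of $C$ and the $Q_i$'s sum to zero, as the reduced normal equations require). Since the $I$ equations are dependent, the sketch drops the last one and replaces it by the restriction $\sum_i w_i \hat{\alpha}_i = 0$.

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    k = len(M)
    aug = [[F(v) for v in M[r]] + [F(rhs[r])] for r in range(k)]
    for c in range(k):
        p = next(r for r in range(c, k) if aug[r][c] != 0)   # row with nonzero pivot
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[k] for row in aug]


def solve_reduced(C, Q, w):
    """Solve C a = Q subject to the side condition sum_i w[i] * a[i] = 0.

    The last (dependent) equation is replaced by the side condition.
    """
    return solve(C[:-1] + [list(w)], Q[:-1] + [0])


# Illustrative coefficients and adjusted totals (made-up values).
C = [[F(13, 6), F(-5, 6), F(-4, 3)],
     [F(-5, 6), F(7, 3), F(-3, 2)],
     [F(-4, 3), F(-3, 2), F(17, 6)]]
Q = [F(-13, 6), F(20, 3), F(-9, 2)]

alpha = solve_reduced(C, Q, [1, 1, 1])   # restriction: sum of alpha_i = 0
alt = solve_reduced(C, Q, [3, 4, 5])     # another restriction that is not a contrast

print(alpha)
print(alt[0] - alt[1] == alpha[0] - alpha[1])   # compares a contrast under both restrictions
```

The two solutions differ only by a common constant, so any contrast of the $\hat{\alpha}_i$'s, and the quantity $\sum_i \hat{\alpha}_i Q_i$, is the same under either restriction.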

After obtaining a set of solutions for the $\hat{\alpha}_i$'s from (4), we can obtain the solutions for the $\hat{\beta}_j$'s from (3), if so required.

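Further, the error sum of squares is

$$
\begin{aligned}
E &= \sum_i \sum_j \sum_k (y_{ijk} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j)^2 \\
&= \sum_i \sum_j \sum_k y_{ijk}^2 - \hat{\mu} G - \sum_i \hat{\alpha}_i A_i - \sum_j \hat{\beta}_j B_j \\
&= \sum_i \sum_j \sum_k y_{ijk}^2 - \hat{\mu} G - \sum_i \hat{\alpha}_i A_i - \sum_j B_j \Big[ \frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}} \sum_m n_{mj}\hat{\alpha}_m \Big] \\
&= \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i .
\end{aligned}
\tag{5}
$$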

Here we eliminated the $\hat{\beta}_j$'s and obtained the error sum of squares in terms of the $\hat{\alpha}_i$'s. Now consider the other way round, i.e., eliminate the $\hat{\alpha}_i$'s, obtain the $\hat{\beta}_j$'s, and then obtain the sum of squares due to error. Doing so, the error sum of squares is

$$E = \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat{\beta}_j R_j , \tag{6}$$

where $R_j$ is the adjusted total of the $j$th level of $B$, given by

$$R_j = B_j - \sum_i \frac{n_{ij} A_i}{n_{io}}.$$

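Both error sums of squares (5) and (6) are the same, so

$$\sum_i \frac{A_i^2}{n_{io}} + \sum_j \hat{\beta}_j R_j = \sum_j \frac{B_j^2}{n_{oj}} + \sum_i \hat{\alpha}_i Q_i$$

or

$$\sum_i \frac{A_i^2}{n_{io}} - \sum_j \frac{B_j^2}{n_{oj}} = \sum_i \hat{\alpha}_i Q_i - \sum_j \hat{\beta}_j R_j . \tag{7}$$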

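Now under

$$H_0 : \alpha_1 = \alpha_2 = \ldots = \alpha_I = 0,$$

the model becomes

$$y_{ijk} = \mu^* + \beta_j^* + \varepsilon_{ijk}.$$

Now minimize the sum of squares due to error,

$$E_1 = \sum_i \sum_j \sum_k (y_{ijk} - \mu^* - \beta_j^*)^2 .$$

The normal equations are

$$\frac{\partial E_1}{\partial \mu^*} = 0 \;\Rightarrow\; n_{oo}\hat{\mu}^* + \sum_j n_{oj}\hat{\beta}_j^* = G,$$

$$\frac{\partial E_1}{\partial \beta_j^*} = 0 \;\Rightarrow\; n_{oj}\hat{\mu}^* + n_{oj}\hat{\beta}_j^* = B_j,$$

or

$$\hat{\mu}^* + \hat{\beta}_j^* = \frac{B_j}{n_{oj}}.$$

The error sum of squares is

$$E_1 = \sum_i \sum_j \sum_k (y_{ijk} - \hat{\mu}^* - \hat{\beta}_j^*)^2 = \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} . \tag{8}$$

The sum of squares due to $A$ is $E_1 - E$, obtained as equation (8) minus equation (5), which equals $\sum_i \hat{\alpha}_i Q_i$. It is called the adjusted sum of squares due to $A$, whereas

$$\text{Unadjusted sum of squares due to } A = \sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}}.$$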

Once adjusted sum of squares due to A is obtained, the adjusted sum of squares due to B can be obtained from (7). The analysis of variance table is given as follows.

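| Source | Degrees of freedom | Sum of squares | Mean square | F |
|---|---|---|---|---|
| $A$ (adjusted) | $I - 1$ | $SSA(\text{adjusted}) = \sum_i \hat{\alpha}_i Q_i$ | $MSA = \dfrac{SSA(\text{adjusted})}{I - 1}$ | $\dfrac{MSA}{MSE}$ |
| $B$ (unadjusted) | $J - 1$ | $SSB(\text{unadjusted}) = \sum_j \dfrac{B_j^2}{n_{oj}} - \dfrac{G^2}{n_{oo}}$ | $MSB = \dfrac{SSB(\text{unadjusted})}{J - 1}$ | |
| Error | $n_{oo} - I - J + 1$ | $SS_{error}$ (by subtraction) | $MSE = \dfrac{SS_{error}}{n_{oo} - I - J + 1}$ | |
| Total | $n_{oo} - 1$ | $\sum_i \sum_j \sum_k y_{ijk}^2 - \dfrac{G^2}{n_{oo}}$ | | |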

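Treating the levels of $A$ as treatments and the levels of $B$ as blocks, the sum of squares due to error can be written as

$$
\begin{aligned}
E &= \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i \\
&= \Big[ \sum_i \sum_j \sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}} \Big] - \Big[ \sum_j \frac{B_j^2}{n_{oj}} - \frac{G^2}{n_{oo}} \Big] - \sum_i \hat{\alpha}_i Q_i \\
&= \text{Total S.S.} - \text{S.S. block (unadjusted)} - \text{S.S. treat (adjusted)}
\end{aligned}
\tag{9}
$$

and also

$$
\begin{aligned}
E &= \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat{\beta}_j R_j \\
&= \Big[ \sum_i \sum_j \sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}} \Big] - \Big[ \sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}} \Big] - \sum_j \hat{\beta}_j R_j \\
&= \text{Total S.S.} - \text{S.S. treat (unadjusted)} - \text{S.S. block (adjusted)} .
\end{aligned}
\tag{10}
$$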

For the Fisher-Cochran theorem to be used here, we need to have

$$\text{Total SS} = \text{SS(treat)} + \text{SS(block)} + \text{SS Error},$$

which is possible only under (9) or (10). In general,

$$\text{Total SS} \neq \text{SS treatment (adjusted)} + \text{SS block (adjusted)} + \text{SS Error}.$$
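The two valid partitions of the total sum of squares, and the failure of the "adjusted plus adjusted" partition, can be checked numerically. The following Python sketch is not part of the original notes: the helper names (`solve`, `adjusted_ss`, `unadjusted_ss`), the data layout `cells[i][j]` and the observations are illustrative assumptions. The adjusted sum of squares for $B$ is obtained by applying the same routine to the transposed layout, and the error degrees of freedom are taken as the number of observations minus $I + J - 1$, which is an assumption of this sketch.

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    k = len(M)
    aug = [[F(v) for v in M[r]] + [F(rhs[r])] for r in range(k)]
    for c in range(k):
        p = next(r for r in range(c, k) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[k] for row in aug]


def adjusted_ss(cells):
    """Adjusted SS of the row factor of cells[i][j]: sum_i alpha_i * Q_i."""
    I, J = len(cells), len(cells[0])
    n = [[len(cell) for cell in row] for row in cells]       # n_ij
    n_oj = [sum(col) for col in zip(*n)]
    A = [sum(map(sum, row)) for row in cells]                # row totals
    B = [sum(map(sum, col)) for col in zip(*cells)]          # column totals
    Q = [A[i] - sum(F(n[i][j], n_oj[j]) * B[j] for j in range(J))
         for i in range(I)]
    C = [[(sum(n[i]) if i == m else 0)
          - sum(F(n[i][j] * n[m][j], n_oj[j]) for j in range(J))
          for m in range(I)] for i in range(I)]
    # Drop the last (dependent) equation and impose sum_i alpha_i = 0.
    alpha = solve(C[:-1] + [[1] * I], Q[:-1] + [0])
    return sum(a * q for a, q in zip(alpha, Q))


def unadjusted_ss(cells):
    """Unadjusted SS of the row factor: sum_i A_i^2 / n_io - G^2 / n_oo."""
    tot = [sum(map(sum, row)) for row in cells]
    cnt = [sum(map(len, row)) for row in cells]
    return sum(F(t) ** 2 / c for t, c in zip(tot, cnt)) - F(sum(tot)) ** 2 / sum(cnt)


# Illustrative unbalanced 3 x 2 layout (made-up observations).
cells = [
    [[5, 7], [6]],
    [[8], [9, 10, 11]],
    [[4, 6, 5], [7, 8]],
]
by_b = [list(col) for col in zip(*cells)]    # same data with the roles of A and B swapped

ys = [y for row in cells for cell in row for y in cell]
total = sum(F(y) ** 2 for y in ys) - F(sum(ys)) ** 2 / len(ys)
ssa_adj, ssa_unadj = adjusted_ss(cells), unadjusted_ss(cells)
ssb_adj, ssb_unadj = adjusted_ss(by_b), unadjusted_ss(by_b)
sse = total - ssb_unadj - ssa_adj            # equation (9)

I, J = len(cells), len(cells[0])
f_a = (ssa_adj / (I - 1)) / (sse / (len(ys) - I - J + 1))

print(sse == total - ssa_unadj - ssb_adj)    # does (10) give the same error SS?
print(ssa_adj + ssb_adj + sse == total)      # does adjusted + adjusted + error add up?
print(float(ssa_adj), float(ssb_unadj), float(sse), float(f_a))
```

For this layout the first comparison holds and the second does not, which is the point of the remark above: only one factor at a time can enter the partition in adjusted form.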

When interaction effect is present

When the interaction effect is present, the model is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \qquad i = 1, 2, \ldots, I;\; j = 1, 2, \ldots, J.$$

The normal equations for the $\gamma_{ij}$'s are

$$T_{ij} = n_{ij} (\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij}).$$

The error sum of squares, say $E_2$, is

$$E_2 = \sum_i \sum_j \sum_k (y_{ijk} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j - \hat{\gamma}_{ij})^2 = \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_i \sum_j \frac{T_{ij}^2}{n_{ij}} .$$

Under the null hypothesis that all the $\gamma_{ij}$'s are zero, we have already derived the error sum of squares as

$$E = \sum_i \sum_j \sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i .$$

Hence, the sum of squares due to interaction is

$$E - E_2 = \sum_i \sum_j \frac{T_{ij}^2}{n_{ij}} - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i .$$

It has $(I - 1)(J - 1)$ degrees of freedom if there is at least one observation in each cell; otherwise it is reduced by the number of cells having no observation.

Estimate of treatment contrast and its variance

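Let $L = \sum_i \ell_i \alpha_i$ be a treatment contrast, where $\sum_i \ell_i = 0$. The estimate of $\alpha_i$ is a linear function of the $Q_i$'s. Hence $L$ is estimated by another linear function of the $Q_i$'s. Let

$$\hat{L} = \sum_i \ell_i \hat{\alpha}_i = \sum_i q_i Q_i .$$

If the $\hat{\alpha}_i$'s are available as linear functions of the $Q_i$'s, then the $q_i$'s can be obtained easily. However, the $q_i$'s can also be obtained as follows. Since

$$Q_i = C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m , \qquad i = 1, 2, \ldots, I,$$

we have

$$\sum_i \ell_i \hat{\alpha}_i = \sum_i q_i \,(C_{i1}\hat{\alpha}_1 + C_{i2}\hat{\alpha}_2 + \ldots + C_{iI}\hat{\alpha}_I).$$

Equating the coefficients of $\hat{\alpha}_i$ in this identity, we get

$$\ell_i = \sum_m q_m C_{mi} \qquad (i = 1, 2, \ldots, I).$$

This is the same as the normal equation

$$C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m = Q_i ,$$

except that the $Q_i$'s are replaced by the $\ell_i$'s and the unknown $\hat{\alpha}_i$'s are written as $q_i$'s (recall that $C_{mi} = C_{im}$). Hence the $q_i$'s can be obtained from the solution of the same normal equations.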

Now

$$\operatorname{Var}(Q_i) = \operatorname{Var}\Big( A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}} \Big) = C_{ii}\,\sigma^2 ,$$

$$\operatorname{Cov}(Q_i, Q_m) = \operatorname{Cov}\Big( A_i - \sum_j \frac{n_{ij} B_j}{n_{oj}},\; A_m - \sum_j \frac{n_{mj} B_j}{n_{oj}} \Big) = C_{im}\,\sigma^2 .$$

So

$$\operatorname{Var}\Big( \sum_i q_i Q_i \Big) = \Big[ \sum_i q_i^2 C_{ii} + \sum_i \sum_{m \neq i} q_i q_m C_{im} \Big] \sigma^2 = \sum_i q_i \Big( \sum_m q_m C_{im} \Big) \sigma^2 = \Big( \sum_i q_i \ell_i \Big) \sigma^2 .$$

In particular,

$$\operatorname{Var}(\hat{\alpha}_i - \hat{\alpha}_m) = (q_i - q_m)\,\sigma^2 ,$$

where $q_i$ and $q_m$ are the coefficients of $Q_i$ and $Q_m$, respectively, in the expression giving the estimate $\hat{\alpha}_i - \hat{\alpha}_m$.

The sum of squares of a contrast is the square of the contrast divided by the coefficient of $\sigma^2$ in its variance. Hence, the sum of squares of the estimate of $\sum_i \ell_i \alpha_i$ is

$$\frac{\big( \sum_i q_i Q_i \big)^2}{\sum_i q_i \ell_i} .$$
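As a small worked illustration of these formulas (a sketch added here, not part of the original notes), take the simplest case $I = 2$, which is an assumption chosen to keep the algebra short. Write $c = C_{11}$. Because the coefficients $C_{im}$ sum to zero over $m$ and are symmetric, $C_{12} = C_{21} = -c$ and $C_{22} = c$, and because the $Q_i$'s sum to zero, $Q_2 = -Q_1$.

```latex
\begin{aligned}
&\text{Reduced normal equation: } Q_1 = c\,\hat{\alpha}_1 - c\,\hat{\alpha}_2,
  \quad \hat{\alpha}_1 + \hat{\alpha}_2 = 0
  \;\Rightarrow\; \hat{\alpha}_1 = \frac{Q_1}{2c},\quad \hat{\alpha}_2 = -\frac{Q_1}{2c}. \\[4pt]
&\text{Contrast with } (\ell_1, \ell_2) = (1, -1): \quad
  \hat{L} = \hat{\alpha}_1 - \hat{\alpha}_2 = \frac{Q_1}{c} = q_1 Q_1 + q_2 Q_2
  \quad\text{with}\quad q_1 = \frac{1}{2c},\; q_2 = -\frac{1}{2c}. \\[4pt]
&\text{Check of } \ell_i = \sum_m q_m C_{mi}: \quad
  q_1 c + q_2 (-c) = 1, \qquad q_1 (-c) + q_2 c = -1. \\[4pt]
&\operatorname{Var}(\hat{L}) = (q_1 - q_2)\,\sigma^2 = \frac{\sigma^2}{c}, \qquad
  \text{S.S. of the contrast} = \frac{(Q_1 / c)^2}{1 / c} = \frac{Q_1^2}{c}
  = \hat{\alpha}_1 Q_1 + \hat{\alpha}_2 Q_2 .
\end{aligned}
```

With two levels there is only one independent contrast, so its sum of squares coincides with the adjusted sum of squares due to $A$, which has one degree of freedom.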

 

 

When cell frequencies are proportional:

When the cell frequencies are the same within a column (or row), though they may vary from column to column, unbalanced data with proportional cell frequencies are obtained. Suppose the frequency in each row of the $i$th column is $n_i$, so that the columns correspond to the levels of $A$ and $n_{ij} = n_i$ for every row $j$. Then

$$C_{ii} = r n_i - \frac{r n_i^2}{N}, \qquad C_{im} = -\frac{r n_i n_m}{N} \;\; (m \neq i),$$

where $N = \sum_i n_i$ and $r$ is the number of rows.

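The reduced normal equations become

$$\Big( r n_i - \frac{r n_i^2}{N} \Big) \hat{\alpha}_i - \frac{r n_i}{N} \sum_{m \neq i} n_m \hat{\alpha}_m = Q_i$$

or

$$n_i \hat{\alpha}_i - \frac{n_i}{N} \sum_m n_m \hat{\alpha}_m = \frac{Q_i}{r} .$$

Imposing the restriction $\sum_m n_m \hat{\alpha}_m = 0$, the solution is obtained as

$$\hat{\alpha}_i = \frac{Q_i}{r n_i} .$$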

In this case, the adjusted sum of squares due to $A$ can be obtained from

$$\sum_i \hat{\alpha}_i Q_i = \sum_i \frac{A_i^2}{r n_i} - \frac{G^2}{rN},$$

and the unadjusted sum of squares due to $B$ is

$$\sum_j \frac{B_j^2}{N} - \frac{G^2}{rN} .$$

Thus

$$\operatorname{Var}(\hat{\alpha}_i - \hat{\alpha}_m) = \frac{\sigma^2}{r} \Big( \frac{1}{n_i} + \frac{1}{n_m} \Big) .$$
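The closed-form solution for this case can be checked with a short Python sketch, which is not part of the original notes. The function name `column_constant_analysis`, the layout `cells[i][j]` (level $i$ of $A$, row $j$) and the observations are illustrative assumptions. With $n_{ij} = n_i$, the adjusted total reduces to $Q_i = A_i - n_i G / N$, and the sketch compares $\sum_i \hat{\alpha}_i Q_i$ with $\sum_i A_i^2 / (r n_i) - G^2 / (rN)$.

```python
from fractions import Fraction as F


def column_constant_analysis(cells):
    """cells[i][j]: observations at level i of A (column i) in row j.

    Assumes every cell of level i holds the same number n_i of observations.
    Returns (alpha, ss_adjusted, ss_formula).
    """
    r = len(cells[0])                                   # number of rows
    n = [len(col[0]) for col in cells]                  # n_i
    assert all(len(cell) == n_i for col, n_i in zip(cells, n) for cell in col)
    N = sum(n)
    A = [sum(map(sum, col)) for col in cells]           # A_i
    G = sum(A)
    Q = [A_i - F(n_i, N) * G for A_i, n_i in zip(A, n)]         # Q_i = A_i - n_i G / N
    alpha = [q / (r * n_i) for q, n_i in zip(Q, n)]             # alpha_i = Q_i / (r n_i)
    ss_adjusted = sum(a * q for a, q in zip(alpha, Q))          # sum_i alpha_i Q_i
    ss_formula = (sum(F(A_i) ** 2 / (r * n_i) for A_i, n_i in zip(A, n))
                  - F(G) ** 2 / (r * N))
    return alpha, ss_adjusted, ss_formula


# Illustrative data: r = 2 rows, three levels of A with n_i = 1, 2, 3.
cells = [
    [[4], [6]],
    [[5, 7], [8, 6]],
    [[3, 5, 4], [6, 7, 5]],
]
alpha, ss_adjusted, ss_formula = column_constant_analysis(cells)
print(alpha, ss_adjusted, ss_formula)
```

In this example the two expressions for the sum of squares agree, and the estimates satisfy the restriction $\sum_i n_i \hat{\alpha}_i = 0$.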

    

 
