Anova Lecture

Academic year: 2021

(1)

Engineering Experimentation II

Lecture 7

(2)

Summary of Lecture 6

• Regression Model
• Linear model coefficients
• Model evaluation
• Exploit contour and surface plots
• Error bars for $2^2$ example
• Single factor multiple level
• Make sense of your data
• Model Linearization
• Curve fitting
• $R^2$ definition

(4)

• The hypothesis testing framework
• The two-sample t-test
• Checking assumptions, validity
• Comparing more than two factor levels: the analysis of variance
• ANOVA decomposition of total variability
• Statistical testing & analysis
• Checking assumptions, model validity
• Post-ANOVA testing of means

(5)

(6)
(7)
(8)

The Hypothesis Testing Framework

• Statistical hypothesis testing is a useful framework for many experimental situations
• Origins of the methodology date from the early 1900s
• We will use a procedure known as the two-sample t-test

(9)

The Hypothesis Testing Framework

• Sampling from a normal distribution
• Statistical hypotheses:

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \neq \mu_2$$

(10)

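Estimation of Parameters

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{estimates the population mean } \mu$$

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{estimates the variance } \sigma^2$$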

(11)

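Summary Statistics (pg. 36)

                    Modified Mortar         Unmodified Mortar ("Original recipe")
Sample mean         $\bar{y}_1 = 16.76$     $\bar{y}_2 = 17.04$
Sample variance     $S_1^2 = 0.100$         $S_2^2 = 0.061$
Sample std. dev.    $S_1 = 0.316$           $S_2 = 0.248$
Sample size         $n_1 = 10$              $n_2 = 10$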

(12)

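How the Two-Sample t-Test Works:

Use the sample means to draw inferences about the population means:

$$\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$$

Standard deviation of the difference in sample means: each sample mean has variance

$$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n}$$

This suggests a statistic:

$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$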

(13)

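How the Two-Sample t-Test Works:

Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$. The previous ratio becomes

$$\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Pool the individual sample variances:

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$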

(14)

How the Two-Sample t-Test Works:

The test statistic is

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

• Values of $t_0$ that are near zero are consistent with the null hypothesis
• Values of $t_0$ that are very different from zero are consistent with the alternative hypothesis
• $t_0$ is a "distance" measure: how far apart the averages are, expressed in standard deviation units

(15)

The Two-Sample (Pooled) t-Test

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$$

$$S_p = 0.284$$

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284 \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.20$$

The two sample means are a little over two standard deviations apart. Is this a "large" difference?
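The pooled calculation can be reproduced in a few lines. The following is a minimal Python sketch that uses only the summary statistics quoted on the slides; the function name `pooled_t_statistic` is an illustrative choice and not part of the course material.

```python
import math

def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Return (t0, pooled_sd, df) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: degrees-of-freedom-weighted average of the sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    pooled_sd = math.sqrt(pooled_var)
    # Difference in means divided by its estimated standard error.
    t0 = (mean1 - mean2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t0, pooled_sd, df

# Mortar summary statistics from the slides (modified vs. unmodified).
t0, sp, df = pooled_t_statistic(16.76, 0.100, 10, 17.04, 0.061, 10)
print(f"Sp = {sp:.3f}, t0 = {t0:.2f}, df = {df}")
```

Carried at full precision, the statistic comes out near -2.21; the -2.20 on the slide results from rounding $S_p$ to 0.284 before dividing.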

(16)

The Two-Sample (Pooled) t-Test

• So far, we haven't really done any "statistics"
• We need an objective basis for deciding how large the test statistic $t_0 = -2.20$ really is
• In 1908, W. S. Gosset derived the reference distribution for $t_0$, called the t distribution
• Tables of the t distribution

(17)

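The Two-Sample (Pooled) t-Test

• A value of $t_0$ between -2.101 and 2.101 is consistent with equality of means (2.101 is the two-sided 5% critical value of the t distribution with $n_1 + n_2 - 2 = 18$ degrees of freedom)
• A value of $t_0$ outside that range, above 2.101 or below -2.101, indicates a significant difference in means
• Could also use the P-value approach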

(18)

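The Two-Sample (Pooled) t-Test

$t_0 = -2.20$

• The P-value is the risk of wrongly rejecting the null hypothesis of equal means (it measures rareness of the event): the probability, when the means really are equal, of a test statistic at least as extreme as the one observed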
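To make the P-value concrete, the sketch below integrates the t density numerically with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `two_sided_p_value` and the step count are illustrative assumptions; statistical software such as NCSS or Minitab reports this value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t0, df, steps=2000):
    """P(|T| >= |t0|) via composite Simpson's rule; steps must be even."""
    upper = abs(t0)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # The density is symmetric, so the area on [-|t0|, |t0|] is twice the area on [0, |t0|].
    central_area = 2 * (h / 3) * total
    return 1.0 - central_area

p_value = two_sided_p_value(-2.20, 18)
print(f"two-sided P-value for t0 = -2.20, 18 df: {p_value:.4f}")
print(f"tail area beyond +/-2.101: {two_sided_p_value(2.101, 18):.4f}")
```

The P-value for $t_0 = -2.20$ falls a little below 0.05, which agrees with $t_0$ lying just outside the ±2.101 limits. Where SciPy is available, `scipy.stats.t.sf` would be the more usual way to obtain the tail area.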

(19)
(20)

Importance of the t-Test

• Provides an objective framework for simple comparative experiments
• Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one "side" of the cube versus the mean response at the opposite "side" of the cube

(21)

What If There Are More Than Two Factor Levels?

• The t-test does not directly apply
• There are lots of practical situations where there are either more than two levels of interest, or there are several factors of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate analysis "engine" for these types of experiments (Chapter 3, textbook)
• The ANOVA was developed by Fisher in the early 1920s, and initially applied to agricultural experiments

(22)

An Example (See pg. 60)

• An engineer is interested in investigating the relationship between the RF power setting and the etch rate. The objective of an experiment like this is to model the relationship between etch rate and RF power, and to specify the power setting that will give a desired target etch rate.
• The response variable is etch rate.
• She is interested in a particular gas (C2F6) and gap (0.80 cm), and wants to test four levels of RF power: 160 W, 180 W, 200 W, and 220 W. She decided to test five wafers at each level of RF power.
• The experiment is replicated five times, with runs made in random order.

(23)

An Example (See pg. 62)

• Does changing the power change the mean etch rate?
• Is there an optimum level for power?

(24)

The Analysis of Variance (Sec. 3-2, pg. 63)

• In general, there will be $a$ levels of the factor, or $a$ treatments, and $n$ replicates of the experiment, run in random order: a completely randomized design (CRD)
• $N = an$ total runs
• … will be discussed later

(25)

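The Analysis of Variance

• The name "analysis of variance" stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment
• The basic single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

where $\mu$ = an overall mean, $\tau_i$ = $i$th treatment effect, and $\varepsilon_{ij}$ = experimental error, assumed $NID(0, \sigma^2)$.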

(26)

Models for the Data

There are several ways to write a model for the data:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

is called the effects model. Letting $\mu_i = \mu + \tau_i$,

$$y_{ij} = \mu_i + \varepsilon_{ij}$$

is called the means model.

(27)

The Analysis of Variance

• Total variability is measured by the total sum of squares:

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$

• The basic ANOVA partitioning is:

$$\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2 = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

$$SS_T = SS_{Treatments} + SS_E$$
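The partition of the total sum of squares can be checked numerically. The sketch below uses a small made-up data set (three treatments, four replicates each); the numbers are hypothetical and are not the etch-rate data from the text, and `anova_partition` is an illustrative name.

```python
# Hypothetical data: a = 3 treatments, n = 4 replicates each (not from the lecture).
data = [
    [12.0, 14.0, 11.0, 13.0],
    [15.0, 17.0, 16.0, 18.0],
    [20.0, 19.0, 22.0, 21.0],
]

def anova_partition(groups):
    """Return (SS_T, SS_Treatments, SS_E) for a balanced single-factor layout."""
    n = len(groups[0])
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / n for g in groups]
    # Total: every observation against the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between treatments: treatment means against the grand mean, weighted by n.
    ss_treat = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Within treatments (error): observations against their own treatment mean.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return ss_total, ss_treat, ss_error

ss_t, ss_treat, ss_e = anova_partition(data)
print(f"SS_T = {ss_t}, SS_Treatments = {ss_treat}, SS_E = {ss_e}")
```

For these values the between-treatment and within-treatment sums of squares add up to the total, as the identity requires.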

(28)

The Analysis of Variance

$$SS_T = SS_{Treatments} + SS_E$$

• A large value of $SS_{Treatments}$ reflects large differences in treatment means
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means
• Formal statistical hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$$
$$H_1: \text{At least one mean is different}$$

(29)

The Analysis of Variance

• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.
• A mean square is a sum of squares divided by its degrees of freedom:

$$df_{Total} = df_{Treatments} + df_{Error}$$
$$an - 1 = (a - 1) + a(n - 1)$$
$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \quad MS_E = \frac{SS_E}{a(n - 1)}$$

• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square will be larger than the error mean square.

(30)

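Analysis of Variance: Summarized

• Computing…see text, pp 66-70
• The test statistic is $F_0 = MS_{Treatments} / MS_E$
• The reference distribution for $F_0$ is the $F_{a-1,\,a(n-1)}$ distribution
• Reject the null hypothesis (equal treatment means) if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$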
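The step from sums of squares to the F statistic is short enough to sketch. The inputs below are hypothetical sums of squares for $a = 3$ treatments and $n = 4$ replicates, chosen only for illustration; `f_statistic` is an assumed helper name, and the critical value is taken from a standard F table rather than computed.

```python
def f_statistic(ss_treatments, ss_error, a, n):
    """Return (F0, df_treatments, df_error) for a balanced single-factor ANOVA."""
    df_treat = a - 1
    df_error = a * (n - 1)
    ms_treat = ss_treatments / df_treat   # treatment mean square
    ms_error = ss_error / df_error        # error mean square
    return ms_treat / ms_error, df_treat, df_error

# Hypothetical sums of squares (not from the lecture).
f0, df1, df2 = f_statistic(128.0, 15.0, a=3, n=4)

# Tabled upper 5% point of the F distribution with (2, 9) degrees of freedom.
F_CRIT_05_2_9 = 4.26
print(f"F0 = {f0:.2f} on ({df1}, {df2}) df; reject H0: {f0 > F_CRIT_05_2_9}")
```

Passing the degrees of freedom back alongside $F_0$ keeps the statistic paired with the reference distribution it must be compared against.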

(31)
(32)
(33)

• ANOVA calculations are usually done via computer
• Calculations can be done on Minitab, NCSS, Excel, Matlab, Scilab, etc.

(34)

Model Adequacy Checking in the ANOVA

Text reference, Section 3-4, pg. 75

• Checking assumptions is important
• Normality
• Constant variance
• Independence
• Have we fit the right model?
• Later we will talk about what to do if some of these assumptions are violated

(35)

Model Adequacy Checking in the ANOVA

• Examination of residuals (see text, Sec. 3-4, pg. 75)

$$e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$$

• NCSS generates the residuals
• Residual plots are very useful
• Normal probability plot of residuals
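In the single-factor model the fitted value for every observation is its treatment mean, so the residuals are simple to compute by hand or in code. The sketch below uses hypothetical observations (not the data from the text), and `treatment_residuals` is an illustrative name.

```python
# Hypothetical observations for a = 3 treatments with n = 4 replicates each.
observations = {
    "level 1": [12.0, 14.0, 11.0, 13.0],
    "level 2": [15.0, 17.0, 16.0, 18.0],
    "level 3": [20.0, 19.0, 22.0, 21.0],
}

def treatment_residuals(groups):
    """Map each treatment to its residuals e_ij = y_ij - (treatment mean)."""
    result = {}
    for name, values in groups.items():
        mean = sum(values) / len(values)
        result[name] = [y - mean for y in values]
    return result

resid = treatment_residuals(observations)
for name, e in resid.items():
    # Residuals within a treatment always sum to zero; their spread is what
    # a residual plot compares across treatments (constant-variance check).
    print(name, e, "sum =", sum(e), "range =", max(e) - min(e))
```

The pooled list of residuals is what would be sorted and plotted against normal quantiles for the normal probability plot.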

(36)
(37)

Post-ANOVA Comparison of Means

• The analysis of variance tests the hypothesis of equal treatment means
• Assume that residual analysis is satisfactory
• If that hypothesis is rejected, we don't know which specific means are different
• Determining which specific means differ following an ANOVA is called the multiple comparisons problem
• There are lots of ways to do this…see text, Section 3-5, pg. 87
• We will use pairwise t-tests on means…sometimes called Fisher's least significant difference (LSD) method

(38)

Two-Factor Multiple-Levels Experiment

(39)

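Extension of the ANOVA to Factorials

$$\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{...})^2 = bn \sum_{i=1}^{a} (\bar{y}_{i..} - \bar{y}_{...})^2 + an \sum_{j=1}^{b} (\bar{y}_{.j.} - \bar{y}_{...})^2 + n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij.})^2$$

$$SS_T = SS_A + SS_B + SS_{AB} + SS_E$$

• df breakdown:

$$abn - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1) + ab(n - 1)$$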

(40)

• NCSS and Minitab will perform the computations
• Text gives details of manual computing – see pp.

(41)

Analysis of Variance Table

Source Term        DF   Sum of Squares   Mean Square   F-Ratio   Prob Level   Power (Alpha=0.05)
A: C2               2   900801.2         450400.6      2563.41   0.000000*    1.000000
B: C3               2   420599.2         210299.6      1196.90   0.000000*    1.000000
AB                  4   809992.1         202498        1152.50   0.000000*    1.000000
S (error)          18   3162.667         175.7037
Total (Adjusted)   26   2134555
Total              27

* Term significant at alpha = 0.05

Means and Effects Section

Term         Count   Mean       Standard Error   Effect
All          27      478.2592                    478.2592
A: C2
  1          9       468.7778   4.418442         -9.481482
  2          9       706.5555   4.418442         228.2963
  3          (values illegible in source)
B: C3
  1          9       305.4445   4.418442         -172.8148
  2          9       595.7778   4.418442         117.5185
  3          9       533.5555   4.418442         55.2963
AB: C2,C3
  1,1        3       16.33333   7.652967         -279.6296
  1,2        3       796.6667   7.652967         210.3704
  1,3        3       593.3333   7.652967         69.25926
  2,1        (values illegible in source)
  2,2        3       708        7.652967         -116.0741
  2,3        3       873        7.652967         111.1481
  3,1        3       361.3333   7.652967         274.7037
  3,2        3       282.6667   7.652967         -94.2963
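The F-ratios in the table follow directly from its sums of squares and degrees of freedom. The sketch below recomputes them from the printed values as a consistency check; the dictionary layout and variable names are illustrative, and "S" is read as the error term.

```python
# Degrees of freedom and sums of squares as printed in the ANOVA table.
terms = {
    "A: C2": (2, 900801.2),
    "B: C3": (2, 420599.2),
    "AB": (4, 809992.1),
}
error_df, error_ss = 18, 3162.667

# Error mean square: the denominator of every F-ratio in a fixed-effects model.
ms_error = error_ss / error_df

# F-ratio for each term = (term mean square) / (error mean square).
f_ratios = {name: (ss / df) / ms_error for name, (df, ss) in terms.items()}

# The term and error sums of squares should add up to the adjusted total.
total_ss = sum(ss for _, ss in terms.values()) + error_ss

for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f}")
print(f"MS_E = {ms_error:.4f}, total SS = {total_ss:.0f}")
```

The recomputed ratios agree with the printed F-Ratio column to two decimals, and the sums of squares add up to the adjusted total of 2134555.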

(42)

Factorials with More Than Two Factors

• Basic procedure is similar to the two-factor case; all abc…kn treatment combinations are run in random order
• ANOVA identity is also similar:

$$SS_T = SS_A + SS_B + \cdots + SS_{AB} + SS_{AC} + \cdots + SS_{ABC} + \cdots + SS_{AB \cdots K} + SS_E$$

• Complete three-factor example in text, Example 5-5
