5 Analysis of Variance models, complex linear models and Random effects models


In this chapter we will not show any of the theoretical background of the analysis. The focus is on practising the setup of ANOVA models in GenStat. GenStat comes with a very extensive help system and, in addition, several PDF files which serve as documentation and reference. You should also consult secondary literature on statistical modelling to use these procedures properly.

At the beginning of each chapter you will find a short introduction on how to generate the example field trials in GenStat. GenStat is also good software for creating field trials, although it is somewhat limited regarding the number of treatments. For complex designs, specialised software should be used.

The design creation in GenStat always gives good advice on how the analysis model for a certain design should be set up. In any case you should consult a biometrician.

5.1 Basic syntax of ANOVA models

Table 5: Notation of ANOVA models

Operator   Example   Meaning
+          A+B       Main effects of A and B
.          A.B       Interaction of A and B only
*          A*B       A+B+A.B: factorial structure
/          A/B       A+A.B: B nested within A, without main effect of B

Table 6: Examples of ANOVA models

A*B*C              = A+B+C+A.B+A.C+B.C+A.B.C (full factorial model)
(A+B)*(C+D)        = (A+B)+(C+D)+(A+B).(C+D)
                   = A+B+C+D+A.C+A.D+B.C+B.D
Block/Plot/Subplot = Block+Block.Plot+Block.Plot.Subplot
A/(B*C)            = A+A.B+A.C+A.B.C
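The expansions in Table 5 and Table 6 can be checked mechanically. The following Python sketch is not GenStat code. It is only an illustration, and it assumes the usual reading of the operators, in particular that L/M expands to L plus the interaction of all factors in L with every term of M. The helper names model, plus, dot, star, nest and show are made up for this example.

```python
from itertools import product

def model(*names):
    """Main effects only, e.g. model("A", "B") stands for A+B."""
    return [frozenset([n]) for n in names]

def plus(left, right):
    """L+M: the terms of both models, duplicates dropped."""
    out = list(left)
    for term in right:
        if term not in out:
            out.append(term)
    return out

def dot(left, right):
    """L.M: every term of L combined with every term of M."""
    out = []
    for l_term, m_term in product(left, right):
        term = l_term | m_term
        if term not in out:
            out.append(term)
    return out

def star(left, right):
    """L*M = L + M + L.M (factorial structure)."""
    return plus(plus(left, right), dot(left, right))

def nest(left, right):
    """L/M = L + (all factors of L).M (assumed reading of nesting)."""
    all_left = frozenset().union(*left)
    return plus(left, dot([all_left], right))

def show(terms):
    """Long form, ordered by number of factors and then by name."""
    ordered = sorted(terms, key=lambda t: (len(t), sorted(t)))
    return "+".join(".".join(sorted(t)) for t in ordered)

A, B, C, D = model("A"), model("B"), model("C"), model("D")
print(show(star(star(A, B), C)))
print(show(star(plus(A, B), plus(C, D))))
print(show(nest(nest(model("Block"), model("Plot")), model("Subplot"))))
print(show(nest(A, star(B, C))))
```

The four printed lines correspond to the four rows of Table 6, written in the long form with "+" and ".".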

Recommendation: Syntax of ANOVA Models

You can reuse any model from the “Input Log” window: copy it into an extra script window, then edit the model to suit your needs. Whenever you are not completely sure about the syntax of the model, you should write it in the long form using “+” and “.”.

5.2 Analysis example: Potato yield – Latin square

Please restart the GenStat Server via “Restart Server” (see 1.3.2).


5.2.1 Create Design

To create a design in GenStat, use the menu “Stats -> Design -> Generate Standard Design”.

Please choose the base design “Latin Square” from the pull-down menu of the dialog box “Generate a Standard Design”. Also enter names for the row, column and treatment factors as well as the “Number of Levels”:

Graph 102: Create a Latin Square Design

Click the “Run” button and the design will be created in the form of a spreadsheet. What is special about this spreadsheet is that the information about the analysis model is saved within it. You can check the power of the design after you have clicked “Run”: go back to the design creation dialog and click the “Check for Power” button, which is now visible. You need to supply some basic information: the hypothesised mean difference (“Size of difference to detect”) and a measure of the residual variability (“Residual Mean Square”). If the power is below 80%, you should rethink the design; adding replications is one way to increase the power.

Please insert a new column for the response values “Yield” via “Spread -> Insert -> Column after current column” and enter some data. Now open the menu “Stats -> Analysis of Variance -> General...”.

You can save the design spreadsheet for later use via the menu „File“ and the „Save“ dialog. You should close the design after checking the results.


5.2.2 Enter or load data

Please restart the GenStat Server via “Restart Server” (see 1.3.2) and load the file “Latin_Square_Data_Potato_Yield.xls” using the Excel Import Wizard (see 2.1).

You should convert Zeile (row), Spalte (column) and Sorte (variety) to factor variables; Ertrag (yield) is the response. The following data will be loaded:

Table 7: data set „Latin_Square_Data_Potato_Yield.xls“

Zeile Spalte Sorte Ertrag

1 1 C 22

1 2 B 20

1 3 A 39

1 4 D 27

1 5 E 34

2 1 E 29

2 2 D 29

2 3 C 25

2 4 A 30

2 5 B 23

3 1 A 29

3 2 E 25

3 3 D 34

3 4 B 26

3 5 C 27

4 1 B 23

4 2 A 27

4 3 E 27

4 4 C 32

4 5 D 41

5 1 D 33

5 2 C 21

5 3 B 24

5 4 E 30

5 5 A 33
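Before running the analysis it can be worth confirming that the loaded data really form a Latin square, i.e. that every variety occurs exactly once in each row and each column. The following Python sketch is not part of the GenStat workflow; it is only a plausibility check on the values of Table 7, and the function names are made up for this illustration.

```python
from collections import defaultdict

# (Zeile, Spalte, Sorte, Ertrag), copied from Table 7
DATA = [
    (1, 1, "C", 22), (1, 2, "B", 20), (1, 3, "A", 39), (1, 4, "D", 27), (1, 5, "E", 34),
    (2, 1, "E", 29), (2, 2, "D", 29), (2, 3, "C", 25), (2, 4, "A", 30), (2, 5, "B", 23),
    (3, 1, "A", 29), (3, 2, "E", 25), (3, 3, "D", 34), (3, 4, "B", 26), (3, 5, "C", 27),
    (4, 1, "B", 23), (4, 2, "A", 27), (4, 3, "E", 27), (4, 4, "C", 32), (4, 5, "D", 41),
    (5, 1, "D", 33), (5, 2, "C", 21), (5, 3, "B", 24), (5, 4, "E", 30), (5, 5, "A", 33),
]

def is_latin_square(data):
    """True if each variety occurs exactly once per row and per column."""
    by_row, by_col = defaultdict(list), defaultdict(list)
    for row, col, variety, _ in data:
        by_row[row].append(variety)
        by_col[col].append(variety)
    varieties = sorted({variety for _, _, variety, _ in data})
    lines = list(by_row.values()) + list(by_col.values())
    return all(sorted(line) == varieties for line in lines)

def variety_totals(data):
    """Sum of Ertrag for each level of Sorte."""
    totals = defaultdict(int)
    for _, _, variety, y in data:
        totals[variety] += y
    return dict(totals)

LATIN = is_latin_square(DATA)
TOTALS = variety_totals(DATA)
print("Latin square:", LATIN)
print("Totals per variety:", TOTALS)
```

Dividing each total by the replication of 5 gives the variety means that appear later in the tables of means.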

5.2.3 Analysis

Graph 103: Analysis of Variance : General

Please start the ANOVA via “Stats -> Analysis of Variance -> General”.

In the following dialog you specify the analysis model.


Graph 104: Latin square ANOVA model

The response (“Y-Variate”) of the model is the yield, which is named “Ertrag” in this data file.

The treatment or independent variable is the variety, named “Sorte”.

Built into every Latin square model is the block structure rows*columns, here called “Zeile*Spalte”. Please see chapter 5.1 for more details on the usage of “*”, “.” and “/” in setting up ANOVA models.

Graph 105: ANOVA Options

Please specify all other settings as shown in Graph 105 and Graph 106, then click “Run”.

In any case, you should activate the graphical output as an additional option.


Graph 106: ANOVA Save

After you have run the analysis once, you can click the “Save” button in the “Analysis of Variance” dialog.

To be able to check the assumption of normality of the residuals and run the appropriate test you need to save the residuals first.

To be able to run a multiple comparison test or pairwise means comparison you need to save the means first.

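The following script listings are automatically generated as commands in the Input Log (see script listing 4 and script listing 5). You can save the Input Log to reuse the command sequence later. Note that script listing 4 enters rows and columns as treatment terms (Sorte+Spalte+Zeile) with no block structure, rather than as the block structure Zeile*Spalte. For this orthogonal design both forms give the same sums of squares and the same 12 residual degrees of freedom.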

script listing 4: Create One Way ANOVA Output

"General Analysis of Variance."
BLOCK "No Blocking"
TREATMENTS Sorte+Spalte+Zeile
COVARIATE "No Covariate"
ANOVA [PRINT=aovtable,information,means,%cv; FACT=1; CONTRASTS=7; FPROB=yes; PSE=diff,\
means] Ertrag
APLOT [RMETHOD=simple] fitted,normal,halfnormal,histogram
AGRAPH [METHOD=means]

script listing 5: Saving results of the One Way ANOVA

DELETE [REDEFINE=yes] Kartoffel_Meantab
AKEEP [RESIDUAL=Kartoffel_Residuals; FACT=32] Sorte; MEANS=Kartoffel_Meantab
FSPREADSHEET [SHEET=29548864; METHOD=replace] Kartoffel_Residuals
FSPREADSHEET Kartoffel_Meantab

GenStat creates diagnostic plots and means plots for the treatment automatically. The diagnostics for this example look very good: the residuals appear to comply with the assumption of normality even before any further statistical analysis. In the Normal plot as well as in the Half-Normal plot a few values stand out. These also appear in the list of “large residuals” in the output.


Graph 107: Example Potato yields – Diagnostic plots

Graph 108: Potato yields – Means

The result of the numerical analysis is listed in the following output.

Output 5: Result of the One Way ANOVA

Analysis of variance

Variate: Ertrag

Source of variation   d.f.     s.s.    m.s.   v.r.   F pr.
Sorte                    4   330.00   82.50   5.64   0.009
Spalte                   4   150.00   37.50   2.56   0.093
Zeile                    4    20.40    5.10   0.35   0.840
Residual                12   175.60   14.63
Total                   24   676.00

Message: the following units have large residuals.

*units* 3 6.00

*units* 4 -6.40

Tables of means

Variate: Ertrag

Grand mean 28.40

Sorte A B C D E

31.60 23.20 25.40 32.80 29.00

Spalte 1 2 3 4 5

27.20 24.40 29.80 29.00 31.60

Zeile 1 2 3 4 5

28.40 27.20 28.20 30.00 28.20


Standard errors of means

Table Sorte Spalte Zeile

rep. 5 5 5

d.f. 12 12 12

e.s.e. 1.711 1.711 1.711

Standard errors of differences of means

Table Sorte Spalte Zeile

rep. 5 5 5

d.f. 12 12 12

s.e.d. 2.419 2.419 2.419

Stratum standard errors and coefficients of variation

Variate: Ertrag

d.f. s.e. cv%

12 3.825 13.5
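The figures in Output 5 can be reproduced by hand from Table 7. The following Python sketch is not how GenStat computes the analysis internally; it is only an illustration. It relies on the design being orthogonal, so that the residual sum of squares is what remains after subtracting the sums of squares for Sorte, Spalte and Zeile from the total. All variable names are chosen for this example.

```python
import math
from collections import defaultdict

# (Zeile, Spalte, Sorte, Ertrag), copied from Table 7
DATA = [
    (1, 1, "C", 22), (1, 2, "B", 20), (1, 3, "A", 39), (1, 4, "D", 27), (1, 5, "E", 34),
    (2, 1, "E", 29), (2, 2, "D", 29), (2, 3, "C", 25), (2, 4, "A", 30), (2, 5, "B", 23),
    (3, 1, "A", 29), (3, 2, "E", 25), (3, 3, "D", 34), (3, 4, "B", 26), (3, 5, "C", 27),
    (4, 1, "B", 23), (4, 2, "A", 27), (4, 3, "E", 27), (4, 4, "C", 32), (4, 5, "D", 41),
    (5, 1, "D", 33), (5, 2, "C", 21), (5, 3, "B", 24), (5, 4, "E", 30), (5, 5, "A", 33),
]

def factor_means(data, index):
    """Mean of Ertrag for every level of the factor stored at position index."""
    groups = defaultdict(list)
    for record in data:
        groups[record[index]].append(record[3])
    return {level: sum(values) / len(values) for level, values in groups.items()}

n = len(DATA)
grand = sum(record[3] for record in DATA) / n
ss_total = sum((record[3] - grand) ** 2 for record in DATA)

FACTORS = {"Sorte": 2, "Spalte": 1, "Zeile": 0}  # factor name -> position in a record
ss, df = {}, {}
for name, index in FACTORS.items():
    means = factor_means(DATA, index)
    reps = n // len(means)                      # 5 observations per level
    ss[name] = reps * sum((m - grand) ** 2 for m in means.values())
    df[name] = len(means) - 1

ss_res = ss_total - sum(ss.values())            # valid because the design is orthogonal
df_res = (n - 1) - sum(df.values())
ms_res = ss_res / df_res
vr = {name: (ss[name] / df[name]) / ms_res for name in FACTORS}

se = math.sqrt(ms_res)                          # stratum standard error
ese = math.sqrt(ms_res / 5)                     # standard error of a mean
sed = math.sqrt(2 * ms_res / 5)                 # standard error of a difference
cv = 100 * se / grand                           # coefficient of variation in percent

for name in FACTORS:
    print(f"{name:8s} {df[name]:3d} {ss[name]:8.2f} {ss[name] / df[name]:7.2f} {vr[name]:6.2f}")
print(f"Residual {df_res:3d} {ss_res:8.2f} {ms_res:7.2f}")
print(f"e.s.e. {ese:.3f}  s.e.d. {sed:.3f}  s.e. {se:.3f}  cv% {cv:.1f}")
```

The F probabilities are not reproduced here because they require the F distribution, which is not part of the Python standard library.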

5.2.4 Pairwise comparison

It is not possible in GenStat to run pairwise LS-means comparisons using the menus. If the code for LS-means comparisons is written manually, it works fine. This chapter shows an example. Before you can apply the script to a specific analysis, you have to find some information in the previous output and paste it into the script.

You have to supply the name of the table of means, the replication, the residual variance and the degrees of freedom. Besides the overly conservative Bonferroni method you can also run Tukey and Sidak tests.

VSN knows about this problem and will add this option to Version 10 of GENSTAT.

script listing 6: Create a pairwise comparison of means

Data from table “Stratum standard errors and coefficients of variation”:
VARIANCE = s.e. from the output, squared (here 3.825² ≈ 14.631)
DF = d.f. from the output

ALLPAIRWISE [METHOD=Bonferroni; DIRECTION=descending; PROBABILITY=0.05]\
MEANS=Kartoffel_Meantab; REPLICATION=5; VARIANCE=14.631; DF=12

Output 6: Result of the pairwise comparison on the basis of a One Way ANOVA

All pairwise comparisons are tested.

Variance = 14.6310 with 12 degrees of freedom

Bonferroni test

Experimentwise error rate = 0.0500
Comparisonwise error rate = 0.0050


D E 1.571 No

D C 3.059 No

D B 3.968 Yes

A E 1.075 No

A C 2.563 No

A B 3.472 Yes

E C 1.488 No

E B 2.398 No

C B 0.909 No

Identifier   Mean
D           32.80   |
A           31.60   |
E           29.00   | |
C           25.40   | |
B           23.20     |
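The test statistics in Output 6 are the differences between two means divided by the standard error of a difference. The following Python sketch recomputes them from the values passed to ALLPAIRWISE in script listing 6. It is only an illustration and not the procedure's actual implementation; the variable names are made up for this example. The Bonferroni critical value itself is not computed, because it needs a quantile of the t distribution with 12 degrees of freedom.

```python
import math
from itertools import combinations

# Values taken from the table of means and from script listing 6
MEANS = {"A": 31.60, "B": 23.20, "C": 25.40, "D": 32.80, "E": 29.00}
REPLICATION = 5
VARIANCE = 14.631      # 3.825 squared, rounded
ALPHA = 0.05           # experimentwise error rate

# Standard error of a difference between two means
sed = math.sqrt(2 * VARIANCE / REPLICATION)

# All pairs, with the means sorted in descending order as in the output
ordered = sorted(MEANS, key=MEANS.get, reverse=True)
pairs = list(combinations(ordered, 2))

# Bonferroni: divide the experimentwise rate by the number of comparisons
comparisonwise = ALPHA / len(pairs)

t_values = {(a, b): (MEANS[a] - MEANS[b]) / sed for a, b in pairs}

print(f"s.e.d. = {sed:.3f}, comparisons = {len(pairs)}, "
      f"comparisonwise error rate = {comparisonwise:.4f}")
for (a, b), t in t_values.items():
    print(f"{a} {b} {t:.3f}")
```

With five varieties there are ten comparisons, which explains the comparisonwise error rate of 0.0050. The sketch also prints the comparison of D with A, which does not appear among the rows of Output 6 as reproduced here.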

5.2.5 Test if data are Normal

Graph 109: Menu to Test if data are Normal

The Normality test is started via the Graph menu. Please specify the options as shown in Graph 110.

Graph 110: Options of the Normality test

The script command is automatically created by GenStat:


script listing 7: Graphical Test of Normality

DPROBABILITY [PRINT=parameters,tests;DISTRIBUTION=NORMAL;METHOD=quantile;QMETHOD=standardized;\

BANDS=simultaneous;ALPHA=0.95;PLOT=reference] Kartoffel_Residuals

Output 7: Numerical results of the Normality tests

Critical values of test statistics (marginal tests)

Test statistic 15% 10% 5% 2.5% 1%

Anderson-Darling 0.576 0.656 0.787 0.918 1.092

Cramer-von Mises 0.091 0.104 0.126 0.148 0.178

Watson 0.085 0.096 0.116 0.136 0.163

Marginal tests

Variate Anderson-Darling Cramer-von Mises Watson

1 0.3176 0.0415 0.0415

?, *, ** indicate significance at 10%, 5% and 1% levels respectively

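The graphical analysis shows that all residuals lie within the confidence band, which indicates that the residuals follow a Gaussian (normal) distribution. This is consistent with the numerical analysis: all three test statistics (0.3176, 0.0415 and 0.0415) are below their 15% critical values (0.576, 0.091 and 0.085), so none of the tests rejects normality.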
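To see where the tested residuals come from, they can be recomputed from Table 7 as the observed yield minus the fitted value (row mean + column mean + variety mean minus twice the grand mean). The following Python sketch does this and then evaluates a textbook Anderson-Darling statistic against a normal distribution with estimated mean and standard deviation. It is only an illustration: GenStat may use different estimators or corrections, so the value need not match Output 7 exactly. All function names are made up for this example.

```python
import math
import statistics
from collections import defaultdict

# (Zeile, Spalte, Sorte, Ertrag), copied from Table 7
DATA = [
    (1, 1, "C", 22), (1, 2, "B", 20), (1, 3, "A", 39), (1, 4, "D", 27), (1, 5, "E", 34),
    (2, 1, "E", 29), (2, 2, "D", 29), (2, 3, "C", 25), (2, 4, "A", 30), (2, 5, "B", 23),
    (3, 1, "A", 29), (3, 2, "E", 25), (3, 3, "D", 34), (3, 4, "B", 26), (3, 5, "C", 27),
    (4, 1, "B", 23), (4, 2, "A", 27), (4, 3, "E", 27), (4, 4, "C", 32), (4, 5, "D", 41),
    (5, 1, "D", 33), (5, 2, "C", 21), (5, 3, "B", 24), (5, 4, "E", 30), (5, 5, "A", 33),
]

def level_means(data, index):
    """Mean of Ertrag for every level of the factor stored at position index."""
    groups = defaultdict(list)
    for record in data:
        groups[record[index]].append(record[3])
    return {level: sum(values) / len(values) for level, values in groups.items()}

grand = sum(record[3] for record in DATA) / len(DATA)
row_m, col_m, var_m = level_means(DATA, 0), level_means(DATA, 1), level_means(DATA, 2)

# Residual = observed yield minus fitted value of the additive model
residuals = [y - (row_m[r] + col_m[c] + var_m[v] - 2 * grand) for r, c, v, y in DATA]

def anderson_darling_normal(values):
    """Textbook A-squared statistic with mean and s.d. estimated from the data."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)               # divisor n - 1 (an assumption)
    z = sorted((v - mean) / sd for v in values)
    cdf = [0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in z]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    return -n - total / n

A2 = anderson_darling_normal(residuals)
print("residuals of units 3 and 4:", round(residuals[2], 2), round(residuals[3], 2))
print("Anderson-Darling statistic:", round(A2, 4))
```

The residuals of units 3 and 4 are the two values flagged under “large residuals” in Output 5.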

Graph 111: Graphical Output of the Normality test