NONPARAMETRIC ANALYSIS OF VARIANCE
24.4 ANALYSIS: FRIEDMANN’S TEST (FOR THREE OR MORE DEPENDENT SAMPLES)
24.4.2 How Is the Test Analysis Carried Out?
Like the Kruskal–Wallis test, Friedmann’s test compares the medians of three or more groups, but here the groups are dependent (Friedmann, 1937). The scores are ranked individually for each day, with tied values given the mean of the ranks as usual. The ranks are summed for each soil horizon and the square of each sum of ranks is calculated. The statistic S is then obtained as shown in Table 24.2 and referred to a table of Friedmann’s S or, if such a table is not available, χ² is calculated as shown and taken to the χ² table for K − 1 degrees of freedom (DF), where K is the number of treatments, to obtain a P value.
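The ranking and summing steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the data are hypothetical (they are not the values of Table 24.2), the function names are invented for the example, and the χ² approximation used is the standard textbook form, χ² = 12/[nK(K + 1)] ΣR² − 3n(K + 1), with n blocks (days) and K treatments (soil horizons), without any correction for ties.

```python
def mean_ranks(values):
    """Rank values from 1 (smallest); tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(blocks):
    """blocks: one row per block (e.g., day), one column per treatment.

    Returns (chi2, rank_sums, degrees_of_freedom). No tie correction is applied.
    """
    n = len(blocks)       # number of blocks
    k = len(blocks[0])    # number of treatments
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(mean_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, rank_sums, k - 1


# Hypothetical data: 4 days (rows) x 3 soil horizons (columns)
example = [
    [10, 20, 30],
    [12, 22, 35],
    [9, 19, 28],
    [11, 25, 31],
]
chi2, rank_sums, df = friedman_chi2(example)
print(chi2, rank_sums, df)  # 8.0 [4.0, 8.0, 12.0] 2
```

In this invented example every day ranks the horizons in the same order, so χ² takes its largest possible value for n = 4 and K = 3; the value would then be referred to the χ² table with K − 1 = 2 DF.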
24.4.3 Interpretation
In the present example, χ² = 19.19, which is significant at the 0.1% level of probability (P < 0.001), indicating a significant effect of soil horizon on the percentage of anaerobic bacteria. Examination of the data also suggests there may be some differences between samples collected on the different days, with slightly higher percentages of anaerobic bacteria recorded on the first two days. Unfortunately, Friedmann’s test does not provide an explicit test of the difference between days but, as in the two-way ANOVA, takes into account variation between the different days in assessing differences between the soil horizons.
24.5 CONCLUSION
There are a limited number of nonparametric tests available for comparing three or more different groups. Two useful nonparametric tests are the Kruskal–Wallis and Friedmann’s tests. The Kruskal–Wallis test is the nonparametric equivalent of the one-way ANOVA (Statnote 6) and essentially tests whether the medians of three or more independent groups are significantly different. Friedmann’s test compares the medians of three or more dependent groups and is the nonparametric equivalent of the two-way ANOVA (see Statnote 11).
Statnote 25
MULTIPLE LINEAR REGRESSION
Uses of multiple regression.
Theory of multiple regression.
Multiple correlation coefficient R.
Interpretation of regression coefficients.
25.1 INTRODUCTION
In Statnotes 15 and 18, the application of correlation and regression methods to the analysis of two variables (X, Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables (see Statnote 15), whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X (see Statnote 18). When the data are normally distributed, the degree of linear correlation between two variables can be tested using Pearson’s correlation coefficient (r) (see Statnote 15), while if the data are not normally distributed, a nonparametric correlation coefficient can be used (see Statnote 16).
This statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, that is, multiple linear regression .
As in Statnote 18, the variables under study are referred to as Y, the dependent, outcome, or response variable, and X, the independent, predictor, or explanatory variable.
Multiple linear regression determines the linear relationship between one dependent variable (Y) and multiple independent variables (X1, X2, X3, etc.). Multiple regression analysis has many uses. First, it enables a linear equation involving the X variables to be constructed that predicts Y; for example, it may be necessary to predict bacterial biomass under a set of conditions specified by a series of X variables, such as pH, temperature, and amount of nutrient medium. Second, given several possible X variables that could potentially be related to Y, an investigator may wish to select a subset of the X variables that gives the best linear prediction equation. Third, an investigator may wish to determine which of a group of X variables are actually related to Y and to rank them in order of importance. For example, an investigator may wish to determine which climatic variables are most closely related to the growth of lichenized fungi in the field and in which order of importance.
Statistical Analysis in Microbiology: Statnotes, Edited by Richard A. Armstrong and Anthony C. Hilton Copyright © 2010 John Wiley & Sons, Inc.
Multiple regression is most useful, however, in deciding whether there are any significant variables influencing Y and, therefore, should be thought of as an exploratory method, the results of which should then be tested on a new set of data and, preferably, by a more rigorous experimental approach in which the X variables are controlled.
25.2 SCENARIO
Lichens, a symbiotic association between a filamentous fungus and an alga, are often dominant in stressful environments such as the surfaces of rock and tree bark. Under these conditions, they experience extremes of temperature, moisture supply, and low availability of nutrients. As a consequence, lichens sequester a high proportion of their carbon production for stress resistance rather than for growth. Hence, as a group lichens are particularly slow-growing organisms, with many species growing at less than 2 mm per year and some at less than 0.5 mm per year (Hale, 1967).
Slow growth rates and the difficulty of growing lichens for long periods under controlled laboratory conditions have made it difficult to study the influence of environmental factors on growth. In the absence of such studies, investigation of the seasonal variations in growth in the field is one method of examining the effects of environmental factors.
Significant correlations between growth and climatic variables suggest hypotheses about the causal factors limiting growth that may then be tested by more controlled physiological experiments. In the present study, the radial growth rate (RGR) of thalli of the crustose lichen Rhizocarpon geographicum (L.) DC was measured in successive 3-month periods over 51 months in North Wales, United Kingdom. The radial growth of 20 thalli of R. geographicum (Armstrong & Smith, 1987) was measured at between 8 and 10 randomly chosen locations around each thallus at 3-month intervals from April 1993 to June 1997 using the method described by Armstrong (1973). Essentially, the advance of the hypothallus is measured with a micrometer scale in relation to fixed markers on the substratum.
Radial growth in each period was averaged for each thallus and then over the 20 thalli to examine the pattern of seasonal growth. Data for eight climatic variables were obtained from the Welsh Plant Breeding Station, Plas Gogerddan, near Aberystwyth, and included records of (1) total rainfall over each 3-month period, (2) the total number of rain days, (3) maximum (Tmax) and minimum (Tmin) temperature recorded on each day and averaged for each 3-month period, (4) the total number of both air and ground frosts, (5) the total number of sunshine hours, and (6) average daily wind speed.
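The two-stage averaging of radial growth described above (first within each thallus, then across thalli) can be sketched as follows. The numbers and the function name are hypothetical and serve only to show the arithmetic; they are not taken from the study. Averaging within each thallus first gives every thallus equal weight, even though the number of measured locations varies between 8 and 10.

```python
from statistics import mean


def mean_growth_per_period(measurements):
    """measurements[period][thallus] is a list of radial increments (mm)
    measured at several locations around one thallus.

    Returns one mean growth value per period: the mean over thalli
    of the per-thallus means.
    """
    result = []
    for period in measurements:
        thallus_means = [mean(locations) for locations in period]
        result.append(mean(thallus_means))
    return result


# Hypothetical data: 2 periods, 2 thalli, unequal numbers of locations
example = [
    [[0.10, 0.20, 0.30], [0.40, 0.40]],
    [[0.00, 0.10], [0.20, 0.30, 0.40]],
]
period_means = mean_growth_per_period(example)
print(period_means)
```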
25.3 DATA
The data comprise a single dependent (Y) variable, namely, radial growth of the lichen in each 3-month period, and eight possible defining (X) variables and are presented in Table 25.1.
25.4 ANALYSIS
25.4.1 Theory
When there are only two variables (X and Y), the distribution of data points in space can be represented on a two-dimensional (2D) surface, but with three variables (Y, X1, and X2) 3D geometry is required. The theory of multiple regression will be described with reference to two independent variables, but the same principles apply to any number of X variables. Figure 25.1 illustrates the relationship between Y and two X variables (X1, X2) such that any point (A) in the 3D space is defined by three coordinates (x1, x2, y). The relationship between Y and a single X variable is described by the line of best fit as determined by the method of least squares (see Statnote 18). By contrast, with two X variables the data are fitted by a surface or plane (the plane of best fit) (Fig. 25.2), which is described by an equation of the form Y = a + b1X1 + b2X2, where a is the intercept and b1 and b2 are the regression coefficients of X1 and X2. The letter b is used to indicate a sample regression coefficient, which is estimated from the data, while β indicates the population or “true” value of the regression coefficient. As in the case of a single X variable, the y values are considered to be normally distributed about the regression plane (Fig. 25.2), and the coefficients of the regression equation are chosen to minimize Σ(Y − YL)², where YL represents the points on the regression plane and Y the actual points. Deviations of the points from the regression plane are a consequence of random error and the existence of variables that influence Y but that have not been included in the study.
TABLE 25.1 Radial Growth (RGR) (Y) of Lichen R. geographicum in 17 Successive 3-Month
Figure 25.1. Multiple regression with two independent (X) variables influencing the dependent variable (Y). With two X variables, the position of any point (A) is described in three-dimensional space by three coordinates (x1, x2, y).
Figure 25.2. Multiple regression with two independent variables influencing Y. The data are fitted by the plane of best fit. The data points are scattered about the plane, with some data points (y) above the plane and some below it in three-dimensional space. The degree of scatter of the points above and below the plane of best fit indicates the failure of the plane to fit the points.
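The least-squares criterion described above, choosing the coefficients that minimize Σ(Y − YL)², can be illustrated with a short sketch. It assumes the plane is written as Y = a + b1X1 + b2X2 and solves the corresponding normal equations by Gaussian elimination; the function names and the data are hypothetical, and in practice a statistical package would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the X variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_plane(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept column
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


def residual_ss(x1, x2, y, coef):
    """Sum of squared deviations of the points from the fitted plane."""
    a, b1, b2 = coef
    return sum((yi - (a + b1 * u + b2 * v)) ** 2 for u, v, yi in zip(x1, x2, y))


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no random error)
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]

coef = fit_plane(x1, x2, y)
rss = residual_ss(x1, x2, y, coef)
print(coef, rss)
```

Because the invented data lie exactly on a plane, the fitted coefficients recover a = 1, b1 = 2, and b2 = −3 and the residual sum of squares is essentially zero; with real data the points scatter about the plane and the residual sum of squares is positive.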
25.4.2 Goodness-of-Fit Test of the Points to the Regression Plane