
Chapter three: Numerical analyses of the water sample data-set

3.1. Methods of numerical analysis

Diatom counts were entered into the database AMPHORA (Beare, 1995) maintained at the Environmental Change Research Centre (UCL). Data manipulation, such as transformation into percentage data, was performed using TRAN (Juggins, 1994). Unless specified otherwise, the statistical techniques used in this chapter were implemented by means of the computer program CANOCO (version 3.12a, ter Braak, 1991). The ordination diagrams were plotted using the computer program CALIBRATE (version 0.81, Juggins and ter Braak, 1997).

3.1.1. Gradient analysis - Ordination and constrained ordination

3.1.1.1. Principal component analysis (PCA)

Principal component analysis is a method of indirect gradient analysis, in which only the response variables are explored and any variation in the response data is interpreted from its inherent structure (ter Braak, 1988). PCA relates to a linear response model in which the abundance of any species either increases or decreases with the value of each of the latent environmental variables (ter Braak, 1987). PCA is therefore often used to deal with chemistry data. In the PCA diagram, variables with high positive correlation have small angles between their biplot arrows. Variables with long arrows have high variance, and the proximity of an arrow to an axis summarises the relative weight of that variable in determining the axis (ter Braak, 1987). The direction of each arrow indicates ascending values for each environmental variable.
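As an illustration of the biplot properties described above, the following is a minimal sketch of PCA computed by singular value decomposition. It is not the CANOCO implementation; the function name `pca_biplot_scores` and the choice to centre and standardise each variable (a common choice for chemistry variables measured in different units) are assumptions made for the example.

```python
import numpy as np

def pca_biplot_scores(X):
    """PCA of a samples-by-variables matrix via SVD.

    Assumption: every variable is centred and scaled to unit variance
    before the decomposition (a correlation-matrix PCA).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Fraction of the total variance captured by each axis.
    explained = s**2 / np.sum(s**2)
    # Sample coordinates on the principal axes.
    sample_scores = U * s
    # Variable arrows: correlation of each variable with each axis.
    arrows = Vt.T * s / np.sqrt(Z.shape[0] - 1)
    return sample_scores, arrows, explained

# Hypothetical data: columns 0 and 1 are perfectly correlated.
X = [[1, 2, 5], [2, 4, 3], [3, 6, 8], [4, 8, 1], [5, 10, 6]]
scores, arrows, explained = pca_biplot_scores(X)
```

In this toy data-set the first two variables are perfectly correlated, so their arrows coincide (zero angle between them), which is the limiting case of the small-angle rule stated above.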

3.1.1.2. Detrended Correspondence Analysis (DCA)

Detrended correspondence analysis, like PCA, is a technique of indirect gradient analysis. In contrast to PCA, DCA is related to a unimodal response model. In this model, any species occurs in a limited range of values of each of the latent environmental variables (ter Braak, 1987). DCA is useful in detecting outlier samples with atypical diatom assemblages. DCA can also be used to determine, by estimating gradient length, whether unimodal or linear numerical techniques are better suited to ordinations of diatom-environment relationships.

In the following analyses, DCA was performed with detrending by segments and non-linear rescaling of axes. DCA was also implemented with and without down-weighting of rare species. In the CANOCO package, all species with a frequency below 20% of the maximum frequency of any species are down-weighted in proportion to their frequency. As shown by Eilertsen et al. (1990), the gradient length decreases when infrequent species are down-weighted. However, the percentage of variance explained by the first two axes increases (this is illustrated in Table 3.2.6).
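The down-weighting rule described above can be sketched as follows. This is an illustrative approximation only: it assumes that "frequency" means the number of samples in which a taxon occurs, and that the weight falls linearly from 1 at the 20% threshold to 0. The exact definition used inside CANOCO may differ, and the function name `downweight_rare` is hypothetical.

```python
import numpy as np

def downweight_rare(Y, fraction=0.2):
    """Down-weight rare taxa in a samples-by-taxa abundance matrix.

    Assumption: frequency = number of samples with non-zero abundance.
    Taxa whose frequency is below `fraction` times the maximum frequency
    receive a weight proportional to their frequency; others keep weight 1.
    """
    Y = np.asarray(Y, dtype=float)
    freq = (Y > 0).sum(axis=0)
    threshold = fraction * freq.max()
    weights = np.where(freq < threshold, freq / threshold, 1.0)
    return Y * weights, weights

# Hypothetical counts: taxon 0 in 10 samples, taxon 1 in 1, taxon 2 in 5.
Y = np.zeros((10, 3))
Y[:, 0] = 1.0
Y[0, 1] = 4.0
Y[:5, 2] = 2.0
Yw, weights = downweight_rare(Y)
```

Here the threshold is 2 occurrences, so only the taxon found in a single sample is down-weighted (to half its original abundance).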


3.1.1.3. Canonical Correspondence Analysis (CCA)

Canonical correspondence analysis is a direct gradient analysis technique in which response variables and explanatory variables are explored simultaneously (ter Braak, 1988). With CCA, species are assumed to have a unimodal response and the ordination axes are constrained to be linear combinations of environmental variables (ter Braak, 1987). For the first axis, species scores and sample scores are chosen to maximise the correlation between them. For subsequent axes, the sample and species scores are also maximally correlated but are chosen to be uncorrelated with the species and sample scores of previous axes. In the CCA biplot, the vector for an environmental variable points in the direction of maximum variation of that variable across the diagram, and its length is proportional to its importance. Environmental variables with long vectors are more strongly correlated with the ordination axes than those with short vectors, and are more closely related to the patterns of biological variation displayed in the diagram (ter Braak, 1987).

3.1.2. Species response analysis

The appropriate statistical techniques for defining the optima and tolerance ranges of diatom taxa to a particular environmental gradient are weighted-averaging (WA) regression and generalised linear models (GLM). Here, WA regression was used to provide a summary of the ecological preferences and breadths of the species, irrespective of whether the species have statistically significant relationships to particular environmental variables (Odland et al., 1995). By contrast, GLM was used to develop and evaluate a hierarchical series of species response curves in order to find the simplest possible response model that adequately explains the patterns of occurrence and abundance of the diatom taxa (Huisman et al., 1993).

3.1.2.1. Weighted-averaging regression (WA)

It has long been observed that the relationship between a species and an environmental variable is often unimodal. In other words, each species thrives best at a particular value (optimum) and cannot survive when the value is either too low or too high (ter Braak and van Dam, 1989). In a lake with, for instance, a certain alkalinity, diatoms with their alkalinity optima close to the lake water alkalinity will tend to be the most abundant taxa present. An ecologically sound approximation of a taxon’s alkalinity optimum is, therefore, the average of all alkalinity values for samples in which the taxon occurs, weighted by the taxon’s relative abundance. This is done by weighted-averaging (WA) regression (Birks et al., 1990). WA assumes that individual taxa respond in a unimodal manner over a long environmental gradient. This method is considered to be the most appropriate for noisy, species-rich, compositional data, with species that may be absent in many of the samples (ter Braak and Juggins, 1993).

Let $x$ denote the variable of interest, $x_i$ the value of $x$ in sample $i$ and $y_{ik}$ the abundance of taxon $k$ in sample $i$ ($y_{ik} \geq 0$). The abundance is the fraction of valves of taxon $k$ with respect to all valves examined in sample $i$. The data are therefore compositional data, which have a constant-sum constraint $\sum_{k} y_{ik} = 1$ (ter Braak and van Dam, 1989).

The WA estimate of a taxon’s optimum ($\hat{u}_k$) corresponds to the abundance-weighted mean and is given by the formula:

$$\hat{u}_k = \sum_{i=1}^{n} y_{ik}\, x_i \Big/ \sum_{i=1}^{n} y_{ik}$$

where $n$ is the number of samples. A taxon’s tolerance, $t_k$ (which corresponds to the weighted standard deviation), is:

$$t_k = \left[ \sum_{i=1}^{n} y_{ik} \left( x_i - \hat{u}_k \right)^2 \Big/ \sum_{i=1}^{n} y_{ik} \right]^{1/2}$$

The computation of WA optima and tolerances was implemented using the program CALIBRATE (version 0.81, Juggins & ter Braak, 1997).
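For illustration, the abundance-weighted mean and weighted standard deviation described above can be computed in a few lines. This is a minimal sketch, not the CALIBRATE code; the function name `wa_optima_tolerances` and the toy data are assumptions, and no bias correction or re-scaling is applied.

```python
import numpy as np

def wa_optima_tolerances(Y, x):
    """WA optimum and (uncorrected) tolerance for each taxon.

    Y : samples-by-taxa matrix of relative abundances (all >= 0).
    x : environmental value for each sample.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    totals = Y.sum(axis=0)
    # Optimum: mean of x weighted by the abundance of the taxon.
    optima = (Y * x[:, None]).sum(axis=0) / totals
    # Tolerance: abundance-weighted standard deviation around the optimum.
    squared_dev = (x[:, None] - optima) ** 2
    tolerances = np.sqrt((Y * squared_dev).sum(axis=0) / totals)
    return optima, tolerances

# Hypothetical example: three samples, two taxa, one gradient.
Y = [[0.5, 0.0],
     [0.5, 0.2],
     [0.0, 0.8]]
x = [10.0, 20.0, 30.0]
optima, tolerances = wa_optima_tolerances(Y, x)
```

With these values the first taxon has an optimum of 15 and a tolerance of 5, and the second an optimum of 28 and a tolerance of 4.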

It has been shown that WA regression should produce unbiased estimates of species optima if the samples are equally spaced over the environmental gradients of interest and if they are closely spaced with respect to species tolerances (ter Braak and Looman, 1986 cited in Marchetto, 1994). At the ends of the gradients, however, these assumptions are not met: the frequency distributions of taxa are truncated, so optima are overestimated at the lower end of the gradient and underestimated at the higher end (Marchetto, 1994). As the estimated optima are biased towards the centre of the gradient, the range of estimated optima is also shrunk. Shrinkage occurs because averages are taken twice (Birks et al., 1990). In WA, shrinkage is compensated for by linear re-scaling of species optima. Marchetto (1994) recommends re-scaling the optima in order to facilitate the comparison of values derived from different data-sets (assuming taxonomic consistency is assured). This is done automatically by the program CALIBRATE.

As the WA of a taxon is effectively determined by the sample in which it occurs with the largest abundance, it is biased towards this sample. It is thus important to consider the effective number of occurrences of the taxa which actually influence the weighted-average estimates when comparing tolerances (Birks, 1995). An adequate measure of the effective number of occurrences of a taxon is N2 (Hill, 1979 cited in Birks, 1995). For quantitative data such as diatom abundances, N2 lies between 1 and the actual number of occurrences. WA tolerances can be corrected for bias by dividing $t_k$ by $(1 - 1/N_2)^{1/2}$ (Line et al., 1994). The tolerances given by the computer program CALIBRATE are adjusted for the effective number of occurrences in this way. Estimates of optima and tolerances for a particular environmental variable are more reliable for taxa with high N2 (>3, in Reed, 1998).
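The effective number of occurrences and the tolerance correction can be sketched as follows. The sketch assumes the usual form of Hill's N2, the reciprocal of the sum of squared proportions of a taxon's abundance across samples; the function names are hypothetical and this is not the CALIBRATE code.

```python
import numpy as np

def hill_n2(y):
    """Effective number of occurrences of one taxon (Hill's N2).

    y : abundances of the taxon in every sample (zeros allowed).
    Assumption: N2 = 1 / sum(p_i^2), with p_i = y_i / sum(y).
    """
    y = np.asarray(y, dtype=float)
    p = y / y.sum()
    return 1.0 / np.sum(p ** 2)

def corrected_tolerance(t, n2):
    """Divide a WA tolerance by (1 - 1/N2)^(1/2).

    The correction is undefined when N2 = 1 (a taxon effectively found
    in a single sample), so NaN is returned in that case.
    """
    if n2 <= 1.0:
        return float("nan")
    return t / np.sqrt(1.0 - 1.0 / n2)

# Equal abundance in four of five samples: N2 equals the occurrence count.
n2_even = hill_n2([0.25, 0.25, 0.25, 0.25, 0.0])
# All abundance in one sample: N2 falls to its minimum of 1.
n2_single = hill_n2([1.0, 0.0, 0.0])
t_adj = corrected_tolerance(np.sqrt(3.0), n2_even)
```

The two toy cases show the bounds stated above: N2 equals the number of occurrences when abundances are even, and approaches 1 when a single sample dominates.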

Although the WA optima and tolerances calculated for each taxon are a function of the water samples selected, they still provide quantitative measures that are more meaningful than the often contradictory ecological information (such as pH or nutrient categories) available in the literature (Christie and Smol, 1993).

3.1.2.2. Taxon response models

To define the responses of specific diatom taxa to the main environmental variables more accurately, simple mathematical models that can describe the observed patterns are needed. For the purpose of species response analysis, a series of hierarchical response models was used. These models are described in Huisman et al. (1993). The analysis was implemented using the computer program HOF (J. Oksanen, unpublished program).

Using HOF, the simplest statistically significant response model for each taxon was found by fitting the most complex model first and progressively removing parameters from the regression models. This was done until the model could not be simplified without a significant change in the deviance of the model (Lotter et al., 1997). Taxon response models were fitted by GLM using logistic regression.

HOF initially fits the most complex model, a skewed unimodal response model (model V). It then drops a parameter to fit a symmetric unimodal response model (model IV), and drops a further parameter to fit a monotonic model with a plateau (model III) or a monotonic sigmoidal increasing or decreasing model (model II). Finally, HOF fits the flat null model (model I), in which there is no statistically significant trend in the abundance values.
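To make the hierarchy concrete, the sketch below evaluates the five response shapes in the logistic form in which the Huisman et al. (1993) models are commonly written. The parameter names `a`, `b`, `c`, `d` and the maximum attainable value `M` are assumptions of this example, as is the function name; it only evaluates the curves and does not reproduce the fitting or the deviance-based model selection performed by the HOF program.

```python
import numpy as np

def hof_response(x, model, a=0.0, b=0.0, c=0.0, d=0.0, M=1.0):
    """Expected abundance under one of the five hierarchical models.

    Assumed parameterisation (each model drops a parameter from the next):
      V   skewed unimodal:     M / ((1 + e^(a+bx)) (1 + e^(c-dx)))
      IV  symmetric unimodal:  M / ((1 + e^(a+bx)) (1 + e^(c-bx)))
      III monotonic + plateau: M / ((1 + e^(a+bx)) (1 + e^c))
      II  monotonic sigmoid:   M / (1 + e^(a+bx))
      I   flat (null):         M / (1 + e^a)
    """
    x = np.asarray(x, dtype=float)
    if model == "V":
        return M / ((1 + np.exp(a + b * x)) * (1 + np.exp(c - d * x)))
    if model == "IV":
        return M / ((1 + np.exp(a + b * x)) * (1 + np.exp(c - b * x)))
    if model == "III":
        return M / ((1 + np.exp(a + b * x)) * (1 + np.exp(c)))
    if model == "II":
        return M / (1 + np.exp(a + b * x))
    if model == "I":
        return np.full_like(x, M / (1 + np.exp(a)))
    raise ValueError(f"unknown model: {model}")

# Hypothetical gradient centred on zero, with arbitrary parameter values.
grid = np.linspace(-2.0, 2.0, 9)
flat = hof_response(grid, "I", a=0.0)
sigmoid = hof_response(grid, "II", a=0.0, b=2.0)
symmetric = hof_response(grid, "IV", a=1.0, b=2.0, c=1.0)
```

In this parameterisation, setting `d = b` in model V gives model IV, removing the `x` term from the second factor gives model III, and so on down to the flat model, which is what allows the models to be compared by successive removal of parameters.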


In document Diatom assemblages and water chemistry of lakes in the French Massif Central: A methodology for reconstruction of past limnological and climate fluctuations during the Eemian period (Page 183-187)