CHAPTER 2: TRAIT BASED CLASSIFICATION AND
2.3 METHODS
2.3.1 CHOOSING TRAITS
The first step in generating tailored functional effects groups is to define the functions of interest and select a group of functional traits that drive or determine them (Hillebrand & Matthiessen 2009). Generally, if there is a theoretical or empirical link between a trait and the function to be measured, then the trait should be included. For example, traits relating to leaf nutrient concentration or toughness can be directly linked to decomposability (Garnier et al. 2004; McLaren & Turkington 2010). Some correlation between traits may be unavoidable, but traits that describe almost identical characteristics should not be combined: specific leaf area (SLA) and leaf dry matter content (LDMC), for example, are so closely related that only one should be included. Other correlations are frequent, such as those between leaf nitrogen content (LNC), SLA and photosynthetic rate. Although the links of these traits with different processes are likely to overlap, between them they describe a large range of processes and do not directly measure the same aspect of function (Table 2.1).
Table 2.1: Correlations of traits using Pearson's product-moment correlation coefficient. AGB = aboveground biomass, BGB = belowground biomass, SLA = specific leaf area, LNC = leaf nitrogen content, Gs = stomatal conductance, Psyn = photosynthetic rate.

        BGB    AGB    SLA    LNC    Gs
AGB     0.42
SLA     0.41   0.14
LNC     0.24   0.13   0.29
Gs      0.11   0.09   0.03   0.07
Psyn    0.03   0.4    0.05   0.12   0.56
While I recommend keeping the number of traits as low as possible, Hector and Bagchi (2007) present evidence that the more functions there are to be explained, the more species (and, by extension, the more functional diversity) are needed. If traits are given equal weighting in cluster analysis (see below), this will bias the groupings towards the correlated traits, so I recommend testing for correlations before commencing the cluster analysis; an r value of over 0.7 is generally considered to indicate excessive correlation. Trait choice will depend upon community type and the functions in question, but I suggest that the traits chosen are practical to measure and have strong support in the literature for their ability to describe the relevant functions (Westoby 1998; Wright et al. 2006; Kraft 2008). Mixtures of physiological and morphological traits have been recommended in order to represent a full spectrum of growth forms and resource capture strategies, which is useful as long as the traits are strongly supported by evidence (Díaz et al. 1999a; 2004). Traits used can be ordinal or continuous. A discontinuous trait can be used if it can be assigned a logical rank order (e.g. from annual to perennial, ranked 1-3). If it cannot, I recommend avoiding it, as unranked data are not well suited to cluster analysis (see below). However, if such traits are instrumental in describing the functions required, Jongman et al. (1995) suggest treating them as nominal data.
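The correlation screen recommended above can be sketched in a few lines. The example below is a minimal Python illustration (the analyses in this chapter were run in S-PLUS and R); the trait values and the helper name flag_correlated_traits are hypothetical and are not taken from Table 2.1.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_correlated_traits(traits, threshold=0.7):
    """Return (trait_a, trait_b, r) for every trait pair with |r| > threshold."""
    flagged = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson_r(traits[a], traits[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical species-mean trait values, one entry per species.
traits = {
    "SLA": [22.0, 18.5, 30.1, 12.4, 25.3, 15.0],
    "LDMC": [210.0, 260.0, 150.0, 340.0, 190.0, 300.0],
    "LNC": [2.1, 2.0, 2.6, 2.4, 1.8, 2.7],
}
flagged = flag_correlated_traits(traits)
print(flagged)
```

With these made-up values only the SLA and LDMC pair is flagged, so one of the two would be dropped before clustering. The 0.7 threshold is the rule of thumb quoted in the text, exposed here as a parameter.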
There are several published examples of systematic evaluations of trait and process linkages, which are good starting points for trait selection (Klumpp & Soussana, 2009; de Bello et al. 2010). Failure to identify groups closely linked to function is likely to result in a weak capacity to explain ecosystem processes. The functional diversity (FD) work presented by Petchey and Gaston (2002) indicates that if processes are measured with no consideration of the trait variation of species present, they are often poorly explained, and the authors question whether a single functional classification scheme can describe a wide range of functions. Creating many different functional effects classifications to describe different responses is likely to be unfeasible in a field experiment (although not in model simulations), so using a set of the most comprehensive traits possible is likely to be the most practical alternative.
2.3.2 DEFINING THE SPECIES POOL
The next step in the process is to delimit the species pool from which the groups are drawn. Deciding which species to include can be problematic, particularly if the study system is open to invasion. It is further complicated by the fact that the majority of field experiments using these methods will undergo successional changes. The simplest and most cost-effective method is to begin with all the species present in the field site at the beginning of the study, and to screen and add invaders to the groups as and when they appear, on a similarity basis. In cases where communities are artificially assembled it may be sensible to use species that associate frequently and that are typical of the site's environmental conditions, e.g. the plant species found in a class of the UK's National Vegetation Classification (NVC) (Rodwell 1992).
2.3.3 OBTAINING A TRAITS DATABASE
Trait data can be obtained from database or literature sources (Fitter & Peat 1994; Kleyer et al. 2008; Royal Botanic Gardens, Kew 2008; Kattge et al. 2010; USDA 2010b), or measured directly. Database trait values can be very useful as they are drawn from several studies, and save a lot of time and expense (Lavorel et al. 2008). However, there are limitations to using such values, particularly the lack of standardisation. A number of standardised protocols have been proposed with the aim of making published trait data largely comparable (Grime et al. 1997; Cornelissen et al. 2003), but standardisation is still generally lacking in plant trait measurement studies, which differ in growth conditions and substrate, season of measurement, seed source and trait measurement protocols. In addition, not all species are represented in the databases. A second problem is that a single trait value does not represent the full intra-specific range of a species, or its expression in field conditions (Albert et al. 2010). These problems can be addressed by measuring as many replicates of each species as possible and averaging across them. Using local seeds and soil, and keeping greenhouse conditions as similar to the site as possible, could go some way to alleviating criticism. If a single set of trait data taken ex situ is used, trait-based field experiments should be confined to one site to reduce intra-specific trait variation. By the same token, attempting to capture the whole range of a species' trait distribution in a greenhouse could lead to 'noisy' data and a lack of fit to functions in the field (Díaz et al. 1999b).
2.3.4 USING DIVISIVE HIERARCHICAL CLUSTER ANALYSIS TO CREATE FUNCTIONAL EFFECTS GROUPS
Once a list of trait values for all species has been established, the next step is classification into functional groups, to create groups of species that have similar effects on the functions of interest. An efficient means of doing this is to use cluster analysis (Jongman et al. 1995; Shaw 2003), which allows clusters of similar values to be identified within multivariate datasets. Of the various techniques available I recommend divisive hierarchical cluster analysis, as this allows a pre-determined number of groupings to be derived. This is desirable for many experiments, as an unrestricted number of groupings could result in uninformative (if low) or unmanageable (if high) numbers of functional groups. Cluster analysis can create a dendrogram (Fig. 1), which provides the added benefit of visualising the relationships between species. If the assumptions about the relationship between traits and function are correct, and trait values are representative of those expressed by species at the study site, the groups derived from the analysis should have more discrete and predictable effects on ecosystem function than random species assemblages.
In order to create groups, trait means for each species should be calculated and the cluster analysis carried out using S-PLUS 6.0 (Insightful, Gothenburg, Sweden) or the freeware R 2.12.0 (R Core Development Team 2009). Categorical variables should be treated as ranked factors. Weighting of traits is somewhat contentious, and I do not recommend it unless there is strong justification for it. For example, if responses concerned with nitrogen cycling are the focus of the experiment, traits such as leaf nitrogen content or nitrogen fixation capacity could be double weighted (Petchey & Gaston 2002; Roscher et al. 2004). This could, however, lead to heavily biased groups, particularly since legumes do not always fix nitrogen, for example in immature or water-stressed communities (Serraj et al. 1999). As clustering is based upon dissimilarity matrices, an appropriate distance measure must be chosen. The simplest (and the default in R) is Euclidean distance, which calculates the straight-line distance between every pair of species in Cartesian trait space (one dimension per trait) and creates a matrix of these distances (McCune & Mefford 1999). Euclidean distance is commonly used, although it emphasises outliers and is sensitive to measurement scale, so the data must first be standardised to a mean of 0 and a variance of 1. Most other distance measures have fairly rigid requirements, and compute dissimilarity in less intuitive ways. For example, Sørensen (Bray-Curtis) dissimilarity gives less weight to outliers, but it is recommended for ecological community data: it gives proportions based on the overlap of two communities, so it is better suited to analyses of community turnover than to trait data.
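To make the standardisation and distance steps concrete, the following minimal Python sketch rescales each trait to a mean of 0 and a variance of 1 and then builds the species-by-species Euclidean distance matrix. It is an illustration under stated assumptions, not the S-PLUS or R routine used here: the species names, trait values and function names are hypothetical, and the sample variance (n - 1 denominator) is assumed.

```python
from math import sqrt


def standardise(values):
    """Rescale one trait to mean 0 and sample variance 1 (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def euclidean_matrix(trait_table):
    """trait_table maps species -> list of trait means (same trait order).

    Returns the sorted species names and a square matrix of Euclidean
    distances computed on the standardised traits.
    """
    species = sorted(trait_table)
    columns = list(zip(*(trait_table[s] for s in species)))  # one tuple per trait
    z_columns = [standardise(list(col)) for col in columns]
    z_rows = list(zip(*z_columns))                           # one tuple per species
    dist = [
        [sqrt(sum((a - b) ** 2 for a, b in zip(row_i, row_j))) for row_j in z_rows]
        for row_i in z_rows
    ]
    return species, dist


# Hypothetical trait means: [SLA, LNC, belowground biomass].
trait_table = {
    "sp_A": [22.0, 2.1, 0.35],
    "sp_B": [18.5, 2.0, 0.80],
    "sp_C": [30.1, 2.6, 0.20],
    "sp_D": [12.4, 2.4, 1.10],
}
species, dist = euclidean_matrix(trait_table)
for name, row in zip(species, dist):
    print(name, [round(d, 2) for d in row])
```

Standardising before computing distances stops a trait measured on a large numerical scale from dominating the matrix purely because of its units.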
S-PLUS requires the user to set the number of groups before clustering begins, whereas in R the grouping is carried out post hoc. The number of groups chosen has a large bearing on the type of study that can be carried out and on the model system.
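The idea of dividing the species pool until a pre-determined number of groups is reached can be illustrated with a simplified, DIANA-style splitting rule. The Python sketch below is an assumption-laden illustration, not necessarily the algorithm implemented in S-PLUS or R: it repeatedly splits the cluster with the largest diameter, and the function names and the toy one-trait data are hypothetical.

```python
def mean_dist(i, others, dist):
    """Average dissimilarity between object i and the objects in `others`."""
    return sum(dist[i][j] for j in others) / len(others)


def split_cluster(cluster, dist):
    """Split one cluster (two or more objects) into a remainder and a splinter group."""
    rest = list(cluster)
    # Seed the splinter group with the object most dissimilar to the others.
    seed = max(rest, key=lambda i: mean_dist(i, [j for j in rest if j != i], dist))
    rest.remove(seed)
    splinter = [seed]
    while len(rest) > 1:
        # A positive gain means the object is, on average, closer to the splinter group.
        gains = {
            i: mean_dist(i, [j for j in rest if j != i], dist) - mean_dist(i, splinter, dist)
            for i in rest
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return rest, splinter


def diameter(cluster, dist):
    """Largest pairwise dissimilarity within a cluster."""
    return max(dist[i][j] for i in cluster for j in cluster)


def divisive_clusters(dist, k):
    """Divide objects 0..n-1 into k groups, always splitting the widest cluster."""
    clusters = [list(range(len(dist)))]
    while len(clusters) < k:
        widest = max(clusters, key=lambda c: diameter(c, dist))
        if len(widest) < 2:
            break  # nothing left to split
        clusters.remove(widest)
        clusters.extend(split_cluster(widest, dist))
    return sorted(sorted(c) for c in clusters)


# Toy example: six hypothetical species described by one standardised trait.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
dist = [[abs(a - b) for b in values] for a in values]
groups = divisive_clusters(dist, k=3)
print(groups)
```

In practice the distance matrix would come from the standardised multi-trait data rather than a single trait, and k would be set by the experimental design.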
2.3.5 ESTABLISHING THE GROUPS IN THE FIELD
The optimal method for generating an experimental functional diversity gradient from the groups identified (removing species from an existing community or establishing an artificially assembled one) will depend upon the type of species in question and the functions to be studied. If disturbance strongly affects the measured functions (e.g. intensive weeding can dry and warm the soil, altering microbial activity), then it may be best to assemble species. Clearly, it is not feasible to assemble late-successional ecosystems composed of species with long generation times, so here species removals or simulations (e.g. Bunker et al. 2005) are the only options. Weeding to establish functional groups should cause a directional change in community level trait values and their distribution, although shifts in trait distribution can only be expected for traits that show significant differences between groups. In other cases niche space may be made available by the removal or exclusion of a functional group, and the remaining species that are most similar to those removed are likely to utilise this space and increase disproportionately. Where this occurs, the observed change in trait values following functional group exclusion may be less than expected. When trait data are combined with a survey of species abundances in experimental units, a number of metrics, including functional diversity (Petchey & Gaston 2002), community weighted means (Garnier et al. 2004; Lavorel et al. 2008) and dissimilarity measures (Hillebrand & Matthiessen 2009), can be calculated and used to estimate the impact of the functional group manipulation on community level functional properties. These can also be used as explanatory variables in statistical models describing function.
2.3.6 ASSIGNING NEW SPECIES TO EXISTING FUNCTIONAL GROUPS
In open ecosystems new species may colonise or emerge from stasis (e.g. from the seedbank). These may not have been considered when originally delimiting the species pool of the site, and their absence from the functional dendrogram means that their functional group assignment is unknown. There are two options in this situation. The first is to remove the colonising species. This is simple and may be desired to avoid the emergence of entirely new functional properties (e.g. the entry of an N fixer into a system that did not contain them), but risks reducing ecological realism; the species may be a potential new dominant, for example. The other option is to obtain trait data for the new species and allocate it to the most appropriate of the existing functional groups. This a posteriori integration into groups can be achieved by using dissimilarity indices to add the new species to the dendrogram. Dissimilarity values follow the same principles as a cluster analysis; again Euclidean distance measures are recommended to arrange the data in multidimensional space.
To assign new species to functional groups, calculate the mean trait values of each of the three functional groups and compare them with the mean trait values of the new species. This method requires species to be added sequentially and, because the mean trait values of a group are altered with each addition, it is important to add new species in order of their abundance in the field, in case the new species have outlying traits which would alter group means. Dissimilarity indices can be calculated using R 2.12.0, choosing Euclidean distances and standardising the data as before. Categorical variables should be converted to a continuous format by averaging values across species and labelling them as numeric, not factor, values. Each new species is then assigned to the group with the lowest dissimilarity value, and the trait means of that group are recalculated before the next species is added.
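The sequential assignment described above can be sketched as follows. This is a minimal Python illustration rather than the R procedure used here; it assumes that the trait values of both resident and new species have already been standardised on a common scale, and all group names, species names and values are hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length trait vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_mean(members):
    """Mean trait vector of a group given as species -> trait list."""
    rows = list(members.values())
    return [sum(col) / len(col) for col in zip(*rows)]


def assign_new_species(groups, newcomers):
    """Assign each newcomer to the group with the nearest mean trait vector.

    groups: group name -> {species: standardised traits}; updated in place.
    newcomers: list of (species, traits), ordered by decreasing field abundance.
    """
    assignment = {}
    for name, traits in newcomers:
        # Group means are recomputed each time, so earlier additions count.
        means = {g: group_mean(members) for g, members in groups.items()}
        nearest = min(means, key=lambda g: euclidean(traits, means[g]))
        groups[nearest][name] = traits
        assignment[name] = nearest
    return assignment


# Hypothetical standardised traits for three functional groups (FG1 to FG3).
groups = {
    "FG1": {"sp_a": [-1.0, -1.0], "sp_b": [-0.8, -1.2]},
    "FG2": {"sp_c": [0.0, 0.1], "sp_d": [0.2, -0.1]},
    "FG3": {"sp_e": [1.1, 1.0], "sp_f": [0.9, 1.2]},
}
newcomers = [("new_1", [1.0, 0.9]), ("new_2", [-0.9, -0.7])]  # most abundant first
assignment = assign_new_species(groups, newcomers)
print(assignment)
```

Because each group mean is recalculated after every addition, the order of the newcomers list matters, which is why the text recommends ordering by field abundance.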
2.3.7 VALIDATION OF THE FUNCTIONAL EFFECTS GROUPS
An important criticism of hierarchical cluster analyses is that there is no measure of whether the groups identified are the most effective combination to explain function, or whether they are statistically different from one another. To address this I recommend using linear discriminant analysis (LDA) in the MASS package of R 2.12.0. LDA is an a posteriori method of verifying that species have been allocated to the most appropriate group. LDA tests the within-group covariance matrix of standardised traits, and generates a probability of each species belonging to each group, i.e. a percentage expressing the similarity of the species to each of the groups. The group with the highest probability is taken to be the group to which the species belongs. If the analysis finds that almost every species is appropriately categorised, this is strong justification for the groupings: a high percentage of correctly allocated species in the LDA indicates that the functional groups are as discrete as possible. If the LDA suggests that some species are misclassified, close inspection is needed. It is possible that such a species was classified on the basis of a single trait. It is at the user's discretion to decide whether this trait is particularly important to their functions of interest. If not, the analysis could be repeated with the trait removed and the results re-evaluated.
Before hypotheses about the effects of functional group removal can be formulated it is important to check that there are quantifiable differences between the trait means of the groups. If there are no clear differences between the groups, this suggests that there is too much functional divergence, or there are too many groups, and the outcome of manipulating these groups in a system could be confounding or inconclusive. Simple analyses such as one-way ANOVAs and post-hoc Tukey's HSD tests allow evaluation of differences in trait means across functional groupings.
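As an illustration of the first of these checks, the Python sketch below computes the one-way ANOVA F statistic for a single trait across functional groups. It is a minimal sketch with hypothetical trait values; it stops at the F statistic and its degrees of freedom, since the p-value and the Tukey comparisons require the F and studentised range distributions, which a statistical package (for example the aov and TukeyHSD functions in R) provides.

```python
def one_way_anova_f(groups):
    """groups: group name -> list of species trait values for one trait.

    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {g: sum(values) / len(values) for g, values in groups.items()}
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(values) * (means[g] - grand_mean) ** 2 for g, values in groups.items()
    )
    # Variation of species around their own group mean.
    ss_within = sum(
        (v - means[g]) ** 2 for g, values in groups.items() for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one trait for the species in three functional groups.
trait_by_group = {
    "FG1": [1.0, 2.0, 3.0],
    "FG2": [2.0, 3.0, 4.0],
    "FG3": [5.0, 6.0, 7.0],
}
f_stat, df_between, df_within = one_way_anova_f(trait_by_group)
print(f_stat, df_between, df_within)
```

A large F relative to its degrees of freedom indicates that between-group differences in the trait are large compared with the variation among species within groups.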
2.3.8 GENERATING HYPOTHESES ABOUT FUNCTIONAL GROUP EXCLUSION
Identification of functional group differences, coupled with hypothesised trait-function relationships, enables predictions about the consequences of functional group removal. Despite these hypotheses, it is possible that the removal of a functional group will not change functional properties, for example if the group removed was rare, or if trait expression in the remaining species shifted to encompass the trait identity of the lost group (Walker et al. 1999). To measure the trait distribution and means across treatments in the field, I suggest using community weighted mean (CWM) and functional divergence (FDvar) measures (Mason et al. 2003; Garnier et al. 2004). These both weight traits by abundances in the field, and give a weighted mean trait value and a trait variation value, respectively. Once intergroup variability has been established, these metrics can then be applied to ecosystem process measures. FDvar is particularly useful because increasing functional diversity is expected to lead to higher FDvar, which could imply niche complementarity if it is correlated with higher levels of function.
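Both metrics can be computed directly from a plot's species abundances and trait values. The Python sketch below is a minimal illustration: CWM is the abundance-weighted mean of the trait, and FDvar is taken as (2/pi) * arctan(5V), where V is the abundance-weighted variance of the log-transformed trait, which is my reading of the index of Mason et al. (2003) and should be checked against that paper before use. The species names and values are hypothetical.

```python
from math import atan, log, pi


def relative_abundances(abundances):
    """Convert raw abundances (species -> cover or count) to proportions."""
    total = sum(abundances.values())
    return {sp: a / total for sp, a in abundances.items()}


def community_weighted_mean(abundances, trait):
    """Abundance-weighted mean trait value of one community."""
    w = relative_abundances(abundances)
    return sum(w[sp] * trait[sp] for sp in w)


def fd_var(abundances, trait):
    """FDvar = (2/pi) * arctan(5V); trait values must be greater than zero."""
    w = relative_abundances(abundances)
    mean_ln = sum(w[sp] * log(trait[sp]) for sp in w)
    v = sum(w[sp] * (log(trait[sp]) - mean_ln) ** 2 for sp in w)
    return (2 / pi) * atan(5 * v)


# Hypothetical plot: percentage cover and SLA for three species.
cover = {"sp_a": 60.0, "sp_b": 30.0, "sp_c": 10.0}
sla = {"sp_a": 10.0, "sp_b": 20.0, "sp_c": 40.0}

cwm = community_weighted_mean(cover, sla)
fdvar = fd_var(cover, sla)
print(round(cwm, 2), round(fdvar, 3))
```

FDvar computed this way is bounded between 0 and 1, and is 0 for a community in which all of the abundance belongs to species sharing one trait value.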