Chapter 2 Materials and Methods
2.7 Quantitative analysis in PRIMER
The software package PRIMER (version 6.1.12; Primer-E Ltd.) with the add-on PERMANOVA+ (Primer-E Ltd, Plymouth, UK; Anderson et al., 2008) was used to investigate variability in the multivariate ARISA and 454 sequencing data, as well as in univariate diversity measures. The package includes a wide range of ecological indices and is frequently used by microbial ecologists (Pharo et al., 2005, Bodelier et al., 2009, Anderson et al., 2011, Narasingarao et al., 2011, Shi et al., 2011).
2.7.1 Multi-Dimensional Scaling (MDS plots)
Multi-dimensional scaling (MDS) plots are an informative way to display multivariate data, and I use them in several chapters to present multivariate ARISA data. Unless otherwise stated, all MDS plots in this thesis are based on Bray-Curtis similarity (Equation 1).
$$S_{12} = 100\left(1 - \frac{\sum_{i=1}^{p} \lvert y_{i1} - y_{i2} \rvert}{\sum_{i=1}^{p} \left(y_{i1} + y_{i2}\right)}\right)$$
Equation 1 The Bray-Curtis similarity measure used throughout this thesis, whereby $y_{i1}$ is the standardised peak size of taxon i (or the number of sequences generated from 454 pyrosequencing) in sample 1, $y_{i2}$ is the standardised peak size of taxon i in sample 2, and the sums run over all p taxa (Anderson et al., 2008). This equation was used to calculate the Bray-Curtis similarity between all pairs of samples within a given data-set.
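Equation 1 translates directly into code. The following is a minimal Python sketch, not PRIMER's implementation: it assumes each sample is a list of standardised abundances with the taxa in the same order, and the function names are illustrative only.

```python
from itertools import combinations


def bray_curtis_similarity(y1, y2):
    """Bray-Curtis similarity (0-100) between two abundance vectors (Equation 1).

    y1[i] and y2[i] are the standardised abundances of taxon i in
    samples 1 and 2 (e.g. relative ARISA peak sizes or read counts).
    """
    if len(y1) != len(y2):
        raise ValueError("samples must list the same taxa in the same order")
    total = sum(a + b for a, b in zip(y1, y2))
    if total == 0:
        raise ValueError("similarity is undefined for two empty samples")
    diff = sum(abs(a - b) for a, b in zip(y1, y2))
    return 100.0 * (1.0 - diff / total)


def similarity_matrix(samples):
    """Bray-Curtis similarity for every pair in a dict {sample name: abundances}."""
    return {
        (a, b): bray_curtis_similarity(samples[a], samples[b])
        for a, b in combinations(list(samples), 2)
    }


if __name__ == "__main__":
    s1 = [50, 30, 20, 0]
    s2 = [40, 40, 0, 20]
    # |differences| sum to 60 and abundances sum to 200, so S = 100 * (1 - 0.3)
    print(round(bray_curtis_similarity(s1, s2), 6))  # 70.0
```

Identical samples give 100 and samples sharing no taxa give 0, matching the interpretation of the scale given in the text.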
Bray-Curtis similarity is a standard ecological measure of community structure, based on differences in the presence and abundance of taxa between two samples: a value of 0 indicates that the two samples share no taxa, and a value of 100 indicates that they contain the same taxa in identical abundances. Figure 2-7 presents an example of an MDS plot based on a Bray-Curtis similarity matrix. Each shape on the plot represents the ARISA profile of an individual community. The closer two shapes are, the more similar the communities they represent; the further apart they are, the more dissimilar. For example, the communities represented by the blue circles are more similar to those represented by the asterisks than to those represented by the red squares. The groups also differ markedly in heterogeneity, which in this sense refers to the variation of samples around their group centroid: the communities represented by the blue circles are far less heterogeneous than those represented by the green triangles. I designed this MDS plot specifically to show a gradient in heterogeneity, ranging from the blue circles (low heterogeneity) to the green triangles (high heterogeneity). The permutational analysis of multivariate dispersions (PERMDISP) routine within PERMANOVA+ allows differences in heterogeneity to be tested statistically. PERMDISP measures the distance of each sample from its group centroid: a smaller average distance indicates lower heterogeneity, and vice versa. PERMDISP also includes pairwise comparisons of these average distances, thus identifying significant differences in heterogeneity between treatments. This is an important test to carry out, because groups of samples may not differ in their location on an MDS plot (i.e., they overlap in 2D space) yet still differ significantly in heterogeneity.
Figure 2-7 A hypothetical data-set and an example of a two-dimensional MDS plot based on Bray-Curtis similarity. Each shape represents an individual community. The closer two points are on the plot, the more similar those two communities are in terms of Bray-Curtis similarity; the further apart two points are, the more dissimilar the communities. This plot shows four distinct clusters (represented by the triangles, squares, circles and asterisks), which could represent samples taken from four very different environments. The clusters differ in heterogeneity (variation around a centroid). For example, the green triangles exhibit higher dispersion than the blue circles, and the communities represented by the blue circles are more similar to the communities represented by the asterisks.
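The dispersion test described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the PERMANOVA+ implementation: each sample's distance to its group centroid is obtained from Bray-Curtis dissimilarities (1 - S/100) via a standard identity for distances to a centroid, groups are compared with a one-way ANOVA F statistic, and the P value comes from shuffling those distances among groups. The function names, the clamping of small negative values to zero and this particular permutation scheme are assumptions of the sketch.

```python
import random


def bray_curtis_dissimilarity(y1, y2):
    """Bray-Curtis dissimilarity on a 0-1 scale, i.e. 1 - S/100 from Equation 1."""
    total = sum(a + b for a, b in zip(y1, y2))
    return sum(abs(a - b) for a, b in zip(y1, y2)) / total


def distances_to_centroid(samples):
    """Distance of each sample to its group centroid, from dissimilarities alone.

    Uses |x_i - c|^2 = mean_j d_ij^2 - (sum_j sum_k d_jk^2) / (2 n^2).
    """
    n = len(samples)
    d2 = [[bray_curtis_dissimilarity(a, b) ** 2 for b in samples] for a in samples]
    half_mean = sum(map(sum, d2)) / (2 * n * n)
    # Clamp at zero: with a non-Euclidean measure the squared value can be
    # slightly negative (a simplification made for this sketch).
    return [max(sum(row) / n - half_mean, 0.0) ** 0.5 for row in d2]


def anova_f(groups):
    """Classical one-way ANOVA F statistic for a list of groups of numbers."""
    values = [v for g in groups for v in g]
    n, a = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permdisp(groups, permutations=9999, seed=1):
    """groups: list of groups, each a list of abundance vectors.

    Returns (F, P, mean distance to centroid for each group).
    """
    z = [distances_to_centroid(g) for g in groups]
    f_obs = anova_f(z)
    sizes = [len(g) for g in z]
    pooled = [v for g in z for v in g]
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled distances into groups of the original sizes.
        perm, start = [], 0
        for size in sizes:
            perm.append(pooled[start:start + size])
            start += size
        if anova_f(perm) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1), [sum(g) / len(g) for g in z]


if __name__ == "__main__":
    tight = [[10, 10, 10], [11, 9, 10], [10, 11, 9], [9, 10, 11]]  # hypothetical data
    spread = [[20, 0, 10], [0, 20, 10], [10, 0, 20], [5, 25, 0]]
    f, p, means = permdisp([tight, spread], permutations=999)
    print(f, p, means)
```

A larger mean distance to centroid corresponds to the greater scatter of the green triangles in Figure 2-7; a tightly clustered group such as the blue circles would give a small one.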
Additional experiments were carried out to test the reproducibility, in terms of Bray-Curtis similarity, of ARISA PCRs performed on the same DNA extract. This was done for four bacterial communities, with each PCR replicated four times (technical replicates). Technical replicates showed high Bray-Curtis similarity to one another (average Bray-Curtis similarity of 90%). In Chapter Three, I used Bray-Curtis data to generate heat maps of community similarity using ArcMap 10 (ArcGIS, Wellington, NZ) software. Here, the Bray-Curtis data were combined with easting and northing coordinates to generate visual heat maps, whereby areas shaded in similar colours had more similar communities in terms of Bray-Curtis similarity. See Bellamy (2013) and Lear et al. (2014) for in-depth descriptions of the procedure.
2.7.2 PERMANOVA
Permutational multivariate analysis of variance (PERMANOVA) is a routine for testing the simultaneous response of one or more variables to one or more factors in an analysis of variance (ANOVA) experimental design using permutation methods (Anderson et al., 2008). PERMANOVA is therefore a useful and robust statistical tool for investigating multivariate data, and it can assess whether differences observed between treatments, or between different environmental samples, are statistically significant (Lear et al., 2009, Krause et al., 2012, Reith et al., 2012, Rivas-Ubach et al., 2012). I used PERMANOVA in all of my chapters to investigate the effects of numerous experimental factors on bacterial community structure. Unless otherwise stated, all PERMANOVAs were carried out on Bray-Curtis similarity matrices (Equation 1) using Type III sums of squares (to account for potentially unbalanced designs, for example where DNA extractions failed) and 9999 permutations of residuals under a reduced model (this method empirically gives the best power and the most accurate Type I error for complex experimental designs) (Anderson et al., 2008, O'Donnell et al., 2009). In addition to the main PERMANOVA tests, contrasts were used to compare one or more groups of samples against one or more other groups, as appropriate to the different datasets. The PERMANOVA routine was also used to analyse univariate data, such as the Gini coefficient (see section 2.8) and taxon richness. In this case, the analysis is termed a ‘permutational ANOVA’ and was carried out using ‘unrestricted permutation of raw data’ with 9999 permutations, as recommended for univariate data (Anderson et al., 2008). This approach was chosen because it makes the univariate and multivariate results comparable, it allows two-factor designs, and it can be applied to unbalanced datasets (Scyphers et al., 2011). No major differences were observed between the permutational ANOVA approach and conventional ANOVA in R: although small differences in P and F values were identified, the overall significance of the relationship was always the same.
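For a single factor, the PERMANOVA pseudo-F statistic can be computed directly from a distance matrix, which makes the logic of the routine easy to see. The sketch below is a one-way illustration only: it does not implement Type III sums of squares, multi-factor designs or contrasts, and the function names are illustrative. For one factor, permuting raw data and permuting residuals under the reduced model are equivalent, so shuffling the group labels suffices here. With Euclidean distance on a single variable, the pseudo-F is identical to the classical one-way ANOVA F; only the way the P value is obtained differs.

```python
import random


def bray_curtis_dissimilarity(y1, y2):
    """Bray-Curtis dissimilarity (100 - S from Equation 1)."""
    total = sum(a + b for a, b in zip(y1, y2))
    return 100.0 * sum(abs(a - b) for a, b in zip(y1, y2)) / total


def pseudo_f(d, labels):
    """One-way PERMANOVA pseudo-F from a full symmetric distance matrix d."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared inter-point distances divided by N.
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pairs = sum(d[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, labels, permutations=9999, seed=1):
    """Pseudo-F and permutation P value (unrestricted permutation of labels)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, labels)
    perm = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(perm)
        if pseudo_f(d, perm) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Univariate example: Euclidean distance |x_i - x_j| on one variable.
    x = [1, 2, 3, 4, 5, 6]
    labels = ["A", "A", "A", "B", "B", "B"]
    d = [[abs(a - b) for b in x] for a in x]
    f, p = permanova(d, labels, permutations=999)
    print(round(f, 6))  # 13.5, the same as the classical one-way ANOVA F
    # For ARISA profiles one would instead build
    # d = [[bray_curtis_dissimilarity(a, b) for b in profiles] for a in profiles]
```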
2.7.3 Distance-based linear regression models (DistLM)
To investigate the relative importance of different environmental parameters in explaining bacterial community structure, distance-based linear model (DistLM) analysis was carried out in PRIMER 6. This regression method has been used successfully in numerous ecological studies in both the microscopic and macroscopic realms (Sun et al., 2012, Adair et al., 2013, Mapelli et al., 2013, Vilar et al., 2013). Prior to DistLM analysis, environmental variables were examined using draftsman plots (a simple array of two-variable scatter plots) to check for correlations between variables, which would bias the results. If two variables were found to be correlated, one of them was removed from the analysis. Marginal tests were carried out first to identify which environmental variables individually explained the highest percentage of variation. DistLM was then run using the ‘forward selection’ procedure, as previously described in the literature (Lear et al., 2008, Dell’Anno et al., 2009, Lear et al., 2009, Bissett et al., 2010, De Corte et al., 2011, De Corte et al., 2012), which adds one variable at a time to the model (starting with the most significant variable from the marginal tests) until no further improvement in the selection criterion is possible. The selection criterion ‘R²’ was used, which represents the proportion of variation explained by the model, with significance assessed using 9999 permutations (Lear et al., 2011). Using other selection procedures (such as ‘all specified’, ‘best’ or ‘step-wise’) did not substantially change the overall results of the regression. The R² values generated by DistLM are cumulative: each value reports the variation explained by that variable together with all variables already in the model. When performing DistLM analysis on univariate data, such as the Gini coefficient or taxon richness, Euclidean distance was used instead of Bray-Curtis similarity (Scyphers et al., 2011, Mayer-Pinto et al., 2012).
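The forward-selection procedure described above can be sketched in Python using the usual distance-based regression formulation: the squared distances are Gower-centred into a matrix G, the centred predictors define a projection H, and R² = tr(HGH)/tr(G). At each step the variable giving the largest cumulative R² is added, so the reported values are cumulative as in the text. The stopping rule here (stop when the best candidate adds less than a hypothetical `min_gain`) is an assumption for illustration only; PERMANOVA+ reports sequential (conditional) permutation tests for each step, which this sketch does not reproduce. All names are illustrative.

```python
def gower_centre(d):
    """Gower-centred matrix G from a symmetric distance matrix d."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # row means (equal to column means)
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def orthonormal_basis(columns, tol=1e-10):
    """Centre each predictor and orthonormalise (Gram-Schmidt), dropping collinear ones."""
    basis = []
    for col in columns:
        mean = sum(col) / len(col)
        v = [x - mean for x in col]
        for q in basis:
            dot = sum(x * y for x, y in zip(v, q))
            v = [x - dot * y for x, y in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol:
            basis.append([x / norm for x in v])
    return basis


def r_squared(g, columns):
    """R^2 = tr(HGH) / tr(G), with H the projection onto the centred predictors."""
    n = len(g)
    explained = 0.0
    for q in orthonormal_basis(columns):
        explained += sum(q[i] * g[i][j] * q[j] for i in range(n) for j in range(n))
    return explained / sum(g[i][i] for i in range(n))


def forward_selection(d, predictors, min_gain=0.01):
    """Add predictors one at a time, always choosing the largest cumulative R^2.

    predictors: dict {name: [value per sample]}. min_gain is a hypothetical
    stopping rule used only for this illustration.
    Returns [(variable, cumulative R^2), ...] in the order variables were added.
    """
    g = gower_centre(d)
    chosen, path, current = [], [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {v: r_squared(g, [predictors[c] for c in chosen + [v]]) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        current = scores[best]
        path.append((best, current))
    return path


if __name__ == "__main__":
    # Univariate response (e.g. taxon richness) with Euclidean distances, as in the text.
    richness = [1, 2, 4, 7]
    d = [[abs(a - b) for b in richness] for a in richness]
    env = {"x1": [2, 4, 8, 14], "x2": [3, 1, 4, 1]}  # hypothetical variables
    print(forward_selection(d, env))
```

With Euclidean distances on one variable, G reduces to the outer product of the centred response, so R² here equals the ordinary regression R².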
2.7.4 SIMPER analysis
The similarity percentages (SIMPER) routine was used to determine the percentage contribution of each taxon to the average similarity within groups of samples, or to the average dissimilarity between groups. SIMPER therefore identifies the taxa that contribute most to distinguishing groups of samples, and it has been used in numerous ecological investigations (Dangles et al., 2004, Wolsing & Priemé, 2004, Relva et al., 2010, Lear et al., 2012, Santiago-Rodriguez et al., 2013).
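The between-group part of SIMPER can be sketched as follows, assuming Bray-Curtis dissimilarity. Because Equation 1 is a sum over taxa, each taxon's term can be averaged over all between-group pairs of samples and expressed as a percentage of the average dissimilarity. The within-group (similarity) part and PRIMER's additional output columns are omitted, and the function and variable names are illustrative.

```python
from itertools import product


def simper(group_a, group_b, taxa):
    """Average Bray-Curtis dissimilarity between two groups and each taxon's share of it.

    group_a, group_b: lists of abundance vectors (same taxon order as `taxa`).
    Returns (average dissimilarity, [(taxon, mean contribution, % contribution), ...])
    sorted from the largest contributor down.
    """
    pairs = list(product(group_a, group_b))  # every between-group pair of samples
    contrib = [0.0] * len(taxa)
    for y1, y2 in pairs:
        total = sum(a + b for a, b in zip(y1, y2))
        for i, (a, b) in enumerate(zip(y1, y2)):
            # Taxon i's term of the Bray-Curtis dissimilarity (100 - S).
            contrib[i] += 100.0 * abs(a - b) / total
    mean_contrib = [c / len(pairs) for c in contrib]
    avg_dissim = sum(mean_contrib)
    table = sorted(
        ((t, m, 100.0 * m / avg_dissim) for t, m in zip(taxa, mean_contrib)),
        key=lambda row: row[1],
        reverse=True,
    )
    return avg_dissim, table


if __name__ == "__main__":
    a = [[10, 2, 5], [8, 0, 5]]  # hypothetical ARISA profiles, group A
    b = [[0, 4, 5], [1, 3, 5]]   # group B
    avg, table = simper(a, b, ["OTU1", "OTU2", "OTU3"])
    for taxon, mean, pct in table:
        print(taxon, round(mean, 2), round(pct, 1))
```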
2.7.5 Cluster and SIMPROF analysis
Dendrograms are an informative way of representing the similarity among objects. Cluster dendrograms based on Bray-Curtis similarity matrices were generated in PRIMER 6 using the ‘cluster’ function. In addition, similarity profile (SIMPROF) analysis was used to test for significant clustering among samples. SIMPROF is a permutation test of the null hypothesis that a set of samples, which are not divided into groups a priori, do not differ from each other in community structure (Anderson et al., 2008). The results of the SIMPROF test are displayed directly on the dendrogram, whereby samples connected by a red line cannot be significantly differentiated.
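The clustering step can be illustrated with agglomerative clustering on dissimilarities (100 - S). This sketch assumes group-average (UPGMA) linkage, because the linkage method used is not stated above; SIMPROF's permutation test is not implemented, and the function name is illustrative.

```python
def upgma(d, labels):
    """Group-average (UPGMA) agglomerative clustering on a dissimilarity matrix.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of sample labels.
    """
    def key(a, b):
        return (a, b) if a < b else (b, a)

    clusters = {i: (label,) for i, label in enumerate(labels)}
    sizes = {i: 1 for i in clusters}
    dist = {(i, j): d[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[i], clusters[j], height))
        members = clusters.pop(i) + clusters.pop(j)
        ni, nj = sizes.pop(i), sizes.pop(j)
        # Group-average distance from the new cluster to every remaining cluster.
        for k in clusters:
            dist[key(k, next_id)] = (ni * dist[key(i, k)] + nj * dist[key(j, k)]) / (ni + nj)
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        clusters[next_id] = members
        sizes[next_id] = ni + nj
        next_id += 1
    return merges


if __name__ == "__main__":
    # Hypothetical dissimilarities (100 - Bray-Curtis similarity) for four samples.
    x = [0, 1, 10, 11]
    d = [[abs(a - b) for b in x] for a in x]
    for left, right, height in upgma(d, ["a", "b", "c", "d"]):
        print(left, right, height)
```

The merge heights give the dendrogram's branch points; converting back to similarity (100 - height) gives the axis PRIMER displays.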
2.7.6 RELATE function
I used the RELATE function in PRIMER 6 to compare Bray-Curtis similarity matrices, and thus to determine whether the resemblance matrices generated from 454 pyrosequencing of 16S rRNA genes and from ARISA data were significantly related. RELATE calculates a Mantel-type rank correlation coefficient (here, Spearman's correlation) between the corresponding elements of the two similarity matrices. Consequently, if the among-sample relationships match exactly in both datasets (e.g., samples 1 and 3 are the most similar and samples 9 and 4 are the most dissimilar in both resemblance matrices), the correlation coefficient ρ = 1. If there is no relationship between the two datasets, ρ is approximately 0. Whereas the standard Mantel test mostly tests for linear relationships, rank correlations are more flexible and better suited to the analysis of two resemblance matrices (Anderson et al., 2008). The RELATE function was used with the ‘Spearman’ rank correlation method and 9999 permutations. Statistical significance was determined from the ‘Significance level of sample statistic’, with a significance level of less than 5% taken to indicate a statistically significant relationship.
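RELATE's core calculation can be sketched as follows: Spearman's ρ between the matching off-diagonal elements of two resemblance matrices, with a P value obtained by randomly relabelling the samples of one matrix (rows and columns together). The one-sided tail (counting permutations whose ρ is at least as large as the observed value) and the function names are assumptions of this sketch, not a description of PRIMER's internals.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def relate(s1, s2, permutations=9999, seed=1):
    """rho between two full symmetric resemblance matrices, plus a permutation P value."""
    n = len(s1)
    pairs = list(combinations(range(n), 2))  # off-diagonal elements only
    v1 = [s1[i][j] for i, j in pairs]
    rho = spearman(v1, [s2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Relabel the samples of the second matrix (rows and columns together).
        v2 = [s2[order[i]][order[j]] for i, j in pairs]
        if spearman(v1, v2) >= rho:
            hits += 1
    return rho, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical matrices whose elements are monotonically related.
    x = [0, 1, 3, 7, 15]
    s1 = [[abs(a - b) for b in x] for a in x]
    s2 = [[v ** 2 for v in row] for row in s1]
    print(relate(s1, s2, permutations=999))
```

Because only ranks enter the statistic, any monotonic transformation of one matrix (as in the example) leaves ρ unchanged, which is the flexibility the text attributes to rank correlations.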