4.2.1. Site locations and data acquisition
Site locations, data acquisition and modelling methodologies are described in Chapter 2.
This section discusses species turnover, community composition and environmental variables, as summarised in Table 4.1.
Table 4.1. Key modelling terminology
4.2.2. Statistical analysis
Following the methodology developed by Dray et al. (2006), raw ELFA community data were incorporated into a ‘relative neighbour’ connection network, a straight-line graph connecting nearest points in a point set, which ensured that all sites had at least one neighbour (see Figure 4.1). This was preferable to a more densely connected network, such as a Delaunay triangulation (Delaunay, 1934), for two reasons. First, unlike that of macro-organisms, we assume the movement of microorganisms to be dynamic but largely limited to aerial dispersal, and therefore to favour local rather than regional migration. Second, because a relatively small number of sites were sampled over a large area, a more intricately connected network increases the chance of masking large-scale trends, which were our primary interest. The relative neighbour network was created from site latitudes and longitudes using the ‘spacemakeR’ package in R (Anon, 2011).
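The construction of a relative neighbour network can be illustrated with a minimal sketch. This is not the ‘spacemakeR’ routine used in the analysis; it is a plain Python illustration that assumes great-circle (haversine) distances between sites, and the coordinates shown are hypothetical placeholders rather than the sampled locations. Two sites are joined only if no third site is closer to both of them than they are to each other.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def relative_neighbour_edges(coords):
    """Return the edges (i, j) of the relative neighbour graph of `coords`."""
    n = len(coords)
    d = [[haversine_km(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    edges = []
    for i, j in combinations(range(n), 2):
        # The pair is rejected if some third site k is closer to both i and j
        # than i and j are to each other.
        blocked = any(
            max(d[i][k], d[j][k]) < d[i][j] for k in range(n) if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Hypothetical (latitude, longitude) pairs, NOT the sites sampled in this study.
sites = [(-62.0, -58.0), (-64.0, -62.0), (-67.5, -68.0), (-72.0, -68.5)]
edges = relative_neighbour_edges(sites)
```

For sites arranged roughly along a transect, as in this example, the graph reduces to a chain linking each site to its nearest neighbours, which is the sparse, locally connected structure motivated above.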
Terminology | Description | Source
Species turnover | The changing profile of the ELFA fatty acid data over the latitudinal gradient | ELFA data, see Chapter 2.1.3, Table 2.4
Community composition | The changing profile of the grouped fatty acid data into indicative microbial communities | ELFA data, see Chapter 2.1.3, Table 2.5
Environmental variables | The environmental and chemical properties either observed or later provided from lab testing | Observation on site and lab-provided data, see Chapter 2.1.2 and Tables 2.2 and 2.3
___________________________________________________________________________
Figure 4.1. Left: the Antarctic Peninsula, showing the latitudinal gradient over which sites were sampled (Ferringo et al., 2005). Right: the relative neighbour connection network chosen to represent potential site connectivity in the calculation of Moran’s I.
___________________________________________________________________________
The relative neighbour graph was used to identify sites in geographic and ecological proximity. The geographic connectivity between sites served as the spatial weighting in the calculation of Moran’s I (Moran, 1950) for changes in fatty acid composition in the ELFA data. Moran’s I values range from -1, indicating perfect dispersion, to +1, indicating perfect correlation; a value of zero indicates a random spatial pattern. These values identified the level of spatial autocorrelation between site fatty acid compositions and thus the nature of the gradient in species turnover along the latitudinal transect. Principal coordinates of neighbour matrices (PCNM) eigenfunctions were calculated for all our Peninsula locations in the ‘spacemakeR’ package in R (Anon, 2010). The Akaike Information Criterion (AIC) was used as a means of model evaluation, following the recommendation of Burnham and Anderson (2002) that a difference AICi - AICmin of < 2 suggests substantial evidence for an adequate model fit. Monte Carlo permutation analysis was applied to test for the global and local significance of the AIC-identified spatial structures, using the R package ‘sedarjombart’ (Jombart et al., 2010). We retained the best-fitting spatial model (vector 1) as an indicator of the major trend in species turnover, to test for evidence of spatial structuring in the species turnover gradient.
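As an illustration of how a connection network acts as the spatial weighting in Moran’s I, the following minimal Python sketch computes the statistic for a single variable using binary, symmetric weights (1 for connected sites, 0 otherwise). The choice of binary weights, the function name and the example values are assumptions made for illustration; the sketch does not reproduce the R routines used in this study, and a multivariate ELFA profile would first need to be reduced to a single score per site.

```python
def morans_i(values, edges):
    """Moran's I for one variable, with binary symmetric weights given by `edges`.

    `values` holds one observation per site; `edges` is a list of (i, j) index
    pairs for connected sites. Each edge is counted in both directions.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 2 * sum(dev[i] * dev[j] for i, j in edges)  # sum of w_ij * dev_i * dev_j
    total_weight = 2 * len(edges)                       # sum of all w_ij
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four hypothetical sites connected as a chain along a transect.
chain = [(0, 1), (1, 2), (2, 3)]
gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)       # smooth trend along the chain
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # neighbours always differ
```

A smooth gradient along the chain gives a positive value, whereas values that alternate between neighbouring sites give the minimum of -1, matching the interpretation of the range given above.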
To explore further how ecological conditions may be driving spatial community structure, the sites were partitioned, and a dummy variable created, according to the spatial trend exhibited in the PCNM vector 1 map. Multivariate dispersion, as proposed by Anderson et al. (2006), is a technique which weights an order-of-magnitude change in abundance the same as a change in species composition. This was important for the ELFA data, which do not define individual microbial species but rather the abundance of fatty acids. First, Bray-Curtis was chosen as the measure of ecological dissimilarity; the dissimilarities were then preserved as Euclidean distances by principal coordinate analysis (PCO), so that the distance of each individual unit to its group centroid could be calculated. A p-value was then obtained by permuting least-squares residuals. This multivariate test for homogeneity of group dispersions, termed PERMDISP, also makes it possible to superimpose biological diversity on environmental heterogeneity and to test robustly for differences in structure. PRIMER 6 and PERMANOVA+ software (Clarke and Gorley, 2006) were used to assess group dispersion after the north-south separation.
Prior to PERMDISP, we normalised the environmental variables to adjust for different measurement scales and used Euclidean distance between sites to form a resemblance matrix. PERMDISP was performed using distances to group centroids, with p-values obtained via 9999 permutations as recommended by Anderson (2006). PCO was used for visual representation of group dispersion for environmental factors.
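The logic of this test on the environmental data can be sketched as follows. This is a simplified Python illustration, not the PERMANOVA+ routine: it normalises each variable, measures each site's Euclidean distance to its group centroid, and compares groups with an ANOVA F statistic. As a labelled simplification, the p-value here comes from permuting group labels of the distances rather than least-squares residuals. All names and data values are hypothetical.

```python
import random
from math import sqrt


def normalise(rows):
    """Centre each column on 0 and scale it to unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def centroid_distances(rows, groups):
    """Euclidean distance of each row to the centroid of its own group."""
    members = {}
    for row, g in zip(rows, groups):
        members.setdefault(g, []).append(row)
    centroids = {g: [sum(c) / len(c) for c in zip(*m)] for g, m in members.items()}
    return [
        sqrt(sum((v - c) ** 2 for v, c in zip(row, centroids[g])))
        for row, g in zip(rows, groups)
    ]


def f_statistic(values, groups):
    """One-way ANOVA F statistic of `values` across `groups`."""
    labels = sorted(set(groups))
    n = len(values)
    grand = sum(values) / n
    ss_between = ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(values, groups) if lab == g]
        m = sum(vals) / len(vals)
        ss_between += len(vals) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in vals)
    return (ss_between / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def dispersion_test(rows, groups, n_perm=9999, seed=0):
    """Return (observed F, permutation p-value) for differences in dispersion."""
    rng = random.Random(seed)
    dist = centroid_distances(normalise(rows), groups)
    observed = f_statistic(dist, groups)
    shuffled = list(groups)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # simplification: labels, not residuals, are permuted
        if f_statistic(dist, shuffled) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical measurements of two variables at eight sites, four per group.
env = [[5.1, 10.0], [5.2, 11.0], [5.0, 10.5], [5.1, 10.2],
       [4.0, 30.0], [6.5, 5.0], [7.2, 40.0], [3.5, 18.0]]
groups = ["north"] * 4 + ["south"] * 4
observed_f, p_value = dispersion_test(env, groups, n_perm=999)
```

A small p-value would indicate that the groups differ in how widely their sites scatter around the group centroid, which is the question PERMDISP addresses.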
To identify more specific relationships between community composition and environmental drivers, we also pooled fatty acid abundances according to their taxonomic representatives (e.g. 10Me 16:0 is indicative of Actinomycetes; see Chapter 2.1.3, Table 2.4). These pooled abundances were used as broad-scale indicators of key microbial community groups to partition more effectively the specific environmental influences on key group dispersal. These groupings were used as response variables in Generalized Linear Models (GLM) and in PERMDISP to distinguish environmental drivers specific to each microbial fraction of the community.
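The pooling step can be sketched in a few lines of Python. Only the assignment of 10Me 16:0 to Actinomycetes is taken from the text; the other marker names and group labels below are hypothetical placeholders standing in for the assignments tabulated in Chapter 2, and the abundance values are invented for illustration.

```python
from collections import defaultdict

# Marker-to-group lookup. Only the first entry comes from the text; the
# remaining entries are hypothetical placeholders.
MARKER_GROUPS = {
    "10Me 16:0": "Actinomycetes",
    "marker_A": "group_X",
    "marker_B": "group_X",
}


def pool_by_group(site_abundances, marker_groups):
    """Sum fatty acid abundances at one site into their indicative groups."""
    pooled = defaultdict(float)
    for fatty_acid, abundance in site_abundances.items():
        group = marker_groups.get(fatty_acid)
        if group is not None:  # fatty acids without an assigned group are skipped
            pooled[group] += abundance
    return dict(pooled)


# Hypothetical abundances at a single site.
site = {"10Me 16:0": 2.5, "marker_A": 1.5, "marker_B": 2.0, "unassigned": 9.0}
pooled = pool_by_group(site, MARKER_GROUPS)
```

Each pooled total then serves as one response variable per site, one for each microbial group.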
Variance inflation factor (VIF) analysis was used to select appropriate environmental variables and minimise collinearity; it was performed in the ‘car’ package in R (Fox, 2010) following the methodology of Zuur et al. (2007). All variables used in this analysis had a VIF value of < 4, below the threshold values recommended by Montgomery and Peck (1992). In the GLM, full models were initially fitted including all VIF-selected variables, and model fit was inspected. Parameters representing environmental variables were removed if doing so improved model fit, until the minimum adequate model was found. In this way, the GLMs contained the strongest set of environmental predictors for each of the microbial response variables.
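The VIF screening can be illustrated with a short Python sketch. It is not the ‘car’ implementation; it relies on the standard identity that, for a set of predictors, the VIF of each variable equals the corresponding diagonal element of the inverse of their correlation matrix, i.e. 1 / (1 - R²) from regressing that variable on the others. Function names and data are hypothetical.

```python
from math import sqrt


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    cols = list(zip(*rows))
    n = len(rows)
    centred = [[v - sum(c) / n for v in c] for c in cols]
    norms = [sqrt(sum(v * v for v in c)) for c in centred]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(centred, norms)]
        for ci, ni in zip(centred, norms)
    ]


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(rows):
    """Variance inflation factor of each column of `rows`."""
    inverse = invert(correlation_matrix(rows))
    return [inverse[j][j] for j in range(len(inverse))]


# Hypothetical measurements of two predictors at four sites (correlation 0.8).
predictors = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0]]
vifs = vif(predictors)
retained = [j for j, v in enumerate(vifs) if v < 4]  # threshold used in the text
```

With only two predictors both share the same VIF; with more variables, the one with the largest VIF would typically be dropped and the values recalculated until all fall below the chosen threshold.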