5.3.3 Statistical analysis
Spatial autocorrelation
Determining habitat-occurrence relationships needs to account for spatial autocorrelation (SAC) in occurrence (abundance), in habitat (environmental variables), or in both. The geographic distribution of individuals can be spatially autocorrelated due to movement restrictions, social organisation or aggregative reactions to signals from other individuals of the species. Environmental variables are usually also spatially autocorrelated, as discussed at length in Legendre (1993). To assess the extent of SAC in our data, we applied an autocorrelation function (ACF) to multiple linear subsections of all AUV transects (dives). The ACF indicates the lag (distance) at which SAC disappears, and we assumed that observations further apart than this lag were not spatially correlated. For each dive we obtained several lag values, one for each linear subsection, and took the largest of them as the threshold distance for ruling out SAC. Distances between observations (presence of fishes per image) were calculated from geographical easting and northing using the Pythagorean theorem. This approach was only taken for the binomial (presence/absence) analysis. To visualise the extent of SAC we used correlograms (Bjørnstad and Falck, 2001), which depict spatial dependencies between locations at different lag distances using Moran’s I. Two relationships were investigated: (A) H. percoides presence/absence in relation to location (spatial x, y coordinates) and (B) the extent of habitat classes in relation to location.
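The SAC screening steps described above can be sketched as follows. This is a minimal Python illustration, not the code used in the analysis. It assumes that each linear subsection is an evenly spaced series of per-image presence (1) / absence (0) values. It treats SAC as having disappeared at the first lag where the sample ACF falls inside the approximate 95% band ±1.96/√n, a common convention that the text does not specify. It also computes Moran’s I for a single distance class with binary weights. The subsection data and the image spacing `IMAGE_SPACING_M` are hypothetical.

```python
import math

IMAGE_SPACING_M = 2.0  # hypothetical distance (m) between consecutive images


def acf(values, max_lag):
    """Sample autocorrelation of an evenly spaced series for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def decorrelation_lag(values, max_lag):
    """First lag whose ACF lies inside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(values))
    r = acf(values, max_lag)
    for k in range(1, max_lag + 1):
        if abs(r[k]) < bound:
            return k
    return max_lag  # SAC not ruled out within max_lag; fall back to the longest lag checked


def distance(p, q):
    """Pythagorean distance between two (easting, northing) positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def morans_i(coords, values, d_min, d_max):
    """Moran's I for one distance class, with weight 1 for pairs d_min < d <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and d_min < distance(coords[i], coords[j]) <= d_max:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        return float("nan")  # undefined: no pairs in this class or no variation
    return (n / w_sum) * num / denom


# Hypothetical dive with two linear subsections of per-image presence/absence.
subsections = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
]
lags = [decorrelation_lag(s, max_lag=5) for s in subsections]
threshold_lag = max(lags)  # largest lag across subsections is used to rule out SAC
threshold_m = threshold_lag * IMAGE_SPACING_M
```

Repeating `morans_i` over successive distance classes (0–d, d–2d, …) gives the values plotted in a correlogram.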
Linear mixed-effects models (LMEs)
We investigated relationships between continuous variables (fish length and weight) and environmental variables (depth and habitat) using linear mixed-effects models (LMEs) fitted with the R package nlme (Pinheiro et al., 2009). The Maximum Likelihood (ML) method was preferred over the default Restricted Maximum Likelihood (REML) method because we intended to compare models with different fixed-effects structures. LMEs allow observational units (images) to be clustered, e.g., observations by dive, and the random effects were assumed to vary across dives. Another advantage of LMEs is their ability to incorporate several spatially nested random effects, i.e., habitat classes within dives within sites. LMEs were also chosen because they can handle pseudoreplication. In our case, images fall into the category of spatial pseudoreplication, where several measurements (length) were made in the same vicinity (dive). Pseudoreplication violates one of the fundamental assumptions of statistical analysis: independence of errors. Conditions within each habitat class affect all length measurements within that habitat class and therefore violate the independence-of-errors assumption. The best (minimal adequate) model was chosen by backward selection, in which explanatory variables were deleted one at a time from the full (saturated) model. The model was:
$$FL_i = \alpha + \beta \times \mathrm{depth}_i \times \mathrm{habitat}_i + a_i + \varepsilon_i$$
where log-transformed fish length (FL) and weight were modelled as an intercept (α) plus the linear interaction of the depth and habitat class effects, a random intercept (a) and an error term (ε). Index i refers to an image in which a length measurement was taken. The fixed effects, depth and habitat class, influence the mean of y (fish length and weight), whereas random effects influence only the variance of y. The reduced model was compared to the full model using likelihood ratio tests. Restricted Maximum Likelihood (REML) was used to compare models with different random-effects structures, and Maximum Likelihood (ML) was used for models whose fixed-effects structure differed. Fish lengths and weights were log-transformed before analysis. Sightings without a length measurement were excluded from the analysis.
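Comparing a reduced model with the full model by a likelihood ratio test can be sketched as below. This is an illustrative Python sketch, not the nlme implementation. It assumes both models were fitted by ML, as required when fixed effects differ, and takes their maximised log-likelihoods and parameter counts as given. It then refers the statistic 2(ℓ_full − ℓ_reduced) to a χ² distribution whose degrees of freedom equal the difference in parameter counts. The log-likelihood values are hypothetical, and `chi2_sf` is a helper that evaluates the χ² upper tail through its standard closed forms for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0  # term = chi^(2r-1) / (1*3*...*(2r-1)), starting at r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total


def likelihood_ratio_test(loglik_full, k_full, loglik_reduced, k_reduced):
    """LR test of a reduced model nested in a full model, both fitted by ML."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf(stat, df)


# Hypothetical ML fits: full model (8 parameters) vs. a reduced model (6 parameters).
stat, df, p_value = likelihood_ratio_test(-120.4, 8, -121.1, 6)
simplify = p_value > 0.05  # non-significant: the dropped terms can be removed
```

In a backward selection, this comparison is repeated each time a term is deleted, and the simplification is kept only while the test remains non-significant.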
Generalised linear mixed-effects models (GLMMs)
Binary response variables, i.e., presence/absence of H. percoides, were analysed using generalised linear mixed-effects models (GLMMs) fitted with the R package lme4 (Bates and Mächler, 2010), as it provides the AIC (Akaike Information Criterion) for model selection. AIC measures the fit of a model penalised for its number of parameters (Crawley, 2007) and is calculated for each model as:
$$\mathrm{AIC} = -2\,(\text{log-likelihood}) + 2(p + 1)$$
where p is the number of parameters in the model (1 is added for estimating the variance). The lower the AIC, the better the model balances goodness of fit against complexity. As with the LMEs, we arrived at the ‘best’ model by backward selection. The model was:
$$\mathrm{logit}(p_i) = \alpha + \beta \times \mathrm{depth}_i + a_i + \varepsilon_i$$
where, for each habitat class individually, the probability (p) of H. percoides presence in image i is modelled as an intercept (α) plus the linear depth effect, a random intercept (a) and an error term (ε). The reduced model was compared to the full model using ANOVA; a non-significant result warranted model simplification, i.e., deletion of explanatory variables. After an initial analysis using all habitat classes and dives in one data set, we decided to model H. percoides presence/absence for each habitat class separately, which allowed easier presentation of our results. The binary response variable was the presence or absence of H. percoides in image i, and we investigated the probability of H. percoides occurrence per image by depth within each habitat. Site, dive and habitat class were incorporated into the model as random effects.
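The AIC arithmetic and the back-transformation from the logit scale can be illustrated with the short sketch below. It is a hypothetical example. In practice lme4 reports AIC directly and counts the parameters itself. Here the log-likelihoods, parameter counts and coefficients are invented, and the 2(p + 1) penalty simply follows the AIC formula given in the text. The helper `presence_probability` applies the inverse logit to α + β × depth + a for one dive, giving the fitted probability of presence without the error term.

```python
import math


def aic(log_likelihood, p):
    """AIC = -2(log-likelihood) + 2(p + 1), the convention given in the text."""
    return -2.0 * log_likelihood + 2.0 * (p + 1)


def presence_probability(alpha, beta, depth, a_dive=0.0):
    """Inverse logit of alpha + beta * depth + a_dive (random intercept of one dive)."""
    eta = alpha + beta * depth + a_dive
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical candidate models: name -> (maximised log-likelihood, p).
candidates = {
    "depth": (-210.3, 2),
    "intercept only": (-214.9, 1),
}
scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC is preferred

# Hypothetical coefficients on the logit scale for one habitat class.
p_50m = presence_probability(alpha=-1.0, beta=0.02, depth=50.0)
```

Here the depth model scores 426.6 against 433.8 for the intercept-only model, so the depth term would be retained.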
Habitat preference index
To address habitat preferences of H. percoides we used a log-likelihood ratio (G) test of goodness of fit (Tolimieri et al., 2008), as recommended by Sokal and Rohlf (1995). This method is similar to Pearson’s χ² test; however, the test statistic is the deviance from a log-linear model. We tabulated observed counts by habitat and calculated the expected counts under the assumption of no habitat preference, adjusting for the frequency of occurrence of each habitat. For illustrative purposes we created a preference index (observed proportion minus expected proportion).
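The goodness-of-fit test and preference index can be sketched as follows. This is a minimal Python illustration under stated assumptions. The availability of each habitat, which determines the expected counts, is represented by a hypothetical number of images per habitat class, and the fish counts are invented. The statistic G = 2 Σ O ln(O/E) is referred to a χ² distribution with k − 1 degrees of freedom, where k is the number of habitat classes, and the preference index is the observed minus the expected proportion. `chi2_sf` is a helper that evaluates the χ² upper tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0  # exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0  # odd df: normal tail plus a finite series in sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total


def habitat_preference(observed, availability):
    """G-test of goodness of fit plus preference index (observed minus expected proportion)."""
    n = sum(observed)
    total_avail = sum(availability)
    expected_prop = [a / total_avail for a in availability]
    expected = [n * q for q in expected_prop]
    # Cells with zero observed counts contribute nothing (limit of O * ln(O/E) as O -> 0).
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    df = len(observed) - 1
    index = [o / n - q for o, q in zip(observed, expected_prop)]
    return g, df, chi2_sf(g, df), index


# Hypothetical fish counts and images surveyed per habitat class.
observed = [30, 10, 20]
images_per_habitat = [100, 100, 100]
g, df, p_value, index = habitat_preference(observed, images_per_habitat)
```

A positive index marks a habitat used more than expected from its availability, and a negative index marks one used less than expected.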
Juveniles and adults
Finally, we investigated the proportion of juvenile and adult individuals by habitat, depth, dive and site. Because we could not determine the sex of the fishes, we used an average value based on the figures reported by Park (1993): males mature at 10–13 cm TL (approx. 2–5 years of age) and females at 9–17 cm TL (2–6 years of age). Individuals >12.25 cm TL (24.7 g) were considered adult. Taking an average assumes a sex ratio close or equal to 1, which seems to prevail in other live-bearing, non-targeted species of the family Sebastidae, although the sex ratio can deviate from 1 for several reasons, e.g., fishing pressure (Harvey et al., 2006). A classification tree model using binary recursive partitioning was also used to investigate habitat preferences of juvenile and adult fish. Here, individual length measurements (n = 937) were split along the coordinate axis (explanatory variable) that maximally distinguishes the response variable (length) between the two branches (Breiman et al., 1984).
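The maturity threshold arithmetic and a single step of binary recursive partitioning can be sketched as below. The threshold is the mean of the male (11.5 cm) and female (13 cm) range midpoints from Park (1993), which reproduces the 12.25 cm used in the text. `best_split` is a hypothetical helper showing one CART-style split on a single predictor (here, depth). It chooses the cut that minimises the within-branch sum of squared deviations of length. A full tree repeats this over all predictors and recursively within each branch. A tree grown on juvenile/adult labels would use an impurity measure such as the Gini index instead. The depth and length values are invented.

```python
# Length-at-maturity ranges (cm TL) from Park (1993), as quoted in the text.
MALE_RANGE = (10.0, 13.0)
FEMALE_RANGE = (9.0, 17.0)


def midpoint(r):
    return (r[0] + r[1]) / 2.0


# Mean of the male and female midpoints, assuming a sex ratio of about 1:1.
THRESHOLD_CM = (midpoint(MALE_RANGE) + midpoint(FEMALE_RANGE)) / 2.0


def stage(length_cm):
    """Classify a fish as adult if longer than the threshold, else juvenile."""
    return "adult" if length_cm > THRESHOLD_CM else "juvenile"


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty branch)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Cut point on predictor x that minimises the summed within-branch SSE of y."""
    pairs = sorted(zip(x, y))
    best_cost, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between tied predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost = cost
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_cut


# Hypothetical depths (m) and lengths (cm TL) of six fish.
depths = [20, 25, 30, 60, 65, 70]
lengths = [8.0, 9.0, 8.5, 14.0, 15.0, 14.5]
cut = best_split(depths, lengths)
stages = [stage(length) for length in lengths]
```

With these invented values the split falls at 45 m, separating the three short fish from the three long ones.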