To evaluate population genetic structure in S. hystrix, I employed traditional analyses such as Wright's (1965) F-statistics (Paper 2, 3 & 5) and statistical methods based on Bayesian probability theory (Paper 3, 4 & 5). In recent years, Bayesian approaches (e.g. Corander et al. 2003; François et al. 2006; Guillot et al. 2005; Piry et al. 2004; Pritchard et al. 2000) have gained considerable popularity in population genetics. Bayesian clustering methods attempt to identify cryptic population structure by assigning individuals to genetically divergent clusters, based on their individual multilocus genotypes (Corander & Marttinen 2006). This is done by minimising Hardy-Weinberg disequilibrium as well as linkage disequilibrium within clusters. Some methods can also incorporate prior information such as the spatial location of individuals (reviewed in Manel et al. 2005).
Bayesian clustering methods generate a grouping of individuals without any need to define population units in advance. This is particularly useful in species with continuous distributions, where an a priori definition of population units may be arbitrary or even erroneous (Balloux & Lugon-Moulin 2002; Mank & Avise 2004; Pearse & Crandall 2004). Determining the number of genetic clusters (K) in the dataset, however, is a major challenge (e.g. Durand et al. 2009). Furthermore, there is disagreement about the applicability of certain methods (e.g. François et al. 2008; Durand et al. 2009; Guillot 2009). Finally, features of the dataset such as the level of differentiation among groups or isolation by distance may influence the performance of Bayesian inference (Latch et al. 2006; Pritchard et al. 2007). Various authors have stated that using different methods and critically examining the results may be the best way to avoid misinterpretations and to obtain results of biological significance (e.g. Pearse & Crandall 2004; Excoffier & Heckel 2006; Latch et al. 2006).
Following this advice, I tested various Bayesian clustering approaches and examined the results carefully. Specifically, I applied STRUCTURE version 2.2 (Pritchard et al. 2000), BAPS version 4.1 (Corander et al. 2003; Corander & Marttinen 2006), GENELAND version 1.0.5 (Guillot et al. 2005), TESS version 1.1 (François et al. 2006; Chen et al. 2007) as well as GeneClass2 (Piry et al. 2004), which implements the partial Bayesian criterion of Rannala & Mountain (1997). The widely used software STRUCTURE, in particular, is poorly suited for inferring K (S. Baird, personal communication). Indeed, when applying
APPENDIX
STRUCTURE to the S. hystrix dataset from the GBR (Paper 5), it was not possible to reliably estimate the number of clusters because the corresponding posterior probabilities continued to increase with K. This could be because the dataset did not conform precisely to the STRUCTURE model, e.g. due to inbreeding (Pritchard et al. 2007). In an attempt to overcome this problem and obtain reliable estimates of K, I applied the ad hoc statistic ∆K, which is based on the rate of change in the log probability of the data between successive K values (Evanno et al. 2005). With this method, K = 2 received the strongest support. Because ∆K cannot be calculated for K = 1, the method provides no way to test K = 2 against the alternative of a single cluster, and this result could therefore not be verified.
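The ∆K calculation can be sketched as follows. This is only a minimal illustration, not the procedure actually run for Paper 5: it assumes ∆K is taken as the absolute second-order difference of the mean log probability across replicate runs, divided by the standard deviation of the log probability at K. The function name `delta_k` and all log probability values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_probs maps K to a list of ln P(X|K) values, one per replicate run
    (hypothetical input format; at least two runs per K are required).
    """
    means = {k: mean(v) for k, v in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        # The second-order difference needs K-1 and K+1, so the smallest
        # and largest K tested (including K = 1) never receive a value.
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(log_probs[k])  # sample standard deviation across runs
        if sd == 0:
            continue  # delta K is undefined when all runs agree exactly
        result[k] = second_diff / sd
    return result


# Made-up log probabilities for K = 1..4, two replicate runs each.
example = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -891.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)  # K with the largest delta K
```

In this made-up example the log probability keeps rising with K, yet ∆K peaks at the K where the increase levels off. No value is returned for K = 1, which is why the statistic cannot be used to evaluate a single cluster.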
When BAPS analyses are performed in the group mode, the highest number of clusters is limited by the maximum K that is predetermined by the user. As a result, BAPS could not be used to evaluate within-site population genetic structure due to limited resolution. In the individual mode, very large numbers of clusters were obtained, with considerable variation between runs. This tendency of BAPS to overestimate the number of underlying populations was also described by Latch et al. (2006).
The clustering algorithm implemented in GENELAND yields reliable estimates of K (S. Baird, personal communication). GENELAND performs well when the underlying population structure can adequately be described by polygons (Guillot et al. 2005). In the present analyses, a major problem with GENELAND was that the program frequently crashed when performing analyses in the 'non-spatial mode'. In the 'spatial mode', the program performed clustering at the level of sampling sites but did not allow genetic structure within stands to be resolved.
Given these constraints, STRUCTURE, BAPS and GENELAND analyses were dropped. Instead, the final results were based on TESS (Paper 4 & 5) and GeneClass2 (Paper 3). According to François et al. (2006), TESS performs better than GENELAND when the underlying spatial population genetic structure is too complex to be described by simple polygons (but see Guillot et al. 2009). When applied to the S. hystrix dataset from the GBR, TESS revealed ten clusters, five of which were very clearly defined, and uncovered population genetic structure down to the smallest spatial scales within stands (Paper 5) or within single colonies (Paper 4). Three observations indicate that the TESS clusters reflected relevant biological units rather than artefacts of the clustering algorithm. First, as shown for Heron Island, differentiation among the main clusters was much higher than among sites, while mean heterozygote deficits within clusters were greatly reduced (see also Mank & Avise 2004). Second, considerably fewer significant inter-locus associations were found within clusters than within sites. Third, parentage analyses suggested that cluster membership was 'inherited' because in most cases, the most likely parents of a given juvenile were assigned to the same cluster as the juvenile itself (Paper 5).
For the two S. hystrix stands from the Red Sea analysed in Paper 3, HR2 and GB, TESS did not infer any hidden population substructure, presumably due to low levels of genetic differentiation. This was consistent with the low FIS values calculated for these sites. To identify individual immigrants, the population exclusion method as implemented in GeneClass2 (Piry et al. 2004) was applied. With this method, the number of individuals classified as immigrants largely depends on the threshold of exclusion set by the user. To compensate for the limited power of the dataset to identify immigrants (i.e. unknown source populations, limited number of loci), a low stringency criterion of α = 0.05 (95% exclusion probability) was applied. From the resulting estimate of Nm, the expected Type I error (N × α) was subtracted to obtain a reliable minimum, as suggested by Paetkau et al. (2004).
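The Type I error correction described above amounts to simple arithmetic, sketched here under stated assumptions: N is taken to be the number of individuals tested at the site, the function name is hypothetical, the counts in the example are invented, and clamping the result at zero is an addition of this sketch rather than part of the published method.

```python
def minimum_immigrants(n_excluded, n_tested, alpha=0.05):
    """Subtract the expected number of false exclusions (N * alpha)
    from the number of individuals excluded at threshold alpha.

    n_excluded: individuals classified as immigrants (hypothetical input)
    n_tested:   individuals tested, assumed here to be N
    """
    expected_false_exclusions = n_tested * alpha
    # Clamping at zero is an assumption of this sketch: a negative
    # number of immigrants has no biological meaning.
    return max(0.0, n_excluded - expected_false_exclusions)


# Invented example: 12 of 100 individuals excluded at alpha = 0.05.
minimum = minimum_immigrants(n_excluded=12, n_tested=100)
```

With these invented numbers, five exclusions are expected by chance alone, so only the remaining seven would count towards the minimum estimate.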
Finally, in all Bayesian analyses, I abstained from assigning immigrant individuals to presumptive source populations because this would require sampling all possible source populations, which is clearly beyond the scope of this study. When these precautions are taken, Bayesian clustering methods provide a powerful way to uncover hidden population genetic structure in corals that would remain undetected with other approaches.