DNA amplification, library construction, and sequencing

All DNA amplicon libraries were prepared by amplifying a region of the mitochondrial COI gene that has previously been shown to be suitable for specimen identification and species delimitation while showing within-species polymorphism in feather mites (Doña et al. 2015a, Doña et al. 2015b, Mironov et al. 2015), and by adding the Illumina-specific sequencing primers, indices, and adaptors. This was done following the protocol recommended by Illumina for bacterial 16S metabarcoding, with some modifications. Similar protocols have been used by other authors (e.g. Lange et al. 2014, Vierna et al. 2017). We also followed the wet-lab recommendations in Schnell et al. (2015) to minimize cross-contamination events. Specifically, we always used filter tips; the plates were opened exclusively under laminar flow hoods, which were periodically wiped down with a 0.5% bleach solution; and only one plate was processed per day. The DNA metabarcoding libraries were constructed in a two-step PCR:

PCR1 was carried out using 2.5 μL of DNA as template in a final volume of 25 μL containing 6.50 μL of Supreme NZYTaq Green PCR Master Mix (NZYTech), 0.5 μM of each primer, and PCR-grade water up to 25 μL. The thermal cycling conditions were as follows: an initial denaturation step at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C, and extension at 72 °C for 45 s; and a final extension step at 72 °C for 10 min. The primers were bcd_F05 and bcd_R04 (Dabert et al. 2008), each with a 5' overhang containing the Illumina sequencing primer sequence. A negative control was included in every PCR round to check for cross-contamination during the PCR.

The products of PCR1 were purified by Solid-Phase Reversible Immobilization (SPRI) (Hawkins et al. 1994), using Mag-Bind RXNPure Plus magnetic beads (Omega Biotek). To remove the primer dimers generated during PCR, we used a final bead concentration of 0.5X, which selects the high-molecular-weight amplicons over the primer dimers. The purified products were loaded on a 1% agarose gel stained with GreenSafe (NZYTech) and visualized under UV light.


PCR2 was carried out using 2.5 μL of the purified product from PCR1 under the same conditions as PCR1, except that the annealing temperature was 60 °C and only 5 cycles were run. The primers used for this PCR consisted of a 3' region that anneals to the 5' end of the PCR1 products, and a 5' region that incorporated the adaptors and indices. A total of 16 forward primers and 24 reverse primers were used, giving 384 different index combinations. All indices differed from one another by at least two bases. The products obtained were purified following the SPRI method as indicated above. Likewise, the purified products were loaded on a 1% agarose gel stained with GreenSafe (NZYTech) and visualized under UV light.
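The two properties of the index set stated above (16 x 24 = 384 combinations, and a minimum difference of two bases between indices) can be checked with a short Python sketch. It is an illustration only: the function names and the 4-nt demo indices are hypothetical, the real index sequences are not given in the text, and the distance is assumed to be a Hamming distance between indices of equal length.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length indices."""
    if len(a) != len(b):
        raise ValueError("indices must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices):
    """Smallest Hamming distance over all pairs in a set of indices."""
    return min(hamming(a, b) for a, b in combinations(indices, 2))


def index_combinations(forward, reverse):
    """All forward x reverse index pairs (one pair per library)."""
    return list(product(forward, reverse))


# Hypothetical 4-nt indices, used only to exercise the functions.
forward_demo = ["ACGT", "AGTC", "TCAG"]
reverse_demo = ["GATC", "CTAG"]

if __name__ == "__main__":
    print(min_pairwise_distance(forward_demo))
    print(len(index_combinations(forward_demo, reverse_demo)))
```

With the real primer set, a minimum distance below 2 would indicate that a single sequencing error could turn one index into another.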

The 96 libraries and their corresponding 96 replicates from Experiment 1 and Experiment 2 (see Experimental design section) were pooled together and run in one MiSeq PE300 run (MiSeq Reagent Kit v3) (MiSeq Run 1 hereafter). Likewise, all 384 libraries from the Field Test experiment were pooled together and run in one MiSeq PE300 run (MiSeq Reagent Kit v3) (MiSeq Run 2 hereafter).

Bioinformatic analyses

The forward (R1) and reverse (R2) FASTQ reads of each MiSeq run were quality-checked with FastQC (Andrews 2010). They were then imported into Geneious 8.1.7 (Kearse et al. 2012) for visual inspection and quality trimming. We trimmed a region of variable length from the 3' end of the reads in each file, according to the average Phred score (minimum quality score of 20) of each MiSeq run. Specifically, for MiSeq Run 1 we trimmed 36 bp and 120 bp from R1 and R2 reads, respectively. Likewise, for MiSeq Run 2 we trimmed 96 bp and 150 bp from R1 and R2 reads, respectively. That way, the length of the reads was identical for all samples in a given MiSeq run. The R1 and R2 files were then exported in FASTA format.
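The trimming itself was done in Geneious after visual inspection. Purely as an illustration of the underlying logic, the following Python sketch computes the mean Phred score per read position, derives how many 3' bases would have to be removed for every retained position to have a mean score of at least 20, and applies a fixed-length 3' trim. The function names, the Phred+33 encoding, and the rule "keep the longest 5' stretch with mean quality >= 20" are assumptions, not a description of the Geneious procedure.

```python
def mean_phred_by_position(quals, offset=33):
    """Mean Phred score at each position across equal-length quality strings."""
    length = len(quals[0])
    totals = [0] * length
    for q in quals:
        if len(q) != length:
            raise ValueError("all quality strings must have the same length")
        for i, ch in enumerate(q):
            totals[i] += ord(ch) - offset  # Phred+33 encoding assumed
    return [t / len(quals) for t in totals]


def bases_to_trim(quals, min_q=20.0):
    """Number of 3' bases to drop so that all kept positions have mean >= min_q."""
    means = mean_phred_by_position(quals)
    keep = 0
    for m in means:
        if m < min_q:
            break
        keep += 1
    return len(means) - keep


def trim_3prime(seq, qual, n_trim):
    """Remove a fixed number of bases (and quality values) from the 3' end."""
    keep = max(len(seq) - n_trim, 0)
    return seq[:keep], qual[:keep]
```

For example, applying `trim_3prime` with `n_trim=36` to a 300-nt read leaves 264 nt, which corresponds to the R1 trimming reported for MiSeq Run 1.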

A Python script (MMIS, Supporting Material) was written to automatise the next steps of the bioinformatic pipeline. The R1 and R2 files were concatenated using the fuse.sh script available from the BBMap package version 37.00 (Bushnell 2014). Only concatenated sequences with the maximum possible length were kept.
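The concatenation and length filter can be sketched in Python as follows. This is not the MMIS script and does not call fuse.sh. Whether R2 is reverse-complemented and whether any N padding is inserted between mates is not stated in the text, so both are exposed as hypothetical parameters (`revcomp_r2`, `pad`).

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def fuse_pair(r1: str, r2: str, revcomp_r2: bool = True, pad: int = 0) -> str:
    """Concatenate a read pair end to end, without overlap merging.

    revcomp_r2 and pad are assumptions; adjust them to match the tool settings.
    """
    second = reverse_complement(r2) if revcomp_r2 else r2
    return r1 + "N" * pad + second


def keep_full_length(seqs):
    """Keep only the sequences whose length equals the maximum observed length."""
    if not seqs:
        return []
    longest = max(len(s) for s in seqs)
    return [s for s in seqs if len(s) == longest]
```

Because all reads in a run were trimmed to the same length beforehand, the maximum length is simply the sum of the trimmed R1 and R2 lengths, and shorter concatenated sequences come from truncated reads.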

The split_libraries.py script included in the pipeline of QIIME version 1.9.0 (Caporaso et al. 2010) was used to label sequences with the sample identifier and merge them into a unique file per MiSeq run. Then, the de novo clustering method and the UCLUST algorithm (Edgar 2010) were used to pick the Operational Taxonomic Units (OTUs) with a 100% similarity threshold.
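Since all sequences in a run have the same length, clustering at a 100% threshold should behave much like grouping identical sequences. The Python sketch below illustrates that simplified view. It is an exact-match grouping, not UCLUST (which is a heuristic search), and the function name and the (read identifier, sequence) input format are hypothetical.

```python
from collections import defaultdict


def cluster_identical(reads):
    """Group reads with identical sequences.

    reads: iterable of (read_id, sequence) pairs.
    Returns a list of (sequence, [read_ids]) sorted by decreasing abundance,
    with ties broken alphabetically by sequence.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        groups[seq.upper()].append(read_id)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical reads labelled as sample_readnumber.
demo_reads = [("s1_1", "ACGT"), ("s1_2", "ACGA"), ("s2_1", "acgt")]

if __name__ == "__main__":
    for seq, ids in cluster_identical(demo_reads):
        print(seq, len(ids), ids)
```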

A filter to eliminate or minimise mistagging events was implemented. Also referred to as “tag jumps” (Schnell et al. 2015) or “index switching” (Sinha et al. 2017), mistagging is a recently described sequencing artifact that results in the assignment of reads (generally from 1 to 10 percent) to the wrong sample (Esling et al. 2015, Sinha et al. 2017, Owens et al. 2017). Our filter is based on the rationale that OTUs with a high number of sequences in a sample would “donate” reads, at a low rate, to other samples (Esling et al. 2015). For a given OTU, our filter identifies the sample in which that OTU is most abundant, in terms of number of reads, across all of the samples in a pool. The OTU in that sample is treated as the “donor”, meaning that it would be the source of the reads transferred to other samples. Occurrences of the OTU with a number of sequences below 10% of that of the “donor”, or with fewer than 100 reads, were filtered out. This conservative threshold was set empirically after observing that a threshold of 6% successfully removed all of the non-expected taxa in our mock communities from Experiments 1 and 2 (where we knew the species included in each well).
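The logic of the filter can be sketched in Python as follows. This is a minimal illustration, not the MMIS script: the nested-dictionary table layout, the function name, and the parameter names are assumptions, and the filter is applied to all samples of one pool at a time.

```python
def filter_mistagging(otu_table, donor_fraction=0.10, min_reads=100):
    """Remove per-sample OTU counts that are likely mistagging artifacts.

    otu_table: {otu_id: {sample_id: read_count}} for the samples of one pool.
    For each OTU, the sample with the highest count is the "donor". A count is
    kept only if it is at least donor_fraction of the donor count AND at least
    min_reads reads. OTUs left with no counts are dropped.
    """
    filtered = {}
    for otu, counts in otu_table.items():
        donor_reads = max(counts.values(), default=0)
        cutoff = donor_fraction * donor_reads
        kept = {
            sample: n
            for sample, n in counts.items()
            if n >= cutoff and n >= min_reads
        }
        if kept:
            filtered[otu] = kept
    return filtered


# Hypothetical read counts, used only to exercise the function.
demo_table = {
    "OTU_1": {"A": 5000, "B": 300, "C": 650},
    "OTU_2": {"A": 80, "B": 40},
}

if __name__ == "__main__":
    print(filter_mistagging(demo_table))
```

In the demo table, OTU_1 is removed from sample B (300 reads is below 10% of the 5000 donor reads) and OTU_2 is removed entirely because no sample reaches 100 reads.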

After the mistagging filter, the most abundant sequence of each OTU was selected as the representative sequence of that OTU. The assign_taxonomy.py script of QIIME was used for the taxonomic assignment of each representative sequence. Assignment was done with the RDP classifier (Wang et al. 2007) and a minimum confidence score of 96.6% against a reference database (Doña et al. 2015). The reference databases (Appendices 1, 2, Supporting Material) contained one sequence from each of the feather mite species considered in Doña et al. (2015). Since the query sequences spanned the 5' and 3' ends of the reference sequences, but not their central region, the central region of the reference sequences was deleted beforehand so that they matched the query sequences exactly. Because the reads were trimmed according to the average quality of each MiSeq run, the region deleted from the central part of the reference sequences differed slightly between runs (the length of the reference sequences was 298 bp for MiSeq Run 1 and 389 bp for MiSeq Run 2). Finally, an in-house C++ program was used to check whether the assigned sequences contained STOP codons, and sequences with STOP codons were excluded from downstream analyses.
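The STOP codon check was done with an in-house C++ program that is not reproduced here. As an illustration of the idea only, the Python sketch below flags sequences that contain an in-frame STOP codon. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5, where TAA and TAG are the STOP codons and TGA encodes tryptophan) and a known reading frame; neither is stated in the text, and the function names are hypothetical.

```python
# STOP codons of the invertebrate mitochondrial code (NCBI table 5); assumed here.
STOP_CODONS = frozenset({"TAA", "TAG"})


def has_stop_codon(seq, frame=0, stops=STOP_CODONS):
    """True if any complete in-frame codon of seq is a STOP codon.

    frame is the 0-based offset of the first codon (an assumption to set per
    data set); a trailing incomplete codon is ignored.
    """
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return True
    return False


def drop_sequences_with_stops(records, frame=0):
    """Keep only the {name: sequence} entries without in-frame STOP codons."""
    return {
        name: seq
        for name, seq in records.items()
        if not has_stop_codon(seq, frame)
    }
```

Because the query sequences lack the central region of the amplicon, the 3' part may need its own frame offset if the deleted region is not a multiple of three bases; in that case each part could be checked separately with the appropriate `frame` value.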

