CHAPTER 2 : GENOTYPING
2.3. Results
2.3.1. BAC Sequence Data
Identifying the Alternative Allele BAC colonies
The superpool plate of the BAC library was screened using SSR-based primer mixes (PM) 34, 61, 79 and 80 (listed in Table A.1), which amplify sequences within Genes A, B, X and Z. Twenty-two BAC plates were identified that contained alternative allele sequences (Figure 2.2). Each row, and then each colony of an identified row, was tested by PCR to identify the specific BAC colonies containing alternative allele sequences (Figure 2.3). In total, 11 BAC colonies were identified, of which five contained Gene A and Gene B sequences, three contained Gene X sequences, and the remaining three contained Gene Z sequences.
Figure 2.2. Identification of BAC plates containing the alternative allele sequences using gel electrophoresis of PCR products. a. Each row of the superpool plate was PCR tested. In this gel, lanes 9 and 10 contained the positive gel band used to identify a specific column within a pool plate. b. The identified column of a pool plate was then PCR tested. In this gel, the positive band in lane 3 was used to determine the BAC plate containing the alternative allele sequences. The size standards (far right lane of each gel) are 1 kb Plus DNA ladder.
Confirming the Alternative Allele BACs
Sanger sequencing of the BAC DNA confirmed that the identified BAC colonies contained alternative allele sequences and not LOP sequences (Figures 2.4 and 2.5). The alternative allele sequences were classified into groups based on their similarity to the alternative allele sequences previously identified by the PFR apomixis research team (Dr Andrew Catanach, personal communication). As shown in Figure 2.4, three BAC colonies contained alternative allele sequences that matched the alternative allele 1 grouping of Gene X. Three other BAC colonies contained sequences that matched the alternative allele 1 grouping of Gene Z (Figure 2.4). Gene B alternative allele 1 was identified in a single BAC colony, while alternative allele 2 of Gene B was identified in four other BAC colonies (Figure 2.5).
Figure 2.3. Identification of positive BAC colonies using gel electrophoresis of PCR products. a. Each row of the cultured BAC plate was PCR tested. In this gel, lanes 2 - 5 contained the positive gel bands used to identify the rows containing the BAC colony of interest. b. Each BAC colony within the identified rows was then PCR tested. In this gel, the positive band in lane 1 identified the BAC colony containing the alternative allele sequence. The size standards (far right lane of each gel) are 1 kb Plus DNA ladder.
Figure 2.5. DNA sequence alignments of amplified regions of Gene B of previously known LOP and alternative allele sequences, with sequences of PCR products from identified BACs. The BAC sequence data derived from five BAC colonies aligns with the previously known sequence of alternative allele 2 of Gene B, and one other colony aligns with alternative allele 1 of Gene B. The relative position in base pairs along the lop110 contig is noted under each alignment.
Figure 2.4. DNA sequence alignments of amplified regions of Genes X and Z of previously known LOP and alternative allele sequences, with PCR sequences from identified BACs. a. The BAC sequence data derived from three BAC colonies aligns with the previously known sequence of alternative allele 1 of Gene X. b. Similarly, the BAC sequence data derived from the other three BAC colonies aligns with the previously known sequence of alternative allele 1 of Gene Z. The relative position in base pairs along the
454 Sequencing of the Alternative Alleles
Eight BACs were sent to Axeq Technologies for Roche 454 pyrosequencing. This yielded 277 027 reads, giving a total of 151 400 031 bp of DNA sequence from all eight BACs combined (Table 2.1).
Table 2.1. Read count and sequence statistics from BAC sequencing.
| BAC No. | Gene | Number of reads | Total DNA sequence (bp) |
|---|---|---|---|
| 152 | Z | 14 451 | 7 732 994 |
| 160 | A & B | 47 675 | 25 450 375 |
| 308 | Z | 15 194 | 8 227 024 |
| 326 | X | 45 436 | 23 999 124 |
| 344 | X | 31 998 | 17 256 243 |
| 353 | A & B | 21 394 | 12 193 568 |
| 393 | A & B | 66 425 | 37 062 568 |
| 430 | A, B, Z | 34 454 | 19 478 135 |
| Total | | 277 027 | 151 400 031 |
BAC Sequence Assembly
Initially, gsAssembler v2.6 was used to trim the raw 454 reads for BAC vector and E. coli contamination. The remaining 454 reads were assembled with gsAssembler v2.6 using the default assembly parameters (minimum read length of 20 bp, minimum overlap of 40 bp and minimum overlap identity of 90%). The first assembly resulted in an average N50 value of 29.2 kb across all BACs (Table 2.2). On average, 13 contigs were constructed from each BAC, with an average length of 12 363 bp (Table 2.2). To improve the assembly statistics, a second assembly was conducted using gsAssembler v2.7 with the minimum contig and scaffold lengths set to 200 bp. This increased the average N50 value to 31.1 kb and the average contig size to 14 699 bp, and decreased the average number of contigs constructed per BAC to 12 (Table 2.2). The third assembly step was performed by Dr Andrew Catanach using gsAssembler v2.7. In this step, assembled contigs were ordered based on contigs of overlapping BACs, either of the same alternative allele or of the highly homologous allele. Three continuous draft sequences were thereby generated for three separate alternative alleles (Table 2.2), which were subsequently annotated with the LOP candidate genes.
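The N50 statistic used to compare the assembly steps can be illustrated with a minimal Python sketch of its conventional definition. The function name and the example contig lengths are assumptions made for illustration; they are not gsAssembler output, and gsAssembler's own reporting may apply additional length filters.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example = [50_000, 30_000, 10_000, 6_000, 4_000]
print(n50(example))
```

Because N50 weights long contigs heavily, merging a few short contigs into longer ones can raise it even when the number of contigs changes only slightly, which is consistent with the modest change in contig count between the first and second assemblies.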
2.3.2. Genotyping - Fragment Length Polymorphism
Primer Design
Seven new primers (PM81, 82 and 84-88, listed in Table A.2) were designed to amplify microsatellites identified when comparing the newly assembled alternative allele sequences with the known LOP sequences.
Table 2.2. The BAC assembly progress using gsAssembler v2.6 & v2.7 and associated statistics.
| Assembly Step | BAC No. | Number of Contigs | N50 (bp) | Largest Contig (bp) | Average Contig Size (bp) |
|---|---|---|---|---|---|
| 1 | 152 | 11 | 39 976 | 51 454 | 14 581 |
| 1 | 160 | 22 | 27 858 | 31 887 | 6 813 |
| 1 | 308 | 14 | 15 937 | 35 517 | 10 595 |
| 1 | 326 | 13 | 38 336 | 71 564 | 14 979 |
| 1 | 344 | 11 | 30 531 | 48 064 | 11 850 |
| 1 | 353 | 5 | 38 677 | 48 489 | 10 995 |
| 1 | 393 | 9 | 26 588 | 80 192 | 20 400 |
| 1 | 430 | 19 | 15 924 | 35 573 | 8 693 |
| 1 | Average | 13 | 29 228 | 50 343 | 12 363 |
| 2 | 152 | 11 | 40 013 | 51 466 | 14 621 |
| 2 | 160 | 14 | 28 596 | 32 029 | 10 568 |
| 2 | 308 | 13 | 15 938 | 35 552 | 11 426 |
| 2 | 326 | 11 | 43 018 | 71 572 | 17 848 |
| 2 | 344 | 5 | 40 052 | 48 176 | 25 996 |
| 2 | 353 | 14 | 38 670 | 48 745 | 11 879 |
| 2 | 393 | 11 | 26 847 | 80 192 | 14 958 |
| 2 | 430 | 16 | 15 967 | 35 587 | 10 295 |
| 2 | Average | 12 | 31 138 | 50 415 | 14 699 |
| 3 | 152, 160, 308, 430 | 39 | 40 033 | 70 518 | 7 548 |
| 3 | 326, 344 | 18 | 37 251 | 54 286 | 15 858 |
| 3 | 353, 393 | 15 | 48 280 | 80 191 | 14 940 |
Fragment Length Polymorphism Genotyping
Sub-Population Testing
Initially, 11 primer combinations consisting of both the existing primers used by the PFR apomixis research team and of newly designed primers were selected for sub-population testing. PM1 for Gene X, PM34 for Gene B and PM81 for Gene H (Table A.2) were found to be most suitable for genotyping the two populations. Their suitability was based on the criteria detailed in the methods, such as the clear association between different fragment peaks and different microsatellites, and an identifiable peak associated with the LOP allele.
PM1 was used to generate the fragment length profiles for the three alternative alleles of Gene X (Figure 2.6a). Allele 1 of Gene X was distinguished by a 391 bp peak, allele 2 by a 379 bp peak, and allele 3 by a 383 bp peak (Figure 2.6a). The 346 bp peak was associated with the LOP allele, since it was present in all polyhaploids and both parents (R35 & EMS15C) but absent from the LOP-minus mutants 138, 143, 156 and 164 (Figure 2.6a). These allele sizes could also be confirmed against the sequence data: the size of the distinguishing fragment peak of each allele corresponded directly to the length predicted by previous Sanger sequence data (Figure 2.6b).
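The peak-to-allele assignment described for Gene X can be sketched in a few lines of Python. This is an illustration only, not the scoring procedure used in the study: the function names and the 1 bp sizing tolerance are assumptions, while the peak sizes are those reported above.

```python
# Distinguishing PM1 peak sizes (bp) for Gene X, as reported in the text.
GENE_X_PEAKS = {391: "X1", 379: "X2", 383: "X3", 346: "LOPX"}


def call_gene_x(peak_sizes, tolerance=1.0):
    """Map observed peak sizes (bp) to Gene X allele labels.

    `tolerance` (bp) is an assumed sizing error, not a value from the study.
    """
    calls = set()
    for observed in peak_sizes:
        for expected, label in GENE_X_PEAKS.items():
            if abs(observed - expected) <= tolerance:
                calls.add(label)
    return calls


def is_recombinant(calls):
    """A plant lacking the LOP-associated peak is classed as recombinant."""
    return "LOPX" not in calls


# Hypothetical plant with peaks near 346 bp and 391 bp.
plant = call_gene_x([346.2, 390.8])
print(sorted(plant), is_recombinant(plant))
```

The tolerance must stay below half the smallest gap between distinguishing peaks (4 bp, between 379 bp and 383 bp) so that no observed peak can match two alleles.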
PM34 was used to generate the fragment length profiles of the three alternative alleles of Gene B. Allele 1 was characterised by a 248 bp peak and allele 2 by a 243 bp peak; allele 3 also had a 243 bp peak but possessed a different fragment length profile from that of allele 2 (Figure 2.7a). The LOP allele of Gene B was associated with the 259 bp peak, which was present in all polyhaploids and parents but absent from the LOP-minus mutants 138, 143, 156 and 164 (Figure 2.7a). The size of the fragment peak of each allele could also be correlated with the length predicted by previous Sanger sequence data (Figure 2.7b).
The fragment length profiles for Gene H, generated using PM81, indicated the presence of four alleles (Figure 2.8a). Allele 1 was characterised by a 323 bp peak and allele 2 by both a 323 bp and a 328 bp peak (Figure 2.8a). Allele 3 contained a single 328 bp peak, and allele 4 was likely a null allele since no distinguishing peak was amplified by PM81 (Figure 2.8a). The LOP allele was associated with the 333 bp peak (Figure 2.8a). The size of the LOP peak corresponded to the length predicted by the previously generated LOP sequence data (Figure 2.8b). Sequence data for the alternative alleles of Gene H were unavailable in this study; therefore, it was not possible to validate the peaks of the alternative alleles directly. However, based on the known LOPH sequence, it was possible to infer the alternative allele sequences. In the absence of one or both microsatellites, (A)7 and (A)4, the lengths of the inferred sequences matched the fragment peak sizes observed for alternative alleles 1–3.
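Unlike Genes X and B, the Gene H alleles are distinguished by a combination of peaks rather than a single peak. The following Python sketch shows one way that logic could be written down; the function name and the 1 bp tolerance are assumptions, and the peak sizes are those reported above.

```python
def call_gene_h(peak_sizes, tolerance=1.0):
    """Return (alternative_allele, has_lop) for one plant's PM81 profile.

    `tolerance` (bp) is an assumed sizing error, not a value from the study.
    """
    def has_peak(expected):
        return any(abs(p - expected) <= tolerance for p in peak_sizes)

    has_323, has_328, has_lop = has_peak(323), has_peak(328), has_peak(333)
    if has_323 and has_328:
        allele = "H2"
    elif has_323:
        allele = "H1"
    elif has_328:
        allele = "H3"
    else:
        allele = "H4"  # no distinguishing peak: consistent with a null allele
    return allele, has_lop


# Hypothetical profile with peaks near 323, 328 and 333 bp.
print(call_gene_h([323.1, 327.9, 333.0]))
```

A peak list alone cannot separate an H4 call from a failed amplification, so in this sketch the presence of the 333 bp LOP peak is the only evidence that the reaction worked for such a plant.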
Population Genotyping
Gene X
Genotyping the two populations for the alleles of Gene X revealed that around 80% of the polyhaploids had either alternative allele 1 (X1; 117 plants) or alternative allele 2 (X2; 124 plants) (Figure 2.9). The least common allele was alternative allele 3 (X3), which was detected in only 35 of the 287 plants. In addition, contrary to the repression of recombination at LOP predicted in the hypothesis, eleven recombinant plants were detected that did not contain the LOP allele of Gene X (LOPX) (Figure 2.9).
Gene B
The three alternative alleles of Gene B were distributed in a pattern similar to that observed for Gene X: approximately 80% of the population contained either the B1 (120 plants) or the B2 (128 plants) allele (Figure 2.9). Similarly, the B3 allele was the least common in the polyhaploid populations, occurring in only 38 plants, analogous to the distribution of the X3 allele (Figure 2.9). Notably, no plants were found that lacked the LOP allele of Gene B (LOPB), in contrast to the 11 recombinants that lacked LOPX (Figure 2.9).
Gene H
The segregation of the Gene H alleles differed from that of Genes X and B. One major difference was that the H1 allele was present in over 60% (179 plants) of the population (Figure 2.9), which contrasted with the more even distribution of the X1 and X2, and B1 and B2, alleles. Another major difference was the presence of four detectable alternative alleles, as opposed to three in Genes X and B. Of these four Gene H alternative alleles, three appeared infrequently: the H2 allele, found in 47 plants; the H3 allele, found in 18 plants; and the H4 allele, found in 42 plants (Figure 2.9). Based on the fragment profile, the H4 allele was likely a null allele. In addition, only one recombinant plant was detected that did not contain the LOP allele of Gene H (LOPH).
Figure 2.6. Fragment length polymorphism genotyping of Gene X. a. Polyhaploid-derived fragment profiles showing sizes in base pairs (bp) of distinguishing peaks of the three alternative alleles of Gene X identified using PM1. The 346 bp peak was associated with the LOP allele, as it was absent in the LOP-minus mutant 143. b. Previously generated sequence data corresponds with the size of the distinguishing peak for each allele.
Figure 2.7. Fragment length polymorphism genotyping of Gene B. a. Polyhaploid-derived fragment profiles showing sizes in base pairs (bp) of distinguishing peaks of the three alternative alleles of Gene B identified using PM34. The 259 bp peak was associated with the LOP allele, as it was absent in the LOP-minus mutant 164. b. Previously generated sequence data corresponds with the size of the distinguishing peak for each allele.
Figure 2.8. Fragment length polymorphism genotyping of Gene H. a. Polyhaploid-derived fragment profiles showing sizes in base pairs (bp) of distinguishing peaks of the four alternative alleles of Gene H identified using PM81. Allele 4 is consistent with a null allele, which was not amplified by PM81. The 333 bp peak was associated with the LOP allele since it was consistently lower in the LOP-minus mutant 156. b. The previously generated LOP sequence data corresponds to the size of the distinguishing LOP peak.
[Figure 2.9 chart labels. Gene X: X1 (117), X2 (124), X3 (35), recombinants (11). Gene B: B1 (120), B2 (128), B3 (38). Gene H: H1 (179), H2 (47), H3 (18), H4 (42), recombinant (1).]
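The percentages quoted for each gene can be recomputed from the plant counts shown in Figure 2.9. The short Python sketch below does this; the dictionary layout and function name are illustrative assumptions, and the counts are copied from the figure labels.

```python
# Plant counts per allele, as labelled in Figure 2.9.
COUNTS = {
    "Gene X": {"X1": 117, "X2": 124, "X3": 35, "recombinant": 11},
    "Gene B": {"B1": 120, "B2": 128, "B3": 38},
    "Gene H": {"H1": 179, "H2": 47, "H3": 18, "H4": 42, "recombinant": 1},
}
N_PLANTS = 287  # total number of plants stated in the text


def allele_percentages(counts, n=N_PLANTS):
    """Express each allele count as a percentage of all genotyped plants."""
    return {allele: round(100 * c / n, 1) for allele, c in counts.items()}


for gene, counts in COUNTS.items():
    # Print the per-gene total alongside the percentages as a sanity check.
    print(gene, sum(counts.values()), allele_percentages(counts))
```

Printing the per-gene totals is a simple consistency check: under these figure labels the Gene X and Gene H counts each sum to 287, whereas the Gene B counts sum to 286.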
2.3.3. Genotyping – High Resolution Melt
For HRM genotyping, two new alternative allele-specific HRM primers, PM93 & PM94 listed in Table A.3, were designed around a SNP located 300 bp downstream of Gene Y.
HRM Genotyping
Sub-Population Testing
In total, six HRM primer combinations were selected for sub-population testing at a range of annealing temperatures. These comprised two existing HRM primers used by the PFR apomixis research team and four newly designed primers (Table A.3). PM92 was found to be the most suitable primer combination for whole-population genotyping (Table A.3) because of the ease with which each plant could be classified into different alleles based on the shape of the melting curve profile (Figure 2.10a). The profile for allele 1 was distinguished by two peaks of fluorescence at melting temperatures of 75.5 °C and 77.5 °C (Figure 2.10b). Allele 2 was characterised by a different melting curve profile, with the highest level of fluorescence at 77.5 °C. Allele 3 was characterised by the highest level of fluorescence at 78 °C (Figure 2.10b).
Figure 2.9. Distribution of the alternative alleles of Genes X, B and H in the two Hieracium praealtum polyhaploid populations generated through fragment length polymorphism genotyping. Alleles are abbreviated. For example, ‘X1’ refers to alternative allele 1 of Gene X. The number of plants with a given allele is shown in brackets. Plants that did not contain an LOP allele were classified as recombinant plants.
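Melting peaks of the kind described above are conventionally obtained by plotting the negative derivative of fluorescence with respect to temperature. The Python sketch below illustrates that idea on a synthetic curve. It is not the LightScanner® algorithm: the function name, the finite-difference scheme and the synthetic single-transition curve are all assumptions made for illustration.

```python
import math


def melt_peaks(temps, fluorescence):
    """Return (temperature, -dF/dT) pairs at local maxima of -dF/dT."""
    mids, neg_deriv = [], []
    # Finite-difference estimate of -dF/dT at the midpoint of each interval.
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_deriv.append(-(fluorescence[i + 1] - fluorescence[i]) / dt)
    # A melting peak is a point higher than both of its neighbours.
    return [
        (mids[i], neg_deriv[i])
        for i in range(1, len(neg_deriv) - 1)
        if neg_deriv[i] > neg_deriv[i - 1] and neg_deriv[i] > neg_deriv[i + 1]
    ]


# Synthetic single-transition curve with an assumed midpoint of 78 degrees C.
temps = [70.25 + 0.5 * i for i in range(30)]
fluor = [1 / (1 + math.exp((t - 78.0) / 0.5)) for t in temps]
peaks = melt_peaks(temps, fluor)
print(peaks[0][0])
```

A profile with two transitions, such as the one reported for allele 1, would return two entries from the same function. Real fluorescence data would usually need smoothing before differentiation, which this sketch omits.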
Furthermore, the shape and temperature of the melting curves were analysed so that they could be associated with different SNPs identified within the BAC sequence data. This analysis was based on melt-curve characteristics described by MacKay et al. (2008) and Distefano et al. (2012), where pairs of G and C nucleotides melt at a higher temperature than pairs of A and T nucleotides. Therefore, based on the high melting temperature of allele 3, it likely contained a G & C nucleotide pairing within the double SNP region shown in Figure 2.11. Allele 1 likely contained a T & A pairing, and allele 2 was intermediate between alleles 1 and 3, with a T & C pairing (Figure 2.11). However, sequence data from individual polyhaploids would be necessary to confirm the composition of these SNPs.
Figure 2.10. High Resolution Melt (HRM) genotyping of Gene Y using PM92. a. Normalised melting curve profile of the three alternative alleles showing melting temperature on the x-axis and normalised fluorescence on the y-axis, generated by LightScanner® Software (Idaho Technologies). b. Normalised melting peaks of the three alleles showing temperature on the x-axis and maximum change in fluorescence on the y-axis, generated by LightScanner® Software.
Figure 2.12. Distribution of the alternative alleles of Gene Y in the EMS15C Hieracium praealtum polyhaploid population, detected by HRM-based genotyping. Alleles are abbreviated to ‘Y1’ referring to alternative allele 1 of Gene Y, etc. The number of plants with a given allele is shown in brackets. Recombinant plants, which did not contain LOPY, were detected using gel-based genotyping.
[Figure 2.12 chart labels. Gene Y: Y1 (83), Y2 (17), Y3 (18), recombinants (3).]
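The reasoning used to order the three Gene Y alleles, namely that more G and C bases at the double SNP implies a higher melting temperature, can be written as a small Python sketch. The two-letter encoding of each pairing and the function name are assumptions for illustration, and the sketch deliberately ignores the rest of the amplicon and any nearest-neighbour effects.

```python
def gc_count(pairing):
    """Count G and C bases in a two-SNP pairing such as 'TA' or 'GC'."""
    return sum(base in "GC" for base in pairing.upper())


# Pairings inferred in the text for the double SNP region of Gene Y.
INFERRED = {"allele 1": "TA", "allele 2": "TC", "allele 3": "GC"}

# Rank alleles from lowest to highest expected melting temperature.
ranked = sorted(INFERRED, key=lambda allele: gc_count(INFERRED[allele]))
print(ranked)  # ['allele 1', 'allele 2', 'allele 3']
```

The resulting order matches the observed ordering of the melting profiles, but it is only a consistency check on the inference, not a confirmation of the SNP composition.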
Genotyping the EMS15C Population
All 118 plants of the EMS15C population were genotyped for Gene Y. Although three alternative alleles were detected, they were not distributed in a similar way to those of Genes X and B: the Y1 allele (83 plants) was found to be more common than the Y2 and Y3 alleles (17 and 18 plants, respectively) (Figure 2.12). In addition, gel-based genotyping was used to identify the LOP allele of Gene Y (LOPY), which was not detected in three recombinant plants (Figure 2.12). These recombinant plants were the same as those lacking LOPX.
Figure 2.11. The shape and melting temperature of the Gene Y melting curves could be associated with different types of SNPs identified within the BAC sequence data. The double SNP region is also highlighted.
Table 2.3. Frequency distribution of allele combinations of Genes X and B in the two Hieracium praealtum polyhaploid populations. Low p values denote significant linkage in the boxed highlighted allele combinations, and significant repulsion in the other alleles. N = 287.
[Figure 2.13 chart labels. y-axis: Frequency (0 to 140); x-axis: Allele, showing the nine combinations of X1, X2 and X3 with B1, B2 and B3, each carrying LOPX and LOPB. Bar value labels in extracted order: 117, 0, 1, 1, 124, 0, 0, 1, 35.]
Figure 2.13. Bar chart of the frequency distribution of allele combinations of Genes X and B in the two Hieracium praealtum polyhaploid populations.
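The association between Gene X and Gene B alleles summarised in Table 2.3 and Figure 2.13 can be illustrated with a chi-square test of independence. This is a stand-in technique: this section does not state which test produced the p values in Table 2.3. The assignment of the nine bar labels to allele combinations (X1 to X3 within B1, then B2, then B3) is also an assumption based on the order of the axis labels. Under that reading the nine counts sum to 279 rather than 287.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_df4(stat):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    return math.exp(-stat / 2) * (1 + stat / 2)


# Rows: B1, B2, B3; columns: X1, X2, X3 (assumed mapping of the bar labels).
observed = [
    [117, 0, 1],
    [1, 124, 0],
    [0, 1, 35],
]
stat, df = chi_square_independence(observed)
p_value = chi2_sf_df4(stat)  # closed form is valid because df == 4 here
print(df, round(stat, 1), p_value)
```

Several expected counts in this table fall below 5, so the chi-square approximation is rough and an exact test would be the more careful choice. The sketch is meant only to show why the concentration of plants on the X1/B1, X2/B2 and X3/B3 diagonal corresponds to a very small p value.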