
Chapter 6. Exploring natural sequence variation within the FTa1-a2-c cluster

6.3.3 Interspecific sequence differences in the FT cluster

The FT cluster sequences from the two C. reticulatum accessions used in this study (PI489777 and Cr5-9) were almost identical: among the 50736 bp sequenced, a single polymorphism (G/A) was found between them, in the third intron of FTa2. Therefore, only the sequence from accession PI489777 was used. This sequence was initially compared with that from the desi cultivar ICC4958, as these two lines are the parents of the CRIL2 reference cross (Sharma et al. 2013) in which the major QTL in the FT cluster region was defined (Chapter 4).

The alignment between the sequences from PI489777 and ICC4958 (57048 bp long) revealed a sequence identity of 87.0% and a high degree of polymorphism, with numerous variable sites observed across the alignment and a total of 365 SNPs and 73 indels (of variable length) identified (Fig 6.4). A fully detailed list with the position and sequence of all the polymorphisms is available in Electronic Supplementary Material 2.
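The kind of tally reported above can be illustrated with a minimal sketch that walks through a gapped pairwise alignment and counts matches, SNPs and indel events. The function name, the toy sequences and the identity convention (matches divided by the full alignment length, gap columns included) are assumptions made for illustration; the software and settings actually used for the alignment are not described in this section.

```python
def compare_aligned(seq_a: str, seq_b: str) -> dict:
    """Tally matches, SNPs and indel events in a gapped pairwise alignment.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    A run of consecutive gap columns in the same sequence counts as one indel event.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")

    matches = snps = gap_columns = indel_events = 0
    previous = None  # which sequence carried a gap in the previous column
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        state = "a" if a == "-" else "b" if b == "-" else None
        if state is None:
            if a == b:
                matches += 1
            else:
                snps += 1
        else:
            gap_columns += 1
            if state != previous:  # a new gap run starts here
                indel_events += 1
        previous = state

    return {
        "matches": matches,
        "snps": snps,
        "gap_columns": gap_columns,
        "indel_events": indel_events,
        # Assumed convention: identity over the whole alignment length.
        "identity_pct": 100.0 * matches / len(seq_a),
    }


# Hypothetical toy alignment, not taken from the FT cluster.
toy = compare_aligned("ATGC--TTAGC", "ATGCAATTGG-")
print(toy)
```

Because long indels contribute many gap columns but only one indel event, an alignment can show a modest number of events and still have an identity well below 100%.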

To further investigate this variability, the FT cluster was divided into seven regions (represented by yellow boxes in Fig 6.3): the FTa1 promoter (beginning 11.5 kb upstream of the start of the FTa1 mRNA), the FTa1 gene (the genomic sequence of FTa1, including exons and introns), the FTa1-a2 intergenic region, the FTa2 gene (exons and introns), the FTc promoter (from ≈8 kb upstream to the start of the FTc mRNA), the FTc gene (exons and introns) and the FTc 3’ region (4.6 kb after the end of the FTc mRNA). The nucleotide (π) and indel (πi) diversity, as well as the number of SNPs and indel events, were calculated for each of these regions and across the cluster as a whole (Table 6.4). The FTa1 promoter was by far the most variable, with a nucleotide diversity of 0.02176, 70.6 % of the total SNPs and 41 % of the indels. This contrasts with the other promoter region analysed (FTc), which showed a much lower nucleotide diversity (0.0035) and contained only 7 % of the total SNPs and 10 % of the indels. The region corresponding to the FTa2 gene was the most conserved, displaying the lowest diversity values of the seven regions comprising the cluster.
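For readers unfamiliar with the diversity parameters, the following sketch shows how π reduces to a simple proportion when only two sequences are compared, and why Watterson's θ takes the same value in that case. The function names and the toy alignment are hypothetical, and the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption; the program and gap-handling option used to produce the values in Table 6.4 are not stated in this section.

```python
import math


def pairwise_pi(seq_a: str, seq_b: str) -> float:
    """Nucleotide diversity for two aligned sequences.

    With n = 2, pi is the number of differing sites divided by the number
    of sites compared. Gap columns are skipped (assumed convention).
    """
    sites = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        sites += 1
        differences += a != b
    if sites == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / sites


def watterson_theta(segregating_sites: int, sites: int, n_sequences: int) -> float:
    """Watterson's estimator per site: S / (a1 * L), with a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a1 * sites)


# Hypothetical toy alignment: 9 ungapped sites, 1 difference.
pi = pairwise_pi("ATGCATGCAT", "ATGAAT-CAT")
theta = watterson_theta(segregating_sites=1, sites=9, n_sequences=2)
print(pi, theta, math.isclose(pi, theta))
```

For n = 2 the correction factor a1 equals 1, so both estimators give the number of differences per site compared.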

Despite the elevated number of polymorphic sites, the coding regions were highly conserved; only two SNPs were found in the coding sequences of the three FT genes. The first is a silent substitution (T/C) in the first exon of FTa1, located 60 bases after the ATG codon. The second, a G269T transversion, affects the last nucleotide of the second exon and introduces the amino acid change Trp90Leu.
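The relationship between the nucleotide change G269T and the amino acid change Trp90Leu can be checked with a short sketch. The helper names are hypothetical. The reference codon is inferred rather than quoted from the sequence data: TGG is the only tryptophan codon in the standard genetic code, so a G to T change at its second position must yield TTG (leucine).

```python
PURINES = {"A", "G"}

# Only the two codons needed here, from the standard genetic code.
CODON_TO_AA = {"TGG": "Trp", "TTG": "Leu"}


def locate_in_cds(position: int) -> tuple[int, int]:
    """Return (codon number, position within codon) for a 1-based CDS position."""
    if position < 1:
        raise ValueError("CDS positions are 1-based")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def mutate_codon(codon: str, position_in_codon: int, new_base: str) -> str:
    """Replace one base (1-based position) of a codon."""
    bases = list(codon)
    bases[position_in_codon - 1] = new_base
    return "".join(bases)


def is_transversion(ref: str, alt: str) -> bool:
    """True when a purine is exchanged for a pyrimidine or vice versa."""
    return (ref in PURINES) != (alt in PURINES)


codon_number, position_in_codon = locate_in_cds(269)
mutant = mutate_codon("TGG", position_in_codon, "T")
print(codon_number, position_in_codon, CODON_TO_AA["TGG"], "->", CODON_TO_AA[mutant])
print(is_transversion("G", "T"), is_transversion("T", "C"))
```

Position 269 falls on the second base of codon 90, which is consistent with the reported Trp90Leu change, and G to T is a purine to pyrimidine exchange, whereas the silent T/C substitution is a transition.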


Figure 6.4 Diagram of the FTa1-FTa2 cluster (A) and the FTc gene (B) showing the position of all SNPs (orange lines) and insertion-deletions (indels, blue lines) found between sequences obtained from wild accession PI489777 and desi cultivar ICC4958. SNPs and indels in coding or UTR regions in any of the three genes are highlighted in red. mRNA of the different genes is represented by green boxes (exons) and lines (introns) while the ncRNA is indicated in red following the same scheme. A retrotransposon found in the third FTa2 intron in ICC4958 is depicted as a grey rectangle.

Table 6.4 Summary of nucleotide diversity (π) and indel diversity (πi) parameters as well as the number of SNPs and indel events found among the different regions within the FT cluster of C. arietinum and C. reticulatum. Since only 2 sequences were used, θ and π values are similar and therefore only π is shown.

Region                       Size (bp)   Nucleotide diversity (π)   Number of SNPs   Number of indels   Indel diversity (πi)
FTa1 promoter                11526       0.02176                    234              30                 0.00269
FTa1 gene (a)                3991        0.00603                    24               8                  0.002
FTa1 total                   15517       0.01753                    258              38                 0.00251
FTa1/a2 intergenic region    4466        0.00352                    13               8                  0.00202
FTa2 gene (a)                17039       0.00212                    25               11                 0.00065
FTc promoter                 8075        0.0035                     28               8                  0.00099
FTc gene (a)                 7348        0.00515                    35               7                  0.00109
FTc 3’ region                4603        0.0013                     6                1                  0.00043
FTc total                    20026       0.00359                    69               16                 0.0009
Total                        57048       0.00736                    365              73                 0.00135

(a) Genomic region comprising transcribed exons and introns
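As a quick check on how the rows of Table 6.4 relate to one another, the sketch below re-derives the cluster totals from the seven individual regions and computes a raw SNP density per kilobase. The sizes and counts are copied from the table; the dictionary layout and function names are illustrative assumptions. The raw density is not the same quantity as π, which may have been computed over a different set of sites (for example, excluding alignment gaps), so the two should not be expected to coincide.

```python
# (size in bp, number of SNPs, number of indels), copied from Table 6.4.
REGIONS = {
    "FTa1 promoter": (11526, 234, 30),
    "FTa1 gene": (3991, 24, 8),
    "FTa1/a2 intergenic region": (4466, 13, 8),
    "FTa2 gene": (17039, 25, 11),
    "FTc promoter": (8075, 28, 8),
    "FTc gene": (7348, 35, 7),
    "FTc 3' region": (4603, 6, 1),
}


def cluster_totals(regions: dict) -> tuple[int, int, int]:
    """Sum size, SNPs and indels over all regions."""
    sizes, snps, indels = zip(*regions.values())
    return sum(sizes), sum(snps), sum(indels)


def snps_per_kb(region: str) -> float:
    """Raw SNP density: SNP count divided by region size in kb (gaps not excluded)."""
    size, snps, _ = REGIONS[region]
    return snps / (size / 1000.0)


print(cluster_totals(REGIONS))
for name in REGIONS:
    print(f"{name}: {snps_per_kb(name):.1f} SNPs per kb")
```

The seven regions are contiguous and non-overlapping, so their sizes and counts add up to the values in the Total row.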

Polymorphisms were also found in the untranslated regions (UTRs) of the different FT genes. A SNP (T/C) was found in the 3’ UTR of FTa1, very close to the end of the mRNA. In the 3’ UTR of FTa2, a 1-base indel was found in a run of 10 T residues (the cultivated allele possesses an extra T). The 5’ UTR of the FTc mRNA contains a microsatellite (AT repeat) that varies between alleles: the wild accessions possess 11 repeats of this motif, whereas the cultivated accession has 13. Since the functional implications of these changes cannot currently be predicted, further research would be needed to assess them.
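Length variation of this kind, in a microsatellite or a homopolymer run, can be scored with a simple pattern search. The sketch below counts the longest uninterrupted tandem run of a motif. The function name and the flanking bases in the example sequences are hypothetical placeholders; only the repeat numbers (11 and 13 AT units, and 10 or 11 T residues) are taken from the text.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the number of motif copies in the longest uninterrupted tandem run."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [match.group(0) for match in pattern.finditer(sequence.upper())]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(motif)


# Hypothetical flanking bases around the repeats described in the text.
wild_ftc_utr = "GGC" + "AT" * 11 + "CGA"
cultivated_ftc_utr = "GGC" + "AT" * 13 + "CGA"
wild_fta2_utr = "C" + "T" * 10 + "G"
cultivated_fta2_utr = "C" + "T" * 11 + "G"

print(longest_tandem_run(wild_ftc_utr, "AT"), longest_tandem_run(cultivated_ftc_utr, "AT"))
print(longest_tandem_run(wild_fta2_utr, "T"), longest_tandem_run(cultivated_fta2_utr, "T"))
```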

FTa2 retrotransposon insertion.

According to the annotation of the CaFT2 gene in NCBI (Gene ID 101496618), this gene possesses a distinct and unusual genomic structure: whereas FT genes in general have a conserved genomic structure with 3 introns in the coding region, FTa2 has an extended 5'UTR containing three additional introns (see Chapter 3, section 3.3.1). The most significant difference observed between the FT clusters of accessions PI489777 and ICC4958 was a large insertion of 5219 bp located just 508 bp upstream of the FTa2 coding region in ICC4958 (Fig. 6.4), and situated within the third of these additional FTa2 introns. This insertion contained a large open reading frame (ORF) of 4521 bp encoding a protein with several domains characteristic of a retroelement (retrotransposon or retrovirus): a reverse transcriptase, an integrase (which mediates integration of a DNA copy of the viral genome into the host chromosome), a gag gene (which encodes structural proteins that form the virus-like particle inside which reverse transcription takes place) and a ribonuclease H (responsible for degradation of the original RNA template, generation of the polypurine tract and final removal of RNA primers from newly synthesized DNA strands). The presence of Long Terminal Repeats (LTRs) and the phylogenetic relationships of the different domains identified this element as an LTR-retrotransposon of the Ty1/Copia type (Finnegan 2012; Ustyantsev et al. 2015).
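The ORF analysis mentioned above can be illustrated with a minimal scanner that reports the longest ATG-to-stop ORF over the six reading frames. This is a sketch under simple assumptions (standard stop codons, ATG as the only start codon, uppercase A/C/G/T input) and is not the tool used in this study, which is not named in this section. The function names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def longest_orf(sequence: str) -> str:
    """Return the longest ORF (ATG through stop codon, inclusive) in six frames."""
    best = ""
    for strand in (sequence.upper(), reverse_complement(sequence.upper())):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    return best


# Hypothetical toy sequence containing one short ORF.
toy_orf = longest_orf("CC" + "ATG" + "GCT" * 4 + "TAA" + "GG")
print(toy_orf, len(toy_orf), "bp,", len(toy_orf) // 3 - 1, "amino acids")
```

An ORF length that includes the stop codon is always a multiple of three; the 4521 bp reported for the RT1 ORF corresponds to 1507 codons.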

In addition to this element (hereafter referred to as Retrotransposon 1, or RT1), the ORF analysis identified two more retrotransposons within the FT cluster: a second in the FTa2-FTc intergenic region (RT2, 5265 bp) and a third (RT3, 5319 bp) in the third intron of FTc (Fig 6.5). Both RT2 and RT3 possess LTRs flanking their coding regions and have domains similar to those present in RT1, so it is assumed that they all belong to the same type of retrotransposon (for further information about the position, BLAST E-values and a brief description of the different domains present in each retrotransposon, see Appendix 6.3).

Figure 6.5 Schematic representation of the FTa1-FTa2-FTc cluster showing the position of the three retrotransposons (RT) found within it. mRNA from FT genes are in green, ncRNA in red, with arrows representing exons and lines introns. The coding regions of the RT are indicated by blue boxes and LTRs by light green boxes.

The RT2 and RT3 insertions are present in both PI489777 and ICC4958, indicating that they took place prior to domestication and therefore cannot be the cause of any differential behaviour observed between the two species. RT1, on the other hand, is present only in the cultivated lines, which suggests a domestication-related event.

Insertions of transposable elements can influence gene regulation in multiple ways (Galindo-González et al. 2017; Cui and Cao 2015), so in the next section, I will investigate the possibility of mRNA alteration.

FTa2 alternative splicing

To confirm that the unusual transcript of FTa2 was indeed expressed, and to test whether the RT1 insertion might modify its splicing pattern, different primer combinations targeting FTa2 were tested on cDNA of both wild and cultivated accessions (Fig 6.6).

As expected, a single band was obtained for several different primer combinations within the coding region when cDNA was used as template (Fig 6.6, B and C). However, several bands were observed when the forward primer was placed in any of the additional exons within the 5'UTR (Fig 6.6 D and E). A pattern consisting of two equally strong bands differing in size by ≈200 bp, plus several other larger, fainter bands, was found in leaves of both cultivated (ICC4958) and wild (PI489777) accessions, and confirmed in two more accessions, one of each species (cultivated ICCL81001 and wild Cr5-9, data not shown). The pattern was independent of age or photoperiod under the conditions used in this study (Figure 6.6 D and E). These results confirm the integrity of the annotated CaFTa2 structure and also show that the splicing pattern of this gene is not affected by the presence of the retrotransposon insertion RT1.

Figure 6.6 (A) Representation of the FTa2 genomic sequence and the position of primers used. mRNA and coding region are indicated by green and yellow boxes, respectively. The length of intron 3 (which contains the retrotransposon, not shown for this reason) has been reduced for graphical representation. (B to E) Visualization of the PCR products obtained with different primer pairs. Hyperladder IV (B to D) or Easy ladder I (E) was used as size marker (Bioline). Lanes 1 and 2: genomic DNA from accessions ICC4958 (1) and PI489777 (2). Lanes 3 to 7: cDNA from ICC4958 leaves harvested when plants were four (3, 5), six (4, 6) or eight (7) weeks old. Lane 8: cDNA from PI489777 leaves harvested from twelve-week-old plants. Plants were grown in either long days (16 h photoperiod, lanes 3, 4, 7 and 8) or short days (8 h photoperiod, lanes 5 and 6).

To further investigate the identity of the multiple bands observed in both species, the PCR products obtained using the primers F7/R7 were cloned and sequenced, from accession ICC4958 only. Alternative splicing of this gene was already predicted by sequence data in NCBI, with 3 different mRNA isoforms described (Fig 6.7). The analysis of sequences from ICC4958 revealed the presence of two of these isoforms and 8 other distinct transcripts. The most significant changes compared with the canonical transcript were the loss of the fourth (first coding) exon and the inclusion of new exons between exons 6 and 7 (Figure 6.7). According to NCBI, the last exon has a long (484 bp) and a short (55 bp) variant. We detected 8 novel transcripts using a primer combination that amplifies only those spliced forms with the long variant. This means that the number of FTa2 isoforms is higher than predicted, and further variants could be obtained with a different primer combination.

Figure 6.7 Alternative splicing of chickpea FTa2. The 3 different mRNAs and coding regions annotated in NCBI are indicated in green and yellow, respectively. Eleven different mRNAs were sequenced in this study, which are indicated in grey. Numbers over the exons correspond to size in bp.
