Development of SSR Markers - BLAST algorithm

5. DISCUSSION

5.1 Development of SSR Markers

SSR markers have become the markers of choice for plant genetics and breeding applications. Although hundreds of SSR markers have been isolated in chickpea using SSR-enriched library (Hüttel et al. 1999; Sethy et al. 2006a, 2006b; Winter et al. 1999) or BAC-end sequence (Lichtenzveig et al. 2005) approaches, the narrow genetic background of cultivated chickpea germplasm demands the development of SSR markers in large numbers so that they can be used in chickpea genetics and breeding. With the objective of enriching the SSR marker repertoire, an SSR-enriched library for GA and TAA repeat motifs was constructed from the C. arietinum genotype ICC 4958, which is used by the chickpea community as a reference genotype for developing genomic and genetic resources.

Sequencing of 307 putative SSR-positive clones yielded 457 non-redundant genome survey sequences (GSSs), of which 299 (65.4%) contained a total of 643 SSRs. This SSR isolation efficiency of about 65% is comparable with that of other studies, such as in groundnut, where 490 SSRs were found in 720 SSR-positive clones (68%) (Cuc et al. 2008). Among the different classes of SSRs, tri-nucleotide (40%) and di-nucleotide (39%) motifs constituted the major proportion, followed by mono-nucleotide (16%) and tetra-nucleotide (3%) motifs. A similar distribution of SSR classes was observed in other SSR isolation studies in chickpea (Hüttel et al. 1999; Winter et al. 1999; Lichtenzveig et al. 2005). In terms of motif abundance, TAA/ATT repeats were the most abundant, followed by GA/CT. These observations are not surprising because: (1) the library was enriched for TAA and GA repeat motifs, and (2) TAA repeat motifs have been reported to be abundant in earlier studies in chickpea (Hüttel et al. 1999; Winter et al. 1999; Lichtenzveig et al. 2005) as well as in other legume or plant species such as soybean (Akkaya et al. 1992; Cregan et al. 1994), Medicago (Mun et al. 2006), tomato (Smulders et al. 1997) and pine (Echt and May-Marquardt 1997). This was also illustrated in a comparative study on SSRs in ESTs of different legume and cereal species (Jayashree et al. 2006) and in an in silico sequence analysis among some cereal species (Varshney et al. 2002).
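The class breakdown above (mono-, di-, tri-, tetra-nucleotide and so on) can be illustrated with a minimal sketch of a perfect-SSR scan. This is not the search tool used in the study: the minimum repeat counts in `MIN_REPEATS`, the function names and the toy sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length; the study's own
# search criteria are not stated in this section.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'TATA')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_perfect_ssrs(seq):
    """Return a list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def class_counts(sequences):
    """Count SSRs per motif-length class across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for _start, motif, _repeats in find_perfect_ssrs(seq):
            counts[CLASS_NAMES[len(motif)]] += 1
    return counts


# Toy sequence (invented): one (TAA)6 tract and one (GA)7 tract.
toy = ["GC" + "TAA" * 6 + "CGCG" + "GA" * 7 + "TC"]
print(sorted(find_perfect_ssrs(toy[0])))
print(dict(class_counts(toy)))
```

Dividing each class count by the total number of SSRs gives class proportions of the kind reported above. The sketch does not detect compound or imperfect SSRs.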

In the case of BAC-end sequences (BESs), only 5,123 of 46,270 BESs were found to contain SSRs, 6,845 in total, corresponding to one SSR per 4.85 kb. Unlike the SSRs derived from the microsatellite-enriched library, di-nucleotide SSRs (mainly “AT” rich) were more abundant than tri-nucleotide repeats among the identified BES-SSRs. This observation corroborates the abundance of AT repeat motifs in the chickpea genome. BAC libraries have been targeted for isolation of SSRs in chickpea earlier (Rajesh et al. 2004; Lichtenzveig et al. 2005). BES-derived SSR markers have also been developed in several other legume species, such as Medicago (Nam et al. 1999), soybean (Cregan et al. 1999) and common bean (Vanhouten and MacKenzie 1999). One of the most important advantages of BES-SSR markers over genomic or EST-derived markers is that they serve as anchor points between genetic and physical maps. Such linkages have been shown in several crops, such as rice (Tao et al. 2001; Chen et al. 2002), maize (Coe et al. 2002), cotton (Xu et al. 2008) and melon (González et al. 2010). Because no physical map was available for chickpea when this study was initiated, the BES-SSR markers developed here are a useful resource for linking the genetic map of chickpea with a future physical map.
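As a quick arithmetic check on the BES figures quoted above, the following sketch derives the share of BESs that carry SSRs and the total sequence length implied by the reported density. The total length and mean read length are back-calculated here from the reported numbers; they are not values stated in the text.

```python
# Figures reported in the text.
bes_total = 46_270        # BAC-end sequences scanned
bes_with_ssr = 5_123      # BESs containing at least one SSR
ssr_total = 6_845         # SSRs identified
kb_per_ssr = 4.85         # reported density: one SSR per 4.85 kb

# Share of BESs carrying at least one SSR.
pct_with_ssr = 100.0 * bes_with_ssr / bes_total

# Back-calculated (implied) values, not stated in the text.
implied_total_kb = ssr_total * kb_per_ssr
implied_mean_bes_bp = 1000.0 * implied_total_kb / bes_total

print(f"BESs with SSRs: {pct_with_ssr:.1f}%")
print(f"Implied sequence scanned: {implied_total_kb / 1000:.1f} Mb")
print(f"Implied mean BES length: {implied_mean_bes_bp:.0f} bp")
```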

SSRs identified through both the enriched library (ICCM series) and BAC-end sequences (CaM series) were classified by the length of the SSR tract into Class I SSRs (>20 nucleotides in length) and Class II SSRs, containing perfect SSRs (>12 but <20 nucleotides in length) (Temnykh et al. 2001). Among ICCM SSRs, Class I SSRs were dominated by tri-nucleotide repeats (77%), followed by di-nucleotide repeats (14%), whereas Class II SSRs had more penta-nucleotide repeats (55%), followed by hexa-nucleotide repeats (23%). Similarly, Class I CaM SSRs included a higher proportion of di-nucleotide repeats (42.7%), followed by tri-nucleotide repeats (26%), whereas Class II CaM SSRs had the highest contribution from penta-nucleotides (65.3%), followed by hexa-nucleotides (26.1%). Information on this aspect of SSRs is important for selecting potentially polymorphic SSR markers. In a parallel study carried out at ICRISAT, the ICCM and CaM markers were screened on 48 genotypes of chickpea (for detailed results, see Nayak et al. 2010). That study revealed that, for ICCM markers, the average PIC value of Class I SSRs (0.38) was higher than that of Class II SSRs (0.22), demonstrating the greater potential of Class I SSRs over Class II SSRs. Similarly, for CaM markers, the average PIC value of Class I SSRs (0.21) was higher than that of Class II SSRs (0.11). Similar results were reported in an earlier study in rice by Temnykh et al. (2001). The majority of the Class I SSRs contain tri-nucleotide repeats, indicating the importance of tri-nucleotide repeat motifs over others. In general, tri-nucleotide repeats have been considered the most polymorphic sites (Varshney et al. 2005a). In addition to tri-nucleotide repeats, compound SSRs constituted the majority of polymorphic markers in the present study. PIC values of compound SSRs (average PIC of ICCM = 0.29 and CaM = 0.27) were comparable to those of tri-nucleotide repeats. This can be attributed to the fact that markers with compound SSRs have more than one SSR motif, which increases their chances of being polymorphic (Ghislain et al. 2004).
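The two quantities used above, the length-based SSR class and the PIC value, can be sketched as follows. The class thresholds are taken exactly as quoted in the text. The PIC formula shown is the commonly used one of Botstein et al. (1980); this section does not state which PIC formula the parallel study used, so that choice, the function names and the toy allele calls are assumptions.

```python
from collections import Counter
from itertools import combinations


def ssr_class(tract_length):
    """Classify an SSR tract by length, using the thresholds quoted in the text."""
    if tract_length > 20:
        return "I"
    if 12 < tract_length < 20:
        return "II"
    return None  # outside both ranges as the text states them


def pic(alleles):
    """Polymorphism information content for one marker.

    alleles: one allele call per genotype, missing data already removed.
    Assumed formula (Botstein et al. 1980):
        PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Toy allele calls (invented): fragment sizes in bp across eight genotypes.
calls = [180, 180, 183, 183, 183, 186, 180, 183]
print(ssr_class(24), ssr_class(15))
print(round(pic(calls), 3))
```

A monomorphic marker gives a PIC of 0, and PIC rises as the number of alleles increases and their frequencies become more even, which is why it is used here to compare Class I and Class II SSRs.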

While integrating the newly developed ICCM and CaM series SSR markers into the inter-specific genetic map, the H-series markers derived from BAC libraries (Lichtenzveig et al. 2005) were also tested. During testing of the primer pairs for newly isolated SSR markers, a large number were lost for reasons such as clone duplication, chimera formation and lack of flanking sequence for primer design. This loss of primer pairs during marker development was termed “attrition” by Squirrell et al. (2003). The total attrition rate of ICCM markers was 77.8% (only 22.2% were polymorphic) in the inter-specific mapping population and 90.2% (only 9.8% were polymorphic) in the intra-specific mapping population. The attrition rate of CaM markers was about 79.2% (20.8% were polymorphic) in the inter-specific mapping population and 91.1% (8.9% were polymorphic) in the intra-specific mapping populations. This clearly shows that the attrition rates of primer pairs derived from the microsatellite-enriched library and from BESs were very similar. Similar observations were made in rye, where a comparison of a microsatellite-enriched library and BESs showed only minor differences in attrition rates, and a very high attrition rate (about 90%) was obtained when chromosome-specific microsatellites were studied (Kofler et al. 2008); in cucurbits the attrition rate was 80% (Gong et al. 2008). When a large number of SSR markers is required, losses due to clone duplication, chimera formation, lack of flanking sequence and poor amplification of PCR primers are all encountered, and can lead to massive attrition rates relative to the initial number of clones sequenced (up to 90%, Squirrell et al. 2003). However, with accurate reporting of attrition rates at each step, the SSR development process can be further refined to give greater efficiency of marker production.
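The attrition figures above are simply the complement of the polymorphic fraction. A minimal sketch of that arithmetic is given below, using only the percentages reported in the text; the function name, dictionary layout and the marker counts in the last line are hypothetical and serve only to show the count-based form.

```python
def attrition_pct(n_start, n_usable):
    """Attrition as the percentage of starting primer pairs that did not
    yield a usable (here: polymorphic) marker."""
    return 100.0 * (n_start - n_usable) / n_start


# Polymorphic percentages reported in the text, keyed by
# (marker series, mapping population).
polymorphic_pct = {
    ("ICCM", "inter-specific"): 22.2,
    ("ICCM", "intra-specific"): 9.8,
    ("CaM", "inter-specific"): 20.8,
    ("CaM", "intra-specific"): 8.9,
}

# Attrition is 100% minus the polymorphic share.
attrition = {key: round(100.0 - pct, 1) for key, pct in polymorphic_pct.items()}
for (series, population), value in attrition.items():
    print(f"{series:5s} {population:15s} attrition = {value}%")

# Count-based form with hypothetical numbers: 222 usable out of 1000 tested.
print(round(attrition_pct(1000, 222), 1))
```

Recording `n_start` and `n_usable` at each stage (sequencing, primer design, amplification, polymorphism screening) would give the per-step attrition that the paragraph recommends reporting.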

5.1.1 Functional annotation of GSSs developed from the SSR enriched library

Similarity analysis was performed for the 457 GSSs obtained from the SSR-enriched library using the BLASTN and BLASTX algorithms, and significant similarity was determined at an Expect value (E-value) threshold of ≤1E–05. Relatively few of the GSSs had E-values that met this threshold, irrespective of the species data set under analysis. This is consistent with the expectation that randomly selected short genomic sequences only occasionally correspond to gene coding regions that will match EST data sets. Nevertheless, in cases where BLAST hits with E-values below the 1E–05 threshold were recorded, the degree of similarity, expressed as either nucleotide identity or deduced protein similarity, was highest for phylogenetically related species, decreasing in rank order of phylogenetic distance (i.e., Medicago > Lotus > soybean = cowpea = common bean > poplar > Arabidopsis > rice). Among these sequences, 40 were identified as related sequences in all three analyzed cool season legumes, i.e., chickpea, Medicago and Lotus (Hologalegina clade), while 29 sequences had similarity with all three analyzed warm season legumes, i.e., soybean, common bean and cowpea (Phaseoleae clade). Only 21 sequences were identified as similar sequences in both Hologalegina and Phaseoleae species. Two of these GSSs (FI856609 and FI856659) showed significant similarity with sequences of all the plant species analyzed in the present study. This observation is consistent with the evolutionary taxonomy of the family Leguminosae (subfamily Papilionoideae): crops like Medicago and Lotus are taxonomically related to chickpea, and therefore higher sequence similarity was observed with these species. The phylogenetic relationships in Leguminosae based on evolutionary taxonomy (Doyle 1994) and recent molecular analysis (Kajita et al. 2001) show that the Hologalegina clade of the Leguminosae phylogenetic tree (Doyle and Luckow 2003; Wojciechowski et al. 2004) consists of the legume tribes Loteae (Lotus) and Robinieae (Sesbania) and the Inverted Repeat Lacking Clade (IRLC), the last of which is characterized by loss of the 25 kb chloroplast inverted repeat (Medicago, Pisum, Trifolium, Cicer) and represents economically important cool season legumes. Although fewer than one-third of the SSR-associated GSSs had significant hits in these databases, the derived annotations add a potentially useful data type to the marker metadata. Not surprisingly, chickpea GSSs (from which the SSRs were developed) had higher similarity with ESTs of other legume species and of dicot outgroups (i.e., poplar and Arabidopsis) than with monocot (i.e., rice) data sets.
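The E-value filtering and the cross-species comparison described above can be sketched as follows. The sketch assumes BLAST results in the standard 12-column tabular layout (query ID in the first column, E-value in the eleventh); the text does not say which output format was used, and the query IDs, subject IDs and hit values in the toy data are invented.

```python
import csv
import io

EVALUE_MAX = 1e-5   # significance threshold stated in the text
EVALUE_COL = 10     # 0-based index of the E-value in 12-column tabular output


def significant_queries(tabular_text, evalue_max=EVALUE_MAX):
    """Return the set of query IDs with at least one hit at or below evalue_max."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[EVALUE_COL]) <= evalue_max:
            hits.add(row[0])
    return hits


def shared_across(hits_by_species, species):
    """Query IDs with a significant hit in every one of the listed species."""
    return set.intersection(*(hits_by_species[s] for s in species))


# Toy tabular results (invented IDs and values), one string per species.
toy_results = {
    "Medicago": (
        "gss1\tm1\t95.0\t200\t10\t0\t1\t200\t1\t200\t2e-30\t150\n"
        "gss2\tm2\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t20\n"
    ),
    "Lotus": (
        "gss1\tl1\t90.0\t180\t18\t0\t1\t180\t1\t180\t1e-12\t80\n"
        "gss3\tl2\t88.0\t120\t14\t0\t1\t120\t1\t120\t1e-06\t60\n"
    ),
}

hits_by_species = {sp: significant_queries(txt) for sp, txt in toy_results.items()}
print(hits_by_species)
print(shared_across(hits_by_species, ["Medicago", "Lotus"]))
```

Counts such as the 40 GSSs shared by the Hologalegina species or the 29 shared by the Phaseoleae species correspond to the size of the set returned by `shared_across` for the relevant species list.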

In document Identification of QTLS and Genes for Drought Tolerance Using Linkage Mapping and Association Mapping Approaches in Chickpea (Cicer arietinum) (Page 163-168)