Chapter 4. Identifying the chromosomal location of I-7 by combining SNP
4.3 Results
4.3.2 SNP analysis of the RNA-seq data
The quality-trimmed RNA-seq reads from the three mock-inoculated Tristar and three mock-inoculated M82 samples were mapped against the tomato Heinz 1706 reference transcriptome using CLC Genomics Workbench 4.0. A similar proportion of reads from each cultivar aligned to the reference transcripts: 86.3% of Tristar reads and 86.6% of M82 reads (Table 4.4). The mapped data were then used for SNP analysis, which revealed a large number of SNPs in the root transcripts of both cultivars relative to the reference transcriptome: 55,201 SNPs for Tristar and 55,076 SNPs for M82. SNPs supported by at least 75% of the mapped reads were extracted, giving 30,069 SNPs for Tristar and 31,391 SNPs for M82. All SNPs common to Tristar and M82 were then discarded. The remaining unique polymorphisms were used to calculate the SNP frequency in each transcript (number of SNPs per transcript / length of the transcript). Initially, transcripts with the highest frequency of SNPs were targeted for further analysis. However, SNP-based DNA marker analysis targeting 20 of the corresponding genes showed that these SNPs were not real (Appendix 4 shows the results of DNA marker analysis carried out on five of these genes, as an example). Further investigation revealed that most of these genes belonged to multigene families and that the apparently high frequency of SNPs in these genes was mostly due to mapping errors. This result suggested that there was a degree of background noise in SNP detection that might need to be resolved using more stringent parameters, but ultimately this was not required.
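The filtering and frequency calculation described above can be sketched as follows. This is a minimal illustration in Python, not the CLC Genomics Workbench procedure that was actually used. The `Snp` record, the function names and the definition of a "common" SNP (same transcript, same position, same variant base) are assumptions made for the example; only the 75% support threshold and the frequency formula come from the text.

```python
from collections import Counter
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record; field names are illustrative only."""
    transcript: str   # transcript identifier
    position: int     # 1-based position within the transcript
    alt: str          # variant base relative to the reference
    support: float    # fraction of mapped reads carrying the variant


def well_supported(snps, min_support=0.75):
    """Keep SNPs supported by at least `min_support` of the mapped reads."""
    return {
        (s.transcript, s.position, s.alt)
        for s in snps
        if s.support >= min_support
    }


def unique_snp_frequency(snps_a, snps_b, transcript_lengths, min_support=0.75):
    """SNPs per nucleotide for each transcript, counting only SNPs unique to line A.

    Assumption: a SNP is "common" to both lines when transcript, position
    and variant base all match after the support filter has been applied.
    Transcripts with no unique SNPs are simply absent from the result.
    """
    unique = well_supported(snps_a, min_support) - well_supported(snps_b, min_support)
    counts = Counter(transcript for transcript, _, _ in unique)
    return {t: n / transcript_lengths[t] for t, n in counts.items()}
```

Calling `unique_snp_frequency` once with Tristar as line A and once with M82 as line A gives the two data series compared in the rest of this section.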
Table 4.4 Illumina sequencing, mapping and SNP analysis output

Per sample:

Treatment      Number of reads   Average length   Reads after trimming   Average length after trimming
Tristar Mock   64,457,570        101              63,186,041             97.9
Tristar Mock   60,608,136        101              59,450,519             98
Tristar Mock   47,985,818        101              46,981,720             98
Tristar Fol3   41,725,488        101              40,899,007             98
Tristar Fol3   49,700,096        101              48,724,872             98
Tristar Fol3   43,690,670        101              42,770,555             98
M82 Mock       46,730,882        101              45,710,650             97.9
M82 Mock       40,509,366        101              39,675,598             97.9
M82 Mock       68,355,076        101              66,859,367             98
M82 Fol3       47,920,198        101              46,955,092             97.9
M82 Fol3       48,278,644        101              47,327,095             97
M82 Fol3       50,537,610        101              49,471,501             98
IL7-7 Fol3     156,653,590       101              153,848,232            99.3
IL7-7 Fol3     171,965,736       101              168,620,314            99
IL7-7 Fol3     173,343,270       101              170,315,871            99.3

Per treatment (three samples combined):

Treatment      Total number of reads   Total number of reads after trimming   Total number of sequences removed   Mapping (%)           Mapping: reads in pairs   Mapping: broken paired reads   Number of SNPs against reference genome
Tristar Mock   173,051,524             169,618,280                            3,433,244                           146,499,362 (86.3%)   136,311,994               7,504,088                      55,201
Tristar Fol3   135,116,254             132,394,434                            2,721,820                           -                     -                         -                              -
M82 Mock       155,595,324             152,245,615                            3,349,709                           131,851,910 (86.6%)   122,521,270               6,828,277                      55,076
M82 Fol3       146,736,452             143,753,688                            2,982,764                           -                     -                         -                              -
IL7-7 Fol3     501,962,596             492,784,417                            9,178,179                           -                     -                         -                              -
Taking a different approach, it was reasoned that Tristar transcripts from the introgressed S. pennellii DNA containing I-7 should carry many more SNPs than the corresponding transcripts from M82. To visualise and verify this idea, an additional RNA-seq data set obtained from the IL7-3 tomato line was used. The IL7-3 line (Figure 4.1) contains a defined introgression of S. pennellii DNA on chromosome 7 in the genetic background of S. lycopersicum cv. M82 (Eshed and Zamir, 1995). RNA samples from IL7-3 roots were sent for Illumina sequencing, along with the samples described in this project, as part of an I-3 characterisation project being carried out in parallel by Ann-Maree Catanzariti (Plant Disease Resistance Lab, Research School of Biology, Australian National University).
Figure 4.1 SNP analysis of transcripts from the IL7-3 introgression line reveals the S. pennellii introgression carrying I-3 on chromosome 7. A. Graphic representation of the S. pennellii introgression (green region) on chromosome 7 in the IL7-3 line, showing the position of the markers TG143 and TG20 (blue triangles), which define the extent of the introgression. B. Plot of SNP frequency per nucleotide for IL7-3 (green dots) and M82 (red dots) root transcripts relative to the chromosome 7 transcriptome. A large cluster of genes showing a higher frequency of SNPs can be seen between the markers TG143 and TG20 (blue triangles), which define the extent of the introgression. Gaps in the plot correspond to genes that are not expressed in roots or to gene numbers for which no corresponding genes have been assigned.
The IL7-3 sequencing reads were mapped against the tomato reference transcriptome and analysed for SNPs as described above for the Tristar and M82 samples. In this case, only SNPs on chromosome 7 were analysed. SNPs common to IL7-3 and M82 were discarded and the polymorphisms unique to each line were used to calculate the SNP frequency for each transcript. These data were then plotted as shown in Figure 4.1. The graph also shows the position of the two DNA markers, TG143 and TG20, that delimit the S. pennellii introgression in IL7-3. A large cluster of IL7-3 SNPs can be seen between these two markers, establishing a visual pattern that could be used as a precedent when looking for the I-7 introgression in Tristar.
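Plotting SNP frequency against gene position requires a chromosome and an ordinal gene number for each transcript. One way to obtain these, assuming the transcripts carry ITAG-style identifiers of the form Solyc08g077520 (the form used for the genes named in this section), is to parse the identifier itself. The helper below is a hypothetical Python sketch, not the procedure used to produce the figures; the assumed identifier layout is stated in the code comment.

```python
import re

# Assumed identifier layout: "Solyc" + 2-digit chromosome + "g" + 6-digit
# gene number, optionally followed by version suffixes such as ".2.1".
SOLYC_ID = re.compile(r"^Solyc(\d{2})g(\d{6})(?:\.\d+)*$")


def parse_solyc_id(identifier):
    """Return (chromosome, gene_number) for an ITAG-style identifier."""
    match = SOLYC_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    return int(match.group(1)), int(match.group(2))


def series_for_chromosome(frequencies, chromosome):
    """Return (gene_number, frequency) pairs for one chromosome, in gene order.

    `frequencies` maps transcript identifiers to SNP frequency per nucleotide.
    """
    points = []
    for identifier, freq in frequencies.items():
        chrom, number = parse_solyc_id(identifier)
        if chrom == chromosome:
            points.append((number, freq))
    return sorted(points)
```

Because only transcripts with a calculated frequency appear in the output, unexpressed genes and unassigned gene numbers show up as gaps along the x-axis, as noted in the figure legends.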
The Tristar data were analysed in the same way, bearing in mind that the genome-wide survey of this cultivar with CAPS and SCAR markers (Chapter 3) suggested that the S. pennellii introgression was likely to be much smaller than that in IL7-3. A plot of SNP frequency against gene position on each of the twelve tomato chromosomes revealed two clusters of Tristar genes with a higher frequency of SNPs, one on chromosome 8 and one on chromosome 11 (Figures 4.2 and 4.3). Tristar carries the I-2 gene on an introgressed region derived from S. pimpinellifolium on chromosome 11 (see Section 3.3.1). Because the cluster found on Tristar chromosome 11 corresponded to the region where I-2 is located (Figure 4.2), it was concluded that I-7 was not present in this region. On the other hand, the small cluster found on chromosome 8 seemed promising and was studied in more detail. This cluster comprised genes from Solyc08g077520 to Solyc08g077800 (Figure 4.3).
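The clusters described above were identified by visual inspection of the plots. For illustration only, a comparable scan could be automated by looking for runs of consecutive expressed genes whose SNP frequency exceeds a cut-off. The Python sketch below is an assumed alternative, not the method used in this work; the function name, the `threshold` value and the `min_genes` parameter are hypothetical.

```python
def find_clusters(points, threshold, min_genes=3):
    """Find runs of consecutive genes with SNP frequency above `threshold`.

    `points` is a list of (gene_number, snp_frequency) pairs sorted by gene
    number. Returns a list of (first_gene, last_gene, n_genes) tuples, one
    per run containing at least `min_genes` genes.
    """
    clusters = []
    run = []
    for gene, freq in points:
        if freq > threshold:
            run.append(gene)
            continue
        # The run has ended; keep it only if it is long enough.
        if len(run) >= min_genes:
            clusters.append((run[0], run[-1], len(run)))
        run = []
    # Handle a run that extends to the end of the chromosome.
    if len(run) >= min_genes:
        clusters.append((run[0], run[-1], len(run)))
    return clusters
```

Requiring several adjacent genes, rather than flagging single transcripts, is a deliberate choice here: isolated high-frequency transcripts were shown earlier in this section to arise largely from mapping errors in multigene families, whereas an introgression should raise the SNP frequency across a contiguous block of genes.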
Figure 4.2 SNP analysis of transcripts from Tristar reveals the S. pimpinellifolium introgression carrying I-2 on chromosome 11. A. Graphic representation of the Tomato-EXPEN 2000 map of chromosome 11 showing the approximate location of the I-2 gene. B. Plot of SNP frequency per nucleotide for Tristar (blue dots) and M82 (red dots) root transcripts relative to the chromosome 11 transcriptome. A small cluster of genes showing a higher frequency of SNPs (oval) can be seen corresponding to the introgressed region in Tristar that carries the I-2 gene from S. pimpinellifolium. Gaps in the plot correspond to genes that are not expressed in roots or to numbers for which no corresponding genes have been assigned.
Figure 4.3 SNP analysis of transcripts from Tristar reveals a small S. pennellii introgression on chromosome 8. Plot of SNP frequency per nucleotide for Tristar (blue dots) and M82 (red dots) root transcripts relative to the chromosome 8 transcriptome, revealing a small cluster of genes with a higher frequency of SNPs (oval). Gaps in the plot correspond to genes that are not expressed in roots or to numbers for which no corresponding genes have been assigned.
4.3.3 CAPS marker analysis confirms the SNP data and reveals the