Genotyping by sequencing - Investigating salt stress resilience in Brassica oleracea

2.1 Methods

2.1.9 Genotyping by sequencing

A leaf disc of approximately 1 cm² was collected from each of ninety-six (96) B. oleracea accessions from the C genome diversity set (http://www2.warwick.ac.uk/fac/sci/lifesci/research/vegin/brassica/bcgdffs/bcgdfslinessumm.pdf). Discs were taken from true leaf seven when the plants were approx. 2 months old. Sampling was carried out by Hussien Gherli (Ph.D.) under the VeGIN project, Warwick Crop Centre, Wellesbourne.

2.1.9.2 GBS protocol

For the GBS protocol using next-generation sequencing (NGS), 300 ng of high-molecular-weight, RNA-free DNA at a concentration of 10-50 ng/µl was used. The restriction enzyme ApeKI was used for digestion, followed by the addition of adaptors including the barcodes, giving an insert size of ~130 bp. The sequencing mode was 1 x 75 bp single-end reads, yielding 290 million reads in total and ~3 million read pairs per sample.
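As a quick consistency check on the figures above, the following minimal Python sketch works through the arithmetic: the volume needed to supply 300 ng at the stated concentration range, and the average per-sample yield if the 290 million reads are shared evenly among the 96 accessions. The even split is an assumption made for illustration; real per-barcode yields vary.

```python
# Worked arithmetic for the GBS input and sequencing figures quoted in the text.
# Assumption (not from the source): reads are spread evenly across the 96 samples.

DNA_MASS_NG = 300                   # DNA input
CONC_RANGE_NG_PER_UL = (10, 50)     # accepted concentration range
TOTAL_READS = 290_000_000
N_SAMPLES = 96


def volume_ul(mass_ng, conc_ng_per_ul):
    """Volume (microlitres) that contains mass_ng at the given concentration."""
    return mass_ng / conc_ng_per_ul


max_volume = volume_ul(DNA_MASS_NG, CONC_RANGE_NG_PER_UL[0])  # most dilute sample
min_volume = volume_ul(DNA_MASS_NG, CONC_RANGE_NG_PER_UL[1])  # most concentrated sample
reads_per_sample = TOTAL_READS / N_SAMPLES

print(f"Volume needed: {min_volume:.0f}-{max_volume:.0f} microlitres")
print(f"Average reads per sample: {reads_per_sample / 1e6:.2f} million")
```

Under these assumptions the input corresponds to 6-30 µl of DNA solution, and the average yield is about 3.0 million reads per sample, in line with the ~3 million quoted.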

2.2.10 Data Analysis

2.2.10.1 Morphological traits

The morphological traits measured were analysed in Excel (2016). The mean, SD and SEM were calculated, and treated and control plants were compared using a Student's t-test, with significance considered at p ≤ 0.05 and p ≤ 0.001. Comparisons were made within lines only and not between lines, i.e. treated A vs control A only. Correlation analysis between the different morphological traits was also carried out to identify the level of correlation under salt stress conditions.
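The within-line comparison described above can be sketched in a few lines of Python. This is an illustration only, not the Excel workbook used in the study: it assumes an unpaired, equal-variance (pooled) Student's t-test, and the trait values are hypothetical. Only the t statistic and degrees of freedom are computed here; the p-value would be read from the t distribution (for example with `scipy.stats.ttest_ind`).

```python
import math
import statistics


def summarise(values):
    """Return mean, sample SD and SEM for one group of measurements."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem


def student_t(treated, control):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(treated), len(control)
    var1 = statistics.variance(treated)
    var2 = statistics.variance(control)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(treated) - statistics.mean(control)) / se_diff
    return t, df


# Hypothetical trait values for a single line: treated A vs control A.
treated_a = [12.1, 11.4, 10.9, 12.6, 11.8]
control_a = [14.2, 13.8, 15.1, 14.6, 13.9]

mean, sd, sem = summarise(treated_a)
t, df = student_t(treated_a, control_a)
print(f"treated A: mean={mean:.2f}, SD={sd:.2f}, SEM={sem:.2f}")
print(f"t={t:.3f}, df={df}")
```

Each line would be passed through `student_t` separately, so that treated and control plants are only ever compared within the same line.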

2.2.10.2 Physiological traits

The concentrations of Na⁺, K⁺ and Ca²⁺ were determined from three replicates per line, for which the mean, SD and SEM were calculated, and the K⁺/Na⁺ and Ca²⁺/Na⁺ ratios were then determined. A Student's t-test was conducted to determine the level of statistical significance, considered at p ≤ 0.05 and p ≤ 0.001. Linear regression analysis between parameters was carried out in R to test the level of relationship. Principal component analysis (PCA) and clustering analysis were performed on the morphological and physiological traits under salt stress.
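The ratio and regression steps can be illustrated with a short sketch. The regression in the study was run in R; the Python version below is only an equivalent ordinary least-squares fit written out by hand, and the ion concentrations, units and line names are hypothetical.

```python
import statistics


def ion_ratios(na, k, ca):
    """Return (K+/Na+, Ca2+/Na+) from mean ion concentrations."""
    return k / na, ca / na


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; also returns R^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical ion concentrations: three replicates per line, arbitrary units.
lines = {
    "line_A": {"Na": [10.0, 12.0, 11.0], "K": [22.0, 20.0, 24.0], "Ca": [5.0, 6.0, 5.5]},
    "line_B": {"Na": [20.0, 19.0, 21.0], "K": [18.0, 17.0, 19.0], "Ca": [4.0, 4.5, 3.5]},
    "line_C": {"Na": [15.0, 16.0, 14.0], "K": [21.0, 19.0, 20.0], "Ca": [5.0, 4.5, 5.5]},
}

na_means, k_na_ratios = [], []
for name, ions in lines.items():
    means = {ion: statistics.mean(values) for ion, values in ions.items()}
    k_na, ca_na = ion_ratios(means["Na"], means["K"], means["Ca"])
    na_means.append(means["Na"])
    k_na_ratios.append(k_na)
    print(f"{name}: K+/Na+ = {k_na:.2f}, Ca2+/Na+ = {ca_na:.2f}")

slope, intercept, r_squared = linear_fit(na_means, k_na_ratios)
print(f"K+/Na+ vs Na+: slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```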

2.2.10.3 RNA-Seq Reprocessing

The reprocessing was done according to Finotello et al. (2014). The raw reads were pre-processed using short command-line tools (FASTX-Toolkit 0.0.13.2) (see Appendix VII for the scripts). The overall read quality was inspected using the quality reports generated with FastQC. Pre-processed reads were mapped with TopHat (Kim et al., 2013) to the B. oleracea (To1000) (Boleracea.v2.1) genome, downloaded from http://brassicadb.org/brad/. A gene coordinates file was also supplied to help map the reads spanning splice junctions (TopHat option '-G'). Multi-mapped reads were removed from the final results, together with reads sharing less than 95% identity with the reference.
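The final filtering step can be sketched as follows. This is not the script from Appendix VII, only a minimal illustration under stated assumptions: alignments are read as SAM text records, multi-mapping is detected from the `NH` tag (number of reported alignments), and identity is approximated as (read length − `NM`) / read length, where `NM` is the edit distance to the reference. The example records are hypothetical, and SAM header lines (starting with `@`) would need to be skipped by the caller.

```python
def parse_optional_tags(fields):
    """Collect SAM optional fields (TAG:TYPE:VALUE) into a dict of strings."""
    tags = {}
    for item in fields[11:]:
        name, _type, value = item.split(":", 2)
        tags[name] = value
    return tags


def keep_alignment(sam_line, min_identity=0.95):
    """Return True for uniquely mapped reads with identity >= min_identity."""
    fields = sam_line.rstrip("\n").split("\t")
    read_length = len(fields[9])                  # SEQ column
    tags = parse_optional_tags(fields)
    if int(tags.get("NH", "1")) > 1:              # read has more than one alignment
        return False
    mismatches = int(tags.get("NM", "0"))         # edit distance to the reference
    identity = (read_length - mismatches) / read_length
    return identity >= min_identity


def make_record(name, nh, nm):
    """Build a hypothetical 20 bp SAM record for demonstration."""
    seq = "ACGT" * 5
    return "\t".join([name, "0", "C1", "8072900", "50", "20M", "*", "0", "0",
                      seq, "I" * 20, f"NH:i:{nh}", f"NM:i:{nm}"])


records = [
    make_record("read1", nh=1, nm=0),   # unique, 100% identity: kept
    make_record("read2", nh=3, nm=0),   # multi-mapped: removed
    make_record("read3", nh=1, nm=2),   # unique, 90% identity: removed
]
kept = [r.split("\t")[0] for r in records if keep_alignment(r)]
print(kept)   # ['read1']
```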

For the identification of differentially expressed genes, the raw counts (BAM files) were used as input to the SeqMonk software (version 1.40.1; Babraham Institute, 2009), downloaded from https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/. SeqMonk has an inbuilt interface to the R language and a pipeline for processing the BAM files generated by TopHat, and can run the DESeq2 and EdgeR (v.3.8.6) Bioconductor packages simultaneously. The software was set up to process the BAM files and map them to the sequence of the B. oleracea genome (To1000) (Boleracea.v2.1), with the probe quality score set at 50%. The probes were filtered using this inbuilt system and the quality score filters. The DESeq2 pipeline and EdgeR statistical analyses were then conducted to obtain the differentially expressed genes, using an FDR of 0.05 for significant genes.
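To make the FDR threshold concrete, the sketch below implements the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in both DESeq2 and edgeR. Whether SeqMonk was run with those defaults is an assumption here, and the gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw_p = {"gene_1": 0.01, "gene_2": 0.04, "gene_3": 0.03, "gene_4": 0.005}

adjusted = benjamini_hochberg(list(raw_p.values()))
FDR_CUTOFF = 0.05
significant = [gene for gene, q in zip(raw_p, adjusted) if q <= FDR_CUTOFF]
print(significant)
```

A gene is called differentially expressed when its adjusted p-value is at or below the 0.05 cut-off, which controls the expected proportion of false positives among the genes reported.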

Table 2.5: List of membrane ion transporters identified from RNA-Seq that show differential expression 24 hours post salt shock in B. oleracea DH lines

* Note: the table shows each transcript ID as it corresponds to the B. oleracea C genome (To1000), its chromosome (Chr.), the start and end nucleotide positions on that chromosome, the chromosome strand, the gene name of each transcript and the gene description using conventional names.

2.2.10.4 qPCR data

Samples were run and analysed using an Mx3005P multiplex quantitative PCR system (Agilent Stratagene). The housekeeping genes (β-Tubulin and TIP41) were used for data normalisation. Genotype-specific Ct values for each gene and control were calculated using baseline-corrected, ROX-normalised parameters. Three technical replicates were included in each plate, and the average Ct value for each genotype was normalised to the housekeeping genes within the same plate by the method of Livak and Schmittgen (2001). The average Ct values from the three biological replicates were analysed in Microsoft Excel (2016) to calculate the mean, standard deviation (SD), % coefficient of variation (%CV) and SEM, and the log2 fold change (FC) was determined for the relative expression of genes. An ANOVA was conducted, using an F-test, to test for significant variation between the different time points.
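The normalisation described above can be written out as a short sketch of the Livak and Schmittgen (2001) 2^-ΔΔCt calculation. Two points are assumptions for illustration: the two housekeeping genes are combined by averaging their Ct values, and the calibrator is a control sample from the same plate. All Ct values are hypothetical.

```python
import statistics


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the housekeeping genes."""
    return target_ct - statistics.mean(reference_cts)


def log2_fold_change(target_treated, refs_treated, target_control, refs_control):
    """Log2 fold change by the 2^-ddCt method (log2 FC = -ddCt)."""
    ddct = delta_ct(target_treated, refs_treated) - delta_ct(target_control, refs_control)
    return -ddct


def percent_cv(values):
    """Coefficient of variation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical mean Ct values (averaged over three technical replicates).
treated = {"target": 22.0, "refs": [18.0, 18.0]}   # refs: two housekeeping genes
control = {"target": 24.0, "refs": [18.0, 18.0]}

log2_fc = log2_fold_change(treated["target"], treated["refs"],
                           control["target"], control["refs"])
fold_change = 2 ** log2_fc
print(f"log2 FC = {log2_fc:.2f}, FC = {fold_change:.2f}")

# Hypothetical Ct values of one gene across three biological replicates.
print(f"%CV = {percent_cv([22.1, 21.8, 22.3]):.2f}")
```

In this example the target gene reaches threshold two cycles earlier in the treated sample after normalisation, which corresponds to a four-fold increase in relative expression.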

Gene stable ID   Transcript ID   Chr.   Start (bp)   End (bp)   Strand   Gene name   Gene description
Bo1g022080       Bo1g022080.1    C1     8072815      8076088     1       KT          Potassium transporter
Bo1g158860       Bo1g158860.1    C1     43713959     43714450   -1       V-type G    V-type proton ATPase subunit G
Bo2g024320       Bo2g024320.1    C2     6673393      6676356    -1       V-type a    V-type proton ATPase subunit a
Bo4g012670       Bo4g012670.1    C4     1353531      1360063     1       KT 9        Potassium transporter 9
Bo4g039050       Bo4g039050.1    C4     8527051      8530274     1       KUP11       K+ uptake permease 11
Bo4g145930       Bo4g145930.1    C4     39931384     39935961    1       V-type a    V-type proton ATPase subunit a
Bo5g131740       Bo5g131740.1    C5     40620196     40622124    1       CAX3        Cation exchanger 3
Bo8g030800       Bo8g030800.1    C8     10024031     10027008    1       V-CLC       Voltage-gated chloride channel family protein
Bo9g003910       Bo9g003910.1    C9     514674       519143      1       ECA2        ER-type Ca2+-ATPase 2

2.2.10.5 GBS Data Sorting

The raw GBS reads obtained were subjected to bioinformatic analysis and aligned to the reference genome sequence (B. oleracea To1000). The analysis generated the mapping rate, the total number of SNPs and the sample allele frequency, scored at both 5% and 10%. Sorting was done first by selecting the all-variants sheet with the 0.05 score, and variants whose corresponding alleles were unidentified after mapping were removed. Homozygous parental data were generated separately, and DH lines carrying the same homozygous allele as a parent were scored to that parental type. NOTE: Sorting in this format was intended to help identify introgression, inversion, polymorphism and/or genetic imprint of alleles at chromosome positions that corresponded to our transcript positions.
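The sorting logic can be illustrated with a small sketch. It is not the spreadsheet procedure used in the study: the record layout, the `P1`/`P2` labels and the SNP positions are hypothetical, and only the transcript interval (Bo1g022080, C1: 8072815-8076088) is taken from Table 2.5. Each DH-line call is scored to a parental type where the parents are homozygous for different alleles, and SNPs are then restricted to those falling within a transcript.

```python
def score_to_parent(dh_call, parent_1, parent_2):
    """Score one DH-line allele as 'P1' or 'P2'; None if it cannot be assigned."""
    if parent_1 == parent_2:          # parents not polymorphic at this SNP
        return None
    if dh_call == parent_1:
        return "P1"
    if dh_call == parent_2:
        return "P2"
    return None                       # missing or unidentified allele


def snps_in_interval(snps, chrom, start, end):
    """Return SNPs on chrom whose position lies within [start, end]."""
    return [s for s in snps if s["chrom"] == chrom and start <= s["pos"] <= end]


# Transcript coordinates from Table 2.5.
TRANSCRIPT = {"id": "Bo1g022080", "chrom": "C1", "start": 8072815, "end": 8076088}

# Hypothetical SNP calls: homozygous parental alleles and one DH line.
snps = [
    {"chrom": "C1", "pos": 8073000, "parent_1": "A", "parent_2": "G", "dh": "G"},
    {"chrom": "C1", "pos": 8075500, "parent_1": "T", "parent_2": "T", "dh": "T"},
    {"chrom": "C1", "pos": 9000000, "parent_1": "C", "parent_2": "T", "dh": "C"},
]

hits = snps_in_interval(snps, TRANSCRIPT["chrom"], TRANSCRIPT["start"], TRANSCRIPT["end"])
scores = [(s["pos"], score_to_parent(s["dh"], s["parent_1"], s["parent_2"])) for s in hits]
print(scores)   # [(8073000, 'P2'), (8075500, None)]
```

Runs of consecutive SNPs scored to the same parent along a chromosome would then indicate which parental segment a DH line carries at each transcript position.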

CHAPTER THREE

Morphological Variation in B. oleracea genotypes in Response
