

4.4 Gene Expression Altered by the Environment

Gene expression datasets were aligned to the genome (see Section 3.4) and gene expression levels quantified (see Section 3.6). Genes that responded to environmental stress were identified by comparing relative expression levels in stressed plants to unstressed plants using R/Bioconductor (Gentleman et al. 2004) packages and a false discovery rate (FDR) (Benjamini and Hochberg 1995) significance threshold of 5% (see Section 3.8). Sixty thousand measures of orientation-specific gene expression, representing nearly thirty thousand distinct genes, were tested for evidence of differential expression; each gene could have two measures of expression depending on the orientation of gene expression alignments within the gene (see Section 4.3).
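The 5% FDR threshold relies on the Benjamini-Hochberg step-up adjustment. The following is a minimal, standard-library sketch of that adjustment applied to per-gene p-values. It is an illustration only, not the code used in this analysis, and the four p-values are hypothetical. In R, the same adjustment is available as `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # step down from the largest p-value, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(pvalues, fdr=0.05):
    """Flag tests whose BH-adjusted p-value is at or below the FDR threshold."""
    return [q <= fdr for q in benjamini_hochberg(pvalues)]


# Hypothetical per-gene p-values (not from this study)
pvals = [0.01, 0.04, 0.03, 0.20]
print(significant_at_fdr(pvals))  # [True, False, False, False]
```

In this example, the gene with p = 0.04 is not significant at a 5% FDR, even though its raw p-value is below 0.05. This happens because the adjustment scales each p-value by the number of tests divided by its rank.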

After cold environmental stress, 798 genes were identified as differentially expressed by edgeR compared to 2,864 by DESeq (Anders and Huber 2010) using the same datasets.

Fewer genes were differentially expressed after the recovery period: 743 and 1,179 were identified by edgeR and DESeq, respectively. For the heat-stressed datasets, the number of differentially expressed genes identified by edgeR fell from 9,291 at the early time point to 1,667 after recovery. DESeq identified comparable numbers of differentially expressed genes for this comparison: 7,172 at the early time point and 2,554 after recovery. For every WT set of environment-responsive genes, between one-fifth and one-quarter were affected in the antisense orientation (Supplementary Tables B7A and B7B).

Using baySeq (Hardcastle and Kelly 2013) to identify environmentally-induced genes allowed a more complex experimental design to be analysed. Rather than performing a pairwise comparison for each experiment, it was possible to identify genes, expressed in the sense or antisense orientation, that were affected by one particular environmental stress or equally by both (see Section 3.8). Heat stress had the greater effect on transcription. A total of 6,114 genes were differentially expressed in the heat stress datasets while their expression in the cold stress datasets remained unchanged from unstressed plants, and these genes were classed as heat stress specific. After recovery, 20-fold fewer genes were heat stress specific, a total of 295. Fewer genes responded specifically to cold stress: 145 at the early time point and 131 after recovery. The number of genes equally affected by both environmental stresses increased with recovery, from 191 to 218. The number of genes that were significantly unaffected by environmental stress also increased with recovery, from 12,124 to 41,563. This indicates that, after recovery, the transcriptome of stressed plants was more similar to that of unstressed plants than it was immediately after stress treatment (Supplementary Table B7C).
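baySeq assigns each gene a posterior likelihood for each model in the design, for example "unaffected", "heat-specific", "cold-specific" or "affected equally by both". A common Bayesian approach converts such posteriors into a gene list at a chosen FDR. It ranks genes by posterior and takes the running mean of (1 - posterior) as the estimated FDR of each top-k list. The sketch below assumes that approach. It is not baySeq's own code, and the gene names and posterior values are hypothetical.

```python
def posterior_fdr(posteriors):
    """Rank genes by posterior probability of one model and estimate the FDR
    of each top-k list as the mean of (1 - posterior) over the top k genes."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    result, expected_false = [], 0.0
    for k, (gene, post) in enumerate(ranked, start=1):
        expected_false += 1.0 - post  # expected number of false calls so far
        result.append((gene, post, expected_false / k))
    return result


def select_model_genes(posteriors, fdr=0.05):
    """Return the top-ranked genes whose estimated FDR is at or below `fdr`.

    The estimated FDR never decreases down the ranking, so this is a prefix."""
    return [gene for gene, _, q in posterior_fdr(posteriors) if q <= fdr]


# Hypothetical posteriors for a 'heat-specific' model
heat_specific = {"geneA": 0.99, "geneB": 0.97, "geneC": 0.90, "geneD": 0.40}
print(select_model_genes(heat_specific))  # ['geneA', 'geneB', 'geneC']
```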

Comparisons between heterozygote and homozygote mutant datasets revealed much wider-ranging misregulation than the WT environmentally stressed datasets. Using edgeR, 19,410 and 20,334 genes were differentially expressed in mop1-1 and rmr6-2, respectively. DESeq identified fewer differentially expressed genes, 16,280 in mop1-1 and 18,736 in rmr6-2, but over four-fifths of RMR6- and MOP1-dependent genes were identified by both methods. A higher proportion of antisense transcripts were differentially expressed in the mop1-1 and rmr6-2 datasets: approximately 44.4% and 39.1%, respectively (Supplementary Tables B7A and B7B).

More genes may have been identified as differentially expressed in the mutant datasets because the comparison between heterozygote and homozygote datasets was based on single, unreplicated estimates of gene expression. The variation observed between WT biological replicates therefore had to be assumed to apply to the unreplicated mutant datasets as well. Widespread differential expression can also distort the identification of differentially expressed genes. To estimate variation from few biological replicates, these methods assume that most genes are not differentially expressed, and when that assumption is violated, genes may be incorrectly called as differentially expressed. In edgeR, differences in the composition of expressed genes between libraries can be corrected by trimmed mean of M-values (TMM) normalisation (see Section 3.8).
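TMM normalisation works as follows. For genes counted in both libraries, it computes a log-ratio of expression proportions (M) and an average log expression (A) between a library and a reference. It then trims the most extreme M and A values and takes an inverse-variance weighted mean of the remaining M values as the log2 scaling factor. The sketch below is a simplified, hypothetical re-implementation that follows that outline. It is not edgeR's `calcNormFactors`, and unlike edgeR it does not choose a reference library or rescale the factors across libraries.

```python
import numpy as np


def tmm_factor(obs, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of library `obs` relative to library `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()
    # genes with a zero count in either library give undefined log-ratios
    keep = (obs > 0) & (ref > 0)
    y_o, y_r = obs[keep], ref[keep]
    p_o, p_r = y_o / n_obs, y_r / n_ref
    m = np.log2(p_o / p_r)          # M: log-ratio of expression proportions
    a = 0.5 * np.log2(p_o * p_r)    # A: average log expression
    # approximate variance of each M value, used for inverse-variance weights
    var = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)
    n = m.size
    # ordinal ranks 1..n; ties broken by position so results are reproducible
    rank_m = np.argsort(np.argsort(m, kind="stable"), kind="stable") + 1
    rank_a = np.argsort(np.argsort(a, kind="stable"), kind="stable") + 1
    lo_m = np.floor(n * m_trim) + 1
    lo_a = np.floor(n * a_trim) + 1
    # drop the most extreme M and A values from both ends
    core = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
            & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
    log_factor = np.sum(m[core] / var[core]) / np.sum(1.0 / var[core])
    return float(2.0 ** log_factor)


# One very highly expressed gene inflates the first library's size
print(tmm_factor([1000] + [10] * 99, [10] * 100))  # ≈ 0.5025 (= 1000/1990)
```

In this toy example, a single gene absorbs half of the first library's reads, so every other gene looks down-regulated on a raw count-per-library basis. The factor below 1 shrinks that library's effective size so that the remaining genes are no longer called as changed.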

Some genes showed differential expression in both transcription orientations; however, the majority were differentially expressed in the sense direction only. Depending on the stress and time point, between 71.3% and 78.6% of environmentally-induced genes were misregulated in the sense direction only. The proportion of genes that were differentially expressed in both orientations was slightly higher in the heat stressed datasets than in the cold stressed datasets. By contrast, the proportion of genes differentially expressed in the antisense orientation only was approximately constant across the environmental stress and time point comparisons (Table 4.2). Mutant datasets showed more overall misregulation and proportionally more antisense misregulation. Approximately one-fifth of differentially expressed genes in mutant datasets showed both sense and antisense misregulation, while between one-quarter and one-third were differentially expressed in the antisense orientation only (Table 4.2).

Table 4.2|Number of differentially expressed genes. Number of genes that edgeR or DESeq identified as differentially expressed in sense, antisense or both transcription orientations.

Treatment   Time point   Sense           Both            Antisense       Unique
Cold        Early        2,718 (78.6%)   199 (5.8%)      542 (15.7%)     3,459
Cold        Late         920 (77.5%)     63 (5.3%)       204 (17.2%)     1,187
Heat        Early        6,055 (71.3%)   955 (11.2%)     1,485 (17.5%)   8,495
Heat        Late         1,926 (78.6%)   166 (6.8%)      359 (14.6%)     2,451
mop1-1      -            7,591 (46.6%)   3,388 (20.8%)   5,301 (32.6%)   16,280
rmr6-2      -            8,784 (52.4%)   3,835 (22.9%)   4,157 (24.8%)   16,776
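Each row of Table 4.2 partitions the unique genes into three exclusive categories: sense only, both orientations and antisense only. Each percentage is therefore a category's share of the row's Unique total. The short Python check below recomputes the totals and percentages from the counts in the table.

```python
# Counts from Table 4.2: (sense only, both orientations, antisense only)
TABLE_4_2 = {
    ("Cold", "Early"): (2718, 199, 542),
    ("Cold", "Late"): (920, 63, 204),
    ("Heat", "Early"): (6055, 955, 1485),
    ("Heat", "Late"): (1926, 166, 359),
    ("mop1-1", "-"): (7591, 3388, 5301),
    ("rmr6-2", "-"): (8784, 3835, 4157),
}


def orientation_breakdown(sense, both, antisense):
    """Return the unique gene total and each category as a percentage of it."""
    total = sense + both + antisense  # each gene falls in exactly one category
    percentages = tuple(round(100 * x / total, 1) for x in (sense, both, antisense))
    return total, percentages


for key, counts in TABLE_4_2.items():
    print(key, *orientation_breakdown(*counts))
```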

For environmentally stressed datasets, 3,658 and 9,450 genes were identified as differentially expressed by either DESeq or edgeR at the early time point in cold and heat stressed plants, respectively. These numbers fell to 1,250 and 2,617 genes at the late time point. Within stress treatments, edgeR and DESeq identified partly different sets of genes, particularly at the early time point, and in the cold stressed datasets more genes were identified by DESeq alone. With the exception of the cold stressed early time point datasets, the genes identified by the two methods overlapped to a high degree (Supplementary Figure B3 and Supplementary Table B8). A high proportion of genes identified by either edgeR or DESeq in mutant datasets were found by both methods: 81.5% of MOP1-dependent genes and 89.6% of RMR6-dependent genes (Supplementary Figure B3 and Supplementary Table B8). Genes identified as differentially expressed by either method were considered to be affected by environmental stress (see Section 3.8).
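The agreement figures above compare two gene lists. Genes identified by "either" method form the union of the edgeR and DESeq lists, and the proportion "found by both methods" is the intersection divided by that union. A minimal Python sketch of this bookkeeping is shown below. The gene IDs are hypothetical.

```python
def method_overlap(edger_genes, deseq_genes):
    """Summarise agreement between two sets of differentially expressed gene IDs."""
    edger, deseq = set(edger_genes), set(deseq_genes)
    union = edger | deseq   # identified by either method
    both = edger & deseq    # found by both methods
    return {
        "either": len(union),
        "both": len(both),
        "edger_only": len(edger - deseq),
        "deseq_only": len(deseq - edger),
        "prop_both": len(both) / len(union) if union else 0.0,
    }


# Hypothetical gene IDs for illustration only
print(method_overlap(["g1", "g2", "g3", "g4"], ["g2", "g3", "g4", "g5", "g6"]))
```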

Key Points

Heat stress produced considerably more misregulation than cold stress, with over six thousand genes affected specifically by heat stress. Environmental stress predominantly affected expression of sense transcripts, whereas RNA-directed DNA methylation (RdDM) mutants showed more misregulation of antisense transcripts.

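[Figure 4.8: Venn diagrams. (A) Early time point: heat only 7,745; both 1,705; cold only 1,953. (B) Late time point: heat only 2,133; both 484; cold only 766. (C) rmr6-2 only 15,556; both 5,055; mop1-1 only 14,613.]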

Figure 4.8|Similarity between stress-responsive and RdDM-dependent genes. Number of genes identified by edgeR or DESeq as up- or down-regulated (see Section 3.8) by (A–B) environmental stress at the (A) early, and (B) late time points, and (C) RdDM mutants.

Nearly half of the cold-induced changes at the early time point were also observed with heat stress. A total of 1,705 genes were up- or down-regulated by both cold (46.6%) and heat (18.0%) stress, which was significantly more than expected by chance (hypergeometric test, P < 0.01), suggesting that these genes may form part of a ‘general stress response’. After recovery, the number of genes affected by stress decreased to 3,383. Of these, 484 were similarly affected by both stresses, while a total of 2,617 were induced by heat stress and 1,250 by cold. In contrast to the early time point, a smaller proportion of cold-induced changes at the late time point were shared with heat-induced changes: 38.7% of cold-induced genes were similarly affected by heat, and 18.5% of heat-induced genes were similarly affected by cold (Figures 4.8A and 4.8B). The mop1-1 and rmr6-2 datasets indicated that 19,668 genes were dependent on MOP1 and 20,661 on RMR6, and one-quarter of these were dependent on both MOP1 and RMR6 (Figure 4.8C). According to a hypergeometric test, the intersection between genes affected by mop1-1 and rmr6-2 was not significantly larger than expected.
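The overlap tests above ask a specific question: if two gene sets of the observed sizes were drawn independently from the same background, how likely would it be that they share at least the observed number of genes? The standard-library sketch below computes that one-sided hypergeometric upper tail from log-factorials. The background size of 60,000 is a hypothetical choice, borrowed from the approximate number of orientation-specific measures tested. The background actually used in the analysis is not stated in this section.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def overlap_pvalue(overlap, set_a, set_b, background):
    """One-sided hypergeometric P(X >= overlap): the chance that sets of sizes
    set_a and set_b, drawn independently from `background` genes, share at
    least `overlap` genes."""
    lo = max(overlap, set_b - (background - set_a), 0)
    hi = min(set_a, set_b)
    log_total = log_comb(background, set_b)
    tail = sum(
        exp(log_comb(set_a, i) + log_comb(background - set_a, set_b - i) - log_total)
        for i in range(lo, hi + 1)
    )
    return min(1.0, tail)


BACKGROUND = 60_000  # hypothetical background size (see lead-in)
p_stress = overlap_pvalue(1705, 3658, 9450, BACKGROUND)   # cold vs heat, early
p_rddm = overlap_pvalue(5055, 19668, 20661, BACKGROUND)   # MOP1 vs RMR6
print(p_stress, p_rddm)
```

With this assumed background, the early cold/heat overlap (expected about 576 genes, observed 1,705) yields a vanishingly small p-value. The MOP1/RMR6 overlap is smaller than its expected value of about 6,800 genes, so it is not significantly large and its upper-tail p-value is close to 1.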

Among genes identified by either DESeq or edgeR, more genes were down-regulated than up-regulated in the cold-stressed early (CE), cold-stressed late (CL) and heat-stressed early (HE) datasets, whereas more up-regulated genes were identified in the heat-stressed late (HL) datasets. At the early time point in cold stressed datasets, 2,381 genes were down-regulated and 1,277 were up-regulated. After recovery, 783 were down-regulated and 507 were up-regulated. In heat stressed datasets, 5,599 genes were down-regulated and 3,851 were up-regulated at the early time point, compared to 1,174 and 1,443 after recovery (Figure 4.9A and Supplementary Figure B4A). Within baySeq classifications, more genes classified as cold stress specific were down-regulated than up-regulated at both time points (61.4% and 94.7%), while more genes classified as heat stress specific were up-regulated (61.0% and 70.5%). Of the genes that responded similarly to both environmental stresses, 53.9% were up-regulated at the early time point, whereas 86.2% were down-regulated at the late time point (Figure 4.9B and Supplementary Figure B4B). The distribution of gene expression changes was similar between the mop1-1 and rmr6-2 datasets, and both showed a broader range of gene expression changes than the WT stressed datasets. Stress-induced effects in WT datasets were comparatively small, whereas the mop1-1 and rmr6-2 datasets also contained many genes with large up- or down-regulation, with more genes up-regulated than down-regulated (Figure 4.9A and Supplementary Figure B4A).

[Figure 4.9: density plots of log2 fold change (-15 to 15) for (A) edgeR or DESeq and (B) baySeq.]

Figure 4.9|Distribution of fold changes of differentially expressed genes. Identified at the early (solid) or late (dot-dashed) time points by (A) edgeR or DESeq: cold (blue) and heat (red), mop1-1 (purple) and rmr6-2 (pink), and (B) baySeq: cold-specific (blue), heat-specific (red) and both stresses (yellow).

Key Points

Environmentally stressed datasets identified more repressed than induced genes, whereas the loss of RdDM induced more genes than it repressed. Loss of RdDM may have led to transcriptional activation, suggesting that stress-induced gene repression may be due to hypermethylation. Between stress treatments, only a minority of heat-responsive genes were also affected by cold stress, indicating that a larger stress-specific gene response network may exist for heat than for cold stress.

Duplicated genes, in particular genome duplicates, made up a higher proportion of differentially expressed genes than of the genome as a whole. There are 2,983 genes identified as ‘local’ duplicates and 6,472 as ‘genome’ duplicates, leaving 30,201 genes in the FGS that are ‘unduplicated’ (Schnable and Freeling 2011). Expression of duplicated genes was detected using gene expression reads with exact, unique alignments, which potentially limited the number of duplicated genes that could be detected. Increasing the acceptable number of alignments may provide expression estimates for more duplicated genes than were identified here, and a more comprehensive gene expression quantification method could account for multiply-aligned reads contributing to duplicated genes. Genome duplicates that were affected by environmental stress, mop1-1 or rmr6-2 were enriched approximately 1.5-fold compared to the genome. Genome duplicates constituted approximately one-quarter of differentially expressed genes compared to approximately one-sixth of the FGS (6,472 of 39,656 genes). Local duplicates were observed in similar proportions to the genome-wide background at the recovery time point following cold or heat stress. However, local duplicates differentially expressed at the early time point or in mop1-1 or rmr6-2 were somewhat under-represented. Unduplicated genes constituted approximately 70% of environment-responsive genes and 75% of genes dependent on MOP1 or RMR6. Three groups were significantly larger than expected (hypergeometric test, P < 0.01) (Figure 4.10): local duplicates affected by stress at both time points, genome duplicates affected at the early time point, and genome duplicates affected by mop1-1 or rmr6-2.

[Figure 4.10: bar chart of fold enrichment (0 to 2) for local, genome and unduplicated genes; asterisks mark significant enrichment.]

Figure 4.10|Differentially expressed duplicated genes in the maize genome. In cold (blue) and heat (red) stressed datasets and mop1-1 (purple) and rmr6-2 (pink) mutants. ‘Local’ duplicates are those within the same genomic region, whereas ‘genome’ duplicates are located in unlinked genomic regions (Schnable and Freeling 2011). Error bars indicate the range of values within time points, and asterisks indicate significant enrichment compared to the genome at one or both time points (hypergeometric test, P < 0.01).
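The enrichment values can be read as follows. Fold enrichment divides the proportion of differentially expressed genes that fall into a duplicate class by that class's proportion of the FGS. This ratio definition is assumed here, because the formula is not stated in this section. The FGS class sizes come from the text. The 1,000-gene differentially expressed list is hypothetical, chosen so that genome duplicates make up one-quarter of it.

```python
# FGS class sizes (Schnable and Freeling 2011), as quoted in the text
FGS_COUNTS = {"local": 2983, "genome": 6472, "unduplicated": 30201}


def fold_enrichment(de_in_class, de_total, genome_in_class, genome_total):
    """Share of DE genes in a class divided by the class's genome-wide share."""
    return (de_in_class / de_total) / (genome_in_class / genome_total)


genome_total = sum(FGS_COUNTS.values())  # 39,656 genes in the FGS
# Hypothetical DE list of 1,000 genes, 250 of which are genome duplicates
print(round(fold_enrichment(250, 1000, FGS_COUNTS["genome"], genome_total), 2))  # 1.53
```

A value near 1.5 matches the roughly 1.5-fold enrichment of genome duplicates reported above. The significance of such an excess would be assessed separately, for example with a hypergeometric test.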

More genes showed evidence for differential expression at the early time point than after recovery. Some genes did not show consistent differential expression between time points, but the largest proportion of genes were ‘reset’: they showed evidence for differential expression at the early time point but not after recovery. Using edgeR and DESeq, 89.7% of genes differentially expressed at the early time point after cold stress were not differentially expressed at the late time point, and 6.9% were maintained in the differentially expressed state. Meanwhile, 69.9% of genes differentially expressed after recovery had not been identified at the early time point. Similarly, for the heat stress datasets, 87.2% of early-responding genes were no longer significantly different after recovery and 8.5% were maintained, while 53.9% of genes differentially expressed after recovery had not been identified at the early time point (Table 4.3A). Rather than classifying a gene as non-differentially expressed simply because evidence of differential expression is lacking, baySeq can identify genes whose expression is significantly similar to unstressed levels (see Section 3.8). Using baySeq, over 95% of genes that were altered by environmental stress were ‘reset’ to unstressed levels of expression after recovery. A high proportion of cold stress specific and general stress response genes

Table 4.3|Gene expression changes after recovery. Long-term effects of environmental stress were observed by comparing differentially expressed genes (see Section 3.8) for each environmental stress between time points.

(A) edgeR and DESeq

Treatment         Reset¹    Maintained²   Inverted³   Delayed⁴
Cold              3,282     125           251         874
Heat              8,243     807           400         1,410

(B) baySeq

Stress response   Reset¹†   Maintained²   Inverted³   Delayed⁴†
Cold specific     68        0             3           19
Heat specific     4,680     69            4           50
Both              95        1             0           43

¹ Become non-differentially expressed
² Up- or down-regulated at both time points
³ Differentially expressed at both time points, in opposing directions
⁴ Become differentially expressed
† Significantly unaffected by environmental stress at the relevant time point

(86.4% and 97.7%, respectively) were affected at the late time point only. Proportionally fewer heat stress specific genes showed a delayed response (40.7%). Unlike genes identified by DESeq and edgeR, few genes identified using baySeq showed significant differences at both time points. Heat stress induced the most maintained gene responses, with 69 genes (1.5%) maintained in an up- or down-regulated state after recovery (Table 4.3B).
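The four categories in Table 4.3 follow from comparing each gene's differential expression call at the two time points. The Python sketch below applies the footnote definitions to hypothetical early and late gene lists, each mapping a gene ID to its direction of change. It models only the edgeR/DESeq variant, in which "not differentially expressed" means no call. It does not cover the baySeq variant (†), where a gene must be significantly unaffected. By construction, reset + maintained + inverted equals the number of early genes, and maintained + inverted + delayed equals the number of late genes. For example, the cold row of Table 4.3A gives 3,282 + 125 + 251 = 3,658 and 125 + 251 + 874 = 1,250.

```python
def classify_time_course(early, late):
    """Classify genes by differential expression at two time points.

    early, late: dicts mapping gene ID -> +1 (up-regulated) or -1
    (down-regulated), holding only genes called DE at that time point.
    """
    classes = {"reset": [], "maintained": [], "inverted": [], "delayed": []}
    for gene, direction in early.items():
        if gene not in late:
            classes["reset"].append(gene)        # DE early, not after recovery
        elif late[gene] == direction:
            classes["maintained"].append(gene)   # same direction at both times
        else:
            classes["inverted"].append(gene)     # DE at both, opposing directions
    for gene in late:
        if gene not in early:
            classes["delayed"].append(gene)      # DE only after recovery
    return classes


# Hypothetical calls for illustration only
early = {"g1": 1, "g2": -1, "g3": 1, "g4": -1}
late = {"g2": -1, "g3": -1, "g5": 1}
print(classify_time_course(early, late))
```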

Key Points

Genes affected by environmental stress were generally reset efficiently once stress treatment had concluded. However, the effects of environmental stress persisted through the recovery period, and a large proportion of differentially expressed genes became so only after recovery. The functions of genes affected at either time point differed and are described in Section 4.5.

[Figure 4.11: heatmap of log2 fold change for the HE, CE, CL and HL comparisons.]

Figure 4.11|Heatmap of stress-induced differentially expressed genes. Highly affected differentially expressed genes were clustered by log2 fold change within cold (C) and heat (H) environmental stresses at the early (E) and late (L) time points. Down- or up-regulated genes are shown in cyan or magenta, respectively, and black shows no detected differential expression. Clusters of genes or comparisons are indicated by coloured blocks on the respective axes.

Environmental stress caused misregulation of 11,148 genes in at least one orientation, environmental condition and time point combination, identified using edgeR or DESeq (see Section 3.8). The majority of changes were not consistent over time, but the response trends to cold and heat were similar (Figure 4.9, Table 4.3, Supplementary Figure B4 and Supplementary Table B7), even though the genes affected by each stress overlapped only partially (Figure 4.8). At the early time point, 11,403 genes showed evidence for altered expression in either the sense or antisense orientation; 9,450 of these were affected by heat and 3,658 by cold. The response to environmental stress was, however, more similar between conditions than between time points, with many induced changes being specific to one time point. The short-term response to heat stress was greater than that to cold stress (Figure 4.11).

A subset of differentially expressed genes was selected for validation by quantitative PCR (qPCR) (see Section 3.3) to confirm the computational analyses. Fifteen genes were validated in at least one environmental stress and time point combination in which differential expression was detected in the gene expression datasets. In total, 41 assays of gene expression were conducted, and all but three (95.1%) agreed with the up- or down-regulation predicted by the gene expression analyses (Figure 4.12 and Supplementary Table B10). The log2 fold changes of the 41 comparisons were highly similar (r = 0.790), increasing to r = 0.912 when the three contradictory comparisons were removed (Supplementary Figure B5).

[Figure 4.12: log2 fold change (-10 to 10) for WRKY114 HE, EREB54 HE, WRKY63 HE, WRKY63 CE, EREB54 CE, WRKY63 CL, EREB54 CL, HB127 HE and HB127 CE.]

Figure 4.12|Validation of differentially expressed genes by qPCR. A subset of genes were validated for differential expression in the environmental stress and time point indicated: cold (C) and heat (H) stress at early (E) and late (L) time points. Estimated log2 fold changes based on gene expression datasets (red) and qPCR (yellow) are shown, with error bars indicating the standard deviation between the three qPCR replicates.
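The qPCR validation reports two statistics. The first is the fraction of assays in which qPCR agrees with the sequencing-based direction of change. The second is the correlation r between the two sets of log2 fold changes, which is assumed here to be Pearson's correlation coefficient. A minimal standard-library sketch of both is given below. The fold changes in the example are hypothetical.

```python
import math


def direction_agreement(seq_lfc, qpcr_lfc):
    """Fraction of assays where both methods call the same direction of change.

    A log2 fold change of exactly 0 is treated as 'not up-regulated'."""
    same = sum(1 for s, q in zip(seq_lfc, qpcr_lfc) if (s > 0) == (q > 0))
    return same / len(seq_lfc)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold changes for four assays
seq = [3.2, -1.5, 2.0, -0.8]
qpcr = [2.9, -2.1, 1.1, 0.4]
print(direction_agreement(seq, qpcr), pearson_r(seq, qpcr))
```

Removing assays whose direction disagrees, as done above, typically raises r, because those assays contribute the largest opposing deviations.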

Genes identified by the Maize Genome Sequencing Project (MGSP) are more densely located on chromosome arms, in contrast to transposable elements, which are more densely located in the centromeric regions. Repetitive regions are dispersed across the maize genome but are especially abundant in the centromeric regions. Differentially expressed genes were located throughout the genome, but some chromosome regions had a higher proportion of differentially expressed genes (Figure 4.13). These regions were gene-dense and comparatively sparse in transposable elements.
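The caption of Figure 4.13 describes the genome-position profiles: the number of differentially expressed genes in neighbouring 1 Mb windows was counted, smoothed and scaled within each comparison. The smoothing and scaling methods are not specified in this section. The sketch below assumes a centred moving average over a hypothetical `span` of windows, scaled so that the highest window equals 1. The gene positions in the example are hypothetical.

```python
def window_profile(positions_bp, chrom_length_bp, window_bp=1_000_000, span=3):
    """Feature counts in consecutive windows, smoothed and scaled to a max of 1."""
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in positions_bp:
        counts[min(pos // window_bp, n_windows - 1)] += 1
    half = span // 2
    smoothed = []
    for i in range(n_windows):
        lo, hi = max(0, i - half), min(n_windows, i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))  # centred moving average
    peak = max(smoothed)
    return [s / peak if peak else 0.0 for s in smoothed]


# Hypothetical DE gene positions on a 3 Mb chromosome
positions = [100, 200, 1_500_000, 2_100_000, 2_200_000, 2_300_000]
print(window_profile(positions, 3_000_000, span=1))
print(window_profile(positions, 3_000_000, span=3))  # [0.75, 1.0, 1.0]
```

Scaling each comparison to its own maximum makes the profiles comparable in shape between comparisons with very different numbers of differentially expressed genes, at the cost of hiding those absolute differences.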

Genes that were differentially expressed under environmental stress were compared to genes that were misregulated in mop1-1 and rmr6-2 to determine whether the environmentally-induced genes were, at some level, regulated by the RdDM pathway. The direction of differential expression was not compared between environment- and RdDM-dependent genes. Instead, a gene that was differentially expressed in a mutant was considered dependent on MOP1 or RMR6 irrespective of whether stress promoted or repressed its expression. The orientation of transcription was matched between stress-induced and RdDM-dependent genes so that, for example, a stress-responsive sense transcript was not compared to an RdDM-dependent antisense transcript. Approximately one-quarter of stress-responsive genes were not differentially expressed in either RdDM mutant dataset. Up to 10% more stress-affected genes were dependent on RMR6 than on MOP1, and between 25.1% and 30.0% were affected in both RdDM mutants (Figure 4.14). Environmental stress and time point did not have a noticeable effect on RdDM dependence, and all comparisons showed a significant proportion of genes with RdDM dependence (hypergeometric test, P < 0.01).

[Figure 4.13: density tracks for cold early, heat late, heat early and cold late differentially expressed genes, genes, MTEC TEs and MIPS TEs across chromosomes 1 to 10.]

Figure 4.13|Position of differentially expressed genes in the genome. The number of differentially expressed genes within neighbouring 1 Mb windows was smoothed and scaled within each comparison. Each of the 10 maize chromosomes is shown with differentially expressed genes and the densities of transposable elements (MTEC), repetitive sequences (MIPS) and genes.

[Figure 4.14: percentage of genes in each dependence class for cold early, cold late, heat early and heat late.]

Figure 4.14|Stress-induced genes affected by RdDM mutants. Proportion of differentially expressed genes that were independent (red) or dependent on MOP1 (purple), RMR6 (pink) or both (pale yellow). Genes that were differentially expressed in mop1-1 or rmr6-2 were considered dependent on the respective genes. Stress-responsive genes were compared to RdDM-dependent genes in an orientation-specific manner.
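The dependence classes of Figure 4.14 rest on orientation-specific matching between lists. The Python sketch below keys every transcript by a (gene ID, orientation) pair, so a stress-responsive sense transcript can only match the same gene's sense transcript in a mutant list. Direction of change is deliberately ignored, as in the comparison described above. The gene IDs are hypothetical.

```python
def rddm_dependence(stress, mop1, rmr6):
    """Classify stress-responsive transcripts by RdDM dependence.

    Each argument is a set of (gene_id, orientation) pairs, so matching is
    orientation-specific; up- versus down-regulation is not considered.
    """
    counts = {"independent": 0, "MOP1 only": 0, "RMR6 only": 0, "both": 0}
    for key in stress:
        in_mop1, in_rmr6 = key in mop1, key in rmr6
        if in_mop1 and in_rmr6:
            counts["both"] += 1
        elif in_mop1:
            counts["MOP1 only"] += 1
        elif in_rmr6:
            counts["RMR6 only"] += 1
        else:
            counts["independent"] += 1
    return counts


# Hypothetical lists: g2 is antisense under stress but only sense in mop1-1
stress = {("g1", "sense"), ("g2", "antisense"), ("g3", "sense"), ("g4", "sense")}
mop1 = {("g1", "sense"), ("g2", "sense")}
rmr6 = {("g1", "sense"), ("g3", "sense")}
print(rddm_dependence(stress, mop1, rmr6))
```

Here g2 counts as independent, because its stress-responsive antisense transcript has no antisense counterpart in the mop1-1 list.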

Key Points

Differentially expressed genes were located throughout the maize genome, in-