DOCUMENTO 3. THE Neurospora crassa EXPRESSION PROFILE IN RESPONSE TO
3. Results and discussion
3.1. Overall transcriptomic analysis (RNA-seq)
Recent studies showed that after 4 h of media shift (sucrose to lignocellulosic material), cellulolytic genes were induced several thousand-fold when compared to 30 min and 1 h of analysis [14]. This stage is called inductive phase and analysis starting at this point are considered ideal for lignocellulosic material transcriptomics studies, since the first stage of induction, the de-repression phase was reached in earlier stages. Considering this information, extraction of RNA samples was performed after 4 h growth.
The study of large gene sets under specific conditions has been made by transcriptional profiling strategy. The high number of sequences generated using next- generation sequencing allows the analysis of diverse biological processes that occur in an organism in a given moment and, in this work, it allowed the determination of different carbon sources influence in gene expression, focusing in complex carbohydrates substrates,such as sugarcane bagasse and its variant derived from chemical pretreatment.
N. crassa genome encodes 10785 proteins, considering all its variants identified so far [30] (www.broad.mit.edu/annotation/genome/neurospora discontinued webpage). In our transcriptomic study, 8954 transcripts with at least 1 Transcript Part per Million (TPM) [31] were identified. Despite the large number of transcripts, as the focus in these analyses are the evaluation of differential gene expression in complex carbohydrates, it was considered statistically significant for further analysis only the transcripts that showed differential expression pattern comparing two by two Avicel, PASB and RAW bagasse as shown in Figure 1.
Figure 1. Volcano plot of differentially expressed genes under complex carbohydrates substrates studied. Log2 fold changes and their corresponding −log10 p-values of all genes in the RNA-seq are shown. Up-regulated genes (log2 foldchange ≥1.0 and p ≤ 0.05) are depicted in red and down regulated (log2 foldchange ≤ −1.0 and p ≤ 0.05) in green.
A total of 3263 transcripts were considered significant taken together the three comparisons (additional file 1). It was observed 939 genes up-regulated in RAW in relation to AVI as control while 1714 were down-regulated, beeing the largest number of differentially expressed genes observed among the comparisons. When the analyses were made between PASB and AVI (control) it was observed 502 genes up-regulated and 775 genes down- regulated. Considering the evaluation of RAW and pretreated sugarcane bagasse, using PASB as control, 415 genes were up-regulated and 941 genes were down-regulated. The results observed were obtained taking into account only complex carbohydrates substrates statistical evaluation. For LIG and SUC the TPM means presented in any further discussion about these samples transcripts are qualitative.
When added togheter the TPM values of each significant transcript in AVI, PASB and RAW conditions it was possible to observe that among the 20 most expressed genes, six corresponded to hypothetical proteins (NCU04276, NCU04314, NCU04502, NCU05498, NCU07405 and NCU16673). These proteins are also among the 20 most expressed when the sum of TPM values are made with LIG and SUC (additional file 1). Other six transcripts are common in this comparative, mostly genes encoding proteins related to primary metabolism, such as an alcohol dehydrogenase (NCU01754) and a glyceraldehyde-3- phospate dehydrogenase (NCU01528) [32].
Completing the analyses of the 20 most expressed transcripts, there were eight genes in LIG/SUC conditions that were also highly expressed in complex carbohydrates (but are not among the 20 most produced in these conditions), highlighting two genes involved in thiamine production, thiazole biosynthetic enzyme (NCU 06110) and thiamin biosynthesis protein (NCU09345). Thiamine pyrophosphate, a thiamine derivate, is a coenzyme in the catabolism of sugars and aminoacids [33]. High expression levels of these genes are expected considering the sugar metabolism in sucrose medium obviously, and in complex carbohydrates, since the group of 20 most expressed genes in these substrates also included eight Carbohydrate-Active Enzymes (CAZymes), four xylanases (NCU07225, NCU05924, NCU8189 and NCU07326), an exoglucanase (NCU07190), an endoglucanase (NCU05057), a mannanase (NCU08412) and a carbohydrate esterase (NCU08785). For lignin however, aminoacids catabolism seems to be more appropriated reason for the overexpression of these genes, since this condition apparently mimics starvation condition.
Corroborating with the last affirmation, there are a serie of factors based in the most expressed genes in LIG. Xie and collaborators described that high affinity glucose
expressed during starvation [34]. This gene was strongly repressed in SUC condition and highly expressed LIG, the same behavior presented by the high affinity glucose transporter NCU04963, the most expressed transporter in LIG.
N. crassa is one of many organisms that can use glycogen as a carbon and energy reserve, and its metabolism is regulated by how this microorganism senses nutrient availability and other environmental conditions [35]. The high expression levels of NCU07027, a glycogen phosphorylase that belongs to the glycosyl transferase family 35 in LIG is also in accordance with the affirmation that N. crassa growth in lignin is similar to starvation. Glycogen phosphorylase catalyzes the glycogenolysis, and it is probably acting in the glycogen reserves made during the pre-incubation in sucrose. Similar transcript levels of this gene were also described when N. crassa was grown in sucrose and complex carbohydrate sources such as corn, barley, rice, soybean and wheat straws, while the transcript levels in starvation were much higher [36], comparatively the same behavior presented when SUC, AVI, PASB, RAW and LIG were evaluated.
To determine the number of encoded genes direct related to biomass degradation in the transcriptome, we searched for all the following carbohydrate-active enzyme classes: glycoside hydrolases (GHs), glycosyltransferases (GTs), carbohydrate esterases (CEs), auxiliary activitie enzymes (AAs) and also the carbohydrate-binding modules (CBMs). Almost all N. crassa predicted CAZymes appeared in at least one evaluated transcriptome condition, although “only” approximately half of them appeared as significant. An expression pattern based on carbon source was clearly observed, with formation of three clusters as showed in a TPM based heatmap (Figure 2A). Not only CAZymes seems to present this behavior, since in the Principal Component Analysis (PCA) of all identified transcripts (Figure 2B), the separation of samples from different cultured conditions demonstrated that complex carbohydrates substrates are closely related. PCA also showed the low variability among the biological replicates used for differential expression analysis, and, interestingly LIG condition seems to be more similar to polysaccharide substrates than the disaccharide SUC in overall gene comparison, a result different from that observed in CAZyme analysis.
Sucrose cluster presented the smallest number of CAZymes, 46 GHs, 26 GTs, 11 AAs, 10 CEs, 1 PL and 3 CBMs, while lignin cluster presented the largest number, 89 GHs, 46 GTs, 38 AAs, 10 CEs and 7 CBMs (additional file 2). Less than a half of these enzymes possess significant statistical differences while about 72% of the complex carbohydrate clustered enzymes were overexpressed in one of the three conditions (additional file 3).
Figure 2. Transcriptome Clustering. (A) Heat map of all identified CAZymes in transcriptome study. Hierarchical clustering of rows and columns was performed using complete linkage and Eucledean distance of row clusters were determined by k-means algorithm, using k=3. (B) Principal component analysis (PCA) of all identified genes in transcriptome was performed using the r-log transformed values of gene counts. r-log transformation function is available at DESeq2 package.
The third cluster was composed by highly expressed CAZymes in AVI, PASB and RAW, containing 46 GHs, 6 GTs, 18 AAs, 17 CEs, 3 PLs and 2 CBMs. As expected, for these substrates there was production of great amount of enzymes from GH families, which are rich in cellulases and hemicellulases, mainly GH3, GH5, GH6, GH7, GH10 and GH43 (Figure 3). In contrast for LIG and SUC clusters, among the most produced families of glycoside hydrolases are GH13, possibly acting as a scounting enzyme, and GH76, a CAZyme family responsible for glycoprotein incorporation in N. crassa cell wall [37].
Considering the enzymes with auxiliar activities, AA3 and AA7 were among the most abundant in all three clusters (Figure 3, additional file 2). In complex carbohydrate cluster, polysaccharide monooxygenases (AA9) were the most produced oxidative enzymes, which is related to enhanced cellulose degradation by the combination of AA3 and AA9 [38]. In SUC cluster, AA1 was the most expressed family, but the transcripts possessed a small TPM value, and probably the correspondent substrate response is with ferroxidase activity and not with laccase, the most representative enzyme in this family. Interestingly, the AA1 and AA2 families were not among the most produced in LIG cluster and not even the representants of these families have a high TPM value, indicating that N. crassa is not able to degrade lignin and barely modificates this substrate in consistence with the enzymatic traditional pattern expected in ascomycetes [39]. Nevertheless, AA7 is described as a glico-
or detoxification of lignocellulosic compounds [17], so NCU04108, an alcohol oxidase, may play an important role in lignin modification.
The TPM values observed for CAZymes transcripts in LIG and SUC are considerably smaller than in complex carbohydrates (additional file 1), although the variability of CAZymes families are equal or even larger. Therefore, it is possible to consider this result as initial evidence that the amount of key enzymes in lignocellulose degradation is more important to be studied than the general diversity. However it is crucial to consider the evaluation of transcripts from some families that are not highly expressed but are induced coordinately, since they may imply in a description of new enzymes with unknown activities.