
Theory

Synonymous Mutations Frequently Act as Driver Mutations in Human Cancers

Fran Supek,1,4 Belén Miñana,2,4 Juan Valcárcel,2,4,5 Toni Gabaldón,3,4,5 and Ben Lehner1,4,5,*

1EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain

2Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain

3Bioinformatics and Genomics Program, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain

4Centre for Genomic Regulation, Universitat Pompeu Fabra, Barcelona 08003, Spain

5Institució Catalana de Recerca i Estudis Avançats, Passeig Lluis Companys 23, 08010 Barcelona, Spain

*Correspondence: ben.lehner@crg.eu

http://dx.doi.org/10.1016/j.cell.2014.01.051

SUMMARY

Synonymous mutations change the sequence of a gene without directly altering the sequence of the encoded protein. Here, we present evidence that these “silent” mutations frequently contribute to human cancer. Selection on synonymous mutations in oncogenes is cancer-type specific, and although the functional consequences of cancer-associated synonymous mutations may be diverse, they recurrently alter exonic motifs that regulate splicing and are associated with changes in oncogene splicing in tumors. The p53 tumor suppressor (TP53) also has recurrent synonymous mutations, but, in contrast to those in oncogenes, these are adjacent to splice sites and inactivate them. We estimate that between one in two and one in five silent mutations in oncogenes have been selected, equating to 6%–8% of all selected single-nucleotide changes in these genes. In addition, our analyses suggest that dosage-sensitive oncogenes have selected mutations in their 3′ UTRs.

INTRODUCTION

Cancer genomes contain not only cancer-causing “driver” mutations, but also many additional accumulated “passenger” mutations that do not contribute directly to the tumor phenotype (Stratton et al., 2009). A key challenge in cancer research is to distinguish the consequential driver mutations from the inconsequential passenger mutations. Given the recent explosion in the number of sequenced cancer genomes, this challenge is particularly pressing (Vogelstein et al., 2013).

Decades of research have revealed that somatic mutations affect cancer genes through diverse molecular mechanisms. For example, oncogenes can be activated by amino acid-altering missense mutations, by amplifications, and by translocations. Tumor suppressor genes, in contrast, are inactivated by local or large-scale deletions and loss of heterozygosity, by the introduction of premature stop codons or frameshifts, and by mutations in introns that inactivate pre-mRNA splice sites (Vogelstein et al., 2013). In the large-scale analyses of cancer genomes performed to date, it is these previously established common mechanisms that have been considered (Stratton et al., 2009; Vogelstein et al., 2013).

However, the extent of sequence data that are now available for tumor samples should also allow the identification of additional classes of common oncogenic mutations. If such new types of driver mutations exist and they are not considered, then our ability both to interpret cancer genomes in general and to understand the causes of oncogenesis in individual patients would be compromised.

The universal genetic code uses 61 codons to encode 20 amino acids. The redundancy in this code means that many nucleotide changes can be made in the sequence of a gene without affecting the sequence of the encoded protein. Synonymous or “silent” positions in a gene’s sequence can be used to encode additional information that affects properties such as the speed or accuracy with which an mRNA is translated (Drummond and Wilke, 2008), how an mRNA is folded (Goodman et al., 2013) or spliced (Parmley et al., 2006), or, through translational pausing, how a protein folds (Zhang et al., 2009; Zhou et al., 2013). These and other mechanisms mean that changes in a gene’s sequence that are silent with respect to protein sequence are not always silent with respect to function (Gingold and Pilpel, 2011). Indeed, there is growing evidence that natural selection acts widely on synonymous sites (Drummond and Wilke, 2008; Supek et al., 2010).
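As an illustration of the redundancy described above, the following minimal Python sketch builds the standard genetic code and classifies a single-codon change as synonymous, missense, or nonsense. The function name `classify_substitution` and the example codons are hypothetical and chosen only for illustration; they are not part of the analysis pipeline used in this study.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid ("*" marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Codons that encode an amino acid (everything except the three stops).
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change (hypothetical helper, illustration only)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    # Stop-loss changes are lumped in with missense here for brevity.
    return "missense"


print(len(SENSE_CODONS), "sense codons encode",
      len({CODON_TABLE[c] for c in SENSE_CODONS}), "amino acids")
print(classify_substitution("GGT", "GGC"))  # glycine to glycine
print(classify_substitution("GGT", "GTT"))  # glycine to valine
```

A change such as GGT to GGC leaves the encoded glycine unchanged, which is why it would be invisible to any analysis that considers only the protein sequence.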

Based on their evolutionary importance, individual examples (Gartner et al., 2013; Griseri et al., 2011; Ma et al., 2009; Schutz et al., 2013), and their potential role in other human diseases (Sauna and Kimchi-Sarfaty, 2011), we hypothesized that synonymous mutations might also frequently contribute to cancer. Here, in an analysis of >3,000 cancer exomes and >300 cancer genomes, we present robust statistical evidence that this is indeed the case: that synonymous mutations must frequently contribute to cancer. These silent mutations in exons may act through diverse molecular mechanisms, and they are often associated with changes in splicing.


RESULTS

Employing Matched Gene Sets to Control for Regional Variation in Mutation Frequencies in Cancer Exome Data

Previous studies have distinguished cancer-causing driver mutations from background passenger mutations by comparing the frequency of protein-coding changes to the frequency of synonymous mutations in the same genes (Ding et al., 2008; Peifer et al., 2012). However, a search for selection acting on synonymous mutations necessarily has to rely on a different approach. We used the Cancer Gene Census (CGC) (Futreal et al., 2004; Santarius et al., 2010) to define three sets of known oncogenes according to their mechanism of activation and a set of known tumor suppressor genes (Figure 1A). Additionally, we collected a larger set of cancer genes consisting of those reported as recurrently mutated in 34 recent cancer genome sequencing studies (Table S1 available online). For each of these gene lists, we ran an optimization procedure to create a set of non-cancer genes matching the cancer gene list in the distribution of their regional point mutation rates at 200 kb and 1 Mb resolution and also for a set of features implicated in mutation rate variation (the heterochromatin-associated nucleosome modification H3K9me3, replication timing, base composition [GC3 content] and mRNA expression levels) across tissues (Hodgkinson et al., 2012; Pleasance et al., 2010; Schuster-Böckler and Lehner, 2012; Woo and Li, 2012); the complete list of features is given in Table S2, and a principal components plot summarizing the relationships between these features is shown in Figure 1B. For instance, the 39 recurrently mutated oncogenes from the CGC are situated in genomic regions with significantly higher mutation rates, higher levels of heterochromatin, and later replication times (Figure 1C), and they are also more highly expressed than the general gene set (Figure 1C). The matched set here consists of 1,517 genes that closely follow the 39 oncogenes in all of these features. In contrast, the 38 recurrently mutated tumor suppressor genes lie in significantly less mutable, less heterochromatic, and earlier replicating regions of the genome (Figure 1D); the corresponding matched set again compensates for these differences (Figure 1D).

Figure 1. Classes of Cancer Genes and Their Corresponding Matched Gene Sets

(A) Cancer Gene Census, subdivided by mechanism of (in)activation. “Recurrently mutated genes” are from 34 cancer genome papers.

(B) Principal components analysis of regional somatic mutation rates in cancer or known correlates thereof.

(C and D) Cumulative histograms of distributions of genomic variables in the recurrently mutated oncogenes (C) or tumor suppressors (D) against a matched set and all other genes (5th–95th percentile shown). “D+/−” are Kolmogorov-Smirnov statistics quantifying if a variable is stochastically smaller (D+) or larger (D−) in the cancer genes; one-tailed p value is given below.

See also Figure S7 and Tables S1 and S2.

Oncogenes, but Not Tumor Suppressor Genes, Contain an Excess of Somatic Synonymous Mutations in Human Tumors

We next compiled a data set of 292,405 missense and 123,193 synonymous somatic mutations identified in the exomes of 3,851 cancer samples. These include tumors from 11 different tissues with more than 200 samples each (Table S2). As expected, missense mutations are enriched in previously reported oncogenes and tumor suppressor genes (Figure 2A). Strikingly, however, oncogenes known to be activated by missense mutations are also enriched for synonymous mutations in the 3,851 cancer exomes (Figure 2A), with a 23%–30% excess of synonymous mutations compared to genes in their matched control set (p = 3 × 10^-6 and p = 7 × 10^-10; Figure 2A). In contrast, this is not the case for tumor suppressor genes, which actually show a slight depletion for synonymous mutations (Figure 2A).

Figure 2. Enrichment of Synonymous Mutations in Oncogenes, but Not in Tumor Suppressor Genes, Is Cancer-Type Specific and Is Not due to Co-Occurrence with Nonsynonymous Mutations

(A) Mutation enrichments in cancer genes compared to matched sets.

(B and C) Distribution of synonymous enrichments for non-cancer genes in 164 Gene Ontology categories (B) or for oncogenes after 100 bootstrap samples (C). p value by Z-test (one tailed).

(D) Mutation enrichment across 395 cancer whole-genome sequences.

(E and F) Synonymous mutation enrichments of cancer gene classes evaluated independently of the matched sets: a comparison to neighboring genes (E) and the Invex randomization test (F). Error bars are 95% CI.

(G) Synonymous enrichments for cancer genes determined separately for tissues. Exome counts shown in pie chart. p values are by Z-test.

(H and I) Synonymous enrichments for the complete and for the tissue-specific cancer gene sets (Table S5), for oncogenes (H) and tumor suppressors (I). p values are by paired t test (one tailed, for tissue specific > general enrichment). Blood (BLO) or brain (BRA) tumors did not register synonymous mutations in the tissue-specific tumor suppressors and oncogenes, respectively, and are thus not shown.

(J) Co-occurrence of missense and synonymous mutations in the same genes in the same tumor samples. Gene counts in parentheses or ranges of counts for tissue-specific sets.

(K) Synonymous enrichments after removing mutations with another nonsynonymous mutation in the same gene (or on the same chromosome) in the same cancer sample. Error bars are 95% CI.

To further test for any additional unaccounted source of variation in mutation rates, we also quantified the enrichment of mutations in introns. In all cases, the enrichments of mutations in the introns of oncogenes and tumor suppressors compared to the matched gene sets were close to 1 (Figure 2D). Importantly, the intronic mutation rates of the oncogenes were not elevated compared to the matched sets when assessed both in the set of 176 whole-genome sequences used in the matching procedure (1.03× and 1.04×; Figure 2D) and in an independent set of 219 genomes (1.05× and 1.04×; Figure 2D). Moreover, the oncogenes’ UTRs were not enriched with mutations compared to the matched sets (0.96× and 0.97×; Figure 2D). This is in contrast to the synonymous mutation frequency, which is similarly elevated in the set of whole-genome sequences (1.27× and 1.32×; Figure 2D). Thus, increases in local mutation rates cannot account for the elevated synonymous mutation rates in oncogenes.

The mutation spectra of synonymous mutations in oncogenes closely reflect those of their matched sets (Table S3), meaning that differences in sequence composition do not underlie the observed increases in synonymous mutation rates. Moreover, oncogenes are highly enriched for synonymous mutations compared to other gene functional categories (p = 0.014; histogram in Figure 2B). Finally, bootstrapping to remove bias (Figure S1) resulted in a slightly higher estimate of synonymous enrichment of the oncogenes (1.34×; Figure 2C) as well as an estimate of a confidence interval (CI) that accounts for the variance of the matching algorithm (95% CI: 1.18–1.49).
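The enrichment values quoted above are ratios of mutation frequencies between a cancer gene set and its matched set. The sketch below shows one simple way such a ratio and a percentile bootstrap interval could be computed. It is a simplified illustration, not the procedure used here: it resamples only the matched genes, whereas the bootstrap described above accounts for the variance of the matching algorithm itself. All function names and input values are hypothetical.

```python
import random


def pooled_rate(genes):
    """Pooled mutations per site for a list of (mutation_count, n_sites) pairs."""
    return sum(count for count, _ in genes) / sum(sites for _, sites in genes)


def enrichment(cancer_genes, matched_genes):
    """Ratio of pooled mutation rates: cancer gene set over matched set."""
    return pooled_rate(cancer_genes) / pooled_rate(matched_genes)


def bootstrap_ci(cancer_genes, matched_genes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling matched genes with replacement.

    Assumption: only the matched set is resampled; resamples that contain
    no mutations at all are skipped to avoid division by zero.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = [rng.choice(matched_genes) for _ in matched_genes]
        if sum(count for count, _ in resampled) == 0:
            continue
        estimates.append(enrichment(cancer_genes, resampled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (len(estimates) - 1))]
    upper = estimates[int((1 - alpha / 2) * (len(estimates) - 1))]
    return lower, upper


# Hypothetical toy input: (synonymous mutation count, synonymous sites) per gene.
cancer = [(6, 1000), (6, 1000)]
matched = [(5, 1000), (4, 900), (7, 1200), (5, 1100)]
print(enrichment(cancer, matched), bootstrap_ci(cancer, matched))
```

Pooling counts across genes before taking the ratio keeps short genes with few mutations from dominating the estimate, which is one reason to prefer it over averaging per-gene ratios in a sketch like this.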

Considering Mutations in Introns and in Neighboring Genes Confirms the Excess of Synonymous Mutations in Oncogenes

We validated our findings using two additional methodologies that are orthogonal to the matched-sets approach. First, we compared the synonymous mutation rates in oncogenes to those in neighboring genes in the genome. Whereas tumor suppressor genes and amplification- or translocation-activated oncogenes show no evidence of an elevated frequency of synonymous mutation compared to their neighbors (Figure 2E), missense-activated oncogenes have an excess of synonymous mutations (1.20× and 1.37× for the two gene sets; Figure 2E), strikingly similar to that determined using the mutation-rate matched gene sets.

Second, we employed the Invex randomization test (Hodis et al., 2012) that simulates mutations in the coding sequence by sampling from the intronic and UTR mutations of the same gene. This test stringently controls for variations in mutation rates and spectra. The Invex test estimates 1.20× and 1.25× increased synonymous mutation rates in oncogenes (Figure 2F). We also compared the rates of missense mutations determined using the matched-sets method to those determined using the Invex test and again found a striking agreement (R2 = 0.991 across cancer gene classes; Figure S1), further validating the matched-sets approach.

Co-Occurring Mutations Do Not Account for the Excess of Synonymous Mutations in Oncogenes

One possible explanation for the enrichment of synonymous mutations in oncogenes is that they represent the phenomenon of clustered mutations or “mutation showers” (Roberts et al., 2012; Wang et al., 2007). For example, if a mutation process generates multiple mutations in a gene, then functionally neutral synonymous mutations could be enriched in oncogenes because of “hitchhiking” with missense mutations that are positively selected. However, this model cannot explain the enrichment of synonymous mutations in oncogenes, because samples harboring missense mutations in a particular oncogene are unlikely to also harbor synonymous mutations in the same gene (odds ratio [OR] = 0.53, p = 3 × 10^-6; Figure 2J). This avoidance is even stronger in the oncogenes most frequently mutated in a particular tissue (OR = 0.29, p = 4 × 10^-8; Figure 2J). Moreover, such strong avoidance is not observed for tumor suppressor genes, where we see no enrichment for synonymous mutations (Figure 2J). Consistently, if we filter out all synonymous mutations in genes already harboring a nonsynonymous mutation in the same sample, the synonymous enrichment of oncogenes is retained (1.22×, p = 4 × 10^-5; Figure 2K), as is also the case when removing all synonymous mutations occurring on chromosomes containing any additional nonsynonymous mutations in any gene (Figure 2K; p = 0.003).
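An odds ratio below 1, as reported above, indicates that missense and synonymous mutations in the same gene tend to avoid each other within a sample. The sketch below shows how an odds ratio and a one-tailed Fisher's exact p value could be computed from a 2×2 table of sample counts. The choice of test and the cell layout are assumptions made for illustration; the text above does not state which test produced the quoted p values, and the counts used here are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustration only), per gene and sample:
      a = missense and synonymous, b = missense only,
      c = synonymous only,         d = neither.
    Raises ZeroDivisionError if b or c is zero.
    """
    return (a * d) / (b * c)


def fisher_one_tailed_less(a, b, c, d):
    """P(X <= a) under the hypergeometric null, i.e. a test for avoidance."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k_min = max(0, row1 - (n - col1))  # smallest feasible top-left count
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(k_min, a + 1)
    )
    return numerator / comb(n, row1)


# Hypothetical counts, chosen only to show the calculation.
table = (1, 9, 11, 3)
print(odds_ratio(*table), fisher_one_tailed_less(*table))
```

The one-tailed form is used because the question is directional: whether co-occurrence is rarer than expected by chance, not merely different from it.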

Synonymous Mutations Are Selected for in a Tissue-Specific Manner

The frequency of missense mutation in individual oncogenes is not evenly distributed across tissues. For example, the oncogene BRAF is frequently mutated in melanoma, whereas KRAS is frequently mutated in colorectal cancer. We quantified the frequencies of synonymous mutations separately for cancers of different tissues and observed a consistent trend of enrichment in the oncogenes and depletion in the tumor suppressors across the 11 tissues (Figure 2G). The enrichment of synonymous mutations in oncogenes is lower in tissues with a high mutational load (1.10–1.26× in lung, melanoma, or head and neck cancers; Figure 2H) and stronger in tissues with a low mutational load (1.44–1.59× in leukemia, breast, and ovarian cancer; Figure 2H), consistent with a lower signal-to-noise ratio in highly mutated genomes. When considering individual tissues, synonymous mutations are enriched in the same tissue-specific oncogenes (list in Table S4) as missense mutations (Figure 2H; median synonymous enrichment over tissues = 1.98× compared to median = 1.29× for the complete oncogene set; p = 1.3 × 10^-3). This is not the case for tumor suppressor genes (Figure 2I; median enrichment = 0.72×). Again, this is not due to synonymous and missense mutations co-occurring in the same genes in the same sample (Figure S2; median enrichment = 1.84×; p = 5 × 10^-3). Tissue-specific oncogenes do not exhibit different dinucleotide compositions in their exons (Figure S2), making it unlikely that tissue-specific mutational processes underlie this enrichment. Thus, in a particular tissue, synonymous and missense mutations tend to target the same oncogenes.

Cancer-Associated Synonymous Mutations Preferentially Target Conserved Sites and Particular Regions of Oncogenes

The molecular mechanisms by which synonymous mutations can affect gene activity are diverse (Sauna and Kimchi-Sarfaty, 2011). However, we reasoned that some molecular mechanisms might be more common than others, and so we identified a group of 16 known oncogenes with a high (>1.5×) enrichment with synonymous mutations for further analysis of putative molecular effects (Figures 3A–3C). Within this set, 11 genes are also individually significantly enriched with synonymous mutations (false discovery rate [FDR] < 9%). These are the receptor tyrosine kinases PDGFRA, EGFR, KDR (VEGFR2), and NTRK1, the receptors IL7R and TSHR, the structural protein ELN, the cytoplasmic kinases JAK3 and ITK, and the transcription factors GATA1 and RUNX1T1. In addition to this set, the oncogenes ALK and NOTCH2 are also individually significant, at 1.47× and 1.46× enrichment. The increased synonymous mutation rates in these genes are not associated with a comparable increase in intronic and UTR mutation rates (Figure 3D; p = 5.7 × 10^-4), ruling out a role for an elevated local mutation rate. In addition to the above oncogenes, a tissue-specific synonymous enrichment could be detected for other genes, most consistently for the JAK2 and RET kinases and for the SF3B1 splicing factor (at least two tissues at FDR ≤ 10%; list in Table S5).

First, we considered whether cancer-associated synonymous mutations preferentially target evolutionarily conserved synonymous sites (Lin et al., 2011) and found that this was indeed the case (Figure S1). This enrichment is not observed for tumor suppressors (Figure S1) and is confirmed using an alternative measure of sequence constraint (Pollard et al., 2010) (Figure S1).

Figure 3. Oncogenes Most Targeted by Synonymous Mutations and Mutation Clustering

(A and B) Enrichment of individual oncogenes compared to a matched set of 200 non-cancer genes. Error bars are 95% CI. Significance by Z-test of log enrichment (one tailed). Bars darker for individually significant genes at p < 0.025 (≤9% FDR).

(C) The 16 oncogenes with >1.5× enrichment examined in the following panels and Figure 4.

(D) Enrichment with intronic and UTR mutations in the significant oncogenes. GATA1 not shown (only one mutation, enrichment = 0.17×). Error bars are 95% CI for the noncoding enrichment. p value by paired t test, one tailed.

(E–G) Clustering of synonymous and missense mutations; p values are by Fisher’s exact test (one-tailed).

(H and I) Prominent examples of clusters of three or more recurrent mutations, only synonymous (I), or also missense (H). Amino acid changes given next to the mutated codon, along with the TCGA or Cosmic sample ID.

See also Figures S1, S3, and S4 and Tables S5 and S6.

Missense mutations are known to cluster together in the linear sequence of oncogenes, reflecting the arrangement of functionally important sites in a protein. Synonymous mutations also significantly cluster in known oncogenes (Figure 3E) and even more so in the top-16 oncogene set, where a synonymous mutation is more likely to occur within five codons of another synonymous-mutated site (OR = 1.74, p = 10^-6; Figure 3E), an effect size similar to the missense-missense clustering in the same genes (OR = 1.92; Figure 3F). Again, such an association is absent in tumor suppressors for synonymous mutations (OR = 0.94), but not, as expected, for missense mutations (OR = 2.29). Interestingly, we found that the synonymous mutations in the 16-oncogene set also cluster significantly with the missense mutations, albeit to a lesser extent (OR = 1.24; Figure 3G). The clustering of the synonymous mutations is not due to hitchhiking effects, because it is not altered after removing cases where the gene also contained a nonsynonymous mutation in the same sample (OR = 1.71, p = 2 × 10^-5; striped bars in Figure 3E). Individual examples of clustered mutations are shown in Figures 3H and 3I, with the complete set in all highly enriched oncogenes in Figure S3.
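The clustering criterion described above (a mutated site lying within five codons of another mutated site) can be made concrete with a short sketch. The code below counts such sites in a gene and estimates how many would be expected if the same number of sites were placed uniformly at random. This is only an illustration of the criterion under simplifying assumptions: the uniform-placement baseline and all names are hypothetical, and the odds ratios reported above were derived with Fisher's exact test rather than with this simulation.

```python
import random


def clustered_sites(positions, window=5):
    """Count distinct mutated codons with another mutated codon within `window`."""
    sites = sorted(set(positions))
    clustered = 0
    for i, pos in enumerate(sites):
        near_left = i > 0 and pos - sites[i - 1] <= window
        near_right = i + 1 < len(sites) and sites[i + 1] - pos <= window
        if near_left or near_right:
            clustered += 1
    return clustered


def expected_clustered(n_sites, gene_length, window=5, n_perm=1000, seed=0):
    """Mean clustered-site count when sites are placed uniformly at random.

    Assumption: every codon is equally mutable, which real genes violate.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_perm):
        random_sites = rng.sample(range(1, gene_length + 1), n_sites)
        total += clustered_sites(random_sites, window)
    return total / n_perm


# Hypothetical codon positions of synonymous mutations in a 500-codon gene.
observed = [10, 12, 40, 100, 103, 104]
print(clustered_sites(observed), expected_clustered(len(observed), 500))
```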

Cancer-Associated Synonymous Mutations Do Not Frequently Increase Codon Optimality or mRNA Stability or Inactivate Exonic miRNA Target Sites

Next, we considered the possibility that cancer-associated synonymous mutations function by increasing the use of more optimal codons, thereby enhancing the efficiency or accuracy of oncogene translation (Drummond and Wilke, 2008). However, we found no evidence that cancer-associated synonymous mutations, considered as a set, increase codon optimality (Figure S4), as judged using the genomic tRNA gene copy number (Table S6). We also considered the potential influence of synonymous mutations on mRNA secondary structure, which is also known to affect translation efficiency (Goodman et al., 2013). However, the computationally predicted mRNA structures (Markham and Zuker, 2008) in 50 and 100 nt windows overlapping synonymously mutated sites did not show a general tendency toward more open structures when mutated (Figure S4). We also considered that synonymous mutations might affect oncogene expression by abolishing exonic microRNA (miRNA) target sites (Brest et al., 2011) but again found no evidence for this, considering either all sites or only those with evidence of evolutionary conservation at synonymous sites (Figure S4). Therefore, effects on codon optimality, mRNA folding, and miRNA targeting do not seem to be common mechanisms by which synonymous mutations influence oncogene activity, although they may still be important in individual cases (Gartner et al., 2013).

Synonymous Mutations in Oncogenes Target Exonic Splicing Motifs

An additional step in gene expression influenced by synonymous mutations is pre-mRNA splicing, with coding exons containing both exonic splicing enhancer (ESE) and exonic splicing silencer (ESS) motifs that are under evolutionary constraint (Keren et al., 2010; Parmley et al., 2006) and with variants in these exonic motifs leading to altered splicing (Cartegni et al., 2002). Mutations in intronic splicing donor or acceptor sites are a common mechanism of tumor suppressor gene inactivation, and changes in splicing can also deregulate the activity of oncogenes (Druillennec et al., 2012; Kaida et al., 2012). Changes in splicing were also recently identified as a mechanism of acquired drug resistance in the BRAF (Poulikakos et al., 2011) and AR (Li et al., 2012) genes, where the underlying causes were either unknown (for BRAF) or intragenic deletions (AR).

Figure 4. Cancer-Associated Synonymous Mutations Create Exonic Splicing Enhancers and Destroy Exonic Splicing Silencers

(A) Enrichment of synonymous mutations near splice sites (SS). Odds ratios and p values by χ2 test shown above.

(B–D) Enrichments of synonymous mutations within 30 nt of SS that lead to exonic splicing silencer (ESS) or exonic splicing enhancer (ESE) gain or loss events. y axes display odds ratios versus different baselines, counting those mutations leading to a gain/loss of an ESE/S; p values by χ2 test. Error bars are 95% CI of the odds ratio.

(E) Cumulative histograms of SS agreement to consensus motif for exons with above- versus below-median densities of synonymous mutations. D+ is the Kolmogorov-Smirnov (K-S) statistic. p value is from the two K-S tests combined using Fisher’s method. PWM, position weight matrix.

(F and G) Sequence contexts around synonymous mutations leading to ESE gains (F) or to ESS losses (G) clustered. Similar known motifs of ESE/ESS binding proteins shown for comparison. Bar charts show number of sequences above binding threshold for splicing factors (F) or in searches for similar motifs among RNA binding proteins (G).

Consistent with a role in splicing, synonymous mutations in the 16-oncogene set are 1.75× enriched within 30 nt of an exon boundary in oncogenes (Figure 4A; p < 10^-4), but not in tumor suppressor genes (Figure 4A). Moreover, using the results of a study that tested all possible hexamers for ESE or ESS activity (Ke et al., 2011) revealed that cancer-associated synonymous mutations close to exon boundaries preferentially result in the gain of ESE motifs and the loss of ESS motifs (Figure 4B). This is not the case for synonymous mutations in tumor suppressor genes (Figure 4B). The same trends are seen when considering motifs identified by screening a random oligonucleotide library for ESS activity using an in vivo splicing reporter system (Wang et al., 2004) (Figure 4C). The ESE motifs identified through a statistical analysis of exon composition (Fairbrother et al., 2002) also predict that cancer-associated synonymous mutations tend to create ESEs in oncogenes (Figure 4D), but not in tumor suppressors. Across the three definitions of ESE/ESS, comparing the synonymous against the missense mutations in the same genes as a baseline gives a similar result: the synonymous mutations tend to lead to ESE gains and ESS losses more often than the missense mutations (Figures 4B–4D). Moreover, randomizing the positions of the synonymous mutations strongly reduces or abolishes the enrichments (Figures 4B–4D), meaning that the enrichment cannot be explained by an increased density of ESE/ESS elements in the oncogenes. ESEs and ESSs tend to play more important roles in exon definition when the flanking splice sites are weaker, i.e., more divergent from the consensus splice site sequences (Fairbrother et al., 2002). Consistent with this, we found that exons of the 16-oncogene set that harbor more synonymous mutations tend to have weaker splice sites (Figure 4E; p = 0.019).
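The notion of a motif "gain" or "loss" used above can be illustrated with a short sketch. Because ESE and ESS motifs are treated as hexamers, a single-nucleotide change can affect up to six overlapping hexamers; a gain is scored when a hexamer becomes a motif only after the mutation, and a loss when it was a motif only before. The motif set below is a hypothetical toy example, not one of the published ESE/ESS collections, and the function names are invented for illustration.

```python
def overlapping_kmers(seq, pos, k=6):
    """All k-mers of `seq` that cover 0-based position `pos`."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]


def motif_changes(seq, pos, alt_base, motifs, k=6):
    """Return (gains, losses) of motif k-mers caused by a substitution.

    `motifs` is any set of k-mers with assumed enhancer or silencer activity.
    """
    mutant = seq[:pos] + alt_base + seq[pos + 1:]
    wild_kmers = overlapping_kmers(seq, pos, k)
    mutant_kmers = overlapping_kmers(mutant, pos, k)
    gains = sum(
        1 for w, m in zip(wild_kmers, mutant_kmers)
        if m in motifs and w not in motifs
    )
    losses = sum(
        1 for w, m in zip(wild_kmers, mutant_kmers)
        if w in motifs and m not in motifs
    )
    return gains, losses


# Toy example: a purine-rich hexamer stands in for an ESE (assumption).
toy_ese = {"GAAGAA"}
exon = "CCGAAGACCC"
print(motif_changes(exon, 7, "A", toy_ese))  # C>A at position 7 creates GAAGAA
```

Comparing wild-type and mutant hexamers at the same offsets, rather than simply counting motifs before and after, keeps a gain at one offset from being cancelled by a loss at another.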

Clustering the sequences in the close vicinity of the ESE/ESS-altering synonymous mutations (±5 nt, because ESE/ESS are hexamers), we found one dominant cluster that encompasses 43% of examined ESE gain events (Figure 4F) and two large clusters for the ESS losses, encompassing 22% and 20% of ESS losses (Figure 4G). The consensus sequence of the ESE cluster is compatible with the known binding site of the SRSF1 (SF2/ASF) splicing factor (Figure 4F), which can act as a potent oncoprotein (Karni et al., 2007). Moreover, a cluster of ESS loss sequences closely matches the known binding site of the heterogeneous nuclear ribonucleoprotein (hnRNP) H2 splicing factor (Figure 4G). Previously, hnRNP H2-mediated regulation has been implicated in anticancer drug resistance (Stark et al., 2011). It is thus conceivable that gain of binding of SF2/ASF and loss of binding of hnRNPs contribute to splicing regulation with consequences for tumor progression.

Synonymous Mutations in Oncogenes Are Associated with Abnormal Splicing

We examined >2,000 cancer exomes that had matching RNA sequencing (RNA-seq) measurements, searching for splicing irregularities associated with synonymous mutations by employing a multivariate outlier measure, the Mahalanobis distance (De Maesschalck et al., 2000; Mahalanobis, 1936). This method accounts for the correlation structure between variables (exon levels), as well as for the variability of each variable, and can be understood as a multivariate generalization of a Z score. This revealed a strong association between synonymous mutations and differential exon usage profiles in the most recurrently mutated oncogenes (p = 0.0004; Figures 5A and 5C). This is not the case for samples harboring nonsynonymous mutations in the same genes (Figure 5C) or for the splicing of non-cancer genes with synonymous mutations (Figure 5C). The oncogenes exhibit no significant change in their overall expression level in the samples with synonymous mutations (Figure 5D). Importantly, the strongest deviations in the splicing patterns (top 10% by outlier score; rightmost bar in Figure 5A) are the ones most enriched in the synonymous-mutated oncogenes (p = 2 × 10^-6), with some enrichment also noted for moderate changes in splicing patterns (top 30%; Figure 5A; p = 0.002).
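To make the outlier measure concrete, the following sketch computes the Mahalanobis distance of one sample's exon-level profile from a set of reference samples, using only the Python standard library. It is a minimal illustration under stated assumptions: the reference mean and covariance are estimated from the reference samples alone, the covariance matrix is assumed to be nonsingular, and no normalization of exon levels is shown. The function names and the toy numbers are hypothetical.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (denominator n - 1)."""
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j]
    return [[v / (len(rows) - 1) for v in line] for line in cov]


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def mahalanobis(sample, reference):
    """Distance of `sample` from the mean of `reference`, scaled by covariance."""
    dev = [x - m for x, m in zip(sample, mean_vector(reference))]
    weighted = solve(covariance_matrix(reference), dev)
    return math.sqrt(sum(d * w for d, w in zip(dev, weighted)))


# Toy exon-level profiles (two exons, four reference samples); values are made up.
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(mahalanobis([3.0, 1.0], reference))
```

With a single variable the function reduces to the absolute Z score, which matches the description of the measure as a multivariate generalization of a Z score.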

Figure 5. Synonymous Mutations in Oncogenes Are Associated with Changes in Exon Usage Profiles in RNA-Seq Data

(A) Frequency of cancer samples harboring a synonymous mutation in an oncogene with strongly outlying exon abundance profiles, when compared against other samples in a given tissue. p values by binomial test, where expected frequency is the average of lowest five bins.

(B) Estimated proportions of putative driver mutations among all synonymous mutations and of splicing-affecting drivers among all synonymous drivers, the latter corresponding to the excess of outlying exon usage profiles over the expectation. The “other putative drivers” encompass the excess of synonymous mutations in the oncogenes that did not provoke splicing changes and thus likely act via a different mechanism. The “putative passengers” represent the baseline frequency of synonymous mutations. In all panels, examined oncogenes are the 16 genes with >1.5× overall synonymous enrichment (Figures 3A and 3B), plus ALK and NOTCH2, which are individually significant.

(C) Cumulative histograms of the outlier measure for the samples where the 18 oncogenes harbor a synonymous or nonsynonymous mutation or are not mutated. Matched non-cancer genes given for comparison. p values for difference of distributions by K-S test.

(D) As above but for rank-normalized median RPKM of all exons of a transcript.

(E) The outlier measure visualized for oncogenes in samples where they harbor a synonymous mutation. FDR by χ2 test.

Across tumor types, the frequency of aberrant splicing profiles detectable in the RNA-seq data allowed us to conservatively estimate that approximately half of the putative synonymous drivers are associated with splicing changes (Figure 5B). Moreover, we note that splicing alterations may be more frequent for some oncogenes (e.g., ITK, ALK, IDH1, or BCL6) than for others (Figure 5E), although other oncogenes (including EGFR, KDR, and PDGFRA) also contain multiple instances of synonymous mutations in samples with significantly outlying exon usage profiles (Figure 5E).

Cancer-Associated Synonymous Mutations Alter Splicing in Minigene Assays

Figure 6. Effects of Cancer-Associated Synonymous Mutations in Oncogenes and TP53 in an In Vivo Splicing Assay

(A–D) Losses or gains of ESE/ESS motifs in JAK3, NTRK1, and ITK oncogenes alter exon inclusion levels. For each genotype, triplicate transfections are shown. Average ± SD % exon skipping shown. *Uppermost bands in (B) result from an activation of a downstream cryptic splice site in the reporter gene intron (62 nt away). The slightly shifted inclusion band in mutant ITK exon 13 (C) is not due to aberrant splicing.

(E) TP53 is highly enriched with synonymous mutations adjacent to splice sites. Error bars are 95% CI.

(F and G) Two sites at the 3′ ends of TP53 exons 4 and 6 harbor recurrent synonymous (F) and missense mutations (G).

(H) The synonymous mutation at the 3′ end of exon 6 activates a cryptic splice site, resulting in a frameshifted mRNA.

See also Figure S5.

To further validate our findings, we experimentally evaluated the effects of 12 synonymous mutations in five oncogenes (Extended Experimental Procedures) by examining changes in splicing patterns induced by the mutations using minigenes transfected in HeLa cells. Briefly, for each mutation tested, exonic and flanking intronic sequences of wild-type and mutant were cloned to replace an alternative cassette exon in a reporter plasmid (Orengo et al., 2006) and the patterns of exon inclusion/skipping of transcripts derived from the minigene were analyzed (see Experimental Procedures). The examined oncogene exons were selected such that the wild-type minigene exhibited measurable levels of both exon inclusion and skipping in this experimental setup. In total, six of the mutations affected alternative splicing: two mutations in exon 15 of JAK3, one mutation in exon 14 of NTRK1, and two mutations in exon 4 and one in exon 13 of ITK. In all cases, the mutations are located in the tyrosine kinase domains of the corresponding proteins, except for ITK exon 4, which overlaps a pleckstrin homology domain. The two mutations in JAK3 exon 15 affect neighboring codons and were predicted to lead to ESS losses in two overlapping ESS hexamers (Figure 6A); they induced a clear reduction (14%–23%) in exon-skipping frequency in the splicing assay (Figure 6A). Similarly, the NTRK1 mutation resulted in a reduction in exon skipping (Figure 6B, 23%), consistent with its prediction as an ESE gain event; we also observed an increased abundance of an aberrantly spliced product (asterisk in Figure 6B). Finally, in the ITK oncogene, the two mutations in exon 4 were predicted to result in ESS losses (Figure 6D). Both mutations decreased the skipping/inclusion ratio and, moreover, increased the overall levels of mRNA (Figure 6D). A third mutation in ITK was predicted to cause both ESE gain and ESE loss events in overlapping motifs in exon 13 and led to an increase in exon skipping (10%; Figure 6C).
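The ESS-loss and ESE-gain predictions described for these exons come down to asking which motif hexamers overlap the mutated nucleotide before and after the change. The following Python sketch illustrates that bookkeeping only; the function names, the toy sequence, and the single hexamer in `toy_ess` are hypothetical placeholders, not motifs from the published ESE/ESS sets and not sequence from the tested exons.

```python
def hexamers_covering(seq, pos, k=6):
    """Return the set of k-mers in seq that overlap 0-based position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return {seq[start:start + k] for start in range(first, last + 1)}


def motif_changes(wild_type, pos, alt, motifs):
    """Compare motif hexamers overlapping pos before and after a substitution.

    motifs is a set of hexamers (for example an ESS or ESE list).
    Returns the hexamers destroyed ("lost") and created ("gained").
    """
    mutant = wild_type[:pos] + alt + wild_type[pos + 1:]
    before = hexamers_covering(wild_type, pos) & motifs
    after = hexamers_covering(mutant, pos) & motifs
    return {"lost": before - after, "gained": after - before}


# Hypothetical toy data: one made-up silencer hexamer and a made-up exon fragment.
toy_ess = {"TTTGGG"}
changes = motif_changes("AAATTTGGGAAA", 5, "C", toy_ess)
print(changes)  # the T>C change at position 5 destroys the toy hexamer
```

Running the same function against an enhancer list and a silencer list separately would give the four event classes discussed in the text (ESE gain, ESE loss, ESS gain, ESS loss).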

In all cases except ITK exon 4, exon skipping is predicted to result in a frameshift, suggesting that increased inclusion might lead to increased expression of the full-length isoform and consequently higher oncogene activity. Taken together, the detection of splicing changes in 6 out of 12 (50%; 95% CI: 21%–79%) of the tested cases is consistent with the estimate derived from the RNA-seq data that approximately half of synonymous drivers in oncogenes have effects on alternative splicing.

Recurrent Synonymous Mutations in TP53 Inactivate Splice Sites

Although, considered together, tumor suppressors did not show an enrichment of synonymous mutations, when considered individually, one tumor suppressor, TP53, was strongly enriched (Figure 6E; 3.99×). The TP53 synonymous mutations were also extremely recurrent and, in contrast to those in oncogenes, target nucleotides directly adjacent to splice sites (75% of the synonymous mutations recur in three such nucleotides). Indeed, upon masking the terminal 2 nt of its exons, TP53 shows no synonymous mutation enrichment (Figure 6E), consistent with previous estimates that most synonymous mutations in TP53 are neutral (Strauss, 2000). However, an enrichment in evolutionarily conserved sites suggests a functional role for a subset of mutations (Lamolle et al., 2006).

Recurrently targeted synonymous sites in TP53 include the 3′ terminal nucleotide of TP53 exon 4 (in codon 125; Figure 6F). Strikingly, a germline variant at the same site is a known cause of Li-Fraumeni syndrome, producing aberrantly spliced TP53 mRNA through intron retention (Warneford et al., 1992) and activation of cryptic splice sites (Varley et al., 1998). In addition, the 3′ terminal nucleotide of exon 6 is also recurrently mutated (codon 224; Figure 6G) with both synonymous and conservative missense changes (Glu > Asp). We confirmed using an in vivo splicing assay that this mutation results in aberrant splicing, activating a cryptic splice site 5 nt downstream from the end of exon 6 and resulting in a frameshifted mRNA (Figure 6H). The wild-type splice site at the 5′ of TP53 exon 6 has only a limited match to the consensus (AG/gtctgg versus AG/gtaagt); the exonic G > A change further weakens it, being the likely cause for the use of the downstream cryptic splice site that has a G as the 3′ nucleotide of the extended exon 6. Moreover, we found a third recurrently mutated site in TP53 with similar properties, where the 3′ terminal G of exon 9 is commonly lost (codon 331; Figure S5). Thus, at least one tumor suppressor gene is also targeted by recurrent synonymous mutations at multiple locations, but, in contrast to those in oncogenes, these mutations inactivate splice sites.
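The "limited match to the consensus" for the exon 6 splice site can be made concrete by counting matching positions. The sketch below is a deliberately crude proxy (a position-by-position identity count) rather than the position weight matrix scoring described in the Experimental Procedures; the function name is hypothetical, and the mutant string assumes the G > A change falls on the last exonic nucleotide, as the text describes.

```python
# Consensus quoted in the text, written without the exon/intron slash.
CONSENSUS = "AGgtaagt"


def consensus_matches(site, consensus=CONSENSUS):
    """Count positions at which a splice-site string equals the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a.upper() == b.upper() for a, b in zip(site, consensus))


wild_type = "AGgtctgg"  # TP53 exon 6 site as quoted in the text
mutant = "AAgtctgg"     # same site after the exonic G > A change

print(consensus_matches(wild_type))  # 5 of 8 positions match
print(consensus_matches(mutant))     # 4 of 8 positions match
```

An identity count treats every position as equally informative, which real splice sites are not, so it should be read only as an illustration of why the mutant site is a weaker match.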

Dosage-Sensitive Oncogenes Have an Excess of Mutations in Their 3′ UTRs

Finally, we reasoned that our analytical approach might also be useful for identifying additional classes of recurrent mutations in cancer genomes. We focused on mutations in untranslated regions because, similar to synonymous mutations, these have not been extensively investigated and are known to be important for regulating gene expression. In contrast to the situation with synonymous mutations, missense-activated oncogenes and the recurrently mutated oncogenes showed no enrichment for mutations in their 3′ UTRs (Figure 7A). However, one set of oncogenes did show a clear enrichment of mutations in 3′ UTRs: those oncogenes that can be activated by amplification (1.19×, p = 0.03; Figure 7A).

We hypothesized, therefore, that other dosage-sensitive oncogenes (those activated by elevated expression) might also show an enrichment of somatic mutations in their 3′ UTRs. Consistent with this, we found that cancer genes that are consistently overexpressed in human tumors (Figure 7B) and the orthologs of dosage-sensitive oncogenes identified in mice using transposon and retroviral screens (Figure 7B) both also showed an elevated rate of mutation in their 3′ UTRs (1.26× and 1.40×, p = 0.002 and 0.004; Figure 7A). None of the three groups of dosage-sensitive oncogenes exhibited an excess of synonymous or intronic mutations (Figure 7A), demonstrating that regional mutation rate variations do not underlie the enrichments. When considering the opposite definitions of the above (cancer genes consistently underexpressed in human tumors or the orthologs of tumor suppressors found in mouse screens), neither group showed any 3′ UTR mutation enrichment (Figure 7A). Moreover, 17 of the dosage-sensitive oncogenes were ≥2× enriched with 3′ UTR mutations, of which ten genes were individually significant at FDR ≤ 10% (Figure 7C). Within the highly enriched dosage-sensitive oncogenes, the 3′ UTR mutations clustered (Figure 7D), and we provide the mutation clusters in the significantly enriched genes as a resource (Figure S5; Table S7). Thus, our method was able to identify another unexpected enrichment of mutations in cancer genomes, predicting that dosage-sensitive oncogenes may be recurrently activated by mutations in their 3′ UTRs.

Figure 7. Dosage-Sensitive Oncogenes Are Enriched with 3′ UTR Mutations

(A) Enrichments of cancer genes with synonymous, 3′ UTR, and intronic mutations. Error bars are 95% CI. FDR shown for 3′ UTR enrichment.

(B) Complementary definitions of dosage-sensitive cancer genes through their overexpression across many cancers or by orthology to genes from mouse mutagenesis screens. GXA, Gene Expression Atlas.

(C) Enrichment of individual dosage-sensitive genes with 3′ UTR mutations; oncogenes from the amplified and mouse sets are shown. Error bars are 95% CI. Significance calls by Z-test of the log enrichment (one tailed).

(D) Clustering of mutations in the 3′ UTRs of dosage-sensitive oncogenes. p values by Fisher’s exact test (one tailed).

DISCUSSION

The large-scale sequencing of cancer genomes provides an opportunity to reassess the types of mutations that contribute to cancer in an unbiased manner. From their excess in oncogenes (see Extended Experimental Procedures), we estimate that approximately one in five synonymous mutations in all known oncogenes reported to date in cancer genome projects have been selected, with this proportion rising to approximately one in two for the oncogenes most relevant to each cancer type. From the frequency at which synonymous changes occur compared to nonsynonymous mutations within oncogenes (see Extended Experimental Procedures), synonymous mutations represent 6%–8% of all driver mutations due to single-nucleotide changes in these genes. Therefore, as part of their standard analyses, future cancer genome studies should test for enrichment and recurrence of synonymous changes.
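One way to read the "one in five" and "one in two" figures is as an excess fraction over the matched-set baseline. This is an assumption made here for illustration; the exact procedure is in the Extended Experimental Procedures and is not reproduced in this section. If synonymous mutations in a gene set occur at a relative risk RR compared with the matched set, the fraction of observed mutations in excess of the baseline is:

```latex
f_{\text{selected}} \;=\; \frac{\text{observed} - \text{expected}}{\text{observed}}
\;=\; \frac{\mathrm{RR} - 1}{\mathrm{RR}},
\qquad
\mathrm{RR} = 1.25 \;\Rightarrow\; f_{\text{selected}} = 0.2,
\qquad
\mathrm{RR} = 2 \;\Rightarrow\; f_{\text{selected}} = 0.5 .
```

Under this reading, "one in five" would correspond to roughly a 1.25-fold enrichment and "one in two" to roughly a 2-fold enrichment.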

Moreover, our combined computational and experimental analyses estimated that approximately half of synonymous drivers alter splicing. In future experimental work, it will be important to further dissect how these changes in splicing influence cancer progression and how they can be therapeutically targeted (Bonnal et al., 2012). In addition to their effects on splicing, it is known that synonymous mutations can alter protein folding (Kimchi-Sarfaty et al., 2007; Zhang et al., 2009; Zhou et al., 2013), and our preliminary analyses suggest that this might also be relevant in cancer (Figure S4).
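The experimental half of that estimate rests on 6 of 12 tested mutations, reported in the Results with a 95% CI of 21%–79%. The Python sketch below shows that an exact (Clopper-Pearson) binomial interval gives approximately those bounds; the choice of method is an assumption, since the text does not state how the interval was computed, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


low, high = clopper_pearson(6, 12)
print(f"{low:.1%} to {high:.1%}")
```

The width of the interval is a reminder that "approximately half" from the minigene assay alone is compatible with anything from about a fifth to about four-fifths of tested mutations affecting splicing.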

As illustrated by our discovery of selected mutations in the 3′ UTRs of dosage-sensitive oncogenes, the general approach presented here may be useful for identifying additional classes of unexpected cancer driver mutations and thus provide a fuller understanding of how changes in cancer genomes contribute to tumor progression in individual patients.

EXPERIMENTAL PROCEDURES

For a more exhaustive description of the computational analyses, as well as the minigene splicing experiments, please refer to Extended Experimental Procedures.

Genomic Data, Cancer Genes, and Matched Sets

Somatic mutations in 3,851 cancer exomes were from the Cancer Genome Atlas (TCGA) Data Portal (Zhang et al., 2011) and from the exome sequencing studies in COSMIC v61 (Forbes et al., 2011) that reported synonymous mutations (Table S2). Somatic mutations in 167 cancer genomes from the International Cancer Genome Consortium Data Portal v8 and additional publications (Table S2) were used to determine 1 Mb/200 kb regional mutation rates. Of 219 additional whole genomes from the TCGA project (Table S2), mutations in 50 leukemias were from Cancer Genome Atlas Research Network (2013), whereas for others, including the 24 glioblastomas (Brennan et al., 2013), aligned short reads were downloaded from cgHub and somatic mutations called using Strelka 1.0.6 (Saunders et al., 2012). cDNAs from COSMIC v61 were matched to transcript models from RefSeq or Ensembl in the University of California, Santa Cruz (UCSC) “golden path” repository. Nonunique genomic regions in the “CRG Alignability 36” UCSC track were masked out and sequencing error-prone motifs checked for (Figure S6). The regional mutation rates for different cancers (Figure 1B), gene mRNA levels in ten tissues (Figure S7), and GC3 content in cDNA, together with H3K9me3 (CD4+ cells) and Repli-Seq (NHEK cells, wgEncodeEH002249) signal averaged over 1 Mb windows, were used to find matched sets using a genetic algorithm (GA)-based optimization. The GA minimizes the cDNA length-weighted Kolmogorov-Smirnov distance of the distributions of mutation rates and other genomic features to the cancer gene set in question by iteratively “recombining” and “mutating” the matched-set candidates. Matched sets for single cancer genes were constructed from 200 genes closest by Euclidean distance using trimmed and normalized mutation rates, mRNA levels, etc. Mutation enrichment is calculated as a relative risk (RR) of mutation compared to the matched set and significance determined by a Z-test on the standard error of the log RR. The tissue-specific oncogenes/tumor suppressors are those individually significantly enriched with either missense or nonsense mutations in a specific tissue (median FDR across tissues = 7.3%; list in Table S4).
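The enrichment statistic at the end of this paragraph (a relative risk with a Z-test on the standard error of its logarithm) can be sketched as follows. This is a minimal illustration, assuming the usual large-sample standard error for the log of a ratio of two proportions and a one-tailed test for enrichment; the function name and the counts are hypothetical and are not taken from the paper's data.

```python
import math


def relative_risk_test(mut_gene, sites_gene, mut_matched, sites_matched):
    """Relative risk of mutation versus a matched set, with a one-tailed Z-test.

    mut_*   : number of mutations observed (must be > 0 for the log to exist)
    sites_* : number of sites at risk (for example, synonymous sites x samples)
    Returns (rr, z, p) where p is the upper-tail p value for RR > 1.
    """
    rr = (mut_gene / sites_gene) / (mut_matched / sites_matched)
    # Large-sample standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / mut_gene - 1 / sites_gene
                          + 1 / mut_matched - 1 / sites_matched)
    z = math.log(rr) / se_log_rr
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), standard normal
    return rr, z, p_one_tailed


# Hypothetical counts: 30 mutations over 10,000 sites in the gene of interest,
# 200 mutations over 100,000 sites in its matched set.
rr, z, p = relative_risk_test(30, 10_000, 200, 100_000)
print(f"RR = {rr:.2f}, z = {z:.2f}, one-tailed p = {p:.3f}")
```

Working on the log scale keeps the test symmetric for enrichment and depletion, and the same standard error gives the 95% CI as exp(log RR ± 1.96 × SE).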

Analyses of Splicing Motifs and Data

Complete sets of 176 ESSs from Wang et al. (2004) and 238 ESEs from Fairbrother et al. (2002), as well as the top 200 ESEs and ESSs from Ke et al. (2011), were tested for enrichments (using χ² tests) of gain versus loss events caused by synonymous mutations in oncogenes, compared to the matched sets, missense mutations, or randomized positions. Splice-site strength was determined as the average log likelihood of the nucleotides in the splice site, given the consensus position weight matrices (PWMs) in SpliceRack (Sheth et al., 2006). Motifs around the ESE/ESS-altering synonymous mutations were discovered in the mutated site and its ±5 nt context by the MEME 4.9.0 server (Bailey et al., 2009) with motif size = 6–8 and other parameters at default. Sequences in the large ESE gain cluster were submitted to ESEFinder 3.0 (Smith et al., 2006), which classifies whether each sequence will bind a splicing factor based on an empirical threshold. We searched the PWM of a major ESS loss cluster against CISBP-RNA 0.6 (Ray et al., 2013) to find human RNA-binding proteins with significantly similar binding sites. The shown SF2/ASF motif is from Smith et al. (2006) and the hnRNP H2 motif is from Ray et al. (2013). Available data from the TCGA “RNASeqV2” pipeline (October 2013) were matched to existing TCGA samples with exonic mutations. Genes not expressed (if third quartile across exons [Q3], <1.0 reads per kilobase per million [RPKM]) were not examined further in corresponding samples; genes grossly overexpressed compared to other samples of same tissue (if Q3 an outlier according to 1.5× IQR rule) were also discarded. The square root was used as a variance-stabilizing transformation for RPKMs prior to calculating the Mahalanobis distances, which were converted to ranks and normalized to [0,1] separately for each gene in each tissue, before pooling.
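The last steps of this paragraph (square-root transform of exon RPKMs, Mahalanobis distance of each sample's exon profile, conversion to ranks scaled to [0,1]) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: distances are taken from the mean profile of all samples supplied, the sample covariance is used, ties share their average rank, the expression filters are omitted, and all function names and RPKM values are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_distances(profiles):
    """Distance of each sample's sqrt-transformed exon profile from the mean.

    profiles: one row per sample, one column per exon (RPKM values).
    """
    data = [[math.sqrt(v) for v in row] for row in profiles]
    n, k = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(k)]
    centered = [[row[j] - mean[j] for j in range(k)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    distances = []
    for d in centered:
        x = solve(cov, d)  # x = cov^-1 d, so d . x is the squared distance
        distances.append(math.sqrt(max(0.0, sum(a * b for a, b in zip(d, x)))))
    return distances


def rank_normalize(values):
    """Convert values to ranks scaled to [0, 1]; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2
        i = j + 1
    top = len(values) - 1
    return [r / top for r in ranks] if top > 0 else [0.0] * len(values)


# Hypothetical RPKMs for a two-exon gene in five samples of one tissue.
profiles = [[1, 1], [9, 1], [1, 9], [9, 9], [4, 4]]
outlier_measure = rank_normalize(mahalanobis_distances(profiles))
print(outlier_measure)
```

Ranking within each gene and tissue before pooling puts genes with very different expression ranges on a common [0,1] scale, which is what allows distributions to be compared across mutated and unmutated samples.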

SUPPLEMENTAL INFORMATION

Supplemental Information includes Extended Experimental Procedures, seven figures, and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2014.01.051.

ACKNOWLEDGMENTS

F.S. was cofunded by Marie Curie Actions (INTERPOD). Work in the lab of B.L. is supported by an ERC Starting Grant, ERASysBio+ ERANET, MICINN BFU2011-26206, AGAUR, the EMBO Young Investigator Program, EU Framework 7 project 277899 4DCellFate, and the EMBL-CRG Systems Biology Program. Work in the lab of T.G. is supported by an ERC Starting Grant and MICINN BIO2012-37161. Work in the lab of J.V. is supported by Fundación Botín, Fundación Alicia Koplowicz, Consolider RNAREG, and MINECO.

Received: May 1, 2013
Revised: November 20, 2013
Accepted: January 15, 2014
Published: March 13, 2014

REFERENCES

Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37 (Web Server issue), W202–W208.

Bonnal, S., Vigevani, L., and Valcárcel, J. (2012). The spliceosome as a target of novel antitumour drugs. Nat. Rev. Drug Discov. 11, 847–859.

Brennan, C.W., Verhaak, R.G.W., McKenna, A., Campos, B., Noushmehr, H., Salama, S.R., Zheng, S., Chakravarty, D., Sanborn, J.Z., Berman, S.H., et al.; TCGA Research Network (2013). The somatic genomic landscape of glioblastoma. Cell 155, 462–477.

Brest, P., Lapaquette, P., Souidi, M., Lebrigand, K., Cesaro, A., Vouret-Craviari, V., Mari, B., Barbry, P., Mosnier, J.-F., Hébuterne, X., et al. (2011). A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat. Genet. 43, 242–245.

Cancer Genome Atlas Research Network (2013). Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074.

Cartegni, L., Chew, S.L., and Krainer, A.R. (2002). Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet. 3, 285–298.

De Maesschalck, R., Jouan-Rimbaud, D., and Massart, D.L. (2000). The Mahalanobis distance. Chemom. Intell. Lab. Syst. 50, 1–18.

Ding, L., Getz, G., Wheeler, D.A., Mardis, E.R., McLellan, M.D., Cibulskis, K., Sougnez, C., Greulich, H., Muzny, D.M., Morgan, M.B., et al. (2008). Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075.

Druillennec, S., Dorard, C., and Eychène, A. (2012). Alternative splicing in oncogenic kinases: from physiological functions to cancer. J. Nucleic Acids 2012, 639062.

Drummond, D.A., and Wilke, C.O. (2008). Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352.

Fairbrother, W.G., Yeh, R.-F., Sharp, P.A., and Burge, C.B. (2002). Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013.

Forbes, S.A., Bindal, N., Bamford, S., Cole, C., Kok, C.Y., Beare, D., Jia, M., Shepherd, R., Leung, K., Menzies, A., et al. (2011). COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39 (Database issue), D945–D950.

Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N., and Stratton, M.R. (2004). A census of human cancer genes. Nat. Rev. Cancer 4, 177–183.

Gartner, J.J., Parker, S.C.J., Prickett, T.D., Dutton-Regester, K., Stitzel, M.L., Lin, J.C., Davis, S., Simhadri, V.L., Jha, S., Katagiri, N., et al.; NISC Comparative Sequencing Program (2013). Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proc. Natl. Acad. Sci. USA 110, 13481–13486.

Gingold, H., and Pilpel, Y. (2011). Determinants of translation efficiency and accuracy. Mol. Syst. Biol. 7, 481.

Goodman, D.B., Church, G.M., and Kosuri, S. (2013). Causes and effects of N-terminal codon bias in bacterial genes. Science 342, 475–479.

Griseri, P., Bourcier, C., Hieblot, C., Essafi-Benkhadir, K., Chamorey, E., Touriol, C., and Pagès, G. (2011). A synonymous polymorphism of the Tristetraprolin (TTP) gene, an AU-rich mRNA-binding protein, affects translation efficiency and response to Herceptin treatment in breast cancer patients. Hum. Mol. Genet. 20, 4556–4568.

Hodgkinson, A., Chen, Y., and Eyre-Walker, A. (2012). The large-scale distribution of somatic mutations in cancer genomes. Hum. Mutat. 33, 136–143.

Hodis, E., Watson, I.R., Kryukov, G.V., Arold, S.T., Imielinski, M., Theurillat, J.-P., Nickerson, E., Auclair, D., Li, L., Place, C., et al. (2012). A landscape of driver mutations in melanoma. Cell 150, 251–263.

Kaida, D., Schneider-Poetsch, T., and Yoshida, M. (2012). Splicing in oncogenesis and tumor suppression. Cancer Sci. 103, 1611–1616.

Karni, R., de Stanchina, E., Lowe, S.W., Sinha, R., Mu, D., and Krainer, A.R. (2007). The gene encoding the splicing factor SF2/ASF is a proto-oncogene. Nat. Struct. Mol. Biol. 14, 185–193.

Ke, S., Shang, S., Kalachikov, S.M., Morozova, I., Yu, L., Russo, J.J., Ju, J., and Chasin, L.A. (2011). Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374.

Keren, H., Lev-Maor, G., and Ast, G. (2010). Alternative splicing and evolution: diversification, exon definition and function. Nat. Rev. Genet. 11, 345–355.

Kimchi-Sarfaty, C., Oh, J.M., Kim, I.-W., Sauna, Z.E., Calcagno, A.M., Ambudkar, S.V., and Gottesman, M.M. (2007). A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315, 525–528.

Lamolle, G., Marin, M., and Alvarez-Valin, F. (2006). Silent mutations in the gene encoding the p53 protein are preferentially located in conserved amino acid positions and splicing enhancers. Mutat. Res. 600, 102–112.

Li, Y., Hwang, T.H., Oseth, L.A., Hauge, A., Vessella, R.L., Schmechel, S.C., Hirsch, B., Beckman, K.B., Silverstein, K.A., and Dehm, S.M. (2012). AR intragenic deletions linked to androgen receptor splice variant expression and activity in models of prostate cancer progression. Oncogene 31, 4759–4767.

Lin, M.F., Kheradpour, P., Washietl, S., Parker, B.J., Pedersen, J.S., and Kellis, M. (2011). Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 21, 1916–1928.

Ma, F., Sun, T., Shi, Y., Yu, D., Tan, W., Yang, M., Wu, C., Chu, D., Sun, Y., Xu, B., and Lin, D. (2009). Polymorphisms of EGFR predict clinical outcome in advanced non-small-cell lung cancer patients treated with Gefitinib. Lung Cancer 66, 114–119.

Mahalanobis, P. (1936). On the generalized distance in statistics. Proc. Natl. Inst. Sci. India 2, 49–55.

Markham, N.R., and Zuker, M. (2008). UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol. 453, 3–31.

Orengo, J.P., Bundman, D., and Cooper, T.A. (2006). A bichromatic fluorescent reporter for cell-based screens of alternative splicing. Nucleic Acids Res. 34, e148.

Parmley, J.L., Chamary, J.V., and Hurst, L.D. (2006). Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol. Biol. Evol. 23, 301–309.

Peifer, M., Fernández-Cuesta, L., Sos, M.L., George, J., Seidel, D., Kasper, L.H., Plenker, D., Leenders, F., Sun, R., Zander, T., et al. (2012). Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat. Genet. 44, 1104–1110.

Pleasance, E.D., Stephens, P.J., O’Meara, S., McBride, D.J., Meynert, A., Jones, D., Lin, M.-L., Beare, D., Lau, K.W., Greenman, C., et al. (2010). A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190.

Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R., and Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121.

Poulikakos, P.I., Persaud, Y., Janakiraman, M., Kong, X., Ng, C., Moriceau, G., Shi, H., Atefi, M., Titz, B., Gabay, M.T., et al. (2011). RAF inhibitor resistance is mediated by dimerization of aberrantly spliced BRAF(V600E). Nature 480, 387–390.

Ray, D., Kazan, H., Cook, K.B., Weirauch, M.T., Najafabadi, H.S., Li, X., Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al. (2013). A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177.

Roberts, S.A., Sterling, J., Thompson, C., Harris, S., Mav, D., Shah, R., Klimczak, L.J., Kryukov, G.V., Malc, E., Mieczkowski, P.A., et al. (2012). Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435.

Santarius, T., Shipley, J., Brewer, D., Stratton, M.R., and Cooper, C.S. (2010). A census of amplified and overexpressed human cancer genes. Nat. Rev. Cancer 10, 59–64.

Sauna, Z.E., and Kimchi-Sarfaty, C. (2011). Understanding the contribution of synonymous mutations to human disease. Nat. Rev. Genet. 12, 683–691.

Saunders, C.T., Wong, W.S.W., Swamy, S., Becq, J., Murray, L.J., and Cheetham, R.K. (2012). Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817.

Schuster-Böckler, B., and Lehner, B. (2012). Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507.

Schutz, F.A., Pomerantz, M.M., Gray, K.P., Atkins, M.B., Rosenberg, J.E., Hirsch, M.S., McDermott, D.F., Lampron, M.E., Lee, G.-S.M., Signoretti, S., et al. (2013). Single nucleotide polymorphisms and risk of recurrence of renal-cell carcinoma: a cohort study. Lancet Oncol. 14, 81–87.

Sheth, N., Roca, X., Hastings, M.L., Roeder, T., Krainer, A.R., and Sachidanandam, R. (2006). Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 34, 3955–3967.

Smith, P.J., Zhang, C., Wang, J., Chew, S.L., Zhang, M.Q., and Krainer, A.R. (2006). An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet. 15, 2490–2508.

Stark, M., Bram, E.E., Akerman, M., Mandel-Gutfreund, Y., and Assaraf, Y.G. (2011). Heterogeneous nuclear ribonucleoprotein H1/H2-dependent unsplicing of thymidine phosphorylase results in anticancer drug resistance. J. Biol. Chem. 286, 3741–3754.

Stratton, M.R., Campbell, P.J., and Futreal, P.A. (2009). The cancer genome. Nature 458, 719–724.

Strauss, B.S. (2000). Role in tumorigenesis of silent mutations in the TP53 gene. Mutat. Res. 457, 93–104.

Supek, F., Skunca, N., Repar, J., Vlahovicek, K., and Smuc, T. (2010). Translational selection is ubiquitous in prokaryotes. PLoS Genet. 6, e1001004.

Varley, J.M., Chapman, P., McGown, G., Thorncroft, M., White, G.R., Greaves, M.J., Scott, D., Spreadborough, A., Tricker, K.J., Birch, J.M., et al. (1998). Genetic and functional studies of a germline TP53 splicing mutation in a Li-Fraumeni-like family. Oncogene 16, 3291–3298.

Vogelstein, B., Papadopoulos, N., Velculescu, V.E., Zhou, S., Diaz, L.A., Jr., and Kinzler, K.W. (2013). Cancer genome landscapes. Science 339, 1546–1558.

Wang, Z., Rolish, M.E., Yeo, G., Tung, V., Mawson, M., and Burge, C.B. (2004). Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845.

Wang, J., Gonzalez, K.D., Scaringe, W.A., Tsai, K., Liu, N., Gu, D., Li, W., Hill, K.A., and Sommer, S.S. (2007). Evidence for mutation showers. Proc. Natl. Acad. Sci. USA 104, 8403–8408.

Warneford, S.G., Witton, L.J., Townsend, M.L., Rowe, P.B., Reddel, R.R., Dalla-Pozza, L., and Symonds, G. (1992). Germ-line splicing mutation of the p53 gene in a cancer-prone family. Cell Growth Differ. 3, 839–846.

Woo, Y.H., and Li, W.-H. (2012). DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat. Commun. 3, 1004.

Zhang, G., Hubalewska, M., and Ignatova, Z. (2009). Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat. Struct. Mol. Biol. 16, 274–280.

Zhang, J., Baran, J., Cros, A., Guberman, J.M., Haider, S., Hsu, J., Liang, Y., Rivkin, E., Wang, J., Whitty, B., et al. (2011). International Cancer Genome Consortium Data Portal: a one-stop shop for cancer genomics data. Database (Oxford) 2011, bar026.

Zhou, M., Guo, J., Cha, J., Chae, M., Chen, S., Barral, J.M., Sachs, M.S., and Liu, Y. (2013). Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 495, 111–115.
