• No results found

High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations

N/A
N/A
Protected

Academic year: 2021

Share "High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

High-resolution reversible folding of hyperstable RNA

tetraloops using molecular dynamics simulations

Alan A. Chen and Angel E. García1

Department of Physics and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180 Edited by Peter G. Wolynes, Rice University, Houston, TX, and approved August 16, 2013 (received for review May 17, 2013) We report the de novo folding of three hyperstable RNA

tetra-loops to 1–3 Å rmsd from their experimentally determined struc-tures using molecular dynamics simulations initialized in the unfolded state. RNA tetraloops with loop sequences UUCG, GCAA, or CUUG are hyperstable because of the formation of noncanon-ical loop-stabilizing interactions, and they are all faithfully repro-duced to angstrom-level accuracy in replica exchange molecular dynamics simulations, including explicit solvent and ion molecules. This accuracy is accomplished using unique RNA parameters, in which biases that favor rigid, highly stacked conformations are corrected to accurately capture the inherent flexibility of ssRNA loops, accurate base stacking energetics, and purinesyn-anti inter-conversions. In a departure from traditional quantum chemistry-centric approaches to forcefield optimization, our parameters are calibrated directly from thermodynamic and kinetic measurements of intra- and internucleotide structural transitions. The ability to recapitulate the signature noncanonical interactions of the three most abundant hyperstable stem loop motifs represents a signifi -cant milestone to the accurate prediction of RNA tertiary structure using unbiased all-atom molecular dynamics simulations. RNA folding

|

molecular simulations

S

tructured RNAs exhibit a distinct preference for loops of

precisely 4 nt, which was originally noted by Woese et al. (1) using comparative sequence analysis of ribosomes. Approxi-mately 70% of these tetraloops are comprised of just three

specific loop sequences: UUCG, GCAA, or CUUG. The

abun-dance of these sequences is thermodynamic in origin, because each motif forms a unique network of noncanonical interactions within their loops that stabilizes the folded state. The abundance of high-resolution structural and thermodynamic data available for these motifs coupled with their characteristic noncanonical signatures make them ideal for adjudicating the accuracy of RNA folding simulations.

RNA folding is understood to be hierarchical in nature, with secondary and tertiary folds stabilized by distinct thermodynamic driving forces (2). Secondary structure (the formation of

ca-nonical helices stabilized by Watson–Crick base pairs) can be

accurately predicted from the nucleotide sequence alone using simple nearest neighbor thermodynamic models (3). In contrast, tertiary structure formation is a subtle competition between

in-trinsic flexibility of single-stranded segments, rigidity imparted

from base-stacking interactions, stabilization of noncanonical

hydrogen bonding patterns, and site-specific ion binding. In

principle, a molecular dynamics simulation using a properly

calibrated forcefield should capture all of the physicochemical

properties of ribonucleotides relevant to the RNA folding pro-cess. Up until now, however, even small, fast-folding tetraloops cannot be accurately and reversibly folded from the unfolded

state (4–6). In contrast, numerous documented successes have

been reported using de novo protein folding with all-atom mo-lecular dynamics simulations (7).

In this work, we present the results of replica exchange mo-lecular dynamics (REMD) simulations that are consistently able to correctly fold all three hyperstable tetraloops from random

unfolded states to 1–3 Å rmsd from their experimentally determined

structures, using optimized RNA parameters. These parame-ters feature van der Waals interactions that have been cali-brated against high-level quantum mechanics (QM) dispersion calculations in conjunction with a suite of experimental meas-urements of aqueous nucleoside and dinucleotide interactions. The resulting RNA parameters reproduce the experimentally measured clustering propensities of aqueous nucleoside sol-utions as a function of concentration, the thermodynamics of base stacking as a function of temperature, and the population

and lifetimes of syn-anti glycosidic transitions; in contrast, we

find that the default AMBER-99 RNA (8) parameters fail all of

these same tests. Wefind that these degrees of freedom are

in-trinsically coupled and that only a simultaneous global repar-ameterization results in the ability to reversibly fold hyperstable tetraloops with their signature noncanonical interactions. The ability to fold small model RNAs de novo enables future computational studies of larger biologically interesting RNAs, such as riboswitches, ribozymes, and mRNA UTRs.

Results

In each REMD simulation, dozens of folding events are

ob-served, which can be seen in Fig. 1C; the resulting melting curves

for all three tetraloops are shown in Fig. 1D. The folded states

calculated for the three tetraloops, which are shown in Fig. 2, are superimposed against their experimentally determined struc-tures, which are shown in Fig. 2, gray. Each calculated structure is the centroid of the most highly populated cluster from the

trajectory found at the lowest REMD temperature (274 K)—not

the structure with the lowest rmsd. This cluster identifies the

most thermodynamically stable configuration from the REMD

en-semble without introducing any artificial bias to the experimental

Significance

We report atomistic simulations describing the spontaneous folding of three RNA tetraloops from the unfolded state to within 1–3 Å of their experimentally determined structures. These common and highly stable motifs serve as building blocks for large structured RNA molecules. This success is caused by the reparameterization of the energy function used to describe the interatomic interactions, which was obtained by calibrating the energy function to reproduce known ther-modynamic and kinetic measurements of RNA monomers and dimers. In particular, the accurate recapitulation of the char-acteristic loop configurations responsible for the thermody-namic hyperstability of these motifs represents a significant milestone to the accurate description of RNA tertiary structure using unbiased all-atom molecular dynamics simulations.

Author contributions: A.A.C. and A.E.G. designed research; A.A.C. performed research; A.A.C. analyzed data; and A.A.C. and A.E.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. See Commentary on page 16706.

1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online atwww.pnas.org/lookup/suppl/doi:10.

(2)

structure during the selection process. The centroid structures capture all of the unique noncanonical signatures of hyper-stable tetraloops, which are detailed below. In all cases,

hy-drogen bonding interactions between thefirst (L1) and fourth

(L4) base in the loop form a distorted base pair constrained by the reduced interstrand distance of the loop, whereas the second (L2) and third (L3) loop bases are either weakly stacked or

completely unrestrained by virtue of being flipped out into

solution.

The centroid structure for the UUCG tetraloop is just 0.8 Å all

heavy-atom rmsd from the X-ray structure (Fig. 2,Left), with

a closest cluster conformation of 0.6 Å rmsd. The folded state correctly captures the transwobble GU base pair as well as the

correct orientation of the middle loop bases UL2and CL3. The

transwobble GU pair requires GL4 to adopt an unusual syn

conformation and is stabilized by a hydrogen bond between the

N1 of GL4and the 2′OH of UL1but not with N2. UL2is highly

mobile andflipped out into solution, and the amino group of CL3

is observed to form a hydrogen bond to the O1P of UL2, which

are all in agreement with the high-resolution NMR structure (9).

Additionally, we observe hydrogen bonds from the N2 of GL4

to the O2 of CL3and from the O6 of GL4to the 2′OH of UL2,

which tend to form when the other two hydrogen bonds to GL4

are broken, resulting in a cage of weak hydrogen bonds that

cumulatively maintain the highly mobile G4in its unusualsyn

conformation.

Our simulations also accurately fold the GCAA tetraloop, where the most populated cluster centroid is 1.3 Å (all heavy

atom) rmsd from the NMR structure (Fig. 2,Center), with a

closest conformation of 0.8 Å rmsd. The folded state correctly captures the unusual sheared GA loop base pair, with exocyclic

amino group of GL1alternately hydrogen bonding with the N7 of

AL4or the O2P of AL3. The putative hydrogen bond between N7

of AL3and the 2′OH of GL1suggested by NMR studies (10) is

also observed, which stabilizes the stacked conformation of AL3

on top of AL4, whereas CL2isflipped out into the solvent and

largely unrestricted. GL1exhibits strong cross-stacking with the G

of the closing base pair, whereas AL3 is more solvent-exposed

and does not interact strongly with the closing base pair. It can be

seen in Fig. 2,Center that the slightly rotated conformation of

AL4 is the only significant deviation from the 1ZIH NMR

structure. However, it has been shown in the work by Mohan

et al. (11) that GNRA (where N= C or U and R= A or G)

tetraloops within the 70s ribosome crystal structure consistently

exhibit similarly repositioned AL4 conformations on account of

torsional stress within the loop backbone. If these structures (2JJ0) are used as a reference rather than the NMR structure,

the centroid rmsd drops to<1.0 Å.

Additionally, we are also able to fold the CUUG tetraloop, albeit with less accuracy than either the UUCG or GCAA

tet-raloop. The centroid rmsd is 3.1 Å (Fig. 2,Right), with the closest

observed conformation at 2.0 Å rmsd from the experimental

structure. The buckled CG base pair between CL1and GL4of the

loop is correctly recapitulated, essentially creating a 3-bp helix topped by an effective biloop. The primary discrepancy is the

predicted position of UL2, which does not get inserted into the

minor groove as experimentally observed; consequently, the

pre-dicted position of the neighboring U3is also perturbed. This

unusual UL2conformation is known to be stabilized by hydrogen

bonds to the exocyclic amino group of GL4as well as two

addi-tional hydrogen bonds to the proximal stem CG base pair, none of which are observed in our simulations. It should be noted that

UL3, unlike UL2, is poorly restrained by existing NMR NOE

restraints, because it does not participate in any hydrogen bonds and undergoes fast chemical exchange (12). Therefore, it is just

the predicted position of UL2that is at odds with the

experi-mental data and not the arbitrary conformation of UL3as

rep-resented in the experimental structure. If UL3is excluded from

the rmsd calculation, the centroid is∼2.0 Å rmsd from the

ex-perimental structure.

At intermediate temperatures, clustering analysis reveals that the folded ensemble consists of multiple interconverting sub-states that are not as optimally packed as the native state but still possess 2 stem bp. The sequence of events from the observed folding trajectories sheds light on how each loop sequence dictates the overall RNA folding pathway. It should be noted, however, that individual REMD trajectories rapidly swap temperatures, and therefore, the observed pathways may not be identical to the pathways observed at constant temperature. Surprisingly similar folding pathways are observed for all three tetraloop sequences. Each productive folding event always originates from a high-temperature replica, where it is fully extended and unstructured. The initiation of folding from expended conformations does not imply that mispaired or collapsed conformations are not sam-pled, but rather that they were not observed as on-pathway intermediates for productive folding. These extended

high-tem-perature states undergo rapidfluctuations that bring base pairs

on opposing ends of the RNA in close proximity, resulting in 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.1 0.2 0.3 0.4 0.5 0.6 RMSD (nm) H(RMSD) 274 K 323 K 376 K 430 K 496 K Time (ns) Replica # 100 200 300 400 10 20 30 40 50 60 0 1 0 100 200 300 400 0 10 20 30 40 Time (ns) # Replicas Folded GCAA CUUG UUCG 0 300 350 400 450 500 0 0.2 0.4 0.6 0.8 Temp (K) Fraction Folded gcGCAAgc gcUUCGgc cgCUUGcg

B

A

D

C

Fig. 1. Tetraloop folding simulations. (A) Histograms of GCAA rmsd vs. temperature. The folded fraction is taken as rmsd<0.4 nm. (B) Number of folded replicas vs. time. UUCG and CUUG both equilibrate within 100 ns, whereas GCAA requires 275 ns before steady state is reached. (C) Fraction of each 0.5-ns block spent in the folded state for all 64 replicas of the GCAA folding simulation. (D) Fraction folded vs. temperature for all three tetra-loop sequences.

UUCG: 0.8Å GCAA: 1.3Å CUUG: 3.1Å

Fig. 2. Folded state predictions (colored) vs. experimental structure (gray) for all three hyperstable tetraloops and their rmsd values from the experi-mentally determined structures. Predictions are centroids from the most populated cluster of the trajectory visiting the lowest REMD temperature. Nucleosides are colored as red (guanine), green (cytosine), yellow (adenine), and cyan (uracil).

BIOPHYSICS AND COMPUTATIONAL BIOLOGY SEE COM MENTARY

(3)

many transiently formed single base pairs (including nonnative ones). Mispaired, nonnative base pairs tend to rapidly dissociate, whereas the occasional formation of an in-register stem base pair results in the immediate formation of the neighboring second stem base pair (facilitated by a cross-stacked purine). After it is formed, the doubly base-paired stem remains intact for the remainder of the simulation. At this point, the loop bases are in a disordered, collapsed state and begin a restricted search for a more optimal packing arrangement. If the loop happened to collapse very close to the native state, the noncanonical hydrogen bonds between loop

bases L1 and L4 are able to rapidlyfind their folded state. If the

loop collapsed with a suboptimal base packing, the search is a slow,

multistep process involving successive flipping out and in of the

bases at the L1 and L4 positions until the proper noncanonical hydrogen bonds are realized. Only then can the loop bases at positions L2 and L3 properly adjust their stacking to complete the folded tetraloop motif.

For the UUCG tetraloop, a handful of trajectories collapse

with GL4 already in the syn conformation, and therefore, the

transG-U wobble interaction forms quickly (Fig. 3A). However,

typical trajectories collapse with GL4 in the anti conformation

and are stacked against the inner stem CG base pair. In these

cases, the G4will notfit sterically in the narrow, collapsed loop

at the same time as UL1, and therefore, one or the other is forced

toflip out into solution. For the GUtranswobble to form,UL1

must beflipped in, whereas GL4isflipped out and transitions

to thesynconformation. This slow, multistep process does not

always complete for every folded trajectory within the simu-lation timescale. In fact, clustering analysis reveals that these kinetically trapped conformations are highly populated in in-termediate temperature trajectories. However, the fact the na-tive state is at all reachable is a direct consequence of our revised

force field, which correctly captures both the relative

popula-tions of syn-anti conformers and the correct transition rates

between them.

GCAA folding events also begin with the formation of the

inner stem CG base pair stabilized by cross-stacking from GL1

(Fig. 3B). After it has collapsed, GL1 and AL4 can form the

characteristic sheared GA base pair, and then, A3stacks on top

are stabilized by a hydrogen bond to the 2′OH of GL1; the

noninteracting C2remainsflipped out into solution, completing

the motif. However, there is the possibility that a sheared GA

base pair can also form between GL1and AL3on loop collapse,

forcing AL4toflip out into solution to accommodate the

short-ened loop. This nonnative, misfolded conformation can

eventu-allyfind the native state if AL3flips out and then AL4flips in to

form the correct sheared GA base pair. This multistep search

process proceeds much slower than thefirst route, resulting in

the unusually slow overall equilibration rate observed for the

GCAA simulation (Fig. 1B).

Lastly, the CUUG motif involves the formation of a buckled

CG base pair involving CL1 and GL4 of the loop, effectively

creating a biloop (Fig. 3C). Unlike either UUCG or GCAA, it is

observed that there is a possibility of CL1-GL4base pair

forma-tion preceding stem formaforma-tion, because Watson–Crick base pairs

can form outside the context of the collapsed loop. The primary pathway, however, is similar to the other two tetraloops, where

the inner stem base pair formsfirst (aided by a cross-stacked G4),

then the outer base pair forms, andfinally, the loop CG base pair

forms. The experimental CUUG structure indicates that UL2

should then pack vertically into the minor groove to form hy-drogen bonds with both the loop and inner stem CG base pair;

however, in our simulations, we do not observe the U2base to

adopt this configuration, possibly indicating an interaction not

properly modeled by our RNA parameters. In particular, we

have not attempted any modification of the default AMBER-99

phosphate backbone torsions, which have been shown in prior work by Pérez et al. (13) to result in distortions of B-form DNA

helices in long molecular dynamics simulations. An incorrect

description of RNA backboneflexibility combined with the

un-usually strained CUUG loop backbone may explain the inability of the CUUG simulations to adopt the experimentally observed minor grove interaction.

Discussion

Ourfindings indicate that the folding of a minimal RNA motif

(hyperstable tetraloops) exhibits many of the hallmarks of hier-archical folding observed for larger, structured RNAs. Secondary

structure formation (in this case, the formation of two Watson–

Crick stem base pairs) occurs very rapidly (<100 ps) with high

cooperation in a canonical two-state fashion. The noncanonical

U

F

Syn-GL4 -GL4 UL1flipped out GL4flipped in UL1flipped in GL4flipped out

U

F

GL1-AL3basepair AL4flipped out A L3,L4flipped out

U

F

Preformed CL1– GL4 CL1flipped out

A

B

C

basepair

Fig. 3. Tetraloop folding pathways. Two alternate folding pathways are observed for all three hyperstable tetraloops. Thick and thin arrows depict rapid and slow transitions, respectively. Bases are colored the same as in Fig. 2. (A) UUCG folds rapidly if GL4is already insynbefore collapse, which is

required to form thetrans-GU wobble base pair (lower pathway). In con-trast, misfolds containing the anti-GL4mustflip GL4out of the loop to access

thesynconformation and thenflip back in to pair with UL4(upper pathway).

(B) GCAA folds rapidly when GL1correctly pairs with AL4after loop collapse

to form a sheared base pair (lower pathway), but it can also form a non-native GL1-AL3base pair (upper pathway). AL3mustflip out before the native

GL1-AL4base pair can form. (C) The CUUG tetraloop rapidly folds when the

CL1-GL4base pair is preformed before collapse (lower pathway); otherwise,

CL1is initiallyflipped out and mustflip back into the loop to reach the native

(4)

loop interactions, however, often involve a slow conformational search through isoenergetic intermediates that cannot even begin until after the stem is completely formed, indicating the existence of a rough free energy landscape, even for minimal hyperstable RNA motifs.

The existence of multiple substates with suboptimal loop conformations in our REMD simulations is consistent with prior experimental reports of not two-state behavior in tetraloop folding. Menger et al. (14) found that 2-aminopurine stacking at the L2 and L3 positions of the GAAA tetraloop displayed distinctly different relaxation lifetimes on the microsecond timescale using

fluorescence-detected temperature jump experiments. Intriguingly,

Menger et al. (14) also found thatfluorescence changes at either

position were anticorrelated with changes at the other position as a function of temperature, indicating a population shift involving

competing stacking rearrangements of AL2 and AL3. Similarly,

Johnson and Hoogstraten (15), using NMR relaxation, observed a puzzling temperature-dependent multisite exchange process

for the C2′of AL3that they hypothesized could be caused by

extrusion of AL4at higher temperatures. Zhao and Xia (16),

using femtosecond time-resolvedfluorescence, concluded that

multiple loop conformations are possible for GNRA tetraloops,

including a state where AL3is partially stacked with GL1but not

stacked with AL4as in the native state. These observations are all

potentially explained by the two interconverting loop

confor-mations that we observe for the GCAA tetraloop (Fig. 3B), in

which either the GL1-AL4sheared base pair is formed, allowing

AL3to stack on top at low temperatures, or an alternate form is

formed at higher temperatures where a GL1-AL3base pair forms,

resulting in the concomitantflipping out of AL4and allowing AL2

to stack on top.

For the UUCG tetraloop, Ma et al. (17), using temperature jump experiments, observed three distinct relaxation processes requiring four distinct states to describe the folding process. Ma et al. (17) conclude that one on-pathway and one off-pathway intermediate must be present, with the on-pathway intermediate

disappearing for either a UUUU tetraloop or a UUCGwith a

bromonated GL4. The interpretation by Ma et al. (17) is that this

on-pathway intermediate corresponds to a nonnatively packed loop transitioning to a native loop. The disappearance of this transition can, therefore, be explained by the fact that UUUU lacks a native loop packing, whereas the 8-Br-G bypasses the

nonnative loop state altogether by preorganizing the GL4in the

synconformer. An extension of this work by Sarkar et al. (18)

raises the possibility that the four states could be arranged in two parallel pathways, which could also explain the observed kinetic data. In related work, Proctor et al. (19) showed that

incorpo-ration of 8-Br-G in UUCGresults in a 4.1 times faster folding

rate, indicating that preorganization of thesyn GL4conformer

accelerates the conformational search for the native state. These observations are entirely consistent with the two pathways that

we observe for UUCG folding (Fig. 3A): a rapid pathway, in

which ansyn-GL4is preformed on loop collapse, and a much

slower pathway, where GL4and UL1alternateflipping out of the

loop until GL4is able to transition to thesynrotamer. It should

be noted that not all mispacked loop conformations were able to reach the native state within the timescale of our simulations, and therefore, it is possible that some pathological packing is actually an off-pathway intermediate requiring complete un-folding and reun-folding to reach the native state. However, we cannot ascertain this for certain without extensive kinetic simu-lations, which are beyond the scope of this study.

The observed folding pathways also explain the observed closing base pair preferences for the three tetraloops sequences. For both gCUUGc and cGCAAg, when the closing stem G is on the opposite side of the loop as the consensus loop G, the re-sulting RNA is measurably more stable than RNA sequences in which the stem G is on the same side (20). We observe that, for

both cGCAAg and gCUUGc, a cross-stacked G-G interaction is able to preorganize the formation of the correct in-register upper stem base pair, which always forms before the lower stem base

pair (Fig. 3C). Although cUUCGg would seem to break this rule,

in this case, GL4must adopt ansynconformer in the native state

as opposed to the anti conformer stabilized by cross-stacking,

which must then undergo a slow search tofind the nativesyn

rotamer (Fig. 3A).

In prior attempts to fold hyperstable tetraloops from the un-folded state with atomistic detail, none of the noncanonical loop interactions were correctly predicted in the folded state confor-mation. The ROSETTA suite of knowledge-based potentials routinely ranks among the best performers in the critical as-sessment of protein structure prediction (CASP) blind protein structure prediction contests, but related efforts to predict RNA structure could not reproduce any of the hydrogen bonds or stacking patterns within the UUCG tetraloop (5). The GCAA motif has been extensively studied using the massively parallel distributed computing resources of the Folding@Home project, which was only able to capture 19 heterogeneous folding events

among 10,000 folding simulations involving∼475μs of

cumula-tive sampling, with majority of the trajectories kinetically trapped

in nonspecific collapsed states (21). A follow-up study used 2,800

serial replica exchange simulations designed to facilitate rapid equilibration with stochastic heating and cooling; however,

re-versible folding still could not be achieved, despite ∼55 μs of

cumulative REMD sampling (4). In a recent study, both the

UUCG and GAGA tetraloops were simulated for ∼19 μs of

cu-mulative REMD sampling from both the folded and unfolded state; however, equilibrium folding was not achieved, and the folded state was only transiently sampled (6). The lone claim of reversible tetraloop folding from the unfolded state originated from our own laboratory using extensive temperature replica exchange simulations of the UUCG motif, resulting in native

state folds that averaged 4–6 Å rmsd from the experimental

structure (22). However, the folded states observed in those simulations did not recapitulate any of the signature loop-sta-bilizing noncanonical interactions characteristic of the UUCG motif. Furthermore, the single most populated structure ob-tained from clustering analysis of the ensemble was, in fact, a hyperstacked, interdigitated structure with zero base pairs, for which there is no experimental evidence. These hyperstacked structures were caused by the erroneously strong base-stacking

propensity exhibited by the AMBER forcefield (23–25), which

we corrected in the parameters used for this work. These results cumulatively indicate that gross inaccuracies in the underlying physical description of RNA interactions themselves and not merely extent or method of sampling are likely to blame for the poor performance of RNA folding simulations to date.

Multiple prior studies of tetraloops have simulated the equi-librium unfolding using REMD simulations initialized from the

native state (26–28). It is argued that, after equilibrated, the

multicanonical ensemble is thermodynamically equivalent to an REMD simulation initialized from the unfolded state, and therefore, the (un)folding process can still be accurately ob-served. This assumption would be rigorously correct if the simulation parameters unambiguously recapitulated the correct tetraloop native state as the global free energy minimum, an assumption that has never been proven. In fact, a recent study has explicitly tested the convergence of the UUCG and GAGA tetraloops simulated from both the folded and unfolded states in REMD simulations (6), and it concluded that convergence was

not achieved, even at ∼400 ns per replica. In this study, we

found that significant revisions to the underlying RNA

param-eters themselves were needed before equilibrium folding could be achieved, after which rapid folding was observed, even within the initial 100-ns REMD equilibration period. The ability of the

improved force field to accurately recapitulate the signature

BIOPHYSICS AND COMPUTATIONAL BIOLOGY SEE COM MENTARY

(5)

noncanonical interactions of hyperstable tetraloops is a key step to the eventual simulation of the folding and dynamics of larger, bi-ologically interesting RNAs in the near future.

Methods

The three hyperstable tetraloop sequences gcUUCGgc, gcGCAAgc, and cgCUUGcg were initialized in random unfolded states taken from an ex-cluded-volume ensemble with no bias to the folded conformation. These conformations were solvated in a cubic simulation cell of 5.5 nm on each side with explicit water and ions (5,300 TIP3P, 100 K+, and 93 Cl−) to approximate ionic conditions of 1 M excess KCl. This large box size is a departure from the minimal simulation boxes previously used for simulations of an 8-nt RNA, because fully extended states with end-to-end distances of∼4 nm are readily realized at high temperatures using the newly modified RNA parameters. These initial conformations were equilibrated in short constant pressure simulations at ambient conditions, from which snapshots with instantaneous pressures of∼1 bar were used as initial seeds for a constant volume REMD simulation. Each REMD simulation consisted of 64 replicas with a tempera-ture schedule chosen to maintain an∼20% exchange rate from 274 to 498 K, with temperature swaps every 2 ps. Simulations were propagated for 200–400 ns per replica depending on the observed equilibration rate, and a total of∼58μs of cumulative REMD sampling for all three tetraloops was combined.

Conformations were considered folded if the rmsd to the experimental structure was less than 4 Å rmsd, a cutoff chosen based on the apparent two-state behavior exhibited by rmsd vs. temperature shown in Fig. 1A. This definition of the folded state is conservative, in that it essentially requires that both stem CG base pairs be correctly formed, a criterion that excludes nonspecifically collapsed structures and the highly interdigitated, nonbase-paired structures previously observed in the work by Garcia and Paschek (22). Convergence was ascertained by the asymptotic behavior of the num-ber of folded replicas as a function of time (Fig. 1B), with thefirst 100 ns discarded as equilibration for UUCG and CUUG and thefirst 275 ns discarded for GCAA. Configurational clustering of the sampled ensemble was per-formed with the Daura algorithm (29), with a cutoff of 1.2 Å. The use of cluster centroids to characterize the folded state is highly insensitive to the specific clustering algorithm used, because folded states in low-temperature trajectories were long-lived and undergo only minorfluctuations around the

were used as reference structures only for rmsd calculations: UUCG, PDB ID code 1F7Y (30); GCAA, PDB ID code 1ZIH (31); 2J00 nucleotides 897–902 (32) and CUUG, PDB ID code 1RNG (11).

RNA parameters are based on a significant revision to the AMBER-99 force

field for nucleic acids based on detailed calibrations against a wide range of experimental data, which are described in detail inSupporting Information. It has been previously shown that the default AMBER nucleic acid parameters dramatically overestimate base-stacking propensities (23–25, 33), do not cor-rectly balance thesyn-anti glycosidic rotamers (34, 35), and violate the close contact distances calculated by dispersion-corrected quantum chemical calcu-lations (36, 37). These deficiencies in the parameters are all consequences of the inappropriate reuse of van der Waals parameters developed for protein model compounds (8), resulting in bloated nucleobases that do not accurately reflect the physicochemical properties of aqueous heterocycles.

To derive parameters sufficiently accurate for de novo RNA folding, the Lennard–Jones (L-J)σ-parameters for base–base interactions were uniformly reduced to match close contact distances observed from high-level disper-sion-corrected coupled cluster theory including singlet, doublet, and triplet excitations quantum calculations of gas-phase nucleobases (38). The L-J well-depth parameter-ewas then adjusted to achieve the correct stacking pro-pensity in solution, which was adjudicated by clustering propensities of aqueous nucleosides (39). The base–water interaction was also adjusted to reproduce the correct enthalpy and entropy decomposition for dinucleotide stacking. The refinement of parameters was accomplished by comparing stacking free energies calculated from potentials of mean force with circular dichroism measurements as a function of temperature (39). Finally, the glycosidic torsion,χ, was adjusted to optimize thesyn-anti populations and transition frequencies as measured by NMR (40) and ultrasonic absorption (41). It should be noted that torsion optimization can only be performed after optimization of the L-J parameters because of their intrinsic in-terdependency. Details of the parameters and optimization strategy are included inSI Methods.

ACKNOWLEDGMENTS.A.A.C. is supported by National Institutes of Health National Institute of General Medical Sciences Postdoctoral Fellowship F32GM091774, and A.E.G. is supported by National Science Foundation Grant MCB-1050966.

1. Woese CR, Winker S, Gutell RR (1990) Architecture of ribosomal RNA: Constraints on

the sequence of“tetra-loops.”Proc Natl Acad Sci USA87(21):8467–8471.

2. Tinoco I, Jr., Bustamante C (1999) How RNA folds.J Mol Biol293(2):271–281.

3. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization

pre-diction.Nucleic Acids Res31(13):3406–3415.

4. Bowman GR, et al. (2008) Structural insight into RNA hairpin folding intermediates. J Am Chem Soc130(30):9676–9678.

5. Das R (2011) Four small puzzles that Rosetta doesn’t solve.PLoS One6(5):e20044.

6. Kührová P, BanášP, Best RB,Sponer J, Otyepka M (2013) Computer folding of RNA

tetraloops? Are we there yet?J Chem Theory Comput9(4):2115–2125.

7. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science334(6055):517–520.

8. Cornell WD, et al. (1995) A second generation forcefield for the simulation of

pro-teins, nucleic acids, and organic molecules.J Am Chem Soc117(19):5179–5197.

9. Nozinovic S, Fürtig B, Jonker HR, Richter C, Schwalbe H (2010) High-resolution NMR

structure of an RNA model system: The 14-mer cUUCGg tetraloop hairpin RNA.

Nu-cleic Acids Res38(2):683–694.

10. Heus HA, Pardi A (1991) Structural features that give rise to the unusual stability of

RNA hairpins containing GNRA loops.Science253(5016):191–194.

11. Mohan S, Hsiao C, Bowman JC, Wartell R, Williams LD (2010) RNA tetraloop folding

reveals tension between backbone restraints and molecular interactions.J Am Chem

Soc132(36):12679–12689.

12. Jucker FM, Pardi A (1995) Solution structure of the CUUG hairpin loop: A novel RNA

tetraloop motif.Biochemistry34(44):14416–14427.

13. Pérez A, et al. (2007) Refinement of the AMBER forcefield for nucleic acids:

Improving the description of alpha/gamma conformers.Biophys J92(11):3817–

3829.

14. Menger M, Eckstein F, Porschke D (2000) Dynamics of the RNA hairpin GNRA

tetra-loop.Biochemistry39(15):4500–4507.

15. Johnson JE, Jr., Hoogstraten CG (2008) Extensive backbone dynamics in the GCAA

RNA tetraloop analyzed using 13C NMR spin relaxation and specific isotope labeling.

J Am Chem Soc130(49):16757–16769.

16. Zhao L, Xia T (2007) Direct revelation of multiple conformations in RNA by

femto-second dynamics.J Am Chem Soc129(14):4118–4119.

17. Ma H, et al. (2006) Exploring the energy landscape of a small RNA hairpin.J Am Chem

Soc128(5):1523–1530.

18. Sarkar K, Meister K, Sethi A, Gruebele M (2009) Fast folding of an RNA tetraloop on

a rugged energy landscape detected by a stacking-sensitive probe.Biophys J97(5):

1418–1427.

19. Proctor DJ, et al. (2004) Folding thermodynamics and kinetics of YNMG RNA hairpins:

Specific incorporation of 8-bromoguanosine leads to stabilization by enhancement of

the folding rate.Biochemistry43(44):14004–14014.

20. Blose JM, Proctor DJ, Veeraraghavan N, Misra VK, Bevilacqua PC (2009) Contri-bution of the closing base pair to exceptional stability in RNA tetraloops: Roles

for molecular mimicry and electrostatic factors.J Am Chem Soc131(24):8474–

8484.

21. Sorin EJ, Rhee YM, Pande VS (2005) Does water play a structural role in the folding of

small nucleic acids?Biophys J88(4):2516–2524.

22. Garcia AE, Paschek D (2008) Simulation of the pressure and temperature folding/

unfolding equilibrium of a small RNA hairpin.J Am Chem Soc130(3):815–817.

23. Murata K, Sugita Y, Okamoto Y (2004) Free energy calculations for DNA base stacking

by replica-exchange umbrella sampling.Chem Phys Lett385(1-2):1–7.

24. Norberg J, Nilsson L (1995) Potential of mean force calculations of the

stacking-unstacking process in single-stranded deoxyribodinucleoside monophosphates.

Bio-phys J69(6):2277–2285.

25. Norberg J, Nilsson L (1995) Temperature dependence of the stacking propensity of

adenylyl-3′,5′-adenosine.J Phys Chem99(35):13056–13058.

26. Villa A, Widjajakusuma E, Stock G (2008) Molecular dynamics simulation of the structure, dynamics, and thermostability of the RNA hairpins uCACGg and cUUCGg. J Phys Chem B112(1):134–142.

27. Zhang Y, Zhao X, Mu Y (2009) Conformational transition map of an RNA GCAA

tet-raloop explored by replica-exchange molecular dynamics simulation.J Chem Theory

Comput5(4):1146–1154.

28. Zuo G, Li W, Zhang J, Wang J, Wang W (2010) Folding of a small RNA hairpin based on

simulation with replica exchange molecular dynamics.J Phys Chem B114(17):5835–

5839.

29. Daura X, et al. (1999) Peptide folding: When simulation meets experiment.Angew

Chem Int Ed Engl38(1-2):236–240.

30. Ennifar E, et al. (2000) The crystal structure of UUCG tetraloop.J Mol Biol304(1):

35–42.

31. Jucker FM, Heus HA, Yip PF, Moors EH, Pardi A (1996) A network of heterogeneous

hydrogen bonds in GNRA tetraloops.J Mol Biol264(5):968–980.

32. Selmer M, et al. (2006) Structure of the 70S ribosome complexed with mRNA and

tRNA.Science313(5795):1935–1942.

33. BanášP, et al. (2012) Can we accurately describe the structure of adenine tracts in

B-DNA? Reference quantum-chemical computations reveal overstabilization of

(6)

34. Yildirim I, Stern HA, Kennedy SD, Tubbs JD, Turner DH (2010) Reparameterization of RNA chi Torsion Parameters for the AMBER Force Field and Comparison to NMR

Spectra for Cytidine and Uridine.J Chem Theory Comput6(5):1520–1531.

35. Zgarbová M, et al. (2011) Refinement of the Cornell et al. nucleic acids forcefield

based on reference quantum chemical calculations of glycosidic torsion profiles.

J Chem Theory Comput7(9):2886–2902.

36. Hobza P (2008) Stacking interactions.Phys Chem Chem Phys 10(19):2581–

2583.

37. Kolár M, Berka K, Jurecka P, Hobza P (2010) On the reliability of the AMBER force

field and its empirical dispersion contribution for the description of noncovalent

complexes.ChemPhysChem11(11):2399–2408.

38. Morgado CA, Jurecka P, Svozil D, Hobza P, Sponer J (2010) Reference MP2/CBS and CCSD(T) quantum-chemical calculations on stacked adenine dimers. Comparison with

DFT-D, MP2.5, SCS(MI)-MP2, M06-2X, CBS(SCS-D) and forcefield descriptions.Phys

Chem Chem Phys12(14):3522–3534.

39. Solie TN, Schellman JA (1968) The interaction of nucleosides in aqueous solution. J Mol Biol33(1):61–77.

40. Rosemeyer H, et al. (1990) Syn-anti conformational analysis of regular and modified

nucleosides by 1D 1H NOE difference spectroscopy: A simple graphical method based

on conformationally rigid molecules.J Org Chem55(22):5784–5790.

41. Rhodes LM, Schimmel PR (1971) Nanosecond relaxation processes in aqueous

mononucleoside solutions.Biochemistry10(24):4426–4433.

BIOPHYSICS AND COMPUTATIONAL BIOLOGY SEE COM MENTARY

References

Related documents

We identified authenticity as a priority area for research in the use of moulage in health profes- sions simulation [ 14 ], prior to exploring the many other possibilities

It improved the concept of relations between fuzzy sets, soft sets and hybrid models of these two types of sets because the relations have been constructed based on the Q-NSSs

By using six selected amplifiers, absolutely, completely, entirely, fully, totally, and utterly taken from Skobrankova (2006), this research aims to study and compare the

Singleton budget games with more than two resources still have pure Nash equilibria if (a) the order over the demands of the players is the same for each resource and the ratios

However, this study revealed that the combination of lavender aromatherapy and classical music therapy have a highest effect in lowering blood pressure in

Based on the results of research that have been conducted on the role of organizational development and organizational commitment in improving MSME business

For BYOD programs, these capabilities are sometimes used with very minimal device policy management, giving users more freedom to use devices as they wish while carving out

Because both Simply Accounting’s products and Sage 50 HR were standardized on MySQL, Sage has been able to integrate the two products to create a Simply Accounting by Sage