
Structure 17

Supplemental Data

EM-Fold: De Novo Folding of α-Helical Proteins Guided by Intermediate-Resolution Electron Microscopy Density Maps

Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler

SUPPLEMENTAL EXPERIMENTAL PROCEDURES

EM-Fold uses knowledge-based scores

Knowledge-based scores are used in both the EM-Fold assembly and refinement protocols. A full evaluation of the scores will be published shortly (N. Woetzel, R. Staritzbichler, R. Mueller, E. Durham, M. Karakas, J. Meiler, unpublished results). Only a brief description of the knowledge-based scores is given here.

Amino acid pair distance potential

This score was derived from the distribution of Cβ atom distances between pairs of amino acids in the Protein Data Bank. All pairs of amino acids separated by at least 10 residues in sequence and by up to 50 Å in space contributed to the statistics. The score checks whether pairs of amino acids in the computational model are placed at their preferred distance.
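As an illustration only, the following sketch shows one common way to turn such distance statistics into a score. The 10-residue and 50 Å limits come from the text; the inverse-Boltzmann form, the 2 Å bin width, the pseudocount, the reference distribution (all pair types pooled), and every function and variable name are assumptions and are not taken from EM-Fold.

```python
import math

BIN_WIDTH = 2.0          # Å, assumed bin width
MAX_DISTANCE = 50.0      # Å, upper distance limit from the text
MIN_SEQ_SEPARATION = 10  # residues, lower sequence separation from the text


def pair_distance_potential(observations, pair_type, pseudocount=1.0):
    """Return one energy value per distance bin for a given amino acid pair.

    observations: iterable of (aa1, aa2, sequence_separation, cb_distance).
    The energy is -ln(P_pair / P_all), an assumed inverse-Boltzmann form.
    """
    n_bins = int(MAX_DISTANCE / BIN_WIDTH)
    pair_counts = [pseudocount] * n_bins
    all_counts = [pseudocount] * n_bins
    wanted = tuple(sorted(pair_type))
    for aa1, aa2, separation, distance in observations:
        # Apply the two filters described in the text.
        if separation < MIN_SEQ_SEPARATION or distance > MAX_DISTANCE:
            continue
        b = min(int(distance / BIN_WIDTH), n_bins - 1)
        all_counts[b] += 1
        if tuple(sorted((aa1, aa2))) == wanted:
            pair_counts[b] += 1
    pair_total = sum(pair_counts)
    all_total = sum(all_counts)
    return [
        -math.log((p / pair_total) / (a / all_total))
        for p, a in zip(pair_counts, all_counts)
    ]
```

Under these assumptions, bins in which a pair type is observed more often than the pooled reference receive negative (favorable) values.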

Neighbor count score

The neighbor count score constitutes an amino acid environment potential in that it captures the preference of an amino acid to be either exposed to solvent or buried in the core of the protein. It was derived by counting, over structures in the Protein Data Bank, the number of neighbors within a radius of 11.4 Å around an amino acid of interest, with a sinusoidal down-weighting of neighbors more than 4 Å away.
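The weighted count described above can be sketched as follows. The 4 Å and 11.4 Å thresholds are from the text; the exact shape of the sinusoidal transition (a half-cosine here), the use of one coordinate per residue, and all names are assumptions.

```python
import math

LOWER = 4.0   # Å, neighbors closer than this count fully (from the text)
UPPER = 11.4  # Å, neighbors beyond this radius are ignored (from the text)


def neighbor_weight(distance):
    """Weight of one neighbor; assumed half-cosine between LOWER and UPPER."""
    if distance <= LOWER:
        return 1.0
    if distance >= UPPER:
        return 0.0
    fraction = (distance - LOWER) / (UPPER - LOWER)
    return 0.5 * (math.cos(math.pi * fraction) + 1.0)


def neighbor_count(coords, index):
    """Weighted number of neighbors around the residue at coords[index]."""
    center = coords[index]
    return sum(
        neighbor_weight(math.dist(center, other))
        for i, other in enumerate(coords)
        if i != index
    )
```

A buried residue accumulates a high weighted count and an exposed residue a low one; the knowledge-based score would then compare this count with the preference observed for that amino acid type.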

Loop score

The loop score is central to the EM-Fold algorithm because it is the only knowledge-based score used in both the assembly and the refinement protocol. It was derived by combining information on the sequence length of loops and the Euclidean distance that these loops bridge. Certain loop lengths are preferentially combined with specific bridged distances.

Secondary structure element packing score

EM-Fold uses exclusively the α-helix|α-helix packing component of a more general secondary structure element packing score. The score was derived by gathering statistics on the distance and twist angle between the helical axes of contacting helices in the Protein Data Bank. Two helices prefer to pack at an angle of -45° and a distance of 10 Å.

Radius of gyration score

The radius of gyration is proportional to the square root of the mean square distance of all Cβ atom coordinates to their mean position. It describes the compactness of a protein and is used to drive the assembly toward compact models.
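A minimal sketch of this quantity, assuming a proportionality constant of 1 and a plain list of (x, y, z) Cβ coordinates as input (the function name is illustrative):

```python
import math


def radius_of_gyration(coords):
    """Square root of the mean square distance of all points to their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_square = sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_square)
```

For two points at (1, 0, 0) and (-1, 0, 0), the centroid is the origin and the function returns 1.0. Smaller values indicate more compact models.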

Occupancy score favors models with filled density rods

The occupancy score is one of the three scores used in the assembly step of EM-Fold (as described in Experimental Procedures). It was introduced to drive EM-Fold to build models that have as many helices placed into the identified density rods as possible, with the fit of the helices into the density rods optimized. Supplemental Figure 1 summarizes the occupancy score. The length of the density rod (A) and the length of the helix placed into it (B) are measured. Depending on the deviation between these lengths, the occupancy score takes on values between 0 (for deviations larger than 3 residues and for empty density rods) and -1 (for deviations of up to one residue). This rewards models with filled density rods.
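The piecewise behavior described above can be sketched as follows. The end points (-1 for deviations of up to one residue, 0 for deviations larger than 3 residues or for an empty rod) are from the text; the evenly stepped partial reward in between, the use of `None` for an empty rod, and the function and parameter names are assumptions.

```python
def occupancy_score(rod_length, helix_length, full_tol=1, max_tol=3):
    """Occupancy reward for one density rod (lengths in α-helical residues).

    helix_length is None when no helix is placed into the rod.
    """
    if helix_length is None:
        return 0.0  # empty density rod: no reward
    deviation = abs(rod_length - helix_length)
    if deviation <= full_tol:
        return -1.0  # maximum reward
    if deviation > max_tol:
        return 0.0  # outside the length tolerance: no reward
    # Assumed evenly stepped partial reward for deviations in between.
    steps = max_tol - full_tol + 1
    return -(max_tol + 1 - deviation) / steps


def model_occupancy(rod_lengths, helix_lengths):
    """Sum of the per-rod rewards for one model."""
    return sum(occupancy_score(r, h) for r, h in zip(rod_lengths, helix_lengths))
```

With these assumptions, a model that fills more rods with helices of matching length receives a lower (better) total.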


Connectivity score favors models that connect density rods with strong density between them

The connectivity score is one of the three scores used in the assembly step of EM-Fold (as described in Experimental Procedures). It measures the strength of the connection between two density rods. Supplemental Figure 2 exemplifies that some pairs of density rods are connected by stronger density than others. If two helices are placed into density rods with a strong connection between them, the connectivity score rewards this model.

EM-Fold refinement protocol improves α-helical orientations

It was pointed out in the Results section that the refinement protocol generally improves the ranking and the quality of the true model. This is mainly due to the identification of native α-helical orientations. Supplemental Figure 3 illustrates the improvement of α-helix orientation during the refinement step with three examples.

ROSETTA iterative high resolution refinement achieves enrichment in true topology models

All seven proteins for which EM-Fold was able to find the correct topology (1IE9, 1N83, 1OUV, 1QKM, 1TBF, 1Z1L and 2AX6) were subjected to an iterative ROSETTA refinement protocol (see Experimental Procedures). Supplemental Figure 4 shows, for all seven proteins, the total full-atom ROSETTA energy of all models and of the native structure plotted versus the RMSD of the model.

Refined Ad35F map shows better FSC curve

The published Ad35F density map (Saban et al., 2006) was refined further by including more particle images and refining the magnification value of each particle with Frealign (Grigorieff, 2007) until large aromatic side chains could be observed. An FSC plot of the refined Ad35F density map is shown in Figure S5. The additional refinement made a relatively small change in the FSC 0.5 resolution (6.9 to 6.8 Å). However, the extra refinement made more significant improvements in the FSC 0.3 (6.1 to 5.8 Å) and FSC 0.143 (5.4 to 5.2 Å) resolutions.

Trp side chain density can be clearly identified in hexon

The refined Ad35F density map exhibits clear density bumps corresponding to large aromatic side chains. Comparison of the density map with the docked penton base and hexon crystal structures indicates that the Trp side chains have the most prominent bumps in the density map. Bumps for other aromatic side chains are observed but with lower isosurface values. The density and docked hexon crystal structure in the vicinity of the two Trp side chains in hexon helices are shown in Figure S6.

Computation

The EM-Fold and ROSETTA calculations were performed on the ACCRE Linux cluster at


Figure S1. Schematic Representation of the Occupancy Score Used in the EM-Fold Assembly and Refinement Steps

(A) The length of a density rod (in α-helical residues) is determined.

(B) The length of the α-helix that is placed into the density rod (in α-helical residues) is determined.

(C) Plot of occupancy score vs. the absolute difference between the length of the α-helix and the length of the density rod into which it is placed.

If the length of the α-helix and density rod do not differ by more than one residue, the maximum reward (-1.0) is given. If they differ by more than the assigned length tolerance (3 residues), or if the density rod is left empty, no reward is given. For any length deviation in between these two extremes a partial reward is given.


Figure S2. Conceptual Scheme of the Connectivity Score

Panels A and B show simulated density for protein 1OUV, while panels C through F show experimental cryoEM density assigned to adenovirus protein IIIa (Saban et al., 2006). Each density region is shown at a high isosurface value (red) and at a lower isosurface value (grey in panels B, D, and F), revealing additional weak density. Some density rods are connected by density (black arrows). For 1OUV these connections represent true peptide links between helices. The connectivity score captures the strength of these density connections between ends of density rods that are closer than 10 Å in space.


Figure S3. Three Examples of Improved α-Helical Orientations after the EM-Fold Refinement Protocol

Black: model before refinement step. Red: native α-helix from PDB. Blue: model after refinement step.

(A, C, and E) The α-helix in the model before refinement is turned by approximately 180° around the α-helical axis with respect to the native structure.

(B, D, and F) The refinement was able to correctly turn the α-helix with respect to the native structure. Generally, only a slight shift along the α-helical axis remains between the model after refinement and the native α-helix.


Figure S4.

ROSETTA full-atom score vs. RMSD to native (over α-helical residues) after 8 rounds of iterative side chain repacking and backbone relaxation for 1IE9 (A), 1N83 (B), 1OUV (C), 1QKM (D), 1TBF (E), 1Z1L (F) and 2AX6 (G). The red data points represent the final full-atom models that went through the whole assembly, loop building, and refinement protocol. The black data points show the native structure after 8 rounds of repacking and relaxation for comparison. For all seven proteins, the relaxed native structure scores better than any of the full-atom models built. Within the best-scoring 10% of models, the correct topology is enriched in six out of seven cases: 7.6 (1Z1L), 4.0 (1IE9), 3.8 (1OUV), 2.6 (1QKM), 1.6 (1TBF) and 1.2 (1N83).
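The caption does not define how the enrichment values are computed. One common definition, used here purely as an assumption, is the fraction of correct-topology models among the best-scoring 10% divided by the fraction of correct-topology models among all models. All names in this sketch are illustrative.

```python
def enrichment(scores, is_correct, top_fraction=0.1):
    """Enrichment of correct-topology models among the best-scoring models.

    scores: one energy per model, lower is better.
    is_correct: one boolean per model, True for the correct topology.
    """
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    # Indices of the models sorted from best (lowest) to worst score.
    order = sorted(range(n), key=lambda i: scores[i])
    top_correct = sum(is_correct[i] for i in order[:n_top])
    all_correct = sum(is_correct)
    if all_correct == 0:
        return 0.0
    return (top_correct / n_top) / (all_correct / n)
```

Under this definition, a value of 1 means the best-scoring models contain correct topologies no more often than the full set, and values above 1 indicate enrichment.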


Figure S5.

The final Ad35F structure is based on 3040 particle images and has a resolution of 6.8 Å at the FSC 0.5 threshold (and 5.8 Å at the FSC 0.3, and 5.2 Å at the FSC 0.143 thresholds).

Figure S6.

Density bumps for the only two Trp side chains in helices 3 (A, B) and 6 (C, D) of hexon. Panels A and C show the hexon crystal structure for residues 462-480 (helix 3, panel A) and residues 759-773 (helix 6, panel C) with the Trp side chain coordinates. Panels B and D show the crystal structure for the same helices overlaid with the refined Ad35F density map. For both Trp side chains clear density bumps are visible. The black arrows represent the position of the Trp side chain.

SUPPLEMENTAL REFERENCES

Grigorieff, N. (2007). FREALIGN: High-resolution refinement of single particle structures. J Struct Biol 157, 117-125.

Saban, S.D., Silvestry, M., Nemerow, G.R., and Stewart, P.L. (2006). Visualization of alpha-helices in a 6-angstrom resolution cryoelectron microscopy structure of adenovirus allows refinement of capsid protein assignments. J Virol 80, 12049-12059.


Table S1. Overview of the ten proteins used in the assembly benchmark

PDB-ID  Description                                    residues  α-helices (a)  helical residues (b)  resolution [Å]  contact order (c)
1IE9    Vitamin D3 receptor                            259       7              145                   1.4             46
1N83    Nuclear receptor ROR-alpha                     270       7              161                   1.6             44
1OUV    Conserved hypothetical secreted protein        273       14             207                   2.0             15
1QKM    Estrogen receptor beta                         255       8              168                   1.8             43
1TBF    cGMP-specific 3',5'-cyclic phosphodiesterase   347       13             220                   1.3             40
1V9M    V-type ATP synthase subunit C                  323       8              158                   1.9             46
1XQO    8-oxoguanine DNA glycosylase                   256       10             151                   1.0             43
1Z1L    cGMP-dependent 3',5'-cyclic phosphodiesterase  345       12             201                   1.7             41
2AX6    Androgen receptor                              256       6              145                   1.5             40
2CWC    ADP-ribosylglycohydrolase                      303       9              152                   1.7             59
1GZM    Bovine Rhodopsin                               349       8              197                   2.7             82

a All α-helices with at least 12 residues are shown

b The total helical content per protein varies from 60 to 68%

c Contact order is defined as the average sequence separation of all residues that are in contact within the protein. The higher the contact order, the more complex the fold is. Values above 40 are considered to reflect very complex folds.
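The definition in footnote c can be sketched as follows. The footnote does not say when two residues count as being in contact, so the 8 Å cutoff, the use of one coordinate per residue, and all names below are assumptions.

```python
import math


def contact_order(coords, cutoff=8.0):
    """Average sequence separation over all residue pairs that are in contact.

    coords: one (x, y, z) point per residue, in sequence order.
    cutoff: assumed contact distance in Å.
    """
    separations = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                separations.append(j - i)
    if not separations:
        return 0.0
    return sum(separations) / len(separations)
```

A fully extended chain has only contacts between sequence neighbors and therefore a low value, while a chain that folds back on itself brings residues far apart in sequence into contact and raises the average.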


Table S2. Evaluation of different secondary structure prediction pools

            Pool A (a)                  Pool B (b)                  Pool C (c)
protein     deviation (d)  number       deviation (d)  number       deviation (d)  number
            (residues)     helices      (residues)     helices      (residues)     helices
1IE9        1.3            35           0.7            36           0.4            143
1N83        1.1            38           0.6            40           0.1            156
1OUV        0.9            67           0.4            67           0.1            268
1QKM        1.1            39           0.9            39           0.3            156
1TBF        1.7            52           0.3            52           0.1            206
1V9M        1.6            39           1.6            41           0.9            160
1XQO        2.1            33           1.1            34           0.9            135
1Z1L        1.6            53           0.8            53           0.4            212
2AX6        1.8            35           1.2            35           0.8            140
2CWC        1.8            56           1.1            57           0.7            225
average     1.5            45           0.8            45           0.4            180

prediction  number helices              number helices              number helices
shorter     255                         130                         586
longer      56                          153                         308
equal       43                          71                          168

a Pool A contains predictions from jufo, psipred, sam; a consensus of the three; and a consensus where long helices are broken into smaller pieces.

b Pool B replaces all of the helices in pool A with copies that have one residue added to both the N-terminal and C-terminal ends of each helix.

c Pool C contains all of the helices in pool A and additional longer copies with one residue added to the N-terminal end, one residue added to the C-terminal end, and one residue added to both the N-terminal and C-terminal ends of each helix.
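The construction of pools B and C from pool A, as described in footnotes b and c, can be sketched as follows. Representing a helix as a (first residue, last residue) tuple, clamping to the sequence ends, and all function names are assumptions. The sketch does not attempt to reproduce the exact helix counts in Table S2.

```python
def extend(helix, n_term=0, c_term=0, seq_length=None):
    """Return a copy of a helix lengthened at its N- and/or C-terminal end."""
    first, last = helix
    new_first = max(1, first - n_term)
    new_last = last + c_term
    if seq_length is not None:
        new_last = min(seq_length, new_last)
    return (new_first, new_last)


def build_pool_b(pool_a, seq_length=None):
    """Pool B: every helix of pool A extended by one residue at both ends."""
    return [extend(h, 1, 1, seq_length) for h in pool_a]


def build_pool_c(pool_a, seq_length=None):
    """Pool C: each helix of pool A plus its three longer copies."""
    pool_c = []
    for h in pool_a:
        pool_c.append(h)                            # original helix
        pool_c.append(extend(h, 1, 0, seq_length))  # +1 at the N-terminal end
        pool_c.append(extend(h, 0, 1, seq_length))  # +1 at the C-terminal end
        pool_c.append(extend(h, 1, 1, seq_length))  # +1 at both ends
    return pool_c
```

For example, a pool A of `[(10, 25), (40, 60)]` yields a pool B of `[(9, 26), (39, 61)]` and a pool C with eight entries.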
