Computer Science and Engineering Faculty Publications. Computer Science & Engineering


Wright State University

CORE Scholar


2003

Protein Structure, Function, and Folding

Dan E. Krane

Wright State University - Main Campus

Michael L. Raymer

Wright State University - Main Campus


Part of the Computer Sciences Commons, and the Engineering Commons

Repository Citation

Krane, D. E., & Raymer, M. L. (2003). Protein Structure, Function, and Folding. https://corescholar.libraries.wright.edu/cse/389


Intro to Bioinformatics

Protein Folding 2

Determining Protein Structure

• There are on the order of 100,000 distinct proteins in the human proteome.

• 3D structures have been determined for 14,000 proteins, from all organisms

• Includes duplicates with different ligands bound, etc.

• Coordinates are determined by X-ray crystallography


X-Ray Crystallography

[Figure: protein crystal, ~0.5 mm across]

• The crystal is a mosaic of millions of copies of the protein.


X-Ray diffraction

• Image is averaged over:

• Space (many copies)

• Time (of the diffraction


The Protein Data Bank

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
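Each ATOM record lists, in order, the atom serial number, atom name, residue name, chain, residue number, the x, y, z coordinates (in Ångströms), occupancy, and temperature factor. The Python sketch below reads such records and measures the distance between two atoms. It splits on whitespace for brevity, which works for the records shown here but is an assumption: real PDB files are fixed-column, and adjacent fields can run together. The names `Atom`, `parse_atom`, and `distance` are illustrative, and the two trailing fields of each record are ignored.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (assumption: no fused columns)."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the coordinates."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


n = parse_atom("ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09 3APR 213")
ca = parse_atom("ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214")
print(f"N-CA distance in Ala 1: {distance(n, ca):.2f}")
```

The distance between the backbone N and CA atoms of the first residue comes out close to 1.5 Å, the length expected for a covalent bond, which is a quick sanity check that the coordinate columns were read correctly.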


A Peek at Protein Function

• Serine proteases – cleave other proteins


Three Serine Proteases

• Chymotrypsin – Cleaves the peptide bond on the carboxyl side of aromatic (ring) residues: Trp, Phe, Tyr; and large hydrophobic residues: Met.

• Trypsin – Cleaves after Lys (K) or Arg (R)

• Positive charge

• Elastase – Cleaves after small residues: Gly,
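The cleavage rules for chymotrypsin and trypsin can be turned into a small in-silico digest. This is a minimal sketch that applies only the "cleaves after residue X" rule as stated; it ignores real-world exceptions and leaves out elastase, whose residue list is incomplete here. `CLEAVES_AFTER` and `digest` are illustrative names.

```python
# Residues after which each protease cuts, as listed above.
CLEAVES_AFTER = {
    "chymotrypsin": set("WFYM"),  # Trp, Phe, Tyr, Met
    "trypsin": set("KR"),         # Lys, Arg
}


def digest(sequence, protease):
    """Split a one-letter sequence after every residue the protease recognizes."""
    targets = CLEAVES_AFTER[protease]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        # No cut after the final residue: there is no peptide bond there.
        if residue in targets and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


print(digest("AAVIKYGCAL", "trypsin"))       # ['AAVIK', 'YGCAL']
print(digest("AAVIKYGCAL", "chymotrypsin"))  # ['AAVIKY', 'GCAL']
```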


The Protein Folding Problem

• Central question of molecular biology:

“Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”

• Input: AAVIKYGCAL…

Output: φ₁ψ₁, φ₂ψ₂…


Protein Folding – Biological perspective

• “Central dogma”: Sequence specifies structure

• Denature – to “unfold” a protein back to random coil configuration

• β-mercaptoethanol – breaks disulfide bonds

• Urea or guanidine hydrochloride – denaturant

• Also heat or pH

• Anfinsen’s experiments

• Denatured ribonuclease

• Spontaneously regained enzymatic activity


Folding intermediates

• Levinthal’s paradox – Consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations.

• If it takes $10^{-13}$ s to convert from one structure to another, exhaustive search would take $1.6 \times 10^{27}$ years!

• Folding must proceed by progressive stabilization of intermediates
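The arithmetic behind Levinthal's paradox is easy to reproduce. The sketch below uses the figures stated above (3 positions per residue, 100 residues, 10^-13 s per conversion); the only added assumption is a year of 365.25 days.

```python
STATES_PER_RESIDUE = 3
RESIDUES = 100
SECONDS_PER_STEP = 1e-13
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

conformations = STATES_PER_RESIDUE ** RESIDUES
years = conformations * SECONDS_PER_STEP / SECONDS_PER_YEAR

print(f"conformations:     {conformations:.1e}")
print(f"exhaustive search: {years:.1e} years")
```

Both printed values agree with the figures quoted above to the precision given there.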


Forces driving protein folding

• It is believed that hydrophobic collapse is a key driving force for protein folding

• Hydrophobic core

• Polar surface interacting with solvent

• Minimum volume (no cavities)

• Disulfide bond formation stabilizes

• Hydrogen bonds


Folding help

• Proteins are, in fact, only marginally stable

• Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form

• Many proteins help in folding

• Protein disulfide isomerase – catalyzes shuffling of disulfide bonds


The Hydrophobic Core

• Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.

• The mutation E6→V in the β chain places a hydrophobic Val on the surface of hemoglobin

• The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently

• Sickle cell anemia was the first identified


Sickle Cell Anemia


Computational Problems in Protein Folding

• Two key questions:

• Evaluation – how can we tell a correctly folded protein from an incorrectly folded protein?

• H-bonds, electrostatics, hydrophobic effect, etc.

• Derive a function, see how well it does on “real” proteins

• Optimization – once we get an evaluation function, can we optimize it?

• Simulated annealing/Monte Carlo

• Evolutionary computation (EC)

• Heuristics


Fold Optimization

• Simple lattice models (HP-models)

• Two types of residues: hydrophobic and polar

• 2-D or 3-D lattice

• The only force is hydrophobic collapse
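To make fold optimization concrete, here is a minimal simulated-annealing sketch on a 2-D HP lattice. Everything about it is an illustrative assumption rather than a prescribed method: the fold is a string of absolute moves (U, D, L, R), the energy is minus the number of non-bonded H-H contacts plus a penalty of 10 per overlapping residue, a move is a single random letter change, and the cooling schedule is geometric.

```python
import math
import random

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def energy(seq, moves, bump_penalty=10):
    """Toy evaluation: -1 per non-bonded H-H contact, +penalty per overlap."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        coords.append((coords[-1][0] + dx, coords[-1][1] + dy))
    bumps = len(coords) - len(set(coords))
    contacts = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbors
            if seq[i] == seq[j] == "H":
                dist = (abs(coords[i][0] - coords[j][0])
                        + abs(coords[i][1] - coords[j][1]))
                if dist == 1:
                    contacts += 1
    return bump_penalty * bumps - contacts


def anneal(seq, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    rng = random.Random(seed)
    moves = ["R"] * (len(seq) - 1)  # start from the fully extended chain
    e = energy(seq, moves)
    best, best_e = moves[:], e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / steps)  # geometric cooling
        cand = moves[:]
        cand[rng.randrange(len(cand))] = rng.choice("UDLR")
        cand_e = energy(seq, cand)
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse folds, with probability shrinking as the temperature drops.
        if cand_e <= e or rng.random() < math.exp((e - cand_e) / t):
            moves, e = cand, cand_e
            if e < best_e:
                best, best_e = moves[:], e
    return "".join(best), best_e


fold, fold_energy = anneal("HPPHPPH")
print(fold, fold_energy)
```

Because the search is stochastic, the fold it returns depends on the seed and the schedule; the only guarantee is that the best energy found is never worse than that of the extended starting chain.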


• H/P model scoring: count noncovalent hydrophobic interactions.

• Sometimes:

• Penalize for buried polar or surface hydrophobic residues
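The basic H/P score can be sketched as follows, assuming a 2-D square lattice and a fold written as absolute moves (U, D, L, R) starting at the origin. Only the contact count is implemented; the optional penalties for buried polar or surface hydrophobic residues are left out. `decode` and `hp_score` are illustrative names.

```python
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode(moves):
    """Turn a move string into lattice coordinates, starting at (0, 0)."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_score(sequence, moves):
    """Count H-H pairs that are lattice neighbors but not chain neighbors."""
    coords = decode(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly one move per bond")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation has a bump")
    site = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighboring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = site.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


print(hp_score("HPPH", "RDL"))        # 1
print(hp_score("HPPHPPH", "RULLUR"))  # 2
```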


What can we do with lattice models?

• For smaller polypeptides, exhaustive search can be used

• Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process

• For larger chains, other optimization and search methods must be used

• Greedy, branch and bound
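For a short chain, exhaustive search is only a few lines. The sketch below assumes a 2-D square lattice, absolute moves, and a score equal to the number of non-bonded H-H contacts; it fixes the first move to R so that rotated copies of the same fold are not enumerated. The number of walks still grows as 4^(n-2), which is why this approach is limited to small polypeptides. Function names are illustrative.

```python
from itertools import product

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def contacts(sequence, moves):
    """Return the H-H contact count, or None if the walk revisits a site."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        nxt = (coords[-1][0] + dx, coords[-1][1] + dy)
        if nxt in coords:
            return None  # bump: not a valid conformation
        coords.append(nxt)
    total = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbors
            if sequence[i] == sequence[j] == "H":
                dist = (abs(coords[i][0] - coords[j][0])
                        + abs(coords[i][1] - coords[j][1]))
                if dist == 1:
                    total += 1
    return total


def best_fold(sequence):
    """Enumerate every self-avoiding walk and keep the highest-scoring one."""
    best_moves, best_score = None, -1
    for tail in product("UDLR", repeat=len(sequence) - 2):
        moves = "R" + "".join(tail)
        score = contacts(sequence, moves)
        if score is not None and score > best_score:
            best_moves, best_score = moves, score
    return best_moves, best_score


moves, score = best_fold("HPPHPPH")
print(moves, score)
```

For HPPHPPH the best score is 2: the first and last H sit an even number of steps apart, so they can never be lattice neighbors, which leaves at most two contacts.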


Learning from Lattice Models

• The “hydrophobic zipper” effect:


• Absolute directions

• UURRDLDRRU

• Relative directions

• LFRFRRLLFFL

• Advantage: immediate reversals (UD or RL in absolute directions) cannot be encoded

• Only three directions: L, R, F

• What about bumps? LFRRR

• Bad score
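The two encodings are easy to convert between. The sketch below assumes that the chain initially faces "up" and that each relative move first turns (L or R, or not at all for F) and then steps one lattice unit; other conventions are possible. It converts a relative string to absolute moves and reports the first move that lands on an occupied site.

```python
HEADINGS = "URDL"  # clockwise order, so +1 is a right turn and -1 a left turn
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def relative_to_absolute(relative, start_heading="U"):
    """Convert L/F/R turns into U/D/L/R moves (assumed convention: turn, then step)."""
    h = HEADINGS.index(start_heading)
    absolute = []
    for turn in relative:
        if turn == "L":
            h = (h - 1) % 4
        elif turn == "R":
            h = (h + 1) % 4
        elif turn != "F":
            raise ValueError(f"unknown relative move: {turn}")
        absolute.append(HEADINGS[h])
    return "".join(absolute)


def first_bump(absolute):
    """Index of the first move that lands on an occupied site, or None."""
    pos = (0, 0)
    seen = {pos}
    for k, m in enumerate(absolute):
        dx, dy = STEP[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return k
        seen.add(pos)
    return None


abs_moves = relative_to_absolute("LFRRR")
print(abs_moves, first_bump(abs_moves))  # LLURD 4
```

Under this convention LFRRR becomes LLURD, and the last move returns to the site occupied after the first move: the relative encoding rules out immediate reversals, but not bumps.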


Preference-order representation

• Each position has two “preferences”

• If it can’t have either of the two, it will take the “least favorite” path if possible

• Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}

• Can still cause bumps:
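One way to read the preference-order rule is sketched below. The assumptions are mine: a 2-D square lattice, relative directions L, F, R with the chain initially facing up, and a decoder that tries the first preference, then the second, then the remaining direction, stopping when all three neighboring sites are occupied. `decode_preferences` is an illustrative name.

```python
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # U, R, D, L (clockwise)
TURN = {"L": -1, "F": 0, "R": 1}


def decode_preferences(prefs):
    """Return (relative moves taken, index where the walk got stuck or None)."""
    pos, h = (0, 0), 0
    occupied = {pos}
    taken = []
    for k, (first, second) in enumerate(prefs):
        # The direction named in neither preference is the "least favorite".
        least = next(d for d in "LFR" if d not in (first, second))
        for choice in (first, second, least):
            nh = (h + TURN[choice]) % 4
            nxt = (pos[0] + HEADINGS[nh][0], pos[1] + HEADINGS[nh][1])
            if nxt not in occupied:
                pos, h = nxt, nh
                occupied.add(pos)
                taken.append(choice)
                break
        else:
            return "".join(taken), k  # all three directions blocked
    return "".join(taken), None


example = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
print(decode_preferences(example))  # ('LFRFRRLF', None)

spiral = ["RF", "FL", "LF", "FL", "LF", "FL", "LF", "LF", "LR"]
print(decode_preferences(spiral))   # ('RFLFLFLL', 8)
```

In the first example the seventh position falls back to its least favorite direction and the eighth to its second preference, so the walk stays bump-free. The second input spirals inward until every neighboring site is taken, so no choice can avoid a bump at the final position.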


“Decoding” the representation

• The optimizer works on the representation, but to score, we have to “decode” into a structure that lets us check for bumps and score.

• Example: How many bumps in: URDDLLDRURU?

• We can do it on graph paper

• Start at 0,0

• Fill in the graph
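The graph-paper procedure translates directly into code. This Python sketch stores the grid as a mapping from (x, y) to the number of residues placed on that square, which avoids choosing array bounds in advance; a bump is counted each time a residue lands on a square that is already occupied. The grid layout and function names are illustrative choices.

```python
from collections import Counter

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fill_grid(moves):
    """Map each visited (x, y) square to the number of residues placed on it."""
    x, y = 0, 0  # start at 0,0
    grid = Counter({(x, y): 1})
    for m in moves:
        dx, dy = STEP[m]
        x, y = x + dx, y + dy
        grid[(x, y)] += 1
    return grid


def count_bumps(moves):
    """Each residue beyond the first on a square counts as one bump."""
    return sum(n - 1 for n in fill_grid(moves).values())


print(count_bumps("URDDLLDRURU"))  # 3
```

For URDDLLDRURU the last three moves each land on a square visited earlier, giving three bumps.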


A two-dimensional array in Perl


Setting up the grid

foreach $move (@moves) {


More realistic models

• Higher resolution lattices (45° lattice, etc.)

• Off-lattice models

• Local moves

• Optimization/search methods and φ/ψ representations

• Greedy search

• Branch and bound


The Other Half of the Picture

• Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).

• Theoretical force field:

• $\Delta G = \Delta G_{\text{van der Waals}} + \Delta G_{\text{H-bonds}} + \Delta G_{\text{solvent}} + \Delta G_{\text{Coulomb}}$

• Empirical force fields


Threading: Fold recognition

• Given:

• Sequence: IVACIVSTEYDVMKAAR…

• A database of molecular coordinates

• Map the sequence onto each fold

• Evaluate

• Objective 1: improve scoring function


Secondary Structure Prediction

• Easier than folding

• Current algorithms can predict secondary structure with 70-80% accuracy

• Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.

• Based on frequencies of occurrence of residues in helices and sheets

• PHD – Neural network based

• Uses a multiple sequence alignment


Chou-Fasman Parameters

Name Abbrv P(a) P(b) P(turn) f(i) f(i+1) f(i+2) f(i+3)


Chou-Fasman Algorithm

• Identify α-helices

• 4 out of 6 contiguous amino acids that have P(a) > 100

• Extend the region until 4 amino acids with P(a) < 100 found

• Compute ΣP(a) and ΣP(b); If the region is >5 residues and ΣP(a) > ΣP(b) identify as a helix

• Repeat for β-sheets [use P(b)]

• If an α and a β region overlap, the overlapping
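The helix steps can be sketched in Python as follows. The propensity values are an assumption: they do not appear in this text and are taken from the commonly reproduced Chou-Fasman table, restricted to the residues in the test sequence. The sketch also extends the nucleus only to the right and stops just before the first run of four residues with P(a) < 100, which is one possible reading of the extension rule.

```python
# Propensities (assumed, from the widely circulated Chou-Fasman table),
# listed only for the residues that occur in the sequence used below.
P_A = {"A": 142, "C": 70, "D": 101, "E": 151, "G": 57, "H": 100, "K": 114,
       "L": 121, "N": 67, "P": 57, "R": 98, "T": 83, "V": 106}
P_B = {"A": 83, "C": 119, "D": 54, "E": 37, "G": 75, "H": 87, "K": 74,
       "L": 130, "N": 89, "P": 55, "R": 93, "T": 119, "V": 170}


def find_nucleus(seq, p, width=6, needed=4):
    """First window of `width` residues with at least `needed` values > 100."""
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if sum(p[r] > 100 for r in window) >= needed:
            return start, start + width
    return None


def extend_right(seq, end, p, run=4):
    """Grow the region until the next `run` residues all have p < 100."""
    while end < len(seq):
        window = seq[end:end + run]
        if len(window) == run and all(p[r] < 100 for r in window):
            break
        end += 1
    return end


def predict_helix(seq):
    nucleus = find_nucleus(seq, P_A)
    if nucleus is None:
        return None
    start, end = nucleus
    end = extend_right(seq, end, P_A)
    region = seq[start:end]
    sum_a = sum(P_A[r] for r in region)
    sum_b = sum(P_B[r] for r in region)
    if len(region) > 5 and sum_a > sum_b:
        return region, sum_a, sum_b
    return None


print(predict_helix("CAENKLDHVRGPT"))  # ('CAENKLDHV', 972, 843)
```

The same functions could be reused for β-sheets by searching with `P_B` instead of `P_A` and reversing the final comparison.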


Chou-Fasman, cont’d

• Identify hairpin turns:

• P(t) = f(i) of the residue × f(i+1) of the next residue × f(i+2) of the following residue × f(i+3) of the residue at position (i+3)

• Predict a hairpin turn starting at positions where:

• P(t) > 0.000075


Chou-Fasman Example

• CAENKLDHVRGPTCILFMTWYNDGP

• CAENKL – Potential helix (all but C and N have P(a) > 100)

• Residues with P(a) < 100: RNCGPSTY

• Extend: When we reach RGPT, we must stop

• CAENKLDHV: ΣP(a) = 972, ΣP(b) = 843

• Declare alpha helix

• Identifying a hairpin turn

• VRGP: P(t) = 0.000085

• Average P(turn) = 113.25
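The hairpin-turn numbers for VRGP can be reproduced with a few lines of Python. The positional frequencies and P(turn) values below are an assumption: they are not given in this text and are taken from the commonly reproduced Chou-Fasman table, and only the four entries needed for VRGP are included. Only the P(t) threshold is tested.

```python
# Positional turn frequencies (assumed, from the widely circulated
# Chou-Fasman table): one lookup table per position in the tetrapeptide.
F = [
    {"V": 0.062},  # f(i)
    {"R": 0.106},  # f(i+1)
    {"G": 0.190},  # f(i+2)
    {"P": 0.068},  # f(i+3)
]
P_TURN = {"V": 50, "R": 95, "G": 156, "P": 152}  # assumed table values
THRESHOLD = 0.000075


def turn_probability(tetrapeptide):
    """P(t): product of the four positional frequencies."""
    p = 1.0
    for table, residue in zip(F, tetrapeptide):
        p *= table[residue]
    return p


def average_p_turn(tetrapeptide):
    return sum(P_TURN[r] for r in tetrapeptide) / len(tetrapeptide)


p_t = turn_probability("VRGP")
print(f"P(t) = {p_t:.6f}, above threshold: {p_t > THRESHOLD}")
print(f"average P(turn) = {average_p_turn('VRGP')}")
```

Both results match the worked example: P(t) rounds to 0.000085, which exceeds 0.000075, and the average P(turn) is 113.25.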