Chapter 9. Visualizing Protein Structures and Computing Structural Properties
9.2 The Chemistry of Proteins
9.2.1 From 1D to 3D
How does the chemistry of a protein relate to its 1D sequence? In Chapter 8, we discussed techniques for detecting characteristic conserved patterns, called motifs, in families of protein sequences. We can find these sequence patterns in 1D data because although the 3D structure of a protein is complex, it is somehow determined by the invariant sequence of amino acids that makes up the protein. Motifs that are conserved in sequence often are related to important structural or functional features of a protein family, and those features often can be understood by their roles in the protein structure.
When amino acids come together in sequence to form a polymer, they do so by forming a peptide bond between the acidic carboxyl group of one amino acid and the basic amino group of the next (Figure 9-1). This results in a long chain of amino acids that has a repeating backbone structure.
Figure 9-1. Peptide bond, peptide chain (chemical notation)
The variable group of each amino acid protrudes from the repeating backbone and is referred to in the protein structure business as a sidechain (Figure 9-2). Each of the 20 amino acid sidechains is chemically different from the others in some respect.
The sidechains can be classified in many ways. Some are relatively large, while others are tiny or in one case nonexistent. Some have a positive or negative charge. Some are oily, or hydrophobic (water-fearing), meaning that it's energetically unfavorable for them to be solvated in water. Others are hydrophilic (water-loving), and they solvate easily in water. Some have bulky ringlike structures, while others are straight carbon chains. Some are acids, others are bases.
Amino acids are conserved through evolution at specific locations in a protein sequence because they are needed there, whether to stabilize the protein structure, to form a specific binding site, or to catalyze a reaction. You can detect that particular amino acids in a protein are conserved by looking at sequence data, but to develop a hypothesis about why they are conserved, it's helpful to examine the 3D protein structure.
Figure 9-3 shows the 20 amino acids classified into chemically similar groups. Note that many of the amino acids fall into more than one category. An amino acid sidechain can be both "nonpolar" and "basic," for instance, like lysine, which has a long aliphatic sidechain that terminates in an amino group. Because the relationship between chemical characteristics and amino acids isn't one-to-one, but rather many-to-many, it's not always simple to predict the effects of an amino acid substitution.
Figure 9-3. The amino acid sidechains (classification in a Venn diagram)
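The many-to-many relationship between amino acids and chemical properties can be sketched as a mapping from property names to sets of one-letter residue codes. The set memberships below are an illustrative assumption, not a transcription of Figure 9-3, and they are deliberately incomplete; published classification schemes differ in detail. The names PROPERTY_SETS, categories_of, and shared_categories are hypothetical.

```python
# Illustrative (assumed) property sets keyed by one-letter amino acid codes.
# Lysine (K) is placed in "hydrophobic" as well as "basic" because of its
# long aliphatic sidechain, following the example in the text.
PROPERTY_SETS = {
    "hydrophobic": set("ACFILMVWYK"),
    "basic": set("KRH"),
    "acidic": set("DE"),
    "aromatic": set("FWYH"),
    "tiny": set("AGS"),
    "polar uncharged": set("STNQ"),
}


def categories_of(residue):
    """Return the set of property names that contain this residue."""
    return {name for name, members in PROPERTY_SETS.items() if residue in members}


def shared_categories(res_a, res_b):
    """Return the property names two residues have in common."""
    return categories_of(res_a) & categories_of(res_b)


if __name__ == "__main__":
    print("K:", sorted(categories_of("K")))
    print("K and R share:", sorted(shared_categories("K", "R")))
    print("F and Y share:", sorted(shared_categories("F", "Y")))
```

Because a residue can belong to several sets at once, comparing the categories of the original and the substituted residue gives only a rough first look at which properties a substitution preserves; it does not by itself predict the structural effect.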
Interatomic forces are responsible not only for the specific interactions that form binding and interaction sites, but also for certain standard patterns that are consistently observed in protein structure. The amino acid backbone is sterically constrained (restricted from moving in certain ways because atoms would bump into each other) to follow only certain pathways. You may already be familiar with the alpha helix and beta sheet structures that commonly occur in proteins; they are common because of these steric restrictions on the protein backbone.
From the known structures of amino acids, Pauling and Corey first predicted the existence of alpha helices and beta sheets as components of protein structure. Ramachandran first described exactly what range of conformations is available to amino acids in a peptide chain. Peptide chain conformation is simply described by the values of the dihedral angles in the protein backbone (i.e., the angle described by the four atoms surrounding the N-Cα bond and the angle described by the four atoms surrounding the Cα-C bond). These angles are referred to as φ and ψ, respectively. The chain isn't free to rotate around the third kind of bond in the protein backbone, the peptide bond, because it is a partial double bond and hence chemically constrained to be planar, so the values of φ and ψ for each amino acid provide a complete description of the protein backbone. A Ramachandran map is simply a plot of φ versus ψ for an entire protein structure. One means of evaluating a protein structure model is to compare its individual Ramachandran map with the general Ramachandran map of allowed values of φ and ψ.
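As a minimal sketch of how a backbone dihedral angle can be computed from atomic coordinates, the following uses the standard atan2 formulation for the torsion angle defined by four points. It assumes each atom is given as an (x, y, z) tuple; the function names dihedral, phi, and psi are illustrative, and reading coordinates from a structure file is not shown. For residue i, φ is taken over the atoms C(i-1), N(i), Cα(i), C(i), and ψ over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    Returns 0 for a cis (eclipsed) arrangement and +/-180 for trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi: rotation about the N-CA bond of a residue."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi: rotation about the CA-C bond of a residue."""
    return dihedral(n, ca, c, n_next)
```

Using atan2 rather than the arccosine of the angle between the two plane normals keeps the sign of the angle, which is needed to tell the left half of a Ramachandran map from the right.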
Figure 9-4 is a general Ramachandran map that shows the allowed combinations of φ and ψ values for amino acids in protein structures. The small shaded region in the lower left quadrant of the map is the standard conformation of an amino acid in an alpha helix. The larger shaded region in the upper left quadrant of the map is the standard conformation of an amino acid in a beta sheet, or extended structure.
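The idea of checking a residue against the two shaded regions can be sketched as a simple lookup on its (φ, ψ) pair. The rectangular boundaries below are rough, assumed cutoffs chosen only for illustration; they are not the contours of Figure 9-4, and real validation tools use empirically derived regions. The function name ramachandran_region is hypothetical.

```python
def ramachandran_region(phi, psi):
    """Coarsely label a (phi, psi) pair, both in degrees in [-180, 180].

    The rectangles are assumed, illustrative cutoffs, not published contours.
    """
    # Lower left quadrant: right-handed alpha helix region.
    if -160.0 <= phi <= -20.0 and -120.0 <= psi <= 20.0:
        return "alpha"
    # Upper left quadrant: beta sheet (extended) region.
    if -180.0 <= phi <= -45.0 and 90.0 <= psi <= 180.0:
        return "beta"
    return "other"


def region_counts(angles):
    """Count how many (phi, psi) pairs fall into each coarse region."""
    counts = {"alpha": 0, "beta": 0, "other": 0}
    for phi, psi in angles:
        counts[ramachandran_region(phi, psi)] += 1
    return counts
```

A structure model in which a large share of residues land in "other" would be worth a closer look, although some residues legitimately adopt conformations outside these two regions.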
It's apparent from the Ramachandran map that steric interactions are very important determinants of the general features of protein structure. Steric interactions instantly eliminate a large fraction of possible conformations for proteins and leave relatively few options for how a compact structure can form from a linear chain of amino acids.
The sequence of a protein is called its primary structure; the most basic level of organization in a protein is the sequence of amino acids. Alpha helix and beta sheet structures, shown in Figure 9-5, are known collectively as secondary structures and are the next level of organization. Interactions between multiple secondary structure elements give rise to supersecondary structure and tertiary structure: helices and sheets contact each other to form larger characteristic structures, which can be described by their topology.
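One way to picture the step from primary to secondary structure in code is to pair the sequence with a per-residue secondary structure string and collapse that string into segments. The sketch below assumes a hypothetical one-letter encoding (H for helix, E for extended strand, C for anything else) and a made-up example string; the function name secondary_structure_segments is illustrative.

```python
from itertools import groupby


def secondary_structure_segments(ss_string):
    """Collapse a per-residue string into (label, start, end) segments.

    Positions are 0-based and `end` is exclusive, as in Python slices.
    """
    segments = []
    position = 0
    for label, run in groupby(ss_string):
        length = len(list(run))
        segments.append((label, position, position + length))
        position += length
    return segments


if __name__ == "__main__":
    # Made-up assignment: H = helix, E = strand, C = other (assumed encoding).
    for label, start, end in secondary_structure_segments("CCHHHHHHCCEEEEECC"):
        print(label, start, end)
```

The resulting list of helix and strand segments, in order along the chain, is the kind of simplified description from which supersecondary and tertiary arrangements can then be discussed.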
To create a functional protein, the sequence of amino acids in the protein chain must give rise to the proper 3D fold for the protein, and it must also place individual amino acids at appropriate points on that scaffold to carry out the protein's chemistry. Finding ways to extract those chemical instructions from the sequences of known proteins, formulating them as rules, and using those rules to predict the structure of other proteins is one of the biggest open research problems in bioinformatics.