ON THE ESTIMATION OF EFFECTIVE NUMBER OF ALLELES FROM ELECTROPHORETIC DATA


GEORGE B. JOHNSON

Department of Biology, Washington University, St. Louis, Missouri 63130

Manuscript received November 2, 1973

ABSTRACT

Estimation of the effective number of alleles ($n_e$) in natural populations from the frequencies of electrophoretically detected alleles involves significant ambiguity, which should be explicitly considered in the estimate. For large genetic populations the ambiguity substantially affects the value of $n_e$.

For the last decade, KIMURA and CROW’S estimation of the effective number of alleles ($n_e$) in a population as $1 + 4N_e u$ (KIMURA and CROW 1964) has formed the basis of a large body of theoretical work in population genetics. Recently, OHTA and KIMURA have presented a new model for the estimation of the effective number of alleles in populations (OHTA and KIMURA 1973a, 1973b). They reason that examination of electrophoretic data on allozyme variation may reveal only a fraction of the total number of amino acid changes, as oppositely charged amino acid alterations cancel in their effect upon electrophoretic mobility. To compensate for this, a new estimator of $n_e$, $\sqrt{1 + 8N_e u}$, is proposed. The new estimator of $n_e$ proposed by OHTA and KIMURA is based upon a set of rather unrealistic biochemical assumptions. Because of this, the appropriateness of employing the new formulation to analyze experimental data needs to be questioned.

OHTA and KIMURA’S new model of $n_e$ is founded upon the conception that electrophoretically detectable alleles are able to “mutate only to one of the two adjacent (electrophoretic) states. One positive and one negative change in charge cancel each other, leading the allele back to the original state” rather than producing a third discrete state as assumed in the previous model of KIMURA and CROW. In biochemical terms, this new model makes several basic assumptions: (1) all electrophoretically detectable charges on proteins are assumed to be unit charges, so that opposite changes in charge cancel evenly; (2) differences in the tertiary structures (shapes) of proteins are assumed not to alter significantly the effect of charge upon electrophoretic mobility; (3) non-charge changes should not alter electrophoretic mobility or should do so as unit charge changes; (4) it is assumed that the model is applicable to electrophoretic data in general.
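The cancellation idea quoted above can be illustrated with a small enumeration. This is only a sketch under simplifying assumptions that are not in the original: every mutation is taken to be an exact unit charge change (+1 or -1), and the names `histories` and `net_charge` are hypothetical.

```python
from itertools import product

def net_charge(history):
    """Net charge shift of an allele that accumulated the given unit changes."""
    return sum(history)

# Hypothetical illustration: every ordered pair of unit charge changes
# defines a distinct allele at the sequence level.
histories = list(product((+1, -1), repeat=2))

# Under the stepwise picture, alleles with the same net charge fall in the
# same electrophoretic state and are scored as one band.
states = {net_charge(h) for h in histories}

print(len(histories), "distinct alleles")
print(len(states), "electrophoretic states:", sorted(states))
```

Here four distinct alleles collapse into three electrophoretic states, because one positive and one negative change return the allele to its original mobility. The biochemical assumptions listed above determine whether real charge changes cancel this cleanly.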

The concept of unit charge: Of the 20 common amino acids which occur in proteins, only six possess side groups potentially charged in aqueous solution. The pKa values of these side groups are not identical, however; at any given pH, the proportion of the various side groups which will be ionized will differ. The extent of amino acid ionization in a protein may be directly titrated and individual pKa values determined (TANFORD and HAVENSTEIN 1956), permitting estimation of the proportion of each amino acid side group which should be charged at any given pH. Surveys of electrophoretic variation employ a variety of buffers (AYALA et al. 1972; PRAKASH, LEWONTIN and HUBBY 1969; SELANDER and YANG 1969); the principal buffer pH’s are 7.0, 8.6, and 9.0. For each of these three buffer pH values, the proportion of charged side groups is estimated in Table 1. Clearly the proportion of positive and negative charged residues is a sensitive function of the pH. For a random base change in the gene of α-hemoglobin, one can calculate that the probability of the random change involving a charge alteration in the pH range 7-9 is quite high, 0.32. However, within this pH range a large proportion of the charge alterations will not involve “unit” changes in charge. Table 2 estimates the proportion of non-unit charge changes to be expected for a wide variety of proteins. I have restricted consideration to polar residues, as being more characteristic of “surface” amino acids in proteins. In general, charged residues tend to occur on the surface of proteins; changes within the interior of proteins from hydrophobic to charged residues would not normally be expected to be seen because of detrimental effects upon protein structure. The result presented in Table 2 seems quite general: positive and negative alterations in charge very often do not balance, due simply to differing levels of amino acid side group ionization.

TABLE 1

Degree of side group ionization in electrophoresis buffers

                                               Percent charged at buffer pH
“Charged” amino acids   Side group    pKa      7.0      8.6      9.0

Acidic    ASP           -COOH          4.6      99      100      100
          GLU           -COOH          4.6      99      100      100
Basic     HIS           imidazole      7.0      50        2        1
          TYR           phenolic       9.6     100       91       80
          LYS           -NH2          10.2     100       98       94
          ARG           guanidyl      12       100      100      100

pKa values are those of TANFORD and HAVENSTEIN (1956).
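The dependence of side group ionization on buffer pH can be sketched with the Henderson-Hasselbalch relation. This is an assumption about how such percentages may be computed, not a statement of the author's procedure. The pKa values are those of Table 1; the function and dictionary names are hypothetical, and the "acidic"/"basic" labels simply follow the table's grouping.

```python
# pKa values are those listed in Table 1. The "kind" labels follow the
# table's own grouping (an assumption of this sketch): "acidic" groups are
# counted as charged when deprotonated, "basic" groups when protonated.
SIDE_GROUPS = {
    "ASP": (4.6, "acidic"),
    "GLU": (4.6, "acidic"),
    "HIS": (7.0, "basic"),
    "TYR": (9.6, "basic"),
    "LYS": (10.2, "basic"),
    "ARG": (12.0, "basic"),
}

def fraction_charged(pka, ph, kind):
    """Henderson-Hasselbalch fraction of a side group in its 'charged' form."""
    protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    return protonated if kind == "basic" else 1.0 - protonated

def percent_charged_table(buffer_phs=(7.0, 8.6, 9.0)):
    """Return {residue: [percent charged at each buffer pH]}."""
    return {
        name: [round(100.0 * fraction_charged(pka, ph, kind)) for ph in buffer_phs]
        for name, (pka, kind) in SIDE_GROUPS.items()
    }

if __name__ == "__main__":
    for name, row in percent_charged_table().items():
        print(name, row)
```

Under these assumptions the HIS, TYR, LYS and ARG rows agree with Table 1. For ASP and GLU at pH 7.0 the relation gives about 99.6%, which rounds to 100 where the table lists 99.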


TABLE 2

The probability of non-unit charge changes

                                              Gel buffer pH
Enzyme                               pI       9.0      7.0

Alkaline phosphatase                 4.5      .45      .17
Ovalbumin                            4.6      .43      .16
Urease                               4.8      .45      .16
Asparaginase                         4.9      .48      .14
Adenosine deaminase                  5.0      .48      .18
Tryptophan synthetase                5.0      .52      .18
Glutamate dehydrogenase              5.0      .53      .21
Hexokinase                           5.2      .57      .18
Enolase                              5.7      .57      .21
Carboxypeptidase A                   6.0      .57      .19
Alcohol dehydrogenase                6.8      .56      .17
α-hemoglobin                         7.0      .51      .36
Fumarase                             7.3      .59      .25
Carbonic anhydrase                   7.3      .46      .21
Subtilisin B                         7.8      .59      .22
Glyceraldehyde-3-P dehydrogenase     8.2      .56      .23
Papain                               8.8      .63      .15
Chymotrypsinogen A                   9.1      .54      .15
Subtilisin C                         9.4      .62      .20
Ribonuclease A                       9.6      .64      .23
Trypsin                             10.8      .56      .18
Lysozyme                            11.0      .42      .13
Mean (N=24)                          6.9      .52      .20

For each of the polar amino acids, the genetic code is used to assess the probability that a charge alteration resulting from a random base change will at a given pH be a fractional charge change and not a “unit” charge change; relative amino acid composition is then used to correct for the different frequencies with which the polar amino acids occur in different proteins. Non-unit charge alterations are defined as those which involve changes in the ionization fraction > 0.05, but < 0.95.
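The non-unit criterion stated in the note to Table 2 can be made concrete with a short sketch. It assumes the Henderson-Hasselbalch relation and the Table 1 pKa values, and considers only the simplest case: an uncharged residue replaced by a single charged one. The names `PKA`, `mean_charge` and `is_non_unit` are hypothetical, and the full calculation behind Table 2 (genetic code weighting, amino acid composition) is not reproduced here.

```python
# (pKa, sign of the charged form); pKa values as listed in Table 1.
# TYR is omitted to keep the sign convention of this sketch unambiguous.
PKA = {
    "ASP": (4.6, -1),
    "GLU": (4.6, -1),
    "HIS": (7.0, +1),
    "LYS": (10.2, +1),
    "ARG": (12.0, +1),
}

def mean_charge(residue, ph):
    """Average charge of one side group at the given pH (Henderson-Hasselbalch)."""
    pka, sign = PKA[residue]
    deprotonated = 1.0 / (1.0 + 10.0 ** (pka - ph))
    # Acidic groups carry -1 when deprotonated, basic groups +1 when protonated.
    return -deprotonated if sign < 0 else 1.0 - deprotonated

def is_non_unit(delta):
    """Criterion from the Table 2 note: 0.05 < |change in ionization| < 0.95."""
    return 0.05 < abs(delta) < 0.95

# Replacing an uncharged residue by a charged one changes the charge by mean_charge.
for residue, ph in (("ASP", 7.0), ("HIS", 7.0), ("LYS", 9.0)):
    delta = mean_charge(residue, ph)
    print(residue, ph, round(delta, 3), "non-unit" if is_non_unit(delta) else "unit")

# One gained HIS and one gained ASP at pH 7.0 do not cancel.
net = mean_charge("HIS", 7.0) + mean_charge("ASP", 7.0)
print("net change:", round(net, 3))
```

In this sketch a new histidine at pH 7.0 and a new lysine at pH 9.0 both count as non-unit changes, while a new aspartate at pH 7.0 counts as a unit change. A gained histidine plus a gained aspartate at pH 7.0 leaves a net change of roughly half a charge rather than zero.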

... contains 10 lysine residues, and the pKa values for the ε-amino groups range from 10.2 to 12 (TANFORD and HAVENSTEIN 1956); in papain, the pKa values for the phenolic groups of the 13 tyrosine residues titrate over the entire range from 8.5 to 14 (GLAZER and SMITH 1961).

This variation of pKa within proteins is commonly thought of as reflecting not only electrostatic effects resulting from the ionization of other groups in the protein but also intra-molecular hydrogen bonding. In myoglobin, 30% of potentially “charged” amino acids are involved in such strong intra-molecular hydrogen bonding, while the other 70% are not (Table 3). Clearly, different residues of a given amino acid are not interchangeable with respect to their charge; the degree of their ionization will in each case depend upon that residue’s position in the protein. This matter is discussed explicitly in the HENNING and YANOFSKY article cited by OHTA and KIMURA (HENNING and YANOFSKY 1963).

Mutations in non-charged amino acids: OHTA and KIMURA’S model tacitly


TABLE 3

Bond involvement of polar residues in myoglobin*

                                            No. involved in strong
Amino acid   Total no.   No. “buried”      intra-molecular bonding

GLU             19
ASP              8
TYR              3
HIS             11
LYS             19
ARG              4

* From RICHARDS (1963).

amino acid charge. There is little data available which bears on this point, but it seems quite possible that changes in non-polar residues might affect conformation. Such conformational changes might be detected as electrophoretic variants if they affect the contribution to electrophoretic mobility of partially exposed charged residues, or alter the retardation coefficient (CHRAMBACH and RODBARD 1971) of the protein migrating in the gel. Evidence for the latter effect has been obtained in my own laboratory (JOHNSON, in preparation).

Applicability to experimental data: A wide variety of pH values is commonly employed in electrophoretic analysis. Given the differential effect of non-uniform buffer pH upon electrophoretic charge, it is difficult to justify applying OHTA and KIMURA’S model uniformly over this range of data.

Commonly employed buffers often differ greatly in ionic strength. Tenfold differences in buffer molarity are common in individual studies (AYALA et al. 1972; SELANDER and YANG 1969). Ionic strength does not importantly affect ionization of proteins at buffer pH’s near their isoelectric points. However, when electrophoresis buffers are employed with pH values several units from a protein’s isoelectric point, the effect of ionic strength upon hydrogen ion dissociation may no longer be ignored: a 10-fold difference in ionic strength may produce a 5% to 10% difference in the ionization fraction of a protein in a buffer two pH units from its isoelectric point (TANFORD 1961). For electrophoretic surveys of many enzyme loci, just such experimental conditions are commonly employed. Thus, variability in the ionic strength of experimental buffers also renders uniform application of the model difficult. Needless to say, variation between alleles in their isoelectric points further confounds the issue.


Clearly, the truth is somewhere intermediate. The real degree of non-identification due to charge cancelation will depend for each protein upon its amino acid composition, and particularly upon the details of its tertiary structure. Any general model of $n_e$ which concerns electrophoretic data will require general assumptions about the nature of protein charge distribution; it will be difficult to justify applying any such model to the quite varied data now available. This serves to point up a fundamental problem in coming to grips with this issue: whenever $n_e$ is defined in terms of electrophoretic alleles only, serious ambiguities in interpretation are inevitable because of the biochemical complexity of the phenotype being scored. When all alleles can be identified, electrophoretic and non-electrophoretic alike, then an unambiguous formulation of $n_e$ as $1 + 4N_e u$ will be possible. Until that time, formulations attempting to correct an uncharacterized bias in electrophoretic data are at best temporary expedients whose empirical utility is impossible to evaluate.

Complete identification of alleles is far from an impossibility and is of such central importance that a major effort in this direction certainly seems warranted. Detailed sequence information, despite great technical and financial difficulties, presents the strongest approach, and ultimately the only fully certain one. A great deal of useful preliminary information may be obtained from simpler approaches: (1) surveys of protein structural stability using heat or urea denaturation as probes; (2) ascertainment of free electrophoretic mobility under denaturing conditions such as in urea gels, when pure proteins are available; (3) surveys of conformational differences using changes in the retardation coefficient on polyacrylamide gels.

Because the consequences of changes in hydrophobic “internal” residues are radically different from those resulting from changes in polar “surface” residues, the evolutionary forces governing the number and frequency of alleles may differ greatly between the two classes. Nor need this difference be the same for different enzyme loci. Thus, some caution is necessary in evaluating the evolutionary significance of patterns of variation in electrophoretic data, whether it involves estimation of $n_e$ or correlations with other patterns of variation.

A satisfactory general formulation of $n_e$ applicable to solely electrophoretic data seems unrealistic. Either the entire range of allelic variation should be empirically examined, or a formulation should be adopted based on explicit and verifiable assumptions about the nature of protein charge distribution. As we see in the case of OHTA and KIMURA’S latest model of $n_e$, the nature of protein structural variability makes general verification of assumptions about protein charge distribution a practical impossibility with even the most refined biochemical analysis. The probability that such estimates are substantially correct does not seem great. In the absence of a satisfactory formulation of electrophoretic $n_e$, it might be better to explicitly recognize the ambiguity in the data, and define $n_e$ only as having some value between $1 + 4N_e u$ and $\sqrt{1 + 8N_e u}$. For large genetic populations the difference is substantial.
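How far apart the two bounds lie can be seen from a short numerical sketch. The mutation rate and population sizes below are hypothetical values chosen only for illustration, and the function names are not from the original.

```python
import math

def n_e_kimura_crow(Ne, u):
    """Effective number of alleles as 1 + 4*Ne*u (KIMURA and CROW 1964)."""
    return 1.0 + 4.0 * Ne * u

def n_e_ohta_kimura(Ne, u):
    """Effective number of electrophoretic alleles as sqrt(1 + 8*Ne*u)."""
    return math.sqrt(1.0 + 8.0 * Ne * u)

u = 1e-6  # hypothetical mutation rate per locus per generation
for Ne in (1e3, 1e5, 1e7):  # hypothetical effective population sizes
    upper = n_e_kimura_crow(Ne, u)
    lower = n_e_ohta_kimura(Ne, u)
    print(f"Ne={Ne:.0e}  1+4Neu={upper:.3f}  sqrt(1+8Neu)={lower:.3f}")
```

With these illustrative values the two formulations nearly coincide when $N_e u$ is small (about 1.004 in both cases at $N_e u = 0.001$) but diverge sharply when it is large: at $N_e u = 10$ they give 41 and 9 respectively.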


LITERATURE CITED

AYALA, F., J. POWELL, M. TRACEY, C. MOURAO and S. PEREZ-SALAS, 1972 Enzyme variability in the Drosophila willistoni group. IV. Genic variation in natural populations of Drosophila willistoni. Genetics 70: 113-139.

CHRAMBACH, A. and D. RODBARD, 1971 Polyacrylamide gel electrophoresis. Science 172: 440-451.

GLAZER, A. and E. SMITH, 1961 Phenolic hydroxyl ionization in papain. J. Biol. Chem. 236: 2948-2951.

HENNING, U. and C. YANOFSKY, 1963 An electrophoretic study of mutationally altered A proteins of the tryptophan synthetase of Escherichia coli. J. Mol. Biol. 6: 16-21.

KIMURA, M. and J. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738.

KOJIMA, K., J. GILLESPIE and Y. TOBARI, 1970 A profile of Drosophila species’ enzymes assayed by electrophoresis. I. Number of alleles, heterozygosities, and linkage disequilibrium in glucose-metabolizing systems and some other enzymes. Biochem. Genet. 4: 627-637.

OHTA, T. and M. KIMURA, 1973a A new model for estimating the number of electrophoretically detectable alleles in a finite population. Genetics 74: s201.

OHTA, T. and M. KIMURA, 1973b A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22: 201-204.

PRAKASH, S., R. LEWONTIN and J. HUBBY, 1969 A molecular approach to the study of genic heterozygosity in natural populations. IV. Patterns of genic variation in central, marginal and isolated populations of Drosophila pseudoobscura. Genetics 61: 841-858.

RICHARDS, F. M., 1963 Structure of proteins. Ann. Rev. Biochem. 32: 269.

SELANDER, R. and S. YANG, 1969 Protein polymorphism and genic heterozygosity in a wild population of the house mouse (Mus musculus). Genetics 63: 653-667.

TANFORD, C., 1961 Physical Chemistry of Macromolecules. Wiley, New York.

TANFORD, C. and J. D. HAVENSTEIN, 1956 Hydrogen ion equilibria of ribonuclease. J. Am. Chem. Soc. 78: 5287.