A COMPUTER PROGRAM FOR CALCULATING DEGREE OF BIOGEOGRAPHICAL RESEMBLANCE BETWEEN AREAS



JAMES A. PETERS

Abstract

A program, written in the computer language BASIC, is described that calculates by various formulas the degree of biogeographical resemblance between areas. The various formulas are described, and the proper machine instructions for each are indicated.

In their recent analysis of the distributional patterns of North American mammals, Hagmeier and Stults (1964:131) summarized the various formulas that have been proposed by biogeographers for use in the analysis of the degree of relationship between two areas. They chose three of these formulas to use in their calculations, and listed four more that had been suggested and used by various authors. The decision by Hagmeier and Stults to use the "Coefficient of Community" and the "Simpson Coefficient" was based on the simplicity of the formulas and the ease with which they could be calculated. Several of the other formulas used, such as the "Resemblance Equation" of Preston (1962), include logarithms, square roots, or consultation of tables, and the increased amount of effort involved has interfered with their use by biogeographers. Since most of the formulas were introduced as attempts to eliminate difficulties resulting from different sample size, it seems likely that a worker might wish to see what results all would give with his data. In the past, however, biogeographers have usually chosen one formula, because if the number of localities involved is very large, the amount of time devoted to the calculation of values is almost prohibitive.

The advent of computers greatly facilitated such calculations, and there has been an increase in attempts to deal with such data. But the computer has presented additional problems of accessibility, programming, and cost. Time-shared computers, which I have discussed elsewhere (Peters, in press), have eliminated these problems by using a teletype connection to a central computer used simultaneously by many subscribers. Since the teletype can be installed any place, access is maximum. The cost, because of the simultaneous use, is very low, and is well within the range of tolerance of museum budgets. The programming, representing a major breakthrough, is usually in the user-oriented computer language called "BASIC," developed at Dartmouth College. I have written a program for such a time-share computer that will calculate all of the formulas for biogeographic comparisons known to me. This program runs all formulas at the same rate of speed, and the difficulties presented by square roots or logarithms no longer exist, since the machine can do all such calculations. It should be clearly understood at this point that the biogeographer does not need to understand the jargon of the program, nor does he need previous experience with a computer. I am assuming only that he has access to a teletype connected to a computer, and that he either knows or will take five minutes to learn the rudiments of teletype operation and how to contact the computer. After that, all he need do is to insert his data as directed here, and watch the answers being printed out for as many different formulas as he wishes to test.

The following program is written in BASIC. There should be little difficulty in converting it into another language, if anyone reading this paper has access to a computer that does not accept it. Ordinarily, this will have to be done by a computer programmer, because, although work is being done on machine translation from one computer language to a second, it has not yet been achieved. The first part of the program is as follows:

PROGRAM "FRF"

10 LET A = [See instructions on p. 68 for method of insertion for A value]
15 DIM S(40,40), P(40)
25 LET W = A
110 FOR I = 1 TO A
120 READ P(I)
130 NEXT I
140 FOR I = 1 TO (A - 1)
150 FOR J = 1 TO (W - 1)
160 READ S(I,J)
170 NEXT J
175 LET W = W - 1
180 NEXT I
190 LET N = 0
196 LET W = A
200 FOR I = 1 TO (A - 1)
210 PRINT
211 PRINT
220 LET N = N + 1
230 LET B = P(N)
240 LET T = N
250 FOR J = 1 TO (W - 1)
270 LET T = T + 1
280 LET C = P(T)
310 LET V = INT(V + .5)
315 PRINT V;
320 NEXT J
325 LET W = W - 1
330 NEXT I
1000 END
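For readers more at home in a current language, the control flow of the listing above can be sketched in Python. This is an illustrative reading only, not part of the published program: the function name `run_frf`, its arguments, and the idea of passing the formula in as a function (standing in for the line 300 instruction that each formula supplies) are assumptions of the sketch.

```python
import math

def run_frf(counts, shared_rows, formula):
    """Mirror the loop structure of the BASIC listing.

    counts      -- taxa per locality (the P array, read at lines 110-130)
    shared_rows -- triangular rows of shared taxa (the S array, lines 140-180)
    formula     -- hypothetical stand-in for the line 300 instruction; it
                   receives (b, c, s) just as the BASIC formulas use
                   B, C and S(I,J)
    """
    a = len(counts)                      # the A value of line 10
    readout = []
    for i in range(a - 1):               # line 200: FOR I = 1 TO (A - 1)
        b = counts[i]                    # line 230: B = P(N)
        row = []
        for j in range(a - 1 - i):       # line 250: FOR J = 1 TO (W - 1)
            c = counts[i + 1 + j]        # line 280: C = P(T)
            v = formula(b, c, shared_rows[i][j])
            row.append(math.floor(v + 0.5))  # line 310: INT(V + .5)
        readout.append(row)              # one printed row per locality
    return readout
```

Any of the formulas described in this paper can be supplied as `formula`, provided it is written in terms of the two locality counts and the shared count.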

If the user has program storage space available in a time-sharing system, this much should be stored under the name of his choosing. We have called it "FRF" in our system, because it is used to calculate the Faunal Resemblance Factors. The different formulas that have been used by previous authors to calculate these values require slightly different instructions for the computer. These are listed below along with the formula itself and the name that has been given to it. If the user settles on a specific formula that he wishes to use extensively, it should be saved with the rest of the program. This should not be done if he wishes to test his data against several or all of the formulas, which is done rapidly and easily by computer.

In all of the formulas given below, the following symbols are used:

$C$ = Number of taxa common to two samples.

$N_1$ = Number of taxa in smaller sample.

$N_2$ = Number of taxa in larger sample.
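It may help to note that the variables of the program do not carry the same letters as these symbols: in the listing, B and C hold the taxon counts of the two localities being compared, in whatever order they were entered, and S(I,J) holds the number of taxa they share. A minimal Python sketch of the correspondence follows. The helper name `to_symbols` is hypothetical, and the counts are taken from the sample data given later in this paper.

```python
def to_symbols(b, c, s):
    """Translate the program's B, C and S(I,J) into the paper's symbols.

    Returns (common, n1, n2): taxa in common, taxa in the smaller sample,
    and taxa in the larger sample.
    """
    return s, min(b, c), max(b, c)

# California Desert (50 taxa) against Colorado Desert (32 taxa), 14 shared
common, n1, n2 = to_symbols(50, 32, 14)
print(common, n1, n2)   # prints 14 32 50
```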

The sequence in which the formulas are arranged here is such that one can cut a single paper tape with the instructions for each formula and separate one from the next by a series of rubouts. Then the formulas are fed in one after the other, without the need of making any changes manually. I find it convenient to use a very low line number for an identification statement concerning the data being processed. For example:

3 PRINT "DATA FROM SAVAGE, 1960, BAJA CALIFORNIA HERPS"

If a similar identification line is included with each of the formulas on the master tape, the user will find that he need not write any further information on the answers printed out by the computer. I prefer to use the next sequential line for formula identification, as shown here:

4 PRINT "COEFFICIENT OF COMMUNITY"

COEFFICIENT OF COMMUNITY

This formula was used by Hagmeier and Stults (1964) as the primary basis for the determination of the "Mammal Provinces" of North America. This is what Webb (1950) called the "Similarity Value." The formula is:

$$\frac{C}{N_1 + N_2 - C} \times 100.$$

The computer instruction that needs to be added to the program above to produce the solutions using this formula is:

300 LET V = S(I,J)/(B + C - S(I,J))*100
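As a check on the arithmetic, the same calculation can be sketched in Python. The function and argument names are assumptions made for illustration; the rounding step copies line 310 of the program, and the counts come from the sample data given later in this paper.

```python
import math

def coefficient_of_community(n1, n2, common):
    """Shared taxa as a percentage of all taxa found in either sample."""
    return common / (n1 + n2 - common) * 100

# Colorado Desert (32 taxa) and California Desert (50 taxa) share 14 taxa,
# so the two samples together hold 32 + 50 - 14 = 68 taxa.
value = coefficient_of_community(32, 50, 14)
rounded = math.floor(value + 0.5)   # same rounding as line 310
print(rounded)                      # prints 21
```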

BURT COEFFICIENT

This formula, developed by several earlier authors, and used by Burt (1958) for comparing the mammalian faunas of three continents, takes the average of the two samples as the denominator, in an attempt to reduce the effect of difference in size between them. This coefficient has also been called "Pirlot's Index." The formula is:

$$\frac{2C}{N_1 + N_2} \times 100.$$

The instruction to be added to the program is:

300 LET V = 2*S(I,J)/(B + C)*100

FIRST KULCZYNSKI COEFFICIENT

This formula, presented by Kulczynski (1927:178) for the study of plant associations, emphasizes more strongly than most the number of shared species, by subtracting twice that number from the total number of species. This can and does produce values greater than 100, when expressed as a percentage. The formula is:

$$\frac{C}{N_1 + N_2 - 2C} \times 100.$$

The computer instruction is:

300 LET V = S(I,J)/(B + C - 2*S(I,J))*100
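Two properties of this formula are worth seeing numerically: the value passes 100 whenever the shared taxa outnumber the unshared ones, and the denominator becomes zero when the two samples are identical, so that the formula is undefined there. The Python sketch below is illustrative only. The function name and the choice to return `None` for identical samples are assumptions of the sketch, not part of the published program, and the first example uses the Peninsular and San Lucan counts from the sample data given later in this paper.

```python
def first_kulczynski(n1, n2, common):
    """Shared taxa as a percentage of the taxa that are not shared.

    Returns None when the samples are identical (n1 + n2 == 2 * common),
    because the formula is undefined there. Returning None is a choice
    made for this sketch only.
    """
    unshared = n1 + n2 - 2 * common
    if unshared == 0:
        return None
    return 100 * common / unshared

# Peninsular Desert (32) and San Lucan Desert (45) share 26 taxa:
# 26 shared against 25 unshared gives a value above 100.
print(first_kulczynski(32, 45, 26))   # prints 104.0
print(first_kulczynski(20, 20, 20))   # prints None
```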

SECOND KULCZYNSKI COEFFICIENT

Hagmeier and Stults (1964:132) indicated that Kulczynski (1927) used the following formula for calculating floral relationships between areas, but it represents interpretation and extrapolation on their part to arrive at that conclusion. Kulczynski actually used a similar formula (p. 180) from which the one below can be derived for calculating degree of relationship ("Verwandtschaftgrad (V)").

Kulczynski gives (p. 178) the formula called the First Kulczynski Coefficient above, but immediately points out that he considered it inaccurate ("ungenau"), since it is based on the assumption that all species are equally important. Kulczynski then constructed a measure of constancy ("Stetigkeit") based on the frequency with which a given species occurred in a given association. The frequencies were grouped, a completely arbitrary value assigned to each group (footnote, p. 179), and then these arbitrary values were used to calculate the degree of relationship. It is perhaps unfair to call this the Second Kulczynski Coefficient, since he composed it in order to prevent the error of giving all species equal weight. However, I maintain the connection here because it is identified with Kulczynski in Hagmeier and Stults. While the formula can be used for the calculation of faunal resemblance, it should be kept in mind that Kulczynski did not mean for it to be used as it is here. The formula is:

$$\frac{C\,(N_1 + N_2)}{2\,(N_1 \times N_2)} \times 100.$$

The instruction to be added to the program is:

300 LET V = S(I,J)*(B + C)/(2*B*C)*100
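The instruction above is algebraically the average of the two percentages obtained by dividing the shared taxa by each sample in turn, which gives a quick way to check a value by hand. The rearrangement below uses only the symbols already defined; the numerical check uses the California Desert and Colorado Desert counts (50 and 32 taxa, 14 shared) from the sample data given later in this paper.

```latex
\frac{C\,(N_1 + N_2)}{2\,N_1 N_2} \times 100
  = \frac{1}{2}\left(\frac{C}{N_1} + \frac{C}{N_2}\right) \times 100,
\qquad
\frac{1}{2}\left(\frac{14}{32} + \frac{14}{50}\right) \times 100
  = \frac{1}{2}\,(43.75 + 28.00) = 35.875 \approx 36 .
```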

OTSUKA COEFFICIENT

This formula was used by Ochiai (1957) for analysis of fish distribution in the region of Japan. Ochiai credited Otsuka as originator of the formula, which also represents an attempt to minimize differences between sample size. However, its use has been minimal because a square root must be extracted in every pair calculation. The computer eliminates this difficulty. The formula is:

$$\frac{C}{\sqrt{N_1 \times N_2}} \times 100.$$
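As a worked illustration, take the California Desert (50 taxa) and the Colorado Desert (32 taxa) from the sample data given later in this paper, which share 14 taxa. The product of the two counts happens to be a perfect square, so the root can be taken by inspection:

```latex
\frac{C}{\sqrt{N_1 \times N_2}} \times 100
  = \frac{14}{\sqrt{32 \times 50}} \times 100
  = \frac{14}{\sqrt{1600}} \times 100
  = \frac{14}{40} \times 100 = 35 .
```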

The instructions to be added to the program are:

295 LET D = B*C
300 LET V = S(I,J)/SQR(D)*100

SIMPSON COEFFICIENT

This formula was proposed by Simpson (1943) for use in comparison of mammalian faunas. He reviewed this and other formulas in his 1960 paper. He continued to prefer this formula because it selects the smaller of the two samples for comparison with the number of species found in common between the samples, and so gives emphasis to the similarities. The formula is:

$$\frac{C}{N_1} \times 100.$$

The instructions to be added to the program are as follows:

285 IF C > B THEN 300
290 LET V = S(I,J)/C*100
295 GO TO 310
300 LET V = S(I,J)/B*100

JACCARD COEFFICIENT

This formula was defined by Braun-Blanquet (1932:362), who attributed it to Jaccard. It is the same as that used in the Simpson Coefficient, except that the value of the larger sample is used as the denominator rather than that of the smaller sample. The formula is:

$$\frac{C}{N_2} \times 100.$$

The instructions to be added to the program are:

285 IF C < B THEN 300
290 LET V = S(I,J)/C*100
295 GO TO 310
300 LET V = S(I,J)/B*100

RESEMBLANCE EQUATION

This formula was first used by Preston (1962) for calculating degree of similarity of several different plant and animal groups. It is a rather formidable equation, and involves the use of either logarithms or a table of "z" values in its original form. This cumbersome nature has left it virtually unused. Again, the computer program eliminates this difficulty, because it handles logarithms without intervention by the operator. The formula is:

$$x^{1/z} + y^{1/z} = 1,$$

where $x$ and $y$ are the fractions of the combined taxa of the two samples, $N_1 + N_2 - C$, that occur in each sample.

The formula is solved for z, which is an expression of difference. The instructions to the computer are:

285 LET F = B/(B + C - S(I,J))
287 LET G = C/(B + C - S(I,J))
289 IF F = G THEN 302
291 IF F > G THEN 295
292 LET V = .4*F + .6*G
293 GO TO 300
295 LET V = .6*F + .4*G
300 LET V = .4342945*LOG(V)
301 GO TO 305
302 LET V = .4342945*LOG(F)
305 LET V = -3.32*V*100
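The branching in these instructions can be sketched in Python as follows. The sketch is a reading of the listing rather than a replacement for it, and it rests on stated assumptions: the function name is hypothetical, the denominators in lines 285 and 287 are taken to be the combined total B + C - S(I,J), and `math.log10` stands in for the factor .4342945 applied to the natural logarithm.

```python
import math

def preston_z_percent(b, c, s):
    """Approximate Preston's z (times 100) the way lines 285-305 do.

    b, c -- taxa in each locality; s -- taxa shared (B, C and S(I,J)).
    """
    joint = b + c - s            # assumed denominator: taxa in either locality
    f = b / joint                # line 285
    g = c / joint                # line 287
    if f == g:                   # line 289: equal fractions
        x = f
    else:                        # lines 291-295: weight the larger by 0.6
        x = 0.6 * max(f, g) + 0.4 * min(f, g)
    # .4342945 * LOG(V) in BASIC is the base-10 logarithm, and
    # 3.32 is approximately 1 / log10(2).
    return -3.32 * math.log10(x) * 100

# California Desert (50) and Colorado Desert (32), 14 taxa shared,
# rounded as in line 310
print(math.floor(preston_z_percent(50, 32, 14) + 0.5))
```

Because the larger fraction always receives the weight 0.6, the result does not depend on the order in which the two localities are entered.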

This program permits the calculation of the z value in three different ways, corresponding to the alternatives given by Preston (1962:418). If the two fractions are equal, the formula "z = -3.32 LOG x" is used. If one fraction is larger than the other, the computer selects one of two alternative routes to permit the bias "in favor of the larger value" that Preston recommends, and calculates the formula "z = -3.32 LOG (0.6x + 0.4y)." Preston said that this equation was valid "at least as far from equality as x = 2y." Since he gave no alternative equation for values higher than x = 2y, however, this program will compute such values as any other. The user should keep in mind, then, that a number of species for one locality more than twice as large as the number for a second locality may result in values of questionable validity, according to Preston. Hagmeier and Stults (1964:131) converted Preston's z value to a "Coefficient of Association," which they expressed "in the form of similarity (S):"

$$S = (1 - z) \times 100.$$

This formula is solved by adding the following line to the instructions for the Resemblance Equation:

309 LET V = 100-V

                     CALIFORNIA   COLORADO   PENINSULAR   SAN LUCAN
                       DESERT      DESERT      DESERT      DESERT
CALIFORNIA DESERT        50          44          63          42
COLORADO DESERT          14          32          66          56
PENINSULAR DESERT        20          21          32          81
SAN LUCAN DESERT         19          18          26          45

[LINE 900: the diagonal. LINE 901: 14, 20, 19. LINE 902: 21, 18. LINE 903: 26.]

FIG. 1. The distribution of reptile species in Baja California, from Savage (1960). The diagonal from upper left to lower right shows the number of species occurring in each region. The remaining squares show the number of species shared between the two regions named. The line numbers correspond to those used in the instructions for insertion of data.

COEFFICIENT OF DIFFERENCE

Used by Savage (1960:197) in an evaluation of the herpetofauna of Baja California, this is the same measure as the Jaccard Coefficient except that it is a measure of difference between two samples rather than of similarity. It can be found by subtracting the Jaccard Coefficient from one, as seen in the following formula:

$$\left(1 - \frac{C}{N_2}\right) \times 100.$$

The instructions to be added to the program include those for the Jaccard Coefficient and the following:

305 LET V = 100-V
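The IF and GO TO pairs in the Simpson and Jaccard instructions amount to choosing the smaller or the larger of the two counts as the denominator, and line 305 then takes the complement of the Jaccard value. A short Python sketch of that reading follows. The function names are assumptions made for illustration, `min` and `max` replace the branching of lines 285 to 300, and the counts come from the sample data given in the next section.

```python
def simpson(b, c, s):
    """Shared taxa as a percentage of the smaller sample."""
    return 100 * s / min(b, c)

def jaccard_as_used_here(b, c, s):
    """Shared taxa as a percentage of the larger sample (this paper's usage)."""
    return 100 * s / max(b, c)

def coefficient_of_difference(b, c, s):
    """Complement of the value above: line 305, V = 100 - V."""
    return 100 - jaccard_as_used_here(b, c, s)

# California Desert (50) and Colorado Desert (32), 14 taxa shared
print(simpson(50, 32, 14))                     # prints 43.75
print(jaccard_as_used_here(50, 32, 14))        # prints 28.0
print(coefficient_of_difference(50, 32, 14))   # prints 72.0
```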

ENTRY OF DATA INTO PROGRAM

The data are entered into the program beginning in line 900, and this line should contain the actual counts of taxa known from each locality to be tested (see Figure 1). In the next data line available (usually 901, but not always) the values for the number of taxa shared between the first locality and each of the other localities should be entered in the same sequence. The next data line should begin with the number of taxa shared between the second locality and each of the remaining localities (but not the first, since that has already been recorded). The final data line should include only one value, that of the shared species between the next to last and the last locality. A sample series is shown below, taken from Figure 1.

900 DATA 50, 32, 32, 45
901 DATA 14, 20, 19
902 DATA 21, 18
903 DATA 26

In line 900, there should be as many entries as there are localities. In this example, there are four. The final step before telling the computer to run the data is to insert this count as the A value in line 10 as follows:

10 LET A = 4
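Because the READ statements take the DATA values strictly in order, a miscounted row would shift every value that follows it. One way to guard against this is to check the triangular layout before the lines are typed in. The Python sketch below does so; the function name and the error messages are assumptions of the sketch rather than part of the program.

```python
def check_data_lines(counts, shared_rows):
    """Verify the triangular layout of the DATA lines and return A.

    counts      -- values of line 900, one per locality
    shared_rows -- values of the following DATA lines, in order
    Raises ValueError if a row has the wrong number of entries or if a
    shared count exceeds either locality's total.
    """
    a = len(counts)                              # the A value for line 10
    if len(shared_rows) != a - 1:
        raise ValueError(f"expected {a - 1} rows of shared counts")
    for i, row in enumerate(shared_rows):
        if len(row) != a - 1 - i:
            raise ValueError(f"row {i + 1} should have {a - 1 - i} entries")
        for j, s in enumerate(row):
            if s > min(counts[i], counts[i + 1 + j]):
                raise ValueError("shared count exceeds a locality total")
    return a

counts = [50, 32, 32, 45]                        # line 900
shared_rows = [[14, 20, 19], [21, 18], [26]]     # lines 901 to 903
print(check_data_lines(counts, shared_rows))     # prints 4
```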

FINAL READOUT OF ANSWERS

In this program, all of the values for each locality in a table or matrix, as in Fig. 1, are calculated before going to the next locality. This results, therefore, in one less calculation each time, until there is only one for the next to last locality and none for the last, since it has already been compared with all the others. The readout shows this by a shrinking series of values. A readout on the calculation of the values for the matrix in Fig. 1 looks exactly like this:

DATA FROM SAVAGE, 1960
SIMPSON'S COEFFICIENT

44 63 42

66 56

81

The first line of values is for the first locality, the California Desert, and is in the same sequence as in the figure. The values can be filled in directly in the vertical column under the name California Desert. The second row of figures is for the Colorado Desert and the remaining two localities. The third line, consisting of a single value, is for the comparison between the Peninsular and San Lucan deserts. Thus, the calculations fill in the lower half of the matrix, and the coefficient for any pair of localities can be found in the proper intersecting spaces.
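The bookkeeping described above, in which each printed row is matched to its pair of localities, can be sketched in Python. The function name and the use of a dictionary keyed by pairs of names are assumptions made for illustration; the values are those of the readout shown above.

```python
def label_readout(names, rows):
    """Attach locality names to the shrinking rows of the readout."""
    labelled = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # Row i compares locality i with each locality listed after it.
            labelled[(names[i], names[i + 1 + j])] = value
    return labelled

names = ["CALIFORNIA DESERT", "COLORADO DESERT",
         "PENINSULAR DESERT", "SAN LUCAN DESERT"]
rows = [[44, 63, 42], [66, 56], [81]]
table = label_readout(names, rows)
print(table[("PENINSULAR DESERT", "SAN LUCAN DESERT")])   # prints 81
```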

ADDENDUM

Users of this program should note that the double column format of Systematic Zoology results in some of the "statement lines" being printed on two lines, rather than one. In every case, however, these are single line instructions, and should be written that way, or the computer will not accept them.

LITERATURE CITED

BRAUN-BLANQUET, J. (transl. by G. D. Fuller and H. S. Conrad). 1932. Plant sociology: the study of plant communities. McGraw-Hill, New York.

BURT, W. H. 1958. The history and affinities of the recent land mammals of western North America. In C. L. Hubbs [ed.], Zoogeography. Publ. A.A.A.S., 51:141-154.

HAGMEIER, E. M., AND C. D. STULTS. 1964. A numerical analysis of the distributional patterns of North American mammals. Syst. Zool., 13:125-155.

KULCZYNSKI, S. 1927. Zespoły roślin w Pieninach. Die Pflanzenassoziationen der Pieninen. Bull. Internat. Acad. Polon. Sci. Lett., 1927, suppl. 2:57-203.

OCHIAI, A. 1957. Zoogeographical studies on the Soleoid fishes found in Japan and its neighbouring regions-II. Bull. Japan. Soc. Sci. Fisheries, 22:526-530.

PETERS, J. A. [in press]. The role of multi-access computing in systematics. Proc. Vol., Systematics Conference, Ann Arbor, Mich., 1967.

PRESTON, F. W. 1962. The canonical distribution of commonness and rarity: part II. Ecology, 43:410-432.

SAVAGE, J. M. 1960. Evolution of a peninsular herpetofauna. Syst. Zool., 9:184-212.

SIMPSON, G. G. 1943. Mammals and the nature of continents. Amer. J. Sci., 241:137-163.

SIMPSON, G. G. 1960. Notes on the measurement of faunal resemblance. Amer. J. Sci., 258a:300-311.

WEBB, W. L. 1950. Biogeographic regions of Texas and Oklahoma. Ecology, 31:426-433.

U.S. National Museum, Smithsonian Institution, Washington, D.C. 20560.