Maximum Likelihood Estimation of Correlation Between Variates Having Equal Coefficients of Variation*


TECHNOMETRICS VOL. 15, NO. 3, AUGUST 1973


S. P. AZEN
Departments of Community Medicine and Biomedical Engineering, University of Southern California, Los Angeles, California

A. H. REED
Bio-Science Laboratories, Van Nuys, California

For a sample of n pairs of observations from a bivariate normal distribution in which X and Y have the same coefficient of variation c, we derive the equations for the m.l. estimator $\hat{\rho}$ of $\rho$. The efficiency of the Pearson correlation coefficient r relative to $\hat{\rho}$ is studied asymptotically and for small samples. It is shown that r is inefficient for certain values of c and $\rho$. An application is described for choosing the optimum standard in the clinical chemistry laboratory.

KEY WORDS

Maximum Likelihood Estimates
BAN Estimates
Correlation Coefficient
Coefficient of Variation
Scoring

1. INTRODUCTION

In the clinical chemistry laboratory, the concentration of a chemical constituent in a patient's serum (e.g., cholesterol or uric acid) may be determined indirectly by comparing the absorbance of light through the suitably prepared patient's sample with that of a standard solution. Processing is done in batches. A batch consists of one or more control samples (typically, one or more aliquots from a thoroughly mixed pool of serum from many healthy persons) and unknown serum specimens from 10-30 different patients. Control samples are used for quality control and for standardization of the absorbance value of the constituent.

For this latter purpose, the laboratory may choose as a standard value either the average of the absorbance values of the controls in a given batch, or the average of all absorbance values of controls from the same pool over all batches, past and present. Recently, it was shown [1] that the correct choice of standard absorbance depends on a correlation coefficient $\rho$ determined from the analysis of

* Research partially supported by the National Institutes of Health under Grant No. GM 16197-04.

Received Feb. 1972; revised Feb. 1972


aliquots from two pools of serum. This correlation may be estimated by the usual Pearson product-moment correlation coefficient r. However, this estimate does not utilize the laboratory properties that a) the coefficient of variation c of control sample absorbance values in successive batches is constant over time for each pool, and b) for all practical purposes, c is known after a sufficient number of batches has been analyzed.

In this paper we investigate the efficiency of r relative to the maximum likelihood (m.l.) estimator $\hat{\rho}$ of $\rho$ obtained when the additional structure is imposed. Thus, we assume that we have a sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from a bivariate normal distribution with means $\theta > 0$ and $\phi > 0$, standard deviations $c\theta$ and $c\phi$, and correlation $\rho$. We derive the m.l. equations for $\theta$, $\phi$, and $\rho$, and calculate the asymptotic and small-sample efficiencies of the usual estimators $\bar{x}$, $\bar{y}$, and r relative to the m.l. estimators $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$. Note that under the additional imposed structure, $\bar{x}$, $\bar{y}$, and r are no longer m.l. estimators.
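For reference, the log-likelihood implied by this model can be written out explicitly. The following is a sketch that assumes only the standard bivariate normal density with standard deviations $c\theta$ and $c\phi$ (where $\theta$ and $\phi$ denote the two means); the shorthand $u_i$, $v_i$ is introduced here for compactness and is not the authors' notation.

```latex
l(\theta,\phi,\rho)
  = -n\log\!\left(2\pi c^{2}\theta\phi\right)
    - \frac{n}{2}\log\!\left(1-\rho^{2}\right)
    - \frac{1}{2(1-\rho^{2})}\sum_{i=1}^{n}\left(u_i^{2} - 2\rho\,u_i v_i + v_i^{2}\right),
\qquad
u_i = \frac{x_i-\theta}{c\,\theta},\quad
v_i = \frac{y_i-\phi}{c\,\phi}.
```

Because $\theta$ and $\phi$ enter both the means and the standard deviations, the scores for $\theta$ and $\phi$ are not linear in the data, which is why $\bar{x}$ and $\bar{y}$ are no longer the m.l. estimators.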

2. THE MAXIMUM LIKELIHOOD ESTIMATORS

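Calculating the partial derivatives of the log-likelihood function $l$, we obtain, for $|\rho| \neq 1$, the three efficient scores

$$\frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{1}{c^{2}(1-\rho^{2})\,\theta^{2}} \sum_{i=1}^{n} x_i\left[\frac{x_i-\theta}{\theta} - \rho\,\frac{y_i-\phi}{\phi}\right], \tag{1}$$

$$\frac{\partial l}{\partial\phi} = -\frac{n}{\phi} + \frac{1}{c^{2}(1-\rho^{2})\,\phi^{2}} \sum_{i=1}^{n} y_i\left[\frac{y_i-\phi}{\phi} - \rho\,\frac{x_i-\theta}{\theta}\right], \tag{2}$$

and

$$\frac{\partial l}{\partial\rho} = \frac{n\rho}{1-\rho^{2}} - \frac{\rho}{(1-\rho^{2})^{2}} \sum_{i=1}^{n}\left[\frac{(x_i-\theta)^{2}}{c^{2}\theta^{2}} + \frac{(y_i-\phi)^{2}}{c^{2}\phi^{2}}\right] + \frac{1+\rho^{2}}{(1-\rho^{2})^{2}} \sum_{i=1}^{n}\frac{(x_i-\theta)(y_i-\phi)}{c^{2}\theta\phi}. \tag{3}$$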
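The scores can be checked numerically against the log-likelihood they are derived from. The sketch below is not the authors' program: it assumes the standard bivariate normal log-likelihood with standard deviations c·theta and c·phi, and the function names (`loglik`, `scores`, `numeric_scores`) and the evaluation point are illustrative choices. The sample input is the first five runs of Controls 1 and 2 from Table 3.

```python
import math

def loglik(theta, phi, rho, c, xs, ys):
    """Log-likelihood of n pairs under the equal-CV bivariate normal model."""
    n = len(xs)
    q = 0.0
    for x, y in zip(xs, ys):
        u = (x - theta) / (c * theta)   # standardized x deviation
        v = (y - phi) / (c * phi)       # standardized y deviation
        q += u * u - 2.0 * rho * u * v + v * v
    w = 1.0 - rho * rho
    return (-n * math.log(2.0 * math.pi * c * c * theta * phi)
            - 0.5 * n * math.log(w) - q / (2.0 * w))

def scores(theta, phi, rho, c, xs, ys):
    """Analytic efficient scores (dl/dtheta, dl/dphi, dl/drho)."""
    n = len(xs)
    w = 1.0 - rho * rho
    s_theta, s_phi = -n / theta, -n / phi
    suu = svv = suv = 0.0
    for x, y in zip(xs, ys):
        u = (x - theta) / (c * theta)
        v = (y - phi) / (c * phi)
        s_theta += x * (u - rho * v) / (c * theta ** 2 * w)
        s_phi += y * (v - rho * u) / (c * phi ** 2 * w)
        suu += u * u
        svv += v * v
        suv += u * v
    s_rho = (n * rho / w - rho * (suu + svv) / w ** 2
             + (1.0 + rho * rho) * suv / w ** 2)
    return s_theta, s_phi, s_rho

def numeric_scores(theta, phi, rho, c, xs, ys, h=1e-6):
    """Central-difference approximation to the same three derivatives."""
    p = [theta, phi, rho]
    out = []
    for i in range(3):
        step = h * max(1.0, abs(p[i]))
        hi, lo = list(p), list(p)
        hi[i] += step
        lo[i] -= step
        out.append((loglik(hi[0], hi[1], hi[2], c, xs, ys)
                    - loglik(lo[0], lo[1], lo[2], c, xs, ys)) / (2.0 * step))
    return tuple(out)

# First five runs of Controls 1 and 2 from Table 3, used only as sample input.
xs = [73, 68, 73, 69, 63]
ys = [72, 66, 68, 69, 66]
analytic = scores(70.0, 69.0, 0.5, 0.036, xs, ys)
numeric = numeric_scores(70.0, 69.0, 0.5, 0.036, xs, ys)
```

Agreement between `analytic` and `numeric` at an arbitrary parameter point is a quick guard against sign or index slips when the scores are coded by hand.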

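Since these functions are nonlinear in the parameters, it is necessary to use an iterative procedure to obtain the m.l. estimates. Using the scoring procedure [2] for this purpose, we calculate the information matrix

$$I = -\frac{n}{1-\rho^{2}}
\begin{pmatrix}
\dfrac{\rho^{2}-2-c^{-2}}{\theta^{2}} & \dfrac{\rho(1+c^{2}\rho)}{c^{2}\theta\phi} & \dfrac{\rho}{\theta} \\[2ex]
\dfrac{\rho(1+c^{2}\rho)}{c^{2}\theta\phi} & \dfrac{\rho^{2}-2-c^{-2}}{\phi^{2}} & \dfrac{\rho}{\phi} \\[2ex]
\dfrac{\rho}{\theta} & \dfrac{\rho}{\phi} & -\dfrac{\rho^{2}+1}{1-\rho^{2}}
\end{pmatrix}. \tag{4}$$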

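Letting $\hat{\theta}_0$, $\hat{\phi}_0$, and $\hat{\rho}_0$ be initial estimates of the parameters, then for j = 1, 2, ..., successive estimates $\hat{\theta}_j$, $\hat{\phi}_j$, and $\hat{\rho}_j$ may be obtained from the linear recursive equations

$$\hat{\theta}_j = \hat{\theta}_{j-1} + \frac{k}{n}\left[k_1\frac{\partial l}{\partial\theta} + k_2\frac{\partial l}{\partial\phi} + k_3\frac{\partial l}{\partial\rho}\right], \tag{5}$$

$$\hat{\phi}_j = \hat{\phi}_{j-1} + \frac{k}{n}\left[k_2\frac{\partial l}{\partial\theta} + k_4\frac{\partial l}{\partial\phi} + k_5\frac{\partial l}{\partial\rho}\right], \tag{6}$$

$$\hat{\rho}_j = \hat{\rho}_{j-1} + \frac{k}{n}\left[k_3\frac{\partial l}{\partial\theta} + k_5\frac{\partial l}{\partial\phi} + k_6\frac{\partial l}{\partial\rho}\right], \tag{7}$$

where

$$k = -c^{2}(\rho+1)(\rho^{2}+2c^{2}\rho+2c^{2}+1)^{-1},$$
$$k_1 = -\theta^{2}(2c^{2}+\rho^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_2 = -\theta\phi\rho(\rho^{2}+2\rho c^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_3 = -\theta\rho(1-\rho^{2}),$$
$$k_4 = -\phi^{2}(2c^{2}+\rho^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_5 = -\phi\rho(1-\rho^{2}),$$

and

$$k_6 = (2\rho^{2}c^{2}-2c^{2}+\rho-1)(1-\rho^{2})c^{-2}.$$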
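Since the products k·k1, ..., k·k6 are stated to be elements of the inverse information matrix, one useful check is to multiply the information matrix of Eq. 4 by the matrix built from these coefficients and confirm that the product is the identity. The sketch below assumes the information matrix refers to the full sample of size n (so the inverse carries a factor 1/n); the function names and the parameter values are illustrative, not taken from the paper.

```python
def information_matrix(theta, phi, rho, c, n):
    """Information matrix of Eq. 4 for a sample of size n (assumed scaling)."""
    f = -n / (1.0 - rho ** 2)
    a = rho ** 2 - 2.0 - c ** -2
    b = rho * (1.0 + c ** 2 * rho) / (c ** 2 * theta * phi)
    return [
        [f * a / theta ** 2, f * b, f * rho / theta],
        [f * b, f * a / phi ** 2, f * rho / phi],
        [f * rho / theta, f * rho / phi,
         -f * (rho ** 2 + 1.0) / (1.0 - rho ** 2)],
    ]

def inverse_information(theta, phi, rho, c, n):
    """Inverse information assembled from the coefficients k, k1, ..., k6."""
    d = rho + 2.0 * c ** 2 + 1.0
    k = -c ** 2 * (rho + 1.0) / (rho ** 2 + 2.0 * c ** 2 * rho + 2.0 * c ** 2 + 1.0)
    k1 = -theta ** 2 * (2.0 * c ** 2 + rho ** 2 + 1.0) / d
    k2 = -theta * phi * rho * (rho ** 2 + 2.0 * rho * c ** 2 + 1.0) / d
    k3 = -theta * rho * (1.0 - rho ** 2)
    k4 = -phi ** 2 * (2.0 * c ** 2 + rho ** 2 + 1.0) / d
    k5 = -phi * rho * (1.0 - rho ** 2)
    k6 = (2.0 * rho ** 2 * c ** 2 - 2.0 * c ** 2 + rho - 1.0) * (1.0 - rho ** 2) / c ** 2
    s = k / n
    return [[s * k1, s * k2, s * k3],
            [s * k2, s * k4, s * k5],
            [s * k3, s * k5, s * k6]]

def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3)]
            for i in range(3)]

# Illustrative parameter values (roughly the scale of the controls in Table 3).
args = (70.0, 69.0, 0.75, 0.036, 19)
product = matmul(information_matrix(*args), inverse_information(*args))
```

Here `product` should be the 3x3 identity up to rounding error, which confirms that the coefficient formulas and Eq. 4 are mutually consistent.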

In Eqs. 5-7 the terms in each bracket are evaluated at $\theta = \hat{\theta}_{j-1}$, $\phi = \hat{\phi}_{j-1}$, and $\rho = \hat{\rho}_{j-1}$. This includes the coefficients $k, k_1, \ldots, k_6$ of the efficient scores, which are elements of the inverse information matrix. If $\hat{\theta}_0$, $\hat{\phi}_0$, and $\hat{\rho}_0$ are consistent estimators (e.g., $\bar{x}$, $\bar{y}$, and r), then $\hat{\theta}_1$, $\hat{\phi}_1$, and $\hat{\rho}_1$ are best asymptotically normal (BAN) estimators [3]. The estimates obtained by iterating Eqs. 5-7 until convergence are the numerical m.l. estimates $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$.
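The iteration can be sketched in a few lines. This is a minimal illustration rather than the authors' FORTRAN program: it starts from the usual estimates, applies the scoring update with the inverse information built from k, k1, ..., k6, records the first step as the BAN estimate, and stops on a relative-change criterion of .001 (borrowed from the small-sample study in Section 4). The function names, the 1/n scaling of the update, and the choice c = .036 for the Table 3 controls are assumptions made here; no safeguard is included against a step leaving |rho| < 1.

```python
import math

def scores(theta, phi, rho, c, xs, ys):
    """Efficient scores (dl/dtheta, dl/dphi, dl/drho)."""
    n, w = len(xs), 1.0 - rho * rho
    s_theta, s_phi = -n / theta, -n / phi
    suu = svv = suv = 0.0
    for x, y in zip(xs, ys):
        u = (x - theta) / (c * theta)
        v = (y - phi) / (c * phi)
        s_theta += x * (u - rho * v) / (c * theta ** 2 * w)
        s_phi += y * (v - rho * u) / (c * phi ** 2 * w)
        suu, svv, suv = suu + u * u, svv + v * v, suv + u * v
    s_rho = n * rho / w - rho * (suu + svv) / w ** 2 + (1.0 + rho * rho) * suv / w ** 2
    return s_theta, s_phi, s_rho

def inverse_information(theta, phi, rho, c, n):
    """(k/n) times the symmetric matrix of k1, ..., k6."""
    d = rho + 2.0 * c ** 2 + 1.0
    k = -c ** 2 * (rho + 1.0) / (rho ** 2 + 2.0 * c ** 2 * rho + 2.0 * c ** 2 + 1.0)
    k1 = -theta ** 2 * (2.0 * c ** 2 + rho ** 2 + 1.0) / d
    k2 = -theta * phi * rho * (rho ** 2 + 2.0 * rho * c ** 2 + 1.0) / d
    k3 = -theta * rho * (1.0 - rho ** 2)
    k4 = -phi ** 2 * (2.0 * c ** 2 + rho ** 2 + 1.0) / d
    k5 = -phi * rho * (1.0 - rho ** 2)
    k6 = (2.0 * rho ** 2 * c ** 2 - 2.0 * c ** 2 + rho - 1.0) * (1.0 - rho ** 2) / c ** 2
    s = k / n
    return [[s * k1, s * k2, s * k3], [s * k2, s * k4, s * k5], [s * k3, s * k5, s * k6]]

def scoring_fit(xs, ys, c, tol=1e-3, max_iter=100):
    """Return (m.l. estimates, BAN estimates, iterations used)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    est = [mx, my, sxy / math.sqrt(sxx * syy)]   # usual estimates as start
    ban = None
    for j in range(1, max_iter + 1):
        s = scores(est[0], est[1], est[2], c, xs, ys)
        v = inverse_information(est[0], est[1], est[2], c, n)
        new = [est[i] + sum(v[i][m] * s[m] for m in range(3)) for i in range(3)]
        if j == 1:
            ban = list(new)                      # one-step (BAN) estimates
        converged = all(abs(new[i] - est[i]) < tol * abs(est[i]) for i in range(3))
        est = new
        if converged:
            break
    return est, ban, j

# Controls 1 and 2 from Table 3 (19 runs); c = .036 as quoted in Section 5.
control1 = [73, 68, 73, 69, 63, 71, 73, 71, 70, 71, 72, 70, 71, 71, 73, 66, 70, 73, 70]
control2 = [72, 66, 68, 69, 66, 71, 71, 71, 70, 71, 72, 72, 71, 69, 71, 63, 71, 72, 69]
ml, ban, iters = scoring_fit(control1, control2, 0.036)
```

For these two columns the paper reports a correlation estimate of about .759 for both the BAN and the m.l. estimator; the sketch should land close to that value, with small differences possible depending on the value of c actually used by the authors.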

3. ASYMPTOTIC RELATIVE EFFICIENCIES

Since the asymptotic variances of $\sqrt{n}\,\hat{\theta}$, $\sqrt{n}\,\hat{\phi}$, and $\sqrt{n}\,\hat{\rho}$ are $kk_1$, $kk_4$, and $kk_6$, respectively, the asymptotic efficiencies of $\bar{x}$, $\bar{y}$, and r relative to $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$ are given by $kk_1/(c\theta)^2$, $kk_4/(c\phi)^2$, and $kk_6/(1-\rho^2)^2$, respectively. Simplifying these expressions, we obtain

$$\text{Asym. Eff.}(\bar{x}) = \text{Asym. Eff.}(\bar{y}) = (1+k_7)^{-1} \tag{8}$$

and

$$\text{Asym. Eff.}(r) = (1+k_8)^{-1}, \tag{9}$$

where $k_7 = [4c^{2}\rho^{2} + 2c^{2}(\rho+1)(2c^{2}+1)]\,[(2c^{2}+\rho^{2}+1)(\rho+1)]^{-1}$ and $k_8 = \rho^{2}[2c^{2}(\rho+1)+1]^{-1}$. Note that both expressions are functions only of c and $\rho$.
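Equations 8 and 9 are simple enough to evaluate directly. The following sketch codes the two efficiency expressions as functions of rho and c; the function names are illustrative. Evaluated at the grid points of Table 1, the values agree with the tabulated entries to about three decimals.

```python
def asym_eff_mean(rho, c):
    """Asymptotic efficiency of x-bar (or y-bar) relative to the m.l. estimator, Eq. 8."""
    k7 = ((4.0 * c ** 2 * rho ** 2 + 2.0 * c ** 2 * (rho + 1.0) * (2.0 * c ** 2 + 1.0))
          / ((2.0 * c ** 2 + rho ** 2 + 1.0) * (rho + 1.0)))
    return 1.0 / (1.0 + k7)

def asym_eff_r(rho, c):
    """Asymptotic efficiency of Pearson's r relative to the m.l. estimator, Eq. 9."""
    k8 = rho ** 2 / (2.0 * c ** 2 * (rho + 1.0) + 1.0)
    return 1.0 / (1.0 + k8)

# Print one row pair of the efficiency table for a chosen c.
rhos = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
c = 0.1
print("means:", [round(asym_eff_mean(p, c), 3) for p in rhos])
print("r:    ", [round(asym_eff_r(p, c), 3) for p in rhos])
```

Because both functions depend only on rho and c, a table for any other value of c (for example the c = .036 of the laboratory example) can be produced by changing one line.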

Table 1 presents these efficiencies for representative values of $\rho$ and for c = .01, .1, 1, and 10. Note that r is clearly less efficient (<81%) than $\hat{\rho}$ for $|\rho| \ge .5$ and $c \le .1$. On the other hand, r is almost as efficient (>95%) as $\hat{\rho}$ for c = 10. Finally, note that when r is almost efficient, $\bar{x}$ and $\bar{y}$ are inefficient, and conversely.
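The pattern in the table follows from the limiting behavior of the two efficiency expressions in c. As a short worked check (derived here from the stated forms of the two coefficients, written $k_7$ and $k_8$, and not part of the original text):

```latex
c \to 0:\qquad k_8 \to \rho^{2},\quad
  \text{Asym. Eff.}(r) \to \frac{1}{1+\rho^{2}};
\qquad k_7 \to 0,\quad \text{Asym. Eff.}(\bar{x}) \to 1 .
\\[1ex]
c \to \infty\ (\rho > -1):\qquad k_8 \to 0,\quad
  \text{Asym. Eff.}(r) \to 1;
\qquad k_7 \sim 2c^{2},\quad \text{Asym. Eff.}(\bar{x}) \to 0 .
```

For example, the small-c limit gives $1/(1+.25) = .800$ at $|\rho| = .5$ and $1/(1+.81) \approx .552$ at $|\rho| = .9$, while $1/(1+2c^{2}) \approx .005$ at c = 10, matching the corresponding entries of the table.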

4. SMALL-SAMPLE RELATIVE EFFICIENCIES

A FORTRAN-IV program was written to calculate the small-sample relative efficiencies of the usual estimates ($\bar{x}$, $\bar{y}$, and r) and the BAN estimates ($\hat{\theta}_1$, $\hat{\phi}_1$, and $\hat{\rho}_1$) relative to the m.l. estimates ($\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$). Five hundred samples of size n were generated from the above bivariate distribution. For each sample, the usual estimates, the BAN estimates, and the numerical m.l. estimates were calculated (the convergence criterion was $|(\hat{t}_j - \hat{t}_{j-1})\,\hat{t}_{j-1}^{-1}| < .001$ for each parameter $t = \theta$, $\phi$, or $\rho$). For each sample, parameter and estimate, the mean square error was

calculated. The small-sample efficiencies of the usual estimates and the BAN estimates were defined as the inverse of the ratio of the corresponding mean square error to that of the m.l. estimates.

TABLE 1

Asymptotic Efficiencies of $\bar{x}$, $\bar{y}$, and r Relative to $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$ for Values of $\rho$ and c

   c     Estimate              $\rho$ =  -.9     -.7     -.5     -.3     -.1      .1      .3      .5      .7      .9

  .01    $\bar{x}, \bar{y}$            0.998   0.999   0.999   1.000   1.000   1.000   1.000   1.000   1.000   1.000
         r                             0.552   0.671   0.800   0.917   0.990   0.990   0.917   0.800   0.671   0.552

  .1     $\bar{x}, \bar{y}$            0.842   0.946   0.969   0.978   0.980   0.980   0.980   0.979   0.979   0.979
         r                             0.553   0.672   0.802   0.918   0.990   0.990   0.919   0.805   0.678   0.562

 1.0     $\bar{x}, \bar{y}$            0.090   0.218   0.289   0.322   0.332   0.333   0.330   0.328   0.328   0.330
         r                             0.597   0.766   0.889   0.964   0.996   0.997   0.976   0.941   0.900   0.856

10.0     $\bar{x}, \bar{y}$            0.005   0.005   0.005   0.005   0.005   0.005   0.005   0.005   0.005   0.005
         r                             0.963   0.992   0.998   0.999   1.000   1.000   1.000   1.000   1.000   1.000
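The sampling step and the efficiency definition of this section can be sketched as follows. This is an illustration only, not the authors' FORTRAN program: it draws pairs from the equal-CV bivariate normal model, computes Pearson's r for 500 samples, and defines relative efficiency as a ratio of mean square errors. The m.l. fit itself is omitted, and the function names, seed, and parameter values are arbitrary choices.

```python
import math
import random

def sample_pairs(n, theta, phi, rho, c, rng):
    """Draw n pairs with means theta, phi, s.d. c*theta, c*phi, correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = theta * (1.0 + c * z1)
        y = phi * (1.0 + c * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
        pairs.append((x, y))
    return pairs

def pearson_r(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def mean_square_error(estimates, truth):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(usual, ml, truth):
    """Inverse of MSE(usual) / MSE(m.l.), i.e. MSE(m.l.) / MSE(usual)."""
    return mean_square_error(ml, truth) / mean_square_error(usual, truth)

rng = random.Random(1973)  # arbitrary seed, for repeatability only
r_values = [pearson_r(sample_pairs(25, 0.5, 0.5, 0.5, 0.01, rng))
            for _ in range(500)]
mse_r = mean_square_error(r_values, 0.5)
```

Passing the list of m.l. estimates from the same 500 samples as the `ml` argument of `relative_efficiency` would give one cell of a small-sample efficiency table; values above 1 mean the usual estimate had the smaller mean square error.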

Table 2 presents the small-sample efficiencies of r and $\hat{\rho}_1$ relative to $\hat{\rho}$ for the case $\theta = .5$, $\phi = .5$, and c = .01. The values of $\theta$ and $\phi$ were arbitrary, but the value of c was chosen because r is asymptotically least efficient for small values of c, and c is invariably less than 1 in the actual laboratory situation. From Table 2, we note that 1) $\hat{\rho}$ is less efficient than r when a) n = 5 and $|\rho| \le .7$, b) n = 10 and $|\rho| \le .5$, c) n = 25 and $|\rho| < .3$; 2) when r is less efficient than $\hat{\rho}$, then $\hat{\rho}_1$ is more efficient than r; 3) when n = 25, the small-sample efficiency of r approximates the asymptotic efficiency of r (see appropriate column of Table 1).

The small-sample efficiencies of $\bar{x}$, $\bar{y}$, $\hat{\theta}_1$, and $\hat{\phi}_1$ relative to $\hat{\theta}$ and $\hat{\phi}$ are also approximately equal to 1 when c = .01, $\theta = .5$, $\phi = .5$ for all values of n and $\rho$ examined.

5. AN EXAMPLE

Table 3 shows absorbance values of 3 substances analyzed in 19 runs of a laboratory test for serum concentration level of an enzyme, leucine amino peptidase. The standard, β-naphthylamine, has a known concentration. When the test is carried out, the absorbance of the enzyme product is related to that of the standard. Controls 1 and 2 are two control samples from the same pool analyzed in the


of variation (the subscripts 1, 2, 3 refer to columns in the table). Note that all three coefficients of variation are approximately equal to c = .036.

The Pearson correlation coefficients and their estimated asymptotic standard deviations are $r_{12} = .314 \pm .207$, $r_{13} = .375 \pm .197$, and $r_{23} = .749 \pm .101$. These estimates and the means in Table 3 were used to obtain the BAN estimates and the m.l. estimates. The BAN estimates and their estimated asymptotic standard deviations are $\hat{\rho}_{(1)12} = .322 \pm .196$, $\hat{\rho}_{(1)13} = .394 \pm .180$, and $\hat{\rho}_{(1)23} = .759 \pm .078$, respectively. The m.l. estimates and estimated asymptotic standard deviations are $\hat{\rho}_{12} = .323 \pm .196$, $\hat{\rho}_{13} = .395 \pm .180$, and $\hat{\rho}_{23} = .759 \pm .078$, respectively. Note that the estimated asymptotic standard deviations are smaller for the BAN and m.l. estimates than for the corresponding Pearson estimates. Further, since m.l. estimates are more efficient for $|\rho| \ge .5$, the change in standard deviation is greatest for $\rho_{23}$ when comparing r with $\hat{\rho}$.
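The summary figures quoted in this example can be recomputed from the Table 3 data. The sketch below uses the sample standard deviation with divisor n - 1, the usual large-sample standard deviation (1 - r^2)/sqrt(n) for Pearson's r, and the asymptotic variance of the m.l. estimator implied by Section 3; the function names are illustrative, and the divisor choice is an assumption that happens to match the tabulated values.

```python
import math

# Absorbance values from Table 3, runs 1-19.
standard = [126, 124, 116, 118, 115, 116, 118, 111, 121, 122,
            122, 124, 124, 124, 118, 115, 121, 127, 122]
control1 = [73, 68, 73, 69, 63, 71, 73, 71, 70, 71,
            72, 70, 71, 71, 73, 66, 70, 73, 70]
control2 = [72, 66, 68, 69, 66, 71, 71, 71, 70, 71,
            72, 72, 71, 69, 71, 63, 71, 72, 69]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def sd_pearson(r, n):
    """Large-sample standard deviation of Pearson's r."""
    return (1.0 - r * r) / math.sqrt(n)

def sd_ml(rho, c, n):
    """Asymptotic standard deviation of the m.l. estimator of rho."""
    num = 2.0 * c ** 2 * (rho + 1.0) + 1.0
    den = rho ** 2 + 2.0 * c ** 2 * rho + 2.0 * c ** 2 + 1.0
    return (1.0 - rho * rho) * math.sqrt(num / den) / math.sqrt(n)

n = len(standard)
cvs = [sd(v) / mean(v) for v in (standard, control1, control2)]
r12 = pearson_r(standard, control1)
r13 = pearson_r(standard, control2)
r23 = pearson_r(control1, control2)
print([round(cv, 4) for cv in cvs])
print(round(r12, 3), round(r13, 3), round(r23, 3))
print(round(sd_pearson(r23, n), 3), round(sd_ml(0.759, 0.036, n), 3))
```

The last line contrasts the two standard deviations for the pair of controls, which is where the gain from imposing the equal-coefficient-of-variation structure is largest.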

6. CONCLUSIONS AND RECOMMENDATIONS

From these and other tables not presented here, we recommend for samples of size n = 25 or larger that $\hat{\rho}_1$ or $\hat{\rho}$ be used if c < 1 and it is suspected that $|\rho| \ge .5$. This is because, asymptotically, r is at most 80% as efficient as $\hat{\rho}$ for $|\rho| \ge .5$. If n is approximately equal to 10, we recommend using $\hat{\rho}_1$ or $\hat{\rho}$ if $|\rho| \ge .7$. On the

other hand, for samples as small as n = 5, we recommend using Pearson's r. Finally, for c > 1 we recommend Pearson's r.

TABLE 2

Small-Sample Efficiencies of r and $\hat{\rho}_1$ Relative to $\hat{\rho}$ for the Case $\theta$ = .5, $\phi$ = .5, c = .01

  n    Estimate          $\rho$ =  -.9    -.7    -.5    -.3    -.1     .1     .3     .5     .7     .9

  5    r                          0.17   1.06   1.32   1.40   1.30   1.35   1.38   1.39   1.00   0.17
       $\hat{\rho}_1$             0.47   1.11   1.23   1.31   1.28   1.28   1.28   1.24   1.06   0.32

 10    r                          0.23   0.64   1.27   1.39   1.42   1.34   1.41   1.17   0.64   0.39
       $\hat{\rho}_1$             0.75   0.84   1.25   1.30   1.31   1.28   1.28   1.14   0.83   0.82

 25    r                          0.46   0.53   0.89   1.07   1.20   1.14   1.08   0.88   0.55   0.46
       $\hat{\rho}_1$             0.93   0.84   0.98   1.11   1.16   1.09    --    1.00   0.85   0.98

TABLE 3

Absorbance Values of Three Substances in a Chemical Assay for Leucine Amino Peptidase

          (1)        (2)         (3)                 (1)        (2)         (3)
 Run    Standard   Control 1   Control 2     Run   Standard   Control 1   Control 2

   1      126         73          72          11     122         72          72
   2      124         68          66          12     124         70          72
   3      116         73          68          13     124         71          71
   4      118         69          69          14     124         71          69
   5      115         63          66          15     118         73          71
   6      116         71          71          16     115         66          63
   7      118         73          71          17     121         70          71
   8      111         71          71          18     127         73          72
   9      121         70          70          19     122         70          69
  10      122         71          71

 $\bar{x}_1 = 120.2$    $\bar{x}_2 = 70.4$    $\bar{x}_3 = 69.7$
 $s_1 = 4.33$           $s_2 = 2.59$          $s_3 = 2.47$
 $c_1 = .0360$          $c_2 = .0368$         $c_3 = .0354$

7. ACKNOWLEDGMENT

The authors wish to thank Gau Tzu Wu of Bio-Science Laboratories whose energy contributed greatly to this effort.

REFERENCES

[1] REED, A. H. and HENRY, R. J. (1973). "Accuracy, precision, quality control and miscellaneous statistics," Clinical Chemistry: Principles and Technics, 2nd Ed., D. C. Cannon et al., Harper and Row, New York.

[2] RAO, C. R. (1965). Linear Statistical Inference and Its Applications, John Wiley and Sons, Inc., New York.

[3] FERGUSON, T. (1958). “A method for generating best asymptotically normal estimates with application to the estimation of bacterial densities,” Ann. Math. Statist. 29, 1046-62.