TECHNOMETRICS VOL. 15, NO. 3 AUGUST 1973
Maximum Likelihood Estimation of Correlation Between Variates Having Equal Coefficients of Variation*
S. P. AZEN
Departments of Community Medicine and Biomedical Engineering, University of Southern California, Los Angeles, California
A. H. REED
Bio-Science Laboratories, Van Nuys, California
For a sample of n pairs of observations from a bivariate normal distribution in which X and Y have the same coefficient of variation c, we derive the equations for the m.l. estimator $\hat{\rho}$ of $\rho$. The efficiency of the Pearson correlation coefficient r relative to $\hat{\rho}$ is studied asymptotically and for small samples. It is shown for certain values of c and $\rho$ that r is inefficient. An application is described for choosing the optimum standard in the clinical chemistry laboratory.
KEY WORDS
Maximum Likelihood Estimates BAN Estimates
Correlation Coefficient Coefficient of Variation Scoring
1. INTRODUCTION
In the clinical chemistry laboratory, the concentration of a chemical constituent in a patient's serum (e.g., cholesterol or uric acid) may be determined indirectly by comparing the absorbance of light through the suitably prepared patient's sample with that of a standard solution. Processing is done in batches. A batch consists of one or more control samples (typically, one or more aliquots from a thoroughly mixed pool of serum from many healthy persons) and unknown serum specimens from 10-30 different patients. Control samples are used for quality control and for standardization of the absorbance value of the constituent.
For this latter purpose, the laboratory may choose as a standard value either the average absorbance value from controls for a given batch, or the average of all of the absorbance values of controls from the same pool for all batches, past and present. Recently, it was shown [1] that the correct choice of standard absorbance depends on a correlation coefficient $\rho$ determined from the analysis of
* Research partially supported by the National Institutes of Health under Grant No. GM 16197-04.
Received Feb. 1972; revised Feb. 1972
aliquots from two pools of serum. This correlation may be estimated by the usual Pearson product-moment correlation coefficient r. However, this estimate does not utilize the laboratory properties that a) the coefficient of variation c of control sample absorbance values in successive batches is constant over time for each pool, and b) for all practical purposes, c is known after a sufficient number of batches has been analyzed.
In this paper we investigate the efficiency of r relative to the maximum likelihood (m.l.) estimator $\hat{\rho}$ of $\rho$ obtained when the additional structure is imposed. Thus, we assume that we have a sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from a bivariate normal distribution with means $\theta > 0$ and $\phi > 0$, standard deviations $c\theta$ and $c\phi$, and correlation $\rho$. We derive the m.l. equations for $\theta$, $\phi$, and $\rho$, and calculate the asymptotic and small-sample efficiencies of the usual estimators $\bar{x}$, $\bar{y}$, and r relative to the m.l. estimators $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$. Note that under the additional imposed structure, $\bar{x}$, $\bar{y}$, and r are no longer m.l. estimators.
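For reference, the log-likelihood implied by this model is not displayed in the text. Under the stated assumptions (bivariate normal, means $\theta$ and $\phi$, standard deviations $c\theta$ and $c\phi$, correlation $\rho$, with c treated as known) it can be written as follows. This is a restatement of the standard bivariate normal log-density and not a formula quoted from the paper.

```latex
l(\theta,\phi,\rho)
  = -n\log(2\pi c^{2}) - n\log(\theta\phi) - \frac{n}{2}\log(1-\rho^{2})
    - \frac{1}{2c^{2}(1-\rho^{2})}\sum_{i=1}^{n}
      \left[
        \frac{(x_i-\theta)^{2}}{\theta^{2}}
        - \frac{2\rho\,(x_i-\theta)(y_i-\phi)}{\theta\phi}
        + \frac{(y_i-\phi)^{2}}{\phi^{2}}
      \right]
```

Because $\theta$ and $\phi$ enter both the means and the standard deviations, the sample means no longer maximize this function.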
2. THE MAXIMUM LIKELIHOOD ESTIMATORS
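Calculating the partial derivatives of the log-likelihood function $l$, we obtain for $|\rho| \neq 1$ the three efficient scores

$$\frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{1}{c^{2}(1-\rho^{2})}\sum_{i=1}^{n}\left[\frac{x_i(x_i-\theta)}{\theta^{3}} - \frac{\rho\,x_i(y_i-\phi)}{\theta^{2}\phi}\right], \tag{1}$$

$$\frac{\partial l}{\partial\phi} = -\frac{n}{\phi} + \frac{1}{c^{2}(1-\rho^{2})}\sum_{i=1}^{n}\left[\frac{y_i(y_i-\phi)}{\phi^{3}} - \frac{\rho\,y_i(x_i-\theta)}{\theta\phi^{2}}\right], \tag{2}$$

and

$$\frac{\partial l}{\partial\rho} = \frac{n\rho}{1-\rho^{2}} - \frac{\rho}{(1-\rho^{2})^{2}}\sum_{i=1}^{n}\left[\frac{(x_i-\theta)^{2}}{c^{2}\theta^{2}} + \frac{(y_i-\phi)^{2}}{c^{2}\phi^{2}}\right] + \frac{1+\rho^{2}}{(1-\rho^{2})^{2}}\sum_{i=1}^{n}\frac{(x_i-\theta)(y_i-\phi)}{c^{2}\theta\phi}. \tag{3}$$

Since these functions are nonlinear in the parameters, it is necessary to use an iterative procedure to obtain the m.l. estimates. Using the scoring procedure [2] for this purpose, we calculate the information matrix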
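$$I = -\frac{n}{1-\rho^{2}}
\begin{bmatrix}
\dfrac{\rho^{2}-2-c^{-2}}{\theta^{2}} & \dfrac{\rho(1+c^{2}\rho)}{c^{2}\theta\phi} & \dfrac{\rho}{\theta} \\[2ex]
\dfrac{\rho(1+c^{2}\rho)}{c^{2}\theta\phi} & \dfrac{\rho^{2}-2-c^{-2}}{\phi^{2}} & \dfrac{\rho}{\phi} \\[2ex]
\dfrac{\rho}{\theta} & \dfrac{\rho}{\phi} & -\dfrac{\rho^{2}+1}{1-\rho^{2}}
\end{bmatrix}. \tag{4}$$

Letting $\hat{\theta}_0$, $\hat{\phi}_0$, and $\hat{\rho}_0$ be initial estimates of the parameters, then for $j = 1, 2, \ldots$, successive estimates $\hat{\theta}_j$, $\hat{\phi}_j$, and $\hat{\rho}_j$ may be obtained from the linear recursive equations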
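$$\hat{\theta}_j = \hat{\theta}_{j-1} + \frac{1}{n}\left[k\left(k_1\frac{\partial l}{\partial\theta} + k_2\frac{\partial l}{\partial\phi} + k_3\frac{\partial l}{\partial\rho}\right)\right], \tag{5}$$

$$\hat{\phi}_j = \hat{\phi}_{j-1} + \frac{1}{n}\left[k\left(k_2\frac{\partial l}{\partial\theta} + k_4\frac{\partial l}{\partial\phi} + k_5\frac{\partial l}{\partial\rho}\right)\right], \tag{6}$$

$$\hat{\rho}_j = \hat{\rho}_{j-1} + \frac{1}{n}\left[k\left(k_3\frac{\partial l}{\partial\theta} + k_5\frac{\partial l}{\partial\phi} + k_6\frac{\partial l}{\partial\rho}\right)\right], \tag{7}$$

where

$$k = -c^{2}(\rho+1)(\rho^{2}+2c^{2}\rho+2c^{2}+1)^{-1},$$
$$k_1 = -\theta^{2}(2c^{2}+\rho^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_2 = -\phi\theta\rho(\rho^{2}+2\rho c^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_3 = -\theta\rho(1-\rho^{2}),$$
$$k_4 = -\phi^{2}(2c^{2}+\rho^{2}+1)(\rho+2c^{2}+1)^{-1},$$
$$k_5 = -\phi\rho(1-\rho^{2}),$$

and

$$k_6 = (2\rho^{2}c^{2}-2c^{2}+\rho-1)(1-\rho^{2})c^{-2}.$$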
In Eqs. 5-7 the terms in each bracket are evaluated at $\theta = \hat{\theta}_{j-1}$, $\phi = \hat{\phi}_{j-1}$, and $\rho = \hat{\rho}_{j-1}$. This includes the coefficients $k, k_1, \ldots, k_6$ of the efficient scores, which are elements of the inverse information matrix. If $\hat{\theta}_0$, $\hat{\phi}_0$, and $\hat{\rho}_0$ are consistent estimators (e.g., $\bar{x}$, $\bar{y}$, and r), then $\hat{\theta}_1$, $\hat{\phi}_1$, and $\hat{\rho}_1$ are best asymptotically normal (BAN) estimators [3]. The estimates obtained by iteratively solving Eqs. 5-7 until convergence are the numerical m.l. estimates $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$.
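The scoring recursion of this section can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' program. The function names (`pearson_r`, `scores`, `scoring_step`, `ml_estimates`), the iteration cap, the clamp that keeps the correlation inside (-1, 1), and the simulated data at the end are all assumptions introduced here. The scores are derived from the model of Section 1 with c known, and the inverse information matrix is taken to have elements $kk_1, \ldots, kk_6$ divided by n.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def scores(xs, ys, theta, phi, rho, c):
    """Efficient scores (dl/dtheta, dl/dphi, dl/drho) for known c."""
    n = len(xs)
    q = c * c * (1.0 - rho * rho)
    s_t = s_p = a = b = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - theta, y - phi
        s_t += x * dx / theta**3 - rho * x * dy / (theta**2 * phi)
        s_p += y * dy / phi**3 - rho * y * dx / (theta * phi**2)
        a += (dx / (c * theta)) ** 2 + (dy / (c * phi)) ** 2
        b += dx * dy / (c * c * theta * phi)
    d_rho = n * rho / (1 - rho**2) + ((1 + rho**2) * b - rho * a) / (1 - rho**2) ** 2
    return -n / theta + s_t / q, -n / phi + s_p / q, d_rho

def scoring_step(xs, ys, theta, phi, rho, c):
    """One scoring update: add (inverse information) x (score vector)."""
    n, c2 = len(xs), c * c
    d = rho + 2 * c2 + 1
    k = -c2 * (rho + 1) / (rho**2 + 2 * c2 * rho + 2 * c2 + 1)
    k1 = -theta**2 * (2 * c2 + rho**2 + 1) / d
    k2 = -phi * theta * rho * (rho**2 + 2 * rho * c2 + 1) / d
    k3 = -theta * rho * (1 - rho**2)
    k4 = -phi**2 * (2 * c2 + rho**2 + 1) / d
    k5 = -phi * rho * (1 - rho**2)
    k6 = (2 * rho**2 * c2 - 2 * c2 + rho - 1) * (1 - rho**2) / c2
    st, sp, sr = scores(xs, ys, theta, phi, rho, c)
    return (theta + k * (k1 * st + k2 * sp + k3 * sr) / n,
            phi + k * (k2 * st + k4 * sp + k5 * sr) / n,
            rho + k * (k3 * st + k5 * sp + k6 * sr) / n)

def ml_estimates(xs, ys, c, tol=1e-3, max_iter=100):
    """Return (BAN estimates after one step, estimates at convergence)."""
    n = len(xs)
    est = (sum(xs) / n, sum(ys) / n, pearson_r(xs, ys))  # consistent starting values
    ban = None
    for _ in range(max_iter):
        t, p, r = scoring_step(xs, ys, *est, c)
        r = max(-0.999, min(0.999, r))  # assumption: guard against leaving (-1, 1)
        new = (t, p, r)
        if ban is None:
            ban = new
        # relative change below tol for every parameter
        converged = all(abs(b - a) <= tol * abs(a) for a, b in zip(est, new))
        est = new
        if converged:
            break
    return ban, est

# Illustrative data only: parameter values are arbitrary, not from the paper.
rng = random.Random(1)
theta0, phi0, rho0, c0 = 2.0, 3.0, 0.6, 0.1
xs, ys = [], []
for _ in range(4000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(theta0 + c0 * theta0 * z1)
    ys.append(phi0 + c0 * phi0 * (rho0 * z1 + math.sqrt(1 - rho0**2) * z2))
ban, mle = ml_estimates(xs, ys, c0)
print("BAN:", ban)
print("m.l.:", mle)
```

The first iterate is returned separately because, started from consistent estimates, it corresponds to the BAN estimator; further iterations only refine it toward the numerical m.l. solution.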
3. ASYMPTOTIC RELATIVE EFFICIENCIES
Since the asymptotic variances of $\sqrt{n}\,\hat{\theta}$, $\sqrt{n}\,\hat{\phi}$, and $\sqrt{n}\,\hat{\rho}$ are $kk_1$, $kk_4$, and $kk_6$, respectively, the asymptotic efficiencies of $\bar{x}$, $\bar{y}$, and r relative to $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$ are given by $kk_1/(c\theta)^{2}$, $kk_4/(c\phi)^{2}$, and $kk_6/(1-\rho^{2})^{2}$, respectively. Simplifying these expressions, we obtain

$$\text{Asym. Eff.}(\bar{x}) = \text{Asym. Eff.}(\bar{y}) = (1+k_7)^{-1} \tag{8}$$

and

$$\text{Asym. Eff.}(r) = (1+k_8)^{-1} \tag{9}$$

where $k_7 = [4c^{2}\rho^{2} + 2c^{2}(\rho+1)(2c^{2}+1)][(2c^{2}+\rho^{2}+1)(\rho+1)]^{-1}$ and $k_8 = \rho^{2}[2c^{2}(\rho+1)+1]^{-1}$. Note that both expressions are functions only of c and $\rho$.
Table 1 presents these efficiencies for representative values of $\rho$ and for c = .01, .1, 1, and 10. Note that r is clearly less efficient (<81%) than $\hat{\rho}$ for $|\rho| \ge .5$ and $c \le .1$. On the other hand, r is almost as efficient (>95%) as $\hat{\rho}$ for c = 10. Finally, note that when r is almost efficient, $\bar{x}$ and $\bar{y}$ are inefficient, and conversely.
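Eqs. 8 and 9 are simple enough to evaluate directly. The following minimal Python sketch (the function names `asym_eff_mean` and `asym_eff_r` are chosen here for illustration) computes both efficiencies from c and $\rho$ and prints a few entries for comparison with Table 1.

```python
def asym_eff_mean(c, rho):
    """Eq. 8: asymptotic efficiency of the sample means, (1 + k7)^-1."""
    k7 = (4 * c**2 * rho**2 + 2 * c**2 * (rho + 1) * (2 * c**2 + 1)) / (
        (2 * c**2 + rho**2 + 1) * (rho + 1))
    return 1.0 / (1.0 + k7)

def asym_eff_r(c, rho):
    """Eq. 9: asymptotic efficiency of Pearson's r, (1 + k8)^-1."""
    k8 = rho**2 / (2 * c**2 * (rho + 1) + 1)
    return 1.0 / (1.0 + k8)

for c in (0.01, 0.1, 1.0, 10.0):
    for rho in (-0.9, -0.5, 0.5, 0.9):
        print(f"c={c:<5} rho={rho:+.1f}  means={asym_eff_mean(c, rho):.3f}  r={asym_eff_r(c, rho):.3f}")
```

Since $k_8$ is close to $\rho^{2}$ when c is small, the efficiency of r is then roughly $1/(1+\rho^{2})$, which explains why the loss grows with $|\rho|$.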
4. SMALL-SAMPLE RELATIVE EFFICIENCIES
A FORTRAN-IV program was written to calculate the small-sample relative efficiencies of the usual estimates ($\bar{x}$, $\bar{y}$, and r) and the BAN estimates ($\hat{\theta}_1$, $\hat{\phi}_1$, and $\hat{\rho}_1$) relative to the m.l. estimates ($\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$). Five hundred samples of size n were generated from the above bivariate distribution. For each sample, the usual estimates, the BAN estimates, and the numerical m.l. estimates were calculated (the convergence criterion was $|(\hat{t}_j - \hat{t}_{j-1})\,\hat{t}_{j-1}^{-1}| < .001$ for each parameter $t = \theta$, $\phi$, or $\rho$). For each sample, parameter and estimate, the mean square error was
TABLE 1
Asymptotic Efficiencies of $\bar{x}$, $\bar{y}$, and r Relative to $\hat{\theta}$, $\hat{\phi}$, and $\hat{\rho}$ for Values of $\rho$ and c

                                                  $\rho$
   c    Estimate                    -.9    -.7    -.5    -.3    -.1     .1     .3     .5     .7     .9

  .01   $\bar{x}$, $\bar{y}$      0.998  0.999  0.999  1.000  1.000  1.000  1.000  1.000  1.000  1.000
        r                         0.552  0.671  0.800  0.917  0.990  0.990  0.917  0.800  0.671  0.552

  .1    $\bar{x}$, $\bar{y}$      0.842  0.946  0.969  0.978  0.980  0.980  0.980  0.979  0.979  0.979
        r                         0.553  0.672  0.802  0.918  0.990  0.990  0.919  0.805  0.678  0.562

 1.0    $\bar{x}$, $\bar{y}$      0.090  0.218  0.289  0.322  0.332  0.333  0.330  0.328  0.328  0.330
        r                         0.597  0.766  0.889  0.964  0.996  0.997  0.976  0.941  0.900  0.856

10.0    $\bar{x}$, $\bar{y}$      0.005  0.005  0.005  0.005  0.005  0.005  0.005  0.005  0.005  0.005
        r                         0.963  0.992  0.998  0.999  1.000  1.000  1.000  1.000  1.000  1.000
calculated. The small-sample efficiencies of the usual estimates and the BAN estimates were defined as the inverse of the ratio of the corresponding mean square error to that of the m.l. estimates.

Table 2 presents the small-sample efficiencies of r and $\hat{\rho}_1$ relative to $\hat{\rho}$ for the case $\theta = .5$, $\phi = .5$, and c = .01. The values of $\theta$ and $\phi$ were arbitrary, but the value of c was chosen because r is asymptotically least efficient for small values of c, and c is invariably less than 1 in the actual laboratory situation. From Table 2, we note that 1) $\hat{\rho}$ is less efficient than r when a) n = 5 and $|\rho| \le .7$, b) n = 10 and $|\rho| \le .5$, c) n = 25 and $|\rho| \le .3$; 2) when r is less efficient than $\hat{\rho}$, then $\hat{\rho}_1$ is more efficient than r; 3) when n = 25, the small-sample efficiency of r approximates the asymptotic efficiency of r (see appropriate column of Table 1).

The small-sample efficiencies of $\bar{x}$, $\bar{y}$, $\hat{\theta}_1$, and $\hat{\phi}_1$ relative to $\hat{\theta}$ and $\hat{\phi}$ are also approximately equal to 1 when c = .01, $\theta = .5$, $\phi = .5$ for all values of n and $\rho$ examined.
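The Monte Carlo definition of small-sample efficiency used in this section (a ratio of mean square errors over repeated samples) can be sketched as follows. This is a hedged illustration in Python, not a reconstruction of the FORTRAN-IV program. To keep the block short and runnable on its own, the reference estimator is a hypothetical stand-in, `moment_rho_known_c`, which simply divides the sample covariance by $c^{2}\bar{x}\bar{y}$. It is not the m.l. estimator and is used only to exercise the harness. All function names and the seed are assumptions.

```python
import math
import random

def draw_sample(n, theta, phi, rho, c, rng):
    """n pairs from a bivariate normal with means theta, phi and SDs c*theta, c*phi."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(theta + c * theta * z1)
        ys.append(phi + c * phi * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return xs, ys

def pearson_r(xs, ys, c=None):
    """Pearson's r; the unused argument c keeps one signature for all estimators."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def moment_rho_known_c(xs, ys, c):
    """Hypothetical stand-in (NOT the m.l. estimator): covariance / (c^2 * xbar * ybar)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((n - 1) * c * c * mx * my)

def mean_square_errors(estimators, n, theta, phi, rho, c, reps=500, seed=0):
    """Mean square error of each estimator of rho over `reps` simulated samples."""
    rng = random.Random(seed)
    totals = [0.0] * len(estimators)
    for _ in range(reps):
        xs, ys = draw_sample(n, theta, phi, rho, c, rng)
        for i, est in enumerate(estimators):
            totals[i] += (est(xs, ys, c) - rho) ** 2
    return [t / reps for t in totals]

mse_r, mse_ref = mean_square_errors(
    [pearson_r, moment_rho_known_c], n=25, theta=0.5, phi=0.5, rho=0.5, c=0.01)
# Efficiency of r relative to the reference: inverse of MSE(r) / MSE(reference).
print("MSE of r:", mse_r)
print("efficiency of r relative to the stand-in:", mse_ref / mse_r)
```

To reproduce entries of the kind shown in Table 2, the stand-in would be replaced by a routine that iterates the scoring equations to convergence, and the ratio would be taken with the m.l. mean square error in the numerator.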
5. AN EXAMPLE
Table 3 shows absorbance values of 3 substances analyzed in 19 runs of a laboratory test for serum concentration level of an enzyme, leucine amino peptidase. The standard, $\beta$-naphthylamine, has a known concentration. When the test is carried out, the absorbance of the enzyme product is related to that of the standard. Controls 1 and 2 are two control samples from the same pool analyzed in the
of variation (the subscripts 1, 2, 3 refer to columns in the table). Note that all three coefficients of variation are approximately equal to c = .036.
The Pearson correlation coefficients and their estimated asymptotic standard deviations are $r_{12} = .314 \pm .207$, $r_{13} = .375 \pm .197$, and $r_{23} = .749 \pm .101$. These estimates and the means in Table 3 were used to obtain the BAN estimates and the m.l. estimates. The BAN estimates and their estimated asymptotic standard deviations are $\hat{\rho}_{(1)12} = .322 \pm .196$, $\hat{\rho}_{(1)13} = .394 \pm .180$, and $\hat{\rho}_{(1)23} = .759 \pm .078$, respectively. The m.l. estimates and estimated asymptotic standard deviations are $\hat{\rho}_{12} = .323 \pm .196$, $\hat{\rho}_{13} = .395 \pm .180$, and $\hat{\rho}_{23} = .759 \pm .078$, respectively. Note that the estimated asymptotic standard deviations are least for the BAN and m.l. estimates when compared to the corresponding Pearson estimates. Further, since m.l. estimates are more efficient for $|\rho| \ge .5$, the change in standard deviation is greatest for $\rho_{23}$ when comparing r with $\hat{\rho}$.
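The summary statistics and standard deviations quoted in this example can be checked from the data of Table 3. The Python sketch below is illustrative: the function names are chosen here, and the standard-deviation formulas are assumptions, namely $(1-r^{2})/\sqrt{n}$ for Pearson's r and $\sqrt{kk_6/n}$ (equivalently $(1-\rho^{2})/\sqrt{n(1+k_8)}$) for the m.l. estimate. These formulas do agree with the values quoted above to the precision shown.

```python
import math

# Absorbance values from Table 3, runs 1-19.
standard = [126, 124, 116, 118, 115, 116, 118, 111, 121, 122,
            122, 124, 124, 124, 118, 115, 121, 127, 122]
control1 = [73, 68, 73, 69, 63, 71, 73, 71, 70, 71,
            72, 70, 71, 71, 73, 66, 70, 73, 70]
control2 = [72, 66, 68, 69, 66, 71, 71, 71, 70, 71,
            72, 72, 71, 69, 71, 63, 71, 72, 69]

def mean_sd_cv(v):
    """Sample mean, standard deviation (n - 1 divisor), and coefficient of variation."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
    return m, s, s / m

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sd_pearson(r, n):
    """Assumed asymptotic SD of r: (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)

def sd_ml(rho, c, n):
    """Assumed asymptotic SD of the m.l. estimate: (1 - rho^2) / sqrt(n * (1 + k8))."""
    k8 = rho * rho / (2 * c * c * (rho + 1) + 1)
    return (1 - rho * rho) / math.sqrt(n * (1 + k8))

n = len(standard)
for name, col in (("standard", standard), ("control 1", control1), ("control 2", control2)):
    m, s, cv = mean_sd_cv(col)
    print(f"{name}: mean={m:.1f} sd={s:.2f} cv={cv:.4f}")

r12 = pearson_r(standard, control1)
r13 = pearson_r(standard, control2)
r23 = pearson_r(control1, control2)
for label, r in (("r12", r12), ("r13", r13), ("r23", r23)):
    print(f"{label} = {r:.3f} +/- {sd_pearson(r, n):.3f}")
print(f"SD of m.l. estimate at rho=.759, c=.036: {sd_ml(0.759, 0.036, n):.3f}")
```

The reduction in standard deviation is largest for the pair with the strongest correlation, in line with the asymptotic efficiencies of Section 3.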
6. CONCLUSIONS AND RECOMMENDATIONS
From these and other tables not presented here, we recommend for samples of size n = 25 or larger that $\hat{\rho}_1$ or $\hat{\rho}$ be used if c < 1 and it is suspected that $|\rho| \ge .5$. This is because, asymptotically, r is at most 80% as efficient as $\hat{\rho}$ for $|\rho| \ge .5$. If n is approximately equal to 10, we recommend using $\hat{\rho}_1$ or $\hat{\rho}$ if $|\rho| \ge .7$. On the
TABLE 2
Small-Sample Efficiencies of r and $\hat{\rho}_1$ Relative to $\hat{\rho}$ for the Case $\theta = .5$, $\phi = .5$, c = .01.

                                          $\rho$
  n    Estimate            -.9    -.7    -.5    -.3    -.1     .1     .3     .5     .7     .9

  5    r                  0.17   1.06   1.32   1.40   1.30   1.35   1.38   1.39   1.00   0.17
       $\hat{\rho}_1$     0.47   1.11   1.23   1.31   1.28   1.28   1.28   1.24   1.06   0.32

 10    r                  0.23   0.64   1.27   1.39   1.42   1.34   1.41   1.17   0.64   0.39
       $\hat{\rho}_1$     0.75   0.84   1.25   1.30   1.31   1.28   1.28   1.14   0.83   0.82

 25    r                  0.46   0.53   0.89   1.07   1.20   1.16   1.09   0.88   0.55   0.46
       $\hat{\rho}_1$     0.93   0.84   0.98   1.11   1.16   1.14   1.08   1.00   0.85   0.98
TABLE 3
Absorbance Values of Three Substances in a Chemical Assay for Leucine Amino Peptidase.

          (1)         (2)          (3)                  (1)         (2)          (3)
 Run   Standard   Control 1    Control 2       Run   Standard   Control 1    Control 2

  1      126         73           72            11     122         72           72
  2      124         68           66            12     124         70           72
  3      116         73           68            13     124         71           71
  4      118         69           69            14     124         71           69
  5      115         63           66            15     118         73           71
  6      116         71           71            16     115         66           63
  7      118         73           71            17     121         70           71
  8      111         71           71            18     127         73           72
  9      121         70           70            19     122         70           69
 10      122         71           71

 $\bar{x}_1 = 120.2$    $\bar{x}_2 = 70.4$    $\bar{x}_3 = 69.7$
 $s_1 = 4.33$           $s_2 = 2.59$          $s_3 = 2.47$
 $c_1 = .0360$          $c_2 = .0368$         $c_3 = .0354$
other hand, for samples as small as n = 5, we recommend using Pearson’s r. Finally, for c > 1 we recommend Pearson’s r.
7. ACKNOWLEDGMENT
The authors wish to thank Gau Tzu Wu of Bio-Science Laboratories whose energy contributed greatly to this effort.
REFERENCES
[1] REED, A. H. and HENRY, R. J. (1973). “Accuracy, precision, quality control and miscellaneous statistics,” Clinical Chemistry: Principles and Technics, 2nd Ed., D. C. Cannon et al., Harper and Row, New York.
[2] RAO, C. R. (1965). Linear Statistical Inference and Its Applications, John Wiley and Sons, Inc., New York.
[3] FERGUSON, T. (1958). “A method for generating best asymptotically normal estimates with application to the estimation of bacterial densities,” Ann. Math. Statist. 29, 1046-62.