Statistical Precision in the Calibration and Use of Sorting Machines and Other Classifiers


TECHNOMETRICS, VOL. 24, NO. 2, MAY 1982


A. Grassia
Division of Mathematics and Statistics, CSIRO, Private Mail Bag, P.O. Wembley, Western Australia 6014

R. Sundberg
Department of Mathematics, The Royal Institute of Technology (KTH), S-100 44 Stockholm 70, Sweden

Mechanical sorters with a nonnegligible probability of misclassification need to be calibrated. This article provides formulas for the statistical precision of class frequency estimates for lots (populations) of items, considering contributions of error from the calibration, from sampling the population, and from random misclassifications in the sorting of the sample.

KEY WORDS: Misclassification; Sampling; Calibration error.

1. INTRODUCTION

If the sorting made by a grader or some other mechanical sorter is compared with a careful manual sorting, deviations will usually appear due to errors in the mechanical sorting. The misclassification probabilities for the sorter can be estimated in a calibration experiment, by sorting a lot consisting of items already correctly classified. When the sorter has been calibrated it can then be used to estimate the class frequency distribution of a lot of unknown composition by letting it sort a random sample of the population. The estimated class frequency distribution might for instance be used in the setting of a price for the lot.

As an illustration we consider sorting of apples according to size using a roll grader. In a calibration experiment performed by B. D. Richardson at the Grove Research Farm in Tasmania, 626 hand-sorted and size-marked apples were passed through a roll grader six times. The roll grader had been set to what was thought to be the optimum adjustment. The outcome of the experiment is shown in Table 1. We see that most apples are correctly classified, but also that the spread in classification for any given size is considerable.

The mechanical sorter is a classifying device, and more generally, the theory to be presented is potentially applicable to all sorts of classifiers. An example pointed out by a referee is when the classifier is a formula that classifies observations from a mixture of distributions into the components of the mixture; see Odell and Basu (1976, Sec. 4.1).

For simplicity we assume that the classification probability for an item depends only on the class to which the item belongs. Usually this assumption is not quite satisfied, because the probability can often be expected to vary between individual items of the same class. For instance, in the apple-sorting experiment just mentioned, the classification probability was expected to depend on form and size within size classes, and this opinion was supported by a comparison of the six repeated sortings of the same lot. In fact the six outcomes were somewhat too similar to represent six independent replicates with unique multinomial probabilities for the different classes. However, provided that this intraclass variation in individual classification probabilities is the same for the lot used in the calibration as for lots to be sorted using this calibration, and that the classification probabilities are interpreted as referring to a randomly selected item of the particular class, the theory we will present remains valid. In particular this implies that when a grader is to be used for sorting different varieties of apples, a separate calibration should be made for each variety. Some further consequences of the apple-sorting example are discussed in Section 7.

In the following sections we give formulas for the precision of a class frequency estimate for a lot of items, considering the contributions of random errors from the calibration, from the sampling, and from the misclassifications in the sorting of the sample from the lot. Corresponding situations in linear regression have been discussed by several authors. In particular we want to mention Williams (1959), who treats the case of simultaneous linear regression.

2. PRELIMINARIES


sorting the items independently, with (conditional) probability Pij for an item from class i to be sorted into class j. The expected behavior of the sorter is then given by

$$E(f_j) = \sum_{i=1}^{k} n_i P_{ij}, \qquad j = 1, \ldots, k, \tag{2.1}$$

where $E(\cdot)$ denotes expectation and $f_j$ is the total number of items sorted into class $j$. In row vector and matrix notations, (2.1) takes the shorter form

$$E(f) = \mathbf{n}P. \tag{2.2}$$

In Markov chain language, $P$ is the transition matrix. In Odell and Basu the transpose of $P$ is called the confusion matrix. Throughout this article we assume that $P$ is a nonsingular matrix. In many practical cases it will only show a moderate deviation from the unit matrix.

In the calibration experiment we know the correct class of each sorted item. Since the items of a particular class $i$ are sorted according to a multinomial distribution with probabilities $(P_{i1}, \ldots, P_{ik})$, the natural, and moreover maximum likelihood, estimate of $P_{ij}$ is

$$\hat{P}_{ij} = \text{observed relative frequency of items from class } i \text{ being sorted into class } j. \tag{2.3}$$

This means that from a table of absolute frequencies, exemplified by Table 1, the $\hat{P}_{ij}$'s are obtained by simply dividing each row by the row total.
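The row-normalization in (2.3) can be sketched in a few lines of Python. The count table below is hypothetical (it is not the paper's Table 1), and the function name `estimate_transition_matrix` is introduced here for illustration only.

```python
# Hypothetical calibration counts (NOT the paper's Table 1):
# rows = true class, columns = class assigned by the sorter.
counts = [
    [90, 10, 0],
    [15, 80, 5],
    [0, 20, 80],
]


def estimate_transition_matrix(counts):
    """Divide each row of the count table by its row total, as in (2.3)."""
    p_hat = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each true class needs at least one calibration item")
        p_hat.append([c / total for c in row])
    return p_hat


p_hat = estimate_transition_matrix(counts)
for row in p_hat:
    print(" ".join(f"{p:.3f}" for p in row))
```

Each row of the result is a probability vector, so every row sums to one.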

Now suppose that the sorter has been calibrated and that we have a lot of items of known size $N$, with unknown proportions $\pi_i = N_i/N$ of items in the classes $i = 1, \ldots, k$. We take a simple random sample of size $n \le N$ from the lot. Let the unknown random number of items of the sample belonging to class $i$ be denoted by $n_i$, $i = 1, \ldots, k$, $\sum n_i = n$. Sorting this sample by the sorter we observe the absolute frequencies $f_j$, $j = 1, \ldots, k$, $\sum f_j = n$.

A simple and natural (except for not being adjusted to integer values) predictor of the vector $\mathbf{n}$ of numbers $n_i$ is obtained from the vector $f$ by applying (2.2) in inverted form, with the estimate $\hat{P}$ of $P$ obtained from the calibration. This gives

$$\hat{\mathbf{n}} = f\hat{P}^{-1}. \tag{2.4}$$

The corresponding natural estimator of the vector $\pi$ of proportions $\pi_i$ of the lot is

$$\hat{\pi} = (1/n) f \hat{P}^{-1}. \tag{2.5}$$

Equivalently the vector $\mathbf{N}$ of class sizes $N_i = N\pi_i$ is estimated by $\hat{\mathbf{N}} = N\hat{\pi}$ (like $\hat{\mathbf{n}}$, not adjusted to integer values). In the limiting case of an infinite population ($N = \infty$), (2.5) is the maximum likelihood estimator of the probability vector $\pi$.
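As a rough illustration of (2.5), the sketch below solves $\hat{\pi}\hat{P} = f/n$ by Gaussian elimination instead of forming the inverse explicitly. The matrix is the estimated transition matrix of Table 2 and the frequencies are the grader column of Table 5, both from Section 7. The class-2 frequency is an assumption, taken as the value that makes the frequencies sum to one. The helper name `solve` is introduced here for illustration.

```python
# Estimated transition matrix (Table 2): rows = true class, columns = grader class.
P_HAT = [
    [0.846, 0.154, 0.000, 0.000, 0.000],
    [0.259, 0.706, 0.035, 0.000, 0.000],
    [0.000, 0.257, 0.542, 0.201, 0.000],
    [0.000, 0.000, 0.223, 0.733, 0.044],
    [0.000, 0.000, 0.000, 0.390, 0.610],
]

# Grader relative frequencies f_j / n (Table 5). The class-2 entry is an
# assumption: it is set to the value that makes the frequencies sum to one.
f_over_n = [0.204, None, 0.216, 0.227, 0.055]
f_over_n[1] = 1.0 - sum(v for v in f_over_n if v is not None)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# pi_hat * P_hat = f/n is the same system as P_hat^T * pi_hat^T = (f/n)^T.
p_hat_transposed = [list(col) for col in zip(*P_HAT)]
pi_hat = solve(p_hat_transposed, f_over_n)
print([round(p, 3) for p in pi_hat])
```

Because every row of the transition matrix sums to one, the estimated proportions sum to one whenever the observed frequencies do. Individual components are not constrained to lie in $[0, 1]$.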

3. PROPERTIES OF THE ESTIMATOR $\hat{\pi}$

The estimator $\hat{\pi}$ is a nonlinear function of its random components $f$ and $\hat{P}$, and we can only linearize and make approximate or asymptotic statements about the distribution of $\hat{\pi}$, or its expectation vector and variance-covariance matrix. First we note that if the calibration is precise, so that $\hat{P} - P$ is small, a good and suitable approximation of $\hat{P}^{-1}$ is provided by the linearization in terms of $\hat{P} - P$,

$$\hat{P}^{-1} - P^{-1} \approx -P^{-1}(\hat{P} - P)P^{-1}. \tag{3.1}$$

Inserting (3.1) for $\hat{P}^{-1}$ in (2.5) and neglecting the product of the relatively small deviations $\hat{P} - P$ and $f - E(f)$, we obtain the linear approximation

$$\hat{\pi} \approx \pi + \frac{1}{n}\bigl(f - E(f)\bigr)P^{-1} - \pi(\hat{P} - P)P^{-1}. \tag{3.2}$$

This approximation of $\hat{\pi}$ is good, in the sense that the approximation error is of a smaller magnitude than the two random terms of (3.2) if these two terms are both small, and this holds with a high probability if the sample size $n$ and the size of the calibration experiment are both large. In practice this is necessary anyway to obtain satisfactory precision in $\hat{\pi}$.
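A small numerical check may help to see how accurate the linearization (3.1) is. The sketch below uses a hypothetical two-class sorter: both matrices are made-up values, not taken from the paper. It compares the exact inverse of the estimate with the first-order approximation.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


# Hypothetical true transition matrix and a nearby calibration estimate.
P = [[0.90, 0.10], [0.20, 0.80]]
P_HAT = [[0.88, 0.12], [0.23, 0.77]]

p_inv = inv2(P)
delta = [[P_HAT[i][j] - P[i][j] for j in range(2)] for i in range(2)]

# First-order term of (3.1): P^-1 (P_hat - P) P^-1
first_order = matmul(matmul(p_inv, delta), p_inv)
linearized = [[p_inv[i][j] - first_order[i][j] for j in range(2)] for i in range(2)]
exact = inv2(P_HAT)

max_error = max(abs(exact[i][j] - linearized[i][j]) for i in range(2) for j in range(2))
max_correction = max(abs(first_order[i][j]) for i in range(2) for j in range(2))
print(f"largest first-order correction: {max_correction:.4f}")
print(f"largest linearization error:    {max_error:.4f}")
```

In this made-up case the linearization error is an order of magnitude smaller than the first-order correction itself, which is the behavior the approximation relies on.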

From the approximation (3.2) it is possible to conclude that $\hat{\pi}$ will be approximately normally distributed if both the sample size $n$ is large enough (but not too close to the population size $N$) to make the first term approximately normally distributed, and the calibration experiment is large enough to make the second term approximately normally distributed. The approximating normal distribution is characterized by the first and second moments of (3.2). From the calculation rules for expectations and variance-covariance matrices, we find that (3.2) has the correct expectation $\pi$ and the variance-covariance matrix

$$(P^{-1})^{T}\bigl[V((1/n)f) + V(\pi\hat{P})\bigr]P^{-1}, \tag{3.3}$$

where $V(\cdot)$ denotes the variance-covariance matrix and $T$ denotes transpose. In (3.3) the total variance is decomposed into two components, the first relating to the use of the calibrated sorter, incorporating sampling and classification randomness but neglecting calibration errors, and the second expressing the contribution from the calibration errors. In Sections 4 and 5 we give explicit formulas for these two variance components.

$\hat{\beta} = 0$ plays a similar role in the common “classical” estimator $\hat{x}$; see for instance Williams (1969).

In practice, however, we desire a $P$ close to diagonal and we are normally convinced, for example from previous experience or the design of the sorter, that the true $P$ is far from singular. So if by an extremely unlucky chance (2.3) defined for us a singular or “almost singular” estimate $\hat{P}$, we would not accept it uncritically but would undertake some special action to clarify the situation and would rather readjust or redesign the sorter than use a singular $\hat{P}$. Hence (2.3) does not define a realistic estimate of $P$ for use in $\hat{\pi}$ for all outcomes.

A simple, formal way to get rid of the statistical nuisance of a possibly singular or “almost singular” $\hat{P}$ is to redefine $\hat{P}$ when (2.3) yields an outcome “too close to singular” to be reasonable, with “too close” defined by, for instance, $|\mathrm{Det}(\hat{P})| \le \varepsilon$ for some given (small) $\varepsilon$. A simple redefinition is to put $\hat{P}$ for such outcomes equal to the unit matrix or some other prechosen nonsingular matrix; the precise definition is not important, since $\varepsilon$ should be such that the event has an extremely small probability. With $\hat{P}$ modified in this way $\hat{\pi}$ will have an expectation and a covariance matrix. Furthermore, if the calibration experiment is large, these characteristics will be practically independent of $\varepsilon$, with the covariance matrix approximated by (3.3), within a wide range of $\varepsilon$ values: just not so big that the precise modification plays a nonnegligible role, and not so small that it allows a nonnegligible contribution from almost singular outcomes.
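The redefinition described above can be sketched as a simple guard. The threshold `eps=1e-3` and the function names are assumptions made for this illustration. The text only requires that the threshold be small and that the fallback be some prechosen nonsingular matrix.

```python
def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def guarded_estimate(p_hat, eps=1e-3):
    """Return p_hat unless |det| <= eps; then fall back to the unit matrix."""
    if abs(determinant(p_hat)) <= eps:
        k = len(p_hat)
        return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    return p_hat


# Estimated transition matrix of Table 2 (far from singular).
P_HAT = [
    [0.846, 0.154, 0.000, 0.000, 0.000],
    [0.259, 0.706, 0.035, 0.000, 0.000],
    [0.000, 0.257, 0.542, 0.201, 0.000],
    [0.000, 0.000, 0.223, 0.733, 0.044],
    [0.000, 0.000, 0.000, 0.390, 0.610],
]
det_table2 = determinant(P_HAT)
print(f"Det(P_hat) = {det_table2:.4f}")

# A made-up singular outcome triggers the fallback.
singular = [[0.5, 0.5], [0.5, 0.5]]
fallback = guarded_estimate(singular)
```

For the matrix of Table 2 the determinant is well away from zero, so the guard leaves the estimate unchanged.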

4. THE SAMPLING AND MISCLASSIFICATION COMPONENT OF VARIANCE

To find the variance-covariance matrix $V(f/n)$ in the sampling and misclassification component of (3.3), we first note that the distribution of $\mathbf{n}$, the vector of unobserved numbers $n_i$ of items of the sample belonging to the different classes, $\sum n_i = n$ given, is multivariate hypergeometric ($n$ out of $N$). For definition and properties of this distribution, see for example Johnson and Kotz (1969). Next, the $\sum n_i$ items are sorted, with unobserved results $(X_{i1}, \ldots, X_{ik})$, say, from sorting the $n_i$ items of class $i$, with independence for different $i$. Conditional on $n_i$ these vectors are multinomially distributed with probability vectors $(P_{i1}, \ldots, P_{ik})$. Finally, the total number of items sorted into the different classes is observed, $f_j = \sum_i X_{ij}$. From the above relations the covariance matrix of the vector $f$ is easily calculated, and we get the elements of $V(f/n)$ as the variances

$$V\Bigl(\frac{1}{n}f_j\Bigr) = \frac{1}{n}\sum_{i=1}^{k}\pi_i P_{ij}(1 - P_{ij}) + \frac{1}{n}(1 - \theta)\Biggl[\sum_{i=1}^{k}\pi_i P_{ij}^{2} - \Bigl(\sum_{i=1}^{k}\pi_i P_{ij}\Bigr)^{2}\Biggr] \tag{4.1}$$

and the covariances ($h \ne j$)

$$\mathrm{Cov}\Bigl(\frac{1}{n}f_h, \frac{1}{n}f_j\Bigr) = -\frac{1}{n}\sum_{i=1}^{k}\pi_i P_{ih}P_{ij} + \frac{1}{n}(1 - \theta)\Biggl[\sum_{i=1}^{k}\pi_i P_{ih}P_{ij} - \Bigl(\sum_{i=1}^{k}\pi_i P_{ih}\Bigr)\Bigl(\sum_{i=1}^{k}\pi_i P_{ij}\Bigr)\Biggr], \tag{4.2}$$

where $1 - \theta$, with $\theta = (n - 1)/(N - 1)$, is the finite population correction factor.

Of particular interest are the special cases $\theta = 1$ and $\theta = 0$. The case $\theta = 1$ means that the whole population is sorted, so we have no sampling error. We conclude that the first terms in (4.1) and (4.2) represent the contribution due to random misclassification of the sample, and the remaining terms, with the factors $1 - \theta$, represent the additional statistical uncertainty when we extend the estimation from the sample to the population. The case $\theta = 0$, at the other extreme, corresponds to an infinite population size and is a suitable approximation when $n/N$ is small, since it simplifies both (4.1) and (4.2).
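The variances and covariances of $f/n$ can be evaluated directly from their definition: a misclassification term plus a sampling term carrying the factor $1 - \theta$. The sketch below does this for the transition matrix of Table 2 and the true proportions of Table 5, with $n = 255$ and $N = 626$ as in Section 7. The function name `sorting_covariance` is introduced here for illustration, and plugging in these particular values is an assumption about how the formulas would be applied.

```python
P_HAT = [
    [0.846, 0.154, 0.000, 0.000, 0.000],
    [0.259, 0.706, 0.035, 0.000, 0.000],
    [0.000, 0.257, 0.542, 0.201, 0.000],
    [0.000, 0.000, 0.223, 0.733, 0.044],
    [0.000, 0.000, 0.000, 0.390, 0.610],
]
PI = [0.162, 0.262, 0.303, 0.197, 0.076]  # true class proportions (Table 5)
N_SAMPLE = 255
THETA = (N_SAMPLE - 1) / (626 - 1)  # theta = (n - 1)/(N - 1)


def sorting_covariance(pi, p, n, theta):
    """Covariance matrix of f/n: misclassification term plus sampling term."""
    k = len(pi)
    # q[j] = expected share of items sorted into class j
    q = [sum(pi[i] * p[i][j] for i in range(k)) for j in range(k)]
    v = [[0.0] * k for _ in range(k)]
    for h in range(k):
        for j in range(k):
            cross = sum(pi[i] * p[i][h] * p[i][j] for i in range(k))
            if h == j:
                misclass = sum(pi[i] * p[i][j] * (1.0 - p[i][j]) for i in range(k))
            else:
                misclass = -cross
            sampling = (1.0 - theta) * (cross - q[h] * q[j])
            v[h][j] = (misclass + sampling) / n
    return v


v_no_sampling = sorting_covariance(PI, P_HAT, N_SAMPLE, 1.0)   # whole lot sorted
v_infinite = sorting_covariance(PI, P_HAT, N_SAMPLE, 0.0)      # infinite population
v_example = sorting_covariance(PI, P_HAT, N_SAMPLE, THETA)
print([round(v_example[j][j] * 1e3, 4) for j in range(5)])
```

Two checks follow from the formulas. Each row of the matrix sums to zero because the frequencies $f_j/n$ always sum to one. For $\theta = 0$ the diagonal reduces to the multinomial variance $q_j(1 - q_j)/n$, where $q_j = \sum_i \pi_i P_{ij}$.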

As an example, in the apple-sorting at Grove Research Farm (see Sec. 1) two different sampling principles have been applied, corresponding to $\theta = 0$ and $\theta$ about 1/3.

5. THE CALIBRATION COMPONENT OF VARIANCE

We now give an explicit expression for $V(\pi\hat{P})$, characterizing the calibration component of (3.3). Let the rows of $\hat{P}$ be denoted by $\hat{P}_i$, $i = 1, \ldots, k$. These rows are statistically independent and

$$\pi\hat{P} = \sum_{i=1}^{k}\pi_i\hat{P}_i, \tag{5.1}$$

whence

$$V(\pi\hat{P}) = \sum_{i=1}^{k}\pi_i^{2}\,V(\hat{P}_i). \tag{5.2}$$

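Let $n_{0i}$ denote the known number of items belonging to class $i$ in the calibration experiment. Since these items are sorted according to a multinomial distribution with probability vector $P_i$, the covariance matrices $V(\hat{P}_i)$ in (5.2) have the elements

$$V(\hat{P}_{ij}) = \frac{1}{n_{0i}}P_{ij}(1 - P_{ij}) \tag{5.3}$$

and

$$\mathrm{Cov}(\hat{P}_{ih}, \hat{P}_{ij}) = -\frac{1}{n_{0i}}P_{ih}P_{ij}, \qquad h \ne j. \tag{5.4}$$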
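The calibration component $V(\pi\hat{P})$ of (5.2) is a weighted sum of multinomial covariance matrices, one per true class. The sketch below evaluates it for the transition matrix of Table 2 and the true proportions of Table 5. The calibration class sizes `N0` are hypothetical round numbers chosen for illustration, not the counts of Table 1, and the function name is likewise an assumption.

```python
P_HAT = [
    [0.846, 0.154, 0.000, 0.000, 0.000],
    [0.259, 0.706, 0.035, 0.000, 0.000],
    [0.000, 0.257, 0.542, 0.201, 0.000],
    [0.000, 0.000, 0.223, 0.733, 0.044],
    [0.000, 0.000, 0.000, 0.390, 0.610],
]
PI = [0.162, 0.262, 0.303, 0.197, 0.076]  # true class proportions (Table 5)
N0 = [300, 500, 570, 370, 140]            # hypothetical calibration class sizes


def calibration_covariance(pi, p, n0):
    """V(pi P_hat) = sum_i pi_i^2 V(P_hat_i), with multinomial V(P_hat_i)."""
    k = len(pi)
    v = [[0.0] * k for _ in range(k)]
    for i in range(k):
        weight = pi[i] ** 2 / n0[i]
        for h in range(k):
            for j in range(k):
                if h == j:
                    v[h][j] += weight * p[i][j] * (1.0 - p[i][j])
                else:
                    v[h][j] -= weight * p[i][h] * p[i][j]
    return v


v_cal = calibration_covariance(PI, P_HAT, N0)
print([round(v_cal[j][j] * 1e3, 4) for j in range(5)])
```

Doubling every calibration class size halves the whole matrix, which is what makes this component controllable through the design of the calibration experiment.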

6. USE OF THE VARIANCE FORMULAS

The variance formula (3.3), combined with (4.1)-(4.2) and (5.2)-(5.4), represents the theoretical variance and contains the true parameter values. In practical use with a calibrated sorter we of course insert the current estimates of the unknown parameters $\pi_i$ and $P_{ij}$ to get an estimated variance. In planning the size of the calibration experiment or the sample size $n$, we may use plausible or common values of the parameters.

Normally the calibration of a mechanical sorter will be intended for long use of the sorter, so it will be reasonable to require a negligible calibration error. A statistical criterion for this is obtained by specifying a maximum value of the calibration component relative to the sampling and misclassification component inside the brackets of (3.3). More precisely, we might calculate the variances $V(f_j/n)$ and the variances of the components of $\pi\hat{P}$, and choose $n_{01}, \ldots, n_{0k}$ in the calibration large enough to make the latter variances at most equal to a specified fraction of the corresponding former ones.

In the use of the calibrated sorter we might be primarily interested in some linear form $\pi c^{T} = \sum c_i\pi_i$, where $c_1, \ldots, c_k$ are known numbers representing, for instance, price settings or quality scores. As a measure of the precision of the estimator $\hat{\pi}c^{T}$ we have the variance

$$V(\hat{\pi}c^{T}) = c\,V(\hat{\pi})\,c^{T}, \tag{6.1}$$

where $V(\hat{\pi})$ is given by (3.3), with the results of Sections 4 and 5 inserted.
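The quadratic form in (6.1) can be illustrated with the variance components reported in Section 7. The sketch below adds the misclassification and sampling matrices of Tables 6 and 7 (the calibration component of Table 4 is left out, since its entries are not reproduced here) and evaluates the variance of a linear form. The score vector `c` is hypothetical, and reading the (5, 4) entry of Table 7 as -0.0352 is an assumption about a garbled printed value.

```python
import math

# Lower triangles, in units of 10^-3, of the symmetric matrices in Tables 6 and 7.
TABLE_6 = [
    [0.9797],
    [-1.3386, 2.4633],
    [0.6098, -1.9271, 3.2384],
    [-0.2903, 0.9297, -2.2371, 2.1767],
    [0.0394, -0.1274, 0.3160, -0.5790, 0.3510],
]
TABLE_7 = [
    [0.3157],
    [-0.0988, 0.4491],
    [-0.1141, -0.1839, 0.4908],
    [-0.0739, -0.1207, -0.1384, 0.3676],
    [-0.0288, -0.0461, -0.0533, -0.0352, 0.1638],  # (5, 4) entry is an assumed reading
]


def symmetric_sum(lower_a, lower_b, scale):
    """Add two lower triangles and expand the result to a full symmetric matrix."""
    k = len(lower_a)
    v = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            value = (lower_a[i][j] + lower_b[i][j]) * scale
            v[i][j] = value
            v[j][i] = value
    return v


def linear_form_variance(c, v):
    """Variance of pi_hat c^T, computed as c V c^T."""
    k = len(c)
    return sum(c[i] * v[i][j] * c[j] for i in range(k) for j in range(k))


v_pi = symmetric_sum(TABLE_6, TABLE_7, 1e-3)
c = [1.0, 2.0, 3.0, 2.0, 1.0]  # hypothetical quality scores per class
variance = linear_form_variance(c, v_pi)
print(f"variance = {variance:.5f}, standard error = {math.sqrt(variance):.4f}")
```

A constant score vector gives a variance of essentially zero, because the estimated proportions always sum to one; only differences between the scores contribute.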

7. EXAMPLE

We use data from experiments by Richardson on apple sorting to illustrate the results of the preceding sections. As already mentioned in the Introduction, Table 1 shows the result of a calibration of a grader to be used for sorting. The 3,755 data come from six repeated sortings of a lot of 626 size-marked apples. The resulting estimated transition matrix $\hat{P}$ and its inverse are shown in Tables 2 and 3. To determine the statistical precision of the estimate $\hat{P}$ is not quite straightforward in this case, however. The classification probabilities are individually different rather than exactly equal for all items in a class and thus six repeated sortings of 626 are not statistically equivalent to one sorting of 6 × 626 different items. Furthermore we should remember that the classification probabilities now must refer to a randomly chosen individual of a particular class.

Table 1. Calibration Experiment: Number of Apples in a True Size × Grader Size Cross-Classification and True Size Proportions

The results of the repeated sortings were separately available and a comparison (using a $\chi^2$ test statistic) not only revealed a tendency of individuality, but also indicated that the statistical information about $P$ was the same as in one sorting of a lot of apples of lot size about three times the actual lot size 626. So when calculating the values of the calibration component of variance we have used the lot size 3 × 626.

By using the formulas in Section 5 the calibration component of variance was estimated, and the result is given in Table 4.

Comparing Table 4 with the misclassification and sampling components (see below), we conclude that the calibration component of variance is of little importance, and hence a precise value of it is not really needed.

Table 2. Transition Matrix $\hat{P}$

    0.846   0.154   0.000   0.000   0.000
    0.259   0.706   0.035   0.000   0.000
    0.000   0.257   0.542   0.201   0.000
    0.000   0.000   0.223   0.733   0.044
    0.000   0.000   0.000   0.390   0.610

Table 3. Transition Matrix Inverse $\hat{P}^{-1}$

     1.2691  -0.2844   0.0208  -0.0059   0.0004
    -0.4783   1.5624  -0.1143   0.0326  -0.0024
     0.2569  -0.8393   2.1517  -0.6136   0.0443
    -0.0813   0.2655  -0.6807   1.6128  -0.1163
     0.0520  -0.1698   0.4352  -1.0311   1.7137

Table 4. Calibration Component-of-Variance

In one experiment the calibrated grader was used to sort a random sample of size $n = 255$ from the lot used for the calibration. The relative frequencies $f_j/n$ obtained are shown in Table 5, together with the estimated class proportions $\hat{\pi}_i$, their individual standard errors, and the true values $\pi_i$. In fact the same items had been used in the calibration, but we will neglect that fact and instead assume that we want to make inference about the population being the lot of 626 items. Then we can check with the true values, since these were known. In the application of formulas (4.1) and (4.2) we now have $\theta = 254/625 = .41$. Tables 6 and 7 give the misclassification and sampling components of variance, respectively. The standard errors were computed by adding the three matrices of Tables 4, 6, and 7, and taking the square roots of the variances, neglecting the correlations. From Table 5 we can see that the estimated class proportions ($\hat{\pi}_i$) based on the sample of 255 apples are close to the true proportions. The standard errors appear to be of the right size when compared with those arising from straight binomial sampling, that is, $\sqrt{\pi_i(1 - \pi_i)/255}$.

Table 5. Relative Class Frequencies and Standard Errors

    Class   Grader    Hand(*)   True      Estimated        S.E.
            sorting   sorting   $\pi_i$   $\hat{\pi}_i$    ($n = 255$)
    1       0.204     0.165     0.162     0.156            0.038
    2       0.298     0.282     0.262     0.262            0.057
    3       0.216     0.282     0.303     0.304            0.065
    4       0.227     0.200     0.197     0.186            0.053
    5       0.055     0.071     0.076     0.076            0.024

Table 6. Misclassification Component-of-Variance Matrix (no sampling, $\theta = 1$)

    $10^{-3}$ × (lower triangle of the symmetric matrix)
     0.9797
    -1.3386   2.4633
     0.6098  -1.9271   3.2384
    -0.2903   0.9297  -2.2371   2.1767
     0.0394  -0.1274   0.3160  -0.5790   0.3510

Table 7. Sampling Component-of-Variance Matrix (sampling fraction $\theta = .41$)

    $10^{-3}$ × (lower triangle of the symmetric matrix)
     0.3157
    -0.0988   0.4491
    -0.1141  -0.1839   0.4908
    -0.0739  -0.1207  -0.1384   0.3676
    -0.0288  -0.0461  -0.0533  -0.0352   0.1638
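The standard errors of Table 5 can be partly retraced from the diagonals of Tables 6 and 7. The sketch below adds the two diagonals and takes square roots. Because the calibration component of Table 4 is not included (its entries are not reproduced here), the results are only approximate lower values for the published standard errors. The variable names are introduced for illustration.

```python
import math

# Diagonal entries (variances), in units of 10^-3.
DIAG_TABLE_6 = [0.9797, 2.4633, 3.2384, 2.1767, 0.3510]  # misclassification
DIAG_TABLE_7 = [0.3157, 0.4491, 0.4908, 0.3676, 0.1638]  # sampling
PUBLISHED_SE = [0.038, 0.057, 0.065, 0.053, 0.024]       # S.E. column of Table 5

# Standard errors without the calibration component of Table 4.
partial_se = [
    math.sqrt((m + s) * 1e-3) for m, s in zip(DIAG_TABLE_6, DIAG_TABLE_7)
]

for cls, (se, published) in enumerate(zip(partial_se, PUBLISHED_SE), start=1):
    print(f"class {cls}: without calibration {se:.4f}, Table 5 {published:.3f}")
```

The gap between the two columns is small for every class, in line with the remark that the calibration component is of little importance in this example.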

The computations to obtain the covariance matrix (3.3) and its components were easily done using the facilities of the SAS computer package program MATRIX.

8. ACKNOWLEDGMENTS

We wish to thank Mr. B. D. Richardson, Horticulturist, Department of Agriculture, Hobart, Tasmania, for having pointed out the problem and providing the data, Professor T. P. Speed for his comments on previous versions, and also Miss C. Daniel, CSIRO, for typing the manuscript.

[Received March 1980. Revised September 1981.]

REFERENCES

JOHNSON, N. L., and KOTZ, S. (1969), Discrete Distributions, Boston: Houghton Mifflin Company.

ODELL, P. L., and BASU, J. P. (1976), “Concerning Several Methods for Estimating Crop Averages Using Remote Sensing Data,” Communications in Statistics A, 5, 1091-1114.

WILLIAMS, E. J. (1959), Regression Analysis, New York: John Wiley.