EVALUATION OF THE SKELETAL AGE METHOD OF ESTIMATING CHILDREN'S DEVELOPMENT


II. Variable Errors in the Assessment of Roentgenograms

By DONALD MAINLAND, D.SC.

(With the technical assistance of Ruth Bowering Mainland)

New York City


A previous report¹ questioned the reliability of single estimates of skeletal age made by the Todd-Greulich-Pyle method of assessing hand roentgenograms (RGs), largely because of ignorance of the systematic error, i.e., the differences between experts’ assessments and those of other observers. To remove this ignorance a set of RGs, assessed by experts and distributed to other workers, would be necessary. If this is not feasible, assessments may nevertheless be used by any observer to estimate the progress in skeletal age of an individual child. For this purpose each observer must estimate his variable error, i.e., the variation among his independent readings of the same RG with the same atlas. Then, having estimated the change in skeletal age between two RGs taken at different times from the same child, he can affix to his estimate an error of ± so many months.

From the Department of Anatomy, Dalhousie University, Halifax, N.S., Canada, and the Department of Medical Statistics, New York University College of Medicine, New York City.

This investigation is part of a study of the age changes in the bones and joints of children and adults, initiated by a grant to Dalhousie University from the John and Mary R. Markle Foundation, and further supported by grants from the Division of Medical Research of the National Research Council of Canada. The project is now being conducted in New York, supported (in part) by a research grant (PHS Grant No. A-104) from the National Institute of Arthritis and Metabolic Diseases, Public Health Service.

(Received for publication July 13, 1953.)

If his systematic error does not vary from film to film, it will cancel out when one assessment is subtracted from the other, and the allowance based on his variable error will be unaffected; but if, unknown to him, his systematic error differs between films, the estimate derived from his variable error will vary in reliability. Such interfilm variation in systematic error was recorded in the authors’ previous report, but for reasons given there it is doubtful whether it would be great if the expert assessed the test films with proper precautions. Moreover, it is unreasonable to expect observers to suspend the making of assessments until this question has been settled; therefore it is important to see whether the variable error itself is large enough to throw doubt on the usefulness of the assessment method.
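A small simulation can illustrate the cancellation argument above. It is only a sketch under assumptions that are not in the article: each reading is modeled as the true skeletal age plus a per-film systematic error plus normally distributed variable error with a hypothetical SD of 3 months, and `simulated_progress_errors` is an illustrative name. Because only the difference between the two readings matters, the true ages drop out and only the errors are simulated.

```python
import math
import random

def simulated_progress_errors(bias_film1, bias_film2, sd=3.0, n=100_000, seed=0):
    """Mean and SD of (estimated progress - true progress) between two films.

    Hypothetical model: reading = true age + film's systematic error + N(0, sd).
    A bias shared by both films cancels in the difference; a film-specific
    bias does not.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        e1 = bias_film1 + rng.gauss(0.0, sd)  # error of the reading on film 1
        e2 = bias_film2 + rng.gauss(0.0, sd)  # error of the reading on film 2
        errors.append(e2 - e1)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return mean, math.sqrt(var)

print(simulated_progress_errors(4.0, 4.0))  # same bias on both films
print(simulated_progress_errors(0.0, 4.0))  # bias differs between films
```

In the first case the mean error should be near zero and the SD near 3√2 ≈ 4.24 months (variable error only); in the second the mean error stays near 4 months however many films are simulated.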

The purpose of this investigation was to explore thoroughly the variable error of one observer, RBM, especially in order to discover whether, in estimating the error, allowance should be made for the following eight factors: (1) the atlas (Todd² or its successor, the Greulich-Pyle atlas), (2) age of child, (3) sex, (4) differences between skeletal and chronologic age, (5) differences between RGs of different children, (6) differences between RGs from the same child, (7) quality of RGs, and (8) speed of assessment.

MATERIAL AND METHODS

Three series of postero-anterior RGs were used:

1. The Macy Series: actual-size reproductions of left hands in Macy’s⁴ Nutrition and Chemical Growth in Childhood; 79 films from 11 boys and 9 girls (1 to 10 films per child); ages 3 to 16 yr.

2. The Orphanage Series: 157 actual RGs, prepared by RBM in the Dalhousie University Anatomy Department, of the right hands of 79 boys and 78 girls (1 film per child); ages 5 to 15 yr., with a few up to 17 yr. These children comprised the total population (except for a few transient residents) over the age of 5 yr.

[…] hands of 29 boys and 27 girls (24 children: 1 film each; 30 children: 2 films each; 2 children: 3 films each; intervals between films generally about 6 mo.); ages 16 mo. to 73 yr. These children were subjects of a nutrition survey conducted in Halifax by Dr. E. Gordon Young. They were mostly from the less favorable socio-economic groups.

No attempt was made in either Series 2 or Series 3 to secure children in optimum health. The Orphanage children were well cared for and were mostly in good general health. The Nutrition Series contained some cases of recent, rather serious illness. In none of the series were grossly pathologic hands encountered.

Except for the Nutrition Series, which was assessed by the Greulich-Pyle atlas only, all RGs were assessed first by the Todd atlas and about four years later by the Greulich-Pyle atlas. Two independent readings by the same atlas were made on all RGs, the interval between the readings varying from about 1 mo. to about 3 mo., except that the interval between the Todd readings of many of the Orphanage films was greater, even up to 1 yr.

The total number of readings on 326 RGs was 1,124.

Investigation of the Macy Series was planned as a study of observational error and therefore, as described previously,¹ the RGs were assessed in random order and RBM knew only the sex of the child. The Greulich-Pyle assessments of the Orphanage Series were made in the same way, but the Todd assessments of the Orphanage films and the assessments of the Nutrition films were made more or less in the order in which the films were collected, as is customary in routine assessments for diagnostic purposes. In none of the assessments, however, did RBM recall a previous reading when making a later one; nor was she aware of exact chronologic ages.

In each reading the skeletal age represented the arithmetic mean of the stages reached by the various indicators (carpals, epiphyses, etc.). Each indicator was assessed individually except in certain series of readings discussed later under the heading “Speed of Assessment.”

STATISTICAL TECHNICS

When independent readings, 2 from each RG, show no persistent tendency for the first to be greater or less than the second, the absolute difference (i.e., without plus or minus sign) represents the variable error; but the difference itself is not a convenient form for many of the necessary analyses, because such absolute differences do not have a frequency distribution of normal (Gaussian) shape. Even though there are only 2 readings, the variation is most satisfactorily expressed as a standard deviation, as would be done if there were more than 2 readings.

Since every assessor should estimate his variable error, it is desirable to recall the simplified arithmetic suitable for a series of, say, 50 RGs, each providing 2 readings. For any one film, square the difference between the readings and divide by 2. The result is the same as the mean square, or variance, for 1 degree of freedom (one less than the number of readings). The square root of this variance would be an estimate of the standard deviation from the film concerned. To obtain an estimate from all films together, add the 50 variances, divide by 50 and find the square root. This is the standard deviation representing the variable, or random, error of observation. Its use will be illustrated later.
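The simplified arithmetic above translates directly into code. This is a minimal sketch; the function name and the four pairs of readings are hypothetical. Averaging the per-film variances d²/2 and taking the square root is the same as computing √(Σd²/(2n)) for n films.

```python
import math

def pooled_sd_from_duplicates(pairs):
    """Pooled SD of single readings from (first, second) reading pairs.

    Each film contributes one degree of freedom: its variance is d**2 / 2,
    where d is the difference between its two readings.
    """
    if not pairs:
        raise ValueError("need at least one pair of readings")
    variances = [(a - b) ** 2 / 2 for a, b in pairs]
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical duplicate readings (months of skeletal age) for four films.
readings = [(120, 123), (96, 92), (150, 150), (84, 86)]
print(round(pooled_sd_from_duplicates(readings), 2))  # prints 1.9
```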

Before such final estimates of standard deviations were reached in this study, much analysis was done which, because of the results obtained here, probably few assessors will feel the need to perform. Therefore little description of methods is required. For statistical readers, however, it should be mentioned that when the variable error was subjected to analysis of variance (e.g., between atlases, between children and within children) and when it was tested by regression methods (e.g., for its relationship to age of child), each inter-reading variance (with one degree of freedom) was transformed into its logarithm in order to achieve an approximation to normality of the frequency distribution.

Homogeneity of variance was tested by Bartlett’s method, Thompson and Merrington’s⁶ tables being used when the samples were few; and where there was only one degree of freedom in each sample, Bishop and Nair’s⁷ critical values of −2 log λ were employed.
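Bartlett’s homogeneity test named above can be sketched from its textbook formula. This is a minimal version computed from a list of variances and their degrees of freedom, not the authors’ own computation, and the function name is hypothetical. Two caveats apply. With only one degree of freedom per sample the chi-square approximation is poor, which is why, for that case, the authors used Bishop and Nair’s critical values instead (not reproduced here). And a film whose duplicate readings agree exactly has zero variance, whose logarithm is undefined, so the sketch rejects it; the article does not say how such films were handled.

```python
import math

def bartlett_statistic(variances, dfs):
    """Bartlett's corrected statistic for homogeneity of k variances.

    variances: estimated variances s_i**2 (must be > 0, since logs are taken)
    dfs: their degrees of freedom nu_i
    Returns (T, k - 1); T is approximately chi-square with k - 1 df
    when the nu_i are not too small.
    """
    if len(variances) != len(dfs) or len(variances) < 2:
        raise ValueError("need at least two variances with matching dfs")
    if any(v <= 0 for v in variances):
        raise ValueError("log of a zero variance is undefined")
    k = len(variances)
    total_df = sum(dfs)
    # Pooled variance, weighted by degrees of freedom.
    pooled = sum(nu * v for nu, v in zip(dfs, variances)) / total_df
    # Difference between the log of the pooled variance and the mean log variance.
    m = total_df * math.log(pooled) - sum(
        nu * math.log(v) for nu, v in zip(dfs, variances))
    # Bartlett's correction factor.
    c = 1 + (sum(1 / nu for nu in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / c, k - 1

print(bartlett_statistic([1.0, 4.0], [1, 1]))
```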

In all tests of statistical significance the 5% level was adopted as the minimal standard, which implies that, in order to be pronounced significant, a difference must have a value of P not greater than 0.05.

DIFFERENCE BETWEEN DUPLICATE READINGS AS A MEASURE OF VARIABLE ERROR

When the second reading of an RG is subtracted from the first reading and the sign is retained, a significant majority of plus or minus signs in a series (i.e., a percentage significantly greater than 50) indicates a systematic difference in reading, in addition to the variable error; and the same feature can be tested by finding whether the mean of the series of differences, with signs retained, is significantly different from zero. In none of the three series of RGs was there a significant majority of plus or minus signs. Most of the mean differences, also, were not significant; for example: −0.42 month for the Macy Series (Todd’s atlas); −0.835 month for the Macy Series (Greulich-Pyle atlas); −0.52 month for the Orphanage Series (Todd’s atlas). In the Nutrition Series the mean difference, +1.13 months, was significant (P < 0.01), but, as will be seen later, this systematic difference was so small compared with the variable error that in most of the analyses, and in the practical application of the result, it could be disregarded.
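The two checks for a systematic difference described above can be sketched as follows. The article does not say exactly which procedures were used; this assumes an exact two-sided binomial sign test (zero differences carry no sign and are dropped) and a one-sample t statistic for the mean signed difference. Function names and the example differences are hypothetical.

```python
import math
import statistics

def sign_test_p(diffs):
    """Two-sided exact sign test on signed reading differences."""
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    n = plus + minus  # zero differences are ignored
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mean_difference_t(diffs):
    """t statistic (n - 1 df) for the mean signed difference against zero."""
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se

# Hypothetical signed differences (first minus second reading, months).
diffs = [2, -1, 0, 3, -2, 1, 4, -3, 2, 1]
print(sign_test_p(diffs), round(mean_difference_t(diffs), 2))
```

The t statistic would be referred to a t table with n − 1 degrees of freedom; the sign test needs no distributional assumption beyond independence of the films.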

FACTORS THAT MIGHT AFFECT THE VARIABLE ERROR

The eight factors, already enumerated, were explored where possible in all three series of RGs, and the results were essentially the same in all. As most of the values obtained from the tests were far from being significant at the 5% level, few numerical details need be given.

1. Differences between Atlases: It has been shown¹ that RBM’s systematic error differed when she assessed the same films by the two atlases; and the difference in her absolute skeletal age estimates from the two atlases will be discussed in a later report. By contrast, her variable error showed no indication of differing according to the atlas that she used. Whatever uncertainties and fluctuations may have been responsible for the error, they apparently affected equally the assessments of the same series of films by the two atlases. No correlation was found, however, between the magnitudes of her Todd and Greulich-Pyle errors on the same film.

2. Age Differences: Expert assessors state that they find greater difficulty in assessing hands at certain ages than at other ages, and this difficulty might be expected to manifest itself as an association between chronologic age and magnitude of variable error. No such association was found. If it existed, it was obscured by other factors and did not account for the size of the error. Since there were films from only eight children below the age of two years, the evidence regarding the youngest children is inadequate; but from 5 to 15 years an estimate of variable error could, according to all three series, be applied without regard to age.

3. Sex Differences: None of the series suggested that boys’ and girls’ hands differed in variable error of assessment.

4. Differences between Skeletal and Chronologic Age: When something disturbs the sequence of skeletal development, causing either retardation or acceleration, it might be expected to produce disharmony between indicators or between parts of the same indicator, and thus perhaps lead to difficulty or uncertainty in assessment and an increase in the variable error. In the films surveyed the differences between chronologic and skeletal age varied greatly, ranging from less than one month to more than two years; but no suggestion of a relationship was found between this variation and differences in the variable error. Therefore it appears safe to obtain and apply estimates of variable error without regard to the age discrepancy, at least within the range observed in these films.

5. Differences between Children: It might be thought that, if the hands of certain children were consistently more difficult to assess than the hands of other children, the […] expected if a series contained some grossly pathologic hands, but there was no indication of it in any of the three series assessed by RBM.

6. Differences between Roentgenograms from the Same Child: Since some films are more difficult to assess than other films from the same child, it is conceivable that there might be interfilm heterogeneity of variable error within the same child. Forty-seven children (15 in the Macy Series and 32 in the Nutrition Series) provided more than one RG per child. When all the films of each child were compared with each other, no significant heterogeneity of variable error was found; but some further information on this point was obtained by studying the quality of RGs.

7. Quality of Roentgenograms: Without recollecting her assessments of the Macy RGs, RBM graded the quality of each picture as “good,” “fair,” or “poor.” In each of 13 children it was possible to compare the variable error in two films of extreme quality (good versus poor), and RBM’s variable error was found to be significantly greater in the poor films than in the good ones. This factor cannot, however, account for more than a small part of the variable error; for in the actual RGs (Orphanage Series) the technic had produced such a uniform clarity of image that it was impossible to grade the films by quality or to detect during the reading a difficulty attributable to this factor; and yet, as will be seen, the variable error was even greater than in the Macy Series.

8. Speed of Assessment: Although most of the skeletal age estimates were made by assessing each indicator separately and finding the arithmetic mean of these assessments, after RBM had gained experience with the Greulich-Pyle atlas she tried a quicker method: assessing by over-all inspection without itemizing. This method was used in the second readings of the Nutrition Series, and may have accounted for the small but real systematic error in that series, already mentioned. The “quick” method was used in both first and second Greulich-Pyle readings of the Orphanage films, except for a random sample of 36 films, in which one reading was “quick” and the other “slow” (the regular itemizing method). When this sample was compared with the other Orphanage films, all assessed by the “quick” method, no appreciable difference in the variable error was found, and no significant systematic error. For group surveys, involving hundreds of films, the quicker method has the obvious advantage of saving time, but it appears to be too coarse for the assessment of the progress of an individual child. As applied by RBM it produced an excessive number of zero differences between first and second readings. These were compensated for, in the long run, by the occurrence of a few large differences; but this erratic behavior is undesirable in the study of individuals.

ESTIMATES OF VARIABLE ERROR

Although there is always a risk in accepting a nonsignificant difference as if it meant “no real difference,” there was so little evidence of a difference in variable error between children, or between RGs of the same child, that a pooling of variances was permissible in estimating the following standard deviations to express RBM’s variable error:

Macy Series (79 RGs): Todd’s atlas, 3.04 months; Greulich-Pyle atlas, 2.95 months.

Orphanage Series (157 RGs): Todd’s atlas, 3.82 months; Greulich-Pyle atlas, 4.20 months.

Nutrition Series (90 RGs): Greulich-Pyle atlas, 2.90 months.

Differences between the Three Series: As already stated, the variable error did not differ significantly between the two atlases when both were used on the same films; but the error was not the same in the three series. The Macy and Nutrition Series have essentially identical values, a standard […] explanation of this difference has been found.

Method of Using the Estimates: The well-known method of applying such estimates can be illustrated by assuming that an assessor’s variable error is represented by a standard deviation of three months, that he has two RGs of the same child’s hand taken 12 months apart, and that by independent assessment of the two films he has estimated the progress in skeletal age as 10 months.

To allow for his variable error he finds the standard deviation of the difference between two independent readings of the same film by multiplying the standard deviation, three months, by √2 to obtain 4.24 months. To achieve, as is customary, 95% probability for his estimate of skeletal age progress, he will take twice this standard deviation, or more accurately 1.96 S.D., i.e., 8.3 months. Therefore he cannot, with the required probability, give a more precise estimate of the true change in skeletal age than the range 10 ± 8.3 months, i.e., between 1.7 and 18.3 months. These are the 95% confidence limits.

If the original standard deviation were four months instead of three, and if the estimated change were again 10 months, the same method of calculation would give a range of ± 11.1 months. Ruling out the negative lower limit, the assessor could not, with the minimal degree of confidence usually required, estimate the true change in skeletal age more precisely than zero to 21 months.
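The arithmetic in the two worked examples above can be reproduced in a few lines. This is a sketch assuming, as the text does, normally distributed variable error and one independent reading per film; `progress_limits` is a hypothetical helper, and the zero floor mirrors the text’s ruling out of a negative lower limit.

```python
import math

def progress_limits(estimate, sd_single, z=1.96, floor_at_zero=True):
    """95% limits for skeletal-age progress between two films.

    estimate: estimated change in skeletal age (months)
    sd_single: SD of a single reading (the observer's variable error)
    Each film is read once, so the SD of the difference is sd_single * sqrt(2).
    """
    half_width = z * sd_single * math.sqrt(2)
    low, high = estimate - half_width, estimate + half_width
    if floor_at_zero:
        # Following the text, a negative lower limit is ruled out.
        low = max(low, 0.0)
    return half_width, low, high

for sd in (3, 4):
    hw, lo, hi = progress_limits(10, sd)
    print(f"SD {sd} mo: ±{hw:.1f} mo, limits {lo:.1f} to {hi:.1f} mo")
```

With these inputs it prints ±8.3 mo (1.7 to 18.3 mo) for a 3-month SD and ±11.1 mo (0.0 to 21.1 mo) for a 4-month SD, matching the figures in the text.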

Experimental Confirmation of Estimates: The foregoing estimates are derived, as usual, from our knowledge of normal (Gaussian) frequency distributions of measurements. If measurements from such a series are taken strictly at random, 2 at a time, and the difference is found in each pair, 5% of these differences will exceed, in absolute value, 1.96 times the standard deviation of differences; more accurately expressed, the value of 5% will be approached more and more closely as the experiment is continued.
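The 5% figure can be checked empirically in the way just described. This is a sketch under assumed conditions (normally distributed readings with a hypothetical SD of 3 months, and a fixed seed for repeatability); `exceedance_fraction` is an illustrative name. As the number of simulated pairs grows, the printed fraction should settle near 0.05.

```python
import math
import random

def exceedance_fraction(n_pairs, sd=3.0, z=1.96, seed=1):
    """Fraction of |x1 - x2| beyond z * SD of the difference, for normal x.

    Simulates pairs of independent readings with SD `sd` around a common
    true value (taken as 0 here, since it cancels in the difference).
    """
    rng = random.Random(seed)
    limit = z * sd * math.sqrt(2)  # SD of a difference is sd * sqrt(2)
    hits = 0
    for _ in range(n_pairs):
        diff = rng.gauss(0.0, sd) - rng.gauss(0.0, sd)
        if abs(diff) > limit:
            hits += 1
    return hits / n_pairs

for n in (100, 10_000, 200_000):
    print(n, round(exceedance_fraction(n), 4))
```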

Although the variable error in instrument scale readings is commonly represented by a distribution that resembles the normal curve in shape, it is desirable, where possible, to test experimentally the safety of normal-curve estimates. Such tests are even more desirable with data like skeletal age estimates, which are not simple scale readings but have a rather unusual structure. From each of the 5 standard deviations given above under Estimates of Variable Error (2 from Todd readings, 3 from Greulich-Pyle readings), an estimate of the standard deviation of the difference between 2 independent readings was made by the method shown for the standard deviation of 3 mo. Then in each series of RGs it was found how many of the actual differences exceeded 1.96 times the standard deviation of the difference estimated for that series. Of the total 562 pairs of readings in all series combined, 37 pairs were in that category, i.e., 6.6%, which is not significantly greater than 5%. The normal-curve estimates have not led us far astray.

Reduction of Variable Error: If an assessor can make 2 independent readings on each RG and use the arithmetic mean as his estimate of skeletal age, he can reduce the allowance to be made for variable error. If the standard deviation for his individual readings is 3 mo. as before, the standard deviation for means (two readings on each film) is 3/√2. Proceeding as above, he will multiply this by √2 in order to find the standard deviation of differences between 2 (mean) readings, i.e., 3 mo. With an estimate of 10 mo. progress in skeletal age, the 95% confidence limits would be given by 10 ± 1.96 × 3; that is, 4.1 and 15.9 mo. Where single readings were used, above, the confidence limits were 1.7 and 18.3 mo.; and the assessor would have to decide whether the gain in precision was worth the double labor.
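The effect of averaging duplicate readings, described above, can be sketched by generalizing to m independent readings per film. This assumes, as the text does, that the readings are independent and share the same variable error; `limits_with_repeat_readings` is a hypothetical name.

```python
import math

def limits_with_repeat_readings(estimate, sd_single, readings_per_film, z=1.96):
    """95% limits when each film's skeletal age is the mean of m readings.

    The SD of a mean of m independent readings is sd_single / sqrt(m); the
    difference between two such means (one per film) has SD
    sd_single * sqrt(2 / m).
    """
    sd_diff = sd_single * math.sqrt(2 / readings_per_film)
    half_width = z * sd_diff
    return estimate - half_width, estimate + half_width

for m in (1, 2):
    lo, hi = limits_with_repeat_readings(10, 3, m)
    print(f"{m} reading(s) per film: {lo:.1f} to {hi:.1f} mo")
```

With these inputs it prints 1.7 to 18.3 mo for single readings and 4.1 to 15.9 mo for duplicate readings. Doubling the readings narrows the interval by a factor of √2 (about 1.41), not 2.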

NUMBERS OF ROENTGENOGRAMS REQUIRED FOR ESTIMATES

Although significant differences in variable error were not found between RGs in this series (except in the comparison of reproductions of extremely different quality in the Macy Series), this does not imply that such differences would not occur in other series. An observer who desires merely an […]

Two independent readings on each film are advisable.

A standard deviation estimated from a sample of 50 or 100 RGs may be either larger or smaller than the “true” standard deviation, i.e., the value that would be approached by reading more and more films under the same conditions. An observer should decide in advance how many films he is going to use. He will probably be mostly concerned to avoid a serious underestimate; therefore the authors give here some indications of the risk in that direction for samples of 60 and 120 RGs (2 independent readings on each film). The following results were derived from the table of variance ratios (5% points) of Fisher and Yates.⁸

1. If 60 RGs have given a standard deviation (variation between single readings) of 3 mo., the assessor can state with 95% probability that the true value is unlikely to be more than 3.54 mo. If this latter were the true value, instead of allowing 8.3 mo. for variable error in comparing 2 films (see Method of Using the Estimates, above) he should allow 9.8 mo. Using his estimate, 3 mo., he would err unwittingly by 1.5 mo.

2. If 120 RGs have given a standard deviation of 3 mo., the true value is unlikely to be more than 3.35 mo., and with this value the allowance for variable error should be 9.3 mo.
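The bounds in the two numbered items above can be approximately reproduced without a variance-ratio table. The authors used the 5% points of Fisher and Yates’s table; the sketch below swaps in the equivalent one-sided chi-square bound, σ ≤ s·√(ν/χ²₀.₀₅), where ν is the number of films (one degree of freedom per film read twice) and χ²₀.₀₅ is the lower 5% point of chi-square with ν degrees of freedom. The chi-square quantile is approximated with the Wilson-Hilferty cube-root formula so that only the standard library is needed. Function names are hypothetical, and small rounding differences from the tabulated values should be expected (about 3.36 rather than 3.35 mo. for 120 RGs).

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root method)."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3

def upper_sd_bound(sd_estimate, df, alpha=0.05):
    """One-sided upper (1 - alpha) bound for the true SD, given an SD on df degrees of freedom."""
    return sd_estimate * math.sqrt(df / chi2_quantile_wh(alpha, df))

def allowance(sd_single, z=1.96):
    """Error allowance for comparing two singly read films: z * SD * sqrt(2)."""
    return z * sd_single * math.sqrt(2)

for n_films in (60, 120):  # one degree of freedom per film with duplicate readings
    bound = upper_sd_bound(3.0, n_films)
    print(f"{n_films} RGs: SD up to {bound:.2f} mo, allowance {allowance(bound):.1f} mo")
```

It prints an upper bound of 3.54 mo. with an allowance of 9.8 mo. for 60 RGs, and 3.36 mo. with an allowance of 9.3 mo. for 120 RGs.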

FURTHER RESEARCH ON VARIABLE ERROR

It is conceivable that other observers’ variable errors are much smaller than those of RBM; and before the skeletal assessment method can be finally evaluated, evidence from many observers will be necessary. In the article on systematic error¹ an experiment with 20 of the Macy RGs was suggested for readers, and a request for data was made. If, under the conditions specified there, two independent readings of each picture were made, the data would permit a comparison of different observers, all assessing the same films, even though 20 films would not provide a sufficiently precise estimate for the use of an individual observer. The authors would appreciate the opportunity to make this comparison.

Some observers might assert that the proper way to estimate skeletal progress is to examine two films side by side, assess the change in each individual indicator separately and then average the differences so found. If this technic were used, it would be necessary to adopt some method of insuring that the assessor did not know which film was chronologically the earlier; otherwise bias would almost certainly occur, for bias can arise even from the effort to avoid it. To estimate variable error the procedure would have to be repeated on a series of such pairs of films without knowledge of the previous assessments.

USE OF MULTIPLE SUCCESSIVE ROENTGENOGRAMS

If it is found that many observers’ variable errors are as large as those discussed here, much dependence will have to be placed on the search for a significant upward trend in skeletal age revealed by a series of 5 or 6 RGs taken at intervals of 3 or 4 mo. Even if the difference between the first and last film, considered without the intervening films, is not significantly greater than variable error, there may be such a steady, almost rectilinear, increment as to leave no reasonable doubt that the child is making progress. Before such a conclusion can be drawn a statistical regression test must be applied, but with such data this is not difficult. The regression coefficient expresses the slope of the line, e.g., 0.4 mo. average gain in skeletal age per mo. of increase in chronologic age; and from its standard deviation the confidence limits for the estimate of skeletal progress can be determined.
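The regression check described above can be sketched with ordinary least squares. The data are hypothetical (six films taken three months apart), `regression_slope` is an illustrative name, and the multiplier 2.776 is the standard two-sided 5% point of t for 4 degrees of freedom (six films minus two). What the text calls the standard deviation of the regression coefficient is computed here as its usual standard error.

```python
import math

def regression_slope(chrono_months, skeletal_months):
    """Least-squares slope of skeletal age on chronologic age, with its standard error."""
    n = len(chrono_months)
    if n < 3:
        raise ValueError("need at least 3 films to estimate a slope and its error")
    mx = sum(chrono_months) / n
    my = sum(skeletal_months) / n
    sxx = sum((x - mx) ** 2 for x in chrono_months)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chrono_months, skeletal_months))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares about the fitted line, on n - 2 df.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(chrono_months, skeletal_months))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Hypothetical series: 6 films taken 3 months apart (ages and readings in months).
ages = [96, 99, 102, 105, 108, 111]
readings = [90, 92, 91, 94, 95, 96]
b, se = regression_slope(ages, readings)
t_crit = 2.776  # two-sided 5% point of t with n - 2 = 4 df, from standard tables
print(f"slope {b:.2f} mo/mo, 95% limits {b - t_crit * se:.2f} to {b + t_crit * se:.2f}")
```

For these made-up readings the slope comes out at 0.40 mo. per mo., the value used as an example in the text, and the 95% limits (about 0.21 to 0.59) exclude zero, even though the first and last readings differ by only 6 months.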

EFFECT OF LAPSED PRACTICE ON ASSESSMENT ERROR

About a year after all the assessments used in the foregoing discussion had been completed (a year during which RBM assessed only 2 or 3 RGs), she reassessed a random sample of 18 Orphanage films by the Greulich-Pyle atlas, for comparison with the readings made about 12 months previously. Both the previous and new readings were made by the “slow” (itemizing) method. The mean of the 18 differences was +3.12 months. Standard deviation of series of differences, 4.66 months. Standard deviation of mean difference, 1.10 months. Therefore t = 2.84, which is significant (P almost as low as 0.01). Two conclusions can be drawn from this experience:

1. Even after an observer has used the skeletal assessment method for five years, has made more than twelve hundred readings and has developed stability of technic, interruption of practice can cause a serious change in her standards.

2. When the skeletal progress between two RGs is to be estimated, it is desirable to assess both films within a short time (a few weeks) of each other, even though the first film may have been already assessed many months previously. To fulfill this condition and yet preserve independence of the two assessments, various precautions will be necessary. For example, a number of films from different children can be assessed in random order within the same period, and, as is in any case desirable, all identification marks except sex should be covered by an assistant before the assessor examines the films.
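The t test in the lapsed-practice experiment described above can be checked from the summary figures given in the text (mean difference +3.12 months, standard deviation of the differences 4.66 months, 18 films). `paired_t_from_summary` is a hypothetical helper.

```python
import math

def paired_t_from_summary(mean_diff, sd_diff, n):
    """t for a mean of n paired differences: mean / (SD / sqrt(n)), with n - 1 df."""
    se = sd_diff / math.sqrt(n)  # the text's "standard deviation of mean difference"
    return se, mean_diff / se

se, t = paired_t_from_summary(3.12, 4.66, 18)
print(f"SD of mean difference {se:.2f} mo, t = {t:.2f} on 17 df")
```

It prints an SD of the mean difference of 1.10 mo and t = 2.84, as reported. For comparison, the two-sided 1% point of t with 17 df is about 2.90, consistent with “P almost as low as 0.01.”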

A FURTHER NOTE ON SYSTEMATIC ERROR

The rather large variable error of an observer who has had considerable practice in the assessment technic tends to enhance the doubt thrown on the value of single assessments in the previous article.¹ With reference to that report Dr. Idell Pyle (in a personal communication for which the authors are much indebted) suggests that, in seeking an explanation of the differences in RBM’s systematic error in different films, there should be considered, as a possibly major cause, the fact that the Macy RGs are photographic reproductions, in which some of the indicators are lost and others enhanced. Since the expert assessments published by Macy had been made on the original RGs, the quality of the reproductions, including the absence of the terminal parts of the fingers in some pictures, had been examined in the study of systematic error, and, as stated in the report, no relationship with RBM’s systematic error had been found. On Dr. Pyle’s recommendation, however, the question has been investigated further.

Since the Todd assessments and the Pyle assessments had been made on the actual RGs, it would be expected that, if RBM’s differences from these experts were largely due to incorrect reproduction in the Macy photographs, her discrepancies from the two experts, although not necessarily equal, would in the main tend to be positively correlated. For each of the 10 children studied, therefore, correlation coefficients have been found, the two variates being RBM’s difference from the Todd assessment and her difference from the Pyle assessment on the same film. The coefficients ranged from −0.49 to +0.97. Only 2 were significant and, most notably, 5 were negative and 5 positive. Even if RBM had assessed the original RGs, it can be assumed, because of the relationship of the two atlases to each other, that a certain amount of correlation of her two errors would have been found. The small degree of correlation actually observed, therefore, does not support the suggestion that the defective photographic reproduction was responsible for the variability in her systematic error.
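The per-child correlation described above can be sketched as follows: for each film of one child, take the observer’s difference from the Todd assessment and her difference from the Pyle assessment, and correlate the two series. The numbers are hypothetical and `pearson_r` is an illustrative name (Python 3.10+ also offers `statistics.correlation`); the article’s significance test for each coefficient is not reproduced.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical films of one child: observer minus Todd, observer minus Pyle (months).
diff_from_todd = [4, -2, 6, 1, -3]
diff_from_pyle = [3, 0, 5, -1, -2]
print(round(pearson_r(diff_from_todd, diff_from_pyle), 2))
```

Under the photographic-reproduction hypothesis such coefficients would tend to be positive for most children; a roughly even split of positive and negative values, as the text reports, argues against it.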

Regarding the authors’ other suggested explanation of the variability in error between films, i.e., lack of independence in experts’ assessments of different films from the same child, Dr. Pyle states that in her assessments of the Macy films she took precautions to secure independence. Whatever the explanation may be, it does not prevent the use of the Macy pictures for a comparison of observers with each other, as suggested in the previous report and again in the present article.

A further point raised by Dr. Pyle merits consideration. An assessment made by an expert who has produced an atlas will generally be made, not from the atlas itself, but […] difference from the expert may, therefore, be due to the failure of the atlas pictures to reproduce correctly the original RGs. This does not, however, detract from the value of this method of ascertaining the general usefulness of the assessment technic, for such comparisons, if widely made, would enable one to answer the question: How do observers in general, using the facilities at their disposal (the atlas), differ in assessment from an expert with the superior knowledge and facilities (original RGs or intermediates) available to him?

SUMMARY

An observer’s variable error in skeletal age assessment of hand RGs (i.e., the irregular variation between independent readings of the same film) was studied on 1,124 readings of 326 films from 233 children aged 16 months to 17 years. Seventy-nine of the RGs were full-size reproductions in Macy’s Nutrition and Chemical Growth in Childhood; the remainder were actual films of children in Halifax, Canada (healthy Orphanage residents and children examined in a nutrition survey).

There was no significant difference in variable error associated with the atlas (Todd, Greulich-Pyle), age of child, sex, differences between skeletal and chronologic age, differences between children, or differences between RGs of the same child, except for a tendency in the Macy Series for the poorest reproductions to have a larger variable error than the best reproductions. In most readings the individual indicators were assessed separately and the results averaged, but a quicker method (over-all appraisal) did not produce a significantly different variable error. The quick method may be useful in large surveys, although it appears too coarse for the study of individual children.

The observer’s variable error was expressed by standard deviations of approximately three months (Macy Series, both atlases; Nutrition Series, Greulich-Pyle atlas) and four months (Orphanage Series, both atlases). With a standard deviation of three months an assessor must affix an error of ± 8.3 months to his estimate of a child’s progress in skeletal age, in order to obtain confidence limits with 95% probability. If his standard deviation is four months he must allow ± 11.1 months. For evaluation of the assessment method, many observers’ estimates of variable error are needed, and an appeal for data is issued.

After more than 1,200 readings had been made, the observer’s practice lapsed for about a year. Reassessment of a random sample of RGs then showed, besides variable error, a mean systematic difference of approximately three months from the previous readings of the same films with the same atlas. To avoid this risk, any two films that are to be assessed for skeletal progress should be read within a few weeks of each other, and special precautions are therefore necessary to secure independence of the two readings.

REFERENCES

1. Mainland, D., and Mainland, R. B., Evaluation of skeletal age method of estimating children’s development. I. Systematic errors in assessment of roentgenograms, Pediatrics 12:114, 1953.

2. Todd, T. W., Atlas of Skeletal Maturation (Hand), St. Louis, The C. V. Mosby Company, 1937.

3. Greulich, W. W., and Pyle, S. I., Radiographic Atlas of Skeletal Development of Hand and Wrist, Stanford, Calif., The Stanford University Press, 1950.

4. Macy, I. G., Nutrition and Chemical Growth in Childhood, Springfield, Ill., Charles C Thomas, Publisher, 1946, vol. 2.

5. Mainland, D., Elementary Medical Statistics: Principles of Quantitative Medicine, Philadelphia, W. B. Saunders Company, 1952.

6. Thompson, C. M., and Merrington, M., Tables for testing homogeneity of set of estimated variances, Biometrika 33:296, 1946.

7. Bishop, D. J., and Nair, U. S., Note on certain methods of testing for homogeneity of set of estimated variances, J. Roy. Stat. Soc. (supp.) 6:89, 1939.


477 First Avenue

SPANISH ABSTRACT

Evaluation of the Method of Measuring Bone Age to Estimate Children’s Development

II. Variable Errors in the Assessment of Radiographs

The variable error of one of the authors in assessing radiologic bone age was studied in 1,124 readings of 326 films taken from 233 children aged between 16 months and 17 years; the variable error is determined by the discrepancies the observer shows between separate readings and observations of the same films. Seventy-nine radiographs were natural-size reproductions from Macy’s book “Nutrition and Chemical Growth in Childhood”; the rest of the radiographs belonged to children from an orphanage in Halifax, Canada, examined in a nutrition survey.

No significant difference in variable error was found due to the atlas (Todd, Greulich-Pyle), age of the child, sex, differences between bone and chronologic age, differences between children, or differences between radiographs of the same child; there was, however, a tendency for the poorest reproductions of the Macy series to show a larger variable error than the best reproductions. In most of the readings the individual findings were assessed separately and the results averaged; the faster method of general appraisal did not give a significantly different variable error. This fast method is useful in large surveys but coarse and crude in the study of individual children.

The variable error of the observer in this article appeared as standard deviations of approximately 3 months for the Macy series with both atlases and for the nutrition survey series with the Greulich-Pyle atlas, and of 4 months in the orphanage series with both atlases. With a standard deviation of three months, an assessor must allow an error of ±8.3 months in his estimate of the child’s progress in bone age in order to have confidence limits with 95% probability. If his standard deviation is 4 months, he must allow an error of ±11.1 months.

After making more than 1,200 readings, the author let 12 months pass. Reassessment of randomly chosen radiographs then showed, in addition to the variable error, a mean systematic difference of about 3 months relative to the readings made previously on the same films and with the same atlases. To avoid this risk, the authors suggest that when two films are to be assessed for bone progress, they should be read within a few weeks of each other, with special precautions taken to obtain independence in the two readings.

EVALUATION OF THE SKELETAL AGE METHOD OF ESTIMATING CHILDREN’S DEVELOPMENT: II. Variable Errors in the Assessment of Roentgenograms. Donald Mainland. Pediatrics 1954;13;165.
