EVALUATION OF THE SKELETAL AGE METHOD OF ESTIMATING CHILDREN’S DEVELOPMENT
II. Variable Errors in the Assessment of Roentgenograms
By DONALD MAINLAND, D.SC.
(With the technical assistance of Ruth Bowering Mainland)
New York City
From the Department of Anatomy, Dalhousie University, Halifax, N.S., Canada, and the Department of Medical Statistics, New York University College of Medicine, New York City.
This investigation is part of a study of the age changes in the bones and joints of children and adults, initiated by a grant to Dalhousie University from the John and Mary R. Markle Foundation, and further supported by grants from the Division of Medical Research of the National Research Council of Canada. The project is now being conducted in New York, supported (in part) by a research grant (PHS Grant No. A-104) from the National Institute of Arthritis and Metabolic Diseases, Public Health Service.
(Received for publication July 13, 1953.)

A previous report¹ questioned the reliability of single estimates of skeletal age made by the Todd-Greulich-Pyle method of assessing hand roentgenograms, largely because of ignorance of the systematic error, i.e., the differences between experts’ assessments and those of other observers. To remove this ignorance a set of roentgenograms (RGs), assessed by experts and distributed to other workers, would be necessary. If this is not feasible, assessments may nevertheless be used by any observer to estimate the progress in skeletal age of an individual child. For this purpose each observer must estimate his variable error, i.e., the variation among his independent readings of the same RG with the same atlas. Then, having estimated the change in skeletal age between two RGs taken at different times from the same child, he can affix to his estimate an error, ± so many months.
If his systematic error does not vary from film to film it will cancel out, and leave his estimate of variable error unaffected; but if, unknown to him, his systematic error differs between films, the estimate derived from his variable error will vary in reliability. Such interfilm variation in systematic error was recorded in the authors’ previous report, but for reasons given there it is doubtful whether it would be great if the expert assessed the test films with proper precautions. Moreover, it is unreasonable to expect observers to suspend the making of assessments until this question has been settled; therefore it is important to see whether the variable error itself is large enough to throw doubt on the usefulness of the assessment method.
The purpose of this investigation was to explore thoroughly the variable error of one observer, RBM, especially in order to discover whether in estimating the error allowance should be made for the following eight factors: (1) the atlas (Todd² or its successor, the Greulich-Pyle atlas), (2) age of child, (3) sex, (4) differences between skeletal and chronologic age, (5) differences between RGs of different children, (6) differences between RGs from the same child, (7) quality of RGs, and (8) speed of assessment.

MATERIAL AND METHODS
Three series of postero-anterior RGs were used:
1. The Macy Series: actual-size reproductions of left hands in Macy’s⁴ Nutrition and Chemical Growth in Childhood; 79 films from 11 boys and 9 girls (1 to 10 films/child); ages 3 to 16 yr.
2. The Orphanage Series: 157 actual RGs prepared by RBM in the Dalhousie University Anatomy Department, of the right hands of 79 boys and 78 girls (1 film/child); ages 5 to 15 yr. with a few up to 17 yr. These children comprised the total population (except for a few transient residents) over the age of 5 yr. [...]
3. The Nutrition Series: [...] hands of 29 boys and 27 girls (24 children: 1 film each; 30 children: 2 films each; 2 children: 3 films each; intervals between films generally about 6 mo.); ages 16 mo. to 73 yr. These children were subjects of a nutrition survey conducted in Halifax by Dr. E. Gordon Young. They were mostly from the less favorable socio-economic groups.
No attempt was made in either Series 2 or Series 3 to secure children in optimum health. The Orphanage children were well cared for and were mostly in good general health. The Nutrition Series contained some cases of recent rather serious illness. In none of the series were grossly pathologic hands encountered.
Except for the Nutrition Series, which was assessed by the Greulich-Pyle atlas only, all RGs were assessed first by the Todd atlas and about four years later by the Greulich-Pyle atlas. Two independent readings by the same atlas were made on all RGs, the interval between the readings varying from about 1 mo. to about 3 mo., except that the interval between the Todd readings of many of the Orphanage films was greater, even up to 1 yr.
The total number of readings on 326 RGs was 1,124.
Investigation of the Macy Series was planned as a study of observational error and therefore, as described previously,¹ the RGs were assessed in random order and RBM knew only the sex of the child. The Greulich-Pyle assessments of the Orphanage Series were made in the same way, but the Todd assessments of the Orphanage films and the assessments of the Nutrition films were made more or less in the order in which the films were collected, as is customary in routine assessments for diagnostic purposes. In none of the assessments, however, did RBM recall a previous reading when making a later one; nor was she aware of exact chronologic ages.
In each reading the skeletal age represented the arithmetic mean of the stages reached by the various indicators (carpals, epiphyses, etc.). Each indicator was assessed individually except in certain series of readings discussed later under the heading “Speed of Assessment.”
STATISTICAL TECHNICS
When independent readings, 2 from each RG, show no persistent tendency for the first to be greater or less than the second, the absolute difference (i.e., without plus or minus sign) represents the variable error; but the difference itself is not a convenient form for many of the necessary analyses, because such absolute differences do not have a frequency distribution of normal (Gaussian) shape. Even though there are only 2 readings, the variation is most satisfactorily expressed as a standard deviation, as would be done if there were more than 2 readings.
Since every assessor should estimate his variable error it is desirable to recall the simplified arithmetic suitable for a series of, say, 50 RGs, each providing 2 readings. For any one film, square the difference between the readings and divide by 2. The result is the same as the mean square or variance for 1 degree of freedom (one less than the number of readings). The square root of the variance would be an estimate of the standard deviation from the film concerned. To obtain an estimate from all films together, add the 50 variances, divide by 50 and find the square root. This is the standard deviation representing the variable or random error of observation. Its use will be illustrated later.
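The arithmetic just described can be sketched in a few lines of Python. This is only an illustration: the function name `variable_error_sd` and the five pairs of readings are hypothetical and are not data from this study.

```python
import math

def variable_error_sd(pairs):
    """Standard deviation of single readings, pooled over films.

    pairs: list of (first_reading, second_reading) in months, one pair per film.
    Each film contributes a variance with 1 degree of freedom: (difference**2) / 2.
    """
    if not pairs:
        raise ValueError("at least one pair of readings is required")
    variances = [(first - second) ** 2 / 2 for first, second in pairs]
    # Add the per-film variances, divide by the number of films, take the root.
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical duplicate readings (skeletal age in months) for five films.
readings = [(96, 99), (120, 118), (60, 66), (84, 84), (132, 127)]
sd = variable_error_sd(readings)
print(f"variable error (SD of a single reading): {sd:.2f} months")
```

An assessor would replace the hypothetical list with his own duplicate readings, ideally from 50 or more films.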
Before such final estimates of standard deviations were reached in this study much analysis was done which, because of the results obtained here, probably few assessors will feel the need to perform. Therefore little description of methods is required. For statistical readers, however, it should be mentioned that when the variable error was subjected to analysis of variance (e.g., between atlases, between children and within children) and when it was tested by regression methods (e.g., for its relationship to age of child) each inter-reading variance (with one degree of freedom) was transformed into its logarithm in order to achieve an approximation to normality of the frequency distribution.
Homogeneity of variance was tested by Bartlett’s method, Thompson and Merrington’s tables being used when the samples were few; and where there was only one degree of freedom in each sample Bishop and Nair’s⁷ critical values of -2 log μ were employed.
In all tests of statistical significance the 5% level was adopted as the minimal standard, which implies that, in order to be pronounced significant, a difference must have a value of P not greater than 0.05.
DIFFERENCE BETWEEN DUPLICATE READINGS
AS A MEASURE OF VARIABLE ERROR
When the second reading of a RG is subtracted from the first reading and the sign is retained, a significant majority of plus or minus signs in a series (i.e., a percentage significantly greater than 50) indicates a systematic difference in reading, in addition to the variable error; and the same feature can be tested by finding whether the mean of the series of differences, with signs retained, is significantly different from zero. In none of the three series of RGs was there a significant majority of plus or minus signs. Most of the mean differences, also, were not significant; for example: -0.42 month for the Macy Series (Todd’s atlas); -0.835 month for the Macy Series (Greulich-Pyle atlas); -0.52 month for the Orphanage Series (Todd’s atlas). In the Nutrition Series the mean difference, +1.13 months, was significant (P less than 0.01), but, as will be seen later, this systematic difference was so small compared with the variable error that in most of the analyses, and in the practical application of the result, it could be disregarded.
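The two checks described in this section, the balance of plus and minus signs and the mean of the signed differences, might be sketched as follows. The function names and the ten differences are hypothetical. The exact two-sided binomial calculation used for the signs is one common form of the sign test and is assumed here; the text does not say which form the authors used.

```python
import math

def sign_test_p(n_plus, n_minus):
    """Exact two-sided binomial P for the split of signs (zero differences excluded)."""
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of a split at least this uneven in one direction, under 50:50.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mean_difference_t(diffs):
    """Mean signed difference and its t value (n - 1 degrees of freedom)."""
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, mean / (sd / math.sqrt(n))

# Hypothetical signed differences (first minus second reading, months).
diffs = [-3, 2, -6, 0, 5, 1, -2, 4, -1, -4]
n_plus = sum(d > 0 for d in diffs)
n_minus = sum(d < 0 for d in diffs)
p_sign = sign_test_p(n_plus, n_minus)
mean, t = mean_difference_t(diffs)
print(f"signs: +{n_plus} / -{n_minus}, P = {p_sign:.3f}")
print(f"mean difference = {mean:.2f} months, t = {t:.2f}")  # compare t with a t table
```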
FACTORS THAT MIGHT AFFECT THE
VARIABLE ERROR
The eight factors, already enumerated, were explored where possible in all three series of RGs, and the results were essentially the same in all. As most of the values obtained from the tests were far from being significant at the 5% level, few numerical details need be given.
1. Differences between Atlases: It has been shown¹ that RBM’s systematic error differed when she assessed the same films by the two atlases; and the difference in her
absolute skeletal age estimates from the two atlases will be discussed in a later report. By contrast, her variable error showed no indication of differing according to the
atlas that she used. Whatever uncertainties and fluctuations may have been responsible for the error, they apparently affected
equally the assessments of the same series of films by the two atlases. There was
found, however, no correlation between the magnitude of her Todd and Greulich-Pyle errors in the same film.
2. Age Differences: Expert assessors state that they find greater difficulty in assessing hands at certain ages than at other ages, and this difficulty might be expected to manifest itself as an association between chronologic age and magnitude of variable error. No such association was found. If it existed it was obscured by other factors and did not account for the size of the error. Since there were films from only eight children below the age of two years, the evidence regarding the youngest children is inadequate; but from 5 to 15 years an estimate of variable error could, according to all three series, be applied without regard to age.
3. Sex Differences: None of the series suggested that boys’ and girls’ hands differed in variable error of assessment.
4. Differences between Skeletal and Chronologic Age: When something disturbs the sequence of skeletal development, causing either retardation or acceleration, it might be expected to produce disharmony between indicators or between parts of the same indicator, and thus perhaps lead to difficulty or uncertainty in assessment and an increase in the variable error. In the films surveyed the differences between chronologic and skeletal age varied greatly, ranging from less than one month to more than two years; but there was found no suggestion of a relationship between this variation and differences in the variable error. Therefore it appears safe to obtain and apply estimates of variable error without regard to the age discrepancy, at least within the range observed in these films.
5. Differences between Children: It might be thought that, if the hands of certain children were consistently more difficult to assess than the hands of other children, the [...] expected if a series contained some grossly pathologic hands, but there was no indication of it in any of the three series assessed by RBM.
6. Differences between Roentgenograms from the Same Child: Since some films are more difficult to assess than other films from the same child, it is conceivable that there might be interfilm heterogeneity of variable error within the same child. Forty-seven children (15 in the Macy Series and 32 in the Nutrition Series) provided more than one RG per child. When all the films in each particular child were compared with each other no significant heterogeneity of variable error was found; but some further information on this point was obtained by studying the quality of RGs.
7. Quality of Roentgenograms: Without recollecting her assessments of the Macy RGs RBM graded the quality of each picture as “good,” “fair,” or “poor.” In each of 13 children it was possible to compare the variable error in two films of extreme quality (good versus poor) and RBM’s variable error was found to be significantly greater in the poor films than in the good ones. This factor cannot, however, account for more than a small part of the variable error; for in the actual RGs (Orphanage Series) the technic had produced such a uniform clarity of image that it was impossible to grade the films by quality or to detect during the reading a difficulty attributable to this factor; and yet, as will be seen, the variable error was even greater than in the Macy Series.
8. Speed of Assessment: Although most of the skeletal age estimates were made by assessing each indicator separately and finding the arithmetic mean of these assessments, after RBM had gained experience with the Greulich-Pyle atlas she tried a quicker method: assessing by over-all inspection without itemizing. This method was used in the second readings of the Nutrition Series, and may have accounted for the small but real systematic error in that series, already mentioned. The “quick” method was used in both first and second Greulich-Pyle readings of the Orphanage films, except for a random sample of 36 films, in which one reading was “quick” and the other “slow” (the regular itemizing method). When this sample was compared with the other Orphanage films, all assessed by the “quick” method, no appreciable difference in the variable error was found, and no significant systematic error. For group surveys, involving hundreds of films, the quicker method has the obvious advantage of saving time, but it appears to be too coarse for the assessment of the progress of an individual child. As applied by RBM it produced an excessive number of zero differences between first and second readings. These were compensated for, in the long run, by the occurrence of a few large differences; but this erratic behavior is undesirable in the study of individuals.
ESTIMATES OF VARIABLE ERROR
Although there is always a risk in accepting a nonsignificant difference as if it meant “no real difference,” there was so little evidence of a difference in variable error between children, or between RGs of the same child, that a pooling of variances was permissible in estimating the following standard deviations to express RBM’s variable error:
Macy Series (79 RGs). Todd’s atlas: 3.04 months; Greulich-Pyle atlas: 2.95 months.
Orphanage Series (157 RGs). Todd’s atlas: 3.82 months; Greulich-Pyle atlas: 4.20 months.
Nutrition Series (90 RGs). Greulich-Pyle atlas: 2.90 months.
Differences between the Three Series: As already stated, the variable error did not differ significantly between the two atlases when both were used on the same films; but the error was not the same in the three series. The Macy and Nutrition Series have essentially identical values, a standard deviation of approximately three months, whereas the Orphanage Series gave approximately four months. No explanation of this difference has been found.
Method of Using the Estimates: The well-known method of applying such estimates can be illustrated by assuming that an assessor’s variable error is represented by a standard deviation of three months, that he has two RGs of the same child’s hand taken 12 months apart, and that by independent assessment of the two films he has estimated the progress in skeletal age as 10 months.
To allow for his variable error he finds the standard deviation of the difference between two independent readings of the same film by multiplying the standard deviation, three months, by √2 to obtain 4.242 months. To achieve, as is customary, 95% probability for his estimate of skeletal age progress, he will take twice this standard deviation, or more accurately 1.96 S.D., i.e., 8.3 months. Therefore he cannot, with the required probability, give a more precise estimate of the true change in skeletal age than the range 10 ± 8.3 months, i.e., between 1.7 and 18.3 months. These are the 95% confidence limits.
If the original standard deviation were four months instead of three, and if the estimated change were again 10 months, the same method of calculation would give a range of ± 11.1 months. Ruling out the negative lower limit, the assessor could not, with the minimal degree of confidence usually required, estimate the true change in skeletal age more precisely than zero to 21 months.
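The calculation above can be reproduced with a short Python sketch. The function name `progress_limits` is hypothetical; the inputs (10 months of estimated progress, standard deviations of 3 and 4 months, the factor 1.96) are those given in the text.

```python
import math

Z_95 = 1.96  # normal deviate for 95% probability, as used in the text

def progress_limits(progress, sd_single, z=Z_95):
    """95% limits for the change in skeletal age between two films.

    progress:  estimated change (months) from one reading of each film
    sd_single: variable error, the SD of a single reading (months)
    """
    sd_difference = sd_single * math.sqrt(2)  # SD of a difference of two readings
    allowance = z * sd_difference
    return progress - allowance, progress + allowance, allowance

low, high, allowance = progress_limits(10, 3)
print(f"SD 3 mo.: 10 ± {allowance:.1f} months ({low:.1f} to {high:.1f})")

low4, high4, allowance4 = progress_limits(10, 4)
# A negative lower limit is read as zero, as in the text.
print(f"SD 4 mo.: 10 ± {allowance4:.1f} months ({max(low4, 0):.0f} to {high4:.0f})")
```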
Experimental Confirmation of Estimates: The foregoing estimates are derived, as usual, from our knowledge of normal (Gaussian) frequency distributions of measurements. If measurements from such a series are taken strictly at random, 2 at a time, and the difference is found in each pair, 5% of these differences will exceed 1.96 times the standard deviation of differences; more accurately expressed, the value of 5% will be approached more and more closely as the experiment is continued.
Although the variable error in instrument scale readings is commonly represented by a distribution that resembles the normal curve in shape, it is desirable, where possible, to test experimentally the safety of normal-curve estimates. Such tests are even more desirable with data like skeletal age estimates, which are not simple scale readings but have a rather unusual structure. From each of the 5 standard deviations given above (2 from Todd readings, 3 from Greulich-Pyle readings) an estimate of the standard deviation of the difference between 2 independent readings was made by the method shown for the standard deviation 3 mo. Then in each series of RGs it was found how many of the actual differences exceeded 1.96 times the standard deviation of the difference estimated for that series. Of the total 562 pairs of readings in all series combined, 37 pairs were in that category, i.e., 6.6%, which is not significantly greater than 5%. The normal-curve estimates have not led us far astray.
Reduction of Variable Error: If an assessor can make 2 independent readings on each RG and use the arithmetic mean as his estimate of skeletal age, he can reduce the allowance to be made for variable error. If the standard deviation for his individual readings is 3 mo. as before, the standard deviation for means (two readings on each film) is 3/√2. Proceeding as above, he will multiply this by √2 in order to find the standard deviation of differences between 2 (mean) readings, i.e., 3 mo. With an estimate of 10 mo. progress in skeletal age the 95% confidence limits would be given by 10 ± 1.96 × 3; that is, 4.1 and 15.9 mo. Where single readings were used, above, the confidence limits were 1.7 and 18.3 mo.; and the assessor would have to decide whether the gain in precision was worth the double labor.
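The effect of averaging duplicate readings can be checked with the following Python sketch. The function name `limits_with_repeats` and its parameter `readings_per_film` are hypothetical, and the generalization to any number of readings per film is an assumption; the text itself treats only one or two readings.

```python
import math

def limits_with_repeats(progress, sd_single, readings_per_film, z=1.96):
    """95% limits for progress when each film's age is the mean of several readings."""
    sd_mean = sd_single / math.sqrt(readings_per_film)  # SD of a film's mean reading
    sd_difference = sd_mean * math.sqrt(2)              # SD of a difference of two means
    allowance = z * sd_difference
    return progress - allowance, progress + allowance

single = limits_with_repeats(10, 3, 1)  # one reading per film
double = limits_with_repeats(10, 3, 2)  # mean of two readings per film
print("single readings: %.1f to %.1f months" % single)
print("means of two:    %.1f to %.1f months" % double)
```

With the figures in the text, doubling the readings narrows the limits from about 1.7 to 18.3 months to about 4.1 to 15.9 months.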
NUMBERS OF ROENTGENOGRAMS REQUIRED
FOR ESTIMATES
Although significant differences in variable error were not found between RGs in this series (except in the comparison of reproductions of extremely different quality in the Macy Series), this does not imply that such differences would not occur in other series. An observer who desires merely an [...] Two independent readings on each film are recommendable.
A standard deviation estimated from a sample of 50 or 100 RGs may be either larger or smaller than the “true” standard deviation, the value that would be approached by reading more and more films under the same conditions. An observer should decide in advance how many films he is going to use. He will probably be mostly concerned to avoid a serious underestimate; therefore the authors give here some indications of the risk in that direction for samples of 60 and 120 RGs (2 independent readings on each film). The following results were derived from the table of variance ratios (5% points) of Fisher and Yates.⁸
1. If 60 RGs have given a standard deviation (variation between single readings) of 3 mo. the assessor can state with 95% probability that the true value, if it is greater than the estimate, is unlikely to be more than 3.54 mo. If this latter were the true value, instead of allowing 8.3 mo. for variable error in comparing 2 films (see “Method of Using the Estimates”) he should allow 9.8 mo. Using his estimate, 3 mo., he would err unwittingly by 1.5 mo.
2. If 120 RGs have given a standard deviation of 3 mo. the true value is unlikely to be more than 3.35 mo., and with this value the allowance for variable error should be 9.3 mo.
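The upper limits quoted in the two numbered items can be approximately reproduced from tabulated variance ratios, as the Python sketch below suggests. It assumes that each film with 2 readings contributes 1 degree of freedom, and that the relevant 5% points are those for an infinite number of degrees of freedom in the numerator: 1.39 for 60 and 1.25 for 120 degrees of freedom, as printed in standard F tables. These values and the names `F_5PCT_INF`, `upper_sd` and `allowance` are assumptions of this sketch; the text does not state which table entries were used.

```python
import math

# 5% points of the variance ratio for (infinity, df) degrees of freedom,
# taken from standard F tables and hard-coded here as an assumption.
F_5PCT_INF = {60: 1.39, 120: 1.25}

def upper_sd(sd_estimate, n_films):
    """Upper 95% limit for the true SD, given an estimate from n_films duplicate readings."""
    return sd_estimate * math.sqrt(F_5PCT_INF[n_films])

def allowance(sd_single, z=1.96):
    """Allowance (months) for variable error when comparing two single readings."""
    return z * sd_single * math.sqrt(2)

for n_films in (60, 120):
    limit = upper_sd(3, n_films)
    print(f"{n_films} RGs: true SD unlikely to exceed {limit:.2f} mo.; "
          f"allowance {allowance(limit):.1f} mo. instead of {allowance(3):.1f} mo.")
```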
FURTHER RESEARCH ON VARIABLE ERROR
It is conceivable that other observers’ variable errors are much smaller than those of RBM; and before the skeletal assessment method can be finally evaluated evidence from many observers will be necessary. In the article on systematic error¹ an experiment with 20 of the Macy RGs was suggested for readers, and a request for data was made. If, under the conditions specified there, two independent readings of each picture were made, the data would permit a comparison of different observers, all assessing the same films, even though 20 films would not provide a sufficiently precise estimate for the use of an individual observer. The authors would appreciate the opportunity to make this comparison.
Some observers might assert that the proper way to estimate skeletal progress is to examine two films side by side, assess the change in each individual indicator separately and then average the differences so found. If this technic were used it would be necessary to adopt some method of insuring that the assessor did not know which film was chronologically the earlier; otherwise bias would almost certainly occur, for bias can arise even from the effort to avoid it. To estimate variable error the procedure would have to be repeated on a series of such pairs of films without knowledge of the previous assessments.
USE OF MULTIPLE SUCCESSIVE ROENTGENOGRAMS
If it is found that many observers’ variable errors are as large as those discussed here, much dependence will have to be placed on the search for a significant upward trend in skeletal age revealed by a series of 5 or 6 RGs taken at intervals of 3 or 4 mo. Even if the difference between the first and last film, considered without the intervening films, is not significantly greater than variable error, there may be such a steady, almost rectilinear, increment as to leave no reasonable doubt that the child is making progress. Before such a conclusion can be drawn a statistical regression test must be applied, but with such data this is not difficult. The regression coefficient expresses the slope of the line, e.g., 0.4 mo. average gain in skeletal age per mo. of increase in chronologic age; and from its standard deviation the confidence limits for the estimate of skeletal progress can be determined.
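A regression test of this kind might be sketched in Python as follows. The six pairs of ages are hypothetical, as are the function name `slope_with_se` and the constant `T_975_DF4`. The tabulated value 2.776 is the two-sided 5% point of t for 4 degrees of freedom (6 films less 2) and would need to be changed for a different number of films.

```python
import math

def slope_with_se(chron_ages, skeletal_ages):
    """Least-squares slope of skeletal age on chronologic age, with its standard error."""
    n = len(chron_ages)
    mean_x = sum(chron_ages) / n
    mean_y = sum(skeletal_ages) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_ages, skeletal_ages))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(chron_ages, skeletal_ages))
    se = math.sqrt(residual_ss / (n - 2) / sxx)  # n - 2 degrees of freedom
    return slope, se

# Hypothetical child: 6 films at 3-month intervals (ages in months).
chron = [60, 63, 66, 69, 72, 75]
skel = [58, 57, 61, 60, 64, 63]

T_975_DF4 = 2.776  # tabulated t, 4 degrees of freedom, 5% two-sided
slope, se = slope_with_se(chron, skel)
low, high = slope - T_975_DF4 * se, slope + T_975_DF4 * se
print(f"gain per month of chronologic age: {slope:.2f} (95% limits {low:.2f} to {high:.2f})")
```

In this made-up series the lower limit is above zero, so the upward trend would be judged significant even though successive readings rise and fall.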
EFFECT OF LAPSED PRACTICE ON
ASSESSMENT ERROR
About a year after all the assessments used in the foregoing discussion had been completed (a year during which RBM assessed only 2 or 3 RGs) she reassessed a random sample of 18 Orphanage films by the Greulich-Pyle atlas, for comparison with the readings made about 12 months previously. Both the previous and new readings were made by the “slow” (itemizing) method. The mean of the 18 differences was +3.12 months; the standard deviation of the series of differences, 4.66 months; and the standard deviation of the mean difference, 1.10 months. Therefore t = 2.84, which is significant (P almost as low as 0.01). Two conclusions can be drawn from this experience:
1. Even after an observer has used the skeletal assessment method for five years, has made more than twelve hundred readings and has developed stability of technic, interruption of practice can cause a serious change in her standards.
2. When the skeletal progress between two RGs is to be estimated it is desirable to assess both films within a short time (a few weeks) of each other, even though the first film may have been already assessed many months previously. To fulfill this condition and yet preserve independence of the two assessments, various precautions will be necessary. For example, a number of films from different children can be assessed in random order within the same period, and, as in any case desirable, all identification marks except sex should be covered by an assistant before the assessor examines the films.
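The t value reported for the reassessed sample follows directly from the summary figures given in this section (mean difference 3.12 months, standard deviation 4.66 months, 18 films), as this short Python sketch shows. The function name `t_from_summary` is hypothetical.

```python
import math

def t_from_summary(mean_diff, sd_diff, n):
    """Standard error of the mean difference and the resulting t (n - 1 degrees of freedom)."""
    standard_error = sd_diff / math.sqrt(n)
    return standard_error, mean_diff / standard_error

se, t = t_from_summary(mean_diff=3.12, sd_diff=4.66, n=18)
print(f"SD of mean difference = {se:.2f} months, t = {t:.2f} with 17 degrees of freedom")
```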
A FURTHER NOTE ON SYSTEMATIC ERROR
The rather large variable error of an observer who has had considerable practice in the assessment technic tends to enhance the doubt thrown on the value of single assessments in the previous article.¹ With reference to that report Dr. Idell Pyle (in a personal communication for which the authors are much indebted) suggests that in seeking for an explanation of the differences in RBM’s systematic error in different films there should be considered, as a possibly major cause, the fact that the Macy RGs are photographic reproductions, in which some of the indicators are lost and others enhanced. Since the expert assessments published by Macy had been made on the original RGs the quality of the reproductions, including the absence of the terminal parts of the fingers in some pictures, had been examined in the study of systematic error, and, as stated in the report, no relationship with RBM’s systematic error had been found. On Dr. Pyle’s recommendation, however, the question has been investigated further.
Since the Todd assessments and the Pyle assessments had been made on the actual RGs it would be expected that, if RBM’s differences from these experts were largely due to incorrect reproduction in the Macy photographs, her discrepancies from the two experts, although not necessarily equal, would in the main tend to be positively correlated. For each of the 10 children studied, therefore, correlation coefficients have been found, the two variates being RBM’s difference from the Todd assessment and her difference from the Pyle assessment on the same film. The coefficients ranged from -0.49 to +0.97. Only 2 were significant and, most notably, 5 were negative and 5 positive. Even if RBM had assessed the original RGs it can be assumed, because of the relationship of the two atlases to each other, that a certain amount of correlation of her two errors would have been found. The small degree of correlation actually observed, therefore, does not support the suggestion that the defective photographic reproduction was responsible for the variability in her systematic error.
Regarding the authors’ other suggested explanation of the variability in error between films, i.e., lack of independence in experts’ assessments of different films from the same child, Dr. Pyle states that in her assessments of the Macy films she took precautions to secure independence. Whatever the explanation may be, it does not prevent the use of the Macy pictures for a comparison of observers with each other, as suggested in the previous report and again in the present article.
A further point raised by Dr. Pyle merits consideration. An assessment made by an expert who has produced an atlas will generally be made, not from the atlas itself, but [...] difference from the expert may, therefore, be due to the failure of the atlas pictures to reproduce correctly the original RGs. This does not, however, detract from the value of this method of ascertaining the general usefulness of the assessment technic, for such comparisons, if widely made, would enable one to answer the question: How do observers in general, using the facilities at their disposal (the atlas), differ in assessment from an expert with the superior knowledge and facilities (original RGs or intermediates) available to him?
SUMMARY
An observer’s variable error in skeletal age assessment of hand RGs (i.e., the irregular variation between independent readings of the same film) was studied on 1,124 readings of 326 films from 233 children aged 16 months to 17 years. Seventy-nine of the RGs were full-size reproductions in Macy’s Nutrition and Chemical Growth in Childhood; the remainder were actual films of children in Halifax, Canada (healthy Orphanage residents and children examined in a nutrition survey).
There was no significant difference in variable error associated with the atlas (Todd, Greulich-Pyle), age of child, sex, differences between skeletal and chronologic age, differences between children, or differences between RGs of the same child, except for a tendency in the Macy Series for the poorest reproductions to have a larger variable error than the best reproductions. In most readings the individual indicators were assessed separately and the results averaged, but a quicker method (over-all appraisal) did not produce a significantly different variable error. The quick method may be useful in large surveys, although it appears too coarse for the study of individual children.
The observer’s variable error was expressed by standard deviations of approximately three months (Macy Series, both atlases; Nutrition Series, Greulich-Pyle atlas) and four months (Orphanage Series, both atlases). With a standard deviation of three months an assessor must affix an error of ± 8.3 months to his estimate of a child’s progress in skeletal age, in order to obtain confidence limits with 95% probability. If his standard deviation is four months he must allow ± 11.1 months. For evaluation of the assessment method, many observers’ estimates of variable error are needed, and an appeal for data is issued.
After more than 1200 readings had been made the observer’s practice lapsed for about a year. Reassessment of a random sample of RGs then showed, besides variable error, a mean systematic difference of approximately three months from the previous readings of the same films with the same atlas. To avoid this risk, any two films that are to be assessed for skeletal progress should be read within a few weeks of each other, and special precautions are therefore necessary to secure independence of the two readings.
REFERENCES
1. Mainland, D., and Mainland, R. B., Evaluation of skeletal age method of estimating children’s development. I. Systematic errors in assessment of roentgenograms, PEDIATRICS 12:114, 1953.
2. Todd, T. W., Atlas of Skeletal Maturation (Hand), St. Louis, The C. V. Mosby Company, 1937.
3. Greulich, W. W., and Pyle, S. I., Radiographic Atlas of Skeletal Development of Hand and Wrist, Stanford, Calif., The Stanford University Press, 1950.
4. Macy, I. G., Nutrition and Chemical Growth in Childhood, Springfield, Ill., Charles C Thomas, Publisher, 1946, vol. 2.
5. Mainland, D., Elementary Medical Statistics: Principles of Quantitative Medicine, Philadelphia, W. B. Saunders Company, 1952.
6. Thompson, C. M., and Merrington, M., Tables for testing homogeneity of set of estimated variances, Biometrika 33:296, 1946.
7. Bishop, D. J., and Nair, U. S., Note on certain methods of testing for homogeneity of set of estimated variances, J. Roy. Stat. Soc. (supp.) 6:89, 1939.
477 First Avenue

SPANISH ABSTRACT
Valoración del Método de Medir la Edad Ósea Para Estimar el Desarrollo de los Niños
II. Errores Variables en la Apreciación de Radiografías
Se estudió el error variable de uno de los autores en la valoración de la edad ósea radiológica en 1,124 lecturas de 326 placas tomadas a 233 niños, con edad entre 16 meses y 17 años; el error variable se determina por las discrepancias que el observador tiene en las lecturas y observaciones aisladas utilizando las mismas placas. 79 radiografías correspondieron a reproducciones de tamaño natural del libro de Macy “Nutrición y Crecimiento Químico en la Infancia”; el resto de las radiografías pertenecieron a niños de un orfanatorio de Halifax, Canadá, examinados en una encuesta nutricional.
No se encontró diferencia significativa en el error variable debido al atlas (Todd, Greulich-Pyle), edad del niño, sexo, diferencias entre las edades ósea y cronológica, diferencias entre los niños, y diferencias entre las radiografías del mismo niño, y sí tendencia de las peores reproducciones de la serie de Macy de presentar un error variable mayor que las mejores reproducciones. En la mayor parte de las lecturas los hallazgos individuales se valoraron separadamente y los resultados se promediaron; el método más rápido de apreciación general no dió error variable significativamente diferente. Este método rápido es útil en grandes encuestas pero grosero y burdo en el estudio de niños individuales.
El error variable del observador de este artículo se manifestó con desviaciones standard de aproximadamente 3 meses para la serie de Macy con ambos atlas y la serie de la encuesta nutricional con el de Greulich-Pyle, y de 4 meses en la serie del orfanatorio con ambos atlas. Con la desviación standard de tres meses cualquier asesor puede tener un error de ± 8.3 meses en su estimación del progreso de la edad ósea del niño para poder tener límites confiables con un 95% de probabilidades. Si su desviación standard es de 4 meses, debe permitirse un error de 11.1 meses.
Después de realizar más de 1,200 lecturas, el autor dejó pasar 12 meses. La revaloración de radiografías tomadas al azar mostró entonces, además del error variable, una diferencia media sistemática de cerca de 3 meses con relación a las lecturas realizadas previamente con las mismas placas y con los mismos atlas. Para evitar este riesgo, los autores sugieren que cuando se han de valorar dos placas desde el punto de vista del progreso óseo, su lectura debe realizarse con unas cuantas semanas de diferencia, tomándose precauciones especiales para obtenerse independencia en las dos