Adaptations of the Denver Developmental Screening Test: A Study of Preschool Screening
Raymond A. Sturner, MD, Mark Horton, MD, Sandra C. Funk, PhD, Joanne Barton, RN, Thomas E. Frothingham, MD, and Joseph N. Cress, PhD
From the Department of Pediatrics, Duke University Medical Center, Durham, North Carolina

ABSTRACT. Developmental screening tests are only rarely used in pediatric practice, reportedly because of lack of available time. This study evaluated a shortened form of the Denver Developmental Screening Test (DDST-S) consisting of only those items immediately to the left of the child’s individual age line, three in each sector or a total of 12. This DDST-S was administered to four cohorts of preschool children (aged 52 to 64 months), 1,819 children in all. Subsamples of these children returned within three months for one of several developmental (criterion) tests (McCarthy Scales of Children’s Abilities, the complete DDST, or the Stanford-Binet). The DDST-S was scored by selecting the profile of passes and failures most predictive of McCarthy test results, using indices of copositivity, conegativity, underreferral, and overreferral as the basis for the decision. With this scoring system, the DDST-S identified low scorers (those scoring less than 70) on the Stanford-Binet (sensitivity = .67, specificity = .95, predictive value = .54, underreferral = 2.5%, overreferral = 4%) as well as the complete DDST did. Low scoring children could thus be identified in less than half the time required by the complete DDST. A two-stage DDST-S and DDST procedure was found to have even greater predictive value (76%; 100% if borderline cases [score of 70 to 80] are considered positive) than either form alone. Pediatrics 69:346-350, 1982; developmental screening, pediatric screening, preschool child development, well child care.
Received for publication Sept 11, 1980; accepted May 21, 1981.
Reprint requests to (R.A.S.) Department of Pediatrics, Box 3890, Duke University Medical Center, Durham, NC 27710.
PEDIATRICS (ISSN 0031 4005). Copyright © 1982 by the American Academy of Pediatrics.

In a recent survey of pediatricians,1 97% of the respondents felt that formal developmental screening should be performed routinely at the time of the “well child visit.” However, very few of these pediatricians actually accomplished this ideal. Only 10% to 15% said they used any of the standard developmental tests, and most of these tested only children already suspect developmentally. The reason most often given for failure to use developmental screening tests was that they are too time-consuming.
When these tests are used in pediatric practice, the one most commonly used1,2 is the Denver Developmental Screening Test (DDST).3 This test is reliable,4 and its validity has been documented through comparison with recognized full diagnostic developmental tests5,6 and follow-up school performance.7 However, the above survey indicates that the 15 to 20 minutes required for testing each child is an obstacle to its pediatric usage.
For scoring purposes, all DDST items are considered to be in one of two categories. One category consists of those items expected to be achieved by more than 90% of children. These items appear to the left of the child’s age line on the standard DDST score sheet and are called “delays” if the child fails. The other items on the DDST are expected to be achieved in 25% to 89% of children, depending on where the age line intersects the individual item. The pass/fail criterion for the DDST is based primarily on delays. The study reported here explored the possibility of detecting developmental problems by initially administering only the items expected to be achieved in more than 90% of children (potential “delay” items). The study is presented in two parts: (1) the development of a scoring system from analysis of data for one year of the study and (2) a validation study incorporating data from an additional three years.
The studies being reported here are those which have, in part, led to the current recommendations
by the originators of the DDST for use of a
METHODS
Setting and Sample
The study was conducted in a medically underserved rural area (Person County, North Carolina). To identify the health needs of the children in this community, health and developmental screening are performed in conjunction with registration for kindergarten. Each year, in February, parents of children who will begin kindergarten in August are urged by radio and newspaper publicity to attend this combined health screening and school registration week. The present study was part of a larger investigation of the value of several health and developmental screening procedures9 and encompassed four years of the program (1975 through 1978). During this period, the number of children seen annually varied between 382 and 475. According to school officials, at least 95% of the appropriate aged children in the county (aged 52 to 64 months) were tested each year.
Demographic data are available for 1977 and 1978, and inasmuch as the data are very similar for these two years, they are combined. There were approximately equal numbers of male (52%) and female (48%) children in the sample, with blacks representing 43%. The children were predominantly from rural homes (69%), and the majority of the households were headed by fathers (71%). Of the heads of household, 56% had either graduated from high school or received post-high school education. All income levels were represented, with 38% of the sample reporting incomes of $10,000 or more per year. Previous census data indicate that this county has somewhat higher levels of education and income than the state as a whole, and thus more closely approximates national norms.10
Determination of Scoring: Year 1
The short form of the DDST (DDST-S) evaluated here included items immediately to the left of, but not touching, the child’s individual age line: three items in each of the four sectors (gross motor, fine motor adaptive, personal social, and language), or a total of 12 items. To determine the best method of scoring this shortened form, both the DDST-S and a criterion test (McCarthy Scales of Children’s Abilities)11 were administered to a cohort of children. Public health nurses who had completed recommended DDST training under the supervision of a clinical psychologist and for whom reliability had been established (96% agreement) administered the DDST-S to 440 children. Because it was not feasible to perform criterion testing on all of the children to whom the short form was administered, a stratified sample was recalled for McCarthy testing. Because some developmentally handicapping conditions are rare in the general population but important to recognize, the stratification procedure was designed to maximize the likelihood of including these subjects within the group to receive the criterion test by oversampling DDST-S failures. McCarthy testing was performed on the 129 children in the stratified sample within three months.
As a stratified sampling technique with unequal proportions was used to select the sample, the resulting group could not be considered representative of the total population. In all analyses, subjects’ scores were weighted by a reciprocal function of the subject’s probability of selection to approximate the population distribution. Potential DDST-S scoring methods that closely resembled the unshortened DDST scoring method (ignoring items passing through the age line) were computed on weighted scores of the stratified sample (weighted n = 255) and compared on their ability to predict children scoring low (scores less than 70) on the McCarthy General Cognitive Index based on standard indices12 used to evaluate developmental screening tests (Figure).

                            Criterion Test Results
Screening Test Results      Problem      No Problem
Problem                        a             b
No Problem                     c             d

Figure. Formulas for clinical indices are as follows: sensitivity (copositivity) = a/(a + c); specificity (conegativity) = d/(b + d); predictive validity = a/(a + b); overreferral rate = b/(a + b + c + d); underreferral rate = c/(a + b + c + d); percentage agreement = (a + d)/(a + b + c + d).

Of the potential scoring systems, several had similar indices. The following system was chosen from among them because it had the highest sensitivity and had other indices that were at least as good as the other systems: any child who failed two or more items in any single category or who failed a single item in three or more categories was considered to have failed the screening test; failing one item in one or two categories was considered questionable; and not failing any items was considered a pass. Of the 46 children receiving low scores (<70) on the McCarthy Scales, 28 were correctly identified by the DDST-S (sensitivity = .61), and 162 of the 207 scoring borderline or above (≥70) were correctly identified (specificity = .78). The rates of under- and overreferral were 7% and 18%, respectively.
Data collected on 50 randomly selected children indicated that the average DDST-S testing time was seven minutes. This included some time to check the child’s immunization record and comment regarding the need for further immunizations, as well as time for greeting the mother, explaining the test procedure, and recording results. Thus, seven minutes may be a slight overestimate of actual test administration time.

TABLE 1. Denver Developmental Screening Test Short Form (DDST-S) Prediction of Stanford-Binet Scores*

                      Stanford-Binet Score
DDST-S                Low (<70)   Borderline (70-79)   Av or Above (≥80)   Total
Failure                   18              4                    11             33
Questionable               8             27                    48             83
Pass                       1             16                   229            246
Total                     27             47                   288            362

TABLE 2. Two-Stage DDST-S/DDST Procedure Prediction of Stanford-Binet Scores*

                      Stanford-Binet Score
DDST-S/DDST           Low (<70)   Borderline (70-79)   Av or Above (≥80)   Total
Failure (abnormal)        19              6                     0             25
Questionable               5              8                     5             18
Pass (normal)              4             31                   283            318
Total                     28             45                   288            361
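The formulas given in the Figure can be sketched in code and applied to the counts of Table 1. This is only an illustrative sketch: the class name `TwoByTwo` and the collapsing of Table 1 into a 2 x 2 layout (DDST-S "failure" as a screening problem, Stanford-Binet score below 70 as a criterion problem) are assumptions made for the example, and small differences from the published figures may arise from rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of the Figure: rows are screening results, columns are criterion results."""
    a: float  # screening problem, criterion problem
    b: float  # screening problem, criterion no problem
    c: float  # screening no problem, criterion problem
    d: float  # screening no problem, criterion no problem

    def indices(self) -> dict:
        """Return the clinical indices defined in the Figure."""
        n = self.a + self.b + self.c + self.d
        return {
            "sensitivity": self.a / (self.a + self.c),
            "specificity": self.d / (self.b + self.d),
            "predictive_value": self.a / (self.a + self.b),
            "overreferral": self.b / n,
            "underreferral": self.c / n,
            "agreement": (self.a + self.d) / n,
        }


# Table 1 collapsed to 2 x 2 (an assumption of this sketch):
# a = failures with low scores, b = failures with borderline or higher scores,
# c = questionable or pass with low scores, d = everything else.
table1 = TwoByTwo(a=18, b=4 + 11, c=8 + 1, d=27 + 48 + 16 + 229)
table1_indices = table1.indices()
for name, value in table1_indices.items():
    print(f"{name}: {value:.3f}")
```

The same class can be reused for Table 2 by substituting the counts from its failure row and column totals.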
Validation of DDST-S: Years 2 to 4
Using the same procedure as before, the DDST-S was administered to three additional cohorts of children (1,738 children). In each cohort, a stratified sample of children (oversampling DDST-S failures as before) was recalled within three months to receive the unshortened DDST (401 children). Following administration of the DDST, one of the three cohorts received the Stanford-Binet.13 DDST testing was performed by experienced nurse clinicians or psychometrists; all Stanford-Binet testing was performed by the psychometrists. Testers did not know the previous testing results. The shortened form of the DDST was scored as described above. The unshortened DDST was scored according to the revised scoring system recommended by the authors in the 1975 manual.14
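The DDST-S scoring rule chosen in the Year 1 analysis (failure for two or more failed items in any one sector or failed items in three or more sectors, questionable for a single failed item in one or two sectors, pass otherwise) can be written as a short function. The following is a minimal sketch; the function name, the sector labels used as strings, and the input format (one sector name per failed item) are hypothetical conveniences, not part of the published procedure.

```python
from collections import Counter

# The four DDST sectors; each contributes three items to the DDST-S.
SECTORS = ("gross motor", "fine motor adaptive", "personal social", "language")


def score_ddst_s(failed_items):
    """Classify a DDST-S result.

    failed_items: iterable of sector names, one entry per failed item.
    Returns "failure", "questionable", or "pass".
    """
    failures = Counter(failed_items)
    unknown = set(failures) - set(SECTORS)
    if unknown:
        raise ValueError(f"unknown sector(s): {sorted(unknown)}")
    if any(count > 3 for count in failures.values()):
        raise ValueError("the DDST-S has only three items per sector")

    sectors_with_failure = len(failures)
    # Two or more failed items in one sector, or failures in three or more sectors.
    if any(count >= 2 for count in failures.values()) or sectors_with_failure >= 3:
        return "failure"
    # One failed item in one or two sectors.
    if sectors_with_failure >= 1:
        return "questionable"
    return "pass"


print(score_ddst_s([]))                                    # no failed items
print(score_ddst_s(["language"]))                          # one item, one sector
print(score_ddst_s(["language", "language"]))              # two items, one sector
print(score_ddst_s(["language", "gross motor", "personal social"]))
```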
As before, for analysis purposes, the data for each subject were weighted by a reciprocal function of his/her probability of selection to approximate the population distribution (weighted n = 1,379). The DDST-S was related to the unshortened DDST and was able to classify correctly 68 of the 86 children who received failing scores on the unshortened DDST and 1,195 of the 1,293 scoring in the questionable and passing categories. Sensitivity was .79 and specificity, .92; the underreferral rate was only 1% and overreferral, 7%.
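The weighting step can be illustrated with a small sketch. The text states only that scores were weighted by "a reciprocal function" of the probability of selection; the plain inverse 1/p used here is an assumption, and the recalled sample and selection probabilities in the example are invented for illustration, not study data.

```python
def weighted_cells(records):
    """Accumulate weighted 2 x 2 cell counts (a, b, c, d as in the Figure).

    records: iterable of (screening_problem, criterion_problem, p_select),
    where p_select is the child's probability of being recalled.
    """
    cells = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0}
    for screening_problem, criterion_problem, p_select in records:
        if not 0.0 < p_select <= 1.0:
            raise ValueError("selection probability must be in (0, 1]")
        weight = 1.0 / p_select  # assumed form of the reciprocal function
        if screening_problem:
            key = "a" if criterion_problem else "b"
        else:
            key = "c" if criterion_problem else "d"
        cells[key] += weight
    return cells


# Hypothetical recalled sample: every screening failure is recalled (p = 1.0),
# but only one in four of the other children is recalled (p = 0.25).
recalled = (
    [(True, True, 1.0)] * 3
    + [(True, False, 1.0)]
    + [(False, True, 0.25)]
    + [(False, False, 0.25)] * 5
)
cells = weighted_cells(recalled)
weighted_sensitivity = cells["a"] / (cells["a"] + cells["c"])
print(cells)
print(round(weighted_sensitivity, 2))
```

In this invented example the unweighted sensitivity would be 3/4, whereas the weighted estimate is 3/7, which shows why oversampling screening failures requires reweighting before indices are computed.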
A more important means of assessing the usefulness of the DDST-S was felt to be its ability to predict a recognized criterion diagnostic test (not just another screen) which is independent of the DDST-S in content and was not used to develop the scoring system (as was the McCarthy). The Stanford-Binet was the criterion chosen. The data from the one cohort receiving the Stanford-Binet indicate that the DDST-S did an even better job at predicting the Stanford-Binet than at predicting the McCarthy (Table 1). Of the 27 low-scoring subjects (those scoring less than 70), 18 were correctly identified as were 320 of the 335 subjects whose scores were borderline or above (sensitivity = .67 and specificity = .95). When the unshortened DDST results were compared with the Stanford-Binet results for these same subjects, the indices were almost identical, indicating that the longer test provided no better indices of prediction (sensitivity = .68, specificity = .95, underreferral = 2.5%, overreferral = 4.4%).
The same data were used to evaluate the possibility of enhancing the predictive value of the DDST-S by using a two-stage screening process in which the complete DDST was administered to all children with a questionable or failure score on the DDST-S before referring any for diagnostic testing. As can be seen in Table 2, the results indicate that sensitivity and specificity were not improved (.68 and .98, respectively) when the DDST was performed on all children with failing or questionable scores on the DDST-S, and only those failing this second screening were referred for diagnostic testing. However, the predictive value (chance that the referral is correct) did increase from .54 for the DDST-S alone to .76 with the two-stage procedure, reflecting somewhat lower underreferral and overreferral rates (2.5% and 1.7%, respectively). When those children who failed and those who had questionable scores were referred from the second-stage DDST, sensitivity was improved over the DDST-S used alone, but there was no improvement in the predictive value (.56 vs .54 with the DDST-S alone).

Notes to Table 1. Stanford-Binet score categories are low (<70), borderline (70-79), and average or above (≥80). Indices for prediction of low scorers on Stanford-Binet from failures on DDST-S are as follows: sensitivity (copositivity), .67; specificity (conegativity), .95; predictive value, .54; underreferral rate, 2.5%; overreferral rate, 4.1%; % agreement, 93.1%.
* Number of children receiving specified scores is shown; total actual n = 116; total weighted n = 362.

Notes to Table 2. Stanford-Binet score categories are low (<70), borderline (70-79), and average or above (≥80); total weighted n = 361 (28 low, 45 borderline, 288 average or above). Indices for prediction of low scores on Stanford-Binet from failures on DDST-S/DDST are as follows: sensitivity (copositivity), .68; specificity (conegativity), .98; predictive value, .76; underreferral rate, 2.5%; overreferral rate, 1.7%; % agreement, 95.8%.
* Complete DDST was performed on those children
Up to this point, we have been exclusively concerned with predicting low scorers (those scoring less than 70) on the criterion tests. However, borderline children (those scoring between 70 and 80) may also be at risk for later classroom dysfunction.
If children scoring less than 80 are considered to be true “positives,” the two-stage procedure shows some advantage over direct referral from the DDST-S. This is because all of the “false-positives” from the two-stage procedure were in the borderline category, whereas only 27% of the false-positives from the DDST-S fell into this relatively important category (Tables 1 and 2). Unfortunately, with any combination of tests (DDST-S, DDST, or two-stage), approximately 30% of the sample would have had to be referred for diagnostic testing in order to identify the majority of the borderline children.
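The two-stage referral logic and the effect of counting borderline scorers as true positives can be sketched as follows. The function names and the string labels for results are hypothetical; the counts are the failure rows of Tables 1 and 2 (low, borderline, average or above).

```python
VALID_RESULTS = {"failure", "questionable", "pass"}


def two_stage_referral(ddst_s_result, full_ddst_result=None):
    """Return True if the child is referred for diagnostic testing.

    Stage 1: a DDST-S pass ends screening with no referral.
    Stage 2: a questionable or failing DDST-S leads to the complete DDST,
    and only a failure on the complete DDST leads to referral.
    """
    if ddst_s_result not in VALID_RESULTS:
        raise ValueError(f"unexpected DDST-S result: {ddst_s_result!r}")
    if ddst_s_result == "pass":
        return False
    if full_ddst_result not in VALID_RESULTS:
        raise ValueError("a complete DDST result is required at the second stage")
    return full_ddst_result == "failure"


def predictive_value(failure_row, borderline_positive=False):
    """Share of referred children who are true positives.

    failure_row: (low, borderline, average_or_above) counts among screen failures.
    """
    low, borderline, average = failure_row
    true_positives = low + (borderline if borderline_positive else 0)
    return true_positives / (low + borderline + average)


ddst_s_failures = (18, 4, 11)    # Table 1, failure row
two_stage_failures = (19, 6, 0)  # Table 2, failure row

print(round(predictive_value(ddst_s_failures), 2))
print(round(predictive_value(two_stage_failures), 2))
print(round(predictive_value(two_stage_failures, borderline_positive=True), 2))

# Share of DDST-S false positives (score of 70 or above) that were borderline.
borderline_share = ddst_s_failures[1] / (ddst_s_failures[1] + ddst_s_failures[2])
print(round(borderline_share, 2))
```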
DISCUSSION
The clinical indices derived for prediction of the criterion by the DDST-S (Tables 1 and 2) indicate that the DDST-S approach should be useful as a screening test. It should be clear that these results apply only to the age range (52 to 64 months) investigated by this study. However, inasmuch as more than 1,800 children of a rather wide distribution of income and education levels were assessed, these results should represent good estimates of test effectiveness, at least for this age group of a predominantly rural population. Further, these children aged 52 to 64 months are approaching the age of school entry and are therefore the group for whom screening is mandated by federal legislation (PL 94-142).15
Referral for Stanford-Binet testing directly from the DDST-S was virtually identical with that obtained for the complete DDST. Inasmuch as the DDST-S requires less than half the time required for administration of the unshortened DDST, the DDST-S offers an obvious advantage for pediatric practice.
A parent-administered prescreening developmental questionnaire16 was devised to shorten the time required for screening by administering complete DDSTs only to children who seemed to be at risk on this parent report inventory on two occasions.8 Inasmuch as the prescreening developmental questionnaire (PDQ) has been shown to exhibit differential patterns of prediction based on level of socioeconomic status (SES) (with better prediction in the higher levels),17 additional analyses were performed to determine whether similar differences exist for the DDST-S. The sample was stratified by education level of the head of household for one set of analyses and by annual family income for a second test. Clinical indices relating the DDST-S to the Stanford-Binet and the McCarthy General Cognitive Index scores were calculated within each stratum. Although the number of children scoring low on the criterion measures was small in some of the upper levels (and thus the results must be considered tentative in these cases) the sensitivity and specificity indices were similar to those reported earlier in this paper for all levels of income and education. However, the predictive values were substantially lower in the upper levels of education and income.
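The paper does not state why predictive values fell while sensitivity and specificity held steady. One arithmetic relationship that may be relevant is that, for fixed sensitivity and specificity, predictive value depends on how common low scores are in the group screened. The sketch below illustrates this under stated assumptions: the function name is hypothetical, and the two prevalence values in the comparison are invented for illustration rather than taken from the study's strata.

```python
def predictive_value_from_prevalence(sensitivity, specificity, prevalence):
    """Predictive value of a positive screen for given test indices and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Consistency check against Table 1: 27 of 362 children scored low,
# sensitivity was 18/27, and specificity was 320/335.
overall = predictive_value_from_prevalence(18 / 27, 320 / 335, 27 / 362)
print(round(overall, 3))

# Hypothetical strata with the same sensitivity and specificity but
# different proportions of low scorers (values invented for illustration).
for prevalence in (0.10, 0.02):
    value = predictive_value_from_prevalence(0.67, 0.95, prevalence)
    print(prevalence, round(value, 2))
```

Whether prevalence differences account for the pattern reported here cannot be determined from the published figures alone.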
Frankenburg, the originator of the DDST, has followed up on our present study, and has confirmed the usefulness of the DDST-S in low SES populations for all ages. His group has recommended the prescreening developmental questionnaire (PDQ) for high SES populations and a “two-stage” DDST-S/DDST procedure similar to the one outlined in this paper for low SES. Instead of calling children back for the second stage, Frankenburg’s group accomplished the entire procedure in one setting. That is, the remaining DDST items needed to complete an unshortened DDST are administered on the spot only for those children with a delay on one of the 12 DDST-S items. It was a two-stage procedure that yielded the best predictive value in our study.
Developmental screening is typically designed to identify those children who will score more than 2 SD below the mean on diagnostic developmental and intelligence tests, as in the current study. This is the group defined by convention as mentally retarded. Although it appears from this study and others5,6 that nearly all children scoring in the abnormal category on the DDST have current developmental difficulties and that future problems in school can be anticipated,7 it should be remembered that there are additional children with present developmental difficulties of a behavioral nature that are not identified. Furthermore, the majority of those proving to have later classroom dysfunction are not flagged by DDST tests. The scoring systems
Despite their limitations, the adaptations of the DDST described here are cost efficient with regard to staff time and do identify preschool children at very high risk for later classroom dysfunction. These advantages lead us to recommend their use in routine pediatric practice.
ACKNOWLEDGMENTS
This work was supported in part by the Department of Health and Human Services, Bureau of Community Health Service, Maternal and Child Health Research grant MC-R-370427-01-0, a contract with the North Carolina Department of Human Resources, Maternal and Child Health Branch, and the Robert Wood Johnson Foundation.
We thank the public health nurses of Person County for their work as testers and coordinators of the project. The Person County Board of Education provided essential assistance through collaborating in the joint screening effort under the direction of Greta Jeffers. Dr William Frankenburg and his associate, Ceil Coons, are acknowledged for their critical review and helpful comments. Madalu Wright is recognized for her assistance with data tabulation and coordination of criterion testing. Appreciation is expressed also to Teressa Coleman for secretarial assistance in preparing the manuscript.
REFERENCES
1. Smith RD: The use of developmental screening tests by primary-care pediatricians. J Pediatr 93:524, 1978
2. Moore BD: Implementing the developmental assessment components of the EPSDT program. Am J Orthopsychiatry 48:22, 1978
3. Frankenburg WK, Dodds JB: The Denver Developmental Screening Test. J Pediatr 71:181, 1967
4. Frankenburg WK, Camp BW, Van Natta PA, et al: Reliability and stability of the Denver Developmental Screening Test. Child Dev 42:1315, 1971
5. Frankenburg WK, Camp BW, Van Natta PA: Validity of the Denver Developmental Screening Test. Child Dev 42:475, 1971
6. Frankenburg WK, Goldstein A, Chabot A, et al: The Revised Denver Developmental Screening Test: Its accuracy as a screening instrument. J Pediatr 79:988, 1971
7. Camp BW, Van Doorninck WJ, Frankenburg WK, et al: Preschool developmental testing in prediction of school problems. Clin Pediatr 16:257, 1977
8. Fandal AW, Kemper MG, Frankenburg WK: Needed: Routine developmental screening in all children. Pediatric Basics. Ormond Beach, FL, Gerber Publications, 1979, vol 24
9. Sturner RA, Funk SG, Barton J, et al: Simultaneous screening for child health and development: A study of visual/developmental screening of preschool children. Pediatrics 65:614, 1980
10. US Department of Commerce, Social and Economic Statistics Administration, Bureau of Census: Characteristics of the Populations, pt 35: North Carolina. Table 158 and pp 35-605, 1973
11. McCarthy D: A Manual for the McCarthy Scale of Children’s Abilities. New York, Psychology Corp, 1972
12. Frankenburg WK, Camp BW: Pediatric Screening Tests. Springfield, IL, Charles C Thomas Publisher, 1975
13. Terman LM, Merrill MA: Stanford-Binet Intelligence Scale Manual for the Third Revision Form L-M. Boston, Houghton Mifflin Co, 1972
14. Frankenburg WK, Dodds JB, Fandal AW, et al: Denver Developmental Screening Test Reference Manual revised 1975 edition. Denver, University of Colorado Medical Center
15. The education for all handicapped act of 1975, PL 94-142; 20 USC 1401 et seq: Federal Register 42(163):42474 (Aug 23) 1977
16. Frankenburg WK, van Doorninck WJ, Liddell T, et al: The Denver Prescreening Developmental Questionnaire (PDQ). Pediatrics 57:744, 1976
17. Kemper M: The two of three stage screening procedure in developmental screening, in Proceedings of the Second International Conference on Developmental Screening.