The Thorny Nature of Predictive Validity Studies on Screening Tests for Developmental-Behavioral Problems


COMMENTARY


Kevin Marks, MD (a), Frances Page Glascoe, PhD (b), Glen P. Aylward, PhD (c), Michael I. Shevell, MD (d), Paul H. Lipkin, MD (e, f), Jane K. Squires, PhD (g)

(a) Department of Pediatrics, PeaceHealth Medical Group, Eugene, Oregon;
(b) Department of Pediatrics, Vanderbilt University, Nashville, Tennessee;
(c) School of Medicine, Southern Illinois University, Springfield, Illinois;
(d) Departments of Neurology/Neurosurgery and Pediatrics, McGill University, Montreal, Quebec, Canada;
(e) Division of Neurology and Developmental Medicine, Kennedy Krieger Institute, Baltimore, Maryland;
(f) Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland;
(g) College of Education, University of Oregon, Eugene, Oregon

The authors have indicated they have no financial relationships relevant to this article to disclose.

Over the last few years, several researchers have focused on the predictive validity of developmental-behavioral screening tools.1–11 Predictive validity studies compare the results of a screening test administered at a single point in time (referred to hereafter as time 1) to the results of a diagnostic test or battery administered 3 months to several years later (time 2).12

Unlike concurrent validity studies, in which the screening and diagnostic measures are administered at the same time to determine the sensitivity and specificity of a screen, predictive validity studies depend on longitudinal measurement and focus on how well a screen predicts future developmental status. As a consequence, predictive validity research offers a critical test of whether a screening test measures dimensions of development that are enduring and have a meaningful impact on children’s long-term outcomes. As the American Academy of Pediatrics has already stated, a concerning result on a high-quality screen should generate a referral to early intervention or special education, services that have been shown to improve a child’s developmental, behavioral, and/or school-readiness trajectory.9,13–17
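The difference between the two designs can be made concrete with a minimal Python sketch. It assumes that each child’s screen result and diagnostic classification have been reduced to yes/no flags; the function name `accuracy_vs_reference` and the eight-child data set are hypothetical and invented purely for illustration, not drawn from any study cited here. The same time-1 screen is scored once against a diagnostic battery given at time 1 (concurrent validity) and once against a battery given at time 2 (predictive validity).

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float


def accuracy_vs_reference(screen_positive, reference_positive):
    """Sensitivity/specificity of screen flags against a reference classification.

    Both arguments are equal-length lists of booleans, one entry per child:
    True in `screen_positive` means the child failed the screen, and True in
    `reference_positive` means the diagnostic battery found a problem.
    """
    if len(screen_positive) != len(reference_positive):
        raise ValueError("one screen result and one reference result per child")
    tp = fp = fn = tn = 0
    for screen, reference in zip(screen_positive, reference_positive):
        if screen and reference:
            tp += 1          # flagged and truly affected
        elif screen:
            fp += 1          # flagged but not affected
        elif reference:
            fn += 1          # affected but missed by the screen
        else:
            tn += 1          # correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain affected and unaffected children")
    return Accuracy(sensitivity=tp / (tp + fn), specificity=tn / (tn + fp))


# Hypothetical data for 8 children (0-based indices); all lists are invented.
screen_t1 = [True, True, True, False, False, False, False, False]
diagnosis_t1 = [True, True, False, False, False, False, False, True]  # same visit
diagnosis_t2 = [True, False, False, True, False, False, False, True]  # months later:
# the child at index 1 no longer meets criteria (eg, caught up after
# intervention), and the difficulty of the child at index 3 emerged after time 1.

concurrent = accuracy_vs_reference(screen_t1, diagnosis_t1)
predictive = accuracy_vs_reference(screen_t1, diagnosis_t2)
print("concurrent:", concurrent)
print("predictive:", predictive)
```

In this toy example, sensitivity falls from about 67% to 33% and specificity from 80% to 60% between the two comparisons, even though the screen itself is unchanged: one flagged child caught up and another child’s difficulty appeared only after time 1.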

Nevertheless, predictive validity studies on screening tests are also fraught with challenges, particularly because young children change rapidly. For example, at 18 months some children may not be talking much; by 24 months we expect much more. Some children who seemed fine at 18 months have begun to have trouble (eg, combining words). Others will have had the benefit of intervention and overcome earlier deficits. The adverse impact of psychosocial risk also becomes more apparent with age. Developmental functions can be emergent, latent (not yet measurable), delayed, deficient, or disordered.13 Measuring the moving target that is child development is not impossible, but its difficulty is one of the reasons that professional organizations such as the American Academy of Pediatrics recommend ongoing surveillance at each well-child visit along with periodic screening using high-quality measures. Early intervention programs respond to referrals based on screening tests with additional testing, with intervention when indicated and, if not, often with ongoing monitoring, in full recognition that a child’s need for assistance often changes.

Opinions expressed in these commentaries are those of the author and not necessarily those of the American Academy of Pediatrics or its Committees.

www.pediatrics.org/cgi/doi/10.1542/peds.2007-3142

doi:10.1542/peds.2007-3142

Accepted for publication Jun 20, 2008

Address correspondence to Frances Page Glascoe, PhD, 25 Bragg Dr, East Berlin, PA 17316. E-mail: [email protected]

Accordingly, we, as screening test researchers and authors, encourage our colleagues to apply skill and insight when conducting predictive validity studies and recommend the following:

1. Account for intervening variables (ie, what happens to children during the interval between screening and subsequent administration of a diagnostic measure). Did these variables alter the child’s developmental and/or behavioral trajectory? Do these variables differ significantly between the populations that were and were not screened? This accounting should include documentation of both medical and psychosocial interventions at times 1 and 2. Intervening medical conditions and treatments (including recovery) occurring between times 1 and 2 should be documented (eg, iron-deficiency anemia, obstructive sleep apnea, anticonvulsant therapy). Other processes that might have enhanced development between measurements should also be considered (eg, whether parents were given suggestions for developmental promotion activities at time 1, whether by time 2 they had implemented those suggestions, and whether families enrolled in the interim in any of a range of intervention services [eg, housing assistance, academic tutoring, speech-language therapy, parenting classes]). Developmental and/or behavioral screens are unlike many medical screens (eg, testing for blood lead levels) in that they may have an “observational effect.” Parent-based questionnaires can serve as a teaching tool when the parent thoughtfully fills out the answers. Screens may also alter clinicians’ conversations or actions during or after the visit. Most high-quality screening toolkits come with interventional parent handouts and/or activity sheets.

2. Ensure that the criterion battery is of good quality. In selecting diagnostic tests, attention should be paid to whether they have recent standardization (preferably within the last 10 years) on a large, nationally representative sample; assess a broad range of developmental skills (preferably via domain scores, so that strengths and weaknesses in screening test performance can be viewed); have been validated against other high-quality diagnostic tests, so that measurement strengths and weaknesses are thoroughly identified; and have proven levels of various kinds of reliability (eg, test-retest, interrater, internal consistency). Screens are usually broadband, and the reference measures need to be broadband as well. Criterion tests should also focus on outcomes and thus include critical variables such as school performance, in-grade retention, enrollment in special services, and graduation rates. Nevertheless, selecting the criterion battery will always be a thorn on the stem of developmental-behavioral research, because there is no truly perfect “rose” to serve as the reference or gold standard.

3. Administer both the criterion battery and the screening measure at times 1 and 2, which provides an indicator of developmental stability for each child and for each test. The resulting data serve as useful covariates in accounting for growth (or lack thereof) and provide valuable guidance on how well both screening and reference measures account for expected developmental changes. Such a design also provides valuable real-world information on children’s progress and on whether the screening test under study can predict future eligibility for services. Focusing on long-term outcomes in later childhood or adulthood affords a comparison between the results of the screen, with or without the criterion battery, at time 1 and diagnostic testing at time 2. Two challenges remain, though. The first is that in some cases the criterion battery may require different measures at each point in time: some tests have a limited age range, and others must be selected in their stead. Even so, using other measures to define outcomes offers helpful variability and, when they agree, added potency in confirming conclusions. The second, and more important, challenge is that problematic screening/diagnostic testing results in early childhood often trigger a number of interventions. Long-term predictive validity research, again, has a thorn: testing can lead to altered outcomes.17 Research instead should focus on how and when screens can better identify medical and/or developmental conditions or disorders that are responsive to treatment(s). Investigating which populations respond best to which modalities of early intervention or special education is a more direct approach to improving long-term outcomes.

4. Thoroughly analyze the data set. If prediction from a screen to a diagnostic measure does not meet standards for concurrent accuracy (typically, sensitivity and specificity of ≥70% relative to diagnostic measures administered along with the screen at time 1), determine which domains or items on the screen performed well. Gross motor performance, for example, may not have strong predictive validity, but receptive language performance may well be a long-term indicator of success or problems. Because stable performance is characteristic of children with severe disabilities, when assessing those with potentially milder problems, consider which cutoffs on the diagnostic measure (usually set at various SDs below the mean) best capture the results of a screening test. Another worthwhile approach is to apply, to the criterion battery, the criteria used to determine eligibility for early intervention and special education, because doing so helps ensure that the research findings have ecological validity.

5. Appreciate findings in which screening test results predict the majority of diagnostic test performance at time 2, even if at levels below those desired for concurrent accuracy. Given that screening measures are brief by definition, they include only a few items at any 1 age level. Because development is dynamic and developmental problems evolve, it is impressive that any screening results at time 1 identify the majority of children with and without difficulties at time 2. Most predictive validity study results on diagnostic measures (intelligence tests, educational batteries, etc) are expressed as correlations (an effect size) or percentage of variance accounted for. For clinical decision-making on the basis of screening tests, such reporting is understandably less than satisfactory. Computing other measures of effect size, such as odds ratios, offers an alternative, as does tolerance, if not admiration, of sensitivity/specificity figures for predictive validity studies that may be <70% but are, nevertheless, much greater than chance.
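Recommendations 4 and 5 can also be sketched in Python. The example assumes, hypothetically, that the time-2 diagnostic measure reports standard scores with a mean of 100 and an SD of 15 (a common convention, but the real test manual should be checked), that scores at or below the cutoff count as delayed, and it uses invented data for 12 children. For cutoffs at 1, 1.5, and 2 SDs below the mean, it reports sensitivity, specificity, whether both reach the 70% concurrent benchmark, Youden’s J (sensitivity + specificity - 1, which is 0 for a screen that flags children at random), and the diagnostic odds ratio (1 at chance). The names `two_by_two` and `summarize` are illustrative only.

```python
# Hypothetical scale: many diagnostic batteries report standard scores with
# mean 100 and SD 15, but confirm this against the actual test manual.
MEAN, SD = 100.0, 15.0


def two_by_two(flagged, delayed):
    """Count (tp, fp, fn, tn) for paired per-child yes/no lists."""
    if len(flagged) != len(delayed):
        raise ValueError("flagged and delayed must have the same length")
    tp = fp = fn = tn = 0
    for f, d in zip(flagged, delayed):
        if f and d:
            tp += 1
        elif f:
            fp += 1
        elif d:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(flagged_t1, scores_t2, sds_below):
    """Accuracy of a time-1 screen against a time-2 cutoff `sds_below` SDs under the mean."""
    cutoff = MEAN - sds_below * SD
    delayed_t2 = [score <= cutoff for score in scores_t2]  # at or below cutoff = delayed
    tp, fp, fn, tn = two_by_two(flagged_t1, delayed_t2)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both delayed and non-delayed children at this cutoff")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Diagnostic odds ratio; if any cell is empty, add 0.5 to every cell
    # (Haldane-Anscombe correction) so the ratio stays finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    return {
        "cutoff": cutoff,
        "sensitivity": sens,
        "specificity": spec,
        "meets_70_benchmark": sens >= 0.70 and spec >= 0.70,
        "youden_j": sens + spec - 1.0,  # 0 corresponds to random flagging
        "diagnostic_odds_ratio": dor,  # 1 corresponds to random flagging
    }


# Invented data for 12 children: screen result at time 1, standard score at time 2.
flagged_t1 = [True] * 5 + [False] * 7
scores_t2 = [66, 74, 82, 90, 101, 70, 79, 88, 95, 99, 104, 112]

for k in (1.0, 1.5, 2.0):
    r = summarize(flagged_t1, scores_t2, k)
    print(f"cutoff {r['cutoff']:.1f}: sens {r['sensitivity']:.2f}, "
          f"spec {r['specificity']:.2f}, J {r['youden_j']:.2f}, "
          f"DOR {r['diagnostic_odds_ratio']:.2f}, meets 70%: {r['meets_70_benchmark']}")
```

With these toy numbers, no cutoff reaches 70% on both indices, yet every cutoff yields a positive J and an odds ratio above 1, which is the pattern recommendation 5 asks readers to weigh rather than dismiss. A real analysis would also report confidence intervals, which this sketch omits.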

CONCLUSIONS

Even as we express consternation about how findings have been interpreted in several recently published predictive validity studies of screening tests, all such research illustrates the complexities of measuring child development. Because screening tests are designed to identify current problems so that they can be addressed as early as possible, we encourage researchers to interpret their predictive validity findings in a more positive light. Above all, we urge clinicians to value strong relationships between screens and diagnostic measures. In response to a screening test failure, referral to an early intervention agency is the first step; a diagnosis is not required. At the same time, medical providers should also use a concerning screening result as a prompt for more diligent surveillance, possibly with supplemental screening, and/or referral to a pediatric subspecialist, per the wise recommendations of the American Academy of Pediatrics.14

REFERENCES

1. Wake M, Gerner B, Gallagher S. Does parents’ evaluation of developmental status at school entry predict language, achievement, and quality of life 2 years later? Ambul Pediatr. 2005;5(3):143–149

2. Hess CR, Papas MA, Black MM. Use of the Bayley Infant Neurodevelopmental Screener with an environmental risk group. J Pediatr Psychol. 2004;29(5):321–330

3. Harris SR, Daniels LE. Reliability and validity of the Harris Infant Neuromotor Test. J Pediatr. 2001;139(2):249–253

4. Leonard CH, Piecuch RE, Cooper BA. Use of the Bayley Infant Neurodevelopmental Screener with low birth weight infants. J Pediatr Psychol. 2001;26(1):33–40

5. Aylward GP, Verhulst SJ. Predictive utility of the Bayley Infant Neurodevelopmental Screener (BINS) risk status classifications: clinical interpretation and application. Dev Med Child Neurol. 2000;42(1):25–31

6. Klee T, Carson DK, Gavin WJ, Hall L, Kent A, Reece S. Concurrent and predictive validity of an early language screening program. J Speech Lang Hear Res. 1998;41(3):627–624

7. Sturner RA, Funk SG, Green JA. Preschool speech and language screening: further validation of the sentence repetition screening test. J Dev Behav Pediatr. 1996;17(6):405–413

8. Rydz D, Srour M, Oskoui M, et al. Screening for developmental delay in the setting of a community pediatric clinic: a prospective assessment of parent-report questionnaires. Pediatrics. 2006;118(4). Available at: www.pediatrics.org/cgi/content/full/118/4/e1178

9. McCormick MC, Brooks-Gunn J, Buka SL, et al. Early intervention in low birth weight premature infants: results at 18 years of age for the Infant Health and Development Program. Pediatrics. 2006;117(3):771–780

10. van Agt HME, van der Stege HA, de Ridder-Sluiter H, Verhoeven LTW, de Koning HJ. A cluster-randomized trial of screening for language delay in toddlers: effects on school performance and language development at age 8. Pediatrics. 2007;120(6):1317–1325

11. Briggs-Gowan MJ, Carter AS. Social-emotional screening status in early childhood predicts elementary school outcomes. Pediatrics. 2008;121(5):957–962

12. Buck AA, Gart JJ. Comparison of a screening test and a reference test in epidemiologic studies: I. Indices of agreement and their relation to prevalence. Am J Epidemiol. 1966;83(3):586–592

13. Capute AJ, Accardo PJ. A neurodevelopmental perspective on the continuum of developmental disabilities. In: Capute AJ, Accardo PJ, eds. Developmental Disabilities in Infancy and Childhood. 2nd ed. Vol 1. Baltimore, MD: Paul H. Brookes; 1996:1–22

14. American Academy of Pediatrics, Council on Children With Disabilities, Section on Developmental Behavioral Pediatrics; Bright Futures Steering Committee; Medical Home Initiatives for Children With Special Needs Project Advisory Committee. Identifying infants and young children with developmental disorders in the medical home: an algorithm for developmental surveillance and screening [published correction appears in Pediatrics. 2006;118(4):1808–1809]. Pediatrics. 2006;118(1):405–420

15. Shonkoff JP. From neurons to neighborhoods: old and new challenges for developmental and behavioral pediatrics. J Dev Behav Pediatr. 2003;24(1):70–76

16. Guralnick MJ. Effectiveness of early intervention for vulnerable children: a developmental perspective. Am J Ment Retard. 1998;102(4):319–345

17. Ramey CT, Ramey SL. Effective early intervention. Ment Retard. 1992;30(6):337–345

PEDIATRICS Volume 122, Number 4, October 2008


Kevin Marks, Frances Page Glascoe, Glen P. Aylward, Michael I. Shevell, Paul H. Lipkin and Jane K. Squires. The Thorny Nature of Predictive Validity Studies on Screening Tests for Developmental-Behavioral Problems. Pediatrics 2008;122;866. DOI: 10.1542/peds.2007-3142


Pediatrics is the official journal of the American Academy of Pediatrics. A monthly publication, it has been published continuously since 1948. Pediatrics is owned, published, and trademarked by the American Academy of Pediatrics, 345 Park Avenue, Itasca, Illinois, 60143. Copyright © 2008.