Published online: 29 Jul 2014.


Publisher: Routledge


Applied Neuropsychology: Child

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hapc20

Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children

Marla Green Bartoi (a), Jaclyn Beth Issner (b), Lesley Hetterscheidt (c), Alicia M. January (d), Jeffrey Garth Kuentzel (a) & Douglas Barnett (a)

(a) Psychology, Wayne State University, Detroit, Michigan

(b) Child/Adolescent Psychiatry, University of Michigan, Ann Arbor, Michigan

(c) Pine Rest Christian Mental Health Services, Grand Rapids, Michigan

(d) Psychology, Marquette/Shriners Hospitals for Children, Chicago, Illinois

To cite this article: Marla Green Bartoi, Jaclyn Beth Issner, Lesley Hetterscheidt, Alicia M. January, Jeffrey Garth Kuentzel & Douglas Barnett (2014): Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children, Applied Neuropsychology: Child, DOI: 10.1080/21622965.2013.811075

To link to this article: http://dx.doi.org/10.1080/21622965.2013.811075



We examined the stability of Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) scores among 51 diverse, clinically referred 8- to 16-year-olds (Mage = 11.24 years, SD = 2.36). Children were referred to and tested at an urban, university-based training clinic; 70% of eligible children completed follow-up testing 12 months to 40 months later (M = 22.05, SD = 5.94). Stability for index scores ranged from .58 (Processing Speed) to .81 (Verbal Comprehension), with a stability of .86 for Full-Scale IQ. Subtest score stability ranged from .35 (Letter-Number Sequencing) to .81 (Vocabulary). Indexes believed to be more susceptible to concentration (Processing Speed and Working Memory) had lower stability. We also examined attention problems as a potential moderating factor of WISC-IV index and subtest score stability. Children with attention problems had significantly lower stability for Digit Span and Matrix Reasoning subtests compared with children without attention problems. These results provide support for the temporal stability of the WISC-IV and also provide some support for the idea that attention problems contribute to children producing less stable IQ estimates when completing the WISC-IV. We hope our report encourages further examination of this hypothesis and its implications.

Key words: attention problems, stability, WISC-IV

Address correspondence to Marla Green Bartoi, Ph.D., Psychology, Wayne State University, 60 Farnsworth, Psychology Clinic, Detroit, MI 48202. E-mail: [email protected]

ISSN: 2162-2965 print/2162-2973 online

The Wechsler Intelligence Scale for Children and its revisions (WISC, Revised [WISC-R], Third Edition [WISC-III], Fourth Edition [WISC-IV]; Wechsler, 1949, 1974, 1991, 2003a, 2003b) have been the most frequently used standardized, nationally normed, psychometric measure of a child’s intellectual functioning as compared with same-age peers (Reschley, 1997). Intelligence is generally considered a stable ability measurable by at least the preschool years (Kaufman, 2009; Sattler, 2001). Through three revisions, the WISC Full-Scale IQ (FSIQ) score has been found to be strongly (.79–.95) stable across intervals of a few months to a few years (Canivez & Watkins, 1998; Gehman & Matyas, 1956; Schwean & Saklofske, 1998; Wechsler, 1974, 1991).

The WISC-IV (Wechsler, 2003a, 2003b) includes numerous changes that might have an impact on score stability. These revisions include eliminating or substituting subtests (e.g., Picture Arrangement, Arithmetic), adding new subtests (e.g., Matrix Reasoning), reducing timed tasks, and using a new factor structure yielding four index scores: Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), and Processing Speed Index (PSI). Stability of the WISC-IV was examined across 12 to 63 days among 243 children from the standardization sample with the following results: IQ = .89, VCI = .85, PRI = .85, WMI = .85, and PSI = .79 (Wechsler, 2003b). Because of the widespread clinical use of the WISC-IV, its numerous revisions, and the unique social and motivational context of a potentially high-stakes clinical assessment, we think it is necessary to examine this new edition beyond its standardization sample, especially its practical clinical use. Ryan, Glass, and Bartels (2010) examined the test–retest stability of 43 children attending private school with an 11-month interval and found reliabilities of .88, .76, .68, .75, and .54 for IQ, VCI, PRI, WMI, and PSI, respectively. In their study of WISC-IV process score stability, Ryan, Umfleet, and Kane (2013) suggested that future stability studies include a more diverse sample and different test–retest intervals. It was also suggested that these studies include a sample of children with attention-deficit disorder. Consequently, we designed a study utilizing a diverse sample taking the test following a clinical referral. In addition, we also were interested in whether and how attention problems and impulsivity, common concerns among clinically referred children, might negatively affect test performance and lead to inconsistent retest performance.

Findings on the effects of attention problems and attention-deficit hyperactivity disorder (ADHD) on IQ scores based on prior editions of the WISC have been mixed: In one study, attention problems were estimated to have a 2- to 5-point effect on IQ and were particularly likely to appear on subtests such as Digit Span, Coding, Information, and Arithmetic, the latter two being optional subtests on the WISC-IV (Jepsen, Fagerlund, & Mortensen, 2009). Another study showed that nonverbal tests have lower stability estimates than verbal measures among children with attention problems (Nyden, Billstedt, Hjelmquist, & Gillberg, 2001). Perhaps nonverbal tasks are more susceptible to lapses in attention because they are less interactive and more time-dependent than verbal subtests.

A few studies have examined the stability of IQ scores among children with ADHD in previous versions of the Wechsler intelligence tests. Nyden et al. (2001) found stable FSIQ, Verbal IQ (VIQ), and Performance IQ (PIQ) scores 1 to 2 years after initial assessments for boys with ADHD tested with the WISC-R (n = 1) or WISC-III (n = 13) with a significant increase in nonverbal subtests over time. Among clinically referred children with ADHD, Schwean and Saklofske (1998) also reported finding significant and high levels of WISC-III stability 30 months later in an unpublished study. Stability was .84, .86, .74, .87, .74, .74, and .58 for IQ, VIQ, PIQ, VCI, Perceptual Organization Index (POI), Freedom from Distractibility Index, and PSI, respectively. Notably, the PSI, recognized as a measure of attention, had the lowest stability among the index scores.

Although we have not found any publications that examine the stability of the WISC-IV among children with and without attention problems, a few studies have examined whether and how attention problems and ADHD may influence its scores. The WISC-IV manual (Wechsler, 2003b) reported that children with ADHD scored significantly lower on the PSI, WMI, and FSIQ compared with children without ADHD. Mayes and Calhoun (2007) found that children with ADHD scored significantly lower on the WMI and PSI on both the WISC-III and the WISC-IV, relative to their own performance on the VCI, POI (WISC-III), and PRI (WISC-IV). Schwean and Saklofske (2005) reported that while children with ADHD scored lower on the VIQ than the PIQ as well as on the VCI and Perceptual Organization Index (POI) factors on the WISC-III (Schwean, Saklofske, Yackulic, & Quinn, 1993), this was less pronounced on the WISC-IV. Perhaps the WISC-IV is less susceptible to attention problems than its predecessor (Schwean & Saklofske, 2005). One reason for this difference may be the reduced dependence on time constraints on the WISC-IV. Time-dependent tests may be more susceptible to momentary lapses in attention than untimed tests.

We expected lapses in attention among children with higher attention problems would lead to inconsistent responding and thereby lower their mean scores and stability estimates as compared with those of children with fewer attention problems. We expected that these attention effects would be most apparent on timed tasks (e.g., Block Design, Coding) during which even momentary lapses in attention and impulse control might result in delays on timed tasks and thereby have an impact on scores. Moreover, we thought impulsive responding especially would have a negative impact on subtests that relied upon multiple choice (e.g., Matrix Reasoning, Picture Concepts). Lastly, because working memory is often poorer among children with attention problems (Martinussen, Hayden, Hogg-Johnson, & Tannock, 2005), we also thought WMI scores would be lower among children with attention problems.


METHOD

Participants

Participants were 51 children aged 8 to 16 years old (Mage = 11.24 years, SD = 2.36) who had been referred for and completed a psychoeducational evaluation, minimally 1 year prior. The majority of the children came from lower-income households and were ethnically diverse with 43% Caucasian, 55% African American, and 2% Hispanic. Of the 51 participants, 35 were boys (69%). At the initial assessment, all children were referred by their parents because of concerns regarding academic or behavioral functioning, or a combination of the two. As a result of the assessment, participants were judged by the evaluator and Ph.D.-level supervisor as to whether the child met criteria for any Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition diagnoses. Furthermore, the parents were provided these diagnoses during a feedback session and in writing as part of a psychoeducational evaluation report provided at the end of the Time 1 (T1) assessment. See Table 1 for percentages of participants who met criteria for the diagnoses in the full sample as well as in the high- and low-attention groups. Only those classified in the high-attention group had diagnoses of ADHD (n = 10), oppositional defiant disorder (n = 2), conduct disorder (n = 2), and enuresis (n = 1); the high-attention group did not have diagnoses of a nonverbal learning disability, phonological disorder, or anxiety. Also, 12 participants in the high-attention group did not have any diagnoses, whereas only 5 participants in the low-attention group had no diagnosis. Because the reliability of the diagnosticians was not established systematically and because the diagnoses were not independent of the assessment data gathered, no analyses were conducted by clinical diagnosis. The average time between testing at T1 and Time 2 (T2) was 22.05 months (SD = 5.94, range = 12–32 months).

Procedures

Families were identified from a clinic database containing assessment information on all children seen at a Midwestern, urban, university training clinic between 2003 and 2005. One year after their initial assessment, parents and children were invited back for an assessment follow-up research study when they met the following criteria: (a) The parent sought the assessment concerned about problems their child was having at school; (b) the child completed the WISC-IV; (c) the child was aged 8 to 16 years old; and (d) the child had an IQ greater than 60. The criteria were utilized so that we could focus our resources on a more homogeneous sample. There were 72 children who met our criteria. Unless we heard otherwise from a parent, we made repeated attempts to contact each by phone and mail. Ultimately, 51 (70%) agreed and completed the assessment. Informed consent was obtained verbally over the phone during recruitment and again in writing along with child assent at the onset of the research visit. Parents were promised and paid a $50 honorarium for taking the time and resources to travel to and complete our follow-up study.

Measures

Attention problems. Children were judged to have attention problems based on parent and teacher ratings gathered within a few weeks of the initial child assessment. Specifically, the child’s parent completed the Child Behavior Checklist (CBCL; Achenbach, 1991a) and their teacher completed the Teacher Report Form (TRF; Achenbach, 1991b) at T1. Both forms list more than 112 child emotion and behavior problems that the informant rated as 0 (not true), 1 (sometimes true), or 2 (very true) for the child in question. The CBCL and TRF are nationally normed, widely used, and well-validated measures of child psychopathology. Both include a 10-item subscale for attention problems (e.g., ‘‘can’t concentrate,’’ ‘‘impulsive’’). Their validity is supported by significant correlations with other established measures of corresponding child behavioral functioning and also their ability to discriminate statistically between clinically referred and nonreferred, ‘‘normal’’ children (Achenbach, 1991a, 1991b). A T score of 65 is considered by the test developers to be the cutoff for borderline clinical significance. Among the children in our sample, the average attention scores were 64.10 (SD = 9.96, n = 48) and 62.85 (SD = 7.46, n = 41) for parent and teacher reports, respectively. For the purposes of our analyses, we coded children as having an attention problem if the child received a score of greater than 65 on the Attention Problems Scale as rated by parent or teacher; all others were coded as not having attention problems. We chose this definition of attention problems for a variety of reasons: (a) The ratings were based on adults who know the child well; (b) they were independent of the child’s behavior when completing the WISC; (c) every child had at least one rating from parent or teacher (n = 39 had both); and (d) parent and teacher ratings on attention problems were moderately associated (r = .41, n = 39, p < .01). In contrast, utilizing an official diagnosis of ADHD would be based on examiner judgments and the pattern of WISC-IV performance itself, and consequently, it would not be independent of the WISC-IV. In addition, a diagnosis, in contrast to obtaining standardized ratings, would be harder to reproduce reliably across studies. However, it is noteworthy that even though an ADHD diagnosis was not used in determining attention group categorization, there were no participants with an ADHD diagnosis in the low-attention problems group, and there were 10 (34.5%) participants with an ADHD diagnosis in the high-attention problems group (see Table 1 for more details on diagnoses). The diagnoses were not independent of the assessment data gathered; rather, they were based upon the data.

TABLE 1

Diagnoses in the Full Sample and in the Low- and High-Attention Problem Groups

| Diagnosis | Total Sample (n = 51), % | Low-Attention Problems (n = 21), % | High-Attention Problems (n = 29), % |
|---|---|---|---|
| Attention-Deficit Hyperactivity Disorder | 19.6 | 0 | 34.5 |
| Reading Learning Disability | 19.6 | 23.8 | 17.2 |
| Writing Learning Disability | 7.8 | 9.5 | 6.9 |
| Math Learning Disability | 9.8 | 9.5 | 10.3 |
| Nonverbal Learning Disability | 3.9 | 4.8 | 0 |
| Phonological Disorder | 3.9 | 9.5 | 0 |
| Mild Mental Retardation | 5.9 | 9.5 | 3.4 |
| Mood Disorder | 3.9 | 4.8 | 3.4 |
| Anxiety Disorder | 5.9 | 14.3 | 0 |
| Oppositional Defiant Disorder | 3.9 | 0 | 6.9 |
| Conduct Disorder | 3.9 | 0 | 6.9 |
| Enuresis | 2.0 | 0 | 3.4 |
| No Diagnosis | 33.3 | 23.8 | 41.4 |
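The grouping rule described under Attention problems (a T score greater than 65 from either informant) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the constant name, and the use of `None` for a missing rating are hypothetical and are not taken from the study's materials.

```python
from typing import Optional

# Borderline-clinical T score named in the text; the constant name is hypothetical.
CUTOFF_T = 65


def has_attention_problem(parent_t: Optional[float],
                          teacher_t: Optional[float]) -> Optional[bool]:
    """Classify a child from CBCL (parent) and TRF (teacher) Attention Problems T scores.

    Returns True if any available rating is greater than the cutoff,
    False if at least one rating exists and none exceeds it,
    and None if no rating is available (assumed here to mean the child
    is left out of the group comparisons).
    """
    ratings = [t for t in (parent_t, teacher_t) if t is not None]
    if not ratings:
        return None
    return any(t > CUTOFF_T for t in ratings)
```

Because the rule uses "either informant," a child with only one available rating can still be classified, which is why a missing rating is treated separately from a low rating in this sketch.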

Intellectual functioning. Initially, children completed the WISC-IV (Wechsler, 2003a) as part of a psychoeducational assessment conducted because of school problems and secondly as part of a follow-up research study on child assessment. The WISC-IV is a standardized, nationally normed, individually administered IQ test that is widely used in assessments to assist in treatment planning and academic placement of children including eligibility for special education programs (Kaufman, 2009; Sattler, 2008). The WISC-IV is considered to have excellent reliability and validity. Only the 10 required subtests were administered. Examiners were advanced graduate students in clinical psychology who had been extensively trained in administering and scoring the WISC-IV. Different examiners were employed at the two assessments. Scoring was reviewed in a standardized independent double-checking procedure (Kuentzel, Hetterscheidt, & Barnett, 2011).

RESULTS

The double-checking procedure revealed that 1 child’s initial Comprehension subtest and another child’s follow-up Symbol Search subtest had administration errors, and they were excluded from analyses. Consequently, analyses for these two scores were based on 50 rather than 51 participants. For these cases, subtest scores were prorated to complete the composite indexes for these 2 participants. Also, 1 child did not have any parent- or teacher-reported attention ratings at T1, so that child’s scores were excluded from the analyses that compared attention groups. There were no significant outliers, skew, or kurtosis on any of the focal variables. Test–retest means, standard deviations, and Pearson product–moment correlation coefficients between test and retest scores were calculated for the WISC-IV subtests, indexes, and FSIQ and are reported in Table 2. Correlation coefficients for the composite scores ranged from .58 (PSI) to .86 (FSIQ), and subtests’ reliabilities ranged from .35 (Letter-Number Sequencing) to .81 (Vocabulary). Individual variation in scores across the retest interval is presented in cumulative frequency distributions in Table 3. More than 80% of the children scored within 10 points (5) on FSIQ and the VCI. However, this number falls to <65% for PRI and <60% for PSI and WMI. Overall, the changes in scores over time were normally distributed for FSIQ and the indexes such that participants were equally likely to improve at T2 testing as they were to deteriorate, even for those who showed large retest changes (e.g., >15-point difference). However, for the PSI, there were 2 participants who showed especially large changes; they deteriorated by 35 and 36 points, respectively. In contrast, for the WMI, there was 1 participant who showed a 30-point improvement. Time between testing was not related to variation in any subtests or indexes across the retest interval.

TABLE 2

Comparison of T1 and T2 WISC-IV Composite Scores and Subtests

| Variable | First Testing M | First Testing SD | Second Testing M | Second Testing SD | Stability (Pearson’s r) |
|---|---|---|---|---|---|
| Full-Scale IQ | 92.80 | 15.34 | 92.20 | 16.24 | .86 |
| Verbal Comprehension | 98.08 | 14.53 | 98.10 | 16.64 | .81 |
| Perceptual Reasoning | 95.43 | 16.62 | 94.27 | 17.32 | .79 |
| Working Memory | 90.73 | 12.82 | 90.67 | 15.30 | .60 |
| Processing Speed | 90.10 | 15.52 | 89.31 | 13.75 | .58 |
| Block Design | 8.69 | 3.47 | 8.55 | 3.22 | .78 |
| Similarities | 9.76 | 3.20 | 9.47 | 3.70 | .80 |
| Digit Span | 8.25 | 2.24 | 8.35 | 2.53 | .63 |
| Picture Concepts | 9.71 | 3.01 | 9.75 | 3.41 | .54 |
| Coding | 7.63 | 3.13 | 7.45 | 2.44 | .52 |
| Vocabulary | 9.73 | 2.43 | 9.57 | 3.07 | .81 |
| Letter-Number Sequencing | 8.73 | 3.40 | 8.59 | 3.38 | .35 |
| Matrix Reasoning | 9.20 | 3.28 | 8.75 | 3.29 | .74 |
| Comprehension | 9.62 | 2.62 | 10.12 | 2.70 | .49 |
| Symbol Search | 8.78 | 3.09 | 8.82 | 3.00 | .62 |

Note. n = 51 except for Comprehension T1 and Symbol Search T2 where n = 50. There were no significant changes in scores over time. All correlations were significant at p < .01.
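The two summaries used in this part of the results, a Pearson test–retest correlation and the cumulative percentage of children whose scores changed by no more than a given number of points, can be sketched as below. The function names and the five pairs of scores are hypothetical illustrations, not study data, and the sketch is not a reconstruction of the authors' actual analysis software.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


def percent_within(t1: Sequence[float], t2: Sequence[float], max_diff: float) -> float:
    """Percentage of children whose absolute T1-T2 change is <= max_diff."""
    diffs = [abs(a - b) for a, b in zip(t1, t2)]
    return 100.0 * sum(d <= max_diff for d in diffs) / len(diffs)


# Hypothetical index scores for five children (illustration only).
t1_scores = [85, 92, 100, 110, 78]
t2_scores = [88, 90, 104, 105, 95]

stability = pearson_r(t1_scores, t2_scores)
within_10 = percent_within(t1_scores, t2_scores, 10)
print(f"stability r = {stability:.2f}, within 10 points = {within_10:.1f}%")
```

Evaluating `percent_within` at each successive difference value yields one column of a cumulative frequency distribution like the one reported for FSIQ and the indexes.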

To examine our hypotheses regarding the role of attention problems on the stability of WISC-IV scores, we calculated stability coefficients for the overall group, as well as for the high-attention problem and low-attention problem groups. Stability coefficients were calculated by correlating the scores from T1 to T2 using Pearson correlation statistics. Fisher’s (1924) r-to-z transformation was utilized to convert the Pearson’s r coefficients to the normally distributed variable z to compare stability of IQ scores for the low-attention problems group to those of the high-attention problems group. Time between testing was not significantly different, t(39) = 1.22, p > .05, when comparing high-attention group members (M = 22.84 months, SD = 4.55) to low-attention group members (M = 20.81 months, SD = 6.08). Furthermore, time between testing was normally distributed in both groups.

TABLE 3

Test–Retest Changes: Cumulative Frequency Distributions (in Percentages) of Wechsler Intelligence Scale for Children-Fourth Edition FSIQ and Index Scores

| Difference Between T1 and T2 Scores | Full-Scale IQ | Verbal Comprehension | Perceptual Reasoning | Processing Speed | Working Memory |
|---|---|---|---|---|---|
| 0 | 5.9 | 11.8 | 2.0 | 5.9 | 9.8 |
| 1 | 13.7 | 11.8 | 2.0 | 5.9 | 9.8 |
| 2 | 23.5 | 23.5 | 9.8 | 11.8 | 17.6 |
| 3 | 33.3 | 29.4 | 11.8 | 17.6 | 25.5 |
| 4 | 43.1 | 39.2 | 23.5 | 17.6 | 25.5 |
| 5 | 51.0 | 41.2 | 27.5 | 25.5 | 29.4 |
| 6 | 60.8 | 52.9 | 43.1 | 39.2 | 43.1 |
| 7 | 66.7 | 52.9 | 49.0 | 45.1 | 45.1 |
| 8 | 74.5 | 62.7 | 56.9 | 52.9 | 49.0 |
| 9 | 78.4 | 68.6 | 56.9 | 54.9 | 54.9 |
| 10 | 80.4 | 76.5 | 64.7 | 58.8 | 56.9 |
| 11 | 86.3 | 78.4 | 70.6 | 60.8 | 60.8 |
| 12 | 86.3 | 80.4 | 74.5 | 64.7 | 62.7 |
| 13 | 88.2 | 82.4 | 82.4 | 66.7 | 72.5 |
| 14 | 94.1 | 88.2 | 88.2 | 70.6 | 72.5 |
| 15 | 94.1 | 90.2 | 88.2 | 78.4 | 76.5 |
| 16 | 96.1 | 90.2 | 88.2 | 78.4 | 78.4 |
| 17 | 96.1 | 90.2 | 88.2 | 86.3 | 80.4 |
| 18 | 96.1 | 94.1 | 90.2 | 88.2 | 82.4 |
| 19 | 96.1 | 94.1 | 92.2 | 88.2 | 82.4 |
| 20 | 98.0 | 96.1 | 92.2 | 90.2 | 88.2 |
| 21 | 100.0 | 98.0 | 92.2 | 92.2 | 90.2 |
| 22 | – | 98.0 | 92.2 | 92.2 | 90.2 |
| 23 | – | 98.0 | 96.1 | 92.2 | 92.2 |
| 24 | – | 100.0 | 98.0 | 92.2 | 94.1 |
| 25 | – | – | 100.0 | 94.1 | 96.1 |
| 26 | – | – | – | 96.1 | 98.0 |
| 27 | – | – | – | 96.1 | 98.0 |
| 28 | – | – | – | 96.1 | 98.0 |
| 29 | – | – | – | 96.1 | 98.0 |
| 30 | – | – | – | 96.1 | 100.0 |
| 31 | – | – | – | 96.1 | – |
| 32 | – | – | – | 96.1 | – |
| 33 | – | – | – | 96.1 | – |
| 34 | – | – | – | 96.1 | – |
| 35 | – | – | – | 98.0 | – |
| 36 | – | – | – | 100.0 | – |

TABLE 4

Comparison of Test–Retest Stability Scores for Children With Low and High Levels of Attention Problems

| Variable | Low-Attention Problems (n = 21), Pearson’s r | High-Attention Problems (n = 29), Pearson’s r | Z Score |
|---|---|---|---|
| Full-Scale IQ | .91 | .79 | 1.49 |
| Verbal Comprehension | .85 | .78 | 0.69 |
| Perceptual Reasoning | .86 | .67 | 1.57 |
| Working Memory | .62 | .52 | 0.48 |
| Processing Speed | .72 | .47 | 1.30 |
| Block Design | .80 | .81 | 0.09 |
| Similarities | .81 | .83 | 0.20 |
| Digit Span | .73 | .40 | 1.65* |
| Picture Concepts | .65 | .39 | 1.19 |
| Coding | .65 | .43 | 1.03 |
| Vocabulary | .87 | .78 | 0.94 |
| Letter-Number Sequencing | .35 | .39 | 0.15 |
| Matrix Reasoning | .88 | .50 | 2.70** |
| Comprehension | .52 | .36 | 0.65 |
| Symbol Search | .76 | .48 | 1.54 |

*p < .05. **p < .01.

TABLE 5

Comparison of Low- and High-Attention Problems WISC-IV Composite Scores and Subtests

| Variable | Low-Attention Problems (N = 21), First Testing M (SD) | Low-Attention Problems (N = 21), Second Testing M (SD) | High-Attention Problems (N = 29), First Testing M (SD) | High-Attention Problems (N = 29), Second Testing M (SD) |
|---|---|---|---|---|
| Full-Scale IQ | 91.62 (18.96) | 91.38 (20.88) | 93.07 (12.31) | 92.21 (12.19) |
| Verbal Comprehension | 97.05 (15.09) | 97.29 (18.03) | 97.86 (13.63) | 98.07 (15.82) |
| Perceptual Reasoning | 95.33 (21.05) | 93.19 (22.84) | 95.48 (13.29) | 94.79 (12.63) |
| Working Memory | 89.19 (17.14) | 90.00 (19.24) | 91.28 (8.48) | 90.76 (12.19) |
| Processing Speed | 87.67 (15.23) | 88.76 (16.74) | 91.52 (15.95) | 89.14 (11.28) |
| Block Design | 8.38 (3.22) | 8.38 (3.92) | 8.90 (3.74) | 8.66 (2.74) |
| Similarities | 9.62 (2.89) | 9.05 (4.18) | 9.69 (3.37) | 9.69 (3.39) |
| Digit Span | 8.33 (2.96) | 8.48 (3.17) | 8.14 (1.60) | 8.21 (2.02) |
| Picture Concepts | 9.86 (3.53) | 10.10 (3.95) | 9.62 (2.69) | 9.45 (3.05) |
| Coding | 7.38 (2.78) | 7.43 (2.66) | 7.69 (3.39) | 7.34 (2.26) |
| Vocabulary | 9.52 (2.54) | 9.24 (3.02) | 9.79 (2.40) | 9.72 (3.17) |
| Letter-Number Sequencing | 8.29 (4.41) | 8.24 (4.17) | 8.90 (2.44) | 8.76 (2.77) |
| Matrix Reasoning | 9.43 (4.32) | 8.14 (4.18) | 9.00 (2.41) | 9.10 (2.51) |
| Comprehension | 9.40 (2.89) | 10.43 (2.84) | 9.55 (2.20) | 9.72 (2.49) |
| Symbol Search | 8.10 (3.43) | 8.52 (3.91) | 9.28 (2.83) | 8.96 (2.19) |

As indicated in Table 4, the high-attention problems group showed significantly lower stability than the low-attention problems group on the Digit Span (r = .40 vs. .73, p < .05) and Matrix Reasoning (r = .50 vs. .88, p < .01) subtests. The t tests showed no significant differences at T1 or T2 in FSIQ, indexes, or subtests when comparing high- and low-attention problem groups. Hence, children with higher attention problems did not show significantly different mean scores when compared with children with lower attention problems. See Table 5 for the means and standard deviations for WISC-IV scores at T1 and T2 for the low- and high-attention problem groups.
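The comparison of two independent stability coefficients via Fisher's r-to-z transformation can be sketched as below. This is the standard textbook formula, which we assume (the article does not spell it out) corresponds to the procedure used; the function names are hypothetical. With the group sizes and correlations reported for Digit Span (.73, n = 21 vs. .40, n = 29) and Matrix Reasoning (.88 vs. .50), it gives Z values of approximately 1.65 and 2.70. The one-tailed p value helper is likewise an assumption, since the article does not state whether one- or two-tailed tests were applied.

```python
import math


def fisher_z_difference(r_a: float, n_a: int, r_b: float, n_b: int) -> float:
    """Z statistic for the difference between two independent Pearson correlations.

    Each r is converted with Fisher's transformation z = atanh(r); the
    difference is divided by its standard error sqrt(1/(n_a-3) + 1/(n_b-3)).
    """
    if n_a <= 3 or n_b <= 3:
        raise ValueError("each group needs more than 3 observations")
    z_a = math.atanh(r_a)
    z_b = math.atanh(r_b)
    standard_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (z_a - z_b) / standard_error


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal deviate (assumed one-tailed test)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Low-attention group (n = 21) vs. high-attention group (n = 29).
z_digit_span = fisher_z_difference(0.73, 21, 0.40, 29)
z_matrix = fisher_z_difference(0.88, 21, 0.50, 29)
print(f"Digit Span Z = {z_digit_span:.2f}, Matrix Reasoning Z = {z_matrix:.2f}")
```

Because the standard error depends only on the two group sizes, it is the same for every row of such a comparison; only the transformed correlations change from subtest to subtest.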

DISCUSSION

Our study provides new data supporting the stability of the WISC-IV as well as extending psychometric support for WISC-IV scores to African American and Caucasian 8- to 16-year-olds referred because of scholastic problems. Our findings were in line with existing estimates including those of the test manual (Ryan et al., 2010; Wechsler, 2003b). We think it is important to make data available on youth with clinical concerns because they are a group who frequently seek out evaluations by clinical and school psychologists and differ from the standardization sample in their motivations for testing. Moreover, our sample had a roughly equal number of African American and Caucasian children and supported the stability of the WISC-IV for both ethnic groups as well as for children from low-income families. We did limit our sample to children with initial IQs greater than 60 on the WISC-IV. Consequently, our findings cannot generalize to those who were more than 2 standard deviations below the mean. On the other hand, our stability estimates appear rather robust, given we restricted the variance, which would presumably attenuate stability estimates.

Although our general overall findings appear to support the stability of WISC-IV factor scores, we also found task and child attention problems appear to interact and perhaps influence the stability of factor and subtest scores. We say interaction, because not all areas and subtests of the WISC-IV demonstrated equivalent stability. For instance, we replicated the frequently found pattern whereby FSIQ was most stable along with VCI and PRI close behind. The WMI and PSI stability coefficients drop off quite a bit (see Table 2), a finding consistent with existing studies. The more modest estimates for WMI and PSI are hypothesized to be a result of both being made up of fewer subtests than the FSIQ, VCI, and PRI, as well as a result of WMI and PSI involving cognitive processes that may be less consistent or reliable. Performance on the subtests comprising the PSI and WMI is considered to be more dependent on attention and concentration, which frequently vary as a function of the examinee’s mental state (e.g., mood, motivation level) and/or circumstantial factors such as fatigue (Duckworth, Quinn, Lynam, Loeber, & Stouthamer-Loeber, 2011; McGee, Clark, & Symons, 2000).

Given that indexes more dependent on attention had lower long-term stability, we hypothesized that children with higher levels of attention problems, as measured on the CBCL and TRF scales (Achenbach, 1991a, 1991b), would have comparatively less stable performances on the WISC-IV. Significant differences in stability, however, were found for just two subtests (i.e., Matrix Reasoning and Digit Span). Interestingly, the stability of the WMI, arguably the index most affected by attention (Martinussen et al., 2005), did not differ significantly between children with low and high levels of attention problems. In addition, although Letter-Number Sequencing had the lowest stability for the overall sample, no difference in stability between children with low and high levels of attention problems was found. Perhaps attention lapses affected both children with and without clinically significant attention problems enough to lower stability for both groups.

The significant difference in stability for Digit Span between the high- and low-attention problems groups is not surprising, given its attention demands. What is more surprising, however, is the significant difference in stability for Matrix Reasoning. When compared with Block Design, a subtest that requires handling and manipulating objects, Matrix Reasoning may be less engaging for children with attention problems. Performance on Matrix Reasoning also may be affected by impulsivity, such that children with ADHD features may be too impatient to carefully consider all response options before making their choice; clinicians should observe closely for this possibility. This could be investigated further by (after the entire test has been completed) asking examinees who appeared impulsive about strategies that they used for Matrix Reasoning (Sattler, 2008), as impulsive children may admit that they guessed. A testing-of-limits approach might include (after the entire WISC-IV was administered under standard conditions) readministering failed Matrix Reasoning items with coaching or scaffolding to carefully consider all five response choices before selecting one. Children could also be reminded to double check their answers. Why the difference in stability between attention groups was not also found for Picture Concepts is unclear, but it may be a result of the more meaningful, familiar, and engaging stimuli in that subtest as compared with the abstract patterns in Matrix Reasoning. Another factor contributing to the equivalence in stability between attention groups for Block Design may be the order in which the subtests are administered. Perhaps children with attention problems are less affected during the subtests that appear earlier in the testing session.

Because children with ADHD in Wechsler’s (2003b) standardization sample had lower scores on FSIQ, PSI, and WMI on the WISC-IV, we expected that the children in our sample with higher attention problems would exhibit a similar pattern, but their scores were not lower than those of their counterparts with lower levels of attention problems. However, all of the children in our sample had been referred for academic problems, so those reported to have low levels of attention problems likely had other kinds of impairments including lower general cognitive abilities (see Table 1 for the range of diagnoses).

Given that children with higher attention problems in our sample did tend to show as much long-term stability on most subtests and indexes of the WISC-IV as children with lower attention problems, perhaps the WISC-IV can successfully and reliably maintain the children’s attention to the tasks. One feature of the test that may promote sustaining the child’s interest is the alternating order of the verbal- and performance-based tasks. The individual administration format of the WISC-IV may also contribute to its success in reliably maintaining examinees’ attention span. The WISC-IV also purports to have decreased its reliance on timed tasks (Wechsler, 2003b), perhaps reducing its susceptibility to being influenced by brief lapses in attention.

When comparing children with high-attention problems to children with low-attention problems, this study used a cutoff of greater than 65 (T score) on Attention Problems on either the CBCL or the TRF. Because parent and teacher reports are sometimes incongruent, we did not want to restrict our attention-problems group to only children for whom there were congruent reports. If we had restricted group membership to children with a score of greater than 65 as reported by both the teacher and the parent, we might have excluded children who were in fact showing attention problems in at least one setting. The CBCL and TRF cutoff was used instead of comparing children diagnosed with ADHD to children without the diagnosis because WISC-IV performance had been considered when arriving at the diagnosis, which would have created a confound. Moreover, it would be difficult to establish reliability and validity on diagnosis. In contrast, the standardized nature of the CBCL and TRF makes replication more straightforward. Notably, the validity of the CBCL and TRF Attention Problems scale is limited by the parents’ and teachers’ perceptions and ability to observe and report accurately, their level of familiarity with the child, as well as their threshold for perceiving/tolerating different behaviors in children. Consequently, utilizing both reports reduced the impact of particular context or rater effects.
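
The grouping rule described above (a T score greater than 65 on either rating form) can be contrasted with the stricter both-raters rule in a short sketch. The function and variable names are hypothetical, the example scores are invented for illustration, and the sketch assumes both ratings are available for every child.

```python
T_SCORE_CUTOFF = 65  # cutoff stated in the text: T score greater than 65


def high_attention_either(cbcl_t: float, trf_t: float) -> bool:
    """Rule described in the text: elevated on the parent OR the teacher form."""
    return cbcl_t > T_SCORE_CUTOFF or trf_t > T_SCORE_CUTOFF


def high_attention_both(cbcl_t: float, trf_t: float) -> bool:
    """Stricter alternative: elevated on BOTH the parent and the teacher form."""
    return cbcl_t > T_SCORE_CUTOFF and trf_t > T_SCORE_CUTOFF


# Invented (CBCL, TRF) Attention Problems T scores, for illustration only.
example_children = {
    "child_a": (72, 58),  # elevated at home only
    "child_b": (60, 70),  # elevated at school only
    "child_c": (68, 71),  # elevated in both settings
    "child_d": (55, 65),  # not elevated (65 is not greater than 65)
}

either_group = [k for k, (c, t) in example_children.items() if high_attention_either(c, t)]
both_group = [k for k, (c, t) in example_children.items() if high_attention_both(c, t)]
print("either-rater rule:", either_group)
print("both-raters rule:", both_group)
```

In this invented example the either-rater rule retains the two children whose problems are reported in only one setting, which is the exclusion the authors wanted to avoid.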

A potential weakness of the present study may be related to the use of graduate student examiners, because they have been noted in the literature to make frequent scoring errors (Ryan & Schnakenberg-Ott, 2003). However, our clinic employs a double-checking system so that all scoring is thoroughly reviewed by an advanced graduate student (Kuentzel et al., 2011), and all clinicians administering the WISC-IV at T1 and T2 were extensively trained advanced graduate students in clinical psychology.

Another possible limitation may have been the potential change in motivation between testing at T1 and T2. At T1, children were referred to the clinic for testing to determine diagnostic impressions and recommendations, whereas at T2, they were recruited for the sole purpose of research. This possible change in the children’s testing investment may have had an impact on the stability of scores.

Although a strength of the study was our use of a relatively ethnically diverse referred sample, a definite limitation was the sample size, which may have limited our ability to detect differences in the stability of WISC-IV scores between our low- and high-attention problems groups. For example, we had marginally significant differences in stability for FSIQ, PSI, PRI, and Symbol Search, which may have been statistically significant with a larger sample. The relatively small sample size and our decision to leave out children who scored less than 60 may have affected the overall stability coefficients that we obtained, although their magnitudes were comparable with what has been reported in the literature. An additional strength is the length of the retest interval, which constitutes a more rigorous test of temporal stability and may be more clinically meaningful than the shorter intervals (e.g., 1 month) that have appeared in the literature. Moreover, time between testing administrations varied but had no effect on variation between scores. As noted, the children in our “low-attention problems” group had other significant cognitive, behavioral, and academic problems. Future research should compare WISC-IV stability of children with attention problems to a matched community sample.

Intelligence tests are not just of scientific interest; they are regularly used for a variety of potentially life-changing reasons, including determining children’s eligibility for special education services and programs for the gifted. Our findings provide evidence for the stability of IQ scores as well as for the potential for attention problems to negatively affect IQ estimates. Individualized testing reports frequently include confidence intervals to acknowledge the role of nonintellectual, situational factors that may have an impact on scores (Sattler, 2008). However, our findings suggest that confidence intervals may not be uniform for all children and that greater error in IQ estimation may be expected among children with attention problems, particularly for Digit Span and Matrix Reasoning. Despite significant stability for FSIQ (.86) during a nearly 2-year period as well as an overall average change score of near 0, we found that changes in FSIQ scores ranged from a decrease of 20 points to an increase of 14 points. Moreover, nearly 15% of our sample of children demonstrated IQ change that exceeded what would be expected on the basis of the 95% confidence interval for IQ. Future research is needed to examine whether the direction and magnitude of this change is a predictable consequence of adherence to intervention recommendations made following the assessment.
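
As a rough illustration of how a retest change can be judged against a 95% confidence interval, the sketch below uses the textbook standard error of measurement, SEM = SD × sqrt(1 − reliability). The text does not say how the interval was derived in this study, so the formula choice, the reliability value of .97, and the function names are assumptions; only the standard-score SD of 15 and the reported extreme changes (−20 and +14 points) come from general knowledge of IQ scaling and from the passage above.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Textbook SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def ci_half_width(sd: float, reliability: float, z: float = 1.96) -> float:
    """Half-width of the confidence interval around an obtained score (95% for z = 1.96)."""
    return z * standard_error_of_measurement(sd, reliability)


def change_exceeds_interval(score_t1: float, score_t2: float,
                            sd: float = 15.0, reliability: float = 0.97) -> bool:
    """True if the T2 score falls outside the interval built around the T1 score.

    The default reliability is an assumed, illustrative value.
    """
    return abs(score_t2 - score_t1) > ci_half_width(sd, reliability)


half_width = ci_half_width(sd=15.0, reliability=0.97)
print(f"95% interval half-width: {half_width:.1f} points")

# Changes of -20 and +14 are the extremes reported in the text; +3 is invented.
for change in (-20, 14, 3):
    print(change, change_exceeds_interval(100, 100 + change))
```

A stricter analysis would use the standard error of the difference between two scores rather than the interval around a single score; the simpler version is shown only to convey the logic.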

REFERENCES

Achenbach, T. M. (1991a). Manual for the Child Behavior Checklist and 1991 Profile. Burlington: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991b). Manual for the Teacher’s Report Form and 1991 Profile. Burlington: University of Vermont, Department of Psychiatry.

Canivez, G. L., & Watkins, M. W. (1998). Long-term stability of the Wechsler Intelligence Scale for Children-Third Edition. Psychological Assessment, 10, 285–291.

Duckworth, A. L., Quinn, P. D., Lynam, D. R., Loeber, R., & Stouthamer-Loeber, M. (2011). Role of test motivation in intelligence testing. Proceedings of the National Academy of Sciences, 108, 7716–7720.

Fisher, R. A. (1924). On a distribution yielding the error functions of several well known statistics. Proceedings of the International Congress of Mathematics, 2, 805–813.

Gehman, I. H., & Matyas, R. P. (1956). Stability of the WISC and Binet tests. Journal of Consulting Psychology, 20, 150–152.

Jepsen, J. R. M., Fagerlund, B., & Mortensen, E. L. (2009). Do attention deficits influence IQ estimate in children and adolescents with ADHD? Journal of Attention Disorders, 12, 551–562.

Kaufman, A. (2009). IQ testing 101. New York, NY: Springer.

Kuentzel, J. G., Hetterscheidt, L. A., & Barnett, D. (2011). Testing intelligently includes double-checking Wechsler IQ scores. Journal of Psychoeducational Assessment, 29, 39–46.

Martinussen, R., Hayden, J., Hogg-Johnson, S., & Tannock, R. (2005). A meta-analysis of working memory impairments in children with attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 44, 377–384.

Mayes, S. D., & Calhoun, S. L. (2007). Wechsler Intelligence Scale for Children-Third and Fourth Edition predictors of academic achievement in children with attention-deficit/hyperactivity disorder. School Psychology Quarterly, 22, 234–249.

McGee, R. A., Clark, S. E., & Symons, D. K. (2000). Does the Conners’ Continuous Performance Test aid in ADHD diagnosis? Journal of Abnormal Child Psychology, 28, 415–424.

Nyden, A., Billstedt, E., Hjelmquist, E., & Gillberg, C. (2001). Neurocognitive stability in Asperger syndrome, ADHD, and reading and writing disorder: A pilot study. Developmental Medicine and Child Neurology, 43, 165–171.

Reschly, D. J. (1997). Diagnostic and treatment utility of intelligence tests. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 437–456). New York, NY: Guilford.

Ryan, J. J., Glass, L. A., & Bartels, J. M. (2010). Stability of the WISC-IV in a sample of elementary and middle school children. Applied Neuropsychology, 17, 68–72.

Ryan, J. J., & Schnakenberg-Ott, S. D. (2003). Scoring reliability on the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). Assessment, 10, 151–159.

Ryan, J. J., Umfleet, L. G., & Kane, A. (2013). Stability of WISC-IV process scores. Applied Neuropsychology: Child, 2, 43–46.

Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). San Diego, CA: Jerome M. Sattler.

Sattler, J. M. (2008). Assessment of children: Cognitive applications (5th ed.). San Diego, CA: Jerome M. Sattler.

Schwean, V. L., & Saklofske, D. H. (1998). WISC III assessment of children with attention deficit/hyperactivity disorder. In A. Prifitera & D. H. Saklofske (Eds.), WISC-III clinical use and interpretation (pp. 91–118). San Diego, CA: Academic.

Schwean, V. L., & Saklofske, D. H. (2005). Assessment of attention deficit hyperactivity disorder with the WISC-IV. In A. Prifitera & D. H. Saklofske (Eds.), WISC-IV clinical use and interpretation: Scientist-practitioner perspectives (pp. 235–277). Amsterdam, The Netherlands: Elsevier Academic.

Schwean, V. L., Saklofske, D. H., Yackulic, R. A., & Quinn, D. (1993). WISC-III performance of ADHD children. Journal of Psychoeducational Assessment, WISC-III Monograph, 56–70.

Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. San Antonio, TX: Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. San Antonio, TX: Psychological Corporation.

Wechsler, D. (1991). WISC-III manual. San Antonio, TX: Psychological Corporation.

Wechsler, D. (2003a). WISC-IV administration and scoring manual. San Antonio, TX: Psychological Corporation.

Wechsler, D. (2003b). WISC-IV technical and interpretive manual. San Antonio, TX: Psychological Corporation.
