SEVENTH EDITION
EDUCATIONAL TESTING
AND MEASUREMENT
Classroom Application and Practice
TOM KUBISZYN
University of Houston
GARY BORICH
The University of Texas at Austin
Acquisitions Editor BnId H _
Marketing Manager K4I'i1t Ilo1107
Senior Production Editor Wlkrk A ~
Senior Designer Harry Nolan
Production Management Services Argosy
This book was set in 10/12 Times Roman by Argosy and printed and bound by R. R. Donnelley & Sons Company.
The cover was printed by Phoenix Color Corporation. This book is printed on acid-free paper.
Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, E-Mail: [email protected].
To order books please call 1(800)-225-5945.
Library of Congress Cataloging in Publication Data:
Kubiszyn, Tom.
Educational testing and measurement: classroom application and practice / Tom Kubiszyn, Gary Borich.--7th ed.
p. cm.
Includes bibliographical references and index.
1. Educational tests and measurements--United States. I. Borich, Gary D. II. Title.
LB3051.K8 2003 371.26'0973--dc21
ISBN 0-471-14977-2 (cloth: alk. paper)
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
PREFACE
Two major developments in classroom testing and measurement explain why we have decided to incorporate substantial additions and revisions to this, the seventh edition of Educational Testing and Measurement. These developments were the rapid spread of high-stakes testing to all 50 states and the District of Columbia in the past few years and growing awareness among regular education teachers about their increased responsibility for special education students under the 1997 amendments to the Individuals with Disabilities Education Act (IDEA-97).
High-stakes test scores are now widely used for student promotion and graduation decisions and for educational accountability purposes, sometimes with substantial school or district incentives and penalties tied to student performance on high-stakes tests. And, with the passage in January 2002 of the "No Child Left Behind Act," we now have a federal mandate that will soon require annual academic assessments of all school children in grades 3-8. This ensures continued and probably increased attention to high-stakes tests in the foreseeable future. Under IDEA-97, regular education teachers are now required to play a much broader role than in the past in the instruction and assessment of special education students included in regular education classrooms.
Because these developments have generated intense controversy (especially the rapid spread of high-stakes testing), one of the goals of this revision was to inform instructors and future teachers about these important developments in a balanced and thoughtful way. And, because all future teachers will have to cope with the demands of high-stakes testing and full compliance with IDEA-97, another goal was to provide future teachers with practical information and recommendations they can immediately use in the classroom to prepare themselves and their students for high-stakes testing and the challenges of IDEA-97. Nevertheless, as important as these developments are, the overarching goal of this revision was to remain true to the friendly style, content, order of presentation, and length of past editions of Educational Testing and Measurement.
As with all previous editions we have continued to present complex test and measurement content in a friendly, nonintimidating, and unique manner and to relate this content in meaningful ways to important developments in educational measurement and assessment. In completing this revision we have kept our audience, classroom teachers, fully in mind. We have striven to present often abstract and sometimes difficult concepts and procedures in an up-to-date and accurate, but accessible manner. Rather than overwhelm students with jargon and statistical theory, we continue to use a friendly, conversational style to enhance our emphasis on the application of theory. At the same time, we provide sufficient theoretical background to ensure that students will understand the foundations of measurement and avoid an oversimplified approach to measurement. Thus, long-time users of the text should continue to feel comfortable with it.
Past users of the text should have no difficulty recognizing and adapting to this revision. The overall organization has been only slightly modified, and the flexible organization of the text continues to enable instructors to either follow the chapter sequence as is or modify it as needed to meet their particular needs. A new chapter has been added (Chapter 2, High-Stakes Testing), another has been significantly expanded (Chapter 7, Writing Essay and Higher Order Test Items), and several other chapters have been revised and updated to seamlessly integrate the new material on high-stakes testing and IDEA-97 and other developments. To help keep the text's length reasonable, the section on planning a schoolwide testing program has been deleted from Chapter 19 since this function has become obsolete in the face of the adoption by all states of state-mandated high-stakes tests. Other changes to the seventh edition are described in more detail next.
Chapter 1 has been revised and updated. It continues to provide up-to-date information on the increasingly important distinction between testing and assessment and new information about a variety of contemporary trends, especially high-stakes testing, the implications of IDEA-97 for regular education teachers, and competency testing for teachers.
Chapter 2 is a new chapter devoted to the high-stakes testing phenomenon. It defines high-stakes testing, traces its history, reviews both sides of the controversy surrounding the use of high-stakes tests, considers the position taken by national measurement associations, and provides future teachers with concrete recommendations they can use to prepare themselves and their students for high-stakes tests.
Chapter 3, which was Chapter 2 in previous editions, has been updated.
Chapter 4, Norm- and Criterion-Referenced Tests and Content Validity Evidence, consolidates Chapters 3 and 4 from previous editions into a single chapter. Several reviewers suggested combining Chapters 3 and 4 into a single chapter because of their brevity. To minimize confusion we have maintained the same topic sequence as in previous editions. Throughout Chapter 4 and in several later chapters we have substituted "validity evidence" for "validity" when appropriate to ensure continuity with language included in the most recent edition of the Standards for Educational and Psychological Tests (American Educational Research Association, 1999).
Chapters 5 and 6 also have been updated.
Chapter 7, Writing Essay and Higher Order Test Items, has been substantially revised and expanded. It now includes a wider variety of examples of essay items to help teachers see how they can be used to measure higher order thinking and problem-solving ability. The sections on scoring also have been revised and updated. And, a new section has been added to help teachers assess how well students can organize and access knowledge and another new section provides guidance and many examples to help teachers design and utilize open-book questions and tests.
Chapter 8, Performance-Based Assessment, was Chapter 9 in previous editions. Chapter 9, Portfolio Assessment, was Chapter 10 in previous editions.
Chapter 10, Administering, Analyzing, and Improving the Test, was Chapter 8 in previous editions.
These chapters were reordered at the recommendation of reviewers who noted that many principles covered in Administering, Analyzing, and Improving the Test applied to performance and portfolio assessments, and not just to objective and essay items. In previous editions this chapter followed the chapters on objective and essay items but preceded the chapters on performance and portfolio assessments. In this edition this chapter now follows all four chapters devoted to classroom-based assessment: objective items, essay and higher order test items, performance assessments, and portfolio assessments.
Chapters 11-14 have been updated.
Chapter 15, Validity, has been revised to make it consistent with the approach to the establishment of validity evidence described in the latest edition of the Standards for Educational and Psychological Tests (American Educational Research Association, 1999). Rather than considering validity to be a characteristic of a test, the new edition stresses the importance of acquiring evidence of a test's validity for a particular use.
Chapters 16 and 17 have been updated.
Chapter 18, Standardized Tests, has been revised. It continues its extensive treatment of the history, utility, and interpretation of standardized tests, with increased attention paid to the use of both standardized norm-referenced and standardized criterion-referenced tests in high-stakes testing programs.
Chapter 19, Types of Standardized Tests, has been revised and information regarding various standardized tests has been updated. The entire section entitled "Planning a School- or District-wide Testing Program" has been deleted because this function has been supplanted by state legislatures and state education agencies with the spread of the high-stakes testing phenomenon.
Chapters 20 and 21 have been revised and updated to better inform regular education teachers about their increased responsibilities for evaluating the educational and behavioral progress of special education students included in their regular education classrooms and curricula under IDEA-97. At the request of reviewers, person-first language (e.g., children with disabilities) has replaced the language previously used to refer to children in special education programs (i.e., special learners). New examples of recently developed or revised behavior rating scales that regular education teachers are increasingly expected to complete are included in Chapter 21. These include scales used to assess medication safety and efficacy for the growing number of pupils taking medications that can affect learning and behavior.
Finally, Chapter 22 also has been revised to reflect the rapid spread of the high-stakes testing phenomenon and the added responsibilities for regular education teachers for compliance with IDEA-97.
Throughout the text we have added references to a variety of contemporary measurement trends, tying these to day-to-day decision making for the classroom teacher. And, we have updated our references, suggested readings, and list of supplemental statistics and measurement texts to include recent articles, chapters, and books that reinforce and expand the changing face of educational measurement in today's classroom.
As with earlier editions, readers will find at the conclusion of each chapter a step-by-step summary in which all important concepts in the chapter are identified for review. Additionally, we have prepared new discussion questions and/or exercises for each new chapter and section. These discussion questions and exercises should help students learn how to apply the concepts presented and, along with the Instructor's Manual, should help instructors identify organized activities and assignments that can be integrated into their class presentations. Discussion questions and exercises marked with an asterisk have answers listed in Appendix D.
We have tried to select traditional and contemporary topics and provide examples that help the teacher, especially the beginning teacher, deal with practical, day-to-day issues related to the testing and assessment of students and measuring their behavior. The topics we have chosen, their natural sequences and linkage to the real-life tasks of teachers, the step-by-step summaries of major concepts, and our discussion questions and exercises all work, we believe, to make this text a valuable tool and an important resource for observing, measuring, and understanding life in today's changing classroom.
Tom Kubiszyn Gary Borich
ACKNOWLEDGMENTS
We would like to express our appreciation to the following instructors for their constructive comments on this revision: W. Robert Houston, University of Houston; Alice Corkill, University of Nevada-Las Vegas; Robert Paugh, University of Central Florida; Priscilla J. Hambrick, City University of New York; and Pam Fernstrom, University of North Alabama.
Thanks also are owed to Bill Fisk, Clemson University; David E. Tanner, California State University at Fresno; Gregory J. Cizek, University of Toledo; Thomas J. Sheeran, Niagara University; Jonathan A. Plucker, Indiana University; Aimin Wang, Miami University; William M. Bechtol, late of Southwest Texas State University; Deborah E. Bennett, Purdue University; Jason Millman, Cornell University; David Payne, University of Georgia; Glen Nicholson, University of Arizona; Carol Mardell-Czudnowski, Northern Illinois University; and James Collins, University of Wyoming, for their constructive comments on earlier revisions. Also, we thank Marty Tombari for his contributions to Chapters 8 and 9 and other examples, illustrations, and test items in this volume and Ann Schulte for her contributions to Chapter 17. Finally, we thank Denise Branley for her revisions to the Instructor's Manual and Test Bank.
CONTENTS
CHAPTER 1 AN INTRODUCTION TO CONTEMPORARY EDUCATIONAL TESTING AND MEASUREMENT
Tests Are Only Tools
Tests Are Not Infallible
Testing: Part of Assessment
Testing and Assessment Skills: Vital to Teachers
Recent History in Educational Measurement
Current Trends in Educational Measurement
"High-Stakes" Testing
1997 Amendments to the Individuals with Disabilities Education Act (IDEA-97)
Performance and Portfolio Assessment
Education Reform and the Global Economy
Competency Testing for Teachers
Increased Interest from Professional Groups
Effects on the Classroom Teacher
About the Text
What If You're "No Good in Math"?
Summary
For Discussion
CHAPTER 2 HIGH-STAKES TESTING
High-Stakes Tests Are Only Tools
What Is High-Stakes Testing and Why Does It Matter?
Promotion and Graduation Decisions Affect Students
Principal and Teacher Incentives Are Linked to Performance
Effects on Property Values, Business Decisions, and Politics
The History of High-Stakes Testing
Education Reform
Standards-Based Reform
The High-Stakes Testing Backlash
Is There Really a High-Stakes Testing Backlash?
What Do National Organizations Say about High-Stakes Tests?
AERA's 12 Conditions That High-Stakes Testing Programs Should Meet
How Can a Teacher Use the 12 Conditions?
Helping Students (and Yourself) Prepare for High-Stakes Tests
Focus on the Task, Not Your Feelings about It
Inform Students and Parents about the Importance of the Test
Teach Test-Taking Skills as Part of Regular Instruction
As the Test Day Approaches Respond to Student Questions Openly and Directly
Take Advantage of Whatever Preparation Materials Are Available
Summary
For Discussion
CHAPTER 3 THE PURPOSE OF TESTING
Testing, Accountability, and the Classroom Teacher
Types of Educational Decisions
"Pinching" in the Classroom
What to Measure
How to Measure
Written Tests
Summary
For Discussion
CHAPTER 4 NORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCE
Defining Norm-Referenced and Criterion-Referenced Tests
Comparing Norm-Referenced and Criterion-Referenced Tests
Differences in the Construction of Norm-Referenced and Criterion-Referenced Tests
NRTs, CRTs, and Language, Cultural, and Social Sensitivity
NRTs, CRTs, and Validity Evidence
A Three-Stage Model of Classroom Measurement
Why Use Objectives? Why Not Just Write Test Items?
Where Do Goals Come From?
Are There Different Kinds of Goals and Objectives?
How Can Instructional Objectives Make a Teacher's Job Easier?
Summary
For Discussion
CHAPTER 5 MEASURING LEARNING OUTCOMES
Writing Instructional Objectives
Identifying Learning Outcomes
Identifying Observable and Directly Measurable Learning Outcomes
Stating Conditions
Stating Criterion Levels
Keeping It Simple and Straightforward
Matching Test Items to Instructional Objectives
Taxonomy of Educational Objectives
Cognitive Domain
Affective Domain
The Psychomotor Domain
The Test Blueprint
Content Outline
Categories
Number of Items
Functions
Summary
For Practice
CHAPTER 6 WRITING OBJECTIVE TEST ITEMS
Which Format?
True-False Items
Suggestions for Writing True-False Items
Matching Items
Suggestions for Writing Matching Items
Multiple-Choice Items
Higher Level Multiple-Choice Questions
Suggestions for Writing Multiple-Choice Items
Completion Items
Suggestions for Writing Completion or Supply Items
Gender and Racial Bias in Test Items
Guidelines for Writing Test Items
Advantages and Disadvantages of Different Objective-Item Formats
Summary
For Practice
CHAPTER 7 WRITING ESSAY AND HIGHER ORDER TEST ITEMS
What Is an Essay Item?
Essay Items Should Measure Complex Cognitive Skills or Processes
Essay Items Should Structure the Student's Response
Types of Essays: Extended or Restricted Response
Examples of Restricted-Response Essays
When Should Restricted-Response Essays Be Considered
Pros and Cons of Essay Items
Advantages of the Essay Item
Disadvantages of the Essay Item
Suggestions for Writing Essay Items
Scoring Essay Questions
Well-Written Items Enhance Essay Scoring Ease and Reliability
Essay Scoring Criteria, or Rubrics
Scoring Extended-Response and Higher Level Questions
General Essay Scoring Suggestions
Assessing Knowledge Organization
Open-Book Questions and Exams
Some Open-Book Techniques
Guidelines for Planning an Open-Book Exam
Summary
For Practice
CHAPTER 8 PERFORMANCE-BASED ASSESSMENT
Performance Tests: Direct Measures of Competence
Performance Tests Can Assess Processes and Products
Performance Tests Can Be Embedded in Lessons
Performance Tests Can Assess Affective and Social Skills
Developing Performance Tests for Your Learners
Step 1: Deciding What to Test
Step 2: Designing the Assessment Context
Step 3: Specifying the Scoring Rubrics
Step 4: Specifying Testing Constraints
A Final Word
Summary
For Discussion and Practice
CHAPTER 9 PORTFOLIO ASSESSMENT
Rationale for the Portfolio
Ensuring Validity of the Portfolio
Summary
For Practice
CHAPTER 10 ADMINISTERING, ANALYZING, AND IMPROVING THE WRITTEN TEST
Assembling the Test
Packaging the Test
Reproducing the Test
Administering the Test
Scoring the Test
Analyzing the Test
Quantitative Item Analysis
Qualitative Item Analysis
Item Analysis Modifications for the Criterion-Referenced Test
Debriefing
Debriefing Guidelines
The Process of Evaluating Classroom Achievement
Summary
For Practice
CHAPTER 11 MARKS AND MARKING SYSTEMS
What Is the Purpose of a Mark?
Why Be Concerned about Marking?
What Should a Mark Reflect?
Marking Systems
Types of Comparisons
Which System Should You Choose?
Types of Symbols
Combining and Weighting the Components of a Mark
Who Is the Better Teacher?
Combining Grades from Quizzes, Tests, Papers, Homework, Etc., into a Single Mark
Practical Approaches to Equating before Weighting in the Busy Classroom
Front-end Equating
Back-end Equating
Summary
For Practice
CHAPTER 12 SUMMARIZING DATA AND MEASURES OF CENTRAL TENDENCY
What Are Statistics?
Why Use Statistics?
Tabulating Frequency Data
The List
The Simple Frequency Distribution
The Grouped Frequency Distribution
Steps in Constructing a Grouped Frequency Distribution
Graphing Data
The Bar Graph, or Histogram
The Frequency Polygon
The Smooth Curve
Measures of Central Tendency
The Mean
The Median
The Mode
The Measures of Central Tendency in Various Distributions
Summary
For Practice
CHAPTER 13 VARIABILITY, THE NORMAL DISTRIBUTION, AND CONVERTED SCORES
The Range
The Semi-Interquartile Range (SIQR)
The Standard Deviation
The Deviation Score Method for Computing the Standard Deviation
The Raw Score Method for Computing the Standard Deviation
The Normal Distribution
Properties of the Normal Distribution
Converted Scores
z-Scores
T-Scores
Summary
For Practice
CHAPTER 14 CORRELATION
The Correlation Coefficient
Strength of a Correlation
Direction of a Correlation
Scatterplots
Where Does r Come From?
Causality
Other Interpretive Cautions
Summary
For Practice
CHAPTER 15 VALIDITY
Types of Validity Evidence
Content Validity
Criterion-Related Validity
Construct Validity
What Have We Been Saying? A Review
Interpreting Validity Coefficients
Content Validity Evidence
Concurrent and Predictive Validity Evidence
Summary
For Practice
CHAPTER 16 RELIABILITY
Methods of Estimating Reliability
Test-Retest or Stability
Alternate Forms or Equivalence
Internal Consistency
Interpreting Reliability Coefficients
Summary
For Practice
CHAPTER 17 ACCURACY AND ERROR
Error-What Is It?
The Standard Error of Measurement
Using the Standard Error of Measurement
More Applications
Standard Deviation or Standard Error of Measurement?
Why All the Fuss about Error?
Error within Test Takers
Error within the Test
Error in Test Administration
Error in Scoring
Sources of Error Influencing Various Reliability Coefficients
Test-Retest
Alternate Forms
Internal Consistency
Band Interpretation
Steps: Band Interpretation
A Final Word
Summary
For Practice
CHAPTER 18 STANDARDIZED TESTS
What Is a Standardized Test?
Do Test Stimuli, Administration, and Scoring Have to Be Standardized?
Standardized Testing: Effects of Accommodations and Alternative Assessments
Uses of Standardized Achievement Tests
Will Performance and Portfolio Assessment Make Standardized Tests Obsolete?
Administering Standardized Tests
Types of Scores Offered for Standardized Achievement Tests
Grade Equivalents
Age Equivalents
Percentile Ranks
Standard Scores
Interpreting Standardized Tests: Test and Student Factors
Test-Related Factors
Student-Related Factors
Aptitude-Achievement Discrepancies
Interpreting Standardized Tests: Parent-Teacher Conferences and Educational Decision Making
An Example: Pressure to Change an Educational Placement
A Second Example: Pressure from the Opposite Direction
Interpreting Standardized Tests: Score Reports from the Publishers
The Press-on Label
A Criterion-Referenced Skills Analysis or Mastery Report
An Individual Performance Profile
Other Publisher Reports and Services
Summary
For Practice
CHAPTER 19 TYPES OF STANDARDIZED TESTS
Standardized Achievement Tests
Achievement Test Batteries, or Survey Batteries
Single-Subject Achievement Tests
Diagnostic Achievement Tests
Standardized Academic Aptitude Tests
The History of Academic Aptitude Testing
Stability of IQ Scores
What Do IQ Tests Predict?
Individually Administered Academic Aptitude Tests
Group Administered Academic Aptitude Tests
Standardized Personality Assessment Instruments
What Is Personality?
Objective Personality Tests
Projective Personality Tests
Summary
For Discussion
CHAPTER 20 TESTING AND ASSESSING CHILDREN WITH SPECIAL NEEDS IN THE REGULAR CLASSROOM
A Brief History of Special Education
P.L. 94-142 and the Individuals with Disabilities Education Act
Section 504 of the Rehabilitation Act
Special Education Service Delivery: An Evolution
Service Delivery Setting
Determining Eligibility for Services
Disability Categories to Developmental Delays
IDEA-97 and the Classroom Teacher
Testing or Assessment?
Child Identification
Individual Assessment
Individual Educational Plan (IEP) Development
Individualized Instruction
Reviewing the IEP
Manifestation Determinations
At the Other End of the Curve: The Gifted Child
Defining "Gifted"
Assessment and Identification
Current Trends in Teaching and Assessing the Gifted and Talented
Summary
For Discussion
CHAPTER 21 ASSESSING CHILDREN WITH DISABILITIES IN REGULAR EDUCATION CLASSROOMS
IDEA-97: Issues and Questions
Assistance for Teachers
Can Regular Teachers Assess a Child with a Disability?
Should Regular Teachers Assess a Child with a Disability?
Assessing Academic Performance and Progress
Teacher-Made Tests and Assessments
Standardized Tests and Assessments
Limitations of Accommodations and Alternative Assessments
Assessing Behavioral and Attitudinal Factors
Assessment, Not Diagnosis
Classroom Diversity, Behavior, and Attitudes
Behavior Plan Requirements under IDEA-97
Teacher-Made Behavior and Attitude Assessments
Distinguishing Behavior from Attitude
Assessing Behavior
Assessing Attitudes
Monitoring Children with Disabilities Who Are Taking Medication
Medication Use Is Increasing
Side Effects May Be Present
The Teacher's Role in Evaluating Medication and Psychosocial Interventions
Commonly Used Standardized Scales and Checklists
Summary
For Discussion
CHAPTER 22 IN THE CLASSROOM: A SUMMARY DIALOGUE
High-Stakes Testing
Criterion-Referenced Versus Norm-Referenced Tests
New Responsibilities for Teachers under IDEA-97
Instructional Objectives
The Test Blueprint
Essay Items and the Essay Scoring Guides
Reliability, Validity, and Test Statistics
Grades and Marks
Some Final Thoughts
APPENDIX A MATH SKILLS REVIEW
APPENDIX B PEARSON PRODUCT-MOMENT CORRELATION
APPENDIX C STATISTICS AND MEASUREMENT TEXTS
APPENDIX D ANSWERS FOR PRACTICE
SUGGESTED READINGS
REFERENCES
CREDITS
CHAPTER 1
AN INTRODUCTION TO CONTEMPORARY EDUCATIONAL TESTING AND MEASUREMENT
CHANCES ARE that some of your strongest childhood and adolescent memories include taking tests in school. More recently, you probably remember taking a great number of tests in college. If your experiences are like those of most who come through our educational system, you probably have very strong or mixed feelings about tests and testing. Indeed, some of you may swear that you will never test your students when you become teachers. If so, you may think that test results add little to the educational process and fail to reflect learning or that testing may turn off students to learning. Others may believe that tests are necessary and vital to the educational process. For you, they may represent irrefutable evidence that learning has occurred. Rather than view tests as deterrents that turn off students, you may see them as motivators that stimulate students to study and provide them with feedback about their achievement.
TESTS ARE ONLY TOOLS
Between those who feel positively about tests and those who feel negatively about them lies a third group. Within this group, which includes the authors, are those who see tests as tools that can contribute importantly to the process of evaluating pupils, the curriculum, and teaching methods, but who question the status and power often given to tests and test scores. We are concerned that test users often uncritically accept test scores. This concerns us for three reasons. First, tests are only tools, and tools can be appropriately used, unintentionally misused, and intentionally abused. Second, tests, like other tools, can be well designed or poorly designed. Third, both poorly designed tools and well-designed tools in the hands of ill-trained or inexperienced users can be dangerous. These three concerns motivated us to write this text. By helping you learn to design and to use tests and test results appropriately we hope you will be less likely to misuse tests and their results.
TESTS ARE NOT INFALLIBLE
Test misuse and abuse can occur when users of test results are unaware of the factors that can influence the usefulness of test scores. The technical adequacy of a test, or its validity (see Chapter 15) and reliability (see Chapter 16), is one such factor. A variety of factors can dramatically affect the validity and reliability of a test. When a test's validity and reliability are impaired, test results should be interpreted very cautiously, if at all. Too often, such considerations are overlooked or ignored by professionals and casual observers alike.
Even when a test is technically adequate, misuse and abuse can occur because technical adequacy does not ensure that test scores are accurate or meaningful (see Chapters 17 and 18). A number of factors can affect the accuracy and meaningfulness of test scores. These include the test's appropriateness for the purpose of testing, the test's content validity evidence (if it is an achievement test), the appropriateness of its norms table (if it is a norm-referenced test, a term we will learn more about in Chapter 3), the appropriateness of the reading level, the language proficiency and cultural characteristics of the students, teacher and pupil factors that may have affected administration procedures and scoring of the test, and the pupils' motivation and engagement with the test on the test day.
Because technical adequacy and these interpretive factors can affect test scores dramatically, our position is that test scores should never be uncritically employed as the sole basis for important educational decision making. Nevertheless, with the rapid spread of the high-stakes testing movement (which we will discuss in detail in Chapter 2), a disturbing number of promotion and graduation decisions are being based on test scores alone. Instead of relying on such a limited "snapshot" of student achievement for important decision making, we recommend that test results should be considered to be part of a broader "movie" or process called assessment. It should be the findings of the broad assessment, not just test results, that form the basis for important educational decision making. We will describe the process of assessment in the next section and distinguish between testing and assessment. See the sidebar about the Waco, Texas, public schools for a recent example of the controversial use of test results alone to make important educational decisions.
TESTING: PART OF ASSESSMENT
Unfortunately, the situation described in the sidebar is not unusual. Well-intended educators continue to rely solely or primarily on test results to make important educational decisions. They may unintentionally misuse test results because they have come to regard test results as the end point rather than an early or midpoint in the much broader process of assessment. Or, they may mistakenly believe that testing and assessment are synonymous.
In the assessment process, test results are subject to critical study according to established measurement principles. If important educational decisions are to be made, critically evaluated test results should be combined with results from a variety of other measurement procedures (e.g., performance and portfolio assessments, observations, checklists, rating scales, all covered later in the text), as appropriate, and integrated with relevant background and contextual information (e.g., reading level, language proficiency, cultural considerations, also covered later in the text) to ensure that the educational decisions are
BOX
WACO, TEXAS, SCHOOLS USE STANDARDIZED TEST SCORES ALONE TO MAKE PROMOTION DECISIONS
Social promotion is a practice that purports to protect student self-esteem by promoting students to the next grade so that they may stay with their classmates even when students are not academically ready for promotion. Educational, psychological, political, fiscal, cultural, and other controversies are all associated with social promotion.
Concerned with possible negative effects of social promotion, the Waco, Texas, public schools decided to utilize standardized test scores as the basis for promotion decisions beginning with first graders in 1998. As a result, the number of students retained increased from 2% in 1997 to 20% in 1998 (Austin American-Statesman, June 12, 1998). The Waco schools are not alone in curtailing social promotion. The Chicago public schools, in the midst of a wide-ranging series of educational reform initiatives, retained 22,000 students in 1994, with 175,000 retained in 1998 (Newsweek, June 22, 1998).
What has come to be known by some as the "Waco Experiment" also raised a number of measurement-related issues.
Whereas the Waco schools' decision was doubtless well intended, their policy may have overlooked the fact that the utility of test scores varies depending on age, with test results for young children less stable and more prone to error than those for older children. A relatively poor score on a test may disappear in a few days, weeks, or months after additional development has occurred, irrespective of achievement. In addition, older children are less susceptible to distractions and, with years of test-taking experience under their belts, are less likely to be confused by the tests or have difficulty completing tests properly. All these factors can negatively affect a student's score and result in a score that underrepresents the student's true level of knowledge.
Furthermore, a single standardized test score provides only a portion of a child's achievement over the school year, regardless of the grade level. As we will see when we consider the interpretation of standardized test results in Chapter 18, there are a number of student-related factors (e.g., illness, emotional upset) and administrative factors (e.g., allowing too little time, failing to read instructions verbatim) that can negatively affect a student's performance on the day the test was taken. Thus, making a decision that so substantially affects a child's education based on a single measure obtained on a single day rather than relying on a compilation of measures (i.e., tests, ratings, observations, grades on assessments and portfolios, homework, etc.) obtained over the course of the school year seems ill-advised.
On the other hand, using data collected on a single day and from a single test to make what otherwise would be complex, time-consuming, and difficult decisions has obvious attraction. It appears to be expedient, accurate, and cost-effective and appears to be addressing concerns about the social promotion issue. However, it also may be simplistic and shortsighted if no plan exists to remediate those who are retained. As noted in a June 12, 1998, editorial in the Austin American-Statesman, "Failing students who don't meet a minimum average score, without a good plan to help them improve, is the fast track to calamity." Nevertheless, this trend has not diminished since we first reported on it in our sixth edition. Indeed, the use of test scores to make high-stakes promotion decisions has increased across the nation. We will explore this phenomenon in depth in Chapter 2.
appropriate. You can see that although testing is one part of assessment, assessment encompasses much more than testing. Figure 1.1 further clarifies the distinction between testing and assessment.
Throughout the text we will refer to testing and/or assessment. To avoid confusion later, note the distinction between testing and assessment in Figure 1.1. Next, we will summarize why we believe it is of vital importance that all educators obtain a firm grounding in educational testing and assessment practice.
Testing
1. Tests are developed or selected (if standardized; see Chapter 18), administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.
Assessment
1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.
FIGURE 1.1 The distinction between testing and assessment
TESTING AND ASSESSMENT SKILLS:
VITAL TO TEACHERS
Over the next several pages we will alert you to a number of recent developments that indicate that classroom teachers will engage in more testing and assessment than ever before. Because the decisions that will be made may also be of increased importance, we believe that a firm grounding in testing and assessment is more than merely important for teachers. We believe it is vital! Here's why:
1. Appropriate or not, the use of test results to make annual high-stakes decisions about students (e.g., promotion, graduation), school personnel (e.g., pay increases and continued employment), and even control of schools (e.g., state takeover of low performing schools) has increased dramatically in spite of vocal protests and complaints from teachers and other educators, attorneys and other advocates, and some parents and students (see Chapter 2).
2. To ensure that students are progressing toward achievement of state academic and performance standards (discussed more fully in Chapter 2) measured by high-stakes tests, the use of teacher-made tests and other measurement procedures (e.g., performance and portfolio assessments, observations, checklists, and rating scales; see Chapters 8, 9, and 21) to assess academic progress and support day-to-day instructional decisions is also on the increase.
3. Recent federal legislation now requires the classroom teacher's involvement in the instruction and regular assessment of the performance and progress of special education pupils in the general curriculum, the domain of the classroom teacher, not the special educator (see Chapters 20 and 21).
4. Testing and assessment are now widely accepted as necessary for students, teachers, parents, administrators, and other decision makers to determine whether students are learning and, increasingly, what the most cost-effective, culturally sensitive instructional methods may be.
5. To be useful for decision making, tests and other measurement procedures must be technically adequate and appropriately and sensitively used.
Let's now turn to a review of recent and current developments and trends that have led us to the conclusion that enhanced skills in testing and assessment should now be considered vital to the classroom teacher.
RECENT HISTORY IN EDUCATIONAL
MEASUREMENT
Beginning in the late 1960s, a fairly strong antitest sentiment began to develop in our country. Over the next two decades many scholarly papers and popular articles questioned or denounced testing for a variety of reasons. Some decried tests as weapons willfully used to suppress minorities. To others, tests represented simplistic attempts to measure complex traits or attributes. Still others questioned whether the traits or attributes tested could be measured, or whether these traits or attributes even existed! From the classroom to the Supreme Court, testing and measurement practice came under close scrutiny. It seemed to some that tests were largely responsible for many of our society's ills.
Initially it looked as though the antitest movement might succeed in abolishing testing in education, and there was professional and lay support for such a move. Today, it appears that this movement was part of a swinging pendulum. Calls for the abolition of testing and grading gradually subsided, and by the late 1980s more tests than ever were being administered. Today, all 50 states have some sort of annual high-stakes test program in place and federal legislation passed in January 2002 will soon require annual "academic assessments" for all students in grades 3-9.
Voices calling for the abolition of testing have been overshadowed by the high-stakes testing juggernaut. While measurement experts continue to emphasize that all test scores are at best estimates that are subject to greater or lesser margins of error and that they should be used along with other sources of data to make important educational decisions, their voices too have been muted by high-stakes testing advocates.
Nevertheless, critics of testing continue to raise important issues. And most have come to realize that abolishing testing will not be a panacea for the problems of education and contemporary society. Even the most outspoken critics of testing would have to agree that in our everyday world decisions must be made. If tests were eliminated, these decisions would still be made but would be based on nontest data that might be subjective, opinionated, and biased. For example, you may have dreaded taking the Scholastic Assessment Test (SAT) during your junior or senior year in high school. You may have had some preconceived, stereotyped notions about the test. However, the SAT had no preconceived, stereotyped, or prejudiced notions about you! Advocates and many critics of testing would now agree that it is not the tests themselves that are biased, but the people who misuse or abuse them.
While the high-stakes testing movement has swept the nation, there has been a related shift from the historical emphasis on multiple-choice, true-false, and matching item formats for tests to the use of more flexible measurement formats. More and more calls are being heard and heeded for tests and procedures that assess higher level thought processes than are typically measured by such item formats. Essay tests, portfolios, and various performance tests (all discussed in detail in Chapters 7, 8, and 9) are increasingly being used in addition to traditional multiple-choice tests in contemporary assessment efforts. Advocates refer to performance and portfolio assessments as authentic assessments, a term that suggests that these assessments measure achievement more accurately and validly than do traditional tests. Advocates argue that these types of assessments often represent the most objective, valid, and reliable information that can be gathered about individuals. However, these assessments do have their disadvantages as well. They are costly and time consuming to administer and score, and they are hampered by questions about their validity and reliability. All these issues will be addressed later in the text.
CURRENT TRENDS IN EDUCATIONAL
MEASUREMENT
There have been a number of recent developments that have considerably altered the face of contemporary education. Only a select few are reviewed here to help you see that the classroom teacher's involvement with testing and assessment will only increase, as will its importance.
"High-Stakes" Testing
"High-stakes" testing refers to the use of tests and assessments alone to make decisions that are of prominent educational, financial, or social impact. Examples include whether (a) a student may be promoted to the next grade (see the sidebar about the Waco, Texas, public schools) or graduate from high school; (b) a school, principal, or teacher receives a financial reward or other incentive, such as a school being identified as "exemplary" or "low performing"; (c) a state takes over the administrative control of a local school; and (d) a principal or teacher is offered an employment contract or extension.
For example, in 1994, the state of Texas began to use a passing score on the Texas Assessment of Academic Skills (TAAS) test to determine which students would be granted high school diplomas. Students are first given the opportunity to obtain a graduation cutoff score on the TAAS in the 10th grade. If they do not, students may retake the test.
High-stakes testing is not a Texas or even a regional phenomenon, however. By 2003 a number of states will be using high-stakes test results for graduation decisions (Doherty, 2002). Use of the TAAS cutoff score as a graduation requirement is not without its detractors. Critics are concerned that this requirement will be unfair to the state's growing minority populations because their performance on the TAAS lags that of Caucasian students. Indeed, a lawsuit was filed in 1997 by the Mexican American Legal Defense and Educational Fund (MALDEF) asking the state to stop using what was described as an "invalid ... courts ruled in favor of the state and a similar suit also failed in Indiana (Robelen, 2001). Critics also point out that, in states that have recently adopted high-stakes graduation tests, today's seniors may be disadvantaged because they are being held to higher standards than existed when they began school and that adequate remedial opportunities may not exist. Only about half of the 18 states that require graduation exams provide funds for remediation of students who fail (Doherty, 2001).
High-stakes test results also are increasingly being used to make promotion decisions, with seven states slated to use high-stakes test results for promotion decisions in 2003 (Doherty, 2002). And, with the January 2002 passage of the No Child Left Behind Act, annual "academic assessments" to determine whether pupils are learning will soon be a nationwide requirement for all children in grades 3-9.
Supporters of high-stakes testing point to continually increasing percentages of students who have passed high-stakes tests like the TAAS in Texas since it was implemented in 1994 and claim it has had a motivating effect on students, parents, teachers, and principals. Both critics and supporters of TAAS-related education reform initiatives in Texas agree on one thing: In general, Texas students have demonstrated substantial gains in TAAS performance. In 1994, 53% passed all the TAAS subtests (reading, math, and writing), with 73% passing in 1998 and 82% passing in 2001. Even larger proportional gains have been evident for African-American and Hispanic students. However, critics point to a RAND Corporation study (RAND, 2000) that indicated that these gains are far less evident when performance of the same students is measured on a different test, such as the National Assessment of Educational Progress (NAEP), the closest thing we have to a national test. And, this same report showed that the reading achievement gap between minority and Caucasian students was increasing rather than decreasing.
High-Stakes Testing: Pressure on Teachers and Administrators
There is little doubt that high-stakes testing has had a motivational effect. Unfortunately, what is motivated may not always be desirable or appropriate. There have been disturbing examples of the lengths to which some teachers and administrators may go to increase test scores. "Students Implicate Round Rock Teachers" was a headline in a Texas newspaper (Austin American-Statesman, September 29, 1994). The article went on to state that students testified that three district teachers "pointed out answers, urged pupils to look over questions again, and made gestures to indicate whether answers were correct" while administering the TAAS test.
One of the attorneys representing the teachers said the teachers only were guilty of "unintentional mistakes," which may be true, especially if they did not have any specific coursework in tests and measurements during their preservice training. He went on to state that these kinds of violations of standardized testing procedure were "minor, approaching trivial." As you will see in Chapter 18, such violations are far from trivial. They undermine the very reason districts undergo the considerable expense and effort involved in administering standardized tests. This is not an isolated problem. Similar incidents were reported in the 32 states and 34 big city districts that had student examination systems based, at least in part, on standardized test scores in 1998 (Education Week on the Web, February 11, 1998).
In a June 6, 2000, episode of Nightline entitled "Cheating Teachers," Ted Koppel documented a range of incidents of cheating involving various teachers and administrators (Koppel, 2000). And, in January 2002 the Austin public schools became the first school district to be convicted of criminal charges as a result of tampering with high-stakes test scores by district officials (Martinez, 2002).
Interpreting High-Stakes Tests: The Lake Wobegon Effect
In author Garrison Keillor's (1985) fictional town of Lake Wobegon, all students score above the national average. How can this happen? At first, it seems a statistical impossibility. If the average is the sum of all the test scores divided by the number of students who took the test, then about half must score below average. Right? Right! Well, then, what is the explanation? We could simply remind you that the novel is fiction and let it go at that, but by now you know we would not do that. We must find an explanation.
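One way to see how this can happen is to separate the group whose scores define "the average" from the group being compared with it. The following is a minimal Python sketch, not part of the original text. The scores are invented for illustration, and the helper name percent_above is hypothetical.

```python
from statistics import mean


def percent_above(scores, reference_average):
    """Return the percentage of scores strictly above a reference average."""
    above = sum(1 for score in scores if score > reference_average)
    return 100 * above / len(scores)


# Hypothetical scores from an earlier group of test takers.
# Their average is fixed once and used as the reference point afterward.
earlier_group = [40, 45, 50, 55, 60]
fixed_average = mean(earlier_group)

# Hypothetical scores from a class that takes the same test later.
later_class = [52, 55, 58, 61, 64]

# Compared with its own average, only part of a group can be above average.
print(percent_above(earlier_group, fixed_average))
print(percent_above(later_class, mean(later_class)))

# Compared with the earlier group's fixed average, every later score is higher.
print(percent_above(later_class, fixed_average))
```

Within either group, only two of the five invented scores exceed that group's own average. When the later class is compared with the earlier group's fixed average, however, all five scores exceed it, so "everyone above average" is arithmetically possible whenever the average comes from a different group.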
First, a standardized norm-referenced test uses a norms table to determine score rankings. The table is compiled from the scores of students who took the test earlier when the test was being developed or revised. In reality, none of the scores of the students who take a standardized test after it is distributed will affect the norms table. In theory, it is possible for all students who take a test to score above average, or even above the 90th percentile. Indeed, as teachers and district administrators become more familiar with a particular test, it becomes increasingly enticing to "teach to the test," a practice that should be condemned but too frequently is condoned either directly or indirectly. This is most likely to occur when standardized test scores become the only basis for high-stakes decisions involving promotion; graduation; financial incentives for schools, administrators, and teachers; teacher evaluations; allocation of tax dollars; or local real estate development, or when political rather than pedagogical considerations are allowed to become of paramount concern. The important point is that scoring above the average on a standardized test may not necessarily mean a student is doing well. It may mean the teacher has been teaching to the test rather than teaching the critical thinking, independent judgment, and decision-making skills that are more closely related to performance in the real world. This is one of the reasons why the performance and portfolio assessment trends we describe in Chapters 8 and 9 have become as popular as they have.
High-Stakes Testing at the National Level
Where is all this interest in statewide educational testing and assessment headed? Calls for the creation of a single, national set of tests for the various academic subjects came from former President Bush in 1991 and President Clinton in 1996. In January 2002 President George W. Bush signed into law federal legislation that will require annual academic assessments of all pupils in grades 3-9. However, it appears that states will be free to decide on the type of annual assessment to administer, rather than being compelled to use a single, national test.
The notion of a uniform national test that would be required of all students has proven to be a political hot potato. Advocates emphasize the need for a single test to uniformly evaluate education reform initiatives to facilitate accountability among states, districts, and schools. Detractors argue that a national test is the first step toward a federal takeover of local- and state-determined educational curricula or that the results of such a national test will be used to reinforce biases against low performing students, schools, and states. In spite of the fact that both Republican and Democratic presidents and others have argued for national tests, the issue remains a highly partisan, emotional, and politicized one that may prove difficult to resolve.
Currently, the closest thing we have to a single, national test or assessment is the National Assessment of Educational Progress. The NAEP is an independent and comprehensive assessment system used to evaluate progress toward the six National Education Goals established by former President Bush and the nation's governors in 1989. The National Education Goals were renamed "Goals 2000" in 1994, and they are listed in the sidebar.
The NAEP is employed by the National Education Goals Panel, an independent bipartisan panel charged with reporting annually on progress toward the National Education Goals. Although the NAEP potentially enables "apples to apples" comparisons across students, schools, districts, and states, this potential has yet to be realized because it is not uniformly accepted or administered across states and localities. In contrast, there now exist a wide variety of statewide required tests and standardized tests in various subjects that states and districts now use to evaluate their educational programs.
1997 Amendments to the Individuals with Disabilities Education Act (IDEA-97)
The passage of the 1997 Amendments to the Individuals with Disabilities Education Act (IDEA-97) represented a significant change in the education of the disabled. The implications of this change for the role of general education teachers in educating and assessing children with disabilities are significant.
The intent of Congress in passing IDEA-97 was to reaffirm that children with disabilities are entitled to a free and appropriate public education (FAPE) and to ensure that special education students have access to all the potential benefits that regular education
BOX
THE NATIONAL EDUCATION GOALS (GOALS 2000)
1. By the year 2000, all children in America will start school ready to learn.
2. By the year 2000, the high school graduation rate will increase to at least 90%.
3. By the year 2000, American students will leave grades 4, 8, and 12 having demonstrated competency in challenging subject matter, including English, mathematics, science, history, and geography; and every school in America will ensure that all students learn to use their minds well, so that they may be prepared for responsible citizenship, further learning, and productive employment in our modern economy.
4. By the year 2000, U.S. students will be first in the world in science and mathematics achievement.
5. By the year 2000, every adult American will be literate and will possess the knowledge and skills necessary to compete in a global economy and exercise the rights and responsibilities of citizenship.
6. By the year 2000, every school in America will be free of drugs and violence and will offer a disciplined environment conducive to learning.
For more information contact:
National Education Goals Panel 1350 M Street, NW
Suite 270
Washington, DC 20036
Or, visit the Web site at bUp:/www.coIed.umn.eduI CARE\www/K-12"s/NationalCenter.btml
students have from the general curriculum and education reform. Accountability was enhanced by requiring that children with disabilities participate in the same annual evaluations required of general education pupils. IDEA-97 also emphasized the importance of raising standards for children with disabilities and regularly evaluating their progress toward these standards.
Under IDEA-97, with only rare exceptions, schools are now required to include all children with disabilities in the general education classroom, curriculum, and annual statewide and districtwide achievement testing. And regular education teachers are now required to be members of the Individual Education Program (IEP) teams for each child with a disability in their classes. The IEP teams must determine how the performance and progress of disabled learners in the general curriculum will be assessed and collect the data necessary to make such determinations, including behavioral data when a child with a disability's behavior impedes progress in the general curriculum.
The implications of IDEA-97 for the testing and assessment skills of teachers will be discussed more thoroughly in Chapters 20 and 21. For now, suffice it to say that full implementation of the law will require more, rather than less, testing and assessment. Testing of children with disabilities within the regular classroom by the classroom teacher will be required to help determine whether a student is in need of special education, to adhere to each student's IEP, to evaluate each child with a disability's progress toward the goals and short-term objectives established in the IEP, to help determine whether behavior is impeding progress in the general curriculum, and to meet the law's accountability requirements. This may seem to place an additional burden on regular classroom teachers, and it does. In Chapters 20 and 21 we will explain how the testing and assessment skills you will develop in this course, coupled with the use of existing standardized tests, checklists, and rating scales, can enable you to meet these new requirements with a minimum of additional effort.
Performance and Portfolio Assessment
Performance and portfolio assessment, sometimes referred to as authentic assessment, gained popularity in the 1990s for different reasons. To some, these approaches represent a small revolution in the way testing is defined and implemented. They reject the notion that accurate assessments of behavior can be derived only from formal tests, such as multiple-choice, true-false, matching, short answer, and essay examinations. Instead, they advocate for performance and portfolio examinations that ask the learner to carry out the activities actually used in the real world (such as measuring the tensile strength of a building material, estimating the effects of pollutants on aquatic life, designing circuitry for a microprocessor, or assembling a folder or portfolio of "works in progress" to reflect growth in critical or integrative thinking or other skills over the course of a semester or school year). Assessment techniques might include having students videotape portions of their own projects, conducting interviews with the student to probe for understanding and thinking abilities, or making a visual inspection of the product to determine whether it has the required characteristics for successful operation. These forms of assessment are intended to reduce pressures to test solely for facts and skills and to provide a stimulus to introduce more extended thinking and reasoning activities in the curriculum (Resnick & Resnick, 1989; Tombari & Borich, 1999). We will have more to say about performance and portfolio assessment in Chapters 8 and 9.
Under IDEA-97, children with disabilities must be evaluated regularly to assess their ongoing progress in the general education curriculum. The purposes of these evaluations are twofold: (a) to provide parents with regular reports of progress at least as often as nondisabled children receive report cards and (b) to determine whether children with disabilities as a group are progressing in the general curriculum as indicated by their performance on statewide and districtwide annual assessments.
Yet, the same disabilities that qualify some pupils as children with disabilities may also hamper or preclude their ability to participate appropriately in testing that is required for nondisabled pupils. Performance and portfolio assessment may offer general and special education teachers alternative means by which to annually and on a day-to-day basis evaluate the progress of children with disabilities in the general education curriculum.
Education Reform and the Global Economy
Two other trends, educational reform and the emergence of the global economy, or global competition, have been the impetus for much change in education. Increased use of tests and other assessments is ensured because they will be used to evaluate the effectiveness of these reforms and changes, at least in part.
The education reform movement arose in response to the National Commission on Excellence in Education's release of A Nation at Risk: The Imperative for Educational Reform in 1983, which documented the shortcomings of the U.S. public education system at that time. Since then a number of reforms have been widely adopted in public education. These include raising expectations, establishing academic and performance standards, implementing high-stakes testing, requiring greater accountability, providing incentives for improved performance, improving teacher salaries, establishing local or site-based management and decision making, and making innovations in teacher training. Nevertheless, the results of almost 20 years of education reform have been mixed, and decision makers continue to rely on test results to evaluate the effectiveness of various reforms.
With recent advances in telecommunications and technology it is now clear that our students will have to compete in an international or global marketplace for jobs that will require strong skills in mathematics and science. But are our students able to hold their own in an international competition? A recent study indicated that 9- and 10-year-old Americans scored above average in science and math, with 13-year-olds at the international average in math and below average in science (Education Week on the Web, June 18, 1997). However, a more recent study contradicts this finding. The results of the Third International Mathematics and Science Study, released in early 1998, indicated that U.S. pupils are poorly prepared to compete internationally in math and science. Of 21 nations that took part in the study, only students from Lithuania, Cyprus, and South Africa did worse than U.S. seniors (Education Week on the Web, March 4, 1998).
Since public education is supported by taxpayers, educators must be responsive to the demands of taxpayers. With the mixed results of education reform, and with continued concern about global competition, increasingly high-tech job requirements, grade inflation, and decreasing confidence in public education, school boards and the general public are becoming more and more concerned that they "get what they pay for." Yearly, high-stakes tests have become the mechanism through which these accountability demands are currently being met. Often these results are combined with other ratings (e.g., attendance, drop-out rates) and linked to complex formulas that are used to evaluate the performance of districts, schools, principals, and teachers, with financial and other incentives, or penalties, tied to improvements or declines in ratings.
Competency Testing for Teachers
Another current trend in our educational system is toward competency testing for teachers. In the early 1980s a number of states passed legislation requiring teachers to pass paper-and-pencil competency tests of teaching. Concurrent with these trends was the development of professional teaching standards. The National Board for Professional Teaching Standards (NBPTS) for advanced certification was formed in 1987 with three major goals:
1. To establish high and rigorous standards for what effective teachers should know and be able to do;
2. To develop and operate a national, voluntary system to assess and certify teachers who meet these standards;
3. To advance related education reforms for the purpose of improving student learning in American schools.
During the same year, the Interstate New Teacher Assessment and Support Consortium (INTASC) was formed to create "board-compatible" standards that could be reviewed by professional organizations and state agencies as a basis for licensing beginning teachers (Miller, 1992). Much controversy still surrounds the use of tests aligned with these standards. No one wants to see poorly trained teachers in the classroom. However, the development of a cost-effective, valid, and reliable paper-and-pencil method of measuring the complex set of skills and traits that go into being an effective classroom teacher remains elusive. Currently, such tests are not ordinarily used to prohibit a teacher from teaching or to terminate an experienced teacher. Instead, they often are used on a pre-employment basis to document minimum levels of competency in various content areas (Holmes, 1986; Miller, 1992). But with the increase in emphasis on performance and portfolio assessment of students (several types of which will be described in Chapters 8 and 9), the development of standardized performance and portfolio assessment methods for teachers is on the horizon. If such procedures are developed, they may provide for a more valid assessment of teaching competency than paper-and-pencil measures alone for both preservice and inservice teachers.
Nevertheless, interest in the use of paper-and-pencil tests to evaluate teacher competency has continued to increase. In 1998 the Higher Education Act required teacher training programs in colleges and universities to report their students' scores on teacher licensing tests. By 2001 forty-two states and the federal government relied heavily on test scores to judge teacher quality and the quality of teacher preparation programs (Blair, 2001). Yet, a report from the National Research Council (2000) cautioned that there are many limitations to reliance on tests as the sole measure of teacher competence.
Increased Interest from Professional Groups
An assortment of contemporary issues has drawn the attention of professional organizations, researchers, and policy makers in recent years. Individually and in collaboration,