[Tom Kubiszyn, Gary D. Borich] Educational Testing


SEVENTH EDITION

EDUCATIONAL TESTING AND MEASUREMENT

Classroom Application and Practice

TOM KUBISZYN
University of Houston

GARY BORICH
The University of Texas at Austin


Acquisitions Editor BnId H _

Marketing Manager K4I'i1t Ilo1107

Senior Production Editor Wlkrk A ~

Senior Designer Harry Nolan

Production Management Services Argosy

This book was set in 10/12 Times Roman by Argosy and printed and bound by R. R. Donnelley & Sons Company.

The cover was printed by Phoenix Color Corporation. This book is printed on acid-free paper.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, E-Mail: [email protected].

To order books please call 1(800)-225-5945.

Library of Congress Cataloging in Publication Data:

Kubiszyn, Tom.

Educational testing and measurement: classroom application and practice / Tom Kubiszyn, Gary Borich.--7th ed.

p. cm.

Includes bibliographical references and index.

1. Educational tests and measurements--United States. I. Borich, Gary D. II. Title.

LB3051.K8 2003 371.26'0973-dc21

ISBN 0-471-14977-2 (cloth: alk. paper)

Printed in the United States of America 10 9 8 7 6 5 4 3 2 1


PREFACE

Two major developments in classroom testing and measurement explain why we have decided to incorporate substantial additions and revisions to this, the seventh edition of Educational Testing and Measurement. These developments were the rapid spread of high-stakes testing to all 50 states and the District of Columbia in the past few years and growing awareness among regular education teachers about their increased responsibility for special education students under the 1997 amendments to the Individuals with Disabilities Education Act (IDEA-97).

High-stakes test scores are now widely used for student promotion and graduation decisions and for educational accountability purposes, sometimes with substantial school or district incentives and penalties tied to student performance on high-stakes tests. And, with the passage in January 2002 of the "No Child Left Behind Act," we now have a federal mandate that will soon require annual academic assessments of all school children in grades 3-8. This ensures continued and probably increased attention to high-stakes tests in the foreseeable future. Under IDEA-97, regular education teachers are now required to play a much broader role than in the past in the instruction and assessment of special education students included in regular education classrooms.

Because these developments have generated intense controversy (especially the rapid spread of high-stakes testing), one of the goals of this revision was to inform instructors and future teachers about these important developments in a balanced and thoughtful way. And, because all future teachers will have to cope with the demands of high-stakes testing and full compliance with IDEA-97, another goal was to provide future teachers with practical information and recommendations they can immediately use in the classroom to prepare themselves and their students for high-stakes testing and the challenges of IDEA-97. Nevertheless, as important as these developments are, the overarching goal of this revision was to remain true to the friendly style, content, order of presentation, and length of past editions of Educational Testing and Measurement.

As with all previous editions we have continued to present complex test and measurement content in a friendly, nonintimidating, and unique manner and to relate this content in meaningful ways to important developments in educational measurement and assessment. In completing this revision we have kept our audience, classroom teachers, fully in mind. We have striven to present often abstract and sometimes difficult concepts and procedures in an up-to-date and accurate, but accessible manner. Rather than overwhelm students with jargon and statistical theory, we continue to use a friendly, conversational style to enhance our emphasis on the application of theory. At the same time, we provide sufficient theoretical background to ensure that students will understand the foundations of measurement and avoid an oversimplified approach to measurement. Thus, longtime users of the text should continue to feel comfortable with it.


Past users of the text should have no difficulty recognizing and adapting to this revision. The overall organization has been only slightly modified, and the flexible organization of the text continues to enable instructors to either follow the chapter sequence as is or modify it as needed to meet their particular needs. A new chapter has been added (Chapter 2, High-Stakes Testing), another has been significantly expanded (Chapter 7, Writing Essay and Higher Order Test Items), and several other chapters have been revised and updated to seamlessly integrate the new material on high-stakes testing and IDEA-97 and other developments. To help keep the text's length reasonable, the section on planning a schoolwide testing program has been deleted from Chapter 19, since this function has become obsolete in the face of the adoption by all states of state-mandated high-stakes tests. Other changes to the seventh edition are described in more detail next.

Chapter 1 has been revised and updated. It continues to provide up-to-date information on the increasingly important distinction between testing and assessment and new information about a variety of contemporary trends, especially high-stakes testing, the implications of IDEA-97 for regular education teachers, and competency testing for teachers.

Chapter 2 is a new chapter devoted to the high-stakes testing phenomenon. It defines high-stakes testing, traces its history, reviews both sides of the controversy surrounding the use of high-stakes tests, considers the position taken by national measurement associations, and provides future teachers with concrete recommendations they can use to prepare themselves and their students for high-stakes tests.

Chapter 3, which was Chapter 2 in previous editions, has been updated.

Chapter 4, Norm- and Criterion-Referenced Tests and Content Validity Evidence, consolidates Chapters 3 and 4 from previous editions into a single chapter. Several reviewers suggested combining Chapters 3 and 4 into a single chapter because of their brevity. To minimize confusion we have maintained the same topic sequence as in previous editions. Throughout Chapter 4 and in several later chapters we have substituted "validity evidence" for "validity" when appropriate to ensure continuity with language included in the most recent edition of the Standards for Educational and Psychological Tests (American Educational Research Association, 1999).

Chapters 5 and 6 also have been updated.

Chapter 7, Writing Essay and Higher Order Test Items, has been substantially revised and expanded. It now includes a wider variety of examples of essay items to help teachers see how they can be used to measure higher order thinking and problem-solving ability. The sections on scoring also have been revised and updated. In addition, a new section has been added to help teachers assess how well students can organize and access knowledge, and another new section provides guidance and many examples to help teachers design and utilize open-book questions and tests.

Chapter 8, Performance-Based Assessment, was Chapter 9 in previous editions. Chapter 9, Portfolio Assessment, was Chapter 10 in previous editions.

Chapter 10, Administering, Analyzing, and Improving the Test, was Chapter 8 in previous editions.

These chapters were reordered at the recommendation of reviewers who noted that many principles covered in Administering, Analyzing, and Improving the Test applied to performance and portfolio assessments, and not just to objective and essay items. In previous editions this chapter followed the chapters on objective and essay items but preceded the chapters on performance and portfolio assessments. In this edition this chapter now follows all four chapters devoted to classroom-based assessment: objective items, essay and higher order test items, performance assessments, and portfolio assessments.

Chapters 11-14 have been updated.

Chapter 15, Validity, has been revised to make it consistent with the approach to the establishment of validity evidence described in the latest edition of the Standards for Educational and Psychological Tests (American Educational Research Association, 1999). Rather than considering validity to be a characteristic of a test, the new edition stresses the importance of acquiring evidence of a test's validity for a particular use.

Chapters 16 and 17 have been updated.

Chapter 18, Standardized Tests, has been revised. It continues its extensive treatment of the history, utility, and interpretation of standardized tests, with increased attention paid to the use of both standardized norm-referenced and standardized criterion-referenced tests in high-stakes testing programs.

Chapter 19, Types of Standardized Tests, has been revised and information regarding various standardized tests has been updated. The entire section entitled "Planning a School- or District-wide Testing Program" has been deleted because this function has been supplanted by state legislatures and state education agencies with the spread of the high-stakes testing phenomenon.

Chapters 20 and 21 have been revised and updated to better inform regular education teachers about their increased responsibilities for evaluating the educational and behavioral progress of special education students included in their regular education classrooms and curricula under IDEA-97. At the request of reviewers, person-first language (e.g., children with disabilities) has replaced the language previously used to refer to children in special education programs (i.e., special learners). New examples of recently developed or revised behavior rating scales that regular education teachers are increasingly expected to complete are included in Chapter 21. These include scales used to assess medication safety and efficacy for the growing number of pupils taking medications that can affect learning and behavior.

Finally, Chapter 22 also has been revised to reflect the rapid spread of the high-stakes testing phenomenon and the added responsibilities for regular education teachers for compliance with IDEA-97.

Throughout the text we have added references to a variety of contemporary measurement trends, tying these to day-to-day decision making for the classroom teacher. And, we have updated our references, suggested readings, and list of supplemental statistics and measurement texts to include recent articles, chapters, and books that reinforce and expand the changing face of educational measurement in today's classroom.

As with earlier editions, readers will find at the conclusion of each chapter a step-by-step summary in which all important concepts in the chapter are identified for review. Additionally, we have prepared new discussion questions and/or exercises for each new chapter and section. These discussion questions and exercises should help students learn how to apply the concepts presented and, along with the Instructor's Manual, should help instructors identify organized activities and assignments that can be integrated into their class presentations. Discussion questions and exercises marked with an asterisk have answers listed in Appendix D.


We have tried to select traditional and contemporary topics and provide examples that help the teacher, especially the beginning teacher, deal with practical, day-to-day issues related to the testing and assessment of students and measuring their behavior. The topics we have chosen, their natural sequences and linkage to the real-life tasks of teachers, the step-by-step summaries of major concepts, and our discussion questions and exercises all work, we believe, to make this text a valuable tool and an important resource for observing, measuring, and understanding life in today's changing classroom.

Tom Kubiszyn Gary Borich


ACKNOWLEDGMENTS

We would like to express our appreciation to the following instructors for their constructive comments on this revision: W. Robert Houston, University of Houston; Alice Corkill, University of Nevada-Las Vegas; Robert Paugh, University of Central Florida; Priscilla J. Hambrick, City University of New York; and Pam Fernstrom, University of North Alabama.

Thanks also are owed to Bill Fisk, Clemson University; David E. Tanner, California State University at Fresno; Gregory J. Cizek, University of Toledo; Thomas J. Sheeran, Niagara University; Jonathan A. Plucker, Indiana University; Aimin Wang, Miami University; William M. Bechtol, late of Southwest Texas State University; Deborah E. Bennett, Purdue University; Jason Millman, Cornell University; David Payne, University of Georgia; Glen Nicholson, University of Arizona; Carol Mardell-Czudnowski, Northern Illinois University; and James Collins, University of Wyoming, for their constructive comments on earlier revisions. Also, we thank Marty Tombari for his contributions to Chapters 8 and 9 and other examples, illustrations, and test items in this volume and Ann Schulte for her contributions to Chapter 17. Finally, we thank Denise Branley for her revisions to the Instructor's Manual and Test Bank.


The List
The Simple Frequency Distribution
The Grouped Frequency Distribution
Steps in Constructing a Grouped Frequency Distribution
Graphing Data
The Bar Graph, or Histogram
The Frequency Polygon
The Smooth Curve
Measures of Central Tendency
The Mean
The Median
The Mode
The Measures of Central Tendency in Various Distributions

CHAPTER 13 VARIABILITY, THE NORMAL DISTRIBUTION, AND CONVERTED SCORES

The Range
The Semi-Interquartile Range (SIQR)
The Standard Deviation
The Deviation Score Method for Computing the Standard Deviation
The Raw Score Method for Computing the Standard Deviation
The Normal Distribution
Properties of the Normal Distribution
Converted Scores
z-Scores
T-Scores
Summary
For Practice

CHAPTER 14 CORRELATION

The Correlation Coefficient
Strength of a Correlation
Direction of a Correlation
Scatterplots
Where Does r Come From?
Causality
Other Interpretive Cautions

CHAPTER 15 VALIDITY

Types of Validity Evidence
Content Validity
Criterion-Related Validity
Construct Validity
What Have We Been Saying? A Review
Interpreting Validity Coefficients
Content Validity Evidence
Concurrent and Predictive Validity Evidence
Summary
For Practice

CHAPTER 16 RELIABILITY

Methods of Estimating Reliability
Test-Retest or Stability
Alternate Forms or Equivalence
Internal Consistency
Interpreting Reliability Coefficients
Summary
For Practice

CHAPTER 17 ACCURACY AND ERROR

Error-What Is It?
The Standard Error of Measurement
Using the Standard Error of Measurement
More Applications
Standard Deviation or Standard Error of Measurement?
Why All the Fuss about Error?
Error within Test Takers
Error within the Test
Error in Test Administration
Error in Scoring
Sources of Error Influencing Various Reliability Coefficients
Test-Retest
Alternate Forms
Internal Consistency
Band Interpretation
Steps: Band Interpretation
A Final Word
Summary
For Practice

CHAPTER 18 STANDARDIZED TESTS

What Is a Standardized Test?
Do Test Stimuli, Administration, and Scoring Have to Be Standardized?
Standardized Testing: Effects of Accommodations and Alternative Assessments
Uses of Standardized Achievement Tests
Will Performance and Portfolio Assessment Make Standardized Tests Obsolete?
Administering Standardized Tests
Types of Scores Offered for Standardized Achievement Tests
Grade Equivalents
Age Equivalents
Percentile Ranks
Standard Scores
Interpreting Standardized Tests: Test and Student Factors
Test-Related Factors
Student-Related Factors
Aptitude-Achievement Discrepancies
Interpreting Standardized Tests: Parent-Teacher Conferences and Educational Decision Making
An Example: Pressure to Change an Educational Placement
A Second Example: Pressure from the Opposite Direction
Interpreting Standardized Tests: Score Reports from the Publishers
The Press-on Label
A Criterion-Referenced Skills Analysis or Mastery Report
An Individual Performance Profile
Other Publisher Reports and Services
Summary
For Practice

CHAPTER 19 TYPES OF STANDARDIZED TESTS

Standardized Achievement Tests
Achievement Test Batteries, or Survey Batteries
Single-Subject Achievement Tests
Diagnostic Achievement Tests
Standardized Academic Aptitude Tests
The History of Academic Aptitude Testing
Stability of IQ Scores
What Do IQ Tests Predict?
Individually Administered Academic Aptitude Tests
Group Administered Academic Aptitude Tests
Standardized Personality Assessment Instruments
What Is Personality?
Objective Personality Tests
Projective Personality Tests
Summary
For Discussion

CHAPTER 20 TESTING AND ASSESSING CHILDREN WITH SPECIAL NEEDS IN THE REGULAR CLASSROOM

A Brief History of Special Education
P.L. 94-142 and the Individuals with Disabilities Education Act
Section 504 of the Rehabilitation Act
Special Education Service Delivery: An Evolution
Service Delivery Setting
Determining Eligibility for Services
Disability Categories to Developmental Delays
IDEA-97 and the Classroom Teacher
Testing or Assessment?
Child Identification
Individual Assessment
Individual Educational Plan (IEP) Development
Individualized Instruction
Reviewing the IEP
Manifestation Determinations
At the Other End of the Curve: The Gifted Child
Defining "Gifted"
Assessment and Identification
Current Trends in Teaching and Assessing the Gifted and Talented
Summary
For Discussion

CHAPTER 21 ASSESSING CHILDREN WITH DISABILITIES IN REGULAR EDUCATION CLASSROOMS

IDEA-97: Issues and Questions
Assistance for Teachers
Can Regular Teachers Assess a Child with a Disability?
Should Regular Teachers Assess a Child with a Disability?
Assessing Academic Performance and Progress
Teacher-Made Tests and Assessments
Standardized Tests and Assessments
Limitations of Accommodations and Alternative Assessments
Assessing Behavioral and Attitudinal Factors
Assessment, Not Diagnosis
Classroom Diversity, Behavior, and Attitudes
Behavior Plan Requirements under IDEA-97
Teacher-Made Behavior and Attitude Assessments
Distinguishing Behavior from Attitude
Assessing Behavior
Assessing Attitudes
Monitoring Children with Disabilities Who Are Taking Medication
Medication Use Is Increasing
Side Effects May Be Present
The Teacher's Role in Evaluating Medication and Psychosocial Interventions
Commonly Used Standardized Scales and Checklists
Summary
For Discussion

CHAPTER 22 IN THE CLASSROOM: A SUMMARY DIALOGUE

High-Stakes Testing
Criterion-Referenced Versus Norm-Referenced Tests
New Responsibilities for Teachers under IDEA-97
Instructional Objectives
The Test Blueprint
Essay Items and the Essay Scoring Guides
Reliability, Validity, and Test Statistics
Grades and Marks
Some Final Thoughts

APPENDIX A MATH SKILLS REVIEW

APPENDIX B PEARSON PRODUCT-MOMENT CORRELATION

APPENDIX C STATISTICS AND MEASUREMENT TEXTS

APPENDIX D ANSWERS FOR PRACTICE

SUGGESTED READINGS

REFERENCES

CREDITS


CHAPTER 1

AN INTRODUCTION TO CONTEMPORARY EDUCATIONAL TESTING AND MEASUREMENT

CHANCES ARE that some of your strongest childhood and adolescent memories include taking tests in school. More recently, you probably remember taking a great number of tests in college. If your experiences are like those of most who come through our educational system, you probably have very strong or mixed feelings about tests and testing. Indeed, some of you may swear that you will never test your students when you become teachers. If so, you may think that test results add little to the educational process and fail to reflect learning or that testing may turn off students to learning. Others may believe that tests are necessary and vital to the educational process. For you, they may represent irrefutable evidence that learning has occurred. Rather than view tests as deterrents that turn off students, you may see them as motivators that stimulate students to study and provide them with feedback about their achievement.

TESTS ARE ONLY TOOLS

Between those who feel positively about tests and those who feel negatively about them lies a third group. Within this group, which includes the authors, are those who see tests as tools that can contribute importantly to the process of evaluating pupils, the curriculum, and teaching methods, but who question the status and power often given to tests and test scores. We are concerned that test users often uncritically accept test scores. This concerns us for three reasons. First, tests are only tools, and tools can be appropriately used, unintentionally misused, and intentionally abused. Second, tests, like other tools, can be well designed or poorly designed. Third, both poorly designed tools and well-designed tools in the hands of ill-trained or inexperienced users can be dangerous. These three concerns motivated us to write this text. By helping you learn to design and to use tests and test results appropriately we hope you will be less likely to misuse tests and their results.


TESTS ARE NOT INFALLIBLE

Test misuse and abuse can occur when users of test results are unaware of the factors that can influence the usefulness of test scores. The technical adequacy of a test, or its validity (see Chapter 15) and reliability (see Chapter 16), is one such factor. A variety of factors can dramatically affect the validity and reliability of a test. When a test's validity and reliability are impaired, test results should be interpreted very cautiously, if at all. Too often, such considerations are overlooked or ignored by professionals and casual observers alike.

Even when a test is technically adequate, misuse and abuse can occur because technical adequacy does not ensure that test scores are accurate or meaningful (see Chapters 17 and 18). A number of factors can affect the accuracy and meaningfulness of test scores. These include the test's appropriateness for the purpose of testing, the test's content validity evidence (if it is an achievement test), the appropriateness of its norms table (if it is a norm-referenced test, a term we will learn more about in Chapter 3), the appropriateness of the reading level, the language proficiency and cultural characteristics of the students, teacher and pupil factors that may have affected administration procedures and scoring of the test, and the pupils' motivation and engagement with the test on the test day.

Because technical adequacy and these interpretive factors can affect test scores dramatically, our position is that test scores should never be uncritically employed as the sole basis for important educational decision making. Nevertheless, with the rapid spread of the high-stakes testing movement (which we will discuss in detail in Chapter 2), a disturbing number of promotion and graduation decisions are being based on test scores alone. Instead of relying on such a limited "snapshot" of student achievement for important decision making, we recommend that test results be considered part of a broader "movie" or process called assessment. It should be the findings of the broad assessment, not just test results, that form the basis for important educational decision making. We will describe the process of assessment in the next section and distinguish between testing and assessment. See the sidebar about the Waco, Texas, public schools for a recent example of the controversial use of test results alone to make important educational decisions.

TESTING: PART OF ASSESSMENT

Unfortunately, the situation described in the sidebar is not unusual. Well-intended educators continue to rely solely or primarily on test results to make important educational decisions. They may unintentionally misuse test results because they have come to regard test results as the end point rather than an early or midpoint in the much broader process of assessment. Or, they may mistakenly believe that testing and assessment are synonymous.

In the assessment process, test results are subject to critical study according to established measurement principles. If important educational decisions are to be made, critically evaluated test results should be combined with results from a variety of other measurement procedures (e.g., performance and portfolio assessments, observations, checklists, rating scales, all covered later in the text), as appropriate, and integrated with relevant background and contextual information (e.g., reading level, language proficiency, cultural considerations, also covered later in the text) to ensure that the educational decisions are appropriate. You can see that although testing is one part of assessment, assessment encompasses much more than testing. Figure 1.1 further clarifies the distinction between testing and assessment.

Throughout the text we will refer to testing and/or assessment. To avoid confusion later, note the distinction between testing and assessment in Figure 1.1. Next, we will summarize why we believe it is of vital importance that all educators obtain a firm grounding in educational testing and assessment practice.

BOX

WACO, TEXAS, SCHOOLS USE STANDARDIZED TEST SCORES ALONE TO MAKE PROMOTION DECISIONS

Social promotion is a practice that purports to protect student self-esteem by promoting students to the next grade so that they may stay with their classmates even when students are not academically ready for promotion. Educational, psychological, political, fiscal, cultural, and other controversies are all associated with social promotion.

Concerned with possible negative effects of social promotion, the Waco, Texas, public schools decided to utilize standardized test scores as the basis for promotion decisions beginning with first graders in 1998. As a result, the number of students retained increased from 2% in 1997 to 20% in 1998 (Austin American-Statesman, June 12, 1998). The Waco schools are not alone in curtailing social promotion. The Chicago public schools, in the midst of a wide-ranging series of educational reform initiatives, retained 22,000 students in 1994, with 175,000 retained in 1998 (Newsweek, June 22, 1998).

What has come to be known by some as the "Waco Experiment" also raised a number of measurement-related issues. Whereas the Waco schools' decision was doubtless well intended, their policy may have overlooked the fact that the utility of test scores varies dependent on age, with test results for young children less stable and more prone to error than those for older children. A relatively poor score on a test may disappear in a few days, weeks, or months after additional development has occurred, irrespective of achievement. In addition, older children are less susceptible to distractions and, with years of test-taking experience under their belts, are less likely to be confused by the tests or have difficulty completing tests properly. All these factors can negatively affect a student's score and result in a score that underrepresents the student's true level of knowledge.

Furthermore, a single standardized test score provides only a portion of a child's achievement over the school year, regardless of the grade level. As we will see when we consider the interpretation of standardized test results in Chapter 18, there are a number of student-related factors (e.g., illness, emotional upset) and administrative factors (e.g., allowing too little time, failing to read instructions verbatim) that can negatively affect a student's performance on the day the test was taken. Thus, making a decision that so substantially affects a child's education based on a single measure obtained on a single day rather than relying on a compilation of measures (i.e., tests, ratings, observations, grades on assessments and portfolios, homework, etc.) obtained over the course of the school year seems ill-advised.

On the other hand, using data collected on a single day and from a single test to make what otherwise would be complex, time-consuming, and difficult decisions has obvious attraction. It appears to be expedient, accurate, and cost-effective and appears to be addressing concerns about the social promotion issue. However, it also may be simplistic and shortsighted if no plan exists to remediate those who are retained. As noted in a June 12, 1998, editorial in the Austin American-Statesman, "Failing students who don't meet a minimum average score, without a good plan to help them improve, is the fast track to calamity." Nevertheless, this trend has not diminished since we first reported on it in our sixth edition. Indeed, the use of test scores to make high-stakes promotion decisions has increased across the nation. We will explore this phenomenon in depth in Chapter 2.


Testing

1. Tests are developed or selected (if standardized, see Chapter 18), administered to the class, and scored.

2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment

1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).

2. This information is critically evaluated and integrated with relevant background and contextual information.

3. The integration of critically analyzed test results and other information results in a decision about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

FIGURE 1.1 The distinction between testing and assessment

TESTING AND ASSESSMENT SKILLS: VITAL TO TEACHERS

Over the next several pages we will alert you to a number of recent developments that indicate that classroom teachers will engage in more testing and assessment than ever before. Because the decisions that will be made may also be of increased importance, we believe that a firm grounding in testing and assessment is more than merely important for teachers. We believe it is vital! Here's why:

1. Appropriate or not, the use of test results to make annual high-stakes decisions about students (e.g., promotion, graduation), school personnel (e.g., pay increases and continued employment), and even control of schools (e.g., state takeover of low performing schools) has increased dramatically in spite of vocal protests and complaints from teachers and other educators, attorneys and other advocates, and some parents and students (see Chapter 2).

2. To ensure that students are progressing toward achievement of state academic and performance standards (discussed more fully in Chapter 2) measured by high-stakes tests, the use of teacher-made tests and other measurement procedures (e.g., performance and portfolio assessments, observations, checklists, and rating scales; see Chapters 8, 9, and 21) to assess academic progress and support day-to-day instructional decisions is also on the increase.

3. Recent federal legislation now requires the classroom teacher's involvement in the instruction and regular assessment of the performance and progress of special education pupils in the general curriculum, which is the domain of the classroom teacher, not the special educator (see Chapters 20 and 21).


4. Testing and assessment are now widely accepted as necessary for students, teachers, parents, administrators, and other decision makers to determine whether students are learning and, increasingly, what the most cost-effective, culturally sensitive instructional methods may be.

5. To be useful for decision making, tests and other measurement procedures must be technically adequate and appropriately and sensitively used.

Let's now turn to a review of recent and current developments and trends that have led us to the conclusion that enhanced skills in testing and assessment should now be considered vital to the classroom teacher.

RECENT HISTORY IN EDUCATIONAL MEASUREMENT

Beginning in the late 1960s, a fairly strong antitest sentiment began to develop in our country. Over the next two decades many scholarly papers and popular articles questioned or denounced testing for a variety of reasons. Some decried tests as weapons willfully used to suppress minorities. To others, tests represented simplistic attempts to measure complex traits or attributes. Still others questioned whether the traits or attributes tested could be measured, or whether these traits or attributes even existed! From the classroom to the Supreme Court, testing and measurement practice came under close scrutiny. It seemed to some that tests were largely responsible for many of our society's ills.

Initially it looked as though the antitest movement might succeed in abolishing testing in education, and there was professional and lay support for such a move. Today, it appears that this movement was part of a swinging pendulum. Calls for the abolition of testing and grading gradually subsided, and by the late 1980s more tests than ever were being administered. Today, all 50 states have some sort of annual high-stakes test program in place, and federal legislation passed in January 2002 will soon require annual "academic assessments" for all students in grades 3-9.

Voices calling for the abolition of testing have been overshadowed by the high-stakes testing juggernaut. While measurement experts continue to emphasize that all test scores are at best estimates that are subject to greater or lesser margins of error and that they should be used along with other sources of data to make important educational decisions, their voices too have been muted by high-stakes testing advocates.

Nevertheless, critics of testing continue to raise important issues. And most have come to realize that abolishing testing will not be a panacea for the problems of education and contemporary society. Even the most outspoken critics of testing would have to agree that in our everyday world decisions must be made. If tests were eliminated, these decisions would still be made but would be based on nontest data that might be subjective, opinionated, and biased. For example, you may have dreaded taking the Scholastic Assessment Test (SAT) during your junior or senior year in high school. You may have had some preconceived, stereotyped notions about the test. However, the SAT had no preconceived, stereotyped, or prejudiced notions about you! Advocates and many critics of testing would now agree that it is not the tests themselves that are biased, but the people who misuse or abuse them.


While the high-stakes testing movement has swept the nation, there has been a related shift from the historical emphasis on multiple-choice, true-false, and matching item formats for tests to the use of more flexible measurement formats. More and more calls are being heard and heeded for tests and procedures that assess higher level thought processes than are typically measured by such item formats. Essay tests, portfolios, and various performance tests (all discussed in detail in Chapters 7, 8, and 9) are increasingly being used in addition to traditional multiple-choice tests in contemporary assessment efforts. Advocates refer to performance and portfolio assessments as authentic assessments, a term that suggests that these assessments measure achievement more accurately and validly than do traditional tests. Advocates argue that these types of assessments often represent the most objective, valid, and reliable information that can be gathered about individuals. However, these assessments do have their disadvantages as well. They are costly and time consuming to administer and score, and they are hampered by questions about their validity and reliability. All these issues will be addressed later in the text.

CURRENT TRENDS IN EDUCATIONAL MEASUREMENT

There have been a number of recent developments that have considerably altered the face of contemporary education. Only a select few are reviewed here to help you see that the classroom teacher's involvement with testing and assessment will only increase, as will its importance.

"High-Stakes" Testing

"High-stakes" testing refers to the use of tests and assessments alone to make decisions that are of prominent educational, financial, or social impact. Examples include whether (a) a student may be promoted to the next grade (see the sidebar about the Waco, Texas, public schools) or graduate from high school; (b) a school, principal, or teacher receives a financial reward or other incentive, such as a school being identified as "exemplary" or "low performing"; (c) a state takes over the administrative control of a local school; and (d) a principal or teacher is offered an employment contract or extension.

For example, in 1994, the state of Texas began to use a passing score on the Texas Assessment of Academic Skills (TAAS) test to determine which students would be granted high school diplomas. Students are first given the opportunity to obtain a graduation cutoff score on the TAAS in the 10th grade. If they do not, students may retake the test. High-stakes testing is not a Texas or even a regional phenomenon, however. By 2003 ~ states will be using high-stakes test results for graduation decisions (Doherty, 2002).

Use of the TAAS cutoff score as a graduation requirement is not without its detractors. Critics are concerned that this requirement will be unfair to the state's growing minority populations because their performance on the TAAS lags that of Caucasian students. Indeed, a lawsuit was filed in 1997 by the Mexican American Legal Defense and Educational Fund (MALDEF) asking the state to stop using what was described as an "invalid.


courts ruled in favor of the state and a similar suit also failed in Indiana (Robelen, 2001). Critics also point out that, in states that have recently adopted high-stakes graduation tests, today's seniors may be disadvantaged because they are being held to higher standards than existed when they began school and that adequate remedial opportunities may not exist. Only about half of the 18 states that require graduation exams provide funds for remediation of students who fail (Doherty, 2001).

High-stakes test results also are increasingly being used to make promotion decisions, with seven states slated to use high-stakes test results for promotion decisions in 2003 (Doherty, 2002). And, with the January 2002 passage of the No Child Left Behind Act, annual "academic assessments" to determine whether pupils are learning will soon be a nationwide requirement for all children in grades 3-9.

Supporters of high-stakes testing point to continually increasing percentages of students who have passed high-stakes tests like the TAAS in Texas since it was implemented in 1994 and claim it has had a motivating effect on students, parents, teachers, and principals. Both critics and supporters of TAAS-related education reform initiatives in Texas agree on one thing: In general, Texas students have demonstrated substantial gains in TAAS performance. In 1994, 53% passed all the TAAS subtests (reading, math, and writing), with 73% passing in 1998 and 82% passing in 2001. Even larger proportional gains have been evident for African-American and Hispanic students. However, critics point to a RAND Corporation study (RAND, 2000) that indicated that these gains are far less evident when performance of the same students is measured on a different test, such as the National Assessment of Educational Progress (NAEP), the closest thing we have to a national test. And, this same report showed that the reading achievement gap between minority and Caucasian students was increasing rather than decreasing.
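To make the size of the TAAS gains quoted above concrete, the following minimal Python sketch works through the arithmetic. It uses only the three pass rates given in the text. The distinction it draws between percentage-point gains and relative (proportional) gains is our illustration, and the function and variable names are hypothetical.

```python
# Pass rates quoted in the text: percent of Texas students
# passing all TAAS subtests in each year.
pass_rates = {1994: 53, 1998: 73, 2001: 82}


def gains(rates):
    """Return (start, end, percentage-point gain, relative gain in %)
    for each pair of consecutive years in `rates`."""
    years = sorted(rates)
    results = []
    for start, end in zip(years, years[1:]):
        point_gain = rates[end] - rates[start]
        # Relative gain: the point gain as a share of the starting rate.
        relative_gain = 100 * point_gain / rates[start]
        results.append((start, end, point_gain, round(relative_gain, 1)))
    return results


for start, end, points, relative in gains(pass_rates):
    print(f"{start}-{end}: +{points} percentage points "
          f"({relative}% relative increase)")
```

Under these figures the 1994 to 1998 gain is 20 percentage points and the 1998 to 2001 gain is 9 percentage points. Note that the sketch says nothing about whether such gains carry over to a different test, which is the point the RAND study raises.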

High-Stakes Testing: Pressure on Teachers and Administrators

There is little doubt that high-stakes testing has had a motivational effect. Unfortunately, what is motivated may not always be desirable or appropriate. There have been disturbing examples of the lengths to which some teachers and administrators may go to increase test scores. "Students Implicate Round Rock Teachers" was a headline in a Texas newspaper (Austin American-Statesman, September 29, 1994). The article went on to state that students testified that three district teachers "pointed out answers, urged pupils to look over questions again, and made gestures to indicate whether answers were correct" while administering the TAAS test.

One of the attorneys representing the teachers said the teachers only were guilty of "unintentional mistakes," which may be true, especially if they did not have any specific coursework in tests and measurements during their preservice training. He went on to state that these kinds of violations of standardized testing procedure were "minor, approaching trivial." As you will see in Chapter 18, such violations are far from trivial. They undermine the very reason districts undergo the considerable expense and effort involved in administering standardized tests. This is not an isolated problem. Similar incidents were reported in the 32 states and 34 big city districts that had student examination systems based, at least in part, on standardized test scores in 1998 (Education Week on the Web, February 11, 1998).

In a June 6, 2000 episode of Nightline entitled "Cheating Teachers," Ted Koppel documented a range of incidents of cheating involving various teachers and administrators (Koppel, 2000). And, in January 2002 the Austin public schools became the first school district to be convicted of criminal charges as a result of tampering with high-stakes test scores by district officials (Martinez, 2002).

Interpreting High-Stakes Tests: The Lake Wobegon Effect

In author Garrison Keillor's (1985) fictional town of Lake Wobegon, all students score above the national average. How can this happen? At first, it seems a statistical impossibility. If the average is the sum of all the test scores divided by the number of students who took the test, then about half must score below average. Right? Right! Well, then, what is the explanation? We could simply remind you that the novel is fiction and let it go at that, but by now you know we would not do that. We must find an explanation.

First, a standardized norm-referenced test uses a norms table to determine score rankings. The table is compiled from the scores of students who took the test earlier when the test was being developed or revised. In reality, none of the scores of the students who take a standardized test after it is distributed will affect the norms table. In theory, it is possible for all students who take a test to score above average, or even above the 90th percentile. Indeed, as teachers and district administrators become more familiar with a particular test, it becomes increasingly enticing to "teach to the test," a practice that should be condemned but too frequently is condoned either directly or indirectly. This is most likely to occur when standardized test scores become the only basis for high-stakes decisions involving promotion; graduation; financial incentives for schools, administrators, and teachers; teacher evaluations; allocation of tax dollars; or local real estate development, or when political rather than pedagogical considerations are allowed to become of paramount concern. The important point is that scoring above the average on a standardized test may not necessarily mean a student is doing well. It may mean the teacher has been teaching to the test rather than teaching the critical thinking, independent judgment, and decision-making skills that are more closely related to performance in the real world. This is one of the reasons why the performance and portfolio assessment trends we describe in Chapters 8 and 9 have become as popular as they have.
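The role of the fixed norms table can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the norming sample and the later class scores are invented, and percentile rank is defined here simply as the percent of the norming sample scoring strictly below a given score, which is one of several conventions in use and not necessarily the one a test publisher applies.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Percent of the norming sample scoring strictly below `score`.

    Assumed definition for illustration; publishers may use others.
    """
    ordered = sorted(norm_scores)
    return 100 * bisect_left(ordered, score) / len(ordered)


# Hypothetical norming sample, collected when the test was developed.
# It is fixed: scores from later test takers are never added to it.
norm_sample = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# Hypothetical class tested later, after the test has become familiar.
current_class = [72, 78, 81, 88, 90]

ranks = [percentile_rank(s, norm_sample) for s in current_class]
for score, rank in zip(current_class, ranks):
    print(f"score {score}: above {rank:.0f}% of the norming sample")

# Every student in the later class ranks above the norm group's midpoint.
print(all(rank > 50 for rank in ranks))
```

Because each score is compared with the earlier norming sample rather than with classmates, nothing prevents every student in the later class from landing above the 50th percentile. The sketch shows that this is arithmetically possible; it does not show that the students have learned more.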

High-Stakes Testing at the National Level

Where is all this interest in statewide educational testing and assessment headed? Calls for the creation of a single, national set of tests for the various academic subjects came from former President Bush in 1991 and President Clinton in 1996. In January 2002 President George W. Bush signed into law federal legislation that will require annual academic assessments of all pupils in grades 3-9. However, it appears that states will be free to decide on the type of annual assessment to administer, rather than being compelled to use a single, national test.

The notion of a uniform national test that would be required of all students has proven to be a political hot potato. Advocates emphasize the need for a single test to uniformly evaluate education reform initiatives to facilitate accountability among states, districts, and schools. Detractors argue that a national test is the first step toward a federal takeover of local- and state-determined educational curricula or that the results of such a national test will be used to reinforce biases against low performing students, schools, and states. In spite of the fact that both Republican and Democratic presidents and others have argued for national tests, the issue remains a highly partisan, emotional, and politicized one that may prove difficult to resolve.


Currently, the closest thing we have to a single, national test or assessment is the National Assessment of Educational Progress. The NAEP is an independent and comprehensive assessment system used to evaluate progress toward the six National Education Goals established by former President Bush and the nation's governors in 1989. The National Education Goals were renamed "Goals 2000" in 1994, and they are listed in the sidebar. The NAEP is employed by the National Education Goals Panel, an independent bipartisan panel charged with reporting annually on progress toward the National Education Goals. Although the NAEP potentially enables "apples to apples" comparisons across students, schools, districts, and states, this potential has yet to be realized because it is not uniformly accepted or administered across states and localities. In contrast, there now exist a wide variety of statewide required tests and standardized tests in various subjects that states and districts now use to evaluate their educational programs.

1997 Amendments to the Individuals with Disabilities Education Act (IDEA-97)

The passage of the 1997 Amendments to the Individuals with Disabilities Education Act (IDEA-97) represented a significant change in the education of the disabled. The implications of this change for the role of general education teachers in educating and assessing children with disabilities are significant.

The intent of Congress in passing IDEA-97 was to reaffirm that children with disabilities are entitled to a free and appropriate public education (FAPE) and to ensure that special education students have access to all the potential benefits that regular education

THE NATIONAL EDUCATION GOALS (GOALS 2000)

1. By the year 2000, all children in America will start school ready to learn.

2. By the year 2000, the high school graduation rate will increase to at least 90%.

3. By the year 2000, American students will leave grades 4, 8, and 12 having demonstrated competency in challenging subject matter, including English, mathematics, science, history, and geography; and every school in America will ensure that all students learn to use their minds well, so that they may be prepared for responsible citizenship, further learning, and productive employment in our modern economy.

4. By the year 2000, U.S. students will be first in the world in science and mathematics achievement.

5. By the year 2000, every adult American will be literate and will possess the knowledge and skills necessary to compete in a global economy and exercise the rights and responsibilities of citizenship.

6. By the year 2000, every school in America will be free of drugs and violence and will offer a disciplined environment conducive to learning.

For more information contact:

National Education Goals Panel 1350 M Street, NW

Suite 270

Washington, DC 20036

Or, visit the Web site at bUp:/www.coIed.umn.eduI CARE\www/K-12"s/NationalCenter.btml


students have from the general curriculum and education reform. Accountability was enhanced by requiring that children with disabilities participate in the same annual evaluations required of general education pupils. IDEA-97 also emphasized the importance of raising standards for children with disabilities and regularly evaluating their progress toward these standards.

Under IDEA-97, with only rare exceptions, schools are now required to include all children with disabilities in the general education classroom, curriculum, and annual statewide and districtwide achievement testing. And regular education teachers are now required to be members of the Individual Education Program (IEP) teams for each child with a disability in their classes. The IEP teams must determine how the performance and progress of disabled learners in the general curriculum will be assessed and collect the data necessary to make such determinations, including behavioral data when a child with a disability's behavior impedes progress in the general curriculum.

The implications of IDEA-97 for the testing and assessment skills of teachers will be discussed more thoroughly in Chapters 20 and 21. For now, suffice it to say that full implementation of the law will require more, rather than less, testing and assessment. Testing of children with disabilities within the regular classroom by the classroom teacher will be required to help determine whether a student is in need of special education, to adhere to each student's IEP, to evaluate each child with a disability's progress toward the goals and short-term objectives established in the IEP, to help determine whether behavior is impeding progress in the general curriculum, and to meet the law's accountability requirements. This may seem to place an additional burden on regular classroom teachers, and it does. In Chapters 20 and 21 we will explain how the testing and assessment skills you will develop in this course, coupled with the use of existing standardized tests, checklists, and rating scales, can enable you to meet these new requirements with a minimum of additional effort.

Performance and Portfolio Assessment

Performance and portfolio assessment, sometimes referred to as authentic assessment, gained popularity in the 1990s for different reasons. To some, these approaches represent a small revolution in the way testing is defined and implemented. They reject the notion that accurate assessments of behavior can be derived only from formal tests, such as multiple-choice, true-false, matching, short answer, and essay examinations. Instead, they advocate for performance and portfolio examinations that ask the learner to carry out the activities actually used in the real world (such as measuring the tensile strength of a building material, estimating the effects of pollutants on aquatic life, designing circuitry for a microprocessor, or assembling a folder or portfolio of "works in progress" to reflect growth in critical or integrative thinking or other skills over the course of a semester or school year). Assessment techniques might include having students videotape portions of their own projects, conducting interviews with the student to probe for understanding and thinking abilities, or making a visual inspection of the product to determine whether it has the required characteristics for successful operation. These forms of assessment are intended to reduce pressures to test solely for facts and skills and to provide a stimulus to introduce more extended thinking and reasoning activities in the curriculum (Resnick & Resnick, 1989; Tombari & Borich, 1999). We will have more to say about performance and portfolio assessment in Chapters 8 and 9.


Under IDEA-97, children with disabilities must be evaluated regularly to assess their ongoing progress in the general education curriculum. The purposes of these evaluations are twofold: (a) to provide parents with regular reports of progress at least as often as nondisabled children receive report cards and (b) to determine whether children with disabilities as a group are progressing in the general curriculum as indicated by their performance on statewide and districtwide annual assessments.

Yet, the same disabilities that qualify some pupils as children with disabilities may also hamper or preclude their ability to participate appropriately in testing that is required for nondisabled pupils. Performance and portfolio assessment may offer general and special education teachers alternative means by which to annually and on a day-to-day basis evaluate the progress of children with disabilities in the general education curriculum.

Education Reform and the Global Economy

Two other trends, educational reform and the emergence of the global economy, or global competition, have been the impetus for much change in education. Increased use of tests and other assessments is ensured because they will be used to evaluate the effectiveness of these reforms and changes, at least in part.

The education reform movement arose in response to the National Commission on Excellence in Education's release of A Nation at Risk: The Imperative for Educational Reform in 1983, which documented the shortcomings of the U.S. public education system at that time. Since then a number of reforms have been widely adopted in public education. These include raising expectations, establishing academic and performance standards, implementing high-stakes testing, requiring greater accountability, providing incentives for improved performance, improving teacher salaries, establishing local or site-based management and decision making, and making innovations in teacher training. Nevertheless, the results of almost 20 years of education reform have been mixed, and decision makers continue to rely on test results to evaluate the effectiveness of various reforms.

With recent advances in telecommunications and technology it is now clear that our students will have to compete in an international or global marketplace for jobs that will require strong skills in mathematics and science. But are our students able to hold their own in an international competition? A recent study indicated that 9- and 10-year-old Americans scored above average in science and math, with 13-year-olds at the international average in math and below average in science (Education Week on the Web, June 18, 1997). However, a more recent study contradicts this finding. The results of the Third International Mathematics and Science Study, released in early 1998, indicated that U.S. pupils are poorly prepared to compete internationally in math and science. Of 21 nations that took part in the study, only students from Lithuania, Cyprus, and South Africa did worse than U.S. seniors (Education Week on the Web, March 4, 1998).

Since public education is supported by taxpayers, educators must be responsive to the demands of taxpayers. With the mixed results of education reform, and with continued concern about global competition, increasingly high-tech job requirements, grade inflation, and decreasing confidence in public education, school boards and the general public are becoming more and more concerned that they "get what they pay for." Yearly, high-stakes tests have become the mechanism through which these accountability demands are currently being met. Often these results are combined with other ratings (e.g., attendance, dropout rates) and linked to complex formulas that are used to evaluate the performance of districts, schools, principals, and teachers, with financial and other incentives, or penalties, tied to improvements or declines in ratings.

Competency Testing for Teachers

Another current trend in our educational system is toward competency testing for teachers. In the early 1980s a number of states passed legislation requiring teachers to pass paper-and-pencil competency tests of teaching. Concurrent with these trends was the development of professional teaching standards. The National Board for Professional Teaching Standards (NBPTS) for advanced certification was formed in 1987 with three major goals:

1. To establish high and rigorous standards for what effective teachers should know and be able to do;

2. To develop and operate a national, voluntary system to assess and certify teachers who meet these standards;

3. To advance related education reforms for the purpose of improving student learning in American schools.

During the same year, the Interstate New Teacher Assessment and Support Consortium (INTASC) was formed to create "board-compatible" standards that could be reviewed by professional organizations and state agencies as a basis for licensing beginning teachers (Miller, 1992). Much controversy still surrounds the use of tests aligned with these standards. No one wants to see poorly trained teachers in the classroom. However, the development of a cost-effective, valid, and reliable paper-and-pencil method of measuring the complex set of skills and traits that go into being an effective classroom teacher remains elusive. Currently, such tests are not ordinarily used to prohibit a teacher from teaching or to terminate an experienced teacher. Instead, they often are used on a pre-employment basis to document minimum levels of competency in various content areas (Holmes, 1986; Miller, 1992). But with the increase in emphasis on performance and portfolio assessment of students (several types of which will be described in Chapters 8 and 9), the development of standardized performance and portfolio assessment methods for teachers is on the horizon. If such procedures are developed, they may provide for a more valid assessment of teaching competency than paper-and-pencil measures alone for both preservice and inservice teachers.

Nevertheless, interest in the use of paper-and-pencil tests to evaluate teacher competency has continued to increase. In 1998 the Higher Education Act required teacher training programs in colleges and universities to report their students' scores on teacher licensing tests. By 2001 forty-two states and the federal government relied heavily on test scores to judge teacher quality and the quality of teacher preparation programs (Blair, 2001). Yet, a report from the National Research Council (2000) cautioned that there are many limitations to reliance on tests as the sole measure of teacher competence.

Increased Interest from Professional Groups

An assortment of contemporary issues have drawn the attention of professional organizations, researchers, and policy makers in recent years. Individually and in collaboration,