Teacher Evaluation: The Relationship Between Performance Evaluation Ratings and Student Achievement

(1)

TEACHER EVALUATION: THE RELATIONSHIP BETWEEN PERFORMANCE EVALUATION RATINGS AND STUDENT ACHIEVEMENT

by

Erin E. Alexander Liberty University

A Dissertation Presented in Partial Fulfillment Of the Requirements for the Degree

Doctor of Education

Liberty University 2016

(2)

TEACHER EVALUATION: THE RELATIONSHIP BETWEEN PERFORMANCE EVALUATION RATINGS AND STUDENT ACHIEVEMENT

by Erin E. Alexander

A Dissertation Presented in Partial Fulfillment Of the Requirements for the Degree

Doctor of Education

Liberty University, Lynchburg, VA 2016

APPROVED BY:

Deanna Keith, Ed.D, Committee Chair

Jessica Talada, Ed.D, Committee Member Kristin Kopta, Ed.D, Committee Member

(3)

ABSTRACT

The Performance Evaluation Reform Act established new policies for teacher evaluation ratings, the inclusion of student growth, the acquisition of tenure, and the dismissal process in Illinois. As a result, standards-based performance ratings and student achievement are factored into summative evaluation ratings. The purpose of this study is to determine the relationship between performance evaluation ratings and student achievement to contribute to the current body of research. Participants in the study were drawn from a sample population of fifth grade students (n = 317) and teachers (n = 19) in elementary schools (n = 7) from a school district located in a western suburb of the Chicago metropolitan area. Student achievement was measured by the 2015-2016 Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP) assessment in math and reading. The performance evaluation ratings in this study were based on the Framework for Teaching, designed by Charlotte Danielson. Archived data from the 2015-2016 school year was collected from the assistant superintendent and an online database. A Pearson correlation test was conducted to analyze the strength of the relationship between performance evaluation ratings and student achievement in math and reading. The analysis did not provide evidence of a significant relationship between performance evaluation ratings and math or reading. Recommendations for future research include replicating this study with other grade levels, subject areas, and school districts to determine generalizations to other settings.

Keywords: teacher evaluation, student achievement, performance ratings, joint committee, Framework for Teaching, Performance Evaluation Reform Act

(4)

Dedication

This dissertation would not have been completed without the support of many people in my life. First, I would like to thank Dr. Keith and Dr. Talada for their guidance and

encouragement through this process. Thank you to Dr. Kopta, who has been a colleague and friend to me throughout my career. She encouraged me to pursue my administrative degree and continue my educational journey. Thank you to my parents, Tom and Karen, whose support and love laid the foundation in my life that allowed me to achieve my goals. Thank you to my brother, Daniel, who raises my spirits and keeps me laughing. Finally, to my husband, Shawn, who encouraged me to go back to school one last time. He knew this was on my heart to complete this doctorate. He believed in me and provided unwavering support and

encouragement through this process. Through the doctorate program at Liberty University, I not only experienced deeper level of knowledge in educational leadership, but also was enlightened to the critical role of spiritual leadership in public schools. “Not to us O Lord, but to your name be the glory, because of your love and faithfulness” (Psalm 115:1 New International Version).

(5)

Table of Contents ABSTRACT………...3 Dedication………....4 List of Tables………...7 List of Figures……….…8 List of Abbreviations………..…9

CHAPTER ONE: INTRODUCTION………10

Background………10

Problem Statement………...……..16

Purpose Statement………...….17

Significance of the Study………...17

Research Questions………...….…19

Null Hypotheses……….….………...19

Definitions………...19

CHAPTER TWO: LITERATURE REVIEW………...……….…22

Introduction……….………...………..…..22

Related Literature………….………..23

Summary of Literature……….………...….…..50

CHAPTER THREE: METHODS………..……52

Design………...……..…………..….52

Research Questions………..……….…….53

Null Hypothesis………..……….……..53

(6)

Instrumentation………...…...…57

Procedures………..……60

Data Analysis………...…………..61

CHAPTER FOUR: FINDINGS……….……62

Research Questions………...………...…..…62

Null Hypothesis……….…………62

Descriptive Statistics………..…………62

Results………..………..…64

Summary………66

CHAPTER FIVE: DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS...…...….67

Discussion………..67

Conclusions………....69

Implications………71

Limitations……….72

Recommendations for Future Research……….…………74

REFERENCES………...……...77

(7)

List of Tables

(8)

List of Figures

Figure 1: Five-year trend of student enrollment………...……….54

Figure 2: Five-year trend of students considered low-income………..………54

Figure 3: Five-year trend of student demographic information………55

Figure 4: Five-year trend of mobility………56

Figure 5: Five-year trend of students with disabilities………..…………56

Figure 6: Five-year trend of students considered English-language learners………...57

Figure 7: Normality histogram for student achievement in math………...64

Figure 8: Normality histogram for student achievement in reading……….65

(9)

List of Abbreviations Measures of Academic Progress (MAP)

National Council for Effective Teaching (NCET) No Child Left Behind (NCLB)

Northwest Evaluation Association (NWEA) Performance Evaluation Reform Act (PERA) Rasch Unit Scale (RIT)

(10)

CHAPTER ONE: INTRODUCTION Background

The relationship between performance evaluation ratings and student achievement is a current topic of research as school leaders and policy-makers reform practices surrounding teacher evaluation. School systems need effective teachers because research indicates teacher quality is a key factor influencing student outcomes (Aaronson, Barrow, & Sander, 2007; Odden, Borman, & Fermanich, 2004; Rockoff, 2004). The ability to distinguish between levels of performance is intended to help teachers grow and develop professionally and ultimately improve student achievement. One of the primary challenges in teacher evaluation is determining how to measure effective teaching. Therefore, the search for valid and reliable measures of teaching performance is underway in many states across the country.

Teacher evaluation reform was addressed at the federal level when President Barack Obama focused his educational platform on the need to recruit, prepare, and reward teachers while creating an equitable distribution of quality teachers across the country

(Darling-Hammond, 2009). The Obama administration enacted Race to the Top to propel reform efforts impacting current practices of evaluating effective teaching. Race to the Top focused on rigorous assessments, attracting and retaining high quality teachers, using data to inform decisions, and sustaining reforms to improve education in the United States ("United States Department of Education," 2009a). As a result, Race to the Top invested 4.35 billion dollars in educational spending to award states with competitive grants. States receiving the grants were required to adopt new requirements for teacher evaluation systems. The requirements of Race to the Top included (a) designing and implementing rigorous standards and high-quality

(11)

assessments, (b) attracting and retaining quality teachers and leaders, (c) supporting data systems that inform decisions and improve instruction, (d) using innovative reforms to transform

struggling schools, and (e) demonstrate sustaining educational reform ("United States

Department of Education," 2009a). Race to the Top spurred a rapid pace of change to re-define standards and measures of effective teaching. According to a report issued by the White House, the Obama administration put state-level innovation to work to generate the best ideas on raising standards to enable students to be college and career ready (Education, 2014).

Evaluation reform should coincide with a teaching and learning system that supports continuous improvement with useful feedback (Darling-Hammond, 2014). The problem stems from evaluation systems failing to recognize individual the strengths and weaknesses of teachers (Weisberb, 2009). A study known as the Widget Effect, found that in school districts using binary rating systems of satisfactory and unsatisfactory, 99% of teachers received satisfactory ratings (Weisberb, 2009). Weisberb (2009) also found that in school districts using a broader range of ratings, 94% of teachers received the top two ratings with 1% rated unsatisfactory. Therefore, state and local school leaders are working to implement new systems of evaluation, observation, and accountability.

As lawmakers work to develop consistent definitions and shared understandings of teacher evaluation, one of the most significant changes to evaluation systems in Illinois is the addition of multiple ratings such as excellent, proficient, needs improvement, and unsatisfactory ("Performance Evaluation Reform Act," 2010). Illinois addressed the inclusion of student growth by implementing the Performance Evaluation Reform Act (PERA). PERA was a significant reform designed to address evaluation ratings, student growth, the acquisition of tenure, and the dismissal process ("Performance Evaluation Reform Act," 2010). The

(12)

performance evaluation ratings include a combination of professional practice and student achievement ("National Council for Teacher Quality," 2012). As school districts in Illinois implement PERA, data and indicators of student growth must be a significant factor in rating teacher performance. The use of student data is new in teacher evaluation systems; therefore, continued research is needed to determine if student achievement is a fair and reliable indicator of teacher performance.

Ultimately, at the heart of conversations surrounding teacher evaluation reform is the desire to ensure the continuous improvement of teaching and learning. The purpose of

evaluation is to help teachers improve their practice and support personnel decisions (Shakman, Breslow, Kochnek, Riordan, & Haferd, 2012). The identification of effectiveness in teaching is important for the purpose of instructional improvement, accountability, professional

development, resource allocations, and teacher compensation (Odden et al., 2004). In order to do so, the process must include evidence to validate decisions and distinguish between varying levels of service. As policy makers debate the inclusion of student achievement, the challenge is that rewarding teachers based on their performance requires a credible system to measure quality (Toch & Rothman, 2008). In order to create a credible system, local school districts in Illinois are working on the development and implementation of teacher evaluation systems to meet the requirements of PERA. Illinois School Code requires districts to establish a joint committee “composed of equal representation selected by the district and its teachers, or where applicable, the exclusive bargaining representative of its teachers” ("Performance Evaluation Reform Act," 2010). The purpose of the joint committee is to engage in a collaborative decision-making process with teachers and administrators. As a result, the design and implementation of PERA may vary due to the uniqueness in culture from district to district. As joint committees

(13)

determine the details of implementation, research is needed to monitor the effectiveness of new evaluation systems.

The ongoing reforms must be a collaborative process with policy makers and educators. Darling-Hammond (2014) suggested that a productive evaluation system should consider curriculum goals, student needs, and multifaceted evidence of teacher contributions to student learning and to the school as a whole. As school districts determine the percentage of student growth in teacher evaluation systems, research is needed to determine the relationship between performance evaluation ratings and student achievement to determine if the percentage the joint committees decided upon is an effective indicator of teacher performance.

As teacher evaluation reform strives to improve the quality of instruction and guide professional development, student achievement is an important indicator to consider in the process. PERA allows districts flexibility to design their own combination of measures to evaluate professional practice and student growth. Value-added assessment models are designed to estimate effects of individual teachers or schools on student achievement while accounting for differences in student backgrounds (American Statistical Association, 2014). The results may factor in to teacher evaluation ratings and staffing decisions. As districts design value-added models and incorporate research-based frameworks for performance evaluation, the thoughtful design and careful implementation is critical to success (Papay, 2012). Although continued research is needed to understand the reliability of value-added models, student data still has an important role to play in teacher evaluation (American Statistical Association, 2014). It is crucial that the work be transparent and information about challenges and successes is shared with stakeholders (McGuinn, 2012).

(14)

field of research as states enact new policies to measure effective teaching. If performance evaluation ratings have a positive relationship to student achievement, the results could be useful information in the distribution and effects of teacher quality (Borman & Kimball, 2005). A positive relationship between performance evaluation scores and student achievement would suggest that helping teachers improve professional practice would improve student learning (Kimball, White, & Milanowski, 2004). The following chapter will establish a foundation for the study of the relationship between performance evaluation ratings and student achievement. Theoretical Framework

The generalizability theory provides a framework for the following study and may be a significant asset in the development of evaluation systems (Hill, Charalambous, & Kraft, 2012). According to Hill et al. (2012), the theory of generalizability includes a comprehensive

framework for making judgments about multiple elements of observational systems. The reliability of teacher evaluation systems will depend on a combination of researched-based frameworks, inter-rater reliability, certification systems, and scoring designs of student achievement. Kane, Taylor, Tyler, and Wooten (2011) found that some teaching practices predict student achievement more than others. Therefore, the results of this study may be useful to generalize to the existing body of research on performance evaluation ratings and student achievement. PERA is a statewide reform intended to generalize basic requirements for evaluating teachers across all districts. This study may potentially extend the theory of generalizability because a positive relationship between performance evaluation ratings and student achievement may provide support for using student data in a comprehensive teacher evaluation framework.

(15)

Another theoretical framework for this study is grounded in Vygotsky’s theory of social development. Vygotsky defined the zone of proximal development as the distance between the developmental level of independent problem solving and the potential development level of problem solving under guidance (Miller, 2011). The evaluation process includes trained evaluators and teachers engaging in a professional growth process. The Framework for Teaching is a collaborative process that includes discussion, joint participation, modeling, explanation, and leading questions; all components of Vygotsky’s learning theory (Danielson, 2013; Miller, 2011). The Framework for Teaching allows teachers to reflect, problem solve, and develop their practice in the areas of planning and preparation, classroom environment,

instruction, and professional practice under the guidance of the evaluator. The theory of social development has informed the body of literature on teacher evaluation because the purpose of developing professional growth is to continually improve teaching and learning.

Additionally, Vygostsky’s social development theory is the foundation of constructivism that is the basis for the Framework for Teaching (Danielson, 2013). The constructivist theory has influenced the evaluation of teaching over time with a focus on active learning. Previous evaluation systems included one-way transmissions of information from evaluator to teacher. Since the implementation of PERA, teachers are required to actively engage in the evaluation process through the Framework for Teaching. The evaluator continually provides feedback and opportunities for teacher reflection throughout the process, while determining the summative rating and the conclusion. The evaluation plan is based on professional discussions designed to problem solve and continually assess how instruction impacts student learning. As a result, the evaluation process is a reciprocal experience between the teacher and administrator, where the teacher plays an active role in the professional learning process. This study may potentially

(16)

extend the constructivist theory of active learning if there is a positive relationship between performance evaluation ratings and student achievement.

Problem Statement

PERA outlines requirements for teacher evaluation, but allows joint committees to decide implementation details at the local level. The review of literature suggests the most effective teacher evaluation model will likely include multiple components of evidence such as

observations, artifacts, student assessments, surveys, professional contributions, and portfolios to assist the evaluation of good teaching (Darling-Hammond, 2013; Taylor & Tyler, 2012). Since student achievement should not be the only factor in determining effective teaching, the best solution may be a hybrid model that combines student achievement and objective ratings (Donaldson, 2012; Hanushek & Rivkin, 2010). A more rigorous evaluation system requires additional investments in time and resources. Therefore, decision-makers at the national, state, and local levels need to know whether increased expenditures will benefit student learning (Holtzapple, 2003). During the first two years of PERA implementation, at least 25% of teacher evaluations are comprised of student growth with the remainder based on professional practice. The third year and beyond, student growth must represent at least 30% of the performance evaluation rating ("Performance Evaluation Reform Act," 2010). The percentage requirements are minimums and the joint committee may decide to increase the percentage of student growth. In the end, a positive relationship between evaluation ratings and student achievement would suggest helping teachers improve their practice would contribute to the improvement of student learning (Kimball et al., 2004). The problem is research is needed to determine the strength of the relationship between professional practice ratings and student achievement to guide joint committees in deciding fair and reliable measures of effective teaching.

(17)

Purpose Statement

The purpose of this study is to determine if there is a statistically significant relationship between performance evaluation ratings using the Framework for Teaching and student

achievement as measured by the Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP). During the first two years of implementation of PERA, teacher evaluation must include at least 25% of student growth. From the third year and beyond, student growth must represent at least 30% of the performance evaluation rating. The following study will add to current research identifying the strength of the relationship between performance evaluation ratings and measures of student achievement. Through a quantitative design, the research examined a school district comprised of eight public schools located in a western suburb of the Chicago metropolitan area. The research took place within an elementary school district of 260 teachers serving 4,299 students from early childhood through eighth grade. Participants in the study were identified from a population of fifth grade general education teachers (n = 19) and students (n = 317) from elementary schools (n = 7). To control for extraneous variables, students with an individualized educational plan, an invalid MAP score, or more than 18 days of absences were removed from the data set. The Pearson product-moment correlation test

determined the degree and the direction of the linear relationship between the variables. The independent variables were the performance evaluation ratings and the dependent variables were student achievement in math and reading.

Significance of the Study

The following study is important, as Illinois is a participant in the growing momentum of teacher evaluation reform across the country. As local school districts comply with the

(18)

of student growth and the types of assessments to be included in the final evaluation rating (Performance Evaluation Reform Act, 2010). Therefore, continued research is needed to determine the relationship between performance evaluation ratings and student achievement to guide joint committees in decision-making. Standards-based evaluations processes have been found to be predictive of student learning gains and productive for teacher learning provided evaluators are trained, feedback is frequent, and mentoring and professional development are available (Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012). Other studies such as Gallagher (2004); Heneman, Milanowski, Kimball, and Odden (2006); Kimball et al. (2004); and Milanowski (2004) examined the relationship between the Framework for Teaching and student achievement. Though positive relationships were determined, continued research is needed to support generalization to other settings. Therefore, the purpose of this study is to determine the relationship between performance evaluation ratings and student achievement to contribute to the current body of research (Boyd, Lankford, Loeb, & Wyckoff, 2011; Carrell & West, 2010; Chetty, Friedman, & Rockoff, 2011; Donaldson, 2012; Goldhaber & Hansen, 2010; McCaffrey, Sass, Lockwood, & Mihaly, 2009; Papay, 2011; Pecheone & Chung, 2006; Range, Scherz, Holt, & Young, 2011; Rockoff, Staiger, Kane, & Taylor, 2012). A positive relationship between evaluation scores and student achievement would suggest helping teachers improve professional practice would improve student learning (Kimball et al., 2004). Additionally, a positive relationship would provide evidence for expanding the use of student achievement in determining teacher effectiveness. Ultimately, the results of this correlational study will

contribute to current research guiding joint committees in the local implementation of PERA and support the continuous improvement of teaching and learning.

(19)

Research Questions The study is based on the following research questions:

RQ1: Is there a statistically significant relationship between performance evaluation ratings and student achievement in math as measured by the spring MAP test in fifth grade? RQ2: Is there a statistically significant relationship between performance evaluation ratings and student achievement in reading as measured by the spring MAP test in fifth grade?

Null Hypotheses The study is based on the following null hypotheses:

H01: There is no statistically significant relationship between performance evaluation

ratings and student achievement in math as measured by the MAP test in fifth grade.

H02: There is no statistically significant relationship between performance evaluation

ratings and student achievement in reading as measured by the MAP test in fifth grade. Definitions

1. Formal observation – A scheduled time when a qualified evaluator directly observes a teacher’s professional practice in the classroom or throughout the school ("Growth Through Learning," 2012).

2. Informal observation – An unannounced observation of a teacher, by a qualified evaluator, not subject to a time minimum requirement ("Growth Through Learning," 2012).

3. Formative – Feedback from a qualified evaluator designed to enhance the professional practice of teachers (Danielson & McGreal, 2000)

4. Framework for Teaching – A research-based set of components of instruction, aligned to professional teaching standards and grounded in a constructivist view of

(20)

learning and teaching. The performance indicators are divided into 22 components and clustered into four domains consisting of planning and preparation, classroom environment, instruction, and professional responsibility ("The Danielson Group," 2013).

5. Joint committee – A committee comprised of equal representation of administrators and union representatives which shall have the duties to establish a performance evaluation plan that incorporates data and indicators of student growth as a significant factor in rating teacher performance ("Performance Evaluation Reform Act," 2010). 6. Measures of Academic Progress – assessments in reading, math, and language to

measure growth, project proficiency on high-stakes tests, and inform how educators differentiate instruction, evaluate programs, and structure curriculum ("Northwest Evaluation Association," 2014).

7. No Child Left Behind – A law signed by President Bush in 2010 that established funding, accountability, school district report cards, school choice, and the

requirement for highly qualified teachers ("United States Department of Education," 2009b)

8. Performance evaluation plan – A plan designed to evaluate teachers that include indicators of student growth as a significant factor in judging performance and measures the individual’s professional practice ("Performance Evaluation Reform Act," 2010).

9. Performance Evaluation Reform Act – In 2010, this law was passed requiring school districts to include student growth as a significant factor in the evaluation of

(21)

category rating system of excellent, proficient, needs improvement, and

unsatisfactory. Additionally, anyone undertaking an evaluation after September 1, 2012 must complete a pre-qualification program approved by the state to be considered a qualified evaluator ("Performance Evaluation Reform Act," 2010). 10. Qualified evaluator - An individual who has completed the prequalification process

as required by Illinois School Code and successfully passed the state-developed assessments specific to evaluation of teachers. Each qualified evaluator maintains his or her qualification by completing the retraining as applicable ("Growth Through Learning," 2012).

11. Standards-based evaluation - a comprehensive set of standards and rubrics that provide detailed written feedback designed to enhance instruction and strengthen accountability (Borman & Kimball, 2005).

12. Summative evaluation rating – a final evaluation rating designed to make consequential decisions using the ratings of unsatisfactory, needs improvement, proficient, and excellent (Danielson & McGreal, 2000).

13. Value-added models - assessments designed to estimate effects of individual teachers or schools on student achievement while accounting for differences in student

(22)

CHAPTER TWO: LITERATURE REVIEW Introduction

Race to the Top was a catalyst for school improvement by placing teacher evaluation in the spotlight of reform across the country ("United States Department of Education," 2009a). Individual states responded to Race to the Top by passing new regulations to increase

accountability and define measures of effective teaching. Specific to Illinois, in January 2010, the governor signed the Performance Evaluation Reform Act (PERA) establishing new

regulations for teacher evaluation throughout the state. The new law included requirements for performance evaluation ratings, the inclusion of student growth, the acquisition of tenure, and the dismissal process. Student growth is a new component in teacher evaluation calculated in the overall performance rating. During the first two years of PERA implementation, at least 25% of teacher evaluations are comprised of student growth with the remainder based on professional practice. On the third year and beyond, student growth must represent at least 30% of the performance evaluation rating (Performance Evaluation Reform Act, 2010). Research is needed to determine the strength of the relationship between professional practice ratings and student achievement to guide lawmakers and local leaders in improving the teacher evaluation process. The purpose of this study is to determine if there is a statistically significant relationship between performance evaluation ratings using the Framework for Teaching and student achievement as measured by the Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP). This dissertation will parallel other studies that have compared standards-based evaluation ratings using the Framework for Teaching and student achievement such as Borman and Kimball (2005); Gallagher (2004); Heneman et al. (2006); Kimball et al. (2004).

(23)

The following review of literature will discuss historical influences on teacher evaluation prior to Race to the Top. Further discussion will focus on the strengths and challenges of using student data in calculating summative evaluation ratings. The majority of research supports a hybrid model combining value-added measures and professional practice which will be explored in more detail (Darling-Hammond, 2013; Darling-Hammond et al., 2012; Donaldson, 2012; Hanushek & Rivkin, 2010; Taylor & Tyler, 2012). Final considerations will be given to research surrounding teacher evaluation and implications for professional development. The review of literature will establish a framework for the current body of research surrounding the relationship between performance evaluation ratings and student achievement.

Related Literature Historical Influences on Education

The criterion for evaluating teaching has evolved over time. In the 1940s and 1950s, trait research placed variables such as voice, appearance, emotional stability, trustworthiness, and enthusiasm as the basis for evaluating teachers (Danielson & McGreal, 2000; Rowley, 2010). Throughout the 1960s and 1970s, the focus shifted from teacher traits to teacher effects that correlated teacher behavior and student performance (Danielson & McGreal, 2000; Rowley, 2010). Additionally, federal influences have directed educational reform dating back to

President Lyndon Johnson in 1965. As part of the war on poverty, President Johnson passed the Elementary and Secondary Education Act (ESEA) ("United States House of Representatives," 1965). ESEA ensured all students would have a fair and equal opportunity to obtain a high-quality education and achieve proficiency on challenging state standards and assessments ("United States House of Representatives," 1965). Again in the 1980s and 1990s educational reform was prominent in the public eye when concerns about the economy and the changing job

(24)

market shifted the focus of education toward critical thinking, problem solving, life-long learning, collaborative learning, and deeper understanding (Danielson & McGreal, 2000).

In the 1980s, Madeline Hunter developed a behaviorist theory of learning to show a relationship between teacher behavior and student motivation, retention, and transfer of

knowledge (Danielson & McGreal, 2000; Rowley, 2010). The planning and preparation stage of instruction developed when Hunter created the seven-step lesson plan that included the

anticipatory set, statement of the objective, instructional input, modeling, checking for

understanding, and independent practice (Danielson & McGreal, 2000). The lesson plan would be a way for teachers to demonstrate their ability to design coherent instruction. However, what resulted was the development of rating scales and checklists encouraging a single view of teaching (Danielson & McGreal, 2000).

Another milestone in educational reform occurred in 2001, with the reauthorization and renaming of ESEA by President George W. Bush. The reform movement, known as No Child Left Behind (NCLB), focused on improving teacher quality and student achievement. NCLB made the bold promise that all students would achieve 100% proficiency in reading and math as measured by high-stakes testing ("United States Department of Education," 2009b). Although a catalyst for reform, the unintended consequences of NCLB included states lowering standards, one-size fits all mandates, and the overshadowing of curriculum by the pressures of high-stakes standardized testing (Education, 2014).

Another key point of NCLB was the establishment of requirements to be considered “highly qualified.” Highly qualified teachers were those who held at least a bachelor’s degree, a state license, and demonstrated competency in their subject matter ("United States Department of Education," 2009b). Because of NCLB reform efforts, Weems and Rogers (2010) suggested that

(25)

teachers are now entering the classroom more prepared than ever before. However, the

drawback of an overemphasis on highly qualified teachers was an intensified public view of the importance of credentialism. Kane, Rockoff, and Staiger (2008) suggested a misplaced emphasis on certification, as there was little difference in the average teacher effectiveness between

certified and uncertified teachers. In nearly all states, teachers must pass a basic skills, content, and teaching knowledge test for licensure, however Darling-Hammond (2010) suggested these tests are not strongly related to classroom success. Additionally, Danielson and McGreal (2000) noted that state licensure ends with the guarantee of minimum competence. Therefore, the school district is responsible for ensuring effective teaching by evaluation and professional growth. NCLB ensured certification and preparedness, however the measurement of effective teaching requires additional indicators. What resulted from NCLB was a culture of

accountability using statewide-standardized test scores, which paved a way for the inclusion of student achievement data in teacher evaluation.

The Need for Reform

School systems need effective teachers because research consistently indicates that teacher quality is a key factor influencing student outcomes (Aaronson, Barrow, & Sander, 2007; Goldhaber, Brewer, & Anderson, 1999; Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004; (Odden et al., 2004). The importance of schools attracting and retaining high quality teachers is supported by Goldhaber, Gross, and Player (2011) who found that more effective teachers are less likely to leave their individual schools and the public school system in general. Historically, the only difference between the evaluation of novice and veteran teachers has been the frequency of evaluations conducted (Rowley, 2010). In addition, Weisberb (2009) found several schools

(26)

around the country using binary rating systems of satisfactory or unsatisfactory for teacher evaluation.

Research by Weisberb (2009) and Kane et al. (2011) supported the finding that a large number of teachers may be labeled as effective, but may not be providing the same level of service. In addition, Jacob and Walsh (2011) found that most teachers were evaluated every other year and the majority were given “superior” or “excellent” ratings. While some teachers may be deserving of the highest ratings, the lack of differentiation in performance levels may result in an inflated sense of performance when the majority are rated “good” or “great” (Weisberb, 2009). Additionally, Darling-Hammond (2014) agreed the problem is that existing evaluation systems rarely help teachers improve or distinguish between who is exceeding and who is struggling. A teacher who does not demonstrate proficiency is provided with an improvement plan and specific steps toward professional growth. The plan may outline requirements, resources, and professional development to support the teacher. Although interesting to note, research by Range et al. (2011) revealed the majority of principals felt improvement plans were effective, but 40% were speculative that such plans remediated ineffective teachers.

According to Jacob (2011), popular strategies used with ineffective teacher performance include monetary settlements, voluntary departure, and dismissal. However, key barriers to dismissal include difficulty in providing documentation, legal expenses, and administrative time (Range, Duncan, Scherz, & Haines, 2012). Weisberb (2009) stated that “administrators are deterred from pursuing remediation or dismissal because they view the process as overly time consuming and cumbersome, and the outcomes for those who do invest the time in the process is uncertain” (p. 17). For potentially similar reasons, a study by Jacob (2011) revealed that 38.8%

(27)

to 46.2% of the elementary principals did not dismiss any teachers despite their ability to do so within the Chicago Public School system policy.

Individual and organizational improvement needs are addressed through the evaluation process (Razik & Swanson, 2010). Therefore, as districts seek to maximize budgetary costs, staffing and salary are considerations to ensure the district is paying for the highest quality work. According to Hanushek and Rivkin (2006), if districts systematically hire the best available teachers, the average quality of education would increase. Likewise, retaining the best teachers should also be systematic and thoughtful. Therefore, there is concern with the practice of reduction-in-force using seniority-based layoffs. Salary and benefits comprise a large part of a school budget, therefore positions may be eliminated ensure the financial stability of a school district. A seniority-based reduction is impractical because younger teachers have lower salaries, therefore more staff will be eliminated to meet the budget reduction requirement. Wiswall (2013) suggested experience is not an indicator of teacher effectiveness and supported a need for more emphasis on teacher quality rather than seniority for retention decisions. Another problem with seniority-based layoffs is that when younger, effective teachers lose their positions, more senior teachers remain in the district, regardless of performance (Boyd et al., 2011).

Parents, teachers, and administrators would agree that the role of a teacher is critical to the quality and success of the school. The problem with historical practices in teacher evaluation is that poor performance is under addressed and excellence goes unrecognized (Weisberb, 2009). According to Danielson and McGreal (2000), like other professions, education evolves based on current research. Therefore, new approaches and pedagogical practices must move forward with evaluating teacher effectiveness as well. The challenge of improving teacher evaluation

(28)

evaluate quality teaching. However, policymakers are working to define effective teaching as research continues in a new era of best practice.

A New Era of Teacher Evaluation

In 2009, President Barack Obama focused his educational platform on the need to recruit, prepare, reward, and reward teachers while creating an equitable distribution of quality teachers across the country (Darling-Hammond, 2009). Under the Obama administration, the Race to the Top initiative invested 4.35 billion dollars in educational spending to award states with

competitive grants. As a result, 11 states and Washington D.C. received the Race to the Top federal funding and began the innovative journey of reform. Tennessee and Delaware were the first states to undergo reform followed by Colorado, Florida, Illinois, Louisiana, New York, North Carolina, Ohio, and Rhode Island (Education, 2014). States who were recipients of the grant money acted with urgency to pass new regulations in compliance with requirements of Race to the Top. Race to the Top spurred research and policy change across the nation for teacher evaluation.

The requirements of Race to the Top included (a) designing and implementing rigorous standards and high-quality assessments, (b) attracting and retaining quality teachers and leaders in schools, (c) supporting data systems that inform decisions and improve instruction, (d) using innovative reforms to transform struggling schools and (e) demonstrate sustaining educational reform ("United States Department of Education," 2009a). As a result, state and local school leaders are working to implement new systems of evaluation, observation, and accountability. Darling-Hammond (2010) recommended a national teacher performance assessment to share a common framework for defining and measuring teacher effectiveness, however Race to the Top empowers states to work individually on reform. According to a report issued by the White

(29)

House, the Obama administration put state-level innovation to work to generate the best ideas on raising standards to enable students to be college and career ready (Education, 2014). Therefore, although states are required to comply with federal regulations, flexibility exists on the design and implementation.

According to the recommendations by The New Teacher Project, meaningful teacher evaluation systems should reflect a set of core convictions about quality instruction ("The New Teacher Project," 2010). Some of these core values include the belief that (1) all children can master rigorous academic material regardless of socioeconomic status, (2) a teacher’s primary responsibility is to ensure that students learn, (3) teachers contribute to student learning in ways that can be observed and measured, (4) evaluation results should form the foundation of teacher development, and (5) while no system is perfect, evaluations should play a major role in

employment decisions ("The New Teacher Project," 2010). Compared to previous binary evaluation systems, these new ideas and reforms reflect a more holistic approach to measuring effective teaching.

As the focus shifted from credentials to effectiveness, raising teacher quality may be key in improving student outcomes (Gordon, Kane, & Staiger, 2006; Rockoff, 2004). Since 2009, the National Council for Teacher Effectiveness reported that 36 states have made changes to policies on teacher evaluation. In addition, there is an increase in the number policies requiring the inclusion of student achievement data in summative evaluation ratings. Overall, 25 states require teacher evaluation systems to have multiple ratings, and 39 states require annual

observations of classroom instruction. The reform efforts supported by the National Council for Effective Teaching (NCET) recommended annual evaluations and evidence of student learning as the most important factor in measuring effective teaching. NCET also recommended the need

(30)

for multiple measures to assess teacher effectiveness, the use of multiple ratings and quality feedback, and the use of evaluations to determine tenure ("National Council for Teacher

Quality," 2012). Ullman (2012) recommended defining effectiveness, using multiple indicators, developing clear composite ratings, differentiating performance levels, building data analysis, and improving instructional leadership. Suggestions for additional criteria in teacher evaluation may include student growth, student surveys, self-assessments, and a research-based framework (Donaldson, 2012; Hanushek & Rivkin, 2010; Weems & Rogers, 2010).

Teacher Evaluation Reform in Illinois

Illinois is one of many states engaged in teacher evaluation reform through Race to the Top. State agencies play a critical role in supporting local districts with teacher evaluation. As states define their role according to statutory provisions, decisions are made on how to fund and support the evaluation process (McGuinn, 2012). Additionally, states must determine how to support local districts based on the unique needs and resources of the particular state. Other consideration should be given to supporting administrators long-term and remaining transparent about the work.

In January 2010, the governor of Illinois signed the Performance Evaluation Reform Act (PERA) establishing new regulations for teacher evaluation. The new law addressed

performance evaluation ratings, student growth, the acquisition of tenure, and the dismissal process ("Performance Evaluation Reform Act," 2010). Staiger and Rockoff (2010) suggested teacher evaluation should identify and retain only the best teachers early in their teaching career and tenure should be limited to those who meet a very high bar. However, PERA retains tenure, but creates four groupings of teachers to eliminate the seniority-based model in the event of a reduction-in-force. Group four consists of teachers who received excellent ratings on their last

(31)

two most recent performance evaluations. Group three is made up of teachers who received at least a proficient rating on their two most recent performance evaluations. Group two includes teachers who have received needs improvement or unsatisfactory on any one of the last two performance ratings. Finally, group one includes any non-tenured teachers who have not received a performance evaluation rating and/or part time teachers ("Performance Evaluation Reform Act," 2010). As a result, PERA allows the district to reduce the positions of teachers by a four-grouping plan instead of the previous seniority model. In July of 2014, the Illinois State Board of Education issued a policy update thereby issuing recall rights to teachers in group two who may be dismissed, provided that one of the last two performance evaluations were proficient or excellent ("Illinois State Board of Education," 2014a). Continued discussion is ongoing at the state level regarding guidance to local school districts on student growth, peer evaluation, special education, and early childhood teacher evaluation.

Local school districts are working on the design and implementation of teacher

evaluation systems to meet the requirements of PERA. Section 24A-4(b) of the Illinois School Code requires school districts to establish a joint committee “composed of equal representation selected by the district and its teachers, or where applicable, the exclusive bargaining

representative of its teachers” ("Performance Evaluation Reform Act," 2010). Administrators and union leaders work collaboratively in a joint committee to design and implement new systems of teacher evaluation in accordance with the law. The design and implementation of PERA will vary from district to district based on the decisions made by the joint committee. Key considerations for designing teacher evaluations systems include formal observations,

observation logistics, training, principal engagement, and feedback for the evaluator (Sartain, 2011). The motivation exists for collaboration and compromise because according to Illinois

(32)

School Code, if “within 180 calendar days of the first meeting, the joint committee does not reach agreement on the evaluation plan, then the district shall implement the model evaluation plan established by the State Board of Education with respect to the use of data and indicators on student growth as a significant factor in rating teacher performance” ("Performance Evaluation Reform Act," 2010, p. 27).

Standards-Based Teacher Evaluation

According to Borman and Kimball (2005), standards-based systems for teacher evaluation assess teaching practice using a comprehensive set of standards and rubrics to

enhance instruction and strengthen accountability. PERA requires the use of a researched-based framework for observations, for which many school districts are turning to the work of Charlotte Danielson and the Framework for Teaching (Danielson, 2013). The framework assesses 22 components of instruction grounded in a constructivist view of teaching and learning (Danielson, 2013). Furthermore, the rubrics assign ratings of distinguished, proficient, needs improvement, and unsatisfactory. The Framework for Teaching is a method of teacher evaluation, which revolutionizes the former binary rating systems. Other popular evaluation tools include the Marzano Evaluation Framework, the COMPASS Observation Instrument, The System for Teacher and Student Advancement, and other state-developed instruments ("Reform Support Network," 2014).

The Framework for Teaching provides common language to create professional dialogue, enhance consistency, and increase objectivity by placing the emphasis on evidence-based

judgments of teacher performance (Rowley, 2010). Teaching practice is summarized by four domains: planning and preparation, classroom environment, instruction, and professional

(33)

provides a collaborative process, and allows the evaluator to engage in professional growth conversations. Evaluators use lesson plans, student work, and other relevant documents to include as evidence. The Framework for Teaching includes a pre-conference and a post-conference allowing the teacher and administrator the opportunity to discuss professional practice (Danielson, 2013). In concurrence with the Framework for Teaching, is a study by the Center for American Progress that recommended incorporating goal setting and including teachers as partners in the evaluation process (Donaldson, 2012). The challenge of standards-based observations is the amount of time required to complete each one. Kersten and Israel (2005) found evaluation requirements difficult to complete along with other demands on the principal. Some school districts address this issue with the help of technology by collecting data on an iPad, laptop, or other portable device. With programs like Teachscape, data is saved, aggregated, and e-mailed to teachers for immediate feedback (Ullman, 2012). A computer-based assessment tool limits the amount of time the evaluator and teacher spend completing paperwork. In turn, this would allow for more time spent on dialogue and reflection, which is central to the purpose of the Framework for Teaching.

Over time, the role of the principal has evolved from building manager to the

instructional leader of teaching and learning (Blankstein, 2012; Kouzes & Posner, 2012; Short & Greer, 2002). A significant part of an administrator’s role is to evaluate effective teaching. Despite the debate surrounding specific measures of evaluation, Jacob and Lefgren (2008) found that overall, good teaching is observable. Likewise, Kane et al. (2011) suggested that

evaluations based on well-executed classroom observations identify effective teachers and teaching practices. The training and support of evaluators is key to the measurement of effective teaching. Evaluators should be knowledgeable and competent with the understanding of

(34)

evaluation requirements (Razik & Swanson, 2010). Therefore, PERA requires the completion of the Teacher Growth Training Module to become a certified evaluator. The program collaborates with Teachscape to provide a standardized training for every evaluator in the state of Illinois. The training includes watching videos, practicing observations, and completing non-biased evaluations. The goal of this requirement is to increase inter-rater reliability and provide a standardized training tool for all evaluators across the state. Classroom observation must assess what is being taught and how the content is being taught ("The New Teacher Project," 2012). The training supports evaluators in the use of the Framework for Teaching to determine the observational and student growth components of the summative rating. Upon completion of classroom observations, the evaluator collects data and transfers feedback to state approved evaluation instrument. Jacob and Lefgren (2008) found observable characteristics were effective in evaluations. Although principals were generally effective and identifying the very best and worst teachers, they were not able to distinguish teachers in the middle of the achievement distribution (Jacob & Lefgren, 2008). The New Teacher Project suggested the rubrics are only as effective as the observers who use them and the systems that support them, indicating the need for quality training and inter-rater reliability ("The New Teacher Project," 2012).

Student Data

Value-added models are estimate effects of individual teachers or schools on student achievement while accounting for differences in student backgrounds ("American Statistical Association," 2014). Harris and Nathan (2009) noted that value-added models are appealing because they attempt to estimate how much teachers contribute to student achievement. In Illinois, value-added models are a key component of PERA. As a result, school districts are required to include data and indicators of student growth as a “significant factor” in teacher

(35)

evaluations ("Performance Evaluation Reform Act," 2010). Student growth is a supplement to the performance-based rubrics to provide another component of information to differentiate teacher performance.

The joint committee determines the percentage of student growth included in the

summative rating. Therefore, the implementation of student growth may look different in every district. The student growth component applies to all certified full-time and part-time staff, excluding psychologists, social workers, speech pathologists, counselors, and nurses. Illinois School Code states that during the first and second year of implementation, student growth must make up at least 25% of the evaluation. In years two and beyond, student growth must make up at least 30% of the evaluation. The motivation exists for collaboration because if within 180 calendar days of the first meeting, the joint committee does not reach agreement on the evaluation plan, then the district shall implement the state model, which includes 50% of the evaluation based on performance and 50% based on student growth ("Performance Evaluation Reform Act," 2010, p. 27).

In support of value-added models, Chetty et al. (2011) completed a longitudinal study that suggested students randomly assigned to talented teachers in kindergarten had significantly higher incomes as adults and generally better future life outcomes in attending college,

retirement savings, and homeownership (Chetty et al., 2010). The results suggested that good teachers could potentially create greater social value as kindergarten classroom quality

potentially impacts wage earnings in the future. The leading theory is improvement in non-cognitive or “soft” skills and social skills formed in kindergarten impact success as an adult. Despite national attention from The New York Times, President Obama, and the MacArthur foundation recognition of this study, separation of causation and correlation is highly debated in

(36)

this study by Adler (2013) who questioned the validity of results. Therefore, continued studies are needed to contribute to the field of research to define what makes a “high-quality” class or teacher.

The Center for American Progress recommends the inclusion of student data as a way to focus attention on outcomes of learning (Donaldson, 2012). States such as Colorado, Idaho, Arizona, and Florida are underway using student growth measures ranging from 35% to 50% of the evaluation (Range et al., 2011). Pecheone and Chung (2006) suggested performance

assessments including evidence from teaching practice have the potential to provide more evaluation of teaching ability. The results from Goldhaber and Hansen (2010) suggested that student achievement data is stable enough that early career estimates of teacher effectiveness predict student achievement at least three years later and better than observable teacher characteristics. Although results may not generalize to larger populations, the study provides support for added measures. According to McCaffrey et al. (2009), the validity of value-added measures depends on the contribution of systematic errors and the stability over time. McCaffrey et al. (2009) found potential for value-added measures to improve the performance of teachers, provided the inclusion of multi-year averages in the calculation. However, there still appears substantial variation in teacher performance over time in qualitative measures. Research by Boyd et al. (2011) found teachers laid off under a value-added system were on average less effective that those laid off under a seniority-based system and showed promise for the ability of value-added measures to differentiate performance. Papay (2011) and Carrell and West (2010) discovered relationships between student achievement and teacher performance which supports the use of value-added measures in teacher evaluation. Further evidence is supported by

(37)

assessment system was able to identify which teachers had higher than expected levels of student achievement. Holtzapple (2003) determined a relationship between teachers who received unsatisfactory and basic ratings and lower student achievement scores on state tests. Teacher value-added models generate unbiased and reasonably accurate predictions of the casual and short-term impact of a teacher on student test scores (Kane & Staiger, 2008). The results suggest teacher evaluation ratings are related to student learning and achievement (Holtzapple, 2003).

Rockoff et al. (2012) suggested that standardized teacher performance data is useful to principals leading school improvement. The results provided support for district moving forward using student achievement data. Aaronson et al. (2007) recommended including controls for across-school differences. There is concern about the instability of value-added year to year. However, McCaffrey et al. (2009) found that three-year averages reduced sampling errors and demonstrated moderate stability. Therefore, joint committees may consider the use of multiple years of data in a combination with other sources to increase reliability. Because value-added measures should not be the sole basis for retention or dismissal, Boyd et al. (2011) suggested a fair and rigorous evaluation tool should include a variety of approaches of assessing teacher effectiveness.

Challenges of Using Student Data

While some research supports the use of student data in teacher evaluation, others suggest concerns about validity and generalizability. Controls are needed for external factors such as class size, curriculum materials, home and community life, prior knowledge, and culture. Internal factors also exist and may include influences from other teachers, school conditions, quality of curriculum, tutoring supports, class size, team teaching, and block-scheduling practices (Baker et al., 2010). Additionally, linking teacher evaluation to test scores may

(38)

discourage teachers from working in the neediest schools (Baker et al., 2010). Borman and Kimball (2005) suggested teacher quality might not show reliable relations to closing the achievement gap between minority and nonminority and low- and high-achieving students. Teachers who serve in large populations of English language learners and low-income students or special education students may be at a disadvantage. Consideration should be given to factors such as appropriate tests, summer learning loss, narrowing of curriculum, and less teacher collaboration (Baker et al., 2010). Teachers in higher socio-economic districts may have higher achieving students receiving support from outside of the school, which may allow for higher scores on standardized tests. In turn, teachers in lower socio-economic communities may be excellent teachers, but the test scores do not reflect their work because of external factors hindering student success. Therefore, value-added measures controlling for extraneous factors are key to ensure the results accurately measure the effect teacher performance on student learning.

A negative effect of value-added measure linked to teacher evaluations could limit the collaboration of teachers and return to the archaic individuality of the teaching profession (Munoz, Prather, & Stronge, 2011). According to Darling-Hammond (2014), there is a need to create and sustain productive, collegial working conditions and allow teachers to work in an environment that supports their learning. Furthermore, evaluation reforms should not adopt individualistic and competitive approaches to ranking and sorting that undermines the growth of learning communities (Darling-Hammond, 2014). Another challenge is that value-added

measures only apply to tested grades and subjects and many teachers do not teach subjects with large scale standardized assessments (Toch & Rothman, 2008). For subjects and grades that are tested, value added estimates may be unstable based on small numbers of students and

(39)

empirically isolating the effects of individual teachers make it difficult to measure (Boyd et al., 2011). Additionally, value-added models are challenging due to the inability to disentangle other influences on student progress (Darling-Hammond et al., 2012).

Some evaluation models assign teachers to four categories by numerical cutoffs to aggregated weighted components (Green, Baker, & Oluwole, 2012). However, according to results found by Green et al. (2012), teachers on either side of a cutoff score are not statistically different from one another. These results caution administrators from basing decisions on high-stakes testing because teachers who border evaluation ratings are often misidentified (Baker, Oluwole, & Green, 2013). In addition, some components of the evaluation model will vary more than others due to a greater chance of random noise in the calculation (Baker et al., 2013).

Another concern about the use of value-added measures is the consistency of data. Factors affecting value-added measures could be validity, reliability, bias, and teacher attrition. Kimball et al. (2004) found tentative evidence when examining the relationship between teacher performance ratings and student test scores, however coefficients were not statistically

significant in all cases. Amrein-Beardsley and Collins (2012) critically examined the effects of the SAS Educational Value-Added Assessment System to analyze evidence collected from four teachers with non-renewed contracts. The results showed none of the four teachers with three years of consistent data, which raised concerns. Kersting, Mei-kuang, and Stigler (2013) explored the effects of data and model specifications on the stability of teacher value-added scores and found that student sample size considerably affected the stability of value-added measures when up to one-third of teachers were reclassified into performance groups.

Rothstein (2010) suggested teacher effects based on studies examining value-added models could not be interpreted as casual. Therefore, research is needed on how to precisely

(40)

measure individual teacher contribution to student learning. Furthermore, it is difficult to determine effective teachers at the time of hire because performance on the job needs to be accumulated over time (Staiger & Rockoff, 2010). Hanushek and Rivkin (2010) found value-added models lacked historical information and failed to account for all relevant variables affecting student achievement. Measuring effective teaching is challenging because of the many factors affecting student learning.

Classroom and school compositional effects may be among the most powerful factors affecting student achievement (Berliner, 2014). One way to reduce system error rates would be to create balance student characteristics across classroom assignments (Schochet & Chiang, 2010). Although ideal, that may not be possible depending on various factors such as individual student needs, scheduling, and teacher certification. Therefore, Rothstein (2010) suggested principals should exercise good judgment by allocating students to teachers in a way to maximize the output of student achievement, since demographics may vary from one class to another. In theory, the concept appears equitable, however special education, English language learners, and students in gifted programs often do not have a choice of teachers. Random sorting may not always be possible when students are assigned to teachers based on certification and course offerings.

Teacher evaluation should not be solely based on student achievement because value-added measures are subject to a considerable degree of random error (Schochet & Chiang, 2010). For example, Staiger and Rockoff (2010) suggested value-added measures could have reliability ranging from 30 to 50 percent. More than 90% of the variability was due to student factors not within control of the teacher, therefore system error rates should be carefully considered when implementing policy (Schochet & Chiang, 2010). Performance measurement systems at the

(41)

school level will likely yield error rates of about five to 10 percent lower than at the teacher level. This may be due to a large sample size and could be a consideration for using school-wide goals instead of individual teacher goals.

However, it can be suggested that principle assessments of teacher effectiveness are reasonably accurate at identifying the best and worst teachers (Jacob & Lefgren, 2008). Misclassification rates could be lower if value-added measures were coordinated with other measures of teacher quality (Schochet & Chiang, 2010). The combination of value-added measures and principal assessments are a strong predictor of teacher effectiveness rather than each type alone (Jacob & Lefgren, 2008). Gallagher (2004) found a stronger relationship between evaluation ratings in literacy than in math. However, additional research is needed to determine if the relationship between performance-based ratings and student achievement differ based on subject area. In any event, high-stakes decisions should not be made based on a single test score (Amrein-Beardsley, 2008; Collins, 2014). Multiple measures are the most appropriate way to create a well-rounded perspective of teacher performance, while controlling for

demographic and extraneous factors. Additionally, multiple cohort data may be a better use of value-added measures due to the increased controls for teacher performance (Kersting et al., 2013). Papay (2012) suggested some of these challenges can be mitigated by using multiple years of data in value-added models and investing heavily in evaluator training to increase reliability. Although, according to Harris and Nathan (2009), the debate is limited because of the assumption that value-added components are the only factor used in high-stakes decisions for determining compensation and retention.

In the end, effective teachers do more than just raise test scores; therefore the best solution may be a hybrid model that combines value-added measures with other forms of

(42)

objective measures. While some researchers suggest that value-added measures are volatile year to year or demonstrate a margin of error, the differentiation of teacher quality is critical based on results demonstrated in the Widget Effect (Weisberb, 2009). As joint committees work together it is important to continue improving the use of value-added measures rather than disregarding them altogether. The American Statistical Association believes value-added models are complex and a high-level of statistical expertise is needed in the development and interpretation

("American Statistical Association," 2014). Additionally, Danielson and McGreal (2000), believe the challenges of using valued-added should not prevent us from using student learning as a standard in teacher evaluation. A collaborative and balanced approach to implementation is key.

Type I, II, and III Assessments

The inclusion of student growth poses limitations and therefore teacher evaluation policies should move toward a holistic system of measurement providing educators with

practical, formative, and timely feedback (Amrein-Beardsley & Barnett, 2012). PERA requires the summative evaluation to include performance-based ratings and student growth. The student growth component provides an assessment tool different from the checklists or binary rating systems of the past. According to PERA, the teacher evaluation plan must include data and indicators of student growth as a significant factor in rating licensed staff performance

("Performance Evaluation Reform Act," 2010). A significant factor of student growth includes 30% of the performance evaluation rating beginning the 2015-2016 school year. Illinois policy allows options for determining the types of assessments used in teacher evaluation. The purpose of the assessments is to analyze the data and identify a change in student knowledge or skills over time. Therefore, PERA allows for options when choosing an assessment appropriate for

(43)

each teacher and grade level. The assessment types are classified into three categories known as Type I, Type II and Type III. A Type I assessment is a reliable assessment measuring a certain group or subset of students and is scored by a non-district entity. A statewide standardized assessment can be used considered Type 1. Another Type I, known as the Measures of

Academic Progress (MAP) test, assesses math, reading, and language. MAP is provided by the Northwest Evaluation Association to measure growth, project proficiency on high-stakes tests, and inform educators on how to differentiate instruction, evaluate programs, and structure curriculum ("Northwest Evaluation Association," 2004). Another common Type I assessment is AIMSweb. As a non-district entity, AIMSweb is designed to predict student growth in reading and math. AIMSweb is a curriculum-based assessment tool for universal screening, progress monitoring, and data management (AIMSweb, 2015).

Type II assessments are developed or adopted and approved for use by the school district and used on a district-wide basis by all teachers in a given grade or subject area. Examples of Type II assessments include collaboratively developed common assessments, curriculum tests, and assessments designed by textbook publishers that are used district wide ("Illinois State Board of Education," 2014b). A Type III assessment is rigorous, aligned to the course curriculum, and the qualified evaluator and licensed staff determine measures of student learning. Examples of Type III assessments include teacher-created assessments, assessments designed by textbook publishers, student work samples or portfolios, assessments of student performance, and assessments designed by staff who are subject or grade-level experts that are administered commonly across a given grade or subject. The student growth component of the evaluation must include a Type I or Type II and include at least one Type III. If neither a Type I nor Type

(44)

II can be identified the evaluation plan will require at least two Type III assessments to be used ("Illinois State Board of Education," 2014b).

The student growth component is a collaborative discussion between the teacher and evaluator. Student growth goals are created by the licensed staff and approved by both the licensed staff member and the evaluator by a date mutually upon. At the goal-setting meeting, the teacher and evaluator determine the assessments and plan for implementation. At this time the measurement model and targets are established. At this time, nurses, speech-language pathologists, social workers, psychologists, occupational therapists, and physical therapists are exempt from incorporating student growth as a sig