Protocol for a Systematic Review: Comparing Alternatively Certified to Regularly Certified Teachers: A Systematic Review and Meta-Analysis


1 The Campbell Collaboration | www.campbellcollaboration.org


Jeffrey C. Valentine, Spyros Konstantopoulos, Timothy Lau

Submitted to the Coordinating Group of: Crime and Justice / Education / Disability / International Development / Nutrition / Social Welfare / Other
Plans to co-register: No / Yes (Cochrane, Other) / Maybe

Date Submitted: 20 February 2013
Date Revision Submitted: 8 August 2013
Approval Date: 27 January 2014


BACKGROUND

The Problem, Condition or Issue

Like virtually all organizations that hire people to carry out their work, school districts have a strong interest in hiring candidates who are most likely to be successful. The “person on the street” definition of a successful teacher is one who teaches his or her students well.

However, teaching success is a multidimensional construct that also encompasses variables such as duration in the teaching profession (especially in the district in which hired) and collegiality, in addition to broader measures of academic achievement such as the extent to which teachers help students to be self-motivated, and the extent to which they instill a love of learning.

The Intervention

In an effort to ensure that students have qualified teachers, most educational jurisdictions require that teachers be certified before they are allowed to teach independently. Although the specifics vary across jurisdictions, they generally involve (a) earning a bachelor’s degree, (b) completing an approved teacher preparation program, (c) demonstrating basic skills competency (e.g., by obtaining a passing score on a test of basic skills), (d) demonstrating subject matter competence (e.g., by obtaining a passing score on a subject matter test), and (e) successfully completing a teaching internship. Many jurisdictions also require a passing score on a teacher certification exam.

For a variety of reasons, school districts in the United States have also found it advantageous to allow for alternative routes to certification. One force driving this process is the shortage of teachers in certain subject areas, often math and science, and especially at the later grade levels in the southern and western parts of the U.S. The usual process of obtaining certification represents a significant barrier to working adults who might consider entering the teaching profession as a secondary career. Such individuals may bring to the classroom valuable human capital, such as real-world experience and high degrees of technical expertise. But entering an undergraduate teaching program (which may require three years of full-time study, even for an individual with a degree in the discipline in which they want to teach) may be neither desirable nor feasible. Alternative certification programs are often accelerated, and are often offered in formats that make it somewhat easier for working adults to negotiate (e.g., more evening, weekend, and online courses).

The focus of this proposal is on the relative outcomes for teachers credentialed through an alternative certification (AC) process versus those certified through a traditional certification (TC) program. Emergency certification programs are those that provide temporary certification for individuals in high need areas (often special education in the United States). Such certification programs will not be a focus of this review, as they serve a qualitatively different purpose.

How the Intervention Might Work

Alternative certification usually involves reducing or waiving some of the requirements for teacher certification. Generally, the effectiveness of alternative certification as public policy is thought to hinge on one of two dimensions. First, as discussed earlier, these programs may bring in individuals with different human capital (e.g., levels of knowledge, experience, and/or academic motivation). Alternatively, or in conjunction with differences in human capital, it is possible that alternative certification programs do a better job of preparing teachers. However, the locally regulated nature of teacher certification means that it is difficult to systematically experiment with changes to the process, and in fact the processes used (like those for training doctors and lawyers) emerged from an informal political and public policy framework. A readable summary of the history of certification in the United States is provided by LaBue (1960), who argued that there have been four main stages of certification efforts: (a) initial licensing of teachers, (b) shift in responsibility of licensing from local to state authorities, (c) the establishment and expansion of schools and university departments dedicated to training teachers, and (d) efforts to improve standards for teacher training. LaBue argued that the critical assumption underlying certification is that the quality of the student educational experience is primarily driven by the ability and the preparation of teachers, and that the certification process is intended to provide important “quality control” of what are believed to be the most important influences on student learning that are under the control of school systems.

Studies that test the effects observed in alternative vs. regularly certified teachers provide an indirect test of the traditional certification process. The content and specific aspects of both TC programs and AC programs vary, which may lead to variations in the effectiveness of alternatively certified teachers across programs. However, it should be noted that inferences regarding certification type are complicated by the fact that there is a great deal of overlap in the preparation of both alternatively certified and regularly certified teachers in many jurisdictions. For example, AC teachers may receive less instruction in classroom management than TC teachers, but they do receive some. Further, studies of AC vs. TC teachers are usually not tests of the relative effects of specific teacher certification programs (i.e., studies often involve teachers from multiple regular certification programs compared to teachers from multiple alternative certification programs). Therefore, the focus of this review is on comparing the effects observed in TC teachers relative to AC teachers. It is not a review that examines the relative effectiveness of different types of AC programs (e.g., we do not directly compare Teach for America to other AC models). We will, however, conduct a subgroup analysis of the effects of specific AC models relative to TC if we have at least three studies from an identified AC model like Teach for America that meet our eligibility criteria. A prototypical example of research that attempts to link teacher characteristics with student achievement was carried out by Rockoff, Jacob, Kane, and Staiger (2008). These researchers collected an impressive amount of information on the teachers’ academic histories,


pertinent to this review is that the researchers captured information on certification type (alternative vs. traditional), but they did not examine any specifics about the teachers’ alternative or traditional certification program. As is typical in this line of research, Rockoff et al. (2008) included a number of statistical control variables that attempt to reduce potential bias in their analyses; these included school-level student ethnicity, gender, student-teacher ratio, and eligibility for free lunch, among others. Analyses of student achievement data also included controls for prior mathematics and reading test scores. As described more fully below, for this review we would use the standardized regression coefficient associated with certification type as the effect size for the effect of certification type, and would base meta-analyses on these effect sizes collected across studies.
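The effect size described above can be illustrated with a minimal sketch. It assumes a study reports an unstandardized coefficient for a binary certification-type indicator, the proportion of AC teachers, and the standard deviation of the outcome; under those assumptions the standardized coefficient is the unstandardized coefficient multiplied by the ratio of the predictor's standard deviation to the outcome's. The function names and the numbers are hypothetical and are not taken from any included study.

```python
import math


def sd_of_indicator(proportion):
    """Population SD of a 0/1 indicator with the given proportion of 1s.

    Assumption: a sample-based SD (n - 1 denominator) would differ slightly.
    """
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.sqrt(proportion * (1.0 - proportion))


def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient b to a standardized coefficient."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y


# Hypothetical study: 20% of teachers are AC, the AC coefficient is 2.5 test
# score points, and the outcome has a standard deviation of 10 points.
sd_x = sd_of_indicator(0.2)
beta = standardized_beta(b=2.5, sd_x=sd_x, sd_y=10.0)
print(round(beta, 3))
```

Because the covariates differ from study to study, two coefficients computed this way are not guaranteed to estimate the same quantity, which is the difficulty taken up under Statistical procedures and conventions.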

Why it is Important to do the Review

Several literature reviews on the effects of certification type have been conducted (e.g., Darling-Hammond, Holtzman, Gatlin, & Heilig, 2005; Lai & Grossman, 2008; Sparks, 2004; Suell & Piotrowski, 2007). Generally, authors conclude that no differences exist in student achievement outcomes across certification types. Relatively little attention has been paid to other metrics of success, such as principal ratings of effectiveness or student psychosocial outcomes.

However, to date no reviews that we know of have combined the elements of a thorough literature search and a state-of-the-art approach to meta-analysis that we propose below. In particular, we believe that the inclusion of individual participant data will be particularly informative (see Statistical Procedures and Conventions, below).

OBJECTIVES

The primary public policy question we are addressing is whether alternatively certified teachers have outcomes that are similar to those obtained by regularly certified teachers. As such, we will synthesize the literature on alternative teacher certification on both student outcomes (e.g., academic achievement) and broader markers of success (e.g., principal ratings, retention in the profession).

METHODOLOGY

Criteria for including studies in the review

To be included in our review, documents must meet several criteria. Specifically, documents will (a) be primary reports (or re-analyses) of the effects of (b) regular vs. alternative teacher certification on (c) teacher employment outcomes (broadly defined, including student academic achievement, student socio-emotional outcomes that might be related to


Based on our experience with studies that are likely to meet inclusion criteria, we expect that most will rely on information obtained from schools or school districts to determine teacher certification type. As such, for this review we will rely on the label provided by the study authors as the operational definition of TC and AC certified teachers.

Types of study designs

To our knowledge, no studies have randomly assigned teachers to receive either traditional or alternative certification. As such, eligible studies can be either prospective or retrospective (we expect many more retrospective studies than prospective studies), will categorize teachers by certification type, and will analyze the effects of certification type using regression-based techniques.

Studies must therefore include both TC and AC teachers to be included in this review.

Types of participants

Participants in included studies must consist of teachers in publicly funded schools and their students. We anticipate that most studies will be conducted in the United States, but we will search for studies in other countries.

Types of interventions

Interventions in included studies must consist of alternative vs. regular (or traditional) teacher certification.

Types of outcome measures

Eligible operational definitions of outcomes include the following:

1. Student academic achievement: scores on standardized achievement tests (e.g., SAT, ACT, ITBS), scores on state-mandated achievement tests, overall grades (e.g., grade point average), and attainment. Within this category, we will synthesize the achievement and the attainment constructs separately.

2. Socio-emotional outcomes: student motivation, attitudes toward school, and student self-beliefs. Within this category, we will synthesize constructs separately.

3. Teacher retention: turnover metrics (e.g., proportion who teach for at least five years).

4. Observer ratings of teaching: classroom observation and principal ratings (e.g., annual performance review). Within this category, we will synthesize classroom observation outcomes separately from annual performance review outcomes.

Exclusion criteria

We intend to place few restrictions on the precise nature of the studies that we include, in part because there are few empirical guidelines that would inform these choices. We believe we do have sufficient empirical justification to limit studies to those that employ a “local” comparison group (Cook, Shadish, & Wong, 2008). This means, for example, that we would exclude studies that compare alternatively certified teachers from one jurisdiction with traditionally certified teachers from another jurisdiction.

We will not exclude studies based on the pattern of covariates used: for example, whether or not they control for teacher experience, prior student achievement, or student socioeconomic status (SES). As described more fully below, we will explore the implications of these design choices in an effort to better inform policy and future research practice.

Studies published prior to 1990 will be excluded.

Search Strategy

The following electronic databases will be used for the search: Academic Search Premier, ERIC, EconLit, Proquest Digital Dissertations, and Education Full Text (Wilson). We propose to use the following search terms:

(“alternative certification” n3 teach* OR “teach* n1 certifi*” OR “alternative teacher certification” OR “alternatively certified teacher” OR “licensed teacher” OR “articled teacher” OR “qualified teacher status” OR “initial teacher education” OR “initial teacher training”) AND (achiev* OR effect*)

Note: terms in bold are designed specifically to pick up on terms used in England. At this point, we do not plan on synthesizing studies from England with studies from the United States.

Because the electronic search includes dissertations and at least some conferences, we are relatively confident that the unpublished literature is being at least partially covered. To complement the electronic search, we will check the websites of the departments of education for all 50 U.S. states. In addition, we will search the websites of the National Bureau of Economic Research, the U.S. Department of Education’s Institute of Education Sciences, the Lumina Foundation, and the Gates Foundation. We will also conduct a Google search from 2010 on “alternative certification” and achievement, focusing on the first 1,000 hits. We will send emails to the lead author of studies that meet our inclusion criteria, asking if they know of other studies that might meet our inclusion criteria. In addition, we will search the reference sections of included studies and relevant reviews for potential studies.

Description of methods used in primary research


the impacts of type of certification on student outcomes, with student academic achievement being the most common outcome assessed. Typically, but not always, researchers control for prior student achievement. Other relatively common covariates include measures of SES (usually either student-based or neighborhood-based) and years of teacher experience.

Criteria for determination of independent findings

We will use the shifting unit of analysis approach to analyze the data. Specifically, when estimating the overall effect of certification type, we will allow each independent sample to contribute only one effect size to the analysis. However, when analyzing potential moderators of the effect of certification type, a study could contribute one effect size to every level of the moderator. As an example, assume a study provides an overall estimate of the effect of teacher certification type on student achievement, and also provides separate estimates for students from low SES and from middle SES families. The study would contribute one estimate (the overall estimate) to the main analysis examining the effects of certification type, and one estimate to each level of the SES analysis.
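The shifting unit of analysis rule can be sketched as a small selection routine. This is an illustration under assumptions only: the record layout (sample identifier, moderator level, effect size), the study labels, and the effect sizes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (sample_id, moderator_level, effect_size).
# A moderator_level of None marks the overall estimate for that sample.
records = [
    ("study_A", None, 0.05),
    ("study_A", "low_SES", 0.08),
    ("study_A", "middle_SES", 0.02),
    ("study_B", None, -0.01),
]


def overall_set(rows):
    """Keep exactly one effect size per independent sample (the overall one)."""
    chosen = {}
    for sample_id, level, effect in rows:
        if level is None:
            if sample_id in chosen:
                raise ValueError(f"{sample_id} has more than one overall estimate")
            chosen[sample_id] = effect
    return chosen


def moderator_set(rows):
    """Allow each sample to contribute one effect size to every moderator level."""
    by_level = defaultdict(dict)
    for sample_id, level, effect in rows:
        if level is not None:
            if sample_id in by_level[level]:
                raise ValueError(f"{sample_id} appears twice in level {level}")
            by_level[level][sample_id] = effect
    return {level: dict(samples) for level, samples in by_level.items()}


print(overall_set(records))
print(moderator_set(records))
```

In this layout study_A contributes one value to the overall analysis and one value to each SES level, while study_B, which reports no SES breakdown, appears only in the overall analysis.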

We do not expect any studies to have multiple groups within the AC and TC teacher categories. If we find studies that have these, our initial approach will be to select the groups that either (a) best capture the research question or (b) best represent the groups used in the other studies in the review. For example, if a study has two groups of TC teachers, one with little experience and one with more experience, and compares these to a group of AC teachers with little teaching experience, we will include only the TC group with little experience.

Similarly, we expect most studies to assess effects at only one point in time. For this reason, when a study presents multiple time points we will select the one that is most like the others in the review.

One source of multiplicity of findings that is less common in Campbell Collaboration reviews, but that we expect to see in the studies included in this review, is that researchers will often present multiple models, and it is difficult to judge which model to use. We will attempt to select the model that appears to be the “final” or “best” model for the overall analysis. For example, if a study estimates the effect of certification type without using prior student achievement as a covariate, then estimates the effect with student achievement as a covariate, the latter will be the estimate that is used in the overall analysis.

Details of study coding and study coding categories

A draft screening guide and a draft coding guide are presented in Appendix A and Appendix B.

All coding (i.e., screening and coding of studies) will be conducted by at least two


has participated in and led many projects of this type, and TL has screened and coded studies for a Cochrane Collaboration review and has screened studies for a What Works Clearinghouse review.

For screening, JV and TL will code studies based on information available in the titles and abstracts. If JV and TL disagree at screening regarding whether a study should be retrieved, they will ensure that the disagreement was not caused by a clear error. Otherwise, studies will be retrieved for full text evaluation if either JV or TL believes they should be, based on the screening criteria articulated in Appendix A.

Studies appearing to meet screening criteria will be obtained, and screened again by JV and TL, this time with the full text of the study available. Studies passing this stage will be coded independently by JV and TL, with disagreements resolved in conference and with SK serving as an arbiter.
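The title-and-abstract retrieval rule described above (retrieve when either screener votes to retrieve, and check disagreements for clear errors) can be sketched as follows. The function name, the study identifiers, and the dictionary layout are hypothetical.

```python
def full_text_retrieval(jv_votes, tl_votes):
    """Apply the either-screener rule to two sets of screening votes.

    Each argument maps a study id to True when that screener would retrieve
    the study. Returns (studies to retrieve, studies with a disagreement that
    should be checked for a clear error).
    """
    if jv_votes.keys() != tl_votes.keys():
        raise ValueError("both screeners must vote on the same studies")
    retrieve = sorted(s for s in jv_votes if jv_votes[s] or tl_votes[s])
    disagreements = sorted(s for s in jv_votes if jv_votes[s] != tl_votes[s])
    return retrieve, disagreements


# Hypothetical votes for three abstracts.
jv = {"s1": True, "s2": False, "s3": False}
tl = {"s1": True, "s2": True, "s3": False}
to_retrieve, to_check = full_text_retrieval(jv, tl)
print(to_retrieve, to_check)
```

The rule is deliberately inclusive at this stage: a single vote is enough to bring a study forward to full-text screening.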

Statistical procedures and conventions

We already noted that most studies will use ordinary least squares regression to test the impact of certification type. One difficulty associated with meta-analyzing standardized regression coefficients is that the specific models generating these differ from study to study (e.g., the control variables will likely be unique across studies). Given the complexity of studies of this kind, this is not a surprise. However, it means that the regression coefficients have somewhat different meanings from study to study, and as a result there is no straightforward way to apply standard meta-analytic techniques to them. We therefore propose to conduct meta-analyses at two different levels of specificity. Each of these is described below.

Meta-analysis of individual participant data

At the most molecular level, we propose to work with the authors of studies meeting inclusion criteria to obtain their actual data sets (individual participant data, or IPD; Cooper & Patall, 2009). Relative to meta-analyses of effects from summary data, IPD meta-analyses have been shown to yield similar overall estimates of effect (e.g., Olkin & Sampson, 1998), and more precise estimates of the effects of moderating variables (e.g., Lambert, Sutton, Abrams, & Jones, 2002). Even if only some of the data sets are ultimately obtained, having this information will allow us to investigate the effects of adjusting for different covariates, to compare partially and more fully adjusted estimates of effect, and to synthesize a set of studies that use the same covariates, thus providing insight into the potential biases in other studies for which effects are available only in summary form. Notably, the primary difficulties associated with IPD meta-analyses are (a) actually obtaining the data sets from authors and (b) determining how to use them. The latter is made difficult by the (often) poor documentation accompanying the datasets (e.g., identifying what data labels mean; correctly coding variables as binary, categorical, or continuous in the new software).

Meta-analysis of standardized regression coefficients

One partial solution to the problem of having regression coefficients with slightly different meanings across studies is to synthesize conceptually similar groups of studies. For example, we could compare the effects arising from studies that do vs. do not use student SES as a covariate. Furthermore, we can exploit the individual data sets that we will obtain. We can, for example, create effects from the IPD studies that maximize similarity with the effects arising from other studies. Our primary analysis will involve estimating the relative effect of AC programs while controlling for prior student achievement and student SES.

Other Statistical Considerations

When information necessary for computing effect sizes and/or conducting moderator tests is missing, we will write to study authors in an attempt to obtain the information.

Due to the aforementioned differences in both AC and TC preparation within and across studies, we cannot defend the assumption that studies will be estimating the same population parameter. As such, for overall analyses, we will employ a random effects model. For moderator tests, we will employ a mixed effects model.

Results will be presented as mean effect sizes with 95% confidence intervals.
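A random effects mean with a 95% confidence interval can be sketched as follows. The protocol does not name a between-study variance estimator, so the DerSimonian-Laird method-of-moments estimator used here is an assumption, as are the function name and the example values; the interval uses the normal approximation.

```python
import math


def random_effects_mean(effects, variances):
    """Random-effects mean, 95% CI, and tau^2 (DerSimonian-Laird; assumed)."""
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, (mean - 1.96 * se, mean + 1.96 * se), tau2


# Hypothetical standardized coefficients and their sampling variances.
mean, ci, tau2 = random_effects_mean([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(mean, 3), tuple(round(x, 3) for x in ci), round(tau2, 3))
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same mean.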

Heterogeneity will be examined by conducting a statistical test and by computing the I² statistic.
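As a minimal sketch of these two heterogeneity summaries, the code below computes Cochran's Q (the usual homogeneity test statistic, referred to a chi-square distribution with k - 1 degrees of freedom) and expresses the excess of Q over its degrees of freedom as a percentage. Treating Q as the intended statistical test is an assumption, and the function name and example values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I-squared as a percentage)."""
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]  # inverse-variance weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I-squared is the share of total variation attributed to heterogeneity,
    # floored at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and sampling variances.
q, df, i2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(q, 2), df, round(i2, 1))
```

A p-value for Q would require the chi-square distribution function, which is omitted here to keep the sketch free of third-party dependencies.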

Publication bias will be assessed using the trim and fill method and by presenting funnel plots. Both methods will be cautiously interpreted.
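The data behind a funnel plot can be sketched as follows: each study is placed by its effect size and standard error, and pseudo 95% limits are drawn around a pooled estimate at each standard error. This sketch covers only the funnel plot and does not implement trim and fill; the function name, the pooled value, and the study values are hypothetical.

```python
import math


def funnel_points(effects, variances, pooled):
    """Return one row per study with its pseudo 95% limits around `pooled`."""
    rows = []
    for effect, variance in zip(effects, variances):
        se = math.sqrt(variance)
        lower = pooled - 1.96 * se
        upper = pooled + 1.96 * se
        rows.append({
            "effect": effect,
            "se": se,
            "lower": lower,
            "upper": upper,
            # Studies outside the funnel are worth a closer look, although
            # this alone is not evidence of publication bias.
            "outside": effect < lower or effect > upper,
        })
    return rows


# Hypothetical studies plotted around a hypothetical pooled estimate of 0.3.
rows = funnel_points([0.1, 0.5], [0.04, 0.01], pooled=0.3)
for row in rows:
    print(row)
```

Plotting effect on the horizontal axis against standard error on an inverted vertical axis gives the conventional funnel shape.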

Statistical analyses will be done using the “ipdmeta” and “metawin” packages in R, in conjunction with Comprehensive Meta-Analysis (Borenstein, Hedges, Higgins, & Rothstein, 2005).

We plan on conducting three primary subgroup analyses. First, we will investigate the extent to which effects vary as a function of student SES. We will also investigate the extent to which effects vary as a function of student prior achievement. Finally, we will investigate specific AC models (e.g., Teach for America) if at least three studies using that model meet our eligibility criteria.

Treatment of qualitative research

We anticipate that the great majority of studies we collect will not include a qualitative component. As such we will not attempt to synthesize qualitative research for this project.


REFERENCES

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2005). Comprehensive meta-analysis version 2. Englewood, NJ: Biostat.

Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27, 724-750.

Cooper, H., & Patall, E. A. (2009). The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychological Methods, 14, 165-176. doi:10.1037/a0015565

Darling-Hammond, L., Holtzman, D. J., Gatlin, S. J., & Heilig, J. V. (2005). Does teacher preparation matter? Evidence about teacher certification, Teach for America, and teacher effectiveness. Education Policy Analysis Archives, 13(42). Retrieved from http://epaa.asu.edu/epaa/v13n42/

LaBue, A. C. (1960). Teacher certification in the United States: A brief history. Journal of Teacher Education, 11, 147-172. doi:10.1177/002248716001100203

Lai, K. C., & Grossman, D. (2008). Alternate routes in initial teacher education: A critical review of the research and policy implications for Hong Kong. Journal of Education for Teaching: International Research and Pedagogy, 34, 261-275.

Lambert, P. C., Sutton, A. J., Abrams, K. R., & Jones, D. R. (2002). A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. Journal of Clinical Epidemiology, 55, 86-94.

Olkin, I., & Sampson, A. (1998). Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics, 54, 317-322.

Rockoff, J. E., Jacob, B. A., Kane, T. J., & Staiger, D. O. (2008). Can you recognize an effective teacher when you recruit one? National Bureau of Economic Research Working Paper No. 14485. Cambridge, MA: National Bureau of Economic Research.

Sparks, K. (2004). The effect of teacher certification on student achievement. (Doctoral dissertation). Retrieved from Proquest Dissertations and Theses (305075519).

Suell, J. L., & Piotrowski, C. (2007). Alternative teacher education programs: A review of the literature and outcome studies. Journal of Instructional Psychology, 34(1), 54-58.


REVIEW AUTHORS

Lead review author:

Name: Jeff Valentine

Title: Associate Professor

Affiliation: University of Louisville

Address: 309 CEHD

City, State, Province or County: Louisville, KY

Postal Code: 40292

Country: USA

Phone: +1 (502) 852-3830

Email: [email protected]

Co-authors

Name: Spyros Konstantopoulos

Title: Associate Professor

Affiliation: Michigan State University

Address: 450 Erickson Hall

City, State, Province or County: East Lansing, MI

Postal Code: 48824

Country: USA

Phone: +1 (517) 432-0259

Email: [email protected]

Name: Timothy Lau

Title: Graduate Research Assistant

Affiliation: University of Louisville

Address: 364 CEHD

City, State, Province or County: Louisville, KY

Postal Code: 40292

Country: USA

Phone: +1 (502) 852-3830


ROLES AND RESPONSIBILITIES

• Content: JV conducted a review of the relationship between teacher characteristics knowable at the time of hire and student learning indicators. A proposal for expanding this review is currently under consideration. The overlap in content and methods between that review and the proposed work is considerable. SK is an experienced educational researcher and is broadly familiar with the teacher certification literature. TL is a doctoral student in educational psychology, measurement, and evaluation, and is broadly familiar with the teacher certification literature.

• Systematic review methods: Both JV and SK are highly expert in all methods related to systematic reviewing.

• Statistical analysis: Both JV and SK are highly expert in meta-analysis. In addition, SK is a statistician and is well-prepared for navigating the complexities of a review of this nature. TL will assist with analysis.

• Information retrieval: Both JV and SK have basic familiarity with information retrieval strategies. TL will assist with study retrieval.

SOURCES OF SUPPORT

This review is supported by a grant from the Campbell Collaboration to Jeff Valentine and Spyros Konstantopoulos.

DECLARATIONS OF INTEREST

There are no known conflicts of interest.

PRELIMINARY TIMEFRAME

Draft protocol submission: February 2013
Revised protocol submission: August 2013
Literature search: August – December 2013
Study coding: January – February 2014
Data analysis: March – July 2014


PLANS FOR UPDATING THE REVIEW

The need for an updated review will be driven in part by the extent of the evidence base. Research volume in this area seems to be increasing, and as such we anticipate 2-3 years will be an appropriate time frame for an updated review.


AUTHOR DECLARATION

Authors’ responsibilities

By completing this form, you accept responsibility for preparing, maintaining and updating the review in accordance with Campbell Collaboration policy. The Campbell Collaboration will provide as much support as possible to assist with the preparation of the review. A draft review must be submitted to the relevant Coordinating Group within two years of protocol publication. If drafts are not submitted before the agreed deadlines, or if we are unable to contact you for an extended period, the relevant Coordinating Group has the right to de-register the title or transfer the title to alternative authors. The Coordinating Group also has the right to de-register or transfer the title if it does not meet the standards of the Coordinating Group and/or the Campbell Collaboration.

You accept responsibility for maintaining the review in light of new evidence, comments and criticisms, and other developments, and updating the review at least once every five years, or, if requested, transferring responsibility for maintaining the review to others as agreed with the Coordinating Group.

Publication in the Campbell Library

The support of the Coordinating Group in preparing your review is conditional upon your agreement to publish the protocol, finished review, and subsequent updates in the Campbell Library. The Campbell Collaboration places no restrictions on publication of the findings of a Campbell systematic review in a more abbreviated form as a journal article either before or after the publication of the monograph version in Campbell Systematic Reviews. Some journals, however, have restrictions that preclude publication of findings that have been, or will be, reported elsewhere and authors considering publication in such a journal should be aware of possible conflict with publication of the monograph version in Campbell Systematic Reviews. Publication in a journal after publication or in press status in Campbell Systematic Reviews should acknowledge the Campbell version and include a citation to it. Note that systematic reviews published in Campbell Systematic Reviews and co-registered with the Cochrane Collaboration may have additional requirements or restrictions for co-publication. Review authors accept responsibility for meeting any co-publication requirements.

I understand the commitment required to undertake a Campbell review, and agree to publish in the Campbell Library. Signed on behalf of the authors:


APPENDIX A: DRAFT SCREENING GUIDE

1. Does the document report on a study?
   0. No
   1. Yes
   2. Can’t tell
   9. The study is a relevant literature review

IF NO THEN STOP

2. Does the study involve a P-12 educational system (or equivalent if not in the U.S. or Canada)?

   0. No
   1. Yes
   2. Can’t tell

IF NO THEN STOP

3. Is the study quantitative?
   0. No
   1. Yes
   2. Can’t tell

IF NO THEN STOP

4. Does the study involve teachers who have been certified to teach through an alternative to the usual certification process?

NOTE: Although they are not likely to be part of this review, include studies of teachers who have emergency certification.

IF NO THEN STOP

5. Does the study compare outcomes for alternatively vs. regularly certified teachers?
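The stop logic of the screening guide above can be sketched as a short routine. Several details are assumptions made only for illustration: questions 4 and 5 are given the same 0/1/2 codes as questions 1 to 3, a "No" on question 5 is treated as a stop, and code 9 on question 1 flags the document as a relevant review rather than screening it further. The constant and function names are hypothetical.

```python
NO, YES, CANT_TELL, RELEVANT_REVIEW = 0, 1, 2, 9

QUESTIONS = (
    "reports on a study",
    "involves a P-12 educational system",
    "is quantitative",
    "involves alternatively certified teachers",
    "compares alternatively vs. regularly certified teachers",
)


def screen(answers):
    """Return (decision, question number) for one document.

    The decision is 'stop' at the first 'No', 'flag_review' for a relevant
    literature review (assumed handling), and 'continue' otherwise, so that
    'Can't tell' never excludes a document at this stage.
    """
    if len(answers) != len(QUESTIONS):
        raise ValueError("one answer is needed for each screening question")
    if answers[0] == RELEVANT_REVIEW:
        return ("flag_review", 1)
    for number, answer in enumerate(answers, start=1):
        if answer == NO:
            return ("stop", number)
    return ("continue", None)


print(screen([YES, YES, NO, YES, YES]))
print(screen([YES, CANT_TELL, YES, YES, YES]))
```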


APPENDIX B: DRAFT CODING GUIDE

Traditional vs. Alternative Teacher Certification Coding Guide v 0.1

Report Characteristics

Study ID

Article Title (short)
First Author
Year of Publication

Publication Type
   1. Journal article
   2. Dissertation
   3. Conference presentation
   4. Report
   5. Other _________________________

Certification Characteristics

Type of non-traditional certification
   0. Emergency
   1. Alternative
   2. Other _________________________

Was the non-traditional certification process described in the report?

0. No

1. Yes

If yes, indicate page number

Sample Descriptors

Country
   1. U.S.
   2. Canada
   3. Other _____________________

Schools

Total number of schools
School size (describe)
School SES (describe)
School racial/ethnic mix (describe)
School average achievement level (describe)

Teachers

Traditionally certified teacher age (mean, sd)

Traditionally certified teacher ethnicity
   % White
   % African-American/African descent
   % Latino or Hispanic
   % Asian descent
   % Minority
   % Other
   % Unknown

Traditionally certified teacher teaching experience (mean, sd)

Traditionally certified teacher % with advanced degree

Traditionally certified teacher % with education bachelor’s degree

Alternatively certified teacher age (mean, sd)

Alternatively certified teacher ethnicity

% White

% African-American/African descent

% Latino or Hispanic

% Asian descent

% Minority

% Other

% Unknown

Alternatively certified teacher teaching experience (mean, sd)

Alternatively certified teacher % with advanced degree

Alternatively certified teacher % with education bachelor’s degree

Students

Student age (mean, sd)

Student ethnicity

% White

% African-American/African descent

% Latino or Hispanic

% Asian descent

% Minority

% Other

% Unknown

Student grade level(s)

Student achievement level (describe)

Measurement Characteristics

DV # _____ of ______

Note: Complete this section and the effect size section for each DV

Note: When there are multiple DVs, be as clear as possible in your labeling. Use the rest of the cell to make notes that might help your memory.

Note: Treat subscales as separate measures.

if given, e.g., TOWL-3, otherwise, indicate subject matter – e.g., reading, math chapter test, etc.)

Source/Informant for Data

1. Participant

2. Parent

3. Teacher

4. Raters (e.g., principals)

5. Archival (e.g., records)

6. Other ______________________

9. Can’t tell

What construct is this instrument tapping?

1. Student academic achievement

2. Student social and/or emotional functioning (e.g., well-being)

3. Teacher job satisfaction

4. Rating of teacher effectiveness

5. Teacher time in profession

6. Other ____________________________

Were reliability estimates reported for the achievement measures?

0. No

1. Yes

What was the reliability estimate? (Note: prefer coefficient alpha from the sample over coefficient alpha from another source if both are given. If multiple estimates are available, e.g., for boys and girls, average the estimates. If a study presents multiple types of reliability estimates, use them in this order: 1. internal consistency 2. split half 3. test-retest.)
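
The selection rule in this note can be sketched as a small function. This is an illustration under stated assumptions, not the review team's procedure: the names `Estimate` and `pick_reliability` and the string labels are hypothetical, and the sketch assumes that the type order is applied first and the preference for estimates from the study's own sample is applied within each type (the guide does not say which rule takes precedence).

```python
from statistics import mean
from typing import List, NamedTuple, Optional


class Estimate(NamedTuple):
    value: float
    kind: str    # "internal_consistency", "split_half", or "test_retest"
    source: str  # "this_study" or "other_study"


# Order of preference stated in the coding guide.
TYPE_ORDER = ("internal_consistency", "split_half", "test_retest")
# Assumption: the sample-over-other-source preference applies to every type.
SOURCE_ORDER = ("this_study", "other_study")


def pick_reliability(estimates: List[Estimate]) -> Optional[float]:
    """Return the reliability value to code, or None if nothing was reported."""
    for kind in TYPE_ORDER:
        for source in SOURCE_ORDER:
            values = [e.value for e in estimates
                      if e.kind == kind and e.source == source]
            if values:
                # Several estimates of the same kind and source
                # (e.g., boys and girls) are averaged.
                return mean(values)
    return None
```

With sample-based alphas of 0.80 and 0.90, a cited alpha of 0.95, and a test-retest coefficient of 0.70, this sketch would code the average of the two sample-based alphas.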

List reliability type

1. coefficient alpha or KR-## or internal consistency or Cronbach’s alpha

2. test-retest

3. split half

4. Can’t tell

List source of reliability estimates

1. participants in this study

2. cited from another study

3. Can’t tell

If judges were involved in assessing the DV, did the authors report judge agreement?

0. No, there were judges but judge agreement was not reported

1. Yes, judge agreement was reported as percent agreement.

2. Yes, judge agreement was reported as Cohen’s kappa (chance-adjusted agreement)

3. Yes, judge agreement was reported in a metric other than percent agreement or Cohen’s kappa

9. N/A, there were no judges involved in the assessment of this DV
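
For coders unfamiliar with the two named metrics, the following sketch contrasts percent agreement with Cohen’s kappa for two judges. It is illustrative only: the function name `agreement` is hypothetical, and it uses the standard two-rater definition of kappa, (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each judge’s marginal proportions.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def agreement(judge_a: Sequence[Hashable],
              judge_b: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (observed agreement as a proportion, Cohen's kappa)."""
    n = len(judge_a)
    if n == 0 or n != len(judge_b):
        raise ValueError("both judges must rate the same non-empty set of items")

    # Observed agreement: share of items given the same code by both judges.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n

    # Chance agreement: product of the judges' marginal proportions per code.
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1.0 - p_e)
```

Two judges who agree on 8 of 10 items, each using two codes equally often, have 80% agreement but a kappa of about 0.6, which is why the guide records the metric separately.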

Did the authors cite any evidence regarding the validity of the measures?

0. no

1. yes, from evidence generated in this study

2. yes, from evidence generated in another study

At how many points in time was the DV measured? (Not including pretests)

Effect Size Information

Number of traditionally certified teachers

Number of alternatively certified teachers

Number of students taught by traditionally certified teachers

Number of students taught by alternatively certified teachers

Are multiple effect size estimates available for this DV? (e.g., based on different model specifications)

0. No

1. Yes

Variables that are controlled in the effect size estimate (list all, by category)

School

Teacher

Student

Other (be specific)

Effect size estimate (note: if multiple effect sizes are available, choose the one that the author(s) seem to offer as their “best” estimate, e.g., their final model)

Effect size metric

1. unstandardized regression coefficient (e.g., b)

2. standardized regression coefficient (e.g., B or β)

3. Other ______________

Effect size standard error

Does the standard error reflect adjustments for clustering etc.? (note: These will often be referred to as robust standard errors.)