What does your organization want to learn from an evaluation in the new location? What specific evaluation questions and outcomes would your organization be most interested in focusing on as part of the effort to improve your evidentiary base? How would you plan to use the information you learn from this evaluation?
As Blueprint exists to scale the effective practices of gap-closing schools to districts across the country, learning from each implementation of our model allows us to better systematize and strengthen our implementation as a whole. Therefore, identifying the impact of our work in Greater Boston, as well as the elements of our work that drive this impact, will be essential if we hope to further enhance our model in Greater Boston and across the United States.
Specifically, some evaluation questions that we typically ask and that we hope to continue to learn from in Greater Boston include:
1) What is the impact of a Blueprint-partnered turnaround on academic achievement? What is the impact of enrollment in a Blueprint-partnered school on academic growth?
2) What is the impact of a Blueprint-partnered turnaround on student attendance?
3) What is the impact of a Blueprint-partnered turnaround on out-of-school suspensions and other behavioral outcomes?
4) What is the impact of a Blueprint-partnered turnaround on high school graduation rates?
5) What is the impact of a Blueprint-partnered turnaround on college acceptance?
As Blueprint has focused primarily on traditional benchmarks and on accelerated achievement in low-performing schools, evaluations have yet to identify our impact on important inputs, on college completion, and on non-traditional outcomes such as grit and persistence. In turn, Blueprint is interested in better understanding the answers to the following questions, which its evaluations have yet to address:
1) What is the impact of a Blueprint-partnered turnaround on principal recruitment efforts?
2) What is the impact of a Blueprint-partnered turnaround on principal retention?
3) What is the impact of a Blueprint-partnered turnaround on principal quality?
4) What is the impact of a Blueprint-partnered turnaround on teacher recruitment efforts?
5) What is the impact of a Blueprint-partnered turnaround on teacher retention?
6) What is the impact of a Blueprint-partnered turnaround on teacher value-added?
7) What is the impact of a Blueprint-partnered turnaround on student self-regulation?
8) What is the impact of a Blueprint-partnered turnaround on student persistence / grit / character development?
9) What is the impact of a Blueprint-partnered turnaround on college completion?
10) What is the impact of a Blueprint-partnered turnaround on high-performing students?
The findings from such an analysis may enable Blueprint to adjust some of its
achievement across all student groups, and to coach principals and teachers on how to foster deeper socio-emotional skills in their students.
Given the SIF-defined levels of evidence described on page 11 of the RFP, how would you describe your organization’s current level of evidence? Please explain, including an example of an evaluation or research methodology, date of completion and the findings.
The strength of the research base supporting Blueprint’s framework, as well as the ongoing research that underpins Blueprint’s implementation, justifies the classification of the organization’s current level of evidence as “strong” with respect to SIF’s guidelines. Blueprint’s research base, the methodologies supporting it, and the ongoing evaluations of Blueprint’s work are described in detail below.
Research Base
Blueprint’s model is driven by the most comprehensive data-collection exercise ever conducted on gap-closing schools, an exercise that resulted in the identification of the five tenets we implement in partnership with schools. This study, as described and analyzed in Dobbie and Fryer (2011), included the collection of a host of data across dozens of schools of varying quality in New York City, including:
• School Leader Interview Data
• Student Survey Data
• School Artifact Data
• Classroom Observational Data
From this data collection, researchers at the Education Innovation Laboratory at Harvard University (EdLabs) identified what strategies were most highly correlated with the success of the gap-closing schools. It was from this strong correlational research that the five tenets for school turnaround were born (See Attachment 1 for full report).
An earlier study of the causal impact of attending a high-poverty, high-achievement school, the Harlem Children’s Zone (HCZ) Promise Academy (Dobbie and Fryer, 2010), motivated the abovementioned data collection and analysis. The evaluation of HCZ fostered the overarching conviction that school-based interventions alone have the potential to close achievement and opportunity gaps in high-need communities. This study used a number of identification strategies to estimate the causal impact of attending the Promise Academy, including two-stage least squares (2SLS) instrumental variables (IV) estimation and the HCZ lottery as a source of natural randomization.
More significant than the research that has informed our program model, however, is the statistical power of the evaluations of Blueprint’s current work. EdLabs, Blueprint’s research and evaluation partner for our work in both the Houston Independent School District (HISD) and Denver Public Schools (DPS), was the lead architect of the initiative in Houston and oversaw the identification strategies for HISD’s Apollo 20 Program and DPS’s Denver Summit School Network (DSSN). The implementation strategies for both the Apollo 20 Program and the DSSN have been carefully designed to facilitate the collection and analysis of data used for comprehensive program evaluations. Although the implementation approaches differed slightly between the two districts, both support causal estimates for the Apollo 20 and DSSN initiatives. The identification strategies for each initiative include:
Apollo 20 Secondary Schools: The nine (9) Apollo 20 secondary schools were selected using a regression discontinuity design among the lowest achieving schools in HISD. Control schools represent those schools just above the cutoff for being identified as HISD high priority schools.
Apollo 20 Elementary Schools: The eleven (11) Apollo 20 elementary schools were randomly selected among a set of 22 targeted elementary schools. Control schools consist of the 11 schools not selected to participate in Apollo 20.
DSSN Schools: Causal impact was estimated using a matched pairs/propensity score match identification strategy. For each student enrolled in the DSSN, a student “match” was identified in another Denver Public School that was not participating in the DSSN turnaround effort. Student “matches” were selected using demographic, achievement, and administrative historical data.
See below for an example of the methodology used for EdLabs’ evaluation of the DSSN (ongoing):
DSSN Evaluation Methodology – Ordinary Least Squares
The simplest and most direct test of the Denver Summit Schools program would be to examine the outcomes of interest (achievement test scores, behavior, attendance, graduation rates, college application rates, etc.) regressed on an indicator for enrollment in one of Blueprint’s partner schools (D), with controls for basic student characteristics (Xi) and school characteristics (Zs):
$$\text{Outcome}_{i,s} = \alpha + \beta X_i + \gamma Z_s + \delta D + \varepsilon_{i,s}$$
Where:
α: A constant value estimated from the statistical equation of interest. If all other variables are equal to zero, this constant variable α will take some value depending on what the data tells us.
β: A number of coefficients that tell us the impact of each student-level characteristic on the outcome we are looking at (e.g. the impact of being female, Hispanic, or from an impoverished background).
Xi: characteristics for some student i (e.g. female, Hispanic, being from an impoverished background).
γ: A number of coefficients that tell us the impact of each school-level characteristic on the outcome we are looking at (e.g. the impact of a school’s location, size, etc.).
Zs: characteristics for some school s (e.g. a school’s location, size, etc.).
δ: The causal effect of enrollment in a Blueprint-partnered school on the outcome of interest.
D: An indicator for enrollment in a Blueprint-partnered school.
εi,s: A statistical error term capturing random fluctuations. This error term is clustered at the school level.
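As an illustration only, the specification above can be sketched in Python with NumPy. The data are simulated, the coefficient values (including the 0.3 used for δ) and the function name ols_cluster are assumptions made for this sketch, and the school-clustered standard errors use the basic sandwich formula without a small-sample correction. This is not EdLabs' code or data.

```python
import numpy as np

def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        rows = clusters == g
        score = X[rows].T @ resid[rows]   # sum of x_i * residual_i within one school
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return coef, np.sqrt(np.diag(cov))

# Simulated data (hypothetical): 40 schools with 50 students each.
rng = np.random.default_rng(0)
n_schools, per_school = 40, 50
school = np.repeat(np.arange(n_schools), per_school)
D = (school < n_schools // 2).astype(float)         # enrollment indicator
Z = rng.normal(size=n_schools)[school]              # one school-level characteristic
X_i = rng.integers(0, 2, size=school.size).astype(float)  # one student-level characteristic
noise = rng.normal(scale=0.5, size=school.size)
outcome = 1.0 + 0.5 * X_i + 0.2 * Z + 0.3 * D + noise     # assumed true delta = 0.3

# Columns: constant (alpha), X_i (beta), Z_s (gamma), D (delta).
design = np.column_stack([np.ones(school.size), X_i, Z, D])
coef, se = ols_cluster(outcome, design, school)
delta_hat, delta_se = coef[3], se[3]
print(f"delta_hat = {delta_hat:.3f} (clustered SE {delta_se:.3f})")
```

Clustering at the school level matters here because D varies only between schools, so student-level errors within a school cannot be treated as independent draws.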
For this analysis, EdLabs selected, ex ante, “matched pairs” of schools and/or students to serve as a quasi-experimental control group. Their matching algorithm is based on propensity score methods (see Rosenbaum and Rubin, 1983). For each of Blueprint’s schools, EdLabs calculated a “propensity score,” or probability of being selected for treatment. To do this, EdLabs began by estimating a logistic regression model based on the rich set of observed covariates collected by DPS. Because these schools were selected on the basis of their failure to meet federal accountability guidelines, the propensity score is primarily based on the schools’ mean proficiency levels on the Colorado Student Assessment Program (CSAP) in reading, math, science, social studies, and writing, although the School Performance Framework (SPF), a calculation of the school’s overall performance and growth, is also given substantial weight. Thus, it is relatively simple to construct a control group using those schools that are the closest matches to our treatment schools based on their propensity scores. This method relies on a control group that is as similar as possible to Blueprint’s treatment schools on the variables that were most important for their selection into treatment. An important limitation of this approach is that estimates will be biased if there is some variable that EdLabs does not observe which is both correlated with Blueprint’s outcomes and differentially affects treatment and control schools.
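The matching step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not EdLabs' algorithm: the ten schools and their standardized proficiency values are invented, the function names are hypothetical, the logistic regression is fit by plain gradient ascent, and matching is greedy nearest-neighbor on the propensity score without replacement.

```python
import numpy as np

def fit_logistic(covariates, treated, steps=5000, lr=0.1):
    """Fit a logistic regression by gradient ascent; the intercept is the first weight."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (treated - p) / len(treated)
    return w

def propensity(covariates, w):
    """Predicted probability of selection for treatment."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    return 1.0 / (1.0 + np.exp(-X @ w))

def match_nearest(scores, treated):
    """Pair each treated school with the closest-scoring unused control school."""
    available = [i for i in range(len(scores)) if treated[i] == 0]
    pairs = {}
    for t in np.flatnonzero(treated == 1):
        best = min(available, key=lambda c: abs(scores[c] - scores[t]))
        pairs[int(t)] = best
        available.remove(best)
    return pairs

# Hypothetical standardized mean proficiency (math, reading) for ten schools.
proficiency = np.array([
    [-1.2, -0.9], [-0.8, -1.1], [-0.5, -0.2], [-1.0, -0.6], [0.1, -0.3],
    [-0.4, -0.7], [0.3, 0.5], [0.9, 0.4], [-0.6, -0.4], [1.1, 0.8],
])
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

w = fit_logistic(proficiency, treated)
scores = propensity(proficiency, w)
pairs = match_nearest(scores, treated)
print(pairs)  # maps each treated school index to its matched control index
```

Matching only balances the covariates that enter the propensity model, which is why the limitation noted above (an unobserved variable that affects treatment and control schools differently) cannot be addressed by this step.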
After constructing the control group, EdLabs used two statistical models to estimate the causal effect of being enrolled in a DSSN school. Let D be an indicator for being enrolled in a DSSN school on the first day of school. The mean difference in outcomes between students who were enrolled in a DSSN school at the beginning of the school year (D=1) and students who were enrolled in the control schools (D=0) is known as the “Intent-to-Treat” (ITT) effect, and is estimated by regressing student outcomes on D. In theory, predetermined student characteristics (Xi) should have the same distribution within the treatment and control groups because treatment assignment is statistically independent of other variables (or covariates). In small samples, however, more precise estimates of the ITT can often be found by controlling for these student characteristics (Xi). The specifications EdLabs will estimate are of the form:
$$\text{Outcome}_{i,s} = \alpha_2 + D\,\delta_1 + X_i\beta_1 + X_s\gamma_1 + \varepsilon_{2i,s}$$
The ITT, the coefficient on D (δ1) in the above equation, is an average of the causal effects for students enrolled in schools that were selected for treatment compared to students in our control schools that did not receive treatment. In other words, ITT provides an estimate of the impact of being enrolled in a DSSN school at the beginning of the school year.
Under several assumptions (i.e. that control schools do not receive the treatment, and that enrollment in DSSN schools only affects outcomes through the effects of the interventions), we can also estimate the causal impact of actually attending a DSSN school. This parameter, known as the “Treatment-on-the-Treated” (TOT) effect, measures the average effect of attending a DSSN school. The TOT parameter can be estimated through a two-stage least squares regression of student achievement on the fraction of the school year that the student was enrolled in a DSSN school (Fraction of Year in Treatment for student i), using initial assignment, D, as an instrumental variable for the fraction of the school year treated:
$$\text{Outcome}_{i,s} = \alpha_2 + \text{Fraction of Year in Treatment}_i \cdot \delta_2 + X_i\beta_2 + X_s\gamma_2 + \varepsilon_{2i,s}$$
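To make the relationship between the ITT and TOT specifications concrete, here is a small sketch with invented numbers. It assumes a binary assignment indicator and omits the covariates Xi and Xs; in that special case the two-stage least squares estimate reduces to the ratio of the ITT to the first-stage difference in the fraction of the year treated (the Wald estimator). The function name and data are hypothetical.

```python
from statistics import mean

def itt_and_tot(outcome, assigned, fraction_treated):
    """Return (ITT, first stage, TOT) for a binary instrument and no covariates."""
    treat = [i for i, d in enumerate(assigned) if d == 1]
    ctrl = [i for i, d in enumerate(assigned) if d == 0]
    # ITT: difference in mean outcomes by initial assignment D.
    itt = mean(outcome[i] for i in treat) - mean(outcome[i] for i in ctrl)
    # First stage: effect of assignment on the fraction of the year treated.
    first_stage = (mean(fraction_treated[i] for i in treat)
                   - mean(fraction_treated[i] for i in ctrl))
    # Wald / 2SLS estimate of the effect per full year of treatment.
    return itt, first_stage, itt / first_stage

# Invented data: four students assigned to treatment, four controls.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [0.4, 0.6, 0.5, 0.5, 0.2, 0.3, 0.1, 0.2]          # test scores in s.d. units
fraction_treated = [1.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]  # share of year enrolled

itt, first_stage, tot = itt_and_tot(outcome, assigned, fraction_treated)
print(f"ITT = {itt:.2f}, TOT = {tot:.2f}")  # prints "ITT = 0.30, TOT = 0.40"
```

Because some assigned students spend only part of the year enrolled, the first stage is below one and the TOT estimate is larger in magnitude than the ITT estimate.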
The TOT is the estimated difference in outcomes between students who actually attend DSSN schools and students in the control group whose schools were not selected.
Results
HISD Apollo 20:
At the close of the first year of the Apollo 20 implementation, EdLabs published a working paper demonstrating the results of the initiative. EdLabs’ evaluation of the first year of the Apollo 20 program in HISD’s network of schools found evidence of a statistically significant impact of the turnaround initiative on student achievement. The highlights of these results include:
The math skills acquired by the average Apollo 20 student represent an estimated 3.5 months of additional schooling (across all grades).
Sixth-grade students, who received daily math tutoring, gained the equivalent of 6 additional months of schooling.
Gains achieved by ninth-grade students ranged from nearly 5 months of additional learning to more than 9 months of additional learning.
Reading performance improved slightly, producing results roughly equal to or just less than a month of additional instruction.
Across the Apollo 20 high schools, 100% of mainstreamed graduating seniors were accepted to a 2- or 4-year college in 2011.
The detailed evaluation of Apollo 20 can be found in Attachment 2.
DPS DSSN:
While the independent evaluation for the DSSN has yet to be published, the DSSN’s 2012 state test results show compelling student performance growth in schools that implemented all five Blueprint tenets. Results include the following:
Math Performance Highlights
All grades in all Blueprint schools that implemented the five tenets outperformed DPS in percentage point gain for students scoring either proficient or advanced in math
All Blueprint schools ranked in the top 13 percent of all Colorado schools for yearly test score growth in math
After a single year, the DSSN closed the achievement gap with its peers in math by 23.4%, with two of the six schools receiving all five tenets reversing the achievement gap with the district average in a single year.
Reading Performance Highlights
students scoring either proficient or advanced in reading
Three of six Blueprint schools ranked in the top 6 percent of all Colorado schools for yearly test score growth in reading
Third grade TCAP Reading scores, which are indicators for placing students on Individualized Reading Plans, showed that DSSN turnaround schools outpaced the district and the state with respect to year-over-year growth in proficiency (showing 7, 17 and 18 percentage point increases each). The district and state, in contrast, showed only 3 and 1 percentage point increases, respectively.
EdLabs’ initial analysis indicated that first year results showed increases in achievement that surpass those of the Harlem Children’s Zone’s Promise Academy as measured by standard deviation increases relative to a control group.
School Culture Performance Highlights
Students in Blueprint’s phase-in high schools improved attendance by 5.6 percentage points by the end of the year, to 90.6% (up from 85% in 2010-11). These phase-in schools boasted higher attendance than the district high school average of 88.2% for the year.
Behavioral events declined meaningfully year-over-year. The ninth grade phase-in schools demonstrated 11 percent fewer Out of School Suspensions when compared to the same cohort in 2010-11.
For a detailed breakdown of DSSN results, please see Attachment 2.
Describe your organization’s current evaluation activities and how they align with your theory of change. Please include staffing and responsibility for evaluation and performance management within the organization and its local offices.
Blueprint is committed to ensuring that ongoing evaluations occur for each year of its implementation. As a result, Blueprint currently has two ongoing evaluations of its work, conducted in partnership with HISD/EdLabs and DPS, covering both the 2011-12 school year and the coming years. Blueprint’s Director of Operations and the Regional Director of each district currently work with the evaluators conducting analyses of Blueprint’s work. Blueprint engages its leadership team during these evaluations to ensure evaluators understand the implementation (or treatment) so they can set controls for the various elements of its model.
Describe your organization’s performance management and/or regular ongoing data collection efforts. Specifically, what are examples of performance measures or outcomes your organization is currently using to determine if programs are making a positive impact? What are three-to-five key sources of data you are using to evaluate the impact? How is data collected, who analyzes it, who reviews it and how does your organization use this data to make decisions?
In addition to the end-of-year evaluations performed by independent evaluation partners, Blueprint regularly conducts ongoing evaluations of its work. Blueprint observes a variety of necessary inputs at the classroom, school, and network level and provides real-time feedback on these efforts to its district partners. The following sections outline our current evaluation activities, including:
1) Ongoing data collection and analysis; and
2) The site visit process.
In conjunction with the end-of-year program evaluations described above, these regular evaluation exercises enable Blueprint to benchmark the extent to which it and its partners are achieving short, medium, and long-term goals.
1) Ongoing data collection and analysis
Blueprint is dedicated to ongoing data collection and monitoring, both to track the progress our partner schools are making and to ensure our implementation is effective. Data collection methods depend on the source of the data. Specifically, data in each domain are collected in the following ways:
• Formative assessment data: Collected at the school level by teachers on a daily and weekly basis, formative assessment data is used to identify which students are mastering particular objectives and identify targeted interventions for students as needed. Principals, teachers, and teacher teams use formative assessment data for this purpose on a week-to-week basis, collecting both exit slip data and week-end assessment data to drive instructional decisions. While Blueprint does not manage this data collection directly, our Regional Director works with schools to ensure that this data collection, interpretation, and planned adaptation/differentiation takes place. Additionally, Blueprint looks for indicators surrounding formative assessment data use during its monthly site visits.
• Interim assessment data: Blueprint and its district partners collect interim assessment data using whatever district systems or portals exist for pulling data. In instances where no system exists, Blueprint will either work with the District to implement a system or will leverage other partners to administer interim assessments.
• Administrative data: As with interim assessment data, administrative data is often housed in a district’s Student Information System (SIS), which facilitates fairly easy data collection on a weekly basis. Blueprint collaborates with its district partners to generate weekly engagement reports, which provide a snapshot of administrative outcomes across schools and a year-to-date comparison with the previous year to identify trends.
• Observational data: Observational data is collected frequently within the school, with principals expected to observe teachers every two weeks and to provide feedback on that observation soon thereafter. Blueprint collects classroom observational data each month during its site visits, which it then aggregates. Dashboards that highlight snapshots of classroom strengths and areas for growth are created from this data and used in the monthly reporting process that leads to eventual action items in each school. From the data, network schools are scored on relative (e.g., comparing across DSSN