Summary of Data Analysis

To summarize, the following table provides an overview of the validity methods used in the study and the validity evidence collected to address each research question. An asterisk (*) indicates the use of TPA data (see Appendix I for IA and Evidence).

Table 3.32

Summary of Validity Argument Question, Evidence, and Method

Each inference below lists, in order, the three columns of the table: Validity Question, Validity Method, and Validity Evidence.

Inference 1: Construct Representation

Validity Question
1. Do the tasks elicit performances that actually reflect the intended construct (teacher readiness)?

Validity Method
1. Analysis of documents to correlate teaching standards and performance criteria.
2. Analysis of performance data using statistical methods (descriptive, factor, bivariate analysis) from the sample of candidates to explore the relationships between items and constructs.
3. Analysis of examiner/expert responses and opinions about the types of cognitive demands on candidates, construct irrelevant variance, and construct validity.
4. Analysis of performance data and corresponding scores for insights as to how the questions were answered by candidates and scored by examiners.
5. For misfitting items, analysis of candidate survey and case study responses to gather insights into sources of construct irrelevant variance.

Validity Evidence
Documents*, Test scores & submissions*, Interviews, Course evaluations, Surveys, Factor analysis*, Item analysis*, MTMM*

Inference 2: Scoring/Evaluation

Validity Question
2. Are the scoring procedures sound and reliable?
3. Are the rubric score levels achieved by the candidate actually representative of what that candidate performed on the TPA? (Does candidate performance correlate to the assigned test score)?

Validity Method
1. Review of SCALE/Pearson documents on marking and scoring procedures.
2. Analysis of the reliability of scores.
3. Statistical analysis of candidate exam results.
4. Statistical analysis of candidate exam results in relationship to results of similar traits collected from different instruments.
5. Analysis of candidate responses on surveys and in case study interviews.

Validity Evidence
Documents*, Test scores*, Interviews, Course evaluations, Self-evaluations, Surveys, MTMM*

Inference 3: Generalization

Validity Question
4. Are the score levels achieved on the rubrics a true representation of a candidate’s performance? In other words, are the scores a candidate earned consistent and generalizable with other samples of that candidate’s teaching performance?
5. Does poor performance on the TPA imply a lack of adequate mastery of the construct?
6. How generalizable are the criteria, rubrics, procedures, and scores derived from the TPA across different candidates and handbooks?
7. Does TPA proficiency depend upon factors beyond the candidate’s control? How generalizable are the criteria, rubrics, procedures, and scores derived from the TPA across testing sites, placements and placement length and programs?

Validity Method
1. Analysis of examiner/expert opinions on TPA construct and rubric constructs and the tasks that evaluate candidate KSJ within each construct.
2. Analysis of mentor/supervisor opinions of the TPA construct and rubric constructs and the tasks that evaluate candidate KSJ within each construct.
3. Analysis of documents to compare and contrast teaching standards to performance criteria on tasks.
4. Analysis of candidate TPA scores.
5. Correlation of candidate TPA scores in relationship to student teaching success.
6. Analysis of candidate responses on assessment procedures from observations, case studies and surveys.
7. Analysis of candidate TPA scores correlated to supervisor/mentor course evaluations and lesson observations.
8. Analysis of candidate responses on assessment procedures from observations, case studies and surveys.
9. Analysis of candidate TPA scores correlated to supervisor/mentor course evaluations and lesson observations.
10. Analysis of candidate TPA scores across programs and disciplines.
11. Analysis of reliability of scores.

Validity Evidence
Documents*, TPA Scores*, Surveys, Interviews, Course Evaluations, Observations, Self-Reports, MTMM*, Case Study Interviews, Document Review*, Composite Reliability Analysis*

Inference 4: Extrapolation

Validity Question
8. Do TPA test scores provide reliable indicators of a readiness to teach? Are the TPA scores, as a whole, a true measurement of teaching ability?

Validity Method
1. Correlation of candidate TPA scores in relationship to student teaching success.
2. Analysis of candidate responses on assessment procedures from observations, case studies and surveys.
3. Analysis of candidate TPA scores correlated to supervisor/mentor course evaluations and lesson observations.
4. Analysis of candidate TPA scores across programs and disciplines.

Validity Evidence
TPA Scores*, Course Evaluations, Observations, Interviews, Surveys, MTMM*

Inference 5: Decision Making

Validity Question
9. Guidance is in place so that all stakeholders know what scores mean and how the outcomes will be used?

Validity Method
1. Analysis of stakeholder surveys and interviews gathering views on what they know about the TPA procedures, scoring, meaning and uses.
2. Review of guidance documents, the TPA handbook, and materials relating to score meaning and use.

Validity Evidence
Documents*, Surveys, Interviews, MTMM*

The ABV framework and the IA developed for this study mandated that the VA include multiple sources of qualitative and quantitative data for each RQ. This section outlined the data analysis methods used, organized by the types of evidence collected.
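Several of the methods listed in Table 3.32 correlate candidate TPA scores with scores from another instrument, such as supervisor/mentor course evaluations. The following is a minimal sketch of that kind of bivariate analysis, not the procedure used in this study. It assumes a Pearson coefficient (the table does not name the coefficient, and a rank-based coefficient may suit ordinal rubric scores better), and the function name and all scores shown are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return (r, n): the Pearson correlation over candidates who have
    both scores, and the number of complete pairs it was computed on.

    A missing score is represented by None and that candidate is skipped.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must be aligned by candidate")
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(ss_x * ss_y), n


# Hypothetical, made-up scores for six candidates (not data from the study).
tpa_totals = [38, 42, 45, 50, 55, 47]
course_evals = [2.5, 3.0, None, 3.6, 3.9, 3.2]  # one candidate has no evaluation

r, n = pearson_r(tpa_totals, course_evals)
print(f"r = {r:.2f} on n = {n} complete pairs")
```

Returning n alongside r keeps the number of candidates behind each coefficient visible, which matters when an instrument was collected for only part of the sample.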

Limitations

As is often the case with ABV, one limitation of this study is the sheer breadth of the data collected. Though the TPA is a single assessment, its complexity and the numerous types of data required to validate each inference meant that this was a broad study rather than an in-depth review of a single construct (e.g., one set of rubrics or one task). Compared to other validation studies, the number of data types and instruments limited detailed analysis and discussion of any one part of the instrument or any one sub-group of participants. However, some data are more significant than others in addressing the research questions and the IA of this project: the TPA scores, course evaluations, and case study interviews. These will be analyzed in more depth because they helped to better explain the consequences and uses of the TPA. Recommendations and suggestions for future studies will be discussed in Chapter five.

Another limitation of this study is that the participants came from different programs that did not always collect the same sources of information on candidates’ performance during ST. For this reason, the data analyzed often applied to only one of the programs, or approximately half of the participants. Where similar instruments were used, they were not always identical. Coding the data to the same standards and traits made it possible to compare these evidence types. Where comparison was not possible, or the number of participants was affected, this has been indicated.
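The coding step described above can be pictured as a crosswalk from each program's instrument items to a shared set of codes. The sketch below is illustrative only and is not the coding scheme used in this study: the program names, item wordings, and codes ("planning", "assessment", "management") are all hypothetical. It groups scores by shared code and reports where a comparison across programs is not possible or rests on fewer participants.

```python
from collections import defaultdict

# Hypothetical crosswalk: (program, instrument item) -> shared code.
CROSSWALK = {
    ("program_a", "Plans differentiated lessons"): "planning",
    ("program_a", "Uses assessment to guide teaching"): "assessment",
    ("program_b", "Lesson planning"): "planning",
    ("program_b", "Classroom management"): "management",
}


def code_records(records):
    """Group scores by shared code and program.

    `records` is an iterable of (program, item, score) tuples. Items with
    no crosswalk entry are returned separately rather than silently dropped.
    """
    coded = defaultdict(lambda: defaultdict(list))
    uncoded = []
    for program, item, score in records:
        code = CROSSWALK.get((program, item))
        if code is None:
            uncoded.append((program, item, score))
        else:
            coded[code][program].append(score)
    return coded, uncoded


def comparability(coded, programs):
    """For each code, report the n per program and whether every program
    contributed at least one score (that is, whether comparison is possible)."""
    report = {}
    for code, by_program in coded.items():
        counts = {p: len(by_program.get(p, [])) for p in programs}
        report[code] = {"n": counts, "comparable": all(counts.values())}
    return report
```

For example, calling `comparability(coded, ["program_a", "program_b"])` on coded records would mark a code collected by only one program as not comparable and show the per-program counts behind every other code.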

Finally, the principal investigator was charged with assisting candidates in their understanding and completion of the TPA during the field test. Because her role was to help candidates perform well on the TPA, the struggles, concerns, and advice revealed in the early phases of data collection were immediately applied for programmatic improvement. As such, the data are irrevocably shaped by the investigator's influence. For instance, TPA scores were changed by these programmatic improvements in ways that cannot be fully measured. Whenever possible, the investigator noted when a programmatic change resulted from research collected during this study; these changes will be discussed as the evidence is analyzed and presented in Chapter four.

Conclusion

ABV guided a set of comprehensive procedures for the development of the research design, including the IA, based on the assumptions and claims of the uses of the TPA scores. A series of assumptions determined the types of data to be collected, which were analyzed using both qualitative (coding) and quantitative (statistical) methods. The methods used to analyze the data are varied because the types of data are similarly diverse, as is the nature of a multiple-method study and as is required to fully answer the research questions. Using ABV, this chapter proposed five core inferences and nine sub-questions that determine the research questions of this study. In Chapter four, the results of the validation study and the findings for each research question will be presented.

Chapter Four

In document An Argument-Based Validation Study of the Teacher Performance Assessment in Washington State (Page 128-132)