Validity Evidence 3: Internal Consistency with Cronbach’s Alpha and Factor Analysis

Test reliability is a validity concern aimed primarily at the accuracy of measures. Reliability can be examined in two ways. First, reliability is based on internal consistency: the precision and regularity of test scores from a single testing event. Second, reliability concerns the stability of test scores over time, which is often measured through a test-retest design. The reliability of scoring, or inter-rater agreement, is critical for determining internal consistency. Because the TPA was a new, high-stakes field test, data on the reliability of test scores over time were not available. For this reason, the internal consistency and inter-rater agreement of TPA scores were the primary objectives of the traditional analysis of test score data.

Exploratory factor analysis was conducted in SPSS on item-level score data to explore the traits that may underlie the test scores for the fifty-eight participating candidates. For example, are the rubric items within a task testing the same construct (trait), or are some rubrics testing unrelated skills such as writing or technological ability? Factor analysis can reveal relationships between scores on different rubrics and show whether, together, the rubrics contribute to the measurement of a single construct, measure different constructs, or measure unintended traits (construct validity). If the latter occurs, it raises a question about the relevance of the trait being assessed and its meaning for the construct as a whole. Factor analysis is a well-established methodology for determining construct validity. Cronbach and Meehl (1955) argued that factors can be considered synonymous with constructs on an assessment.

Findings.

To determine reliability across all TPA subject areas, Cronbach’s alpha (α) was calculated for each of the first three tasks and the two embedded categories. Cronbach’s α is a common indicator of the internal consistency of a testing method and is used to estimate the reliability of scores across a sample of test-takers. Because Task 4 is measured by a single rubric, Cronbach’s α could not be calculated for it. The three subscales of the TPA all had high reliabilities (Table 4.2).

Table 4.2

Cronbach’s α by TPA Task and Category

Task or Category     Cronbach’s α
Task 1               .839
Task 2               .779
Task 3               .805
Academic Language    .792
Student Voice        .731
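As an illustration of how a coefficient such as those in Table 4.2 is obtained, the following is a minimal sketch of the standard Cronbach’s α formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The study itself ran the analysis in SPSS; the function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the TPA data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha; rows are candidates, columns are rubrics."""
    k = len(scores[0])  # number of rubric items in the subscale
    if k < 2:
        # A single-rubric task (such as Task 4) has no between-item
        # covariance, so alpha is undefined.
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across candidates.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each candidate's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: five candidates rated on three rubrics.
example_scores = [
    [3, 3, 2],
    [4, 4, 3],
    [2, 2, 2],
    [3, 4, 3],
    [1, 2, 1],
]
print(round(cronbach_alpha(example_scores), 3))
```

The guard for fewer than two items mirrors the reason no coefficient is reported for Task 4.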

A principal component analysis (PCA) was conducted on the sixteen rubrics (rubric 2 reported two sets of scores) with orthogonal rotation (varimax). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = .835 (“great” according to Field, 2009), and all KMO values for individual items were above the acceptable limit of .5 (Field, 2009). Bartlett’s test of sphericity, χ²(120) = 465.068, p < .001, indicated that correlations between items were sufficiently large for PCA. In factor analysis, eigenvalues help researchers determine how many components to retain. An initial analysis obtained eigenvalues for each component.
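The Bartlett statistic reported above can be related to the correlation matrix with a short sketch. It assumes the usual formula χ² = −(n − 1 − (2p + 5)/6) × ln|R| with p(p − 1)/2 degrees of freedom, where n is the number of candidates and p the number of variables. The helper names and the 3 × 3 correlation matrix are hypothetical; the study’s own values came from SPSS.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, degrees_of_freedom) for a correlation matrix."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical 3 x 3 correlation matrix; not taken from the TPA data.
example_corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
chi_square, df = bartlett_sphericity(example_corr, n_obs=58)
print(f"chi-square = {chi_square:.3f}, df = {df}")

# Degrees of freedom for sixteen variables, matching the reported 120.
print(16 * 15 // 2)
```

An identity correlation matrix (no correlation between items) has a determinant of one and therefore a statistic of zero, which is why a large, significant value supports proceeding with PCA.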

Table 4.3 shows an abridged factor analysis output table displaying the total variance explained. The “Total” column gives the eigenvalues of the three factors of interest. The “% of Variance” column shows how much of the total variance each factor explains on its own. The “Cumulative %” column adds each consecutive factor to show the variance explained up to that point. Three components had eigenvalues greater than one. An eigenvalue of one indicates that a factor explains as much variability as a single original variable (Shaw, Crisp, & Johnson, 2012). The first three components were the most influential in terms of total variance explained.

Table 4.3

Abridged Factor Analysis

Component              Initial Eigenvalues
                       Total    % of Variance    Cumulative %
Task 3: Assessment     7.095    44.343           44.343
Task 1: Planning       1.697    10.607           54.950

Three components had eigenvalues over Kaiser’s criterion of one and in combination explained 62.99% of the variance. Figure 4.1 shows the scree plot for the analysis. The scree plot displayed inflexions that would justify retaining components one through four. Given the sample size, and the convergence of the scree plot and Kaiser’s criterion on three components, three components were retained in the final analysis.
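The link between the eigenvalues, the percentage columns of Table 4.3, and Kaiser’s criterion can be checked with a few lines of arithmetic. This sketch assumes the PCA was run on the correlation matrix of the sixteen rubric scores, so that total variance equals sixteen; the function names are hypothetical. The third component’s eigenvalue is not reported in the abridged table, so the value below is inferred from the reported 62.99% and should be read as an approximation.

```python
N_VARIABLES = 16  # sixteen rubric scores entered into the PCA

# Eigenvalues of the first two components as reported in Table 4.3.
reported_eigenvalues = [7.095, 1.697]


def percent_of_variance(eigenvalue, n_variables=N_VARIABLES):
    """Share of total variance for one component of a correlation-matrix PCA."""
    return 100.0 * eigenvalue / n_variables


def meets_kaiser_criterion(eigenvalue):
    """Kaiser's criterion retains components with eigenvalues above one."""
    return eigenvalue > 1.0


cumulative = 0.0
for eigenvalue in reported_eigenvalues:
    share = percent_of_variance(eigenvalue)
    cumulative += share
    print(f"{eigenvalue:.3f}  {share:6.2f}%  {cumulative:6.2f}%  "
          f"retain={meets_kaiser_criterion(eigenvalue)}")

# The text reports 62.99% for three components, so the third component
# accounts for the remainder; its eigenvalue is inferred, not reported.
third_share = 62.99 - 54.950
third_eigenvalue = third_share * N_VARIABLES / 100.0
print(f"{third_eigenvalue:.3f}  {third_share:6.2f}%  "
      f"retain={meets_kaiser_criterion(third_eigenvalue)}")
```

Because the published eigenvalues are rounded to three decimals, the recomputed percentages can differ from the table in the last digit.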

Figure 4.1

Scree Plot Analysis

Table 4.4 shows the related component matrix for factor loadings after rotation. The rotation allowed questions relating to each factor to be considered for commonalities. The items that cluster on the same components suggest the following inferences about the meaning of the factors:

• Factor 1 rubrics appear to measure teaching readiness around assessment and correlate with assessment of student voice and Task 4: professional reflection.

• Factor 2 rubrics appear to measure teaching readiness to plan the taught segment and correlate with planning for academic language.

• Factor 3 rubrics appear to measure teaching readiness around observed practice from the instructional video, including observed use of student voice and academic language.

Table 4.4

Rotated Component Matrix

Components, in column order: Assessment, Planning, Instruction. Each rubric’s loadings are listed in that order, and cells with no reported loading are omitted.

Loadings      Rubric
.769          Student Voice: Supporting Student Use of Resources to Learn and Monitor their own Progress
.723          Assessment: Using Assessment to Inform Instruction
.719          Assessment: Using Feedback to Guide Further Learning
.665  .438    Student Voice: Reflecting on Student-Voice Evidence to Improve Instruction
.630          Assessment: Analyzing Student Work
.519  .486    Planning: Planning Assessments to Monitor and Support Student Learning
.509          Analyzing Teaching Effectiveness
.841          Planning: Planning for [Content Specific] Understandings
.821          Planning: Using Knowledge of Students to Inform Teaching and Learning, A
.817          Planning: Using Knowledge of Students to Inform Teaching and Learning, B
.436  .621    Academic Language: Understanding Students’ Language Development and Associated Language Demands
.509  .415    Academic Language: Developing Students’ Academic Language and Deepening Content Learning
.853          Instruction: Engaging Students in Learning
.774          Instruction: Deepening Student Learning
.472  .534    Student Voice: Eliciting Student Understanding of Learning Targets
.521          Academic Language: Scaffolding Students’ Academic Language and Deepening Content Learning

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

Support for Validity.

The factor analyses indicated generally good coherence of the factors assessed by Task 1: Planning, Task 2: Instruction, Task 3: Assessment, and the embedded categories of academic language (AL) and student voice (SV).

Threats to Validity.

None identified.

In document An Argument-Based Validation Study of the Teacher Performance Assessment in Washington State (Page 138-143)