Approaches to Testing and Decision Making

CHAPTER II. LITERATURE REVIEW

2.2 Approaches to Testing and Decision Making

Understanding where VL-CCT based on the Classical Test Theory psychometric model fits into the larger context of testing requires clear nomenclature. The lack of a standardized nomenclature in the testing literature has been recognized (Thompson, 2007), and acronyms that share letters are used frequently (e.g. CAT, CBT, and CCT); consequently, distinguishing among different types of test approaches can be challenging. Table 1 presents a proposed extended version of the nomenclature suggested by Thompson (2007). Its purpose is twofold: (1) to clarify test nomenclature for use in the remainder of the proposal and (2) to provide a framework for understanding what falls under the scope of the literature review, VL-CCT, and what does not.

2.2.1 Testing Attributes, Options, and Abbreviations

Commonly used terms such as computer adaptive testing (CAT) and computer based testing (CBT) have been criticized (Thompson, 2007; Rudner, 2009) for their lack of precision, which has contributed to difficulties in distinguishing among different types of testing approaches and associated research. Every test has specific testing attributes with respect to length, deployment method, and goal. Each of these three attributes is discussed in turn below.

Table 1. Testing Attributes, Options, and Abbreviations Framework

Testing Attribute    Available Options         Abbreviation
Length               Fixed-Length              FL
                     Variable-Length           VL
Deployment Method    Computerized              C
                     Traditional               T
Goal                 Ability Classification    C
                     Ability Estimation        E

The length of a test can be fixed or variable. A variable-length test, frequently referred to as an adaptive test, is any test whose length varies according to a pre-established set of rules that typically define the conditions under which the test terminates (e.g. confidence in a classification decision or precision of an ability estimate) (Thompson, 2007). The goal of variable-length testing is to achieve the purpose of the test (e.g. a classification decision or point estimate of ability) more efficiently than fixed-length tests without compromising reliability or validity. With fixed-length tests, all examinees receive tests with the same number of items.
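As an illustration of a pre-established termination rule, the sketch below uses Wald's sequential probability ratio test (SPRT) with a simple binomial model. This technique is swapped in here for illustration only; the passage above does not prescribe it. The function name `sprt_classify`, the two assumed probabilities of a correct response (`p_nonmaster`, `p_master`), and the error rates `alpha` and `beta` are all hypothetical choices.

```python
import math


def sprt_classify(responses, p_nonmaster=0.6, p_master=0.8, alpha=0.05, beta=0.05):
    """Classify an examinee with a binomial SPRT, stopping at the first decision.

    responses: iterable of booleans (True = correct) in administration order.
    p_nonmaster / p_master: assumed chance of a correct answer for each group.
    Returns (decision, items_used).
    """
    upper = math.log((1 - beta) / alpha)   # at or above this bound: "master"
    lower = math.log(beta / (1 - alpha))   # at or below this bound: "nonmaster"
    log_ratio = 0.0
    used = 0
    for correct in responses:
        used += 1
        if correct:
            log_ratio += math.log(p_master / p_nonmaster)
        else:
            log_ratio += math.log((1 - p_master) / (1 - p_nonmaster))
        # The test ends as soon as either bound is crossed, so length varies.
        if log_ratio >= upper:
            return "master", used
        if log_ratio <= lower:
            return "nonmaster", used
    # Item supply exhausted before either bound was reached.
    return "undecided", used


print(sprt_classify([True] * 20))   # ('master', 11)
print(sprt_classify([False] * 20))  # ('nonmaster', 5)
```

Under these assumed parameters, two examinees offered the same 20 items stop after different numbers of items, which is the defining property of a variable-length test.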

Table 2. Example Tests Associated with Combinations of Testing Attributes

Length     Deployment Method   Goal                     Example Test
Fixed      Computerized        Ability Classification   Current test associated with Indiana University plagiarism tutorial
Fixed      Traditional         Ability Classification   Connecticut Mastery Test
Variable   Computerized        Ability Classification   National Council Licensure Examination – Registered Nurses (NCLEX-RN)
Variable   Traditional         Ability Classification   Many job interviews
Fixed      Computerized        Ability Estimation       Graduate Record Examination (GRE)
Fixed      Traditional         Ability Estimation       Trends in International Mathematics and Science Study (TIMSS)
Variable   Computerized        Ability Estimation       Graduate Record Examination (GRE)
Variable   Traditional         Ability Estimation       Binet IQ test (Binet & Simon, 1905)

Focusing on test length rather than on whether the test is adaptive raises one issue: a fixed-length test can also be adaptive. For example, a test that selects items from an item bank based on previous examinee responses but always presents the same number of items would be considered both fixed-length and adaptive.

Fixed branching approaches to adaptive testing (e.g. Linn et al., 1969) provide tests where every examinee responds to the same number of items, but the selection of items depends on examinee responses and the location of items in a pyramid or tree structure. Given that adapting a test is typically associated with increasing test efficiency (i.e. enabling tests to terminate once specific conditions are met), the negatives associated with extending the framework (e.g. adding an item selection attribute with two options: flexible and inflexible) are viewed as greater than the possible benefits that such an addition would provide.
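A fixed-length yet adaptive test of the pyramid kind can be sketched as follows. The layout is an assumption made for illustration: stage k holds k + 1 items ordered from easiest to hardest, a correct response leads to the harder neighbour in the next stage, and an incorrect response leads to the easier one. The function name `administer_pyramid` and the item labels are hypothetical.

```python
def administer_pyramid(pyramid, responses):
    """Walk a pyramid of items and return the items presented.

    pyramid: list of stages; stage k is a list of k + 1 item labels,
             ordered from easiest to hardest (assumed layout).
    responses: one boolean per stage (True = correct).
    """
    if len(responses) != len(pyramid):
        raise ValueError("one response is needed per stage")
    position = 0
    presented = []
    for stage, correct in zip(pyramid, responses):
        presented.append(stage[position])
        if correct:
            position += 1  # harder branch in the next stage
        # incorrect: the same index is the easier branch in the next stage
    return presented


pyramid = [["i1"], ["i2", "i3"], ["i4", "i5", "i6"]]
print(administer_pyramid(pyramid, [True, False, True]))    # ['i1', 'i3', 'i5']
print(administer_pyramid(pyramid, [False, False, False]))  # ['i1', 'i2', 'i4']
```

Both examinees answer three items, but the items differ after the first response, so the test is adaptive without being variable in length.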

Two primary methods for deploying a test are via computer resources or using traditional approaches (e.g. orally or using paper and pencil). In his effort to clarify nomenclature related to testing, Thompson (2007) suggests that variable-length testing requires the use of computing resources. However, Thompson’s suggestion could confuse rather than clarify, since examples of variable-length tests that have not used computers to vary test length do exist. Early research on variable-length testing by both Binet (1905) and Hutt (1947) deployed variable-length tests in which a human examiner took the place of the computer in selecting items to present to the examinee. The complexity of providing a variable-length test using traditional methods may make non-computerized deployment seem unreasonable. However, many job interviews and live performance tests are both variable in length and adaptive, with on-the-spot human judgment applied to decide what task or question the examinee should do next and when the test should end. Table 1 treats length and deployment method as two distinct test attributes to avoid confusion and to leave room for the possibility of a variable-length test deployed via traditional methods.

The two primary goals of testing are classification and estimation of examinee ability (Rudner, 2009). Recall that classification decisions place an examinee into one of two or more mutually exclusive groups (e.g. master or nonmaster; basic, proficient, or advanced ability) based on an "absolute standard of quality" (Glaser, 1963, p. 519). A point estimate of ability, on the other hand, is based "upon a relative standard" (Glaser, 1963, p. 519) and typically takes the form of a numerical score on a continuous scale (e.g. a score of 600 on the verbal component of the GRE).
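The contrast between the two goals can be sketched in a few lines. The cut scores, group labels, norm-group scores, and function names below are hypothetical, and expressing the relative standard as a z-score is only one simple way to report a score relative to a reference group, not a method taken from the sources cited above.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def classify(score, cut_scores, labels):
    """Absolute standard: place a score into one of the ordered groups.

    cut_scores must be sorted ascending and labels must have one more
    entry than cut_scores. A score equal to a cut score meets that cut.
    """
    return labels[bisect_right(cut_scores, score)]


def standard_score(score, norm_scores):
    """Relative standard: express a score as a z-score within a norm group."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


labels = ["basic", "proficient", "advanced"]
print(classify(72, [60, 80], labels))             # proficient
print(standard_score(72, [50, 60, 70, 80, 90]))   # about 0.141
```

The first function returns a category that depends only on fixed cut scores, while the second returns a continuous value whose meaning depends on how the reference group performed.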

The value of the framework presented in Table 1 can be illustrated by how it can be used to clear up confusion over the use of the term Computer Adaptive Testing. Welch (1997) asserts “Since adaptive tests are usually mastery-type tests, they are criterion-referenced as opposed to norm-referenced” (p. 9). Parshall and colleagues (2002), on the other hand, contrast computerized adaptive testing (CAT) with computerized classification testing (CCT) by indicating that the former is focused on determining a point estimate of ability whereas the latter classifies examinee ability into two or more categories. Thompson (2007) supports Parshall’s perspective when he associates CAT with point estimates of ability that are typically applied in norm-referenced testing.

Using the term computerized adaptive testing to refer to tests whose goal is a point estimate of examinee ability is common practice (e.g. Chang & Lu, 2010) but may cause confusion, since the term itself makes no mention of estimation of ability. A computerized adaptive test with classification of ability as the goal can also be reasonably viewed as a CAT. In contrast, the Table 1 abbreviations clearly delineate between tests with different goals: VL-CCT approaches have the goal of classification and can easily be distinguished from VL-CET approaches, whose goal is estimation of ability.
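The way the Table 1 abbreviations combine can be sketched as a small lookup. Only VL-CCT and VL-CET appear in the text; the pattern used here (length, hyphen, deployment letter, goal letter, then a final "T" for testing) and the remaining six abbreviations it generates are an extrapolation, and the names `abbreviation` and `ALL_COMBINATIONS` are hypothetical.

```python
from itertools import product

# Options and abbreviations as listed in Table 1.
LENGTH = {"fixed": "FL", "variable": "VL"}
DEPLOYMENT = {"computerized": "C", "traditional": "T"}
GOAL = {"classification": "C", "estimation": "E"}


def abbreviation(length, deployment, goal):
    """Compose an abbreviation such as 'VL-CCT' from one option per attribute.

    The trailing 'T' stands for 'testing'. The pattern is assumed from the
    two abbreviations that appear in the text (VL-CCT and VL-CET).
    """
    return f"{LENGTH[length]}-{DEPLOYMENT[deployment]}{GOAL[goal]}T"


# Two options for each of three attributes give eight combinations.
ALL_COMBINATIONS = [abbreviation(*combo) for combo in product(LENGTH, DEPLOYMENT, GOAL)]

print(abbreviation("variable", "computerized", "classification"))  # VL-CCT
print(abbreviation("variable", "computerized", "estimation"))      # VL-CET
print(len(ALL_COMBINATIONS))                                       # 8
```

Because the goal letter is always present, two tests that share length and deployment method but differ in goal can never receive the same abbreviation.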

2.2.2 VL-CCT Focus

The focus of the literature review and the subsequent two studies is on variable-length computerized classification testing (VL-CCT) approaches because they enable the efficient classification decisions about learner knowledge that educators frequently must make. A focus on VL-CCT removes all but one of the eight combinations of length, deployment method, and goal options presented in Table 2 from the scope of this review. However, design choices among the various components used to construct tests that apply VL-CCT approaches provide an additional opportunity for narrowing the focus, particularly with respect to the psychometric model and item bank.
