Models for Items with More Than Two Response Categories

A.2 Models for Items with More Than Two Response Categories

Polytomous data come from responses to items with more than two response categories. These include multiple-choice items, open-ended mathematics questions, Likert-type items, ordinal items, rating-scale responses, and graded responses to test or survey questions. The type of IRT model used to describe the interaction between respondents and test items depends on the nature of the data that have been collected. Several models have been used for polytomous data, such as the partial credit model, the generalized partial credit model, the nominal response model, and the graded response model. The characteristics of these models are described in van der Linden and Hambleton (1997). We focus on the graded response model (grm) proposed by Samejima (1969) because it is the model illustrated in the application.

A.2.1 The graded response model (grm)

The grm is appropriate for items whose response categories are ordered. It describes the probability of obtaining or selecting a score equal to $k$ or higher. The response options include rating-scale or Likert-type categories such as strongly disagree, disagree, neutral, agree, and strongly agree. For simplicity, we assume that all items have the same number $K$ of unique categories.

In the graded response model, a test or survey item is assumed to have more than two categories that depend on one another, such that the successful accomplishment of one step requires the successful accomplishment of the previous steps (Reckase, 2008). The probability of accomplishing $k$ or more steps is called the cumulative probability and is assumed to increase monotonically with the hypothetical construct underlying the test, $\theta$. This probability is typically represented by a normal ogive or a logistic model.

Let $P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i)$ be the probability of receiving a score $k$. Let $P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i)$ be the cumulative probability for $k$ or more steps (the cumulative probability of scoring in or above category $k$). Then $P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i)$ is the difference between two cumulative probabilities, for $k$ or more steps and for $k + 1$ or more steps:

$$P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) = P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i) - P(Y_{ijk} \ge k + 1 \mid \theta_j, b_{ik}, a_i) \tag{A.11}$$

where $k = 1, \ldots, K$ indexes the steps in the $K$ categories; $P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i)$ is the cumulative probability of scoring in or above category $k$ of item $i$ given $\theta_j$ and the item parameters; $a_i$, as before, is the item slope; and $b_{ik}$ is the category boundary or threshold for category $k$ of item $i$.

The normal ogive form of the grm model is given by

$$P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) = \frac{1}{\sqrt{2\pi}} \int_{a_i(\theta_j - b_{i,k+1})}^{a_i(\theta_j - b_{i,k})} e^{-t^2/2} \, dt \tag{A.12}$$

The logistic form is given by

$$P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) = P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i) - P(Y_{ijk} \ge k + 1 \mid \theta_j, b_{ik}, a_i) = \frac{1}{1 + e^{-a_i(\theta_j - b_{i,k})}} - \frac{1}{1 + e^{-a_i(\theta_j - b_{i,k+1})}} \tag{A.13}$$

The two forms are nearly equivalent when the discriminations of the logistic model are multiplied by the scaling constant 1.7. In general, the logistic model is more popular because many people find it simpler in form and easier to understand than the normal ogive.
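The claim about the constant 1.7 can be checked numerically. The sketch below is only an illustration: the function names and the grid of values for $z = a_i(\theta_j - b_{ik})$ are assumptions made here, not part of the original text. It compares a single cumulative curve in normal ogive form with the logistic form whose argument is multiplied by 1.7.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (the normal ogive), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    """Logistic cumulative curve 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# z stands for a_i * (theta_j - b_ik); the grid from -6 to 6 is an assumption.
grid = [i / 100.0 for i in range(-600, 601)]

# Largest absolute gap between the normal ogive and the rescaled logistic.
max_gap = max(abs(normal_cdf(z) - logistic(1.7 * z)) for z in grid)
print(f"largest absolute gap on the grid: {max_gap:.4f}")
```

On this grid the two curves stay within one percentage point of each other, so the forms are close but not identical.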

As previously discussed, an IRT model derives the probability of a response to a particular item in a survey or test as a function of the latent trait $\theta$ and the item parameters. We are interested in the probability of responding in a specific category. In the graded response model, the cumulative probabilities, that is, the probabilities of responding in or above a given category, are modeled directly. The probability of responding in a specific category is then modeled as the difference between two adjacent cumulative probabilities. Let $K$ denote the number of response categories of item $i$. For simplicity, we assume that all items have the same number $K$ of unique categories. Then there are $K - 1$ thresholds between the response options. The cumulative probabilities have the mathematical representation given in equation A.14

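$$\begin{aligned}
P(Y_{ijk} \ge 1 \mid \theta_j, b_{i1}, a_i) &= 1 \\
P(Y_{ijk} \ge 2 \mid \theta_j, b_{i2}, a_i) &= \frac{1}{1 + e^{-a_i(\theta_j - b_{i2})}} \\
P(Y_{ijk} \ge 3 \mid \theta_j, b_{i3}, a_i) &= \frac{1}{1 + e^{-a_i(\theta_j - b_{i3})}} \\
&\;\;\vdots \\
P(Y_{ijk} \ge K + 1 \mid \theta_j, b_{ik}, a_i) &= 0
\end{aligned} \tag{A.14}$$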

and these cumulative probabilities lead to the graded response model, in which the probability of a response $Y_{ijk} = k$ is

$$\begin{aligned}
P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) &= P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i) - P(Y_{ijk} \ge k + 1 \mid \theta_j, b_{ik}, a_i) \\
&= \frac{1}{1 + e^{-a_i(\theta_j - b_{i,k})}} - \frac{1}{1 + e^{-a_i(\theta_j - b_{i,k+1})}}
\end{aligned} \tag{A.15}$$

where $k = 1, \ldots, K$ and $P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i)$ is the cumulative probability of scoring in or above the category boundary or threshold for category $k$ of item $i$. The cumulative functions for the middle categories look very much like the 2PL model, except that there are multiple $b_{ik}$ parameters.
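The step from cumulative probabilities to category probabilities can be sketched in a few lines of code. This is a minimal illustration, not the implementation used in the application: the function names are made up here, the thresholds are those reported for the item in Figure A.5, and the slope `a = 1.2` is a hypothetical value because the text does not report one.

```python
import math

def cumulative_probs(theta, a, thresholds):
    """Return P(Y >= k) for k = 1, ..., K + 1, following (A.14)."""
    middle = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Boundary cases: every response is in or above category 1,
    # and no response is above category K.
    return [1.0] + middle + [0.0]

def category_probs(theta, a, thresholds):
    """Return P(Y = k) for k = 1, ..., K as adjacent differences (A.15)."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

thresholds = [-3.06, -0.91, 1.54]  # K - 1 = 3 thresholds, as in Figure A.5
a = 1.2                            # hypothetical slope, not from the source

probs = category_probs(0.0, a, thresholds)
for k, p in enumerate(probs, start=1):
    print(f"P(Y = {k} | theta = 0) = {p:.3f}")
```

Because the differences telescope from 1 down to 0, the category probabilities sum to 1 at every value of $\theta$, provided the thresholds are in increasing order.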

Thus, equation A.15 is the form of the grm. The plots of the boundary probabilities, $P(Y_{ijk} \ge k \mid \theta_j, b_{ik}, a_i)$, and the probabilities of responding in a specific category of an item, $P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i)$, are displayed in Figure A.5. They are referred to as the item operating characteristic curves (OCC) and the item category characteristic curves (ICC), respectively. Each OCC has the same form as the two-parameter logistic model for dichotomous items. The top curves (OCC) specify the probability of a response in the categories above each threshold rather than below it. The bottom curves (ICC) show the probability of each of the score categories 1, 2, 3, and 4 for a person at a specific $\theta$ level. The OCCs cross the .5 probability at the point equal to the step difficulty (threshold), and their slopes are steepest at that point. Although the two ICC curves (bottom) for the lowest and highest categories (1 and 4) cross the .5 probability line (horizontal dotted blue line) at the item thresholds $b_1 = -3.06$ and $b_3 = 1.54$, the curves for the middle categories (2 and 3) do not necessarily correspond to the item thresholds. The peaks of the curves do not have any obvious connection to the $b_{ik}$ parameters. We can identify from the ICC curves which categories are less likely to be chosen.
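The .5 crossings described above can be verified directly. The following sketch uses the three thresholds reported for Figure A.5; the slope `a = 1.2` and all function names are assumptions introduced for illustration only.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(theta, a, thresholds):
    """ICC values P(Y = k), k = 1, ..., K, as differences of adjacent OCCs."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

thresholds = [-3.06, -0.91, 1.54]  # thresholds reported for Figure A.5
a = 1.2                            # hypothetical slope, not from the source

# Each OCC evaluated at its own threshold.
occ_at_threshold = [logistic(a * (b - b)) for b in thresholds]

# ICC of the lowest category at the first threshold,
# and of the highest category at the last threshold.
lowest_at_b1 = category_probs(thresholds[0], a, thresholds)[0]
highest_at_b3 = category_probs(thresholds[-1], a, thresholds)[-1]

# ICCs of the middle categories (2 and 3) at the thresholds that bound them.
middle_at_thresholds = [
    category_probs(thresholds[0], a, thresholds)[1],  # category 2 at b1
    category_probs(thresholds[1], a, thresholds)[1],  # category 2 at b2
    category_probs(thresholds[1], a, thresholds)[2],  # category 3 at b2
    category_probs(thresholds[2], a, thresholds)[2],  # category 3 at b3
]

print("OCC at own threshold:", occ_at_threshold)
print("category 1 at b1:", lowest_at_b1, " category 4 at b3:", highest_at_b3)
print("middle categories at their thresholds:", middle_at_thresholds)
```

The extreme categories hit .5 exactly at the outer thresholds because each is a single cumulative curve or its complement. A middle category is a difference of two cumulative curves, so under this assumed slope its value at a threshold falls below .5.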

For polytomous items, the questions to ask about the $b_{ik}$-parameters and the $a_i$-parameters are: “What is the spread of the category difficulties?” and “How discriminating is each item?” If the $b_{ik}$-parameters of an item are spread out, the item can measure across a wider range of $\theta$. If the locations are close together or span a narrow area, the item may not differentiate well among respondents across the full range of $\theta$. Also, items with low discrimination have very flat ICC curves. An example of an item with a low $a_i$ is shown in Figure A.6. The $a_i$-parameter, as before, indicates how steep the slope is, or how rapidly the response probability changes as attitude increases. The $b_{ik}$-parameters are the category thresholds at which respondents have a 50% chance of choosing a designated option or higher.
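To make the roles of the two parameters concrete, the sketch below evaluates one logistic cumulative curve at its threshold and estimates its slope there by a central difference. For the logistic form the derivative is $a_i P (1 - P)$, which equals $a_i / 4$ where $P = .5$. The threshold is the middle one reported for Figure A.5; the three slope values and the function names are hypothetical choices made here.

```python
import math

def occ(theta, a, b):
    """Logistic cumulative curve P(Y >= k) with slope a and threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope_at(theta, a, b, h=1e-5):
    """Central-difference estimate of d/d(theta) of the cumulative curve."""
    return (occ(theta + h, a, b) - occ(theta - h, a, b)) / (2.0 * h)

b = -0.91  # middle threshold reported for Figure A.5

# Hypothetical low, moderate, and high discriminations.
slopes = {a: slope_at(b, a, b) for a in (0.5, 1.0, 2.0)}

for a, s in slopes.items():
    print(f"a = {a}: P = {occ(b, a, b):.2f} at theta = b, slope = {s:.4f}")
```

Whatever the value of $a_i$, the curve passes through .5 at $\theta = b_{ik}$. Only the steepness changes, which is why items with low discrimination produce flat curves.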

Figure A.5 OCC and ICC curves for an item with four response categories. The OCC curves (top) represent the cumulative probability functions crossing the .5 probability at the step difficulty parameters (thresholds) $b_{ik} = -3.06$, $-0.91$, and $1.54$ (see the light blue vertical lines). Each of the four ICC curves (bottom) represents the probability for one response category. The two curves for the lowest and highest categories (1 and 4) cross the .5 probability line (horizontal dotted blue line) at the item thresholds $b_1 = -3.06$ and $b_3 = 1.54$. However, the curves for the middle categories (2 and 3) do not intersect at the item thresholds.

The expected score on a test or item in a grm, as for dichotomous items, is the sum of the products of each item score and its probability, and is expressed in equation A.16.

$$E_i(\text{Item Score}) = E(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) = \sum_{k=1}^{K} k \, P(Y_{ijk} = k \mid \theta_j, b_{ik}, a_i) \tag{A.16}$$
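Equation A.16 translates directly into code. The sketch below is illustrative only: it reuses the thresholds reported for Figure A.5, assumes a hypothetical slope `a = 1.2`, and uses function names and a set of $\theta$ values chosen here for the example.

```python
import math

def category_probs(theta, a, thresholds):
    """P(Y = k) for k = 1, ..., K under the logistic grm (A.15)."""
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def expected_score(theta, a, thresholds):
    """Expected item score: sum over k of k * P(Y = k), as in (A.16)."""
    probs = category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs, start=1))

thresholds = [-3.06, -0.91, 1.54]  # thresholds reported for Figure A.5
a = 1.2                            # hypothetical slope, not from the source

thetas = [-6.0, -3.0, 0.0, 3.0, 6.0]
scores = [expected_score(t, a, thresholds) for t in thetas]
for t, s in zip(thetas, scores):
    print(f"theta = {t:+.1f}: expected score = {s:.3f}")
```

With categories scored 1 to 4, the expected score rises from just above 1 at very low $\theta$ toward 4 at very high $\theta$.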

To recap, this section provides an overview of the characteristics of some common IRT models for dichotomous and polytomous data. Along with the item parameters, there are several statistics, unique to IRT, that describe the functioning of items and tests. The next section presents a summary of other descriptive statistics for items and for tests or instruments.

Figure A.6 Plot of Operating Characteristics Curves of a polytomous item. The slopes of the curves are fairly flat except for the last curve, indicating a low discriminating power.

In document Applying item response theory modeling in educational research (Page 162-167)