
6.3 Results and Discussion: AMS Version 1 CTT study

6.3.3 Discrimination and point biserial coefficient

The discrimination and point biserial coefficient values were calculated for the Version 1 AMS responses marked by the UHM, and the results are given in Table 6.2 and Figure 6.7 below.

Question  Question Type  Discrimination  Point Biserial Coefficient
Q1        FRQ            0.50            0.51
Q2        FRQ            0.46            0.42
Q3        FRQ            0.06*           0.22
Q4        FRQ            0.39            0.37
Q5        FRQ            0.67            0.56
Q6        MRQ            0.75            0.64
Q7        MRQ            0.78            0.65
Q8        MCQ            0.26*           0.42
Q9        MCQ            0.47            0.55
Q10       MCQ            0.51            0.49
Q11       FRQ            0.53            0.50
Q12       MCQ            0.50            0.56
Q13       FRQ            0.76            0.66
Q14       MCQ            0.38            0.47
Q15       MCQ            0.81            0.78
Q16       MCQ            0.65            0.63
Q17       FRQ            0.49            0.45
Q18       MCQ            0.43            0.43
Q19       FRQ            0.58            0.58
Q20       FRQ            0.39            0.46
Q21       MRQ            0.81            0.72
Q22       FRQ            0.58            0.56
Q23       FRQ            0.29*           0.36
Q24       MCQ            0.58            0.44
Q25       FRQ            0.42            0.45
Q26       MCQ            0.67            0.62
Q27       FRQ            0.42            0.50
Q28       MCQ            0.78            0.65
Q29       FRQ            0.24*           0.24
Q30       FRQ            0.32            0.34
Q31       FRQ            0.74            0.62
Q32       FRQ            0.40            0.43
Q33       MRQ            0.82            0.74

Table 6.2: Discrimination and point biserial coefficient of each question on Version 1 of the AMS. Asterisks mark discrimination values below the acceptable lower bound of 0.3.

Figure 6.7: Graph showing the point biserial coefficient and the discrimination of each question on Version 1 of the AMS. The red horizontal line represents the lower bound of the acceptable values for point biserial coefficient, and the green horizontal line represents the mean value of the point biserial coefficient. The red vertical line represents the lower bound of the acceptable values for discrimination, and the blue vertical line represents the mean value of the discrimination.

The acceptable range of values for discrimination is [0.3, 1], and the acceptable range for the point biserial coefficient is [0.2, 1]. As shown in Table 6.2 and Figure 6.7, four questions (Q3, Q8, Q23 and Q29) had discrimination values below the acceptable range; of these, two (Q3 and Q29) also had point biserial coefficients at the lower end of the acceptable range. In contrast, three questions (Q15, Q21 and Q33) had values for both discrimination and point biserial coefficient at the higher end of the acceptable range. These cases are discussed below.
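The two statistics and the thresholds quoted above can be illustrated with a minimal Python sketch. It assumes dichotomous (0/1) item scores, an upper-lower discrimination index based on the top and bottom 27% of total scores, and an uncorrected point biserial in which the item is included in the total. These are common CTT conventions rather than details confirmed in this section, and the function names and toy data are hypothetical.

```python
import numpy as np

DISC_MIN = 0.3  # lower bound of the acceptable discrimination range
PBC_MIN = 0.2   # lower bound of the acceptable point biserial range

def discrimination_index(scores, item, frac=0.27):
    """Proportion correct in the top group minus that in the bottom group.

    `frac` (the share of test-takers in each group) is an assumed convention.
    """
    totals = scores.sum(axis=1)
    order = np.argsort(totals, kind="stable")  # lowest total first
    k = max(1, int(round(frac * len(totals))))
    lower, upper = order[:k], order[-k:]
    return scores[upper, item].mean() - scores[lower, item].mean()

def point_biserial(scores, item):
    """Pearson correlation between a 0/1 item score and the total score."""
    totals = scores.sum(axis=1)
    return np.corrcoef(scores[:, item], totals)[0, 1]

# Toy data: 4 test-takers (rows) by 3 items (columns), not AMS responses.
toy = np.array([[1, 1, 1],
                [1, 1, 0],
                [1, 0, 0],
                [0, 0, 0]])
d0 = discrimination_index(toy, item=0)
r0 = point_biserial(toy, item=0)
print(d0 >= DISC_MIN, r0 >= PBC_MIN)
```

An item would be flagged for review when either comparison is False. A corrected point biserial, which removes the item from the total before correlating, is a frequently used alternative and would give slightly lower values.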

Cases where the discrimination and/or point biserial coefficient were low

Q3 had a discrimination value that was outside the acceptable range, whereas its point biserial coefficient was at the lower end of the acceptable range. Q3 had previously been identified as problematic because its difficulty values were also outside the acceptable range. The item had almost no discriminatory power, which is consistent with the fact that almost all test-takers, regardless of ability, got this question right. Additionally, the item had lower alignment with the rest of the test, which could be a consequence of Q3 being a new question added to the already well-established FCI. Taken together, the issues raised by the difficulty, discrimination and point biserial coefficient of Q3 implied that the question required re-wording or removal.

Q8 had an acceptable value for the point biserial coefficient, but its discrimination value was outside the acceptable range. Q8 was previously found to be too easy in the difficulty aspect of this analysis, which is consistent with its low discriminatory power, since test-takers of all abilities were able to get this question right.

Q23 had acceptable values for both point biserial coefficient and difficulty, so it was not a problematic item in these respects. In contrast, its discrimination value was slightly below the acceptable range. Q23 was the question at which 20 participants gave up their attempt on Version 1 of the AMS; it was previously postulated in Subsection 6.3.1 that test-takers with less previous exposure to physics gave up after answering this question, which would suggest that Q23 should instead have high discriminatory power. However, the discrimination statistic is calculated using complete AMS attempts only, so the scores of the 20 students who gave up were not taken into account in the calculation of the discrimination value of Q23. Since these lower-ability students were excluded, the discrimination value for Q23 would be expected to be lower, because there was a smaller range of abilities to discriminate between from the outset.
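The range-restriction argument can be illustrated with a small simulation. This is a sketch on synthetic data only: it assumes a simple logistic (Rasch-like) response model, a 27% upper-lower discrimination index, and that the excluded test-takers are the lowest 20% by ability. None of these choices comes from the AMS data, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 20000, 30

# Synthetic abilities and item difficulties (assumed, not estimated from the AMS).
ability = rng.normal(0.0, 1.0, n_students)
difficulty = np.linspace(-1.5, 1.5, n_items)

# Probability of a correct response under a logistic model, then 0/1 scores.
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
scores = (rng.random((n_students, n_items)) < p_correct).astype(int)

def upper_lower_discrimination(scores, item, frac=0.27):
    """Proportion correct in the top group minus that in the bottom group."""
    totals = scores.sum(axis=1)
    order = np.argsort(totals, kind="stable")  # lowest total first
    k = int(round(frac * len(totals)))
    return scores[order[-k:], item].mean() - scores[order[:k], item].mean()

item = n_items // 2  # an item of roughly medium difficulty

# Discrimination with every simulated test-taker included.
d_full = upper_lower_discrimination(scores, item)

# Discrimination after dropping the lowest 20% by ability,
# mimicking the exclusion of incomplete attempts.
keep = ability > np.quantile(ability, 0.20)
d_restricted = upper_lower_discrimination(scores[keep], item)

print(f"full sample: {d_full:.2f}, restricted sample: {d_restricted:.2f}")
```

Under these assumptions the restricted sample yields the smaller discrimination value, because the bottom group is drawn from a narrower and more able set of test-takers. The size of the effect in the real data would depend on who actually dropped out.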

Q29 had a lower (but still acceptable) point biserial coefficient value, but its discrimination value was below the acceptable range. It is a free-response question that asks the test-taker to identify what happens to the speed of a box when the force being exerted on it is doubled. Q29 is not a new question added to the AMS, since it was adapted from Q26 of the original FCI; it follows that Q29 would be expected to align well with the rest of the AMS. The correct answer is that the speed increases but does not double, since force and velocity are not related in this way. Answers stating that the speed doubles therefore need to be marked as incorrect. Q29 did not have any issues with difficulty, as its difficulty values were within the acceptable range. The wording of Q29 was also unlikely to be an issue, since it was adapted from its original FCI counterpart, which had already been tested and validated. The discrimination issues with Q29 may therefore arise from a factor which cannot be measured quantitatively with CTT statistics. For example, the context of the situation presented in the question may be difficult for students from particular demographic groups to interpret.

Cases where the discrimination and/or point biserial coefficient were high

Q15 had the highest value for the point biserial coefficient of all the questions, as well as a high discrimination value, positioning it in the top right-hand corner of Figure 6.7. Q15 was adapted from Q13 of the original FCI. It is a multiple-choice question that asks the test-taker to identify what forces act on a ball after it has been thrown upwards. The effective performance of Q15 is in stark contrast to the ineffective performance of Q3, which is similar in conceptual content. One important difference between Q15 and Q3 is that Q15 is multiple-choice (and very similar to Q13 in the original FCI), whereas Q3 is free-response, and this may have contributed to the differences in the CTT statistics between the two questions. However, the situation in Q15 involves the ball being tossed upwards, whereas the situation in Q3 involves the ball being dropped; this makes Q15 more conceptually demanding than Q3, as it requires test-takers to do more than simply identify the name of a force. This offered a possible explanation for the difference in discriminatory power between the two items, since only the higher-performing students would be expected to answer conceptually more demanding questions well. In addition, the fact that Q15 was adapted from the FCI offers a logical explanation for the difference in alignment with the rest of the test between the two questions, because Q3 was not originally an FCI question.

Q33 had the highest value for discrimination of all the questions, in addition to a high point biserial coefficient value. Q33 was adapted from Q30 of the original FCI. It is a multiple-response question that asks the test-taker to identify what forces are acting on a tennis ball while it is in flight. Q33 is different from the other questions on the AMS because it asks the test-taker to take into account forces due to the air, whereas air resistance is supposed to be ignored for the other questions. Inspection of the responses given to this question suggested that the high discrimination reflected the highest-scoring students being those who realized that there is no residual force acting on the tennis ball from being hit. It is also possible that, since this was the last question on the AMS, less keen students stopped paying attention and gave incorrect answers as a result, contributing to the observed effect. The high point biserial coefficient could be a reflection of the content of the question being highly consistent with the rest of the AMS.

Q21 was the third point in the top right-hand corner of Figure 6.7, showing that it had high values for both the discrimination and the point biserial coefficient. Q21 is a multiple-response question adapted from Q18 of the FCI. The question asks the test-taker to identify the forces acting on a boy on a swing while he is in motion. A few features of this question could explain its high discriminatory power. First, as this is a multiple-response question, it requires students to identify more than one force in order to give the correct answer, which is a more conceptually demanding task that only the higher-scoring students may be expected to complete. Second, the question features a diagram, which may help medium-level students to visualize the situation and scaffold them towards the correct answer (Dawkins et al., 2017). However, the diagram may not offer much assistance to higher- or lower-achieving students: higher-achieving students do not require the scaffolding in order to provide the correct answer, whereas lower-achieving students are not able to use the provided scaffolding to reach the correct answer. As was the case for Q15 and Q33, the high value for the point biserial coefficient might be expected since Q21 was adapted from an FCI question.

Summary

Overall, 29 questions on the AMS had discrimination values that were within the acceptable range. This meant that, at the question level, 29 out of the 33 questions could differentiate between higher- and lower-performing students. At the test level, the mean discrimination of the individual questions was 0.53, which was also within the acceptable range. Together with the question-level findings, this implied that the overall AMS was capable of distinguishing between higher-performing and lower-performing students.

For the point biserial coefficient, all 33 questions had values that were within the acceptable range; practically, this means that at the question level, every question on the AMS aligned to test similar concepts. At the test level, the mean point biserial coefficient of the individual questions was 0.52, which was also within the acceptable range. Together with the question-level findings, this implied that the overall AMS contained questions that assessed similar topics. These results, taken together with those for the difficulty and discrimination statistics, provided important evidence for the overall functionality of the AMS questions.
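The summary figures can be cross-checked directly against the values in Table 6.2. The short Python sketch below recomputes the two means and lists the questions falling below each lower bound; the variable names are illustrative, and the values are the rounded ones printed in the table rather than the underlying response data.

```python
# Discrimination and point biserial values for Q1 to Q33, copied from Table 6.2.
disc = [0.50, 0.46, 0.06, 0.39, 0.67, 0.75, 0.78, 0.26, 0.47, 0.51, 0.53,
        0.50, 0.76, 0.38, 0.81, 0.65, 0.49, 0.43, 0.58, 0.39, 0.81, 0.58,
        0.29, 0.58, 0.42, 0.67, 0.42, 0.78, 0.24, 0.32, 0.74, 0.40, 0.82]
pbc = [0.51, 0.42, 0.22, 0.37, 0.56, 0.64, 0.65, 0.42, 0.55, 0.49, 0.50,
       0.56, 0.66, 0.47, 0.78, 0.63, 0.45, 0.43, 0.58, 0.46, 0.72, 0.56,
       0.36, 0.44, 0.45, 0.62, 0.50, 0.65, 0.24, 0.34, 0.62, 0.43, 0.74]

# Test-level statistics: mean of the question-level values.
mean_disc = sum(disc) / len(disc)
mean_pbc = sum(pbc) / len(pbc)

# Questions below the lower bounds of the acceptable ranges (0.3 and 0.2).
low_disc = [f"Q{i + 1}" for i, d in enumerate(disc) if d < 0.3]
low_pbc = [f"Q{i + 1}" for i, p in enumerate(pbc) if p < 0.2]

print(round(mean_disc, 2), round(mean_pbc, 2))  # 0.53 0.52
print(low_disc)                                 # ['Q3', 'Q8', 'Q23', 'Q29']
print(low_pbc)                                  # []
```

The recomputed values agree with the text: four questions fall below the discrimination bound, none falls below the point biserial bound, and both means lie within their acceptable ranges.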

In document Establishing Physics Concept Inventories Using Free-Response Questions (Page 122-127)