Chapter V. Test Scoring (2020)

(1)

(2)

This chapter covers concepts related to how tests are scored and interpreted: the importance and functions of tests in education, analyzing test data, and interpreting and using test results. The discussion of the analysis process shows that descriptive statistics are of great help to the teacher in summarizing student scores in a comprehensible way. This learning activity outlines different methods of interpreting test results. The unit ends with a brief discussion of the different ways in which different stakeholders in the education process use test results.

(3)

Scoring of Tests

The following guidelines are suggested for scoring tests:

• Remember that multiple-choice tests are difficult to design and difficult to administer, especially in a large class, but easy to score. They are easy to score because each item usually has one correct answer, which must be accepted across the board.

• Essay tests are relatively easy to set and administer, especially in a large class. They are, however, difficult to mark or assess, because essay questions require students to write many sentences and paragraphs, all of which the examiner must read.

(4)

Scoring of Tests

Cont.

• Scoring or marking on impression is dangerous. Some students are very good at impressing examiners with flowery language that lacks real academic substance. If you mark on impression, you may be carried away by the language and overlook the relevant content.

• Scoring can be done question by question or one script (all questions) at a time. The best way is to score or mark one question across the board for all students before moving on to the next. However, this may be tedious and impractical, especially in a large class.
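The Python sketch below shows why multiple-choice tests are easy to score. It marks one question across the board for all students before moving on to the next. The answer key, student names and responses are hypothetical and serve only as illustration.

```python
# Hypothetical answer key for a 4-item multiple-choice test.
ANSWER_KEY = ["C", "A", "D", "B"]

# Hypothetical responses: the option each student chose, item by item.
responses = {
    "Student 1": ["C", "A", "D", "B"],
    "Student 2": ["C", "B", "D", "A"],
    "Student 3": ["A", "A", "C", "B"],
}


def score_question_by_question(key, responses):
    """Score item 1 for every student, then item 2, and so on."""
    totals = {name: 0 for name in responses}
    for item, correct_option in enumerate(key):
        for name, answers in responses.items():
            # Each item has one correct answer, accepted across the board.
            if answers[item] == correct_option:
                totals[name] += 1
    return totals


if __name__ == "__main__":
    for name, total in score_question_by_question(ANSWER_KEY, responses).items():
        print(f"{name}: {total}/{len(ANSWER_KEY)}")
```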

(5)

USING TEST RESULTS

As mentioned earlier, conducting tests is not an end in itself. However, before test results can be used for those purposes, the teacher needs to know how well the test is designed in terms of difficulty level and discrimination power. The teacher should then be able to compare a child's performance with that of his peers in the class. Occasionally, he may also wish to compare the child's performance in one subject area with that in another.

To do this, he carries out the following activities at various times:

1. Item analysis.

2. Drawing of frequency distribution tables.

3. Finding measures of central tendency.

4. Finding measures of variability.

5. Computing derived scores.

(6)

Item analysis is the process of examining class-wide performance on individual test items. It helps the teacher decide whether a test is good or poor and, in essence, is used for reviewing and refining a test. There are three common types of item analysis, each providing teachers with a different type of information:

• Item difficulty gives information about the difficulty level of a question.

• Item discrimination indicates how well each question distinguishes (discriminates) between bright and dull students.

• Analysis of response options: In addition to examining the performance of an entire test item, teachers are often interested in the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and which are simply taking up space because few students choose them. A working distractor appeals to students who do not know the correct answer. Blind guessing yields a correct answer purely by chance and hurts the validity of a test item. To reduce it, teachers want as many plausible distractors as is feasible. Analyses of response options allow teachers to fine-tune and improve items they may wish to use again with future classes.
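The analysis of response options can be sketched in a few lines of Python. The sketch computes the proportion of students choosing each option and lists the incorrect options nobody chose. The ten responses and the key "C" are hypothetical, and the helper names are illustrative, not from any grading package.

```python
from collections import Counter


def option_proportions(choices, options=("A", "B", "C", "D")):
    """Proportion of students who chose each answer option."""
    counts = Counter(choices)
    return {opt: counts[opt] / len(choices) for opt in options}


def idle_distractors(proportions, correct):
    """Incorrect options that no student chose (distractors not 'working')."""
    return [opt for opt, p in proportions.items() if opt != correct and p == 0]


# Hypothetical choices of 10 students on one item whose key is C.
item_choices = ["C", "C", "A", "C", "D", "C", "A", "C", "D", "C"]
props = option_proportions(item_choices)
print(props)                         # {'A': 0.2, 'B': 0.0, 'C': 0.6, 'D': 0.2}
print(idle_distractors(props, "C"))  # ['B']
```

Option B is never chosen, so it is only taking up space and is a candidate for rewriting.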

(7)

Item Difficulty

Teachers produce a difficulty index for a test item by calculating the proportion of students in the class who answered the item correctly. (The name of this index is counter-intuitive, because it actually measures how easy the item is, not how difficult it is.) By difficulty level, then, we mean the proportion of students who got a particular item right in a given test.

For example, if 24 students in a class of 37 answered a question correctly, the difficulty level is 24/37 ≈ 0.65, or 65%. The index ranges from 0 to 1 (0% to 100%).

Example:

Option                                    A*   B   C   D   Total
Number of students choosing each option   11   0   1   1   13

(* marks the correct answer.) The difficulty index for this item is 11/13 ≈ 0.85, so the item is relatively easy.

Interpretation: An item with an index of 0 is too difficult, since everybody missed it, while an item with an index of 1 is too easy, since everybody got it right. Items with an index of about 0.5 are usually suitable for inclusion in a test.

Items with indices of 0 and 1 may not really contribute to an achievement test. They still help the teacher determine how well the students are doing in that particular area of the content being tested, so such items could be included. However, the mean difficulty level of the whole test should be 0.5, or 50%.
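The difficulty index is a single division. The sketch below applies it to the two examples above: 24 of 37 students, and 11 of 13 students. The function name is illustrative.

```python
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly (0 to 1).

    Despite its name, a higher value means an easier item.
    """
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


print(round(difficulty_index(24, 37), 2))  # 0.65, class-of-37 example
print(round(difficulty_index(11, 13), 2))  # 0.85, option-count table
```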

(8)

Item Discrimination

The discrimination index shows how well a test item discriminates between bright and dull students. A test with many poorly discriminating questions will give a false impression of the learning situation. Usually, a discrimination index of 0.4 or above is acceptable. Items that discriminate negatively are bad: weaker students answer them correctly more often than stronger students do. Negative discrimination may be caused by a wrong key, vagueness or extreme difficulty.

Item discrimination is the proportion of the better-prepared students who answered the item correctly minus the proportion of the less-prepared students who answered it correctly.

The following figure is an example of an item analysis for item 12 from a 20-item, four-option multiple-choice test.

Students       A      B      C*     D
Upper 25%      0.20   0.00   0.80   0.00
Lower 25%      0.30   0.00   0.20   0.50
All students   0.26   0.00   0.38   0.36

(9)

Item Discrimination

The first row shows the proportions for the upper quartile (the 25% of students who had the highest total scores on the test). None of these best-prepared students chose option B or D, whereas 80% of them chose option C (the correct choice) and 20% chose option A. The next row shows the proportions for the lowest quartile (the 25% of students who had the lowest total scores on the test). Of this group, 50% chose option D, 30% chose option A, and only 20% chose the correct answer, option C.

The last row shows the proportion of all students in the class who chose each of the four options. The item difficulty index is therefore 0.38 (38% chose option C, the correct answer).

Note: Discrimination Indices range from -1.0 to 1.0.


The item discrimination index is the difference between the proportion of better-prepared students who answered the item correctly and the proportion of less-prepared students who answered it correctly. For our example: item discrimination index = 0.80 - 0.20 = 0.60, which is above 0.4, so the item discriminates well.
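The index can also be computed from raw class data, as in the minimal Python sketch below. It assumes we know each student's total test score and whether that student got the item right (1) or wrong (0). The upper and lower groups are the top and bottom 25% by total score, and their size is rounded to a whole number of students. Tied totals keep their list order, which is an assumption. The 20-student data set is hypothetical, built so that 80% of the upper group and 20% of the lower group answer correctly, as in the example.

```python
def discrimination_index(total_scores, item_correct, fraction=0.25):
    """D = proportion correct in the upper group - proportion correct in the lower group."""
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("need one item result (0 or 1) per student")
    k = max(1, round(n * fraction))  # students in each extreme group
    # Student indices ordered from lowest to highest total score;
    # sorted() is stable, so tied totals keep their list order.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical class of 20 students with totals 1..20 (lowest first).
totals = list(range(1, 21))
correct = [1, 0, 0, 0, 0] + [1, 0] * 5 + [1, 1, 1, 1, 0]
print(round(discrimination_index(totals, correct), 2))  # 0.6 = 0.80 - 0.20
```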

(10)

Distribution and Measures

Central tendency

Indicate where the data tend to cluster: mean, median and mode

Measures of Variability

Indicate how spread out the data are with respect to the mean:

Standard deviation, coefficient of variation, range, variance, maximum and minimum

Point Measures (quantiles)

Divide an ordered set of data into groups with the same number of individuals:

Percentiles, deciles, quartiles, ...

Distribution

(11)

Measures of Central Tendency

Measures of central tendency provide information about the average or typical score in a data set.

Mean: the most widely used and familiar average, and the most reliable and stable of all measures of central tendency.

$$\bar{x} = \frac{\sum x}{n} = \frac{45 + 80 + 140 + 150 + 100 + 70 + 40}{7} = \frac{625}{7} \approx 89.29 \text{ inches}$$

Interpretation: The average height in inches is 89.29
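The same mean can be checked in Python. The sketch applies the formula directly and also calls the standard library's `statistics.mean` on the seven heights.

```python
from statistics import mean

heights = [45, 80, 140, 150, 100, 70, 40]  # height in inches

total, n = sum(heights), len(heights)
print(total, n)                 # 625 7
print(round(total / n, 2))      # 89.29, applying the formula directly
print(round(mean(heights), 2))  # 89.29, using the standard library
```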

(12)

Descriptive Statistics with SPSS Version 23

For the above example, follow this procedure to enter the data into SPSS.

SPSS procedure: create a data file. Creating a new SPSS data file consists of two stages:

• Defining the variables

• Entering the data

Defining Variables

Step 1. Click the Variable View tab in the lower-left corner of the data editor window (see the following figure). Type a name in the first cell under the Name column. Choose a variable name that suits the exercise you want to analyze.

Height in inches: 45, 80, 140, 150, 100, 70, 40

(13)

Descriptive Statistics with SPSS Version 23

Step 3. Go to Analyze > Descriptive Statistics > Frequencies and follow the steps in the figure.

Step 4. Output:

Statistics: Height in inches
N (valid)     7
N (missing)   0
Mean          89.29
Median        80.00

(14)

Measures of Central Tendency Cont.

Median: the middle score in a set of ranked scores, that is, the score that divides the distribution into halves. It is sometimes called the counting average.

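Steps for computing the median:

1. Line up the scores from lowest to highest.

2. Count up to the middle score.

• If there is one middle score, that is the median.

• If there are two middle scores, the median is their average.

In the height example, the ranked scores are 40, 45, 70, 80, 100, 140, 150, so Me = 80 inches.

Interpretation: half of the participants have heights at or below 80 inches, and half have heights at or above 80 inches.

Mode

The most common value. In the previous example (height in inches), there is no mode, because no two participants have the same height.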
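The median steps and the "no mode" case can be sketched in Python. `statistics.median` handles both the one-middle-score and two-middle-scores cases. The `modes` helper is a hypothetical name. It returns an empty list when every value occurs only once, matching the height example.

```python
from collections import Counter
from statistics import median

heights = [45, 80, 140, 150, 100, 70, 40]


def modes(values):
    """Most common value(s); an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so none is "most common"
    return [v for v, c in counts.items() if c == top]


print(sorted(heights))           # [40, 45, 70, 80, 100, 140, 150]
print(median(heights))           # 80: the single middle score of 7
print(median([10, 20, 30, 40]))  # 25.0: average of the two middle scores
print(modes(heights))            # []: no two heights are equal
```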

(15)

Measures of variability

Measures of variability indicate how spread out the data are with respect to the mean, that is, how far the measurements are from the center. Common measures are:

Variance, standard deviation, coefficient of variation, range, maximum and minimum

Range

The range is the difference between the maximum and minimum values in a set:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example

Data set 1: [1, 25, 50, 75, 100]; R = 100 - 1 = 99

Data set 2: [48, 49, 50, 51, 52]; R = 52 - 48 = 4

The range ignores how data are distributed and only takes the extreme scores into account.
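The range formula can be checked on the two data sets in a few lines of Python. The function name is illustrative. Some textbooks use an "inclusive" range, largest minus smallest plus 1, which would give 100 and 5. The formula used here does not add 1.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    if not values:
        raise ValueError("need at least one value")
    return max(values) - min(values)


print(data_range([1, 25, 50, 75, 100]))  # 99
print(data_range([48, 49, 50, 51, 52]))  # 4
```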

Variance

(16)

Formula of Variance:

Standard Deviation

Shows how the data scatter about the mean. The standard deviation (SD) quantifies variability and is expressed in the same units as the data.

A small standard deviation means that the group has small variability, that is, it is relatively homogeneous. For normally distributed data, about 68% of the observations lie within one standard deviation of the mean, and about 95% lie within two standard deviations.

Example: A sample of 9 students is taken and each student's score (0 to 100) is recorded. We want to know the variability of these scores.

X (score): 54, 77, 67, 68, 46, 64, 62, 56, 38

Interpretation: The standard deviation of the scores around the mean is 11.97 ≈ 12 points. Reported together with the mean, the standard deviation shows how a set of data values is distributed around its mean. In our example, we conclude that most students' scores in this sample lie between 47.14 (59.11 - 11.97) and 71.08 (59.11 + 11.97).
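The nine-score example can be checked with Python's `statistics` module. `variance` and `stdev` use the sample formulas (dividing by n - 1). `pvariance` and `pstdev` use the population formulas (dividing by N).

```python
from statistics import mean, pstdev, pvariance, stdev, variance

scores = [54, 77, 67, 68, 46, 64, 62, 56, 38]  # the 9 students in the example

x_bar = mean(scores)   # 532 / 9
s2 = variance(scores)  # sample variance: sum of squared deviations / (n - 1)
s = stdev(scores)      # sample standard deviation

print(round(x_bar, 2), round(s2, 2), round(s, 2))  # 59.11 143.36 11.97
print(round(x_bar - s, 2), round(x_bar + s, 2))    # 47.14 71.08

# Population versions divide by N instead of n - 1.
print(round(pvariance(scores), 2), round(pstdev(scores), 2))
```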

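Population variance: $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

For the example: $$\bar{x} = \frac{532}{9} \approx 59.11, \qquad s^2 = \frac{1146.89}{8} \approx 143.36, \qquad s = \sqrt{143.36} \approx 11.97$$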

(17)

(18)

Point Measures: Quantiles

A quantile of a given order is the value of the variable below which a given cumulative proportion of the data lies. Special cases are:

• Quartiles: divide the data into 4 equal parts

• Deciles: divide the data into 10 equal parts

• Percentiles: divide the data into 100 equal parts
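Python's `statistics.quantiles` returns the cut points that split data into n equal groups. The sketch applies it to the height data. It uses the function's default "exclusive" method. Other methods, such as `method="inclusive"`, and hand-calculation rules used in textbooks can give slightly different cut points for small data sets.

```python
from statistics import quantiles

heights = [45, 80, 140, 150, 100, 70, 40]  # height example, in inches

quartiles = quantiles(heights, n=4)      # 3 cut points -> 4 parts
deciles = quantiles(heights, n=10)       # 9 cut points -> 10 parts
percentiles = quantiles(heights, n=100)  # 99 cut points -> 100 parts

print(quartiles)                       # [45.0, 80.0, 140.0]
print(len(deciles), len(percentiles))  # 9 99
```

The middle quartile cut point (80.0) equals the median found earlier.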

(19)

DERIVED SCORES

The Normal Distribution: a “bell-shaped” curve in which most of the scores are clustered around the mean; the farther a score is from the mean, the less frequently it occurs. A normal distribution is characterized by a bell-shaped curve in which:

mean = median = mode

Commonly Reported Test Scores Based on the Normal Curve

(20)

Z Scores

• The most fundamental standard score: a simple conversion of an individual's raw score to a new score that has a mean of 0 and a standard deviation of 1.

• To compute a Z score, subtract the mean from a raw score and divide by the standard deviation (SD).

• To convert a Z score back to a raw score, multiply the Z score by the SD and then add the mean.

$$Z = \frac{X - M}{SD} \qquad X = (Z)(SD) + M$$

where:

• Z = Z score

• X = any raw score

• M = the mean

• SD = standard deviation
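The two conversion rules translate directly into code. The Python sketch below uses a hypothetical test with mean 50 and SD 10. The function names are illustrative.

```python
def to_z(x, m, sd):
    """Z = (X - M) / SD: distance from the mean in SD units."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - m) / sd


def from_z(z, m, sd):
    """X = (Z)(SD) + M: convert a z-score back to a raw score."""
    return z * sd + m


# Hypothetical test with mean 50 and SD 10.
print(to_z(65, 50, 10))     # 1.5
print(from_z(1.5, 50, 10))  # 65.0
print(to_z(40, 50, 10))     # -1.0
```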

(21)

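Example: converting a raw score to a z-score

Suppose five students take a statistics exam and obtain the following scores:

Student   Raw score (x)   x²    Z-score
1         15              225    0.66
2         10              100   -0.71
3         17              289    1.21
4         13              169    0.11
5          8               64   -1.26
Total     63              847

Steps:

1. Mean: $$\bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6$$

2. Variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{847 - 5(12.6)^2}{4} = 13.3$$

3. Standard deviation: $$s = \sqrt{13.3} \approx 3.65$$

4. Z-score: $$Z = \frac{x - \bar{x}}{s}, \quad \text{e.g. for student 1: } Z = \frac{15 - 12.6}{3.65} \approx 0.66$$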
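The four steps can be reproduced in Python with the standard `statistics` module, using the sample (n - 1) formulas as in step 2.

```python
from statistics import mean, stdev, variance

raw = [15, 10, 17, 13, 8]  # raw scores of students 1-5

x_bar = mean(raw)   # 63 / 5 = 12.6
s2 = variance(raw)  # (847 - 5 * 12.6**2) / 4 = 13.3
s = stdev(raw)      # sqrt(13.3), about 3.65
z = [round((x - x_bar) / s, 2) for x in raw]

print(x_bar, round(s2, 2), round(s, 2))  # 12.6 13.3 3.65
print(z)                                 # [0.66, -0.71, 1.21, 0.11, -1.26]
```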

(22)

Student   Raw score (x)   Z-score   Normal table value    Percentile
1         15               0.66     0.7454                74.54%
2         10              -0.71     1 - 0.7611 = 0.2389   23.89%
3         17               1.21     0.8869                88.69%
4         13               0.11     0.5438                54.38%
5          8              -1.26     1 - 0.8962 = 0.1038   10.38%

Interpreting: From this example, we can see that student 1, who scored 15 on the exam, has a z-score of 0.66. The standard normal table gives a value of 0.7454 for this z-score. This means the student has a percentile score of approximately 74.54%: about 74.5% of scores in a normal distribution fall below this student's score.
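Instead of reading a printed normal table, the percentiles can be sketched with `statistics.NormalDist`. Its `cdf` method gives the area to the left of a z-score on the standard normal curve. For a negative z, `cdf` directly returns the value the table method obtains as 1 minus the tabulated value. Small differences in the last digit are possible because the table uses rounded z-scores.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_from_z(z):
    """Percentage of a normal distribution lying below the given z-score."""
    return 100 * STANDARD_NORMAL.cdf(z)


for student, z in [(1, 0.66), (2, -0.71), (3, 1.21), (4, 0.11), (5, -1.26)]:
    print(student, z, f"{percentile_from_z(z):.2f}%")
```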

(23)

Z Scores - Example

Consider the scores obtained in Inferential and Research in the table below. We cannot easily tell which subject was more demanding or in which the examiner was more generous. Hence, for fairness, it is advisable to convert the scores in the two subjects into common scores (standard scores) before they are ranked. Z scores are often used for this purpose.

Mean of totals = 93; SD of totals = 5.558

Student   Inferential   Research   Total   Rank   Z-score
A         68            20          88      8     -0.900
B         58            45         103      1      1.799
C         47            39          86      9     -1.259
D         45            40          85     10     -1.439
E         54            42          96      3      0.540
F         50            48          98      2      0.900
G         62            30          92      7     -0.180
H         59            36          95      4      0.360
I         48            46          94      5      0.180
J         52            41          93      6      0.000
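The Python sketch below computes totals, ranks and z-scores for the table data. It assumes the SD is the sample standard deviation (dividing by n - 1), since that reproduces the value 5.558. Rank 1 goes to the highest total.

```python
from statistics import mean, stdev

# (Inferential, Research) scores for each student, from the table.
scores = {
    "A": (68, 20), "B": (58, 45), "C": (47, 39), "D": (45, 40),
    "E": (54, 42), "F": (50, 48), "G": (62, 30), "H": (59, 36),
    "I": (48, 46), "J": (52, 41),
}

totals = {student: sum(pair) for student, pair in scores.items()}
values = list(totals.values())
m = mean(values)    # 93
sd = stdev(values)  # sample SD (n - 1), about 5.558

# Rank 1 = highest total; z = (total - mean) / SD, rounded to 3 decimals.
ranked = sorted(totals, key=totals.get, reverse=True)
z_scores = {s: round((totals[s] - m) / sd, 3) for s in totals}

for rank, student in enumerate(ranked, start=1):
    print(student, totals[student], rank, z_scores[student])
```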

(24)

Z Scores - SPSS

(25)

(26)

• Percentiles represent the point on the normal curve below which a given percentage of test scores is distributed.

• A student's percentile rank on a test indicates the percentage of students in the comparison group who scored lower. For example, if a student is ranked in the 55th percentile, the student scored higher than 55% of the comparison group who took the test.

Percentile Ranks

A percentile rank expresses a given score in terms of the percentage of scores below it. For example, in a class of 30, Emanuel scored 60 and 24 pupils scored below him. The percentage of scores below 60 is (24/30) × 100 = 80%, so Emanuel's percentile rank is 80.
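The percentile rank defined above can be computed by counting the scores below a given score. In the Python sketch below, the class list is hypothetical, built to match the example: 24 pupils below 60, Emanuel's own 60, and 5 pupils above him. Some textbooks also count half of the tied scores. This sketch counts only scores strictly below.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in the group that are strictly below `score`."""
    if not all_scores:
        raise ValueError("the group of scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical class of 30 matching the example.
class_scores = [45] * 24 + [60] + [72] * 5
print(percentile_rank(60, class_scores))  # 80.0
```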

(27)

Stanines

Stanines are standard scores based on normalized z-scores. They extend from 1 to 9, with a mean of 5 and a standard deviation of 2. Parents find stanine results easiest to understand because their child’s standardized test scores are reported as:

Stanine   Letter grade   Remark
9         A1             Excellent
8         A2             Very Good
7         A3             Good
6         C4             Credit
5         C5             Credit
4         C6             Credit
3         P7             Pass
2         P8             Pass

(28)

Stanines - Example

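Student   Raw score (0 to 30)   Z-score   Stanine
A         15                    -0.61     4
B         26                     1.01     7
C         10                    -1.35     2
D         17                    -0.32     4
E         28                     1.30     8
F         19                    -0.02     5

Compare each z-score to the ranges of stanine scores. Stanine 1 consists of z-scores below -1.75; stanine 2 is -1.75 to -1.25; stanine 3 is -1.25 to -0.75; stanine 4 is -0.75 to -0.25; stanine 5 is -0.25 to 0.25; stanine 6 is 0.25 to 0.75; stanine 7 is 0.75 to 1.25; stanine 8 is 1.25 to 1.75; and stanine 9 is above 1.75.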
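The stanine bands can be applied with a sorted list of boundaries and `bisect_right`, as in the Python sketch below. Two assumptions apply. First, the z-scores are computed from the sample mean and sample SD of the six raw scores, which reproduces the table's z values. Second, a z-score lying exactly on a boundary goes into the higher stanine, since the ranges above do not say.

```python
from bisect import bisect_right
from statistics import mean, stdev

# Lower z boundaries of stanines 2-9 (half-SD bands centred on 0).
CUTOFFS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine(z):
    """Map a z-score to a stanine from 1 to 9."""
    return bisect_right(CUTOFFS, z) + 1


raw = {"A": 15, "B": 26, "C": 10, "D": 17, "E": 28, "F": 19}
values = list(raw.values())
m, sd = mean(values), stdev(values)

result = {s: stanine((x - m) / sd) for s, x in raw.items()}
print(result)  # {'A': 4, 'B': 7, 'C': 2, 'D': 4, 'E': 8, 'F': 5}
```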

(29)

Normal distribution curve showing

(30)

Exercises

1. The following figure shows the results from a Testing, Measurement and Evaluation exam. (Note: the correct answer for each item is marked with an asterisk.) Analyze each item (difficulty and discrimination for each table).

a. Find the item difficulty and interpret the result.

b. Find the item discrimination index and interpret the result.

Item 1:

Students       A      B      C*     D
Upper 25%      0.25   0.35   0.40   0.00
Lower 25%      0.40   0.10   0.50   0.00
All students   0.30   0.20   0.45   0.05

Item 2:

Students       A*     B      C      D
Upper 25%      0.75   0.05   0.15   0.05
Lower 25%      0.73   0.00   0.20   0.07
All students   0.74   0.02   0.18   0.06

2. Find and interpret the difficulty of the item for the following data

Note: B* is the correct answer

Students A B* C D

Number of Students Choosing Each

(31)

Exercises

2. In a class of 9, the scores are 29, 85, 78, 73, 40, 35, 20, 10 and 5.

a. Find and interpret the measures of central tendency. Which measure is the most representative for these data, and why?

b. Find and interpret the measures of variability (standard deviation and coefficient of variation).

c. Find and interpret the Z scores.

3. (a) Find the mean and standard deviation for the following marks and interpret the result. (b) Find Z-score and interpret.

20, 45, 39, 40, 42, 48, 30, 46 and 41.

4. Explain why:

(32)

Exercises

5. Which student performed better in a test? The table shows the performance of each student.

Student   Mon   Tue   Wed   Thu   Fri
Samuel    20    21    22    20    21
Pheneas   30    15    12    36    28

a. Find the mean and standard deviation for Samuel and Pheneas.

b. Who is more consistent?

c. Who scores the most in the week?

6. Harry scored 55 in an English test for which the mean was 50 and the standard deviation was 6. He scored 64 in a Mathematics test for which the mean was 59 and the standard deviation was 9.

a. Calculate his standardized score for each subject.

(33)

Exercises

Rules to interpret discrimination indices:

• Items with a discrimination index of 0.40 or greater are very good items.

• Items with a discrimination index between 0.30 and 0.39 are quite good but should be worked on to improve them.

• Items with a discrimination index between 0.20 and 0.29 need to be corrected and improved.

• Items with a discrimination index of 0.19 or less are very weak and must be removed from the test if they cannot be corrected and improved.
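The interpretation rules can be written as a small classification function. In the Python sketch below, the boundaries are treated as half-open intervals, so a value such as 0.395 falls in the 0.30 to 0.39 band. That choice is an assumption, since the rules list two-decimal values only. The sample indices are hypothetical.

```python
def rate_discrimination(d):
    """Classify an item's discrimination index using the rules above."""
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "quite good, but should be improved"
    if d >= 0.20:
        return "needs correction and improvement"
    return "very weak: correct it or remove it"


for d in (0.60, 0.35, 0.25, 0.10, -0.05):  # hypothetical indices
    print(f"{d:+.2f}: {rate_discrimination(d)}")
```

Negative indices fall into the weakest band, consistent with the earlier remark that items discriminating negatively are bad.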

7. Interpret the following table

Item Analysis of the Academic Achievement Test for Inferential Statistics Lesson

Item No   Difficulty (Pj)   Discrimination (rjx)   Item No   Difficulty (Pj)   Discrimination (rjx)

1 0.24 0.25 21 0.20 0.24

2 0.78 0.37 22 0.62 0.56

3 0.19 0.13 23 0.21 -0.02

4 0.6 0.42 24 0.65 0.66

5 0.65 0.45 25 0.87 0.37

6 0.58 0.53 26 0.54 0.6

7 0.63 0.54 27 0.37 0.44

8 0.54 0.47 28 0.67 0.59

9 0.47 0.49 29 0.71 0.61

10 0.34 0.37 30 0.71 0.61

11 0.41 0.37 31 0.66 0.67

12 0.72 0.57 32 0.85 0.46

13 0.59 0.52 33 0.77 0.57

14 0.67 0.59 34 0.72 0.6

15 0.49 0.53 35 0.85 0.45

16 0.64 0.65 36 0.64 0.55

17 0.65 0.54 37 0.69 0.57

18 0.56 0.7 38 0.5 0.58

19 0.57 0.57 39 0.68 0.57

(34)