Analysis - Think DSP: Digital Signal Processing in Python

There are different types of reliability measures, and each is estimated by a different method. The chief methods of estimating reliability are summarized in Table 1 below.

Table 1: Methods of Estimating Reliability

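| Method | Type of reliability measure | Procedure |
|---|---|---|
| Test-retest method | Measure of stability | Give the same test twice to the same group, with a time interval between the two administrations |
| Equivalent forms method | Measure of equivalence | Give two equivalent forms of the test to the same group in close succession |
| Split-half method | Measure of internal consistency | Give test once. Score two equivalent halves, say odd- and even-numbered items; correct the reliability coefficient to fit the whole test by the Spearman-Brown formula |
| Kuder-Richardson methods | Measure of internal consistency | Give test once. Score total test and apply Kuder-Richardson formula |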

3.2.1 Test-Retest Method – Measure of Stability

Estimating reliability by the test-retest method requires the same test to be administered twice to the same group of learners, with a given time interval between the two administrations. The resulting test scores are correlated, and the correlation coefficient provides a measure of stability. How long the time interval should be is determined largely by the use to be made of the results. If the results of both administrations are highly stable, testees who score high on one administration will tend to score high on the other, while the other testees will tend to stay in the same relative positions on both administrations. Such stability would be indicated by a large correlation coefficient.

An important factor in interpreting measures of stability is the time interval between tests. A short interval, such as a day or two, inflates the consistency of the results, since the testees will remember some of their answers from the first test when they take the second. On the other hand, if the interval is long, say about a year, the results will be influenced by the instability of the testing procedure and by actual changes in the learners over that period. Generally, the longer the interval between test and retest, the more the results will be influenced by changes in the learners’ characteristics being measured, and the smaller the reliability coefficient will be.
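The stability coefficient described above is an ordinary correlation between the two sets of scores. The following is a minimal sketch, assuming the Pearson product-moment correlation is the coefficient intended; the function name `pearson_r` and the scores are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of six learners on two administrations of the same test.
first_administration = [12, 15, 9, 18, 14, 11]
second_administration = [13, 16, 10, 17, 15, 10]

stability = pearson_r(first_administration, second_administration)
print(f"Test-retest (stability) coefficient: {stability:.2f}")
```

A coefficient close to 1 means the learners kept nearly the same relative positions on both administrations. The same calculation applies to the scores from two equivalent forms of a test.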

3.2.2 Equivalent Forms Method - Measure of Equivalence

Estimating reliability by the equivalent (or parallel) forms method involves the use of two different but equivalent forms of the test. The two forms are administered to the same group of learners in close succession, and the resulting test scores are correlated. The resulting correlation coefficient provides a measure of equivalence; that is, it indicates the degree to which both forms of the test measure the same aspects of behaviour. This method reflects the extent to which the test represents an adequate sample of the characteristics being measured rather than the stability of the testee. It eliminates the problem of selecting a proper time interval between tests, as in the test-retest method, but it requires two equivalent forms of the test. This requirement restricts its use almost entirely to standardized testing, where it is widely used.

3.2.3 Split-Half Method – Measure of Internal Consistency.

This is a method of estimating the reliability of test scores from a single administration of a single form of a test. The test is administered to a group of testees and is then divided, for scoring purposes, into two equivalent halves, usually the odd-numbered and the even-numbered items. The two halves produce two scores for each testee which, when correlated, provide a measure of internal consistency. The coefficient indicates the degree to which equivalent results are obtained from the two halves of the test. The reliability of the full test is usually obtained by applying the Spearman-Brown formula.

That is,

$$\text{Reliability on full test} = \frac{2 \times \text{Reliability on } \tfrac{1}{2} \text{ test}}{1 + \text{Reliability on } \tfrac{1}{2} \text{ test}}$$

The split-half method, like the equivalent forms method, indicates the extent to which the sample of test items is a dependable sample of the content being measured. In this case, a high correlation between scores on the two halves of a test denotes the equivalence of the two halves and, consequently, the adequacy of the sampling. Also like the equivalent forms method, it tells nothing about changes in the individual from one time to another.
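The split-half procedure can be sketched in a few lines. This is an illustration only: it assumes items are scored 1 (correct) or 0 (wrong), uses the Pearson correlation for the two half scores, and the response matrix and function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(half_r):
    """Step the half-test correlation up to a full-test reliability."""
    return 2 * half_r / (1 + half_r)


def split_half_reliability(responses):
    """Return (half-test correlation, full-test reliability).

    `responses` holds one row per testee and one 0/1 entry per item.
    """
    odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    half_r = pearson_r(odd_scores, even_scores)
    return half_r, spearman_brown(half_r)


# Hypothetical responses of six testees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

half_r, full_r = split_half_reliability(responses)
print(f"Correlation between halves: {half_r:.2f}")
print(f"Spearman-Brown full-test reliability: {full_r:.2f}")
```

Whenever the two halves correlate positively (but not perfectly), the stepped-up value is larger than the half-test correlation, because the half-test correlation describes a test only half as long as the one actually given.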

3.2.4 Kuder-Richardson Method – Measure of Internal Consistency.

This is a method of estimating the reliability of test scores from a single administration of a single form of a test by means of formulas such as those developed by Kuder and Richardson. Like the split-half method, these formulas provide a measure of internal consistency. However, they do not require splitting the test in half for scoring purposes. Kuder-Richardson formula 20, one of the formulas for estimating internal consistency, is based on the proportion of persons passing each item and the standard deviation of the total scores. The result of this formula is equal to the average of all possible split-half coefficients for the group tested. However, it is rarely used because the computation is rather cumbersome unless information is already available on the proportion of persons passing each item. Kuder-Richardson formula 21, a less accurate but simpler formula to compute, can be applied to the results of any test that has been scored on the basis of the number of correct answers. A modified version of the formula is:

$$\text{Reliability estimate (KR21)} = \frac{K}{K - 1}\left(1 - \frac{M(K - M)}{K S^{2}}\right)$$

Where

K = the number of items in the test

M = the mean of the test scores

S = the standard deviation of the test scores.

The result of this formula approximates that of Kuder-Richardson formula 20, but in most cases it yields a smaller reliability estimate.
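The two formulas can be compared on the same data. This is a sketch under stated assumptions: items are scored 1 or 0, the variance of the total scores is taken as the population variance, and formula 20 is written in its common textbook form, K/(K - 1) × (1 - Σpq / S²), which is not given in the text above. The data and function names are hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores (textbook form, assumed)."""
    k = len(responses[0])                      # number of items
    n = len(responses)                         # number of testees
    totals = [sum(row) for row in responses]   # total score per testee
    variance = pvariance(totals)               # S squared, population variance
    # p = proportion passing each item, q = 1 - p
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)


def kr21(responses):
    """Kuder-Richardson formula 21: needs only K, the mean M and S squared."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    variance = pvariance(totals)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * variance))


# Hypothetical responses of six testees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

print(f"KR20 = {kr20(responses):.2f}")
print(f"KR21 = {kr21(responses):.2f}")
```

Formula 21 treats every item as if it had the same difficulty, so it needs no item-level proportions. When the items actually differ in difficulty, as in this made-up data, its estimate falls below that of formula 20.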

This method of estimating reliability tests whether the items in the test are homogeneous. In other words, it seeks to know whether each test item measures the same quality or characteristic as every other item. If this is established, then the reliability estimate will be similar to that provided by the split-half method. On the other hand, if the test lacks homogeneity, an estimate smaller than the split-half reliability will result.

The Kuder-Richardson method and the split-half method are widely used in determining reliability because they are simple to apply. Nevertheless, the following limitations restrict their value:

- They are not appropriate for speed tests, for which the test-retest or equivalent forms methods give better estimates.

- Like the equivalent forms method, they do not indicate the constancy of a testee's responses from day to day. Only the test-retest procedure indicates the extent to which test results are generalizable over different periods of time.

Despite these limitations, they are adequate for teacher-made tests, because these are usually power tests.
