The cumulative distribution - Visual representations of equality of condition

3. Proposed operationalisation of equity measurement

3.2 Visual representations of equality of condition

3.2.2 The cumulative distribution

Another well-known visualisation of equality of condition is the cumulative distribution function (CDF). The CDF gives the cumulative proportion, or frequency, of a population that has attained a certain level of educational output or less. It directly measures the proportion or number of students who have scored at or below a given value, and plots that proportion or number for every possible value of the assessment. For example, the CDF can be used to determine what percentage of the population scored no higher than the lowest possible score on a standardised test, such as 400 on the SAT higher education entrance examination in the United States.[12]
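The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the handbook's own procedure: the function name `empirical_cdf` and the ten scores are hypothetical and are not drawn from any real assessment.

```python
from bisect import bisect_right


def empirical_cdf(scores):
    """Return a function F where F(x) is the share of scores at or below x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("scores must not be empty")

    def cdf(x):
        # bisect_right returns how many sorted scores are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical scores for ten students (illustrative only, not real data).
example_scores = [400, 420, 450, 450, 480, 500, 510, 530, 560, 600]
F = empirical_cdf(example_scores)

print(F(400))  # share scoring at the lowest observed score: 0.1
print(F(450))  # share scoring 450 or less: 0.4
print(F(600))  # every student scores 600 or less: 1.0
```

Plotting `F(x)` for every possible value of `x` gives the cumulative distribution graph described in this section.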

Figure 3.2 Empirical distributions of PIRLS test scores for Canada and Oman

[Figure: distributions for Oman and Canada. Horizontal axis: PIRLS scaled score, from −350 to 350.]

Note: PIRLS test scores are mean-centred.

Figure 3.3 plots the cumulative distributions of a hypothetical learning assessment, with possible scores ranging between 200 and 500, under two different distributional assumptions. They are not based on real data; they are generated for illustration. Panel A displays the CDF of a test score distribution that follows a Gaussian distribution with a mean of 400 and a standard deviation of 20. In this case, 50% of the population have a score less than or equal to the mean score. However, it is unclear whether this indicates a high or low degree of inequality in test score performance. Panel B displays the CDF of a test score distribution that is positively skewed, meaning that the majority of students have low test scores while a small number of students have much higher test scores.[13] The CDF of the skewed distribution shows that the mean is much closer to the lowest obtainable score and that 70% of the population have a score lower than the mean. This distribution clearly shows that, in terms of test score performance, the condition of the students is relatively unequal, and it illustrates the concept of Pen’s parade.[14]

[13] The test score distribution in Panel B is generated following a chi-squared distribution with a mean of 225 and one degree of freedom.

[14] Pen’s Parade is a concept created by Jan Pen (1971) to describe the income distribution in an economy.
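The shares below the mean quoted for the two panels can be checked analytically. The sketch below rests on two assumptions: Panel B is read as a chi-squared distribution with one degree of freedom, as the panel title and footnote suggest, and the mean of 225 is treated as arising from a shift or rescaling, which the text does not specify. The share of the population below the mean does not change under a shift or a positive rescaling, so the calculation uses the standard chi-squared variable, whose mean is 1.

```python
from statistics import NormalDist

# Panel A: Gaussian scores with mean 400 and standard deviation 20 (from the text).
panel_a = NormalDist(mu=400, sigma=20)
share_below_mean_a = panel_a.cdf(400)

# Panel B (assumed): chi-squared with one degree of freedom.
# Such a variable equals Z**2 for a standard normal Z and has mean 1,
# so P(Z**2 < 1) = P(-1 < Z < 1).
std_normal = NormalDist()
share_below_mean_b = std_normal.cdf(1) - std_normal.cdf(-1)

print(round(share_below_mean_a, 3))  # 0.5
print(round(share_below_mean_b, 3))  # 0.683
```

Under these assumptions the skewed distribution places about 68% of the population below the mean, which is consistent with the roughly 70% read from Panel B.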

In addition, cumulative distribution graphs provide a simple representation of perfect equality of condition: in the case of learning assessment scores, perfect equality refers to a distribution where all students received an identical score and, as such, is represented by any straight line that is parallel to the horizontal axis. In other words, 100% of the population achieve the same outcome. This line can serve as a benchmark against which to compare the empirical distribution. The larger the distances between each point on the CDF and the perfect equality line, the greater the degree of inequality in learning outcomes.

Figure 3.3 Hypothetical cumulative distributions under different distributional assumptions

[Figure: two panels with axes "Test score" and "Proportion of population". Panel A: Test score Data Generating Process (DGP) is Gaussian. Panel B: Test score DGP is χ²₍₁₎.]

Note: Data Generating Process (DGP) means that we use different distributional assumptions for how the data are generated for the examples used throughout this chapter.

In the following example, we replicate the CDF analysis using data from Early Grade Reading Assessment (EGRA) administrations in Haiti and Uganda. As in Figure 3.3, Figure 3.4 includes two panels representing the CDF. We chose EGRA oral reading fluency (ORF) results to illustrate the CDF’s usefulness because EGRA results typically follow positively skewed distributions, with a large portion of students receiving zero scores. Panel A of Figure 3.4 shows the CDF for Haiti, where about 23% of the test-takers received a score of zero. At the same time, 64% of all test-takers read 24 or fewer words correctly per minute, and 24 is also the mean ORF score in Haiti. In general, when more than one-half of the sample population’s outcomes are lower than the mean, this indicates a relatively high degree of inequality. Thus, the higher the proportion of the population that is below the mean, the higher the degree of inequality. As an extreme example, the highest possible inequality would be a case where nearly all students receive the lowest possible score and only one student has a non-zero result, at which point almost 100% of the population have a score lower than the mean. In Uganda, 83% of all test-takers read four or fewer words correctly per minute, and four is also the mean ORF score for the entire population. This result suggests two straightforward conclusions. First, Uganda exhibits a high degree of inequality, with the vast majority of the student population scoring below average. Second, Uganda exhibits a higher degree of inequality than Haiti based on these results, because a higher proportion of its population scores below average.
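The comparison rule used here, the share of the population scoring below the mean, is simple to compute. The sketch below is illustrative only: the function name `share_below_mean` and all three score lists are hypothetical and do not reproduce the Haiti or Uganda data.

```python
def share_below_mean(scores):
    """Proportion of scores strictly below the arithmetic mean."""
    if not scores:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / len(scores)
    return sum(1 for s in scores if s < mean) / len(scores)


# Hypothetical words-per-minute scores (illustrative only, not real data).
skewed = [0, 0, 0, 2, 5, 8, 12, 20, 45, 88]  # positively skewed, mean 18
extreme = [0] * 99 + [60]                    # only one non-zero result
equal = [30] * 100                           # every student has the same score

print(share_below_mean(skewed))   # 0.7
print(share_below_mean(extreme))  # 0.99
print(share_below_mean(equal))    # 0.0
```

The extreme list mirrors the limiting case described above: with one non-zero result among 100 students, 99% of the population fall below the mean, while the perfectly equal list has no one below it.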

In document Handbook on Measuring Equity in Education (Page 51-53)