CHAPTER 5 SPATIAL ABILITY
5.2 Statistical constructs used in the chapter
The spatial ability tests generated quantitative data, which required statistical analysis. Researchers use two families of statistical techniques to analyse quantitative data: descriptive statistics, which allow researchers to describe characteristics of a sample, and inferential statistics, which allow researchers to make inferences about characteristics of the populations from which samples were selected (McMillan & Schumacher, 2001; van Lill & Grieve, 1990). The aim of this study was to describe characteristics of the sample, not to generalize findings to the population. As a result, the study employed descriptive statistics. The following sections describe the theoretical constructs associated with the descriptive statistics used in the study.
5.2.1 Types of data distribution
Most research deals with data consisting of more than one variable. When the values of a variable are plotted on a graph, the data can take any of the following types of distribution.
Normal distribution: A distribution is normal if values cluster at the centre and taper off symmetrically towards both edges (Fraenkel & Wallen, 1990). Figure 5.1 illustrates an example of a normal distribution.
Figure 5.1 Example of a normal distribution
Skewed distribution: A distribution is skewed if values cluster towards one of the edges. A distribution is positively skewed if values cluster towards the left edge, as illustrated in Figure 5.2.
Figure 5.2 Example of a positively skewed distribution
On the other hand, a distribution is negatively skewed if values cluster towards the right edge, as illustrated in Figure 5.3.
Figure 5.3 Example of a negatively skewed distribution
Sometimes distributions have outliers. McMillan and Schumacher (2001:167) define an outlier as a “point that falls far outside the main distribution of scores”, usually located beyond three standard deviations from the mean (I define the mean and the standard deviation in Sections 5.2.2 and 5.2.3, pp. 111-114). If there is a clear reason for a score to lie far from the others (e.g. errors in marking), researchers may correct or drop the score. However, there is no consensus about what to do with outliers if there are no clear explanations for the extreme values (McMillan & Schumacher, 2001). I included outliers when calculating measures of central tendency and dispersion (discussed below).
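The three-standard-deviation rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function name `find_outliers`, the `threshold` parameter and the scores are hypothetical and are not taken from the study, and the sketch assumes the standard deviation is calculated over all scores in the sample.

```python
from statistics import mean, pstdev

def find_outliers(scores, threshold=3.0):
    """Return scores lying more than `threshold` standard deviations from the mean."""
    centre = mean(scores)
    spread = pstdev(scores)  # assumption: standard deviation with divisor n
    if spread == 0:
        return []  # all scores identical, so nothing can be an outlier
    return [s for s in scores if abs(s - centre) > threshold * spread]

# Hypothetical percentage scores, not data from the study.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 48, 95]

print(find_outliers(scores))  # [95]
```

The sketch only flags the extreme score; whether to keep, correct or drop it remains a judgement for the researcher.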
5.2.2 Central tendency
van Lill and Grieve (1990:41) define central tendency as a “value which is central to a distribution which can be used to represent all the scores in a distribution”. Three measures of central tendency are used to describe data.
Mean: This is the average value, obtained by summing all the values and dividing the sum by the number of values (Fraenkel & Wallen, 1990; Fraser, 1991).
Median: This is the middle score in a distribution (Fraenkel & Wallen, 1990), which is equivalent to a score obtained by an average person (Haslam & McGarty, 2003).
Mode: This is the most frequent score in a distribution (Fraenkel & Wallen, 1990; McMillan & Schumacher, 1993).
In a normal distribution, the three measures of central tendency assume the same value, located at the centre of the distribution (see Figure 5.4).
Figure 5.4 Measures of central tendency in a normal distribution
However, the three measures assume different values in skewed distributions. The mode is located at the peak, the median is located at the centre, and the mean is located towards the tail of the distribution (see Figure 5.5).
Figure 5.5 Measures of central tendency in skewed distributions (McMillan & Schumacher, 2001:161)
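The ordering of the three measures in a skewed distribution can be checked with a small Python sketch using the standard `statistics` module. The scores are hypothetical and were chosen to be positively skewed (most values low, with a tail of high values); they are not data from the study.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores, not data from the study.
scores = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

most_frequent = mode(scores)   # score that occurs most often (the peak)
middle = median(scores)        # middle score of the ordered values
average = mean(scores)         # sum of the values divided by their number

print(most_frequent, middle, average)  # 3 4.0 5
```

Here the mode sits at the peak, the median lies above it, and the mean is pulled furthest towards the tail of high scores.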
5.2.3 Variability/dispersion
Measures of dispersion help us understand the extent to which values in a data set differ from the central score (Haslam & McGarty, 2003; van Lill & Grieve, 1990). I discuss two types of dispersion, the first dealing with differences between high and low values in a distribution, and the second dealing with deviation of values from the mean.
Difference between high and low values: This measure has two types, namely the range and the inter-quartile range. The range is calculated by subtracting the minimum value from the maximum value. Outliers can affect the range of a distribution because only two values are used to calculate this measure (McMillan & Schumacher, 2001; van Lill & Grieve, 1990). Thus, this measure can be misleading.
The inter-quartile range is used to reduce the impact of outliers. This measure gives the difference between the values located at the lower and upper quartiles in a distribution (van Lill & Grieve, 1990). Quartiles are the values located ¼ and ¾ of the way from the minimum value in an ordered distribution; that is, at the 25th and 75th percentiles. The lower quartile is the value located at the 25th percentile, while the upper quartile is the value located at the 75th percentile.
Seventy-five learners formed the sample for the current study. When the learners’ scores are arranged in order from minimum to maximum, the lower quartile is the score obtained by the 19th learner, the upper quartile is the score obtained by the 55th learner, and the inter-quartile range is the difference between the scores obtained by the 19th and the 55th learners. The inter-quartile range thus indicates the spread of scores among the 36 learners who obtained middle scores.
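The contrast between the two measures can be illustrated with a short Python sketch. The helper names and the scores are hypothetical. The sketch relies on `statistics.quantiles` with its default settings, and since quartile positions depend on the convention adopted, its results need not coincide with the positions used in this study.

```python
from statistics import quantiles

def value_range(scores):
    """Maximum minus minimum."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """Upper quartile minus lower quartile (default convention of statistics.quantiles)."""
    lower, _middle, upper = quantiles(scores, n=4)
    return upper - lower

# Hypothetical scores, not data from the study.
scores = [35, 40, 42, 45, 47, 50, 52, 55, 58, 60, 62]
with_outlier = scores[:-1] + [98]  # replace the top score with an extreme value

print(value_range(scores), value_range(with_outlier))                  # 27 63
print(interquartile_range(scores), interquartile_range(with_outlier))  # 16.0 16.0
```

A single extreme score more than doubles the range, while the inter-quartile range is unchanged because it depends only on the middle portion of the ordered scores.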
Deviation of each score from the mean: van Lill and Grieve (1990) discuss three types of this measure: deviation, variance and standard deviation.
Deviation is the difference between each score and the mean, obtained by subtracting the mean from each score. Deviation is positive for scores larger than the mean and negative for scores lower than the mean.
Variance is obtained by squaring the deviation of each value, summing the squares of the deviations, and averaging them. Squaring removes the negative sign obtained for values lower than the mean. As a consequence of squaring deviations, variance is expressed in square units of the quantity being measured.
Standard deviation is obtained by taking the square root of the variance. Thus, standard deviation is expressed in the same units as the quantity being measured (Haslam & McGarty, 2003). Standard deviation indicates the distance (on average) of the scores from the mean (McMillan & Schumacher, 2001).
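The three measures build on one another, which a minimal Python sketch makes explicit. The function names and scores are hypothetical. The sketch assumes the sum of squared deviations is divided by the number of scores; some texts divide by one less than that number when estimating a population value, which would give a slightly larger result.

```python
from math import sqrt

def deviations(scores):
    """Difference between each score and the mean."""
    m = sum(scores) / len(scores)
    return [s - m for s in scores]

def variance(scores):
    """Mean of the squared deviations (assumption: divisor is the number of scores)."""
    return sum(d * d for d in deviations(scores)) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, in the same units as the scores."""
    return sqrt(variance(scores))

# Hypothetical scores, not data from the study; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(deviations(scores))          # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```

The deviations sum to zero, which is why they are squared before being averaged.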
5.2.4 Application of statistical analysis in this study
When discussing learners’ performance, I first discuss the general performance of learners in each test. Table 5.1 presents the criteria used to describe learners’ competence in the skills measured by each test, and to evaluate the ease of executing those skills, based on learners’ scores.
Table 5.1 Criteria used to categorise learners’ competence in skills measured by spatial ability tests⁶
Score (expressed in %)    Learners’ competence in the skill    Ease of executing skills measured by the test
≥ 80                      Very high                            Very easy
60–79                     High                                 Easy
40–59                     Intermediate                         Moderate
20–39                     Low                                  Difficult
0–19                      Very low                             Very difficult
< 0                       Extremely low                        Extremely difficult
The table shows that I classify competence into three main categories: high competence for learners scoring 60% and above, intermediate competence for learners scoring between 40% and 59%, and low competence for learners scoring 39% and below. I argue that executing the skills measured by the test was easy for learners obtaining high scores and moderately difficult for learners obtaining intermediate scores, and that learners obtaining low scores encountered more difficulties when executing these skills.
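The criteria in Table 5.1 amount to a simple lookup, sketched below in Python. The function name `categorise` is hypothetical, and the sketch makes one assumption that the table does not state: a fractional score lying between two listed bands (for example 79.5) is assigned to the lower band.

```python
def categorise(score_percent):
    """Map a percentage score to (competence, ease of execution) following Table 5.1."""
    bands = [
        (80, "Very high", "Very easy"),
        (60, "High", "Easy"),
        (40, "Intermediate", "Moderate"),
        (20, "Low", "Difficult"),
        (0, "Very low", "Very difficult"),
    ]
    for lower_bound, competence, ease in bands:
        if score_percent >= lower_bound:
            return competence, ease
    # Scores below zero fall outside all listed bands.
    return "Extremely low", "Extremely difficult"

print(categorise(85))  # ('Very high', 'Very easy')
print(categorise(59))  # ('Intermediate', 'Moderate')
print(categorise(-5))  # ('Extremely low', 'Extremely difficult')
```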
After discussing the general performance of the learners in each test, I illustrate test scores in a bar graph to note the type of distribution and to identify scores that fall far from the rest (i.e. outliers). All results gave skewed distributions. In the discussion that follows, I discuss measures of central tendency and dispersion for each test. For central tendency, I discuss both the mean and the median. I discuss the mean because its calculation uses all values in a distribution, and the median because it is the value located at the middle in skewed distributions. I discuss both the range and the standard deviation to give an idea of the dispersion of learners’ scores.
⁶ Negative scores were possible because the formulae recommended for calculating learners’ scores used both correct and incorrect marks.