Assessing Outliers on Performance, Read_in_Eng and Read_in_L1

check for outliers. However, box plots are more useful for identifying outliers and for comparing distributions. Thus, to detect outliers on the variables Performance, Read_in_Eng and Read_in_L1, and to compare the score distributions of the Sindhi and Urdu groups, parallel box plots were created for each of these variables. Figure 4.1 displays two box plots for the dependent variable Performance, one for the Sindhi and one for the Urdu participants.


Figure 4.1: Outliers on Performance of Sindhi and Urdu groups

The box plots in the figure above show a single small circle (case ID = 35) for the Sindhi group, which can be regarded as an outlier. No outliers are displayed for the Urdu students on Performance. Pallant (2010) suggests that if an outlier appears to be a genuine score, the researcher needs to decide whether to remove it from the data file or change it to a less extreme value. Since only one case (participant) from the Sindhi group appeared to be an outlier, it was decided to delete that respondent from the data set. Additionally, half of the Sindhi scores lie between 12 and 16, whereas half of the Urdu scores lie between 14 and 19 (see Figure 4.1). This suggests that Urdu respondents generally scored higher on the reading test than Sindhi respondents.
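The conventional box plot rule behind such a flag can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the scores are hypothetical, the function name `boxplot_outliers` is invented for the example, and the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the hinges drawn by a given statistics package. A score is flagged when it lies more than 1.5 times the interquartile range beyond either edge of the box.

```python
from statistics import quantiles


def boxplot_outliers(scores, k=1.5):
    """Return (q1, q3, outliers) using the conventional k * IQR fence rule."""
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [s for s in scores if s < lower_fence or s > upper_fence]
    return q1, q3, outliers


# Hypothetical reading scores for one group (NOT the study's data).
example_scores = [12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 27]
q1, q3, outliers = boxplot_outliers(example_scores)
print(f"Middle half of scores: {q1} to {q3}; flagged cases: {outliers}")
```

The pair `q1` and `q3` corresponds to the edges of the box, that is, the range within which the middle half of a group's scores fall.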

Likewise, the variables Read_in_Eng and Read_in_L1 were checked for outliers and for the distribution of scores of the Sindhi and Urdu participants by producing box plots for each variable. No outliers were detected on Read_in_L1 for either the Sindhi or the Urdu group; hence no changes were made to the data set for this variable. Additionally, there was little difference between Sindhi and Urdu respondents in the distribution of Read_in_L1 scores: half of the Sindhi scores fell between 2.4 and 3.1, whereas half of the Urdu scores fell between 2.5 and 3.3.

For the variable Read_in_Eng, one outlier was detected, in the Urdu group only, and this case was deleted from the data set (see the Urdu box plot in Figure 4.2).


Figure 4.2: Outliers on Reading Habits in English of Sindhi and Urdu groups

In Figure 4.2, there appears to be a slight difference between the score distributions of the Sindhi and Urdu respondents: half of the Sindhi scores fall between 3.3 and 3.9, whereas half of the Urdu scores fall between 3.0 and 3.4.

Having dealt with possible outliers and with the distribution of scores of the Sindhi and Urdu participants on each variable, it was then necessary to check the assumption of normality for the dependent variable (Performance) before applying specific analyses to address the research questions. The following section discusses how the normality check was performed on Performance before higher-level statistical analyses were chosen.

4.6.1 Normality

Researchers have suggested two main methods of assessing normality: graphical methods and statistical measures such as tests of skewness and kurtosis (Tabachnick and Fidell, 2007; Pallant, 2010). These authors argue that statistical tests have the advantage of assessing normality objectively, but the disadvantage of sometimes being insufficiently sensitive with small samples or overly sensitive with large samples. With large samples, therefore, inspecting the shape of the distribution in a frequency histogram is more useful than formal inference tests for assessing normality (Tabachnick and Fidell, 2007). Graphical interpretation, in turn, allows the researcher to exercise judgement in situations where numerical tests might be over- or under-sensitive, but it lacks objectivity. Hence, it is useful to check the assumption of normality for the dependent variable using both graphical and numerical methods (Tabachnick and Fidell, 2007).
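The numerical side of this check can be illustrated with a short Python sketch of skewness and kurtosis. It is a simplified example under stated assumptions: the data are hypothetical, the function name `skewness_and_kurtosis` is invented here, and the formulas use plain population moments, whereas statistics packages usually report bias-adjusted sample versions, so the values will not match such output exactly. Both measures are zero for a perfectly normal distribution; positive skewness indicates a longer right tail.

```python
from statistics import fmean


def skewness_and_kurtosis(scores):
    """Return (skewness, excess kurtosis) from population central moments.

    Assumes the scores are not all identical (the variance must be non-zero).
    """
    n = len(scores)
    centre = fmean(scores)
    m2 = sum((s - centre) ** 2 for s in scores) / n  # variance
    m3 = sum((s - centre) ** 3 for s in scores) / n  # third central moment
    m4 = sum((s - centre) ** 4 for s in scores) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical data (NOT the study's scores).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [8, 9, 9, 10, 10, 10, 11, 11, 12, 13, 15, 18, 22]
print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_skewed))
```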

In the present study, normality for the dependent variable Performance was assessed first graphically and then numerically. Tabachnick and Fidell (2007) state that in a normal distribution the scores form a bell-shaped curve, and that in a normal probability plot the cases fall along a diagonal line running from bottom left to top right. Deviations from normality reduce the robustness of statistical inference. Therefore, continuous variables should be assessed for normality before specific statistical analyses are conducted (Tabachnick and Fidell, 2007).

Pallant (2010) argues that histograms are useful for displaying the distribution of scores on the dependent variable in order to assess normality. Therefore, in this study, normality was assessed by producing histograms of the continuous variable Performance for Sindhi and Urdu students. Figure 4.3 shows the distribution of scores of the Sindhi and Urdu participants on the dependent variable Performance (English reading performance).


In Figure 4.3, the Urdu participants' scores on Performance exhibit a reasonably normal, bell-shaped distribution (see the histogram for Urdu), in which most scores cluster in the middle and a small number fall in the right and left tails (Tabachnick and Fidell, 2007; Pallant, 2010). However, the scores of the Sindhi participants appear to be positively skewed on Performance, because most scores cluster at the left end of the distribution.
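The shape described above can be made concrete with a small frequency histogram. The following Python sketch is illustrative only: the scores are hypothetical integers, and the helper names `frequency_table` and `text_histogram` are invented for the example. Scores that pile up in the lowest bins with a thin tail to the right are the pattern referred to as positive skew.

```python
from collections import Counter


def frequency_table(scores, bin_width):
    """Count integer scores per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    first, last = min(counts), max(counts)
    # Include empty bins so that gaps in the distribution stay visible.
    return {
        start: counts.get(start, 0)
        for start in range(first, last + bin_width, bin_width)
    }


def text_histogram(scores, bin_width):
    """Render the frequency table as rows of '#' marks, one row per bin."""
    rows = []
    for start, count in frequency_table(scores, bin_width).items():
        label = f"{start}-{start + bin_width - 1}"
        rows.append(f"{label:>7} | {'#' * count}")
    return "\n".join(rows)


# Hypothetical, positively skewed scores (NOT the study's data).
right_skewed = [8, 9, 9, 10, 10, 10, 11, 11, 12, 13, 15, 18, 22]
print(text_histogram(right_skewed, bin_width=4))
```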

Moreover, to confirm whether the scores of the Sindhi and Urdu participants were normally distributed or skewed, a two-sample Kolmogorov-Smirnov test was run. This test compares the cumulative distributions of two data sets (here, Sindhi and Urdu) and is useful for checking the normality of continuous variables measured on a ratio or interval scale, where ties are rare. The test returned significant results for both the Sindhi (p = .006) and Urdu (p = .009) groups, each below the .05 level, which was taken to mean that the two groups were sampled from populations with different distributions. Hence, the scores of both the Sindhi and Urdu groups were treated as not normally distributed on Performance. It was therefore decided to run nonparametric tests, which do not assume a Gaussian (normal) distribution, to address Research Questions Two (RQ2) to Four (RQ4).
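The quantity at the heart of the Kolmogorov-Smirnov test, the largest vertical gap between two cumulative distributions, can be sketched as follows. This is a minimal illustration and not the analysis reported above: the data are hypothetical, the function names are invented for the example, and only the test statistic D is computed, without p-values. The sketch shows both forms of the statistic: the two-sample form, which compares two groups with each other, and a one-sample form, which compares a single group with a normal curve fitted to its own mean and standard deviation.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def ks_two_sample(a, b):
    """D statistic: largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a at or below x
        cdf_b = bisect_right(b, x) / len(b)  # share of b at or below x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_one_sample_normal(scores):
    """D statistic: largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(scores)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Hypothetical scores for two groups (NOT the study's data).
group_a = [8, 9, 9, 10, 10, 10, 11, 11, 12, 13, 15, 18, 22]
group_b = [11, 13, 14, 15, 15, 16, 16, 17, 17, 18, 19, 20, 22]
print("Two-sample D:", ks_two_sample(group_a, group_b))
print("One-sample D, group A vs fitted normal:", ks_one_sample_normal(group_a))
```

In practice a library routine such as `scipy.stats.ks_2samp` or `scipy.stats.kstest` would also return a p-value. When the mean and standard deviation of the reference normal curve are estimated from the same sample, the standard p-values are too conservative, and a corrected version such as the Lilliefors test is generally used instead.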

