
RESEARCH SETTING AND METHODOLOGY


4.6.7. Data analysis procedures

Comparisons were made for each component factor, along with cross-sectional comparisons within the institutions. The findings of the analysis are presented as graphs, tables and summarising reports. This section covers the information gathered through observation, the comparison of observed with expected findings, and multivariate analysis.

4.6.7.1. Descriptive analysis

For each variable, the frequency (total scores), percentage, mean, standard deviation, and lowest and highest possible values are presented, together with some figures. Because each factor was summarised per component, the comparisons were made within each set of data for both universities.
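As an illustration only, the descriptive summary for one variable could be computed as in the following minimal Python sketch. The study itself used SPSS; the function names `describe` and `percentages` and the sample values are hypothetical and are not taken from the study data.

```python
import statistics
from collections import Counter

def describe(scores):
    """Summarise one variable: n, total, mean, SD, observed minimum and maximum."""
    return {
        "n": len(scores),
        "total": sum(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "min": min(scores),
        "max": max(scores),
    }

def percentages(responses):
    """Percentage of respondents in each response category."""
    counts = Counter(responses)
    n = len(responses)
    return {value: 100 * count / n for value, count in counts.items()}

# Hypothetical item scores and categories, for illustration only
summary = describe([1, 2, 3, 4, 5])
shares = percentages(["female", "female", "male", "male"])
print(summary)
print(shares)
```

Running the same function on each university's data set gives the per-institution summaries that the comparisons rely on.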

4.6.7.2. Multiple comparisons

Although the SPSS package offers many statistical techniques, only three (3) main techniques were applied to compare the respondents' scores and the variables across universities. Independent samples t-tests were used to compare two groups, analysis of variance (ANOVA) was used for three or more groups, and the chi-squared test was used to cross-tabulate each variable across universities.
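The chi-squared comparison of observed with expected counts in a cross-tabulation can be sketched as follows. This is a minimal illustration in Python, not the SPSS procedure used in the study; the function name `chi_squared` and the counts are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a cross-tab.

    table: list of rows of observed counts (e.g. rows = universities,
    columns = response categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical counts: two universities by two response categories
stat, df = chi_squared([[10, 20], [20, 10]])
print(round(stat, 4), df)
```

The statistic is then compared with the chi-squared distribution on `df` degrees of freedom to obtain the p-value, a step that statistical packages perform automatically.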

The observations that make up the data have to be independent of one another; therefore, each observation or measurement should not be influenced by any other observation. Violation of this assumption, according to Schau et al. (1995), is very serious. If the significance value of Levene's test of homogeneity is greater than .05, the first line in the table of the independent samples t-test, which refers to equal variances assumed, is used. If the significance value of Levene's test is less than or equal to .05, the variances for the two groups are not the same; therefore, the data violate the assumption of equal variances, and the second line of the t-test table, which refers to equal variances not assumed, must be used (Zimmerman, 2004; Garson, 2012).
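The choice between the two lines of the t-test table can be sketched in Python as follows. This is a hedged illustration rather than the SPSS output itself: the function names are hypothetical, the significance value of Levene's test is assumed to be supplied by the statistical package, and the sample scores are invented.

```python
import math
import statistics

def pooled_t(a, b):
    """t and df for the 'equal variances assumed' line (Student's t-test)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def welch_t(a, b):
    """t and df for the 'equal variances not assumed' line (Welch's t-test)."""
    n1, n2 = len(a), len(b)
    q1, q2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

def t_test_line(a, b, levene_p, alpha=0.05):
    """Pick the line of the t-test table according to Levene's test."""
    if levene_p > alpha:
        return ("equal variances assumed",) + pooled_t(a, b)
    return ("equal variances not assumed",) + welch_t(a, b)

# Hypothetical scores for two groups; levene_p would come from the package
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
first_line = t_test_line(group_a, group_b, levene_p=0.20)
second_line = t_test_line(group_a, group_b, levene_p=0.05)
print(first_line)
print(second_line)
```

With equal group sizes the two t values coincide and only the degrees of freedom differ, which is why the choice of line mainly affects the reported significance.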

For the t-test, the procedure for calculating eta squared (R²) is as follows:

R² (eta squared) = t² / [t² + (N₁ + N₂ – 2)]

where t is the score of … from the t-test table, N₁ is the size of the first group, and N₂ is the size of the second group.
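The formula above translates directly into a short Python function. The function name `eta_squared_from_t` and the example values are hypothetical and serve only to show the arithmetic.

```python
def eta_squared_from_t(t, n1, n2):
    """Eta squared for an independent samples t-test.

    t: t value read from the t-test table
    n1, n2: sizes of the first and second group
    """
    return t ** 2 / (t ** 2 + (n1 + n2 - 2))

# Hypothetical example: t = 2.0 with two groups of 10 respondents each
effect = eta_squared_from_t(2.0, 10, 10)
print(round(effect, 4))  # 4 / (4 + 18)
```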

Parametric techniques assume that the populations from which the samples were taken are normally distributed. The two data sets were therefore assumed to follow a normal distribution. With a p-value of less than or equal to .05, the null hypothesis of equal means is rejected, indicating that at least two of the three group population means differ. The differences in means are then attributed to the induced variable.

Again, there is a need to look at which groups differ. The one-way ANOVA partitions the deviations of individual scores from the overall mean of the data into the deviations of the group means from the overall mean and the deviations of the individuals from their group means. The tests of equal variances are based only on the values in this one experiment. R² (eta squared) is the effect size, and provides an indication of the magnitude of the differences between groups, not just whether the difference could have occurred by chance or due to some external factors (Field, 2009). It is calculated from the ANOVA table as the sum of squares between groups divided by the total sum of squares. The standards for interpreting this value are .01 = small effect, .06 = moderate effect, and .14 = large effect (Cohen, 1988).
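The partition of the sums of squares and the resulting eta squared can be sketched in Python as follows. The function names and the three groups of scores are hypothetical; the label returned for values under .01 is an assumption, since the cited standards start at .01.

```python
def anova_partition(groups):
    """Return (SS between, SS within, SS total) for a one-way layout."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Deviations of group means from the overall mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Deviations of individuals from their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Deviations of individuals from the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between, ss_within, ss_total

def effect_label(eta_sq):
    """Interpret eta squared using the .01 / .06 / .14 standards."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "moderate"
    if eta_sq >= 0.01:
        return "small"
    return "below small"  # assumption: no label is given under .01

# Hypothetical scores for three groups
ss_b, ss_w, ss_t = anova_partition([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
eta_sq = ss_b / ss_t
print(ss_b, ss_w, ss_t, eta_sq, effect_label(eta_sq))
```

The between-groups and within-groups sums of squares add up to the total, which is the partition described in the paragraph above.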

For the combined data of both universities, cross-sectional analysis was applied, which involved observing all the variables at one specific point in time, without regard to differences in time. This method consists of comparing the differences observed across each instrument and for each variable (Frankfort-Nachmias & Nachmias, 2007). These include items, components and overall components or factors. The difference between the scores of items reveals the level of improvement realized for every category of instrument (component).

However, if the difference is negative, the quality of the score is relatively poor for that particular item or component (Wilson, 2004). The change in the score structure during the survey could be characterized by a gradual decline in relative outcome, for instance, by the age of younger or older students (Considine & Zappalà, 2002; Lee & Burkam, 2003). The comparison indicates the differences in scores from data broken down by background information (for example, gender, student status, marital status). Apart from differential under-enumeration in various individual characteristics, the comparison helps to highlight whether the data suffer from distortion due to social, cultural and legal habits, as well as norms observed in an academic environment in South Africa (Nkabinde, 1997; Ball, 2006). The change describes the variation between the current score and the previous score at a point in time. A positive variation indicates improvement, while a negative variation indicates that deterioration has occurred (Nkabinde, 1997).

4.6.7.3. Multivariate analysis

The ordinal regression method was used to model the relationship between the different levels of self-efficacy to learn statistics, regarding the learning ability to apply statistics in academic research, and the explanatory variables concerning demographics, emotion, behaviour and the students’ learning environment at UCT and UWC. The major decisions involved in building the ordinal regression model were deciding which explanatory variables to include and choosing the link function (for example, the logit, probit, negative log-log, complementary log-log or Cauchit link) that demonstrated the model’s appropriateness (McCullagh, 1980). In addition, the model fitting statistics, the accuracy of the classification results and the validity of the model assumptions, for example, parallel lines, were assessed in order to select the best model (McCullagh, 1980; Goldstein, 2011).
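To make the choice of link function concrete, the following Python sketch lists the inverse of each of the five links named above, that is, the function that maps a value of the linear predictor back to a cumulative probability. These are the standard textbook definitions; the dictionary name `INVERSE_LINKS` is hypothetical, and the sketch is not the fitting routine used in the study.

```python
import math

# Each function maps a linear predictor value eta to a probability in (0, 1)
INVERSE_LINKS = {
    "logit": lambda eta: 1 / (1 + math.exp(-eta)),
    "probit": lambda eta: 0.5 * (1 + math.erf(eta / math.sqrt(2))),
    "negative log-log": lambda eta: math.exp(-math.exp(-eta)),
    "complementary log-log": lambda eta: 1 - math.exp(-math.exp(eta)),
    "cauchit": lambda eta: 0.5 + math.atan(eta) / math.pi,
}

# Compare the links at the same predictor values
for name, inverse in INVERSE_LINKS.items():
    print(name, [round(inverse(eta), 4) for eta in (-2.0, 0.0, 2.0)])
```

The logit, probit and Cauchit links are symmetric around a probability of .5, whereas the two log-log links are asymmetric, which is one reason the links can fit the same ordered outcome differently.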

The outcome variable for students’ self-efficacy to learn statistics was measured on an ordered, categorical, six-point Likert scale, ranging from ‘no confidence at all’, to ‘a little confidence’, ‘a fair amount of confidence’, ‘much confidence’, ‘very much confidence’ and finally, ‘complete confidence’. It is implausible to assume the normality and homogeneity of variance for ordered categorical outcomes (Elamir & Sadeq, 2010). The test of normality was not significant; therefore, the ordinal regression model becomes a preferable modelling tool that does not assume normality and constant variance, but requires the assumption of parallel lines across all levels of the categorical outcome (Elamir & Sadeq, 2010).
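The parallel lines assumption can be illustrated with a small Python sketch of a cumulative logit model for a six-level outcome. It assumes the common parameterisation in which the cumulative probability of level j or lower is the inverse link of (threshold j minus the linear predictor), so that one set of coefficients is shared by all thresholds. The thresholds, the predictor value and the function names are hypothetical and are not estimates from the study.

```python
import math

LEVELS = [
    "no confidence at all", "a little confidence",
    "a fair amount of confidence", "much confidence",
    "very much confidence", "complete confidence",
]

def logistic(z):
    """Inverse of the logit link."""
    return 1 / (1 + math.exp(-z))

def category_probabilities(thresholds, eta):
    """Probability of each outcome level under a cumulative logit model.

    thresholds: increasing cut-points, one fewer than the number of levels
    eta: linear predictor (sum of coefficient times explanatory value);
         the same eta is used at every threshold (parallel lines)
    """
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical thresholds for the five cut-points between six levels
cut_points = [-2.0, -1.0, 0.0, 1.0, 2.0]
baseline = category_probabilities(cut_points, eta=0.0)
higher = category_probabilities(cut_points, eta=1.0)
for level, p0, p1 in zip(LEVELS, baseline, higher):
    print(f"{level}: {p0:.4f} -> {p1:.4f}")
```

Raising the linear predictor shifts probability towards the higher confidence levels without changing the spacing of the thresholds, which is what the parallel lines assumption requires.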

Explanatory variables include seven demographic variables, namely, gender, age, ethnic group, marital status, postgraduate programme, student status and type of study; 6 items related to the level of experience in research methodology and statistics; 51 STARS questionnaire items related to statistical anxiety; 36 items on attitudes towards statistics; and 12 items on perceived social support.


In document An analytical model for assessing the knowledge of statistical procedures amongst postgraduate students in a higher educational environment (Page 141-145)