Modelling student performance in a tertiary preparatory course

(1)

Modelling student performance in a

tertiary preparatory course

Colin Stuart Carmichael

M.Ed, B.Sc(hons)

Submitted in fulfilment of Master of Philosophy

at the University of Southern Queensland.

(2)

Abstract

In this dissertation a review of the literature as it applies to the modelling of educational performance data is undertaken. Statistical linear models, including the novel Beta, Tweedie and Tobit regression models, are then ap-plied to the performance data of students who have undertaken a preparatory mathematics course. These models are then critically reviewed and compared with the commonly used standard linear regression model.

Issues that arise from the application of statistical linear models to ed-ucational performance data are then explored. For example, the effects of non-Normality, which characterizes educational performance data, and the presence of large numbers of students who fail to complete the course (a characteristic of this particular context), are examined and reported. Both of these effects can violate the underlying assumptions of the standard lin-ear regression model. Simulation studies are then used to assess the ap-propriateness of the linear model when it is applied under the condition of non-Normality and the presence of large numbers of missing observations.

Findings from this study indicate that issues relating to model effec-tiveness are clouded in the educational context by typically large values of the error variance (high noise) and the difficulty in finding suitable perfor-mance predictors. Educational models of perforperfor-mance typically lack statisti-cal power, so that in many instances it doesn’t matter what model is applied to the data. Nevertheless, the study highlights many reasons why models alternative to the standard linear regression model should be applied to such

(3)

ii

(4)

Certification of dissertation

I certify that the ideas, experimental work, results, analyses, software and conclusions reported in this dissertation are entirely my own effort, except where otherwise acknowledged. I also certify that the work is original and has not been previously submitted for any other award, except where other-wise acknowledged.

Colin Stuart Carmichael Date

Endorsement

Peter Dunn Date

Janet Taylor Date

(5)

Acknowledgements

The completion of this project would not have been possible without the generous support of the following people.

• My supervisors Peter Dunn and Janet Taylor.

• My wife Patricia and son Stephen for their continued support.

(6)

List of Tables

4.1 Highest level of education . . . 66 4.2 Assessment tasks and weights for overall course score . . . 69 4.3 Total number of assessment items completed . . . 71 4.4 Cross tabulation of predicted counts against observed counts

for the ordinal model of progression . . . 98 5.1 Effectiveness of the linear model for different degrees of

skew-ness and sample size . . . 110

(10)

List of Figures

2.1 Taxonomy of individual difference constructs [91] . . . 11 2.2 Typical achievement distribution . . . 33 3.1 Three gamma distributions shown on the top graph and the

typical results from an ordinal model on the lower graph . . . 46 3.2 Two beta distributions (α = 6, β = 2) and (α= 0.5, β = 1.5) . 50 3.3 Simplex distributions (σ2 _{= 2) . . . 51}

3.4 A Tweedie distribution (µ= 7.8, p= 1.14, φ= 6.4) . . . 56 3.5 Achievement distributions of student sub-groups together with

achievement distribution for overall group . . . 60 4.1 Distribution of M-test results (out of 38) shown on top graph

and distribution of time since last studied maths on the lower graph . . . 67 4.2 Distribution of ages . . . 68 4.3 Distribution of student score by population sub-group . . . 70 4.4 Student score against M-test result for complete students, with

local fitted polynomial regression curve. . . 73 4.5 Student score against M-test result by semester, with simple

linear models of score against M-test result shown for each semester. . . 76 4.6 Distribution of M-test results by semester . . . 77

(11)

LIST OF FIGURES x

4.7 Plot of residuals against M-test results for the standard linear regression model (influential points numbered 1 to 5) shown on the top graph. Plot of empirical quantiles against theoretical quantiles shown on the lower graph. . . 79 4.8 Plot of residuals against M-test results for the beta regression

model. . . 81 4.9 Distribution of student scores for incomplete students with

Tweedie (p= 1.07) distribution also shown . . . 83 4.10 Plot of quantile residuals against M-test results for the Tweedie

model (influential points are labelled 1 to 5) shown on the top graph. Plot of residual quantiles against theoretical quantiles shown on the lower graph. . . 85 4.11 Comparison of Tweedie and Tobit models . . . 87 4.12 Models of achievement for all students and sub-groups, with

M-test results used as the explanatory variable. Model 1 is a linear model applied to all students; Model 2, a Tweedie model applied to incomplete students; and, Model 3 a linear model applied to complete students. . . 88 5.1 Application of the linear model to both high and low skewed

distributions. The 95% confidence interval is shown for the linear model as dashed lines and the true (generating) model as an unbroken line. . . 109 5.2 Distributions of sub-populations (in the ratio 1:1) and

com-bined population N = 1000 . . . 115 5.3 Comparison of models applied to sub-populations and to a

(12)

LIST OF FIGURES xi

5.4 Distribution of simulated achievement scores for incomplete students . . . 120 5.5 Models applied to non-zero achievement data generated by a

Tweedie regression model. Graph 1 shows the standard linear regression model applied to all non-zero data and Graph 2, a beta model applied to the same subset. . . 122 6.1 Distribution of scores for each of the assessment instruments

Modelling student performance in a tertiary preparatory course