Design & Implementation of SimplifIDE
15 Administering Practical
5.5 Method of Data Analysis
5.5.3 Data Analysis
The rules used for preparing data for analysis are discussed, followed by a description of the analysis techniques applied to the data.
Preparation of Data
Data collected is considered for discarding under the following rules:
• if a participant elects to withdraw from the study, conduct an interview and then discard all quantitative data. During the interview, qualitative data is collected regarding the reasons for withdrawing from the study and their attitudes regarding the designated PDE used.
• if a participant does not have a complete set of data, discard all quantitative data. A data set may be incomplete due to a number of reasons, such as the participant being exempted from the course due to passing a competency test or the participant cancelling registration for the course.
• if a participant was administered any supplementary assessment, discard all quantitative data. Participants may be allowed to have supplementary assessments administered in the event of missing allocated assessments due to valid reasons. In this case, data is discarded because the participant has had additional time to prepare before the assessment and because the assessments may differ in content or difficulty.
• if a participant makes use of the PDE from a different experimental group, then discard all quantitative data. This case can occur when participants install the designated PDE at home and distribute the software to another participant from a different experimental group. Participants are questioned about their usage of software and those admitting to using the incorrect software are removed from the experiment.
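The four discard rules above can be sketched as a single predicate. This is only an illustrative sketch: the `Participant` record and all of its field names are hypothetical and do not come from the study's own tooling.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical, chosen to mirror the four rules.
    withdrew: bool = False               # elected to withdraw from the study
    complete_data: bool = True           # has a complete set of data
    wrote_supplementary: bool = False    # was administered a supplementary assessment
    used_other_groups_pde: bool = False  # admitted to using another group's PDE


def keep_quantitative_data(p: Participant) -> bool:
    """Return True only if none of the four discard rules applies."""
    return not (
        p.withdrew
        or not p.complete_data
        or p.wrote_supplementary
        or p.used_other_groups_pde
    )


kept = keep_quantitative_data(Participant())
discarded = keep_quantitative_data(Participant(wrote_supplementary=True))
```

Qualitative interview data for withdrawn participants is outside the scope of this sketch; only the quantitative decision is modelled.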
Participants who are not admitted to the final examination because they did not obtain a continuous assessment score of 40% or more, and those who are admitted to the final examination but do not write it for various reasons, are not excluded from the experiment. In such cases, an examination grade and a final grade are calculated using Excel’s FORECAST function (Microsoft 2006a). This enables such a participant’s quantitative data to be included in the analysis of academic performance of the experimental group.
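Excel's FORECAST function predicts a value from a least-squares straight line fitted to known (x, y) pairs. The sketch below reproduces that calculation in Python. The text does not state which marks serve as the predictor, so the use of the continuous assessment score as x is an assumption, and the marks shown are made up for illustration.

```python
def forecast(x, known_ys, known_xs):
    """Predict y at x from a least-squares line through the known points.

    The argument order mirrors Excel's FORECAST(x, known_y's, known_x's).
    """
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    if sxx == 0:
        raise ValueError("known_xs must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Made-up marks for participants who did write the examination.
continuous_scores = [45.0, 55.0, 65.0, 75.0]
exam_grades = [40.0, 50.0, 60.0, 70.0]

# Estimated examination grade for a participant with a continuous score of 35.
predicted_exam = forecast(35.0, exam_grades, continuous_scores)
```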
Quantitative Statistical Techniques
This section describes the statistical techniques applied to the data collected in the study, after which the test statistics are specified together with the statistical techniques applied to them. Three statistical techniques are appropriate in the analysis of the test statistics for the current study (Dunn 2001), namely:
• pooled-variance two-tailed t-test for independence of groups (referred to as the t-test for convenience);
• χ² test for homogeneity of proportions using a contingency table for testing for equality of proportions between groups (referred to as the χ²-test for convenience); and
• multiple regression using sigma-restricted parameterisation (referred to as multiple regression).
The t-test is commonly used for hypothesis testing when the parameters of the population of interest are not known and the sample size is relatively small. The population means are compared and, should the difference between the respective means be equal to 0, the populations are not independent of one another (Dunn 2001). The test for independence of groups is conducted at the 95% confidence level (α = 0.05). If the total sample size is less than the recognised large sample size, then the assumption of normality needs to be assessed.
The assumption of normality and equality of variances in the experimental groups can be tested by calculating z_skewness and z_kurtosis and determining if they fall within ±1.96 (Hair et al. 1998). Once the assumption for normality and equality of variances has been established, the t-test may be applied to the sample data.
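The two calculations described above can be sketched as follows. The pooled-variance t statistic uses the standard formula with n1 + n2 - 2 degrees of freedom. For the z-values, the commonly quoted form is assumed: skewness divided by sqrt(6/N) and excess kurtosis divided by sqrt(24/N), with skewness and kurtosis computed from simple (uncorrected) sample moments. Statistical packages may use bias-corrected estimators, so their values can differ slightly. The marks are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # variance() is the sample variance (divides by n - 1).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def z_skew_kurt(xs):
    """z-values for skewness and excess kurtosis (moment-based, assumed form)."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness / math.sqrt(6 / n), excess_kurtosis / math.sqrt(24 / n)


# Made-up marks for two experimental groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

z_skew, z_kurt = z_skew_kurt(group_a)
normal_enough = abs(z_skew) <= 1.96 and abs(z_kurt) <= 1.96

t_stat, df = pooled_t(group_a, group_b)
# The two-tailed critical value at alpha = 0.05 is read from a t-table
# for the computed degrees of freedom (2.306 for df = 8).
significant = abs(t_stat) > 2.306
```

The critical value is hard-coded here only because the example has 8 degrees of freedom; in practice it would be looked up or computed for the actual sample sizes.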
The χ²-test is used to determine whether there is a significant difference in the population proportions of sample data containing two or more categories, for example the percentage of participants who disagree, are undecided or agree with a given statement. The equivalence of the proportions is tested at the 95% confidence level (α = 0.05).
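As a minimal sketch of the contingency-table calculation, the function below computes the χ² statistic and its degrees of freedom from observed frequency counts. The counts are made up for illustration, and the sketch assumes every expected count is non-zero.

```python
def chi_square_homogeneity(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    table: one row of category counts per group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Made-up counts: rows are two groups, columns are
# disagree / undecided / agree.
counts = [
    [10, 5, 15],
    [20, 5, 5],
]
chi2_stat, chi2_df = chi_square_homogeneity(counts)

# Critical value for df = 2 at alpha = 0.05, taken from a chi-square table.
proportions_differ = chi2_stat > 5.991
```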
Multiple regression is used to determine if there is a significant relationship between one or more predictor variables and a dependent variable. If a significant relationship is determined, a formula is produced that predicts the dependent variable from the predictor variables that contribute significantly to it. Sigma-restricted parameterisation is used for variables that have two values (e.g. using SimplifIDE or using Borland© Delphi™). One value is encoded as +1 and the other as -1 so that the encoded values sum to zero. A supporting statistic, namely the multiple-R² value, is calculated to establish how much of the variance in the dependent variable is accounted for by the predictor variables. The higher the multiple-R² value, the more the predictor variables account for the variance in the dependent variable. The prediction relationships are tested at the 95% confidence level (α = 0.05).
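The encoding and the multiple-R² value can be illustrated with a small ordinary least-squares fit. This is a sketch under stated assumptions, not the procedure Statistica applies: which PDE receives +1 is an arbitrary choice here, the data are made up, and the significance tests on the coefficients are not shown.

```python
def sigma_code(pde):
    """Encode a two-valued variable as +1 / -1 (assignment is arbitrary)."""
    return 1.0 if pde == "SimplifIDE" else -1.0


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, ys):
    """Least-squares coefficients (intercept first) and multiple R^2."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1.0 - ss_res / ss_tot


# Made-up data: assessment score, PDE used, final score.
data = [
    (40.0, "SimplifIDE", 33.0),
    (60.0, "Delphi", 37.0),
    (50.0, "Delphi", 32.0),
    (70.0, "SimplifIDE", 48.0),
]
predictors = [[score, sigma_code(pde)] for score, pde, _ in data]
final_scores = [final for _, _, final in data]
coefficients, r_squared = fit(predictors, final_scores)
```

Because the made-up final scores were generated exactly from 10 + 0.5 × score + 3 × code, the fit recovers those coefficients and an R² of 1; real data would give a lower value.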
Quantitative Statistics
There are a number of quantitative test statistics applicable to the current study, which can be categorised into those measuring academic performance, perceptions and programming behaviour. Academic performance is measured using the grades obtained from pen-and-paper and practical assessments and final grades. Perceptions are measured in terms of proportions of answer ratings to questions in the weekly questionnaires, while programming behaviour is determined by analysing the event data captured during programming sessions.
Academic performance makes use of a computed mean (the average mark) obtained by participants and a computed proportion (the percentage pass or throughput) of participants for the pen-and-paper and practical assessments. For the computed average marks, the t-test is used to determine whether the samples are identical or not, while the χ²-test is used for the computed throughput proportion. Multiple regression is used to determine whether there is a significant relationship between the flow chart and pseudo-code assessment and PDE used (predictor variables) and the final score obtained (dependent variable). The t-test statistics and multiple regression test match hypotheses H0.1.1.1…H0.1.1.4 for the computed mean and hypotheses H0.1.2.1…H0.1.2.4 for the throughput proportions.
Novice programmer perceptions are analysed by using the χ²-test. If the designated PDE has no effect on the perceptions of the novice programmer, then there will be no significant difference in the proportions of responses to questions in the questionnaires.
The Likert scale values obtained as responses to questions in the questionnaires are converted from a five-point scale to a three-point scale prior to the χ²-test being applied (Figure 5.6). This is done to make the proportions easier to interpret.
Responses that strongly disagree or disagree with a given statement are grouped into a disagree response, while responses that agree or strongly agree with a given statement are grouped into an agree response (Dunn 2001). Once obtained, the frequency counts for each response are used in the calculation of the χ²-test. In the event of a significant difference being detected by the test, it is more readily possible to determine if a group’s responses were more in agreement, disagreement or undecided for a given statement.
[Figure: Questionnaire Responses (1 to 5: Strongly Disagree, Disagree, Undecided, Agree, Strongly Agree) mapped to Recategorised Responses (Disagree, Undecided, Agree)]
Figure 5.6: Recategorisation of Likert Scale Questionnaire Responses
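The recategorisation in Figure 5.6 amounts to a simple lookup followed by a frequency count. The sketch below assumes that the values 1 to 5 run from Strongly Disagree to Strongly Agree; the responses shown are made up.

```python
from collections import Counter

# Assumed coding: 1 = Strongly Disagree ... 5 = Strongly Agree.
RECODE = {
    1: "Disagree",
    2: "Disagree",
    3: "Undecided",
    4: "Agree",
    5: "Agree",
}
CATEGORIES = ("Disagree", "Undecided", "Agree")


def recategorise(responses):
    """Return frequency counts in the order Disagree, Undecided, Agree."""
    counts = Counter(RECODE[r] for r in responses)
    return [counts[c] for c in CATEGORIES]


# Made-up responses from one group to one questionnaire statement.
group_counts = recategorise([1, 2, 3, 4, 5, 5])
```

One such row of counts per experimental group forms the contingency table on which the test is calculated.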
Programming behaviour is stored as a time sequence of captured events (Section 5.3.2.5). From the events, test statistics are derived, namely:
• mean number of key presses between compiles/runs;
• mean number of errors between compiles/runs;
• frequency of compiles/runs;
• compilation event pairs time; and
• compilation event pairs proportion.
Each of the test statistics above is analysed using the t-test, except for the compilation event pairs proportion, which is analysed using the χ²-test. The test statistics match hypotheses H0.3.1…H0.3.6.
Compilation event pairs examine the relationship between successive pairs of compilations (Jadud 2004). Jadud (2004) indicated that if a compilation ends in a syntax error, it is labelled as F, otherwise as T. This leads to four combinations of pairs, namely T→T, T→F, F→T and F→F (Figure 5.7).
[Figure: state diagram with two states, T and F, and four transitions labelled TT, TF, FT and FF]
Figure 5.7: Diagram of Compile Event Pairs
The ideal situation is T→T, in which a program containing no syntax errors is modified by the programmer and, upon the next compilation, no syntax errors have been introduced. The compilation event pair T→F is where a program containing no syntax errors is modified by a programmer and syntax errors are introduced. The compilation event pair F→T is where a program contains syntax errors, but after modification by the programmer, there are no syntax errors. Lastly, the F→F compilation pair is where a program containing syntax errors is modified by a programmer, but upon compilation there are still syntax errors remaining (although not necessarily the same as the initial ones).
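The labelling of successive compilations into the four pairs can be sketched as below. The input representation (a string of T and F labels in time order) is an assumption made for illustration, not the format of the captured event data.

```python
from collections import Counter

PAIRS = ("TT", "TF", "FT", "FF")


def pair_proportions(outcomes):
    """Proportion of each compilation event pair.

    outcomes: sequence of 'T' (no syntax error) or 'F' (syntax error),
    one label per compilation, in time order.
    """
    pairs = Counter(a + b for a, b in zip(outcomes, outcomes[1:]))
    total = sum(pairs.values())
    return {p: (pairs[p] / total if total else 0.0) for p in PAIRS}


# Made-up session: six compilations give five successive pairs.
proportions = pair_proportions("TTFFTT")
```

For this session the pairs are TT, TF, FF, FT and TT, so T→T accounts for two of the five pairs.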
Ideal programmers would spend the majority of their time between T→T compilation event pairs, if it is assumed that ideal programmers do not often introduce syntax errors into their code (Gugerty & Olson 1986). In the event of syntax errors being introduced (T→F), the fastest correction of these would lead to the F→T compilation event pair. In other words, ideal programmers would have no F→F compilation pairs, with negligible, but equal, T→F and F→T pairs and the vast majority of time spent in the T→T compilation event pair state.
Additionally, the time between compiles in each compilation event pair is also useful for examining programmer behaviour. The more time a programmer spends making modifications to a program, the larger the chance becomes of introducing syntax errors.
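A minimal sketch of the compilation event pairs time is given below: the mean elapsed time between the two compilations of each pair type. The event representation (a timestamp in seconds and a syntax-error flag per compilation) is hypothetical, as are the values.

```python
def mean_time_per_pair(compiles):
    """Mean seconds between the two compilations of each pair type.

    compiles: list of (timestamp_seconds, had_syntax_error) tuples in
    time order. Pair types that never occur are omitted from the result.
    """
    totals, counts = {}, {}
    for (t0, err0), (t1, err1) in zip(compiles, compiles[1:]):
        pair = ("F" if err0 else "T") + ("F" if err1 else "T")
        totals[pair] = totals.get(pair, 0.0) + (t1 - t0)
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Made-up session with four compilations.
session = [(0, False), (30, False), (90, True), (100, False)]
pair_times = mean_time_per_pair(session)
```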
Excel (Microsoft 2006a) and Statistica (StatSoft 2005) are the software tools to be used in performing the quantitative statistical tests. The following section discusses the risks to the current study and the measures in place to offset these risks.