Chapter 7. Conclusion and Future Work

7.2 Contributions

This dissertation has addressed the more general problem of how to automatically generate high-quality feedback from paired artefacts with the same referent. It defined such artefacts and their constituent features (chapter 3, section 3.3). It applied the paired artefacts approach to the design/implementation context by running an assessment tool on a set of student-submitted coursework. Each coursework submission consisted of two artefacts: a design diagram and its accompanying implementation. The tool generates formative feedback based upon the features contained in the artefacts. Features are labelled as consistent, superfluous or missing. Feedback positively reinforces consistent features, whilst superfluous and missing features are reported as errors.

The dissertation also developed a method for evaluating formative feedback comments. Comments were evaluated by both the students and a team of expert markers. The experts compared human-generated feedback comments with those produced by the assessment tool, while the students evaluated feedback generated by the tool from an analysis of their own submissions. The evaluation showed that the feedback from the tool was widely regarded as being as good as, if not better than, that produced by the human markers.

Consequently, the research contained in this dissertation makes the following significant contributions:

• It defines criteria for categorising automated assessment tools.

• It presents a method for automating the assessment of design diagrams by utilising both their implementations and established work that has identified known errors made by novice designers.

• It provides a definition of high-quality formative feedback and presents a novel and robust method for its evaluation.

• It presents the generic case by defining terms for multiple artefacts and their assessment.

• It describes an automated assessment tool that generates formative feedback.

7.2.1 Classification of Automated Assessment Tools

The dissertation has identified the core characteristics of tools that automate assessment (chapter 2, section 2.3). This is helpful when considering their adoption as many differ in their approach and the type of feedback generated. A categorisation of such systems was developed using three characteristics: the type of student submission (free or fixed form), the extent of the automation (fully or semi-automated) and the type of feedback generated (formative or summative). Automated assessment tools identified in the literature review were categorised according to these characteristics.
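The three characteristics can be read as a small data type. The following is a minimal sketch in Python, not the dissertation's own notation: the class names, field names and the example tool are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Submission(Enum):
    FREE_FORM = "free form"
    FIXED_FORM = "fixed form"


class Automation(Enum):
    FULL = "fully automated"
    SEMI = "semi-automated"


class Feedback(Enum):
    FORMATIVE = "formative"
    SUMMATIVE = "summative"


@dataclass(frozen=True)
class ToolCategory:
    """One cell of the categorisation (names are illustrative assumptions)."""
    submission: Submission
    automation: Automation
    feedback: Feedback

    def describe(self) -> str:
        return ", ".join(
            (self.submission.value, self.automation.value, self.feedback.value)
        )


# Every combination of the three two-valued characteristics.
all_categories = [
    ToolCategory(s, a, f) for s, a, f in product(Submission, Automation, Feedback)
]

# A hypothetical tool, not one taken from the literature review.
example = ToolCategory(Submission.FREE_FORM, Automation.FULL, Feedback.FORMATIVE)
print(example.describe())
print(len(all_categories), "possible categories")
```

Because each characteristic takes one of two values, the scheme yields eight possible categories into which a tool can be placed.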

7.2.2 Automated Assessment of Diagrams

The dissertation has identified five challenges for the automated assessment of diagrams: the support for a student to draw and submit a diagram, the support for a tutor to submit a marking scheme, a mechanism to compare a student diagram with a model solution, a mechanism to cope with extraneous/erroneous data and a mechanism to provide feedback to the student (chapter 2, section 2.4.1).

The dissertation has presented a method for automating the formative assessment of student diagrams. The method adopts a blended approach: it first searches for typical errors in the student design and then compares the diagram with its implementation. One benefit of this approach is that it removes the need for a tutor-supplied model answer. Feedback on the comparison offers the student formative support as the development of their solution moves from high to low levels of abstraction. Two potential mechanisms for the comparison have been presented: design-centric and code-centric. The limitations of using model differencing, reverse engineering and forward engineering to compare artefacts have been highlighted. This is useful to those who wish to develop the mechanisms further.

7.2.3 Defining and Evaluating Good Quality Feedback

This dissertation has presented an approach to evaluating formative feedback that is both novel and easily transferable to other contexts. It required the development of two Likert-based questionnaires: one completed by a team of evaluators and one by a group of students. The evaluators were members of the computer science academic community. This enabled the perspectives of both the suppliers and the receivers of feedback to contribute to the evaluation.

Definitions of the quality, relevance and coverage of formative feedback comments were provided (chapter 5, section 5.6.1). From these, fourteen evaluative statements were derived to form the questionnaire completed by the evaluators.

Tool-generated feedback was compared with human-generated feedback because there are no metrics for objective measures of feedback quality. A bank of student coursework submissions was collated over several years. The bank was divided into two groups: one used for the development of the tool and one used for its evaluation (chapter 5, section 5.3). Dividing the submissions in this fashion ensured that, during evaluation, the tool had not previously seen the student submissions. It also ensured that the development of the assessment heuristic contained in the tool had not been informed by a student submission that was being used in the tool’s evaluation.

A random sample of human- and tool-generated comments was sent to a team of evaluators who completed the Likert-based evaluative questionnaire. A comparison between human- and tool-generated comments concluded that, on the criteria of quality, relevance and coverage, the tool performs well in comparison with the human markers. On the criteria of relevance and coverage, all evaluators rated the tool’s comments as higher than or equal to those generated by an expert human (chapter 6, section 6.4).

The questionnaire used with the students focused upon how the feedback comments helped them with their learning. The tool’s feedback was received favourably, with most students either agreeing or strongly agreeing that it was helpful, clear and relevant, and that it would help them improve their solution (chapter 6, section 6.5).

7.2.4 Multiple Artefacts: the Generic Case for Diagram Comparison

A novel framework has been developed for the generic case of comparing artefacts. An artefact has been defined as a set of features. Definitions have been provided for consistent and superfluous features (chapter 3, section 3.3). Consistent features have been used for positive reinforcement and superfluous features to indicate where more learning is required. Comparing two artefacts requires visiting each feature contained in the first artefact and comparing it with each feature in the second artefact. The comparison produces a set of formative feedback comments for each artefact pair. The multiple artefacts approach contributes a new perspective to existing automated diagram assessment systems.
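The pairwise comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the framework's actual definitions: features are reduced to plain names, the matching rule `same_name` is hypothetical, and the `missing` label follows the design/implementation usage elsewhere in this dissertation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set


@dataclass
class Comparison:
    consistent: Set[str] = field(default_factory=set)   # matched in both artefacts
    missing: Set[str] = field(default_factory=set)      # only in the first artefact
    superfluous: Set[str] = field(default_factory=set)  # only in the second artefact


def same_name(a: str, b: str) -> bool:
    # Hypothetical matching rule: case-insensitive name equality.
    return a.lower() == b.lower()


def compare(
    first: Iterable[str],
    second: Iterable[str],
    matches: Callable[[str, str], bool] = same_name,
) -> Comparison:
    """Visit each feature of `first` and compare it with each feature of `second`."""
    second = set(second)
    result = Comparison()
    matched_in_second: Set[str] = set()
    for feature in first:
        partners = {other for other in second if matches(feature, other)}
        if partners:
            result.consistent.add(feature)
            matched_in_second |= partners
        else:
            result.missing.add(feature)
    # Whatever was never matched in the second artefact is superfluous.
    result.superfluous = second - matched_in_second
    return result


# Illustrative artefacts: class names in a design and in its implementation.
design = {"Customer", "Order", "Invoice"}
implementation = {"customer", "order", "Logger"}
outcome = compare(design, implementation)
print(outcome)
```

Passing the matching rule as a parameter keeps the comparison generic: a different pair of artefact types would supply its own notion of when two features correspond.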

7.2.5 The Development of an Automated Assessment Tool

The efficacy of the multiple artefact framework has been demonstrated through a tool that provides a proof-of-concept implementation (chapter 4). The tool was applied to a set of student-submitted artefacts. It compared two artefacts and identified a set of differences and a set of similarities. When the two artefacts represent a design diagram and its accompanying implementation, the differences represent errors in the submission. These errors are either features introduced by the implementation (extraneous) or features contained in the design that have not been implemented (omissions). The tool generated formative feedback for these features in addition to positively reinforcing the similarities.
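As a rough illustration of how similarities and differences might be turned into comments, the sketch below maps each outcome to a feedback string. It is not the tool's implementation: the function name, the use of exact name matching, and the wording of every comment are assumptions made for the example.

```python
from typing import Iterable, List


def generate_feedback(
    design_features: Iterable[str], implemented_features: Iterable[str]
) -> List[str]:
    """Return formative comments for one design/implementation pair.

    Features are matched by exact name, which is a simplifying assumption.
    """
    design = set(design_features)
    code = set(implemented_features)
    comments: List[str] = []

    # Similarities: positive reinforcement.
    for name in sorted(design & code):
        comments.append(
            f"Well done: '{name}' appears in both your design and your implementation."
        )
    # Omissions: in the design but not implemented.
    for name in sorted(design - code):
        comments.append(
            f"Omission: '{name}' is in your design but has not been implemented."
        )
    # Extraneous features: introduced by the implementation.
    for name in sorted(code - design):
        comments.append(
            f"Extraneous: '{name}' is implemented but does not appear in your design."
        )
    return comments


feedback = generate_feedback(
    ["Customer", "Order", "Invoice"], ["Customer", "Order", "Logger"]
)
for comment in feedback:
    print(comment)
```

Sorting the feature names makes the output deterministic, which is convenient when the same submission is assessed more than once.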

In document The Automatic Assessment of Multiple Artefacts: An Investigation into Design Diagrams and Their Implementations (Page 193-197)