3. Literature review II – Feedback in medical education
4.11 Exploring conditions for effective feedback
My chi-squared analysis had indicated that there were few consistently significant relationships between certain assessment conditions (stage of training, modal assessment score, overall assessment judgement) and the particular types of feedback provided by assessors (e.g. positive feedback, negative feedback, specific comments on observed performance and so on). According to Glaesser and Cooper (2012), this is not uncommon when traditional statistical analysis is applied to coded data drawn from the naturalistic settings in which most social science research is conducted. As these authors point out, traditional statistical analysis is often too rigid to reveal the meaningful but less absolute relationships between variables that are typically encountered in such settings. Instead, these authors propose an alternative approach based on qualitative comparative analysis (QCA), first described by Ragin (1987, 2000, 2008). This approach offers the possibility of exploring coded data numerically, but with due regard to the more ‘fuzzy’ nature of the relationships that exist in social science field research. For this reason, I chose to employ a modified version of Ragin’s (ibid.) technique, as described below.
4.11.1 Ragin's approach – necessity and sufficiency
According to Glaesser and Cooper (2012), Ragin's approach to QCA offers the advantage of retaining the causal complexity of the social world, in which factors which are interrelated can be treated as interrelated and interdependent, rather than being regarded as independent (which is often a requirement for conventional statistical analysis). These authors argue that, in the world of the social sciences, a particular outcome (e.g. educational attainment) may be causally linked to a number of conditions or predicates. Particular conditions, or combinations of conditions, may function either together or in isolation to bring about the outcome. Thus, these conditions may be regarded as 'sufficient' for the generation of the outcome. However, in complex social settings, other conditions or combinations of conditions may also be sufficient to bring about the same outcome. These alternative conditions may also be regarded as sufficient, but it is clear that neither set of conditions is 'necessary' in itself – if one sufficient condition (or set of conditions) is absent, the presence of the other would still bring about the outcome.
An example offered by Glaesser and Cooper (2012), taken from the work of Mahoney and Goertz (2006), is as follows:
Y = A*B*c + A*B*C*D
In this formula, the outcome of interest is represented by the upper case Y. The asterisk (*) represents the logical 'AND', while the plus (+) sign represents the logical 'OR' operator. In addition, in Boolean notation, capital letters signify the presence of a condition (e.g. A) whereas lower case letters represent the absence of a condition (e.g. c). The requirement for a particular condition to be absent is also referred to as a 'NOT' function. In this formula, the combination of conditions denoted by A*B*c is sufficient to bring about the outcome Y. However, A*B*c is not necessary to generate Y, as Y may also be brought about by the presence of A*B*C*D. Mahoney and Goertz’s (2006) example deals with combinations of predicates, but the logic still holds true for individual conditions that are capable of giving rise to a particular outcome.
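The logic of this formula can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis; the function and variable names are illustrative assumptions. It enumerates every combination of the four conditions and tests whether A*B*c is sufficient, and whether it is necessary, for Y.

```python
from itertools import product


def outcome(a, b, c, d):
    """Y = A*B*c + A*B*C*D, reading * as AND, + as OR and lower case as NOT."""
    return (a and b and not c) or (a and b and c and d)


# Every possible combination of presence/absence for A, B, C and D.
rows = list(product([False, True], repeat=4))

# Sufficiency: in every row where A*B*c holds, Y also holds.
abc_sufficient = all(
    outcome(a, b, c, d) for a, b, c, d in rows if a and b and not c
)

# Necessity: in every row where Y holds, A*B*c also holds.
abc_necessary = all(
    (a and b and not c) for a, b, c, d in rows if outcome(a, b, c, d)
)

# The same test applied to the single condition A.
a_necessary = all(a for a, b, c, d in rows if outcome(a, b, c, d))

print(abc_sufficient)  # True
print(abc_necessary)   # False, because A*B*C*D also produces Y
print(a_necessary)     # True, because A appears in both pathways
```

In this particular formula the single condition A happens to be necessary, since it appears in both pathways, even though neither full combination is.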
4.11.2 Quasi-sufficiency and quasi-necessity
A second principle of Ragin's approach to QCA is that, in the naturalistic social setting, relationships between variables rarely overlap perfectly, but instead overlap somewhat imperfectly. These imperfect yet important relationships are dealt with using the concept of fuzzy logic, which was first proposed by Zadeh (1965). In traditional logic, relationships are deemed to be either true or untrue, and are consequently assigned the value 1 or 0 respectively to indicate this. In fuzzy logic, relationships that are partially or mostly true may be represented by a number between 0 and 1, with the strength of the relationship increasing as the number approaches 1. Thus, fuzzy logic allows the researcher to analyse relationships between predicates and outcomes in terms of partial truth, and to reflect degrees of connectedness between the two, as opposed to being confined to the binary true/false decisions that are normally supported by conventional logic. This can be illustrated using Venn diagrams (see Figures 4.6 and 4.7), and the language used to describe such relationships refers to 'coverage', which, as Glaesser and Cooper (2012) identify, is analogous to the concept of variance in regression analysis. Predicates and outcomes that achieve sufficient coverage may be classed as quasi-necessary or quasi-sufficient, as long as they exceed threshold values. Typically, coverage of at least 0.8 (or 80%) is taken to be adequate for quasi-sufficiency (or quasi-necessity) to be declared, with 0.7 (or 70%) normally demarcating the lowest threshold.
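To make the threshold idea concrete, the following is a minimal sketch in Python for the simple crisp (0/1) case. It is an illustration under stated assumptions rather than the procedure used in this study: the ten cases are invented, the function names are hypothetical, and the label 'borderline' for values between 0.7 and 0.8 is my own reading of the two thresholds.

```python
def overlap_ratio(first, second):
    """Proportion of the cases in `first` that also fall within `second`."""
    members = [s for f, s in zip(first, second) if f]
    if not members:
        raise ValueError("no cases belong to the first set")
    return sum(members) / len(members)


def classify(ratio, adequate=0.8, lowest=0.7):
    """Compare an overlap ratio with the two thresholds quoted in the text."""
    if ratio >= adequate:
        return "adequate"
    if ratio >= lowest:
        return "borderline"  # assumed label for the 0.7 to 0.8 band
    return "below threshold"


# Invented data for ten cases: 1 = present, 0 = absent.
condition = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
outcome = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

# Quasi-sufficiency: of the cases showing A, what share also show O?
sufficiency = overlap_ratio(condition, outcome)  # 4 of 5 cases

# Quasi-necessity: of the cases showing O, what share also show A?
necessity = overlap_ratio(outcome, condition)    # 4 of 6 cases

print(classify(sufficiency))  # adequate
print(classify(necessity))    # below threshold
```

The same pair of variables can therefore pass the threshold in one direction and fail it in the other, which is why sufficiency and necessity have to be tested separately.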
These concepts of quasi-sufficiency and quasi-necessity were useful in my study, in which traditional statistical analysis had revealed few consistent relationships between particular conditions and the incidence of different types of feedback. However, in order to conduct Ragin analysis of these conditions, it was necessary to identify a suitable outcome for which the sufficiency and necessity of certain conditions could be tested.
Figure 4.6 Venn diagrams illustrating the concepts of (a) sufficiency and (b) quasi-sufficiency, after Glaesser and Cooper (2012). O = outcome; A = condition. In (a), if A is present, then O occurs. In (b), if A is present, then O nearly always occurs.
Figure 4.7 Venn diagrams illustrating the concepts of (a) necessity and (b) quasi-necessity, after Glaesser and Cooper (2012). O = outcome; A = condition. In (a), A must be present for O to occur. In (b), A must almost always be present for O to occur.
4.11.3 Choosing an outcome
A difficulty that arose when utilising Ragin’s approach was that there was no single outcome measure for which the potential necessary and sufficient conditions could be analysed. The outcome in which I was interested, 'good quality feedback', was not a single entity that manifested as the presence of a single type of feedback comment. Rather, it was a composite, determined by the analysis of the formative assessment literature, and consisted of the presence of certain types of comment (e.g. specific comments on observed behaviour) and the absence of certain other types of comment (e.g. personal comments). It was therefore possible to construct qualitatively a model of 'ideal' or ‘high quality’ feedback, which comprised a combination of features that the review of the formative assessment literature supported as being educationally beneficial. These features were as follows:
The presence of:
positive and negative comments; specific comments on observed performance; linkage to the assessment criteria; specific suggestions for development,
and the absence of:
global feedback; personal feedback; general comments on observed performance; general developmental comments.
When these parameters were used to filter the coded assessor feedback statements from all three years of assessment data, no examples were found. This was in itself a significant finding. It also created the further complication of identifying a level of feedback quality that would allow the database software to return some analysable results.
In attempting to find a standard that was suitably stringent and yet allowed for the analysis of assessor feedback comments, I dropped the requirement for general comments to be absent. This was because the assessment-related evidence had not clearly shown general comments to be directly linked to negative educational outcomes, to the extent that they might undo the work of specific comments. Thus, with the requirement for specific comments retained, the presence of general comments was ignored. The requirement for global comments to be absent was dropped for the same reason. However, the requirement for personal comments to be absent was retained. This was due to the weight of evidence in the literature demonstrating the unhelpful nature of these types of comment, including their ability to divert the learner’s attention away from any task-related feedback, regardless of its quality. When these criteria were applied to the second set of 500 feedback statements taken from the 2010-11 data, only 23 feedback statements met them. Given that the sample of 500 statements comprised assessments for trainees from ST1-ST5 who had achieved a range of global outcomes and numerical assessment judgements, it was felt that dividing this subset further by stage of training, modal assessment score, overall assessment judgement and length of feedback was likely to produce subsets containing very small numbers of comments, which might not be usefully or credibly analysed by Ragin’s technique.
Similar numbers were found when applying the same filter to the data from 2011-12 and 2012-13, and so a still lower threshold for feedback quality was sought. Removing the requirement for specific (rather than general) comments on the observed performance resulted in an increased number of feedback statements which could be subjected to further analysis, and so the outcome of ‘high quality feedback’ was taken to be feedback which met the following criteria:
The presence of positive and negative comments (either specific or general) on the observed performance with specific suggestions for further development,
and the absence of comments at the personal level.
Feedback statements that satisfied these criteria were found in 41/500 (8%), 34/500 (7%) and 37/500 (7%) of the Rad-DOPS assessments sampled from 2010-11, 2011-12 and 2012-13 respectively.
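The filtering logic described in this section can be sketched as follows. This is an illustrative Python sketch only: the study itself used database software, and the code labels, the example statements and the helper name `meets` are all hypothetical simplifications of the actual coding scheme. Only the three counts out of 500 are taken from the text.

```python
# The original 'ideal' standard: all beneficial features, no unhelpful ones.
STRICT = {
    "required": {"positive", "negative", "specific_performance",
                 "criteria_link", "specific_suggestion"},
    "excluded": {"global", "personal", "general_performance",
                 "general_development"},
}

# The final, relaxed standard adopted as 'high quality feedback'.
FINAL = {
    "required": {"positive", "negative", "specific_suggestion"},
    "excluded": {"personal"},
}


def meets(codes, standard):
    """True if all required codes are present and no excluded code appears."""
    return standard["required"] <= codes and not (standard["excluded"] & codes)


# Three invented feedback statements, each reduced to its set of codes.
statements = [
    {"positive", "negative", "specific_suggestion", "general_performance"},
    {"positive", "negative", "specific_suggestion", "personal"},
    {"positive", "global"},
]

strict_hits = [meets(s, STRICT) for s in statements]
final_hits = [meets(s, FINAL) for s in statements]
print(strict_hits)  # [False, False, False]
print(final_hits)   # [True, False, False]

# Counts reported in the text, each out of a sample of 500 statements.
reported = {"2010-11": 41, "2011-12": 34, "2012-13": 37}
percentages = {year: round(100 * n / 500) for year, n in reported.items()}
print(percentages)  # {'2010-11': 8, '2011-12': 7, '2012-13': 7}
```

The first invented statement shows the effect of relaxing the standard: it fails the strict filter but passes the final one, whereas a personal comment still excludes the second statement under both.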
4.11.4 Necessity or sufficiency?
Another consideration in applying Ragin's approach was whether to analyse the feedback data in terms of necessity or sufficiency. Glaesser and Cooper (2012), for example, had chosen to explore their data for sufficiency in the first instance, and presented their results accordingly. For a condition to be sufficient, it must be a subset of the outcome (as illustrated in Figure 4.6). Given the relatively small numbers of assessments that displayed the chosen outcome (‘high quality feedback’), it seemed unlikely that any of the four conditions chosen for analysis would overlap with it substantially enough to be labelled as sufficient or quasi-sufficient. For a condition to be necessary, by contrast, the outcome must be a subset of the condition (as illustrated in Figure 4.7). It therefore seemed most sensible to begin Ragin analysis by examining the potential necessity, or quasi-necessity, of each chosen condition for the provision of ‘high quality feedback’. Once the necessity analysis was completed, it was relatively straightforward to check each of the chosen conditions for sufficiency as well.
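The subset relationships that underpin this choice can be expressed directly with sets. The sketch below is a hypothetical Python illustration using invented case numbers, not data from the study. It shows why a rare outcome is more likely to sit inside a common condition (necessity) than the reverse (sufficiency).

```python
def is_sufficient(condition_cases, outcome_cases):
    """Strict sufficiency: every case with the condition shows the outcome."""
    return condition_cases <= outcome_cases


def is_necessary(condition_cases, outcome_cases):
    """Strict necessity: every case with the outcome shows the condition."""
    return outcome_cases <= condition_cases


# Invented case identifiers: a rare outcome and a fairly common condition.
outcome_cases = {3, 7, 12}
condition_cases = {1, 2, 3, 5, 7, 8, 12, 15, 16, 19}

print(is_necessary(condition_cases, outcome_cases))   # True
print(is_sufficient(condition_cases, outcome_cases))  # False
```

With only three outcome cases, a condition held by ten cases cannot fit inside the outcome set, so the sufficiency test is bound to fail here, while the necessity test remains open.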
As previously mentioned, the conditions that were analysed for potential necessity and sufficiency were: modal assessment score; global assessment judgement; stage of training; length of feedback. The results of this analysis are reported in the next chapter.